
PROBABILITY THEORY

WITH APPLICATIONS

Second Edition
Mathematics and Its Applications

Managing Editor:

M. HAZEWINKEL
Centre for Mathematics and Computer Science, Amsterdam, The Netherlands

Volume 582
PROBABILITY THEORY
WITH APPLICATIONS
Second Edition

M.M. RAO
University of California, Riverside, California

R.J. SWIFT
California State Polytechnic University, Pomona, California

Springer
Library of Congress Control Number: 2005049973

Printed on acid-free paper.

AMS Subject Classifications: 60Axx, 60Exx, 60Fxx, 60Gxx, 62Bxx, 62Exx, 62Gxx, 62Mxx, 93Cxx

© 2006 Springer Science+Business Media, Inc.


All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.


To the memory of my brother-in-law,
Raghavayya V. Kavuri
M.M.R.

To the memory of my parents,


Randall and Julia Swift
R.J.S.
Contents

Preface to Second Edition ........................................... ix
Preface to First Edition ............................................ xv
List of Symbols ................................................... xvii

Part I  Foundations .................................................. 1

1 Background Material and Preliminaries .............................. 3
  1.1 What is Probability? ........................................... 3
  1.2 Random Variables and Measurability Results ..................... 7
  1.3 Expectations and the Lebesgue Theory .......................... 12
  1.4 Image Measure and the Fundamental Theorem of Probability ...... 20
  Exercises ........................................................ 28

2 Independence and Strong Convergence ............................... 33
  2.1 Independence .................................................. 33
  2.2 Convergence Concepts, Series and Inequalities ................. 46
  2.3 Laws of Large Numbers ......................................... 58
  2.4 Applications to Empiric Distributions, Densities, Queueing,
      and Random Walk ............................................... 68
  Exercises ........................................................ 87

3 Conditioning and Some Dependence Classes ......................... 103
  3.1 Conditional Expectations ..................................... 103
  3.2 Conditional Probabilities .................................... 120
  3.3 Markov Dependence ............................................ 140
  3.4 Existence of Various Random Families ......................... 158
  3.5 Martingale Sequences ......................................... 174
  Exercises ....................................................... 203

Part II  Analytical Theory ......................................... 221

4 Probability Distributions and Characteristic Functions ........... 223
  4.1 Distribution Functions and the Selection Principle ........... 223
  4.2 Characteristic Functions, Inversion, and Lévy's Continuity
      Theorem ...................................................... 234
  4.3 Cramér's Theorem on Fourier Transforms of Signed Measures .... 251
  4.4 Bochner's Theorem on Positive Definite Functions ............. 256
  4.5 Some Multidimensional Extensions ............................. 265
  4.6 Equivalence of Convergences for Sums of Independent
      Random Variables ............................................. 274
  Exercises ....................................................... 276

5 Weak Limit Laws .................................................. 291
  5.1 Classical Central Limit Theorems ............................. 291
  5.2 Infinite Divisibility and the Lévy-Khintchine Formula ........ 304
  5.3 General Limit Laws, Including Stability ...................... 318
  5.4 Invariance Principle ......................................... 341
  5.5 Kolmogorov's Law of the Iterated Logarithm ................... 364
  5.6 Application to a Stochastic Difference Equation .............. 375
  Exercises ....................................................... 386

Part III  Applications ............................................. 409

6 Stopping Times, Martingales, and Convergences .................... 411
  6.1 Stopping Times and Their Calculus ............................ 411
  6.2 Wald's Equation and an Application ........................... 415
  6.3 Stopped Martingales .......................................... 420
  Exercises ....................................................... 427

7 Limit Laws for Some Dependent Sequences .......................... 429
  7.1 Central Limit Theorems ....................................... 429
  7.2 Limit Laws for a Random Number of Random Variables ........... 436
  7.3 Ergodic Sequences ............................................ 449
  Exercises ....................................................... 455

8 A Glimpse of Stochastic Processes ................................ 459
  8.1 Brownian Motion: Definition and Construction ................. 459
  8.2 Some Properties of Brownian Motion ........................... 463
  8.3 Law of the Iterated Logarithm for Brownian Motion ............ 467
  8.4 Gaussian and General Additive Processes ...................... 470
  8.5 Second-Order Processes ....................................... 493
  Exercises ....................................................... 498

References ........................................................ 509

Author Index ...................................................... 519

Subject Index ..................................................... 523
Preface to Second Edition

The following is a revised and somewhat enlarged account of Probability Theory with Applications, whose basic aim, as expressed in the preface to the first edition (appended here), is maintained. In this revision, the material and presentation are better highlighted, with several (small and large) alterations made to each chapter. We believe that these additions make a better text for graduate students and also a reference work for later study. We now discuss in some detail the subject of this text, as modified here. It is hoped that this will provide an appreciation for the viewpoint of this edition, as well as the earlier one, published over two decades ago.
In the present setting, the work is organized into three parts. The first, on the foundations of the subject, consists of Chapters 1-3. The second part concentrates on the analytical aspects of probability in the relatively large Chapters 4-5. The final part, Chapters 6-8, treats some serious and deep applications of the subject. The point of view presented here has the following focus. Parts I and II can be studied essentially independently, with only cursory cross-references. Each part could easily be used for a quarter- or semester-long beginning graduate course in Probability Theory. The prerequisite is a graduate course in Real Analysis, although it is possible to study the two subjects concurrently. Each of these parts of this text also has applications and ideas, some of which are discussed as problems that illustrate as well as extend the basic subject. The final part of the text can be used for a follow-up course on the preceding material or for a seminar thereafter. Numerous suggestions for further study and even several research problems are pointed out. We now detail some of these points for a better view of the treatment, which is devoted to the mathematical content, avoiding nonmathematical views and concepts.
To accommodate the new material and not to substantially increase the size of the volume, we had to omit most of the original Chapter 6 and part of Chapter 7. Thus this new version has eight chapters, but it is still well focused, and the division into parts makes the work more useful. We now turn to explaining the new format.
The first part, on foundations, treats the two fundamental ideas of probability, independence and conditioning. In Chapter 1 we recall the necessary results from Real Analysis, which we recommend for a perusal. It is also important that readers take a careful look at the fundamental law of probability and the basic uniform continuity of characteristic functions.
Chapter 2 undertakes a serious study of (statistical) independence, which is a distinguishing feature of Probability Theory. Independence is treated in considerable detail in this chapter, covering both the basic strong and weak laws, as well as the convergence of series of random variables. The applications considered here illustrate such results as the Glivenko-Cantelli Theorem for empiric distributions and density estimation, random walks, and queueing theory. There are also exercises (with hints) of special interest, and we recommend that all readers pay particular attention to Problems 5 and 6, and also 7, 15 and 21, which explain the very special nature of the subject and the concept of independence itself.
The somewhat long third chapter is devoted to the second fundamental idea, namely conditioning. As far as we know, no other graduate text in probability has treated the subject of conditional probability in such detail and specificity. To mention some noteworthy points of our presentation, we have included: (i) the unsuspected, but spectacular, failure of the Vitali convergence theorem for conditional probabilities, which is a consequence of an interesting theorem of Blackwell and Dubins; we include a discussion and the imposition of a restriction for a positive conclusion to prevail; (ii) the basic problem (still unresolved) of calculating conditional expected values (probabilities) when the conditioning is relative to random variables taking uncountably many values, particularly when the random variables arise from continuous distributions. In this setting, multiple answers (all natural) for the same question are exhibited via a Gaussian family. The calculations we give follow some work by Kac and Slepian, leading to paradoxes. These difficulties arise from the necessary calculation of the Radon-Nikodým derivative, which is fundamental here, and for which no algorithmic procedure exists in the literature. A search through E. Bishop's text on the foundations of constructivism (in the way of L.E.J. Brouwer) shows that we do not yet have a solution or a resolution for the problems discussed. Thus our results are on existence and hence use "idealistic methods", which present, to future researchers in Bishop's words, "a challenge to find a constructive version and to give a constructive proof." Until this is fulfilled, we have to live with subjectively chosen solutions for applications of our work in practice.
It is in this context that we detail, in Chapter 3, the Jessen-Kolmogorov-Bochner-Tulcea theorems on the existence of arbitrary families of random variables on (suitable) spaces. We also include here the basic martingale limit theorems, with applications to U-statistics, likelihood ratios, Markov processes and quasimartingales. Several exercises (about 50) add complements to the theory. These exercises include the concept of sufficiency, a martingale proof of the Radon-Nikodým theorem, aspects of Markov kernels, ergodic-martingale relations and many others. Thus here and throughout the text one finds that the exercises contain a large amount of additional information on the subject of probability. Many of these exercises can be omitted in a first reading, but we strongly urge our readers to at least glance through them all and then return later for a serious study. Here and elsewhere in the book, we follow the lead of Feller's classics.
The classical as well as modern aspects of the so-called analytical theory of probability are the subject of the detailed treatment of Part II. This part consists of the two Chapters 4 and 5, with the latter being the longest in the text. These chapters can be studied with the basic outline of Chapter 1 and just the notion of independence translated to analysis. The main aim of Chapter 4 is to use distribution theory (or image probabilities using random variables) on Euclidean spaces. This fully utilizes the topological structure of their ranges. Thus the basic results are on characteristic functions, including the Lévy-Bochner-Cramér theorems and their multidimensional versions. The chapter concludes with a proof of the equivalence of convergences (pointwise a.e., in probability, and in distribution) for sums of independent random variables. Regarding some characterizations, we particularly recommend Problems 4, 16, 26, and 33 in this chapter.
Chapter 5, the second longest chapter of the text, is the heart of the analytical theory. This chapter contains the customary central limit theory with the Berry-Esseen error estimation. It also contains a substantial introduction to infinite divisibility, including the Lévy-Khintchine representation, stable laws, and the Donsker invariance principle with applications to Kolmogorov-Smirnov type theorems. The basic law of the iterated logarithm, with H. Teicher's (somewhat) simplified proof, is presented. This chapter also contains interesting applications in several exercises. Noteworthy are Bochner's generalization of stable types (without positive definiteness) in Exercises 26-27 and Wendel's "elementary" treatment of Spitzer's identity in Exercise 33. We recommend that these exercises be completed by filling in the details of the proofs outlined there. We have included the m-dependent central limit theorem and an illustration to exemplify the applicability and limitations of the classical invariance principle in statistical theory. Several additional aspects of infinite divisibility and stability are also discussed in the exercises. These problems are recommended for study so that certain interesting ideas arising in applications of the subject can be learned by such an effort. They are also useful for the last part of the book.
The preceding Parts I and II prepare the reader to take a serious look at Part III, which is devoted to the next stage of our subject: what we consider to be very important in modern applications, both new and significant. Chapters 6 and 7 are relatively short, but are concerned with the limit theory of nonindependent random sequences, which demands new techniques. Chapter 6 introduces and uses stopping time techniques. We establish Wald's identities, which play key roles in sequential analysis, and the Doob optional stopping and sampling theorems, which are essential for key developments in martingale theory. Chapter 7 contains central limit theorems for a random number of random variables and the Birkhoff ergodic theorem. The latter shows a natural setting for strict stationarity of families of random variables and sets the stage for the last chapter of the text.
Chapter 8 presents a glimpse of the panorama of stochastic processes with some analysis. It is a significant increase and expansion of the last chapter of the first edition. It can be studied to get a sense of the expanding vistas of the subject, which appear to have great prospects and potential for further research. The following items are considered to exhibit just a few of the many new and deep applications.
The chapter begins with a short existence proof of Brownian motion directly through (random) Fourier series and then establishes the continuity and nondifferentiability of its sample paths and the stationarity of its increments, as well as the iterated logarithm law for it. These ideas lead to a study of (general) additive processes with independent, stable and strictly stationary increments. The Poisson process plays a key role very similar to Brownian motion, and points to a study of random measures with independent values on disjoint sets. We indicate some modern developments following the work of Kahane-Marcus-Pisier, generalizing the classical Paley-Zygmund analysis of random Fourier series. This opens up many possibilities for a study of the sample continuity of the resulting (random) functions as sums, with just 0 < α ≤ 2 moments. These ideas lead to an analysis of strongly stationary classes (properly) contained in strictly stationary families. The case α = 2 is special since Hilbert space geometry is available for it. Thus the (popular) weakly stationary case is considered with its related (but more general) classes of weakly, strictly and strongly harmonizable processes. These are outlined along with their integral representations, giving a picture of the present state of stochastic analysis. Again we include several complements as exercises with hints in the way pioneered by Feller, and strongly recommend to our readers to at least glance through them to have a better view of the possibilities and applications that are opened up here. In this part, therefore, Problems 6 and 7 of Chapter 6, Problems 2, 6, and 10 of Chapter 7, and Problems 8, 12, 15, and 16 of Chapter 8 are interesting, as they reveal the unfolding areas shown by this work.
This book gives our view of how Probability Theory could be presented and studied. It has evolved as a collaboration resulting from decades of research experience and lectures prepared by the first author and the experiences of the second author, who, as a student, studied and learned the subject from the first edition and then subsequently used it as a research reference. His notes and clarifications are implemented in this edition to improve the value of the text. This project has been a satisfying effort resulting in a newer text that is offered to the public.
In the preparation of the present edition we were aided by some colleagues, friends and students. We express our sincere gratitude to Mary Jane Hill for her assistance and diligence with aspects of typesetting and other technical points of the manuscript. Our colleague Michael L. Green offered valuable comments, and Kunthel By, who read drafts of the early chapters with a student's perspective, provided clarifications. We would like to thank our wives, Durgamba Rao and Kelly Swift, for their love, support, and understanding.
We sincerely thank all these people, and hope that the new edition will serve well as a graduate text as well as a reference volume for many aspiring and working mathematical scientists. It is our hope that we have succeeded, at least to some extent, in conveying the beauty and magnificence of probability theory and its manifold applications to our audience.

Riverside, CA                                M.M. Rao
Pomona, CA                                   R.J. Swift
Preface to First Edition

The material in this book is designed for a standard graduate course on probability theory, including some important applications. It was prepared from the sets of lecture notes for a course that I have taught several times over the past 20 years. The present version reflects the reactions of my audiences as well as some of the textbooks that I used. Here I have tried to focus on those aspects of the subject that appeared to me to add interest both pedagogically and methodologically. In this regard, I mention the following features of the book: it emphasizes the special character of the subject and its problems while eliminating the mystery surrounding it as much as possible; it gradually expands the content, thus showing the blossoming of the subject; it indicates the need for abstract theory even in applications and shows the inadequacy of existing results for certain apparently simple real-world problems (see Chapter 6); it attempts to deal with the existence problems for various classes of random families that figure in the main results of the subject; it contains a more complete (and I hope more detailed) treatment of conditional expectations and of conditional probabilities than any existing textbook known to me; it shows a deep internal relation among the Lévy continuity theorem, Bochner's theorem on positive definite functions, and the Kolmogorov-Bochner existence theorem; it makes a somewhat more detailed treatment of the invariance principles and of limit laws for a random number of (ordered) random variables together with applications in both areas; and it provides an unhurried treatment that pays particular attention to motivation at every stage of development.
Since this is a textbook, essentially all proofs are given in complete detail (even at the risk of repetition), and some key results are given multiple proofs when each argument has something to contribute. On the other hand, generalization for its own sake is avoided, and as a rule, abstract Banach-space-valued random variables have not been included (if they had been, the demands on the reader's preparation would have had to be much higher).
Regarding the prerequisites, a knowledge of the Lebesgue integral would be ideal, and at least a concurrent study of real analysis is recommended. The necessary results are reviewed in Chapter 1, and some results that are generally not covered in such a course, but are essential for our work, are given with proofs. In the rest of the book, the treatment is detailed and complete, in accordance with the basic purpose of the text. Thus it can be used for self-study by mature scientists having no prior knowledge of probability.
The main part of the book consists of Chapters 2-5. Even though I regard the order presented here to be the most natural, one can start, after a review of the relevant part of Chapter 1, with Chapter 2, 3 or 4, and with a little discussion of independence, Chapter 5 can be studied. The last four chapters concern applications and problems arising from the preceding work and partly generalizing it. The material there indicates some of the many directions along which the theory is progressing.
There are several exercises at the end of each chapter. Some of these are routine, but others demand more serious effort. For many of the latter type, hints are provided, and there are a few that complement the text (e.g., Spitzer's identity and aspects of stability in Chapter 5); for them, essentially complete details are given. I present some of these not only as good illustrations but also for reference purposes.
I have included in the list of references only those books and articles that influenced my treatment; other works can be obtained from these sources. Detailed credits and priorities of discovery have not been scrupulously assigned, although historical accounts are given in the interest of motivation.
For cross-referencing purposes, all the items in the book are serially numbered. Thus 3.4.9 is the ninth item of Section 4 of Chapter 3. In a given section (chapter) the corresponding section (and chapter) number is omitted.
The material presented here is based on the subject as I learned it from Professor M. D. Donsker's beautiful lectures many years ago. I feel it is appropriate here to express my gratitude to him for that opportunity. This book has benefited from my experience with generations of participants in my classes and has been read by Derek K. Chang from a student's point of view; his questions have resolved several ambiguities in the text. The manuscript was prepared with partial support from an Office of Naval Research contract and a University of California, Riverside, research grant. The difficult task of converting my handwritten copy into the finished typed product was ably done by Joyce Kepler, Joanne McIntosh, and Anna McDermott, with the care and interest of Florence Kelly. Both D. Chang and J. Sroka have aided me in proofreading and preparation of the Index. To all these people and organizations I wish to express my appreciation for this help and support.

(M.M. Rao)
List of Symbols

a.a.              almost all
a.e.              almost everywhere
ch.f.(s)          characteristic function(s)
d.f.(s)           distribution function(s)
iff               if and only if
i.i.d.            independent identically distributed
r.v.(s)           random variable(s)
m.g.f.            moment generating function
A Δ B             symmetric difference of A and B
∅                 empty set
(a, b)            open interval
(Ω, Σ, P)         a probability space
P(f^{-1}(A))      = P[f ∈ A] (= (P ∘ f^{-1})(A))
χ_A               indicator of A
∧                 minimum symbol
∨                 maximum symbol
R                 reals
C                 complex numbers
N                 natural numbers (= positive integers)
σ(X_1, ..., X_n)  sigma algebra generated by the r.v.s X_i, i = 1, ..., n
                  variance of the r.v. X
                  correlation of X and Y
                  the set of scalar r.v.s on (Ω, Σ, P)
                  the set of pth power integrable r.v.s on (Ω, Σ, P)
                  the Lebesgue space of equivalence classes of r.v.s from ℒ^p
                  usually a partition of a set
                  the Lebesgue space on R with Lebesgue measure
ν ≪ μ             ν is absolutely continuous relative to μ (measures)
ν ⊥ μ             ν is singular relative to μ
||X||_p           = [E(|X|^p)]^{1/p} = [∫_Ω |X|^p dP]^{1/p} = p-norm of X
                  integral part of the real number x ≥ 0
                  topological equivalence
a_n ~ b_n         means a_n/b_n → 1 as n → ∞
∂A                boundary of the set A
                  distinguished logarithm of φ
sgn               signum function
f_1 * f_2         convolution of f_1 and f_2 in L^1(R)
                  the kth binomial coefficient
Part I Foundations

The mathematical basis of probability, namely real analysis, is sketched with the essential details of key results, including the fundamental law of probability and a characterization of uniform integrability, in Chapter 1; these are used frequently throughout the book. Most of the important results on independence, the laws of large numbers, and the convergence of series, as well as some key applications to random walks and queueing, are treated in Chapter 2, which also contains some important complements as problems. Then a quite detailed treatment of conditional probabilities, with applications to Markovian families, martingales, and the Kolmogorov-Bochner-Tulcea existence theorems on processes, is included in Chapter 3. Important additional results are also given in a long problems section. The basic foundations of modern probability are detailed in this part.
Chapter 1

Background Material and Preliminaries

In this chapter, after briefly discussing the beginnings of probability theory, we shall review some standard background material. Basic concepts are introduced and immediate consequences are noted. Then the fundamental law of probability and some of its implications are recorded.

1.1 What Is Probability?


Before considering what probability is or what it does, a brief historical discussion of it will be illuminating. In a general sense, one can think of a probability as a long-term average, or (in a combinatorial sense) as the proportion of the number of favorable outcomes to the number of possible and equally likely ones (all being finite in number in a real world). If the last condition is not valid, one may give certain weights to outcomes based on one's beliefs about the situation. Other concepts can be similarly formulated. Such ideas are still seriously discussed in different schools of thought on probability.
Basically, the concept originates from the recognition of the uncertainty of the outcome of an action or experiment; the assignment of a numerical value arises in determining the degree of uncertainty. The need for measuring this degree has been recognized for a very long time. In the Indian Jaina philosophy the uncertainty was explicitly stated as early as the fifth century B.C., and it was classified into seven categories under the name syadvada system. Applications of this idea also seem to have been prevalent. There are references in medieval Hindu texts to the practice of giving alms to religious mendicants without ascertaining whether they were deserving or not. It was noted on observation that "only ten out of a hundred were undeserving," so the public (or the donors) were advised to continue the practice. This is a clear forerunner of what is now known as the frequency interpretation of probability.
References related to gambling may be found throughout recorded history. The great Indian epic, the Mahabharata, deals importantly with gambling. Explicit numerical assignment, as in the previous example, was not always recorded, but its implicit recognition is discernible in the story. The Jaina case was discussed with source material by Mahalanobis (1954), and an interesting application of the syadvada system was illustrated by Haldane (1957).
On the other hand, it has become customary among a section of historians of this subject to regard probability as having its roots in calculations based on the assumption of equal likelihood of the outcomes of throws of dice. This is usually believed to start with the correspondence of Fermat and Pascal in the 1650s or (occasionally) with Cardano in about 1550 and Galileo a little later. The Fermat-Pascal correspondence has been nicely dramatized by Rényi [see his book (1970) for references] to make it more appealing and to give the impression of a true beginning of probabilistic ideas.
Various reasons have been advanced as to why the concept of probability could not have started before. Apparently an unwritten edict for this is that the origins of the subject should be coupled approximately with the Industrial Revolution in Europe. Note also that the calculations made in this period with regard to probability assume equal likelihood. However, all outcomes are not always equally likely. Thus the true starting point must come much later, perhaps with E. Borel, A. Liapounov, and others at the end of the nineteenth century, or even only with Kolmogorov's work of 1933, since the presently accepted broad-based theory started only then! As late as the early 1920s, R. von Mises summed up the situation, no doubt in despair, by saying, "Today, probability theory is not a mathematical science." Another brief personal viewpoint is expressed in the elementary text by Neuts (1973). We cannot go into the merits of all these historical formulations of the subject here. A good scholarly discussion of such a (historical) basis has been given in Maistrov's book (1974). One has to keep in mind that a considerable amount of subjectivity appears in all these treatments (which may be inevitable).
Thus the preceding sketch leads us to conclude that the concepts of uncertainty and prediction, and hence probabilistic ideas, started a long time ago. Perhaps they can be placed 2500 years ago or more. They may have originated at several places in the world. The methods of the subject have naturally been refined as time went on. Whether there has been cross-fertilization of ideas due to trade and commerce among various parts of the world in the early development is not clear, although it cannot be ruled out. But the sixteenth-seventeenth century "beginning" based on gambling and problems of dice cannot be taken as the sole definitive starting point of probability. With these generalities, let us turn to the present-day concept of probability that is the foundation for our treatment of the subject.

As is clear from the preceding discourse, probability is a numerical measure of the uncertainty of outcomes of an action or experiment. The actual assignment of these values must be based on experience and should generally be verifiable when the experiment is (if possible) repeated under essentially the same conditions. From the modern point of view, therefore, we consider all possible outcomes of an experiment and represent them by (distinct) points of a nonempty set. Since the collection of all such possibilities can be infinitely large, various interesting combinations of them, useful to the experiments, have to be considered. It is here that the modern viewpoint distinguishes itself, by introducing an algebraic structure into the combinations of outcomes, which are called events. Thus one considers an algebra of events as the primary datum. This is evidently a computational convenience, though a decisive one, and it must and does include everything of conceivable use for an experiment. Then each event is assigned a numerical measure corresponding to the "amount" of uncertainty, in such a way that this assignment has natural additivity and consistency properties. Once this setup is accepted, an axiomatic formulation in the style of twentieth-century mathematics in general becomes desirable as well as inevitable. This may also be regarded as building a mathematical model to describe the experiment at hand. A precise and satisfactory formulation of the latter has been given by Kolmogorov (1933), and the resulting analytical structure is almost universally accepted. In its manifold applications, some alterations have been proposed by de Finetti, Rényi, Savage, and others. However, as shown by the first author (Rao 1981) in a monograph on the modern foundations of the subject, the analytical structure of Kolmogorov actually takes care of these alterations when his work is interpreted from an abstract point of view. This is especially relevant in the case of conditional probabilities, which we discuss in detail in Chapter 3. Thus we take the Kolmogorov setup as the basis of this book and develop the theory while keeping in contact with the phenomenological origins of the subject as much as possible. Also, we illustrate each concept as well as the general theory with concrete (but not necessarily numerical) examples. This should show the importance and definite utility of our subject.
The preceding account implies that the methods of real analysis play a key role in this treatment. Indeed they do, and the reader should ideally be already familiar with them, although concurrent study in real analysis should suffice. Dealing with special cases that are immediately applicable to probability is not necessary. In fact, experience indicates that it can distort the general comprehension of both subjects. To avoid misunderstanding, the key results are recalled below for reference, mostly without proofs.
With this preamble, let us start with the axiomatic formulation of Kolmogorov. Let Ω be a nonempty point set representing all possible outcomes of an experiment, and let Σ be an algebra of subsets of Ω. The members of Σ, called events, are the collections of outcomes that are of interest to the experimenter. Thus Σ is nonempty and is closed under finite unions and complements, hence also under differences. Let P : Σ → R^+ be a mapping,
called a probability, defined for all elements of Σ so that the following rules are satisfied.

(1) For each A ∈ Σ, 0 ≤ P(A), and P(Ω) = 1.
(2) A, B ∈ Σ, A ∩ B = ∅, implies P(A ∪ B) = P(A) + P(B).

From these two rules, we deduce immediately that (i) (taking B = ∅) P(∅) = 0, and (ii) A ⊃ B, A, B ∈ Σ, implies P(A − B) = P(A) − P(B). In particular, P(A^c) = 1 − P(A) for any A ∈ Σ, where A^c = Ω − A.

Such a P is called a "finitely additive probability." At this stage, one strengthens (2) by introducing a continuity condition, namely, countable additivity, as follows:

(2') If A_1, A_2, ... are disjoint events of Ω such that A = ∪_{k=1}^∞ A_k is also an event of Ω, then P(A) = ∑_{k=1}^∞ P(A_k).

Clearly (2') implies (2), but trivial examples show that (2) is strictly weaker than (2'). The justification for (2') is primarily operational, in that a very satisfactory theory emerges that has ties at the deepest levels to many branches of mathematics. There are other cogent reasons too. For instance, a good knowledge of the theory with this "countably additive probability" enables one to develop a finitely additive theory. Indeed, every finitely additive probability function can be made to correspond uniquely to a countably additive one on a "nice" space, according to an isomorphism theorem that depends on the Stone space representation of Boolean algebras. For this and other reasons, we are primarily concerned with the countably additive case, and so henceforth a probability function always stands for one that satisfies rules or axioms (1) and (2'). The other concept will be qualified as "finitely additive," if it is used at all.
If P : Σ → R^+ is a probability in the above sense and Σ is an algebra, it is a familiar result from real analysis that P can be uniquely extended to the σ-algebra (i.e., an algebra closed under countable unions) generated by Σ (i.e., the smallest σ-algebra containing Σ). Hence we may and do assume for convenience that Σ is a σ-algebra, and the triple (Ω, Σ, P) is then called a probability space. Thus a probability space, in Kolmogorov's model, is a finite measure space whose measure function is normalized so that the whole space has measure one. Consequently several results from real analysis can be employed profitably in our study. However, this does not imply that probability theory is just a special case of standard measure theory, since, as we shall see, it has its own special features that are absent in the general theory. Foremost of these is the concept of probabilistic (or statistical) independence. With this firmly in hand, several modifications of the concept have evolved, so that the theory has been enriched and has branched out in various directions. These developments, some of which are considered in Chapter 3, attest to the individuality and vitality of probability theory.
A concrete example illustrating the above discussion is the following:

Example 1 Let Ω_i = {0, 1} be a two-point space for each i = 1, 2, .... This space corresponds to the ith toss of a coin, where 0 represents its tail and 1 its head, and is known as a Bernoulli trial. Let Σ_i = {∅, {0}, {1}, Ω_i} and P_i({0}) = q, P_i({1}) = p, 0 < p = 1 − q < 1. Then (Ω_i, Σ_i, P_i), i = 1, 2, ..., are identical copies of the same probability space. If (Ω, Σ, P) [= ⊗_{i≥1}(Ω_i, Σ_i, P_i)] is the product measure space, then Ω = {x : x = (x_1, x_2, ...), x_i = 0, 1 for all i}, and Σ is the σ-algebra generated by the semiring C = {I_n ⊂ Ω : I_n consists of those x ∈ Ω whose first n components have a prescribed pattern}. For instance, I_2 can be the set of all x in Ω whose first two components are 1. If I_n (∈ C) has its first n components consisting of k 1's and n − k 0's, then P(I_n) = p^k q^{n−k}, and P(Ω) = 1. [Recall that a semiring is a nonempty class C which is closed under intersections and such that if A, B ∈ C, A ⊂ B, then there are sets A_i ∈ C with A = A_1 ⊂ ⋯ ⊂ A_n = B and A_{i+1} − A_i ∈ C.]

The reader should verify that C is a semiring and that P satisfies conditions (1) and (2'), so that it is a probability on Σ with the above-stated properties. We use this example for some other illustrations.
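To make the cylinder-set computation concrete, the following minimal Python sketch (our illustration, not part of the text) assigns probabilities to the cylinder sets I_n by the product rule P(I_n) = p^k q^{n−k} and checks finite additivity over all patterns of a fixed length; the function names are our own.

    from itertools import product

    def cylinder_prob(pattern, p):
        """P(I_n) for the cylinder set of all sequences whose first n
        components equal `pattern`; each 1 carries probability p, each 0
        carries q = 1 - p, by the product-measure construction."""
        q = 1.0 - p
        k = sum(pattern)                      # number of 1's in the pattern
        return p**k * q**(len(pattern) - k)   # P(I_n) = p^k q^(n-k)

    p, n = 0.3, 4
    # The 2^n cylinder sets of length n partition Omega, so by (2')
    # their probabilities must sum to P(Omega) = 1.
    total = sum(cylinder_prob(pat, p) for pat in product((0, 1), repeat=n))
    print(abs(total - 1.0) < 1e-12)  # True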

1.2 Random Variables and Measurability Results


As the definition implies, a probability space is generally based on an abstract point set Ω without any algebraic or topological properties. It is therefore useful to consider various mappings of Ω into topological spaces with finer structure, in order to make available several mathematical results for such spaces. We thus consider the simplest and most familiar space, the real line R. To reflect the structure of Σ, we start with the σ-algebra B of R generated by all open intervals. It is the Borel σ-algebra. Let us now introduce a fundamental concept:

Definition 1 A random variable f on Ω is a finite real-valued measurable function. Thus f : Ω → R is a random variable if f^{-1}(B) ⊂ Σ, where B is the Borel σ-algebra of R; equivalently, f^{-1}(A) = {ω : f(ω) ∈ A} ∈ Σ for each A = (−∞, x), x ∈ R. (Also written f^{-1}(−∞, x), or [f < x], for f^{-1}(A).)

Thus a random variable is a function, and each outcome ω ∈ Ω is assigned a real number f(ω) ∈ R. This expresses the heuristic notion of "randomness" as a mathematical concept. A fundamental nature of this formulation will be seen later (cf. Problem 5(c) of Chapter 2). The point of this concept is that it is of real interest when related to a probability function P. Its relation is obtained in terms of image probabilities, also called distribution functions in our case. The latter concept is given in the following:

Definition 2 If f : Ω → R is a random variable, then its distribution function is the mapping F_f : R → R^+ given by

F_f(x) = P([f < x]) = (P ∘ f^{-1})((−∞, x)),  x ∈ R.

Evidently P and f uniquely determine F_f. The converse implication is slightly involved. It follows from the definitions that F_f is a nonnegative, nondecreasing, left continuous [i.e., F_f(x − 0) = F_f(x)] bounded mapping of R into [0, 1] such that lim_{x→−∞} F_f(x) = F_f(−∞) = 0 and F_f(+∞) = lim_{x→+∞} F_f(x) = 1. Now any function F with these properties arises from some probability space; let Ω = R, Σ = B, f = identity, and P(A) = ∫_A dF, A ∈ B. The general case of several variables is considered later. First let us present some elementary properties of random variables.
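For instance (a quick illustration of ours), if f is a Bernoulli random variable with P[f = 1] = p = 1 − q, the left-continuity convention F_f(x) = P[f < x] gives

    F_f(x) = 0 for x ≤ 0,   F_f(x) = q for 0 < x ≤ 1,   F_f(x) = 1 for x > 1,

so F_f(1) = P[f < 1] = q while F_f(1 + 0) = 1: F_f jumps exactly at the atoms of the image measure P ∘ f^{-1} and is continuous from the left at each of them, as the stated convention requires.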
In the definition of a random variable, the probability measure played no part. Using the measure function, we can make the structure of the class of all random variables richer than without it. Recall that (Ω, Σ, P) is complete if for any null set A ∈ Σ [i.e., P(A) = 0] every subset B of A is also in Σ, so that P(B) is defined and is zero. It is known, and easy to see, that every probability space (indeed every measure space) can always be completed if it is not already complete. The need for completion arises from simple examples. In fact, let f_1, f_2, ... be a sequence of random variables that forms a Cauchy sequence in measure, so that for ε > 0 we have lim_{m,n→∞} P[|f_n − f_m| > ε] = 0. Then there may not be a unique random variable f such that

lim_{n→∞} P[|f_n − f| > ε] = 0.

However, if (Ω, Σ, P) is complete, then there always exists such an f, and if f′ is another limit function, then P{ω : f(ω) ≠ f′(ω)} = 0; i.e., the limit is unique outside a set of zero probability. Thus if L^0 is the class of random variables on (Ω, Σ, P), a complete probability space, then L^0 is an algebra and contains the limits of sequences of random variables that are Cauchy in measure. (See Problem 3 on the structure of L^0.) The following measurability result on functions of random variables is useful in this study. It is due to Doob and, in the form we state it, to Dynkin. As usual, B is the Borel σ-algebra of R.

Proposition 3 Let (Ω, Σ) and (S, A) be measurable spaces and f : Ω → S be measurable, i.e., f^{-1}(A) ⊂ Σ. Then a function g : Ω → R is measurable relative to the σ-algebra f^{-1}(A) [i.e., g^{-1}(B) ⊂ f^{-1}(A)] iff (= if and only if) there is a measurable function h : S → R such that g = h ∘ f. (This result is sometimes referred to, for convenience, as the "Doob-Dynkin lemma.")

Proof One direction is immediate. For g = h ∘ f : Ω → R is measurable implies g^{-1}(B) = (h ∘ f)^{-1}(B) = f^{-1}(h^{-1}(B)) ⊂ f^{-1}(A), since h^{-1}(B) ⊂ A.
For the converse, let g be f^{-1}(A)-measurable. Clearly f^{-1}(A) is a σ-algebra contained in Σ. It suffices to prove the result for g simple, i.e., g = ∑_{i=1}^n a_i χ_{A_i}, A_i ∈ f^{-1}(A). Indeed, if this is proved, then the general case is
obtained as follows. Since g is measurable for the σ-algebra f^{-1}(A), by the structure theorem of measurable functions there exist simple functions g_n, measurable for f^{-1}(A), such that g_n(ω) → g(ω) as n → ∞ for each ω ∈ Ω. Using the special case, there is an A-measurable h_n : S → R with g_n = h_n ∘ f, for each n ≥ 1. Let S_0 = {s ∈ S : lim_n h_n(s) exists}. Then S_0 ∈ A, and f(Ω) ⊂ S_0. Define h(s) = lim_n h_n(s) if s ∈ S_0, and h(s) = 0 if s ∈ S − S_0. Then h is A-measurable and g(ω) = h(f(ω)), ω ∈ Ω. Consequently, we need to prove the special case.
Thus let g be simple: g = ∑_{i=1}^n a_i χ_{A_i}, and A_i = f^{-1}(B_i) ∈ f^{-1}(A), for a B_i ∈ A. Define h = ∑_{i=1}^n a_i χ_{B_i}. Then h : S → R is A-measurable and simple. [Here the B_i need not be disjoint even if the A_i are. To have symmetry in the definitions, we may replace B_i by C_i, where C_1 = B_1 and C_i = B_i − ∪_{j=1}^{i−1} B_j for i > 1. So the C_i ∈ A are disjoint, f^{-1}(C_i) = A_i, and h = ∑_{i=1}^n a_i χ_{C_i} is the same function.] Thus

(h ∘ f)(ω) = ∑_{i=1}^n a_i χ_{C_i}(f(ω)) = ∑_{i=1}^n a_i χ_{A_i}(ω) = g(ω),  ω ∈ Ω,

and h ∘ f = g. This completes the proof.

A number of specializations are possible from the above result. If S = R^n and A is the Borel σ-algebra of R^n, then by this result there is an h : R^n → R, (Borel) measurable, which satisfies the requirements. This yields the following:

Corollary 4 Let (Ω, Σ) and (R^n, A) be measurable spaces, and f : Ω → R^n be measurable. Then g : Ω → R is f^{-1}(A)-measurable iff there is a Borel measurable function h : R^n → R such that g = h(f_1, f_2, ..., f_n) = h ∘ f, where f = (f_1, ..., f_n).

If A is replaced by the larger σ-algebra of all Lebesgue measurable subsets of R^n (the completion of A), then h will be a Lebesgue measurable function. The above result will be of special interest in studying, among other things, the structure of conditional probabilities. Some of these questions will be considered in Chapter 3. The mapping f in the above corollary is also called a multidimensional random variable, and the f of the theorem an abstract random variable. We state this concept for reference.

Definition 5 Let (Ω, Σ) be a measurable space and S be a separable metric space with its Borel σ-algebra. (E.g., S = R^n or C^n or R^∞.) Then a mapping f : Ω → S is called a generalized (or abstract) random variable (and a random vector if S = R^n or C^n) whenever f^{-1}(B) ∈ Σ for each open (or closed) set B ⊂ S, and it is a random variable if S = R. [See Problem 2b for an alternative definition if S = R^n.]

As a special case, we get f : Ω → C, where f = f_1 + if_2, f_j : Ω → R, j = 1, 2; f is a complex random variable if its real and imaginary parts f_1, f_2 are (real) random variables. To illustrate the above ideas, consider the following:

Example 6 Let (Ω, Σ, P) be the space as in the last example, and f_n : Ω → R be given by f_n(ω) = n if the first 1 appears in the nth component (the preceding ones are zeroes), and = 0 otherwise. Since Σ = 2^Ω = P(Ω), it is clear that f_n is a random variable, and in fact each function on Ω is measurable for Σ. This example will be further discussed in illustrating other concepts.
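For intuition (our addition, not the book's), a small Python sketch can estimate P[f_n = n], which by the independence of the coordinates equals q^{n−1} p, via simulation:

    import random

    def f_n(x, n):
        """f_n(omega) = n if the first 1 of omega occurs at component n,
        and 0 otherwise, as in Example 6."""
        return n if x[n - 1] == 1 and all(c == 0 for c in x[:n - 1]) else 0

    p, n, trials = 0.3, 3, 200_000
    random.seed(1)
    # Only the first n coordinates of omega matter for the event [f_n = n].
    hits = sum(
        f_n([1 if random.random() < p else 0 for _ in range(n)], n) == n
        for _ in range(trials)
    )
    print(hits / trials, (1 - p) ** (n - 1) * p)  # estimate vs. q^(n-1) p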

Resuming the theme, it is necessary to discuss the validity of the results on σ-algebras generated by certain simple classes of sets and functions. In this connection the monotone class theorem and its substitute introduced by E. B. Dynkin, called the (π, λ)-classes, will be of some interest. Let us state the concept and the result precisely.

Definition 7 A nonempty collection C of subsets of a nonempty set Ω is called (i) a monotone class if {A_n, n ≥ 1} ⊂ C, A_n monotone ⇒ lim_n A_n ∈ C; (ii) a π- (or product) class if A, B ∈ C ⇒ A ∩ B ∈ C; (iii) a λ- (or latticial) class if (a) A, B ∈ C, A ∩ B = ∅ ⇒ A ∪ B ∈ C, (b) A, B ∈ C, A ⊃ B ⇒ A − B ∈ C, Ω ∈ C, and (c) A_n ∈ C, A_n ⊂ A_{n+1} ⇒ ∪_n A_n ∈ C; (iv) the smallest class of sets containing a given collection A and having a certain property (e.g., being a monotone class, or a σ-algebra) is said to be generated by A.

The following two results relate a given collection and its desirable generated class. They will be needed later on. Note that a λ-class which is also a π-class is a σ-algebra. We detail some nonobvious (mathematical) facts.

Proposition 8 (a) If A is an algebra, then the monotone class generated by A is the same as the σ-algebra generated by A.
(b) If A is a λ-class and B is a π-class with A ⊃ B, then A also contains the σ-algebra generated by B.

Proof The argument is similar for both parts. Since the proof of (a) is in most textbooks, here we prove (b).
The proof of (b) is not straightforward, but is based on the following idea. Consider the collection A_1 = {A ⊂ Ω : A ∩ B ∈ A_0 for all B ∈ B}. Here we take A_0 ⊃ B, where A_0 is the smallest λ-class containing B, namely the intersection of all such collections containing B. The class A_1 is not empty; in fact B ⊂ A_1. We observe that A_1 is a λ-class. Clearly Ω ∈ A_1. If A_1, A_2 ∈ A_1 with A_1 ∩ A_2 = ∅, then A_i ∩ B, i = 1, 2, are disjoint for all B ∈ B, and A_i ∩ B ∈ A_0. Since A_0 is a λ-class, (A_1 ∪ A_2) ∩ B = (A_1 ∩ B) ∪ (A_2 ∩ B) ∈ A_0, so that A_1 ∪ A_2 ∈ A_1. Similarly, A_1 ⊃ A_2 ⇒ A_1 ∩ B − A_2 ∩ B = (A_1 − A_2) ∩ B ∈ A_0, and A_1 − A_2 ∈ A_1.
The monotonicity is similarly verified. Thus A_1 is a λ-class. Since A_0 is the smallest λ-class, A_1 ⊃ A_0 ⊃ B. Hence A ∈ A_0 ⊂ A_1, B ∈ B ⇒ A ∩ B ∈ A_0.
Next consider A_2 = {A ⊂ Ω : A ∩ B ∈ A_0 for all B ∈ A_0}. By the preceding work, A_2 ⊃ B and, by an entirely similar argument, we can conclude that A_2 is also a λ-class. Hence A_2 ⊃ A_0 ⊃ B. This means that with A, B ∈ A_0, A ∩ B ∈ A_0 ⊂ A_2, and hence A_0 is a π-class. But by Definition 7, a collection which is both a π- and a λ-class is a σ-algebra. Thus A_0 is a σ-algebra containing B. Then σ(B) ⊂ A_0, where σ(B) is the σ-algebra generated by B. Since A_0 ⊂ A, the proposition is proved.
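A typical application of part (b), added here as a brief illustration of how the result is used (the example is ours, not the text's), is the uniqueness assertion for probability measures: if P_1 and P_2 are probabilities on σ(B) agreeing on a π-class B with Ω ∈ B, put

    A = {A ∈ σ(B) : P_1(A) = P_2(A)}.

Finite additivity makes A closed under disjoint unions and proper differences, and countable additivity makes it closed under increasing limits, so A is a λ-class containing B. By Proposition 8(b), A ⊃ σ(B), i.e., P_1 = P_2 on σ(B).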

The next result, containing two assertions, is of interest in theoretical applications.

Proposition 9 Let B(Ω) be the space of real bounded functions on Ω and H ⊂ B(Ω) be a linear set containing constants and satisfying (i) f_n ∈ H, f_n → f uniformly ⇒ f ∈ H, or (i') f ∈ H ⇒ f^± ∈ H, where f^+ = max(f, 0) and f^− = f^+ − f, and (ii) 0 ≤ f_n ∈ H, f_n ↑ f, f ∈ B(Ω) ⇒ f ∈ H. If C ⊂ H is any set which is closed under multiplication and Σ = σ(C) is the smallest σ-algebra relative to which every element of C is measurable, then every f (∈ B(Ω)) which is Σ-measurable belongs to H. The same conclusion holds if C ⊂ H is not necessarily closed under multiplication, but H satisfies (i') [instead of (i)], C is a linear set closed under infima, and f ∈ C ⇒ f ∧ 1 ∈ C.

Proof The basic idea is similar to that of the above result. Let A_0 be an algebra, generated by C and 1, which is closed under uniform convergence and is contained in H. Clearly A_0 exists. Let A_1 be the largest such algebra. The existence of A_1 is a consequence of the fact that the class of all such A_0 is closed under unions and hence is partially ordered by inclusion; the existence of the desired class A_1 then follows from the maximal principle of Hausdorff.
If f ∈ A_1, then there is a k > 0 such that |f| ≤ k, and if p(·) is any polynomial on [−k, k], then p(f) ∈ A_1. Also, by the classical Weierstrass approximation theorem, the function h : [−k, k] → R, h(x) = |x|, is the uniform limit of polynomials p_n on [−k, k]. Hence p_n(f) → |f| uniformly, so that (by the uniform closure of A_1) |f| ∈ A_1 and A_1 is a vector lattice.
Observe that A_1 automatically satisfies (ii), since if 0 ≤ g_n ∈ A_1, g_n ↑ g ∈ B(Ω), then g ∈ H, and if A_2 is generated by A_1 and g (as A_0 was), then by the maximality of A_1, A_2 = A_1. Thus A_1 satisfies (i) and (ii) and is a vector lattice. The second part essentially has this conclusion as its hypothesis. Let us verify this. By (i'), if f ∈ H, then f^± ∈ H, so that f^+ + f^− = |f| ∈ H. Hence if f, g ∈ H, then f ∨ g = ½(|f − g| + f + g) ∈ H, since f − g ∈ H (because H is a vector space). Thus H is a vector lattice. Consequently we consider vector lattices containing C and 1 which are subsets of H. Next one chooses a maximal lattice (as above). If this is A_2', then it has the same properties as A_2. Thus it suffices to consider A_2 and prove that each f in B(Ω) which is Σ-measurable is in A_2 (⊂ H).

Let S = {A ⊂ Ω : χ_A ∈ A_2}. Since A_2 is an algebra, S is a π-class. Also S is closed under disjoint unions and monotone limits. Thus it is a λ-class as well, and by the preceding proposition it is a σ-algebra. If 0 ≤ g [∈ B(Ω)] is S-measurable, then there exist 0 ≤ g_n ↑ g, where each g_n is an S-measurable simple function. But then g_n ∈ A_2 and so g ∈ A_2 also. Since A_2 is a lattice, this result extends to all g ∈ B(Ω) which are S-measurable. To complete the proof, it is only necessary to verify Σ = σ(C) ⊂ S. Let 0 ≤ f ∈ C and B = [f ≥ 1] ∈ Σ. We claim that B ∈ S. In fact, let g = f ∧ 1. Then g ∈ A_2 and 0 ≤ g ≤ 1. Now [g = 1] = B and [g < 1] = B^c. Thus g^n ∈ A_2 and g^n ↓ 0 on B^c, or 1 − g^n ↑ 1 on B^c. Since 1 − g^n ∈ A_2, and A_2 is closed under bounded monotone limits, we have 1 − g^n ↑ χ_{B^c} ∈ A_2 ⇒ B^c ∈ S, so that B ∈ S. If 0 ≤ f ∈ C and B_a = [f ≥ a] = [f/a ≥ 1] for a > 0, then f/a ∈ A_2, and by the above proof B_a ∈ S for each a. But such sets as B_a clearly generate Σ, so that Σ ⊂ S. This completes the result in the algebra case.
In the lattice case, A, B ∈ S ⇒ χ_A χ_B = min(χ_A, χ_B) ∈ A_2', so that A ∩ B ∈ S. Thus S is a π-class again. That it is a λ-class is proved as before, so that S is a σ-algebra. The rest of the argument holds verbatim. Since with each f ∈ C one has f ∧ 1 ∈ C, we do not need to go to A_2', and the proof is simplified. This establishes the result in both cases.

1.3 Expectations and the Lebesgue Theory


If X : Ω → R is a random variable (r.v.) on (Ω, Σ, P), then X is said to have an expected value iff it is integrable in Lebesgue's sense relative to P. This means |X| is also integrable. It is suggestively denoted

E(X) = E_P(X) = ∫_Ω X dP,

the integral on the right being the (absolute) Lebesgue integral. Thus E(X) exists, by definition, iff E(|X|) exists. Let ℒ^1 be the class of all Lebesgue integrable functions on (Ω, Σ, P). Then E : ℒ^1 → R is a positive linear mapping, since the integral has that property. Thus for X, Y ∈ ℒ^1 we have

E(aX + bY) = aE(X) + bE(Y),  a, b ∈ R,

and E(1) = 1 since P(Ω) = 1, and E(X) ≥ 0 if X ≥ 0 a.e. The operator E is also called the (mathematical) expectation on ℒ^1. It is clear that the standard results of Lebesgue integration are thus basic for the following work. In the next section we relate this theory to the distribution function of X.
To fix the notation and terminology, let us recall the key theorems of Lebesgue's theory, the details of which the reader can find in any standard text on real analysis [see, e.g., Royden (1968, 1988), Sion (1968), or Rao (1987, 2004)].
The basic Lebesgue theorems that are often used in the sequel are the following:

Theorem 1 (Monotone Convergence) Let 0 ≤ X_1 ≤ X_2 ≤ ⋯ be a sequence of random variables on (Ω, Σ, P). Then X = lim_n X_n is a measurable (extended) real-valued function (or a "defective" random variable) and

lim_{n→∞} E(X_n) = E(X)

holds, where the right side can be infinite.
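For example (an illustration of ours), if X ≥ 0 is any random variable and X_n = min(X, n), then 0 ≤ X_1 ≤ X_2 ≤ ⋯ and X_n ↑ X pointwise, so Theorem 1 gives E(X) = lim_n E(min(X, n)), whether or not the right side is finite.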

A result of equal importance is the following:

Theorem 2 (Dominated Convergence) Let {X_n, n ≥ 1} be a sequence of random variables on (Ω, Σ, P) such that (i) lim_{n→∞} X_n = X exists at all points of Ω except for a set N ⊂ Ω, P(N) = 0 (written X_n → X a.e.), and (ii) |X_n| ≤ Y, an r.v., with E(Y) < ∞. Then X is an r.v. and lim_n E(X_n) = E(X) holds, all quantities being finite.

The next statement is a consequence of Theorem 1.

Theorem 3 (Fatou's Lemma) Let {X_n, n ≥ 1} be any sequence of nonnegative random variables on (Ω, Σ, P). Then we have E(lim inf_n X_n) ≤ lim inf_n E(X_n).

In fact, if Y_k = inf{X_n, n ≥ k}, then Theorem 1 applies to {Y_k, k ≥ 1}.
Note that these theorems are valid if P is replaced by a nonfinite measure.
Many of the deeper results in analysis are usually based on inequalities.
We present here some of the classical inequalities that occur frequently in our
subject. First recall that a mapping q5 : R + R is called convex if for any
a , p > O , a + P = l , o n e has

From this definition, it follows that if {&, n > 1) is a sequence of convex


functions a, E R', then CT=la,& is also convex on R, and if $, + $, then
$ is convex. Further, from elementary calculus, we know that each twice-
differentiable function q5 is convex iff its second derivative $/' is nonnegative.
It can be shown that a measurable convex function on an open interval is nec-
essarily coiitiiiuous there. These facts will be used without comment. Hereafter
"convex function" always stands for a measurable convex function on R.
Let $(x) = - logx, for x > 0. Then $'/(x) > 0, so that it is convex. Hence
(3) becomes
14 1 Background Material and Preliminaries

Since log is an increasing function, this yields for a > 0, ,C? > 0, z > 0, y > O

For any pair of random variables X, Y on (Ω, Σ, P), and p ≥ 1, q = p/(p − 1),
we define ‖X‖_p = [E(|X|^p)]^{1/p}, 1 ≤ p < ∞, and ‖X‖_∞ = (essential supre-
mum of |X|) = inf{k > 0 : P[|X| > k] = 0}. Then ‖·‖_p, 1 ≤ p ≤ ∞,
is a positively homogeneous invariant metric, called the p-norm; i.e., if
d(X, Y) = ‖X − Y‖_p, then d(·,·) is a metric, d(X + Z, Y + Z) = d(X, Y)
and d(aX, 0) = |a| d(X, 0), a ∈ ℝ. We have

Theorem 4 Let X, Y be random variables on (Ω, Σ, P). Then

(i) Hölder's Inequality:

    E(|XY|) ≤ ‖X‖_p ‖Y‖_q,  1 ≤ p ≤ ∞.                        (5)

(ii) Minkowski's Inequality:

    ‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p,  1 ≤ p ≤ ∞.                    (6)

Proof (i) If ‖X‖_p = 0, or ‖Y‖_q = 0, then X = 0 a.e., or Y = 0 a.e., so
that (5) is true and trivial. Now suppose ‖X‖_p > 0 and ‖Y‖_q > 0. If p = 1,
then q = ∞, and we have ‖Y‖_∞ = ess sup |Y|, by definition (= k, say), so

    E(|XY|) ≤ k E(|X|) = ‖X‖₁ ‖Y‖_∞.

Thus (5) is true in this case. Let then p > 1, so that q = p/(p − 1) > 1. In (4)
set α = 1/p, β = 1/q, x = (|X|/‖X‖_p)^p(ω), and y = (|Y|/‖Y‖_q)^q(ω). Then it
becomes

    (|X|/‖X‖_p)(|Y|/‖Y‖_q)(ω) ≤ (1/p)(|X|/‖X‖_p)^p(ω) + (1/q)(|Y|/‖Y‖_q)^q(ω).   (7)

Applying the (positive) operator E to both sides of (7) we get

    E(|XY|)/(‖X‖_p ‖Y‖_q) ≤ 1/p + 1/q = 1.

This proves (5) in this case also, and hence it is true as stated.

(ii) Since |X + Y|^p ≤ 2^p max(|X|^p, |Y|^p) ≤ 2^p[|X|^p + |Y|^p], the linearity of
E implies E(|X + Y|^p) < ∞, so that (6) is meaningful. If p = 1, the result
follows from |X + Y| ≤ |X| + |Y|. If p = ∞, then |X| ≤ ‖X‖_∞, |Y| ≤ ‖Y‖_∞ a.e.
Hence |X + Y| ≤ ‖X‖_∞ + ‖Y‖_∞ a.e., so that (6) again holds in this case.

Now let 1 < p < ∞. If ‖X + Y‖_p = 0, then (6) is trivial and true. Thus let
‖X + Y‖_p > 0 also. Consider

    E(|X + Y|^p) ≤ E(|X + Y|^{p−1}|X|) + E(|X + Y|^{p−1}|Y|).   (8)

Since p − 1 > 0, let q = p/(p − 1). Then (p − 1)q = p, and

    ‖ |X + Y|^{p−1} ‖_q = [E(|X + Y|^p)]^{1/q} = ‖X + Y‖_p^{p/q}.

Hence applying (5) to the two terms of (8) separately we get

    ‖X + Y‖_p^p ≤ ‖X + Y‖_p^{p/q} (‖X‖_p + ‖Y‖_p),

or

    ‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p.

This completes the proof.

Some specializations of the above result, which holds for any measure
space, are needed in the context of probability spaces. Taking Y = 1 a.e. in
(5) gives

    E(|X|) ≤ [E(|X|^p)]^{1/p} = ‖X‖_p,  p ≥ 1.                  (9)

Hence writing φ(x) = |x|^p, (9) says that φ(E(|X|)) ≤ E(φ(X)). We prove
below that this is true for any continuous convex function φ, provided the
respective expectations exist. The significance of (9) is the following. If X is
an r.v., s > 0, and E(|X|^s) < ∞, then X is said to have the s-th moment
finite. Thus if X has a p-th moment, p ≥ 1, then its expectation exists. More is
true, namely, all of its lower-order moments exist, as seen from

Corollary 5 Let X be an r.v., on a probability space, with s-th moment
finite. If 0 < r < s, then (E(|X|^r))^{1/r} ≤ (E(|X|^s))^{1/s}. More generally, for
any 0 ≤ rᵢ, i = 1, 2, 3, if β_{rᵢ} = E(|X|^{rᵢ}), we have the Liapounov inequality:

    β_{r₁+r₂}^{r₂+r₃} ≤ β_{r₁}^{r₃} · β_{r₁+r₂+r₃}^{r₂}.        (10)

Proof Since |X|^r ≤ 1 + |X|^s for 0 < r < s, we have E(|X|^r) ≤ 1 + E(|X|^s) < ∞,
so that all lower-order moments exist. The inequality holds if we show that
β_r^{1/r} is a nondecreasing function of r > 0. But this follows from (9) if we let
p = s/r ≥ 1 and replace X by |X|^r there. Thus

    E(|X|^r) ≤ [E(|X|^{r·(s/r)})]^{r/s} = [E(|X|^s)]^{r/s},

which is the desired result on taking the r-th root.

For the Liapounov inequality (10), note that β₀ = 1, and on the open
interval (0, s), β_r is twice differentiable if β_s < ∞ [use the dominated con-
vergence theorem (Theorem 2) for differentiation relative to r under the integral sign],
and

    β_r′ = E(|X|^r log |X|),  β_r″ = E(|X|^r (log |X|)²).

Let γ_r = log β_r. If X ≢ 0 a.e., then this is well defined and

    γ_r″ = [β_r β_r″ − (β_r′)²]/β_r² ≥ 0,

because

    (β_r′)² = [E(|X|^r log |X|)]² ≤ E(|X|^r) E(|X|^r (log |X|)²) = β_r β_r″,

by the Hölder inequality with exponent 2. Thus γ_r is also convex in r. Taking
α = r₃/(r₂ + r₃), β = r₂/(r₂ + r₃) and x = r₁, y′ = r₁ + r₂ + r₃ in (3) with
φ(r) = γ_r, one gets αx + βy′ = r₁ + r₂, so that

    (r₂ + r₃) γ_{r₁+r₂} ≤ r₃ γ_{r₁} + r₂ γ_{r₁+r₂+r₃},

which is (10) upon exponentiation. Note that the convexity of γ_r can also be proved with a direct
application of the Hölder inequality. This completes the proof.
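For instance, if X has the uniform density on (0,1), then

    β_r = E(|X|^r) = ∫₀¹ x^r dx = 1/(r + 1),

and β_r^{1/r} = (r + 1)^{−1/r} is indeed nondecreasing in r: β₁ = 1/2, β₂^{1/2} = 3^{−1/2} ≈ 0.577, β₃^{1/3} = 4^{−1/3} ≈ 0.63, and so on, increasing toward 1.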

The special case of (5) with p = 2 is the classical Cauchy-Buniakowski-
Schwarz (or CBS) inequality. Due to its great applicational potential, we state
it as

Corollary 6 (CBS Inequality) If X, Y have two moments finite, then
XY is integrable and

    |E(XY)| ≤ ‖X‖₂ ‖Y‖₂ = [E(X²)]^{1/2}[E(Y²)]^{1/2}.          (11)

Proof Because of its interest we present an independent proof. Since X, Y
have two moments, it is evident that tX + Y has two moments for any t ∈ ℝ,
and we have

    0 ≤ E((tX + Y)²) = t²E(X²) + 2tE(XY) + E(Y²).

This is a quadratic expression in t which is never negative. Hence it has no
distinct real roots. Thus its discriminant must be nonpositive. Consequently,

    [2E(XY)]² − 4E(X²)E(Y²) ≤ 0.

This is (11), and the proof is complete.

Remark The conditions for equality in (5), (6), (10), and (11) can be
obtained immediately, and will be left to the reader. We invoke them later
when necessary.

One can now present the promised generalization of (9) as

Proposition 7 (Jensen's Inequality) If φ : ℝ → ℝ is convex and X
is an r.v. on (Ω, Σ, P) such that E(X) and E(φ(X)) exist, then

    φ(E(X)) ≤ E(φ(X)).                                          (12)

Proof Let x₀, x₁ be two points on the line and x be an intermediate point,
so that x = αx₁ + βx₀, where 0 ≤ α ≤ 1, α + β = 1. Then by (3)

    φ(x) ≤ αφ(x₁) + βφ(x₀).

For definiteness, let x₀ < x < x₁, so that with α = (x − x₀)/(x₁ − x₀), β =
(x₁ − x)/(x₁ − x₀), we get x. Hence the above inequality becomes

    (x₁ − x₀)φ(x) ≤ (x − x₀)φ(x₁) + (x₁ − x)φ(x₀),

so that

    (x − x₀)(φ(x) − φ(x₁)) ≤ (x₁ − x)(φ(x₀) − φ(x)).

By setting y = x₁, y₀ = x, this becomes

    (φ(y) − φ(y₀))/(y − y₀) ≥ (φ(y₀) − φ(x₀))/(y₀ − x₀) = g(y₀), say.   (13)

In this inequality, written as φ(y) ≥ φ(y₀) + g(y₀)(y − y₀), the right side is
called the support line of φ at y = y₀. Let X(ω) = y, and y₀ = E(X) in (13).
Then φ(X) is an r.v., and taking expectations, we get

    E(φ(X)) ≥ φ(E(X)) + g(E(X)) E(X − E(X)) = φ(E(X)).

This is (12), and the result holds. [Note: t₁ < t₂ ⇒ g(t₁) ≤ g(t₂).¹]

¹ This is not entirely trivial. Use (3) in different forms carefully. [See, e.g., G.H.
Hardy, J.E. Littlewood, and G. Pólya (1934, p. 93).]
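Two standard instances of (12): with φ(x) = |x|^p, p ≥ 1, it gives |E(X)|^p ≤ E(|X|^p), refining (9); with φ(x) = e^x it gives

    e^{E(X)} ≤ E(e^X),

which for X = log Y (Y > 0) is the arithmetic-geometric mean inequality e^{E(log Y)} ≤ E(Y).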
In establishing (10) we first showed that β_r^{1/r} = [E(|X|^r)]^{1/r} is an increas-
ing function of r. This has the following consequence:

Proposition 8 For any random variable X, lim_{r→∞} (E[|X|^r])^{1/r} = ‖X‖_∞.

Proof If ‖X‖_∞ = 0, X = 0 a.e., the result is true. So let 0 < k = ‖X‖_∞ ≤
∞. Then, by definition, P[|X| > k] = 0. Hence

    E(|X|^r) ≤ k^r,

so that for any 0 < t < k,

    E(|X|^r) ≥ ∫_{[|X|>t]} |X|^r dP ≥ t^r P[|X| > t] > 0.       (14)

Letting r → ∞ in (14), we get k ≥ lim_{r→∞} (E(|X|^r))^{1/r} ≥ t. Since t < k is
arbitrary, the result follows on letting t ↑ k.
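For instance, if P[X = 1] = P[X = 2] = 1/2, then

    (E(|X|^r))^{1/r} = [(1 + 2^r)/2]^{1/r} = 2[(1 + 2^{−r})/2]^{1/r} → 2 = ‖X‖_∞

as r → ∞.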
Let X, Y be two random variables with finite second moments. Then we
can define (a) the variance of X as

    σ²(X) = E((X − E(X))²) = E(X²) − (E(X))²,                   (15)

which always exists since σ²(X) ≤ E(X²) < ∞; and (b) the covariance of
X, Y as

    Cov(X, Y) = E((X − E(X))(Y − E(Y))).                        (16)

This also exists since by the CBS inequality,

    |Cov(X, Y)| ≤ σ(X)σ(Y).                                     (17)

The normalized covariance, called the correlation, between X and Y, denoted
ρ(X, Y), is then

    ρ(X, Y) = Cov(X, Y)/(σ(X)σ(Y)),                             (18)

where σ(X), σ(Y) are the positive square roots of the corresponding variances.
Thus |ρ(X, Y)| ≤ 1 by (17). The quantity 0 ≤ σ(X) is called the standard
deviation of X. Note that if E(X) = 0, then β₂ = σ²(X), and generally
β₂ ≥ σ²(X), by (15).
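For instance, if X takes the values 1 and 0 with probabilities p and 1 − p, then E(X) = E(X²) = p and

    σ²(X) = p − p² = p(1 − p);

and if Y = aX + b with a > 0, then Cov(X, Y) = aσ²(X), σ(Y) = aσ(X), so ρ(X, Y) = 1, the extreme value permitted by (17).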
Another simple but very useful inequality is given by

Proposition 9 (i) (Markov's Inequality) If ξ : ℝ → ℝ⁺ is a Borel
function and X is an r.v. on (Ω, Σ, P), then for any λ > 0,

    P[ξ(X) ≥ λ] ≤ E(ξ(X))/λ.                                    (19)

(ii) (Čebyšev's Inequality) If X has a finite variance, then

    P[|X − E(X)| ≥ λ] ≤ σ²(X)/λ².                               (20)

Proof For (i) we have

    E(ξ(X)) ≥ ∫_{[ξ(X)≥λ]} ξ(X) dP ≥ λ P[ξ(X) ≥ λ].

(ii) In (19), replace X by X − E(X), ξ(x) by x², and λ by λ². Then, ξ
being one-to-one on ℝ⁺, [|X − E(X)|² ≥ λ²] = [|X − E(X)| ≥ λ], and the
result follows from that inequality.
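For instance, taking λ = 2σ(X) in (20) gives

    P[|X − E(X)| ≥ 2σ(X)] ≤ 1/4,

whatever the distribution of X, provided only that σ²(X) < ∞.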

Another interesting consequence is

Corollary 10 If X₁, ..., Xₙ are n random variables each with two mo-
ments finite, then we have

    σ²(Σ_{i=1}^n Xᵢ) = Σ_{i=1}^n σ²(Xᵢ) + 2 Σ_{1≤i<j≤n} Cov(Xᵢ, Xⱼ),

and if they are uncorrelated [i.e., ρ(Xᵢ, Xⱼ) = 0 for i ≠ j] then

    σ²(Σ_{i=1}^n Xᵢ) = Σ_{i=1}^n σ²(Xᵢ).

This follows immediately from definitions. The second line says that for
uncorrelated random variables, the variance of the sum is the sum of the
variances. We later strengthen this concept into what is called "independence"
and deduce several results of great importance in the subject.

For future use, we include two fundamental results on multiple integration
and differentiation of set functions.

Theorem 11 (i) (Fubini-Stone) Let (Ωᵢ, Σᵢ, μᵢ), i = 1, 2, be a pair of
measure spaces and (Ω, Σ, μ) be their product. If f : Ω → ℝ is a measurable
and μ-integrable function, then

    ∫_{Ω₁} f(ω₁, ·) μ₁(dω₁) is Σ₂-measurable,  ∫_{Ω₂} f(·, ω₂) μ₂(dω₂) is Σ₁-measurable,

and, moreover,

    ∫_Ω f dμ = ∫_{Ω₂} [∫_{Ω₁} f(ω₁, ω₂) μ₁(dω₁)] μ₂(dω₂)
             = ∫_{Ω₁} [∫_{Ω₂} f(ω₁, ω₂) μ₂(dω₂)] μ₁(dω₁).       (21)

(ii) (Tonelli) If in the above μ₁, μ₂ are σ-finite and f : Ω → ℝ̄⁺ is measur-
able, or the μᵢ are arbitrary measures but there exists a sequence of μ-integrable
simple functions fₙ : Ω → ℝ⁺ such that fₙ ↑ f a.e. (μ), then again (21) holds
even though both sides may now be infinite.

The detailed arguments for this result are found in most standard texts
[cf., e.g., Zaanen (1967), Rao (1987, 2004)]. The other key result is the follow-
ing:

Theorem 12 (i) (Lebesgue Decomposition) Let μ and ν be two finite
or σ-finite measures on (Ω, Σ), a measurable space. Then ν can be uniquely
expressed as ν = ν₁ + ν₂, where ν₁ vanishes on μ-null sets and there is a set
A ∈ Σ such that μ(A) = 0 and ν₂(A^c) = 0. Thus ν₂ is different from zero only
on a μ-null set. (Here ν₂ is called singular or orthogonal to μ and denoted
μ ⊥ ν₂. Note also that ν₁ ⊥ ν₂ is written.)

(ii) (Radon-Nikodým Theorem) If μ is a σ-finite measure on (Ω, Σ)
and ν : Σ → ℝ̄ is σ-additive and vanishes on μ-null sets (denoted ν ≪ μ),
then there exists a μ-unique function (or density) f : Ω → ℝ̄ such that

    ν(A) = ∫_A f dμ,  A ∈ Σ.

This important result is also proved in the above-stated references.

1.4 Image Measure and the Fundamental Theorem of
Probability

As noted in the beginning of Section 2, the basic probability spaces often in-
volve abstract sets without any topology. However, when a random variable
(or vector) is defined on such an (Ω, Σ, P), we can associate a distribution func-
tion on the range space, which usually has a nice topological structure, as in
Definition 2.2. Evidently the same probability space can generate numerous
image measures by using different measurable mappings, or random variables.
There is a fundamental relation between the expectation of a function of a
random variable on the original space and the integral on its image space.
The latter is often more convenient in the evaluation of these expectations than
working on the original abstract spaces.

A comprehensive result on these ideas is contained in

Theorem 1 (i) (Image Measures) Let (Ω, Σ, μ) be a measure space
with (S, 𝒜) as a measurable space, and f : Ω → S be measurable [i.e.,
f⁻¹(𝒜) ⊂ Σ]. If ν = μ ∘ f⁻¹ : 𝒜 → ℝ̄⁺ is the image measure, then for
each measurable g : S → ℝ we have

    ∫_Ω (g ∘ f) dμ = ∫_S g dν,                                  (1)

in the sense that if either side exists, so does the other and equality holds.

(ii) (Fundamental Law of Probability) If (Ω, Σ, μ) is a probability
space and X : Ω → ℝ is a random variable with distribution function F_X,
and g : ℝ → ℝ is a Borel function, Y = g(X), then

    E(Y) = E(g(X)) = ∫_ℝ g(x) dF_X(x),                          (2)

in the sense that if either side exists, so does the other with equality holding.

(iii) In particular, for any p > 0,

    E(|X|^p) = ∫_ℝ |x|^p dF_X(x) = p ∫₀^∞ y^{p−1}(1 − F_X(y) + F_X(−y)) dy.   (3)

Proof (i) This very general statement is easily deduced from the definition
of the image measure. Indeed, if g(s) = χ_A(s), A ∈ 𝒜, then the left side of (1)
becomes

    ∫_Ω (χ_A ∘ f) dμ = μ(f⁻¹(A)) = ν(A) = ∫_S χ_A dν.

Thus (1) is true, and by the linearity of the integral and the σ-additivity of
ν, the same result holds if g = Σ_{i=1}^n aᵢχ_{Aᵢ}, a simple function with aᵢ ≥ 0. If
g ≥ 0 is measurable, then there exist simple functions 0 ≤ gₙ ↑ g, so that (1)
holds by the Lebesgue monotone convergence theorem. Since any measurable
g = g⁺ − g⁻ with g^± ≥ 0 and measurable, the last statement implies the truth
of (1) in general for which g⁺ or g⁻ is integrable.

(ii) Taking S = ℝ (μ is a probability) we get ν(−∞, x) = F_X(x), the
distribution function of X. Thus (1) is simply (2). If Y = g(X) : Ω → ℝ, then
clearly Y is a random variable. Replace X by Y, g by the identity, and S by ℝ in
(1):

    E(Y) = ∫_ℝ y dF_Y(y) = ∫_ℝ g(x) dF_X(x),

which establishes all parts of (2).

(iii) This is just a useful application of (ii), stated in a convenient form.
In fact, the first part of (3) being (2), for the last equation consider, with
Y = |X|^p, and writing P for μ:

    E(|X|^p) = ∫₀^∞ P[|X|^p > y] dy = p ∫₀^∞ y^{p−1} P[|X| > y] dy

(by integrating by parts and making a change of variable)

    = p ∫₀^∞ y^{p−1}(1 + F_X(−y) − F_X(y)) dy  (by Theorem 3.11).   (4)

This is (3), and the proof is complete. In the last equality, F_X(−∞) = 0 and
F_X(+∞) = 1 are substituted.
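To illustrate (2) and (3), let X have the exponential distribution, F_X(x) = 1 − e^{−x} for x > 0, = 0 for x ≤ 0. Then (2) with g(x) = x gives

    E(X) = ∫₀^∞ x e^{−x} dx = 1,

while (3) with p = 1 gives the same value directly:

    E(X) = ∫₀^∞ (1 − F_X(y) + F_X(−y)) dy = ∫₀^∞ e^{−y} dy = 1.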

In the above theorem, it is clear that g can be complex valued, since the
stated result applies to g = g₁ + ig₂, where g₁, g₂ are real measurable func-
tions. We use this fact to illustrate the following important concept, the Fourier
transform of real random variables. Indeed, if X : Ω → ℝ is any random vari-
able and g : ℝ → ℂ is a Borel function, then g ∘ X : Ω → ℂ is a complex random
variable. If g_t(x) = cos tx + i sin tx = e^{itx}, then g_t : ℝ → ℂ is a bounded
continuous function and g_t(X) is a bounded complex random variable for all
t ∈ ℝ.

Thus the following definition is meaningful:

    φ_X(t) = E(g_t(X)) = E(cos tX) + iE(sin tX),  t ∈ ℝ.        (5)

The mapping φ_X : ℝ → ℂ, defined for each random variable X, is called the
characteristic function of X. It exists without any moment restrictions on X,
and φ_X(0) = 1, |φ_X(t)| ≤ 1.
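For instance, if X = a a.e., then φ_X(t) = e^{ita}; and if P[X = 1] = P[X = −1] = 1/2, then

    φ_X(t) = (e^{it} + e^{−it})/2 = cos t.

As an application of the above theorem we have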

Proposition 2 The characteristic function φ_X of a random variable X
is uniformly continuous on ℝ.

Proof By Theorem 1(ii), we have the identity

    φ_X(t) = E(e^{itX}) = ∫_ℝ e^{itx} dF_X(x).

Hence given ε > 0, choose L_ε > 0 such that F_X(L_ε) − F_X(−L_ε) > 1 − (ε/4).
If t₁ < t₂, consider, with the elementary properties of Stieltjes integrals,

    |φ_X(t₁) − φ_X(t₂)| ≤ ∫_{[|x|≤L_ε]} |e^{it₁x} − e^{it₂x}| dF_X(x) + 2 ∫_{[|x|>L_ε]} dF_X(x)
                       ≤ |t₂ − t₁| L_ε + ε/2,                   (6)

since |e^{it₁x} − e^{it₂x}| ≤ |t₂ − t₁||x|. If δ_ε = ε/(2L_ε) and |t₂ − t₁| < δ_ε, then (6) implies

    |φ_X(t₁) − φ_X(t₂)| < ε/2 + ε/2 = ε.

This completes the proof.

This result shows that many properties of random variables on abstract
probability spaces can be studied through their image laws and their char-
acteristic functions with nice continuity properties. We make a deeper study
of this aspect of the subject in Chapter 4. First, it is necessary to introduce
several concepts of probability theory and establish its individuality as a sep-
arate discipline with its own innate beauty and elegance. This we do in part
in the next two chapters, and the full story emerges as the subject develops,
with its manifold applications, reaching most areas of scientific significance.

Before closing this chapter we present a few results on uniform integrabil-
ity of sets of random variables. This concept is of importance in applications
where an integrable dominating function is not available to start with. Let us
state the concept.

Definition 3 An arbitrary collection {X_t, t ∈ T} of r.v.s on a probability
space (Ω, Σ, P) is said to be uniformly integrable if (i) E(|X_t|) ≤ k₀ < ∞,
t ∈ T, and (ii) lim_{P(A)→0} ∫_A |X_t| dP = 0 uniformly in t ∈ T.
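A family need not be uniformly integrable even if (i) holds: on Ω = (0,1) with Lebesgue measure, the sequence Xₙ = n χ_{(0,1/n)} has E(|Xₙ|) = 1 for all n, yet

    ∫_{(0,1/n)} Xₙ dP = 1  while  P((0,1/n)) → 0,

so (ii) fails.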
The earliest occasion on which the reader may have encountered this con-
cept is perhaps in studying real analysis, in the form of the Vitali theorem,
which for finite measure spaces is a generalization of the dominated conver-
gence criterion (Theorem 2.2). Let us recall this result.

Theorem 4 (Vitali) Let X₁, X₂, ... be a sequence of random variables
on a probability space (Ω, Σ, P) such that Xₙ → X a.e. (or only in measure).
If {Xₙ, n ≥ 1} is a uniformly integrable set, then we have

    limₙ→∞ E(Xₙ) = E(X).

Actually the conclusion holds if only E(|Xₙ|) < ∞, n ≥ 1, and (ii) of Defini-
tion 3 is satisfied for {Xₙ, n ≥ 1}.

Note that if |Xₙ| ≤ Y and Y is integrable, then {Xₙ, n ≥ 1} is trivially
uniformly integrable. The point of the above result is that there may be no
such dominating function Y. Thus it is useful to have a characterization of this
important concept, which is given by the next result. It contains the classical
all-important de la Vallée Poussin criterion obtained in about 1915. It was
brought to light for probabilistic applications by Meyer (1966).

Theorem 5 Let K = {X_t, t ∈ T} be a set of integrable random variables
on a probability space. Then the following conditions are equivalent [(i) ⇔ (iii)
is due to de la Vallée Poussin]:

(i) K is uniformly integrable.

(ii)
    lim_{a→∞} ∫_{[|X_t|>a]} |X_t| dP = 0 uniformly in t ∈ T.    (7)

(iii) There exists a convex function φ : ℝ → ℝ⁺, φ(0) = 0, φ(−x) = φ(x),
and φ(x)/x ↗ ∞ as x ↗ ∞, such that sup_{t∈T} E(φ(X_t)) < ∞.
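For instance, taking φ(x) = x² in (iii), any family with sup_{t∈T} E(X_t²) < ∞ is uniformly integrable; more generally φ(x) = |x|^p, p > 1, shows that boundedness in ℒ^p norm implies uniform integrability, while p = 1 does not suffice (cf. the example after Definition 3).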
Proof (i) ⇒ (ii) By Proposition 3.9 (Markov's inequality) we have

    P[|X_t| > a] ≤ E(|X_t|)/a ≤ k₀/a,                           (8)

uniformly in t ∈ T. Thus by the second condition of Definition 3, given ε > 0,
there is a δ_ε > 0 such that for any A ∈ Σ, P(A) < δ_ε ⇒ ∫_A |X_t| dP < ε
uniformly in t ∈ T. Let A_t = [|X_t| > a] and choose a > k₀/δ_ε, so that
P(A_t) < δ_ε by (8), and hence ∫_{A_t} |X_t| dP < ε, whatever t ∈ T is. This is (7),
and (ii) holds.

(ii) ⇒ (iii) Here we need to construct explicitly a convex function φ of the
desired kind. Let 0 ≤ aₙ < aₙ₊₁ ↗ ∞ be a sequence of numbers (which may,
and will, be taken to be integers ≥ 1) such that by (7) we have

    sup_t ∫_{[|X_t|>aₙ]} |X_t| dP < 2^{−n−1},  n ≥ 1.           (9)

The sequence {aₙ, n ≥ 1} is determined by the set K but not by the individual
X_t. Let N(n) = the number of a_k in [n, n+1), = 0 if there is no a_k in this
set, and put ξ(n) = Σ_{k=0}^n N(k), with N(0) = 0. Then ξ(n) ↗ ∞. Define

    φ(x) = ∫₀^{|x|} ξ(⌊t⌋) dt,

where ξ(⌊t⌋) is a constant on [k, k+1) and increases only by jumps. Clearly φ(·)
is convex, φ(−x) = φ(x), φ(0) = 0, φ(x)/x ≥ ξ(k)((x − k)/x) ↗ ∞ for k < x
and x, k ↗ ∞. We claim that this function satisfies the requirements of (iii).
Indeed, let us calculate E(φ(X_t)). We have

    E(φ(X_t)) ≤ E(Σ_{k=0}^∞ ξ(k) χ_{[|X_t|>k]}) = Σ_{n≥1} N(n) Σ_{k≥n} P[|X_t| > k].   (10)

However, since the aₙ are integers ≥ 1,

    Σ_{k≥aₙ} P[|X_t| > k] ≤ ∫_{[|X_t|>aₙ]} |X_t| dP.

Summing over n, we get with (9)

    Σ_{n≥1} N(n) Σ_{k≥n} P[|X_t| > k] = Σ_{n≥1} Σ_{k≥aₙ} P[|X_t| > k] ≤ Σ_{n≥1} 2^{−n−1} ≤ 1.   (11)

Thus (10) and (11) imply sup_t E(φ(X_t)) ≤ 1, and (iii) follows.

(iii) ⇒ (i) is a consequence of the Hölder inequality for Orlicz spaces, since
φ(·) can be assumed here to be a so-called Young function. The proof is
similar to the case in which φ(x) = |x|^p, p > 1. By the support line property,
the boundedness of E(φ(X_t)) ≤ k < ∞ implies that of E(|X_t|) ≤ k₁ < ∞.
The second condition follows from [q = p/(p − 1)]

    ∫_A |X_t| dP ≤ ‖X_t‖_p [P(A)]^{1/q} → 0

as P(A) → 0. The general Young function has the same argument. How-
ever, without using the Orlicz space theory, we follow a little longer but an
alternative and more elementary route, by proving (iii) ⇒ (ii) ⇒ (i) now.

Thus let (iii) be true. Then set k′ = sup_t E(φ(X_t)) < ∞. Given ε > 0, let
0 < b_ε = k′/ε and choose a = a_ε such that |x| ≥ a_ε ⇒ φ(x) ≥ |x| b_ε, which is
possible since φ(x)/x ↗ ∞ as x ↗ ∞. Thus ω ∈ [|X_t| ≥ a_ε] ⇒ b_ε |X_t|(ω) ≤
φ(X_t(ω)), and

    ∫_{[|X_t|≥a_ε]} |X_t| dP ≤ (1/b_ε) ∫_Ω φ(X_t) dP ≤ k′/b_ε = ε.

This clearly implies (ii).

Finally, (ii) ⇒ (i). It is evident that (7) implies that if ε = 1, then there
is an a₁ > 0 such that

    sup_t ∫_{[|X_t|>a₁]} |X_t| dP < 1.

So there is a k(≥ 1 + a₁) < ∞ such that sup_t E(|X_t|) ≤ k < ∞. To verify the
second condition of Definition 3, we have for A ∈ Σ

    ∫_A |X_t| dP ≤ ∫_{[|X_t|>a]} |X_t| dP + a P(A).             (12)

Given ε > 0, choose a = a_ε > 0, so that by (ii) the first integral is < ε
uniformly in t. For this a_ε, (12) becomes

    lim sup_{P(A)→0} ∫_A |X_t| dP ≤ ε, uniformly in t ∈ T.

Since ε > 0 is arbitrary this limit is zero, and (i) holds. This completes the
demonstration.

The following is an interesting supplement to the above, called Scheffé's
lemma; it is usually proved for probability distributions on the line. We present it in
a slightly more general form.

Proposition 6 (Scheffé) Let X, Xₙ ≥ 0 be integrable random variables
on a probability space (Ω, Σ, P) and Xₙ → X a.e. (or in measure). Then
E(Xₙ) → E(X) as n → ∞ iff {Xₙ, n ≥ 1} is uniformly integrable, which is
equivalent to saying that limₙ→∞ E(|Xₙ − X|) = 0.
Proof If {Xₙ, n ≥ 1} is uniformly integrable, then E(Xₙ) → E(X) by the
Vitali theorem (Theorem 4) even without positivity. Since {|Xₙ − X|, n ≥ 1}
is again uniformly integrable and |Xₙ − X| → 0 a.e. (or in measure), the last
statement follows from the above theorem. Thus it is the converse which is of
interest, and it needs the additional hypothesis.

Thus let X, Xₙ ≥ 0 and be integrable. Then the equation

    max(Xₙ, X) + min(Xₙ, X) = Xₙ + X                            (14)

is employed in the argument. Since min(Xₙ, X) ≤ X, and min(Xₙ, X) → X
a.e., the dominated convergence theorem implies E(min(Xₙ, X)) → E(X) as
n → ∞. Hence taking expectations on both sides of (14) and letting n → ∞,
we get E(max(Xₙ, X)) → E(X) as well. On the other hand,

    |Xₙ − X| = max(Xₙ, X) − min(Xₙ, X).                         (15)

Applying the operator E to both sides of (15), and using the preceding facts
on the limits of the right-side expressions, we get E(|Xₙ − X|) → 0. This
implies for each ε > 0 that there is an n_ε such that for all n ≥ n_ε and all
A ∈ Σ,

    ∫_A Xₙ dP ≤ ∫_A X dP + ε.

It follows that, because each finite set of integrable random variables is always
uniformly integrable,

    lim_{P(A)→0} ∫_A Xₙ dP ≤ lim_{P(A)→0} ∫_A X dP + ε = ε      (16)

uniformly in n. Thus, because ε > 0 is arbitrary, {Xₙ, n ≥ 1} is uniformly
integrable, as asserted.

In Scheffé's original version, it was assumed that dP = f dμ, where μ
is a σ-finite measure. Thus f is called the density of P relative to μ. If
gₙ = f · Xₙ ≥ 0, then ∫_Ω gₙ dμ = ∫_Ω Xₙ · f dμ = ∫_Ω Xₙ dP is taken as unity,
so that gₙ itself is a probability density relative to μ. In this form {gₙ, n ≥ 1}
is assumed to satisfy 1 = ∫_Ω gₙ dμ → ∫_Ω g dμ = 1 and gₙ → g a.e. (or in
measure). It is clear that the preceding result is another form of this result,
and both are essentially the same statements. These results can be further
generalized. (See, e.g., Problems 7-9.)
One denotes by ℒ^p(Ω, Σ, P), or ℒ^p, the class of all pth-power integrable
random variables on (Ω, Σ, P). By the Hölder and Minkowski inequalities, it
follows that ℒ^p is a vector space, p ≥ 1, over the scalars. Thus f ∈ ℒ^p iff
‖f‖_p = [E(|f|^p)]^{1/p} < ∞, and ‖·‖_p is the p-norm, i.e., ‖f‖_p = 0 iff f = 0
a.e., and ‖af + g‖_p ≤ |a| ‖f‖_p + ‖g‖_p, a ∈ ℝ (or a ∈ ℂ). When ‖f − g‖_p = 0,
so that f = g a.e., one identifies the equivalence classes (f ∼ g iff f = g
a.e.). Then the quotient L^p = ℒ^p/∼ is a normed linear space. Moreover, if
{fₙ, n ≥ 1} ⊂ ℒ^p, ‖f_m − fₙ‖_p → 0 as n, m → ∞, then it is not hard to
see that there is a P-unique f ∈ ℒ^p such that ‖f − fₙ‖_p → 0, so that ℒ^p is
complete. The space of equivalence classes (L^p, ‖·‖_p) is thus a complete
normed linear (or Banach) space, called the Lebesgue space, for 1 ≤ p ≤ ∞.
It is customary to call the elements of L^p functions when a member of its
equivalence class is meant. We also follow this custom.

Exercises
1. Let Ω be a nonempty set and A ⊂ Ω. Then χ_A, called the indicator
(or "characteristic," in older terminology) function, which is 1 on A, 0 on
Ω − A = A^c, is useful in some calculations on set operations. We illustrate its
uses by this problem.

(a) If Aᵢ ⊂ Ω, i = 1, 2, and A₁ΔA₂ is the symmetric difference, show that
χ_{A₁ΔA₂} = |χ_{A₁} − χ_{A₂}|.

(b) If Aₙ ⊂ Ω, n = 1, 2, ..., is a sequence, A = lim supₙ Aₙ (= the
set of points that belong to infinitely many Aₙ, = ∩_{n≥1} ∪_{k≥n} A_k) and B =
lim infₙ Aₙ (= the set of points that belong to all but finitely many Aₙ, =
∪_{n=1}^∞ ∩_{k≥n} A_k), show that χ_A = lim supₙ χ_{Aₙ}, χ_B = lim infₙ χ_{Aₙ}, and A = B
(this common set is called the limit and denoted limₙ Aₙ) iff χ_A = limₙ χ_{Aₙ}.

(c) (E. Bishop) If Aₙ ⊂ Ω, n = 1, 2, ..., define C₁ = A₁, C₂ = C₁ΔA₂, ...,
Cₙ = C_{n−1}ΔAₙ. Show that limₙ Cₙ = C exists [in the sense of (b) above] iff
limₙ Aₙ = ∅. [Hint: Use the indicator functions and the results of (a) and (b).
Verify that |χ_{C_{n+1}} − χ_{Cₙ}| = χ_{A_{n+1}}.]

(d) If (Ω, Σ, P) is a probability space, and {Aₙ, n ≥ 1} ⊂ Σ, suppose
that limₙ Aₙ exists in the sense of (b). Then show that limₙ P(Aₙ) exists and
equals P(limₙ Aₙ).

2. (a) Let (Ω, Σ, P) be a probability space and {Aᵢ, 1 ≤ i ≤ n} ⊂ Σ,
n ≥ 2. Prove (Poincaré's formula) that

    P(∪_{i=1}^n Aᵢ) = Σ_{i=1}^n P(Aᵢ) − Σ_{1≤i<j≤n} P(Aᵢ ∩ Aⱼ) + Σ_{1≤i<j<k≤n} P(Aᵢ ∩ Aⱼ ∩ A_k)
                     − ⋯ + (−1)^{n−1} P(A₁ ∩ ⋯ ∩ Aₙ).

Thus the first two terms usually underestimate the probability of ∪_{i=1}^n Aᵢ.

(b) Let (Ωᵢ, 𝒜ᵢ), i = 0, 1, ..., n, be measurable spaces, f : Ω₀ → ×_{i=1}^n Ωᵢ
be a mapping. Establish that f is measurable iff each component of f =
(f₁, ..., fₙ) is. [Hint: Verify f⁻¹(σ(𝒞)) = σ(f⁻¹(𝒞)) for a collection 𝒞 of sets.]
3. (a) Let {Xₙ, n ≥ 1} be a sequence of random variables on a probability
space (Ω, Σ, P). Show that Xₙ → X, a random variable, in probability iff

    E(|Xₙ − X|/(1 + |Xₙ − X|)) → 0 as n → ∞.

(b) If X, Y are any pair of random variables, and L⁰ is the set of all random
variables, define

    d(X, Y) = E(|X − Y|/(1 + |X − Y|)),

and verify that d(·,·) is a metric on L⁰ and that L⁰ is an algebra of random
variables.

(c) If X ∼ Y denotes X = Y a.e., and L⁰ = ℒ⁰/∼, show that (L⁰, d(·,·))
is a complete linear metric space, in the sense that it is a vector space and
each Cauchy sequence for d(·,·) converges in L⁰.

(d) Prove that (L^p, ‖·‖_p) introduced in the last paragraph of Section 1.4
is complete.

4. Consider the probability space of Example 2.6. If f : Ω → ℝ is the ran-
dom variable defined there, verify that E(f) = 1/p and σ²(f) = (1 − p)/p².
In particular, if p = 1/2, then E(f) = 2, σ²(f) = 2, so that the expected
number of tosses of a fair coin to get the first head is 2 but the variance is
also 2, which is "large."

5. (a) Let X be an r.v. on (Ω, Σ, P). Prove that E(X) exists iff

    Σ_{n=1}^∞ P[|X| ≥ na] < ∞

for some a > 0, and hence also for all a > 0.

(b) If E(X) exists, show that it can be evaluated as

    E(X) = ∫₀^∞ P[X > x] dx − ∫₀^∞ P[X < −x] dx.

[See Theorem 4.1(iii).]

6. (a) Let X be a bounded random variable on (Ω, Σ, P). Then for any
ε > 0 and any r > 0, verify that E(|X|^r) ≤ ε^r + a^r P[|X| ≥ ε], where a is the
bound on |X|. In particular, if a = 1, we have E(X²) − ε² ≤ P[|X| ≥ ε].

(b) Obtain an improved version of the one-sided Čebyšev inequality as
follows:

    P[X ≥ E(X) + ε] ≤ Var(X)/(ε² + Var(X)).

[Hint: Let Y = X − E(X) and σ² = Var X. Set

    f(y) = (y + σ²/ε)²/(ε + σ²/ε)².

Then if B = [Y ≥ ε], verify that E(f(Y)) ≥ P(B) and E(f(Y)) = σ²/(σ² + ε²).]
7. Let {Xₙ, n ≥ 1} be a sequence of r.v.s on (Ω, Σ, P) such that Xₙ → X
a.e., where X is an r.v. If 0 < p < ∞ and E(|Xₙ|^p) < ∞, n ≥ 1, then
{|Xₙ|^p, n ≥ 1} is uniformly integrable iff E(|Xₙ − X|^p) → 0 as n → ∞.
The same argument applies to a more general situation as follows. Suppose
φ : ℝ⁺ → ℝ⁺ is a symmetric function, φ(0) = 0, and either φ is a contin-
uous concave increasing function on ℝ⁺ or is a convex function satisfying
φ(2x) ≤ cφ(x), x ≥ 0, for some 0 < c < ∞. If E(φ(Xₙ)) ≤ k < ∞
and E(φ(Xₙ)) → E(φ(X)), then E(φ(Xₙ − X)) → 0 as n → ∞ and
{φ(Xₙ), n ≥ 1} is uniformly integrable. [Hint: Observe that there is a con-
stant 0 < c̃ < ∞ such that in both the above convex and concave cases,
φ(x + y) ≤ c̃[φ(x) + φ(y)], x, y ∈ ℝ. Hence c̃[φ(Xₙ) + φ(X)] − φ(Xₙ − X) ≥ 0
a.e.]

8. (Doob) Let (Ω, Σ, P) be a probability space, Σ ⊃ ℱₙ ⊃ ℱₙ₊₁ be σ-
subalgebras, and Xₙ : Ω → ℝ be ℱₙ-measurable (hence also measurable for
Σ). Suppose that νₙ(A) = ∫_A Xₙ dP, A ∈ ℱₙ, satisfies for each n ≥ 1, νₙ(A) ≥
νₙ₊₁(A), A ∈ ℱₙ₊₁. Such sequences exist, as we shall see in Chapter 3. (A trivial
example satisfying the above conditions is the following: ℱₙ₊₁ = ℱₙ = Σ for all
n, and Xₙ ≥ Xₙ₊₁ a.e. for all n ≥ 1.) Show that {Xₙ, n ≥ 1} is uniformly
integrable iff (*) limₙ νₙ(Ω) > −∞. In the special example of a decreasing
sequence for which (*) holds, deduce that there is a random variable X such
that E(|Xₙ − X|) → 0 as n → ∞. [Hint: If Aₙ^λ = [|Xₙ| > λ], verify that
P(Aₙ^λ) → 0 as λ ↑ ∞ uniformly in n, after noting that ∫_Ω |Xₙ| dP ≤ ν₁(Ω) +
2|νₘ(Ω)| for all 1 ≤ n ≤ m. Finally, verify that ∫_{Aₙ^λ} |Xₙ| dP → 0 as λ ↑ ∞,
uniformly in n.]
9. [This is an advanced problem.] (a) Let (Ω, Σ, μ) be a measure space,
Xₙ : Ω → ℝ, n ≥ 1, be random variables such that (i) Xₙ → X a.e., n → ∞,
and (ii) Xₙ = Yₙ + Zₙ, n ≥ 1, where the random variables Yₙ, Zₙ satisfy
(α) Zₙ → Z a.e. and ∫_Ω Zₙ dμ → ∫_Ω Z dμ ∈ ℝ, n → ∞, (β) limₙ→∞ ∫_A Yₙ dμ
exists, A ∈ Σ, and (iii)

    lim_{m→∞} limₙ→∞ ∫_{A_m} Yₙ dμ = 0

for any A_m ↓ ∅, A_m ∈ Σ. Then limₙ→∞ ∫_Ω Xₙ dμ = ∫_Ω X dμ. If μ(Ω) < ∞,
(iii) may be omitted here. [Hints: If λ : A ↦ limₙ→∞ ∫_A Yₙ dμ, then λ : Σ → ℝ
is additive and vanishes on μ-null sets. (β) and (iii) ⇒ λ is also σ-additive,
so that λ(A) = ∫_A Y′ dμ for a μ-unique r.v. Y′, since the Yₙ, being integrable,
vanish outside a fixed σ-finite set, and μ may thus be assumed σ-finite. It may
be noted that (iii) is a consequence of (β) if μ(Ω) < ∞. Next, (β) also implies

    supₙ |∫_A Yₙ dμ| < ∞,  A ∈ Σ,

so that it is "weakly convergent" to Y′. Let F ∈ Σ, μ(F) < ∞. Then by the
Vitali-Hahn-Saks theorem (cf. Dunford-Schwartz, III.7.2),

    lim_{μ(A)→0} ∫_A Yₙ dμ = 0

uniformly in n. Also Yₙχ_F = (Xₙ − Zₙ)χ_F → (X − Z)χ_F a.e. Let Y = X − Z.
These two imply ∫_Ω |Yₙ − Y| χ_F dμ → 0. Deduce that Y = Y′ a.e., and then
Yₙχ_F → Yχ_F = Y′χ_F in measure on each F ∈ Σ, with μ(F) < ∞. Hence
by another theorem in Dunford-Schwartz (III.8.12), ∫_Ω |Yₙ − Y| dμ → 0. Thus
using (α), this implies the result. The difficulty is that the hypothesis is weaker
than the dominated or Vitali convergence theorems, and the Xₙ, n ≥ 1, are
not uniformly integrable. The result can be extended if the Xₙ are vector
valued.]

(b) The following example shows how the hypotheses of the above part
can be specialized. Let Xₙ, gₙ, hₙ be random variables such that (i) Xₙ → X
a.e., gₙ → g a.e., and hₙ → h a.e. as n → ∞, (ii) gₙ ≤ Xₙ ≤ hₙ, n ≥ 1,
and (iii) ∫_Ω gₙ dμ → ∫_Ω g dμ ∈ ℝ, ∫_Ω hₙ dμ → ∫_Ω h dμ ∈ ℝ, n → ∞. Then
limₙ→∞ ∫_Ω Xₙ dμ = ∫_Ω X dμ ∈ ℝ. [Let Yₙ = Xₙ − gₙ, Zₙ = gₙ. Then (i) and
(ii α) of (a) hold.

Now 0 ≤ Yₙ ≤ hₙ − gₙ and ∫_Ω (hₙ − gₙ) dμ → ∫_Ω (h − g) dμ by hypoth-
esis. Since hₙ − gₙ ≥ 0 and we may assume that these are finite after some
n, let us take n = 1 for convenience. As shown in Proposition 4.6, this im-
plies the uniform integrability of {hₙ − gₙ, n ≥ 1}, and (ii β) and (iii) will
hold, since ∫_Ω |(hₙ − gₙ) − (h − g)| dμ → 0 is then true. Note that no order
relation of the range is involved in (a), while this is crucial in the present
formulation.] Observe that if gₙ ≤ 0 ≤ hₙ, we may take gₙ = −hₙ, replacing
hₙ by max(hₙ, −gₙ) if necessary, so that |Xₙ| ≤ hₙ and ∫_Ω hₙ dμ → ∫_Ω h dμ
implies the hₙ sequence, and hence the Xₙ sequence, is uniformly integrable
as in Proposition 4.6. The result of (b) (proved differently) is due to John
W. Pratt. The problem is presented here to show how uniform integrability
can appear in different forms. The latter are neither more natural nor elegant
than the ones usually given.

10. This is a slight extension of the Fubini-Stone theorem. Let (Ωᵢ, Σᵢ),
i = 1, 2, be two measurable spaces and Ω = Ω₁ × Ω₂, Σ = Σ₁ ⊗ Σ₂ their
products. Let P(·,·) : Ω₁ × Σ₂ → ℝ⁺ be such that P(ω₁, ·) : Σ₂ → ℝ⁺ is
a probability, ω₁ ∈ Ω₁, and P(·, A) : Ω₁ → ℝ⁺ be a Σ₁-measurable function
for each A ∈ Σ₂. Prove that the mapping Q : (A, B) ↦ ∫_A P(ω₁, B) μ(dω₁)
for any probability μ : Σ₁ → ℝ⁺ uniquely defines a probability measure on
(Ω, Σ), sometimes called a mixture relative to μ, and if X : Ω → ℝ⁺ is any
random variable, then the mapping ω₁ ↦ ∫_{Ω₂} X(ω₁, ω₂) P(ω₁, dω₂) is Q(·, Ω₂)-
measurable and we have the equation

    ∫_Ω X dQ = ∫_{Ω₁} [∫_{Ω₂} X(ω₁, ω₂) P(ω₁, dω₂)] μ(dω₁).

[If P(ω₁, ·) is independent of ω₁, then this reduces to Theorem 3.11(ii) and the
proof is a modification of that result.]

11. (Skorokhod) For a pair of mixtures as in the preceding problem, the
Radon-Nikodým theorem can be extended; this is of interest in probabilistic
and other applications. Let (Ωᵢ, Σᵢ), i = 1, 2, be two measurable spaces and
Pᵢ : Ω₁ × Σ₂ → ℝ⁺, μᵢ : Σ₁ → ℝ⁺, and Qᵢ : (A, B) ↦ ∫_A Pᵢ(ω₁, B) μᵢ(dω₁), i =
1, 2, be defined as in the above problem satisfying the same conditions there.
Then Q₁ ≪ Q₂ on (Ω, Σ), the product measurable space, iff μ₁ ≪ μ₂ and
P₁(ω₁, ·) ≪ P₂(ω₁, ·) for a.a. (ω₁). When the hypothesis holds (i.e., Q₁ ≪ Q₂),
deduce that

    (dQ₁/dQ₂)(ω₁, ω₂) = (dμ₁/dμ₂)(ω₁) · (dP₁(ω₁, ·)/dP₂(ω₁, ·))(ω₂), a.e. [Q₂].

[Hints: If Q₁ ≪ Q₂, then observe that, by considering the marginal measures
Qᵢ(·, Ω₂), we also have μ₁ ≪ μ₂. Next note that for a.a. (ω₁), the class of sets
on which the assertion holds is a monotone class and an algebra. Deduce that
P₁(ω₁, ·) ≪ P₂(ω₁, ·), a.a. (ω₁). The converse is simpler, and then the above
formula follows. Only a careful application of the "chain rule" is needed. Here
the proof can be simplified and the application of the monotone class theorem
avoided if Σ₂ is assumed countably generated, as was originally done.]
Chapter 2

Independence and Strong Convergence

This chapter is devoted to the fundamental concept of independence and to
several results based on it, including the Kolmogorov strong laws and his three
series theorem. Some applications to empiric distributions, densities, queueing
sequences and random walk are also given. A number of important results,
included in the problems section, indicate the profound impact of the concept
of independence on the subject. All these facts provide deep motivation for
further study and development of probability theory.

2.1 Independence
If A and B are two events of a probability space (Ω, Σ, P), it is natural to
say that A is independent of B whenever the occurrence or nonoccurrence of
A has no influence on the occurrence or nonoccurrence of B. Consequently
the uncertainty about the joint occurrence of both A and B must be higher than
that of either of the individual events. This means that the probability of a joint
occurrence of A and B should be "much smaller" than either of the individual
probabilities. This intuitive feeling can be formalized mathematically by the
equation

    P(A ∩ B) = P(A)P(B)

for a pair of events A, B. How should intuition translate for three events
A, B, C if every pair among them is independent? The following ancient ex-
ample, due to S. Bernstein, shows that, for a satisfactory mathematical ab-
straction, more care is necessary. Thus if Ω = {ω₁, ω₂, ω₃, ω₄}, Σ = 𝒫(Ω), the
power set, let each point carry the same weight, so that

    P({ωᵢ}) = 1/4,  i = 1, 2, 3, 4.

Let A = {ω₁, ω₂}, B = {ω₁, ω₃}, and C = {ω₄, ω₁}. Then clearly P(A ∩ B) =
P(A)P(B) = 1/4, P(B ∩ C) = P(B)P(C) = 1/4, and P(C ∩ A) = P(C)P(A) = 1/4.

But P(A ∩ B ∩ C) = 1/4, and P(A)P(B)P(C) = 1/8. Thus A, B, C are not
independent. Also A, (B ∩ C) are not independent, and similarly B, (C ∩ A)
and C, (A ∩ B) are not independent.

These considerations lead us to introduce the precise concept of mutual
independence of a collection of events not pairwise but by systems of equa-
tions, so that the above anomaly cannot occur.

Definition 1 Let (Ω, Σ, P) be a probability space and {Aᵢ, i ∈ I} ⊂ Σ
be a family of events. They are said to be pairwise independent if for each
distinct i, j in I we have P(Aᵢ ∩ Aⱼ) = P(Aᵢ)P(Aⱼ). If A_{i₁}, ..., A_{iₙ} are n
(distinct) events, n ≥ 2, then they are mutually independent if

    P(A_{j₁} ∩ ⋯ ∩ A_{jₘ}) = Π_{k=1}^m P(A_{jₖ}),  {j₁, ..., jₘ} ⊂ {i₁, ..., iₙ},   (1)

holds simultaneously for each m = 2, 3, ..., n. The whole class {Aᵢ, i ∈ I} is
said to be mutually independent if each finite subcollection is mutually inde-
pendent in the above sense, i.e., equations (1) hold for each n ≥ 2. Similarly if
{𝒜ᵢ, i ∈ I} is a collection of families of events from Σ then they are mutually
independent if for each n, A_{iₖ} ∈ 𝒜_{iₖ}, we have the set of equations (1) holding
for A_{iₖ}, k = 1, ..., m, 1 ≤ m ≤ n. Thus if Aᵢ ∈ 𝒜ᵢ then {Aᵢ, i ∈ I} is a mutually
independent family. [Following custom, we usually omit the word "mutually".]

It is clear that the (mutual) independence concept is given by a system
of equations (1), which can be arbitrarily large depending on the richness of
Σ. Indeed for each n events, (1) is a set of 2ⁿ − n − 1 equations, whereas the
pairwise case needs only (n choose 2) equations. Similarly "m-wise" independence has
(n choose m) equations, and it does not imply other independences if 2 < m < n is
a fixed number m. It is the strength of the (mutual) concept that allows all
n ≥ 2. This is the mathematical abstraction of the intuitive feeling of inde-
pendence that experience has shown to be the best possible one. It seems to
give a satisfactory approximation to the heuristic idea of independence in the
physical world. In addition, this mathematical formulation has been found
successful in applications to such areas as number theory and Fourier analy-
sis. The notion of independence is fundamental to probability theory
and distinguishes it from measure theory. The concept translates itself
to random variables in the following form.
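For instance, with n = 3 events A, B, C, the system (1) consists of 2³ − 3 − 1 = 4 equations: the three pairwise relations

    P(A ∩ B) = P(A)P(B),  P(B ∩ C) = P(B)P(C),  P(C ∩ A) = P(C)P(A),

together with P(A ∩ B ∩ C) = P(A)P(B)P(C); Bernstein's example above satisfies the first three but not the fourth.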

Definition 2 Let (Ω, Σ, P) be a probability space and {Xᵢ, i ∈ I} be
abstract random variables on Ω into a measurable space (S, 𝒜). Then they
are said to be mutually independent if the class {ℬᵢ, i ∈ I} of σ-algebras in Σ
is mutually independent in the sense of Definition 1, where ℬᵢ = Xᵢ⁻¹(𝒜), the
σ-algebra generated by Xᵢ, i ∈ I. Pairwise independence is defined similarly.
Taking S = ℝ (or ℝⁿ) and 𝒜 as its Borel σ-algebra, one gets the corresponding
concept for real (or vector) random families.

It is perhaps appropriate at this place to observe that many such (inde-
pendent) families of events or random variables on an (Ω, Σ, P) need not exist
if (Ω, Σ) is not rich enough. Since ∅ and Ω are clearly independent of each
event A ∈ Σ, the set of equations (1) is nonvacuous. Consider the trivial
example Ω = {0, 1}, Σ = 𝒫(Ω) = {∅, {0}, {1}, Ω}, P({0}) = p = 1 − P({1}),
0 < p < 1. Then, omitting ∅, Ω, there are no other independent events, and
if Xᵢ : Ω → ℝ, i = 1, 2, are defined as X₁(0) = 1 = X₂(1) and X₁(1) = 2 = X₂(0),
then X₁, X₂ are distinct random variables, but they are not independent. Any
other random variables defined on Ω can be obtained as functions of these two,
and it is easily seen that there are no nonconstant independent random vari-
ables on this Ω. Thus (Ω, Σ, P) is not rich enough to support nontrivial (i.e.,
nonconstant) independent random variables. We show later that a probability
space can be enlarged to have more sets, so that one can always assume the
existence of enough independent families of events or random variables. We
now consider some of the profound consequences of this mathematical formal-
ization of the natural concept of mutual independence. It may be noted that
the latter is also termed statistical (stochastic or probabilistic) independence
to contrast it with other concepts such as linear independence and functional
independence. [The functions X₁, X₂ in the above illustration are linearly in-
dependent but not mutually (or statistically) independent! See also Problem
1.]

To understand the implications of equations (1), we consider different
forms (or consequences) of Definitions 1 and 2. First note that if {Aᵢ, i ∈
I} ⊂ Σ is a class of mutually independent events, then it is evident that
{σ(Aᵢ), i ∈ I} is an independent class. However, the same cannot be said
if the singleton Aᵢ is replaced by a bigger family 𝒢ᵢ = {Aᵢʲ, j ∈ Jᵢ} ⊂ Σ,
where each Jᵢ has at least two elements, i ∈ I, as simple examples show. Thus
{σ(𝒢ᵢ), i ∈ I} need not be independent. On the other hand, we can make the
following statements.

Theorem 3 (a) Let {𝒜, ℬᵢ, i ∈ I} be classes of events from (Ω, Σ, P)
such that they are all mutually independent in the sense of Definition 1. If
each ℬᵢ, i ∈ I, is a π-class, then for any subset J of I, the generated σ-algebra
σ(ℬᵢ, i ∈ J) and 𝒜 are independent of each other.

(b) Definition 2 with S = ℝ reduces to the statement that for each fi-
nite subset i₁, ..., iₙ of I and random variables X_{i₁}, ..., X_{iₙ}, the collection
of events {[X_{i₁} < x₁, ..., X_{iₙ} < xₙ], xⱼ ∈ ℝ, j = 1, ..., n, n ≥ 1} forms an
independent class.

Proof (a) Let ℬ = σ(ℬᵢ, i ∈ J), J ⊂ I. If A ∈ 𝒜 and Bⱼ ∈ ℬⱼ, j ∈ J, then
A, B_{j₁}, ..., B_{jₙ} are independent by hypothesis, i.e., (1) holds. We need to show that

    P(A ∩ B) = P(A)P(B),  B ∈ ℬ.                                (2)

If B is of the form B₁ ∩ ⋯ ∩ Bₙ, where Bᵢ ∈ ℬᵢ, i ∈ J, then (2) holds
by (1). Let 𝒟 be the collection of all sets B which are finite intersections of
sets each belonging to a ℬⱼ, j ∈ J. Since each ℬⱼ is a π-class, it follows that
𝒟 is also a π-class, and by the preceding observation, (2) holds for 𝒜 and
𝒟, so that they are independent. Also it is clear that ℬⱼ ⊂ 𝒟, j ∈ J. Thus
σ(ℬⱼ, j ∈ J) ⊂ σ(𝒟). We establish (2) for 𝒜 and σ(𝒟) to complete the proof
of this part, and it involves another idea often used in the subject in similar
arguments.

Define a class 𝒢 as follows:

    𝒢 = {B ∈ σ(𝒟) : P(A ∩ B) = P(A)P(B), A ∈ 𝒜}.               (3)

Evidently 𝒟 ⊂ 𝒢. Also Ω ∈ 𝒢, and if B₁, B₂ ∈ 𝒢 with B₁ ∩ B₂ = ∅, then

    P((B₁ ∪ B₂) ∩ A) = P(B₁ ∩ A) + P(B₂ ∩ A)  (since the Bᵢ ∩ A are disjoint)
                     = P(B₁)P(A) + P(B₂)P(A)  [by definition of (3)]
                     = P(B₁ ∪ B₂)P(A).

Hence B₁ ∪ B₂ ∈ 𝒢. Similarly if B₁ ⊃ B₂, Bᵢ ∈ 𝒢, then

    P((B₁ − B₂) ∩ A) = P(B₁ ∩ A) − P(B₂ ∩ A)  (since B₁ ∩ A ⊃ B₂ ∩ A)
                     = (P(B₁) − P(B₂))P(A) = P(B₁ − B₂)P(A).

Thus B₁ − B₂ ∈ 𝒢. Finally, if Bₙ ∈ 𝒢, Bₙ ⊂ Bₙ₊₁, we can show, from the
fact that P is σ-additive, that limₙ Bₙ = ∪_{n≥1} Bₙ ∈ 𝒢. Hence 𝒢 is a λ-class.
Since 𝒢 ⊃ 𝒟, by Proposition 1.2.8b, 𝒢 ⊃ σ(𝒟). But (3) implies 𝒢 and 𝒜
are independent. Thus 𝒜 and σ(𝒟) are independent also, as asserted. Note
that since J ⊂ I is an arbitrary subset, we need the full hypothesis that
{𝒜, ℬᵢ, i ∈ I} is a mutually independent collection, and not a mere two-by-
two independence.

(b) It is clear that Definition 2 implies the statement here. Conversely, let
ℬ₁ be the collection of sets {[X_{i₁} < x], x ∈ ℝ}, and

    ℬ₂ = {[X_{i₂} < x₂, ..., X_{iₙ} < xₙ], xⱼ ∈ ℝ, j = 2, ..., n}.

It is evident that ℬ₁ and ℬ₂ are π-classes. Indeed,

    [X_{i₁} < x] ∩ [X_{i₁} < x′] = [X_{i₁} < min(x, x′)],

and similarly for ℬ₂. Hence by (a), ℬ₁ and σ(ℬ₂) are independent. Since
ℬ₁ is a π-class, we also get, by (a) again, that σ(ℬ₁) and σ(ℬ₂) are inde-
pendent. But σ(ℬ₁) = X_{i₁}⁻¹(ℛ)[= σ(X_{i₁})], and σ(ℬ₂) = σ(∪_{j=2}^n X_{iⱼ}⁻¹(ℛ))[=
σ(X_{i₂}, ..., X_{iₙ})], where ℛ is the Borel σ-algebra of ℝ.

Hence if A₁ ∈ σ(X_{i₁}), Aⱼ ∈ X_{iⱼ}⁻¹(ℛ)(= σ(X_{iⱼ})) ⊂ σ(ℬ₂), then A₁ and
{A₂, ..., Aₙ} are independent. Thus

    P(A₁ ∩ ⋯ ∩ Aₙ) = P(A₁) · P(A₂ ∩ ⋯ ∩ Aₙ).                    (4)

Next consider X_{i₂} and (X_{i₃}, ..., X_{iₙ}). The above argument can be applied to
get

    P(A₂ ∩ ⋯ ∩ Aₙ) = P(A₂) · P(A₃ ∩ ⋯ ∩ Aₙ).

Continuing this finitely many times and substituting in (4), we get (1). Hence
Definition 2 holds. This completes the proof.

The above result says that we can obtain (1) for random variables if we
assume the apparently weaker condition in part (b) of the above theorem.
This is particularly useful in computations. Let us record some consequences.

Corollary 4 Let {ℬᵢ, i ∈ I} be an arbitrary collection of mutually inde-
pendent π-classes in (Ω, Σ, P), and Jᵢ ⊂ I, J₁ ∩ J₂ = ∅. If

    𝒢ᵢ = σ(ℬⱼ, j ∈ Jᵢ),  i = 1, 2,

then 𝒢₁ and 𝒢₂ are independent. The same is true if 𝒢ᵢ = π(ℬⱼ, j ∈ Jᵢ), i =
1, 2, are the generated π-classes.

If X, Y are independent random variables and f, g are any pair of real Borel
functions on ℝ, then f ∘ X, g ∘ Y are also independent random variables.
This is because (f ∘ X)⁻¹(ℛ) = X⁻¹(f⁻¹(ℛ)) ⊂ X⁻¹(ℛ), and similarly
(g ∘ Y)⁻¹(ℛ) ⊂ Y⁻¹(ℛ); and X⁻¹(ℛ), Y⁻¹(ℛ) are independent σ-subalgebras
of Σ. The same argument leads to the following:

Corollary 5 If X₁, ..., Xₙ are mutually independent random variables
on (Ω, Σ, P) and f : ℝᵏ → ℝ, g : ℝⁿ⁻ᵏ → ℝ are any Borel functions, then
the random variables f(X₁, ..., Xₖ), g(X_{k+1}, ..., Xₙ) are independent; and
σ(X₁, ..., Xₖ), σ(X_{k+1}, ..., Xₙ) are independent σ-algebras, for any k ≥ 1.

Another consequence relates to distribution functions and expectations
when the latter exist.
38 2 Independence and Strong Convergence

Corollary 6 If X₁, ..., Xₙ are independent random variables on (Ω, Σ, P),
then their joint distribution is the product of their individual distributions:

    F_{X₁,...,Xₙ}(x₁, ..., xₙ) = P[X₁ < x₁, ..., Xₙ < xₙ]
                              = Π_{i=1}^n P[Xᵢ < xᵢ] = Π_{i=1}^n F_{Xᵢ}(xᵢ).   (5)

If, moreover, each of the random variables is integrable, then their product
is integrable and we have

    E(Π_{i=1}^n Xᵢ) = Π_{i=1}^n E(Xᵢ).                          (6)

Proof By Theorem 3b, (1) and (5) is each equivalent to independence,
and so the image functions F_{X₁,...,Xₙ} and Π_{i=1}^n F_{Xᵢ} are identical. In Defi-
nition 2.2 the distribution function of a single random variable is given. The
same holds for a (finite) random vector, and F_{X₁,...,Xₙ} is termed a joint distri-
bution function of X₁, ..., Xₙ. The result on image measures (Theorem 1.4.1)
connects the integrals on the Ω-space with those on ℝⁿ, the range space of
(X₁, ..., Xₙ).

We now prove (6). Taking f(x) = |x|, f : ℝ → ℝ⁺ being a Borel function,
by Corollary 5, |X₁|, ..., |Xₙ| are also mutually independent. Then by (5) and
Tonelli's theorem,

    E(Π_{i=1}^n |Xᵢ|) = ∫_{ℝⁿ} Π_{i=1}^n |xᵢ| dF_{X₁,...,Xₙ}(x₁, ..., xₙ)
                                  [by Theorem 1.4.1i with G as the image law]
                     = Π_{i=1}^n ∫_ℝ |xᵢ| dF_{Xᵢ}(xᵢ)  (by Tonelli's theorem)
                     = Π_{i=1}^n E(|Xᵢ|)  [by Theorem 1.4.1i].   (7)

Since the right side is finite by hypothesis, so is the left side. Now that
Π_{i=1}^n |Xᵢ| is integrable we can use the same computation above for the Xᵢ
and F_{X₁,...,Xₙ}(= Π_{i=1}^n F_{Xᵢ}), and this time use Fubini's theorem in place of
Tonelli's. Then we get (6) in place of (7). This proves the result.

Note. It must be remembered that a direct application of Fubini's theorem
is not possible in the above argument, since the integrability of |Π_{i=1}^n Xᵢ| has
to be established first for this result (cf. Theorem 1.3.11). In this task we need
Tonelli's theorem for nonnegative random variables, and thus the proof cannot
be shortened. Alternatively, one can prove (6) first for simple random variables
with Theorem 3b, and then use the Lebesgue monotone (or dominated) con-
vergence theorem, essentially repeating part of the proof of Tonelli's theorem.
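For instance, if X, Y are independent with P[X = ±1] = 1/2 and P[Y = 0] = P[Y = 2] = 1/2, then (6) gives

    E(XY) = E(X)E(Y) = 0 · 1 = 0,

which is verified directly since XY takes the values −2, 0, 2 with probabilities 1/4, 1/2, 1/4.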

We shall now establish one of the most surprising consequences of the in-
dependence concept, the zero-one law. If X₁, X₂, ... is a sequence of random
variables, then ∩_{n=1}^∞ σ(Xᵢ, i ≥ n) is called the tail σ-algebra of {Xₙ, n ≥ 1}.

Theorem 7 (Kolmogorov's Zero-One Law) Any event belonging to the
tail σ-algebra of a sequence of independent random variables on (Ω, Σ, P) has
probability either zero or one.

Proof Denote by 𝒯 = ∩ₙ σ(X_k, k > n) the tail σ-algebra of the se-
quence. Then by Theorem 3a, σ(Xₙ) and σ(X_k, k ≥ n + 1) are independent
σ-algebras for each n ≥ 1. But 𝒯 ⊂ σ(X_k, k ≥ n + 1), so that σ(Xₙ) and
𝒯 are independent for each n. By Theorem 3a again, 𝒯 is independent of
σ(σ(Xₙ), n ≥ 1) = σ(Xₙ, n ≥ 1). However, 𝒯 ⊂ σ(Xₙ, n ≥ 1) also, so that 𝒯
is independent of itself! Hence A ∈ 𝒯 implies

    P(A) = P(A ∩ A) = P(A)P(A);

thus we must have P(A) = 0 or 1, completing the proof.

An immediate consequence is that any function measurable relative to 𝒯
of the theorem must be a constant with probability one. Thus lim supₙ Xₙ,
lim infₙ Xₙ (and limₙ Xₙ itself, if this exists) of independent random variables
are constants with probability one. Similarly if

    Aₙ = [ω ∈ Ω : Σ_{k≥n} X_k(ω) converges],

then Σ_{n=1}^∞ Xₙ(ω) converges iff ω ∈ Aₙ for each n, i.e., iff ω ∈ A = ∩ₙ Aₙ.
Since clearly Aₙ ∈ σ(X_k, k ≥ n), A ∈ 𝒯, so that P(A) = 0 or 1. Thus for
independent Xₙ the series Σ_{n=1}^∞ Xₙ converges with probability 0 or 1. The
following form of the above theorem is given in Tucker (1967).
Corollary 8 Let I be an arbitrary infinite index set, and {Xᵢ, i ∈ I} be
a family of independent random variables on (Ω, Σ, P). If ℱ is the directed
(by inclusion) set of all finite subsets of I, the (generalized) tail σ-algebra is
defined as

    𝒯₀ = ∩_{J∈ℱ} σ(Xᵢ, i ∈ I − J).

Then P takes only 0 and 1 values on 𝒯₀.

Proof The argument is similar to that of the theorem. Note that 𝒯₀ and
ℬ_J = σ(Xᵢ, i ∈ J) are independent for each J ∈ ℱ, as in the above proof.
So by Theorem 3a, 𝒯₀ and ℬ = σ(ℬ_J, J ∈ ℱ) are independent. But clearly
ℬ = σ(Xᵢ, i ∈ I), so that 𝒯₀ ⊂ ℬ. Hence the result follows as before.

Let us now show that independent random variables can be assumed to
exist on a probability space by a process of enlargement of the space by ad-
junction. The procedure is as follows: Let (Ω, Σ, P) be a probability space. If
this is not rich enough, let (Ωᵢ, Σᵢ, Pᵢ), i = 1, ..., n, be n copies of the given
space. Let (Ω̃, Σ̃, P̃) = (×_{i=1}^n Ωᵢ, ⊗_{i=1}^n Σᵢ, ⊗_{i=1}^n Pᵢ) be their Cartesian prod-
uct. If X₁, ..., Xₙ are random variables on (Ω, Σ, P), define a "new" set of
functions X̃₁, ..., X̃ₙ on (Ω̃, Σ̃, P̃) by the equations

    X̃ᵢ(ω̃) = Xᵢ(ωᵢ),  ω̃ = (ω₁, ..., ωₙ) ∈ Ω̃,  i = 1, ..., n.

Then for each a ∈ ℝ,

    [X̃ᵢ < a] = Ω₁ × ⋯ × Ωᵢ₋₁ × [Xᵢ < a] × Ωᵢ₊₁ × ⋯ × Ωₙ,

which is a measurable rectangle and hence is in Σ̃. Thus X̃ᵢ is a random
variable. Also, since Pᵢ = P, we deduce that

    P̃[X̃₁ < a₁, ..., X̃ₙ < aₙ] = Π_{i=1}^n P[Xᵢ < aᵢ],

by Fubini's theorem and the fact that Pᵢ(Ωᵢ) = 1. Consequently the X̃ᵢ are
independent (cf. Theorem 3b) and each X̃ᵢ has the same distribution as Xᵢ.
Thus by enlargement of (Ω, Σ, P) to (Ω̃, Σ̃, P̃), we have n independent ran-
dom variables. This procedure can be employed for the existence of any finite
collection of independent random variables without altering the probability
structure (see also Problem 5(a)). The results of Section 3.4 establishing the
Kolmogorov-Bochner theorem will show that this enlargement can be used
for any collection of random variables (countable or not). Consequently, we
can and do develop the theory without any question of the richness of the
underlying σ-algebra or of the existence of families of independent random
variables.
The following elementary but powerful results, known as the Borel-Cantelli
lemmas, are true even for the weaker pairwise independent events. Recall that
lim supₙ Aₙ = {ω : ω ∈ Aₙ for infinitely many n}. This set is abbreviated as
{Aₙ, i.o.} [= {Aₙ occurs infinitely often}].

Theorem 9 (i) (First Borel-Cantelli Lemma) Let {Aₙ, n ≥ 1} be a
sequence of events in (Ω, Σ, P) such that Σ_{n=1}^∞ P(Aₙ) < ∞. Then

    P(lim supₙ Aₙ) = P(Aₙ, i.o.) = 0.

(ii) (Second Borel-Cantelli Lemma) Let {Aₙ, n ≥ 1} be a sequence of
pairwise independent events in (Ω, Σ, P) such that Σ_{n=1}^∞ P(Aₙ) = ∞. Then
P(Aₙ, i.o.) = 1.

(iii) In particular, if {Aₙ, n ≥ 1} is a sequence of (pairwise or mutually) in-
dependent events, then P(Aₙ, i.o.) = 0 or 1 according to whether Σ_{n=1}^∞ P(Aₙ)
is < ∞ or = ∞.

Proof (i) This simple result is used more often than the other more in-
volved parts, since the events need not be (even pairwise) independent. By
definition, A = lim supₙ Aₙ = ∩_{n≥1} ∪_{k≥n} A_k ⊂ ∪_{k≥n} A_k for all n ≥ 1. Hence,
by the σ-subadditivity of P, we have

    P(A) ≤ Σ_{k≥n} P(A_k).

Letting n → ∞, and using the convergence of the series Σ_{k=1}^∞ P(A_k), the
result follows.

(ii) (After Chung, 1974) Let {Aₙ, n ≥ 1} be pairwise independent. By
Problem 1 of Chapter 1, we have

    A = [Aₙ, i.o.]  iff  χ_A = lim supₙ χ_{Aₙ}.

Hence

    P(A) = 1  iff  P[lim supₙ χ_{Aₙ} = 1] = 1  iff  P[Σ_{n=1}^∞ χ_{Aₙ} = ∞] = 1.   (10)

Now we use the hypothesis that the series of probabilities diverges:

    E(Sₙ) = Σ_{k=1}^n P(A_k) ↗ ∞,                               (11)

where Sₙ = Σ_{k=1}^n χ_{A_k}, and the monotonicity of Sₙ is used above. With (11)
and the pairwise independence of the Aₙ, we shall show that

    P([limₙ→∞ Sₙ = ∞]) = 1,

which in view of (10) proves the assertion.


Now given N > 0, we have by ceby~ev'sinequality, with E = N2/-,

Equivalently,

To simplify this we need t o evaluate Var S,. Let p, = P(A,). Then

If In= )cA,, - p,, then the 1, are orthogonal random variables. In fact, using
the inner product notation,

= P(A, n A,) - p,p, = 0, if n # rn (by pairwise independence)

Thus

Since by (11) E(S,) 7oo, (15) yields Z/V~TS~/E(S~)


5 (E(s,))-'/~ + 0.
Thus given N > 1, and 0 < a1 = a / N for 0 < a! < 1, there exists no =
n o ( n , N ) such that n > no +- Jv~~s,/E(s,) <
a1 < 1. Since n l = n / N ,
we get
N 5 aE(S,), n no. >
Consequently (12) implies, with 1 > β = 1 − α > 0 and the monotonicity of
Sₙ (i.e., Sₙ ↑),

    P[βE(Sₙ) < Sₙ] ≥ 1 − 1/N²,  n ≥ n₀.                         (17)

Let n → ∞, and then N → ∞ (so that β → 1); (17) gives P[limₙ→∞ Sₙ =
∞] = 1. This establishes the result because of (10).

(iii) This is an immediate consequence of (ii), and again gives a zero-
one phenomenon! However, in the case of mutual independence, the proof is
simpler than that of (ii), and we give the easy argument here, for variety. Let
Aₙ^c = Eₙ. Then the Eₙ, n ≥ 1, are independent, since {σ(Aₙ), n ≥ 1} forms an
independent class. Let P(Aₙ) = aₙ. To show that

    P(lim supₙ Aₙ) = 1,

it suffices to verify that, for each n ≥ 1, P(∪_{k>n} A_k) = 1, or equivalently

    P(∩_{k>n} E_k) = 0.

Now for any n ≥ 1,

    P(∩_{k>n} E_k) = lim_{m→∞} P(∩_{k=n+1}^m E_k)
                  = lim_{m→∞} Π_{k=n+1}^m (1 − a_k)  (by independence of the E_k)
                  ≤ lim_{m→∞} exp(−Σ_{k=n+1}^m a_k) = 0
                    (since Σ_{k=1}^∞ a_k = ∞ by hypothesis).

This completes the proof of the theorem.
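For instance, if {Aₙ, n ≥ 1} are independent with P(Aₙ) = 1/n², then Σₙ P(Aₙ) < ∞ and almost every ω lies in only finitely many Aₙ; if instead P(Aₙ) = 1/n, the series diverges and

    P(Aₙ, i.o.) = 1,

even though P(Aₙ) → 0.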

Note 10 The estimates in the proof of (ii) yield a stronger statement
than we have asserted. One can actually show that

    P[limₙ→∞ Sₙ/E(Sₙ) = 1] = 1.

In fact, (12) implies for each n and N,

    P[Sₙ/E(Sₙ) ≤ 1 + N√(Var Sₙ)/E(Sₙ)] ≥ 1 − 1/N².              (18)

Since Var Sₙ/(E(Sₙ))² → 0 as n → ∞, for each fixed N, this gives

    P[lim supₙ Sₙ/E(Sₙ) ≤ 1] ≥ 1 − 1/N²,

and letting N → ∞, we get

    lim supₙ Sₙ/E(Sₙ) ≤ 1 a.e.

On the other hand by (17), P[β < Sₙ/E(Sₙ)] ≥ 1 − 1/N², n ≥ n₀. Hence for
each fixed N, this yields

    P[β ≤ lim infₙ Sₙ/E(Sₙ)] ≥ 1 − 1/N².                        (19)

Now let N → ∞ and note that β → 1; then by the monotonicity of the events
in brackets of (19) we get 1 ≤ lim infₙ [Sₙ/E(Sₙ)] a.e. These two statements
imply the assertion.

Before leaving this section, we present, under a stronger hypothesis than
that of Theorem 7, a zero-one law due to Hewitt and Savage (1955), which is
useful in applications. We include a short proof as in Feller (1966).

Definition 11 If X₁, ..., Xₙ are random variables on (Ω, Σ, P), then they
are symmetric (or symmetrically dependent) if for each permutation i₁, ..., iₙ
of (1, 2, ..., n), the vectors (X_{i₁}, ..., X_{iₙ}) and (X₁, ..., Xₙ) have the same
joint distribution. A sequence {Xₙ, n ≥ 1} is symmetric if {X_k, 1 ≤ k ≤ n} is
symmetric for each n ≥ 1.

We want to consider some functions of X = {Xₙ, n ≥ 1}. Now X : Ω →
ℝ^∞ = ×_{i=1}^∞ ℝᵢ, where ℝᵢ = ℝ, is an infinite vector. If ℬ^∞ = ⊗_{i=1}^∞ ℬᵢ is the
(usual) product σ-algebra, then

    Σ₀ = X⁻¹(ℬ^∞) ⊂ Σ.

Let g : Ω → ℝ be Σ₀-measurable. Then by Proposition 1.2.3 there is a Borel
function h : ℝ^∞ → ℝ (i.e., h is ℬ^∞-measurable) such that g = h ∘ X =
h(X₁, X₂, ...). Thus if {Xₙ, n ≥ 1} is a symmetric sequence, then each Σ₀-
measurable g is symmetric¹ for each finite permutation.

¹ In detail, this means if g : ℝ^∞ → ℝ, then g(X₁, ..., Xₙ, X_{n+1}, ...) =
g(X_{i₁}, ..., X_{iₙ}, X_{n+1}, ...) for each permutation (i₁, ..., iₙ) of (1, 2, ..., n), each
n ≥ 1.

Let A ∈ Σ₀. Then A is a symmetric event if χ_A
is a symmetric function in the above sense. The following result is true:

Theorem 12 (Hewitt-Savage Zero-One Law) If X₁, X₂, ... are indepen-
dent with a common distribution, then every symmetric set in

    Σ₀ = X⁻¹(ℬ^∞)

has probability zero or one.

Proof Recall that if ρ : Σ₀ × Σ₀ → ℝ⁺ is defined by ρ(A, B) = P(AΔB),
with Δ as the symmetric difference, then (Σ₀, ρ) is a (semi)metric space on which
the operations ∪, ∩, and Δ are continuous. Also, ∪_{n=1}^∞ σ(X₁, ..., Xₙ) ⊂ Σ₀
is a dense subspace in this metric.

Hence if A ∈ Σ₀, there exist Aₙ ∈ σ(X₁, ..., Xₙ) such that ρ(A, Aₙ) → 0,
and by the definition of σ(X₁, ..., Xₙ) there is a Borel set Bₙ ⊂ ℝⁿ such that
Aₙ = [(X₁, ..., Xₙ) ∈ Bₙ]. Since

    X̃ = (X_{i₁}, ..., X_{iₙ}, X_{n+1}, ...) and X = (X₁, ..., Xₙ, X_{n+1}, ...)

have the same (finite dimensional) distributions, because the Xᵢ are identi-
cally distributed, we have for any B ∈ ℬ^∞, P(X̃ ∈ B) = P(X ∈ B). In
particular, if the permutation is such that Ãₙ = [(X_{2n}, X_{2n−1}, ..., X_{n+1}) ∈
Bₙ], then Aₙ and Ãₙ are independent and ρ(A, Ãₙ) → 0 as n → ∞ again.
Indeed, let T be the 1-1 measurable permutation mapping with TAₙ = Ãₙ and
TA = A, since A is symmetric. So

    ρ(A, Ãₙ) = P(AΔÃₙ) = P(T(AΔAₙ)) = P(AΔAₙ) = ρ(A, Aₙ) → 0.

Hence also Aₙ ∩ Ãₙ → A ∩ A = A, by the continuity of ∩ in the ρ-metric. But

    P(Aₙ ∩ Ãₙ) = P(Aₙ)P(Ãₙ)

by independence.

Letting n → ∞, and noting that the metric function is also continuous in
the resulting topology, it follows that Aₙ → A in ρ ⇒ P(Aₙ) → P(A). Hence

    limₙ→∞ P(Aₙ ∩ Ãₙ) = P(A ∩ A) = limₙ→∞ P(Aₙ) · P(Ãₙ) = P(A)².

Thus P(A) = P(A)², so that P(A) = 0 or 1, as asserted.

Remarks (1) It is not difficult to verify that if $S_n = \sum_{k=1}^n X_k$, with the $X_k$ as in the theorem, then for any Borel set $B$, the event $[S_n \in B \text{ i.o.}]$ is not necessarily a tail event but is a symmetric one. Thus it is covered by the above theorem, but not by the Kolmogorov zero-one law.

(2) Note 10, as well as part (ii) of Theorem 9, indicate how several weakenings of the independence condition can be formulated. A number of different extensions of Borel-Cantelli lemmas have appeared in the literature, and they are useful for special problems. The point here is that the concept of independence, as given in Definitions 1 and 2, leads to some very striking results, which then motivate the introduction of different types of dependences for a sustained study. In this chapter we present only the basic results founded on the independence hypothesis; later on we discuss how some natural extensions suggest themselves.

2.2 Convergence Concepts, Series, and Inequalities


There are four convergence concepts often used in probability theory. They are pointwise a.e., in mean, in probability, and in distribution. Some of these have already appeared in Chapter 1. We state them again and give some interrelations here. It turns out that for sums of independent (integrable) random variables, these are all equivalent, but this is a relatively deep result. A partial solution is given in Problem 16. Several inequalities are needed for the proof of the general case. We start with the basic Kolmogorov inequality and a few of its variants. As consequences, some important "strong limit laws" will be established. Applications are given in Section 2.4.

Definition 1 Let $\{X, X_n, n \ge 1\}$ be a family of random variables on a probability space $(\Omega, \Sigma, P)$.

(a) $X_n \to X$ pointwise a.e. if there is a set $N \in \Sigma$, $P(N) = 0$, and $X_n(\omega) \to X(\omega)$, as $n \to \infty$, for each $\omega \in \Omega - N$.

(b) The sequence is said to converge to $X$ in probability if for each $\varepsilon > 0$, we have $\lim_{n\to\infty} P[\,|X_n - X| > \varepsilon\,] = 0$, symbolically written as $X_n \stackrel{P}{\to} X$ (or as $p\text{-}\lim_n X_n = X$).

(c) The sequence is said to converge in distribution to $X$, often written $X_n \stackrel{D}{\to} X$, if $F_{X_n}(x) \to F_X(x)$ at all points $x \in \mathbb{R}$ for which $x$ is a continuity point of $F_X$, where $F_{X_n}$, $F_X$ are the distribution functions of $X_n$ and $X$ (cf. Definition 2.2).

(d) Finally, if $\{X, X_n, n \ge 1\}$ have $p$-moments, $0 < p < \infty$, then the sequence is said to tend to $X$ in $p$th order mean, written $X_n \stackrel{L^p}{\to} X$, if $E(|X_n - X|^p) \to 0$. If $p = 1$, we simply say that $X_n \to X$ in mean.

The first two as well as the last convergences have already appeared, and these are defined and profitably employed in general analysis on arbitrary measure spaces. However, on finite measure spaces there are some additional relations which are of particular interest in our study. The third concept, on the other hand, is somewhat special to probability theory, since distribution functions are image probability measures on $\mathbb{R}$. This plays a pivotal role in probability theory, and so we study the concept in greater detail.

Some amplification of the conditions for "in distribution" is in order. If $X = a$ a.e., then $F_X(x) = 0$ for $x < a$, $= 1$ for $x > a$. Thus we are asking that for $X_n \stackrel{D}{\to} a$, $F_{X_n}(x) \to F_a(x)$ for $x < a$ and for $x > a$, but not at $x = a$, the discontinuity point of $F_X$. Why? The restriction on the set is that it should be only a "continuity set" for the limit function $F_X$. This condition is arrived at after noting that the "natural-looking" conditions proved themselves useless. For instance, if $X_n = a_n$ a.e., and $a_n \to a$ as numbers, then $F_{X_n}(x) \to F_X(x)$ for all $x \in \mathbb{R} - \{a\}$, but $F_{X_n}(a) \not\to F_X(a)$, since $\{F_{X_n}(a), n \ge 1\}$ is an oscillating sequence if there are infinitely many $a_n$ on both sides of $a$. Similarly, if $\{X_n = a_n, n \ge 1\}$ diverges, it is possible that $\{F_{X_n}(x), n \ge 1\}$ may converge for each $x \in \mathbb{R}$ to a function taking values in the open interval $(0,1)$. Other unwanted exclusions may appear. Thus the stipulated condition is weak enough to ignore such uninteresting behavior. But it is not too weak, since we do want the convergence on a suitable dense set of $\mathbb{R}$. (Note that the set of discontinuity points of a monotone function is at most countable, so that the continuity set of $F_X$ is $\mathbb{R} - \{$that countable set$\}$.) Actually, the condition comes from the so-called simple convergence on $C_{00}(\mathbb{R})$, the space of continuous functions with compact supports, which translates to the condition we gave for the distribution functions on $\mathbb{R}$ according to a theorem in abstract analysis. For this reason N. Bourbaki actually calls it the vague convergence, and others call it the weak-star convergence. We shall use the terminology introduced in the definition, and the later work shows how these last two terms can also be justifiably used.
The first three convergences are related as follows:

Proposition 2 Let $X_n$ and $X$ be random variables on $(\Omega, \Sigma, P)$. Then $X_n \to X$ a.e. $\Rightarrow X_n \stackrel{P}{\to} X \Rightarrow X_n \stackrel{D}{\to} X$. If, moreover, $X = a$ a.e., where $a \in \mathbb{R}$, then $X_n \stackrel{D}{\to} X \Rightarrow X_n \stackrel{P}{\to} X$ also. In general these implications are not reversible. (Here, as usual, the limits are taken as $n \to \infty$.)

Proof The first implication is a standard result for any finite measure. In fact, if $X_n \to X$ a.e., then there is a set $N \in \Sigma$, $P(N) = 0$, and on $\Omega - N$, $X_n(\omega) \to X(\omega)$. Thus $\limsup_n |X_n(\omega) - X(\omega)| = 0$, $\omega \in \Omega - N$, and for each $\varepsilon > 0$,

$$[\,|X_n - X| > \varepsilon \text{ i.o.}\,] = \bigcap_{m \ge 1} \bigcup_{n \ge m} [\,|X_n - X| > \varepsilon\,] \subset N.$$

Hence the set on the left has measure zero. Since $P$ is a finite measure, this implies

$$\limsup_n P[\,|X_n - X| > \varepsilon\,] \le \lim_m P\Big(\bigcup_{n \ge m} [\,|X_n - X| > \varepsilon\,]\Big) \le P(N) = 0. \tag{1}$$

Consequently,

$$\lim_n P[\,|X_n - X| > \varepsilon\,] = 0. \tag{2}$$

Thus $X_n \stackrel{P}{\to} X$, and the first assertion is proved.
For the next implication, let $F_X$, $F_{X_n}$ be the distribution functions of $X$ and $X_n$, and let $a, b$ be continuity points of $F_X$ with $a < b$. Then

$$[X < a] = [X < a, X_n < b] \cup [X < a, X_n \ge b] \subset [X_n < b] \cup [X < a, X_n \ge b],$$

so that computing probabilities of these sets gives

$$F_X(a) \le F_{X_n}(b) + P[X < a, X_n \ge b]. \tag{3}$$

Also, since $X_n \stackrel{P}{\to} X$, with $\varepsilon = b - a > 0$, one has from the inclusion

$$[X < a, X_n \ge b] \subset [\,|X_n - X| \ge b - a\,],$$

$$\lim_n P[X < a, X_n \ge b] = 0. \tag{4}$$

Thus (3) becomes

$$F_X(a) \le \liminf_n F_{X_n}(b). \tag{5}$$

Next, by an identical computation, but with $c, d$ ($c < d$) in place of $a, b$ and $X_n, X$ in place of $X, X_n$ in (3), one gets

$$F_{X_n}(c) \le F_X(d) + P[X_n < c, X \ge d]. \tag{6}$$

The last term tends to zero as $n \to \infty$, as in (4). Consequently (6) becomes

$$\limsup_n F_{X_n}(c) \le F_X(d). \tag{7}$$

From (5) and (7) we get for $a < b \le c < d$,

$$F_X(a) \le \liminf_n F_{X_n}(b) \le \limsup_n F_{X_n}(b) \le \limsup_n F_{X_n}(c) \le F_X(d). \tag{8}$$

Letting $a \uparrow b = c$ and $d \downarrow c$, where $b = c$ is a continuity point of $F_X$, (8) gives $\lim_n F_{X_n}(b) = F_X(b)$, so that $X_n \stackrel{D}{\to} X$, since such points of continuity of $F_X$ are everywhere dense in $\mathbb{R}$.
If now $X = a$ a.e., then for each $\varepsilon > 0$,

$$P[\,|X_n - a| > \varepsilon\,] \le 1 - F_{X_n}(a + \varepsilon) + F_{X_n}(a - \varepsilon) \to 0 \quad \text{as } n \to \infty,$$

since $X_n \stackrel{D}{\to} a$ and $a \pm \varepsilon$ are points of continuity of $F_X$ for each $\varepsilon > 0$. Thus $X_n \stackrel{P}{\to} a$. This completes the proof except for the last comment, which is illustrated by the following simple pair of standard counterexamples.

Let $X_n$, $X$ be defined on $(\Omega, \Sigma, P)$ as two-valued random variables such that $P([X_n = a]) = \frac{1}{2} = P([X_n = b])$, $a < b$, for all $n$. Next let $P([X = b]) = \frac{1}{2} = P([X = a])$. Then for each $n$ and $\omega \in \Omega$ for which $X_n(\omega) = a$ (or $b$), we set $X(\omega) = b$ (or $a$), respectively. Thus $\{\omega : |X_n - X|(\omega) \ge \varepsilon\} = \Omega$ if $0 < \varepsilon < b - a$, and $X_n \not\to X$ in probability. But $F_{X_n} = F_X$, so that $X_n \stackrel{D}{\to} X$ trivially. This shows that the last implication cannot be reversed in general. Next, consider the first one. Let $\Omega = [0, 1]$, $\Sigma = $ Borel $\sigma$-algebra of $\Omega$, and $P = $ Lebesgue measure. For each $n \ge 1$, express $n$ in a binary expansion, $n = 2^r + k$, $0 \le k < 2^r$, $r \ge 0$. Define $f_n = \chi_{A_n}$, where $A_n = [k/2^r, (k+1)/2^r]$. It is clear that $f_n$ is measurable, and for $0 < \varepsilon < 1$,

$$P[\,|f_n| > \varepsilon\,] = P(A_n) = \frac{1}{2^r} \to 0 \quad \text{as } n \to \infty.$$

But $f_n(\omega) \not\to 0$ for any $\omega \in \Omega$. This establishes all assertions. (If we are allowed to change probability spaces, keeping the same image measures of the random variables, these problems become less significant. Cf. Problem 5(b).)
In spite of the last part, we shall be able to prove the equivalence for a subclass of random variables, namely, if the $X_n$ form a sequence of partial sums of independent random variables. For this result we need to develop probability theory much further, and thus it is postponed until Chapter 4. (For a partial result, see Problem 16.) Here we proceed with the implications that do not refer to "convergence in distribution."

The following result is of interest in many calculations.

Proposition 3 (F. Riesz). Let $\{X, X_n, n \ge 1\}$ be random variables on $(\Omega, \Sigma, P)$ such that $X_n \stackrel{P}{\to} X$. Then there exists a subsequence $\{X_{n_k}, k \ge 1\}$ with $X_{n_k} \to X$ a.e. as $k \to \infty$.

Proof Since for each $\varepsilon > 0$, $P[\,|X_n - X| > \varepsilon\,] \to 0$, let $n_1$ be chosen such that $n \ge n_1 \Rightarrow P[\,|X_n - X| \ge 1\,] < \frac{1}{2}$, and if $n_1 < n_2 < \cdots < n_k$ are selected, let $n_{k+1} > n_k$ be chosen such that

$$P\left[\,|X_n - X| \ge \frac{1}{2^k}\,\right] < \frac{1}{2^{k+1}}, \quad n \ge n_{k+1}.$$

If $A_k = [\,|X_{n_k} - X| \ge 1/2^{k-1}\,]$, $B_k = \bigcup_{r \ge k} A_r$, then for $\omega \in B_k^c$, $|X_{n_r} - X|(\omega) < 1/2^{r-1}$ for all $r \ge k$. Hence if $B = \lim_n B_n = \bigcap_{k \ge 1} \bigcup_{r \ge k} A_r$, then for $\omega \in B^c$, $X_{n_r}(\omega) \to X(\omega)$ as $r \to \infty$. But we also have $B \subset B_n$ for all $n$, so that

$$P(B) \le P(B_n) \le \sum_{r \ge n} P(A_r) \le \sum_{r \ge n} \frac{1}{2^r} \to 0 \quad \text{as } n \to \infty.$$

Thus $\{X_{n_r}, r \ge 1\}$ is the desired subsequence, completing the proof.
Remark We have not used the finiteness of $P$ in the above proof, and the result holds on nonfinite measure spaces as well. (Also there can be infinitely many such a.e. convergent subsequences.) But the next result is strictly for (finite or) probability measures only.

Recall that a sequence $\{X_n, n \ge 1\}$ on $(\Omega, \Sigma, P)$ converges $P$-uniformly to $X$ if for each $\varepsilon > 0$, there is a set $A_\varepsilon \in \Sigma$ such that $P(A_\varepsilon) < \varepsilon$ and on $\Omega - A_\varepsilon$, $X_n \to X$ uniformly. We then have

Theorem 4 (Egorov). Let $\{X, X_n, n \ge 1\}$ be a sequence of random variables on $(\Omega, \Sigma, P)$. Then $X_n \to X$ a.e. iff the sequence converges to $X$ $P$-uniformly.

Proof One direction is simple. In fact, if $X_n \to X$ $P$-uniformly, then for $\varepsilon = 1/n_0$ there is an $A_{n_0} \in \Sigma$ with $P(A_{n_0}) < 1/n_0$ and $X_n(\omega) \to X(\omega)$ uniformly on $\Omega - A_{n_0}$. If $A = \bigcap_{n_0 \ge 1} A_{n_0}$, then $P(A) = 0$, and if $\omega \in \Omega - A$, then $X_n(\omega) \to X(\omega)$, i.e., the sequence converges a.e. The other direction is nontrivial.

Thus let $X_n \to X$ a.e. Then there is an $N \in \Sigma$, $P(N) = 0$, and $X_n(\omega) \to X(\omega)$ for each $\omega \in \Omega - N$. If $k \ge 1$, $m \ge 1$ are integers and we define

$$A_{k,m} = \left\{\omega \in \Omega - N : |X_n(\omega) - X(\omega)| < \frac{1}{m} \text{ for all } n \ge k\right\},$$

then the facts that $X_n \to X$ on $\Omega - N$ and $A_{k,m} \subset A_{k+1,m}$ imply that $\Omega - N = \bigcup_{k=1}^\infty A_{k,m}$ for all $m \ge 1$. Consequently for each $\varepsilon > 0$, and each $m \ge 1$, we can find a large enough $k_0 = k_0(\varepsilon, m)$ such that $A_{k_0,m}$ has large measure, i.e., $P(\Omega - A_{k_0,m}) < \varepsilon/2^m$. If $A_\varepsilon = \bigcup_{m=1}^\infty (\Omega - A_{k_0(\varepsilon,m),m})$, then

$$P(A_\varepsilon) \le \sum_{m=1}^\infty P(\Omega - A_{k_0(\varepsilon,m),m}) < \sum_{m=1}^\infty \frac{\varepsilon}{2^m} = \varepsilon.$$

On the other hand, $n \ge k_0(\varepsilon, m) \Rightarrow |X_n(\omega) - X(\omega)| < 1/m$ for $\omega \in A_{k_0,m}$. Thus

$$\sup_{\omega \in A_\varepsilon^c} |X_n(\omega) - X(\omega)| \le \frac{1}{m}, \quad n \ge k_0(\varepsilon, m),$$

for every $m \ge 1$, so that $X_n \to X$ uniformly on $A_\varepsilon^c$. This completes the proof.

Also, the following is a simple consequence of Markov's inequality.

Remark Let $\{X, X_n, n \ge 1\} \subset L^p(\Omega, \Sigma, P)$ be such that $X_n \stackrel{L^p}{\to} X$, $p > 0$. Then $X_n \stackrel{P}{\to} X$.

Proof Given $\varepsilon > 0$, we have

$$P[\,|X_n - X| > \varepsilon\,] \le \frac{E(|X_n - X|^p)}{\varepsilon^p} \to 0 \quad \text{as } n \to \infty,$$

by the $p$th mean convergence hypothesis. Note that there is generally no relation between mean convergence and pointwise a.e., since for the latter the random variables need not be in any $L^p$, $p > 0$.
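This remark is easy to check numerically. The sketch below is ours and assumes only NumPy (the choice $X_n = X + n^{-1/2}\cdot$noise is purely illustrative): the simulated probability is dominated by the Markov bound $E(|X_n - X|^p)/\varepsilon^p$.

```python
# Monte Carlo check that P[|X_n - X| > eps] <= E(|X_n - X|^p) / eps^p  (p = 2).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)
eps, p = 0.1, 2
for n in (10, 100, 1000):
    Xn = X + rng.standard_normal(X.size) / np.sqrt(n)    # X_n -> X in L^p
    prob = np.mean(np.abs(Xn - X) > eps)                 # P[|X_n - X| > eps]
    bound = np.mean(np.abs(Xn - X) ** p) / eps ** p      # Markov bound
    print(f"n={n:5d}  P = {prob:.4f}  <=  bound = {bound:.4f}")
```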
We now specialize the convergence theory to the case that the sequences are partial sums of independent random variables, and present important consequences. Some further, less sharp, assertions in the general case are possible. Some of these are included as problems at the end of the chapter.

At the root of the pointwise convergence theory, there is usually a "maximal inequality" for a set of random variables. Here is a generalized version of Čebyšev's inequality. The latter was proved for only one r.v. We thus start with the fundamental result:

Theorem 5 (Kolmogorov's Inequality). Let $X_1, X_2, \ldots$ be a sequence of independent random variables on $(\Omega, \Sigma, P)$ with means $\mu_k = E(X_k)$ and variances $\sigma_k^2 = \mathrm{Var}\,X_k$. If $S_n = \sum_{k=1}^n X_k$ and $\varepsilon > 0$, then

$$P\left[\max_{1 \le k \le n} |S_k - E(S_k)| \ge \varepsilon\right] \le \frac{1}{\varepsilon^2} \sum_{k=1}^n \sigma_k^2. \tag{10}$$

Proof If $n = 1$, then (10) is Čebyšev's inequality, but the present result is deeper than the former. The proof shows how the result may be generalized to certain nonindependent cases, particularly to martingale sequences, to be studied in the next chapter.

Let $A = \{\omega : \max_{1 \le k \le n} |S_k(\omega) - E(S_k)| \ge \varepsilon\}$. We express $A$ as a disjoint union of $n$ events; such a decomposition appears in our subject on several occasions. It became one of the standard tools. [It is often called a process of disjunctification of a compound event such as $A$.] Thus let

$$A_1 = \{\omega : |S_1(\omega) - E(S_1)| \ge \varepsilon\},$$

and for $1 < k \le n$,

$$A_k = \{\omega : |S_j(\omega) - E(S_j)| < \varepsilon, \ j = 1, \ldots, k-1, \ |S_k(\omega) - E(S_k)| \ge \varepsilon\}.$$

In words, $A_k$ is the set of $\omega$ such that $|S_k(\omega) - E(S_k)|$ exceeds $\varepsilon$ for the first time. It is clear that the $A_k$ are disjoint, $A_k \in \Sigma$, and $A = \bigcup_{k=1}^n A_k$. Let $Y_i = X_i - \mu_i$ and $\bar S_n = \sum_{k=1}^n Y_k$, so that $E(\bar S_n) = 0$, $\mathrm{Var}\,\bar S_n = \mathrm{Var}\,S_n$. Now consider

$$\int_{A_k} \bar S_n^2 \, dP = \int_{A_k} (\bar S_k + Y_{k+1} + \cdots + Y_n)^2 \, dP, \quad \text{since } \bar S_n = \bar S_k + \sum_{i=k+1}^n Y_i,$$

$$= \int_{A_k} \bar S_k^2 \, dP + 2 \int_{A_k} \bar S_k (Y_{k+1} + \cdots + Y_n) \, dP + \int_{A_k} (Y_{k+1} + \cdots + Y_n)^2 \, dP \tag{11}$$

$$\ge \int_{A_k} \bar S_k^2 \, dP \quad (\text{since } \chi_{A_k} \bar S_k \text{ and } Y_i, \ i \ge k+1, \text{ are independent, the middle term vanishes})$$

$$\ge \varepsilon^2 P(A_k).$$

Adding on $1 \le k \le n$, we get

$$\mathrm{Var}\,\bar S_n \ge \int_A \bar S_n^2 \, dP = \sum_{k=1}^n \int_{A_k} \bar S_n^2 \, dP \ge \varepsilon^2 \sum_{k=1}^n P(A_k) = \varepsilon^2 P(A).$$

Since $\mathrm{Var}\,\bar S_n = \sum_{i=1}^n \mathrm{Var}\,X_i$, by independence of the $X_i$, this gives (10), and completes the proof.

Remark The only place in the above proof where we use the independence hypothesis is to go from (11) to the next line to conclude that

$$\int_{A_k} \bar S_k (Y_{k+1} + \cdots + Y_n) \, dP = 0.$$

Any other hypothesis that guarantees the nonnegativity of this term gives the corresponding maximal inequality. There are several classes of nonindependent random variables, including (positive sub-) $L^2$-martingale sequences, giving such a result. This will be seen in the next chapter.
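A Monte Carlo check of (10) may also be instructive. The sketch below is ours and assumes only NumPy, with centered uniform summands chosen purely for illustration.

```python
# Monte Carlo check of Kolmogorov's inequality (10):
# P[ max_{1<=k<=n} |S_k - E(S_k)| >= eps ] <= (sigma_1^2+...+sigma_n^2)/eps^2.
import numpy as np

rng = np.random.default_rng(1)
n, trials, eps = 50, 100_000, 5.0
X = rng.uniform(-1.0, 1.0, size=(trials, n))    # means 0, Var X_k = 1/3
S = np.cumsum(X, axis=1)                        # each row: S_1, ..., S_n
left = np.mean(np.max(np.abs(S), axis=1) >= eps)
right = n * (1.0 / 3.0) / eps**2
print(f"P[max_k |S_k| >= {eps}] ~ {left:.4f}  <=  {right:.4f}")
```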

All the strong convergence theorems that follow in this section are due to Kolmogorov.

Theorem 6 Let $X_1, X_2, \ldots$ be a sequence of independent random variables on $(\Omega, \Sigma, P)$ with means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. Let $S_n = \sum_{k=1}^n X_k$ and $\sigma^2 = \sum_{n=1}^\infty \sigma_n^2$. Suppose that $\sigma^2 < \infty$ and $\sum_{k=1}^\infty \mu_k$ converges. Then $\sum_{k=1}^\infty X_k$ converges a.e. and in the mean of order 2 to an r.v. $X$. Moreover, $E(X) = \sum_{k=1}^\infty \mu_k$, $\mathrm{Var}\,X = \sigma^2$, and for any $\varepsilon > 0$,

$$P\left[\sup_{n \ge 1} |S_n - E(S_n)| \ge \varepsilon\right] \le \frac{\sigma^2}{\varepsilon^2}. \tag{12}$$

Proof It should be shown that $\lim_n \bar S_n$ exists a.e., where $\bar S_n = \sum_{k=1}^n (X_k - \mu_k)$. If this is proved, since $\sum_{k=1}^\infty \mu_k$ converges, we get

$$\lim_n \sum_{k=1}^n X_k = \lim_{n\to\infty} \bar S_n + \lim_{n\to\infty} \sum_{k=1}^n \mu_k = X \quad \text{exists a.e.}$$

But the sequence $\{\bar S_n(\omega), n \ge 1\}$ of scalars converges iff it satisfies the Cauchy criterion, i.e., iff $\inf_m \sup_k |\bar S_{m+k}(\omega) - \bar S_m(\omega)| = 0$ a.e. Thus let $\varepsilon > 0$ be given, and by Theorem 5,

$$P\left[\max_{1 \le j \le k} |\bar S_{m+j} - \bar S_m| \ge \varepsilon\right] \le \frac{1}{\varepsilon^2} \sum_{j=m+1}^{m+k} \sigma_j^2. \tag{13}$$

Hence letting $k \to \infty$ in (13) and noting that the events

$$\left[\max_{1 \le j \le k} |\bar S_{m+j} - \bar S_m| \ge \varepsilon\right]$$

form an increasing sequence, we get

$$P\left[\sup_{j \ge 1} |\bar S_{m+j} - \bar S_m| \ge \varepsilon\right] \le \frac{1}{\varepsilon^2} \sum_{j=m+1}^\infty \sigma_j^2. \tag{14}$$

It follows that

$$P\left[\sup_{n > m} |\bar S_n - \bar S_m| \ge \varepsilon\right] \le \frac{\sigma^2}{\varepsilon^2}. \tag{15}$$

Letting $\varepsilon \nearrow \infty$, since $\sigma^2 < \infty$, the right side of (15) goes to zero, so that $\limsup_{n>m} |\bar S_n - \bar S_m| < \infty$ a.e. But $|\bar S_n| \le |\bar S_n - \bar S_m| + |\bar S_m|$, so

$$\limsup_n |\bar S_n| \le |\bar S_m| + \limsup_{n>m} |\bar S_n - \bar S_m| \le |\bar S_m| + \sup_{n>m} |\bar S_n - \bar S_m| < \infty \quad \text{a.e.}$$

Thus $\limsup_n \bar S_n$, $\liminf_n \bar S_n$ must be finite a.e. Also

$$P\left[\limsup_n \bar S_n - \liminf_n \bar S_n \ge 2\varepsilon\right] \le P\left[\sup_{n>m} |\bar S_n - \bar S_m| \ge \varepsilon\right] \le \frac{1}{\varepsilon^2} \sum_{j=m+1}^\infty \sigma_j^2 \to 0 \tag{16}$$

as $m \to \infty$ for each $\varepsilon > 0$. It follows that $\limsup_n \bar S_n = \liminf_n \bar S_n$ a.e., and the limit exists as asserted.

If we let $m = 0$ in (14) and $X_0 = 0$, then (14) implies (12). It remains to establish mean convergence. In fact, consider for $m < n$, with $\bar X_i = X_i - \mu_i$,

$$E((\bar S_n - \bar S_m)^2) = E((\bar X_{m+1} + \cdots + \bar X_n)^2) = \sum_{k=m+1}^n \sigma_k^2 \to 0 \quad \text{as } m, n \to \infty. \tag{17}$$

Thus $\bar S_n \to \bar S$ in $L^2(P)$, and hence also in $L^1(P)$, since $\|f\|_1 \le \|f\|_2$ for any $f \in L^2$. It follows that $E(\bar S^2) = \lim_n E(\bar S_n^2) = \lim_n \sum_{k=1}^n \sigma_k^2 = \sigma^2$, and $E(\bar S) = \lim_n E(\bar S_n) = 0$. But $X = \bar S + \sum_{n=1}^\infty \mu_n$, so that $E(X) = \sum_{n=1}^\infty \mu_n$. This completes the proof.

Remarks (1) If we are given that $\lim_n \sum_{k=1}^n X_k$ exists in $L^2$ and $\sum_{n=1}^\infty \mu_n$ converges, then $\bar S_n = \sum_{k=1}^n (X_k - \mu_k) \to \bar S$ in $L^2$ also, so that $\sum_{k=1}^n \sigma_k^2 = E(\bar S_n^2) \to E(\bar S^2) = \sigma^2$. Thus $\sum_{k=1}^\infty \sigma_k^2 < \infty$. Hence by the theorem $\sum_{k=1}^\infty X_k$ also exists a.e.

(2) If the hypothesis of independence is simply dropped in the above theorem, the result is certainly false. In fact, let $X_n = X/n$, where $E(X) = 0$, $0 < \mathrm{Var}\,X = \sigma^2 < \infty$, so that $\sum_{k=1}^\infty \mu_k = 0$ and

$$\sum_{n=1}^\infty \sigma^2(X_n) = \sigma^2 \sum_{n=1}^\infty \frac{1}{n^2} < \infty.$$

But $\sum_{n=1}^\infty X_n = X \sum_{n=1}^\infty 1/n$ diverges a.e. on the set where $X \ne 0$, which has positive probability.
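Remark (2) is easy to see numerically; the sketch below is ours (NumPy only): one draw of $X$ fixes the whole dependent sequence, and the partial sums track $X \log N$.

```python
# Dependence breaks Theorem 6: with X_n = X/n the variances are summable,
# but sum_{n<=N} X_n = X * (1 + 1/2 + ... + 1/N) ~ X log N diverges if X != 0.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal()               # one fixed r.v. with E(X) = 0
for N in (10, 1_000, 100_000):
    H = np.sum(1.0 / np.arange(1, N + 1))    # harmonic partial sum
    print(f"N={N:6d}  partial sum = {X * H:+.3f}   X*log N = {X * np.log(N):+.3f}")
```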
A partial converse of the above theorem is as follows.

Theorem 7 Let $\{X_n, n \ge 1\}$ be a uniformly bounded sequence of independent random variables on $(\Omega, \Sigma, P)$ with means zero and variances $\{\sigma_n^2, n \ge 1\}$. If $\sum_{n=1}^\infty X_n$ converges on a set of positive measure, then $\sum_{n=1}^\infty \sigma_n^2 < \infty$, and hence the series actually converges a.e. on the whole space $\Omega$.

Proof Let $X_0 = 0$ and $S_n = \sum_{k=1}^n X_k$. If $A$ is the set of positive measure on which $S_n \to S$ a.e., then by Theorem 4 (of Egorov), there is a measurable subset of $A$, of arbitrarily small measure, whose complement in $A$, call it $B_0 \subset A$, satisfies $P(B_0) > 0$ and $S_n \to S$ on $B_0$ uniformly. Since $S$ is an r.v., we can find a set $B \subset B_0$ of positive measure (arbitrarily close to that of $B_0$), and a positive number $d$ such that $|S_n| \le d < \infty$ on $B$. Thus if $\tilde A = \bigcap_{n=0}^\infty [\,|S_n| \le d\,]$, then $\tilde A \in \Sigma$, $\tilde A \supset B$, and $P(\tilde A) \ge P(B) > 0$.

Let $A_n = \bigcap_{k=0}^n [\,|S_k| \le d\,]$, so that $A_n \downarrow \tilde A$. Let $C_n = A_n - A_{n+1}$, so that $A_0 - \tilde A = \bigcup_{n \ge 0} C_n$, which is a disjoint union, and let $a_n = \int_{A_n} S_n^2 \, dP$. Clearly $a_n \le d^2 P(A_n) \le d^2$, so that $\{a_n, n \ge 1\}$ is a bounded sequence. Consider

$$a_n = \int_{A_{n-1}} (S_{n-1} + X_n)^2 \, dP - \int_{C_{n-1}} S_n^2 \, dP. \tag{18}$$

However,

$$\int_{A_{n-1}} X_n^2 \, dP = E(\chi_{A_{n-1}} X_n^2) = \sigma_n^2 P(A_{n-1}), \quad \text{by independence of } \chi_{A_{n-1}} \text{ and } X_n,$$

and

$$\int_{A_{n-1}} X_n S_{n-1} \, dP = E(X_n) E(\chi_{A_{n-1}} S_{n-1}) = 0,$$

since $E(X_n) = 0$. On $C_{n-1}$ we have $|S_{n-1}| \le d$, and hence, with the hypothesis that $|X_n| \le c < \infty$ a.e., also $|S_n| \le d + c$ there. Thus by noting that $P(A_{n-1}) \ge P(\tilde A)$, (18) becomes, with these simplifications,

$$\sigma_n^2 P(\tilde A) \le \sigma_n^2 P(A_{n-1}) = a_n - a_{n-1} + \int_{C_{n-1}} S_n^2 \, dP \le a_n - a_{n-1} + (d + c)^2 P(C_{n-1}).$$

Summing over $n = 1, 2, \ldots, m$, we get ($a_0 = 0$)

$$P(\tilde A) \sum_{n=1}^m \sigma_n^2 \le a_m + (d + c)^2 \sum_{n=1}^m P(C_{n-1}) \le a_m + (d + c)^2.$$

Hence recalling that $a_m \le d^2$, one has

$$\sum_{n=1}^m \sigma_n^2 \le \frac{d^2 + (d + c)^2}{P(\tilde A)}, \quad m \ge 1. \tag{19}$$

Since $P(\tilde A) > 0$, (19) implies that $\sum_{n=1}^\infty \sigma_n^2 < \infty$. This yields the last statement and, in view of Theorem 6, completes the proof.

As an immediate consequence, we have

Corollary 8 If $\{X_n, n \ge 1\}$ is a uniformly bounded sequence of independent random variables on $(\Omega, \Sigma, P)$ with $E(X_n) = 0$, $n \ge 1$, then $\sum_{n=1}^\infty X_n$ converges with probability 0 or 1.

We are now in a position to establish a very general result on this topic.

Theorem 9 (Three Series Theorem). Let $\{X_n, n \ge 1\}$ be a sequence of independent random variables on $(\Omega, \Sigma, P)$. Then $\sum_{n=1}^\infty X_n$ converges a.e. iff the following three series converge. For some (and then every) $0 < c < \infty$,

(i) $\sum_{n=1}^\infty P([\,|X_n| > c\,])$,
(ii) $\sum_{n=1}^\infty E(X_n^c)$,
(iii) $\sum_{n=1}^\infty \sigma^2(X_n^c)$,

where $X_n^c$ is the truncation of $X_n$ at $c$, so that $X_n^c = X_n$ if $|X_n| \le c$, and $= 0$ otherwise.

Proof Sufficiency is immediate. In fact, suppose the three series converge. By (i) and the first Borel-Cantelli lemma, $P[\limsup_n |X_n| > c] = 0$, so that for large enough $n$, $X_n = X_n^c$ a.e. Next, the convergence of (ii) and (iii) implies, by Theorem 6, that $\sum_{n=1}^\infty X_n^c$ converges a.e. Since $X_n = X_n^c$ for large $n$, $\sum_{n=1}^\infty X_n$ itself converges a.e. Note that $c > 0$ is arbitrarily fixed.

Conversely, suppose $\sum_{n=1}^\infty X_n$ converges a.e. Then $\lim_n X_n = 0$ a.e. Hence if $A_{n,c} = [X_n \ne X_n^c] = [\,|X_n| > c\,]$ for any fixed $c > 0$, then the $A_{n,c}$ are independent and $P[\limsup_n A_{n,c}] = 0$. Thus by the second Borel-Cantelli lemma (cf. Theorem 1.9iii), $\sum_{n=1}^\infty P(A_{n,c}) < \infty$, which proves (i). Also, $\sum_{n=1}^\infty X_n^c$ converges a.e., since for large enough $n$, $X_n^c$ and $X_n$ are equal a.e. But now the $X_n^c$ are uniformly bounded. We would like to reduce the result to Theorem 7. However, $E(X_n^c)$ is not necessarily zero. Thus we need a new idea for this reduction. One considers a sequence of independent random variables which are also independent of, but with the same distributions as, the $X_n^c$-sequence. Now, the given probability space may not support two such sequences. In that case, we enlarge it by adjunction, as explained after Corollary 8 in the last section. The details are as follows.

Let $(\tilde\Omega, \tilde\Sigma, \tilde P) = (\Omega, \Sigma, P) \otimes (\Omega, \Sigma, P)$, and let $X_n', X_n''$ be defined on $\tilde\Omega$ by the equations

$$X_n'(\tilde\omega) = X_n^c(\omega_1), \quad X_n''(\tilde\omega) = X_n^c(\omega_2), \quad \text{where } \tilde\omega = (\omega_1, \omega_2) \in \tilde\Omega. \tag{20}$$

It is trivial to verify that $\{X_n', n \ge 1\}$, $\{X_n'', n \ge 1\}$ are two mutually independent sequences of random variables on $(\tilde\Omega, \tilde\Sigma, \tilde P)$, with $|X_n'| \le c$, $|X_n''| \le c$, and they have the same distributions. Thus if $Z_n = X_n' - X_n''$, $n \ge 1$, then $E(Z_n) = 0$, $\mathrm{Var}\,Z_n = \mathrm{Var}\,X_n' + \mathrm{Var}\,X_n'' = 2\sigma^2(X_n^c)$, and $\{Z_n, n \ge 1\}$ is a uniformly bounded (by $2c$) independent sequence to which Theorem 7 applies. Hence, by that result, $\sum_{n=1}^\infty \mathrm{Var}\,Z_n < \infty$, so that $\sum_{n=1}^\infty \sigma^2(X_n^c) < \infty$, which is (iii).

Next, if $Y_n = X_n^c - E(X_n^c)$, then $E(Y_n) = 0$, $\mathrm{Var}\,Y_n = \mathrm{Var}\,X_n^c$, so that $\sum_{n=1}^\infty \sigma^2(Y_n) < \infty$. Hence by Theorem 6, $\sum_{n=1}^\infty Y_n$ converges a.e. Thus we have $\sum_{n=1}^\infty E(X_n^c) = \sum_{n=1}^\infty X_n^c - \sum_{n=1}^\infty Y_n$, and both the series on the right converge a.e. Thus the left side, which is a series of constants, simply converges, and (ii) holds. Observe that if the result is true for one $0 < c < \infty$, then by this part the three series must converge for every $0 < c < \infty$. This completes the proof.

Remarks (1) If any one of the three series of the above theorem diverges, then $\sum_{n \ge 1} X_n$ diverges a.e. This means the set $[\sum_{n=1}^\infty X_n$ converges$]$ has probability zero, so that the zero-one criterion obtains. The proof of this statement is a simple consequence of the preceding results (since the convergence is determined by $\sum_{k \ge n} X_k$ for large $n$), but not of Theorem 1.12.

(2) Observe that the convergence statements on series in all these theorems relate to unconditional convergence. It is not absolute convergence, as simple examples show. For instance, if $a_n > 0$, $\sum_{n=1}^\infty a_n = \infty$, but $\sum_{n=1}^\infty a_n^2 < \infty$, then the independent random variables $X_n = \pm a_n$ with equal probability on $(\Omega, \Sigma, P)$ satisfy the hypothesis of Corollary 8, and so $\sum_{n=1}^\infty X_n$ converges a.e. But it is clear that $\sum_{n=1}^\infty |X_n| = \sum_{n=1}^\infty |a_n| = \infty$ a.e. The point is that $X_n \in L^2(\Omega, \Sigma, P)$ and the series $\sum_{n=1}^\infty X_n$ converges unconditionally in $L^2(P)$, but not absolutely there if the space is infinite dimensional. In fact, it is a general result of Banach space theory that the above two convergences are unequal in general.
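The distinction in remark (2) shows up clearly in a simulation. The sketch below is ours (NumPy, with $a_n = 1/n$ as the illustrative choice): the signed partial sums settle down to a limit, while the absolute partial sums grow like $\log N$.

```python
# X_n = +-1/n with equal probability: sum X_n converges a.e. (Corollary 8,
# since sum 1/n^2 < oo), but sum |X_n| = sum 1/n = oo: the convergence is
# unconditional, not absolute.
import numpy as np

rng = np.random.default_rng(3)
N = 10**6
terms = rng.choice((-1.0, 1.0), size=N) / np.arange(1, N + 1)
idx = np.array([10**2, 10**4, 10**6]) - 1
print("signed partial sums:  ", np.cumsum(terms)[idx])          # stabilize
print("absolute partial sums:", np.cumsum(np.abs(terms))[idx])  # ~ log N
```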
(3) One can present easy sufficient conditions for absolute convergence of a series of random variables on $(\Omega, \Sigma, P)$. Indeed, $\sum_{n=1}^\infty X_n$ converges absolutely a.e. if $\sum_{n=1}^\infty E(|X_n|) < \infty$. This is true since $E(\sum_{n=1}^\infty |X_n|) = \sum_{n=1}^\infty E(|X_n|) < \infty$ by the Lebesgue dominated convergence theorem, and since $Y = \sum_{n=1}^\infty |X_n|$ is a (positive) r.v. with finite expectation, $P[Y > \lambda] \le E(Y)/\lambda \to 0$ as $\lambda \to \infty$, so that $0 \le Y < \infty$ a.e. Here the $X_n$ need not be independent. But the integrability condition is very stringent. Such results are "nonprobabilistic" in nature, and are not of interest in our subject.

A natural question now is to know the properties of the limit r.v. $X = \sum_{n=1}^\infty X_n$ in Theorem 9 when the series converges. For example: if each $X_n$ has a countable range, which is a simple case, what can one say about the distribution of $X$? What can one say about $Y = \sum_{n=1}^\infty a_n X_n$, where $\sum_{n=1}^\infty a_n^2 < \infty$, $E(X_n) = 0$, $E(X_n^2) = 1$, and the $X_n$ are independent?

Not much is known about these queries. Some special cases are studied, and a sample result is discussed in the problems section. For a deeper analysis of special types of random series, one may refer to Kahane (1985). We now turn to the next important aspect of averages of independent random variables, which has opened up interesting avenues for probability theory.

2.3 Laws of Large Numbers


Very early in Section 1.1 we indicated that probability is a "long-term average." This means that the averages of "successes" in a sequence of independent trials "converge" to a number. As the preceding section shows, there are three frequently used types of convergences, namely, the pointwise a.e., the stochastic (or "in probability") convergence, and the distributional convergence, each one being strictly weaker than the preceding one. The example following the proof of Proposition 2.2 shows that $X_n \stackrel{D}{\to} X$ does not imply that the $X_n(\omega)$ need to approach $X(\omega)$ for any $\omega \in \Omega$. So in general it is better to consider the a.e. and "in probability" types for statements relating to outcomes $\omega$. Results asserting a.e. convergence always imply the "in probability" statements, so that the former are called strong laws and the latter, weak laws. If the random variables take only two values (0, 1), say, then the desired convergence in probability of the averages was first rigorously established by James Bernoulli in about the year 1713, and the a.e. convergence result for the same sequence was obtained by E. Borel only in 1909.

Attempts to prove the same statements for general random variables, with range space $\mathbb{R}$, and the success thus achieved constitute a general story of the subject at hand. In fact, P. L. Čebyšev seems to have devised his inequality for extending the Bernoulli theorem, and established the following result in 1882.

Proposition 1 (Čebyšev). Let $X_1, X_2, \ldots$ be a sequence of independent random variables on $(\Omega, \Sigma, P)$ with means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$, such that if $S_n = \sum_{i=1}^n X_i$, one has $\sigma^2(S_n)/n^2 \to 0$ as $n \to \infty$. Then the sequence obeys the weak law of large numbers (WLLN), which means, given $\varepsilon > 0$, we have

$$\lim_{n\to\infty} P\left[\frac{|S_n - E(S_n)|}{n} > \varepsilon\right] = 0. \tag{1}$$

Proof By Čebyšev's inequality (1) follows at once.
v's (1) follows at once.

Note that if all the $X_n$ have the same distribution, then they have equal moments, i.e., $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma^2$, so that $\sigma^2(S_n) = \sum_{k=1}^n \sigma_k^2 = n\sigma^2$, and $\sigma^2(S_n)/n^2 = \sigma^2/n \to 0$ is automatically satisfied. The result was improved in 1928 by A. Khintchine, by assuming just one moment. For the proof, he used a truncation argument, originally introduced in 1913 by A. A. Markov. Here we present this proof, as it became a powerful tool. Later we see that the result can be proved, using the characteristic function technique, in a very elementary manner, and even with a slightly weaker hypothesis than the existence of the first moment [i.e., only with the existence of a derivative at the origin for its Fourier transform; that does not imply $E(X)$ exists].

Theorem 2 (Khintchine). Let $X_1, X_2, \ldots$ be independent random variables on $(\Omega, \Sigma, P)$ with a common distribution [i.e., $P[X_n < x] = F(x)$, $x \in \mathbb{R}$, for $n \ge 1$] and with one moment finite. Then the sequence obeys the WLLN.

Proof We use the preceding result in the proof for the truncated functions and then complete the argument with a detailed analysis. Let $\varepsilon > 0$, $\delta > 0$ be given. Define

$$U_k^n = X_k \chi_{[\,|X_k| \le n\delta\,]}, \qquad V_k^n = X_k \chi_{[\,|X_k| > n\delta\,]},$$

so that $X_k = U_k^n + V_k^n$. Let $F$ be the common distribution function of the $X_k$. Since $E(|X_k|) < \infty$, we have $M = E(|X_k|) = \int_{\mathbb{R}} |x| \, dF(x) < \infty$, by the fundamental (image) law of probability. If $\mu = E(X_k) = \int_{\mathbb{R}} x \, dF(x)$ and $\mu_n' = E(U_k^n)$, then

$$\mu_n' = \int_{[\,|x| \le n\delta\,]} x \, dF(x),$$

and by the dominated convergence theorem, we have

$$\mu_n' \to \mu \quad \text{as } n \to \infty. \tag{2}$$

Thus there is $N_1$ such that $n \ge N_1 \Rightarrow |\mu_n' - \mu| < \varepsilon/2$. Note that $\mu_n'$ depends only on $n$, and hence not on $k$, because of the common distribution of the $X_k$. Similarly

$$\mathrm{Var}(U_k^n) \le E((U_k^n)^2) = \int_{[\,|x| \le n\delta\,]} x^2 \, dF(x) \le n\delta \int_{\mathbb{R}} |x| \, dF(x) = n\delta M. \tag{3}$$

By hypothesis $U_1^n, U_2^n, \ldots$ are independent (bounded) random variables with means $\mu_n'$ and variances bounded by $n\delta M$. Let $T_n = U_1^n + \cdots + U_n^n$ and $W_n = V_1^n + \cdots + V_n^n$. Then by the preceding proposition, or rather Čebyšev's inequality,

$$P\left[\,|T_n - n\mu_n'| > \frac{n\varepsilon}{2}\,\right] \le \frac{\mathrm{Var}(T_n)}{(n\varepsilon/2)^2} \le \frac{4 n (n\delta M)}{n^2 \varepsilon^2} = \frac{4\delta M}{\varepsilon^2}. \tag{4}$$

On the other hand, adding and subtracting $n\mu_n'$ and using the triangle inequality gives

$$|T_n - n\mu| \le |T_n - n\mu_n'| + n|\mu_n' - \mu|.$$

Thus if $n \ge N_1$, we have, with the choice of $N_1$ after (2), on the set $[\,|T_n - n\mu_n'| \le n\varepsilon/2\,]$ the bound $|T_n - n\mu| \le n\varepsilon/2 + n\varepsilon/2 = n\varepsilon$. Hence for $n \ge N_1$ this yields

$$P[\,|T_n - n\mu| > n\varepsilon\,] \le P\left[\,|T_n - n\mu_n'| > \frac{n\varepsilon}{2}\,\right] \le \frac{4\delta M}{\varepsilon^2}. \tag{5}$$

But by definition $S_n = T_n + W_n$, $n \ge 1$, so that

$$P[\,|S_n - n\mu| > n\varepsilon\,] \le P[\,|T_n - n\mu| > n\varepsilon\,] + P[W_n \ne 0]. \tag{6}$$

Also,

$$P[V_k^n \ne 0] = P[\,|X_k| > n\delta\,] = \int_{[\,|x| > n\delta\,]} dF(x) \le \frac{1}{n\delta} \int_{[\,|x| > n\delta\,]} |x| \, dF(x). \tag{7}$$

Choose $N_2$ such that $n \ge N_2 \Rightarrow \int_{[\,|x| > n\delta\,]} |x| \, dF(x) < \delta^2$, which is possible since $M = E(|X_k|) < \infty$. Thus for $n \ge N_2$, $P[V_k^n \ne 0] \le \delta^2/(n\delta) = \delta/n$ by (7). Consequently,

$$P[W_n \ne 0] \le \sum_{k=1}^n P[V_k^n \ne 0] \le n \cdot \frac{\delta}{n} = \delta. \tag{8}$$

If $N = \max(N_1, N_2)$ and $n \ge N$, then (5) and (8) give for (6)

$$P\left[\,\left|\frac{S_n}{n} - \mu\right| > \varepsilon\,\right] \le \frac{4\delta M}{\varepsilon^2} + \delta. \tag{9}$$

Letting $n \to \infty$ and then $\delta \to 0$ in (9), we get the desired conclusion.

It is important to notice that the independence hypothesis is used only in (4) in the above proof, in deducing that the variance of $T_n$ equals the sum of the variances of the $U_k^n$. But this will follow if the $U_k^n$ are uncorrelated for each $n$. In other words, we used only that

$$E(U_i^n U_j^n) = \left(\int_{[\,|x| \le n\delta\,]} x \, dF_{X_i}(x)\right) \left(\int_{[\,|y| \le n\delta\,]} y \, dF_{X_j}(y)\right) = \mu_n' \mu_n', \quad i \ne j.$$

Now this holds if $X_i$, $X_j$ are independent when $i \ne j$. Thus the above proof actually yields the following stronger result, stated for reference.

Corollary 3 Let $X_1, X_2, \ldots$ be a pairwise independent sequence of random variables on $(\Omega, \Sigma, P)$ with a common distribution having one moment finite. Then the sequence obeys the WLLN.

In our development of the subject, the next result serves as a link between the preceding considerations and the "strong laws." It was obtained by A. Rajchman in the early 1930s. The hypothesis is weaker than pairwise independence, but demands the existence of a uniform bound on variances, and then yields a stronger conclusion. The proof uses a different technique, of interest in the subject.

Theorem 4 Let $\{X_n, n \ge 1\}$ be a sequence of uncorrelated random variables on $(\Omega, \Sigma, P)$ such that $\sigma^2(X_n) \le M < \infty$, $n \ge 1$. Then $[S_n - E(S_n)]/n \to 0$ in $L^2$-mean, as well as a.e. [The pointwise convergence statement is the definition of the strong law of large numbers (SLLN) for a sequence.]

Proof The first statement is immediate, since

$$E\left(\left(\frac{S_n - E(S_n)}{n}\right)^2\right) = \frac{1}{n^2} \sum_{k=1}^n \sigma^2(X_k) \le \frac{M}{n} \to 0,$$

by the uncorrelatedness hypothesis of the $X_n$ and the uniform boundedness of $\sigma^2(X_k)$. This, of course, implies by Proposition 1 that the WLLN holds for the sequence. The point is that the a.e. convergence also holds.

Consider now, by Čebyšev's inequality, for any $\varepsilon > 0$,

$$P[\,|S_{n^2} - E(S_{n^2})| > n^2 \varepsilon\,] \le \frac{\sigma^2(S_{n^2})}{n^4 \varepsilon^2} \le \frac{M}{n^2 \varepsilon^2}.$$

Hence by the first Borel-Cantelli lemma, letting $Y_k = X_k - E(X_k)$ and $\bar S_n = \sum_{k=1}^n Y_k$ (so that the $Y_k$ are orthogonal), one has $P([\,|\bar S_{n^2}| > n^2\varepsilon\,], \text{i.o.}) = 0$, which means $\bar S_{n^2}/n^2 \to 0$ a.e. This is just an illustration of Proposition 2.3.

With the boundedness hypothesis we show that the result holds for the full sequence $\bar S_n/n$, and not merely for a subsequence, noted above.

For each $n \ge 1$, consider $n^2 \le k < (n+1)^2$ and $\bar S_k/k$. Then

$$\left|\frac{\bar S_k}{k}\right| \le \frac{|\bar S_{n^2}|}{n^2} + \frac{|\bar S_k - \bar S_{n^2}|}{n^2}, \tag{11}$$

and let $T_n = \max_{n^2 \le k < (n+1)^2} |\bar S_k - \bar S_{n^2}|$. Since as $k \to \infty$ the first term on the right $\to 0$ a.e. (shown above), it suffices to establish $T_n/n^2 \to 0$ a.e. To use the orthogonality property of the $Y_k$, consider $T_n^2$. We have

$$T_n^2 \le \sum_{k=n^2}^{(n+1)^2 - 1} |\bar S_k - \bar S_{n^2}|^2,$$

and so, since $\sigma^2(Y_k) = \sigma^2(X_k) \le M$,

$$E(T_n^2) \le \sum_{k=n^2}^{(n+1)^2 - 1} (k - n^2) M = n(2n+1) M \le 3n^2 M. \tag{12}$$

This crude estimate is sufficient to show, as before, that

$$P[\,|T_n| > n^2 \varepsilon\,] \le \frac{E(T_n^2)}{n^4 \varepsilon^2} \le \frac{3M}{n^2 \varepsilon^2},$$

by Markov's inequality and (12). Thus $\sum_{n \ge 1} P[\,|T_n| > n^2 \varepsilon\,] < \infty$, and the Borel-Cantelli lemma again yields $P[\,|T_n|/n^2 > \varepsilon, \text{i.o.}\,] = 0$. Hence $T_n/n^2 \to 0$ a.e., and by (11), $\bar S_k/k \to 0$ a.e., proving the result.

We now strengthen the probabilistic hypothesis from uncorrelatedness to mutual independence and weaken the moment condition. The resulting statement is significantly harder to establish. It will be obtained in two stages, and both are of independent interest. They were proved in 1928 by A. Kolmogorov, and are sharp. We begin with an elementary but powerful result from classical summability theory.

Proposition 5 (Kronecker's Lemma). Let $a_1, a_2, \ldots$ be a sequence of numbers such that $\sum_{n \ge 1} (a_n/n)$ converges. Then

$$\frac{1}{n} \sum_{k=1}^n a_k \to 0 \quad \text{as } n \to \infty.$$

Proof Let $s_0 = 0$, $s_n = \sum_{k=1}^n (a_k/k)$, and $R_n = \frac{1}{n} \sum_{k=1}^n a_k$. Then $s_n \to s$ by hypothesis. Also, $a_k = k(s_k - s_{k-1})$, so that

$$R_n = \frac{1}{n} \sum_{k=1}^n k(s_k - s_{k-1}) = s_n - \frac{1}{n} \sum_{k=1}^{n-1} s_k.$$

Hence

$$R_n \to s - s = 0,$$

because $s_n \to s \Rightarrow$ for any $\varepsilon > 0$, there is $n_0[= n_0(\varepsilon)]$ such that $n \ge n_0 \Rightarrow |s_n - s| < \varepsilon$, and hence

$$\left|\frac{1}{n} \sum_{k=1}^{n-1} s_k - s\right| \le \frac{1}{n} \sum_{k=1}^{n_0} |s_k - s| + \frac{1}{n} \sum_{k=n_0+1}^{n-1} |s_k - s| \le o(1) + \varepsilon \tag{14}$$

as $n \to \infty$. [This is called (C,1)-convergence or Cesàro summability of $s_n$.] Since $\varepsilon > 0$ is arbitrary, the result follows.
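Since Kronecker's lemma is purely deterministic, it can be checked directly. Here is a small sketch of ours (NumPy, with $a_n = (-1)^n$, so that $\sum a_n/n = -\log 2$ converges as an alternating series):

```python
# Kronecker's lemma: sum a_n/n convergent  ==>  (1/N) sum_{n<=N} a_n -> 0.
import numpy as np

N = 10**6
n = np.arange(1, N + 1)
a = (-1.0) ** n
print("s_N = sum_{n<=N} a_n/n =", np.sum(a / n))   # ~ -log 2 = -0.69315
print("R_N = (1/N) sum a_n    =", np.mean(a))      # -> 0
```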

Theorem 6 (First form of SLLN). If $X_1, X_2, \ldots$ is a sequence of independent random variables on $(\Omega, \Sigma, P)$ with means zero and variances $\sigma_1^2, \sigma_2^2, \ldots$ satisfying $\sum_{n=1}^\infty (\sigma_n^2/n^2) < \infty$, then the sequence obeys the SLLN, i.e.,

$$\frac{1}{n} \sum_{k=1}^n X_k \to 0 \quad \text{a.e. as } n \to \infty.$$

Proof Let $Y_n = X_n/n$. Then the $Y_n$, $n \ge 1$, are independent with means zero, and $\sum_{n=1}^\infty \sigma^2(Y_n) = \sum_{n=1}^\infty (\sigma_n^2/n^2) < \infty$. Thus by Theorem 2.6, $\sum_{n=1}^\infty Y_n$ converges a.e. Hence $\sum_{n=1}^\infty (X_n/n)$ converges a.e. By Kronecker's lemma, $(1/n) \sum_{k=1}^n X_k \to 0$ a.e., proving the theorem.

This result is very general, in that there are sequences of independent random variables $\{X_n, n \ge 1\}$ with means zero and finite variances $\sigma_1^2, \sigma_2^2, \ldots$ satisfying $\sum_{n=1}^\infty (\sigma_n^2/n^2) = \infty$ for which the SLLN does not hold. Here is a simple example. Let $X_1, X_2, \ldots$ be independent two-valued random variables, defined as

$$P([X_n = n]) = P([X_n = -n]) = \frac{1}{2}.$$

Hence $E(X_n) = 0$, $\sigma^2(X_n) = n^2$, so that $\sum_{n=1}^\infty [\sigma^2(X_n)/n^2] = +\infty$. If the sequence obeys the SLLN, then $(\sum_{k=1}^n X_k)/n \to 0$ a.e. This implies

$$\frac{X_n}{n} = \frac{S_n}{n} - \frac{n-1}{n} \cdot \frac{S_{n-1}}{n-1} \to 0 \quad \text{a.e.};$$

hence $P[\,|X_n| \ge n, \text{i.o.}\,] = 0$. By independence, this and the second Borel-Cantelli lemma yield $\sum_{n=1}^\infty P[\,|X_n| \ge n\,] < \infty$. However, by definition $P[\,|X_n| \ge n\,] = 1$, and this contradicts the preceding statement. Thus $(1/n) \sum_{k=1}^n X_k \not\to 0$ a.e., and the SLLN is not obeyed.

On the other hand, the above theorem is still true if we make minor relaxations on the means. For instance, if $\{X_n, n \ge 1\}$ is independent with means $\{\mu_n, n \ge 1\}$ and variances $\{\sigma_n^2, n \ge 1\}$ such that (i) $\sum_{n=1}^\infty \sigma_n^2/n^2 < \infty$ and (ii) either $\mu_n \to 0$ or just $(1/n) \sum_{k=1}^n \mu_k \to 0$ as $n \to \infty$, then $(1/n) \sum_{k=1}^n X_k \to 0$ a.e. Indeed, if $Y_n = X_n - \mu_n$, then $\{Y_n, n \ge 1\}$ satisfies the conditions of the above result. Thus $(1/n) \sum_{k=1}^n Y_k = (1/n) \sum_{k=1}^n X_k - (1/n) \sum_{k=1}^n \mu_k \to 0$ a.e. If $\mu_k \to \mu$, then $(1/n) \sum_{k=1}^n \mu_k \to \mu$ by (14). Here $\mu = 0$. The same holds if we only demanded $(1/n) \sum_{k=1}^n \mu_k \to 0$. In either case, then, $(1/n) \sum_{k=1}^n X_k \to 0$ a.e. However, it should be remarked that there exist independent symmetric two-valued $X_n$, $n \ge 1$, with $\sum_{n \ge 1} [\sigma^2(X_n)/n^2] = \infty$ obeying the SLLN. Examples can be given to this effect, if we have more information on the growth of the partial sums $\{S_n, n \ge 1\}$, through, for instance, the laws of the iterated logarithm. An important result on the latter subject will be established in Chapter 5.
The following is the celebrated SLLN of Kolmogorov.

Theorem 7 (Main SLLN). Let $\{X_n, n \ge 1\}$ be independent random variables on $(\Omega, \Sigma, P)$ with a common distribution and $S_n = \sum_{k=1}^n X_k$. Then $S_n/n \to \alpha_0$, a constant, a.e. iff $E(|X_1|) < +\infty$, in which case $\alpha_0 = E(X_1)$. On the other hand, if $E(|X_1|) = +\infty$, then $\limsup_n (|S_n|/n) = +\infty$ a.e.

Proof To prove the sufficiency of the first part, suppose $E(|X_1|) < \infty$. We use the truncation method of Theorem 2. For simplicity, let $E(X_1) = 0$, since otherwise we consider the sequence $Y_k = X_k - E(X_1)$. For each $n$, define

$$U_n = X_n \chi_{[\,|X_n| \le n\,]}, \qquad V_n = X_n \chi_{[\,|X_n| > n\,]}.$$

Thus $X_n = U_n + V_n$, and $\{U_n, n \ge 1\}$, $\{V_n, n \ge 1\}$ are independent sequences. First we claim that $\limsup_n |V_n| = 0$ a.e., implying $(1/n) \sum_{k=1}^n V_k \to 0$ a.e. That is to say, $P([V_n \ne 0], \text{i.o.}) = 0$. By independence, and the Borel-Cantelli lemma, this is equivalent to showing $\sum_{n=1}^\infty P[\,|V_n| > 0\,] < \infty$.

Let us verify the convergence of this series:

$$\sum_{n=1}^\infty P[\,|V_n| > 0\,] = \sum_{n=1}^\infty P[\,|X_n| > n\,] = \sum_{n=1}^\infty P[\,|X_1| > n\,] \quad (\text{since the } X_i \text{ have the same distribution})$$

$$= \sum_{n=1}^\infty n\, a_{n,n+1} \quad (\text{where } a_{k,k+1} = P[k < |X_1| \le k+1])$$

$$\le E(|X_1|) < \infty. \tag{15}$$

Next consider the bounded sequence $\{U_n, n \ge 1\}$ of independent random variables. If $\mu_n = E(U_n)$, then

$$\mu_n = \int_{[\,|x| \le n\,]} x \, dF(x) \to E(X_1) = 0 \quad \text{as } n \to \infty,$$

by the dominated convergence theorem. Hence $(1/n) \sum_{k=1}^n \mu_k \to 0$. Thus by the remark preceding the statement of the theorem, if $\sum_{n=1}^\infty [\sigma^2(U_n)/n^2] < \infty$, then $(1/n) \sum_{k=1}^n U_k \to 0$ a.e., and the result follows.

We verify the desired convergence by a computation similar to that used in (15). Thus

$$\sigma^2(U_n) \le E(U_n^2) = \int_{[\,|x| \le n\,]} x^2 \, dF(x) \quad (\text{by the common distribution of the } X_n)$$

$$= \sum_{k=0}^{n-1} \int_{[\,k < |x| \le k+1\,]} x^2 \, dF(x) \le \sum_{k=0}^{n-1} (k+1)^2 a_{k,k+1}.$$

Hence

$$\sum_{n=1}^\infty \frac{\sigma^2(U_n)}{n^2} \le \sum_{n=1}^\infty \frac{1}{n^2} \sum_{k=0}^{n-1} (k+1)^2 a_{k,k+1} \quad [\text{using the notation of (15)}]$$

$$= \sum_{k=0}^\infty (k+1)^2 a_{k,k+1} \sum_{n \ge k+1} \frac{1}{n^2} \le 2 \sum_{k=0}^\infty (k+1) a_{k,k+1} \le 2(E(|X_1|) + 1) < \infty,$$

since $\sum_{n \ge m} 1/n^2 \le 2/m$. Thus

$$\frac{1}{n} \sum_{k=1}^n X_k = \frac{1}{n} \sum_{k=1}^n U_k + \frac{1}{n} \sum_{k=1}^n V_k \to 0 \quad \text{a.e.}$$

as $n \to \infty$.
Conversely, suppose that $S_n/n \to \alpha_0$, a constant, a.e. We observe that

$$\frac{X_n}{n} = \frac{S_n}{n} - \frac{n-1}{n} \cdot \frac{S_{n-1}}{n-1} \to \alpha_0 - \alpha_0 = 0 \quad \text{a.e.},$$

so that $\limsup_n (|X_n|/n) = 0$ a.e. Again by the Borel-Cantelli lemma, this is equivalent to saying that $\sum_{n=1}^\infty P[\,|X_n| > n\,] < \infty$. But

$$\sum_{n=1}^\infty P[\,|X_n| > n\,] = \sum_{n=1}^\infty P[\,|X_1| > n\,] = \sum_{n=1}^\infty n\, a_{n,n+1} \quad [\text{as shown for (15)}] \ \ge E(|X_1|) - 1. \tag{16}$$

Hence $E(|X_1|) < \infty$. Then by the sufficiency $(1/n) S_n \to E(X_1)$ a.e., so that $\alpha_0 = E(X_1)$, as asserted.

For the last part, suppose that $E(|X_1|) = +\infty$, so that $E(|X_1|/a) = +\infty$ for any $a > 0$. Then the computation for (16) implies

$$\sum_{n=1}^\infty P[\,|X_n| > an\,] = \sum_{n=1}^\infty P\left[\frac{|X_1|}{a} > n\right] = +\infty,$$

since the $X_n$ have the same distribution. Consequently, by the second Borel-Cantelli lemma, we have

$$P([\,|X_n| > an\,], \text{i.o.}) = 1. \tag{17}$$

But $|S_n - S_{n-1}| = |X_n| > an$ implies either $|S_n| > an/2$ or $|S_{n-1}| > an/2$. Thus (17) and this give

$$P([\,|S_n| > an/2\,], \text{i.o.}) = 1.$$

Hence for each $a > 0$ we can find an $A_a \in \Sigma$, $P(A_a) = 0$, such that

$$\limsup_n \frac{|S_n|}{n} \ge \frac{a}{2} \quad \text{on } \Omega - A_a.$$

Letting $a$ run through the rationals and setting $A = \bigcup_{a \in \text{rationals}} A_a$, we get $P(A) = 0$, and on $\Omega - A$, $\limsup_n (|S_n|/n) \ge k$ for every $k > 0$. Hence $\limsup_n (|S_n|/n) = +\infty$ a.e. This completes the proof of the theorem.
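Both halves of the theorem are visible in simulation. The sketch below is ours (NumPy; exponential and Cauchy samples are illustrative choices): with $E(X_1) = 1 < \infty$ the averages settle down, while for the Cauchy distribution ($E|X_1| = +\infty$) they keep fluctuating wildly.

```python
# Kolmogorov's SLLN: S_n/n -> E(X_1) a.e. iff E|X_1| < infinity.
import numpy as np

rng = np.random.default_rng(4)
N = 10**6
k = np.arange(1, N + 1)
avg_exp = np.cumsum(rng.exponential(1.0, N)) / k     # E(X_1) = 1
avg_cau = np.cumsum(rng.standard_cauchy(N)) / k      # E|X_1| = +infinity

for n in (10**2, 10**4, 10**6):
    print(f"n={n:7d}   exponential S_n/n = {avg_exp[n-1]:.4f}"
          f"   Cauchy S_n/n = {avg_cau[n-1]:+.3f}")
```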

The above result contains slightly more information. In fact, we have the following:

Corollary 8 Let $\{X_n, n \ge 1\}$ be as in the theorem with $E(|X_1|) < \infty$. Then $S_n/n \to E(X_1)$ in $L^1(P)$-mean (in addition to the a.e. convergence).

Proof Since the $X_n$ are i.i.d., so are the $|X_n|$, $n \ge 1$. Moreover, by the identical distributions, $P[-x < X_n < x] = P[-x < X_1 < x]$. Indeed

$$P[\,|X_n| < x\,] = \int_{\mathbb{R}} \chi_{[\,|t| < x\,]} \, dF(t) \quad (\text{since } X_n \text{ has } F \text{ as its d.f. for all } n \ge 1)$$

$$= P[\,|X_1| < x\,] \quad \text{(by the image law)}.$$

By the SLLN, $S_n/n \to E(X_1)$ a.e., so that $|S_n|/n \to |E(X_1)|$ a.e. Given $\varepsilon > 0$, choose $x_0 > 0$ such that

$$\int_{[\,|x| \ge x_0\,]} |x| \, dF(x) < \varepsilon.$$

If $S_n' = \sum_{k=1}^n X_k \chi_{[\,|X_k| < x_0\,]}$ and $S_n'' = S_n - S_n'$, then $\{S_n'/n, n \ge 1\}$ is uniformly bounded, so that it is uniformly integrable. But

$$E\left(\frac{|S_n''|}{n}\right) \le \frac{1}{n} \sum_{k=1}^n E(|X_k| \chi_{[\,|X_k| \ge x_0\,]}) < \varepsilon$$

uniformly in $n$. Thus $\{(1/n) S_n'', n \ge 1\}$ is also uniformly integrable. Consequently $\{(1/n)|S_n|, n \ge 1\}$ is a uniformly integrable set. Hence the result follows by Vitali's theorem, and the limits must agree, as asserted.

Remark See also Problem 10 for similar (but restricted to finite measure or probability spaces) convergence statements of real analysis, without mention of independence.

These results and their methods of proof have been extended in various directions. The idea of investigating the averages (both the WLLN and SLLN) has served an important role in creating the modern ergodic theory. Here the random variables $X_n$ are derived from one fixed function $X_1 : \Omega \to \mathbb{R}$ in terms of a measurable mapping $T : \Omega \to \Omega$ [$T^{-1}(\Sigma) \subset \Sigma$] which preserves measure, meaning $P = P \circ T^{-1}$, or $P(A) = P(T^{-1}(A))$, $A \in \Sigma$. Then

$$X_n = X_1 \circ T^{n-1}, \quad n \ge 1,$$

where $T^2 = T \circ T$ and $T^n = T \circ T^{n-1}$, $n \ge 1$. Since $X_1 : \Omega \to \mathbb{R}$ and $T : \Omega \to \Omega$ are both measurable, so that $(X_1 \circ T)^{-1}(B) = T^{-1}(X_1^{-1}(B)) \subset T^{-1}(\Sigma) \subset \Sigma$, where $B$ is a set of the Borel $\sigma$-algebra of $\mathbb{R}$, $X_2$ is an r.v., and similarly each $X_n$ is an r.v. For such a sequence, which is no longer independent, the prototypes of the laws of large numbers have been proved. These are called ergodic theorems. The correspondents of the weak laws are called mean ergodic theorems, and those of the strong laws are termed individual ergodic theorems. This theory has branched out into a separate discipline, leaning more toward measure theoretic functional analysis than probability, but still retaining important connections with the latter. For a brief account, see Section 3 of Chapter 7.

Another result suggested by the above theorem is to investigate the growth of sums of independent random variables. How fast does $S_n$ cross some prescribed bound? The laws of the iterated logarithm are of this type, for which more tools are needed. We consider some of them in Chapters 5 and later. We now turn to some applications.

2.4 Applications to Empiric Distributions, Densities, Queueing, and Random Walk
(A) Empiric Distributions

One of the important and popular applications of the SLLN is to show that the empiric distribution converges a.e. and uniformly to the distribution of the random variable. To make this statement precise, consider a sequence of random variables $X_1, X_2, \ldots$ on $(\Omega, \Sigma, P)$ such that $P[X_n < x] = F(x)$, $x \in \mathbb{R}$, $n \ge 1$; i.e., they are identically distributed. If we observe "the segment" $X_1, \ldots, X_n$, then the empiric distribution is defined as the "natural" proportion for each outcome $\omega \in \Omega$:

$$F_n(x, \omega) = \frac{1}{n} \{\text{number of } X_i(\omega) < x\}. \tag{1}$$

Equivalently, let us define

$$Y_i(x, \omega) = \chi_{[X_i < x]}(\omega), \quad \text{so that} \quad F_n(x, \omega) = \frac{1}{n} \sum_{i=1}^n Y_i(x, \omega). \tag{2}$$
We have the following important result, obtained in about 1933.

Theorem 1 (Glivenko-Cantelli). Let $X_1, X_2, \ldots$ be independent and identically distributed (i.i.d.) random variables on $(\Omega, \Sigma, P)$. Let $F$ be their common distribution function, and if the first $n$ random variables are "observed" (termed a random sample of size $n$), let $F_n$ be the empiric distribution determined by (1) [or (2)] for this segment. Then

$$P\left[\lim_{n\to\infty} \sup_{-\infty < x < \infty} |F_n(x, \cdot) - F(x)| = 0\right] = 1. \tag{3}$$

Proof Since the $X_i$ are identically distributed with a common distribution, the same is clearly true of the $Y_i$ given by (2). Indeed,

$$E(Y_i(x, \cdot)) = P[X_i < x] = F(x)$$

for all $i \ge 1$. Hence by the (special case of the) SLLN, we get

$$\lim_n F_n(x, \omega) = F(x) \quad \text{for a.a. } \omega. \tag{4}$$

We need to prove the stronger assertion of a.e. uniform convergence in $x$ for (4), which is (3). This is more involved and is presented in three steps.

1. Let $0 < k < r$ be integers and $x_{k,r}$ be a real number such that

$$F(x_{k,r}) \le \frac{k}{r} \le F(x_{k,r} + 0),$$

for definiteness [and use $F(-\infty) = \lim_{x\to-\infty} F(x) = 0$, $F(+\infty) = \lim_{x\to+\infty} F(x) = 1$]. Also define

$$E_{k,r} = \left\{\omega : \lim_{n\to\infty} F_n(x_{k,r}, \omega) = F(x_{k,r})\right\}$$

and

$$H_{k,r} = \left\{\omega : \lim_{n\to\infty} F_n(x_{k,r} + 0, \omega) = F(x_{k,r} + 0)\right\}.$$

Then by (4), $P(E_{k,r}) = 1 = P(H_{k,r})$, $1 \le k < r$. Let

$$E_r = \bigcap_{k=1}^{r-1} (E_{k,r} \cap H_{k,r}).$$

2. We have $P(E_r) = 1$, and if $E = \bigcap_{r \ge 1} E_r$, then $P(E) = 1$. In fact, if $A, B \in \Sigma$, $P(A) = 1 = P(B)$, then clearly

$$P(A \cap B) \ge 1 - P(A^c) - P(B^c) = 1.$$

Hence $P(A \cap B) = 1$. By induction, with $A = E_{k,r}$, $B = H_{k,r}$, $k = 1, \ldots, r-1$, it follows that $P(E_r) = 1$, $r \ge 1$. Since $E = \bigcap_{n \ge 1} B_n = \lim_{n\to\infty} B_n$, where $B_n = \bigcap_{r \le n} E_r$, it also follows that $P(E) = \lim_n P(B_n) = 1$.

Let us express the desired event in a different form. If

$$S = \left\{\omega : \lim_{n\to\infty} \sup_{x \in \mathbb{R}} |F_n(x, \omega) - F(x)| = 0\right\},$$

then $S \in \Sigma$, because if

$$\tilde S = \left\{\omega : \lim_{n\to\infty} \sup_{x \text{ rational}} |F_n(x, \omega) - F(x)| = 0\right\},$$

clearly $S \subset \tilde S$, and by the density of the rationals in $\mathbb{R}$, $\tilde S \subset S$ also follows. Since $\tilde S \in \Sigma$, so is $S \in \Sigma$. We need to establish the following result.

3. $E \subset S$, so that $1 = P(E) \le P(S) \le 1$. For, if we let $x \in (x_{k,r}, x_{k+1,r})$, then by the monotonicity of $F_n$ and $F$, we get

$$F_n(x_{k,r} + 0, \omega) \le F_n(x, \omega) \le F_n(x_{k+1,r}, \omega), \qquad F(x_{k,r} + 0) \le F(x) \le F(x_{k+1,r}), \tag{6}$$

with $F(x_{k+1,r}) - F(x_{k,r} + 0) \le 1/r$. This is clearly possible since $F(x_{k+1,r}) \le (k+1)/r$ and $F(x_{k,r} + 0) \ge k/r$. Hence (6) may be written

$$F_n(x, \omega) - F(x) \le F_n(x_{k+1,r}, \omega) - F(x_{k+1,r}) + \frac{1}{r},$$

and in a similar way

$$F_n(x, \omega) - F(x) \ge F_n(x_{k,r} + 0, \omega) - F(x_{k,r} + 0) - \frac{1}{r}.$$

Combining these two sets of inequalities, we get for $\omega \in E$

$$\limsup_n \sup_x |F_n(x, \omega) - F(x)| \le \frac{1}{r}.$$

Since $r \ge 1$ is arbitrary, the left side must vanish for almost all such $\omega$. Hence $\omega \in E \Rightarrow \omega \in S$. Thus $E \subset S$, and the theorem is proved.

Remark: The empiric distribution has found substantial use in the statistical method known as the "bootstrap." In the theory of statistics, bootstrapping is a method for estimating the sampling distribution of an estimator by "resampling" with replacement from the original sample.

In the proof of the theorem, one notes that the detailed analysis was needed above in extending the a.e. convergence of (4) for each $x$ to uniform convergence in $x$ over $\mathbb{R}$. This extension does not involve any real probabilistic ideas. It is essentially classical analysis. If we denote by $\mathcal{C}$ the class of all intervals $(-\infty, x)$, and set

$$p_n(A)(\omega) = \frac{1}{n} \sum_{i=1}^n \chi_A(X_i(\omega)), \quad A \in \mathcal{C},$$

and similarly

$$p(A) = P[X_1 \in A], \quad A \in \mathcal{C},$$

then $p_n(A)$ is a sample "probability" of $A$ [i.e., $p_n(\cdot)(\omega)$ is a probability for each $\omega \in \Omega$, and $p_n(A)(\cdot)$ is a measurable function for each Borel set $A$]; and $p$ is an ordinary probability (that is, determined by the common image measure). Then (3) says the following:

$$P\left[\lim_{n\to\infty} \sup_{A \in \mathcal{C}} |p_n(A) - p(A)| = 0\right] = 1.$$

This form admits an extension if $X_1, X_2, \ldots$ are random vectors. But here the correspondent of $\mathcal{C}$ must be chosen carefully, as the result will not be true for all collections, because of the special sets demanded in Definition 2.1 (see the counterexample following it). For instance, the result will be true if $\mathcal{C}$ is the (corresponding) family of all half-spaces of $\mathbb{R}^n$. But the following is much more general and is due to R. Ranga Rao, Ann. Math. Statist. 33 (1962), 659-680.

Theorem 2 Let $X_1, X_2, \ldots$ be a sequence of independent random vectors on $(\Omega, \Sigma, P)$ with values in $\mathbb{R}^m$, and for each Borel set $A \subset \mathbb{R}^m$ let $p(A) = P[X_n \in A]$, $n \ge 1$, so that they have the common image measure $p$ (or distribution). Let $p_n(A)$ be the empiric distribution based on the sample (or initial segment) of size $n$ (i.e., on $X_1, \ldots, X_n$), so that

$$p_n(A)(\omega) = \frac{1}{n} \sum_{i=1}^n \chi_A(X_i(\omega)).$$

If $\mathcal{C}$ is the class of measurable convex sets from $\mathbb{R}^m$ whose boundaries have zero measure relative to the nonatomic part of $p$, then

$$P\left[\lim_{n\to\infty} \sup_{A \in \mathcal{C}} |p_n(A) - p(A)| = 0\right] = 1.$$

We shall not present a proof of this result, since it needs several other auxiliary facts related to convergence in distribution, which have not been established thus far. However, this result, just as the preceding one, also starts its analysis from the basic SLLN for its probabilistic part.

(B) Density Estimation

Another application of this idea is to estimate the probability density, by a method that is essentially due to Parzen (1962).

Suppose that $P[X < x] = F_X(x)$ is absolutely continuous relative to the Lebesgue measure on $\mathbb{R}$, with density $f(u) = (dF_X/dx)(u)$, and one wants to find an "empiric density" of $f(\cdot)$ in the manner of the Glivenko-Cantelli theorem. One might then consider the "empirical density"

$$f_n(x, h) = \frac{F_n(x + h) - F_n(x - h)}{2h},$$

and find conditions for $f_n(x, h) \to f(x)$ a.e. as $n \to \infty$ and $h \to 0$. In contrast to the last problem, we have two limiting processes here, which need additional work. Thus we replace $h$ by $h_n$, so that as $n \to \infty$, $h_n \to 0$. Since $F_n(x)$ itself is an $\omega$-function, we still need extra conditions. Writing $\hat f_n(x)$ for $f_n(x, h_n)$, this quotient is of the form

$$\hat f_n(x) = \frac{1}{n h_n} \sum_{j=1}^n K\left(\frac{x - X_j}{h_n}\right),$$

for a suitable nonnegative function $K(\cdot)$, called a kernel. The approximations employed in Fourier integrals [cf. Bochner (1955), Chapter I] give us some clues. Examples of kernels $K(t)$ are (i) $e^{-t^2}$, (ii) $e^{-t}\chi_{[t \ge 0]}$, (iii) $\chi_{[0,1]}$, and (iv) $1/(1 + t^2)$ (suitably normalized to integrate to one). In this way we arrive at the following result of Parzen. [Actually he assumed a little more on $K$, namely, that $\hat K$, the Fourier transform of $K$, is also absolutely integrable, so that the examples (ii) and (iii) are not admitted. These are included in the following result. However, the ideas of proof are essentially his.]

Theorem 3 Let $X_1, X_2, \ldots$ be independent identically distributed random variables on $(\Omega, \Sigma, P)$ whose common distribution admits a uniformly continuous density $f$ relative to the Lebesgue measure on the line. Suppose that $K : \mathbb{R} \to \mathbb{R}^+$ is a bounded continuous function, except for a finite set of discontinuities, satisfying the conditions: (i) $\int_{\mathbb{R}} K(t) \, dt = 1$ and (ii) $|tK(t)| \to 0$ as $|t| \to \infty$. Define the "empiric density" $f_n : \mathbb{R} \times \Omega \to \mathbb{R}^+$ by

$$f_n(x) = f_n(x, \omega) = \frac{1}{n h_n} \sum_{j=1}^n K\left(\frac{x - X_j(\omega)}{h_n}\right), \tag{11}$$

where $h_n$ is a sequence of numbers such that $n h_n^2 \to \infty$, but $h_n \to 0$ as $n \to \infty$. Then

$$P\left[\lim_{n\to\infty} \sup_{-\infty < x < \infty} |f_n(x) - f(x)| = 0\right] = 1. \tag{12}$$

Proof The argument here is somewhat different from the previous one, and it will be presented again in steps for convenience. As usual, let $E$ be the expectation operator.

1. Consider, using the i.i.d. hypothesis,

$$g_n(x) = E(f_n(x)) = \frac{1}{h_n} \int_{\mathbb{R}} K\left(\frac{x - t}{h_n}\right) f(t) \, dt =: v_n(x). \tag{13}$$

We assert that $g_n(x) \to f(x)$ uniformly in $x \in \mathbb{R}$ as $n \to \infty$; thus it suffices to show that $v_n(x) \to f(x)$ uniformly. Since $f$ is assumed to be uniformly continuous and integrable (and a probability density), it is easily seen that $f$ is also bounded. Thus consider, by (i),

$$|v_n(x) - f(x)| = \left|\frac{1}{h_n} \int_{\mathbb{R}} K\left(\frac{t}{h_n}\right) [f(x - t) - f(x)] \, dt\right|.$$

But, given $\varepsilon > 0$, there is a $\delta = \delta_\varepsilon > 0$ such that $|f(x - t) - f(x)| < \varepsilon$ for $|t| < \delta$, by the uniform continuity of $f$. Thus

$$|v_n(x) - f(x)| \le \varepsilon \int_{\mathbb{R}} K\left(\frac{t}{h_n}\right) \frac{dt}{h_n} + \int_{[\,|t| \ge \delta\,]} \frac{1}{h_n} K\left(\frac{t}{h_n}\right) f(x - t) \, dt + f(x) \int_{[\,|u| \ge \delta/h_n\,]} K(u) \, du$$

$$\le \varepsilon + \sup_{|u| \ge \delta/h_n} |u K(u)| \cdot \frac{1}{\delta} \int_{\mathbb{R}} f(t) \, dt + f(x) \int_{[\,|u| \ge \delta/h_n\,]} K(u) \, du,$$

since $f$ is bounded. Letting $n \to \infty$, so that $h_n \to 0$, by (i) and (ii) both the second and third terms go to zero. Since the right side is independent of $x$, it follows that $v_n(x) \to f(x)$ uniformly in $x$, as $n \to \infty$.

2. We now use a result from Fourier transform theory. It is the following. Let $\hat K(u) = \int_{\mathbb{R}} e^{iux} K(x) \, dx$; then one has the inversion, in the sense that for almost every $x \in \mathbb{R}$ (i.e., except for a set of Lebesgue measure zero)

$$K(x) = \lim_{\delta\to\infty} \frac{1}{2\pi} \int_{-\delta}^{\delta} \left(1 - \frac{|u|}{\delta}\right) e^{-iux} \hat K(u) \, du. \tag{16}$$

Results of this type for distribution functions, called "inversion formulas," will be established in Chapter 4. If $\hat K$ is assumed integrable, then the above integral can be replaced by $(1/2\pi) \int_{\mathbb{R}} e^{-iux} \hat K(u) \, du = K(x)$ a.e., so (16) is the (C,1)-summability result for integrals, an exact analog of the one for series that we noted in the preceding step.

Let $\phi_n(u) = (1/n) \sum_{j=1}^n e^{iuX_j}$. Then $e^{iuX_j} = \cos uX_j + i \sin uX_j$ is a bounded complex random variable and, for different $j$, these are identically distributed. Thus applying the SLLN to the real and imaginary parts, we get

$$\lim_{n\to\infty} \phi_n(u) = E(\phi_1(u)) = E(e^{iuX_1}) \quad \text{a.e.} [P]. \tag{17}$$

If $\omega \in \Omega$ is arbitrarily fixed, then $\phi_n(u)$ can be regarded as

$$\phi_n(u) = \int_{\mathbb{R}} e^{iux} \, dF_n(x, \omega),$$

where $F_n$ is the empiric distribution of the $X_j$. Now using the "inversion formula" (16) for $K$, we can express $f_n$ as follows:

$$f_n(x)(\omega) = \frac{1}{n h_n} \sum_{j=1}^n K\left(\frac{t_j}{h_n}\right), \quad \text{with } t_j = x - X_j(\omega),$$

$$= \lim_{a\to\infty} \frac{1}{2\pi} \int_{-a}^{a} \left(1 - \frac{|u|}{a}\right) e^{-iux} \hat K(u h_n) \phi_n(u)(\omega) \, du \tag{19}$$

[by the inversion formula, applied at each point $t_j/h_n$ with $\delta = h_n a$, a.e. (Lebesgue measure)]. We need this formula to get uniform convergence of $f_n(x)$ to $f(x)$.

3. The preceding work can be used in our proof in the following manner. By Markov's inequality,

$$P\left[\limsup_n \sup_x |f_n(x) - f(x)| > \varepsilon\right] \le \frac{1}{\varepsilon} E\left(\limsup_n \sup_x |f_n(x) - f(x)|\right), \tag{20}$$

where the limit can be brought outside of the $P$-measure by Fatou's lemma. (Note that the sup inside the square brackets is bounded by hypothesis and is a measurable function, by the same argument as in step 2 of the proof of Theorem 1. The existence of the limit in (20) will be proved.) We now show that the right side of (20) is zero, so that (12) results. But if $\|\cdot\|_u$ is the uniform (or supremum) norm over $\mathbb{R}$, then

$$\|f_n - f\|_u \le \|f_n - g_n\|_u + \|g_n - f\|_u, \tag{21}$$

and $x \mapsto g_n(x) = E(f_n(x))$ is a fixed function (independent of $\omega$). By step 1, the last term goes to zero as $n \to \infty$, and hence its expectation will go to zero by the dominated convergence theorem, since the terms are bounded. Thus it suffices to show that the expectation of the first term also tends to zero. Consider

$$\|f_n - g_n\|_u = \sup_x \left|\lim_{a\to\infty} \frac{1}{2\pi} \int_{-a}^{a} \left(1 - \frac{|u|}{a}\right) e^{-iux} \hat K(u h_n) [\phi_n(u) - E(\phi_n(u))] \, du\right|,$$

where we used (19) and the fact that $g_n(x) = E(f_n(x))$, which is again obtained from (19) with $E(\phi_n(u))$ in place of $\phi_n(u)$. With the same computation, first using the Fubini theorem and then the dominated convergence theorem to interchange integrals on $[-a, a] \times \Omega$, we can pass to the limit as $a \to \infty$ through a sequence under the expectation. Thus

$$\|f_n - g_n\|_u \le \lim_{a\to\infty} \frac{1}{2\pi} \int_{-a}^{a} \left(1 - \frac{|u|}{a}\right) |\hat K(u h_n)|\, |\phi_n(u) - E(\phi_n(u))| \, du. \tag{22}$$

But by (17), $|\phi_n(u) - E(\phi_n(u))| = |\phi_n(u) - E(e^{iuX_1})| \to 0$ a.e., and since these quantities are bounded, this is also true boundedly. Thus by letting $n \to \infty$ on both sides of (22), and noting that the limits on $a$ and $n$ are on independent sets, it follows that the right side of (22) is zero a.e. By the uniform boundedness of the left-side norms in (22), we can take expectations, and the result is zero.

Thus $E(\|f_n(\cdot) - f(\cdot)\|_u) \to 0$ as $n \to \infty$, and the right side of (20) is zero. This completes the proof.

Remark Evidently, instead of (17), even the WLLN is sufficient for (22). Also, using the CBS-inequality in (22) and taking expectations, one finds that $\mathrm{Var}(\phi_n) \le M_1/n$, and this yields the same conclusion without even using the WLLN. (However, this last step is simply the proof of the WLLN, as given by Čebyšev.) It is clear that considerable analysis is needed in these results, after employing the probabilistic theorems in key places. Many of the applications use such procedures.

(C) Queueing

We next present a typical application to queueing theory. Such a result was originally considered by A. Kolmogorov in 1936 and is equivalent to a one-server queueing model. It admits extensions and raises many other problems. The formulation using the current terminology appears to be due to D. V. Lindley.

A general queueing system consists of three elements: (i) customers, (ii) service, and (iii) a queue. These are generic terms; they can refer to people at a service counter, or planes or ships arriving at a port facility, etc. The arrival of customers is assumed to be random, and the same is true of the service times as well as the waiting times in a queue. Let $a_k$ be the interarrival time between the $k$th and the $(k+1)$th customer, $b_k$ the service time, and $W_k$ the waiting time of the $k$th customer. When customer one arrives, we assume that there is no waiting, since there is nobody ahead of this person. Thus it is reasonable to assume $a_0 = W_0 = 0$. Now $b_k + W_k$ is the length of time that the $(k+1)$th customer has to wait in the queue before the turn comes at the service counter. We assume that the interarrival times $a_k$ are independent nonnegative random variables with a common distribution, and similarly, the $b_k$ are nonnegative i.i.d. and independent of the $a_k$. As noted before, we can assume that the basic probability space is rich enough to support such independent sequences, as otherwise we can enlarge it by adjunction to accomplish this. The waiting times are also positive random variables. If $a_{k+1} > b_k + W_k$, then the $(k+1)$th customer obviously does not need to wait on arrival, but if $a_{k+1} \le b_k + W_k$, then the person has to wait $b_k + W_k - a_{k+1}$ units of time. Thus

$$W_{k+1} = \max(b_k + W_k - a_{k+1}, 0), \quad k \ge 0. \tag{23}$$

If we let $X_k = b_{k-1} - a_k$, then the $X_k$ are i.i.d. random variables, and (23) becomes $W_0 = 0$ and $W_{k+1} = \max(W_k + X_{k+1}, 0)$, $k \ge 0$. Note that whenever $W_k = 0$ for some $k$, the server is free and the situation is like the one at the beginning, so that we have a recurrent pattern. This recurrence is a key ingredient of the solution of the problem of finding the limiting behavior of the $W_k$-sequence. It is called the single server queueing problem.
Consider $S_0 = 0$, $S_n = \sum_{k=1}^n X_k$. Then the sequence $\{S_n, n \ge 0\}$ is also said to perform a random walk on $\mathbb{R}$, and if $S_k \in A$ for some $k \ge 0$ and a Borel set $A$, one says that the walk $S_n$ visits $A$ at step $k$. In the queueing situation, we have the following statement about the process $\{W_n, n \ge 0\}$.

Theorem 4 Let $X_k = b_{k-1} - a_k$, $k \ge 1$, and $\{S_n, n \ge 0\}$ be as above. Then for each $n \ge 0$, the quantities $W_n$ and $M_n = \max\{S_j, 0 \le j \le n\}$ are identically distributed random variables. Moreover, if $F_n(x) = P[W_n < x]$, then

$$\lim_{n\to\infty} F_n(x) = F(x)$$

exists for each $x$, but $F(x) \equiv 0$ is possible. If $E(X_1)$ exists, then $F(x) = 0$, $x \in \mathbb{R}$, whenever $E(X_1) \ge 0$, and $F(\cdot)$ defines an honest distribution function when $E(X_1) < 0$, i.e., $F(+\infty) = 1$.

The last statement says that if $E(b_k) \ge E(a_k)$, $k \ge 1$, so that the expected service time is not smaller than that of the interarrival time, then the line of customers is certain to grow longer without bound (i.e., with probability 1).

Proof For the first part of the proof we follow Feller (1966), even though it can also be proved by using the method of convolutions and the fact that $W_k$ and $X_{k+1}$ are independent. The argument to be given is probabilistic and has independent interest.

Since $W_0 = 0 = S_0$, we may express $W_n$ in an alternative form as $W_n = \max\{(S_n - S_k) : 0 \le k \le n\}$. In fact, this is trivial for $n = 0$; suppose it is verified for $n = m$. Then consider the case $n = m + 1$. Writing $S_{m+1} - S_k = S_m - S_k + X_{m+1}$, we have, with $\vee$ for "max,"

$$W_{m+1} = (W_m + X_{m+1}) \vee 0 = \left(\max_{0 \le k \le m} (S_m - S_k) + X_{m+1}\right) \vee 0 = \max_{0 \le k \le m+1} (S_{m+1} - S_k).$$

Hence the statement is true for all $m \ge 0$. On the other hand, $X_1, \ldots, X_n$ are i.i.d. random variables. Thus the joint distribution of $X_1, X_2, \ldots, X_n$ is the same as that of $X_1', X_2', \ldots, X_n'$, where $X_1' = X_n$, $X_2' = X_{n-1}, \ldots, X_n' = X_1$. But the joint distribution of $S_0', S_1', \ldots, S_n'$, where $S_k' = \sum_{i=1}^k X_i'$ ($S_0' = 0$), and that of $S_0, S_1, \ldots, S_n$ must also be the same. This in turn means, on substituting the unprimed variables, that $S_0, S_1, \ldots, S_n$ and $S_0' = 0$, $S_1' = S_n - S_{n-1}$, $S_2' = S_n - S_{n-2}, \ldots, S_n' = S_n - S_0$ are identically distributed. Putting these two facts together, we get that $\max_{0 \le k \le n} S_k'$ and $\max_{0 \le k \le n} (S_n - S_k) = W_n$ are identically distributed. But the $S_0', S_1', \ldots, S_n'$ and $S_0, S_1, \ldots, S_n$ were noted to have the same distribution, so that $\max_{0 \le k \le n} S_k' \stackrel{D}{=} \max_{0 \le k \le n} S_k$, or $M_n$ and $W_n$ have the same distribution. This is the first assertion, in which we only used the i.i.d. property of the $X_n$ but not the fact that $X_n = b_{n-1} - a_n$.

The above analysis implies

$$F_n(x) = P[W_n < x] = P[M_n < x].$$

But

$$[M_{n+1} < x] \subset [M_n < x],$$

so that

$$F(x) = \lim_{n\to\infty} F_n(x) = P\left[\lim_{n\to\infty} \max_{0 \le k \le n} S_k < x\right] = P\left[\sup_{k \ge 0} S_k < x\right] \tag{25}$$
exists, and $0 \le F(x) \le 1$, $x \in \mathbb{R}$. Clearly $F(x) = 0$ for $x \le 0$. On the other hand, if $E(|X_1|) = \infty$, then by Theorem 3.7, $\limsup_n |S_n| = +\infty$ a.e., which implies (since $S_0 = 0$) that either $\limsup_n S_n = +\infty$ a.e., so that $\sup_n S_n = +\infty$ a.e., or this can happen only with probability zero. Since $F(x) = 0$ for $x \le 0$, we only need to consider $x > 0$. Thus if $\sup_n S_n = +\infty$ a.e., then $1 - F(x) = P[\sup_{k \ge 0} S_k \ge x] = 1$, so that $F(x) = 0$, $x \in \mathbb{R}$. If $\sup_n S_n < \infty$ a.e., then $\lim_{x\to\infty} F(x) = 1$ and $F$ is a distribution function. Note that, since $\sup_{n \ge 0} S_n$ is a symmetric function of the random variables $X_n$, which are i.i.d., we can deduce that $\sup_n S_n = \infty$ has probability zero or one by Theorem 1.12, so that (25) can be obtained in this way also.

Suppose that $E(|X_1|) < \infty$. Then we consider the cases (i) $E(X_1) > 0$, (ii) $E(X_1) < 0$, and (iii) $E(X_1) = 0$ separately for calculating the probability of $A_x$, where

$$A_x = \left[\sup_{n \ge 0} S_n < x\right].$$

Case (i): $\mu = E(X_1) > 0$: By the SLLN, $S_n/n \to E(X_1)$ a.e., so that for sufficiently large $n$, $S_n > nE(X_1)/2 \to \infty$ a.e. Thus

$$P(A_x) = 0$$

for any $x \in \mathbb{R}^+$, and hence $F(x) = 0$, $x \in \mathbb{R}^+$, in this case.

Case (ii): μ = E(X_1) < 0: Again by the SLLN, S_n/n → E(X_1) a.e., and given ε > 0 and δ > 0, one can choose N_{εδ} such that n ≥ N_{εδ} implies

    P[ |S_m/m - E(X_1)| < ε for all m ≥ n ] > 1 - δ.   (26)

This may be expressed in the following manner. Let ε > 0 be small enough so that E(X_1) + ε < 0. Then for 0 < δ < 1/2, choose N_{εδ} such that, with (26),

    P[ S_n < 0 for all n ≥ N_{εδ} ] > 1 - δ.   (27)

For this N_{εδ}, consider the finite set S_1, S_2, ..., S_{N_{εδ}-1}. Since these are real random variables, we can find an x_δ ∈ ℝ such that x ≥ x_δ implies

    P[ S_1 < x, ..., S_{N_{εδ}-1} < x ] > 1 - δ.   (28)
If now

    B_x = [ S_n < x, 1 ≤ n ≤ N_{εδ} - 1 ] ∩ [ S_n < 0, n ≥ N_{εδ} ],

then A_x ⊃ B_x for x > 0. Hence we have

    F(x) = P(A_x) ≥ P(B_x) ≥ P[S_n < x, 1 ≤ n ≤ N_{εδ} - 1] + P[S_n < 0, n ≥ N_{εδ}] - 1
         > 2(1 - δ) - 1 = 1 - 2δ   [by (27) and (28)].

Since 0 < δ < 1/2 is arbitrary, we conclude that lim_{x→∞} F(x) = 1, and hence F gives an honest distribution in this case.

Case (iii): E(X_1) = 0: Now S_n = Σ_{k=1}^n X_k, n ≥ 1, is a symmetrically dependent sequence of random variables and S_0 = 0. Thus sup_{n≥0} S_n ≥ 0 a.e., and since we can assume that X_1 ≢ 0 a.e., the S_n do not all vanish identically a.e. Consider the r.v. Y = lim sup_n S_n. Then Y [= Y(S_n, n ≥ 1)] is symmetrically dependent on the S_n and is measurable for the tail σ-algebra. Hence, by Theorem 1.12, it is a constant = k_0 a.e. It will be seen later (cf. Theorem 8 below) that, since S_n/n → 0 a.e. by the SLLN, S_n takes both positive and negative values infinitely often. Thus k_0 ≥ 0. But then

    0 ≤ Y = lim sup_{n≥1} S_n = lim sup_{n≥1} (X_1 + ... + X_n) = X_1 + lim sup_{n≥2} (X_2 + ... + X_n).   (29)

Since Y = k_0 a.e. and X_1 is a real nonzero r.v., (29) can hold only if k_0 = +∞.
Now [lim sup_{n≥1} S_n = +∞] ⊂ [sup_{n≥0} S_n = +∞], and so we are back in the situation treated in case (i), i.e., F(x) ≡ 0, x ∈ ℝ. This completes the proof of the theorem.

The preceding result raises several related questions, some of which are the following. When E(X_1) < 0, we saw that the waiting times W_n → W in distribution, where W is a (proper) r.v. Thus, in this case, if Q_n is the number of customers in the queue when the service of the nth customer is completed, then Q_n is an r.v. But then what is the distribution of Q_n, and does Q_n converge in distribution to some Q? Since Q_n is no more than k iff the completion of the nth customer's service time is no later than the interarrival times of the next k customers, we get

    P[Q_n ≤ k] = P[ W_n + b_n ≤ a_{n+1} + ... + a_{n+k} ].   (30)

The random variables on the right are all independent, and thus this may be calculated explicitly in principle. Moreover, it can be shown, since W_n → W in distribution and the b_n and a_n are identically distributed, that Q_n → Q in distribution from this expression.
Other questions, such as the distribution of the times that W_n = 0, suggest themselves. Many of these results use some properties of convolutions of the image measures (i.e., distribution functions) on ℝ, and we shall omit consideration of these specializations here.
All of the above discussions concerned a single-server queueing problem. But what about the analogous problem with many servers? This is more involved. The study of these problems has branched out into a separate discipline because of its great usefulness in real applications. Here we consider only one other aspect of the above result.
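The trichotomy just proved is easy to observe numerically. The following sketch simulates the recursion W_{n+1} = (W_n + X_{n+1}) ∨ 0 directly; the exponential service and interarrival distributions, their rates, and the sample sizes are merely assumed for definiteness and are not part of the theorem.

    import random

    def waiting_times(n, service_rate, arrival_rate, seed=0):
        """Simulate W_{k+1} = max(0, W_k + X_{k+1}), X_k = b_{k-1} - a_k,
        with exponential service times b and interarrival times a."""
        rng = random.Random(seed)
        w, out = 0.0, []
        for _ in range(n):
            b = rng.expovariate(service_rate)   # service time b_{k-1}
            a = rng.expovariate(arrival_rate)   # interarrival time a_k
            w = max(0.0, w + (b - a))           # the Lindley-type recursion
            out.append(w)
        return out

    # E(X_1) = 1/service_rate - 1/arrival_rate
    print(waiting_times(10**5, 2.0, 1.0)[-1])   # E(X_1) < 0: W_n stays bounded
    print(waiting_times(10**5, 1.0, 2.0)[-1])   # E(X_1) > 0: W_n drifts to +infinity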

(D) Fluctuation Phenomena

In Theorem 4 we saw that the behavior of the waiting time sequence is governed by S_n = Σ_{k=1}^n X_k, the sequence of partial sums of i.i.d. random variables. In Section 2 we considered the convergence of sums of general independent random variables, but the surprising behavior of i.i.d. sums was not analyzed more thoroughly. Such a sequence is called a random walk. Here we include an introduction to the subject that will elaborate on the proof of Theorem 4 and complete it. The results are due to Chung and Fuchs. We refer to Chung (1974). For a detailed analysis of the subject, and its relation to the group structure of the range space, see Spitzer (1964).
Thus if X_n, n ≥ 1, are i.i.d., and {S_n = Σ_{k=1}^n X_k, n ≥ 1} is a random walk sequence, let Y = lim sup_n S_n. We showed in the proof of Theorem 4 [Case (iii)] that Y = X_1 + Y and Y is a "permutation invariant" r.v. Then this equation implies Y = k_0 a.e. (= ±∞ possibly), by the Hewitt-Savage zero-one law. If X_1 = 0 a.e., then by the i.i.d. condition all X_n = 0 a.e., so that S_n = 0 for all n (and Y = 0 a.e.). If X_1 ≢ 0 a.e., then k_0 = -∞ or +∞ only. If k_0 = -∞, then clearly -∞ ≤ lim inf_n S_n ≤ lim sup_n S_n = -∞, so that lim_{n→∞} S_n = -∞ a.e.; or if k_0 = +∞, then lim inf_n S_n can be +∞, in which case S_n → +∞ a.e., or lim inf_n S_n = -∞ < lim sup_n S_n = +∞. Since lim sup_n (S_n) = -lim inf_n (-S_n), no other possibilities can occur. In the case -∞ = lim inf_n S_n < lim sup_n S_n = +∞ a.e. (the interesting case), we can look into the behavior of {S_n, n ≥ 1} and analyze its fluctuations.
A state x ∈ ℝ is called a recurrent point of the range of the sequence if for each ε > 0, P[|S_n - x| < ε, i.o.] = 1, i.e., the random walk visits x infinitely often with probability one. Let R be the set of all recurrent points of ℝ. A point y ∈ ℝ is termed a possible value of the sequence if for each ε > 0 there is a k such that P[|S_k - y| < ε] > 0. We remark that by Cases (i) and (ii) of the proof of Theorem 4, if E(X_1) > 0 or < 0, then lim_{n→∞} S_n = +∞ or = -∞, respectively. Thus fluctuations show up only in the case E(X_1) = 0, when the expectation exists. However, E(|X_1|) < ∞ will not be assumed for the present discussion.

Theorem 5 For the random walk {S_n, n ≥ 1}, the set R of recurrent values (or points) has the following description: Either R = ∅ or R ⊂ ℝ is a closed subgroup. In the case R ≠ ∅, R = {0} iff X_1 = 0 a.e., and if X_1 ≠ 0 a.e., we have either R = ℝ or else R = {nd : n = 0, ±1, ±2, ...}, the infinite cyclic group generated by a number d > 0.

Proof Suppose R ≠ ∅. If x_n ∈ R and x_n → x ∈ ℝ, then given ε > 0 there is n_ε such that n ≥ n_ε ⟹ |x_n - x| < ε/2. Fixing such an n, the recurrence of x_n gives |S_m(ω) - x_n| < ε/2 for infinitely many m, for almost all ω, and hence, with I = (x - ε, x + ε), P[S_m ∈ I, i.o.] = 1. Since ε > 0 is arbitrary, x ∈ R, and so R is closed.
To prove the group property, let x ∈ R and let y ∈ ℝ be a possible value of the random walk. We claim that x - y ∈ R. Indeed, for each ε > 0 choose m such that P[|S_m - y| < ε] > 0. Since x is recurrent, P[|S_n - x| < ε, i.o.] = 1, or equivalently P[|S_n - x| < ε for finitely many n only] = 0. Note that [|S_n - x| < ε for finitely many n] = [|S_n - x| ≥ ε for all but finitely many n], and that on the event [|S_m - y| < ε] one has |S_{m+n} - S_m - (x - y)| < 2ε whenever |S_{m+n} - x| < ε. Let us therefore consider

    P[|S_n - x| < ε, finitely often] ≥ P[|S_m - y| < ε] · P[|S_{m+n} - S_m - (x - y)| < 2ε, finitely often]   (31)

(by the independence of S_m and S_{m+n} - S_m).

By hypothesis P[|S_m - y| < ε] > 0, and this shows that the second factor of (31) is zero. But by the i.i.d. hypothesis, S_n and S_{m+n} - S_m have the same distribution. Hence P[|S_n - (x - y)| < 2ε, finitely many n] = 0, and x - y ∈ R. Since y = x is a possible value, 0 ∈ R always, and x - (x - y) = y ∈ R. Similarly 0 - y ∈ R, and so R is a group. As is well known, the only closed subgroups of ℝ are those of the form stated in the theorem,² and R = {0} if X_1 = 0 a.e. In the case that X_1 ≠ 0 a.e., there is a possible value y ≠ 0 of the random walk, and y ∈ R by the above analysis. Thus R = {0} iff X_1 = 0 a.e. It is of interest also to note that unless the values of the r.v. X_1 are of the form nd, n = 0, ±1, ±2, ..., R = ℝ itself. This completes the proof.

It is clear from the above result that 0 plays a key role in the recurrence phenomenon of the random walk. A characterization of this is available:

² Indeed, if R ≠ ∅: because R is a closed subgroup of ℝ, let d = inf{x ∈ R : x > 0}. Then d ≥ 0 and there exist d_n ∈ R, d_n ↓ d. If d = 0, we can verify that {kd_n : k = 0, ±1, ±2, ...; n ≥ 1} is dense in ℝ and contained in R, so that R = R̄ = ℝ. If d > 0, then {nd : n = 0, ±1, ...} ⊂ R and is all of R. There are no other kinds of closed subgroups. Note that if R ≠ ∅, every possible value is also a recurrent value of the random walk.

Theorem 6 Let {X_n, n ≥ 1} be i.i.d. random variables on (Ω, Σ, P) and let {S_n, n ≥ 0} be the corresponding random walk sequence. If for an ε > 0 we have

    Σ_{n=1}^∞ P[|S_n| < ε] < ∞,   (32)
then 0 is not a recurrent value of {S_n, n ≥ 0}. If, on the other hand, for every ε > 0 it is true that the series in (32) diverges, then 0 is recurrent. [It follows from (36) below that if the series (32) diverges for one ε > 0, then the same is true for all ε > 0.]

Proof If the series in (32) converges, then the first Borel-Cantelli lemma implies P[|S_n| < ε, finitely often] = 1, so that 0 ∉ R. The second part is harder, since the events {[|S_n| < ε], n ≥ 1} are not independent. Here one needs to show that P[|S_n| < ε, i.o.] = 1. We consider the complementary event and verify that it has probability zero, after using the structure of the S_n sequence.
Consider, for any fixed k ≥ 1, the event A_m^k defined as

    A_m^k = [ |S_m| < ε, |S_n| ≥ ε for all n ≥ m + k ].   (33)


Then A_m^k is the event that the S_n do not visit (-ε, ε) after the (m + k - 1)th trial, but visit it at the mth trial [from the (m+1)th to the (m+k-1)th trials they may or may not visit]. Hence A_m^k, A_{m+k}^k, A_{m+2k}^k, ... are disjoint events for m ≥ 1 and fixed k ≥ 1. Thus

    Σ_{m=1}^∞ P(A_m^k) ≤ 2k.   (34)

But for each k ≥ 1, [|S_m| < ε] and [|S_n - S_m| ≥ 2ε, n ≥ m + k] are independent, and A_m^k ⊃ [|S_m| < ε] ∩ [|S_n - S_m| ≥ 2ε, n ≥ m + k], k ≥ 1, since |S_n| ≥ |S_n - S_m| - |S_m| ≥ 2ε - ε = ε on the displayed set. Hence, with independence, (34) becomes

    Σ_{m=1}^∞ P[|S_m| < ε] · P[|S_n - S_m| ≥ 2ε, n ≥ m + k] ≤ 2k.

But

    P[ |S_n - S_m| ≥ 2ε, n ≥ m + k ] = P[ |S_n| ≥ 2ε, n ≥ k ]   (by the i.i.d. condition).
Hence

    Σ_{m=1}^∞ P[|S_m| < ε] · P[|S_n| ≥ 2ε, n ≥ k] ≤ 2k.

Since we may take the second factor on the left out of the summation, and since the sum is divergent by hypothesis, we must have P[|S_n| ≥ 2ε, n ≥ k] = 0 for each k. Hence, taking the limit as k → ∞, we get

    P[ |S_n| ≥ 2ε, finitely often? no: for all large n ] = P[ ⋃_{k≥1} [|S_n| ≥ 2ε, n ≥ k] ] = 0,

or P[|S_n| < 2ε, i.o.] = 1; since ε > 0 is arbitrary, P[|S_n| < ε, i.o.] = 1 for any ε > 0. This means 0 ∈ R, and completes the proof of the theorem.

Suppose, in the above, that the X_n : Ω → ℝ^k are i.i.d. random vectors and S_n = Σ_{i=1}^n X_i. If |X_i| is interpreted as the maximum absolute value of the k components of X_i, and "S_n visits (-ε, ε)" means that it visits the cube (-ε, ε)^k ⊂ ℝ^k (i.e., |S_n| < ε), then the preceding proof holds verbatim for the k-dimensional random variables, and establishes the corresponding result for the k-dimensional random walk. We state the result for reference as follows:

Theorem 7 Let {X_n, n ≥ 1} be i.i.d. k-vector random variables on (Ω, Σ, P) and S_n = Σ_{i=1}^n X_i, S_0 = 0, where k ≥ 1. Then 0 is a recurrent value of the k-random walk {S_n, n ≥ 0} iff for any ε > 0,

    Σ_{n=1}^∞ P[|S_n| < ε] = ∞.   (35)

Moreover, the set of all recurrent values R forms a closed subgroup of the additive group ℝ^k.

The proof of the last statement is the same as that for Theorem 5, which has a more precise description of R in the case k = 1.
If R = ∅, then the random walk is called transient, and it is termed recurrent (or persistent) if R ≠ ∅.
We can now present a sufficient condition for the recurrence of a random walk, and this completes the proof of case (iii) of Theorem 4.

Theorem 8 Let S_n = X_1 + ... + X_n, with {X_n, n ≥ 1} i.i.d., be a (real) random walk sequence on (Ω, Σ, P) such that S_n/n → 0 in probability. Then the walk is recurrent.

Remark As noted prior to Theorem 3.2, this condition holds for certain symmetric random variables without the existence of the first moment. On the other hand, if E(|X_1|) < ∞ (with E(X_1) = 0), then it always holds, by the WLLN (or SLLN).
We shall establish the result with the weaker hypothesis as stated. The proof uses the linear order structure of the range of S_n. Actually the result itself is not valid in higher dimensions (≥ 3). It is true in two dimensions, but needs a different method with characteristic functions (cf. Problem 21).

Proof We first establish an auxiliary inequality, namely, for each ε > 0,

    Σ_{m=0}^{r} P[|S_m| < kε] ≤ 2k Σ_{m=0}^{r} P[|S_m| < ε],   r, k ≥ 1 integers.   (36)

If this is granted, the result can be verified (using an argument essentially due to Chung and Ornstein (1962)) as follows: We want to show that (32) fails. Thus for any integer b > 0, let r = kb in (36). Then

    Σ_{m=0}^{kb} P[|S_m| < ε] ≥ (1/2k) Σ_{m=0}^{kb} P[|S_m| < kε] ≥ (1/2k) Σ_{m=1}^{kb} P[|S_m|/m < ε/b],   (37)

because (m/b) ≤ k, so that mε/b ≤ kε. By hypothesis S_m/m → 0 in probability, so that P[|S_m|/m < ε/b] → 1 as m → ∞. By the (C,1)-summability,

    (1/(kb)) Σ_{m=1}^{kb} P[|S_m|/m < ε/b] → 1   as k → ∞.

Hence (37) becomes, on letting k → ∞,

    Σ_{m=0}^{∞} P[|S_m| < ε] ≥ lim_{k→∞} (kb)/(2k) = b/2.

Since b > 0 is arbitrary, (32) fails for each ε > 0, and so {S_n, n ≥ 1} is recurrent.
It remains to establish (36). Consider, for each integer m, [mε ≤ S_n < (m+1)ε] and write it as a disjoint union:

    [mε ≤ S_n < (m+1)ε] = ⋃_{k=0}^{n} ( [mε ≤ S_n < (m+1)ε] ∩ A_k ),   (38)

where A_0 = [mε ≤ S_0 < (m+1)ε] and, for k ≥ 1, A_k = [S_k ∈ [mε, (m+1)ε), S_j ∉ [mε, (m+1)ε), 0 ≤ j ≤ k-1]. Thus A_k is the ω-set for which S_k enters the interval [mε, (m+1)ε) for the first time. Then
    Σ_{n=0}^{r} P[mε ≤ S_n < (m+1)ε]
       ≤ Σ_{n=0}^{r} Σ_{k=0}^{n} P( A_k ∩ [|S_n - S_k| < ε] )
             [since on A_k, mε ≤ S_n and S_k < (m+1)ε imply |S_n - S_k| < ε]
       = Σ_{n=0}^{r} Σ_{k=0}^{n} P(A_k) P[|S_n - S_k| < ε]
             (since A_k is determined by X_1, ..., X_k and hence is independent of S_n - S_k for n ≥ k)
       = Σ_{n=0}^{r} Σ_{k=0}^{n} P(A_k) P[|S_{n-k}| < ε]   (by the i.i.d. property)
       ≤ Σ_{j=0}^{r} P[|S_j| < ε]   (since the A_k are disjoint).   (39)

Summing for m = -k to k - 1, we get

    Σ_{n=0}^{r} P[|S_n| < kε] ≤ 2k Σ_{j=0}^{r} P[|S_j| < ε].

This proves the inequality (36), and hence also the theorem.
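The series criterion just established can also be probed by simulation: one estimates Σ_{n≤N} P[|S_n| < ε] by Monte Carlo and watches whether the partial sums keep growing. In the following sketch the normal step distribution, the value of ε, and the sample sizes are all assumptions made only for illustration.

    import random

    def visit_series(drift, eps=0.5, n_steps=2000, n_paths=400, seed=1):
        """Monte Carlo estimate of sum_{n <= n_steps} P[|S_n| < eps]."""
        rng = random.Random(seed)
        total = 0.0
        for _ in range(n_paths):
            s, hits = 0.0, 0
            for _ in range(n_steps):
                s += rng.gauss(drift, 1.0)   # step X_k, normal with mean `drift`
                if abs(s) < eps:
                    hits += 1
            total += hits
        return total / n_paths

    print(visit_series(0.0))   # keeps growing with n_steps: the series diverges (recurrent)
    print(visit_series(0.5))   # levels off as n_steps grows: the series converges (transient)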

It is now natural to investigate several other properties of recurrent random walks, such as the distribution of the first entrance time T_A of the process into a given Borel set A ⊂ ℝ, finding conditions on X in order that E(T_A) < ∞ or = ∞, and P[T_A < ∞] = 1. Conversely, the recurrence and transience of a random walk determines the structure of the range space ℝ or ℝ^n, or of a general locally compact group G. However, these questions need for their consideration certain analytic tools that we have not yet developed. In particular, a detailed study of characteristic functions and distribution functions is an essential first step, and this is undertaken in Chapter 4. It is then necessary to study further properties of sums of independent but not necessarily identically distributed random variables, continuing the work of Section 2. Here the most striking result, which we have not yet touched upon, is the law of the iterated logarithm. This is a strong limit theorem, based on the existence of two moments, but for its proof we also need the work on the central limit problem. Thus the results of this chapter are those obtainable only by means of the basic techniques. We need to continue expanding the subject. First a
weakening of the concept of independence is needed. Then one proceeds to a study of the central limit problem and the (distributional or) weak limit laws.

Exercises
1. (a) Let (Ω, Σ, P) be a probability space with Ω having at least three points. If X : Ω → ℝ is a random variable taking three or more distinct values, verify that 1, X, X² are linearly independent (in the sense of linear algebra), but will be stochastically independent only if X is two-valued and X² is a constant with probability 1, in which case 1, X, X² are not linearly independent. Give an example satisfying the latter conditions. On the other hand, if X, Y are stochastically independent and not both are constant, then they are linearly independent, whenever X ≠ 0 and Y ≠ 0.

(b) Consider Ω = {1,2,3,4,5} with P({i}) = 1/5 for i = 1, 2, 3, 4, 5. Is it possible to find events A, B of Ω so that A and B are independent? The answer to this simple and interesting problem is no. A probability space (Ω, Σ, P) is called a "dependent probability space" if there are no nontrivial independent events in Ω; (Ω, Σ, P) is called an independent space otherwise. R. Shiflett and H. Schultz (1979) introduced this concept, where they studied both finite and countably infinite settings for Ω. Show that if Ω is finite, so that Ω = {1, 2, ..., n} with P({i}) = 1/n for i = 1, 2, ..., n, then (Ω, Σ, P) is a dependent probability space if and only if n is prime. Additional results on finite dependent spaces with uniform probabilities can be found in the article by Shiflett and Schultz and in the work of Eisenberg and Ghosh (1987). Recently, W.F. Edwards (2004) investigated the case of the space (Ω, Σ, P) with Ω = {1, 2, 3, ...} and the measure P not uniform, as follows. Show that if Ω = {1, 2, 3, ...} with P({i}) = p_i ≥ p_{i+1} = P({i+1}) for all i and if

then (Ω, Σ, P) is an independent space. The hypothesis in this last statement is sufficient but not necessary, which can be seen by showing that if Ω = {1, 2, 3, ...} with p_i = (1 - r)r^{i-1} for 0 < r < 1, i = 1, 2, ..., then (Ω, Σ, P) is an independent space. These results give an idea of the interest that is associated with the question, "Are there necessary and sufficient conditions for a probability space (Ω, Σ, P) to be dependent?"

(c) One result without a restriction on the cardinality of Ω can be obtained by showing that (Ω, Σ, P) is an independent probability space if and only if there exists a partition of Ω into four nontrivial events A, B, C and D for which P(A)P(B) = P(C)P(D). [A related idea was considered by Chen, Rubin and Vitale (1997), who show that if the collections of pairwise independent events are identical for two measures, then the measures coincide. These are just some of the ideas associated with independent probability spaces. This type of inquiry can be continued with a serious investigation.]

2. Let φ : ℝ+ → ℝ+ be an increasing continuous convex or concave function such that φ(0) = 0, with φ(-x) = φ(x), and in the convex case φ(2x) ≤ cφ(x), x ≥ 0, 0 < c < ∞. If X_i : Ω → ℝ, i = 1, 2, are two random variables on (Ω, Σ, P) such that E(φ(X_i)) < ∞, i = 1, 2, then verify that E(φ(X_1 + X_2)) < ∞, and that the converse holds if X_1, X_2 are (stochastically) independent. [Hint: For the converse, it suffices to consider |X_2| ≥ n_0 > 1; estimate the relevant integrals on the sets A_n = [|X_2| ≥ n], n ≥ n_0 ≥ 0. Note that the converse becomes trivial if X_i ≥ 0 instead of the independence condition.]

3. The preceding problem can be strengthened if the hypothesis there is strengthened. Thus let X_1, X_2 be independent and E(X_1) = 0. If now the φ there is restricted to a continuous convex function and E(φ(X_1 + X_2)) < ∞, then E(φ(X_2)) ≤ E(φ(X_1 + X_2)). If E(X_2) = 0 is also assumed, then E(φ(X_i)) ≤ E(φ(X_1 + X_2)), i = 1, 2. [Hint: Use Jensen's inequality and the fundamental law of probability (Theorem 1.4.1) in φ(x) = φ(E(X_2 + x)) ≤ E(φ(x + X_2)), and integrate relative to dF_{X_1}(x); then use Fubini's theorem.]

4. (a) Let I = [0,1], B = Borel σ-algebra of I, and P = Lebesgue measure on I. Let X_1, ..., X_n be i.i.d. random variables on (Ω, Σ, P) with their common distribution F(x) = P[X_1 < x] = x, 0 < x ≤ 1 (and = 0 for x ≤ 0, = 1 for x > 1). Define Y_1 = min(X_1, ..., X_n), and if Y_i is defined, let Y_{i+1} = min{X_k > Y_i : 1 ≤ k ≤ n}. Then (verify that) Y_1 < Y_2 < ... < Y_n are random variables, called order statistics from the d.f. F, and are not independent. If F_{Y_1,...,Y_n} is their joint distribution, show that it is given by the density

    f_{Y_1,...,Y_n}(y_1, ..., y_n) = n!  for 0 < y_1 < y_2 < ... < y_n < 1  (and = 0 otherwise).

From this deduce that, for 0 < a < b < 1, i = 1, ..., n,

    P[a < Y_i ≤ b] = (n!/((i-1)!(n-i)!)) ∫_a^b y^{i-1} (1-y)^{n-i} dy,

and that, for 0 < a < b < c < 1, 1 ≤ i < j ≤ n,

    P[a < Y_i ≤ b, b < Y_j ≤ c] = (n!/((i-1)!(j-i-1)!(n-j)!)) ∫_a^b ∫_b^c y^{i-1} (z-y)^{j-i-1} (1-z)^{n-j} dz dy.

[Note that for 0 ≤ y_1 < y_2 < ... < y_n ≤ 1 and small enough ε_i > 0 such that the intervals [y_i, y_i + ε_i] are disjoint for 1 ≤ i ≤ n, we have

    P[y_i ≤ Y_i ≤ y_i + ε_i, 1 ≤ i ≤ n]
        = Σ_{all permutations (i_1,...,i_n) of (1,2,...,n)} P[y_j ≤ X_{i_j} ≤ y_j + ε_j, 1 ≤ j ≤ n],

where the X_i are i.i.d. for each permutation, and that there are n! permutations.]
(b) Let Z_1, ..., Z_n be i.i.d. random variables on (Ω, Σ, P) with their common distribution F on ℝ continuous and strictly increasing. If X_i = F(Z_i), 1 ≤ i ≤ n, show that X_1, ..., X_n are random variables satisfying the hypothesis of (a). Deduce from the above that if Z̃_i is the ith-order statistic of (Z_1, ..., Z_n), then

    P[a < Z̃_i ≤ b] = (n!/((i-1)!(n-i)!)) ∫_{F(a)}^{F(b)} u^{i-1} (1-u)^{n-i} du.

Similarly, obtain the corresponding formulas of (a) for the Z̃_i-sequence.
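The marginal formula for Y_i in (a) is easy to check empirically; in the following sketch the parameters n, i, a, b and the sample sizes are chosen arbitrarily for illustration.

    import random
    from math import comb

    def order_stat_prob(n, i, a, b, trials=200_000, seed=2):
        """Empirical P[a < Y_i <= b] for the ith order statistic of n uniforms."""
        rng = random.Random(seed)
        hits = 0
        for _ in range(trials):
            y = sorted(rng.random() for _ in range(n))[i - 1]
            if a < y <= b:
                hits += 1
        return hits / trials

    def order_stat_integral(n, i, a, b, steps=10_000):
        """Midpoint value of n!/((i-1)!(n-i)!) * integral_a^b y^{i-1}(1-y)^{n-i} dy."""
        c = comb(n, i) * i   # = n!/((i-1)!(n-i)!)
        h = (b - a) / steps
        s = sum((a + (k + 0.5) * h) ** (i - 1) * (1 - (a + (k + 0.5) * h)) ** (n - i)
                for k in range(steps))
        return c * s * h

    print(order_stat_prob(5, 2, 0.2, 0.6), order_stat_integral(5, 2, 0.2, 0.6))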

5. (a) Following Corollary 1.8 we have discussed the adjunction procedure. Let X_1, X_2, ... be any sequence of random variables on (Ω, Σ, P). Let F_n(x) = P[X_n < x], n = 1, 2, .... Then, using the same procedure, show that there is another probability space (Ω̃, Σ̃, P̃) and a mutually independent sequence of random variables Y_1, Y_2, ... on it such that P̃[Y_n < x] = F_n(x), x ∈ ℝ, n ≥ 1. [Hint: Since F_n is a d.f., let μ_n(A) = ∫_A dF_n(x), A ⊂ ℝ Borel, and X̃_n = identity on ℝ. Then (ℝ, B, μ_n) is a probability space and X̃_n is an r.v. with F_n as its d.f. Consider, with the Fubini-Jessen theorem, the product probability space (Ω̃, Σ̃, P̃) = ⊗_{n≥1}(ℝ_n, B_n, μ_n), where ℝ_n = ℝ, B_n = B. If ω̃ = (x_1, x_2, ...) ∈ Ω̃ = ℝ^∞, let Y_n(ω̃) = nth coordinate of ω̃ [= x_n = π_n(ω̃)]. Note that the Y_n are independent random variables on (Ω̃, Σ̃, P̃) and P̃[Y_n < x] = μ_n[x̃ : x̃ < x] = F_n(x), x ∈ ℝ, n ≥ 1.]
(b) (Skorokhod) With a somewhat different specialization, we can make the following assertion: Let X_1, X_2, ... be a sequence of random variables on (Ω, Σ, P) which converge in distribution to an r.v. X. Then there are another probability space (Ω', Σ', P') and random variables Y_1, Y_2, ..., Y on it such that Y_n → Y a.e. and P[X_n < x] = P'[Y_n < x], x ∈ ℝ, for n ≥ 1. Thus X_n, Y_n have the same distributions, and the (stronger) pointwise convergence is true for the Y_n-sequence. (Compare this with Proposition 2.2.) [Sketch of proof: Let F_n(x) = P[X_n < x], F(x) = P[X < x], x ∈ ℝ, n ≥ 1. If Y_n, Y are inverses to F_n, F, then Y_n(x) = inf{y ∈ ℝ : F_n(y) > x}; and similarly for Y. Clearly Y_n, Y are Borel functions on (0,1) → ℝ. Since Y_n(x) ≤ y iff F_n(y) ≥ x, we have, on letting Ω' = (0,1), Σ' = Borel σ-algebra of Ω', with P' as the Lebesgue measure, P'[Y_n ≤ y] = P'[x : x ≤ F_n(y)] = F_n(y); and similarly P'[Y ≤ y] = F(y). Since F_n(x) → F(x) at all continuity points of F, let x be a continuity point of F. If the F_n are strictly increasing, then Y_n = F_n^{-1} and the result is immediate. In the general case, follow the argument of Proposition 2.2, by showing the analogous inequalities for a < b ≤ c < d, and then setting b = c, a continuity point of F; letting a ↑ b and d ↓ c, one obtains Y_n(c) → Y(c). Since the discontinuities of F are countable and form a set of P' measure zero, the assertion follows. Warning: In this setup the Y_n will not be independent if Y is nonconstant (or X is nonconstant).]
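The generalized inverse Y_n(x) = inf{y : F_n(y) > x} in the above sketch is computable. The following minimal illustration uses an assumed discrete d.f. (the three-point distribution is hypothetical) and recovers its masses from uniform samples.

    import random

    def quantile(F_points, x):
        """Generalized inverse Y(x) = inf{y : F(y) > x} of a step d.f.
        F_points: sorted list of (y, F(y)) pairs describing a discrete d.f."""
        for y, Fy in F_points:
            if Fy > x:
                return y
        return F_points[-1][0]

    # A hypothetical d.f. putting mass 1/4, 1/2, 1/4 at the points -1, 0, 2:
    F = [(-1, 0.25), (0, 0.75), (2, 1.0)]

    rng = random.Random(3)
    sample = [quantile(F, rng.random()) for _ in range(100_000)]
    print(sample.count(0) / len(sample))   # ≈ 0.5, the mass of F at the point 0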
(c) The following well-known construction shows that the preceding part is an illustration of an important aspect of our subject. Let (Ω, Σ) be a measurable space and B_i ∈ Σ be a family of sets indexed by D ⊂ ℝ such that for i, j ∈ D, i < j ⟹ B_i ⊂ B_j. Then there exists a unique random variable X : Ω → ℝ̄ such that {ω : X(ω) < i} ⊂ B_i and {ω : X(ω) > i} ⊂ B_i^c. [Verify this by defining X(ω) = inf{i ∈ D : ω ∈ B_i} and showing that X is measurable for Σ.] If P : Σ → ℝ+ is a probability, D is countable, and {B_i, i ∈ D} is increasing P-a.e. (i.e., for i < j, P(B_i - B_j) = 0), then the variable X above satisfies {ω : X(ω) ≤ i} = B_i a.e. and {ω : X(ω) > i} = B_i^c a.e., i ∈ D. (See, e.g., Royden (1968, 1988), 11.2.10.) Suppose that there is a collection of such families {B_x^n, x ∈ D = ℝ, n ≥ 1} ⊂ Σ. Let X_n be the corresponding random variable constructed for each n, and let F_n(x) = P(B_x^n), where -∞ < x < ∞. Show that F_n = P ∘ X_n^{-1} is determined by the collection, and that for n_1, ..., n_m, x_i ∈ ℝ, m ≥ 1, one has that

    F_{n_1,...,n_m}(x_1, ..., x_m) = P( B_{x_1}^{n_1} ∩ ... ∩ B_{x_m}^{n_m} )

defines an m-dimensional (joint) distribution of (X_{n_1}, ..., X_{n_m}) so constructed. [This construction of distributions will play a key role in establishing a general family of random variables, or processes, later (cf. Theorem 3.4.10).]
(d) Here is a concrete generation of independent families of random variables already employed by N. Wiener (cf. Paley and Wiener (1934), p. 143), and emphasized by P. Lévy ((1953), Sec. 2.3). It also shows where the probabilistic concept enters the construction. Let Y_1, ..., Y_n be functions on (0,1), each represented by its decimal expansion

    Y_i = Σ_{m=1}^∞ a_{i,m}/10^m,

with a_{i,m} taking the values 0, 1, ..., 9, each with probability 1/10, independent of one another. (This is where probability enters!) Then each Y_i is uniformly distributed and they are mutually independent. (Clearly binary or ternary etc. expansions can be used in lieu of the decimal expansion. Unfortunately, no recipe exists for choosing the a_{i,m} here. A similar frustration was (reportedly) expressed by A. Einstein regarding his inability to find a recipe for a particular Brownian particle to be in a prescribed region, but only a probability of the event can be given. [cf. Science, 30 (2005), pp. 865-890, special issue on Einstein's legacy].) If {F_n, n ≥ 1} is a sequence of distribution functions on ℝ, let F_n^{-1} be the generalized inverse of F_n as defined (in part (b)) above. Let X_n = F_n^{-1}(Y_n), n ≥ 1. Then {X_n, n ≥ 1} is a sequence of (mutually independent) random variables with distributions F_n. [It is even possible to take a single uniformly distributed random variable Y, by reordering the a_{n,m} into a single sequence {b_k, k ≥ 1} so that Y = Σ_{k≥1} b_k/10^k, excluding the terminating decimal expansions, which are countable and hence constitute a set of (Lebesgue) measure zero, and then X_n = F_n^{-1}(Y), n ≥ 1.] It should be observed that in the representation of X_n as a mapping of (Y_1, ..., Y_n) [or of Y], which is one-to-one, there are infinitely many representations, while a unique distribution obtains if the mapping is nondecreasing, such as F_n^{-1}. This fact is of interest in applications such as those implied in part (b) above.
The following example is considered by Wiener (in the book cited above, p. 146). Let Y_1, Y_2 be independent uniformly distributed random variables on (0,1), and define R = (-2 log Y_1)^{1/2} and θ = 2πY_2, and let X_1 = R cos θ, X_2 = R sin θ. Then the Jacobian is easily computed, and one has dy_1 dy_2 = (1/2π) e^{-(x_1² + x_2²)/2} dx_1 dx_2, so that X_1, X_2 are independent normal random variables generated by Y_1, Y_2. Extending this procedure, establish the following n-dimensional version. Let Y_1, ..., Y_n be independent uniformly distributed random variables on (0,1), θ_k = 2πY_{k+1}, and X_1 = R sin θ_{n-1} ⋯ sin θ_2 sin θ_1; X_2 = R sin θ_{n-1} ⋯ sin θ_2 cos θ_1, ..., X_{n-1} = R sin θ_{n-1} cos θ_{n-2}, and X_n = R cos θ_{n-1}, where R = (-2 log Y_1)^{1/2}. The Jacobian is much more difficult [use induction], but is nonvanishing, giving a one-to-one mapping. (With R = 1, the transformation has Jacobian (-1)^n (sin θ_1)^{n-2} (sin θ_2)^{n-3} ⋯ sin θ_{n-2} cos θ_{n-1}, so that it is 1-1 between the open unit n-ball and the open rectangle 0 < θ_i < π, i = 1, ..., n.) This shows that the θ_i sequence (different from the F_n) can be somewhat involved, but the procedure is quite general, as noted by N. Wiener, whose use of it in a construction of Brownian motion is now legendary, and was emphasized by P. Lévy later. [In the last chapter we again consider the Brownian motion construction with a more recent and (hopefully) simpler method.]
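The two-dimensional map above is the now-standard polar method for generating normal variates, and it can be coded verbatim; the sample size below is arbitrary.

    import math, random

    def normal_pair(rng):
        """X1 = R cos θ, X2 = R sin θ with R = sqrt(-2 log Y1), θ = 2π Y2."""
        y1, y2 = rng.random(), rng.random()
        r = math.sqrt(-2.0 * math.log(y1))
        theta = 2.0 * math.pi * y2
        return r * math.cos(theta), r * math.sin(theta)

    rng = random.Random(4)
    pairs = [normal_pair(rng) for _ in range(100_000)]
    xs = [p[0] for p in pairs]
    mean = sum(xs) / len(xs)
    var = sum(x * x for x in xs) / len(xs) - mean ** 2
    print(round(mean, 3), round(var, 3))   # ≈ 0 and ≈ 1, as for a standard normal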

6. (a) (Jessen-Wintner) If {X_n, n ≥ 1} is a sequence of independent countably valued random variables on (Ω, Σ, P) such that S_n = Σ_{k=1}^n X_k → S a.e., then the distribution of S on ℝ is either (i) absolutely continuous or singular relative to the Lebesgue measure, or (ii) P[S = j] > 0 for a countable set of points j ∈ ℝ, and no mixed types can occur. [Hints: Let G ⊂ ℝ be the group generated by the ranges of the X_n, so that G is countable. Note that for any Borel set B, the vector sum G + B = {x + y : x ∈ G, y ∈ B} is again Borel. If Ω_0 = {ω : S_n(ω) → S(ω)}, then let A = {ω : S(ω) ∈ G + B} ∩ Ω_0, and verify that A is a tail event, so that P(A) = 0 or 1 by Theorem 1.7. Indeed, if g_1 - g_2 ∈ G, then g_1 ∈ G + B for some Borel set B iff g_2 ∈ G + B. Now if S_n ∈ G, then S = S_n + (S - S_n) ∈ G + B iff S - S_n ∈ G + B, and conversely. But S - S_n is defined on Ω_0. Hence A = [S - S_n ∈ G + B] ∩ Ω_0, so that A is a tail event, and P(A) = 0 or 1. This implies that either S is countably valued or else, since P(Ω_0) = 1, P[S ∈ G + B] = 0 for each countable B. In this case P[S ∈ B] = 0 for each countable B, so that S has a continuous distribution, with range noncountable. Consequently, either the distribution of S is singular relative to the Lebesgue measure, or it satisfies P[S ∈ G + B] = 0 for all Borel B of zero Lebesgue measure. Since G is countable, this again implies P[S ∈ G + B] = 0, so that P[S ∈ B] = 0 for all Lebesgue null sets. This means the distribution of S is absolutely continuous. To see which type the distribution of S is, we have to exclude the other two cases, and no recipe is provided in this result. In fact this is the last result of Jessen-Wintner's long paper (1935).]
(b) To decide on the types above, we need to resort to other tricks, and some will be noted here. Let {X_n, n ≥ 1} be i.i.d. random variables with

    P[X_n = 1] = P[X_n = -1] = 1/2.

Let S_n = Σ_{k=1}^n X_k/2^k. Then S_n → S a.e. (by Theorem 2.6). Also |S| ≤ 1 a.e. Prove that the S distribution is related to that of U - V, where U and V are random variables on the Lebesgue unit interval [0,1] with the uniform distribution F, i.e., F(x) = 0 if x ≤ 0, = x if 0 < x ≤ 1, and F(x) = 1 for x > 1, and hence S has an absolutely continuous distribution. [Hints: Note that if F_U, F_V are the distributions of U, V, and U, V are independent, then F_{U+V} can be obtained by the image law (cf. Theorem 1.4.1) as a convolution:

    F_{U+V}(x) = ∫_ℝ F_U(x - y) dF_V(y)

(since F_{U,V} = F_U · F_V by independence). Thus F_{U+V} is continuous if at least one of F_U, F_V is continuous. Next verify that if x = Σ_{k=1}^∞ ε_k/2^k, where ε_k = 0, 1, is the dyadic expansion of 0 < x < 1, then (as in the construction of Problem 5(d) above)

    μ[ x ∈ (0,1) : ε_1(x) = c_1, ..., ε_n(x) = c_n ] = 2^{-n},   c_i ∈ {0,1},

with μ as the Lebesgue measure. Deduce that U has the same distribution as the identity mapping I : (0,1) → (0,1) with Lebesgue measure.] (Explicit calculation with ch.f.s is easier and will be noted in Exercise 4.11.)
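The dyadic-digit computation in the hint suggests a quick numerical check (the truncation level and sample size below are assumed only for illustration): the partial sums Σ_{k≤N} X_k/2^k spread their mass evenly over [-1, 1].

    import random

    rng = random.Random(5)
    N, trials = 30, 200_000
    counts = [0] * 10   # histogram of S over ten equal subintervals of [-1, 1]
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) / 2 ** k for k in range(1, N + 1))
        counts[min(int((s + 1) / 0.2), 9)] += 1
    print([round(c / trials, 3) for c in counts])   # each bin frequency ≈ 0.1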
>
( c ) By similar indirect arguments verify the following: (i) If {X,, n 1) is
as above, then S, = EL=, x k / 3 b S a.e. aiid S has a singular distribution.
(ii) (P. Lkvy) If Y,, n = 1 , 2 , .. . , are independent with values in a countable
set C c R,and if there is a convergent set of numbers cn E C such that

then S = Cr!l Yk exists a.e., and S takes only countably many values with
positive probability.
(d) The proofs of Theorems 2.6 and 2.7 used the Kronecker lemma and the (C,1)-summability. Thus the Kolmogorov SLLN (Theorem 2.7) can be considered as a probabilistic analog of the classical (C,1)-summability, in the sense that a sequence {X_n, n ≥ 1} of i.i.d. r.v.s on (Ω, Σ, P) obeys the (C,1)-summability pointwise a.e. iff E(X_1) = μ ∈ ℝ exists. Since classical analysis shows that (C,1)-summability implies (C,p)-summability for p ≥ 1, one can expect a similar result for i.i.d. sequences. In fact, the following precise version holds. Let μ ∈ ℝ, p ≥ 1. Verify the following equivalences for i.i.d. r.v.s:

(i) {X_n, n ≥ 1} obeys the SLLN;
(ii) E(X_1) = μ;
(iii) {X_n, n ≥ 1} obeys (C,1)-summability a.e. with limit μ;
(iv) {X_n, n ≥ 1} obeys (C,p)-summability a.e. with limit μ,

    i.e., lim_{n→∞} (1/A_n^p) Σ_{k=0}^{n} A_{n-k}^{p-1} X_k = μ a.e., where A_n^p = (n+p)!/(n! p!);

(v) {X_n, n ≥ 1} obeys Abel summability a.e. with value μ,

    i.e., lim_{λ↑1} (1 - λ) Σ_{i=1}^∞ λ^i X_i = μ a.e.
[Hints: The classical theories of summability imply that (i) ⟹ (iii) ⟹ (iv) ⟹ (v), and Theorem 2.7 gives (i) ⟺ (ii). So it suffices to show (v) ⟹ (ii). For ordinary sequences of reals, Abel convergence does not imply even (C,1)-convergence. (Here the converse holds if the sequence is bounded in addition, as shown by J.E. Littlewood.) But the i.i.d. hypothesis implies the converse a.e., as follows. Using the method of Theorem 2.9, called symmetrization, let X_n^s = X_n - X_n', where X_n and X_n' are i.i.d. (one may use an enlargement of the basic probability space as in the proof of 2.9, where X_n' is denoted by Z_n there), and (v) can be expressed, if 1 - λ = 1/m, m ≥ 1, as

    lim_{m→∞} (1/m) Σ_{i=1}^∞ (1 - 1/m)^i X_i^s = μ - μ = 0 a.e.,

or alternately, writing Y_m = (1/m) Σ_{i=1}^{m} (1 - 1/m)^i X_i^s and Z_m = (1/m) Σ_{i=m+1}^∞ (1 - 1/m)^i X_i^s, one has Y_m + Z_m → 0 a.e. as m → ∞, and Y_m, Z_m are independent. Verify that for each ε > 0, P[|Z_m| ≥ ε] → 0 as m → ∞. Then, using Slutsky's theorem and the stochastic calculus (Problems 9(b) and 11(c) below) suitably, conclude that Y_m → 0 in probability. Next Y_m - (1/m)X_m^s → 0, and finally X_m^s/m → 0 also as m → ∞. [This needs some more work!] Then, by the Borel-Cantelli lemma, deduce that E(|X_1|) < ∞, as in the proof of Theorem 3.7. Hence the SLLN holds. Thus the equivalence follows. The above sketch is a paraphrase of T. L. Lai (1974). Can we replace mutual independence here by pairwise independence, as in Corollary 3.3, if we only ask for the WLLN?]

7. This problem illustrates the strengths and limitations of our a.e. convergence statements. Let (Ω, Σ, P) be the Lebesgue unit interval, so that Ω = (0,1) and P = Lebesgue measure on the completed Borel σ-algebra Σ. If ω ∈ Ω, expand it in decimals: ω = 0.x_1x_2..., so that if X_n(ω) = x_n, then X_n : Ω → {0, 1, ..., 9} is an r.v. Verify that {X_n, n ≥ 1} is an i.i.d. sequence with the common distribution F given by F(y) = (k+1)/10 for k ≤ y < k+1, k = 0, 1, ..., 9; = 0 if y < 0; = 1 for y > 9. Let δ_k(·) be the Dirac delta function, and consider δ_k(X_n). Then P[δ_k(X_n) = 1] = 1/10, P[δ_k(X_n) = 0] = 9/10, and the δ_k(X_n), n ≥ 1, are i.i.d. for each k = 0, 1, ..., 9. If k_1, k_2, ..., k_r is a fixed r-tuple of integers such that 0 ≤ k_i ≤ 9, define (cf. Problem 5(d) also)

    ε_{n,r} = δ_{k_1}(X_n) δ_{k_2}(X_{n+1}) ⋯ δ_{k_r}(X_{n+r-1}).

Show that the ε_{n,r}, n ≥ 1, are bounded uncorrelated random variables for which we have (1/m) Σ_{n=1}^m ε_{n,r} → 1/10^r a.e. as m → ∞ (apply Theorem 3.4), r = 1, 2, .... This means that for a.a. ω ∈ Ω the ordered set of numbers (k_1, ..., k_r) appears in the decimal expansion of ω with the asymptotic relative frequency 1/10^r. Every number ω ∈ Ω for which this holds is called a normal number. It follows that Σ_{n=1}^m ε_{n,r} → ∞ as m → ∞ for a.a. ω (as in the proof of Theorem 4.4); thus ε_{n,r} = 1 infinitely often, which means that the given set (k_1, ..., k_r) in the same order occurs infinitely often in the expansion of each normal number, and that almost all ω ∈ Ω are normal. [This fact was established by E. Borel in 1909.] However, there is no known recipe to decide which numbers in Ω are normal. Since the transcendental (π - e) ∈ (0,1), it is not known whether π - e is normal; otherwise it would have settled the old question of H. Weyl: Is it true or false that in the decimal expansion of the irrational number π the integers 0, 1, ..., 9 occur somewhere in their natural order? This question was raised in the 1920s to counter the assertion of the logicians of Hilbert's school that every statement is either "true" or "false," i.e., has only two truth values. As of now we do not know the definitive answer to Weyl's question, even though π has been expanded to over 10^5 decimal places and the above sequence still did not appear! [See D. Shanks and J. W. Wrench, Jr. (1962), Math. Computation 16, 76-89, for such an expansion of π. On the other hand, it is known that 0.123456789101112131415161718..., using all the natural numbers, is normal. Recently two Japanese computer scientists seem to have shown that the answer is 'yes' after expanding π to several billions of decimal places. See, e.g., J.M. Borwein (1998), Math. Intelligencer, 20, 14-15.]
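Borel's theorem invites a simple empirical check. In the sketch below a stream of random digits merely stands in for the expansion of a "typical" ω, and the particular block (1,4) is an arbitrary choice.

    import random

    rng = random.Random(6)
    digits = [rng.randrange(10) for _ in range(10**6)]   # stand-in for the digits of ω
    block = (1, 4)   # an ordered pair (k_1, k_2); any block of length r works similarly
    hits = sum(1 for i in range(len(digits) - 1)
               if (digits[i], digits[i + 1]) == block)
    print(hits / (len(digits) - 1))   # ≈ 1/10^2 = 0.01, the asymptotic frequency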

8. The WLLN of Theorem 3.2 does not hold if (even) the symmetric moment does not exist. To see this, we present the classical St. Petersburg game, called a "paradox," since people applied the WLLN without satisfying its hypothesis. Let X be an r.v. such that

    P[X = 2^n] = 2^{-n},   n = 1, 2, ...,

on (Ω, Σ, P). Let {X_n, n ≥ 1} be i.i.d. random variables with the distribution of X. If S_n = Σ_{k=1}^n X_k, show that S_n/n ↛ α as n → ∞ for any α ∈ ℝ, either in probability or a.e. for any subsequence. (Use the last part of Theorem 3.7.) The game interpretation is that a player tosses a fair coin until the head shows up. If this happens on the nth toss, the player gets 2^n dollars. If any fixed entrance fee per game is charged, the player ultimately wins and the house is ruined. Thus the "fair" fee will have to be "infinite," and this is the paradox! Show, however, by the truncation argument, that S_n/(n log_2 n) → 2 in probability as n → ∞, where log_2 n is the logarithm of n to base 2. If the denominator is replaced by h(n) so that (n log_2 n)/h(n) → 0, then S_n/h(n) → 0 in probability and a.e. In fact, show that for any sequence of random variables {Y_n, n ≥ 1} there exists an increasing sequence k_n such that P[|Y_n| > k_n, i.o.] = 0, so that Y_n/k_n → 0 a.e. Thus n log_2 n is the correct "normalization" for the St. Petersburg game. (An interesting and elementary variation of the St. Petersburg game can be found in D.K. Neal & R.J. Swift (1999), Missouri J. Math. Sciences, 11, No. 2, 93-102.)
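The normalization n log_2 n can be watched in simulation. The sketch below assumes the payoff convention P[X = 2^k] = 2^{-k}, k ≥ 1, read off from the game description; the limiting constant of S_n/(n log_2 n) depends on this convention, but the contrast with the divergent ratios S_n/n is visible either way.

    import math, random

    def petersburg(n, rng):
        """Total winnings in n games: toss until the first head; payoff 2^k."""
        total = 0
        for _ in range(n):
            k = 1
            while rng.random() < 0.5:   # tail: keep tossing, payoff doubles
                k += 1
            total += 2 ** k
        return total

    rng = random.Random(7)
    for n in (10**3, 10**4, 10**5):
        s = petersburg(n, rng)
        print(n, round(s / n, 2), round(s / (n * math.log2(n)), 3))
        # s/n keeps drifting upward; s/(n log_2 n) stabilizes near a constant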

9. (Mann-Wald) A calculus of "in probability" will be presented here. (Except for the sums, most of the other assertions do not hold on infinite measure spaces!) Let {X_n, n ≥ 1} and {Y_n, n ≥ 1} be two sequences of random variables on (Ω, Σ, P). Then we have the following, in which no assumption of independence appears:

(a) X_n → X and Y_n → Y in probability ⟹ X_n ± Y_n → X ± Y and X_n Y_n → XY in probability.

(b) If f : ℝ² → ℝ is a Borel function such that the set of discontinuities of f is measurable and of measure zero relative to the Stieltjes measure determined by the d.f. F_{X,Y} of the limit vector (X, Y) of (a), then f(X_n, Y_n) → f(X, Y) in distribution under either of the conditions: (i) X_n → X, Y_n → Y in probability, or (ii) αX_n + βY_n → αX + βY in distribution for all real α, β. If f is continuous, then strengthen this to the assertion that f(X_n, Y_n) → f(X, Y) in probability if condition (i) holds. [Hint: For (ii), use Problem 5(b) and the fact that (X_n, Y_n) → (X, Y) in distribution iff αX_n + βY_n → αX + βY in distribution for all real α, β.]
10. Suppose that for a sequence {X_n, n ≥ 1, X} in L¹(P) we have X_n → X in distribution. Show that E(|X|) ≤ lim inf_n E(|X_n|), and if, further, the set is uniformly integrable, then E(X) = lim_n E(X_n). [Hint: Use Problem 5(b) and the image probability Theorem 1.4.1. This strengthening of Vitali's convergence theorem (and Fatou's lemma) is a nontrivial contribution of Probability Theory to Real Analysis!]

11. (a) If X is an r.v. on (Ω, Σ, P), then μ(X), called a median of the distribution of X, is any number which satisfies the inequalities

    P[X ≤ μ(X)] ≥ 1/2,   P[X ≥ μ(X)] ≥ 1/2.

Note that a median of X always exists [let μ(X) = inf{a ∈ ℝ : P[X ≤ a] ≥ 1/2} and verify that μ(X) is a median and that μ(aX + b) = aμ(X) + b for a, b ∈ ℝ]. If X_n → a_0 in distribution, a_0 ∈ ℝ, show that μ(X_n) → a_0.
(b) A sequence {X_n, n ≥ 1} of random variables is bounded in probability if for each ε > 0 there are an n_0 [= n_0(ε)] and a constant M_ε [= M_0(ε)] > 0 such that P[|X_n| > M_ε] ≤ ε for all n ≥ n_0. Show that if X_n → X in distribution and Y_n → 0 in probability, then X_n Y_n → 0 in probability, X_n + Y_n → X in distribution, as n → ∞, and {X_n, n ≥ 1} is bounded in probability. If {X_n, n ≥ 1} has the latter property and Y_n → 0 in probability, then X_n Y_n → 0 in probability as n → ∞.
(c) (Cramér-Slutsky) Let X_n → X in distribution and Y_n → a in probability, where a ∈ ℝ, as n → ∞. Then X_n Y_n → aX in distribution, and if a ≠ 0, X_n/Y_n → X/a in distribution, so that the distributions of aX and X/a are F(x/a) and F(ax) for a > 0, and 1 - F(x/a) and 1 - F(ax) for a < 0. Here again the sequences {X_n} and {Y_n} need not be independent.
(d) Let {X_n, n ≥ 1} and {Y_n, n ≥ 1} be two sequences of random variables on (Ω, Σ, P), and let α_n ↓ 0, β_n ↓ 0 be numbers such that (X_n - a)/α_n → X and (Y_n - b)/β_n → Y in distribution, where a, b ∈ ℝ, b ≠ 0. Show that (X_n - a)/(α_n Y_n) → X/b in distribution. All limits are taken as n → ∞.

12. (Kolmogorov) Using the method of proof of Theorem 2.7, show that if {X_n, n ≥ 1} is an independent sequence of bounded random variables on (Ω, Σ, P), with common bound M and means zero, then for any d > 0 we have, with S_n = Σ_{k=1}^n X_k,

    P[ max_{1≤k≤n} |S_k| ≤ d ] ≤ (M + d)² / Var(S_n).

Deduce that if Var(S_n) → ∞, then for each d > 0, P[|S_n| ≤ d] → 0 as n → ∞.
13. (Ottaviani) Let {X_n, n ≥ 1} be independent random variables on (Ω, Σ, P) and ε > 0 be given. If S_n = Σ_{k=1}^n X_k and P[|X_{k+1} + ... + X_n| ≤ ε/2] ≥ q > 0, 1 ≤ k ≤ n, show that

    P[ max_{1≤k≤n} |S_k| ≥ ε ] ≤ (1/q) P[ |S_n| ≥ ε/2 ].

[Note that if A_1 = [|S_1| ≥ ε] and, for k > 1, A_k = [|S_k| ≥ ε, |S_j| < ε, 1 ≤ j ≤ k-1], then [|S_n| ≥ ε/2] ⊃ ⋃_k (A_k ∩ [|X_{k+1} + ... + X_n| ≤ ε/2]). The decomposition of [max_{k≤n} |S_k| ≥ ε] is analogous to that used for the proof of Theorem 2.5.]

14. We present two extensions of Kolmogorov's inequality for applications. (a) Let X_1, ..., X_n be independent random variables on (Ω, Σ, P) with means zero and variances σ_1², ..., σ_n². Then the following improved one-sided inequality [similar to that of Čebyšev's; this improvement in 1960 is due to A. W. Marshall] holds: for ε > 0 and S_k = Σ_{i=1}^k X_i, one has

    P[ max_{1≤k≤n} S_k ≥ ε ] ≤ (Σ_{i=1}^n σ_i²) / (ε² + Σ_{i=1}^n σ_i²).

[Hint: Consider f : ℝⁿ → ℝ defined by f(x_1, ..., x_n) = [ Σ_{i=1}^n (εx_i + σ_i²) / (ε² + Σ_{i=1}^n σ_i²) ]², and evaluate E(f(X_1, ..., X_n)) with the same decomposition as in Theorem 2.5. If n = 1, this reduces to Problem 6(a) of Chapter 1.]
(b) Let {X_n, n ≥ 1} be independent random variables on (Ω, Σ, P) as above, with zero means and {σ_n², n ≥ 1} as the respective variances. If ε > 0, S_n = Σ_{k=1}^n X_k, and a_1 ≥ a_2 ≥ ... > 0, show, with simple modifications of the proof of Theorem 2.5, that

    P[ max_{m≤k≤n} a_k |S_k| ≥ ε ] ≤ (1/ε²) ( a_m² Σ_{k=1}^m σ_k² + Σ_{k=m+1}^n a_k² σ_k² ).

[This inequality was noted by J. Hájek and A. Rényi.]
(c) If in (b) we take a_k = (n_0 + k - 1)^{-1} for any fixed but arbitrary n_0 ≥ 1, deduce that

    P[ max_{1≤k≤n} |S_{n_0+k-1}|/(n_0 + k - 1) ≥ ε ] ≤ (1/ε²) ( (1/n_0²) Σ_{k=1}^{n_0} σ_k² + Σ_{k=n_0+1}^{n_0+n-1} σ_k²/k² ).
Hence, if Σ_{n≥1}(σ_n²/n²) < ∞, conclude that the sequence {X_n, n ≥ 1} obeys the SLLN. (Thus we need not use Kronecker's lemma.)

15. In some problems of classical analysis, the demonstration is facilitated by a suitable application of certain probabilistic ideas and results. This was long known in proving the Weierstrass approximation of a continuous function by Bernstein polynomials. Several other results were noted by K. L. Chung for analogous probabilistic proofs. The following is one such: an inversion formula for Laplace transforms. Let X_1(λ), ..., X_n(λ) be i.i.d. random variables on (Ω, Σ, P), depending on a parameter λ > 0, whose common d.f. F is given by F(x) = 0 if x < 0, and = λ ∫_0^x e^{-λt} dt if x ≥ 0. If S_n(λ) = Σ_{k=1}^n X_k(λ), using the hints given for Problem 6(b) show that the d.f. of S_n(λ) is F_n, where F_n(x) = 0 for x < 0, and = [λⁿ/(n-1)!] ∫_0^x t^{n-1} e^{-λt} dt for x ≥ 0. Deduce that E(S_n(λ)) = n/λ, Var S_n(λ) = n/λ², so that S_n(n/x) → x in probability as n → ∞. Using the fundamental law of probability, verify that for any bounded continuous mapping f : ℝ+ → ℝ+, or f Borel satisfying E(f(S_n)²) ≤ k_0 < ∞ (cf. also Proposition 4.1.3 later), E(f(S_n)) → E(f(x)) = f(x), by uniform integrability (use Scheffé's lemma, Proposition 1.4.6), where

    E(f(S_n(n/x))) = [(n/x)ⁿ/(n-1)!] ∫_0^∞ f(t) t^{n-1} e^{-nt/x} dt.

Hence prove, using Problem 6(b), that for any continuous f ∈ L²(ℝ+), if f̂ is the Laplace transform of f [f̂(u) = ∫_0^∞ e^{-ut} f(t) dt, u > 0], one has the inversion

    f(x) = lim_{n→∞} [(-1)^{n-1}/(n-1)!] (n/x)ⁿ f̂^{(n-1)}(n/x),

the limit existing uniformly on compact intervals of ℝ+. [Actually f can be in any L^p(ℝ+), 1 < p < ∞, not just p = 2. The distribution of X_1 above is called the exponential, and that of S_n(λ) the gamma, with parameters (n, λ). More d.f.s are discussed in Section 4.2 later.] The result above is the classical Post-Widder formula.
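The probabilistic mechanism behind the formula — that S_n(n/x) concentrates at x, so that E(f(S_n(n/x))) → f(x) — can be seen by direct Monte Carlo; the test function f and the sample sizes below are arbitrary choices made only for illustration.

    import math, random

    def E_f_Sn(f, n, x, trials=20_000, seed=8):
        """Monte Carlo estimate of E(f(S_n(λ))) with λ = n/x; S_n(λ) is a sum
        of n i.i.d. exponentials of rate λ, i.e., gamma(n, λ)."""
        rng = random.Random(seed)
        lam = n / x
        total = 0.0
        for _ in range(trials):
            s = sum(rng.expovariate(lam) for _ in range(n))
            total += f(s)
        return total / trials

    f = lambda t: math.exp(-t)   # a bounded continuous test function
    x = 1.5
    for n in (5, 50, 500):
        print(n, round(E_f_Sn(f, n, x), 4))   # approaches f(x) as n grows
    print("f(x) =", round(f(x), 4))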

16. Let {X_n, n ≥ 1} be independent random variables on (Ω, Σ, P) and S_n = Σ_{k=1}^n X_k. Then S_n → S a.e. iff S_n → S in probability. This result is due to P. Lévy. (We shall prove later that convergence in probability can be replaced here by convergence in distribution, but more tools are needed for it.) [Hints: In view of Proposition 2.2, it suffices to prove the converse. Now S_n - S → 0 in probability ⟹ {S_n, n ≥ 1} is Cauchy in probability, so for 1 > ε > 0 there is an n_0 [= n_0(ε)] such that m, n ≥ n_0 ⟹ P[|S_n - S_m| > ε] < ε. Thus P[|S_k - S_m| ≤ ε] ≥ 1 - ε for all m < k ≤ n. Hence, by Problem 13 applied to the set {X_j, j ≥ m ≥ n_0}, we get

    P[ max_{m<k≤n} |S_k - S_m| > 2ε ] ≤ (1/(1-ε)) P[|S_n - S_m| > ε] < ε/(1-ε).

This implies, upon first letting n → ∞ and then letting m → ∞, since 0 < ε < 1 is arbitrary, that {S_k, k ≥ 1} is pointwise Cauchy and hence converges a.e.]

17. (P. Lévy Inequalities) Let X_1, ..., X_n be independent random variables on (Ω, Σ, P) and S_j = Σ_{k=1}^j X_k. If μ(X) denotes a median (cf. Problem 11) of X, show that for each ε > 0 the following inequalities obtain:
(a) P[ max_{1≤j≤n} (S_j - μ(S_j - S_n)) ≥ ε ] ≤ 2P[S_n ≥ ε];
(b) P[ max_{1≤j≤n} |S_j - μ(S_j - S_n)| ≥ ε ] ≤ 2P[|S_n| ≥ ε].
[Hints: Use the same decomposition for max as we did before. Thus, let A_j = [S_j - S_n ≤ μ(S_j - S_n)], so that P(A_j) ≥ 1/2, 1 ≤ j ≤ n, and

    B_j = [ S_j - μ(S_j - S_n) ≥ ε, for the first time at j ].

Then B_j ∈ σ(X_1, ..., X_j), A_j ∈ σ(X_{j+1}, ..., X_n), and they are independent; ⋃_{j=1}^n B_j = B = [max_j (S_j - μ(S_j - S_n)) ≥ ε], a disjoint union, and on A_j ∩ B_j one has S_n ≥ ε. Thus P[S_n ≥ ε] ≥ Σ_{j=1}^n P(B_j ∩ A_j) ≥ (1/2)P(B), giving (a). Since μ(-X) = -μ(X), write -X_j for X_j, 1 ≤ j ≤ n, in (a) and add it to (a) to obtain (b). Hence if the X_i are also symmetric, so that μ(S_j - S_n) = 0, (a) and (b) take the following simpler forms:
(a') P[ max_{1≤j≤n} S_j ≥ ε ] ≤ 2P[S_n ≥ ε];
(b') P[ max_{1≤j≤n} |S_j| ≥ ε ] ≤ 2P[|S_n| ≥ ε].]
18. Let {X_n, n ≥ 1} be independent random variables on (Ω, Σ, P) with zero means and variances {σ_n², n ≥ 1} such that Σ_{n≥1} σ_n²/b_n² < ∞ for some 0 < b_n ≤ b_{n+1} ↗ ∞. Then (1/b_n) Σ_{k=1}^n X_k → 0 a.e. [Hint: Follow the proof of Theorem 3.6, except that in using Kronecker's lemma (Proposition 3.5) one replaces the sequence {n}_{n≥1} there by the {b_n}_{n≥1}-sequence here. The same argument holds again.]

19. Let {X_n, n ≥ 1} be i.i.d. symmetric random variables on (Ω, Σ, P). If S_n = Σ_{k=1}^n X_k, show that for each ε > 0,

    Σ_{n=1}^∞ P[|S_n| > nε] < ∞ ⟺ E(X_1²) < ∞.

[Hints: By Problem 17(b') and the i.i.d. hypothesis, we have, with S_0 = 0 = X_0,

    2P[|S_n| > nε] ≥ P[ max_{1≤j≤n} |S_j| > nε ] ≥ P[ max_{1≤j≤n} |X_j| > 2nε ] = 1 - (P[|X_1| ≤ 2nε])ⁿ,   (40)

since [max_{j≤n} |S_j| > nε] ⊃ [max_{j≤n} |X_j| > 2nε]. Summing, taking ε = 1 in the hypothesis, and writing a_n = P[|X_1| ≤ 2n] in (40), we get

    Σ_{n=1}^∞ 2P[|S_n| > n] ≥ Σ_{n=1}^∞ (1 - a_nⁿ).   (41)

The convergence of the given series implies P[|S_n| > n] → 0 as n → ∞, and then a_nⁿ → 1; since 1 - a_nⁿ behaves like n(1 - a_n), it follows that Σ_{n=1}^∞ nP[|X_1| > 2n] < ∞. Rewriting P[|X_1| > 2n] as Σ_{k≥2n} P[k ≤ |X_1| < k+1] and changing the order of summation, one gets X_1 ∈ L²(P), i.e., E(X_1²) < ∞, by the i.i.d. hypothesis. The converse here is similar, so that the stated equivalence follows. It should be remarked that actually all the implications are equivalences. The difficult part (the first one) needs additional computations, and we have not yet developed the necessary tools for its proof. This (harder) implication is due to Hsu and Robbins (1947), and we establish it later, in Chapter 4. Show, however, that what has been given remains valid if the symmetry assumption is dropped from the hypothesis.]

20. In the context of the preceding problem, we say [after Hsu and Robbins (1947)] that a sequence {Y_n, n ≥ 1} of random variables on (Ω, Σ, P) converges completely if for each ε > 0, (*) Σ_{n=1}^∞ P[|Y_n| > ε] < ∞. Show that complete convergence implies convergence a.e. Also, verify that (*) implies that the a.e. limit of Y_n is necessarily zero. Establish by simple examples that the converse fails. [For example, consider the Lebesgue unit interval and Y_n = χ_{(0,1/n)}.] Show, however, that the converse implication does hold if there are a probability space (Ω', Σ', P') and a sequence {Z_n, n ≥ 1} of independent random variables on it such that P[Y_n < x] = P'[Z_n < x], x ∈ ℝ, n ≥ 1,
and Z_n → 0 a.e. Compare this strengthening with Problem 5. [Hint: Note that lim sup_n |Z_n| = 0 a.e., and apply the second Borel-Cantelli lemma.]

21. The following surprising behavior of the symmetric random walk sequence was discovered by G. Pólya in 1921. Consider a symmetric random walk of a particle in the space ℝ^k. If k = 1, the particle moves in unit steps to the left or right from the origin, with equal probability. If k = 2, it moves in unit steps in one of the four directions parallel to the natural coordinate axes with equal probability, which is 1/4. In general, it moves in unit steps in the 2k directions parallel to the natural coordinate axes, each step with probability 1/(2k). Show that the particle visits the origin infinitely often if k = 1 or 2, and only finitely often for k = 3. (The last is also true if k > 3.) [Hints: If e_1, ..., e_k are the unit vectors in ℝ^k, so that e_i = (0, ..., 1, 0, ..., 0) with 1 in the ith place, and X_n : Ω → ℝ^k are i.i.d., then

    P[X_n = e_i] = P[X_n = -e_i] = 1/(2k),   i = 1, ..., k.

Let S_n = Σ_{j=1}^n X_j. Then if k = 1 the result follows from Theorem 4.8, and if k = 2 or 3 we need to use Theorem 4.7 and verify the convergence or divergence of (35) there. If p_n = P[S_n = 0], so that the particle visits 0 at step n with probability p_n, then the particle can visit 0 only if the numbers of positive and negative steps along each axis are equal. Thus p_n = 0 for odd n and p_{2n} > 0. However, by a counting argument (the "multinomial distribution"), we see that for k = 2

    p_{2n} = (1/4^{2n}) Σ_{j=0}^{n} (2n)! / (j! j! (n-j)! (n-j)!) = [ (1/2^{2n}) (2n)!/(n! n!) ]².

Using Stirling's approximation, n! ~ √(2πn) nⁿ e^{-n}, one sees that p_{2n} ~ 1/(πn), and so Σ_n p_{2n} = ∞, as desired. If k = 3, one gets by a similar computation

    p_{2n} = (1/6^{2n}) Σ_{j+l≤n} (2n)! / (j! j! l! l! (n-j-l)! (n-j-l)!).

Again simplification by Stirling's formula shows that p_{2n} ~ c/n^{3/2}, so that Σ_n p_{2n} < ∞ (in fact, the series is approximately 0.53), and S_n is not recurrent. By more sophisticated computations, Chung and Fuchs in their work on random walk showed that the same is true if the X_n are just i.i.d. with E(X_1) = 0, 0 < E(|X_1|²) < ∞, and no component of X_1 degenerate. This problem also shows an intimate relation between the structure of random walks and the group-theoretical properties of its range (or state space), and deeper connections with convolution operators on these spaces or the group. For a recent contribution on the subject, and several references to the related literature on the problems, the reader is referred to Rao (2004a).]
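Pólya's dichotomy is visible in simulation by counting returns to the origin; the horizon and seed below are arbitrary choices for illustration.

    import random

    def returns_to_origin(k, n_steps=200_000, seed=9):
        """Count visits of the simple symmetric walk in Z^k to the origin."""
        rng = random.Random(seed)
        pos = [0] * k
        count = 0
        for _ in range(n_steps):
            i = rng.randrange(k)
            pos[i] += rng.choice((-1, 1))   # unit step along a random axis
            if all(c == 0 for c in pos):
                count += 1
        return count

    for k in (1, 2, 3):
        print(k, returns_to_origin(k))   # many returns for k = 1, 2; very few for k = 3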
Chapter 3

Conditioning and Some Dependence Classes

This chapter is aimed at an extended study of two important classes of non-independent random families, namely, martingales and Markov processes. These classes are based on the general concept of conditioning; thus conditional expectations and probabilities are treated in considerable detail. The existence of such classes is deduced from a fundamental theorem of Kolmogorov and its generalization by Bochner. This is given in several different forms, and these ideas are illustrated with important applications as well as problems.

3.1 Conditional Expectations


The formative years of probability theory were naturally dominated by the then precisely defined notion of independence. However, the need for a relaxation of the conditions governing this concept was noticed while studying, for instance, the behavior of sums of independent random variables. But a useful concept generalizing this phenomenon was formulated by A. A. Markov only in 1906; and later, in 1935, the martingale dependence was introduced by P. Lévy. Also in the early 1930s the strict sense and second-order stationary dependences were presented, respectively, by G. D. Birkhoff and A. Khintchine, based on the needs of ergodic theory and harmonic analysis of random functions. This chapter consists of a study of the first two classes and their basic properties. We shall consider them further in the last three chapters, where some other dependence notions are briefly discussed. Of course, there are numerous other definable dependence classes, because events that are not independent are by default dependent. Many of these will not be considered in any detail.
The very definitions of both the Markovian and martingale concepts depend on the notions of conditional probabilities and expectations. The latter concepts are so fundamental for the modern developments in probability theory that some authors wish to start the subject with conditional concepts. However, it appears desirable to follow the natural growth of our theory, and to formulate
each one precisely as an essentially unique solution of a functional equation. This is analogous to our mathematical formulation of the independence concept in Chapter 2, which employed a system of equations. We first motivate the concept, because it is not a simple or a particularly intuitive idea.
Let (Ω, Σ, P) be a probability triple that describes an experiment or a physical phenomenon mathematically. If an event A has been observed, how does one assign probabilities to the other events of Ω after incorporating this knowledge about A? Thus one should consider every event of Ω along with A, so that Σ(A) = {A ∩ B : B ∈ Σ} is the new class for which we need to define P_A : Σ(A) → ℝ+ as a probability. Assuming P(A) > 0, so that P_A(A) = 1 is desired, we see that P_A(C) = P(C)/P(A), C ∈ Σ(A). Thus P_A(B) = P(A ∩ B)/P(A) ∈ [0,1] is the correct assignment to B. If A, B are independent, then P_A(B) = P(B), and thus A has no influence on B, as one would like to have for an extension. It is clear that Σ(A) is a (trace) σ-algebra contained in Σ and P_A : Σ(A) → [0,1] is a probability measure. So (A, Σ(A), P_A) is a new triple. Since P_A(A^c) = 0, we see that P_A : Σ → [0,1] is also defined, and is a measure. One calls P_A an elementary conditional probability on Σ relative to the event A satisfying P(A) > 0. If X : Ω → ℝ is an integrable random variable, we can define its elementary conditional expectation, given A, naturally as

    E_A(X) = (1/P(A)) ∫_A X dP,
where P is the probability measure of the original triple. If P(A^c) > 0 is also true, then E_{A^c}(X) can be similarly defined, so that the elementary conditional expectation of X relative to A and A^c generally determines a two-valued function. In extending this argument to a countable collection P = {A_n, n ≥ 1} of events such that P(A_n) > 0, n ≥ 1, ⋃_{n≥1} A_n = Ω, and A_n ∩ A_m = ∅, n ≠ m (thus P is a partition of Ω), we get

    E_P(X) = Σ_{n≥1} E_{A_n}(X) χ_{A_n} = Σ_{n≥1} ( (1/P(A_n)) ∫_{A_n} X dP ) χ_{A_n}.   (1)

Then E_P(X) is called the (elementary) conditional expectation of the integrable r.v. X on (Ω, Σ, P) relative to the partition P. This is an adequate definition as long as one deals with such countable partitions.
The above formulation is not sufficient if the knowledge of a part of the experiment cannot be expressed in terms of a countable set of conditions. Also, if P(A) = 0, then the above definition fails. The latter is of common enough occurrence as, for instance, in the case that Y has a continuous distribution (so that P[Y = a] = 0, a ∈ ℝ) and A_a = [Y = a]; and one needs to define E_{A_a}(X). These are nontrivial problems, and a general theory should address these difficulties. A satisfactory solution, combining (1) and taking the above points into consideration, has been formulated by Kolmogorov (1933) as follows. Let us first present this abstract concept as a natural extension of (1).
If B = σ(P), the σ-algebra generated by P, then E_P(X) of (1) is clearly B-measurable. Integrating (1) on A ∈ P ⊂ B, we get (because A = ⋃_{k∈J} A_k for some J ⊂ ℕ)

    ∫_A E_P(X) dP = ∫_A X dP,   A ∈ B.   (2)

If we write E^B(X) for E_P(X) and note that P generated B, (2) implies that the mapping ν_X : A ↦ ∫_A X dP, A ∈ B, is P_B-continuous (ν_X ≪ P_B), P_B being the restriction of P to B ⊂ Σ, and (2) becomes

    ∫_A E^B(X) dP_B = ν_X(A),   A ∈ B.   (3)

This relation is generalized as follows:

Definition 1 Let (Ω, Σ, P) be a probability space, X : Ω → ℝ be an r.v. which is integrable (at least X⁺ or X⁻ is integrable), and B ⊂ Σ be any σ-algebra. Then a B-measurable function E^B(X) satisfying the set of equations

    ∫_A E^B(X) dP_B = ∫_A X dP,   A ∈ B,   (4)

is termed the conditional expectation of X relative to B, and P^B : A ↦ E^B(χ_A), A ∈ Σ, is called the conditional probability function relative to B. Thus P^B(·) satisfies the functional equation

    ∫_B P^B(A) dP_B = P(A ∩ B),   A ∈ Σ, B ∈ B.   (5)
The existence of E^B(X), and hence of P^B(·) : A ↦ E^B(χ_A), A ∈ Σ, results from the fundamental Radon-Nikodým theorem (Theorem 1.3.12), since

    ν_X : A ↦ ∫_A X dP,   A ∈ B,

defines a signed measure and ν_X ≪ P_B. Thus dν_X/dP_B = E^B(X) a.e. [P_B] exists and is P_B-unique by that theorem. Any member of the P_B-equivalence class is called a version of the conditional expectation E^B(X), and it is customary to call E^B(X) an r.v. when a version is meant.
This general concept could not have been formulated before the availability (before 1930) of the abstract Radon-Nikodým theorem. [Alternatively, if the martingale convergence for directed indexes is granted, then (1) can be extended; cf. Problem 30.] Since (3) is a special case of (4), the elementary definition is included in the present one. We note that in (4), if X is not integrable [i.e., E(X⁺) = ∞ but E(X⁻) < ∞, or E(X⁺) < ∞ but E(X⁻) = ∞], then dν_X/dP_B = +∞ (or = -∞) on a set of positive P_B-measure, so that E^B(X) = E^B(X⁺) - E^B(X⁻) is still defined but need not
be a proper r.v., as compared with X. Thus the general case is deeper and not quite intuitive, in contrast to the elementary formulation (1). Similarly, P^B : Σ → L^∞(P), the conditional probability, is not a (scalar) measure, but it is a vector-space-valued (bounded) function. These concepts constitute an enormous generalization of the classical expectation and probability notions. Just as the definition of independence is given by a system of equations, so are these conditional notions given by (4) and (5). We now present some simple properties and also show some of their individual characteristics that are not possessed by the unconditional concepts. These will give us a clear idea of their structure.
The first consequences are contained in the following:

Proposition 2 Let (Ω, Σ, P) be a probability space, B ⊂ Σ a σ-algebra. Then the conditional expectation operator E^B : L^1(P) → L^1(P) has the following properties a.e. Let {X, Y, XY} ⊂ L^1(P): (i) E^B(X) ≥ 0 if X ≥ 0, E^B(1) = 1, (ii) E^B(aX + bY) = aE^B(X) + bE^B(Y), (iii) E^B(X E^B(Y)) = E^B(X) E^B(Y), (iv) |E^B(X)| ≤ E^B(|X|), so that ||E^B(X)||_1 ≤ ||X||_1, (v) if B_1 ⊂ B_2 ⊂ Σ are σ-algebras, then E^{B_1}(E^{B_2}(X)) = E^{B_2}(E^{B_1}(X)) = E^{B_1}(X), whence the operator E^{B_1} is always a contractive projection on L^1(P), (vi) if B = {∅, Ω}, then E^B(X) = E(X), and if B = Σ, then E^B(X) = X; also for any σ-algebra B_1 ⊂ Σ, E(E^{B_1}(X)) = E(X) for all X ∈ L^1(P) identically.

Proof Definition 1 implies both (i) and (ii). Taken together, this says that E^B is a positive linear operator on L^1(P).
For (iii), if X_1 = E^B(Y) ∈ L^1(Ω, B, P_B), we have to show that

∫_A X X_1 dP = ∫_A E^B(X X_1) dP_B = ∫_A X_1 E^B(X) dP_B, A ∈ B.    (6)

If X_1 = χ_B, B ∈ B, then A ∩ B ∈ B, so that (6) is true by (4). Thus by linearity of the Lebesgue integral, (6) holds if X_1 = ∑_{i=1}^n a_i χ_{A_i}, A_i ∈ B. If X, Y are positive, so that X_1 ≥ 0, then there exist B-measurable simple functions Y_n such that 0 ≤ Y_n ↑ X_1, and by the monotone convergence theorem the last two terms of (6) will be equal. They thus agree for all X, Y in L^1(P). Since the first two terms are always equal, (6) holds as stated, and (iii) follows.
Since X ≤ |X| and −X ≤ |X|, by (i) we get |E^B(X)| ≤ E^B(|X|), and integration yields (iv), on using (4). Similarly (vi) is immediate.
For (v), since B_1 ⊂ B_2 implies that E^{B_1}(X) is B_1-, hence B_2-measurable, it follows that E^{B_2}(E^{B_1}(X)) = E^{B_1}(X) a.e. On the other hand, for A ∈ B_1,

∫_A E^{B_1}(E^{B_2}(X)) dP_{B_1} = ∫_A E^{B_2}(X) dP_{B_2} = ∫_A X dP = ∫_A E^{B_1}(X) dP_{B_1}.

Identifying the extreme integrands, which are B_1-measurable, we get E^{B_1}(E^{B_2}(X)) = E^{B_1}(X) a.e. Thus (v) holds, and if B_1 = B_2, then E^{B_1}(E^{B_1}(X)) = E^{B_1}(X), so that E^{B_1} ∘ E^{B_1} = E^{B_1}. This completes the proof.

Remark Property (iii) is often called the averaging property, and (v) the commutativity property of the conditional expectation operator. Item (iv) is termed the contractivity property. Also, E^B(|X|) = 0 a.e. iff X = 0 a.e., since

E(|X|) = ∫_Ω E^B(|X|) dP_B = 0.

Thus E^B can be called a faithful operator.
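On a finite space with partition-generated σ-algebras, the identities of Proposition 2 reduce to finitely many equations, so they can be checked numerically. A minimal sketch (the space, the measure, and the nested partitions are illustrative assumptions) verifying the averaging property (iii) and the commutativity property (v):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12
P = rng.dirichlet(np.ones(n))            # a strictly positive probability
X, Y = rng.normal(size=n), rng.normal(size=n)

def cond_exp(Z, blocks):
    """E^B(Z) for the sigma-algebra generated by a partition into blocks."""
    out = np.empty(n)
    for b in blocks:
        out[b] = (Z[b] * P[b]).sum() / P[b].sum()
    return out

B1 = [np.arange(0, 6), np.arange(6, 12)]             # coarse partition
B2 = [np.arange(0, 3), np.arange(3, 6),
      np.arange(6, 9), np.arange(9, 12)]             # refinement, so B1 ⊂ σ(B2)

# (iii) averaging: E^B(X E^B(Y)) = E^B(X) E^B(Y)
lhs = cond_exp(X * cond_exp(Y, B2), B2)
assert np.allclose(lhs, cond_exp(X, B2) * cond_exp(Y, B2))

# (v) commutativity: E^{B1}(E^{B2}(X)) = E^{B2}(E^{B1}(X)) = E^{B1}(X)
assert np.allclose(cond_exp(cond_exp(X, B2), B1), cond_exp(X, B1))
assert np.allclose(cond_exp(cond_exp(X, B1), B2), cond_exp(X, B1))
```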

Several of the standard limiting operations are also true for E^B, as shown by the following result.

Theorem 3 Let {X_n, n ≥ 1} be random variables on (Ω, Σ, P) and B ⊂ Σ be a σ-algebra. Then we have:
(i) (Monotone Convergence) 0 ≤ X_n ↑ X ⇒ 0 ≤ E^B(X_n) ↑ E^B(X) a.e.
(ii) (Fatou's Lemma) 0 ≤ X_n ⇒ E^B(liminf_n X_n) ≤ liminf_n E^B(X_n) a.e.
(iii) (Dominated Convergence) |X_n| ≤ Y, E(Y) < ∞, and X_n → X a.e. ⇒ E^B(X_n) → E^B(X) a.e. and in L^1(P). If here X_n → X in probability, then E^B(X_n) → E^B(X) in L^1(P) (but not necessarily a.e.).
(iv) (Special Vitali Convergence) {X_n, n ≥ 1} is uniformly integrable, X_n → X a.e. ⇒ ||E^B(X_n) − E^B(X)||_1 → 0, so that {E^B(X_n), n ≥ 1} is uniformly integrable and E^B(X_n) converges in probability to E^B(X). (This convergence is again not necessarily a.e., and the full Vitali convergence theorem is not valid for conditional expectations.)

Proof The argument is a modification of the classical (unconditional) case.
(i) Since E^B(X_n) ≤ E^B(X_{n+1}) by the preceding proposition, and since lim_n E^B(X_n) exists and is B-measurable, we have for any A ∈ B

∫_A lim_n E^B(X_n) dP_B = lim_n ∫_A E^B(X_n) dP_B (by the classical monotone convergence)
= lim_n ∫_A X_n dP (by (4))
= ∫_A lim_n X_n dP (by the classical monotone convergence)
= ∫_A X dP = ∫_A E^B(X) dP_B (by definition).

Since A ∈ B is arbitrary and the extreme integrands are B-measurable, they can be identified. Thus lim_n E^B(X_n) = E^B(X) a.e.
(ii) This is similar. Indeed, let Y_n = inf{X_k : k ≥ n}. Then 0 ≤ Y_n ↑ Y = liminf_n X_n. Hence by (i) and the monotonicity of E^B we have, since Y_n ≤ X_n,

E^B(liminf_n X_n) = E^B(Y) = lim_n E^B(Y_n) = liminf_n E^B(Y_n) ≤ liminf_n E^B(X_n) a.e.

Similarly, if X_n ≤ Z a.e., E(|Z|) < ∞, then E^B(limsup_n X_n) ≥ limsup_n E^B(X_n) a.e.
(iii) Since −Y ≤ X_n ≤ Y a.e., n ≥ 1, we can apply (ii) to X_n + Y ≥ 0 and Y − X_n ≥ 0. Hence with

liminf_n(X_n + Y) = liminf_n X_n + Y = X + Y = limsup_n(X_n + Y),

one has

E^B(X) + E^B(Y) = E^B(liminf_n(X_n + Y))
≤ liminf_n(E^B(X_n) + E^B(Y)) ≤ limsup_n(E^B(X_n) + E^B(Y)) = limsup_n E^B(X_n + Y)
≤ E^B(limsup_n(X_n + Y)) (since X_n + Y ≤ 2Y a.e.)
= E^B(X) + E^B(Y).

Thus there is equality throughout. Cancelling the (finite) r.v. E^B(Y), we get lim_n E^B(X_n) = E^B(X) a.e. Finally, as n → ∞,

E(|E^B(X_n − X)|) ≤ E(E^B(|X_n − X|)) (by Proposition 2iv)
= E(|X_n − X|) (since E = E^{B_0}, B_0 = {∅, Ω})
→ 0 (by the classical dominated convergence).

This yields the last statement also. The negative statement is clear since, e.g., if B = Σ, then X_n → X in probability does not imply a.e. convergence.
(iv) Again we have

||E^B(X_n) − E^B(X)||_1 ≤ E(|X_n − X|) → 0

as n → ∞, by the classical Vitali theorem, since the X_n are uniformly integrable and the measure space is finite. The last statement is obvious. We obtain the negative result as a consequence of (the deeper) Theorem 5 below, finishing the proof.

If B ⊂ Σ is given by B = σ(Y) for an r.v. Y : Ω → R, then for X ∈ L^1(P) we also write E^{σ(Y)}(X) as E(X|Y), following custom. It is then read simply as the conditional expectation of X given (or relative to) Y. The following two properties are again of frequent use in applications, and the first one shows that Definition 1 is a true generalization of the independence concept.

Proposition 4 Let X be an integrable r.v. on (Ω, Σ, P). If B ⊂ Σ is a σ-algebra independent of X, then E^B(X) = E(X) a.e. If Y is any r.v. on Ω, then there is a Borel mapping g : R → R such that E(X|Y) = g(Y) a.e., so that the conditional expectation of X given Y is equivalent to a Borel function of Y.

Proof If X and B are independent, then for any B ∈ B, X and χ_B are independent. Hence

∫_B E^B(X) dP_B = ∫_B X dP = E(X χ_B) = E(X)P(B) (by independence) = ∫_B E(X) dP_B.

Since the extreme integrands are B-measurable, E(X) = E^B(X) a.e.
For the last statement, E(X|Y) is σ(Y)-measurable. Then by Proposition 1.2.3, there is a Borel g : R → R such that E(X|Y) = g(Y) a.e. This completes the proof.
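Proposition 4 is the measure-theoretic basis of regression: the best description of X given Y is a Borel function g(Y). A Monte Carlo sketch (the model X = Y² + noise is an illustrative assumption, not from the text) recovers g approximately by averaging X over small bins of Y, i.e., by the elementary conditional expectation (1) for the partition the bins generate.

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.uniform(-1.0, 1.0, size=200_000)
X = Y**2 + rng.normal(scale=0.1, size=Y.size)   # so E(X|Y) = g(Y) with g(y) = y^2

# Partition the range of Y into bins; averaging X on each bin gives the
# elementary conditional expectation relative to that partition.
edges = np.linspace(-1.0, 1.0, 41)
idx = np.digitize(Y, edges) - 1
g_hat = np.array([X[idx == k].mean() for k in range(len(edges) - 1)])

centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(g_hat - centers**2)))       # small: g_hat ≈ g(y) = y^2
```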
Let us now return to the parenthetical statement of Theorem 3iv. The preceding work indicates that a conditional expectation operator has just about all the properties of the ordinary expectation. The discussion presented thus far is not delicate enough to explain the subtleties of the conditional concept unless we prove the above negative statement. This is the purpose of the next important result, due to Blackwell and Dubins (1963). In this connection the reader should compare Exercise 10 of Chapter 2 and the following result, which is essentially also a converse to the conditional Lebesgue dominated convergence theorem. It reveals a special characteristic of probability theory, to be explained later on.

Theorem 5 Let {X, X_n, n ≥ 1} be a set of integrable random variables on (Ω, Σ, P) such that X_n → X a.e. and, with U = sup_{n≥1} |X_n|, E(U) = +∞. Then there exists [after an enlargement of (Ω, Σ, P) by adjunction if necessary] a sequence {X', X'_n, n ≥ 1} of integrable random variables on it having the same finite dimensional distributions as the X_n-sequence, i.e.,

P[X'_{n_1} < x_1, . . . , X'_{n_k} < x_k] = P[X_{n_1} < x_1, . . . , X_{n_k} < x_k], x_i ∈ R, 1 ≤ i ≤ k < ∞,

such that for some σ-algebra B_0 ⊂ Σ, E^{B_0}(X'_n) → E^{B_0}(X') at almost no point as n → ∞. In other words, even if the X_n are uniformly integrable, the X'_n satisfy:

P[lim_n E^{B_0}(X'_n) = E^{B_0}(X')] = 0.    (9)

[Thus the interchange of limit and E^{B_0} is not generally valid.]

Proof We first make a number of reductions before completing the argument. Since X_n → X a.e. implies that φ(X_n) → φ(X) a.e. for each real continuous φ on R, taking φ(x) = max(x, 0) = x^+, we get X_n^+ → X^+ a.e. Thus for the proof it suffices to assume, in addition, that X_n ≥ 0. Let us then suppose that the given sequence is nonnegative from now on, and present the demonstration in steps.

I. It is enough to establish the result with X = 0 a.e. For let Y_n = (X_n − X)^+ and Z_n = −(X_n − X)^−, so that X_n − X = Y_n + Z_n. Clearly E(Y_n) < ∞, E(−Z_n) < ∞, and Y_n → 0 a.e., Z_n → 0 a.e. Also,

sup_{n≥1} Y_n ≥ sup_{n≥1}(X_n − X) = sup_{n≥1} X_n − X = U − X,

so that E(sup_{n≥1} Y_n) ≥ E(U) − E(X) = +∞, by hypothesis. On the other hand, by definition |Z_n| = (X_n − X)^− = X − X_n if X_n < X, and = 0 if X_n ≥ X, so that (recalling X_n ≥ 0, X ≥ 0)

sup_{n≥1} |Z_n| ≤ X and E(X) < ∞.

Since Z_n → 0 a.e., by Theorem 3iii, for every σ-algebra B ⊂ Σ, E^B(Z_n) → 0 a.e. Consequently,

P[lim_n E^B(X_n − X) = 0] = P[lim_n E^B(Y_n) + lim_n E^B(Z_n) = 0] = P[lim_n E^B(Y_n) = 0].    (10)

Hence, if (Ω, Σ, P) is rich enough so that there is a σ-algebra B ⊂ Σ for which the right side of (10) is zero, it will follow that E^B(X_n) → E^B(X) at almost no point. Thus by the adjunction procedure the probability space can be enlarged to have this needed richness, so the result will follow if a sequence Y_n ≥ 0, Y_n → 0 a.e., but E^B(Y_n) → 0 almost nowhere, is constructed.
II. We now assert that it is even sufficient to find a suitable two-valued Y'_n-sequence. More explicitly, we only need to construct a sequence {Y'_n, n ≥ 1} such that Y'_n → Y' = 0, P[Y'_n = a_n] = p_n = 1 − P[Y'_n = 0], 0 < p_n < 1, a_n > 0, and for each ω ∈ Ω, Y'_n(ω)·Y'_m(ω) = 0 if n ≠ m, ∑_{n=1}^∞ p_n = 1, with U' = sup_{n≥1} Y'_n = ∑_{n=1}^∞ Y'_n, satisfying

E(U') = ∑_{n=1}^∞ a_n p_n = +∞.    (11)

For let A_k be the event that Y_k ≥ U − 1 for the first time, with the Y_k, U of the last step. This means

A_k = [Y_i < U − 1, 1 ≤ i ≤ k − 1, Y_k ≥ U − 1].

Then the A_k are disjoint, ∪_{k≥1} A_k = [U ≥ 1]. Let A_0 = Ω − ∪_{k≥1} A_k, and consider Y_k χ_{A_k}, k ≥ 1. By the structure theorem for measurable functions, there exists a sequence 0 ≤ f_{kn} ↑ Y_k χ_{A_k} pointwise, with each f_{kn} a simple function. Hence there is n_0 = n_0(k) such that if h_k = f_{kn_0} ≤ Y_k χ_{A_k}, then

E(h_k) ≥ min(k, E(Y_k χ_{A_k}) − 2^{−k}), so that ∑_{k≥1} E(h_k) = +∞.    (12)

Here h_k = ∑_i b_i χ_{B_i} may be assumed to be in canonical form, so that B_i ∩ B_j = ∅, i ≠ j, B_i ⊂ A_k, b_i > 0. Hence sup_{k≥1} h_k = ∑_{k=1}^∞ h_k, since the A_k are also disjoint. Thus

E(sup_{k≥1} h_k) = ∑_{k=1}^∞ E(h_k) = +∞.

But h_k ≤ Y_k; thus for each σ-algebra B ⊂ Σ,

E^B(h_k) ≤ E^B(Y_k),

and hence P[lim_k E^B(h_k) = 0] ≥ P[lim_k E^B(Y_k) = 0], so that it suffices to show that P[lim_k E^B(h_k) = 0] = 0 when h_k → 0, each h_k being a positive simple function satisfying (12). Since each h_k is in canonical form, it is a sum of a finite number of two-valued functions and is nonzero on only one set (a B_i here), and for different k, the h_k live on the A_k (a disjoint sequence). Now rearrange these two-valued functions into a single sequence, omit the functions which are identically zero, and let B_0 be the set on which all of them vanish. If we add χ_{B_0} to the above set of two-valued functions, we get a sequence, to be called Y'_1, Y'_2, . . ., which satisfies the conditions given at the beginning of the step. So it is now sufficient to prove the theorem in this case for a suitable σ-algebra B_0 ⊂ Σ.
III. We now construct the required {Y'_k, k ≥ 1} and B_0. Since 0 < p_1 < 1, choose an integer k ≥ 1 such that 2^{−k} ≤ p_1 < 2^{−k+1}. Let a_i > 0 be numbers satisfying ∑_{i=1}^∞ a_i p_i = +∞. Let N_n = {i ≥ 2 : 2^{n+k} ≤ a_i < 2^{n+k+1}}, n ≥ 1. Note that N_n = ∅ is possible and that N_n depends on k. Consider

t_n = r_n + 2^{−(n+k)}, where r_n = ∑_{i∈N_n} p_i.    (13)

Let T(= T_k) = ∪_n N_n = {i ≥ 2 : a_i ≥ 2^{k+1}}. Set r = ∑_{i∈T} p_i ≤ 1 − p_1 ≤ 1 − 2^{−k}, and

t = ∑_{n=1}^∞ t_n = r + 2^{−k} < 1.

Let {W, Z_n, n ≥ 0} be a set of mutually independent random variables on (Ω, Σ, P) (assumed rich enough, by enlarging it if necessary) whose distributions are specified as

P[W = 0] = 1 − t, P[W = n] = t_n, n ≥ 1,
P[Z_0 = 1] = (p_1 − 2^{−k})(1 − t)^{−1}, P[Z_0 = i] = p_i(1 − t)^{−1} if i ≥ 2, i ∉ T, and = 0 otherwise,    (14)

and for n ≥ 1,

P[Z_n = i] = p_i/t_n for i ∈ N_n, = 2^{−(n+k)}/t_n for i = 1, and = 0 otherwise.    (15)

Let B_0 = σ(Z_n, n ≥ 0). We next define the desired two-valued random variables from {W, Z_n, n ≥ 0}, and verify that they and this B_0 satisfy our requirements.
Consider the composed r.v. V = Z ∘ W (= Z_W). We assert that P[V = n] = p_n. Indeed,

P[V = 1] = ∑_{n=0}^∞ P[Z_n = 1] P[W = n] (by independence)
= (p_1 − 2^{−k}) + ∑_{n=1}^∞ 2^{−(n+k)} = p_1 − 2^{−k} + 2^{−k} = p_1 [by (14) and (15)].

Next, for i ≥ 2, i ∉ T,

P[V = i] = P[W = 0, Z_0 = i] = (1 − t)p_i(1 − t)^{−1} = p_i,

and for i ∈ T, say i ∈ N_n,

P[V = i] = P[W = n, Z_n = i] = t_n · p_i/t_n = p_i.    (16)

This proves the assertion. Define Y'_n = a_n χ_{[V=n]}. Then the Y'_n have the two values 0, a_n > 0, and, the [V = n] being disjoint events, for each ω only one Y'_n(ω) is different from zero. Moreover, Y'_n → 0 a.e., and

P[Y'_n = a_n] = P[V = n] = p_n,    (17)

as required in Step II. Thus {Y'_n, n ≥ 1} and B_0 are the desired sequence and σ-algebra, and it remains to show that they satisfy the other condition of Step II to complete the proof.
IV. E^{B_0}(Y'_n) → 0 at almost no point of Ω.
Since E^{B_0}(Y'_n) = a_n E^{B_0}(χ_{[V=n]}) = a_n P^{B_0}[V = n], it suffices to show that P[E^{B_0}(Y'_n) ≥ 1, i.o.] = P[P^{B_0}[V = n] ≥ 1/a_n, i.o.] = 1. This will be established by means of the second Borel-Cantelli lemma.
Since σ(Z_n) ⊂ B_0, by Propositions 2v and 4, we have for i ∈ N_n

P^{B_0}[V = i] = P[W = n] χ_{[Z_n = i]} = t_n χ_{[Z_n = i]} a.e.    (18)

Thus a_i P[V = i | Z_n = i] = a_i t_n, i ∈ N_n. But t_n ≥ 2^{−(n+k)} and, for i ∈ N_n, a_i ≥ 2^{n+k}, so that a_i t_n ≥ 1. Consequently, if A_n = [Z_n ≠ 1] and

B_n = [a_i P[V = i | Z_n = i] ≥ 1 for some i ∈ N_n],

then for n ≥ 1, A_n ⊂ B_n. Though the B_n are not necessarily independent, the A_n are mutually independent, since the Z_n, n ≥ 0, are. Thus [A_n, i.o.] ⊂ [B_n, i.o.], and it suffices to show that P[A_n, i.o.] = 1 by verifying that ∑_{n=1}^∞ P(A_n) = ∞ (cf. Theorem 2.1.9ii).
By (15), P(A_n) = ∑_{i∈N_n} p_i/t_n = r_n/t_n. To show that ∑_n(r_n/t_n) = ∞, note that r_n = t_n − 2^{−(n+k)}, by (13). Now, if r_n ≥ 2^{−(n+k)} for infinitely many n, then r_n/t_n ≥ 1/2 for all those n, so that ∑_n(r_n/t_n) = ∞. If r_n < 2^{−(n+k)} for all n ≥ n_0 and some n_0, then t_n < 2^{−(n+k−1)}, so

∑_{n≥n_0}(r_n/t_n) ≥ ∑_{n≥n_0} 2^{n+k−1} r_n = ∑_{n≥n_0} ∑_{i∈N_n} 2^{n+k−1} p_i ≥ (1/4) ∑_{i∈T_0} p_i a_i,    (19)

where T_0 = {i ≥ 2 : a_i ≥ 2^{n_0+k}}. But ∑_{i∉T_0} p_i a_i ≤ 2^{n_0+k} ∑_{i≥1} p_i ≤ 2^{n_0+k}, and by choice ∑_{i≥1} p_i a_i = +∞. Hence ∑_{i∈T_0} p_i a_i = ∞, so that the series in (19) diverges. Thus in all cases ∑_n(r_n/t_n) = +∞. It follows that P[A_n, i.o.] = 1, which completes the proof of the theorem.

The implications of this result will be discussed further after studying some properties of conditional probabilities. But it shows that a uniformly integrable sequence {X_n, n ≥ 1} such that X_n → X a.e. need not satisfy E^B(X_n) → E^B(X) a.e. for each σ-algebra B ⊂ Σ when E(sup_n |X_n|) = +∞. Thus the Vitali convergence theorem of Lebesgue integrals does not extend, with pointwise a.e. convergence, to conditional expectations. It is possible, however, to present conditions depending on the given σ-algebra B ⊂ Σ, in order that a certain extension (of Vitali's theorem) holds. Let us say that a sequence {X_n, n ≥ 1} of random variables is conditionally uniformly integrable (or c.u.i.) relative to a σ-algebra B ⊂ Σ if lim_{k→∞} E^B(|X_n| χ_{[|X_n| ≥ k]}) = 0 a.e., uniformly in n. If B = {∅, Ω}, then this is the classical uniform integrability on a probability space (cf. Section 1.4). It is always satisfied if |X_n| ≤ Y and E(Y) < ∞, by Theorem 3iii, but may be hard to verify in a given problem. Note that if the X_n, n ≥ 1, are integrable, then c.u.i. implies the classical concept (use the ensuing result and Theorem 5), but the converse is not true. We have the following assertion, complementing the above remarks:

Proposition 6 Let {X, X_n, n ≥ 1} be a sequence of integrable random variables on (Ω, Σ, P) and B ⊂ Σ be a σ-algebra. If this sequence is conditionally uniformly integrable relative to B and X_n → X a.e., then E^B(X_n) → E^B(X) a.e. [and in L^1(P)-norm by Theorem 3 already].

Proof Since the X_n-sequence is conditionally uniformly integrable relative to B, so are the sequences {X_n^±, n ≥ 1}, where, as usual, X_n^+ = max(X_n, 0) and X_n^- = X_n^+ − X_n ≥ 0. Thus the hypothesis implies that

U_m = sup_n E^B(|X_n| χ_{[|X_n| ≥ m]}) → 0 a.e. as m → ∞.    (21)

Since X_n χ_{[X_n ≥ −m]} ≥ X_n and X_n − X_n χ_{[X_n ≥ −m]} = X_n χ_{[X_n < −m]} ≥ −|X_n| χ_{[|X_n| ≥ m]}, (21) gives, for all m > 0,

liminf_n E^B(X_n) ≥ liminf_n E^B(X_n χ_{[X_n ≥ −m]}) − U_m
≥ E^B(liminf_n X_n χ_{[X_n ≥ −m]}) − U_m (by Theorem 3ii, since the X_n χ_{[X_n ≥ −m]} + m are nonnegative)
≥ E^B(liminf_n X_n) − U_m a.e.    (22)

Since m > 0 is arbitrary and U_m → 0 a.e. as m → ∞, (22) implies

liminf_n E^B(X_n) ≥ E^B(liminf_n X_n) a.e.    (23)

Considering −X_n in the above, and using the fact that liminf_n(−a_n) = −limsup_n(a_n) for any {a_n, n ≥ 1} ⊂ R̄, we deduce that

limsup_n E^B(X_n) = −liminf_n E^B(−X_n) ≤ −E^B(liminf_n(−X_n)) = E^B(limsup_n X_n) a.e. [by (23)].    (24)

Since X_n → X a.e., limsup_n X_n = liminf_n X_n = X a.e., so that (23) and (24) imply

E^B(X) ≤ liminf_n E^B(X_n) ≤ limsup_n E^B(X_n) ≤ E^B(X) a.e.    (25)

Hence lim_n E^B(X_n) = E^B(X) a.e., as asserted.
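For orientation, the unconditional case B = {∅, Ω} of the tail condition in the c.u.i. definition can be examined with exact computations. The following sketch (both families are illustrative assumptions, not from the text) contrasts a family that fails the uniform-integrability criterion with one that satisfies it.

```python
import numpy as np

ks = np.array([1, 2, 4, 8, 16, 32])
N = 10_000                               # truncate the families at n < N

# Family 1: X_n = n with probability 1/n, else 0.
# E(X_n 1_{X_n >= k}) = n * (1/n) = 1 for every n >= k.
tail1 = [max(1.0 for n in range(k, N)) for k in ks]

# Family 2: Y_n = sqrt(n) with probability 1/n, else 0.
# E(Y_n 1_{Y_n >= k}) = 1/sqrt(n) when sqrt(n) >= k, i.e. n >= k^2.
tail2 = [max(1 / np.sqrt(n) for n in range(max(k * k, 1), N)) for k in ks]

print(tail1)   # stays at 1: not uniformly integrable
print(tail2)   # ~ 1/k -> 0: uniformly integrable
```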

If {X_n, n ≥ 1} is conditionally uniformly integrable relative to each σ-algebra B ⊂ Σ, then the sequence constructed in the proof of Theorem 5 is prevented. Hence by that theorem E(sup_n |X_n|) < ∞ must then be true. Combining these two results, we can deduce the following comprehensive result.

Theorem 7 Let {X, X_n, n ≥ 1} be a sequence of integrable random variables on a probability space (Ω, Σ, P) such that X_n → X a.e. Then the following statements are equivalent:
(i) E(sup_{n≥1} |X_n|) < ∞.
(ii) For each σ-algebra B ⊂ Σ, E^B(X_n) → E^B(X) a.e.
(iii) For each σ-algebra B ⊂ Σ, {X_n, n ≥ 1} is conditionally uniformly integrable relative to B.
If any one of these equivalent conditions is satisfied, then the L^1(P)-convergence also holds.

Now we ask: Is it possible to state a full conditional Vitali convergence assertion? One direction is Proposition 6. The converse appears slightly different; this is given as Problem 5. It is easy to give a sufficient condition on the sequence {X_n, n ≥ 1} in order that the hypothesis of the above theorem be verified:

Corollary 8 Let φ : R^+ → R^+ be an increasing function such that φ(x)/x ↗ ∞ as x ↗ ∞. If {X_n, n ≥ 1} is a set of random variables on (Ω, Σ, P) such that X_n → X a.e. and E(φ(|X_n|)) < ∞, suppose for each σ-algebra B ⊂ Σ there is a constant C_B > 0 such that E^B(φ(|X_n|)) ≤ C_B < ∞ a.e. Then E^B(X_n) → E^B(X) a.e. If the set {C_B : B ⊂ Σ} is bounded, then E(sup_n |X_n|) < ∞.

Proof Let ξ(x) = φ(x)/x and A_m = [|X_n| ≥ m]. Then

E^B(|X_n| χ_{A_m}) = E^B((φ(|X_n|)/ξ(|X_n|)) χ_{A_m}) ≤ (1/ξ(m)) E^B(φ(|X_n|)) ≤ C_B/ξ(m) → 0

as m → ∞ (uniformly in n). Thus {X_n, n ≥ 1} is conditionally uniformly integrable relative to B. If C_B ≤ C < ∞ for all B ⊂ Σ, then c.u.i. holds for all B, and Theorem 7iii applies.

Using the order-preserving property of the conditional expectation operator E^B, we can extend the classical proofs, as in Theorem 3, and obtain conditional versions of the inequalities of Hölder, Minkowski, and Jensen as follows.
Theorem 9 Let X, Y be random variables on (Ω, Σ, P), p ≥ 1, and B ⊂ Σ a σ-algebra. Then we have:

(i) E^B(|XY|) ≤ [E^B(|X|^p)]^{1/p}[E^B(|Y|^q)]^{1/q} a.e., (1/p) + (1/q) = 1,    (26)

(ii) E^B(|X + Y|^p) ≤ [(E^B(|X|^p))^{1/p} + (E^B(|Y|^p))^{1/p}]^p a.e.,    (27)

(iii) If φ : R → R is a convex function such that E(φ(X)) exists, and Y = E^B(X) a.e., or Y ≤ E^B(X) a.e. and φ is also increasing, then

φ(Y) ≤ E^B(φ(X)) a.e.    (28)

Proof Since E^B is faithful (cf. the remark after Proposition 2), these inequalities follow from the unconditional results. Briefly, (26) is true if X = 0 a.e. Thus let 0 < E^B(|X|^p) = N_X^p < ∞ a.e. and 0 < E^B(|Y|^q) = N_Y^q < ∞ a.e. Then the numerical inequality of Eq. (4) of Section 1.3, with α = |X|/N_X, β = |Y|/N_Y and 1 < p < ∞ there, implies

|XY|/(N_X N_Y) ≤ (1/p)(|X|^p/N_X^p) + (1/q)(|Y|^q/N_Y^q).    (29)

Note that N_X^p, N_Y^q are B-measurable, so that by Proposition 2,

E^B(|XY|)/(N_X N_Y) ≤ (1/p) + (1/q) = 1.

This is clearly equivalent to (i), called the conditional Hölder inequality. Similarly (ii) and (iii) are established. Because of its importance we outline (28) again in this setting.
Recall that the support line property of a convex function at y_0 is written as [cf. Eq. (13) of Section 1.3]

φ(y) ≥ φ(y_0) + g(y_0)(y − y_0), y ∈ R,    (30)

where g : R → R is in fact the right (or left) derivative of φ at y_0 and is nondecreasing. (It is strictly increasing if φ is strictly convex.) Take y = X(ω), y_0 = E^B(X)(ω) in (30). Then we get

φ(X) ≥ φ(E^B(X)) + g(E^B(X))(X − E^B(X)).    (31)

Since the hypothesis implies that g(E^B(X)) is B-measurable, we have, again with the averaging property of E^B applied to the functional inequality (31),

E^B(φ(X)) ≥ φ(E^B(X)) + g(E^B(X))(E^B(X) − E^B(X)) = φ(E^B(X)),    (32)

which is (28). Under the alternative set of hypotheses of (iii), one has from (32), since φ is now increasing,

E^B(φ(X)) ≥ φ(E^B(X)) ≥ φ(Y) a.e.    (32')

Note that (32) and (32') are valid for any bounded r.v. X, and then the general case follows by the conditional monotone convergence theorem, so that E^B(φ(X)) exists. This completes the proof.
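The conditional Hölder and Jensen inequalities hold pointwise (a.e.), not merely after integration, and this can be checked directly on a finite space. A minimal sketch (the space, the measure, the exponents, and the partition are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 12, 3.0, 1.5                    # 1/p + 1/q = 1
P = rng.dirichlet(np.ones(n))
X, Y = rng.normal(size=n), rng.normal(size=n)
blocks = [np.arange(0, 4), np.arange(4, 9), np.arange(9, 12)]

def cond_exp(Z):
    out = np.empty(n)
    for b in blocks:
        out[b] = (Z[b] * P[b]).sum() / P[b].sum()
    return out

# Conditional Hölder (26): a pointwise (here: blockwise) inequality.
assert np.all(cond_exp(np.abs(X * Y))
              <= cond_exp(np.abs(X)**p)**(1/p) * cond_exp(np.abs(Y)**q)**(1/q) + 1e-12)

# Conditional Jensen (28) with the convex function phi(x) = x^2.
assert np.all(cond_exp(X)**2 <= cond_exp(X**2) + 1e-12)
```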

Again, equality conditions obtain in the above inequalities by an analysis of the proofs, just as in the classical unconditional case. For instance, it can be verified that in the conditional Jensen inequality (32), if φ is strictly convex and B is complete, equality holds when and only when X is B-measurable. We leave the proof to the reader.
The preceding demonstration may lead one to think that similar results are true not only for operators such as E^B, but also for a larger class of operators on the L^p-spaces. This is indeed true, and we present a family of mappings in the problem section (cf. Problems 6 and 7ii) to indicate this phenomenon.
The above theorem yields further structural properties of E^B, as noted in the following:

Corollary 10 For each σ-algebra B ⊂ Σ the operator E^B is a linear contraction on L^p(Ω, Σ, P), 1 ≤ p ≤ ∞, i.e., it is linear and

||E^B(X)||_p ≤ ||X||_p, X ∈ L^p.    (33)

Proof Since P is a finite measure, L^p ⊂ L^1 for p ≥ 1, so that E^B is defined on all L^p, and is clearly linear. To prove the contraction property, which we have seen for p = 1 in Proposition 2, note that for p = +∞ this is clear, since |X| ≤ ||X||_∞ a.e. and

|E^B(X)| ≤ E^B(|X|) ≤ ||X||_∞ a.e.

Thus let 1 < p < ∞. Then by (28), since φ(x) = |x|^p is convex,

|E^B(X)|^p ≤ E^B(|X|^p) a.e.

This implies (33) on integration, and that E^B(X) ∈ L^p for each X ∈ L^p, as asserted.

Remark The result implies that E^B(X_n) → E^B(X) in L^p(P)-mean whenever X_n → X in L^p(P)-mean.

It is clear that E^B(L^p(Ω, Σ, P)) = L^p(Ω, B, P_B). Thus it is natural to ask whether there are any other contractive projections on L^p(Ω, Σ, P) with range L^p(Ω, B, P_B), or, considering the averaging property of E^B (cf. Proposition 2iii), whether there are any other contractive projections on L^p → L^p with the averaging property and having constants as fixed points. These questions lead to the corresponding nontrivial characterization problems for conditional expectations and are functional analytic in nature. To indicate the flavor of the problems, we present just a sample result. [It is not used in later work.]

Theorem 11 Let B ⊂ Σ be a σ-algebra and consider a contractive projection Q on L^1(Σ) [= L^1(Ω, Σ, P)] with range L^1(B). Then Q = E^B. Conversely, E^B is always a contractive projection on L^1(Σ) with range L^1(B) for any σ-algebra B ⊂ Σ.

Proof The second part has already been established. We need to show the more involved result Q = E^B. For this we first assert that Q is a positive operator with the property E(QX) = E(X), X ∈ L^1(Σ), and reduce the general case to this special result. Indeed, by the density of bounded functions in L^1(Σ), and the fact that L^1(Σ) is a vector lattice [i.e., X ∈ L^1(Σ) ⇒ X = X^+ − X^-, 0 ≤ X^± ∈ L^1(Σ)], it suffices to establish the result for 0 ≤ X ≤ 1 a.e. Since Q is the identity on its range (because Q² = Q), and 1 ∈ L^1(B), it follows that Q1 = 1. Then

E(1 − X) = ||1 − X||_1 ≥ ||Q(1 − X)||_1 (since Q is a contraction)
= ||1 − QX||_1 = E(|1 − QX|) ≥ E(1 − QX) = 1 − E(QX).

This implies E(QX) ≥ E(X). On the other hand,

E(X) = ||X||_1 ≥ ||QX||_1 = E(|QX|) ≥ E(QX).

There is equality throughout, and this shows that QX ≥ 0 a.e. (since E(|QX|) = E(QX)), as well as E(X) = E(QX), proving our assertion.
We are now ready to establish the general result, namely, Q = E^B. Since for each X ∈ L^1(Σ), QX ∈ L^1(B), and the same for E^B(X), we have, for all A ∈ B,

E(χ_A QX) = E(Q(χ_A QX)) (by the assertion),
E(χ_A E^B(X)) = E(χ_A X) = E(Q(χ_A X)) (by (4) and the assertion).    (34)

If we show that the right sides of these equations in (34) are equal, then we get the equality of the left sides, giving the middle equality below:

∫_A QX dP = ∫_A X dP = ∫_A E^B(X) dP, A ∈ B,

and hence the arbitrariness of A in B and the B-measurability of E^B(X) and QX imply E^B(X) = QX. Let us prove the above equality.
First, one obtains the stronger statement that Q(χ_A X) = Q(χ_A QX), A ∈ B, for all X such that 0 ≤ X ≤ χ_A. Indeed, by the positivity of Q established at the beginning of this proof, 0 ≤ QX ≤ Qχ_A = χ_A, since χ_A ∈ L^1(B). Thus QX vanishes outside A, and hence, if g = Q(χ_A X) − Q(χ_A QX),

χ_A + g = Q[χ_A(χ_A + X − QX)] ≤ Q[χ_A + X − QX] = χ_A a.e.    (35)

Hence g ≤ 0 a.e. Similarly χ_A − g ≤ χ_A and g ≥ 0, so that g = 0 a.e. for all 0 ≤ X ≤ χ_A. If 0 ≤ X ≤ 1, then consider Xχ_A, Xχ_{A^c}. By the special result of (35), the fact that Q(Xχ_A)χ_{A^c} = 0 = Q(Xχ_{A^c})χ_A, and the linearity of Q, we get [even without (35)]

χ_A QX = χ_A Q(Xχ_A) + χ_A Q(Xχ_{A^c}) = Q(Xχ_A) = Q(χ_A X),

so that the right sides in (34) coincide. Hence QX = E^B(X) for 0 ≤ X ≤ 1. Then by linearity this holds for all bounded random variables X, and by density of such X in L^1(Σ), and the continuity of the operators Q, E^B, the result holds for all X ∈ L^1(Σ). This completes the proof.

With further work, one can prove that the above theorem is true if L^1(Σ) is replaced by L^p(Σ), 1 ≤ p < ∞. It is also true if Q is a contractive projection on L^p(Σ), 1 ≤ p < ∞, p ≠ 2, with Q1 = 1 a.e.; then its range can be shown to be of the form L^p(B). The result again holds for p = 2 if Q is assumed positive in addition. These and related characterizations are not essential for the present work and are not pursued. A detailed account of the latter may be found in the first author's book (1981).

3.2 Conditional Probabilities

In Definition 1.1 we introduced the general concept of the conditional probability function relative to a σ-algebra B ⊂ Σ as any function P^B : Σ → L^∞(Ω, B, P) satisfying the functional equation

∫_B P^B(A) dP_B = P(A ∩ B), A ∈ Σ, B ∈ B.    (1)

Since P^B(A) = E^B(χ_A) and P^B(A) is a B-measurable function, unique outside of a P_B-null set depending on A, one says that any member of the equivalence class is a version of the conditional probability (of A given B), and P^B is referred to as the conditional probability. Some immediate consequences of the definition are recorded for reference:

Proposition 1 Let (Ω, Σ, P) be a probability space, and B ⊂ Σ be a σ-algebra. Then the following assertions are true:
(i) A ∈ Σ ⇒ 0 ≤ P^B(A) ≤ 1 a.e., P^B(Ω) = 1 a.e., and P^B(A) = 0 a.e. if P(A) = 0.
(ii) {A_n, n ≥ 1} ⊂ Σ, A_n disjoint ⇒ P^B(∪_{n≥1} A_n) = ∑_{n=1}^∞ P^B(A_n) a.e. (= sup_m ∑_{n=1}^m P^B(A_n) a.e.).
(iii) If {A_n, n ≥ 1} are as in (ii) and A = ∪_{n=1}^∞ A_n, then for 1 ≤ p < ∞,

|| P^B(A) − ∑_{n=1}^m P^B(A_n) ||_p → 0 as m → ∞.

These assertions are immediate from Theorem 1.3. Consider (iii) as an illustration:

|| P^B(A) − ∑_{n=1}^m P^B(A_n) ||_p^p = || P^B(∪_{n>m} A_n) ||_p^p [using (ii)]
= ∫_Ω [E^B(χ_{∪_{n>m} A_n})]^p dP_B
≤ ∫_Ω E^B((χ_{∪_{n>m} A_n})^p) dP_B (by Corollary 1.10, Jensen's inequality)
= P(∪_{n>m} A_n) → 0 as m → ∞.

Taking B = Σ (or A_n ∈ B for all n), one sees that the assertion fails if p = +∞, since then P^B(A) = E^B(χ_A) = χ_A, and hence ||χ_{∪_{n>m} A_n}||_∞ = 1 ↛ 0 as m → ∞.
This proposition states that P^B(·) has formally the same properties as the ordinary measure P. However, each property has an exceptional P-null set which varies with the sequence. Thus if B is not generated by a (countable) partition of Σ, then the collection of these exceptional null sets can have a union of positive P-measure. This indicates that there may be difficulties in treating P^B(·)(ω) : Σ → [0, 1] as a standard probability measure for almost all ω ∈ Ω. Indeed, there are counterexamples showing that P^B(·)(ω) cannot always be regarded as an ordinary probability function. We analyze this fact and the significance of property (iii) of the above proposition, since the structure of conditional probability functions is essential for the subject.

Proposition 2 There exists a probability space (Ω, Σ, P) and a σ-algebra B_0 ⊂ Σ such that for the conditional probability P^{B_0}, P^{B_0}(·)(ω) : Σ → [0, 1] is not a probability measure for almost all ω ∈ Ω.

Proof It was already seen that P^B exists satisfying the functional equation (1) for any σ-algebra B ⊂ Σ. By definition P^B(A) = E^B(χ_A), and hence, by the linearity of E^B, we have for each simple function f = ∑_{i=1}^n a_i χ_{A_i}, A_i ∈ Σ,

E^B(f) = ∑_{i=1}^n a_i P^B(A_i) = ∫_Ω f(ω') P^B(dω'),    (2)

where the integral for simple functions is defined by the sum. It is easily seen that the integral in (2) is well defined for all simple functions, and does not depend on the representation of f. Thus for such f there is a P-null set N_f such that for ω ∈ Ω − N_f

E^B(f)(ω) = ∫_Ω f(ω') P^B(dω')(ω).    (3)

We observe that, if P^B(·)(ω) is a measure for each B ⊂ Σ and ω ∈ Ω − N_0 for some fixed P-null set N_0 (⊃ N_f), then (3) will be a Lebesgue integral, and thus it can be extended to all measurable f : Ω → R̄^+, using Theorem 1.3i on the left and the monotone convergence on the right. It follows from this that (3) holds for all P-integrable f, since then the right side of (3) is the standard (Lebesgue) integral.
Now suppose that our (Ω, Σ, P) is the probability space given in Theorem 1.5, with B = B_0 ⊂ Σ there. If P^{B_0}(·)(ω) can be regarded as a probability measure, then by the standard Vitali convergence theorem, for each sequence {f_n, n ≥ 1} of uniformly integrable functions such that f_n → f a.e. (P), we must have, for each ω ∈ Ω − N_0 (P(N_0) = 0),

∫_Ω f(ω') P^{B_0}(dω')(ω) = lim_n ∫_Ω f_n(ω') P^{B_0}(dω')(ω) = lim_{n→∞} E^{B_0}(f_n)(ω).    (4)

However, if the above sequence is not dominated by an integrable function [i.e., sup_{n≥1} f_n ∉ L^1(P)], as in Theorem 1.5, then by that theorem (4) must be false for almost all ω. It is sufficient to take for {f_n, n ≥ 1} the two-valued sequence of that same theorem. Consequently our assumption that P^{B_0}(·)(ω) : Σ → [0, 1] is a probability for a.a. (ω) cannot hold, and in fact P^{B_0}(·)(ω) is a measure for almost no ω ∈ Ω. This completes the proof. [We give another counterexample in Problem 13, using the axiom of choice.]

Motivated by the above proposition, we introduce

Definition 3 If (Ω, Σ, P) is a probability space and B ⊂ Σ is a σ-algebra, then a mapping P̃(·, ·) : Σ × Ω → [0, 1] is called a regular conditional probability if (i) P̃(·, ω) : Σ → [0, 1] is a probability for each ω ∈ Ω − N_0, P(N_0) = 0, (ii) P̃(A, ·) : Ω → [0, 1] is a B-measurable function for each A ∈ Σ, and (iii) P̃ satisfies (1), so that

∫_B P̃(A, ω) P(dω) = P(A ∩ B), A ∈ Σ, B ∈ B.    (5)
Since by the Radon-Nikodym theorem the mapping P̃ satisfying this functional equation is P-unique, it follows that P̃(A, ·) = P^B(A) a.e., so that P̃(·, ·), if it exists, must be a version of P^B. The preceding proposition asserts that a regular conditional probability need not always exist, while Eq. (2) of Section 1 shows that such a mapping exists if B is not "too large," e.g., if it is generated by a countable partition. Note that, by (3), if there exists a regular probability function P^B, then E^B(·) is simply an integral relative to this measure. This means (1) of Section 1.3 will be true in this general case also, but it is not valid (by Proposition 2) for all conditional probability functions. This circumstance raises the following two important questions in the subject:
(A) Under what restrictions on the probability space (Ω, Σ, P), and/or the σ-algebra B ⊂ Σ, does there exist a regular conditional probability function?
(B) To what extent can the theory of conditional probability be developed without regard to regularity?
It is rather important and useful to know the solutions to these problems, because they are, in a sense, special to probability theory and also distinguish its individual character. We consider (A) in detail, but first let us record a few remarks concerning (B). Since in general P^B cannot be forced to behave like a scalar measure (by Proposition 2), let us return to Proposition 1iii, which does not depend on restrictions of the null sets. Indeed, this says that P^B : Σ → L^p(Ω, B, P), 1 ≤ p < ∞, is σ-additive in the p-norm, and if we only look upon the L^p-spaces as vector lattices [i.e., for f, g ∈ L^p, f ≤ g iff f(ω) ≤ g(ω) a.a. (ω)], then the parenthetical statement of Proposition 1ii is that

P^B : Σ → L^p(Ω, B, P), 1 ≤ p ≤ ∞,

is σ-additive in the order topology. If 1 ≤ p < ∞, the norm and order topologies coincide for studying P^B, and if p = ∞, the order topology is weaker than that of the norm. If we therefore regard P^B as a vector-valued mapping from Σ into the positive part of L^p(Ω, Σ, P), p ≥ 1, then one can develop the theory with the σ-additivity of P^B in the order or norm topologies. This aspect is called vector measure theory, and using the latter point of view, the integral in (3) can be developed for all f ∈ L^p(Σ). Evidently, the classical Vitali convergence result is false in this generality, but fortunately the dominated convergence statement survives. Consequently, using the vector integration theory, it is possible to present a satisfactory answer to problem (B). The classical intuitive explanations with conditional probabilities and expectations have to be given with necessary and explicit care and restraint. Such a general theory has been discussed by the first author in a monograph (Rao (1993), (2005)). It turns out, however, that in most problems of practical interest, solutions found for (A) are sufficient, and so we turn to it. Because of the intuitive and esthetic appeal, a considerable amount of the literature is devoted to problem (A), and we present an important part of it here.
To consider (A), we have already noted that there exist regular conditional probability functions if the conditioning σ-algebra B is generated by a (countable) partition of Σ. As a motivation for the general study, let us present a classical example where B is a richer σ-algebra than that determined by a partition. Thus let Ω = R², Σ = the Borel σ-algebra of R², B = the Borel σ-algebra of R, and let f : R² → R^+ be a measurable function such that

∫_{R²} f(x, y) dx dy = 1.

Let P : A ↦ ∫_A f(x, y) dx dy, A ∈ Σ, be the (Borel) probability measure, and π_1, π_2 be the x- and y-coordinate projections on R². If B_2 = π_2^{-1}(B) ⊂ Σ, then B_2 is richer than a partition-generated σ-subalgebra of Σ. We now exhibit a "natural" regular conditional probability function Q : Σ × Ω → R^+ relative to B_2. Define g : R² → R^+ by the equation

g(x|y) = f(x, y)/f_2(y) if f_2(y) > 0, and = 0 otherwise,    (6)

where f_2(y) = ∫_R f(x, y) dx. Theorem 1.3.11i (Fubini) guarantees that f_2(·), and hence g(·|·), is B- and Σ-measurable. Since Σ is generated by the algebra of measurable rectangles B × B, consider Q defined for such an A = A_1 × A_2 ∈ B × B, ω = (x, y) ∈ Ω, by

Q(A, ω) = χ_{A_2}(y) ∫_{A_1} g(u|y) du.

It is clear that Q is well defined, Q(·, ω) is σ-additive on B × B, and Q(A, ·) is B_2-measurable. If Q is shown to satisfy (5), then we can conclude that it is a regular conditional probability function. To see this, consider, for any B_2 ∈ B_2 (B_2 = R × B for a unique B ∈ B),

∫_{B_2} Q(A, ω) P(dω) = ∫_B χ_{A_2}(y) [∫_{A_1} g(u|y) du] f_2(y) dy
= ∫_{B∩A_2} ∫_{A_1} f(u, y) du dy (by Tonelli's theorem)
= P(A ∩ B_2).    (7)

Now both sides of (7) are σ-additive in A (∈ B × B) for each fixed B_2 ∈ B_2. Since B × B generates B ⊗ B = Σ, the Hahn extension theorem for σ-finite measures implies that (7) holds on Σ for each B_2 in B_2. Hence Q(·, ·) is a regular conditional probability function, as asserted. Note that Q(A, ·) is thus a version of P^{B_2}(A), A ∈ Σ.
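The construction (6)-(7) can be checked numerically by discretizing the plane. The following sketch (the particular density f(x, y) = x + y on the unit square is an illustrative assumption, not from the text) builds g(x|y) and verifies the functional equation (7) for a rectangle A = A_1 × A_2 and a set B_2 = R × B.

```python
import numpy as np

# Grid approximation on [0,1]^2; f(x, y) = x + y integrates to 1 there.
n = 400
xs = (np.arange(n) + 0.5) / n
dx = dy = 1.0 / n
X, Y = np.meshgrid(xs, xs, indexing="ij")
f = X + Y

f2 = f.sum(axis=0) * dx                  # f_2(y) = ∫ f(x, y) dx
g = f / f2                               # g(x|y) of (6); f_2 > 0 here

A1 = xs < 0.5                            # A = A_1 × A_2
A2 = xs > 0.25
B = xs < 0.7                             # B_2 = R × B

# Left side of (7): ∫_B χ_{A_2}(y) [∫_{A_1} g(u|y) du] f_2(y) dy
lhs = ((g[A1].sum(axis=0) * dx) * A2 * f2 * dy)[B].sum()
# Right side of (7): P(A ∩ B_2) = ∫∫_{A_1 × (A_2 ∩ B)} f(u, y) du dy
rhs = f[np.ix_(A1, A2 & B)].sum() * dx * dy
print(lhs, rhs)                          # agree up to rounding
```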
It is clear that, if Ω = R^n × R^m in the above, the same procedure holds for the Q defined by a more general g(x_1, . . . , x_n | y_1, . . . , y_m), for any 1 ≤ m, n < ∞. Here the structure of the Borel algebras of R^n is used. Also, the {R^n}_{n≥1} are the range spaces of random vectors. It is shown later that the above considerations can be extended to the image measures (i.e., distribution functions) on such spaces.
Guided by the above example, we can give the regularity concept of Definition 3 in the following slightly more convenient form.

Definition 4 Let (Ω, Σ, P) be a probability space and B, S be two σ-subalgebras of Σ without any particular inclusion relationship between them. A mapping P̃ : S × Ω → R^+ is a regular conditional probability (in the extended sense) if (i) P̃(·, ω) : S → R^+ is a probability for each ω ∈ Ω, (ii) P̃(A, ·) : Ω → R^+ is B-measurable for each A ∈ S, and (iii)

∫_B P̃(A, ω) P(dω) = P(A ∩ B), A ∈ S, B ∈ B.    (8)

If S = Σ, then this definition essentially reduces to the preceding one. For this reason, the phrase "in the extended sense" will be dropped hereafter, since there is no conflict or ambiguity in this action. If B = Ω in (8), and if P̃(·, ·) satisfies (i) and (ii) of the above definition, then any P on S for which (8) holds is often called an invariant probability for the "kernel" or the "transition probability" P̃(·, ·). For this interpretation, (Ω, S) and (Ω̃, B) can be completely different. The generalization is of interest for an extended study of Markov processes. However, for the present, we use Definition 3. Its significance is made evident when the pair (S, B) is specialized and image conditional measures are brought into play (cf. Theorem 1.4.1).
Let X : Ω → R^n, n ≥ 1, be a random variable (or vector if n > 1), and let R be the Borel σ-algebra of R^n, with S = X^{-1}(R) ⊂ Σ. Then P̃ : S × Ω → R^+ becomes Q_X(D, ω) = P̃(X^{-1}(D), ω), D ∈ R, and (8) reduces to

∫_B Q_X(D, ω) P(dω) = P(X^{-1}(D) ∩ B), D ∈ R, B ∈ B.    (9)

Such a Q_X : R × Ω → R^+ is called a regular conditional distribution (= image regular conditional probability) when P̃ is regular in the sense of Definition 4. Since Q_X(·, ω) = (P^B ∘ X^{-1}(·))(ω) : R → R^+ is a version of the image conditional probability function P^B ∘ X^{-1} : R → L^1(Ω, B, P), it is important to know about the existence of such a Q_X. In other words, when does a function Q_X : R × Ω → R^+ exist, satisfying the conditions: (i) Q_X(D, ·) is B-measurable (B ⊂ Σ), (ii) Q_X(·, ω) : R → [0, 1] is a probability, and (iii) Q_X(D, ·) = P^B(X^{-1}(D))(·) a.e. [P]?
If X = (X_1, X_2), f(·, ·) is the density function of X relative to the planar Lebesgue measure [i.e., if F(x, y) = P[X_1 < x, X_2 < y], then F(x, y) = ∫_{-∞}^x ∫_{-∞}^y f(u, v) dv du], and (Ω, Σ) = (R², B ⊗ B), then, as we have shown, Q defined by (6) is indeed such a regular conditional distribution. Without assuming such a representation, we can show that a regular conditional distribution always exists for a random vector. The result is due to Doob (1953).

Theorem 5 Let (Ω, Σ, P) be a probability space and X : Ω → R^n be a random vector. If B ⊂ Σ is a σ-algebra, then a regular conditional distribution Q_X : R × Ω → R^+ for X given B exists.

Proof The argument uses the fact that the rationals are dense in R (or use any dense denumerable set in R^n), and that properties (i) and (ii) of Proposition 1 are available. Let {r_i, i ≥ 1} ⊂ R be an enumeration of the dense set, which we take as the rationals, and consider for each ω ∈ Ω

F_n(r_{i_1}, . . . , r_{i_n}; ω) = P^B([X_1 < r_{i_1}, . . . , X_n < r_{i_n}])(ω),    (10)

where B is the given σ-subalgebra and X = (X_1, . . . , X_n). By Proposition 1, there is a P-null set N(r_{i_1}, . . . , r_{i_n}) such that, if ω ∉ N(r_{i_1}, . . . , r_{i_n}), then F_n(. . . ; ω) given by (10) is nonnegative and nondecreasing, and if

N = ∪{N(r_{i_1}, . . . , r_{i_n}) : r_{i_k} rational, 1 ≤ k ≤ n},

so that P(N) = 0, then F_n(. . . ; ω) is also left continuous and in fact is an n-dimensional distribution function for ω ∉ N. If x_k ∈ R is arbitrary, define for ω ∉ N

F_n(x_1, . . . , x_n; ω) = lim_{r_{i_k} ↑ x_k} F_n(r_{i_1}, . . . , r_{i_n}; ω),    (11)

and if ω ∈ N, let F_n be equal to any fixed distribution function G_n. For definiteness, let us take G_n(x_1, . . . , x_n) = P[X_1 < x_1, . . . , X_n < x_n]. Then we see that F_n(. . . ; ω) is an n-dimensional distribution function for each ω ∈ Ω. Thus one can define (for each Borel B ⊂ R^n, i.e., B ∈ R)

Q_X(B, ω) = ∫_B dF_n(x_1, . . . , x_n; ω).    (12)

It is clear that Q_X(·, ·) : R × Ω → R^+ is well defined, Q_X(·, ω) is σ-additive on R, and Q_X(B, ·) is measurable relative to B, where R is the Borel σ-algebra of R^n. Thus Q_X(·, ·) will be a regular conditional distribution of X if it is shown that Q_X(B, ω) = P^B(X^{-1}(B))(ω) for a.a. ω ∈ Ω, i.e., that Q_X is a version of P^B ∘ X^{-1} on R.
For this purpose, consider the class C ⊂ R defined by

C = {B ∈ R : Q_X(B, ·) is a version of P^B(X^{-1}(B))}.

By definitions (10) and (12), if S is the semiring of left-closed right-open intervals (or rectangles), then S ⊂ C. Also S is a π-class, since it is closed under intersections. Moreover, by the monotone convergence theorem and (10)-(12) it follows immediately that C is a λ-class. Hence by Proposition 1.2.8, σ(S) = R ⊂ C. Thus Q_X(·, ·) is a version of P^B ∘ X^{-1}, completing the proof.

It is gratifying to know that there is always a version of P^B ∘ X^{-1} which is a regular conditional distribution on (R^n, R) for each random variable X : Ω → R^n and σ-algebra B ⊂ Σ. Note that in this description we are only concentrating on S = X^{-1}(R) ⊂ Σ and P^B : S → L^1(Ω, B, P). But usually S is a much smaller σ-ring contained in Σ. Can we say that, on S, P^B itself has a version which is a regular conditional probability? In other words, can we transport the regularity property of Q_X(·, ·) to P^B on S? In general, the answer is "no." To see that there is a problem here, note that if X(Ω) = B_0 ⊂ R^n, where B_0 ≠ R^n, then by definition of the inverse image, X^{-1}(D) = ∅ for all D ⊂ R^n − B_0. Consequently the mapping X^{-1} : R → S is not one-to-one. If B_0 = R^n, or if B_0 ∈ R, then we may replace R by the trace σ-subalgebra R(B_0) = {A ∩ B_0 : A ∈ R}, and then X^{-1} : R(B_0) → S is a σ-homomorphism, and the above pathology is immediately eliminated. Further, we can then provide an affirmative answer easily. This observation is also due to Doob.

Proposition 6 Let (Ω, Σ, P) be a probability space, B ⊂ Σ a σ-algebra, and X : Ω → R^n a random vector. If X(Ω) = B_0 ∈ R and S = X^{-1}(R(B_0)), where R is the Borel σ-algebra of R^n, then there exists a version ν_X of P^B : S → L^1(Ω, B, P) [ν_X(A, ω) = P^B(A)(ω) a.a. (ω), A ∈ S] that is a regular conditional probability. (Thus ν_X is defined only on S ⊂ Σ.)

Proof Since by hypothesis B_0 ∈ R, and by definition X^{-1}(B_0) = X^{-1}(R^n), let Q_X(·, ·) be a regular conditional distribution of X given B, guaranteed by the above theorem. It follows that Q_X(B_0, ω) = P^B(X^{-1}(B_0))(ω) = P^B(X^{-1}(R^n))(ω) = 1 a.a. (ω). Let A ∈ S. Since X : Ω → B_0 is onto, X^{-1} is one-to-one from R(B_0) onto S. Hence there exists a B_1 ∈ R(B_0) with A = X^{-1}(B_1). [In fact, if there is also a B_2 ∈ R(B_0) with A = X^{-1}(B_2), then

B_1 = X(X^{-1}(B_1)) = X(A) = X(X^{-1}(B_2)) = B_2,

because the onto property of X is equivalent to X[X^{-1}(D)] = D for all D ⊂ B_0.] Thus ν_X(A, ω) = Q_X(B, ω), with A = X^{-1}(B), ω ∈ Ω − N, P(N) = 0, unambiguously defines ν_X for almost all ω ∈ Ω. Since X^{-1} preserves all set operations, it follows that ν_X(·, ω) is a probability for all ω ∈ Ω − N, and then ν_X(A, ω) = Q_X(B, ω) = P^B(X^{-1}(B))(ω) = P^B(A)(ω) a.a. (ω), A ∈ S. Since Q_X(B_0^c, ω) = 0 and P^B(X^{-1}(D)) = 0 a.e. for all D ∈ R(B_0^c), we have ν_X(A, ω) = P^B(A)(ω) a.a. (ω), A ∈ S, so that ν_X is a version of P^B on S. This completes the proof.
In the above result, suppose B = σ(Y), the σ-algebra generated by a random vector Y : Ω → R^m. Then (cf. Proposition 1.4) there is a Borel function g : R^m → R such that for each A = [X < a] ∈ S = σ(X) we have

P^B(A)(ω) = g(Y(ω)) a.a. (ω).    (13)

If Y(ω) = y (∈ R^m), then one expresses (13) symbolically as

P^B([X < a])(ω) = P{ω' : X(ω') < a | Y(ω) = y},    (14)

and the right-side quantity has no other significance except as an abbreviation of the left side. If X : Ω → R^n is a coordinate variable, in the sense that its range is R^n—and this happens in many applications—then the conditional probability in (14) is regular, by Proposition 6, and it is a constant on the set B = {ω : Y(ω) = y}. Thus when P^B is regular, one denotes the right side of (14), indifferently and unfortunately, as

F_{X|Y}(a|y) = P[X < a | Y = y].    (15)

We call F_{X|Y}(·|·) a conditional distribution function of X given Y [or B = σ(Y)]. This terminology is "meaningful," since F_{X|Y}(·|y) is a distribution function and F_{X|Y}(a|·) is a Borel function. So

F_{X|Y}(a|y) = Q_X((−∞, a), ω), ω ∈ [Y = y],

and the right side always defines a conditional probability distribution, by Theorem 5; and (15) says that F_{X|Y}(·|y) is a distribution function (can be chosen to be such for each y), and F_{X|Y}(a|·) is Borel measurable. Hence we may state the following (but we show later there is room for trouble in this calculation!):

Proposition 7 Let (Ω, Σ, P) be a probability space and X, Y be a pair of random vectors on Ω into R^n and R^m, respectively. If F_{X,Y} is their joint distribution, F_Y that of Y, and F_{X|Y}(·|·) the conditional distribution, then

F_{X,Y}(x, y) = ∫_{(−∞, y)} F_{X|Y}(x|t) F_Y(dt),    (16)

and the integral is an m-fold symbol. Also, F_{X|Y}(x|y) = F_X(x), x ∈ R^n, if X, Y are independent. Moreover,

∫_{A×B} h(x, y) F_{X,Y}(dx, dy) = ∫_B [∫_A h(x, y) F_{X|Y}(dx|y)] F_Y(dy),    (17)

for all Borel sets A ⊂ R^n, B ⊂ R^m and a bounded Borel function h : R^n × R^m → R.

Proof For notational convenience, we take m = n = 1. Let B_x = (−∞, x), B_y = (−∞, y). Then by the definitions of F_{X,Y}, F_{X|Y}, F_Y, we have, since B_Y = σ(Y) and Y^{-1}(B_y) ∈ B_Y,

F_{X,Y}(x, y) = P([X ∈ B_x] ∩ [Y ∈ B_y]) = ∫_{[Y∈B_y]} P^{B_Y}([X ∈ B_x]) dP_{B_Y}
= ∫_{B_y} F_{X|Y}(x|t) F_Y(dt) (by the image probability law).

Thus (16) holds. If X, Y are independent, then P^{B_Y}(A) = E^{B_Y}(χ_A) = P(A) a.e., by Proposition 1.4. Hence (15) becomes F_{X|Y}(x|y) = F_X(x), x ∈ R.
To prove (17), since F_{X|Y}(·|y) is a distribution function, the mapping ν(·|y) : A ↦ ∫_A F_{X|Y}(dx|y), A ∈ R (Borel σ-algebra), is a probability measure. Consider, for any bounded Borel function h : R × R → R, the class

C = {D ∈ R ⊗ R : (17) holds with h = χ_D}.

It is clear that C contains all intervals of the form [a, b) and their intersections, so that the semiring S of such intervals is in C, and R² ∈ C. It is a π-class, and by the monotone convergence theorem it follows that C is also a λ-class. Hence by Proposition 1.2.8, C ⊃ σ(S) = R ⊗ R. Since C ⊂ R ⊗ R, we have C = R ⊗ R, and (17) is verified. Note that for the argument here, and for (17), h can be any F_{X|Y}-integrable function and need not be bounded. This completes the proof.

Suppose that the distribution functions F_{X,Y}, F_{X|Y}, and F_Y are absolutely continuous, with densities f_{X,Y}(·, ·), f_{X|Y}(·|y), and f_Y(·). Then (17) implies

∫_{A×B} h(x, y) f_{X,Y}(x, y) dx dy = ∫_B [∫_A h(x, y) f_{X|Y}(x|y) dx] f_Y(y) dy (by Fubini's theorem).    (18)

Since this equation is true for all bounded Borel h, it follows from the Lebesgue theory that f_{X,Y}(x, y) = f_{X|Y}(x|y) f_Y(y) for a.a. (x, y) (Lebesgue). Hence the example given for regular conditional distributions [cf. (6)] is recovered.
We restate (17) and (18) in a different form for the Lebesgue-Stieltjes measures, for reference, as follows:

Proposition 8 Let (Ω, Σ, P) be a probability space and X, Y be random vectors on Ω into R^n, R^m. Suppose P_{X,Y}, P_Y are the Lebesgue-Stieltjes measures on R^n × R^m and R^m, respectively. If Q(·|y) is the regular conditional distribution of X given Y = y, then for any Borel sets A ⊂ R^n, B ⊂ R^m, we have

P_{X,Y}(A × B) = ∫_B Q(A|y) P_Y(dy),    (19)

where π_1 : R^{m+n} → R^n, π_2 : R^{m+n} → R^m are the coordinate projections and P_Y = P_{X,Y} ∘ π_2^{-1}. If, further, P_{X,Y} is absolutely continuous relative to the Lebesgue measure with density f_{X,Y} : R^n × R^m → R̄^+, then P_Y, the marginal of P_{X,Y} [i.e., P_{X,Y}(R^n × ·) = P_Y(·)], also has a density f_Y [so that f_Y(y) = ∫_{R^n} f_{X,Y}(x, y) dx], and Q(·|y) is absolutely continuous relative to the Lebesgue measure. A version of its density is f_{X|Y} : R^n → R̄^+, and it satisfies

Q(A|y) = ∫_A f_{X|Y}(x|y) dx, with f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y) when f_Y(y) > 0.    (20)

Moreover, if h : R^n × R^m → R is a bounded Borel function, then

∫_{R^n×R^m} h(x, y) P_{X,Y}(dx, dy) = ∫_{R^m} [∫_{R^n} h(x, y) Q(dx|y)] P_Y(dy),    (21)

and (cf. Proposition 1.2.2)

E(h(X, Y)) = ∫_{R^m} ∫_{R^n} h(x, y) P_{X|Y}(dx|y) P_Y(dy).    (22)

All these statements have already been proved above. Because of (21), the conditional Hölder and Minkowski inequalities (cf. Theorem 1.9) can also be obtained using the corresponding classical procedures with Lebesgue-Stieltjes integrals, once the regular conditional probability theory is available, i.e., if Theorem 5 is given.
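Formulas (20)-(22) can be exercised on a concrete pair. The following Monte Carlo sketch (the bivariate normal model and the test function h are illustrative assumptions, not from the text) computes E(h(X, Y)) directly and again through the iterated integral (22), using the known conditional density f_{X|Y}(·|y), which is normal with mean ρy and variance 1 − ρ².

```python
import numpy as np

rng = np.random.default_rng(4)
rho, N = 0.6, 1_000_000
Y = rng.normal(size=N)
X = rho * Y + np.sqrt(1 - rho**2) * rng.normal(size=N)

h = lambda x, y: np.cos(x) * y**2

# Left side of (22): E(h(X, Y)) directly.
lhs = h(X, Y).mean()

# Right side of (22): integrate h against the conditional density in x,
# then average over the marginal of Y (reusing part of the sample of Y).
xs = np.linspace(-8, 8, 2001)
dx = xs[1] - xs[0]
def inner(y):   # ∫ h(x, y) f_{X|Y}(x|y) dx
    fxy = np.exp(-(xs - rho*y)**2 / (2*(1 - rho**2))) / np.sqrt(2*np.pi*(1 - rho**2))
    return (h(xs, y) * fxy).sum() * dx
rhs = np.mean([inner(y) for y in Y[:20_000]])

print(lhs, rhs)   # close, up to Monte Carlo error
```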
If X, Y are random variables, as above, having an absolutely continuous (joint) distribution function with density f_{X,Y}, then for any real a, b and δ_1 > 0, δ_2 > 0, we have

P[a ≤ X < a + δ_1, b ≤ Y < b + δ_2] = ∫_b^{b+δ_2} ∫_a^{a+δ_1} f_{X,Y}(x, y) dx dy → 0    (23)

as δ_1 → 0 or δ_2 → 0. Thus, e.g., P[a ≤ X < a + δ_1, Y = b] = 0. However, by (14) and (20) one has

P[a ≤ X < a + δ_1 | Y = b] = ∫_a^{a+δ_1} f_{X|Y}(x|b) dx.    (24)

On the other hand, using the naive approach [cf. (1) of Section 1],

P[a ≤ X < a + δ_1 | b ≤ Y < b + δ_2] = P[a ≤ X < a + δ_1, b ≤ Y < b + δ_2] / P[b ≤ Y < b + δ_2].    (25)

Letting δ_2 → 0 and using the classical Lebesgue differentiation theorem, one gets [the left side symbolizing the limit, to be denoted as the left side of (24)] the following:

P[a ≤ X < a + δ_1 | Y = b] = lim_{δ_2 ↓ 0} P[a ≤ X < a + δ_1 | b ≤ Y < b + δ_2]
= ∫_a^{a+δ_1} f_{X,Y}(x, b) dx / f_Y(b) = ∫_a^{a+δ_1} f_{X|Y}(x|b) dx,

which is (24). It is worth noting an important difference between (24) and (25). We now explain the lack of unique constructive procedures here.
Initially, we defined P(A|B) as P(A ∩ B)/P(B), which is unambiguous only if P(B) > 0, while for the abstract equation (1), P^B(A) is defined for a family B of events with B ∈ B. We then specialized the latter to obtain (24) directly. However, we have not presented a recipe for calculating P^B(A), or the regular conditional probability Q(A, ω) [a version of P^B(A)], when it exists. In fact, this is a nontrivial problem. As (25) demonstrates, in any given situation Q(A, ω) or P_{X|Y}(A|y) should be calculated using additional information that may be available. Also, (19) clearly shows that this is an abstract differentiation problem of P_{X,Y} relative to P_Y. The problem is relatively simple if P[Y = b] > 0. But it is important and nontrivial in the general case, where Y is also a vector and P[Y = b] = 0. The theory of differentiation of integrals enters crucially here. For an overview of the latter work, the reader may consult a lucid survey by Bruckner (1971), and for further abstract results, Hayes and Pauc (1970). The point is that, especially in the vector case of the conditioning variable Y, the limit in (25) may either (i) not exist, or (ii) exist
but depend on the approximation sequence B_n ↓ B = [Y = b]. If the approximation sequence, called the differentiation basis, is not the correct one, and if the limit exists with some such "basis," the result can be another "conditional density." When an appropriate differentiation basis is used and the derivative exists, then it will be a version of the Radon-Nikodym integrand, by the general theory of differentiation. These points merit the following elucidation.
We first indicate the correct meaning of the differentiation basis, state a positive result, and then present a set of examples showing how different limits appear for different approximation sequences. Thus if (Ω, Σ, μ) is a general σ-finite measure space, then a family F ⊂ Σ of sets of positive finite μ-measure is called a differentiation basis if the following two conditions are satisfied: (i) for each ω ∈ Ω, there is at least one generalized sequence (or net) {F_α, α ∈ I} ⊂ F such that F_α → ω (read "F_α contracts to ω") in the Moore-Smith sense (i.e., there is α_0 such that α > α_0 ⇒ ω ∈ F_α, where > is the ordering of the index set I), and (ii) every cofinal subsequence of a contracting sequence {F_α, α ∈ I} also contracts to ω. (Recall that in a partially ordered set I, a subset J is called cofinal if for each i ∈ I there is a j ∈ J with j ≥ i.) Here ω ∈ F_α is not necessary. If there is a topology in Ω, F_α → ω can be interpreted in other appropriate ways. The general existence result here is as follows:

Proposition 9 (Existence of a Basis) If (Ω, Σ, μ) is a Carathéodory-generated measure space and μ is σ-finite, then there always exists a differentiation basis F ⊂ Σ with the following property (called a Vitali property): for each A ⊂ Ω and ε > 0 there exists a sequence {F_n, n ≥ 1} ⊂ F such that (i) the F_n are disjoint, (ii) μ*(A − ∪_n F_n) = 0, and (iii) μ(∪_n F_n − Ã) < ε, where Ã is a measurable cover of A and μ* is the outer measure generated by μ.

This result will not be proved. [It is quite involved, and indeed is related to the existence of a "lifting map" on (Ω, Σ, μ). A complete proof can be found in books on differentiation, and a recent version (and discussion) of it is given in the first author's monograph (Rao (1993), (2005), Section 3.4).] We do not need the result here except to draw the reader's attention to the existence of a nontrivial problem with calculations of regular conditional (probabilities or) distributions.

A Set of Examples 10 Let (Ω, Σ, P) be a probability space and X : R^+ × Ω → R be a mapping such that X(t, ·) is a random variable for each t ∈ R^+ and X(·, ω) is differentiable for almost all ω ∈ Ω. In particular, let Y = X'(0, ·), the derivative of X at t = 0. It is an r.v. We stipulate that for each finite set t_1 < t_2 < · · · < t_n and a_i ∈ R, the r.v. Z_n = ∑_{i=1}^n a_i X(t_i, ·) has the distribution

P[Z_n < x] = (2πσ²)^{-1/2} ∫_{-∞}^x exp(−u²/2σ²) du.    (26)

The σ² is determined (for simplicity) by the condition that if a_i = 1, a_j = 0, j ≠ i, in a = (a_1, . . . , a_n), then [the above implies E(X(t)) = 0] E(X²(t)) = 1, t ≥ 0. Also, we assume that the function {X(t, ·), t ≥ 0} is "ergodic." For our purposes it suffices to say that this implies that Y and X(0, ·) are mutually independent. (A definition of an ergodic process is given after Theorem 7.3.3.) It follows from the Kolmogorov existence theorem, to be proved below in Section 4, that such families of random variables exist on suitable probability spaces (and {X(t, ·), t ≥ 0} will be called a Gaussian process). For now we can and will assume that there is such a space (Ω, Σ, P) and a mapping X with the above properties. A problem of considerable practical interest is to find the "conditional density of Y when the process (or family X(t, ·), t ≥ 0) has started at a, i.e., X(0, ·) = a." From (26) it follows that X(t, ·) has a continuous distribution for all t, so that

P[X(t, ·) = a] = 0, t ≥ 0.

Hence we are in the predicament discussed following (25). Since Y is obtained by a linear operation on the X(t, ·), it is easily verified that E(Y) = 0. Let E(Y²) = σ² > 0 (σ² < ∞ always), and

p(y) = (2πσ²)^{-1/2} exp(−y²/2σ²).    (27)

We shall now calculate the conditional density of Y given X(0, ·) = a with different approximations of A = [X(0, ·) = a], and show that the naive approach gives completely different answers depending on the approximations. This example is adapted from Kac and Slepian (1959).
(i) Approximation 1. Let A_δ = [a ≤ X(0, ·) < a + δ] for δ > 0. Then P(A_δ) > 0, and as δ → 0, A_δ → A in the Moore-Smith sense. Hence, if p(·) is the density of Y,

P[Y < y | A_δ] = P[Y < y, A_δ]/P(A_δ) = P[Y < y] (by the independence of Y and X(0, ·)),    (28)

so that with (27) the conditional density is obtained as δ → 0 [since the right side of (28) does not depend on δ]:

p(y|a) = p(y) = (2πσ²)^{-1/2} exp(−y²/2σ²).

However, in this approximation, the fact that X(0) is part of {X(t), t ≥ 0} and that X(0) = lim_{t→0} X(t) is not used; i.e., part of the information is ignored. So we remedy this in the next set of approximations.
(ii) Approximation 2(m). Let δ > 0 and let m be a real number, the slope of a straight line through (0, a) (y_t = a + mt, t ≥ 0). Let A_δ^m = [X(t) : X(t) passes through the line y = a + mt, of length δ, for some t ≥ 0]. Thus A_δ^m = {ω : X(t, ω) = a + mt for some 0 ≤ t ≤ δ/(1 + m²)^{1/2}}, and A_δ^m is an event. Again P(A_δ^m) > 0, and for each m, A_δ^m → A as δ → 0. We now calculate the "conditional density" p_m(·|a) and show that for each m it is a different function. First let y > m. Using the procedure of (28) and differentiating relative to y to obtain the density, we get [on noting that Y and X(0) are independent, that

(X(t) − X(0))/t → Y

a.e. as t → 0, and that the approximation by the sets A_δ^m depends on the values of Y]

p_m(y|a) = lim_{δ→0} [ p(y) ∫_{a−(y−m)δ}^a f(x) dx ] / [ ∫_m^∞ p(v) ∫_{a−(v−m)δ}^a f(x) dx dv + ∫_{−∞}^m p(v) ∫_a^{a+(m−v)δ} f(x) dx dv ],    (29)

where f(·) is the density of X(0), as in (26), and p(·) is the density of Y, given by (27). Here, since y > m, the approximation obtains only when a − (y − m)δ < X(0) < a. To simplify (29), let us find the limits of the numerator (= N_δ) and denominator (= D_δ), each divided by δ, separately:

N_δ/δ = (1/δ) p(y) ∫_{a−(y−m)δ}^a f(x) dx → (y − m) p(y) f(a) [by Lebesgue's differentiation theorem, because f(·) is continuous].

Similarly,

D_δ/δ → f(a) [ ∫_m^∞ (v − m) p(v) dv + ∫_{−∞}^m (m − v) p(v) dv ] = f(a) ∫_R |v − m| p(v) dv.
Using similar calculations for y < m, and combining them with the above in (29), one finds

p_m(y|a) = |y − m| p(y) / ∫_R |v − m| p(v) dv.

Giving m a different value each time, we get uncountably many limits for the conditional density of Y given X(0) = a. Similarly, with other types of approximations, still other p(·|a) can be obtained. Which one should be taken as the correct value? It is seen that not all these {A_δ^m, δ > 0} qualify to be differentiation bases. (If m → 0, then one may choose a net out of this family to be such a basis, and the result will be independent of m.) It is thus necessary that one verify first whether the approximating family is indeed a differentiation basis, and then calculate the conditional distribution. Without this, one may end up with "spurious" densities, and the conditional distribution theory does lead to ambiguities. Thus the result depends on the basis. Unfortunately, there is no known method to verify this fact.

The preceding discussion, examples, and comments do not complete the


sketch of the subject uiiless the followiiig two natural questions are also coii-
sidered in addition to (A) and (B) treated above:

(C) If we obtain a conditional probability distribution, using a (suitable)


differeiitiatioii basis, does it satisfy the fuiictioiial equation (9) or (16)?
(D) Cali we find general coiiditioiis on the probability space (R, C , P),
on which the r.v. X is defined, so that the family of regular conditional dis-
tributions {Qx(., w ) , w E 0 ) of (9) is unique outside of a P-null set (with the
differentiation procedure noted above)?

The affirmative answer to (C) has already been indicated in the preceding
discussion. This is a consequence of the general differeiitiatioii theory that
can be found, for instance, in the references of Bruckner (1971) or Hayes and
Pauc (1970). There is also a positive solution t o (D), but it lies somewhat
136 3 Conditioning and Some Dependence Classes

deeper. This is related t o the theory of "disintegration of measures." Here


topology plays a key role. Thus if R is a locally compact space, C is its
Bore1 a-algebra, and P is a regular probability on C, and if (fi, 2) is another
such (locally compact) Borelian measurable space (for us fi = Rn,2 = R)
and X : R + fi is P-proper [i.e., X is measurable for (2, C), where 27'
is the P-completion of 2, and f ( X ) is P-integrable for each continuous
f : R + R with compact support], then any pair Q k ( . , G ) , G E f i , i = 1 , 2 ,
of regular conditional probabilities (9) (which always exist) can be obtained
through the differentiation process. Moreover, they are equal outside a P-
null set if for each G, Q&(.,G) aiid Q$(., G) have their supports contained
in X-'({GI) [satisfy (9)] aiid p(Q&(A,.)) = p(Q:(A, . ) ) , A E C . Here
p : ~ ~ P(O X 2- ' ), + Loo(2,P O X - ' ) is a "lifting map" uniquely determined
by (and determining) the differentiation basis. In other words, Proposition 9
yields this p. In this theory for certain "nice" bases, p has stronger properties,
called a "strong lifting," meaning that if A E 2 is open, theii p(xA) XA >
must hold, aiid theii each Q&(., G) automatically has its support in X p l ({GI).
This is a deeper result, and is proved in the book by Ionescu Tulcea (1969,
Chapter IX, Section 5, Theorem 5), and the relevant result is also given in the
first author's book (1979, Chapter 111, Section 6, Theorem 2). We therefore
omit further discussion here, except t o note that this aspect of the theory is
quite delicate. Note that even these "nice" versioiis depend on p, aiid hence on
a differentiation basis. A lifting is seldom unique, depending upon the axiom
of choice.

Remarks 11 Many special attempts have been made in the literature


t o make this work "easy" through axiomatic means or other assumptions.
One of the extensive developinelits is the new axiomatic theory of probabil-
ity, started by Rknyi in the middle 1950s and detailed in his book (1970).
The basic idea is t o take the concept of conditional probability as a mapping
P : C x B +[0,1] such that P ( . , B) is a probability on the 0-algebra C for
each B E B, a collection of measurable sets for which 0 @ B, and it satisfies
two other coiiditioiis in the second variable. Then it is shown t o be of the
form p(A n B ) / p ( B ) , A E C, B E B, for a (a-finite) measure p : C + PS+,
and the theory develops quickly without the difficulties noted above. However,
problems of the type discussed in (26)-(30) do not fit in this theory, and thus
it is not general enough t o substitute for Kolmogorov's model. The second
approach was advanced by Tjur (1974). This is based on classical methods for
P(AIY = y) = lim, P ( A n B,)/P(B,), where the net {B,, a E I} converges
t o [Y = y] in the Moore-Smith sense. By assuming that the probability space
(or the range space of random variables) is nice enough, a general approach t o
calculating this limit using the methods of differential geometry is developed.
However, here also the fuiictioiis P ( y ) depend on the limit procedure used,
and it is not possible t o decide on the correct function from the family of
such functions. While these attempts may have individual merits, they do not
include some simple and natural applications. Consequently, we stay with the
3.2 Conditional Probabilities 137

Kolmogorov model. The abstract definitions by means of the functional equa-


tions [cf. (1) and (5)] appear t o be the most general ones available. Therefore,
the rest of the work in this book is based mainly on the Kolmogorov axioms
of Chapter 1. A possible way t o reconcile these dificulties i s t o consider the
constructive analysis, as reformulated by E. Bishop (1967). A s of n o w there i s
n o constructive method t o calculate Radon-Nikodym derivatives (hence condi-
tional expectations). S o this i s a question to be left for the future. For a more
detailed analysis and discussion, one may refer t o a recent account in the first
author's book (Rao, (2005)).

As a final item of this section, let us introduce an extension of the previous


chapter's independence concept for conditional probability measures without
invoking regularity. This leads t o some important applications.

Definition 1 2 Let (R, C, P) be a probability space and {B, B,, a E I}


be a family of 0-subalgebras of C. The B,, a E I, are said t o be conditionally
independent given B when I has cardinality at least 2, if for each (distinct)
finite set a l , . . . ,a, of I, and any A,, E B,, , i = 1 , . . . , n, we have the system
of equations

Similarly, a family {X,, a! E I} of random variables is (mutually) condition-


ally independent given B, if the a-algebras B, = a(X,) have that property.

It is clear that this reduces t o (unconditional or mutual) independelice if


B = (0,Q}, since then P' becomes P. Consequently it is reasonable t o extend
many of the results of the last chapter, and those based on them, with the
conditional probability function. However, some new and unfamiliar forms of
the "expected " results appear. We briefly illustrate this point here, and note
some alternative (operational) forms of conditional independence. In what fol-
lows, P'(.) is also denoted as P ( . B ) for notational convenience.

Proposition 13 Let B, B1, B2 be a-subalgebras from (R, C , P ) . T h e n the


following are equivalent:
(i) B1, B2 are conditionally independent given B.
(ii) P(BIIo(B U B2)) = P(BIIB) a e . , B1 E B1.
(iii) P(B2a ( B U B1)) = P(B2B ) a.e., B2 E B2.
(iv) For all X : Q + a+, B1-measurable, E ( X a ( B U B2)) = E ( X B ) a.e.

Proof (i) + (ii) If B E B and B2 E B2, then B n B2 E a ( B U B2) and is


a generator of the latter 0-algebra. So it suffices t o prove the equation of (ii)
when integrated on such a generator, since both sides are measurable relative
t o a ( B U B2). Thus
138 3 Conditioning and Some Dependence Classes

(by the averaging property of E")

Hence the extreme integralids agree a.e. [PI on o ( B U B2) which is (ii), because
all such sets generate the latter a-algebra (or use Proposition 1.2.8).
(ii) + (i) Since B c o ( B u B2), and hence E" = E " E ~ ( " ~ " (ii)
~ ) ,yields
for a.a.(w),
PB(Bln B2)= E " ( E ~ ( " ~ ". xB2))
~)(~~~

= E"(XB~ . P(B1IB)) (by hypothesis)

since PU(B1)(= P ( B I B ) ) is B-measurable. Thus (i) holds.


By interchanging the subscripts 1 and 2 in the above proof, (i) @ (iii) fol-
lows. Finally (iv) + (ii) trivially, and then (ii) + (iv) for all step functions by
the linearity of conditional expectation operators. Since each B1-measurable
X > 0 is a limit of an increasing sequence of B1-simple functions, and the
monotone convergence criterion holds for these operators, (iv) holds for all
such X . This completes the proof.

Using the ( T , A)-class theorem (cf. Proposition 1.2.8) just as in the uncon-
ditional case, we deduce the following result from the above.

Corollary 14 Let X I , . . . , X, be r a n d o m variables o n ( R ,C , P) and


B cC be a a-algebra. T h e n the set {XI,.. . , X,) is (mutually) conditiona,lly
3.2 Conditional Probabilities

independent iff for all 1 < m < n,

Another consequence of the concept is given by

Corollary 15 Let B1, B2 be independent a-algebras from (R, C , P ) . If B


i s a a-subalgebra of B1 ( o r B2), t h e n B1 and B2 are conditionally independent
given B, so that

Proof For any C E B we have, with B c B1,

(since B1, Bz are independent and

[because P"(B) is P(B) by the independence of B and B].

Hence P"(A n B ) = P"(A)PB(B) a.e. Now the case that B c B2 being simi-
lar, the result holds as stated.

Remark: The following consequence is worthy of special mention: Let


X , Y, Z be random variables on (a, C, P) with Fl = a ( X ) , F2 = a ( Y )
aiid F 3 = o ( Z ) . If X is independent of Y aiid Z , aiid Y is integrable, then
E ( Y X , 2) = E(Y1Z) a.e., i.e., E " ( ~ ~ ~ ) =
(YEO) ( ~ ) ( Ya.e.
) so that X and Y
are conditionally independent given 2. This follows from from the corollary
by taking B = &, B1 = Fl and B2 = F2 [and similarly X and Z are condi-
tionally iiidepeiideiit given Y]. For this alternative (distributional) relation, it
is not necessary t o demand integrability of X , Y and Z . For the relation with
conditional expectation, the integrability of Y or Z is needed.

The Kolmogorov zero-one law takes the following form:

Proposition 16 If { X n , n > 1) i s a sequence of r a n d o m variables o n


(R,C, P) that i s conditionally independent relative t o a 0-algebra B c C , and
if 7 = a(Xn,n > k) i s its tail a-algebra, t h e n B and 7 are equivalent,
140 3 Conditioning and Some Dependence Classes

in the sense that for each A E 7 there i s a B E B such that A = B a.e.[P].

Proof The argument is analogous t o the earlier result. Indeed, by the


preceding corollary we have, for each n , the 0-algebras a ( X 1 , . . . , X,) aiid
> +
o ( X k ,k n 1) are coiiditioiially independent given B. Hence o ( X k , .. . , X n )
>
and 7 have the same property for each n. We deduce that a ( X k ,k 1) and 7
>
are conditionally independent given B. Since 7 c a ( X k ,k I ) , it follows that
7 is conditionally independent of itself given B. Thus A E 7 + P B ( A n A) =
( P " ( A ) ) ~ ,so that P B ( A ) = 0 or 1 a.e., aiid it is a B-measurable indicator.
So for some B E B, P B ( A ) = X B a.e. Siiice then both are Radon-Nikod9m
derivatives of P relative t o P", it follows that A = B a.e. [PI, completing the
proof.

Since in the uiiconditional case B = (0, a), this says that if {X,, n 1)>
are mutually independent, 7 aiid (0, R) are equivalent, which is Kolmogorov's
>
zero-one law. Another interesting observation is that if the X,, n 1, are in-
dependent with the same distribution and if P is the a-algebra of permutable
events relative t o {X,, n > I ) , by definition of permutability, we can con-
clude that {X,, n > 1) will still be conditionally independent given P, since
in Definition 11 only finitely many X, will appear each time. Consequently
by the above proposition P aiid 7 are equivalent. Siiice each event of 7 has
probability zero or one, so must each event of P . Thus each permutable event
determined by the independent X n with the same distribution has probability
zero or one which is the Hewitt-Savage law (cf. Theorem 2.1.12).

3.3 Markov Dependence


Using the concepts developed in the preceding two sections, we can con-
sider various classes of dependent random families. Thus in this and the next
sectioiis we introduce two fundamental classes of such dependences, namely,
Markovian aiid martingale families. The first one, t o be discussed here, was
introduced by A. A. Markov in 1906 for the case that the range of each ran-
dom variable is a finite set, so that no difficulties with conditional probabilities
arise. The general case emerged later with the studies of Kolmogorov, Doe-
blin, P. Lkvy, Doob, Feller, Hunt, and others. It is one of the most active areas
of probability theory. The second area, martingales, of equal iinportaiice and
activity, will be considered in Section 5, after some existence theory in Section
4.
The concept of Markovian dependence is an extension of that of indepen-
dence given in Chapter 2, and so we introduce it here by following Definition
2.1.1, aiid then present equivalent versions for computational coiivenience.
3.3 Markov Dependence 141

Definition 1 Let (R, C, P) be a probability space and I be an ordered


set. If {B,, a! E I} is a net of a-subalgebras of C, consider the "past" and
<
"future" a-algebras G, = a ( B y ,y a) and Go = a(By1,y' >
a ) . Then the net
is said t o be Markovian if for each a E I (i.e., "present"), the 0-algebras G,
and Go are coiiditioiially independent given B,, so that ( " I " is the ordering

In particular, if {X,, n E I}is a set of random variables on fl,then it is called


Markovian if the a-algebras {a(X,), n E I} form a Markovian family in the
above sense. If I c R (Rn) , then {X,, n E I} is called a Markov process
(Markov random field).

Using the result of Proposition 2.12 and Corollary 2.13, it is possible t o


present several equivalent forms of the above definition. We do this below.
Note that (1)can be stated informally as follows: The a-algebras {B,, a E I},
or the random variables {X,, a! E I},form a Markovian family if the past and
future are conditionally independent given the present. Since P"" is not nec-
essarily a genuine probability measure (it is generally a vector measure), as
discussed at length earlier in Section 2, the above statement is only informal.
For a finer analysis (using the classical Lebesgue integration theory) we need
t o assume that the P"- are regular. However, several preliminary considera-
tions can be presented without such restrictions, and we proceed t o discuss
them now. Again set P"(.) = P(.B ) whenever it is convenient.

Proposition 2 Let { X t , t E T} be a family of random variables on


( R , C , P ) , with T C R. Let B1 = o ( X t ) c C , t E T . Then the following
statements are equivalent:
(i) The family is Markovian.
>
(ii) For e a c h t ~< t2 < . . . < tn+17t3E T , n 1, ififn = { t i , . . . , tn},GT7,=
a ( X t l , . . . , X t , , ) , then P ( A G t 7 , )= P(AIBt?,) a.e., and A E Bt7,+,.
(iii) If t l < t 2 , Gtl = a ( X t , t I t l , t E T ) , t, E T, i = 1 , 2 , then P(AIGtl) =
P ( A B t , ) a.e., and A E Bt,.
>
(iv) If Gt = a ( X , , s t , s E T ) , t E T , and Gt is as in (iii) , then
P(AIGtl) = P(AIBtl) a.e., and A E Gtl.
(v) If Gt, Gt are as above and Z is any bounded Gt-measurable function,
then E G ~ 1 ( Z=) E ~ L (2)
I a.e.

Note Observe that since the definition of a Markov family is symmetri-


cal in that, intuitively, the "past" and "future" are conditionally independent
given the "present," the above proposition gives the following alternative in-
terpretation, namely; the family is Markovian implies that, for predicting the
future behavior of the process, given the whole past and present the behavior
depends only on the present, and conversely. This is the natural interpreta-
tion of (iii) @ (i) @ (iv). These are obviously only one-sided statements. But
142 3 Conditioning and Some Dependence Classes

Definition 1, which is equivalent t o these, on the other hand, shows that if


{Xt, t E T ) is Markovian, then {XPt, t E T ) is also Markovian if T = R or
T c R is a symmetric set relative t o the origin. Thus a Markovian family
remains Markovian if the time direction (i.e., of T, the index set) is reversed.
It is useful t o have these alternative forms of the concept. Note also that if
TI c R and a : TI + T is an isotone mapping (so that it preserves order),
then {X,(,), i E TI) is also Markovian if the family {Xt, t E T ) is. In particu-
lar, each subfamily of a Markovian family is Markovian. Also, if gt : R + R is
a one-to-one and onto mapping such that g,l(R) = R the Bore1 a-algebra of
R,then the process {x
= gt(Xt),t E R } is Markovian whenever {Xt, t E R } is
such, since a(x)
= Y , ~(' R ) = X;' ( g r 1( R ) ) = o ( X t ) . However, it is possible
t o introduce other definitions of reversibility. For instance, if the range space
of {Xt, t E T ) is a set of integers (or only positive integers), then one can ask
<
whether the probability of taking values from i t o j (i j) by the process is
the same as that from j t o i. This need not be true in general, and when true
it is called a "symmetry" or "path reversibility." This special case will be de-
signated as such.

Proof (i) + (ii) Taking I = T c R in Definition 1, for each t, E T ,


Gt,, aiid are conditionally independent, given Bt7,. Since Bt,,+, C Gtrb, it
G t r b

follows that Gt7, and Bt,,+, are also coiiditioiially independent, given Bt,, . Then
by Proposition 2.12iii

Consider 7 , = {tl, . . . t,}. Clearly G,, c Gt7, aiid Bt7, c G,, . Heiice applying
the operator EB7l~ t o both sides of (2) aiid noting that ~4 EB7~~,
" 7 i.e.,
1 ~

EG7l.EGtl~= EB7l~, we get [since the right side of (2) is BtrL-measurable]

= EG711
(P(AIBt,,))= P(AIBt,,) a.e., A E &,+, [by (2)l.
Heiice (ii) follows.
(ii) + (iii) First note that if F is the collectioii of all finite subsets of
TI (i.e., a E 3 iff a = { u l , . . . , u,} c TI for soine 1 <n < oo),where
TI = {t E T , t 5 t l ) , then Gtl = C T ( U , ~ ~ G , ) ,with G, = a ( X t , t E a ) .
Indeed, since each G, c Gtl, a E F, Gtl contains the right-side a-algebra. But
Gtl = ~ ( u ~ ~aiid ~ BBt~c )G, , for soine a E F. Hence Gtl is contained
~ each
in the right side. Now (ii) implies, by Proposition 2.12iii again, that G, aiid
Bt, are coiiditioiially independent given Btl for each a E F. The argument
proceeds as in the case of independent events (see the proof of Theorem 2.1.3).
Thus t o use Proposition 2.12ii, we introduce two families of sets D and C as
follows. Let D be the class of all finite intersections of events each belonging
t o a G,, a E F. Then it results in G, c D , a E F. But this clearly implies
Gtl c o ( D ) . It suffices t o show, for (iii), that Bt, and a ( D ) are conditionally
independent given Btl. For this consider the class
3.3 Rtarkov Dependence 143

C = { B E C : P ( B n A1Bt3,,)= P ( B B t l ) . P(AIBtl) a.e., all A E Bta}.


Evidently D c C (since D and Bt, are conditionally independent). It is easy
to verify that [ D is a T-class and that] C is a A-class, as in the independent
case. Hence C c o ( D ) . Since C and Bt, are coiiditioiially independent giveii
Btl, the same is true of o ( D ) , aiid hence of Gt, and Bt, giveii Btl. This shows
that (iii) is true. The remaining implications can now be deduced quickly.

(iii) + (iv) Writing t for t l , in (iii), we get that Gt and Bt, are conditionally
independent giveii Bt for any t2 E T , t2 > t. From this, by Proposition 2.12ii,
we deduce that Gt aiid a ( X t , ,i = 1, . . . , n, t , > t, t , E T ) are coiiditioiially
independent giveii Bt. Then, by the argument of the preceding paragraph, Gt
and Gt are conditionally independent given Bt, which is (iv).

(iv) +- (i) By Proposition 2.12iii, Gt and Gt are conditioiially independent


given Bt. Thus (i) holds by Definition 1.

(iv) + (v) This is true if Z = X A , A E Gt. By linearity, the result holds


if Z = C r = l aiXA,, Ai E Gt. By the conditional monotone convergence cri-
terion, it is true for any @-measurable Z >
0. This implies (v) for general
bounded Z , since then Z = Z + Z p , Z* being @measurable. That (v) +-
-

(iv) is trivial. This completes the proof of the proposition.

If {Xt, t E T} is a family of independent random variables on (R, C , P)


and T c R,then it is evident that this forms a Rtarkovian class. An equally
simple example is that any monotone class {Bt, t E T ) , T c R,of a-algebras
from ( R , C, P) is Markovian. In fact, if Bt c Bt1, for t < t' and Gt, Gt are
the a-algebras generated by {B,, s E T, s <
t ) and {B,, n E T, n t), then >
Gt c Bt c Gt, so that for any A E Gt, B E Gt, we have

The decreasing case is similar. A less simple result is the following:

Example 3 (a) Let {X,, n > 1) be a sequence of independent random


variables on ( R , C, P) aiid Y, = C;=, X k. Theii {Y,, n >
1) is Markovian.
To verify this, let B, = o(Y,) aiid A, = o(X,). Theii A1 = B1 and
B, c a ( B n p l U A,) c a(A1 u A 2 . . . U A,). If

then Gn = a(& U Uk2,+1 Ak). Also, Ak is independent of B, for k n 1. > +


Rtoreover, if D is the class of events of the form C1 n C2, where C1 E B, and
144 3 Conditioning and Some Dependence Classes

Cz E o(Uk>n+lA k ) , theii D is a T-class generating Gn. Hence it suffices t o


verify the truth of (1) for all A E Gn and B = Cl n C2 E D by the above
proposition [see the proof of (ii) H (iii)]. Thus for these A, B we have

P"~I(A n B) = E"I (xA. xcl . xC2)


E"' (XA . XC1 . E(xcL))

(since C2 is independent of B, and also of G,,


Corollary 2.15 applies)

and Corollary 2.15 applies again]

This proves the Markoviaii property of {Y,, n > 1).


(b) The following consequence of the above illustration is useful in some
applicatioiis (cf., e.g., in Problem 33 of Chapter 5 later). Thus if X I , . . . , X,
are iiidepeiideiit random variables, and Yn = C;=, X k , as in the illustration,
so that {Y,, Fn = o ( X 1 , . . . , X,), n >1) is a Markov process, we assert
that, with n = 2 for simplicity and letting p, = P o x,-'(.), i = 1 , 2 ,p =
+
P o (XI Xz)-l(.), the following obtains:

for each Borel set B C R.


Indeed, if A c R is any Borel set, theii

[by Theorem 1.4.1 (ii),]


3.3 Markov Dependence

[by independence of X I , X 2 ,]

B X I ] dPa where
From ( I ) of Section 2.1, the left side =
x,l(4 PIX1 + X a E
B = a(X1). It follows that (*) holds since x P 1 ( A ) is a generator of B. This
can be extended for n >
2 to obtain the following, which we leave to the
reader (cf. also Chung (1974), P. 308):

Recall that if B = a ( X ) and Y is integrable, then E'(Y) is also written


E ( Y X ) . If Y = X A , then the latter is often denoted P(AIX). With this
notation, the Markoviaii property given by Proposition 2 can be stated as
follows. The class {Xt, t E T), T c R, is a Markov process iff any one of the
following equivalent conditions holds: For any s < t in T and Borel set A c R,

(9 P ( [ X t E A] X,, r 5 s) = P ( [ X t E A]IX,) a.e. (3)

(ii) For any t l < t2 < . . . < t,+l in T and Bore1 set A c R,
P([XtrL+,~ A I 1 X t l , . . . , X, ~
t ) = P ( [ X t , ~ + , l ~ A l 1 X t , ~ a.e.;
) , n > ~ (4)
,
(iii) For any sl < sz < . . . < s, < t < t l < . . . < tm in T
and Borel sets A, c R, Bj c R,

This interesting form leads us to derive a fundamental property of Markov


processes. We recall from Theorem 2.5 that for any B, a a-subalgebra of C of
146 3 Conditioning and Some Dependence Classes

( R , C , P), and a random variable X : R + R,a regular conditional distribu-


tion for X given B always exists on R. It is a version of the image measure of
P", i.e., of P" o xpl(.). Thus we have

P r o p o s i t i o n 4 If {Xt, t E T c R} i s a Markov process o n a probability


space ( R , C , P) and r < s < t from T , then the following relation, called the
C h a p m a n - Kolmogorov e q u a t i o n , holds:

If a version of the (image o r ) regular conditional distribution of

i s denoted Q,,t(A,X,(w)), then (6) can be expressed as

for almost all w E fl, the exceptional null set depending o n r , s , t and A.

[Often Q,,t (A, X, (w)) is written as p(X, (w), s ; A, t ) in (7) and interpreted
as the probability of the motion of a particle w starting at time s from the
state X,(w) and moving into a position or state in the set A at time t > s.]
< <
Proof. Consider the process {Xu, r u s < t , Xt : r , u, s, t in T}. Then
by (3)
< <
P ( [ X t < XIX,, r u s) = P ( [ X t < XIIX,) a.e. (8)
Hence applying the operator E(.IXT) to both sides of (81,and noting that
~ ( x Cr )r ( X t Lr, 5 u 5 s ) , one gets, by Proposition 1.2,

which is (6). Since P ( [ X t < XIIX,) is a(X,)-measurable aiid Theorem 1.4.1


holds for P" because of Theorem 1.3 (i), we have

for a a(X,)-adapted bounded Y, so that (7) follows from this aiid (6), coin-
pleting the proof.

For convenience of terminology, one calls the range space of a Markov


process (or any family of random variables) the state space of the process.
If the latter is at most countable, then a Markov process is called a Markov
chain. If the range is a finite set, we say that the Markov process is a finite
Markov chain.
3.3 Markov Dependence 147

The preceding proposition implies that for every Markov process its family
of conditional probability functions { p ( . , t ; ., s),s < t in T) must satisfy the
Chapman-Kolmogorov equation (6) or (7). It would be surprising if a Markov
process can be characterized by this property, in the sense that the only con-
ditional probability functions satisfying (6) or (7) are those given by a Markov
process. Unfortunately this is not true, as was first pointed out by P. Lkvy
already in 1949. The following simple example, due to W. Feller, illustrates
this point.

Counterexample 5 We present a noii-Markovian process whose condi-


tional probabilities satisfy (7). Let the state space be {1,2,3}. Consider the
probability space R = {wi, 1 i < < 9). Here the points w, of ~ " a v e integer
coordinates and are specified by wi = (1,2,3), wa = (1,3,2), w3 = (3,1,2),
w4 = (3,2,1), us = (2,1,3), us = (2,3,1), w7 = (1,1,1), ws = (2,2,2),
wg = (3,3,3). Let C be the power set of fl and P({wi}) = 1 9' 1 < 2- <
9. On
'

this probability space, consider a "process" X I , X a , X3 where Xi(w) = ith


coordinate of w. Thus, for instance, X1(w4) = 3, X3(wG) = 1, etc. Then
Xi : 0 + {1,2,3), and

For any (i, j) and (m, n ) we have

$ ifif j
= i 1 ifi=j,m=n
0 otherwise.

Also, {XI, X 2 ,XS} are pairwise independent. They are not Markovian. To see
the latter.

so that the "future" depends not only on the "present" but also on the "past."
< <
But for 1 i < j 3, the Qi,,(., .) satisfy (7), since
148 3 Conditioning and Some Dependence Classes

and similarly other coinbiiiatioiis are verified. Note that (7) holds identically
(not a.e.).
The preceding example can be extended t o an infinite family of ran-
dom variables with the same (non-Markovian) properties. Let us assume
that the basic probability space ( 0 ,C , P) is rich enough to support the fol-
lowing structure. [Actually, it is possible to enlarge the space to suit the
needs. For this, if ( fi, C, P) is as in the example, then we let (R, C , P) =
@i21(fii,Ci,P,), fi, = fi, etc., as a product space, and the correctness of
this procedure is a simple consequence of Theorem 3 of the next section.] Let
{X,, n >
1) be random variables such that X I , X 2 ,X3 are the ones defined
above, and let each block of the following three r.v.s have the same distribu-
tions as these. Thus for any rn 1,>

where 1 <
i l l i2,i3 <
3 are integers. It then follows immediately that the
family {X,, n >
1) is non-Markoviaii, but pi,,,, = P[Xk+1 = i21Xk = ill =
>
113, k 1, and (7) holds. Here one defines X, at w E R , w = (51,5 2 , . . .), G, E
Ri, by the equation

This completes the description of the example.


Note It is of interest t o observe that the commutativity properties of
conditional expectation operators (cf. Proposition 1.2), with Q ~ ( A =
) PIX1 E
A],A c R Borel, Q,(A, X,) = Q,+l,,(A, X,), and the Markov property of
>
{X,, n 1) together imply, in the context of Proposition 2 above

for any bounded Borel f : R + R and any random variable X on ( R , & + I , P),
where = a ( X 1 , . . . , Xn+1), of the family. Indeed, we have

by regularity of pB1lo X-l. But E(f (X)IB,) = E(E(f( X ) B n + l B,), since


Bn c . Hence with the Markovian property of {XI, X2, . . . , Xn+l), we
get p B 1 b =P"(~M) 0x2~
= Q,(., .), and the last integral of (9) is just
(10). Similarly (with the Markovian property again),
3.3 Markov Dependence 149

Since E ( f ( X ) ) = E E U 1 E U 2.. . (f ( X ) ) , by iterating (11) n times we get


EI'~

(9). In particular, if f ( X ) = f l ( X 1 ) . . . f,+l(X,+l), then (9) reduces to

for any real bounded Borel functions f l , . . . , f,+l and the image measure ~1
of P given by Q 1 ( ~= ) P(XT'(A)) = PIX1 E A]. This is called the initial
>
distribution of the process {X,, n I), being the image measure of the first,
or initial, random variable X I . Taking f, = X A , A, c R, Borel, (12) yields an
important expression for the joint distribution of ( X I , . . . , X,+l) :

/
&+I
Qn(dxn+l>xn). (13)

Setting A1 = R,Ai = R , i >3, this relation in conjunction with (7) gives


the distribution of X2 and all the marginal distributions of any subset of
{Xk, k > 1). For instance, ~3 (.) is given by [since Q, (R, znpl)= 11

for all Borel sets A c R. Similarly others are obtained. Here Q1,2(.,x1)is
the same as in (7) with r = 1,t = 2 there. E v e n though the conditional
distributions {Q,,(., .), r < s ) do n o t uniquely correspond t o a Markov process,
as shown in the counterexample above, the (absolute) finite dimensional o r
joint distributions (as n varies) given by (13) are uniquely defined and, a s
demonstrated in the n e x t section, t h e y determine a Markov process. In this
sense both (7) and (13) play crucial roles in the Markov process work.
In the preceding discussion Q,,t(A, x) is a version of P ( [ X t E A ] X , = x ) ,
and is a regular conditional distribution of Xt given X, = x. Since generally
150 3 Conditioning and Some Dependence Classes

there exist several versions of the conditional probability, how should one
choose a family z),r < t , x E R} in order that all these measures
simultaneously satisfy (13)? There is no problem if the family is finite or even
couiitable, since we can find a fixed iiull set and arrange things so that this
is possible. Also, if the state space is at most countable (i.e., the Atarkov
chain case), we can take the exceptional set as empty and answer the problem
affirmatively. In the general case, no such method is available. To include
all these cases, one assumes the desired property and develops the general
theory. This is called the (Markov) transition probability (nonconstructive or
idealistic procedure) family defined as follows [we write Q(., t ; z, r) in place of
QT,t(., z) for conveiiieiice].
A mapping Q : B x R+ x R x R+ + [O, 11, where B is the Borel a-
algebra of the state space R, is a (Rtarkov) transition probability if for
each O < r < t , z E R ,
(i) Q(., t; z, r ) : B + [0,11 is a probability,
< <
(ii) Q(A, t; .,r ) : R + [0, I ] is B -measurable for each A E B, O r t ,
<
(iii) for each 0 r < s < t , one has

identically in all the variables shown. [For this definition R+ can be replaced
by a subinterval of R. But for the following work we use R'.]
If p is an initial probability on B then substituting p and Q in (13) oiie
can generate an n-dimensional probability measure on Rn. We also take

as a boundary condition. If a Markov process { X t , t >


0) on ( 0 ,C, P) is
given, then &(A, t ; z, s ) is a version of~ " (x,' (A)) (w), aiid so they are
( ~ 6 )

equal a.e. Thus (14) is just the Chapman-Kolmogorov equation, which is now
assumed to be true identically (without any exceptional sets). It follows from
the work of the next section that there exists a Rtarkov process on a probability
space (R, C, P) such that p(A) = PIX. E A], and P ( [ X t E A] X , ) (w) =
&(A, t ; X,(w), s ) for a.a.(w). [The exceptional iiull sets depend on A, s , t , in
general, aiid this is why we assume the conditions (i)-(iii) above identically.]
A coilsequelice of the above (strengthened) conditions is that oiie can
transform the Markov process theory into one of functional operations by
means of (14):

Proposition 6 Let B ( R , B) [ = B ( R ) , say] be the space of real bounded


Borel functions on R with the uniform norm: If I = sup{ f ( z ) : z E R}. If
< <
{Q(., .; ., .)} is a transition probability family, as above, and for each 0 s
t , Us,t is defined as
3.3 Markov Dependence 151

t h e n Us,t : B(R) i B ( R ) i s a positive contractive linear mapping, i.e., Us,tl =


1, IU,,t f I 1 1 f 1, and U,,tf > 0 for f > 0. Moreover, {U,,t : 0 I s I t}
forms a generalized semigroup (or satisfies a n e v o l u t i o n e q u a t i o n ) , in that

UT,sUs,t= UT,t, 0 < r < s < t , U,,, = id. (16)

Conversely, every such family of (evolution) operators o n B ( R ) uniquely de-


termines a (Markov) transition probability family o n B x I%+x R x I%+.

Proof Since f is bounded, the integral in (15) is well defined. If f is a


simple fuiictioii in B ( R ) , it is clear that Us,tf is a bounded Bore1 function. By
the dominated convergence theorem, the result follows for all f E B(R). Thus
Us,t : B ( R ) + B ( R ) is linear, maps positive elements into positive elements,
and IUs,tf 1 I I f 1 , since Q(R, t ; x, s ) = 1. Also, by (14)

(by a form of Fubini's theorem)

Hence U,,t = U,,sUs,t proving (16), since U,,, = id is obvious.


In the opposite direction, let U be as given, and for each A E B, x E
<
R, s t , define Q(A, t; z, s) = ( U s , t ~ A ) ( zThen
). Q(.,t; z, s) is additive by the
linearity of Us,t,Q(A, t ; ., s) is B-measurable, and 1 = Us,t1 = Q(R, t; z, s). To
see it is 0-additive on B, let f, be any sequence of continuous fuiictioiis with
compact supports such that f, J 0 pointwise. Then AE, = {x : f,(x) >
E } is
compact and n, A; = 8.Hence there is an no such that A&, = 8 (finite
intersection propgrty).
Thus I f,l < E , n > no. If we coiisider the linear fuiictioiial (U,,t(.)) (x),
then the norin coiiditioii implies that I Us,tf, 1 <If, I + 0 as n + 0. Since it
is positive also, it defines an integral on Coo[CB(R)], the space of continuous
functions with compact supports. If &(., t ; x, s) represents (Riesz representa-
tion theorem) this (Us,t(.))(x),then Q = Q on all compact sets in B. But the
latter generate B. Thus the standard results in measure theory imply Q = Q
on all of B, and hence Q is a transition family, since then (16) becomes (14).
This completes the proof.

In a moment we present an important application of the above results im-


plying that all these assumptions are automatically satisfied for that class. We
must first observe that the relation (16) is a natural one in Markov process
152 3 Conditioning and Some Dependence Classes

work. Our concept of Markovian family as given by Definition 1 (cf. Propo-


sition 2 also) is based on the distributions (or probability measures) of the
random variables. A weaker concept will be t o ask for conditions on a few mo-
ments. This inquiry leads t o an iiitroductioii of a wide-sense Markov family, of
some interest in applications. Since the conditional expectation is a contrac-
tive projection on L1 (C) (cf. Theorem 1.11), the corresponding wide-sense
operation will use the Hilbert space geometry, and is given as follows.

Definition 7 (a) Let {Xt, t E T } be a family of square iiitegrable(comp1ex)


random variables on a probability space (R, C, P) and T c R.We define a
correlation characteristic p(s, t ) as

) if E ( X S 2 )> 0, s < t in T
~ ( st ),=
otherwise.

[p(s,t ) will be the correlation if E(X, 12) is replaced by [E(x, 1 2 ) ~ ( 1X2 ) ] t'I2
and all the Xt have zero means; Xt is the complex conjugate of Xt.]
(b) If t l < . . . < t, are points of T, and 5352; = sp{Xt,, . . . , X t n } is the
linear span, let ~(.15352;)be the orthogonal projection of L 2 ( C ) onto 5352k. Then
the given family is called a Markov process i n the wide sense if for each such
>
collection of points t l < . . . < t,, n 1, of T ,

This may be equivalently expressed as

~ ( x tX,t ,l , . . . , Xt,,-l) = ~ ( x tXt,,-l)


,, a.e. (17)
Note that if 5352; is replaced by L 2 ( o ( x t 1 , .. . , X t , , ) ) c L 2 ( P ) , then (17)
becomes E(Xt,,X t l , . . . , X t r L 1 =
) E(Xt,,X t , , a.e., and the concept reduces
t o the ordinary Markovian definition in L 2 ( P ) , by Proposition 2; the latter
should be called a strict-sense Markowian concept for a distinction. But this
qualification is usually omitted. [Then E(. 5352:) becomes E(.IXtl).]
The following characterization of (17), due t o Doob, reminds us of the
"evolution" equation, formula (16) again.

Proposition 8 Let {Xt, t E T ) c L 2 ( R ,C , P) be a family of random


variables, with T c R. T h e n it i s a Markov process i n the wide sense iff
its correlation characteristic satisfies the following functional equation, for
r < s < t inT:
P(T, t ) = P(T, S)P(S, t ) . (18)
Proof For s < t , consider X, and Y, = X t - p(s,t)X,. Then for any
complex scalar a we have
3.3 Markov Dependence 153

so that a x , and Y, are orthogonal vectors in the Hilbert space 'H =


L2(Q,C , P). Consider 5352, as the linear span of X,, and 9.R: as its orthogonal
complement, so that 'H = 5352, @ 9.R: and each vector in 'H can be uniquely
expressed as a sum of two mutually orthogonal vectors, one from 5352, and
another from 5352;. In particular, if the (orthogonal) projection in 'H onto 5352,
is denoted Q,[= E(.Ix,)], then setting a = p ( s , t ) in the conclusion of (19),
we get (adding Y, and a x , there)

which implies Q,Xt = p(s, t)X, [ and (I Q,)Xt = X t p(s, t)X,]. But
- -

by the wide-sense Markovian hypothesis, if Q , , is the projection on 5352: =


sp{X,, X, }, the linear span, then

However, (I- Qr,,)Xt is orthogonal to 9.R:, so that it is orthogonal to both


X, and X,. By (20), (I- Q,,,)Xt = Xt - p(s,t)X, = Y, is orthogonal to X,
(and X,). Thus

Cancelling 0 < E(IXT12) , this gives (18).


Conversely, suppose (18) is true for any r < s < t . Then (21) is true; it says
that X, and Y, are orthogonal to each other for any r < s < t (r,s, t E T). Also
by (19), X, and Y, are always orthogonal. Hence from the arbitrary nature
of r, Y, is orthogonal to X,, X,, , . . . , X,, for rl < 7-2 < . . . < r k < s < t in T.
This means Y, E 5352; n (5352;1)' so that

Thus the process is wide-sense Markov, completing the proof.

An Application We now present an important and natural applica-


tion of the preceding considerations showing that the various assumptions
made above are fulfilled for a class of problems. Let {Xk, 1 5 k 5 n}
be a sequence of i.i.d. random variables (also called a random sample),
with a continuous strictly illcreasing distribution function F. Consider the
order statistics {X?, 1 < i< n} based on the above family, meaning
X I = min(X1,. . . , X,), . . . , XGpl = the second largest of ( X I , . . . , X,), and
X c = max(X1,. . . , X,). Since X I < Xz < . . . < X c with probability one,
they are not independent. Also, the Xi need have no moments.
Regarding the "process" {X:, 1 i < < n}, it has the remarkable Markov
property already observed by Kolmogorov in 1933.
154 3 Conditioning and Some Dependence Classes

Theorem 9 Let X I , . . . , X, be i.i.d. random variables on (0,C, P) with


a continuous strictly increasing distribution function F. If {X:, 1 i < <
n}
is the process of order statistics formed from the Xi, let Y , = F(X,*) and
Zi = log F(X;+l-i), 1
- < < i n. Then the three sequences {X,*, 1 i < <
n), {Y,, 1 < < i n), and {Z,, 1 < < i n) form strict-(=ordinary-) sense
Markov families with (1,2, . . . , n ) as the parameter (=index) set. Moreover,
the 2,-process is one of independent increments, and the Y,-process is both a
strict and a wide-sense Markov family. If 1 <
ii < ia < . . . < ik <
n and
dF/dx = FFIexists, then the joint distribution of X,*,,. . . ,X,*k has a density
g,,,...,,, (., . . . , .) relative to the Lebesgue measure, and is given for o o < X1 <
A2 < . . . < X k < w , by

and gi, ,,,,,i, = 0 for all other values of Xi.

Proof Let us first recall that if X t : 0 + B ( t E T , interval) is a Markov


process with ( B , B ) as measurable space, and & : B + A is a one-to-one aiid
onto mapping, where (A, A) is a measurable space, with q5,l(A) = B, t E T,
then = cjt(Xt) : R + A is also a Markov process. Indeed, Ct = Y,-'(A) =
x;~(&~(A)) = x;'(B) = a ( X t ) c C, and, by Definition 1, X t is a Markov
process, which means the set of a-algebras { a ( X t ) = Ct, t E T} is Markovian.
Consequently the Yt-process is Markovian. We identify them with X,*, Y,, 2,.
Taking q5t = F : R + (0, I ) , all t = 1 , 2 , .. . , n , in the above so that &
is one-to-one, onto, aiid the Bore1 0-algebras are preserved, it follows that
{X:, 1 i < < n} is Markovian iff {Y,, 1 < <
i n} is; and then the same is
true of {Zi, 1 i < < n}, since, if $(x) = - logx, $ : ( 0 , l ) + R+ is also one-
to-one aiid onto, with similar properties. Because of this we shall establish
directly the Markovian property of the Zi-sequence. [Similarly one can prove
it directly for the Y,-sequence also; see Problem 22 for another argument.]
Note that if tk denotes the kth largest of the numbers X l (w), . . . , X,(w),
then Xc ( w ) = ( X l (w), . . . , Xn (w)). Consequently by hypothesis on F, it fol-
lows that Y , = F(X,*) = F ( C ( X 1 , . . . , X,)) = ti( F ( X l ) , . . . , F ( X n ) ) . Hence
Y , is the ith-order statistic of F(X1), . . . , F ( X n ) . Similarly, Zk is the kth-order
statistic given by Zk = log F(X;+l-k) = Ek ((1, . . . ,
- en), where
= - log F ( X , )
This follows from the fact that - logy is decreasing as y > 0 increases, and
hence the kth largest from below t o Zk corresponds t o the kth smallest from
above t o the X,- or F(Xi)-sequence. If G ( . )is the distribution of the (, (which
are clearly independently and identically distributed), then
3.3 Markov Dependence

Hence for z > 0,

siiice F is strictly illcreasing and Xi has F as its distribution function. Thus


G ( . )has a density g given by

and 2 1 , . . . , 2, are order statistics of an r.v. whose density is (23). Note that
< <
in the above, we have also obtained the d.f. of qk = F ( X k ) .In fact, 0 qk 1
a.e., and its d.f. is given by

These facts will be useful for the following computatioi~s.


Let us now derive the joint distribution of the order statistics. Since -ce <
X I < . . . < X; < ce a.e. (equalities hold only with probability zero, since
F is continuous), the range of these fuiictioiis is the set A = {(zl,.. . , z,) :
c e < z1 < z2 < . . . < z, < W}(C Rn), which is Borel. Let B C A be any
Borel subset. Then

u
[the sum is over all permutations o of ( I : . . . , n)]

=
u
/ 'i'/ dF(hl) . dF(h,), (siiice the Xi are i.i.d.)

If F1 exist>s,then the above can be written as


156 3 Conditioning and Some Dependence Classes

But this is true for all Borel subsets B of A. Hence by the Radon-Nikod9m
theorem, ( X I , . . . , X i ) has density, say, f * , given by

n! F1(X1).. . F f ( X n ) if ce < X1 < . . . < A, < +oo,


f*iX1,. .. , x ~=)
{
-

0 otherwise.
(26)
<
If 1 zl < z2 < . . . < ik <
n is a subset of the set ( 1 , 2 , . . . , n ) , then the
marginal density of X ; , . . . , XG, say, gi , i k (21,.. . , x k ) , is obtained from
(26) by integrating f *, for fixed x l < x2 < . . . < xk, relative to all the X over
the following set

This gives (22). The details of this elementary integration are left to the reader.
It remains to establish the special properties of the Y,- and 2,-sequences.
First consider the 2,. Let Uk = Zk - ZkP1 (and Zo = 0 a.e.), k = 1, . . . , n.
>
Then Uk 0, and Z1 = U1, Z2 = U1 U2,.. . , Zn = + Uk. The mapping
from the range space of (Z1,. . . , Z n ) to that of (U1,.. . , Un) is one-to-one,
and we shall compute the joint distribution of Ul, . . . , U, and show that they
are mutually independent. Since, as seen in Example 3, each partial sum se-
quence of independent random variables is a Markov process, both properties
announced for the &sequence are established at once. Now with (23), the
joint distribution of Z1, . . . , Zn is given from (26) as

The Jacobian J of the above mapping from (21,.. . ,Z,) + ( U l , . . . , U,) is


< <
given in their range spaces, with zk = C zk= l u2,1 k n , which is triangular,
and one finds
J = d ( z l , . . . , z,)/d(ul,. . . ,u,) = 1.
Thus for any n-dimensional Borel set B in the range space of the Ui, if A is
its preimage in the range of 21,.. . , Z,, then
3.3 Markov Dependence 157

Since B is an arbitrary Bore1 set in the positive orthant of Rn, the range
space of (U1, . . . , U,), it follows from (271, by means of the Radon-Nikod9m
theorem, that the Uk have a density function h given by

h(u1,. . . ,u,) =
n! nj":; C ( ~ - J ) ~ J + ~if u1 > 0 , . . . , u, > o
otherwise.

Since the density factors, it results that the set {Ul,. . . , U,} is a mutually
independent family (Uj with density = constant .e-(n-j)U~+lfor uj+l 0). >
Thus the &-sequence is Markovian, and is in fact of independent increments.
It only remains to verify the wide sense Markov property for the real Y,-
sequence.
Let r ( i l , i2) = E(Y,,Y,,). Now using (24) in (26) or (22) for 1 i l < i2 < <
n , we get, on substitution, that the densities g,, and g,,,i, of Y,, , (Y,,,Y,,) to

and

Hence

Similarly,
158 3 Conditioning and Some Dependence Classes

Thus

Hence for 1 < i l < i2 < i3< n , we get p(il,i2)p(i2,i3) = p(il,i3). By


Proposition 8, this shows that the Y,-sequence is also wide-sense Markov. The
proof is complete.
It is easy t o prove the Markovian property of the order statistics (for which
the existence of F' need not be assumed) using the Y,-sequence instead of the
2,-sequence employed in the above proof. This alternative method of interest
is given later as Problem 22. The point of the above result is t o recognize
this (Markov) property for order statistics of a (finite) random sample from
any continuous (strictly increasing) distribution function. This illuminates
the structure of the problem and admits further analysis. (cf. Theorem 5.4.8,
7.2.5.)
For a deeper study, the class of Markov processes must be restricted t o sat-
isfy some regularity conditions. There are numerous specialized works, both
intensive and extensive, on the subject. Proposition 6, for instance, exhibits
an intimate relation of these processes t o the theory of semigroups of opera-
tors on various function spaces. We will not enter into these special relations
in the present work. We now turn, instead, t o showing that such processes
exist under quite broad aiid reasonable conditions.

3.4 Existence of Various Random Families


In all the preceding discussion, we have assumed that there exists a family
of random variables on a probability space (0,C , P ) . When does it really
exist? This fundamental question will now be answered. Actually there are two
such basic existence results, corresponding t o the independent and dependent
families. These were proved in the 1930s. They are due t o B. Jessen aiid
A. Kolmogorov, respectively. Both results were extended subsequently. We
discuss them here and present the details of a relevant part of this work that
implies the existence of all the random families studied in our book. Readers
pressed for time may glance through this section and return t o it later for a
detailed study.
To introduce some terminology and t o give a motivation, consider (a, C , P).
An indexed collection {Xt, t E T) of random variables Xt : 0 + R is termed
a stochastic (or random) process or family. If T c R, let t l < . . . < t, be n
points from T , n > 1, and consider the n-dimensional distributions (or image
measures) of X t l , . . . , Xtn given by
3.4 Existence of Various Random Families 159

As n varies, the set IFtl,,,,,t,b,t, E T, n >


1) clearly satisfies the following
system of equations, called the compatibility conditions:

aiid if (il , . . . , i n ) is a perinutatioii of (1,. . . , n), then

The mysterious condition (3) simply states that the intersection of the sets
inside P ( . ) in (1) is commutative and thus the determination of F does not
depend on the ordering of the indices of T. Equations (2) aiid (3) can be
put compactly into a single relation, namely: if for any set a, = ( t l , . . . , t,)
of T, Rn = X ~ , E ~ , , R ~=, , RR, B ~ n, = @t,E,,,Bt,,Bt, = B , the Bore1 a-
algebra of R, and .irn,,+l . . Rn+l + R n , the coordinate projection [i.e.,
-1
~ n , n + l ( x l , .. ., % + I ) = (21,. . . , xn)], then Rn+' = Tn,,+l (Rn), Bn+l >
1
Tn,n+l (Bn). Moreover, if we define

then (2) and (3) can be combined into the single statement that

where PmrL = P o (Xt,, . . . , Xt,,)-I. Thus (4) is the compatibility relation.


The preceding discussion shows that a given random family always in-
duces a set of mutually compatible finite-dimensional probability distributions
(=image laws) on {Rn, n >
1). Kolmogorov (and, in the important case of
independence, Jessen) showed that, conversely, such a compatible collection of
finite-dimensional distributions determines a random family on some proba-
bility space whose image laws are the given (finite-dimensional) distributions.
We precisely state and prove these statements in this section. First let us give
two simple examples of a compatible family of probability distributions.
Let p be any measure on (R, B) (e.g., p is the Lebesgue or counting mea-
sure). If f,, n >
l, are any iioiiiiegative B-measurable mappings such that
sroO >
f,dp = 1,n 1, consider the family

where an = ( 1 , 2 , . . . , n ) . Then {Pa,, >


, n 1) clearly satisfies (2) aiid (3), or
(4). It is, of course, possible to coiistruct more complicated sets {P,,, , n 1) >
with other specific measures such as the Gaussian (to be detailed later). The
160 3 Conditioning and Some Dependence Classes

measures defined by the right side of (13) in the preceding section can be shown
to be a compatible family also. A class of distributions n 1) which >
cannot be so factored is the following "multinomial" (discrete) distribution:

where a, = ( 1 , 2 , . . . , n ) aiid

f i ,...,n(x1, . . . , xn) = if 0 < pl < 1, z, > 0, xzlX i = n , z, integer ,


xZlpi = 1,
0 otherwise.
It can be verified that n >
1) is a compatible family. Many others can
be exhibited. See also the interesting Exercise 5(c) of Chapter 2.
We turn to the basic existence theorems noted earlier. One of the problems
is the (formidable) iiotatioii itself. Thus a special effort is made to minimize
this unavoidable difficulty. Let (a,,C,, P,)aEI be a family of probability spaces.
(Here I = N or I = T c R+ is possible. But I can be taken as any index
set, since this has no effect on the notational or conceptual problem.) If 0 =
xiEIRi is the cartesian product, let ni : R + fli be the coordinate projection,
so that w = (wi,i E I) E R implies ni(w) = wi E Q i , i E I. Similarly, if
a, = ( i l , . . . , i n ) is a finite subset of I, = x,,-,,, fl, theii : fl +
with T,,~(W)= (w,,, . . . , wirL).Let C,,, = @,,-,, C, be the product a-algebra
of R,,, . The subsets {w : .iri(w) E A) c R for A E Ci and {w : T,,~(W)E
B,) c 0, for Bn c Can are called cylinder sets of R with bases A and B n ,
respectively. If 3 is the collection of all finite subsets of I aiid an,a, are
two elements of 3, we say a, < a, if a, c an aiid for any PI, P2 in 3
there is a y E F such that /Il < y , Pa < y.(Simply set y = P1 U p2 E F ) .
With this definition (3, <) becomes a directed set, so that for Pi, p2 in 3,
pl < P2 is meaningful, and for any two y1,72 there is a larger element 7 3
( ~ <i 731 i = 1 1 2).
If alla 2 are in 3,a1 < a a , theii we can define another coordinate projec-
tion between fl,, , R,, namely, n,,,, : R,, + fl,, . [In (4), n,,,+l is such a
T,,,,.] For example, if a1 = ( i l , i z ) , a 2 = (21, i 2 , i 3 ) ,and w,, = (wi,,w,,, w,,)
with wi E Ri, then T,,,, (w,,) = (wil ,wi,) = w,, E R,, . It is now easy t o ver-
ify that for any a1 < a 2 < a3, we have nal,, = nal,, o n ,,,, , n,, = identity,
and n,, = n,,,, on,, . Moreover, the relation between the product a-algebras
is that n;&, (C,,) c C,, . Thus each n ,,,, is (C,, , C,,)-measurable. If for
each a! E 3,we define P, = @,,-, P, by the Fubini theorem (cf. 1.3.11), then
{Pa,a! E F) forms a compatible family, in the sense that for each a1 < a 2
in 3 we have P,,(T;&, (A)) = Pal(A), A E C,,, satisfying (4). This is a
consequence of the image measure theorem (cf. 1.4.1). With this notation, the
followiiig result is obtained:
3.4 Existence of Various Random Families 161

Proposition 1 Let {(R,, C,, Pi),i E I) be a family of probability spaces


and R = xitlQi, C, = C2, P, = Pi, a E 3 , where 3 i s
the directed set of all finite subsets of the nonempty index set I . If we let
Co = U,tFn;l(C,) be the class of all cylinder sets of fl, then Co i s a n
algebra of R and there exists a unique finitely additive function ( a "rob-
ability") P : Co + [O, 11 such that P o T;' = P, for each a E 3 , with
{P,, a E 3) as a compatible family of probability measures o n {C,, a E 3).

Proof Since inverse mappings preserve all set operations (i.e., unions,
intersections, differences, and complements) for any collectioii [which need
not be countable, e.g., n i l(no AD) = no n;l(Ap)], it follows that n i l (C,)
is a 0-algebra on R , a E 3,and if a1 < a z , then n;:(C,,) c n;:(C,,). This
immediately implies that Co is an algebra (and in general n o t a a-algebra).
We now define P on Co. The compatibility of the P, was already verified
preceding the statement of the proposition, and it is essential for the definition
of P.
Let A E Co. Then A E n;l(C,) and perhaps also A E n i l (Cp). Thus
there are B1 E C,, Bz E Cp such that A = r;'(B1) = T F ' ( B ~ ) . Let =
a U /3 E 3,so that y > a ,y > p. Then we know from the definition of the
T, above that T, = T,, o T, and TO = TO, o T~ Consequently A can be
represented in two ways:

Comparing the last quantities of both lines of (5), we get

But T, : R + Q7 is onto [ T ~ ( R=) Q7], and this implies T;'(B) = 0 iff B = 0.


Hence n G ( B 1 ) = niy'(Bz). By the compatibility of the P, noted above,

Hence if we set P ( A ) = P,(B1) = Pp(Ba) then P is uiiainbiguously defined,


and P : Co + [O, 11. Atoreover, P,(B1) = P(r;l(B1)) = P ( A ) , so that P, =
POT;'. If A1 , A2 are disjoint sets in Co,then A1 E n:; (C,, ), A2 E T;; (C,, ),
so that if y = a1 U a2,then A1, A2 are in T;' (C7) [> T:; (C,, ) U T& (C,,)].
Hence Ai = n;' (Di), Di E C,, i = 1 , 2 , disjoint, P ( R ) = 1, and

Thus P is a finitely additive "probability." The uniqueness of P is now evident.


Note that we have not used the product measure property of P, in the above
162 3 Conditioning and Some Dependence Classes

argument, and the result is thus valid for any compatible family of {P,, a! E
F). The proof is complete.
The main problem now is to show (under suitable conditions) that P is
0-additive on Co, so that it has a unique a-additive extension P t o a(Co),
(= C, say). The triple ( R , C, P) is then the desired probability space, giving
our first existence result. We now establish this property for the product case.

Proposition 2 The function P : Co + [O, 11 of the preceding proposition


is a-additive, when each Pa is a product probability.

Proof Since P is already shown t o be additive, a-additivity is equivalent


to showing that P is continuous at 0 from above, i.e., if A, J, 0, A, E Co,then
P(A,) + 0. This is further equivalent to showing that for any decreasing
sequence {A,,n > 1) c Co, P(A,) >
6 > 0 implies Or==,A, # 0, since
P(A,) > P(A,+l) is implied by additivity. This is verified as follows.
Since {A, J) c Co = Ua,Fn;l(~,), there exists a sequence {a,, n >
>
1) c F such that A, E ~ ; , t ( C , , , ) , n 1. But we have seen that for a! < p,
T ~ ' ( C> >
~ T);' (C,). Thus replacing the sequence {a,, n 1) by {P,, n 1) >
with /Il = al, . . . , p, = UL=l a k E F, if necessary, one can assume that
a1 < a 2 < . . .. With this, each A, = n;,l(~,) for some En E Can,aiid
P(A,) = Pa(&). Note that A, is a cylinder: A, = En x x , , ~-an R,. Since
P,,, = BZEa7, P, is a product measure, by the Fubiiii theorem, one has, if
a, = (il, . . . , i n ) , that the function

is Pi,-measurable, and Pa,,(B,) = Jn h,(wil)Pi1(dwil) = P(A,)


J 71
6 > 0. >
Also, 0 5 h, 5 1 aiid the decreasing sequence {h,) has its integrals bounded
below by S > 0, it follows by the Lebesgue bounded convergence theorem that
there exists an w i E Ql such that h,(w;) f , 0 as n + cm.Next apply the
same argument to the function

and deduce that there is an w2: E Ri, such that gn(~E) f i 0. Repeating this
procedure, we see that XB,, ( W E , . . ,u:~, w , ~ +. .~. ,w,,, ) cannot be zero for all
points ( w , ~ +. ~. . ,w,,,) E fl,,+, x . . . x a,,,. Thus there exists an w0 E f l such
that w0 = ( w i , . . . , w,: wik+, . . .) E A,, for any n > k. If /3 = Uzl,ai we
can choose w0 such that its countable set of components corresponding to the
countable set P, and the rest arbitrarily in xiEI-pfli. Then by the form of
A, (that they are cylinders), w0 E A, for all n > 1, so that w0 En,"==, A,.
This shows that P(A,) > S > 0 implies n,A, # 0, and hence P is a-additive
on C o Thus (0,C,P) exists, where P is the unique extension of P onto
3.4 Existence of Various Random Families

C = a(Co),by the Hahn extension theorem. This completes the proof.

Remark The space $(\Omega, \Sigma, P)$ is called the product of $\{(\Omega_i, \Sigma_i, P_i), i \in I\}$, and usually it is denoted $(\Omega, \Sigma, P) = \bigotimes_{i \in I}(\Omega_i, \Sigma_i, P_i)$.

The preceding two propositions can be combined to yield the following result about the existence of a class of random families. This result was obtained by B. Jessen, noted above, in a different form.

Theorem 3 (Jessen) Let $\{(\Omega_i, \Sigma_i, P_i), i \in I\}$ be any family of probability spaces. Then there exists a probability space $(\Omega, \Sigma, P)$, their product, and a family of random variables (i.e., measurable mappings) $\{X_i, i \in I\}$, where $X_i : \Omega \to \Omega_i$ is defined as $X_i(\omega) = \omega_i \in \Omega_i$ for each $\omega = (\omega_i, i \in I)$ ($X_i$ is the coordinate mapping), such that (a) they are all mutually independent, and (b) for any measurable rectangle $A_{i_1} \times \cdots \times A_{i_n} \in \Sigma_{i_1} \times \cdots \times \Sigma_{i_n}$ one has
$$P[X_{i_1} \in A_{i_1}, \ldots, X_{i_n} \in A_{i_n}] = \prod_{k=1}^{n} P_{i_k}(A_{i_k}). \tag{7}$$

Proof The first part is a restatement of Proposition 2, and since $X_i^{-1}(\Sigma_i) \subset \Sigma_0$, it follows that each $X_i$ is measurable. That the $X_i$ are independent and that (7) is valid now follow immediately. Indeed, each $X_i$ is a coordinate function, and hence $X_\alpha^{-1} = \pi_\alpha^{-1}$ in a different notation, so that we have [here $\alpha$ is $(i_1, \ldots, i_n)$ and $A_\alpha = A_{i_1} \times \cdots \times A_{i_n} \in \Sigma_\alpha$]
$$P[X_{i_1} \in A_{i_1}, \ldots, X_{i_n} \in A_{i_n}] = P(\pi_\alpha^{-1}(A_\alpha)) = P_\alpha(A_\alpha)$$
$$= \prod_{k=1}^{n} P_{i_k}(A_{i_k}) \quad (\text{since } P_\alpha \text{ is a product measure}) \tag{8}$$
$$= \prod_{k=1}^{n} P[X_{i_k} \in A_{i_k}]. \tag{9}$$
Thus (8) is (7), and (9) proves independence, since it is true on $\Sigma_0$ and hence, by the $(\pi, \lambda)$ criterion (cf. Theorem 2.1.3), on all of $\Sigma$. This completes the proof of the theorem.

If we take each $(\Omega_i, \Sigma_i)$ as the Borelian line $(\mathbb{R}, \mathcal{B})$ and $P_i : \mathcal{B} \to [0,1]$ as any probability (or a distribution) function, then the above result ensures the existence of arbitrary families of mutually independent random variables. Thus all the random families considered in Chapter 2 exist. Regarding the generality of the preceding result, it must be emphasized that the spaces $(\Omega_i, \Sigma_i)$ are abstract and that no topological conditions entered in the discussion. Now looking at the proof of the key Proposition 2 (and hence of Theorem 3), one feels that the full force of independence was not utilized. In fact, one can use the same argument for certain (Markovian-type) dependence families, including those of the preceding section. Such an extension is presented for the case when the index set $I$ is $\mathbb{N}$, the natural numbers. This result is due to C. Ionescu Tulcea (1949). [It may be noted in passing that if $I$ is uncountable, then $(\Omega, \Sigma, P)$ will not be separable even when each $\Omega_i$ is finite.]
As a motivation, let us first record the following simple result when $I$ is countable.
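Concretely, Theorem 3 is what licenses statements such as "let $X_1, X_2, \ldots$ be independent with distributions $F_1, F_2, \ldots$." As a minimal computational sketch (ours, not from the text; the function name and the use of inverse-transform sampling, assuming NumPy and SciPy, are illustrative choices), finitely many such coordinates can be realized from their quantile functions:

import numpy as np
from scipy.stats import expon, norm, bernoulli

def sample_independent(quantiles, n_samples, seed=0):
    """Draw independent coordinates X_i = F_i^{-1}(U_i), U_i ~ Uniform(0,1).

    `quantiles` is a list of quantile (inverse distribution) functions,
    one per coordinate.  The independence of the U_i is what the product
    measure of Theorem 3 guarantees abstractly.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=(n_samples, len(quantiles)))
    return np.column_stack([q(u[:, i]) for i, q in enumerate(quantiles)])

# Example: an exponential, a standard normal, and a Bernoulli(0.3) coordinate.
draws = sample_independent([expon.ppf, norm.ppf, bernoulli(0.3).ppf], 10_000)
print(draws.mean(axis=0))  # approximately [1.0, 0.0, 0.3]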

Lemma 4 Let $\{(\Omega_i, \Sigma_i, P_i), i \in \mathbb{N}\}$ be probability spaces and $(\Omega, \Sigma, P)$ be their product, as given by Theorem 3. If $A_i \in \Sigma_i$ and $A = \times_{i \in \mathbb{N}} A_i$, then $A \in \Sigma$ and
$$P(A) = \lim_{n \to \infty} \prod_{i=1}^{n} P_i(A_i) = \prod_{i=1}^{\infty} P_i(A_i). \tag{10}$$

Proof Let $\alpha_n = (1, \ldots, n)$ and $\pi_n : \Omega \to \Omega_n$ be the coordinate projections, so that $\pi_{\alpha_n} : \Omega \to \Omega_{\alpha_n} = \times_{k=1}^n \Omega_k$. If $B_n = A_1 \times \cdots \times A_n$ and $\Omega^{(n)} = \times_{k > n} \Omega_k$, then $\pi_{\alpha_n}^{-1}(B_n) = B_n \times \Omega^{(n)} \in \Sigma$ and $\pi_{\alpha_n}^{-1}(B_n) \supset \pi_{\alpha_{n+1}}^{-1}(B_{n+1})$. Also,
$$A = \bigcap_{n \ge 1} \pi_{\alpha_n}^{-1}(B_n),$$
so that $A \in \Sigma$, since $\Sigma = \sigma(\Sigma_0)$ is the $\sigma$-algebra generated by the cylinder set algebra $\Sigma_0$ of $\Omega$. Hence by Proposition 2,
$$P(A) = \lim_{n \to \infty} P(\pi_{\alpha_n}^{-1}(B_n)) = \lim_{n \to \infty} P_{\alpha_n}(B_n) = \lim_{n \to \infty} \prod_{i=1}^{n} P_i(A_i) \quad [\text{by (7)}].$$
This establishes (10), and hence the lemma.
It is of interest to note that the algebra of cylinder sets $\Sigma_0$, as defined in Proposition 1, can be described in the present case as follows. If $A_n \in \Sigma_n$, then $\pi_n^{-1}(A_n) \in \Sigma_0$. But $\pi_n^{-1}(A_n) = \Omega_1 \times \cdots \times \Omega_{n-1} \times A_n \times \Omega^{(n)} = \pi_{\alpha_n}^{-1}(D_n)$, where $D_n = \Omega_1 \times \cdots \times \Omega_{n-1} \times A_n$ and $\alpha_n = (1, \ldots, n)$. Thus if $\alpha_n < \alpha_{n+1}$ is again written for $\alpha_n \subset \alpha_{n+1}$, and $\Sigma_{\alpha_n} = \bigotimes_{k=1}^n \Sigma_k$, we have $\Sigma_0 = \bigcup_{n \ge 1} \tilde\Sigma_{\alpha_n}$, where $\tilde\Sigma_{\alpha_n} = \pi_{\alpha_n}^{-1}(\Sigma_{\alpha_n})$. This reduces the compatibility condition to saying, since $\tilde\Sigma_{\alpha_n} \subset \tilde\Sigma_{\alpha_{n+1}}$, that $\tilde P_{\alpha_{n+1}} | \tilde\Sigma_{\alpha_n} = \tilde P_{\alpha_n}$, where $\tilde P_{\alpha_n}(\pi_{\alpha_n}^{-1}(D_n)) = P_{\alpha_n}(D_n)$ for $D_n \in \Sigma_{\alpha_n}$; i.e., the image measures $\tilde P_{\alpha_n}$ of $P_{\alpha_n}$ on $\tilde\Sigma_{\alpha_n}$ are extensions of each other as we go from $\tilde\Sigma_{\alpha_n}$ to $\tilde\Sigma_{\alpha_{n+1}}$. Hence by Proposition 1, there exists a unique additive mapping on the algebra $\Sigma_0$ into $[0,1]$ if only we are given probabilities $P_{\alpha_n}$ on $\Sigma_{\alpha_n}$ such that
$$P_{\alpha_n}(A_1 \times \cdots \times A_n) = P_{\alpha_{n+1}}(A_1 \times \cdots \times A_n \times \Omega_{n+1}), \quad A_i \in \Sigma_i, \tag{11}$$
$1 \le i \le n$. The $P_{\alpha_n}$ need not be product measures. Indeed, (11) is just the statement that $P_{\alpha_n} = P_{\alpha_{n+1}} | \Sigma_{\alpha_n}$. (Verify this.)
Suppose now that we are given an initial probability $P_1$ on $\Sigma_1$. Let us then say that $P_{\alpha_n}$ is a productlike measure on $\Sigma_{\alpha_n}$, where $\alpha_n = (1, \ldots, n)$ as before, if there exist mappings $P_n(\cdot, \cdot) : \Sigma_n \times \Omega_{\alpha_{n-1}} \to \mathbb{R}^+$ such that

(i) $P_n(A_n; \cdot)$ is $\Sigma_{\alpha_{n-1}}$-measurable for each $A_n \in \Sigma_n$, and

(ii) $P_n(\cdot; \omega_{\alpha_{n-1}})$ is a probability for each $\omega_{\alpha_{n-1}} \in \Omega_{\alpha_{n-1}} = \times_{k=1}^{n-1} \Omega_k$, $n > 1$,

in terms of which
$$P_{\alpha_n}(A_1 \times \cdots \times A_n) = \int_{A_1} P_1(d\omega_1) \int_{A_2} P_2(d\omega_2; \omega_1) \cdots \int_{A_n} P_n(d\omega_n; \omega_1, \ldots, \omega_{n-1}) \tag{12}$$
for each measurable rectangle $A_1 \times \cdots \times A_n$ of $\Sigma_{\alpha_n}$.


The classical (unsymmetric) Fubini theorem implies that (12) is well de-
fined and has a unique a-additive extension to EarL, n > 1. Atoreover, condi-
tion (ii) on P,(.; .) implies that the Pansatisfy (11). In other words, if Pa",is
the image measure of Pan (thus Pa",o T;,: = Pan), then {Pa,,, g,,, , n > 1)
is a compatible family of probability measures. Here P,, = PI is the initial
probability. If each P,(.; w,,, ) is independent of the second variable worLpl
then Pa,,is simply the product probability. But what is the basis of (12)? Are
there any P,(.; .) other than the absolute probabilities to talk about in con-
nection with the generalization? Indeed, such measures are regular conditional
probabilities if each (R,, C,) is (R, B ) , the Boreliaii line, by Proposition 2.8.
Thus such productlike measures can exist without being product measures.
Also (12) is obtained from the commutative property of conditional expecta-
tions and the image probability theorem, since we have C,, c g,, . .. c Can
and for any bounded f : f l + R,measurable for C,,,,, ,

The right side of (13) is just the right side of (12) when regular conditional
probabilities exist. We now present the desired extension using the above no-
tation.
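The recursion implicit in (12) has a direct computational form: draw $\omega_1$ from $P_1$, then, step by step, $\omega_n$ from the kernel $P_n(\cdot; \omega_1, \ldots, \omega_{n-1})$. The following Python sketch is our illustration (not from the text); the kernels are arbitrary callables returning SciPy-style frozen distributions, an assumed interface.

import numpy as np
from scipy.stats import norm

def sample_productlike(P1, kernels, seed=0):
    """Sample (omega_1, ..., omega_n) under the productlike measure (12).

    P1      : frozen distribution for the initial coordinate.
    kernels : list of callables; kernels[k](history) returns the frozen
              distribution P_{k+2}(. ; omega_1, ..., omega_{k+1}).
    """
    rng = np.random.default_rng(seed)
    path = [P1.rvs(random_state=rng)]
    for K in kernels:
        path.append(K(path).rvs(random_state=rng))
    return np.array(path)

# Example: each coordinate is normal around the previous one (a Markov
# kernel, the special case of Theorem 6 below); a kernel may equally well
# depend on the whole history, which is exactly the productlike generality.
kernels = [lambda h: norm(loc=h[-1], scale=1.0) for _ in range(9)]
print(sample_productlike(norm(0, 1), kernels))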

Theorem 5 (Tulcea) Let $\{(\Omega_i, \Sigma_i), i \in \mathbb{N}\}$ be a family of measurable spaces and suppose that $P_{\alpha_n} : \Sigma_{\alpha_n} \to [0,1]$ is a productlike probability for each $n \ge 1$ with $P_{\alpha_1} = P_1$ as the initial probability. Then there exists a unique probability $P$ on $(\Omega, \Sigma)$ such that $P_{\alpha_n} = P \circ \pi_{\alpha_n}^{-1}$, $n \ge 1$, where $\pi_{\alpha_n}$ is the coordinate projection of $\Omega$ onto $\Omega_{\alpha_n}$, $(\Omega, \Sigma)$ being the (product) measurable space introduced above.

Proof It was already noted that $\{P_{\alpha_n}, n \ge 1\}$ defined by (12) forms a compatible family. Hence there exists a finitely additive $P : \Sigma_0 \to [0,1]$ such that $P \circ \pi_{\alpha_n}^{-1} = P_{\alpha_n}$, $n \ge 1$. To show that it is $\sigma$-additive, we consider an arbitrary sequence $\tilde C_n \supset \tilde C_{n+1}$, $\{\tilde C_n, n \ge 1\} \subset \Sigma_0$, $\bigcap_{n=1}^{\infty} \tilde C_n = \emptyset$, and verify that $P(\tilde C_n) \to 0$. If this is false, then there exists a $\delta > 0$ with $P(\tilde C_n) \ge \delta > 0$ for all $n$. Proceeding exactly as in Proposition 2, it may be assumed that there exist $n_1 < n_2 < \cdots$ such that $\tilde C_n \in \tilde\Sigma_{\alpha_n}$, $n \ge 1$, and then $P_{\alpha_n}(C_n) \ge \delta$, where $C_n$ is the base of the cylinder $\tilde C_n$. Then
$$h_n(\omega_1) = \int_{\Omega_2} \cdots \int_{\Omega_n} \chi_{C_n}(\omega_1, \ldots, \omega_n) \, P_n(d\omega_n; \omega_1, \ldots, \omega_{n-1}) \cdots P_2(d\omega_2; \omega_1)$$
is $P_1$-measurable, $1 \ge h_n \ge h_{n+1} \ge 0$, and $P_{\alpha_n}(C_n) = \int_{\Omega_1} h_n(\omega_1) P_1(d\omega_1) \ge \delta > 0$. Here all are Lebesgue integrals and, by the monotone convergence, there is an $\omega_1^0 \in \Omega_1$ such that $h_n(\omega_1^0) \nrightarrow 0$ as $n \to \infty$. We then repeat this argument and deduce, as in the proof of Proposition 2, that $\bigcap_{n=1}^{\infty} \tilde C_n \ne \emptyset$. Thus $P$ must be $\sigma$-additive on $\Sigma_0$. The rest of the argument is the same in both cases, and this terminates the proof.

One of the interesting features of this result is that Jessen's theorem extends to certain nonindependent (but just productlike) cases. Again no topological conditions intervene, but the existence of regular conditional probabilities is assumed instead. We now show that this result implies the existence of the Markov processes that were discussed in the last section.

Theorem 6 Let $\{(\Omega_i, \Sigma_i), i \in \mathbb{N}\}$ be a sequence of measurable spaces and $\Omega = \times_{i=1}^{\infty} \Omega_i$, $\Sigma = \sigma(\Sigma_0)$, with $\Sigma_0 = \bigcup_{n \ge 1} \pi_{\alpha_n}^{-1}(\Sigma_{\alpha_n})$ as the algebra of cylinder sets. If $P_1 : \Sigma_1 \to [0,1]$ is an initial probability, for each $n > 1$ let $P_n(\cdot;\cdot) : \Sigma_n \times \Omega_{n-1} \to [0,1]$ be a (Markov) transition probability in the sense that

(i) $P_n(\cdot; \omega_{n-1})$ is a probability measure on $\Sigma_n$ for each $\omega_{n-1} \in \Omega_{n-1}$,

(ii) $P_n(A; \cdot)$ is $\Sigma_{n-1}$-measurable for each $A \in \Sigma_n$.

Then there is a unique probability $P : \Sigma \to [0,1]$ and a Markov process $\{X_n, n \ge 1\}$ on $(\Omega, \Sigma, P)$ such that $P[X_1 \in A] = P_1(A)$, $A \in \Sigma_1$, and for each $A_{\alpha_n} \in \Sigma_{\alpha_n}$ of the form $A_{\alpha_n} = A_1 \times \cdots \times A_n$, $\alpha_n = (1, 2, \ldots, n)$, we have
$$P \circ \pi_{\alpha_n}^{-1}(A_{\alpha_n}) = \int_{A_1} P_1(d\omega_1) \int_{A_2} P_2(d\omega_2; \omega_1) \cdots \int_{A_n} P_n(d\omega_n; \omega_{n-1}). \tag{14}$$
In fact, $X_n(\omega) = \omega_n \in \Omega_n$, $n \ge 1$, defines the above Markov process on $(\Omega, \Sigma, P)$ with values in the spaces $\{\Omega_n, n \ge 1\}$.
Proof If $P_{\alpha_n}$ denotes the measure defined by the right side of (14), then it is a productlike probability, since, comparing it with (12), we can take
$$P_k(A; \omega_1, \ldots, \omega_{k-1}) = P_k(A; \omega_{k-1})$$
as the $P_k(\cdot;\cdot)$ there. Consequently, the existence of a unique probability $P$ on $(\Omega, \Sigma)$ follows from Theorem 5. Since the $X_n : \Omega \to \Omega_n$ are coordinate functions (indeed $X_n = \pi_n$), it is clear that $X_n^{-1}(\Sigma_n) \subset \Sigma$ and the $X_n$ are measurable. Thus $\{X_n, n \ge 1\}$ is a random process. To see it is a Markov process, we need to show that
$$P(X_n \in A_n \mid X_1, \ldots, X_{n-1}) = P(X_n \in A_n \mid X_{n-1}) \text{ a.e.} \tag{15}$$
for each $A_n \in \Sigma_n$. Let $C = [X_n \in A_n]$, and $B \in \sigma(X_1, \ldots, X_{n-1}) = \mathcal{B}$. We can restrict $B$ to the generators of $\mathcal{B}$ and verify (15) on any such $B$. Hence, writing $P(C \mid X_1, \ldots, X_{n-1})$ as $E^{\mathcal{B}}(\chi_C)$ and expressing $B = \bigcap_{i=1}^{n-1} X_i^{-1}(A_i)$, $A_i \in \Sigma_i$, we have
$$\int_B E^{\mathcal{B}}(\chi_C) \, dP = \int_B \chi_C \, dP = P(B \cap C)$$
[since $C = X_n^{-1}(A_n) = \pi_n^{-1}(A_n)$ and $B$ is as above]
$$= \int_{A_1} P_1(d\omega_1) \cdots \int_{A_{n-1}} P_{n-1}(d\omega_{n-1}; \omega_{n-2}) \int_{A_n} P_n(d\omega_n; \omega_{n-1})$$
[where $P_{\alpha_n}$ on $\Sigma_{\alpha_n}$ is given by (12)]
$$= \int_B P([X_n \in A_n] \mid X_{n-1})(\omega) \, P(d\omega) \quad (\text{by the image probability law}).$$
Since the extreme integrands are $\mathcal{B}$-measurable and $B$ is a generator, we can identify them $P$-uniquely. But this is (15) in a different notation, and completes the proof of the theorem.

This result implies that all random families considered in the last section exist. Regarding both Theorems 5 and 6, the reader may have noticed the special role played by the availability of a minimum value in the index set. This is no accident. If the index set is as general as in Jessen's theorem, how should we proceed in establishing a corresponding result? The existence of a minimal element allowed us to simplify the compatibility condition, which is essential in proving the existence of a finitely additive function $P : \Sigma_0 \to [0,1]$ (cf. Proposition 1). After that, the result in Proposition 2 did not use this. In the general case, therefore, we need the (strengthened) compatibility of the $P_\alpha$, and a precise version is given below. It enables us to assert the existence of (Markov) processes $\{X_t, t \in T\}$, where $T \subset \mathbb{R}$ is any index set.
If $\{(\Omega_i, \Sigma_i), i \in I\}$ is a family of measurable spaces, where $I$ is an index set, let $\mathcal{F}$ be the directed set (by inclusion) of all finite subsets of $I$, and $\Omega_\alpha = \times_{i \in \alpha} \Omega_i$, $\Sigma_\alpha = \bigotimes_{i \in \alpha} \Sigma_i$, $\alpha \in \mathcal{F}$, be as in Proposition 1. For each $\alpha \in \mathcal{F}$, suppose a probability $P_\alpha : \Sigma_\alpha \to [0,1]$ is given. Then the system $\{P_\alpha, \alpha \in \mathcal{F}\}$ is termed generalized productlike if for each $\alpha, \beta$ in $\mathcal{F}$, $\alpha < \beta$, such that $\beta = (\alpha, i_1, \ldots, i_k) \subset I$ (is a finite set), we have
$$P_\beta(A_\beta) = \int_{A_\alpha} P_\alpha(d\omega_\alpha) \int_{A_{i_1}} P_{i_1}(d\omega_{i_1}; \omega_\alpha) \cdots \int_{A_{i_k}} P_{i_k}(d\omega_{i_k}; \omega_\alpha, \omega_{i_1}, \ldots, \omega_{i_{k-1}}), \tag{16}$$
where $A_\beta = A_\alpha \times A_{i_1} \times \cdots \times A_{i_k}$ with $A_\alpha \in \Sigma_\alpha$, $A_{i_j} \in \Sigma_{i_j}$, and for each $i_j$

(a) $P_{i_j}(A_{i_j}; \cdot)$ is $\Sigma_\alpha \otimes \Sigma_{i_1} \otimes \cdots \otimes \Sigma_{i_{j-1}}$-measurable,

(b) $P_{i_j}(\cdot; \omega_\alpha, \omega_{i_1}, \ldots, \omega_{i_{j-1}})$ is a probability on $\Sigma_{i_j}$.

This definition reduces to (12) if $I$ has a minimal element and $I$ is countable, but is stronger otherwise. Also, if $A_{i_1} = \Omega_{i_1}, \ldots, A_{i_k} = \Omega_{i_k}$, so that $A_\beta = \pi_{\alpha\beta}^{-1}(A_\alpha)$, where $\pi_{\alpha\beta} : \Omega_\beta \to \Omega_\alpha$ is the coordinate projection mapping, then (16) implies $P_\beta(A_\beta) = P_\alpha(A_\alpha)$, or equivalently, $P_\beta(\pi_{\alpha\beta}^{-1}(A_\alpha)) = P_\alpha(A_\alpha)$. In fact, for any $\alpha < \beta$ from $\mathcal{F}$, (16) implies that $P_\beta \circ \pi_{\alpha\beta}^{-1} = P_\alpha$. This is precisely the compatibility condition for the family $\{P_\alpha, \alpha \in \mathcal{F}\}$. It is then immediately obtained from the argument used in the proof of Proposition 1 that there is a unique additive set function $P$ on the algebra of cylinder sets $\Sigma_0$ into $[0,1]$ such that $P \circ \pi_\alpha^{-1} = P_\alpha$, $\alpha \in \mathcal{F}$, where $\pi_\alpha : \Omega \to \Omega_\alpha$ is the coordinate projection. With this strengthening, Theorem 5 may be restated as follows:

Theorem 7 Let $\{(\Omega_i, \Sigma_i), i \in I\}$ be measurable spaces, and $\{P_\alpha, \alpha \in \mathcal{F}\}$ be a system of generalized productlike probabilities on $\{\Sigma_\alpha, \alpha \in \mathcal{F}\}$ of the given family. Then there exists a unique ($\sigma$-additive) probability $P$ on the $\sigma$-algebra $\Sigma$ generated by the cylinder sets $\Sigma_0$ of the spaces $\{(\Omega_i, \Sigma_i), i \in I\}$ such that $P_\alpha = P \circ \pi_\alpha^{-1}$, $\alpha \in \mathcal{F}$.

The proof is almost identical to that of Theorem 5, and is left to the reader.

Using this form, a continuous parameter version of Theorem 6 can be obtained quickly. However, the "one-step" transition probabilities employed in that result will not be meaningful here, since there is no such "step" in the continuous parameter case. A precise version of the result is as follows, and one needs the full force of the Chapman-Kolmogorov equation [cf. Proposition 3.4, particularly Eq. (7) there], which is now assumed to hold everywhere.

Theorem 8 Let $\{(\Omega_t, \Sigma_t), t \in T \subset \mathbb{R}\}$ be a family of measurable spaces and $(\Omega, \Sigma)$ be their product. (i) If $T$ has a minimal element $t_0$, let $P_0$ be the initial probability on $\Sigma_{t_0}$, or (ii) if there is no minimal element, let $P_t$ be a probability on $\Sigma_t$ for each $t \in T$. Let $r < s < t$ be points in $T$, and let there be given Markov transition probabilities $p_{r,s}(\cdot, \cdot) : \Sigma_s \times \Omega_r \to \mathbb{R}^+$, $p_{s,t} : \Sigma_t \times \Omega_s \to \mathbb{R}^+$, and $p_{r,t} : \Sigma_t \times \Omega_r \to \mathbb{R}^+$ such that they satisfy the Chapman-Kolmogorov equation identically:
$$p_{r,t}(A; \omega_r) = \int_{\Omega_s} p_{s,t}(A; \omega_s) \, p_{r,s}(d\omega_s; \omega_r), \quad A \in \Sigma_t, \ \omega_r \in \Omega_r. \tag{17}$$
Then there exists (in either case) a unique probability $P$ on $\Sigma$ and a Markov process $\{X_t, t \in T\}$ on $(\Omega, \Sigma, P)$ with values in $\{\Omega_t, t \in T\}$ such that (i') $P[X_{t_0} \in A] = P_0(A)$, $A \in \Sigma_{t_0}$, or (ii') $P[X_t \in B] = P_t(B)$, $B \in \Sigma_t$, respectively, and
$$P \circ \pi_\alpha^{-1}(A_\alpha) = \int_{A_{t_1}} P_{t_1}(d\omega_{t_1}) \int_{A_{t_2}} p_{t_1,t_2}(d\omega_{t_2}; \omega_{t_1}) \cdots \int_{A_{t_n}} p_{t_{n-1},t_n}(d\omega_{t_n}; \omega_{t_{n-1}}), \tag{18}$$
where $A_\alpha = A_{t_1} \times \cdots \times A_{t_n} \in \Sigma_\alpha$, $\alpha = (t_1, \ldots, t_n) \subset T$. In fact, $X_t(\omega) = \omega_t \in \Omega_t$, $t \in T$, $\omega = (\omega_t, t \in T) \in \Omega$, defines the above Markov process.

This result follows immediately from Theorem 7 and the fact that (17) implies the consistency of the system defined by (18). Note that if the minimal element exists, then we can always start for each $\alpha \in \mathcal{F}$ from the minimal $t_0 \in T$. The modifications are simple, and again are omitted (given as Problem 23).
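For finite state spaces, the Chapman-Kolmogorov identity (17) is just multiplication of transition matrices, which makes the consistency requirement easy to check numerically. The following sketch is our illustration (the matrices are made up), not part of the text.

import numpy as np

rng = np.random.default_rng(1)

def random_stochastic(n):
    """A random n x n row-stochastic matrix: row i is p(. ; state i)."""
    m = rng.random((n, n))
    return m / m.sum(axis=1, keepdims=True)

# Transition matrices for the time pairs (r,s) and (s,t); (17) then *defines*
# the (r,t) transitions as the matrix product, guaranteeing consistency.
P_rs = random_stochastic(4)
P_st = random_stochastic(4)
P_rt = P_rs @ P_st   # p_{r,t}(j; i) = sum_k p_{s,t}(j; k) p_{r,s}(k; i)

assert np.allclose(P_rt.sum(axis=1), 1.0)  # still a transition probability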
Observe that we strengthen (17) so as to be valid for all $x \in \Omega_r$, not just a.a. $x \in \Omega_r$. This is useful in constructing Markov processes from each given starting point; i.e., in case (i'), $P_{t_0}$ may be replaced by $P^{x_0}$ $[= P_{t_0}(\cdot \mid x_0)$ with $P_{t_0}(A \mid x_0) = \chi_A(x_0)]$. With this result we now have the complete existence theory for the work of Section 3. However, the demand for regular conditional probabilities is not always fulfilled naturally unless we have topologies in the $\Omega_t$-spaces with which to use the theory of Section 2. In those cases (with topologies) we can present a more general and natural proposition for applications. This is due to Kolmogorov (1933), and its extension to Bochner (1955). These results are sufficient for all the processes we deal with in this book, and in fact for essentially all stochastic theory. We thus turn to them. They again may be skipped at a first reading.
Let $\{\Omega_t, t \in T\}$ be a family of Hausdorff topological spaces and $T$ a nonempty index set. Let $\Omega = \times_{t \in T} \Omega_t$ be the cartesian product space with the (Tychonov) product topology. If for each $t \in T$, $\Sigma_t$ is a $\sigma$-algebra of $\Omega_t$ containing all of its compact sets, let $\Sigma_0 = \bigcup_{\alpha \in \mathcal{F}} \pi_\alpha^{-1}(\Sigma_\alpha)$ be the algebra of cylinder sets of $\Omega$, where $\pi_\alpha : \Omega \to \Omega_\alpha (= \times_{t \in \alpha} \Omega_t)$ is the coordinate projection for each $\alpha$ in $\mathcal{F}$, the directed set (by inclusion) of all finite subsets of $T$, as in the preceding discussion. The point of interest in this case is the family of all cylinder sets of $\Omega$ with compact bases from the $\Omega_\alpha$, as it plays a key role in the existence theorem. Thus let $\mathcal{C} = \{C \subset \Omega : C = \pi_\alpha^{-1}(K), K \subset \Omega_\alpha \text{ compact}\}$. Clearly $\mathcal{C} \subset \Sigma_0$. Even though the elements of $\mathcal{C}$ are not necessarily compact (since $\Omega$ need not be a compact space), the following technical property is available.

Proposition 9 If $\{C_n, n \ge 1\} \subset \mathcal{C}$ is such that $\bigcap_{k=1}^{n} C_k \ne \emptyset$ for each $n \ge 1$, then $\bigcap_{n=1}^{\infty} C_n \ne \emptyset$.

Proof Since each $C_n$ has a compact base, $C_n = K_{\alpha_n} \times \times_{t \in T - \alpha_n} \Omega_t$, where $K_{\alpha_n} \subset \Omega_{\alpha_n}$ is compact, so that $C_n = \pi_{\alpha_n}^{-1}(K_{\alpha_n})$. Let $T_1 = \bigcup_n \alpha_n$. Then $T_1$ is a countable subset of the index set $T$. Thus for each $t \in T_1$, let $\alpha_{n_t}$ be an element of $\{\alpha_n, n \ge 1\}$ which contains $t$, and $K_{\alpha_{n_t}}$ be the (compact) base of $C_{n_t}$. Since each $C_n \ne \emptyset$, $K_{\alpha_n} \ne \emptyset$ also. Let $\omega^0 \in \Omega$ be a point such that $\omega_t^0 \in \pi_{t\alpha_{n_t}}(K_{\alpha_{n_t}})$, $t \in T_1$. Since $\omega^0 = (\omega_t^0, t \in T)$, this is possible. But $\Omega$ is not generally compact. Thus we manufacture a compact subset $K \subset \Omega$ and select a suitable net in $K$, whose cluster point (there is at least one) in $K$ will lie in each $C_n$. This will finish the argument.

Let $K = \times_{t \in T_1} \pi_{t\alpha_{n_t}}(K_{\alpha_{n_t}}) \times \times_{t \in T - T_1} \{\omega_t^0\}$. Since each member is nonvoid and compact, $K$ is a (nonvoid) compact subset of $\Omega$ in its product topology. Now let $\mathcal{D}$ be the collection of all (nonvoid) finite subfamilies of $\mathcal{C}_0 = \{C_n, n \ge 1\}$, directed by inclusion. Next, for each $\mathcal{E} \in \mathcal{D}$, set $T_\mathcal{E} = \bigcup\{\alpha : \pi_\alpha^{-1}(K_\alpha) \in \mathcal{E}\}$. Then $T_\mathcal{E}$ is a finite subset of $T_1$. By the finite intersection property of $\mathcal{C}_0$, for each $\mathcal{E} \in \mathcal{D}$ we have $\bigcap_{C_n \in \mathcal{E}} C_n \ne \emptyset$. Let $\omega^\mathcal{E}$ be any point in this intersection. Since $T_\mathcal{E} \subset T_1$, for each $t \in T_\mathcal{E}$ the $t$th coordinate $\omega_t^\mathcal{E} (= \pi_t(\omega^\mathcal{E}))$ is in $\pi_{t\alpha_{n_t}}(K_{\alpha_{n_t}})$. Let us select a point $\bar\omega^\mathcal{E}$ of $\Omega$ for each $\mathcal{E}$ by the rule
$$\bar\omega_t^\mathcal{E} = \omega_t^\mathcal{E} \text{ for } t \in T_\mathcal{E}, \qquad \bar\omega_t^\mathcal{E} = \omega_t^0 \text{ for } t \in T - T_\mathcal{E}.$$
Then $\bar\omega^\mathcal{E} \in \bigcap_{C_n \in \mathcal{E}} C_n \cap K$, since $\omega_t^0 \in \pi_{t\alpha_{n_t}}(K_{\alpha_{n_t}})$ for $t \in T_1$ and $T_\mathcal{E} \subset T_1$. Hence $\{\bar\omega^\mathcal{E}, \mathcal{E} \in \mathcal{D}\} \subset K$ is a net and, since $K$ is compact, so that it is closed (since the product topology of $\Omega$ is Hausdorff), the net has a cluster point $\omega^* \in K$. If $C_n$ from $\mathcal{C}$ is any set, then there is an $\mathcal{E}$ in $\mathcal{D}$ such that $\{C_n\} \subset \mathcal{E}$, and $\bar\omega^\mathcal{E} \in \bigcap_{C \in \mathcal{E}} C \Rightarrow \bar\omega^\mathcal{E} \in C_n$. Since $\{\bar\omega^\mathcal{E}, \mathcal{E} \in \mathcal{D}\} \subset K$, the net enters $C_n$ for all $\mathcal{E}$ sufficiently large or "refined," so that the net enters each $C_n$ eventually. Hence $\omega^* \in C_n$ for all $n$, and $\bigcap_{n \ge 1} C_n \ne \emptyset$, as asserted.

With this topological property of cylinders, it is possible to present the following result, which is somewhat more general than the original formulation of Kolmogorov's (1933) existence theorem, but is a specialization of Bochner's 1955 extension. Because of this circumstance, we call it the Kolmogorov-Bochner theorem; it is an interpolation of both these results. The classical version will also be stated for comparison. After establishing the result we discuss its relation to Ionescu Tulcea's theorem and its significance for the subject.

Theorem 10 (Kolmogorov-Bochner) Let $\{(\Omega_t, \Sigma_t), t \in T\}$ be a family of topological measurable spaces, where $\Omega_t$ is Hausdorff, $\Sigma_t$ is a $\sigma$-algebra containing all compact subsets of $\Omega_t$, and $T$ is a nonempty index set. Let $\mathcal{F}$ be the class of all finite subsets of $T$, directed by inclusion, and
$$\Sigma_0 = \bigcup_{\alpha \in \mathcal{F}} \pi_\alpha^{-1}(\Sigma_\alpha), \qquad \Sigma = \sigma(\Sigma_0).$$
[Here $\pi_\alpha : \Omega \to \Omega_\alpha = \times_{t \in \alpha} \Omega_t$ is the coordinate projection.] For each $\alpha \in \mathcal{F}$, let $P_\alpha : \Sigma_\alpha \to [0,1]$ be a Radon probability [i.e., a probability that satisfies $P_\alpha(A) = \sup\{P_\alpha(K) : K \subset A, \text{ compact}\}$, $A \in \Sigma_\alpha = \bigotimes_{t \in \alpha} \Sigma_t$]. Suppose that $\{P_\alpha, \alpha \in \mathcal{F}\}$ is a compatible family, in the sense that for each $\alpha < \beta$, with $\alpha, \beta$ in $\mathcal{F}$, $P_\beta \circ \pi_{\alpha\beta}^{-1} = P_\alpha$, where $\pi_{\alpha\beta} : \Omega_\beta \to \Omega_\alpha$ is the coordinate projection. Then there exists a unique probability $P : \Sigma \to [0,1]$ such that $P_\alpha = P \circ \pi_\alpha^{-1}$, $\alpha \in \mathcal{F}$, and a family of $\Omega_t$-valued random variables $\{X_t, t \in T\}$ such that
$$P[X_{t_1} \in A_{t_1}, \ldots, X_{t_n} \in A_{t_n}] = P_\alpha(A_{t_1} \times \cdots \times A_{t_n}), \tag{19}$$
where $A_{t_i} \in \Sigma_{t_i}$, $\alpha = (t_1, \ldots, t_n) \in \mathcal{F}$. Here $P$ on $\Sigma$ need not be a Radon probability, but only has the following weak approximation property:
$$P(A) = \sup\{P(C) : C \subset A, \ C \text{ a compact based cylinder}\}, \tag{20}$$
for each cylinder set $A$ in $\Sigma$.
Proof As noted in the proof of Proposition 1, $P : \Sigma_0 \to [0,1]$ defined, for each $A \in \Sigma_0$, $A = \pi_\alpha^{-1}(A_\alpha)$ for some $\alpha \in \mathcal{F}$, by the equation $P(A) = P_\alpha(A_\alpha)$, is unambiguous, and is finitely additive there. Also, $P(\Omega) = 1$. [See Eqs. (5) and (6).] We need to establish that $P$ is $\sigma$-additive on $\Sigma_0$, so that by the classical (Hahn) extension theorem it will have a unique $\sigma$-additive extension to $\Sigma = \sigma(\Sigma_0)$.

First let us verify (20). If $A \in \Sigma_0$, so that $A = \pi_\alpha^{-1}(A_\alpha)$, $\alpha \in \mathcal{F}$ and $A_\alpha \in \Sigma_\alpha$, then
$$P(A) = P_\alpha(A_\alpha) = \sup\{P_\alpha(K) : K \subset A_\alpha, \text{ compact}\} \quad (\text{since } P_\alpha \text{ is Radon})$$
$$\le \sup\{P(C) : C \subset A, \ C \text{ a compact based cylinder}\}. \tag{21}$$


But for each $C \subset A$, $P(C) \le P(A)$, so that the opposite inequality also holds in (21), and hence (20) is true, as stated. We now establish $\sigma$-additivity.

Let $\{A_n, n \ge 1\} \subset \Sigma_0$ and $A_n \downarrow \emptyset$. It is sufficient to show, by the additivity of $P$ on $\Sigma_0$, that $P(A_n) \downarrow 0$, which will imply $\sigma$-additivity. Let $\epsilon > 0$. Since $A_n \supset A_{n+1}$ and $A_n \in \pi_{\alpha_n}^{-1}(\Sigma_{\alpha_n})$ for some $\alpha_n \in \mathcal{F}$, we may clearly assume by the directedness of $\mathcal{F}$ that $\alpha_n < \alpha_{n+1}$, $n \ge 1$. Then by (20), there exists a compact based cylinder $C_n \subset A_n$ such that
$$P(A_n) - P(C_n) < \frac{\epsilon}{2^n}. \tag{22}$$
But the $C_n$ need not be monotone. So let $B_n = \bigcap_{k=1}^n C_k \subset A_n$. Then $B_n \in \Sigma_0$. We assert that the $P(B_n)$ also approximate the $P(A_n)$. To see this, consider for $n = 2$:
$$P(C_1 \cup C_2) + P(C_1 \cap C_2) = P(C_1) + P(C_2) \quad (\text{by additivity of } P \text{ on } \Sigma_0).$$
But $C_1 \cup C_2 \subset A_1 \cup A_2 = A_1$, since $A_n \downarrow$, and so $P(A_1) - P(C_1 \cup C_2) \ge 0$. Hence writing $B_2 = C_1 \cap C_2$, we get
$$P(B_2) \ge P(C_1) + P(C_2) - P(A_1) \ge P(A_2) - \left(\frac{\epsilon}{2} + \frac{\epsilon}{2^2}\right).$$
By a similar computation with $B_2$ and $C_3$ one gets, for $B_3 = B_2 \cap C_3$,
$$P(B_3) \ge P(A_3) - \left(\frac{\epsilon}{2} + \frac{\epsilon}{2^2} + \frac{\epsilon}{2^3}\right),$$
and by induction
$$P(B_n) \ge P(A_n) - \sum_{k=1}^{n} \frac{\epsilon}{2^k} > P(A_n) - \epsilon. \tag{23}$$
But $\bigcap_{n=1}^{\infty} B_n = \bigcap_{k=1}^{\infty} C_k \subset \bigcap_{n=1}^{\infty} A_n = \emptyset$. Since the $C_k$ are compactly based, by Proposition 9 there exists an $n_0 \ge 1$ such that $\bigcap_{k=1}^{n_0} C_k = \emptyset$. Thus $B_{n_0} = \emptyset$. Hence (23) implies, for all $n \ge n_0$,
$$P(A_n) < P(B_n) + \epsilon = \epsilon,$$
and since $\epsilon > 0$ is arbitrary, $\lim_n P(A_n) = 0$. Thus $P$ is $\sigma$-additive.

Finally, let $X_t : \omega \mapsto \omega_t$. Since $\pi_t : \Omega \to \Omega_t$ is continuous, as a coordinate projection mapping (and $X_t = \pi_t$), it follows that $X_t^{-1}(\Sigma_t) \subset \Sigma$, and $X_t$ is a random variable on $\Omega$ with values in $\Omega_t$. If $\alpha = (t_1, \ldots, t_n)$, $A_\alpha = A_{t_1} \times \cdots \times A_{t_n}$, then
$$P[X_{t_1} \in A_{t_1}, \ldots, X_{t_n} \in A_{t_n}] = P(\pi_\alpha^{-1}(A_\alpha)) = P_\alpha(A_\alpha),$$
which is (19). Thus the proof is complete.

If each $\Omega_t = \mathbb{R}$, $\Sigma_t = \mathcal{B}$, the Borel $\sigma$-algebra of $\mathbb{R}$, and $P_\alpha$ is given by a distribution function $F_{t_1, \ldots, t_n}$, so that
$$P_\alpha\big((-\infty, x_1] \times \cdots \times (-\infty, x_n]\big) = F_{t_1, \ldots, t_n}(x_1, \ldots, x_n),$$
then the above result reduces to the classical theorem of Kolmogorov. We state it for convenient reference.

Theorem 11 (Kolmogorov) Let $T \subset \mathbb{R}$, and let $t_1 < \cdots < t_n$ be $n$ points from $T$. For each such $n$-tuple, let there be given an $n$-dimensional distribution function $F_{t_1, \ldots, t_n}$ such that the family $\{F_{t_1, \ldots, t_n}, t_i \in T, n \ge 1\}$ is compatible, in the sense that Eqs. (2) and (3) hold. Let $\Omega = \mathbb{R}^T = \times_{t \in T} \Omega_t$, $\Omega_t = \mathbb{R}$, and $\Sigma$ be the smallest $\sigma$-algebra containing all the cylinder sets of the form $\{\omega \in \Omega : \omega_t < a\}$ for each $t \in T$ and $a \in \mathbb{R}$. Then there exists a unique probability $P : \Sigma \to [0,1]$ and a stochastic process (or a random family) $\{X_t, t \in T\}$ on $(\Omega, \Sigma, P)$ such that
$$P[X_{t_1} < x_1, \ldots, X_{t_n} < x_n] = F_{t_1, \ldots, t_n}(x_1, \ldots, x_n), \quad t_i \in T, \ x_i \in \mathbb{R}, \ n \ge 1. \tag{24}$$
The process is actually defined by the coordinate functions $X_t : \omega \mapsto \omega_t$ for each $\omega \in \Omega$, $t \in T$.
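The compatibility condition can be exhibited very concretely. For a Gaussian family, for instance, $F_{t_1, \ldots, t_n}$ is determined by a mean vector and a covariance matrix, and deleting a time point corresponds to deleting the matching row and column. The following Python sketch is our illustration (the covariance kernel is made up; it assumes SciPy's multivariate normal cdf) and checks this marginalization consistency numerically.

import numpy as np
from scipy.stats import multivariate_normal

cov = lambda s, t: np.exp(-abs(s - t))   # a made-up covariance kernel

def fdd(times):
    """Finite-dimensional Gaussian distribution F_{t_1,...,t_n}."""
    times = np.asarray(times, dtype=float)
    return multivariate_normal(mean=np.zeros(len(times)),
                               cov=cov(times[:, None], times[None, :]))

# Compatibility: marginalizing out t = 2.0 from F_{0.5, 1.0, 2.0} must give
# F_{0.5, 1.0}.  For Gaussian fdds this is just dropping a row/column, so
# the cdf values agree (up to numerical integration tolerance).
x = np.array([0.3, -0.7])
big = fdd([0.5, 1.0, 2.0]).cdf(np.append(x, 50.0))  # +50 ~ integrate out last
small = fdd([0.5, 1.0]).cdf(x)
print(big, small)   # agree to within the cdf's numerical tolerance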

Discussion Since by our work in Section 2, regular conditional distributions exist under the hypothesis of this theorem, the result is equivalent to Theorem 7 when each $\Omega_i = \mathbb{R}$ and $\Sigma_i = \mathcal{B}$ there. In the present case, it is clearly better to stay with the (regular, that is, Radon or Lebesgue-Stieltjes) probability functions without going to the conditional distributions. In the context of Markov processes, it is appropriate to consider results such as Theorems 5-7 without invoking topological hypotheses. The general conditions for the existence of regular conditional distributions are known, thus far, only when each $\Omega_i$ is either a complete separable metric (also called Polish) space, or each $(\Omega_i, \Sigma_i)$ is a Borel space (cf. Problem 14). However, Theorem 10 is less restrictive than either of these conditions. Note also that the random families constructed in Theorems 6, 8, and 10 need not take values in a fixed set. They can vary from point to point in $T$. This freedom is useful in applications.

The original Bochner version extending Theorem 10 is in terms of abstract "projective systems." It and related generalizations, which need a more elaborate framework, have been given in the first author's (1981, 1995) monographs, and they will be omitted here. The present theorems suffice for our current needs. Now we turn to another important dependence class for analysis.

3.5 Martingale Sequences


In considering the first general dependence class, namely, Markovian families, we did not impose any integrability conditions on the random variables. There the concept involved first conditional probability measures, but for essentially all the analysis their regularity is also demanded. However, if the families are restricted to integrable random variables, then one can employ the concept of conditional expectations and no regularity hypotheses enter. With this point in mind we can introduce a large new dependence class, called martingales, and some of its relatives. The motivation for this concept comes from some gambling systems. Suppose in a certain game of chance, a gambler's successive fortunes are $X_1, X_2, \ldots$ at times $1, 2, \ldots$. Then it is reasonable (and fair) to hope that the "expected fortune" on the $(n+1)$th game, having known the fortunes of the first $n$ games, is the present (or the $n$th) one. In terms of conditional expectations, this says that $E(X_{n+1} \mid X_1, \ldots, X_n) = X_n$, $n \ge 1$, a.e. The asymptotic behavior of the $X_n$-sequence is of interest in probabilistic analysis. Thus we state the concept precisely, and study the limit properties of the process.

Definition 1 (a) Let $(\Omega, \Sigma, P)$ be a probability space and $\mathcal{B}_n \subset \mathcal{B}_{n+1} \subset \Sigma$ be an increasing sequence of $\sigma$-algebras. If $X_n : \Omega \to \mathbb{R}$ is an integrable random variable on $(\Omega, \Sigma, P)$, and is $\mathcal{B}_n$-measurable for each $n$ (also called $\mathcal{B}_n$-adapted), then the adapted sequence $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a martingale whenever the following set of equations holds:
$$E^{\mathcal{B}_n}(X_{n+1}) = X_n \text{ a.e.}, \quad n \ge 1, \tag{1}$$
and $\{\mathcal{B}_n, n \ge 1\}$ is termed a (stochastic) base of the martingale. In case $\mathcal{B}_n = \sigma(X_1, \ldots, X_n)$, it is the natural base, and then $\{X_n, n \ge 1\}$ itself is sometimes referred to as a martingale, omitting any mention of the (natural) base. Also, (1) is expressed suggestively as
$$E(X_{n+1} \mid X_1, \ldots, X_n) = X_n \text{ a.e.}, \quad n \ge 1. \tag{2}$$

(b) An adapted integrable sequence $\{X_n, \mathcal{B}_n, n \ge 1\}$ on $(\Omega, \Sigma, P)$ is called a submartingale (supermartingale) if
$$E^{\mathcal{B}_n}(X_{n+1}) \ge X_n \ (\le X_n) \text{ a.e.}, \quad n \ge 1. \tag{3}$$

In the gambling interpretation, a submartingale is a favorable game and a supermartingale is an unfavorable game, to the player. A martingale is therefore a fair game. Thus a sequence $\{X_n, \mathcal{B}_n, n \ge 1\}$ which is both a sub- and supermartingale is a martingale. Note that $E(X_n)$ is a constant for martingales, and is nondecreasing for submartingales (nonincreasing for supermartingales) by (1) and (3). For instance, $E^{\mathcal{B}_n}(X_{n+1}) = X_n$ a.e. implies $E(X_{n+1}) = E(E^{\mathcal{B}_n}(X_{n+1})) = E(X_n)$, $n \ge 1$.

An immediate consequence of the above definition is the following:

Proposition 2 Let $\{X_n, \mathcal{B}_n, n \ge 1\} \subset L^1(P)$. Then it is a martingale iff $X_n$ can be expressed as $X_n = \sum_{k=1}^{n} Y_k$, where $E^{\mathcal{B}_k}(Y_{k+1}) = 0$ a.e., $k \ge 1$. Moreover, for a martingale sequence $\{X_n, \mathcal{B}_n, n \ge 1\}$, if each $X_n$ is in $L^2(P)$, then its increments $\{Y_{n+1} = X_{n+1} - X_n, n \ge 1, Y_1 = X_1\}$ form an orthogonal sequence. [The $Y_n$, $n \ge 1$, are also termed a martingale difference sequence.]
Proof For a martingale $\{X_n, \mathcal{B}_n, n \ge 1\}$, if we set $Y_n = X_n - X_{n-1}$, $n > 1$, and $Y_1 = X_1$, then
$$E^{\mathcal{B}_n}(Y_{n+1}) = E^{\mathcal{B}_n}(X_{n+1} - X_n) = X_n - X_n = 0 \text{ a.e.}$$
Conversely, if the condition holds, then for each $n \ge 1$, since $X_{n+1} = \sum_{k=1}^{n+1} Y_k = Y_{n+1} + \sum_{k=1}^{n} Y_k = Y_{n+1} + X_n$, then
$$E^{\mathcal{B}_n}(X_{n+1}) = E^{\mathcal{B}_n}(Y_{n+1}) + X_n = X_n \text{ a.e.}$$
Hence $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a martingale.

If the martingale is square integrable, then for the increments sequence $\{Y_n, n \ge 1\}$ we have, with $m < n$,
$$E(Y_m Y_n) = E(E^{\mathcal{B}_{n-1}}(Y_m Y_n)) = E(Y_m E^{\mathcal{B}_{n-1}}(Y_n)) = 0 \quad (\text{by Proposition 1.2}).$$
Hence $\{Y_n, n \ge 1\}$ $(\subset L^2(P))$ is orthogonal, as asserted.

A simple example of a martingale is the sequence $\{S_n, n \ge 1\}$ of partial sums $S_n = \sum_{k=1}^{n} X_k$ of independent integrable random variables $X_k$ with zero means.
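As a numerical sanity check (ours, not the text's), the sketch below simulates such partial sums $S_n$ and verifies two consequences just proved: $E(S_n)$ stays constant (zero), and the increments are pairwise uncorrelated.

import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 200_000, 6

# Independent zero-mean increments Y_k (here uniform on [-1, 1]).
Y = rng.uniform(-1.0, 1.0, size=(n_paths, n_steps))
S = Y.cumsum(axis=1)                 # the martingale S_n = Y_1 + ... + Y_n

print(S.mean(axis=0))                # all ~ 0: E(S_n) is constant
print(np.mean(Y[:, 1] * Y[:, 4]))    # ~ 0: orthogonal increments
# Empirical E(S_4 | S_3 in a small bin) ~ value of S_3 there (fair game):
bin_mask = (S[:, 2] > 0.4) & (S[:, 2] < 0.6)
print(S[bin_mask, 3].mean(), S[bin_mask, 2].mean())  # close to each other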
Large classes of martingales can be recognized by means of the next result:

Proposition 3 Let $X$ be any integrable random variable on $(\Omega, \Sigma, P)$ and $\mathcal{B}_n \subset \mathcal{B}_{n+1} \subset \Sigma$ be a sequence of $\sigma$-algebras. If $X_n = E^{\mathcal{B}_n}(X)$, then $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a martingale and, moreover, it is a uniformly integrable set in $L^1(P)$.
Proof For the martingale property,
$$E^{\mathcal{B}_n}(X_{n+1}) = E^{\mathcal{B}_n}(E^{\mathcal{B}_{n+1}}(X)) = E^{\mathcal{B}_n}(X) = X_n \text{ a.e.}$$
(by the commutative property of $E^{\mathcal{B}_n}$, since $\mathcal{B}_n \subset \mathcal{B}_{n+1}$). Hence it is a martingale.

For the second statement, note that
$$E(|X_n|) = E(|E^{\mathcal{B}_n}(X)|) \le E(E^{\mathcal{B}_n}(|X|)) = E(|X|) < \infty,$$
and for each $a > 0$, with the Markov inequality,
$$P[|X_n| > a] \le \frac{E(|X_n|)}{a} \le \frac{E(|X|)}{a},$$
and the right side tends to zero uniformly in $n$ as $a \to \infty$. Then
$$\int_{[|X_n|>a]} |X_n| \, dP \le \int_{[|X_n|>a]} E^{\mathcal{B}_n}(|X|) \, dP \quad (\text{by the conditional Jensen inequality})$$
$$= \int_{[|X_n|>a]} |X| \, dP \quad (\text{because } [|X_n| > a] \text{ is } \mathcal{B}_n\text{-adapted})$$
$$\to 0 \quad (\text{uniformly in } n),$$
as $a \to \infty$ through a sequence, by the dominated convergence theorem. Hence by Theorem 1.4.5, $\{X_n, n \ge 1\}$ is uniformly integrable, as desired.
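The martingales of Proposition 3 can be visualized directly. In the following sketch (our illustration, under the stated assumptions) $X$ is a function on $[0,1]$ with Lebesgue measure, $\mathcal{B}_n$ is generated by the dyadic intervals of length $2^{-n}$, and $E^{\mathcal{B}_n}(X)$ is simply the average of $X$ over each dyadic cell; refining the partition recovers $X$, foreshadowing the convergence theorems below.

import numpy as np

def dyadic_conditional_expectation(x, n):
    """E^{B_n}(X) for B_n = sigma(dyadic intervals of length 2^-n).

    `x` holds samples of X on a uniform grid over [0, 1]; the conditional
    expectation replaces X by its average on each dyadic cell.
    """
    cells = np.array_split(x, 2 ** n)
    return np.concatenate([np.full(len(c), c.mean()) for c in cells])

grid = np.linspace(0.0, 1.0, 4096, endpoint=False)
X = np.sin(2 * np.pi * grid) + grid ** 2          # an integrable "X"

for n in [1, 3, 6, 9]:
    Xn = dyadic_conditional_expectation(X, n)
    print(n, np.abs(Xn - X).mean())               # L1 error shrinks with n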
To gain some facility with operations on martingales and submartingales, let us establish some simple results on transformations of these sequences.

Lemma 4 Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a (sub-)martingale and $\phi : \mathbb{R} \to \mathbb{R}$ be an (increasing) convex function. If $E(\phi(X_{n_0})) < \infty$ for some $n_0 > 1$, then $\{\phi(X_k), \mathcal{B}_k, 1 \le k \le n_0\}$ is a submartingale.

Proof The assertions follow from the conditional Jensen inequality. Indeed, by hypothesis
$$E^{\mathcal{B}_n}(X_{n+1}) \ge X_n \text{ a.e.}$$
Hence in both cases of martingales or submartingales,
$$\phi(X_n) \le \phi(E^{\mathcal{B}_n}(X_{n+1})) \quad (\text{with equality in the martingale case})$$
$$\le E^{\mathcal{B}_n}(\phi(X_{n+1})) \quad (\text{by Theorem 1.9iii}), \tag{6}$$
provided $E(\phi(X_n)) < \infty$ for all $n \le n_0$. Since $\phi$ is convex, there is a support line such that $ax + b \le \phi(x)$ for some real $a, b$ and all $x$. Hence $E(\phi(X_n)) \ge aE(X_n) + b > -\infty$ for each $n \ge 1$. Thus if $E(\phi(X_{n_0})) < \infty$, then $|E(\phi(X_n))| < \infty$ for $1 \le n \le n_0$, and (6) implies $\{\phi(X_n), \mathcal{B}_n, 1 \le n \le n_0\}$ is a submartingale, as asserted.

Taking $\phi(x) = x^+$, we get $\{X_n^+, \mathcal{B}_n, n \ge 1\}$ to be a positive submartingale for any (sub-)martingale $\{X_n, \mathcal{B}_n, n \ge 1\}$. In the martingale case, $\{X_n^+, \mathcal{B}_n, n \ge 1\}$, and hence $\{|X_n|, \mathcal{B}_n, n \ge 1\}$, are submartingales. Taking $\phi(\cdot)$ as a nondecreasing concave function, we get for any (super-)martingale $\{X_n, \mathcal{B}_n, n \ge 1\}$ that $\{\phi(X_n), \mathcal{B}_n, n \ge 1\}$ is a supermartingale if $E(\phi(X_n))$ exists. In general, if $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a submartingale, $\{-X_n, \mathcal{B}_n, n \ge 1\}$ is a supermartingale. Thus it suffices to consider sub- (or super-)martingales, and the other (super) case can then be deduced from the first.
Another property, useful in some computations, is contained in

Lemma 5 Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be an $L^1(P)$-bounded martingale, so that $\sup_{n \ge 1} E(|X_n|) < \infty$. Then $X_n = X_n^{(1)} - X_n^{(2)}$, $n \ge 1$, where $\{X_n^{(i)}, \mathcal{B}_n, n \ge 1\}$, $i = 1, 2$, are positive martingales. [This is a Jordan-type martingale decomposition.]

Proof Let $\mu_n : A \mapsto \int_A X_n \, dP$, $A \in \mathcal{B}_n$. Then $\mu_n$ is a signed measure and $|\mu_n|(A) = \int_A |X_n| \, dP$, the variation measure of $\mu_n$ on $\mathcal{B}_n$. Thus $|\mu_n|(\cdot)$ is $\sigma$-additive. Since $\{|X_n|, \mathcal{B}_n, n \ge 1\}$ is a submartingale, as a consequence of the preceding lemma, $|\mu_n|(A) \le |\mu_{n+1}|(A)$, $A \in \mathcal{B}_n$, by (3). Also, $\sup_n |\mu_n|(\Omega) < \infty$. Hence if $\nu_n(A) = \lim_{k \to \infty} |\mu_k|(A)$, $A \in \mathcal{B}_n$, which exists by the monotonicity of $|\mu_n|$, then $\nu_n : \mathcal{B}_n \to \mathbb{R}^+$ is an additive bounded set function, since $|\mu_n|(\cdot)$ has those properties. Also, the $\sigma$-additivity of $|\mu_n|(\cdot)$ in addition implies the same property of $\nu_n$, since evidently $\mathcal{B}_n \subset \mathcal{B}_{n+1}$ and $\nu_n(A) = \nu_{n+1}(A)$ for all $A \in \mathcal{B}_n$. This is a standard application of results in measure theory [cf. Halmos (1950, p. 170)]. We include a short proof for completeness.

Thus for the $\sigma$-additivity, if $\epsilon > 0$ and $n_0 \ge 1$ are given, there exists an $n_\epsilon$ such that $n_0 \le n_\epsilon \Rightarrow \nu_{n_0}(\Omega) < |\mu_{n_\epsilon}|(\Omega) + \epsilon/2$, by definition of $\nu_{n_0}$, since $\Omega \in \mathcal{B}_{n_0}$. On the other hand, $|\mu_{n_\epsilon}|(\cdot)$ is $\sigma$-additive, so that $\sum_{j=1}^{\infty} |\mu_{n_\epsilon}|(A_j) = |\mu_{n_\epsilon}|(\Omega)$ for any measurable partition $\{A_j, j \ge 1\}$ of $\Omega$ in $\mathcal{B}_{n_0} \subset \mathcal{B}_{n_\epsilon}$. The convergence of this series implies the existence of a $j_0(\epsilon)$, and since $|\mu_n|(A) \le |\mu_{n+1}|(A)$ and $\nu_n(A) = \lim_m |\mu_m|(A)$ implies $|\mu_n|(A) \le \nu_n(A)$, such that the following inequalities hold:
$$\sum_{j > j_0} |\mu_{n_\epsilon}|(A_j) = |\mu_{n_\epsilon}|(\Omega) - \sum_{j=1}^{j_0} |\mu_{n_\epsilon}|(A_j) < \frac{\epsilon}{2}. \tag{7}$$
But by choice of $n_\epsilon$, we have with (7),
$$\nu_{n_0}(\Omega) < |\mu_{n_\epsilon}|(\Omega) + \frac{\epsilon}{2} < \sum_{j=1}^{j_0} |\mu_{n_\epsilon}|(A_j) + \epsilon \le \sum_{j=1}^{\infty} \nu_{n_0}(A_j) + \epsilon. \tag{8}$$
Since $\epsilon, n_0$ are arbitrary, (8) plus additivity of $\nu_{n_0}$ imply $\nu_{n_0}(\Omega) = \sum_{j=1}^{\infty} \nu_{n_0}(A_j)$, and hence $\nu_n$ is $\sigma$-additive on $\mathcal{B}_n$ for each $n$. But each $\mu_n$ is $P$-continuous, and thus it follows that $\nu_n$ is also $P$-continuous. By the Radon-Nikodym theorem (cf. 1.3.12ii), we may define $X_n^{(1)} = d\nu_n/dP_{\mathcal{B}_n}$ on $\mathcal{B}_n$. Then the fact that $\nu_n(A) = \nu_{n+1}(A)$ for all $A \in \mathcal{B}_n$ implies
$$\int_A X_n^{(1)} \, dP = \int_A X_{n+1}^{(1)} \, dP, \quad A \in \mathcal{B}_n, \quad \text{i.e., } E^{\mathcal{B}_n}(X_{n+1}^{(1)}) = X_n^{(1)} \text{ a.e.}$$
This means $\{X_n^{(1)}, \mathcal{B}_n, n \ge 1\}$ is a positive martingale. But we also have $\nu_n(A) \ge |\mu_n|(A) \ge \mu_n(A)$, so that
$$\int_A X_n^{(1)} \, dP \ge \int_A X_n \, dP, \quad A \in \mathcal{B}_n.$$
The integrands are $\mathcal{B}_n$-measurable. Hence $X_n^{(2)} = X_n^{(1)} - X_n \ge 0$ a.e., and
$$E^{\mathcal{B}_n}(X_{n+1}^{(2)}) = E^{\mathcal{B}_n}(X_{n+1}^{(1)}) - E^{\mathcal{B}_n}(X_{n+1}) = X_n^{(1)} - X_n = X_n^{(2)} \text{ a.e.}$$
Thus $\{X_n^{(2)}, \mathcal{B}_n, n \ge 1\}$ is also a positive martingale, and this finishes the proof.

The next result extends Kolmogorov's inequality in two ways. The extensions are due to Doob, and to Hájek and Rényi. These are frequently used in martingale analysis. [Here (11), (12), and (14) are due to Doob.]

Theorem 6 Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a submartingale on $(\Omega, \Sigma, P)$. Then

(i) $\lambda \in \mathbb{R}$ implies
$$\lambda P\left[\max_{1 \le k \le n} X_k \ge \lambda\right] \le \int_{[\max_{1 \le k \le n} X_k \ge \lambda]} X_n \, dP \le E(X_n^+) \tag{11}$$
and
$$\lambda P\left[\min_{1 \le k \le n} X_k \le \lambda\right] \ge \int_{[\min_{1 \le k \le n} X_k \le \lambda]} X_n \, dP - E(X_n - X_1) \ge E(X_1) - E(X_n^+). \tag{12}$$

(ii) If, moreover, $X_n \ge 0$ a.e. for each $n$, we have for $\lambda > 0$ and $a_n \downarrow 0$,
$$\lambda P\left[\sup_{k \ge 1} a_k X_k \ge \lambda\right] \le \sum_{n=1}^{\infty} (a_n - a_{n+1}) E(X_n), \tag{13}$$
and for $1 \le p < \infty$, $q = p/(p-1)$,
$$E\left(\max_{1 \le k \le n} X_k^p\right) \le \begin{cases} q^p E(X_n^p) & \text{if } p > 1, \\[4pt] \dfrac{e}{e-1}\left[1 + E(X_n \log^+ X_n)\right] & \text{if } p = 1, \end{cases} \tag{14}$$
where $\log^+ a = \log a$ if $a > 1$ and $\log^+ a = 0$ if $a \le 1$, $e$ being the base of 'log'.


Proof (i) As in the proof of Theorem 2.2.5, we decompose $M = [\max_{k \le n} X_k \ge \lambda]$ into suitable disjoint events and estimate their probabilities. Thus, let $M_1 = [X_1 \ge \lambda]$, and for $1 < k \le n$, set $M_k = [X_k \ge \lambda, X_i < \lambda, 1 \le i \le k-1]$. Then $M_k \in \mathcal{B}_k$, disjoint, and $M = \bigcup_{k=1}^{n} M_k$. Hence
$$\lambda P(M) = \sum_{k=1}^{n} \lambda P(M_k) \le \sum_{k=1}^{n} \int_{M_k} X_k \, dP \le \sum_{k=1}^{n} \int_{M_k} E^{\mathcal{B}_k}(X_n) \, dP \quad (\text{since } M_k \in \mathcal{B}_k)$$
$$= \sum_{k=1}^{n} \int_{M_k} X_n \, dP = \int_M X_n \, dP \le E(X_n^+).$$
This gives (11). For (12), we consider $N = [\min_{k \le n} X_k \le \lambda]$, and set $N_1 = [X_1 \le \lambda]$. If $1 < k \le n$, let $N_k = [X_k \le \lambda, X_i > \lambda, 1 \le i \le k-1]$. Thus $N_k \in \mathcal{B}_k$, disjoint, and $N = \bigcup_{k=1}^{n} N_k$. Since $X_1 \le \lambda$ on $N_1$ and $N_1^c \in \mathcal{B}_1$,
$$E(X_1) = \int_{N_1} X_1 \, dP + \int_{N_1^c} X_1 \, dP \le \lambda P(N_1) + \int_{N_1^c} X_2 \, dP$$
(since $N_1^c \in \mathcal{B}_1$ and $\{X_1, X_2\}$ is a submartingale for $\{\mathcal{B}_1, \mathcal{B}_2\}$)
$$= \lambda P(N_1) + \int_{N_2} X_2 \, dP + \int_{N_1^c \cap N_2^c} X_2 \, dP \quad (\text{since } N_2 \subset N_1^c)$$
$$\le \lambda P(N_1) + \lambda P(N_2) + \int_{(N_1 \cup N_2)^c} X_3 \, dP$$
[since $(N_1 \cup N_2)^c \in \mathcal{B}_2$ and $\{X_2, X_3\}$ is a submartingale for $\{\mathcal{B}_2, \mathcal{B}_3\}$]
$$\le \cdots \le \lambda \sum_{k=1}^{n} P(N_k) + \int_{N^c} X_n \, dP = \lambda P(N) + E(X_n) - \int_N X_n \, dP.$$
This gives (12), and (i) follows.


(ii) The argument is again similar to the above. Since $X_n \ge 0$, $a_n > 0$, let $M = [\sup_{k \ge 1} a_k X_k \ge \lambda]$. Set $M_1 = [a_1 X_1 \ge \lambda]$ and for $k > 1$, $M_k = [a_k X_k \ge \lambda, a_i X_i < \lambda, 1 \le i \le k-1]$. As before, $M_k \in \mathcal{B}_k$, $M = \bigcup_{k \ge 1} M_k$, a disjoint union. If the right side of (13) is infinite, then the inequality is true. Thus let it be finite. Set $S = \sum_{n=1}^{\infty} (a_n - a_{n+1}) X_n$. Consequently, $E(S) < \infty$, and, since $X_n \ge 0$, $a_n - a_{n+1} \ge 0$, and $a_k = \sum_{n \ge k} (a_n - a_{n+1})$ (because $a_n \downarrow 0$), we have
$$\lambda P(M) = \sum_{k \ge 1} \lambda P(M_k) \le \sum_{k \ge 1} \int_{M_k} a_k X_k \, dP = \sum_{k \ge 1} \sum_{n \ge k} (a_n - a_{n+1}) \int_{M_k} X_k \, dP$$
$$\le \sum_{k \ge 1} \sum_{n \ge k} (a_n - a_{n+1}) \int_{M_k} X_n \, dP \quad (\text{since } M_k \in \mathcal{B}_k \text{ and the sequence is a positive submartingale})$$
$$= \sum_{n \ge 1} (a_n - a_{n+1}) \sum_{k=1}^{n} \int_{M_k} X_n \, dP \le \sum_{n \ge 1} (a_n - a_{n+1}) E(X_n) = E(S).$$
Hence (13) obtains.


To establish (14), let $Y = \max_{1 \le k \le n} X_k$. Then $Y \in L^p(P)$. If $p > 1$, by Theorem 1.4.1iii we have $Y \ge 0$ because $X_k \ge 0$, and
$$E(Y^p) = \int_0^\infty p y^{p-1} P[Y \ge y] \, dy \le \int_0^\infty p y^{p-2} \left(\int_{[Y \ge y]} X_n \, dP\right) dy \quad [\text{by (11)}]$$
$$= \int_\Omega X_n \left(\int_0^Y p y^{p-2} \, dy\right) dP = q \int_\Omega X_n Y^{p-1} \, dP$$
$$\le q \, \|X_n\|_p \, \|Y^{p-1}\|_q \quad (\text{by Hölder's inequality})$$
$$= q \, \|X_n\|_p \, (E(Y^p))^{1/q}. \tag{15}$$
If $\|Y\|_p = 0$, then the inequality is true; if $\|Y\|_p^{p/q} > 0$, dividing both sides by this number, (15) reduces to (14) in this case.

If $p = 1$, we let $Z = (Y - 1)\chi_{[Y > 1]}$ and calculate (since $(Y - 1)\chi_{[Y \le 1]} \le 0$),
$$E(Y) - 1 \le E(Z) = \int_0^\infty P[Z \ge y] \, dy \quad (\text{by Theorem 1.4.1iii again})$$
$$= \int_0^\infty P[Y \ge 1 + y] \, dy \le \int_0^\infty \frac{1}{1+y}\left(\int_{[Y \ge 1+y]} X_n \, dP\right) dy \quad [\text{by (11)}]$$
$$= \int_\Omega X_n \left(\int_0^{(Y-1)^+} \frac{dy}{1+y}\right) dP = E(X_n \log^+ Y). \tag{16}$$
But $a \log b \le a \log^+ a + a \log(b/a) \le a \log^+ a + (b/e)$ for any $a > 0$, $b > 0$, since $a \log(b/a)$ has a maximum at $a = b/e$ for each fixed $b > 0$. Thus (16) becomes
$$E(Y) - 1 \le E(X_n \log^+ X_n) + \frac{1}{e} E(Y),$$
which is (14) if $p = 1$, and this completes the proof of the theorem.
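A quick Monte Carlo experiment (ours, not from the text) makes inequality (11) concrete for the positive submartingale $X_k = S_k^2$ built from a simple $\pm 1$ random walk; both sides are estimated from simulated paths.

import numpy as np

rng = np.random.default_rng(7)
n_paths, n = 100_000, 50
lam = 25.0

steps = rng.choice([-1.0, 1.0], size=(n_paths, n))
S = steps.cumsum(axis=1)
X = S ** 2                      # a positive submartingale, E(X_n) = n

M = X.max(axis=1)               # max_{k <= n} X_k
lhs = lam * np.mean(M >= lam)   # lambda * P[max X_k >= lambda]
rhs = np.mean(np.where(M >= lam, X[:, -1], 0.0))  # int over [max >= lam] of X_n
print(lhs, "<=", rhs, "<=", X[:, -1].mean())      # (11) holds numerically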

Remarks (1) Letting $n \to \infty$ in (11) and (12), we get the following inequalities, which are useful in some computations. For a submartingale $\{X_n, \mathcal{B}_n, n \ge 1\}$ and $\lambda \in \mathbb{R}$, we have [cf. the last parts of the proofs of (11) and (12)]
$$\lambda P\left[\sup_{n \ge 1} X_n \ge \lambda\right] \le \liminf_n \int_{[\sup_k X_k \ge \lambda]} X_n^+ \, dP \le \lim_n \int_\Omega X_n^+ \, dP \tag{17}$$
and
$$\lambda P\left[\inf_{n \ge 1} X_n \le \lambda\right] \ge \limsup_n \left(\int_{[\inf_k X_k \le \lambda]} X_n \, dP - E(X_n)\right) + E(X_1). \tag{18}$$

(2) The inequalities (11) and (13) reduce to Kolmogorov's inequality, as they should, under the following identification. Let $Y_1, \ldots, Y_n$ be independent random variables with $E(Y_i) = \mu_i$ and $\operatorname{Var} Y_i = \sigma_i^2$. If $S_n = \sum_{k=1}^{n} (Y_k - \mu_k)$ and $X_n = S_n^2$, then $\{X_k, k \ge 1\}$ is a positive submartingale, so that (11) gives for $\lambda \ne 0$,
$$P\left[\max_{1 \le k \le n} |S_k| \ge |\lambda|\right] \le \frac{1}{\lambda^2} E(S_n^2) = \frac{1}{\lambda^2} \sum_{k=1}^{n} \sigma_k^2.$$
Similarly in (13), let $a_1 = \cdots = a_n = 1$ and $a_k = 0$, $k > n$, $X_n = S_n^2$. Then (13) becomes the above inequality. On the other hand, if for any $n_0 \ge 1$ we let $a_k = (n_0 + k - 1)^{-2}$, $Z_k = S_{n_0+k-1}^2$, $\mathcal{B}_k = \sigma(S_1, \ldots, S_k)$, then considering the positive submartingale $\{Z_k, \mathcal{B}_{n_0+k-1}, k \ge 1\}$, we get
$$P\left[\sup_{k \ge n_0} \frac{|S_k|}{k} \ge \lambda\right] \le \frac{1}{\lambda^2}\left[\frac{1}{n_0^2} \sum_{k=1}^{n_0} \sigma_k^2 + \sum_{k > n_0} \frac{\sigma_k^2}{k^2}\right]. \tag{19}$$

(3) If $\sum_n (\sigma_n^2/n^2) < \infty$, then letting $n_0 \to \infty$ in (19) and noting that the bracketed expression tends to zero as $n_0 \to \infty$ (the first term by Kronecker's lemma, the second as the tail of a convergent series), one gets $P[\lim_{n \to \infty}(S_n/n) = 0] = 1$. This is Theorem 2.3.6, and is one form of the SLLN. Thus the general study leads to other applications.

The fundamental questions in the analysis of martingale sequences concern their pointwise a.e. convergence behavior. Two forms of the martingale convergence theorem are available, due independently to Doob, and to Andersen and Jessen. They are now presented, each with two proofs. Our treatment also shows their equivalence, and gives a better insight into the structure of these processes.

Theorem 7 (Andersen-Jessen) Let $(\Omega, \Sigma, P)$ be a probability space and $\nu : \Sigma \to \mathbb{R}$ be $\sigma$-additive, so that it is also bounded. If $\nu$ is $P$-continuous, and $\mathcal{B}_n \subset \mathcal{B}_{n+1} \subset \Sigma$ are $\sigma$-algebras, let $\nu_n = \nu|\mathcal{B}_n$, $P_n = P|\mathcal{B}_n$ be the restrictions. Let $X_n = d\nu_n/dP_n$ be the Radon-Nikodym derivatives for each $n \ge 1$. Then $X_n \to X_\infty$ a.e. and in $L^1(P)$. Moreover, $X_\infty = d\nu_\infty/dP_\infty$ a.e., where $\mathcal{B}_\infty = \sigma(\bigcup_{n \ge 1} \mathcal{B}_n)$, $\nu_\infty = \nu|\mathcal{B}_\infty$ and $P_\infty = P|\mathcal{B}_\infty$.

First Proof To begin with, we establish that $X_n \to X'$ a.e., and then that $X' = X_\infty$ a.e. Thus let $X_* = \liminf_n X_n$ and $X^* = \limsup_n X_n$. Then $X_* \le X^*$ and both are $\mathcal{B}_\infty$-measurable. For the first assertion it suffices to show that if $B = [X_* < X^*]$, then $P(B) = 0$. Equivalently, if $B_{r_1 r_2} = [X_* < r_1 < r_2 < X^*]$, so that $B = \bigcup\{B_{r_1 r_2} : r_1, r_2 \text{ rationals}\}$, then $P(B_{r_1 r_2}) = 0$, since the union is countable. [We used the fact that $\nu$ is $P$-continuous $\Rightarrow$ $\nu_n$ is $P_n$-continuous and so $X_n = d\nu_n/dP_n$.]

Let $a, b$ be in $\mathbb{R}$ and consider $H_a = [X_* < a]$, $K^b = [X^* > b]$. Then $H_a \in \mathcal{B}_\infty$, $K^b \in \mathcal{B}_\infty$. We assert that

(i) $\nu(H_a \cap A) \le a P(H_a \cap A)$, $A \in \mathcal{B}_\infty$,

(ii) $\nu(K^b \cap A) \ge b P(K^b \cap A)$, $A \in \mathcal{B}_\infty$. (20)

Indeed, let $a_n \searrow a$, and define $H_n = [\inf_{k \ge 1} X_{n+k} < a_n]$ $(\in \mathcal{B}_\infty)$. Using the by now standard decomposition of $H_n$, as in the proof of Theorem 6, we express $H_n$ as a disjoint union. Thus let $H_{n1} = [X_{n+1} < a_n]$, and for $k > 1$, let $H_{nk}$ be the event that $X_{n+k}$ is not above $a_n$ at $n+k$ for the first time, so that
$$H_{nk} = [X_{n+k} < a_n, X_{n+i} \ge a_n, 1 \le i \le k-1].$$
Then $H_n = \bigcup_{k \ge 1} H_{nk}$, a disjoint union, and $H_n \supset H_{n+1}$, $\bigcap_n H_n = H_a$, since $a_n \searrow a$. But $A \in \bigcup_n \mathcal{B}_n \Rightarrow A \in \mathcal{B}_{n_0}$ for some $n_0$, and thus for $n \ge n_0$ we have $H_{nk} \cap A \in \mathcal{B}_{n+k}$, $k \ge 1$. Hence
$$\nu(H_n \cap A) = \sum_{k \ge 1} \nu(H_{nk} \cap A) \quad (\text{since } \nu \text{ is } \sigma\text{-additive})$$
$$= \sum_{k \ge 1} \int_{H_{nk} \cap A} X_{n+k} \, dP \le a_n \sum_{k \ge 1} P(H_{nk} \cap A) = a_n P(H_n \cap A). \tag{21}$$
Since $|\nu|(\Omega) < \infty$, on letting $n \to \infty$, (21) reduces to (i) of (20) if $A \in \bigcup_n \mathcal{B}_n$. For the general case, let $\rho(A) = aP(H_a \cap A) - \nu(H_a \cap A)$. Then $\rho$ is a real $\sigma$-additive function on the algebra $\bigcup_{n \ge 1} \mathcal{B}_n$ (since $\mathcal{B}_n \subset \mathcal{B}_{n+1}$), and hence by the classical Hahn extension theorem it has a unique $\sigma$-additive extension onto the $\sigma$-algebra generated by this algebra, namely $\mathcal{B}_\infty$. Hence (i) is true as stated. (ii) is similarly established, or it follows from (i) by replacing $a, \nu, X_*$ with $-b, -\nu, -X^*$.

Now in (20) let $a = r_1$, $b = r_2$, where $r_1 < r_2$. Then $B_{r_1 r_2} = H_{r_1} \cap K^{r_2}$, and (i) and (ii) of (20) yield
$$r_2 P(B_{r_1 r_2}) \le \nu(B_{r_1 r_2}) \le r_1 P(B_{r_1 r_2}).$$

But this is possible only if $P(B_{r_1 r_2}) = 0$. Hence $P(B) = 0$ and $X_* = X^*$ a.e. Let $X'$ be the common value, so that $X_n \to X'$ a.e.

To see that $X' = X_\infty$ a.e., note that $\{X_n, n \ge 1\}$ is uniformly integrable. Indeed, since $|\nu|(\Omega) < \infty$, $|\nu_n|(\Omega) \le |\nu|(\Omega)$, so that $\sup_n \int_\Omega |X_n| \, dP < \infty$. Also,
$$\lim_{P(A) \to 0} \int_A X_n \, dP = \lim_{P(A) \to 0} \nu_n(A) = \lim_{P(A) \to 0} \nu(A) = 0 \tag{22}$$
uniformly in $n$. Hence by Definition 1.4.3, the set is uniformly integrable, and by the Vitali convergence theorem (Theorem 1.4.4),
$$\nu_m(A) = \lim_{n \to \infty} \nu_n(A) = \lim_{n \to \infty} \int_A X_n \, dP = \int_A X' \, dP, \quad A \in \mathcal{B}_m.$$
Since $m \ge 1$ is arbitrary, this shows that $\nu_\infty(A) = \int_A X' \, dP$, $A \in \bigcup_n \mathcal{B}_n$, and then, as in the preceding argument, the $\sigma$-additive function $\nu_\infty(\cdot) - \int_{(\cdot)} X' \, dP$, which vanishes on this algebra, also vanishes on $\mathcal{B}_\infty$. Thus
$$\int_A X' \, dP = \int_A X_\infty \, dP, \quad A \in \mathcal{B}_\infty.$$
Since $X_\infty, X'$ are $\mathcal{B}_\infty$-adapted, $X' = X_\infty$ a.e. Moreover, by the same Vitali theorem, $E(|X_n - X_\infty|) \to 0$ as $n \to \infty$. This proves the theorem completely.

Second Proof By hypothesis, for $m < n$, $A \in \mathcal{B}_m \Rightarrow \nu_m(A) = \nu_n(A) = \nu_\infty(A)$. Hence
$$\int_A X_m \, dP = \int_A X_n \, dP = \int_A X_\infty \, dP, \quad A \in \mathcal{B}_m. \tag{23}$$
Since the extreme integrands are $\mathcal{B}_m$-measurable, $X_m = E^{\mathcal{B}_m}(X_n)$ a.e., and similarly $X_m = E^{\mathcal{B}_m}(X_\infty)$ a.e. Thus $\{X_n, \mathcal{B}_n, 1 \le n \le \infty\}$ is a martingale sequence. Since $\mathcal{B}_\infty = \sigma(\bigcup_{n \ge 1} \mathcal{B}_n)$, it follows that $\bigcup_{n \ge 1} L^1(\Omega, \mathcal{B}_n, P) \subset L^1(\Omega, \mathcal{B}_\infty, P)$, and clearly the former space is dense in the latter. Also, $X_n \in L^1(\Omega, \mathcal{B}_n, P)$, $X_\infty \in L^1(\Omega, \mathcal{B}_\infty, P)$. By density, given $\epsilon > 0$, there exists $Y_\epsilon \in L^1(\Omega, \mathcal{B}_{n_0}, P)$ for some $n_0$ such that $E(|X_\infty - Y_\epsilon|) < \epsilon/2$. Since $E^{\mathcal{B}_n}(Y_\epsilon) = Y_\epsilon$ a.e. for all $n \ge n_0$, we have for $n \ge m \ge n_0$
$$|X_n - X_m| \le |E^{\mathcal{B}_n}(X_\infty) - Y_\epsilon| + |E^{\mathcal{B}_m}(X_\infty) - Y_\epsilon| \text{ a.e.}$$
$$\le E^{\mathcal{B}_n}(|X_\infty - Y_\epsilon|) + E^{\mathcal{B}_m}(|X_\infty - Y_\epsilon|) \text{ a.e.} \quad (\text{by conditional Jensen's inequality})$$
$$\le 2 \sup_n E^{\mathcal{B}_n}(|X_\infty - Y_\epsilon|) \text{ a.e.} \tag{24}$$
But $\{E^{\mathcal{B}_n}(|X_\infty - Y_\epsilon|), n \ge 1\}$ is a (positive) martingale. Hence by (17), for any $\lambda > 0$ we get from (24)
$$P\left[\sup_{n \ge m \ge n_0} |X_n - X_m| > \lambda\right] \le \lim_n P\left[\sup_{1 \le k \le n} E^{\mathcal{B}_k}(|X_\infty - Y_\epsilon|) > \frac{\lambda}{2}\right] \le \frac{2}{\lambda} E(|X_\infty - Y_\epsilon|) < \frac{\epsilon}{\lambda}.$$
Letting $\epsilon \searrow 0$ (i.e., $n_0 \to \infty$) and then $\lambda \searrow 0$ through rationals, we get $|X_n - X_m| \to 0$ a.e. as $n, m \to \infty$. Hence $X_n \to X'$ a.e., and by Fatou's lemma $E(|X'|) \le \liminf_n E(|X_n|) \le |\nu|(\Omega) < \infty$. Next we apply the same argument as for (22), and deduce that the set $\{X_n, n \ge 1\}$ is uniformly integrable, and hence $X' = X_\infty$ a.e. as well as $E(|X_n - X_\infty|) \to 0$. This finishes the proof.

The preceding argument shows, for any integrable r.v. $Z$ such that $X_n = E^{\mathcal{B}_n}(Z)$, $n \ge 1$ (cf. also Proposition 3), that $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a uniformly integrable martingale. Conversely, given any uniformly integrable martingale $\{Y_n, \mathcal{B}_n, n \ge 1\}$, define
$$\nu_n(A) = \int_A Y_n \, dP, \quad A \in \mathcal{B}_n.$$
Then (23) shows that $\nu_n = \nu_{n+1}|\mathcal{B}_n$. Hence we may define $\nu : \bigcup_{n \ge 1} \mathcal{B}_n \to \mathbb{R}$ by setting $\nu(A) = \nu_n(A)$ if $A \in \mathcal{B}_n$, and this gives $\nu$ unambiguously on the algebra $\bigcup_{n \ge 1} \mathcal{B}_n$, and it is additive there. The uniform integrability now additionally implies that $\nu$ is $\sigma$-additive on this algebra, and hence has a unique $\sigma$-additive extension to $\mathcal{B}_\infty$. Thus for each such martingale $\{X_n, \mathcal{B}_n, n \ge 1\}$, we can associate a compatible system $\{\Omega, \mathcal{B}_n, \nu_n, P, n \ge 1\}$ which determines a signed measure space $(\Omega, \mathcal{B}_\infty, \nu)$. Here the $\pi_n : \Omega \to \Omega_n = \Omega$ are identity mappings. This exhibits an inherent relation between martingale theory and the existence theory of Kolmogorov and Bochner (cf. Theorem 4.10). This seemingly simple connection actually runs much deeper between these two theories. An aspect of this is exemplified in the second proof below. However, if the sequence $\{X_n, \mathcal{B}_n, n \ge 1\}$ is merely a martingale (but not uniformly integrable), then $\nu : \bigcup_{n \ge 1} \mathcal{B}_n \to \mathbb{R}$ is still uniquely defined, but is only a finitely additive function. Finally, note that in Theorem 7 the sequence $\{X_n, \mathcal{B}_n, 1 \le n \le \infty\}$ is a martingale, so that $X_n = E^{\mathcal{B}_n}(X_\infty)$, $n \ge 1$. If there is such an r.v. $X \in L^1(P)$ with $X_n = E^{\mathcal{B}_n}(X)$, then the martingale $\{X_n, \mathcal{B}_n, E^{\mathcal{B}_\infty}(X), n \ge 1\}$ is said to be closed on the right.
We shall now present the general martingale convergence theorem, again with two proofs. The first one is direct, in the sense that it is based only on Theorem 6i after a preliminary simplification. The second one, based on an application of Theorem 4.10, is a reduction of the proof to that of the preceding theorem of Andersen and Jessen. It therefore shows that both these results are equivalent, although this connection lies somewhat deeper. There are several other proofs of both these theorems (and also of their equivalence assertion), but we shall present a relatively simple argument. However, all these different proofs have independent interest, since they lead to various extensions of the subject.

Theorem 8 (Doob) Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a martingale on $(\Omega, \Sigma, P)$ with $\sup_n E(|X_n|) < \infty$. Then $X_n \to X_\infty$ a.e. and $E(|X_\infty|) \le \liminf_n E(|X_n|)$.
First Proof Here the convergence assertion follows if we express each $X_n = X_n^{(1)} - X_n^{(2)}$ with $\{X_n^{(i)}, \mathcal{B}_n, n \ge 1\}$, $i = 1, 2$, as positive martingales (since $\sup_n E(|X_n|) < \infty$), by Lemma 5, and prove that $X_n^{(i)} \to X_\infty^{(i)}$ a.e. Thus the result obtains if each positive martingale $\{Y_n, \mathcal{B}_n, n \ge 1\}$ is shown to converge a.e. Since $\{e^{-Y_n}, \mathcal{B}_n, n \ge 1\}$ is a positive uniformly bounded submartingale, and $Y_n \to Y_\infty$ a.e. iff $e^{-Y_n} \to e^{-Y_\infty}$ a.e., it is clearly enough to establish that each bounded positive submartingale converges a.e. Since $L^\infty(P) \subset L^2(P)$, this will follow if we demonstrate that each positive square integrable submartingale $\{Z_n, \mathcal{B}_n, n \ge 1\}$ satisfying $E(Z_n^2) \le K < \infty$ converges a.e. Now by Lemma 4, $\{Z_n^2, \mathcal{B}_n, n \ge 1\}$ is a submartingale, and if $a_n = E(Z_n^2)$, then $a_n \uparrow a \le K < \infty$ as $n \to \infty$, because
$$a_{n+1} = E(Z_{n+1}^2) = E(E^{\mathcal{B}_n}(Z_{n+1}^2)) \ge E(Z_n^2) = a_n \ge 0, \text{ by the submartingale property;}$$
so the expectations of a submartingale form an increasing sequence. Thus for $n > m$,
$$a_n - a_m = E(Z_n^2) - E(Z_m^2) = E\big((Z_n - Z_m)^2\big) + 2E\big(Z_m(Z_n - Z_m)\big), \tag{25}$$
and both terms on the right are nonnegative since by the submartingale hypothesis $E^{\mathcal{B}_m}(Z_m(Z_n - Z_m)) = Z_m(E^{\mathcal{B}_m}(Z_n) - Z_m) \ge 0$ a.e. Now let $n \to \infty$ and then $m \to \infty$; the left side of (25) tends to zero, and hence so does the right side. Thus $E(Z_n - Z_m)^2 \to 0$, implying that $Z_n \to Z_\infty$ in $L^2(P)$. Using this, we can deduce the pointwise convergence.

Let $m \ge 1$ be fixed and consider $\{Z_n - Z_m, \mathcal{B}_n, n \ge m\}$. This is clearly a submartingale. Hence by Theorem 6i, given $\lambda > 0$, we have
$$\lambda P\left[\max_{m \le k \le n} (Z_k - Z_m) \ge \lambda\right] \le E(|Z_n - Z_m|)$$
and
$$\lambda P\left[\min_{m \le k \le n} (Z_k - Z_m) \le -\lambda\right] \le 2E(|Z_n - Z_m|).$$
Let $n \to \infty$; this implies, on addition,
$$P\left[\sup_{k \ge m} |Z_k - Z_m| \ge \lambda\right] \le \frac{3}{\lambda} \sup_{n \ge m} E(|Z_n - Z_m|). \tag{26}$$
If $m \to \infty$ in (26), since the $L^2$-convergence of the $Z_n$ implies their $L^1$-convergence, one gets
$$\lim_{m \to \infty} P\left[\sup_{k \ge m} |Z_k - Z_m| \ge \lambda\right] = 0. \tag{27}$$
Hence $|Z_k - Z_m| \to 0$ a.e., and so $\{Z_k(\omega), k \ge 1\}$ is a scalar Cauchy sequence for a.a. $\omega$. It follows that $Z_n \to Z_\infty$ a.e. [and in $L^1(P)$]. Thus, recapitulating the argument, what we have shown implies that $X_n \to X_\infty$ a.e. for the original martingale.

The preceding result also implies $|X_n| \to |X_\infty|$ a.e., and then by the Fatou inequality one gets $E(|X_\infty|) \le \liminf_n E(|X_n|)$. This proves the result completely.

Second Proof By Lemma 5, it is again sufficient to consider only positive martingales $\{X_n, \mathcal{B}_n, n \ge 1\}$ and show their a.e. convergence. Thus hereafter we can and do assume that $X_n \ge 0$ a.e.

The key idea now is to identify the given martingale with another martingale on a nice topological space and show that the latter converges a.e. by means of Theorem 7. We then transport the result to the original space. To implement this, let $S_n = \mathbb{R}^+$, $\mathcal{R}_n = \mathcal{B}$, the Borel $\sigma$-algebra of $\mathbb{R}^+$, and consider the product measurable space $(S, \mathcal{Z}) = \times_{n \in \mathbb{N}} (S_n, \mathcal{R}_n)$, as in the last section, so that $S$ is the cartesian product space with its product topology and $\mathcal{Z}$ is the $\sigma$-algebra of cylinder sets of $S$. As usual, let $\pi_{\alpha_n} : S \to S_{\alpha_n} = \times_{i \in \alpha_n} S_i$, where $\alpha_n = (1, 2, \ldots, n)$. We define a probability measure on $\mathcal{Z}$ as follows.

Let $f : \omega \mapsto (X_1(\omega), X_2(\omega), \ldots) \in S$ be the mapping from $\Omega \to S$. If $A = \times_{i=1}^{\infty} A_i = \bigcap_{n=1}^{\infty} (A_1 \times \cdots \times A_n \times S^{(n)}) \in \mathcal{Z}$ (cf. Lemma 4.4), then
$$f^{-1}(A) = \bigcap_{n=1}^{\infty} \bigcap_{i=1}^{n} X_i^{-1}(A_i) \in \Sigma.$$
It follows that $f$ is measurable, and so we can define $\mu : \mathcal{Z} \to [0,1]$ as the image probability of $P$ under $f$, i.e., $\mu(A) = P(f^{-1}(A))$, $A \in \mathcal{Z}$. Then $(S, \mathcal{Z}, \mu)$ is a probability space, and if $\mathcal{F}_n = \pi_{\alpha_n}^{-1}(\mathcal{Z}_{\alpha_n})$, where $\mathcal{Z}_{\alpha_n} = \bigotimes_{i=1}^{n} \mathcal{R}_i$, we have $\mathcal{F}_n \subset \mathcal{F}_{n+1} \subset \mathcal{Z}$. Also $\pi_n(f(\omega)) = X_n(\omega)$, $\omega \in \Omega$, $n \ge 1$, and, of course, $\pi_n : S \to \mathbb{R}^+$ is a positive random variable. By the image law result (cf. Theorem 1.4.1), it follows that
$$\int_S \pi_n \, d\mu = \int_\Omega X_n \, dP = E(X_n) < \infty. \tag{28}$$

Hence $\{\pi_n, \mathcal{F}_n, n \ge 1\}$ is a positive adapted integrable sequence on $(S, \mathcal{Z}, \mu)$, and moreover, the integrals are constants by the right side of (28) (since $\{X_n\}$ is a martingale sequence). Actually $\{\pi_n, \mathcal{F}_n, n \ge 1\}$ is also a martingale on $(S, \mathcal{Z}, \mu)$. To verify this, let $A \in \mathcal{F}_n$. Then $f^{-1}(A) \in \mathcal{B}_n$, and
$$\int_A \pi_{n+1} \, d\mu = \int_{f^{-1}(A)} X_{n+1} \, dP = \int_{f^{-1}(A)} X_n \, dP \quad (\text{by the martingale property of the } X_n)$$
$$= \int_A \pi_n \, d\mu \quad (\text{by the image law 1.4.1, and } A \in \mathcal{F}_n \subset \mathcal{F}_{n+1} \subset \mathcal{Z}). \tag{29}$$
Since $A \in \mathcal{F}_n$ is arbitrary, $\{\pi_n, \mathcal{F}_n, n \ge 1\}$ is a positive martingale.

Finally, let $\nu_n : \mathcal{F}_n \to \mathbb{R}^+$ be defined by $\nu_n(A) = \int_A \pi_n(s) \, d\mu(s)$. Then $\nu_n$ is $\sigma$-additive on $\mathcal{F}_n$ for each $n$. Also, this gives a unique additive set function $\nu$ on all the cylinder sets of $S$ since $\nu_n = \nu_{n+1}|\mathcal{F}_n$, by (29). On the other hand, let $\tilde G_{\alpha_n} = \nu_n \circ \pi_{\alpha_n}^{-1}$. Then $\tilde G_{\alpha_n}$ is a finite measure on $\mathcal{Z}_{\alpha_n}$, and so is a Lebesgue-Stieltjes or regular measure (i.e., by the standard real analysis theory, each open set has finite $\tilde G_{\alpha_n}$-measure, and it can be approximated from inside by compact sets, even intervals). If the constant value $E(X_n) = a$ is taken as $a = 1$, by dividing if necessary, then $\tilde G_{\alpha_n}$ is even an $n$-dimensional distribution function on $S_{\alpha_n}$. Hence by Theorem 4.10, the compatible family $\{\tilde G_{\alpha_n}, n \ge 1\}$ uniquely determines a $\sigma$-additive function $\tilde\nu : \mathcal{Z} \to [0, a]$ such that $\tilde G_{\alpha_n} = \tilde\nu \circ \pi_{\alpha_n}^{-1} = \nu \circ \pi_{\alpha_n}^{-1}$, $n \ge 1$. It follows that $\nu$ is $\sigma$-additive and uniquely (by extension) defined on $\mathcal{Z}$, $\nu_n = \nu|\mathcal{F}_n$, $\pi_n = d\nu_n/d\mu_{\mathcal{F}_n}$. Hence by Theorem 7, $\pi_n \to \pi_\infty$ a.e. (and $\pi_\infty = d\nu/d\mu$ also). Thus there is a set $A \in \mathcal{Z}$ with $\mu(A) = 0$, and $\pi_n(s) \to \pi_\infty(s)$ for all $s \in S - A$. Let $N = f^{-1}(A)$, so that $P(N) = \mu(A) = 0$, and, if $\omega \notin N$, then $f(\omega) \notin A$, and we have
$$X_n(\omega) = \pi_n(f(\omega)) \to \pi_\infty(f(\omega)) =: X_\infty(\omega).$$
The last statement now follows by Fatou's lemma, as before. This completes the second demonstration of the theorem.

Discussion Theorem 8 includes the result of Theorem 7. In fact, if $X_n = d\nu_n/dP_n$, then $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a uniformly integrable martingale with $\sup_n E(|X_n|) < \infty$. Hence $X_n \to X_\infty$ a.e., by Theorem 8, and the uniform integrability implies $E(|X_n - X_\infty|) \to 0$. It is then easily inferred that $d\nu_\infty/dP_\infty = X_\infty$ a.e. However, in Theorem 8 one is given only that $\{X_n, \mathcal{B}_n, n \ge 1\}$ is an $L^1(P)$-bounded martingale. Thus if $\nu_n : A \mapsto \int_A X_n \, dP$, $A \in \mathcal{B}_n$, then $\nu_n = \nu_{n+1}|\mathcal{B}_n$ and $\nu : \bigcup_{n \ge 1} \mathcal{B}_n \to \mathbb{R}$ is well defined by $\nu|\mathcal{B}_n = \nu_n$, but, in general, it is only finitely additive. Examples can be given to show that $\nu$ may even be "purely finitely additive," so that Theorem 7 cannot be applied directly. But the second proof of Theorem 8 shows that with a product space representation [this idea is borrowed from Ionescu Tulcea (1969)] the preceding noted hurdle is not real, and the result can be reduced to that of Theorem 7. Here the crucial pointwise convergence, which interestingly enough used the classical Kolmogorov result (really Theorem 4.11 sufficed), followed from Theorem 7. In this way Theorem 8 is also a consequence of Theorem 7. The work needed in this latter reduction is evidently nontrivial. For this reason, it was believed for some time that Theorem 7 is strictly weaker than Theorem 8. These various demonstrations also show that the martingale convergence theorems are essentially related to the differentiation of set functions. In fact, if Theorem 8 (its first proof) is granted, then we can actually deduce the Radon-Nikodym theorem from it. This will become clear shortly (see Problem 28). In this sense, the extra sets of proofs given for the above two theorems are inherently useful and illuminating.

It is now time to present analogous convergence statements for submartingales. Even though these results can be proved independently, it is fortunate that the submartingale convergence can be deduced from the martingale convergence itself. For this deduction, one needs the following simple decomposition, which turns out to have, especially in the continuous index case, a profound impact on the subject. In the discrete case it is called the Doob decomposition, after its inventor, and we shall present it. But its continuous parameter analog (called the Doob-Meyer decomposition) is much more involved, and is not considered here.

Theorem 9 Let $\{X_n, \mathcal{B}_n, n \ge 0\} \subset L^1(\Omega, \Sigma, P)$, $\mathcal{B}_n \subset \mathcal{B}_{n+1}$, be any adapted process (so that $X_n$ is $\mathcal{B}_n$-measurable). Then there exist a martingale $\{Y_n, \mathcal{B}_n, n \ge 1\}$ and an integrable adapted process $\{A_n, \mathcal{B}_{n-1}, n \ge 1\} \subset L^1(\Omega, \Sigma, P)$, $A_0 = 0$, such that
$$X_n = Y_n + A_n, \quad n \ge 0, \text{ a.e.,} \tag{31}$$
uniquely. Further, $A_n$ is increasing iff the given process is a submartingale.

Proof The decomposition is obtained constructively. Set $A_0 = 0$, and define recursively for $n \ge 1$
$$A_n = E^{\mathcal{B}_{n-1}}(X_n) - X_{n-1} + A_{n-1} = E^{\mathcal{B}_{n-1}}(X_n - X_{n-1}) + A_{n-1}, \tag{32}$$
and let $Y_n = X_n - A_n$. Then $\{A_n, \mathcal{B}_{n-1}, n \ge 1\}$ is adapted, integrable, $Y_n \in L^1(P)$, and
$$E^{\mathcal{B}_n}(Y_{n+1}) = E^{\mathcal{B}_n}[X_{n+1} - E^{\mathcal{B}_n}(X_{n+1}) + X_n - A_n] \quad [\text{by (32)}]$$
$$= E^{\mathcal{B}_n}(X_{n+1}) - E^{\mathcal{B}_n}(X_{n+1}) + X_n - A_n = X_n - A_n = Y_n \text{ a.e.},$$
since $X_n$ is $\mathcal{B}_n$-adapted and $A_n$ is $\mathcal{B}_{n-1} \subset \mathcal{B}_n$-adapted. Thus $\{Y_n, \mathcal{B}_n, n \ge 1\}$ is a martingale and the decomposition (31) holds.

For uniqueness, let $X_n = Y_n + A_n = Y_n' + A_n'$ be two such decompositions. Then $Y_n - Y_n' = A_n' - A_n = B_n$, say, defines a process such that $B_0 = 0$, and the left side gives a martingale, while $\{A_n' - A_n, \mathcal{B}_{n-1}, n \ge 1\}$ is adapted. Since $B_0 = 0$, $Y_0 = Y_0'$ a.e., and for $n \ge 1$ one has
$$B_{n+1} = E^{\mathcal{B}_n}(B_{n+1}) = E^{\mathcal{B}_n}(Y_{n+1} - Y_{n+1}') = Y_n - Y_n' = B_n \text{ a.e.}$$
(because $B_{n+1}$ is $\mathcal{B}_n$-adapted, and by the martingale property). Thus $B_{n+1} = B_n$, and so $0 = B_0 = B_1 = \cdots = B_n$ a.e. This shows $Y_n = Y_n'$ a.e., and then $A_n = A_n'$ a.e., $n \ge 1$. Hence the decomposition in (31) is unique.

If $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a submartingale, then $E^{\mathcal{B}_n}(X_{n+1}) \ge X_n$ a.e., so that $A_1 \ge 0$, and by definition (32), $A_n \ge A_{n-1}$ a.e., $n \ge 1$. Conversely, if $A_n \ge A_{n-1} \ge 0$, then (31) implies
$$E^{\mathcal{B}_n}(X_{n+1}) = E^{\mathcal{B}_n}(Y_{n+1}) + A_{n+1} = Y_n + A_{n+1} \ge Y_n + A_n = X_n \text{ a.e.}$$
Hence $\{X_n, \mathcal{B}_n, n \ge 1\}$ is a submartingale, as asserted.
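The recursion (32) is directly computable whenever the one-step conditional expectations are known. In the sketch below (our illustration, not the text's), $X_n = S_n^2$ for a symmetric $\pm 1$ random walk, where $E^{\mathcal{B}_{n-1}}(X_n - X_{n-1}) = E((S_{n-1} \pm 1)^2 - S_{n-1}^2 \mid \mathcal{B}_{n-1}) = 1$, so the compensator is $A_n = n$ and $Y_n = S_n^2 - n$ is the familiar martingale; the code checks (31) path by path.

import numpy as np

rng = np.random.default_rng(3)
n = 10
steps = rng.choice([-1, 1], size=n)
S = steps.cumsum()
X = S ** 2                         # submartingale X_n = S_n^2 (with X_0 = 0)

# Recursion (32): A_n = A_{n-1} + E^{B_{n-1}}(X_n - X_{n-1}).
# For this walk the conditional increment is identically 1.
A = np.zeros(n + 1)
for k in range(1, n + 1):
    A[k] = A[k - 1] + 1.0          # E^{B_{k-1}}(X_k - X_{k-1}) = 1
Y = np.append(0.0, X) - A          # Y_n = X_n - A_n = S_n^2 - n

print(A)                           # compensator: 0, 1, 2, ..., n
print(Y + A == np.append(0.0, X))  # decomposition (31) holds exactly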


The submartingale convergence can now be established.

Theorem 10 Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a submartingale with $\sup_n E(|X_n|) < \infty$. Then $X_n \to X_\infty$ a.e., and $E(|X_\infty|) \le \liminf_n E(|X_n|)$.

Proof By the above theorem, $X_n = Y_n + A_n$, where $A_n \ge A_{n-1} \ge 0$ a.e., and $\{Y_n, \mathcal{B}_n, n \ge 1\}$ is a martingale. Hence
$$E(A_n) = E(X_n) - E(Y_n) = E(X_n) - E(Y_1) \le \sup_n E(|X_n|) + E(|Y_1|) < \infty. \tag{33}$$
Since $A_n \nearrow A_\infty$ a.e., by the monotone convergence theorem, the inequality (33) implies that $E(A_\infty) < \infty$. Hence
$$\sup_n E(|Y_n|) \le \sup_n E(|X_n|) + E(A_\infty) < \infty \tag{34}$$
by hypothesis. Thus $\sup_n E(|Y_n|) < \infty$, and Theorem 8 implies $Y_n \to Y_\infty$ a.e. Consequently $X_n = Y_n + A_n \to Y_\infty + A_\infty = X_\infty$, say, a.e. The last inequality between expectations follows again by Fatou's lemma, completing the proof.

Remark In (33) only the weaker hypothesis that $E(X_n^+) \le K_0 < \infty$ is used, but in (34) we needed $E(|X_n|) \le K_1 < \infty$. However, these two are equivalent conditions. In fact, $|X_n| = 2X_n^+ - X_n$ and $E(X_n) \ge E(X_1)$ for submartingales. Thus
$$E(|X_n|) \le 2E(X_n^+) - E(X_1) \le 2\sup_n E(X_n^+) - E(X_1), \tag{35}$$
and hence if $E(X_n^+) \le K_0 < \infty$, then $E(|X_n|) \le K_1 < \infty$ [since $X_1 \in L^1(P)$]. On the other hand, $X_n^+ \le |X_n|$, so that the opposite implication is always true.

We now present a result on characterizations of convergence.

Theorem 11 Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a submartingale on $(\Omega, \Sigma, P)$, and $\mathcal{B}_\infty = \sigma(\bigcup_{n \ge 1} \mathcal{B}_n)$. Then the following statements are equivalent:

(i) The sequence is uniformly integrable.

(ii) The sequence is Cauchy in $L^1(P)$ on $(\Omega, \Sigma, P)$.

(iii) $\limsup_n E(|X_n|) = K < \infty$, $X_n \le E^{\mathcal{B}_n}(X_\infty)$, $n \ge 1$, a.e., where $X_n \to X_\infty$ a.e. and $K = E(|X_\infty|)$; thus $\{X_n, \mathcal{B}_n, 1 \le n \le \infty\}$ is a submartingale.

(iv) There exists a symmetric convex function $\phi : \mathbb{R}^+ \to \mathbb{R}^+$, $\phi(0) = 0$, $\phi(x)/x \nearrow \infty$ as $x \nearrow \infty$, and $\sup_n E(\phi(X_n)) < \infty$.

Proof (i)⇔(ii) By uniform integrability $\sup_n E(|X_n|) < \infty$, so that by Theorem 10, $X_n \to X_\infty$ a.e. Then $X_n - X_\infty \to 0$ a.e., and $\{|X_n - X_\infty|, n \ge 1\}$ is uniformly integrable. Thus by the classical Vitali convergence (cf. Theorem 1.4.4), $E(|X_n - X_\infty|) \to 0$ as $n \to \infty$ and (ii) holds. That (ii)⇒(i) is a standard fact in Lebesgue integration, independent of martingale theory.

(ii)⇒(iii) Since the Cauchy convergence implies $\{E(|X_n|), n \ge 1\}$ is convergent (hence bounded), it follows as before (by Theorem 10) that $X_n \to X_\infty$ a.e., and also $E(|X_n|) \to E(|X_\infty|)$ as $n \to \infty$. To prove the submartingale property, consider for $A \in \mathcal{B}_m$ $(m < n)$
$$\int_A X_m \, dP \le \int_A X_n \, dP \to \int_A X_\infty \, dP \tag{36}$$
(by the Vitali convergence, since the $X_n$ are uniformly integrable). Since the extreme integrands are $\mathcal{B}_m$-measurable and $A \in \mathcal{B}_m$ is arbitrary, it follows that $X_m \le E^{\mathcal{B}_m}(X_\infty)$ a.e. Thus $\{X_n, \mathcal{B}_n, 1 \le n \le \infty\}$ is a submartingale, proving (iii).

(iii)⇒(i) Since $X_n \le E^{\mathcal{B}_n}(X_\infty) \Rightarrow X_n^+ \le E^{\mathcal{B}_n}(X_\infty^+)$ a.e., and $E(X_n^+) \le E(X_\infty^+) < \infty$, by (35), $\sup_n E(|X_n|) < \infty$. Hence by Theorem 10, $X_n \to X_\infty$ a.e., so that we also have $|X_n| \to |X_\infty|$ a.e., and the relation $E(|X_n|) \to E(|X_\infty|)$ implies by Proposition 1.4.6 (Scheffé's lemma) that $\{X_n, n \ge 1\}$ is uniformly integrable. Thus (i) holds.

(i)⇔(iv) This was already proved as part of Theorem 1.4.5, and does not depend on martingale methods or results. The proof is complete.

For martingales the result takes the following form:

Theorem 12 Let $\{X_n, \mathcal{B}_n, n \ge 1\}$ be a martingale on $(\Omega, \Sigma, P)$ and $\mathcal{B}_\infty = \sigma(\bigcup_{n \ge 1} \mathcal{B}_n)$. Then the following are equivalent statements:

(i) The sequence is uniformly integrable.

(ii) The sequence is Cauchy in $L^1(P)$.

(iii) The sequence is $L^1(P)$-bounded, so that $X_n \to X_\infty$ a.e., and $\{X_n, \mathcal{B}_n, 1 \le n \le \infty\}$ is a martingale.

(iv) The sequence satisfies $K = \sup_n E(|X_n|) < \infty$ (i.e., is $L^1(P)$-bounded), so that $X_n \to X_\infty$ a.e., and $E(|X_\infty|) = K$.

(v) There exists a symmetric convex function $\phi : \mathbb{R}^+ \to \mathbb{R}^+$, $\phi(0) = 0$, $\phi(x)/x \nearrow \infty$ as $x \nearrow \infty$, and $\sup_n E(\phi(X_n)) < \infty$.

Proof (i)⇒(ii)⇒(iii) and (i)⇒(iv) have the same proofs as in the preceding result, with equality in (36). That (iii)⇒(i) follows from the fact that $X_n = E^{\mathcal{B}_n}(X_\infty)$ a.e. by the present hypothesis, $|X_n| \le E^{\mathcal{B}_n}(|X_\infty|)$ a.e. by the conditional Jensen inequality, and that $\{E^{\mathcal{B}_n}(|X_\infty|), n \ge 1\}$ is a uniformly integrable set. [This property was noted before, and is an immediate consequence of Theorem 1.4.5. Indeed, if $Y_n = E^{\mathcal{B}_n}(|X_\infty|)$, then
$$\int_{[Y_n > a]} Y_n \, dP = \int_{[Y_n > a]} |X_\infty| \, dP \to 0$$
uniformly in $n$, because $X_\infty \in L^1(P)$ and
$$P[Y_n > a] \le \frac{E(Y_n)}{a} = \frac{E(|X_\infty|)}{a} \to 0$$
as $a \to \infty$, uniformly in $n$.] Since $\{|X_n|, \mathcal{B}_n, n \ge 1\}$ is a submartingale, and $E(|X_n|) \le E(|X_{n+1}|)$, (iv) implies that $\lim_n E(|X_n|) = K = E(|X_\infty|)$, where $X_n \to X_\infty$ a.e. by Theorem 8 (thus $|X_n| \to |X_\infty|$ a.e.). The preceding equality and Proposition 1.4.6 together imply (i), as in the last proof.

On the other hand, (i)⇔(v) is contained in Theorem 1.4.5, and does not involve martingale theory. This establishes all the equivalences.

Remark A difference between the statements of Theorems 11 aiid 12 is in


part (iii) of the first result, which became parts (iii) and (iv) in the martingale
3.5 Martingale Sequences 193

case. This distinction is basic. In fact, {X,,B,, 1 < <


n cm) can be a sub-
martingale without being uniformly integrable, whereas this complication can-
not occur for martingales, as proved above. Here is a simple counter-example
supporting the former statement.
Let (a,C, P) be the Lebesgue unit interval, aiid B, c C be the a-algebra
determined by the partition (O,l/n], ( l l n , l / ( n - I ) ] ,. . . , (1/2,1). If f, =
-nx(o,l,,], then {f,, B,,n >1) is a negative adapted integrable sequence
such that E(f,) = -1, f, + 0 a.e., and for any A E Bn we have

Hence EBr7(fn+l) = f n a.e., and if f, = 1, then = 1 > 0


~ " ~ l ( f , ) f, >
a.e. Thus {f,, B,, 1 n < < oo) is a submartingale. But Theorem lliii is not
true for this sequence aiid {f,, n >
1) is not uniformly integrable. Note that
>
{f,, B,, n 1) is a convergent martingale, while {f,, B,, 1 n oo) is n o t < <
a martingale.

The following consequence of the above result has some interest.

Corollary 13 Let {X,,n > 1) be a sequence of r.v.s o n (a,C, P) such


that ( i ) X, + X, a.e. and (ii) X,I < Y, E ( Y ) < cm. If {B,, n > 1) i s any
increasing sequence of a-subalgebras of C , and 2, = E n r 1 (X,), then 2, + Z,
a.e., and in L1(P), where 2, = E"= (X,), B, = a(U,21 B,).

Proof By (ii), Z,, Z, are integrable. Also, by Theorem 12iii, {E"" (X,),
< <
1 n cm) is a uniformly integrable martingale, so that EUrb(X,) + X,,
a.e., a n d i n L1(P). Let U, =supnr,IX,-X, < 2 Y , m > 1. Then by hy-
pothesis U, + U, = lim,, supn2, X, - X, = 0 a.e., and dominatedly.
In particular, E(U,) < 2E(Y) < oo, and Urn J 0 a.e. On the other hand, if
12> m,

The last term of (37) tends t o zero a.e. as well as in L1(P), and the first term
also goes t o zero by the coiiditioiial dominated convergence criterion. Hence
2, + 2, a.e. Thus E(EBr7 (U,)) = E(U,) + 0 by the Lebesgue dominated
convergence, so that E ( 2, - Z,I) + 0 also, as n + cm. This proves all the
statements.

It is possible to present convergence results for decreasing indexed (or


reverse) martingales and submartingales. These are slightly easier than The-
194 3 Conditioning and Some Dependence Classes

orems 7 and 8 on probability spaces. It must be noted that there are analogous
results if the probability space ( R , C, P) is replaced by a nonfinite (a-finite)
space ( 0 ,C ,p ) , and then the comparison of difficulties will be reversed. How-
ever, we do not treat this case here. (See Problems 39, 40 and 42.)

Theorem 14 Let B, > B,+l, n > 1, be a sequence of a-subalgebras


from ( R , C, P ) , and {X,, B,, n > 1) be a decreasing indexed martingale
in that X, E L 1 ( P ) and E ~ ' ~ + ~ ( X = ,X,+l
) a.e., n > 1 [equivalently, if
B-, 3 B-,-1,X-, E L 1 ( P ) , a n d X - , = E ~ - ~ ( X - , _ ~a.e.1.
) IfB, =
00
l i m n & = n,=,B,), then X, + X, a.e. and in L1(P)-norm, so that
< <
{X,, B,, 1 n cm} is a martingale.

Proof We follow the argument of the first proof of Theorem 8, for con-
venience. If uk : A ++ JA X k dP, A E Bk, then vk is a signed measure aiid
the martingale property implies u l Bk = uk, k > 1. Since a signed measure is
bounded, and by the Jordan decomposition y = u ~ vwe ~let vk , = fic-fii,
where fi: : B, + R+ is a (finite) measure such that u f B , = ,f:i i > 1. Evi-
dently, fi: is Pk(= P ~k)-continuous. By the Radon-Nikod9m theorem, there
exist xi1)= dfi;/dpk, x?) = dfi,JdPk such that XI, = xi1) -xP) (because
uk = fiz - fi;), and {x;),Bk, k > l), 2 = 1,2, are positive decreasing mar-
tingales. Hence t o prove X k + X, a.e., it suffices t o prove that each positive
decreasiiig martingale converges a.e. [Note that the proof of Jordan-type de-
composition for decreasing indexed martingales is simpler than that for the
illcreasing case (cf. Lemma 5), since in the latter there need be no a-additive
u on a(Unyl B,) such that uBk = uk. Even though the Jordan decomposition
is valid for finitely additive set functions, their restrictions fi: (of u* t o Bk)
are not suitable, aiid t o obtain a useful splitting, one needs the coinputatioiis
given in Lemma 5.1
The proof is now essentially the same as in Theorem 8. Thus briefly, if
{X,, B,, n > 1) is a positive decreasing martingale, then {e-Xr7,B,, n > 1)
is a positive bounded decreasing submartingale by Lemma 4, and X, + X,
a.e. iff ecX1l + ecXx a.e. If {Z,, B,, n > I} is a positive decreasiiig L 2 ( P ) -
bounded submartingale, then E ( Z i ) = a, J a >0 as n + oo. Next (25)
implies, on coilsidering 0 < a, - a, + 0, letting first n + oo and later
m + cm,that E(Z,-2,)' + 0, SO that Z, + 2 , in L 2 ( P ) .If Y, = 2,-Z,,
>
then {Y,, B,, n 1) is a submartingale such that Y, + 0 in L 2 ( P ) .With this
the work of (26), (27) holds, since the maximal inequalities in the decreasing
case take the form for any X > 0,

-AP 1 min yk 5 -A
m<k<n 1> E(Y,) - E(Y,+).

[Compare with (11) and (12).] Hence


3.5 Martingale Sequences

as n + cm and then m + cm. It follows from this that Y, + Y, a.e. and that
Y, = 0 a.e., [since Y, + 0 in L 2 ( P ) ] . Hence Zn + 2, a.e. and in L 2 ( P ) .
This proves that X, + X,, a.e.
The uniform integrability follows from the fact that

(as defined in the first paragraph)

= y 1 [X,I > A] (Iul being the variation measure of vl). (38)

But P [X , > A] <


E ( XII)/A + 0 as A + oo, uniformly in n , and vl is
P-continuous. Thus {X,, Bn, 1 n < <
cm} is a martingale and the L1 (PI-
coilvergelice is an immediate consequence of the Vitali convergence (cf. The-
orem 1.4.4). The proof is finished.

Just as in the increasing case, the decreasing indexed submartingale con-


vergence can also be deduced from the corresponding martingale result.

>
Theorem 15 Let {X,, B,, n 1) be a submartingale o n (a, C, P ) , where
B, > B,+1 are a-subalgebras of C. T h e n X, + X, a.e. Moreover, the se-
quence {X,,B,, 1 n< < cm} i s a submartingale iff E(X,) >
K > oo, or
equivalently the submartingale i s uniformly integrable. (The last condition is
automatic for martingales, but not for submartingales.)

Proof For the convergence statement, one uses a form of Theorem 9, then
reduces to the result of Theorem 14 as follows. Define a1 = 0, and recursively
for n > 1,

Then A, > 0, &+I-adapted, aiid A, decreases as n increases. First suppose


that E(X,) >K > o o . Since oo > E(X1) >
E(X2) > >
. . . K , it fol-
lows that limn E(X,) >
K , aiid hence by (39) and the Lebesgue dominated
convergence theorem,
3 Conditioning and Some Dependence Classes

= E(X1) - lim E(X,)


n-00
< oo.
Heiice XA = X, A, is well defined, and {XA, B,, n
- >
1) is readily seen to
be a martingale. By Theorem 14, XA + X k a.e., since E((XA)+) E(X;) < +
E(A1) < oo. But 0 <
A, I, integrable, so that A, + A, a.e., and X, =
XA +A, + X k +A, = X, a.e. Because 0 A, < <
A1 and E(A1) < oo,
both {Xk, n >
1) aiid {A,, n >
1) are uniformly integrable. Heiice so is
{X,, n >
1). We conclude that {X,, B,, 1 n < <
oo) is a submartingale
because
ED- (x,) = E"- (x; + A,) > E"- (X; + A,) (since A, > A,)
=X k + A, = X, a.e.
On the other hand, if {X,, B,, 1 < n <
oo} is a submartingale, then
E(X,) > -oo, and so limn E(X,) > E(X,)
> -oo. This establishes the
last half. For the first part, let a > o o , aiid define Y," = inax(X,, a).
>
Then {Y,", B,, n 1) is a decreasing subinartingale (by Lemma 4) such that
lim,,, E(Y,O1) >
a > -oo. Thus by the preceding, Y,OI + Yg a.e., and
hence there is a set N,, P(N,) = 0, such that Y,OI (w) + Y z (w), w E 0 - N,.
Because Y,Ol(w) = X,(w) if inf, X,(w) >
a, it follows that limn X,(w) ex-
ists for almost all w in the set {w : inf,X,(w) >
a ) . Note also that if
a, > a,+1 j, -00, theii No, c No, aiid hence, if flo = n,"==, Nk,, , theii for
each w E Go,X,(w) + X, (w). Since E(X,) <
E(X1) E(x?)< oo, and <
{XL, B,,n >
1) is a subinartingale, X L + X & >
0 a.e. and E ( X & ) < oo.
Thus we have o o X, < < oo a.e. With this the theorem is completely
proved.

The following is an immediate consequence of Corollary 13 and Theorems


14 and 15:

Corollary 16 Let {Y,, o o < n < oo) be a sequence of r.v.s on ( R ,C , P)


such that for all n , IY,I < 21, where Z E L 1 ( P ) . If B, C B,+I C C are
a-algebras, and lim,,, Y, = Y, limn,-, Y, = Y-, exist a.e., then the
following are true with B, = o(U, B,), B-, = B, : n,
(i) lim,,, E".. (Y,) = E"- (Y,) , lim,
,-,E"" (Y,) = E"-- (Y-,) ax.
(ii) lim,,, E",I (2)= En- (Z),lim, ,-,E " I ~ ( Z )= En-- (2)a,,.

These limits also hold i n L1(P)-mean.

Two Applications We present here two useful applications of the pre-


ceding results. Some others appear in the problems section. There are nu-
merous refinements of the foregoing theory and the subject is one of the most
3.5 Martingale Sequences 197

active areas in probability. A few extensions will be treated later, for instance,
in Chapter 7.
In the theory of nonparametric statistics, a class of random sequences,
called U-statistics (originally introduced by W. Hoeffding in the late 1 9 4 0 ' ~ ) ~
together with their limiting behavior plays a key role. The latter can be stud-
ied effectively with martingale theory. Let us define these and deduce their
convergence properties.
Let {X,, n > 1) be an i.i.d. sequence of random variables, or more gen-
erally an exchangeable or permutable orsymmetrically dependent sequence, in
>
the sense that for each n 1, {X,, , . . . , X,,, ) aiid {XI, . . . , X,) have the same
joint distributions for any permutation ( i l l . . . , i n ) of (1,2,. . . , n ) (cf. Defini-
tion 2.1.11). If f : Rk + R is a Borel function such that f (xil, . . . , xili) =
f (21,.. . , x k ) , SO that it is symmetric and if either f is bounded or

then the U-statistics are the random variables {u;,,, n > k ) defined by

Thus if k = 1, f (2) = 2, then u{,, = x,,the sample mean, aiid for


1
1 x2)', one gets u[, = the sample variance. Other
k = 2, f (21,2 2 ) = 2 ( ~ -
specializations yield sample central moments, etc. (For example, for the third
moment,

so that E(u{,) = E ( X 3 ) for the i.i.d. case, and complicated higher symmetric
functions are possible candidates for other parameter estimations. The matter
is not pursued further.)
This sequence has the following general property:

Proposition 17 Let {UL,,,n >


k) be a sequence of Ti-statistics of a
symmetrically dependent family {X,, n 1) on (R, > C , P) and a symmetric
Borel f : R" R such that E ( f(X,,, . . . , X , , ) ) < oo. If FI.,,,= F, =
~(uL,,, >
rn n ) , then {UL,,, Fn,n >
k) forms a decreasing martingale and
f
F- (f(X1, . . . ,Xk)) a.e., a n d i n L 1 ( P ) . I f t h e X , arei.i.d.,
-

U;,,+U~,~-E
then u L , ~= E(f ( X I , . . . ,XI.,))a e . , and hence is a constant.
Proof First note that, for symmetrically dependent r.v.s, by definition
for each subset i l < i2 < . . . < i k of (1,2,. . . , n ) , 1 < < k n < oo, the
joint distributions of ( X i l , . . . , X,,) and ( X I , . . . , X k ) are identical. Hence if
198 3 Conditioning and Some Dependence Classes

f : R' + R is a symmetric Borel function, then the r.v.s f (X,, , . . . , X i , ) and


f ( X I , . . . , X') are identically distributed for any k-subset of (1, . . . , n ) . If the
latter r.v.s are integrable, then they have the same conditional expectations.
This follows from a simple computation. Indeed, let X , Y be identically dis-
tributed and C c R be a Borel set. Thus A = X p l ( C ) , B = Y p l ( C ) are in
C , and P ( A ) = P ( B ) . If D E Fn,we have

S, E3" ( X A ) dPF,, = P(A nD) = P ( B nD) (by the equidistributedness)

Since the extreme integrands are &-measurable, EFr7 ( X A ) = EFn(XB) a.e.


Considering linear combinations, we deduce that the same equality holds for
all simple functions, and then by approximation E 3 n (Y) = Ezb(Z). Hence
in our case this implies

= O ( U { , ~m
Since Fn+1 , > n + I ) , the first r.v. in the sequence which is

subsets of the integers of (1,. . . , n ) and (lif I) of (1, . . . , n + 1) and averaging,

It follows that {uin1 3,.>n 1) is a decreasing martingale and E ( uffn) <


E ( f ( X I , . . . , X k ) ) < cc by hypothesis. By Theorem 14, this martingale is
uniformly integrable, so that

In case the r.v.s are i.i.d. also, then Foe, being the tail a-algebra, is degen-
erate, so that uf,, is constant by Theorem 2.1.12. The L1(P)-coiivergmce
implies that E(U;+) tends to E(f (XI, . . . , X')), as asserted.

Taking k = 1, f ( z ) = z, the above result implies the SLLN for sym-


metrically dependent r.v.s, since then U l n = ( l / n ) ELl XI,. We state it for
reference as follows:

>
Corollary 18 Let {X,, i 1) be symmetrically dependent integrable r.v.s
>
on (R, C , P ) . Then, if ';fn = a(Xk, k n ) and ';fm = 0, F n , we have

1 Exz+ Eix(xl) a.e. and in L 1 ( p ) , as n + cm. (43)


12
2=1
3.5 Martingale Sequences 199

Remark If the X, are i.i.d., then the above result becomes, since X1 is
independent of F,, so that E ~ = ( X = ~ )E(X1), the sufficiency part of the
SLLN given in Theorem 2.3.7. The necessity part, however, is n o t implied by
the martingale theory and one has t o refer t o the original work presented there.

The decreasing martingales always appear if one formulates "ergodic se-


quences" as martingales. The latter are discussed in Section 7.3. Those re-
sults are generalizations of the SLLN t o other dependent (including ergodic)
sequences. Here we consider another application t o likelihood ratios, which
plays a useful role in the theory of statistical inference. One uses instead the
increasing indexed martingales for these problems.
Let X I , X 2 , . . . be a sequence of random variables on a measurable space
(0,C ) . Suppose one is given two probability measures P and Q on C, and
it is desired t o determine the correct uiiderlyiiig measure, P or Q, based
on a single realization of the sequence. The problem is made precise as fol-
lows. Suppose B, = o ( X 1 , . . . ,X,) c C aiid P, = P B,,Q, = Q B,. Let
Y, = dQk/dP,, the Radon-Nikod$m density of the P,-continuous part of Q,
relative t o Pn. It is called the likelihood ratio for certain historical reasons. If
n is increased without bound, then a variational principle (in this context it
is called a "Neymaii-Pearson lemma") says that the event En,k= [Y, > k]
has P-probability "small" aiid Q-probability "large" for a suitable k . Hence
if the realization w E for large n, then we take Q t o be the correct prob-
ability and reject P and the opposite action otherwise. The discussion on the
appropriateness and goodness of this rule belongs t o the theory of statistical
inference, aiid it is not possible t o go into that topic. (For a mathematical
treatment aiid consequences, see eg., Rao (2000), Chapter 11.) Here we estab-
lish the convergence properties of the likelihood ratio sequence {Y,, n > 1)
defined above.

Theorem 19 Let X I , X 2 , . . . be a sequence of random variables o n a


measurable space (R, C) and suppose that it i s governed by one of two prob-
abilities P o r Q o n C. If B, = a ( X 1 , . . . ,X,), and Y, = dQE/dP, i s the
likelihood ratio, as defined above, then {Y,, B,, n > 1) i s a nonnegative su-
permartingale o n (0,C , P),Y, + Y, a.e. [PI, and Y, = dQ&/dP,, where
Q, = QI B,, P, = PIB, with B, = a ( X n , n > 1).

+
Proof By hypothesis, if Q, = QE Q i is the Lebesgue decomposition
of Q, on B, relative t o P, into the absolutely continuous part Qk and the
singular part Q i , so that Q i ( B o ) > 0 only on Bo with P,(Bo) = 0, we have
for any B E B,,
3 Conditioning and Some Dependence Classes

However, if Bo c B is maximal with Qk(Bo) > 0 [Bo exists], aiid similarly


Bo c B with Q ; + ~ ( B ~ >
) 0 [so that P , + ~ ( B=
~ )0, P,(Bo) = 0, and since
P,+l IB, = P,, P,+l (Bo) = 0 also], it follows that Bo > Bo. Hence Q;+, ( B ) -

Q i ( B ) = Q,+l(N,+l n B ) Q,(N, n B ) , where N, is the singular set of Q,,


-

so that N, c N,+l, aiid thus it is nonnegative. Consequently, (44) becomes

Since the extreme integrands are B,-measurable, and B in B, is arbitrary, (45)


implies that E'~.(Y,+~) < Y, a.e. aiid {Y,, B,, n > 1) is a supermartingale,
<
as asserted. Siiice JQYn dPn = E(Yn) Q ( R ) = 1, Theorem 10 (rather, its
counterpart for supermartingales) implies that Yn + Y, a.e. It remains to
identify the limit. This can be done using the method of proof of Theorem
8. For variety, we present here an alternative argument reducing the result to
Theorem 8 itself.
+
Let p = P Q : C + Kt+, which is a finite measure dominating both
+
P and Q. If g = dQ/dp, fo = dP/dp, and if Q = QC QS is the Lebesgue
decomposition relative to P, then let gl = dQc/dp,g2 = dQVdp, so that
g = g1+92 a.e.[p].Note that [gl > 0] c [fo > 01 and P([g2 > 01 n [fo > 01) = 0.
Let No = [(g2/f O ) > 01 = [(g2/f0) = + m ] , SO that P ( N o ) = 0. I f f = dQC/dP,
then by the chain rule for Radon-Nikod9m derivatives one has

a.e. [p].

Consequently for each a >0

It follows from (47) that [(g/fo) > a] = [(gl/fo) >a] a.e. [PI.This also
shows that these sets do not depend on the auxiliary dominating measure
p, and any other such p' can be used. Siiice f = gl/fo, by (46) one has
0 5 f (w) 5 (g/ fo)(w) < m for a.a.(w). Replacing C by B, and B,, we deduce
that f, = (9nIf0,n) and fa = (ga/fo,,) a.e.[P]. But {g,, &, 1 5 n 5 m)
aiid {fo,,, B,, 1 < <n oo) are martingales on (a, C, p) by Theorem 8, so
that g, + g, a.e. [p] and in L1(p); similarly fO,, + fO,a a.e. [p] and in
L1(p), with fo,, = 0 a x . only if the measure P, = 0. Siiice this is not
the case [Q(Q) = 1 = P ( G ) ] , we get f, = g,/fo,, f, = g,/fo,,
+ =
(dQ,/dp,)/(dPa/dl-l,) a.e., and the last ratio is dQ&/dP, a.e.[p] (hence
[PI),by (46). This proves the theorem completely.
The preceding result is usually given in terms of the image measures, which
are distribution functions. This can be translated as follows. Let Fn,Gn be
the n-dimensional d.f.s of P, and Q,, i.e.,
3.5 Martingale Sequences

Suppose that F, : Rn + [0, I], G, : Rn + [0, 11 are absolutely continuous


relative t o the Lebesgue measure of Rn, with densities f, and g,. Then they
are Bore1 functions and fn(X1 (.), . . . , Xn(.)) : R + R+ is a random variable.
By (46) and (47) and the ensuing discussion, it follows that

Here xi = Xi(w) is the value of Xi for the realization w. Thus the ratios
(gn/ f n ) form a supermartingale aiid their values can be calculated in any
problem. It should be noted, however, that each gn/fn : Rn + R+ is defined
and as n changes their domains vary and are not nested. Thus {g,/ f,, n 1) >
cannot be meaningfully described as a supermartingale on the spaces {Rn, n >
11, though informally it may be and is so described in the literature. The
rigorous definition (and meaning) is that given by (48).
An interesting consequence of the above theorem is the followiiig result,
which we present as a filial item of this chapter.

Theorem 20 Let ( 0 ,C, P) be a probability space and u : C + R be a


a-additive function (a signed measure). If Bn c Bn+l c C are a-algebras,
vn = vl Bn, Pn = P Bn, and Xn = dvi/dPn, where v i is the Pn-continuous
part of v, relative to P,, then X, + X, a.e. [P,], X, = dv&/dP,, where
B, = a ( U n 2 1 B n ) . Moreover, the adapted sequence {X,,B,,n 1) is a >
q u a s i - m a r t i n g a l e , in the sense that

Remark The term "quasi-martingale" was used for a more general process
in the middle 1960s by D. L. Fisk and it was also termed an F-process and
a (*)-process. It is an adapted process satisfying (49), called a star condition.
The term has not yet become standard. We use it here since in the discrete
case they all coincide. Clearly every martingale and every L1(P)-bounded
sub- or superinartingale (see below) is a quasi-martingale. This class presents
a nice generalization of these important processes. The main convergence re-
sult above is originally due t o E. S. Aiidersen and B. Jessen, who gave a direct
proof as in Theorem 8.

Proof Let u = u+ - u be the Jordan decomposition of u on C. Then


on (a, C), the finite measures v* and P satisfy the hypothesis of Theorem 7.
(It is irrelevant that v+, v are not probabilities. That they are finite mea-
sures is sufficient.) If f:i = u * Bn and Y, = d(fiL)c/dPn,2, = d ( f i ~ ) ~ / d P , ,
202 3 Conditioning and Some Dependence Classes

then X, = Y, - Z, and {Y,, B,, n >


1) and {Z,, B,, n > 1) are positive
supermartingales and hence converge a.e. Thus Xn + (Y, - 2,) = X, a.e.
and

It is now asserted that the Xn-sequence satisfies (49). In fact,

>
(since E " ~ ~ ( Z , + ~ ) Z, and similarly for Y,),

= E(Yl) - lim E(Yn)


ni00
+ E(Z1) - lim E ( Z n )
n+00

< E(Yl + Z1) - E(Y, + 2,) (by Fatou's lemma)

This establishes the result.

It is an interesting (and not too difficult a) fact that every L1(P)-bounded


adapted sequence satisfying (49) (i.e., every quasi-martingale) can be ex-
pressed as a difference of a pair of iioiiiiegative supermartingales. Thus the
preceding result admits a further extension of the martingale theory. We omit
a specialized study of the subject here.
This chapter, more than anything else, shows vividly the individual char-
acteristics of probability theory and its potential growth as well as its in-
teractioiis with many other areas of analysis and applications. Several other
interesting consequences of, and complements to, this theory are included in
the following collection of exercises.
Exercises 203

Exercises
1. Let X , Y be two square integrable random variables on ( R , C, P ) . If,
further, E(XI Y) = Y a.e. aiid E(YI X ) = X a.e., show that X = Y a.e.. The
same result is true even if X , Y are only integrable, but the proof is slightly
more involved. (For the latter use Theorem 1.4.5iii and Theorem 1.9iii.)

>
2. Let { X n , n 1 ; Y ) be a s e t of r.v.s in L ~ ( Q , C , P )and
, B = a(X,,n>
1). Consider the subspace L2(fl,B, P) of L 2 ( f l ,C, P).Each X n is in the sub-
space, but Y need not be. Show that there is a unique Yo E L 2 ( f l ,B, P)
such that E(IY Y o 2 ) = inf{E( Y XI2) : X E L 2 ( f l , B ,P)), aiid that
- -

Yo = E"(Y). This Yo is called the best (nonlinear) predictor of Y based on


>
{X,, n 1) and it is a function of the latter (by Proposition 1.4).

3. Let 4 : R+ + R be a coiitinuous symmetric convex function, aiid Y


be an r.v. on ( R , C, P) such that E(4(2Y)) exists. (Here 2 can be replaced
>
by any a > 1.) If a = E ( Y ) , and X = { X n , n 1) is a random vector, let
Z = E(YIX)[= E"(Y),B = a ( X , , n >
I ) ] . Show that

with strict inequality if 4 is strictly convex aiid Y is not B-measurable. In


particular, if 4(z) = z2,theii we have Var Y >Var 2, so that in estimating
the mean a by Y or Z , the latter is preferable to the former, which does not
use the information about the Xn and thus has a larger variance. This is of in-
terest in statistical applications, and the inequality when 4(z) = x2 has been
established in the middle 1940s by D. Blackwell aiid C. R. Rao independently.

4. Let X1,X2 be a pair of r.v.s on (0,C , P) and p, = P o x,T',~ = 1 , 2 ,


and u = P 0 ( X I , Xz)-l be the image measures on R and R 2 . If p = p1@pz is
the product measure, u is p-continuous (is this automatic?), and h ( z l , 2 2 ) =
du/dp(z1,z2) a.e. [p],show that for any bounded Bore1 fuiictioii g : R2 + R,
the mapping f : 2 2 H E (g(X1, zz)h(X1, 2 2 ) ) is p2-measurable aiid integrable
and, moreover, f (Xz) = E ( g ( X l ,X2) X 2 ) a.e. [PI.

5. Let {X,,n > 1) be a uniformly integrable sequence of r.v.s on


( R , C, P) aiid B c C be a a-algebra. If E"(x,)+ E"(x)a.e., theii show
that {X,, n > 1) is conditionally uniformly integrable relative to B in the
sense defined prior to Proposition 1.6. [Hint: If U;(C) =SC: ~ " ( 1 XnlxA)dP,
then limp(A),o u i ( C ) = 0 uniformly in n and C. Note that there is a
"derivation basis" .F --t {w), (cf. Proposition 2.9) such that ( D ~ U ~ ) ( W + )
~ " ( 1 Xn X A ) ( W ) a.a.(w). These two assertions imply that
204 3 Conditioning and Some Dependence Classes

6. Let (0,C, p) be a (a-) finite space and L"(p) be the usual real Lebesgue
<
space on it, 1 p < cm. An operator T : L"(p) + L"(p) is termed positive if
Tf > 0 a.e. for each f >
0 a.e. Establish the following statements:

(a) If T is a positive linear operator on LP(p) + LP(p), theii (i) T(f)I <
T ( f 1) a.e., (ii) f, > 0, f, < 9 E L"(P) + sup, T(f,) < T(sup, f,) a.e.,
and (iii) a positive linear operator T : L"(p) + L"(p) is always continuous
(=bounded).
(b) Let T : LP(p) + LP(p) be as above, and fnl < g E LP(p). If fn + f
a.e., theii T(f,) + T ( f ) a.e., and in LP(p). (In other words, the assertioiis of
Theorem 1.3 for conditional expectation operators extend t o a more general
class. Here T need not be faithful or a projection.)

>
7. (i) If { X n , n 1) c LP(P) on a probability space (a,C, P), then
X, is said to converge weakly to X E LP(P) if for each Y E Lq(P), where
+ <
ppl q p l = 1 , l p < oo, E(X,Y) + E ( X Y ) as n + cm. Show that for any
a-algebra B c C, E"(x,)+ E"(x)weakly if X, + X weakly. [Hint: Verify,
by Proposition 1.2, that E(YE"(x)) = E(xE"(Y)).] This property of E" is
called "self-adjointness."
(ii) Let {X,, n > 1) be as in (i) and T : LP(P) + LP(P) be a con-
tiiiuous linear operator. It is known that there is a continuous linear T * :
L4(P) + L4(P) such that E ( Y T ( X ) ) = E ( X T * ( Y ) ) ,X E L"(P), Y E L4(P),
and if 1 < p < oo, then (T*)* = T. Show that if X, + X weakly in
L"(P), 1 < p < cm, then T X n + T X in the same sense. (This reduces to
(i) if T = E".)

8. Let ( R , C) be a measurable space, and A, : C + [O, I], i = 1 , 2 , be


two probability measures such that X2 << A1. If B c C is a a-algebra,
X is a nonnegative r.v. on R and E: : L1(Xi) + L1(Xi),i = 1 , 2 , are
the corresponding conditional expectation operators, show that E?(X) =
E ~ ( X , / E ~ ( ~ ) ) ~where
. ~ . [ gX=~ dAa/dA1
], a.e.[A1].Interpret the result with
X = X A if the coiiditioiial probabilities P?(.), i = 1 , 2 , are regular. Deduce
that E 2 ( X g ) = E1(XIg) a.e. [A1], for all bounded r.v.s X , so that the (dis-
tinct) conditional expectations Ei(. g), i = 1 , 2 , agree on L1 (XI).

9. Let (a,C) be a measurable space and P = {P8,B E I) be a family of


probability measures on C. A a-algebra B c C is said to be suficient (also
termed a "sufficient subfield") for the family P if the conditional expectation
operator E: on L1(P8) is the same for all 6' E I, so that E:(x) is invariant
as 6' varies over I, for any bounded r.v. X on 0. Clearly B = C is sufficient
for P, since then E: =identity. Also, the last part of Problem 8 above implies
that if Pel << Peaand g = dPe1/dP8,, theii B = o(g) is sufficient for the pair
(Pel,Pea). This extends as follows:
Exercises 205

(a) Suppose that {Po,6' E I) is a dominated family, in the sense that there
is a finite (or a-finite) measure X : C + R+ such that Po << X for all 6' E I.
[Note that if X is a-finite, then there exist A, E C, disjoint, 0 = U,>l- A,, 0 <
X(An) < oo,by definition, so that

is a finite measure and X << u << A; thus in the domination questions we


may assume that X ( 0 ) < oo, and we do so here.] Let go(w) = (dPe/dX)(w)
a.e. [A]. Then a a-algebra B c C is sufficient for {Po,Q E I) iff each go(.)
is B-measurable, Q E I. [Hint: If E ~ ( x =)E'(x), for all Q E I, then the
domination hypothesis implies that E"(x)= E ~ ( x )
a.e.[X], aiid hence, if
tje(w) = ( d P Q / d X n ) ( ~ where
), = PelB, then taking X = X A , A E C , we
have

Hence dPe/dX = ge a.e. It is B-measurable, and Qe = go a.e.[X].The converse,


that go is B-measurable +-E ~ ( x =) E'(x),is similar. If B is the smallest
a-algebra relative t o which all the {go, Q E I} are measurable, it is called the
minimal suficient a-algebra].
(b) If X, : 0 + R , i = 1 , .. . , n , are r.v.s and Q n ( A , Q )= Pe(X-'(A)) is
the (image) probability distribution on Rn, X = ( X I , . . . , X,), suppose that
p,(= X o X p l ) dominates Qn(., Q), Q E I. Let T : Rn + Rm be a Borel
function. Then T ( X ) : R + Rm is called a statistic. It is termed a suficient
statistic for Q,(., 0) if B = a ( T ( X ) ) is sufficient for {Po,Q E I},in the sense
of (a). Show, from (a) that T ( X ) is sufficient for I&,(., 6'),6' E I) iff

where c(.) is a Borel function independent of Q aiid pn(., 0) depends on z only


through the fuiictioii T . [This is called a factorization criterion. The concept
of sufficiency originated with R. A. Fisher in the early 1920s and developed
later in increasing generality by J. Neyman, P. R. Halmos and L. J. Savage,
aiid R. R. Bahadur.]
(c) If P = {Po,Q E I} is iiot dominated, results similar t o the above are
iiot always valid. However, we can assert the following:

(i) If {B,, n > 1) c C is a monotone sequence of a-algebras each B,


being sufficient for P, then B = limn Bn is also sufficient for P. (We need t o
use Theorem 5.9 or 5.14 appropriately.)
(ii) If {B,, n > I} is an arbitrary sequence of sufficient 0-algebras from
C for P, and each B, contains all the Po-null sets for each 6' E I, then
206 3 Conditioning and Some Dependence Classes

B, = B is also sufficient for P.

In the other direction, if B1 is generated by a countable collection of sets,


and B2 is sufficient, then o(B1 U B2) is sufficient for P. [This result uses not
only Theorem 5.9, and (i) above for the sequence Bk,n >
I ) , but
another result on the a.e. convergence of {S,f,n > l), f E L ~ ( P ~where
),
Sn - E"~I~ " 1 1 - 1 . . . ~ " 1 The
. result of this part is in D. L. Burkholder Ann.
Math. Statist., 32(1961), 1191-1200, and should be consulted on details. A
thorough discussion of sufficieiicy is given in Chapter 6 of Conditional Mea-
sures and Applications, (1993, 211d. ed. 2005) by the first author.]

10. This problem contains two illustrations of sufficiency discussed in the


preceding exercise:

(a) Let {(a, C , Po),B E I) be a family of probability spaces, ( X I , . . . , X,)


be a vector of symmetrically dependent r.v.s such that the distribution (or im-
age probability) function on Rn has a density f,(.; Q) relative to the Lebesgue
measure. [The hypothesis implies f, (x,, , . . . , x,,, ; Q) = f, ( X I , . . . , x,; 8) for
each permutation (21, . . . , i n ) of (1,. . . , n).] Let X ( l ) < . . . < X(,) be the or-
der statistics of the given vector, T ( X ) = T(X1, . . . , X,) = ( X ( l ) ,. . . , X(,)).
Show that o ( T ( X ) ) is generated by {T(X)-l (A) : A c Rn, X A is symmetric
Borel), and that T ( X ) is sufficient for X , by verifying that, for all Q E I,

where the summation ranges over all n! permutations of (1,. . . , n ) and g :


Rn + R is any bounded Borel function.
(b) Let {(R,C, Po),8 E I) be as above, and X : L? + R be an r.v. such
that

for suitable A(Q) > 0, C ( t ) > 0, Bj(Q) E R, and p ( . ) is a-finite on R


Such a collection is called an exponential family of distributions. Show by
use of Problem 9b that for any X I , . . . , X n i.i.d. having the above distribu-
tion, T ( X ) = (Cy'l Tj(Xi),j = 1 , . . . , k) constitutes a sufficient statistic for
8, based on X = ( X I , . . . , X,), n >
1.

11. If X I , X 2 on ( 0 ,C, P) are jointly (or bivariate) normally distributed,


so that
Exercises

with 0 < 0: < co, p < 1,let Z = X1+X2. Show that E " ( ~ ) ( =x aZ~a.e.[P]
)
is a representation of the conditional operator, where a is some real number.
Verify that E(Z)' 5 a:/a: with a1 = (a: +pala2)(a: +2pala2 +a;)-'. [First
compute the d.f. of ( X I , Z).] Is a = a l ?

12. In Problem 9c (converse), we assumed a 0-algebra B1 to be countably


generated. This is still more general than that generated by a partition. Thus,
if B is countably generated, show that there exist at most contiiiuuin many
atoms At in B (i.e., A c At, A E B + A = 8 or A = At) such that Ut At = 0,
and each B E B is a union of these atoms.

13. We now state a form of the classical example showing the nonex-
istence of a regular conditional probability. Let (a, C, P) be the Lebesgue
unit interval, and A c 0 be a nonmeasurable set of Lebesgue outer mea-
sure one and inner measure zero. Let 2 be the a-algebra generated by C
and A. Verify that A E 5 iff A = (A n B ) U (Ac n D ) , B , D E C . Let
+
P(A) = a P ( B ) (1 a ) P ( D ) ,0 < n < 1. Show that P is well defined on
-

2,is a measure, P C = P, and that P ( . I C) on C, defined by the functional


equation (2.1) is not regular, i.e., that P ( . C ) ( w ) : 2 +[0,1] is a measure for
a.a. w E R is not true [even though P ( A2)(.) is measurable for C].However,
P ( . C) is a vector-valued a-additive set function satisfying Proposition 1.1ii
aiid l.liii, but the Vitali convergence theorem is false for it. [In contrast to
Proposition 2.2, this result depends on the axiom of choice through the exis-
tence of a Lebesgue iioiiineasurable set A.]

14. In the opposite direction of the above result, here is a positive asser-
tion. If (S,Z) is a measurable space, it is called a Borel space if there exists a
Borel subset A c R aiid a one-to-one bimeasurable, onto, mapping f : S + A
[so f is (Z, R(A))-measurable aiid the inverse f : A + S is (R(A),Z)-
measurable, where R ( A ) is the trace a-algebra of the Borel a-algebra R of
R.] It is known that every separable complete metric space with its Borel
a-algebra is a Borel space. [Cf. Parthasarathy (1967), Section V.2, on Borel
spaces.] If B c C is any a-algebra of a probability space ( R , C, P),(S,Z) is a
Borel space, aiid X : R + S is an r.v. [i.e., X p l ( Z ) c C], then a conditional
distribution P" O X - ' = P(., .) : Z x 0 + [O, 11 exists, and if X ( 0 ) E Z, then
P" itself is regular on C1 = XP1(Z). [Hint:Consider Y = f O X : 0 + A, and
by Theorem 2.5, P" o Y-l gives a version which is a regular conditional dis-
tribution, Q ( . ,.), aiid let P(., .) = Q( f (.), .). Since f = (f P ' ) ~ ' , it preserves
countable operations. The second part is as in Proposition 2.6.1

15. We present here a useful complement to the preceding problem. Let


(0,C, P) be a probability space, B c C a countably generated a-algebra.
208 3 Conditioning and Some Dependence Classes

Show that there is a regular conditional probability Q(., .) : E X0 + [O, 11 such


that (i) Q(.,w) is a-additive for all w E 0 - N , N E B, P ( N ) = 0, (ii)Q(A, .)
is B-measurable, and (iii) for each w E R - N , if A(w) = n{A : w E A E B},
theii A(w) E B aiid Q(A(w),w) = 1. [If P is the countable collection with
o(P)= B, theii Q(A, .) = X A a.e., A E P, aiid theii the same holds if A is re-
placed by the atom A(w) of B which is B-measurable. The result is somewhat
similar t o the partition case.]

16. This problem is a kind of converse t o the preceding one. Let ( R , C, P)


be a probability space and B c C be a 0-algebra containing a countably
generated a-algebra Bo c B. Let P" be the conditional probability function.
If P" is regular and if on each Bo-atom P" is indecomposable (so it is a
P"-atom), then verify that P" can be decomposed as follows: there is an
N E B, P ( N ) = 0, aiid fl N = U,,, At, At is a Bo-atom, the cardinality of
-

I being at most of the continuum, and

[This nontrivial result is in D. Blackwell, Ann. Math. 43, (1942), 560-567.1

17. (a) Let { A n , n >


1) be a partition of fl in ( R , C, P),P ( A n ) > 0.
If B E C, P ( B ) > 0, then show that the following, called Bayes formula, is
valid:

(b) If ( X , Y) : 0 + TK2 is a random vector on ( R , C , P) whose dis-


tribution has density (relative t o Lebesgue measure) fx,y and if f x , f y are
the (marginal) densities of X and Y, suppose fx(x) > 0, f Y (y) > 0 a.e.
(Lebesgue). Then verify the contiiiuous "analog" of the above formula if
f x Y , f Y I X are supposed t o satisfy:

These two forinulas find some applications in statistical inference.

18. Consider the family { X ( t ) ,0 t< < 1) of 2.10 (the counterexample). To


find the coiiditioiial density of the a.e. derivative Y of X at 0 given X ( 0 ) = a ,
let B6 = [ ( X ( t ) a ) 2 t 2
- + < S2, for some 0 < t < 61. Verify that B6 is
measurable aiid the conditional density of Y (the derivative of X at t = 0)
given X ( 0 ) = a , by means of approximations using the BB,is
u 2/~e-~2/2cu"
liin P ( [ Y < y ] 1 Bs) = o o < y < oo,
610 Spmm2/TGkv"2"%dv du,
Exercises 209

where a2 = E ( Y 2 ) > 0. Thus we have yet another form of the density de-
pending on the derivation basis which is different from all the others!

19. Here is an application which will be analyzed further. In certain prob-


lems of population genetics, gene coinbiiiatioiis under random illfluelices can
be described as follows. From generation to generation genes combine between
sexes I and 11, which may be identified as a game between two players with a
fixed fortune. In this terminology the game can be stated as: A random portion
X, of the genes (or fortune) is retained by I at the nth generation (or stage)
aiid Y, is the random proportion given by I1 to I at the nth generation. Thus
- the fortune of I at stage n , aiid 2, that of 11, then we assume that
if Z, is
+
Zo Zo = 1 (by normalization), so that we get Z, = X,Z,-l +
Y,(1 - ZnP1)
and 2, = 1 - 2,. Suppose that (X,, Y,) : R + TK2 are independent bivariate
random vectors and PIO < X, <
1,O < Y, <11 = 1. Also, let (X,, Y,)
be independent of ZnP1. Then show that {Z,,n >
1) is a Markov process.
(Compare with Example 3.3.)

20. (Continuation) Suppose that the (X,, Y,), n = 1 , 2 , . . . , are i.i.d. as


(X, Y) and P[IX -Y I = 11 < 1. If pk ( n ) = E(z:),show that limn,, pk (n) =
n k exists for each k = 1 , 2 , . . . (use induction), aiid that

Thus the limiting moments do not depend on the initial distribution of the
r.v. Z o What happens if P[IX Y I = I] = I? [It results from the Helly-Bray
D
theorem of next chapter that Z, + Z and n k = ~ ( 2 " )The . last equation
follows from Problem 10 of Chapter 2 also, if the convergence result is given,
since {Z,, n >I}, being a bounded set, is uniformly integrable. This result
(and the next one) appears in DeGroot and the first author (1963).]
D
21. (Continuation) Let Z, + Z as in the preceding problem, keeping the
same assumptions and notation. For some applications, indicated in Problem
19 above, it is of interest to find the limit Z-distribution. If F, is the distribu-
tion of Z, and F that of Z , using the relation Z, = X,ZnP1 + Yn(l - Zn-l)
aiid knowing that F, + F, show that for each n (the intervening conditional
probabilities are regular and)

and, since (X,, Y,) has the distribution of ( X , Y) deduce that


210 3 Conditioning and Some Dependence Classes

is the integral equation determining F, which depends only on the distribution


of (X,Y) and not on the initial distribution of Zo. If (X, Y) has a density
g(., .), so that F' = f z also exists a.e., show that f z is the unique solution of
the integral equation

+
with a ( z , t) = max[O, (x t - l ) / t ] and b(z,t) = min[(z/t), 11. It is of interest
to solve the integral equation (*) for general classes of g(., .), but this does
not seem easy. The reader is urged to work on this interesting open problem.
As an example, verify that if X , Y are independent aiid have densities g x
and g y , given (for p > 0, q > 0) by the beta density

0 < z < 1 , 0 < y < 1, theii f z of (*) is also a beta density, given as

<
22. (a) Let X I , . . . , Xn be i.i.d. with a d.f. given by F(z) = 0 if z 0, = z
if 0 < z < 1, and = 1 if z > 1, called the uniform d.f. If X I , . . . ,X: are the
order statistics, show that a coiiditioiial density of X I , . . . , X i given Xf =
c , , O < c i < l , i = k + l , . . . ,n , i s g i v e n b y

< <
and = 0 otherwise, aiid hence that {Xz, 1 k n} forms a Markov process.
Deduce that, if X I , . . . , X n are i.i.d. with a contiiiuous strictly iiicreasiiig d.f.,
theii their order statistics also form a Markov process.
(b)We have seen in the proof of Theorem 3.9 that Zk = ~ f Ui,=where ~
the Ui are independent exponentially distributed random variables. Changing
the notation slightly, suppose that S, = Cr=l Xi where the Xi are indepen-
dent, each exponentially distributed, i.e., P ( X i < z) = 1 e P x , z = 0, so
-

that S, is iiicreasiiig (in the proof 2, has a reverse labelling!) Show that,
using the change of variables technique of elementary probability, the vari-
s,,+1k = 1 , . . . , n have a multivariate beta (also called a Dirichlet)
ables Yk = Sk-,
Exercises 211

distribution. Thus, Yk and X i of part (a) above have the same (joint) distri-
bution, the latter being obtained using a random sample from the uniform
distribution on the unit interval.

23. Complete the details of proof of Theorem 4.8 on the existence of con-
tinuous parameter Markov processes, and also the details of Theorem 4.7 t o
use it there.

24. By Definition 3.1, an ordered set of random variables (or vectors)


{ X t , t E I) is Markovian if the 0-algebras {a(Xt), t E I) form a Markovian
family. Let {X,,n > 1) be a family of r.v.s, and Y, = (X,+l,. . . ,X,+k) :
0 + R%e the sequence formed of the X,. Show that {Y,, n 0) is Marko- >
vian iff the X n satisfy the condition

< <
where Bn = o ( X i , 1 i n). Thus if k = 1, we have the ordinary Markovian
condition, aiid if k > 1, the X, are said t o form a kth order (or multiple)
Markov process. (Follow the proof of Proposition 3.2.) The preceding rela-
tion shows that many properties of multiple Markov processes are obtainable
from ordinary (vector) Markovian results. Show that any kth-order Marko-
+
vial1 family is also (k 11th-order Markovian for all integers 1 > 0 but not
if 1 < 0. (This is an important distinction t o note, since in the theory of
higher-order "stochastic differential equations" of the standard type, the so-
lutions turn out t o be multiple Markov processes, but not of lower order ones.)

25. (a) As an application of Theorem 4.5 (or 4.71, show that there exist
multiple Markov processes (on suitable probability spaces) of any given order
k.
(b) Using Theorem 4.11, extend Problem 5(c) of Chapter 2 as follows.
Let (0,C, P) be a complete probability space, and {B:, x E R, t E I) c C ,
where I is any index set and B: c B:, a.e. for any x < x'. For each t E I,
there is a (unique) r.v. Xt : R + R such that [Xt < z] c B: a.e. and
[Xt > z] c (B:)" a.e. Show that for each t l , . . . , t,, if PIBZl nB2 n . . .nB$;, ] =
>
Ftl,..,trL( X I , . . . , x,), (+), then {Ft,,. . , t r Lt,, E I,n 1) is a compatible fam-
ily of d.f.'s and {Xt, t E I) is a process with the F ' s as its d.f.'s. [Thus the
process is determined solely by (R, C, P).]Conversely, any compatible fam-
ily of d.f.'s as in Theorem 4.11, determines a probability space and a family
{B:, x E R,t E I) satisfying (+). Thus we have an adjunct of Theorem 4.11.

26. Extend Proposition 3.2 t o kth-order Markovian sequences.

27. As an example of multiple Markov sequences, let {X,, n >


1) be a
Markov process on ( R , C ) aiid define S, = C;=,X k . Show that {S,, n 1) is >
a second-order Markov sequence. [Hints: it suffices t o verify that {f (S,),n >
1) has the desired property for each bounded Bore1 f on R.In particular, take
212 3 Conditioning and Some Dependence Classes

ft(x) = eZtx,tE R, and verify that ~ ~ l ( f ~ ( S = , +E ~ ~


))( )~( f t~( s n ~
+l)>
) ~ ~ ~
a.e., where .En= a(S1,.. . , S,). Argue that this suffices.]

28. Let {X,, n >1) be a sequeiice of r.v.s on (a, C , P) with a finite state
space (=range), say, T = ( 1 , 2 , . . . , s). If the sequeiice is a kth-order Markov
process ( = a chain in this case), it can be regarded as an ordinary Markov chain
with values in TIC c TKk, by Problem 22. Let Yn = (X,+l,. . . , X n + k ) , n 1, >
be this chain. For each j, 1 E T', let pi;) = P([Y, = 111 YnPl = j) be the
transition probability. Suppose that the one-step transitions are independent
of n (i.e., the chain has coilstant or stationary transitioii probabilities.) Find
the Chapman-Kolmogorov equations in this case. (Note that, of the possible
s2%alues of the transitions, all but values vanish in this representation.)

29. Let {X,, n >


1) be a Markov chain on ( R , C , P) with a countable set
of states, denoted coiiveiiieiitly by the natural iiuinbers {1,2,. . .) and suppose
that the chain has one-step stationary transitions, i.e.

is independent of n > 1. The structure of the chain is then defined by the


>
matrix ( p i j , i , i l ) , where pij> 0, C j 2 1 p i j = 1. The chain is irreducible
if any pair of states il # i2 can be reached from one to the other in a finite
number of steps, in that if (pj;)) = (pij)l', then > 0 for some n > 0. A
>
sequence {mj, j 1) of positive numbers is a positive subinvariant (invariant)
>
measure of the chain if C , > l mipij 5 m j ( = m j ) , j 1. It is known that each
irreducible Markov chain admits a positive subillvariant measure. A Markov
chain is path reversible if for any r > 1, and states i l l . . . , i,, we have the
followiiig coiiditioii of Kolmogorov satisfied:

Show that an irreducible Markov chain is path reversible iff the chain ad-
mits a positive invariant measure {mj, j >
1) such that for all j, k we have
mjpjk = mkpkj. In this case the invariant measure is also unique except for a
constant multiplicative factor. [Further properties of such chains can be found
in Feller (1957) and Kendall (1959). Note that in our definition of a Markov
process (cf. Definition 3.1), it is symmetric with regard to the ordering of the
index set, and here the symmetry is about the motion of the chain relative to
its states.]

30. We say that any adapted sequence {X,, B,, n >


1) of integrable ran-
dom variables on ( R , C , P) is a (sub-) martingale if for all A E B,,

without reference to coiiditioiial expectations and thus to the Radon-Nikod9m


theorem. (We now show their existence.) The convergence theory can be de-
veloped with this definition. If the (sub-) martingale convergence is given,
Exercises 213

then the (Lebesgue-)Radon-Nikod9m theorem can be derived as follows:


(a) Let u : C + R+ be a measure, and C be countably generated. Set
+
p = u P : C + R+,so that u is p-continuous. If {B,, n >
1) generates
< <
C, let Fn = a ( B k ,1 k n). If {Bn,i, 1 i < <
k,) is an Fn-partition of fl
generating F,, define

Since u << p , this is well defined. Show that {X,, F,, n >1) is a positive
martingale, so that X, + X, a.e. Verify that the sequence is also uniformly
integrable. (Note SIX,,> a1
X,dp = u[X, > a].) Hence deduce that u(A) =
S, X,dp,A E C, and that d u l d p = X,, a.e.[p]. If N = [X, = 11, then
P ( N ) = 0. If g = X,/(l - X,), and uC = u ( N Cn .), then uC << P, and
g = d u c / d P a.e. is the Radon-Nikodqm derivative. If us = u ( N n .), then
SA +
u(A) = g d P us (A), A E C, gives the Lebesgue-Radon-Nikod9in theorem,
where us : C + R+ becomes the P-singular part of u aiid uc its P-continuous
part.
(b) If now (a) is given, and g, = du:/dP,, where u, = u F,, P, = PF,,
>
show that {g,, F,,n 1) is a supermartingale on (R, C, P) such that g, + g
a.e. [Remark: Part (a), and hence (b), can be extended t o the general case
that C is iiot countably generated as follows. Consider A c C, a countably
generated 0-algebra, aiid let X A be the corresponding duA/dpA. The col-
lection of all such 0-algebras can be directed (into I) under inclusion. Then
{XA,A, A E I) forms a uniformly integrable positive martingale. Such a
martingale can be shown t o converge in L1(p) t o X (but not a.e.). This is
-
sufficient t o conclude that X = duldp and then g = X / ( l - X ) on NC,where
N = [ X = 11, gives tj = duc/dP. Thus the general result is obtainable from
the martingale theory. See, e.g., theorem 5.3.3. in Rao (1987, 20041, for more
details.]

31. Show by an example that Theorem 5.8 is not valid if the moment
condition is omitted; i.e., a lion-L1 (P)-bounded martingale need iiot converge
a.e. (Let {X,, n > 1) be independent, P [ X , = +n] = P[X, = n ] = and
>
coiisider { S , = C r = l a k X k ,n I ) , and follow the discussion in Section 4C
of Chapter 2, with suitable ak, the interarrival times.)

>
32. Let {Vk,F k - 1 , k 1) be an adapted uniformly bounded sequence of
r.v.s and {X,, F,, n > 1) be an L2(P)-bounded martingale on a probability
space ( R , C, P ) . Show that the sequence {S, = >
Vkq5k,n 1) converges
a.e. and in L 2 ( P ) , where q5k = X k - X k p l is the martingale increment or
difference, and X o = 0 a.e. (Such a Vk-sequence is termed predictable and S,
a predictable transform of the X,-process. Note that the q5k are orthogonal.)
214 3 Conditioning and Some Dependence Classes

33.Let{X,,n>1)bei.i.d.andB,=a(Xk,1<k<n),B=a(U,,1&), -

on ( 0 ,C , P ) . If X is a B-measurable r.v. with E ( X ) = 0, E ( X 2 ) < cm, show


that E"'? ( X ) + X a.e. [and in L1(P)] as n + oo, and that (x)=E " ( ~ ~ I )

f (X,), n > 1, for a fixed Bore1 function f on R, f (X,) E L 2 ( P ) . Verify that


{f (X,), n > 1) forms an independent sequeiice with E(f (X,)) = 0, aiid
a2 = E(f(X,)') < E ( X 2 ) < cm, all n. (Use the image law of probabili-
ties.) If a, = E ( Xf (X,)), then {EL=l a kf ( X k ) ,B,, n >
1) is a martingale
which converges a.e. and in L ~ ( P ) .(Use Bessel's inequality.) Deduce that
E"(~~)(X = )O,n > 1, then EBl.(X) = 0, and filially X = 0 a.e. Show that
this gives a form of the Hewitt-Savage law.

34. Let {X,, B,, n > 1) be a supermartingale on (R, C , P) and g : R +


R+ be an increasing concave function such that g(tX,) is integrable for each
n > 1 aiid t > 0. Show that for any X > 0,

In case {X,, B,, n > 1) is a submartingale, then one has for any A,
P sup X,
n
>X <
J lim ~ ( e ~ ( ~ " - ' ) )t, > 0

35. Let {X,, B,, n > 1) be an L1(P)-bounded martingale, i.e.,

If h : R + R+ is a continuous fuiictioii such that h(t)/t2 + 0 as t + 0,


show that the martingale increments sequeiice (4, = X, X n l , n 1) with >
Xo = 0 satisfies h(q5,) < cc a.e. [This is not as simple as one might
-

expect. First note that {Y, = B,, n > 1) with


xkPl(xkxkP1),
-

is an L2(P)-boundedmartingale, aiid E(sup, I Y,+1-Y,12) I 4X4 < cm. Next,


verify that on AX = [sup, X,I <
A], Y, + Y, a.e., implying

This yields the result, due originally t o D.G. Austin that Ek,l4;
- < oo a.e.,
and then our assertion follows.
Exercises 215

36. Let {X,,n >1) be a sequence of independent r.v.s on ( R , C , P)


with means zero, and Sn = X k . Then the martingale {S,, F,, n 1) >
converges a.e. to S E L 1 ( p ) iff it is uniformly integrable, where

[For the necessity, let S, + S a.e., aiid note that S S, is independent of


-

X I , . . . ,X,, so that E ~ ~ (=SE(S


) +
S,) S, = E(S) S, a.e. Since the
- +
left side tends a.e. (and in mean) to E " ~ ( S = ) S, implying E(S) = 0, we
deduce that S, + S in mean, and hence is uniformly integrable. The same
>
result holds if L1(P) is replaced by L P ( P ) , p 1. This appears to be due to
J. Marcinkiewicz.]

37. (Continuation) In the above problem, under the same hypothesis,


show that for any p 1, >

[Note that, by the submartingale property, we have ISn11, 1' IS1 1,. If p > 1,
this is a consequence of Theorem 5.6ii1with C,? replaced by p / ( p 1) = q; thus
> < <
if p 1.25, this is 5. For 1 p < 1.25, we need to prove this. Let us sketch
the argument which gives the result. This application is due to Doob. First
suppose that X, has a symmetric distribution; i.e., P[X, E A] = P[X, E -A]
> >
for any Bore1 set A c R. Since P[maxk<, Sk A, Sn A] = P[Sn A], verify >
>
with the decomposition of the event L a x k < , Sk A], as in Theorem 5 . 6 ,
that P[maxk<, - S k> < >
A, S, < A] P[Sn A], so that by addition one has

Then again as in the proof of Theorem 5.6ii with the above inequality,

In the unsymmetric case, let (GI, C1,P') be a copy of the given probability
space, and {XA, n > 1) be independent r.v.s distributed as the X,. Then on
>
the product space (0,C, P)x (R', C', P I ) , {X, -XA, n 1) is a symmetrically
distributed sequence, aiid the above result applies to give

But XI" is convex, and so IA" < 2"-I [ A - A'I" + AII"]. Hence
216 3 Conditioning and Some Dependence Classes

38. (Continuation) Under the same hypothesis as in the above problem,


>
if S, + S a.e., and S E L P ( P ) , p 1, show that X , E Lp(P) aiid S, + S in
LP(P)-mean. (Thus if the martingale is of this form, the convergence state-
ment can be improved.)

39. Let ( T , I ,A) be the Lebesgue unit interval and f : T + R be an


integrable r.v. on it. Consider the partition { I k , O < <
k 2, - I), where
+
Ik= [k/an, (k 1)/2"). If

and B, is the 0-algebra generated by the partition, show that {f,, B,, n 1) >
is a uniformly integrable martingale and that f, + f a.e. and in L1(X). [Thus
every element of L1(X) can be written as the a.e. and in mean limit of a mar-
tingale of simple functions.]

40. (Generalization of the Above) Let (R, C, P) be a probability space,


(T, I,A) be the Lebesgue unit interval, and (W, A, p) be their product. Let
< : 0 + R be a one-to-one and onto maping such that it is measurable and
measure preserving, so that P = P o <-l. Let T, : T + T be a "shift," so
+ +
that 7,(t) = t 2Zn,7;(t) = t k2-,, 0 <k < 2n,7,2"= identity = 7;. If
A. = E x I,E E C, I E 7 ,define Ak = ['"(E) x 7;(1), 0 <k < 2", aiid
I = [O, 2-,), aiid 3, as the a-algebra in A, generated by sets of A of the form
A = (J Akf :i1
with Ak defined above. If f E L1(P), let h(w,t) = f (w), w E
>
R , t E T. Show that {h,, 3,,n 1) is a decreasing martingale, where

Deduce that h,(s) + h(s) a x . [p] as n + oo,where {j,, n > 1) is a subse-


quence of integers going to infinity satisfying the above restriction for t. [Hint:
h, is a constant on the generators of 3,, and if A E 3, is a generator, verify
that S' h, d p = S'
hdp after noting that sAk g(w, t ) dp =sAo g(<kw,~ k td)p
for any g E L1(p). Since S, hl dp = If dP, the convergence statement
+
follows from Theorem 5.14 for all j, such that jn2Tn 5 t < (j, 1)2Tn for
each n. But there are illfinitely many such sequences, and so the result is a
weak one. This formulation is due t o M. Jerison.]
Exercises 217

41. Many sequences of r.v.s can be expressed as functions of martingales.


It is their convergence that is the difficult question. We illustrate it for two
important processes in this and the next problem. Let {Xn,B,, n >1) be
any quasi-martingale bounded in L1(R, C, P ) . [See Eq. (49) of Section 5 for
definition.] Show that this sequence can be expressed as a difference of two
positive supermartingales relative t o the same stochastic base {B,, n >
1).
(Use Theorem 5.9 and Lemma 5.5.)

42. Let ( R , C, p ) be a a-infinite measure space aiid u : C + R+ be 0-


<
additive (hence bounded) aiid p-continuous. Let : R + f l be a measurable
trailsformatioil such that p and its image p o<-I are equivalent (i.e., have the
same null sets.) Let v = EL=o <-5
vo p, = p o <-%which are signed
measures (<-(k+l) = O t p k ) on C and vn <
< p,. Let X n = dvn/dpn be the
Radon-Nik0dj.m derivative. If ( N , N , <) is the counting measure space on the
natural numbers N, aiid (W, A, A) = ( R , C, p ) x (N, N, <) , let F, = B, @ G,,
where B, = a(~i,i >
<-'"(A), A E C),G, = 0((0,1,. . . , n I ) , {k}; k n ) , so
-

that Fn+lc Fnc A. If f (s) = X1 (w), s = (w, k) E W aiid f, (s) = X, o<"(w)


< >
if 0 k < n , aiid = f ( s ) if k n , show that {f,, F,, n > 1) is a decreasing
martingale on the a-finite space (W, A, A) and that X , + X a.e. [p] is equiva-
lent t o the convergence of the f,[= E~~~(f)-] martingale. [Follow the hints as
in Problem 40. Note that none of our results so far can be used t o deduce this
coilvergelice statement, which is the Hurewicz-Oxtoby ergodic theorem. It is
known that X, + X a.e. In a similar way other averages can be formulated
as (decreasing) martingales, but the known martingale convergence theory is
inadequate for this type of application. This observation is also due t o M.
Jerison.]

43. If P(·, ·) : Ω × Σ → R^+ is a mapping on a measurable space (Ω, Σ), it
is called a Markov kernel if P(·, A) is Σ-measurable and P(ω, ·) is a measure,
P(ω, Ω) = 1, ω ∈ Ω. [It is sub-Markov if P(ω, Ω) ≤ 1.] If P_1, P_2 are two such
kernels, then by the standard Lebesgue theory we define their composition
as Q = P_1 ∘ P_2 to mean Q(ω, A) = ∫_Ω P_1(ω, dω') P_2(ω', A). Note that this
composition is a form of the Chapman-Kolmogorov equation if (Ω, Σ) is the
Borelian line, when P_1 and P_2 are the one-step transition functions depending
only on the number of steps taken but not the starting step. [They are also
called the stationary transition probabilities.] We define

    g(ω) = ∫_Ω f(ω') P(ω, dω')

if f is a positive (or real bounded) measurable function on Ω. Then g will be a
positive (or bounded) measurable function. If g ≤ f, then f is called excessive
or superharmonic, and if g = f, then f is called invariant or harmonic relative
to the kernel P. (If g ≥ f, then f is subinvariant or subharmonic.) These
play important roles in potential theory. Let {X_n, n ≥ 0} on a probability
space (Ω, B, μ) be a sequence of r.v.s with a stationary transition probability
function P(·, ·), so that P(x, A) = μ([X_n ∈ A] | X_{n-1} = x), x ∈ R, A ∈ B.
Show that for Borel f : R → R which is bounded and invariant relative
to P, we have E_{x_0}(f(X_n)) = ∫_R f(y) P^n(x_0, dy), where P^1 = P, and
P^n = P ∘ P^{n-1} is the composition and X_0 = x_0 a.e. [P(x_0, ·)]. Deduce that
{f(X_n), B_n, n ≥ 1} is a uniformly integrable martingale on (Ω, B, P(x_0, ·)),
where B_n = σ(f(X_k), 1 ≤ k ≤ n). If f here is excessive (or subinvariant), then
the corresponding process is a super- (or sub-)martingale.
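In the simplest, finite-state case a Markov kernel is just a row-stochastic matrix, the composition P_1 ∘ P_2 is a matrix product, and harmonic or excessive functions become vectors with Pf = f or Pf ≤ f. The following sketch (a hypothetical finite-state illustration, not part of the problem) checks these identities numerically:

    import numpy as np

    # A Markov kernel on {0, 1, 2} as a row-stochastic matrix; states 0
    # and 2 are absorbing.
    P = np.array([[1.0, 0.0, 0.0],
                  [0.3, 0.4, 0.3],
                  [0.0, 0.0, 1.0]])

    # Composition Q = P o P: Q(w, A) = sum_w' P(w, w') P(w', A); here a
    # matrix product, i.e., the two-step transition kernel.
    Q = P @ P
    assert np.allclose(Q.sum(axis=1), 1.0)     # Q is again a Markov kernel

    # f(x) = probability of absorption at 0 from x is harmonic: P f = f.
    f = np.array([1.0, 0.5, 0.0])
    assert np.allclose(P @ f, f)

    # g with P g <= g (componentwise) is excessive but not harmonic.
    g = np.array([1.0, 0.8, 0.0])
    print(np.all(P @ g <= g), np.allclose(P @ g, g))   # True False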

44. (Continuation) With the setup of the preceding problem, and if the
tail σ-algebra T of {X_n, n ≥ 0} on (Ω, B, P(x_0, ·)) is degenerate, so that each
A ∈ T satisfies P(x_0, A) = 0 or 1, and if f is invariant, then show that f must
be a constant relative to P(x_0, ·). In particular, if P(x_0, dy) = p(x_0, y) dy, so
that we can define the convolution g(x) = (f * p(x_0, ·))(x) on R as

    g(x) = ∫_R f(x − y) p(x_0, y) dy,

show that each bounded continuous invariant f depends only on x_0. [Reduce
the problem to the above one by writing p for p(x_0, ·) and considering the
special Markov process {S_n = Σ_{k=1}^{n} X_k, n ≥ 1}, where the X_n are i.i.d. This
is discussed, in a more general case, for finding the solutions μ of μ = μ * P,
by G. Choquet and J. Deny in C.R. Acad. Sci., Ser. A, 250 (1960), 799-801.]

45. The theory of super- and subharmonic functions was primarily developed
by F. Riesz in the late 1920s; among other theorems, Riesz proved
an important decomposition of such functions. These results have analogs in
martingale theory (by substituting "martingale" for "harmonic"), of which the
above two assertions are illustrative. Thus the corresponding Riesz decomposition
in our theory is as follows. A positive supermartingale {X_n, B_n, n ≥ 1}
is termed a potential if X_n → 0 a.e. and in L^1(P)-mean. Let {X_n, B_n, n ≥ 1}
be an arbitrary supermartingale. Then it admits a unique Riesz decomposition
X_n = Y_n + Z_n, where {Y_n, B_n, n ≥ 1} is a martingale and {Z_n, B_n, n ≥ 1}
is a potential, iff it dominates a martingale, i.e., iff there exists a martingale
{W_n, B_n, n ≥ 1} such that X_n ≥ W_n a.e. for all n ≥ 1, or iff there is an
α > −∞ such that lim_n E(X_n) ≥ α. [Hint: If X_n dominates a martingale,
then E(X_n) ≥ E(X_{n+1}) ≥ E(W_n) = α > −∞, since the expectation of a
martingale is a constant. This implies sup_n |E(X_n)| < ∞. If X_n = X'_n − A_n
is the decomposition of Theorem 5.9, where {A_{n+1}, B_n, n ≥ 1} is the increasing
adapted process, A_1 = 0, and {X'_n, B_n, n ≥ 1} is a martingale, so that
E(A_n) ≤ E(X'_1) + sup_n |E(X_n)| < ∞, let A_∞ = lim_n A_n. Then E(A_∞) < ∞.
If we let Y_n = X'_n − E^{B_n}(A_∞), and Z_n = E^{B_n}(A_∞) − A_n ≥ 0, then we have
X_n = Y_n + Z_n as the desired decomposition. For uniqueness, if Y*_n + Z*_n = X_n
is another decomposition, then Y_n − Y*_n = Z*_n − Z_n. Since
{|Y_n − Y*_n|, B_n, n ≥ 1} is a submartingale whose expectations are nondecreasing,
while E(|Z*_n − Z_n|) → 0 because potentials tend to zero in L^1-mean,
we must have E(|Y_n − Y*_n|) = 0, so that Y_n = Y*_n and then Z_n = Z*_n.
The converse is immediate.]

46. Consider the Haar system of functions on [0,1]. These are defined on
[0,1] as H_0(·) ≡ 1, H_{2^k}(·) = 2^{k/2}(χ_{[0, 2^{-k-1})} − χ_{[2^{-k-1}, 2^{-k})}), k = 0, 1, . . . , and if
1 ≤ j < 2^k, k ≥ 1, H_{2^k + j}(·) = H_{2^k}(· − j2^{-k}) χ_{[j2^{-k}, (j+1)2^{-k})}(·). Then {H_n, n ≥ 1}
forms a complete orthonormal system in L^2(0,1). (Completeness means that
only the zero function is orthogonal to all H_n, n ≥ 1.) If f ∈ L^2(0,1), and
a_n = ∫_0^1 f(x) H_n(x) dx, let S_n(f) = Σ_{k=0}^{n} a_k H_k. If

    B_n = σ(H_0, H_1, . . . , H_n),

show that {S_n(f), B_n, n ≥ 1} is a uniformly integrable martingale, and,
in fact, S_n(f) = E^{B_n}(f) → f a.e. A numerical sketch of this construction is
given after this problem. [Note that B_n has (n + 1) atoms. If
n = 2^k + j, then the intervals [l2^{-k-1}, (l + 1)2^{-k-1}), 0 ≤ l < 2j + 2, and
[m2^{-k}, (m + 1)2^{-k}), j + 1 ≤ m < 2^k, 0 ≤ j < 2^k, are these atoms, and
E^{B_n}(H_{n+1}) = 0. This construction will be elaborated upon early in Section
8.1. Also compare with Problem 39 above.]
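On a dyadic grid the partial sums S_n(f) can be computed directly, and one sees that S_n(f) is the average of f over each atom of B_n, i.e., the conditional expectation E^{B_n}(f). A minimal sketch (assuming the indexing convention H_{2^k + j} above; the particular f is an arbitrary choice):

    import numpy as np

    def haar(m, x):
        # H_m on [0, 1): H_0 = 1; for m = 2**k + j, a scaled difference of
        # indicators supported on [j 2^{-k}, (j+1) 2^{-k}).
        if m == 0:
            return np.ones_like(x)
        k = int(np.log2(m)); j = m - 2**k
        y = x - j * 2.0**(-k)
        h = np.zeros_like(x)
        h[(0 <= y) & (y < 2.0**(-k - 1))] = 2.0**(k / 2)
        h[(2.0**(-k - 1) <= y) & (y < 2.0**(-k))] = -2.0**(k / 2)
        return h

    N = 2**12
    x = (np.arange(N) + 0.5) / N               # midpoint grid on (0, 1)
    f = np.sin(2 * np.pi * x) + x**2           # any f in L^2(0, 1)

    n = 6                                      # n = 2**2 + 2, so k = 2, j = 2
    S = sum((haar(m, x) * f).mean() * haar(m, x) for m in range(n + 1))

    # S_n(f) = E^{B_n}(f): constant on each atom of B_n, with value equal
    # to the average of f there; e.g., on the atom [0, 1/8):
    atom = (0.0 <= x) & (x < 0.125)
    print(S[atom].ptp(), abs(S[atom].mean() - f[atom].mean()))   # ~0, ~0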

47. The preceding case is abstracted as follows. An orthonormal system
{φ_n, n ≥ 1} in L^2(P) on a probability space (Ω, Σ, P) is an H-system
if (i) each φ_n takes at most two nonzero values with positive probability,
(ii) B_n = σ(φ_1, . . . , φ_n) has exactly n atoms, and (iii) E^{B_n}(φ_{n+1}) = 0
a.e. Show that an orthonormal system {φ_n}_1^∞ is an H-system iff for each
f ∈ L^2(P), E^{B_n}(f) = Σ_{k=1}^{n} a_k φ_k, a_k = ∫_Ω f φ_k dP.

48. (Continuation) Let {X_n, B_n, n ≥ 1} be a square integrable martingale
on (Ω, Σ, P), and φ_n = X_n − X_{n-1}, with φ_1 = X_1, be the martingale [difference
or] increments sequence. Verify that if {φ_n, n ≥ 1} ⊂ L^2(P) is complete,
then it is an H-system.

49. (Continuation) If f : [0, 1] → R is any Lebesgue measurable function,
and {φ_n, n ≥ 1} is a complete H-system in L^2(0, 1), then f can be expressed
as f = Σ_{k≥1} a_k φ_k for a suitable set of a_k ∈ R, the series converging a.e. Further,
there exists a complete H-system on [0, 1] such that every measurable
(not necessarily integrable in either case) function has a series representation
as in the preceding statement. [This result in its general form is due to Gundy
(1966). It needs a careful analysis using the work of the last two problems.]

50. The preceding three problems admit the following further extension.
Let (Ω, Σ, P) be a probability space and Σ be countably generated. Then
there exists a (universal) martingale {X_n, B_n, n ≥ 1} on (Ω, Σ, P) such that
Σ = σ(⋃_n B_n), and every extended real-valued measurable (for Σ), but not
necessarily integrable, f on Ω is a pointwise a.e. limit of a subsequence
{X_{n_k}, k ≥ 1} of the X_n, n ≥ 1, sequence. [This result is involved; it
abstracts the preceding work, and is due to Lamb (1974). However, not all
a.e. convergence results of Fourier series may be established by martingale
methods. For instance, if f ∈ L^p(0, 1), 1 < p < ∞, and φ_k(x) = e^{2πikx}, then
f = Σ_{k≥1} a_k φ_k, a_k = ∫_0^1 f(x) φ̄_k(x) dx, converges a.e. (and in L^p-norm), but
the partial sums do not form a martingale, and as yet no martingale proof of
this statement is available. A relatively "simple" proof of this result, which is
originally due to L. Carleson and R.A. Hunt in the middle 1960s, is given by
C.L. Fefferman, Ann. Math. 98 (1973), 551-571.]
Part II Analytical Theory

The fine properties of probability are often obtainable by using the sharp
techniques of Fourier analysis. These are called characteristic function methods,
and Chapter 4 utilizes them in establishing various important results, including
the Lévy-Bochner-Cramér representation theorems applicable for uniqueness,
the derivation of distributions of ratios of random variables, and special properties
of sums of independent random elements. Then the longest chapter of the
book, Chapter 5, is devoted to the central limit theory with error estimations,
to stable distributions, as well as to invariance or functional limit theorems.
Here the Kolmogorov law of the iterated logarithm and certain m-dependent
theorems are also included. Several important complements are considered in
both chapters and should be of interest to students as well as researchers.
Chapter 4

Probability Distributions and


Characteristic Functions

Structural properties of probability distributions and of characteristic functions
are treated here in some detail. These include the selection principle, the
Bochner and Lévy theorems on characteristic functions together with their
essential equivalence, Cramér's theorem on the Fourier-Stieltjes transform of
signed measures, and some multivariate extensions. The equivalence of pointwise
a.e. convergence and convergence in distribution for sums of independent
r.v.s is also established. Many additional results are sketched as exercises.

4.1 Distribution Functions and the Selection Principle


In the theory of the preceding chapters we have already seen the use of distribution
functions at several places. We now undertake a systematic study
of the structural properties of these functions and of their Fourier transforms
for use in the further development of probability theory.

Recall that a distribution function of a random variable X on (Ω, Σ, P) is
the image law F_X given by F_X(x) = P[X < x], so that it is a nondecreasing
left continuous function on R satisfying F_X(−∞) = lim_{x→−∞} F_X(x) = 0 and
F_X(+∞) = lim_{x→+∞} F_X(x) = 1. Here we study the properties of such functions
F on R without reference to r.v.s. In view of the Kolmogorov-Bochner-Jessen
theory of the preceding chapter, the collection of such distribution functions
(generalized to multidimensions) determines a probability space and a
family of r.v.s on it with these (finite dimensional) distributions for the r.v.s
thus determined. Hence a study of distributions per se becomes an integral
part of probability theory; in fact, for most of the analytical work they even
occupy a preeminent position in the subject. Consequently, we start with this
analysis, through a use of Fourier transforms, for a deeper insight into their
analytical structure.

In Section 2.2 we introduced the concept of the convergence of a sequence of
distribution functions and noted that this is the weakest of the convergences
considered there. The following fundamental selection property of distributions,
discovered in 1912 by E. Helly, plays an important role in our study.

Theorem 1 (Helly's Selection Principle) Let {F_n, n ≥ 1} be a sequence
of distribution functions on R. Then there exists a nondecreasing left continuous
function F (not necessarily a distribution), 0 ≤ F(x) ≤ 1, x ∈ R, and a
subsequence {F_{n_k}, k ≥ 1} of the given sequence such that F_{n_k}(x) → F(x) at
all continuity points x of F.

Proof We first establish the convergence at a dense denumerable set
of points of R and then extend the result using the fact that the continuity
points of a monotone function on R form an everywhere dense set.
Thus, because the rationals are dense and denumerable in R, we consider them
here for definiteness, and let r_1, r_2, . . . be an enumeration of this set. Since
0 ≤ F_n(x) ≤ 1, {F_n(r_1), n ≥ 1} is a bounded sequence, so that by the Bolzano-Weierstrass
theorem it has a convergent subsequence {F_{n^1}(r_1), n ≥ 1}, such
that F_{n^1}(r_1) → G_1(r_1) as n → ∞. Continuing this procedure, we get a
sequence F_{n^k}(r_k) → G_k(r_k) and {F_{n^k}, n ≥ 1} ⊂ {F_{n^{k-1}}, n ≥ 1}. Consider
the diagonal sequence {F_{nn}, n ≥ 1}. This converges at x = r_1, r_2, . . . .
Let lim_{n→∞} F_{nn}(r_i) = a_i, i = 1, 2, . . . ; thus a_i = G_i(r_i). Since the F_n are
increasing, it follows that for r_i < r_j, a_i ≤ a_j. For each x ∈ R, define
G(x) = inf{a_n : r_n > x}. Since a_i ≤ a_j whenever r_i < r_j, it is clear that G(·) is
nondecreasing. We now show that F_{nn}(x) → G(x) at each continuity point x of G.

If ε > 0, and a is a continuity point of G, choose h > 0 such that

    G(a + h) − G(a − h) < ε/2.

This is clearly possible by the continuity of G at a. Let r_i, r_j be rationals from
our enumeration such that (by density) a − h < r_i < a < r_j < a + h. Then

    G(a − h) ≤ a_i ≤ G(a) ≤ a_j ≤ G(a + h).

Choose N (= N_ε) such that n ≥ N implies

    |F_{nn}(r_i) − a_i| < ε/2 and |F_{nn}(r_j) − a_j| < ε/2.

Then for all n ≥ N we have, by the monotonicity of the F_n and G and the
above inequalities,

    F_{nn}(a) ≤ F_{nn}(r_j) < a_j + ε/2 ≤ G(a + h) + ε/2 < G(a) + ε.

Similarly

    F_{nn}(a) ≥ F_{nn}(r_i) > a_i − ε/2 ≥ G(a − h) − ε/2 > G(a) − ε.

From these two inequalities one gets

    |F_{nn}(a) − G(a)| < ε,

so that F_{nn}(x) → G(x) at all continuity points x of G. Now define F on R
by F(x) = G(x − 0), so that F(x) = G(x) if x is a continuity point, and
F(x − 0) = F(x) if x is a discontinuity point of G. Thus F_{nn}(x) → F(x) at
all x which are continuity points of F. This completes the proof.

Remarks (1) The fact that the set {r_i, i ≥ 1} ⊂ R is the set of rationals
played no part in the above proof. Any dense denumerable set will do. Consider
F_n(x) = 0 for x < n, = 1 for x ≥ n; we see that F_n(x) → 0 as n → ∞ for
all x ∈ R, so that the limit F satisfies F ≡ 0. Thus such an F is not
necessarily a distribution function (d.f.).

(2) If +∞ and −∞ are "continuity points" of each F_n, in the sense that
F_n(−∞) → F(−∞) and F_n(+∞) → F(+∞), then F_n(+∞) − F_n(−∞) = 1 implies in this
case that F is a distribution. In particular, if {F_n, n ≥ 1} is a sequence of d.f.s
on a compact interval [a, b] ⊂ R, then the limit of any convergent subsequence
is necessarily a d.f., since {a, b} may be included in the set {r_i, i ≥ 1}.

(3) The preceding theorem can be stated as follows: A uniformly bounded
sequence of nondecreasing functions on R is weakly sequentially compact in
the sense that it has a convergent subsequence whose limit is a bounded nondecreasing
function.

The next theorem supplements the above result and is very useful in our
study. It should be contrasted with the Lebesgue limit theorems, for which
the integrands vary and the measure space is fixed, whereas the opposite is
true in the following. It is due to E. Helly in a special case, and the general
case to H.E. Bray in 1919. The connection between these two viewpoints is
clarified in an alternative proof below.

Theorem 2 (Helly-Bray) Let {G_n, n ≥ 1} be a sequence of individually
bounded nondecreasing functions on R. If there exists a bounded nondecreasing
function G on R such that

(i) lim_{n→∞} G_n(x) = G(x) at all continuity points x of G,

(ii) lim_{n→∞} G_n(±∞) = G(±∞) in the sense that

    lim_{n→∞} lim_{x→+∞} G_n(x) = lim_{x→+∞} G(x),

and similarly for x → −∞, then for any bounded continuous function f : R →
R, we have

    lim_{n→∞} ∫_R f(x) dG_n(x) = ∫_R f(x) dG(x).    (1)

Proof Since G is nondecreasing and bounded, given a δ > 0, there is an
N_δ ≥ 1 such that n ≥ N_δ implies, by hypothesis (ii),

    |G_n(+∞) − G(+∞)| < δ and |G_n(−∞) − G(−∞)| < δ.

Hence {G_n, n ≥ 1} is uniformly bounded. Thus there is an M < ∞, with
|G_n(x)| ≤ M for all x ∈ R and n ≥ 1. Next, for any a > 0, we consider the
decomposition

    I_n = ∫_R f dG_n − ∫_R f dG = I_n^l + I_n^m + I_n^r,

where I_n^l, I_n^m, I_n^r collect the contributions of (−∞, −a), [−a, a], and
(a, +∞), respectively.

It is to be shown that |I_n| → 0 as n → ∞. This is accomplished by estimating
the right side terms and showing that each goes to zero.

Since f is bounded, there is a c > 0 with |f(x)| ≤ c, x ∈ R. If ε > 0 is
given, we choose a as before, so that ±a are continuity points of G and

    G(−a) − G(−∞) + G(+∞) − G(a) < ε/8c.    (2)

This is possible since G is bounded and its continuity points are dense in R.
By (i) and (ii), we may choose an N_1(ε) such that n ≥ N_1(ε) implies

    |G_n(±a) − G(±a)| < ε/16c, |G_n(±∞) − G(±∞)| < ε/16c.    (3)

Then

    |I_n^l| ≤ c[G_n(−a) − G_n(−∞) + G(−a) − G(−∞)].

Similarly

    |I_n^r| ≤ c[G_n(+∞) − G_n(a) + G(+∞) − G(a)].

Adding these two and using (2) and (3), we get

    |I_n^l| + |I_n^r| < ε/2, n ≥ N_1(ε).    (4)

For I_n^m, since |G_n| ≤ M and [−a, a] is a compact interval, divide [−a, a]
at the x_i into m subintervals such that the oscillation of f on each is bounded
by ε/16M, where −a = x_0 < x_1 < . . . < x_m = a, the x_i also being continuity
points of G. All this is clearly possible. Hence, approximating f on [−a, a] by
the step function equal to f(x_i) on [x_{i-1}, x_i),

    |I_n^m| ≤ ε/4 + 2c Σ_{i=0}^{m} |G_n(x_i) − G(x_i)|.

Let N_2(ε) be chosen so that n ≥ N_2(ε) ⇒ |G_n(x_i) − G(x_i)| < ε/16mc, i =
0, 1, . . . , m. Then the above inequality becomes

    |I_n^m| < ε/2, n ≥ N_2(ε).    (5)

Thus (4) and (5) imply, for n ≥ max(N_1(ε), N_2(ε)),

    |I_n| ≤ |I_n^l| + |I_n^m| + |I_n^r| < ε.

This completes the proof of the theorem.

Note that, as the example in the first part of the above remark shows,
condition (ii) of the hypothesis in the Helly-Bray theorem is essential for the
conclusion (1). In that example, F_n is the d.f. of a discrete r.v. X_n, and
P[X_n > a] → 1 as n → ∞ for each a > 0. Thus the probability "escapes
to infinity," and condition (ii) is there simply to prevent this phenomenon from
happening, so that (1) is true.

We present an alternative sketch of the important Helly-Bray theorem by
reducing it to the Lebesgue bounded convergence theorem through the image probability
law (cf. 1.4.1) and the representation in Problem 5b of Chapter 2. [Readers
should complete the details given as hints there, if they have not already done
so. A more general case will be proved in Proposition 5.4.2.]

Alternative Proof For simplicity we take G_n, G as d.f.s. (The general case,
which can be reduced to this, is left to the reader.) Thus G_n(x) → G(x) as
n → ∞ for all continuity points x of G. Hence by the above noted problem,
there exists a probability space (Ω, Σ, P) and a sequence of r.v.s X_n, X on it
such that

    P[X_n < x] = G_n(x), P[X < x] = G(x), x ∈ R,

and X_n → X a.e. [In fact, (Ω, Σ, P) can be taken as the Lebesgue unit interval,
and X_n(ω) = G_n^{-1}(ω), ω ∈ (0, 1), where G_n^{-1} is the (generalized) inverse of G_n,
i.e., G_n^{-1}(ω) = inf{y ∈ R : G_n(y) > ω}, and similarly X(ω) = G^{-1}(ω), ω ∈ (0, 1).]
Now if f is as given, f(X_n) → f(X) a.e., and f(X) is an r.v. By the image
law 1.4.1, we have

    ∫_R f(x) dG_n(x) = ∫_Ω f(X_n) dP
        → ∫_Ω f(X) dP (by the Lebesgue bounded convergence theorem)
        = ∫_R f(x) dG(x) (by the same image law).

This is (1) and terminates the proof.
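The representation in this proof is easy to simulate. A minimal sketch (assuming SciPy's norm for the Gaussian d.f.; the particular G_n are an illustrative choice) realizes X_n = G_n^{-1}(ω) on the Lebesgue unit interval and checks (1):

    import numpy as np
    from scipy.stats import norm

    # G_n = d.f. of N(1/n, 1), G = d.f. of N(0, 1); G_n -> G everywhere.
    # Quantile representation on the Lebesgue unit interval:
    # X_n(w) = G_n^{-1}(w), X(w) = G^{-1}(w), and X_n -> X a.e.
    w = np.random.default_rng(0).uniform(size=100_000)
    f = lambda x: np.cos(x)                  # a bounded continuous f

    for n in (1, 10, 100):
        Xn = norm.ppf(w, loc=1.0 / n)        # X_n = G_n^{-1}(w)
        print(n, f(Xn).mean())               # approximates the integral of f dG_n
    print("limit", f(norm.ppf(w)).mean())    # approximates the integral of f dG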

The technique of the second proof above will be used to establish the following
result on the convergence of moments.

Proposition 3 Let {X_n, n ≥ 1} be a sequence of r.v.s on (Ω, Σ, P) such
that E(|X_n|^s) ≤ K_0 < ∞ for all n ≥ 1 and some s > 0. If X_n → X in distribution,
then E(|X_n|^r) → E(|X|^r) for each 0 < r < s, so that the rth-order
absolute moments also converge.

Proof Let F_n(x) = P[X_n < x], F(x) = P[X < x], x ∈ R. Then F_n(x) →
F(x) at all continuity points x of F, by hypothesis. Hence by the technique
used in the last proof, there exists an auxiliary probability space (Ω', Σ', P')
and a sequence {Y_n, n ≥ 1} of r.v.s on it such that Y_n → Y a.e., and P'[Y_n <
x] = F_n(x), P'[Y < x] = F(x), x ∈ R. Thus by the image law, we also have

    E(|X_n|^r) = ∫_R |x|^r dF_n(x) = ∫_{Ω'} |Y_n|^r dP',    (6)

and

    E(|X|^r) = ∫_R |x|^r dF(x) = ∫_{Ω'} |Y|^r dP'.    (7)

Since 0 < r < s, the second condition implies the uniform integrability of
{|Y_n|^r, n ≥ 1}. Indeed, we have for any α > 0,

    ∫_{[|Y_n|^r ≥ α]} |Y_n|^r dP' ≤ α^{1 − s/r} ∫_{Ω'} |Y_n|^s dP' ≤ K_0 α^{1 − s/r} → 0 as α → ∞,

uniformly in n. Hence, since |Y_n|^r → |Y|^r a.e., by the Vitali convergence
theorem we deduce that

    ∫_{Ω'} |Y_n|^r dP' → ∫_{Ω'} |Y|^r dP'.    (8)

From (6)-(8), it follows that E(|X_n|^r) → E(|X|^r), as asserted.



Using the same ideas as in the above proof, we can also deduce the following
result, which complements Theorem 2 in some respects.

Proposition 4 Let {F_n, F, n ≥ 1} be d.f.s such that F_n → F at all
continuity points of F. If f_n : R → R are bounded continuous functions such
that f_n → f uniformly, then we can conclude that

    lim_{n→∞} ∫_R f_n(x) dF_n(x) = ∫_R f(x) dF(x).    (9)

Proof Using the same representation as in the preceding proof, we note
that there is a probability space (Ω, Σ, P) and r.v.s X_n and X on it such that
X_n → X a.e. and P[X_n < x] = F_n(x), P[X < x] = F(x), x ∈ R. But then
(9), by the image law, is equivalent to

    lim_{n→∞} ∫_Ω f_n(X_n) dP = ∫_Ω f(X) dP.    (10)

Since f_n → f uniformly, so that f is also bounded and continuous, we deduce
that f_n(X_n) → f(X) a.e., and by the bounded convergence theorem (10) holds. Again
by the image law theorem, (9) follows.

A direct proof of (9), without the representation, to get (10) is possible,
and it is similar to that of the Helly-Bray theorem. But it is not as elegant as the
above one.

Actually, the converse to (1), and hence a characterization of the convergence
in distribution, is also true. If F and G are two d.f.s on R, then we
define the Lévy metric between them as

    d(F, G) = inf{ε > 0 : F(x − ε) − ε ≤ G(x) ≤ F(x + ε) + ε, x ∈ R}.    (11)

It is not difficult to verify that d(·, ·) is a distance function on the space M
of all d.f.s on R. A verification of the metric axioms will be left to the reader.
We have several characterizations of the concept in the following:

Theorem 5 Let {F_n, F, n ≥ 1} be distribution functions on R. Then the
following statements are mutually equivalent:

(i) F_n → F at all continuity points of the latter.

(ii) lim_{n→∞} ∫_R f(x) dF_n(x) = ∫_R f(x) dF(x) for all bounded continuous
f : R → R.

(iii) d(F_n, F) → 0 as n → ∞.

(iv) If P_n and P are the Lebesgue-Stieltjes measures determined by F_n and
F, then lim sup_n P_n(C) ≤ P(C) for all closed sets C ⊂ R.

(v) If P_n and P are as in (iv), then lim inf_n P_n(D) ≥ P(D) for all open
sets D ⊂ R.

Proof The method of proof here is to show that each part is equivalent
to (i) or (ii). Now the Helly-Bray theorem already established (i)⇒(ii).
For the converse (cf. also the remark after the proof), let ε > 0 be given.
If x_0 is a point of continuity of F, there is a δ [= δ(x_0, ε) > 0] such that
|x − x_0| < δ ⇒ |F(x) − F(x_0)| < ε/2. We construct two bounded continuous
functions f_1 ≤ f_2 on R as follows: f_1 = 1 on (−∞, x_0 − δ], f_1 = 0 on [x_0, ∞),
and f_1 is linear on [x_0 − δ, x_0]; f_2 = 1 on (−∞, x_0], f_2 = 0 on [x_0 + δ, ∞),
and f_2 is linear on [x_0, x_0 + δ]. By hypothesis, for these f_1, f_2, there exists
an N [= N(ε, f_1, f_2)] such that n ≥ N implies

    |∫_R f_i dF_n − ∫_R f_i dF| < ε/2, i = 1, 2.

Hence

    F_n(x_0) ≥ ∫_R f_1 dF_n > ∫_R f_1 dF − ε/2 ≥ F(x_0 − δ) − ε/2 > F(x_0) − ε.    (13)

Similarly

    F_n(x_0) ≤ ∫_R f_2 dF_n < ∫_R f_2 dF + ε/2 ≤ F(x_0 + δ) + ε/2 < F(x_0) + ε.    (14)

From (13) and (14) we get

    |F_n(x_0) − F(x_0)| < ε, n ≥ N.    (15)

Since x_0 ∈ R is an arbitrary continuity point of F, (15) implies that F_n → F
at all continuity points of F, and (i)⇔(ii) is established.

(i)⇒(iii) If F_n → F, for each ε > 0, choose ±a as continuity points of F
such that F(−a) < ε/2, 1 − F(a) < ε/2. Partition the compact interval [−a, a]
as −a = a_0 < a_1 < . . . < a_m = a, with a_i − a_{i-1} < ε, i = 1, . . . , m, and also
the a_i as continuity points of F. Let N be chosen such that for n ≥ N we have,
by (i),

    |F_n(a_i) − F(a_i)| < ε/2, i = 0, 1, . . . , m.    (16)

To show that the F_n, F satisfy (11), let x ∈ R be arbitrary. If x ≤ −a = a_0,
then by the monotonicity of F_n and F, and the inequalities (16), we have

    F_n(x) ≤ F_n(a_0) ≤ F(a_0) + ε/2 < ε + F(x),
    F_n(x) ≥ 0 ≥ F(a_0) − ε/2 > F(x) − ε [since F(a_0) < ε/2].    (17)

If a_{j-1} ≤ x ≤ a_j, then by (16), and since a_j − a_{j-1} < ε, j = 1, . . . , m,

    F_n(x) ≤ F_n(a_j) ≤ F(a_j) + ε/2 ≤ F(x + ε) + ε,
    F_n(x) ≥ F_n(a_{j-1}) ≥ F(a_{j-1}) − ε/2 ≥ F(x − ε) − ε.    (18)

Similarly if x ≥ a_m, one gets

    F(x − ε) − ε ≤ F_n(x) ≤ F(x + ε) + ε.    (19)

By (17)-(19) and (11) we have d(F_n, F) ≤ ε. Thus (iii) holds.

Conversely, if (iii) is given, let ε > 0 and let x_0 be a continuity point of F.
Then there is a δ_1 [= δ_1(x_0, ε) > 0] such that for 0 < δ < δ_1, |x − x_0| < δ ⇒
|F(x) − F(x_0)| < ε/2. If η = min(ε/2, δ) > 0, then by (iii) there exists N_0 such
that n ≥ N_0 ⇒ d(F_n, F) < η, and from (11) we have

    F(x_0 − η) − η ≤ F_n(x_0) ≤ F(x_0 + η) + η.

Hence |F_n(x_0) − F(x_0)| < ε, and (i) follows. [(i)⇒(iii) can also be quickly
proved using Exercise 5(b) of Chapter 2 again. Thus F_n → F, a d.f., ⇒ X_n → X
a.e. on (Ω', Σ', P'), F_n = F_{X_n}, F = F_X. So for ε > 0 there is an n_0, n ≥ n_0 ⇒
P'[|X_n − X| < ε] ≥ 1 − ε. If Ω_0 = [|X_n − X| < ε], then on Ω_0, X_n − ε < X <
X_n + ε ⇒ F_n(x − ε) − ε ≤ F(x) ≤ F_n(x + ε) + ε. Hence d(F_n, F) ≤ ε.]

(ii)⇒(v) If P and P_n are as given, then (ii) may be written as

    ∫_R f dP_n → ∫_R f dP, f : R → R bounded and continuous.    (20)

Let A ⊂ R be an open set. Then χ_A is a lower semicontinuous (l.s.c.) function.
Recall that a function h : R → R is l.s.c. if h(y) ≤ lim inf_{x→y} h(x) for each
y ∈ R. A classical result from advanced calculus says that h is l.s.c. iff there
exists a sequence of continuous functions h_n (≥ 0 if h ≥ 0) such that h_n(x) ↑
h(x) for each x. We use this here. Thus let 0 ≤ h_n ↑ χ_A pointwise, where h_n
is continuous on R. Let k be a fixed integer, and by (ii) for each ε there is an
n_0 [= n_0(ε, h_k) ≥ 1] such that n ≥ n_0 implies

    ∫_R h_k dP_n > ∫_R h_k dP − ε.

Since 0 ≤ h_k ≤ χ_A, we also have ∫_R h_k dP_n ≤ P_n(A), so that

    lim inf_n P_n(A) ≥ ∫_R h_k(x) P(dx) − ε.    (21)

Letting k → ∞ in this sequence, we get by the monotone convergence theorem

    lim inf_n P_n(A) ≥ P(A) − ε.

Since ε > 0 is arbitrary, (v) is established, i.e., (ii)⇒(v).


If C c R is any closed set, then (v) implies

and the left side is equal to 1 - lim sup, P,(C). Hence (iv) is true. Conversely,
if (iv) holds, then considering the complements as here, we get (v). Thus
(iv)@(v) is always true.
( v ) ~ ( i v together
) imply (i). Indeed, let A c R be a Borel set whose
boundary has P-measure zero. [For example, if A = (a, b), then {a), {b) have
P-measure zero, which is equivalent to saying that a , b are contiiiuity points
of F.]Thus P(A - int(A)) = 0, where A is the closure of A and int(A) is the
interior of A. Thus by (v) and its equivalence with (iv) we have

P(A) > lim sup P, (A) > lim sup P, (A) > limninf P, (A)
n n
> limninf P, (int (A)) > P(int (A)) [by (v)]

But the extremes are equal. Thus limn Pn(A) = P ( A ) for every Borel A whose
boundary has P-measure zero. In particular, if A, = (-cm, x ) and noting that
Fn(s)= Pn(A,), F(s)= P(A,), this yields (i). Thus the proof is complete.

Remark Since the proof is given as (i)⇒(ii)⇒(v)⇔(iv)⇒(i) and (i)⇔(iii),
there is redundancy in showing separately that (ii)⇒(i). However, the separate
argument shows that this implication is true if we assume (ii) only for the
(subclass of) uniformly continuous bounded functions f. This insight is useful,
in addition to the fact that (i)⇔(ii) is the most important and often used part
of the above result.
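Statement (iii) can also be checked experimentally. The sketch below (illustrative only; the grid and the discretized infimum in (11) are assumptions of the demo) computes approximate Lévy distances d(F_n, F) for d.f.s converging as in (i):

    import numpy as np
    from scipy.stats import norm

    def levy_distance(F, G, grid):
        # smallest eps on a grid with F(x-eps)-eps <= G(x) <= F(x+eps)+eps
        for eps in np.linspace(1e-3, 1.0, 1000):
            if np.all((F(grid - eps) - eps <= G(grid)) &
                      (G(grid) <= F(grid + eps) + eps)):
                return eps
        return 1.0

    grid = np.linspace(-5.0, 5.0, 1001)
    for n in (1, 4, 16, 64):
        Fn = lambda x, n=n: norm.cdf(x - 1.0 / np.sqrt(n))   # N(1/sqrt(n), 1)
        print(n, levy_distance(norm.cdf, Fn, grid))          # decreases to 0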
We now present a complement to the above theorem. This is a partial
converse to Proposition 3. An annoying problem here is that the moments of a
sequence of d.f.s may converge without the d.f.s themselves converging. Further,
a sequence of numbers need not be the moments of a d.f.; and even if they happen
to be moments, it is possible that two different d.f.s have the same set of
moments. (We give two examples in Problem 5.) Thus, with restrictions to exclude
all these difficulties, we can present the following relatively simple result.

Proposition 6 Let {F_n, n ≥ 1} be a sequence of d.f.s having moments of
all orders {M_n^{(k)}, k ≥ 1}. Thus we have

    M_n^{(k)} = ∫_R x^k dF_n(x), k ≥ 1, n ≥ 1.

If lim_{n→∞} M_n^{(k)} = α^{(k)} exists for each k ≥ 1, and if {α^{(k)}, k ≥ 1} determines
a distribution F uniquely, then F_n(x) → F(x), as n → ∞, holds for each x
which is a continuity point of F, so that F_n → F as n → ∞.

Proof By Theorem 1, {F_n, n ≥ 1} has a convergent subsequence. Thus
there exists a nondecreasing left continuous F, 0 ≤ F(·) ≤ 1, such that for
some {F_{n_k}, k ≥ 1}, we have F_{n_k} → F. To see that F is a d.f., we need to show
that F(+∞) − F(−∞) = 1. Indeed, given ε > 0, choose a large enough so
that ±a are continuity points of F and α^{(2)}/a^2 < ε. This is clearly possible.
Then

    F(a) − F(−a) = lim_{k→∞} [F_{n_k}(a) − F_{n_k}(−a)]
        ≥ lim_{k→∞} [1 − (1/a^2) ∫_{[|x|>a]} x^2 dF_{n_k}(x)]
        ≥ lim_{k→∞} [1 − (1/a^2) ∫_R x^2 dF_{n_k}(x)]
        = 1 − α^{(2)}/a^2 [since M_{n_k}^{(2)} → α^{(2)} by hypothesis]
        > 1 − ε.

Since ε > 0 is arbitrary, we deduce that F(·) is a d.f., and F_{n_k} → F.

Hence there is a probability space (Ω, Σ, P) (as in the proof of Proposition
4) and a sequence of r.v.s {X_{n_k}, Y, k ≥ 1} such that X_{n_k} → Y a.e.
and P[X_{n_k} < x] = F_{n_k}(x), P[Y < x] = F(x), x ∈ R. Also for each integer
p ≥ 1, 0 ≤ M_{n_k}^{(2p)} → α^{(2p)}, so that {M_{n_k}^{(2p)}, k ≥ 1} is bounded.
By Proposition 3, E(|X_{n_k}|^q) → E(|Y|^q), 0 < q < 2p. Hence by Proposition
1.4.6, {|X_{n_k}|^q, k ≥ 1} is uniformly integrable. But this implies that
{X_{n_k}^q, k ≥ 1}, q ≥ 1 an integer, is also uniformly integrable. Consequently by
Theorem 1.4.4, E(X_{n_k}^q) → E(Y^q), and the α^{(q)}, 1 ≤ q < 2p, are the qth
moments of F. Since p is arbitrary, it follows that {α^{(q)}, q ≥ 1} are all the
moments of F, and by hypothesis these determine F uniquely.

If {F_{n'}, n' ≥ 1} ⊂ {F_n, n ≥ 1} is any other convergent subsequence,
then the preceding paragraph shows that F_{n'} → F', a d.f. Since F' also has
{α^{(q)}, q ≥ 1} as its moments, by the uniqueness hypothesis F = F'. But by
Theorem 5(iii), the set of distribution functions on R is a metric space under
convergence in the d.f. topology, and in a metric space a sequence converges iff
each of its convergent subsequences has the same limit. Thus the full sequence
{F_n, n ≥ 1} converges to F, completing the proof.

This result is of use in applications only if we have some criteria for the
unique determination of d.f.s by their moments. The question involved here
is nontrivial, and there has been a considerable amount of research on what
is called the "moment problem." For an account, see Shohat and Tamarkin
(1950). For instance, if S = {x : 0 < F(x) < 1} ⊂ R is bounded, then F is uniquely
determined by its moments. This and certain other easily verifiable sufficient
conditions can be obtained from the work on characteristic functions of distribution
functions. We now turn to a detailed analysis of these functions and
devote the rest of the chapter to this topic, as it is one of the most effective
tools in the subject.

4.2 Characteristic Functions, Inversion, and
Lévy's Continuity Theorem

In the preceding section some properties of distribution functions (d.f.s)
have been given. For a finer analysis, however, we need to use the full structure
of the range space, namely R, and turn to Fourier transforms of d.f.s. These
are called characteristic functions (ch.f.s) in probability theory, and their special
properties that are of immediate interest will be considered here. Thus if
F : R → R is a d.f., then we define its ch.f. by the Lebesgue-Stieltjes integral:

    φ(t) = ∫_R e^{itx} dF(x), t ∈ R.    (1)

This concept was already introduced for, and the uniform continuity of φ :
R → C was established in, Proposition 1.4.2. It is clear that complex analysis
plays a role, since generally φ is complex valued. Note that by the image
probability law (Theorem 1.4.1), (1) is equivalent to saying that, if X is an
r.v. on (Ω, Σ, P) with F as its d.f., then

    φ(t) = E(e^{itX}) = ∫_Ω e^{itX} dP, t ∈ R.    (2)

Hence if X_1, X_2 are independent r.v.s on (Ω, Σ, P), then

    E(e^{it(X_1+X_2)}) = E(e^{itX_1}) E(e^{itX_2}) = φ_{X_1}(t) φ_{X_2}(t).    (3)

In particular, if X = X_1 + X_2, then

    φ_X(t) = φ_{X_1}(t) φ_{X_2}(t), t ∈ R.    (4)
On the other hand, by the image law theorem one has

    F_X(x) = P[X_1 + X_2 < x] = ∫_R F_{X_1}(x − y) dF_{X_2}(y) = (F_{X_1} * F_{X_2})(x).    (5)

Thus F_X is the convolution of F_{X_1} and F_{X_2} (already seen in Problem 6(b) of
Chapter 2). Also it is a commutative operation. Equations (4) and (5) together
imply that the ch.f. of the convolution of a pair of d.f.s is the product of their
ch.f.s. It is also clear from (5) that, if F_{X_1} and F_{X_2} have densities f_1 and
f_2 relative to the Lebesgue measure, then the convolution F_X = F_{X_1} * F_{X_2}
becomes

    F_X(x) = ∫_R [∫_{(−∞, x−y)} f_1(u) du] f_2(y) dy,    (6)

and hence by the fundamental theorem of calculus (or the Radon-Nikodým
theorem in the general case) we conclude that F_X again has a density f, and
from (6)

    f(x) = ∫_R f_1(x − y) f_2(y) dy.    (7)

Note also that (5) implies F_X is absolutely continuous if either F_{X_1} or F_{X_2}
has this property.

To get some feeling for these functions, we list a set of basic d.f.s that
occur frequently in the theory, and then their ch.f.s will be given.

1. Gaussian or normal [often denoted N(μ, σ^2)]:

    F(x) = (1/σ√(2π)) ∫_{−∞}^{x} exp(−(u − μ)^2/2σ^2) du, x ∈ R, μ ∈ R, σ > 0.

2. Poisson:

    F(x) = e^{−λ} Σ_{0≤k<x} λ^k/k!, x > 0, λ > 0 (we take λ^0 = 1 for λ > 0);
    F(x) = 0 if x ≤ 0.

3. Cauchy:

    F(x) = (β/π) ∫_{−∞}^{x} du/(β^2 + (u − α)^2), x ∈ R, α ∈ R, β > 0.

4. Gamma:

    F(x) = (1/Γ(p)α^p) ∫_0^x u^{p−1} e^{−u/α} du for x > 0; = 0 for x ≤ 0 (α > 0, p > 0).

5. Uniform:

    F(x) = 0 for x < a; = (x − a)/(b − a) for a ≤ x < b; = 1 for x ≥ b (a < b in R).

6. Bernoulli:

    F(x) = 0 for x ≤ 0; = q for 0 < x ≤ 1; = 1 for x > 1 (0 < p < 1, q = 1 − p).

7. Unitary (or degenerate):

    F(x) = 0 for x ≤ a; = 1 for x > a (a ∈ R).

8. Binomial:

    F(x) = Σ_{0≤k<x} (n choose k) p^k q^{n−k} for x > 0; = 0 for x ≤ 0 (0 < p < 1, q = 1 − p).

The respective ch.f.s are given by

    1. φ(t) = exp(iμt − σ^2 t^2/2),
    2. φ(t) = exp(λ(e^{it} − 1)),
    3. φ(t) = exp(iαt − β|t|),
    4. φ(t) = (1 − iαt)^{−p},
    5. φ(t) = (e^{itb} − e^{ita})/(it(b − a)),
    6. φ(t) = q + p e^{it},
    7. φ(t) = e^{ita},
    8. φ(t) = (q + p e^{it})^n.

We leave a verification of these formulas to the reader, with a reminder that one
can use, for convenience, the calculus of residues for some of these evaluations
(such is the case with the Cauchy and Gaussian distributions).
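These closed forms are easy to spot-check by simulation, since φ(t) = E(e^{itX}). A small sketch (illustrative only) compares Monte Carlo estimates with formulas 1 and 2 of the list, and with the product rule (4):

    import numpy as np

    rng = np.random.default_rng(1)
    t = 0.7

    # 1. Gaussian N(mu, sigma^2): phi(t) = exp(i mu t - sigma^2 t^2 / 2).
    mu, sigma = 0.5, 2.0
    X = rng.normal(mu, sigma, size=200_000)
    print(np.exp(1j * t * X).mean(), np.exp(1j * mu * t - sigma**2 * t**2 / 2))

    # 2. Poisson(lam): phi(t) = exp(lam (e^{it} - 1)).
    lam = 3.0
    Y = rng.poisson(lam, size=200_000)
    print(np.exp(1j * t * Y).mean(), np.exp(lam * (np.exp(1j * t) - 1)))

    # Independent sum: ch.f. of X + Y is the product of the ch.f.s, Eq. (4).
    print(np.exp(1j * t * (X + Y)).mean(),
          np.exp(1j * t * X).mean() * np.exp(1j * t * Y).mean())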

The interest in ch.f.s stems from the fact that there is a one-to-one relation
between the d.f.s and their ch.f.s, and since the latter are (uniformly) continuous,
they are more suitable for a finer analysis using the well-developed results
from Fourier transform theory. The uniqueness statement is a consequence of
the following important result.

Theorem 1 (Lévy's Inversion Formula, 1925) Let F be a d.f. with ch.f.
φ. If h > 0 and a ± h are continuity points of F, then we have

    F(a + h) − F(a − h) = lim_{T→∞} (1/π) ∫_{−T}^{T} ((sin ht)/t) e^{−ita} φ(t) dt.    (8)

Moreover, if φ is Lebesgue integrable, then F has a continuous bounded density,
and (8) reduces to

    F'(x) = f(x) = (1/2π) ∫_R e^{−itx} φ(t) dt.    (9)

Proof The importance of the result lies in the discovery of the formula
(8). Once it is given, its truth can be ascertained by substitution as follows.

    (1/π) ∫_{−T}^{T} ((sin ht)/t) e^{−ita} φ(t) dt
        = (1/π) ∫_{−T}^{T} ((sin ht)/t) e^{−ita} [∫_R e^{itx} dF(x)] dt
        = ∫_R [(1/π) ∫_{−T}^{T} ((sin ht)/t) e^{it(x−a)} dt] dF(x) (by Fubini's theorem)
        = ∫_R G_T(x) dF(x),    (10)

where

    G_T(x) = (2/π) ∫_0^T ((sin ht)/t) cos t(x − a) dt
        [since ((sin ht)/t) cos t(x − a) is an even function of t,
        while ((sin ht)/t) sin t(x − a) is an odd function of t]
        = (1/π) ∫_0^T (sin(x − a + h)t − sin(x − a − h)t)/t dt
        = (1/π) ∫_0^{(x−a+h)T} ((sin u)/u) du − (1/π) ∫_0^{(x−a−h)T} ((sin u)/u) du.

We recall from advanced calculus that ∫_0^∞ ((sin u)/u) du = π/2. Then

    lim_{T→∞} G_T(x) = (1/2)[sgn(x − a + h) − sgn(x − a − h)]
        = 0 if x < a − h; = 1/2 if x = a − h; = 1 if a − h < x < a + h;
          = 1/2 if x = a + h; = 0 if x > a + h.

Here "sgn" is the signum function: sgn(x) = 1 for x > 0, = 0 for x = 0, and
= −1 for x < 0. Substituting this in (10), we get by the bounded convergence theorem

    lim_{T→∞} (1/π) ∫_{−T}^{T} ((sin ht)/t) e^{−ita} φ(t) dt = ∫_{a−h}^{a+h} 1 dF(x) = F(a + h) − F(a − h),

since (dF)({a ± h}) = 0 by hypothesis. This establishes (8).

For the second part, let φ be integrable. Dividing both sides of (8) by 2h,
and noting that (sin ht)/ht is bounded (by 1) for all t, we can first let T → ∞
and then h → 0, both by the dominated convergence theorem, since |φ(·)| is
the dominating integrable function. It follows that the right side has a limit
and so must the left side; i.e., F'(a) exists.² Since

    lim_{h→0} (sin ht)/(ht) = 1,

(8) reduces to (9). Further, for all x it is clear that |f(x)| ≤ (1/2π) ∫_R |φ(t)| dt, and
f is bounded. Also, expressing φ by its real and imaginary parts and the latter
by their positive and negative parts, we deduce that φ is a linear combination
of four nonnegative integrable functions (since |φ| is), and (9) implies that f
is a sum of four terms, each of which is a Fourier transform of a nonnegative
integrable function. By Proposition 1.4.2, it follows that each of the terms is
continuous, and hence so is f. This proves the theorem completely.
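Formula (9) also lends itself to a direct numerical check: a quadrature of the right side of (9) for the standard Gaussian ch.f. φ(t) = e^{−t^2/2} should return the N(0,1) density. A rough sketch (the truncation of the t-range is an assumption of the demo):

    import numpy as np

    phi = lambda t: np.exp(-t**2 / 2)          # ch.f. of N(0, 1); integrable

    t = np.linspace(-40.0, 40.0, 400_001)      # phi is ~0 beyond this range
    dt = t[1] - t[0]

    for x in (0.0, 1.0, 2.0):
        # f(x) = (1/2 pi) * integral of e^{-itx} phi(t) dt, Eq. (9)
        f = ((np.exp(-1j * t * x) * phi(t)).sum() * dt / (2 * np.pi)).real
        print(x, f, np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))   # agree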

We can now give the desired uniqueness assertion.

"ate that the existence of a symmetric derivative is equivalent to the existence


of ordinary derivative for d.f.s. [Use the Lebesgue decomposition and write F =
+ +
Fa F, Fd and observe that the singular and discrete parts F,, Fd have no
contribution and then F' = FL, the absolutely continuous part. We leave the
details to the reader .]

Corollary 2 (Uniqueness Theorem) A d.f. is uniquely determined by its
ch.f.

Proof By definition, every d.f. has a ch.f. associated with it. If two d.f.s F_1
and F_2 have the same ch.f. φ, we need to show that F_1 = F_2. To this end,
since each F_i has at most a countable set of discontinuities, the collection of
continuity points common to both F_1 and F_2, say C_0, is the complement of a countable
set, and hence is everywhere dense in R. Let a_i ∈ C_0, i = 1, 2, a_1 < a_2. Then
by (8),

    F_1(a_2) − F_1(a_1) = F_2(a_2) − F_2(a_1),    (11)

since their right sides are equal. If P_i is the Lebesgue-Stieltjes probability
determined by F_i on R, then (11) implies that P_1(A) = P_2(A) for all intervals
A ⊂ R with end points in C_0. Consequently, P_1 and P_2 agree on the semiring
generated by such intervals. Since C_0 is dense in R, the σ-algebra generated
by this semiring is the Borel σ-algebra of R. But the P_i are σ-additive on the
semiring, and agree there. By the Hahn extension theorem, they have unique
extensions to the Borel σ-algebra of R and agree there. Thus P_1 = P_2 on this
σ-algebra, so that if A = (−∞, x), x ∈ R, which is a Borel set, we get

    F_1(x) = P_1(A) = P_2(A) = F_2(x), x ∈ R,

and the d.f.s are identical. (A direct proof of this for d.f.s is also easy.)

In view of the preceding uniqueness theorem, it is quite desirable to have
various properties of ch.f.s at our disposal, both for further work on the subject
and for a better understanding of their structure. The following formula
is used for this purpose.

Proposition 3 Let F be a d.f. and φ its ch.f. Then for any b > 0, we
have

    ∫_0^b [F(a + x) − F(a − x)] dx = (1/π) ∫_R ((1 − cos bt)/t^2) e^{−ita} φ(t) dt, a ∈ R.    (12)

Proof Replacing φ by its definition and simplifying the right side, exactly
as in the proof of Theorem 1, we get the left side. However, for variety we
present an alternative argument, following H. Cramér, and deduce the result
from Theorem 1.

Let h > 0 be arbitrarily fixed, and consider G(x) = (1/h) ∫_x^{x+h} F(y) dy. Then
G is a continuous d.f. In fact, if F̃ is the uniform d.f. on (−h, 0), as defined
in the list above, then G(x) = ∫_R F(x − y) dF̃(y) = (F * F̃)(x) is the convolution.
Since F̃ is continuous, G is also. Let ψ be the ch.f. of G. Then
ψ(t) = φ(t)φ̃(t), t ∈ R, where φ̃(t) = (1 − e^{−ith})/(ith), φ̃(·) being the ch.f. of F̃.
Hence by Theorem 1,

    G(a + h) − G(a) = lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−ita} − e^{−it(a+h)})/(it)) ψ(t) dt.    (13)

Since the integrand is dominated by (4/h) sin^2(ht/2)/t^2, which is integrable on R,
we can take the limit as T → ∞ here, and then, on substitution for G and ψ
in terms of F and φ, we get from (13)

    (1/h) ∫_0^h [F(a + h + u) − F(a + u)] du = (1/πh) ∫_R ((1 − cos ht)/t^2) e^{−it(a+h)} φ(t) dt.    (14)

Replacing the integration variable by h − u in the second term on the left, the
left side becomes (1/h) ∫_0^h [F(a + h + u) − F(a + h − u)] du. Let a + h = α and
finally h = b in (14); it reduces to (12), as claimed.

We are now ready to establish the fundamental continuity theorem, due to
P. Lévy, who discovered it in 1925.

Theorem 4 (Continuity Theorem for ch.f.s) Let {F_n, n ≥ 1} be a sequence
of d.f.s and {φ_n, n ≥ 1} be their respective ch.f.s. Then there exists a
d.f. F on R such that F_n(x) → F(x) at all continuity points of the latter iff
φ_n(t) → φ(t), t ∈ R, where φ is continuous at t = 0. When the last condition
holds, then φ is the ch.f. of F.

Proof The necessity follows from our previous work. Indeed, let F_n → F,
a d.f. Then by the Helly-Bray theorem (cf. Theorem 1.2, which is valid for
complex functions also, by treating separately the real and imaginary parts),
with f(x) = e^{itx},

    φ_n(t) = ∫_R e^{itx} dF_n(x) → ∫_R e^{itx} dF(x) = φ(t), t ∈ R.

Thus φ_n(t) → φ(t), t ∈ R, and φ is the ch.f. of F, and hence is continuous on R.

The converse requires more detail and uses the preceding technical result.
Since {F_n, n ≥ 1} is uniformly bounded, by the Helly selection principle (Theorem 1.1),
there is a subsequence F_{n_k} → F, where 0 ≤ F ≤ 1 and F is a left continuous
nondecreasing function. We first claim that F is a d.f., using the hypothesis
on φ. For this it suffices to show that F(+∞) − F(−∞) = 1, and later we
verify that the whole sequence converges to F.

Let b > 0, and consider Proposition 3 with a = 0 there. Then

    ∫_0^b [F_{n_k}(x) − F_{n_k}(−x)] dx = (1/π) ∫_R ((1 − cos bt)/t^2) φ_{n_k}(t) dt.    (15)

Since

    (1 − cos bt)/t^2 = 2 sin^2(bt/2)/t^2

is integrable on R, by the dominated convergence theorem we can let k → ∞
on both sides of (15) to get

    ∫_0^b [F(x) − F(−x)] dx = (1/π) ∫_R ((1 − cos bt)/t^2) φ(t) dt.

Dividing by b and substituting t = u/b on the right, this becomes

    (1/b) ∫_0^b [F(x) − F(−x)] dx = (1/π) ∫_R ((1 − cos u)/u^2) φ(u/b) du.

Letting b → +∞ and using L'Hôpital's rule on the left and the dominated
convergence theorem on the right, we get

    F(+∞) − F(−∞) = (1/π) ∫_R ((1 − cos u)/u^2) du = 1,

since, by hypothesis, φ is continuous at t = 0, and φ_n(0) = 1, so that
φ(u/b) → φ(0) = 1 as b → +∞. Here we also used again the fact from calculus that

    ∫_R ((1 − cos u)/u^2) du = π.

Thus F is a d.f., and by the necessity proof we can now conclude that φ is
the ch.f. of F.

Let {F_{n'}, n' ≥ 1} be any other convergent subsequence of {F_n, n ≥ 1},
with limit F*. Then by the preceding paragraph F* is a d.f. with ch.f. φ again
(since φ_n → φ implies that every convergent subsequence has the same
limit). By the uniqueness theorem (Corollary 2), F* = F. Hence all convergent
subsequences of {F_n, n ≥ 1} have the same limit d.f. F, so that the whole
sequence F_n converges to the d.f. F with ch.f. φ. This completes the proof.

Remarks (1) The continuity of φ at t = 0 is essential for the truth of the
sufficiency part of the above theorem. For a simple counterexample, consider F_n
defined by

    F_n(x) = 0 for x < −n; = (x + n)/2n for −n ≤ x < n; = 1 for x ≥ n.
Then F_n(x) → 1/2 = F(x), x ∈ R, and F is not a d.f. If φ_n is the ch.f. of F_n,
then it is seen that φ_n(t) = (sin nt)/nt, so that φ_n(t) → φ(t), where φ(t) = 0
if t ≠ 0, = 1 if t = 0. Thus φ(·) is not continuous at t = 0, it is not a ch.f.,
and F is not a d.f.

(2) It should also be noted that φ_n(t) → φ(t), t ∈ R, in the theorem
cannot be replaced by φ_n(t) → φ(t) for |t| ≤ a, a > 0, since two ch.f.s
can agree on such a finite interval without being identical, as the following
example (due to A. Khintchine) shows. Let F_1 have the density f_1 defined by
f_1(x) = (1 − cos x)/πx^2, and let F_2 be discrete with jumps at x = nπ of sizes
2/n^2π^2, n = ±1, ±3, . . . , and 1/2 at 0. Then, using Proposition 3 with b = 1 and
F as the unitary d.f., one finds that the ch.f. of F_1 is φ_1(t) = 1 − |t| for |t| ≤ 1, = 0
for |t| > 1 (cf. Exercises 8 and 9). On the other hand, a direct calculation
with the resulting trigonometric series for φ_2(·) gives

    φ_2(t) = 1/2 + (4/π^2) Σ_{n≥1, n odd} cos(nπt)/n^2,

which is the periodic extension (of period 2) of 1 − |t| from [−1, 1], and so
φ_2(t) = 1 − |t| for |t| ≤ 1, ≠ 0 for |t| > 1. Thus φ_1(t) = φ_2(t) for
|t| ≤ 1, φ_1(t) ≠ φ_2(t) for |t| > 1. (If we expand 1 − |t| in a Fourier series, then
the above expression results.)
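The Khintchine example can be visualized numerically: summing the trigonometric series for φ_2 shows agreement with φ_1 on |t| ≤ 1 and the periodic extension outside. A short sketch (with the series truncated, an assumption of the demo):

    import numpy as np

    n = np.arange(1, 4001, 2)                     # odd n = 1, 3, 5, ...
    phi2 = lambda t: 0.5 + (4 / np.pi**2) * (
        np.cos(np.pi * np.outer(t, n)) / n**2).sum(axis=1)
    phi1 = lambda t: np.maximum(1 - np.abs(t), 0.0)

    t = np.array([0.25, 0.75, 1.5, 2.5])
    print(phi1(t))   # [0.75 0.25 0.   0.  ]
    print(phi2(t))   # ~[0.75 0.25 0.5 0.5]: equals phi1 only on |t| <= 1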

The preceding remark and the theorem itself heighten the interest in
the structure of ch.f.s. First, how does one recognize a uniformly continuous
bounded function to be a ch.f., and second, how extensive and constructible
are they? (These are nontrivial.) Regarding the second problem, note that
since the product of two (or a finite number of) ch.f.s is a ch.f. [cf. (3) and
(4)], we may construct new ones from a given set. In fact, if {φ_n, n ≥ 1} is a
sequence of ch.f.s and c_n ≥ 0 with Σ_{n≥1} c_n = 1, then

    φ = Σ_{n≥1} c_n φ_n    (16)

is also a ch.f. Indeed, if F_n is the d.f. corresponding to the ch.f. φ_n, then
F = Σ_{n≥1} c_n F_n is clearly a d.f., and its ch.f. is given by (16).
The preceding example admits the following extension.

Proposition 5 Let h : G × R → C be a mapping and H : G → [0, 1] be a
d.f. If h(·, t) is continuous for each t ∈ R and h(s, ·) is a ch.f. for each s ∈ G,
then

    φ(t) = ∫_G h(s, t) dH(s), t ∈ R,    (17)

is a ch.f., for G = Z or R. In particular, if ψ : R → C is a ch.f., then for each
λ > 0, φ : t ↦ exp(λ(ψ(t) − 1)) is a ch.f. [Here G ⊂ R is any subset.]

Proof Let G = R. The first part of the hypothesis implies that |h(s, t)| ≤
h(s, 0) = 1, s ∈ R and t ∈ R, so that the integral in (17) exists. By the
representation used before, there is a probability space (Ω, Σ, P) and an r.v.
X : Ω → R with H as its d.f. But the structure theorem of measurable
functions gives the existence of a sequence of simple functions X_n : Ω → R
such that X_n → X pointwise everywhere. If H_n(x) = P[X_n < x], then H_n →
H, and by the Helly-Bray theorem

    ∫_R h(s, t) dH_n(s) → ∫_R h(s, t) dH(s), t ∈ R.    (18)

But H_n is a discrete d.f., since X_n is a discrete r.v. Hence the left-side integral,
for each n, is of the form (16) with a finite sum, so that it is a ch.f., say ψ_n(·).
By (18), ψ_n(t) → φ(t), t ∈ R, as n → ∞. Since h(s, ·) is continuous, it follows
immediately that the right-side integral of (18) is continuous on R. Theorem
4 then implies that φ is a ch.f. The case G = Z is simpler and is left to
the reader.

The last part is immediate. In fact, expanding the exponential,

    exp(λ(ψ(t) − 1)) = Σ_{n≥0} e^{−λ} (λ^n/n!) ψ(t)^n = Σ_{n≥0} a_n(λ) ψ(t)^n,

where a_n(λ) ≥ 0, Σ_{n≥0} a_n(λ) = 1, and ψ(t)^n is a ch.f. for each n. This is thus
of the form (16), so that φ(·) is a ch.f., as shown there. The proof is complete.
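The last part of the proposition is the compound Poisson construction. A quick simulation (illustrative only; Gaussian jumps are an arbitrary choice of ψ) checks that the ch.f. of S = Z_1 + ... + Z_N, with N Poisson(λ) and the Z_k i.i.d. with ch.f. ψ, is exp(λ(ψ(t) − 1)):

    import numpy as np

    rng = np.random.default_rng(2)
    lam, t = 2.5, 0.9
    psi = lambda t: np.exp(-t**2 / 2)          # ch.f. of the N(0,1) jumps

    # Compound Poisson: S = Z_1 + ... + Z_N, N ~ Poisson(lam).
    N = rng.poisson(lam, size=20_000)
    S = np.array([rng.normal(size=n).sum() for n in N])

    print(np.exp(1j * t * S).mean())           # Monte Carlo ch.f. of S
    print(np.exp(lam * (psi(t) - 1)))          # Proposition 5, last part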

Using this proposition, it is clear that we can generate a great many ch.f.s
from those in the list given at the beginning of this section. The first question
is treated in Section 4. To gain further insight into this powerful tool, we
consider some differentiability properties.

Proposition 6 If φ is a ch.f. of some d.f. F, and has p derivatives at
t = 0, then F has 2[p/2] moments, where [x] is the largest integer not exceeding
x. On the other hand, if F has p moments [i.e., ∫_R |x|^p dF(x) < ∞], then
φ is p times continuously differentiable. Here p ≥ 1 is an integer.
Proof Recall that, for a function f : R → R, the symmetric derivative at
x is defined as

    f_s^{(1)}(x) = lim_{h→0} [f(x + h) − f(x − h)]/2h,

whenever this limit exists. Similarly the pth symmetric derivative, if it exists,
is given by the expression

    f_s^{(p)}(x) = lim_{h→0} (Δ_h^p f)(x)/(2h)^p,

with (Δ_h^1 f)(x) = f(x + h) − f(x − h) and (Δ_h^p f)(x) = (Δ_h^1 (Δ_h^{p-1} f))(x). It may
be verified that, if f has a pth (p ≥ 1) ordinary derivative, then it also has the pth
symmetric derivative and they are equal. But the converse is false, cf. the example
after the proof. (This is also a standard fact of differentiation theory.) In our case,
f = φ, x = 0. If p = 2m + 1, or = 2m, then 2[p/2] = 2m, m ≥ 1. Since φ^{(p)}(0) exists by
hypothesis, its symmetric derivative also exists and is the same as φ^{(p)}(0).
Thus we have

    φ_s^{(2m)}(0) = lim_{t→0} (Δ_t^{2m} φ)(0)/(2t)^{2m} = lim_{t→0} (−1)^m ∫_R ((sin tx)/t)^{2m} dF(x).

Hence, substituting for 2[p/2],

    ∫_R x^{2m} dF(x) = ∫_R lim_{t→0} ((sin tx)/t)^{2m} dF(x)
        ≤ lim inf_{t→0} ∫_R ((sin tx)/t)^{2m} dF(x) (by Fatou's lemma)
        = |φ^{(2m)}(0)| < ∞.

This proves the first part.

For the second part, if p = 1,

    [φ(t + h) − φ(t)]/h = ∫_R e^{itx} ((e^{ihx} − 1)/h) dF(x).

The integrand is dominated by |x|, which is integrable by hypothesis. Thus
by the dominated convergence theorem we may let h → 0 under the integral.
This shows that

    φ'(t) = i ∫_R x e^{itx} dF(x), t ∈ R,

exists. The general case for p > 1 follows by induction, or by a similar direct
argument. Note that φ^{(p)}(0) = i^p ∫_R x^p dF(x) = i^p α_p, where α_p is the pth
moment of F, if α_p exists. This completes the proof.

To see that the first part cannot be strengthened, one may construct a
symmetric density f (with suitably chosen tail factors) such that φ^{(3)}(0)
exists, φ^{(3)}(0) = 0, and yet it is clear that E(|X|^3) = ∫_R |x|^3 f(x) dx = +∞.
The details are left to the reader.
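The moment-derivative relation φ^{(p)}(0) = i^p α_p of the proof is easy to verify numerically; e.g., for the gamma d.f. of the list with p = 1, α = 2 (so α_1 = 2, α_2 = 8), finite differences of the closed-form ch.f. at 0 recover the moments. A rough sketch (the step size h is an arbitrary choice):

    import numpy as np

    # ch.f. of the gamma d.f. (formula 4 of the list) with p = 1, alpha = 2:
    # phi(t) = (1 - 2it)^{-1}; moments alpha_1 = 2, alpha_2 = 8.
    phi = lambda t: 1.0 / (1.0 - 2j * t)

    h = 1e-4
    d1 = (phi(h) - phi(-h)) / (2 * h)             # central first difference
    d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h**2   # central second difference

    print(d1)   # ~ 2j   = i * alpha_1
    print(d2)   # ~ -8.0 = i^2 * alpha_2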

As a consequence of the above proposition, if a d.f. F has p finite moments,
then its ch.f. φ is p times continuously differentiable. Thus we can expand φ
in a Taylor series around t = 0 and obtain the following:

Corollary 7 Suppose F is a d.f. with p finite moments. If α_1, . . . , α_p
are these moments, then the ch.f. φ of F can be expanded around t = 0, with
either of the following two forms of its remainder:

    φ(t) = Σ_{k=0}^{p} α_k (it)^k/k! + o(|t|^p),
    φ(t) = Σ_{k=0}^{p-1} α_k (it)^k/k! + θ_p β_p |t|^p/p!,    (19)

where α_0 = 1, β_p = ∫_R |x|^p dF(x), |θ_p| ≤ 1, and o(|t|^p)/|t|^p → 0 as |t| → 0.

These expansions will be useful in some calculations for the weak limit
laws in the next chapter. An immediate, simple but useful observation is that
if ψ : R → R is any continuous function which can be expanded in the form
ψ(t) = 1 + O(|t|^{2+ε}), ε > 0, it will not be a ch.f. unless ψ(t) = 1, t ∈ R, so
that it is the ch.f. of the unitary distribution at the origin. Indeed, if ψ(·) is
a ch.f. with such an expansion, by the above corollary α_1 = α_2 = 0. If the
second (even, or absolute) moment is zero, then the d.f. concentrates at the
origin, proving our claim. In Section 4 we characterize functions which are
ch.f.s, but this is a nontrivial problem, and in general there is no easy recipe
for recognizing them.

To illustrate the power of ch.f.s, we present an application characterizing
the Gaussian law. If X_1, X_2 are two independent r.v.s with means μ_1, μ_2 and
variances σ_1^2, σ_2^2 > 0, then Y_i = (X_i − μ_i)/σ_i, i = 1, 2, are said to have a
"reduced law," since E(Y_i) = 0 and Var Y_i = 1, i = 1, 2. Note that Y_1, Y_2 are
still independent. Now if X_i is Gaussian, N(μ_i, σ_i^2), then it is very easy to
find that E(X_i) = μ_i and Var X_i = σ_i^2, i = 1, 2. Hence the sum X = X_1 + X_2
is seen to be N(μ, σ^2), where μ = μ_1 + μ_2 and σ^2 = σ_1^2 + σ_2^2. Indeed, using
the ch.f.s we have

    φ(t) = E(e^{it(X_1+X_2)}) = E(e^{itX_1}) E(e^{itX_2}) (by independence)
        = exp(iμ_1 t − σ_1^2 t^2/2) exp(iμ_2 t − σ_2^2 t^2/2)
        = exp(iμt − σ^2 t^2/2).    (20)

Hence Z = (X − μ)/σ is N(0, 1). Thus if the X_i are N(μ_i, σ_i^2), then they have the
same reduced law as their sum. Does this property characterize the Gaussian
law? The problem was posed and solved by G. Pólya. The effectiveness of
ch.f.s will now be illustrated by the following result; the first part is based
upon Pólya's above noted work, and the second one is noted by Ibragimov and
Linnik (1971), p. 32.

Proposition 8 (i) Let X_1, X_2 be independent r.v.s with two moments.
Then their reduced law is the same as that of their sum iff each X_i is Gaussian,
N(μ_i, σ_i^2), i = 1, 2.

(ii) Let X, Y be r.v.s with ch.f.s φ_X, φ_Y. If Y is normal, N(μ, σ^2), and
φ_X(u_n) = φ_Y(u_n) for a bounded countable set of distinct real values u_n, n ≥ 1,
then φ_X(u) = φ_Y(u) for all u ∈ R, so that X is
also N(μ, σ^2).

Proof (i) If they are Gaussian, then (20) shows that the reduced law of
the sum is of the same form. Thus only the converse is new. This is trivial if
σ_1^2 σ_2^2 = 0. Thus let σ_i^2 > 0, i = 1, 2.

Let Y_1, Y_2 be the reduced r.v.s from X_1, X_2, and Z be that of X_1 + X_2. By
hypothesis, Y_1, Y_2, and Z have the same d.f., or equivalently the same ch.f.,
= φ (say). If φ_1, φ_2, and φ_3 are the ch.f.s of X_1, X_2, and X_1 + X_2, then we
have the relations [since X_i = σ_i Y_i + μ_i, X_1 + X_2 = (σ_1^2 + σ_2^2)^{1/2} Z + (μ_1 + μ_2)]:

    φ_1(t) = e^{iμ_1 t} φ(σ_1 t), φ_2(t) = e^{iμ_2 t} φ(σ_2 t),
    φ_3(t) = e^{i(μ_1+μ_2)t} φ((σ_1^2 + σ_2^2)^{1/2} t) = φ_1(t) φ_2(t), t ∈ R.

This equation simplifies, on substitution for the φ_j(t), to

    φ((σ_1^2 + σ_2^2)^{1/2} t) = φ(σ_1 t) φ(σ_2 t), t ∈ R.    (21)

It is the solution of this functional equation which answers our question.
To solve (21), let

    α = σ_1/(σ_1^2 + σ_2^2)^{1/2}, β = σ_2/(σ_1^2 + σ_2^2)^{1/2}, so that α^2 + β^2 = 1.    (22)

Thus (21) becomes

    φ(t) = φ(αt) φ(βt), t ∈ R.    (23)

Replacing t by αt and βt and iterating, we get

    φ(t) = φ(α^2 t) φ(αβt) φ(βαt) φ(β^2 t).

Repeating the procedure for each term, we get at the nth stage

    φ(t) = Π_{k=0}^{n} [φ(α^{n−k} β^k t)]^{p_k},

where p_0, p_1, . . . , p_n are the coefficients in the binomial expansion of (1 + x)^n =
Σ_{i=0}^{n} p_i x^i, which are integers. Because 0 < α < 1, 0 < β < 1, it is clear that if
u_0 = α^n t, u_1 = α^{n−1}βt, . . . , u_n = β^n t, then u_k = α^{n−k}β^k t → 0 uniformly in t
(on compact sets) as n → ∞. Also, since the d.f. F, whose ch.f. is φ, has two
moments, it has mean zero and variance one. Thus by (19), in a neighborhood
of t = 0,

    φ(t) = 1 + 0 · (it) − t^2/2 + o(t^2).    (24)

Substituting (24) in (23) (with t = u_k there), we get

    φ(t) = Π_{k=0}^{n} [1 − u_k^2/2 + o(u_k^2)]^{p_k}.    (25)

To simplify (25) further, we recall a standard fact from complex analysis,
namely, in a simply connected set D not containing 0, log z has a branch
[= f(z), say] and any other branch is of the form f(z) + 2kπi (k an integer),
where f : z ↦ f(z) is a continuous function. Here if D = φ([−t_0, t_0]), it is
connected, 0 ∉ D, and z = φ(t). For definiteness we take the principal branch
(with k = 0). In this case, φ is also differentiable and we can expand it in
a Taylor series. (We include a proof of a slightly more general version of the
"complex logarithm" below, since that will be needed for other applications.)
Hence (25) becomes

    log φ(t) = Σ_{k=0}^{n} p_k log[1 − u_k^2/2 + o(u_k^2)]
        = −(t^2/2)(α^2 + β^2)^n + R_n,    (26)

where

    R_n = t^2 [p_0 δ_0 α^{2n} + p_1 δ_1 α^{2n−2} β^2 + . . . + p_n δ_n β^{2n}],

and the δ_j = δ(u_j) → 0 as u_j → 0. Now for each ε > 0, we can choose an N such
that n ≥ N ⇒ |δ(u_j)| < ε/t^2 for the given |t| > 0. Since α^2 + β^2 = 1, we get
|R_n| < ε. Hence (26) for n ≥ N becomes

    |log φ(t) + t^2/2| ≤ |R_n| < ε.    (27)

Since the left side is independent of n, and ε > 0 is arbitrary,

    φ(t) = e^{−t^2/2}.    (28)

But (23) shows that (25), (26), and hence (27), are valid for any t, since for
n large enough α^{n−k}β^k t is in a neighborhood of the origin. Thus the result
holds for all t ∈ R, and by the uniqueness theorem, (28) implies that φ is a
Gaussian ch.f.

(ii) By the Bolzano-Weierstrass property of bounded sets in R, there is a
convergent subsequence of the u_n, denoted by the same symbols, with limit t_0.
By subtracting this limit from the sequence, we can assume that t_0 = 0, i.e., u_n → 0 as n → ∞
and u_n ≠ 0. Since φ_Y(u) = e^{iμu − σ^2 u^2/2}, we consider two cases: (a) σ = 0 and
(b) σ > 0, the degenerate and the general cases of Y. For (a), φ_X(u_n) =
e^{iμu_n}, n ≥ 1, so that if F_X is the d.f. of X,

    ∫_R e^{iu_n(x − μ)} dF_X(x) = 1, n ≥ 1.

By the uniqueness theorem, F_X must have a jump at x = μ, and it must be
unitary by the fact that the u_n are distinct. So we only need to consider
case (b): σ > 0.

The idea now is to show that φ_X is an entire function which agrees with
the exponential φ_Y at a sequence of distinct points u_n (with limit 0), and
hence by complex function theory the ch.f.s must be identical, which proves
the result. Since ψ(t) = φ_X(t)φ̄_X(t) is a (real) symmetric ch.f. with values
ψ(u_n) = |φ_Y(u_n)|^2 = e^{−σ^2 u_n^2}, we may restrict attention to the case where
φ_X(u) = |φ_Y(u)| at u = u_n, with μ = 0 and σ^2 = 1 by rescaling, and establish
φ_X(t) = e^{−t^2/2}; this then implies φ_X(t) = e^{−t^2/2} e^{iθt} for some θ ∈ R in the
general case, and establishes the result. Thus φ_X is (real and) symmetric; to
use induction, consider

    (1 − φ_X(u_n))/u_n^2 = ∫_R ((1 − cos u_n x)/u_n^2) dF_X(x).

This shows

    ∫_R ((1 − cos u_n x)/u_n^2) dF_X(x) ≤ K_0

for some 0 < K_0 < ∞ and |u_n| ≤ ε, n ≥ n_0, K_0 being an absolute constant
[since (1 − e^{−u_n^2/2})/u_n^2 ≤ 1/2]. Since 2(1 − cos z)/z^2 is bounded, and tends
to 1 as z → 0, we have for each a > 0

    ∫_{−a}^{a} x^2 dF_X(x) ≤ 2K_0 < ∞,

uniformly in a > 0 (for |u_n| ≤ ε). This implies ∫_{−a}^{a} x^2 dF_X(x) < ∞, and as
a → ∞ we get ∫_{−∞}^{∞} x^2 dF_X(x) < ∞. But then by Proposition 6, φ_X''(·) exists
and (by the symmetry of φ_X) φ_X'(0) = 0. Moreover φ_X''(0) = φ_Y''(0) holds.

Now using induction, let the result hold for all integers m < r. Then a similar
estimate gives

    ∫_R ((1 − cos u_n x)/u_n^2) x^{2(r−1)} dF_X(x) ≤ K_r, n ≥ n_0,

where K_r > 0 is a finite number depending only on r. By the earlier reasoning
this implies that ∫_R v^{2r} dF_X(v) ≤ 2K_r < ∞, and hence by Proposition 6,
φ_X^{(2r)}(·) exists, φ_X^{(2r)}(0) = φ_Y^{(2r)}(0), and |φ_X^{(2r)}(u)| ≤ |φ_X^{(2r)}(0)| =
|φ_Y^{(2r)}(0)|, u ∈ R. Since r ≥ 1 is arbitrary, and φ_Y(·) is an entire function,
this implies that φ_X(·) is also an entire function which agrees with φ_Y(·)
on u_n, n ≥ 1, with φ_X^{(r)}(0) = φ_Y^{(r)}(0). Hence by the classical theory φ_X = φ_Y, and
the result follows. This completes the proof.

The above characterization shows how ch.f.s allow us to apply refined
methods of complex analysis as well as differential calculus to solve problems
of interest. Several others of a similar nature are treated later on.

In order not to interrupt our future treatment, we present the result on
complex logarithms here in a form suitable for our work. This is really a simple
exercise in complex analysis. [See Cartan (1963) for an extended treatment
of the topic.] When the problem was discussed briefly by Loève (cf. his book
(1963), p. 291 ff.), some reviewers were upset and expressed that the comments
were inadequate. Here we present a more detailed view, essentially following
Tucker ((1967), p. 93), who witnessed Loève's lectures. An alternative method
will also be included.

Proposition 9 Let f : R → C − {0} be a continuous function, so
that f(t) ≠ 0, t ∈ R, and suppose f(0) = 1. Then there exists a unique
continuous g : R → R such that f(t) = |f(t)| exp(ig(t)) = e^{h(t)}, where
h(t) = Log f(t) = log|f(t)| + i arg f(t), and g(t) = arg f(t) (or h(t)) is continuous
in t. Since f(0) = 1, one can demand g(0) = 0, and
then Log f = log|f(t)| + ig(t) is uniquely defined.

Proof We present the argument in the form f(t) = |f(t)| e^{ig(t)} for a unique
continuous g : R → R, g(0) = 0. Observe that if such a g(·) exists, then it
is unique. Indeed, if g_1, g_2 are two such functions, they differ by an integer
multiple of 2π, so that g_1(t) = g_2(t) + 2πk(t), where k : R → R is continuous. But
g_1(0) = g_2(0) = 0 implies k(0) = 0, and being integer-valued and continuous,
this forces k(t) to vanish, and hence g_1 = g_2 (= g, say).

To show that such a g exists, consider a > 0, and since f(t) ≠ 0 by
hypothesis, |f| : A → R^+ is bounded and continuous on each compact set
A ⊂ R, and is strictly positive. Replacing f by f/|f|, we may assume that
|f(t)| = 1. Taking A = [0, a], we find min{|f(t)| : 0 ≤ t ≤ a} = b, and by this
reduction, b = 1. Since f is uniformly continuous on A, for ε = 1 there is a
δ > 0 such that for t_1, t_2 ∈ A = [0, a], |t_1 − t_2| < δ ⇒ |f(t_1) − f(t_2)| < 1.
Since |f(t)| = 1, we make the key observation that |arg(f(t_1)/f(t_2))| < π/2, where
arg ρ stands for the argument of ρ (i.e., tan^{-1}(Im ρ/Re ρ)) for ρ ∈ C − {0}. Consider
a partition 0 = t_0 < t_1 < . . . < t_m = a of [0, a] such that max_i(t_i − t_{i-1}) < δ.
Set g(t) = arg f(t), 0 = t_0 ≤ t ≤ t_1. Then g is continuous on [0, t_1], since Im(f)
and Re(f) are continuous on [0, t_1] and f(t) ≠ 0 on [0, t_1]. Now extend g by
defining inductively:

    g(t) = g(t_i) + arg(f(t)/f(t_i)), t_i ≤ t ≤ t_{i+1}, i = 1, . . . , m − 1.

Then g(0) = 0, g is continuous, and is the required function. Since R is
σ-compact, we may extend it first to [−m, m] and then to ⋃_{m>0}[−m, m]
continuously. If we let h(t) = log|f(t)| + ig(t), then Log f(t) = h(t) is the uniquely
defined continuous logarithm, as desired. [Note that h : R → C is a function
of t and is determined by f (not a function of f, which is complex valued!)]
Alternative Proof (à la K.L. Chung (1968), p. 241) The argument is based
on the MacLaurin expansion of the log function about 1, and is slightly
longer. It is as follows:

    Log z = Σ_{n≥1} ((−1)^{n−1}/n)(z − 1)^n, |z − 1| < 1,

converging at z = 1. If h(t) = Log f(t) here, then h(0) = 0 and h(·) is continuous. Let
a > 0, and for ε = 1/2 we find a δ > 0 and a partition of [−a, a], −a = t_{−m} <
t_{−m+1} < . . . < t_0 < t_1 < . . . < t_m = a, t_{j+1} − t_j = t_1 − t_0 < δ, such that
|f(t_1) − f(t)| < 1/2 for t ∈ [t_{−1}, t_1], and |f(t) − 1| = |f(t) − f(0)| < 1/2. Hence
Log f = h is well-defined by the series with z = f(t), t ∈ [t_{−1}, t_1], h(0) = 0.
As a power series, h(·) is continuous, and |f(t) − f(t_i)| = |f(t_i)| |f(t)/f(t_i) − 1| <
1/2, t ∈ [t_i, t_{i+1}], since |f(t_i)| = 1. We may extend h onto [t_i, t_{i+1}] by setting

    h(t) = h(t_i) + Log(f(t)/f(t_i)),

and by iteration for all t ∈ [−a, a]. Then we have [f(t_0) = 1 and t_i ≤ t ≤ t_{i+1}]

    e^{h(t)} = e^{h(t_i)} (f(t)/f(t_i)) = . . . = f(t).

In the same way it can be extended to the left, so that h(·) is defined on
[−a, a]. As before, we can iterate the procedure to R = ⋃_{n>0}[−n, n], by σ-compactness.
The uniqueness is, as before, immediate, and the result (the
unique continuous representation of Log f) follows.
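Both proofs are, in effect, the "phase unwrapping" procedure of numerical practice: choose a partition so fine that successive values of f differ little, and accumulate the increments of arg f. A minimal sketch of this idea (the particular f is an illustrative choice):

    import numpy as np

    # f(t) = exp(3it): continuous, nonvanishing, f(0) = 1; arg f(t) = 3t.
    t = np.linspace(0.0, 4 * np.pi, 2001)
    f = np.exp(3j * t)

    # The principal argument wraps into (-pi, pi]; the distinguished
    # logarithm accumulates the small increments arg(f(t_{i+1})/f(t_i)),
    # as in the proofs above, giving a continuous g with g(0) = 0.
    principal = np.angle(f)
    g = np.concatenate([[0.0], np.cumsum(np.angle(f[1:] / f[:-1]))])

    print(np.max(np.abs(g - 3 * t)))           # ~0: continuous argument
    print(np.max(np.abs(principal - 3 * t)))   # large: wrapped branch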

Remark In the first edition of this book the proof given is analogous
to the first one above. But an anonymous reader indicated that the series
argument is superior to the first one. So we have included both methods of
proof, which might appeal to a wider audience.

A useful consequence of the above proposition is given by

Corollary 10 Let {f, f_n, n ≥ 1} be a sequence of ch.f.s such that
f_n(t) → f(t) as n → ∞, t ∈ R. If f and the f_n do not vanish anywhere, then
Log f_n(t) → Log f(t), t ∈ R.

Proof Since f_n, f do not vanish, by the above proposition Log f_n and
Log f exist, and

    Log f_n(t) = log|f_n(t)| + i arg f_n(t), Log f(t) = log|f(t)| + i arg f(t).

The hypothesis implies |f_n(t)| → |f(t)|, t ∈ R. Since these are never zero,
and on their ranges log is continuous, log|f_n| → log|f|. Similarly, using
the fact that arg(·) is a continuous function and f_n → f, the composition
arg f_n(t) → arg f(t), t ∈ R. Hence Log f_n(t) → Log f(t), t ∈ R, as asserted.

Remarks (1) The pointwise convergence of the ch.f.s f_n → f implies that the convergence is also uniform on compact sets of ℝ. Since these functions are assumed nonvanishing, and f_n(A), f(A) are compact sets for any compact A ⊂ ℝ, and since the Log function is uniformly continuous on compact sets of its domain, we can strengthen the conclusion of the corollary to Log f_n → Log f uniformly on compact subsets of ℝ. The details of this statement are left to the reader.

(2) Hereafter the unique logarithm given by the last part will be termed a distinguished logarithm of f, denoted Log f. The same argument also shows that Log φ_1φ_2 = Log φ_1 + Log φ_2 and Log φ_1φ_2^{−1} = Log φ_1 − Log φ_2 for nonvanishing φ_1, φ_2 as in Proposition 9. This fact will be used without comment.
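The constructive proof above is easily mechanized: evaluate f on a grid fine enough that consecutive values stay close, take principal arguments, and correct them by integer multiples of 2π so that the phase is continuous with g(0) = 0. The following minimal numerical sketch is our own illustration, not part of the text; it assumes NumPy, and np.unwrap performs exactly the 2π-adjustment of the inductive step, here for the nonvanishing ch.f. f(t) = exp(iat − t²/2) of N(a, 1):

    import numpy as np

    # Distinguished logarithm of a nonvanishing ch.f., following the proof:
    # principal arguments on a fine grid, adjusted by multiples of 2*pi.
    a = 5.0
    t = np.linspace(0.0, 20.0, 4001)            # fine partition of [0, 20]
    f = np.exp(1j * a * t - t**2 / 2)           # ch.f. of N(a, 1); never zero

    g = np.unwrap(np.angle(f))                  # continuous argument (the 2*pi fix)
    g -= g[0]                                   # normalize g(0) = 0
    log_f = np.log(np.abs(f)) + 1j * g          # Log f = log|f| + i g

    # For this f the answer is known in closed form: Log f(t) = i*a*t - t^2/2.
    assert np.allclose(log_f, 1j * a * t - t**2 / 2)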

As the above two results indicate, the combination of Fourier transform theory, complex analysis, and the special ideas of probability makes this a very fertile area for numerous specializations. We do not go into these here; a few of these results are given in the problems section.

4.3 Cramér's Theorem on Fourier Transforms of Signed Measures

If F_1 and F_2 are two d.f.s, then G = F_1 − F_2 is of bounded variation and determines a signed measure. In a number of applications it is useful to have an extension of the inversion formula (cf. Theorem 2.1) to such functions as G above. An interesting result of this nature was given by H. Cramér (1970). We present it here and include some consequences.

Theorem 1 Let G : ℝ → ℝ be a function of bounded variation such that lim_{|x|→∞} G(x) = 0 and ∫_ℝ |x| |dG|(x) < ∞. If g(t) = ∫_ℝ e^{itx} dG(x) and 0 < α < 1, then for each h > 0 and x ∈ ℝ we have

∫_x^{x+h} (u − x)^{α−1} G(u) du = −(1/2πi) ∫_ℝ (g(t)/t) e^{−itx} [∫_0^h u^{α−1} e^{−itu} du] dt.   (1)

If in addition ∫_ℝ |g(t)/t| dt < ∞, then it follows that

G(x) = −(1/2πi) ∫_ℝ (g(t)/t) e^{−itx} dt.   (2)

Proof We establish (1) for x ∈ ℝ, h > 0 such that x and x + h are points of continuity of G. The latter set is dense in ℝ. This will prove (1), since both sides of (1) are continuous in h and x for each 0 < α < 1. To see this, let (u − x)/h = v in the left side of (1). It becomes

h^α ∫_0^1 v^{α−1} G(x + vh) dv → h^α ∫_0^1 v^{α−1} G(x_0 + vh) dv

as x → x_0, since G(x + vh) → G(x_0 + vh) a.e. (Lebesgue) as x → x_0. Also, the left side tends to zero as h → 0, by bounded convergence. Hence it is continuous in x and h. For the right side of (1), since G = G_1 − G_2, where the G_i are bounded and nondecreasing with one moment existing, their Fourier transforms are differentiable by Proposition 2.8, and hence so is g(t). Since g(0) = 0, this implies g(t) = tg′(0) + o(t), so that g(t)/t = O(1) as t → 0. Regarding the last integral,

|∫_0^h u^{α−1} e^{−itu} du| = |t|^{−α} |∫_0^{|t|h} v^{α−1} e^{−iv sgn(t)} dv| ≤ C|t|^{−α}, t ≠ 0,

since ∫_0^∞ v^{α−1} e^{−iv} dv converges for 0 < α < 1. Hence the integral on the right of (1), as a function of t, is absolutely convergent, uniformly relative to x and h, and is a continuous function of x and h.

Thus let x, x + h be continuity points of G. Proceeding as in Theorem 2.1, consider, for any T > 0, a simplification of the right side by substituting g(t) = ∫_ℝ e^{ity} dG(y):

−(1/2πi) ∫_{−T}^{T} (g(t)/t) e^{−itx} ∫_0^h u^{α−1} e^{−itu} du dt = −(1/π) ∫_ℝ ∫_0^h u^{α−1} [∫_0^T (sin t(y − x − u))/t dt] du dG(y)   (3)

[by Fubini's theorem].

With the bounded convergence theorem applied to (3) one gets

lim_{T→∞} (left side of (3)) = −(1/2) ∫_ℝ ∫_0^h u^{α−1} sgn(y − x − u) du dG(y)
    (since (2/π) ∫_0^∞ (sin tx)/t dt = sgn x)
= −(1/α) ∫_x^{x+h} (y − x)^α dG(y) + (h^α/α) G(x + h)   (after simplification)
= ∫_x^{x+h} (y − x)^{α−1} G(y) dy   (with integration by parts).

This establishes (1), using the initial simplification from the first paragraph.


For the last part, we may differentiate both sides of (1) relative to h and get, by the fundamental theorem of calculus,

h^{α−1} G(x + h) = −(1/2πi) ∫_ℝ (g(t)/t) e^{−it(x+h)} h^{α−1} dt,

the interchange of derivative and integral being easily justified. Cancelling h^{α−1} > 0, and replacing x + h by x in the above, we get (2), and the theorem is proved.

The above result is of interest in calculating the distributions of ratios of r.v.s and in asymptotic expansions (cf. Theorem 5.1.5). The former, also due to Cramér, is as follows.

Theorem 2 Let X_1, X_2 be two r.v.s with finite expectations. If P[X_2 > 0] = 1 and φ is the joint ch.f., so that φ(t, u) = E(e^{itX_1 + iuX_2}), suppose that φ(t, u) = O((|t| + |u|)^{−δ}) for some δ > 0 as |t| + |u| → ∞. Then the distribution F of the ratio X_1/X_2 is given by

F(x) = −(1/2πi) ∫_ℝ [φ(t, −tx) − φ_2(t)] (dt/t),   (4)

where φ_2 is the ch.f. of X_2. If, further, the integral

∫_ℝ (∂φ/∂u)(t, −tx) dt

converges uniformly in x, then the density F′ exists and is given by

f(x) = F′(x) = (1/2πi) ∫_ℝ (∂φ/∂u)(t, −tx) dt, x ∈ ℝ.   (5)

Proof If F_2(z) = P[X_2 < z], then by hypothesis F_2(0) = 0, and, for each fixed x,

F(x) = P[X_1/X_2 < x] = P[X_1 − xX_2 < 0].

Let Y_x = X_1 − xX_2 and H(y) = P[Y_x < y], so that F(x) = H(0) − F_2(0). If G(y) = H(y) − F_2(y), then G satisfies the hypothesis of Theorem 1. If h(t) = E(e^{itY_x}), then h(t) = φ(t, −tx), and the Fourier transform g of G satisfies g(t) = h(t) − φ_2(t). The hypotheses ensure the integrability conditions

g(t)/t = O(1) as t → 0, and ∫_{|t| ≥ M} |g(t)/t| dt < ∞ for M > 0,   (6)

since g(0) = 0 with g differentiable at 0 (the means being finite), while the growth condition on φ gives, as |t| → ∞, |φ(t, −tx)| ≤ M_1 |t|^{−δ} uniformly in x, and likewise |φ_2(t)| = |φ(0, t)| ≤ M_1 |t|^{−δ}. Thus ∫_ℝ |g(t)/t| dt < ∞, and we can apply the second part of Theorem 1 to get

G(y) = −(1/2πi) ∫_ℝ (g(t)/t) e^{−ity} dt.   (7)

Since G(0) = F(x), (7) gives (4) for y = 0. From this, (5) is obtained by differentiation, using the additional hypothesis. This completes the proof.

In case X_1 and X_2 are independent r.v.s, the above result simplifies as follows.

Corollary 3 Let X_1, X_2 be independent r.v.s with finite means and ch.f.s φ_1 and φ_2. If P[X_2 > 0] = 1 and ∫_{|t| ≥ M} |φ_2(t)/t| dt < ∞ for some M > 0, then we have

F(x) = −(1/2πi) ∫_ℝ [φ_1(t) φ_2(−tx) − φ_2(t)] (dt/t).   (8)

Moreover, F of (8) has a density f given by

f(x) = (1/2πi) ∫_ℝ φ_1(t) φ_2′(−tx) dt,   (9)

provided the integral exists uniformly relative to x in compact intervals.



Proof Since φ(t, u) = φ_1(t)φ_2(u), the result follows from (4), because the integrability conditions of (6) are satisfied: as t → 0, g(t)/t remains bounded, and for |t| ≥ M > 0 we have

|g(t)/t| = |φ_1(t)φ_2(−tx) − φ_2(t)|/|t| ≤ |φ_2(−tx)/t| + |φ_2(t)/t|.

The current hypothesis implies that this is integrable on |t| ≥ M. The rest is exactly as in the theorem, and the result holds as stated.

Let us present two applications of these results to show their utility.

Example 4 Let X, Y be independent N(0,1) r.v.s and Z = |Y|. Then

P[Z < z] = P[−z < Y < z], z > 0,

and this vanishes otherwise. Thus its density function is f_Z(z) = √(2/π) e^{−z²/2}, z > 0. Since X, Z are independent, consider the distribution of the ratio X/Z. The hypothesis of Corollary 3 is easily seen to hold here. Now if φ_1, φ_2 are the ch.f.s of X and Z, then φ_1(t) = e^{−t²/2} and

φ_2(t) = √(2/π) ∫_0^∞ e^{itz} e^{−z²/2} dz.

Thus (9) gives the desired density, and, evaluating the resulting double integral (by Fubini's theorem), we get

f(x) = 1/(π(1 + x²)), x ∈ ℝ.

Thus X/Z has a Cauchy distribution. [Alternatively, Z = √(Y²), and Y² has the chi-square distribution with parameter 1 (i.e., of one degree of freedom). Hence X/Z is distributed as Student's t with one degree of freedom (as in Exercise 18), which is Cauchy. This argument did not use Corollary 3. With a change of variables, one can also show that X/Y has a Cauchy distribution,
to which Corollary 3 is not applicable!]
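A quick Monte Carlo check of Example 4 (our own illustration, not from the text; it assumes NumPy, and the sample size and seed are arbitrary): the empirical d.f. of X/Z should match the standard Cauchy d.f. F(x) = 1/2 + (1/π) arctan x to the usual O(n^{−1/2}) accuracy.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 200_000
    X = rng.standard_normal(n)
    Z = np.abs(rng.standard_normal(n))          # Z = |Y|, Y ~ N(0,1) independent of X
    R = X / Z

    grid = np.linspace(-10.0, 10.0, 81)
    emp = np.searchsorted(np.sort(R), grid) / n          # empirical d.f. of X/Z
    cauchy = 0.5 + np.arctan(grid) / np.pi               # standard Cauchy d.f.
    print("max |F_n - F| =", np.abs(emp - cauchy).max())  # ~ 1/sqrt(n)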

Example 5 Consider a pair of dependent r.v.s X_1, X_2 whose joint d.f. has a ch.f. φ with P[X_2 > 0] = 1, X_2 being a gamma r.v. It is not evident that φ is a ch.f.; but it arises as a limit of a sequence of ch.f.s and is continuous. [See Equation (5.6.21) later.] Now it is easily verified that this φ satisfies the hypothesis of the last part of Theorem 2, and hence the density of the ratio of X_1, X_2 can be evaluated directly from (5) (the integrand has an elementary primitive, so no contour integration is needed), giving

f(x) = (1/2π) lim_{T→∞} 2 Re { (1 + ix/T) / ((1 + x²)([1 + T² + 2iTx]/T²)^{1/2}) } = 1/(π(1 + x²)), x ∈ ℝ.

Thus the distribution of the ratio of two dependent r.v.s, neither of which is N(0,1), is again Cauchy.

Other applications and extensions of the above theorems will be seen to


have interest in our work.

4.4 Bochner's Theorem on Positive Definite Functions

We present here a fundamental characterization of ch.f.s due to S. Bochner, who established it in 1932. There are at least three different methods of proof of this result. One is to base the argument on the continuity theorem for ch.f.s. The second is to obtain the result by a careful extension of an earlier (special) result of Herglotz for discrete distributions. The third one is first to establish the result for functions in L²(ℝ) and then to use it to obtain the general case for all d.f.s. Of these, the first two methods are probabilistic in nature and the third one is more Fourier analytic in content. We therefore give only a probabilistic proof, which is due to Cramér.

Definition 1 Let f : ℝ → ℂ be a mapping and t_1, …, t_n be points in ℝ. Then f is said to be positive definite if for any a_i ∈ ℂ, i = 1, …, n, we have

Σ_{i=1}^n Σ_{j=1}^n a_i ā_j f(t_i − t_j) ≥ 0,   (1)

where ā_j is the complex conjugate of a_j.
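Numerically, Definition 1 says that the Hermitian matrix (f(t_i − t_j)) must have no negative eigenvalues, whatever points t_1, …, t_n are chosen. A small sketch of such a test (ours, not from the text; it assumes NumPy, and the chosen points and test functions are arbitrary illustrations):

    import numpy as np

    def min_eig(phi, ts):
        # Smallest eigenvalue of the Hermitian matrix (phi(t_i - t_j)).
        return np.linalg.eigvalsh(phi(ts[:, None] - ts[None, :])).min()

    rng = np.random.default_rng(0)
    ts = np.sort(rng.uniform(-4.0, 4.0, 30))

    print(min_eig(lambda t: np.exp(-t**2 / 2), ts))   # N(0,1) ch.f.: >= 0 up to roundoff
    print(min_eig(lambda t: np.exp(-np.abs(t)), ts))  # Cauchy ch.f.: >= 0
    # exp(-t^4) is a classical non-ch.f.; on a fine grid a (small)
    # negative eigenvalue should appear.
    print(min_eig(lambda t: np.exp(-t**4), np.linspace(-4.0, 4.0, 33)))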

The fundamental result alluded to above is the following:

Theorem 2 (Bochner) A continuous φ : ℝ → ℂ with φ(0) = 1 is a ch.f. iff it is positive definite.

Proof The necessity is simple and classical, due to M. Mathias, who observed it in 1923. Thus if φ is the ch.f. of a d.f. F, then φ is continuous, φ(0) = 1, and

Σ_{i=1}^n Σ_{j=1}^n a_i ā_j φ(t_i − t_j) = ∫_ℝ |Σ_{i=1}^n a_i e^{it_i x}|² dF(x) ≥ 0.

Thus φ is positive definite.

For the converse, let φ be positive definite, continuous, and φ(0) = 1. If f : ℝ → ℂ is continuous, then for any T > 0 (this is known as the integral form of positive definiteness),

∫_0^T ∫_0^T φ(u − v) f(u) \overline{f(v)} du dv ≥ 0,   (2)

since the integral exists on the compact set [0, T] × [0, T], and is a limit of the finite (Riemann) sums Σ_i Σ_j φ(t_i − t_j) f(t_i) \overline{f(t_j)} Δt_i Δt_j. But the latter are nonnegative by the fact that φ is positive definite. Now let f(u) = e^{−iux}. We define

p_T(x) = (1/2πT) ∫_0^T ∫_0^T φ(u − v) e^{−i(u−v)x} du dv (≥ 0).   (3)

Make the change of variables t = u − v, τ = v in the above. Then after a slight simplification one gets

p_T(x) = (1/2π) ∫_{−T}^{T} [1 − (|t|/T)] φ(t) e^{−itx} dt = (1/2π) ∫_ℝ φ_T(t) e^{−itx} dt,

where φ_T(t) = [1 − (|t|/T)]φ(t) for |t| ≤ T, and = 0 for |t| > T. We now claim that (i) p_T is a probability density on ℝ and (ii) φ_T is its ch.f., for each T. These two points establish the result, since φ_T(t) → φ(t) as T → ∞ for each t ∈ ℝ and φ(·) is continuous on ℝ by hypothesis. Thus by the continuity theorem for ch.f.s, φ must also be a ch.f. We thus need to establish (i) and (ii).

(i) The verification of this point involves an interesting trick. Consider the increasing sequence of functions ψ_N defined by ψ_N(x) = [1 − (|x|/N)] if |x| ≤ N, and = 0 for |x| > N. Then ψ_N p_T : ℝ → ℝ⁺ is continuous, and ψ_N(x) ↑ 1 as N → ∞, for each x ∈ ℝ. Hence

∫_ℝ p_T(x) dx = lim_{N→∞} ∫_ℝ ψ_N(x) p_T(x) dx = lim_{N→∞} (1/π) ∫_ℝ φ_T(2v/N) ((sin v)/v)² dv = φ_T(0) = 1,   (4)

where we used the dominated convergence theorem to move the limit inside the integral [since ((sin v)/v)² is integrable], and then the continuity of φ at t = 0, plus the fact that φ(0) = 1, implying φ_T(0) = 1, and |φ(t)| ≤ 1 (cf. Proposition 3 below).

(ii) To see that φ_T is the ch.f. of p_T, we use the just established integrability of p_T and the dominated convergence to conclude that

∫_ℝ e^{itx} p_T(x) dx = lim_{N→∞} ∫_ℝ e^{itx} ψ_N(x) p_T(x) dx [ψ_N(·) is as defined above for (4)]
= lim_{N→∞} (1/π) ∫_ℝ φ_T(t + 2v/N) ((sin v)/v)² dv (by integration and Fubini's theorem)
= φ_T(t) (by the dominated convergence, as in (i), and the continuity of φ_T at t).

But the left side is then a ch.f., and hence φ_T is the ch.f. of p_T. This proves (ii), and the theorem follows.
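The objects of the proof are computable. A rough numerical sketch (ours, assuming NumPy; the ch.f., the value T = 8, and the grids are arbitrary choices) forms φ_T and inverts it by a Riemann sum to exhibit p_T as a (numerical) density:

    import numpy as np

    phi = lambda t: np.exp(-np.abs(t))        # a ch.f. (Cauchy); any continuous one works
    T = 8.0
    t = np.linspace(-T, T, 1601)
    dt = t[1] - t[0]
    phiT = (1 - np.abs(t) / T) * phi(t)       # phi_T of the proof

    x = np.linspace(-20.0, 20.0, 801)
    dx = x[1] - x[0]
    # p_T(x) = (1/2pi) * integral of phi_T(t) e^{-itx} dt, by a Riemann sum:
    pT = (np.exp(-1j * np.outer(x, t)) @ phiT).real * dt / (2 * np.pi)

    print(pT.min())        # no appreciable negative values: p_T is a density
    print(pT.sum() * dx)   # ~ 0.97 here; the remainder is tail mass outside the window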

There is actually a redundancy in the sufficiency part of the hypothesis. The positive definiteness condition is so strong that mere continuity at t = 0 of φ implies its uniform continuity on ℝ. Let us establish this and some related properties.

Proposition 3 If φ : ℝ → ℂ is positive definite, then (i) φ(0) ≥ 0, (ii) φ(−t) = \overline{φ(t)}, (iii) |φ(t)| ≤ φ(0), (iv) \overline{φ}(·) is positive definite, and (v) if φ is continuous at t = 0, then it is uniformly continuous on ℝ.

Proof If b_{ij} = φ(t_i − t_j) and a = (a_1, …, a_n)^t, then (1) implies that the matrix B = (b_{ij}) is positive (semi-)definite; i.e., using the inner product notation, with t for conjugate transposition of a vector or matrix and B* for the adjoint of B, we get by positive definiteness:

(Ba, a) = (a, B*a) = \overline{(B*a, a)} = (B*a, a) ≥ 0.   (5)

But by the polarization identity of a complex inner product, we have for any pair of n-vectors a, b:

4(Ba, b) = (B(a + b), a + b) − (B(a − b), a − b) + i(B(a + ib), a + ib) − i(B(a − ib), a − ib) = 4(B*a, b) [by (5)].

Hence (Ba, b) = (B*a, b) for all a, b, so that B = B*. Taking a = b = (1, 0, …, 0)^t, we get φ(0) ≥ 0, and B = B* gives φ(−t) = \overline{φ(t)}. This gives (i) and (ii). The positive definiteness of B implies that each of its principal minors has a nonnegative determinant. Thus |φ(t)|² ≤ φ(0)², which is (iii). That (iv) is true follows from (1) itself (or also from (5)).

For (v), consider the third-order principal minor B_3 of B. Excluding the trivial case that φ(0) = 0, we may and do normalize: φ(0) = 1 [since otherwise ψ(t) = φ(t)/φ(0) will satisfy the conditions]. Then the determinant of B_3 is

det B_3 = 1 + 2 Re(b_{12} b_{23} \overline{b_{13}}) − |b_{12}|² − |b_{13}|² − |b_{23}|² ≥ 0.   (6)

Taking t_1 = 0, t_2 = t, and t_3 = t′, so that b_{12} = φ(−t), b_{13} = φ(−t′), and b_{23} = φ(t − t′) in (6), a short computation yields

|φ(t) − φ(t′)|² ≤ 2[1 − Re φ(t − t′)],

which tends to zero uniformly in t, t′ as t − t′ → 0, by the continuity of φ at 0. This establishes (v), and the proof is complete.

In general, it is not easy to determine whether a bounded continuous function on ℝ is a ch.f. The proof of Theorem 2 contains information for the following result, due to Cramér, which may be easier to verify in some cases.

Proposition 4 A bounded continuous function φ : ℝ → ℂ, with φ(0) = 1, is a ch.f. iff for all T > 0,

p_T(x) = (1/2πT) ∫_0^T ∫_0^T φ(u − v) e^{−i(u−v)x} du dv ≥ 0, x ∈ ℝ.

The proof is essentially the same as that of Theorem 2, with simple alterations, and is left to the reader.
In view of Proposition 3, it is natural to ask whether the continuity hypothesis can be eliminated from Theorem 2. F. Riesz showed in 1933 that this is essentially possible, but we get a slightly weaker conclusion. A precise version of this statement is as follows.

Theorem 5 (Riesz) Let φ : ℝ → ℂ be a Lebesgue measurable mapping satisfying φ(0) = 1. Then φ is positive definite iff it coincides with a ch.f. on ℝ outside of a Lebesgue null set.

Proof Suppose φ is positive definite, Lebesgue measurable, and φ(0) = 1. We reduce the argument to that of Theorem 2 by the following device, due to Riesz himself. Let t_1, …, t_n be n arbitrary points from ℝ. By hypothesis, taking a_i = e^{−it_i x} in (1),

Σ_{i=1}^n Σ_{j=1}^n φ(t_i − t_j) e^{−i(t_i − t_j)x} ≥ 0, x ∈ ℝ.   (7)

This inequality holds for any vector (t_1, …, t_n) ∈ ℝⁿ. Hence, integrating the expressions on both sides of (7) relative to the n-dimensional Lebesgue measure on the compact n-rectangle [0, N]ⁿ ⊂ ℝⁿ and using Proposition 3, we get on the left, for the diagonal terms, nφ(0)Nⁿ = nNⁿ, and for the non-diagonal terms [there are n(n − 1) of them, and φ is a bounded measurable function]

n(n − 1) N^{n−2} ∫_0^N ∫_0^N φ(t − t′) e^{−i(t−t′)x} dt dt′.

Consequently (7) becomes

nNⁿ + n(n − 1) N^{n−2} ∫_0^N ∫_0^N φ(t − t′) e^{−i(t−t′)x} dt dt′ ≥ 0.   (8)

Dividing (8) by n(n − 1)N^{n−2} and noting that n > 1 is arbitrary, one obtains

∫_0^N ∫_0^N φ(t − t′) e^{−i(t−t′)x} dt dt′ ≥ 0.   (9)

But this is 2πN p_N(x) of (3), and we can use the argument there. Thus if φ_N and p_N are defined as in that theorem, then p_N(x) ≥ 0 and φ_N(t) = [1 − (|t|/N)]φ(t), where [1 − (|t|/N)] actually defines the ch.f. of the probability density (1 − cos Nx)/(πNx²), x ∈ ℝ, as we have seen before. Now the work leading to (4) implies that 0 ≤ ∫_ℝ p_N(x) dx ≤ M_0 < ∞, uniformly in N. Next consider, for any u ∈ ℝ,

∫_0^u [∫_ℝ e^{itx} p_N(x) dx] dt = lim_{K→∞} (1/π) ∫_0^u ∫_ℝ φ_N(t + 2v/K) ((sin v)/v)² dv dt [by step (ii) after (4)]
= lim_{K→∞} (1/π) ∫_ℝ [∫_{2v/K}^{u + 2v/K} φ_N(r) dr] ((sin v)/v)² dv
= ∫_0^u φ_N(r) dr   (10)

(by the dominated convergence, and then the second integral is unity).

But the left side of (10) is ∫_0^u β_N(t) dt, where β_N is the Fourier transform of p_N (a "ch.f." up to normalization), and since φ_N(r) → φ(r) for each r ∈ ℝ as N → ∞, by the bounded convergence theorem we conclude that

lim_{N→∞} ∫_0^u β_N(t) dt = ∫_0^u φ(t) dt   (11)

exists. If G_N(x) = ∫_{−∞}^x p_N(u) du, then {G_N, N ≥ 1} is a uniformly bounded (by M_0) nondecreasing class, so that by the Helly selection principle we can find a subsequence G_{N_k} with G_{N_k} → G as N_k → ∞ at all continuity points of G, where G is a bounded (by M_0) nondecreasing (and nonnegative) function. By the Helly-Bray theorem, we then have

∫_0^u β_{N_k}(t) dt = ∫_ℝ g_u(x) dG_{N_k}(x) → ∫_ℝ g_u(x) dG(x)

as N_k → ∞. Actually we apply that theorem in the following modified form, using the fact that g_u(x) = (e^{iux} − 1)/(ix) → 0 as x → ±∞. Let a, b be continuity points of G, and consider

|∫_ℝ g_u(x) d(G_{N_k} − G)(x)| ≤ |∫_a^b g_u(x) d(G_{N_k} − G)(x)| + 4M_0 sup{|g_u(x)| : x ∉ (a, b)}.   (12)

The last term can be made small if a, b are chosen suitably large in absolute value, and then the integral on the compact set [a, b] goes to zero by the Helly-Bray theorem.

But the right side of (11) is ∫_0^u φ(t) dt. By this argument we deduce that each convergent subsequence of {G_N, N ≥ 1} yields the same limit relation ∫_ℝ g_u dG = ∫_0^u φ(t) dt, whose right side is absolutely continuous in u. It follows that the limit functions of the G_N differ by at most a constant, i.e., dG_N → dG for a unique G, and hence their Fourier transforms converge; and if ∫_ℝ e^{itx} dG(x) = φ̃(t), then ∫_0^u φ̃(t) dt = ∫_0^u φ(t) dt, u ∈ ℝ. Thus φ̃(t) = φ(t) a.e. But φ̃ is continuous and φ̃(0) = 1 = φ(0), so that φ̃ is a ch.f., by Proposition 3. This proves the main part of the theorem.

For the converse, let φ = φ̃ a.e., where φ̃ is a ch.f. and φ(0) = 1. We again form p_T(x) as in (3) with φ. Since the Lebesgue integral is unaltered if we replace the integrand by an a.e. equivalent function, p_T remains the same if φ is replaced by φ̃. Consequently (3) implies p_T ≥ 0, and (4) shows p_T is a probability density. Also, φ̃_T is the ch.f. of p_T, and hence is positive definite; φ̃_T is continuous, being the product of the bounded function φ̃ and the (Fejér) ch.f. defined there. The same positive definiteness holds if φ_T is used in place of φ̃_T. Thus φ_T is positive definite and φ_T → φ pointwise as T → ∞. Since a pointwise limit of a sequence of positive definite functions is clearly positive definite, it follows that φ is positive definite. Since φ is not necessarily continuous at t = 0, φ is not generally a ch.f., as simple counterexamples show. This proves the converse, and the theorem is established.

Even though we used the continuity theorem for ch.f.s as a key result in the proof of Theorem 2, one can establish that the continuity theorem is a consequence of Bochner's theorem. Thus there is a certain equivalence between these results. Let us now establish this statement.

Theorem 6 The continuity theorem for ch.f.s is a consequence of Bochner's theorem. More explicitly, let {φ_n, n ≥ 1} be a sequence of ch.f.s and {F_n, n ≥ 1} be the corresponding sequence of d.f.s. If φ_n(t) → φ(t), t ∈ ℝ, and φ is continuous at t = 0, then there is a d.f. F such that F_n → F at all continuity points of F. Further, φ is the ch.f. of F.

Proof Since φ is the pointwise limit of positive definite functions, it is positive definite; and it is continuous at t = 0 by hypothesis. Thus by Proposition 3 and Theorem 2, φ is a ch.f. Let F be its d.f. We now show that F_n → F, or equivalently, if P_n and P are their Lebesgue-Stieltjes probability measures on ℝ, then limsup_n P_n(A) ≤ P(A) for all closed sets A. (Here the truth of Theorem 2 is assumed.)

For this, we shall verify one of the equivalent hypotheses of Theorem 1.5. Let {t_n, n ≥ 1} ⊂ ℝ be an arbitrarily fixed dense denumerable set (e.g., the rationals). Write K = {e^{ix} : x ∈ ℝ}, and let K^∞ be the countable Cartesian product of K with itself. Then K (identified as the unit circle in ℂ) is compact, and hence so is K^∞, by the Tychonov theorem, and it is separable in the product topology. Consider the mapping τ : ℝ → K^∞, defined by

τ : x ↦ (e^{it_j x}, j ≥ 1).   (13)

Since for any x_1 ≠ x_2 we can find a t_j such that e^{it_j x_1} ≠ e^{it_j x_2}, it follows that τ is a one-to-one mapping of ℝ into K^∞. Also, it is continuous. Further, if τ_n = τ restricted to [n, n + 1], then τ_n^{−1}(closed set) is a closed subset of [n, n + 1]. Since n is arbitrary, this shows that τ^{−1} : τ(ℝ) → ℝ and τ : ℝ → K^∞ are both measurable. We may say, therefore, that τ is a Borel isomorphism between ℝ and τ(ℝ), which is σ-compact. With this mapping, consider μ_n = P_n ∘ τ^{−1}. Then μ_n(τ(ℝ)) = 1, n ≥ 1, and the same is true of μ = P ∘ τ^{−1}. Moreover, since τ^{−1} is a function,

∫_{K^∞} e^{itτ^{−1}(k)} μ_n(dk) = ∫_ℝ e^{itx} P_n(dx) = ∫_ℝ e^{itx} dF_n(x) = φ_n(t),   (14)

and similarly φ is the ch.f. of μ. Since K^∞ is separable, we can use the same reasoning as in Helly's selection principle and conclude that there is a subsequence μ_{n_k} → μ̃, and the compactness of K^∞ implies that μ̃ is a probability measure (no mass "escapes" to infinity). By the Helly-Bray theorem (also applicable here), φ_{n_k}(t) → φ̃(t), the ch.f. of μ̃. But then φ̃ = φ, and so by the uniqueness theorem μ = μ̃. Repeating the argument for each convergent subsequence, we deduce that μ_n → μ. (If we had extended Theorem 1.5 somewhat, this could have been immediately deduced from it.) We still have to prove the result for the P_n-sequence.

Let ℝ = ⋃_k A_k, where the A_k are disjoint bounded Borel sets whose boundaries have zero P-measure. If C ⊂ ℝ is any closed set, then (Ā_k denoting the closure of A_k)

limsup_n P_n(C) = limsup_n Σ_{k≥1} P_n(C ∩ A_k) ≤ Σ_{k≥1} limsup_n P_n(C ∩ Ā_k).   (15)

We now use the result that μ_n → μ proved above. Thus

limsup_n P_n(C ∩ Ā_k) = limsup_n P_n ∘ τ^{−1}(τ(C ∩ Ā_k)) = limsup_n μ_n(τ(C ∩ Ā_k))
≤ μ(τ(C ∩ Ā_k)) (by Theorem 1.5iv)
= P(C ∩ Ā_k).   (16)

Substituting (16) in (15), we get

limsup_n P_n(C) ≤ Σ_{k≥1} P(C ∩ Ā_k) = Σ_{k≥1} P(C ∩ A_k) [since P(Ā_k − A_k) = 0] = P(C).

Hence by Theorem 1.5iv again, P_n → P, or F_n → F, which ends the proof.

Remark This result is significant only when we present an independent proof of Bochner's theorem. Indeed, the other two proofs mentioned in the introduction of this section are of this type. It is also possible to establish Bochner's theorem using the projective limit results of Bochner and Kolmogorov (cf. Theorem 3.4.10). We give one of these versions in the next section, for variety, and also because these considerations illuminate the subject. They show how several of these apparently different results are closely related to each other.

The condition of positive definiteness of Theorems 2 and 5 is thus essential for characterizing ch.f.s. But it is not very easy to verify in applications. The following sufficient condition can be used almost by inspection, and hence we include it. This result was obtained by G. Pólya in 1923.

Proposition 7 A continuous symmetric nonnegative function φ on ℝ with φ(0) = 1 is a ch.f. if it is nonincreasing and convex on the positive half-line.

Proof We exclude the simple case that φ(t) ≡ 1, t ∈ ℝ. If lim_{t→∞} φ(t) = α < 1, then ψ(t) = (φ(t) − α)/(1 − α) satisfies the hypothesis; thus we may assume that α = 0. As noted in the proof of Proposition 1.3.7, φ can be expressed as

φ(t) = φ(0) + ∫_0^t f(u) du, t ≥ 0,   (17)

and φ(0) = 1. Here f is the right derivative of φ, which exists and is nondecreasing. Since φ is decreasing by hypothesis, f must be negative on ℝ⁺, and lim_{u→∞} f(u) = 0 [lim_{t→∞} φ(t) = 0 being the present condition]. We now complete the proof (using a remark of K. L. Chung) by reducing the result to Proposition 2.5.

As noted in Remark 2 after Theorem 2.4, we know that h_1(t) = 1 − |t| for |t| ≤ 1, and = 0 for |t| > 1, gives a ch.f. [of (1 − cos x)/(πx²), x ∈ ℝ], and hence if h is defined as

h(s, t) = 1 − |t|/s for |t| ≤ s, and = 0 for |t| > s (s > 0),

then for each s, h(s, ·) is also a ch.f., and h(·, t) is continuous for each t. We now produce a d.f. H on ℝ⁺ using (17), and then get a mixture of h and H to represent φ and complete the argument.

Since f in (17) is nondecreasing, consider H(s) = ∫_0^s t df(t) = 1 − φ(s) + s f(s). Then H(·) ↗ 0 is a d.f., because φ(s) → 0 as s → ∞ in such a way that f(s) = o(s^{−1}), implying lim_{s→∞} s f(s) = 0. Hence for t ≥ 0, if

g(t) = ∫_{ℝ⁺} h(s, t) dH(s),

we get g′(t) = f(t) (by the fundamental theorem of calculus, or even by considering the difference quotient [g(t + Δt) − g(t)]/Δt as Δt → 0, because of Lebesgue's theorem on differentiation), so that g(t) = φ(t) + C. But g(0) = φ(0) = 1, so that C = 0, and hence φ(t) = ∫_{ℝ⁺} h(s, t) dH(s), t ∈ ℝ, since g(−t) = g(t) = φ(t) = φ(−t). Thus by Proposition 2.5, φ is a ch.f. This completes the proof.
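Pólya's criterion is convenient precisely because it is checkable by inspection; a small numerical companion check (ours, assuming NumPy; points and functions arbitrary) applies the eigenvalue test of this section to a Pólya function and to an even function that is concave near the origin, hence not a ch.f.:

    import numpy as np

    def min_eig(phi, ts):
        # Smallest eigenvalue of (phi(t_i - t_j)): negative => not positive definite.
        return np.linalg.eigvalsh(phi(ts[:, None] - ts[None, :])).min()

    ts = np.arange(0.0, 10.0, 0.25)

    polya = lambda t: 1.0 / (1.0 + np.abs(t))         # even, decreasing, convex on R+
    not_polya = lambda t: np.maximum(0.0, 1 - t**2)   # even, but concave near 0

    print(min_eig(polya, ts))       # >= 0 up to roundoff: a ch.f. by Polya's criterion
    print(min_eig(not_polya, ts))   # clearly negative: not a ch.f.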

4.5 Some Multidimensional Extensions

The preceding results for ch.f.s are all on ℝ or (in the discrete case) the integers. Some of these extend immediately to k dimensions, but others, such as Bochner's theorem, are more involved. In this section we indicate these possibilities, and prove a generalization of Bochner's theorem using the projective limit method. This section may be skipped on a first reading.
If X_1, …, X_n are r.v.s on (Ω, Σ, P), then their joint d.f. F_{X_1,…,X_n}, or F_n for short, is given by

F_n(x_1, …, x_n) = P[X_1 < x_1, …, X_n < x_n], x_i ∈ ℝ.   (1)

Thus any nondecreasing (in each component) left continuous nonnegative function which satisfies

lim_{x_i → −∞} F_n(x_1, …, x_n) = 0, lim_{x_n → +∞} F_n(x_1, …, x_n) = F_{n−1}(x_1, …, x_{n−1}),

and ΔF_n ≥ 0, is a d.f., where ΔF_n is the (n-dimensional) increment of F_n. The ch.f. is defined, as usual, as

φ_n(t_1, …, t_n) = E(e^{i Σ_{j=1}^n t_j X_j}) = ∫_{ℝⁿ} e^{i(t,x)} dF_n(x).   (2)

If μ is the Lebesgue-Stieltjes probability measure on ℝⁿ defined by F_n, then for any Borel set A ⊂ ℝⁿ with its boundary ∂A measurable, we say A is a continuity set of μ, or of F_n, if μ(∂A) = 0. [Thus μ(Ā) = μ(int(A)).] If A is a rectangle, then ∂A is always measurable, and if μ(∂A) = 0, then it is simply termed a continuity interval. The inversion formula and the uniqueness theorem are extended without difficulty. We state the result as follows.

Theorem 1 Let F_n be an n-dimensional d.f. and φ_n be its ch.f. (on ℝⁿ). If A = ×_{i=1}^n [a_i, a_i + h_i) is a continuity interval of F_n, h_i > 0, and P is its Lebesgue-Stieltjes probability, then

P(A) = lim_{T→∞} (1/(2π)ⁿ) ∫_{−T}^{T} ⋯ ∫_{−T}^{T} ∏_{j=1}^n [(1 − e^{−it_j h_j})/(it_j)] e^{−it_j a_j} φ_n(t_1, …, t_n) dt_1 ⋯ dt_n.   (3)

Hence P, or the F_n, is uniquely determined by its ch.f. φ_n.

The straightforward extension of the proof of the one-dimensional case is left to the reader.

The next result contains a technique, introduced by H. Cramér and H. Wold in 1936, that allows us to reduce some multidimensional considerations to the one-dimensional case. If (X_1, …, X_n) is a random vector with values in ℝⁿ, we introduce the (measurable) set S_{a,x}, with a = (a_1, …, a_n) ∈ ℝⁿ,

S_{a,x} = {y ∈ ℝⁿ : Σ_{j=1}^n a_j y_j < x}, x ∈ ℝ.   (4)
We now establish the above stated technique in the form of

Proposition 2 If P_1 and P_2 are two Lebesgue-Stieltjes probabilities on ℝⁿ such that P_1(S_{a,x}) = P_2(S_{a,x}) for all x ∈ ℝ and vectors a ∈ ℝⁿ, then P_1 = P_2, and the common measure gives the distribution of (X_1, …, X_n).

Proof By hypothesis P_1 and P_2 determine the same d.f. of Y = Σ_{j=1}^n a_j X_j:

F_Y(x) = P_i(S_{a,x}), i = 1, 2.

Hence

E(e^{itY}) = ∫_ℝ e^{itx} dF_Y(x) = φ_n(ta_1, …, ta_n), t ∈ ℝ,

where φ_n is the joint ch.f. of (X_1, …, X_n). Thus, taking t = 1, this shows that P_1 and P_2 have the same ch.f.s at each a ≠ 0 in ℝⁿ; if a = 0, then φ_n(0) = 1. Hence P_1 and P_2 have the same ch.f. φ_n. By the preceding theorem (the uniqueness part), P_1 = P_2 on all the Borel sets of ℝⁿ. The last statement is immediate.
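The content of Proposition 2 (the Cramér-Wold device) is that the one-dimensional laws of all linear combinations (a, X) determine the joint law. A small simulation (ours, not from the text; assuming NumPy, with arbitrary sample sizes) shows two bivariate laws with identical N(0,1) marginals that a single well-chosen projection already separates:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    X1 = rng.standard_normal((n, 2))          # independent N(0,1) components
    W = rng.standard_normal(n)
    X2 = np.column_stack([W, W])              # same N(0,1) marginals, fully dependent

    a = np.array([1.0, -1.0]) / np.sqrt(2)    # one projection <a, X> tells them apart
    print((X1 @ a).std(), (X2 @ a).std())     # ~ 1.0 vs 0.0, so the joint laws differ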

Using this result, for instance, the multidimensional continuity theorem for ch.f.s can be reduced to the one-dimensional case. We sketch this argument here. If P_n → P, where P_n, P are Lebesgue-Stieltjes probabilities on ℝᵏ, then by the k-dimensional Helly-Bray theorem (same proof as in the one-dimensional case) the corresponding ch.f.s converge to that of P. Thus for the sufficiency, the above (reduction) technique can be applied.

If φ_n is the ch.f. of a k-dimensional distribution (= image probability) P_n, then, by the (multidimensional analog of the) Helly selection principle, there exists a σ-additive bounded measure P on ℝᵏ such that P_n(S) → P(S) for all Borel sets S ⊂ ℝᵏ with P(∂S) = 0, ∂S being the boundary of S. On the other hand, for each fixed a = (a_1, …, a_k) ≠ 0 in ℝᵏ, φ_n^a(t) = φ_n(ta_1, …, ta_k) → φ(ta_1, …, ta_k) = φ^a(t), t ∈ ℝ, and φ^a is continuous at t = 0. Hence by the one-dimensional continuity theorem, φ^a(·) is a characteristic function. If S_{a,x} is given by (4) as a subset of ℝᵏ, then F_n^a(x) = P_n(S_{a,x}) → P(S_{a,x}), and F_n^a(x) → F^a(x) at all continuity points x of F^a, a d.f. with φ^a as its ch.f. Now letting x → ∞ (because F^a(+∞) = 1), it follows that

P(ℝᵏ) = lim_{x→∞} P(S_{a,x}) = F^a(+∞) = 1.

Hence P is a probability function, and then φ will be its ch.f. Next, by the familiar argument, with Theorem 1, we conclude that each convergent subsequence of {P_n, n ≥ 1} has the same limit P, and thus the whole sequence converges to P. This gives the multidimensional continuity theorem for ch.f.s.
Let us record another version of the above proposition which is very useful in some applications. In particular, we can deduce a classical result of J. Radon from it. The desired form is the following:

Proposition 2′ A probability distribution in ℝᵏ is determined completely by its projections on a set of subspaces of dimensions 1, 2, …, k − 1 that together exhaust the whole space.

Proof Since the half-spaces {S_{a,x} : x ∈ ℝ, a ∈ ℝᵏ} of Proposition 2 cover ℝᵏ, it is clear that both propositions are equivalent. We sketch the argument, with k = 2, for simplicity and emphasis. Thus we take the ch.f. φ of F(·,·):

φ(t_1, t_2) = ∫_{ℝ²} e^{it_1 x + it_2 y} dF(x, y).   (6)

Consider a line ℓ_θ through the origin making an angle θ with the x axis. Then the projection of (x, y) on this line is given by

ℓ_θ(x, y) = x cos θ + y sin θ.

Hence if the projection of F on ℓ_θ is F_{ℓ_θ}, its ch.f. is obtained as follows. Let x, y be the values of the r.v.s X, Y, and x′ be that of X′ = X cos θ + Y sin θ. Then

φ_θ(t) = E(e^{itX′}) = ∫_ℝ e^{itr} dF_{X′}(r)   (7)

is assumed known for each 0 ≤ θ < π and t ∈ ℝ. But

φ_θ(t) = E(e^{it(X cos θ + Y sin θ)}) = φ(t cos θ, t sin θ).   (8)

Consequently, if φ_θ(t) is known for each θ and t, then so is φ, since every point of ℝ² is of the form (t cos θ, t sin θ). Thus F is completely determined, by Theorem 1. This completes the proof.

The following consequence of the above result is due to J. Radon, proved in 1917, and plays an important role in applications. In fact, it played a crucial role in the work of A. M. Cormack, who derived a particular case of it independently in the late 1950s. The work is so useful in tomography and (brain) scanning that he was awarded a Nobel prize in medicine in 1979 for this. It was ultimately based on the Radon theorem. [A historical account and medical applications are detailed in his Nobel lecture, which is printed in Science 209 (1980), 1482-1486.]

Corollary 3 (Radon's Theorem) If K is a bounded open set of the plane, and if the integral of a continuous function f : K → ℝ vanishes along every chord of K, then f ≡ 0.

Proof Let f_1 = max(f, 0) and f_2 = f_1 − f. Then f_1, f_2 are nonnegative and continuous. By hypothesis ∫_ℓ f = 0 for every chord ℓ of K, so that ∫_ℓ f_1 = ∫_ℓ f_2 for each such ℓ. Since the f_i are integrable, we may identify them as "densities" (i.e., that their integral should equal unity is unimportant). But by Proposition 2 (or 2′) these line integrals determine the f_i completely. Hence f_1 = f_2, so that f = 0, as asserted.
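The tomography connection is easy to visualize numerically. A sketch (ours, not from the text; assuming NumPy, with the disc example and bin counts chosen arbitrarily) computes the projection of a planar density onto the line at angle θ, i.e., the density of X cos θ + Y sin θ, which for the uniform law on the unit disc is (2/π)√(1 − s²) for every θ:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000
    r = np.sqrt(rng.uniform(0.0, 1.0, n))     # uniform sample on the unit disc
    ang = rng.uniform(0.0, 2 * np.pi, n)
    X, Y = r * np.cos(ang), r * np.sin(ang)

    for theta in (0.0, np.pi / 3):
        s = X * np.cos(theta) + Y * np.sin(theta)       # projection onto l_theta
        hist, edges = np.histogram(s, bins=40, range=(-1, 1), density=True)
        mid = 0.5 * (edges[:-1] + edges[1:])
        target = (2 / np.pi) * np.sqrt(1 - mid**2)      # exact projected density
        print(theta, np.abs(hist - target).max())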

Another consequence of some interest is contained in

Corollary 4 (Rényi) Let F : ℝ² → ℝ⁺ be the d.f. of a bounded random vector (X, Y) : Ω → ℝ² on a probability space (Ω, Σ, P). [Thus the range of (X, Y) is contained in some ball of radius 0 < ρ < ∞.] If F_{ℓ_θ}, the projection of F on a line ℓ_θ : ℓ_θ(x, y) = x cos θ + y sin θ through the origin, is given for an infinite set of distinct lines [= for an infinite collection of distinct θ (mod π)], then F is uniquely determined.

Proof Let θ_1, θ_2, … be a distinct set as in the statement. Then this set is bounded (mod π) and there exists a convergent subsequence {θ_{n′}}_{n′≥1} with limit θ_0. By Proposition 2 or 2′, if φ is the ch.f. of F and φ_θ is the ch.f. of F_{ℓ_θ}, then

φ_θ(t) = φ(t cos θ, t sin θ)   (9)

is known. Since the vector (X, Y) has a bounded range, the measure determined by the d.f. F has compact support. Thus its Fourier transform φ is an analytic function (since φ is continuously differentiable; also see Problem 6). But this transform is known for a convergent sequence {θ_{n′}}_{n′≥1}, so that (9) implies, by the method of analytic continuation, that φ is defined uniquely on all of the complex plane ℂ. This shows that {φ_θ(·), θ ∈ {θ_n, n ≥ 1}} determines φ, and hence F, as asserted.

Let us now present an extension of Bochner's theorem with a proof that does not use the continuity theorem. The latter can thus be obtained from the former, as shown before. Our argument is founded on the Kolmogorov-Bochner Theorem 3.4.10, exhibiting the deep interrelations between three of the major results of the subject.

The first step here is based on the following special result.

Proposition 5 (Herglotz Lemma) Let φ : ℤᵏ → ℂ, k ≥ 1, be a function, where ℤᵏ is the set of all points of ℝᵏ with integer coordinates. Then for each j = (j_1, …, j_k) ∈ ℤᵏ,

φ(j) = ∫_{Tᵏ} e^{i(j,θ)} dμ(θ),   (10)

where Tᵏ is the cube with side [0, 2π] in ℝᵏ and μ is a probability on the Borel sets of Tᵏ, iff φ is positive definite and φ(0) = 1, i.e., iff φ is a ch.f. on ℤᵏ. The representation (10) is unique.

Proof If φ is given by (10), then clearly φ(0) = 1 and it is positive definite, as in the one-dimensional case. Only the converse is nontrivial.

Thus let φ be positive definite and φ(0) = 1. If j′ = (j′_1, …, j′_k) and j″ = (j″_1, …, j″_k) are two points of ℤᵏ, then, the sum and difference of such vectors being taken componentwise, let

f_N(θ) = (1/Nᵏ) Σ_{j′} Σ_{j″} φ(j′ − j″) e^{−i(j′−j″, θ)},   (11)

the sums extending over 0 ≤ j′_r, j″_r ≤ N − 1, r = 1, …, k. The positive definiteness of φ implies f_N ≥ 0. If we define

μ_N(A) = (1/(2π)ᵏ) ∫_A f_N(θ) dθ   (12)

for each Borel set A of Tᵏ, then μ_N is a probability, since φ(0) = 1. Moreover, the orthogonality relations of the trigonometric functions on the circle T, obtained by identifying 0 and 2π, give

∫_{Tᵏ} e^{i(j,θ)} dμ_N(θ) = ∏_{r=1}^k (1 − |j_r|/N) φ(j) if |j_r| < N, r = 1, …, k, and = 0 otherwise.   (13)

Here j = (j_1, …, j_k). But for the sequence {μ_N, N ≥ 1} on the Borel sets of Tᵏ we can apply the k-dimensional version of Theorem 1.1 and extract a convergent subsequence {μ_{N_s}, s ≥ 1}; thus there is a probability μ (since Tᵏ is compact in the product topology) which is the limit of this subsequence. Now by the corresponding Helly-Bray theorem we get, on letting N_s → ∞ in (13),

φ(j) = ∫_{Tᵏ} e^{i(j,θ)} dμ(θ), j ∈ ℤᵏ.

This is (10).

The uniqueness can be obtained from Theorem 1. Alternatively, since Tᵏ is compact, we note that the linear span of the set of functions {exp(i Σ_{r=1}^k j_r θ_r) : θ_r ∈ T, j ∈ ℤᵏ} is uniformly dense in C(Tᵏ), the space of continuous complex functions on Tᵏ, by the Stone-Weierstrass theorem. This implies the uniqueness of μ in (10) at once, completing the proof.
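The construction in the proof is also computable. A sketch (ours, not from the text; assuming NumPy, with ρ, N, and the grid arbitrary) forms the Fejér-averaged densities of the μ_N from the positive definite sequence φ(j) = ρ^{|j|} on ℤ and compares them with the known limiting spectral density, the Poisson kernel (1 − ρ²)/(2π|1 − ρe^{iθ}|²):

    import numpy as np

    rho = 0.6
    phi = lambda j: rho ** np.abs(j)          # positive definite sequence on Z

    N = 400
    theta = np.linspace(0.0, 2 * np.pi, 1001)
    js = np.arange(-N + 1, N)
    # Fejer-weighted construction from the proof (k = 1):
    #   p_N(theta) = (1/2pi) sum_{|j|<N} (1 - |j|/N) phi(j) e^{-ij theta}  (>= 0)
    w = (1 - np.abs(js) / N) * phi(js)
    pN = (w @ np.exp(-1j * np.outer(js, theta))).real / (2 * np.pi)

    poisson = (1 - rho**2) / (2 * np.pi * np.abs(1 - rho * np.exp(1j * theta))**2)
    print(pN.min(), np.abs(pN - poisson).max())   # >= 0, and a small deviation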
The preceding result will now be given in a general form, using Theorem 3.4.10 in lieu of the Lévy continuity theorem for ch.f.s. If S is a set, then T^S (= ×_{s∈S} T_s, T_s = T) also denotes the space of all functions defined on S with values in T. Since T = [0, 2π] is compact, T^S is then compact under the product topology and, with componentwise addition modulo 2π, it becomes a compact abelian group. We can thus express θ ∈ T^S as θ = (θ(s), s ∈ S), so that the coordinate projection p_s : θ ↦ θ(s) is a mapping of T^S onto T. Let ℬ be the smallest σ-algebra (= cylinder σ-algebra in the earlier terminology) with respect to which each p_s, s ∈ S, is measurable when T is given its Borel σ-algebra 𝒯. Thus ℬ = σ(⋃_s p_s^{−1}(𝒯)). Let 𝒵 be the set of mappings n : S → ℤ such that n(s) = 0 for all but a finite number of s ∈ S. With componentwise addition, 𝒵 becomes an abelian group (the "dual" of T^S).

Using this terminology, we have the following simple but crucial extension of the preceding result:

Proposition 6 Let φ : 𝒵 → ℂ be positive definite with φ(0) = 1, where 0 is the identically zero function of 𝒵. Then there is a unique probability P on ℬ such that for each n ∈ 𝒵,

φ(n) = ∫_{T^S} e^{i Σ_{s∈S} n(s) θ(s)} dP(θ).   (14)

Proof First note that the integral in (14) is really a finite-dimensional one, since n(s) = 0 for all but a finite subset of S. Let us define, for each finite set F ⊂ S,

𝒵_F = {n ∈ 𝒵 : n(S − F) = 0}.

Clearly 𝒵_F is a group (isomorphic to ℤ^F). Hence by Proposition 5 applied to T^F, there is a unique probability P_F on the Borel σ-algebra of T^F such that

φ(n) = ∫_{T^F} e^{i Σ_{s∈F} n(s) θ(s)} dP_F(θ), n ∈ 𝒵_F.   (15)

If ℱ denotes the directed (by inclusion) family of all finite subsets of S, then the uniqueness of P_F in (15) implies that {P_F, F ∈ ℱ} is a compatible family of Borel (or Radon) probability measures on {T^F, F ∈ ℱ}. Now by Theorem 3.4.10 there exists a unique probability P : ℬ → [0, 1] such that P ∘ p_F^{−1} = P_F, where p_F : T^S → T^F is the corresponding coordinate projection. This means we can replace T^F and P_F in (15) by T^S and P, so that (14) holds. This completes the proof.

To obtain the n-dimensional version of Bochner's theorem from the above result, we need to identify T^S with a simpler object. Here the structure theory (actually the Pontryagin duality theorem) intervenes. The special case for ℝᵏ will be given here, as this is easy. If the "duality" is assumed, then the proof carries over to general locally compact abelian groups. However, for simplicity we restrict our treatment to ℝᵏ here. [A similar idea will be employed in Section 8.4 in discussing a subclass of strictly stationary processes.]

Recall that if G is a locally compact abelian group, then a continuous homomorphism α of G into the multiplicative group of complex numbers of absolute value one is called a character of G. Thus α : G → ℂ satisfies (i) |α(x)| = 1, (ii) α(x + y) = α(x)α(y), x, y ∈ G (with + as the group operation), and (iii) α is continuous. If G = ℝᵏ, the additive group of k-tuples of reals, then we can give the following simple and explicit description of these characters. In addition, the set of all characters on G, denoted Ĝ (⊂ T^G, endowed with the product topology), will be identified.

Proposition 7 If G = ℝᵏ, then each character α (∈ Ĝ) is of the form α(x) = e^{i(x,y)}, where x = (x_1, …, x_k) ∈ ℝᵏ, similarly y ∈ ℝᵏ, and (x, y) = Σ_{j=1}^k x_j y_j. Moreover, Ĝ (= (ℝᵏ)^) is isomorphic and homeomorphic to ℝᵏ under the identification α ↔ y, so that α = e^{i(·,y)} corresponds uniquely to y.

Proof The well-known (measurable) solution of the Cauchy equation f(x + y) = f(x) + f(y) is f(x) = (x, c), where c ∈ ℂᵏ. Setting f(x) = Log α(x) here, one gets α(x) = e^{(x,c)}, and since |α(x)| = 1 for all x ∈ G, c must be pure imaginary, c = iy for some y ∈ ℝᵏ. Thus α(x) = e^{i(x,y)}. On the other hand, clearly every y ∈ ℝᵏ defines e^{i(·,y)} ∈ Ĝ. It is also evident that y ↔ e^{i(·,y)} is one-to-one. Further,

N(ε, n) = {α : |α(x) − 1| < ε, |x| ≤ n}, |x|² = Σ_j x_j²,

is a neighborhood of the identity character 1, and, varying ε and n, these form a neighborhood basis of the identity of Ĝ; with the group property of Ĝ, this determines the topology of Ĝ, which is clearly the same as the induced topology of the product space T^G noted earlier. However, using the form α(x) = e^{i(x,y)}, we see that |e^{i(x,y)} − 1| < ε is equivalent to

|(x, y)| < 2 arcsin(ε/2),

or equivalently, |y| < δ with δ = (2/n) arcsin(ε/2). Since this defines the neighborhood basis of the identity of ℝᵏ, we conclude that the mapping between ℝᵏ and (ℝᵏ)^ is bicontinuous at the identity, and since both are groups, the same must be true everywhere. This completes the proof.

A consequence of the preceding two results is the following. Since the space T can be identified with 0 ≤ θ < 2π (with group operation addition modulo 2π), or equivalently with the group of all complex numbers of absolute value one, if S = G = ℝᵏ, then Ĝ (⊂ T^S) is isomorphic and homeomorphic to ℝᵏ. Thus if h : Ĝ → ℝᵏ is this mapping, then h(e^{i(·,y)}) = y and h is one-to-one, etc. Also, this identification shows that ((ℝᵏ)^)^ ≅ ℝᵏ ≅ (ℝᵏ)^. Bochner's theorem can now be established quickly as follows. We write ℝᵏ = G whenever it is convenient.

Theorem 8 If φ : G → ℂ is a continuous mapping with φ(0) = 1, then there exists a unique probability measure P on the Borel σ-algebra ℬ of ℝᵏ = G such that

φ(x) = ∫_{ℝᵏ} e^{i(x,y)} dP(y), x ∈ G,   (16)

iff φ is positive definite, or equivalently, φ is a ch.f. on G ≅ ℝᵏ.

Proof As before, if φ is given by (16), then it is immediate that φ is positive definite, continuous, and φ(0) = 1. So we only need to establish the converse when φ satisfies these conditions.

Consider, to use Proposition 6, the function ψ : 𝒵 → ℂ defined by

ψ(n) = φ(Σ_{s∈S} n(s) s), n ∈ 𝒵   (17)

(the sum having only finitely many nonzero terms). Since 𝒵 is a group, it is clear that ψ is positive definite and ψ(0) = 1. Then by Proposition 6, there is a probability μ on the cylinder σ-algebra of T^S such that

ψ(n) = ∫_{T^S} e^{i Σ_{s∈S} n(s) θ(s)} dμ(θ), n ∈ 𝒵.   (18)

Note that θ here acts as a complex homomorphism of absolute value one, with θ(s + s′) = θ(s)·θ(s′). Being continuous, it is a character of G, and the mapping (θ, s) ↦ θ(s) is jointly continuous in θ and s by Proposition 7. Since h : Ĝ → G = ℝᵏ is a homeomorphism preserving all the algebraic operations, we have, on taking n = δ_s(·) (a delta function) in (18) and using the image law theorem, ψ(δ_s) = φ(s) by (17), so that

φ(s) = ∫_{ℝᵏ} e^{i(s,y)} dP(y),   (19)

where P = μ ∘ h^{−1} and θ ↔ y_θ ∈ Ĝ ≅ ℝᵏ under the identification above. Since Ĝ ≅ ℝᵏ, (19) reduces to (16). The uniqueness of the representation is a consequence of Theorem 1. This completes the proof.

Remarks (1) The above argument holds if G is any locally compact abelian group. However, we then need a more delicate discussion for a determination of the characters, as well as the full Pontryagin duality theorem. The method employed above is a specialization of the general case considered by M. S. Bingham and K. R. Parthasarathy (J. London Math. Soc. 43 (1968), 626-632).

(2) The uniqueness part can also be established independently of Theorem 1, using an argument of the above authors. What is interesting here is that Proposition 6 (a consequence of Theorem 3.4.10) and the image law of probabilities replaced the continuity theorem for ch.f.s in this independent proof. The latter result is now a consequence of Theorem 8, as shown in the previous section.

4.6 Equivalence of Convergences for Sums of Independent Random Variables

After introducing the three central convergence concepts without regard to moments (namely a.e., in probability, and in distribution), we established a simple general relation between them in Proposition 2.2.2. We then stated after its proof that for sums of independent r.v.s these three convergences can be shown to be equivalent once more tools are available. The equivalence of the first two was sketched in Problem 16 of Chapter 2. However, the necessary results are now at hand, and the general assertion can be obtained, without reference to the above noted problem, as follows.

Theorem 1 Let X_1, X_2, … be independent r.v.s on (Ω, Σ, P), S_n = Σ_{k=1}^n X_k, and let S be an r.v. Then the following convergence statements (as n → ∞) are equivalent:
(a) S_n → S a.e.,
(b) S_n → S in probability,
(c) S_n → S in distribution.

Proof By Proposition 2.2.2, (a) ⇒ (b) ⇒ (c) always. Hence it suffices to prove that (c) ⇒ (a).

Thus assume (c) and let φ_n(t) = E(e^{itX_n}). By independence, if ψ_n is the ch.f. of S_n, then ψ_n(t) = ∏_{k=1}^n φ_k(t), t ∈ ℝ. Since S_n → S in distribution, the continuity theorem for ch.f.s implies ψ_n(t) → ψ(t), t ∈ ℝ, and ψ is the ch.f. of S. Hence there is an interval I : −a ≤ t ≤ a, a > 0, on which ψ(t) ≠ 0. Let t be arbitrarily fixed in this interval. Then for each ε > 0, there is an n_0 = n_0(ε, t) such that n ≥ n_0 ⇒ |ψ_n(t) − ψ(t)| < ε, and by compactness of [−a, a] we can even choose n_0 as a function of ε alone (but this fact will not be used below). Thus ψ_n(t) ≠ 0 for all n ≥ n_0. The idea of the proof here is to consider {e^{itS_n}, n ≥ n_0} and show that it converges a.e., and then deduce, by an exclusion of a suitable null set, that S_n → S pointwise on the complement of the null set. The most convenient tool here turns out to be the martingale convergence theorem, though the result can also be proved purely by the (finer) properties of ch.f.s. We use the martingale method.

Let Y_n = e^{itS_n}/ψ_n(t) and ℱ_n = σ(S_1, …, S_n). Then we assert that {Y_n, ℱ_n, n ≥ n_0} is a uniformly bounded martingale. In fact |Y_n| = 1/|ψ_n(t)| ≤ (|ψ(t)| − ε)^{−1} < ∞, where ε may and will be taken smaller than min{|ψ(s)| : −a ≤ s ≤ a} > 0. Next consider

E^{ℱ_n}(Y_{n+1}) = [e^{itS_n}/ψ_{n+1}(t)] E(e^{itX_{n+1}} | ℱ_n)
= [e^{itS_n}/ψ_{n+1}(t)] φ_{n+1}(t) (by the independence of X_{n+1} and ℱ_n)
= Y_n a.e.

This establishes our assertion. Hence by Theorem 3.5.7 or 3.5.8, Y_n → Y_∞ a.e., so that there is a set N_t ⊂ Ω with P(N_t) = 0 such that Y_n(ω) = e^{itS_n(ω)}/ψ_n(t) → Y_∞(ω), ω ∈ Ω − N_t. Since ψ_n(t) → ψ(t) ≠ 0, and these are constants, we deduce that e^{itS_n(ω)} → γ_t(ω) for each ω ∈ Ω − N_t, where γ_t(ω) = Y_∞(ω)ψ(t). From this one obtains the desired convergence as follows.

Consider the mapping (t, ω) ↦ e^{itS_n(ω)}. This is continuous in t for each ω and measurable in ω for each t. We assert that

(i) the mapping is jointly measurable in (t, ω) relative to the product σ-algebra ℬ ⊗ Σ, where (I, ℬ, μ) is the Lebesgue interval and (Ω, Σ, P) is our basic probability space, and

(ii) there exists a set Ω_0 ∈ Σ, P(Ω_0) = 1, such that for each ω ∈ Ω_0 there is a subset I_ω ⊂ I satisfying μ(I_ω) = μ(I) = 2a, and if t ∈ I_ω, then e^{itS_n(ω)} converges to a limit f_ω(t), say, as n → ∞.

These properties imply the result. Indeed, if they are granted, then from the form of the exponential function one has f_ω(t + t′) = f_ω(t) f_ω(t′). Since this is true for each t, t′ in I_ω for which t + t′ ∈ I_ω, it follows that f_ω satisfies the classical Cauchy functional equation (cf. Proposition 6.7, or Problem 23 for another argument), and since |f_ω(t)| = lim_n |e^{itS_n(ω)}| = 1, the solution is f_ω(t) = e^{itα(ω)} for some α(ω) ∈ ℝ. Hence f_ω(t) ≠ 0 for t ∈ I_ω, and it is continuous for all t ∈ I_ω (whence at t = 0), so that e^{itS_n(ω)} → f_ω(t) as n → ∞ for all t ∈ ℝ, and f_ω is a ch.f. (of a unitary d.f.) for each ω ∈ Ω_0. Therefore the hypothesis of Corollary 2.10 is satisfied, and so tS_n(ω) = −i Log e^{itS_n(ω)} → −i Log f_ω(t) = tα(ω), t ∈ ℝ. It follows that S_n(ω) → α(ω), ω ∈ Ω_0, and so S_n → α a.e. But then S_n → α in distribution, and by hypothesis S_n → S in distribution. The limits being unique (in the Lévy metric), we must have S = α a.e., and (a) follows. Let us then establish (i) and (ii).

The joint measurability does not generally follow from sectional measurability. Fortunately, in the present case, this is quite easy. By hypothesis,

Sn : 0 + R is measurable, and if g(t) = t, the identity mapping of I + I,


then it is clearly measurable (for B). Thus the product g(Sn) : I x R + R
is jointly B @C-measurable. Since eZx is continuous in x and since a continu-
ous fuiictioii of a (real) measurable fuiictioii is measurable, we conclude that
e"('") : I x f l + C is jointly measurable, proving (i).
For (ii), the set A = {(t, w) : limn,, e z t s ~ ~ ( "exists)
') is B BE-measurable,
and so each t-section A(t) = R - Nt satisfies P(A(t)) = 1 by the martingale
convergence established above. Since p 8 P is a finite measure, we have by
the Fubini theorem applied t o the bounded fuiictioii X A ,

where A(w) is the w-section of A. This is 2a since P ( A ( t ) ) = 1 and p ( I ) = 2a.


It follows that p(A(w)) = 2a for almost all w. Hence there exists an Go E
C , P ( f l o ) = 1, such that for each w E flo,p(A(w)) = 2a. Consequently, if
w E Ro, then for each t E A(w),l'lmn+m e " ' ~ l ( ~ exists,
) and (ii) follows. This
completes the proof.

Now that the basic technical tools are available, we proceed to develop the key results on distributional convergence in the next chapter. Several useful adjuncts are given as problems below.

Exercises

1. Let M be the set of all d.f.s on ℝ, and let d(·,·) be the Lévy distance, as defined in (11) of Section 1. Show that d(·,·) : M × M → ℝ⁺ is a metric on M and that (M, d) is a complete metric space.

2. If (Ω, Σ, P) is a probability space and X, Y are a pair of r.v.s on it, let d_1, d_2 be defined as

d_1(X, Y) = inf{ε > 0 : P[|X − Y| > ε] ≤ ε}, d_2(X, Y) = d(F_X, F_Y),

where d is the Lévy distance. Verify that the metric d_1 is stronger than the metric d_2, in the sense that if X_n → X in d_1, then the same is true in d_2. Give an example to show that the converse is false [i.e., convergence in distribution does not imply convergence in probability].

3. If (M, d) is as in Problem 1, prove that the set of discrete d.f.s is everywhere dense in M. [Hint: If F ∈ M, let X be an r.v. on some (Ω, Σ, P) with F as its d.f. Then there exists a sequence of simple r.v.s X_n → X pointwise on this space; now apply Problem 2.]

4. The alternative proof of the Helly-Bray theorem given for Theorem 1.5 extends to the following case. Let 𝒞 be the algebra generated by the open sets of ℝ, and let {P, P_n, n ≥ 1} be finitely additive "probabilities" on 𝒞. Suppose that for each open set A such that P(∂A) = 0 we have P_n(A) → P(A), where ∂A is the boundary of A. Then show that for each real bounded continuous f on ℝ

∫_ℝ f dP_n → ∫_ℝ f dP.   (*)

Here integrals with respect to a finitely additive measure μ are defined as the obvious sums for step functions, and if f_n → f in μ-measure and ∫_ℝ |f_n − f_m| dμ → 0, then ∫_ℝ f dμ = lim_n ∫_ℝ f_n dμ, by definition. This integral has all the usual properties, except that the Lebesgue limit theorems do not hold under the familiar hypotheses. The converse of the above (Helly-Bray) statement (*) holds if P_n and P are also "regular" (i.e., μ is regular on 𝒞 if μ(A) = inf{μ(B) : B ∈ 𝒞, A ⊂ int(B)} = sup{μ(C) : C ∈ 𝒞, C̄ ⊂ A} for each A ∈ 𝒞). [This is a specialization of a more general classical result of A. D. Alexandroff, and our proof works here.] In the above, let P, P_n be σ-additive, P_n(A) → P(A), and f : ℝ → ℝ⁺ be continuous. Then show that

∫_ℝ f dP ≤ lim inf_n ∫_ℝ f dP_n.

5. In proving Proposition 1.6, we noted that moments need not determine a d.f. uniquely. Here are two examples of this phenomenon:

(a) Let f_1, f_2 be two functions defined for x > 0 by

f_1(x) = K e^{−x^{1/3}/2}, f_2(x) = f_1(x)[1 + sin((√3/2) x^{1/3})],

with K > 0 a normalizing constant, and f_i(x) = 0 for x ≤ 0, i = 1, 2. Show that, with a calculus-of-residues computation, ∫_{ℝ⁺} xⁿ sin[(√3/2) x^{1/3}] f_1(x) dx = 0 for all integers n ≥ 0. Deduce that f_1, f_2 are densities having the same moments of all orders, even though f_1 ≠ f_2 on ℝ⁺.

(b) If the r.v. X is normally distributed, N(0,1), and Y = e^X, then Y has a d.f., called the log-normal, with density

f_Y(y) = (1/(y√(2π))) e^{−(log y)²/2}, y > 0,

and f_Y(y) = 0 for y ≤ 0. Show that ∫_{ℝ⁺} yⁿ sin(2π log y) f_Y(y) dy = 0 for all integers n ≥ 0. Deduce that f_Y and g_Y, defined by

g_Y(y) = f_Y(y)[1 + sin(2π log y)], y > 0,

are both densities with the same moments of all orders, even though f_Y ≠ g_Y on ℝ⁺.

6. Let F be a d.f. for which the moment-generating function (m.g.f.) M(·) exists, where by definition M(t) = ∫_ℝ e^{tx} dF(x) for |t| < ε, for some ε > 0. Then verify that F has all moments finite, and that F is uniquely determined by its moments. [The Taylor expansion of M(t) shows that M(·) is analytic in the disc at the origin of ℂ of radius ε. Then the ch.f. φ of F and the m.g.f. M satisfy φ(z) = M(iz) on this disc, and both are analytic. Thus the uniqueness follows.] This result implies, in particular, that the d.f. of a bounded r.v. is uniquely determined by its moments. Also deduce that a set {a_n, n ≥ 1} ⊂ ℝ, a_0 = 1, forms the moment sequence of a unique d.f. if the series

Σ_{n≥0} (a_n/n!) tⁿ = M(t)

is absolutely convergent for some t > 0 and if M : t ↦ M(t) satisfies M(it) = φ(t), t ∈ ℝ, where φ is a ch.f.

7. Calculate the ch.f.s of the eight standard d.f.s given in Section 2.

8. Prove Proposition 2.3 by the substitution method of Theorem 2.1. Using this proposition, deduce that the ch.f. φ of the density p, p(x) = (1 − cos x)/(πx²), x ∈ ℝ, is given by φ(t) = 1 − |t| for |t| ≤ 1, and = 0 for |t| > 1.
9. Complete the details of Khintchine's example: Let f_1 be a probability density given by f_1(x) = (1 − cos x)/(πx²), x ∈ ℝ, and F_2 be a discrete d.f. with jumps at x = nπ of sizes 2/(n²π²), n = ±1, ±3, …, and a jump at 0. Show that the two ch.f.s coincide on the interval [−1, 1] but are different elsewhere. Deduce that if φ_n(t) → φ(t), t ∈ [−a, a], a > 0, where the φ_n and φ are ch.f.s, then the convergence need not hold on all of ℝ. (Also, see Exercise 34 below.)

10. If f(x) = K(|x|² log |x|)^{−1} for |x| > 2, and = 0 for |x| ≤ 2, where K > 0 is chosen such that ∫_ℝ f(x) dx = 1, then verify that its ch.f. φ is differentiable at t = 0 with (dφ/dt)(0) = 0, even though the mean of f does not exist. [By the comment at the end of the proof of Theorem 2.1, the symmetric derivative of the ch.f. of a d.f. is equivalent to its ordinary derivative, and this may be used here.]

11. Let X_1, X_2, … be independent r.v.s, each having the same distribution given by

P[X_i = 1] = P[X_i = −1] = 1/2.

If Y_n = Σ_{k=1}^n X_k/2ᵏ, find its ch.f. and show that Y_n → Y a.e., where Y is uniformly distributed on the interval (−1, +1). (Use ch.f.s and the continuity theorem for finding the d.f. of Y.)
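The identity behind this exercise is ∏_{k=1}^n cos(t/2ᵏ) = sin t/(2ⁿ sin(t/2ⁿ)) → (sin t)/t, the ch.f. of U(−1, 1). A short numerical check of this (ours, not from the text; assuming NumPy):

    import numpy as np

    t = np.linspace(0.1, 30.0, 300)
    prod = np.ones_like(t)
    for k in range(1, 41):
        prod *= np.cos(t / 2**k)               # ch.f. of Y_40 = sum X_k / 2^k

    print(np.abs(prod - np.sin(t) / t).max())  # ~ 1e-13: the limit is the U(-1,1) ch.f.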

12. Strengthen the conclusion of Corollary 2.10 as follows: If {φ, φ_n, n ≥ 1} is a set of ch.f.s such that φ_n(t) → φ(t), t ∈ ℝ, then the convergence is uniform on each compact set of ℝ. [First show that the set of functions is equicontinuous; i.e., given ε > 0, there is a δ_ε > 0 such that |t − t′| < δ_ε ⇒ |f(t) − f(t′)| < ε for all f ∈ {φ, φ_n, n ≥ 1}.] Suppose that none of the ch.f.s in this set vanishes. Show that Log φ_n(t) → Log φ(t) iff φ_n(t) → φ(t), t ∈ ℝ, and that the convergence is uniform on compact intervals in both directions.

13. This problem contains a simple extension of the Lévy inversion formula, with the same argument. Thus let F be a d.f. with φ as its ch.f. Let g : ℝ → ℝ be an absolutely Riemann integrable function (i.e., both g and |g| are Riemann integrable) such that for each x ∈ ℝ

lim_{h↓0} g(x ± h) = g(x ± 0)

exists. Show that the corresponding inversion formula then holds with ½[g(x + 0) + g(x − 0)] in place of the one-sided values. Deduce Theorem 2.1 from this.

14. We give an adjunct to the selection theorem. Let {F_n, n ≥ 1} be a sequence of d.f.s with {φ_n, n ≥ 1} as the corresponding ch.f.s. If F_n → G (at all continuity points of G), a necessarily nondecreasing function with 0 ≤ G ≤ 1, let g be its Fourier transform. Show that ∫_0^u φ_n(t) dt → ∫_0^u g(t) dt for each u ∈ ℝ. If, on the other hand, lim_{n→∞} ∫_0^u φ_n(t) dt exists for each u ∈ ℝ, show that F_n → a limit (= H, say) at all continuity points of H, and if h is the Fourier transform of H, then

lim_{n→∞} ∫_0^u φ_n(t) dt = ∫_0^u h(t) dt, u ∈ ℝ.

In particular, deduce that φ_n → g a.e. [Lebesgue] on ℝ implies F_n → G (at continuity points of G) and that the Fourier transform of the limit agrees with g outside of a Lebesgue null set. (The argument is the same as that used for the last half of Theorem 4.5.)

15. Let X_1, …, X_n be r.v.s with ch.f.s defined by φ_k(t) = E(e^{itX_k}) and

φ(t_1, …, t_n) = E(e^{i Σ_{k=1}^n t_k X_k}).

Show that the X_k are mutually independent if and only if

φ(t_1, …, t_n) = ∏_{k=1}^n φ_k(t_k), (t_1, …, t_n) ∈ ℝⁿ.

Show by an example that the result is false if (t_1, …, t_n) is replaced by only the diagonal (t, …, t) ∈ ℝⁿ. [For a counterexample, consider n = 2, |X_i| ≤ 1, with a density given by f(x_1, x_2) = (1/4){1 + x_1 x_2 (x_1² − x_2²)}, −1 ≤ x_i ≤ 1; = 0 otherwise.]

16. We present three important facts in this exercise, along with sketches of their proofs.
(a) Let X_1, …, X_n be i.i.d. N(0,1) r.v.s and set X̄ = (1/n) Σ_{i=1}^n X_i. Show that X̄ and {(X_i − X̄), i = 1, …, n} are independent. [Use Problem 15 and the algebraic identity

Σ_{i=1}^n t_i x_i = Σ_{i=1}^n (t_i − t̄)(x_i − x̄) + n t̄ x̄,

with x̄ = (1/n) Σ_{i=1}^n x_i and t̄ = (1/n) Σ_{i=1}^n t_i.] Show that the r.v. V = Σ_{i=1}^n (X_i − X̄)² has a gamma distribution whose ch.f. φ is given by φ(t) = (1 − 2it)^{−(n−1)/2}. [Use the identity

Σ_{i=1}^n X_i² = V + n X̄²,

and the fact that the left side has a ch.f. t ↦ (1 − 2it)^{−n/2}. This result is important in statistical inference, where V/(n − 1) is called the sample variance and X̄ the sample mean. Just as in Proposition 2.8, it can be shown (with further work) that the independence property of X̄ and V characterizes a normal distribution. See below, and the numerical sketch after this exercise.]
(b) It suffices to establish the above converse for n = 2. Thus let X_1, X_2 be i.i.d. with F as the common d.f., having two moments. Let Y = X_1 + X_2 and Z = X_1 − X_2. Show that if Y and Z are independent, then F is N(μ, σ²), and this essentially gives the last statement of (a). [Let φ_Y, φ_Z and φ be the ch.f.s of the r.v.s Y, Z and the d.f. F. Then the independence of Y, Z implies φ_Y(s)φ_Z(t) = E(e^{is(X_1+X_2)+it(X_1−X_2)}) = φ(s + t)φ(s − t), and so

φ(s + t)φ(s − t) = φ(s)² φ(t) φ(−t),   (*)

the key functional equation for φ, since φ(−t) = \overline{φ(t)}. Put s = t in (*) to get φ(2s) = φ(s)² |φ(s)|², so that

|φ(2s)| = |φ(s)|⁴.

By iteration, conclude that |φ(2ⁿs)| = |φ(s)|^{4ⁿ}, or |φ(s)| = |φ(2^{−n}s)|^{4ⁿ}. Hence conclude that φ(·) never vanishes. So by Proposition 2.9, s ↦ f(s) = Log φ(s), called the cumulant function of φ, is well defined. Thus (*) gives, upon taking logs, the cumulant equation

f(s + t) + f(s − t) = 2f(s) + f(t) + f(−t)   (**)

(note that f(−t) is the complex conjugate of f(t)). Now use the fact that F has two moments, so that by Proposition 2.6, φ, and hence f, is twice differentiable. Thus differentiate (**) twice relative to t and set t = 0 to get f″(s) = −σ², where f″(0) = −σ², σ² being the variance of F. Since f′(0) = iμ, with μ the mean of F, the solution of this differential equation is f(t) = iμt − σ²t²/2, so that φ(t) = exp(iμt − σ²t²/2), as asserted.

Remark: The result is true without assuming any moments, but then we need a different method. The conclusion holds even if X_1, X_2 do not have the same distribution; then φ_{X_1}(s) = e^{iμ_1 s − σ²s²/2} and φ_{X_2}(s) = e^{iμ_2 s − σ²s²/2}. See Stromberg (1994), p. 104. The above argument is a nice application of Propositions 2.6 and 2.9.]
(c) The independence concept is so special for probability theory that even for Gaussian families it has distinct properties. Thus let X_1, . . . , X_n be N(0,1) random variables. Then they can be uncorrelated without being independent, or pairwise (or m-wise, m < n) independent without mutual independence. Verify these statements by the following examples with n = 2 and 3.
(i) Let X_1 be N(0,1) and X_2 = X_1χ_I − X_1χ_{I^c}, where I is an interval I = [−a, a] such that P[X_1 ∈ I] = 1/2, and so P[X_1 ∈ I^c] = P[−X_1 ∈ I^c] = 1/2 since −X_1 is also N(0,1). For any open set J ⊂ R, observe that

P[X_2 ∈ J] = P[X_2 ∈ J ∩ I] + P[X_2 ∈ J ∩ I^c] = P[X_1 ∈ I ∩ J] + P[−X_1 ∈ I^c ∩ J]
           = P[X_1 ∈ I ∩ J] + P[X_1 ∈ I^c ∩ J] = P[X_1 ∈ J],

since X_1 and −X_1 are identically distributed. From the arbitrariness of J, conclude that X_2 is also N(0,1) and

E(X_1X_2) = E(X_1²χ_I) − E(X_1²χ_{I^c}) = 0,

so that they are uncorrelated. Verify that

P[X_1 + X_2 = 0] = P[X_1 ∈ I^c] = 1/2,

so that they are not jointly normal or Gaussian and are not independent.
(ii) Let X_1, X_2, X_3 be each N(0,1), but with a joint density

f_{X_1,X_2,X_3}(x_1, x_2, x_3) = ∏_{i=1}^3 g_{x_i}(x_i) · {1 + x̄_1x̄_2x̄_3},

where g_{x_i} is the standard normal density N(0,1) for i = 1, 2, 3, and x̄_i = x_iχ_{[|x_i|<1]}; then f_{X_1,X_2,X_3} ≥ 0 and is a nonfactorizable density, so that X_1, X_2 and X_3 are not mutually independent, but pairwise independent N(0,1). Clearly, a similar example can be given for any subset of m (< n) variables. [Note that if (X_1, . . . , X_n) is jointly normal, then these difficulties disappear.]
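The independence asserted in 16(a) is easy to probe empirically. The following Python sketch (sample size, seed, and n are arbitrary choices, not from the text) estimates the correlation of X̄ and V and checks the χ²_{n−1} moments of V implied by the ch.f. (1 − 2it)^{−(n−1)/2}:

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 5, 200_000
    X = rng.standard_normal((reps, n))
    xbar = X.mean(axis=1)
    V = ((X - xbar[:, None]) ** 2).sum(axis=1)

    print("corr(xbar, V) ~", np.corrcoef(xbar, V)[0, 1])     # ~ 0
    print("E V   =", V.mean(), " (theory: n - 1 =", n - 1, ")")
    print("Var V =", V.var(), " (theory: 2(n - 1) =", 2 * (n - 1), ")")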

17. Let U, V be two r.v.s whose joint ch.f. φ_{U,V}(·, ·) is given by

as in Example 5 of Section 3. Show that P[V > 0] = 1 and that the d.f. of UV^{−1/2} is N(0,1). [This is another consequence of Theorem 3.2. First verify that

Next use Theorem 3.2 to get

18. Let X and Y be independent r.v.s with X as N(0,1) and Y a gamma r.v., so that their densities f_1, f_2 are given by

where λ > 0, α > 0, x ∈ R, y ∈ R⁺. Using Corollary 3.3, show that the distribution of XY^{−1/2} has a density f given by

If α = λ = n/2, this f is called a "Student's density" with n degrees of freedom, and is of importance in statistical testing theory.
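Assuming the gamma density here is the standard shape–rate form f_2(y) = λ^α y^{α−1}e^{−λy}/Γ(α), the case α = λ = n/2 can be checked by simulation. The following Python sketch (parameters arbitrary) compares a histogram of XY^{−1/2} with the Student t_n density:

    import numpy as np
    from math import gamma, pi, sqrt

    rng = np.random.default_rng(1)
    n, reps = 5, 500_000
    X = rng.standard_normal(reps)
    Y = rng.gamma(shape=n / 2, scale=2 / n, size=reps)   # rate n/2 -> scale 2/n
    T = X / np.sqrt(Y)

    def t_pdf(v, n):
        c = gamma((n + 1) / 2) / (sqrt(n * pi) * gamma(n / 2))
        return c * (1 + v * v / n) ** (-(n + 1) / 2)

    hist, edges = np.histogram(T, bins=np.linspace(-5, 5, 41), density=True)
    mid = 0.5 * (edges[:-1] + edges[1:])
    print("max |empirical - t_pdf| =", np.abs(hist - t_pdf(mid, n)).max())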

19. Let {X_n, n = 0, ±1, . . .} be a sequence of complex r.v.s with means zero and variances one. Suppose that f(m, n) = E(X_mX̄_n), and the covariance is of the form f(m, n) = r(m − n); i.e., it depends only on the difference m − n. Such a sequence is called weakly stationary. Show that there is a unique probability function P on the Borel sets of (0, 2π] such that

r(n) = ∫_{(0,2π]} e^{inλ} dP(λ),

i.e., r is the "ch.f." of P. (Consider Herglotz's lemma. See Section 8.5, where the use of such r is discussed further.)

20. Suppose that {X_t, t ∈ R} is a set of complex r.v.s each with mean zero and variance one. Let f(s, t) = E(X_sX̄_t) and be of the form f(s, t) = r(s − t). This is the continuous parameter analog of the above problem. If r(·) is Lebesgue measurable, show that there is a unique probability function P on the Borel sets of R such that

r(t) = ∫_R e^{itλ} dP(λ),  t ∈ R − A,

where A has Lebesgue measure zero. [Consider the Riesz extension of Bochner's theorem. In both problems, finite variances suffice, in which case P(·) is a finite (Borel) measure, but not necessarily a probability.]

21. Let X_n be an r.v. that has a log-normal distribution with mean μ_n and variance σ_n². This means that the density of X_n is defined by (compare with Problem 5b)

Show that Z_n = (X_n − μ_n)/σ_n converges in distribution to N(0,1) if σ_n/μ_n → 0. What happens if σ_n/μ_n ↛ 0? [Since a normal distribution is uniquely determined by its moments, by Problem 6, we can apply Proposition 1.6. (Calculation of the ch.f. of Z_n is clearly difficult.) Therefore compute the moments of Z_n and find their limits. This is not entirely simple. For instance, if η_n = e^{σ_n²} − 1, then

where C_{2k,k} = 1 · 3 · · · (2k − 1). Regarding this simplification of the binomial coefficient identity, see Canadian Math. Bulletin, 14 (1971), 471-472.]

22. Let {X_n, n ≥ 1} be a sequence of r.v.s such that X_n →^D X, where X is not a constant, and a_nX_n + b_n →^D X̃, where a_n > 0, b_n ∈ R, and X̃ is also not a constant r.v. Then show that a_n → a, b_n → b, and X̃ = aX + b a.e. (This assertion, due to A. Khintchine, can be proved quickly by the second method of proof given for Theorem 1.2. The result says that the convergent sequences of the form {a_nX_n + b_n, n ≥ 1} have the same type of limit laws.)

23. Let {φ_n, n ≥ 1} be a sequence of ch.f.s such that φ_n(t) → 1, t ∈ A, where A is a set of positive Lebesgue measure in R. Then verify that φ_n(t) → 1, t ∈ R. [Hints: Since φ̄_n(t) → 1 also, and φ̄_n(t) = φ_n(−t), we get φ_n(t − t') → 1 for t, t' ∈ A (CBS inequality); i.e., it holds on B = A − A (the algebraic difference). But by a classical result of H. Steinhaus, such a set B includes an open interval (−a, a) ⊂ B, a > 0. Thus φ_n(t) → 1 for |t| < a. By the CBS inequality, for all |t| < a we get

1 − Re φ_n(2t) ≤ 4(1 − Re φ_n(t)).

Hence φ_n(t) → 1 for |t| < 2a, and since R ⊂ ∪_{n≥1}(−na, na), the result holds first for |t| < na, n ≥ 1, by induction, and then for all t.
Remark: The above result is true somewhat more generally, namely: If φ_n(t) → φ(t) for all |t| < a, and φ is the ch.f. of a d.f. for which the m.g.f. also exists, then φ_n(t) → φ(t) on all of R. See Problem 34 below for a more analytical statement.]

24. Let X_1, . . . , X_r be independent r.v.s such that X_i is N(b_i, 1). If Y_r = Σ_{i=1}^r X_i², then Y_r is said to have a noncentral chi-square distribution with r and θ = Σ_{i=1}^r b_i² as its parameters, called the degrees of freedom and noncentrality parameter, respectively. Verify that the ch.f. of Y_r is given by φ_r, where

φ_r(t) = (1 − 2it)^{−r/2} exp{itθ(1 − 2it)^{−1}}.

Using the inversion (and calculus of residues), the density h_r of Y_r can be shown to be (verify this!)

and h_r(x) = 0 for x ≤ 0. If μ_r and σ_r² are the mean and variance of Y_r, show then that, as an application of the continuity theorem for ch.f.s,

What happens if σ_r/μ_r ↛ 0? (The ratio σ_r/μ_r is called the coefficient of variation of Y_r, and in statistics it is sometimes used to indicate the spread of probability in relation to the mean. In contrast to Problem 21, ch.f.s can be directly used here, and by Proposition 2.9, Log φ_r is well defined.)
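The ch.f. formula for Y_r can be checked directly by Monte Carlo. A short Python sketch (r, the b_i, and the test points are arbitrary choices) compares the empirical E(e^{itY_r}) with φ_r(t):

    import numpy as np

    rng = np.random.default_rng(2)
    r, reps = 4, 400_000
    b = np.array([0.5, -1.0, 2.0, 0.0])
    theta = (b ** 2).sum()
    Y = ((rng.standard_normal((reps, r)) + b) ** 2).sum(axis=1)

    for t in (0.3, 0.7):
        emp = np.exp(1j * t * Y).mean()
        thy = (1 - 2j * t) ** (-r / 2) * np.exp(1j * t * theta / (1 - 2j * t))
        print(t, abs(emp - thy))            # differences of Monte Carlo size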

25. If φ : R → C is uniformly continuous, then for it to be a ch.f. it is necessary that (i) φ(0) = 1, (ii) |φ(t)| ≤ 1, and (iii) φ(−t) = φ̄(t). But these are not sufficient.
(α) Show that the following are ch.f.s:

(a) φ(t) = exp{−|t|^r}, 0 < r ≤ 2 (see Example 5.3.11(ii) later),
(b) φ(t) = (1 + |t|)^{−1},
(c) φ(t) = 1 − |t| if |t| ≤ 1/2; = 1/(4|t|) if |t| ≥ 1/2.

[Use Pólya's criterion. In (a), if 1 < r < 2, this is not applicable. But r = 2 is obvious, and 1 < r < 2 can be deduced from Example 5.3.11, case (ii), as detailed there.]
(β) On the other hand, show that the following are not ch.f.s, but each satisfies the necessary conditions:

[If ψ(t) = 1 + o(t²) as t → 0, then it is a ch.f. iff ψ(t) ≡ 1. Also, g : R⁺ → R⁺ is convex iff g(x) = g(0) + ∫_0^x h(t) dt, where h(·) is nondecreasing. Use this in part (α). In (αa), if 1 < r < 2, show that φ(t) = lim_n [ψ(t/n^{1/r})]ⁿ, where ψ is the ch.f. of the so-called symmetric Pareto density, p : x ↦ (r/2)|x|^{−(r+1)} if |x| > 1; = 0 if |x| ≤ 1; and so ψ(t) = 1 − c_r|t|^r + O(t²) as t → 0, but Pólya's result does not give us enough information!]
(γ) Show, however, that the following statements hold, and these complement the above two assertions: Let φ be a ch.f. and ψ : R → C be a function such that if {a_n, n ≥ 1} is any sequence with a_n ↗ ∞, then φ(t)ψ(a_nt) defines a ch.f. for each n. Verify that ψ must in fact be a bounded continuous function, and deduce (by the continuity theorem) that ψ must actually be a ch.f.
(δ) If φ is a ch.f., then φφ̄ = |φ|² is a ch.f. But the absolute value (and thus a square root) need not be a ch.f. For instance, if X is an r.v. with P[X = −1] = ½ = P[X = +1], then |φ(t)| = |E(e^{itX})| is not a ch.f. (If it were, then we may invert it and get its density. It may be seen that there is no such density by a simple calculation. For a general case, see Problem 35.)

26. This problem deals with a calculation of probabilities in certain nonparametric statistical limit distributions, and seems to have already been noted by B. V. Gnedenko in the early 1950s. Let X, Y be jointly normally distributed with means zero, unit variances, and correlation ρ, so that their density f is given by

f(x, y) = (1/(2π(1 − ρ²)^{1/2})) exp{ −(x² − 2ρxy + y²)/(2(1 − ρ²)) }.

Show that, by the image law theorem,

P[X > 0, Y > 0] = 1/4 + (1/2π) arc sin ρ,

where α = ρ(1 − ρ²)^{−1/2}. [Since exp{−½(x² + y²)} is symmetric in x and y, the second term can be interpreted as the probability that (X, Y) takes its values in the sector {(x, y) : 0 < x < +∞, 0 < y < αx} if ρ > 0, and replace α by −α if ρ ≤ 0. If θ is the angle made by this sector, then, the probability mass being uniform in each such sector, it must be θ/2π. Since the slope of the line making an angle θ with the x axis is α = tan θ, we get that the second term is (1/2π) arc sin ρ.] From this deduce that

P[XY < 0] = 1 − 2P[X > 0, Y > 0] = (1/π) arc cos ρ.
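A Monte Carlo verification of the last formula is immediate; the following Python sketch (ρ values and sample size arbitrary) compares the empirical frequency of [XY < 0] with (1/π) arc cos ρ:

    import numpy as np

    rng = np.random.default_rng(3)
    reps = 1_000_000
    for rho in (-0.8, 0.0, 0.5):
        Z1 = rng.standard_normal(reps)
        Z2 = rng.standard_normal(reps)
        X = Z1
        Y = rho * Z1 + np.sqrt(1 - rho ** 2) * Z2    # corr(X, Y) = rho
        print(rho, (X * Y < 0).mean(), np.arccos(rho) / np.pi)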


27. (a) Let X_n = (X_{n1}, . . . , X_{nk}) be a k-vector of r.v.s and X = (X_1, . . . , X_k) be a random vector. If a = (a_1, . . . , a_k) ∈ R^k, show that a · X_n = Σ_{i=1}^k a_iX_{ni} → a · X in distribution for each a ∈ R^k as n → ∞ iff the vectors X_n → X in distribution, i.e., iff the d.f.s F_{X_n}(x) → F_X(x) at all continuity points x = (x_1, . . . , x_k) ∈ R^k of F_X. Deduce that if X_n →^D X, then X_{ni} →^D X_i for each i = 1, . . . , k. However, does the componentwise convergence in distribution imply its vector convergence?
(b) Suppose (X, Y) is a random vector on (Ω, Σ, P). If F_X, F_Y are (the marginal) d.f.s of X and Y, define the joint d.f. of the vector (X, Y) as F_α,

Show that for each such α, F_α is a d.f. with the same marginals F_X, F_Y. [It should be verified also that the increment ΔF_α(x, y) ≥ 0 for all (x, y) ∈ R².] Thus the marginals generally do not determine a unique joint d.f.

28. Let {X_n, n ≥ 1} be a sequence of i.i.d. nonnegative r.v.s and S_n = Σ_{k=1}^n X_k, S_0 = 0. Then X_n may be regarded as the lifetime of, for example, the nth bulb, or of the nth machine before breakdown, so that S_n denotes the time until the next replacement or renewal. Let N(t) be the number of renewals in the interval [0, t]; equivalently, N(t) is the largest n such that S_n ≤ t. Hence N(t) = 0 if S_1 > t, and N(t) = n if S_n ≤ t < S_{n+1}. Show that N(t) is an r.v., and if F is the d.f. of X_1, then

P[N(t) = n] = F^{(n)}(t) − F^{(n+1)}(t),

where F^{(1)} = F, F^{(2)} = F ∗ F (the convolution), and F^{(n)} = F ∗ F^{(n−1)}. If F(x) = 1 − e^{−λx}, λ > 0, x ≥ 0 (= 0 otherwise), show that

P[N(t) = n] = e^{−λt}(λt)ⁿ/n!,  n = 0, 1, 2, . . . ,

so that N(t) has a Poisson distribution with mean λt. [The N(t) process will be analyzed further in Section 8.4.]
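The exponential case can be checked by direct simulation of the renewal count; a Python sketch (λ, t, and the replication count are arbitrary choices) follows:

    import numpy as np
    from math import exp, factorial

    rng = np.random.default_rng(4)
    lam, t, reps = 2.0, 3.0, 200_000
    # Generate enough lifetimes so that S_n surely exceeds t.
    X = rng.exponential(scale=1 / lam, size=(reps, 40))
    S = X.cumsum(axis=1)
    N = (S <= t).sum(axis=1)          # largest n with S_n <= t

    for n in range(6):
        emp = (N == n).mean()
        thy = exp(-lam * t) * (lam * t) ** n / factorial(n)
        print(n, round(emp, 4), round(thy, 4))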



29. Complete the details of proof of Proposition 4.4.

30. Let X_1, X_2, . . . be i.i.d. random variables each with mean zero and a finite variance 0 < σ² < ∞. Let S_n = Σ_{k=1}^n X_k. Suppose that the common ch.f. φ of the X_n is integrable on R. If a ∈ R and δ > 0, show that, by use of the Lévy inversion formula,

lim_{n→∞} √n P[a − δ < S_n < a + δ] = 2δ/(σ√(2π)).

[Hints: If F_n is the d.f. of S_n and a ± δ are continuity points of F_n, then by Theorem 2.1,

F_n(a + δ) − F_n(a − δ) = lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−it(a−δ)} − e^{−it(a+δ)})/(it)) [φ(t)]ⁿ dt.   (+)

Next, φ(t) = 1 − (σ²t²/2) + o(t²), and so

Since |φ(t)|ⁿ ≤ |φ(t)|, which is integrable, replace t by t/(σ√n) in the above formula (+), and take limits with appropriate justification. Finally, note that a ± δ can be arbitrary continuity points, as in the original statement.]

31. Let {X_n, n ≥ 1} be i.i.d. with E(X_1) = 0, E(X_1²) = σ² < ∞, and S_n = Σ_{k=1}^n X_k. Then Σ_{n≥1} P[|S_n| > nε] < ∞ for each ε > 0, and conversely, if this series converges, we can conclude that the moment conditions hold. [Sketch: The converse part was outlined in Problem 19 of Chapter 2. The direct part is also involved, and is due to Hsu and Robbins (1947). If φ is the common ch.f. of the X_n, then |1 − φ(t)| ≤ a_1t², |φ'(t)| ≤ a_2|t|, and |φ''(t)| ≤ a_3 for t near 0. Let Z be an r.v. independent of the X_n, having two moments, and whose ch.f. ψ vanishes outside (−4, 4). [For example, f_Z(x) = 3(2π)^{−1}x^{−4} sin⁴x, and we use this density.] Let |t| ≤ 4δ, δ > 0, and note that

and

Σ_{n≥1} P[|S_n| > 2n] ≤ Σ_{n≥1} (P[|S_n + Zδ^{−1}| > n] + P[|Z| > nδ]),

and it suffices to verify that

(*)

is bounded in N. The ch.f. of S_n + Zδ^{−1} vanishes outside of (−4δ, 4δ). Thus, by the inversion formula,

Hence, on subtraction, we get

Use the above inequalities for φ(t), and choose δ > 0 so that for some constants B, C,

and simplify this to show

∫ (sin(t/2) cos((N + 1/2)t)/t³) dt + O(1).

By a careful estimation, show that these integrals are bounded. There is some delicate estimation here, which proves that (*) is bounded, and thus the result follows.]

32. Different forms of Theorem 2.1 can be used to calculate probabilities at the discontinuity points of the d.f. of an r.v. (a) If X is an r.v. whose d.f. has a jump at a, and if φ is the ch.f. of X, then, using the method of proof of the just noted theorem, show that

P[X = a] = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−ita} φ(t) dt.

(b) If {a_n, n ≥ 1} is the set of all discontinuity points of the d.f. of X, then show that we have an extension of the above result as

Σ_n (P[X = a_n])² = lim_{T→∞} (1/2T) ∫_{−T}^{T} |φ(t)|² dt.

[Hints: Let Y be another r.v. which is independent of X and which has the same d.f. F as X. Then |φ(t)|² = E(e^{it(X−Y)}). But the d.f. G of X − Y is the convolution of that of X and −Y, given by

G(x) = ∫_R F(x − y) dF̃(y),

where F̃(y) = P[−Y < y]. By (a), the left side (above) gives the discontinuity of G at x = 0, and for this value show that the integral for G is Σ_n (P[X = a_n])², since a_n is a discontinuity of F iff −a_n is such for F̃.]
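The time-average in (a) can be inspected numerically. A Python sketch for the two-point r.v. P[X = ±1] = 1/2, whose ch.f. is cos t (the truncation level T and grid are arbitrary choices):

    import numpy as np

    T = 2000.0
    t = np.linspace(-T, T, 800_001)
    dt = t[1] - t[0]
    phi = np.cos(t)                     # ch.f. of P[X = 1] = P[X = -1] = 1/2
    for a in (1.0, 0.5):                # a = 1 is an atom, a = 0.5 is not
        avg = (np.exp(-1j * t * a) * phi).sum() * dt / (2 * T)
        print(a, round(avg.real, 4))    # ~0.5 for the atom, ~0 otherwise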

33. This exercise extends Proposition 2.6. Let X_1 and X_2 be independent r.v.s on (Ω, Σ, P) and X = X_1 + X_2. If X has 2n moments, then the same is true of X_1 and X_2. Moreover, there exist real numbers K_j and a_j such that

E(X_j^{2k}) ≤ K_j E(a_j + X)^{2k},  j = 1, 2 and 0 ≤ k ≤ n.

[Hints: Let φ_j and φ be the ch.f.s of X_j and X, so that φ = φ_1φ_2. Replacing X_2 by β + X_2 if necessary, we may assume that X_2 takes values in both R⁺ and R⁻ with positive probability. Since E(X^{2n}) < ∞, by Proposition 2.6, φ^{(2n)}(0) exists; expressing it as the symmetric derivative, we get

Hence for 0 ≤ k ≤ 2n,

and for 0 < x, y < ∞, since 0 ≤ x^{2n} ≤ (x + y)^{2n}, we get from this

Interchanging F_{X_1} and F_{X_2} here, the second inequality holds since

|E(e^{it(X_1+X_2)})| = |E(e^{itX})|.

This is a form of a result of A. Devinatz, and it extends a classical result due to P. Lévy and, independently, to D. A. Raikov.]

34. If φ_1, φ_2 are ch.f.s on R which agree on (−a, a), a > 0, and if one of them is regular, then they agree everywhere. (Compare with Exercise 9. This is a form of a classical result due to J. Marcinkiewicz, and the proof depends on complex function theory. Compare with Proposition 2.8(ii).)

35. An r.v. X is of lattice type if its range is of the form {a + kβ : k = 0, ±1, ±2, . . .}, β > 0 and a real. Show that |φ_X| is periodic of period 2π/β. If β = 2π, then

In particular, if X is a symmetric lattice r.v., deduce that |φ_X(·)| cannot be a ch.f. This extends Exercise 25(δ). (For a stronger negative statement, with φ never vanishing, see Exercise 9 of Chapter 5.)
Chapter 5

Weak Limit Laws

The strong (or pointwise a.e.) limit theory of Chapter 2 naturally leads to the distributional convergence of random sequences. Such a shift in viewpoint enabled an enormous growth of probability theory. This chapter contains a general outline of this picture. It starts with the classical central limit theorems of Lévy and Liapounov and contains their modern versions as well as an error estimate of Berry and Esseen. Some aspects of infinite divisibility together with the Lévy-Khintchine representation and stability are treated. The invariance principles of Donsker and Prokhorov are discussed, and two important applications are included. Further, Kolmogorov's law of the iterated logarithm and related results are given. Applications and extensions to m-dependent sequences establish the generality and limitations of invariance principles. The tools developed in Chapter 4 are essential here. The material in this chapter represents a central aspect of analytical probability theory, some of which will be essential for Part III of this book dealing with some important applications. In fact, much of additive process analysis in Chapter 8 depends upon the work of this chapter.

5.1 Classical Central Limit Theorems

In all the results of Chapter 2, we demanded pointwise convergence of the sequences of various r.v.s (either partial sums or averages). The conclusions are the strongest possible in this setup, and all considerations relate to the given underlying probability space. However, if we lower our demands and settle for somewhat weaker conclusions, such as convergence in probability or in distribution (hence the appellation "weak limit laws"), then many new results can be proved. In this, the intermediate versions with "in probability" are not very illuminating. Really new areas are opened for investigation when we go to the convergence theory on the image spaces, i.e., to the distributional convergence. In this case, we can employ the new tools developed in the preceding chapter, thereby bringing in the well-known machinery of classical Fourier analysis. Moreover, as Theorem 4.6.1 shows, in some important cases the weak and strong convergence statements coincide. With these as motivation, we shall concentrate henceforth on distributional convergence, inequalities, and the like. Note that by the Kolmogorov-Bochner theorem (see 3.4.10), with a consistent family of distribution functions, we can manufacture a probability space and a set of r.v.s on it having the given d.f.s as its finite-dimensional distributions. Thus with a probability space and r.v.s on it, we can go to the image space with d.f.s satisfying the compatibility conditions, and given the latter family, we can invent a probability space that is, in a well-defined sense, measure-theoretically indistinguishable from the original space. This type of change of spaces to suit our needs is a distinguishing feature of probability theory. In this vein, we prove some key results that play a central role in theory and applications; these results are called central limit theorems, especially when the limit distribution is normal or Gaussian (or, more generally, one that is "infinitely divisible," a term to be defined later).

Let us start with a very simple result, the weak law of large numbers, due to A. Khintchine, which was given as Theorem 2.3.2. But we can slightly improve it here. [We use modern methods and simplifications afforded by ch.f.s.]

Proposition 1 Let {X_n, n ≥ 1} be independent r.v.s with the same distribution. If their common ch.f. has merely a derivative at the origin of R, with value iμ, then (1/n) Σ_{k=1}^n X_k → μ in probability.

Remark It was noted in the last chapter (Problem 10) that a ch.f. can have a derivative at the origin without the d.f. having a finite mean. Thus the hypothesis here is weaker than that of Theorem 2.3.2. That proof does not apply. (However, the proof there illustrates the truncation technique, which is useful for other results.)

Proof If φ(t) = E(e^{itX_k}), then by hypothesis φ(t) = 1 + iμt + o(t) (as t → 0), since φ has a derivative = iμ at 0. If S_n = Σ_{k=1}^n X_k, then

φ_{S_n/n}(t) = E(e^{itS_n/n}) = [φ(t/n)]ⁿ  (by independence)
             = [1 + iμt/n + o(t/n)]ⁿ → e^{iμt}  as n → ∞.

By the continuity theorem, this implies (S_n/n) → μ in distribution, hence in probability, by Proposition 2.2.2. This completes the proof.

The following result was established by A. de Moivre in about 1730 if the r.v.s are Bernoulli, taking values 1 and 0, each with probability ½, and it was extended by P. S. Laplace nearly a century later if the probabilities are p, 1 − p (0 < p < 1); this case is usually called the Laplace-De Moivre central limit theorem. The result was generalized by J. W. Lindeberg in the early 1920s, and to the present form by P. Lévy a little later.

Theorem 2 Let {X_n, n ≥ 1} be i.i.d. random variables each with mean μ and variance σ² > 0. If S_n = Σ_{k=1}^n X_k, then

lim_{n→∞} P[(S_n − nμ)/(σ√n) < x] = (2π)^{−1/2} ∫_{−∞}^x e^{−t²/2} dt,  x ∈ R.

Proof Let ψ_n be the ch.f. defined by

ψ_n(t) = E(exp{it(S_n − nμ)/(σ√n)})
       = ∏_{i=1}^n E(exp{(it/(σ√n))(X_i − μ)})  (by independence)
       = [φ(t/(σ√n))]ⁿ  [φ(·) being the common ch.f. of X_i − μ]
       = [1 + 0 + (it)²σ²/(2nσ²) + o(t²/n)]ⁿ  (by Corollary 4.2.7)
       → e^{−t²/2}  as n → ∞.

Since t ↦ e^{−t²/2} is the ch.f. of the standard normal N(0,1) d.f., the result follows by the continuity theorem for ch.f.s, completing the proof.
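The theorem is easily visualized by simulation. The following Python sketch (distribution, n, and sample size are arbitrary choices) compares the empirical d.f. of the normalized sums of exponential(1) r.v.s, for which μ = σ = 1, with the standard normal d.f.:

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(5)
    n, reps = 100, 50_000
    S = rng.exponential(size=(reps, n)).sum(axis=1)
    Z = (S - n) / np.sqrt(n)             # (S_n - n*mu)/(sigma*sqrt(n))

    Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
    for x in (-1.5, 0.0, 1.5):
        print(x, (Z < x).mean(), Phi(x))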

A natural question is to consider the case where the X_n are not identically distributed; if this is solved, one tries to extend the results to sequences which form certain dependence classes. We consider the first problem here in some detail and aspects of the second (dependent) case later. It is noted that the above theorem fails if the i.i.d. hypothesis is omitted. (See Example 4 below.) The early treatment of a nonidentically distributed case was due to A. Liapounov in 1901, and his results were generalized in the works of J. W. Lindeberg and W. Feller. This circle of results attained its most precise and definitive treatment in the theory of infinitely divisible distributions. First we present Liapounov's theorem, since his method of proof has not only admitted a solution of the central limit problem, as indicated above, but contained a calculation for the speed of convergence in the limit theory as well. Here is the convergence result.

Theorem 3 (Liapounov) Let {X_n, n ≥ 1} be a sequence of independent r.v.s with means {μ_n, n ≥ 1}, variances {σ_n², n ≥ 1}, and finite third absolute central moments ρ_k³ = E(|X_k − μ_k|³). If S_n = Σ_{k=1}^n X_k, and we write ρ³(S_n) = Σ_{k=1}^n ρ_k³ (not the third absolute moment of S_n), σ²(S_n) = Σ_{k=1}^n σ_k², then

lim_{n→∞} P[(S_n − E(S_n))/σ(S_n) < x] = (2π)^{−1/2} ∫_{−∞}^x e^{−t²/2} dt,  x ∈ R,   (3)

whenever lim_{n→∞} [ρ(S_n)/σ(S_n)] = 0.

Proof Let φ_k(t) = E(exp{it(X_k − μ_k)}) and

ψ_n(t) = E(exp{it(S_n − E(S_n))/σ(S_n)}),

so that

ψ_n(t) = ∏_{k=1}^n φ_k(t/σ(S_n))

by independence. By Corollary 4.2.7 [see Eq. (19') there] we have

φ_k(t/σ(S_n)) = 1 − σ_k²t²/(2σ²(S_n)) + θ_k ρ_k³|t|³/(6σ³(S_n)) = 1 + y_k (say),  |θ_k| ≤ 1.   (4)

Hence

|y_k| ≤ ρ_k²t²/(2σ²(S_n)) + ρ_k³|t|³/(6σ³(S_n)),

since σ_k ≤ ρ_k by Liapounov's inequality (cf. Corollary 1.3.5 with r = 2, s = 3, and p = 0 there). But ρ_k/σ(S_n) ≤ ρ(S_n)/σ(S_n), which tends to zero as n → ∞. Thus there exists an n_0 such that n ≥ n_0 implies ρ_k/σ(S_n) < 1 and, for fixed but arbitrary t, |y_k| ≤ ½, since y_k → 0 as n → ∞ for each k. Now

Consequently, for some |θ_3| ≤ 1 [using (4)],

where θ_4 = θ_{4,k} and |θ_4| ≤ 1. Hence we can take the logarithms of ψ_n and φ_k (cf. Proposition 4.2.9) to get

(5)

since [ρ(S_n)/σ(S_n)] → 0 by hypothesis. Here we have set

so that for some θ_5 (= θ_{5,n}), |θ_5| ≤ 1,

Hence lim_{n→∞} ψ_n(t) = exp(−t²/2), and the proof is completed by the continuity theorem.

Remark In contrast to Theorem 2, in the nonidentically distributed case the conclusion of the above theorem does not hold if an additional condition on the relative growth of the moments higher than 2 of the r.v.s is not assumed.

Let us amplify the significance of this remark by the following example:

Example 4 Let {X_n, n ≥ 1} be independent r.v.s such that

P[X_k = √k] = 1/(2k) = P[X_k = −√k]

and

P[X_k = 0] = 1 − (1/k).

Then E(X_k) = 0, σ²(X_k) = 1, so that σ²(S_n) = n, where S_n = Σ_{k=1}^n X_k, and ρ³(S_n) = Σ_{k=1}^n k^{1/2}. Thus ρ³(S_n) is asymptotically of the order n^{3/2} (by Euler's formula on such expressions). Hence ρ³(S_n)/σ³(S_n) ↛ 0. On the other hand,

Therefore

where we used the Riemann approximation to the integral. Thus

lim_{n→∞} ψ_n(t) = exp{f(t)} ≠ exp(−t²/2).

Hence the limit distribution of S_n/√n exists, but it is not normal, even though all the X_k have the same means and the same variances; they are not identically distributed, and the hypothesis of Theorem 3 is not satisfied.

For the validity of Liapounov's theorem it suffices to have 2 + δ, δ > 0, moments for the r.v.s. Then define ρ_k^{2+δ} = E(|X_k − μ_k|^{2+δ}), ρ^{2+δ}(S_n) = Σ_{k=1}^n ρ_k^{2+δ}. The sufficient condition of the theorem becomes

lim_{n→∞} [ρ^{2+δ}(S_n)/σ^{2+δ}(S_n)] = 0.

The demonstration is a slight modification of that given above, and we shall not consider it here (but will indicate the result in Problem 2). See the computation following Theorem 3.6 below for another method.

In the proof of the preceding theorem [cf. especially Equation (5)], there is more information than that utilized for the conclusion. The argument gives a crude upper bound for the error at the nth stage. This is clearly useful in applications, since one needs to know how large n should be in employing the limit theory. We now present one such result, due independently to A. C. Berry in 1941 and C.-G. Esseen in 1945, which is a generalization of the original work of Liapounov as well as an improvement of that of Cramér's (cf. the latter's monograph (1970)).

Theorem 5 Let {X_n, n ≥ 1} be independent r.v.s as in Theorem 3, i.e., E(X_n) = μ_n, Var X_n = σ_n² (> 0), and ρ_n³ = E(|X_n − μ_n|³), with ρ³(S_n) = Σ_{k=1}^n ρ_k³, where S_n = Σ_{k=1}^n X_k. Then there exists an absolute constant 0 < C_0 < ∞ such that

sup_{x∈R} | P[(S_n − E(S_n))/σ(S_n) < x] − Φ(x) | ≤ C_0 [ρ³(S_n)/σ³(S_n)],   (6)

where Φ is the standard normal d.f.

If the X_n are i.i.d., and the rest of the hypothesis is satisfied, let Var X_1 = σ², ρ³ = E(|X_1 − μ_1|³), so that ρ³(S_n)/σ³(S_n) = (ρ/σ)³(1/√n). Under these conditions we deduce from (6) the following:

Corollary 6 Let {X_n, n ≥ 1} be i.i.d. with three moments finite. Then there is an absolute constant 0 < C_1 < ∞ such that

sup_{x∈R} | P[(S_n − nμ)/(σ√n) < x] − Φ(x) | ≤ C_1 (ρ/σ)³ (1/√n).   (7)

Remark In this case A. C. Berry also indicated a numerical value of C_1. Carrying out the details carefully, V. M. Zolotarev in 1966 showed that C_1 ≤ 1.32132 . . . . The best possible value of C_1 is not known. For (6), H. Bergström in 1949 indicated that C_0 ≤ 4.8, which perhaps can be improved.
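The bound (7) can be examined numerically without any simulation, since the exact d.f. of a Bernoulli sum is computable. A Python sketch (p, n arbitrary; C_1 = 1.32132 is the Zolotarev value quoted above):

    import numpy as np
    from math import erf, sqrt

    p, n = 0.3, 200
    sig = sqrt(p * (1 - p))
    rho3 = p * (1 - p) * ((1 - p) ** 2 + p ** 2)      # E|X_1 - p|^3
    # Binomial pmf via a stable ratio recursion.
    pmf = np.empty(n + 1)
    pmf[0] = (1 - p) ** n
    for k in range(1, n + 1):
        pmf[k] = pmf[k - 1] * (n - k + 1) / k * p / (1 - p)
    x = (np.arange(n + 1) - n * p) / (sig * sqrt(n))  # normalized jump points
    Fn = pmf.cumsum()
    Phi = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in x])
    # Check the gap at each atom from both sides of the jump.
    gap = np.maximum(np.abs(Fn - Phi), np.abs(Fn - pmf - Phi)).max()
    print("sup |F_n - Phi| ~", gap)
    print("bound C1 (rho/sigma)^3 / sqrt(n):", 1.32132 * rho3 / sig ** 3 / sqrt(n))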

The proof of Theorem 5 is involved, as it depends on many estimates. The basic idea, however, is the same as that of Liapounov's theorem. Thus we present these estimates separately as two key lemmas, and then complete the proof of the theorem thereafter. If F_n is the d.f. of [S_n − E(S_n)]/σ(S_n), and Φ is the standard normal d.f., so that F_n → Φ under our assumptions, we should get uniform (in x) bounds on |F_n(x) − Φ(x)| for each n. Thus if F and G are two d.f.s, G is continuous with a bounded density G', let H = F − G. (We later set G = Φ and F = F_n.) For nontriviality let H ≢ 0. Since H is a function of bounded variation, if we add some integrability conditions (which will hold in our context), then Theorem 4.3.1 suggests the method to be employed here. In fact, that theorem was devised primarily for the (present) error estimation needs by Cramér, and the analysis with it can be carried forward much further. (One can give an asymptotic expansion: F = G + H_1 + H_2 + · · · , where the H_n are certain functions of the Čebyšev-Hermite polynomials, but we do not consider that aspect here.)

Lemma 7 Let H = F − G and M = sup_{x∈R} |G'(x)|. If H is Lebesgue integrable on R, and h is its Fourier-Stieltjes transform, then for every T > 0

sup_{x∈R} |H(x)| ≤ (1/π) ∫_{−T}^{T} |h(t)/t| dt + 24M/(πT).   (8)
Proof Since the result is clearly true if the integral is infinite on the right side of (8), let it be finite. Then by definition

h(t) = ∫_R e^{itx} dH(x) = −it ∫_R e^{itx} H(x) dx,

since H(±∞) = 0, and integration by parts is used. Hence for any fixed but arbitrary a_0 ∈ R (t ≠ 0),

−h(t)e^{−ita_0}/(it) = ∫_R e^{itx} H(x + a_0) dx.   (9)

Since [−h(t)/it] is the Fourier transform of H(·), we would like to convert it into a convolution by multiplying with a suitable ch.f. with compact support. This is the key trick in the proof. So consider the "triangular" ch.f. φ_T(t) = [1 − |t|/T] for |t| ≤ T, and = 0 for |t| > T. This is the ch.f. of the symmetric probability density (1 − cos Tx)/(πTx²), and we have already used it in proving Pólya's theorem on ch.f.s.

Thus multiplying both sides of (9) by φ_T and integrating, we get

∫_{−T}^{T} (−h(t)/(it)) e^{−ita_0} φ_T(t) dt
  = ∫_R H(x + a_0) [∫_{−T}^{T} e^{itx} φ_T(t) dt] dx   (by Fubini's theorem)
  = ∫_R H(x + a_0) (2(1 − cos Tx)/(Tx²)) dx   (by the inversion Theorem 4.2.1 applied to φ_T)
  = 2 ∫_R (sin² v/v²) H_{a_0}(2v/T) dv,   (10)

where H_{a_0}(x) = H(x + a_0). Hence (10) gives

| 2 ∫_R (sin² v/v²) H_{a_0}(2v/T) dv | ≤ ∫_{−T}^{T} |h(t)/t| dt.   (11)

It is now necessary to find a lower estimate of the left side of the integral in (11), after an analysis of the behavior of H. This involves some computation, and we now set down the details.

If α = sup_{x∈R} |H(x)|, then ±α is attained at some a ∈ R, i.e., H(a) = ±α (by left continuity), or H(a + 0) = α. Indeed, since 0 < α ≤ 2, there exists a sequence {x_n, n ≥ 1} ⊂ R ⊂ R̄ with a convergent subsequence x_{n_i} → a ∈ R̄. But H(x) → 0 as x → ±∞, and α > 0, so that a ∈ R, i.e., a must be finite. Thus there is a subsequence {x_{n_j}, j ≥ 1} ⊂ {x_{n_i}, i ≥ 1} such that x_{n_j} → a and H(x_{n_j}) → α or H(x_{n_j}) → −α. Consider the first case: H(x_{n_j}) → α. Now {x_{n_j}, j ≥ 1} must have a further subsequence which converges to a from the left or from the right. In the former case, by left continuity, H(a) = +α. In the latter case H(a + 0) = α. Also, by the Fatou inequality and the continuity of G,

Thus H(a) = +α holds. The case that H(a) = −α is similar.
Let β = α/(2M), b = a + β, and consider H(x + b) for |x| < β. Then b + x > a and, since |G'(x)| ≤ M by hypothesis,

H_b(x) = F(x + b) − G(x + a + β)
       ≥ F(a) − G(a + x + β)
       = F(a) − [G(a) + (x + β)G'(ξ)]  (say) (by the Taylor expansion of G about a)
       = H(a) − (x + β)G'(ξ) ≥ α − (x + β)M
       = 2Mβ − (x + β)M = M(β − x).   (12)

If H(a) = −α, then this bound would be −M(β + x), with the inequality reversed. Now consider, for (10),

∫_{|x|<β} M(β − x)(1 − cos Tx)/x² dx
  = Mβ ∫_{|x|<β} (1 − cos Tx)/x² dx   [by (12)]
  = Mβ [ ∫_R (1 − cos Tx)/x² dx − ∫_{|x|≥β} (1 − cos Tx)/x² dx ]
  = Mβ [ πT − ∫_{|x|≥β} (1 − cos Tx)/x² dx ]   (13)

(since the second integral, that of the odd term Mx(1 − cos Tx)/x², vanishes). On the other hand,

∫_{|x|≥β} (1 − cos Tx)/x² dx = 2T ∫_{βT/2}^∞ (sin² v/v²) dv.   (14)

But then the left side of (11) simplifies, using (13) and (14) (set a_0 = b there), as follows:

2 ∫_R (sin² v/v²) H_b(2v/T) dv
  = (2/T) [ ∫_{|x|<β} H_b(x)(1 − cos Tx)/x² dx + ∫_{|x|≥β} H_b(x)(1 − cos Tx)/x² dx ]
  ≥ (2/T) [ Mβ(πT − 2T ∫_{βT/2}^∞ (sin² v/v²) dv) − 2αT ∫_{βT/2}^∞ (sin² v/v²) dv ]
      (since H_b can be negative only outside (−β, β), where |H_b| ≤ α)
  = 2α ( π/2 − 3 ∫_{βT/2}^∞ (sin² v/v²) dv )   [by (13) and (14)]
  ≥ 2α ( π/2 − 6/(βT) )   (since sin² v/v² ≤ 1/v²).

Putting this in (11) and transposing the terms, we get (8). In case H(a) = −α, with the reverse inequality of (12), we get the same result after an analogous computation. This completes the proof of the lemma.

Let us specialize this lemma to the case where G = Φ and F is the d.f. F_n of our normalized partial sum (S_n − E(S_n))/σ(S_n). To use the result of (8), one needs to find an upper bound for the right-side integral involving h(·), which is the difference of two ch.f.s. This is obtained in the next lemma.

Lemma 8 Let {X_n, n ≥ 1} be independent r.v.s with three moments: E(X_n) = μ_n, Var X_n = σ_n², and ρ_n³ = E(|X_n − μ_n|³). If S_n = Σ_{k=1}^n X_k, ρ³(S_n) = Σ_{k=1}^n ρ_k³, and ψ_n : t ↦ E(exp{it(S_n − E(S_n))/σ(S_n)}), then there exists an absolute constant 0 < C_2 < ∞ such that (C_2 = 16 is permissible below)

|ψ_n(t) − e^{−t²/2}| ≤ C_2 (ρ³(S_n)/σ³(S_n)) |t|³ e^{−t²/3},  |t| ≤ σ³(S_n)/(2ρ³(S_n)).   (15)

Proof It may be assumed that the means μ_n are zero. Let φ_k(t) = E(e^{itX_k}). Then, by Corollary 4.2.7, we can again use the computations of Liapounov's proof, i.e., of (4) and (5), if, with the notations there, we write

where

y_k = −σ_k²t²/(2σ²(S_n)) + θ_k ρ_k³|t|³/(6σ³(S_n)),

and if |t| ≤ σ(S_n)/(2ρ(S_n)), we get |y_k| ≤ (1/8) + (1/48) < 1/2. Thus for t in this range, with the |θ_i| ≤ 1, one gets, on using σ_k ≤ ρ_k,

Log φ_k(t/σ(S_n)) = −σ_k²t²/(2σ²(S_n)) + θ_4 ρ_k³|t|³/(6σ³(S_n)) + · · · .

Hence, summing on k,

Log ψ_n(t) = −t²/2 + θ_5 c ρ³(S_n)|t|³/σ³(S_n),  |θ_5| ≤ 1,   (16)

for a fixed numerical constant c. Let us now simplify the left side of (15) with the estimate (16). Here we use the trivial inequality |e^z − 1| ≤ |z|e^{|z|}. Thus

(17)

Next we extend the range of t as given in (15). For this we symmetrize the d.f. Thus let X_k' be an r.v. independent of X_k but with the same d.f. Then X_k − X_k' has |φ_k|² for its ch.f., and since (by convexity of |x|³)

E(|X_k − X_k'|³) ≤ 8ρ_k³,  Var(X_k − X_k') = 2σ_k²

(with zero means), we have, on writing |φ_k|² in place of φ_k in (4) and using the above estimates for the variance and third absolute moment there (this is simpler than squaring the value of |φ_k|),

|φ_k(t/σ(S_n))|² ≤ exp{ −σ_k²t²/σ²(S_n) + 4ρ_k³|t|³/(3σ³(S_n)) }

(since 1 + x ≤ eˣ). Multiplying over k = 1, . . . , n, we get

|ψ_n(t)|² ≤ exp{ −t² + 4|t|³ρ³(S_n)/(3σ³(S_n)) }
         ≤ exp{ −t² + 2t²/3 } = e^{−t²/3}
  [since |t| ≤ σ³(S_n)/(2ρ³(S_n)) in (15)].   (18)

Now to extend (17), if |t| > σ(S_n)/(2ρ(S_n)) [but satisfying the range condition of (15)],

(19)

Thus (17) and (19) together establish (15), with C_2 = 16, and the lemma follows.

With these estimates we are now ready to complete our proof.

Proof of Theorem 5 We set G(x) = Φ(x), so that G'(x) = (2π)^{−1/2}e^{−x²/2} and M = (2π)^{−1/2} in Lemma 7. Also, if

F_n(x) = P[(S_n − E(S_n))/σ(S_n) < x],

then by Čebyšev's inequality F_n(x) ≤ 1/x² if x < 0, and ≥ 1 − (1/x²) if x > 0. The same is true of Φ(x) = P[X < x]. Hence H = F_n − Φ is Lebesgue integrable, first on [|x| > ε] and, being bounded, also on [−ε, ε] for ε > 0, thus on R itself. Let T = [σ(S_n)/ρ(S_n)]³ in Lemma 8. Since the hypothesis of Lemma 7 is also satisfied, we get

where C_0 is the above constant. This is (6), and the proof is complete.

Remark It should be noted that the various "standard" tricks used in the above estimates have their clear origins in Liapounov's proof of this theorem, and thus the important problem of error estimation is considered there for the first time. Taking the X_k as Bernoulli r.v.s in the Berry-Esseen theorem, one may note that the order of (ρ³/σ³)(S_n) cannot be smaller than what we have obtained in (6). Under the given hypothesis, it is "the best" order of magnitude, though C_0 can (with care) be improved. Also, if the limit distribution is different (but continuous with a bounded density, such as the gamma), Lemma 7 can still be used. For an up-to-date treatment of this subject, see R. N. Bhattacharya and R. R. Rao (1976). (For another account, including the multidimensional problem, see Sazonov (1981), Springer Lecture Notes in Math. No. 879.)
In the central limit theorems of Lévy and Liapounov, we considered the independent sequences {X_k/σ(S_n), 1 ≤ k ≤ n}, n = 1, 2, . . . , and their partial sums. Abstracting this, one may consider the double sequences {X_{nk}, 1 ≤ k ≤ n} of independent r.v.s in each row. These results can appear as interesting problems in their own right; S. D. Poisson already considered such a question in 1832. We can establish it with our present tools quite easily, but as seen in the next section, this turns out to be an important new step in the development of our subject. Its striking applications will appear later in Chapter 8.

Theorem 9 (Poisson) Let {X_{nk}, 1 ≤ k ≤ n, n ≥ 1} be a sequence of finite sequences of Bernoulli r.v.s which are i.i.d. in each row:

P[X_{nk} = 1] = p_n,  P[X_{nk} = 0] = 1 − p_n,  with np_n → λ > 0 as n → ∞.

If S_n = Σ_{k=1}^n X_{nk}, then S_n →^D S, where S has the Poisson distribution with parameter λ.

Proof Let φ_{nk}(t) = E(e^{itX_{nk}}), ψ_n(t) = E(e^{itS_n}). Then

ψ_n(t) = ∏_{k=1}^n φ_{nk}(t)  (by independence in each row)
       = [1 + p_n(e^{it} − 1)]ⁿ → exp{λ(e^{it} − 1)}  as n → ∞.

Since the limit is a Poisson ch.f., the result follows from the continuity theorem.
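The speed of this approximation is easily seen numerically. A Python sketch computing the total variation distance between the binomial(n, λ/n) and Poisson(λ) laws (λ and the n values are arbitrary choices):

    import numpy as np
    from math import exp

    lam = 2.0
    for n in (10, 100, 1000):
        p = lam / n
        pmf = np.empty(n + 1)
        pmf[0] = (1 - p) ** n
        for k in range(1, n + 1):          # binomial pmf by ratio recursion
            pmf[k] = pmf[k - 1] * (n - k + 1) / k * p / (1 - p)
        q, tv = exp(-lam), abs(pmf[0] - exp(-lam))
        for k in range(1, n + 1):          # Poisson pmf by recursion
            q *= lam / k
            tv += abs(pmf[k] - q)
        print(n, "total variation ~", 0.5 * tv)   # shrinks roughly like 1/n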

The interest in this proposition stems from the fact that the limit d.f. is not Gaussian. Unfortunately, the significance of the result was not recognized until the late 1920s, and the theorem remained a curio for all those years. Its status, on a par with the Gaussian d.f., was realized only after the formulation and study of infinitely divisible distributions as a culmination of the central limit problem. We take it up in the next section, presenting several key results in that theory, and then deduce certain important theorems, which were originally proved by different methods. For instance, we obtain, among others, the Lindeberg-Feller theorem, which generalizes both Theorems 2 and 3, later in Section 5.3. Its key part is considered further in the continuous parameter case in Section 8.4. As is clear from the above work, the use of ch.f.s in the analysis is crucial. The basic probability space will be in the background. For that reason, this aspect of the subject is often referred to as the analytical theory of probability.

5.2 Infinite Divisibility and the Lévy-Khintchine Formula

One of the important points raised by the Poisson limit theorem is that the asymptotic d.f.s for partial sums of a sequence of sequences of r.v.s independent in each row can be different from the Gaussian family. This is a generalized version of the classical central limit problem, and the end result here is different: it is not normal. The only thing common to both these d.f.s is that their ch.f.s are exponentials. An abstraction of these facts leads to the introduction of one of the most important classes, as follows:

Definition 1 A d.f. F with ch.f. φ is called infinitely divisible if for each integer n ≥ 1, there is a ch.f. φ_n such that φ = (φ_n)ⁿ.

Since a product of ch.f.s corresponds to the sum of independent r.v.s, the above definition can be translated into saying that an r.v. has an infinitely divisible d.f. iff it can be written as a sum of n i.i.d. variables for each n ≥ 1, or equivalently, the d.f. is the n-fold convolution of some d.f. for each n ≥ 1. Clearly the normal and Poisson ch.f.s are infinitely divisible.

Two immediate properties of real interest are given in the following:

Proposition 2 (a) An infinitely divisible ch.f. φ never vanishes on R.
(b) If X is an infinitely divisible r.v., and X is nonconstant, then X takes infinitely many values, which form an unbounded set.

Proof (a) By definition, for each integer n ≥ 1 there is a ch.f. φ_n such that φ = (φ_n)ⁿ, so that |φ_n|² = |φ|^{2/n}. But |φ_n|² = φ_nφ̄_n, and since φ_n and φ̄_n are ch.f.s, their product is a ch.f. (It is the ch.f. of X − X', where X, X' are i.i.d. with ch.f. φ_n.) Similarly, |φ|² is a ch.f. Since |φ(t)| ≤ 1,

g(t) = lim_{n→∞} |φ_n(t)|² = lim_{n→∞} |φ(t)|^{2/n}

exists, and g(t) = 0 on the set {t : φ(t) = 0}, and = 1 on the set {t : |φ(t)| > 0}. But φ(0) = 1 and φ is continuous at t = 0. Thus there exists a T > 0 such that |φ(t)| > 0 for |t| < T. Hence g(0) = 1 and g is continuous for −T < t < T. Since each |φ_n|² is a ch.f., and |φ_n|² → g, which is continuous at t = 0, by the continuity theorem g must be a ch.f., and hence is continuous on R. But g takes only two values: 0, 1. Thus we can conclude that g(t) = 1 for all t ∈ R, which yields |φ(t)| > 0 for all t ∈ R. This proves (a).
(b) If X concentrates at a single point μ ∈ R, then its ch.f. φ : t ↦ e^{itμ} is clearly infinitely divisible. If X takes finitely many values, then it is a bounded r.v. Thus, more generally, suppose X is a bounded r.v. We assert that it must be degenerate. Indeed, if |X| ≤ M < ∞ a.e., and if X is an infinitely divisible r.v., then by definition, for each integer n ≥ 1, there exist n i.i.d. variables X_{n1}, . . . , X_{nn} such that X = Σ_{k=1}^n X_{nk}. The fact that |X| ≤ M a.e. implies |X_{nk}| ≤ M/n a.e. for all 1 ≤ k ≤ n. If not, with A_1 = [X_{nk} > M/n] or A_2 = [X_{nk} < −M/n], we would have P(A_i) > 0 for i = 1 or 2 and at least one (hence all, by i.i.d.) k; but then on A_1, [X > M], or on A_2, [X < −M], which is impossible. Thus P(A_1 ∪ A_2) = 0. Hence

Var X = Σ_{k=1}^n Var X_{nk} ≤ n(M/n)² = M²/n.

Since n is arbitrary, Var X = 0, so that X = E(X) a.e., as asserted. (See Problem 8, which asks for another argument.)

An immediate consequence is that the binomial, multinomial, and uniform d.f.s are not infinitely divisible. On the other hand, the r.v. whose d.f. has the density f given by

f(x) = (1 − cos x)/πx²,  x ∈ R,

is unbounded. Its ch.f. φ, which we found before, is given by φ(t) = 1 − |t| if |t| ≤ 1, and = 0 if |t| > 1. Since φ(t) = 0 on [|t| > 1], by the proposition this ch.f. cannot be infinitely divisible. In addition to the normal and Poisson, we shall see that the gamma ch.f. is also infinitely divisible (cf. Problem 14).

Remark (Important) Since by (a) of the proposition an infinitely divisible ch.f. φ never vanishes, by Proposition 4.2.9 there exists a unique continuous function f : R → C such that f(0) = 0 and φ(t) = e^{f(t)}, i.e., f(t) = Log φ(t), t ∈ R, the distinguished logarithm. We call e^{(1/n)f(t)} the (distinguished) nth root of φ(t), and denote it (φ(t))^{1/n}.

Hereafter, the nth root of φ is always meant to be the distinguished one as defined above. This appears often in the present work. Thus, we have the following fact on the extent of these ch.f.s.
following fact on the extent of these ch.f.s.

Proposition 3 Let 𝔉 be the family of infinitely divisible ch.f.s on R. Then 𝔉 is closed under multiplication and passages to the (pointwise) limits, in that φ_n ∈ 𝔉, φ_n → φ on R, φ a ch.f. ⟹ φ ∈ 𝔉.

Proof Let φ_i ∈ 𝔉, i = 1, . . . , k. Then by definition, for each integer n ≥ 1, there exist ψ_{in} such that φ_i = (ψ_{in})ⁿ. Thus φ = ∏_{i=1}^k φ_i = (∏_{i=1}^k ψ_{in})ⁿ, and ∏_{i=1}^k ψ_{in} is a ch.f. Hence φ is infinitely divisible and φ ∈ 𝔉. Next suppose that φ_n ∈ 𝔉 and φ_n → φ, a ch.f., as n → ∞. Then for each integer m ≥ 1, we have that |φ_n|^{2/m} is a ch.f., as seen before, and the hypothesis implies that |φ_n|^{2/m} → |φ|^{2/m}, and the limit is continuous at t = 0. Thus by the continuity theorem |φ|^{2/m} is a ch.f., and |φ|² is infinitely divisible. Now by Proposition 2a, φ never vanishes and is continuous, so that Log φ is well defined. The same is true of Log φ_n. Hence on R

(φ_n)^{1/m} = e^{(1/m) Log φ_n} → e^{(1/m) Log φ} = (φ)^{1/m},   (2)

and (φ)^{1/m} is continuous at t = 0. Thus (φ)^{1/m} is a ch.f., implying that φ is infinitely divisible. Thus φ ∈ 𝔉, and the proof is complete. [Note that φ_n ∈ 𝔉, φ_n → φ on R alone does not give φ ∈ 𝔉, as φ_n(t) = e^{−nt²/2} shows.]

The argument used for (2) implies that if φ ∈ 𝔉, then φ^r is a ch.f. for each rational r ≥ 0. Then the continuity theorem and the fact that each real λ ≥ 0 can be approximated by a sequence of rationals r_n ≥ 0 give the result for all λ ≥ 0. A similar argument (left to the reader) gives the second part of the following, since (|φ|²)^{1/n} is a ch.f., n ≥ 1.

Corollary 4 Let φ be an infinitely divisible ch.f. Then for each λ ≥ 0, φ^λ is also a ch.f. Further, |φ| is an infinitely divisible ch.f.

But there are ch.f.s φ such that φ^λ is not a ch.f. for some λ > 0. (See Problem 9.) These properties already show the special nature and intricate structure of infinitely divisible ch.f.s. There exist pairs of noninfinitely divisible ch.f.s whose product is infinitely divisible, and also pairs one member of which is infinitely divisible and the other one not, whose product is not infinitely divisible. We illustrate this later (see Example 8).

The problem of characterizing these ch.f.s was proposed, and a first solution given, by B. de Finetti in 1929, and a general solution, if the r.v.s have finite variances, was given by A. Kolmogorov in 1932. Since, as is easily seen, the Cauchy distribution is infinitely divisible and it has no moments, a further extension was needed. Later P. Lévy succeeded in obtaining the general formula, which includes all these cases, but it was not yet in canonical form. The final formula was derived by A. Khintchine in 1937 in a more compact form from Lévy's work. We now present this fundamental result and then obtain the original formulas of Lévy and of Kolmogorov from it.

Theorem 5 (Lévy-Khintchine Representation) Let φ : R → C be a mapping. Then φ is an infinitely divisible ch.f. iff it can be represented as

φ(t) = exp{ iγt + ∫_R (e^{itx} − 1 − itx/(1 + x²)) ((1 + x²)/x²) dG(x) },   (3)

where γ ∈ R, and G is a bounded nondecreasing function such that G(−∞) = 0. With G taken as left continuous, the representation is unique.

Proof We first prove the sufficiency and uniqueness of the representation, since the argument here is more probabilistic than the converse part. Thus let φ be given by (3). The idea of the proof is to show that φ is a limit of a sequence of ch.f.s each of which is infinitely divisible, so that by Proposition 3 the limit function φ, which is continuous on R, will be infinitely divisible. We exclude the true and trivial case that G ≡ 0.

Let ε > 0 and T > 0 be given. For |t| ≤ T, choose a_ε > 0 such that

(4)

where K(= K_T) = sup{ |f_t(x)| : |t| ≤ T, x ∈ R }, with

f_t(x) = (e^{itx} − 1 − itx/(1 + x²)) (1 + x²)/x²,

and so lim_{x→0} f_t(x) = −t²/2. On the other hand, lim_{|x|→∞} |f_t(x)| ≤ 2 for each t. Hence we deduce that for |t| ≤ T, K < ∞. Now choose an integer N_ε and an n-partition of [−a_ε, a_ε] with n ≥ N_ε, −a_ε = x_0 < x_1 < · · · < x_n = a_ε, such that

(5)

This is possible since f_t is uniformly continuous on [−a_ε, a_ε]. But φ, given by (3), is never zero; thus Log φ is defined by Proposition 4.2.9, and for |t| ≤ T we get

(6)

Using the Riemann-Stieltjes approximating sums for the integral on the left in (6), one has, by (5),

(7)

Hence (6) and (7) imply

(8)

This may be simplified by writing out the values of f_t. Let

if x_j ≠ 0, and at x_j = 0, if G has a jump of size σ² at 0, then

lim_{x→0} f_t(x)[G(x) − G(0)] = −σ²t²/2.
Hence (8) shows that φ(t) is approximated by the following, in which we substitute −σ²t²/2 at x_j = 0:

But this is a finite product of the ch.f.s of Gaussian and Poisson d.f.s. Thus it is infinitely divisible for each n, and by Proposition 3, the limit φ is also infinitely divisible, as φ is clearly continuous at t = 0.

Let us establish uniqueness before turning to the proof of necessity. As noted above, Log φ is a well-defined continuous complex function on R by Proposition 4.2.9. Consider

w(t) = Log φ(t) − (1/2) ∫_{−1}^{1} Log φ(t + u) du,  t ∈ R.

Using the representation (3), this can be simplified immediately to get

w(t) = ∫_R e^{itx} (1 − (sin x)/x) ((1 + x²)/x²) dG(x).

Thus if (G being left continuous also) we set

W(x) = ∫_{−∞}^x (1 − (sin u)/u) ((1 + u²)/u²) dG(u),

then W(·) is a bounded nondecreasing left continuous function and W(−∞) = 0. Hence it is a d.f. except for a normalization, and w(·) is its ch.f. Thus W is uniquely determined by w, which in turn is uniquely determined by φ. It then follows that the left continuous G is uniquely determined by W, and hence so is φ. Since φ and G determine γ in (3) uniquely, the representation in (3) is unique.

We now establish the representation (3) if φ is an infinitely divisible ch.f., i.e., the necessity part of (3). (Because of its importance, an alternative proof of this part, based on a result of Gnedenko's, is also given later.) Thus for each n ≥ 1 there exists a ch.f. ψ_n such that φ = (ψ_n)ⁿ, and since Log φ exists, consider

n(ψ_n(t) − 1) = n[ exp{(1/n) Log φ(t)} − 1 ]
→ Log φ(t) as n → ∞.   (9)

If F_n is the d.f. of ψ_n, then (9) may be written as

n(ψ_n(t) − 1) = n ∫_R (e^{itx} − 1) dF_n(x) → Log φ(t).   (10)

Let

G_n(x) = n ∫_{−∞}^x (u²/(1 + u²)) dF_n(u).

Then G_n(−∞) = 0, G_n is nondecreasing, left continuous, and G_n(+∞) < ∞. We show that (i) {G_n, n ≥ 1} is uniformly bounded, so by the Helly selection principle, G_{n_k}(x) → G(x), a bounded nondecreasing left continuous function (for a subsequence), and (ii) this G and φ determine γ, and (3) obtains.

To establish (i), consider the following integral for each fixed t and n:

I_n(t) = n ∫_R (e^{itx} − 1) dF_n(x).   (11)

Hence the real and imaginary parts of I_n(t) converge to the corresponding parts of Log φ(t), so that one has, from the real parts,

lim_{n→∞} ∫_R [(cos tx − 1)(1 + x²)/x²] dG_n(x) = log |φ(t)|.   (12a)

If A_n = ∫_{[|x|≤1]} dG_n(x), B_n = ∫_{[|x|>1]} dG_n(x), so that G_n(+∞) = A_n + B_n, for each ε > 0 there is an n_ε such that n ≥ n_ε implies

(12b)

and so (12a) gives
Thus {A_n, n ≥ 1} is bounded. If 0 < t ≤ 2, (12b) gives

Thus {B_n, n ≥ 1} is bounded, and hence {G_n(+∞), n ≥ 1} is bounded.

Now by the Helly selection principle there is a subsequence {G_{n_k}}_{k≥1} which converges to G, a bounded nondecreasing left continuous function, at all points of continuity of G. Clearly G(−∞) = 0. To see that G_{n_k}(+∞) → G(+∞), let ε > 0 be given and choose a_0 > 1, a continuity point of G, such that G(+∞) − G(a_0) < ε/3 and, for |t| ≤ 2/a_0, |log|φ(t)|| < ε/12. Choose N_ε ≥ 1 such that n ≥ N_ε ⟹ |G_{n_k}(a_0) − G(a_0)| < ε/3. Then

(13)

Now if N_1 > n_ε, then (12b) holds if x > a_0 > 1, so that for n > N_1 we get

−log|φ(t)| + ε/12 ≥ ∫_{[|x|>a_0]} (1 − cos tx) dG_n(x).   (14)

Integrating (14) for 0 ≤ t ≤ 2/a_0, and dividing by the length of the interval,

Thus

∫_{[|x|>a_0]} dG_n(x) ≤ ε/6 + 2 sup_{|t|≤2/a_0} |log|φ(t)|| ≤ ε/6 + 2 · ε/12 = ε/3,   (15)

by the choice of a_0. Hence if n > N_2 = max(N_ε, N_1), then (13) and (15) yield

|G_{n_k}(+∞) − G(+∞)| < ε.   (16)
Finally, let

γ_n = n ∫_R (x/(1 + x²)) dF_n(x).

Then |γ_n| < ∞, and I_n of (11) becomes

I_n(t) = itγ_n + ∫_R (e^{itx} − 1 − itx/(1 + x²)) ((1 + x²)/x²) dG_n(x).   (17)

Since the integrand in (17) is bounded and continuous for each t, and G_{n_k} → G as shown by (16), we may apply the Helly-Bray theorem for it and interchange the limits. But by (11), I_{n_k}(t) → Log φ(t), so that (17) implies that γ_{n_k} must converge to some number γ. Hence we get

Log φ(t) = itγ + ∫_R (e^{itx} − 1 − itx/(1 + x²)) ((1 + x²)/x²) dG(x),

which is (3). This gives the necessity, and with it the theorem is completely proved.

An alternative proof of necessity will be given after we record the special but also useful forms of Lévy and Kolmogorov, as consequences of formula (3).

Define M : R⁻ → R⁺ and N : R⁺ → R⁻ by the equations

M(x) = ∫_{−∞}^x ((1 + u²)/u²) dG(u), x < 0;  N(x) = −∫_x^∞ ((1 + u²)/u²) dG(u), x > 0.   (18)

If σ² = G(0+) − G(0−) ≥ 0 is the jump of G at x = 0, then (i) M, N are both nondecreasing, (ii) M(−∞) = N(+∞) = 0, (iii) M and G|R⁻, N and G|R⁺ have the same points of continuity, and (iv) for each ε > 0,

∫_{−ε}^{0} u² dM(u) + ∫_{0}^{ε} u² dN(u) < ∞.   (19)

Given G, (18) and (19) determine M and N, and conversely, if M, N are given to satisfy (18) and (19) and the conditions (i)-(iv), then G is determined, in terms of which (3) becomes, with a γ ∈ R,

φ(t) = exp{ iγt − σ²t²/2 + ∫_{−∞}^{0⁻} (e^{ixt} − 1 − ixt/(1 + x²)) dM(x)
        + ∫_{0⁺}^{∞} (e^{ixt} − 1 − ixt/(1 + x²)) dN(x) },  t ∈ R.   (20)
The collection (γ, σ², M, N) is the Lévy (spectral) set for the infinitely divisible ch.f. φ, and the pair (M, N) the Lévy measures. These will be needed in Section 8.4 as well.

If the infinitely divisible d.f. has a finite variance (equivalently, its ch.f. φ is twice differentiable), then we can also get the Kolmogorov formula from (3) as follows.

Define K : R → R⁺, called the Kolmogorov function, by

K(x) = ∫_{−∞}^x (1 + u²) dG(u).   (21)

Clearly K(x) ≥ 0 for x ∈ R, and the following definition of φ is formally correct for a suitable γ:

φ(t) = exp{ iγt + ∫_R (e^{itx} − 1 − itx) (1/x²) dK(x) }.   (22)

We have to show that K(+∞) < ∞ if the d.f. has finite variance. In fact, from (22), by differentiation of the holomorphic function Log φ at t = 0, we get

where X is the r.v. with φ as its ch.f. Thus K(+∞) < ∞, and then

Thus (22) is obtained from (3) and is rigorously correct. We state this result as follows:

Theorem 6 Let φ : R → C be a mapping. Then it is an infinitely divisible ch.f. iff it admits a representation (20) for a Lévy set (γ, σ², M, N). On the other hand, if φ : R → C is a ch.f. of a d.f. with finite variance, then it is infinitely divisible iff φ admits a representation (22) for a Kolmogorov pair (γ, K). If (M, N) and K are taken left continuous, then these representations are unique.

It may be remarked that (22) can also be obtained directly using the argument of (3) with slight simplifications, though it is unnecessary to reproduce them here (cf. Problem 12).

To present the alternative proof of the necessity part of Theorem 5 noted earlier, we need the following result on the convergence of infinitely divisible d.f.s, which is of independent interest. It is due to B. V. Gnedenko.
Proposition 7 Let {F_n, n ≥ 1} be a sequence of infinitely divisible d.f.s with the Lévy-Khintchine pairs {(γ_n, G_n), n ≥ 1}. Then F_n tends to a d.f. F (necessarily infinitely divisible) iff there exists a pair (γ, G) as above such that
(i) lim_{n→∞} G_n(x) = G(x) at all continuity points x of G,
(ii) lim_{n→∞} G_n(±∞) = G(±∞), and
(iii) lim_{n→∞} γ_n = γ (∈ R).

The pair (γ, G) determines the ch.f. of F by the (Lévy-Khintchine) formula (3).

Proof Let F_n be infinitely divisible and φ_n be its ch.f. Then for each n,

φ_n(t) = exp{ iγ_nt + ∫_R (e^{itx} − 1 − itx/(1 + x²)) ((1 + x²)/x²) dG_n(x) }.   (23)

If (i)-(iii) hold, then {G_n(+∞), n ≥ 1} is convergent, so that it is bounded and |G_n(x)| ≤ sup_n G_n(+∞) < ∞. Thus the G_n, n ≥ 1, are uniformly bounded. The integrand in (23) is a bounded continuous function in x for each t. Hence by the Helly-Bray theorem [with (iii)] φ_n(t) → φ(t), and φ is given by (23) with (γ_n, G_n) replaced by (γ, G), and is continuous. Thus φ is an infinitely divisible ch.f. by the sufficiency part of Theorem 5. By the continuity theorem F_n → F and φ is the ch.f. of F, which is infinitely divisible.

The converse can be proved using the arguments of the necessity part of Theorem 5. But we present an alternative method which does not depend on that result, so that the present result can be used to obtain a simpler proof of its necessity. The idea here is to use the same trick employed for the uniqueness part of the proof of Theorem 5. Thus let ψ_n = Log φ_n and

w_n(t) = ψ_n(t) − (1/2) ∫_{−1}^{1} ψ_n(t + u) du
       = ∫_R e^{itx} (1 − (sin x)/x) ((1 + x²)/x²) dG_n(x)

(by substitution for ψ_n from (23) and simplification).

By hypothesis of this part, φ_n → φ, a ch.f. Hence ψ_n → ψ, and then w_n(t) → w(t), t ∈ R, and since φ is continuous, so is w. But

w_n(t) = ∫_R e^{itx} h(x) dG_n(x),  where h(x) = (1 − (sin x)/x)((1 + x²)/x²),

and G_n is increasing, G_n(−∞) = 0. Thus w_n(·) is the ch.f. of the "d.f." H_n, where

H_n(x) = ∫_{−∞}^x h(u) dG_n(u).

Then by the continuity theorem, H_n → H at all continuity points of H, and H_n(±∞) → H(±∞). But h is a positive bounded continuous function such that lim_{u→0} h(u) = 1/6 and lim_{|u|→∞} h(u) = 1. It follows, by the Radon-Nikodým theorem, that

G_n(x) = ∫_{−∞}^x (1/h(u)) dH_n(u).   (25)

Since 1/h(u) is also bounded and continuous, one concludes from (25) and the Helly-Bray theorem that

G_n(x) → G(x) = ∫_{−∞}^x (1/h(u)) dH(u)

at all continuity points x of G (equivalently, of H), and G_n(±∞) → G(±∞). Since φ_n → φ by hypothesis, (23) and this imply that γ_n → γ, and hence (i)-(iii) hold; therefore φ is given by (23) with (γ, G) in place of (γ_n, G_n). This completes the proof.

Remark The corresponding limit theorems hold for the Lévy and Kolmogorov representations, although each statement needs a separate but entirely similar argument.

We are now ready to present the following:

Alternative Proof of (the Necessity of) Theorem 5 By hypothesis φ = (ψ_n)ⁿ for each n, where ψ_n is a ch.f. Then by (9)

n(ψ_n(t) − 1) → Log φ(t),  n → ∞, t ∈ R.   (9')

Let F_n be the d.f. of ψ_n. Hence

n(ψ_n(t) − 1) = itγ_n + ∫_R (e^{itx} − 1 − itx/(1 + x²)) ((1 + x²)/x²) dG_n(x),

where

γ_n = n ∫_R (x/(1 + x²)) dF_n(x),  G_n(x) = n ∫_{−∞}^x (u²/(1 + u²)) dF_n(u).

If φ_n is defined by (3) with (γ_n, G_n), then it is clear (by the sufficiency) that φ_n is infinitely divisible. The hypothesis and (9') imply that φ_n(t) → φ(t), t ∈ R. Thus the necessity part of Proposition 7 implies that γ_n → γ, G_n → G satisfying the stated conditions, and that φ is given by (3) for the pair (γ, G). This proves the result.

In general, infinite divisibility is verified by an explicit construction of the pair (γ, G) and the formula (3), (20), or (22). There is no other quick and easy test. Let us present an example to show how this may be accomplished.

Example 8 Let 0 < α ≤ β < 1 and X, Y be independent r.v.s on (Ω, Σ, P) whose distributions are specified as

P[X = n] = (1 − β)βⁿ,  n = 0, 1, 2, . . . ,

and

P[Y = 0] = 1/(1 + α) = 1 − P[Y = 1].

Let Z = X + Y. Then X is an infinitely divisible r.v., but Y and Z are not. However, if Z̃ is independent of Z and has the same d.f. as Z, then V = Z − Z̃ is an infinitely divisible r.v.

Proof We establish these facts as follows. Here Y is a Bernoulli r.v., and X is an r.v. representing the occurrence of the 1st head on the (n + 1)th trial in a sequence of tosses of a coin whose probability of obtaining a tail is β (then X is said to have a geometric d.f.):

φ(t) = E(e^{itX}) = Σ_{n≥0} (1 − β)βⁿ e^{int} = (1 − β)/(1 − βe^{it}).

Thus φ never vanishes, and we have by 4.2.9,

Log φ(t) = Σ_{n≥1} (e^{int} − 1)βⁿ/n.   (26)

Hence

φ(t) = lim_{n→∞} ∏_{k=1}^n exp{ (e^{ikt} − 1)βᵏ/k }.

Since the product is the ch.f. of a sum of n independent Poisson r.v.s, it is infinitely divisible, and by Proposition 3, φ is infinitely divisible. Also, by Proposition 2b, the bounded r.v. Y is not infinitely divisible.

Regarding Z, consider its ch.f.

ψ(t) = E(e^{itZ}) = φ(t) (1 + αe^{it})/(1 + α).

Since the ch.f. of Y in absolute value lies between (1 − α)/(1 + α) and 1, it also never vanishes. Thus ψ(t) never vanishes, and again by 4.2.9,
Log ψ(t) = Log φ(t) + Log((1 + αe^{it})/(1 + α))
         = Σ_{n≥1} (e^{int} − 1)βⁿ/n + Log(1 + αe^{it}) − log(1 + α)  [by (26)]
         = iγt + ∫_R (e^{itx} − 1 − itx/(1 + x²)) ((1 + x²)/x²) dG(x),   (27)

where γ = Σ_{n≥1} (βⁿ + (−1)ⁿ⁻¹αⁿ)/(1 + n²) and G is a function on the integers, of bounded variation, with jumps of size nβⁿ/(1 + n²) and (−1)ⁿ⁻¹[nαⁿ/(1 + n²)] at the positive integers. It has no jump at 0. This formula is like (3); but G is not monotone increasing. Hence by Theorem 5, ψ cannot be infinitely divisible.

If ζ(t) = E(e^{itV}) = ψ(t)ψ(−t) = |ψ(t)|², then using (27) we get

ζ(t) = exp{ ∫_R (e^{itx} − 1 − itx/(1 + x²)) ((1 + x²)/x²) dG̃(x) },   (28)

where G̃ is now monotone nondecreasing and bounded, with positive jumps at ±1, ±2, . . . , of sizes [|n|/(1 + n²)](β^{|n|} + (−1)^{|n|−1}α^{|n|}), since α ≤ β, and no jump at 0. Hence (28) is the same as (3) with "γ = 0" and "G = G̃," so that V is an infinitely divisible r.v.

This last part implies that ζ = |ψ|² is an infinitely divisible ch.f. even though neither ψ nor ψ̄ is such. Also, ζ(t) = |φ(t)|² |E(e^{itY})|² is a product of an infinitely divisible |φ|² and a noninfinitely divisible one. Thus the intricate structure of these ch.f.s, as well as the beauty and depth of the Lévy-Khintchine formula, are exhibited by this example.
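The compound-Poisson expansion (26) of the geometric ch.f. can be checked numerically; the following Python sketch (β and the test points are arbitrary choices) compares Log φ with a truncation of the series:

    import numpy as np

    beta = 0.4
    for t in (0.5, 1.7):
        phi = (1 - beta) / (1 - beta * np.exp(1j * t))   # geometric ch.f.
        series = sum((np.exp(1j * n * t) - 1) * beta ** n / n
                     for n in range(1, 200))             # truncated (26)
        print(t, abs(np.log(phi) - series))              # ~ 0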
The next result is motivational for the work of the following section.

Example 9 Let X_{nk}, 1 ≤ k ≤ n, n ≥ 1, be independent with the d.f.s defined by P[X_{nk} = k/n] = 1/n = P[X_{nk} = −k/n] and P[X_{nk} = 0] = 1 − (2/n), 1 ≤ k ≤ n. Then S_n = Σ_{k=1}^n X_{nk} is not infinitely divisible for any n ≥ 1, but S_n →^D S and S is infinitely divisible; it is not Gaussian.

Proof Since each S, is a bounded liondegenerate r.v., it cannot be in-


finitely divisible by Proposition 2b. Let 4, be the ch.f. of s,, so that
5.2 Infinite Divisibility

q5n (t) = E (e"'ll ) = nn

k=l
E (eztxrb" (by independence)

Thus 4, is real and for each n ( > 5), &(t) is never zero [> (i)n].
Consequently

log &(t) = 2
k=l
log [l + (cos
kt
; I)]
-

Z
= - [(cosk t j n ) - 11 - 0(1/n)
n

Hence the Riemanii integral approxiination gives

lim log & ( t ) = 2 (costx - 1) dx, t ER


n-00

Since the limit is coiitinuous at t = 0,

D
is a ch.f., which, by the continuity theorem, implies Sn + S and 4 is the ch.f.
of S. Clearly S is not a normal (=Gaussian) r.v.
To see that q5 is infinitely divisible, note that

4(t) = exp {Il (e" - 1) dx }

where G is a nonnegative liondecreasing bounded (by 2) function. It may be


written as
z < -1

-1<x<1
x > 1.
By Theorem 5, (0, G) is a Lkvy-Khintchine pair and 4 is infinitely divisible,
as asserted.

The properties of the sequence {Xnk,1 k < < n} can be expressed as


f0110~s:E ( X n k ) = 0;
2k2
Var Xnk = -
n3 '
318 5 Weak Limit Laws

thus

and for E > 0,


2
P [ Xnk - E(Xnk) > &] < n-

and
lim max P[lXnk - E ( X n k )
n+cc l < k < n
> E] = 0
Note that the variances of the partial sums are bounded, aiid the independent
components X n k are uniformly absolutely negligible.
Abstracting these two properties, we present some limit theorems aiid
some important results, such as the Lindeberg-Feller theorem, conditions for
the limit law t o be Poisson, and the like.

5.3 General Limit Laws, Including Stability


Let us now abstract the properties of the Xnk-sequence appearing in Example
9 of the preceding section. The "smallness" of X n k is made precise in

Definition 1 (a) Let {Xnk,1 k <


k,, n < >
1) be a sequence of
sequences of r.v.s on a probability space (R,
C , P). Then the Xnk are called
infinitesimal if for each E > 0

lim
n-cc
max P[IX n k
l<k<k,,
> E] = 0.
(b) More generally, the Xnk are called asymptotically constant if there ex-
ist constants ank [which typically are E ( X n k ) if these exist, and are medians
< <
in general] such that in {XAk = Xnk - ank, 1 k k n , n I ) , the XAk are >
infinitesimal.

For coinputatioiial convenience, it is better t o have alternative forms of


condition (1). These are given by

Proposition 2 Let {Xnk, 1 < k < k,, n > 1) be a sequence of r.v.s. Then
the following are equivalent:

(i) The Xnk are infinitesimal.


(ii) If cjnk is the ch.f. of X n k , then

lim max ~ 5 , ~ ( t ) - 1 1= 0
n-cc l<k<k,,
5.3 General Limit Laws

uniformly in t < T for each T > 0.


(iii) If Fnkis the d.f. of X n k , then
x2
lim max -dFnk(z)= 0.

Proof (i)+(ii) Let T > 0 be arbitrarily fixed. Since (1)holds, given E > 0,
>
there exists an NE such that n N, implies

E &
< -T+ - = E for all It S T .
-2T 2
Thus (ii) holds. This implication is the one often used.

(ii)+(i) By hypothesis lirn,,, q&(t) = 1 uniformly in k for tl 5 T,


T > 0. Thus Xnk i 0 in distribution, hence in probability, uiiiforinly in k.
This means, in symbols, ( I ) is true, so that (i) follows.

(i)+(iii) If 0 < E < $, then there is an NE such that

Hence

5 €2 + €12 < (€12) + (€12) = E.

Thus (iii) holds.

On the other hand, if (iii) is given, for each E > 0,


5 Weak Limit Laws

x2
inax
llklk, IW 1 + x2 dF,k
-- (x)

> mar
1
1
,,>, -
1
€2
+ €2 dF,k (x), because
x2
--
1+ x 2
1' on R',

Taking limits as n + cc,this implies (i), and the proof is complete

If we are given a sequence of sequences of independent r.v.s {Xnk,1 k < <


k,), conditions should be found such that the sequence {(S, -A,)/B,, n 1) >
converges in distribution t o an r.v., where S, = ~ k : ~
X n k ,Bn > 0, A, E R
are suitable constants. The key idea now is to express the ch.f. 4, of
(S, - A,)/B, in terms of a ch.f. $, as in Eq. (2.3), for some suitable pair
(y,, G,) with an error term tending to zero, as n + cc uniformly on compact
t-sets. Then the given sequence converges iff the $, sequence does, and for
the latter sequence Proposition 2.7 tells us how to proceed. The remarkable
first steps were taken in 1936 by G. Bawly even before the above result was
discovered. This simple, yet important proposition, which suffices for many ap-
plications, will now be presented. (Of course, the Kolmogorov representation
was known by 1932.) For nontriviality, henceforth we assume that k, + oo as
n + cc without further mention.

Theorem 3 (Bawly) Let {Xnk,1 5 k 5 k,,n > 1) be a sequence


of sequences of rowwise independent r.v.s with finite variances. Suppose that
the r.v.s Xnk - E ( X n k ) are infinitesimal and -
Var S, < cc, where
Sn = c:, Xnk. T h e n for a n y sequence {A,,n > 1) c R,S, A, + s-

in distribution iff the following sequence of infinitely divisible ch.f.s $, (or


their d.f.s) converges t o a (necessarily infinitely divisible) ch.f. @ as n + cc,
uniformly o n compact subsets of R, where

Here

Fnkbeing the d.f. of Xnk T h e limit d.f. in both cases i s the same, and i t i s
infinitely divisible.
5.3 General Limit Laws 321

Remark The associated d.f.s with the ch.f.s $, are called the accompany-
ing laws of the given sequence {Sn- A,, n > 1).

Proof Let

Then by independence

We now associate an infinitely divisible ch.f. with the right side on using the
infinitesimality of XAk = Xnk - E ( X n k ) .
+
Let FAkbe the d.f. of Xkk, so that FAk= Fnk(z E ( X n k ) ) Fnk
, being the
d.f. of Xnk. Consider

By Proposition 2 [ ( i ) ~ ( i i ) ] ,

lim
n-oo
max l a n l , = 0,
l<k<k,,
tl < T; T > 0.

Hence there exists an N1 such that n >


Nl J maxl<k<krLI ankl < Be- i.
cause for tl < +
T , &k(t) = 1 ank, cjnk does not vanish for n Nl and, by >
Proposition 4.2.9, Log q5nk is well defined. Thus

Also, E(XAk) = 0, whence

Note that (6) implies

If $, is as defined in (3) and 4, is given by (4), then one has the followiiig
>
simplification in which n N1, so that Log 4, is well defined for I t T : <
322 5 Weak Limit Laws

I
k ,,
= itan + (edX -

z2
1 - itz)
I
d K n (x)- itan + C Log On/ ( t )
/=1

Hence with ( 5 ) and ( 7 ) , ( 8 ) becomes

as n + cm, by Proposition 2. Thus ($,/$,)(t) + 1 + &(t) - $,(t) + 0


as n + cm uniformly in t <
T. Then if $,(t) + $ ( t ) , tl T , it is clear <
that & ( t ) + $ ( t ) also, I t <
T and conversely. The proof of the theorem is
complete.

It is now easy t o present conditions for a prescribed infinitely divisible limit


distribution t o be the limit d.f. of the sequences { X n k , 1 k k,, n 1) of< < >
Theorem 3, using Proposition 2.7. The following result, due t o Gnedenko, is
an adaptation of the former.

Theorem 4 Let {X,k, 1 < k < k,,n > 1) be a sequence of sequences


of rowwise independent r.v.s with finite variances and each X n k - E ( X n k ) be
infinitesimal. Then the d.f.s of the sequence of sums S , = X n k A,, c::, -

for a given constant sequence {A,, n > 1) c R converges to a limit d.f. F,


and their variances tend to the variance of F iff there exist an a! E R and a
bounded nondecreasing function K : R + I%+ with K(-cm) = 0 such that

at all continuity points z of K ( z ) ,

(ii) lim
n-00
5/"
/=I -00
+
u 2 d ~ , * ( u E ( x , ~ ) )= K ( + a ) ,
5.3 General Limit Laws

(iii)

where Fnkis the d.f. of the Xnk T h e ch.f. of F i s given by (22) of Section 2,
with (a,K) as the Kolmogorov pair determined by (2)-(iii) above.

Proof Let n, = C;:, E ( X n k ) - A,, aiid

Then K,(-cm) = 0, K, is nondecreasing, and by (i)-(iii), K,(z) + K(z)


at K-continuity points x and K,(+oo) + K(+oo), a, + a . Since Var
S, = Kn(+cm), and a convergent (real) sequence is bounded, the "if' part of
Bawly's theorem is satisfied: and if $, is defined by (3) with this a, and K,,
then $, is an infinitely divisible ch.f., and by (an analog for the Kolmogorov
representation of) Proposition 2.7, $,(t) + $ ( t ) ,t E R,where $ is given by
(3) with the above n and K . Hence S , A , 2
3 and, further, the proposition
implies that the variances a2(S,) converge t o the variance of s because in the
Kolmogorov formula K(+cm) = Var 3 . Thus the result holds in this direction.
D -
Conversely, let S, -A, + S and a2(S, -A,) = a2(S,) + a2(s). Then the
variances are bounded and Xnk - E ( X n k ) are infinitesimal in both directions,
by hypothesis. Hence by the necessity part of Bawly's theorem (and Proposi-
tion 2.7 or the Kolmogorov representation), a, and Kn(.) defined above must
converge t o n and K(.) (always taken left coiitinuous), aiid the latter pair
determines $ uniquely, which is then a ch.f. of an infinitely divisible d.f. Thus
conditions (i)-(iii) above hold. This completes the proof.

The result enables us t o obtain conditions for distributional convergence


of partial sums of independent r.v.s with finite variances t o any infinitely di-
visible d.f., as soon as we are able t o calculate the Kolmogorov pair (a,K ) .

Let us now present a specialization of the above result if the desired limit
d.f. is the standard normal N ( 0 , l ) . Since its ch.f. is t H e p t 2 I 2 , in the Kol-
inogorov formula [see (22) in Section 21, it is seen that y = 0 aiid K must
have a jump of size 1 at z = 0; i.e., K ( z ) = 0 for z < 0, = 1 for z 0. With>
this knowledge of (a,K ) the following normal convergence criterion holds.

Theorem 5 Let {Xnk,1 5 k 5 k,} be a sequence of sequences of row-


wise independent r.v.s with two m o m e n t s finite. T h e n for some sequence of
constants {A,, n > 1) the sequence Si = c::, D
Xnk A, + 3, which i s
-

distributed as N (0, I ) , a2(Sc)+ 1, and X n k - E (Xnk) are infinitesimal, iff


for each E > O , F n k being the d.f. of X n k , we have
324 5 Weak Limit Laws

n-oo

(iii) lim
n-oo
(5
k=l
- A,
E(xnk1

Proof Suppose that the conditions hold. Then the (Xnk - E(Xnk)) are
infinitesimal since for each E > 0,

max P [ Xnk - E(Xnk)l


15Kkn
> E] = max ![[I "I>&]
dFnr(x+E(X,e))

The variances a2(S;) are bounded because by adding (i) and (ii), we get

and
l~ll
n-oo
k",

k=l
/"_" +
u2ci~,k(u E(x,,~))
= 0 if x < 0
1 if 2 2 0 .
Thus if

then conditions (i)-(iii) of Theorem 4 are satisfied for the infinitesimal se-
< < >
quence {Xnk E ( X n k ) ,1 k k,, n 1). Hence, by that theorem, the limit
-

distribution of the S: is N(O,1).


Conversely, if the d.f.s F, of S: converge to N(O,1) and the variances of
S: tend to 1, the r.v.s Xnk - E ( X n k ) being infinitesimal, then by Theorem
4 again, its conditions (i)-(iii) must hold. These are now equivalent to the
present conditions, since (iii), with n = 0, is the same in both cases. Next for
all E > 0 with the two-valued K going with N(O, I), we have as n + m,
5 . 3 General Limit Laws

These imply as n + oo,

This completes the proof.

An immediate consequence of this result is the celebrated Lindeberg-Feller


theorem. Here the X,,n >
1, are independent r.v.s with E(X,) = 0, Var
X, = 02,and Xnk = X k / o ( S n ) ,where S, = EL=1 Xk. If

then the Xnk are infinitesimal and Var(CL=, X n k ) = 1, so that conditions


(ii) and (iii) of Theorem 4 are automatic. Now for the convergence t o N(0, I ) ,
only (i) of Theorem 4, which became (i) of Theorem 5 , need be satisfied. Note
that if Fk and Fnkare the d.f.s of X k and X n k , then

and hence for any E > 0,

x2 dFnk(x) =
S [I ~12.1
x2 d ~ ( kx a ( ~ , ) )= --
a2(Sn) S y2 dFk(y).
[I UI~EO(S,,)]
(10)
Thus we have the following result as a consequence of Theorem 5. The suffi-
ciency is due t o Lindeberg and the necessity t o Feller.

Theorem 6 (Lindeberg and Feller) Let {X,,n > 1) be independent


r.v.s with finite variances and zero means. Let Fk be the d.f. of X k , and
Sn = xr=l X k . T h e n (Sn/a(Sn))2 S, which i s N(0, I ) , and the Xk/a(S,)
are infinitesimal iff the following CONDITION O F LINDEBERG i s satisfied
,for each E > 0 :

It is useful t o note that if the r.v.s have three moments finite, then the Lia-
pouiiov condition P3 (S,)/03(S,) + 0 implies ( l l ) ,so that this is an important
generalization of the Liapounov theorem. To see this implication, consider for
E > 0,
5 Weak Limit Laws

1
x2 dFk (x)
k=l

+
The same computation holds if only 2 S, S > 0, moments exist and moreover
E(I ~ ~ 1 ~ + ~ ) / fgoes
l ~t o+zero
~ ( as
S n~+
) cm.
Observing that, for a Poisson distribution with parameter X > 0, the
Kolmogorov pair (a,K) is given by a = A,

x<1
K(x) = {Oh; x 2 1;

we can present analogous conditions for sums of independent r.v.s for conver-
gence t o a Poisson limit. This will extend Proposition 1.9. However, it is again
an immediate consequence of Theorem 4. The easy verification is left t o the
reader.

Theorem 7 Let {Xnk, 1 < k < k n , n > 1) be a sequence of rowwise

C
independent sequences of infinitesimal r.v.s with nite variances. Then for
some sequences {A,, n 2 1) of constants, S: = C k = l X n k A, converges i n
-

distribution to a Poisson r.v. with parameter A > 0, and 02(S;) + A, iff for
each E > 0 we have

(ii)

(iii)

It is now natural t o ask whether there are analogous results for the se-
>
quences of partial sums {S,, n 11,S, = x;"
X n k , if the infinitesimal X n k
do not have finite moments. Indeed, the answer is yes, and with the Lkvy or
Lkvy-Khintchine representations and Proposition 2.7, such results have also
been obtained, primarily by B.V. Gnedenko. Now a more delicate estimation
of various integrals is needed. We state the main result-a generalized Bawly
theorem. Then one can understand the type of conditions that replace those
5.3 General Limit Laws 327

of Theorem 4. For a proof of the result, we direct the reader t o the classic
by Gnedenko and Kolmogorov (1954) where references t o the original sources
and other details are given.

Theorem 8 (Generalized Bawly) Let {Xnk,l < k < k,,n > 1)


be a sequence of rowwise independent sequences of infinitesimal r.v.s and
l Xnk - An for some constants {A,,n > 1). Then S: + S iff
,,
S; = x kk = D

the following accompanying sequence {Yn,n > 1 ) of infinitely divisible r.v.s


converges i n distribution to Y,and then Y=S a.e. Here Yn is an r.v. with ch.f.
$I, determined by

E (eztyr7) = $, ( t )= exp

where for any fixed but arbitrary T > 0, with Fnkas the d.f. of X n k , we have

(Even though the Yn depend o n T > 0, the limit r.v. Y does not.)
The other, equally useful, forms of the limit distributions use the Lkvy
representation. All such results are discussed in detail, with a beautiful pre-
sentation, in the above monograph. In all these theorems the r.v.s X,k were
assumed asymptotically constant. If this hypothesis is dropped, the methods
of proof undergo drastic changes. No general theory is available. But the fol-
lowing extension of the Lindeberg-Feller theorem, obtained by V.M. Zolotarev
in 1967, indicates these possibilities.
Let X n k ,k> l,n > 1, be rowwise independent r.v.s with E(Xnk) = 0,
2
Var Xnk = ank,and xkrl = 1. Let Sn = xk,l X n k , which converges
by Theorem 2.2.6. If G is the normal d.f., N(O, I ) , let Gnk be defined by
Gnk(x)= G ( x / a n k ) .If F, H are two d.f.s, let d(F,H) denote the Lkvy metric
defined in Eq. (4.1.11). We then have

Theorem 9 (Zolotarev) Let {Xnk,k > 1, n > 1) be a sequence of rowwise


independent sequences of r.v.s with means zero, variances a:k, such that
328 5 Weak Limit Laws

Let G be the n o r m a l d.f., N(0, I ) , and F, the d.f. of S, = Ck,lX n k . T h e n


-

F,(z) + ~ ( zfor ) all z E R [ o r d(F,, G) + 0] as n + oo iff

(i) an = supk d(Fnk,G n k ) + 0, where Fnki s the d.f. of Xnk and Gnk i s
defined above, and
(ii) for each E > 0, with An = {k : oik < 6)

Again we omit the proof of this interesting result t o avoid the digression.
In this vein, we state an important coilsequelice of Theorem 8. Its proof in-
volves an alternative form of the conditions of Theorem 8, and they are still
nontrivial.

Theorem 10 (Khintchine) Let Sn = c::,X n k , the X n k ,1 I


D
k I kn,
being independent and infinitesimal. Suppose Sn + S as n + oo. T h e n S i s
n o r m a l N ( 0 , l ) iff for each E > 0

lim
n-00
P [X n k1 > E] = 0,
which i s equivalent t o the condition

There are other interesting specializatioiis for Poisson aiid degenerate con-
vergence aiid then t o single sequences. For details, we refer the reader t o the
Gnedenko-Kolmogorov fundamental monograph noted above.
The preceding theory shows that any infinitely divisible d.f. can be a limit
d.f. of the sequence of sums of rowwise independent infinitesimal r.v.s. The
classical central limit theory leads t o the normal d.f. as the limit element.
However, it is of interest t o look for a subclass of the illfinitely divisible laws
which can arise having a "simpler" form than the general family. This turns
out t o be a family, called stable laws. Let us introduce this class by the fol-
lowing motivational example, which gives some concreteness t o the general
concept t o be discussed.

>
Example 11 Let {X,, n 1) be i.i.d. random variables with the common
d.f. F, given by its density F
L = f,, (Pareto density, c.f., Exercise 4.25)
5.3 General Limit Laws 329

If S, = Cr=lX k , we find the numbers b,(p) > 0 (called "normalizing con-


stants") such that S,/b,(p) 2 3, where 3 is an illfinitely divisible r.v.
(i) It is clear that, if p > 2, then a2 = Var X1 = (p - 2)-l < cm,and the
classical central limit theorem applies with b,(p) = m,
so that

Thus we only need to consider 0 < p 5 2. This has to be treated in two parts:
0 < p < 2 and p = 2. The variances do not exist in both cases.
(ii) 0 < p < 2. We cannot directly try to verify (12) here since it is first
D
necessary to find the b,(p) such that S,/b,(p) + S. Only then (12) gives
conditions for s to be N ( 0 , l ) . However, this is the key part. Let us try, by
analogy with (i), n", for some a > 0 as the normalizing factor. A heuristic
reason for this will become clear a little later. We use the technique of ch.f.s.
Thus

= [l - In(t)ln (say). (14)

It is clear that 4, is real and never zero on R.Thus we have, on expansion of


(14) 7
log &(t) = n[-I, - I - . . .] (15)
if I I, < 1. We now find a such that In= 0 ( 1 / n ) . In fact,

Taking a p = 1, we can then let n + oo.But note that the integral on the right
converges only if p < 2. (Its singularity is at the lower end point if p = 2.)
Hence for 0 < p < 2, letting a! = lip, we have

From (15) and (16), it follows that


5 Weak Limit Laws

and since the right side is continuous, it is a ch.f., by the continuity theorem.
If p = 1, then the limit is the Cauchy ch.f. Thus for O < p < 2, bn(p) = nllp
is the correct normalizing constant for Sn.
(iii) p = 2. Since b, = fi is not enough t o control the growth of S,, as
seen from the divergence of (16), we need b, t o grow somewhat faster. Let us
try b, = ( n log n)lI2. Then (15) becomes with this new normalization

if J n < 1. Here

nJ, = 4n 1
" sin2(tz/2 [n log n]'I2)
x3
dz

It is clear that the integral on (i,cm) converges for each i > 0, and hence
the right side goes t o zero as n + cc on this interval. Therefore its value, as
n + cc, is the same as the limit of

We now estimate this. First, choose i > 0 such that given rj > 0,1 - rj <
< <
sin2 u/u2 < 1 for 0 u i. Then

Hence for each t # 0,

t2
< JA < -[logi - t /2(n log n ) 'I2)].
log n
First letting n + cc on both sides and then r j + 0 we see that the extremes
have the limit t2/2. Coiisequently limn,, A
J = limn,, nJ, = t2/2. Substi-
tuting this in (17) we get

Now that b, = ( n log n)'l2 is seen as the correct normalizing constant, so that
S,/b, 2 S, we could also immediately verify (12), so that s is N ( 0 , I ) , which
5.3 General Limit Laws

agrees with (18).

Thus if bn(p) = nl/p for 0 < p < 2; = ( n l ~ g n ) l for


/~p = 2; or = f i if
p > 2, we see that

where fi = p on [0,2] and fi = 2 when p > 2. A ch.f. of the type on the


right side is clearly the n t h power of a similar ch.f. for each n 1, so that >
it is infinitely divisible. Such ch.f.s define symmetric stable distributions. We
therefore introduce the concept following P. Lkvy.

Definition 1 2 An r.v. X is said t o have a stable d.f. F if, when X I , X 2


are i.i.d. with the d.f. F, then for each a1 > 0, aa > 0 and bi E R,i = 1 , 2 ,
there exist a pair a > 0, b E R, such that (alX1 bl) + +
(azXz ba) and +
a X + b have the same d.f. Equivalently, F ( a l l ( ( . )- bl)) *F(a,'((.) - b2))(x)=
F (ap1(x - b)), x E R, where * denotes convolution.

Stated in terms of ch.f.s, the above becomes: if q5 is the ch.f. of X , then

From this it follows that normal, Cauchy, and degenerate d.f.s are stable.
But there are others. Their structure is quite interesting, though somewhat
intricate, as we shall see. [Some important applications of this class will be
considered in Section 8.4.1

Iterating (20) n times, one gets with a1 . . . , a n E E', an a E R+ and a


b E E such that n

In particular, setting a1 = a2 = . . . = a, = 1, there exists an a; > 0 and a


b' E R such that
(q5(t))" = q5(a',t)eitb', (21)
so that

Since the factor in [ ] in (21') is a ch.f. for each n , we conclude that q5 is


infinitely divisible, and so the stable class is a subset of the infinitely divisible
family. Hence a stable ch.f. never vanishes.

Remark Evidently (21) is derived from the definition of stability. It


says that if X is an r.v. and X I , . . . , X , are i.i.d. with the d.f. of X , then
332 5 Weak Limit Laws

sn = X, and a',X + b' have the same d.f. for each n and for some
a; > 0, b' E R.[Equivalently, (S, - b,)/a, has the same d.f. as X.] However,
the converse of this statement is also true; i.e., if q5 is a ch.f. which satisfies
(21) for each n , then q5 is stable in the sense of Definition 12, or (20). This
follows from the representation (or characterization) below of the class of sta-
ble ch.f.s. In (21), the a, are called norming constants. One says that the
+
d.f.s F, G are of the same type if F ( x ) = G(ax b) for all x E R and some
a > 0, b E R.In words, F, G differ only by the scale and location parameters.
From this point of view, if F is stable, then so is G, where G(x) = F ( a x b) +
for some a > 0, b E R, aiid all x E R. Thus we have stable types of laws.

Regarding a heuristic reason for the normalizing factors used in Example


11 (however, the r.v.s there are not stable, only the limit is), or in (21), may
be compared with the followiiig definitive statement.

Proposition 13 T h e norming constants a, > 0 i n (21) are always of the


f o r m a, = nl/",a! > 0. If, moreover, 4 i s nondegenerate, t h e n a! 5 2.

Proof To simplify the argument, we first reduce it to the symmetric case


and complete the demoiistratioii essentially followiiig Feller. If X I , X2 are i.i.d.
stable r.v.s, then Y = X I X 2 is a symmetric stable r.v aiid the X,, Y have
-

the same norming constants. Indeed, if q5 is the (common) ch.f. of the X, and
$I that of Y, we have by (21)

and $ ( t )> 0. Thus $I and 4 have the same norming constants a, > 0.
Consequently we may (and do) assume that X is a symmetric stable r.v. with
a, > 0 as its norming coilstant for the rest of the proof. Also let X $ 0 (to
avoid trivialities).
As noted in the remark before the statement of the proposition, the
stability hypothesis on X implies that if X I , . . . , X, are i.i.d. as X , then
S, = c r = l X i and a,X + bk are identically distributed for each n , where
a, is the normiiig coilstant aiid n > 1 is arbitrary. Replacing n by m n +
>
(m, n 1 integers), we first note that (S,+, - S,) aiid S, are independent
r.v.s, aiid the stability hypothesis implies the following set of equations, since
(S,+, - S,) and S, are identically distributed:

where X 1 , X " are i.i.d. as X . The symmetry of X implies that b' = 0 in this
representation of the S,. Since S,+, = (Sm+,- S,) +
S,, (22) yields
5 . 3 General Limit Laws 333

In terms of ch.f.s, this becomes

since XI, X" are i.i.d. aiid have the same ch.f. as X . Replacing t by tla, and
setting m = n in (24), we obtain

Since t E R is arbitrary, we see (why?) from (25) that aa, = aaa,. In a similar
manner considering r blocks of n terms each for S,,, we get a,, = a,a,, r >
>
1 , n 1. If now n = r P , SO that a , r + ~= a,a,k, k = 1 , 2 , . . . , p , multiply them
to get a, = (aT)P. We next obtain a few other arithmetical properties of
{a,, n >1). Since the result is true and trivial in the degenerate case for 4
(any a, works), we exclude this case in the following argument.
The sequence {a,, n >
1) is monotone increasing and tends to infinity.
Indeed, let u > 0 be arbitrarily fixed. Consider with (23) and the symmetry
of the r.v. X .

P[am+,X > a,u] = P[a,Xf + anX1' > amu]


> P[a,Xf > a,u, a,X1' > 01
= P[a,X1 > a,u]P[a,X" > 0] (by independence)
1
> -PIX1 > u]
2
(by symmetry of the d.f. of X").

Thus
1 1
P [ X > (am/am+,)u] > -PIX1
2
> u] = - P [ X
2
> u].
(26)
Now the right side is a fixed positive constant for a suitable u > 0. If a,/am+,
is not bounded as m + oo, and n + cm, then the left side goes to zero,
contradicting the inequality of (26). Hence

In particular, if rn = rk and n = (r + 1)" rn, where r > 0 is fixed, we get


for large rn, n ,

by the preceding paragraph. Letting k + cm, this implies a,/a,+l 1 so that <
>
{a,, r 1) is monotone.
Next we assert that a, = r P , for some C,? > 0, so that a, tends to infinity
>
aiid proves the main part. In fact, if k , p 1 are integers aiid q > 1,then we
can find a unique integer j >
0 such that
334 5 Weak Limit Laws

This implies a, > 1 and then on taking "logs" for these inequalities and
dividing, one gets (all ak = 1 + X = 0 a.e. by (21), so that ak # 1):

j logp
< - log k 5 +
( j 1)log p
(j+l)loga, logak jloga, '

This is independent of q, and so by taking q large, we can make j large, so


that the extremes can be made arbitrarily close to each other. This implies
the middle ratio is independent of k. Letting (log k)/ logak = a > 0, we get
ak = kl/" aiid p = l l a . It only remains to show that if a > 2, then 4 is
degenerate.
Again consider (21) for I q5(.)I. Since q5 is infinitely divisible, $(t) I is never
zero. Thus
n10gq5(t)=loglq5(ant)l, ~ E R .
-1 .
But we just proved that a, = nl/". Hence replacing t by 7/12" ln the above
we get
I
1% 4 ( ~ )=l nlog q5(7/n0 )I,
and therefore log 1 q5(1/na ) I = O ( l / n ) . Consequently, n2/" log I q5(l/n1/")l =
-1

o(n('/")-l), and this tends to zero as n + oo if a > 2. It means that $(t)I =


+
1 o(t2) as t + 0. This implies that the r.v. X with q5 as its ch.f. has finite
second moment, aiid then it is zero, so that the r.v. is a constant a.e. To see
this, consider Y = X X, where x is i.i.d. as X. Thus Y has two moineiits
-

iff X has the same property (cf. Problem 33 of Chapter 4), and its ch.f. is
$I2 > 0. If F is the d.f. of Y, then
1 - cos tx
kX2d~(x)=2kf$ t2 dF(x)
1 - cos tx
5 2liminfk t2 dF(x) (by Fatou's lemma)
t+O

Here C is a constant satisfying [log I 4(t) 12] /t2 5 C as t + 0 +- 4 ( t )l 2 >


+
ePct2. But $(t)l = 1 o ( t 2 ) , t + 0. Thus the second moment of Y must
vanish. The proof is finished.

Remark If a = 2, the above computation implies that log Iq5(t) = O(t2)


as t + 0, so that $"(O) exists and X has two moments. Then (21') shows
that q5 is the ch.f. of (S, - n b f ) / f i for all n >
1, where S, = X k ,XI,
are i.i.d. as X , and b = E ( X ) . The classical central limit law (Theorem 1.2)
shows that 4 must be a normal ch.f. We leave the formal statement t o the
reader (see Problem 23.)
5.3 General Limit Laws 335

The preceding result has another consequence. We also term the constant
a! > 0 of the above proposition the characteristic exponent of (the d.f. of) X .
Corollary 14 If X is a nondegenerate symmetric stable r.v. with a char-
acteristic exponent n > 0, and X 1 , X 2 are i.i.d. as X , then for any positive
numbers a, b we have

Proof If X , X I , X 2 are as in the statement, aiid 4 is the ch.f. of X , then


by (24)

for any positive integers m, n. Replacing rn, n by np aiid rnq(p, q > 1 integers)
and t by t/(nq)'la in (30), we get

Hence (29) is true if a = p/q, b = rnln, i.e., all positive rationals. If a , b > 0
are real, then they can be approximated by sequences of rationals for which
(31) holds. Since 4 is continuous, (31) implies

This is (29), and the result follows.

The interest in stable laws is enhanced by the fact that they are the only
laws that can arise as limit distributions of the normalized sums of i.i.d. se-
quences of r.v.s as originally noted by P. Lkvy. That is how he first introduced
this concept. Let us present this result precisely.

Proposition 15 (Lkvy) Let {X,, n > 1) be 2.i.d. and S, = x i = 1 Xk.


D - .
Then for some constants A, > 0, B, E R, (l/A,)S, - B, + S tff s is a stable
r. v.

Proof That every stable law is a limit of the described type is immediate
from definition. In fact, by the remark following Definition 12, if X is a stable
r.v. aiid X I , . . . ,X, are i.i.d. as X , then S, = x:=, Xi
-
a,X b,,a, > +
D
0, so that (l/a,)S, b, = X , where b, = b,/a,. Thus only the converse
-

is nontrivial. The true and trivial case that S is degenerate will again be
eliminated in the rest of the proof.
D -
Suppose then (l/An)Sn- B, + S, as given. Hence every convergent subse-
quence on the left has the same limit 3. Consider the following blocks of i.i.d.
kn
sequences: SI, = E:jl Xi, 5'2.1 = E;Zn+l Xi1 . 1 S k n = E i = ( i - l ) n + l xi.BY
hypothesis, for each k >
1,
336 5 Weak Limit Laws

and {gk,k > 1) are i.i.d. Let k be arbitrarily fixed. Consider

D
as n + cm, since Yl, + s by hypothesis and {kn, n >
1) is a cofinal subse-
quence of the integers. (This is immediate if we use the image laws and go
t o another probability space on which the corresponding sequence converges
a.e., as in the second proof of Theorem 4.1.2.) From (32) and the definition
of Yk, in (33), one gets

D
But (33) and (34) imply that Ykn + S and ak,Ykn bk, + 2
?. Since S
is iioiidegeiierate by assumption, we can apply the result of Problem 22 in
Chapter 4 (and the reader should verify it now, if it was not already done),
ak, + ak > 0, bkn + bk, SO that (34) implies

Since k > 1 is arbitrary, this implies s is stable, which ends the proof.
The preceding work heightens interest in the stable laws and it is thus
natural t o study aiid determine this subset of the illfinitely divisible class.
Such a characterization has been obtained again jointly by A. Khiiitchiiie aiid
P. Lkvy in 1938, and we present it now with a somewhat simpler proof. The
original one depended on the canonical representation of infinitely divisible
laws given in Theorem 2.5.

Theorem 16 Let 4 : R +C be a mapping. Then q5 is a stable ch.f. iff it


admits the representation,

where y E R, -1 < /3 < 1, c > 0 , 0 < a! < 2, and


taii(xa/2) if a f l
m ( t ,a ) =
-(2/x) log tl if a = 1.
5.3 General Limit Laws 337

Remark The constant a! > 0 here will be seen t o be the same as that of
Proposition 13, and hence it is just the characteristic exponent of 4. The case
that a! > 2 of Proposition 13 corresponds t o c = 0 here. Also a = 2 gives the
normal, and n = 1, p = 0 , r = 0 gives the Cauchy ch.f.s. If 4 is liondegenerate
(thus c > O ) , then 14 is Lebesgue integrable on R, aiid so every nondegener-
ate stable d.f. is absolutely continuous with a continuous density (by Theorem
4.2.1). However, an explicit calculation of most of these densities is not sim-
ple. [Some asymptotic expansions for such densities have been presented in
1954 by A.V. Skorokhod.] We derive the representation (35), but the complete
proof will be given only for the symmetric stable case and comment on the
omitted part (shifted t o the problem section, see Exercises 24 and 25). The
argument uses some number theoretical techniques of great value.

Proof Let 4 be a stable ch.f. in the sense of Definition 12. Then (21) holds
by Proposition 13, with a, = nilo for some n > 0, so that, setting 6 = l / n
for coiivenience, we have

Since 4 is also infinitely divisible by (211), it never vanishes by Proposition


2.2a. (This elementary property is the only one taken from the theory of
Section 2.) By the remark following Proposition 13, a = 2 implies $ is normal
and (35) is true. We exclude this and the (true and) trivial degenerate cases
from the following discussion. So assume that 0 < a! < 2 (or S > Hence i).
by Proposition 4.2.9,

where g(0) = 0 and g ( . ) is continuous on R. Let h(t) = $(t)I. Then (36)


implies
(h(t)), = h(n6t), n 1, t E R. > (38)
Since (4 aiid so) h is continuous, we deduce from (38) for integers rn,n > 1,
h(n6) = (h(1))" and h(m6) = h ( n 6 ( m / n ) 6 )= h ( ( m / r ~ ) ~ ) ) , ,

so that
h((m/n)" = (h(m")'ln = (h(l))"ln.
By the continuity of h, we get h ( t 7 = ( h ( l ) ) t for t > 0. Replacing t by t q n
the above, one obtains

where c = -log h ( l ) > 0. Clearly (39) is true for t = 0 also, and then for all
t € R.
Let us next consider g ( . ) . From (36) and (37) (considering the imaginary
parts), one has
338 5 Weak Limit Laws

n g ( t ) = g(n6t) tb,. + (40)


Hence replacing n by mn gives

m n g ( t ) = g(m6n6t) tb,, +
= mg(n%) b,n% - + tb,, [by ( 4 0 ) ]
= m ( n g ( t ) tb,)
- - b,n% + tb,,.
Rewriting this, one has (set t = 1 )

b, = mb, + n6 b, = nb, + m6bn (because b, = b,,).

Thus
b,(n - n" = bn(m - m", n, m > 1 integers. (41)
If now a # 1 , so that 6 # 1 , we get a solution of (411, for some a0 E R : b, =
a o ( n - n 6 ) . Setting f ( t )= g ( t ) - a o t , then with ( 4 0 ) one has

nf ( t )= f ( n 6 t ) , t E R, n > I. (42)

For this functional equation we can apply the same argument as in ( 3 8 ) . Thus

f ( n 6 )= nf ( 1 ) and f ( m 6 )= f ( n 6 ( m / n ) 6=)nf ( ( m / n ) 6 ) .

Since f (.) is continuous, this gives

Next replacing t by t o , one gets (because 6 = lla)


f ( t )= f ( 1 ) . t" = g ( t ) - aot. (43)
Substituting ( 3 9 ) and ( 4 3 ) in ( 3 7 ) , we have for a! # 1,

Log 4 ( t ) = c l t o + iaot + if ( l ) t o (t > 0 )


= iaot - ct"(1 - iPo) (t > 0 ) (44)

where Po = f ( l ) / c . Since q5-t) = m,


the result for t < 0 also is obtained
from the above. This becomes an identity for t = 0 . Hence ( 4 4 ) yields

where 0 = P o / P ifPo # 0 , = 0 , ifPo = 0 , with -1 P 1. Now ( 4 5 ) is the< <


same as ( 3 5 ) if we can identify 0 as ta11(7~a/2).This is discussed further later
on.
5.3 General Limit Laws 339

Let us consider the case a = 1 [so that S = 1 in (4111. There is no problem


+
for (39). Now (40) becomes n g ( t ) = g ( n t ) tb,. From this we deduce, on
replacing t by m t and eliminating b, between these equations,

ng (mt)= g ( n m t ) + mtb,
= g ( n m t ) + rn [ n g ( t )
- g ( n t ) ]. (46)
To solve the functional equation (461, let w ( u ) = e P t L g ( e Uu) ,E R.Then if
>
u = log t , a, = log n , n 1, t > 0 , we get from (46)

For each fixed rn, let v ( r ) = w ( r a,) +


w ( r ) . Then (47) implies that v ( r
- +
>
a,) = v ( r ) ,n 1 and r E R.This periodicity behavior in turn gives for any
integers p, q, m, n with m 1, n > 1 >

Choose mo > 1, no> 1 such that a,,/an, is irrational. [This is possible.


Indeed let mo, no be relatively prime. For the continuous function f : x H
f ( z ) = n ; , f ( x ) = rno has a solution zo (by the intermediate value theorem)
which is irrational.] Then the set {pa,, +
qa,, : p, q all integers) is dense in
R.Hence (48) implies v ( u ) = v(O),by continuity of v ( . ) . This means in terms
of w (.) .

W(U + a,) - w ( u ) = w(a,) - w(O), u E R, rn > 1.


Let ( ( u ) = w ( u ) - w ( 0 ) . Then the above equation becomes

Replacing a , by pa, + qa, in (49), we get

If rn, n are replaced by rno,no, then, using the density argument, we can
deduce that
+ +
( ( u r ) = ( ( u ) ( ( r ) , U , 7- E R. (50)
340 5 Weak Limit Laws

But ((.) is continuous, and thus (50) is the classical Cauchy functional equa-
+
tion. So ((u) = aou, or w(u) = aou bo for some constants ao, bo. Hence for
t > 0, and u = logt,

g(t) = g(eU)= W ( U ) . elL= (a0 logt + bo)t. (51)


Substituting (39) aiid (51) in (37) with a = 1 [and using q5(-t) = m]gives
Log 4 ( t ) = ibot - c tl{l + i(ao/c) sgn t . log I t } , t E R. (52)
If p = -7rao/2c, then (52) is of the form (35), provided that we show I PI 1. <
In other words, (45) and (52) together give only the form of (35). To show that
the expressioiis for 4 given by (45) aiid (52) do in fact determine ch.f.s, we have
to verify that they are positive definite to use Bochner's theorem. This is quite
difficult. There is the followiiig alternative (detailed, but elementary) method
for 0 < a < 2, a # 1, i.e., for (45). Since q5 is integrable one considers its
inverse Fourier transform and shows that it determines a positive (continuous)
integrable function (=density of a d.f.) iff 6' = t a n ( ~ a l 2 )The
. annoying case
a = 1 needs a separate proof to show that I a o / c <
2/7r. Here we omit this
(nontrivial) work. (But a detailed sketch of the argument for 0 < a 2, a # 1, <
is given as Problems 24 and 25.)
If q5 is the ch.f. of a symmetric stable law, then 4 is real and the above
(unproved) case disappears, and one gets

which proves (35). Note that the a in (35) is the same constant as that in
Proposition 13. For a > 2 , 4 is a ch.f. only if c = 0.
Conversely, if q5 is given by (35), in the symmetric case it reduces to (53).
The latter is a ch.f. by P6lya's criterion (cf. Problem 25 of Chapter 4). To see
that 4 is then a stable ch.f., it suffices to verify the relation (20)(i.e., Definition
12). Thus for a1 > 0,az > 0 we see that

+
where a = (a? a$)'/'. Hence a function q5 defined by (53) is always a sym-
metric stable ch.f. Actually if we first verify that 4 given by (35) is a ch.f.
(indicated in the problem), then the above simple argument implies that it is
a stable ch.f. (Of course, the "if" here iiivolves a nontrivial amount of work.)
This finishes the proof.

Remark Let X be a symmetric stable r.v. so that its ch.f. q5x,,(.) is given
by (53). Then Schilder (1970) has observed that I . 11, : X + c ~ l ' " ~0~<,
<
a 2, defines a metric on the class of independent symmetric r.v.s in L m ( P ) .
Thus I X 1, = log ~ x , l c ll A m P ~ ( I ) , and with a small computation it is also seen
that for independent r.v.s X I , X2 with the same characteristic exponent a,
5.4 Invariance Principles 341

+ +
1x1 X z l m = I X I I , IX21,. This statement is also true if X 1 , X 2 and
+
X I X2 are all symmetric stable with the same a! > 0, but here the joint (or
multivariate) stability concept is needed. (See Problem 25 (c) on this notion.)

The simple argument concerning the representation (35) presented above


follows essentially that of Ramaswamy et al, (1976). A related treatment by
S. Bochner is given as Problems 26 and 27. An example of (35) with a =
i, /3 = 1 , =~0, c = 1, due independently t o P. Lkvy and N.V. Smirnov, is as
follows:

Several striking properties of stable laws are known. An excellent account


of these inay be found in the monograph of Giiedeiiko aiid Kolmogorov (1954).
For more recent accouiits, one inay see Zolotarev (1986) and Samorodiiitsky
aiid Taqqu (1994). We give them no further coiisideratioiis here, but strongly
urge the reader t o review the material carefully. The most interesting aspect
of this subject here is that the whole analysis depends only on the structure of
the real line R aiid the key coiicept of (statistical) independence. Replacing R
by an algebra of matrices defined on a (Hilbert) space, aiid introducing a new
concept called "free independence" on the new space relative t o an expectation
like functional (a trace operation on matrices), it is possible t o extend most
of the above analysis t o this new setting. This is being done by D. Voiculescu
(see his recent CRM monograph, 1992). All the preceding work is necessary t o
understand this exteiisioii which has theoretical consequences. The matrix al-
gebra actually goes over t o C*-algebras aiid von Neumaiiii algebras! We briefly
consider an application of the (classical) stable class in Section 8.4 and will
see how an important and very interesting new chapter of the subject emerges.

5.4 Invariance Principles


Let us take another look at the classical central limit theorem. If (X,, n 1)>
is an i.i.d. sequence of r.v.s with means zero and unit variances, and S, =
C;=, X k , then Theorem 1.2 says that (Sn/,";I) 3 Y, where Y is N ( 0 , l ) .
From some early (1931) results of A. Kolmogorov, aiid of P. Erdos aiid A/I.
Kac in the middle 1940s, it is possible t o look at the problem in the following
< <
novel way. If I = {t : 0 t 11, then define a mapping Y,(., .) : I x L? + R
by the equation

where [nt]is the integral part of nt, so that for t = 1, Y,(l, w) = S,(w)/,";I, w E
L?. Thus if we set So = O,Y,(.,w) is a polygonal path, joining (0,O) and
342 5 Weak Limit Laws

(t, Y,(t, w)); and hence for each n >


1, and w E R,the curve Y,(., w)
starts at 0 and is continuous on I. The central limit theorem, slightly ex-
D
tended (to be discussed later), shows iiot only that Yn(l, .) + Y(1, .), which
is N ( 0 , I ) , but that Y,(t, .) 2 Y(t, .) andY(t, .) is N(0, t). Moreover, for
O < t1 < t, < I , Y (1,.) Y (t2,.) aiid Y (t2, .) Y ( t l , .) are independent,
- -

N(O,1 t 2 ) aiid N(O, t 2 t l ) , respectively. This led M.D. Donsker t o look at


- -

Z,(.) = Y,(., .) : L? + C[O, 11, the space of real continuous functions as the
range space of {Z,, n > I ) , and if p, = P o 2;' is the image law, then t o
investigate the convergence of p, as well as t o determine the limit. Thus it is
desired t o show, in general, that, under reasonable conditions, one can assert

C ( S ) being the space of scalar coiitinuous fuiictioiis on a metric space S . Here


S = C[O, 11. This is equivalent t o saying that p,(A) + p(A) for all Bore1
sets A c S such that the boundary a A satisfies p(aA) = 0 (essentially the
same proof of Theorem 4.1.5 given there for S = R.) In 1951 Donsker was
able t o establish this result for the space S = C[O, I] aiid identify p as the
Wiener measure on S . Of course, this includes the Lindeberg-Lkvy theorem,
aiid opened up a whole new area of research in probability theory. These ideas
have been extended and perfected by Prokhorov (1956), and there has been
much research activity thereafter.
Since pn is the image measure of the 2, in C[O, I], aiid pn + p
in the above sense, one can coiisider the corresponding theorems if S is
taken t o be other interesting metric spaces. This new development is called
the weak convergence of probability measures in metric spaces. The work
is still being pursued in the current research. The second possibility is
D
t o note that 2, + Y [to mean that for each O < t l < . . . < t k <
1,(Z,(tl, .), . . . , Zn(tk,.)) 2 (Y(t1, .), . . . , Y ( t k ,.)) is equivalent t o showing
D
h(Z,) + h(Y) for each bounded continuous mapping h : C[0,1] + C[O, 11
aiid calculating the distributions of h(Y) for several interesting hs]. But this
is iiot siinple in general. However, p, is determined by the distribution of 2,
or of Y,(., .) aiid this in turn is determined by the i.i.d. sequence {X,, n I}. >
The classical limit theorem says (cf. Theorem 1.2) that for all distributions
satisfying these moment conditions the limit d.f. remains the same. Hence in
the general case the measure determined by the finite dimensional d.f.s of Y
is the Wiener measure for all the initial {X,, n > 1)-measures. Thus choose
some siinple aiid coiiveiiieiit d.f. for the X,-sequence, calculate the d.f. of
h(Y, (., .)), and then find its limit by letting n + oo. This will give the d.f. of
h(Y). The underlying idea is then called the invariance principle. Since it is
based on weak convergence it is sometimes also referred t o as weak invariance
principle or a functional central limit theorem. In other cases as in the first
SLLN (cf. Theorem 2.3.4 or 2.3.6) the convergence of the averages is a.e., aiid
the corresponding ideas lead t o a "strong" invariance principle. We present a
5.4 Invariance Principles 343

few of the results of Donsker and Prokhorov in this section because of their
importance and great interest in applications.
The preceding discussion clearly implies that we need t o consider new
technical problems before establishing any general results. The first one is the
definition and existence of Wiener measure aiid process, which can be stated
as follows. (From now on an r.v. X ( t , .) is also written as X t , for convenience.)

< <
Definition 1 An indexed family {Xt, 0 t 1) of r.v.s on a probability
space ( R , C, P) is called a Brownian motion (or a Wiener process) if each Xt
<
is a Gaussian (or normal) r.v. N(O, a 2 t ) aiid for each O t l < t2 < . . . < t n <
>
I , n 1, the r.v.s Xt,, Xt, Xt,, . . . , XtrL Xt ,,-, are mutually iiidepeiideiit
- -

with E(I Xt, - Xt,-, 1') = a2(tz.- tZP1),where Xo = 0 a.e. Thus

(Here the index [0, 11 is taken only for convenience. The concept holds if the
index is any subset of R', or even R with simple modifications.)

It is not obvious that such a process exists. Indeed, if Ftl,,,,,tn is the joint
>
d.f. of X t l , . . . , Xt,, , then from (3) we can immediately note that IFtl,...,t,, , n
1) is a compatible family. In terms of ch.f.s this is immediate, since

,...,t,, ..
( ~ 1 , . 1 ~ n )

= E(exp{iulXt, + . . . + iunXt,,})

aiid the compatibility conditions on the F become [cf. Eqs. (3.4.2), (3.4.3)]

(ii) h,,, , t , , , ( ~ z l , . . . , ~ z , , ) = 4 t l ,,t,,(~l,...,~n),[(il,...,in)+(1,2,...,n)].


Since &(u) = exp{-$u2a2t), this is clearly true for the d.f.s given by (3).
Hence by Theorem 3.4.11, it follows that there exists a probability space
(0,E, P) and a process {Xt, t E [O,1]}on it with the given finite-dimensional
distributions (3). In fact 0 = R [ O ' ~ ] , C = the 0-algebra generated by the
344 5 Weak Limit Laws

cylinder sets of R , and if w E R , then Xt(w) = w(t), the coordinate function.


However, we need t o know more precisely the range of the r.v.s. In other words,
since for each w, X(.)(w) = w(.) is a t-function (called the sample function),
what is the subspace of these w that satisfy (3)? For instance, (3) implies for
each 0 < s < t < 1,Xt X, is N(0, a 2 ( t s ) ) , so that we caii find all its
- -

moments. In particular, taking a2 = 1 for convenience,

However, a classical result of Kolmogorov's asserts that any process Xt for


which (KObeing a constant)

E(I Xt - X,") 5 KOIt - sl+" 6 > 0, a > 0, t, s E R, (5)


must necessarily have almost all its sample functions continuous. In other
words, the P-measure is not supported on all of R [ O > ~ ] , but it concentrates
(except for a set of P-measure zero) on the subset C[O,11 c I W [ ~ >On ~ ] . the
other hand, for the probability space (a, C, P) constructed above, the only
measurable sets (of C) are those which are determined by at most countably
many points. This implies that C[O, 11 @ C , but it has P-outer measure one
and P-inner measure zero (this needs a computation). Consequently one has
to extend P to P and expand C to 2,so that the new a-algebra is determined
by all {w : Xt(w) < u ) , t E [O,l],wE C [ O , l ] , uE R,and

The right side is given by (3). Fortunately this is all possible, aiid iiot too diffi-
cult. We omit the proof here, since it is iiot essential for this discussion. (It may
be found, for instance in the first author's (1979) monograph, pp. 186-191.)
One then notes from the work in real analysis, that because C[O, 11 is a sep-
arable metric space [under the sup norm as metric, I XI = sup,5tll X ( t ) ] ,
its Borel 0-algebra (i.e., the one determined by the open sets) aiid C are the
same and that any finite measure on such a Borel 0-algebra is automatically
regular, [i.e., P ( A ) = sup{P(K) : K c A, compact ), A E C]. This regular
probability measure P on the Borel a-algebra B of C[O,11, is called the Wiener
measure, and is also denoted W(.). Thus (C[O,11,B, W ) is the Wiener space
and {Xt, 0 5 t 5 1) can be regarded as a process on ( R , C, P) with its sample
fuiictioiis in C[O,I]. It is the Wiener (or Brownian motion) process.
There are other ways of constructing this process. We establish one such
construction in the last chapter and present some deep results. N. Wiener
was the first in 1923 to demonstrate rigorously the existence of this process
(hence the name Wiener process), even though R. Brown, an English botanist,
observed the process experimentally, i.e., the erratic behavior of its sample
paths (or functions), as early as 1826 (hence Brownian motion). Now it caii
be approximated by a random walk; and other methods, such as Wiener's
5.4 Invariance Principles 345

original construction, are available. However, there is no really very "simple"


proof, and the method outlined above seems t o be the "bottom line."
For further work, it is useful t o have the Skorokhod mapping theorem in
its general form. We had given a special case of it as Problem 5b in Chapter
2, aiid utilized it in deriving alternative (and simpler) proofs in Chapter 4
(e.g., a second proof of the Helly-Bray theorem). Since C[O, 11 is not finite
dimensional, our special case extends only by a nontrivial argument, which
we present here for a separable metric space. (Cf., also Problem 5(c)-(d) of
Chapter 2 t o understand its place here as a coilsequelice of earlier ideas.)

P r o p o s i t i o n 2 (Skorokhod) Let S be a complete separable metric space


with B as its Bore1 a-algebra. If P,, P are probability measures on B such that
Pn + P, in the sense that Pn(A) + P ( A ) for all A E B with P ( 8 A ) = 0
(such sets are also termed P-continuity s e t s ) , then there exist r.v.s X,, X
on the Lebesgue unit internal [0, I ] with values in S [so that X;l(B), X p l ( B )
are Lebesgue measurable classes of sets] such that X, + X a.e. [p], and

p being the Lebesgue measure. Thus Pn,P are the distributions of X n , X in S.

Proof Let p denote the metric fuiictioii on S . We coiistruct measurable


P-continuous partitions of S aiid analogous partitions of [0, 1) having equal
Lebesgue measure. Then define countably valued r.v.s on [0, 1) into the above
partitions of S and show that by the refinement order these converge t o the
desired r.v.s relative t o Pn- and P-measures. Here are the details. (see also
Billingsley (1995), 11.333.)
For each integer k > 1, let { B ~ } ,be~ balls ~ of diameter less than 2T"
which cover S and such that P(~B;)= 0. Since P is a finite measure, there
are only countably many balls whose boundaries can have positive measure, so
that by changing the radius slightly the above can be achieved. Since for any
balls El, B 2 ,d(Bl n B2) c d ( B l ) u d ( B 2 ) , we may disjunctify the above B:
aiid still retain them as P-continuous sets. Thus let A t = B?,A$ = B!j Elfl -

aiid Ak = B; - U?Z; B:, aiid let us continue this procedure for each k 1. >
If we let S,, , , =,, A:, , then for each k-tuple of integers (il, . . . , i i ) , the
collection {S,,,,,,,,,, k > I} are disjoint P-continuity sets such that for each
k , Uik21Silr,,,,iL = S Z ~ , . the . . ,diameters
~ ~ ~ ~ satisfy
, diam (S,,,,..,,,) I 2-IC and
together they cover S . (Verification is left t o the reader.)
Next obtain for the interval 10, 1) the corresponding decoinpositioiis such
,,
that Ii,,..., i, and I: ,,,,, are chosen t o satisfy p(Ii ,,..., i,) = P(S2 , i k ) and
p ( I z ,...,i,) = P, (S,,,,,,,i , ) , if ( i l l . . . , ik) 4 (ii , ik, . . . , i i ) (lexicographic or-
,;,
der), take I,,,,,,,,, t o the left of I,;,,,,, Here we use the order property of the
real line. Similarly we order I: ,,,.,ik t o the left of I; ,...,i; and both of these
cover the unit interval. With such a decomposition we construct the desired
r.v.s as follows: Choose a point xi ,,..., i, E Sil,...,i, and set
5 Weak Limit Laws

(Omit the empty I's.) This defines x,"and ~ " 1 1 [O, 1) as r.v.s into S, since
Sil,,,,,i,
E I3 for each k >
1. Furthermore, p(x,"(w),x,"+'(w)) 2-"', <
1 1, >
and the same is true of x ~ w ) Thus
. for each n , these are Cauchy sequences,
and by the coinpleteiiess of S, there exist mappings X, and X such that
X: (w) + X, (w), ~ " w ) + X(w), as k + oo,for all but at most a count-
able set of w that are boundary points of these I-sets. Heiice defining them
arbitrarily at these points (which have p-measure zero) we see that X,, X are
measurable mappings on [0, 1) into S, i.e., r.v.s. Also,

by hypothesis, since Sil,,,.,i, are P-continuity sets. Thus for each w E I,,,,..,i, ,
>
an no(w, k) can be found such that n no(w, k) + X: (w) = xk(w) , and so

Thus Xn(w) + X(w) for each w E [ O , l ) . It remains to establish (6).


Let A E B, and let U ' S ,,,,, ,, be the union of all S,,,, ,, that meet with
A. If AE is the €-neighborhood of A [i.e., AE = {y : p(y, A) < €11,then by
definition taking E > 2-'+I, we see that (J'S,,,,,,,,, c AE and that as E J, 0
we get 2 ' \ 2 (the overbar denotes closure). Consequently

If now A is a closed set, then on letting k + cc and then E J, 0, we get

lim sup p o (x"-' (A) < P(A). (8)


k

This implies p o (xk)-'+ p o X p l = P by the analog of Theorem 4.1.5,


which holds here with essentially no change in its proof. Heiice p o X p l = P.
Replacing X by X n and xkby x,"in the above, the result implies p o X i 1 =
P,. [In the above-noted theorem we constructed a function in showing that
<
P, + P L,P,(c)P ( C ) . The following function g may be used for the
same purpose: g(u) = f ((l/e)p(u,C ) ) , where f (u) = 1 if u >
0: = 1 u if -

< < >


0 u 1: = 0 if u 1. Then g = 0 outside of C", and

lim sup P, (C) < g d P = P(C") < P ( C ) + E.]


n
5.4 Invariance Principles

This completes the proof.

We have the following interesting consequence, t o be used below

Corollary 3 Let (S,B) and (3,g) be two metric spaces as in the propo-
sition. If P,, P are probabilities o n B such that P, + P in the sense of the
proposition, and i f f : S + s i s a (B, @-measurable mapping such that the
discontinuity points Df E B satisfy P ( D f ) = 0, t h e n P, o f -' + P o f -'. I n
particular, if ((R, C , Q ) i s a probability space, Y,, Y are r.v.s from (R into S
D
such that Y, + Y (i.e., Q o Y'; + Q o Y - l ) , then f (Y,) 2 f (Y) for each
continuous f : S + 3 .

Proof By the above proposition, there exist r.v.s, X,, X on the Lebesgue
unit interval ([O, I ) , L, p ) into (S,B) such that X, + X a.e., and P, =
p o x;', P = p o X-l. Also, f (X,) (w) + f ( X )( w ) for all w for which f is
coiitiiiuous at X(w). The set of discontiiiuities of f is contained in x - ' ( D ~ ) ,
which is p-null. Hence p o f (x,)~' + p 0 f (X)-', or equivalently

Setting P, = Q o Y,-', P = Q o Y-' in the above, and since P, o f-' =


Q o f (Y)-l, P 0 f -'= Q 0 f-'(Y)-l, the main result implies the last part.
This finishes the proof.

Note that both the above proposition and corollary reduce t o what we
have seen before if S = R. Moreover, the calculus of "in probability" results
of Mann and Wald given as Problems 9-11 in Chapter 2 extend t o the case
considered here.
We are ready t o prove Donsker's (1951) invariance principle discussed ear-
lier in this section, aiid it still needs many details, t o be given here.

>
Theorem 4 Let {X,, n 1) be a sequence of i.i.d. random variables o n
( 0 ,E, P) with zero means and variances a2 > 0. If Y, i s defined by (1) as a
polygonal function o n fl + C[O, 11, which i s a random element with distribu-
tion P,(= PoY;l) o n the Bore1 sets of the separable metric space C[O, I ] (with
uniform n o r m as its metric), then P, + W, where W i s the W i e n e r measure.
Equivalently, Y, 2 Z, where {Z(t, .), t E [O, I]} i s the W i e n e r o r Brownian
m o t i o n process with Z ( t , .) as N ( 0 , a2t).Hence for each h : C[O, 11 + C[O, 11
which is measurable and whose discontinuities form a set of W i e n e r measure
zero, we have h(Y,) 2 h ( Z ) o r P, o h t l + W o h p l .

Proof By definition of Y, in ( I ) , the central limit theorem implies


Y,(l, .) 2 Y(1, .) where {Y(t, .), 0 5 t 5 1) is the Wiener process in which
D
we may aiid do take o2 = 1 for convenience. Note that Y,(t, .) + Y(t, .) also,
348 5 Weak Limit Laws

as n + oo. Indeed, for E > 0,

1
< n€2
- +0 as n + cx ( ~ e b ~ ~ einequality).
v's (9)
On the other hand, since [nt]/n + t as n + GO, from L6vy7s central limit
theorem we get with $(u) = ~ ( e ~ ~ ~ l ) ,

so that (l/fi)S[ntl2 Y(t, .). But the mapping h : ( u l , . . . , uk) H ( u l , u2-Ul7


. . . , u ukpl) is a homeomorphism of IW" aiid thus if P, is the probability
-

<
measure of (I/&) (S[,tll,. . . , S[,t,I ) , 0 t l < . . . < tk <
1, then it converges
(weakly or distributionally) t o W(.) iff Pn o h-' + W o h-' for each fixed
k. Taking k = 2,O < t l < t2 < 1, the general case being similar, we get by
independelice of S[ntllaiid Slntz1S[nt,l, -

Thus the finite-dimensional distributions of {Y,(t, .), O t <


1) converge at
all points t o the corresponding ones of the Brownian motion process {Y (t),0 <
<
t 1) by (9) and (10). Using this we shall show that the probability measure
Pn induced by Yn on C[O, 11 converges to the Wiener measure W on C[O,11. For
this, we need to establish a stronger assertion: for each Borel set A satisfying
W(BA) = 0, one has P,(A) + W(A). This is the crux of the result aiid we
present it in two parts, as steps for clarity.
I. Since the Borel a-algebra B is also generated by the (open) balls of
C[O, 11, it is sufficient to prove the convergence result for sets A which are
finite intersections of such balls satisfying W(BA) = O because sets of the
latter kind again generate B. Now let f l , . . . , f, be elements of C[O, 11 aiid A
be the set which is the intersection of m-balls centered at these points and
radii rl, . . . , r, having the boundary of W-measure zero. Thus A is of the
form

>
Let k 1 be fixed and t = j / 2 k , j = 0 , 1 , . . . , 2k, a dyadic rational. Replacing
t by these numbers in the definition of A, the new set Ak (say) approximates
A from above as k + GO.Given 6 > 0, we can find a k >
1 such that [we
assume W(8A) = 01, W(Ak) - W(A) < 6. But by the first paragraph, (the
5.4 Invariance Principles 349

multi-dimensional central limit theorem) if B is a k-cylinder with B as its


base (so .irl;l(B) = B and B is spelled out below in P,[B])

liin P, [B]= nliin Pn[ ( Y ( t l ) ,. . . , Y(tk)) E B] = w(B),


n i m i m

where the base B is a k-dimensional Bore1 set and w ( ~ B=)0. In particular,


taking B = Ak, one has limn,, P n ( A k ) = W ( A k ) , and since P,(Ak) >
P n (A), we get
lim sup Pn(A) < lim sup Pn(Ak ) = W (Ak) 5 W(A) + 6. (11)
n n

It follows from the arbitrariness of S > 0 that G,,,P,(A) W(A). Thus <
the next step is to establish the opposite inequality, liminf, P,(A) W(A), >
which needs more work than that of establishing (11).
11. Let E > 0, r] > 0 be given. Choose n > 0 such that if
-
H
C
Y = {g : f (t) + < g(t) <f (t) - a, 0 < t < 1))
which increases to A as a J 0, then W(H,) > W(A) -E. One can approximate
H, from above if t is replaced by the rationals as in the last paragraph. If
t = i / k , i = 0 , 1 , . . . , k, and

+
then W(A) < W ( H m ) E < w ( H ~ )E for each k + >
1. But f,fx
E C[0,1].
Hence they are uniformly continuous. Thus - we can
- find k > 1 such that if
I t s < I l k , then if(t) f(s) < n/3, if(t) f ( s ) < n/3. In particular, if
- -

n > k, and Cn c C[O,I] is the set of functions which are piecewise linear on
[(i l ) / n , i l n ] ,i = I , . . . , n , then by the uniform contiiiuity of these functions
-

(and by the Weierstrass approximation theorem) we see that P,(Cn) = 1 for


each n (every lln-neighborhood of C, in C[O,11). Next let
-
F, = {g E C[O, 11 : g(i/n) < f ( i / n ) + a / 3 or g(i/n) >f (i/n) - a/3
< <
for some 0 i n}.

If p > 1/n2k, then


- C, n F," c A [each g E Cn not in Fn must satisfy
f ( t )+ a / 3 <
g(t) < f ( t ) -?/3.] We want to estimate the probabilities of A
for the P, and show that limnP,(Ac) <
W(Ac) to complete the proof. By the
above iiiclusioii [and the fact that P ( C n ) = 11 one has

where F,,, are the disjoint sets (F, = UL1F,,,) defined by


350 5 Weak Limit Laws
-
F,,, = + a / 3 < g(i/n) <f(i/n) 4 3 , i = 0 , . . . , r 1,
{g : f ( i / n ) - -

but not for i = r > 1).

If q,,, is that q satisfying (q l ) / k < r / n < qlk, 1 5 q < k < n (set qn,o = O),
-

then

By definition of F,,,,

is false and hence g $ H , or is in H:. Hence the first term on the right side
of (13) is dominated by P,((H;)"), since n k. >
Consider the second term of (13). Since Y,(t, .)- aiid Y(t, .)-processes have
independent increments, we have for 0 t , t' < <
1 and S = C[O, 11,

by the triangle inequality. But by the first paragraph [cf. ( l o ) ] the first and
<
last integrals of (14) tend to 0. The middle term ( [nt]- [ n t f ] / n ) l l 2 Hence
.
for large enough n , the right side of (14) 3 < d m .
Consequently

(by cebysev's inequality aiid the above estimate).

But ( r / n ) - (qn,,/k) <


l / k , by definition of q,,,. Hence the right-side term is
<
at most Pn(Fn,,)(l/a2k) Pn(Fn,,)q. This gives for the second term of (13)
a bound <
q, since C:=z P,(F,,,) = P,(F,) <1. Substituting this in (13)
and then putting together all the estimates in (12), one has
5.4 Invariance Principles 351

But by ( l o ) , lim,,, P,((H;)') = w ( ( H ~ ) " ) for each k > 1. Hence (15)


yields
lim sup P,(Ac) 5 W((H;)")
n
+ q 5 W(Ac) + q + E. (15')

From this we deduce that ( E > 0, q > 0 being arbitrary) lim inf, P,(A) >
W(A). This and the inequality (11) above imply limn P,(A) = W ( A ) , and
the proof of the theorem is complete since the last part is an immediate con-
sequence of Corollary 3.

Discussion 5 The above proof is essentially based on Donsker's argu-


ment. There is another method of proving the same result. Since by (10) the
finite-dimensional distributions converge t o the corresponding distributions of
the Brownian motion process, the result follows if one can show that the se-
>
quence {P,, n 1) is weakly compact. What we proved is actually a particular
case of the latter. (This follows by Helly's selection principle if the space is fi-
nite dimensional.) The general compactness criterion, of independent interest,
was established by Prokhorov for all complete separable metric spaces in 1956,
and then he deduced Donsker's theorem from a more general result. When
such a compactness criterion is available, the rest of the generalizations are, in
principle, not difficult, though the computations of individual estimates need
special care. For a detailed treatment of these and related matters, one may
refer t o the books by Billingsley (1968), Parthasarathy (1967) and Gikhman
and Skorokhod (1969); and the former is followed in the above demonstration.

Let us state Prokhorov's result, which illuminates the structure of the in-
variance principle as well as the above theorem.

< <
Theorem 6 (Prokhorov) Let {Xnk,1 k knIn2l be a sequence of row-
wise independent sequences of r.v.s which are infinitesimal and which have two
moments finite such that E ( X n k ) = 0, Var S, = 1, where Sn, = Xnk EL=,
and S, = Snk7,. Let

t E [tn,,t,(,+l)], Sno= O,tno = 0, and t,, = Var S,,. Let P, be the distri-
bution of Yn in C[O, I ] . Then P, + W, the Wiener measure in C[O, I], iff the
Xnk-sequence satisfies the Lindeberg condition:

for each X > 0, where Fnk is the d.f. of Xnk [Compare this form with the
Lindeberg-Feller form given in Theorem 3.6.1
352 5 Weak Limit Laws

For the i.i.d. sequence of Theorem 4, the Lindeberg condition is automatic,


and hence this is a considerable extension of the previous result. To see this
implication, let Xnk = x k / n 1 I 2 , so that Fnk(z)= F ( z f i ) , where F is the
common d.f. of the Xi there. Since k , = n , we have

by a change of variable. Since Var X = 1, the right side + 0 as n + oo for


each X > 0. Thus (16) holds.
We shall omit a proof of Prokhorov's theorem, which can be found in the
above references. However, we illustrate its idea by a pair of very important
applications.

TWOApplications Let {Xnk,1 k < <


k,, n >
1) be a sequence of row-
wise independent infinitesimal r.v.s with means zero and finite variances. Let
Snr= E L = l X n k ,tnr = Var Snr= EL=, Var Xnk We then assert

Theorem 7 Suppose that the Xnk satisfy the Lindeberg condition (16).
Thus if t, = tnk,,, then with the above notation and assumptions, (16) becomes

implying that

max ~ , , / t ; ' ~ <z


] = lx epu212dul

Remark According t o Theorem 6, the limit d.f. of Mn = maxl<,<k7, snr/t;l2


is the same as that of V = supo,,,, B ( t ) , where {B(t),O <
t(l} is the
standard Brownian motion procesi.rhus the right side of (17) is P [ V < z].
However, there is no simple method of evaluating the latter probability. By
the invariance principle this limit d.f. does not depend on the particular d.f.s
Fnk,and we take advantage of this point in choosing simple enough d.f.s in
calculating the limit. For variety, we prove the result without iiivokiiig the
preceding theorem, whose proof we have not included.

Proof The result is established in two stages. First we show that the left-
side limit of (17) exists and is independent of the Fnk.(This would be immedi-
ate from Theorem 4 if the X n were i.i.d., since then h ( f ) = supoltsl f (t), f E
C[O, I ] , is a continuous fuiictioiial on C[O, 11.) Next we choose Xnk t o be syin-
metric i.i.d. Bernoulli r.v.s and evaluate the limit. These are iioiitrivial coin-
putations.
5.4 Invariance Principles 353

>
I. For the first point, let {Y,, n 1) be a sequence of independent N ( 0 , l )
r.v.s on ( 0 ,C , P) on which the Xnk are defined. The probability space may
be assumed rich enough to support all these r.v. families, by enlarging it if
necessary. Let 2, = EL=,Yk and if Fn(x) is the left side of (17) before taking
the limit, then we assert that for each E > 0 and integer r 1, >
liminfF,(x)
n
>P I max ~
l<k<r
~ / < xr - ~ / ~

and
limsup F,(x)
n
<P max ~
l<k<r
~ / < x]
r .~ / ~ (19)

These two inequalities are analogous to those of steps I and I1 of the proof
of Theorem 4, and the following argument proceeds on the same lines but is
tailored to the case under consideration.
Let i l l . . . , i, be integers chosen such that for each 1 j r, we have < <

where t, = Var S,k,, , as in the statement. Thus i j is the smallest integer such
that tniJ > < <
( j / r ) t n , and 1 il ...< i, = k,. Consider Unl = Sniland
Un j. -
- S nz,
. - S .nz J . = 2 , . . . , r . We note that as n + oo the Unj satisfy
Lindeberg's condition. In fact, by (16') one has

Hence for each E > 0,

inax Var Xnk/tn


l<kIkn
< l <inax
kIkn
(Ilt,) x2 dFni
211Ct!,/2]

Since E > 0 is arbitrary, (19") implies Var X n k / t n i 0 uniformly in k , as n +


oo. Now if a,, = Var(U,,), (19') and (19") imply l / r <
anl/t, < l / r Var
/ =1 -o(1), 2/r <
a,l +
a,,/t, < 2 / r Var X,,,/t, = 217- o(l), -

so that a,,/t, = l/r +


o ( l ) , and similarly anj/t, = l / r o ( l ) , 1 j r.+ < <
Thus for 0 < T/ < 1, there is an no[= nO(q)]such that n no implies >

Hence by (20) on setting B,, = [I XI > ~d-1,


354 5 Weak Limit Laws

as n + oo,using (16'). Thus the Unj also satisfy Lindeberg's condition for
1/2 D
each j . By Theorem 3.5, Unj/anj + to an N(O,1) r.v., for each j = 1 , . . . ,r,
as n + cm.Since the Unj, j = 1,. . . , r, are independent, so are these limit
r.v.s. But by (20) lim,(anj/t,) = l/r. Consequently, r 1 / 2 ~ , j / t k / 22 t o an
N(O,1) r.v. as n + oo,j = 1 , . . . , r, these being independent. This means

liin P[Unj < zjm


n+cx
j ,= 1 , .. . , r] = ( 1 / 2 ~ ) ' / ~
j=1
fi t?2/2 dU. (21)

Now going to an auxiliary probability space and using Corollary 3 with h(5) =
~ % .j. ), Zr)
m a ~ ~ < ~ < ~ ( ~2 t== (21,. , E Rr, we get from (20) that

11. It will be shown presently that (22) implies (19) as well as (18).
For this let G,,,.(z) = P[maxl<j<,-S,,] < ztk"]. We express the event
- -

[inaxlc S,,! < ztA/2] here as a disjoint union in much the same way as in
(13) (as was done many times before, e.g. in martingale convergence). Thus,
let
Hnl (z) = [Snl ~t:'~], >
and for j > 1,Hnj(z)= [Snj ~ t k ' ~ >, < zt;"
Sn, for 1 <i <j - 11. These
are disjoint, aiid if Qnj(z) = P(Hnj(z)),
we have

If ijPl < i j , then for each ijPl < k < i j , set


snk > ~ t ; / ~l < inax
,
z<k-1
s,,,< z t y 2 , Snt1 - Snkl> € t y 2 ]

and

snk> z t y 2 , l <max
z<k-1
s,, < lit:/', s,,~ &*I < ~ t k / .~ ]
-

'l%us &,k (2) = Qkk (z) + Q i k(z),aiid by independence

by ~ e b ~ ~ einequality
v's for the last one. By (20), this becomes
5.4 Invariance Principles

However, on each set appearing in Qgk7we have Snk2 xi$" and I Snil-
S n h< ~ t k ' ~so, that SnG> Snk E ~ A>/ ~(x
- - ~)t;''. Hence

Thus (24) and (25) give

Clearly F,(x) < G,,(x) for all r > 1. Hence the above inequality becomes

lim sup F, (x)


n
< lim
n
G,, (x) <P max Z j
l<j<r
< xrl"
I ,

which is (19). Taking liminf of the first two terms and using (22), we get (18),
since rj > 0 is arbitrary.
From (18) and (19) it follows that

max Z~ < ~ r l / ~
l<j<r J
If Q,(x) = P[maxlsilr ~ ~ / < rx],~then / by
~ the Helly selection principle
for a subsequence ri, Q,, (x) + Q(x) at all continuity points x of Q, a nonde-
creasing function, 0 < <
Q 1. Since Fn does not depend on ri, we get from
(27) on letting r, + oo,

Letting E \, 0 so that z-E is a continuity point of Q, we see that lim,,, F,(x)


exists and = Q(x) where x is a continuity point of Q. [Also note that (27) and
356 5 Weak Limit Laws

(27') + Q is a d.f. But this is again independently obtained in step 111.1 This
proves our first assertion, as well as the fact that the limit does not depend
on the d.f.s of Xnk. It only remains t o calculate this limit.
>
111. Consider {YA, n I), where the YA are independent and P[Yl = I] =
P[Yi = I ] = i l k >
I. Now wemay take Xnk = YL,1 5 k 5 k, = n (or
< <
X n k = ~ l l n l l1~ , k n , so that the Xnk are infinitesimal, but this is not
used at this point) and set Sn = EL=, Yl.
Let Si = maxl<k<, Sk and z > 0. Now if N = [zfi], the integral part,
- -

then since S: takes only integer values, we have (with largest N < x f i )

To simplify the middle term consider for any integer J > 1,


n- 1
PIS: > J7Sn< J] = C P[Sj 2 J7Si< J , l 5 i 5 j
- l,Sn < J]

j=1
>
(the first set being [S1 J,Sn < J] and the nth term is zero)

(by independence)
n-1
= C >
P[Sj J,S, < J. 1 i < 5j - l]PISn Sj > 01,
-

j=1
(by symmetry of the Y , )
n-1
= C P [ S ~ > J , S ~ < J . ~ < ~ ~ ~ - ~ , S ~ - S ~ > O ] ,
j=1
(by independence)
n-1
= CP[Sj> J,Si < J71< i 5 j - l , S n > J]
j=1
= P[s; > J,Sn> J]
= P[Sn> J].

Substituting this in (28) with J = N + 1, one gets


5.4 Invariance Principles

But
liin p[s,/fi
n
< z] = ( ) l 2 e U 2, z t 8,

by the central liinit theorem. Since the liinit is coiitinuous, the last term in
(29) goes t o zero, and hence for all z > 0

and =O for z 5 0, since F,(z) = 0 for all n >


1, z 5 0. Substituting this in
(27'), we see that (17) holds, and the theorem is completely proved.

In the special calculation of probabilities following (28) we have used the


symmetry in deducing S, - Sj > 0 has the same probability as S, Sj < 0
-

(and, of course, also the independence). This is called the reflection principle
due t o D. Andri.. It is thus clear that in all these problems involving invariance
principles a considerable amount of special ingenuity is needed t o obtain spe-
cific results. For instance maxjs, I Sj, minjln Sj, etc., are all useful problems
with applicational potential and admit a similar analysis.
The next illustration shows how another class of problems called empiric
distributional processes can be treated, and new insight gained. We use the
preceding result in the following work.

If {X,, n > 1) is a sequence of i.i.d. random variables with F as their


coininoil d.f., and, for each n , if F,(z) = ( l l n ) C,"=,
x [ ~ , <called
~ I , the em-
piric distribution, then we have shown in 2.4.1 (the Glivenko-Cantelli the-
orem) that F,(z) + F(z) as n + m, uniformly in X ( E R ) , with proba-
bility one. Can we get, by a proper normalization, a limit distribution of
the errors (F,(z) F ( z ) ) ? If F is a contiiiuous d.f., then F ( X 1 ) is uni-
-

formly distributed on the unit interval. By the classical central limit theorem,
2
fi(F,(z) F(z)) Z,,an N(O,1) r.v. Consequently, one should consider
-

processes of the type Y,(z) = fi(F,(z) - F(z)) and investigate whether


Y,(t) 2 Y ( t ) , and whether {Y(t), t E R} is somehow related t o the Brow-
nian motion process. This is our motivation. But there are obstacles at the
very beginning. Even if F is continuous, Yn(.) [unlike (111 has jump discon-
tinuities. Thus Y, : R + D ( R ) , the space of real functions on R without
discoiitiiiuities of the second kind. If, for convenience, we restrict our atteii-
tion t o the uniform distribution F ( t ) = t (cf. Theorem 3.3.9, where such a
358 5 Weak Limit Laws

transformation for strictly increasing functions was discussed in detail), even


then Y,(R) c D[O, 11 and C[O, 11 c D[O, 11. This and similar problems show
that one has t o introduce a topology in the larger space D[O, 11 t o make it a
complete separable metric space so that the above iiiclusion or embedding is
coiitinuous, and if P, is the induced measure of Yn in D[O, I ] , theii one needs
t o find conditions for Pn + P, and determine P. A suitable topology here
was introduced by A.V. Skorokhod, perfected by Kolmogorov, and soon there-
after the corresponding extension of Theorem 6 was obtained by Prokhorov
(1956). It turns out that P colicelitrates on the subspace C[O, I ] of D[O, 11,
and P is related t o W in a simple way. [It is the "tied down" Wiener process
X ( t ) : X(0) = 0, X ( 1 ) = 0.1 We do not need this general theory here. The
next application for the empiric distribution processes can be proved using
Theorem 7 above and Theorem 3.3.9 in a reasonably simple manner. We do
this followiiig R h y i (1953).
The result t o be established is on the limit d.f. of the relative errors in-
stead of the actual errors F,(z) F(z). We assume that F is continuous,
-

and consider -oo < xa < oo such that 0 < F ( x a ) = a < 1. Then one has
the following result, which complements an application detailed in Theorem
3.3.9. However we need t o use the latter theorem in the present proof.

Theorem 8 Let F be a continuous strictly increasing d.f. and z,, 0 < a <
1, be as above. If Fn is the empiric d.f. of a set of n independent r.v.s with F
as their common d.f., then

where y(a) = y ( a / ( l - a ) ) l I 2 . If, moreover, F(zb)= b, with 0 < a < b < 1,


then we have

with c(a, b) = [ a ( l - b)/(b - a)] I n particular,


1 a ( l b) -

G(0) = - arc sin


IT b ( l - a)1/2
Proof First note that if we let b + 1, so that X b + +oo, theii G(y)
becomes the right side of (301, as it should. Thus it suffices t o establish (31).
5.4 Invariance Principles 359

However, (31) is also an easy extension of (30). Thus we obtain (30) for sim-
plicity, and then modify it to deduce (31). Because of the separability of R,
there are no measurability problems for the events in (30) and (31) (cf. the
proof of the Gilvenko-Cantelli Theorem 2.4.1). We present the proof again in
steps, for clarity. The first step deals with a reduction of the problem.

I. Since Y[= F ( X ) ] is uniformly distributed on (0, I ) , if u = FP1(x), then


Hn(x) = F,(u) = F,(FP1(x)) is the empiric distribution of the observations
Y , = F ( X i ) ,i = 1 , . . . , n, from the uniform d.f. Hence
IF, (x) - F(x)l = sup [Hn - 21
sup
x,, <x<oo F(x) a<z<l 1~

Coiisequeiitly (30) is equivalent to the following:

. .

We claim that, for (32), it suffices to establish

lim P
n i m
I sup
a < H , , (2111
fi(~n(x)
x
< y] = 8iy(a) epu2l2du, y > 0.
(33)
To see this, let

where E > 0 is arbitrarily fixed. If B, = [ I H,(z) - x 5 €1, then on En for


> + <
H n ( x ) a E or a H,(x) - E x , one has <

It follows that An n Bn c A, n B,. Hence from

we get, on noting that as n + oo,H,(x) +x a.e., uniformly in x (Glivenko-


Cantelli), that if (33) is true,

n
P(A,) < lim
n
P(A,) = / 0
u(a+e)
eP7'2/2du,
360 5 Weak Limit Laws

Similarly, starting with H,(x) >a - E, on B , we have a - E 5 H,(x) >x - E,

and if
r 1

-
then An n Bn >An n B,, and since a 5 x , we get with (33)

Since the right sides of (34) and (35) are continuous functions of y ( a ) , and
y(a f E ) + y(a) as E + 0, (34) and (35) imply (32) if (33) is true.
11. Let us therefore establish (33). Here we use the properties of order
statistics, especially Theorem 3.3.9. If Y;" < Y,* < . . . < Y,* are the order
statistics of the uniform r.v.s Y l ,. . . , Y, [Yk= F ( X k ) ]then
, clearly the einpiric
d.f., H, can also be defined as

(This is obviously true for any d.f., not necessarily uniform.) Since H , is a
step function which is constant between Y< and Y$+l,it follows that

But the Glivenko-Cantelli theorem implies ( H n ( x )- x ) / x + 0 a.e. uniformly


in z, so that in (36) ( k / ( n Y ; ) ) 1 + 0, a.e. uniformly for a
- <
k / n < 1 as
n + oo. Thus for each E > 0, there is an no [= n o ( € ) such
] >
that n no implies
( k / ( n Y { ) ) 11 < ~ / a.e.
- < <
n for all a k / n 1. Hence

where o,(l/&) + 0 a.e. as n + ce for a < k / n < 1. If follows thereafter


that

6
n-a
liin P inax
[ansiisn ( --
1) < y]

I
k
= lim P max J;l log - < y] .
n-oo anllcsn nu,*
Thus (33) will be proved, because of (36) and (381, if we show
5.4 Invariance Principles 361

111. Now { Y i ,1 5 k 5 n ) is a set of order statistics from the uniform


d.f. But from Theorem 3.3.9 (or even directly here) Zk = - log Y,*+lpk is
an order statistic from the exponential d.f., and then (from the theorem)
{Zk, I 5 k 5 n) has independent increments. In fact, as we saw in the proof
of that theorem, if Uk = Zk Zkpl (ZO= O), theii U1, . . . , U, are independent;
-

the density of Uk is given by

+
In other words, if Uk = Vk/(n k I ) , theii Vk has the standard exponential
-

density with mean 1, i.e., the density gv, is

Thus
n k

the V, are i.i.d. with density given by (40). But -log: Yc = Zn-k+l =
E":; Uj, and the Uj satisfy the Lindeberg condition. In fact they satisfy
the stronger Liapoullov condition lim,, p(Zn)/a(Zn) = 0, where

But a simple computation shows that +


< 17 E F = l (n - k 1 ) ~Hence
~ .
the above condition holds. Thus Theorem 7 applies, and we get the followiiig
on setting s2(n) = Canllcln Var Unpk+l:

where z > 0. But the left-handed side (LHS) of (41) can be written as

Since 0 < a < 1 and j 2 an, we have with the standard calculus approxima-
tions
362 5 Weak Limit Laws

Hence the right-hand side (RHS) of (42) simplifies to

Finally, (39) follows from (41) and (43) if we set

in (43). This establishes (30) in complete detail.


IV. It is now clear how t o extend this for (31). The only changes are in
the limits for the maximum. Hence consider for 0 < a < b < 1,

(the primed variables being centered at means)


n-hn

(since the first term does not depend on k)


= + (s~~J). (44)

Thus (A1) and (A2) are independent r.v.s and (A1) 2 to an r.v., N(0, (1 - b)/b)
as n i oo, and by the first part

The limit d.f. of en is thus a convolution of these two, so that


lim P[& < y] = e-bu2/2(l-b) dU
n+oo

[("?A) [ab/(b-a)]'"
x e-~2/2 dv.

The right side reduces to G(y) and (31) is obtained.


5.4 Invariance Principles

Finally, for G(O), note that it can be written as

The right-side expression was shown in Problem 26 in Chapter 4 t o be the


desired quantity. This proves the theorem completely.

We have included here all the (brutal) details because of the importance
of this result in statistical hypothesis testing. It is related t o classical the-
orems of Kolmogorov and Smirnov, aiid is used t o test the hypothesis that
a sample came from a specified contiiiuous (strictly increasing) d.f. Various
alterations [e.g., with absolute deviations in (30) and (3111 are possible. We
do not consider them here, but indicate some in a problem (see Problem 32).
Before ending this section, we add some remarks on two points raised
before. Processes {Xt, 0 5 t 5 1) more general than Browiiiaii motion are
those which have independent iiicrements aiid are stochastically continuous,
i.e., limt+, P [ Xt X,I
- > E ] = 0 for t , s E [O, I] aiid a given E > 0. Such
processes can be shown t o have no discontinuities of the second kind, so that
their sample paths belong t o D[O, 11. Consequently one can seek conditions
for the convergence of Pn of Yn(t, .)--the random polygon obtained as in The-
orem 6 for sequences of rowwise independent asymptotically coilstant r.v.s.
Here the theory of infinitely divisible distributions (as a generalization of the
classical central limit problem) enters. The corresponding characterization has
been obtained by Prokhorov (1956). The second point is about the "strong"
invariance principle. This is about a statement that the random polygonal
processes, obtained as in ( I ) , converge t o the Brownian motion process with
probability one, if both caii be defined on the same probability space. Cali
this always be done or only sometimes? A first positive solution was given by
V. Strassen in the early 1960s. To describe his formulation, let {X,, n > 1)
be i.i.d. with zero means and unit variances. If So = 0, S, = Ci=,X k , let
Yn(t, .) be the polygonal process defined similarly (but we can not use the
central limit theorem aiid so must have a different normalization). Thus let
us define on the probability space (R, C , P) of the X,

where [t]is the integral part. Extending the ideas from the embedding method
of Proposition 2, aiid using other tools (such as the iterated logarithm law-see
the next section) one caii obtain the followiiig result.

Theorem 9 (Strassen) There exists a probability space (GI, C1,P') (actu-


ally the Lebesgue u n i t interval will be the candidate) and a Brownian m o t i o n
364 5 Weak Limit Laws

process {B(t),t > 0) >


and another process { ~ ( t )t , 0) on it such that

(i) {?(t), t > >


0) and { ~ ( t )t , 0) have the same finite-dimensiond
distributions, and
(ii) P'[limt+oos~ps5tI Y ( ~- )~ ( t ) / ( 2 t l o g l o g t ) ~= /0~] = 1.

Even though we discussed the existence of Brownian motion on [0, 11, the
general case of R+ is similar. A number of other "old" results have been ex-
tended to this setting. A survey of these aiid other possible extensions with
references have beeii given by Csorgo aiid Rkvksz (1981), to which we refer the
interested reader for information on this line of investigation. As is clear from
the statement of Theorem 9, one needs to use several properties of Brown-
ian motion, (some of these will be studied in Chapter 8) and will lead us
tangentially in our treatment. So the proof will not be detailed here.
It is clear that the results of this section indicate that a study of limit the-
orems in suitable metric spaces more general than those of C[O,I] aiid D[O, I]
can be useful in applications. These may clarify the structure of the concrete
problems we described above. Relative compactness of families of measures
and their equivalent usable forms, called tightness conditions from Prokhorov's
work, have beeii the focus of much recent research. One of the main points
of this section is that an essential part of probability theory merges with the
study of measures on (infinite-dimensional) function spaces. We have to leave
these specializations at this stage.

5.5 Kolmogorov's Law of the Iterated Logarithm


The preceding result, due t o Strassen, contains a factor involving an iterated
logarithm. In fact the law of the iterated logarithm (LIL) is about the growth
of partial sums of (at first) independent r.v.s which is a far-reaching general-
ization of the SLLN aiid is also needed for a proof of Theorem 4.9. Here we
demonstrate the first basic LIL result given in 1929 by A. Kolmogorov. Simi-
lar to the other theorems, this one also has been extended in many directions,
but for now we shall be content with the presentation of a complete proof of
this fundamental result. We include a historical motivation.
The problem originated in trying t o sharpen the statements about the
normal numbers (cf. Problem 7 in Chapter 2). If 0 < x < 1, and in the
dyadic expansion of x, S,(x) denotes the number of 1's in the first n digits,
then Sn(x) + cm a.e. as n + cm. (A similar problem also arises for decimal
expansions.) But what is the rate of growth of Sn? Since S n / n + 112 a.e. by
the SLLN, one should find the rates of growth of I Sn (n/2) . The first result
-

in 1913, due to F. Hausdorff, gives the bound as ~ ( n ( ' / ~ ) + "E )>


, 0. Then in
the following year G.H. Hardy and J.E. Littlewood, who were the masters of
5.5 Kolmogorov's Law of the Iterated Logarithm 365

the "E-6" estimations of "hard" analysis, improved the result t o O(J=).


Almost 10 years later A. Khintchine, using probabilistic analysis, was able t o
prove the best result O(2/-), and more precisely,

lim sup I Sn - 12/21 - 2-1/a a.e. (Lebesgue).


n -/2
Thus the law of the iterated logarithm was born. In the current terminology, if
Sn is the partial sum of i.i.d. symmetric Bernoulli r.v.s, then (1) is equivalent
t o stating that

1/& a.e. (Lebesgue).

This result showed the power of the probabilistic methods and represented a
great achievement in the subject. That was enhanced by the next result of
Kolmogorov's, when he generalized the above for an arbitrary independent
sequence of bounded r.v.s. We establish this here. Because it is a pointwise
coilvergelice statement, the result is a strong limit theorem. However, error
estimates in the proof depend critically on the weak limit theory, and so the
result is placed in this chapter. (Thus the strong statement is really based on
the "weak statement.") Actually in the modern development of probability,
the strong and weak limit theories intertwine; and this greatly enriches the
subject.

The desired result can be stated as follows.

Theorem 1 (Kolmogorov) Let {X,, n >


1) be a sequence of independent
individually bounded r.v.s such that E(X,) = 0, s i = Var(Sn) /' oo, where
Sn = Xk. If X, I = o(sn(loglog s , ) ~ ' / ~ a.e.,
) >
n no, then

and

lim, ,, (S,/\/~S;log log s,) = -1


I = 1.
(3b)
Hence we also have

For a proof of (3a), we need t o establish with the weak limit theory certain
exponential bounds on probabilities. Thus we first present them separately
for clarity. Note that if (3a) is established, then considering the sequence
{-Xn,n > 1) which satisfies the same hypothesis, one gets (3b), and then
(3c) is a consequence. Hence it suffices t o establish (3a). This is done in two
stages. First one proves that
5 Weak Limit Laws

and then for each E > 0, one shows that the left side is >
1 E a.e. These
-

two parts need different methods. Kolmogorov obtained some exponential


upper and lower bounds for P [ m a ~ l < ~Sk >
< , Xc,]. Then using the (relatively
easy) first Borel-Cantelli lemma, (4)proved with the upper estimate. The
lower one [for the opposite inequality of (4)] is more involved. It should be
calculated for certain subsequelices by bringing in the independent property
so as to use the second Borel-Cantelli lemma. This is why a proof of Theorem
1 has always been difficult. A relatively simple computation of the bounds,
essentially following H. Teicher (2.Wahrs. 48 (1979), 293-307), is given here.
These are stated in the next two technical lemmas for convenience.
It is useful to consider a property of the iiuinerical fuiictioii

1
h(x) = (e" - 1 - x)/x2 = -
2!
+ 3!x + x2
- -
4!
+ . .. . (5)

For x > 0, h(x) > 0, hf(x) = (dh/dx)(x) > 0, and hl'(x) = (d2h/dx2)(x)> 0.
Heiice (h(.) is a positive illcreasing convex function. The same is also true if
x < O[h(O) = 1/21. To see this, let z = y , y > 0, so that

Hence h(x) > 0 for x < 0, and limy,o g(y) = 112, lim,,, g(y) = 0. Also,
9'(?4) = - (Y + 2)g(?4)lly. Thus g1(?4) < 0 iff 9(y) > (Y + 2Ip1. And g"k4) =
[ 9 ( ~ ) ( ~ ~ + 4 ~ + 6 ) ( ~ SO ) 1 d/ l~( y, ) > 0 i f f d y ) > (y+3)(y2+4y+6)r1.
+ 3that
+
For us it suffices to establish the first inequality. Since (y 2 ) ~ < + +
' (y 3) (y2
4y+6)-17 y > 0, if we verify the second, both inequalities follow. Thus consider
the latter. Now

Thus

Hence for 0 < y < 3,

>
I f y 3, g(y) > ( y - l ) / y 2>(y+3)/(y2+4y+6). Heiice in all cases h given by
(5) is a nonnegative increasing convex function on R. This function h plays a
5.5 Kolmogorov's Law of the Iterated Logarithm 367

key role in the next two lemmas, giving the exponential bounds alluded to be-
<
fore. (Only 0 h / is used in the next lemma, and Xk's are as in Theorem 1.)

Lemma 2 Let {X,, n >


1) be independent r.v.s with means zero and fi-
nite variances. Let S, = EL=,Xk, s, = Var S, (= C;=, 02, a; = Var Xk),
and c, > 0 be such that 0 < c,s, and increasing. Let h(.) be given by (5). Then

(i) P[Xk < cksk, 1 < k < n ] = 1 implies

and for A, > 0, b > 0,x, > 0, we have

(ii) P[Xk > c k s k , 1 < k < n] = 1 implies


E(exp{tSn/sn)) > exp 1 - t2h(-cnt)/si4
k=l
(9)
and if also ak < cksk, k > 1, one has

Proof (i) Consider for 1 <k <n


E(exp(tXk/sn})
=1 +
E(exp{tXk/s,) 1 tXkl.5,)
- - [since E ( X k ) = 01
= 1+~(h(t~k/s,)(t~~:/s:)) (11)
< 1+t2~(h(tcn)(~z/si))
(by hypothesis and the monotonicity of h)
= 1 + (t2&s:)h(tc,) < exp{(t2&s~)h(tc,)}
(since 1 + x < ex for x > 0). (12)
< <
Multiplying over 1 k n , we get (7) by independence.
>
Also, ISk,a(X1, . . . , X k ) , k 1) is a martingale sequence, and since
x H exp{tx}, t > 0, is convex, {exp{tSk}, a ( X 1 , . . . , Xk), k >
1) is a posi-
tive bounded submartingale (by the bound condition on the X k ) . Hence the
maximal inequality gives (cf. Theorem 3.5.61)

max Sk > A,x,s,


l<k<n
exp{tSk) > exp{A,tx,s,) I
5 Weak Limit Laws

This is (81, and (i) follows.

+ >
(ii) For (91, we use the inequality 1 x exp(x - x2) for x 0. In fact,>
>
this is obvious if x 1 and for z = 0. If 0 < x < 1, then this is equivalent to
+
showing that (1 x) exp(x2 x) > 1, or that
-

Since f (0) = 0 and f f ( x ) = x(1+2x)(l+x)-' > 0, f (.) is strictly increasing, so


that (13) is true. We use this with the expansion (11). Thus by the condition
of (ii) and the monotonicity of h on R,

Multiplying over 1 < k < n , this yields (9) since si = C;=,02. If also
<
02 c i s i , so that n, n

we get (10) from (9) on substituting the bound c i for EL=la;/sA. This com-
pletes the proof of the lemma.

Next we proceed to derive the lower bound using (10). This is somewhat
involved, as remarked before.

Lemma 3 Let {X,, n >


1) be an independent sequence of individually
<
bounded centered r.v.s. If 0 < 02 = Var XI,, I X I , dk a.e., where 0 < dl, 1'
and z, > zo > 0, satisfying lim,+,(d,z,/s,) = 0, with s, = Var S,,S, =
EL=1X k , then for each 0 < E < 1, there exists a 0 < C, < 112 such that

for all n > no(= n,).


Proof Since X k is a bounded r.v., its moment-generating function (m.g.f.)
~ " ,0 < cjk(t),cjk(0) = 1, and cjk is contin-
exists. Thus if cjk(t) = ~ ( e ~ then
uous. Hence there is a to > 0 such that 0 < 41,(t) < cc for all 0 <
t < to,
since the set J = {t : 4 ( t ) < oo) is an interval containing 0. This is be-
cause the fuiictioii t H exp{tx) is convex, and hence so is q5k, which implies
that J is a convex set. If a0 = inf {t : t E J),bo = sup{t : t E J), then
-cc <a0 <0 < bo <
+cm,so that J is the interval (ao,bo) with or with-
out the end points. Thus $k(t) = logq51,(t),0 <
t < to, is finite, so that
4k(t) = exp $k(t).
5.5 Kolmogorov's Law of the Iterated Logarithm 369

The idea of the proof is (i) t o obtain lower and upper bounds for (so we
get exponential bounds for q5k), and (ii) then use an ancient transformation,
due t o F. Esscher in 1932, t o relate the result t o the desired inequality (14).
There is no motivation for this computation, except that it was successfully
used by Feller (1943) in an extension of a result of Cram& aiid then in his
generalization of Kolmogorov's LIL. We employ it here in the same manner.
To proceed with point (i), let c, = d,/s, and consider q5kn(t) = $k(t/sn).
Since q5k is actually holomorphic (on C),we have on differentiation

where, as in Lemma 2, hl(x) = (ex-1)/x. Clearly hl is increasing on.


'
R But
on R-,let hz(y) = hl(-y), y > 0. Because hl(0) = 1,h2(y) = (1 -epy)/y + 0
as y + oo,it is decreasing for y 7because hk(y) = [(y 1)e-Y - 1]/y2 and +
+
eY > 1 y + hk(y) < 0, so that hl(x) is increasing on all of R as x 7 .
<
Consequently for t > 0, because I Xk/snl c, = d,/s, a.e. for k n , <

Similarly ex is illcreasing so that

However, a; <
d: = c:s:, k <
n , and for t > 0, 4k(t/sn) > 1. Thus the
inequalities (16), (17), and (11) yield, with h of that lemma, [$kn = log&,]

and similarly

$!,(t) = -- [$;,(t)I2 5 q5in(t) < etc' (a:/$) [by (17)l. (19)


d)kn(t)
A lower bound is obtained by using (12), (17), and (18):

Consequently, if $,(t) = &,(t), then (18) implies

For point (ii) we proceed t o the key Esscher transformation aiid use these
bounds in its simplification. Let 0 < t < to be fixed, and if Fk is the d.f. of
370 5 Weak Limit Laws

Xk, so that Fk, is the d.f. of Xk/s,, 1 5 k 5 n, define a new d.f. Fin by the
equation
dFin (x) = [$kn (t)lP1etxdFkn (x). (all
Let {Xkn(t), 1 k < < n} be independent r.v.s each with d.f. FL,. It may
be assumed that these r.v.s are defined on the same probability space as
the original Xk, by enlarging the underlying space if necessary. Let S,(t) =
Xkn(t) and Fi be its d.f. Noting that if the ch.f. &,
of F;, is calculated,
then the ch.f. 2of S,(t) is given by

[by the fact that &(t) = log $kn(t)]

Thus the meail and variance of S,(t) are

respectively, so that they are $k(t) and $:(t). If F ' is the d.f. of G ( t ) =
(S, (t) $; (t))/
- a, then it is clear that

We use this transformation t o connect the probabilities of Sn/s, and s n ( t )


for each 0 5 t < to, since then by a suitable choice of t (t = 0 corresponds t o
S,) we will be able t o get the desired lower bound. But the ch.f. of Sn/snis
given by exp$,, aiid (22) implies (by the uiiiqueiiess theorem) that the d.f.
FA of S,(t), with F, as the d.f. of S,/s,, is

Here we use

Now.
5.5 Kolmogorov's Law of the Iterated Logarithm

= eQrb(') La eCtx d ~ A ( z ) [by (23)]

x d ~ ($;I: ( t )+ z m) (by change of variables).


(24)
To get a lower bound, note that if t is replaced by a sequence t , in (19)
and (19') such that t,c, + 0 , implying etnClb = 1 O(t,c,), we have +

since a: = s i . To get a lower estimate for exp{$,(t) - t $ d ( t ) } in (24),


consider

1
2 t;[-
2
+ o(t,c,) - o(t;c:) - (1 + o(t,c,))]
1
+ +
[since h ( x ) = - o ( z ) ,h l ( z ) = 1 o ( z ) as z = t,c,
2
+ 01

2 - ( t ; / 2 ) ( 1 + ~ ) where O < & < l . (26)

as t,c, + 0 by (20) and (25). Hence 6, 5 -&/2 as t,c, + 0. Setting


w = ( 1 - ~ ) t in
, (24),we get with (26)

then E(Zk,) = 0, Zk, = ( S n ( t n ) - $k(tn))/dm,


the Zkn, 1 5 k 5
n, are independent and infinitesimal, because by hypothesis

1 Z k n 5 2 ~ , ( $ : ( t , ) ) - l / ~ = o(1) ax. uniformly in k.

Also by hypothesis
dnxo dnzn
lim -5 liin -= 0,
n-cc S, n-os S,
372 5 Weak Limit Laws

so that s, + oo faster than d,.


Since X k, and hence Xk, [and Xkn( t ) ,]take values in the interval [-dk , dk],
it follows that for large enough n [note that &(t/sn) 11 >

because dFk(x) = 0 for 1 XI > dk and ~ s ,> d, for large enough n , by the
above noted condition. But ~ a r ( ~ , ( t , ) / m=) 1. It follows from Theorem
3.5 that (S,(t,) -$k(t,))/ + an r.v. which is N ( 0 , l ) . Consequently
the right-side integral of (27) is >
112. Hence (27) becomes for large enough
n

This is (14) if t, = x,(1 ~ ) 2 ' / ' , aiid the proof of the lemma is complete.
-

Note how significantly the central limit theorem (for rowwise independent
sequences) enters into the argument in addition t o all the other computations
for the lower bound. We are now ready t o complete the

Proof of Theorem 1 / ~ ) so d, T,
Let d, = o ( s , ( l o g l ~ g s , ) ~ ~ aiid
1 X,l < d,, a.e. If c, = d,/s,, then c, = o(1) and a i / s i < c i . Also,
~ , ( l o g l o g s , ) - l / ~ + 0 as n + cm. For any a > 0, b > 0, consider a! >
bF1 + bh(ab), where h(x) = (ex - 1 - x)/x2. Choose P > 1 such that
+ +
alp2 > bF1 bh(ab). Since s i + , = s: o:, so that s,+l/s, + 1 as n + oo
(because ai/si 5 c i + O), we deduce that there exist nk < nk+l < . . . such
that s,, < PIC< s,,,,. For otherwise there will be no s, in (PIC,Pk+l)such
>
that l i m ( ~ , + ~ / s , ) /3 > 1, contradicting the preceding sentence. This implies
also that s,, PIC,the symbol indicating that the ratio + 1 as k + oo.
Let x i = log logs,, so that for large enough n , c,x, 5 a , since c, goes t o
zero. We can now use Lemma 2. Taking A, = n / P 2 > 0, b > 0, aiid x, > 0 as
here, we then get by ( 8 ) ,

P
I inax Sk> a p F 2 s n , (log log s,, )
l<k<n, J
< exp{- [(ab/P2)- b2h(ab)]log logs,,). (29)
But (nb/P2) b2h(ab) > 1, by the choice of P. Hence there is an 7 > 0 such
-

that the above probability in (29) is not larger than

+
exp{- (1 q ) log log s,, ) < {k log ~ ) l ( " / ~ )

for all large enough n , because of s,, ,!?%Thus by the first Borel-Cantelli
lemma, since (kl~gp)-~-(< ~ /cm,
~ ) we have
5.5 Kolmogorov's Law of the Iterated Logarithm

P[S, > as,(log log s,)-l'', i.o.1

<P 1
r

mau
<n<nh+l
S, > as,, (log log s,,)~/', i.0.
7

J
It follows that

+
Since a > bE1 bh(ab) for all a > 0, b > 0, we see, on letting a + 0, so that
+
h(0) = 112, that (30) is true for all a > bE1 b/2. The least value of the right
side isa, and so (30) holds a.e. if a = a . This establishes (4). Note that
by applying this result t o {-X,, n 11, we deduce that >
-
lim Isn
s, (log log s,) lI2
<J?
-
a.e.

We now prove the opposite inequality t o (4). Again choose the {nk, k 1) >
as before, and let 0 < E < 1. To use Lemma 3 and the second Borel-Cantelli
lemma (cf. Theorem 2.1.9ii), it is necessary first t o consider at least pairwise
independent events. We actually can define mutually independent ones as
follows. Let
Ak = [S,, Snkpl > (1 ~ ) ~ ' ~ a l c b l c ] ,
- pk, >
- -

where, since s,, k 1,p > 1 we let

a; s:, , (1 pE2),
=
-
- -

b; = 2 log log ak 2 log log s,, < 2(1 + E) log k,


for all large eiiough nk (or k). Thus d,, b k a i l = o(s;:) = o(1).
Let z, = bk in Lemma 3. Then the above definitions of ak, bk yield with
(14), and S,, Snhp,
- for S, there (all the conditions are now satisfied),

Since xk,l k-('-")' = 00, Theorem 2.1.9 implies P ( A k , i.o.)=l.


It is now necessary t o embed this &sequence in a larger, but dependent,
sequence D k = [S,, > Sk] for a suitable Sk and show that Lk D k > Lk Ak
t o deduce the result. The crucial work is over, and this is a matter of adjust-
ment. For simplicity, let v: = 2 loglogs,. For large enough nk, consider
374 5 Weak Limit Laws

where /3 > 1 is chosen large enough so that (1-~)'(l-p-l)ll2 > ( 1 - ~ ) ~ + 2 / p


holds. This can be done. With this choice, consider the events Bk = [ Sn,-, I <
2 s n , ~ , v , , ~ , ] .By (31) P[Bi,i.o.]=O. But we have

Since P [ A k , i.o.]=l and P[Bi,i.o.]=O, we get P [ A k n B k , i.o.]=l. Thus

Consequently, we have

Since E > 0 is arbitrary, by letting E \, 0 through a sequence in (33) we get


the opposite inequality of (4) with probability one. These two together imply
the truth of (3a). As noted before, this gives (3) itself, and thus the theorem
is completely proved.

This important theorem answers some crucial questions but raises others.
The first one of the latter kind is this: Since many of the standard applica-
tions involving i.i.d sequences of r.v.s are not necessarily bounded but will
have some finite moments, how does one apply the above result? Naturally
one should try the truncation method. As remarked by Feller (1943), if the
X,, n > 1 are i.i.d., with slightly more than two moments, then they obey
the LIL. In fact, let Y, = X,xArL,where A, = [ X,I <n1I2 log log n] and
E(Xf(1og X I I ) ' + ~ )< oo for some E > 0. Then one can verify with the Borel-
Cantelli lemma that P [ X n # Yn] < oo, so that X, = Y, a.e. for all large
n. But the Y, are bounded and the reader may verify that the Yn-sequence
satisfies the hypothesis of Theorem 1. Hence that result can be applied t o the
i.i.d. case with this moment condition. However, this is not the best possible
result. The following actually holds. The sufficiency was proved in 1941 by P.
Hartman and A. Wintner, and finally in 1966 the necessity by V. Strassen.
We state the result without proof. Note that, since "limsup, S," defines a
tail event, the probability of that event is either 0 or 1, by the "0-1 law."

Proposition 4 Let {X,, n >


1) be a sequence of i.i.d. random variables
with zero means. Let Sn = X k . Then LIL holds for the S,-sequence, in
the sense that
P [E
n-00
sn
( n log log n )
=JI]= l (34)

iff E ( X f ) < oo, and then E ( X f ) = 1.


5.6 Application t o a Stochastic Difference Equation 375

In a deep analysis of the case of bounded r.v.s, Feller (1943) has shown that
for any increasing positive sequence {a,, n >
1) and S, = X k with X k
as (bounded) and independent r.v.s, one has with s i = Var S,, E ( X k ) = 0,

according t o whether C n > l ( l / n ) a n exp{-ai/2) is finite or infinite. Moreover,


the same result holds for unbounded r.v.s, provided there exist numbers a 0 >
O , E > 0 such that

In the i.i.d. case, (35) reduces t o the sufficiency part of the previously stated
result. (For the Bernoulli case, see Problem 35.)
Another point t o note here is that in both Theorem 1 and Proposition 4
>
the {S,, n 1)-sequence satisfies the central limit theorem, in the sense that

Thus one can ask the question: Does every sequence of independent r.v.s
>
{X,, n 1) which obeys the central limit theorem also obey the LIL? What
about the converse? In the i.i.d. case, Proposition 4 essentially gives an answer.
The general case is still one of the current research interests in probability the-
ory. The preceding work already shows how the combination of the ideas and
methods of the strong and weak limit theory is essential for important inves-
tigations in the subject. In another direction, these results are being extended
for dependent r.v. sequences. Some of these ideas will be presented in the next
section which will also motivate the topic of the following chapter.

5.6 Application to a Stochastic Difference Equation


The preceding work on weak and strong limit theorems is illustrated here by
>
an application. Let {X,, n 1) be a sequence of r.v.s satisfying the first-order
linear stochastic difference equation

where Xo = 0 for simplicity and the E ~ t, >


1, are a sequence of i.i.d. r.v.s
with P [ E=~ 01 = 0, means zero and variances o2 > 0. The constants a, are
real, but usually not known.
376 5 Weak Limit Laws

A problem of interest is t o estimate the ai, based upon the observations of


{Xt, t = 1 , . . . , n}. Thus, if the values of (the nonindependent) X t , t = 1 , . . . , n
are observed and ai is estimated then using (1) the value of Xn+l can be
"predicted." For this t o be fruitful, the estimators &in which are fuiictioiis
of X I , . . . , X,, should be "close" t o the actual ai in some well-defined sense.
The subject can be considered as follows. This is a useful application of the
Donsker invariance principle which illustrates its use as well as its limitations.
For this, one can use the familiar principle of least squares, due t o Gauss.
This states that, for n > k one should find those &(w) for which

zp-zaixtpi) (W) =
t=l
E,"(w)

is a minimum for a.a. (w). This expression is a quadratic form, so it has a


unique minimum given by solving the set of equations (with X t = 0 for t 0) <
Chi, =CXtXt-j, j = I , . . . , k , a.e. (2)

Assuming that the matrix (Cy=l XXiXt- j, 1 i , j < <


k) is a.e. nonsingular,
(2) gives a unique solution t o (irln,. . . , irk,) by Cramer's rule.
In this setting, the first-order (i.e., k = l ) stochastic difference equation (1)
(with a1 = a ) , and the least squares estimator 8,,illustrates the preceding
analysis for obtaining the asymptotic d.f. of the (normalized) errors (6, - a).
It will be seen that the limiting probability distributions of these "errors"
are different for the cases: a < 1, a = 1 (a = 1 and a = - I ) , as well as
a > 1. There is no "coiitinuity" in the parameter a. It illuiniiiates the use
and limitation of invariance priiiciples of the preceding sectioiis at the same
time.
A comprehensive account of the asymptotic distribution problem can now
be given. The solution of the problem for k = l , which is already non-trivial,
is given by:
Theorem 1 Let {X,, n > 1) be a sequence of r.v.s satisfying (1) with
>
k = 1, and the {E,, n 1) be i.i.d. with means zero and (for convenience) unit
variances. If &, is the least squares estimator of a based on n observations,
and g ( n ; a ) is defined by g ( n ; a ) = [n/(1 - a2)]-1/2 if la1 < l ; = n / f i if
a1 = 1; and = Ian(a 1)-'12 if la > 1, then
-

lim P[g(n;a)(&
n-00
- a) < z] = 1: f (u) du, 2 t 8, (3)

exists, where (i) f (u) = (llfi) exp(-u2/2) if if a1 < 1 [so that the limit is
+
the N(O,1) d.f.], (ii)f (u) = [ ~ ( lu2)Ip1 for a1 > 1 when the E, are also
normal (so that the limit is the Cauchy d.f.), and (iii) for I a1 = 1 [the E,
being as in (ii)]
5.6 Application to a Stochastic Difference Equation

f 2 -112
(x? = ( 8 ?~ /R ( )
t)i3/2
COS(~(X, t ) - 83 ( x 7t ) )
2

where p, r, 6, 6' are defined by the (complicated) expressions as follows:

(
p 2 ( x , t )= 2 1 - -
id2 (sinh2 & + Ix (sinh2 &
+ sin2 6)
+ cos2 4%) - nfi
( &)
1 - (sin & + siiih &),
r 2 ( x ,t) = sinh2 & + cos2 &+ 2zt (sinh2 &+ sin2 6)
-

+ y m ( s i n & - sinh &) ,

Q(x,t) = arc tan{


1 - (a/2)@ (coth + cot a)
1 - ( a / 2 ) @ (tanh - tan 6)
x tan& tanh&),

6(x, t) = arc tan


~cos(at/&) + D sin(at/&)
D cos(nt/&) - ~sin(at/&)

Here C , D are also functions of x , t , n and are given by:

c= (sinh &cos & - cosh &sin


( &)
I-- &)

+a- sinh &sin 6,


(sinh &cos & + cosh &sin
D =
( &)
I-- &)

-a@ cosh &cos &.

Furthermore, if I a > 1 and the E , are also normal, o r 1 a1 < 1 and the E ,
satisfy the only conditions of (i)(so that they need not be normal) t h e n we
have

I n fact, if if a1 > 1, the limit d.f. of the (6, - a) depends o n the d.f. of the
&,-the 'hoise," in the model (1)( w i t h F=l) and the invariance principle i s
inapplicable.

The proof of this result is long and must be considered separately in the
three cases: 1 a < 1,l a1 = 1, and 1 a1 > 1. This result exemplifies how real
378 5 Weak Limit Laws

life applications demand serious mathematical analysis. Thus it is useful t o


present some auxiliary results on the way t o the proof. The next statement
is an extension of the central limit theorem for "m-dependent" r.v.s used in
proving (i), and the result has independent interest.
> >
A sequence {U,, n 1) of r.v.s is called m-dependent (rn 0 an integer) if
>
U1, U2,. . . , Uk is independent of Ue+i, Ue+2,. . . whenever e - k m. If m = 0,
then one has the usual (mutual) independence. We should remark that this
unmotivated concept is introduced here, since in many estimation problems
of the type considered above such sequences appear. Let us establish a typical
result which admits various generalizations. (Cf. Problems 41 and 43.)

Proposition 2 Let {U,, n > 1) be a sequence of m-dependent r.v.s with


means zero and variances 02, but sup, a; = a2 < oo. Let Sn = Uk.
Then

whenever a3(S,)/n + +oo as n + cm.

Proof The condition a 3 ( S n ) / n + oo is always satisfied in the i.i.d. case,


since 03(s,) 3 / 2 E(U;) > 0. Let k
= n 3 / 2 [ ~ ( ~ f ) ] with > 1 be an integer t o
< <
be chosen later, and set n j = [jn/k], the integral part, 0 j k. Define two
>
sequences {Xj, j 0) and {q, j> >
0) as follows ( j 0):

The m-dependence hypothesis implies that X I , X 2 , . . . are independent r.v.s,


and if n is large enough [e.g., if n > k(2m - I ) ] , then Yl, Y2,. . . are also
independent. Since o2(S,) + cm, one sees that the initial segment of k(2rn-1)
+.
r.v.s has no influence on the problem because (XI . .+Xk(2m-1)/o(Sn) + 0
P

even if k is made t o depend on n , but it grows "slowly enough". Consider the


decomposition

If Ui/ and U p are parts of S


A and S:, respectively, then they are uncorrelated
unless li' - iffl< m, and thus at most m terms are correlated. Also,

<
since the X j are independent and E(x;) m202. Similarly (since the Y , are
also independent) one gets the following estimate:
5.6 Application t o a Stochastic Difference Equation

Hence

But from (6)

IE(s~) - E ( ( s A ) ~ ) ~ < E ( ( s ~ )+~2E(SASl)I


)
< km2a2+ 2km2a2 = 3km2a2 [by (7)]. (8)
Thus far k > 1 has been arbitrary, and the additional growth on a(Sn)
has not been used. Let k = kn = [n2/3],so that a2(Sn)/kn + oo. Hence with
this choice, (8) gives

11 - a2(s;)/a2(s,)I < 3m2a2kn/a2(s,)+ 0 as n + oo.

Also [using (711, a2(s;)/a2(sn)+ 0, E(s;s;)/02(sn) + 0, and then


S;/a(Sn) 3 0 (by Proposition 2.3.1), so that (cf. Problem 11-the Cram&
Slutsky theorein-in Chapter 2)

Thus we have reduced the problem t o finding the limit d.f. of SA/o(SA) of
independent summands which, however, are not identically distributed. For
this we now verify Lindeberg's condition (cf. Theorem 5.3.6). If Fj is the d.f.
of X j , then the following should tend t o zero. Indeed,

as n + oo, since 02(Sn)/kn + ce by hypothesis and the choice of k,. Coii-


D
sequeiitly, by the Lindeberg-Feller theorem, SA/o(SA)+ t o an r.v. which is
N ( 0 , l ) . This and (9) establish the proposition.

Remark In a note in Math. Nachr. 140 (1989), 249-250, P. Schatte has


given an example t o show that the central limit theorem does not hold for
"weakly m-dependent" random sequences. However, his calculatioiis based on
conditional moments with evaluations, not satisfying the Conditional Analysis
of Chapter 3, do not apply t o the situation considered here (or in Chung
(19741, Sec. 7.3), contrary t o his assertions there. The difficulty may be in
his manipulation on conditioning sets of probability zero as explained in our
Chapter 3 in detail.
380 5 Weak Limit Laws

We now turn t o the proof of Theorem 1, in stages. First the case 1 a <1
is considered.

Proposition 3 If a < 1, t h e n T h e o r e m I i s t r u e as stated.


Proof Consider

n
D
=
,-112 C E ~ x ~ - (cf.
~ Problem
/ ~ ~ 38). (10)
t=l
[Actually in this calculation we use the result of Problem 11 of Chapter 2,
especially the Cram&-Slutsky calculus. The reader should review it now if
not already done so.] We now assert that the d.f. of this r.v. tends t o N ( 0 , l ) .
In fact, let

Given q > 0, choose m >


1 such that a2(m+1) < q ( l - a 2 ) a P 4 . We fix this
m and produce an m-dependent sequence whose partial sums have the same
limit behavior as that of the A,. If an r.v. B,,, is defined by

then Var(A, - E n )5 ( - rn - a 2 ) . Hence for n


2 ) a~2(m+1)/(l - > rn,

D
Thus (A,/&) = (Em,,/&). Consider Y , = ~j C:=,a'~~-~-l. Then

since E(C,,,) = 0 = E(E,,,) and


5.6 Application to a Stochastic Difference Equation 381

as n + oo. Also, in (12) Var(C,,,) = n a 4 ( 1 - a2(m+1))(l - a2)-l,so that

>
a ( ~ , , , ) / n l / ~ + +m. Thus (5,
j I} is an rn-dependent sequence satisfying
the hypothesis of Proposition 2. Hence (10) and (12) imply that

which is N ( 0 , l ) . This completes the proof, on setting a2 = 1.

>
For the cases 1 a / 1, by hypothesis the ~i are N ( 0 , l ) . The exact d.f. of
g(n; a)(&,-a) can be derived in principle. This is accomplished by calculating
the moment-generating function (1ng.f.) of ( X I , . . . , X,), and then we shall
be able to find the limit m.g.f. of this quantity. By inverting the latter, the
limit d.f. of g(n; a)(&,- a) is obtained. Let us fill in the details.
Because the ~i are independent N ( 0 , l ) and EI, = X k - axkp1, the trans-
formation from the ~i to the Xi is one-to-one with Jacobiaii unity, aiid we
find that the density of the X k to be:

where A is the n-by-n symmetric positive definite matrix given by A = (azj)


with
a,i = 1 a2, + i = l , ...,n - 1
ann = 1, u ~ ( , + ~=) -a, i = 1 , .. . , n - 1.
aiid a,j = 0 otherwise. Now XI = ( z l , .. . ,x,) is the row vector (prime for
transpose).
On the other hand, (6, - a) can be written (c.f. (1) and (2) with k = 1)
as

with XI = ( X I , . . . , X,) aiid B a symmetric n-by-n matrix having 2 a on the


diagonal except for the nth element, which is zero, -1 for the first line parallel
to the diagonal and zeros elsewhere. Here C is an n-by-n identity matrix
except for the nth element, which is zero. The joint m.g.f. of the numerator
aiid denominator r.v.s in (14) is given by

= (271)-"/' k,, exp{-(112) (x'Ax) + u ( z f B x )+ ~ ( x ' C X ) }dzl . . . dx,


382 5 Weak Limit Laws

where D = A - 2uB - 2vC, which is positive definite if u, v are sufficiently


small, so that the integral in (13) exists. Since D is symmetric, it can be di-
agonalized: D = &'FA&, where Q is an orthogonal matrix, and FAis diagonal
with eigeiivalues X j >
0 of D. Setting y = Q'z aiid noting that the Jacobian
in absolute value is =1, we get for (15)

+ +
But if we let p = 1 a2- 20 2au, q = -(a! +
u), then writing D, for the D
which depends on n , we get by expansion (of D, = A, 2uBn 2vCn)
- -

with boundary values D l = 1 and D 2 = p - q2. Let 1-11, 1-12 be the roots of the
characteristic equation of the difference equation (17):

1 1 2 1/2.
p2-pp+q=0, sothat p 1 , p 2 = -2p * - (2p 2 - 4 q ) (18)

Substituting (19) in (16), we get the m.g.f. of the r.v.s for each n,. As we
shall see later, one can employ Cramkr's theorem (Theorem 4.3.2) to obtain
the exact distribution of h, - a! without inverting m(., .) itself. This is still
involved. So we turn to the asymptotic result.
For the limiting case, first observe that

aiid so consider the joint m.g.f. obtained from m(u,v) by setting

where D, is D with u, v replaced by tl/g(n; a) and t21g2(n;a),respectively.


We now calculate limn,, A(l,(tl,t2) using the conditions a ! < 1,= 1, or
> 1 in (20). Recalling the values of g(n; a) aiid expanding the radical in p l , p2
of (18), we get the following expressioiis with an easy computation:

Case 1 a ! <1
5.6 Application to a Stochastic Difference Equation

Case 2 Ial >1:

Case 3 1 a / = 1 :
fiat1
pl=l+-
n
+ 2 inf i + 0 ( n P 2 ) ,
-

Substituting these values in (20) and simplifying, one gets


lim M n ( t l l t 2 )
n i m

-
- I (1 2t2

[exp
-

(-%)
+
e x ~ ( t 2 t:/2)

- tf)P1/2

(cos26- *2 6
ifla1 < l

ifla1

~ i n 2 ~ ) - ifl a1
>1
/ ~ = 1.

(21)
[The calculations from (17) leading to (21) are due to John S. White.] Replac-
ing t l , t2 by itl and it2, we see that the limit ch.f. 4 is given by
$(tl, t2) = lim Atr,(itl, it,).
n-00 (22)
Incidentally, this shows that (corresponding to I a1 > I ) ,

is a ch.f., which we stated (without proof) in Example 4.3.5.


To get the desired limit d.f. of g(n; a)(&,- a), consider
F ( x ) = lim P [ g(n; a )(6, - a) < x]
n-00

= P[U - x v < 01 = P[U/V < x],


384 5 Weak Limit Laws

where
XIB,X XIC,X

Hence kiiowiiig the ch.f. of (U, V) by (22), we need t o find the d.f. of U/V
which gives the desired result. Here one invokes Theorem 4.3.2. With that
result, we get the density f of F , the d.f. of U/V, as follows:

= i e i v x v 2 2d [by (21) and (22)j


2

This was seen in Proposition 3, even without the normality of the ~ i The .
above simple calculatioii verifies the result with this specialization, and is in-
cluded only for illustration. It also shows how iiivariance principle is at work.

Case 2 1 a1 > 1 : In this form, we have already evaluated the integral in


Example 4.3.5, and it gives the assertion of the theorem, with E as N ( 0 , l ) .
Here the limit distribution depends on that of ~ ~aiid' the
s iiivariance principle
does not apply!

Case 3 1 a1 = 1 : In this case, one uses Theorem 4.3.2 and evaluates


the resulting integrals. It involves similar techniques, though a careful (and
a tedious) calculation using some tricks with trigonometric and hyperbolic
+
fuiictioiis [and 4 = (1 i)/z/Z, etc.] is required. We shall omit this here.
(It is given as Problem 40.) This then yields the desired result of (4). The
invariance principle is applicable here.
It remains t o prove (4'). Again note that

using the notation of (23). Hence the asserted density is obtainable from
Theorem 4.3.2 after getting the ch.f. $ of (u/@). This is seen by using the
illversion formula for (U, V) aiid simplifying the following where
5.6 Application t o a Stochastic Difference Equation 385

Here, on substituting for 4 from (21) [or(22)], one recognizes the ch.f. of
the gamma density in the simplification. The easy computation is left t o the
reader. Then by a straightforward evaluation, one gets

which is (4').
In the case that a < 1, (cf. Problem 38) we have
n
[(1- tw2)/la] xffl+ rr 2 = 1 in probability.
t=l
However, if I a1 > 1, then a slightly more involved calculatioii of a similar
nature shows that

D
where V > 0 a.e. Hence g(n; a)(&a ) + U/V; and since V is a positive r.v.
(which is not a constant), its d.f. is determined by that of the E~ Consequently
the limit d.f. of the estimators in this case depends on the initial d.f. of the
i.i.d. "errors." This is the substance of the last comment. With this the proof
of Theorem 1 is filially finished.

Remarks 4 The result of Theorem 1 for la1 < 1 holds for all ~i as stated
there, without specification of their d.f. This is a case of an invariance principle
for dependent r.v.s which we have not proved. Also, in the case that a = 1
one can apply Donsker's theorem itself, since the fuiictioiial

and (f' = df /dz is t o exist), has for its set of discontinuities zero Wiener
measure, and we may apply the corollary of Skorokhod's theorem. If the &i
have a symmetric distribution, then at-ki = for a ! = 1, and the same
reasoning holds. However, if a > 1, then no invariance principle applies.
Even in the best circumstances, the Lindeberg condition fails, and, as Theo-
rem 4.6 (Prokhorov) implies, an invariance principle caiiiiot be expected. The
last part of Theorem 1 is an explicit recognition of this situation. An extension
of Theorem 1 if X o is a constant (# 0) is possible and is not difficult.

We have not coilsidered the case that k > 1 in (1) here. It is not a simple
extension of the above work, but needs additional new ideas and work. A brief
discussion was included in the earlier edition. We omit its consideration now.
5 Weak Limit Laws

Exercises
1. Let X 1 , X 2 , .. . be independent r.v.s such that P [ X k = k3I2] = P [ X k =
-k3I2] = 1/2k and P [ X k = 01 = 1 - k-l. If Sn = x i = l X k , verify that the
sufficiency condition p(Sn)/a(Sn) + 0 as n + oo is not satisfied in Theorem
1.3. Show, however, that Sn/a(Sn)2 S, which is not N ( 0 , l ) .

2. For the validity of Liapounov's theorem existence of 2 6 moments +


<
is sufficient, where 0 < 6 1. Establish Theorem 1.3 by modifying the given
proof, with the condition that ( x r = l E(I + 0 as n + oo.
[Hint: Note that I ezz - 11 5 2 l P 6 X I , and hence for the ch.f. cjk of XI, with
E ( X k ) = 0, one has

3. Let { X n , n > 1) be independent r.v.s such that P[Xk = k"] =


12. = P [ X k = k " ] . Let Sn = C;=,Xk. Show that if a > -:, then
S, 3 Y, and if a = :, P [ Y < y] = (4/7r)'I2 J: c u ' d u , y > 0.
i,
Deduce that if a = WLLN does not hold for this sequence X k . [Use Euler's
formula for the asymptotic expressions, x i = l km n m + l / ( m + 1) for m > 0.1

4. Just as in Problem 2, show that the Berry-Essken result can be given the
following form if only 2 + S, 0 < S 5 1, moments exist (and the corresponding
p / a + 0): If the X,, n >
1, have 2+S moments aiid the r.v.s are independent,
let Sn = C i = , X k , p2+"Sn) = E ( xk2+" aiid 02(sn) = Var Sn > 0.
Then there exists an absolute constant Co[= Co(6)]such that if G(.) is N ( 0 , I ) ,

5. Let {X,, n > 1) be i.i.d. Bernoulli r.v.s, PIX1 = +1] = PIX1 =


n
-11 = 1, so that PISn = k] = (k) (1/2)", Sn = XI. Using Stirling's
approximation, n! = finn+('/') . e-n+811 , where Q, 1 < l/l2n, and letting
21, =(k n / 2 ) / m , show that, for each o o < a
- < 21, < b < oo, we have
as n + oo,

so that the order of the error in Theorem 1.5 cannot be improved.


Exercises 387

6. Let {X,, n >


1) be i.i.d. random variables with a common d.f. whose
density f is given by

where

D
If S, = Cy'l Xi, show that (logn/n)S, + S as n + oo,and find the d.f. of
S. [Hint: Use ch.f.'s and the continuity theorem.]

7. If in the above example the common d.f. has the following density:

D
find the normalizing sequence a , > 0 such that S,/a, + S as n + oo. Does
a, = [n/(2rn 1 ) 1 ' / ~work? What is the d.f. of S ? Discuss the situation when
-

m = 1 (the Cauchy distribution).

8. Give a proof, based on ch.f.s, of Proposition 2.2(b): If X is a bounded


liondegenerate r.v., then it cannot be infinitely divisible.

9. In contrast to the assertion of Corollary 2.4, if $ is a ch.f. which is not


infinitely divisible (but nonvanishing), then $'[= exp(X Log $)I need not be a
ch.f. for X > 0, where X is not an integer but an arbitrary positive real number.
The following two ch.f.s illustrate this. (a) Let 4(t; n ) = (q pe"))", p q =+ +
1, < q < 1. Show that for some real X > 0 , 4 ( . ;A) is not a ch.f. (b) A stronger
assertion is as follows. Let $(., ., m, n) be a bivariate ch.f., given for I pla 1 < 1
by

4 ( t l , t z ; rn, n ) = (1 itl)-mn/2(1 ita)-m'2


- -

x [I pf2tlt2/(1 i t l ) ( l i t 2 ) 1 ~ ' ~ .
- - -

This is known to be a ch.f. Consider (for pl2 # 0)

where M , N , R > 0 are real numbers. Verify, on assuming that it is in fact


a ch.f. and inverting, that its "density function" exists for Ip121 < 1, but is
-
negative on a set of positive Lebesgue measure, if Rp:, > min(M, N ) . Thus
4 is not a ch.f. for a contiiiuum of values of M , N, R. [This is not a trivial
problem, and it is first observed by W. I?. Kibble. Note that would have 4
been a ch.f. if $ were infinitely divisible, and the conclusion implies that it
5 Weak Limit Laws

cannot be infinitely divisible.]

10. Let {Xnk,1 < < k k,,n > 1) be rowwise independent r.v.s with
s n = C'd
kr' X n k ,E ( X n k ) = 0. Suppose that E ( s , ~ + " ) KO < oo for a
0 <6 < +
1 (so that each X n k has 4 6 moments by Problem 2 in Chapter
2). Suppose also that Var Xnk + 0 uniformly in k as n + cc (essentially
infinitesimal) and Var S, + a2 < cm. Show that S, 2 S, which is N ( 0 , a 2 )
iff E(S:) - 3[E(S;)l2 + 0 as n + cm. [Hints: Use Proposition 4.1.3 in one
D
direction; for the converse, infinitesimality implies that when S,, + S for any
subsequence, S must be iiifinitely divisible. Then the given moment condition
implies that in the Kolmogorov pair (a,K),a = 0, and K must have a jump
at z = 0 of size a2 and be constant elsewhere.]

>
11. Let ISn,n 1) be partial sums as in the preceding problem and with
4 moments, but assume that each S, is infinitely divisible. (This is the tradeoff
for S = 0 in the above.) Then Sn 2 S and S in N(0, a2)iff E(Si) + a2 and
E(S;) 3[E(S:)l2 + 0 as n + cm. (Since S is necessarily iiifinitely divisible
-

now, one can proceed as in the last problem again. The point of these two
problems is that the coiiditioiis are only on the moments and not on the d.f.s
themselves. These observations are due t o P. A. Pierre (1971).)

12. Establish the Lkvy form of the representation of an infinitely divisible


ch.f. q5; i.e., prove formula (20) of Section 2, or Theorem 2.6, in complete detail.

13. (a) Let 4 be an infinitely divisible ch.f. and (7, G) be its Lkvy-
Khintchine pair. Then show that q5(2"), the (2k)th derivative, exists iff sro0
z2'
dG(z) < cm. (This is a specialization of Proposition 4.2.6.) (b) Let $ be the
ch.f. of a d.f. Show that q5 = exp($ 1) is an infinitely divisible ch.f. Hence,
-

4
using the remark after Theorem 4.2.4, conclude that q5, + # O,q5, infinitely
divisible ch.f.+4 4
is a ch.f. [But if is a ch.f. theii it must necessarily be
infinitely divisible, by Proposition 2.3.1

14. Let X be a gamma r.v. with density fap given by

and =O otherwise. Show that X is infinitely divisible and the Lkvy-Khintchine


pair (y, G) of its ch.f. is given by
00
dz G(z) = {;Jt&epfidufor z>O
<
for x 0.

(First define yn, G, as in the proof of Theorem 2.5 and theii obtain y , G by a
limit process.)
Exercises 389

15. For the same d.f. as in the preceding problem, show that the corre-
sponding Kolmogorov and Lkvy pairs are (y, K) and (y, a', M , N) where

16. The following interesting observation is due to Khintchine on a relation


between the Riemann zeta function and the theory of infinitely divisible ch.f.s.
Recall that the zeta function ((.) is defined by the following sum (and the
Euler product):

where the product is extended over all prime numbers pi > 1. If 4 is defined
+
as 4 ( t ) = <(a it)/<(o), show that q5(.) is an iiifiiiitely divisible ch.f. (Note
that q5 never vanishes, and Log q5 can be expressed, with the Euler product,
as a limit of suitable Poisson ch.f.s)

17. The definition of infinite divisibility may be stated in an apparently


weaker form. Since X is iiifiiiitely divisible iff for each n, X = EL=, Xnk,
where Xnk are i.i.d. for 1 5 k 5 n , we may relax the identically distributedness
as follows. Say X is infinitely divisible in the "generalized sense" iff for each
E > 0, there exists an n [ = n ( ~ )and < <
] independent r.v.s X k ,1 k n, such that
X = xi=, > < < <
XI, and P [XkI E] & , 1 k n. Alternatively, if q5 is the ch.f.
of X , then q5 = n;=, q5k, where q5k is the ch.f. of Xk with 11 q5k(t)I < E for
-

I t 5 1 / ~SO, that both n and the 4k depend on E . We then have the followiiig
assertion: The r.v. X (or its ch.f. q5) is iiifiiiitely divisible in the generalized
sense iff it is infinitely divisible in the sense of Definition 2.1 [The method of
proof uses several estimates on ch.f.s. That the ordinary + generalized sense
is easy. For the reverse implication, note that 4 never vanishes, and the proof
proceeds by centering the r.v.s at their medians and writing:
n
Log 4(t) = iy,t +C +
( t )- 11 o(1),
[$j
j=1
$j(t) = E(exp{it(Xj - mj)}) with rnj as a median of X j ,

so that
5 Weak Limit Laws

+k (e'tx - 1 - -)
itx
1+x2
1 + x2
-ci~,(z) o(1)
x2
+
for suitable y,, F,, G,. Since E > 0 is arbitrary, let E \, 0, and show that the
right side gives the L&y-Khintchine formula, and hence the converse. The
details need care as in the necessity proof of Theorem 1.5. This result is due
to Doob (1953). Because of this result, "in the generalized sense" is omitted.]

18. Let X be an r.v. with density function f , given by

If X n k is distributed as X for each 1 5 k 5 n , show that, with Proposition


3.2 or otherwise, the X n k ,1 k< < n , are infinitesimal. Let S, = C r = l Xnk
D
Ascertain the truth of the following: S, + S as n + cm and S is a Poisson
r.v. with parameter X = 1. [Even though the result can be established by
finding the limit of the ch.f. of S, by a careful estimation of various integrals,
it is better to use Theorem 3.8 after verifying that the Lkvy-Khintchine pair
>
(y, G) for a Poisson d.f. is given by y = X/2 : G(x) = X/2 if x 1,= 0 if x < 1.1

19. The theory of infinitely divisible d.f.s in higher dimensions proceeds by


essentially the same arguments as in R. Thus the (integral) representation of
the ch.f. q5 of a pdimensional r.v. is given by Lkvy's form (by P. Lkvy himself)
as
it'x
4(t) = exp {iy't -
1 x'z +
for t E Rp, where t = ( t l , . . . , t,)' (prime for transpose) is the column vector,
y' = (71,. . . , yp) with yi E R, (ylt = C f = l yiti), r = (oij) is a p x p positive
(semi-)definite matrix of real numbers, and v is a measure on the Bore1 sets
of Rp such that

2/11:u(dx) < cm, u(dx) < cm.


ixlx<ll

The factor exp{iylt - i t ' r t ) is called the Gaussian component of q5 and the
rest the generalized Poisson component. The correspoiidiiig Kolmogorov form,
if X has two moments finite, is given by
Exercises 39 1

where K({O)) = 0, and the measure K is finite on the Bore1 a-algebra of RP.
Here the vector ;jl is given by

Verify these two forms followiiig closely the proof in the one-dimensional case.
In the finite variance case, show that the mean and covariance matrices are

Deduce that if X is a p-vector which is an infinitely divisible r.v. and a E RP,


then Y = a l X is again illfinitely divisible. [Many of these limit theorems ad-
mit immediate inultidiineiisioiial exteiisions but not automatically. The next
problem shows that the converse of the last statement is not always true.]

20. We now present an example, also due to P. Lkvy, showing that the
converse of the last statement of the preceding problem is not true. Further,
this result has additional interesting information. Let X , Y be iiidepeiideiit
<
N ( 0 , l ) r.v.s aiid E = X 2 ,q = 2XY and = Y2. Then show that any two of
the three r.v.s (<, q, 5) have joint d.f. which is infinitely divisible, but that the
joint d.f. of (t,q, 5) is not. Deduce that if each linear combination Cf=la i t i
is infinitely divisible ( a l , . . . , ap) E RP, then one caiiiiot coiiclude that the
vector (El,. . . , t p ) , p > 1, must be infinitely divisible. [Sketch: By the image
law theorem, the ch.f. q5 of (6, q, <) is

<
Since 6, are independent and each is infinitely divisible being gamma (or
Chi-square variables), so is the pair (E, <). It suffices to verify that (E, q) is
also. From the Lkvy canonical form for <, one gets

aiid integrating relative to u from 0 to z, rearranging, and using the ch.f. of


N ( p , 02)suitably, we get
392 5 Weak Limit Laws

The last integral in (2) is known t o be = v-1e-v/2. Substituting this in (2)


and using ( I ) , one has

+ /R(e2ita - l)g(u) du,

where

and g(v) = Iuplev if v < 0, = vpl (epv - e p v A ) if u > 0. Comparing this with
the Lkvy representation of Problem 19, we get that (<, q) is infinitely divisible.
+ +
From this one concludes that a l < aaq asc is also infinitely divisible for any
( a l , a2, a3) E R3.
To see that (El q, <) itself is not infinitely divisible, suppose the contrary.
Then, by the representation of 4 (cf. Problem 19), since E , q , < depend on
the infinitely divisible r.v.s without Gaussian components, Log 4 is a sum
of terms of the form SR(eit'" - l)v(dz), each a Stieltjes integral. Looking at
terms for tl = 0, t 2 = 0, or t3 = 0, one concludes that this is the exponent
of the generalized Poisson part, aiid thus it must be a sum of functions of a
form that depends on ( t l , t 2 ) ,(t2,t3), aiid ( t l , t3) alone. Hence, in particular,
<
it must satisfy (6, q, having all moments finite)

But the q5 of our example does not satisfy this, so that it cannot be infinitely
divisible. Thus there are more surprises than one ordinarily expects from these
r.v.s. Incidentally, this implies that the distribution of the sample covariance
matrix of a random sample from a multivariate normal r.v., called the Wishart
distribution, is not illfinitely divisible. Also compare this with the pathological
example given by Problem 9.1

21. Let X = ( X I , . . . ,X,) be a pvector which is infinitely divisible. If X


has four moments finite, E ( X ) = 0, and q5 is its joint ch.f., let

a4 Log 4
4 ( X i 7 X j )= at: at: Its=o=t7 , 1 < i , j < p
Exercises 393

>
Show that q ( X i , X j ) 0. Verify that if all the Xi are bounded below by a
constant (so that its ch.f. has no Guassian component), then the Xi are inde-
pendent whenever they are uncorrelated. Without the boundedness hypothesis
(using the Kolmogorov representation) show that the components X I , . . . ,X,
>
are independent iff X f , . . . , X i are uncorrelated. [Hints: Since q(Xi, X j ) 0,
the uncorrelatedness of the X9 implies that CiZj q ( X i , X j ) = 0. Thus the
measure K ( . ) must concentrate on the axes xi = 0, i = 1 , . . . p , and implies
the uncorrelatedness of Xi and X j , i # j. The converse is clear. Several spe-
cial properties of such r.v.s with four moments have been discussed by Pierre
(1971).]

22. If X = ( X I , . . . , X,) is an infinitely divisible random vector, show, by


using the Lkvy form of Problem 19, that X I , . . . , X, are mutually independent
iff they are pairwise independent. [This is also due t o Pierre (1oc.cit.) with fi-
nite variances so that Kolmogorov's form is applicable. The idea is essentially
the same even here since one verifies that the Lkvy measure must colicelitrate
on the axes. In general the finiteness of various integrals must be verified as
in the proof of the Lkvy (or the Lkvy-Khintchine) representation.]

23. Complete details of the remark after Proposition 3.13 regarding the
case of the characteristic exponent a = 2 (namely, q5 is nondegenerate,
a = 2 + q5 is a normal ch.f.)

24. This problem and the next show that the bounds on a , p in Theorem
3.16 are necessary and complete the proof there (at least for the case 0 < a <
2, a # 1).Let us write the ch.f. 4 of Eq. (44) of Section 3 as follows:

where t > 0 , 0 < a < 2 , a # 1, and c > 0. It is t o be shown that q5,, is


a ch.f. only if 0 < a < 2 and P < tan(xa/2)1. For convenience, express
p = tan x r l 2 , r l < 1, and thus for the above it suffices t o establish that
(since $,,p is a ch.f. iff go,, defined below is) the following is a ch.f.:

Let $,,, be a ch.f., and consider the density

using q5,,p, consider the density

and f/3 depending on whether t > 0 or t < 0.


394 5 Weak Limit Laws

(a) Verify that g(x; a,P) = g(-x; a, -P), and if < a! < 1 and P >
tan(.ira/2), then g(0; a,P) < 0. For this note that

by a known integral formula. Writing 1 iP = repie, 0 < 0 < 7r/2, so that


-

g(0; a , ,!?) = (2/a)r(1/a)rp11Ycos(Q/a),deduce that P > tan(7ra/2) implies


the assertion.
<
(b) If 0 < a 113, tan(.ira/2) < P < t a n ( 3 r a / 2 ) , then again g(0; a , P) <
0. Indeed, in the last expression for g(0; a,P) use the fact that the limits for
,!? imply that 7ra/2 < Q < 37ra/2, aiid hence cos(Q/a) < 0. Thus the assertion
follows.
(c) Using (a) and (b) and the fact that g is a contiiiuous function, deduce
that it cannot be a density, so that q5,,p for 0 < a < 1, p > tan(.ira/2), is
not a ch.f.

25. (Continuation) The result for 1 < a < 2 is somewhat -more in-
volved. First verify that p(x : a,y) = p(-z; a , y ) as before, since $,,,(t) =
$Y,, (-t).
(a) Let 0 < a, -1 < y < min(2a! - 1 , 1 ) ; then for each x > 0

for, on integrating by parts, note that [with a change of variable u = (tx),b]

where f (z) = exp(i.iryI2 - izll" - ~ x - " e " ~},~with


/ ~ z = u + i v . Evaluate
the integral by the residue theorem along a sector of angle Q such that 0 <
+
Q < (7r/2)(y 1) aiid Q/a< 7r. Since 1 < y < min(2a 1, I ) , this sector is
-

nonempty. Setting z = repie, 0 < r < oo, aiid noting that

where rR= {z = Repiu : 0 < u < Q},we get

From this the desired relation follows after a computation.


(b) With the result of (a), show that for 1 < a! < 2,$,,, or q5,,p
is not a ch.f. when P > tan(7ra/2)~,for, letting P = ta11(7ry/2), with
y l < 1, this is equivalent to y l > (2 a ) . If 1 < y < 2 a < 1,
- -

then (a) is applicable. If P* = tan 7ry*/2, where y* is as in (a), then


Exercises 395

IP* > t a n ( ~ / 2 a )iff y*I > a-' > 2-l. But if y*l > a-l, then, by Problem
24a, g(0; a!-', P*) < 0. Thus p(0 : a-l, y*) < 0. Hence if P < - t a n ( ~ a / 2 ) ,
then y < a! - 2 + y* < -l/a, and so p(x : a , y ) < 0 as x + oo. By the
contiiiuity of p, this again shows that 4,,p is not a ch.f. However, the set
{ p : 4,,p = a ch.f.1 is convex, aiid symmetric around the origin. This shows
that if p > I t a n ( ~ a / 2 ) 1then
, $,,p is not a ch.f., and completes the necessity
(a! = 2 being immediate, since then /3 = 0). [This argument, which did not use
the theory of infinitely divisible d.f.s, is due t o Dharmadhikari and Sreehari
(1976).]

(c) We also have a multivariate formulation of stability again due t o P.


Lkvy (1937). A random vector X = (XI,.. . , X,) is symmetric stable of type
a , 0 < a 5 2, if its ch.f. is expressible as:

= exp {- tlhl + . . . + tnhnadG(h1, . . . , hn)}, ti E R,


where G is a (a-finite) measure on (Rn,B) whose support is on the surface of
the unit sphere S of Rn. Verify that this reduces t o Theorem 3.16 for n = 1.
If we define a fuiictioiial I . 1 , as

then show that it is a metric on all such random variables (and a norm if a >
1). [This is a standard Minkowski metric, cf. e.q. Rao (1987, or 2004), Theorem
4.5.4 and Proposition 4.5.6.1 The result shows that each linear combination of
X j (Cy=l ajX3) is a symmetric a-stable random variable (e.g., take ai = biu
in the above). Show that if one defines on the measure space (R, B, G ) , [B
being Bore1 a-algebra] a probability measure R whose Fourier transform R is
given by

where Bo = {B E B, G(B) < oo), then we have

Show that for disjoint Ak E Bo, A = U;=, Ak, R A,...,


~ A,, = R A ~ @ . . . @,,,
RA
a product measure. Using Theorem 3.4.10 (or ll), conclude that there is a
probability space (R, C , P) aiid a (random) mapping p : Bo + L O ( P )such
that the ch.f. of p ( A ) , A E Bo, is given by
396 5 Weak Limit Laws

and moreover RAl, .,A,, = P o TA;, ,*,,, TA,, ,A,, : fl + Rn being the coor-
dinate projection, guaranteed by Theorem 3.4.10. Finally, conclude that the
family of finite dimensional distributions (or image measures) of {p(A),A E
B o ) is precisely I R A ,A E B O )which is a symmetric stable class with the prop-
erty that the values of p are pairwise independent on disjoint sets, hence, also
mutually independent in this case, and p(A) = C;=+(An),An E Bo with
A = Ur=lAn, a disjoint union, the series converging in probability (whence
a.e. by Theorem 4.6.1). [A brief application of such random measures will be
given in the last chapter, aiid the resulting analysis plays an important role in
certain parts of probability as well as potential theories. For many exteiisioiis
and generalizations of these results to Banach space valued functions, the
book by Linde (1986) is a good place to turn to. Contrast the conditions here
with Exercise 20 above, in obtaining a multivariate extension for the subclass
under consideration.]

26. This aiid the next problem present an extension of the stable distri-
bution theory and help in understanding that subject further.
Call a continuous mapping $ : R+ + C stable if (i)$(O) = 1, ( i i ) $(t)I <
1,
and (iii) for each n(= 1 , 2 , .. .), there exist a, > 0 and b, E R such that

Here 4 is not assumed to be positive definite. Thus even if we extend the


definition to R by setting $(-t) = m, > t 0, q5 is not necessarily a ch.f. We
now characterize such stable functions, following Bochner (1975).
Let q5 : R+ + C be a continuous mapping. Then it is a stable function in
the above sense iff the followiiig two coiiditioiis hold:

(i) 4 never vanishes, so that $ = Log 4 is defined aiid is continuous,


$(0) = 0, and
(ii) either $(t) = iCt, C E R (degenerate case), or there exists an expo-
nent 0 < p < oo such that an = nllp (in the definition of stability) aiid

[The proof of this result proceeds in all its details similar to the argument
+
given for Theorem 3.16. Starting from the identity n$(t) = $(ant) ibnt, aiid
hence for all rationals r > 0, one gets

Then one shows that { a ( r ) , r E R+) is bounded; otherwise the degener-


ate case results. In the nondegenerate case, one shows that ultimately, if
Exercises 397

+
p(t) = ($(t)/t) - $(I), then p ( a p ) = p ( a ) p(P), and the familiar func-
tional equations result, as in the text. For another generalization of stability,
see, Ramachandran and C.R. Rao (1968).]

27. (Continuation) A fuiictioii 4 : R+ + C is minimally positive definite


if for all triples t l = 0, t2 = t, t3 = 2t, 0 < t < cm, we have

Let $ be a stable function (as in the preceding problem) which is minimally


positive definite. Then in the representation of $ of the last problem, 0 < p 5
2, the coilstants A, B satisfy the restraints A > 0, B E R)

for 0 < p < 1 , 1 < p 5 2, and dl = (log2)-l. [Since the matrix (q5(t, - t j ) , 1 5
+
i, j 5 3) is positive definite, letting u = $(t), v = $(tl), w = $(t t'), where
t', t > 0, we get from its determinant

which gives, if t' = t, l u 2 - Iw 5 Iw - u 2 5 1 - uI2 + 1 - $(2t)I2 5


4(1 - Iq5(t)I2).Substitute the value of q5(t) from Problem 26 in these expres-
sions and simplify when 0 < p < 1,1 < p <2, and p = 1; then the stated
bounds result after some limiting argument. If 4 is assumed to be fully positive
definite, after extending it to R, using the argument of Problems 24 and 25,
can one get the precise bounds stated in Theorem 3.16?] It will be of interest
to extend Bochner's stability analysis to the multivariate case as in Problem
25(c) above. It opens up another area of investigation extending P. Lkvy's
pioneering work.

28. Let L be the class of d.f.s F which are the d.f.s of r.v.s S such that for
some sequence of independent r.v.s X I , X2, . . . , S, = Xk, then (S, -
D
b,)/a, + S, where b, E R and 0 < a, + cm, with a,+l/a, + 1. The
problem of characterization of L was proposed by A. Khintchine in 1936 and
P. Lkvy has given a complete solution in 1937. Thus it is called the L (or
Le'vy) class. Show that F E L iff the ch.f. 4 of F has the property that for
each 0 < n < I,$, is a ch.f., where $,(t) = q5(t)/q5(at), and hence q5 is
infinitely divisible. [Note first that $ cannot vanish. Indeed, if $(2a) = 0 for
some a > 0 such that q5(t) # 0,O 5 t < 2a, then $,(2a) = 0 and

(see the inequality in the last problem for q5 satisfying [even minimal] positive
definiteness). But $,(a) + 1 as a! + 1, for each a E R, by the continuity of
5 Weak Limit Laws

4, and this contradicts the above inequality.


Sketch of Proof Using

and each factor is a ch.f., by hypothesis, of an r.v. X k , and replacing t by ~ / n ,


letting n i oo, the right-side product converges t o 4, which is a ch.f. This
D
means ( l l n ) Ci=,X k i S, where the ch.f. of S is 4. Conversely, consider
S: = (lla,) C r = l X k b,. The ch.f. 4, of S: can be expressed as
-

where $,,, is the ch.f. of (CL=m+l X k - b, +


b,)am/an. Letting m, n + cc
so that amla, + a,O < a! < 1 (which is possible), since, by hypothesis,
&(t) + 4(t) as n i oo aiid 4 never vanishes, we get

and it is continuous at t = 0. Thus it is a ch.f.=$, (say). This gives the neces-


sity. The first part shows that for each n, 4 is the product of n ch.f.s and that
4 ( k t ) / 4 ( ( k p 1 ) t ) i 1 as k + oo. By Problem 17, 4 is infinitely divisible. This
fact also follows from the result that the ( l / n ) X k ( = X n k ) are infinitesimal,
and hence their partial sums can only converge t o infinitely divisible r.v.s.1

29. Let X be an r.v. with a double exponential (also called Laplace)


density fx given by

Find its ch.f. $. Show that fx is of class C but that 4 is not a stable ch.f.
[Hint: Observe that an exponential density is infinitely divisible, and using
the result of Problem 14 with the Lkvy-Khintchine or Kolmogorov canonical
form, deduce that fx is of class C.With the help of Theorem 3.16, conclude
that 4 cannot be in the stable family, thus the latter is a proper subset.]

30. If {X,, n >1) is an i.i.d. sequence of r.v.s with d.f. F and if S, =


X k , then F is said t o belong t o the domain of attraction of V, provided
that for some normalizing constants a, > 0 and iiuinbers b, E R,(S, -

D
b,)/a, + whose d.f. is V. Thus if F has two moments, then [with a, =
d m ,b,
= E(S,)] F belongs t o the domain of attraction of a normal
law. Show that only stable laws have iioiiempty domain of attraction aiid that
each such law belongs t o its own domain of attraction (cf. Proposition 3.15).
Exercises 399

If in the above definition the full sequence {(S, - b,)/a,, n 1) does >
not converge, but there is a subsequence n1 < n2 < . . . such that (S,, -
D
b,,)/a,, + W, then the d.f. H of W is said to have a domain of par-
tial attraction, with F as a member of this domain. Establish the follow-
ing beautiful result due to Khintchine: Every infinitely divisible law has a
iioiiempty domain of partial attraction. [Sketch (after Feller): If q5 is an iii-
finitely divisible ch.f., then q5 = e i by the Lkvy-Khintchine formula, aiid in fact
4(t) = lim,,, nr=, ei:'(t), where the right side is a Poisson ch.f. ( a "gen-
eralized Poisson") and gn(0) = 0 = $7(O), the $a being continuous. Each
$7 is bounded. As a preliminary, let q5k = e c ~be any sequence of infinitely
divisible ch.f.s with each & bounded. Let X(t) = Ck,, n;'& (akt). Choose
ak > 0, nk < nk+l < . . . such that In,A(t/a,) <,(t)l+ 0. Indeed, choose
-

first a sequence of integers such that (nk/nkPl) > 2'" supt,R Ick(t). Then

Now choose a1 = 1, aiid after a l , . . . , a,-1, let a, be so large that for I t < r
[since &(7) + O as 7 + 01 <k(t)I < (2r2n,)-l. With this choice, the right side
is < r-l, so that the left side + 0 as r + oo.Next, since the given q5 is a limit
of 4, = nr= e$:'(t) = ecr7(t)(say), so that <,
are bounded and continuous,
&(t) + 0 as t + 0. Define A(.) with this <,
in place of 5, above. Then for a
choice of nk, we have eX() infinitely divisible, and

lim exp{n,X(t/a,)> = lim exp{<,(t)} = $(t).


r-00 r-00

Since exp{n,A(t/a,)) is a ch.f. tending to the ch.f. q5, it follows that eX(.)be-
longs to the domain of partial attraction. The original proof is more involved.]

31. Let {Xnk, 1 k < < k,, n >


1) be a rowwise independent sequence of
infinitesimal r.v.s such that E ( X n k )= 0, t i k = Var Snk,where

Suppose the sequence satisfies the Lindeberg condition (as in Theorem 4.6).
Show that, following the argument of Theorem 4.7,

4
.rr k=o (2k + 1)-' exp {- ( 22(a+b)-
k + 1 ) 2 ~ 2 } sin
(2k+l)ar

if a > O , b>O

IO, otherwise.
400 5 Weak Limit Laws

[Hints: First show that the limit exists as in Theorem 4.7. To calculate that
limit, again consider the special r.v.s P [ X n k = +1] = P[X,k = -11 = 21.
Letting a, = [a&] + 1, p, = [ b f i ] +
1, verify that

where

with

+
the binomial coefficient being zero if ( n k)/2 is not an integer. Verify that
CkAk, tends t o the desired limit as n + ce on using the central limit theo-
rem. The Ak, are essentially planar random walk probabilities.]
Deduce from the above that

I 0, otherwise.

32. Let F be a continuous strictly increasing d.f. and F, be the empiric


d.f. of n i.i.d. random variables with F as their d.f. Establish the following
"relative error analog" of the Kolmogorov-Smirnov theorem for 0 < a < 1 :

i m I/
n-oo
sup
a<F(x)
I &(Fn (5)- F(x))

F(z)
where

[Hints: As in Theorem 4.8, take F as uniform, and H,(z) = Fn(Fpl(z)) the


einpiric d.f. of the uniform r.v.s Y , = F ( X i ) . Note that (with V = max)
Exercises 40 1

< <
where Y i , 1 k n , are order statistics from Y,, 1 < i < n. Thus it suffices
to calculate the limit d.f. of

Use the last part of the preceding exercise for this. The details are similar
to those of Theorem 4.8. See Rknyi (19531, in connection with both of these
results, i.e., this and the preceding one.]

>
33. Let {X,, n 1) be i.i.d. random variables with S, = Cr=lX k , So=
0. In the study of fluctuations of the random walk sequence {S,,n 0) in>
we~ had to analyze the behavior of R, = max{Sk : 0 k
Section 2 . 4 ~ < <
n).
In many of these random walk problems, the joint distributions of (R,, S,)
are of interest. Let &(u,v) = E(exp{iuR, +
ivS,)), the ch.f. of the vector
(R,,S,). If S2,S; are the positive aiid negative parts of S, [i.e., S$ =
+
(IS, Sn)/2, S; = Sk - S,], let $,(u, v) = E(exp{iuSi ivS,)). Show +
that one has the identity

If c,(u, v) = E(exp{iuR, +iv(R, S , ) ) ) , p,(u) = E ( e x p { i u S ~ ) ) aiid


, filially
q,(v) = E(exp{ivS; )), then verify that (*) is equivalent to the identity

In particular, if c,(u) = c,(u, 0) = E(exp{iuR,)), then (+) gives

The important identity (+) [or equivalently (*)I, and hence (*+), was ob-
tained by Spitzer (1956), aiid is sometimes referred to as Spitzer's identity.
He established it by first proving a combinatorial lemma, aiid using it in the
argument. Here we outline an alternative algebraic method of (*), which is
simple, elegant, and short, due to J. G. Wendel, who has obtained it after
+
Spitzer's original work. Since 4, (u, v) = c, (u v, -v), we get (*) (+).

Proof Let G,(x, Y) = P[R, < x, S, < y], and note that

Rn+1 - max(R,, S,+X,+i) and S,+i = Sn+Xn+l, so that {(R,, S,), n 0) >
is a Markov process. Also if A> :" [R, < x, S, < y], then by a simple
property of coiiditioniiig (cf. Proposition 3.1.2), we have
5 Weak Limit Laws

Thus, going to the image spaces, this becomes, if F is the common d.f. of
the Xn, (using a inanipulatioii as though the conditioning event has positive
probability, cf. Kac-Slepian paradox, Section 3.2, and thus we get the follow-
ing which needs a rigorous justification, which may be provided since all the
conditional measures are regular in this application, and since Spitzer's orig-
inal argument does not use the conditioning! See also Example 3.3.3 (b)) We
outline the argument as follows:

To go from here to (*), we introduce an algebraic construct on TK2. Namely,


let M be the space of all two-dimensional d.f.s on Kt2, with convolutioii as
multiplicatioii and linear combinatioiis of such d.f.s. [Then M is essentially
identifiable with all signed measures on &t2.] With this multiplication, M
becomes an algebra. Using the total variation as norm, M actually is seen to
be a complete normed algebra with the property that G, H E M + I G * H I <
1 I GI . I H 1 , where G * H is the convolution and I G I is the total variation
norm of G. [Thus (M, I .II) is a "Banach algebra."] If S is the degenerate
distribution at the origin, then S * G = G * S = G and S E M is the identity of
this algebra. If G E M, I G I < 1, then Gn/n! E M (by completeness),
where Gn = G*G*. . .*G (n times). This is denoted exp G(E M) (which holds
for all G E M) and similarly log (6-G) = - Gn/n E M (for I G I < 1.)
One can now verify that exp {log(& G)) = S- G for G with I G I < 1.
-

Next define two linear operators L, M on M into itself by the equatioiis


LG = F * G, where F is the common d.f. of (0, X,) ( the X, are i.i.d.1,
and (MG)(x,y) = G(x,min(x,y ) ) , G E M. It is clear that M 2 G = M G , so
that M is a projection, and M 6 = 6. Also 1 M G l < I GI and 1 LGl 1<
1 I GI 1 , so that L, M are contractions. Further M ( M ) and (I M ) ( M ) are
-

closed subalgebras, M = M ( M ) @(I M ) ( M ) . Since M G E M ( M ) , we get


-

exp{MG) E M ( M ) ,exp {(I M ) G ) E (I M ) (M).Thus for each G E M


- -

there is a G E M such that exp{(I - M)G} - 6 = (I- M)G. This completes


our introduction of an abstract algebra structure.
Let us now transform the recursion relation (i) for the Gn into this
abstract situation. Now (i), in the formal computation, is expressible as
G,+l = MLG,, and (by iteration) G, = (MLIn6, where Go = 6. Thus
for I t < 1,
Exercises 403

Ctn~,, Ct= " ( ~ ~ =) (I-ML)'~


~ 6 =G (say). (ii)
n>O n>O

To derive (*) from (ii), we first assert that

(iii)

for (ii)+ G - tMLG = 6, so that M G - t M L G = M 6 = 6, since M2 = M.


By subtraction, M G = G, and also M [ G - tLG] = M [ G - tE' * GI = 6. An
element G satisfying the last pair of equations is unique, since if G and G'
are two such elements, then MG' = GI, MIG1 tLG1] = S +- G' tMLG' =
- -

S +- G' = (I tML)-lG = G by (ii). To exhibit a solution, consider G =


-

exp{-M(log(6 - tE'))) E M. Then, since M(log(6 - t F ) ) E M ( M ) , we get


G E M ( M ) , so that MG' = G'. Also, because the convolution on TK2 is
commutative (F * G = G * E'), we get

aiid so (multiplication between d.f.s being convolution)

(6 - t F ) G = exp(log(6 - t F ) ) exp{-M(log(6 - tF)))


= exp{log(6 - tF) - M(log(6 - t ~ ) ) )
(by the commutativity noted above)
= exp{(I - M ) log(6 - tF)}
=6+(I-M)G1

for a G1 E M, since log(6 - tE') E M. Hence

Thus G is a solution and so is G, + G = G in (ii). But then

G = G = exp{-M(log(6 - tE'))) = exp


{
Ctn~(Fn)ln
n }
This gives (iii), after (i) and its consequence (ii) are rigorized.
To establish (*) from (ii), let us take Fourier transforms on both sides.
Note that the Fourier transform F of a convolution on TK2 is the product of
the transforms aiid also I F ( G ) I , 5 I GI . Thus the left side of (ii) becomes
404 5 VL7eak Limit Laws

For the right side,

But we see that for any bounded Bore1 function h on TK2,

1STK2 h(z, y)(MG)(dx, dy) = 1lTK2 h(x,y) dG(x,min(x,l/))

Taking h(z, y) = exp{iux + ivy} in (v), we get

=IS,. exp{i(umax(x, y) + vy)} d F n ( z ,Y)


(by definition of M)

exp{iu max(0, S,) + ivS,} dP

[since F is the joint d.f. of (0, X I )


and then the image law relation is used]

Substituting this in (iv) and using it in (ii) and (iii), (*) follows. [Can this
method be used to solve the integral equation of Exercise 2 1 in Chapter 3? In
both cases the method of (ij is to be justified with the Kolmogorov definition
of conditioning, without taking the inanipulatioii for granted.]

>
34. Let {X,, n 1) be an i.i.d. sequence as in the preceding problem, and
Sn = cL=1X k , So= 0, and Sk be the positive part. Show that

(a) (M. Kac-G. A. Hunt)

and
Exercises

(b) (E. S. Anderson)

[Hints: For (a) differentiate (* +) of the above problem relative to u at u = 0,


aiid identify the coefficient of tn on both sides. Similarly for (b), get the re-
sult for inaxk<, S i by replacing X, with X , , then setting u = i X , and
letting X + +A. These and other specializations and applications are given
in Spitzer's paper noted above. Further applications of these results in ob-
taining precise coiiditioiis for a unique solution F of the Wiener-Hopf inte-
gral equation F ( x ) = Jr
f (x y) d F ( x ) , where F is a d.f. on R+ and f (.)
-

is a probability density on R can be given. In fact, this equation may be


D
written, a little more generally, in terms of r.v.s as Y = ( X Y)+, where +
X , Y are independent and F is the d.f. of Y, f being the density of X.
Thus if X n , n >1, are i.i.d., So = 0, S, = C;=, X k , let Yo = So and
Yl = max[Yo,S1]= x?,
. . . , Y, = max(So,S1,.. . , S,). The existelice of a
solution is equivalent to proving that Y, + Y a.e. aiid Y < oo a.e. This may
be shown to hold if the series (* +) of the above problem converges when
t = 1, and there will be no solution if it diverges at t = 1.1

35. Let S n , n >


1, be the partial sums of i.i.d. Bernoulli r.v.s X,, with
>
P[X, = 11 = p, P[X, = 01 = l p = q , 0 < p < 1. If {4(n),n 1) is a positive
monotone increasing sequence, then show that P[S, > np+ m q 5 ( n ) , i.o.1=O
or 1 according t o whether E n > [q5(n)I n ] exp{- $ q52 (n)} converges or diverges.
Verify that the latter series converges iff

converges, where n, = [exp{r log r}], the integral part of the number shown. If
cj2(n) = 2 log log n , then one obtains the LIL for the Bernoulli r.v.s. [Hints:Let
A, = [S, -np > m q 5 ( n , ) for some n , n, < n < n,+l]. Verify that P(A,) I
, ) some coilstant 0 < C < oo. By hypothesis, this
~ e x ~ { - ~ q 5 ~ ( n , ) ) q 5 ( nfor
is the r t h term of a coilvergelit series, so that C,,, -
P(A,) < oo. For the
second part, which is more involved, note that n, - n,-1 n T ( l- (1ogr)-'),
and that [cj(n)/n]exp{-; cj2(n)} < cc only if q52(n) > 2 log logn, and
it diverges if ' ( n ) < 2 log log n. By various estimates deduce that for some
positive constants C1, C2,

and hence there are positive constants C3 and C4, such that

C3
I -1 log -
n,+l
2
406 5 Weak Limit Laws

Define another sequence mT < mr+l < . . . such that

and show that cj(mr)/q5(nr) 1. This gives after a careful estimation of terms

and

These imply the last part, and then the probability statement obtains. This
result if p = q = and q52(n) = 2X log log n was first proved by P. Erdos, aiid
the case of 4 for more general r.v.s is due t o Feller (1943), as noted in the
text. Regarding this problem, see also Feller (1957).]

The following problems elaborate and complement the special application


discussed in Section 6 in illustrating the invariance principle.

36. Let {E,, n >


1) be i.i.d. random variables with means zero aiid vari-
ances one. If X n = &,&,+I, show that {X,, n >1) is a sequence of uncorre-
lated r.v.s with bounded variances and that ( l l n ) x i = l XI, + 0, a.e. and in
mean. (Use Theorem 2.3.4.)

37. If X, = EL=, I a1 < 1, aiid the ~k are i.i.d., E ( E ~=)


D -
0, E ( E ~
=)1, aiid if = E;=, nk-'€I,, show that X, = X,, that X; + y2
a.e., and that E ( Y 2 ) = (1- a 2 ) - I ; note that Y is not a constant r.v. Deduce
that (1112)x r = l xZPl+ Y' a.e., and that E [ ( l / n ) xZ] (1 a')-'.
+ -

Does the latter sequence of r.v.s also converge t o y2 in distribution? [Recall


Cram&-Slutksy Theorem from Exercise 2.11.]

38. Let {Xt, t >0) be a sequence of r.v.s which satisfy the first-order
(stochastic) difference equation where { E ~ t, >
0) is an i.i.d. sequence with
E(E~ =)0,O < E(E:) = r2 < 00,

Using this expression, one gets by iteration:

To determine the convergence properties one needs t o consider the cases that
(i) n < 1, (ii) la1 = 1, aiid (iii) la > 1, where (for reference) the correspond-
ing processes are termed stable, unstable, and explosive, respectively. Since
Exercises 407

E ( X t ) = 0 and VarXt = x i = l a2(k-1), it follows that limt,, VarXt < cm


iff a1 < 1. Since VarXt = (1 - a Z t ) / (-l a2)if a! # 1, = t if a = 1, let
g(n; a ) = [n/(l-a2)]p1/2 if I a < 1; = n/2/2if I a = 1; and = laln(a2-1)p1/2
if I a1 > 1. Then we can establish the followiiig useful fact: If Vn = Cy=,X:
with X t given the above difference equation shows that we can coiiclude
D
g(n; a)-'V, = W, + V as n + cc and P [ V = 01 = 0. [The result is not
entirely easy. It uses the SLLN, the Cramkr-Slutsky calculus, and Donsker's
invariant principle. The details were given in the first edition, but can be tried
here.]

39. Let h, be the least squares estimator of a of the first order model as
given by Eq. (2) of Section 6 under the same conditions. Show that if 1 a1 > 1
and g(n; a ) = I aln/(a2- I)'/', then g(n; a ) ( & ,- a ) 2 V and that the limit
d.f. namely of V, depends on the common distribution of the errors E,. [Thus
V is Cauchy distributed if the E, are N(0, I).]

40. Use the method of proof of Theorem 6.1 (i.e., employ Theorem 4.3.2),
and complete the proof of the case a ! = 1 of that result. With a similar
method find the limit d.f. of 6, if a! = 1 and note that this is different from
that of the case a ! = 1 [the norming factor g(n; a) being the same]. A similar
statement holds for a = -1 also. [The computations are not simple and need
care.]

41. Proposition 6.2 admits the following extension. Let {U,, n 1) be a>
sequence of m-dependent r.v.s with E(Un) = 0, E(u:) <
M < cm, n 1. If >
Sn = x i = l Uk, then S,/a(S,) 2 2,where Z is N ( 0 , l ) distributed, provided
that (EL=, Var U ~ ) / ~ ~ ( S=, )O(1) and that the Liiideberg coiiditioii holds,
i.e. for each E > 0

where the Fkis the d.f. of Uk. [This is a special case of a result due t o S. Orey.]

42. To establish an analog of Theorem 3.liii for a multiparameter estima-


tion problem, it is necessary t o extend Theorem 4.3.2 t o higher diinensioiis as
a first step. We present such a result here. Let ( X I ,Yl, Xz, Yz) be integrable
r.v.swith P[Y, > 01 = l , i = 1,2. Let H(zl,zz) = P[X,/Y, < z , , i = 1,2] be
the joint distribution of the ratios shown. If 4 is the joint ch.f. of the vector
( X I , Yl, X 2 ,Y2), show that the joint density h = a2H l a x l a x 2 of the ratios
Xi/Y,, i = 1 , 2 , is given, on assuming that 4 is twice differentiable, by
408 5 Weak Limit Laws

whenever the integral is uniformly convergent for ( x l , 2 2 ) in open intervals of


iR2.
43. An extension of the m-dependent central limit theorem when m is
not bounded is as follows. Let {Xkj, 1 < <
j jk,k > 1) be a sequence of
+
sequences of mk-dependent r.v.s with means zero and 2 6,6 > 0, moments.
Suppose that (a) ~ u pE(I ~ xij
, ~'+') < m, (b) V a r ( ~ : = , + , Xkj) 4 ( E - i)Ko
for all i < l and k (integers > 0), (c) limkioo(l/jk) V a r ( ~ : ; ~X i j ) = a > 0
exists, and (d) limk+o; rn:+'''')/jk = 0. Then show that xCl X i J / & 3 2,
where Z is N(0, a ) . [The proof of this result is a careful adaption of the m-
dependent case; cf. Proposition 5.6.2. Here condition (d), which is trivial if
mk = f i (fixed), is essential for the truth of the result. This extension is due
t o K. N. Berk.]
Part I11 Applications

This part uses the theory developed in the preceding two parts, and
presents different ideas based upon them. Chapters 6 and 7 are short but
the former includes the relatively new concept of stopping times and classes
of (central) limit theorems for dependent sequences. This leads t o the in-
troduction of ergodic sequeiices and the Birkhoff's theorem, and the strict
stationarity concept. The work motivates a glimpse of stochastic processes
proper. We consider the key classes related t o Brownian motion (its quick ex-
istence through random Fourier transform) and the Poisson measures leading
t o a brief analysis of general classes of additive processes. These are used t o
show how various classes of families arise in applications. They include strong,
strict and weak stationarities as well as the corresponding strict, strong and
weak harmonizabilities. The key role of Bochner's V-boundedness principle is
discussed. Numerous problems of interest, relating t o queueing, birth-death
processes, generalized random measures, and several new facts are discussed
as applications with extended sketches. As a result, this part and particularly
Chapter 8 received the largest amount of new material in comparison with
the earlier presentation.
Chapter 6

Stopping Times, Martingales, and Convergence

A iiew tool, called a stopping time transformation, is introduced here; it plays


an important role for a refined analysis of the subject. With this, some prop-
erties of stopped martingales, the Wald equation, the optional sampling the-
orem, and some convergence results are given. This work indicates the basic
role played by the iiew tool in the modern developinelits of probability theory.

6.1 Stopping Times and Their Calculus


The concept of a random stopping originated in gambling problems. A typical
example is the following. Suppose that a gambler plays a game in succession,
starting with a capital X I , and his fortunes are X 2 ,X3, . . .. At game n (or at
time n ) he decides to stop, based o n his present fortunes X I , X 2 , . . . ,X n and
on no future outcomes. Thus the n will be a function of these fortunes, which
means that the r.v. n satisfies {w : n(w) <
k ) E a ( X 1 , .. . , X k ) ,k = 1 , 2 , . . . .
We abstract this idea and introduce the new concept as follows.

>
Definition 1 Let {.En,n 1) be an increasing sequence of a-subalgebras
of C in a probability space ( R , C , P). Then a mapping T : R + W U {oo) is
called a stopping t i m e (or an optional or a Markov t i m e ) of the class {.En,n >
1) if for each k E N,[T = k ] E Fk,or, equivalently, [ T <
k ] E Fk,[or
[T > k ] E F k . A sequence {Tn,n >
I)] of stopping times of the fixed class
> >
{.En,n 1) is termed a stopping t i m e process if Tn 5 Tn+l, n 1. The family
>
F ( T ) = {A E .Fa= a(& .En) : A n [T = k] E Fk, all k 1) is known as the
class of events prior to T .
In this definition, if P [ T = +oo] = 0, then T is a finite stopping time, and
if this probability is positive, then it is nonfinite (or extended real valued). In
general a linear combination of stopping times is n o t a stopping time. Because
of their use in our work, we detail some properties in the following:
412 6 Stopping Times

Proposition 2 Let {F,,n >


1 ) be a n increasing sequence of a-
subalgebras of (R,C , P ) and {T, TI, T 2 , .. .) be a collection of stopping t i m e s
>
of {F,,n 1 ) . T h e n the following statements hold:

(i) F ( T ) i s a 0-algebra, and i f [T = n] = R,t h e n F ( T ) = F,, F ( T ) c


Fm.
+ >
(ii) max(T1, T2), min(T1, Tz), and TI T2 are {F,,n 1)-stopping times.
>
(iii) T i s F ( T )-measurable and if T T , T being F ( T )-measurable implies
7 i s a n {F,,n > 1)-stopping t i m e .
(iv) If TI I Tz, t h e n F(T1) c F(T2).
(v) IfTl,T2 are a n y stopping t i m e s of {F,,n >
I ) , t h e n {[Ti I Tz], [T2 I
Ti], [Ti = T2I) c F ( T i ) n F(T2).
(vi) More generally, lim inf, T,, lim sup, T, are stopping t i m e s of {T,),>l

>
and if {T,, n 1 ) is a m o n o t o n e sequence of stopping t i m e s of the s a m e F,-
family, t h e n limn sup, T, = T i s a stopping t i m e , and limn F(T,) = F ( T ) .

Proof Consider (ii). Since

aiid
[ i n i i i ( T l , T z ) < n ] = [ T l I n ] U [ T z < nEF,,
] n>l,
we deduce that min(Tl, T2) and max(Tl, T2) are stopping times of {F,,n 1 ) . >
Note that the argument holds for sup, T,, inf, T, also for sequences. Next

+ >
so that Tl T2 is a stopping time of {F,,n 1 ) ; aiid (i) and (iii) are simple.
<
For (iv), let A E F(T1). Since Tl Tz, A can be expressed as A = An[Tl <
Tz].Thus it suffices t o show that the latter set is in F ( T 2 ) . But by definition,
we need to verify that A n [TI < <
T2] n [T2 n] E F,, n >
1. Now for a n y
>
stopping times TI, T2 of {F,,n 1 ) we have

A n [TI I Tz] n [Tz I n] = A n [TI I n] n [Tz I n]


n[min(T,n) 5 min(T2,n)]. (2)
But by (ii), min(Ti, n) is a stopping time of {F,,n > I ) , i = 1,2, and

if x <
n , = fl if z > n. Thus for all z >
0, min(Ti, n) is F,-measurable, so
that the last set of (2) is in F,. Since A E F ( T l ) , the first set is in F,. But T2
6.1 The Calculus 413

is an {F,,n > 1)-stopping time. Thus the middle one is also in F,, so that
A E F(T2). This proves (iv) a little more generally than asserted.
Regarding (v), the argument for (2) above with A = 0 shows that [TI 5
T2] E F ( T 2 ) , aiid hence its coinpleineiit [TI > T2] E F ( T 2 ) . On the other
hand, min(Tl, T2) is a stopping time by (ii) aiid is measurable relative to

by (iv). Hence [min(Tl,T2) < Ti] E F ( T i ) , by the first line of this paragraph,
and [min(Tl,T2) = T2] E F(T2) + [TI = T2] and [TI < T2] belong to F(T2).
By interchanging 1 and 2, it follows that these events are also in F ( T l ) , and
hence are in their intersection.
Finally, for (vi), siiice liinsup, T, = infk sup,2k T, aiid by (ii) sup,,k T,
>
is a stopping time of {F,,n I ) , it follows that lim sup and lim inf a n d lim,
if it exists, of stopping times of IFn,n > 1) are again stopping times of the
same family. Let Tn(w) + T(w), w E 0. If Tn 1' T , then by (iv) a(U, F(T,)) c
F ( T ) . To show there is equality, we verify that each generator of F ( T ) is in the
left-side 0-algebra. Let A E F,, and coiisider An [ T > n], which is a generator
<
of F ( T ) , siiice this is just A n [ T nIC. Now A n [ T > n] = A n [Tk >
n] E a ( U k F ( T k ) ) . Thus F ( T ) C a(Uk,lF(Tk)). Next let T[ J, T. Then
nk21 F ( T k ) > F ( T ) . For the reverse inchsion, let A E F ( T k ) for all k >
1
(i.e., is in the intersection). Then we have, for n 1, >

Hence nk,l F ( T k ) c F ( T ) , aiid in both cases limk F ( T k ) = F ( T ) . This fiii-


ishes the proof.

Remark The same coiicepts are meaningful if IFn,n >


1) is replaced
>
by { F t , t 0) with Ft c Ft1 for t < t'. Now Ft+= n,,,
Fs 3 Ft with
a (possibly) strict inclusion. The equality must be assumed, which means
a "right continuity" relative to inclusion order; similarly left continuity is
defined. These problems do not arise in the discrete case, so that the above
proposition is true in the continuous index case only if we assume this right
order continuity. Thus the theory becomes delicate in the coiitiiiuous case,
and we are not going into it here.
However, the following deduction is immediate (for continuous time also):

Corollary 3 Let { F t , t > 0) be an increasing family of a-algebras of


(R,C ,P) and TI, T2 be its stopping times. Then

where F ( T ) denotes the 0-algebra of events prior to T , as in the proposition.


414 6 Stopping Times

Proof Since min(Tl,T2) is a stopping time <


T i , i = 1,2, it is clear
that F(min(T1, T2)) c F(T1) n F(T2). For the opposite inclusion, let A E
F(T1) n F(T2). Then

A n [min(Tl,T2) < t] = (A n [TI < t]) U (A n [T2 < t]) E Ft,


since A E F ( T i ) , i = 1,2, and the definition of F(T,) implies this. Hence
A E 3 ( m i n ( T l ,T2)),as asserted.

A standard and natural manner in which stopping times enter our analyses
may be illustrated as follows. Let X I , X 2 , . . . be a sequeiice of r.v.s and A c R
be an interval (or a Bore1 set). The first time the sequeiice X, enters A is
clearly
TA =inf{n > 0 : X, E A), (3)
where TA = +cc if the set { ) = 0.If Fn = a ( X 1 , . . . ,X,), then we assert
>
that TA is a stopping time of {F,,n 1) (or of the Xn-sequence). Since

and these sets belong to Fk,it follows that TA is an r.v. and is an IFn,n 1)- >
stopping time. It is called the debut of A. If {X,, F,, n >
1) is an arbitrary
adapted (i.e., X, is 3,-measurable and 3, c 3,+1) sequence and T is an
>
{F,,n 1)-stopping time, we define XT : 0 + t o be that function which
is given by the equation

X ~ ( w )(w), w E [ T < oo]


(XT (w) =
lim sup,X,(w), w E [ T = +oo].

Then XT is an extended real-valued measurable fuiictioii as a composition


of the two measurable functions {X,, n >1) and T . It is always an r.v.
if T is a finite stopping time (and extended valued otherwise). Similarly, if
T, = min(T, n), then {T,, n > 1) is a stopping time process of IFk,k 1) >
and {XT,,, n > 1) is a new sequence of r.v.s, obtained from the given set
{Xn,n > I ) , by the stopping time transformations. Clearly this definition
>
makes sense if {T,, n 1) is, more generally, any finite stopping time process.
It is also important to note that XT is F(T)-measurable, since
6.2 Wald's Equation 415

Thus, [XT < x] E 3 ( T ) , x E R (cf. Definition 1). Hence {XT,,, 3(T,), n > 1)
is an adapted sequence. It should be noted that we are fully using the linear
ordering of the range space of the T . If it is only partially ordered, then the
arguments get more involved. Here we consider only the simple case of N.If
the range of T is finite, theii T is called a simple stopping time. [If the range
is in IW+, and it is bounded (or finite) theii T is a bounded(simp1e) stopping
time.] Now we present an application which has interest in statistics.

6.2 Wald's Equation and an Application


The following result was established by A. Wald for the i.i.d. case and is use-
ful in statistical sequential analysis (cf. Wald (1947)). As seen later, it can
be deduced from the martingale theory using stopping time transformations
(cf. Problem 6). However, the present account motivates the latter work, and
illuminates that transformation.

Theorem 1 Let {X,,n >


1) be independent r.v.s on (R, C , P) with a
common mean, supn E(I X, ) < cm, S, = X k ,3, = a(X1, . . . , X,), and
>
n 1)-stopping time. If E ( T ) < oo ( so that T is a finite time),
T be an {3,,
then
ST) = E ( T ) E ( X l ) . (1)
If, further, the X, have mean zero and a common variance, then

E(s;) = E ( T ) Var XI. (2)


Both ( I ) and (2) are deduced from the following observation of some in-
dependent interest. It appears in Neveu (1965) as an effective tool.

>
Lemma 2 Let {Y,, 3,,n 1) be an adapted integrable sequence of r.v.s,
and T be a bounded IFn,n >
1)- stopping time. Then setting Yo = 0, we have

Proof Let T, = min(T, n ) , which is a stopping time of IFk,k > 1) by


< < <
Proposition 1.2. If YA = YT,,, T no < cm, and 0 n no, we have
416 6 Stopping Times

Hence on noting that [ T > k] E Fk,one has, from (4) by adding for 0 In <
no 7

which implies (3) since min(T, no) = T and YAo = YT, by the boundedness of
T.

Proof of Theorem 1 First let Yn = Sn in the above lemma, with Fn =


a(X1, . . . , X,). Then Sn is Fn-adapted, and hence

E3"(Sn+1)= EFn(S,) +E~"(X,+~)


= Sn + E ( X n + l ) (since Fn and Xn+l are independent)
= Sn + E(X1) (by hypothesis). (5)
Hence (3) becomes, if Tn = min(T, n ) again,

Letting n + oo, by the monotone convergence on the right and the dominated
convergence on the left in (6), we get (I), since ST,,+ ST a.e., ST,, IIST1 ,
and ST is integrable. In fact,

(since [ T > n] E FnPl,


which is independent of X n )

For (2), let us set Yn = Si,Tn = min(T, n ) , and Fnas given. Then Yn+l =
S; + x;+, + 2Xn+1(XI + . . . + X,), so that (3) becomes
6.2 Wald's Equation

[since E ~ ~ ( X =
~ E(Xk+1)
+ ~ ) = 0,
by independence and the vanishing mean hypothesis]
= Var X I . E(Tn) (as before). (7)

However, ST,,can be written as follows [cf. (4)]:

By iteration, if n > m, we have

+, Sr,
E (ST,, - )2 = P[T > k
E (x:+~) +1 (by independence)
k=m
= Var X1 . [E(T,+l) - E(T,)] +0 as n , rn + oo.

Since ST,, + ST a.e., this shows the convergence is also in L ~ ( P ) .Thus


limn,, E(S$?,) = E(s$). Letting n + oo in (7) and using this, we get (2).
This completes the proof.

Remark If {X,, n >


1) is an i.i.d. sequence with two moments finite,
then the hypothesis is satisfied for (1) and (2). Since {S,,F,, n >
1) is a
martingale if E ( X , ) = 0, part of the above proof is tailored t o proving that
{ST,, ,F(T,), n > 1) is also a martingale. This result is proved in the next
section. However, the above argument did not use martingale theory and is
self-contained.

As a companion t o this theorem, we present a renewal application. It deals


with the behavior (or fluctuation) of the partial sums of i.i.d. positive r.v.s and
corresponds t o the first renewal (or replacement) of the lifetimes of objects
(such as light bulbs).

Theorem 3 Let {X,, n >


1) be a sequence of i.i.d. positive r.v.s such
that E(X1) = p > 0. If S, = C r = l X k ,a > 0, and

> >
then Ta is a stopping time of (a(X1,. . . , X,), n I ) , E ( T ~<) oo, k 1, and
one has the renewal assertion, lim,,, E ( T a ) / a = p p l , where p p l = 0 if
p = +oo. Further, for each a > 0, we have the bounds

1
-E(Ta) I a/E(rnin(Xl, a ) ) I E(Ta).
2 (9)
418 6 Stopping Times

Proof Let Fn = a ( X 1 , .. . , X,). Then it is evident that

Thus T , is a stopping time of IFn,n >


1 ) . To see that T , has all moments
>
finite, we proceed as follows. For each integer m 1, consider the independent
r.v.s formed from { X n , n 1): >

Since the X i are i.i.d., so are the S i , j = 1 , 2 , . . . . Next let p = P [ S i a]. <
Then the fact that P I X 1 > 01 > 0 implies that p > 0. Also p < 1 if rn is chosen
large enough. This is because ( l l m )ELl X X ,+ E ( X 1 )> 0 a.e. by the SLLN,
and hence xzl
X X ,+ oo a.e. as rn + oo.Thus for any a > 0 , P [ S i > a] > 0
if m is large enough. We fix such an m, so that 0 < p < 1. Consider now for
jm <n < ( j l)m, +
P[T, > j m ] < P [ S 1 < a , S2 < a , . . . , Sjm-1 < a]
< P [ S i < a , S i < a , . . . , SiPl < a ]
- pj-l (by the i.i.d. property of the S ; )
- p[nlml-l 5 p(nlm)-2 = (pllm)np-2
(with [nlm]for the integral part of nlm). (10)
Hence for any integers no > 1,

Let 0 < t < - logpllm, where m and p are chosen for (10). Now we take no
>
large enough so that k log n < tn for n no. Then (11) becomes

E(T;) < x
no

n=l
n"[Ta = n] + x
n h o
e t n p 2 ( ( p 1 ' m ) n [by ( l o ) ]

Since k >
1 is an arbitrary integer, (11') shows that Ta has all moments finite.
It also implies that for each a , T a < oo a.e.
For the second part, since p > 0 , let 0 < a! < p. Set X,' = X i if X i ao, = <
>
0 otherwise. Then { X A , n 1 ) are i.i.d. bounded r.v.s, with
6.2 Wald's Equation 419

where F is the d.f. of X I . Thus choose a 0 such that a < E(X1) < p. If
SA = C r = l Xi, and TA = inf{n >
1 : SA > a}, then by the first part TA is a
stopping time with E(TA) < cm. Hence by ( I ) ,

Since by definition, Sh < S,, we get T, < TA, and theii (12) implies

Hence
lim sup E(T,)/a Il/a.
a-oo

Since n < p is arbitrary, limsup,,,[E(T,)/a] < l / p . If p = +cm, this gives


lim,,, E(T,/a) = 0. If 0 < p < cm, theii

Hence using ( I ) , we get

Thus by (14) liminf,,, E(T,/a) >


l / p . This and the earlier inequality of
(13) yield lim,,, E(T,/a) = 1/p.
The last statement follows from (14) on setting X: = min(Xn,a ) , and T l
is defined with S i = X t , the X[ being i.i.d. [cf. (12) with a 0 = a] :

Since T[ > T,, (15) implies (9), and this completes the proof of the theorem.

The limit statement that lirn,,, E(T,/a) = pP1 > 0 is called a renewal
theorem. Many extensions and applications of the renewal theorem have ap-
peared; for an account of these we refer the reader to Feller (1966).
420 6 Stopping Times

6.3 Stopped Martingales


The availability of stopping time transformations enabled martingale analysis
to enrich considerably. A few aspects of this phenomenon are discussed here.
As indicated at the beginning of Section 3.5, a martingale is a fair game
and a submartingale is a favorable one. If the player (in either game) wishes to
skip some bets in the series but participate at later times in an ongoing play
because of boredom or some other reason (but not because of clairvoyance),
then the instances of skipping become stopping times and it is reasonable t o
expect the observed or transformed process to be a (sub-) martingale if the
original one is. That this expectation is generally correct can be seen from the
next result, called the optional stopping theorem, due to Doob (1953). This is
presented in two forms, one for "stopping" aiid the other for "sampling."
If {X,, F,, n > 0) is an adapted sequence of r.v.s and {V,, n > 0) is
another sequence with Vo = 0 such that for each n > 1,V, is 3,-1-adapted
and F0= (0, Q), then let

It is clear that {(V . X),, F,, n> 1) is an adapted sequence. The (V . X),-
sequeiice is called the predictable transform of the X-sequence, aiid the V-
sequeiice itself is termed predictable, since the present V, is already deter-
mined (i.e., measurable relative to FnP1, the "past" 0-algebra). Thus the pre-
dictable sequence transforms the increments (Xk- X k P l ) into a new sequence
>
{(V . X),, F,, n 1). If the X-process is a martingale and the V-sequence is
bounded, then the transformed process (1) is also called a "martingale trans-
form," aiid the increment sequeiice {XI, X k P l , k
- > I), is sometimes termed
a "martingale difference sequence," for the discrete tiine index t.
>
If T is a stopping time of {F,,n I), and V, = X[T),], then

is a bounded (in fact (0, 1)-valued) predictable sequence. On the other hand,
if {V,+l,F,, n > 1) is any {O,l)-valued decreasing predictable sequence, it
arises from a stopping time T of {F,,n > 1). In fact, let T = inf{n > 1) :
Vn+l = 0), where inf(0) = +cm.Then [T = n] E F,, and [V, = 11 = [ T = n],
so that T is the desired stopping time.
If {X,, F,, n > 1) is an adapted process, and T is a stopping tiine of
{F,,n > I), then the adapted process {Y,,F(T,),n > 1) is called the
transformed X-process by T , where T, = min(T,n) (a stopping time by
Proposition 1.2ii) and Y, = XT,, . If V, = X[T),], then X . V is written
xT,called a stopped process. This is a special form of (1). The problem
considered here is this. If {X, Fn),>lis a martingale, and T is a stopping
tiine of {F,,n > I), when is the transformed process {XT,,, F(T,)),)l also
a martingale? Without further restrictions, {XT,, ,3(Tn))n)l need not be a
6.3 Stopped Martingales 421

martingale, as the following example shows. Let {X,,n > 1) be i.i.d. with
mean zero. If S, = X k ,3, = a ( X 1 , . . . , X,), then {S,, 3,,n > 1) is a
martingale. Consider
T =inf{n >I : S, > 0). (2)
Then T is an {F,,n > 1)-stopping time, and

since S, > 0 on each set [ T = n]. But T > 1 because, by the queueing aspects
>
of the random walk {S,, n 11,P[sup, S, > 01 = 1 (cf. Theorem 2.4.4). If the
transformed process {S1,ST)by {I, T ) is a martingale relative t o {TI,3 ( T ) ) ,
then we must have 0 = E(S1)= E(ST),and this contradicts (3). Incidentally,
this shows that, in the Wald equation (1) of Section 2, E ( T ) = +cc must be
true, so that the expected time for the random walk t o reach the positive part
of the real line is infinite!
We start with the followiiig optional stopping assertion:

Proposition 1 If {X,,3,,n > 1) i s a martingale, {V,+1,3,,n > 1)


is a predictable process such that (+){(V . X ) , , F n , n > 1) c L1(P), t h e n
(+) i s a martingale. I n particular, this i s true if the V, are bounded. S o for
a n y stopping t i m e T the stopped martingale process X T i s again a martingale.

Proof This is immediate since by ( I ) , Y, = ( V . X), is 3,-measurable,


and Y, is integrable for each n , so we have

(V,+l being &-measurable)

= Y, a.e. (X, being a martingale). (4)

If the V, are also bounded, then the integrability hypothesis is automatically


satisfied, and in case V, = X[T>,] this condition holds. The result follows.
-

R e m a r k s 1. The above proposition is clearly true if the X-process is a sub-


(or super-) martingale, with a similar argument. Further, taking expectations
in (4) one has for the submartingale case [since Y, = XT,,, T, = min(T, n)]
6 Stopping Times

(since [ T = k] E Fkand we used the


submartingale property of the X n )

There is equality in (5) throughout in the martingale case.


2. It is of interest t o express xTmore explicitly. By definition,

We now present the famous optional sampling theorem, generalizing the


above result.

>
Theorem 2 (Doob) Let {X,, Fn,n 1) be a submartingale and {T,, n >
>
1) a stopping time process of {F,,n 11, where P[T, < oo] = 1. If Y, = XT,, ,
suppose that (i) E ( Y 2 ) < cc and (ii) lim inf,,, E(X,+XITk>,I)= 0, k > 1.
>
Then {Yn,F(Tn),n 1) is a submartingale. I n particular, if the Xn-process
is a positive (super) martingale, then (i) holds, (ii) can be omitted, so that
>
{Y,, F ( T n ) ,n 1) is a (super) martingale. The same conclusion obtains for
the given X,-process (not necessarily positive) if either T, is bounded for each
n or X, 5 E ~ ~ I (forZ )some Z E L1(P),n 1. >
Proof It is sufficient t o consider a pair of stopping times, say S and
<
T , with S T , for which (i) aiid (ii) are satisfied. We need t o show, since
Xs is F(S)-adapted and F ( S ) c F ( T ) , by Proposition 1.2, that E(I Xsl) <
oo, E ( X T ) < oo aiid
>
E ~ ( ~ ) ( xxS ~ )a.e. (7 )
Indeed, let A E F, aiid consider, for the integrability of Xs,

(since {X,, F,),> is a submartingale and [S> n] E F,)


6.3 Stopped Martingales

(by iteration)

Since X , < X?, we can use (ii) with Tk = S, in (8), so that (letting e + oo)

Thus if A = f l , aiid n = 1 (so that [S > n]= fl) in (9), we get

using (i). Since E ( . ) is a Lebesgue integral, this implies E ( Xs 1 ) < oo. Simi-
larly E(I XTI) < oo. Also, (7) follows from (8) aiid (9). In fact, if Al E F(S),
<
then A1 n [S= k] E Fkand S T implies [S= k] C [ T k]. Hence letting >
A = A1 n [S= k] in (9) and replacing S by T there, we get

Hence summing over k = 1 , 2 , . . . , we get

which is (7).
For the second part, if X n is a positive martiiigale or supermartingale, (ii)
is unnecessary [cf. (8)], and we verify (i). The positivity implies

with equality in the martiiigale case. Now for any stopping time T of (F,, n >
I}, min(T, k ) is also one, aiid by Proposition 1, {XI,Xmin(T,k)}is a martingale
or a supermartingale for IF1,F(min(T, k))), so that
424 6 Stopping Times

But limk,, X,i,(T,k) = XT a.e. Thus by Fatou's lemma, (10) implies


E ( X T ) < oo, which is (i).
For the last part, clearly (i) and (ii) both hold if T is bounded. On the other
hand, if the submartingale is closed on the right by 2,then (by definition)

is again a submartingale. So also is {X;, F,, 1 <


n < oo, E3- ( Z + ) ,
and hence E(X;) < <
E ( Z + ) E ( ZI) < GO. Since by the preceding, if T$ =
min(T,, k), then T; I and these are bounded stopping times so that
{XQ , F(T$), k >
1) is a subinartingale, we have

(by the subinartingale property and X? = X+ )


Tt

Now letting k + oo on the left, using Fatou's lemma, we get E(I XT,, I) < oo.
But

Hence (ii) is also true. This proves the theorem.

The last part of the above proof actually gives a little more than the as-
sertion. We record this for reference. The bound follows from (11).

>
Corollary 3 Let {X,, F,, n 1) be a submartingale. If sup, E ( X, 1 ) <
> <
oo, a n d T i s a n y stopping t i m e of {F,,n I ) , t h e n E ( X T ) 3sup, E ( X,I).
>
I f the submartingale i s uniformly integrable and {T,, n 1) i s a stopping t i m e
> >
process o f {F,,n I ) , t h e n {XT,,, F(T,), n 1) is a submartingale.

As an application of the optional stopping results we present a short deriva-


tion (but not materially different from the earlier proof, though the new view-
point is interesting) of the martingale maximal inequality (cf. Theorem 3.5.6).
6.3 Stopped Martingales 425

Theorem 4 Let { X k , F k ,1 < k < n) be a submartingale and X E R.


Then
r 1 r

X P minxk < X
[Kn

Proof Let TI = inf[k


]>
>
E(X1)-
1.mink 5 ,, X k 2x1
(14)
1 : X k > A], with TI = n, if the set [ I = 0.
Xn dP.

Let T2 = n. Then Ti <


n and both TI, n are (bounded) stopping times of
{Fk, 1 <
k <n}. Thus {XT,,3(T,)}: is a subinartingale by Theorem 2.
>
Hence E(XT, 13(Tl) XT, a.e. Since Ax = [XTl > A] E 3(T1), we have

The second inequality is similarly proved, and the details are omitted.

We now briefly discuss iinprovements obtainable for the martingale coii-


vergence statements using the ideas of stopping times. Here we present a proof
of the martingale convergence without the use of the maximal inequality (but
with the optional stopping result instead, i.e., with Theorem 2 above). This
further illuiniiiates the theory. Here is the aiinouiiced short proof of the mar-
tingale convergence.

>
Theorem 5 Let {X,, Fn,n 1) be an L1(P)-bounded martingale. Then
X, + X, a.e. and E ( X,) <
liminf, E(I X , ) .

Proof By Lemma 3.5.5, which is purely measure theoretic and does iiot
involve any martingale convergence, Xn = xi1) xi2),
- and {x:), Fn,n 1) >
is a positive martingale, i = 1,2. Thus for this proof it may be assumed that
X, > 0 itself. We give an indirect argument.
Suppose that a positive martingale does iiot converge a.e. Then there exist
0 < a < b < ce such that the following event,

must have positive probability. Let us define a stopping time process {T,, n >
> >
I} of IFn,n I} such that {XT,,, 3 ( T n ) ,n 1) is iiot a martingale, contra-
dicting Theorem 2 and thereby proving the result.
6 Stopping Times

Let To = 1, and define {T,, n > 1) as follows:

and inductively set for k >1


TZk(w)= inf{n > T2k-1 (w) : X, (w) < a),
T z k + l ( w ) = i n f { n > T 2 k ( ~ ) : X n ( ~ ) > bw
),E R .

But P ( A ) > 0 by assumption. Thus {T,, n >


1) is an increasing sequence of
functions with T,(w) + co, w E R. Since [Tk = n] is determined by X I , . . . , X,
only, and the sets in braces are measurable, it follows that [Tk = n] E F,, so
> >
that {T,, n 1) is an {F,,n 1)-stopping time process. Moreover, TZk< cc
implies XT,, < a , and similarly, XT2,+, > b if T2k+l < oo.Hence

However, by the optional sampling theorem (cf. Theorem 21,

is again a positive martingale, so that in particular the expectations are con-


<
stant. Hence 0 E ( X T h )< oo and is the same for all k. Moreover,

Thus

since A c [TZkp1< oo].This contradiction shows that P ( A ) = 0, and we must


have X n + X, a.e. The last statement is a consequence of Fatou's lemma.
This completes the proof. The convergence of L1 (PI-bounded sub- (or super-)
martingales follows from this, as before (c.f. Theorem 3.5.11).

The argument here is due t o J. Horowitz. We can extend it, even without
the optional stopping result and by weakening the integrability hypothesis
to include some non-L'(P)-bounded (sub-) martingales. In this case Lemma
3.5.5 is not applicable. The generalization is adapted from one due to Y. S.
Chow, who obtained the result for the directed index sets satisfying a "Vi-
tali condition." The details of the latter are spelled out in the first author's
Exercises 427

monograph [(1979),Theorem IV.4.11. Its proof is again by contradiction. We


therefore present this result without further detail, because the basic idea is
similar t o that of the above theorem.

Theorem 6 Let {X,, F,, n >


1) be a [not necessarily L1(P)-bounded]
submartingale such that for each stopping time T of {F,,n >
1) we have
E(x$)< oo. Then X n + X, a.e., but X, may take infinite values on a set
of positive probability.

We thus end this chapter with these results using stopping times. A few
coinpleineiits are iiicluded in the exercises, as usual.

Exercises
1. Complete the proofs of the omitted parts of Proposition 1.2.

2. Find a pair of stopping times TI, T2 of a stochastic base IFn,n 1) >


such that a T l and TI - T2 are not stopping times of the base for an a! E R.

3. Let {X,,n > 1) be i.i.d. aiid X1 > 0 a.e. If S, = C;=,Xk and


c > 0, let T, = max{n > 1 : S, 5 c). Show that T, is not a stopping time of
,T, = a ( X 1 , . . . , X,), but that T, = ~ , + is
1 one, and T, = inf{n > 1 : S, > c).

4. Let {X,,n > 1) bei.i.d. aiid P I X I I > 0 ] > 0. If Sn = C;=,Xk and


0 < a , b < co, let Tab =inf{n >
1 : S, E [-a,b]). Show that P[Tab < oo] = 1
aiid in fact E ( T ~<~ oo, >
) k 1, slightly extending part of Theorem 2.3.

5. Let {X,,F,, n > 1) be a submartingale and {T,, n >1) be an inte-


>
grable stopping time process of {F,,n 1). Suppose there exists an adapted
sequeiice of positive r.v.s {Y,, Fn,n >
1) such that E3" ( Y n + l ~ [ T , > nis~uni-
)
formly bounded for all j >1 aiid that I X , <C,"=, Y , a.e. on [T, > n] for
j > 1. Then show that {XTrL,F(T,),n >
1) is a submartingale by verifying
the hypothesis of Theorem 3.2. In particular, Y, can be the absolute incre-
ments of the Xn-process.

6. Let {X,, n >


1) be an i.i.d. sequeiice of r.v.s aiid S, = C;=, X k .
Suppose that the moment-generating function AT(.) of the X k exists in some
nondegenerate interval around the origin. If T is a stopping time of

such that Ezb(ISn+l l X [T2n]) < KO < oo for all n aiid E ( T ) < co, then
show that {Y, = e t S n / ( ~ ( t ) ) nF,,
, n >
1) is a martingale and if To = 1,
428 6 Stopping Times

then {Y%, YT) is a martingale for {.El,.E(T)) and the fundamental identity
of sequential analysis obtains:

(Hint: Use the result of Problem 5 in showing the martingale property.) De-
duce, from this result, the conclusioiis of Theorem 2.1 after justifying the
differentiation under the integral sign.

7. Let X be an integrable r.v. on (R, >


C , P) and {.En,n 1) be a stochastic
>
base with C = o ( U n .En).If TI, T2 are a pair of stopping times of {.En,n I},
show that, for the martingale {X, = EFlb(X), .En, n > I}, Xmin(Tl,T2)=
E ~ (Y)( ~ =~~ ~) ( ~ ~ ~ (x),( ~ l where
> ~ 2Y) = ) ~ ~ ((x).
~ [Use
1 ) Theorem 3.2.1

Deduce that

and

(ii) F(T1) n F(T2) = F(min(T1, T2)).

(For (ii), note that A E .E(T1) n.E(T2) implies An [min(Tl,T2) I n] E .En, n >
1. The analysis needs some thought. See, e.g., the first author's book (1979),
p. 351, eq. (6).)
Chapter 7

Limit Laws for Some Dependent Sequences

This chapter is devoted t o a brief account of soine limit laws including the
central limit theorem,and SLLN, for certain classes of dependent random vari-
ables. These cover martingale increments and stationary sequences. A limit
theorem and a general problem for a random number of certain dependent ran-
dom variables are also considered in soine detail. Moreover, Birkhoff's ergodic
theorem, its comparison with SLLN, and a inotivatioii for strict stationarity
are discussed.

7.1 Central Limit Theorems


In Section 5.6 we saw a central limit theorem for m-dependent r.v.s and its
application t o a limit distribution of certain estimators. Here we present a
similar theorem for square integrable martingale increments sequences. It will
facilitate the analysis if we first establish the followiiig key technical result
based on arguments used in a classical problem. Unfortunately, the condi-
tions assumed are not well motivated except that they are needed for the
following proofs. Again limn kn = +oo will be assumed without mention.

Proposition 1 Let { X n k , F n k , l < k < kn},n > 1, be a sequence of


rowwise adapted sequences of r.v.s with F n k C Fn(k+1). Let

Yn (t) = n
k,,

k=l
(1 + itXn,), t € R, ,
= a.
Suppose that the following conditions hold:

(i) E(Yn(t)) + 1 as n + oo, t E R,


>
(ii) {Yn(t),n I} c L 1 ( P ) i s uniformly integrable, for each t E R,
P
(iii) ~ k X;, l + ~
1,
430 7 Limit Laws for Some Dependent Sequences

P
(iv) X,k are strongly infinitesimal in that maXl<k<k,, IXnkl
- - + 0 as
n + oo.

Proof It is t o be shown that the ch.f. t H ~ ( e ' ~ tends


~ " ) to t H ePt2/', as
n + oo. Consider a representation of e2" as (and this is the key observation):

1
eix = (1 + ix) exp{r(x) - -x2),
2 (1)

where r : R + C is defined by the above equation. We now assert that

This is elementary, but it needs a little care. Thus taking logarithms and
expanding ( I ) ,

Hence, for the complex number r(z),one has:

But each of the terms inside parentheses is positive if xl = a! < 1, so that


one has
r 7 2

because a , b > 0 implies a2 + b2 5 (a + b)2, SO that

since 0 5 a! 5 1. This implies (2). Replacing x by tXnk and multiplying over


< <
1 k kn, (1) becomes
7.1 Central Limit Theorems 431

eztsrl= Y,(t) exp x:~+ x


k",

k=l
~ ( t x , ~ ; )= Y,(t)Z,(t) (say),

Taking expectatioiis and letting n + GO, the left side gives a sequence of ch.f.s
that tends to the desired normal N ( 0 , l ) ch.f. [with the first term because of
(i) of the hypothesis] if we show that the second term on the right side of (4)
goes to 0 in L'(P) using conditions (ii)-(iv). This is verified as follows. By

using (iii) and (iv). This implies [by (iii)] that

Also, {eitsrb,n > I ) , being bounded, is uniformly integrable. Now by (ii)


>
e-t2/2, being a finite constant for each t E R,{ ~ , ( t ) e - ~ ' / n~ , 1) is uniformly
integrable. Hence

P
gives a uniformly integrable sequence which + 0. Thus it also goes to zero in
L1(P) by the Vitali convergence (cf. Theorem 1.4.4), completing the proof.

This result will be used to obtain a central limit theorem for martiii-
gale increments (double) arrays. Recall that an adapted integrable process
{X,,Fn, n > 1) qualifies as a martingale increments sequence iff for each
>
n 1,E~~~ (xn+l) = 0 a.e. (cf. Proposition 3.5.2). Similarly, if {Xnk,& k , 1 <
k < k,, n >1) is a double array of martingale increments sequences, and if
for each n,&k c Fnk, for k <k', then EF1l"Xn(k+l))= 0 a.e. For such a
family the followiiig result holds:

Theorem 2 Let the double array {Xnk,Fnk, 1<k < k,, n > 1) of m a r -
tingale increments satisfy th,e three conditions:

I Xnk
(ii) limn P [maxk<krL
- > E] = O for each E > 0,
432 7 Limit Laws for Some Dependent Sequences

P
(iii) X + 1 as n + m.

Then S, = c::, D
Xnk + to an r.v. which is N ( 0 , l ) .

Remark After the proof we shall indicate how (i) and (ii) are coilsequelices
of the classical Lindeberg condition. Also other forms of (iii), and some spe-
cializations of this result will be recorded.

Proof We consider a suitable predictable trailsforin of the Xnk-sequence


so that the new r.v.s satisfy the coiiditioiis of Proposition 1 aiid that the
partial sums of the Xnk aiid of the traiisforined ones are asyinptotically equal
in probability. Then the result follows from the preceding one.
Because of (iii) we define XAk = Xnk on the set [c~I: <
X:, 21, and =O
otherwise, where Xkl = Xnl. Because the above set is in F,(k-l), this becomes
a useful mapping, a special case of what was called a predictable transform,
defined in Exercise 32 of Chapter 3, and discussed again at the beginning of
Section 3. Let SA = k XAk and note that P[SA - S, >
E ] + 0 as n + cc
for any E > 0. In fact,

P
In particular, the last two inequalities of (5) show that XAk = X n k as n + cm.
This implies that {XAk, 1 k < < k,, n >
1) also satisfies conditions (i)-(iii)
of the theorem. We now assert that this transformed sequence satisfies the
hypothesis of Proposition 1.

Let Y,(t) = nizl(l+ itXAk). Then


7.1 Central Limit Theorems

Next define an r.v. Tn by the relation

with inin(@)= k,. Then T, is an integer valued measurable function, i.e.,


[T, = k] E F,,, (a stopping time of {Fnk, 1 5 k 5 k,).) Such mappings,
introduced and discussed in Chapter 6, are useful in the following analysis.
+ <
We thus have, on using 1 x ex for x 0, >

(by definition of T,),


< exp{2t2) . (1+ t 2 ~ ( ~ ; T , , ) ) (6)
(by Doob's opitional sampling theorem 6.3.2).

But by (i) the expectation on the right side of (6) is bounded by C , and hence
>
{Yn(t),n 1) is uniformly integrable, since it is a bounded set of L2(P).Con-
ditions (iii) and (ii) of this theorem are the same as conditions (iii) and (iv) of
Proposition 1. Thus we have verified that all the four conditions are satisfied
by the X~k-sequence,and hence S A + an r.v. which is N ( O , l ) distributed,
so that S, has the same limit distribution. This completes the proof of the
theorem.

Before discussing the relation of the above hypothesis (iii) to the Linde-
berg condition, it is useful to show that the assumptioils are nearly optimal
for normal convergence. More precisely, the following supplement t o the above
result holds.

Proposition 3 Let { X n k , F n k ,1 < k < k n , n > 1) be a n y adapted inte-


grable sequence of r.v.s for which the following conditions hold:

(i), (ii), and (iii) are the same as those of Theorem 2,


(iv) c;" ~ ~ ~ ( h (-X1n k P 0 as n + oo,and
) )+

(v) c::, [E31b(~-l)(Xnk)I23 O as n + oo.


434 7 Limit Laws for Some Dependent Sequences

Then S, = c;" Xnk D


+ to an r.v. which is N ( 0 , l ) distributed.

Proof Let Ynk = xnk-EFvl(k-l) ( X n k ) ,so that E ~ ~ (Y,~)


~ ( ~ - We
= 0. ~ )now
assert that {Ynk,Fnk, < <
1 k kn},>l satisfies the hypothesis of Theorem 2.
Then condition (iv) implies S, - c:" P
Ynk + 0, and hence Sn has the desired
limit distribution.
To see that (i)-(iii) of Theorem 2 hold for the Ynk-sequence, note that

Taking the maximum of both sides and noting, by hypothesis (ii), that Z, =

P
max I Ynk +0 if max (2,) 5 0.
Kk,, Kk,,

< <
To see that the last part of (8) is true, note that {E"lk ( Z n ) , F n k ,1 k k,)
is a positive subinartingale and Zn E L 2 ( P ) . Hence by Theorem 3.5.6ii, with
- 1 ) q = P/(P
p = 312, we have, if Un(k-l) = ~ ~ l ~ ( k (Zn?, 1)(= 311

(by the CBS inequality)

(by the coiiditioiial Jeiisen inequality)

P
as n + oo, since Z, + 0 and {Z,, n >I}, being L2(P)-bounded by (i) and
(ii), is uiiiforinly integrable, so that the first factor i 0 and the second one is
bounded. Hence

by (91, and this shows that (8) is true. Thus condition (ii) of Theorem 2 holds
for the Ynk even in a stronger form. [Namely, the sequence + 0 in L1(P).]
For (iii) of Theorem 2, note that by (iii) and (v) of the hypothesis,
7.1 Central Limit Theorems 435

3 1 and the second term 3 0 on the right side of (10).


so that the first term
Thus we only need to show that the last term 3 0 also. This follows by the
CBS inequality again, since

The Ynk-sequence satisfies (iii) of Theorem 2. Finally, (i) of that theorem is


verified using the argument of (8) and (9). In fact,

E (kSkrt
max lY,k )2 s k , , x?,) + 2 E ( Gink ,ax[^'^^(^-^)
< 2 E ( kmax , (X,k)12) ,

<
since (a1 +a2I2 2(a: +a;) for a, E R. But the first term is bounded, by the
present hypothesis (i), and the second term is majorized by E(maxk<k,, E311(~-1)

(Z,))', where 2, is given in (8). The inequalities (9) now establish the desired
bound, since using the notation there,

which is bounded by (i) of the present hypothesis. Thus {Ynk,.Tnk,1 k < <
k,, n > 1) is a martingale increments sequence satisfying the hypothesis of
Theorem 2. Hence, as noted at the beginning, the result follows.

Recall that the usual Lindeberg condition states for {Xnk,1 < k < kn}n21
that for all E > 0, as n + ce

This implies, in particular, that

It can be shown by examples that (12) is strictly weaker than (11). On the
other hand, condition (ii) of Theorem 2 above can be written as
7 Limit Laws for Some Dependent Sequences

Hence (12) is equivalent t o condition (ii) of Theorem 2. Thus the hypotheses of


the preceding results are weaker than those of similar limit theorems given in
many previous studies [cf., e.g., Lokve, 1963, Sec. 281. The preceding treatment
follows essentially the interesting paper by McLeish (1974).
An account with references t o earlier works, and coiiceiitratiiig on the
martingale illcremelit (double) arrays, is given in the book by Hall aiid Heyde
(1980), t o which we refer the reader for further information and applications.

7.2 Limit Laws for a Random Number of Random


Variables
The previous work on the central limit theorem has been developed around
an i.i.d. sequence {X,, n > 1) of random variables. It is of interest in some
applications t o consider a random number of such random variables. This
idea is important for the area of sequential analysis, and also plays a key
role in Section 8.4 where we analyze the (compound) Poisson process. Here a
central limit theorem aiid some exteiisions are given t o show how a change of
viewpoint from the work of the last section is desirable.
Our first assertion about the problem can be stated as follows:

Theorem 1 Let {X,,n > 1) be an 2.i.d. sequence of r.v.s on ( 0 ,C, P)


with mean 0 and variance 1. Let {N,, n > 1) be a sequence of integer valued
P
EL=,
r.v.s such that N n / n + Y, where Y is a positive discrete r.v. If Sn = Xk,

>
Remark If the set of r.v.s I N n , n 1) is independent of the Xn-sequence,
then such a result was known before in special studies in sequential analysis
and without such an independence assuinptioii but with Y = constant. It was
treated by F.J. Anscombe in the early 1950s. The result in the present form is
>
due t o Rknyi (1960). Note that {SN,,, n 1) is no longer a sequence of sums
of independent r.v.s.

Let us first present an auxiliary result, t o be used in the proof of the


>
above theorem, which is of independent interest. A sequence {A,, n 1) c C
7.2 Limit Laws for a Random Number of R.V.S 437

is called (strongly) mixing with density 0 < a < 1, if lim,,, P(A, n B ) =


a P ( B )for each B E C. A consequence of this concept is given by the following:

Proposition 2 If {A,, n > 1) c C is mixing with density O < a < 1


in ( f l , C , P ) and Q : C + [0,11 is a probability such that Q << P, then
lim,,, Q(A,) = a holds. On the other hand, if P(A,) > 0 for each n > 1,
and if for each m > 1, we have with 0 < a < 1

lim P(A, A,) =a = lim P(A,), (2)


n+m n i m

where P(A,IA,) is the conditional probability of A, given the event A,, then
{A,, n > 1) is mixing with density a .

Proof First note that if Q(.) = P ( B n .), then Q(A,) = P ( B n A,) +


a P ( B ) = aQ(L?).Thus if Q = & / & ( f l ) , so that Q << P and Q is a probability,
we get &(A,) i a . In general, by the Radon-Nikod9m theorem, let f =
dQ/dP, so that Jnf dP = 1. There exists a sequence 0 5 f , 1' f of simple
functions, and if f , = Ck"
k
a;xsl:, let

Then as m + oo we get

Letting n + oo, one has &,(A,) + &(A,) so that Q(A,) + a Q ( R )= a .


For the converse direction, let {A,, n >
1) be a sequence of events for
which ( 2 ) holds. Consider the simple functions

Note that lgnl 5 max(a, 1 - a ) 5 1. Also,

E(g,gk) = + a 2 p ( A ; n A;)
( 1 - Q ) ~ P ( An,Ak)
- a ( l a )[ P ( A ,n A:) + P(Ak n A;)]
-

= P ( A , n AI;)+ a 2 a(P(A,) + P ( A k ) )
-

+ a P ( A k )+ a 2 a ( a + P ( A k ) )= O
-
(3)
as n + oo for each k > >
1, by ( 2 ) . Let KO= sp{g,, n 1) be the linear span
and 'MI = Ro c L 2 ( P )be its closure. Thus (3) implies limn E(g,ho) = 0 for
438 7 Limit Laws for Some Dependent Sequences

each ho E K O . If h E 'MI and E > 0, then there exists a g, E 'Mo such that
I h - g,12 < E, where I f I ; = E ( f 1'). Hence

by the CBS inequality and the fact that Ign12 <1. It follows that, with
(31, limsup, E ( h g , ) < E , so that lim,,, E(hg,) = 0 for all h E 'Ml. If
'Ma = 'Mf c L 2 ( P ) , the orthogonal complement, then E ( f h ) = 0 for all
f E 'Ma, h E 'MI, and since each u E L 2 ( P ) ( = 'MI @ 'Ma) can be uniquely
+
expressed as u = u l u2, U , E 'Mi, i = 1 , 2 , it follows that limn E(ug,) = 0 for
all u E L 2 ( P ) . In particular, if u = X B , B E C, then

Hence {A,, n > 1) c C is mixing if (2) is true, as asserted


The following is a useful consequence of the above result:

Proposition 3 Let {X,, n > 1) be independent and suppose that

lim P[(S, - b,)/a,


n-00
< x] = F ( x ) (4)

exists at all continuity points x of the d.f. F, where 0 < a, 1' GO, b, E R,and
S, = EL=, X k . T h e n S; = (S, b,)/a, i s a mixing sequence with "den-
-

sity" F ( x ) , i n that for all A E C, P ( A ) > O,lim,,, P[Si < x A] = F ( x ) , x


a continuity point of F. I n particular, i f Q << P i s a n y probability, t h e n
limn,, Q[S; < x] = F ( x ) also holds at each continuity point x of F.

Proof Let x be a contiiiuity point of F such that F ( x ) > 0, and consider


A, = [(S, b,)/a, < x], n
- >
1. Since F ( x ) > 0 and lim,,, P(A,) =
F ( x ) > 0, there exists an no such that n >
no + P(A,) > 0. To show
that {An,n > no} is a mixing sequence, it suffices to establish (2) for all
m > no. Now for each k, Sk/a,
P
+ 0 as n + cm,since a, + oo. Hence
D
S: f (Sk/a,) +Y, where Y has d.f. F by the elementary part of Slutsky's
result (cf. Problem 11b in Chapter 2). But

so that this is independent of Sz, and since P ( A k ) > 0,

P
It follows that, since S k / a n + 0, for each k > no,
7.2 Limit Laws for a Random Number of R.V.S 439

This is (2), aiid thus {A,, n >


no) is (strongly) mixing for any such z by
Proposition 2.
For the last part, let &(.) = P ( Ak) for any fixed but arbitrary k no. >
Since P ( A k ) > 0, this elementary conditional probability is well defined. Let
Q be as given. Then Q << Q << P, aiid by the first part of Proposition 2
lim &(A,) = liin P(A, I Ak) = F ( z ) .
n+m n i m

This finishes the proof of the proposition.

We are now ready to present the

Proof of Theorem 1 Let pk = P [ Y = tk]> 0, k >


1, the lk being the
possible values of Y. If r: = [nlk],the integral part, then r t 1' cm as n + cm.
By the central limit theorem (cf. Theorem 5.1.2) and Proposition 3, we have
if M, = [nY],S; = Sn/fi, aiid Ak = [Y = tk],
..

lim < z] = lim PIS:; < z &]pk


77,-00 7L-00
k=l

= liin P[S:: < z (Axjpk


n i m
k=l
(by the dominated convergence theorem)

Here with independence, we used the fact that {[S*,


T",
< z],n 1) is mixing >
by Proposition 3 aiid that {Sc,n >
1) obeys the central limit theorem (and
xk21 pk = 1).On the other hand, consider the identity

P D
We note that N,/M, = (N,/n)(n/M,) + Y / Y = 1. Since SiI,, 4 2,an
N(O,1) distributed r.v., it is bounded in probability. Thus the last term of
P
(7) + 0. The first term on the right converges t o the desired limit r.v.
Thus the theorem follows (from Slutsky's result again) if the middle term
on the right side of (713 0. Since M,/N, 3 1, it suffices t o show that
-1/2 P
(SN,,- S ~ , , ) k t r , + 0. This is inferred from the Kolmogorov inequality (cf.
Theorem 2.2.5) as follows.
Let E > 0,S > 0 be given. Let Ak = [Y = tk]as before, aiid set B, =
[INn - [nY]I < nrl1,rl > 0. If Cnk = [SN,, - ST!, > then ~fi],
440 7 Limit Laws for Some Dependent Sequences

Now choose ko large enough so that P ( D k o )< 613, and then choose r/o >0
small enough so that 0 < v < 70 implies

< 613. (10)

Finally, choose no [= no(ko,qo(E) , S] such that n 2 no + P(B:) < 613. Thus if


>
n no, we have from (9) that the left-side probability is bounded by 6. Since
6 > 0 and E > 0 are arbitrary, this implies that (SN,,- S n ~ , , ) / 3 a 0, so
that (7) implies that S;,, has the liinit d.f. which is N ( 0 , l ) . This completes
the proof of the theorem.

Remark 4 It is clear from the above proof that the independent r.v.s
should satisfy an appropriate central limit theorem (e.g., Liapounov's or the
Lindeberg-Feller's) and that the i.i.d. assuinptioii is not crucial. With a little
care in the error estimates, the result can be shown t o be valid if the liinit r.v.
Y is merely positive and not necessarily discrete, since (as noted in Problem
3 in Chapter 4) it can be approximated by a discrete r.v. t o any degree of
accuracy desired.
We now sketch another extension of the above ideas if the sequence
{X,, n > 1) is replaced by a certain dependent double array of consider-
able practical as well as theoretical interest. This will also indicate how other
extensions are possible.
7.2 Limit Laws for a Random Number of R.V.S 441

The problem t o be described arises typically in life-testing situations. If


the life span of the ith object (bulb, the drug effect on an animal, etc.) of a
certain population is Zi, then it is usually found (to be a good approximation)
that Zi, i = 1 , . . . , n form order statistics from an exponential population. (In
electric bulb life problems this is indeed found t o be an acceptable approx-
imation.) Suppose that one wants t o test just a subset k of the n objects
because of cost or some other reason. Let rj be the j t h selected moment for
<
observation using a chance mechanism. Thus 1 rl < . . . < r k <n and the
ri are integer-valued r.v.s. The problem of interest is the asymptotic distri-
P
bution of { Z r , , i = 1 , . . . , k} as n + oo,where r i / n + t o an r.v. Note that
each ri(= r?) is also a function of n , so that this is in fact a double array of
(a random number of) r.v.s. We can now assert the following generalization
of the above situation. Some previous work (especially Theorem 3.3.9) is also
utilized in this analysis.

We recall the following familiar concept.

Definition 5 A random vector U = (Ul, . . . , Uk) is k-variate Gaussian


k
if for each real vector ( t l , . . . , t k ) , the linear combination V = C i = , tiUi is
N ( p k , a:), where pk = ~ f t,E(Ui) = ~and a; = ~ t titjaZj >
= ~ 0. Then the
vector v = ( v l , . . . , vk),vi = E(Ui) is called the mean and C = (aij) the
covariance ( matrix) of U.

Using this terminology, we present the following i n a h result of the section,


which originally appeared in a compressed form in Rao (1962, Theorem 4.2).

Theorem 6 Let Yl, . . . , Y, be order statistics from an absolutely contin-


<
uous strictly increasing d.f. F. Let 1 ri < . . . < r k + l <
n be integer-valued
P
r.v.s such that r i / n + qiX > 0, i = 1 , . . . , k + l , as n + oo,where X is a posi-
< <
tive discrete r.v., 0 < ql < . . . < qk+l = 1,qi = F ( Q i ) . If they,,, 1 i k + l ,
are thus optionally selected r.v.s from the order statistics Yl, . . . , Y, of F , then

where G is a k-variate Gaussian d.f. with mean vector zero and covariance ma-
trix D C D of the following description: D is a k x k diagonal matrix (dii), dii =
< <
qi/F1(Qi), and C = (aij), with aij = ai = (1 qi)/qi, i j k, i = 1 , . . . , k,
-

where k > 1 is fixed (and F1(z)= dF/dz, which exists a.e.). (A k-variate
Gaussian d.f. is given in Definition 5 above.)

The proof is presented in a series of propositions. The argument is typical


aiid is useful in several other asymptotic results. The first coilsideration is the
>
case that F(z) = 1 e p x ,z 0, the exponential d.f., aiid the r, are nonraii-
-

dom. In any event Qi is the "ith quantile" of F and Yr, is an estimator of Q,.
442 7 Limit Laws for Some Dependent Sequences

Then (11) gives the limit d.f. of the vector estimators of {Q,, 1 < i < k ) and
has independent interest in applications.

Proposition 7 Let Z1 < . . . < Zn be order statistics from the exponential


d.f. Fl given by Fl(z) = 1 e c x if z 2 0(= 0 if z < 0). Suppose 1 5 i l <
-

<
. . < ik n and lim,,,(ij/n) = qj, 0 < ql < . . . < qk < l(p, = 1 - qi); then

lim P [ f i ( Z Z J
n-00
+ logpj) < zj, 1 < j < k] = G(zl,. . . , z k ) , (12)

where G is a k-variate normal d.f. with mean zero and covariance matrix 2,
which is of the same form as C of Theorem 6, with a;' for a, there.

Proof In view of the multivariate extension theory of Section 4.5, it


suffices t o establish, for this proof, that if = ~5=,
tiZG (ti E R),then
D

and variance of in, and p, + - c:=, t j logpj, ncri -


(2, -p,)/a, + an r.v. with N ( 0 , l ) as its d.f., where p, and a, are the mean
~ = =~5 of
~ t , t,tj(iZj
the theorem, because of the Slutsky result. We fill in the details now.
Let $, be the ch.f. of 2,. Then using the form of the density of the Zi given
in Theorem 3.3.9, we get ($, never vanishes, Z, being infinitely divisible.)

where T~ = C' t,. In view of the continuity theorem, it is enough t o verify


that limn,, "=?(ulo,] = 1 uniformly in some neighborhood of u = 0. The
[Ck
desired Gaussian limit assertion then follows. (Here C l is the second derivative
of C, relative t o u.) Thus

and similarly,

For any integers h, H, 0 < h < H, and a such that s + a # 0, one has

+
where 0 < Q < 1, Q/h i 0 if (H h) (h a)-' + 0 as h + oo. Substituting
-

this in (14) and (15) and remembering that i j / n + qj as n + oo,we get


7.2 Limit Laws for a Random Number of R.V.S 443

and, after a similar simplification,

where qo = 0 (so that po = 1). Therefore it follows, since t j = 7j - 7j+l, 1 5


j < k - l , t k = ~ k that
,

uniformly in u E R. Finally, since p,, = c:=,


t j E ( Z i J ) and

where I E =
~ O((n-ij)-l), one has E ( Z i l ) + l o g p j and &(aj+logpj) +0
P
as n + oo. Hence f i ( Z i I - a j ) = f i ( Z i l +
logpj), and by the first para-
graph, this tends t o the standard normal r.v. as n + oo. The proof is finished.

The conclusion is valid in a stronger form, namely, it is true on ( 0 ,C , PB),


where PB = P(B n .)/P(B), the conditional probability for any B E
C , P(B) > 0. This is also called a "stable" convergence. In the above, it is
proved for B = fl. We sketch this strengthening here, since this is also needed.

Proposition 8 Under the same hypothesis as in Proposition 7, the limit


(12) holds if the probability measure P is replaced by PB(.) = P(B n . ) / P ( B ) ,
the conditional probability given B, where P(B) > 0; i.e., "stable" convergence
holds.

Proof The result can be established by a careful reworking of the above


argument. However, it is instructive t o use a version of Proposition 1.1 instead.
We do this with k = 1, for simplicity, in (12). By the Cram&-Wold device (cf.
Proposition 4.5.2) this is actually sufficient. Since also f i ( E ( Z t I ) + l o g p j ) + 0
as n + oo, it is enough t o show that

By Theorem 3.3.9, ZiJ = ~2::


vk/(n k +
I ) , where the Vk are i.i.d.
-

exponential or gamma distributed on R+ with E(Vl) = 1. Let X,, = +[V, -


+
l ] / ( n i 1) and S, = ~ & X,,,
- l with i j / n + qj,O < qj < 1. These r.v.s
have the following properties. Let E > 0 be arbitrary. Then
444 7 Limit Laws for Some Dependent Sequences

as n + oo,where we used the fact that ~ ( -ni j + I)/& + cm.Also,

Heiice
2,-1

[by independelice of F , k and X,(k+l)]

a,-1
t2n
2) (by iteration)
k=O

<
as in (16). Here we used the inequality l + z ex, z > 0. Heiice {Y,(t), n 1)>
is bounded in L 2 ( P ) , which implies its uniform integrability. We now verify
7.2 Limit Laws for a Random Number of R.V.S 445

that Y,(t) + to a constant weakly in L 1 ( P ) , and this gives the desired distri-
butional convergence, as in Proposition 1.1.
Thus let k' be fixed, 1 k' < <
i j - 1, and A E F n k , . Then for n k' we >
have, as in (25),

(by independelice of X, and F,p)

]n
k' 2,-1

= E
[
XA
.=o
+
n(1 itx,,:)
,=kl+l
(1 + itE(X,,,))

since E ( X n j ) = 0. Regarding r i k , , it consists of a finite number (< 2" - 1)


of terms of the form xAXnk1. . . X n k , , where 1 kj k, < < < k', with 1 <j <
<
s k'. Each of these terms satisfies the inequality

Thus (26) and (27) show that for each A E Un,,-


F n k = A,

By linearity of the expectation, (28) gives

for all f = Czl a,XA,,A, E A. If Co = o ( A ) , then the above class of step


functions is dense in L2(Co), and since Y,(t) - 1 E L 2 ( C o ) , (29) implies
that the same relation is true for all bounded f E L2(Co),and then for all
f E L ~ ( C O )This
. means Y,(t) + 1 weakly in L ~ ( c o ) .
We can now apply the result of Proposition 1.1. By using the same sym-
bolism with the present Y,(t) in Eq. (1.4), one gets for A E Co,

L e'tsrl d P = ept2l2 L Y, (t) d P + Y, (t) [Zn(t)- ept2/" dP. (30)

As noted there, Eqs. (21)-(24) imply (Z,(t) ept2l2)3 0 and {Yn(t),n 1)


- >
>
is uniformly integrable by (25). Hence { e z t s -~e~- t 2 / 2 ~ , ( t ) , n. 1) is uniformly
446 7 Limit Laws for Some Dependent Sequences

integrable. This shows that the last integral in (30)+ 0 and by (29) the right
side tends t o e p t 2 1 2 p ( A )Thus
. we have for each A E Co with P ( A ) > 0,

Since Sn = f i ( Z i J - E ( Z i J ) )the
, proposition is established.

We have included this alternative and detailed argument here because


with simple modifications it can be used if the X,, are not independent, but
satisfy the hypothesis of Theorem 1.2 or Proposition 1.3. This work can be
applied also t o certain other problems involving "stable" convergence. Fur-
ther, it should be noted that Proposition 7 is deducible from the proof of the
above result. But the separation has some methodological interest. We now
extend it t o the optional sampling problem.

Proposition 9 Let sj = [ n X j ]be the integral part, with X i = q j X , as


in Theorem 6. If ZiI is replaced b y Zsl in Proposition 8 , so that the Zsl are
an optionally selected set of r.v.s, then (12) holds if n is replaced b y sk+l there.

Proof As in the proof of Proposition 7, consider z,,+~


= c:=, t j Z S Jp, =

-c:=, ti logy,, with M given by the right side of (16). If X takes values
d l , d z ,. . . aiid S j = P [ X = d j ] ,(6, > 0 ) then

- (32)
k
where ni = [ n d I ]2;:
, = t I Z ~ n q l dLetting
,~. n + oo and using Proposi-
tion 8 and the bounded convergence theorem in (321, one gets

where G(.) is the N ( 0 , l ) d.f. This implies the conclusion.


For the next extension we establish:

Proposition 10 Let ( Z 1 ,. . . , Z,), (ql,. . . , qk+l) be as in the preceding


proposition and 1-1, . . . , rk+l be as in Theorem 6. Then

where G is as defined in Proposition 7,

Proof Let sk be defined as in the above proof, aiid consider


7.2 Limit Laws for a Random Number of R.V.S

and e ( Z s I + logp,) has a limit d.f., so that it is bounded in probability.


It follows that the last term of ( 3 3 1 3 0. The first term on the right side of
P
(33) gives the desired result by Proposition 9. The middle term + 0, and then
the result follows by the Cram&-Slutsky theorem if we show that

for j = 1 , . . . , k .
P
For this observe that rj/rk + qj/qk > 0 as n + cm,and hence r j and r k
grow at the same rate. Let d l , d2, . . . be the values taken by X , and fixing j ,
set n k = [nqjdk],the integral part. The desired coiiclusion depends on Kol-
mogorov's inequality (or equivalently here, on the submartingale inequality).
Given E > 0, S > 0, consider, with rj , sj as before,

t n B: = [I r,
~ u Ak - nkl < n6], and (for n k - Sn < l <nk + Sn) one has;
P[Ak n B: n Cnkl

.P[ inax f i l Z l
It-nk <6n
r I e

(where U = andZt
t=l
Unt,
=
e
x
I$ being i.i.d. gamma r.v.s with unit mean)

(by Kolmogorov's inequality)


t=l
448 7 Limit Laws for Some Dependent Sequences

where K O > 0 is a constant independent of n. Now choose mo such that


PIUk>moAk] < &/3, and hence (34) becomes, by (35),

( d
LHS (34) < 2K06
-
mo - 1 E
+ j + P(B').
E2
k=l (nk - dl, + 1)
+
Since mo is fixed, and n k / ( n k - dk 1) + 1, for each k, as n + cm, we can
choose 6 > 0 small enough so that the first term on the right of (36) is at
most €13. Then choose n large enough so that P(BE) < €13. Thus as n + cm,
(36) gives
liin P[2/SI;+IZTJ ZSJ > E] < E.
-
n i m

This implies the desired result.

We now extend the above proposition from the Zi to the original r.v. se-
quence; namely the Xn7s:

Proposition 11 Let X, = (XI,, . . . , Xk,), n = 1 , 2 , . . . , be a sequence of


random vectors and Q = (81,. . . , QI;) be a constant vector. Let {h,, n > 1) be
a n increasing sequence of r.v.s such that P[limn hn = +cm] = 1, and that

liin P[h,(Xl,
n-00
- 01) < 21,. . . , XI;, - 01;) < ~ k =] F ( x 1 , . . . , XI;), (37)

at all continuity points ( x l , . . . , x k ) of F. If f : B + 1 ~ 5 as continuously


differentiable function with B as a convex set containing 0 in its interior,
then one has

at all continuity points x of F , x = ( ~ 1 ,. .. , XI;) and inequality between vec-


tors i s componentwise. Here, if F of (37) denotes the d.f. of the vector
( X I , . . . ,XI;), t h e n F of (38) i s the d.f. of ( X I , . . . , X k ) D f ,where the F-by-
k matrix D = (i3fi/i3xj~,1 < i ,j < k) and prime stands for transposition.

Proof Expand f about 0 in a Taylor series,

where D is the (Jacobian) matrix given in the statement and the Euclidean
norm of the (column) vector Iq(x - 8) 1 = o ( x - 81) by elementary analysis.
Hence
hn(f (Xn) f (0)) = hn(Xn Q)D1 h n ~ ( X n 0). + (39)
Since by (37) h,(X, 0 ) is bounded in probability and h, 1' oo a.e., it follows
P P
from the stochastic calculus that X, - 0 + 0, and then h,v(X, - 0) + 0.
7.3 Ergodic Sequences 449

From this and (39) we deduce that the stated limit d.f. of (38) must be true.

We can now quickly complete the proof of the main result.

Proof of Theorem 6 Let B = R , f,(u) = F p l ( e p " ) , i = 1,.. . , k , and


:
f =( f l , . . . , f k ) in Proposition 11. Let hn = f i Q j = Zr,,
= logpj,Xjn
and qj = 1 - pj. Since F is the k-dimensional normal d.f. with means zero
and covariance matrix given for (121, it is seen that (11) is a consequence
of Proposition 11 with F = G. Note that the ordering of the Z aiid Y is re-
versed, so that the & go into the qj as given in (11). This completes the proof.

If the sampling variables {rj,1 j < <k} are assumed t o be independent


of the Y, then the proof can be simplified considerably. However, because
we have iiot made any such supposition, our proof was necessarily long. The
details are presented since this type of result is of interest in applications. If
k = 1 aiid r, = i j , r k + l = n (both nonrandom), theii the result of Theorem
6 is essentially given in Cramkr's classic [(1946), p. 3691, and Ri.nyi has also
contributed t o this problem. In the above results 0 < qj < 1 was essential and
qj = 0 gives a different (nonnormal) limit d.f. We indicate an instance of this
as an exercise (Problem 6). On the other hand the assuinptioii that the limit
r.v. X in Theorem 6 be discrete is iiot essential, although it is used crucially
in the above proof in invoking the conditional probability analysis, and it will
be interesting t o extend the result t o the general case, i.e., X > 0 is any r.v.

7.3 Ergodic Sequences


In this section we include a considerable generalization of the sufficiency part
of Kolmogorov's SLLN t o certain dependent sequences of r.v.s, called ergodic
sequences. The results are of fundamental iinportaiice in physics, and ergodic
theory has now grown into a separate discipline. Here we present the basic
Birkhoff ergodic theorem as an exteiisioii of Theorem 2.3.7 aiid also use it in
the last chapter in analyzing some key ideas of stationary type processes. Let
us motivate the concept.
If ( R , C, P) is a probability space, T : R + f l is a measurable mapping
[i.e., T p l ( C ) c C], theii T is measure preserving if P ( T p l ( E ) ) = P(E),E E
C. For such transformations, the followiiig interesting recurrence pheiioineiion
was noted by H. Poincari. in about 1890.

Proposition 1 If A E C ,T : 0 + 0 is a measure-preserving transforma-


tion on (fl, C, P), then for almost all w E A, Tn(w) E A for infinitely many
n, where P ( A ) > 0.
450 7 Limit Laws for Some Dependent Sequences

Proof We first note that the result holds for at least one n. Indeed, let
00

B = { ~ E A : T ~ ( w ) @ A forall n > 1 } = ~ { w E A : T ~ ( ~ ) @ A }
n=l

n00

= ~ nT-~(A~),
n=l
A~ = n - A. (1)

Since T is measurable, B E C . Next note that for n # m , T p n ( B )n T p m ( B ) =


0.In fact, if n > m , and w E T-n(B)nT-m(B) = T - ~ ( B ~ T - ((B)), ~ - ~then
)
there must be a point w' E B n T - ( ~ - ~ ) ( such
B ) that Tm(w) = w' E B . But
this is impossible by definition of B above. On the other hand, P(T-'(B)) =
P ( B ) for all k >
1 by the hypothesis on T . Since U;==, T p n ( B ) c fl,we have

This is possible only if P ( B ) = 0. Thus for each w E A - B , Tn(w) E A for at


least one n.
>
Now for any integer k 1, consider

Then by the first paragraph, P ( B k ) = 0, so that P ( U r = l B B = 0. If w E


An B;, then w E A n B; = Ur=l A n TPn(A), so that w E TPn(A) for
some n. Consider w E A n B i = U;==, A n Tp"(A); w E Tp"'(A) for soine
n', for each k. Thus Tn(w) E A for illfinitely many n , as asserted.
The result is clearly false if P is replaced by an infinite measure here. For
+
instance, if R = R, T(w) = w a , and p = Lebesgue measure on R, then we
get a counterexample.
An interesting point here is that almost every w of A c 0 returns to
A infinitely often. Thus ~ i = x ~:( T ' " ( w )is
) the iiumber of times w visits A
under T during the first n instances. A natural problem of interest in statistical
mechanics and in some other applications in physics is to know whether such
(recurrent) points have a mean sojourn time; i.e., does the limit

exist in soine sense? This leads immediately to a more general question. If


f E L1(P), does the limit

lim
n-oo
-
n
x
1 n-l
k=O
f (T'(.))
7.3 Ergodic Sequences 451

exist pointwise a.e. or in L1(P) mean? Thus if X is an r.v., and T is such a


transformation, let X, = X o T n . Then the problem of interest is t o consider
>
the SLLN and WLLN for {X,, n I ) , and thus study their averages:

for various transformations T . This forms the ergodic theory. Note that the
X k are not independent. Here we shall prove the pointwise convergence of
(4), which is a fundamental result originally obtained by G. D. Birkhoff in
1931. In many respects, the proof proceeds along the lines of the martingale
coilvergelice theorem, but cannot be deduced from the latter. As in the mar-
tingale case, a certain maximal inequality is needed. The proof of the latter
has been re-worked and simplified since its first appearance, and the following
short version is essentially due t o A. M. Garsia who obtained it in the middle
1960s. If T is measure preserving aiid Q : f H f o T , f E L1(P), it defines
a positive linear mapping with IQf 11 I 1 1 f 11 and Q l = 1. We actually can
prove the result, more generally, for such "contractive" operators Q on L 1 ( P ) .
More precisely, one has a basic inequality as:

Proposition 2 (Maximal Ergodic Theorem) Let Q : L 1 ( p ) + L 1 ( p ) be


a positive linear operator such that IQ f 11 < If 11. T h e n for a n y f E L1 (P)
we have

Proof Let f, = x r = o ~ ' " fand fn = supolk5n fk. Then fn 1. and [f,>
01 1. Af. Since Q is positive and f = fo, we have f <
f + Q~L
aiid f,+l =
+
f + x:=~(Q'"+~f) = f + QfmI f ~ f A , 0I m I n. Hence I &+I I
+
f Q f$. Coiisequeiitly

because Q is a (positive) contraction. Since f E L 1 ( P ) and [f,> 01 1. A f , we


get ( 5 ) by letting n + oo in (6) on using the dominated convergence theorem.
This completes the proof.
452 7 Limit Laws for Some Dependent Sequences

Note that the finiteness of P is not used here. With this we can establish

Theorem 3 (Birkhoff's Ergodic Theorem) Let ( 0 ,C, p ) be a measure


space and T : fl + R be a measurable and measure-preserving transformation.
T h e n for each f E L 1 ( p ) , we have

exists a.e. and f * i s invariant, i n the sense that f * o T = f * a. e. If, moreover,


p = P, a probability, and 3 c C is the a-subalgebra of invariant sets under T ,
so that 3 = {A E C : P ( A A T P 1 ( A ) ) = 01, t h e n f * = ~ ~ (a.e.,f and ) the
convergence of (7) i s also i n L 1 ( P ) - m e a n . (A A B i s the symmetric difference
of the sets, A, B.)

Proof Since f o T k E L1(p),k > 1, it follows that


Ak = { w : (f OT~)(W) #O)
is 0-finite for each k , so that A = Up=l Ak c R is a-finite. Replacing fl by A,
if necessary, p may be assumed a-finite. We may (and do) take for (7) that
f > 0 a.e. by considering f* separately. The proof of the existence of limit in
(7) t o be given here, is similar t o the first proof of Theorem 3.5.7.
Let 0 < a < b < cm, Sn(f ) = (1112) f o T k , and consider the set

Since clearly lim inf, Sn( f )(Tw) = lim inf, Sn( f )(w) and similarly for lim sup,
we conclude that Bab= T-'(B,~), SO that Babis invariant. If p(Bab) < m,
then the argument of Proposition 1 yields p(Bab) = 0. Let us show that
<
~ ( B a b ) 00.
If A E C, A c Baband p(A) < cm, consider g = f - b x A E L1(p). Also, let
>
B be the set on which Sn(g) 0 for at least one n. Then by Proposition 2,

since A c Babc B . To see the last inclusion, let w E B a b Then


b < liin sup Sn( f )(w) +- b < S, ( f )(w) for at least one n ,
n
. n-1
7.3 Ergodic Sequences 453

Also, since p is a-finite, by (9) p(A) 5 ( l l b ) JE If 1 d p for all A E C(Bab),trace


of C on Bab,with p(A) < cm, it follows that there exist A, 1' Bab,p(An) < cm
and
~ ( B a b=
) n+cx
lim p(An) (10)

as needed for (8). Thus h = f - b x ~ <E &L1(p),


, and applying Proposition 2
>
to h (and Bab in place of Af there, since Sn(h) 0 on Bab),we get

Consider h' =a - f > 0 so that Sn(hl) > 0 on Bab: then Proposition 2 again
implies
0 5 L h1dii=L (a-f?dw (12)

Adding (11) and (12), we obtain 0 5 (a b)p(Bab) 5 0, since a < b. Hence


-

p(Bab) = 0. Letting a < b run through the rationals, one deduces that the set
N = [lim inf, S, ( f ) < lim sup, S, (f )] is p-null, whence lim,,, S, ( f ) = f *
exists a.e.(p), proving the SLLN result in (7). We take p = P, a probability
measure, for the last part.
It is clear that 3 is a a-algebra and f * is F-measurable. Since f E
L1(P), Sn(f)E L1 (P),and by Fatou's inequality

Also, for each A E F

=L f dP [since T - ~ ( A )= A a.e.1

If we show that {Sn(f), >


n 1) is uniformly integrable, then Sn(f)+ f * a.e.
implies, by Vitali's theorem (cf. Theorem 1.4.41, that we can take limits in
(13) and get

and since the extreme integrands are F-measurable, ~ ~ (= ff* a.e.


) follows.
Now ISn(f)ll5 I f l l < oo, so that {S,(f),n >
1) is bounded in L 1 ( P ) .
Also, given E > 0, choose S > 0 such that P(F) < S implies E(If IxF) < E ,
which is possible since f E L1 ( P ) . Hence
7 Limit Laws for Some Dependent Sequences

because P(T-'(F)) = P(F) < 6,so that 1 f dP < E . Thus the uni-
-
form integrability holds. This also implies, by the same Vitali theorem, that
1 ISn(f ) f * 11 0, coinpletiiig the proof.
-
-

A measure preserving traiisforinatioii T : R R is called ergodic or met-


rically transitive if its invariant sets A (in F)have probabilities 0 or 1 only.
In this case, for each r.v. f , the sequence {fn = f o T n , n > 1) is called an
ergodic process. A family { X n , n > 0) (or { X n , o o < n < oo)) is called
strictly stationary if for each integer l > 0 (or any integer )! the finite di-
mensional d.f.s of (X,, , . . . , X,,) and X,,+e, . . . , X,,+e are identical. Thus
X, need not have any finite moments for strict stationarity and it also need
not be ergodic. If T : R + R is one-to-one and both T and T-' are measure
preserving, and X o is an r.v., then the sequence {Xn = X o o T P n ,n > 1) is
strictly stationary, as is easily seen. Recalling the work of Section 5.3, it may
be noted that the symmetric stable processes (or sequences) form a subclass
of strictly stationary family. There is a weaker notion of stationarity if the
process has two (translation invariant) moments which will be discussed in
the last and final chapter. Theorem 3 implies the following statement:

Corollary 4 Let {X,, n > 1) be an integrable strictly stationary sequence


of r.v.s. Then the sequence obeys the S L L N with limit as an invariant r.v.,
which is E'(x~) where Z is the a-algebra of invariant sets in C (of (0,C, P))
and E'(.) is the conditonal expecation relative to Z. If, moreover, {X, =
X o T n , n > 1) i s an ergodic sequence, then we have

(1171)
n-1

k=O
- E ( X ) ( E'(x), Z= (4, a)),a.e. and in L'(P). (16)

It should be remarked that if {X,, n > 0) is a sequence of independent


r.v.s, then it is strictly stationary iff the X, are identically distributed. Fur-
>
ther, an i.i.d. sequence {X,, n 0) is not only stationary as noted, but may
be taken t o be ergodic. This means that if the (0,C , P) is the canonical
space given by the Kolmogorov representation (cf. Theorem 3.4.11) with f l =
R'(I = integers) (this can always be arranged), then (TX,)(w) = X,+l(w) is
well defined, and easily shown t o be ergodic in this case. With such an identifi-
cation, (16) is equivalent t o the sufficiency part of the SLLN given in Theorem
2.3.8. Since in the above T can be more general, it is clear how ergodic theory
can be developed in different directioiis and also on infinite measure spaces.
Exercises 455

Starting with the strictly stationary (and perhaps ergodic) processes, one
can extend the central limit theory and other results t o this dependent class.
We leave the matter here; a few complements are indicated as exercises.

Exercises
1. The result of Proposition 1.1 can be stated in the following slightly more
general form. Assume the conditions (ii) and (iv) there and let (i) and (iii)
be replaced by (i') E(Y,(t)xA) + P ( A ) , A E El aiid (iii')~:" XKk 5 Z > 0
a.e. Then, with simple changes in the proof (see also Proposition 2.7), show
D
that S, +V where the ch.f. of V is given by Qv(t) = ~ ( e q ( + ) ) .

2. Let ( 0 ,E, P) be the Lebesgue unit interval. (R= [O, 11,P = Lebesgue
measure.) Consider the r.v.s Vn defined by V,(w) = s g n ( ~ i n ( 2 ~ + l ~ w )E) , w
>
R,n 0. These are independent with means 0 aiid variances 1, known as the
Rademacher functions. If X n k = (Ifk/&) +
2"I2xA,,, A, = [O, 2Zn],0 k < <
<
n , F,k = o(Xn3,j k), then verify that the sequence

satisfies condition (ii) of Theorem 1.2 but not the Lindeberg condition [not
k D
even its weaker form: CZ& Xi,XIIx,,,l>d+ 0 as n + oo]. Both these obser-
vations are due t o McLeish (1974).

3. Complete the coinputatioiial details of Propositions 2.7 and 2.11

4. Let {N,, n> 1) be a sequence of integer-valued (positive) r.v.s such


a > 0 as n + cm, and let {X,,n > 1) be i.i.d. variables with
P
that N,/n +

two moments finite. Show that XN,, /a


5 0 as n + oo.
5. Show that the proof of Theorem 2.1 can be simplified if the r.v.s N,
there are independent of the {X,, n >
1)-sequence.

6. Let {Y,, , . . . , Y,,,, ) be a set of optionally selected r.v.s from Yl , . . . , Y,


which form order statistics from a random sample of size n with a continuous
strictly increasing d.f. F on the interval (a,PI, -cm < a < P < cm, F1(P)> 0
<
(cf. Theorem 2.61, aiid 1 rl < r a < . . . < r k + l < n , are integer-valued
P P
r.v.s. If + X , a discrete r.v., while r j / n + 0, j = 1 , . . . , k, with
P[rj = ij] = 1, let r3 L r k + l rj. Then show, using the procedure of the
-

proof of Theorem 2.6, that


456 7 Limit Laws for Some Dependent Sequences

where ?? is the k-fold convolutioii of "chi-square" r.v.s with 2(ij+l i j 1) - -

"degrees of freedom," j = 0 , . . . , k - l ( i o = 0). Thus the ch.f. q5 of ?? is given


by
k-1 k

[The argument is similar t o (but simpler than) that of Theorem 2.6. Both
these results on order statistics were sketched by the first author (1962). Also
Rknyi's (1953) results on order statistics are of interest.]

7. Let L"(P) be the Lebesgue space (a, C , P), p > 1, aiid let T, :
L1 (P)+ L1 (P) be a positive linear mapping such that IT, f I <I f 11 and
1 IT, f 1 ,< I f .,I Then it can be verified that the adjoint operator T,* of T, is
also positive aiid satisfies the same norm conditions on L1(P) and L m ( P ) . Let
TI, = TnTn- . . . TI aiid V, = T,*,Tln. Then V, is well defined on L" (P), sat-
isfies the same norm conditions, aiid is positive. I f f E LP(P), 1 < p < oo, aiid
= V, f , show that {g;, 3,,n > 1) is an adapted integrable sequence. Let
7 be the directed set of all bounded or simple stopping times of {3,,n 1). >
>
The sequence Ig;, 3,,n 1) is called an asymptotic martingale (or amart) if
the net { ~ ( g ; ) ,7 E 7 )of real numbers converges. It is asserted that + gf
a.e. and in LP(P), f E Lp(P), 1 < p < oo. This is equivalent to showing that
>
{ 9 ~ , 3 , ,n 1) is an amart. [Proofs of these results are nontrivial, c.f. G.A.
Edgar and L. Sucheston, (1976). We hope that a simple proof can be found!]

8. Show that the conclusioii of Theorem 2.1 is valid if Y there is not nec-
essarily discrete but just strictly positive.

9. Let {X,, n > 1) be i.i.d. variables and Y,


= max(X1, . . . , X,). If N,
P
is an integer-valued r.v. such that N,/n > 0, where X is an r.v., and if
+X
D
(Y, -a,)/b, + Z (nondegenerate) for a, E R, b, > 0, then using the method
of proof of Section 2, show that the above convergence to Z is also "stable,"
as defined there. With this, prove that, Fx being the d.f. of X,

[One has to prove several details. This result is due to 0. Barndorff-Nielsen


(1964).]

10. We now present an application of Birkhoff's result, which is called


a random ergodic theorem, formulated by S. M. Ulain and J. voii Neu-
mann in 1945. Let (R, C , P) and ( S , Z , p ) be two probability spaces. If
Exercises 457

(Y, y , u) = ~ 2 0 0 _ - ~ ~ ( SpIi,
, Z , the doubly infinite product measure space in the
sense of Jessen (cf. Theorem 3.4.3), let 4, : R + 0 ,s E S, be a family of one-
to-one measurable and P-measure-preserving transformations. Note that the
mapping (w, s ) H (4, (w), s) is P x p-measurable. If X : R + R is an integrable
r.v. and for each y E Y, sk(y) is as usual the kth coordinate of y[sk(y) E S,
all k], then the mapping (w, y) H ( q 5 s , ( y ) ( ~ ) , $(y)) is P x u-measurable and
P x u-measure preserving, where $ : Y + Y, defined by sn($(y)) = s , + ~ ( y ) ,
is called a shift transformation. Then 4, b , . . . q5so(Y) (s),gn(y))
(y, s) = ( ~ s , (y,
is well defined for n = 1 , 2 , . . . . Show, using Theorem 3.3, that

a.e.[P],and in L1(P) also, for each y E Y N , P x v ( N ) = 0. (The various


-

subsidiary points noted above are useful in the proof. More on this and ex-
tensions with interesting applications to Markov processes can be found in an
article by S. Kakutani (1951).)
Chapter 8

A Glimpse of Stochastic Processes

In this final chapter a brief account of continuous parameter stochastic pro-


cesses is presented. It iiicludes a direct coiistruction of Brownian motion to-
gether with a few properties leading t o the law of the iterated logarithm
for this process. Also, processes with independent increments, certain other
classes based on random measures and their use in integral representation
of various processes are discussed. In addition, the classification of strictly,
strongly, weakly stationary and harmonizable processes are touched on, so
that a bird's-eye view of the existing aiid fast developing and deeply interest-
ing stochastic theory caii be perceived.

8.1 Brownian Motion: Definition and Construction


One of the most important continuous parameter stochastic processes aiid the
best understood is the Brownian motion, named after the English botanist
Robert Brown, who first observed it experimentally in 1826 as a continual
irregular motion of small particles suspended in fluid, under the impact of
the surrounding molecules. Later this motion was described in (plausible)
mathematical terms by Bachelier in 1900, by Einstein in 1905, and by von
Sinoluchovski in 1906. However, it was Norbert Wiener who in 1923 gave a
rigorous mathematical derivation of this process, and so it is also called a
Wiener process. It occurs in physical phenomena (as Einstein and Smolu-
chovski considered) aiid caii also be used t o describe fluctuatioiis of the stock
market averages (as Bachelier noted) among other things. We now define this
process and derive its existelice by presenting a direct coiistructioii without us-
ing the Kolmogorov-Bochner theory of projective limits (of Chapter 3) which
may also be used.

Definition 1 Let {Xt, t > 0) be a family of r.v.s on a probability space


(R, C , P ) . Then it is called a Brownian motion if (i) PIX. = 01 = 1 aiid if (ii)
for any 0 < to < . . . < t,, the Xt,+l - Xt, , i = 0,1, . . . , n - 1, are mutually
460 8 A Glimpse of Stochastic Processes

independent, with Xt,+, - Xt, distributed as N(0, a2(t,+1 - t,))for some con-
stant a2 > 0. (For simplicity, we take a2 = 1. Compare with Definition 5.4.1.)

It is immediately seen that this collection of r.v.s {Xt,,i = 1 , . . . , n), as n


varies, has a compatible set of d.f.s in the sense of Section 3.4 (it was verified
after 5.4.1). Hence by Theorem 3.4.11 such a process, as required by the above
definition, exists with L? = I W [ " ~ ] , C = a-algebra of cylinder sets of R , and P
given by the normal d.f.s that satisfy coiiditioiis (i) aiid (ii). However, it turns
out that the function t H X t ( u ) is coiitinuous for almost all w E R,so that
the process really concentrates on the set of continuous fuiictioiis C[O,co) c
R. This and several properties of great interest in the analysis of Brownian
motion need separate (and nontrivial) proofs. The present special work is
facilitated by direct coiistructioiis that do not essentially use the Kolmogorov-
Bochiier theory, and so we present one such here. It is based on an interesting
argument given by Ciesielski [(1961), especially p. 4061. For this we need the
completeness property of Haar functions of abstract analysis, indicated in
Exercise 46 in Chapter 3. Since this is essential, let us present it here.
The functions given in the above-noted problem can be written, for all
0<2<1,as
(1)
ho(z) = 1, ho (2) = X[0,1/2]- X(l/2,1],

Writing these as a sequence in increasing values of n (and for each n in in-


creasing k), let us relabel the set thus denoted as {H,, n >
0). Then it is a
bounded orthonormal sequence in L2(0,I ) , so that for any f E L"(O,l),p 1, >
its Fourier expansion can be obtained as

1
where a k f( ) = Jo f (z)Hk(z)dz. The completeness of the Haar system may
be proved in different ways. For instance, one can show that they form a
<
basis in Lp(0, I ) , 1 p < co. However, a short probabilistic argument runs
as follows. If B, = a(H1,.. . , H,), then the atoms of B, are intervals of the
+
form [k2Tn,(k 1)2Tn), and ~ " r 7 - l (H,) = 0 by definition of each Hn. Thus
>
{S,, B,, n 1) of (2) is an L2(0,1)-bounded martingale for each f E L2(0,1).
Hence it converges a.e. aiid in L2(0,1) to f , since a(U,,,- B,) is the Bore1 0-
algebra of (0,l). Thus So1
(fH,)(z)dz = 0 for all n >
0 implies f = 0 a.e.,
which is coinpleteness of the system by definition. As is well-known, in L2(0,1)
this is equivalent to Parseval's equation. Thus f , g E L2(0,1) implies
8.1 Brownian Motion 461

We use this form of completeness in the following work. It is of interest t o


note that the integrated objects gn : z H:J Hn(v) dv are called "Schauder
functions," and their orthonormalized set (by the GramSchmidt process) is
known as the "Franklin functions." These various forms are profitably used
in different types of computations in harmonic analysis.
The existence result is explicitly given by the following:

Theorem 2 Let { & , n > 0) be a n 2.i.d. sequence of N(O,1) r.v.s o n a


probability space ( R ,C , P ) . [ T h e existence of this space follows at once from
the product probability theorem of Jessen ( c f . Theorem 3.4.3).]T h e n the series

converges uniformly in t for almost all w E 0, and the family {Xt, 0 < t < 1)
i s a Brownian motion process o n this probability space, which can t h e n be ex-
tended to all of Kt+.

Remark: The crucial part is the uniform convergence of the series in (4)
for all t , and a.a. (w). This is an important reason for using "random Fourier
series" t o study the sample path properties of their sums.
Proof Let us first establish the uniform convergence of (4). Using the
N ( 0 , l ) hypothesis, we have for u > 0

< ~ [ e x p ( - u 2 / 2 )/lu (integration by parts).

Hence if u2 = 3 log n , we get

Thus by the Borel-Cantelli lemma (cf. Theorem 2.1.9iii)

With this, if grn is the Schauder function, then by integration we note that
&(t) > 0, and if 2" < < <
m < 2"+', 0 t 1, then 0 $,(t) < <
2-"12/4. But
(5) implies
an = max
2"<k<2"+1
<
ItkI [3log P1]
462 8 A Glimpse of Stochastic Processes

with probability 1 for all large enough n. Consequently (4) is dominated by


the following series for a.a.(w):

< <
By (6) the series defined in (4) is uniformly convergent for all t , 0 t 1 with
< <
probability one. It remains t o verify only that {Xt, 0 t 1) is a Browiiiaii
motion process, since the continuity of Xt(w), a.a. ( w ) , is immediate from the
uniform convergence of contiiiuous summands.
It is clear that X o = 0 a.e. One uses the completeness property in the form
of (3) t o verify that Definition lii is also true. Let 0 < t l < . . . < t m < 1.
In view of the uiiiqueiiess theorem for ch.f.s (cf. Theorem 4.5.l), it suffices t o
show that the joint ch.f. of (Xt,, . . . , Xt7,,) is normal and that the (Xt,+lX t , )
are independent. For this coiisider

[by the representation (4) just established]

[by the mutual independence and N ( 0 , l ) of the &]

(where um+l = 0). (7)

< t, < t, < 1, then


Note that if we consider f = ~ p , ~ , )=, gx [ ~ , ~ , ) , O (3)
becomes
00

Using (8) in (71, we get, after a direct computation and a simplification,


8.2 Properties of Brownian Motion 463
m
= E ( e x p ( i q (XI, - XtJ1 ) ) ) (where to = 0).
j=1

This implies that Xt7 X t l p , are independent N(0, ( t j


- - tjPl)) r.v.s, and
< <
completes the proof for 0 t 1.

To get the result for 0 < t < oo,define (inductively) the process as follows.
Let Xt = xi1)for 0 < t < 1, and if X t is defined on 0 < t < n , then let
Xt = x,(!:') + X n for n < t < n + 1 , where { x ~ ( ~ < )t ,<ol } , n > 1, are
independent copies given by (4).
From Eq. (91, for 0 < t l < t 2 , we get E ( X t l X t , ) = E ( X $ ) = t l , so that
the covariance function r(.,.) of the Brownian motion process is given by

This completes the proof on R+ itself.


Remark Of the other methods of construction, one that is interesting and
intuitively simple is t o show that the Brownian motion process is the limiting
form of a suitable sequence of random walk processes. This yields another
coiinection of this theory with our earlier work. However, we do not present
it here.

8.2 Some Properties of Brownian Motion


Even though the sample functions t H Xt(w), t >
0, of a Brownian motion
are uniformly continuous for almost all w E R,their behavior is quite irreg-
ular. Almost all are non-differentiable, and do not have finite variation, on
any interval of positive length, so that one caiiiiot easily extend the classical
integration for this process as an integrator. Let us establish these properties
before discussing others.
First we note the following parametric characterization of a Gaussian pro-
cess, a family {Xt , t E T } all of whose finite-dimensional d.f.s are Gaussian.

Proposition 1 A real Gaussian process {Xt, t E T} is characterized by


a mean function p : t t p(t) = E ( X t ) and a covariance function r : (s,t) H
r(s,t ) = Cov(Xs, X t ) , i.e., a function satisfying r(s,t ) = r ( t , s ) and for each
finite set { t l , . . . , t n } c T, the n-by-n matmx ( r ( t l , t j ) ,i, j = 1 , .. . , n ) is posi-
tive (semi-) definite.

Proof Let p , r be the mean and covariance fuiictioiis (parameters) of the


given Gaussian process {Xt, t E T}. Then for each finite set ( t l , . . . , t,) from
464 8 A Glimpse of Stochastic Processes

T, the joint d.f. of (Xt,, . . . , XtrL),is an n-dimensional Gaussian d.f. so that if


( a 1 , . . . , a n )E Rn, Yn = Cr=L=l a i X t , , then Yn is N(un, a;) distributed, where
un = a i p ( t i ) , a: = Cy=l C y = l a i a j r ( t i , t j ) . Thus its ch.f. cjn is given
by

Hence writing u:, = ua:,, we can deduce that the ch.f. $, of ( X t l , . . . , X t , , ) is

Since each n-dimensional Gaussian ch.f. gn given by (1) is completely deter-


mined by the parameters ( p ( t l ) , . . . , p ( t n ) ) aiid r ( t i , t j ) , i, j = 1, . . . , n ) , that
is, by the functions p aiid r, it follows that a Gaussian process is determined
by these two functions.
On the other hand, if p : T + R and r : T x T + R are given with r as
positive definite, then by substitution of p(ti), r ( t i , t j ) in ( I ) , we get classes of
n-dimensional ch.f.s. Because of the uiiiqueiiess theorem (cf. Theorem 4.5.1)
and the classical Fubiiii theorem, we deduce at once that these ch.f.s (and
hence their d.f.s) form a compatible family. Thus by Kolmogorov's theorem (cf.
Theorem 3.4.11) we can construct a probability space ( 0 ,C , P), (with R =
RT, etc.) and a process {xt, t E T) which has these n-dimensional Gaussian
d.f.s as its finite dimensional d.f.s, so that the process is Gaussian with p and
r as its mean and covariaiice functions. Thus as far as the finite-dimensional
distributioiis are concerned, the given process aiid the coiistructed process are
indistinguishable. In this sense there is essentially a unique Gaussian process
having a given mean and covariance function. This proves more than the
asserted statement.
Since a Brownian motion is also a Gaussian process with p(t) = E ( X t ) = 0
and covariaiice r(s,t ) given by [cf. Eq. (1.10)]

r(s,t) = E(X,Xt) = min(s, t), (2)


a simple calculation of p and r shows that we have the following consequence:

Corollary 2 Let {Xt, t > 0) be a Brownian motion process. Then

(i) {Xt+, - >


X,, t 0} is a Brownian motion process for each n > 0;
(ii) {tXt-1, t > 0) and
8.2 Properties of Brownian Motion

are also Brownian motion processes, (i.e., all have the same finite dimensional
d.f.s).
We can now present a mathematical description of the irregular behavior
of the process, observed by R. Brown under a microscope:

Theorem 3 (Wiener) The sample functions t H Xt(w) of the Brownian


motion process {Xt, t > 0) on a complete probability space ( R , C, P ) , which
are uniformly continuous for a.a.(w) by Theorem 1.2, are nowhere differen-
tiable for a.a. w E R . I n particular, they are not rectifiable in any interval of
positive length [for a. a.(w)].

Proof In view of the preceding corollary, it suffices t o prove this result for
< <
{Xt, O t 1). We present a short argument here followiiig Dvoretzky et. al.
(1961). If Xt is differentiable at a point s E [0, 11, then for t , s close enough,
we must have Xt(w) - X,(w) < C t - sl. One may translate this into the
+
following statement: Let j = [ns] 1, the next integer t o the integral part [ns]
for any natural number n > 1. Then the above inequality gives

for an integer l > 1 and n large enough. Let A(j, l,n ) be the w-set defined
by (3) and let A be the set of w for which X(.)(w) is differentiable at some
point of [0,1]. It is not obvious that A E C . However, we show that it is
contained in a P-null set, so that (by completeness of the space ( R , C, P)),C
contains all subsets of P-null sets which then have zero probabilities. Note
that A(j, !, n ) E C.
It is seen that, by definition,

Consider

n+l
< liminf
n
X[P[1 XlInl < !/n] I3
j=1
8 A Glimpse of Stochastic Processes

(by the i.i.d. property of "equal" increments and Corollary 2)

= lim inf ( n
n
+ 1)(P[ XI/, < eln])'

[since XI/, has the N(0, l l n ) d.f. ]

From (4) and (5) it follows that A is contained in a countable union of P-null
sets and hence has probability zero.
If the path {Xt, (w),0 t < <
1) is rectifiable, then it is of bounded vari-
ation. But from real variable theory we know that a fuiictioii of bounded
variation has a derivative at almost all points s E [ O , 1 ] (Lebesgue measure),
and by the above, the set of w-points with this property is a P-null set. Thus
the last assertion follows, and the proof is complete.

Actually the above authors have shown, in the paper cited, a stronger
result than this theorem, namely: the process {Xt a t , 0 t+ < <
I ) , with a
"linear trend" a t , has almost no points of increase or decrease for any a E R.
Many intricate properties are found for this process because of the analytical
work available for the d.f.s N ( 0 , t ) of X t , as in (5).
By the preceding theorem, almost no Brownian path t H Xt (w),0 t 1, < <
is of bounded variation. However, P. Lkvy has given the followiiig interesting
property on its "second variation" :

Proposition 4 Let {Xt, 0 < t < 1) be a Brownian motion. Then

lim
n-00
x[xic2-"
k=l
-
2
X(k-l)2-71] =1 a.e.

Proof Let Ykn = Xk2-?> X ( k p l ) 2 - n ,so that the Ykn are N(0,2-")
-

distributed and independent. Consequently, {Y:~, 1 k < <


2") are also inde-
~ 2 ~ )
pendent, E ( C : ~ ~ = 1, and

Hence by the ~ e b y s e vinequality, for each E > 0, if


8.3 Iterated Logarithm for Brownian Motion

then
P(A,,,) <~-~2-~+'.
Thus 00 00

and by the Borel-Cantelli lemma (cf. Theorem 2.1.9i), P(A,,,, i.0.) = 0. Hence
>
for the complementary events, P(Ak,, , n no)= 1, so that

Since E > 0 is arbitrary, (7) implies (6), as asserted.


A few other properties of the process are included in the problems section.

8.3 Law of the Iterated Logarithm for Brownian Motion


We have proved the Kolmogorov LIL as Theorem 5.5.1 and the details there
are considerably involved. Here we present a similar LIL for Brownian mo-
tion, due originally to A. Khintchine in 1933, and its proof is easier than the
earlier result. The work is further shortened by using a martingale maximal
inequality. We follow McKean (1969) for its proof.

Theorem 1 Let {Xtl t > 0) be a Brownian m o t i o n process. Then

and also
t\o
xt
(2t log log l/t)1/2
=1
I =1,

t\o
xt
(2t log log l / t ) l / 2 - 1
- 1 =I. (2)

Proof It is sufficient to prove ( I ) , since by Corollary 2.2 -Xt is also


a Brownian motion process and liin sup(-Xt) = lim inf Xt. (There are no
-

measurability problems.) As in Theorem 5.5.1, we establish ( I ) in two parts:


> <
for 1 and for 1. The key observation here is to note that = exp{aXt -
468 8 A Glimpse of Stochastic Processes

a 2 t / 2 ) is a martingale for .Ft = =(X,, 0 < s < t ) , t > 0, for each a! E R.


Indeed, Yt is integrable, since

with the formula for ch.f.s. Also, Yt is .Ft-adapted, and for 0 < s < t we have

Hence {x,.Ft,t > 0) is a positive martingale.


Let b > 0 and consider the maximal inequality (cf. Theorem 3.5.61)

[
P max x , - - s >b ] = P [,st
maxy, t e a ' ] (atR)

Note that since X,, and hence Y,, is (uniformly) continuous on [0,t ] , the
maximum exists, aiid there are again no measurability problems. (In fact we
can consider the result when s ranges over all the rationals and by continuity
the above assertion follows.) In order to apply the Borel-Cantelli lemma, one
chooses a , b of (5) suitably.
Let h(t) = (2t log log l/t)1/2,0 < 6 < 1,E > 0 and t > 0 be small. Set
+
t, = &",a, = (1 ~)6-"h(S"), aiid b, = ih(6"). Then

By (5) and (6) we get

Thus the (first) Borel-Cantelli lemma implies

max(X,
sst ,,
-
1
-a,s)
2
< b,, I
all but finitely many n = 1. (7)

Hence for a.a.(w), we can find no(w) such that n > no(w) aiid tnPl < t 5 t,
implies

Xt (w) 5 max X, (w) < b,


s<t,,
+ antn/2 = -21 h(hn)[(l+ + I]
E)
8.3 Iterated Logarithm for Brownian Motion

Thus
lim sup(Xt/h(t)) <1 a.e.
~'4
671

For the opposite inequality consider the independent events

where the 0 < 6 < 1 and h are as before. Let us calculate

To obtain a lower estimate, consider

-
ae-av2
- a2 lm
epu2I2 d u (by iiitegration by parts).

Hence

Substituting this into ( l o ) , we get on siinplification

For large enough n , the right side of (11) is at least as great as

and this is the general term of a divergent series. Since the A, of (9) are
independent events, we get by the second Borel-Cantelli lemma (cf. Theorem
2.1.9iii), because P(A,) = cm,that P(A,, i.o.)=l. Hence for infinitely
-

many n ,
>
X 6 , ~ (1 &)h(hn)
- +
xs,,+l, a.e.. (12)
470 8 A Glimpse of Stochastic Processes

On the other hand, by (8), X6rb+1 < h(Sn+') from some n onward. But X p is
N ( 0 , hn), so that (by symmetry) XbrL+l> -h(Sn+') also. Thus (12) becomes

Xsrb > (1 - &)h(hn) - h(hn+l)


= (1- &)h(hn) - &h(hn) - & h ( ~ ~ ) - ' 6log(1
~ + log 6-'/ log 6 Y )
+...
> (1 & 3&) h(hn) (for large enough n , since o < 6 < I), a.e..
- -

Consequently
liin sup X t / h ( t ) > (1 - 4 a.e. (13)
t\o
Since 0 < S < 1 is arbitrary, (13) implies

Thus (8) and (14) imply ( I ) , and the proof is complete.

In view of the "scale invariance" (cf. Corollary 2.2ii), one has the following
consequence of (1) and (2):

Corollary 2 For a Brownian motion {Xt, t > 0) on (R, C , P),

and
log log t)'I2) = 1 = 1
1 (15)

( x t / 2 t log log t)'I2) = -1

Many other results can be obtained by such explicit calculations using


I = 1. (16)

the exponential form of the d.f.s. For a good account, we refer the reader
t o Lkvy's (1948) classic. For further developments, McKean (1969) and Hida
(1980) may be consulted. The analysis leads t o stochastic integration with
numerous applications. (See, e.g., Revuz and Yor (1999) 3rd. ed. Springer) for
a recent treatment.)

8.4 Gaussian and General Additive Processes


Since a Brownian motion process is both a Gaussian process and one that
has independent increments, the preceding study leads us in (at least) two
directions. One is t o consider analogous sample function analysis for Gaus-
sian processes whose meail functions, for instance, are zero but whose covari-
ance functions r are more general than (perhaps not factorizable as) the one
8.4 Gaussian and General Additive Processes 471

given by r ( s , t ) = min(s, t)-the Brownian motion covariance. The second


possibility is t o consider processes having independent (and perhaps strictly
stationary) increments which need not be Gaussian distributed. The latter
processes are sometimes called additive processes. We discuss these two dis-
tinct extensioiis briefly in this section t o indicate a view of the progression of
the subject.
Since by Proposition 1.2, a (real) Gaussian process is completely deter-
mined by its mean and covariance functions, many properties of the process
can be studied from the behavior of these two parameters-mean and covari-
ance. But r : (s, t ) H E ( X s X t ) is a symmetric positive definite function, and
as such it qualifies t o be a kernel of the classical Hilbert-Schmidt (and Fred-
holm) theory of integral equations. Thus it is t o be expected that this classical
theory plays a role in the present analysis. Of basic interest is the following
well-known result of T . Mercer. Its proof is not iiicluded here.

Theorem l(A4ercer) Let r : [a,b] x [a,b] + R be a continuous covariance


function. T h e n the equation (-cc < a < b < cm)

admits a n infinite number of values Xi > 0 and a corresponding system of


continuous solutions {u,, n > 1) which form a complete orthonormal sequence
in the Hilbert space of (equivalence class o f ) functions o n the Lebesgue interval
(a, b), denoted simply as L2(a,b), such that

the series converging absolutely and uniformly i n the square [a,b] x [a,b].

This result enables one t o consider Gaussian processes with continuous


covariances as follows. Let {<, n >1) be a sequence of independent N ( 0 , l )
r.v.s on a (product) probability space ( 0 ,C, P) and let r be a continuous
covariance function on the square [a, b] x [a,b]. If the un and A, are as in the
Mercer theorem, let
OC

Because of (2), Theorem 2.2.6 implies that the above series converge a.e. and
in L2(P)-mean. Hence E ( X t ) = 0, and by independelice of the En,

using (2) again. Then by Proposition 2.1, since each Xt is clearly Gaussian,
C, P) of the tn with
{ X t , t E [a, b]) is a Gaussian process on the space (R,
472 8 A Glimpse of Stochastic Processes

mean zero and covariance r. Such a representation as (2) and (3) is useful for
many computations.
To illustrate the effectiveness of the representation (31, and the need for
special techniques, let us calculate the distribution of a general quadratic
functional of a Gaussian process {Xt, 0 < <
t 1) with mean zero aiid a
continuous covariance r . The quadratic functional is

where q E L2(0,1) aiid n E R.Our assuinptioiis imply that X : (t, w) H Xt(w)


is jointly measurable on [ O , l ] x R relative to the dt dP-measure. Thus (5) is
a well-defined r.v. by Fubini's theorem. To find the d.f. of Q ( X ) , an obvious
method is to use ch.f.s. Thus one calculates the latter function, and if it
can be inverted, then the d.f. is obtained. The ch.f. is also useful to get at
least an approximation to its d.f., using certain series expansions aiid related
techniques. The analysis shows how several abstract results are needed here.
We first simplify Q ( X ) by means of the expansion (3). Thus

+a2 L1 q2(t) dt

+a2 1 1
q2(t)dt (by the orthonormality of the ui)

>
since {u,, n 1) is complete in L2(0,1) and q E L2(0, 11, SO that by Parseval's
equation we may cancel the last two terms. Consider the moment-generating
function (m.g.f.) which exists and is more convenient here than the ch.f.,
8.4 Gaussian and General Additive Processes

(by independence and bounded convergence)

[since tj is N ( 0 , l ) so that ([j - has a


"noncentral chi-square" d.f.,]

where D,(X) = n&(l - A X j l ) , the Fredholm determinant of r, is an ana-

lytic function of X in a small disc around the origin. The second term is still
complicated. However, in the classical theory, one sets rl = r, aiid for k > 1,
so
lets r k ( s ,t) =
1
r ( s , x)rk-l(x, t ) dx, to use induction. Then set

which converges absolutely and uniformly if IXI max,,t Ir(s, t ) < 1 on the unit
square. R is called the reciprocal kernel of r, which clearly satisfies

r(s, t) + R(s, t ; A) = X
Now using (21, it can be verified that
I' r(s,x)R(x, t ; A) dx.

the series converging absolutely aiid uniformly if I A is as before. Thus (6)


becomes

If a = 0 in (5), then the m.g.f. of J; X," dt is [ D , ( - T ~ ) ] - ~ /We


~ . thus have
established the following result. (Some others are given as problems.)

Proposition 2 If { X t , 0 t < <


1) is a Gaussian process with mean zero
and a continuous covariance function r , then the m.g.f. of the distribution of
474 8 A Glimpse of Stochastic Processes

the quadratic functional Q ( X ) defined b y (5) is given b y the expression (9).

Regarding the second direction, one considers processes with independent


iiicrements which generalize the second major property of the Brownian mo-
tion. To gain an insight into this class, let {Xt, t E [O, I ] ) be a process with
independent increments. If 0 < s < t < 1, we consider the ch.f. q5s,t of Xt-X,.
For 0 < t l < t 2 < t3 < 1, one has, by the independence of Xt, - Xt2 and
Xt2 - Xt, , the following:

Suppose that the process is stochastically continuous; i.e., for each E > 0,
lim P [ X t - X ,
t i s
> E] = 0, s E (0,l)

Then it is not hard t o verify (cf. Problem 4) that limt,, q5,,t(u) = 1 uniformly
<
in u and s , t in compact intervals. Hence if 0 s < to < t l < . . . < t , < t 1, <
+
with tk = s (k(t s ) / n ) , we get
-

aiid the factors can be made close t o 1 uniformly if n is large enough. Thus
q5s,t is infinitely divisible (in the generalized sense, aiid hence in the ordinary
sense; cf. Problem 17 in Chapter 5 ) . Consequently, by the Lkvy-Khintchine
representation (cf. Theorem 5.2.5) with s = 0 < t < 1, u E R,

cjo,t(u)= e r p {iytu + (PV - 1- -iuu


1+v2
) -
+ v2

v2
d ~ , ( ~ ) ) (12)

for a unique pair {yt, Gt). The dependence of yt and Gt on t is contiiiuous


because of the assumed (stochastic) continuity of the process. Considering the
subinterval [s,t] c [ O , l ] , and using (12), we get a new pair y,,t and G,,t in (12)
for q5,,t. However, using (10) applied t o 0 < s < t < 1, so that &,t = &,,. q5,,t
we get (note that Log q5 exists by Proposition 4.2.9)

Substituting (12) in (13), with the uniqueness of the formula, one can deduce
that y,,t = yt - y, and G,,t = Gt - G,. Thus

. .
This formula can be used t o analyze the processes with independent incre-
ments which are stochastically continuous. By choosing different such pairs,
8.4 Gaussian and General Additive Processes 475

and hence different classes of these processes, one can study various proper-
ties. Note (by Example 3.3.3) that a process with independent increments is
Markovian, and if it has one moment finite and then means zero (in particular,
a Brownian motion), then it is also a martingale. These relations indicate the
directions in which the above types of processes can be studied. (See Problems
5 and 6.) We now illustrate with some simple but important examples.
An interesting and concrete example of an additive process is obtained by
>
considering a nonnegative integer-valued process I N t , t 0) with independent
increments. We now detail this view t o explain the growth in new directions.
For instance, Nt can represent the total number of events that have oc-
curred up t o time t , so that such a process is often termed a counting process.
Now, unlike Brownian motion where it is assumed that increments of the
process are Gaussian, we shall assume only that the independent increments
are strictly stationary, with no distribution implied.
For the sake of simplicity, let No = 0, aiid assume that the probability of
an event occurring during an interval of length t depends upon t. One simple
way t o develop this dependence is t o assume, as originally done by Poisson
(c. 1800) that
+
PIN*, = 11 = x n t o ( A ~ ) , (15)
where X is a nonnegative constant and A t is the length of the interval [0, At].
For a small value of At, equation (15) implies

aiid that events in iioiioverlappiiig time intervals are independent.


[The assumption in (15) is the simplest condition one can place upon the
process Nt. Other assumptions on this probability will lead t o a development
of other processes. Some of these are considered later as problems.]
The probabilities given by both equatioiis (15) and (16) imply that during
a small interval of time, the process Nt, t >
O can have at most one event t o
occur.
The goal now is t o determine the probability distribution for Nt. Letting

be the conditional probability of n events at time t given that there were none
initially. (Note that, in this illustration, the conditioning events always have
positive probability aiid so the difficulties considered in Section 3.2 do not
+
arise.) It follows that at time t At,

(by independent increments assumption)


= Po(t)(1 - Ant + o(At)), (by a consequence of (15)).
476 8 A Glimpse of Stochastic Processes

Thus
po(t + At) - p0(t) = (-Ant +O ( A ~ ) ) P ~ ( ~ )
which is

so upon letting At + 0, one has PA(t) = -XPo(t) since the last term tends
to zero by definition of o ( A t ) . This simple differential equation, with the
assumption No = 0, so that

gives
Po(t)= ePxt
Similarly, for n >1
P,(t + A t ) = P [ N , + L ~= n N o = 01
= P[Nt = n , Nt+nt Nt = ONo = 01
-

+P[Nt = n 1, Nt+nt Nt = I N o = 0]
- -

+P[Nt+nt = n , Nt+at - Nt > 2No = 01


= P[Nt = n N o = O]PINt+At- Nt = ONo = 01
+P[Nt = n 1 N o = O ] P [ N ~ + LNt
- ~ = llNo = 01
-

+P[Nt+nt = nlNo = O]PINt+at - Nt > 2 N o = 01


where the last expression follows from the assumption of independent incre-
ments of Nt. Now, by the (strict) stationarity of Nt's increments, as well as
the conditions (15) aiid (16),it follows that

Rearrangement of this expression aiid letting A t + O gives the system of


differential equations

PA(t) = X P , ( t ) +XPnl(t) for n> 1. (18)

The assumption that No = O gives PIN. = n ] = O for n > 1 so that using


(17) aiid recursively solving (18) it follows that

For each t > 0, P,(t) is the Poisson probability distribution with parameter
X > 0. Thus the iioiiiiegative integer-valued process Nt that has independent
and strictly stationary increments which satisfies (15),is called the Poisson
process with rate parameter X > 0.
8.4 Gaussian and General Additive Processes 477

The Poisson process is a member of a wide class of integer-valued con-


tinuous time stochastic processes collectively known as birth-death processes.
An integer-valued process Xt with independent increments is a birth-death
process if the conditional probabilities given as:

P [ X t + ~=
t n + l X t = n] = A,& + o(At) for n > 0,
P [ X t + ~=
t n lXt = n] = p,At +o(At) for n > 1,
and

so that during a time At, the process can only increase (a "birth") by one unit
or decrease (a "death") by one unit. Birth-death processes have a wide variety
of applications in the biological and physical sciences. A few examples of these
processes are included in the exercises by considering various generalizations
of equations (15) and (16).
An important application of the Poisson process occurs in queueing theory
where the process Nt represents the number of arrivals t o the queue and
equation (15) gives the probability of an arrival t o the queue during a small
time interval. This is a specific example of the queueing model considered in
Section 2.4. We now reconsider the process Nt from a slightly more advanced
point of view.
Thus alternately the Nt process can be obtained as follows: Let X be an
exponentially distributed random variable so that P [ X < x] = 1 - e-'", x >
0, X > 0. If X I , . . . , X, are independent with the same distribution as X , let
>
S n = C i = 1 X k , be the partial sum and for t 0, set Nt = sup{n > <
1 : Sn t}
so that Nt is the last time before the sequence {S,, n >
1) crosses the level
t > 0, where as usual sup(@)= 0. Then Nt is an integer valued random
variable, and its distribution is easily obtained. In fact, since Sn has a gamma
distribution (c.f. Section 4.2) whose density is given by

we have for n = 0 , 1 , 2 , . . . (set So= 0), since [Nt > n] = [S, < t ] ,

=/I >t] fs,,(x?fx,,+,(Y) dx dy


[S,,It,X,,+1+S,,
since S,, Xn+1 are independent
478 8 A Glimpse of Stochastic Processes

Alternatively (following Feller (1966), p . l l ) ,

which is obtained by integrating the second term by parts aiid cancelling the
resulting integrals. Thus {Nt, t >0) is a Poisson process. Moreover, it has
the properties for w E 0 : (a) No(w) = 0, limtioo Nt(w) = oo, (b) integer
valued, nondecreasing, right continuous (i.e. limSLtN, (w) = Nt (w)), and at
discontinuity points, say to, Nt, (w) - N t o (w) = 1. We may ask whether these
properties characterize the Poisson process in the sense that such a process
has independent stationary increments as well as the distribution given by
equation (20). For a positive answer, we need t o strengthen the hypothesis
which can be done in different, but equivalent ways. This is presented as fol-
lows.

Theorem 3 Let {Nt, t > 0) be a nonnegative integer valued nondecreasing


right continuous process with jumps of size I and support Z+ = { 0 , 1 , 2 , . . .).
T h e n the following are equivalent conditions:
1. the process is given by Nt = max{n : S, I t ) , where S, = Xk,
with the X I , as i.i.d and exponentially distributed, i.e.,, P [ X > z] = e-'",z >
0, X > 0,
2. the process has independent stationary increments, each of which i s
Poisson distributed, so that (20) holds for 0 < s < t in the f o r m

3. the process has independent and stationary increments,


4. the process has n o fixed discontinuities, and satisfies the Poisson (con-
ditional) postulates: for each 0 < tl < . . . < t k ; and nk E Z+ one has for a
X>Oash\O
(i) P[Nth+h- Ntr = I N t , = n , , ~= 0,1, . . . k] = X h + o ( h )
>
(ii) P [ N t , + h - N t h 21Nt, = n , , j = 0,1, . . .k] = o ( h ) .

Here we sketch a proof of the difficult part 1. e 2. following Billingsley


(1995, p.300). It is of interest t o observe that, as A. Pri.kop& proved, the
stationarity of (independent) increments of the Nt-process is not implied by
an integer valued independent increment process without fixed discontinuities
although each increment has a Poisson distribution. The method is t o show
1. e 2. e 3. aiid 2. e 4. The complete proof uses PrkkopA's theorem also.
8.4 Gaussian and General Additive Processes 479

Proof of 1. + 2. From definition of Nt, we note that the sets [Nt n] = >
[S, <t], and as seen in (20) that [Nt > <
n] = [Sn t < S,+l]. For given
t> 0 and y >0, we have, on using the independence of S, and Xn+l with
exponential distribution for X, and with = Sn X,+l, +

-
= e '"[s, It,X,+l > t - S,]. (21)
On the other hand the properties of the independent X, of hypothesis 1.
imply for yj 0,>

(since S,+l > t + y on [S, < t < S,+l] implies X,+1 > y1, and use (21))

To simplify the left side, we define new random variables depending on the
fixed (but arbitrary) t as follows. Let xit)= SNt+1 t , x:) = XNt+a, = xit)
X N L + 3 , ..., and observe that for 0 < s < t , [Nt+, Nt >
m] = [SN,+, I
-

+ x(Y:" <
t s] = [ ~ y = ~ s]. This implies

which brings in the increments of the Nt-process with the "new" partial sums
s?) in t e r m of x
)! random variables, and the increment process {Nt+, -

>
Nt, s 0) is similar to the I N t , t >0) process for the X j random variables.
Thus we have [Nt+, N t = m] = [s:) s < s:)+~]. With this new definition,
if A = x:=~ [yjl oo) C &tkis a rectangle, then (22) becomes

But such rectangles as A are generators of the Borel a-algebra of IW" so that
by a standard result (cf. Proposition 1.2.8), (23) holds for all Borel sets A.
480 8 A Glimpse of Stochastic Processes

The next step is to express the joint event [Ns, = m,, z = 1,. . . ,l] as one
+
of the events in (23) using liLr 1 variables xJt)'s. Thus if we consider A as
+ +
the rectangle x t = l [ z l . . . z,, < + +
s, < z1 . . . zm,+l),then we find

and this implies (23) because

This is the key step in using iiiductioii with l = 1 , 2 , . . . , and 0 = to < t l <
. . . < te, since No = 0, so that the following is trivial for != 1, aiid then
assume for l = k, to complete the induction:

and obtain the result for l = k + 1. The equations (24) and (20) imply our
assertion and gives 2.

Proof of 2. + 1. Observe that [No = 01 = [XI > t] aiid so the distribution


>
of No in 2. gives PIX1 t] = ecxt. To get the distribution of X I , X2 (S1=
X I , X 2 = Sz- S1),let 0 = sl < t l < s a < ta and observe that

= P[N,, = 0, Ntl - N,, = 1,N,, - Ntl = 0, aiid Nt, - N,, = 11

(by independence of increments of the Nt-process)

(using the Poisson distribution of iiicrements in the hypothesis 2.)

From this we get for any Bore1 set A c [(yl,yz) : 0 < y1 < yz] c R2 that

The result can now be extended to the sector (0 < yl < . . . < yk) in IW'"
using the mapping z, = y, y,-l, computing the Jacobian, as in the proof
-

of Theorem 3.3.9. One deduces that the X, are independent and each has an
8.4 Gaussian and General Additive Processes 48 1

exponential distribution with parameter X >


0. This gives 1. and 1. + 2. is
established.
We omit the proof of the other equivalences, and refer the reader t o
Billingsley (1995). We shall however use all parts of the above theorem in
the followiiig discussion.
As Feller remarked in his book on Probability Theory (Vol. I ) , "the three
distributions, Bernoulli, Poisson and Normal and their ramifications in the
subject as a whole are astounding." In fact the Poisson process has the same
properties as the Brownian motion, except that it is integer valued aiid is
coiitiiiuous in probability with moving discontiiiuities of unit jumps (with at
most countably many in number). We resume the general theme, strongly
motivated by the Poisson case treated above.
Now formula (12) takes a particular form when Gt has no jump (so a2 = 0)
at the origin (i.e., the Gaussian component is absent) so that the appropriate
formula is that given by the Lkvy measures (c.f. Section 5.2, equation (20)).
In this case it takes the following simple form:

where N({O}) = 0, y is a coilstant aiid N ( . ) is noiidecreasing with

[One starts with d N ( x ) = 9 dG(x) in (121.1 To proceed further, let us


rewrite (20) as:

where t = 1 and 6,(.) is the Dirac point measure ( aiid TO = So, supp (TX) =
{ 0 , 1 , 2 , . . .} = z+). Since TX(.)is clearly a measure on the power set ?(Z+),
>
if XI, X2 0 one has the coiivolution

and its ch.f. gives, with -irx(t) = JZ+ eZtx7rx(dx) = e- X En=, eztnx -
n! -
eX(e"-l)

>
so that {TX, X 0) is a semi-group of probability measures, under convolution.
But (25) motivates the following immediate extension.
Let (S,B, u) be a finite measure space and 0 < c = u(S) < oo. If F ( . ) =
$ u ( . ) , then (S,B, fi) is a probability space (usually) different from (R, C , P).
Let X j : S + R be independent identically distributed random variables
482 8 A Glimpse of Stochastic Processes

relative to 6. Then GxJ : R + R+ is a (simple) random measure on (R, R),


the Borelian line, in the sense that for each s E S, SX7(S)(.) is the Dirac
point measure. We may use the adjunction procedure of Chapter 2, discussed
following Corollary 2.1.8 and replace R by f? = R x S, 5 = C @ B and
P = P @ F. We leave this explicit formulation, and assume that our basic
space is rich enough to carry all these variables. If N is a Poisson random
variable with intensity c(= u(s)) so that P [ N = n] = ep"$, consider the
measure .ire(.) in (25) as a generalized (or compound) variable as:

where N is the Poisson random variable with u(B) as intensity noted above.
Here N and X j are independent. As a conlposition of N and X j , all
at most countable, ?(.) is a random variable. In fact [?(B) = n] =
~ , ~ ~ ~ [Gx,C (B)
j m =, ~n] n [ N = n], for each integer n 2 1 so that ?(B)
is measurable for C , and thus is a random element for all B E B. To find
the distribution of %(B) we proceed through its characteristic function, and
establish the following statement
Lemma 4: For each B E B, ?(B) is Poisson distributed with intensity
C . F(B) = v(B), implying that ?(.) is pointwise a.e. a-additive. Moreover, the
result extends even if the intensity measure u(.) is a-finite.

Proof In establishing this result, through ch.f.s, we employ the fact that
E ( Y ) = E ( E ( Y 2 ) ) for any integrable (or nonnegative) random variable Y
and any r.v. Z (cf. Proposition 3.1.2). In view of Corollary 4.2.2 (uniqueness),
this is seen t o establishe the result. First we assume that 0 < u(S) < cm,as
in the statement so that F(S) = 1.
Thus denoting again by k ~ ( t = ) E(eit'"), and using the hypothesis that
X j are i.i.d. on ( S ,B, fi) which are independent of N , one has

N
= E ( E ( ~ ' ~ E J (B)
= ~N ) ) , by the above identity,
- x ~ ( e " ~ : = 1 ~ i' rB ) ) ~ =( n~) , since N is discrete

so that the difficulties with multiple solutions noted


in Section 3.2 do not arise

by the independence of the X j ,


8.4 Gaussian and General Additive Processes

Comparing this with (25) and the following discussion, we conclude that i i ~ ( . )
is Poisson distributed with intensity u(B).
Now, if B = Ur=lEk, B k E B, 0 < v(Bk) < GO, B k disjoint, then v(B) =
C k = , v(Bk) < GO, implying that v(B,) + 0 as n + ce so that the ch.f.
00

of .irB,, (.) tends to unity and hence iiBrL+ 0 in probability. Hence .ir( ) is a-
additive in P-measure. It is also seen that .irB,, are independent on disjoint
>
sets (and 0 a.e.), it follows that iiB = Cr='=liiB7,,holds pointwise a.e.
Finally, let v(.) be a-finite, and so writing S = U ~ ? = l S nv(S,)
, < GO,
S, disjoint, let .I?, = gS,, which are independent Poisson measures on
(S,, B(S,), v(S, n .)), n > 1, by the preceding paragraph. If % = EzlgS,,,
a sum of independent Poisson random measures, it is Poisson with intensity
0 < u(Sn) < cc on S,, and this depends only on u. Thus the results holds for
a-finite v(.) also, coinpletiiig the proof.

Hereafter we write n(.) for %(.) to simplify notation. Also the relation
between the intensity parameters of N and T ( . ) should be noted.
This result implies several generalizations (the versatility of the Poisson
measure!) of the subject, originally investigated by P. Lkvy in 1937. We in-
dicate a few consequences here. The above property of n(.) motivates the
followiiig generalization.

Definition 5 Let L O ( P )be the space of all real (or complex) random
variables on a probability space ( 0 ,C , P) and (S,B) be a measurable space.
A mapping p : B + L O ( P )is called a r a n d o m m e a s u r e , if the followiiig
(abstraction of Poisson measure given in Lemma 4) holds:

(i) A, E B, n = 1 , 2 , . . . , disjoint, implies {p(A,), n >


1) is a mutually
independent family of infinitely divisible random variables,
(ii) for An as above, p ( U r = l An) = 00 p(An), the series converges in
P-measure.

Since a Poisson random variable is infinitely divisible, and, by Proposi-


tion 5.2.2, a general infinitely divisible nonconstant random variable has an
unbounded range, the above definition includes all these cases. An important
subclass of the illfinitely divisible d.f.'s is the stable family, also analyzed in
Section 5.3 (cf. Theorem 5.3.16 and the structural analysis that follows there).
In the present context, these are called stable random measures, and they in-
clude the Poisson case. Recall that a stable random variable X : R + L O ( P )
has its characteristic function y ( t ) = ~ ( e = ~ eitXdp ~ ~ )to be given (by
the Lkvy formula) as:

y ( t ) = exp{iyt - c l t a ( l - ipsgn t . m(t,a ) ) ) , (27)


484 8 A Glimpse of Stochastic Processes

where y E R,P 5 1,c > 0,O < a 5 2 and


tan y, ifa!#1
m(t,a) =
-:log t i , if a! = 1.

Here a! is the characteristic exponent of y (or X), and a > 2 implies c = 0,


to signify that X is a constant. The ch.f. p of a stable raiidoin measure
p : B + L O ( P ) ,(27), takes the followiiig form:

for all A E B, v(A) < oo where v : B + IW+ is a a-finite measure. By the


two conditioiis in the above definition, one finds that y(.) and c(.) are 0-
additive, and for /I(.) if A, E Bo(= { B E B : v(B) < m)), disjoint, such that
Uzl Ai E Bo, (Bo being a 6-ring)
<
c(Ai)P(Ai) = c ( U z l A i ) P ( U g l Ai),
(and, of course, P(Ao)l I). We leave this to the reader to verify. It is not
necessary that v(.) aiid c(.),Pi.) have any relation except that p(A) is defined
as a real random variable for A E Bo. The fuiiction $(., .) is the exponent in
(29). It is often called the characteristic exponent which is uniquely determined
by the parameters (y, c, a, and p) and conversely determines them to make
(29) the Lkvy formula.
Observe that the Poisson raiidoin measure n : B x fl + &!IW+, is a function
of a set aiid a point, so that n(A, w)(= n(A)(w)) is a noiiiiegative number
which is a-additive in the first variable and a measurable (point) function in
the second. It is sometimes termed a kernel. Its important use is motivated by
the following considerations. In the classical literature (e.g., in Trigonometric
Series of Zygmuiid (1959), Vol. I, p. 96), the Poisson kernel is utilized to define
a Poisson integral which is used to study the continuity, differentiation and
related properties of functions representable as Poisson integrals. Analogous
study should (and could) be given for our case here. To make this point
explicit, we recall the classical case briefly and then present the corresponding
random integrals, indicating a substantial new developinelit of great value,
opening up for investigation. Thus the remainder of this Section is a survey
of the evolving and interesting work for researchers given without detailed
proofs.
For a Lebesgue integrable f : [-T, T] + R, using the orthonormal system
?j,cos nx, sinnx, n = 1 , 2 , .. . , consider the Fourier coefficients a k , bk given by

aiid for 0 < r < 1, set


8.4 Gaussian and General Additive Processes

Then the Poisson kernel P ( . , .) is given by

and f,(.) is representable as the convolution:

The classical results assert that f,(x) + f (z), for all continuous periodic
functions f , uniformly as r + 1. Thus, T is a continuous linear mapping on
L1(-n, T ] .The study leads to a profound analysis in harmonic function theory
and elsewhere, (cf. e.g., Zygmund (1959), p. 96).
Replacing P ( r , z ) dz by ~ ( wds)
, or more inclusively p(ds)(w) of Definition
5 above one could consider the corresponding analysis for random functions
or process (or sequences) that admit integral representation, modeling that
of (30) and then study the sample path behavior of them. Here the Lebesgue
interval [-n, T ] is replaced by (S,B, v) and w (in lieu of r ) varies in ( R , C , P ) .
Such a general study has been undertaken by P. Lkvy when p is a stable ran-
dom measure, (cf. (28), (29)). The resulting class of processes is now called
Lkvy processes. Almost all the classical results have important nontrivial ex-
tensions and the ensuing theory has an enormous growth potential in many
directions. We include here a glimpse of this aspect.
The basic step in the analysis is to define an integral of a scaler (non-
random) function relative to a stable random measure p : B + L O ( P ) .In
the case of a Poisson random measure, the intensity measure v : B + E+
(but a-finite) is a natural one defining the triple (S,B, v). In the general
case (of a stable random measure) (28) or (29) we have y(.),c(.) and P(.)
as set functions, with a-additivity properties but are not generally related
to u of the triple. So the first simplification made is to assume that y(.)
and c(.) are related (or proportional) to v and /3 is a constant. Thus, let
>
?(A) = au(A), (a E R)c(A) = cu(A), (c 0), and ? ' ! , I <
I is a constant, so
that the characteristic exponent $(., .) of (29) becomes for a E R, 0 < a 2, <
$(A, t) = iav(A)t - cv(A)tla{l - iPsgn t . w(t, a)),A E Bo,t E R. (31)
It is now necessary (and nontrivial) to show that exp{-+(A, .)) is a char-
acteristic function. This is true and then one can establish the existence of
an a-stable random measure into L O ( P )on a probability space (R, C , P ) ,
486 8 A Glimpse of Stochastic Processes

using a form of Theorem 3.4.10. This will show that the random measure
μ : B₀ → L⁰(P) is "controlled" by ν in the sense that μ(A) = 0 a.e. [P] holds
whenever ν(A) = 0, and μ is governed by the quadruple (a, c, β, ν). With this,
a stochastic integral corresponding to the classical one defined by (29) can be
introduced. It will be specialized to show a close relation to strictly stationary
processes, represented as integrals, which also connects with the measure preserving
mappings of the last chapter at the same time.

Thus if f_n = Σ_{i=1}^{k} a_i χ_{A_i}, A_i ∈ B₀ disjoint, so that f_n is a simple function
and f_n ∈ L⁰(S, B, ν), define as usual

\[ \int_S f_n\, d\mu = \sum_{i=1}^{k} a_i\, \mu(A_i), \tag{32} \]

and if f_n → f pointwise and {∫_S f_n dμ, n ≥ 1} ⊂ L⁰(P) is Cauchy in probability
(or equivalently in the metric discussed in Exercise 3 of Chapter 1),
then we set

\[ \int_S f\, d\mu = \lim_{n\to\infty} \int_S f_n\, d\mu \quad (\text{limit in probability}), \tag{33} \]

since a Cauchy sequence has a unique limit in a metric space. It may now
be shown that the limit does not depend on the sequence {f_n, n ≥ 1}. The
method is standard but not trivial (cf. Dunford-Schwartz (1958), Sec. IV.10),
and the uniqueness proof for (33) depends on the availability of a controlling
measure ν. Using the existence of such a ν, it is possible to consider two
measures μ₁, μ₂ and obtain a Lebesgue type decomposition as well as the
Radon-Nikodým theory for them. This analysis has deep interest in applications.
(See Section 5.4 of Rao (2000) for an aspect of this work where the
Lévy-Itô representation and related integral formulas are given.)
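As a purely numerical illustration of (32) and (33) (a minimal sketch, not part of the text: the coefficients a_i, the control-measure values ν(A_i) and the Chambers-Mallows-Stuck generator below are all our choices), one may simulate the simple-function integral and compare it with a single symmetric α-stable variable of scale (Σᵢ |aᵢ|^α ν(Aᵢ))^{1/α}, which is what the stability and independence properties of μ predict:

import numpy as np

rng = np.random.default_rng(7)

def sym_stable(alpha, size):
    """Standard symmetric alpha-stable variates (ch.f. exp(-|t|^alpha)),
    via the Chambers-Mallows-Stuck construction."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(V)                     # standard Cauchy
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

alpha, n = 1.5, 200_000
a = np.array([2.0, -1.0, 0.5])       # coefficients a_i of the simple function
nu = np.array([0.3, 1.0, 0.7])       # control measure values nu(A_i)

# mu(A_i): independent symmetric alpha-stable with scale nu(A_i)^(1/alpha)
mu = nu[:, None] ** (1 / alpha) * sym_stable(alpha, (3, n))
integral = (a[:, None] * mu).sum(axis=0)          # (32): sum of a_i mu(A_i)

# By independence and stability the integral is alpha-stable with scale
# (sum |a_i|^alpha nu(A_i))^(1/alpha); compare sample quantiles.
scale = (np.abs(a) ** alpha @ nu) ** (1 / alpha)
direct = scale * sym_stable(alpha, n)
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(integral, qs).round(3))
print(np.quantile(direct, qs).round(3))

The two printed quantile rows should agree up to Monte Carlo error, reflecting the fact that the distribution of ∫_S f dμ depends on f only through ∫_S |f|^α dν.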
Thus T : f ↦ ∫_S f dμ is well-defined, and it may be verified that the
integral (33), or the mapping T, is linear. The next important concern is to
characterize the class of μ-integrable functions f ∈ L⁰(S, B, ν). We state a
result in this direction for an understanding of this new area and to relate
it with the strictly stationary processes introduced at the end of the preceding
chapter. The following result is a substitute for (30) in the present context.

Theorem 6 Let (S, B, ν) be a σ-finite space and μ : B₀ → L⁰(P), on a
probability space (Ω, Σ, P), be an α-stable random measure with parameters
(a, c, β, ν), 0 < α ≤ 2. Let F^α ⊂ L⁰(ν) be the set of (real) functions such that
f ∈ F^α provided |f|_α < ∞, where

\[ |f|_{\alpha} =
\begin{cases}
\|f\|_{1} + \|f\|_{\alpha}, & \text{if } 1 < \alpha \le 2,\\
\|f\|_{1} + \|f\|_{\alpha}^{\alpha}, & \text{if } 0 < \alpha < 1,\\
\|f\|_{1} + \big\|\, |f|\,\log_{e}(e + |f|)\,\big\|_{1}, & \text{if } \alpha = 1\ (e = \text{base of the natural log}),
\end{cases} \tag{34} \]

the norms being taken in L¹(ν) and L^α(ν) respectively.

Then {F^α, |·|_α} is a complete metric space, and the mapping T : F^α → L⁰(P)
given by (33) is well-defined, one-to-one, and the range R_T = T(F^α) ⊂ L⁰(P)
consists of α-stable random variables. Moreover, for each 0 < α₁ < α₂ ≤ 2,
the mapping T : F^{α₂} → L^{α₁}(P) is continuous between these metric spaces.
In particular, if a = 0 = β and 0 < α₁ < α₂ ≤ 2, then T : L^{α₂}(ν) →
L^{α₁}(P) is an isomorphism into, whose range is the set of α₂-stable symmetric
random variables with ch.f.'s given by

\[ E\big(e^{itT(f)}\big) = \exp\Big\{-c\,|t|^{\alpha_2}\int_S |f|^{\alpha_2}\, d\nu\Big\}, \qquad f \in L^{\alpha_2}(\nu). \tag{35} \]

A detailed proof of this result, with extensions when the spaces are vector valued
L^{α₁}(ν; X) and L^{α₂}(P; Y), where X, Y are certain Banach (or even Fréchet)
spaces, is given by Y. Okazaki (1979). We omit the details here. It uses various
properties of ch.f.'s and the work of Chapter 5. For a special class of random
measures, namely those defined by symmetric stable independent increment
processes, M. Schilder (1970) has given a simpler description of the stochastic
integral (for a brief sketch of this procedure, see Problem 14). It is of interest
to characterize the range R_T of the stochastic integral T as a subspace of
L⁰(P), but this is not a familiar object. For instance, it may be shown that
with (∧ = min, ∨ = max) the metric given by

\[ |Tf| = \|f\|_{L^{\alpha}(\nu)}^{\alpha \wedge 1}, \qquad f \in L^{\alpha}(\nu), \]

{R_T, |·|} becomes a complete linear metric (or a Fréchet) space, where
a ∧ b = min(a, b) for pairs of real numbers a, b, and where μ is a symmetric
α-stable random measure. (See also Exercise 5.25 (c).) We are now ready to
introduce the classification of processes admitting integral representations.
Recall that a (real) process {X_t, t ∈ I} is strictly stationary (as defined
at the end of the last chapter) if for each finite set of indices t₁, ..., t_n ∈ I
with t₁ + s, ..., t_n + s ∈ I for any s ∈ I (and any index set I with such
an algebraic structure), all the (joint) distributions of (X_{t₁}, ..., X_{t_n}) and
(X_{t₁+s}, ..., X_{t_n+s}) are identical. Equivalently, their (joint) ch.f.'s satisfy

\[ \phi_{t_1+s,\dots,t_n+s}(u_1,\dots,u_n) = \phi_{t_1,\dots,t_n}(u_1,\dots,u_n), \qquad u_j \in \mathbb{R},\ s \in I. \]

Now consider this property for a class of α-stable processes in a stronger
form, leading to an interesting area of probability with many applications. For
simplicity we treat here only the symmetric α-stable class. Thus, a process
{X_t, t ∈ I} is termed α-stable if each finite linear combination Σ_{j=1}^n a_j X_{t_j}
is α-stable, as noted in Exercise 5.25 (c). Consequently, for each n ≥ 1, the
finite dimensional ch.f. of X_{t₁}, ..., X_{t_n} is representable as:

\[ E\Big(\exp\Big\{i\sum_{j=1}^n u_j X_{t_j}\Big\}\Big) = \exp\Big\{-\int_{\mathbb{R}^n} \Big|\sum_{j=1}^n u_j x_j\Big|^{\alpha}\, dG_n(x)\Big\}, \tag{38} \]
where we replaced the support of G_n from the unit sphere S (on which ‖x‖ = 1)
by ℝⁿ, and the G_n measure is now defined on the Borelian space
(ℝⁿ, B_n). But as n varies, the system of measure spaces {(ℝⁿ, B_n, G_n), n ≥ 1}
changes, and the consistency of the finite dimensional distributions of the
process implies (by the uniqueness theorem) that these G_n satisfy the conditions
of Theorem 3.4.10 and, hence, there is a unique measure G on the
cylinder σ-algebra B of ℝ^I whose projection, or n-dimensional marginal, satisfies
G_n = G ∘ π_n^{-1}, where π_n : ℝ^I → ℝⁿ is the coordinate projection. (This
is not a random measure!) If such a G exists, it is called the spectral measure
of the α-stable process. An α-stable symmetric process for which (38) holds
with G_n = G ∘ π_n^{-1} is called a strongly stationary α-stable process. To use the
word stationary again, we observe that it is automatically strictly stationary
as defined before. This may be seen as follows. Let t_j, s, t_j + s ∈ I ⊂ ℝⁿ. Then,
by (38) and the image property G_n = G ∘ π_n^{-1},

\[ \phi_{t_1+s,\dots,t_n+s}(u_1,\dots,u_n) = \phi_{t_1,\dots,t_n}(u_1,\dots,u_n), \qquad u_j \in \mathbb{R}. \tag{39} \]


Thus a strongly stationary α-stable class is always strictly stationary. Now
(38) can be expressed more conveniently as follows. Consider the subspace
ℝ^{(I)} of ℝ^I consisting of all functions [or sequences if I is countable] that
vanish outside a finite set, the latter varying with the functions. Thus ℝ^{(I)} ∈ B,
ℝⁿ = π_n(ℝ^{(I)}) [= π_n(ℝ^I)], and we can express (38) for each finite subset J of
I as:

\[ E\Big(\exp\Big\{i\sum_{t\in J} u_t X_t\Big\}\Big) = \exp\Big\{-\int_{\mathbb{R}^{(I)}} \Big|\sum_{t\in J} u_t\, x(t)\Big|^{\alpha}\, dG(x)\Big\}. \tag{40} \]

If the process is complex valued, then it has the corresponding form (Re =
real part) as:

\[ E\Big(\exp\Big\{i\,\mathrm{Re}\sum_{t\in J} u_t \bar{X}_t\Big\}\Big) = \exp\Big\{-\int \Big|\mathrm{Re}\sum_{t\in J} u_t\, \bar{x}(t)\Big|^{\alpha}\, dG(x)\Big\}. \tag{41} \]
Since the measure G on (ℝ^{(I)}, B) is obtained through an application of the
Kolmogorov-Bochner theorem, one may hope that all symmetric strictly stationary
α-stable processes are also strongly stationary. However, it is shown
by Marcus and Pisier (1984), who introduced and analyzed this class, that
the inclusion is proper unless α = 2, which corresponds to the Gaussian case
in which they both coincide. The following (canonical) example of a strongly
stationary α-stable process shows the motivation for this class and its close
affinity with certain problems in (random) trigonometric series.

Example 7: Let {a_λ, λ ∈ ℝⁿ} be such that Σ_{λ∈ℝⁿ} |a_λ|^α < ∞, and
{θ_λ, λ ∈ ℝⁿ} be a set of independent α-stable symmetric variables. Consider
the process

\[ X_t = \sum_{\lambda \in \mathbb{R}^n} a_\lambda\, \theta_\lambda\, e^{i\langle t, \lambda\rangle}, \qquad t \in \mathbb{R}^n. \tag{42} \]

It may be verified that {X_t, t ∈ I = ℝⁿ} is a strongly stationary α-stable
process (or field), 0 < α ≤ 2, with spectral measure G(·) given by

\[ G = \sum_{\lambda \in \mathbb{R}^n} |a_\lambda|^{\alpha}\, \delta_\lambda, \tag{43} \]

where δ_λ(·) is the Dirac measure at λ ∈ ℝⁿ. We omit the details, which are
not difficult although not entirely simple, and refer to the above noted paper.
An interesting outcome of this example is that if α = 2 then the θ_λ must be
Gaussian by Theorem 5.3.2, and if 0 < α < 2 it is a stable process. However,
if the θ_λ are the Rademacher functions (i.e., an independent sequence taking values
+1 and −1 with equal probability), then (42) reduces to a series considered by
Paley and Zygmund in 1932 regarding its convergence with probability one for
every λ. If θ_λ = e^{2πiθ_n}, where for λ = n the θ_n are independent uniformly distributed
random variables on [0, 1], then the series becomes a Steinhaus series.
Here the hypothesis of α-stability of θ_λ was not (and could not be) assumed.
But the convergent series represents an α-stable process. These are not simple,
and both were considered in the Paley-Zygmund papers and later by Salem
and Zygmund in 1954. The subject was further detailed and generalized by
Kahane (1968 and 1985). The point of this discussion is that the strongly stationary
α-stable class contains these interesting results, and a detailed analysis
and characterizations are obtained by Marcus and Pisier (1984) noted above.
This study moves in a different direction if α = 2, where we can admit a much
wider class of processes to be discussed in the following section; it is called
weak stationarity, and will be of equal (and perhaps more) importance in
applications. A surprising fact is that for θ_λ as Rademacher, Steinhaus or
Gaussian i.i.d. variables the series represents an a.e. continuous function of t
or an a.e. unbounded function (for all t). One has to employ different methods. We
refer to Kahane (1985) for details.
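For the reader who wishes to experiment, here is a small simulation sketch (ours, not the authors'; the coefficient choice a_n = 1/n and the truncation level are illustrative) of a Rademacher random trigonometric series of the Paley-Zygmund type subsumed by (42):

import numpy as np

rng = np.random.default_rng(0)

# Truncated random trigonometric series in the spirit of (42):
# X_t = sum_n a_n * eps_n * cos(n t), with Rademacher signs eps_n.
N = 2000
n = np.arange(1, N + 1)
a = 1.0 / n                              # square-summable coefficients
eps = rng.choice([-1.0, 1.0], size=N)    # Rademacher sequence

t = np.linspace(-np.pi, np.pi, 1001)
X = (a[:, None] * eps[:, None] * np.cos(np.outer(n, t))).sum(axis=0)

# Print a few values of this realization of the (a.s. convergent) series.
for ti, xi in zip(t[::250], X[::250]):
    print(f"t = {ti:+.3f}   X_t = {xi:+.4f}")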
We have noted in (33) above that integrals of the form ∫_S f dμ can be
defined for random measures μ (with independent values on disjoint sets) on
(S, B, ν) for f : S → ℝ (or ℂ) of bounded measurable class from L⁰(ν). In
particular, if S = ℝ and f_t(s) = e^{its}, t ∈ ℝ, then one has

\[ X_t = \int_{\mathbb{R}} e^{its}\, d\mu(s), \qquad t \in \mathbb{R}. \tag{44} \]

Processes admitting such a representation were introduced and studied by Y.
Hosoya (1982), K. Urbanik (1968) and others. This class should be termed
strictly harmonizable. Although this is similar to strict stationarity, neither
includes the other. An immediate problem is to characterize processes {X_t, t ∈
ℝ} that admit the representation (44) for a (unique) random measure μ,
perhaps of symmetric α-stable class. Here a brief indication of this problem
will be given, and the more satisfactory second order case will be considered in
the next section, for which the above discussion is also a strong motivation.
The first step is to obtain a vector measure Z(·) such that X_t admits a
(generalized) Fourier transform of Z, and then seek conditions in order that it
has independent values to render X_t a strictly harmonizable process; this
will demand that the index set have a group structure. We have just defined, in
(32) and (33), the integral of a (bounded) scalar measurable function relative
to a random (or vector) measure. The same definition holds if the vector measure
takes values in a Banach space, when the convergence there is understood
in terms of the metric of the Banach space instead of that of "in probability"
(or the corresponding equivalent Fréchet metric). In this connection, the following
concept and theorem, originally due to S. Bochner in the scalar case
and modified for the Banach space case by R.S. Phillips, are of special interest.

Definition 8 Let X be a Banach space and f : G → X be a mapping,
where G is a locally compact abelian group, so that G = ℝⁿ, n ≥ 1, is possible.
Then f is said to be V-bounded (V for variation) provided the following three
conditions hold:

(i) f(G) is bounded, or equivalently contained in a ball of X;

(ii) f is measurable relative to the Borel σ-algebras of X and G, and the
range of f is separable (also termed "strongly measurable");

(iii) the set

\[ W = \Big\{ \int_G \hat g(t)\, f(t)\, dt : g \in L^1(G),\ \|\hat g\|_{\infty} \le 1 \Big\} \]

is such that its closure W̄ in the weak topology of X is compact, where 'dt'
is the invariant or Haar measure of G [the Lebesgue measure if G = ℝⁿ], and
ĝ is the Fourier transform of g [i.e., ĝ(s) = ∫_G ⟨y, s⟩ g(y) dy, or in the case that
G = ℝⁿ, Ĝ = ℝⁿ and ĝ(s) = ∫_{ℝⁿ} e^{is·λ} g(λ) dλ].
The point of this definition is that f is not required to be positive definite.
Our aim is to get a corresponding representation of f as in Theorem 4.4.2 (or
4.5.8), for a process {X_t, t ∈ G}. Here is the solution of the first step noted,
when G is as above.
Theorem 9 Let X : G → L^α(P), α ≥ 1, be a process (also called a random
field). Then X_t = ∫_Ĝ ⟨t, s⟩ dZ(s), t ∈ G (so it is strictly harmonizable), iff
X is V-bounded and weakly continuous. [Here Z(·) is simply a vector measure.]

Details of the above result can be found, for instance, in the first author's
book (Rao (2004), p. 550) and will be omitted. A consequence of this representation
is that {X_t, t ∈ G} under the stated conditions is an integral of a
vector (particularly stochastic) measure Z : B(Ĝ) → L^α(P), α ≥ 1. But now
one has to find the (probabilistic) properties of the measure Z related to the
random field {X_t, t ∈ G} under consideration. If {X_t, t ∈ G} is strictly stationary,
then some special properties of Z should be obtained. The following
result for strictly stationary α-stable processes answers the above question
and shows the basic role of the classical theory of probability here. It is essentially
a restatement of similar analyses of K. Urbanik (1968) and of Y. Hosoya
(1982), which is given using the new concept of strong stationarity introduced
above.

Theorem 10 Suppose {X_t, t ∈ ℝ} is a strictly harmonizable α-stable
process with its representing measure Z : B(ℝ) → L^α(P) (guaranteed by Theorem
9), which is also isotropic, meaning that for each z, λ ∈ ℝ, Z(A_z) and
e^{iλ}Z(A_z) are identically distributed, where A_z = (−∞, z). Then {X_t, t ∈ ℝ}
is strongly stationary α-stable. On the other hand, if the process is strongly
stationary α-stable, 1 < α ≤ 2, then it is V-bounded and hence is strictly
harmonizable, with the representing random measure isotropic in addition.

A key ingredient in obtaining the desired properties of Z(·) is to employ
a form of Fejér's theorem on summability of Fourier series and integrals. This
was used in the Urbanik-Hosoya treatment if ℝ is replaced by a compact
group. If the X_t is not required to be the Fourier transform of a random
measure, but is merely a representation of such a measure relative to some
element f_t ∈ L⁰(dλ), then there is a corresponding integral (of "Karhunen
type") as

\[ X_t = \int_G f_t(\lambda)\, dZ(\lambda), \qquad t \in G, \tag{46} \]

where G ⊂ ℝ is a compact group, and {f_{t_j}(·), j ≥ 1} is dense in L^α(dλ).
This is due to M. Schilder and J. Kuelbs, but as yet there is no method
of construction available for these f_t's. The structure and representation of
these strict versions are somewhat intricate. This is further exemplified by the
fact that if the process {X_n, n ∈ ℤ} is of independent and strictly stationary
elements (hence i.i.d.), then it is α-stable for α = 2 as a Gaussian sequence
but not for 0 < α < 2. Then X_n cannot be a (finite) linear combination of
α-stable random variables of the same type. These specializations will not be
discussed further.
Although α-stability, 0 < α ≤ 2, plays a key role in this work, it can be
generalized further, using the fact that it is a special class of the infinitely
divisible family, to get another strongly stationary process. This enlarges the
previous case and yet does not exhaust the strict stationarity. We indicate this
notion, introduced by Marcus (1987), to round out these ideas known at this
time.
Comparing formulas (35) and (40), it is clear that the characteristic exponent
t ↦ ψ(t) in exp(−ψ(t)) is a nonnegative, nondecreasing function,
and a similar statement holds for real symmetric infinitely divisible random
variables in Lévy's form, as seen in expression (20) of Section 5.2. In detail,
let ξ be a symmetric infinitely divisible real random variable whose ch.f. can,
therefore, be expressed as

\[ E\big(e^{it\xi}\big) = \exp\{-\psi(t)\}, \]

where we have set

\[ \psi(t) = \int_0^{\infty} (1 - \cos tx)\, dN(x), \]

with N({0}) = 0 and ∫_0^∞ (x² ∧ 1) dN(x) < ∞, for the Lévy measure N. Based
on this, (40) is reformulated as follows. If ψ(·) is the exponent of ξ, then a
process {X_t, t ∈ I} is termed ξ-radial if there is a (cylindrical) finite measure
G on the cylindrical σ-algebra of the space ℝ^{(I)}, such that

\[ E\Big(\exp\Big\{i\sum_{j=1}^n u_j X_{t_j}\Big\}\Big) = \exp\Big\{-\int_{\mathbb{R}^{(I)}} \psi\Big(\Big|\sum_{j=1}^n u_j\, x(t_j)\Big|\Big)\, dG(x)\Big\} \]

for all finite subsets t₁, ..., t_n ∈ I. If t_j + s ∈ I for each t_j, s ∈ I, then as before


this definition implies that the (-radial process { X t ,t E I) is also strongly sta-
tionary. The earlier work of Marcus and Pisier (1984) applies here and shows
that this enlarged class is still a proper subset (i.e. does not exhaust) of the
strictly stationary family. This enlargement substantially expands the study
of random Fourier series, and their sample path analysis has been advanced
by Marcus (1987). One should also note a n interesting extension of this work
by Cuzick and Lai (1980) in this connection. Some interesting analysis (but
no characterizations) of strictly harmonizable a-stable processes (1 < a 5 2)
are also in Weron (1985).
The restrictions can be substantially relaxed if we consider processes
with two moments. Many non-Gaussian classes can be included. Here V-boundedness
of Definition 8, with X a Hilbert space, plays a central role.
This aspect of the work will now be outlined in the next and final section of
the present chapter.
8.5 Second-Order Processes


The class of processes {X_t, t ∈ I} ⊂ L²(P) to be considered here uses the
Hilbert space geometry fully. This, in conjunction with probabilistic ideas, gives
an advantage for this class in applications in such subjects as prediction,
filtering and signal detection.
Consider the natural parameters of a second-order process, namely its
mean and covariance functions m and r:

\[ m(t) = E(X_t), \qquad r(s, t) = \operatorname{Cov}(X_s, X_t). \tag{1} \]

To make effective use of the Hilbert space geometry, we take the r.v.s as
complex valued (otherwise we complexify the process for convenience), so

\[ r(s, t) = E\big(X_s \bar{X}_t\big) - m(s)\,\overline{m(t)}, \]

where, as usual, the overbar denotes the complex conjugate. The process is
called weakly stationary (also termed K-stationary, K for Khintchine, or stationary
in the wide sense) if m(t) = constant and r(s, t) = f̃(s − t), with f̃ assumed
as a Borel function in (1). Recall that we defined strictly stationary processes


(or sequences) in Section 7.3 and studied in the above section, as those whose
finite-dimensional distributions are invariant under a shift of the time axis.
Thus a strict sense stationarity implies the weak sense version if the d.f.s have
two moments finite. Since r defined by (1) is positive definite, if I = R, then
by the Bochner-Riesz theorem (cf. Theorem 4.4.5), for a weakly stationary
process {Xt, t E R} we have

for almost all s-t E R (Lebesgue), aiid if r is also continuous, then (2) holds for
all s t E R.Here F is a bounded iiondecreasiiig noiiiiegative function, called
-

the spectral function of the process, uniquely determined by r. Because of this


connection again with the Fourier transform, one can consider the harmonic
analysis of these processes. A very simple example of such a family is any
complete orthoiiorinal sequence {f,, o o < n < oo} in (a separable) L2(P).
Then r(s t) =
- the Kronecker delta function, and clearly F 1 ( X ) dX =
dX/2.ir, F having a constant spectral density F1(relative t o Lebesgue measure.)
For a detailed analysis of stationary processes, we refer the reader t o the
books by Rozanov (1967), or Yaglom (1987). Hereafter a second order process
is assumed measurable, i.e. X : (t, w) H Xt(w) is a measurable mapping of
C x fl + C, so that the mean and covariance fuiictioiis are measurable.
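A small simulation sketch (ours; the amplitudes and frequencies below are arbitrary choices) of a weakly stationary process with a discrete spectral function: a finite sum of sinusoids with independent uniform phases, whose empirical covariance depends only on the lag, as (2) requires:

import numpy as np

rng = np.random.default_rng(1)

# X_t = sum_k A_k cos(l_k t + phi_k), phi_k i.i.d. uniform on [0, 2 pi):
# weakly stationary; F puts mass A_k^2 / 2 at the frequencies +/- l_k.
A = np.array([1.0, 0.7, 0.4])
lam = np.array([0.5, 1.3, 2.9])
phi = rng.uniform(0, 2 * np.pi, size=(5000, 3))     # 5000 sample paths

t = np.arange(0, 50, 1.0)
X = (A * np.cos(np.outer(t, lam)[None, :, :] + phi[:, None, :])).sum(axis=2)

# Empirical covariance for several (s, t) pairs sharing the lag 5:
r_theory = 0.5 * (A ** 2 * np.cos(5.0 * lam)).sum()
for s in (5, 20, 40):
    emp = np.mean(X[:, s] * X[:, s - 5])
    print(f"Cov(X_{s}, X_{s-5}) ~ {emp:+.3f}   (theory {r_theory:+.3f})")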
For some problems it is desirable to relax the condition that r be a function
of the difference. The advantages of the Fourier analysis can still be retained in
generalizing (2) to a class of second-order nonstationary processes. A process
{X_t, t ∈ ℝ} ⊂ L²(P) with mean zero and covariance r is termed Loève (or
strongly) harmonizable [introduced by Loève in an appendix to Lévy's book
(1948)] if

\[ r(s, t) = \int_{\mathbb{R}} \int_{\mathbb{R}} e^{is\lambda - it\lambda'}\, F(d\lambda, d\lambda'), \qquad s, t \in \mathbb{R}, \tag{3} \]

(the two dimensional Lebesgue integral) where F : ℝ² → ℂ is a covariance
function of bounded Vitali variation in the plane, i.e.,

\[ v(F) = \sup\Big\{ \sum_{i=1}^{m}\sum_{j=1}^{n} |F(A_i, B_j)| : A_i, B_j \text{ are disjoint intervals of } \mathbb{R} \Big\} < \infty. \tag{4} \]

Here F(A, B) = ∫∫_{ℝ²} χ_A(λ) χ_B(λ′) F(dλ, dλ′), and F is again called the spectral
function (bimeasure) of the process. Clearly every weakly stationary process
is strongly harmonizable, as can be seen when F(·, ·) concentrates on the
diagonal λ = λ′. A very simple harmonizable process which is not weakly
stationary is the following.
Let f ∈ L¹(ℝ) and f̂ be its Fourier transform: f̂(t) = ∫_ℝ e^{itλ} f(λ) dλ. If ξ
is an r.v. with mean zero and unit variance, and X_t = ξ f̂(t), then {X_t, t ∈ ℝ}
is such a process. Harmonizable processes are finding interesting applications
in the areas mentioned above, and some basic theory of the strongly harmonizable
class is given by Loève (1963).
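To see why this example satisfies (3), a short computation (a routine step we spell out here) exhibits the bimeasure explicitly:

\[ r(s, t) = E\big(X_s \bar{X}_t\big) = \hat f(s)\,\overline{\hat f(t)} = \int_{\mathbb{R}}\int_{\mathbb{R}} e^{is\lambda - it\lambda'}\, f(\lambda)\,\overline{f(\lambda')}\, d\lambda\, d\lambda', \]

so F(dλ, dλ′) = f(λ) \overline{f(λ′)} dλ dλ′ and v(F) = (∫_ℝ |f(λ)| dλ)² < ∞, giving finite Vitali variation; but r(s, t) is not a function of s − t alone (unless |f̂| is constant), so the process is not weakly stationary.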
Consider the simple stationary example given above. If H₁ = sp{f_n, n ≥ 0}
is the closed linear span in L²(P), let Q : L²(P) → H₁ be the orthogonal
projection with range H₁. If g_n = Qf_n, then g_n = f_n for n ≥ 0, and = 0 for
n < 0. Even though {f_n, −∞ < n < ∞} is a very simple weakly stationary
process, the g_n-sequence is not strongly harmonizable. A proof is not difficult,
but is nontrivial. In fact, if {X_t, t ∈ ℝ} ⊂ L²(P) is weakly stationary, and
T is a continuous linear mapping of L²(P) into itself, letting Y_t = TX_t,
then {Y_t, t ∈ ℝ} is generally not strongly harmonizable. However, every such
process can be shown to be weakly harmonizable in the following sense.
A process {X_t, t ∈ ℝ} ⊂ L²(P) is weakly harmonizable if E(X_t) = 0 and
its covariance can be represented as (3), in which the spectral function F is a
covariance function of bounded variation in Fréchet's sense:

\[ \|F\| = \sup\Big\{ \Big| \sum_{i=1}^{m}\sum_{j=1}^{n} a_i \bar{a}_j\, F(A_i, A_j) \Big| : |a_i| \le 1,\ A_i \subset \mathbb{R} \text{ are disjoint Borel sets} \Big\} < \infty. \tag{5} \]

It is easily seen that ‖F‖ ≤ v(F) ≤ ∞, usually with a strict inequality between
the first terms. [v(F) is the Vitali variation, cf. (4).] With this relaxation,
the representation (3) still holds,
but now the integral here has to be defined in the (weaker) sense of M. Morse
and W. Transue, and it is not an absolute integral, in contrast to Lebesgue's
definition used in (3). It is clear that each strongly harmonizable process is
weakly harmonizable, and the above examples show that the converse does
not hold. Most of the Loève theory extends to this general class, although
different methods and techniques of proof are now necessary. These processes
are of considerable interest in the applications noted above. The structure theory
of such processes and other results can be found in the literature [in particular,
see the first author's paper (1982)], and some further details will be given in
the problems section.
We present a brief characterization of weakly harmonizable processes to
give a feeling for these classes and classifications, to contrast with the strict
sense classes discussed in the preceding section. Thus we have the following direct
characterization of weakly harmonizable processes, specializing Theorem
4.9 above to the present (Hilbert space) context. For simplicity we consider
G = ℝ and give the essential details of proof.

Theorem 1 A process X : ℝ → L₀²(P) is weakly harmonizable if and
only if it is V-bounded and weakly continuous, in the sense that t ↦ ℓ(X(t)) is
a continuous scalar function for each continuous linear functional ℓ on L²(P).

Proof If X is weakly continuous and V-bounded then, f̂ being the Fourier
transform of f, one has (‖·‖_∞ denoting the uniform norm)

\[ \Big\| \int_{\mathbb{R}} f(t)\, X(t)\, dt \Big\|_2 \le C\, \|\hat f\|_{\infty}, \qquad f \in L^1(\mathbb{R}), \]

for some constant C. Also, by the Riemann-Lebesgue lemma, Y = {f̂ : f ∈ L¹(ℝ)} ⊂ C₀(ℝ), the
complete (under the uniform norm) space of continuous complex functions vanishing
at '∞', and Y is uniformly dense in the latter, since it separates points of
ℝ and the Stone-Weierstrass theorem applies. Set e_t : λ ↦ e^{itλ}, and consider
T₁ : f ↦ ∫_ℝ f(λ) e_t(λ) dλ, t ∈ ℝ, so that T₁ : L¹(ℝ) → C₀(ℝ) is one-to-one
and contractive. If we set T₀(f̂) = ∫_ℝ f(t) X(t) dt (∈ L₀²(P)), then the mapping
T₂ : L¹(ℝ) → L₀²(P) is given by the commuting diagram T₂ = T₀ ∘ T₁.

Now T₀ is bounded and has a bound preserving extension to all of C₀(ℝ), by
the density of Y, and thus is a bounded linear mapping into the Hilbert space
L₀²(P). We use the same symbol for the extension also. Hence, by a classical
Riesz representation theorem (cf. Dunford-Schwartz (1958), IV.7.3), which is
seen to extend to the locally compact case (such as ℝ here), there exists a
unique measure Z : B(ℝ) → L₀²(P), such that

\[ T_0(g) = \int_{\mathbb{R}} g(\lambda)\, Z(d\lambda), \qquad g \in C_0(\mathbb{R}). \tag{8} \]

Then T₀ ↔ Z correspond to each other and, with ‖Z‖(ℝ) the semi-variation of Z,

\[ \|Z\|(\mathbb{R}) \le \|T_0\| < \infty. \]
To see that X(·) is the Fourier transform of Z(·), so that it is weakly
harmonizable, consider, for any continuous linear functional ℓ ∈ (L₀²(P))*,
from (8) and the diagram relation,

\[ \ell(T_2 f) = \int_{\mathbb{R}} f(t)\, \ell(X(t))\, dt = \ell\Big(\int_{\mathbb{R}} \hat f(\lambda)\, Z(d\lambda)\Big). \]

Applying a well-known theorem of E. Hille (cf. Dunford-Schwartz (1958),
p. 324 and p. 153), which allows commuting the integral and the functional ℓ,
it follows that

\[ \int_{\mathbb{R}} \hat f(t)\, (\ell \circ Z)(dt) = \int_{\mathbb{R}} f(t)\, (\ell \circ X)(t)\, dt. \]

Substituting for f̂ and using Fubini's theorem on the left side (clearly valid
for signed measures), one has

\[ \int_{\mathbb{R}} f(\lambda) \Big[ \int_{\mathbb{R}} e^{i\lambda t}\, (\ell \circ Z)(dt) \Big] d\lambda = \int_{\mathbb{R}} f(\lambda)\, (\ell \circ X)(\lambda)\, d\lambda, \]

so that

\[ \int_{\mathbb{R}} f(\lambda) \Big[ (\ell \circ X)(\lambda) - \int_{\mathbb{R}} e^{i\lambda t}\, (\ell \circ Z)(dt) \Big] d\lambda = 0. \]
Since f is arbitrary, its coefficient function must vanish a.e., and it is actually
everywhere by the (weak) continuity of that element. This establishes that
X ( . ) is the Fourier transform of the (stochastic) measure Z(.), hence weakly
harmonizable.
Conversely, let X(·) be weakly harmonizable, so that it is the Fourier transform
of a (stochastic or vector) measure Z(·). We claim it is V-bounded. Using
the representation of X_t, it is seen that

\[ X_t = \int_{\mathbb{R}} e^{it\lambda}\, Z(d\lambda), \qquad t \in \mathbb{R}. \tag{12} \]
If ℓ ∈ (L₀²(P))*, then one has ℓ(X)(·) to be the Fourier transform of (ℓ ∘ Z)(·),
and hence, for f ∈ L¹(ℝ), it follows that, as before, for the Bochner (or vector
Lebesgue) integrable (fX)(·), with e_t(λ) = e^{iλt},

\[ \ell\Big(\int_{\mathbb{R}} f(t)\, X(t)\, dt\Big) = \int_{\mathbb{R}}\int_{\mathbb{R}} f(t)\, e_t(\lambda)\, (\ell \circ Z)(d\lambda)\, dt, \ \text{by Fubini's theorem,} = \ell\Big(\int_{\mathbb{R}} \hat f(\lambda)\, Z(d\lambda)\Big), \tag{13} \]

using the same argument of the direct part above. Since ℓ ∈ (L₀²(P))* is
arbitrary, we conclude from (12) and (13) that

\[ \int_{\mathbb{R}} f(t)\, X(t)\, dt = \int_{\mathbb{R}} \hat f(\lambda)\, Z(d\lambda), \]

and, with M₀ = ‖Z‖(ℝ),

\[ \Big\| \int_{\mathbb{R}} f(t)\, X(t)\, dt \Big\|_2 \le M_0\, \|\hat f\|_{\infty}. \tag{14} \]

It follows from (14) (cf. Definition 4.7 above) that X is V-bounded, since
L₀²(P) is reflexive; and since ℓ(X_t), as the Fourier transform of (ℓ ∘ Z)(·), is
continuous, X(·) is weakly continuous, completing the proof.
The same argument extends easily to any locally compact abelian group.
It is not too difficult to show that the covariance function r(·, ·) of X(·) is
representable as

\[ r(s, t) = \int_{\mathbb{R}} \int_{\mathbb{R}} e^{is\lambda - it\lambda'}\, F(d\lambda, d\lambda'), \]

where F is a "bimeasure", and the integral now has to be defined not as a
standard Lebesgue integral, but in a weaker (nonabsolute) sense using the
work of M. Morse and W. Transue (1956). These will not be discussed here.
Several other extensions of second-order processes are also possible. For some
of these we refer to Cramér's lectures (1971), Rao (1982), Chang and Rao
(1986), and, for the state of the art during the next decade, Swift (1997).
In this chapter, our aim has been to show some of the work on stochastic
processes growing out of the standard probability theory that we have presented
in the preceding analysis. The reader should get from this an idea of
the vigorous growth of stochastic theory into essentially all branches of analysis,
and will now be in a position to study specialized works on these and
related subjects.

Exercises
1. Find an explicit form of an n-dimensional density of the Brownian motion
process {X_t, 0 ≤ t < 1}.

2. Let {X_t, t ≥ 0} be a Gaussian process with mean function zero and
covariance r given by r(s, t) = e^{−|s−t|}, s, t ≥ 0. If 0 < t₁ < ... < t_n, show that
(X_{t₁}, ..., X_{t_n}) has a density f_{t₁,...,t_n} given by

\[ f_{t_1,\dots,t_n}(x_1,\dots,x_n) = \frac{e^{-x_1^2/2}}{\sqrt{2\pi}} \prod_{j=2}^{n} \frac{1}{\sqrt{2\pi\big(1 - e^{-2(t_j - t_{j-1})}\big)}}\, \exp\Big\{ -\frac{\big(x_j - e^{-(t_j - t_{j-1})} x_{j-1}\big)^2}{2\big(1 - e^{-2(t_j - t_{j-1})}\big)} \Big\}. \]

[Such a family is called an Ornstein-Uhlenbeck process.]
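A quick numerical check (ours; step size, horizon and path count are arbitrary) exploits the Markovian structure visible in the density above: given X_s, the variable X_t (t > s) is Gaussian with mean e^{−(t−s)}X_s and variance 1 − e^{−2(t−s)}, which yields an exact discretization:

import numpy as np

rng = np.random.default_rng(2)

# Exact discretization of the Ornstein-Uhlenbeck process with
# r(s, t) = exp(-|s - t|).
dt, n_steps, n_paths = 0.05, 400, 20000
rho = np.exp(-dt)
X = np.empty((n_paths, n_steps + 1))
X[:, 0] = rng.standard_normal(n_paths)          # stationary start, N(0, 1)
for k in range(n_steps):
    X[:, k + 1] = rho * X[:, k] + np.sqrt(1 - rho ** 2) * rng.standard_normal(n_paths)

# Empirical covariance against exp(-|s - t|):
s, t = 100, 250                                  # grid indices
emp = np.mean(X[:, s] * X[:, t])
print(f"empirical r = {emp:.4f},  exact r = {np.exp(-abs(t - s) * dt):.4f}")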

3. If {X_t, t ≥ 0} is the Brownian motion of Section 1 and A_a = {(t, ω) :
X_t(ω) > a}, then verify that A_a is λ × P-measurable, where λ is the Lebesgue
measure on ℝ⁺ and P is the given probability. Thus X = {X_t, t ≥ 0} is a
(jointly) measurable random function on ℝ⁺ × Ω → ℝ.

4. Prove the remark following Eq. (4.10), namely: If {X_t, a ≤ t ≤ b} is a
process such that for each ε > 0, a ≤ t, u ≤ b, lim_{t→u} P[|X_t − X_u| > ε] = 0,
then the ch.f. φ_{t,u} : v ↦ E(e^{iv(X_t − X_u)}) satisfies lim_{t→u} φ_{t,u}(v) = 1 uniformly
in v, and conversely the latter property implies the stochastic continuity, as
given there.

5. Let φ : ℝ → ℂ be an infinitely divisible ch.f. Show that there is a
probability space (Ω, Σ, P) and a stochastic process {X_t, t ≥ 0} on it such
that X₀ = 0, having (strictly) stationary independent increments satisfying
(φ(u))^t = E(e^{iuX_t}), t ≥ 0. [Hint: For each 0 = t₀ < t₁ < ... < t_n define
φ_{t₁,...,t_n}, an n-dimensional ch.f. of independent r.v.s Y₁, ..., Y_n such that
E(e^{iuY_j}) = (φ(u))^{t_j − t_{j−1}}, and set X_{t₁} = Y₁, X_{t₂} = Y₁ + Y₂, ..., X_{t_n} = Σ_{j=1}^n Y_j.
Verify that {φ_{t₁,...,t_n}, n ≥ 1} is a consistent family of ch.f.s.]

6. (Converse to 5) Let {X_t, t ≥ 0} be a stochastically continuous process,
X₀ = 0, with strictly stationary independent increments, on a probability
space (Ω, Σ, P). Show that φ_t : u ↦ E(e^{iuX_t}) = exp{t(ψ(u) + iua)} for a
constant a, and ψ(0) = 0, ψ : ℝ → ℂ continuous. Is stochastic continuity
automatic when the rest of the hypothesis holds?

7. Let {f_n, n ≥ 1} ⊂ L²(0, 1) be any complete orthonormal set and
{ξ_n, n ≥ 1} be independent N(0, 1) r.v.s. Show that the process {X_t, 0 ≤
t ≤ 1} given by

\[ X_t = \sum_{n=1}^{\infty} \xi_n \int_0^t f_n(s)\, ds \]

is a Brownian motion. (This observation is due to L.A. Shepp, and the proof is
analogous to that of Theorem 1.2, but the uniform convergence is less simple.)
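As an illustration (ours, under the stated setup; we pick the particular orthonormal set f₁ ≡ 1, f_n(s) = √2 cos((n−1)πs) for n ≥ 2 and a finite truncation), the series can be summed numerically and its covariance compared with min(s, t):

import numpy as np

rng = np.random.default_rng(3)

# With this basis, int_0^t f_n = sqrt(2) sin((n-1) pi t)/((n-1) pi), so
# X_t = xi_1 t + sum_{n>=2} xi_n sqrt(2) sin((n-1) pi t)/((n-1) pi).
N, n_paths = 1000, 4000
t = np.linspace(0, 1, 101)
k = np.arange(1, N)                                   # n - 1 = 1, ..., N-1
phi = np.sqrt(2) * np.sin(np.pi * np.outer(k, t)) / (np.pi * k[:, None])

xi = rng.standard_normal((n_paths, N))
X = xi[:, :1] * t + xi[:, 1:] @ phi                   # shape (n_paths, len(t))

# Sanity checks against Brownian motion: Var X_t = t, Cov = min(s, t).
print("Var X_0.5     :", X[:, 50].var().round(3), " (should be ~0.5)")
print("Cov(X_.3,X_.7):", np.mean(X[:, 30] * X[:, 70]).round(3), " (~0.3)")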

8. (A Characterization of Brownian Motion, due to P. Lévy) Let {X_t, 0 ≤
t ≤ 1} be a process on (Ω, Σ, P) with strictly stationary independent increments
and X₀ = 0. If the sample functions t ↦ X_t(ω) are continuous for
almost all ω ∈ Ω, then the process is Brownian motion. [Hint: Observe that the
continuity hypothesis implies P[max_{1≤k≤n} |X_{k/n} − X_{(k−1)/n}| > ε] → 0 for each
ε > 0, and this gives P[|X_{1/n}| > ε] = o(n^{−1}). Verify that φ : u ↦ E(e^{iuX_1}) is
an infinitely divisible ch.f. for which the Lévy-Khintchine pair (γ_t, G_t) satisfies
γ_t = tγ, G_t = tG, where G is a constant except for a jump at the origin.]

9. Solve the system of differential equations

\[ P_0'(t) = -\lambda P_0(t), \qquad P_n'(t) = -\lambda P_n(t) + \lambda P_{n-1}(t) \ \text{ for } n \ge 1, \tag{1} \]

with the conditions P₀(0) = 1 and P_n(0) = 0 for n ≥ 1, recursively. The
solution of this system (of differential equations) is the Poisson process of
Equation (5) of Section 4 above.

An alternate method of solution is obtained by letting

\[ \psi(s, t) = \sum_{n=0}^{\infty} P_n(t)\, s^n, \]

which is the probability generating function (p.g.f.) for the system (1). Rewriting
(1) on multiplying by an appropriate power of s, show that the partial
derivatives yield the p.g.f. ψ, which satisfies

\[ \frac{\partial \psi}{\partial t} = \lambda(s - 1)\, \psi, \tag{2} \]

with the boundary conditions ψ(s, 0) = 1 and ψ(1, t) = 1.

Solve the partial differential equation (2) to obtain

\[ \psi(s, t) = e^{\lambda t (s - 1)}. \]
Expanding ψ(s, t) as a power series in s, obtain the Poisson process probabilities
as coefficients of this power series. Alternately, one could recognize ψ(s, t)
as the p.g.f. of a Poisson probability distribution with parameter λt, assuming
(or establishing) a uniqueness theorem for p.g.f.s.
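A numerical sketch (ours; the rate, horizon, step and truncation level are arbitrary) that integrates the truncated system (1) directly and compares with the Poisson probabilities e^{−λt}(λt)ⁿ/n! obtained above:

import numpy as np
from math import factorial

# Euler-integrate P0' = -lam P0, Pn' = -lam Pn + lam P_{n-1}, truncated at N.
lam, T, dt, N = 2.0, 3.0, 1e-4, 40
P = np.zeros(N + 1)
P[0] = 1.0                                   # P0(0) = 1, Pn(0) = 0
for _ in range(int(T / dt)):
    dP = -lam * P
    dP[1:] += lam * P[:-1]
    P += dt * dP

for n in range(5):
    exact = np.exp(-lam * T) * (lam * T) ** n / factorial(n)
    print(f"P_{n}({T}) = {P[n]:.5f}   Poisson: {exact:.5f}")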

10. Some properties of the Poisson process N_t are detailed in this problem.

(a) (Memoryless property of the exponential distribution) A random
variable T is said to have the memoryless property if for each s, t ∈ ℝ⁺

\[ P[T > s + t \mid T > s] = P[T > t]. \]

Show that T with a continuous distribution has the memoryless property if
and only if T is an exponential r.v. [T exponentially distributed implying T
memoryless is straightforward, and for the converse show that if g : ℝ → ℝ⁺
is a nonincreasing continuous function which satisfies

\[ g(s + t) = g(s)\, g(t) \]

for all s, t ∈ ℝ⁺, then (being a probability)

\[ g(t) = e^{-a t} \]

for some constant a ∈ ℝ⁺. Using this, deduce that T has an exponential
distribution by considering g(t) = P[T > t].]

(b) (Time of occurrence in a Poisson process) Consider a Poisson process
{X_t, t ≥ 0} [X_t integer valued!] with rate parameter λ > 0. Let T₁ be the time
of the first event's occurrence in this process. Show that the conditional probability
P[T₁ < s | X_t = 1] is the distribution function of a uniform [0, t] random
variable. Generalize this result to show, for 0 < s < t and k = 0, 1, ..., n,

\[ P[X_s = k \mid X_t = n] = \binom{n}{k} \Big(\frac{s}{t}\Big)^{k} \Big(1 - \frac{s}{t}\Big)^{n-k}. \]
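A Monte Carlo check of part (b) (ours; the rate and horizon are arbitrary): simulate arrival times through exponential inter-arrival gaps and condition on exactly one event in [0, t]:

import numpy as np

rng = np.random.default_rng(4)

lam, t, n_trials = 1.5, 2.0, 200_000
# Simulate arrivals on [0, t] via i.i.d. exponential gaps; 30 gaps suffice
# with overwhelming probability since lam * t = 3.
gaps = rng.exponential(1 / lam, size=(n_trials, 30))
arrivals = gaps.cumsum(axis=1)
counts = (arrivals <= t).sum(axis=1)

T1 = arrivals[counts == 1, 0]               # first arrival, given X_t = 1
qs = np.array([0.25, 0.5, 0.75])
print("empirical:", np.quantile(T1, qs).round(3))
print("uniform  :", (qs * t).round(3))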

11. (The pure birth (linear birth) process) Consider a nonnegative integer-valued
process {X_t, t ≥ 0} with independent increments. A pure birth process
is obtained by assuming

\[ P[X_{t+\Delta t} = n + 1 \mid X_t = n] = n\lambda\,\Delta t + o(\Delta t) \]

for n = n₀, n₀ + 1, ... and λ > 0. [Compare this with equation (1) above.]
Letting

\[ P_n(t) = P[X_t = n \mid X_0 = n_0], \]

and in a manner analogous to the derivation of the system of differential
equations for the Poisson process, show that a pure birth process {X_t, t ≥ 0}
has probabilities P_n(t) which satisfy

\[ P_n'(t) = -n\lambda P_n(t) + (n - 1)\lambda P_{n-1}(t), \qquad n \ge n_0. \]

Either solving this system recursively, or using the p.g.f. method detailed in
Problem 9, show

\[ P_n(t) = \binom{n-1}{n_0-1}\, e^{-n_0 \lambda t}\big(1 - e^{-\lambda t}\big)^{n - n_0}, \qquad n \ge n_0. \]

Using P_n'(t), the derivative, show that the expected value m(t) = E(X_t)
satisfies the differential equation

\[ m'(t) = \lambda m(t), \quad \text{with } m(0) = n_0. \tag{4} \]

[Remark: The pure birth process can be motivated as a stochastic version
of the simple deterministic population model described by (4). Under fairly
nonrestrictive conditions, it is possible to formulate a stochastic version of a
large variety of first order population models. For a further discussion of these
ideas, see the second author's text (1999) and the papers by Swift (2001) and
Switkes, Wirkus, Swift and Mihaila (2003).]
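A simulation sketch for this exercise (ours; parameter values arbitrary): in the linear birth process the holding time in state n is exponential with rate nλ, so a path is generated by summing such holding times, and the sample mean can be compared with the solution n₀e^{λt} of (4):

import numpy as np

rng = np.random.default_rng(5)

lam, t_end, n0, n_paths = 0.8, 2.0, 3, 20_000
sizes = np.empty(n_paths, dtype=int)
for i in range(n_paths):
    t, n = 0.0, n0
    while True:
        t += rng.exponential(1 / (n * lam))   # holding time in state n
        if t > t_end:
            break
        n += 1                                # a birth occurs
    sizes[i] = n

print("simulated mean:", sizes.mean().round(3))
print("n0 e^{lam t}  :", (n0 * np.exp(lam * t_end)).round(3))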

12. (The general birth-death process) Consider a nonnegative integer valued
process {X_t, t ≥ 0} with independent increments. The general birth-death
process is obtained by assuming

\[ P[X_{t+\Delta t} = n + 1 \mid X_t = n] = \lambda_n\, \Delta t + o(\Delta t) \quad \text{for } n = 0, 1, 2, \dots \]

and

\[ P[X_{t+\Delta t} = n - 1 \mid X_t = n] = \mu_n\, \Delta t + o(\Delta t) \quad \text{for } n = 1, 2, \dots, \]

where λ_n > 0 for n = 0, 1, ..., and μ_n > 0 for n = 1, 2, ....

Letting

\[ P_n(t) = P[X_t = n \mid X_0 = 0], \]

show that the general birth-death process {X_t, t ≥ 0} satisfies

\[ P_0'(t) = -\lambda_0 P_0(t) + \mu_1 P_1(t), \qquad P_n'(t) = \lambda_{n-1} P_{n-1}(t) - (\lambda_n + \mu_n) P_n(t) + \mu_{n+1} P_{n+1}(t), \quad n \ge 1. \tag{5} \]

Obtaining a solution to this general system of differential equations is not
easy without some conditions upon the λ_n's and μ_n's. However, it is possible
to obtain the steady-state distribution as follows. Suppose

\[ \lim_{t\to\infty} P_n(t) = P_n \ (\text{say}) \quad \text{for } n = 0, 1, \dots \]

exists. Then, provided that the limit and the derivative can be interchanged
(formulate a simple [uniform] condition for this), show that the system (5) can
be written as

\[ \mu_1 P_1 - \lambda_0 P_0 = 0, \qquad \lambda_{n-1} P_{n-1} - (\mu_n + \lambda_n) P_n + \mu_{n+1} P_{n+1} = 0 \ \text{ for } n \ge 1. \tag{6} \]

Show further that the system (6) of difference equations can be solved recursively
for P_n to obtain

\[ P_n = P_0 \prod_{k=1}^{n} \frac{\lambda_{k-1}}{\mu_k}, \qquad \text{with } P_0 = \Big[ 1 + \sum_{n=1}^{\infty} \prod_{k=1}^{n} \frac{\lambda_{k-1}}{\mu_k} \Big]^{-1}, \tag{7} \]

provided the series Σ_{n=1}^∞ ∏_{k=1}^{n} λ_{k−1}/μ_k converges. This steady-state distribution
for the general birth-death process can be specialized to find the steady-state
distribution for a wide class of birth-death processes. For instance, if λ_n = λ
for n = 0, 1, 2, ..., and μ_n = μ (> λ > 0) for n = 1, 2, ..., then show that
(7) can be used to obtain the steady-state distribution for the single-server
queue:

\[ P_n = \Big(\frac{\lambda}{\mu}\Big)^{n} \Big(1 - \frac{\lambda}{\mu}\Big) \quad \text{for } n = 0, 1, 2, \dots. \]

Recently, S.K. Ly (2004) considered the class of birth-death processes with
polynomial transition rates. Specifically, if λ_j = ∏_{k=1}^{p}(α_k j + β_k) for j =
0, 1, ... and μ_j = ∏_{k=1}^{q}(γ_k j + δ_k) for j = 1, 2, ..., with p, q positive integers
and α_k, β_k, γ_k, δ_k real numbers, show that the series in (7) can be summed in
terms of the generalized hypergeometric function _pF_q(·; ·; ·), defined as

\[ {}_pF_q(a_1,\dots,a_p;\, b_1,\dots,b_q;\, z) = \sum_{n=0}^{\infty} \frac{(a_1)_n\cdots(a_p)_n}{(b_1)_n\cdots(b_q)_n}\, \frac{z^n}{n!}, \]

which converges for |z| < ∞ if p ≤ q, and if p = q + 1 the series converges
for |z| < 1. The symbol (a)_n is known as the Pochhammer notation and it is
defined by

\[ (a)_n = \frac{\Gamma(a + n)}{\Gamma(a)}, \qquad \text{with } (a)_0 = 1, \]

where

\[ \Gamma(a) = \int_0^{\infty} x^{a-1} e^{-x}\, dx, \qquad a > 0, \]

is the standard gamma function. Verify that this gives
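A numerical companion to (7) (ours; the rates and the truncation level are arbitrary): compute the steady-state vector from the product formula and check it against the geometric distribution of the single-server queue obtained above:

import numpy as np

# Steady-state distribution of a birth-death process via (7):
# P_n = P_0 * prod_{k=1}^{n} lam_{k-1} / mu_k, normalized to sum to 1.
def steady_state(lam, mu, N):
    """lam[n]: rates lambda_n (n = 0..N-1); mu[n]: rates mu_n (n = 1..N)."""
    ratios = np.cumprod([lam[n] / mu[n + 1] for n in range(N)])
    p = np.concatenate(([1.0], ratios))
    return p / p.sum()

# Single-server (M/M/1) queue: lambda_n = lam, mu_n = mu, lam < mu.
lam, mu, N = 1.0, 2.0, 200
p = steady_state(np.full(N, lam), np.full(N + 1, mu), N)

rho = lam / mu
for n in range(4):
    print(f"P_{n}: recursion {p[n]:.5f}   geometric {(1 - rho) * rho ** n:.5f}")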



13. (A catastrophe process) An example of a non-birth-death process can
be obtained by considering a nonnegative integer valued process {X_t, t ≥ 0}
with independent increments as before, with its transitions given by j → j + 1
with probability aΔt + o(Δt) for j ≥ 0, and j → 0 with probability γΔt + o(Δt)
for j ≥ 1. Following the method of Exercises 11 and 12, show that {X_t, t ≥ 0}
satisfies

\[ P_n'(t) = a P_{n-1}(t) - (a + \gamma) P_n(t), \qquad n \ge 1, \]

where P_n(t) = P[X(t) = n | X(0) = 0]. Note that, since a catastrophe can
occur for any value of X(·), we have that P₀′(t) = γ Σ_{j=1}^∞ P_j(t) − aP₀(t). This
expression simplifies, using

\[ \sum_{j=1}^{\infty} P_j(t) = 1 - P_0(t), \]

to give P₀′(t) = γ − (a + γ)P₀(t). For simplicity, assume X(0) = 0. This
implies that P₀(0) = 1 and P_n(0) = 0 for all n ≥ 1. Show that this system of
differential equations has a solution

\[ P_0(t) = \frac{\gamma}{a + \gamma} + \frac{a}{a + \gamma}\, e^{-(a + \gamma)t}. \]

Show in general, by an induction argument, that

\[ P_n(t) = e^{-(a+\gamma)t}\, \frac{(at)^n}{n!} + \gamma\, \frac{a^n}{n!} \int_0^t x^n e^{-(a+\gamma)x}\, dx. \]

Using the identity

\[ \int_0^t x^n e^{-\alpha x}\, dx = \frac{\Gamma(n+1) - \Gamma(n+1, \alpha t)}{\alpha^{n+1}} \]


(which can be obtained by repeated integration by parts, or found in a standard
table of integrals), where Γ(·, ·) is the incomplete gamma function and
α > 0 is a constant, show that P_n is

\[ P_n(t) = e^{-(a+\gamma)t}\, \frac{(at)^n}{n!} + \frac{\gamma a^n}{(a+\gamma)^{n+1}} \Big[ 1 - \frac{\Gamma(n+1, (a+\gamma)t)}{n!} \Big], \]

or more succinctly

\[ P_n(t) = e^{-(a+\gamma)t}\, \frac{(at)^n}{n!} + \frac{\gamma a^n}{(a+\gamma)^{n+1}}\, \frac{\gamma(n+1, (a+\gamma)t)}{\Gamma(n+1)}, \]

γ(·, ·) being the lower incomplete gamma function. Deduce that the expected value of X(t) is

\[ E(X(t)) = \frac{a}{\gamma}\big(1 - e^{-\gamma t}\big), \]

and the variance is

\[ \operatorname{Var}(X(t)) = \Big(\frac{a}{\gamma} + \frac{2a^2}{\gamma^2}\Big)\big(1 - e^{-\gamma t}\big) - \frac{2a^2}{\gamma}\, t\, e^{-\gamma t} - \frac{a^2}{\gamma^2}\big(1 - e^{-\gamma t}\big)^2. \]
[This simple process was first considered by Swift (2000a), but has since found
a wide range of additional applications. Indeed, this idea has been applied to
queueing models by B.K. Kumar and D. Arivudainambi (2000) and A. Di
Crescenzo, V. Giorno, A.G. Nobile (2003), as well as population models by
R.J. Swift (2001) and M.L. Green (2004). This simple catastrophe process has also
been extended to a multiple catastrophe process by I. Chang, A.C. Krinik,
and R.J. Swift (2006), although a different method of solution is required.]
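A simulation sketch for the catastrophe process (ours; parameter values arbitrary), comparing the sample mean of X(t) with the formula (a/γ)(1 − e^{−γt}) deduced above:

import numpy as np

rng = np.random.default_rng(6)

# Event-driven simulation: up-jumps at rate a; total catastrophe (reset
# to 0) at rate g whenever the state is positive.
a, g, t_end, n_paths = 2.0, 0.5, 4.0, 20_000
final = np.empty(n_paths)
for i in range(n_paths):
    t, x = 0.0, 0
    while True:
        rate = a + (g if x > 0 else 0.0)
        t += rng.exponential(1 / rate)
        if t > t_end:
            break
        if x > 0 and rng.random() < g / rate:
            x = 0                      # catastrophe
        else:
            x += 1                     # arrival
    final[i] = x

print("simulated mean  :", final.mean().round(3))
print("(a/g)(1-e^{-gt}):", (a / g * (1 - np.exp(-g * t_end))).round(3))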

14. Let (S, B) be a measurable space, (Ω, Σ, P) a probability space and
X : B → L⁰(P) a symmetric random measure in the sense of Section 4,
so that X(·) is σ-additive in probability and takes independent values on
disjoint sets of B. Suppose X(A) is a symmetric α-stable random variable,
0 < α ≤ 2, A ∈ B, so that its ch.f. φ : (A, t) ↦ E(e^{itX(A)}) is given by
φ(A, t) = e^{−C(A)|t|^α}, where C : B → ℝ̄⁺ is a "rate measure", a σ-finite positive
measure function on B. Let L⁰(C) be the Lebesgue space on (S, B, C). Then
C(A) = 0 ⟺ P[X(A) = 0] = 1, and C(·) is again called a control measure
of X(·). If f ∈ L^α(C) is simple, so f = Σ_{i=1}^n a_i χ_{A_i}, A_i ∈ B disjoint, a_i ∈ ℝ,
define

\[ \int_S f\, dX = \sum_{i=1}^{n} a_i\, X(A_i). \]

Verify that the integral is well-defined (show that it does not depend upon
the representation of f) and is α-stable. [Here the availability of the control
measure is needed.] Show that for each 0 < p < α ≤ 2 one has, for each simple f_n
in L^α(C),

\[ \Big( E\Big| \int_S f_n\, dX \Big|^p \Big)^{1/p} = k(p, \alpha) \Big( \int_S |f_n|^{\alpha}\, dC \Big)^{1/\alpha}, \]

for some constant k(p, α) > 0 depending only on p and α. [This nontrivial
fact was established independently by M. Schilder (1970) and in an equivalent
form slightly earlier by J. Bretagnolle, D. Dacunha-Castelle and J.L. Krivine
(1966).] If f ∈ L^α(C) is arbitrary, verify that there exist simple f_n ∈ L^α(C)
such that f_n → f in the α-norm, 1 ≤ α ≤ 2, and {∫_S f_n dX, n ≥ 1} is Cauchy in
L^p(P). If we set

\[ Y = \int_S f\, dX = \lim_{n} \int_S f_n\, dX, \]

verify that Y is an α-stable symmetric random variable such that ‖Y‖_{p,P}
and ‖f‖_{α,C} are equivalent. [This needs a careful computation. The work is
carried out using the "control measure C" as in Dunford-Schwartz (1958),
Section IV.10, or see Schilder (1970). The integral is analogous to Wiener's
original definition with S = ℝ, B = Borel σ-algebra of S, and C(·) as Lebesgue
measure, with p = α = 2 (Brownian motion). But the fact that C(·) [and
X(·)] may have atoms, and that X(·) need not be symmetric in general, introduces some
significant difficulties. The latter aspect was considered by Okazaki (1978),
where he also considered the case that f takes values in a topological vector
space (such as a Fréchet or Banach space). Thus the integration with stable
random measures (not Brownian motion) extends the work on stochastic calculus
significantly and nontrivially.]

15. Let {X_t, t ∈ ℝ} ⊂ L²(P) be a second-order process with mean zero
and continuous or measurable covariance r. It is said to be of class (KF) (after
the authors J. Kampé de Fériet and F.N. Frenkiel who introduced it) if

\[ \tilde r(h) = \lim_{T\to\infty} \frac{1}{T} \int_0^T r(s, s + h)\, ds \]

exists. (This class was also independently given by Yu. A. Rozanov and E.
Parzen, the latter under the name "asymptotic stationarity".) Show that r̃
is positive definite, and hence, by the Bochner-Riesz theorem, coincides a.e.
(Lebesgue) with the Fourier transform of a positive bounded nondecreasing F,
called the associated spectral function of the process. Verify that if X_t is real
valued and stationary then X_t belongs to class (KF), so that the stationary
processes are contained in class (KF). [It can be shown that if X_t = Y_t + Z_t,
where Z_t is a zero mean stationary process and Y_t a process with zero mean
and periodic covariance, i.e. r(s + k, t + k) = r(s, t) for some k, Y ⊥ Z (these
processes are known as periodically correlated), then X is in class (KF), but
is not stationary, so class (KF) contains nonstationary processes.] Show that
any strongly harmonizable process belongs to the class (KF). [It is also true
that some weakly harmonizable processes are contained in the class (KF), but
this proof is somewhat involved. In a slightly more general direction, a process
X_t which has covariance of the form

\[ r(s, t) = \int_{\mathbb{R}} \int_{\mathbb{R}} g(s, \lambda)\, \overline{g(t, \lambda')}\, F(d\lambda, d\lambda'), \tag{8} \]

where g(·, λ) is a uniformly almost periodic function relative to ℝ and
F(·, ·) has finite Vitali variation, is called almost strongly
harmonizable. The class of almost strongly harmonizable processes contains
the class of strongly harmonizable processes. This can be immediately seen by
setting g(t, λ) = e^{iλt}. Further, if the spectral bimeasure F(·, ·) concentrates
on the diagonal λ = λ′, the process will be termed almost stationary. The
first author showed (Rao (1978)) that every almost strongly harmonizable
process belongs to class (KF). These ideas were extended by the second author
(Swift (1997)) with the introduction of classes of nonstationary processes
with covariance representation (8) and g(·, λ) satisfying the following Cesàro
summability condition:

\[ \lim_{T\to\infty} \frac{1}{T^{p}} \int_0^T g(s + h, \lambda)\, \overline{g(s, \lambda')}\, ds \]

exists uniformly in h and is bounded for all h, p ≥ 1. These families are termed
(c, p)-summable Cramér, with p ≥ 1. It can be
shown that these nonstationary processes are contained in the class (KF, p)
processes, p ≥ 1, where a process is (KF, p) if its covariance r satisfies, for
p ≥ 1 and for each h ∈ ℝ, the following limit condition:

\[ \lim_{T\to\infty} \frac{1}{T^{p}} \int_0^T r(s, s + h)\, ds \quad \text{exists.} \]

16. Finally we describe briefly another direction of second order random
measures arising in some applications. Thus let (ℝ, B) be the Borelian line and
Z : B → L²(P), on a probability space (Ω, Σ, P), be σ-additive such that Z(·)
is additionally translation invariant in the sense that Z(A) = Z(τ_x A), x ∈ ℝ,
where τ_x A = {x + y : y ∈ A} ∈ B for all A ∈ B. Then m(A) = E(Z(A)) =
m(τ_x A), and b(A, B) = E(Z(A) Z̄(B)) = b(τ_x A, τ_x B). Since the only translation
invariant measure on ℝ [or ℝⁿ] is Lebesgue measure except for a constant
of proportionality factor, say a, we have m(A) = aμ(A), μ being Lebesgue
measure. Suppose that the (clearly) scalar measures b(·, B), b(A, ·) for each
A, B ∈ B are such that b(A, B) = b̃(A × B), where b̃ : B(ℝ²) → ℂ is a scalar
measure (σ-additive), so this gives b̃(· × B) = b(·, B) = b(B, ·) = b̃(B × ·),
b̃ being positive definite. [The general b(·, ·) is a bimeasure, and b̃ results from
the additional condition that b is translation invariant; this is also termed weak
stationarity of the random measure Z(·).]

Let K be the space of infinitely differentiable scalar functions on ℝ vanishing
outside of compact sets. Thus for every compact set A of B, there exists
f_n ∈ K such that f_n ↓ χ_A. This well-known result is from real analysis (cf. e.g.
Rao (1987, 2004), Proposition 1, p. 632). Then the preceding can be restated
as F(f) = ∫_ℝ f(t) Z(dt) ∈ L²(P) for each f ∈ K (the integral as in Equation
(8)), and show that m(A) = lim_{n→∞} ∫_ℝ f_n(t) dm(t), f_n ↓ χ_A, for compact
A ⊂ ℝ. Here F : K → L²(P) is called a generalized random process (and field
if ℝ is replaced by ℝᵏ, k ≥ 1). It is a classical result of K. Itô, in the mid-1950's,
that the positive definite functional β(f, g) = E(F(f) \overline{F(g)}) admits
an integral representation, for a unique positive measure ν : B → ℝ̄⁺ satisfying

\[ \int_{\mathbb{R}} \frac{d\nu(t)}{(1 + t^2)^{k}} < \infty \]

for some integer k ≥ 0 [such a measure ν is usually termed "tempered"], given
by

\[ \beta(f, g) = \int_{\mathbb{R}} \hat f(t)\, \overline{\hat g(t)}\, d\nu(t), \]

f̂, ĝ being the Fourier transforms of f, g ∈ K. Specializing this, show that for
bounded Borel sets A, B one has

\[ b(A, B) = \int_{\mathbb{R}} \hat\chi_A(t)\, \overline{\hat\chi_B(t)}\, d\nu(t). \]
This representation was essentially given in Thornett (1978) with applications,
and an n-dimensional version was discussed already in Vere-Jones (1974). Extensions
from the point of view of generalized random fields were obtained by
Rao (1969), and that of Thornett's work, for some general harmonizable and
Karhunen-Cramér random fields, was presented by Swift (2000b). It is noted
that the generalized random fields setting is the proper one for this work and
application. We merely draw the attention of the reader to the ideas described
here and omit further analysis and extensions, with the remark that Thornett's
work used some results of Argabright and Gil de Lamadrid (1974) on
Fourier transforms of unbounded measures which, as the latter authors indicated,
is a form of Schwartz distribution theory.

The broad classes of nonstationary processes briefly described in Problems
15 and 16, and in fact the last two sections above, give just a glimpse of the
breadth of the emerging study of processes related to the harmonizable class
and the broader Stochastic Analysis proper.
References

Argabright, L. and Gil de Lamadrid, J. (1974). Fourier analysis of unbounded


measures on locally compact Abelian groups. Memoirs of A m . Math. Soc. 145.

Barndorff-Nielsen, 0. (1964). On the limit distributions of the maximum of


a random number of independent random variables. Acta Math. ( H u n g . ) 15,
399-403.

Bhattacharya, R. N., and Rao, R. R. (1976). "Normal Approximation and


Asymptotic Expansions." Wiley, New York.

Billingsley, P. (1968). "Convergence of Probability Measures." Wiley, New


York.

Billingsley, P. (1995). "Probability and Measure.", 3rd. ed., Wiley, New York.

Blackwell, D., and Dubins, L. E. (1963). A converse to the dominated conver-


gence theorem. Illinois J. Math. 7, 508-514.

Bochner, S. (1955). "Harmonic Analysis and the Theory of Probability." Univ.


of California Press, Berkeley, Calif.

Bochner, S. (1975). A formal approach to stochastic stability. Z. Wahr. 31,


187-198.

Breiman, L. (1968). "Probability." Addison-Wesley, Reading, Mass.

Bruckner, A. M. (1971). Differentiation of integrals. Amer. Math. Monthly 78,

1-51. (Special issue, Nov., Part II.)

Cartan, H. (1963). "Elementary Theory of Analytic Functions of One or Sev-


eral Complex Variables." Addison-Wesley, Reading, Mass.

Chang, D. K. and Rao, M.M. (1986). Bimeasures and nonstationary processes.


in Real and Stochastic Analysis, 7-118, Wiley, New York.

Chen, Z., Rubin, H. and Vitale, A. (1997). Independence and Determination


of Probabilities. Proc. Amer. Math. Soc., 125, 3721-3723.

Chung, K.L. and Ornstein, D.S. (1962). On the recurrence of sums of random
variables. Bull. Amer. Math. Soc., 68, 30-32.

Chung, K. L. (1974). "A Course in Probability Theory," 2nd ed. Academic


Press, New York.

Ciesielski, Z. (1961). Hölder conditions for realizations of Gaussian processes.

Trans. Amer. Math. Soc. 99, 403-413.

Cramér, H. (1946). "Mathematical Methods of Statistics." Princeton Univ.
Press, Princeton, New Jersey.

Cramér, H. (1970). "Random Variables and Probability Distributions," 3rd
ed. Cambridge Univ. Press, New York.

Cramér, H. (1971). "Structural and Statistical Problems for a Class of Stochas-
tic Processes." Princeton Univ. Press, Princeton, N.J.

Csörgő, M., and Révész, P. (1981). "Strong Approximations in Probability
and Statistics." Academic Press, New York.

Cuzick, J., and Lai, T.L. (1980). On random Fourier series. Trans. Am. Math.
Soc. 261, 53-80.

DeGroot, M. H., and Rao, M. M. (1963). Stochastic give-and-take. J. Math.
Anal. Appl. 7, 489-498.

Dharmadhikari, S. W., and Sreehari, M. (1976). A note on stable character-


istic functions. Sankhyā Ser. A, 38, 179-185.

Di Crescenzo, A., Giorno, V., Nobile, A. G. (2003). On the M/M/1 Queue


with Catastrophes and its Continuous Approximation, Queueing Systems, 43,
329-347.

Donsker, M.D. (1951). An invariance principle for certain probability limit

theorems. Mem. Amer. Math. Soc. 6, 1-12.

Doob, J. L. (1953). "Stochastic Processes." Wiley, New York.



Dubins, L. E., and Savage, L. J. (1965). "How to Gamble If You Must."

McGraw-Hill, New York.

Dunford, N., and Schwartz, J. T. (1958). "Linear Operators. Part I: General


theory." Wiley-Interscience, New York.

Dvoretzky, A., Erdős, P., and Kakutani, S. (1961). Nonincreasing everywhere


of the Brownian motion process. Proc. Fourth Berkeley Symp. in Math. Statist.
Prob. 2, 103-116.

Dynkin, E. B. (1961). "Foundations of the Theory of Markov Processes" (En-


glish translation). Prentice-Hall, Englewood Cliffs, N.J.

Edgar, G. A., and Sucheston, L. (1976). Amarts: A class of asymptotic mar-


tingales. J. Multivar. Anal. 6, 193-221.

Edwards, W.F. (2004). "Dependent Probability Spaces." M.S. Thesis, Cali-


fornia State Polytechnic University, Pomona.

Eisenberg, B. and Ghosh, B.K. (1987). Independent events in a discrete uni-


form probability space. Amer. Stat. 41, 52-56.

Feller, W. (1943). The general form of the so-called law of the iterated loga-
rithm. Trans. Amer. Math. Soc. 54, 373-402.

Feller, W. (1957). "An Introduction to Probability Theory and its Applica-

tions," Vol. I (2nd ed.). Wiley, New York.

Feller, W. (1966). "An Introduction to Probability Theory and its Applica-

tions," Vol. II. Wiley, New York.

Gikhman, I. I., and Skorokhod, A. V. (1969). "Introduction to the Theory of


Random Processes" (English translation). Saunders. Philadelphia.

Gnedenko, B. V., and Kolmogorov, A. N. (1954). "Limit Distributions for

Sums of Independent Random Variables" (English translation). Addison-
Wesley, Reading, Mass.

Green, M. L. (2004). The Immigration-Emigration with Catastrophe Model, in

Stochastic Processes and Functional Analysis, A Volume of Recent Advances
in Honor of M.M. Rao, A. C. Krinik, R. J. Swift, Eds., Vol. 238 in the Lecture
Notes in Pure and Applied Mathematics Series, Marcel Dekker, New York,
149-159.

Gundy, R. F. (1966). Martingale theory and pointwise convergence of certain


orthogonal series. Trans. Amer. Math. Soc. 124, 228-248.

Haldane, J. B. S. (1957). The syadvada system of prediction. Sankhyā 18,


195-200.

Hall, P., and Heyde, C. C. (1980). "Martingale Limit Theory and its Appli-
cation." Academic Press. New York.

Halmos, P.R. (1950). "Measure Theory." Van Nostrand, Princeton, N.J.

Hardy, G. H., Littlewood, J. E., and Polya, G. (1934). "Inequalities." Cam-


bridge Univ. Press, London.

Hayes, C. A., and Pauc, C. Y. (1970). "Derivation and Martingales." Springer-


Verlag, Berlin.

Hewitt, E., and Savage, L. J. (1955). Symmetric measures on cartesian prod-


ucts. Trans. Amer. Math. Soc. 80, 470-501.

Hida, T . (1980). "Brownian Motion." Springer-Verlag, Berlin.

Hosoya, Y. (1982). Harmonizable stable processes. Z. Wahrs. 60, 517-533.

Hsu, P. L., and Robbins, H. (1947). Complete convergence and the law of
large numbers. Proc. Nat. Acad. Sci. 33, 25-31.

Ibragimov, I.A., and Linnik, Ju. V. (1971). "Independent and Stationary Ran-
dom Variables." Noordhoff Publishers, The Netherlands.

Ionescu Tulcea, C. (1949). Mesures dans les espaces produits. Atti Accad. Naz.
Lincei Rend. Cl. Sci. Fis. Mat. Natur. 7, 208-211.

Ionescu Tulcea, A., and Ionescu Tulcea, C. (1969). "Topics in the Theory of
Lifting." Springer-Verlag, Berlin.

Jessen, B., and Wintner, A. (1935). Distribution functions and the Riemann
zeta function. Trans. Amer. Math. Soc. 38, 48-88.

Kac, M., and Slepian, D. (1959). Large excursions of Gaussian processes. Ann.
Math. Statist. 30, 1215-1228.

Kahane, J. P. (1968). (2nd ed. 1985). "Some Random Series of Functions."


Cambridge University Press, Cambridge, U.K.

Kakutani, S. (1951). Random ergodic theorems and Markoff processes with a


stable distribution. Proc. Second Berkeley Symp. in Math. Statist. and Prob.,
247-261.

Kendall, D. G. (1959). Unitary dilations of Markov transition operators, and


the corresponding integral representations for transition-probability matri-
ces, in "Probability and statistics" (The Harald Cram& volume). Wiley, New
York, 139-161.

Kolmogorov, A. N. (1933). "Grundbegriffe der Wahrscheinlichkeitsrechnung."


Springer-Verlag, Berlin.

Kuelbs, J. (1973). A representation theorem for symmetric stable processes


and stable measures on compact groups. Z. Wahrs. 26, 259-271.

Kumar, B. K., Arivudainambi, D. (2000). Transient Solution of an M/M/1

Queue with Catastrophes. Computers and Mathematics with Applications, 40,
1233-1240.

Lai, T.L. (1974). Summability methods for i.i.d. Random Variables. Proc.
Amer. Math. Soc. 4 5 , 253-261.

Lamb, C. W. (1974). Representation of functions as limits of martingales.


Trans. Amer. Math. Soc. 188, 395-405.

Lévy, P. (1937). (2nd ed. 1954). "Théorie de l'addition des variables aléatoires."
Gauthier-Villars, Paris.

Lévy, P. (1948). "Processus stochastiques et mouvement brownien." Gauthier-
Villars, Paris.

Linde, W. (1983). "Probability in Banach Spaces - Stable and Infinitely Di-
visible Distributions." Wiley-Interscience, New York.

Loève, M. (1963). "Probability Theory" (3rd ed.). Van Nostrand, Princeton,
N.J.

Ly, S.K. (2004). "Birth-Death Processes with Polynomial Transition Rates."

M.S. Thesis, California State Polytechnic University, Pomona.

McKean, H. P., Jr. (1969). "Stochastic Integrals." Academic Press, New York.

McLeish, D. L. (1974). Dependent central limit theorems and invariance prin-

ciples. Ann. Prob. 2, 620-628.

Mahalanobis, P. C. (1954). The foundations of statistics. Dialectica 8, 95-111.

(Reprinted in Sankhyā 18 (1957), 183-194.)

Maistrov, L. E. (1974). "Probability Theory: A Historical Sketch" (Transla-


tion). Academic Press. New York.

Marcus, M. B. and Pisier, G. (1984). Characterizations of almost surely con-

tinuous α-stable random Fourier series and strongly stationary processes. Acta
Math. 152, 245-301.

Marcus, M. B. (1987). ξ-radial processes and random Fourier series. Memoirs

of the American Mathematical Society, Vol. 68, 1-181.

Meyer, P. A. (1966). "Probability and Potentials." Blaisdell, Waltham,


Mass.

Mooney, D.D. , and Swift, R.J. (1999). "A Course in Mathematical Model-
ing." The Math. Assoc. of America, Washington D.C.

Morse, M. and Transue, W. (1956). Bimeasures and their integral extensions.
Ann. Math. 64, 480-504.

Neal, D.K. and Swift, R.J. (1999). Designing Payoffs for Some Probabilistic
Gambling Games. Miss. J. Math. Sci., 11, 93-102.

Neuts, M. F. (1973). "Probability." Allyn & Bacon, Boston, Mass.

Neveu, J. (1965). "Mathematical Foundations of the Calculus of Probability."


Holden-Day, Inc. San Francisco, CA.

Okazaki, Y. (1979). Wiener integral by stable random measure. Fac. Sci.,

Kyushu Univ., A, Math. 33, 1-70.

Paley, R.E.A.C. and Wiener, N. (1934). "Fourier Transforms in the Complex


Domain." Amer. Math. Soc., Providence, R.I.

Parthasarathy, K. R. (1967). "Probability Measures on Metric spaces." Aca-


demic Press, New York.

Parzen, E. (1962). On the estimation of a probability density and mode. Ann.


Math. Statist. 33, 1065-1076.

Pierre, P. A. (1971). Infinitely divisible distributions, conditions for indepen-

dence, and central limit theorems. J. Math. Anal. Appl. 33, 341-354.

Prokhorov, Yu. V. (1956). Convergence of random processes and limit theo-


rems in probability theory. Teor. Veroyat. Primenen. 1, 157-214.

Ramachandran, B., and Rao, C. R. (1968). Some results on characteristic

functions and characterizations of the normal and generalized stable laws.
Sankhyā, Ser. A, 30, 125-140.

Ramaswamy, V., Balasubramanian, K., and Ramachandran, B. (1976). The

stable laws revisited. Sankhyā, Ser. A, 38, 300-303.

Rao, M. M. (1961). Consistency and limit distributions of estimators of pa-
rameters in explosive stochastic difference equations. Ann. Math. Statist. 32,
195-218.

Rao, M. M. (1962). Theory of order statistics. Math. Ann. 147, 298-312.

Rao, M. M. (1969). Representation theory of multidimensional generalized ran-
dom fields. Proc. Symp. Multivariate Analysis, 2, Acad. Press, New York,
411-435.

Rao, M. M. (1982). Harmonizable processes: Structure theory. L'Enseign.
Math. (2nd Series) 28, 295-351.

Rao, M. M. (1979). "Stochastic Processes and Integration." Sijthoff and No-
ordhoff, Alphen aan den Rijn, The Netherlands.

Rao, M. M. (1981). "Foundations of Stochastic Analysis." Academic Press,
New York.

Rao, M. M. (1987). "Measure Theory and Integration." Wiley Interscience,
New York. Second Edition, Enlarged, Marcel Dekker, Inc., New York. (2004).

Rao, M. M. (2000). "Stochastic Processes: Inference Theory." Kluwer Aca-
demic Publishers, Dordrecht, The Netherlands.

Rao, M. M. (2004a). Convolutions of vector fields - III: Amenability and
spectral properties, in "Real and Stochastic Analysis: New Perspectives,"
Birkhäuser, Boston, MA, 375-401.

Rknyi, A. (1953). On the theory of order statistics. Acta Math. (Hung.) 4,


191-232.

R&iyi, A. (1960). On the central liinit theorem for the sum of a random num-
ber of independent random variables. Acta Math. (Hung.) 11, 97-102.

Royden, H. L. (1968). "Real Analysis" (2nd ed.). Macmillan, New York.

Rozanov, Yu. A. (1967). "Stationary Random Processes" (English translation). Holden-Day, San Francisco.

Samorodnitsky, G. and Taqqu, M.S. (1994). "Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance." Chapman & Hall, London, UK.

Schilder, M. (1970). Some structure theorems for the symmetric stable laws. Ann. Math. Statist. 41, 412-421.

Shiflett, R.C. and Schultz, H.S. (1979). An Approach to Independent Sets. Math. Spec. 12, 11-16.

Shohat, J. and Tamarkin, J. D. (1950). "The Problem of Moments" (2nd ed.). Amer. Math. Soc., Providence, R.I.

Sion, M. (1968). "Introduction to the Methods of Real Analysis." Holt, Rinehart and Winston, New York.

Skorokhod, A. V. (1965). "Studies in the Theory of Random Processes" (English translation). Addison-Wesley, Reading, Massachusetts.

Spitzer, F. (1956). A combinatorial lemma and its application to probability theory. Trans. Amer. Math. Soc. 82, 323-339.

Stromberg, K. (1994). "Probability for Analysts." Chapman & Hall, CRC Press.

Sudderth, W. D. (1971). A "Fatou equation" for randomly stopped variables. Ann. Math. Statist. 42, 2143-2146.

Swift, R.J. (1997). Some Aspects of Harmonizable Processes and Fields, in "Real and Stochastic Analysis: Recent Advances," Ed. M.M. Rao, CRC Press, 303-365.

Swift, R.J. (2000a). A Simple Immigration-Catastrophe Process. The Math. Sci. 25, 32-36.

Swift, R.J. (2000b). Nonstationary Random Measures. Far East Journal of Th. Stat. 4, 193-206.

Swift, R.J. (2001). Transient Probabilities for a Simple Birth-Death-Immigration Process Under the Influence of Total Catastrophe. Int. J. Math. Math. Sci. 25, 689-692.

Switkes, J., Wirkus, S., Swift, R.J., and Mihaila, I. (2003). On the Means of Deterministic and Stochastic Populations. The Math. Sci. 28, 91-98.

Thornett, M.L. (1978). A Class of Second-Order Stationary Random Measures. Stochastic Processes Appl. 8, 323-334.

Tjur, T. (1974). "Conditional Probability Distributions." Inst. Math. Statist., Univ. of Copenhagen, Denmark.

Tucker, H. G. (1967). "A Graduate Course in Probability." Academic Press, New York.

Urbanik, K. (1968). Random measures and harmonizable sequences. Studia Math. 31, 61-88.

Vere-Jones, D. (1974). An Elementary Approach to Spectral Theory of Stationary Random Measures, in "Stochastic Geometry," E.A. Harding and D.G. Kendall, Eds., Wiley, New York, 307-321.

Wald, A. (1947). "Sequential Analysis." Wiley, New York.

Weron, A. (1985). Harmonizable stable processes on groups: Spectral, ergodic, and interpolation properties. Z. Wahrs. 68, 473-491.

Wiener, N. (1923). Differential space. J. Math. Phys. (M.I.T.) 2, 131-174.

Zaanen, A. C. (1967). "Integration" (2nd ed.). North-Holland, Amsterdam, The Netherlands.

Zolotarev, V.M. (1986). "One-dimensional Stable Distributions." Amer. Math. Soc. Transl. Monograph, Vol. 65, Providence, R.I.

Zygmund, A. (1959). "Trigonometric Series." Cambridge University Press, Cambridge, UK.
Author Index

Alexandroff, A.D., 277
Andersen, E.S., 182, 185, 201, 405
André, D., 357
Anscombe, F.J., 436
Arivudainambi, D., 504
Austin, D.G., 214
Bachelier, L., 459
Bahadur, R.R., 205
Barndorff-Nielsen, O., 456
Bawly, G., 320
Bergström, H., 296
Berk, K.N., 408
Bernoulli, J., 58
Bernstein, S., 33
Berry, A.C., 296
Bhattacharya, R.N., 302
Billingsley, P., 345, 351, 478, 481
Bingham, M.S., 274
Birkhoff, G.D., 103
Bishop, E., ix, 28, 137
Blackwell, D., 109, 203, 208
Bochner, S., 73, 169, 185, 256, 341, 396, 490
Borel, E., 4, 58, 94
Borwein, J.M., 95
Bourbaki, N., 47
Bray, H.E., 225
Bretagnolle, J., 505
Brouwer, L.E.J., ix
Brown, R., 459, 465
Bruckner, A.M., 131, 135
Burkholder, D.L., 206
Carleson, L., 220
Cartan, H., 249
Čebyšev, P.L., 58
Chang, D.K., 497
Chang, I., 504
Chen, Z., 88
Choquet, G., 218
Chow, Y.S., 426
Chung, K.L., 41, 81, 85, 98, 101, 145, 250, 265, 379
Ciesielski, Z., 460
Cormack, A.M., 268
Cramér, H., 96, 239, 251, 253, 257, 260, 266, 296, 369, 449, 497
Csörgő, M., 364
Dacunha-Castelle, D., 505
de Finetti, B., 5, 306
de Moivre, A., 292
DeGroot, M.H., 209
Deny, J., 218
Devinatz, A., 289
Dharmadhikari, S.W., 395
Di Crescenzo, A., 504
Doeblin, W., 140
Donsker, M.D., 342, 347
Doob, J.L., 8, 30, 126, 127, 140, 178, 182, 186, 215, 390, 420, 422
Dubins, L.E., 109
Dunford, N., 505
Dvoretzky, A., 465
Dynkin, E.B., 8, 10
Edgar, G.A., 456
Edwards, W.F., 87
Egorov, D.F., 50
Einstein, A., 459
Eisenberg, B., 87
Erdős, P., 341, 406
Esscher, F., 369
Esseen, C.G., 296
Fefferman, C., 220
Feller, W., 44, 78, 140, 147, 212, 293, 325, 332, 369, 375, 399, 406, 419, 478, 481
Fisher, R.A., 205
Fisk, D.L., 201
Frenkiel, F.N., 505
Fuchs, W.H.J., 81, 101
Garsia, A.M., 451
Ghosh, B.K., 87
Gikhman, I.I., 351
Giorno, V., 504
Gnedenko, B.V., 285, 308, 312, 322, 326, 341
Green, M.L., 504
Gundy, R.F., 219
Hájek, J., 97, 178
Haldane, J.B., 4
Hall, P., 436
Halmos, P.R., 177, 205
Hardy, G.H., 17, 364
Hartman, P., 374
Hausdorff, F., 364
Hayes, C.A., 131, 135
Helly, E., 224, 225
Hewitt, E., 44
Heyde, C.C., 436
Hida, T., 470
Hoeffding, W., 197
Horowitz, J., 426
Hosoya, Y., 490, 491
Hsu, P.L., 100, 287
Hunt, G.A., 140, 404
Hunt, R., 220
Itô, K., 507
Jerison, M., 216
Jessen, B., 91, 158, 159, 163, 182, 185, 201
Kac, M., 133, 341, 404
Kahane, J.P., 58, 489
Kakutani, S., 457
Kampé de Fériet, J., 505
Kendall, D.G., 212
Khintchine, A., 59, 103, 242, 283, 292, 306, 328, 336, 365, 389, 397, 399, 467
Kibble, W.F., 387
Kolmogorov, A.N., 4, 5, 53, 63, 64, 77, 96, 140, 153, 158, 159, 169, 173, 185, 306, 341, 358, 363-365
Krinik, A.C., 504
Krivine, J.L., 505
Kuelbs, J., 491
Kumar, B.K., 504
Lamb, C.W., 220
Laplace, P.S., 292
Lévy, P., 90, 93, 99, 103, 140, 147, 240, 289, 293, 306, 331, 335, 336, 341, 390, 391, 395, 397, 466, 470, 483, 485, 494, 499
Liapounov, A., 4, 293
Linde, W., 396
Lindeberg, J.W., 292, 293, 325
Lindley, D.V., 77
Littlewood, J.E., 17, 364
Loève, M., 249, 436, 494
Ly, S.K., 502
Mahalanobis, P.C., 4
Maistrov, L.E., 4
Mann, H.B., 95, 347
Marcinkiewicz, J., 215, 289
Marcus, M.B., 489, 492
Markov, A.A., 59, 103, 140
Marshall, A.W., 97
Mathias, M., 257
McKean, H.P., 467, 470
McLeish, D.L., 436, 455
Mercer, T., 471
Meyer, P.A., 24
Mihaila, I., 501
Mooney, D.D., 501
Morse, M., 495, 497
Neal, D.K., 95
Neuts, M.F., 4
Neyman, J., 205
Nobile, A.G., 504
Okazaki, Y., 487
Orey, S., 407
Ornstein, D.S., 85
Ottaviani, M.G., 97
Paley, R., 90, 489
Parthasarathy, K.R., 207, 274, 351
Parzen, E., 72, 505
Pauc, C.Y., 131, 135
Phillips, R.S., 490
Pierre, P.A., 388, 393
Pisier, G., 489, 492
Poincaré, H., 449
Poisson, S.D., 303
Pólya, G., 17, 101, 245, 264
Pratt, J.W., 31
Prékopa, A., 478
Prokhorov, Yu. V., 342, 351, 358, 363
Radon, J., 268
Raikov, D.A., 289
Rajchman, A., 61
Ramachandran, B., 397
Ramaswamy, V., 341
Rao, C.R., 203, 397
Rao, R.R., 72, 302
Rényi, A., 4, 5, 97, 136, 178, 358, 436, 449, 456
Révész, P., 364
Revuz, D., 470
Riesz, F., 49, 218, 260
Robbins, H., 100, 287
Royden, H.L., 13, 90
Rozanov, Yu. A., 493, 505
Rubin, H., 88
Salem, R., 489
Savage, L.J., 5, 44, 205
Sazonov, V.V., 302
Schatte, P., 379
Schilder, M., 487, 491, 505
Schultz, H.S., 87
Schwartz, J.T., 505
Shanks, D., 95
Shepp, L.A., 499
Shiflett, R.C., 87
Shohat, J., 234
Sion, M., 13
Skorokhod, A.V., 32, 89, 337, 351, 358
Slepian, D., 133
Slutsky, E.E., 96
Smirnov, N.V., 341, 363
Spitzer, F., 81, 401
Sreehari, M., 395
Steinhaus, H., 284
Strassen, V., 363, 374
Stromberg, K.R., 281
Sucheston, L., 456
Switkes, J.M., 501
Tamarkin, J.D., 234
Teicher, H., x, 366
Tjur, T., 136
Transue, W., 495, 497
Tucker, H., 249
Tulcea, I., 136, 164, 165, 171, 189
Ulam, S.M., 456
Urbanik, K., 490, 491
Vitale, R.A., 88
Voiculescu, D., 341
von Mises, R., 4
von Neumann, J., 456
von Smoluchovski, M., 459
Wald, A., 95, 347, 415
Wendel, J.G., 401
Weron, A., 492
Weyl, H., 95
White, J.S., 383
Wiener, N., 90, 459, 465
Wintner, A., 91, 374
Wirkus, S.A., 501
Wold, H., 266
Wrench, J.W., 95
Yaglom, A.M., 493
Yor, M., 470
Zaanen, A.C., 20
Zolotarev, V.M., 296, 327
Zygmund, A., 484, 485, 489
Subject Index

V-bounded, 490
λ-class, 10
π-class, 10
p-norm, 14
Čebyšev's inequality, 19

Accompanying laws, 321
Adapted, 414
Adapted sequence, 174
Adjunction, 40
Algebra, 5
Amart, 456
Asymptotically constant, 318
Atoms, 207
Bayes formula, 208
Bernoulli trial, 7
Best (nonlinear) predictor, 203
Bimeasure, 494
Birkhoff's Ergodic Theorem, 452
Bootstrap method, 71
Borel space, 207
Borel-Cantelli lemma, 41
Bounded in probability, 96
Bounded stopping, 415
Brownian motion, 91, 343, 344, 459
Cauchy-Buniakowski-Schwarz inequality, 16
CBS inequality, 16
Central limit theorem, 292
Cesàro summability, 63, 506
Chapman-Kolmogorov equation, 146
Character, 272
Characteristic exponent, 335, 337, 484
Characteristic function, 22
Characteristic functions, 234
Coefficient of variation, 284
Cofinal, 132
Compatibility conditions, 159
Condition of Lindeberg, 325
Conditional
  dominated convergence, 107
  expectation, 104
  expectation (elementary), 104
  Fatou's lemma, 107
  monotone convergence, 107
  probability (elementary), 104
  probability function, 105
  Vitali convergence, 107
Conditional expectation operator, 106
  averaging property, 107
  commutativity property, 107
  contractive property, 107
  faithful, 107
Conditionally independent, 137
Continuity set, 266, 345
Continuity theorem for ch.f.s, 240
Control measure, 504
Convergence
  a.e., 46
  complete, 100
  in distribution, 46
  in mean, 46
  in probability, 46
  P-uniformly, 50
  unconditional, 57
  vague, 47
  weak-star, 47
Convex, 13
Convolution, 235
Correlation, 18
Correlation characteristic, 152
Covariance, 18
Cumulant function, 281
Cylinder sets, 160
de la Vallée Poussin criterion, 24
Debut, 414
Density, 27
Dependent
  symmetrically, 44
Dependent probability space, 87
Differentiation basis, 132
Directed set, 160
Disjunctification, 51
Distinguished logarithm, 251, 305
Distribution
  Bernoulli, 236, 481
  beta, 210
  binomial, 236, 305
  bivariate normal, 206
  Cauchy, 236, 306, 387
  degenerate, 236
  Dirichlet, 211
  exponential, 98, 477, 478, 500
  exponential family, 206
  gamma, 98, 236, 477
  Gaussian, 159, 235
  geometric, 315
  log-normal, 277, 283
  multinomial, 101, 160, 305
  noncentral chi-square, 284, 473
  normal, 235, 481
  Pareto, 285, 328
  Poisson, 235, 476, 478, 481
  steady-state, 501
  Student's, 282
  symmetric stable, 331
  uniform, 210, 236, 305, 500
  unitary, 236
  Wishart, 392
Distribution function, 8
Dominated convergence theorem, 13
Doob decomposition, 189
Doob-Dynkin lemma, 8
Doob-Meyer decomposition, 189
Empiric
  distribution, 69
Equicontinuous, 279
Ergodic, 68, 133, 454
Esscher transformation, 369
Events prior, 411
Evolution equation, 151
Excessive function, 217
Exchangeable sequence, 197
Expectation, 12
Expected value, 12
Experiment, 5
Factorization criterion, 205
Fair game, 175
Fatou's lemma, 13
Favorable game, 175
Fréchet variation, 494
Franklin functions, 461
Fubini-Stone theorem, 19
Functional central limit theorem, 342
Fundamental identity of sequential analysis, 428
Fundamental law of probability, 21
Gamma function, 502
Gaussian component, 390
Generalized Bawly's theorem, 327
Generalized hypergeometric function, 502
Generalized Poisson component, 390
Generalized random process, 507
Glivenko-Cantelli theorem, 69
Hölder's inequality, 14
  conditional, 116
Haar function, 460
Harmonic function, 217
Harmonizable
  almost, 506
  Loève, 494
  strictly, 490
  strongly, 494
  weakly, 494
Helly's selection principle, 224
Helly-Bray theorem, 225
Herglotz lemma, 269
Hurewicz-Oxtoby ergodic theorem, 217
Image measure, 21
Incomplete gamma function, 504
Independence, 33
  mutually, 34
  pairwise, 34
  probabilistic, 35
  statistical, 35
  stochastic, 35
Indicator function, 28
Infinitely divisible, 304
Infinitely often, 41
Infinitesimal, 318
Irreducible, 212
Jaina, 3
Jensen's inequality, 17
  conditional, 116
Joint distribution function, 38
Kac-Slepian paradox, 402
Kernel, 73
  reciprocal, 473
Kolmogorov
  function, 312
Kolmogorov's
  inequality, 51
Kolmogorov-Bochner theorem, 40, 171
Kronecker's lemma, 63
Lévy class, 397
Lévy inequalities, 99
Lévy measure, 312
Lévy metric, 229
Lévy spectral set, 312
Lévy's inversion formula, 237
Lévy-Khintchine representation, 306
Latticial class, 10
Law of the iterated logarithm, 365
Least squares principle, 376
Lebesgue decomposition theorem, 20
Lebesgue space, 28
Liapounov inequality, 15
Liapounov's theorem, 293
Likelihood ratio, 199
Lindeberg-Feller theorem, 325
m-dependent, 378
Mahabharata, 4
Markov chain, 146
  finite, 146
Markov kernel, 217
Markov time, 411
Markov's inequality, 18
Markovian, 141
  family, 141
  strict-sense, 152
Martingale, 174
  asymptotic, 456
  difference, 175
  Jordan-type decomposition, 177
Martingale difference sequence, 420
Maximal Ergodic Theorem, 451
Median, 96
Memoryless property, 500
Metrically transitive, 454
Minimal sufficient σ-algebra, 205
Minimally positive definite, 397
Minkowski's inequality, 14
  conditional, 116
Mixture, 32
Moment generating function, 278
Moment problem, 234
Monotone class, 10
Monotone convergence theorem, 13
Natural base, 174
Nonparametric statistics, 197
Norming constants, 332
Optional, 411
Optional sampling theorem, 422
Order statistics, 88, 153
Path reversible, 212
Permutable sequence, 197
Pochhammer notation, 502
Poincaré's formula, 28
Polish space, 173
Positive subinvariant measure, 212
Possible value, 81
Post-Widder formula, 98
Potential, 218
Predictable, 420
Predictable transform, 213, 420
Probability generating function, 499
Probability space, 6
Process
  (c,p)-summable Cramér, 506
  ξ-radial, 492
  kth order, 211
  *, 201
  additive, 471
  birth-death, 477
  catastrophe, 503
  class (KF, p), 506
  class (KF), 505
  counting, 475
  ergodic, 454
  explosive, 406
  F, 201
  general birth-death, 501
  generalized, 507
  Lévy, 485
  linear birth, 500
  Markov, 141
  multiple Markov, 211
  Ornstein-Uhlenbeck, 498
  Poisson, 476
  pure birth, 500
  stable, 406
  stopped, 420
  stopping time, 411
  unstable, 406
  wide sense Markov, 152
  Wiener, 343, 344
Product class, 10
Productlike measure, 165
  generalized, 168
Quasi-martingale, 201
Queueing, 77, 213, 477, 502
Rademacher function, 455
Radon's theorem, 268
Radon-Nikodým Theorem, 20
Random ergodic theorem, 456
Random field
  Markov, 141
Random measure, 483
Random process, 158
Random sample, 153
Random variable, 7
  abstract, 9
  complex, 10, 22
  generalized, 9
  lattice type, 289
  multidimensional, 9
Random walk, 77, 81
  persistent, 84
  recurrent, 84
  transient, 84
Recurrent point, 81
Reflection principle, 357
Regular conditional distribution, 125
Regular conditional probability, 122
Renewal, 286
Renewal theorem, 419
Resampling, 71
Riemann zeta function, 389
Sample function, 344
Sample mean, 280
Sample variance, 280
Schauder functions, 461
Scheffé's lemma, 26
Semicontinuous function, 231
Shift transformation, 457
Signum function, 238
Simple stopping, 415
Single server queueing problem, 77
SLLN, 61, 63, 64, 72
Spectral density, 493
Spectral function, 493
Spectral measure, 488
Spitzer's identity, 401
St. Petersburg paradox, 95
Stable, 331
Stable family, 483
Stable function, 396
Stable random measures, 483
Stable type laws, 332
Standard deviation, 18
Star condition, 201
State space, 146
Stationary
  almost, 506
  asymptotic, 505
  Khintchine sense, 493
  strictly, 454
  weakly, 282, 493
  wide sense, 493
Stationary transition probabilities, 212
Steady-state distribution, 501
Stochastic base, 174
Stochastic process, 158
Stochastically continuous, 474
Stopping time, 411
Strong law of large numbers (SLLN), 61
Strongly measurable, 490
Strongly mixing, 437
Strongly stationary α-stable, 488
Sub-Markov kernel, 217
Subharmonic function, 217
Subinvariant function, 217
Submartingale, 174
Sufficient statistic, 204, 205
Sufficient subfield, 204
Superharmonic function, 217
Supermartingale, 174
syādvāda, 3
Symmetric event, 45
Symmetrically dependent sequence, 197
Tail σ-algebra, 39
Three series theorem, 56
Tightness conditions, 364
Transition probability, 150
U-statistics, 197
Unfavorable game, 175
Uniformly integrable, 23
  conditionally, 114
Variance, 18
Variation
  Fréchet, 494
  Vitali, 494
Vitali variation, 494
Vitali's theorem, 24
Weak law of large numbers (WLLN), 58
Weakly sequentially compact, 225
Wiener measure, 344
WLLN, 58, 59, 61, 62, 95