Probability and
Random Processes
with Applications
to Signal Processing

Third Edition

Henry Stark
Illinois Institute of Technology

John W. Woods
Rensselaer Polytechnic Institute

Pearson Education International

If you purchased this book within the United States or Canada
you should be aware that it has been wrongfully imported
without the approval of the Publisher or the Author.
Vice President and Editorial Director, ECS: Marcia J. Horton
Vice President and Director of Production and Manufacturing, ESM: David W. Riccardi
Publisher: Tom Robbins
Acquisitions Editor: Eric Frank
Associate Editor: Alice Dworkin
Executive Managing Editor: Vince O'Brien
Managing Editor: David A. George
Production Editor: Scott Disanno
Director of Creative Services: Paul Belfanti
Creative Director: Carole Anson
Art Director: Jayne Conte
Art Editor: Adam Velthaus
Manufacturing Manager: Trudy Pisciotti
Manufacturing Buyer: Lisa McDowell
Senior Marketing Manager: Holly Stark
Editorial Assistant: Jody McDonnell
© 2002 Prentice-Hall, Inc.
Upper Saddle River, New Jersey 07458
All rights reserved. No part of this book may be reproduced, in any form
or by any means, without permission in writing from the publisher.

The author and publisher of this book have used their best efforts in preparing this book. These
efforts include the development, research, and testing of the theories and programs to determine their
effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to
these programs or the documentation contained in this book. The author and publisher shall not be liable
in any event for incidental or consequential damages in connection with, or arising out of, the furnishing,
performance, or use of these programs.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
ISBN 0-13-178457-9
Pearson Education Ltd.
Pearson Education Australia Pty. Limited
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd.
Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education — Japan
Pearson Education Malaysia, Pte. Ltd.
Pearson Education, Upper Saddle River, New Jersey

To my father P.D. Stark (in memoriam)
From darkness to light (1941-1945).
Henry Stark

To Harriet.
John W. Woods

Contents
Preface xiii

1 Introduction to Probability
1.1 Introduction: Why Study Probability?
1.2 The Different Kinds of Probability
A. Probability as Intuition
B. Probability as the Ratio of Favorable to Total Outcomes (Classical Theory)
C. Probability as a Measure of Frequency of Occurrence
D. Probability Based on an Axiomatic Theory
1.3 Misuses, Miscalculations, and Paradoxes in Probability
1.4 Sets, Fields, and Events
Examples of Sample Spaces
1.5 Axiomatic Definition of Probability
1.6 Joint, Conditional, and Total Probabilities; Independence
1.7 Bayes' Theorem and Applications
1.8 Combinatorics
Occupancy Problems
Extensions and Applications
1.9 Bernoulli Trials—Binomial and Multinomial Probability Laws
Multinomial Probability Law
1.10 Asymptotic Behavior of the Binomial Law: The Poisson Law
1.11 Normal Approximation to the Binomial Law
1.12 Summary
Problems
References

2 Random Variables 58
2.1 Introduction 58
2.2 Definition of a Random Variable 59
2.3 Probability Distribution Function 62
2.4 Probability Density Function (PDF) 66
Four Other Common Density Functions 71
More Advanced Density Functions 74
2.5 Continuous, Discrete, and Mixed Random Variables 76
Examples of Probability Mass Functions 77
2.6 Conditional and Joint Distributions and Densities 80
2.7 Failure Rates 105
2.8 Summary 108
Problems 109
References 115
Additional Reading 115

3 Functions of Random Variables 116
3.1 Introduction 116
Functions of a Random Variable (Several Views) 119
3.2 Solving Problems of the Type Y = g(X) 120
General Formula of Determining the pdf of Y = g(X) 129
3.3 Solving Problems of the Type Z = g(X,Y) 134
3.4 Solving Problems of the Type V = g(X,Y), W = h(X,Y) 152
Fundamental Problem 152
Obtaining f_VW Directly from f_XY 154
3.5 Additional Examples 157
3.6 Summary 161
Problems 162
References 168
Additional Reading 168

4 Expectation and Introduction to Estimation 169
4.1 Expected Value of a Random Variable 169
On the Validity of Equation 4.1-8 172
4.2 Conditional Expectations 183
Conditional Expectation as a Random Variable 190
4.3 Moments 192
Joint Moments 196
Properties of Uncorrelated Random Variables 198
Jointly Gaussian Random Variables 201
Contours of Constant Density of the Joint Gaussian pdf 203
4.4 Chebyshev and Schwarz Inequalities 205
Random Variables with Nonnegative Values 207
The Schwarz Inequality 208
4.5 Moment-Generating Functions 211
4.6 Chernoff Bound 214
4.7 Characteristic Functions 216
Joint Characteristic Functions 222
The Central Limit Theorem 225
4.8 Estimators for the Mean and Variance of the Normal Law 230
Confidence Intervals for the Mean 231
Confidence Interval for the Variance 234
49 Summary 236
Problems 237
References 243
Additional Reading 243
5 Random Vectors and Parameter Estimation 244
5.1 Joint Distribution and Densities 244
5.2 Multiple Transformation of Random Variables 248
5.3 Expectation Vectors and Covariance Matrices 251
5.4 Properties of Covariance Matrices 254
5.5 Simultaneous Diagonalization of Two Covariance Matrices and
Applications in Pattern Recognition 259
Projection 262
Maximization of Quadratic Forms 263
5.6 The Multidimensional Gaussian Law 269
5.7 Characteristic Functions of Random Vectors 277
The Characteristic Function of the Normal Law 280
5.8 Parameter Estimation 282
Estimation of E[X] 284
5.9 Estimation of Vector Means and Covariance Matrices 286
Estimation of μ 286
Estimation of the Covariance K 287
5.10 Maximum Likelihood Estimators 290
5.11 Linear Estimation of Vector Parameters 294
5.12 Summary 297
Problems 298
References 303
Additional Reading 303
6 Random Sequences 304
6.1 Basic Concepts 304
Infinite-Length Bernoulli Trials 310
Continuity of Probability Measure 315
Statistical Specification of a Random Sequence 317
6.2 Basic Principles of Discrete-Time Linear Systems 334
6.3 Random Sequences and Linear Systems 340
6.4 WSS Random Sequences 348
Power Spectral Density 351
Interpretation of the PSD 352
Synthesis of Random Sequences and Discrete-Time Simulation 355
Decimation 358
Interpolation 359
6.5 Markov Random Sequences 362
ARMA Models 365
Markov Chains 366
6.6 Vector Random Sequences and State Equations 372
6.7 Convergence of Random Sequences 375
6.8 Laws of Large Numbers 383
6.9 Summary 387
Problems 388
References 399
7 Random Processes 401
7.1 Basic Definitions 402
7.2 Some Important Random Processes 406
Asynchronous Binary Signaling 406
Poisson Counting Process 408
Alternative Derivation of Poisson Process 412
Random Telegraph Signal 414
Digital Modulation Using Phase-Shift Keying 416
Wiener Process or Brownian Motion 418
Markov Random Processes 421
Birth-Death Markov Chains 425
Chapman-Kolmogorov Equations 429
Random Process Generated from Random Sequences 430
7.3 Continuous-Time Linear Systems with Random Inputs 430
White Noise 436
7.4 Some Useful Classifications of Random Processes 437
Stationarity 437
7.5 Wide-Sense Stationary Processes and LSI Systems 439
Wide-Sense Stationary Case 440
Power Spectral Density 443
An Interpretation of the psd 444
More on White Noise 448
Stationary Processes and Differential Equations 455
7.6 Periodic and Cyclostationary Processes 458
7.7 Vector Processes and State Equations 464
State Equations 466
7.8 Summary 469
Problems 469
References 486
8 Advanced Topics in Random Processes 487
8.1 Mean-Square (m.s.) Calculus 487
Stochastic Continuity and Derivatives [8-3] 487
Further Results on m.s. Convergence [8-1] 497
8.2 m.s. Stochastic Integrals 502
8.3 m.s. Stochastic Differential Equations 506
8.4 Ergodicity [8-3] 511
8.5 Karhunen-Loève Expansion [8-5] 518
8.6 Representation of Bandlimited and Periodic Processes 524
Bandlimited Processes 525
Bandpass Random Processes 528
WSS Periodic Processes 530
Fourier Series for WSS Processes 533
8.7 Summary 535
8.8 Appendix: Integral Equations 535
Existence Theorem 536
Problems 540
References 551
9 Applications to Statistical Signal Processing 552
9.1 Estimation of Random Variables 552
More on the Conditional Mean 558
Orthogonality and Linear Estimation 560
Some Properties of the Operator E 568
9.2 Innovation Sequences and Kalman Filtering 570
Predicting Gaussian Random Sequences 574
Kalman Predictor and Filter 575
Error Covariance Equations 581
9.3 Wiener Filters for Random Sequences 585
Unrealizable Case (Smoothing) 585
Causal Wiener Filter 587
9.4 Expectation-Maximization Algorithm 589
Log-Likelihood for the Linear Transformation 592
Summary of the E-M Algorithm 594
E-M Algorithm for Exponential Probability Functions 594
Application to Emission Tomography 595
Log-likelihood Function of Complete Data 598
E-step 598
M-step 599
9.5 Hidden Markov Models (HMM) 600
Specification of an HMM 601
Application to Speech Processing 604
Efficient Computation of P[E|M] with a Recursive Algorithm 605
Viterbi Algorithm and the Most Likely State Sequence for the Observations 607
9.6 Spectral Estimation 610
The Periodogram 611
Bartlett's Procedure—Averaging Periodograms 614
Parametric Spectral Estimate 616
Maximum Entropy Spectral Density 620
9.7 Simulated Annealing 623
Gibbs Sampler 624
Noncausal Gauss-Markov Models 625
Compound Markov Models 629
Gibbs Line Sequence 630
9.8 Summary 633
Problems 635
References 639

Appendix A Review of Relevant Mathematics 641
A.1 Basic Mathematics 641
Sequences 641
Convergence 642
Summations 643
Z-Transform 643
A.2 Continuous Mathematics 644
Definite and Indefinite Integrals 645
Differentiation of Integrals 645
Integration by Parts 646
Completing the Square 647
Double Integration 647
Functions 649
A.3 Residue Method for Inverse Fourier Transformation 650
Inverse Fourier Transform for psd of Random Sequence 653
A.4 Mathematical Induction 656
Axiom of Induction 656
References 657

Appendix B Gamma and Delta Functions 658
B.1 Gamma Function 658
B.2 Dirac Delta Function 659

Appendix C Functional Transformations and Jacobians 662
C.1 Introduction 662
C.2 Jacobians for n = 2 663
C.3 Jacobian for General n 664

Appendix D Measure and Probability 668
D.1 Introduction and Basic Ideas 668
Measurable Mappings and Functions 670
D.2 Application of Measure Theory to Probability 670
Distribution Measure 671

Appendix E Sampled Analog Waveforms and Discrete-time Signals 672

Index 674

Preface
The first edition of this book (1986) grew out of a set of notes used by the authors to teach
two one-semester courses on probability and random processes at Rensselaer Polytechnic
Institute (RPI). At that time the probability course at RPI was required of all students in
the Computer and Systems Engineering Program and was a highly recommended elective for
students in closely related areas. While many undergraduate students took the course in the
junior year, many seniors and first-year graduate students took the course for credit as well.
Then, as now, most of the students were engineering students. To serve these students well,
we felt that we should be rigorous in introducing fundamental principles while furnishing
many opportunities for students to develop their skills at solving problems.
There are many books in this area and they range widely in their coverage and depth.
At one extreme are the very rigorous and authoritative books that view probability from the
point of view of measure theory and relate probability to rather exotic theorems such as the
Radon-Nikodym theorem (see for example Probability and Measure by Patrick Billingsley,
Wiley, 1978). At the other extreme are books that usually combine probability and statistics
and largely omit underlying theory and the more advanced types of applications of
probability. In the middle are the large number of books that combine probability and random
processes, largely avoiding a measure-theoretic approach, preferring to emphasize the axioms
upon which the theory is based. It would be fair to say that our book falls into this latter
category. Nevertheless this begs the question: why write or revise another book in this area
if there are already several good texts out there that use the same approach and provide
roughly the same coverage? Of course back in 1986 there were few books that emphasized
the engineering applications of probability and random processes and that integrated the
latter into one volume. Now there are several such books.
Both authors have been associated (both as students and faculty) with colleges and
universities that have demanding programs in engineering and applied science. Thus their
experience and exposure have been to superior students that would not be content with
a text that furnished a shallow discussion of probability. At the same time, however, the
authors wanted to write a book on probability and random processes for engineering and
applied science students. A measure-theoretic book, or one that avoided the engineering
applications of probability and the processing of random signals, was regarded as not suitable
for such students. At the same time the authors felt that the book should have enough depth
so that students taking second-year graduate courses in advanced topics such as estimation
and detection, pattern recognition, voice and image processing, networking and queuing,
and so forth would not be handicapped by insufficient knowledge of the fundamentals and
applications of random phenomena. In a nutshell we tried to write a book that combined
rigor with accessibility and had a strong self-teaching orientation. To that end we included a
large number of worked-out examples, MATLAB codes, and special appendices that include
a review of the kind of basic math needed for solving problems in probability as well as
an introduction to measure theory and its relation to probability. The MATLAB codes, as
well as other useful material such as multiple choice exams that cover each of the book's
sections, can be found at the book's web site http://www.prenhall.com/stark.
The normal use of this book would be as follows: for a first course in probability at, say,
the junior or senior year, a reasonable goal is to cover Chapters 1 through 4. Nevertheless
we have found that this may be too much for students not well prepared in mathematics.
In that case we suggest a load reduction in which combinatorics in Chapter 1 (parts of
Section 1.8), failure rates in Chapter 2 (Section 2.7), more advanced density functions and
the Poisson transform in Chapter 3 are covered lightly or not at all the first time around. The
proof of the Central Limit Theorem, joint characteristic functions, and Section 4.8, which
deals with statistics, all in Chapter 4, can, likewise, also be omitted on a first reading.
Chapters 5 to 9 provide the material for a first course in random processes. Normally
such a course is taken in the first year of graduate studies and is required for all further study
in signal processing, communications, computer and communication networking, controls,
and estimation and detection theory. Here what to cover is given greater latitude. If pressed
for time, we suggest that the pattern recognition applications and simultaneous diagonal-
ization of two covariance matrices in Chapter 5 be given lower preference than the other
material in that chapter. Chapters 6 and 7 are essential for any course in random processes
and the coverage of the topics therein should be given high priority. Chapter 9 on signal
processing should, likewise, be given high priority, because it illustrates the applications
of the theory to current state-of-the-art problems. However, within Chapter 9, the instructor
can choose among a number of applications and need not cover them all if time pressure
becomes an issue. Chapter 8 dealing with advanced topics is critically important to the more
advanced students, especially those seeking further studies toward the Ph.D. Nevertheless
it too can be lightly covered or omitted in a first course if time is the critical factor.
Readers familiar with the second edition of this book will find significant changes in the
third edition. The changes were the result of numerous suggestions made by lecturers and
students alike. To begin with, we modified the title to Probability and Random Processes
with Applications to Signal Processing, to better reflect the contents. We removed the two
chapters on estimation theory and moved some of this material to other chapters where
it naturally fitted in with the material already there. Some of the material on parameter
estimation, e.g., the Gauss-Markov Theorem, has been removed, owing to the need for finding
space for new material. In terms of organization, the major changes have been in the random
processes part of the book. Many readers preferred seeing discrete-time random phenomena
in one chapter and continuous-time phenomena in another chapter. In the earlier editions
of the book there was a division along these lines but also a secondary division along the
lines of stationary versus non-stationary processes. For some this made the book awkward
to teach from. Now all of the material on discrete-time phenomena appears in one chapter
(Chapter 6); likewise for continuous-time phenomena (Chapter 7). Another major change is
a new Chapter 9 that discusses applications to signal processing. Included are such topics
as: the orthogonality principle, Wiener and Kalman filters, the Expectation-Maximization
algorithm, Hidden Markov Models, and simulated annealing. Chapter 8 (Advanced Topics)
covers much of the same ground as the old Chapter 9, e.g., stochastic continuity, mean-
square convergence, ergodicity, etc., and material from the old Chapter 10 on representation
of random processes.
There have been significant changes in the first half of the book also. For example, in
Chapter 1 there is an added section on the misuses of probability in ordinary life. Here
we were helped by the discussions in Steve Pinker’s excellent book How the Mind Works
(Norton Publishers, New York, 1997). Chapter 2 (Random Variables) now includes discus-
sions on more advanced distributions such as the Gamma, Chi-square, and the Student-t.
All of the chapters have many more worked-out examples as well as more homework prob-
lems. Whenever convenient we tried to use MATLAB to obtain graphical results. Also, being
a book primarily for engineers, many of the worked-out examples and homework problems
relate to real-life systems or situations.
We have added several new appendices to provide the necessary background mathe-
matics for certain results in the text and to enrich the reader's understanding of proba-
bility. An appendix on Measure Theory falls in the latter category. Among the former are
appendices on the delta and gamma functions, probability-related basic math, including the
principle of proof-by-induction, Jacobians for n-dimensional transformations, and material
on Fourier and Laplace inversion.
For this edition, the authors would like to thank Geoffrey Williamson and Yongyi Yang
for numerous insightful discussions and help with some of the MATLAB programs. Also we
thank Nikos Galatsanos, Miles Wernick, Geoffrey Chan, Joseph LoCicero, and Don Ucci for
helpful suggestions. We also would like to thank the administrations of Illinois Institute of
Technology and Rensselaer Polytechnic Institute for their patience and support while this
third edition was being prepared. Of course, in the end, it is the reaction of the students
that is the strongest driving force for improvements. To all our students and readers we owe
a large debt of gratitude.
Henry Stark
John W. Woods

1 Introduction to Probability
1.1 INTRODUCTION: WHY STUDY PROBABILITY?
One of the most frequent questions posed by beginning students of probability is: “Is
anything truly random and if so how does one differentiate between the truly random and
that which, because of a lack of information, is treated as random but really isn’t?” First,
regarding the question of truly random phenomena: “Do such things exist?” A theologian
might state the case as follows: “We cannot claim to know the Creator's mind, and we
cannot predict His actions because He operates on a scale too large to be perceived by man.
Hence there are many things we shall never be able to predict no matter how refined our
measurements.”
At the other extreme from the cosmic scale is what happens at the atomic level. Our
friends the physicists speak of such things as the probability of an atomic system being in
a certain state. The uncertainty principle says that, try as we might, there is a limit to
the accuracy with which the position and momentum can be simultaneously ascribed to a
particle. Both quantities are fuzzy and indeterminate.
Many, including some of our most famous physicists, believe in an essential random-
ness of nature. Eugen Merzbacher in his well-known textbook on quantum mechanics [1-1]
writes:
The probability doctrine of quantum mechanics asserts that the indetermination, of
which we have just given an example, is a property inherent in nature and not merely a
profession of our temporary ignorance from which we expect to be relieved by a future
better and more complete theory. The conventional interpretation thus denies the
possibility of an ideal theory which would encompass the present quantum mechanics
but would be free of its supposed defects, the most notorious “imperfection” of quantum
mechanics being the abandonment of strict classical determinism.
But the issue of determinism versus inherent indeterminism need never even be considered
when discussing the validity of the probabilistic approach. The fact remains that there is,
quite literally, a nearly uncountable number of situations where we cannot make any
categorical deterministic assertion regarding a phenomenon because we cannot measure all the
contributing elements. Take, for example, predicting the value of the current i(t) produced
by a thermally excited resistor R. Conceivably, we might accurately predict i(t) at some
instant t in the future if we could keep track, say, of the 10^23 or so excited electrons moving
in each other's magnetic fields and setting up local field pulses that eventually all contribute
to producing i(t). Such a calculation is quite inconceivable, however, and therefore we use
a probabilistic model rather than Maxwell's equations to deal with resistor noise. Similar
arguments can be made for predicting weather, the outcome of a coin toss, the time to
failure of a computer, and many other situations.
Thus to conclude: Regardless of which position one takes, that is, determinism versus
indeterminism, we are forced to use probabilistic models in the real world because we do
not know, cannot calculate, or cannot measure all the forces contributing to an effect. The
forces may be too complicated, too numerous, or too faint.

Probability is a mathematical model to help us study physical systems in an average
sense. Thus we cannot use probability in any meaningful sense to answer questions such
as: “What is the probability that a comet will strike the earth tomorrow?” or “What is the
probability that there is life on other planets?”†
R. A. Fisher and R. Von Mises, in the first third of the twentieth century, were
largely responsible for developing the groundwork of modern probability theory. The modern
axiomatic treatment upon which this book is based is largely the result of the work by Andrei
N. Kolmogorov [1-2].
1.2 THE DIFFERENT KINDS OF PROBABILITY

There are essentially four kinds of probability. We briefly discuss them here.
A. Probability as Intuition
This kind of probability deals with judgments based on intuition. Thus “She will probably
marry him,” and “He probably drove too fast,” are in this category. Intuitive probability
can lead to contradictory behavior. Joe is still likely to buy an imported Itsibitsi, world
famous for its reliability, even though his neighbor Frank has a 19-year-old Buick that has
never broken down and Joe’s other neighbor, Bill, has his Itsibitsi in the repair shop. Here
Joe may be behaving “rationally,” going by the statistics and ignoring, so-to-speak, his
†Nevertheless, certain evangelists and others have dealt with this question rather fearlessly. However,
whatever probability system these people use, it is not the system that we shall discuss in this book.
personal observation. On the other hand, Joe will be wary about letting his nine-year-old
daughter Jane swim in the local pond, if Frank reports that Bill thought that he might
have seen an alligator in it. This despite the fact that no one has ever reported seeing
an alligator in this pond, and countless people have enjoyed swimming in it without ever
having been bitten by an alligator. To give this example some credibility, assume that the
pond is in Florida. Here Joe is ignoring the statistics and reacting to, what is essentially,
a rumor. Why? Possibly because the cost to Joe “just-in-case” there is an alligator in the
pond would be too high [1-3].
People buying lottery tickets intuitively believe that certain number combinations like
month/day/year of their grandson's birthday are more likely to win than, say, 06-06-06.
How many people will bet even odds that a coin that heretofore has behaved “fairly,” that
is, in an unbiased fashion, will come up heads on the next toss, if in the last seven tosses it
has come up heads? Many of us share the belief that the coin has some sort of memory and
that, after seven heads, that coin must “make things right” by coming up with more tails.
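Readers who want to test this belief can do so in a few lines of MATLAB (our own sketch, not part of the text; the sample size, the run length of seven, and the seed are arbitrary choices). It simulates a long sequence of fair tosses and estimates the probability of a head given that the preceding seven tosses were all heads:

% Sketch: does a fair coin "remember" a run of seven heads?
rng(1);                                   % arbitrary seed, for repeatability
flips = rand(1, 1e6) < 0.5;               % 1 = head, 0 = tail
runs  = conv(double(flips), ones(1, 7), 'valid');  % heads in each 7-toss window
idx   = find(runs == 7) + 7;              % positions just after seven heads
idx   = idx(idx <= numel(flips));         % keep positions inside the sequence
fprintf('P[head | seven heads] ~ %.3f from %d cases\n', ...
        mean(flips(idx)), numel(idx));

The estimate hovers near 0.5: the streak does not change the next toss.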
A mathematical theory dealing with intuitive probability was developed by
B. O. Koopman [1-4]. However, we shall not discuss this subject in this book.
B. Probability as the Ratio of Favorable to Total Outcomes
(Classical Theory)
In this approach, which is not experimental, the probability of an event is computed a
priori† by counting the number of ways N_E that E can occur and forming the ratio N_E/N,
where N is the number of all possible outcomes, that is, the number of all alternatives to
E plus N_E. An important notion here is that all outcomes are equally likely. Since equally
likely is really a way of saying equally probable, the reasoning is somewhat circular. Suppose
we throw a pair of unbiased dice and ask what is the probability of getting a seven? We
partition the outcome into 36 equally likely outcomes as shown in Table 1.2-1, where each
entry is the sum of the numbers on the two dice.
Table 1.2-1  Outcomes of Throwing Two Dice

                     1st die
2nd die    1    2    3    4    5    6
   1       2    3    4    5    6    7
   2       3    4    5    6    7    8
   3       4    5    6    7    8    9
   4       5    6    7    8    9   10
   5       6    7    8    9   10   11
   6       7    8    9   10   11   12
†A priori means relating to reasoning from self-evident propositions or presupposed by experience. A
posteriori means relating to reasoning from observed facts.
The total number of outcomes is 36 if we keep the dice distinct. The number of ways
of getting a seven is 6. Hence

    P[getting a seven] = 6/36 = 1/6.
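The count can be checked mechanically. The following MATLAB fragment (ours, not part of the text) enumerates the 36 equally likely outcomes of Table 1.2-1 and counts those that sum to seven:

% Sketch: classical probability of a seven by direct enumeration.
[d1, d2] = meshgrid(1:6, 1:6);    % all 36 equally likely (1st die, 2nd die) pairs
N_E = nnz(d1 + d2 == 7);          % favorable outcomes: 6
fprintf('P[seven] = %d/36 = %.4f\n', N_E, N_E/36);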
Example 1.2-1
Throw a fair coin twice (note that since no physical experimentation is involved, there is no
problem in postulating an ideal “fair coin”). The possible outcomes are HH, HT, TH, TT.
The probability of getting at least one tail T is computed as follows: With E denoting the
event of getting at least one tail, the event E is the set of outcomes

    E = {HT, TH, TT}.

Thus, E occurs whenever the outcome is HT or TH or TT. The number of elements in E
is N_E = 3; the number of all outcomes, N, is four. Hence

    P[at least one T] = N_E/N = 3/4.
The classical theory suffers from at least two significant problems: (1) it cannot deal with
outcomes that are not equally likely; and (2) it cannot handle uncountably infinite outcomes
without ambiguity (see the example by Athanasios Papoulis [1-5]). Nevertheless, in those
problems where it is impractical to actually determine the outcome probabilities by
experimentation and where, because of symmetry considerations, one can indeed argue equally
likely outcomes, the classical theory is useful.

Historically, the classical approach was the predecessor of Richard Von Mises' [1-6]
relative frequency approach developed in the 1930s.
C. Probability as a Measure of Frequency of Occurrence
‘The relative-frequency approach to defining the probability of an event £ is to perform
an experiment n times. The number of times that F appears is denoted by mp. Then it is
tempting to define the probability of E occurring by
ne
PIE) = im "2 (1.24)
Quite clearly since we must _have One difficulty with this approach
iS that we can never perform the experiment an infinite number of times so we can only
estimate P[£) from a finite number of trials. Secondly, we postulate that nz/n approaches
a, limit as n goes to infinity. But consider flipping a fair coin 1000 times. The likelihood
of getting exactly 500 heads is very small; in fact, if we flipped the coin 10,000 times, the
likelihood of getting exactly 5000 heads is even smaller. As n -+ 00, the event of observing
exactly n/2 heads becomes vanishingly small. Yet our intuition demands that P{head] = 5See. 1.3, MISUSES, MISCALCULATIONS, AND PARADOXES IN PROBABILITY 5
for a fair coin. Suppose we choose a 6 > 0; then we shall find experimentally that if the coin
is truly fair, the number of times that
ng
es 1.2-2)
ntl >s (122)
as n becomes large, becomes very small. Thus although it is very. unlikely that at any stage
of this experiment, especially when n is large, np/mis exactly J, this ratio will nevertheless
hover around 7, and the number of times if will make significant excursion away from the
‘Vicnity OF F according to Equation 1.2-2 becomes very small indeed.
Despite _these_problems with the frequency definition of probability, the relative-
frequency concept is essential in applying probability theory to the physical world.
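The behavior described by Equation 1.2-2 is easy to observe in simulation. In the MATLAB sketch below (ours; the number of tosses, the cutoff n = 10^4, and the value δ = 0.01 are arbitrary), the running ratio n_H/n is tracked for a simulated fair coin:

% Sketch: the running relative frequency n_H/n for a simulated fair coin.
rng(2);                                    % arbitrary seed
N     = 1e5;
ratio = cumsum(rand(1, N) < 0.5) ./ (1:N); % n_H/n after each toss
delta = 0.01;
late  = 1e4:N;                             % look only at large n
fprintf('fraction of tosses with |n_H/n - 1/2| > %.2f for n >= 10^4: %.4f\n', ...
        delta, mean(abs(ratio(late) - 0.5) > delta));

The reported fraction is very small, even though the ratio is almost never exactly 1/2.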
D. Probability Based on an Axiomatic Theory
This is the approach followed in most modern textbooks on the subject. To develop it we
must introduce certain ideas, especially those of a random experiment, a sample description
space, and an event. Briefly stated, a random experiment is simply an experiment in which
the outcomes are nondeterministic, that is, probabilistic. Hence the word random in random
experiment. The sample description space is the set of all outcomes of the experiment. An
event is a subset of the sample description space that satisfies certain constraints. In general,
however, almost any subset of the sample description space is an event.
These notions are refined in Sections 1.4 and 1.5.
1.3 MISUSES, MISCALCULATIONS, AND PARADOXES IN PROBABILITY
The misuse of probability and statistics in everyday life is quite common. Many of the
misuses are illustrated by the following examples. Consider a defendant in a murder trial
who pleads not guilty to murdering his wife. The defendant has on numerous occasions
beaten his wife. His lawyer argues that, yes, the defendant has beaten his wife but that
among men who do so, the probability that one of them will actually murder his wife is
only 0.001, that is, only one in a thousand. Let us assume that this statement is true. It
is meant to sway the jury by implying that the fact of beating one’s wife is no indicator
of murdering one’s wife. Unfortunately, unless the members of the jury have taken a good
course in probability, they might not be aware that a far more significant question is the
following: Given that a battered wife is murdered, what is the probability that the husband is
the murderer? Statistics show that this probability is, in fact, greater than one-half.
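A rough calculation shows how both numbers can be true at once; the rates below are invented purely for illustration (only the 0.001 figure comes from the lawyer's argument). Let M denote the event that a battered wife is murdered and H the event that her husband is the murderer, and suppose that in a given year P[MH] = 0.001 (murdered by the husband) while P[MH^c] = 0.0001 (murdered by someone else). Then

    P[H | M] = P[MH]/P[M] = 0.001/(0.001 + 0.0001) ≈ 0.91.

A murder rate of one in a thousand among batterers is thus entirely consistent with the husband being the murderer in roughly nine cases out of ten once a murder has occurred. (Conditional probability and Bayes' theorem are developed in Sections 1.6 and 1.7.)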
In the 1996 presidential race, Senator Bob Dole's age became an issue. His opponents
claimed that a 72-year-old white male has a 27 percent risk of dying in the next five years.
Thus, it was argued, were Bob Dole elected, the probability that he would fail to survive his
term was greater than one-in-four. The trouble with this argument is that the probability
of survival, as computed, is not conditioned on additional pertinent facts. As it happens,
if a 72-year-old male is still in the work force and, additionally, happens to be rich, then,
taking these additional facts into consideration, the average 73-year-old (the age at which
Dole would have assumed the presidency) has only a one-in-eight chance of dying in the
next four years [1-3].
Misuse of probability appears frequently in predicting life elsewhere in the universe. In
his book Probability 1 (Harcourt Brace & Company, 1998), Amir Aczel assures us that we
can be certain that alien life forms are out there just waiting to be discovered. However, in
a cogent review of Aczel's book, John Durant of London's Imperial College writes:

Statistics are extremely powerful and important, and Aczel is a very clear and capable
exponent of them. But statistics cannot substitute for empirical knowledge about the
way the universe behaves. We now have no plausible way of arriving at robust estimates
for the probability of life arising spontaneously when the conditions are right. So, until
we either discover extraterrestrial life or understand far more about how at least one
form of life—terrestrial life—first appeared, we can do little more than guess at the
likelihood that life exists elsewhere in the universe. And as long as we're guessing, we
should not dress up our interesting speculations as mathematical certainties.
The computation of probabilities based on relative frequency can lead to paradoxes. An
excellent example is found in [1-3]. We repeat the example here:

In a sample of American women between the ages of 35 and 50, 4 out of 100 develop
breast cancer within a year. Does Mrs. Smith, a 49-year-old American woman, therefore
have a 4% chance of getting breast cancer in the next year? There is no answer. Suppose
that in a sample of women between the ages of 45 and 90—a class to which Mrs. Smith
also belongs—11 out of 100 develop breast cancer in a year. Are Mrs. Smith's chances
4%, or are they 11%? Suppose that her mother had breast cancer, and 22 out of 100
women between 45 and 90 whose mothers had the disease will develop it. Are her
chances 4%, 11%, or 22%? She also smokes, lives in California, had two children before
the age of 25 and one after 40, is of Greek descent ... What group should we compare
her with to figure out the “true” odds? You might think, the more specific the class,
the better—but the more specific the class, the smaller its size and the less reliable the
frequency. If there were only two people in the world very much like Mrs. Smith, and
one developed breast cancer, would anyone say that Mrs. Smith's chances are 50%?
In the limit, the only class that is truly comparable with Mrs. Smith in all her details
is the class containing Mrs. Smith herself. But in a class of one, “relative frequency”
makes no sense.
The previous example should not leave the impression that the study of probability,
based on relative frequency, is useless. For one, there are a huge number of engineering and
scientific situations that are not nearly as complex as the case of Mrs. Smith's likelihood of
getting cancer. Also, it is true that if we refine the class and thereby reduce the class size,
our estimate of probability based on relative frequency becomes less stable. But exactly how
much less stable is a question dealt with within the science of probability and its offspring
statistics (e.g., see the Law of Large Numbers in Section 4.4). Also, there are many situations
where the required conditioning, that is, class refinement, is such that the class size is
sufficiently large for excellent estimates of probability. And finally, returning to Mrs. Smith:
if the class size starts to get too small, then stop adding conditions and learn to live with a
probability estimate associated with a larger, less refined class. This estimate may be
sufficient for all kinds of actions, that is, planning screening tests, and the like.
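The loss of stability as the class shrinks can be seen numerically. In the MATLAB sketch below (ours; the true probability 0.11 and the class sizes are arbitrary stand-ins for the figures in the example), a known probability is estimated by relative frequency from classes of decreasing size, and the spread of the estimates grows accordingly:

% Sketch: relative-frequency estimates destabilize as the class size shrinks.
rng(3);                                  % arbitrary seed
p = 0.11;                                % "true" probability to estimate
for m = [10000 1000 100 10]              % class (sample) sizes
    est = mean(rand(1000, m) < p, 2);    % 1000 estimates, each from m cases
    fprintf('class size %5d: std of estimate = %.4f\n', m, std(est));
end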
1.4 SETS, FIELDS, AND EVENTS
A set is a collection of objects, either concrete or abstract. An example of a concrete set
is the set of all New York residents whose height equals or exceeds 6 feet. A subset of a
set is a collection that is contained within the larger set. Thus the set of all New York
City residents whose height is between 6 and 6½ feet is a subset of the previous set. In the
study of probability we are particularly interested in the set of all outcomes of a random
experiment and subsets of this set. We sometimes denote the experiment by the symbol ℰ
and the set of all outcomes by Ω. The set Ω is called the sample space or sample description
space of the random experiment ℰ. Subsets of Ω are called events. Since any set is a subset
of itself, Ω is itself an event. In particular, Ω is called the certain event. Thus Ω is used to
denote two objects: the set of all elementary outcomes of a random experiment and the
certain event. We shall see that using the same symbol for both objects is entirely consistent.
A little later we shall be somewhat more precise as to our definition of events.
Examples of Sample Spaces
Example 1.4-1
The experiment consists of flipping a coin once. Then Ω = {H, T}, where H is a head and
T is a tail.

Example 1.4-2
The experiment consists of flipping a coin twice. Then Ω = {HH, HT, TH, TT}. A typical
subset of Ω is {HH, HT, TH}; it is the event of getting at least one head in two flips.

Example 1.4-3
The experiment consists of choosing a person at random and counting the hairs on his or
her head. Then

    Ω = {0, 1, 2, ..., 10^7},

that is, the set of all nonnegative integers up to 10^7, it being assumed that no human head
has more than 10^7 hairs.

Example 1.4-4
The experiment consists of determining the age to the nearest year of each member of a
married couple chosen at random. Then with x denoting the age of the man and y denoting
the age of the woman, Ω is described by

    Ω = {2-tuples (x, y): x any integer in 10–200; y any integer in 10–200}.

Note that in Example 1.4-4 we have assumed that no human lives beyond 200 years and that
no married person is ever less than ten years old. Similarly, in Example 1.4-1, we assumed
that the coin never lands on edge. If the latter is a possible outcome, it must be included
in Ω in order for it to denote the set of all outcomes as well as the certain event.
Example 1.4-5
The experiment consists of observing the angle of deflection of a nuclear particle in an elastic
collision. Then

    Ω = {θ: −π ≤ θ ≤ π}.

An example of a subset of Ω is

    E = {θ: −π/4 ≤ θ ≤ π/4} ⊂ Ω.

Example 1.4-6
The experiment consists of measuring the instantaneous power, P, consumed by a current-
driven resistor. Then

    Ω = {P: P ≥ 0}.

Since power cannot be negative, we leave out negative values of P in Ω. A subset of Ω is

    E = {P: P > 10^-2 watts}.

Note that in Examples 1.4-5 and 1.4-6, the number of elements in Ω is uncountably infinite.
Therefore there are an uncountably infinite number of subsets. When, as in Example 1.4-4,
the number of outcomes is finite, the number of distinct subsets is also finite, and each
represents an event. Thus if Ω = {ζ_1, ..., ζ_n}, the number of subsets of Ω is 2^n. We can see
this by noting that each descriptor ζ_i, i = 1, ..., n, either is or is not present in any given
subset of Ω. This gives rise to 2^n distinct events, one of which is the one in which none
of the ζ_i appear. This event, involving none of the ζ ∈ Ω, is called the impossible event and
is denoted by φ. It is the event that cannot happen and is only included for completeness.
The set φ is also called the empty set.
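The is-present/is-absent argument can be made concrete with a few lines of MATLAB (our sketch; the three-element sample space is an arbitrary choice). Bit i of a counter running from 0 to 2^n − 1 says whether the i-th descriptor belongs to the subset:

% Sketch: listing the 2^n subsets of a finite sample space.
Omega = {'a', 'b', 'c'};              % n = 3, so 2^3 = 8 subsets
n = numel(Omega);
for k = 0:2^n - 1
    in = bitget(k, 1:n) == 1;         % membership pattern for subset k
    fprintf('subset %d: {%s }\n', k, sprintf(' %s', Omega{in}));
end
% k = 0 produces the empty set phi; k = 2^n - 1 produces Omega itself.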
Set algebra. The union (sum) of two sets E and F, written E ∪ F or E + F, is the set
of all elements that are in at least one of the sets E or F. Thus with E = {1, 2, 3, 4} and
F = {1, 3, 4, 5, 6},†

    E ∪ F = {1, 2, 3, 4, 5, 6}.

If E is a subset of F, we indicate this by writing E ⊂ F. Clearly for E ⊂ F it follows that
E ∪ F = F. We indicate that ζ is an element of Ω or “belongs” to Ω by writing ζ ∈ Ω. Thus
we can write

    E ∪ F = {ζ: ζ ∈ E or ζ ∈ F or ζ lies in both}.             (1.4-1)

The intersection or set product of two sets E and F, written E ∩ F or EF, is the set of
elements common to both E and F. Thus in the preceding example

    EF = {1, 3, 4}.

†The order of the elements in a set is not important.
Formally, EF ≜ {ζ: ζ ∈ E and ζ ∈ F}. The complement of a set E, written E^c, is the set
of all elements not in E. From this it follows that if Ω is the sample description space or,
more generally, the universal set, then

    E ∪ E^c = Ω.                                               (1.4-2)

Also EE^c = φ. The difference of two sets or, more appropriately, the reduction of E by F,
written E − F, is the set of elements in E that are not in F. It should be clear that

    E − F = EF^c,
    F − E = FE^c,

and, in general, E − F ≠ F − E. The exclusive-or of two sets, written E ⊕ F, is the set of
all elements in E or F but not both. It is readily shown that†

    E ⊕ F = (E − F) ∪ (F − E).                                 (1.4-3)

The operations of union, intersection, and so forth can be symbolically represented by Venn
diagrams, which are useful as aids in reasoning and in establishing probability relations. The
various set operations E ∪ F, EF, E^c, E − F, F − E, E ⊕ F are shown in Figure 1.4-1 in
hatched lines.

Two sets E, F are said to be disjoint if EF = φ; that is, they have no elements in
common. Given any set E, an n-partition of E consists of a sequence of sets E_i, i = 1, ..., n,
such that E_i ⊂ E, ∪_{i=1}^n E_i = E, and E_iE_j = φ for all i ≠ j. Thus given two sets E, F, a
2-partition of F is

    F = FE ∪ FE^c.                                             (1.4-4)

It is easy to demonstrate, using Venn diagrams, the following results:

    (E ∪ F)^c = E^c F^c                                        (1.4-5)
    (EF)^c = E^c ∪ F^c                                         (1.4-6)

and, by induction,‡ given sets E_1, ..., E_n,

    (∪_{i=1}^n E_i)^c = ∩_{i=1}^n E_i^c                        (1.4-7)
    (∩_{i=1}^n E_i)^c = ∪_{i=1}^n E_i^c.                       (1.4-8)

†Equation 1.4-3 shows why ∪ is preferable, at least initially, to + to indicate union. The beginning
student might—in error—write (E − F) + (F − E) = E − F + F − E = 0, which is meaningless.
‡See Section A.4 in Appendix A for the meaning of induction.
[Figure 1.4-1: Venn diagrams (hatched) of the set operations (a) E ∪ F, (b) EF, (c) E^c, (d) E − F, (e) F − E, (f) E ⊕ F.]
The relations are known as De Morgan's laws after the English mathematician Augustus
De Morgan (1806–1871).
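De Morgan's laws are easy to spot-check with MATLAB's set functions (a sketch of ours; the universal set and the two subsets simply repeat the example sets used above):

% Sketch: numerical check of Equations 1.4-5 and 1.4-6.
Omega = 1:10;                          % a small universal set
E = [1 2 3 4];   F = [1 3 4 5 6];
comp = @(A) setdiff(Omega, A);         % complement with respect to Omega
isequal(comp(union(E, F)), intersect(comp(E), comp(F)))  % (E u F)^c = E^c F^c
isequal(comp(intersect(E, F)), union(comp(E), comp(F)))  % (EF)^c = E^c u F^c
% Both comparisons return logical 1 (true).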
Two sets E and F are said to be equal if every element in E is in F and vice versa.
Equivalently,

    E = F   iff   E ⊂ F, F ⊂ E.                                (1.4-9)
Sigma fields. Consider a universal set Ω and a collection of subsets of Ω. Let E, F
denote subsets in this collection. This collection of subsets forms a field† ℱ if:

    (1) φ ∈ ℱ, Ω ∈ ℱ;
    (2) if E ∈ ℱ and F ∈ ℱ, then E ∪ F ∈ ℱ and EF ∈ ℱ;‡
    (3) if E ∈ ℱ, then E^c ∈ ℱ.

A sigma (σ) field§ ℱ is a field that is closed under any countable set of unions, intersections,
and combinations. Thus if E_1, ..., E_n, ... belong to ℱ, so do

    ∪_{i=1}^∞ E_i   and   ∩_{i=1}^∞ E_i,

†Also sometimes called an algebra.
‡From this it follows that if E_1, ..., E_n belong to ℱ, so do ∪_{i=1}^n E_i and ∩_{i=1}^n E_i.
§Also sometimes called a σ-algebra.
where these are defined as

    ∪_{i=1}^∞ E_i ≜ {the set of all elements in at least one E_i}

and

    ∩_{i=1}^∞ E_i ≜ {the set of all elements in every E_i}.
Events. Consider an experiment with sample description space Ω. If Ω has a countable
number of elements, then every subset of Ω may be assigned a probability in a way consistent
with the axioms given in the next section. Then the class of all subsets makes up a σ-field
and each subset is an event. Thus we speak of the σ-field of events. However, when Ω
is not countable, that is, when Ω = R = the real line, then not every subset of Ω can be
assigned a probability that will be consistent with the axiomatic theory. Only those subsets
to which a probability can be assigned consistent with the axioms will be called events.
The collection of those subsets is smaller than the collection of all possible subsets that one
can define on Ω. This smaller collection forms a σ-field. On the real line R the σ-field is
sometimes called the Borel field of events and, as a practical matter, includes all subsets of
engineering and scientific interest.†

At this stage of our development, we have two of the three objects required for the
axiomatic theory of probability, namely, a sample description space Ω and a σ-field ℱ of
events defined on Ω. We still need a probability measure P. The three objects (Ω, ℱ, P)
form a triplet called a probability space.
1.5 AXIOMATIC DEFINITION OF PROBABILITY

Probability is a set function P[·] that assigns to every event E ∈ ℱ a number P[E], called
the probability of E, such that

    (1) P[E] ≥ 0.                                              (1.5-1)
    (2) P[Ω] = 1.                                              (1.5-2)
    (3) P[E ∪ F] = P[E] + P[F] if EF = φ.‡                     (1.5-3)

These axioms are sufficient to establish the following basic results, all but one of which
we leave as exercises for the reader:

    (4) P[φ] = 0.                                              (1.5-4)
    (5) P[EF^c] = P[E] − P[EF], where E ∈ ℱ, F ∈ ℱ.            (1.5-5)
    (6) P[E] = 1 − P[E^c].                                     (1.5-6)
    (7) P[E ∪ F] = P[E] + P[F] − P[EF].                        (1.5-7)

From Axiom 3 we can establish by mathematical induction that

    P[∪_{i=1}^n E_i] = Σ_{i=1}^n P[E_i]   if E_iE_j = φ, all i ≠ j.   (1.5-8)

From this result and Equation 1.5-7 we establish the general result that P[∪_{i=1}^n E_i] ≤
Σ_{i=1}^n P[E_i]. This result is sometimes known as the union bound.

†For two-dimensional Euclidean sample spaces, the Borel field of events would be subsets of R × R; for
three-dimensional sample spaces it would be subsets of R × R × R.
‡A fourth axiom, P[∪_{i=1}^∞ E_i] = Σ_{i=1}^∞ P[E_i] if E_iE_j = φ, all i ≠ j, must be included to enable
one to deal rigorously with limits and countable unions. This axiom is of no concern to us here but will be
in later chapters.
Example 1.5-1
We wish to prove result (7). First we decompose the event E ∪ F into three disjoint events
as follows:

    E ∪ F = EF^c ∪ E^cF ∪ EF.

By Axiom 3,

    P[E ∪ F] = P[EF^c ∪ E^cF] + P[EF]
             = P[EF^c] + P[E^cF] + P[EF]     (by Axiom 3 again)
             = P[E] − P[EF] + P[F] − P[EF] + P[EF]
             = P[E] + P[F] − P[EF].                            (1.5-9)

In obtaining line three from line two, we used result (5).
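Result (7) can also be verified numerically for a simple equally-likely experiment (our sketch; the two die events are arbitrary choices):

% Sketch: checking P[E u F] = P[E] + P[F] - P[EF] for one throw of a fair die.
E = [2 4 6];   F = [4 5 6];            % E = {even}, F = {greater than 3}
P = @(A) numel(A) / 6;                 % equally likely outcomes on {1,...,6}
lhs = P(union(E, F));
rhs = P(E) + P(F) - P(intersect(E, F));
fprintf('P[E u F] = %.4f,  P[E] + P[F] - P[EF] = %.4f\n', lhs, rhs);
% Both print 0.6667, as Equation 1.5-7 requires.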
Example 1.5-2
The experiment consists of throwing a coin once. Hence

    Ω = {H, T}.

The σ-field of events consists of the following sets: {H}, {T}, Ω, φ. With the coin assumed
fair, we have†

    P[{H}] = P[{T}] = 1/2,   P[Ω] = 1,   P[φ] = 0.
Example 1.5-3
The experiment consists of throwing a die once. The outcome is the number of dots n
appearing on the upface of the die. The set Ω is given by Ω = {1, 2, 3, 4, 5, 6}. The σ-field
of events consists of 2^6 elements. Some are

    φ, Ω, {1}, {1, 2}, {1, 2, 3}, {1, 4, 6}, {1, 2, 4, 5}, ...

†To be clear we must distinguish between outcomes ζ and elementary events {ζ}. The outcomes ζ are
elements of Ω. The elementary events {ζ} are subsets of Ω. Probability is a set function; it assigns a number
to every event. Thus we should write P[{ζ}] rather than P[ζ] and, more generally, P[{ζ_1, ζ_2, ..., ζ_n}] rather
than P[ζ_1, ζ_2, ..., ζ_n]. However, we shall frequently dispense with this notational complication and hope
that the meaning of the equation will be clear from the context.