Professional Documents
Culture Documents
MASSACHUSETTSINSTITUTEOFTECHNOLOGY
6.436J/15.085J Fall 2008
Lecture 1 9/3/2008
PROBABILISTICMODELSANDPROBABILITYMEASURES
Contents
1. Probabilistic experiments
2. Sample space
3. Discrete probability spaces
4. -elds
5. Probability measures
6. Continuity of probabilities
PROBABILISTICEXPERIMENTS
Probability theory is a mathematical framework that allows us to reason about
phenomena or experiments whose outcome is uncertain. A probabilistic model
is a mathematical model of a probabilistic experiment that satises certain math-
ematical properties (the axioms of probability theory), and which allows us to
calculate probabilities and to reason about the likely outcomes of the experi-
ment.
A probabilistic model is dened formally by a triple (,F,P), called a
probabilityspace, comprised of the following three elements:
(a) is the samplespace, the set of possible outcomes of the experiment.
(b) F is a -eld, a collection of subsets of .
(c) Pis a probabilitymeasure, a function that assigns a nonnegative probabil-
ity to every set in the -eld F.
Our objective is to describe the three elements of a probability space, and
explore some of their properties.
1
2 SAMPLESPACE
The sample space is a set comprised of all the possible outcomes of the ex-
periment. Typical elements of are often denoted by , and are called ele-
mentaryoutcomes, or simply outcomes. The sample space can be nite, e.g.,
= {
1
, . . . ,
n
}, countable, e.g., = N, or uncountable, e.g., = R or
={0,1}
.
As a practical matter, the elements of must be mutually exclusive and
collectively exhaustive, in the sense that once the experiment is carried out, there
is exactly one element of that occurs.
Examples
(a) If the experiment consists of a single roll of an ordinary die, the natural sample
space is the set = {1,2, . . . ,6}, consisting of 6 elements. The outcome = 2
indicates that the result of the roll was 2.
(b) If the experiment consists of ve consecutive rolls of an ordinary die, the natural
sample space is the set = {1,2, . . . ,6}
5
. The element = (3,1,1,2,5)is an
example of a possible outcome.
(c) If the experiment consists of an innite number of consecutive rolls of an ordinary
die, the natural sample space is the set ={1,2, . . . ,6}
(1p)p
k1
= 1.
k=1
(e) Let = N. We x a parameter > 0, and let p
k
= e
k
/k!, for every k N.
This results in a legitimate probability space because
k=1
e
k
/k! = 1.
(f) We toss an unbiased coin ntimes. We let ={0,1}
n
, and if we believe that all se-
quences of heads and tails are equally likely, we let P() = 1/2
n
for every .
3
4
(g) We roll a die ntimes. We let ={1,2, . . . ,6}
n
, and if we believe that all elemen-
tary outcomes (6-long sequences) are equally likely, we let P() = 1/6
n
for every
.
Given the probabilities p
i
, the problem of determining P(A)for some sub-
set of is conceptually straightforward. However, the calculations involved in
determining the value of the sum
A
P()can range from straightforward to
daunting. Various methods that can simplify such calculations will be explored
in future lectures.
-FIELDS
When the sample space is uncountable, the idea of dening the probability of
a general subset of in terms of the probabilities of elementary outcomes runs
into difculties. Suppose, for example, that the experiment consists of drawing
a number from the interval [0,1], and that we wish to model a situation where all
elementary outcomes are equally likely. If we were to assign a probability of
zero to every , this alone would not be of much help in determining the proba-
bility of a subset such as [1/2,3/4]. If we were to assign the same positive value
to every , we would obtain P({1,1/2,1/3, . . .}) = , which is undesirable.
A way out of this difculty is to work directly with the probabilities of more
general subsets of (not just subsets consisting of a single element).
Ideally, we would like to specify the probability P(A) of every subset of
. However, if we wish our probabilities to have certain intuitive mathematical
properties, we run into some insurmountable mathematical difculties. A so-
lution is provided by the following compromise: assign probabilities to only a
partial collection of subsets of . The sets in this collection are to be thought
of as the nice subsets of , or, alternatively, as the subsets of of interest.
Mathematically, we will require this collection to be a -eld, a term that we
dene next.
4
Denition2. Given a sample space , a -eldis a collection F of subsets
of , with the following properties:
(a) F.
(b) If A F, then A
c
F.
(c) If A
i
F for every iN, then
i=1
A
i
F.
Aset Athat belongs to Fis called an event, an F-measurableset, or simply
a measurableset. The pair (,F)is called a measurablespace.
The term event is to be understood as follows. Once the experiment is
concluded, the realized outcome either belongs to A, in which case we say
that the event Ahas occurred, or it doesnt, in which case we say that the event
did not occur.
Exercise1.
(a) Let F be a -eld. Prove that if A, B F, then AB F. More generally, given
a countably innite sequence of events A
i
F, prove that
i=1
A
i
F.
(b) Prove that property (a) of -elds (that is, F) can be derived from properties
(b) and (c), assuming that the -eld F is non-empty.
The following are some examples of -elds. (Check that this is indeed the
case.)
Examples.
(a) The trivial -eld, F ={,}.
(b) The collection F ={, A, A
c
,}, where Ais a xed subset of .
(c) The set of all subsets of : F = 2
|A}. ={A
(d) Let = {1,2, . . . ,6}
n
, the sample space associated with n rolls of a die. Let
A = { = (
1
, . . .
n
) |
1
2}, B = { = (
1
, . . . ,
n
) | 3
1
4}, and
C ={= (
1
, . . . ,
n
)|
1
5}, and F ={, A, B, C, AB, AC, BC,}.
Example (d) above can be thought of as follows. We start with a number
of subsets of that we wish to have included in a -eld. We then include
more subsets, as needed, until a -eld is constructed. More generally, given a
collection of subsets of , we can contemplate forming complements, countable
unions, and countable intersections of these subsets, to form a new collection.
We continue this process until no more sets are included in the collection, at
which point we obtain a -eld. This process is hard to formalize in a rigorous
manner. An alternative way of dening this -eld is provided below. We will
need the following fact.
5
5
Proposition1. Let Sbe an index set (possibly innite, or even uncountable),
and suppose that for every s we have a -eld F
s
of subsets of the same
sample space. Let F = , i.e., a set A belongs to F if and only if
sS
F
s
A F
s
for every sS. Then F is a -eld.
Proof. We need to verify that F has the three required properties. Since each
F
s
is a -eld, we have F
s
, for every s, which implies that F. To
establish the second property, suppose that A F. Then, A F
s
, for every s.
Since each sis a -eld, we have A
c
F
s
, for every s. Therefore, A
c
F,
as desired. Finally, to establish the third property, consider a sequence {A
i
}of
elements of F. In particular, for a given sS, every set A
i
belongs to F
s
. Since
is a -eld, it follows that
i=1
A
i
F
s
. This veries the third property and establishes that
F is indeed a -eld.
Suppose now that we start with a collection C of subsets of , which is not
necessarily a -eld. We wish to form a -eld that contains C. This is always
possible, a simple choice being to just let F = 2
i=1
(A
i
).
A probabilitymeasureis a measure Pwith the additional property P()=
1. In that case, the triple (,F,P)is called a probabilityspace.
For any A F, P(A)is called the probability of the event A. The assign-
ment of unit probability to the event expresses our certainty that the outcome
of the experiment, no matter what it is, will be an element of . Similarly, the
outcome cannot be an element of the empty set; thus, the empty set cannot occur
and is assigned zero probability. If an event A F satises P(A) = 1, we say
that Aoccurs almostsurely. Note, however, that Ahappening almost surely is
not the same as the condition A = . For a trivial example, let = {1,2,3},
p
1
=.5, p
2
=.5, p
3
= 0. Then the event A={1,2}occurs almost surely, since
P(A) =.5 +.5 = 1, but A= . The outcome 3 has zero probability, but is still
possible.
The countable additivity property is very important. Its intuitive meaning
is the following. If we have several events A
1
, A
2
, . . ., out of which at most
one can occur, then the probability that one of them will occur is equal to
the sum of their individual probabilities. In this sense, probabilities (and more
generally, measures) behave like the familiar notions of area or volume: the area
or volume of a countable union of disjoint sets is the sumof their individual areas
or volumes. Indeed, a measure is to be understood as some generalized notion
of a volume. In this light, allowing the measure (A)of a set to be innite is
natural, since one can easily think of sets with innite volume.
The properties of probability measures that are required by Denition 3 are
often called the axioms of probability theory. Starting from these axioms, many
other properties can be derived, as in the next proposition.
7
i=1
n
P(A
i
).
i=1
(b) For any event A, we have P(A
c
) = 1P(A).
(c) If the events Aand Bsatisfy AB, then P(A)P(B).
(d) (Unionbound)For any sequence {A
i
}of events, we have
P A
i
P(A
i
).
i=1 i=1
(e) (Inclusion-exclusion formula)For any collection of events A
1
, . . . , A
n
,
n
P A
i
= P(A
i
) P(A
i
A
j
)
i=1 i=1 (i,j): i<j
+ P(A
i
A
j
A
k
) + + (1)
n
P(A
1
A
n
).
(i,j,k): i<j<k
Proof.
(a) This property is almost identical to condition (b) in the denition of a mea-
sure, except that it deals with a nite instead of a countably innite collec-
tion of events. Given a nite collection of disjoint events A
1
, . . . , A
n
, let us
dene A
k
= for k > n, to obtain an innite sequence of disjoint events.
Then,
n n
P A
i
=P A
i
= P(A
i
) = P(A
i
).
i=1 i=1 i=1 i=1
Countable additivity was used in the second equality, and the fact P() = 0
was used in the last equality.
(b) The events A and A
c
are disjoint. Using part (a), we have P(AA
c
) =
P(A) +P(A
c
). But AA
c
= , whose measure is equal to one, and the
result follows.
(c) The events Aand B\Aare disjoint. Also, A(B\A) = B. Therefore,
using also part (a), we obtain P(A)P(A) +P(B\A) =P(B).
(d) Left as an exercise.
8
(e) Left as an exercise; a simple proof will be provided later, using random
variables.
Part (e) of Proposition 2 admits a rather simple proof that relies on random
variables and their expectations, a topic to be visited later on. For the special
case where n= 2, the formula simplies to
P(AB) =P(A) =P(B)P(AB).
Let us note that all properties (a), (c), and (d) in Proposition 2 are also valid
for general measures (the proof is the same). Let us also note that for a proba-
bility measure, the property P() = 0need not be assumed, but can be derived
from the other properties. Indeed, consider a sequence of sets A
i
, each of which
is equal to the empty set. These sets are disjoint, since = . Applying
the countable additivity property, we obtain
A
i
, then lim
i
P(A
i
) =P(A).
i=1
(c) If {A
i
}is a decreasing sequence of sets in F (i.e., A
i
A
i+1
, for all i),
and A=
A
i
, then lim
i
P(A
i
) =P(A).
i=1
(d) If {A
i
}is a decreasing sequence of sets (i.e., A
i
A
i+1
, for all i) and
i=1
A
i
is empty, then lim
i
P(A
i
) = 0.
Proof. We rst assume that (a) holds and establish (b). Observe that A=A
1
(A
2
\A
1
)(A
3
\A
2
). . ., and that the events A
1
, (A
2
\A
1
), (A
3
\A
2
), . . .
10
P(A) =P(A
1
) + P(A
i
\A
i1
)
i=2
n
=P(A
1
)+ lim P(A
i
\A
i1
)
n
i=2
n
=P(A
1
)+ lim P(A
i
)P(A
i1
)
n
i=2
=P(A
1
) + lim (P(A
n
)P(A
1
))
n
= lim P(A
n
).
n
Suppose now that property (b) holds, let A
i
be a decreasing sequence of sets,
and let A=
A
i
. Then, the sequence A
c
is increasing, and De Morgans law,
i=1 i
together with property (b) imply that
P(A
c
) =P
i=1
A
i
c
=P
i=1
A
i
c
= lim P(A
i
c
),
n
and
P(A) = 1P(A
c
) = 1 lim P(A
c
) = lim 1P(A
c
) = lim P(A
i
).
i i
n n n
Property (d) follows from property (c), because (d) is just the special case of
(c) in which the set Ais empty.
To complete the proof, we now assume that property (d) holds and establish
that property (a) holds as well. Let B
i
F be disjoint events. Let A
n
=
i=n
B
i
. Note that {A
n
} is a decreasing sequence of events. We claim that
n=1
A
n
=. Indeed, if A
1
, then B
n
for some n, which implies that
/
i=n+1
B
i
= A
n+1
. Therefore, no element of A
1
can belong to all of the
sets A
n
, which means that that the intersection of the sets A
n
is empty. Property
(d) then implies that lim
n
P(A
n
) = 0.
Applying nite additivity to the ndisjoint sets B
1
, B
2
, . . . , B
n1
,
B
i
,
i=n
we have
n1
P B
i
= P(B
i
) +P B
i
.
i=1 i=1 i=n
This equality holds for any n, and we can take the limit as n . The rst term
on the right-hand side converges to
i=1
P(B
i
). The second term is P(A
n
), and
11
as observed before, converges to zero. We conclude that
P
i=1
B
i
=
n
i=1
P(B
i
),
and property (a) holds.
References
(a) Grimmett and Stirzaker, Chapter 1, Sections 1.1-1.3.
(b) Williams, Chapter 1, Sections 1.0-1.5, 1.9-1.10.
12
MIT OpenCourseWare
http://ocw.mit.edu
6.436J / 15.085J Fundamentals of Probability
Fall 2008
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.