Lecture 02

EN.553.310/311 Probability and Statistics Fall 2022
Lecture 2: Axioms of Probability
The content of this lecture roughly corresponds to Sec. 3.4 of Sheldon Ross's Introduction to Probability
and Statistics for Engineers and Scientists.
2.1 Probability
The intuitive notion of probability¹
Since experiments are repeatable (ad infinitum), when we have an event A, we can think of the probability
of A, written P(A), as the long-run relative frequency of the event: we perform the experiment a
large number of times, say N, count how many of these trials resulted in an outcome belonging to the
event A, say n, and take the limit of the ratio n/N as N → ∞.
Notice that n/N is always between 0 and 1 for every N. Notice also that if this limit is close to 0,
the event A rarely occurs, whereas if this limit is close to 1, the event A is very likely to
occur.
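
The relative-frequency idea can be tried out numerically. The following is a minimal sketch, assuming a hypothetical experiment (rolling a fair six-sided die) and a hypothetical event A = "the roll is even"; neither comes from the lecture, and the function name `relative_frequency` is made up for illustration.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Roll a fair die num_trials times and return n/N for A = 'roll is even'.

    The die, the event, and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(num_trials) if rng.randint(1, 6) % 2 == 0)
    return hits / num_trials

# The ratio n/N always lies in [0, 1] and settles down as N grows.
for N in (10, 1_000, 100_000):
    print(N, relative_frequency(N))
```

For large N the printed ratio should sit near 1/2, the value one would assign to P(A) for a fair die.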
Three Axioms of Probability
(1) for every event A, P (A) ≥ 0.

(2) P (Ω) = 1 (here, Ω is our sample space)
(3) whenever A, B are mutually exclusive events, P (A ∪ B) = P (A) + P (B)
(3∗ ) whenever A1 , A2 , A3 , . . . are any sequence of mutually exclusive events, P (A1 ∪ A2 ∪ · · ·) = P (A1 ) +
P (A2 ) + · · ·
Axiom (3) is called finite additivity. Axiom (3∗ ) is called countable additivity.
Axioms (3) and (3∗) are equivalent if Ω is a finite set. However, if Ω is infinite, axiom (3∗)
implies axiom (3), but not necessarily the other way around. Therefore, when the 3 axioms are stated, it is
customary to take axioms (1), (2) and (3∗) as the axioms. (More on this in a bit; see Remark 2.3.)
The point here is no matter how we define a probability, it must satisfy these 3 axioms.
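
On a finite sample space the axioms can be checked by brute force. The sketch below assumes a probability is specified by giving a weight to each outcome, with P(A) defined as the sum of the weights in A; the helper names `all_events` and `satisfies_axioms` and the die example are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def all_events(omega):
    """Every subset of the finite sample space omega, as frozensets."""
    items = list(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def satisfies_axioms(omega, weight):
    """Check axioms (1), (2), (3) for P(A) = sum of weight[w] over w in A."""
    P = lambda event: sum((weight[w] for w in event), Fraction(0))
    events = all_events(omega)
    nonnegative = all(P(A) >= 0 for A in events)            # axiom (1)
    normalized = P(frozenset(omega)) == 1                   # axiom (2)
    additive = all(P(A | B) == P(A) + P(B)                  # axiom (3)
                   for A in events for B in events if not (A & B))
    return nonnegative and normalized and additive

die = range(1, 7)
fair = {w: Fraction(1, 6) for w in die}                     # a valid assignment
bad = {w: Fraction(1, 5) for w in die}                      # weights sum to 6/5
negative = {**fair, 1: Fraction(-1, 6), 2: Fraction(1, 2)}  # sums to 1, one weight < 0

print(satisfies_axioms(die, fair))
print(satisfies_axioms(die, bad))
print(satisfies_axioms(die, negative))
```

Because P is built by summing outcome weights, additivity holds automatically here; the two failing assignments break axiom (2) and axiom (1) respectively.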
Here are consequences of the axioms (and consequently properties that all probability functions satisfy!)
Theorem 2.1 For any event A,
P (A) = 1 − P (Ac ) (2.1)
This is called the complementary rule of probability. Also, since (Ac )c = A, we can write this as P (Ac ) =
1 − P (A).
1 This is discussed in §2.1 of the textbook (R. J. Larsen and M. L. Marx, An Introduction to Mathematical Statistics and
its Applications, 6th ed., Pearson 2018. ) – I paraphrase here.
The proof is easy: since A and Ac are mutually exclusive and Ω = A ∪ Ac, we have
1 = P(Ω) = P(A ∪ Ac) = P(A) + P(Ac),
where the first equality is axiom (2) and the last is axiom (3), and (2.1) follows by subtraction.
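
The complement rule is most useful when the complement is easier to count than the event itself. Here is a small sketch under assumptions not in the lecture: four rolls of a fair die, all 6⁴ outcomes taken as equally likely, and the event "at least one six". The helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed model: 4 rolls of a fair die, all 6**4 outcomes equally likely.
outcomes = list(product(range(1, 7), repeat=4))

def prob(event):
    """P(event) under equally likely outcomes; event is a predicate on an outcome."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

at_least_one_six = lambda w: 6 in w
no_six = lambda w: 6 not in w      # the complement of at_least_one_six

direct = prob(at_least_one_six)    # count the event itself
via_complement = 1 - prob(no_six)  # complement rule: P(A) = 1 - P(Ac)
print(direct, via_complement)
```

Both computations give the same fraction, as Theorem 2.1 says they must.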

Theorem 2.2 Since Ωc = ∅, we have
P(∅) = P(Ωc) = 1 − P(Ω) = 0.
Remark 2.3 By taking A1 = A, A2 = B, and Ai = ∅ for i ≥ 3, the countable additivity axiom says
P (A ∪ B ∪ ∅ ∪ ∅ ∪ · · ·) = P (A ∪ B) = P (A) + P (B) + P (∅) + P (∅) + · · · = P (A) + P (B). This shows that
countable additivity implies finite additivity.
Theorem 2.4 If A ⊆ B, then P (A) ≤ P (B)
Here is a Venn diagram “proof”:
Point: B is the mutually exclusive union of A and B ∩ Ac and since P (B ∩ Ac ) ≥ 0 (axiom (1)), it follows
P (B) = P (A) + P (B ∩ Ac ) ≥ P (A) + 0
Corollary 2.5 Since A ⊆ Ω, it follows P (A) ≤ P (Ω) = 1. So every event A has the property P (A) ≤ 1.
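
The decomposition used in the proof of Theorem 2.4 can be seen on a concrete case. This sketch assumes a fair die with equally likely outcomes and two hypothetical events A ⊆ B; the sets and the function `P` are illustrative only.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # assumed sample space: one roll of a fair die

def P(event):
    """Equally likely outcomes: P(event) = |event| / |omega|."""
    return Fraction(len(event), len(omega))

A = frozenset({2})               # "roll a 2"
B = frozenset({2, 4, 6})         # "roll an even number", so A is a subset of B
rest = B - A                     # the region B ∩ Ac

# B is the mutually exclusive union of A and B ∩ Ac.
assert A <= B and not (A & rest) and (A | rest) == B
print(P(B), "=", P(A), "+", P(rest))
```

Since P(rest) ≥ 0 by axiom (1), the printed identity forces P(A) ≤ P(B).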
Theorem 2.6 (example of inclusion-exclusion principle) For any events A and B,
P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
Notice that knowing any 3 of the 4 probabilities in the above equation gives us the 4th. For instance,
P(A ∩ B) = P(A) + P(B) − P(A ∪ B).
Here’s another proof by “picture”:
We can think of A ∪ B as the union of 3 mutually exclusive regions, R1 = A ∩ Bc, R2 = A ∩ B and R3 = Ac ∩ B, so that R1 ∪ R2 ∪ R3 = A ∪ B.
• On one hand,
P (A ∪ B) = P (R1 ∪ R2 ∪ R3 ) = P (R1 ) + P (R2 ) + P (R3 )
• On the other hand,

P (A) = P (R1 ∪ R2 ) = P (R1 ) + P (R2 )
P (B) = P (R2 ∪ R3 ) = P (R2 ) + P (R3 )
and adding
P (A) + P (B) = P (R1 ) + 2P (R2 ) + P (R3 )
= P (A ∪ B) + P (A ∩ B)
So, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Example 1. Suppose we know P (A) = 0.3, P (B) = 0.4, and P (A ∩ B) = 0.2. Find P (A ∪ B).
By the inclusion-exclusion rule,
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= 0.3 + 0.4 − 0.2 = 0.5
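
The arithmetic of Example 1 can be reproduced exactly with rational numbers. The sketch below is only a check of the computation; the function name `union_probability` and the region variables are hypothetical, and the three regions are taken to be A ∩ Bc, A ∩ B and Ac ∩ B.

```python
from fractions import Fraction

def union_probability(p_a, p_b, p_a_and_b):
    """Two-event inclusion-exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)."""
    return p_a + p_b - p_a_and_b

p_a, p_b, p_ab = Fraction("0.3"), Fraction("0.4"), Fraction("0.2")
p_union = union_probability(p_a, p_b, p_ab)

# Same answer from the three mutually exclusive regions of A ∪ B.
r1 = p_a - p_ab   # P(A ∩ Bc)
r2 = p_ab         # P(A ∩ B)
r3 = p_b - p_ab   # P(Ac ∩ B)

print(float(p_union), float(r1 + r2 + r3))
```

Using `Fraction` rather than floats avoids rounding noise in the subtraction.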
The inclusion-exclusion principle generalizes to any finite number of events. For example, for 3 events:
P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 )
− P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A2 ∩ A3 )
+ P (A1 ∩ A2 ∩ A3 )
and the inclusion-exclusion principle for 4 events:
P (A1 ∪ A2 ∪ A3 ∪ A4 ) = P (A1 ) + P (A2 ) + P (A3 ) + P (A4 )

− P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A1 ∩ A4 ) − P (A2 ∩ A3 ) − P (A2 ∩ A4 ) − P (A3 ∩ A4 )
+ P (A1 ∩ A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A4 ) + P (A1 ∩ A3 ∩ A4 ) + P (A2 ∩ A3 ∩ A4 )
− P (A1 ∩ A2 ∩ A3 ∩ A4 )
These formulas come in handy as a strategy for computing the probability of a union of several events.
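
The alternating pattern in the 3- and 4-event formulas (add single events, subtract pairwise intersections, add triple intersections, and so on) can be written as one loop. This is a sketch under assumptions: a hypothetical sample space {1, ..., 12} with equally likely outcomes and four made-up events; `inclusion_exclusion` is an illustrative name, not something from the lecture.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_exclusion(events, P):
    """P(union of events) as the alternating sum over all non-empty sub-collections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1          # + for odd k, − for even k
        for group in combinations(events, k):
            total += sign * P(frozenset.intersection(*group))
    return total

omega = frozenset(range(1, 13))                 # assumed: 12 equally likely outcomes

def P(event):
    return Fraction(len(event), len(omega))

events = [
    frozenset(w for w in omega if w % 2 == 0),  # multiples of 2
    frozenset(w for w in omega if w % 3 == 0),  # multiples of 3
    frozenset(w for w in omega if w % 4 == 0),  # multiples of 4
    frozenset({1, 2, 3}),
]

by_formula = inclusion_exclusion(events, P)
direct = P(frozenset().union(*events))          # count the union directly
print(by_formula, direct)
```

The formula needs only the probabilities of intersections, which is why it is useful when the union itself is hard to describe.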
Example 2. Suppose we have 3 events A, B and C that all have equal probability
P (A) = P (B) = P (C) = 0.5
Also, suppose we know
P (A ∩ B) = P (A ∩ C) = P (B ∩ C) = 0.25
and finally, suppose we also know
P (A ∩ B ∩ C) = 0.125.
Compute P (A ∪ B ∪ C). (Try to do this in two ways!)

Here’s one way: using the inclusion-exclusion principle for 3 sets, we have
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C) − P (B ∩ C) + P (A ∩ B ∩ C)

= 0.5 + 0.5 + 0.5 − 0.25 − 0.25 − 0.25 + 0.125
= 1.5 − 0.75 + 0.125 = 0.875
Here’s another way, by filling in probabilities for the mutually exclusive regions of a Venn diagram:
We start with P(A ∩ B ∩ C) = 0.125 (the center-most region) and work backwards. Since P(A ∩ B) = 0.25,
it must be that P(A ∩ B ∩ Cc) = 0.125; the same holds for P(A ∩ Bc ∩ C) and P(Ac ∩ B ∩ C). Next,
P(A ∩ Bc ∩ Cc) = 0.125, since P(A) = 0.5 and 0.375 is taken up by the other three mutually exclusive regions of A;
by symmetry, P(Ac ∩ B ∩ Cc) = P(Ac ∩ Bc ∩ C) = 0.125 as well. The seven regions making up A ∪ B ∪ C
therefore total 7 × 0.125 = 0.875, leaving P(Ac ∩ Bc ∩ Cc) = 0.125.
So P(A ∪ B ∪ C) = 1 − P(Ac ∩ Bc ∩ Cc) = 1 − 0.125 = 0.875.
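
One concrete model that produces the numbers in Example 2 is three flips of a fair coin, with A, B, C the events that the first, second and third flip is heads. This model is an assumption made for illustration (the example itself only specifies the probabilities); the sketch checks both solution methods against it.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))   # assumed model: 8 equally likely outcomes

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}   # first flip is heads
B = {w for w in omega if w[1] == "H"}   # second flip is heads
C = {w for w in omega if w[2] == "H"}   # third flip is heads

# The model matches the data given in Example 2.
assert P(A) == P(B) == P(C) == Fraction(1, 2)
assert P(A & B) == P(A & C) == P(B & C) == Fraction(1, 4)
assert P(A & B & C) == Fraction(1, 8)

# Way 1: inclusion-exclusion for 3 events.
by_formula = P(A) + P(B) + P(C) - P(A & B) - P(A & C) - P(B & C) + P(A & B & C)
# Way 2: complement of the region outside all three events.
by_complement = 1 - P(set(omega) - (A | B | C))
print(by_formula, by_complement)
```

Both routes give 7/8 = 0.875, in agreement with the two hand computations.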
