
EN.553.310/311 Probability and Statistics Fall 2022

Lecture 2: Axioms of Probability

The content of this lecture roughly corresponds to Sec. 3.4 of Sheldon Ross's Introduction to Probability
and Statistics for Engineers and Scientists.

2.1 Probability
The intuitive notion of probability¹
Since experiments are repeatable (ad infinitum), when we have an event A, we can think of the probability
of A, written P (A), as the long-term relative frequency of the event. That is, we perform the experiment a
large number of times, say N , count how many of these trials resulted in an outcome belonging to the
event A, say n, and take the limit of the ratio n/N as N → ∞.
Notice that n/N is always between 0 and 1 for every N . Notice also that if this limit is close to 0,
then the event A occurs rarely, whereas if this limit is close to 1, the event A is very likely to
occur.
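The relative-frequency idea can be tried out numerically. The sketch below is only an illustration, not part of the lecture: it assumes a fair six-sided die as the experiment and takes A to be the event "the roll is even", and the function name relative_frequency is made up for this example.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Estimate P(A) for A = 'a fair die shows an even number' by simulation."""
    rng = random.Random(seed)        # seeded so that a run is reproducible
    hits = 0                         # this plays the role of n
    for _ in range(num_trials):      # num_trials plays the role of N
        roll = rng.randint(1, 6)     # one trial of the (assumed) experiment
        if roll % 2 == 0:            # the outcome belongs to the event A
            hits += 1
    return hits / num_trials         # the ratio n/N

for N in (10, 1_000, 100_000):
    print(N, relative_frequency(N))
```

For small N the ratio can be far from 1/2, but it always lies between 0 and 1, and for large N it should settle near 1/2.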
Three Axioms of Probability

(1) For every event A, P (A) ≥ 0.
(2) P (Ω) = 1 (here, Ω is our sample space).
(3) Whenever A, B are mutually exclusive events, P (A ∪ B) = P (A) + P (B).
(3∗) Whenever A1 , A2 , A3 , . . . is any sequence of mutually exclusive events, P (A1 ∪ A2 ∪ · · ·) = P (A1 ) +
P (A2 ) + · · ·

Axiom (3) is called finite additivity. Axiom (3∗) is called countable additivity.
Axioms (3) and (3∗) are equivalent if Ω is a finite set. However, if Ω is infinite, it turns out that axiom (3∗)
implies axiom (3), but not necessarily the other way around. Therefore, when the 3 axioms are stated, it is
customary to take axioms (1), (2) and (3∗) as the axioms. (More on this in a bit, see Remark 2.3.)
The point here is that no matter how we define a probability, it must satisfy these 3 axioms.
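On a finite sample space the axioms can be checked by brute force. The following is a minimal sketch under stated assumptions: a probability is specified by a weight for each outcome, P of an event is the sum of its weights, and the helper names (all_events, prob, satisfies_axioms) are invented for this illustration.

```python
from fractions import Fraction
from itertools import chain, combinations

def all_events(omega):
    """Every subset of a finite sample space, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def prob(event, weights):
    """P(event): the sum of the weights of the outcomes in the event."""
    return sum((weights[w] for w in event), Fraction(0))

def satisfies_axioms(weights):
    omega = frozenset(weights)
    events = all_events(omega)
    nonnegative = all(prob(a, weights) >= 0 for a in events)      # axiom (1)
    total_is_one = prob(omega, weights) == 1                      # axiom (2)
    additive = all(                                               # axiom (3)
        prob(a | b, weights) == prob(a, weights) + prob(b, weights)
        for a in events
        for b in events
        if not (a & b)                  # only mutually exclusive pairs
    )
    return nonnegative and total_is_one and additive

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(satisfies_axioms(fair_die))
```

Because P is defined here by summing outcome weights, axiom (3) holds automatically; the check is included only to mirror the statement of the axioms. Weights that are negative or do not sum to 1 fail axioms (1) or (2).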
Here are consequences of the axioms (and consequently properties that all probability functions satisfy!)

Theorem 2.1 For any event A,

P (A) = 1 − P (A^c) (2.1)

This is called the complementary rule of probability. Also, since (A^c)^c = A, we can write this as P (A^c) =
1 − P (A).
1 This is discussed in §2.1 of the textbook (R. J. Larsen and M. L. Marx, An Introduction to Mathematical Statistics and
its Applications, 6th ed., Pearson 2018); I paraphrase here.


The proof is easy: since A and A^c are mutually exclusive and Ω = A ∪ A^c, we have

1 = P (Ω) = P (A ∪ A^c) = P (A) + P (A^c),

where the first equality is axiom (2) and the last is axiom (3), and (2.1) follows by subtraction. □


Since Ω^c = ∅, we have

Theorem 2.2

P (∅) = P (Ω^c) = 1 − P (Ω) = 0.

Remark 2.3 By taking A1 = A, A2 = B, and Ai = ∅ for i ≥ 3, the countable additivity axiom says
P (A ∪ B ∪ ∅ ∪ ∅ ∪ · · ·) = P (A ∪ B) = P (A) + P (B) + P (∅) + P (∅) + · · · = P (A) + P (B). This shows that
countable additivity implies finite additivity.

Theorem 2.4 If A ⊆ B, then P (A) ≤ P (B)

Here is a Venn diagram “proof”:

Point: B is the mutually exclusive union of A and B ∩ A^c, and since P (B ∩ A^c) ≥ 0 (axiom (1)), it follows from axiom (3) that
P (B) = P (A) + P (B ∩ A^c) ≥ P (A) + 0 = P (A). □

Corollary 2.5 Since A ⊆ Ω, it follows P (A) ≤ P (Ω) = 1. So every event A has the property P (A) ≤ 1.

Theorem 2.6 (example of inclusion-exclusion principle) For any events A and B,

P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

Notice that knowing any 3 of the 4 probabilities in the above equation gives us the 4th . For instance, P (A ∩
B) = P (A) + P (B) − P (A ∪ B).
Here’s another proof by “picture”:

We can think of A ∪ B as the union of 3 mutually exclusive regions, R1 = A ∩ B^c, R2 = A ∩ B and R3 = A^c ∩ B, so that R1 ∪ R2 ∪ R3 = A ∪ B.

• On one hand,
P (A ∪ B) = P (R1 ∪ R2 ∪ R3 ) = P (R1 ) + P (R2 ) + P (R3 )

• On the other hand,


P (A) = P (R1 ∪ R2 ) = P (R1 ) + P (R2 )
P (B) = P (R2 ∪ R3 ) = P (R2 ) + P (R3 )
and adding
P (A) + P (B) = P (R1 ) + 2P (R2 ) + P (R3 )
= P (A ∪ B) + P (A ∩ B)

So, P (A ∪ B) = P (A) + P (B) − P (A ∩ B). □


Example 1. Suppose we know P (A) = 0.3, P (B) = 0.4, and P (A ∩ B) = 0.2. Find P (A ∪ B).
By the inclusion-exclusion rule,
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= 0.3 + 0.4 − 0.2 = 0.5
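The numbers in Example 1 can be realized by a concrete model, which gives a quick check of the rule. The sample space and the sets A and B below are assumptions chosen only so that P (A) = 0.3, P (B) = 0.4 and P (A ∩ B) = 0.2; they are not from the lecture.

```python
from fractions import Fraction

omega = set(range(1, 11))   # hypothetical sample space: 10 equally likely outcomes
A = {1, 2, 3}               # P(A) = 3/10
B = {2, 3, 4, 5}            # P(B) = 4/10, with A ∩ B = {2, 3}

def P(event):
    """Probability under equally likely outcomes: |event| / |omega|."""
    return Fraction(len(event & omega), len(omega))

lhs = P(A | B)                    # computed directly from the union
rhs = P(A) + P(B) - P(A & B)      # computed by inclusion-exclusion
print(lhs, rhs)                   # prints: 1/2 1/2
```

Exact fractions are used instead of floats so that the two sides can be compared with == without rounding issues.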

The inclusion-exclusion principle generalizes to any finite number of events. For example, for 3 events:
P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 )
− P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A2 ∩ A3 )
+ P (A1 ∩ A2 ∩ A3 )

and the inclusion-exclusion principle for 4 events:

P (A1 ∪ A2 ∪ A3 ∪ A4 ) = P (A1 ) + P (A2 ) + P (A3 ) + P (A4 )
− P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A1 ∩ A4 ) − P (A2 ∩ A3 ) − P (A2 ∩ A4 ) − P (A3 ∩ A4 )
+ P (A1 ∩ A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A4 ) + P (A1 ∩ A3 ∩ A4 ) + P (A2 ∩ A3 ∩ A4 )
− P (A1 ∩ A2 ∩ A3 ∩ A4 )

These formulas give us a strategy for computing the probability of a union of several events.
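The pattern in the 3- and 4-event formulas (add the odd-sized intersections, subtract the even-sized ones) can be written as a loop over all groups of events. This is a sketch under assumptions: events are represented as frozensets of outcomes, and the function name union_probability, the uniform model on 1..12 and the four example events are all made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, prob):
    """P(A1 ∪ ... ∪ An) by inclusion-exclusion.

    events: list of frozensets of outcomes
    prob:   function mapping a frozenset of outcomes to its probability
    """
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1      # + for odd k, - for even k
        for group in combinations(events, k):
            total += sign * prob(frozenset.intersection(*group))
    return total

# Hypothetical model: one number drawn uniformly from 1..12.
omega = frozenset(range(1, 13))

def uniform(event):
    return Fraction(len(event), len(omega))

events = [
    frozenset(w for w in omega if w % 2 == 0),   # multiples of 2
    frozenset(w for w in omega if w % 3 == 0),   # multiples of 3
    frozenset(w for w in omega if w % 4 == 0),   # multiples of 4
    frozenset({1, 2, 3}),
]

direct = uniform(frozenset().union(*events))     # probability of the union itself
print(union_probability(events, uniform), direct)
```

The two printed values should agree. Note that the number of terms grows like 2^n, so the formula is mainly useful when the intersections are easier to compute than the union.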
Example 2. Suppose we have 3 events A, B and C that all have equal probability

P (A) = P (B) = P (C) = 0.5

Also, suppose we know

P (A ∩ B) = P (A ∩ C) = P (B ∩ C) = 0.25

and finally, suppose we also know

P (A ∩ B ∩ C) = 0.125.

Compute P (A ∪ B ∪ C). (Try to do this in two ways!)


Here’s one way: using the inclusion-exclusion principle for 3 sets, we have

P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C) − P (B ∩ C) + P (A ∩ B ∩ C)
= 0.5 + 0.5 + 0.5 − 0.25 − 0.25 − 0.25 + 0.125
= 1.5 − 0.75 + 0.125 = 0.875

Here’s another way, by filling in the probabilities of the mutually exclusive regions of a Venn diagram:

We start with P (A ∩ B ∩ C) = 0.125 (the center-most region) and work backwards. For example, since P (A ∩ B) =
0.25, it must be that P (A ∩ B ∩ C^c) = 0.125. The same holds for P (A ∩ B^c ∩ C) and P (A^c ∩ B ∩ C). Finally,
P (A ∩ B^c ∩ C^c) = 0.125, since P (A) = 0.5 and 0.375 is taken up by the other three mutually exclusive regions of A.
By symmetry, P (A^c ∩ B ∩ C^c) = P (A^c ∩ B^c ∩ C) = 0.125 as well. The seven mutually exclusive regions of A ∪ B ∪ C
sum to 7 × 0.125 = 0.875, which leaves P (A^c ∩ B^c ∩ C^c) = 0.125.
So P (A ∪ B ∪ C) = 1 − P (A^c ∩ B^c ∩ C^c) = 1 − 0.125 = 0.875.
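One concrete model that produces exactly the probabilities given in Example 2 is three tosses of a fair coin, with A, B and C the events that the first, second and third toss is heads. This model is an assumption made here for illustration; the example itself only specifies the probabilities. The sketch computes P (A ∪ B ∪ C) both ways.

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=3))    # 8 equally likely outcomes (assumed model)

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads
C = {w for w in omega if w[2] == "H"}   # third toss is heads

# Way 1: inclusion-exclusion for 3 events.
by_formula = (P(A) + P(B) + P(C)
              - P(A & B) - P(A & C) - P(B & C)
              + P(A & B & C))

# Way 2: complement of the region outside all three events.
outside = omega - (A | B | C)           # only the outcome (T, T, T)
by_complement = 1 - P(outside)

print(by_formula, by_complement)        # prints: 7/8 7/8
```

Here 7/8 = 0.875, matching both computations above.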
