

310/311 Probability and Statistics Fall 2022

Lecture 2: Axioms of Probability

The content of this lecture roughly corresponds to Sec. 3.4 of Sheldon Ross’s Introduction to Probability
and Statistics for Engineers and Scientists.

2.1 Probability
The intuitive notion of probability
Since experiments are repeatable (ad infinitum), when we have an event A, we can think of the probability
of A, written P(A), as the long-term relative frequency of the event; that is, we perform the experiment a
large number of times, say N, count how many of these trials resulted in an outcome belonging to the
event A, say n, and take the ratio n/N as N → ∞.
You will notice n/N is always between 0 and 1 for every N. You’ll also notice that, if this limit is close to 0,
then the event A is rare; whereas if this limit is close to 1, the event A is very likely to occur.
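To see the relative frequency n/N settle down, here is a minimal simulation sketch. The experiment (one roll of a fair die), the event A ("the roll is even") and the function name `relative_frequency` are illustrative assumptions, not part of the lecture.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Estimate P(A) for A = 'a fair die shows an even number' as n/N."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0                   # n: number of trials whose outcome lies in A
    for _ in range(num_trials):
        if rng.randint(1, 6) % 2 == 0:
            hits += 1
    return hits / num_trials

# The ratio n/N always lies in [0, 1] and drifts toward 1/2 as N grows.
for N in (10, 1_000, 100_000):
    print(N, relative_frequency(N))
```

For small N the ratio can be far from 1/2; only the long-run behavior is what the intuitive definition refers to.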
Three Axioms of Probability

(1) for every event A, P (A) ≥ 0.

(2) P (Ω) = 1 (here, Ω is our sample space)
(3) whenever A, B are mutually exclusive events, P (A ∪ B) = P (A) + P (B)
(3∗ ) whenever A1 , A2 , A3 , . . . are any sequence of mutually exclusive events, P (A1 ∪ A2 ∪ · · ·) = P (A1 ) +
P (A2 ) + · · ·

Axiom (3) is called finite additivity. Axiom (3∗ ) is called countable additivity.
Axioms (3) and (3∗) are equivalent if Ω is a finite set. However, if Ω is infinite, it turns out that axiom (3∗)
implies axiom (3), but not necessarily the other way around. Therefore, when the 3 axioms are stated, it is
customary to take axioms (1), (2) and (3∗) as the axioms. (More on this in a bit; see Remark 2.3.)
The point here is no matter how we define a probability, it must satisfy these 3 axioms.
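On a finite sample space the three axioms can be checked by brute force. The sketch below is only an illustration: the helper names `all_events` and `satisfies_axioms`, the four-point sample space, and the two candidate set functions are all assumptions made for the example.

```python
from fractions import Fraction
from itertools import chain, combinations

def all_events(omega):
    """Every subset of a finite sample space, as frozensets."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def satisfies_axioms(P, omega):
    """Check axioms (1), (2) and (3) for a set function P on a finite omega."""
    events = all_events(omega)
    nonnegative = all(P(A) >= 0 for A in events)        # axiom (1)
    normalized = P(frozenset(omega)) == 1               # axiom (2)
    additive = all(                                     # axiom (3)
        P(A | B) == P(A) + P(B)
        for A in events for B in events if not (A & B)  # disjoint pairs only
    )
    return nonnegative and normalized and additive

omega = {1, 2, 3, 4}
uniform = lambda A: Fraction(len(A), len(omega))        # equally likely outcomes
squared = lambda A: Fraction(len(A), len(omega)) ** 2   # fails additivity

print(satisfies_axioms(uniform, omega))  # True
print(satisfies_axioms(squared, omega))  # False
```

The second function is nonnegative and assigns 1 to Ω, yet it is not a probability because it violates axiom (3).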
Here are consequences of the axioms (and consequently properties that all probability functions satisfy!)

Theorem 2.1 For any event A,

P(A) = 1 − P(A^c) (2.1)

This is called the complementary rule of probability. Also, since (A^c)^c = A, we can write this as P(A^c) =
1 − P(A).
1 This is discussed in §2.1 of the textbook (R. J. Larsen and M. L. Marx, An Introduction to Mathematical Statistics and its Applications, 6th ed., Pearson 2018); I paraphrase here.


The proof is easy: since A and A^c are mutually exclusive and Ω = A ∪ A^c, we have

1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c),

where the first equality is axiom (2) and the last is axiom (3); (2.1) follows by subtraction.

Since Ω^c = ∅, we have

Theorem 2.2

P(∅) = P(Ω^c) = 1 − P(Ω) = 0.
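As a quick sanity check of the complementary rule and of P(∅) = 0, here is a small sketch on an assumed example: one roll of a fair die with equally likely outcomes. The event A and the helper `P` are chosen only for illustration.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed sample space: one roll of a fair die

def P(event):
    """Uniform probability: favorable outcomes over total outcomes."""
    return Fraction(len(event & omega), len(omega))

A = frozenset({1, 2})           # "roll a 1 or a 2"
A_c = omega - A                 # complement of A

print(P(A), 1 - P(A_c))         # both are 1/3
print(P(frozenset()))           # 0
```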

Remark 2.3 By taking A1 = A, A2 = B, and Ai = ∅ for i ≥ 3, the countable additivity axiom says
P (A ∪ B ∪ ∅ ∪ ∅ ∪ · · ·) = P (A ∪ B) = P (A) + P (B) + P (∅) + P (∅) + · · · = P (A) + P (B). This shows that
countable additivity implies finite additivity.

Theorem 2.4 If A ⊆ B, then P (A) ≤ P (B)

Here is a Venn diagram “proof”:

Point: B is the mutually exclusive union of A and B ∩ A^c, and since P(B ∩ A^c) ≥ 0 (axiom (1)), it follows that
P(B) = P(A) + P(B ∩ A^c) ≥ P(A) + 0 = P(A).

Corollary 2.5 Since A ⊆ Ω, it follows P (A) ≤ P (Ω) = 1. So every event A has the property P (A) ≤ 1.
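Monotonicity and the bound P(A) ≤ 1 do not depend on outcomes being equally likely. The sketch below checks both over every pair of events on a three-point sample space; the weights are an assumption picked for the example (they are nonnegative and sum to 1).

```python
from fractions import Fraction
from itertools import combinations

# Assumed non-uniform weights on a three-point sample space (they sum to 1).
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def P(event):
    """Probability of an event = sum of the weights of its outcomes."""
    return sum((weights[w] for w in event), Fraction(0))

# All 8 subsets of the sample space.
events = [frozenset(c) for r in range(4) for c in combinations(weights, r)]

# A <= B is Python's subset test for frozensets.
monotone = all(P(A) <= P(B) for A in events for B in events if A <= B)
bounded = all(0 <= P(A) <= 1 for A in events)
print(monotone, bounded)  # True True
```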

Theorem 2.6 (example of inclusion-exclusion principle) For any events A and B,

P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

Notice that knowing any 3 of the 4 probabilities in the above equation gives us the 4th. For instance,
P(A ∩ B) = P(A) + P(B) − P(A ∪ B).
Here’s another proof by “picture”:

We can think of A ∪ B as the union of 3 mutually exclusive regions, R1 = A ∩ B^c, R2 = A ∩ B and R3 = A^c ∩ B, so that R1 ∪ R2 ∪ R3 = A ∪ B.

• On one hand,
P (A ∪ B) = P (R1 ∪ R2 ∪ R3 ) = P (R1 ) + P (R2 ) + P (R3 )

• On the other hand,

P (A) = P (R1 ∪ R2 ) = P (R1 ) + P (R2 )
P (B) = P (R2 ∪ R3 ) = P (R2 ) + P (R3 )
and adding
P (A) + P (B) = P (R1 ) + 2P (R2 ) + P (R3 )
= P (A ∪ B) + P (A ∩ B)

So, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Example 1. Suppose we know P (A) = 0.3, P (B) = 0.4, and P (A ∩ B) = 0.2. Find P (A ∪ B).
By the inclusion-exclusion rule,
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= 0.3 + 0.4 − 0.2 = 0.5
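The numbers in Example 1 can be realized on a concrete sample space, which lets us compare the formula with a direct count. The space of 10 equally likely outcomes and the particular sets A and B below are assumptions chosen to match P(A) = 0.3, P(B) = 0.4 and P(A ∩ B) = 0.2.

```python
from fractions import Fraction

omega = frozenset(range(10))       # assumed: 10 equally likely outcomes
A = frozenset({0, 1, 2})           # P(A) = 3/10
B = frozenset({1, 2, 3, 4})        # P(B) = 4/10, P(A ∩ B) = 2/10

def P(event):
    return Fraction(len(event), len(omega))

by_formula = P(A) + P(B) - P(A & B)   # inclusion-exclusion
by_counting = P(A | B)                # count the union directly
print(by_formula, by_counting)        # 1/2 1/2
```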

The inclusion-exclusion principle generalizes to any finite number of events. For example, for 3 events:
P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 )
− P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A2 ∩ A3 )
+ P (A1 ∩ A2 ∩ A3 )

and the inclusion-exclusion principle for 4 events:

P (A1 ∪ A2 ∪ A3 ∪ A4 ) = P (A1 ) + P (A2 ) + P (A3 ) + P (A4 )

− P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A1 ∩ A4 ) − P (A2 ∩ A3 ) − P (A2 ∩ A4 ) − P (A3 ∩ A4 )
+ P (A1 ∩ A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A4 ) + P (A1 ∩ A3 ∩ A4 ) + P (A2 ∩ A3 ∩ A4 )
− P (A1 ∩ A2 ∩ A3 ∩ A4 )

These formulas can come in handy as a strategy for computing the probability of a union of several events.
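The pattern in the 3-event and 4-event formulas (add the single events, subtract the pairwise intersections, add the triple intersections, and so on) can be written as one loop. This is a sketch under assumptions: the function name `union_probability`, the 12-point uniform sample space and the four "multiple of d" events are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, P):
    """P(A1 ∪ ... ∪ Ak) by inclusion-exclusion over all nonempty sub-collections."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1   # + for odd-sized intersections, - for even
        for group in combinations(events, r):
            total += sign * P(frozenset.intersection(*group))
    return total

omega = frozenset(range(1, 13))          # assumed: 12 equally likely outcomes
P = lambda E: Fraction(len(E), len(omega))
# Four events: "the outcome is a multiple of d" for d = 2, 3, 4, 5.
events = [frozenset(k for k in omega if k % d == 0) for d in (2, 3, 4, 5)]

print(union_probability(events, P))      # inclusion-exclusion
print(P(frozenset.union(*events)))       # direct count, for comparison
```

The loop visits 2^k − 1 intersections for k events, so in practice it is only useful when k is small or when most intersections are easy to compute.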
Example 2. Suppose we have 3 events A, B and C that all have equal probability

P (A) = P (B) = P (C) = 0.5

Also, suppose we know

P (A ∩ B) = P (A ∩ C) = P (B ∩ C) = 0.25

and finally, suppose we also know

P (A ∩ B ∩ C) = 0.125.

Compute P (A ∪ B ∪ C). (Try to do this in two ways!)

Here’s one way: using the inclusion-exclusion principle for 3 sets, we have

P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C) − P (B ∩ C) + P (A ∩ B ∩ C)

= 0.5 + 0.5 + 0.5 − 0.25 − 0.25 − 0.25 + 0.125
= 1.5 − 0.75 + 0.125 = 0.875

Here’s another way by filling in probabilities into the mutually exclusive regions of a Venn diagram:

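We start with P(A ∩ B ∩ C) = 0.125 (the center-most region) and work backwards. For example, since P(A ∩ B) =
0.25, it must be that P(A ∩ B ∩ C^c) = 0.25 − 0.125 = 0.125. The same holds for P(A ∩ B^c ∩ C) and P(A^c ∩ B ∩ C). Finally,
P(A ∩ B^c ∩ C^c) = 0.125, since P(A) = 0.5 and 0.375 is taken up by the other three mutually exclusive regions of A;
likewise for the regions belonging only to B and only to C. The seven regions making up A ∪ B ∪ C therefore each
have probability 0.125, so P(A ∪ B ∪ C) = 7 × 0.125 = 0.875. Equivalently, P(A^c ∩ B^c ∩ C^c) = 1 − 0.875 = 0.125,
and P(A ∪ B ∪ C) = 1 − P(A^c ∩ B^c ∩ C^c) = 0.875.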
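One concrete model that reproduces the numbers in Example 2 is three fair coin flips, with A, B, C the events that the first, second and third flip come up heads. This model is an assumption made for illustration; the example itself does not say where the probabilities come from. The sketch computes P(A ∪ B ∪ C) both ways.

```python
from fractions import Fraction
from itertools import product

# Assumed model: three fair coin flips, 8 equally likely outcomes.
omega = frozenset(product("HT", repeat=3))
P = lambda E: Fraction(len(E), len(omega))

A = frozenset(w for w in omega if w[0] == "H")   # first flip is heads
B = frozenset(w for w in omega if w[1] == "H")   # second flip is heads
C = frozenset(w for w in omega if w[2] == "H")   # third flip is heads

# Way 1: inclusion-exclusion for three events.
way1 = (P(A) + P(B) + P(C)
        - P(A & B) - P(A & C) - P(B & C)
        + P(A & B & C))

# Way 2: complement of "none of A, B, C occurs" (the outcome TTT).
way2 = 1 - P(omega - (A | B | C))

print(way1, way2)   # 7/8 7/8
```

Here 7/8 = 0.875, matching both computations in the example.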
