
Handout 2

MA 202 - Probability and Statistics

IIT Ropar, Rupnagar

January 14, 2020

1 Conditional Probability and Independence
- Multiplication rule

- Total probability theorem

- Bayes’ theorem

- Independence and Conditional Independence

Importance

We are often interested in calculating probabilities when some partial information about the
result of an experiment is available, or in recalculating probabilities in light of additional information.

Conditional Probability Motivation: Suppose you want to buy a second-hand car
with your limited knowledge. You estimate the probability that the car is drivable
to be, say, 0.8. Then you call your friend, who is an expert in automobiles. He finds some ugly
truths about your selected second-hand car, and you now revise your probability (that the car
is worth buying) down to, say, 0.3.

Conditional probability is thus the probability of an event calculated in light of new or
additional information.

Example 1.1. Consider that all 6 possible outcomes of a fair die roll are equally likely. Consider
two events
A = the outcome is 6 = {6}.
B = the outcome is an even number = {2, 4, 6}.

P(the outcome is 6 | the outcome is even) = 1/3 = #(A ∩ B)/#(B)
                                          = (#(A ∩ B)/#(Ω)) / (#(B)/#(Ω))
                                          = P(A ∩ B)/P(B).
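This counting computation can be reproduced directly in Python (a minimal sketch; the variable names are mine):

```python
from fractions import Fraction

omega = set(range(1, 7))                  # sample space of a fair die roll
A = {6}                                   # the outcome is 6
B = {w for w in omega if w % 2 == 0}      # the outcome is even

# P(A|B) = #(A ∩ B)/#(B), valid for equally likely outcomes
p_A_given_B = Fraction(len(A & B), len(B))
print(p_A_given_B)  # 1/3
```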

Example 1.2. Consider a single roll of a pair of dice. The sample space Ω of this experiment
is
Ω = {(i, j) : i = 1, 2, . . . , 6, j = 1, 2, . . . , 6}

Suppose that each of the 36 outcomes is equally likely. Let A be the event that the sum of the
outcomes on the two dice is 8. Thus, A = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}. Then P(A) = 5/36.

Suppose you know that the first roll was 3, and denote this event by B. What is the probability
that the sum of the two rolls is 8?

P(B) = 1/6, P(A|B) = ?
If the first roll is 3, then there are 6 possibilities, i.e. B = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)}.
By looking at the new universe (or new sample space) B, it is easy to observe that

P(A|B) = 1/6.

Further,

P(A ∩ B)/P(B) = (1/36)/(1/6) = 1/6.

Thus

P(A ∩ B)/P(B) = P(A|B).
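The same check by enumerating all 36 outcomes (a sketch with illustrative names):

```python
from fractions import Fraction

omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]  # 36 outcomes
A = [w for w in omega if w[0] + w[1] == 8]                  # sum is 8
B = [w for w in omega if w[0] == 3]                         # first roll is 3
A_and_B = [w for w in A if w in B]                          # {(3, 5)}

# P(A|B) = P(A ∩ B)/P(B)
p_A_given_B = Fraction(len(A_and_B), len(omega)) / Fraction(len(B), len(omega))
print(p_A_given_B)  # 1/6
```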
The above examples motivate the definition of conditional probability as follows.

Definition 1.1. The conditional probability of event A given event B is given by

P(A|B) = P(A ∩ B)/P(B),    P(B) ≠ 0.

Remark 1.1. 1. Suppose B has occurred, i.e. the outcome lies inside B. P(A|B) is the
probability of A given that B has occurred. Since B has already occurred, A can occur
only if the outcome lies within A ∩ B. Hence the probability of occurrence of A given
that B occurred is P(A ∩ B)/P(B).

2. B is the new universe now. In particular, P(B|B) = 1.

3. Conditional probability specifies a probability law.

(a) P(A|B) = P(A ∩ B)/P(B) ≥ 0.
(b) P(Ω|B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1.
(c) Additivity Axiom. For C and D disjoint (C ∩ D = φ),

P(C ∪ D|B) = P((C ∪ D) ∩ B)/P(B) = P((C ∩ B) ∪ (D ∩ B))/P(B)
           = P(C ∩ B)/P(B) + P(D ∩ B)/P(B)     (by A3 )
           = P(C|B) + P(D|B).

4. If the possible outcomes are finitely many and equally likely, then P(A|B) = #(A ∩ B)/#(B).

Example 1.3. Toss a fair coin three times successively. We wish to find the conditional
probability P(A|B), where A = {more heads than tails comes up}, B = {1st toss is a head}.

Solution.
Ω = {HHH, HHT, HT H, T HH, HT T, T HT, T T H, T T T };
A = {HHH, HHT, HT H, T HH}, B = {HHH, HHT, HT H, HT T };
A ∩ B = {HHH, HHT, HT H}.
P(A ∩ B) = 3/8, P(B) = 4/8 =⇒ P(A|B) = (3/8)/(4/8) = 3/4.

Note: Since all possible outcomes are equally likely here, we can calculate P(A|B) =
#(A ∩ B)/#(B) = 3/4.
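A quick enumeration of the eight outcomes confirms the answer (a sketch; names are mine):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))                   # 8 equally likely outcomes
A = [w for w in omega if w.count("H") > w.count("T")]   # more heads than tails
B = [w for w in omega if w[0] == "H"]                   # 1st toss is a head
A_and_B = [w for w in A if w in B]

p_A_given_B = Fraction(len(A_and_B), len(B))            # counting formula
print(p_A_given_B)  # 3/4
```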

Multiplication Rule
Assuming that all the conditioning events have positive probability, it follows that

P(A1 ∩ A2 ∩ . . . ∩ An ) = P(A1 )P(A2 |A1 )P(A3 |A1 ∩ A2 ) . . . P(An |A1 ∩ A2 ∩ . . . ∩ An−1 ),

which can be viewed as

P(A1 ∩ A2 ∩ . . . ∩ An ) = P(A1 ) · [P(A1 ∩ A2 )/P(A1 )] · [P(A1 ∩ A2 ∩ A3 )/P(A1 ∩ A2 )] · · · [P(A1 ∩ A2 ∩ . . . ∩ An )/P(A1 ∩ A2 ∩ . . . ∩ An−1 )]
                         = P(A1 )P(A2 |A1 ) . . . P(An |A1 ∩ A2 ∩ . . . ∩ An−1 ).

Example 1.4. A bag of marbles contains 2 blue and 3 red marbles. Two marbles are drawn at
random without replacement. What is the probability that both are blue?

Solution. Let A be the event that the first marble is blue and B be the event that the second
marble is also blue. Now, P(A) = 2/5 and P(B|A) = 1/4. Using the multiplication rule, we have

P(A ∩ B) = P(A) · P(B|A) = (2/5) · (1/4) = 1/10.
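The multiplication-rule answer can be cross-checked by enumerating all ordered draws without replacement (a sketch; the encoding of marbles is mine):

```python
from fractions import Fraction
from itertools import permutations

marbles = ["blue", "blue", "red", "red", "red"]   # 2 blue, 3 red
draws = list(permutations(range(5), 2))           # ordered 2-draws, no replacement
both_blue = [d for d in draws
             if marbles[d[0]] == marbles[d[1]] == "blue"]

p = Fraction(len(both_blue), len(draws))
print(p)  # 1/10
```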

Partition of a set. A family of sets P is a partition of X iff all of the following conditions
hold.

1. P does not contain the empty set.

2. The union of sets in P is equal to X.

3. The intersection of any two distinct sets in P is empty (or elements of P are pairwise
disjoint).

Total Probability Theorem


Let A1 , A2 , . . . , An be disjoint events that form a partition of the sample space (which means
Ai ∩ Aj = φ for i, j ∈ {1, 2, . . . , n} and A1 ∪ A2 ∪ . . . ∪ An = Ω) and assume that P(Ai ) > 0 ∀ i.
Then for any event B, we have

P(B) = P(A1 ) · P(B|A1 ) + · · · + P(An ) · P(B|An ).

Proof.

Now, B = B ∩ Ω = B ∩ (∪ni=1 Ai ) = (A1 ∩ B) ∪ (A2 ∩ B) ∪ . . . ∪ (An ∩ B).


This implies

P(B) = P(A1 ∩ B) + P(A2 ∩ B) + · · · + P(An ∩ B)


= P(A1 ) · P(B|A1 ) + · · · + P(An ) · P(B|An ).

The total probability P(B) can be viewed as a weighted average of the conditional
probabilities P(B|Ai ), where the weights are the probabilities P(Ai ).

Example 1.5. Two cards from an ordinary deck of 52 cards are missing. What is the probability
that a random card drawn from this deck is a spade?

Solution. Let E be the event that the randomly drawn card is a spade. Let Fi , i = 0, 1, 2 be
the events that i spades are missing from the deck. By total probability theorem

P(E) = P(E|F0 )P(F0 ) + P(E|F1 )P(F1 ) + P(E|F2 )P(F2 )

     = (13/50) · C(13, 0)C(39, 2)/C(52, 2) + (12/50) · C(13, 1)C(39, 1)/C(52, 2)
       + (11/50) · C(13, 2)C(39, 0)/C(52, 2)

     = 1/4,

where C(n, k) denotes the binomial coefficient.
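The same sum can be evaluated exactly with Python's `math.comb` (a sketch; the names are mine):

```python
from fractions import Fraction
from math import comb

# F_i: exactly i spades among the 2 missing cards, P(F_i) = C(13,i)C(39,2-i)/C(52,2);
# E: a spade is drawn from the remaining 50 cards, P(E|F_i) = (13-i)/50
p_E = sum(
    Fraction(13 - i, 50) * Fraction(comb(13, i) * comb(39, 2 - i), comb(52, 2))
    for i in range(3)
)
print(p_E)  # 1/4
```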
Bayes’ Theorem
Let A1 , A2 , · · · , An be a partition of the sample space Ω.

- Suppose we know prior probabilities P(Ai ).

- Suppose we also know conditional probabilities P(B|Ai ) for each i.

- Aim is to calculate P(Ai |B), that is revise “beliefs”, given that B has occurred.

P(Ai |B) = P(Ai ∩ B)/P(B) = P(Ai )P(B|Ai )/P(B)
         = P(Ai )P(B|Ai ) / Σnj=1 P(Aj )P(B|Aj ).

Remark 1.2. Bayes’ theorem extends to a countable partition of Ω.

Example 1.6. Consider a radar detection problem, where


A = {event that an aircraft is present}
B = {the radar generates an alarm }.

We are given that

P(A) = 0.05, P(B|A) = 0.99 (detection), P(B|Ac ) = 0.1 (False alarm).

Now calculate

P(Aircraft is present|Alarm) = P(A|B) = P(B|A)P(A)/P(B)
                             = P(B|A)P(A) / [P(B|A)P(A) + P(B|Ac )P(Ac )]
                             = (0.99 × 0.05)/(0.99 × 0.05 + 0.1 × 0.95) ≈ 0.3426.

This says that, given that the radar generates an alarm, the probability that an aircraft is
actually present is only about 34.26%.
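A direct computation of this posterior (a minimal sketch; the variable names are mine):

```python
p_A = 0.05           # prior: aircraft present
p_B_given_A = 0.99   # detection probability
p_B_given_Ac = 0.1   # false-alarm probability

# Bayes' theorem, with the total probability theorem in the denominator
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 4))  # 0.3426
```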

Example 1.7 (Spam filtering using naive Bayesian). Consider the following data.

S.No. Contains word “Lottery” Spam


1 yes yes
2 no yes
3 yes yes
4 yes yes
5 yes yes
6 no no
7 yes no
8 yes yes
9 yes yes
10 yes yes

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Ac )P(Ac )].

In the spam filtering example P(Spam) = 0.8, P(Not spam) = 0.2, P(Lottery|Spam) = 7/8 and
P(Lottery|Not spam) = 1/2. Now

P(Spam|Lottery) = P(Lottery|Spam)P(Spam) / [P(Lottery|Spam)P(Spam) + P(Lottery|Not spam)P(Not spam)]
                = (7/8 × 0.8)/(7/8 × 0.8 + 1/2 × 0.2) = 0.875.

Thus the probability that a new mail containing the word “Lottery” is spam is 0.875. If the
cutoff probability for sending a mail to the SPAM folder is 0.90, this new mail will go to the
INBOX.
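The estimates and the posterior can be computed from the table itself (a sketch; the tuple encoding of the table is mine):

```python
from fractions import Fraction

# training table: (contains "Lottery", is spam) for the 10 mails, in order
data = [(1, 1), (0, 1), (1, 1), (1, 1), (1, 1),
        (0, 0), (1, 0), (1, 1), (1, 1), (1, 1)]

spam = [d for d in data if d[1] == 1]
not_spam = [d for d in data if d[1] == 0]
p_spam = Fraction(len(spam), len(data))                           # 8/10
p_lot_spam = Fraction(sum(d[0] for d in spam), len(spam))         # 7/8
p_lot_not = Fraction(sum(d[0] for d in not_spam), len(not_spam))  # 1/2

# Bayes' theorem
p_spam_given_lot = (p_lot_spam * p_spam) / (
    p_lot_spam * p_spam + p_lot_not * (1 - p_spam))
print(float(p_spam_given_lot))  # 0.875
```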

2 Independence of two and more events


We have discussed earlier how the occurrence of an event B can affect the probability of
occurrence of an event A. An important special case arises when the occurrence of B provides
no such information, i.e.
P(A|B) = P(A).

When the above equality holds, we say that A is independent of B. Since P(A|B) =
P(A ∩ B)/P(B), this gives P(A ∩ B) = P(A) · P(B), which leads to the following definition.
Definition 2.1. Two events A and B are said to be independent if P(A ∩ B) = P(A)P(B).
Notes.

1. Independence is often easy to grasp intuitively. Outcomes of random experiments
separated in time and space are generally independent: if you toss a coin here and your
friend rolls a die in another room, the outcomes are independent events. Similarly,
“India wins the cricket match” and “there will be rain” describe two independent events,
whereas “India wins the match” and “there will be firecrackers” describe two dependent
events.

2. However, independence is difficult to visualize by looking at the sample space or a Venn
diagram.

3. Do not confuse independent events with disjoint events.

(a) Disjointness is a set-theoretic concept, while independence is a probabilistic concept.

(b) Two events can be independent under one probability law and dependent under another
probability law.

If A and B are such that A ∩ B = φ, P(A) > 0 and P(B) > 0, then A and B are not
independent, since P(A ∩ B) = 0 ≠ P(A) · P(B). Intuitively, if A and B are disjoint,
P(A) ≠ 0 and we are told that event B has occurred, then the revised probability of A
given the occurrence of B drops to 0; the information changes the probability of A, and
hence A and B are not independent.

4. If A and B are independent, then so are A and B c .

P(A) = P(A ∩ B) + P(A ∩ B c ) = P(A)P(B) + P(A ∩ B c ).

This implies
P(A ∩ B c ) = P(A)[1 − P(B)] = P(A)P(B c ).

This means, A and B c are independent.

5. In fact, if A and B are independent, then the following pairs of events are also independent:

(a) Ac , B
(b) A, B c
(c) Ac , B c

Suppose that A is independent of B and is also independent of C. Is A necessarily independent
of B ∩ C? The answer, surprisingly, is no. Consider the following example.
Example 2.1. Two fair dice are thrown. Let A denote the event that the sum of the dice is
7. Let B be the event that the first die equals 4 and let C be the event that the second die equals 3.
Now
A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
B = {(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)}
C = {(1, 3), (2, 3), (3, 3), (4, 3), (5, 3), (6, 3)}
A ∩ B = {(4, 3)}, A ∩ C = {(4, 3)}
B ∩ C = {(4, 3)}, A ∩ B ∩ C = {(4, 3)}
Since all outcomes are equally likely,

P(A ∩ B) = 1/36 = P(A)P(B) = (1/6) · (1/6),
P(A ∩ C) = 1/36 = P(A)P(C) = (1/6) · (1/6),

but

P(A|B ∩ C) = P(A ∩ B ∩ C)/P(B ∩ C) = 1 ≠ P(A).
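Enumerating the 36 outcomes confirms both the pairwise independence and its failure for B ∩ C (a sketch; the helper names are mine):

```python
from fractions import Fraction

omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
def P(E):
    return Fraction(len(E), len(omega))

A = [w for w in omega if w[0] + w[1] == 7]   # sum is 7
B = [w for w in omega if w[0] == 4]          # first die is 4
C = [w for w in omega if w[1] == 3]          # second die is 3
BC = [w for w in B if w in C]                # {(4, 3)}

print(P([w for w in A if w in B]) == P(A) * P(B))   # True: A, B independent
print(P([w for w in A if w in C]) == P(A) * P(C))   # True: A, C independent
p_A_given_BC = P([w for w in A if w in BC]) / P(BC)
print(p_A_given_BC, P(A))  # 1 1/6
```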

Conditional Independence. We know that conditional probabilities of events, conditioned
on a particular event form a legitimate probability law. We can thus define independence of
various events with respect to this probability law.

Definition 2.2 (Conditional independence). Given an event C, the events A and B are called
conditionally independent if

P(A ∩ B|C) = P(A|C)P(B|C).

Conditioning may affect independence. Assume that A and B are independent. If we
are told that C has occurred, are A and B still independent?

Example 2.2. Consider two independent fair coin tosses, in which all four possible outcomes
are equally likely. Let
H1 = { 1st toss is a head }.
H2 = { 2nd toss is head }.
D = { the two tosses have different result }.
We can check that P(H1 ) = 1/2, since H1 = {HH, HT };
P(H2 ) = 1/2, since H2 = {T H, HH};
P(H1 ∩ H2 ) = 1/4, since H1 ∩ H2 = {HH}.
Thus P(H1 ∩ H2 ) = P(H1 )P(H2 ) and hence these events are unconditionally independent.
Next,
P(H1 |D) = P(H1 ∩ D)/P(D) = (1/4)/(1/2) = 1/2.
Similarly, P(H2 |D) = 1/2, and
P(H1 ∩ H2 |D) = 0, since H1 ∩ H2 ∩ D = φ.
So,
P(H1 ∩ H2 |D) ≠ P(H1 |D)P(H2 |D).
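The loss of independence after conditioning on D can be checked mechanically (a sketch; `P_given` is an illustrative helper of mine):

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))        # two fair coin tosses
def P(E):
    return Fraction(len(E), len(omega))
def P_given(E, C):                           # P(E|C) by counting inside C
    return Fraction(len([w for w in E if w in C]), len(C))

H1 = [w for w in omega if w[0] == "H"]       # 1st toss is a head
H2 = [w for w in omega if w[1] == "H"]       # 2nd toss is a head
D = [w for w in omega if w[0] != w[1]]       # the tosses differ
H1H2 = [w for w in H1 if w in H2]

print(P(H1H2) == P(H1) * P(H2))              # True: unconditionally independent
print(P_given(H1H2, D) ==
      P_given(H1, D) * P_given(H2, D))       # False: not independent given D
```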

Conditioning may affect independence: a Venn-diagram illustration.
Suppose A and B are such that

P(A ∩ B) = P(A)P(B),

and C is an event with A ∩ B ∩ C = φ, A ∩ C ≠ φ and B ∩ C ≠ φ
(so that (A ∩ C) ∩ (B ∩ C) = φ). Then

P(A ∩ B|C) = P(A ∩ B ∩ C)/P(C) = 0,

but

P(A|C) = P(A ∩ C)/P(C) ≠ 0,
P(B|C) = P(B ∩ C)/P(C) ≠ 0.
Example 2.3 (Conditional independence doesn’t imply independence). A and A are conditionally
independent given A but are not independent, since P(A ∩ A|A) = P(A|A) = 1 =
P(A|A)P(A|A), but P(A ∩ A) ≠ P(A)P(A) if P(A) ≠ 0, 1.

Independence of a collection of events (mutual independence). The events A1 , A2 , . . . , An


are independent if

P(∩i∈S Ai ) = ∏i∈S P(Ai ) for every subset S of {1, 2, . . . , n} with at least two elements.

There will be a total of C(n, 2) + C(n, 3) + · · · + C(n, n) = 2^n − n − 1 conditions.

For the case of three events this amounts to

P(A1 ∩ A2 ) = P(A1 )P(A2 ),
P(A1 ∩ A3 ) = P(A1 )P(A3 ),     (pairwise independence)
P(A2 ∩ A3 ) = P(A2 )P(A3 ),

together with

P(A1 ∩ A2 ∩ A3 ) = P(A1 )P(A2 )P(A3 ).

(a) The first three conditions do not imply the fourth condition.

(b) Further, the fourth condition does not imply the first three.

Example 2.4 (Pairwise independence doesn’t imply mutual independence). Consider an urn
containing 4 balls numbered 110, 101, 011 and 000. From this urn one ball is drawn at random.
For k = 1, 2, 3 let Ak be the event of drawing a ball numbered with 1 in the kth position. Now
P(A1 ) = P(A2 ) = P(A3 ) = 1/2. Further, P(A1 ∩ A2 ) = P(A2 ∩ A3 ) = P(A3 ∩ A1 ) = 1/4.
Moreover, P(A1 ∩ A2 ∩ A3 ) = 0. Hence A1 , A2 and A3 are pairwise independent but not
mutually independent.
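A quick check of this urn by enumeration (a sketch; the string encoding of the balls is mine):

```python
from fractions import Fraction
from itertools import combinations

balls = ["110", "101", "011", "000"]         # the four equally likely balls
def P(E):
    return Fraction(len(E), len(balls))

# A[k]: ball drawn has a 1 in position k
A = [[b for b in balls if b[k] == "1"] for k in range(3)]

pairwise = all(
    P([b for b in A[i] if b in A[j]]) == P(A[i]) * P(A[j])
    for i, j in combinations(range(3), 2))
triple = [b for b in balls if all(b[k] == "1" for k in range(3))]
mutual = P(triple) == P(A[0]) * P(A[1]) * P(A[2])
print(pairwise, mutual)  # True False
```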

Example 2.5 (The fourth condition does not imply the first three). Toss two different stan-
dard dice having colors white and black. The sample space S of the outcomes consists of all
ordered pairs (i, j), i, j = 1, . . . , 6 such that S = {(1, 1), (1, 2), · · · , (6, 6)}. Consider the events
A1 = {first die = 1, 2 or 3}
A2 = {first die = 3, 4 or 5}
A3 = {sum of faces is 9}
Here P(A1 ) = 1/2, P(A2 ) = 1/2 and P(A3 ) = 1/9. Further, P(A1 ∩ A2 ∩ A3 ) = 1/36,
P(A1 ∩ A2 ) = 1/6, P(A2 ∩ A3 ) = 1/12 and P(A1 ∩ A3 ) = 1/36. Thus P(A1 ∩ A2 ∩ A3 ) =
P(A1 )P(A2 )P(A3 ), but P(A1 ∩ A2 ) ≠ P(A1 )P(A2 ) and P(A1 ∩ A3 ) ≠ P(A1 )P(A3 ).
Example 2.6. Suppose S = {1, 2, . . . , 8}, with all outcomes equally likely. Let A1 = A2 =
{1, 2, 3, 4} and A3 = {2, 5, 6, 8}. It is easy to verify that the fourth condition does not imply
the first three.
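Example 2.6 is easy to verify in a few lines (a sketch; the names are mine):

```python
from fractions import Fraction

S = set(range(1, 9))                    # 8 equally likely outcomes
def P(E):
    return Fraction(len(E), len(S))

A1 = A2 = {1, 2, 3, 4}
A3 = {2, 5, 6, 8}

print(P(A1 & A2 & A3) == P(A1) * P(A2) * P(A3))  # True: fourth condition holds
print(P(A1 & A2) == P(A1) * P(A2))               # False: 1/2 vs 1/4, pairwise fails
```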
Example 2.7. If A1 , A2 and A3 are independent events, then A1 and A2 ∪A3 are independent.
Also, A1 and A2 ∪ Ac3 are independent.
Solution.

1. P(A1 ∩ (A2 ∪ A3 )) = P((A1 ∩ A2 ) ∪ (A1 ∩ A3 ))


= P(A1 ∩ A2 ) + P(A1 ∩ A3 ) − P(A1 ∩ A2 ∩ A3 )
= P(A1 )P(A2 ) + P(A1 )P(A3 ) − P(A1 )P(A2 )P(A3 )
= P(A1 ) [P(A2 ) + P(A3 ) − P(A2 )P(A3 )]
= P(A1 )P(A2 ∪ A3 ).
2. P(A1 ∩ (A2 ∪ Ac3 )) = P((A1 ∩ A2 ) ∪ (A1 ∩ Ac3 ))
= P(A1 ∩ A2 ) + P(A1 ∩ Ac3 ) − P(A1 ∩ A2 ∩ Ac3 )
= P(A1 )P(A2 ) + P(A1 )P(Ac3 ) − P(A1 )P(A2 )P(Ac3 )
= P(A1 ) [P(A2 ) + P(Ac3 ) − P(A2 )P(Ac3 )]
= P(A1 )P(A2 ∪ Ac3 ).

Proposition 2.1. Further, if A1 , A2 and A3 are independent, then


(a) Ac1 , Ac2 and Ac3 are independent.

(b) Ac1 ∪ Ac2 and A3 are independent.


Proof. (a) We need to show the following conditions:

P(Ac1 ∩ Ac2 ) = P(Ac1 )P(Ac2 )
P(Ac1 ∩ Ac3 ) = P(Ac1 )P(Ac3 )
P(Ac2 ∩ Ac3 ) = P(Ac2 )P(Ac3 )
P(Ac1 ∩ Ac2 ∩ Ac3 ) = P(Ac1 )P(Ac2 )P(Ac3 )

P(Ac1 ∩ Ac2 ) = P((A1 ∪ A2 )c ) = 1 − [P(A1 ) + P(A2 ) − P(A1 )P(A2 )]


= 1 − P(A1 ) − P(A2 )(1 − P(A1 )) = (1 − P(A1 ))(1 − P(A2 ))
= P(Ac1 )P(Ac2 ).

Similarly for other pairs of events. Next

P(Ac1 ∩ Ac2 ∩ Ac3 ) = P((A1 ∪ A2 ∪ A3 )c ) = 1 − P(A1 ∪ A2 ∪ A3 )


= 1 − [P(A1 ) + P(A2 ) + P(A3 ) − P(A1 )P(A2 ) − P(A2 )P(A3 )
− P(A3 )P(A1 ) + P(A1 )P(A2 )P(A3 )]
= 1 − P(A1 ) − P(A2 ) − P(A3 ) + P(A1 )P(A2 ) + P(A2 )P(A3 )
+ P(A3 )P(A1 ) − P(A1 )P(A2 )P(A3 )
= 1 − P(A1 ) − P(A2 )(1 − P(A1 )) − P(A3 )(1 − P(A1 ))
+ P(A2 )P(A3 )(1 − P(A1 ))
= (1 − P(A1 )) [1 − P(A2 ) − P(A3 ) + P(A2 )P(A3 )]
= (1 − P(A1 )) [1 − P(A2 ) − P(A3 )(1 − P(A2 ))]
= (1 − P(A1 ))(1 − P(A2 ))(1 − P(A3 ))
= P(Ac1 )P(Ac2 )P(Ac3 )

(b)

P((Ac1 ∪ Ac2 ) ∩ A3 ) = P(A3 ∩ (A1 ∩ A2 )c )


= P(A3 ) − P(A3 ∩ A1 ∩ A2 )
= P(A3 ) − P(A1 )P(A2 )P(A3 )
= P(A3 ) [1 − P(A1 )P(A2 )] = P(A3 ) [1 − P(A1 ∩ A2 )]
= P(A3 )P((A1 ∩ A2 )c ) = P(A3 )P(Ac1 ∪ Ac2 ).

Thus Ac1 ∪ Ac2 is independent of A3 .

