
PROBABILITY AND PROBABILITY

DISTRIBUTIONS

Ishaan Taneja

About the author
The author, Ishaan Taneja, is an accomplished and driven professional with expertise in the fields of Statistics and Computer Science. They hold a B.Sc. (Hons.) in Statistics from Ram Lal Anand College, University of Delhi, and an M.Sc. in Statistics from Hindu College, University of Delhi, and have consistently demonstrated exceptional performance and a passion for data analysis.

Currently pursuing an M.S. in Computer Science at IIT Madras, Ishaan Taneja is actively exploring the intersection of Statistics and Machine Learning, further expanding their knowledge of the subject. Their experience includes a role as a Course Instructor in the IIT Madras BS Degree Programme, where they contributed to the development of an online Statistics coursework.

Ishaan Taneja has the ability to bridge theoretical concepts with practical applications. Through this book, "Probability and Probability Distributions," they aim to provide readers with a comprehensive and accessible exploration of these subjects, sharing their expertise and empowering others in the field.

Contents
1 Data, Statistics and Probability
  1.1 Introduction
    1.1.1 Deterministic pattern
    1.1.2 Random-like pattern
    1.1.3 Statistical study of phenomenon
  1.2 Basic Concepts
    1.2.1 Experiment
    1.2.2 Outcome
    1.2.3 Sample Space
      1.2.3.1 Solved Examples
  1.3 Events
    1.3.1 Occurrence of events
      1.3.1.1 One event can be contained in another
      1.3.1.2 Complement of an event
    1.3.2 Combining events
      1.3.2.1 Union of events
      1.3.2.2 Intersection of events
      1.3.2.3 Solved examples
    1.3.3 Disjoint events
      1.3.3.1 Partition
    1.3.4 De Morgan's Law
    1.3.5 Solved Examples
  1.4 Venn Diagram

2 Introduction to Probability
  2.1 Events and Chance
  2.2 Mathematical and Probability theory
    2.2.1 Mathematical Theory
    2.2.2 Probability Theory
      2.2.2.1 Probability
  2.3 Basic properties of probability
    2.3.1 Working with probability spaces
  2.4 Distributions
    2.4.1 Uniform Distribution on a finite sample space
    2.4.2 Solved Examples
  2.5 Conditional Probability
    2.5.1 Conditional Probability Space
    2.5.2 Multiplication rule
    2.5.3 Solved Examples
  2.6 Bayes' Theorem and Independence
    2.6.1 Law of Total Probability
      2.6.1.1 Solved Examples
    2.6.2 Bayes' Theorem
      2.6.2.1 Solved Examples
    2.6.3 Independence of Two Events
      2.6.3.1 Solved Examples
    2.6.4 Mutual Independence of Three Events
      2.6.4.1 Solved Examples
  2.7 Repeated Independent Trials
    2.7.1 Bernoulli Distribution
      2.7.1.1 Single Bernoulli Trial
      2.7.1.2 Repeated Bernoulli Trials
      2.7.1.3 Solved Examples
    2.7.2 Binomial Distribution
      2.7.2.1 Visualisation of Binomial Distribution
      2.7.2.2 Solved Examples
    2.7.3 Geometric Distribution
      2.7.3.1 Visualisation of Geometric Distribution
      2.7.3.2 Solved Examples
  2.8 Discrete Random Variable
    2.8.1 Random Variable and events
    2.8.2 Distribution of a Discrete Random Variable
      2.8.2.1 Probability Mass Function (PMF)
      2.8.2.2 Properties of PMF
    2.8.3 Solved Examples
  2.9 Common Discrete Distributions
    2.9.1 Uniform Random Variable
    2.9.2 Bernoulli Random Variable
    2.9.3 Binomial Random Variable
    2.9.4 Geometric Random Variable
    2.9.5 Negative Binomial Distribution
      2.9.5.1 Negative Binomial Random Variable
    2.9.6 Poisson Distribution
      2.9.6.1 Application of Poisson Distribution
      2.9.6.2 Visualisation of Poisson Distribution
      2.9.6.3 Poisson Random Variable
    2.9.7 Hypergeometric Distribution
      2.9.7.1 Hypergeometric Random Variable
  2.10 Functions of one discrete random variable
    2.10.1 Solved Examples

Chapter 1

1 Data, Statistics and Probability


1.1 Introduction
1.1.1 Deterministic pattern
• The result can be predicted with certainty.
• There is a theoretical framework for a deterministic phenomenon.

1.1.2 Random-like pattern

• The result can be predicted only with some chance of it happening, not with certainty.
• We cannot expect exact models, but we can expect some patterns when the random phenomenon repeats.

1.1.3 Statistical study of phenomenon

1.2 Basic Concepts


1.2.1 Experiment
In statistics, any process or phenomenon that we wish to study statistically is called an experiment.
In other words, whenever we want to examine data or a phenomenon for statistical patterns, we use the word experiment to denote that process.
Examples

(i) Tossing a coin.

(ii) Rolling a die.

(iii) Indian Premier League tournament.

1.2.2 Outcome
The result of an experiment (in as much detail as necessary) is referred to as the outcome of that experiment.
In a sense, outcomes are the data produced by the experiment.

Examples

(i) When the experiment is ‘Tossing a coin’, the possible outcomes could be either ‘Heads’
or ‘Tails’.

(ii) When the experiment is ‘Rolling a die’, the possible outcomes could be either ‘1’ or ‘2’
or ‘3’ or ‘4’ or ‘5’ or ‘6’.

(iii) When the experiment is complex, like the 'Indian Premier League tournament', we cannot easily list down all the possible outcomes. (The details of an outcome could, for instance, be recorded in a YAML file.)

1.2.3 Sample Space
The sample space is a set that contains all outcomes of an experiment. It is typically denoted
by ‘S’.
Note:
• In practice, it is enough to imagine a sample space instead of explicitly writing it down.
• In situations where confusion occurs, writing down the sample space can give clarity.
Examples

(i) If the experiment is ‘Tossing a coin’ then, the sample space; S = {Heads, Tails}.

(ii) If the experiment is ‘Rolling a die’ then, the sample space; S = {1, 2, 3, 4, 5, 6}.

(iii) If the experiment is as complex as ‘Indian Premier League tournament’ then the sample
space is difficult to write down. So, we generally break it into small experiments
depending on what level of details we want.

• Runs scored in one delivery: {0, 1, 2, 3, 4, . . . }


• Winner in 2022: {CSK, MI, DC, . . . }(10 teams in 2022)

1.2.3.1 Solved Examples:

Q1. In an urn of marbles, there are three marbles each of colour red, blue and white. If
the experiment is drawing a marble from the urn, then write down its sample space.
Solution:
The possible outcomes are that either the marble drawn is of red colour or blue colour
or white colour.
Hence, the sample space; S = {red, blue, white}

Q2. In an urn of marbles, there are two red coloured marbles, and one each of blue and white colour. If the experiment is drawing two marbles from the urn with replacement, then write down its sample space.
then write down its sample space.
Solution:
The possible outcomes in the first draw are that the marble drawn is of red colour or blue colour or white colour.
Since we are drawing the marbles with replacement, the possible outcomes for the second draw are the same as for the first draw.
Let us denote, Red: R, Blue: B and White: W
Hence, the sample space; S = {RR, RB, RW, BR, BW, BB, WR, WB, WW}

Q3. In an urn of marbles, there are two red coloured marbles, and 1 each of blue and white
colour. If the experiment is drawing two marbles from the urn without replacement,
then write down its sample space.
Solution:

The possible outcomes in the first draw are that the marble drawn is of red colour or blue colour or white colour.
Since we are drawing the marbles without replacement, the possible outcomes for the second draw depend on the first draw.
Let us denote, Red: R, Blue: B and White: W
Hence, the sample space; S = {RR, RB, RW, BR, BW, WR, WB}

Q4. If the experiment is to draw a card from a well shuffled pack of 52 cards, then write
down its sample space.
Solution:
The possible outcomes are that either a numbered card or a face card is drawn. In addition, it could belong to any of the four suits, i.e. Hearts, Diamonds, Clubs and Spades.
So, the sample space can conveniently be written as the Cartesian product of these two sets, i.e. the set of suits and the set of values.
Hence, S = {Hearts, Diamonds, Clubs, Spades} × {2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A},
which is the same as the set of 52 cards in the shuffled pack.

Remark:
In the above example, if we were to draw 13 cards from the well shuffled pack of 52 cards, then the number of possible outcomes is ⁵²C₁₃.
Writing down the sample space for this experiment is a very tedious task. Hence, in such cases it is not useful to write the sample space down, but to imagine it instead.

1.3 Events
An event is a subset of the sample space. There is a technical restriction on what subsets
can be events.
(More details on technical restrictions in section 2.8.1)

Note:

• Events are central objects in probability theory.

• All set theory notions apply to events, and they tend to have natural meanings.

• If the number of elements in the sample space is N, then the maximum number of events (i.e. the number of possible subsets) is 2ᴺ.

Examples

(i) Tossing a coin: S = {Heads, Tails}.

• Number of events = 2² = 4

• Events: empty set, {Heads}, {Tails}, {Heads, Tails}
(ii) Rolling a die: S = {1, 2, 3, 4, 5, 6}.
• Number of events = 2⁶ = 64
• Events: empty set, {1}, {2}, {3}, {4}, {5}, {6}, {1,2}, {1,3}, . . . , {1, 2, 3, 4, 5, 6}
• Some events can also be described in words: Getting an even number i.e. {2,4,6},
Getting a multiple of 3 i.e. {3,6}, etc.
(iii) Fisherman goes out to fish:
• Sample space is complicated as we have to describe everything like catch of fish
in kilos, type of fish and so on.
• We can still define events of interest quite easily, even on the complicated sample
space.
Events: Catch is more than 100 kg, Pomfret is the catch, etc.
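Since events are just subsets, the count 2ᴺ can be checked by enumerating the power set. Below is a small Python sketch, added here purely for illustration (the helper name all_events is my own):

    from itertools import chain, combinations

    def all_events(sample_space):
        # Every subset of the sample space is a potential event.
        s = list(sample_space)
        return list(chain.from_iterable(
            combinations(s, r) for r in range(len(s) + 1)))

    print(len(all_events({1, 2, 3, 4, 5, 6})))   # 64 = 2^6, as stated above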

1.3.1 Occurrence of events


An event is said to have “occurred” if the actual outcome of the experiment belongs to the
event.

1.3.1.1 One event can be contained in another:


If the set A is a subset of the set B i.e. A ⊆ B implies that:
• If A occurred, B has also occurred.
• If B occurred, A may or may not have occurred.
Examples
Rolling a die: S = {1, 2, 3, 4, 5, 6}.
(i) A = {2, 6} and B = {even number} =⇒ A ⊆ B
(ii) A = {1, 3, 5} and B = {prime number > 2} =⇒ B ⊆ A

1.3.1.2 Complement of an event:


The complement of an event A is denoted by Aᶜ and is defined as:
Aᶜ = {outcomes in S and not in A} = (S \ A), which implies that:
• If A occurred, Aᶜ did not occur.
• If Aᶜ occurred, A did not occur.
Examples
Rolling a die: S = {1, 2, 3, 4, 5, 6}.
(i) A = {2, 4, 6} =⇒ Aᶜ = {1, 3, 5}
(ii) A = {not a prime number} =⇒ Aᶜ = {prime number}

1.3.2 Combining events
We can combine events to create new events; the standard way to do this is with two set operations that are central in set theory, namely unions and intersections.

1.3.2.1 Union of events:


The union of two or more than two events is the collection of all the outcomes that belong
to either of the events. It is denoted by ‘∪’.
Note:

• In English terminology, it is referred to as “OR”.

• The union of events is said to have occurred if at least one of the events has occurred.

Examples

(i) Rolling a die: S = {1, 2, 3, 4, 5, 6}.


A = {even number} = {2, 4, 6} and B = {multiple of 3} = {3, 6}
Implies that, A ∪ B = {either an even number or a multiple of 3} = {2, 3, 4, 6}

(ii) Fisherman’s Catch: A = {more than 200 Kg} and B = {less than 50 Kg}
Implies that, A ∪ B = {either more than 200 Kg or less than 50 Kg}

1.3.2.2 Intersection of events:


The intersection of two or more than two events is the collection of all the outcomes that
are common to all the events. It is denoted by ‘∩’.
Note:

• In English terminology, it is referred to as “AND”.

• The intersection of events is said to have occurred if all the events have occurred.

Examples

(i) Rolling a die: S = {1, 2, 3, 4, 5, 6}.


A = {even number} = {2, 4, 6} and B = {multiple of 3} = {3, 6}
Implies that, A ∩ B = {even number and multiple of 3} = {6}

(ii) Fisherman’s Catch: A = {more than 100 Kg} and B = {less than 150 Kg}
Implies that, A ∩ B = {more than 100 Kg and less than 150 Kg}

1.3.2.3 Solved examples:

Q1. For an experiment of rolling a die, let us define the events as follows:
A = {even number} and B = {prime number}

(i) Find A ∪ B.
(ii) Find A ∩ B.

Solution:
A = {even number} = {2, 4, 6}
B = {prime number} = {2, 3, 5}

(i) A ∪ B = {either an even number or a prime number} = {2, 3, 4, 5, 6}


(ii) A ∩ B = {even number and prime number} = {2}

Q2. 5 cards are to be drawn without replacement from a well shuffled pack of 52 cards.
Can we represent the event that there are no aces as an intersection of 5 events?
Solution:
Yes. Let us define the events as follows:
E = {no aces in 5 draws of cards}
E1 = {no ace in the first draw of card}
E2 = {no ace in the second draw of card}
E3 = {no ace in the third draw of card}
E4 = {no ace in the fourth draw of card}
E5 = {no ace in the fifth draw of card}
Then, we can express the event E as (E1 ∩ E2 ∩ E3 ∩ E4 ∩ E5 )

Q3. In an IPL experiment, consider an event E = {5 runs being scored in two legal deliveries}.
Can we represent this event as the union of events?
Solution:
Yes. The possible number of runs that can be scored off bat on any legal delivery is
0, 1, 2, 3, 4 and 6. (Assuming there are no extras and no overthrows)
Let us define the events as follows:
E1 = {1 run off the first delivery and 4 runs off the second delivery} = {1+4}
E2 = {2 runs off the first delivery and 3 runs off the second delivery} = {2+3}
E3 = {3 runs off the first delivery and 2 runs off the second delivery} = {3+2}
E4 = {4 runs off the first delivery and 1 run off the second delivery} = {4+1}
Then, we can express the event E as (E1 ∪ E2 ∪ E3 ∪ E4 )

1.3.3 Disjoint events


Two or more than two events are said to be disjoint events if they have an empty intersection.
In general, the events E1, E2, E3, . . . are disjoint if:

Ei ∩ Ej = empty set ; for any i ≠ j
If the events A and B are disjoint, then it implies that:

• If A occurred, B did not occur.

• If B occurred, A did not occur.

• A is a subset of Bᶜ, and B is a subset of Aᶜ.

Examples

(i) Rolling a die: S = {1, 2, 3, 4, 5, 6}.


A = {even number} = {2, 4, 6} and B = {odd number} = {1, 3, 5}
A and B are disjoint events because, A ∩ B = empty set

(ii) Fisherman’s Catch: A = {more than 200 Kg} and B = {less than 50 Kg}
A and B are disjoint events because, A ∩ B = empty set

1.3.3.1 Partition
If two or more disjoint events together make up the whole sample space, then they are referred to as a partition (of the sample space).
We can partition a large sample space into multiple disjoint events for study.
Note:
An event and its complement are referred to as a "Partition" because:

• The event and its complement are always disjoint, i.e. A ∩ Aᶜ = empty set

• The event and its complement together cover all the outcomes, i.e. A ∪ Aᶜ = S

Examples
For an experiment of drawing a card from a pack of 52 well shuffled cards, consider the
events:
E1 = {The card is a Spade}, E2 = {The card is a Heart}, E3 = {The card is a Club}
and E4 = {The card is a Diamond}
Then, the events E1, E2, E3 and E4 form a partition of the sample space.

1.3.4 De Morgan’s Law


De Morgan’s Laws are defined as:

• (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

• (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

These laws are very useful and come to the rescue when we want to interpret an event described in English.
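As a quick check, De Morgan's laws can be verified mechanically with Python's built-in sets. This is an illustrative sketch (not part of the original text), using the die-rolling events from earlier:

    # Verify De Morgan's laws on the die-rolling sample space
    S = {1, 2, 3, 4, 5, 6}
    A = {2, 4, 6}   # even number
    B = {3, 6}      # multiple of 3

    def complement(E):
        return S - E

    print(complement(A | B) == complement(A) & complement(B))   # True: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
    print(complement(A & B) == complement(A) | complement(B))   # True: (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ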

1.3.5 Solved Examples:
Going through these examples will enhance your skill of translating English into events.

Q1. The hats of 5 persons are identical and get mixed up in a box. If each person picks a
hat at random from the box, then consider the following events:
A = {No person gets their own hat}
B = {Every person gets their own hat}
C = {At least one person does not get their own hat}
D = {At least one person gets their own hat}
Based on the given information, answer the following questions.

(i) What is Ac ?
(ii) What is B c ?
(iii) Are A and B disjoint events?
(iv) What is A ∩ B c ?

Solution:

(i) As we know, Aᶜ = {outcomes in S and not in A} = (S \ A).
So, even if just one of the 5 persons gets their own hat, that outcome will not belong to the event A, but it will belong to S.
Hence, Aᶜ = {At least one person gets their own hat} = D

(Note: Do not read Aᶜ as "Everyone gets their own hat"; taking complements is not an exercise in English comprehension. This is a case where writing down the complete sample space is complicated, but having an idea of the sample space is useful.)

(ii) Again, as we know, Bᶜ = {outcomes in S and not in B} = (S \ B).
So, even if just one of the 5 persons does not get their own hat, that outcome will not belong to the event B, but it will belong to S.
Hence, Bᶜ = {At least one person does not get their own hat} = C

(iii) Since A ∩ B = empty set, A and B are disjoint events;
i.e. the events "no person gets their own hat" and "every person gets their own hat" cannot occur together.

(iv) A ∩ Bᶜ consists of all the outcomes that are in the event A but not in B.
Now,

A ∩ Bᶜ = A ∩ C ; (because Bᶜ = C)
       = {No person gets their own hat and at least one person does not get their own hat}
       = {No person gets their own hat}
∴ A ∩ Bᶜ = A

Another way to approach it:
Since A and B are disjoint events, A ⊆ Bᶜ.
Hence, A ∩ Bᶜ = A

Q2. For the IPL experiment, in one over 6 deliveries are bowled where, in each delivery
0, 1, 2, 3, 4 or 6 runs may be scored. Consider the following events:
A = {No 4s in one over}
B = {No 6s in one over}
C = {Exactly 20 runs scored in one over}
Based on the given information, answer the following questions.

(i) What is A ∪ B?
(ii) What is A ∩ B?
(iii) Can A ∩ B ∩ C occur?

Solution:

(i) A ∪ B = {Either no 4s in one over or no 6s in one over}


i.e. either of the events A or B occur.
Remember that the outcome {6,6,6,6,6,6} also belongs to A ∪ B, because the event A has occurred even though B did not.
(ii) A ∩ B = {No 4s and no 6s in one over}
i.e. both the events A and B must occur.
(iii) For A ∩ B ∩ C to occur, one should be able to score 20 runs with no 4s and no 6s.
But the maximum number of runs that can be scored in one over with no 4s and no 6s is only 18 (i.e. {3,3,3,3,3,3}).
Hence, A ∩ B ∩ C = empty set =⇒ it cannot occur.

1.4 Venn Diagram


A Venn diagram is a very useful visualization for sets.
We draw the sample space as one big ellipse (or circle, or rectangle), while every subset is represented as a smaller circle inside it.
Let us consider three events A, B and C and represent them in a Venn diagram as follows:

• Union of events:
All covered regions.

The shaded region in the above Venn diagram represents A ∪ B.

• Intersection of events:
Common region.

The shaded region in the above Venn diagram represents A ∩ Bᶜ, i.e. the region of A outside B.
Example:
Rolling a die: Even number, but not a multiple of 3.

• Disjoint events:
No overlap in regions.

In the above Venn diagram, events A and C, as well as events B and C, are disjoint.

Similarly, we can carry out more complicated operations and obtain different-looking regions.
One can also verify De Morgan's laws using a Venn diagram. (Do give it a try!)

2 Introduction to Probability
2.1 Events and Chance
Now that we have defined the sample space, outcomes and events, we are interested in
whether the event will occur or not. If yes, then what are the chances of it occurring?
For instance,

• Tossing a coin: What are the chances of getting a head?

• 13 cards drawn randomly from a well shuffled pack of 52 cards: What are the chances
of getting all 13 cards of the same suit?

• Football world cup: What are the chances of India winning?

• Cricket world cup: What are the chances of India winning?

Intuitively, we can say that the chance of an event occurring is low or high, but we still want to assign precise values of chance to the events in an experiment.

2.2 Mathematical and Probability theory
Probability theory lets us work with numbers that represent chance, i.e. it associates a mathematical value of chance with the occurrence of an event in an experiment.
Probability theory is a mathematically developed theory.

2.2.1 Mathematical Theory


Let us first discuss the ingredients of a mathematical theory:
• Define the basic objects of interest precisely; these come from other, prior fields. For instance, in probability they come primarily from set theory, as discussed in the sections above: the sample space is a set, events are its subsets, and so on.

• Make a few assumptions about the objects, or define conditions which are always valid, i.e. axioms.

• Deduce everything else with logical proof.

2.2.2 Probability Theory


The three important objects in probability theory (which together make up the Probability Space) are:
• Sample space

• Events

• Probability function.
We have already discussed sample space and events in detail in the earlier sections. Let us
now discuss the probability function in detail.

2.2.2.1 Probability
Probability is a function P that assigns to each event a real number between 0 and 1. The
entire probability space (sample space, events and probability function) should satisfy the
following two axioms:
(i) P (S) = 1 (Probability of entire sample space equals 1)

(ii) If E1 , E2 , E3 , . . . are disjoint events then,


P (E1 ∪ E2 ∪ E3 ∪ . . . ) = P (E1 ) + P (E2 ) + P (E3 ) + . . .
This implies that the function takes subsets of the sample space, i.e. events, as input and gives a real number between 0 and 1 as output (referred to as the probability), which represents the chance of occurrence of that event.
Note:

• The higher the output value, the higher the chance of that event occurring.

• 0 means the event cannot occur, and 1 means it will always occur.

• Often, probability can be mentioned in percentages.


As an example, if the probability of an event occurring is 0.9, then we can say there is a 90% chance of it occurring.

Examples
Tossing a coin: S = {H, T}.
Which of the following probability function(s) satisfy the axioms?

(i) P (empty) = 0, P ({H}) = 0.5, P ({T}) = 0.5 and P ({H, T}) = 1

(ii) For 0 ≤ p ≤ 1:
P (empty) = 0, P ({H}) = p, P ({T}) = 1 − p and P ({H, T}) = 1

(iii) P (empty) = 0, P ({H}) = 0.5, P ({T}) = 0.6 and P ({H, T}) = 1

Solution:

(i) • P(S) = P({H, T}) = 1
• The events {H} and {T} are disjoint and,
P({H} ∪ {T}) = P({H, T}) = P(S) = 1, which is the same as
P({H}) + P({T}) = 0.5 + 0.5 = 1
Hence, the given probability function satisfies the axioms.

(ii) • P(S) = P({H, T}) = 1
• The events {H} and {T} are disjoint and,
P({H} ∪ {T}) = P({H, T}) = P(S) = 1, which is the same as
P({H}) + P({T}) = p + (1 − p) = 1
Hence, the given probability function satisfies the axioms.

(iii) • P(S) = P({H, T}) = 1
• The events {H} and {T} are disjoint but,
P({H} ∪ {T}) = P({H, T}) = P(S) = 1, which is not the same as
P({H}) + P({T}) = 0.5 + 0.6 = 1.1
Hence, the given probability function does not satisfy the axioms.
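The same check can be automated. The sketch below (an illustrative addition, with a helper name of my own) tests a candidate probability assignment on a finite sample space against the axioms:

    def satisfies_axioms(p, tol=1e-9):
        # p maps each individual outcome to its probability.
        # On a finite space, defining P(E) as the sum of outcome probabilities
        # makes Axiom 2 (additivity over disjoint events) hold automatically,
        # so it suffices to check the values are in [0, 1] and sum to 1.
        if any(not (0 <= v <= 1) for v in p.values()):
            return False
        return abs(sum(p.values()) - 1) < tol

    print(satisfies_axioms({'H': 0.5, 'T': 0.5}))   # True  (case i)
    print(satisfies_axioms({'H': 0.5, 'T': 0.6}))   # False (case iii)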

2.3 Basic properties of probability
Property 1: Empty Set
The probability of an empty set (denoted by Φ) equals 0.
i.e., P (Φ) = 0

Proof:
We know that an event and its complement are always disjoint.
Since Φᶜ = S, the events S and Φ are disjoint, and Φ ∪ S = S.
By using Axiom (2),

P(S ∪ Φ) = P(S) + P(Φ)
=⇒ P(S) = P(S) + P(Φ)
∴ P(Φ) = 0

Property 2: Complement
Let Eᶜ be the complement of an event E. Then,

P(Eᶜ) = 1 − P(E)

Proof:

We know that an event and its complement are always disjoint. So, E and Eᶜ are disjoint and E ∪ Eᶜ = S.
By using Axiom (2),

P(E ∪ Eᶜ) = P(E) + P(Eᶜ)
P(S) = P(E) + P(Eᶜ)
=⇒ 1 = P(E) + P(Eᶜ) ; (By Axiom 1, P(S) = 1)
∴ P(Eᶜ) = 1 − P(E)

Property 3: Subset
If the event E is a subset of the event F, i.e., E ⊆ F, then

P(F) = P(E) + P(F \ E) =⇒ P(E) ≤ P(F)

Proof:

F \ E = F ∩ Eᶜ (outside E and inside F)

Also, E and F \ E are disjoint and E ∪ (F \ E) = F.
By using Axiom (2),

P(E ∪ (F \ E)) = P(E) + P(F \ E)
=⇒ P(F) = P(E) + P(F \ E)

Since the probability of an event cannot be negative, P(F \ E) ≥ 0 always. So,
P(F) = P(E) + (something non-negative)
∴ P(F) ≥ P(E)

Property 3a: Difference and Intersection


If E and F are events, then

P (E) = P (E ∩ F ) + P (E \F )
P (F ) = P (E ∩ F ) + P (F \E)

Proof:
Since E ∩ F is a subset of E, using the subset property

P (E) = P (E ∩ F ) + P (E \(E ∩ F ))
Now, E \(E ∩ F ) = E \F
Hence, P (E) = P (E ∩ F ) + P (E \F )
Similarly,
P (F ) = P (E ∩ F ) + P (F \E)

Property 4: Union and Intersection


If E and F are events, then

P(E ∪ F) = P(E) + P(F) − P(E ∩ F)

Proof:
The events (E \F ), (E ∩ F ) and (F \E) are disjoint.
Also, E ∪ F = (E \F ) ∪ (E ∩ F ) ∪ (F \E)
By using Axiom (2),

P (E ∪ F ) = P (E \F ) + P (E ∩ F ) + P (F \E)
(As we know that for any events A and B, P (A \B) = P (A) − P (A ∩ B))
=⇒ P (E ∪ F ) = [P (E) − P (E ∩ F )] + P (E ∩ F ) + [P (F ) − P (E ∩ F )]
∴ P (E ∪ F ) = P (E) + P (F ) − P (E ∩ F )

If the events E and F are disjoint, then P(E ∩ F) = 0; plugging this in, we recover Axiom (2).
Hence, this property is a generalization of Axiom (2).
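As a quick numeric illustration (an addition, not from the original text), the union–intersection property can be checked on the die example with a uniform distribution:

    from fractions import Fraction

    # Check P(E ∪ F) = P(E) + P(F) − P(E ∩ F) on a fair die
    S = {1, 2, 3, 4, 5, 6}
    P = lambda event: Fraction(len(event), len(S))   # uniform distribution

    E = {2, 4, 6}   # even number
    F = {3, 6}      # multiple of 3

    print(P(E | F), P(E) + P(F) - P(E & F))     # 2/3 2/3
    print(P(E | F) == P(E) + P(F) - P(E & F))   # True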

2.3.1 Working with probability spaces


Q1. A waiter and a cashier are to be hired. There are 4 applicants - David and Megha from
Delhi, and Rajesh and Veronica from Mumbai. The restaurant hires one person at
random as waiter, and another from the remaining as cashier. Write out the following:

(i) Sample space.


(ii) Event A: Cashier is from Delhi
(iii) Event B: Exactly one position is filled by a Delhiite
(iv) Event C: Neither position is filled by a Delhiite

Solution:
Let us denote, David: D, Megha: M , Rajesh: R and Veronica: V .

(i) The outcome of this experiment is an ordered pair of the form (Waiter, Cashier).
Therefore, the sample space is given by:

S ={(D, M ), (D, R), (D, V ), (M, D), (M, R), (M, V ),


(R, D), (R, M ), (R, V ), (V, D), (V, M ), (V, R)}

(ii) If cashier is from Delhi, then the cashier position can be filled either by David or
by Megha.
Therefore, the event A is given by:

A ={(D, M ), (M, D), (R, D), (R, M ), (V, D), (V, M )}

(iii) If exactly one position is to be filled by a Delhiite, then either the cashier position or the waiter position, but not both, is filled by David or by Megha.
Therefore, the event B is given by:

B ={(D, R), (D, V ), (M, R), (M, V ),


(R, D), (V, D), (R, M ), (V, M )}

(iv) Neither position being filled by a Delhiite means that neither position is filled by David or by Megha.
Therefore, the event C is given by:

C ={(R, V ), (V, R)}

Q2. In a town, there are fishing boats that go out to catch fish every day. Over the years, folks have observed the following:

• The chance of catching more than 400 kg of fish in a day is 35%.

• The chance of catching more than 500 kg of fish in a day is 10%.

What are the chances of catching between 400 and 500 kg of fish in a day?
Solution:
Let us define the events as follows:
A : Catching more than 400 kg of fish in a day.
B : Catching more than 500 kg of fish in a day.
So, P (A) = 0.35 and P (B) = 0.10

Also, B ⊆ A, because all the outcomes of B are contained in A as well.

Now, the event E = {The catch of fish in a day is between 400 and 500 kg} = A \ B,
i.e., the outcome should belong to the event A but not to the event B.

In other words, the catch of fish in a day should be more than 400 kg but not more than 500 kg. Therefore,

P (E) = P (A \B)
(By subset property, if B ⊆ A then, P (A) = P (B) + P (A \B))
=⇒ P (E) = P (A) − P (B)
= 0.35 − 0.10
∴ P (E) = 0.25

Q3. Suppose you hear the following forecast for rain and temperature:

• Chance of rain tomorrow is 60%.

• Chance of maximum temperature above 30°C tomorrow is 70%.
• Chance of rain and maximum temperature above 30°C tomorrow is 40%.

What are the chances of no rain and maximum temperature below 30°C tomorrow?
Solution:
Let us define the events as follows:
A : There will be rain ; Aᶜ : There will be no rain
B : The maximum temperature is above 30°C ; Bᶜ : The maximum temperature is below 30°C
So, P(A) = 0.60, P(B) = 0.70 and P(A ∩ B) = 0.40
Now, the event
E = {There will be no rain and maximum temperature below 30°C} = Aᶜ ∩ Bᶜ
Therefore,

P(E) = P(Aᶜ ∩ Bᶜ)
     = P((A ∪ B)ᶜ) ; (Using De Morgan's Law)
     = 1 − P(A ∪ B)
     = 1 − [P(A) + P(B) − P(A ∩ B)] ; (Using Property 4: Union and Intersection)
     = 1 − [0.60 + 0.70 − 0.40]
∴ P(E) = 0.10

2.4 Distributions
The idea of distributions is to assign probabilities to each of the individual outcomes in the
sample space.
It gives us the sense of how the probabilities are distributed over the outcomes.
This is possible when:

• the outcomes can be enumerated as first, second, third and so on, i.e.,

• the sample space is finite or countable.

Example:
Rolling a die: S = {1,2,3,4,5,6}
Let us suppose that all the individual outcomes are events and,
P({1}) = p1, P({2}) = p2, P({3}) = p3, P({4}) = p4, P({5}) = p5 and P({6}) = p6,
where 0 ≤ pi ≤ 1 ; i = 1, 2, 3, 4, 5, 6.
Also, the events {1}, {2}, {3}, {4}, {5} and {6} are disjoint events,
and {1} ∪ {2} ∪ {3} ∪ {4} ∪ {5} ∪ {6} = S.
Hence, by Axiom (2), p1 + p2 + p3 + p4 + p5 + p6 = P(S) = 1

Therefore, the notion of a distribution in this case is that each of p1, p2, . . . , p6 has to be between 0 and 1, and they have to add up to 1.

We can also find the probability of any complicated event by splitting it as a union of individual outcomes:

P({1, 3, 5}) = P({1} ∪ {3} ∪ {5})
             = P({1}) + P({3}) + P({5}) ; (because these are disjoint events)
             = p1 + p3 + p5

Note:
If the die is fair, then pi = 1/6 for i = 1, 2, 3, 4, 5, 6,
i.e. each individual outcome has the same probability. (Such outcomes are generally referred to as equally likely.)

2.4.1 Uniform Distribution on a finite sample space

S = {finite number of outcomes} and the outcomes are equally likely,
i.e. each individual outcome in the sample space S has the same probability of occurrence.
Hence,

P(an outcome) = 1 / (Number of outcomes in S) = 1/|S|

And,

P(event) = (Number of outcomes in the event) / (Number of outcomes in S)
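For a finite sample space this rule is just counting, so it translates directly into code. The short sketch below is an illustrative addition (the helper name prob is my own):

    from fractions import Fraction

    def prob(event, sample_space):
        # Uniform distribution: P(event) = |event| / |S|
        return Fraction(len(event & sample_space), len(sample_space))

    S = set(range(1, 7))        # rolling a fair die
    print(prob({2, 4, 6}, S))   # 1/2 (even number)
    print(prob({3, 6}, S))      # 1/3 (multiple of 3)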

2.4.2 Solved Examples:


Q1. There are 5 red and 8 blue marbles in an urn. A marble is drawn from the urn at random. What is the probability that a red marble is drawn?
Solution:
The sample space; S = {R1, R2, R3, R4, R5, B1, B2, B3, B4, B5, B6, B7, B8}
Since the marble is drawn at random, the distribution is uniform.
Let us define the event
A = {The marble drawn is red} = {R1, R2, R3, R4, R5}
Therefore,
P(A) = (Number of outcomes in the event A) / (Number of outcomes in S) ; (because of the uniform distribution)
∴ P(A) = 5/13

Q2. In a throw of two fair dice, what is the probability that the sum of the two numbers is
8?

Solution:
The sample space consists of 36 outcomes given by:
S ={(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
Since fair dice are rolled, the distribution is assumed to be uniform.
Let us define the event
A = {The sum of the two numbers is 8} = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
Therefore,
P(A) = (Number of outcomes in the event A) / (Number of outcomes in S) ; (because of the uniform distribution)
∴ P(A) = 5/36
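A simulation provides a useful sanity check on counting arguments like this one. The sketch below (an illustrative addition, not part of the original solution) estimates P(sum = 8) empirically and compares it with 5/36 ≈ 0.1389:

    import random

    random.seed(0)
    trials = 100_000
    hits = sum(1 for _ in range(trials)
               if random.randint(1, 6) + random.randint(1, 6) == 8)
    print(hits / trials)   # ≈ 0.139, close to 5/36 ≈ 0.1389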
Q3. The hats of 3 persons are identical and get mixed up. Each person picks a hat at
random. What is the probability that none of the persons gets their own hat?
Solution:
Let the three persons be P1, P2 and P3, and let their hats be H1, H2 and H3 respectively.
Then the possible outcomes are listed in the table below, where each column represents a person and each row shows the hats picked. The sample space is as follows:

     P1   P2   P3
1    H1   H2   H3
2    H1   H3   H2
3    H2   H1   H3
4    H2   H3   H1
5    H3   H1   H2
6    H3   H2   H1

Since each person picks a hat at random, the distribution is uniform.

Let us define the event
A = {None of the persons gets their own hat} = {(H2, H3, H1), (H3, H1, H2)}
Therefore,
P(A) = (Number of outcomes in the event A) / (Number of outcomes in S) ; (because of the uniform distribution)
∴ P(A) = 2/6
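This "matching hats" computation extends to any number of persons by brute-force enumeration of permutations. A short illustrative sketch (the function name is my own):

    from itertools import permutations

    def p_no_own_hat(n):
        # Fraction of the n! hat assignments with no fixed point (derangements)
        perms = list(permutations(range(n)))
        derangements = [p for p in perms if all(p[i] != i for i in range(n))]
        return len(derangements) / len(perms)

    print(p_no_own_hat(3))   # 2/6 ≈ 0.333, matching the table above
    print(p_no_own_hat(5))   # 44/120 ≈ 0.367, the 5-person case from section 1.3.5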
2.5 Conditional Probability
The basic idea is that we can pass from the initial probability space to a conditional probability space, given that some event has occurred.

2.5.1 Conditional Probability Space


Consider a probability space:
Sample space S, Collection of events and Probability function P .

Let B be an event with P (B) > 0, then the Conditional Probability Space given B is:

• Sample space : B, because the event B has already occurred, and we have moved into
this sample space.

• Events : A ∩ B, for every event A in the original space.

• Probability function : It is given by

P(A|B) = P(A ∩ B) / P(B)

It is also called the conditional probability of A given B, or the probability of A given B.

Remember that the new probability function also satisfies both the axioms:

1) The probability of the entire sample space equals 1, i.e.

P(B|B) = P(B ∩ B) / P(B) = 1

2) If A1, A2, A3, . . . are disjoint events then,
P(A1 ∪ A2 ∪ A3 ∪ . . . |B) = P(A1|B) + P(A2|B) + P(A3|B) + . . .

Note:

• We require P(B) > 0 because if P(B) = 0, then B has no chance of occurring and cannot be observed.

• One can also check that all the basic properties of probability are satisfied by the
probability function P (A|B).

Example:
Toss a coin three times:
Initial sample space ; S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
Let us suppose the event B = {First toss results in tails} has occurred.
Therefore, the reduced probability space to work with for the second and third tosses is:

• Sample space ; B = {THH, TTH, THT, TTT}.

• Events and the probability function can be redefined to account for the occurrence of B.

2.5.2 Multiplication rule:


In some cases, it is easier to work with the conditional probability space than with the original probability space. So, we can compute probabilities like:

For any event A in original space ; P (A ∩ B) = P (B)P (A|B)

This formula is often referred to as the multiplication rule of probability.

It can be generalized to n events by iterating the multiplication rule (do try it!):

P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 ) . . . P (An | A1 ∩ A2 ∩ · · · ∩ An−1 )
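Numerically, the generalized multiplication rule is just a running product of conditional probabilities. The sketch below is an illustrative addition; it anticipates Q3 of the next subsection (drawing three students without replacement):

    from fractions import Fraction

    # P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2)
    # 15 students: 4 from State-1, 8 from State-2, 3 from State-3.
    # Draw State-1, then State-3, then State-1 again, without replacement.
    factors = [Fraction(4, 15), Fraction(3, 14), Fraction(3, 13)]

    p = Fraction(1)
    for f in factors:
        p *= f
    print(p)   # 6/455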

2.5.3 Solved Examples:


Q1. A fair die is rolled. Consider an event E = {2, 4, 6}, then compute the following
probabilities:

(i) P ({2}|E)
(ii) P ({1}|E)
(iii) P ({2, 3, 4}|E)

Solution:
Sample space: S = {1, 2, 3, 4, 5, 6}
Since a fair die is rolled, the distribution is assumed to be uniform, i.e.

P({i}) = 1/6 ; i = 1, 2, 3, 4, 5, 6

The event E = {2, 4, 6}. Therefore,
P(E) = (Number of outcomes in the event) / (Number of outcomes in S) ; (because of the uniform distribution)
∴ P(E) = 3/6

(i) P({2}|E) = P({2} ∩ E) / P(E)
             = P({2}) / P(E)
∴ P({2}|E) = (1/6) / (3/6) = 1/3

(ii) P({1}|E) = P({1} ∩ E) / P(E)
              = P(Φ) / P(E)
∴ P({1}|E) = 0 / (3/6) = 0

(iii) P({2, 3, 4}|E) = P({2, 3, 4} ∩ E) / P(E)
                     = P({2, 4}) / P(E)
∴ P({2, 3, 4}|E) = (2/6) / (3/6) = 2/3
Q2. There are two urns ‘A’ and ‘B’. Urn ‘A’ contains 7 red marbles and 6 blue marbles
while Urn 'B' contains 5 red marbles and 8 blue marbles. An urn is chosen at random and then a marble is drawn at random from the chosen urn. Based on this information, compute the following probabilities:
(i) A red marble is drawn given that Urn 'A' is chosen.
(ii) A blue marble is drawn given that Urn 'B' is chosen.
Solution:
(i) It is given that Urn 'A' is chosen. Hence, we have the conditional probability space, where the sample space is given by:
S = {R1, R2, R3, R4, R5, R6, R7, B1, B2, B3, B4, B5, B6}
Since we are drawing the marble at random, the distribution is uniform.
∴ P(Red Marble | Urn A) = 7/13

(ii) It is given that Urn 'B' is chosen. Hence, we have the conditional probability space, where the sample space is given by:
S = {R1, R2, R3, R4, R5, B1, B2, B3, B4, B5, B6, B7, B8}
Since we are drawing the marble at random, the distribution is uniform.
∴ P(Blue Marble | Urn B) = 8/13
Q3. Consider a class of 15 students: 4 from State-1, 8 from State-2 and 3 from State-
3. Three different students are chosen at random one after another. What is the
probability that the selected three are from State-1, State-3 and State-1 again in that
order?
Solution:
Let us define the following events:
A1 : The first student is from State-1
A2 : The second student is from State-3
A3 : The third student is from State-1
Since students are chosen at random,
P(A1) = 4/15
P(A2|A1) = 3/14 ; (because one student, from State-1, has already been selected)
and P(A3|A1 ∩ A2) = 3/13 ; (because one student each from State-1 and State-3 has already been selected)
The probability that the selected three are from State-1, State-3 and State-1 again in that order is P(A1 ∩ A2 ∩ A3).
Now,

P(A1 ∩ A2 ∩ A3) = P(B ∩ A) (taking A1 = B and A2 ∩ A3 = A)
= P(B)P(A|B) (using the multiplication rule)
= P(A1)P(A2 ∩ A3|A1)
= (4/15) × P(A2 ∩ A3|A1)
= (4/15) × P(A2|A1)P(A3|A1 ∩ A2) (again, using the multiplication rule)
= (4/15) × (3/14) × (3/13)
∴ P(A1 ∩ A2 ∩ A3) = 6/455

Q4. A family has 2 children. What is the probability that both are girls, given that at least
one is a girl?
Solution:

Let us denote Boy: B and Girl: G.
Then the sample space is S = {BB, BG, GB, GG}, and the distribution is assumed to be uniform.
Let us define the following events:
E = {At least one is a girl} = {GB, BG, GG}
F = {Both are girls} = {GG}
Therefore, the probability that both are girls, given that at least one is a girl, is:

P(F|E) = P(F ∩ E) / P(E)
       = P({GG} ∩ {GB, BG, GG}) / P({GB, BG, GG})
       = P({GG}) / P({GB, BG, GG})
       = (1/4) / (3/4) ; (because of the uniform distribution)
∴ P(F|E) = 1/3

2.6 Bayes’ Theorem and Independence


2.6.1 Law of Total Probability
Consider a probability space:
Sample space S, Collection of events and Probability function P .

Let us consider events A and B such that the event B partitions the sample space into two parts, B and Bᶜ, and A is an event of interest.
Then, the Law of Total Probability states that:
P(A) = P(A ∩ B) + P(A ∩ Bᶜ) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)

Proof:
Since A ∩ B and A ∩ Bᶜ are disjoint events and A = (A ∩ B) ∪ (A ∩ Bᶜ),
by using Axiom (2),

P(A) = P(A ∩ B) + P(A ∩ Bᶜ)
=⇒ P(A) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ) ; (Using the Multiplication rule)

Generalization:

If the sample space is partitioned into n events, say B1, B2, . . . , Bn, then for any event A in the sample space:

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + · · · + P(A|Bn)P(Bn)
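In code, the law of total probability is a weighted sum over the partition. A small illustrative sketch (the function name is my own; the numbers are from Q1 below):

    from fractions import Fraction

    def total_probability(priors, likelihoods):
        # P(A) = sum over i of P(A|Bi) * P(Bi)
        return sum(l * p for l, p in zip(likelihoods, priors))

    # Two urns chosen with probability 1/2 each;
    # P(red | Urn A) = 7/13 and P(red | Urn B) = 5/13
    priors      = [Fraction(1, 2), Fraction(1, 2)]
    likelihoods = [Fraction(7, 13), Fraction(5, 13)]
    print(total_probability(priors, likelihoods))   # 6/13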

2.6.1.1 Solved Examples:

Q1. There are two urns 'A' and 'B'. Urn 'A' contains 7 red marbles and 6 blue marbles, while Urn 'B' contains 5 red marbles and 8 blue marbles. An urn is chosen at random and then a marble is drawn at random from the chosen urn. Find the probability that a red marble is drawn.
Solution:
In Q2 of section 2.5.3, we have already computed that:
P(Red Marble | Urn A) = 7/13 ; P(Red Marble | Urn B) = 5/13
P(Blue Marble | Urn A) = 6/13 ; P(Blue Marble | Urn B) = 8/13
Let us define the events:
B1 : The marble is drawn from Urn 'A'
B2 : The marble is drawn from Urn 'B'
R : The marble drawn is red
The events B1 and B2 partition the sample space into two parts, i.e. B1ᶜ = B2.
Therefore, by using the Law of Total Probability, we get:
P(R) = P(R|B1)P(B1) + P(R|B2)P(B2)
     = (7/13 × 1/2) + (5/13 × 1/2)
∴ P(R) = 6/13
Q2. An economic model predicts that if interest rates rise, then there is a 60% chance that
unemployment will increase, but that if interest rates do not rise, then there is only a
30% chance that unemployment will increase. If the economist believes that there is
a 40% chance that interest rates will rise, what should she calculate is the probability
that unemployment will increase?
Solution:
Let us define the events:
B1 : The interest rates rise
B2 : The interest rates do not rise
A : There is an increase in unemployment
Then,
P(B1) = 40/100 ; P(A|B1) = 60/100
P(B2) = 60/100 ; P(A|B2) = 30/100
Either the interest rates will rise or they will not. Hence, the events B1 and B2 partition the sample space into two parts (B1ᶜ = B2).
Therefore, by using the Law of Total Probability, we get:
P(A) = P(A|B1)P(B1) + P(A|B2)P(B2)
     = (60/100 × 40/100) + (30/100 × 60/100)
∴ P(A) = 0.42

Q3. A man possesses 5 coins- 2 are double-headed, 1 is double-tailed, and 2 are normal.
He picks a coin at random and tosses it. What is the probability that he sees a head?
Solution:
Let us define the events:
B1 : A double-headed coin is tossed
B2 : The double-tailed coin is tossed
B3 : A normal coin is tossed
A : He sees a head
Then,
P(B1) = 2/5 ; P(A|B1) = 1
P(B2) = 1/5 ; P(A|B2) = 0
P(B3) = 2/5 ; P(A|B3) = 1/2
The events B1, B2 and B3 partition the sample space into three parts.
Therefore, by using the Law of Total Probability, we get:

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + P(A|B3)P(B3)
     = (1 × 2/5) + (0 × 1/5) + (1/2 × 2/5)
∴ P(A) = 0.60

2.6.2 Bayes’ Theorem


Consider a probability space:
Sample space S, Collection of events and Probability function P .

Bayes' theorem states that if A and B are events with P(A) > 0 and P(B) > 0, then:

P(A ∩ B) = P(B)P(A|B)    (1)
and, P(A ∩ B) = P(A)P(B|A)    (2)

From (1) and (2), we get:

P(B|A) = P(B)P(A|B) / P(A)

Note:
• Bayes’ theorem allows us to move across different conditional probability spaces.
• Bayes' theorem, together with the Law of Total Probability, is at the heart of many applications of probability.

2.6.2.1 Solved Examples:

Q1. There are two urns 'A' and 'B'. Urn 'A' contains 7 red marbles and 6 blue marbles, while Urn 'B' contains 5 red marbles and 8 blue marbles. An urn is chosen at random and then a marble is drawn at random from the chosen urn. Find the probability that Urn 'A' was chosen, given that a red marble is drawn.
Solution:
With the events B1 (the marble is drawn from Urn 'A'), B2 (the marble is drawn from Urn 'B') and R (the marble drawn is red), in Q1 of section 2.6.1.1 we have already computed that:
P(R|B1) = 7/13 ; P(R|B2) = 5/13
P(Rᶜ|B1) = 6/13 ; P(Rᶜ|B2) = 8/13
and P(R) = 6/13 ; (by the Law of Total Probability)
Therefore, by using Bayes' theorem, we get:
P(B1|R) = P(R|B1)P(B1) / P(R)
        = (7/13 × 1/2) / (6/13)
∴ P(B1|R) = 7/12
Q2. In a city, 1% of people have Swine Flu. In the Flu test, 95% of people with Swine
flu test positive, while 2% of people without the disease will test positive. A person
is randomly chosen from the city and tests positive. What is the probability that the
person actually has Swine Flu?
Solution:
Let us define the events,
A: Person tests positive
B: Person has Swine Flu
=⇒ B c : Person does not have Swine Flu
Then,

P(B) = 0.01 ; P(A|B) = 0.95
P(Bᶜ) = 0.99 ; P(A|Bᶜ) = 0.02

The events B and Bᶜ partition the sample space into two parts. Therefore, by using the Law of Total Probability, we get:

P(A) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)
     = (0.95 × 0.01) + (0.02 × 0.99)
=⇒ P(A) = 0.0293

Now, by using Bayes' Theorem, we get:

P(B|A) = P(A|B)P(B) / P(A)
       = (0.01 × 0.95) / 0.0293
∴ P(B|A) = 0.3242

This implies that there is only a 32.42% chance that the person has Swine Flu, given that the person tests positive. Hence, a positive test alone is not very reliable.
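The posterior computation generalises to any partition. Below is a short illustrative sketch (the function name is my own) reproducing the Swine Flu numbers:

    def posterior_first(likelihoods, priors):
        # Bayes + total probability: P(B1|A) = P(A|B1)P(B1) / sum_i P(A|Bi)P(Bi)
        evidence = sum(l * p for l, p in zip(likelihoods, priors))
        return likelihoods[0] * priors[0] / evidence

    # B1 = person has Swine Flu, B2 = person does not; A = person tests positive
    print(round(posterior_first([0.95, 0.02], [0.01, 0.99]), 4))   # 0.3242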

Q3. A student attempting an MCQ with 4 choices (of which one is correct) knows the correct answer with probability 3/4. If she does not know the answer, she guesses a choice at random. Given that a question was answered correctly, what is the conditional probability that she knew the answer?
Solution:
Let us define the events,
A: She answered the question correctly
B: She knows the correct answer of the question
=⇒ B c : She guesses the answer of the question
Then,

P(B) = 3/4 ; P(A|B) = 1
P(Bᶜ) = 1/4 ; P(A|Bᶜ) = 1/4

The events B and Bᶜ partition the sample space into two parts. Therefore, by using the Law of Total Probability, we get:

P(A) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)
     = (1 × 3/4) + (1/4 × 1/4)
=⇒ P(A) = 13/16

Now, by using Bayes' Theorem, we get:

P(B|A) = P(A|B)P(B) / P(A)
       = (1 × 3/4) / (13/16)
       = 12/13
∴ P(B|A) ≈ 0.9231

This implies that there is approximately a 92.31% chance that she knew the answer, given that the question was answered correctly.
Q4. You first roll a fair die, then toss as many fair coins as the number that showed on the
die. Given that 5 heads are obtained, what is the probability that the die showed 5?
Solution:
Let us define the events:
Ei : The die showed the number i ; i = 1, 2, 3, 4, 5, 6
A : 5 heads are obtained
Then,
P(Ei) = 1/6 ; i = 1, 2, 3, 4, 5, 6, and P(A|Ei) = 0 for i = 1, 2, 3, 4.
Counting equally likely coin-toss outcomes, we get:
P(A|E5) = 1/32 and P(A|E6) = 6/64 = 3/32
The events E1, E2, . . . , E6 partition the sample space into six parts. Therefore, by using the Law of Total Probability, we get:
P(A) = P(A|E1)P(E1) + P(A|E2)P(E2) + · · · + P(A|E6)P(E6)
     = 0 + (1/6 × 1/32) + (1/6 × 3/32)
=⇒ P(A) = 1/48
Now, by using Bayes' theorem, we get:
P(E5|A) = P(A|E5)P(E5) / P(A)
        = (1/32 × 1/6) / (1/48)
        = 1/4
∴ P(E5|A) = 0.25

2.6.3 Independence of Two Events
Two events A and B are independent if:

P (A ∩ B) = P (A)P (B) ; P (A) > 0, P (B) > 0

In other words, the events A and B are independent if the probability of occurrence of A
(B) is unaffected by the occurrence of B (A), i.e. P (A|B) = P (A) and P (B|A) = P (B).
If the two events A and B are independent, then it implies that:

(i) The events A and Bᶜ are also independent.


Proof:
The events A ∩ B and A ∩ Bᶜ are disjoint and (A ∩ B) ∪ (A ∩ Bᶜ) = A.
By using Axiom (2),

P((A ∩ B) ∪ (A ∩ Bᶜ)) = P(A ∩ B) + P(A ∩ Bᶜ)
P(A) = P(A ∩ B) + P(A ∩ Bᶜ)
=⇒ P(A ∩ Bᶜ) = P(A) − P(A ∩ B)
             = P(A) − P(A)P(B) ; (A and B are independent events)
             = P(A)[1 − P(B)]
∴ P(A ∩ Bᶜ) = P(A)P(Bᶜ)

(ii) The events Aᶜ and B are also independent.


Proof:
The events A ∩ B and Aᶜ ∩ B are disjoint and (A ∩ B) ∪ (Aᶜ ∩ B) = B.
By using Axiom (2),

P((A ∩ B) ∪ (Aᶜ ∩ B)) = P(A ∩ B) + P(Aᶜ ∩ B)
P(B) = P(A ∩ B) + P(Aᶜ ∩ B)
=⇒ P(Aᶜ ∩ B) = P(B) − P(A ∩ B)
             = P(B) − P(A)P(B) ; (A and B are independent events)
             = [1 − P(A)]P(B)
∴ P(Aᶜ ∩ B) = P(Aᶜ)P(B)

(iii) The events Aᶜ and Bᶜ are also independent.


Proof:

P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) ; (Using De Morgan's Law)
           = 1 − P(A ∪ B)
           = 1 − [P(A) + P(B) − P(A ∩ B)]
           = 1 − P(A) − P(B) + P(A)P(B) ; (A and B are independent events)
           = [1 − P(A)] − P(B)[1 − P(A)]
           = [1 − P(A)][1 − P(B)]
∴ P(Aᶜ ∩ Bᶜ) = P(Aᶜ)P(Bᶜ)

Note:
• Independence is defined very precisely in probability; one should not rely on intuition alone to conclude that events are independent.
• If the two events A and B are independent, then the multiplication rule gives the probability that both A and B occur as P(A ∩ B) = P(A)P(B).
• If the two events A and B are disjoint and B occurs, then A definitely did not occur. Thus, disjoint events (with positive probabilities) can never be independent, as the occurrence of B impacts the conditional probability of A.
=⇒ For events to be independent, they should have a non-empty intersection.

2.6.3.1 Solved Examples:

Q1. If a fair coin is tossed thrice and the events are defined as follows:
A: First toss is heads
B: Second toss is heads
Then, check the following:
(i) Are events A and B independent?
(ii) Are events A and Bᶜ independent?
Solution:
The sample space; S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Since the same fair coin is tossed thrice, the distribution is uniform.
A = {HHH, HHT, HTH, HTT} ; P(A) = 1/2
B = {HHH, HHT, THH, THT} ; P(B) = 1/2
and A ∩ B = {HHH, HHT} ; P(A ∩ B) = 1/4

(i) Since
P(A ∩ B) = 1/4 = (1/2) × (1/2) = P(A)P(B),
A and B are independent events.

(ii) Since the events A and B are independent, the events A and Bᶜ are also independent.
Q2. A fair die is rolled and the events are defined as follows:
A: The die showed an even number
B: The die showed an odd number
Are events A and B independent?
Solution:
The sample space; S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6} ; P(A) = 1/2
B = {1, 3, 5} ; P(B) = 1/2
A ∩ B = Φ =⇒ A and B are disjoint events
Since A and B are disjoint events (with positive probabilities), they cannot be independent.
Therefore, A and B are dependent events.
Q3. If a card is drawn from a well shuffled pack of 52 cards, and the events are defined as
follows:
A : Card is a spade
B : Card is a king
Are events A and B independent?
Solution:
A = {Spade card} ; P(A) = 13/52
B = {King card} ; P(B) = 4/52
A ∩ B = {King of Spades} ; P(A ∩ B) = 1/52
Now,
P(A)P(B) = (13/52) × (4/52) = 1/52
∴ P(A)P(B) = P(A ∩ B)
Hence, the events A and B are independent.
Remark: A single outcome can belong to multiple events. That is why, even though only one card is drawn, two or more events can still be independent.
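Independence on a finite uniform space can be checked purely by counting. A small illustrative sketch applied to Q1 of this subsection:

    from fractions import Fraction
    from itertools import product

    S = [''.join(t) for t in product('HT', repeat=3)]   # 8 outcomes of 3 tosses
    P = lambda E: Fraction(len(E), len(S))

    A = {s for s in S if s[0] == 'H'}   # first toss is heads
    B = {s for s in S if s[1] == 'H'}   # second toss is heads

    print(P(A & B) == P(A) * P(B))      # True: A and B are independent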

2.6.4 Mutual Independence of Three Events


The events A, B and C are mutually independent if:
(1) P (A ∩ B ∩ C) = P (A)P (B)P (C)

(2) P (A ∩ B) = P (A)P (B)

(3) P (A ∩ C) = P (A)P (C)

(4) P (B ∩ C) = P (B)P (C)


Generalization:
The events A1 , A2 , . . . An are mutually independent if, for all i1 , i2 , . . . ik :

P (Ai1 ∩ Ai2 ∩ . . . ∩ Aik ) = P (Ai1 )P (Ai2 ) . . . P (Aik )

Note:
• For mutual independence of n events, there are almost 2ⁿ constraints, i.e.
P({intersection of any subset of the events}) = product of the P({events})

• If n events are mutually independent, then any subset of the events, with or without complementing, is independent as well.

2.6.4.1 Solved Examples:

Q1. A fair coin is tossed twice and the events are defined as follows:
A: Both tosses show the same face (both heads or both tails)
B: First toss is heads
C: Second toss is heads
Then, check the three events for independence.
Solution:
The sample space; S = {HH, HT, TH, TT}
Since the same fair coin is tossed twice, the distribution is uniform.
A = {HH, TT} ; P(A) = 2/4
B = {HH, HT} ; P(B) = 2/4
C = {HH, TH} ; P(C) = 2/4
Now,
A ∩ B = {HH} ; P(A ∩ B) = 1/4 = P(A)P(B) =⇒ Events A and B are independent
A ∩ C = {HH} ; P(A ∩ C) = 1/4 = P(A)P(C) =⇒ Events A and C are independent
B ∩ C = {HH} ; P(B ∩ C) = 1/4 = P(B)P(C) =⇒ Events B and C are independent
But, A ∩ B ∩ C = {HH} ; P(A ∩ B ∩ C) = 1/4 ≠ P(A)P(B)P(C) = 1/8
Hence, the events A, B and C are not mutually independent, but they are pairwise independent, i.e. every pair of events is independent.

Q2. Two roads connect A to B, and two roads connect B to C. Each of the four roads gets blocked with probability p, independently of all other roads. What is the probability that there is an open route from A to B, given that there is no open route from A to C?

Solution:
Let us define the events:
E1 : The first road from A to B, i.e. Road 1, gets blocked
E2 : The second road from A to B, i.e. Road 2, gets blocked
E3 : The first road from B to C, i.e. Road 3, gets blocked
E4 : The second road from B to C, i.e. Road 4, gets blocked
F : There is no open route from A to B
E : There is no open route from A to C
It is given that each road gets blocked with probability p, independently of all other roads. This implies that the four events are mutually independent and
P(Ei) = p ; i = 1, 2, 3, 4

Now,
P(Fᶜ) = P(Either Road 1 or Road 2 is open)
      = P(E1ᶜ ∪ E2ᶜ)
      = P((E1 ∩ E2)ᶜ) ; (Using De Morgan's Law)
      = 1 − P(E1 ∩ E2)
      = 1 − P(E1)P(E2) ; (E1 and E2 are independent events)
∴ P(Fᶜ) = 1 − p²
and P(F) = 1 − P(Fᶜ) = p²
Also, P(E|F) = 1,
because if there is no open route from A to B, then certainly there is no open route from A to C.
And,
P(E|Fᶜ) = P(Road 3 and Road 4 are both blocked)
        = P(E3 ∩ E4)
        = P(E3)P(E4) ; (E3 and E4 are independent events)
∴ P(E|Fᶜ) = p²
Therefore,
P(E) = P(E|F)P(F) + P(E|Fᶜ)P(Fᶜ) ; (Law of Total Probability)
     = (1 × p²) + (p² × (1 − p²))
∴ P(E) = 2p² − p⁴
Hence, the probability that there is an open route from A to B given that there is no open route from A to C is:
P(Fᶜ|E) = P(E ∩ Fᶜ) / P(E)
        = P(E|Fᶜ)P(Fᶜ) / P(E) ; (Using the Multiplication rule)
        = p²(1 − p²) / (2p² − p⁴)
∴ P(Fᶜ|E) = (1 − p²) / (2 − p²)
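A Monte Carlo simulation is a good sanity check on a conditional-probability derivation like this one. The sketch below (an illustrative addition) estimates P(Fᶜ|E) for p = 0.5 and compares it with (1 − p²)/(2 − p²) = 0.75/1.75 ≈ 0.4286:

    import random

    random.seed(1)
    p, trials = 0.5, 200_000
    numer = denom = 0
    for _ in range(trials):
        blocked = [random.random() < p for _ in range(4)]   # roads 1..4
        ab_open = not (blocked[0] and blocked[1])           # some A-B road open
        bc_open = not (blocked[2] and blocked[3])           # some B-C road open
        if not (ab_open and bc_open):                       # event E occurred
            denom += 1
            if ab_open:                                     # event F complement occurred
                numer += 1
    print(numer / denom)   # ≈ 0.43, close to (1 − p²)/(2 − p²) ≈ 0.4286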

2.7 Repeated Independent Trials


2.7.1 Bernoulli Distribution

2.7.1.1 Single Bernoulli Trial


Consider a probability space: Sample space S, a collection of events and a probability function P.
Consider an event A such that P(A) = p.
If, in an experiment, the occurrence of the event A is considered a success and the non-occurrence of A a failure, then we can interpret the experiment as a Bernoulli trial as follows:

• Sample space (with only two possible outcomes); S = {Success, Failure}.

• P(Success) = p

Or, if we denote success by 1 and failure by 0, then

S = {0, 1} with P(1) = p and P(0) = 1 − p

This distribution is denoted as Bernoulli(p).

2.7.1.2 Repeated Bernoulli Trials


Consider a Bernoulli(p) trial with S = {0, 1}.
If this Bernoulli(p) trial is repeated n times independently, then the sample space will consist of 2^n outcomes.
Example:
If n = 3, then S = {000, 001, 010, 100, 110, 101, 011, 111} with probabilities as follows:

• P(000) = P(0 in trial 1 and 0 in trial 2 and 0 in trial 3)
= P(0 in trial 1)P(0 in trial 2)P(0 in trial 3) ; (Trials are independent)
= (1 − p)(1 − p)(1 − p)
∴ P(000) = (1 − p)^3

• P(101) = P(1 in trial 1 and 0 in trial 2 and 1 in trial 3)
= P(1 in trial 1)P(0 in trial 2)P(1 in trial 3) ; (Trials are independent)
= p(1 − p)p
∴ P(101) = p^2 (1 − p)

Generalization:
If a Bernoulli(p) trial is repeated n times independently, then

P(b1 b2 . . . bn) = p^w (1 − p)^(n−w), where w = number of 1s in b1, b2, . . . , bn
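This generalization translates directly into code. A small Python helper (a sketch, not from the text) computes the probability of any particular outcome string:

def string_prob(bits: str, p: float) -> float:
    # P(b1 b2 ... bn) = p^w (1-p)^(n-w), where w counts the 1s
    w = bits.count("1")
    return p**w * (1 - p)**(len(bits) - w)

print(string_prob("000", 0.5))   # (1-p)^3 = 0.125
print(string_prob("101", 0.5))   # p^2 (1-p) = 0.125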

2.7.1.3 Solved Examples:

Q1. Suppose a fair coin is tossed five times. If in each toss of the coin, getting tails is considered a success, then compute the following:

(i) Probability of getting 0 tails
(ii) Probability of getting 2 tails

Solution:
For a single toss of a fair coin, consider the event A = {Tail}.
Since the occurrence of the event A is considered a success and its non-occurrence a failure, we can interpret each toss as a Bernoulli(p) trial with:

• S = {0, 1}, where tail is denoted by 1 and head by 0.


• p = 1/2 (because the coin is fair)

Now the coin is tossed 5 times, i.e., the Bernoulli(0.5) trial is repeated 5 times independently, which implies that the sample space will consist of 2^5 (i.e., 32) equally likely outcomes.

(i) P(0 Tails) = P(HHHHH)
= P(0 in all 5 trials)
= P(00000)
= (1/2)^0 × (1 − 1/2)^(5−0) ; (Trials are independent)
= (1/2)^5
∴ P(0 Tails) = 1/32

(ii) P(2 Tails) = P({TTHHH, THTHH, THHTH, THHHT, HTTHH, HTHTH, HTHHT, HHTTH, HHTHT, HHHTT})
= P({11000, 10100, 10010, 10001, 01100, 01010, 01001, 00110, 00101, 00011})
Since the events {11000}, {10100}, . . . , {00011} are disjoint,
P(2 Tails) = P({11000}) + P({10100}) + . . . + P({00011})
= (1/2)^5 + (1/2)^5 + . . . + (1/2)^5 ; (Trials are independent)
= 10 × (1/2)^5
∴ P(2 Tails) = 10/32

Q2. Suppose a biased coin with P(T) = 2/3 is tossed five times. If in each toss of the coin, getting tails is considered a success, then compute the following:

(i) Probability of getting 1 tail


(ii) Probability of getting 5 tails

Solution:
For a single toss of the given coin, consider the event A = {Tail}.
Since the occurrence of the event A is considered a success and its non-occurrence a failure, we can interpret each toss as a Bernoulli(p) trial with:

• S = {0, 1} where, tail is denoted by 1 and head by 0.


• p = 2/3 (given)

Now the coin is tossed 5 times, i.e., the Bernoulli(2/3) trial is repeated 5 times independently, which implies that the sample space will consist of 2^5 (i.e., 32) outcomes.

(i) P(1 Tail) = P({THHHH, HTHHH, HHTHH, HHHTH, HHHHT})
= P({10000, 01000, 00100, 00010, 00001})
Since the events {10000}, {01000}, . . . , {00001} are disjoint,
P(1 Tail) = P({10000}) + P({01000}) + . . . + P({00001})
= (2/3)(1/3)^(5−1) + (2/3)(1/3)^(5−1) + . . . + (2/3)(1/3)^(5−1) ; (Trials are independent)
= 5 × (2/3) × (1/3)^4
∴ P(1 Tail) = 10/243

(ii) P(5 Tails) = P(TTTTT)
= P(1 in all the 5 trials)
= P(11111)
= (2/3)^5 × (1 − 2/3)^(5−5) ; (Trials are independent)
= (2/3)^5
∴ P(5 Tails) = 32/243

2.7.2 Binomial Distribution

Repeated Bernoulli trials (section 2.7.1.2) lead naturally to the Binomial distribution.
Consider a Bernoulli(p) trial with S = {0, 1}.
If this Bernoulli(p) trial is repeated n times independently, then the number of successes
in these n trials is given by Binomial distribution as follows:

• Sample space; S = {0, 1, 2, . . . , n}


Because we can have 0 successes, 1 success, 2 successes, . . . , n successes.
• Distribution function is given by

P(B(n, p) = k) = nCk p^k (1 − p)^(n−k) ; k = 0, 1, 2, . . . , n


Proof:

P(B(n, p) = k) = P(Out of n Bernoulli(p) trials exactly k are 1s)
= P(Trials b1, b2, . . . , bn result in exactly k 1s)
= (Number of strings b1 b2 . . . bn with exactly k 1s) × p^k (1 − p)^(n−k)

The number of ways to get exactly k 1s in n trials is nCk.

∴ P(B(n, p) = k) = nCk p^k (1 − p)^(n−k) ; k = 0, 1, 2, . . . , n

Remark:

P(B(n, p) = 0 or B(n, p) = 1 or . . . or B(n, p) = n) = 1
=⇒ P(B(n, p) = 0) + P(B(n, p) = 1) + . . . + P(B(n, p) = n) = 1 ; (Because the events are disjoint)
=⇒ (1 − p)^n + nC1 p(1 − p)^(n−1) + nC2 p^2 (1 − p)^(n−2) + . . . + p^n = 1
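The PMF and the normalization identity in the remark are easy to check numerically. A minimal Python sketch using only the standard library (the choice n = 10, p = 0.3 is arbitrary):

from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    # nCk p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
# The PMF values over k = 0, 1, ..., n must sum to 1.
assert abs(sum(binomial_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
print(binomial_pmf(3, n, p))   # P(B(10, 0.3) = 3) ≈ 0.2668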

2.7.2.1 Visualisation of Binomial Distribution

Note:

• The plot starts at (1 − p)^n, increases until it reaches the peak, and then falls to p^n.
• The peak is roughly around np; the exact values are as follows:
(1) If (n + 1)p is an integer, then the PMF is bimodal (i.e. has two peaks) and the two peak values occur at (n + 1)p and (n + 1)p − 1.
(2) If (n + 1)p is not an integer, then there is a unique modal value (i.e. a unique peak), namely the integral part of (n + 1)p.
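The peak rule can also be checked numerically. In the sketch below (an illustration, not from the text), Binomial(9, 0.5) has (n + 1)p = 5, an integer, so two equal peaks appear at 4 and 5, while Binomial(10, 0.3) has (n + 1)p = 3.3 and a unique mode at 3:

from math import comb, floor

def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 9, 0.5                       # (n+1)p = 5, an integer: two peaks
vals = [pmf(k, n, p) for k in range(n + 1)]
peak = max(vals)
print([k for k, v in enumerate(vals) if abs(v - peak) < 1e-12])  # [4, 5]

n, p = 10, 0.3                      # (n+1)p = 3.3: unique mode at 3
vals = [pmf(k, n, p) for k in range(n + 1)]
print(vals.index(max(vals)), floor((n + 1) * p))                 # 3 3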

2.7.2.2 Solved Examples:

Q1. Each person has a disease with probability 0.1 independently. Out of 100 random
persons tested for the disease, what is the probability that 20 persons test positive?
Assume that the disease can be tested accurately with no false positives.

Solution:
Each test can be considered as a Bernoulli(p) trial, where success is the person testing positive and p = 0.1.
Now, this Bernoulli(0.1) trial is repeated 100 times independently. Hence, the proba-
bility of getting 20 successes in these 100 trials is given by:

P(B(100, 0.1) = 20) = 100C20 (0.1)^20 (1 − 0.1)^(100−20)
= 100C20 (0.1)^20 (0.9)^80
∴ P(B(100, 0.1) = 20) ≈ 0.0012

Therefore, probability that out of 100 random persons tested for the disease, exactly
20 persons test positive is 0.0012 approximately.

Q2. Suppose a fair coin is tossed 10 times, then find the following probabilities:

(i) The number of heads is a multiple of 3.


(ii) The number of heads is even.

Solution:
Each toss of a fair coin can be considered as a Bernoulli(p) trial, where success is
getting a head and p = 0.5.
Now, this Bernoulli(0.5) trial is repeated 10 times independently.
Hence, its distribution is defined as Binomial(10, 0.5), say B with sample space
S = {0, 1, 2, . . . , 10}
Let us define the events
E : {Number of heads is a multiple of 3} = {0, 3, 6, 9}
F : {Number of heads is even} = {0, 2, 4, 6, 8, 10}

(i) The probability that the number of heads is a multiple of 3 is given by:
P(E) = P(B = 0 or B = 3 or B = 6 or B = 9)
= P(B = 0) + P(B = 3) + P(B = 6) + P(B = 9) ; (Because the events are disjoint)
= (1/2)^10 + 10C3 (1/2)^3 (1 − 1/2)^7 + 10C6 (1/2)^6 (1 − 1/2)^4 + 10C9 (1/2)^9 (1 − 1/2)^1
= (1/2)^10 [1 + 10C3 + 10C6 + 10C9]
∴ P(E) = 341/1024 ≈ 0.33
Hence, the probability that the number of heads is a multiple of 3 is about 0.33.


(ii) The probability that the number of heads is even is given by:
P(F) = P(B = 0 or B = 2 or B = 4 or B = 6 or B = 8 or B = 10)
Since the events B = 0, B = 2, . . . , B = 10 are disjoint,
P(F) = P(B = 0) + P(B = 2) + P(B = 4) + P(B = 6) + P(B = 8) + P(B = 10)
= (1/2)^10 + 10C2 (1/2)^2 (1 − 1/2)^8 + 10C4 (1/2)^4 (1 − 1/2)^6 + 10C6 (1/2)^6 (1 − 1/2)^4 + 10C8 (1/2)^8 (1 − 1/2)^2 + (1/2)^10
= (1/2)^10 [1 + 10C2 + 10C4 + 10C6 + 10C8 + 10C10]
∴ P(F) = 512/1024 = 0.50
Hence, the probability that the number of heads is even is 0.50.

Q3. A bit (0 or 1) sent by Alice to Bob gets flipped with probability 0.1.

(i) If 5 bits are sent by Alice independently, what is the probability that at most 2
bits get flipped?
(ii) If 10 bits are sent by Alice independently, what is the probability that at most 2
bits get flipped?

Solution:
Each bit sent by Alice can be considered as a Bernoulli(p) trial, where success is a bit
getting flipped and p = 0.1.

(i) The Bernoulli(0.1) trial is repeated 5 times independently.
Hence, its distribution is defined as Binomial(5, 0.1), say B with sample space;
S = {0, 1, 2, . . . , 5}
Let us define an event E : {At most 2 bits get flipped} = {0, 1, 2}

P(E) = P(B = 0 or B = 1 or B = 2)
= P(B = 0) + P(B = 1) + P(B = 2) ; (Because the events are disjoint)
= (1 − 0.1)^5 + 5C1 (0.1)(1 − 0.1)^(5−1) + 5C2 (0.1)^2 (1 − 0.1)^(5−2)
= (0.9)^5 + 5C1 (0.1)(0.9)^4 + 5C2 (0.1)^2 (0.9)^3
∴ P(E) = 0.9914
Hence, the probability that at most 2 of the 5 bits get flipped is 0.9914.
(ii) The Bernoulli(0.1) trial is repeated 10 times independently.
Hence, its distribution is defined as Binomial(10, 0.1), say B, with sample space;
S = {0, 1, 2, . . . , 10}
Let us define an event E : {At most 2 bits get flipped} = {0, 1, 2}

P(E) = P(B = 0 or B = 1 or B = 2)
= P(B = 0) + P(B = 1) + P(B = 2) ; (Because the events are disjoint)
= (1 − 0.1)^10 + 10C1 (0.1)(1 − 0.1)^(10−1) + 10C2 (0.1)^2 (1 − 0.1)^(10−2)
= (0.9)^10 + 10C1 (0.1)(0.9)^9 + 10C2 (0.1)^2 (0.9)^8
∴ P(E) = 0.9298
Hence, the probability that at most 2 of the 10 bits get flipped is 0.9298.
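Both parts follow the same "at most k" pattern, which suggests a small helper. A Python sketch (not part of the original solution) reproducing the two answers:

from math import comb

def at_most(kmax, n, p):
    # P(B(n, p) <= kmax) as a finite sum of PMF terms
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(kmax + 1))

print(round(at_most(2, 5, 0.1), 4))    # 0.9914
print(round(at_most(2, 10, 0.1), 4))   # 0.9298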

2.7.3 Geometric Distribution

Repeated Bernoulli trials (section 2.7.1.2) also lead to the Geometric distribution.
Consider a Bernoulli(p) trial with S = {0, 1}.
If this Bernoulli(p) trial is repeated independently, then the number of trials needed
for the first success is given by the Geometric distribution as follows:

• Sample space; S = {1, 2, 3, 4, 5, 6, . . .}


This is because we can get the first success in 1st trial, 2nd trial, 3rd trial, . . ., and
so on.
• P(G(p) = k) = (1 − p)^(k−1) p ; k = 1, 2, 3, . . .

Proof:

P(G(p) = k) = P(Failure in the first (k − 1) trials and success in the k-th trial)
= P(First (k − 1) trials result in 0 and k-th trial is 1)
= P(00 . . . 0 1), where 0 appears (k − 1) times
∴ P(G(p) = k) = (1 − p)^(k−1) p ; k = 1, 2, 3, . . .

Remark:

• The number of trials is not fixed in advance, since we keep repeating the trials independently until we get the first success.
• The probability of getting the first success within the first k trials is given by:

P(G(p) ≤ k) = P(G(p) = 1 or G(p) = 2 or . . . or G(p) = k)
= P(G(p) = 1) + P(G(p) = 2) + . . . + P(G(p) = k)
= p + (1 − p)p + (1 − p)^2 p + . . . + (1 − p)^(k−1) p
= p[1 + (1 − p) + (1 − p)^2 + . . . + (1 − p)^(k−1)]
= p × [1 − (1 − p)^k] / [1 − (1 − p)] ; (By using the sum of a G.P.)
∴ P(G(p) ≤ k) = 1 − (1 − p)^k
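The closed form can be checked against the term-by-term sum of the PMF. A short Python sketch (an illustration; p = 1/6 and k = 5 match the Ludo example that follows):

def geometric_cdf_sum(k, p):
    # P(G <= k) summed term by term: sum of (1-p)^(i-1) p for i = 1..k
    return sum((1 - p) ** (i - 1) * p for i in range(1, k + 1))

p, k = 1/6, 5
print(geometric_cdf_sum(k, p))     # 0.5981...
print(1 - (1 - p) ** k)            # same value via the closed form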

2.7.3.1 Visualisation of Geometric Distribution

Note:
• The plot starts at p and keeps falling.
• Even though the plot keeps decreasing, for p < 1 it never reaches exactly zero.

2.7.3.2 Solved Examples:

Q1. In Ludo, a player needs to repeatedly throw a fair die till she gets a 1. Find the following:
(i) The probability that she needs fewer than 6 throws.
(ii) The probability that she needs fewer than 11 throws.
(iii) The probability that she needs fewer than 21 throws.
Solution:
Each throw of a fair die can be considered as a Bernoulli(p) trial, where success is getting a 1 and p = 1/6.
Now, this Bernoulli(1/6) trial is repeated independently till we get the first success. Hence, its distribution is Geometric(1/6), say G, with sample space S = {1, 2, 3, . . .}.
(i) The probability that she needs fewer than 6 throws is:
P(G < 6) = P(G ≤ 5)
= 1 − (1 − 1/6)^5 ; (Remark in section 2.7.3)
= 1 − (5/6)^5
∴ P(G < 6) = 0.5981
Hence, the probability that she needs fewer than 6 throws is 0.5981.
(ii) The probability that she needs fewer than 11 throws is:
P(G < 11) = P(G ≤ 10)
= 1 − (1 − 1/6)^10 ; (Remark in section 2.7.3)
= 1 − (5/6)^10
∴ P(G < 11) = 0.8385

Hence, the probability that she needs fewer than 11 throws is 0.8385.
(iii) The probability that she needs fewer than 21 throws is:
P(G < 21) = P(G ≤ 20)
= 1 − (1 − 1/6)^20 ; (Remark in section 2.7.3)
= 1 − (5/6)^20
∴ P(G < 21) = 0.9739
Hence, the probability that she needs fewer than 21 throws is 0.9739.

Q2. Player 1 is a 40% free-throw shooter, while Player 2 is a 70% shooter. Each throw is independent of all previous throws. The two players alternate shooting, with Player 1 shooting first, until a basket is scored.
(i) What is the probability that Player 1 wins before the 3rd round?
(ii) What is the probability that Player 1 wins?

Solution:
Each throw by Player 1 and by Player 2 can be considered as a Bernoulli(0.4) and a Bernoulli(0.7) trial, respectively, where success is scoring the basket.
(i) The favourable outcomes for the event that Player 1 wins before the third round are as follows:
• Player 1 wins in the first round, i.e. 1P1
And, the probability is given by:
P (1P1 ) = 0.4
• Player 1 wins in the second round, which implies that both players miss in the first round, i.e. 0P1 0P2 1P1
And, the probability is given by:
P (0P1 0P2 1P1 ) = P (0P1 )P (0P2 )P (1P1 ) ; (because, independent events)
= (1 − 0.4)(1 − 0.7)(0.4)
= (0.6)(0.3)(0.4)
∴ P (0P1 0P2 1P1 ) = 0.072
Since, the events {1P1 } and {0P1 0P2 1P1 } are disjoint. Therefore, the probability
that Player 1 wins before the third round is:
P (Player 1 wins before 3rd round) = P (1P1 ) + P (0P1 0P2 1P1 )
= 0.4 + 0.072
= 0.472

Hence, there is 47.2% chance that Player 1 wins the game before the third
round.
(ii) Similarly, the favourable outcomes for the event that Player 1 wins the game are:

{1P1 , 0P1 0P2 1P1 , (0P1 0P2 )2 1P1 , (0P1 0P2 )3 1P1 , (0P1 0P2 )4 1P1 , . . .}

where, (0P1 0P2 )i means that (0P1 0P2 ) is repeated i times.


Now,
• The probability for {1P1 } given by:

P (1P1 ) = 0.4

• The probability for {0P1 0P2 1P1 } given by:

P (0P1 0P2 1P1 ) = P (0P1 )P (0P2 )P (1P1 ) ; (because, independent events)


= (1 − 0.4)(1 − 0.7)(0.4)
∴ P (0P1 0P2 1P1 ) = (0.6)(0.3)(0.4)

• The probability for {(0P1 0P2 )2 1P1 } given by:

P((0P1 0P2)^2 1P1) = P(0P1)P(0P2)P(0P1)P(0P2)P(1P1) ; (Because, independent events)
= (1 − 0.4)(1 − 0.7)(1 − 0.4)(1 − 0.7)(0.4)
∴ P((0P1 0P2)^2 1P1) = [(0.6)(0.3)]^2 (0.4)

and so on.
Since, the events {1P1 }, {0P1 0P2 1P1 },{(0P1 0P2 )2 1P1 },. . . are disjoint, probability
that Player 1 wins the game is:

P(Player 1 wins) = P(1P1) + P(0P1 0P2 1P1) + P((0P1 0P2)^2 1P1) + . . .
= 0.4 + [(0.6)(0.3)](0.4) + [(0.6)(0.3)]^2 (0.4) + . . .
= 0.4 / [1 − (0.6)(0.3)] ; (Using the sum of an infinite G.P.)
= 0.4 / 0.82
∴ P(Player 1 wins) = 0.4878

Hence, there is 48.78% chance that Player 1 wins the game.

Remark:
This question is not quite the geometric distribution: if it had been Geometric(0.4), the common ratio of the series would have been 0.6, whereas here each further round requires both players to miss, giving a ratio of (0.6)(0.3).
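A simulation makes the remark concrete: each further round requires both players to miss. The Python sketch below (not part of the original text) plays the alternating game repeatedly and should agree with 0.4/0.82 ≈ 0.4878:

import random

def player1_wins(rng):
    # Alternate shots until someone scores; Player 1 shoots first.
    while True:
        if rng.random() < 0.4:     # Player 1 scores
            return True
        if rng.random() < 0.7:     # Player 2 scores
            return False

rng = random.Random(1)
n = 200_000
print(sum(player1_wins(rng) for _ in range(n)) / n)   # ≈ 0.488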

2.8 Discrete Random Variable
Random Variable:
A random variable is a function with domain as the sample space of an experiment
and range as the real numbers, i.e. a function from the sample space to the real line.
There is a technical condition that functions need to satisfy, which will be discussed in
section 2.8.1.
Discrete Random Variable:
A random variable is said to be discrete if its range is a discrete set.
Remarks:

• The range of a random variable is the set of values taken by it and is a subset of the
real line.
• The discrete sets are as follows:
(i) All finite subsets of the real line are discrete.
(ii) All subsets of integers are discrete.
(iii) The set of all integer multiples of a real number is discrete.
(iv) Any subset of a discrete set is discrete.
• The non-discrete sets are as follows:
(i) Any interval is not discrete, i.e. (a, b) is not discrete for a < b.
(ii) Any subset that contains a non-discrete set is not discrete.

Examples:

(1) Tossing a coin: S = {H, T}.


A few possible valid random variables are:
• Random variable X : X(H) = 0, X(T) = 1
Range (X) = {0, 1}
• Random variable Y : Y (H) = −10, Y (T) = 200
Range (Y ) = {−10, 200}

• Random variable Z : Z(H) = √2, Z(T) = π
Range (Z) = {√2, π}
• Random variable U : U (H) = 0, U (T) = 0
Range (U ) = {0}
(2) Throw a die: S = {1, 2, 3, 4, 5, 6}
A random variable can be defined as follows:
• Pick any 6 real numbers, say x1 , x2 , x3 , x4 , x5 , x6 .
• X(1) = x1 , X(2) = x2 , X(3) = x3 , X(4) = x4 , X(5) = x5 , X(6) = x6
i.e. Range (X) = {x1 , x2 , x3 , x4 , x5 , x6 }

However, xi ’s need not be distinct.
Suppose x2 = x4 = x6 = 1 and x1 = x3 = x5 = 0, then
Random variable E : E(2) = E(4) = E(6) = 1, E(1) = E(3) = E(5) = 0
But if xi ’s are distinct, then the random variable is a one-to-one function.

Note:

• For an experiment, many valid random variables are possible, but usually meaningful functions are considered, like the random variable X in the first example.
• In practice, random variables make it easier to assign probabilities and to work with events.
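Since a random variable is just a function on the sample space, it can be represented very literally in code. A minimal Python sketch (an illustration, not from the text) using plain dictionaries for the coin-toss example:

# Each random variable maps every outcome in S = {H, T} to a real number.
X = {"H": 0, "T": 1}          # the "meaningful" random variable X
U = {"H": 0, "T": 0}          # a constant random variable is also valid

print(sorted(set(X.values())))   # range of X: [0, 1]
print(sorted(set(U.values())))   # range of U: [0]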

2.8.1 Random Variables and Events

The technical condition for a subset of the sample space to be an event is that, if X is a random variable associated with the experiment, the subset should be expressible in terms of X.
For any real x, a few examples of expression of events in terms of random variable X
are as follows:

(i) (X < x) = {s ∈ S : X(s) < x}


i.e. set of all outcomes for which X takes value less than x.
(ii) (X > x) = {s ∈ S : X(s) > x}
i.e. set of all outcomes for which X takes value greater than x.
(iii) (X = x) = {s ∈ S : X(s) = x}
i.e. set of all outcomes for which X takes value equal to x.
(iv) (X ≤ x) = {s ∈ S : X(s) ≤ x}
i.e. set of all outcomes for which X takes value less than or equal to x.
(v) (X ≥ x) = {s ∈ S : X(s) ≥ x}
i.e. set of all outcomes for which X takes value greater than or equal to x.
(vi) (x1 < X < x2 ) = {s ∈ S : x1 < X(s) < x2 } ; x1 , x2 ∈ R
i.e. set of all outcomes for which X takes value greater than x1 but less than x2 .

Examples:

(1) Rolling a die: S = {1, 2, 3, 4, 5, 6}


Random variable X : X(1) = 1, X(2) = 2, X(3) = 3, X(4) = 4, X(5) = 5, X(6) =
6
Range (X) = {1, 2, 3, 4, 5, 6}
Any event E can be expressed in terms of X as follows:
• E : {The die shows 1} : (X = 1)

• E : {1, 2, 3} : (X < 4)
• E : {2, 5} : (X = 2) ∪ (X = 5)
(2) Rolling a die: S = {1, 2, 3, 4, 5, 6}
Random variable X : X(2) = X(4) = X(6) = 1, X(1) = X(3) = X(5) = 0
Range (X) = {0, 1}
Any event E can be expressed in terms of X as follows:
• E : {1, 3, 5} : (X = 0)
• E : {2, 4, 6} : (X = 1)
• E : {Null Event} : (X < 0)
• E : {1, 2, 3, 4, 5, 6} : (X ≤ 1)
Remark: Not every subset of the sample space can be expressed in terms of this X, e.g. {2, 5}. Hence, due to the technical restriction, such a subset cannot be considered an event.

2.8.2 Distribution of a Discrete Random Variable

2.8.2.1 Probability Mass Function (PMF)


The probability mass function (PMF) of a discrete random variable X with range set
T is the function fX : T → [0, 1] defined as:

fX (t) = P (X = t) for t ∈ T

In general, the probability mass function is the probability that the random variable
takes one particular value.
The probability of any event, defined using the random variable X, can be computed
by using the PMF as follows:

• Let (X = t) be an event then,

P (X = t) = P (all outcomes that result in X taking value t)

• Let B be a subset of the range T of the random variable and (X ∈ B) be an event. Then,
P(X ∈ B) = P(all outcomes that result in X taking values in B)
= Σ_{t∈B} P(X = t)
= Σ_{t∈B} fX(t)

Note: A discrete random variable X with range T = {t1, t2, . . . , tk} and PMF fX can be presented in tabular form as follows:

t t1 t2 ... tk
fX (t) fX (t1 ) fX (t2 ) ... fX (tk )

where, the first row is the value that the random variable takes and the second row is
the probability with which the random variable takes that value.

2.8.2.2 Properties of PMF


Consider a random variable X with range T = {t1 , t2 , . . . , tk } and PMF fX .
The defining properties of PMF are:

(1) 0 ≤ fX(t) ≤ 1, because fX(t) = P(X = t) is a probability.
(2) Σ_{t∈T} fX(t) = 1, because P(X ∈ T) = 1.
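Both properties are mechanical to verify once a PMF is stored as a table. The Python sketch below (an illustration; the numbers match the PMF of Q1 in the next section, including its part (ii) answer) checks the two properties and computes an event probability by summing fX over the event:

fX = {-1: 0.5, 1: 0.25, 2: 0.125, 4: 0.125}   # PMF stored as a dictionary

assert all(0 <= v <= 1 for v in fX.values())   # property (1)
assert abs(sum(fX.values()) - 1) < 1e-12       # property (2)

B = [t for t in fX if t < 3/2]                 # the event (X < 3/2)
print(sum(fX[t] for t in B))                   # P(X < 3/2) = 0.75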

2.8.3 Solved Examples:

Q1. The PMF of a random variable X is given in the following table:


t −1 1 2 4
fX (t) 0.5 0.25 0.125
Compute the following:
(i) Range of X
(ii) fX (4)
(iii) P (X > 3)
 
3
(iv) P X <
2

Solution:

(i) The range of a random variable is the set of values taken by it, i.e.
TX = {−1, 1, 2, 4}
(ii) By the property of PMF, we know that Σ_{t∈T} fX(t) = 1.

=⇒ fX (−1) + fX (1) + fX (2) + fX (4) = 1


=⇒ 0.5 + 0.25 + 0.125 + fX (4) = 1
=⇒ 0.875 + fX (4) = 1
∴ fX (4) = 0.125

(iii) Let (X ∈ B) be an event such that B = (X > 3) = {4}. Then,
P(X > 3) = Σ_{t∈B} P(X = t)
= P(X = 4)
∴ P(X > 3) = 0.125
(iv) Let (X ∈ B) be an event such that B = (X < 3/2) = {−1, 1}. Then,
P(X < 3/2) = Σ_{t∈B} P(X = t)
= P(X = −1) + P(X = 1)
= 0.5 + 0.25
∴ P(X < 3/2) = 0.75
Q2. The PMF of a random variable X is given by:
fX(k) = c/3^k ; k = 1, 2, 3, . . .
Compute the following:
(i) c
(ii) P (X > 10)
(iii) P (X > 10|X > 5)
Solution:
(i) The range of the random variable X is not finite but countable, i.e. T = {1, 2, 3, . . .}.
By the property of PMF, we know that Σ_{t∈T} fX(t) = 1
=⇒ fX(1) + fX(2) + fX(3) + . . . = 1
=⇒ c/3 + c/3^2 + c/3^3 + . . . = 1
=⇒ (c/3)[1 + 1/3 + 1/3^2 + . . .] = 1
Using the sum of an infinite G.P., we get:
(c/3) × 1/(1 − 1/3) = 1
∴ c = 3 × 2/3 = 2
(ii)
P(X > 10) = P(X = 11) + P(X = 12) + P(X = 13) + . . .
= c/3^11 + c/3^12 + c/3^13 + . . .
= (c/3^11)[1 + 1/3 + 1/3^2 + . . .]
= (c/3^11) × 1/(1 − 1/3) ; (Sum of an infinite G.P.)
= (2/3^11) × (3/2) ; (because c = 2)
∴ P(X > 10) = 1/3^10
(iii) By using conditional probability, we get:
P(X > 10|X > 5) = P[(X > 10) ∩ (X > 5)] / P(X > 5)
= P(X > 10) / P(X > 5)     (1)
Now,
P(X > 5) = P(X = 6) + P(X = 7) + P(X = 8) + . . .
= c/3^6 + c/3^7 + c/3^8 + . . .
= (c/3^6)[1 + 1/3 + 1/3^2 + . . .]
= (c/3^6) × 1/(1 − 1/3) ; (Sum of an infinite G.P.)
= (2/3^6) × (3/2) ; (because c = 2)
∴ P(X > 5) = 1/3^5
Putting the value of P(X > 5) in (1), we get:
P(X > 10|X > 5) = (1/3^10) / (1/3^5)
∴ P(X > 10|X > 5) = 1/3^5
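A numeric check is straightforward, and it also highlights that P(X > 10 | X > 5) = P(X > 5), the memoryless behaviour of this geometric-type PMF. A Python sketch (an illustration, not from the text), truncating the infinite tail sums at 200 terms:

def tail(k, c=2):
    # P(X > k) = sum over j > k of c/3^j, which equals 1/3^k for c = 2
    return sum(c / 3**j for j in range(k + 1, 200))

print(tail(10) / tail(5))   # P(X > 10 | X > 5) ≈ 1/3^5 ≈ 0.004115
print(tail(5))              # P(X > 5) has the same value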
Q3. A fair coin is flipped three times.
(i) Associate a random variable to the number of heads and compute its
PMF.
(ii) Associate a random variable to the flip number (if any) that shows the
first head and compute its PMF.
Solution:
Let us define the following random variable:
X : Number of heads in three flips of a fair coin
Y : Flip number that shows the first head
Sample space; S = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }
Since the same fair coin is flipped thrice, the distribution is uniform.
Outcome P(Outcome) X Y
HHH 1/8 3 1
HHT 1/8 2 1
HT H 1/8 2 1
HT T 1/8 1 1
T HH 1/8 2 2
T HT 1/8 1 2
TTH 1/8 1 3
TTT 1/8 0 None
(i) Range (X) ; T = {0, 1, 2, 3}
Now,
• fX(0) = P({TTT}) = 1/8
• fX(1) = P({HTT, THT, TTH})
= P({HTT}) + P({THT}) + P({TTH}) ; (because the events are disjoint)
= 1/8 + 1/8 + 1/8 = 3/8
• fX(2) = P({HHT, HTH, THH})
= P({HHT}) + P({HTH}) + P({THH}) ; (because the events are disjoint)
= 1/8 + 1/8 + 1/8 = 3/8
• fX(3) = P({HHH}) = 1/8
Hence,

t 0 1 2 3
fX (t) 1/8 3/8 3/8 1/8

(ii) Range (Y) ; T = {0, 1, 2, 3} (Denoting 'None' as 0)
Now,
• fY(0) = P({TTT}) = 1/8
• fY(1) = P({HHH, HHT, HTH, HTT})
= P({HHH}) + P({HHT}) + P({HTH}) + P({HTT}) ; (because the events are disjoint)
= 1/8 + 1/8 + 1/8 + 1/8 = 4/8
• fY(2) = P({THH, THT})
= P({THH}) + P({THT}) ; (because the events are disjoint)
= 1/8 + 1/8 = 2/8
• fY(3) = P({TTH}) = 1/8
Hence,
t 0 1 2 3
fY (t) 1/8 4/8 2/8 1/8

Q4. For a certain lottery, a three-digit number is randomly selected (from 000 to 999). If a ticket matches the number exactly, it is worth ₹2,00,000 (2 lakhs). If the ticket matches exactly two of the three digits, it is worth ₹20,000. Otherwise, it is worth nothing. Let X be the value of the ticket. Find the distribution of X.
Solution:
There are a total of 1000 tickets numbered from 000 to 999.
Range (X) ; T = {0, 20000, 200000}
• fX(200000) = P(All three digits are matched) = 1/1000
• fX(20000) = P(Exactly two of the three digits are matched)
(Any two of the three digits match, which can happen in 3C2 ways, and the third digit must be one of the remaining 9 digits.)
fX(20000) = (3C2 × 1^2 × 9)/1000
∴ fX(20000) = 27/1000

• fX(0) = 1 − fX(20000) − fX(200000) ; (By using the property of PMF)
=⇒ fX(0) = 1 − 1/1000 − 27/1000
= 1 − 28/1000
∴ fX(0) = 972/1000

2.9 Common Discrete Distributions


2.9.1 Uniform Random Variable

A random variable X follows Uniform distribution with parameter T , i.e. X ∼ U (T ),


where T is some finite set, then:

• Range (X) = T
• PMF fX(t) = 1/|T| for all t ∈ T, where |T| = size of the finite set T
Note: If all the outcomes of a random variable are equally likely, then it can be
assumed to be uniform.
Examples:

i) Tossing a fair coin: S = {H, T}.


Random variable X : X(H) = 0, X(T) = 1
=⇒ X ∼ U ({0, 1})
ii) Throw a fair die: S = {1, 2, 3, 4, 5, 6}.
Random variable X : X(1) = 1, X(2) = 2, X(3) = 3, X(4) = 4, X(5) = 5, X(6) =
6
=⇒ X ∼ U ({1, 2, 3, 4, 5, 6})

2.9.2 Bernoulli Random Variable

A random variable X follows Bernoulli distribution with parameter p, i.e.


X ∼ Bernoulli(p), where p = probability of success and 0 ≤ p ≤ 1, then:

• Range (X) ; T = {0, 1}


• PMF fX(t) = p^t (1 − p)^(1−t) for all t ∈ T

It is also referred to as binary in some sense because it takes only two values, 0 and 1.
Refer section 2.7.1 for more details.

2.9.3 Binomial Random Variable

A random variable X follows Binomial distribution with parameter n and p, i.e.


X ∼ Binomial(n, p), where
p = Probability of success ; 0 ≤ p ≤ 1
n = Number of independent Bernoulli trials ; n : Positive integer
then:

• Range (X) ; T = {0, 1, 2, 3, . . . , n}


• PMF fX(t) = nCt p^t (1 − p)^(n−t) for all t ∈ T

Refer section 2.7.2 for more details.

2.9.4 Geometric Random Variable

A random variable X follows Geometric distribution with parameter p, i.e.


X ∼ Geometric(p), where p = probability of success and 0 < p ≤ 1, then:

• Range (X) ; T = {1, 2, 3, . . .}
• PMF fX(t) = (1 − p)^(t−1) p for all t ∈ T
Refer section 2.7.3 for more details.

2.9.5 Negative Binomial Distribution

Consider a Bernoulli(p) trial with S = {0, 1}


If this Bernoulli(p) trial is repeated independently, then the number of trials needed
for the first r successes is given by the Negative Binomial distribution as follows:

• Sample space ; S = {r, r + 1, r + 2, r + 3, . . .}


Because we can get the first r successes in r trials, r + 1 trials, r + 2 trials, . . ., so
on.
• P(NB(r, p) = k) = k−1Cr−1 (1 − p)^(k−r) p^r ; k = r, r + 1, r + 2, . . .

Note: Negative binomial distribution is a generalization of the Geometric distribution.


On plugging r = 1 in the PMF of negative binomial distribution, one can obtain the
geometric distribution.

2.9.5.1 Negative Binomial Random Variable
A random variable X follows Negative Binomial distribution with parameters r and p,
i.e.
X ∼ Negative Binomial(r, p), where
p = Probability of success ; 0 ≤ p ≤ 1
r = Number of successes ; r : Positive integer
then:

• Range (X) ; T = {r, r + 1, r + 2, r + 3, . . .}


• PMF fX(t) = t−1Cr−1 p^r (1 − p)^(t−r) for all t ∈ T
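The PMF and the reduction to the geometric case are easy to check in code. A Python sketch (an illustration, not from the text; p = 0.3 is arbitrary):

from math import comb

def nb_pmf(k, r, p):
    # (k-1)C(r-1) p^r (1-p)^(k-r): r-th success occurs on trial k
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

p = 0.3
# With r = 1 the PMF reduces to the Geometric(p) PMF (1-p)^(k-1) p.
for k in range(1, 6):
    assert abs(nb_pmf(k, 1, p) - (1 - p)**(k - 1) * p) < 1e-12
print(nb_pmf(7, 3, p))   # P(3rd success on trial 7) = 15 * 0.3^3 * 0.7^4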

2.9.6 Poisson Distribution

Consider an event that keeps occurring over a period of time with the following two
assumptions:

1. The rate of occurrence (λ) of the event is constant.


2. Given one occurrence, the time for the next occurrence is not affected by it, i.e.
independent occurrences of the event.

Then the number of times the event occurred in a fixed interval of time is given by
Poisson distribution as follows:

• Sample space ; S = {0, 1, 2, 3, . . .}


Because, in a given interval of time the event can occur 0 times, 1 time, 2 times,. . .,
so on.
• P(P(λ) = k) = e^(−λ) λ^k / k! ; k = 0, 1, 2, 3, . . .

2.9.6.1 Application of Poisson Distribution

(1) Emission of a particle by radioactive decay.


In 2608 time intervals of 7.5 seconds each, emission of alpha particles is as follows:

No. of particles 0 1 2 3 4 5 6 7 8 9 10
No. of times 57 203 383 525 532 408 273 139 45 27 16
Fraction 0.022 0.078 0.147 0.201 0.204 0.156 0.105 0.053 0.017 0.0104 0.0061

The rate of occurrence of the event, i.e. the emission rate, is:
λ = (Total number of particles) / (Number of intervals)
= [(0 × 57) + (1 × 203) + (2 × 383) + . . . + (10 × 16)] / 2608
= 10086 / 2608
∴ λ = 3.8673

This implies that there are 3.8673 emissions of particles per 7.5-second interval, on average.
Here, the emission rate can be assumed to be constant, and also it is reasonable
to assume that the time of next emission is independent of the past emissions.
Hence, by using the Poisson distribution, we get:

P(No. of particles = k) = e^(−λ) λ^k / k!
The probabilities computed using the Poisson distribution are close to the fractions
given in the Table.

This implies that the above assumptions hold true and the Poisson distribution fits the data, i.e. the chance that k particles are emitted in a given 7.5-second interval can be obtained using the Poisson distribution.
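The fit described above can be reproduced in a few lines. The Python sketch below (an illustration, not part of the text) re-computes λ from the table and prints the observed fraction next to the Poisson probability for each k:

from math import exp, factorial

counts = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 16]
n_intervals = sum(counts)                                     # 2608
lam = sum(k * c for k, c in enumerate(counts)) / n_intervals  # ≈ 3.8673

for k, c in enumerate(counts):
    poisson = exp(-lam) * lam**k / factorial(k)
    print(k, round(c / n_intervals, 3), round(poisson, 3))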
(2) A meteorite entering earth’s atmosphere
In 276 months from 1994-2016, the number of fireballs entering the atmosphere is
as follows:

No. of fireballs 0 1 2 3 4 5 6 7 8 9 10
No. of times 24 54 76 52 37 21 7 3 1 0 1
Fraction 0.087 0.196 0.275 0.188 0.134 0.076 0.025 0.0109 0.0036 0 0.0036

The rate of occurrence of the event, i.e. the arrival rate, is:
λ = (Total number of fireballs) / (Number of intervals)
= [(0 × 24) + (1 × 54) + (2 × 76) + . . . + (10 × 1)] / 276
= 696 / 276
∴ λ = 2.5217

This implies that there are 2.5217 fireballs per month on average.

Here, the arrival rate can be assumed to be constant, and it is also reasonable to assume that the time of the next arrival is independent of past arrivals. Hence, by using the Poisson distribution, we get:

P(No. of fireballs = k) = e^(−λ) λ^k / k!
The probabilities computed using the Poisson distribution are close to the fractions
given in the Table.

This implies that the above assumptions hold true and the Poisson distribution fits the data, i.e. the chance that k fireballs arrive in a given month can be obtained using the Poisson distribution.

2.9.6.2 Visualisation of Poisson Distribution

Note: As the value of λ increases, the peak is shifted from left to right.

2.9.6.3 Poisson Random Variable


A random variable X follows Poisson distribution with parameter λ, i.e.
X ∼ Poisson(λ), where λ > 0 then:
• Range (X) ; T = {0, 1, 2, 3, . . .}
• PMF fX(t) = e^(−λ) λ^t / t! for all t ∈ T

2.9.7 Hypergeometric Distribution

Consider a population of N persons of which r are of Type 1 and the remaining N − r


are of Type 2. If m persons are selected uniformly at random, without replacement,
then the number of Type 1 persons selected is given by Hypergeometric distribution
as follows:
• Sample space ; S = {max(0, m − (N − r)), . . . , min(r, m)}
Because, we can select 0, 1, 2, . . . , m persons of Type 1, but:
(i). If the number of persons to be selected is more than the persons of Type 1,
i.e. m > r, then the maximum number of persons of Type 1 selected is r only.
(ii). If the number of persons to be selected is more than persons of Type 2, i.e.
m > N − r, then the minimum number of persons of Type 1 selected is
m − (N − r) only.

For more details, refer to the examples in section 2.9.7.1.
• P(H(N, r, m) = k) = (rCk × N−rCm−k) / NCm ; k = max(0, m − (N − r)), . . . , min(r, m)

2.9.7.1 Hypergeometric Random Variable


A random variable X follows Hypergeometric distribution with parameter N, r and m,
i.e. X ∼ Hypergeometric(N, r, m), where
N = Population size ; N : Positive integer
r = Number of Type 1 entities in population ; r : Positive integer
m = Number of entities to be selected from population ; m : Positive integer
then:

• Range (X) ; T = {max(0, m − (N − r)), . . . , min(r, m)}


• PMF fX(t) = (rCt × N−rCm−t) / NCm for all t ∈ T

Examples:

• If N = 100, r = 50, m = 20 =⇒ Range (X) = {0, 1, 2, . . . , 20}


• If N = 100, r = 10, m = 20 =⇒ Range (X) = {0, 1, 2, . . . , 10}
• If N = 100, r = 90, m = 20 =⇒ Range (X) = {10, 11, 12, . . . , 20}
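The range endpoints and the PMF can be implemented directly from the definitions. A Python sketch (an illustration, not from the text) reproducing the three example ranges and checking that each PMF sums to 1:

from math import comb

def hyper_range(N, r, m):
    # k runs from max(0, m-(N-r)) to min(r, m)
    return range(max(0, m - (N - r)), min(r, m) + 1)

def hyper_pmf(k, N, r, m):
    return comb(r, k) * comb(N - r, m - k) / comb(N, m)

for r in (50, 10, 90):
    ks = hyper_range(100, r, 20)
    print(r, ks.start, ks.stop - 1)   # ranges 0-20, 0-10, 10-20
    assert abs(sum(hyper_pmf(k, 100, r, 20) for k in ks) - 1) < 1e-9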

2.10 Functions of one discrete random variable


Consider a random variable X with range T and PMF fX(t).
Let g be a function from the real line to the real line. Then, g(X) can be seen as a composite of two functions:

• X is a function from sample space to T (subset of real line).


• g(X) is a function from T to another subset of real line.

So, g(X) is also a random variable in the same probability space.

Remarks:
If X is a random variable with range T and PMF fX(t), then the PMF of the function g(X) of the random variable X can be found using fX(t) as follows:

fg(X)(a) = P(g(X) = a)
= P(X ∈ {t : g(t) = a})
∴ fg(X)(a) = Σ_{t : g(t) = a} fX(t)
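The push-forward formula amounts to accumulating fX(t) into the bucket g(t). A generic Python sketch (an illustration; the example matches Q1 below):

from collections import defaultdict

def pmf_of_g(fX: dict, g) -> dict:
    out = defaultdict(float)
    for t, p in fX.items():
        out[g(t)] += p          # accumulate fX(t) into fg(X)(g(t))
    return dict(out)

fX = {t: 1/5 for t in (-2, -1, 0, 1, 2)}     # X ~ Uniform({-2, ..., 2})
print(pmf_of_g(fX, lambda t: t * t))          # {4: 0.4, 1: 0.4, 0: 0.2}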

2.10.1 Solved Examples:

Q1. Let X ∼ Uniform({−2, −1, 0, 1, 2}) and g(x) = x^2. Find:

(i) Range of g(X)


(ii) PMF of g(X)

Solution:
Range (X) ; T = {−2, −1, 0, 1, 2}
fX(t) = 1/5 for all t ∈ T ; (Since X ∼ Uniform)

t fX(t) g(t) = t^2
−2 1/5 4
−1 1/5 1
0 1/5 0
1 1/5 1
2 1/5 4

(i) Range g(X) = {0, 1, 4}


(ii) The PMF of g(X) = X^2 is as follows:
• fg(X)(0) = P(X = 0) = 1/5
• fg(X)(1) = P(X ∈ {−1, 1})
= P(X = 1) + P(X = −1)
= 1/5 + 1/5 = 2/5
• fg(X)(4) = P(X ∈ {−2, 2})
= P(X = 2) + P(X = −2)
= 1/5 + 1/5 = 2/5
Hence,
a 0 1 4
fg(X) (a) 1/5 2/5 2/5

Q2. Let X ∼ Geometric(1/2) and
g(x) = x for x < 5 ; g(x) = 5 for x ≥ 5
Find:
(i) Range of g(X)
(ii) PMF of g(X)
Solution:
Range (X) ; T = {1, 2, 3, . . .}
fX(t) = (1/2)^(t−1) × (1/2) = (1/2)^t for all t ∈ T ; (Because X ∼ Geometric)

t fX (t) g(t)
1 1/2 1
2 1/22 2
3 1/23 3
4 1/24 4
5 1/25 5
6 1/26 5
7 1/27 5
.. .. ..
. . .

(i) Range g(X) = {1, 2, 3, 4, 5}


(ii) The PMF of g(X) is as follows:
• fg(X)(1) = P(X = 1) = 1/2
• fg(X)(2) = P(X = 2) = 1/2^2 = 1/4
• fg(X)(3) = P(X = 3) = 1/2^3 = 1/8

• fg(X)(4) = P(X = 4) = 1/2^4 = 1/16
• fg(X)(5) = P(X ∈ {5, 6, 7, . . .})
= P(X ≥ 5)
= 1 − P(X ≤ 4)
= 1 − 15/16
∴ fg(X)(5) = 1/16
Hence,
a 1 2 3 4 5
fg(X) (a) 1/2 1/4 1/8 1/16 1/16
