
Chapter 2 - 201781

Sample Space and Events

The sample space Ω (or S) associated with an experiment is the
set of all possible outcomes of that experiment

Ω = {ω1, . . . , ωn}

A subset E of Ω, E ⊆ Ω, is called an event

Informally, an event is a statement about the outcomes of a random
experiment

E can contain just one outcome (simple event or state), e.g.
E = {ω3}, or it can contain more than one (composite event or,
simply, event), e.g. E = {ω2, ω21}

E = ∅ is referred to as the impossible (or vacuous) event


E = Ω is called the certain event or universal event
Sample Space
Describe the sample spaces for the following experiments
- I roll a regular die
  {1, 2, 3, 4, 5, 6}
- I toss n coins (n = 4)
  {HHHH, HHHT, HHTH, HHTT, HTHH, HTTH, HTHT, HTTT,
   TTTT, TTTH, TTHT, TTHH, THTT, THHT, THTH, THHH}
- I cast a regular die and, if 6 comes up, toss a coin
  {6H, 6T, 1, 2, 3, 4, 5}
- I measure the time for the emission of a radioactive particle from
  some atom
  (0, ∞)
  (this is a non-discrete case)
Events

Describe the events
1) when rolling a die, an even number comes up
2) when tossing n coins, the first n − 1 outcomes are tails
3) measuring the time for the emission of a radioactive particle
from some atom, the emission occurs after 3 minutes

1) E = {2, 4, 6}
2) E = {TT· · ·TT, TT· · ·TH}, where the first n − 1 letters are T
3) E = (3, ∞) (using minutes as the unit of time)
Sample Space and Events

Events are subsets of a set, the sample space Ω

(Notice that in general not all subsets of Ω are events. The choice
of subsets that are to be events will depend on the phenomenon at
hand)

Logical statements on events ⇐⇒ operations with sets

- both event A and event B occur (conjunction) ⇐⇒ A ∩ B
- at least one of A and B occurs (disjunction) / either A or B or
  both occur ⇐⇒ A ∪ B
- the event A does not occur (negation) ⇐⇒ Ā
- the event A occurs but the event B does not ⇐⇒ A \ B

A Set-theory Recap

Let A, B be two subsets of Ω

A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}
A ∩ B ≡ AB = {ω ∈ Ω : ω ∈ A and ω ∈ B}
Ā ≡ Aᶜ = {ω ∈ Ω : ω ∉ A} = Ω \ A
B \ A = B ∩ Ā = {ω ∈ Ω : ω ∈ B and ω ∉ A}

Notation
When A is a countable set, we denote by |A| the number of
points/elements in A (the size or cardinality of A)
A Set-theory Recap
Properties
Commutativity
A ∪ B = B ∪ A,  A ∩ B = B ∩ A
Associativity
(A ∪ B) ∪ C = A ∪ (B ∪ C),  (A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributivity
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),  A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Idempotency
A ∪ A = A,  A ∩ A = A
De Morgan's laws
(A ∪ B)ᶜ = Ā ∩ B̄,  (A ∩ B)ᶜ = Ā ∪ B̄
and, more generally,

( ⋃_{i=1}^n Ai )ᶜ = ⋂_{i=1}^n Āi,   ( ⋂_{i=1}^n Ai )ᶜ = ⋃_{i=1}^n Āi
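These identities are easy to sanity-check mechanically. A minimal sketch using Python's built-in sets; the universe Omega and the sets A, B, C are arbitrary illustrative choices, not from the text:

```python
# Checking De Morgan's laws and distributivity with Python sets.
Omega = set(range(1, 11))          # Ω = {1, ..., 10}, an arbitrary universe
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {2, 6, 9}

complement = lambda S: Omega - S   # S̄ = Ω \ S

# (A ∪ B)ᶜ = Ā ∩ B̄  and  (A ∩ B)ᶜ = Ā ∪ B̄
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# Distributivity: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
assert A & (B | C) == (A & B) | (A & C)
print("all identities verified")
```

Any other choice of finite sets would do equally well; the laws hold for all subsets of Ω.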
A Set-theory Recap. Poincaré Identities
Inclusion-Exclusion Principle (Poincaré Identity)
Let Ai ⊂ Ω, i = 1, . . . , n

| ⋃_{i=1}^n Ai | = |A1 ∪ A2 ∪ . . . ∪ An |
  = Σ_{i=1}^n |Ai| − Σ_{1≤i<j≤n} |Ai ∩ Aj|
  + Σ_{1≤i<j<k≤n} |Ai ∩ Aj ∩ Ak|
  − . . . + (−1)^{n−1} |A1 ∩ A2 ∩ . . . ∩ An |

For example, for n = 3

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
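The n = 3 identity can be verified by direct counting; the three sets below are arbitrary illustrative choices:

```python
# Verifying |A ∪ B ∪ C| by the n = 3 inclusion-exclusion formula.
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
C = {1, 5, 7, 8}

lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
assert lhs == rhs
print(lhs)  # → 8
```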


A Set-theory Recap. Poincaré Identities

Using De Morgan's law, it then follows that

| ⋂_{i=1}^n Āi | = | Ω \ ⋃_{i=1}^n Ai |
  = |Ω| − | ⋃_{i=1}^n Ai |   (since Ω ⊇ ⋃_{i=1}^n Ai)
  = |Ω| − Σ_{i=1}^n |Ai| + Σ_{1≤i<j≤n} |Ai ∩ Aj|
  − Σ_{1≤i<j<k≤n} |Ai ∩ Aj ∩ Ak|
  + . . . + (−1)^n |A1 ∩ A2 ∩ . . . ∩ An |
An Example
In a group of 21 (Arabic-speaking) Lebanese, 16 speak French, 13
English, 4 Armenian, 9 English and French, 2 French and
Armenian, 3 English and Armenian, and 1 English, French and
Armenian. How many speak only Arabic?
Let A be the set of Armenophones, E the set of Anglophones, F
the set of Francophones
The set of those who speak only Arabic is

Ā ∩ Ē ∩ F̄ = (A ∪ E ∪ F)ᶜ = Ω \ (A ∪ E ∪ F)

|Ā ∩ Ē ∩ F̄| = |Ω| − |A ∪ E ∪ F|
  = |Ω| − (|A| + |E| + |F|) + |A ∩ E| + |A ∩ F| + |E ∩ F| − |A ∩ E ∩ F|
  = 21 − (4 + 13 + 16) + 3 + 2 + 9 − 1 = 1
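The same computation, spelled out as arithmetic (a sketch; the variable names are ours, not from the text):

```python
# Who speaks only Arabic: |Ā ∩ Ē ∩ F̄| via inclusion-exclusion.
n_omega = 21                 # |Ω|, the whole group
nA, nE, nF = 4, 13, 16       # |A|, |E|, |F|
nAE, nAF, nEF = 3, 2, 9      # pairwise intersections
nAEF = 1                     # |A ∩ E ∩ F|

only_arabic = n_omega - (nA + nE + nF) + (nAE + nAF + nEF) - nAEF
print(only_arabic)  # → 1
```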
Some Terminology

Ω is also called the certain event

∅ is also called the impossible event

If A ∩ B = ∅, A and B are said to be disjoint or incompatible
or mutually exclusive events
(that A and B both occur is impossible: the occurrence of one
prevents the occurrence of the other)

If A ⊆ B, A is said to imply B (B occurs if A occurs)
In fact ⊆ is the set-theoretic equivalent of ⇒
Probability

The first two elements of a probabilistic model are the sample
space and the notion of events.

The third element is the assignment of a probability to the events
and outcomes of a random experiment

We need to formalize statements such as: the probability that when
we roll a die an even number comes up is 1/2
Probability as Measure of Relative Frequencies

One interpretation views probability as a relative frequency (which
can be justified a posteriori by the result known as the law of large
numbers)

Carry out the same experiment repeatedly and independently a
large number of times N (roll the same die under the same
conditions N times)
Record the number of times S_N(E) the event E occurs (an even
number comes up)
Assign to the event the probability P(E) = S_N(E)/N, for N large
(the empirical limiting relative frequency in the N repetitions)
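A quick simulation illustrating the frequency interpretation; the sample size N and the seed are arbitrary choices for the sketch:

```python
# Estimate P(even) for a fair die as a relative frequency S_N(E)/N.
import random

random.seed(0)               # fixed seed so the run is reproducible
N = 100_000                  # number of independent repetitions
S_N = sum(1 for _ in range(N) if random.randint(1, 6) % 2 == 0)
estimate = S_N / N
print(estimate)              # close to 1/2 for large N
```

Rerunning with larger N (or different seeds) shows the estimate fluctuating ever more tightly around 1/2, which is what the law of large numbers guarantees.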
Probability as Measure of Relative Frequencies

The frequency definition of probability is based on the assumption
that identical and independent experiments can be carried out
(which is not always the case)
A priori there is no guarantee that the relative frequency should
converge to a limit, and if it does, it is not clear how large
N should be for the approximation to be reliable
In general, the assignment of a probability to an event is rather
subtle
Sometimes there are natural choices, for example based on
the existence of symmetries in the random experiment at hand
Sometimes the choice will be subjective (probability assignments
may differ from individual to individual)
Let us define probability as a (mathematical) function that satisfies
some properties (a version of these properties is indeed verified by
the relative frequency)
Discrete Probability Space
Let the sample space Ω be non-empty and countable for the rest of
this chapter
The third element necessary to complete the description of a
probabilistic model is a function P, defined on the events (the
subsets of Ω) and taking values in [0, 1],
called probability, that satisfies the following two properties
P1) P(Ω) = 1
    The certain event has probability 1
P2) σ-additivity (countable additivity)
    For every sequence (Ai)_{i∈N} of disjoint/mutually exclusive
    events, Ai ∩ Aj = ∅ for i ≠ j,

    P( ⋃_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P(Ai)

(Ω, P) is called a discrete probability space, with Ω its sample
space, and the subsets of Ω the events
Probability. Properties

For a discrete probability space (Ω, P), the following statements
hold true
1) The impossible event has probability zero

   P(∅) = 0

Proof. If Ai = ∅, i ∈ N, then ⋃_{i=1}^∞ Ai = ∅ and the Ai are
(trivially) disjoint, so by P2

   P(∅) = P( ⋃_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P(Ai) = Σ_{i=1}^∞ P(∅)

which holds iff P(∅) = 0.
Probability. Properties

2) Finite additivity
Let Ai, i = 1, . . . , n, be a finite family of disjoint events,
Ai ∩ Aj = ∅ for i ≠ j

   P( ⋃_{i=1}^n Ai ) = Σ_{i=1}^n P(Ai)

This follows from countable additivity, P2, setting Ak = ∅ for
all k > n. Thus finite additivity is a weaker notion than countable
additivity.

When Ω is finite, we can equally define the probability space
using properties P1 and P2, or P1 and finite additivity.
Probability. Properties
Other consequences of P1 and P2 (draw the corresponding Venn
diagram if in doubt)
3) P(Ā) = 1 − P(A)

   1 = P(Ω) = P(A ∪ Ā) = P(A) + P(Ā)

   since A ∩ Ā = ∅
4) For any A, B ⊆ Ω

   P(B \ A) = P(B) − P(A ∩ B)

   Since B = (B \ A) ∪ (A ∩ B), with B \ A and A ∩ B
   disjoint,

   P(B) = P(B \ A) + P(A ∩ B)

   hence the result
Probability. Properties

5) For any A, B ⊆ Ω

   P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

   Since A ∪ B = A ∪ (B \ A), and A and B \ A are disjoint,

   P(A ∪ B) = P(A) + P(B \ A) = P(A) + P(B) − P(A ∩ B)

   using 4)
Inclusion-Exclusion Formulae (Poincaré's Identities)

(Ω, P) a discrete probability space.
For any n ≥ 1 and for any choice of sets (events) A1, . . . , An ⊆ Ω

P(A1 ∪ . . . ∪ An) = Σ_{k=1}^n (−1)^{k−1} Σ_{1≤i1<...<ik≤n} P(Ai1 ∩ . . . ∩ Aik)

P(A1 ∩ . . . ∩ An) = Σ_{k=1}^n (−1)^{k−1} Σ_{1≤i1<...<ik≤n} P(Ai1 ∪ . . . ∪ Aik)

They can be proven by induction

They are actually valid on any probability space (finite, countable
or uncountable)
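A sketch of the first identity in code: the function below sums over all non-empty subfamilies with the alternating sign. It assumes a uniform probability on a finite Ω (our choice, made only to have a concrete P for the check); the sets are arbitrary:

```python
# P(A1 ∪ ... ∪ An) via the first Poincaré identity, checked against
# computing the union directly.
from itertools import combinations
from functools import reduce

Omega = set(range(12))
P = lambda S: len(S) / len(Omega)            # uniform probability (assumption)

def prob_union(events):
    n = len(events)
    total = 0.0
    for k in range(1, n + 1):                # size of the subfamily
        sign = (-1) ** (k - 1)
        for subfamily in combinations(events, k):
            inter = reduce(set.intersection, subfamily)
            total += sign * P(inter)
    return total

events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}, {0, 7, 8}]
assert abs(prob_union(events) - P(set().union(*events))) < 1e-12
print(prob_union(events))                    # P of the union, here 9/12
```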
Example

For example, for n = 4, calling the events A1 = A, A2 = B,
A3 = C, A4 = D

P(A ∪ B ∪ C ∪ D) = P(A) + P(B) + P(C) + P(D)
  − P(A ∩ B) − P(A ∩ C) − P(A ∩ D)
  − P(B ∩ C) − P(B ∩ D) − P(C ∩ D)
  + P(A ∩ B ∩ C) + P(A ∩ B ∩ D)
  + P(A ∩ C ∩ D) + P(B ∩ C ∩ D)
  − P(A ∩ B ∩ C ∩ D)

An alternating sum of the probabilities of each event (4 terms), each
possible pair of events (6), each possible triple of events (4), and
each possible quadruple (1)
When to use the Poincaré formula

Very often, if one has to compute the probability that at least one
event occurs, or the probability that no event occurs, the Poincaré
formula is most useful

Indeed, let Ai, i = 1, . . . , n, be n events

P(A1 ∪ . . . ∪ An) is the probability that at least one of the n
events occurs

P(Ā1 ∩ . . . ∩ Ān) is the probability that none occurs

By the Poincaré formulae, these probabilities can be written in
terms of probabilities involving fewer events, which are often easier
to compute
Probability of events and outcomes
(Ω, P) a discrete probability space, A = {ωi1, . . . , ωik} a compound
event; then its probability is

P(A) = Σ_{j=1}^k P(ωij)

with the restriction 1 = P(Ω) = Σ_{ω∈Ω} P(ω)

Example

Ω = {ω1, ω2, ω3, ω4},  P(ωi) = p ∀i
A1 = {ω1, ω2}, A2 = {ω1, ω3}, A3 = {ω1, ω4}

Find P(Ai), P(Ai ∩ Aj), P(A1 ∩ A2 ∩ A3), P(Ai ∪ Aj), etc.

P(ωi) = 1/4, P(Ai) = 2/4, P(Ai ∩ Aj) = 1/4 = P(A1 ∩ A2 ∩ A3)
P(Ai ∪ Aj) = 3/4, P(A1 ∪ A2 ∪ A3) = P(Ω) = 1
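The little example above can be checked directly; the integer labels 1–4 stand for ω1, . . . , ω4:

```python
# Event probabilities as sums of outcome probabilities, p = 1/4 each.
p = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
P = lambda event: sum(p[w] for w in event)

A1, A2, A3 = {1, 2}, {1, 3}, {1, 4}
assert P(A1) == 0.5                   # P(Ai) = 2/4
assert P(A1 & A2) == 0.25             # = P(A1 ∩ A2 ∩ A3) = {ω1}
assert P(A1 | A2) == 0.75             # P(Ai ∪ Aj) = 3/4
assert P(A1 | A2 | A3) == 1.0         # the whole Ω
print("matches the values in the text")
```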
Uniform Probability Space
A discrete probability space (Ω, P) which is finite (that is, Ω is
finite) and such that the outcomes ωi ∈ Ω are equiprobable is
called a uniform probability space.
In this case,

P(ωi) = 1/|Ω|

The example on the previous page is such a space
More generally, for any event E (a subset of Ω)

P(E) = |E| / |Ω|

The probability of an event is the ratio of the number of cases that
are favorable to it, to the number of possible cases, when there is
nothing to make us believe that one case should occur rather than
any other [Laplace]

Thus the problem of computing the probability of an event becomes
the problem of counting the number of its elements
Equiprobability
It is necessary that all points in the sample space be equiprobable
in order to compute probability via simple counting, P(E) = |E|/|Ω|
Suppose we want to compute the probability of getting a three by
summing the numbers that turn up on tossing two dice

E = the sum of the two throws is 3

Since the sum is symmetric, we can think of using the following
sample space, where [i, j] is the unordered couple, with i the
number for one of the dice and j for the other

Ω = { [1, 1] [1, 2] [1, 3] [1, 4] [1, 5] [1, 6]
             [2, 2] [2, 3] [2, 4] [2, 5] [2, 6]
                    [3, 3] [3, 4] [3, 5] [3, 6]
                           [4, 4] [4, 5] [4, 6]
                                  [5, 5] [5, 6]
                                         [6, 6] }
Equiprobability
There is nothing wrong in using this sample space; however, these
simple events are not equally probable. For example,

p([1, 1]) = 1/36
p([1, 2]) = p((1, 2) ∪ (2, 1)) = p((1, 2)) + p((2, 1)) = 2/36

where [i, j] is the unordered couple and (i, j) the ordered couple

[i, j]: one die shows the i-th face, the other the j-th face
(i, j): the first die shows i, the second j, if we throw them one after
the other or, if we toss them at the same time, just color the dice
differently: the red die shows i, the blue die j

[i, j] = {(i, j), (j, i)}

Thus we may not compute the probability by simply counting the
number of points in the space that are favorable to the event
(one point, [1, 2]) and dividing by the size of the space, 21. We
would get 1/21, instead of the correct probability, which is 2/36
Equiprobability
Instead we can use the following equiprobable sample space

Ω = { (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
      (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
      (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
      (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
      (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
      (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6) }

whose |Ω| = 36 outcomes are all equiprobable, P((i, j)) = 1/36;
thus |E| = |{i + j = 3}| = 2 and P(E) = |E|/|Ω| = 2/36

More generally (exercise), the probability of rolling a sum of k with
two dice is

P({i + j = k}) = (6 − |7 − k|) / 36,   k = 2, . . . , 12
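The exercise can be checked by brute-force enumeration of the 36 equiprobable ordered outcomes; a short sketch:

```python
# Verify P(i + j = k) = (6 − |7 − k|)/36 for two fair dice, k = 2, ..., 12.
from itertools import product
from fractions import Fraction

for k in range(2, 13):
    count = sum(1 for i, j in product(range(1, 7), repeat=2) if i + j == k)
    assert Fraction(count, 36) == Fraction(6 - abs(7 - k), 36)

print("formula verified for k = 2, ..., 12")
```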
Combinatorics. Multiplication Rule

First rule of counting. Multiplication Rule

If an object is formed by making a succession of choices such that
there are n1 possibilities for the first choice, n2 possibilities for the
second (after the first choice is made), etc., then the total number
of objects that can be made by making a set of k choices is

|E| = n1 · n2 · · · nk

For the rule to apply, the number of available possibilities at each
stage must be the same irrespective of which choices were made
previously (ni for the i-th choice, which may be different from nj).
However, the set of available possibilities may differ and depend on
the choices made at the previous stages
Problem

In how many ways can a woman who has 8 skirts, 4 pairs of shoes
and 10 shirts be dressed?

8 · 4 · 10 = 320

You may find it easier to represent this rule of counting using a
tree diagram
Problem
In how many ways can we roll three dice? In how many ways can
three dice appear when they are rolled? How many possible
numbers do we get by rolling 3 dice (where the order counts)?

We have a succession of three choices (one per die) and each die
represents a multiple choice of six possibilities, ni = 6; thus there
are
6 · 6 · 6 = 216
possible rolls

This is obvious if you think of tossing the dice one after the other
rather than simultaneously (although it does not matter). The set
of choices at each throw of the dice in this case does not change
based on what happens at the previous stage (it is always one of
the numbers {1, 2, 3, 4, 5, 6}), and thus the number of choices does
not change, which is the only thing that matters for the
multiplication rule
Problem

In how many ways can two dice show different faces?

6 · 5

In this case there are 6 available possibilities for the first choice,
and five different possibilities for the second die. Which five depends
on the first choice (e.g. if 3 shows up on the first roll, the set of
available possibilities is {1, 2, 4, 5, 6}; if 2 shows up, that set is
{1, 3, 4, 5, 6}), but there are always 5 possibilities for the second
die, so the first rule of counting as formulated above still applies
Problem

How many different calendars are possible for a year?

7 · 2 = 14

Each year starts on one of the seven days (Sunday, Monday, ...,
Saturday). Each year is either a leap year (i.e., it includes February
29) or not
Standard Ways of Counting and Basic Probabilistic Models

We are now going to see two standard ways of counting which can
be used in the majority of combinatorial problems

- Sampling Methods
- Allocation Methods
Basic Probabilistic Models. Sampling Model

Consider a population of n individuals (people, cards, numbered
balls), i.e. an aggregate of n distinguishable elements without
regard to their order (think of an urn containing n numbered balls)

Choose an individual from the population (a sample of size 1)
(that is, draw one ball from the urn)
How many ways can we do this?
The answer is n

Choose k individuals successively, one at a time
How many ways can we do this?
It depends
Basic Probabilistic Models. Sampling Model
It depends on whether or not a chosen individual is returned to the
population before another is chosen (that is, on whether or not we
put the ball we have drawn back into the urn)

It also depends on whether the sample of size k is ordered or not
(namely, ordering means that the same individuals chosen in a
different order are considered to be a different sample: drawing 1
before 2 is considered different from drawing 2 before 1)

The two kinds of sampling models are called
- sampling with replacement
  we put the drawn ball back into the urn (we replace the ball
  we have drawn in the urn)
- sampling without replacement
  we do not replace the drawn balls
In addition we need to consider the sample to be ordered or
unordered
Basic Probabilistic Models

Thus we will consider the following 2 · 2 = 4 cases

- Sampling with replacement and with ordering
- Sampling without replacement and with ordering
- Sampling without replacement and without ordering
- Sampling with replacement and without ordering
Sampling with replacement and with ordering
I. Sampling with replacement and with ordering
The first individual is drawn from a population with n individuals

n1 = n

We return the individual to the population, thus the set (and
hence the number) of possibilities for the choice of a second
individual is the same as for the first

n2 = n

Repeating the argument and using the multiplication rule (first
rule), we find that there are

n1 · · · nk = nᵏ

ways to draw an (ordered) sample of size k from a population of n
individuals
Sampling with replacement and with ordering
nᵏ counts the number of different ordered k-tuples from a
population of size n

Ordering means, e.g., that (1, 2, 1, 3) is considered different from
(1, 2, 3, 1), (2, 1, 3, 1), etc.

Put differently, the sample space Ω is the set of ordered k-tuples

Ω = {(a1, a2, . . . , ak), ai = 1, . . . , n}

Let us write down explicitly the ordered samples of size
k = 2 that can be obtained from n = 3 objects, if an object may
appear more than once (sampling with replacement)

(1, 1), (1, 2), (1, 3)
(2, 1), (2, 2), (2, 3)
(3, 1), (3, 2), (3, 3)
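In code, ordered sampling with replacement is exactly a Cartesian product; `itertools.product` reproduces the nine pairs above:

```python
# Ordered samples of size k with replacement from {1, ..., n}: n^k tuples.
from itertools import product

n, k = 3, 2
samples = list(product(range(1, n + 1), repeat=k))
assert len(samples) == n ** k == 9
assert (1, 2) in samples and (2, 1) in samples   # order matters: both present
print(samples)
```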
Sampling without replacement and with ordering
II. Sampling without replacement and with ordering
The first individual is chosen from a population with n individuals

n1 = n

We do not return the individual to the population (we do not
replace it), so now the population contains n − 1 individuals. The
number of possibilities for the choice of a second individual is

n2 = n − 1

Thus, using the multiplication rule, there are

n · (n − 1) · · · (n − k + 1) = (n)k ≡ n!/(n − k)! ≡ Pk,n

ways of drawing an ordered sample of size k from a population of
n individuals without replacement
((n)k is called the lower or falling factorial)
Sampling without replacement and with ordering

Notice that in this case the k choices are not independent, as the
individuals drawn earlier affect the set of possibilities for the later
draws; however, the number of possibilities at any given later draw
is not affected by the earlier draws, so the first rule still applies
Sampling without replacement and with ordering

In this case (sampling without replacement and with ordering) the
sample space is the set of ordered k-tuples with the constraint that
no two elements can be the same (known also as k-permutations
or ordered k-sets)

Ω = {(a1, a2, . . . , ak), ai = 1, . . . , n, ai ≠ aj if i ≠ j}

Let us write down explicitly the ordered pairs (k = 2)
that can be obtained from a population of n = 3 objects, if an
object may not appear more than once (sampling without
replacement)

(1, 2) (1, 3)
(2, 1) (2, 3)
(3, 1) (3, 2)
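`itertools.permutations` enumerates exactly these ordered k-tuples with distinct entries; a quick check for n = 3, k = 2 (requires Python ≥ 3.8 for `math.perm`):

```python
# Ordered samples of size k without replacement: (n)_k = n!/(n−k)! tuples.
from itertools import permutations
from math import perm          # math.perm(n, k) = n·(n−1)···(n−k+1)

n, k = 3, 2
samples = list(permutations(range(1, n + 1), k))
assert len(samples) == perm(n, k) == 6
assert (1, 1) not in samples   # no repeats without replacement
print(samples)
```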
Some problems
- How many ways are there for 2 persons to sit in 7 chairs that
  are in a row?
  7 · 6 = (7)2
  Sampling with ordering (it matters in a theater, say, where
  you sit) and without replacement (two people may not sit on
  the same chair)

- How many ways are there to make a 3-letter word (a string of
  3 letters) using a 26-letter alphabet?
  26³

- How many ways are there to form a 3-letter word if the letters
  must be different?
  26 · 25 · 24 = (26)3
Sampling with ordering

Drawing an ordered sample of size k from an urn containing n balls
ai = label of the ball selected at the i-th step, ai = 1, . . . , n
(a1, a2, . . . , ak) an ordered k-tuple

Type                  sample space Ω                                                |Ω|
with replacement      Ω = {(a1, . . . , ak) : ai = 1, . . . , n}                    nᵏ
without replacement   Ω = {(a1, . . . , ak) : ai = 1, . . . , n, ai ≠ aj if i ≠ j}  n(n − 1) · · · (n − k + 1)
(n ≥ k)               (set of k-permutations of an n-set)                           = n!/(n − k)! ≡ (n)k ≡ Pk,n

For example, if n = 3, k = 2, the detailed sample spaces are:

with replacement            without replacement
(1, 1), (1, 2), (1, 3)      (1, 2), (1, 3)
(2, 1), (2, 2), (2, 3)      (2, 1), (2, 3)
(3, 1), (3, 2), (3, 3)      (3, 1), (3, 2)
Permutations

As we have seen, (n)k ≡ Pk,n ≡ n!/(n − k)! denotes the number of
ordered samples of size k drawn from a population of n individuals
without replacement
A sample of size n (i.e. k = n) therefore includes the whole
population and represents a re-ordering or a shuffling of its
elements, generally referred to as an n-permutation or simply a
permutation

The number of different permutations of n elements is then

(n)n = n!
Some problems

- How many different words (i.e. strings of letters) can be
  obtained from the letters BEIRUT (using all letters)?

  6!

- How many different words (i.e. strings of letters) can be
  obtained from the letters BAALBEK (using all letters)?

  7! / (2! 2!)

  shuffling the two A's and the two B's does not change the
  word
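The BAALBEK count can be verified by enumerating all 7! orderings and removing duplicates (feasible here since 7! = 5040):

```python
# Distinct anagrams of BAALBEK: 7!/(2!·2!), since the two A's and
# the two B's are interchangeable.
from itertools import permutations
from math import factorial

words = {"".join(p) for p in permutations("BAALBEK")}
assert len(words) == factorial(7) // (factorial(2) * factorial(2)) == 1260
print(len(words))  # → 1260
```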
Sampling without Ordering

We now consider the case in which the order of the elements in the
sample is irrelevant/disregarded
In this case too we have to distinguish between sampling with or
without replacement

A word on terminology: we will always explicitly specify whether the
elements in the samples are ordered or not, unless it is clear from the
context. Sometimes, however, the word sample is used to refer to what
we have called an ordered sample, and the word population is used to
refer to an aggregate of elements without regard to their order. We
have indeed used population in this sense, but when we consider
sampling from a population, we will mostly talk about unordered
samples of size k instead of sub-populations of size k. In some books,
you may find that the problem of counting the number of possible
unordered samples of size k from a population of size n is referred to
as counting the number of subpopulations of size k of a given
population of size n

Sampling without replacement and without ordering

III. Sampling without replacement and without ordering

Let's start by constructing explicitly the sample space of unordered
k-tuples sampled without replacement
n = 4, k = 2, {1, 2, 3, 4} the individuals in the population
The possible unordered samples of size k = 2 obtained without
replacement are

Ω = {[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]}

because we do not count as different the samples in which
individual 1 has been drawn before individual 2 or after it
Notice, however, that while the order of the sampling is irrelevant,
different individuals make a different sample (the individuals are
distinguishable/labelled/numbered), so [1, 2] = [2, 1] ≠ [1, 3]
Sampling without replacement and without ordering

For general n and k

Ω = {[a1, . . . , ak] : ai = 1, . . . , n, ai ≠ aj for i ≠ j}

where [a1, . . . , ak] indicates an unordered k-tuple whose elements
are a1, . . . , ak

Since all the k individuals in a sample of size k are different (no
replacement) and the sample is unordered, we can equivalently
consider selecting all the individuals together (that is, drawing k
numbered balls all together or one at a time is the same thing)
Sampling without replacement and without ordering
How many elements are in Ω?

We can sample one individual at a time or, equivalently, draw k
individuals from the population at once

An ordered k-tuple from the population can be obtained from an
unordered one by numbering its elements

There are k! different ways of numbering k elements

Thus there are exactly k! times as many ordered samples of size k
as there are unordered samples of size k

Indeed, we can obtain any ordered sample of size k by shuffling one
specific unordered sample of size k, and since the elements in the
sample are not repeated (sampling without replacement), there are
k! ordered samples for each unordered one
(think of the BEIRUT example above, where k = 6)
Sampling without replacement and without ordering

Hence, the number of unordered samples of size k from a
population of size n, when the sampling is carried out without
replacement, is

Pk,n / k! = (n)k / k! = n! / ((n − k)! k!) ≡ C(n, k) ≡ Ck,n

the binomial coefficient

Some properties of the binomial coefficient

C(n, k) = C(n, n − k),   Σ_{k=0}^n C(n, k) aᵏ bⁿ⁻ᵏ = (a + b)ⁿ
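A check of the binomial-coefficient count and of the relation (n)k = C(n, k) · k!, with n = 5, k = 3 chosen arbitrarily (requires Python ≥ 3.8 for `math.comb`/`math.perm`):

```python
# Unordered samples of size k without replacement: C(n, k) subsets,
# each corresponding to k! ordered samples.
from itertools import combinations
from math import comb, perm, factorial

n, k = 5, 3
unordered = list(combinations(range(1, n + 1), k))
assert len(unordered) == comb(n, k) == 10
assert perm(n, k) == comb(n, k) * factorial(k)   # (n)_k = C(n, k) · k!
print(unordered)
```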
Exercises
- How many different committees of 3 people can be formed
  from a group of 30 people?

  C(30, 3) ≡ C3,30 = 30! / (27! 3!)

- A committee is made up of a president, a treasurer and a
  vice-president. How many different committees can be formed
  from a group of 30 people?

  C(30, 3) · 3! = (30)3 = 30 · 29 · 28 = P3,30

  choose three people, C(30, 3), and then consider all possible
  orderings (3!)
Sampling with replacement and without ordering

IV. Sampling with replacement and without ordering

Consider now sampling with replacement, where we do not
distinguish samples that differ only in the order of their elements

Explicit construction of the sample space:
population {1, 2, 3, 4} (n = 4); the possible unordered samples of
size k = 2 are

Ω = {[1, 1], [2, 2], [3, 3], [4, 4], [1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]}

In this case again [1, 2] = [2, 1], thus it is counted once, and we
can have the same individual repeated in a sample (as many as k
times in a k-sample) because we put the individual back into the
population before sampling the next time
Sampling with replacement and without ordering. The
sample space

For general n and k, the sample space can be written as

Ω = {[a1, . . . , ak] : ai = 1, . . . , n}

where [a1, . . . , ak] indicates an unordered k-tuple whose elements
are a1, . . . , ak
Sampling with replacement and without ordering. The sample space

What is |Ω|?

The number of unordered samples of size k sampled with
replacement from a population of size n is

|Ω| = \binom{n-1+k}{k} = (n-1+k)!/(k! (n-1)!) = \binom{n-1+k}{n-1}

It can be proven by induction, but we will prove it later considering
a different (but equivalent) model
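The formula can also be checked by brute-force enumeration for small n and k; a sketch using the standard library:

```python
from math import comb
from itertools import combinations_with_replacement

def count_unordered_with_replacement(n, k):
    """Enumerate the unordered k-samples drawn with replacement from {1,...,n}."""
    return sum(1 for _ in combinations_with_replacement(range(1, n + 1), k))

# agrees with C(n-1+k, k) in all small cases, e.g. n = 4, k = 2 gives 10,
# matching the sample space listed above
for n in range(1, 6):
    for k in range(5):
        assert count_unordered_with_replacement(n, k) == comb(n - 1 + k, k)
```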
Exercise

A bakery has a promotion offering 6 doughnuts for the price of
four dollars. You want to take advantage of the offer. Considering
that the bakery sells 4 types of doughnuts, how many different
boxfuls of 6 doughnuts are possible for you to choose?

\binom{n-1+k}{k} = \binom{4-1+6}{6} = 84

Here we do not distinguish the same collection of 6 doughnuts
arranged in different orders in the box

This is unordered sampling with replacement because we can (in
fact we must, since k = 6 > n = 4) choose the same type of
doughnut more than once, but we are not distinguishing the same
items in different orders
Sampling without ordering

Drawing an unordered sample of size k from an urn containing n balls
a_i label of the ball selected at i-th step, a_i = 1, ..., n
[a_1, a_2, ..., a_k] unordered k-tuple

Type                   sample space Ω                                                |Ω|

with replacement       Ω = {[a_1, ..., a_k] : a_i = 1, ..., n}                       \binom{n-1+k}{k} = \binom{n-1+k}{n-1}

without replacement    Ω = {[a_1, ..., a_k] : a_i = 1, ..., n, a_i ≠ a_j if i ≠ j}   \binom{n}{k} = n!/(k!(n-k)!) ≡ C_{k,n}
(n ≥ k)                (set of combinations)

For example, if n = 3, k = 2, the detailed sample spaces are:

with replacement                 without replacement
[1,1], [1,2], [1,3]              [1,2], [1,3], [2,3]
[2,2], [2,3], [3,3]
Some Problems

Six letters are selected at random one after another from the
English alphabet (26 letters) with replacement
Find the probabilities that: a) the word formed is made up of
vowels (6, if we count Y as a vowel); b) it is the word BEIRUT

a) |Ω| = 26^6, |E| = 6^6, p(E) = (6/26)^6

b) |Ω| = 26^6, |E| = 1, p(E) = (1/26)^6

Do the same if the sampling is without replacement

a) |Ω| = (26)_6, |E| = (6)_6 = 6!, p(E) = 6!/(26)_6 = 1/\binom{26}{6}

b) |Ω| = (26)_6, |E| = 1, p(E) = 1/(26)_6
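The four probabilities are quick to compute; a sketch (assuming Python 3.8+ for `math.comb` and `math.perm`):

```python
from math import comb, perm

# With replacement: |Omega| = 26**6
p_vowels_rep = (6 / 26) ** 6                # all six letters are vowels
p_beirut_rep = (1 / 26) ** 6                # the exact word BEIRUT

# Without replacement: |Omega| = (26)_6 ordered samples
p_vowels_norep = perm(6, 6) / perm(26, 6)   # = 6!/(26)_6 = 1/C(26,6)
p_beirut_norep = 1 / perm(26, 6)

# the two forms of a) without replacement agree
assert abs(p_vowels_norep - 1 / comb(26, 6)) < 1e-12
```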


Problem

What is the probability for any fixed element of a population of
size n to be included in a random ordered sample of size k when
the sampling is without replacement?
(For example, the population is a set of labelled balls (1, ..., n);
we want to find the probability that the ball numbered 3, say, is
drawn from it when we randomly draw k balls without
replacement, considering the sample to be ordered)

|Ω| = (n)_k
|E| = k (n-1)_{k-1}

(the element of interest, i.e. the fixed element, can be in any of k
positions in the sample, and the other k - 1 elements composing
the ordered sample should come from the remaining population of
size n - 1)

P(E) = k (n-1)_{k-1}/(n)_k = k [(n-1)!/(n-k)!] [(n-k)!/n!] = k/n
Problem. Continued

Equivalently,

|E| = \binom{1}{1} \binom{n-1}{k-1} k! = (k-1)! k \binom{n-1}{k-1} = (n-1)_{k-1} k

choose the element of interest (there is only 1 choice), choose the
other k - 1 elements out of the n - 1 remaining in the
population, \binom{n-1}{k-1}, to complete the sample, and shuffle (k!)

Alternatively, let us go to the complement

E = "the element is included in the ordered sample"
Ē = "the element is not included in the ordered sample"

P(E) = 1 - P(Ē) = 1 - (n-1)_k/(n)_k = 1 - (n-k)/n = k/n
Problem

What is the probability for any fixed element of a population of
size n to be included at least once in a random ordered sample of
size k?

"At least once" implies the sampling is with replacement

E = "the element is included in the ordered sample at least once"
Ē = "the element is not included in the ordered sample"

P(E) = 1 - P(Ē) = 1 - (n-1)^k/n^k = 1 - (1 - 1/n)^k
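The closed form matches a brute-force enumeration over all ordered samples; a minimal sketch for small n and k:

```python
from itertools import product

def p_at_least_once(n, k):
    """P(a fixed element appears at least once in an ordered k-sample with replacement)."""
    return 1 - (1 - 1 / n) ** k

# brute-force check for n = 4, k = 3: enumerate all 4**3 ordered samples
# and count those containing the fixed element 1
n, k = 4, 3
hits = sum(1 in sample for sample in product(range(1, n + 1), repeat=k))
assert abs(hits / n**k - p_at_least_once(n, k)) < 1e-12
```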
Standard Ways of Counting and Basic Probabilistic Models

We are going to see the second standard way of counting

I Sampling Model

I Allocation Model

We will also establish a mapping between the two models

Occupancy Models

k objects (balls/particles) are placed in n cells (boxes/urns)

I In general, cells can be distinguishable or indistinguishable, but
  we restrict ourselves to the distinguishable case in this course

I Numbers of objects in the cells are called occupation numbers:
  k_i = number of objects in the i-th cell (Σ_{i=1}^{n} k_i = k)
  We consider 2 cases
  - cells can contain at most one object (k_i = 0, 1, ∀i)
  - cells can contain any number of objects (k_i = 0, 1, ..., k, ∀i)

I Objects can be either distinguishable or indistinguishable (2
  possibilities)

I The order of the objects within a cell is immaterial in this
  course. We only care about which objects are in which cell, not
  the order of the objects in a cell (if the objects are identical,
  this is always the case)
Sampling Models ⇔ Occupancy Models. A Dictionary

A dictionary to translate from the sampling models to the
occupancy models and the other way round

population with n individuals      n distinguishable cells/boxes

sample of size k                   k particles/balls/objects

sampling with replacement          occupation numbers
                                   k_i = 0, 1, ..., k,  i = 1, ..., n

sampling without replacement       occupation numbers
                                   k_i = 0, 1,  i = 1, ..., n

ordered samples                    distinguishable objects
(position in the sample)           (label of object)

unordered samples                  indistinguishable objects
Indistinguishable objects in n (distinguishable) cells

The number of possible placements of k indistinguishable objects
in n (distinguishable) cells that can contain any number of objects
(the total number of ways of sampling with replacement an
unordered sample of size k from a population of size n) is

\binom{n-1+k}{k}

Proof. The n cells can be thought of as being delimited by n + 1
separators. Representing a separator with a bar | and an object
with ⋆, a general assignment of k objects is represented by a
symbol that begins and ends with a bar
(e.g., for k = 6 and n = 5, |⋆⋆⋆|⋆|||⋆⋆| is the occupation model
defined by (k_1, ..., k_n) = (3, 1, 0, 0, 2)).
The internal n - 1 separators and the k balls can appear in any
arbitrary order, thus the number of distinguishable placements
equals the number of ways of choosing the k places occupied by
the balls out of the n - 1 + k internal positions
Indistinguishable objects in n (distinguishable) cells

One could have obtained the same result algebraically, by
determining the number of ordered sets of n non-negative integers
that sum to k

(k_1, ..., k_n),  k_i ≥ 0
k_1 + k_2 + ... + k_n = k

because with indistinguishable objects, two distributions are
distinguishable only if the corresponding n-tuples are not identical

The occupancy numbers are ordered since the cells are
distinguishable. [If we were to assume the cells to be
indistinguishable, the order among the occupancy numbers would
be disregarded]
Summary

Drawing a sample of size k from an urn containing n balls
a_i label of the ball selected at i-th step, a_i = 1, ..., n

                          ordered sample (a_1, ..., a_k)    unordered sample [a_1, ..., a_k]

with replacement          n^k                               \binom{n+k-1}{k}

without replacement       (n)_k = n!/(n-k)!                 \binom{n}{k}
(a_i ≠ a_j if i ≠ j,
 n ≥ k)

Allocating/distributing k particles (objects) into n distinguishable cells (boxes)
(k_1, ..., k_n), k_i number of objects assigned to the i-th cell, Σ_i k_i = k

                          distinguishable objects           indistinguishable objects

without exclusion         n^k                               \binom{n+k-1}{k}
(k_i = 0, ..., k)

with exclusion            (n)_k = n!/(n-k)!                 \binom{n}{k}
(k_i = 0, 1 ⇒ n ≥ k)

order within the cell immaterial
Exercise

Let f(x_1, ..., x_12) be an analytic function of 12 variables. How
many partial derivatives of order 5 exist?

Partial derivatives do not depend on the order of differentiations

n = 12, k = 5

\binom{12-1+5}{5} = 4368

Think of an urn with balls labelled x_1, ..., x_12. Drawing the
unordered sample [x_3, x_2, x_6, x_6, x_10] would correspond to

∂^5 f(x_1, x_2, ..., x_12) / (∂x_2 ∂x_3 ∂x_6^2 ∂x_10)

or think of 12 boxes and k = 5 identical balls to be placed, with
each ball allocation in a box determining with respect to which
variable one derivative is taken
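Since an order-5 partial derivative is exactly a multiset of 5 variables chosen from the 12, the count can be verified by direct enumeration:

```python
from math import comb
from itertools import combinations_with_replacement

# an order-5 partial derivative = a multiset of 5 variables out of 12
derivatives = sum(1 for _ in combinations_with_replacement(range(12), 5))
assert derivatives == comb(12 - 1 + 5, 5)
```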
Exercise-Summer 2017

Fifteen railway trucks are to be arranged into five different sidings.
How many different arrangements of the trucks into the sidings are
possible so that exactly one siding is empty [arrangements are
considered different only based on the number of trucks in the
sidings]?

n = 5 (boxes = sidings), k = 15

The following counts all possible arrangements (including those
with more than one siding unoccupied by a truck), so it is larger
than the number we wish to determine

\binom{n-1+k}{k} = \binom{5-1+15}{15} = 3876   (wrong answer)
Exercise-Summer 2017-Continued

The number of arrangements we are after is instead obtained by
breaking down the problem:

I which siding is empty? choose the empty siding:
  \binom{5}{1} available choices

I place one truck in each non-empty siding:
  there is one way to do this, as the trucks are indistinguishable
  in this problem

I place the unassigned trucks k̃ = k - 4 = 11 into the ñ = 5 - 1 = 4
  non-empty sidings in all possible ways:

  \binom{ñ-1+k̃}{k̃} = \binom{4-1+11}{11}

I apply the first rule of counting

  \binom{5}{1} · \binom{4-1+11}{11} = 1820
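The answer can be double-checked by brute force over all occupation vectors, keeping only those with exactly one empty siding:

```python
from math import comb
from itertools import product

# occupation vectors (k1,...,k5), ki >= 0, summing to 15, with exactly one zero
count = sum(
    1
    for occ in product(range(16), repeat=5)
    if sum(occ) == 15 and occ.count(0) == 1
)
assert count == comb(5, 1) * comb(4 - 1 + 11, 11)
```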
Multinomial Coefficients

Consider k distinguishable objects (from a population). In how
many possible ways can we distribute the objects to n groups of
given sizes k_1, ..., k_n, with k_i > 0, k_1 + ... + k_n = k (assume first
the groups to be distinct, e.g. numbered)?
Equivalently, in how many ways can a population of k elements be
divided into n ordered parts of which the first contains k_1 elements,
the second k_2, etc.? [The order within the group is ignored]

Choose k_1 objects out of k (with no attention to the order):

\binom{k}{k_1}

Choose k_2 objects out of the remaining elements:

\binom{k - k_1}{k_2}
Multinomial Coefficients

After forming the (n-1)-st group there remain
k - k_1 - ... - k_{n-1} = k_n elements that form the last group
The total number of ways is thus (multiplication rule)

\binom{k}{k_1} · \binom{k-k_1}{k_2} ··· \binom{k-k_1-...-k_{n-2}}{k_{n-1}} · \binom{k_n}{k_n}
  = k!/(k_1! k_2! ··· k_n!)
  = k!/Π_{i=1}^{n} k_i!
  ≡ \binom{k}{k_1, k_2, ..., k_n}
  = \binom{k_1 + k_2 + ... + k_n}{k_1, k_2, ..., k_n}

As a notation, above Π_{i=1}^{n} a_i = a_1 · a_2 ··· a_n
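The product formula translates directly into a small helper; a minimal sketch (the function name `multinomial` is ours, not part of the slides):

```python
from math import factorial

def multinomial(*ks):
    """k! / (k1! k2! ... kn!) with k = k1 + ... + kn."""
    out = factorial(sum(ks))
    for ki in ks:
        out //= factorial(ki)   # each ki! divides the running product exactly
    return out

assert multinomial(2, 1, 1) == 12   # splitting 4 objects into groups of sizes (2,1,1)
```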


Problems

I What is the number of ways in which a class of 11 students
  can be split into 3 subgroups of sizes 4, 4, 3, which will be
  assigned different problems to work on?

  \binom{11}{4, 4, 3} = 11!/(4! 4! 3!) = 11550

  This is what the multinomial coefficient counts. The groups
  are distinct (labelled by the problem), the objects are distinct
  (students)

I What if the groups are instead indistinguishable (the students
  are assigned the same problem)?

  \binom{11}{4, 4, 3} · 1/2! = 11!/(4! 4! 3!) · 1/2 = 5775
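Note that 11!/(4! 4! 3!) = 11550, and halving it for the two interchangeable groups of size 4 gives 5775; a quick check:

```python
from math import factorial

distinct_groups = factorial(11) // (factorial(4) * factorial(4) * factorial(3))
# the two groups of size 4 are interchangeable when all get the same problem
identical_problem = distinct_groups // 2
assert (distinct_groups, identical_problem) == (11550, 5775)
```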
Multinomial Coefficients

Suppose that we want to split 4 distinguishable objects {1, 2, 3, 4}
into three groups of sizes (2, 1, 1). The multinomial coefficient
counts the following 4!/2! = 12 groupings

(1,2|3|4) (1,3|2|4) (1,4|2|3) (2,3|1|4) (2,4|1|3) (3,4|1|2)
(1,2|4|3) (1,3|4|2) (1,4|3|2) (2,3|4|1) (2,4|3|1) (3,4|2|1)

However, if the groups (of the same size) are indistinguishable, then
the second line is the same as the first, so the total number of
splittings is 12/2! = 6
Multinomial Coefficients

To sum up, the multinomial coefficient

\binom{k}{k_1, k_2, ..., k_n},    Σ_{i=1}^{n} k_i = k

counts the number of possible ways of grouping k distinguishable
objects into n distinguishable groups of given sizes k_1, ..., k_n, such
that the order of the objects inside each group is irrelevant

(Put differently, it counts the number of possible
ordered partitions of k distinguishable objects into n disjoint
subsets of sizes k_1, ..., k_n)
Multinomial Coefficients

If the groups of given sizes k_1, ..., k_n are indistinguishable, the total
number of ways of grouping k distinguishable objects is instead

\binom{k}{k_1, k_2, ..., k_n} · 1/(b_1! b_2! ··· b_k!)

where b_i ≥ 0 is the number of groups of size i (k = Σ_{i=1}^{k} i b_i)

- the number of groups of the same size is the only thing that
  matters, since the groups are indistinguishable
- two groups of different sizes are distinguishable in view of their
  different sizes

Of course, if only some groups are indistinguishable the above
expression should be modified accordingly
Multinomial Coefficients

Three observations:

1 If we allow k_i to be zero (the multinomial coefficient does
  make sense since 0! = 1) and we sum

  Σ_{k_1,...,k_n = 0, ..., k; k_1+...+k_n = k}  k!/(k_1! ··· k_n!)

  we obtain exactly the number which we expect to obtain

  which number? n^k
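This is the multinomial theorem with all a_i = 1; a brute-force check for n = 3, k = 4:

```python
from math import factorial
from itertools import product

n, k = 3, 4
total = 0
for ks in product(range(k + 1), repeat=n):   # all (k1,...,kn) with 0 <= ki <= k
    if sum(ks) == k:
        term = factorial(k)
        for ki in ks:
            term //= factorial(ki)
        total += term
assert total == n**k    # multinomial theorem with a_1 = ... = a_n = 1
```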
Multinomial Coefficients

2 the multinomial coefficient

  k!/(k_1! k_2! ··· k_n!)

  also counts the number of different k-letter words from the
  n-letter alphabet (α_1, ..., α_n) in which the letter α_1 appears
  exactly k_1 times, ..., and α_n appears k_n times

  The letters of the alphabet are the (distinguishable) groups
  and the objects are the positions of the characters in the string
  (e.g. "the second character in the string is the letter A" means
  that the group A contains the object labelled 2)

  (Re-examine the previous exercise on the number of distinct
  words that can be formed with the letters of BAALBEK)
Multinomial Coefficients

3 Notice also that when we obtained the multinomial coefficient
  we said the k_1 objects go to the first group, k_2 to the second,
  etc.

  If this is not specified, one needs first to consider the
  combinatorics associated with selecting the groups (which
  must thus be distinguishable). For example,

  in how many ways can 20 different objects be divided among
  4 people so that three of them have 6 objects and one has 2?

  \binom{4}{3} · 20!/(6! 6! 6! 2!)

  where the first factor is associated with choosing which 3 of
  the 4 people will have 6 objects (we need to choose together
  the groups, here people, that contain the same number of
  objects)
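The count can be cross-checked by building the same split from successive binomial choices:

```python
from math import comb, factorial

# multinomial count 20!/(6!6!6!2!), times C(4,3) ways to pick who gets 6
ways = comb(4, 3) * factorial(20) // (factorial(6) ** 3 * factorial(2))

# cross-check: hand out the objects person by person
assert ways == comb(4, 3) * comb(20, 6) * comb(14, 6) * comb(8, 6) * comb(2, 2)
```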
Birthday Problem

What is the probability that in a classroom with t students at least
2 have the same birthday?

Assume that there are 365 days in the year, and that the
probability that a student has the birthday on a given day is the
same for all days of the year

Let us try to map the problem in terms of allocation of balls to
boxes.
What are the boxes? What are the balls?

Hint: use the complement set

E = "at least two people have the same birthday"
Ē = "no two people have the same birthday"
Birthday Problem

E = "at least two people have the same birthday"
Ē = "no two people have the same birthday"

P(E) = 1 - P(Ē)
     = 1 - (365)_t/365^t = 1 - \binom{365}{t} t!/365^t
     = 1 - (365/365) · (364/365) ··· ((365 - t + 1)/365)

n = 365 boxes (each labelled by a day of the year) and t
(distinguishable) balls (people are distinguishable)
|Ω| = 365^t and |Ē| = (365)_t since the t distinguishable balls may
not fall in the same box
Equivalently, to compute |Ē|, choose the t boxes that should be
occupied, \binom{365}{t}, then place the t balls, all possible ways of placing
the t balls in the t boxes being (t)_t = t!
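The formula is easy to evaluate numerically; a minimal sketch (the function name is ours, `math.perm` requires Python 3.8+):

```python
from math import perm

def p_shared_birthday(t, days=365):
    """P(at least two of t people share a birthday), uniform independent birthdays."""
    return 1 - perm(days, t) / days**t

# t = 23 is the first classroom size where the probability exceeds 1/2
assert p_shared_birthday(23) > 0.5 > p_shared_birthday(22)
```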
Birthday Problem

Notice that already when t = 23, the probability of having at least
two people with the same birthday is greater than 1/2

Since we are interested in the event that no two people have the
same birthday, but we are not really interested in which people,
why can't we consider the people indistinguishable?
If we did so, the points in the corresponding sample space would
not have the same probability, so we could not use the formula
based on the counting of the outcomes to compute the probability.
Namely, we are assuming that 1) a person can be born with the
same probability on any day of the year, i.e. 1/365; 2) this probability
does not depend on another person's birthday (the events are
independent; see later for a more precise description)
This implies the denominator must be 365^t = n^t (distinguishable
balls)
Birthday Problem

Consider a simpler case: t = 2 people and a year with n = 3 days

Considering the people as distinguishable (p_1, p_2), the sample space
has the following n^k = 3^2 = 9 outcomes

(p_1 p_2 |       |       )   (       | p_1 p_2 |       )   (       |       | p_1 p_2)
(p_1     | p_2   |       )   (p_1    |         | p_2   )   (       | p_1   | p_2    )
(p_2     | p_1   |       )   (p_2    |         | p_1   )   (       | p_2   | p_1    )

which are equiprobable, each with probability 1/3 · 1/3 = 1/9
(indeed, take a ball and throw it in a box, and assume it falls in
any box with the same probability, which is 1/3)
Birthday Problem

If we considered people as indistinguishable (p), one could consider
the corresponding sample space Ω̂ to be (since we can ignore the
labels)

(p p |     |    )   (    | p p |    )   (    |     | p p)
(p   | p   |    )   (p   |     | p  )   (    | p   | p  )

whose \binom{n-1+t}{t} = 6 elements do not all have the same probability:
the states in the second line have twice the probability of those in
the first line (2/9 vs 1/9)

This is because we are working under the assumption that a ball
can fall in a box with the same probability 1/n, with n the number
of boxes, and where a ball lands does not affect where another ball
will land.

When this is the case, always label the balls, even if the problem
says the balls are identical
Sample spaces

Notice however that if we change assumptions (e.g. we do not
assume that each object can go with the same probability into any
cell independently of the others), a sample space such as Ω̂ could
then also be equiprobable

See Example 1.8.6 in the book of DeGroot-Schervish (hand-out)
Problem

A collection of indistinguishable particles may occupy a set of
available discrete states, at thermodynamic equilibrium. More than
one particle may occupy any given state and all allocations are
equally probable. Consider 22 such particles (bosons) and 8 states.
Compute the probability that the number of states that are not
occupied is 3

We can think of the n = 8 states as being the labelled boxes, and
of the k = 22 particles as being the indistinguishable balls. States
may be occupied by any number of particles

Thus, the total number of arrangements is |Ω| = \binom{n+k-1}{k} = \binom{29}{22}

Since all arrangements/microstates are equiprobable (key
assumption), we can compute the probability of the event of
interest as

P(E) = |E|/|Ω|
Problem. Continued

E is the set of arrangements where three states are empty

|E| = \binom{8}{3} \binom{21}{17}

Choose which states are empty: \binom{8}{3} possible ways

Fill the other ñ = 8 - 3 = 5 states with one particle in each state
(this can be done in one way only, since the particles are
indistinguishable)

Place the remaining k̃ = 22 - 5 = 17 particles: \binom{ñ-1+k̃}{k̃} ways

P(E) = |E|/|Ω| = \binom{8}{3} \binom{21}{17} / \binom{29}{22} ≈ 0.2147
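A quick numerical evaluation of this Bose-Einstein count (assuming Python 3.8+ for `math.comb`):

```python
from math import comb

omega = comb(8 + 22 - 1, 22)                    # C(29, 22): all placements
favourable = comb(8, 3) * comb(5 - 1 + 17, 17)  # choose empty triple, then C(21, 17)
p = favourable / omega
assert 0.21 < p < 0.22
```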
Caution

Notice that

\binom{n+k-1}{k}

counts all possible ways of allocating the k identical particles to
the n states, including those that leave states empty, which is why
we first filled the states that must be occupied with one particle each

Notice also that this way of approaching the problem would not
work if the particles were distinguishable
Exercise

Consider 4 balls and three labelled boxes. If each ball has the same
probability of falling in any box and of doing so independently of
the others, what is the probability that exactly one box is left empty?

The total number of ways of arranging k = 4 balls in n = 3 boxes is
|Ω| = 3^4

That the balls are identical or not does not change anything in this
case, since they all have the same probability 1/3 of falling into any
box and they do so independently of each other, each outcome
having probability (1/3)^4 (see the previous exercise, however, for
when the assumptions are different)

For the numerator: choose the empty box, \binom{3}{1}; place all the k = 4
balls in the other ñ = 2 boxes (there are 2^4 possible ways) but
remove the two configurations where the 4 balls are all in one of
these two boxes (otherwise there would be an additional empty box)

\binom{3}{1} (2^4 - 2) / 3^4 = 42/81
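With only 3^4 = 81 equiprobable outcomes, the answer can be verified exhaustively:

```python
from itertools import product

# each of 4 labelled balls lands independently and uniformly in one of 3 boxes
outcomes = list(product(range(3), repeat=4))      # 3**4 equiprobable outcomes
one_empty = sum(
    1 for w in outcomes if sum(box not in w for box in range(3)) == 1
)
assert (one_empty, len(outcomes)) == (42, 81)
```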
Exercise. Continued

The term 2^4 - 2 can also be seen by applying the Poincaré
identities
Let A_i be the event "the i-th box is empty"

|Ā_1 ∩ Ā_2|

is what we want. The indices of the boxes are irrelevant since we
have already counted how many ways there are to choose the
non-empty boxes

|Ā_1 ∩ Ā_2| = |(A_1 ∪ A_2)^c| = |Ω| - |A_1 ∪ A_2|
            = |Ω| - |A_1| - |A_2| + |A_1 ∩ A_2|
            = 2^4 - 1 - 1 + 0 = 2^4 - 2
Exercise. Continued

Equivalently, count all possible assignments. There are only two
possibilities:
- a box is empty, a box contains 3 balls and a box contains 1 ball
- a box is empty, and the remaining two boxes contain two balls each

For all these cases, choose first which box contains how many
(choosing together boxes that contain the same number of balls),
then assign the balls to the boxes (multinomial coefficient)

[ \binom{3}{1}\binom{2}{1} 4!/(3! 1!) + \binom{3}{1}\binom{2}{2} 4!/(2! 2!) ] / 3^4 = 42/81

If you try to approach the problem as we did the previous problem,
placing one ball in each box that should be filled, and then
counting all possible ways to place the others, you would have
over-counted
Caution

Care should be taken when considering sample spaces and the
probabilities of their outcomes

It will become clearer with practice, but the following two slides
(taken from two textbooks) should give you further details on how
to assign probabilities to sample spaces
Ordered versus Unordered Samples

Sometimes the same collection of elements in different orders are treated as different samples, and sometimes the
same elements in different orders are treated as the same sample. "In general, how can one tell which is the correct
way to count in a given problem? Sometimes, the problem description will make it clear which is needed. For
example, if we are asked to find the probability that the items in a sample arrive in a specified order, then we
cannot even specify the event of interest unless we treat different arrangements of the same items as different
outcomes. However, there are cases in which the problem description does not make it clear whether or not one
must count the same elements in different orders as different outcomes. Indeed, there are some problems that can
be solved correctly both ways... In general, this is the principle that should guide the choice of counting method. If
we have the choice between whether or not to count the same elements in different orders as different outcomes,
then we need to make our choice and be consistent throughout the problem. If we count the same elements in
different orders as different outcomes when counting the outcomes in the sample space" Ω, "we must do the same
when counting the elements of the event E of interest. If we do not count them as different outcomes when
counting Ω, we should not count them as different when counting E" (DeGroot)
Concerning ordering and sampling in practice

Sometimes "one feels intuitively that the order within the sample should be irrelevant, and the beginner is therefore
prone to think of samples as not being ordered. But conclusions from a sample are possible only on the basis of
certain probabilistic assumptions, and for these it is necessary to have an appropriate model for the conceptual
experiment of obtaining a sample. Now such an experiment obviously involves choices that can be distinguished
from each other, meaning choices that are labelled in some way. For theoretical purposes it is simplest to use the
integers as labels, and this amounts to ordering the sample... In other words, even though the order within the
samples may be ultimately disregarded, the conceptual experiment involves ordered samples, and ... this affects the
appropriate assignment of probabilities." (Feller)


Problem

A deck of 52 cards (13 values A, 2, ..., 10, J, Q, K, for each of four
suits ♦, ♥, ♣, ♠) is shuffled thoroughly and the cards are then
distributed among four players so that each player receives 13
cards. What is the probability that each player will receive one ace?

The k = 52 cards are distinguishable. We can consider the n = 52
positions in which they are dealt as distinguishable (boxes)
|Ω| = (52)_52 = 52! (all possible shufflings)
|E| = 13^4 · 4! · 48! (each ace must be received by a player among
his/her 13 cards, as first, second, ..., or 13th card: 13^4; shuffle the
aces: 4!; and the non-aces: 48!)

p(E) = 13^4 · 4! · 48! / 52!
Problem (continued). Solution 2

Now we can compute the same probability considering as sample
space Ω the positions in the 52-card deck that are occupied by the
four aces, not counting different arrangements of the four aces
in those four positions as different outcomes

$$|\Omega| = \binom{52}{4}$$

Of all these arrangements, we want to consider only those in which
one ace is among the 13 positions dealt to each player. Exactly
13^4 of the combinations in Ω satisfy the requirement we seek, thus

$$p(E) = \frac{13^4}{\binom{52}{4}}$$
Problem (continued). Solution 3

Consider instead as sample space Ω the assignments of the 52
cards (distinguishable) to 4 (distinguishable) groups of equal size

$$|\Omega| = \binom{52}{13,13,13,13} = \frac{52!}{13!^4}$$

(multinomial coefficient)

The assignments E we are interested in are those in which each
group contains one ace and 12 non-aces

$$|E| = \binom{4}{1}\binom{48}{12}\binom{3}{1}\binom{36}{12}\binom{2}{1}\binom{24}{12}\binom{1}{1}\binom{12}{12} = \frac{4!\,48!}{12!^4}$$

(choose an ace and assign it to the first player, complete the hand
choosing 12 cards from the 48 cards that are not aces, then do the
same for the other players using the remaining cards)

$$p(E) = \frac{4!\,48!\,13^4}{52!}$$
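The three solutions can be cross-checked numerically; here is a quick sketch in Python (standard library only; the variable names are ours):

```python
from math import comb, factorial

# Solution 1: ordered deals; |E| = 13^4 * 4! * 48!, |Omega| = 52!
p1 = 13**4 * factorial(4) * factorial(48) / factorial(52)

# Solution 2: unordered positions of the four aces; |Omega| = C(52, 4)
p2 = 13**4 / comb(52, 4)

# Solution 3: multinomial split of the deck into four hands of 13
omega = factorial(52) // factorial(13)**4
e = (comb(4, 1) * comb(48, 12) * comb(3, 1) * comb(36, 12)
     * comb(2, 1) * comb(24, 12) * comb(1, 1) * comb(12, 12))
p3 = e / omega

print(p1, p2, p3)  # all three agree, about 0.105
```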
Hypergeometric

Consider an urn that contains N balls, K of which red and N − K
non-red. n balls are drawn (one after another without replacement
or at once) at random from the urn. What is the probability p_k
that k red balls are drawn?

The total number of ways of choosing n balls out of N is: $\binom{N}{n}$

The number of ways of choosing k red balls and n − k non-red is
$\binom{K}{k} \cdot \binom{N-K}{n-k}$

Thus, the probability of interest is

$$p_k = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

with the requirement max(0, n − (N − K)) ≤ k ≤ min(K, n);
otherwise p_k = 0

p_k defines the hyper-geometric distribution (chapter 3)


Hypergeometric

Notice that all the balls in the urn are implicitly labelled in the
derivation above. The equiprobable sample space we have
considered in the previous page is the set of unordered samples of
size n from an urn with labelled balls

Suppose N = 5, K = 3 (the urn contains 5 balls, three of which
red) and we are interested in the probability of drawing exactly
k = 1 red ball in a sample of size n = 2
The sample space that we are considering contains the following
$\binom{N}{n} = \binom{5}{2} = 10$ elements

[R1 B1] [R2 B1] [R3 B1] [R1 B2] [R2 B2] [R3 B2]
[R1 R2] [R1 R3] [R2 R3] [B1 B2]

and the number of outcomes favorable to our event is 6 (those on
the first line), thus the probability $6/10 = \binom{3}{1}\binom{2}{1}\big/\binom{5}{2}$
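The general formula is easy to wrap in a small helper; a sketch (the function name is ours), checked on the N = 5, K = 3 example above:

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    """P(exactly k red) when drawing n balls without replacement
    from an urn with K red balls among N."""
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Urn with N=5 balls, K=3 red; draw n=2 balls, ask for k=1 red
print(hypergeom_pmf(5, 3, 2, 1))  # 6/10 = 0.6
```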
Hypergeometric

Nothing changes in the computation of the probability if we
consider as sample space the set of ordered samples
Both the sample space and the set of interest are now two times
larger than above (for each element of the previous example we now
have 2! ordered samples)
Indeed, in the general case, this can be seen from the equation

$$\frac{n!\,\binom{K}{k}\binom{N-K}{n-k}}{(N)_n} = \frac{\binom{n}{k}(K)_k(N-K)_{n-k}}{(N)_n} = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

|Ω| = (N)_n counts all ordered n-tuples from a population of size N,
and the numerators in the first and second terms count all ordered
samples of size n with k red balls and n − k non-red

central term: count all ordered subsets from each sub-population,
then count all possible orderings of the n balls, keeping at the
same time fixed the balls of each color, since their ordering has
already been taken into account: $n!/((n-k)!\,k!) = \binom{n}{k}$

left-hand side: choose the k red balls and the n − k non-red balls, and
count all possible orderings (n!)
Hypergeometric

You can look at the problem differently, since

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \frac{\binom{n}{k}\binom{N-n}{K-k}}{\binom{N}{K}}$$

Consider as sample space the set of all distinct words from a set of
K R's and N − K R̄'s (as if you were drawing all balls from the
urn and placing them one after another in a row, with the balls
indistinguishable other than for their colors, R, R̄), thus (as in the
BAALBEK example)

$$|\Omega| = \frac{N!}{K!\,(N-K)!} = \binom{N}{K}$$

We are interested in the subset of such words whose first n letters
are k R's and n − k R̄'s (our sample) and whose last N − n letters
are the remaining letters, K − k of which are R's. The total
number of such words is indeed $\binom{n}{k} \cdot \binom{N-n}{K-k}$

(There is nothing special about our sample being the first n letters,
however)
Exercise

A bakery makes 80 loaves of bread daily. Ten of them are
underweight. An inspector weighs 5 loaves at random. What is the
probability that an underweight loaf will be discovered?

"an underweight loaf" here means at least one, not exactly one

$$P(\text{at least 1 underweight}) = 1 - P(\text{no underweight loaves}) = 1 - \frac{\binom{10}{0}\binom{70}{5}}{\binom{80}{5}} = 1 - \frac{\binom{70}{5}}{\binom{80}{5}} \approx 0.4965$$
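A one-line numerical check of the complement computation (a sketch, standard library only):

```python
from math import comb

# P(at least one underweight) = 1 - P(none of the 5 sampled is underweight)
p = 1 - comb(70, 5) / comb(80, 5)
print(round(p, 4))  # about 0.4965
```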
Conditional Probability

Consider an urn that contains B blue and R red balls

Let us draw a ball from it. The probability that the ball is blue is

$$P(\text{1st blue}) = \frac{B}{B+R}$$

Now, conditional probability answers questions such as this:
what is the probability of drawing a red ball if (given that) the first
ball was blue?
E.g., in the case of sampling without replacement we have:

$$P(\text{2nd red}\,|\,\text{1st blue}) = \frac{R}{B+R-1}, \qquad P(\text{2nd blue}\,|\,\text{1st blue}) = \frac{B-1}{B+R-1}$$

Notice P(2nd red|1st blue) + P(2nd blue|1st blue) = 1


Conditional Probability

A, B ⊆ Ω two events, with P(B) > 0
The conditional probability of A given B is the probability that A
occurs, given that B has occurred

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

One can verify that if we condition on Ω, we get the usual


probability
P(A|Ω) = P(A)
Also

P(B|B) = 1
P(∅|B) = 0

The conditional probability P(·|B) has the same properties on the


space ΩB = Ω ∩ B = B as the original probability P(·) has on Ω
Conditional Probability

For a sequence of disjoint events A_i, i ∈ ℕ, A_i ∩ A_j = ∅ when i ≠ j,

$$P\Big(\bigcup_i A_i \,\Big|\, B\Big) = \sum_i P(A_i|B)$$

which is the second axiom of probability. Indeed,

$$P\Big(\bigcup_i A_i \,\Big|\, B\Big) = P\Big(\Big(\bigcup_i A_i\Big) \cap B\Big)\Big/P(B) = P\Big(\bigcup_i (A_i \cap B)\Big)\Big/P(B) = \sum_i P(A_i \cap B)/P(B) = \sum_i P(A_i|B)$$

since (A_i ∩ B) ∩ (A_j ∩ B) = A_i ∩ A_j ∩ B = ∅ ∩ B = ∅


Conditional Probability

In particular,

P(A|B) + P(Ā|B) = 1

However, in general

P(A|B) + P(A|B̄) ≠ 1

Furthermore,

P(A|B) = 1 if A ⊇ B

and thus (first axiom)

P(Ω|B) = P(Ω_B|B) = 1

Accordingly, since the axioms of probability hold upon
conditioning, all results we have obtained for the unconditional
probability hold true when we condition on the same event. E.g.,

P(A ∪ C|B) = P(A|B) + P(C|B) − P(A ∩ C|B)


Exercise
A fair coin is tossed twice. What is the probability that both tosses
land on heads given that
1 the first toss lands head up
2 at least one of the tosses lands head up

Since the coin is fair, all the points in the sample space

Ω = {(H, H), (T , T ), (H, T ), (T , H)}

have the same probability. The event we need to consider is

A = {(H, H)}

and those we need to condition on


1 B1 = {(H, T ), (H, H)}
2 B2 = {(H, T ), (T , H), (H, H)}
Exercise. Continued

Thus the resulting probabilities are different in the two cases

1.
$$P(A|B_1) = \frac{P(\{(H,H)\} \cap \{(H,T),(H,H)\})}{P(\{(H,T),(H,H)\})} = \frac{P((H,H))}{P((H,T)) + P((H,H))} = \frac{1/4}{1/4+1/4} = \frac{1}{2}$$

2.
$$P(A|B_2) = \frac{P(\{(H,H)\} \cap \{(H,T),(T,H),(H,H)\})}{P(\{(H,T),(T,H),(H,H)\})} = \frac{P((H,H))}{P((H,T)) + P((T,H)) + P((H,H))} = \frac{1/4}{1/4+1/4+1/4} = \frac{1}{3}$$
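Because the space is finite and uniform, both conditional probabilities can be computed by brute-force enumeration. A sketch (the helper names `prob` and `cond` are ours):

```python
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=2))   # 4 equally likely outcomes

def prob(event):
    """P(event) by counting outcomes in the uniform space."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond(a, b):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

both = lambda w: w == ("H", "H")
first_heads = lambda w: w[0] == "H"
at_least_one = lambda w: "H" in w

print(cond(both, first_heads))   # 1/2
print(cond(both, at_least_one))  # 1/3
```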
Interpreting the Conditional Probability

Since A = {(H,H)} ⊂ B_1 = {(H,T),(H,H)}, the sample space is
now B_1, thus we could have obtained P(A|B_1) directly by simple
counting (because the space is uniform)

$$P(A|B_1) = \frac{1}{2}$$

Similarly, A = {(H,H)} ⊂ B_2 = {(H,T),(H,H),(T,H)}

$$P(A|B_2) = \frac{1}{3}$$

That is, another way of reading

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

is that the probability of A given B is proportional to that part of
A that lies in B
Hypergeometric Using Conditional Probability

Consider again the probability of having k red balls in a sample of
size n drawn without replacement from an urn with N balls, K of
which are red

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \binom{n}{k}\frac{(K)_k(N-K)_{n-k}}{(N)_n}$$

The factor

$$\frac{(K)_k(N-K)_{n-k}}{(N)_n} = \frac{K}{N}\cdot\frac{K-1}{N-1}\cdots\frac{K-k+1}{N-k+1}\cdot\frac{N-K}{N-k}\cdots\frac{N-K-(n-k)+1}{N-k-(n-k)+1}$$

is a product of conditional probabilities: K/N is the probability
of drawing a red ball from the original urn, (K−1)/(N−1) is the
conditional probability of drawing a red ball given that the first
draw was a red ball, etc. (drawing, first, all the red balls and then all
the required non-red ones)
Hypergeometric Using Conditional Probability

Since we can rearrange the terms in both numerator and
denominator, the factor

$$\frac{(K)_k(N-K)_{n-k}}{(N)_n} = \frac{K(K-1)\cdots(N-K-(n-k)+1)}{N(N-1)\cdots(N-k-(n-k)+1)}$$

is the probability of drawing without replacement k red balls
and n − k non-red balls in any given order

The hyper-geometric distribution sums up all these $\binom{n}{k}$
contributions.
Indeed, $\binom{n}{k}$ is the number of ways of placing k red balls in n places
Hypergeometric Using Conditional Probability

For example, suppose the urn has N = 10 balls, K = 3 of which
are red and N − K = 7 non-red (without loss of generality, we can
color them all blue). We sample 3 balls without replacement. The
probability that one of the drawn balls is red is given by

$$\frac{\binom{3}{1}\binom{7}{2}}{\binom{10}{3}}$$

There are 3 possible patterns in a sample of size 3 with one red
ball

A_1 = RBB    A_2 = BRB    A_3 = BBR

The hypergeometric counts

$$p(A_1 \cup A_2 \cup A_3) = p(A_1) + p(A_2) + p(A_3) = \frac{3}{10}\cdot\frac{7}{9}\cdot\frac{6}{8} + \frac{7}{10}\cdot\frac{3}{9}\cdot\frac{6}{8} + \frac{7}{10}\cdot\frac{6}{9}\cdot\frac{3}{8} = 3\,\frac{3\cdot 7\cdot 6}{10\cdot 9\cdot 8} = \frac{\binom{3}{1}\binom{7}{2}}{\binom{10}{3}}$$

(the first equality is because the A_i are mutually exclusive)
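The sum over the three ordered patterns can be verified directly; a sketch (the function name is ours) that multiplies conditional probabilities along each pattern:

```python
from math import comb
from itertools import permutations

N, K = 10, 3  # 10 balls, 3 red (the rest blue)

def path_prob(pattern):
    """Probability of drawing exactly this color sequence
    without replacement (product of conditional probabilities)."""
    red, blue, total, p = K, N - K, N, 1.0
    for c in pattern:
        p *= (red if c == "R" else blue) / total
        red, blue = red - (c == "R"), blue - (c == "B")
        total -= 1
    return p

patterns = set(permutations("RBB"))  # RBB, BRB, BBR
total_p = sum(path_prob(pat) for pat in patterns)
print(total_p)  # 0.525, equal to C(3,1)*C(7,2)/C(10,3)
```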
Exercise

What is the probability of drawing k balls that are red from an urn
with N total balls, K of which red, if instead we sample with
replacement?

Look at the relation

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \binom{n}{k}\frac{(K)_k(N-K)_{n-k}}{(N)_n}$$

and consider how the cardinality of the space changes ...

$$\binom{n}{k}\frac{K^k(N-K)^{n-k}}{N^n} = \binom{n}{k}\left(\frac{K}{N}\right)^k\left(1 - \frac{K}{N}\right)^{n-k}$$

(binomial, see chapter 3)
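The binomial case can be checked the same way; a sketch (the function name is ours):

```python
from math import comb

def binom_pmf(n, k, p):
    """P(k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With replacement the draws are independent: N=10, K=3, so p = 3/10
print(binom_pmf(3, 1, 3 / 10))  # about 0.441
```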
Multiplication Rule

Now, from

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

the following multiplication rule follows

P(A ∩ B) = P(A|B)P(B)

the probability that A and B both occur equals the probability
that B occurs multiplied by the probability that A occurs given
that B has occurred

If P(A) > 0, by symmetry (commutativity of ∩), we can also write

P(A ∩ B) = P(B|A)P(A)
Exercise

Consider a well-shuffled regular deck of cards (13 cards
A, 2, . . . , 10, J, Q, K, for each of four suits ♣, ♠, ♦, ♥). Two cards
are dealt off the top of the deck. What is the probability that the
first card is A♦ and the second 2♠?

$$P(C_1 = A♦ \cap C_2 = 2♠) = P(C_2 = 2♠\,|\,C_1 = A♦)\,P(C_1 = A♦) = \frac{1}{51}\cdot\frac{1}{52}$$
Generalization to many events. Chain Rule

Consider n events A_i ⊆ Ω, i = 1, . . . , n. Then the following chain
rule holds

$$P\left(\bigcap_{i=1}^n A_i\right) = P(A_1)P(A_2|A_1)P(A_3|A_2 \cap A_1)\cdots P(A_n|A_{n-1} \cap \cdots \cap A_1)$$

or, in more compact form, using the notation AB ≡ A ∩ B,

$$P(A_1 \cdots A_n) = P(A_1)P(A_2|A_1)P(A_3|A_2 A_1)\cdots P(A_n|A_{n-1} \cdots A_1)$$

The order in which we condition on the events does not matter
(that is, which event we choose as the first to condition on; of
course, once a choice is made, we need to stick with it). In
practice, however, some choices make the computation much
easier than others
Law of Total Probability

Consider a complete set of disjoint events. Namely,

$$A_1, \ldots, A_n, \quad \bigcup_{i=1}^n A_i = \Omega, \quad A_i \cap A_j = \emptyset, \; i \neq j$$

exhaustive (at least one event will occur) and mutually exclusive.
Then the following law of total probability holds

$$P(B) = \sum_{i=1}^n P(B|A_i)P(A_i)$$

Proof. B can be written as the union of disjoint sets

$$B = B \cap \Omega = B \cap \left(\bigcup_{i=1}^n A_i\right) = \bigcup_{i=1}^n (B \cap A_i)$$

$$(B \cap A_i) \cap (B \cap A_j) = B \cap A_i \cap A_j = \emptyset$$

Hence

$$P(B) = \sum_{i=1}^n P(B \cap A_i) = \sum_{i=1}^n P(B|A_i)P(A_i)$$
Law of Total Probability

The above formula provides the basic tool to compute the


probabilities of complicated events in terms of conditional
probabilities
Exercise

Assume people can be born any day of the year with equal
probability. What is the probability that a randomly selected
person was born on January 1?

Consider the complete set of events Y_365, Y_366, where Y_k = "the
year has k days", and the event B = "birthday on January 1"

$$P(B) = P(B|Y_{365})P(Y_{365}) + P(B|Y_{366})P(Y_{366}) = \frac{1}{365}\cdot\frac{3}{4} + \frac{1}{366}\cdot\frac{1}{4}$$
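A sketch of the same computation with exact fractions:

```python
from fractions import Fraction

# Law of total probability over year length
# (3 of every 4 years have 365 days, 1 of 4 has 366)
p_b = Fraction(1, 365) * Fraction(3, 4) + Fraction(1, 366) * Fraction(1, 4)
print(p_b, float(p_b))  # about 0.00274
```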
Graphical Representation

It may be useful to approach a probabilistic problem graphically.


We can construct a probability tree starting from a node, and then
branching out, with each branch being associated with one of the
events of a complete set.

The starting point is a complete set for which the unconditioned


probabilities are known.
Assume A1 , A2 is such a set, namely

Ω = A1 ∪ A2 , A1 ∩ A2 = ∅ (A2 = Ā)

We grow branches from the root, with each branch corresponding


to one of the two events and we furnish the branches with the
unconditioned probabilities P(A1 ) and P(A2 ).
Graphical Representation

[Tree diagram: a root node with two branches, labelled P(A1) and P(A2)]
Graphical Representation
Suppose a new complete set of events is available, B1 , B2 , B3 , for
which the conditional probabilities P(Bi |Aj ) are known. Then we
can use each A-branch as the root of a new sub-tree, whose
branches correspond to the events Bi . These new branches will be
furnished with the conditional probabilities P(Bi |Aj ).
The probabilities of a given event that is the intersection of some
events, e.g. B2 ∩ A1 , can be read from the graph: one has to select
the unique path that contains the branches of the events of
interest, B2 and A1 , and multiply the probabilities one encounters,
because of the chain rule

P(A1 ∩ B2 ) = P(B2 |A1 )P(A1 )

Similarly, unconditional probabilities of events can be obtained


employing the law of total probability

P(B2 ) = P(B2 ∩ Ω) = P(B2 ∩ A1 ) + P(B2 ∩ A2 )


Graphical Representation of the Chain Rule

[Tree diagram: from the root, branches P(A1) and P(A2); from each A-branch,
sub-branches P(B1|Ai), P(B2|Ai), P(B3|Ai). Multiplying along a path gives,
e.g., P(B2|A1)P(A1) = P(B2 ∩ A1) and P(B2|A2)P(A2) = P(B2 ∩ A2)]
Graphical Representation

The process can be iterated if there is another complete set of


events {Ck }, for which the conditional probabilities given the
events previously represented are known. Indeed, each branch is
furnished with the conditional probability of the event it represents,
conditioned on the intersection of all the events that are met when
one follows the path that from that branch reaches the root.
Exercise 63

For customers purchasing a refrigerator at a particular appliance
store, consider the events
A = "refrigerator purchased was made in the USA"
B = "refrigerator had an ice-maker"
C = "customer purchased an extended warranty"
Assume the following unconditional and conditional probabilities:

P(A) = 0.75    P(B|A) = 0.9    P(B|Ā) = 0.8
P(C|A ∩ B) = 0.8    P(C|A ∩ B̄) = 0.6
P(C|Ā ∩ B) = 0.7    P(C|Ā ∩ B̄) = 0.3

Compute P(A ∩ B ∩ C), P(B ∩ C), P(C) using the tree or
otherwise

P(A ∩ B ∩ C) = 0.8 · 0.9 · 0.75 = 0.54
P(B ∩ C) = 0.54 + 0.14 = 0.68
P(C) = 0.68 + 0.045 + 0.015 = 0.74
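The tree can be encoded as a pair of conditional-probability tables; a sketch (the data layout is ours) that multiplies along each path (chain rule) and sums paths (law of total probability):

```python
pA = 0.75
pB_given_a = {True: 0.9, False: 0.8}                  # P(B | A), P(B | not A)
pC_given_ab = {(True, True): 0.8, (True, False): 0.6,
               (False, True): 0.7, (False, False): 0.3}

def path(a, b):
    """P(A=a, B=b, C) by the chain rule along one branch of the tree."""
    p_a = pA if a else 1 - pA
    p_b = pB_given_a[a] if b else 1 - pB_given_a[a]
    return p_a * p_b * pC_given_ab[(a, b)]

p_abc = path(True, True)                     # P(A ∩ B ∩ C)
p_bc = path(True, True) + path(False, True)  # P(B ∩ C)
p_c = sum(path(a, b) for a in (True, False) for b in (True, False))
print(p_abc, p_bc, p_c)  # 0.54, 0.68, 0.74 (up to float rounding)
```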
Problem

In an urn containing N balls, R are red. What is the probability
that the second draw is a red ball? Consider both sampling with
and without replacement

R_i = "the i-th draw is a red ball"

$$P(R_2) = P(R_2|R_1)P(R_1) + P(R_2|\bar{R}_1)P(\bar{R}_1) =
\begin{cases}
\frac{R}{N}\frac{R}{N} + \frac{R}{N}\frac{N-R}{N} = \frac{R}{N} & \text{with replacement}\\[4pt]
\frac{R-1}{N-1}\frac{R}{N} + \frac{R}{N-1}\frac{N-R}{N} = \frac{R}{N} & \text{without replacement}
\end{cases}$$

(of course, in the case of sampling with replacement, the urn has
the same composition for every draw)
Thus

P(R_2) = P(R_1)
General Case

What is the probability that the k-th draw is a red ball?
E_k = "the k-th draw is a red ball". We want to compute

P(E_k)

the unconditional probability.
We do not know if this red ball is the first, the second, or the k-th
red ball that has been drawn
The colors of the balls drawn before the k-th went unnoticed

If the sampling is with replacement, clearly

p(E_k) = R/N

since before the k-th draw the urn contains the same
balls as at the start

The same result holds true if the sampling is without replacement


General Case. Solution 1. Law of Total Probability

For i = 0, . . . , k − 1, let R_{i∈k−1} be the event "there are i red balls
among the first k − 1 drawn balls"
The events $\{R_{i\in k-1}\}_{i=0}^{k-1}$ form a complete set
(mutually exclusive: we may not have i red balls among the first
k − 1 and at the same time j ≠ i red balls; and exhaustive), so we
can determine p(E_k) (the probability that the k-th ball is red)
using the law of total probability

$$p(E_k) = \sum_{i=0}^{k-1} P(E_k|R_{i\in k-1}) \cdot P(R_{i\in k-1}) = \sum_{i=0}^{k-1} \frac{R-i}{N-(k-1)} \cdot \frac{\binom{R}{i}\binom{N-R}{k-1-i}}{\binom{N}{k-1}} = \frac{R}{N} \sum_{i=0}^{k-1} \frac{\binom{R-1}{i}\binom{N-R}{k-1-i}}{\binom{N-1}{k-1}} = \frac{R}{N}$$
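Solution 1 can be verified for concrete numbers; a sketch with exact fractions (the function name is ours), confirming that the answer does not depend on k:

```python
from fractions import Fraction
from math import comb

def p_kth_red(N, R, k):
    """P(k-th draw is red), no replacement: total probability over
    the number i of red balls among the first k-1 draws."""
    total = Fraction(0)
    for i in range(k):
        prior = Fraction(comb(R, i) * comb(N - R, k - 1 - i), comb(N, k - 1))
        total += Fraction(R - i, N - (k - 1)) * prior
    return total

print([p_kth_red(10, 3, k) for k in (1, 2, 5, 10)])  # all 3/10
```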
Exercise

The last identity is because of Vandermonde's identity

$$\sum_{i=0}^{r} \binom{m}{i}\binom{n}{r-i} = \binom{m+n}{r}$$

with r = k − 1, m = R − 1, n = N − R.

Exercise. Prove the identity considering indistinguishable balls in
distinguishable cells without exclusion, or, put differently, use
Vandermonde's identity to prove that the total number of ways of
placing k indistinguishable balls into n distinguishable cells that
can be occupied by any number of balls is

$$\binom{n-1+k}{k}$$
General Case. Solution 2

Another proof. Draw without replacement an ordered sample of k
balls out of a population of N balls. The total number of such
samples is |Ω| = (N)_k = N!/(N − k)!
Among all these samples we need to count only those that have a
red ball in the k-th position, which amount to

$$|E_k| = \binom{R}{1}(N-1)_{k-1} = \binom{R}{1}\frac{(N-1)!}{(N-1-(k-1))!} = R\,\frac{(N-1)!}{(N-k)!}$$

(choose one ball among the red ones, place it in the k-th position,
consider an ordered (k − 1)-size sample from the population of all
balls except the one placed in the k-th position)

$$p(E_k) = \frac{|E_k|}{|\Omega|} = \frac{R}{N}$$
General Case. Solution 3

Equivalently, consider the problem of placing N distinguishable
balls in N labelled cells with at most one ball per cell (and thus
in this case exactly one), the labels of the cells corresponding to
the position in which the balls are drawn in the original problem

[Diagram: a row of N cells labelled 1, . . . , k, . . . , N]

|Ω| = N!    |E_k| = R(N − 1)!

(choose a red ball, place it in the k-th cell, place all other
balls in all possible ways)

$$p(E_k) = \frac{|E_k|}{|\Omega|} = \frac{R}{N}$$
Bayes' Theorem

Consider a complete set of disjoint events:

$$A_1, \ldots, A_n, \quad \bigcup_{i=1}^n A_i = \Omega, \quad A_i \cap A_j = \emptyset, \; i \neq j$$

Bayes' Theorem

$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^n P(B|A_j)P(A_j)}$$

Proof

$$P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^n P(B|A_j)P(A_j)}$$

P(A_i): prior probabilities    P(A_i|B): posterior probabilities

Bayes' Theorem

We have different hypotheses A_i about some phenomenon that are
mutually exclusive and exhaustive (that is, one hypothesis is true
(exhaustive) and only one (mutually exclusive))
We do not know which hypothesis is true, but we have a priori
knowledge of how likely each is, P(A_i) (prior probability)
We carry out an experiment and employ its result B (the data) to
re-assess the probabilities of these hypotheses. After the
experiment (conditional on it), the probability of each hypothesis is

$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^n P(B|A_j)P(A_j)}, \quad i = 1, \ldots, n$$

P(B|A_i) indicates the probability that, given that the hypothesis A_i
is true, we obtain the result B from the experiment
Thus we use both the result of the experiment (data) and the prior
probabilities (our informed prior knowledge) to determine the
posterior probability
Exercise

We have 10 (biased) coins. When the i-th coin is tossed, the
probability of heads is i/10 (i = 1, . . . , 10). We randomly select a
coin, toss it, and get heads. What is the probability that it was the
coin numbered 5? (Spring 2015 Exam)

H_1 = "heads comes up" (our experiment's outcome, our data)
I_k = "the coin selected is the k-th"
P(H_1|I_k) indicates the probability that if the hypothesis I_k is true
(that is, if the k-th coin is the one selected and tossed), the coin
lands heads up.
P(I_k|H_1) is what we should compute, for k = 5

$$P(I_k|H_1) = \frac{P(H_1|I_k)P(I_k)}{P(H_1)} = \frac{P(H_1|I_k)P(I_k)}{\sum_{j=1}^{10} P(H_1|I_j)P(I_j)} = \frac{\frac{k}{10}\frac{1}{10}}{\sum_{j=1}^{10}\frac{j}{10}\frac{1}{10}} = \frac{k}{55}$$

Thus P(I_5|H_1) = 5/55 = 1/11
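The posterior computation as code, with exact fractions (the variable names are ours):

```python
from fractions import Fraction

coins = range(1, 11)
prior = {k: Fraction(1, 10) for k in coins}        # uniform choice of coin
like = {k: Fraction(k, 10) for k in coins}         # P(heads | coin k)

evidence = sum(like[k] * prior[k] for k in coins)  # P(heads) = 55/100
post = {k: like[k] * prior[k] / evidence for k in coins}  # Bayes' theorem

print(post[5])  # 1/11
```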


Exercise

Mr Jones from New York City and Mr Clark from Philadelphia
decide to meet up in NYC. Mr Clark is a very indecisive person and
thus flips a coin to make up his mind about going to NYC. If the
coin is heads, he goes to the station, and there he chooses which
of the six available trains to board by casting a die. Given that Mr
Jones sees that Mr Clark has not arrived in NYC with any of the
first five trains, what is the probability that the latter (Mr Clark)
will arrive with the 6th train? (Fall 2014 Exam)

T_6 = "Mr Clark took the 6th train"
N_5 = "Mr Clark was not on any of the first five trains"

$$P(N_5|T_6) = 1, \quad P(T_6) = \frac{1}{2}\cdot\frac{1}{6}, \quad P(N_5) = \frac{1}{2} + \frac{1}{2}\cdot\frac{1}{6}$$

(the last equality is because if Mr Clark was not on any of the first
5 trains, either he did not leave for NYC or he took the sixth train)

$$p(T_6|N_5) = \frac{p(N_5|T_6)\,p(T_6)}{p(N_5)} = \frac{1}{7}$$
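The same computation, spelled out with exact fractions:

```python
from fractions import Fraction

p_T6 = Fraction(1, 2) * Fraction(1, 6)       # heads, then the die shows 6
p_N5_given_T6 = Fraction(1)                  # train 6 implies not on first five
p_N5 = Fraction(1, 2) + p_T6                 # tails (stayed home) or train 6

p_T6_given_N5 = p_N5_given_T6 * p_T6 / p_N5  # Bayes' theorem
print(p_T6_given_N5)  # 1/7
```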
Independence

Two events A and B are independent iff

P(A ∩ B) = P(A)P(B)

Thus, since when P(A), P(B) > 0

P(A ∩ B) = P(B|A)P(A) = P(A|B)P(B)

we can, equivalently, state independence as

P(B|A) = P(B)    P(A|B) = P(A)

the fact that one event occurs does not modify the probability of
the other
Independence. General Case

A_1, . . . , A_n are mutually/statistically independent, or simply
independent, iff

$$P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$$

$$\forall k = 2, \ldots, n, \quad \forall\, 1 \le i_1 < \cdots < i_k \le n$$

That is, we have to verify the property for every possible subset of
A_1, . . . , A_n

Pairwise independence

$$P(A_i \cap A_j) = P(A_i)P(A_j), \quad \forall i \neq j$$

does not imply that the events are mutually independent


Independence. General Case

Example (seen before)

Ω = {ω1, ω2, ω3, ω4}    P(ωi) = 1/4, ∀i

The events

A1 = {ω1, ω2},  A2 = {ω1, ω3},  A3 = {ω1, ω4}

are pairwise independent

$$\frac{1}{4} = P(A_i \cap A_j) = P(A_i)P(A_j) = \frac{1}{2}\cdot\frac{1}{2}$$

but not independent, since

$$\frac{1}{4} = P(A_1 \cap A_2 \cap A_3) \neq P(A_1)P(A_2)P(A_3) = \left(\frac{1}{2}\right)^3$$
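The counterexample is small enough to check exhaustively; a sketch with exact fractions:

```python
from fractions import Fraction
from itertools import combinations

omega = {1, 2, 3, 4}                       # uniform space of 4 points
A = {1: {1, 2}, 2: {1, 3}, 3: {1, 4}}      # the three events

def prob(event):
    return Fraction(len(event), len(omega))

# Pairwise independent: P(Ai ∩ Aj) = P(Ai)P(Aj) for every pair
for i, j in combinations(A, 2):
    assert prob(A[i] & A[j]) == prob(A[i]) * prob(A[j])

# ... but not mutually independent: 1/4 vs 1/8
triple = prob(A[1] & A[2] & A[3])
print(triple, prob(A[1]) * prob(A[2]) * prob(A[3]))
```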
Independence

Saying that two events A and B are independent does not mean
that they have "nothing to do with each other" or that one does
not "influence" the other
Consider the experiment of rolling two dice and the events
A = "the sum of the numbers that come up is 7"
B = "the first die shows k", k a fixed number, k = 1, . . . , 6
A and B are independent (the fact that one event occurs does not
modify the probability of the other):

$$\frac{1}{6}\cdot\frac{1}{6} = p(A)\cdot p(B) = p(A \cap B) = \frac{1}{36}$$

Yet they have something to do with each other, as it were.

Independence of two events means instead that the probability of
the second event given the first is the same no matter what the
outcome of the first one is
Problem

Is A independent of A?

P(A) = P(A ∩ A) = P(A)P(A)

the first equality by idempotency of ∩, the second by independence

Thus, if A is independent of A, then P(A) ∈ {0, 1} (and vice versa)


Problem

If A and B are independent, then
I Ā, B̄ are independent
I A and B̄ are independent (and then obviously Ā and B)

Let us prove the first

$$1 - (1 - P(\bar{A}))(1 - P(\bar{B})) = 1 - P(A)P(B) = 1 - P(A \cap B) = P(\overline{A \cap B}) = P(\bar{A} \cup \bar{B}) = P(\bar{A}) + P(\bar{B}) - P(\bar{A} \cap \bar{B})$$

That is

$$1 - (1 - P(\bar{A}))(1 - P(\bar{B})) = P(\bar{A}) + P(\bar{B}) - P(\bar{A} \cap \bar{B})$$

Expanding the left-hand side and simplifying, P(Ā)P(B̄) = P(Ā ∩ B̄)

Exercise: Prove the second result


Mutually Exclusive and Independent

Mutually exclusive and independent are two different ideas

I Two events are mutually exclusive if the occurrence of one
prevents the occurrence of the other

I Two events are independent if the occurrence of one does not
change the probability of the other

However, notice the following

Given two events A and B such that p(A) > 0 and p(B) > 0, then

A ∩ B = ∅ ⇒ P(A ∩ B) = 0 ≠ P(A)P(B)

mutually exclusive ⇒ dependence
Conditional Independence

Two events A and B are conditionally independent given C iff

P(A ∩ B|C) = P(A|C)P(B|C)

which simplifies the general condition

P(A ∩ B|C) = P(A|B ∩ C)P(B|C)

valid for all events.

Equivalently, conditional independence of A and B given C can be
expressed as

P(A|B ∩ C) = P(A|C)    P(B|A ∩ C) = P(B|C)


Exercise. Continued

Let us reconsider the previous problem
We have 10 (biased) coins. When the i-th coin is tossed, the
probability of heads is i/10 (i = 1, . . . , 10). We randomly select a
coin, toss it, and get heads. We toss the same coin again. What is
the probability that it lands heads up again?

H_t = "the selected coin lands heads up at the t-th toss", t = 1, 2
I_k = "the selected coin is the k-th coin", k = 1, . . . , 10, a complete
set
We want to compute P(H_2|H_1)
We can use the conditional version of the law of total probability,
which states that if $\{A_i\}_{i=1}^{n}$ is a complete set

$$P(B|C) = \sum_{i=1}^n P(B \cap A_i|C) = \sum_{i=1}^n P(B|A_i \cap C)P(A_i|C)$$

since P(B ∩ A_i|C) = P(B|A_i ∩ C)P(A_i|C)


Exercise. Continued

Hence,

$$P(H_2|H_1) = \sum_{k=1}^{10} P(H_2 \cap I_k|H_1) = \sum_{k=1}^{10} P(H_2|I_k \cap H_1)P(I_k|H_1)$$

Now the coin tosses are conditionally independent given the I_k,
that is

$$P(H_2|I_k \cap H_1) = P(H_2|I_k),$$

so that, using the results of the previous part,

$$P(H_2|H_1) = \sum_{k=1}^{10} P(H_2|I_k)P(I_k|H_1) = \sum_{k=1}^{10} \frac{k}{10}\cdot\frac{k}{55} = \frac{385}{550} = 0.7 > P(H_1) = 0.55$$
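The whole two-toss computation in a few lines of exact arithmetic (variable names ours):

```python
from fractions import Fraction

coins = range(1, 11)
prior = {k: Fraction(1, 10) for k in coins}
p_h = {k: Fraction(k, 10) for k in coins}            # P(H | coin k)

p_h1 = sum(p_h[k] * prior[k] for k in coins)         # P(H1) = 0.55
post = {k: p_h[k] * prior[k] / p_h1 for k in coins}  # P(I_k | H1) = k/55

# Conditional independence of the tosses given the coin:
p_h2_given_h1 = sum(p_h[k] * post[k] for k in coins)
print(float(p_h1), float(p_h2_given_h1))  # 0.55 and 0.7
```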
Exercise. Continued

What is the meaning of the following?

$$P(H_2|H_1) = \sum_{k=1}^{10} P(H_2|I_k)P(I_k|H_1)$$

After the experiment H_1 is carried out, we have learnt something
about the coin (I_k). Thus P(I_k|H_1), the posterior probability after
the first experiment, should be used as the probability that the
selected coin is the k-th, in place of P(I_k), which would be used if
no experiment had been carried out

$$P(H_2) = \sum_{k=1}^{10} P(H_2|I_k)P(I_k)$$

Every time we carry out an experiment, the posterior probabilities
become the new prior probabilities
