
Chapter 2 - 201781

Sample Space and Events

The sample space Ω (or S) associated with an experiment is the
set of all possible outcomes of that experiment

Ω = {ω1, . . . , ωn}

A subset E of Ω, E ⊆ Ω, is called an event

Informally, an event is a statement about the outcomes of a random
experiment

E can contain just one outcome (simple event or state), e.g.
E = {ω3}, or it can contain more than one (composite event or,
simply, event), e.g. E = {ω2, ω21}

E = ∅ is referred to as the impossible (or vacuous) event


E = Ω is called the certain event or universal event
Sample Space
Describe the sample spaces for the following experiments
- I roll a regular die
  {1, 2, 3, 4, 5, 6}
- I toss n coins (n = 4)
  {HHHH, HHHT, HHTH, HHTT, HTHH, HTTH, HTHT, HTTT,
   TTTT, TTTH, TTHT, TTHH, THTT, THHT, THTH, THHH}
- I cast a regular die and, if 6 comes up, toss a coin
  {6H, 6T, 1, 2, 3, 4, 5}
- I measure the time for the emission of a radioactive particle from
  some atom
  (0, ∞)
  (this is a non-discrete case)
Events

Describe the events
1) when rolling a die, an even number comes up
2) when tossing n coins, the first n − 1 outcomes are tails
3) measuring the time for the emission of a radioactive particle
from some atom, the emission occurs after 3 minutes

1) E = {2, 4, 6}
2) E = {TT· · ·TT, TT· · ·TH}, where the first n − 1 letters are T
3) E = (3, ∞) (using minutes as the unit of time)
Sample Space and Events

Events are subsets of a set, the sample space Ω

(Notice that in general not all subsets of Ω are events. The choice
of subsets that are to be events will depend on the phenomenon at
hand)

Logical statements on events ⇐⇒ operations with sets

- both event A and event B occur (conjunction) ⇐⇒ A ∩ B
- at least one of A and B occurs (disjunction) / either A or B or
  both occur ⇐⇒ A ∪ B
- the event A does not occur (negation) ⇐⇒ Ā
- the event A occurs but the event B does not ⇐⇒ A \ B

A Set-theory Recap

Let A, B be two subsets of Ω

A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}
A ∩ B ≡ AB = {ω ∈ Ω : ω ∈ A and ω ∈ B}
Ā ≡ Aᶜ = {ω ∈ Ω : ω ∉ A} = Ω \ A
B \ A = B ∩ Ā = {ω ∈ Ω : ω ∈ B and ω ∉ A}

Notation
When A is a countable set, we denote by |A| the number of
points/elements in A (the size or cardinality of A)
A Set-theory Recap
Properties
Commutativity
A ∪ B = B ∪ A,  A ∩ B = B ∩ A
Associativity
(A ∪ B) ∪ C = A ∪ (B ∪ C),  (A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributivity
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),  A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Idempotency
A ∪ A = A,  A ∩ A = A
De Morgan's laws
(A ∪ B)ᶜ = Ā ∩ B̄,  (A ∩ B)ᶜ = Ā ∪ B̄
and, more generally,

( ⋃_{i=1}^n Ai )ᶜ = ⋂_{i=1}^n Āi,   ( ⋂_{i=1}^n Ai )ᶜ = ⋃_{i=1}^n Āi
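These identities are easy to sanity-check mechanically. A minimal sketch using Python's built-in sets; the universe Omega and the sets A, B, C are arbitrary illustrative choices, not from the text:

```python
# Checking De Morgan's laws and distributivity with Python sets.
Omega = set(range(1, 11))          # Ω = {1, ..., 10}, an arbitrary universe
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {2, 6, 9}

complement = lambda S: Omega - S   # S̄ = Ω \ S

# (A ∪ B)ᶜ = Ā ∩ B̄  and  (A ∩ B)ᶜ = Ā ∪ B̄
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# Distributivity: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
assert A & (B | C) == (A & B) | (A & C)
print("all identities verified")
```

Any other choice of finite sets would do equally well; the laws hold for all subsets of Ω.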
A Set-theory Recap. Poincaré Identities
Inclusion-Exclusion Principle (Poincaré Identity)
Let Ai ⊂ Ω, i = 1, . . . , n

| ⋃_{i=1}^n Ai | = |A1 ∪ A2 ∪ . . . ∪ An |
  = Σ_{i=1}^n |Ai| − Σ_{1≤i<j≤n} |Ai ∩ Aj|
  + Σ_{1≤i<j<k≤n} |Ai ∩ Aj ∩ Ak|
  − . . . + (−1)^{n−1} |A1 ∩ A2 ∩ . . . ∩ An |

For example, for n = 3

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
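The n = 3 identity can be verified by direct counting; the three sets below are arbitrary illustrative choices:

```python
# Verifying |A ∪ B ∪ C| by the n = 3 inclusion-exclusion formula.
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
C = {1, 5, 7, 8}

lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
assert lhs == rhs
print(lhs)  # → 8
```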


A Set-theory Recap. Poincaré Identities

Using De Morgan's law, it then follows that

| ⋂_{i=1}^n Āi | = | Ω \ ⋃_{i=1}^n Ai |
  = |Ω| − | ⋃_{i=1}^n Ai |   (since Ω ⊇ ⋃_{i=1}^n Ai)
  = |Ω| − Σ_{i=1}^n |Ai| + Σ_{1≤i<j≤n} |Ai ∩ Aj|
  − Σ_{1≤i<j<k≤n} |Ai ∩ Aj ∩ Ak|
  + . . . + (−1)^n |A1 ∩ A2 ∩ . . . ∩ An |
An Example
In a group of 21 (Arabic-speaking) Lebanese, 16 speak French, 13
English, 4 Armenian, 9 English and French, 2 French and
Armenian, 3 English and Armenian, and 1 English, French and
Armenian. How many speak only Arabic?
Let A be the set of Armenophones, E the set of Anglophones, F
the set of Francophones
The set of those who speak only Arabic is

Ā ∩ Ē ∩ F̄ = (A ∪ E ∪ F)ᶜ = Ω \ (A ∪ E ∪ F)

|Ā ∩ Ē ∩ F̄| = |Ω| − |A ∪ E ∪ F|
  = |Ω| − (|A| + |E| + |F|) + |A ∩ E| + |A ∩ F| + |E ∩ F| − |A ∩ E ∩ F|
  = 21 − (4 + 13 + 16) + 3 + 2 + 9 − 1 = 1
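The same computation, spelled out as arithmetic (a sketch; the variable names are ours, not from the text):

```python
# Who speaks only Arabic: |Ā ∩ Ē ∩ F̄| via inclusion-exclusion.
n_omega = 21                 # |Ω|, the whole group
nA, nE, nF = 4, 13, 16       # |A|, |E|, |F|
nAE, nAF, nEF = 3, 2, 9      # pairwise intersections
nAEF = 1                     # |A ∩ E ∩ F|

only_arabic = n_omega - (nA + nE + nF) + (nAE + nAF + nEF) - nAEF
print(only_arabic)  # → 1
```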
Some Terminology

Ω is also called the certain event

∅ is also called the impossible event

If A ∩ B = ∅, A and B are said to be disjoint or incompatible
or mutually exclusive events
(that A and B both occur is impossible: the occurrence of one
prevents the occurrence of the other)

If A ⊆ B, A is said to imply B (B occurs if A occurs)
In fact ⊆ is the set-theoretic equivalent of ⇒
Probability

The first two elements of a probabilistic model are the sample
space and the notion of events.

The third element is the assignment of a probability to the events
and outcomes of a random experiment

We need to formalize statements such as: the probability that when
we roll a die an even number comes up is 1/2
Probability as Measure of Relative Frequencies

One interpretation views probability as a relative frequency (which
can be justified a posteriori by the result known as the law of large
numbers)

Carry out the same experiment repeatedly and independently a
large number of times N (roll the same die under the same
conditions N times)
Record the number of times S_N(E) the event E occurs (an even
number comes up)
Assign to the event the probability P(E) = S_N(E)/N, for N large
(the empirical limiting relative frequency in the N repetitions)
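A quick simulation illustrating the frequency interpretation; the sample size N and the seed are arbitrary choices for the sketch:

```python
# Estimate P(even) for a fair die as a relative frequency S_N(E)/N.
import random

random.seed(0)               # fixed seed so the run is reproducible
N = 100_000                  # number of independent repetitions
S_N = sum(1 for _ in range(N) if random.randint(1, 6) % 2 == 0)
estimate = S_N / N
print(estimate)              # close to 1/2 for large N
```

Rerunning with larger N (or different seeds) shows the estimate fluctuating ever more tightly around 1/2, which is what the law of large numbers guarantees.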
Probability as Measure of Relative Frequencies

The frequency definition of probability is based on the assumption
that identical and independent experiments can be carried out
(which is not always the case)
A priori there is no guarantee that the relative frequency should
converge to a limit, and if it does, it is not clear how large
N should be for the approximation to be reliable
In general, the assignment of a probability to an event is rather
subtle
Sometimes there are natural choices, for example based on
the existence of symmetries in the random experiment at hand
Sometimes the choice will be subjective (probability assignments
may differ from individual to individual)
Let us define probability as a (mathematical) function that satisfies
some properties (a version of these properties is indeed verified by
the relative frequency)
Discrete Probability Space
Let the sample space Ω be non-empty and countable for the rest of
this chapter
The third element necessary to complete the description of a
probabilistic model is a function P, defined on the events (the
subsets of Ω) and taking values in [0, 1],
called probability, that satisfies the following two properties
P1) P(Ω) = 1
    The certain event has probability 1
P2) σ-additivity (countable additivity)
    For every sequence (Ai)_{i∈N} of disjoint/mutually exclusive
    events, Ai ∩ Aj = ∅ for i ≠ j,

    P( ⋃_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P(Ai)

(Ω, P) is called a discrete probability space, with Ω its sample
space, and the subsets of Ω the events
Probability. Properties

For a discrete probability space (Ω, P), the following statements
hold true
1) The impossible event has probability zero

   P(∅) = 0

Proof. If Ai = ∅, i ∈ N, then ⋃_{i=1}^∞ Ai = ∅ and the Ai are
(trivially) disjoint, so by P2

   P(∅) = P( ⋃_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P(Ai) = Σ_{i=1}^∞ P(∅)

which holds iff P(∅) = 0.
Probability. Properties

2) Finite additivity
Let Ai, i = 1, . . . , n, be a finite family of disjoint events,
Ai ∩ Aj = ∅ for i ≠ j

   P( ⋃_{i=1}^n Ai ) = Σ_{i=1}^n P(Ai)

This follows from countable additivity, P2, setting Ak = ∅ for
all k > n. Thus finite additivity is a weaker notion than countable
additivity.

When Ω is finite, we can equally define the probability space
using properties P1 and P2, or P1 and finite additivity.
Probability. Properties
Other consequences of P1 and P2 (draw the corresponding Venn
diagram if in doubt)
3) P(Ā) = 1 − P(A)

   1 = P(Ω) = P(A ∪ Ā) = P(A) + P(Ā)

   since A ∩ Ā = ∅
4) For any A, B ⊆ Ω

   P(B \ A) = P(B) − P(A ∩ B)

   Since B = (B \ A) ∪ (A ∩ B), with B \ A and A ∩ B
   disjoint,

   P(B) = P(B \ A) + P(A ∩ B)

   hence the result
Probability. Properties

5) For any A, B ⊆ Ω

   P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

   Since A ∪ B = A ∪ (B \ A), and A and B \ A are disjoint,

   P(A ∪ B) = P(A) + P(B \ A) = P(A) + P(B) − P(A ∩ B)

   using 4)
Inclusion-Exclusion Formulae (Poincaré's Identities)

(Ω, P) a discrete probability space.
For any n ≥ 1 and for any choice of sets (events) A1, . . . , An ⊆ Ω

P(A1 ∪ . . . ∪ An) = Σ_{k=1}^n (−1)^{k−1} Σ_{1≤i1<...<ik≤n} P(Ai1 ∩ . . . ∩ Aik)

P(A1 ∩ . . . ∩ An) = Σ_{k=1}^n (−1)^{k−1} Σ_{1≤i1<...<ik≤n} P(Ai1 ∪ . . . ∪ Aik)

They can be proven by induction

They are actually valid on any probability space (finite, countable
or uncountable)
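A sketch of the first identity in code: the function below sums over all non-empty subfamilies with the alternating sign. It assumes a uniform probability on a finite Ω (our choice, made only to have a concrete P for the check); the sets are arbitrary:

```python
# P(A1 ∪ ... ∪ An) via the first Poincaré identity, checked against
# computing the union directly.
from itertools import combinations
from functools import reduce

Omega = set(range(12))
P = lambda S: len(S) / len(Omega)            # uniform probability (assumption)

def prob_union(events):
    n = len(events)
    total = 0.0
    for k in range(1, n + 1):                # size of the subfamily
        sign = (-1) ** (k - 1)
        for subfamily in combinations(events, k):
            inter = reduce(set.intersection, subfamily)
            total += sign * P(inter)
    return total

events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}, {0, 7, 8}]
assert abs(prob_union(events) - P(set().union(*events))) < 1e-12
print(prob_union(events))                    # P of the union, here 9/12
```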
Example

For example, for n = 4, calling the events A1 = A, A2 = B,
A3 = C, A4 = D

P(A ∪ B ∪ C ∪ D) = P(A) + P(B) + P(C) + P(D)
  − P(A ∩ B) − P(A ∩ C) − P(A ∩ D)
  − P(B ∩ C) − P(B ∩ D) − P(C ∩ D)
  + P(A ∩ B ∩ C) + P(A ∩ B ∩ D)
  + P(A ∩ C ∩ D) + P(B ∩ C ∩ D)
  − P(A ∩ B ∩ C ∩ D)

An alternating sum of the probabilities of each event (4 terms), each
possible pair of events (6), each possible triple of events (4), and
each possible quadruple (1)
When to use the Poincaré formula

Very often, if one has to compute the probability that at least one
event occurs, or the probability that no event occurs, the Poincaré
formula is most useful

Indeed, let Ai, i = 1, . . . , n, be n events

P(A1 ∪ . . . ∪ An) is the probability that at least one of the n
events occurs

P(Ā1 ∩ . . . ∩ Ān) is the probability that none occurs

By the Poincaré formulae, these probabilities can be written in
terms of probabilities involving fewer events, which are often easier
to compute
Probability of events and outcomes
(Ω, P) a discrete probability space, A = {ωi1, . . . , ωik} a compound
event; then its probability is

P(A) = Σ_{j=1}^k P(ωij)

with the restriction 1 = P(Ω) = Σ_{ω∈Ω} P(ω)

Example

Ω = {ω1, ω2, ω3, ω4},  P(ωi) = p ∀i
A1 = {ω1, ω2}, A2 = {ω1, ω3}, A3 = {ω1, ω4}

Find P(Ai), P(Ai ∩ Aj), P(A1 ∩ A2 ∩ A3), P(Ai ∪ Aj), etc.

P(ωi) = 1/4, P(Ai) = 2/4, P(Ai ∩ Aj) = 1/4 = P(A1 ∩ A2 ∩ A3)
P(Ai ∪ Aj) = 3/4, P(A1 ∪ A2 ∪ A3) = P(Ω) = 1
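The little example above can be checked directly; the integer labels 1–4 stand for ω1, . . . , ω4:

```python
# Event probabilities as sums of outcome probabilities, p = 1/4 each.
p = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
P = lambda event: sum(p[w] for w in event)

A1, A2, A3 = {1, 2}, {1, 3}, {1, 4}
assert P(A1) == 0.5                   # P(Ai) = 2/4
assert P(A1 & A2) == 0.25             # = P(A1 ∩ A2 ∩ A3) = {ω1}
assert P(A1 | A2) == 0.75             # P(Ai ∪ Aj) = 3/4
assert P(A1 | A2 | A3) == 1.0         # the whole Ω
print("matches the values in the text")
```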
Uniform Probability Space
A discrete probability space (Ω, P) which is finite (that is, Ω is
finite) and such that the outcomes ωi ∈ Ω are equiprobable is
called a uniform probability space.
In this case,

P(ωi) = 1/|Ω|

The example on the previous page is such a space
More generally, for any event E (a subset of Ω)

P(E) = |E| / |Ω|

The probability of an event is the ratio of the number of cases that
are favorable to it, to the number of possible cases, when there is
nothing to make us believe that one case should occur rather than
any other [Laplace]

Thus the problem of computing the probability of an event becomes
the problem of counting the number of its elements
Equiprobability
It is necessary that all points in the sample space be equiprobable
in order to compute probability via simple counting, P(E) = |E|/|Ω|
Suppose we want to compute the probability of getting a three by
summing the numbers that turn up on tossing two dice

E = the sum of the two throws is 3

Since the sum is symmetric, we can think of using the following
sample space, where [i, j] is the unordered couple, with i the
number for one of the dice and j for the other

Ω = { [1, 1] [1, 2] [1, 3] [1, 4] [1, 5] [1, 6]
             [2, 2] [2, 3] [2, 4] [2, 5] [2, 6]
                    [3, 3] [3, 4] [3, 5] [3, 6]
                           [4, 4] [4, 5] [4, 6]
                                  [5, 5] [5, 6]
                                         [6, 6] }
Equiprobability
There is nothing wrong in using this sample space; however, these
simple events are not equally probable. For example,

p([1, 1]) = 1/36
p([1, 2]) = p((1, 2) ∪ (2, 1)) = p((1, 2)) + p((2, 1)) = 2/36

where [i, j] is the unordered couple and (i, j) the ordered couple

[i, j]: one die shows the i-th face, the other the j-th face
(i, j): the first die shows i, the second j, if we throw them one after
the other or, if we toss them at the same time, just color the dice
differently: the red die shows i, the blue die j

[i, j] = {(i, j), (j, i)}

Thus we may not compute the probability by simply counting the
number of points in the space that are favorable to the event
(one point, [1, 2]) and dividing by the size of the space, 21. We
would get 1/21, instead of the correct probability, which is 2/36
Equiprobability
Instead we can use the following equiprobable sample space

Ω = { (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
      (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
      (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
      (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
      (5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
      (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6) }

whose |Ω| = 36 outcomes are all equiprobable, P((i, j)) = 1/36;
thus |E| = |{i + j = 3}| = 2 and P(E) = |E|/|Ω| = 2/36

More generally (exercise), the probability of rolling a sum of k with
two dice is

P({i + j = k}) = (6 − |7 − k|) / 36,   k = 2, . . . , 12
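The exercise can be checked by brute-force enumeration of the 36 equiprobable ordered outcomes; a short sketch:

```python
# Verify P(i + j = k) = (6 − |7 − k|)/36 for two fair dice, k = 2, ..., 12.
from itertools import product
from fractions import Fraction

for k in range(2, 13):
    count = sum(1 for i, j in product(range(1, 7), repeat=2) if i + j == k)
    assert Fraction(count, 36) == Fraction(6 - abs(7 - k), 36)

print("formula verified for k = 2, ..., 12")
```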
Combinatorics. Multiplication Rule

First rule of counting. Multiplication Rule

If an object is formed by making a succession of choices such that
there are n1 possibilities for the first choice, n2 possibilities for the
second (after the first choice is made), etc., then the total number
of objects that can be made by making a set of k choices is

|E| = n1 · n2 · · · nk

For the rule to apply, the number of available possibilities at each
stage must be the same irrespective of which choices were made
previously (ni for the i-th choice, which may be different from nj).
However, the set of available possibilities may differ and depend on
the choices made at the previous stages
Problem

In how many ways can a woman who has 8 skirts, 4 pairs of shoes
and 10 shirts be dressed?

8 · 4 · 10 = 320

You may find it easier to represent this rule of counting using a
tree diagram
Problem
In how many ways can we roll three dice? In how many ways can
three dice appear when they are rolled? How many possible
numbers do we get by rolling 3 dice (where the order counts)?

We have a succession of three choices (one per die) and each die
represents a multiple choice of six possibilities, ni = 6; thus there
are
6 · 6 · 6 = 216
possible rolls

This is obvious if you think of tossing the dice one after the other
rather than simultaneously (although it does not matter). The set
of choices at each throw of the dice in this case does not change
based on what happens at the previous stage (it is always one of
the numbers {1, 2, 3, 4, 5, 6}), and thus the number of choices does
not change, which is the only thing that matters for the
multiplication rule
Problem

In how many ways can two dice show different faces?

6 · 5

In this case there are 6 available possibilities for the first choice,
and five different possibilities for the second die. Which five depends
on the first choice (e.g. if 3 shows up on the first roll, the set of
available possibilities is {1, 2, 4, 5, 6}; if 2 shows up, that set is
{1, 3, 4, 5, 6}), but there are always 5 possibilities for the second
die, so the first rule of counting as formulated above still applies
Problem

How many different calendars are possible for a year?

7 · 2 = 14

Each year starts on one of the seven days (Sunday, Monday, ...,
Saturday). Each year is either a leap year (i.e., it includes February
29) or not
Standard Ways of Counting and Basic Probabilistic Models

We are now going to see two standard ways of counting which can
be used in the majority of combinatorial problems

- Sampling Methods
- Allocation Methods
Basic Probabilistic Models. Sampling Model

Consider a population of n individuals (people, cards, numbered
balls), i.e. an aggregate of n distinguishable elements without
regard to their order (think of an urn containing n numbered balls)

Choose an individual from the population (a sample of size 1)
(that is, draw one ball from the urn)
How many ways can we do this?
The answer is n

Choose k individuals successively, one at a time
How many ways can we do this?
It depends
Basic Probabilistic Models. Sampling Model
It depends on whether or not a chosen individual is returned to the
population before another is chosen (that is, on whether or not we
put the ball we have drawn back into the urn)

It also depends on whether the sample of size k is ordered or not
(namely, ordering means that the same individuals chosen in a
different order are considered to be a different sample: drawing 1
before 2 is considered different from drawing 2 before 1)

The two kinds of sampling models are called
- sampling with replacement
  we put the drawn ball back into the urn (we replace the ball
  we have drawn in the urn)
- sampling without replacement
  we do not replace the drawn balls
In addition we need to consider the sample to be ordered or
unordered
Basic Probabilistic Models

Thus we will consider the following 2 · 2 = 4 cases

- Sampling with replacement and with ordering
- Sampling without replacement and with ordering
- Sampling without replacement and without ordering
- Sampling with replacement and without ordering
Sampling with replacement and with ordering
I. Sampling with replacement and with ordering
The first individual is drawn from a population with n individuals

n1 = n

We return the individual to the population, thus the set (and
hence the number) of possibilities for the choice of a second
individual is the same as for the first

n2 = n

Repeating the argument and using the multiplication rule (first
rule), we find that there are

n1 · · · nk = nᵏ

ways to draw an (ordered) sample of size k from a population of n
individuals
Sampling with replacement and with ordering
nᵏ counts the number of different ordered k-tuples from a
population of size n

Ordering means, e.g., that (1, 2, 1, 3) is considered different from
(1, 2, 3, 1), (2, 1, 3, 1), etc.

Put differently, the sample space Ω is the set of ordered k-tuples

Ω = {(a1, a2, . . . , ak), ai = 1, . . . , n}

Let us write down explicitly the ordered samples of size
k = 2 that can be obtained from n = 3 objects, if an object may
appear more than once (sampling with replacement)

(1, 1), (1, 2), (1, 3)
(2, 1), (2, 2), (2, 3)
(3, 1), (3, 2), (3, 3)
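In code, ordered sampling with replacement is exactly a Cartesian product; `itertools.product` reproduces the nine pairs above:

```python
# Ordered samples of size k with replacement from {1, ..., n}: n^k tuples.
from itertools import product

n, k = 3, 2
samples = list(product(range(1, n + 1), repeat=k))
assert len(samples) == n ** k == 9
assert (1, 2) in samples and (2, 1) in samples   # order matters: both present
print(samples)
```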
Sampling without replacement and with ordering
II. Sampling without replacement and with ordering
The first individual is chosen from a population with n individuals

n1 = n

We do not return the individual to the population (we do not
replace it), so now the population contains n − 1 individuals. The
number of possibilities for the choice of a second individual is

n2 = n − 1

Thus, using the multiplication rule, there are

n · (n − 1) · · · (n − k + 1) = (n)k ≡ n!/(n − k)! ≡ Pk,n

ways of drawing an ordered sample of size k from a population of
n individuals without replacement
((n)k is called the lower or falling factorial)
Sampling without replacement and with ordering

Notice that in this case the k choices are not independent, as the
individuals drawn earlier affect the set of possibilities for the later
draws; however, the number of possibilities at any given later draw
is not affected by the earlier draws, so the first rule still applies
Sampling without replacement and with ordering

In this case (sampling without replacement and with ordering) the
sample space is the set of ordered k-tuples with the constraint that
no two elements can be the same (known also as k-permutations
or ordered k-sets)

Ω = {(a1, a2, . . . , ak), ai = 1, . . . , n, ai ≠ aj if i ≠ j}

Let us write down explicitly the ordered pairs (k = 2)
that can be obtained from a population of n = 3 objects, if an
object may not appear more than once (sampling without
replacement)

(1, 2) (1, 3)
(2, 1) (2, 3)
(3, 1) (3, 2)
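`itertools.permutations` enumerates exactly these ordered k-tuples with distinct entries; a quick check for n = 3, k = 2 (requires Python ≥ 3.8 for `math.perm`):

```python
# Ordered samples of size k without replacement: (n)_k = n!/(n−k)! tuples.
from itertools import permutations
from math import perm          # math.perm(n, k) = n·(n−1)···(n−k+1)

n, k = 3, 2
samples = list(permutations(range(1, n + 1), k))
assert len(samples) == perm(n, k) == 6
assert (1, 1) not in samples   # no repeats without replacement
print(samples)
```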
Some problems
- How many ways are there for 2 persons to sit in 7 chairs that
  are in a row?
  7 · 6 = (7)2
  Sampling with ordering (it matters in a theater, say, where
  you sit) and without replacement (two people may not sit on
  the same chair)

- How many ways are there to make a 3-letter word (a string of
  3 letters) using a 26-letter alphabet?
  26³

- How many ways are there to form a 3-letter word if the letters
  must be different?
  26 · 25 · 24 = (26)3
Sampling with ordering

Drawing an ordered sample of size k from an urn containing n balls
ai = label of the ball selected at the i-th step, ai = 1, . . . , n
(a1, a2, . . . , ak) an ordered k-tuple

Type                  sample space Ω                                                |Ω|
with replacement      Ω = {(a1, . . . , ak) : ai = 1, . . . , n}                    nᵏ
without replacement   Ω = {(a1, . . . , ak) : ai = 1, . . . , n, ai ≠ aj if i ≠ j}  n(n − 1) · · · (n − k + 1)
(n ≥ k)               (set of k-permutations of an n-set)                           = n!/(n − k)! ≡ (n)k ≡ Pk,n

For example, if n = 3, k = 2, the detailed sample spaces are:

with replacement            without replacement
(1, 1), (1, 2), (1, 3)      (1, 2), (1, 3)
(2, 1), (2, 2), (2, 3)      (2, 1), (2, 3)
(3, 1), (3, 2), (3, 3)      (3, 1), (3, 2)
Permutations

As we have seen, (n)k ≡ Pk,n ≡ n!/(n − k)! denotes the number of
ordered samples of size k drawn from a population of n individuals
without replacement
A sample of size n (i.e. k = n) therefore includes the whole
population and represents a re-ordering or a shuffling of its
elements, generally referred to as an n-permutation or simply a
permutation

The number of different permutations of n elements is then

(n)n = n!
Some problems

- How many different words (i.e. strings of letters) can be
  obtained from the letters BEIRUT (using all letters)?

  6!

- How many different words (i.e. strings of letters) can be
  obtained from the letters BAALBEK (using all letters)?

  7! / (2! 2!)

  shuffling the two A's and the two B's does not change the
  word
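The BAALBEK count can be verified by enumerating all 7! orderings and removing duplicates (feasible here since 7! = 5040):

```python
# Distinct anagrams of BAALBEK: 7!/(2!·2!), since the two A's and
# the two B's are interchangeable.
from itertools import permutations
from math import factorial

words = {"".join(p) for p in permutations("BAALBEK")}
assert len(words) == factorial(7) // (factorial(2) * factorial(2)) == 1260
print(len(words))  # → 1260
```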
Sampling without Ordering

We now consider the case in which the order of the elements in the
sample is irrelevant/disregarded
In this case too we have to distinguish between sampling with or
without replacement

A word on terminology: we will always explicitly specify whether the
elements in the samples are ordered or not, unless it is clear from the
context. Sometimes, however, the word sample is used to refer to what
we have called an ordered sample, and the word population is used to
refer to an aggregate of elements without regard to their order. We
have indeed used population in this sense, but when we consider
sampling from a population, we will mostly talk about unordered
samples of size k instead of sub-populations of size k. In some books,
you may find that the problem of counting the number of possible
unordered samples of size k from a population of size n is referred to
as counting the number of subpopulations of size k of a given
population of size n

Sampling without replacement and without ordering

III. Sampling without replacement and without ordering

Let's start by constructing explicitly the sample space of unordered
k-tuples sampled without replacement
n = 4, k = 2, {1, 2, 3, 4} the individuals in the population
The possible unordered samples of size k = 2 obtained without
replacement are

Ω = {[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]}

because we do not count as different the samples in which
individual 1 has been drawn before individual 2 or after it
Notice, however, that while the order of the sampling is irrelevant,
different individuals make a different sample (the individuals are
distinguishable/labelled/numbered), so [1, 2] = [2, 1] ≠ [1, 3]
Sampling without replacement and without ordering

For general n and k

Ω = {[a1, . . . , ak] : ai = 1, . . . , n, ai ≠ aj for i ≠ j}

where [a1, . . . , ak] indicates an unordered k-tuple whose elements
are a1, . . . , ak

Since all the k individuals in a sample of size k are different (no
replacement) and the sample is unordered, we can equivalently
consider selecting all the individuals together (that is, drawing k
numbered balls all together or one at a time is the same thing)
Sampling without replacement and without ordering
How many elements are in Ω?

We can sample one individual at a time or, equivalently, draw k
individuals from the population at once

An ordered k-tuple from the population can be obtained from an
unordered one by numbering its elements

There are k! different ways of numbering k elements

Thus there are exactly k! times as many ordered samples of size k
as there are unordered samples of size k

Indeed, we can obtain any ordered sample of size k by shuffling one
specific unordered sample of size k, and since the elements in the
sample are not repeated (sampling without replacement), there are
k! ordered samples for each unordered one
(think of the BEIRUT example above, where k = 6)
Sampling without replacement and without ordering

Hence, the number of unordered samples of size k from a
population of size n, when the sampling is carried out without
replacement, is

Pk,n / k! = (n)k / k! = n! / ((n − k)! k!) ≡ C(n, k) ≡ Ck,n

the binomial coefficient

Some properties of the binomial coefficient

C(n, k) = C(n, n − k),   Σ_{k=0}^n C(n, k) aᵏ bⁿ⁻ᵏ = (a + b)ⁿ
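A check of the binomial-coefficient count and of the relation (n)k = C(n, k) · k!, with n = 5, k = 3 chosen arbitrarily (requires Python ≥ 3.8 for `math.comb`/`math.perm`):

```python
# Unordered samples of size k without replacement: C(n, k) subsets,
# each corresponding to k! ordered samples.
from itertools import combinations
from math import comb, perm, factorial

n, k = 5, 3
unordered = list(combinations(range(1, n + 1), k))
assert len(unordered) == comb(n, k) == 10
assert perm(n, k) == comb(n, k) * factorial(k)   # (n)_k = C(n, k) · k!
print(unordered)
```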
Exercises
- How many different committees of 3 people can be formed
  from a group of 30 people?

  C(30, 3) ≡ C3,30 = 30! / (27! 3!)

- A committee is made up of a president, a treasurer and a
  vice-president. How many different committees can be formed
  from a group of 30 people?

  C(30, 3) · 3! = (30)3 = 30 · 29 · 28 = P3,30

  choose three people, C(30, 3), and then consider all possible
  orderings (3!)
Sampling with replacement and without ordering

IV. Sampling with replacement and without ordering

Consider now sampling with replacement, where we do not
distinguish samples that differ only in the order of their elements

Explicit construction of the sample space:
population {1, 2, 3, 4} (n = 4); the possible unordered samples of
size k = 2 are

Ω = {[1, 1], [2, 2], [3, 3], [4, 4], [1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]}

In this case again [1, 2] = [2, 1], thus it is counted once, and we
can have the same individual repeated in a sample (as many as k
times in a k-sample) because we put the individual back into the
population before sampling the next time
Sampling with replacement and without ordering. The
sample space

For general n and k, the sample space can be written as

Ω = {[a1, . . . , ak] : ai = 1, . . . , n}

where [a1, . . . , ak] indicates an unordered k-tuple whose elements
are a1, . . . , ak
Sampling with replacement and without ordering. The sample space

What is |Ω|?

The number of unordered samples of size k sampled with
replacement from a population of size n is

|Ω| = \binom{n-1+k}{k} = (n-1+k)!/(k! (n-1)!) = \binom{n-1+k}{n-1}

It can be proven by induction, but we will prove it later considering
a different (but equivalent) model
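The formula can also be checked by brute-force enumeration for small n and k; a sketch using the standard library:

```python
from math import comb
from itertools import combinations_with_replacement

def count_unordered_with_replacement(n, k):
    """Enumerate the unordered k-samples drawn with replacement from {1,...,n}."""
    return sum(1 for _ in combinations_with_replacement(range(1, n + 1), k))

# agrees with C(n-1+k, k) in all small cases, e.g. n = 4, k = 2 gives 10,
# matching the sample space listed above
for n in range(1, 6):
    for k in range(5):
        assert count_unordered_with_replacement(n, k) == comb(n - 1 + k, k)
```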
Exercise

A bakery has a promotion offering 6 doughnuts for the price of
four dollars. You want to take advantage of the offer. Considering
that the bakery sells 4 types of doughnuts, how many different
boxfuls of 6 doughnuts are possible for you to choose?

\binom{n-1+k}{k} = \binom{4-1+6}{6} = 84

Here we do not distinguish the same collection of 6 doughnuts
arranged in different orders in the box

This is unordered sampling with replacement because we can (in
fact we must, since k = 6 > n = 4) choose the same type of
doughnut more than once, but we are not distinguishing the same
items in different orders
Sampling without ordering

Drawing an unordered sample of size k from an urn containing n balls
a_i label of the ball selected at i-th step, a_i = 1, ..., n
[a_1, a_2, ..., a_k] unordered k-tuple

Type                   sample space Ω                                                |Ω|

with replacement       Ω = {[a_1, ..., a_k] : a_i = 1, ..., n}                       \binom{n-1+k}{k} = \binom{n-1+k}{n-1}

without replacement    Ω = {[a_1, ..., a_k] : a_i = 1, ..., n, a_i ≠ a_j if i ≠ j}   \binom{n}{k} = n!/(k!(n-k)!) ≡ C_{k,n}
(n ≥ k)                (set of combinations)

For example, if n = 3, k = 2, the detailed sample spaces are:

with replacement                 without replacement
[1,1], [1,2], [1,3]              [1,2], [1,3], [2,3]
[2,2], [2,3], [3,3]
Some Problems

Six letters are selected at random one after another from the
English alphabet (26 letters) with replacement
Find the probabilities that: a) the word formed is made up of
vowels (6, if we count Y as a vowel); b) it is the word BEIRUT

a) |Ω| = 26^6, |E| = 6^6, p(E) = (6/26)^6

b) |Ω| = 26^6, |E| = 1, p(E) = (1/26)^6

Do the same if the sampling is without replacement

a) |Ω| = (26)_6, |E| = (6)_6 = 6!, p(E) = 6!/(26)_6 = 1/\binom{26}{6}

b) |Ω| = (26)_6, |E| = 1, p(E) = 1/(26)_6
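The four probabilities are quick to compute; a sketch (assuming Python 3.8+ for `math.comb` and `math.perm`):

```python
from math import comb, perm

# With replacement: |Omega| = 26**6
p_vowels_rep = (6 / 26) ** 6                # all six letters are vowels
p_beirut_rep = (1 / 26) ** 6                # the exact word BEIRUT

# Without replacement: |Omega| = (26)_6 ordered samples
p_vowels_norep = perm(6, 6) / perm(26, 6)   # = 6!/(26)_6 = 1/C(26,6)
p_beirut_norep = 1 / perm(26, 6)

# the two forms of a) without replacement agree
assert abs(p_vowels_norep - 1 / comb(26, 6)) < 1e-12
```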


Problem

What is the probability for any fixed element of a population of
size n to be included in a random ordered sample of size k when
the sampling is without replacement?
(For example, the population is a set of labelled balls (1, ..., n);
we want to find the probability that the ball numbered 3, say, is
drawn from it when we randomly draw k balls without
replacement, considering the sample to be ordered)

|Ω| = (n)_k
|E| = k (n-1)_{k-1}

(the element of interest, i.e. the fixed element, can be in any of k
positions in the sample, and the other k - 1 elements composing
the ordered sample should come from the remaining population of
size n - 1)

P(E) = k (n-1)_{k-1}/(n)_k = k [(n-1)!/(n-k)!] [(n-k)!/n!] = k/n
Problem. Continued

Equivalently,

|E| = \binom{1}{1} \binom{n-1}{k-1} k! = (k-1)! k \binom{n-1}{k-1} = (n-1)_{k-1} k

choose the element of interest (there is only 1 choice), choose the
other k - 1 elements out of the n - 1 remaining in the
population, \binom{n-1}{k-1}, to complete the sample, and shuffle (k!)

Alternatively, let us go to the complement

E = "the element is included in the ordered sample"
Ē = "the element is not included in the ordered sample"

P(E) = 1 - P(Ē) = 1 - (n-1)_k/(n)_k = 1 - (n-k)/n = k/n
Problem

What is the probability for any fixed element of a population of
size n to be included at least once in a random ordered sample of
size k?

"At least once" implies the sampling is with replacement

E = "the element is included in the ordered sample at least once"
Ē = "the element is not included in the ordered sample"

P(E) = 1 - P(Ē) = 1 - (n-1)^k/n^k = 1 - (1 - 1/n)^k
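The closed form matches a brute-force enumeration over all ordered samples; a minimal sketch for small n and k:

```python
from itertools import product

def p_at_least_once(n, k):
    """P(a fixed element appears at least once in an ordered k-sample with replacement)."""
    return 1 - (1 - 1 / n) ** k

# brute-force check for n = 4, k = 3: enumerate all 4**3 ordered samples
# and count those containing the fixed element 1
n, k = 4, 3
hits = sum(1 in sample for sample in product(range(1, n + 1), repeat=k))
assert abs(hits / n**k - p_at_least_once(n, k)) < 1e-12
```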
Standard Ways of Counting and Basic Probabilistic Models

We are going to see the second standard way of counting

I Sampling Model

I Allocation Model

We will also establish a mapping between the two models

Occupancy Models

k objects (balls/particles) are placed in n cells (boxes/urns)

I In general, cells can be distinguishable or indistinguishable, but
  we restrict ourselves to the distinguishable case in this course

I Numbers of objects in the cells are called occupation numbers:
  k_i = number of objects in the i-th cell (Σ_{i=1}^{n} k_i = k)
  We consider 2 cases
  - cells can contain at most one object (k_i = 0, 1, ∀i)
  - cells can contain any number of objects (k_i = 0, 1, ..., k, ∀i)

I Objects can be either distinguishable or indistinguishable (2
  possibilities)

I The order of the objects within a cell is immaterial in this
  course. We only care about which objects are in which cell, not
  the order of the objects in a cell (if the objects are identical,
  this is always the case)
Sampling Models ⇔ Occupancy Models. A Dictionary

A dictionary to translate from the sampling models to the
occupancy models and the other way round

population with n individuals      n distinguishable cells/boxes

sample of size k                   k particles/balls/objects

sampling with replacement          occupation numbers
                                   k_i = 0, 1, ..., k,  i = 1, ..., n

sampling without replacement       occupation numbers
                                   k_i = 0, 1,  i = 1, ..., n

ordered samples                    distinguishable objects
(position in the sample)           (label of object)

unordered samples                  indistinguishable objects
Indistinguishable objects in n (distinguishable) cells

The number of possible placements of k indistinguishable objects
in n (distinguishable) cells that can contain any number of objects
(the total number of ways of sampling with replacement an
unordered sample of size k from a population of size n) is

\binom{n-1+k}{k}

Proof. The n cells can be thought of as being delimited by n + 1
separators. Representing a separator with a bar | and an object
with ⋆, a general assignment of k objects is represented by a
symbol that begins and ends with a bar
(e.g., for k = 6 and n = 5, |⋆⋆⋆|⋆|||⋆⋆| is the occupation model
defined by (k_1, ..., k_n) = (3, 1, 0, 0, 2)).
The internal n - 1 separators and the k balls can appear in any
arbitrary order, thus the number of distinguishable placements
equals the number of ways of choosing the k places occupied by
the balls out of the n - 1 + k internal positions
Indistinguishable objects in n (distinguishable) cells

One could have obtained the same result algebraically, by
determining the number of ordered sets of n non-negative integers
that sum to k

(k_1, ..., k_n),  k_i ≥ 0
k_1 + k_2 + ... + k_n = k

because with indistinguishable objects, two distributions are
distinguishable only if the corresponding n-tuples are not identical

The occupancy numbers are ordered since the cells are
distinguishable. [If we were to assume the cells to be
indistinguishable, the order among the occupancy numbers would
be disregarded]
Summary

Drawing a sample of size k from an urn containing n balls
a_i label of the ball selected at i-th step, a_i = 1, ..., n

                          ordered sample (a_1, ..., a_k)    unordered sample [a_1, ..., a_k]

with replacement          n^k                               \binom{n+k-1}{k}

without replacement       (n)_k = n!/(n-k)!                 \binom{n}{k}
(a_i ≠ a_j if i ≠ j,
 n ≥ k)

Allocating/distributing k particles (objects) into n distinguishable cells (boxes)
(k_1, ..., k_n), k_i number of objects assigned to the i-th cell, Σ_i k_i = k

                          distinguishable objects           indistinguishable objects

without exclusion         n^k                               \binom{n+k-1}{k}
(k_i = 0, ..., k)

with exclusion            (n)_k = n!/(n-k)!                 \binom{n}{k}
(k_i = 0, 1 ⇒ n ≥ k)

order within the cell immaterial
Exercise

Let f(x_1, ..., x_12) be an analytic function of 12 variables. How
many partial derivatives of order 5 exist?

Partial derivatives do not depend on the order of differentiations

n = 12, k = 5

\binom{12-1+5}{5} = 4368

Think of an urn with balls labelled x_1, ..., x_12. Drawing the
unordered sample [x_3, x_2, x_6, x_6, x_10] would correspond to

∂^5 f(x_1, x_2, ..., x_12) / (∂x_2 ∂x_3 ∂x_6^2 ∂x_10)

or think of 12 boxes and k = 5 identical balls to be placed, with
each ball allocation in a box determining with respect to which
variable one derivative is taken
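Since an order-5 partial derivative is exactly a multiset of 5 variables chosen from the 12, the count can be verified by direct enumeration:

```python
from math import comb
from itertools import combinations_with_replacement

# an order-5 partial derivative = a multiset of 5 variables out of 12
derivatives = sum(1 for _ in combinations_with_replacement(range(12), 5))
assert derivatives == comb(12 - 1 + 5, 5)
```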
Exercise-Summer 2017

Fifteen railway trucks are to be arranged into five different sidings.
How many different arrangements of the trucks into the sidings are
possible so that exactly one siding is empty [arrangements are
considered different only based on the number of trucks in the
sidings]?

n = 5 (boxes = sidings), k = 15

The following counts all possible arrangements (including those
with more than one siding unoccupied by a truck), so it is larger
than the number we wish to determine

\binom{n-1+k}{k} = \binom{5-1+15}{15} = 3876   (wrong answer)
Exercise-Summer 2017-Continued

The number of arrangements we are after is instead obtained by
breaking down the problem:

I which siding is empty? choose the empty siding:
  \binom{5}{1} available choices

I place one truck in each non-empty siding:
  there is one way to do this, as the trucks are indistinguishable
  in this problem

I place the unassigned trucks k̃ = k - 4 = 11 into the ñ = 5 - 1 = 4
  non-empty sidings in all possible ways:

  \binom{ñ-1+k̃}{k̃} = \binom{4-1+11}{11}

I apply the first rule of counting

  \binom{5}{1} · \binom{4-1+11}{11} = 1820
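The answer can be double-checked by brute force over all occupation vectors, keeping only those with exactly one empty siding:

```python
from math import comb
from itertools import product

# occupation vectors (k1,...,k5), ki >= 0, summing to 15, with exactly one zero
count = sum(
    1
    for occ in product(range(16), repeat=5)
    if sum(occ) == 15 and occ.count(0) == 1
)
assert count == comb(5, 1) * comb(4 - 1 + 11, 11)
```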
Multinomial Coefficients

Consider k distinguishable objects (from a population). In how
many possible ways can we distribute the objects to n groups of
given sizes k_1, ..., k_n, with k_i > 0, k_1 + ... + k_n = k (assume first
the groups to be distinct, e.g. numbered)?
Equivalently, in how many ways can a population of k elements be
divided into n ordered parts of which the first contains k_1 elements,
the second k_2, etc.? [The order within the group is ignored]

Choose k_1 objects out of k (with no attention to the order):

\binom{k}{k_1}

Choose k_2 objects out of the remaining elements:

\binom{k - k_1}{k_2}
Multinomial Coefficients

After forming the (n-1)-st group there remain
k - k_1 - ... - k_{n-1} = k_n elements that form the last group
The total number of ways is thus (multiplication rule)

\binom{k}{k_1} · \binom{k-k_1}{k_2} ··· \binom{k-k_1-...-k_{n-2}}{k_{n-1}} · \binom{k_n}{k_n}
  = k!/(k_1! k_2! ··· k_n!)
  = k!/Π_{i=1}^{n} k_i!
  ≡ \binom{k}{k_1, k_2, ..., k_n}
  = \binom{k_1 + k_2 + ... + k_n}{k_1, k_2, ..., k_n}

As a notation, above Π_{i=1}^{n} a_i = a_1 · a_2 ··· a_n
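The product formula translates directly into a small helper; a minimal sketch (the function name `multinomial` is ours, not part of the slides):

```python
from math import factorial

def multinomial(*ks):
    """k! / (k1! k2! ... kn!) with k = k1 + ... + kn."""
    out = factorial(sum(ks))
    for ki in ks:
        out //= factorial(ki)   # each ki! divides the running product exactly
    return out

assert multinomial(2, 1, 1) == 12   # splitting 4 objects into groups of sizes (2,1,1)
```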


Problems

I What is the number of ways in which a class of 11 students
  can be split into 3 subgroups of sizes 4, 4, 3, which will be
  assigned different problems to work on?

  \binom{11}{4, 4, 3} = 11!/(4! 4! 3!) = 11550

  This is what the multinomial coefficient counts. The groups
  are distinct (labelled by the problem), the objects are distinct
  (students)

I What if the groups are instead indistinguishable (the students
  are assigned the same problem)?

  \binom{11}{4, 4, 3} · 1/2! = 11!/(4! 4! 3!) · 1/2 = 5775
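Note that 11!/(4! 4! 3!) = 11550, and halving it for the two interchangeable groups of size 4 gives 5775; a quick check:

```python
from math import factorial

distinct_groups = factorial(11) // (factorial(4) * factorial(4) * factorial(3))
# the two groups of size 4 are interchangeable when all get the same problem
identical_problem = distinct_groups // 2
assert (distinct_groups, identical_problem) == (11550, 5775)
```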
Multinomial Coefficients

Suppose that we want to split 4 distinguishable objects {1, 2, 3, 4}
into three groups of sizes (2, 1, 1). The multinomial coefficient
counts the following 4!/2! = 12 groupings

(1,2|3|4) (1,3|2|4) (1,4|2|3) (2,3|1|4) (2,4|1|3) (3,4|1|2)
(1,2|4|3) (1,3|4|2) (1,4|3|2) (2,3|4|1) (2,4|3|1) (3,4|2|1)

However, if the groups (of the same size) are indistinguishable, then
the second line is the same as the first, so the total number of
splittings is 12/2! = 6
Multinomial Coefficients

To sum up, the multinomial coefficient

\binom{k}{k_1, k_2, ..., k_n},    Σ_{i=1}^{n} k_i = k

counts the number of possible ways of grouping k distinguishable
objects into n distinguishable groups of given sizes k_1, ..., k_n, such
that the order of the objects inside each group is irrelevant

(Put differently, it counts the number of possible
ordered partitions of k distinguishable objects into n disjoint
subsets of sizes k_1, ..., k_n)
Multinomial Coefficients

If the groups of given sizes k_1, ..., k_n are indistinguishable, the total
number of ways of grouping k distinguishable objects is instead

\binom{k}{k_1, k_2, ..., k_n} · 1/(b_1! b_2! ··· b_k!)

where b_i ≥ 0 is the number of groups of size i (k = Σ_{i=1}^{k} i b_i)

- the number of groups of the same size is the only thing that
  matters, since the groups are indistinguishable
- two groups of different sizes are distinguishable in view of their
  different sizes

Of course, if only some groups are indistinguishable the above
expression should be modified accordingly
Multinomial Coefficients

Three observations:

1 If we allow k_i to be zero (the multinomial coefficient does
  make sense since 0! = 1) and we sum

  Σ_{k_1,...,k_n = 0, ..., k; k_1+...+k_n = k}  k!/(k_1! ··· k_n!)

  we obtain exactly the number which we expect to obtain

  which number? n^k
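This is the multinomial theorem with all a_i = 1; a brute-force check for n = 3, k = 4:

```python
from math import factorial
from itertools import product

n, k = 3, 4
total = 0
for ks in product(range(k + 1), repeat=n):   # all (k1,...,kn) with 0 <= ki <= k
    if sum(ks) == k:
        term = factorial(k)
        for ki in ks:
            term //= factorial(ki)
        total += term
assert total == n**k    # multinomial theorem with a_1 = ... = a_n = 1
```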
Multinomial Coefficients

2 the multinomial coefficient

  k!/(k_1! k_2! ··· k_n!)

  also counts the number of different k-letter words from the
  n-letter alphabet (α_1, ..., α_n) in which the letter α_1 appears
  exactly k_1 times, ..., and α_n appears k_n times

  The letters of the alphabet are the (distinguishable) groups
  and the objects are the positions of the characters in the string
  (e.g. "the second character in the string is the letter A" means
  that the group A contains the object labelled 2)

  (Re-examine the previous exercise on the number of distinct
  words that can be formed with the letters of BAALBEK)
Multinomial Coefficients

3 Notice also that when we obtained the multinomial coefficient
  we said the k_1 objects go to the first group, k_2 to the second,
  etc.

  If this is not specified, one needs first to consider the
  combinatorics associated with selecting the groups (which
  must thus be distinguishable). For example,

  in how many ways can 20 different objects be divided among
  4 people so that three of them have 6 objects and one has 2?

  \binom{4}{3} · 20!/(6! 6! 6! 2!)

  where the first factor is associated with choosing which 3 of
  the 4 people will have 6 objects (we need to choose together
  the groups, here people, that contain the same number of
  objects)
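The count can be cross-checked by building the same split from successive binomial choices:

```python
from math import comb, factorial

# multinomial count 20!/(6!6!6!2!), times C(4,3) ways to pick who gets 6
ways = comb(4, 3) * factorial(20) // (factorial(6) ** 3 * factorial(2))

# cross-check: hand out the objects person by person
assert ways == comb(4, 3) * comb(20, 6) * comb(14, 6) * comb(8, 6) * comb(2, 2)
```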
Birthday Problem

What is the probability that in a classroom with t students at least
2 have the same birthday?

Assume that there are 365 days in the year, and that the
probability that a student has the birthday on a given day is the
same for all days of the year

Let us try to map the problem in terms of allocation of balls to
boxes.
What are the boxes? What are the balls?

Hint: use the complement set

E = "at least two people have the same birthday"
Ē = "no two people have the same birthday"
Birthday Problem

E = "at least two people have the same birthday"
Ē = "no two people have the same birthday"

P(E) = 1 - P(Ē)
     = 1 - (365)_t/365^t = 1 - \binom{365}{t} t!/365^t
     = 1 - (365/365) · (364/365) ··· ((365 - t + 1)/365)

n = 365 boxes (each labelled by a day of the year) and t
(distinguishable) balls (people are distinguishable)
|Ω| = 365^t and |Ē| = (365)_t since the t distinguishable balls may
not fall in the same box
Equivalently, to compute |Ē|, choose the t boxes that should be
occupied, \binom{365}{t}, then place the t balls, all possible ways of placing
the t balls in the t boxes being (t)_t = t!
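The formula is easy to evaluate numerically; a minimal sketch (the function name is ours, `math.perm` requires Python 3.8+):

```python
from math import perm

def p_shared_birthday(t, days=365):
    """P(at least two of t people share a birthday), uniform independent birthdays."""
    return 1 - perm(days, t) / days**t

# t = 23 is the first classroom size where the probability exceeds 1/2
assert p_shared_birthday(23) > 0.5 > p_shared_birthday(22)
```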
Birthday Problem

Notice that already when t = 23, the probability of having at least
two people with the same birthday is greater than 1/2

Since we are interested in the event that no two people have the
same birthday, but we are not really interested in which people,
why can't we consider the people indistinguishable?
If we did so, the points in the corresponding sample space would
not have the same probability, so we could not use the formula
based on the counting of the outcomes to compute the probability.
Namely, we are assuming that 1) a person can be born with the
same probability on any day of the year, i.e. 1/365; 2) this probability
does not depend on another person's birthday (the events are
independent; see later for a more precise description)
This implies the denominator must be 365^t = n^t (distinguishable
balls)
Birthday Problem

Consider a simpler case: t = 2 people and a year with n = 3 days

Considering the people as distinguishable (p_1, p_2), the sample space
has the following n^k = 3^2 = 9 outcomes

(p_1 p_2 |       |       )   (       | p_1 p_2 |       )   (       |       | p_1 p_2)
(p_1     | p_2   |       )   (p_1    |         | p_2   )   (       | p_1   | p_2    )
(p_2     | p_1   |       )   (p_2    |         | p_1   )   (       | p_2   | p_1    )

which are equiprobable, each with probability 1/3 · 1/3 = 1/9
(indeed, take a ball and throw it in a box, and assume it falls in
any box with the same probability, which is 1/3)
Birthday Problem

If we considered people as indistinguishable (p), one could consider
the corresponding sample space Ω̂ to be (since we can ignore the
labels)

(p p |     |    )   (    | p p |    )   (    |     | p p)
(p   | p   |    )   (p   |     | p  )   (    | p   | p  )

whose \binom{n-1+t}{t} = 6 elements do not all have the same probability:
the states in the second line have twice the probability of those in
the first line (2/9 vs 1/9)

This is because we are working under the assumption that a ball
can fall in a box with the same probability 1/n, with n the number
of boxes, and where a ball lands does not affect where another ball
will land.

When this is the case, always label the balls, even if the problem
says the balls are identical
Sample spaces

Notice however that if we change assumptions (e.g. we do not
assume that each object can go with the same probability into any
cell independently of the others), a sample space such as Ω̂ could
then also be equiprobable

See Example 1.8.6 in the book of DeGroot-Schervish (hand-out)
Problem

A collection of indistinguishable particles may occupy a set of
available discrete states, at thermodynamic equilibrium. More than
one particle may occupy any given state and all allocations are
equally probable. Consider 22 such particles (bosons) and 8 states.
Compute the probability that the number of states that are not
occupied is 3

We can think of the n = 8 states as being the labelled boxes, and
of the k = 22 particles as being the indistinguishable balls. States
may be occupied by any number of particles

Thus, the total number of arrangements is |Ω| = \binom{n+k-1}{k} = \binom{29}{22}

Since all arrangements/microstates are equiprobable (key
assumption), we can compute the probability of the event of
interest as

P(E) = |E|/|Ω|
Problem. Continued

E is the set of arrangements where three states are empty

|E| = \binom{8}{3} \binom{21}{17}

Choose which states are empty: \binom{8}{3} possible ways

Fill the other ñ = 8 - 3 = 5 states with one particle in each state
(this can be done in one way only, since the particles are
indistinguishable)

Place the remaining k̃ = 22 - 5 = 17 particles: \binom{ñ-1+k̃}{k̃} ways

P(E) = |E|/|Ω| = \binom{8}{3} \binom{21}{17} / \binom{29}{22} ≈ 0.2147
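A quick numerical evaluation of this Bose-Einstein count (assuming Python 3.8+ for `math.comb`):

```python
from math import comb

omega = comb(8 + 22 - 1, 22)                    # C(29, 22): all placements
favourable = comb(8, 3) * comb(5 - 1 + 17, 17)  # choose empty triple, then C(21, 17)
p = favourable / omega
assert 0.21 < p < 0.22
```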
Caution

Notice that

\binom{n+k-1}{k}

counts all possible ways of allocating the k identical particles to
the n states, including those that leave states empty, which is why
we first filled the states that must be occupied with one particle each

Notice also that this way of approaching the problem would not
work if the particles were distinguishable
Exercise

Consider 4 balls and three labelled boxes. If each ball has the same
probability of falling in any box and of doing so independently of
the others, what is the probability that exactly one box is left empty?

The total number of ways of arranging k = 4 balls in n = 3 boxes is
|Ω| = 3^4

That the balls are identical or not does not change anything in this
case, since they all have the same probability 1/3 of falling into any
box and they do so independently of each other, each outcome
having probability (1/3)^4 (see the previous exercise, however, for
when the assumptions are different)

For the numerator: choose the empty box, \binom{3}{1}; place all the k = 4
balls in the other ñ = 2 boxes (there are 2^4 possible ways) but
remove the two configurations where the 4 balls are all in one of
these two boxes (otherwise there would be an additional empty box)

\binom{3}{1} (2^4 - 2) / 3^4 = 42/81
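With only 3^4 = 81 equiprobable outcomes, the answer can be verified exhaustively:

```python
from itertools import product

# each of 4 labelled balls lands independently and uniformly in one of 3 boxes
outcomes = list(product(range(3), repeat=4))      # 3**4 equiprobable outcomes
one_empty = sum(
    1 for w in outcomes if sum(box not in w for box in range(3)) == 1
)
assert (one_empty, len(outcomes)) == (42, 81)
```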
Exercise. Continued

The term 2^4 - 2 can also be seen by applying the Poincaré
identities
Let A_i be the event "the i-th box is empty"

|Ā_1 ∩ Ā_2|

is what we want. The indices of the boxes are irrelevant since we
have already counted how many ways there are to choose the
non-empty boxes

|Ā_1 ∩ Ā_2| = |(A_1 ∪ A_2)^c| = |Ω| - |A_1 ∪ A_2|
            = |Ω| - |A_1| - |A_2| + |A_1 ∩ A_2|
            = 2^4 - 1 - 1 + 0 = 2^4 - 2
Exercise. Continued

Equivalently, count all possible assignments. There are only two
possibilities:
- a box is empty, a box contains 3 balls and a box contains 1 ball
- a box is empty, and the remaining two boxes contain two balls each

For all these cases, choose first which box contains how many
(choosing together boxes that contain the same number of balls),
then assign the balls to the boxes (multinomial coefficient)

[ \binom{3}{1}\binom{2}{1} 4!/(3! 1!) + \binom{3}{1}\binom{2}{2} 4!/(2! 2!) ] / 3^4 = 42/81

If you try to approach the problem as we did the previous problem,
placing one ball in each box that should be filled, and then
counting all possible ways to place the others, you would have
over-counted
Caution

Care should be taken when considering sample spaces and the
probabilities of their outcomes

It will become clearer with practice, but the following two slides
(taken from two textbooks) should give you further details on how
to assign probabilities to sample spaces
Ordered versus Unordered Samples

Sometimes the same collection of elements in different orders are treated as different samples, and sometimes the
same elements in different orders are treated as the same sample. "In general, how can one tell which is the correct
way to count in a given problem? Sometimes, the problem description will make it clear which is needed. For
example, if we are asked to find the probability that the items in a sample arrive in a specified order, then we
cannot even specify the event of interest unless we treat different arrangements of the same items as different
outcomes. However, there are cases in which the problem description does not make it clear whether or not one
must count the same elements in different orders as different outcomes. Indeed, there are some problems that can
be solved correctly both ways... In general, this is the principle that should guide the choice of counting method. If
we have the choice between whether or not to count the same elements in different orders as different outcomes,
then we need to make our choice and be consistent throughout the problem. If we count the same elements in
different orders as different outcomes when counting the outcomes in the sample space" Ω, "we must do the same
when counting the elements of the event E of interest. If we do not count them as different outcomes when
counting Ω, we should not count them as different when counting E" (DeGroot)
Concerning ordering and sampling in practice

Sometimes "one feels intuitively that the order within the sample should be irrelevant, and the beginner is therefore
prone to think of samples as not being ordered. But conclusions from a sample are possible only on the basis of
certain probabilistic assumptions, and for these it is necessary to have an appropriate model for the conceptual
experiment of obtaining a sample. Now such an experiment obviously involves choices that can be distinguished
from each other, meaning choices that are labelled in some way. For theoretical purposes it is simplest to use the
integers as labels, and this amounts to ordering the sample... In other words, even though the order within the
samples may be ultimately disregarded, the conceptual experiment involves ordered samples, and ... this affects the
appropriate assignment of probabilities." (Feller)


Problem

A deck of 52 cards (13 values A, 2, ..., 10, J, Q, K, for each of four
suits ♦, ♥, ♣, ♠) is shuffled thoroughly and the cards are then
distributed among four players so that each player receives 13
cards. What is the probability that each player will receive one ace?

The k = 52 cards are distinguishable. We can consider the n = 52
positions in which they are dealt as distinguishable (boxes)
|Ω| = (52)_52 = 52! (all possible shufflings)
|E| = 13^4 · 4! · 48! (each ace must be received by a player among
his/her 13 cards, as first, second, ..., or 13th card: 13^4; shuffle the
aces: 4!; and the non-aces: 48!)

p(E) = 13^4 · 4! · 48! / 52!
Problem (continued). Solution 2

Now we can compute the same probability considering as sample
space Ω the positions in the 52-card deck that are occupied by the
four aces, not counting different arrangements of the four aces
in those four positions as different outcomes

$$|\Omega| = \binom{52}{4}$$

Of all these arrangements, we want to consider only those in which
one ace is among the 13 positions dealt to each player. Exactly
13^4 of the combinations in Ω satisfy the requirement we seek, thus

$$p(E) = \frac{13^4}{\binom{52}{4}}$$
Problem (continued). Solution 3

Consider instead as sample space Ω the assignments of the 52
cards (distinguishable) to 4 (distinguishable) groups of equal size

$$|\Omega| = \binom{52}{13,13,13,13} = \frac{52!}{13!^4}$$

(multinomial coefficient)

The assignments E we are interested in are those in which each
group contains one ace and 12 non-aces

$$|E| = \binom{4}{1}\binom{48}{12}\binom{3}{1}\binom{36}{12}\binom{2}{1}\binom{24}{12}\binom{1}{1}\binom{12}{12} = \frac{4!\,48!}{12!^4}$$

(choose an ace and assign it to the first player, complete the hand
choosing 12 cards from the 48 cards that are not aces, then do the
same for the other players using the remaining cards)

$$p(E) = \frac{4!\,48!\,13^4}{52!}$$
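The three solutions can be cross-checked numerically; here is a quick sketch in Python (standard library only; the variable names are ours):

```python
from math import comb, factorial

# Solution 1: ordered deals; |E| = 13^4 * 4! * 48!, |Omega| = 52!
p1 = 13**4 * factorial(4) * factorial(48) / factorial(52)

# Solution 2: unordered positions of the four aces; |Omega| = C(52, 4)
p2 = 13**4 / comb(52, 4)

# Solution 3: multinomial split of the deck into four hands of 13
omega = factorial(52) // factorial(13)**4
e = (comb(4, 1) * comb(48, 12) * comb(3, 1) * comb(36, 12)
     * comb(2, 1) * comb(24, 12) * comb(1, 1) * comb(12, 12))
p3 = e / omega

print(p1, p2, p3)  # all three agree, about 0.105
```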
Hypergeometric

Consider an urn that contains N balls, K of which red and N − K
non-red. n balls are drawn (one after another without replacement
or at once) at random from the urn. What is the probability p_k
that k red balls are drawn?

The total number of ways of choosing n balls out of N is: $\binom{N}{n}$

The number of ways of choosing k red balls and n − k non-red is
$\binom{K}{k} \cdot \binom{N-K}{n-k}$

Thus, the probability of interest is

$$p_k = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

with the requirement max(0, n − (N − K)) ≤ k ≤ min(K, n);
otherwise p_k = 0

p_k defines the hyper-geometric distribution (chapter 3)


Hypergeometric

Notice that all the balls in the urn are implicitly labelled in the
derivation above. The equiprobable sample space we have
considered in the previous page is the set of unordered samples of
size n from an urn with labelled balls

Suppose N = 5, K = 3 (the urn contains 5 balls, three of which
red) and we are interested in the probability of drawing exactly
k = 1 red ball in a sample of size n = 2
The sample space that we are considering contains the following
$\binom{N}{n} = \binom{5}{2} = 10$ elements

[R1 B1] [R2 B1] [R3 B1] [R1 B2] [R2 B2] [R3 B2]
[R1 R2] [R1 R3] [R2 R3] [B1 B2]

and the number of outcomes favorable to our event is 6 (those on
the first line), thus the probability $6/10 = \binom{3}{1}\binom{2}{1}\big/\binom{5}{2}$
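The general formula is easy to wrap in a small helper; a sketch (the function name is ours), checked on the N = 5, K = 3 example above:

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    """P(exactly k red) when drawing n balls without replacement
    from an urn with K red balls among N."""
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Urn with N=5 balls, K=3 red; draw n=2 balls, ask for k=1 red
print(hypergeom_pmf(5, 3, 2, 1))  # 6/10 = 0.6
```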
Hypergeometric

Nothing changes in the computation of the probability if we
consider as sample space the set of ordered samples
Both the sample space and the set of interest are now two times
larger than above (for each element of the previous example we now
have 2! ordered samples)
Indeed, in the general case, this can be seen from the equation

$$\frac{n!\,\binom{K}{k}\binom{N-K}{n-k}}{(N)_n} = \frac{\binom{n}{k}(K)_k(N-K)_{n-k}}{(N)_n} = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

|Ω| = (N)_n counts all ordered n-tuples from a population of size N,
and the numerators in the first and second terms count all ordered
samples of size n with k red balls and n − k non-red

central term: count all ordered subsets from each sub-population,
then count all possible orderings of the n balls, keeping at the
same time fixed the balls of each color, since their ordering has
already been taken into account: $n!/((n-k)!\,k!) = \binom{n}{k}$

left-hand side: choose the k red balls and the n − k non-red balls, and
count all possible orderings (n!)
Hypergeometric

You can look at the problem differently, since

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \frac{\binom{n}{k}\binom{N-n}{K-k}}{\binom{N}{K}}$$

Consider as sample space the set of all distinct words from a set of
K R's and N − K R̄'s (as if you were drawing all balls from the
urn and placing them one after another in a row, with the balls
indistinguishable other than for their colors, R, R̄), thus (as in the
BAALBEK example)

$$|\Omega| = \frac{N!}{K!\,(N-K)!} = \binom{N}{K}$$

We are interested in the subset of such words whose first n letters
are k R's and n − k R̄'s (our sample) and whose last N − n letters
are the remaining letters, K − k of which are R's. The total
number of such words is indeed $\binom{n}{k} \cdot \binom{N-n}{K-k}$

(There is nothing special about our sample being the first n letters,
however)
Exercise

A bakery makes 80 loaves of bread daily. Ten of them are
underweight. An inspector weighs 5 loaves at random. What is the
probability that an underweight loaf will be discovered?

"an underweight loaf" here means at least one, not exactly one

$$P(\text{at least 1 underweight}) = 1 - P(\text{no underweight loaves}) = 1 - \frac{\binom{10}{0}\binom{70}{5}}{\binom{80}{5}} = 1 - \frac{\binom{70}{5}}{\binom{80}{5}} \approx 0.4965$$
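A one-line numerical check of the complement computation (a sketch, standard library only):

```python
from math import comb

# P(at least one underweight) = 1 - P(none of the 5 sampled is underweight)
p = 1 - comb(70, 5) / comb(80, 5)
print(round(p, 4))  # about 0.4965
```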
Conditional Probability

Consider an urn that contains B blue and R red balls

Let us draw a ball from it. The probability that the ball is blue is

$$P(\text{1st blue}) = \frac{B}{B+R}$$

Now, conditional probability answers questions such as this:
what is the probability of drawing a red ball if (given that) the first
ball was blue?
E.g., in the case of sampling without replacement we have:

$$P(\text{2nd red}\,|\,\text{1st blue}) = \frac{R}{B+R-1}, \qquad P(\text{2nd blue}\,|\,\text{1st blue}) = \frac{B-1}{B+R-1}$$

Notice P(2nd red|1st blue) + P(2nd blue|1st blue) = 1


Conditional Probability

A, B ⊆ Ω two events, with P(B) > 0
The conditional probability of A given B is the probability that A
occurs, given that B has occurred

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

One can verify that if we condition on Ω, we get the usual


probability
P(A|Ω) = P(A)
Also

P(B|B) = 1
P(∅|B) = 0

The conditional probability P(·|B) has the same properties on the


space ΩB = Ω ∩ B = B as the original probability P(·) has on Ω
Conditional Probability

For a sequence of disjoint events A_i, i ∈ ℕ, A_i ∩ A_j = ∅ when i ≠ j,

$$P\Big(\bigcup_i A_i \,\Big|\, B\Big) = \sum_i P(A_i|B)$$

which is the second axiom of probability. Indeed,

$$P\Big(\bigcup_i A_i \,\Big|\, B\Big) = P\Big(\Big(\bigcup_i A_i\Big) \cap B\Big)\Big/P(B) = P\Big(\bigcup_i (A_i \cap B)\Big)\Big/P(B) = \sum_i P(A_i \cap B)/P(B) = \sum_i P(A_i|B)$$

since (A_i ∩ B) ∩ (A_j ∩ B) = A_i ∩ A_j ∩ B = ∅ ∩ B = ∅


Conditional Probability

In particular,

P(A|B) + P(Ā|B) = 1

However, in general

P(A|B) + P(A|B̄) ≠ 1

Furthermore,

P(A|B) = 1 if A ⊇ B

and thus (first axiom)

P(Ω|B) = P(Ω_B|B) = 1

Accordingly, since the axioms of probability hold upon
conditioning, all results we have obtained for the unconditional
probability hold true when we condition on the same event. E.g.,

P(A ∪ C|B) = P(A|B) + P(C|B) − P(A ∩ C|B)


Exercise
A fair coin is tossed twice. What is the probability that both tosses
land on heads given that
1 the first toss lands head up
2 at least one of the tosses lands head up

Since the coin is fair, all the points in the sample space

Ω = {(H, H), (T , T ), (H, T ), (T , H)}

have the same probability. The event we need to consider is

A = {(H, H)}

and those we need to condition on


1 B1 = {(H, T ), (H, H)}
2 B2 = {(H, T ), (T , H), (H, H)}
Exercise. Continued

Thus the resulting probabilities are different in the two cases

1.
$$P(A|B_1) = \frac{P(\{(H,H)\} \cap \{(H,T),(H,H)\})}{P(\{(H,T),(H,H)\})} = \frac{P((H,H))}{P((H,T)) + P((H,H))} = \frac{1/4}{1/4+1/4} = \frac{1}{2}$$

2.
$$P(A|B_2) = \frac{P(\{(H,H)\} \cap \{(H,T),(T,H),(H,H)\})}{P(\{(H,T),(T,H),(H,H)\})} = \frac{P((H,H))}{P((H,T)) + P((T,H)) + P((H,H))} = \frac{1/4}{1/4+1/4+1/4} = \frac{1}{3}$$
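Because the space is finite and uniform, both conditional probabilities can be computed by brute-force enumeration. A sketch (the helper names `prob` and `cond` are ours):

```python
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=2))   # 4 equally likely outcomes

def prob(event):
    """P(event) by counting outcomes in the uniform space."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond(a, b):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(lambda w: a(w) and b(w)) / prob(b)

both = lambda w: w == ("H", "H")
first_heads = lambda w: w[0] == "H"
at_least_one = lambda w: "H" in w

print(cond(both, first_heads))   # 1/2
print(cond(both, at_least_one))  # 1/3
```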
Interpreting the Conditional Probability

Since A = {(H,H)} ⊂ B_1 = {(H,T),(H,H)}, the sample space is
now B_1, thus we could have obtained P(A|B_1) directly by simple
counting (because the space is uniform)

$$P(A|B_1) = \frac{1}{2}$$

Similarly, A = {(H,H)} ⊂ B_2 = {(H,T),(H,H),(T,H)}

$$P(A|B_2) = \frac{1}{3}$$

That is, another way of reading

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

is that the probability of A given B is proportional to that part of
A that lies in B
Hypergeometric Using Conditional Probability

Consider again the probability of having k red balls in a sample of
size n drawn without replacement from an urn with N balls, K of
which are red

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \binom{n}{k}\frac{(K)_k(N-K)_{n-k}}{(N)_n}$$

The factor

$$\frac{(K)_k(N-K)_{n-k}}{(N)_n} = \frac{K}{N}\cdot\frac{K-1}{N-1}\cdots\frac{K-k+1}{N-k+1}\cdot\frac{N-K}{N-k}\cdots\frac{N-K-(n-k)+1}{N-k-(n-k)+1}$$

is a product of conditional probabilities: K/N is the probability
of drawing a red ball from the original urn, (K−1)/(N−1) is the
conditional probability of drawing a red ball given that the first
draw was a red ball, etc. (drawing, first, all the red balls and then all
the required non-red ones)
Hypergeometric Using Conditional Probability

Since we can rearrange the terms in both numerator and
denominator, the factor

$$\frac{(K)_k(N-K)_{n-k}}{(N)_n} = \frac{K(K-1)\cdots(N-K-(n-k)+1)}{N(N-1)\cdots(N-k-(n-k)+1)}$$

is the probability of drawing without replacement k red balls
and n − k non-red balls in any given order

The hyper-geometric distribution sums up all these $\binom{n}{k}$
contributions.
Indeed, $\binom{n}{k}$ is the number of ways of placing k red balls in n places
Hypergeometric Using Conditional Probability

For example, suppose the urn has N = 10 balls, K = 3 of which
are red and N − K = 7 non-red (without loss of generality, we can
color them all blue). We sample 3 balls without replacement. The
probability that one of the drawn balls is red is given by

$$\frac{\binom{3}{1}\binom{7}{2}}{\binom{10}{3}}$$

There are 3 possible patterns in a sample of size 3 with one red
ball

A_1 = RBB    A_2 = BRB    A_3 = BBR

The hypergeometric counts

$$p(A_1 \cup A_2 \cup A_3) = p(A_1) + p(A_2) + p(A_3) = \frac{3}{10}\cdot\frac{7}{9}\cdot\frac{6}{8} + \frac{7}{10}\cdot\frac{3}{9}\cdot\frac{6}{8} + \frac{7}{10}\cdot\frac{6}{9}\cdot\frac{3}{8} = 3\,\frac{3\cdot 7\cdot 6}{10\cdot 9\cdot 8} = \frac{\binom{3}{1}\binom{7}{2}}{\binom{10}{3}}$$

(the first equality is because the A_i are mutually exclusive)
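The sum over the three ordered patterns can be verified directly; a sketch (the function name is ours) that multiplies conditional probabilities along each pattern:

```python
from math import comb
from itertools import permutations

N, K = 10, 3  # 10 balls, 3 red (the rest blue)

def path_prob(pattern):
    """Probability of drawing exactly this color sequence
    without replacement (product of conditional probabilities)."""
    red, blue, total, p = K, N - K, N, 1.0
    for c in pattern:
        p *= (red if c == "R" else blue) / total
        red, blue = red - (c == "R"), blue - (c == "B")
        total -= 1
    return p

patterns = set(permutations("RBB"))  # RBB, BRB, BBR
total_p = sum(path_prob(pat) for pat in patterns)
print(total_p)  # 0.525, equal to C(3,1)*C(7,2)/C(10,3)
```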
Exercise

What is the probability of drawing k balls that are red from an urn
with N total balls, K of which red, if instead we sample with
replacement?

Look at the relation

$$\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}} = \binom{n}{k}\frac{(K)_k(N-K)_{n-k}}{(N)_n}$$

and consider how the cardinality of the space changes ...

$$\binom{n}{k}\frac{K^k(N-K)^{n-k}}{N^n} = \binom{n}{k}\left(\frac{K}{N}\right)^k\left(1 - \frac{K}{N}\right)^{n-k}$$

(binomial, see chapter 3)
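The binomial case can be checked the same way; a sketch (the function name is ours):

```python
from math import comb

def binom_pmf(n, k, p):
    """P(k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With replacement the draws are independent: N=10, K=3, so p = 3/10
print(binom_pmf(3, 1, 3 / 10))  # about 0.441
```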
Multiplication Rule

Now, from

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

the following multiplication rule follows

P(A ∩ B) = P(A|B)P(B)

the probability that A and B both occur equals the probability
that B occurs multiplied by the probability that A occurs given
that B has occurred

If P(A) > 0, by symmetry (commutativity of ∩), we can also write

P(A ∩ B) = P(B|A)P(A)
Exercise

Consider a well-shuffled regular deck of cards (13 cards
A, 2, . . . , 10, J, Q, K, for each of four suits ♣, ♠, ♦, ♥). Two cards
are dealt off the top of the deck. What is the probability that the
first card is A♦ and the second 2♠?

$$P(C_1 = A♦ \cap C_2 = 2♠) = P(C_2 = 2♠\,|\,C_1 = A♦)\,P(C_1 = A♦) = \frac{1}{51}\cdot\frac{1}{52}$$
Generalization to many events. Chain Rule

Consider n events A_i ⊆ Ω, i = 1, . . . , n. Then the following chain
rule holds

$$P\left(\bigcap_{i=1}^n A_i\right) = P(A_1)P(A_2|A_1)P(A_3|A_2 \cap A_1)\cdots P(A_n|A_{n-1} \cap \cdots \cap A_1)$$

or, in more compact form, using the notation AB ≡ A ∩ B,

$$P(A_1 \cdots A_n) = P(A_1)P(A_2|A_1)P(A_3|A_2 A_1)\cdots P(A_n|A_{n-1} \cdots A_1)$$

The order in which we condition on the events does not matter
(that is, which event we choose as the first to condition on; of
course, once a choice is made, we need to stick with it). In
practice, however, some choices make the computation much
easier than others
Law of Total Probability

Consider a complete set of disjoint events. Namely,

$$A_1, \ldots, A_n, \quad \bigcup_{i=1}^n A_i = \Omega, \quad A_i \cap A_j = \emptyset, \; i \neq j$$

exhaustive (at least one event will occur) and mutually exclusive.
Then the following law of total probability holds

$$P(B) = \sum_{i=1}^n P(B|A_i)P(A_i)$$

Proof. B can be written as the union of disjoint sets

$$B = B \cap \Omega = B \cap \left(\bigcup_{i=1}^n A_i\right) = \bigcup_{i=1}^n (B \cap A_i)$$

$$(B \cap A_i) \cap (B \cap A_j) = B \cap A_i \cap A_j = \emptyset$$

Hence

$$P(B) = \sum_{i=1}^n P(B \cap A_i) = \sum_{i=1}^n P(B|A_i)P(A_i)$$
Law of Total Probability

The above formula provides the basic tool to compute the


probabilities of complicated events in terms of conditional
probabilities
Exercise

Assume people can be born any day of the year with equal
probability. What is the probability that a randomly selected
person was born on January 1?

Consider the complete set of events Y_365, Y_366, where Y_k = "the
year has k days", and the event B = "birthday on January 1"

$$P(B) = P(B|Y_{365})P(Y_{365}) + P(B|Y_{366})P(Y_{366}) = \frac{1}{365}\cdot\frac{3}{4} + \frac{1}{366}\cdot\frac{1}{4}$$
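A sketch of the same computation with exact fractions:

```python
from fractions import Fraction

# Law of total probability over year length
# (3 of every 4 years have 365 days, 1 of 4 has 366)
p_b = Fraction(1, 365) * Fraction(3, 4) + Fraction(1, 366) * Fraction(1, 4)
print(p_b, float(p_b))  # about 0.00274
```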
Graphical Representation

It may be useful to approach a probabilistic problem graphically.


We can construct a probability tree starting from a node, and then
branching out, with each branch being associated with one of the
events of a complete set.

The starting point is a complete set for which the unconditioned


probabilities are known.
Assume A1 , A2 is such a set, namely

Ω = A1 ∪ A2 , A1 ∩ A2 = ∅ (A2 = Ā)

We grow branches from the root, with each branch corresponding


to one of the two events and we furnish the branches with the
unconditioned probabilities P(A1 ) and P(A2 ).
Graphical Representation

[Tree diagram: a root node with two branches, labelled P(A1) and P(A2)]
Graphical Representation
Suppose a new complete set of events is available, B1 , B2 , B3 , for
which the conditional probabilities P(Bi |Aj ) are known. Then we
can use each A-branch as the root of a new sub-tree, whose
branches correspond to the events Bi . These new branches will be
furnished with the conditional probabilities P(Bi |Aj ).
The probabilities of a given event that is the intersection of some
events, e.g. B2 ∩ A1 , can be read from the graph: one has to select
the unique path that contains the branches of the events of
interest, B2 and A1 , and multiply the probabilities one encounters,
because of the chain rule

P(A1 ∩ B2 ) = P(B2 |A1 )P(A1 )

Similarly, unconditional probabilities of events can be obtained


employing the law of total probability

P(B2 ) = P(B2 ∩ Ω) = P(B2 ∩ A1 ) + P(B2 ∩ A2 )


Graphical Representation of the Chain Rule

[Tree diagram: from the root, branches P(A1) and P(A2); from each A-branch,
sub-branches P(B1|Ai), P(B2|Ai), P(B3|Ai). Multiplying along a path gives,
e.g., P(B2|A1)P(A1) = P(B2 ∩ A1) and P(B2|A2)P(A2) = P(B2 ∩ A2)]
Graphical Representation

The process can be iterated if there is another complete set of


events {Ck }, for which the conditional probabilities given the
events previously represented are known. Indeed, each branch is
furnished with the conditional probability of the event it represents,
conditioned on the intersection of all the events that are met when
one follows the path that from that branch reaches the root.
Exercise 63

For customers purchasing a refrigerator at a particular appliance
store, consider the events
A = "refrigerator purchased was made in the USA"
B = "refrigerator had an ice-maker"
C = "customer purchased an extended warranty"
Assume the following unconditional and conditional probabilities:

P(A) = 0.75    P(B|A) = 0.9    P(B|Ā) = 0.8
P(C|A ∩ B) = 0.8    P(C|A ∩ B̄) = 0.6
P(C|Ā ∩ B) = 0.7    P(C|Ā ∩ B̄) = 0.3

Compute P(A ∩ B ∩ C), P(B ∩ C), P(C) using the tree or
otherwise

P(A ∩ B ∩ C) = 0.8 · 0.9 · 0.75 = 0.54
P(B ∩ C) = 0.54 + 0.14 = 0.68
P(C) = 0.68 + 0.045 + 0.015 = 0.74
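The tree can be encoded as a pair of conditional-probability tables; a sketch (the data layout is ours) that multiplies along each path (chain rule) and sums paths (law of total probability):

```python
pA = 0.75
pB_given_a = {True: 0.9, False: 0.8}                  # P(B | A), P(B | not A)
pC_given_ab = {(True, True): 0.8, (True, False): 0.6,
               (False, True): 0.7, (False, False): 0.3}

def path(a, b):
    """P(A=a, B=b, C) by the chain rule along one branch of the tree."""
    p_a = pA if a else 1 - pA
    p_b = pB_given_a[a] if b else 1 - pB_given_a[a]
    return p_a * p_b * pC_given_ab[(a, b)]

p_abc = path(True, True)                     # P(A ∩ B ∩ C)
p_bc = path(True, True) + path(False, True)  # P(B ∩ C)
p_c = sum(path(a, b) for a in (True, False) for b in (True, False))
print(p_abc, p_bc, p_c)  # 0.54, 0.68, 0.74 (up to float rounding)
```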
Problem

In an urn containing N balls, R are red. What is the probability
that the second draw is a red ball? Consider both sampling with
and without replacement

R_i = "the i-th draw is a red ball"

$$P(R_2) = P(R_2|R_1)P(R_1) + P(R_2|\bar{R}_1)P(\bar{R}_1) =
\begin{cases}
\frac{R}{N}\frac{R}{N} + \frac{R}{N}\frac{N-R}{N} = \frac{R}{N} & \text{with replacement}\\[4pt]
\frac{R-1}{N-1}\frac{R}{N} + \frac{R}{N-1}\frac{N-R}{N} = \frac{R}{N} & \text{without replacement}
\end{cases}$$

(of course, in the case of sampling with replacement, the urn has
the same composition for every draw)
Thus

P(R_2) = P(R_1)
General Case

What is the probability that the k-th draw is a red ball?
E_k = "the k-th draw is a red ball". We want to compute

P(E_k)

the unconditional probability.
We do not know if this red ball is the first, the second, or the k-th
red ball that has been drawn
The colors of the balls drawn before the k-th went unnoticed

If the sampling is with replacement, clearly

p(E_k) = R/N

since before the k-th draw the urn contains the same
balls as at the start

The same result holds true if the sampling is without replacement


General Case. Solution 1. Law of Total Probability

For i = 0, . . . , k − 1, let R_{i∈k−1} be the event "there are i red balls
among the first k − 1 drawn balls"
The events $\{R_{i\in k-1}\}_{i=0}^{k-1}$ form a complete set
(mutually exclusive: we may not have i red balls among the first
k − 1 and at the same time j ≠ i red balls; and exhaustive), so we
can determine p(E_k) (the probability that the k-th ball is red)
using the law of total probability

$$p(E_k) = \sum_{i=0}^{k-1} P(E_k|R_{i\in k-1}) \cdot P(R_{i\in k-1}) = \sum_{i=0}^{k-1} \frac{R-i}{N-(k-1)} \cdot \frac{\binom{R}{i}\binom{N-R}{k-1-i}}{\binom{N}{k-1}} = \frac{R}{N} \sum_{i=0}^{k-1} \frac{\binom{R-1}{i}\binom{N-R}{k-1-i}}{\binom{N-1}{k-1}} = \frac{R}{N}$$
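Solution 1 can be verified for concrete numbers; a sketch with exact fractions (the function name is ours), confirming that the answer does not depend on k:

```python
from fractions import Fraction
from math import comb

def p_kth_red(N, R, k):
    """P(k-th draw is red), no replacement: total probability over
    the number i of red balls among the first k-1 draws."""
    total = Fraction(0)
    for i in range(k):
        prior = Fraction(comb(R, i) * comb(N - R, k - 1 - i), comb(N, k - 1))
        total += Fraction(R - i, N - (k - 1)) * prior
    return total

print([p_kth_red(10, 3, k) for k in (1, 2, 5, 10)])  # all 3/10
```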
Exercise

The last identity is because of Vandermonde's identity

$$\sum_{i=0}^{r} \binom{m}{i}\binom{n}{r-i} = \binom{m+n}{r}$$

with r = k − 1, m = R − 1, n = N − R.

Exercise. Prove the identity considering indistinguishable balls in
distinguishable cells without exclusion, or, put differently, use
Vandermonde's identity to prove that the total number of ways of
placing k indistinguishable balls into n distinguishable cells that
can be occupied by any number of balls is

$$\binom{n-1+k}{k}$$
General Case. Solution 2

Another proof. Draw without replacement an ordered sample of k
balls out of a population of N balls. The total number of such
samples is |Ω| = (N)_k = N!/(N − k)!
Among all these samples we need to count only those that have a
red ball in the k-th position, which amount to

$$|E_k| = \binom{R}{1}(N-1)_{k-1} = \binom{R}{1}\frac{(N-1)!}{(N-1-(k-1))!} = R\,\frac{(N-1)!}{(N-k)!}$$

(choose one ball among the red ones, place it in the k-th position,
consider an ordered (k − 1)-size sample from the population of all
balls except the one placed in the k-th position)

$$p(E_k) = \frac{|E_k|}{|\Omega|} = \frac{R}{N}$$
General Case. Solution 3

Equivalently, consider the problem of placing N distinguishable
balls in N labelled cells with at most one ball per cell (and thus
in this case exactly one), the labels of the cells corresponding to
the position in which the balls are drawn in the original problem

[Diagram: a row of N cells labelled 1, . . . , k, . . . , N]

|Ω| = N!    |E_k| = R(N − 1)!

(choose a red ball, place it in the k-th cell, place all other
balls in all possible ways)

$$p(E_k) = \frac{|E_k|}{|\Omega|} = \frac{R}{N}$$
Bayes' Theorem

Consider a complete set of disjoint events:

$$A_1, \ldots, A_n, \quad \bigcup_{i=1}^n A_i = \Omega, \quad A_i \cap A_j = \emptyset, \; i \neq j$$

Bayes' Theorem

$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^n P(B|A_j)P(A_j)}$$

Proof

$$P(A_i|B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^n P(B|A_j)P(A_j)}$$

P(A_i): prior probabilities    P(A_i|B): posterior probabilities

Bayes' Theorem

We have different hypotheses A_i about some phenomenon that are
mutually exclusive and exhaustive (that is, one hypothesis is true
(exhaustive) and only one (mutually exclusive))
We do not know which hypothesis is true, but we have a priori
knowledge of how likely each is, P(A_i) (prior probability)
We carry out an experiment and employ its result B (the data) to
re-assess the probabilities of these hypotheses. After the
experiment (conditional on it), the probability of each hypothesis is

$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^n P(B|A_j)P(A_j)}, \quad i = 1, \ldots, n$$

P(B|A_i) indicates the probability that, given that the hypothesis A_i
is true, we obtain the result B from the experiment
Thus we use both the result of the experiment (data) and the prior
probabilities (our informed prior knowledge) to determine the
posterior probability
Exercise

We have 10 (biased) coins. When the i-th coin is tossed, the
probability of heads is i/10 (i = 1, . . . , 10). We randomly select a
coin, toss it, and get heads. What is the probability that it was the
coin numbered 5? (Spring 2015 Exam)

H_1 = "heads comes up" (our experiment's outcome, our data)
I_k = "the coin selected is the k-th"
P(H_1|I_k) indicates the probability that if the hypothesis I_k is true
(that is, if the k-th coin is the one selected and tossed), the coin
lands heads up.
P(I_k|H_1) is what we should compute, for k = 5

$$P(I_k|H_1) = \frac{P(H_1|I_k)P(I_k)}{P(H_1)} = \frac{P(H_1|I_k)P(I_k)}{\sum_{j=1}^{10} P(H_1|I_j)P(I_j)} = \frac{\frac{k}{10}\frac{1}{10}}{\sum_{j=1}^{10}\frac{j}{10}\frac{1}{10}} = \frac{k}{55}$$

Thus P(I_5|H_1) = 5/55 = 1/11
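The posterior computation as code, with exact fractions (the variable names are ours):

```python
from fractions import Fraction

coins = range(1, 11)
prior = {k: Fraction(1, 10) for k in coins}        # uniform choice of coin
like = {k: Fraction(k, 10) for k in coins}         # P(heads | coin k)

evidence = sum(like[k] * prior[k] for k in coins)  # P(heads) = 55/100
post = {k: like[k] * prior[k] / evidence for k in coins}  # Bayes' theorem

print(post[5])  # 1/11
```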


Exercise

Mr Jones from New York City and Mr Clark from Philadelphia
decide to meet up in NYC. Mr Clark is a very indecisive person and
thus flips a coin to make up his mind about going to NYC. If the
coin is heads, he goes to the station, and there he chooses which
of the six available trains to board by casting a die. Given that Mr
Jones sees that Mr Clark has not arrived in NYC with any of the
first five trains, what is the probability that the latter (Mr Clark)
will arrive with the 6th train? (Fall 2014 Exam)

T_6 = "Mr Clark took the 6th train"
N_5 = "Mr Clark was not on any of the first five trains"

$$P(N_5|T_6) = 1, \quad P(T_6) = \frac{1}{2}\cdot\frac{1}{6}, \quad P(N_5) = \frac{1}{2} + \frac{1}{2}\cdot\frac{1}{6}$$

(the last equality is because if Mr Clark was not on any of the first
5 trains, either he did not leave for NYC or he took the sixth train)

$$p(T_6|N_5) = \frac{p(N_5|T_6)\,p(T_6)}{p(N_5)} = \frac{1}{7}$$
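The same computation, spelled out with exact fractions:

```python
from fractions import Fraction

p_T6 = Fraction(1, 2) * Fraction(1, 6)       # heads, then the die shows 6
p_N5_given_T6 = Fraction(1)                  # train 6 implies not on first five
p_N5 = Fraction(1, 2) + p_T6                 # tails (stayed home) or train 6

p_T6_given_N5 = p_N5_given_T6 * p_T6 / p_N5  # Bayes' theorem
print(p_T6_given_N5)  # 1/7
```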
Independence

Two events A and B are independent iff

P(A ∩ B) = P(A)P(B)

Thus, since when P(A), P(B) > 0

P(A ∩ B) = P(B|A)P(A) = P(A|B)P(B)

we can, equivalently, state independence as

P(B|A) = P(B)    P(A|B) = P(A)

the fact that one event occurs does not modify the probability of
the other
Independence. General Case

A_1, . . . , A_n are mutually/statistically independent, or simply
independent, iff

$$P(A_{i_1} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k})$$

$$\forall k = 2, \ldots, n, \quad \forall\, 1 \le i_1 < \cdots < i_k \le n$$

That is, we have to verify the property for every possible subset of
A_1, . . . , A_n

Pairwise independence

$$P(A_i \cap A_j) = P(A_i)P(A_j), \quad \forall i \neq j$$

does not imply that the events are mutually independent


Independence. General Case

Example (seen before)

Ω = {ω1, ω2, ω3, ω4}    P(ωi) = 1/4, ∀i

The events

A1 = {ω1, ω2},  A2 = {ω1, ω3},  A3 = {ω1, ω4}

are pairwise independent

$$\frac{1}{4} = P(A_i \cap A_j) = P(A_i)P(A_j) = \frac{1}{2}\cdot\frac{1}{2}$$

but not independent, since

$$\frac{1}{4} = P(A_1 \cap A_2 \cap A_3) \neq P(A_1)P(A_2)P(A_3) = \left(\frac{1}{2}\right)^3$$
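The counterexample is small enough to check exhaustively; a sketch with exact fractions:

```python
from fractions import Fraction
from itertools import combinations

omega = {1, 2, 3, 4}                       # uniform space of 4 points
A = {1: {1, 2}, 2: {1, 3}, 3: {1, 4}}      # the three events

def prob(event):
    return Fraction(len(event), len(omega))

# Pairwise independent: P(Ai ∩ Aj) = P(Ai)P(Aj) for every pair
for i, j in combinations(A, 2):
    assert prob(A[i] & A[j]) == prob(A[i]) * prob(A[j])

# ... but not mutually independent: 1/4 vs 1/8
triple = prob(A[1] & A[2] & A[3])
print(triple, prob(A[1]) * prob(A[2]) * prob(A[3]))
```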
Independence

Saying that two events A and B are independent does not mean
that they have "nothing to do with each other" or that one does
not "influence" the other
Consider the experiment of rolling two dice and the events
A = "the sum of the numbers that come up is 7"
B = "the first die shows k", k a fixed number, k = 1, . . . , 6
A and B are independent (the fact that one event occurs does not
modify the probability of the other):

$$\frac{1}{6}\cdot\frac{1}{6} = p(A)\cdot p(B) = p(A \cap B) = \frac{1}{36}$$

Yet they have something to do with each other, as it were.

Independence of two events means instead that the probability of
the second event given the first is the same no matter what the
outcome of the first one is
Problem

Is A independent of A?

P(A) = P(A ∩ A) = P(A)P(A)

the first equality by idempotency of ∩, the second by independence

Thus, if A is independent of A, then P(A) ∈ {0, 1} (and vice versa)


Problem

If A and B are independent, then
I Ā, B̄ are independent
I A and B̄ are independent (and then obviously Ā and B)

Let us prove the first

$$1 - (1 - P(\bar{A}))(1 - P(\bar{B})) = 1 - P(A)P(B) = 1 - P(A \cap B) = P(\overline{A \cap B}) = P(\bar{A} \cup \bar{B}) = P(\bar{A}) + P(\bar{B}) - P(\bar{A} \cap \bar{B})$$

That is

$$1 - (1 - P(\bar{A}))(1 - P(\bar{B})) = P(\bar{A}) + P(\bar{B}) - P(\bar{A} \cap \bar{B})$$

Expanding the left-hand side and simplifying, P(Ā)P(B̄) = P(Ā ∩ B̄)

Exercise: Prove the second result


Mutually Exclusive and Independent

Mutually exclusive and independent are two different ideas

I Two events are mutually exclusive if the occurrence of one
prevents the occurrence of the other

I Two events are independent if the occurrence of one does not
change the probability of the other

However, notice the following

Given two events A and B such that p(A) > 0 and p(B) > 0, then

A ∩ B = ∅ ⇒ P(A ∩ B) = 0 ≠ P(A)P(B)

mutually exclusive ⇒ dependence
Conditional Independence

Two events A and B are conditionally independent given C iff

P(A ∩ B|C) = P(A|C)P(B|C)

which simplifies the general condition

P(A ∩ B|C) = P(A|B ∩ C)P(B|C)

valid for all events.

Equivalently, conditional independence of A and B given C can be
expressed as

P(A|B ∩ C) = P(A|C)    P(B|A ∩ C) = P(B|C)


Exercise. Continued

Let us reconsider the previous problem
We have 10 (biased) coins. When the i-th coin is tossed, the
probability of heads is i/10 (i = 1, . . . , 10). We randomly select a
coin, toss it, and get heads. We toss the same coin again. What is
the probability that it lands heads up again?

H_t = "the selected coin lands heads up at the t-th toss", t = 1, 2
I_k = "the selected coin is the k-th coin", k = 1, . . . , 10, a complete
set
We want to compute P(H_2|H_1)
We can use the conditional version of the law of total probability,
which states that if $\{A_i\}_{i=1}^{n}$ is a complete set

$$P(B|C) = \sum_{i=1}^n P(B \cap A_i|C) = \sum_{i=1}^n P(B|A_i \cap C)P(A_i|C)$$

since P(B ∩ A_i|C) = P(B|A_i ∩ C)P(A_i|C)


Exercise. Continued

Hence,

$$P(H_2|H_1) = \sum_{k=1}^{10} P(H_2 \cap I_k|H_1) = \sum_{k=1}^{10} P(H_2|I_k \cap H_1)P(I_k|H_1)$$

Now the coin tosses are conditionally independent given the I_k,
that is

$$P(H_2|I_k \cap H_1) = P(H_2|I_k),$$

so that, using the results of the previous part,

$$P(H_2|H_1) = \sum_{k=1}^{10} P(H_2|I_k)P(I_k|H_1) = \sum_{k=1}^{10} \frac{k}{10}\cdot\frac{k}{55} = \frac{385}{550} = 0.7 > P(H_1) = 0.55$$
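The whole two-toss computation in a few lines of exact arithmetic (variable names ours):

```python
from fractions import Fraction

coins = range(1, 11)
prior = {k: Fraction(1, 10) for k in coins}
p_h = {k: Fraction(k, 10) for k in coins}            # P(H | coin k)

p_h1 = sum(p_h[k] * prior[k] for k in coins)         # P(H1) = 0.55
post = {k: p_h[k] * prior[k] / p_h1 for k in coins}  # P(I_k | H1) = k/55

# Conditional independence of the tosses given the coin:
p_h2_given_h1 = sum(p_h[k] * post[k] for k in coins)
print(float(p_h1), float(p_h2_given_h1))  # 0.55 and 0.7
```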
Exercise. Continued

What is the meaning of the following?

$$P(H_2|H_1) = \sum_{k=1}^{10} P(H_2|I_k)P(I_k|H_1)$$

After the experiment H_1 is carried out, we have learnt something
about the coin (I_k). Thus P(I_k|H_1), the posterior probability after
the first experiment, should be used as the probability that the
selected coin is the k-th, in place of P(I_k), which would be used if
no experiment had been carried out

$$P(H_2) = \sum_{k=1}^{10} P(H_2|I_k)P(I_k)$$

Every time we carry out an experiment, the posterior probabilities
become the new prior probabilities
