LU1 - Introduction To Probability Theory

LU1: Introduction to Probability Theory
◼ History
◼ Counting formulas
◼ Probability definitions
LU1: Introduction to Probability Theory
◼ Topics
▪ History
▪ Counting formulas
❑ Permutations
❑ Combinations
▪ Probability definitions
❑ Basic concepts
❑ Laplace’s theory
❑ Frequentist theory
❑ Subjective theory
Ana Cristina Costa 2

◼ At the end of this learning unit students should be able to

▪ Apply the appropriate counting formula to each situation
▪ Understand the difference between permutations and combinations
▪ Explain the difference between an elementary outcome of a random
experiment and a non-elementary outcome
▪ Explain the difference between sample space and event, and identify
the sample space of a random experiment
▪ Compute probabilities using Laplace’s theory (i.e., the Classical
definition of probability)
▪ Explain the limitations of Laplace’s theory
▪ Understand the Frequentist and Subjective theories
▪ Explain the limitations of the Frequentist theory

◼ Resources on the Internet

▪ Newbold, P., Carlson, W. L., Thorne, B. (2013). Statistics for Business and
Economics. 8th Edition, Boston: Pearson, Sections 3.1 and 3.2. (requires VPN
connection)
▪ Kyle Siegrist (2020) Random - Probability, Mathematical Statistics, Stochastic
Processes. University of Alabama in Huntsville, Chapter 0 and Chapter 1. (access: 9
Feb 2021)
▪ MathsIsFun.com (2017) Combinations and Permutations. (access: 9 Feb 2021)
▪ Lightner, J. (1991). A Brief Look at the History of Probability and Statistics. The
Mathematics Teacher, 84(8), 623-630. (requires VPN connection)

History
In the frivolous court of the kings of France, an experienced and inveterate gambler,
the Chevalier de Méré, found apparent contradictions between the computed
probabilities of winning a certain game and his extensive experience. He
proposed this problem to Pascal (1623-1662), among other questions about
games.
Pascal resolved one of them immediately; others were resolved by Fermat
(1601-1665) in the course of his correspondence with Pascal.
Also in the 17th century, there is the remarkable work of Huygens
(1629-1695), who introduced the notion of mean value or mathematical
expectation, and the masterful treatise of Jacob Bernoulli (1654-1705), who still
influences probabilistic thinking today.
Then came de Moivre (1667-1754), who proposed a first version of the Central Limit
Theorem, to which Gauss (1777-1855) and, fundamentally, Laplace (1749-1827)
would later give a more general form.

History
Bayes (1702-1761) formulated the first attempt to mathematize statistical inference.
From the end of the 19th century, Galton (1822-1911), K. Pearson (1857-1936) and
Student (1876-1937) (pseudonym of W. S. Gosset) began the broad formulation of
statistics and its applications.
In the twentieth century the contributors are so many that it is practically impossible
to name them all. In Mathematical Statistics, the names of Fisher (1890-1962),
Wald (1902-1950) and Neyman (1894-1981) must be highlighted.
Gauss (1777-1855), Lagrange (1736-1813) and Poisson (1781-1840), to name just a
few authors, also made important contributions to the Theory of Probabilities.

Counting formulas
◼ In counting processes, a complex problem is decomposed into a sequence
of independent elementary problems
◼ The number of results of the original problem is equal to the product of
the numbers of results of the elementary problems
▪ Example: the toss of a coin has 2 outcomes, and the roll of a die has 6. The
sequential toss of the coin and roll of the die gives 2 × 6 = 12 different outcomes.
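The product rule for the coin-and-die example can be checked by listing every result explicitly. This is a minimal sketch; the variable names `coin`, `die` and `outcomes` are illustrative and not part of the slides.

```python
from itertools import product

coin = ["H", "T"]            # 2 outcomes
die = [1, 2, 3, 4, 5, 6]     # 6 outcomes

# Every (coin, die) pair is one result of the sequential experiment.
outcomes = list(product(coin, die))

print(len(outcomes))         # 12
print(len(coin) * len(die))  # 12
```

Listing the pairs and multiplying the two counts give the same number, which is what the product rule states.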

Counting formulas
◼ Example
▪ A bit is equal to 0 or 1; a byte is a sequence of 8 bits.
▪ How many different codes can be represented by a byte?
1st position 2nd position … 8th position
2 × 2 × … × 2 = $2^8$ = 256
▪ Sequence with repetition: number of ordered sequences of 8 elements, where
each element can take 2 possible values
✓ The order of the bits is important!

Counting formulas
◼ Permutations with repetition

▪ The number of ordered sequences of dimension r that can be formed
with the n elements of a set A is given by $n^r$
❑ When the order does matter it is a permutation
❑ In permutations with repetition, we can re-use the same element within
the sequence
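As a small illustration of the $n^r$ formula, the sketch below compares the formula with a brute-force enumeration of all 8-bit sequences. The function name `count_with_repetition` is hypothetical, chosen only for this example.

```python
from itertools import product

def count_with_repetition(n: int, r: int) -> int:
    """Number of ordered sequences of length r over n elements (n^r)."""
    return n ** r

# Brute-force check on the byte example: all sequences of 8 bits.
all_bytes = list(product([0, 1], repeat=8))

print(count_with_repetition(2, 8))  # 256
print(len(all_bytes))               # 256
```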

Counting formulas
◼ Example
▪ How many codes with 4 digits can you choose for the ATM card if none of
the digits can be repeated?
1st digit 2nd digit 3rd digit 4th digit
10 × 9 × 8 × 7 = $\frac{10!}{(10-4)!}$ = 5040
▪ Sequence without repetition: number of ordered sequences of 4 elements,
where the 10 possible elements cannot be repeated
✓ The order of the digits is important!
✓ Without repetition our choices get reduced each time

Counting formulas
◼ Partial permutations (or Sequences without repetition)

▪ The number of ordered sequences of r (r ≤ n) elements, where the n
possible elements cannot be repeated, is given by $\frac{n!}{(n-r)!}$
❑ When the order does matter it is a permutation
❑ In partial permutations, we cannot re-use the same element within the
sequence
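The partial-permutation count can be verified on the ATM example by enumerating every code with no repeated digit. This is a sketch only; `partial_permutations` is an illustrative name, and `math.perm` (available from Python 3.8) is shown as the standard-library equivalent.

```python
import math
from itertools import permutations

def partial_permutations(n: int, r: int) -> int:
    """n! / (n - r)!: ordered sequences of r distinct elements out of n."""
    return math.factorial(n) // math.factorial(n - r)

digits = range(10)
# Enumerate every 4-digit code in which no digit is repeated.
codes = list(permutations(digits, 4))

print(partial_permutations(10, 4))  # 5040
print(len(codes))                   # 5040
print(math.perm(10, 4))             # 5040
```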

Counting formulas
◼ Permutations of n
▪ Case of a sequence without repetition when n = r
▪ There are 𝑛! ordered sequences without repetition
❑ A permutation of a set of objects is an arrangement of those objects into a
particular order: there are n! ways to order the n elements of a set A
◼ Example
▪ To access a particular computer, it is necessary to enter a password
consisting of 10 different digits
▪ The number of passwords you can choose is equal to 10!
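For a quick numerical check of the n! rule, the sketch below lists the orderings of a small set (labels `a`, `b`, `c` are arbitrary) and then evaluates 10! for the password example.

```python
import math
from itertools import permutations

# Small check: the orderings of a 3-element set can be listed in full.
orderings = list(permutations(["a", "b", "c"]))
print(len(orderings), math.factorial(3))   # 6 6

# Password made of the 10 different digits, each used exactly once.
print(math.factorial(10))                  # 3628800
```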

Counting formulas
◼ Permutations with multinomial coefficients (or Multiset permutations)
▪ Let A be a set of k elements. We want to form ordered sequences of n
elements, with n > k (i.e., at least one element will have to be repeated), such
that:
❑ The 1st element appears n1 times
❑ The 2nd element appears n2 times
❑ …
❑ The k-th element appears nk times, and n1+n2+…+nk = n
▪ The number of distinct sequences we obtain is given by the multinomial
coefficient: $\frac{n!}{n_1! \times n_2! \times \cdots \times n_k!}$
▪ Multiset permutations: suppose we have a set with n items, where there are
n1, n2, …, nk that are identical. The number of ways to permute them is given by
this multinomial coefficient.

Counting formulas
◼ Example
• To access a particular computer, it is necessary to enter a password
consisting of 15 digits. How many passwords can be formed that contain:
• 2 times the digit 0
• 3 times the digit 1
• 3 times the digit 9
• 1 time each of the other digits
• Problem of permutations with multinomial coefficients:
$\frac{15!}{2!\,3!\,1!\,1!\,1!\,1!\,1!\,1!\,1!\,3!}$

Counting formulas
◼ Combinations
▪ A combination is a selection of items from a collection, such that the
order of selection does NOT matter (unlike permutations)
▪ A k-combination of a set A with n elements is a subset of k distinct
elements (k ≤ n) of A. The number of k-combinations is equal to the binomial
coefficient
$C_k^n = \binom{n}{k} = \frac{n!}{(n-k)!\,k!}$

Counting formulas
◼ Example
▪ A restaurant needs to hire 2 cooks and 3 waiters from 14 candidates,
of which 4 are cooks and 10 are waiters. In how many different ways
can we do it?
▪ The order does not matter (choosing Mary and John is equal to
choosing John and Mary) → problem of Combinations
$C_2^4 \times C_3^{10} = 6 \times 120 = 720$
❑ $C_2^4$: choose 2 among the 4 cooks
❑ $C_3^{10}$: choose 3 among the 10 waiters
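The restaurant example can be reproduced with the standard-library function `math.comb` (Python 3.8+). The candidate labels below are hypothetical placeholders used only for the brute-force check.

```python
import math
from itertools import combinations

cooks = ["cook1", "cook2", "cook3", "cook4"]      # hypothetical labels
waiters = [f"waiter{i}" for i in range(1, 11)]    # hypothetical labels

ways_cooks = math.comb(4, 2)
ways_waiters = math.comb(10, 3)
print(ways_cooks, ways_waiters, ways_cooks * ways_waiters)  # 6 120 720

# Brute-force check: enumerate the unordered selections themselves.
n_teams = len(list(combinations(cooks, 2))) * len(list(combinations(waiters, 3)))
print(n_teams)  # 720
```

`combinations` yields each subset once regardless of order, which is why choosing Mary and John counts the same as choosing John and Mary.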

Probability definitions
◼ Basic concepts
▪ Random experiment
❑ In the context of this Learning Unit, it is said that an experiment is
random if:
1. We know all its possible results.
2. Each time it is carried out, it is not known in advance which of
the possible results will happen.
3. It can be repeated under similar conditions.

◼ Basic concepts
▪ Elementary outcome (or elementary result)
❑ An elementary outcome of a random experiment is a result that
cannot be subdivided into any other.
▪ Event
❑ In the context of this Learning Unit, an event is a set of one or more
elementary outcomes from a random experiment.

◼ Basic concepts
▪ Sample space
❑ In the context of this Learning Unit, the sample space Ω is the set of
all elementary outcomes of a random experiment.
❑ An elementary outcome corresponds to a subset of Ω formed by a
single element.
❑ An event corresponds to any subset of Ω.

◼ Example 1
▪ Consider a random experiment that consists of flipping a coin and
observing which of the faces comes up: H = “Heads” or T = “Tails”.
▪ What is the sample space of this random experiment?

✓ Ω1 = {H, T}
Source: https://justflipacoin.com/

◼ Example 2
▪ Consider a random experiment that consists of rolling a die.
▪ What is the sample space of this random experiment?
✓ Ω2 = {1, 2, 3, 4, 5, 6}
▪ Let A be the event “the outcome is an even face”. How should we
represent this event?
✓ A = {2, 4, 6}
Source: https://www.netclipart.com

◼ Example 3
▪ Consider a random experiment that consists of rolling 2 dice.
▪ What is the sample space of this random experiment?
✓ Ω3 = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), … }
▪ Let B be the event “the sum of the outcomes of the two dice is equal
to 6”. How should we represent this event?
✓ B = {(1,5), (2,4), (3,3), (4,2), (5,1)}
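Sample spaces and events are sets, so they can be written down directly as Python sets. The sketch below rebuilds Ω2, A, Ω3 and B from Examples 2 and 3; the variable names are illustrative.

```python
from itertools import product

faces = range(1, 7)

omega2 = set(faces)                           # sample space for one die
A = {x for x in omega2 if x % 2 == 0}         # "the outcome is an even face"

omega3 = set(product(faces, repeat=2))        # two dice: ordered pairs
B = {(i, j) for (i, j) in omega3 if i + j == 6}

print(sorted(A))        # [2, 4, 6]
print(len(omega3))      # 36
print(sorted(B))        # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
print(B <= omega3)      # True: an event is a subset of the sample space
```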

◼ Laplace’s theory (or Classical definition of probability)

▪ Let Ω be the sample space of a random experiment with N elementary
outcomes that are all equally likely to occur. Let A be an event with n
elementary outcomes.
▪ The probability of A is represented by P(A) and it is given by

P(A) = n/N
❑ n is the cardinal number of the event A
❑ N is the cardinal number of the sample space
➢ Assumption: the probability of any elementary outcome is 1/N

◼ Laplace’s theory (or Classical definition of probability)

▪ The probability of an event is the ratio of the number of cases
favorable to it, to the number of all cases possible when nothing leads
us to expect that any one of these cases should occur more than any
other, which renders them, for us, equally possible
$P(A) = \frac{\#A}{\#\Omega} = \frac{n}{N}$

◼ Example 2 (continued)
▪ Consider a random experiment that consists of rolling a die. Let A be
the event “the outcome is an even face”. What is the probability of the
event A?
▪ The cardinal number of the sample space is

✓ # Ω2 = # {1, 2, 3, 4, 5, 6} = 6
▪ The cardinal number of the event A is

✓ # A = # {2, 4, 6} = 3
▪ The probability of event A is

✓ P(A) = 3/6 = 0.5

◼ Example 3 (continued)
▪ Consider a random experiment that consists of rolling 2 dice. Let B be
the event “the sum of the outcomes of the two dice is equal to 6”.
What is the probability of the event B?
▪ The cardinal number of the sample space is

✓ # Ω3 = # {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), … } = 36
▪ The cardinal number of the event B is

✓ # B = # {(1,5), (2,4), (3,3), (4,2), (5,1)} = 5
▪ The probability of event B is

✓ P(B) = 5/36 ≈ 0.1389
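The two worked examples follow the same recipe: count the elements of the event, count the elements of the sample space, and divide. A minimal sketch of that recipe is given below; `laplace_probability` is a hypothetical helper, and it is only meaningful when the elementary outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

def laplace_probability(event, sample_space):
    """P(A) = #A / #Omega, valid only for equally likely elementary outcomes."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("the event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

omega2 = set(range(1, 7))
A = {2, 4, 6}
omega3 = set(product(range(1, 7), repeat=2))
B = {pair for pair in omega3 if sum(pair) == 6}

print(laplace_probability(A, omega2))                     # 1/2
print(laplace_probability(B, omega3))                     # 5/36
print(round(float(laplace_probability(B, omega3)), 4))    # 0.1389
```

Using `Fraction` keeps the result exact; the decimal value is only a rounded display.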

◼ Consequences of Laplace’s theory

▪ The probability of an event is a number between 0 and 1
$0 \le \frac{\#A}{\#\Omega} = \frac{n}{N} \le 1$, because #A ≤ #Ω and #A ≥ 0
▪ If an event occurs with certainty, its probability is 1
$P(\Omega) = \frac{\#\Omega}{\#\Omega} = \frac{N}{N} = 1$
Example: P(outcome of a die is greater than or equal to 1) = P(Ω) = 1
▪ If an event is impossible (it certainly does not occur), its probability is 0
$P(\emptyset) = \frac{\#\emptyset}{\#\Omega} = \frac{0}{N} = 0$
Example: P(outcome of a die is 7) = P(∅) = 0

◼ Limitations of Laplace’s theory

▪ Inapplicable when the number of possible elementary outcomes is
infinite
▪ Inapplicable when elementary outcomes are not equiprobable; moreover, the
definition is circular, because “equally likely” already presupposes a notion of
probability
▪ Inapplicable to complex phenomena

◼ Frequentist theory (or Empirical theory)

▪ Consider a random experiment that can be repeated any number of times,
so that we can produce a series of independent trials under identical
conditions. In each trial, depending on chance, a particular
event A either occurs or does not occur.
▪ Let n be the number of repetitions of the experiment and let n(A) be
the number of times that event A occurs in this series of experiments.
❑ The ratio n(A)/n is called the relative frequency of the event A (in this given
series of independent and identical trials)
▪ The probability of A is given by
$P(A) = \lim_{n \to +\infty} \frac{n(A)}{n}$

◼ Frequentist theory (or Empirical theory)

▪ The relative frequency of occurrence of an event A, observed in a
number of repetitions of the experiment, is a measure of the
probability of that event:
$P(A) \approx \frac{n(A)}{n}$
▪ It has been empirically observed that the relative frequency becomes
stable in the long run.
▪ This definition no longer requires that the elementary outcomes be
equiprobable!
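The stabilisation of the relative frequency can be explored with a simulated coin. The sketch below is an illustration under stated assumptions: the coin is virtual, `relative_frequency` is a hypothetical helper, and the seed is fixed only so that repeated runs give the same numbers.

```python
import random

def relative_frequency(p_heads: float, n_tosses: int, seed: int = 0) -> float:
    """Toss a virtual coin n_tosses times and return n(A)/n for A = 'heads'."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses

# The ratio n(A)/n for increasing numbers of tosses of a fair virtual coin.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency(0.5, n))
```

Passing a value other than 0.5 as `p_heads` simulates a biased coin, a case in which the relative frequency can still be computed even though the outcomes are not equiprobable.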

◼ Example 4
▪ John Kerrich, along with internee Eric Christensen, tossed a coin 10 000 times
and observed the occurrence of “heads” while interned in Denmark during
World War II.
▪ By recording the number of heads obtained as the trials continued, Kerrich was
able to demonstrate that the proportion of heads obtained asymptotically
approached the theoretical value of 0.5 (see the results in the LU1_Examples Excel file).
❑ The probability of “heads” is equal to 0.5
❑ A fair coin was used
Kerrich, J. E. (1946). An Experimental Introduction to the Theory of Probability.
Copenhagen: J. Jorgensen.
◼ Example 5
▪ When you toss a coin, there are only two possible outcomes, heads or tails. On
any one toss, you will observe one outcome or another—heads or tails. Over a
large number of tosses, though, the percentage of heads and tails will come to
approximate the true probability of each outcome.
▪ In this applet, you can set the true probability of heads for your virtual coin,
and then toss it any number of times.
❑ Notice how the proportion of tosses that produce heads can be quite
variable at first but will eventually settle down to the true probability.
❑ Click the "Quiz Me" button to complete the activity.
“Statistical Applets”, Probability, book companion site of Moore, D., Notz, W. &
Fligner, M. (2015) The Basic Practice of Statistics. (accessed February 2021)
◼ Example 6
▪ In the academic year 2002/2003, students of the Degree in Information
Management were asked to roll 2 dice at least 50 times and record the "sum
of the dots".
▪ Objective: to compare the probabilities computed through the Frequentist
theory and Laplace’s theory.
❑ The results of the experiment of Alexandra Pinto are available in the
Example6a sheet of the LU1_Examples Excel file
❑ The dice used were not balanced, and therefore the Frequentist setting is more
appropriate
❑ The calculation of probabilities using Laplace’s theory is available in the
Example6b sheet of the LU1_Examples Excel file
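The comparison described in this example can be sketched in code. The block below does not use the students' data from the Excel file: it computes the Laplace probabilities of each sum and sets them beside relative frequencies from 50 simulated rolls of two fair virtual dice (the seed and all variable names are arbitrary assumptions).

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# Laplace's theory: count favourable pairs among the 36 equally likely ones.
pairs = list(product(range(1, 7), repeat=2))
counts = Counter(i + j for i, j in pairs)
laplace = {s: Fraction(c, len(pairs)) for s, c in counts.items()}

# Frequentist theory: relative frequencies from simulated rolls.
rng = random.Random(2003)     # arbitrary seed, for reproducibility only
n_rolls = 50
rolls = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
frequentist = {s: rolls[s] / n_rolls for s in range(2, 13)}

# One row per possible sum: Laplace probability, then relative frequency.
for s in range(2, 13):
    print(s, laplace[s], frequentist[s])
```

With only 50 rolls the two columns can differ noticeably; with unbalanced dice, as in the original experiment, the Laplace column would no longer be the right benchmark.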

◼ Consequences of Frequentist theory

▪ The probability of an event is a number between 0 and 1
▪ If an event occurs with certainty, its probability is 1
▪ If an event is impossible (it certainly does not occur), its probability is 0

◼ Limitations of Frequentist theory

▪ It relies on the convergence of the relative frequency, whose limit might not
exist
▪ Inapplicable when the experiment cannot be repeated
▪ Inapplicable when the experiment cannot be repeated under identical
conditions
▪ Inapplicable to complex phenomena

◼ Remarks on the Frequentist theory

▪ The frequentist definition is deliberately vague on certain points,
because a practical definition is intended.
❑ We need to make statements such as: "the probability that the patient will
survive the operation is 0.4".
❑ Why 0.4? The answer is possibly because 40% of previous patients survived
the operation.
❑ But, were the previous patients identical to this patient? No, they were not
- we are all individuals. But this is how this theory is often applied.
➢ This is the problem that is faced whenever a mathematical model is
applied to a practical and real situation. We must be prepared to
understand when we can apply a model and when not to apply it.

◼ Subjective theory
▪ The probability of an event is the degree of belief a person attaches to
that event, based on the information available to them, on a scale from 0 to
1 (or 0% to 100%).
❑ This reasoning holds only under the assumption of rationality, which
assumes that people act coherently.
❑ It was developed by the probabilist B. de Finetti.

◼ Example 7
▪ In an interview, an economist said that he considered the
“Improvement” of the economic situation as likely as its “Stagnation”.
However, he viewed “Improvement” as twice as likely as the
“Breakdown” of economic activity.
❑ Sample space:
Ω = {“Improvement”, “Stagnation”, “Breakdown”}
❑ It is not possible to determine the probability associated with each
outcome:
P(“Improvement”) = P(“Stagnation”) = 2 × P(“Breakdown”)
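The relation stated by the economist fixes only the ratios between the three degrees of belief. As a hedged illustration, if one additionally assumes that the three probabilities must sum to 1 (a normalization condition that this unit does not state, so it is an assumption here), the individual values follow. The helper `normalize` is hypothetical.

```python
from fractions import Fraction

def normalize(weights):
    """Scale relative weights so they sum to 1 (assumed normalization)."""
    total = sum(weights.values())
    return {k: Fraction(w, total) for k, w in weights.items()}

# Relative weights implied by P(Improvement) = P(Stagnation) = 2 x P(Breakdown).
weights = {"Improvement": 2, "Stagnation": 2, "Breakdown": 1}
beliefs = normalize(weights)

print(beliefs["Improvement"], beliefs["Stagnation"], beliefs["Breakdown"])  # 2/5 2/5 1/5
```

Without the normalization assumption, any common rescaling of the three weights is equally compatible with the economist's statement.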

◼ Limitations of Subjective theory

▪ It involves no formal calculations; it reflects only the subject's
opinions and past experience rather than data or computation.
▪ Subjective probabilities differ from person to person.
▪ There is usually a high degree of personal bias.
➢ One way to improve the quality of a subjective probability is to use the
opinion of an expert in that field (e.g., an investment banker’s opinion of
the probability that a hostile takeover will succeed, or an engineer’s
opinion of the feasibility of a new energy technology).

◼ Remarks on probability definitions

▪ Much of the mathematics of probability was developed based on the
simplistic definition of Laplace’s theory. Alternative interpretations of
probability (e.g., Frequentist and Subjective theories) also have
problems.
▪ The following Learning Unit introduces the Axiomatic theory, which
deals in abstractions, avoiding the limitations and philosophical
complications of any probability interpretation. This means that the
probabilities of our events can be perfectly arbitrary, except that they
must satisfy a set of simple axioms.
❑ The classical theory will correspond to the special case of so-called
equiprobable spaces.
