Mathematics For Computer Science: Prof - Dr.hab. Viorel Bostan

Mathematics for Computer Science
Prof.dr.hab. Viorel Bostan
Technical University of Moldova

viorel.bostan@adm.utm.md
Lecture 15
Probability Joke
The probability of someone watching you

is proportional to...
the stupidity of your action.
A little bit of history
Founding fathers
Probability theory began in 17th century France in the
correspondence of two great French mathematicians, Blaise Pascal
and Pierre de Fermat.
Blaise Pascal Pierre Fermat
They influenced such researchers as Huygens, Bernoulli and de Moivre in establishing a mathematical theory of probability.
More bits of history
A. N. Kolmogorov gave probability theory its axiomatic foundation in 1933, building on Lebesgue's theory of measure.
Andrey Nikolaevich Kolmogorov, 1903–1987
Why is probability important for Computer Science?
Many algorithms rely on randomization.

Many aspects of computer systems are designed around
probabilistic assumptions and analysis:
memory management;
branch prediction;
packet routing;
load balancing;
Probability comes up in:
information theory;
cryptography;
artificial intelligence;
big data analysis;
game theory.
Why is probability important for Computer Science?
Beyond these engineering applications, an understanding of
probability gives insight into many daily problems, such as
polling;
weather prediction;
DNA testing;
data analytics;
actuarial sciences (insurance);
risk assessment;
financial investing;
gambling.
Machine Learning is based on Probability!
So probability is good stuff!
Probably!
Back to history
Consider a dice game that played an important role in the historical
development of probability.
Chevalier de Mere
Back to history
It is said that de Mere had been betting that, in 4 rolls of a die, at least one six would turn up.
He was winning consistently and, to get more people to play, he changed the game to bet that, in 24 rolls of two dice, a pair of sixes would turn up.
It is claimed that de Mere lost with 24 and felt that 25 rolls were necessary to make the game favorable.
So he asked Pascal to explain that "mystery" to him.
Pascal sent an e-mail to Fermat and started the discussion.
Back to history
The probability that no 6 turns up on the first toss is $\frac{5}{6}$.
The probability that no 6 turns up on either of the first two tosses is $\left(\frac{5}{6}\right)^{2}$.
Reasoning in the same way, the probability that no 6 turns up on any of the first four tosses is $\left(\frac{5}{6}\right)^{4}$.
Thus, the probability of at least one 6 in the first four tosses is
$1 - \left(\frac{5}{6}\right)^{4} \approx 0.518$
Back to history
The probability of at least one 6 in the first four tosses is 0.518.
So de Mere was winning in more than 51% of the cases!
Similarly, for the second bet, with 24 rolls, the probability that de Mere wins is
$1 - \left(\frac{35}{36}\right)^{24} \approx 0.491$
and for 25 rolls it is
$1 - \left(\frac{35}{36}\right)^{25} \approx 0.506$
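The three values above can be checked numerically. The following is a minimal Python sketch (the lecture itself uses MATLAB; the function name `at_least_one_success` is only an illustrative choice), using exact fractions so that rounding happens only when printing.

```python
from fractions import Fraction

def at_least_one_success(p_fail, rolls):
    """Probability of at least one success in `rolls` independent trials,
    where each trial fails with probability `p_fail`."""
    return 1 - p_fail ** rolls

# First bet: at least one six in 4 rolls of one die.
p_one_six_in_4 = at_least_one_success(Fraction(5, 6), 4)
# Second bet: at least one double six in 24 (or 25) rolls of two dice.
p_double_six_in_24 = at_least_one_success(Fraction(35, 36), 24)
p_double_six_in_25 = at_least_one_success(Fraction(35, 36), 25)

for label, p in [("4 rolls, one die", p_one_six_in_4),
                 ("24 rolls, two dice", p_double_six_in_24),
                 ("25 rolls, two dice", p_double_six_in_25)]:
    print(f"{label}: {float(p):.3f}")
```

The second bet is unfavorable with 24 rolls and becomes favorable with 25, which matches de Mere's experience.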
What is a probability problem?
Every probability problem involves some sort of randomized

experiment, process, or game.
A randomized experiment/process/game is an
experiment/process/game whose outcome nobody knows at the
beginning.
Each such problem involves two distinct challenges:
1 How do we model the situation mathematically?
2 How do we solve the resulting mathematical problem?
Probability problems are classified by the number of possible outcomes into
discrete probability problems;
continuous probability problems.
Discrete Probability
Consider chance experiments with a finite number of possible

outcomes: ω1 , ω2 , . . . , ωn .
Example
Roll a die (which is an experiment) and the possible outcomes are:
{1, 2, 3, 4, 5, 6} corresponding to the side that turns up.
Example
Toss a coin (another experiment) with possible outcomes: H (heads)
and T (tails).
Example
Outcomes of two tosses of a coin (experiment) are: {HH, HT, TH, TT }
It is frequently useful to be able to refer to an outcome of an

experiment.
Example
Consider the mathematical expression which gives the sum of four
rolls of a die. To do this, we could let Xi , i = 1, 2, 3, 4, represent the
values of the outcomes of the four rolls, and then we could write the
expression
Y = X1 + X2 + X3 + X4
for the sum of the four rolls.
The Xi ’s are called random variables.
Y is another random variable.
Definition
A random variable is an expression whose value is the outcome of a particular experiment.
Just as in the case of other types of variables in mathematics, random

variables can take on different values.
Let X be the random variable which represents the roll of one die, and assign probabilities to the possible values of X.
Assign to each outcome ωj a nonnegative number m(ωj) such that
m(ω1) + m(ω2) + . . . + m(ω6) = 1
The function m(ωj) is called the distribution function of the random variable X.
In this case assign 1/6 to each of the outcomes.
Definition
Suppose we have an experiment whose outcome depends on chance.
We represent the outcome of the experiment by a capital Roman
letter, such as X, called a random variable. The sample space of the
experiment is the set of all possible outcomes. If the sample space is
either finite or countably infinite, the random variable is said to be
discrete. Sample space usually is denoted by Ω.
Definition
The elements of a sample space are called outcomes. Each subset of a
sample space is defined to be an event. Normally, we shall denote
outcomes by lower case letters and events by capital letters.
Example
A die is rolled once. We let X denote the outcome of this experiment.
Then the sample space for this experiment is the 6−element set
Ω = {1, 2, 3, 4, 5, 6}
where each outcome i, for i = 1, . . . , 6, corresponds to the number of

dots on the face which turns up. The event
E = {2, 4, 6}
corresponds to the statement that the result of the roll is an even

number. The event E can also be described by saying that X is even.
Definition
Let X be a random variable which denotes the value of the outcome
of a certain experiment with finitely many possible outcomes. Let Ω
be the sample space of the experiment (i.e., the set of all possible
values of X, or equivalently, the set of all possible outcomes of the
experiment.) A distribution function for X is a real-valued function
m whose domain is Ω and which satisfies:
1. $m(\omega) \ge 0$ for all $\omega \in \Omega$
2. $\sum_{\omega \in \Omega} m(\omega) = 1$
For any subset E of Ω, define the probability of E to be the number P(E) given by
$P(E) = \sum_{\omega \in E} m(\omega)$
Example
For rolling a die,
$m(i) = \frac{1}{6}, \quad i = 1, 2, \ldots, 6$
Clearly, it is a distribution function. Then
$P(E) = P(\{2, 4, 6\}) = \sum_{\omega \in \{2,4,6\}} m(\omega) = m(2) + m(4) + m(6) = \frac{3}{6} = \frac{1}{2}$
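The definition of P(E) as a sum of m(ω) over the outcomes in E translates almost literally into code. A minimal Python sketch follows (the lecture uses MATLAB; the names `m` and `probability` are illustrative assumptions, not part of the lecture).

```python
from fractions import Fraction

# Distribution function of a fair die: every outcome gets mass 1/6.
omega = range(1, 7)
m = {outcome: Fraction(1, 6) for outcome in omega}

def probability(event, m):
    """P(E): the sum of m(w) over all outcomes w in the event E."""
    return sum((m[w] for w in event), Fraction(0))

# The two defining properties of a distribution function.
assert all(mass >= 0 for mass in m.values())
assert sum(m.values()) == 1

even = {2, 4, 6}
print(probability(even, m))  # 1/2
```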
Example
Consider tossing a coin twice. There are several ways to record the outcomes of this experiment.
Record the two tosses, in the order they occurred:
Ω1 = {HH, HT, TH, TT}
Record only the number of heads that appeared:
Ω2 = {0, 1, 2}
Finally, record the two outcomes without regard to the order in which they occurred:
Ω3 = {HH, HT, TT}.
Example (Contd.)
Assume that all four outcomes are equally likely, and define the distribution function m(ω) by
$m(HH) = m(HT) = m(TH) = m(TT) = \frac{1}{4}$
Let E = {HH, HT, TH}. Then
$P(E) = m(HH) + m(HT) + m(TH) = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}$
Similarly, if F = {HH, HT}, then
$P(F) = m(HH) + m(HT) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$
Example
Three people, A, B, and C, are running for the same office, and we assume
that exactly one of them wins. The sample space may be taken
as the 3−element set Ω = {A, B, C}. Suppose that A and B have the
same chance of winning, but that C has only half the chance of A or B.
Then we assign
m(A) = m(B) = 2m(C)
Since
m(A) + m(B) + m(C) = 1
we see that
2m(C) + 2m(C) + m(C) = 1
which implies that
5m(C) = 1
Example (Contd.)
Hence m(C) = 1/5, and
m(A) = 2/5, m(B) = 2/5, m(C) = 1/5.
To sum up, the sample space is Ω = {A, B, C} and the distribution function is:
ω      A    B    C
m(ω)   2/5  2/5  1/5
Let E be the event that either A or C wins. Then E = {A, C}, and
$P(E) = m(A) + m(C) = \frac{2}{5} + \frac{1}{5} = \frac{3}{5}$
Properties of probability
Theorem
The probabilities assigned to events by a distribution function on a sample
space Ω satisfy the following properties:
1. P(E) ≥ 0 for every E ⊂ Ω

2. P( Ω ) = 1
3. If E ⊂ F ⊂ Ω, then P(E) ≤ P(F)
4. If E and F are disjoint, then P(E ∪ F) = P(E) + P(F)
5. $P(\bar{E}) = 1 - P(E)$ for every E ⊂ Ω, where $\bar{E} = \Omega \setminus E$ is the complement of E
Theorem
Let A1 , . . . , An be pairwise disjoint events with Ω = A1 ∪ . . . ∪ An and let
E be any event. Then
$P(E) = \sum_{i=1}^{n} P(E \cap A_i)$
Corollary
For any two events A and B,
$P(A) = P(A \cap B) + P(A \cap \bar{B})$
Theorem
If A and B are subsets of Ω, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Tree Diagrams
Tree diagrams are a very useful tool for solving discrete probability problems.
Example
Consider three tosses of a coin.
When an experiment takes place in stages, we often find it convenient to represent the outcomes by a tree diagram, as shown below.
Tree Diagrams
[Figure: tree diagram for three tosses of a coin. From Start, each of the 1st, 2nd and 3rd tosses branches into H and T; the eight leaves are the outcomes ω1, ..., ω8 = HHH, HHT, HTH, HTT, THH, THT, TTH, TTT.]
Tree diagrams
Example (Contd.)
Sample space is
Ω = { ω1 , ω2 , . . . , ω8 }
= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT }.
All 8 outcomes are equally likely to occur.
Let E be the event that at least one head occurs,
and F the event that exactly two of the three tosses show the same face (exactly two heads or exactly two tails).
Compute P(E) and P(F).
Tree diagrams
[Figure: the same tree for three coin tosses, with probability 1/2 on every edge; each leaf ωi has m(ωi) = 1/8.]
Tree diagrams
Example (Contd.)
Let E be the event that at least one head turns up.
Then the complement $\bar{E}$ is the event that no heads turn up.
This event occurs for only one outcome, namely ω8 = TTT.
Thus,
$\bar{E} = \{TTT\}$
$P(\bar{E}) = P(\{TTT\}) = m(TTT) = \frac{1}{8}$
$P(E) = 1 - P(\bar{E}) = 1 - \frac{1}{8} = \frac{7}{8}$
Tree diagrams
Example (Contd.)
Let F be the event that exactly one pair of heads or exactly one pair of tails turns up.
Let F1 be the event that exactly two heads turn up,
and F2 the event that exactly two tails turn up.
Then F = F1 ∪ F2, and we use the formula
$P(F) = P(F_1) + P(F_2) - P(F_1 \cap F_2)$
$P(F_1) = P(\{HHT, HTH, THH\}) = \frac{3}{8}$
$P(F_2) = P(\{TTH, THT, HTT\}) = \frac{3}{8}$
$P(F_1 \cap F_2) = P(\emptyset) = 0$
$P(F) = \frac{3}{8} + \frac{3}{8} = \frac{3}{4}$
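For a sample space this small, both answers can be confirmed by listing all outcomes. The sketch below is in Python rather than the lecture's MATLAB, and the helper name `probability` is an illustrative assumption; it enumerates the eight leaves of the tree and sums their masses.

```python
from fractions import Fraction
from itertools import product

# The eight leaves of the tree, in the order HHH, HHT, ..., TTT.
omega = ["".join(tosses) for tosses in product("HT", repeat=3)]
m = {w: Fraction(1, 8) for w in omega}

def probability(event):
    """Sum of m(w) over the outcomes w in the event."""
    return sum((m[w] for w in event), Fraction(0))

E = {w for w in omega if "H" in w}            # at least one head
F1 = {w for w in omega if w.count("H") == 2}  # exactly two heads
F2 = {w for w in omega if w.count("T") == 2}  # exactly two tails
F = F1 | F2

print(probability(E), probability(F))  # 7/8 3/4
```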
Tree Diagrams
Example (Contd.)
Need to compute the probability that either the first outcome is head
or the second outcome is a tail.
Tree Diagrams
Example (Contd.)
Probability that either the 1st outcome is a head or the 2nd outcome is a tail.
Let A = {1st outcome is head} = {ω1, ω2, ω3, ω4},
and B = {2nd outcome is tail} = {ω3, ω4, ω7, ω8}.
By looking at the paths in the tree, we see that
$P(A) = P(B) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{1}{2}$
$A \cap B = \{\omega_3, \omega_4\}$
$P(A \cap B) = \frac{1}{8} + \frac{1}{8} = \frac{1}{4}$
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = \frac{3}{4}$
Basic steps in solving discrete probability problems
We distinguish 4 basic steps in the solution of discrete probability problems:
1 Find the Sample Space Ω
2 Define Events of Interest
3 Determine Outcome Probabilities (find the distribution function m(ω))
4 Compute Event Probabilities
Monty Hall Problem
Let’s make some assumptions in order to model the game formally:

1 The car is equally likely to be hidden behind each of the three
doors.
2 The player is equally likely to pick each of the three doors,
regardless of the car’s location.
3 After the player picks a door, the host must open a different door
with a goat behind it and offer the player the choice of staying
with the original door or switching.
4 If the host has a choice of which door to open, then he is equally
likely to select each of them.
What is the probability that a player who switches wins the car?
Monty Hall Problem
First objective is to identify all possible outcomes of the experiment.
A typical experiment involves several randomly-determined

quantities.
For example, the Monty Hall game involves three such quantities:
1. The door concealing the car.

2. The door initially chosen by the player.
3. The door that the host opens to reveal a goat.
Use tree diagrams. First stage indicates the car location.
For each possible location of the prize, the player could initially
choose any of the three doors. Represent this in a second layer added
to the tree.
Then a third layer represents the possibilities of the final step when
the host opens a door to reveal a goat.
MHP. Step 1: Find sample space
MHP. Step 2: Define events of interest
Objective is to answer questions of the form “What is the probability

that . . . ?”, where the missing phrase might be
“the player wins by switching”,
“the player initially picked the door concealing the prize”,
“the prize is behind door C”.
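Each of these phrases characterizes a set of outcomes. Write each outcome as a triple (door concealing the car, door initially chosen by the player, door opened by the host).
The event “the prize is behind door C” is the set: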
{(C, A, B), (C, B, A), (C, C, A), (C, C, B)}
The event that the player initially picked the door concealing the
prize is the set:
{(A, A, B), (A, A, C), (B, B, A), (B, B, C), (C, C, A), (C, C, B)}
The event that the player wins by switching is the set of outcomes:
{(A, B, C), (A, C, B), (B, A, C), (B, C, A), (C, A, B), (C, B, A)}
MHP. Step 3: Determine outcome probabilities
Next convert edge probabilities into outcome probabilities.
This is a purely mechanical process: the probability of an outcome is

equal to the product of the edge-probabilities on the path from the
root to that outcome.
For example, the probability of outcome (A, A, B) is
$\frac{1}{3} \cdot \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{18}$
There’s an easy, intuitive justification for this rule. As the steps in an
experiment progress randomly along a path from the root of the tree
to a leaf, the probabilities on the edges indicate how likely the walk is
to proceed along each branch.
For example, a path starting at the root in our example is equally

likely to go down each of the three top-level branches.
MHP. Step 4: Compute event probability
$P\{\text{switching wins}\} = P\{(A,B,C), (A,C,B), (B,A,C), (B,C,A), (C,A,B), (C,B,A)\}$
$= P\{(A,B,C)\} + P\{(A,C,B)\} + P\{(B,A,C)\} + P\{(B,C,A)\} + P\{(C,A,B)\} + P\{(C,B,A)\}$
$= \frac{1}{9} + \frac{1}{9} + \frac{1}{9} + \frac{1}{9} + \frac{1}{9} + \frac{1}{9} = \frac{2}{3}$
The correct answer is: a player who switches doors wins the car with probability 2/3.
In contrast, a player who stays with his or her original door wins with probability 1/3, since staying wins if and only if switching loses.
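The four steps can be traced in a few lines of code. The following Python sketch (not the lecture's MATLAB; the function name `monty_hall_outcomes` is an illustrative assumption) builds the tree under the four modeling assumptions stated earlier, multiplies the edge probabilities along each path, and sums over the event of interest.

```python
from fractions import Fraction

DOORS = "ABC"

def monty_hall_outcomes():
    """Map each outcome (car, pick, opened) to the product of the
    edge probabilities on its path in the tree."""
    outcomes = {}
    for car in DOORS:            # car location: probability 1/3 each
        for pick in DOORS:       # player's pick: probability 1/3 each
            # The host opens a door that is neither picked nor hides the car,
            # choosing uniformly when two such doors exist.
            allowed = [d for d in DOORS if d != pick and d != car]
            for opened in allowed:
                outcomes[(car, pick, opened)] = (
                    Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(allowed))
                )
    return outcomes

outcomes = monty_hall_outcomes()
# Switching wins exactly when the initial pick was not the car.
p_switch_wins = sum(p for (car, pick, _), p in outcomes.items() if car != pick)
p_stay_wins = sum(p for (car, pick, _), p in outcomes.items() if car == pick)
print(len(outcomes), p_switch_wins, p_stay_wins)  # 12 2/3 1/3
```

The twelve outcomes are not equally likely: the six in which switching wins have probability 1/9 each, the other six have probability 1/18 each.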
Simulation of probability
Goal: to simulate on a computer a probabilistic experiment.

Example
Simulate an experiment with 3 possible outcomes ω1 , ω2 and ω3 , i.e.
Ω = { ω1 , ω2 , ω3 }
such that
$m(\omega_1) = \frac{1}{2}, \quad m(\omega_2) = \frac{1}{3}, \quad m(\omega_3) = \frac{1}{6}$
If we have a die, such an experiment can be simulated by marking three faces of a six-sided die with ω1, two faces with ω2 and one face with ω3.
How to simulate this experiment on a computer?
First, find a computer analog of rolling a die. This is done on the

computer by using a random number generator.
Actually, it is a pseudorandom number generator.
If you type
>> x=rand;
then x will be a random number between 0 and 1, while
>> x=rand(N);
returns an N × N matrix whose entries are random numbers from [0, 1).
More generally, rand(M,N) returns an M × N random matrix.
Example
>> rand(3,4)   % a 3 × 4 matrix
    0.2785    0.9649    0.9572    0.1419
    0.5469    0.1576    0.4854    0.4218
    0.9575    0.9706    0.8003    0.9157
>> a=rand(1,6)   % a 1 × 6 row vector
a = 0.7922 0.9595 0.6557 0.0357 0.8491 0.9340
>> c=rand(3,1)   % a 3 × 1 column vector
c =
    0.6787
    0.7577
    0.7431
What if we need a random real number not in the interval [0, 1], but in an arbitrary interval [a, b]?
In this case the following MATLAB commands do the job:
>> x=rand;
>> y=(b-a)*x+a;
Clearly, if x ∈ [0, 1], then y = (b − a)x + a ∈ [a, b].
The following command is even better:

>> y=(b-a)*rand+a;
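For readers without MATLAB, the same scaling can be sketched in Python, assuming the standard library's `random.random()` as the analog of `rand` (it returns a float in [0, 1)). The function name `uniform_on` is illustrative only; the standard library already offers `random.uniform(a, b)` for this purpose.

```python
import random

def uniform_on(a, b, rng=random):
    """Scale r from [0, 1) to the interval from a to b via y = (b - a) * r + a."""
    return (b - a) * rng.random() + a

random.seed(0)  # arbitrary seed, used only to make the run reproducible
samples = [uniform_on(-2.0, 5.0) for _ in range(1000)]
print(min(samples), max(samples))  # both values lie between -2 and 5
```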
How about not a real number, but an integer?
Want to simulate tossing a die ⇒ need a random variable with values
from {1, 2, 3, 4, 5, 6}.
>> x=fix(6*rand)+1;
Indeed, writing [t] for the integer part of t,
0 ≤ r < 1
0 ≤ 6r < 6
[6r] ∈ {0, 1, 2, 3, 4, 5}
[6r] + 1 ∈ {1, 2, 3, 4, 5, 6}
Note that each of the 6 outcomes has the same probability:
$m(i) = \frac{1}{6}, \quad i = 1, 2, 3, 4, 5, 6$
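A rough Python counterpart of the MATLAB command `fix(6*rand)+1` is sketched below, with `int(...)` playing the role of `fix` for nonnegative arguments. The name `roll_die`, the seed and the number of rolls are arbitrary illustrative choices. Counting the relative frequencies gives an empirical check that every face appears about one sixth of the time.

```python
import random
from collections import Counter

def roll_die(rng=random):
    """Return [6r] + 1 for a random r in [0, 1): an integer from 1 to 6."""
    return int(6 * rng.random()) + 1

random.seed(1)  # arbitrary seed for reproducibility
rolls = [roll_die() for _ in range(60_000)]
frequencies = {face: count / len(rolls)
               for face, count in sorted(Counter(rolls).items())}
print(frequencies)  # every value should be close to 1/6
```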
Example (Contd)
Need to simulate an experiment with 3 possible outcomes:
Ω = { ω1 , ω2 , ω3 }
such that
$m(\omega_1) = \frac{1}{2}, \quad m(\omega_2) = \frac{1}{3}, \quad m(\omega_3) = \frac{1}{6}$
Let x = [6r] + 1, with r ∈ [0, 1) a random number.
If x ∈ {1, 2, 3}, then ω1 has happened;
if x ∈ {4, 5}, then ω2;
if x ∈ {6}, then ω3.
Note that ω1 could correspond to any 3 values, e.g. ω1 → {2, 4, 5}, ω2 → {1, 6} and ω3 → {3}.
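The face-marking idea can be sketched in Python as a lookup table from die faces to outcomes (the lecture works in MATLAB; the labels `"w1"`, `"w2"`, `"w3"`, the table name `FACE_TO_OUTCOME`, the seed and the sample size are illustrative assumptions).

```python
import random
from collections import Counter

# Three faces are marked w1, two faces w2 and one face w3.
FACE_TO_OUTCOME = {1: "w1", 2: "w1", 3: "w1", 4: "w2", 5: "w2", 6: "w3"}

def sample_outcome(rng=random):
    """Roll a simulated die, x = [6r] + 1, and read off the marked outcome."""
    x = int(6 * rng.random()) + 1
    return FACE_TO_OUTCOME[x]

random.seed(2)  # arbitrary seed for reproducibility
n = 60_000
counts = Counter(sample_outcome() for _ in range(n))
for name in ("w1", "w2", "w3"):
    print(name, counts[name] / n)  # close to 1/2, 1/3 and 1/6 respectively
```

Any other assignment of three, two and one faces to the outcomes would give the same distribution.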
Example (Coin Tossing)

Want to simulate coin tossing. Intuition suggests that the probability
of obtaining a head on a single toss of a coin is 1/2.
To have the computer toss a coin, can pick a random real number
r ∈ [0, 1].
If r < 1/2 call the outcome heads (H); if not, call it tails (T). Another
way to proceed would be to ask the computer to pick a random
integer from set {0, 1}, and let H be 0, and T be 1.
Simulating 20 tosses gives, for example:
HHHTTTHTTTTHTTTTTHTT.
Note that in these 20 tosses, 6 heads and 14 tails occurred.
Run it again, and the results will most probably be different.
Toss a coin n times, where n is much larger than 20, and see if the
proportion of heads is closer to intuitive guess of 1/2.
The simulation program will keep track of the number of heads.
Run the program with n = 1000 to obtain, say, 489 heads. Then the proportion of heads is 489/1000 = 0.489.
Run it with n = 10000 to get 5067 heads. The proportion of heads is 5067/10000 = 0.5067.
This time, the proportion of heads is closer to the true value 0.5.
A mathematical model for this experiment is called Bernoulli Trials.
The Law of Large Numbers (to be studied later) shows that in the Bernoulli Trials model the proportion of heads should be near 0.5, consistent with the intuitive frequency interpretation of probability.
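The simulation program itself is not shown in the lecture. One possible version, sketched in Python rather than MATLAB, is given below; the function name `proportion_of_heads`, the seed and the values of n are illustrative assumptions. The printed proportions depend on the seed and will not reproduce the counts 489 and 5067 quoted above.

```python
import random

def proportion_of_heads(n, rng):
    """Toss a simulated fair coin n times; a toss is heads when r < 1/2."""
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n

rng = random.Random(2024)  # arbitrary seed for reproducibility
for n in (20, 1000, 10_000, 100_000):
    print(n, proportion_of_heads(n, rng))
```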
Heads or Tails Game
Example (Heads or Tails Game)

Two friends, Petru and Paul play a game called ”Heads or Tails”.
In this game, a fair coin is tossed 40 times. Each time a head comes
up Petru wins 1 cent from Paul, and each time a tail comes up Petru
loses 1 cent to Paul.
For example, the result of one particular game with 40 coin tosses is
HHTHTHHTTTTTHHHHTHTHTHTHHHHHTTHHHTHHTHHT
with Petru winning 8 cents at the end of the game.
Clearly, for another run, the results will be different.
Heads or Tails Game
Heads or Tails Game. Petru’s winnings evolution graph
[Figure: line plots of Petru's winnings (in cents, vertical axis from −10 to 10) against the toss number (0 to 40).]
Heads or Tails Game
It is natural to ask for the probability that Petru will win j cents.
Let random variable W be defined as
W = number of cents Petru won in one game.

P(W = j) =?.
Here, j could be any even number (Why even?) from −40 to 40.
It is reasonable to guess that the value of j with the highest

probability is j = 0, since this occurs when the number of heads
equals the number of tails.
Similarly, we would guess that the values of j with the lowest

probabilities are j = ±40.
We would like to see if our intuition is correct (this time).

Heads or Tails Game
Simulate the game a large number of times and keep track of the
number of times that Petru’s final winnings are j.
The proportions over the number of games played give estimates for
the corresponding probabilities.
For example, if in 10000 games Petru has won 6 cents 754 times, then
$P(W = 6) \approx \frac{754}{10000} = 0.0754$
Count the number of times Petru has won j cents and plot these results to get the distribution of probabilities.
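A possible form of this simulation is sketched below in Python (the lecture uses MATLAB, and its actual program is not shown). The names `play_game` and `estimate_distribution`, the seed and the number of games are illustrative assumptions.

```python
import random
from collections import Counter

def play_game(tosses, rng):
    """Petru's final winnings in cents: +1 for each head, -1 for each tail."""
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(tosses))

def estimate_distribution(games, tosses=40, seed=0):
    """Estimate P(W = j) by the proportion of games that end with winnings j."""
    rng = random.Random(seed)
    counts = Counter(play_game(tosses, rng) for _ in range(games))
    return {j: counts[j] / games for j in sorted(counts)}

estimate = estimate_distribution(20_000)
print(estimate[0], estimate.get(6, 0.0))  # estimates of P(W = 0) and P(W = 6)
```

Only even values of j appear as keys, since with 40 tosses the number of heads minus the number of tails is always even.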
Heads or Tails Game: 1000000 games
[Figure: bar graph "Distribution of winnings", showing the estimated probability that winnings = j (vertical axis, 0 to 0.14) for j from −40 to 40.]
Heads or Tails Game
Such graphs are called bar graphs.
The vertical line, or bar, at position j on the horizontal axis has a height equal to the proportion of outcomes which equal j.
Such a graph can be plotted using the MATLAB command bar.
Note that the more games are played, the "smoother" the resulting curve above the bars will be.
The intuition about Petru’s final winnings was quite correct.
The event "Petru wins 0 cents" has the highest probability, while the events "Petru’s winnings are ±40, ±38, ±36" have the lowest probability (close to 0); such events correspond to almost all of the 40 tosses being of the same type, which intuitively is highly improbable.
The End of Lecture
Lecture 16
Probability Joke
A lottery is a tax on people ...
who don’t know probability!
Uniform distribution
Definition
The uniform distribution on a sample space Ω containing n elements is the function m defined by
$m(\omega) = \frac{1}{n}$
for every ω ∈ Ω.
The uniform distribution assigns the same probability to each outcome.

Important!
When an experiment is analyzed to describe its possible outcomes,
there is no single correct choice of sample space. In other words, for
the same experiment we can have several choices for sample space.
But keep in mind that distribution functions can be different!
Example
Consider the experiment of tossing a coin twice. Sample space is
Ω = {HH, HT, TH, TT }
and assign the uniform distribution function m:
$m(HH) = m(HT) = m(TH) = m(TT) = \frac{1}{4}$
On the other hand, for some purposes, it may be more useful to consider the 3−element sample space for the same experiment:
Ω1 = {0, 1, 2}
in which outcome i, for i = 0, 1, 2, means that i heads turn up.
Example (Contd.)
The distribution function m1 on Ω1 defined by the equations
$m_1(0) = \frac{1}{4}, \quad m_1(1) = \frac{1}{2}, \quad m_1(2) = \frac{1}{4}$
corresponds to the distribution m on Ω, but it is not uniform.
Also, it is possible to choose another distribution function $\tilde{m}$ on Ω1:
$\tilde{m}(0) = \tilde{m}(1) = \tilde{m}(2) = \frac{1}{3}$
Although $\tilde{m}$ is a perfectly good uniform distribution function, it is not consistent with observed data on coin tossing. So, it is not appropriate for the considered experiment.
Example
Consider the experiment that consists of rolling a pair of dice.
Ω = {(i, j) | 1 ≤ i, j ≤ 6}
= {(1, 1), (1, 2), . . . , (2, 1), . . . , (5, 6), (6, 6)}
To determine the size of Ω, we note that there are six choices for i,
and for each choice of i there are six choices for j, leading to 36
different outcomes.
Assume that the dice are not loaded. In mathematical terms, this
means we adopt the uniform distribution function on Ω:
$m((i, j)) = \frac{1}{36}, \quad 1 \le i, j \le 6$
Example (Contd.)
Interested in answering the questions:
What is the probability of getting a sum of 8 on the roll of two dice?
What is the probability of getting a sum of 11?
The first event, denoted by E, is the subset of Ω :
E = {(2, 6), (6, 2), (3, 5), (5, 3), (4, 4)}.
A sum of 11 is the subset F given by
F = {(5, 6), (6, 5)}.
Example (Contd.)
Consequently, by the definition of probability:
$P(E) = \sum_{\omega \in E} m(\omega) = m((2,6)) + m((6,2)) + m((3,5)) + m((5,3)) + m((4,4)) = 5 \cdot \frac{1}{36} = \frac{5}{36}$
$P(F) = \sum_{\omega \in F} m(\omega) = m((5,6)) + m((6,5)) = 2 \cdot \frac{1}{36} = \frac{1}{18}$
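Both probabilities can be confirmed by enumerating the 36 ordered pairs. This is a minimal Python sketch, not the lecture's MATLAB; the function name `p_sum` is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 ordered pairs (i, j) with 1 <= i, j <= 6.
omega = list(product(range(1, 7), repeat=2))
mass = Fraction(1, len(omega))  # uniform distribution: 1/36 for each pair

def p_sum(total):
    """Probability that the two dice add up to `total`."""
    return sum(mass for i, j in omega if i + j == total)

print(p_sum(8), p_sum(11))  # 5/36 1/18
```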
Example (Contd.)
What is the probability of getting neither snake eyes (double ones) nor boxcars (double sixes)?
The event of getting either one of these two outcomes is the set
H = {(1, 1), (6, 6)}.
Hence, the probability of obtaining neither is given by
$P(\bar{H}) = 1 - P(H) = 1 - \frac{2}{36} = \frac{17}{18}$
Until now, we assigned an equal probability to each outcome, i.e. the

uniform distribution function was chosen. These are the natural
choices provided the coin is a fair one and the dice are not loaded.
Determination of probabilities
How to decide which probability distribution should be used in

practice?
One way is by symmetry.
For example, in the case of the coin toss, there is no physical
difference between the two sides of a coin that can affect the chance
of one side or the other turning up.
Similarly, with an ordinary die there is no essential difference

between any two sides of the die, and so by symmetry we assign the
same probability for any possible outcome.
In general, considerations of symmetry often suggest the uniform

distribution function.
Determination of probabilities
ATTENTION:
We should not always assume that, just because we do not know any
reason to suggest that one outcome is more likely than another, it is
appropriate to assign equal probabilities.
Consider the experiment of guessing the sex of a newborn child.
Your intuition induced by symmetry argument may say that the

probability of a boy being born is the same as that of a girl, that is:
P(Boy) = 0.5 and P(Girl) = 0.5
In reality, it is more appropriate to assign a distribution function

which assigns
P(Boy) = 0.513 and P(Girl) = 0.487
Infinite sample spaces
If a sample space Ω has an infinite number of points, then the way
that a distribution function is defined depends upon whether or not
the sample space is countable. If
Ω = {ω1 , ω2 , ω3 , . . . , ω2018 , . . .}
is a countably infinite sample space, then a distribution function is

defined exactly as previously, except that the sum must now be a
convergent infinite sum.
All results considered until now are true in this case.
One thing impossible on a countably infinite sample space (but

possible on a finite sample space) is to define a uniform distribution
function.
If the sample space is infinite but NOT COUNTABLE (like R), then it is
a continuous probability case.
Infinite sample spaces
Indeed, if
Ω = { ω1 , ω2 , ω3 , . . . }
and the uniform distribution is to be used:
m(ωi) = α, ∀i ∈ N,
then, no matter how small α > 0 is, the infinite sum of α's will be ∞ (and α = 0 would make the sum 0).
For example, if α = 0.00001 = 1/100000, the sum of just one million such α's is already 10.
On the other hand, we must have:
$\sum_{\omega \in \Omega} m(\omega) = 1$
No uniform distribution on a countably infinite sample space!
2nd Joke of the Day
There are 3 kinds of people:
those who can count and those who can’t ...
(Math folklore joke)
3rd Joke
There are 10 kinds of people:
those who know binary and those who don’t ...
(CS folklore joke)
4th Joke
The four apostles were the following three:
Luke and Matthew
Counting
Informal definition
Combinatorics is the area of mathematics concerned with counting.
How difficult is counting?

It’s challenging! And it depends on you.
Counting is a practical skill like integration.
You just need to translate the word-problem to a math problem
that you know how to solve.
Just, don’t be lost in translation.
Counting. A little bit of history
A problem appearing in one of the oldest surviving mathematical manuscripts, from about 1650 B.C., was translated as:
Houses   7
Cats     49
Mice     343
Wheat    2401
Hekat    16807
Total    19607
Counting. A little bit of history
An example of counting principle can be traced to at least 1730 in a
popular poem:
As I was going to St.Ives,

I met a man with seven wives,
Each wife had seven sacks,
Each sack had seven cats,
Each cat had seven kits.
Kits, cats, sacks and wives,
How many were going to St.Ives?
The correct answer is, of course, one!
The others are going in the opposite direction.
Memory quiz
You might have heard this old poem before. Where?
Die Hard 3
Counting example
7820480135385502964448038 3171004832173501394113017 5763257331083479647409398 8247331000042995311646021

4894459918669156762409921 3208234421597368647019265 5800949123548989122628663 8496243997123475922766310
1082662032430379651370981 3437254656355157864869113 6042900801199280218026001 8518399140676002660747477
1178480894769706178994993 3574883393058653923711651 6116171789137737896701405 8543691283470191452333763
1253127351683239693851327 3644909946040480189969149 6144868973001582369735121 8675309258374137092461352
1301505129234077811069011 3790044132737084094417246 6247314593851169234746152 869432111236399686296665
1311567111143866433882194 3870332127437971355322815 6814428944266874963488274 8772321203608477245851154
1470029452721203876862144 4080505804577801451363100 6870852945543886849147881 8791422161722582546341091
1578271047286257499433886 4167283461025723481249203 6914955508120950093732397 9062628024592126283973285
1638243921852176243192354 4235996831123777788211249 6949632451369871524235413 9137845566925526349897794
1763580219131985963102365 4670939445749439042111220 7128211143613619828415650 915376296603189291934419
1826227795601842231029694 4815379351865384279613427 7173920083651862307925394 9270880194077636406984249
1843971826751020372014203 4837052948212922604442190 7215654874211755676220587 9324301480722103490379204
2396951193722134526177237 5106384238550185506715309 7256932847164391040233050 9436090832146695147140581
2781394568268599801096354 5142368192004769218069910 7332226570752354316203178 9475308159734538249013238
2796605196713610405408019 5181234096130144084041856 7426441829541573444964139 942376623917486974923202
2931016394761975263190347 5198267398125617994391348 7632198126531809327186321 9511972558779880288252979
9334580582944051551972967 5317592940316231219758372 7712154432211912882310511 9602413424619187112552264
3075514410490975920315344 5384358126771794128356947 7858918664240262356610010 9631217114906129219461111
3111474985252793452860017 5439211712248901995423417 7898156786763212963178679 9908189853102753335981319
3145621587936120118438701 5610379826092838192760458 8147591017037573337886166 9913237476341764299813987
3148901255628881103198549 5632317555465228677676044 8149436716871371161932035 315769310532511128321993
5692168374637019617423712 8176063831682536571306791
Two different subsets of the ninety 25-digit numbers shown above

have the same sum!
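The slide does not say why such subsets must exist. One standard justification, offered here as an assumption about the intended argument, is the pigeonhole principle: there are more subsets of the ninety numbers than there are possible values for a subset sum. The Python sketch below only compares these two counts; it does not search for the subsets.

```python
count = 90    # how many numbers are listed
digits = 25   # each number has at most 25 digits

# Every subset is determined by a yes/no choice for each of the 90 numbers.
subsets = 2 ** count

# Each number is at most 10**25 - 1, so every subset sum lies between
# 0 and 90 * (10**25 - 1): that many values plus one are possible.
possible_sums = count * (10 ** digits - 1) + 1

print(subsets > possible_sums)  # True
```

Since there are more subsets than possible sums, at least two different subsets must share a sum.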
Counting seems easy
Just count: 1, 2, 3, 4, . . .
The explicit approach works well for counting simple things like
your fingers and for extremely complicated things for which there is
no identifiable structure.
The number of different ways to select a dozen doughnuts when there are five varieties available.
The number of 16-bit numbers with exactly 4 ones.
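Both quantities are small enough to be counted explicitly by a computer, which gives a reference value for any formula derived later. The Python sketch below does so by brute force; the comparison with the binomial coefficient C(16, 4) is an addition not stated on this slide, and the variable names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

# 16-bit numbers with exactly 4 ones: scan all 2**16 bit patterns.
four_ones = sum(1 for x in range(2 ** 16) if bin(x).count("1") == 4)

# A dozen doughnuts from 5 varieties: unordered selections of size 12
# in which a variety may be chosen repeatedly.
dozens = sum(1 for _ in combinations_with_replacement(range(5), 12))

print(four_ones, dozens, comb(16, 4))  # 1820 1820 1820
```

The two counts coincide, which hints that the problems are related even though they look different.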
Where is counting used?
Counting is useful in computer science for several reasons:

To compute the time and storage necessary in solving a
computational/computer problem.
To find the best algorithm for a specific task.
To find the winning strategy in games.
Counting is extensively used in graph theory.
Counting is the foundation of the probability theory, especially
the discrete one.
Several proof techniques rely on counting, such as ”pigeonhole
principle” or ”combinatorial proof”.
Where counting is used: Algorithms
Given n numbers a1, a2, a3, ..., an, it is necessary to sort them,
i.e., to put them in increasing (or decreasing) order. For example, the input
4, 7, 6, 1, 3, 1, 9, 5 gives the output 1, 1, 3, 4, 5, 6, 7, 9.
What is the minimum number of binary comparisons needed to sort n numbers?
What is the fastest way any algorithm could possibly sort?
24 / 69
Where counting is used: Graph theory
How many different

n-node graphs are there?
How many different
mappings need to be
checked to see if two
arbitrary n-node graphs
are isomorphic (similar)?
How many different
pairings between n boys
and n girls are there?
25 / 69
Where counting is used: Games
How many different configurations exist

for a Rubik’s cube?
How many different chess positions can

exist after n moves?
How many weighings are needed to

find the one counterfeit coin among 12
coins?
26 / 69
Where counting is used: Probability theory
The probability of an event in a uniform sample space is
(number of outcomes in the event) / (number of all outcomes)
What is the probability of a full house in poker?
What is the probability of having two people with the same
birthday in a room with n people?
What is the probability that a smart pickpocket will "guess"
your ATM card PIN?
What is the probability of making a profit of $100 playing roulette
in a Las Vegas casino if you have $500 to play with (and lose)?
27 / 69
Counting one thing by another
There are several rules for counting, most of them intuitive.
How do you count people in a room?
For example, you can count the heads, since for each person there is
exactly one head. Or you can count hands and divide by two. Just
recall the joke about a shepherd in Transylvania and a reporter.
The general principle is that:
Counting General Principle

Count one thing by counting another!
Counting General Principle Rephrased

Find the cardinality of a set X
by finding the cardinality of a related set Y.
28 / 69
Mapping Rule
Theorem (Mapping Rule)
1 If f : X → Y is surjective, then |X| ≥ |Y| .

2 If f : X → Y is injective, then |X| ≤ |Y| .
3 If f : X → Y is bijective, then |X| = |Y| .
If you figure out the size of one set, then you can determine the sizes
of many other sets by bijections.
Example
Let
A = {all ways to choose 12 doughnuts from 5 available varieties:
chocolate, lemon, vanilla, raspberry and hazelnut}.
B = {all 16-bit sequences with exactly 4 ones}.
E.g: 0110001000001000, 1000100100000001, 0001001000100010.
29 / 69
Doughnuts example
Example (Contd.)
Consider a particular selection:
00 (chocolate), no lemon, 000000 (vanilla), 00 (raspberry), 00 (hazelnut)
Now put a 1 between each pair of adjacent varieties:
00 (chocolate) 1 (lemon) 1 000000 (vanilla) 1 00 (raspberry) 1 00 (hazelnut)
We just formed a 16-bit sequence containing exactly 4 ones:
0011000000100100
There is a bijection from set A to set B: map 12 doughnuts
consisting of c chocolate, l lemon, v vanilla, r raspberry and h
hazelnut to the sequence
0...0 (c zeros) 1 0...0 (l zeros) 1 0...0 (v zeros) 1 0...0 (r zeros) 1 0...0 (h zeros),
a sequence containing 16 bits and 4 ones.
By Mapping Rule we have |A| = |B| .
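The bijection can be checked by brute force. The sketch below is only an illustration: the helper name selection_to_bits and the tuple order (c, l, v, r, h) are assumptions made for this example. It enumerates every selection of 12 doughnuts, maps each one to its bit sequence, and compares the counts with the number of 16-bit sequences with exactly 4 ones.

```python
from itertools import product
from math import comb

def selection_to_bits(counts):
    """Map (c, l, v, r, h) to runs of zeros separated by single ones."""
    return "1".join("0" * k for k in counts)

# Set A: all ways to pick 12 doughnuts from 5 varieties.
selections = [s for s in product(range(13), repeat=5) if sum(s) == 12]

# Images of A under the map; a set keeps only distinct sequences.
images = {selection_to_bits(s) for s in selections}

# The particular selection from the slide: 2 chocolate, 0 lemon,
# 6 vanilla, 2 raspberry, 2 hazelnut.
example = selection_to_bits((2, 0, 6, 2, 2))
print(example)

# If the map is a bijection onto B, all three numbers agree.
print(len(selections), len(images), comb(16, 4))
```

Equal values for len(selections) and len(images) mean that no two selections share a sequence (injectivity). Agreement with comb(16, 4) means that every sequence in B is reached (surjectivity).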
30 / 69
Sequences
The previous example and the Mapping Rule suggest the following: learn to
count just a few things really well, and then use bijections to count
everything else.
[Figure: a bijection between a counting problem A, taken from all counting problems, and a sequence-counting problem S.]
31 / 69
Sequences vs Sets
A set is an unordered collection of distinct elements.

For example {a, b, c} is a set, and {c, a, b} is the same set.
On the other hand {a, b, a} is not a set since a appears twice.
A sequence is an ordered collection of elements (called
components or terms) that are not necessarily distinct.
For example, (a, b, c) and (c, a, b) are two different sequences.
Moreover, (a, b, a) is a valid 3-element sequence.
Definition
A k−sequence is a sequence containing exactly k terms. A
2−sequence is also called a pair. A 3−sequence is called a triple. A
k−bit sequence is a k−sequence whose terms are bits, either 0 or 1.
32 / 69
Sum rule
Example
A good computer science student has 10 books on math, 35 books on
programming and 15 books on algorithms.
How many books does he/she have?
Let M be the set of math books, P be the set of programming books and
A be the set of books on algorithms. In this notation we are asked
to find |M ∪ P ∪ A|.
Theorem (Sum Rule)

If A1 , A2 , . . . , An are disjoint sets, then
|A1 ∪ A2 ∪ . . . ∪ An | = |A1 | + |A2 | + . . . + |An |
Since M, P and A are disjoint,
|M ∪ P ∪ A| = |M| + |P| + |A| = 10 + 35 + 15 = 60
33 / 69
Product rule
Definition
Let P1 , P2 , . . . , Pn be sets, then
P1 × P2 × . . . × Pn = {(p1 , p2 , . . . , pn ) | pi ∈ Pi }
Theorem (Product Rule)

If P1 , P2 , . . . , Pn are sets, then
|P1 × P2 × . . . × Pn | = |P1 | · |P2 | · . . . · |Pn |
Product rule does not require the sets to be disjoint.
34 / 69
Product rule
How many different 7-digit phone numbers can be created?
Keep in mind that a phone number can't start with the digit 0.
How many different license plates can be issued in R. Moldova?
Define sets
C = {all counties}
F = {1, ..., 9}
S = {A, B, ..., Z}
D = {0, 1, ..., 9}
N = {001, 002, ..., 999}
Answer:
Phone numbers: |F × D^6| = 9 · 10^6 = 9,000,000.
Plates: |C × S^2 × N| = 43 · 26^2 · 999 = 29,038,932.
36 / 69
Putting rules together. Passwords
Example
On a computer system a valid password is a sequence of between 6
and 8 symbols. 1st symbol must be a letter (lowercase or uppercase),
remaining are either letters or digits. How many different passwords
are possible?
F = {a, b, . . . , z, A, B, . . . , Z}
S = {a, b, . . . , z, A, B, . . . , Z, 0, 1, . . . , 9}
The set of all possible passwords is the disjoint union
(F × S^5) ∪ (F × S^6) ∪ (F × S^7)
By the Sum Rule and the Product Rule:
|(F × S^5) ∪ (F × S^6) ∪ (F × S^7)| = |F × S^5| + |F × S^6| + |F × S^7|
= 52 · 62^5 + 52 · 62^6 + 52 · 62^7
≈ 1.86 · 10^14 different passwords
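The count can be reproduced with exact integer arithmetic. This is a minimal sketch; the constant and function names are chosen here for illustration only.

```python
LETTERS = 52             # |F|: a-z and A-Z
LETTERS_OR_DIGITS = 62   # |S|: letters plus the digits 0-9

def count_passwords(min_len=6, max_len=8):
    """Sum Rule over lengths, Product Rule within each length."""
    return sum(
        LETTERS * LETTERS_OR_DIGITS ** (length - 1)
        for length in range(min_len, max_len + 1)
    )

total = count_passwords()
print(total)            # exact count
print(f"{total:.3e}")   # the same number in scientific notation
```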
37 / 69
Worst passwords by PC Magazine
Here is the list:
10. Your first name
9. blink182
8. password1
7. myspace1
6. monkey
5. letmein
4. abc123
3. qwerty
2. 123456
1. password
If you are using any of these, please turn off your computer immediately, go
take a nap and then change your password to a smarter one. Or use one
of the existing programs to generate it.
If you are stubborn and/or in love with your password, then go out and
hand your wallet to the first pickpocket!
38 / 69
Worst passwords in UK
Here is the list:
10. thomas (0.99%)
9. arsenal (1.11%)
8. monkey (1.33%)
7. charlie (1.39%)
6. qwerty (1.41%)
5. 123456 (1.63%)
4. letmein (1.76%)
3. liverpool (1.82%)
2. password (3.78%)
1. 123 (3.784%)
Just remember that with a small investment of 500 Euro and a simple
program you can check up to 8.2 · 10^9 passwords per second.
And guess which passwords will be checked first ...
39 / 69
Subsets of an n-element set
How many different subsets of an n-element set X are there?

There is a natural bijection from subsets of X to n−bit sequences.
Let x1 , x2 , . . . xn be the elements of X.
Then a particular subset of X maps to the sequence (b1 , b2 , . . . , bn ),
where bi = 1 if and only if xi is in that subset.
There are as many subsets as different n-bit sequences. Now, how
many such sequences exist?
If B = {0, 1}, then the set of all n-bit sequences is
B × B × ... × B (n times) = B^n
|B^n| = |B|^n = 2^n
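The bijection between subsets and n-bit sequences translates directly into code. In this sketch the function name subsets_via_bits and the 4-element set X are illustrative assumptions.

```python
from itertools import product

def subsets_via_bits(elements):
    """Yield one subset per n-bit sequence: x_i is included iff b_i == 1."""
    for bits in product((0, 1), repeat=len(elements)):
        yield frozenset(x for x, b in zip(elements, bits) if b == 1)

X = ["x1", "x2", "x3", "x4"]

# Collect into a set, so two sequences giving the same subset would show up
# as a smaller count.
subsets = set(subsets_via_bits(X))
print(len(subsets), 2 ** len(X))
```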
40 / 69
The Pigeonhole Principle
Old puzzle: A drawer in a dark room contains red socks, green
socks and blue socks. How many socks must you withdraw to be
sure that you have a matching pair?
Clearly, picking out three socks is not enough. You might end up
with one red, one green and one blue sock.
The solution of this and many other problems relies on the Dirichlet
Principle or Pigeonhole Principle, which is a consequence of the
Mapping Rule.
Pigeonhole or Dirichlet Principle
If |X| > |Y|, then for every function f : X → Y there exist two
different elements of X that are mapped to the same element of Y.
41 / 69
The Pigeonhole Principle. Too many pigeons
43 / 69
The Pigeonhole Principle
Let X be the set of socks and Y be the set of available colors.
[Figure: f maps X = {first sock, second sock, third sock, fourth sock} to Y = {red, blue, green}.]
The Pigeonhole Principle states that if |X| > |Y| = 3, then at least 2
elements of X must be mapped to the same element of Y.
44 / 69
Generalized Pigeonhole Principle
Generalized Pigeonhole Principle
If |X| > k · |Y|, then every function f : X → Y maps at least k + 1
different elements of X to the same element of Y.
[Figure: a function f from X to Y, where |X| = k · |Y| + 1 > k · |Y|.]

45 / 69
Hairs on heads problem
If you pick two people at random, the chance that they have exactly
the same number of hairs on their heads is surely extremely small.
However, in Chisinau, there are actually four people who have
exactly the same number of hairs!
Chisinau has about 700,000 non-bald people, and the number of
hairs on a person's head is at most 200,000.
Let X be the set of non-bald people in Chisinau, let
Y = {1, 2, ..., 200000}, and let f map each person to the number
of hairs on his/her head.
Since |X| > 3 |Y|, the Generalized Pigeonhole Principle implies
that at least 4 people have the same number of hairs.
I don’t know them, but I know for sure that they exist!
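The guaranteed group size is the ceiling of |X| / |Y|. A small sketch with a hypothetical helper name, applied to the numbers on this slide and to the four-socks, three-colors case:

```python
def guaranteed_in_one_hole(num_pigeons, num_holes):
    """Largest m such that some hole must receive at least m pigeons.

    This is ceil(num_pigeons / num_holes), computed with integers only.
    """
    return -(-num_pigeons // num_holes)

people = 700_000        # non-bald people in Chisinau (figure from the slide)
hair_counts = 200_000   # possible numbers of hairs (figure from the slide)

print(guaranteed_in_one_hole(people, hair_counts))
print(guaranteed_in_one_hole(4, 3))   # four socks, three colors
```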
46 / 69
Counting example revisited
7820480135385502964448038 3171004832173501394113017 5763257331083479647409398 8247331000042995311646021
4894459918669156762409921 3208234421597368647019265 5800949123548989122628663 8496243997123475922766310
1082662032430379651370981 3437254656355157864869113 6042900801199280218026001 8518399140676002660747477
1178480894769706178994993 3574883393058653923711651 6116171789137737896701405 8543691283470191452333763
1253127351683239693851327 3644909946040480189969149 6144868973001582369735121 8675309258374137092461352
1301505129234077811069011 3790044132737084094417246 6247314593851169234746152 869432111236399686296665
1311567111143866433882194 3870332127437971355322815 6814428944266874963488274 8772321203608477245851154
1470029452721203876862144 4080505804577801451363100 6870852945543886849147881 8791422161722582546341091
1578271047286257499433886 4167283461025723481249203 6914955508120950093732397 9062628024592126283973285
1638243921852176243192354 4235996831123777788211249 6949632451369871524235413 9137845566925526349897794
1763580219131985963102365 4670939445749439042111220 7128211143613619828415650 915376296603189291934419
1826227795601842231029694 4815379351865384279613427 7173920083651862307925394 9270880194077636406984249
1843971826751020372014203 4837052948212922604442190 7215654874211755676220587 9324301480722103490379204
2396951193722134526177237 5106384238550185506715309 7256932847164391040233050 9436090832146695147140581
2781394568268599801096354 5142368192004769218069910 7332226570752354316203178 9475308159734538249013238
2796605196713610405408019 5181234096130144084041856 7426441829541573444964139 942376623917486974923202
2931016394761975263190347 5198267398125617994391348 7632198126531809327186321 9511972558779880288252979
9334580582944051551972967 5317592940316231219758372 7712154432211912882310511 9602413424619187112552264
3075514410490975920315344 5384358126771794128356947 7858918664240262356610010 9631217114906129219461111
3111474985252793452860017 5439211712248901995423417 7898156786763212963178679 9908189853102753335981319
3145621587936120118438701 5610379826092838192760458 8147591017037573337886166 9913237476341764299813987
3148901255628881103198549 5632317555465228677676044 8149436716871371161932035 315769310532511128321993
5692168374637019617423712 8176063831682536571306791
Two different subsets of the ninety 25-digit numbers shown above

have the same sum!
47 / 69
Counting example revisited
Let X be the collection of all subsets of the 90 numbers in the list.
Every 25-digit number is less than 10^25. Therefore, the sum of
any subset of those 90 numbers is at most 90 · 10^25.
So, let Y = {0, 1, 2, ..., 90 · 10^25}.
Let f : X → Y be the function that maps any subset of numbers (in X)
to its sum (in Y).
|X| = 2^90 ≥ 1.237 · 10^27.
On the other hand, |Y| = 90 · 10^25 + 1 ≤ 0.901 · 10^27.
Both numbers are enormous, but |X| is a little bit bigger than
|Y|. By the Pigeonhole Principle, f maps at least two elements of X to
the same element of Y.
In other words, two different subsets must have the same sum!
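Python integers have arbitrary precision, so the two cardinalities can be compared exactly. The variable names in this sketch are illustrative.

```python
num_subsets = 2 ** 90                   # |X|: subsets of the 90 numbers
num_possible_sums = 90 * 10 ** 25 + 1   # |Y|: the sums 0, 1, ..., 90 * 10^25

print(num_subsets)
print(num_possible_sums)
print(num_subsets > num_possible_sums)  # more pigeons than holes?
```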
48 / 69
Binary Search
Here is a game: I think of an animal.
You can ask me 20 questions that take a yes/no answer, such as:
"Does this animal have fur?"
"Does this animal eat people?"
etc.
To win the game, you must ask a question like:
"Is the animal a fox?"
"Is it a salmon?"
and receive the answer "Yes".
In effect, you have 19 questions to determine the animal I am

thinking of, and then you must ask the final question to confirm your
guess.
Suppose I know a million animals.

Can you always determine which animal I am thinking of?
49 / 69
Binary Search
Any questioning strategy can be represented in a form of a binary

tree.
50 / 69
Binary Search
Each internal node in the tree

represents a question, and each
leaf represents a final guess at my
animal.
A tree of depth m will have at most 2^m leaves.
Prove this using mathematical induction over m.
A depth 19 binary tree can have at most 2^19 = 524,288 leaves, and I
can use any of a million animals.
51 / 69
Binary Search
There are at most 524,288 leaves, and 1,000,000 animals.
But each animal must be associated with one leaf.
By the Pigeonhole Principle, at least two animals must be associated
with some leaf in the tree.
Therefore, you can’t always determine the animal using only 19
questions.
Generally, if n animals are known, then m = ⌈log2 n⌉ questions are
necessary to identify the animal. Why? (Because we need 2^m ≥ n.)
Otherwise, a binary tree of lower depth must have fewer than n
leaves, and some animals will remain unidentified.
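The bound m = ⌈log2 n⌉ can be computed without floating point. In this sketch the function name questions_needed is an assumption made for illustration. It counts only the questions that identify the animal, not the final confirming guess.

```python
def questions_needed(n):
    """Smallest m with 2**m >= n, i.e. ceil(log2 n), in exact integer arithmetic."""
    return (n - 1).bit_length()

print(2 ** 19)                       # leaves available with 19 questions
print(questions_needed(1_000_000))   # questions needed for a million animals
```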
52 / 69
Weighing Coins
Let’s consider the problem of identifying an off-weight counterfeit

coin among a collection of coins using a balance scale.
Consider 12 coins of which 11 have the same weight and a

counterfeit coin with a different weight.
Using only three weighings you must identify the counterfeit coin
and determine whether it is lighter or heavier than the rest of the
coins.
The strategy can be represented by a ternary tree.
53 / 69
Weighing Coins
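[Figure: part of a ternary decision tree for 12 coins; the other branches are omitted.]
Weighing 1: c1 c2 c3 c4 vs. c5 c6 c7 c8 (outcomes <, =, >)
  if "=": Weighing 2: c1 c2 c3 vs. c9 c10 c11 (outcomes <, =, >)
    if "<": Weighing 3: c9 vs. c10 (outcomes <, =, >)
      "<": c10 heavier; "=": c11 heavier; ">": c9 heavier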
54 / 69
Weighing Coins
Each internal node represents a weighing, and the leaves represent

the result.
A run of this algorithm corresponds to a path from the root to a leaf.
In a ternary tree of depth 3 (3 weighings) there are at most 3^3 = 27
leaves.
For the counterfeit coin (any of the 12 coins) there are 2 possible
answers: it is either lighter or heavier.
So there are 24 possibilities for 27 leaves, and counting does not rule
out a strategy. In fact, such a strategy exists.
Finding this strategy is another problem!
55 / 69
Weighing Coins
Can the weighing problem be solved for 14 coins and 3 weighings?
Since any of the 14 coins could be the counterfeit one, and it could be
lighter or heavier, there are 28 possible situations.
So we have 28 pigeons.
The holes are the 27 leaves in any strategy ternary tree of depth 3.
Since there are more pigeons than holes, the Pigeonhole Principle
implies that some leaf is not associated with a unique situation.
Therefore, for any weighing strategy, there is a pair of situations that
this strategy cannot distinguish. In other words, the strategy will give
a wrong answer in at least one of them.
56 / 69
Weighing Coins
In general, suppose we have n coins and w weighings.
For a correct weighing strategy to exist, there must be at least as many
leaves as situations.
That is, 3^w ≥ 2n, or equivalently w ≥ log3(2n).
For example, for 14 coins log3(2 · 14) = log3 28 ≈ 3.033 > 3, so 3 weighings are not enough.
Note that for 13 coins and 3 weighings there are 26 situations and 27
leaves, so the Pigeonhole Principle allows that a strategy may exist.
It does not guarantee that one exists, however.
Actually, the solution does not exist in this case.
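The counting bound 3^w ≥ 2n is easy to tabulate. The function below (a hypothetical name) returns only the lower bound that the Pigeonhole Principle gives. As the 13-coin case shows, meeting the bound does not guarantee that a strategy exists.

```python
def min_weighings_lower_bound(n_coins):
    """Smallest w with 3**w >= 2 * n_coins (a necessary condition only)."""
    w = 0
    while 3 ** w < 2 * n_coins:
        w += 1
    return w

for n in (12, 13, 14):
    print(n, 2 * n, min_weighings_lower_bound(n))
```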
57 / 69
Basic strategy for counting
Recall our basic strategy for counting:

1 Learn to count sequences.
2 Translate everything else into a sequence-counting problem via
bijections.
3 Just don’t be lost in translation!
58 / 69
Generalized Product Rule
Consider a k−sequence:
(1st entry, 2nd entry, 3rd entry, ..., k-th entry)
Theorem (Generalized Product Rule)

Let S be a set of length k−sequences. If there are:
n1 possible first entries

n2 possible second entries for each first entry
n3 possible third entries for each combination of 1st and 2nd entries
etc
Then,
|S| = n1 · n2 · n3 · ... · nk
59 / 69
UTM alphabet
How many words of length 3 can be formed from alphabet

{U, T, M}?
By the Product Rule we can form 3^3 = 27 different words:
UUU, UUT, UUM, UTU, UTT, . . . , UTM, . . . , MMM
Now, how many words of length 3 can be formed from alphabet

{U, T, M} such that all letters are different?
By generalized product rule, we can form 3 · 2 · 1 = 6 different words:
UTM, UMT, TUM, TMU, MUT, MTU
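Both counts can be confirmed by listing the words. This sketch uses itertools.product for words with repetition allowed and itertools.permutations for words with all letters different. The variable names are illustrative.

```python
from itertools import permutations, product

alphabet = "UTM"

# Product Rule: every position can hold any of the 3 letters.
all_words = ["".join(w) for w in product(alphabet, repeat=3)]

# Generalized Product Rule: 3 choices, then 2, then 1.
distinct_words = ["".join(w) for w in permutations(alphabet, 3)]

print(len(all_words), len(distinct_words))
print(distinct_words)
```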
60 / 69
”Defective” Lei
A Moldovan lei bill is "defective" if some digit in its 6-digit serial
number appears more than once.
How common are such defective bills?
61 / 69
”Defective” Lei
In fact, how common are non-defective bills?
Assume that all serial numbers occur equally often.
We need to compute
(number of 6-digit serial numbers with all digits different) / (total number of 6-digit serial numbers)
First, consider the denominator. We have no restrictions on the digits.
There are 10 possibilities for the first digit,
10 possibilities for the second digit, and so on.
Thus, the total number of 6-digit serial numbers is 10^6.
62 / 69
”Defective” Lei
For the numerator, it is not allowed to use the same digit twice.
So, there are still 10 possibilities for the first digit, but only 9 for the
second digit, 8 for the third digit, and so forth.
By the Generalized Product Rule there are
10 · 9 · 8 · 7 · 6 · 5 = 151,200
serial numbers with all digits different. The probability of having a
non-defective lei is
151,200 / 1,000,000 = 0.1512
and the probability of getting a defective lei is
1 − 0.1512 = 0.8488 ≈ 85%
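A brute-force pass over all 10^6 serial numbers gives the same answer as the Generalized Product Rule. This is a minimal sketch, assuming as in the slides that all serial numbers are equally likely. math.perm needs Python 3.8 or later.

```python
from itertools import product
from math import perm

total = 10 ** 6

# Count the serial numbers whose six digits are all different.
non_defective = sum(
    1 for serial in product(range(10), repeat=6) if len(set(serial)) == 6
)

p_non_defective = non_defective / total
p_defective = 1 - p_non_defective

print(non_defective, perm(10, 6))   # brute force vs. 10 * 9 * 8 * 7 * 6 * 5
print(p_non_defective, p_defective)
```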
63 / 69
A chess problem
In how many ways can we place a pawn, a knight and a bishop on

a chessboard such that no two pieces share a row or column?
[Figure: two example boards with a knight (K), a bishop (B) and a pawn (P): a valid configuration and an invalid configuration.]
64 / 69
A chess problem
Map this problem to a question about sequences:
(rp, cp, rk, ck, rb, cb),
where rp, rk and rb are distinct rows and cp, ck and cb are distinct columns.
The valid configuration above is mapped to (7, 6, 2, 5, 5, 2).
This map is a bijection. By the Mapping Rule, the number of configurations is
the same as the number of valid sequences.
65 / 69
A chess problem
Count the number of valid sequences using the Generalized Product

Rule:
rp is one of the 8 rows.
cp is one of the 8 columns.
rk is one of the 7 rows (any row except the row rp ).
ck is one of the 7 columns (any column except cp ).
rb is one of the 6 rows (any one but rp and rk ).
cb is one of the 6 columns (any one but cp and ck ).
The total number of configurations is
8 · 8 · 7 · 7 · 6 · 6 = 112,896
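The result can be cross-checked by enumerating all ordered placements of the three pieces on distinct squares and keeping those where no two pieces share a row or column. The helper is_valid and the 0-based coordinates are assumptions of this sketch.

```python
from itertools import permutations

squares = [(row, col) for row in range(8) for col in range(8)]

def is_valid(pawn, knight, bishop):
    """True if the three pieces occupy three distinct rows and three distinct columns."""
    rows = {pawn[0], knight[0], bishop[0]}
    cols = {pawn[1], knight[1], bishop[1]}
    return len(rows) == 3 and len(cols) == 3

# Each ordered triple of distinct squares assigns one square to each piece.
count = sum(1 for placement in permutations(squares, 3) if is_valid(*placement))
print(count, 8 * 8 * 7 * 7 * 6 * 6)
```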
66 / 69
Permutations
Definition
A permutation of a set S is a sequence that contains every element of
S exactly once.
For example, here are all permutations of the set {a, b, c}:
(a, b, c) (a, c, b) (b, a, c)

(b, c, a) (c, a, b) (c, b, a)
How many permutations of an n−element set are there?
For example, as we can see there are 6 permutations of a 3−element

set.
67 / 69
Permutations
How many permutations of an n−element set are there?
For the 1st element in the sequence there are n choices,
for the 2nd there are n − 1 choices,
for the 3rd – n − 2 choices, and so forth.
Thus, in total there are
n · (n − 1) · (n − 2) · ... · 2 · 1 = n!
permutations of a set with n elements.
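For small n the formula can be compared with an explicit enumeration. In this sketch count_permutations is a name chosen for illustration.

```python
from itertools import permutations
from math import factorial

def count_permutations(n):
    """Count the permutations of an n-element set by listing them."""
    return sum(1 for _ in permutations(range(n)))

for n in range(1, 7):
    print(n, count_permutations(n), factorial(n))
```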
68 / 69
The End of Lecture
69 / 69
