
Models and Algorithms for
Internet Computing

Basic probability: axioms, conditional probability, random variables, distributions

Application: Verifying Polynomial Identities
• Computers can make mistakes:
• Incorrect programming
• Hardware failures

→ Sometimes, we can use randomness to check the output


• Example: we want to check a program that multiplies together monomials
E.g.: (x+1)(x−2)(x+3)(x−4)(x+5)(x−6) ?= x^6 − 7x^3 + 25
• In general: check whether F(x) ≡ G(x)

• One way is:
• Write another program to re-compute the coefficients
• That's not good: the checker may follow the same logic and reproduce the same bug as the first program
How to use randomness
• Assume the max degree of F and G is d. Use this algorithm:
RANDOM_TEST
1. Pick a uniformly random number r from {1, 2, 3, …, 100d}
2. If F(r) = G(r), output "equivalent"; otherwise output "non-equivalent"
• Note: this is much faster than re-computing the coefficients: O(d) vs. O(d²)
• One-sided error:
• The answer "non-equivalent" is always correct
• The answer "equivalent" can be wrong
• How it can be wrong:
• If we accidentally pick a root of F(x) − G(x) = 0
• This can occur with probability at most 1/100
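• A minimal Python sketch of one round of RANDOM_TEST (the helper names and the evaluation strategy are illustrative, not from the slides):

import random

def eval_factored(factors, x):
    # Evaluate the factored form, e.g. (x+1)(x-2)...: O(d) multiplications
    result = 1
    for a in factors:
        result *= (x + a)
    return result

def eval_coeffs(coeffs, x):
    # Evaluate the coefficient form via Horner's rule: O(d)
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

def random_test(f, g, d):
    # One round: pick r uniformly from {1, ..., 100d} and compare
    r = random.randint(1, 100 * d)
    return "equivalent" if f(r) == g(r) else "non-equivalent"

# (x+1)(x-2)(x+3)(x-4)(x+5)(x-6)  vs.  x^6 - 7x^3 + 25
d = 6
F = lambda x: eval_factored([1, -2, 3, -4, 5, -6], x)
G = lambda x: eval_coeffs([1, 0, 0, -7, 0, 0, 25], x)
print(random_test(F, G, d))  # wrong ("equivalent") with probability at most 1/100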

Axioms of probability
• We need a formal mathematical setting for analyzing randomized algorithms like RANDOM_TEST
• Any probabilistic statement must refer to the underlying probability space
• Definition 1: A probability space has three components:
• A sample space Ω, which is the set of all possible outcomes of the random process modeled by the probability space
• A family of sets F representing the allowable events, where each set in F is a subset of the sample space
• A probability function Pr: F → ℝ satisfying Definition 2 below
An element of Ω is called a simple or elementary event
• Example: In RANDOM_TEST, the sample space is the set of integers {1, …, 100d}
• Each choice of an integer r in this range is a simple event

Axioms
• Def 2: A probability function is any function Pr: F → ℝ that satisfies the following conditions:
1. For any event E, 0 ≤ Pr(E) ≤ 1;
2. Pr(Ω) = 1; and
3. For any sequence of pairwise mutually disjoint events E1, E2, E3, …,
Pr(⋃i≥1 Ei) = Σi≥1 Pr(Ei)
• Events are sets → we use set notation to express combinations of events

• In RANDOM_TEST:
• Each choice of an integer r is a simple event
• All the simple events have equal probability
• The sample space has 100d simple events, and the sum of the probabilities of all simple events must be 1 → each simple event has probability 1/(100d)
Lemmas
• Lemma 1: For any two events E1, E2:
Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)
• Lemma 2 (union bound): For any finite or countably infinite sequence of events E1, E2, E3, …,
Pr(⋃i≥1 Ei) ≤ Σi≥1 Pr(Ei)
• Lemma 3 (inclusion-exclusion principle): Let E1, E2, …, En be any n events. Then
Pr(⋃i=1..n Ei) = Σi=1..n Pr(Ei) − Σi<j Pr(Ei ∩ Ej) + Σi<j<k Pr(Ei ∩ Ej ∩ Ek) − …
+ (−1)^(l+1) Σi1<…<il Pr(⋂r=1..l Eir) + …
Analysis of RANDOM_TEST
• The algorithm gives an incorrect answer if the random number it chooses is a root of the polynomial F − G
• Let E represent the event that RANDOM_TEST fails to give the correct answer
• The elements of the set corresponding to E are the roots of the polynomial F − G that lie in the set of integers {1, …, 100d}
• Since F − G has degree at most d, it has no more than d roots → E contains at most d simple events
• Thus, Pr(RANDOM_TEST fails) = Pr(E) ≤ d/(100d) = 1/100
How to improve the algorithm for smaller failure probability?
• We can enlarge the sample space
• E.g., {1, …, 1000d}
• But that is not so good: the failure probability drops only linearly, to 1/1000
• Better: repeat the algorithm multiple times, using different random values to test
• If F(r) ≠ G(r) in even one of these rounds, output "non-equivalent"
• We can sample from {1, …, 100d} many times, with or without replacement
• With replacement: like having no memory of previous selections
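• A sketch of the repeated test with replacement (my own Python, assuming F and G are given as evaluation functions f and g as in the sketch above):

import random

def repeated_test(f, g, d, k):
    # k independent rounds, sampling from {1, ..., 100d} with replacement
    for _ in range(k):
        r = random.randint(1, 100 * d)  # no memory of previous selections
        if f(r) != g(r):
            return "non-equivalent"     # r is a witness, so this answer is certainly correct
    return "equivalent"                 # wrong with probability at most (1/100)^k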

Notion of independence
• Def 3: Two events E and F are independent iff (if and only if)
Pr(E ∩ F) = Pr(E) · Pr(F)
More generally, events E1, E2, …, Ek are mutually independent iff for any subset I ⊆ {1, …, k}: Pr(⋂i∈I Ei) = Πi∈I Pr(Ei)
• Now consider our algorithm sampling with replacement
• The choice in one iteration is independent of the choices in previous iterations
• Let Ei be the event that the ith run of the algorithm picks a root ri such that F(ri) − G(ri) = 0
• The probability that the algorithm returns a wrong answer is
Pr(E1 ∩ E2 ∩ … ∩ Ek) = Πi=1..k Pr(Ei) ≤ Πi=1..k d/(100d) = (1/100)^k
• Sampling without replacement:
• The probability of choosing a given number is conditioned on the events of the previous iterations
Notion of conditional probability

• Def 4: The conditional probability that event E occurs given that event F occurs is
Pr(E|F) = Pr(E ∩ F) / Pr(F)
• Note: this conditional probability is defined only if Pr(F) > 0
• When E and F are independent and Pr(F) > 0, then
Pr(E|F) = Pr(E ∩ F) / Pr(F) = Pr(E) · Pr(F) / Pr(F) = Pr(E)
• Intuitively, if two events are independent, then information about one event should not affect the probability of the other event
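• For instance, roll one fair die and let E = "the roll is 2" and F = "the roll is even": Pr(E|F) = Pr(E ∩ F)/Pr(F) = (1/6)/(1/2) = 1/3, while the unconditional Pr(E) = 1/6, so here E and F are not independent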
Sampling without replacement
• Again assume F ≢ G
• We repeat the algorithm k times: perform k iterations of random sampling from {1, …, 100d}
• What is the probability that all k iterations yield roots of F − G, resulting in a wrong output by our algorithm?
• Need to bound Pr(E1 ∩ E2 ∩ … ∩ Ek)
Pr(E1 ∩ E2 ∩ … ∩ Ek) = Pr(Ek | E1 ∩ … ∩ Ek−1) · Pr(E1 ∩ E2 ∩ … ∩ Ek−1)
= Pr(E1) · Pr(E2|E1) · Pr(E3|E1 ∩ E2) … Pr(Ek | E1 ∩ … ∩ Ek−1)
• Need to bound Pr(Ej | E1 ∩ … ∩ Ej−1) ≤ (d − (j − 1)) / (100d − (j − 1)). Why?
So Pr(E1 ∩ E2 ∩ … ∩ Ek) ≤ Πj=1..k (d − (j − 1)) / (100d − (j − 1)) ≤ (1/100)^k, slightly better than sampling with replacement
• Use d + 1 iterations: always gives the correct answer. Why? Efficient?
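• A sketch of the without-replacement variant in Python (random.sample draws k distinct values); with k = d + 1 the output is always correct, since a nonzero polynomial of degree at most d cannot have d + 1 roots:

import random

def test_without_replacement(f, g, d, k):
    # k rounds sampling from {1, ..., 100d} WITHOUT replacement
    for r in random.sample(range(1, 100 * d + 1), k):
        if f(r) != g(r):
            return "non-equivalent"
    return "equivalent"   # with k = d + 1, this answer is always correct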
Random variables

• Def 5: A random variable X on a sample space Ω is a real-valued function on Ω; that is, X: Ω → ℝ. A discrete random variable is a random variable that takes on only a finite or countably infinite number of values
• So, "X = a" represents the set {s | X(s) = a}
• Pr(X = a) = Σs: X(s)=a Pr(s)
E.g.: Let X be the random variable representing the sum of two dice. What is Pr(X = 4)?
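• (Answer: of the 36 equally likely outcomes, exactly three, namely (1,3), (2,2), and (3,1), give a sum of 4, so Pr(X = 4) = 3/36 = 1/12.)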
Random variables

• Def 6: Two random variables X and Y are independent iff for all values x and y:
Pr((X = x) ∩ (Y = y)) = Pr(X = x) · Pr(Y = y)

Expectation
• Def 7: The expectation of a discrete random variable X, denoted by E[X], is given by E[X] = Σi i · Pr(X = i)
• where the summation is over all values i in the range of X
• E.g.: compute the expectation of the random variable X representing the sum of two dice
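• A brute-force Python check of both dice questions (the enumeration approach is mine, not from the slides):

from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

print(Fraction(sums.count(4), 36))         # Pr(X = 4) = 1/12
print(sum(Fraction(s, 36) for s in sums))  # E[X] = 7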

Linearity of expectation
• Theorem (linearity of expectation):
• E[Σi=1..n Xi] = Σi=1..n E[Xi]
• E[cX] = c · E[X] for any constant c

Bernoulli and Binomial random variables
• Consider an experiment that succeeds with probability p and fails with probability 1 − p
• Let Y be a random variable that takes the value 1 if the experiment succeeds and 0 otherwise. Such a r.v. is called a Bernoulli or an indicator random variable
• E[Y] = p
• Now we want to count X, the number of successes in n trials
• A binomial random variable X with parameters n and p, denoted by B(n, p), is defined by the following probability distribution on j = 0, 1, 2, …, n:
Pr(X = j) = C(n, j) p^j (1 − p)^(n−j)
• E.g., used a lot in sampling (book: Mitzenmacher–Upfal)
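• Note: linearity of expectation gives E[X] immediately: writing X = Y1 + Y2 + … + Yn as a sum of n Bernoulli indicators, E[X] = Σi E[Yi] = np, with no need to evaluate Σj j · C(n, j) p^j (1 − p)^(n−j) directly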

The Hiring Problem Revisited

HIRE-ASSISTANT(n)
1 best ← 0        ▹ candidate 0 is a least-qualified dummy candidate
2 for i ← 1 to n
3     do interview candidate i
4        if candidate i is better than candidate best
5           then best ← i
6                hire candidate i

Cost Analysis
• We are not concerned with the running time of HIRE-ASSISTANT, but instead with the cost incurred by interviewing and hiring.
• Interviewing has low cost, say ci, whereas hiring is expensive, costing ch. Let m be the number of people hired. Then the cost associated with this algorithm is O(n·ci + m·ch). No matter how many people we hire, we always interview n candidates and thus always incur the cost n·ci associated with interviewing.

Worst-case analysis

• In the worst case, we actually hire every candidate that we interview. This situation occurs if the candidates come in increasing order of quality, in which case we hire n times, for a total hiring cost of O(n·ch).

Probabilistic analysis

• Probabilistic analysis is the use of probability in the analysis of problems. In order to perform a probabilistic analysis, we must use knowledge of the distribution of the inputs.
• For the hiring problem, we can assume that the applicants come in a random order.

Randomized algorithm

• We call an algorithm randomized if its behavior is determined not only by its input but also by values produced by a random-number generator.

Indicator random variables
• The indicator random variable I{A} associated with event A is defined as
I{A} = 1 if A occurs, 0 if A does not occur
• Lemma: Given a sample space Ω and an event A in the sample space, let XA = I{A}. Then E[XA] = Pr(A).
• Proof idea: E[XA] = 1 · Pr(A) + 0 · (1 − Pr(A)) = Pr(A).

Analysis of the hiring problem using indicator random variables

• Let X be the random variable whose value equals the number of times we hire a new office assistant, and let Xi be the indicator random variable associated with the event that the ith candidate is hired. Thus,
X = X1 + X2 + … + Xn
By the lemma above, E[Xi] = Pr{candidate i is hired} = 1/i: candidate i is hired exactly when it is the best of the first i candidates, and under a random order each of the first i candidates is equally likely to be the best. Thus, by linearity of expectation,
E[X] = 1 + 1/2 + 1/3 + … + 1/n = ln n + O(1)
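• A small simulation (my own sketch) that estimates E[X] and compares it with ln n:

import math
import random

def count_hires(n):
    # Qualities 0..n-1 in a uniformly random order; count left-to-right maxima
    order = list(range(n))
    random.shuffle(order)
    best, hires = -1, 0
    for q in order:
        if q > best:
            best, hires = q, hires + 1
    return hires

n, trials = 1000, 2000
avg = sum(count_hires(n) for _ in range(trials)) / trials
print(avg, math.log(n))   # avg should be ln n + O(1), roughly ln n + 0.58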
Randomized algorithms
RANDOMIZED-HIRE-ASSISTANT(n)
1 randomly permute the list of candidates
2 best←0
3 for i←1 to n
4 do interview candidate i
5 if candidate i is better than candidate best
6 then best←i
7 hire candidate i

Food for thought
PERMUTE-BY-SORTING(A)
1 n←length[A]
2 for i←1 to n
3     do P[i] ← RANDOM(1, n³)
4 sort A, using P as sort keys
5 return A
Lemma: Procedure PERMUTE-BY-SORTING produces a uniform random permutation of the input, assuming that all priorities are distinct.
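• (Why RANDOM(1, n³)? For any pair of positions, the priorities collide with probability 1/n³; a union bound over the fewer than n²/2 pairs shows all n priorities are distinct with probability at least 1 − 1/(2n).)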
Food for thought
RANDOMIZE-IN-PLACE(A)
1 n←length[A]
2 for i←1 to n
3     do swap A[i] ↔ A[RANDOM(i, n)]

Lemma: Procedure RANDOMIZE-IN-PLACE computes a uniform random permutation.
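• A runnable Python version of RANDOMIZE-IN-PLACE, i.e. the Fisher-Yates shuffle (a sketch; Python's library function random.shuffle implements the same idea):

import random

def randomize_in_place(a):
    # Fisher-Yates: position i receives a uniform choice from the remaining suffix,
    # so each of the n! permutations occurs with probability exactly 1/n!
    n = len(a)
    for i in range(n):
        j = random.randint(i, n - 1)   # RANDOM(i, n) with 0-based indexing
        a[i], a[j] = a[j], a[i]
    return a

print(randomize_in_place([1, 2, 3, 4, 5]))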

Thank you for your attention!
