We would like to show that any subset of size k (from a set of size n) has an equal probability of being generated using Reservoir sampling. To do so, we merely need to show that every element i has an equal probability of being in the generated subset of size k out of n.
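For reference, the sampling technique analyzed in this proof can be sketched as below. The text does not list the algorithm, so this is a reconstruction from the proof itself (0-based indices, a random number drawn from 0..j for each element j >= k, replacement when that number is below k). The function name reservoir_sample and the rng parameter are illustrative assumptions.

```python
import random

def reservoir_sample(items, k, rng=random):
    """Return k items chosen from `items` by the replacement scheme analyzed here."""
    reservoir = []
    for j, item in enumerate(items):
        if j < k:
            # The first k elements (indices 0..k-1) fill the reservoir directly.
            reservoir.append(item)
        else:
            # Draw r uniformly from 0..j, i.e. one of j + 1 possible values.
            r = rng.randint(0, j)
            if r < k:
                # Element j replaces slot r; overall this happens with probability k/(j+1).
                reservoir[r] = item
    return reservoir

# Example: sample 3 of the numbers 0..9 with a seeded generator.
sample = reservoir_sample(range(10), 3, random.Random(0))
```

Passing a seeded random.Random instance makes a run repeatable; the default uses the shared module-level generator.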
Let's start by asking ourselves what the probability is of an element i being in a subset of size k chosen uniformly at random. Since each subset of size k has k elements, and we have n elements to choose from, the probability of i appearing in such a subset is k/n.
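The k/n figure can be checked by brute force for small values. The sketch below counts, among all size-k subsets of {0, ..., n-1}, the fraction that contain a given element; the helper name inclusion_probability and the values n = 6, k = 3 are chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_probability(n, k, i):
    """Fraction of the size-k subsets of {0, ..., n-1} that contain element i."""
    subsets = list(combinations(range(n), k))
    containing = sum(1 for s in subsets if i in s)
    return Fraction(containing, len(subsets))

n, k = 6, 3  # illustrative values, not from the text
for i in range(n):
    # Every element appears in exactly k/n of the subsets.
    assert inclusion_probability(n, k, i) == Fraction(k, n)
```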
Let's now compute the probability that the element indexed i is in the subset we create using Reservoir sampling. There are two cases:
1. i < k: In this case i was selected initially, so we need to find the probability that it survives. Let A be the event that the i-th element is not replaced.
P(A) = {the probability that elements k, ..., n-1 don't replace i}.
What is the probability that an element j >= k replaces a given element among the first k selected? Our sampling technique selects a random number between 0 and j, each with probability $\frac{1}{j+1}$.
The probability that j doesn't replace the i-th element is thus $\frac{j}{j+1}$.
The reason is that the probability that it will replace i is $\frac{1}{j+1}$. Hence the probability that it doesn't replace it is $1 - \frac{1}{j+1} = \frac{j}{j+1}$.
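The probability that every index from k to n-1 doesn't replace i is:
$$\frac{k}{k+1} \cdot \frac{k+1}{k+2} \cdots \frac{n-1}{n}.$$
Notice that each numerator cancels the preceding denominator, so this simplifies to $\frac{k}{n}$.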
2. i >= k: In this case we want to find the probability that i is selected to replace some index j < k but does not get replaced itself in future selections. Let's denote this event by B.
P(B) = P(i is selected ∧ i is not replaced) = P(i is selected) · P(i is not replaced), since the random draws are independent.
P(i is selected): we choose i if the random number j drawn from 0..i satisfies j < k. There are i+1 possible values, k of which are below k, so
P(j < k) = $\frac{k}{i+1}$.
For the second part we need to find the probability that i doesn't get replaced by the future elements i+1, ..., n-1.
We'll use what we have already seen in the previous part (1). The probability that it doesn't get replaced is:
$$\frac{i+1}{i+2} \cdot \frac{i+2}{i+3} \cdots \frac{n-1}{n} = \frac{i+1}{n}.$$
So $P(B) = \frac{k}{i+1} \cdot \frac{i+1}{n} = \frac{k}{n}$.
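Both cases can be checked exactly for small n and k. The sketch below is illustrative only: it evaluates the two-case formula with exact fractions and compares it against an exhaustive enumeration of every possible sequence of random draws. All function names are hypothetical, and the enumeration assumes the 0-based scheme used in the proof (for each j from k to n-1, a draw r in 0..j replaces slot r when r < k).

```python
from fractions import Fraction
from itertools import product

def survival_probability(start, n):
    """Product of j/(j+1) for j = start, ..., n-1: no later element replaces a given slot."""
    p = Fraction(1)
    for j in range(start, n):
        p *= Fraction(j, j + 1)
    return p

def inclusion_by_formula(i, n, k):
    """Probability that index i ends up in the sample, following the two cases."""
    if i < k:
        # Case 1: i starts in the reservoir and must survive elements k..n-1.
        return survival_probability(k, n)
    # Case 2: i is selected with probability k/(i+1), then must survive i+1..n-1.
    return Fraction(k, i + 1) * survival_probability(i + 1, n)

def inclusion_by_enumeration(n, k):
    """Exact inclusion probabilities obtained by trying every sequence of draws."""
    counts = [0] * n
    total = 0
    # One draw r_j in 0..j for each j = k..n-1; all sequences are equally likely.
    for draws in product(*(range(j + 1) for j in range(k, n))):
        reservoir = list(range(k))
        for j, r in zip(range(k, n), draws):
            if r < k:
                reservoir[r] = j
        for i in reservoir:
            counts[i] += 1
        total += 1
    return [Fraction(c, total) for c in counts]

n, k = 6, 3  # illustrative values, not from the text
assert all(inclusion_by_formula(i, n, k) == Fraction(k, n) for i in range(n))
assert inclusion_by_enumeration(n, k) == [Fraction(k, n)] * n
```

The enumeration grows as (k+1)(k+2)...n sequences, so it is only practical for small inputs; the formula check works for any n and k.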
