
CS 6841 - Assignment 1

Shijin, EE12B128

1. (10 points) (Designing a Tournament) This question will help you understand the
applications of the Chernoff bound and the Union bound, as you'll deal with tuning parameters to optimize the efficiency of the system. You'll also get an understanding of
why NBA tournaments are designed the way they are, with more repeated matches being
played as we get closer to the finals. For example, in the NBA, the early rounds are
best-of-five, and the later rounds become best-of-seven. You will understand why, and
also learn how to design tournaments with n participants, for large n.
Suppose there are n teams, and they are totally ranked. That is, there is a well-defined
best team, second-ranked team and so on. It's just that we (the algorithm designer) don't
know the ranking. Moreover, assume that for any given match between two teams, the
better-ranked team will win the match with probability $p = \frac{1}{2} + \epsilon$, independently of all
other matches between these teams and all other teams. Here $\epsilon$ is a small positive
constant.
(a) (2 points) Let n be a power of two, and fix an arbitrary tournament tree starting
with n/2 matches, then n/4 matches and so on. What is the probability that the
best team wins the tournament?
Solution: The best team plays $\log_2(n)$ matches and must win every one of them, each independently with probability $\frac{1}{2} + \epsilon$. So
$$\Pr[\text{best team wins the tournament}] = \left(\frac{1}{2} + \epsilon\right)^{\log_2(n)}.$$
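As a quick check of this closed form, the bracket can be simulated directly; the function names and the convention that a lower index means a better team are illustrative choices, not part of the problem statement:

```python
import math
import random

def best_wins_prob(n: int, eps: float) -> float:
    """Closed form: the best team must win all log2(n) of its matches."""
    return (0.5 + eps) ** int(math.log2(n))

def simulate(n: int, eps: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability that team 0 (the best;
    lower index = better ranked) wins a single-elimination bracket."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        teams = list(range(n))
        while len(teams) > 1:
            nxt = []
            for a, b in zip(teams[::2], teams[1::2]):
                better, worse = (a, b) if a < b else (b, a)
                # better-ranked team wins with probability 1/2 + eps
                nxt.append(better if rng.random() < 0.5 + eps else worse)
            teams = nxt
        wins += teams[0] == 0
    return wins / trials
```

For n = 8 and ε = 0.1 the closed form gives (0.6)³ = 0.216, and the seeded simulation concentrates near that value.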

(b) (3 points) Use Chernoff bounds to bound the probability that the best team will
not win the tournament, if each match-up occurs as a best-of-k series. How many
games do you end up conducting in total to get a $1 - \delta$ probability of the best team
winning?
Solution: Let $I_i$ be the indicator variable for the event that the best team
wins the $i$th game of a best-of-$k$ series. In expectation, the best team wins $k\left(\frac{1}{2} + \epsilon\right)$ of the $k$ games. We bound the probability of the best team
losing the series using the Chernoff bound.
" k

#
X
1
2
Pr [best player losing in best-of-k series] = Pr
Ii < k( + ) 1
2
1 + 2
i=1
" k
#
X
k
= Pr
Ii <
2
i=1
"
2 1
#
2
k
+

2
exp 1+2
2


2k
= exp
1 + 2
Now, by the Union Bound, the probability that the best team loses one of the $\log_2(n)$ series it plays (and hence the tournament) is at most
$$\log_2(n)\,\exp\left(-\frac{\epsilon^2 k}{1+2\epsilon}\right).$$
To get a $\delta$ upper bound on the losing probability of the best team, we require
$\log_2(n)\,\exp\left(-\frac{\epsilon^2 k}{1+2\epsilon}\right) \le \delta$, which gives
$$k \ge \frac{1+2\epsilon}{\epsilon^2}\left[\log\log n - \log\delta\right].$$
So the total number of games conducted is $nk = \Theta(n \log\log n)$ for constant $\epsilon$ and $\delta$.
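The required series length and game count can be computed directly from this bound; the function names and the rounding of k up to an odd integer (to avoid tied series) are illustrative choices, not given in the solution:

```python
import math

def required_k(n: int, eps: float, delta: float) -> int:
    """Smallest odd k with k >= (1+2*eps)/eps^2 * (log(log2(n)) - log(delta)),
    so that log2(n) * exp(-eps^2 * k / (1 + 2*eps)) <= delta."""
    k = (1 + 2 * eps) / eps**2 * (math.log(math.log2(n)) - math.log(delta))
    k = math.ceil(k)
    return k if k % 2 == 1 else k + 1  # odd k avoids ties in a best-of-k series

def total_games(n: int, eps: float, delta: float) -> int:
    """At most k games in each of the n - 1 series of the bracket."""
    return (n - 1) * required_k(n, eps, delta)
```

For instance, with n = 1024, ε = 0.1 and δ = 0.05 this gives k = 637, and one can verify that the union-bound expression is indeed below δ.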


(c) (5 points) Now design a tournament with a total of $O_\epsilon(n)$ games to get $1 - \delta$
probability of the best team winning eventually. $O_\epsilon(n)$ means $O(n)$ for all constant
$\epsilon > 0$.
Solution: Consider the following tournament: first there are $n/2$ series, each a best-of-$k$ match-up; then the $n/2$ winners pair up to form $n/4$ series, each a best-of-$(k+1)$ match-up; and so on for $\log_2(n)$ rounds, incrementing the series length by one each round.
The probability of the best team losing is bounded, using part (b) and the Union Bound over the rounds, as:
\begin{align*}
\Pr[\text{best team loses}]
&\le \exp\left(-\frac{\epsilon^2 k}{1+2\epsilon}\right) + \exp\left(-\frac{\epsilon^2 (k+1)}{1+2\epsilon}\right) + \dots \\
&= \exp\left(-\frac{\epsilon^2 k}{1+2\epsilon}\right) \cdot \frac{1 - \exp\left(-\frac{\epsilon^2}{1+2\epsilon}\right)^{\log_2(n)}}{1 - \exp\left(-\frac{\epsilon^2}{1+2\epsilon}\right)} \\
&< \exp\left(-\frac{\epsilon^2 k}{1+2\epsilon}\right) \cdot \frac{1}{1 - \exp\left(-\frac{\epsilon^2}{1+2\epsilon}\right)}
\end{align*}
since consecutive terms of the geometric series have ratio $\exp\left(-\frac{\epsilon^2}{1+2\epsilon}\right)$.

To have this probability at most $\delta$, we require
$$k \ge \frac{1+2\epsilon}{\epsilon^2}\,\log\left(\frac{1}{\delta\left(1 - \exp\left(-\frac{\epsilon^2}{1+2\epsilon}\right)\right)}\right).$$
The total number of games played is at most $\frac{n}{2}k + \frac{n}{4}(k+1) + \frac{n}{8}(k+2) + \dots \le n(k+1)$. Since any k above
this lower bound (which is independent of n) will give a $1 - \delta$ probability of the best
team winning, the total number of games is $O_\epsilon(n)$.
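The round-by-round count can be tallied explicitly to confirm the $n(k+1)$ bound; the function names are illustrative, and the formula for k is a direct translation of the bound above:

```python
import math

def required_k_geo(eps: float, delta: float) -> int:
    """k from part (c); note it does not depend on n."""
    r = math.exp(-eps**2 / (1 + 2 * eps))   # geometric ratio between rounds
    k = (1 + 2 * eps) / eps**2 * math.log(1 / (delta * (1 - r)))
    return math.ceil(k)

def total_games_geo(n: int, eps: float, delta: float) -> int:
    """Sum of (n / 2^(j+1)) * (k + j) games over the log2(n) rounds."""
    k = required_k_geo(eps, delta)
    return sum((n >> (j + 1)) * (k + j) for j in range(int(math.log2(n))))
```

Since k is independent of n, the total grows linearly in n, as claimed.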
2. (25 points) (Counting Distinct Elements) Universal hash families come in handy
when designing streaming algorithms that make a single pass and use very little space in
comparison to the size of the input stream. Here, we will construct a streaming algorithm
to estimate the number of distinct elements in the stream. The technique here is from a
seminal work of Alon, Matias and Szegedy (STOC 1996); the authors were awarded the
Gödel Prize in 2005 for this work.
(a) (5 points) Recall that for a prime p, and an integer k, where $1 \le k \le p$, the
collection:
$$\mathcal{H}_k = \{\, h_{ab} : x \mapsto ((ax + b) \bmod p) \bmod k \mid a, b \in \mathbb{F}_p,\ a \neq 0 \,\}$$
is a 2-universal hash family. Prove the following lemma:
Lemma 1. For every set $S \subseteq \mathbb{F}_p$,
1. if $|S| < k$, then $\Pr_{h \in \mathcal{H}_{4k}}[\exists x \in S : h(x) = 0] < 1/4$;
2. while, if $|S| \ge 2k$, then $\Pr_{h \in \mathcal{H}_{4k}}[\exists x \in S : h(x) = 0] \ge 3/8$.
Solution:
1. $|S| < k$: by the Union Bound,
\begin{align*}
\Pr_{h \in \mathcal{H}_{4k}}[\exists x \in S : h(x) = 0]
&\le \sum_{x \in S} \Pr_{h \in \mathcal{H}_{4k}}[h(x) = 0] \\
&= \sum_{x \in S} \frac{1}{4k} = \frac{|S|}{4k} < \frac{k}{4k} = \frac{1}{4}
\end{align*}

2. $|S| \ge 2k$: it suffices to consider a subset $S' \subseteq S$ with $|S'| = 2k$ exactly, since the event only becomes more likely as S grows. By inclusion-exclusion (Bonferroni),
\begin{align*}
\Pr_{h \in \mathcal{H}_{4k}}[\exists x \in S' : h(x) = 0]
&\ge \sum_{x \in S'} \Pr_{h}[h(x) = 0] \;-\!\! \sum_{x, y \in S' : x \neq y} \Pr_{h}[h(x) = 0 \wedge h(y) = 0] \\
&= \frac{|S'|}{4k} - \binom{|S'|}{2}\frac{1}{16k^2} \\
&\ge \frac{2k}{4k} - \frac{2k \cdot 2k}{2 \cdot 16k^2} \\
&= \frac{1}{2} - \frac{1}{8} = \frac{3}{8}
\end{align*}
where the pairwise collision probability $\frac{1}{16k^2}$ follows from 2-universality.
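Both bounds of the lemma can be verified exactly for a small prime by enumerating the whole family; this assumes the construction $h(x) = ((ax + b) \bmod p) \bmod 4k$ recalled above, and the function name is an illustrative choice:

```python
def exists_zero_prob(S, p, m):
    """Exact probability, over all h in {x -> ((a*x + b) % p) % m :
    a in 1..p-1, b in 0..p-1}, that some element of S hashes to 0."""
    hits = 0
    for a in range(1, p):
        for b in range(p):
            if any(((a * x + b) % p) % m == 0 for x in S):
                hits += 1
    return hits / ((p - 1) * p)
```

For p = 31 and k = 2 (so m = 4k = 8), a singleton set (size < k) falls below 1/4, while a set of size 2k reaches at least 3/8.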
(b) (10 points) Consider a stream of integers: $x_1, x_2, \dots$; where each $x_i \in [n]$. For a
parameter k, where $1 \le k \le n$, design a streaming algorithm with the following
guarantees:
1. if the stream contains strictly less than k distinct elements, the output is No
with probability at least $1 - 1/n^2$;
2. while, if the stream contains at least 2k distinct elements, then the output is
Yes with probability at least $1 - 1/n^2$.
The algorithm should proceed in three phases as in the template solution below.
Fill in the details of the algorithm in the given template; you need not provide
implementation details but should be precise in your description.
Solution:
Initialization: (pre-processing done before making the pass)
1. Consider a 2-universal hash family $\mathcal{H} : [n] \to [k]$.
2. Draw $t = 6048 \log n$ samples from the hash family, independently
and uniformly at random. Let the functions be $g_1, g_2, \dots, g_t$, and initialize bits $b_1 = b_2 = \dots = b_t = 0$.
Processing: ($x \in [n]$ is the element to process) For each $1 \le i \le t$, if $g_i(x) = 0$, set $b_i = 1$.
Output: (called after the input is processed) If more than $5t/6$ of the bits $b_i$ are set, output Yes. Else output No.
Prove the guarantees of the algorithm designed above and calculate the space used
by the algorithm (in O() notation).
Solution: If there are at least 2k distinct elements, then the probability that none of them hashes to bucket 0 is
$$\Pr_{g_i \in \mathcal{H}}[\text{no element maps to bucket } 0] = \left(1 - \frac{1}{k}\right)^{\#\text{distinct}} \le \left(1 - \frac{1}{k}\right)^{2k} \le \frac{1}{e^2} \le \frac{1}{7}.$$
On the other hand, if there are less than k distinct elements, then
$$\Pr_{g_i \in \mathcal{H}}[\text{no element maps to bucket } 0] = \left(1 - \frac{1}{k}\right)^{\#\text{distinct}} \ge \left(1 - \frac{1}{k}\right)^{k} \ge \frac{1}{4}.$$
(This computation treats the hash values of distinct elements as fully independent; with only 2-universality, the Lemma of part (a) applied to $\mathcal{H}_{4k}$ yields constant-separation bounds in the same way.)
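The two constant bounds used here hold for every $k \ge 2$, which a one-line numeric sweep confirms:

```python
# check the two constant bounds used above, over a range of k
for k in range(2, 2000):
    assert (1 - 1 / k) ** (2 * k) <= 1 / 7   # case: at least 2k distinct elements
    assert (1 - 1 / k) ** k >= 1 / 4         # case: fewer than k distinct elements
```

The left-hand sides increase toward $e^{-2} \approx 0.135 < 1/7$ and $e^{-1} \approx 0.368 \ge 1/4$ respectively, so the finite sweep is representative.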
Now, for the t independent copies: if there are less than k distinct elements, each $g_i$ has no element mapping to bucket 0 with probability at least $1/4$, so the expected number of such functions is at least $t/4$; the output is No precisely when at least $t/6$ of the functions have no element mapping to bucket 0. The probability of failure can be bounded using a Chernoff bound:
\begin{align*}
\Pr_{g \in \mathcal{H}^t}\left[\#\{\text{functions with no element hashed to } 0\} < \left(1 - \frac{1}{3}\right)\frac{t}{4} = \frac{t}{6}\right]
&\le \exp\left(-\left(\frac{1}{3}\right)^2 \cdot \frac{t}{4} \cdot \frac{1}{2}\right) \\
&= \exp\left(-\frac{t}{72}\right) = \exp(-84 \log n) \le \frac{1}{n^2}
\end{align*}
Hence the output is No with probability at least $1 - \frac{1}{n^2}$.


On the other hand, if there are at least 2k distinct elements, each $g_i$ has some element mapping to bucket 0 with probability at least $6/7$, so the expected number of set bits is at least $6t/7$, and we want more than $5t/6$ of the functions to have an element map to bucket 0. Again by a Chernoff bound:
\begin{align*}
\Pr_{g \in \mathcal{H}^t}\left[\#\{\text{functions with some element hashed to } 0\} \le \left(1 - \frac{1}{36}\right)\frac{6t}{7} = \frac{5t}{6}\right]
&\le \exp\left(-\left(\frac{1}{36}\right)^2 \cdot \frac{6t}{7} \cdot \frac{1}{2}\right) \\
&= \exp\left(-\frac{t}{3024}\right) = \exp(-2 \log n) = \frac{1}{n^2}
\end{align*}
And thus the output is Yes with probability at least $1 - \frac{1}{n^2}$.

The space used by this algorithm is $O(\log n)$ machine words: $t = O(\log n)$ hash functions of $O(\log n)$ bits each, plus the t bits $b_i$, i.e. $O(\log^2 n)$ bits in total.
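The three phases above can be sketched in one short Python routine. The fixed prime p = 1009 (which assumes $n \le 1000$), the function name, and the per-function bit array are illustrative choices; the constants $t = 6048 \log n$ and the $5t/6$ acceptance threshold follow the analysis:

```python
import math
import random

def distinct_threshold_tester(stream, n, k, seed=0):
    """One-pass tester: No if < k distinct elements, Yes if >= 2k (w.h.p.).
    Uses hash functions x -> ((a*x + b) % p) % k for a prime p >= n."""
    p = 1009                                  # assumes n <= 1000 in this sketch
    rng = random.Random(seed)
    t = math.ceil(6048 * math.log(n))
    funcs = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(t)]
    bits = [0] * t
    for x in stream:                          # the single pass
        for i, (a, b) in enumerate(funcs):
            if ((a * x + b) % p) % k == 0:
                bits[i] = 1                   # some element hit bucket 0 under g_i
    return "Yes" if sum(bits) > 5 * t / 6 else "No"
```

On clear-cut inputs (one distinct element versus far more than 2k distinct elements) the tester answers as the guarantees predict.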


(c) (10 points) Using the above, design a streaming algorithm that outputs an estimate
of the number of distinct elements, up to a factor 2, with probability at least $1 - 1/n$
(n is a parameter passed to the algorithm). As before, the algorithm should be
in three phases (and may use the above procedures). Prove the guarantees and
calculate the total space used by the algorithm.
(d) (5 extra credit points) Show how the above algorithm can be modified to obtain a
factor $(1 + \epsilon)$-approximation with high probability.
(e) (10 extra credit points) Can you reduce the space even further while maintaining
the guarantees as in part (c)? The improvement should be asymptotic, and not just
by a constant factor.
3. (15 points) (Locality Sensitive Hashing) Given a distance metric $d : X \times X \to \mathbb{R}_{\ge 0}$,
an LSH family is a collection:
$$\mathcal{H} = \{\, h : X \to [M] \,\}$$
such that, for all $x, y \in X$:
1. if $d(x, y) \le R$, then $\Pr_{h \in \mathcal{H}}[h(x) = h(y)] \ge p_1$;
2. while, if $d(x, y) > cR$, then $\Pr_{h \in \mathcal{H}}[h(x) = h(y)] < p_2$;
3. moreover, this (collision) probability is a non-increasing function of $d(x, y)$.


Here, R, c, p1 , p2 are parameters that determine the quality of the hash family. As was
outlined in class, Indyk and Motwani designed an algorithm to store n data-points so
that c-approximate near-neighbor queries may be answered in sub-linear time.
Recall that the algorithm consists of two components:
Preprocessing: (given the data-points $x_1, \dots, x_n$, and two integer parameters k, l
chosen appropriately.)
1. Choose l hash functions, $g_1, \dots, g_l \in \mathcal{H}^k$, and initialize a hash table for each of
them. In other words, each $g_i$ is the concatenation of k (randomly chosen) hash
functions from $\mathcal{H}$.
2. Insert each data-point into each of the l hash tables.
Query: (given data-point y.)
1. Iterate over the points mapped to bucket $g_i(y)$, for $1 \le i \le l$, in the corresponding hash table, and
output all points $x_j$ at distance at most cR from y.
2. Abort the above search if more than 10l data-points have already been looked
at among all the hash tables.
(a) (3 points) Calculate the value of k so that, if $x, y \in X$ satisfy $d(x, y) > cR$, then:
$\Pr_{g \in \mathcal{H}^k}[g(x) = g(y)] < 1/n$.
Show hence that the probability of aborting the query procedure is at most 1/5.
Solution: If $x, y \in X$ are such that $d(x, y) > cR$, then since g is the concatenation of k independent hash functions,
$$\Pr_{g \in \mathcal{H}^k}[g(x) = g(y)] = \prod_{j=1}^{k} \Pr_{h_j \in \mathcal{H}}[h_j(x) = h_j(y)] < (p_2)^k.$$
So if $(p_2)^k \le \frac{1}{n}$, i.e., $k \ge \log_{1/p_2} n$, then $\Pr_{g \in \mathcal{H}^k}[g(x) = g(y)] < 1/n$.
For the abort probability: the expected number of far points (at distance more than cR from y) encountered across all l tables is at most $l \cdot n \cdot \frac{1}{n} = l$, so by Markov's inequality the probability that more than 10l points must be examined is at most $\frac{l}{10l} = \frac{1}{10} \le \frac{1}{5}$.
(b) (6 points) For the value of k calculated above, calculate the collision probability,
under a random g, of a data-point x that is indeed within distance R from y.
Calculate l such that the probability that none of the hash functions gi cause a
collision between x and y is at most 1/5.
Solution: A data-point x at distance at most R from y collides with it if, out
of the l hash functions, there exists i such that $g_i(x) = g_i(y)$. The probability that $g_i(x) = g_i(y)$
is at least $(p_1)^k$, arguing as in part (a). So the probability that $g_i(x) \neq g_i(y)$ is at most $1 - (p_1)^k$, the probability that $g_i(x) \neq g_i(y)$ for all $i \in \{1, \dots, l\}$ is at most $(1 - (p_1)^k)^l$, and thus the probability that there is a collision is at least
$1 - (1 - (p_1)^k)^l$.
For this part, we need $\Pr[\text{no collision}] \le (1 - (p_1)^k)^l \le \frac{1}{5}$. This holds for
$l \ge \log(1/5)/\log(1 - (p_1)^k)$; since $(1 - (p_1)^k)^l \le e^{-l(p_1)^k}$, taking $l = \lceil (\ln 5)(p_1)^{-k} \rceil$ suffices.
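The two parameters can be computed together from n, $p_1$, $p_2$; the function name and the ceiling rounding are illustrative choices:

```python
import math

def lsh_params(n: int, p1: float, p2: float):
    """k from part (a) so that p2^k <= 1/n, then l from part (b) so that
    (1 - p1^k)^l <= 1/5, via (1 - p1^k)^l <= exp(-l * p1^k)."""
    k = math.ceil(math.log(n) / math.log(1 / p2))
    l = math.ceil(math.log(5) / (p1 ** k))
    return k, l
```

Both defining inequalities then hold by construction, e.g. for n = 1024, $p_1 = 0.9$, $p_2 = 0.5$.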


(c) (6 points) State the guarantees of the algorithm: output, space-complexity, time-complexity and confidence, for the values of k and l calculated above.
Solution:
