Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Analysis of hashing with chaining:
• Given a hash table with m slots and n keys,
define the load factor α = n/m, the average number of
keys per slot.
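Expected number of elements examined in a successful search:
\[
\frac{1}{n}\sum_{i=1}^{n}\Bigl(1+\sum_{j=i+1}^{n}\frac{1}{m}\Bigr)
= 1+\frac{1}{mn}\sum_{i=1}^{n}(n-i)
= 1+\frac{1}{mn}\Bigl(n^{2}-\frac{n(n+1)}{2}\Bigr)
= 1+\frac{n-1}{2m}
= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}.
\]
Total time, including computing the hash function:
\[
\Theta\Bigl(2+\frac{\alpha}{2}-\frac{\alpha}{2n}\Bigr)=\Theta(1+\alpha).
\]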
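\[
E[Y_k]=\sum_{\ell\in T,\ \ell\neq k}E[X_{k\ell}]\le\sum_{\ell\in T,\ \ell\neq k}\frac{1}{m}
\]
– If $k\notin T$, then $n_{h(k)}=Y_k$ and $|\{\ell:\ell\in T,\ \ell\neq k\}|=n$,
thus
\[
E[n_{h(k)}]=E[Y_k]\le\frac{n}{m}=\alpha .
\]
– If $k\in T$, then because k appears in T[h(k)] and the count
$Y_k$ does not include k, we have $n_{h(k)}=Y_k+1$
and $|\{\ell:\ell\in T,\ \ell\neq k\}|=n-1$.
Thus
\[
E[n_{h(k)}]=E[Y_k]+1\le\frac{n-1}{m}+1=1+\alpha-\frac{1}{m}<1+\alpha .
\]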
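Designing a universal class of hash functions:
p: a prime large enough that every key k lies in $\mathbb{Z}_p$
$\mathbb{Z}_p=\{0,1,\ldots,p-1\}$, $\mathbb{Z}_p^{*}=\{1,2,\ldots,p-1\}$
$\mathcal{H}_{p,m}=\{h_{a,b}: a\in\mathbb{Z}_p^{*}\text{ and } b\in\mathbb{Z}_p\}$, where $h_{a,b}(k)=((ak+b)\bmod p)\bmod m$
Theorem:
$\mathcal{H}_{p,m}$ is universal.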
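The universality claim can be verified by brute force for small parameters. The sketch below assumes the standard textbook definition h_{a,b}(k) = ((ak + b) mod p) mod m, which the slide text does not spell out, and counts, for every pair of distinct keys in Z_p, how many of the p(p − 1) functions make that pair collide. Universality requires that count to be at most p(p − 1)/m. The helper names are hypothetical.

```python
from itertools import combinations


def h_ab(a: int, b: int, p: int, m: int, k: int) -> int:
    """Assumed form of h_{a,b}: ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def colliding_functions(p: int, m: int, k: int, l: int) -> int:
    """Number of pairs (a, b), a in Z_p^*, b in Z_p, with h_{a,b}(k) == h_{a,b}(l)."""
    return sum(
        1
        for a in range(1, p)          # a ranges over Z_p^* = {1, ..., p-1}
        for b in range(p)             # b ranges over Z_p   = {0, ..., p-1}
        if h_ab(a, b, p, m, k) == h_ab(a, b, p, m, l)
    )


def worst_pair(p: int, m: int) -> int:
    """Largest collision count over all pairs of distinct keys in Z_p."""
    return max(colliding_functions(p, m, k, l) for k, l in combinations(range(p), 2))


if __name__ == "__main__":
    p, m = 17, 6
    family_size = p * (p - 1)
    print("worst count:", worst_pair(p, m), "allowed:", family_size / m)
```

The exhaustive loop is only practical for small p; it is meant as a sanity check of the theorem, not as a way to pick a hash function.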
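\[
E[X]=\sum_{i=1}^{\infty}\Pr\{X\ge i\}\le\sum_{i=1}^{\infty}\alpha^{i-1}=\sum_{i=0}^{\infty}\alpha^{i}=\frac{1}{1-\alpha}
\]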
Cor: Inserting an element into an open-
addressing hash table with load factor α
requires at most 1/(1- α) probes on
average, assuming uniform hashing.
Thm: Given an open-address hash table with
load factor α < 1, the expected number of
probes in a successful search is at most
\[
\frac{1}{\alpha}\ln\frac{1}{1-\alpha},
\]
assuming uniform hashing and that each key in the table is equally
likely to be searched for.
Pf: Suppose we search for a key k.
If k was the (i+1)st key inserted into the
hash table, the expected number of probes
made in a search for k is at most
1/(1-i/m)=m/(m-i).
• Averaging over all n keys in the hash table
gives us the average number of probes in a
successful search:
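\[
\frac{1}{n}\sum_{i=0}^{n-1}\frac{m}{m-i}
=\frac{m}{n}\sum_{i=0}^{n-1}\frac{1}{m-i}
=\frac{1}{\alpha}\,(H_m-H_{m-n})
\]
\[
=\frac{1}{\alpha}\sum_{k=m-n+1}^{m}\frac{1}{k}
\le\frac{1}{\alpha}\int_{m-n}^{m}\frac{dx}{x}
=\frac{1}{\alpha}\ln\frac{m}{m-n}
=\frac{1}{\alpha}\ln\frac{1}{1-\alpha},
\]
where $H_i$ denotes the i-th harmonic number.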
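As a numerical illustration of the successful-search bound, the sketch below computes the average (1/n) Σ m/(m − i) directly, rewrites it with harmonic numbers, and compares it with (1/α) ln(1/(1 − α)). It assumes n < m so that α < 1; the function names are hypothetical.

```python
import math


def average_probe_bound(n: int, m: int) -> float:
    """(1/n) * sum_{i=0}^{n-1} m / (m - i), the average of the per-key bounds."""
    return sum(m / (m - i) for i in range(n)) / n


def harmonic(i: int) -> float:
    """i-th harmonic number H_i = 1 + 1/2 + ... + 1/i (H_0 = 0)."""
    return sum(1.0 / k for k in range(1, i + 1))


def log_bound(n: int, m: int) -> float:
    """(1/alpha) * ln(1 / (1 - alpha)) with alpha = n/m < 1."""
    alpha = n / m
    return (1.0 / alpha) * math.log(1.0 / (1.0 - alpha))


if __name__ == "__main__":
    m = 100
    for n in (10, 50, 90):
        via_harmonic = (m / n) * (harmonic(m) - harmonic(m - n))
        print(n, average_probe_bound(n, m), via_harmonic, log_bound(n, m))
```

For each n the first two printed values should agree up to rounding, and both should stay below the third.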
Perfect Hashing:
• Perfect hashing is suited to static key sets, i.e., once
stored, the keys never change. Examples: the contents of a CD-ROM, the
set of reserved words in a programming language.
• Thm:
If we store n keys in a hash table of size $m=n^2$
using a hash function h randomly chosen from a
universal class of hash functions, then the
probability of there being any collisions is less than 1/2.
• Proof:
Let h be chosen from a universal family. Then each
pair of keys collides with probability at most 1/m, and there are $\binom{n}{2}$
pairs of keys.
Let X be a random variable that counts the number of collisions.
When $m=n^2$,
\[
E[X]\le\binom{n}{2}\cdot\frac{1}{m}=\frac{n^{2}-n}{2}\cdot\frac{1}{n^{2}}<\frac{1}{2}.
\]
By Markov's inequality, $\Pr[X\ge t]\le E[X]/t$;
taking t = 1 gives $\Pr[X\ge 1]<1/2$.
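Because a random function from a universal family is collision-free with probability greater than 1/2 when m = n², simply redrawing until no collision occurs needs fewer than two attempts on average. The sketch below illustrates this retry loop. It assumes the family h_{a,b}(k) = ((ak + b) mod p) mod m with the prime p = 2^31 − 1, and it assumes the keys are distinct integers in range(p); the names `P` and `find_collision_free` are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; assumption: every key is an int in range(P)


def find_collision_free(keys, rng):
    """Redraw (a, b) until h_{a,b} maps the distinct keys into n^2 slots injectively."""
    n = len(keys)
    m = n * n                          # table size m = n^2
    tries = 0
    while True:
        tries += 1
        a = rng.randrange(1, P)        # a in Z_p^*
        b = rng.randrange(P)           # b in Z_p
        slots = {((a * k + b) % P) % m for k in keys}
        if len(slots) == n:            # n distinct slots means no collisions
            return a, b, m, tries


if __name__ == "__main__":
    keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
    a, b, m, tries = find_collision_free(keys, random.Random(0))
    print("a =", a, "b =", b, "m =", m, "tries =", tries)
```

Duplicate keys would make the loop run forever, which is why distinctness is listed as an assumption.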
• Thm: If we store n keys in a hash table of
size m = n using a hash function h randomly
chosen from a universal class of hash
functions, then
\[
E\Bigl[\sum_{j=0}^{m-1}n_j^{2}\Bigr]<2n,
\]
where $n_j$ is the number of keys hashing to slot j.
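• Pf:
– For any nonnegative integer a,
\[
a^{2}=a+2\binom{a}{2}.
\]
– Hence
\[
E\Bigl[\sum_{j=0}^{m-1}n_j^{2}\Bigr]
=E\Bigl[\sum_{j=0}^{m-1}\Bigl(n_j+2\binom{n_j}{2}\Bigr)\Bigr]
=E\Bigl[\sum_{j=0}^{m-1}n_j\Bigr]+2E\Bigl[\sum_{j=0}^{m-1}\binom{n_j}{2}\Bigr]
=E[n]+2E\Bigl[\sum_{j=0}^{m-1}\binom{n_j}{2}\Bigr]
=n+2E\Bigl[\sum_{j=0}^{m-1}\binom{n_j}{2}\Bigr].
\]
– The sum $\sum_{j}\binom{n_j}{2}$ is the total number of pairs of keys that collide, so by universality
\[
E\Bigl[\sum_{j=0}^{m-1}\binom{n_j}{2}\Bigr]\le\binom{n}{2}\cdot\frac{1}{m}=\frac{n(n-1)}{2m}=\frac{n-1}{2},\quad\text{since } m=n.
\]
– Therefore
\[
E\Bigl[\sum_{j=0}^{m-1}n_j^{2}\Bigr]\le n+2\cdot\frac{n-1}{2}=2n-1<2n.
\]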
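For a small prime, the expectation of Σ n_j² can be computed exactly by averaging over every function in the family, which gives a concrete check of the 2n bound. The sketch below assumes the family h_{a,b}(k) = ((ak + b) mod p) mod m and distinct keys in Z_p; the function name and the sample key set are hypothetical.

```python
from fractions import Fraction


def expected_sum_of_squares(keys, p: int, m: int) -> Fraction:
    """Exact average of sum_j n_j^2 over all h_{a,b}, a in Z_p^*, b in Z_p."""
    total = 0
    count = 0
    for a in range(1, p):
        for b in range(p):
            loads = [0] * m                      # loads[j] is n_j for this (a, b)
            for k in keys:
                loads[((a * k + b) % p) % m] += 1
            total += sum(c * c for c in loads)
            count += 1
    return Fraction(total, count)


if __name__ == "__main__":
    keys = [1, 4, 6, 11, 15]                     # n = 5 distinct keys in Z_17
    n = len(keys)
    e = expected_sum_of_squares(keys, p=17, m=n)  # primary table of size m = n
    print(float(e), "<", 2 * n)
```

The loop enumerates all p(p − 1) functions, so it is only meant for small p.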
• Cor: If we store n keys in a hash table of size
m = n using a hash function h randomly
chosen from a universal class of hash
functions, and we set the size of each
secondary hash table to $m_j=n_j^{2}$ for j = 0, …, m−1,
then the expected amount of storage
required for all secondary hash tables in a
perfect hashing scheme is < 2n.
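• Cor: Under the same assumptions as above,
Pr{total storage ≥ 4n} < 1/2.
• Pf:
– By Markov's inequality, $\Pr\{X\ge t\}\le E[X]/t$.
– Take $X=\sum_{j=0}^{m-1}m_j$ and $t=4n$:
\[
\Pr\Bigl\{\sum_{j=0}^{m-1}m_j\ge 4n\Bigr\}\le\frac{E\bigl[\sum_{j=0}^{m-1}m_j\bigr]}{4n}<\frac{2n}{4n}=\frac{1}{2}.
\]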
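The two corollaries suggest a simple construction: redraw the first-level function until the secondary tables need fewer than 4n slots in total, then redraw each secondary function until its bucket is collision-free. The sketch below is one possible way to put this together, not a definitive implementation. It assumes the family h_{a,b}(k) = ((ak + b) mod p) mod m with p = 2^31 − 1 and a non-empty set of distinct integer keys in range(p); all names are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; assumption: every key is an int in range(P)


def random_hash(m, rng):
    """Draw (a, b, m) describing h(k) = ((a*k + b) mod P) mod m."""
    return rng.randrange(1, P), rng.randrange(P), m


def apply_hash(h, k):
    a, b, m = h
    return ((a * k + b) % P) % m


def build_perfect_table(keys, rng):
    """Two-level table for a static, non-empty set of distinct keys."""
    n = len(keys)
    if n == 0:
        raise ValueError("need at least one key")
    # First level: m = n slots; retry until sum of n_j^2 is below 4n.
    while True:
        outer = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[apply_hash(outer, k)].append(k)
        if sum(len(bk) ** 2 for bk in buckets) < 4 * n:
            break
    # Second level: slot j gets a table of size m_j = n_j^2.
    secondary = []
    for bk in buckets:
        if not bk:
            secondary.append(None)
            continue
        while True:                               # retry until collision-free
            inner = random_hash(len(bk) ** 2, rng)
            table = [None] * (len(bk) ** 2)
            for k in bk:
                s = apply_hash(inner, k)
                if table[s] is not None:
                    break                         # collision: draw a new function
                table[s] = k
            else:
                secondary.append((inner, table))
                break
    return outer, secondary


def contains(structure, k):
    """Membership test using at most two hash evaluations."""
    outer, secondary = structure
    entry = secondary[apply_hash(outer, k)]
    if entry is None:
        return False
    inner, table = entry
    return table[apply_hash(inner, k)] == k


if __name__ == "__main__":
    keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
    t = build_perfect_table(keys, random.Random(1))
    print(all(contains(t, k) for k in keys), contains(t, 11))
```

Each retry loop succeeds with probability greater than 1/2 by the results above, so the expected number of redraws per loop is below two.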