Hash Table 2010

Hash Tables:
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Analysis of hashing with
chaining:
• Given a hash table with m slots and n keys,
define load factor = n/m : average number of
keys per slot.
• Assume each key is equally likely to be

hashed into any slot: simple uniform hashing
(SUH).
• Thm: In a hash table in which collisions are
resolved by chaining, an unsuccessful search
takes expected time Θ(1+ ) under SUH.
Proof:
Under the assumption of SUH, any un-stored
key is equally likely to hash to any of the m
slots.
The expected time to search unsuccessfully for
a key k is the expected time to search to the
end of list T[h(k)], which is exactly .
Thus, the total time required is Θ(1+ ). □
• Thm: In a hash table in which collisions are resolved
by chaining, a successful search takes time Θ(1+ ),
on the average under SUH.
Proof:
Let the element being searched for equally likely to be
any of the n elements stored in the table.
The expected time to search successfully for a key.
.... x ....
Elements before x in the list were inserted after x was

inserted.
We want to find the expected number of elements
added to x’s list after x was added to the list.
Let xi denote the ith element into the table, for i =1 to n,
and let ki=key[xi].
Define Xij = I{ h(ki)=h(kj) }. Under SUH, we have
Pr{ h(ki)=h(kj) } = 1/m = E[Xij ].
1 n n
1 n n
E[  (1   X ij )]   (1   E[X ij ])
n i 1 j  i 1 n i 1 j  i 1
1 n n
1 1 n
  (1   )  1  (n  i)
n i 1 j i 1 m mn i 1
1 2 n(n  1) n 1  
 1 (n  )  1  1 
mn 2 2m 2 2n
 
(2   )  (1   ).
2 2n

The multiplication method:

0<A<1
Knuth suggests that A  ( 5  1) / 2  0.6180339887...

Take k  123456, p  14, m  214  16384 and w  32.
s  A  232  2654435769
k  s  327706022297664  (76300  2 32 )  176112864
The 14 most significant bits of 176112864 is 67!
Universal Hashing
• H={ h: U→{0,…,m-1} }, which is a finite collection of
hash functions.
• H is called “universal” if for each pair of distinct keys
k, ∈ U,the number of hash functions h∈H for which
h(k)=h( ) is atmost |H|/m
• Define ni = the length of list T[i]
• Thm:
suppose h is randomly selected from H, using chaining to
resolve collisions. If k is not in the table, then E[nh(k)] ≤ α.
If k is in the table, then
E[nh(k)] ≤ 1+α
• Proof:
– For each pair k and  of distinct keys,
define Xk =I{h(k)=h( )}.
– By definition, Prh{h(k)=h()} ≤ 1/m, and so
E[Xk ] ≤ 1/m.
– Define Yk to be the number of keys other than k
that hash to the same slot as k, so that
Yk   Xk 
T
 k
1
E[Yk ]   E[ Xk ]  
T T m
 k  k
– If k  T , then nh ( k )  Yk and |{ :  T ,   k }|  n
n
thus E[nh ( k )]  E[Yk ]   
m
– If k∈T, then because k appears in T[h(k)] and the count
Yk does not include k, we have nh(k) = Yk + 1
and |{ :  T ,   k }| n  1
n 1 1
Thus E[nh ( k )]  E[Yk ]  1  1  1    1 
m m
Designing a universal class of hash functions:
p:prime
Z p   0,1,, p  1 p  1,2,, p  1

Z
For any a  Z pand b  Z,p define
ha ,b ( k )  (( ak  b) mod p ) mod m, ha,b:Zp→Zm
 p ,m  ha ,b : a  Z *p and b  Z p 
Theorem:
Hp,m is universal.
Pf: Let k,  be two distinct keys in Zp.

Given ha,b, Let r=(ak+b) mod p , and
s=(a +b) mod p.
Then r-s≡a(k- ) mod p
For any ha,b∈Hp,m, distinct inputs k and map
to distinct r and s modulo p.
Each possible p(p-1) choices for the pair
(a,b) with a≠0 yields a different resulting
pair (r,s) with r≠s, since we can solve for
a and b given r and s:
a=((r-s)((k-  )-1 mod p)) mod p
b=(r-ak) mod p
• There are p(p-1) possible pairs (r,s) with
r≠s, there is a 1-1 correspondence
between pairs (a,b) with a≠0 and
(r,s), r≠s.
• For any given pair of inputs k and  , if we
pick (a,b) uniformly at random from
Z p  Z p, the resulting pair (r,s) is equally
likely to be any pair of distinct values
modulo p.
• Pr[ k and  collide]=Prr,s[r≡s mod m]
• Given r, the number of s such that s≠r and
s≡r (mod m) is at most
⌈p/m⌉-1≤((p+m-1)/m)-1
=(p-1)/m
∵ s, s+m, s+2m,…., ≤p
• Thus,
Prr,s[r≡s mod m] ≤((p-1)/m)/(p-1)
=1/m
Therefore, for any pair of distinct
k, ∈Zp,
Pr[ha,b(k)=ha,b( )] ≤1/m,
so that Hp,m is universal.
• Open addressing:
– There is no list and no element stored
outside the table.
– Advantage: avoid pointers, potentially yield
fewer collisions and faster retrieval.
– h : U   0,1, , m  1   0,1, , m  1
– For every k, the probe sequence
h  k , 0  , h  k ,1 , , h  k , m  1
is a permutation of 0,1, , m  1 .
– Deletion from an open-address hash table
is difficult.
– Thus chaining is more common when keys
must be deleted.
• Linear probing:
– h : U   0,1, , m  1 ~ an ordinary hash
function (auxiliary hash function).
– h  k , i    h  k   i  .mod m
• Quadratic probing:
– h  k , i    h   k   c1i  c2 
i 2
mod m ,where h’ is an
auxiliary hash function, c1 and c2≠0 and are
constants.
• Double hashing:
– h  k , i    h1  k   ih2  k   mod m ,where h1 and h2
are auxiliary hash functions.
–   m 2  probe sequences; Linear and
Quadratic have   m  probe sequences.
• Analysis of open-addressing hashing
n
 : load factor,
m
with n elements and m slots.
• Thm:
Given an open-address hash table with load
factor   n m  1 , the expected number
of probes in an unsuccessful search is at
most 1  1    , assuming uniform hashing.
• Pf:
– Define the r.v. X to be the number of probes
made in an unsuccessful search.
– Define Ai: the event there is an ith probe and it
is to an occupied slot.
– Event  X  i  A1  A2 .   Ai 1
– Pr  X  i  Pr  A1  A2    Ai 1
 Pr  A1  Pr  A2 | A1  Pr  A3 | A1  A2  
 Pr  Ai 1 | A1  A2    Ai  2 
n
Pr  A1 
m
– The prob. that there is a jth probe and it is to an
occupied slot, given that the first j-1 probes were
to occupied slots is (n-j+1)/(m-j+1). Why?
– ∵n<m, (n-j)/(m-j) ≤ n/m for all 0 ≤ j<m.
– Pr[ X  i ]  n  n  1  n  i  2
m m 1 m  i  2
n i 1
 ( )   i 1
m
  
1
E[ X ]   Pr[ X  i ]    i 1
  
i
i 1 i 1 i 0 1
Cor: Inserting an element into an open-
addressing hash table with load factor α
requires at most 1/(1- α) probes on
average, assuming uniform hashing.
Thm: Given an open-address hash table with
load factor α<1, the expected number of
probes in a successful search is at most
1 1
 , assuming uniform hashing and
ln
1
that each key in the table is equally
likely to be searched for.
Pf: Suppose we search for a key k.
If k was the (i+1)st key inserted into the
hash table, the expected number of probes
made in a search for k is at most
1/(1-i/m)=m/(m-i).
• Averaging over all n keys in the hash table
gives us the average number of probes in a
successful search:
1 n1 m m n 1 1 1
    ( H m  H mn )
n i 0 m  i n i 0 m  i 
m m
1 1 1 dx 1 m 1 1
     ln  ln
 k mn1 k  mn x  m  n  1  
Perfect Hashing:
• Perfect hashing :
good for when the keys are static; i.e. , once
stored, the keys never change, e.g. CD-ROM, the
set of reserved word in programming language.
• Thm :
If we store n keys in a hash table of size m=n2
using a hash function h randomly chosen from a
universal class of hash functions, then the
probability of there being any collisions < ½ .
• Proof:
Let h be chosen from an universal family. Then each
n
pair collides with probability 1/m , and there are  2 
pairs of keys.
Let X be a r.v. that counts the number of collisions.
When m=n2,
 n  1 n2  n 1 1
E[ X ]      2
 2 m 2 n 2
By Markov ' s inequality , Pr[ X  t ]  E[ X ]/ t ,
and take t  1.
• Thm: If we store n keys in a hash table of
size m=n using a hash function h randomly
chosen from universal class of hash
functions, then  j 0 j ]  2n , where n j is
m 1 2
E [ n
the number of keys hashing to slot j.
• Pf:
– It is clear for any nonnegative integer a,
a
a  a  2 
2
 2
– m 1 
m 1
 nj 
E[ n j ]  E[  n j  2  ]
2
j 0 j 0   2 
m 1
 j
m 1 n
 E[ n j ]  2 E[  ]
j 0 j 0  2 
nj 
m 1
 j
m 1 n
 E[ n]  2 E[  ]  n  2 E[  ]
j 0  2  j 0  2 
total number of collisions
 n  1 n(n  1) n  1
    , since m  n.
 2 m 2m 2
m 1
n 1
E[ n j ]  n  2 
2
 2n  1  2n.
j 0 2
• Cor: If store n keys in a hash table of size
m=n using a hash function h randomly
chosen from a universal class of hash
functions and we set the size of each
secondary hash table to mj=nj2 for j=0,…,m-
1, then the expected amount of storage
required for all secondary hash tables in a
perfect hashing scheme is < 2n.
• Cor: Same as the above,
Pr{total storage  4n} < 1/2
• Pf:
– By Markov’s inequality, Pr{ X  t }  E[X]/t.
m 1
– Take X   m j and t  4n :
j 0
m 1
m 1
E[ m j ]
2n 1
Pr{ m j  4n} 
j 0
  .
j 0 4n 4n 2

Hash Table 2010

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Hash Table 2010

Uploaded by

Copyright:

Available Formats

Hash Tables:

• Assume each key is equally likely to be

Elements before x in the list were inserted after x was

The multiplication method:

Knuth suggests that A  ( 5  1) / 2  0.6180339887...

For any a  Z pand b  Z,p define

ha ,b ( k )  (( ak  b) mod p ) mod m, ha,b:Zp→Zm

Pf: Let k,  be two distinct keys in Zp.

total number of collisions

You might also like