Hash Tables 9/2/2002 3:15 AM

Hash Tables

[Figure: hash table with chained buckets (cell index, stored keys)]
  0  ∅
  1  025-612-0001
  2  ∅
  3  ∅
  4  451-229-0004, 981-101-0004

Outline and Reading

- Hash functions and hash tables (§2.5.2)
- Hash function details
  - Hash code map (§2.5.3)
  - Compression map (§2.5.4)
- Collision handling (§2.5.5)
  - Chaining
  - Linear probing
  - Double hashing

Hash Functions and Hash Tables

- A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]
- Example: h(x) = x mod N is a hash function for integer keys
- The integer h(x) is called the hash value of key x
- The goal of a hash function is to uniformly disperse keys in the range [0, N − 1]
- A hash table for a given key type consists of
  - Hash function h
  - Array (called table) of size N
- When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k)
- A collision occurs when two keys in the dictionary have the same hash value
- Collision handling schemes:
  - Chaining: colliding items are stored in a sequence
  - Open addressing: the colliding item is placed in a different cell of the table

Example

- We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
- Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x
- We use chaining to handle collisions

[Figure: table contents (cell index, stored keys)]
     0  ∅
     1  025-612-0001
     2  ∅
     3  ∅
     4  451-229-0004, 981-101-0004
     …
  9997  ∅
  9998  200-751-9998
  9999  ∅
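The chaining example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`ChainedHashTable`, `insert_item`, `find_element`) are hypothetical, the keys are the figure's values written without dashes, and the names stored with them are made up.

```python
class ChainedHashTable:
    """Dictionary sketch: an array of N buckets, each bucket a Python list."""

    def __init__(self, size=10_000):
        self.size = size                                # N
        self.table = [[] for _ in range(size)]          # one sequence per cell

    def h(self, key):
        # With N = 10,000 this is "last four digits of x".
        return key % self.size

    def insert_item(self, key, value):
        # Colliding items simply share the bucket at index h(key).
        self.table[self.h(key)].append((key, value))

    def find_element(self, key):
        for k, v in self.table[self.h(key)]:
            if k == key:
                return v
        return None                                     # stands in for NO_SUCH_KEY


directory = ChainedHashTable()
directory.insert_item(4512290004, "Ann")    # hypothetical name
directory.insert_item(9811010004, "Bob")    # hypothetical name, collides in cell 4
directory.insert_item(2007519998, "Cho")    # hypothetical name
```

Both keys ending in 0004 land in cell 4, and a lookup scans only that one bucket.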


Hash Functions

- A hash function is usually specified as the composition of two functions:
  - Hash code map: h1: keys → integers
  - Compression map: h2: integers → [0, N − 1]
- The hash code map is applied first, and the compression map is applied next on the result, i.e., h(x) = h2(h1(x))
- The goal of the hash function is to “disperse” the keys in an apparently random way

Hash Code Maps

- Memory address:
  - We reinterpret the memory address of the key object as an integer (default hash code of all Java objects)
  - Good in general, except for numeric and string keys
- Integer cast:
  - We reinterpret the bits of the key as an integer
  - Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)
- Component sum:
  - We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
  - Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)
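As a rough illustration of the integer cast and component sum hash codes, the Python sketch below works on raw bit patterns. It assumes a 32-bit integer type, emulated by masking because Python integers do not overflow, and all function names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF      # emulates a 32-bit integer type


def integer_cast_float(x):
    """Integer cast: reinterpret the 32 bits of a single-precision float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def component_sum_64(bits):
    """Component sum: add the two 32-bit halves of a 64-bit value,
    discarding any overflow beyond 32 bits."""
    high = (bits >> 32) & MASK32
    low = bits & MASK32
    return (high + low) & MASK32


def component_sum_double(x):
    """Apply the component sum to the 64-bit pattern of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return component_sum_64(bits)


def h(x, n):
    """Composition h(x) = h2(h1(x)) with h1 = component sum, h2 = mod n."""
    return component_sum_double(x) % n
```

For instance, `integer_cast_float(1.0)` yields `0x3F800000`, the IEEE 754 single-precision bit pattern of 1.0.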

Hash Code Maps (cont.)

- Polynomial accumulation:
  - We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits)
      a_0 a_1 … a_{n−1}
  - We evaluate the polynomial
      p(z) = a_0 + a_1 z + a_2 z^2 + … + a_{n−1} z^{n−1}
    at a fixed value z, ignoring overflows
  - Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
- Polynomial p(z) can be evaluated in O(n) time using Horner’s rule:
  - The following polynomials are successively computed, each from the previous one in O(1) time
      p_0(z) = a_{n−1}
      p_i(z) = a_{n−i−1} + z p_{i−1}(z)   (i = 1, 2, …, n − 1)
- We have p(z) = p_{n−1}(z)

Compression Maps

- Division:
  - h2(y) = y mod N
  - The size N of the hash table is usually chosen to be a prime
  - The reason has to do with number theory and is beyond the scope of this course
- Multiply, Add and Divide (MAD):
  - h2(y) = (ay + b) mod N
  - a and b are nonnegative integers such that a mod N ≠ 0
  - Otherwise, every integer would map to the same value b
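Polynomial accumulation with Horner's rule might look like the following in Python. This sketch assumes each component a_i is the code point of the i-th character and that "ignoring overflows" means keeping only the low 32 bits; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF      # keep the low 32 bits, i.e. ignore overflow


def polynomial_hash(s, z=33):
    """Evaluate p(z) = a_0 + a_1 z + ... + a_{n-1} z^(n-1) by Horner's rule."""
    p = 0
    for ch in reversed(s):                  # first step gives p_0(z) = a_{n-1}
        p = (ord(ch) + z * p) & MASK32      # p_i(z) = a_{n-i-1} + z * p_{i-1}(z)
    return p
```

Each loop iteration does a constant amount of work, so the whole evaluation takes O(n) time for a key of n components.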
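The two compression maps are short enough to write out directly. In this sketch the function names and the sample values of a and b are hypothetical; only the formulas come from the slides.

```python
def division_map(y, n):
    """Division: h2(y) = y mod N."""
    return y % n


def mad_map(y, n, a, b):
    """Multiply, Add and Divide: h2(y) = (a*y + b) mod N, with a mod N != 0."""
    if a % n == 0:
        raise ValueError("a mod N must be nonzero")
    return (a * y + b) % n


# With a mod N == 0 the term a*y vanishes modulo N, so every y maps to b.
degenerate = {(13 * y + 5) % 13 for y in range(100)}
```

Here `degenerate` collapses to the single value 5, which is why the condition a mod N ≠ 0 is required.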

Linear Probing

- Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell
- Each table cell inspected is referred to as a “probe”
- Colliding items lump together, causing future collisions to cause a longer sequence of probes
- Example:
  - h(x) = x mod 13
  - Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

[Figure: table after the insertions]
  cell:  0   1   2   3   4   5   6   7   8   9  10  11  12
  key:   ∅   ∅  41   ∅   ∅  18  44  59  32  22  31  73   ∅

Search with Linear Probing

- Consider a hash table A that uses linear probing
- findElement(k)
  - We start at cell h(k)
  - We probe consecutive locations until one of the following occurs
    - An item with key k is found, or
    - An empty cell is found, or
    - N cells have been unsuccessfully probed

Algorithm findElement(k)
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅
            return NO_SUCH_KEY
        else if c.key() = k
            return c.element()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return NO_SUCH_KEY
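The linear probing example and the findElement pseudocode can be tried out with a short Python sketch. It makes some simplifying assumptions: the table stores bare keys rather than (key, element) items, `None` plays the role of ∅, and `find_element` returns the cell index instead of an element. The function names are hypothetical.

```python
N = 13
EMPTY = None                      # plays the role of the empty marker


def h(k):
    return k % N


def insert_key(table, k):
    """Place k in the first empty cell at or (circularly) after h(k)."""
    i = h(k)
    for _ in range(N):
        if table[i] is EMPTY:
            table[i] = k
            return i
        i = (i + 1) % N
    raise RuntimeError("table is full")


def find_element(table, k):
    """Follow the findElement pseudocode; return the cell of k or None."""
    i = h(k)
    p = 0
    while p < N:
        c = table[i]
        if c is EMPTY:
            return None           # NO_SUCH_KEY: an empty cell ends the search
        if c == k:
            return i
        i = (i + 1) % N
        p += 1
    return None                   # NO_SUCH_KEY after N unsuccessful probes


table = [EMPTY] * N
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    insert_key(table, key)
```

After the loop, cells 5 through 11 form one contiguous run, which illustrates how colliding items lump together.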


Updates with Linear Probing

- To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements
- removeElement(k)
  - We search for an item with key k
  - If such an item (k, o) is found, we replace it with the special item AVAILABLE and we return element o
  - Else, we return NO_SUCH_KEY
- insertItem(k, o)
  - We throw an exception if the table is full
  - We start at cell h(k)
  - We probe consecutive cells until one of the following occurs
    - A cell i is found that is either empty or stores AVAILABLE, or
    - N cells have been unsuccessfully probed
  - We store item (k, o) in cell i

Double Hashing

- Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
    (i + j·d(k)) mod N   for j = 0, 1, …, N − 1
  where i = h(k)
- The secondary hash function d(k) cannot have zero values
- The table size N must be a prime to allow probing of all the cells
- Common choice of compression map for the secondary hash function:
    d2(k) = q − (k mod q)
  where
  - q < N
  - q is a prime
- The possible values for d2(k) are 1, 2, …, q
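A minimal sketch of updates with the AVAILABLE marker is given below. The class name and Python method names are hypothetical, N = 13 with h(k) = k mod N is assumed only for illustration, and a full table is detected after N unsuccessful probes rather than checked up front.

```python
N = 13
EMPTY = None
AVAILABLE = object()              # sentinel that replaces deleted items


class LinearProbingTable:
    def __init__(self):
        self.cells = [EMPTY] * N

    def _h(self, k):
        return k % N

    def insert_item(self, k, o):
        i = self._h(k)
        for _ in range(N):
            c = self.cells[i]
            if c is EMPTY or c is AVAILABLE:     # reusable cell found
                self.cells[i] = (k, o)
                return
            i = (i + 1) % N
        raise OverflowError("hash table is full")

    def find_element(self, k):
        i = self._h(k)
        for _ in range(N):
            c = self.cells[i]
            if c is EMPTY:
                return None                      # NO_SUCH_KEY
            if c is not AVAILABLE and c[0] == k:
                return c[1]
            i = (i + 1) % N                      # AVAILABLE cells are skipped
        return None

    def remove_element(self, k):
        i = self._h(k)
        for _ in range(N):
            c = self.cells[i]
            if c is EMPTY:
                return None                      # NO_SUCH_KEY
            if c is not AVAILABLE and c[0] == k:
                self.cells[i] = AVAILABLE        # keep the probe chain intact
                return c[1]
            i = (i + 1) % N
        return None
```

The marker matters because a search stops at an empty cell: if a removed item's cell were simply emptied, items stored further along the same probe sequence would become unreachable.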

Example of Double Hashing

- Consider a hash table storing integer keys that handles collision with double hashing
  - N = 13
  - h(k) = k mod 13
  - d(k) = 7 − k mod 7
- Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

   k   h(k)  d(k)  Probes
  18     5     3   5
  41     2     1   2
  22     9     6   9
  44     5     5   5, 10
  59     7     4   7
  32     6     3   6
  31     5     4   5, 9, 0
  73     8     4   8

[Figure: table after the insertions]
  cell:  0   1   2   3   4   5   6   7   8   9  10  11  12
  key:  31   ∅  41   ∅   ∅  18  32  59  73  22  44   ∅   ∅

Performance of Hashing

- In the worst case, searches, insertions and removals on a hash table take O(n) time
- The worst case occurs when all the keys inserted into the dictionary collide
- The load factor α = n/N affects the performance of a hash table
- Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is
    1 / (1 − α)
- The expected running time of all the dictionary ADT operations in a hash table is O(1)
- In practice, hashing is very fast provided the load factor is not close to 100%
- Applications of hash tables:
  - small databases
  - compilers
  - browser caches
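The double hashing example can be reproduced with the sketch below, which records the cells probed for each key. The function names and the `trace` dictionary are hypothetical, and `None` stands for an empty cell.

```python
N = 13
Q = 7


def h(k):
    return k % N


def d(k):
    return Q - k % Q              # values in 1..Q, never zero


def insert_key(table, k):
    """Insert k and return the list of probed cells; k ends up in the last."""
    probes = []
    for j in range(N):
        i = (h(k) + j * d(k)) % N
        probes.append(i)
        if table[i] is None:
            table[i] = k
            return probes
    raise RuntimeError("no free cell found")


table = [None] * N
trace = {k: insert_key(table, k) for k in (18, 41, 22, 44, 59, 32, 31, 73)}
```

Keys 44 and 31 both start at cell 5, but their different step sizes (5 and 4) send them along different probe sequences.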
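To get a feel for how the load factor drives the 1 / (1 − α) estimate, one can tabulate it for a few values. The helper below is a hypothetical convenience function, not part of any library.

```python
def expected_insert_probes(n, capacity):
    """Expected probes for an insertion with open addressing: 1 / (1 - alpha),
    where alpha = n / capacity is the load factor."""
    alpha = n / capacity
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    return 1 / (1 - alpha)


half_full = expected_insert_probes(1, 2)             # alpha = 0.5  -> 2.0
three_quarters = expected_insert_probes(3, 4)        # alpha = 0.75 -> 4.0
```

At α = 0.9 the estimate is already about 10 probes, which is consistent with the advice to keep the load factor well below 100%.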