Hash Tables

Outline
• Hash functions and hash tables (§2.5.2)
• Hash function details
  - Hash code map (§2.5.3)
  - Compression map (§2.5.4)
• Collision handling (§2.5.5)
  - Chaining
  - Linear probing
  - Double hashing

Hash functions and hash tables
• A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]
• Example: h(x) = x mod N is a hash function for integer keys
• The integer h(x) is called the hash value of key x
• The goal of a hash function is to uniformly disperse keys in the range [0, N − 1]
• A hash table for a given key type consists of
  - Hash function h
  - Array (called table) of size N
• When implementing a dictionary with a hash table, we store item (k, o) at index i = h(k)

Example
• We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
• Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x
• We use chaining to handle collisions

  Cell   Items
  0      ∅
  1      025-612-0001
  2      ∅
  3      ∅
  4      451-229-0004, 981-101-0004
  …
  9997   ∅
  9998   200-751-9998
  9999   ∅

Collision handling
• A collision occurs when two keys in the dictionary have the same hash value
• Collision handling schemes:
  - Chaining: colliding items are stored in a sequence
  - Open addressing: the colliding item is placed in a different cell of the table
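The (SSN, Name) example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ChainedHashTable`, its method names, and the stored names are made up for the sketch, and `None` stands in for NO_SUCH_KEY.

```python
class ChainedHashTable:
    """Dictionary keyed by integers; colliding items share one bucket list."""

    def __init__(self, size=10_000):
        self.size = size
        # One list per cell; an empty list plays the role of the empty cell.
        self.table = [[] for _ in range(size)]

    def h(self, key):
        # For size 10,000 this is "the last four digits of the key".
        return key % self.size

    def insert_item(self, key, element):
        # Chaining: append to the sequence stored in cell h(key).
        self.table[self.h(key)].append((key, element))

    def find_element(self, key):
        # Scan only the sequence stored in cell h(key).
        for k, element in self.table[self.h(key)]:
            if k == key:
                return element
        return None  # stands in for NO_SUCH_KEY


ssn_table = ChainedHashTable()
ssn_table.insert_item(451_229_0004, "Alice")  # names are invented
ssn_table.insert_item(981_101_0004, "Bob")    # collides with the key above
ssn_table.insert_item(200_751_9998, "Carol")
```

Both keys ending in 0004 land in cell 4, and a lookup walks only that cell's short sequence.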

Hash function details
• A hash function is usually specified as the composition of two functions:
  - Hash code map: h1: keys → integers
  - Compression map: h2: integers → [0, N − 1]
• The hash code map is applied first, and the compression map is applied next on the result, i.e., h(x) = h2(h1(x))
• The goal of the hash function is to “disperse” the keys in an apparently random way

Hash code map
• Memory address:
  - We reinterpret the memory address of the key object as an integer (default hash code of all Java objects)
  - Good in general, except for numeric and string keys
• Integer cast:
  - We reinterpret the bits of the key as an integer
  - Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)
• Component sum:
  - We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
  - Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)
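The integer cast and component sum hash codes, and the composition h(x) = h2(h1(x)), might look as follows in Python. This is a rough sketch: the function names are invented, a 32-bit integer type is assumed, and "ignoring overflows" is modelled by masking to 32 bits.

```python
import struct

MASK32 = 0xFFFFFFFF  # keep results within 32 bits, i.e. "ignore overflows"


def integer_cast_float(x):
    """Integer cast: reinterpret the 32 bits of a single-precision float
    as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def component_sum_long(x):
    """Component sum: split a 64-bit key into two 32-bit components
    and add them, discarding any overflow."""
    x &= 0xFFFFFFFFFFFFFFFF
    high, low = x >> 32, x & MASK32
    return (high + low) & MASK32


def h(x, n):
    """h(x) = h2(h1(x)), with component sum as h1 and division as h2."""
    return component_sum_long(x) % n
```

For instance, the key 2^32 + 5 has components 1 and 5, so its hash code is 6.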

9/2/2002 3:15 AM Hash Tables


Hash code map (cont.)
• Polynomial accumulation:
  - We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits)
      a_0 a_1 … a_{n−1}
  - We evaluate the polynomial
      p(z) = a_0 + a_1 z + a_2 z^2 + … + a_{n−1} z^{n−1}
    at a fixed value z, ignoring overflows
  - Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
• Polynomial p(z) can be evaluated in O(n) time using Horner’s rule:
  - The following polynomials are successively computed, each from the previous one in O(1) time
      p_0(z) = a_{n−1}
      p_i(z) = a_{n−i−1} + z p_{i−1}(z)   (i = 1, 2, …, n − 1)
  - We have p(z) = p_{n−1}(z)

Compression map
• Division:
  - h2(y) = y mod N
  - The size N of the hash table is usually chosen to be a prime
  - The reason has to do with number theory and is beyond the scope of this course
• Multiply, Add and Divide (MAD):
  - h2(y) = (ay + b) mod N
  - a and b are nonnegative integers such that a mod N ≠ 0
  - Otherwise, every integer would map to the same value b
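Horner's rule for the polynomial hash code can be sketched as below. Two choices here are assumptions, not taken from the slides: character codes serve as the components a_i, and overflow is ignored by masking to 32 bits. The function name `polynomial_hash_code` is made up.

```python
def polynomial_hash_code(key, z=33):
    """Evaluate p(z) = a_0 + a_1 z + ... + a_{n-1} z^(n-1) by Horner's rule."""
    components = [ord(ch) for ch in key]  # a_0, a_1, ..., a_{n-1}
    p = 0
    # Visit a_{n-1} first: after the first step p = p_0(z) = a_{n-1},
    # and each later step computes p_i(z) = a_{n-i-1} + z * p_{i-1}(z).
    for a in reversed(components):
        p = (a + z * p) & 0xFFFFFFFF  # mask = "ignoring overflows"
    return p
```

Each loop iteration costs O(1), so a key with n components is hashed in O(n) time. For the two-character key "ab" the result is 97 + 98 · 33 = 3331.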

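The two compression maps can be written as small Python functions. The names `division_map` and `mad_map` and the sample constants a = 3, b = 5, N = 13 are illustrative assumptions. The last line shows the degenerate case the slides warn about, where a mod N = 0.

```python
def division_map(y, n):
    """Division: h2(y) = y mod N."""
    return y % n


def mad_map(y, n, a, b):
    """Multiply, Add and Divide: h2(y) = (a*y + b) mod N."""
    if a % n == 0:
        raise ValueError("a mod N must be nonzero")
    return (a * y + b) % n


# With a = N = 13 the product a*y vanishes mod N, so every key maps to b.
degenerate = [(13 * y + 5) % 13 for y in range(5)]
```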

Linear probing
• Linear probing handles collisions by placing the colliding item in the next available table cell, wrapping around at the end of the table
• Each table cell inspected is referred to as a “probe”
• Colliding items lump together, causing future collisions to cause a longer sequence of probes
• Example:
  - h(x) = x mod 13
  - Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

  Cell  0   1   2   3   4   5   6   7   8   9   10  11  12
  Key   ∅   ∅   41  ∅   ∅   18  44  59  32  22  31  73  ∅

Search with linear probing
• Consider a hash table A that uses linear probing
• findElement(k)
  - We start at cell h(k)
  - We probe consecutive locations until one of the following occurs
    - An item with key k is found, or
    - An empty cell is found, or
    - N cells have been unsuccessfully probed

Algorithm findElement(k)
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅
            return NO_SUCH_KEY
        else if c.key() = k
            return c.element()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return NO_SUCH_KEY
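A runnable Python rendering of findElement, plus a plain insertion routine, reproduces the linear probing example with h(x) = x mod 13. It is a minimal sketch: items are assumed to be (key, element) tuples, `None` plays the role of the empty cell, and the element strings are placeholders.

```python
NO_SUCH_KEY = object()  # sentinel standing in for NO_SUCH_KEY
EMPTY = None            # plays the role of the empty cell


def h(k, n):
    return k % n


def insert_item(table, k, o):
    """Place (k, o) in the first empty cell at or after h(k), circularly."""
    n = len(table)
    i = h(k, n)
    for _ in range(n):
        if table[i] is EMPTY:
            table[i] = (k, o)
            return i
        i = (i + 1) % n
    raise RuntimeError("table is full")


def find_element(table, k):
    """Follows the findElement pseudocode: stop at a match, at an empty
    cell, or after N unsuccessful probes."""
    n = len(table)
    i = h(k, n)
    p = 0
    while p < n:
        c = table[i]
        if c is EMPTY:
            return NO_SUCH_KEY
        if c[0] == k:
            return c[1]
        i = (i + 1) % n
        p += 1
    return NO_SUCH_KEY


A = [EMPTY] * 13
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    insert_item(A, key, f"item-{key}")
keys_by_cell = [c[0] if c is not EMPTY else None for c in A]
```

After the eight insertions, `keys_by_cell` holds 41 in cell 2 and 18, 44, 59, 32, 22, 31, 73 in cells 5 through 11, which is the clustering shown in the example.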

Updates with linear probing
• To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements
• removeElement(k)
  - We search for an item with key k
  - If such an item (k, o) is found, we replace it with the special item AVAILABLE and we return element o
  - Else, we return NO_SUCH_KEY
• insertItem(k, o)
  - We throw an exception if the table is full
  - We start at cell h(k)
  - We probe consecutive cells until one of the following occurs
    - A cell i is found that is either empty or stores AVAILABLE, or
    - N cells have been unsuccessfully probed
  - We store item (k, o) in cell i

Double hashing
• Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series (i + j d(k)) mod N for j = 0, 1, …, N − 1, where i = h(k)
• The secondary hash function d(k) cannot have zero values
• The table size N must be a prime to allow probing of all the cells
• Common choice of compression map for the secondary hash function: d2(k) = q − (k mod q), where
  - q < N
  - q is a prime
• The possible values for d2(k) are 1, 2, …, q
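The AVAILABLE marker used for updates under linear probing could be implemented along these lines. This is a sketch under assumptions: the class name `LinearProbingTable` and the helper `_probe` are invented, h(k) = k mod N is assumed, and items are (key, element) tuples.

```python
EMPTY = None
AVAILABLE = object()    # special object that replaces deleted items
NO_SUCH_KEY = object()  # sentinel standing in for NO_SUCH_KEY


class LinearProbingTable:
    def __init__(self, n):
        self.n = n
        self.cells = [EMPTY] * n

    def _probe(self, k):
        """Yield the N cells h(k), h(k)+1, ... circularly."""
        i = k % self.n
        for _ in range(self.n):
            yield i
            i = (i + 1) % self.n

    def insert_item(self, k, o):
        for i in self._probe(k):
            c = self.cells[i]
            if c is EMPTY or c is AVAILABLE:
                self.cells[i] = (k, o)
                return
        raise RuntimeError("table is full")

    def find_element(self, k):
        for i in self._probe(k):
            c = self.cells[i]
            if c is EMPTY:
                return NO_SUCH_KEY
            if c is not AVAILABLE and c[0] == k:
                return c[1]
        return NO_SUCH_KEY

    def remove_element(self, k):
        for i in self._probe(k):
            c = self.cells[i]
            if c is EMPTY:
                return NO_SUCH_KEY
            if c is not AVAILABLE and c[0] == k:
                self.cells[i] = AVAILABLE  # keep the probe chain intact
                return c[1]
        return NO_SUCH_KEY


t = LinearProbingTable(13)
t.insert_item(18, "a")   # cell 5
t.insert_item(44, "b")   # also hashes to 5, lands in cell 6
removed = t.remove_element(18)
found_after_removal = t.find_element(44)
t.insert_item(31, "c")   # reuses the AVAILABLE cell 5
```

Marking cell 5 as AVAILABLE rather than empty is what lets the search for 44 continue past it to cell 6.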


Example of double hashing
• Consider a hash table storing integer keys that handles collision with double hashing
  - N = 13
  - h(k) = k mod 13
  - d(k) = 7 − (k mod 7)
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

  k    h(k)  d(k)  Probes
  18   5     3     5
  41   2     1     2
  22   9     6     9
  44   5     5     5, 10
  59   7     4     7
  32   6     3     6
  31   5     4     5, 9, 0
  73   8     4     8

  Cell  0   1   2   3   4   5   6   7   8   9   10  11  12
  Key   31  ∅   41  ∅   ∅   18  32  59  73  22  44  ∅   ∅

Performance of hashing
• In the worst case, searches, insertions and removals on a hash table take O(n) time
• The worst case occurs when all the keys inserted into the dictionary collide
• The load factor α = n/N affects the performance of a hash table
• Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 − α)
• The expected running time of all the dictionary ADT operations in a hash table is O(1)
• In practice, hashing is very fast provided the load factor is not close to 100%
• Applications of hash tables:
  - small databases
  - compilers
  - browser caches
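The double hashing example with N = 13, h(k) = k mod 13 and d(k) = 7 − (k mod 7) can be replayed with a short script. The function name `insert_double_hashing` and the `probe_log` dictionary are illustrative assumptions. Only keys are stored, to keep the sketch small.

```python
def h(k, n=13):
    return k % n


def d(k, q=7):
    # Secondary hash function; its values are 1..q, never zero.
    return q - (k % q)


def insert_double_hashing(table, k):
    """Insert k in the first free cell of (h(k) + j*d(k)) mod N,
    returning the list of cells probed."""
    n = len(table)
    i, step = h(k, n), d(k)
    probes = []
    for j in range(n):
        cell = (i + j * step) % n
        probes.append(cell)
        if table[cell] is None:
            table[cell] = k
            return probes
    raise RuntimeError("no free cell found")


table = [None] * 13
probe_log = {k: insert_double_hashing(table, k)
             for k in (18, 41, 22, 44, 59, 32, 31, 73)}
```

Here `probe_log[44]` is `[5, 10]` and `probe_log[31]` is `[5, 9, 0]`. All other keys are placed on their first probe.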

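To get a feel for how the load factor drives the 1 / (1 − α) estimate for insertions with open addressing, the formula can simply be tabulated. The function names and sample load factors below are arbitrary choices for illustration.

```python
def load_factor(n_items, n_cells):
    """alpha = n / N."""
    return n_items / n_cells


def expected_insertion_probes(alpha):
    """Expected number of probes for an insertion with open addressing,
    assuming hash values behave like random numbers."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return 1.0 / (1.0 - alpha)


estimates = {alpha: expected_insertion_probes(alpha)
             for alpha in (0.25, 0.5, 0.75, 0.9, 0.99)}
```

At α = 0.5 the estimate is 2 probes. At α = 0.9 it is about 10, and at α = 0.99 about 100, which is why the load factor should stay well below 100%.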

