You are on page 1of 31

Chapter 4

Hash Tables

COMP2015 HKBU/CS/JF/2023 1
Motivation
n Binary search tree and AVL tree can be used to implement
Insert and Find operations
q The premise is that the data should be sorted in advance

n Objective:

Based on a table named hash table, hashing is a technique


used for performing
q Insertions

q Deletions

q Finds

in constant time “on an average” without sorting the data

COMP2015 HKBU/CS/JF/2023 2
Hash Table
n A hash table is a kind of symbol table.

n A symbol table is like a dictionary, with which you can look


up a word for its meaning.

COMP2015 HKBU/CS/JF/2023 3
Symbol Table

n Formally, a symbol table is a collection of 〈key, value〉 pairs.


q “Key” and “value” can be anything.

q “Value” can be composed of multiple parts.

Key Value
23123456 Chan Yat, Male, 96868686
23234567 Cheung Wai Man, Male, 62323232
23345678 Lee Wing Man, Female, 94343434
23456789 Wong Siu Ming, Male, 61700800
… ...

COMP2015 HKBU/CS/JF/2023 4
Hash Function
n Hash Function: a function h that maps a search key to an
integer index.
q e.g., h(“06053344”) = 36

“06053344”
(a search key)

q A hash function:
n Should be simple to compute;

n Should ensure that any two distinct keys get different


cells;
n Should distribute the keys evenly among the cells.

COMP2015 HKBU/CS/JF/2023 5
Hash Table
n A symbol table that uses a hash function is called a hash
table.

“06053344”
h(“06053344”)

n Hash table is a fixed size (TableSize) array containing keys.


n Each key is mapped into some number in the range 0 to
TableSize - 1, and placed in the appropriate cells.

COMP2015 HKBU/CS/JF/2023 6
Collision in Hashing
n A hash function can sometimes return the same index
(called hash value) for two different keys.
h(“06053344”) = 36
h(“06059999”) = 36

n Having two or more keys hash to the same index is called
collision.

n Hashing basic plan:


q create an array for the items to be stored

q use a hash function to figure out storage location from


key
q a collision resolution scheme is necessary

COMP2015 HKBU/CS/JF/2023 7
Hash Function

Usually, m << N.

h(Ki) = an integer in [0, …, m-1], the hash value of Ki

COMP2015 HKBU/CS/JF/2023 8
Hash Function

n The division method for numeric keys


q h(k) = k mod m

where m is the table size and a prime number.

q e.g. m = 11, k = 100, h(k) = 1


Requires only a single division operation (quite fast)

n Can the keys be strings?


n Non-numeric keys are converted to numbers

COMP2015 HKBU/CS/JF/2023 9
Hash Function
n Method 1
q Add up the ASCII values of the characters in the string

q e.g. the string “HongKong” becomes 795

(72+111+110+103+75+111+110+103)

q Problems:
n Different permutations of the same set of characters
would have the same hash value
n If the table size is large, the keys are not distributed well.

n e.g. If m=10007 and all the keys are eight or fewer


characters long. Since ASCII value <= 127, the hash
function can only assume values between 0 and
127*8=1016
COMP2015 HKBU/CS/JF/2023 10
Hash Function

n Method 2
Represent the first three characters s1s2s3 in the string by the
ASCII code, choose a number r, the integer value for the
string is
ASCII (s1) + ASCII(s2)r + ASCII(s3)r2

COMP2015 HKBU/CS/JF/2023 11
Collision Handling: Separate Chaining

n Instead of a hash table, we use a table of linked list


n keep a linked list of keys that hash to the same value

h(K) = K mod 10

COMP2015 HKBU/CS/JF/2023 12
Separate Chaining

n To insert a key K
q Compute h(K) to determine which list to traverse, then
insert in the linked list

n To delete a key K
q Compute h(K), then search for K within the list. Delete K
if it is found.

n To find a key K, compute h(K), then search in the linked list

n Disadvantage: Memory allocation in linked list manipulation


will slow down the program.

COMP2015 HKBU/CS/JF/2023 13
Collision Handling: Open Addressing

n Open addressing:
q Relocate the key K to be inserted if it collides with an
existing key. That is, we store K at a different entry.

n Two issues arise


q What is the relocation scheme?

q How to search for K later?

n Three common methods for resolving a collision in open


addressing
q Linear probing

q Quadratic probing

q Double hashing

COMP2015 HKBU/CS/JF/2023 14
Open Addressing

n To insert a key K, compute h0(K)

- If the cell is empty, insert it there

- If collision occurs, probe alternative cell h1(K), h2(K), ....


until an empty cell is found,
where
hi(K) = (hash(K) + f(i)) mod m, with f(0) = 0
f: collision resolution strategy

COMP2015 HKBU/CS/JF/2023 15
Linear Probing

n f(i) = i
q cells are probed sequentially (with wraparound)

q hi(K) = (hash(K) + i) mod m

n Insertion:
q Let K be the new key to be inserted. We compute hash(K)

q For i = 0 to m-1

n compute L = ( hash(K) + i ) mod m

n If the cell is empty, then we put K there and stop.

q If we cannot find an empty entry to put K, it means that the

table is full and we should report an error.

COMP2015 HKBU/CS/JF/2023 16
Linear Probing

Question: Insert the following values into the hash table using
hash(K)=K mod 10 (table size) and linear probing to resolve
collisions
1, 5, 11, 7, 12, 17, 6, 25

0 1 2 3 4 5 6 7 8 9

111 12 25
5 6 17
7

COMP2015 HKBU/CS/JF/2023 17
Linear Probing

n hi(K) = (hash(K) + i) mod m


n Example: inserting keys 89, 18, 49, 58, 69 with hash(K)=K mod 10

To insert 58, probe


T[8], T[9], T[0],
T[1]

To insert 69, probe


T[9], T[0], T[1],
T[2]

COMP2015 HKBU/CS/JF/2023 18
Primary Clustering

n We call a block of contiguously occupied table entries a


cluster. The larger the cluster, the slower the performance.

n Linear probing has the following disadvantages:


q Once h(K) falls into a cluster, this cluster will definitely
grow in size by one. Thus, this may worsen the
performance of insertion in the future.
q If two clusters are only separated by one entry, then
inserting one key into a cluster can merge the two clusters
together. Thus, the cluster size can be increased drastically
by a single insertion. This means that the performance of
insertion can deteriorate drastically after a single insertion.
q Large clusters are easy targets for collisions.

COMP2015 HKBU/CS/JF/2023 19
Primary Clustering in Linear
Probing

n Linear probing tends to create long sequences of filled cells


in a hash table.

n This effect is called primary clustering.

n Primary clustering degrades the performance of a hash table.

n Quadratic probing can be used to address the primary


clustering problem.

COMP2015 HKBU/CS/JF/2023 20
Quadratic Probing
n f(i) = i2
n hi(K) = ( hash(K) + i2 ) mod m
n Example: inserting keys 89, 18, 49, 58, 69 with hash(K) = K
mod 10

To insert 58, probe


T[8], T[9], T[(8+4)
mod 10]

To insert 69, probe


T[9], T[(9+1) mod
10], T[(9+4) mod
10]

COMP2015 HKBU/CS/JF/2023 21
Quadratic Probing

n Two keys with different home positions will have different


probe sequences
q e.g., m=101, h(k1)=30, h(k2)=29

q probe sequence for k1: 30, 30+1, 30+4, 30+9

q probe sequence for k2: 29, 29+1, 29+4, 29+9

n If the table size is prime, a new key can always be inserted if


the table is at least half empty.

COMP2015 HKBU/CS/JF/2023 22
Quadratic Probing

n If the table size is not prime?


q The number of alternative locations can be severely
reduced.

q e.g., m=16, h(k1)=h0

q probe sequence for k1: h0, h0 + 1, h0 + 4, h0 + 9, ……

q The only alternative locations would be at distances 1, 4,


or 9 away.

n If the table size is a prime of the form 4k+3, the entire table
can be probed.

COMP2015 HKBU/CS/JF/2023 23
Quadratic Probing

n Secondary clustering

q Keys that hash to the same home position will probe the
same alternative cells

q To avoid secondary clustering, the probe sequence needs


to be a function of the original key value, not the home
position

COMP2015 HKBU/CS/JF/2023 24
Tutorial
Insert the following keys into the hash table using hash(K)=K
mod m (table size) and linear/quadratic probing to resolve
collisions
19, 39, 29, 9, 79, 49, 69
0 1 2 3 4 5 6 7 8 9

COMP2015 HKBU/CS/JF/2023 25
Double Hashing

n To alleviate the problem of clustering, the sequence of


probes for a key should be independent of its primary
position
=> use two hash functions: hash() and hash2()

n The idea is that even if two keys hash to the same value
using hash(), they will have different values using hash2().

n f(i) = i * hash2(K)
q e.g., hash2(K) = R - (K mod R), where R is a prime
number smaller than m
q hash2 should return a nonzero integer.

COMP2015 HKBU/CS/JF/2023 26
Double Hashing
n hi(K) = ( hash(K) + f(i) ) mod m; hash(K) = K mod m
n f(i) = i * hash2(K); hash2(K) = R - (K mod R),
n Example: m=10, R = 7 and insert keys 89, 18, 49, 58, 69
To insert 49,
hash2(49)=7, 2nd
probe is T[(9+7)
mod 10]

To insert 58,
hash2(58)=5, 2nd
probe is T[(8+5)
mod 10]

To insert 69,
hash2(69)=1, 2nd
probe is T[(9+1)
mod 10]

COMP2015 HKBU/CS/JF/2023 27
Rehashing

n No matter what collision resolution scheme is used, the


performance of an open addressing hash table degrades when
the table is becoming full.

n Build another table, twice as big (and prime).


n The hash function has to be changed.
n All entries in the hash table have to be re-entered using the
new hash function.
n This process is called rehashing.

n Note: Rehashing is rarely necessary for most applications.

COMP2015 HKBU/CS/JF/2023 28
Example Applications

n Compilers use hash tables (symbol tables) to keep track of


declared variables.

n On-line spelling checkers: after prehashing the entire


dictionary, one can check each word in constant time and
print out the misspelled word in order of their appearance
in the document.

COMP2015 HKBU/CS/JF/2023 29
Issues
n What do we lose?
q Operations that require ordering are inefficient
q FindMax
q FindMin
q PrintSorted

n What do we gain?
q Insert: O(1)
q Delete: O(1)
q Find: O(1)

n How to handle collision?


q Separate chaining
q Open addressing

COMP2015 HKBU/CS/JF/2023 30
Tutorial
v [Hash Table – More Collision Resolution] Suppose you
have a hash table of size 10, with initial hash function h(x) =
2x. Insert the following keys into the hash table:
2, 12, 3, 10, 5, 7
v (a) Have your hash table using linear probing to resolve
collisions.
v (b) Have your hash table using quadratic probing to resolve
collisions.
v (c) Have your hash table using double hashing, with second
hash function f(x) = x + 1.

COMP2015 HKBU/CS/JF/2023 31

You might also like