
Spring 2021

Data Structures 2

Hashing
Hash Tables and Hash Functions

- Hash table: an array of some fixed size that positions elements
  according to an algorithm called a hash function.

- A hash function maps keys of a given type to integers in a fixed
  interval [0, N-1].

- h(x) = x mod N is a hash function for integer keys.

- The integer h(x) is called the hash value of key x.

- The goal of the hash function is to spread the keys evenly over the
  table slots, as if they were placed at random.

Example
If we have a hash table with size 10 and the following hash function
h(k) = k % 10
Insert the following elements: 41, 34, 7, and 18

- Note that when we search for an element, we calculate the hash
  function and look in the location it returns.

0
1 41
2
3
4 34
5
6
7 7
8 18
9
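The example above can be sketched in a few lines of Python. This is only an illustration: the names `h`, `table`, and `contains` are chosen here and do not come from the slides, and the sketch assumes no two keys share a slot (which holds for 41, 34, 7, 18).

```python
def h(k, n=10):
    """Hash function from the example: h(k) = k % n."""
    return k % n

table = [None] * 10            # slots 0..9, all empty
for key in (41, 34, 7, 18):    # these four keys land in four different slots
    table[h(key)] = key

def contains(table, key):
    """Search by looking only at the slot the hash function points to."""
    return table[h(key, len(table))] == key

print(table)                   # [None, 41, None, None, 34, None, None, 7, 18, None]
print(contains(table, 34))     # True
print(contains(table, 24))     # False: slot 4 holds 34, not 24
```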

Collision
- The event that two hash table elements map into the
same slot in the array
Example
- If we have h(k) = k % 10 and add 41, 34, 7, 18, then 21:
- 21 hashes into the same slot (slot 1) as 41.
- 21 should not replace 41 in the hash table; they should both be there.
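A small sketch of how this collision shows up when keys are written straight into the array. The `collisions` list is a name introduced here for illustration; the sketch only reports the collision and deliberately does not resolve it.

```python
table = [None] * 10
collisions = []                       # (new key, key already stored, slot)

for key in (41, 34, 7, 18, 21):
    slot = key % 10                   # h(k) = k % 10
    if table[slot] is None:
        table[slot] = key
    else:
        # Slot already taken: record the collision instead of overwriting.
        collisions.append((key, table[slot], slot))

print(collisions)                     # [(21, 41, 1)]
```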
Collision Handling/Resolution:
a strategy for fixing collisions in a hash table

We have the following ways to handle collisions:


1. Separate Chaining
2. Open Addressing
   a. Linear Probing
   b. Quadratic Probing
   c. Double Hashing

1. Separate chaining
Let each cell in the table point to a linked list
of entries that map there

Example
h(x)=x mod 13
Insert keys 18, 41, 22, 44, 59, 32, 31, 73 in this order

0
1
2 41
3
4
5 184431
6 32
7 59
8 73
9 22
10
11
12
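A minimal sketch of separate chaining that reproduces the table above. As an assumption for brevity, each chain is a Python list rather than a hand-written linked list, and the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Separate chaining: every cell holds a chain of the keys that map there."""

    def __init__(self, size):
        self.buckets = [[] for _ in range(size)]

    def _slot(self, key):
        return key % len(self.buckets)        # h(x) = x mod N

    def insert(self, key):
        chain = self.buckets[self._slot(key)]
        if key not in chain:                  # ignore duplicates
            chain.append(key)

    def contains(self, key):
        return key in self.buckets[self._slot(key)]

table = ChainedHashTable(13)
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    table.insert(key)

print(table.buckets[5])      # [18, 44, 31]
print(table.contains(59))    # True
```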

2. Open Addressing
- on a collision, look for another empty spot in the
array
- Examples of open addressing
o linear probing
o quadratic probing
o double hashing
- When searching with an open addressing scheme, we must keep probing
  until we either find the item or reach an empty slot.

a. Linear probing
handles collisions by placing the colliding item in the
next (circularly) available table cell
Example
h(x)=x mod 13
Insert keys 18, 41, 22, 44, 59, 32, 31, 73 in this order

0
1
2 41
3
4
5 18
6 44
7 59
8 32
9 22
10 31
11 73
12
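The insertions above can be replayed with a short sketch. The function name `insert_linear` is an assumption made for this illustration, and an empty cell is represented by `None`.

```python
def insert_linear(table, key):
    """Place key in the first free cell at or after h(key), wrapping around."""
    n = len(table)
    home = key % n                        # h(x) = x mod N
    for j in range(n):
        slot = (home + j) % n             # next cell, circularly
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

table = [None] * 13
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    insert_linear(table, key)

print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```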

Search with Linear Probing
We probe consecutive locations until one of the following
occurs
- An item with key k is found, or
- An empty cell is found, or
- N cells have been unsuccessfully probed.
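The three stopping conditions can be sketched as follows. The function name `search_linear` is hypothetical, and the table literal is the result of the linear probing example with N = 13.

```python
def search_linear(table, key):
    """Return the slot holding key, or None if key is not in the table."""
    n = len(table)
    home = key % n
    for j in range(n):
        slot = (home + j) % n
        if table[slot] is None:       # an empty cell is found: key is absent
            return None
        if table[slot] == key:        # an item with key k is found
            return slot
    return None                       # N cells probed unsuccessfully

table = [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
print(search_linear(table, 31))   # 10
print(search_linear(table, 57))   # None (probing from slot 5 ends at empty slot 12)
```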

Clustering problem
- elements being placed close together by probing, which degrades
  the hash table's performance
- Example: with h(k) = k % 10 and a table of size 10, add 89, 18, 49, 58, 9
- Now searching for the value 28 has to check slots 8, 9, 0, 1, 2 and
  then the empty slot 3, about half the hash table! No longer constant time.
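To make the cost concrete, the sketch below counts how many cells a search touches. It assumes a table of size 10 with h(k) = k % 10 and linear probing; the helper names `insert_linear` and `probes_to_search` are made up for this illustration.

```python
def insert_linear(table, key):
    n = len(table)
    for j in range(n):
        slot = (key % n + j) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

def probes_to_search(table, key):
    """Count the cells examined before the search can stop."""
    n = len(table)
    for j in range(n):
        slot = (key % n + j) % n
        if table[slot] is None or table[slot] == key:
            return j + 1
    return n

table = [None] * 10
for key in (89, 18, 49, 58, 9):
    insert_linear(table, key)

print(table)                          # [49, 58, 9, None, None, None, None, None, 18, 89]
print(probes_to_search(table, 28))    # 6: slots 8, 9, 0, 1, 2, then empty slot 3
print(probes_to_search(table, 35))    # 1: slot 5 is empty right away
```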

b. Quadratic probing
- resolving collisions on slot i by trying slots i+1, i+4, i+9, i+16, ...
  (wrapping around the table) until an empty one is found
- Example: with h(k) = k % 10 and a table of size 10, add 89, 18, 49, 58, 9
  i. 49 collides (89 is already in slot 9), so we search ahead by +1
     to empty slot 0
  ii. 58 collides (18 is already in slot 8), so we search ahead by +1
     to occupied slot 9, then +4 to empty slot 2
  iii. 9 collides (89 is already in slot 9), so we search ahead by +1
     to occupied slot 0, then +4 to empty slot 3
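A possible sketch of the same insertions, assuming a table of size 10 and h(k) = k % 10. The name `insert_quadratic` is hypothetical. Note that a quadratic probe sequence does not necessarily visit every cell, so the sketch can give up even when free cells remain.

```python
def insert_quadratic(table, key):
    """Try slots home, home+1, home+4, home+9, ... (mod table size)."""
    n = len(table)
    home = key % n
    for j in range(n):
        slot = (home + j * j) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    # The sequence may skip some cells, so this is not the same as "table full".
    raise RuntimeError("no free slot found on the quadratic probe sequence")

table = [None] * 10
slots = [insert_quadratic(table, key) for key in (89, 18, 49, 58, 9)]

print(slots)   # [9, 8, 0, 2, 3]
print(table)   # [49, None, 58, 9, None, None, None, None, 18, 89]
```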

c. Double Hashing
- You have a primary hash function h1(k)
- Double hashing uses a second hash function h2(k)
and handles collisions by placing an item in the first
available cell of the series
  (h1(k) + j*h2(k)) mod N for j = 0, 1, … , N-1

- The secondary hash function h2(k) cannot have zero values
- The table size N must be a prime to allow probing of all the cells
- Common choice of compression function for the secondary hash function:
  o h2(k) = q - (k mod q) where
    ▪ q < N
    ▪ q is prime
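The effect of the prime table size can be checked with a short sketch. The function `probe_sequence` is a name introduced here; it assumes h1(k) = k mod N and h2(k) = q - (k mod q) as above.

```python
def probe_sequence(key, n, q):
    """Cells visited by double hashing for j = 0, 1, ..., n-1."""
    h1 = key % n
    h2 = q - (key % q)          # always between 1 and q, so never zero
    return [(h1 + j * h2) % n for j in range(n)]

# Prime table size: the step h2 shares no factor with N, so every cell is reached.
seq = probe_sequence(44, n=13, q=7)
print(seq)                       # [5, 10, 2, 7, 12, 4, 9, 1, 6, 11, 3, 8, 0]
print(len(set(seq)))             # 13

# Non-prime table size: key 10 gets step 4, which shares a factor with 12.
bad = probe_sequence(10, n=12, q=7)
print(sorted(set(bad)))          # [2, 6, 10]
```

With N = 12 the sequence for key 10 cycles through only three cells, so an insertion could fail while most of the table is still empty.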

Example 1
Consider a hash table storing integer keys that handles collisions
with double hashing
N = 13
h(k) = k mod 13
d(k) = 7 - (k mod 7)
Insert keys 18, 41, 22, 44, 59, 32, 31, 73

k    h(k)  d(k)  Probes
18   5     3     5 is empty
41   2     1     2 is empty
22   9     6     9 is empty
44   5     5     5 is busy; (5+5) % 13 = 10 is empty
59   7     4     7 is empty
32   6     3     6 is empty
31   5     4     5 is busy; (5+4) % 13 = 9 is also busy;
                 (5+2*4) % 13 = 0 is empty
73   8     4     8 is empty

The final hash table:

0 31
1
2 41
3
4
5 18
6 32
7 59
8 73
9 22
10 44
11
12

Example 2
Consider inserting keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash
table of length m = 11 using open addressing with the hash function
h1(k) = k mod 11.
Illustrate the resulting table when using double hashing with
h2(k) = 1 + (k mod 10)
H(k) = (h1(k) + j*h2(k)) mod m, j = 0, 1, 2, …, m-1

k    h1(k)  Probes
10   10     10 is empty
22   0      0 is empty
31   9      9 is empty
4    4      4 is empty
15   4      4 is busy, so calculate h2(k) = 1 + (15 mod 10) = 6
            (4 + 6) mod 11 = 10 is busy
            (4 + 2*6) mod 11 = 5 is empty
28   6      6 is empty
17   6      6 is busy, so calculate h2(k) = 1 + (17 mod 10) = 8
            (6 + 8) mod 11 = 3 is empty
88   0      0 is busy, so calculate h2(k) = 1 + (88 mod 10) = 9
            (0 + 9) mod 11 = 9 is busy
            (0 + 2*9) mod 11 = 18 mod 11 = 7 is empty
59   4      4 is busy, so calculate h2(k) = 1 + (59 mod 10) = 10
            (4 + 10) mod 11 = 3 is busy
            (4 + 2*10) mod 11 = 24 mod 11 = 2 is empty

The final hash table will be as follows (a table of length 11 has
indices 0 to 10):

Index  0   1   2   3   4   5   6   7   8   9   10
Key    22      59  17  4   15  28  88      31  10
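Both worked examples can be cross-checked with one small sketch. The helpers `insert_double` and `build` are hypothetical names; the primary hash is assumed to be k mod (table length), and the secondary hash is passed in as a function so each example can supply its own.

```python
def insert_double(table, key, h2):
    """Insert key using probes (h1 + j*h2(key)) mod N for j = 0, 1, ..., N-1."""
    n = len(table)
    h1 = key % n
    step = h2(key)
    for j in range(n):
        slot = (h1 + j * step) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot on the probe sequence")

def build(size, keys, h2):
    table = [None] * size
    for key in keys:
        insert_double(table, key, h2)
    return table

# Example 1: N = 13, d(k) = 7 - (k mod 7)
example1 = build(13, (18, 41, 22, 44, 59, 32, 31, 73), lambda k: 7 - k % 7)
# Example 2: m = 11, h2(k) = 1 + (k mod 10)
example2 = build(11, (10, 22, 31, 4, 15, 28, 17, 88, 59), lambda k: 1 + k % 10)

print(example1)  # [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]
print(example2)  # [22, None, 59, 17, 4, 15, 28, 88, None, 31, 10]
```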

Notes on Open Addressing:
- To handle insertions and deletions, we introduce a special object,
  called AVAILABLE, which replaces deleted elements, so that later
  searches do not stop early at a deleted cell.
Remove from hash table:
- We search for an entry with key k
- If such an entry is found,
  o We replace it with the special item AVAILABLE and return the
    removed element
Insert into hash table:
- We start at cell h(k)
- We probe consecutive cells until
  o A cell i is found that is either empty or stores AVAILABLE;
    the new entry is stored in cell i.
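A minimal sketch of these rules, assuming linear probing, `None` for an empty cell, and distinct keys. The function names `insert`, `find`, and `remove` are chosen for this illustration only.

```python
AVAILABLE = object()      # special marker that replaces a deleted element

def insert(table, key):
    n = len(table)
    for j in range(n):
        i = (key % n + j) % n
        if table[i] is None or table[i] is AVAILABLE:   # empty or reusable
            table[i] = key
            return i
    raise RuntimeError("hash table is full")

def find(table, key):
    n = len(table)
    for j in range(n):
        i = (key % n + j) % n
        if table[i] is None:              # truly empty: key cannot be further on
            return None
        if table[i] is not AVAILABLE and table[i] == key:
            return i                      # AVAILABLE cells are skipped, not stops
    return None

def remove(table, key):
    i = find(table, key)
    if i is None:
        return None
    table[i] = AVAILABLE
    return key

table = [None] * 13
for key in (18, 44, 31):                  # all three hash to slot 5
    insert(table, key)

removed = remove(table, 44)               # slot 6 now stores AVAILABLE
slot_31 = find(table, 31)                 # still found, in slot 7
reused = insert(table, 57)                # 57 also hashes to 5 and reuses slot 6
print(removed, slot_31, reused)           # 44 7 6
```

If the deleted cell were simply reset to empty, the search for 31 would stop at slot 6 and wrongly report that 31 is missing.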

Analysis of hash tables
Main operation: search for an item in the table
- Worst-case cost of finding an item is O(n)

- Average case can be constant time, O(1)

- Worst-case analysis is not very informative for hash tables; look at
  the average-case cost instead

- Cost depends heavily on the load factor

- The load factor of a hash table is the ratio N / M,
  where N is the number of elements inserted in the hash table and M
  is the array size (here N counts elements, not the table size as in
  the earlier slides)

Rehashing
- increasing the size of a hash table's array
- re-storing all of the items into the new array using the hash
  function (recomputed for the new size)
  o When should we rehash?
    ▪ when the load factor reaches a certain level
    ▪ when an insertion fails
- To get O(1) average-case performance for lookups and insertions, we need
  o a good hash function
    ▪ distributes objects evenly among all indices
  o a load factor that is not too high
    ▪ choose a table size appropriate to the number of elements you
      expect to store
  o to keep rehashing to a minimum
    ▪ choose the largest initial capacity you can reasonably afford.
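One way rehashing on a load threshold might look. Everything specific here is an assumption for illustration: the class name `ProbingTable`, linear probing, the threshold of 0.5, and the growth rule 2*M + 1 (which does not guarantee a prime size). The "insertion fails" trigger is not covered by this sketch.

```python
class ProbingTable:
    def __init__(self, capacity=5, max_load=0.5):
        self.cells = [None] * capacity
        self.count = 0
        self.max_load = max_load

    def load_factor(self):
        return self.count / len(self.cells)      # stored elements / array size

    @staticmethod
    def _place(cells, key):
        """Linear probing into cells; return True if key was newly stored."""
        n = len(cells)
        for j in range(n):
            i = (key % n + j) % n
            if cells[i] is None:
                cells[i] = key
                return True
            if cells[i] == key:
                return False
        raise RuntimeError("hash table is full")

    def _rehash(self):
        bigger = [None] * (2 * len(self.cells) + 1)
        for key in self.cells:                   # re-store every item;
            if key is not None:                  # slots change because N changed
                self._place(bigger, key)
        self.cells = bigger

    def insert(self, key):
        # Grow first if this insertion would push the load past the threshold.
        if (self.count + 1) / len(self.cells) > self.max_load:
            self._rehash()
        if self._place(self.cells, key):
            self.count += 1

t = ProbingTable()
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    t.insert(key)

print(len(t.cells), t.count)     # 23 8  (the array grew 5 -> 11 -> 23)
```

Starting from a capacity of 5, eight insertions trigger two rehashes; a larger initial capacity would have avoided both, which is the trade-off the last bullet describes.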

Hash versus tree
Which is better, a hash table or a search tree?

Hash
- Better average for lookup and insertion

Tree
- Guarantee on worst-case search time
- Possible successor and predecessor search
- Easy to access items in sorted order
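The difference in supported queries can be illustrated roughly in Python. This is a stand-in, not a real search tree: Python's standard library has no balanced tree, so a sorted list searched with `bisect` plays the role of the ordered structure, and the built-in `set` plays the role of the hash table. The helpers `successor` and `predecessor` are hypothetical names.

```python
import bisect

keys = [18, 41, 22, 44, 59, 32, 31, 73]

hashed = set(keys)          # hash-based: fast membership tests, no ordering
ordered = sorted(keys)      # ordered stand-in for a search tree

def successor(ordered, k):
    """Smallest stored key strictly greater than k, or None."""
    i = bisect.bisect_right(ordered, k)
    return ordered[i] if i < len(ordered) else None

def predecessor(ordered, k):
    """Largest stored key strictly smaller than k, or None."""
    i = bisect.bisect_left(ordered, k)
    return ordered[i - 1] if i > 0 else None

print(44 in hashed)                 # True
print(successor(ordered, 44))       # 59
print(predecessor(ordered, 44))     # 41
print(ordered)                      # [18, 22, 31, 32, 41, 44, 59, 73]
```

The hash-based set answers "is 44 stored?" quickly but cannot say which key comes next without examining every element; the ordered structure answers both.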
