Hash Tables

Unit – III – Chapter 5 of Data Structures and Algorithm Analysis in C++ - Mark
Allen Weiss
Array as table

studid   name     score
0012345  Kuzhali  81.5
0033333  Ezhil    90
0056789  Arasi    56.8
...
9801010  Abraham  20
9802020  John     100
...
9903030  Ismail   73
9908080  Begham   49

Consider this problem. We want to store 1,000 student records and search them by student id.

Array as table

One ‘stupid’ way is to store the records in a huge array (index 0..9999999). The index is used as the student id, i.e. the record of the student with studid 0012345 is stored at A[12345].

index    name     score
0        :        :
:        :        :
12345    Kuzhali  81.5
:        :        :
33333    Ezhil    90
:        :        :
56789    Arasi    56.8
:        :        :
9908080  Begham   49
:        :        :
9999999  :        :

Array as table
• Store the records in a huge array where the index corresponds to the key
   • add - very fast, O(1)
   • delete - very fast, O(1)
   • search - very fast, O(1)
• But it wastes a lot of memory: 10,000,000 array slots for only 1,000 records. Not feasible.
• So we need hash tables and hash functions

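
The array-as-table idea can be sketched in C++ as below. This is only an illustration: the names `Student`, `DirectTable`, `add`, `remove` and `search` are hypothetical and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type mirroring the name and score columns of the table.
struct Student {
    std::string name;
    double score;
};

// Direct-address table: the student id is used directly as the array index.
class DirectTable {
public:
    // Allocates one slot for every possible id in 0..maxId.
    explicit DirectTable(std::size_t maxId) : slots_(maxId + 1) {}

    void add(std::size_t id, Student s) {          // O(1)
        assert(id < slots_.size());
        slots_[id] = std::move(s);
    }
    void remove(std::size_t id) {                  // O(1)
        assert(id < slots_.size());
        slots_[id].reset();
    }
    // Returns nullptr when no record is stored under this id.  O(1)
    const Student* search(std::size_t id) const {
        if (id >= slots_.size() || !slots_[id]) return nullptr;
        return &*slots_[id];
    }
    std::size_t capacity() const { return slots_.size(); }

private:
    std::vector<std::optional<Student>> slots_;
};
```

Constructing `DirectTable(9999999)` would allocate ten million slots even if only 1,000 of them are ever filled, which is the memory problem noted above.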
Hash Table
• A hash table is a data structure that stores items and allows insertions, lookups, and deletions to be performed in O(1) time on average.
• A hash function converts an object, typically a string, to a number. Then the number is compressed according to the size of the table and used as an index.
• Distinct items may be mapped to the same index. This is called a collision and must be resolved.

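
The two steps, converting an object to a number and then compressing it to a table index, might look like the sketch below. The function names `toNumber` and `toIndex` and the multiplier 37 are assumptions chosen for illustration; the slides do not prescribe a particular string hash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Step 1: convert the object (here a string) to a number.
// The multiplier 37 is an illustrative choice, not taken from the slides.
std::size_t toNumber(const std::string& key) {
    std::size_t value = 0;
    for (unsigned char ch : key)
        value = 37 * value + ch;   // unsigned arithmetic wraps around on overflow
    return value;
}

// Step 2: compress the number to a valid index for a table of the given size.
std::size_t toIndex(const std::string& key, std::size_t tableSize) {
    assert(tableSize > 0);
    return toNumber(key) % tableSize;
}
```

Two different strings can still produce the same index, which is exactly the collision case described above.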
Hash Table Example
• The simplest kind of hash table is an array of records.
• This example has 701 records.

[0]  [1]  [2]  [3]  [4]  [5]  ...  [700]
An array of records

Hash Table
• Each record has a special field, called its key.
• In this example, the key is a long integer field called Number.
• The number might be a person's identification number, and the rest of the record has information about the person.

Example record at [4]: Number 506643548

Hash Table
• When a hash table is in use, some spots contain valid records, and other spots are "empty".

[0]  [1]        [2]        [3]  [4]        [5]  ...  [700]
     281942902  233667136       506643548            155778322

Inserting a new record
• In order to insert a new record, the key must somehow be converted to an array index.
• The index is called the hash value of the key.

New record: Number 580625685

Hash Function h(k)
h(k) = k mod m, where m is the hash table length.
• A typical way to create a hash value: (Number mod 701)
• What is (580625685 mod 701)? It is 3.

Inserting a new record
• The hash value is used for the location of the new record.

[0]  [1]        [2]        [3]        [4]        [5]  ...  [700]
     281942902  233667136  580625685  506643548            155778322

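
As a check on the arithmetic: 701 × 828282 = 580625682, and 580625685 − 580625682 = 3, so the hash value is 3. A minimal C++ sketch of h(k) = k mod m follows; the function name `hashValue` is a hypothetical choice.

```cpp
#include <cassert>
#include <cstddef>

// h(k) = k mod m, where m is the hash table length.
std::size_t hashValue(unsigned long long key, std::size_t tableLength) {
    assert(tableLength > 0);
    return static_cast<std::size_t>(key % tableLength);
}
```

With a table length of 701, `hashValue(580625685ULL, 701)` gives the index of the new record in the example.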
Collisions
• Here is another new record to insert, with Number 701466868 and a hash value of 2.
• This is called a collision, because there is already another valid record at [2].
• When a collision occurs, move forward until you find an empty spot. Here [2], [3] and [4] are occupied, and [5] is the first empty spot.
• The new record goes in the empty spot.

[0]  [1]        [2]        [3]        [4]        [5]        ...  [700]
     281942902  233667136  580625685  506643548  701466868       155778322

Searching for a key
• Calculate the hash value. For Number 701466868 the hash value is 2.
• Check that location of the array for the key.
• Keep moving forward until you find the key, or you reach an empty spot. Here [2], [3] and [4] hold other keys ("Not me."), and [5] holds the key ("Yes").

[0]  [1]        [2]        [3]        [4]        [5]        ...  [700]
     281942902  233667136  580625685  506643548  701466868       155778322
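
The insert and search procedures walked through above can be sketched in C++ as follows. This is a minimal sketch under assumptions: the class name `RecordTable` and its methods are hypothetical, only the key is stored rather than a full record, and deletion is left out because the slides do not cover it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

class RecordTable {
public:
    explicit RecordTable(std::size_t length) : slots_(length) { assert(length > 0); }

    // Starts at key mod length and moves forward (with wraparound) to the first
    // empty spot. Returns the index used, or nullopt if the table is full.
    std::optional<std::size_t> insert(unsigned long long key) {
        const std::size_t start = key % slots_.size();
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const std::size_t i = (start + step) % slots_.size();
            if (!slots_[i] || *slots_[i] == key) {
                slots_[i] = key;
                return i;
            }
        }
        return std::nullopt;
    }

    // Moves forward from the hash value until the key or an empty spot is found.
    std::optional<std::size_t> search(unsigned long long key) const {
        const std::size_t start = key % slots_.size();
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const std::size_t i = (start + step) % slots_.size();
            if (!slots_[i]) return std::nullopt;   // empty spot: key is absent
            if (*slots_[i] == key) return i;       // "Yes"
        }
        return std::nullopt;
    }

private:
    std::vector<std::optional<unsigned long long>> slots_;
};
```

Inserting the six Number values from the example into a `RecordTable(701)` in the order shown reproduces the layout of the figure.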
Solutions to Collision
• The problem arises because we have two keys that hash to the same array entry, a collision. There are two ways to resolve a collision:
   • Hashing with Chaining: every hash table entry contains a pointer to a linked list of the keys that hash to that entry
   • Hashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find an empty entry to place the new key
      • Linear Probing
      • Quadratic Probing
      • Double Hashing (or Rehashing)

Chained Hash Table
One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.

index    entry
0        -> list
1        nil
2        nil
3        -> list
4        nil
5        -> list
:
HASHMAX  nil

Example list node: Key: 9903030, name: Ezhilan, score: 73

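
A chained table along these lines can be sketched with the standard library. The names `Record`, `ChainedTable`, `insert`, `find` and `chainLength` are hypothetical, and an empty `std::list` plays the role of a nil entry.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical record type with the fields shown in the list node.
struct Record {
    unsigned long key;
    std::string name;
    double score;
};

class ChainedTable {
public:
    explicit ChainedTable(std::size_t size) : buckets_(size) { assert(size > 0); }

    // Records whose keys collide are kept in the same linked list.
    void insert(const Record& r) {
        auto& chain = buckets_[r.key % buckets_.size()];
        for (auto& existing : chain)
            if (existing.key == r.key) { existing = r; return; }  // overwrite duplicate key
        chain.push_front(r);
    }

    // Walks only the chain for this key's hash value; nullptr if not found.
    const Record* find(unsigned long key) const {
        for (const auto& r : buckets_[key % buckets_.size()])
            if (r.key == key) return &r;
        return nullptr;
    }

    std::size_t chainLength(std::size_t index) const { return buckets_[index].size(); }

private:
    std::vector<std::list<Record>> buckets_;   // an empty list stands for nil
};
```

With a table of size 10, the keys 9903030 and 9908080 both hash to entry 0 and end up on the same chain.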
Hash Table without Linked List
• Separate chaining has the disadvantage of using linked lists.
• The normal hash function is hash(x) = x mod TableSize.
• With open addressing, a collision is resolved by trying alternative cells h0(x), h1(x), h2(x), ... in turn, where hi(x) = (hash(x) + f(i)) mod TableSize and f(0) = 0.
• The function f is the collision resolution strategy.
• There are 3 collision resolution strategies:
   • Linear probing
   • Quadratic probing
   • Double hashing (or rehashing)

Linear Probing
Linear probing: Given auxiliary hash function h, the probe sequence starts at slot h(k) and continues sequentially through the table, wrapping after slot m − 1 to slot 0. Given key k and probe number i (0 ≤ i < m),
h(k, i) = (h(k) + i) mod m.
In linear probing, collisions are resolved by sequentially scanning an array (with wraparound) until an empty cell is found.

Example
Key k: [89, 18, 49, 58, 69], m = 10, h(k) = k mod m
h(89) = 89 mod 10 = 9
h(18) = 18 mod 10 = 8
h(49) = 49 mod 10 = 9
h(58) = 58 mod 10 = 8
h(69) = 69 mod 10 = 9

Empty table: slots 0 to 9, all empty.

Hash table with linear probing
Example
Key k: [89, 18, 49, 58, 69], m = 10, i (0 ≤ i < m), h(k, i) = (h(k) + i) mod m.

Insert 89: h(89) is 9
h(89,0) = (9 + 0) mod 10 = 9. Slot 9 is empty, so 89 is stored at 9.

Insert 18: h(18) is 8
h(18,0) = (8 + 0) mod 10 = 8. Slot 8 is empty, so 18 is stored at 8.

Insert 49: h(49) is 9
h(49,0) = (9 + 0) mod 10 = 9. Collision. Due to the collision, probe for an empty cell by incrementing the i value.
h(49,1) = (9 + 1) mod 10 = 0. Slot 0 is empty, so 49 is stored at 0.

Insert 58: h(58) is 8
h(58,0) = (8 + 0) mod 10 = 8. Collision. Increment the i value.
h(58,1) = (8 + 1) mod 10 = 9. Collision. Again increment the i value.
h(58,2) = (8 + 2) mod 10 = 0. Collision. Again increment the i value.
h(58,3) = (8 + 3) mod 10 = 1. Slot 1 is empty, so 58 is stored at 1.

Hash table with linear probing, after 58:
slot  key
0     49
1     58
2
3
4
5
6
7
8     18
9     89

Questions:
• What is the storage index of key value 69?
• How many probes will be taken?
• What is the value of i after 69 is stored?

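
The insertion steps of the linear probing example can be replayed with a short C++ sketch. The function names `linearProbe` and `buildLinear` are hypothetical, and keys are assumed to be non-negative integers.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Probe function from the slides: h(k, i) = (h(k) + i) mod m, with h(k) = k mod m.
// Assumes key >= 0.
std::size_t linearProbe(int key, std::size_t i, std::size_t m) {
    return (static_cast<std::size_t>(key) % m + i) % m;
}

// Inserts the keys in order into an empty table of size m and returns the table.
// Empty cells are std::nullopt. A key is dropped only if all m slots are full.
std::vector<std::optional<int>> buildLinear(const std::vector<int>& keys, std::size_t m) {
    assert(m > 0);
    std::vector<std::optional<int>> table(m);
    for (int key : keys) {
        for (std::size_t i = 0; i < m; ++i) {        // i is the probe number
            const std::size_t slot = linearProbe(key, i, m);
            if (!table[slot]) {                       // first empty cell wins
                table[slot] = key;
                break;
            }
        }
    }
    return table;
}
```

Calling `buildLinear({89, 18, 49, 58, 69}, 10)` and inspecting the result is one way to check answers to the questions above.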
Clustering
• The position of the initial mapping i0 of key k is called the home position of k.
• When several insertions map to the same home position, they end up placed contiguously in the table. Such a contiguous block of occupied cells is called a cluster.
• Primary clustering: any key that hashes into the cluster will require several attempts to resolve the collision, and then it will be added to the cluster.

Clustering
Example
Key k: [89, 18, 49, 58, 69], m = 10, i (0 ≤ i < m), h(k, i) = (h(k) + i) mod m.

After 89: h(89) is 9
h(89,0) = (9 + 0) mod 10 = 9
Primary cluster: slot 9
Slot probed: 9

After 18: h(18) is 8
h(18,0) = (8 + 0) mod 10 = 8
Primary cluster: slots 9, 8
Slot probed: 8

Hash table after 18:
slot  key
8     18
9     89

Questions:
• What are the values for secondary clustering?

Quadratic Probing
• Quadratic probing is a collision resolution method that eliminates the primary clustering problem of linear probing.
• In this probing, the function f is quadratic, i.e., f(i) = i^2.
• Probing sequence is
h(k, 0) = (h(k) + 0) mod m
h(k, 1) = (h(k) + 1) mod m
h(k, 2) = (h(k) + 4) mod m
h(k, 3) = (h(k) + 9) mod m
i.e.,
h(k, i) = (h(k) + i^2) mod m.

Quadratic Probing - Example
Key k: [89, 18, 49, 58, 69], m = 10, i (0 ≤ i < m), h(k, i) = (h(k) + i^2) mod m.

Insert 89: h(89) is 9
h(89,0) = (9 + 0) mod 10 = 9. Slot 9 is empty, so 89 is stored at 9.

Insert 18: h(18) is 8
h(18,0) = (8 + 0) mod 10 = 8. Slot 8 is empty, so 18 is stored at 8.

Insert 49: h(49) is 9
h(49,0) = (9 + 0) mod 10 = 9. Collision.
h(49,1) = (9 + 1^2) mod 10 = 0. Slot 0 is empty, so 49 is stored at 0.

Insert 58: h(58) is 8
h(58,0) = (8 + 0^2) mod 10 = 8. Collision.
h(58,1) = (8 + 1^2) mod 10 = 9. Collision.
h(58,2) = (8 + 2^2) mod 10 = 2. Empty spot, so 58 is stored at 2.

Insert 69: h(69) is 9
h(69,0) = (9 + 0^2) mod 10 = 9. Collision.
h(69,1) = (9 + 1^2) mod 10 = 0. Collision.
h(69,2) = (9 + 2^2) mod 10 = 3. Empty spot, so 69 is stored at 3.

Hash table with quadratic probing, after 69:
slot  key
0     49
1
2     58
3     69
4
5
6
7
8     18
9     89

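
The quadratic probe sequence from the example can be sketched in C++ as below. The alias `Table` and the function `insertQuadratic` are hypothetical names, keys are assumed non-negative, and the sketch simply gives up after m probes, since quadratic probing is not guaranteed to reach every cell.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Table = std::vector<std::optional<int>>;   // std::nullopt marks an empty cell

// Tries h(k, i) = (h(k) + i*i) mod m for i = 0, 1, ..., m - 1, with h(k) = k mod m.
// Returns the slot used, or nullopt if none of those probes finds an empty cell.
std::optional<std::size_t> insertQuadratic(Table& table, int key) {
    assert(key >= 0 && !table.empty());
    const std::size_t m = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % m;
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t slot = (home + i * i) % m;
        if (!table[slot]) {
            table[slot] = key;
            return slot;
        }
    }
    return std::nullopt;
}
```

Starting from `Table t(10)` and inserting 89, 18, 49, 58 and 69 in that order follows the same probes as the worked example.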
Double Hashing
• The last collision resolution method we will examine is double hashing.
• Quadratic probing problem: secondary clustering. Elements that hash to the same position will probe the same alternative cells.
• The probe sequence combines two different hash functions:
h(k, 0) = (h1(k)) mod m
h(k, 1) = (h1(k) + 1 · hash2(k)) mod m
h(k, 2) = (h1(k) + 2 · hash2(k)) mod m
i.e.,
f(i) = i · hash2(k)
hash2(k) = R - (k mod R), where R is a prime number less than the table size.
h(k, i) = (h1(k) + i · hash2(k)) mod m.

Double Hashing
Example
Key k: [89, 18, 49, 58, 69], m = 10, R = 7, i (0 ≤ i < m), h(k, i) = (h1(k) + i · hash2(k)) mod m.

h(89,0) = 9. No collision, so 89 is stored at 9.
h(18,0) = 8. No collision, so 18 is stored at 8.
h(49,0) = 9. Collision.
h(49,1) = (h1(49) + 1 · hash2(49)) mod 10
        = (9 + 1 · (7 - (49 mod 7))) mod 10
        = (9 + 1 · (7 - 0)) mod 10
        = (9 + 7) mod 10
        = 16 mod 10 = 6. So 49 is stored at 6.

h(58) = ? h(69) = ?

Hash table with double hashing, after all five keys:
slot  key
0     69
1
2
3     58
4
5
6     49
7
8     18
9     89

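
A C++ sketch of the double hashing probe, using hash2(k) = R - (k mod R) as defined on the slides, is given below. The names `Table`, `hash2` and `insertDouble` are illustrative, keys are assumed non-negative, and the sketch stops after m probes because a step size that shares a factor with m cannot reach every cell.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Table = std::vector<std::optional<int>>;   // std::nullopt marks an empty cell

// Second hash function: R - (k mod R). The result lies in 1..R, so it is never zero.
std::size_t hash2(int key, int R) {
    return static_cast<std::size_t>(R - key % R);
}

// Tries h(k, i) = (h1(k) + i * hash2(k)) mod m for i = 0, 1, ..., m - 1,
// with h1(k) = k mod m. Returns the slot used, or nullopt if no probe succeeds.
std::optional<std::size_t> insertDouble(Table& table, int key, int R) {
    assert(key >= 0 && R > 0 && !table.empty());
    const std::size_t m = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % m;
    const std::size_t step = hash2(key, R);
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t slot = (home + i * step) % m;
        if (!table[slot]) {
            table[slot] = key;
            return slot;
        }
    }
    return std::nullopt;
}
```

Running it with `Table t(10)` and R = 7 on the keys 89, 18, 49, 58, 69 gives one way to check the values asked for in h(58) and h(69).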
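Rehashing
• Increase the size of the hash table when the load factor (number of stored elements divided by the table size) becomes too high.
• Typically expand the table to about twice its size, choosing a prime number as the new size.
• Reinsert the existing elements into the new hash table, since their hash values depend on the table size.
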
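
The three rehashing steps might be sketched in C++ as follows. Everything here is an assumption made for illustration: the helper names (`loadFactor`, `nextPrime`, `insertKey`, `rehash`), the use of linear probing for reinsertion, and the choice of the first prime at or above twice the old size.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Table = std::vector<std::optional<int>>;   // std::nullopt marks an empty cell

// Load factor: number of stored elements divided by the table size.
double loadFactor(const Table& table) {
    std::size_t used = 0;
    for (const auto& cell : table)
        if (cell) ++used;
    return static_cast<double>(used) / static_cast<double>(table.size());
}

bool isPrime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime greater than or equal to n.
std::size_t nextPrime(std::size_t n) {
    while (!isPrime(n)) ++n;
    return n;
}

// Linear-probing insert. Assumes key >= 0 and at least one empty cell.
void insertKey(Table& table, int key) {
    assert(key >= 0);
    const std::size_t m = table.size();
    std::size_t slot = static_cast<std::size_t>(key) % m;
    while (table[slot]) slot = (slot + 1) % m;
    table[slot] = key;
}

// Builds a table of roughly twice the size (still prime) and reinserts every
// element, because each hash value depends on the table size.
Table rehash(const Table& old) {
    Table bigger(nextPrime(2 * old.size()));
    for (const auto& cell : old)
        if (cell) insertKey(bigger, *cell);
    return bigger;
}
```

For example, a table of size 7 holding five keys has a load factor above 0.7; `rehash` moves it to a table of size 17, where the same keys occupy less than a third of the cells.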
Rehashing Example
Problem with large tables
Extensible Hashing
Extensible Hashing Example
Unit – III Completed
