
Hash Tables

Hashing
Hashing is a technique that is used to uniquely identify a specific
object from a group of similar objects [1].

• In universities, each student is assigned a unique roll number that can be used to retrieve information about them.
• In libraries, each book is assigned a unique number that can be used to determine information about the book, such as its exact position in the library or the users it has been issued to.
Hash Table
• A simple array-like data structure
• Stores (key, value) pairs
• In a hash table, an item's key is the value used to map to an index.
• For all items that might possibly be stored in the hash table, every key is ideally unique, so that the hash table's algorithms can search for a specific item by that key.
• Data is stored in buckets: each hash table array element is called a bucket.
Hash Function
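A hash function computes a bucket index from the item's key.
• The idea is to map each element's key to a bucket index, so the element can be located directly instead of by scanning.
• Time complexity: with a good hash function and few collisions, an element can be accessed in O(1) time on average.
• A common hash function is modulo: key % table size.
• Collisions: different keys can map to the same bucket, and this must be handled.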
Collisions
• A collision occurs when an item being inserted into a hash
table maps to the same bucket as an existing item in the
hash table.
• Ex: For a hash function of key % 10, 44 would be inserted in
bucket 44 % 10 = 4; later inserting 94 would yield a collision
because 94 % 10 is also 4.

• Chaining
• Chaining is a collision resolution technique where each bucket
has a list of items (so bucket 4's list would become 44, 94).
• Open addressing (closed hashing)
• Open addressing is a collision resolution technique where
collisions are resolved by looking for an empty bucket elsewhere
in the table (so 94 might be stored in bucket 5).

Source: [2]
Hash Function
• A common hash function uses the modulo operator %, which
computes the integer remainder when dividing two numbers.
• Ex: For a 20 element hash table, a hash function of key % 20 will
map keys to bucket indices 0 to 19.

• Input: key
• Output (bucket index): key % hash_table_size

Basic requirements for a good hash function:
• Easy to compute
• Uniform distribution of keys across buckets
• Few collisions
Example: Hash Table

Source: [2]
Operations
• A hash table's operations of insert, remove, and search each
use the hash function to determine an item's bucket.
• Ex: Inserting 215 first determines the bucket to be 215 % 10 = 5.

• Insert(key, value)
• Delete(key)
• Search(key)
• The implementation of each operation depends on how collisions are handled.

Example 1
• Hash table size = 5
• Keys = "cat", "dog", "wolf", "mouse"
• Hash function:
• (Sum of the ASCII values of each character) % hash table size

• Example:
• "cat": (99+97+116) % 5 = 312 % 5 = 2
• "dog": (100+111+103) % 5 = 314 % 5 = 4
• "wolf": (119+111+108+102) % 5 = 440 % 5 = 0
• "mouse": (109+111+117+115+101) % 5 = 553 % 5 = 3

Resulting table:
Bucket 0: wolf
Bucket 1: (empty)
Bucket 2: cat
Bucket 3: mouse
Bucket 4: dog

Example: collision
• Hash table size = 7
• Keys = "cat", "dog", "wolf", "mouse"
• Hash function:
• (Sum of the ASCII values of each character) % hash table size

• Example:
• "cat": (99+97+116) % 7 = 312 % 7 = 4
• "dog": (100+111+103) % 7 = 314 % 7 = 6
• "wolf": (119+111+108+102) % 7 = 440 % 7 = 6, which collides with "dog"
• "mouse": (109+111+117+115+101) % 7 = 553 % 7 = 0

Table after inserting "cat" and "dog":
Bucket 4: cat
Bucket 6: dog
(buckets 0-3 and 5 are empty)
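The arithmetic above can be checked with a minimal C++ sketch of the summation hash. The function name asciiSumHash is our own and does not come from the slides, and the sketch assumes plain ASCII keys. With a table size of 5 it reproduces Example 1. With size 7 it shows "dog" and "wolf" landing in the same bucket, 6.

```cpp
#include <cassert>
#include <string>

// Sum of the ASCII codes of every character in the key, reduced modulo the
// table size. Hypothetical helper name; tableSize must be positive.
int asciiSumHash(const std::string& key, int tableSize) {
    assert(tableSize > 0);
    int sum = 0;
    for (unsigned char c : key) {
        sum += c;  // e.g. 'c' = 99, 'a' = 97, 't' = 116
    }
    return sum % tableSize;
}
```

Summing character codes is simple, but anagrams such as "act" and "cat" always collide. This is one reason the multiplicative and Adler-32 string hashes later in the deck weight characters by position.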
Collision Strategies
• Chaining
• Linear Probing
• Quadratic Probing
• Double Hashing
Chaining
• Chaining handles hash table collisions by using a list for each
bucket, where each list may store multiple items that map to
the same bucket.

• Insert
• The insert operation first uses the item's key to determine
the bucket, and then inserts the item in that bucket's list.
• Search
• Searching also first determines the bucket, and then
searches the bucket's list.
• Delete
• Delete first searches for the item; if it is found, the item is removed from its bucket's list.

Source: [2]
Chaining
Assume table is vector<vector<int>> and, as in the examples, each stored value equals its key.

Insert(key, value)
{
    int pos = HashF(key)
    table[pos].push_back(value)         // append to the bucket's list
}

Search(key)
{
    int pos = HashF(key)
    for(int i = 0; i < table[pos].size(); i++)
    {
        if(table[pos][i] == key)
            return table[pos][i]
    }
    return -1                           // not found
}

Delete(key)
{
    int pos = HashF(key)
    val = Search(key)
    if(val != -1)
    {
        remove(table[pos], val)         // remove the item from bucket pos's list
    }
}
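A self-contained C++ sketch of the chaining operations above is shown below. It assumes int keys, a key % size hash, and the slides' conventions that each stored value equals its key and that -1 means "not found". The class name ChainedHashTable and its method names are hypothetical. Each bucket's chain is a std::list rather than vector<int>, so erasing an item does not shift the rest of the chain.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Minimal chaining hash table for non-negative int keys (value == key).
class ChainedHashTable {
public:
    explicit ChainedHashTable(int size) : buckets_(size) { assert(size > 0); }

    void insert(int key) {
        buckets_[hash(key)].push_back(key);  // append to the bucket's chain
    }

    // Returns the stored value (== key), or -1 if the key is absent.
    int search(int key) const {
        for (int item : buckets_[hash(key)]) {
            if (item == key) return item;
        }
        return -1;
    }

    // Removes the first matching item; returns false if none was found.
    bool remove(int key) {
        std::list<int>& chain = buckets_[hash(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (*it == key) {
                chain.erase(it);
                return true;
            }
        }
        return false;
    }

    const std::list<int>& bucket(int index) const { return buckets_[index]; }

private:
    int hash(int key) const { return key % static_cast<int>(buckets_.size()); }

    std::vector<std::list<int>> buckets_;
};
```

Inserting 44, 98, 8, 94, 11, 732 and 26 into a 10-bucket table gives the same chains as the insert example that follows, for instance bucket 4 holds 44 then 94.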
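Chaining: Insert
• Hash function: key % 10
• Insert(44, 44)
• Insert(98, 98)
• Insert(8, 8)
• Insert(94, 94)
• Insert(11, 11)
• Insert(732, 732)
• Insert(26, 26)

Resulting table:
Bucket 0: (empty)
Bucket 1: 11
Bucket 2: 732
Bucket 3: (empty)
Bucket 4: 44, 94
Bucket 5: (empty)
Bucket 6: 26
Bucket 7: (empty)
Bucket 8: 98, 8
Bucket 9: (empty)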
Chaining: Search
Search(94) // returns 94: bucket 94 % 10 = 4 holds 44, 94
Search(67) // returns -1: bucket 67 % 10 = 7 is empty

Table: bucket 1: 11; bucket 2: 732; bucket 4: 44, 94; bucket 6: 26; bucket 8: 98, 8 (all other buckets empty)
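Chaining: Delete
• Delete(44)
• Delete(8)
• Delete(77)

Bucket | Before | After
0 | | 
1 | 11 | 11
2 | 732 | 732
3 | | 
4 | 44, 94 | 94
5 | | 
6 | 26 | 26
7 | | 
8 | 98, 8 | 98
9 | | 

Delete(77) finds nothing in bucket 77 % 10 = 7, so it leaves the table unchanged.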
Linear Probing
• A hash table with linear probing handles a collision by starting at the key's mapped bucket and then linearly searching subsequent buckets until an empty bucket is found.
• Hash function: key % 10
• Insert(44, 44)
• Insert(98, 98)
• Insert(8, 8)
• Insert(94, 94)
• Insert(35, 35) ?
• Insert(48, 48) ?

Table after the first four inserts: bucket 4: 44, bucket 5: 94, bucket 8: 98, bucket 9: 8 (all other buckets empty)
Linear Probing
• Types of empty bucket:
• Empty-since-start
• An empty-since-start bucket has been empty since the hash table
was created.
• Empty-after-removal
• An empty-after-removal bucket had an item removed that caused
the bucket to now be empty.
• The distinction matters during searches: a search stops at an empty-since-start bucket but continues past an empty-after-removal bucket. Insertion can use either kind of empty bucket.
• If the probing reaches the last bucket, it continues at bucket 0.
• The insert operation returns true if the item was inserted and returns false if all buckets are occupied.

Source: [2]
Linear Probing: Insert
Insert(key, value)
{
    int pos = HashF(key)
    int probed = 0
    while(probed < table.size())
    {
        if(table[pos] is Empty)        // empty-since-start or empty-after-removal
        {
            table[pos] = value
            return true
        }
        pos = (pos + 1) % table.size() // wrap around to bucket 0
        probed++
    }
    return false                       // all buckets are occupied
}

Examples: Insert(42, 42), Insert(19, 19)
Source: [2]
Linear Probing: Delete
Delete(key)
{
    int pos = HashF(key)
    int probed = 0
    while(probed < table.size() && table[pos] is not EmptySinceStart)
    {
        if(table[pos] is Occupied && table[pos].key == key)
        {
            table[pos].clear()
            table[pos] = EmptyAfterRemoval
            return                     // key found and removed
        }
        pos = (pos + 1) % table.size()
        probed++
    }
}

Examples: Delete(202), Delete(19)
Source: [2]
Linear Probing: Search
Search(key)
{
    int pos = HashF(key)
    int probed = 0
    while(probed < table.size() && table[pos] is not EmptySinceStart)
    {
        if(table[pos] is Occupied && table[pos].key == key)
        {
            return table[pos].value
        }
        pos = (pos + 1) % table.size()
        probed++
    }
    return -1                          // not found
}

Examples:
Search(42) // returns 42
Search(19) // returns 19
Search(115) // returns -1
Source: [2]
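The three linear-probing routines can be combined into one C++ sketch. This is a minimal illustration, not the slides' reference code. LinearProbingTable, find and the State enum are hypothetical names, keys are non-negative ints stored as their own values, and the hash is assumed to be key % size. Each bucket records whether it is empty-since-start, empty-after-removal or occupied, so a search keeps probing past removed items.

```cpp
#include <cassert>
#include <vector>

// Linear probing sketch for non-negative int keys (each value equals its key).
class LinearProbingTable {
public:
    enum class State { EmptySinceStart, EmptyAfterRemoval, Occupied };

    explicit LinearProbingTable(int size)
        : keys_(size, 0), states_(size, State::EmptySinceStart) {
        assert(size > 0);
    }

    // Returns false only if every bucket is occupied.
    bool insert(int key) {
        int pos = key % size();
        for (int probed = 0; probed < size(); ++probed) {
            if (states_[pos] != State::Occupied) {  // either kind of empty bucket
                keys_[pos] = key;
                states_[pos] = State::Occupied;
                return true;
            }
            pos = (pos + 1) % size();  // wrap from the last bucket to bucket 0
        }
        return false;
    }

    // Returns the bucket index holding key, or -1 if it is absent.
    int find(int key) const {
        int pos = key % size();
        for (int probed = 0; probed < size() && states_[pos] != State::EmptySinceStart; ++probed) {
            if (states_[pos] == State::Occupied && keys_[pos] == key) return pos;
            pos = (pos + 1) % size();
        }
        return -1;
    }

    // Mirrors the slides' Search: the stored value (== key), or -1.
    int search(int key) const { return find(key) == -1 ? -1 : key; }

    bool remove(int key) {
        int pos = find(key);
        if (pos == -1) return false;
        states_[pos] = State::EmptyAfterRemoval;  // not EmptySinceStart
        return true;
    }

private:
    int size() const { return static_cast<int>(keys_.size()); }

    std::vector<int> keys_;
    std::vector<State> states_;
};
```

With a 10-bucket table, inserting 44, 98, 8 and 94 places them in buckets 4, 5, 8 and 9, as in the earlier example. Removing 44 then marks bucket 4 empty-after-removal. Search(94) still succeeds because the probe continues past that bucket.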
Quadratic Probing
• A hash table with quadratic probing handles a collision by starting at the key's mapped bucket and then checking buckets at quadratically increasing offsets until an empty bucket is found.
• If an item's mapped bucket is H, the following formula determines the item's index in the hash table:
• (H + c1*i + c2*i^2) % table_size
• c1 and c2 are programmer-defined constants for quadratic probing.
• Inserting a key uses the formula, starting with i = 0, to repeatedly search the hash table until an empty bucket is found.
• Each time an empty bucket is not found, i is incremented by 1.
• Iterating through sequential i values to obtain the desired table index is called the probing sequence.
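The probing sequence can be generated directly from the formula. The C++ function below lists the first count indices for a given H, c1, c2 and table size; the name quadraticProbeSequence is ours. With H = 0, c1 = c2 = 1 and 16 buckets it gives 0, 2, 6, 12. With H = 5, c1 = 5, c2 = 7 and 10 buckets it gives 5, 7. Both match the worked examples in this section.

```cpp
#include <cassert>
#include <vector>

// Bucket indices visited by quadratic probing:
// (H + c1*i + c2*i*i) % tableSize for i = 0 .. count-1, where H is the
// key's mapped bucket. Hypothetical helper for illustration only.
std::vector<int> quadraticProbeSequence(int H, int c1, int c2, int tableSize, int count) {
    assert(tableSize > 0);
    std::vector<int> seq;
    for (int i = 0; i < count; ++i) {
        seq.push_back((H + c1 * i + c2 * i * i) % tableSize);
    }
    return seq;
}
```

The sequence need not reach every bucket. With c1 = c2 = 1, the offset i + i*i = i(i + 1) is always even, so from H = 0 in a 16-bucket table only the even-numbered buckets are ever probed.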

Source: [2]
Quadratic Probing: Insert
Hash function: key % 10
Assuming c1 = 5, c2 = 7
Quadratic probing sequence: (H + 5*i + 7*i*i) % 10

Operation | H(key) | i | Bucket index | Bucket empty?
Insert key 75 | 75 % 10 = 5 | 0 | (5 + 5*0 + 7*0*0) % 10 = 5 | Yes
Insert key 46 | 46 % 10 = 6 | 0 | (6 + 5*0 + 7*0*0) % 10 = 6 | Yes
Insert key 35 | 35 % 10 = 5 | 0 | (5 + 5*0 + 7*0*0) % 10 = 5 | No
 | | 1 | (5 + 5*1 + 7*1*1) % 10 = 7 | Yes

Hash table after the inserts: bucket 5: 75, bucket 6: 46, bucket 7: 35 (all other buckets empty)
Quadratic Probing: Insert
• Hash function: key % 16
• Assuming c1 = 1, c2 = 1.
• Insertion order: 49, 32, 3, 99, 16, 64, 23, 42, 11

• Probing sequence when inserting 48 into the table:
• i=0: (48%16 + 1*0 + 1*0*0)%16 = 0
• i=1: (48%16 + 1*1 + 1*1*1)%16 = 2
• i=2: (48%16 + 1*2 + 1*2*2)%16 = 6
• i=3: (48%16 + 1*3 + 1*3*3)%16 = 12
• Buckets 0, 2 and 6 already hold 32, 16 and 64, so 48 is stored in the empty bucket 12.

Source: [2]
Quadratic Probing: Delete
Hash function: key % 16, assuming c1 = 1, c2 = 1
Insertion order: 49, 32, 3, 99, 16, 64, 23, 42, 11

Operation | H(key) | i | Bucket index
Delete key 32 | 32 % 16 = 0 | 0 | (0 + 1*0 + 1*0*0) % 16 = 0

• Bucket 0 holds 32, so 32 is removed and bucket 0 becomes empty-after-removal.

Source: [2]
Quadratic Probing: Search
Hash function: key % 16, Assuming c1 = 1, c2 = 1

Source: [2]
Quadratic Probing
Insert(key, value)
{
    int pos = HashF(key)
    int probed = 0
    while(probed < table.size())
    {
        if(table[pos] is Empty)        // empty-since-start or empty-after-removal
        {
            table[pos] = value
            return true
        }
        probed++
        pos = (HashF(key) + c1*probed + c2*probed*probed) % table.size()
    }
    return false
}

Search(key)
{
    int pos = HashF(key)
    int probed = 0
    while(probed < table.size() && table[pos] is not EmptySinceStart)
    {
        if(table[pos] is Occupied && table[pos].key == key)
        {
            return table[pos].value
        }
        probed++
        pos = (HashF(key) + c1*probed + c2*probed*probed) % table.size()
    }
    return -1
}
Quadratic Probing
Delete(key)
{
    int pos = HashF(key)
    int probed = 0
    while(probed < table.size() && table[pos] is not EmptySinceStart)
    {
        if(table[pos] is Occupied && table[pos].key == key)
        {
            table[pos].clear()
            table[pos] = EmptyAfterRemoval
            return                     // key found and removed
        }
        probed++
        pos = (HashF(key) + c1*probed + c2*probed*probed) % table.size()
    }
}
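The quadratic-probing insert, search and delete routines can be combined into the C++ sketch below. QuadraticProbingTable and its members are hypothetical names. Keys are assumed to be non-negative ints stored as their own values, H = key % size, and c1 and c2 are passed to the constructor.

```cpp
#include <cassert>
#include <vector>

// Quadratic probing sketch for non-negative int keys (value == key).
// A search stops only at an empty-since-start bucket; find() returns the
// bucket index holding the key, or -1.
class QuadraticProbingTable {
public:
    enum State { EmptySinceStart, EmptyAfterRemoval, Occupied };

    QuadraticProbingTable(int size, int c1, int c2)
        : keys_(size, 0), states_(size, EmptySinceStart), c1_(c1), c2_(c2) {
        assert(size > 0);
    }

    bool insert(int key) {
        for (int i = 0; i < size(); ++i) {
            int pos = probe(key, i);
            if (states_[pos] != Occupied) {  // either kind of empty bucket
                keys_[pos] = key;
                states_[pos] = Occupied;
                return true;
            }
        }
        return false;  // no empty bucket found within size() probes
    }

    int find(int key) const {
        for (int i = 0; i < size(); ++i) {
            int pos = probe(key, i);
            if (states_[pos] == EmptySinceStart) return -1;
            if (states_[pos] == Occupied && keys_[pos] == key) return pos;
        }
        return -1;
    }

    bool remove(int key) {
        int pos = find(key);
        if (pos == -1) return false;
        states_[pos] = EmptyAfterRemoval;  // later probe chains stay searchable
        return true;
    }

private:
    int size() const { return static_cast<int>(keys_.size()); }

    // (H + c1*i + c2*i*i) % size, where H = key % size
    int probe(int key, int i) const {
        return (key % size() + c1_ * i + c2_ * i * i) % size();
    }

    std::vector<int> keys_;
    std::vector<State> states_;
    int c1_, c2_;
};
```

Use 16 buckets, c1 = c2 = 1 and the insertion order 49, 32, 3, 99, 16, 64, 23, 42, 11. Inserting 48 then lands in bucket 12, which matches the probing sequence worked out earlier. After 32 is deleted, bucket 0 becomes empty-after-removal, so a search for 64 still probes buckets 0, 2 and 6 and finds it.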
Double Hashing
• Double hashing uses 2 different hash functions to compute
bucket indices.
• Using hash functions h1 and h2, a key's index in the table is computed with the following formula:
• (h1(key) + i*h2(key)) % table_size
• Inserting a key uses the formula, starting with i = 0, to repeatedly search hash table buckets until an empty bucket is found.
• Each time an empty bucket is not found, i is incremented by 1.

Source: [2]
Double Hashing: Insert

Source: [2]
Double Hashing: Search/Delete

Insertion order: 49, 32, 3, 99, 16, 24, 23, 42, 19

Delete(3)
Delete(19)

Source: [2]
Double Hashing: Insert & Search
Insert(key, value)
{
    int pos = HashF1(key)
    int probed = 0
    while(probed < table.size())
    {
        if(table[pos] is Empty)        // empty-since-start or empty-after-removal
        {
            table[pos] = value
            return true
        }
        probed++
        pos = (HashF1(key) + probed*HashF2(key)) % table.size()
    }
    return false
}

Search(key)
{
    int pos = HashF1(key)
    int probed = 0
    while(probed < table.size() && table[pos] is not EmptySinceStart)
    {
        if(table[pos] is Occupied && table[pos].key == key)
        {
            return table[pos].value
        }
        probed++
        pos = (HashF1(key) + probed*HashF2(key)) % table.size()
    }
    return -1
}
Double Hashing: Delete
Delete(key)
{
    int pos = HashF1(key)
    int probed = 0
    while(probed < table.size() && table[pos] is not EmptySinceStart)
    {
        if(table[pos] is Occupied && table[pos].key == key)
        {
            table[pos].clear()
            table[pos] = EmptyAfterRemoval
            return                     // key found and removed
        }
        probed++
        pos = (HashF1(key) + probed*HashF2(key)) % table.size()
    }
}
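The text does not define h1 and h2, so the C++ sketch below makes an assumption: h1(key) = key % size and h2(key) = 7 - key % 7. Both are illustrative choices, not the slides' functions, and the class name DoubleHashingTable is hypothetical. This h2 never returns 0, so every probe moves to a new position.

```cpp
#include <cassert>
#include <vector>

// Double hashing sketch for non-negative int keys (value == key).
// ASSUMPTION: h1(key) = key % size, h2(key) = 7 - key % 7 (always 1..7).
// With a prime table size above 7, every step is coprime to the size,
// so a key's probe sequence visits every bucket.
class DoubleHashingTable {
public:
    enum State { EmptySinceStart, EmptyAfterRemoval, Occupied };

    explicit DoubleHashingTable(int size) : keys_(size, 0), states_(size, EmptySinceStart) {
        assert(size > 0);
    }

    bool insert(int key) {
        for (int i = 0; i < size(); ++i) {
            int pos = probe(key, i);
            if (states_[pos] != Occupied) {  // either kind of empty bucket
                keys_[pos] = key;
                states_[pos] = Occupied;
                return true;
            }
        }
        return false;
    }

    // Returns the bucket index holding key, or -1.
    int find(int key) const {
        for (int i = 0; i < size(); ++i) {
            int pos = probe(key, i);
            if (states_[pos] == EmptySinceStart) return -1;
            if (states_[pos] == Occupied && keys_[pos] == key) return pos;
        }
        return -1;
    }

    bool remove(int key) {
        int pos = find(key);
        if (pos == -1) return false;
        states_[pos] = EmptyAfterRemoval;
        return true;
    }

private:
    int size() const { return static_cast<int>(keys_.size()); }
    int h1(int key) const { return key % size(); }
    static int h2(int key) { return 7 - key % 7; }
    // (h1(key) + i*h2(key)) % size
    int probe(int key, int i) const { return (h1(key) + i * h2(key)) % size(); }

    std::vector<int> keys_;
    std::vector<State> states_;
};
```

Use 11 buckets and the insertion order 49, 32, 3, 99, 16, 24, 23, 42, 19. Key 16 maps to bucket 5, which holds 49. With step h2(16) = 7 - 2 = 5, it next tries bucket 10, which holds 32, and then lands in bucket 4.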
Hashing Functions
• Perfect Hashing
• Modulo Hashing
• Mid-square Hashing
• Base 10
• Base 2
• Multiplicative String Hashing
• Adler-32 Hashing
• Direct hashing
Modulo Hashing
Assume N is the size of the table

HashF(key)
{
return key % N
}
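The pseudocode above assumes non-negative keys. In C++, the sign of key % N follows the dividend, so a negative key would produce a negative index. The sketch below is one way to keep the result in 0..N-1; the name moduloHash is our own.

```cpp
#include <cassert>

// Modulo hashing with the result normalised into 0..N-1, even for negative
// keys. Hypothetical helper; N must be positive.
int moduloHash(long long key, int N) {
    assert(N > 0);
    long long r = key % N;                     // may be negative if key < 0
    return static_cast<int>(r < 0 ? r + N : r);
}
```

For example, moduloHash(215, 10) gives bucket 5, as in the Operations slide.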
Mid-Square Hashing
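Assume N is the size of the table
R ≥ ceil(log10(N)), where R is the number of middle digits kept

HashF(key)
{
    string spos = to_string(key * key)      // square the key, as decimal digits
    int excess = spos.size() - R
    if(excess > 0)
    {
        int rDigits = (excess + 1) / 2      // ceil(excess / 2), dropped from the right
        spos.erase(spos.size() - rDigits, rDigits)
        int lDigits = spos.size() - R       // rest of the excess, dropped from the left
        spos.erase(0, lDigits)
    }
    return stoll(spos) % N
}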
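Below is a runnable C++ version of the base-10 mid-square steps. The name midSquareBase10 is our own, and the sketch assumes that key * key fits in an unsigned 64-bit integer. For key 453, the square is 205209, and keeping the middle R = 2 digits gives 52.

```cpp
#include <cassert>
#include <string>

// Mid-square hashing, base 10: square the key, keep the middle R decimal
// digits, then reduce modulo N. Assumes R >= ceil(log10(N)) and key < 2^32.
int midSquareBase10(unsigned long long key, int R, int N) {
    assert(R > 0 && N > 0);
    std::string digits = std::to_string(key * key);
    int excess = static_cast<int>(digits.size()) - R;
    if (excess > 0) {
        int rDigits = (excess + 1) / 2;       // ceil(excess / 2) dropped on the right
        digits.erase(digits.size() - rDigits);
        digits.erase(0, excess - rDigits);    // remaining excess dropped on the left
    }
    return static_cast<int>(std::stoull(digits) % N);
}
```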
Mid-Square Hashing
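Assume N is the size of the table
R ≥ ceil(log2(N)), where R is the number of middle bits kept

HashF(key)
{
    int pos = key * key                               // square the key
    int lBits = ceil(((numBits in pos) - R) / 2)      // low-order bits to drop
    eBits = pos >> lBits
    eBits = eBits & (0xFFFFFFFF >> (32 - R))          // keep the R lowest remaining bits
    return eBits % N
}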
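Below is a C++ sketch of the base-2 variant. It counts the bits of the square, drops ceil((numBits - R) / 2) low-order bits, and masks off R bits. The name midSquareBase2 is our own. The sketch assumes key < 2^32, so the square fits in 64 bits, and 0 < R < 64. For key 453, the square 205209 has 18 bits; with R = 4, it drops 7 bits and keeps 0011, which is 3.

```cpp
#include <cassert>
#include <cstdint>

// Mid-square hashing, base 2: square the key, keep the middle R bits, then
// reduce modulo N. Assumes R >= ceil(log2(N)), 0 < R < 64, key < 2^32.
int midSquareBase2(std::uint64_t key, int R, int N) {
    assert(R > 0 && R < 64 && N > 0);
    std::uint64_t square = key * key;
    int numBits = 0;                                       // bit length of the square
    for (std::uint64_t v = square; v != 0; v >>= 1) ++numBits;
    int lBits = numBits > R ? (numBits - R + 1) / 2 : 0;  // low bits to drop
    std::uint64_t extracted = (square >> lBits) & ((std::uint64_t{1} << R) - 1);
    return static_cast<int>(extracted % N);
}
```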
Multiplicative String Hashing
Assume N is the size of the table
MULTIPLIER = 2

HashF(key)
{
    pos = 0
    for(c : key)
    {
        pos = (pos * MULTIPLIER) + c     // fold in the next character code
    }
    return pos % N
}
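A C++ sketch of the multiplicative string hash with the slide's starting value 0 and MULTIPLIER = 2 follows. The name multiplicativeHash is our own. The sketch uses unsigned 64-bit arithmetic so that overflow wraps around instead of being undefined. For "cat" the running value is 99, then 99*2 + 97 = 295, then 295*2 + 116 = 706, so the hash is 6 when N = 10.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Multiplicative string hashing: pos = pos * multiplier + c for each
// character, then pos % N. Unsigned arithmetic wraps modulo 2^64.
int multiplicativeHash(const std::string& key, int N, std::uint64_t multiplier = 2) {
    assert(N > 0);
    std::uint64_t pos = 0;
    for (unsigned char c : key) {
        pos = pos * multiplier + c;
    }
    return static_cast<int>(pos % N);
}
```

With a multiplier of 2 and 64-bit wraparound, a character 64 or more positions from the end is multiplied by a multiple of 2^64, so it no longer affects the result. Many implementations use a larger odd multiplier for this reason.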
Adler-32 Hashing
Assume N is the size of the table
ADLERMOD = 65521

HashF(key)
{
a = 1, b = 0
for(c : key)
{
a = (a + c) % ADLERMOD
b = (b + a) % ADLERMOD
}
return ((b << 16) | a)%N
}
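The same steps in C++ are shown below. The sketch first computes the full 32-bit Adler-32 value and then reduces it modulo the table size. The function names adler32 and adlerHash are our own. ADLERMOD = 65521 is the largest prime below 2^16.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Adler-32 of the key's bytes: a starts at 1, b at 0, both kept mod 65521,
// and the result packs b into the high 16 bits and a into the low 16 bits.
std::uint32_t adler32(const std::string& key) {
    const std::uint32_t ADLERMOD = 65521;
    std::uint32_t a = 1, b = 0;
    for (unsigned char c : key) {
        a = (a + c) % ADLERMOD;
        b = (b + a) % ADLERMOD;
    }
    return (b << 16) | a;
}

// Bucket index for a table of size N.
int adlerHash(const std::string& key, int N) {
    assert(N > 0);
    return static_cast<int>(adler32(key) % static_cast<std::uint32_t>(N));
}
```

For example, adler32("Wikipedia") is 0x11E60398.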
