Hashing
Hashing is a technique that is used to uniquely identify a specific
object from a group of similar objects [1].
• Collisions!
Collisions
• A collision occurs when an item being inserted into a hash
table maps to the same bucket as an existing item in the
hash table.
• Ex: For a hash function of key % 10, 44 would be inserted in
bucket 44 % 10 = 4; later inserting 94 would yield a collision
because 94 % 10 is also 4.
• Chaining
• Chaining is a collision resolution technique where each bucket
has a list of items (so bucket 4's list would become 44, 94).
• Open addressing (closed hashing)
• Open addressing is a collision resolution technique where
collisions are resolved by looking for an empty bucket elsewhere
in the table (so 94 might be stored in bucket 5).
Source: [2]
Hash Function
• A common hash function uses the modulo operator %, which
computes the integer remainder when dividing two numbers.
• Ex: For a 20-element hash table, a hash function of key % 20 will
map keys to bucket indices 0 to 19.
• Input: key
• Output (bucket index): key % hash_table_size
Source: [2]
Operations
• A hash table's operations of insert, remove, and search each
use the hash function to determine an item's bucket.
• Ex: Inserting 215 first determines the bucket to be 215 % 10 = 5.
• Insert(key, value)
• Delete(key)
• Search(key)
• Example (hash table size = 5; each string's ASCII codes are summed, then reduced with % 5):
• "cat": (99+97+116) % 5 = 312 % 5 = 2
• "dog": (100+111+103) % 5 = 314 % 5 = 4
• "wolf": (119+111+108+102) % 5 = 440 % 5 = 0
• "mouse": (109+111+117+115+101) % 5 = 553 % 5 = 3
Example: collision
• Hash table size = 7
• Keys = "cat", "dog", "wolf", "mouse"
• Example:
• "cat": (99+97+116) % 7 = 312 % 7 = 4
• "dog": (100+111+103) % 7 = 314 % 7 = 6
• "wolf": (119+111+108+102) % 7 = 440 % 7 = 6
• "mouse": (109+111+117+115+101) % 7 = 553 % 7 = 0
• "dog" and "wolf" both map to bucket 6, so inserting both causes a collision.
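To check the bucket arithmetic in these examples, here is a minimal C++ sketch of the same scheme: sum the character codes, then take the remainder modulo the table size. The function name `asciiSumHash` is hypothetical, and the sketch assumes ASCII keys.

```cpp
#include <cstddef>
#include <string>

// Sums the character codes of `key` and reduces the sum modulo the table size.
// Hypothetical helper; assumes ASCII input (e.g. 'c' = 99, 'a' = 97, 't' = 116).
std::size_t asciiSumHash(const std::string& key, std::size_t tableSize) {
    std::size_t sum = 0;
    for (unsigned char c : key) {
        sum += c;  // add each character's code
    }
    return sum % tableSize;  // bucket index in [0, tableSize)
}
```

With `tableSize = 7`, `asciiSumHash("dog", 7)` and `asciiSumHash("wolf", 7)` both return 6, which is the collision described above.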
Collision Strategies
• Chaining
• Linear Probing
• Quadratic Probing
• Double Hashing
Chaining
• Chaining handles hash table collisions by using a list for each
bucket, where each list may store multiple items that map to
the same bucket.
• Insert
• The insert operation first uses the item's key to determine
the bucket, and then inserts the item in that bucket's list.
• Search
• Searching also first determines the bucket, and then
searches the bucket's list.
• Delete
• The delete operation first searches the bucket's list for the item,
and then removes the item if it is found.
Source: [2]
Chaining
Assume table is vector<vector<int>>
Insert(key, value)
{
  int pos = HashF(key)
  table[pos].append(value)
}
Search(key)
{
  int pos = HashF(key)
  for(i = 0; i < table[pos].size; i++)
  {
    if(table[pos][i] == key)
      return table[pos][i]
  }
  return -1
}
Delete(key)
{
  int pos = HashF(key)
  val = Search(key)
  if(val is not -1)
  {
    remove(table[pos], val)   // remove the item from bucket pos's list
  }
}
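The chaining pseudocode can be turned into a small runnable C++ class. This is a sketch, not the course's implementation: it keeps the `vector<vector<int>>` layout, assumes non-negative keys that are stored as their own values (as in Insert(44, 44)), and reuses -1 as the "not found" result, so -1 itself cannot be stored. The class and method names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Chaining hash table sketch: each bucket is a list (std::vector) of keys.
// Assumes non-negative int keys; search() returns -1 when the key is absent.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t buckets) : table_(buckets) {}

    // Modulo hash: key % number_of_buckets.
    std::size_t hashF(int key) const {
        return static_cast<std::size_t>(key) % table_.size();
    }

    // Append the key to its bucket's list.
    void insert(int key) { table_[hashF(key)].push_back(key); }

    // Scan only the list of the key's bucket.
    int search(int key) const {
        for (int item : table_[hashF(key)]) {
            if (item == key) return item;
        }
        return -1;
    }

    // Remove the first matching item from the key's bucket, if present.
    bool remove(int key) {
        std::vector<int>& bucket = table_[hashF(key)];
        auto it = std::find(bucket.begin(), bucket.end(), key);
        if (it == bucket.end()) return false;
        bucket.erase(it);
        return true;
    }

    // Read-only view of one bucket's list (for inspection).
    const std::vector<int>& bucket(std::size_t i) const { return table_[i]; }

private:
    std::vector<std::vector<int>> table_;
};
```

Inserting 44, 98, 8, 94, 11, 732, 26 into a 10-bucket table reproduces the bucket lists in the chaining examples that follow; `remove(77)` returns false because bucket 7 is empty.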
Chaining: Insert
• Hash function: key % 10
• Insert(44, 44)
• Insert(98, 98)
• Insert(8, 8)
• Insert(94, 94)
• Insert(11, 11)
• Insert(732, 732)
• Insert(26, 26)
Resulting table (buckets 0, 3, 5, 7, and 9 are empty):
• Bucket 1: 11
• Bucket 2: 732
• Bucket 4: 44, 94
• Bucket 6: 26
• Bucket 8: 98, 8
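Chaining: Search
• Search(94) // returns 94: bucket 94 % 10 = 4 holds 44, 94
• Search(67) // returns -1: bucket 67 % 10 = 7 is empty
Table (buckets 0, 3, 5, 7, and 9 are empty):
• Bucket 1: 11
• Bucket 2: 732
• Bucket 4: 44, 94
• Bucket 6: 26
• Bucket 8: 98, 8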
Chaining: Delete
0 Delete(44) 0
Delete(8)
1 11 1 11
Delete(77)
2 732 2 732
3 3
4 44 94 4 94
5 5
6 26 6 26
7 7
8 98 8 8 98
9 9
Linear Probing
• A hash table with linear probing handles a collision by starting
at the key's mapped bucket and then linearly searching
subsequent buckets until an empty bucket is found.
• Hash function: key % 10
• Insert(44, 44)
• Insert(98, 98)
• Insert(8, 8)
• Insert(94, 94)
• Insert(35, 35) ?
• Insert(48, 48) ?
Table after the first four inserts (all other buckets are empty):
• Bucket 4: 44
• Bucket 5: 94
• Bucket 8: 98
• Bucket 9: 8
Linear Probing
• Types of empty bucket:
• Empty-since-start
• An empty-since-start bucket has been empty since the hash table
was created.
• Empty-after-removal
• An empty-after-removal bucket had an item removed that caused
the bucket to now be empty.
• The distinction matters during searches: a search stops at an
empty-since-start bucket but continues past an empty-after-removal
bucket, because the key may have been placed further along before the removal.
Source: [2]
Linear Probing: Insert
Insert(key, value)
{
  int pos = HashF(key)
  int probed = 0
  while(probed < table.size())
  {
    if(table[pos] is Empty)   // empty-since-start or empty-after-removal
    {
      table[pos] = (key, value)
      return true
    }
    pos = (pos + 1) % table.size()
    probed++
  }
  return false   // table is full
}
Example: Insert(42, 42), Insert(19, 19)
Source: [2]
Linear Probing: Delete
Delete(key)
{
  int pos = HashF(key)
  int probed = 0
  while(probed < table.size() && table[pos] is not EmptySinceStart)
  {
    if(table[pos].key == key)
    {
      table[pos].clear()
      table[pos] = EmptyAfterRemoval
      return   // key found and removed
    }
    pos = (pos + 1) % table.size()
    probed++
  }
}
Example: Delete(202), Delete(19)
Source: [2]
Linear Probing: Search
Search(key)
{
  int pos = HashF(key)
  int probed = 0
  while(probed < table.size() && table[pos] is not EmptySinceStart)
  {
    if(table[pos].key == key)
    {
      return table[pos].value
    }
    pos = (pos + 1) % table.size()
    probed++
  }
  return -1
}
Examples:
Search(42) // returns 42
Search(19) // returns 19
Search(115) // returns -1
Source: [2]
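The three linear probing operations can be combined into one runnable C++ sketch that tracks both kinds of empty bucket. The class name, the `keyAt` inspection helper, and the use of -1 as "not found" are assumptions for illustration; keys are assumed to be non-negative.

```cpp
#include <cstddef>
#include <vector>

// Linear probing sketch with empty-since-start and empty-after-removal buckets.
// Assumes non-negative int keys; search() returns -1 when the key is absent.
class LinearProbingTable {
public:
    enum class State { EmptySinceStart, EmptyAfterRemoval, Occupied };
    struct Slot {
        State state = State::EmptySinceStart;
        int key = 0;
        int value = 0;
    };

    explicit LinearProbingTable(std::size_t size) : slots_(size) {}

    // i-th bucket examined for key: (key % size + i) % size.
    std::size_t probe(int key, std::size_t i) const {
        return (static_cast<std::size_t>(key) % slots_.size() + i) % slots_.size();
    }

    // Any empty bucket (since start or after removal) can take the new item.
    bool insert(int key, int value) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(key, i)];
            if (s.state != State::Occupied) {
                s = Slot{State::Occupied, key, value};
                return true;
            }
        }
        return false;  // every bucket is occupied
    }

    // Stops at an empty-since-start bucket; probes past empty-after-removal ones.
    int search(int key) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const Slot& s = slots_[probe(key, i)];
            if (s.state == State::EmptySinceStart) break;
            if (s.state == State::Occupied && s.key == key) return s.value;
        }
        return -1;
    }

    // Marks the bucket empty-after-removal so later searches keep probing past it.
    bool remove(int key) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(key, i)];
            if (s.state == State::EmptySinceStart) break;
            if (s.state == State::Occupied && s.key == key) {
                s.state = State::EmptyAfterRemoval;
                return true;
            }
        }
        return false;
    }

    // Key stored in bucket i, or -1 if that bucket is empty (inspection only).
    int keyAt(std::size_t i) const {
        return slots_[i].state == State::Occupied ? slots_[i].key : -1;
    }

private:
    std::vector<Slot> slots_;
};
```

Inserting 44, 98, 8, 94 into a 10-bucket table places them in buckets 4, 8, 9, and 5. Continuing the exercise, 35 lands in bucket 6 and 48 wraps around to bucket 0. After removing 94, a search for 35 still succeeds because bucket 5 is empty-after-removal rather than empty-since-start.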
Quadratic Probing
• A hash table with quadratic probing handles a collision by starting
at the key's mapped bucket and then probing buckets at quadratically
increasing offsets until an empty bucket is found.
• If an item's mapped bucket is H, the following formula is used to
determine the item's index in the hash table:
• (H + c1*i + c2*i^2) % table_size
• c1 and c2 are programmer-defined constants for quadratic
probing.
• Inserting a key uses the formula, starting with i = 0, to repeatedly
search the hash table until an empty bucket is found.
• Each time the probed bucket is occupied, i is incremented by 1.
• Iterating through sequential i values to obtain the desired table
index is called the probing sequence.
Source: [2]
Quadratic Probing: Insert
• Hash function: key % 10
• Assuming c1 = 5, c2 = 7
• Quadratic probing sequence: (H + 5*i + 7*i*i) % 10
• Insert key 75: H = 75 % 10 = 5; i = 0: (5 + 5*0 + 7*0*0) % 10 = 5, empty, so 75 goes in bucket 5
• Insert key 46: H = 46 % 10 = 6; i = 0: (6 + 5*0 + 7*0*0) % 10 = 6, empty, so 46 goes in bucket 6
• Insert key 35: H = 35 % 10 = 5; i = 0: (5 + 5*0 + 7*0*0) % 10 = 5, occupied;
i = 1: (5 + 5*1 + 7*1*1) % 10 = 7, empty, so 35 goes in bucket 7
Resulting table (all other buckets are empty):
• Bucket 5: 75
• Bucket 6: 46
• Bucket 7: 35
Quadratic Probing: Insert
• Hash function: key % 16
• Assuming c1 = 1, c2 = 1.
• Insertion order: 49, 32, 3, 99, 16, 64, 23, 42, 11
Source: [2]
Quadratic Probing: Delete
Hash function: key % 16, Assuming c1 = 1, c2 = 1
Insertion order: 49, 32, 3, 99, 16, 64, 23, 42, 11
Source: [2]
Quadratic Probing: Search
Hash function: key % 16, Assuming c1 = 1, c2 = 1
Source: [2]
Quadratic Probing
Insert(key, value)
{
  int pos = HashF(key)
  int probed = 0
  while(probed < table.size())
  {
    if(table[pos] is Empty)
    {
      table[pos] = (key, value)
      return true
    }
    probed++
    pos = (HashF(key) + c1*probed + c2*probed*probed) % table.size()
  }
  return false
}
Search(key)
{
  int pos = HashF(key)
  int probed = 0
  while(probed < table.size() && table[pos] is not EmptySinceStart)
  {
    if(table[pos].key == key)
    {
      return table[pos].value
    }
    probed++
    pos = (HashF(key) + c1*probed + c2*probed*probed) % table.size()
  }
  return -1
}
Quadratic Probing
Delete(key)
{
  int pos = HashF(key)
  int probed = 0
  while(probed < table.size() && table[pos] is not EmptySinceStart)
  {
    if(table[pos].key == key)
    {
      table[pos].clear()
      table[pos] = EmptyAfterRemoval
      return   // key found and removed
    }
    probed++
    pos = (HashF(key) + c1*probed + c2*probed*probed) % table.size()
  }
}
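The quadratic probing Insert, Search, and Delete pseudocode can be sketched in C++ as one class whose only change from linear probing is the probe index (H + c1*i + c2*i*i) % size. The class name, the `keyAt` helper, and -1 as "not found" are assumptions; keys are assumed to be non-negative.

```cpp
#include <cstddef>
#include <vector>

// Quadratic probing sketch. Assumes non-negative int keys; -1 means "not found".
class QuadraticProbingTable {
public:
    enum class State { EmptySinceStart, EmptyAfterRemoval, Occupied };
    struct Slot {
        State state = State::EmptySinceStart;
        int key = 0;
        int value = 0;
    };

    QuadraticProbingTable(std::size_t size, std::size_t c1, std::size_t c2)
        : slots_(size), c1_(c1), c2_(c2) {}

    // i-th bucket examined for key: (H + c1*i + c2*i*i) % size, with H = key % size.
    std::size_t probe(int key, std::size_t i) const {
        std::size_t h = static_cast<std::size_t>(key) % slots_.size();
        return (h + c1_ * i + c2_ * i * i) % slots_.size();
    }

    bool insert(int key, int value) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(key, i)];
            if (s.state != State::Occupied) {
                s = Slot{State::Occupied, key, value};
                return true;
            }
        }
        return false;  // probe sequence exhausted
    }

    int search(int key) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const Slot& s = slots_[probe(key, i)];
            if (s.state == State::EmptySinceStart) break;
            if (s.state == State::Occupied && s.key == key) return s.value;
        }
        return -1;
    }

    bool remove(int key) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(key, i)];
            if (s.state == State::EmptySinceStart) break;
            if (s.state == State::Occupied && s.key == key) {
                s.state = State::EmptyAfterRemoval;
                return true;
            }
        }
        return false;
    }

    // Key stored in bucket i, or -1 if that bucket is empty (inspection only).
    int keyAt(std::size_t i) const {
        return slots_[i].state == State::Occupied ? slots_[i].key : -1;
    }

private:
    std::vector<Slot> slots_;
    std::size_t c1_;
    std::size_t c2_;
};
```

With size 16 and c1 = c2 = 1, inserting 49, 32, 3, 99, 16, 64, 23, 42, 11 places 99 in bucket 5, 16 in bucket 2, and 64 in bucket 6 after their collisions. As in the pseudocode, insert gives up after table.size() probes; with quadratic probing this can happen while some buckets are still empty, because the probe sequence need not visit every bucket.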
Double Hashing
• Double hashing uses 2 different hash functions to compute
bucket indices.
• Using hash functions h1 and h2, a key's index in the table is
computed with the following formula
• (h1(key) + i*h2(key)) % table_size
Source: [2]
Double Hashing: Insert
Source: [2]
Double Hashing: Search/Delete
Source: [2]
Double Hashing: Insert & Search
Insert(key, value)
{
  int pos = HashF1(key)
  int probed = 0
  while(probed < table.size())
  {
    if(table[pos] is Empty)
    {
      table[pos] = (key, value)
      return true
    }
    probed++
    pos = (HashF1(key) + probed*HashF2(key)) % table.size()
  }
  return false
}
Search(key)
{
  int pos = HashF1(key)
  int probed = 0
  while(probed < table.size() && table[pos] is not EmptySinceStart)
  {
    if(table[pos].key == key)
    {
      return table[pos].value
    }
    probed++
    pos = (HashF1(key) + probed*HashF2(key)) % table.size()
  }
  return -1
}
Double Hashing: Delete
Delete(key)
{
  int pos = HashF1(key)
  int probed = 0
  while(probed < table.size() && table[pos] is not EmptySinceStart)
  {
    if(table[pos].key == key)
    {
      table[pos].clear()
      table[pos] = EmptyAfterRemoval
      return   // key found and removed
    }
    probed++
    pos = (HashF1(key) + probed*HashF2(key)) % table.size()
  }
}
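The slides' figures for HashF1 and HashF2 did not survive extraction, so the following C++ sketch of double hashing assumes h1(key) = key % size and h2(key) = 7 - key % 7, one commonly used form (R - key % R with a prime R) that never returns 0. Both hash functions, the class name, and the `keyAt` helper are hypothetical; keys are assumed to be non-negative.

```cpp
#include <cstddef>
#include <vector>

// Double hashing sketch. ASSUMPTIONS: h1(key) = key % size, h2(key) = 7 - key % 7,
// non-negative int keys, and -1 as the "not found" result.
class DoubleHashingTable {
public:
    enum class State { EmptySinceStart, EmptyAfterRemoval, Occupied };
    struct Slot {
        State state = State::EmptySinceStart;
        int key = 0;
        int value = 0;
    };

    explicit DoubleHashingTable(std::size_t size) : slots_(size) {}

    std::size_t h1(int key) const { return static_cast<std::size_t>(key) % slots_.size(); }
    std::size_t h2(int key) const { return 7 - static_cast<std::size_t>(key) % 7; }  // in 1..7

    // Index probed on attempt i: (h1(key) + i * h2(key)) % size.
    std::size_t probe(int key, std::size_t i) const {
        return (h1(key) + i * h2(key)) % slots_.size();
    }

    bool insert(int key, int value) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(key, i)];
            if (s.state != State::Occupied) {
                s = Slot{State::Occupied, key, value};
                return true;
            }
        }
        return false;
    }

    int search(int key) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const Slot& s = slots_[probe(key, i)];
            if (s.state == State::EmptySinceStart) break;
            if (s.state == State::Occupied && s.key == key) return s.value;
        }
        return -1;
    }

    bool remove(int key) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(key, i)];
            if (s.state == State::EmptySinceStart) break;
            if (s.state == State::Occupied && s.key == key) {
                s.state = State::EmptyAfterRemoval;
                return true;
            }
        }
        return false;
    }

    // Key stored in bucket i, or -1 if that bucket is empty (inspection only).
    int keyAt(std::size_t i) const {
        return slots_[i].state == State::Occupied ? slots_[i].key : -1;
    }

private:
    std::vector<Slot> slots_;
};
```

In an 11-bucket table, the keys 14, 25, 36, and 47 all have h1 = 3, but their different h2 step sizes (3, 6, 2) send the colliding keys to buckets 6, 9, and 5 instead of one shared probe path.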
Hashing Functions
• Perfect Hashing
• Modulo Hashing
• Mid-square Hashing
  • Base 10
  • Base 2
• Adler-32 Hashing
• Direct Hashing
Modulo Hashing
Assume N is the size of the table
HashF(key)
{
return key % N
}
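Mid-Square Hashing
Assume N is the size of the table
R ≥ ceil(log10(N))   // number of middle digits to keep
HashF(key)
{
  int pos = key * key
  string spos = toString(pos)
  rDigits = ceil((spos.size() - R) / 2)   // real division: odd differences round up
  spos.erase(spos.size() - rDigits, rDigits)   // drop digits on the right
  lDigits = spos.size() - R
  spos.erase(0, lDigits)   // drop digits on the left
  return parseInt(spos) % N
}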
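A runnable C++ version of the base-10 mid-square steps follows. It is a sketch under stated assumptions: the key is non-negative and small enough that key * key fits in 64 bits, R is at least 1, and the function name `midSquareBase10` is hypothetical. The ceiling is computed as (extra + 1) / 2 so that integer division does not silently round down.

```cpp
#include <cstddef>
#include <string>

// Mid-square hashing (base 10): square the key, keep the middle R digits,
// then reduce modulo N. Assumes R >= 1 and key * key fits in 64 bits.
unsigned long long midSquareBase10(unsigned long long key, std::size_t R,
                                   unsigned long long N) {
    std::string s = std::to_string(key * key);
    if (s.size() > R) {
        std::size_t extra = s.size() - R;
        std::size_t rDigits = (extra + 1) / 2;  // ceil(extra / 2) digits dropped on the right
        s.erase(s.size() - rDigits, rDigits);
        std::size_t lDigits = s.size() - R;     // remaining extra digits dropped on the left
        s.erase(0, lDigits);
    }
    return std::stoull(s) % N;
}
```

For example, with N = 100 and R = 2, key 453 squares to 205209; two digits are dropped on each side, leaving 52.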
Mid-Square Hashing
Assume N is the size of the table
R ≥ ceil(log2(N))   // number of middle bits to keep
HashF(key)
{
  int pos = key * key
  lBits = ceil((numBits(pos) - R) / 2)   // low bits to discard
  eBits = pos >> lBits
  eBits = eBits & (0xFFFFFFFF >> (32 - R))   // keep the lowest R remaining bits
  return eBits % N
}
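The base-2 variant can be sketched in C++ with shifts and a mask. The function name `midSquareBase2` is hypothetical; the sketch assumes key * key fits in 64 bits and 1 ≤ R ≤ 63, and builds the mask as (1 << R) - 1 instead of shifting 0xFFFFFFFF, which avoids an undefined shift by 32 when R is 0.

```cpp
#include <cstdint>

// Mid-square hashing (base 2): square the key, drop ceil((B - R) / 2) low bits,
// keep the next R bits, then reduce modulo N. B is the bit length of key * key.
// Assumes key * key fits in 64 bits and 1 <= R <= 63.
std::uint64_t midSquareBase2(std::uint64_t key, unsigned R, std::uint64_t N) {
    std::uint64_t sq = key * key;
    unsigned bits = 0;  // number of significant bits in sq
    for (std::uint64_t v = sq; v != 0; v >>= 1) ++bits;
    unsigned lowBits = bits > R ? (bits - R + 1) / 2 : 0;  // ceil((bits - R) / 2)
    std::uint64_t mask = (std::uint64_t{1} << R) - 1;      // lowest R bits set
    std::uint64_t extracted = (sq >> lowBits) & mask;
    return extracted % N;
}
```

For key 453 with N = 16 and R = 4, the square 205209 has 18 bits; 7 low bits are dropped and the next 4 bits give 3.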
Multiplicative String Hashing
Assume N is the size of the table
MULTIPLIER = 2
HashF(key)
{
pos = 0
for(c : key)
{
pos = (pos * MULTIPLIER) + c
}
return pos%N
}
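Here is a direct C++ translation of the multiplicative string hash, keeping MULTIPLIER = 2 from the slide as the default. The pseudocode does not say how overflow is handled; this sketch assumes unsigned 64-bit arithmetic, so very long strings wrap modulo 2^64. The function name `multiplicativeHash` is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Multiplicative string hashing: pos = pos * multiplier + c for each character,
// then reduce modulo N. Unsigned arithmetic wraps on overflow (an assumption).
std::uint64_t multiplicativeHash(const std::string& key, std::uint64_t N,
                                 std::uint64_t multiplier = 2) {
    std::uint64_t pos = 0;
    for (unsigned char c : key) {
        pos = pos * multiplier + c;
    }
    return pos % N;
}
```

For "cat" the running value is 99, then 99*2 + 97 = 295, then 295*2 + 116 = 706, so a 10-bucket table maps it to bucket 6.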
Adler-32 Hashing
Assume N is the size of the table
ADLERMOD = 65521
HashF(key)
{
a = 1, b = 0
for(c : key)
{
a = (a + c) % ADLERMOD
b = (b + a) % ADLERMOD
}
return ((b << 16) | a)%N
}
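The Adler-32 pseudocode translates almost line for line into C++. Below, the checksum and the table reduction are split into two hypothetical functions so the raw checksum can be checked against the well-known value 0x11E60398 for the string "Wikipedia". Because b stays below 65521, b << 16 fits in 32 bits.

```cpp
#include <cstdint>
#include <string>

constexpr std::uint32_t ADLERMOD = 65521;  // largest prime below 2^16

// Adler-32 checksum: a sums the bytes (starting at 1), b sums the running values of a.
std::uint32_t adler32(const std::string& key) {
    std::uint32_t a = 1, b = 0;
    for (unsigned char c : key) {
        a = (a + c) % ADLERMOD;
        b = (b + a) % ADLERMOD;
    }
    return (b << 16) | a;
}

// Hash for a table of size N: reduce the checksum modulo N.
std::uint32_t adlerHash(const std::string& key, std::uint32_t N) {
    return adler32(key) % N;
}
```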