Hash Tables
Chapter 10
• Want to support insertion, retrieval, or deletion of any named item
• Want to support O(1) operations
• Analogous to a dictionary: given a word (key), look up the definition
• Do not need to maintain ordering information among items
Example Operations on Characteristic Vector
Issues in Use of Characteristic Vectors
• Suppose that you want to allow multiple occurrences of key values?
• Suppose that you want to represent values whose keys are elements of the set of 32-bit integers?
• Suppose that you want to use character data as keys?
• What if the number of elements is << the number of possible key values?
Hashing
• Provide a function to map from any type of key value to a range of integer indices (hashing function)
• Provide a number of elements approximately equal to the number of members of the subset (instead of whole domain)
Example Hashing Function
• Want to represent a set of 20 character strings of length 1 to 8 characters
• Take the ASCII value of each character and add them up
• Take resulting sum mod 20
• Result is an index 0-19 which can be used to identify string's place in the table
Example Hashing Function
• Character string "abc" to be stored in a 10 element hash table
– Key = (97+98+99) mod 10 = 4
• Character string "HELLO" to be stored in 15 element hash table
– Key = (72+69+76+76+79) mod 15 = 12
Hash Function Considerations
• Want a function which is cheap to calculate
• Want a function which evenly distributes the key values into the available hash table slots
• Consider overflow effects in hash function
Hash Table Conflict Resolution
• When two key values hash to the same location in the hash table, how do we handle it?
– linear probing
– quadratic probing
– separate chaining
Comparing Collision Resolution Schemes
• To compare the schemes we need to make calculations on the average number of comparisons needed using each scheme
• We will need to consider the load factor (0<=α<=1) which indicates the fraction of slots in the table which are full
Linear Probing
• When item hashes to a filled location, scan for next empty slot and insert in that slot
• Find must follow the same path as insert
• If hash table is nearly full, then degenerates into unordered list search
• A load factor of 0.5 is generally used with linear probing
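A minimal sketch of linear probing, assuming non-negative integer keys and the hash function key % table size. The class name `LinearProbeTable` and its members are hypothetical, chosen only for illustration (C++17, for `std::optional`).

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical open-addressed table of non-negative int keys.
class LinearProbeTable
{
public:
    explicit LinearProbeTable(std::size_t size) : slots_(size) {}

    // Returns the slot used for `key`, or std::nullopt if the table is full.
    std::optional<std::size_t> insert(int key)
    {
        const std::size_t n = slots_.size();
        const std::size_t home = static_cast<std::size_t>(key) % n;
        for (std::size_t step = 0; step < n; ++step) {
            const std::size_t j = (home + step) % n;   // scan forward, wrapping around
            if (!slots_[j] || *slots_[j] == key) {
                slots_[j] = key;
                return j;
            }
        }
        return std::nullopt;
    }

    // Follows the same probe path as insert and stops at the first empty slot.
    std::optional<std::size_t> find(int key) const
    {
        const std::size_t n = slots_.size();
        const std::size_t home = static_cast<std::size_t>(key) % n;
        for (std::size_t step = 0; step < n; ++step) {
            const std::size_t j = (home + step) % n;
            if (!slots_[j]) return std::nullopt;       // empty slot: key is absent
            if (*slots_[j] == key) return j;
        }
        return std::nullopt;
    }

private:
    std::vector<std::optional<int>> slots_;
};
```

Inserting 89, 18, 49, 58, 9 into a 10-slot table with this sketch places them in slots 9, 8, 0, 1, 2 respectively.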
Linear Probing
Hash(89,10) = 9
Hash(18,10) = 8
Hash(49,10) = 9
Hash(58,10) = 8
Hash( 9,10) = 9
Table contents after each insert (- marks an empty slot):
Slot  After Insert 89  After Insert 18  After Insert 49  After Insert 58  After Insert 9
0     -                -                49               49               49
1     -                -                -                58               58
2     -                -                -                -                9
3     -                -                -                -                -
4     -                -                -                -                -
5     -                -                -                -                -
6     -                -                -                -                -
7     -                -                -                -                -
8     -                18               18               18               18
9     89               89               89               89               89
Deletion with Linear Probing
• Search operation assumes that if it finds an empty cell in searching for a key then it is not present
• If item is inserted and collision causes it to occupy a cell further on in the array, deletion of an intervening cell could cause a find to fail
• Lazy deletion marks cells as inactive
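One way to picture lazy deletion is to give every slot a three-valued state. The sketch below assumes non-negative integer keys and key % table size as the hash; the names `LazyDeleteTable`, `State`, and `Slot` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical linear-probing table in which erase only marks a slot as Deleted.
class LazyDeleteTable
{
public:
    explicit LazyDeleteTable(std::size_t size) : slots_(size) {}

    // Returns the slot index holding `key`, or -1 if it is absent.
    int find(int key) const
    {
        const std::size_t n = slots_.size();
        const std::size_t home = static_cast<std::size_t>(key) % n;
        for (std::size_t step = 0; step < n; ++step) {
            const std::size_t j = (home + step) % n;
            if (slots_[j].state == State::Empty) return -1;   // only a never-used slot ends the search
            if (slots_[j].state == State::Occupied && slots_[j].key == key)
                return static_cast<int>(j);
        }
        return -1;
    }

    // Returns false only when no Empty or Deleted slot is available.
    bool insert(int key)
    {
        if (find(key) >= 0) return true;                      // already present
        const std::size_t n = slots_.size();
        const std::size_t home = static_cast<std::size_t>(key) % n;
        for (std::size_t step = 0; step < n; ++step) {
            Slot& s = slots_[(home + step) % n];
            if (s.state != State::Occupied) {                 // reuse Empty or Deleted slots
                s.key = key;
                s.state = State::Occupied;
                return true;
            }
        }
        return false;
    }

    // Marks the slot inactive instead of emptying it, so later probes pass through.
    bool erase(int key)
    {
        const int j = find(key);
        if (j < 0) return false;
        slots_[static_cast<std::size_t>(j)].state = State::Deleted;
        return true;
    }

private:
    enum class State { Empty, Occupied, Deleted };
    struct Slot { int key = 0; State state = State::Empty; };
    std::vector<Slot> slots_;
};
```

After inserting 89, 18, 49, 58, 9 into a 10-slot table and erasing 49 (slot 0), a find for 58 or 9 still succeeds, because the probe sequence passes over the Deleted marker rather than stopping at it.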
Clustering
• Most analysis of the performance of linear probing assumes that the hash values generated are evenly distributed
• Using linear probing, keys that hash into occupied cells tend to cluster into blocks of occupied cells
• Improved scheme should spread collisions out so that keys which hash to close cells won't look for free space in the same spots
Quadratic Probing
• Want to eliminate clustering
• Instead of trying locations H+1, H+2, ... if location H is full, try H+1², H+2², ...
• Misses on locations H and H+1 now try different spots to resolve collisions
• Works well if the table size is prime and the loading factor is <= 0.5
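The quadratic probe sequence can be sketched as a single free function. It assumes non-negative integer keys, key % table size as the hash, and at most one probe per table slot before giving up; the name `quadraticInsert` is hypothetical (C++17).

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: probe slots H, H+1^2, H+2^2, ... (mod table size)
// and store `key` in the first empty one. Returns the slot used, or
// std::nullopt if none of the probed slots is free.
std::optional<std::size_t> quadraticInsert(std::vector<std::optional<int>>& table, int key)
{
    const std::size_t n = table.size();
    const std::size_t home = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t j = (home + i * i) % n;
        if (!table[j]) {
            table[j] = key;
            return j;
        }
    }
    return std::nullopt;
}
```

For a 10-slot table and the keys 89, 18, 49, 58, 9 inserted in that order, this sketch uses slots 9, 8, 0, 2, 3. Unlike linear probing, 58 and 9 no longer compete for the same run of cells next to slot 0.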
Quadratic Probing
Hash(89,10) = 9
Hash(18,10) = 8
Hash(49,10) = 9
Hash(58,10) = 8
Hash( 9,10) = 9
Table contents after each insert (- marks an empty slot):
Slot  After Insert 89  After Insert 18  After Insert 49  After Insert 58  After Insert 9
0     -                -                49               49               49
1     -                -                -                -                -
2     -                -                -                58               58
3     -                -                -                -                9
4     -                -                -                -                -
5     -                -                -                -                -
6     -                -                -                -                -
7     -                -                -                -                -
8     -                18               18               18               18
9     89               89               89               89               89
Separate Chaining
• Separate chaining eliminates the requirement of linear and quadratic probing that the loading factor stay below 0.5
• Elements which hash to the same cells are simply chained together to make a list
• Requires searching of list, but if each list is small this can still be O(1)
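A minimal sketch of separate chaining, assuming non-negative integer keys and key % bucket count as the hash. The class name `ChainedSet` and its members are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical set of non-negative int keys; each bucket is a linked list (chain).
class ChainedSet
{
public:
    explicit ChainedSet(std::size_t bucketCount) : buckets_(bucketCount) {}

    // Appends `key` to its bucket's chain unless it is already there.
    void insert(int key)
    {
        std::list<int>& chain = buckets_[indexFor(key)];
        if (std::find(chain.begin(), chain.end(), key) == chain.end())
            chain.push_back(key);
    }

    // Searches only the one chain that `key` hashes to.
    bool contains(int key) const
    {
        const std::list<int>& chain = buckets_[indexFor(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    std::size_t chainLength(std::size_t bucket) const { return buckets_[bucket].size(); }

private:
    std::size_t indexFor(int key) const
    {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::list<int>> buckets_;
};
```

With 10 buckets and the keys 89, 18, 49, 58, 9, bucket 8 ends up with a chain of length 2 and bucket 9 with a chain of length 3. No key is displaced into another bucket.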
Example of Separate Chaining
Hash function is key % 10
[Figure: hash table with chains; slot 8 holds 18 and 58, slot 9 holds 89, 49, and 9]
Analysis of Linear Probing¹
$\frac{1}{2}\left[1 + \frac{1}{1-\alpha}\right]$ for successful search
$\frac{1}{2}\left[1 + \frac{1}{(1-\alpha)^2}\right]$ for unsuccessful search (and insertion)
¹ D. E. Knuth, 1973, The Art of Computer Programming, Vol. 3: Sorting and Searching, Reading, MA: Addison-Wesley.
Example of Linear Probing Analysis
load factor (α) = N items / Tablesize
On the average, if α = 0.50,
successful search ≈ 1.5 probes,
unsuccessful search ≈ 2.5 probes
(insertion is same as unsuccessful search)
Quadratic Probing and Double Hashing¹
$-\frac{\ln(1-\alpha)}{\alpha}$ for successful search
$\frac{1}{1-\alpha}$ for unsuccessful search (and insertion)
Separate Chaining¹
Performance of Hash Tables
Performance of Hash Tables vs. Other Containers
• Hash table:
– Insert: average O(1)
– Search: average O(1)
• Sorted array:
– Insert: average O(n)
– Search: average O(log n)
• Binary Search Tree:
– Insert: average O(log n)
– Search: average O(log n)
• But balanced trees can guarantee O(log n)
Implementing Hash Tables
• pair<const K, E> Entry_Type;
• hashTable.h: implements open addressing (linear probing)
• hashChains.h: implements chaining
The search function
template<class K, class E>
int hashTable<K,E>::search(const K& theKey) const
{// Search an open addressed hash table for a pair with key theKey.
 // Return location of matching pair if found, otherwise return
 // location where a pair with key theKey may be inserted
 // provided the hash table is not full.

   int i = (int) (hash(theKey) % divisor);  // home bucket
   int j = i;    // start at home bucket
   do
   {
      if (table[j] == NULL || table[j]->first == theKey)
         return j;
      j = (j + 1) % divisor;  // next bucket
   } while (j != i);          // returned to home bucket?

   return j;  // table full
}
The insert function
template<class K, class E>
void hashTable<K,E>::insert(const pair<const K, E>& thePair)
{
   // search the table for a matching pair
   int b = search(thePair.first);

   // check if matching pair found
   if (table[b] == NULL)
   {
      // no matching pair and table not full
      table[b] = new pair<const K,E> (thePair);
      dSize++;
   }
   else
   {// check if duplicate or table full
      if (table[b]->first == thePair.first)
      {// duplicate, change table[b]->second
         table[b]->second = thePair.second;
      }
      else // table is full
         throw hashTableFull();
   }
}
The erase function
Algorithm for erase
1. Find the first table element that is empty or that contains the key.
2. if an empty element is found
3.   done
4. else
5.   Delete the Entry_Type object pointed to.
6.   Set the pointer in this entry to DELETED
7.   Increment num_deletes
8.   Decrement num_keys
Implementation is left as an exercise
Implementation Considerations for Maps and Sets
• The hash template class defines the hash function, but provides no implementation. The programmer must provide an implementation for the key type.
• Example for string:
template<>
class hashCode<string>
{
   public:
      size_t operator()(const string theKey) const
      {// Convert theKey to a nonnegative integer.
         unsigned long hashValue = 0;
         int length = (int) theKey.length();
         for (int i = 0; i < length; i++)
            hashValue = 5 * hashValue + theKey.at(i);
         return size_t(hashValue);
      }
};
Defining Your Own hash function
• The hash_map class implementations use the hash<Key_Type> function to locate the initial search position.
• It then uses the object's equality operator (operator==) to determine if there is a match.
• Therefore, your hashCode function should obey the following constraint:
– If obj1 == obj2 then hash<type>()(obj1) == hash<type>()(obj2)
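The constraint can be illustrated with a user-defined key type. This sketch swaps in the standard `std::unordered_map` in place of the `hash_map` class named in the slides, and the `Point` and `PointHash` names are hypothetical. The point is that the hash is computed from exactly the fields that `operator==` compares.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type: two points are equal when both coordinates match.
struct Point
{
    int x;
    int y;
};

inline bool operator==(const Point& a, const Point& b)
{
    return a.x == b.x && a.y == b.y;
}

// Hash functor built only from x and y, the same fields operator== uses,
// so equal points always produce equal hash values.
struct PointHash
{
    std::size_t operator()(const Point& p) const
    {
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};

// Map type that locates the bucket with PointHash and confirms matches with operator==.
using PointNames = std::unordered_map<Point, std::string, PointHash>;
```

If `PointHash` also mixed in a field that `operator==` ignores, two equal keys could land in different buckets and a lookup for a key that is present could fail.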