A hash function maps a key, which may be a large integer or a string, to a small number, so two different keys can produce the same value. A collision occurs when a newly inserted key maps to a slot in the hash table that is already occupied. It must be resolved using a collision handling technique.
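To make the definition concrete, here is a minimal sketch that reports which keys land on an already occupied slot. It assumes the hash function "key mod table_size" and a hypothetical helper named find_collisions; neither is prescribed by these notes.

```python
def find_collisions(keys, table_size):
    """Return (key, earlier_key, slot) for each key that maps to an occupied slot."""
    first_key_in_slot = {}
    collisions = []
    for key in keys:
        slot = key % table_size  # assumed hash function: key mod table_size
        if slot in first_key_in_slot:
            # The slot already holds an earlier key: this is a collision.
            collisions.append((key, first_key_in_slot[slot], slot))
        else:
            first_key_in_slot[slot] = key
    return collisions


print(find_collisions([50, 700, 76, 85], 7))  # [(85, 50, 1)]
```

Here 50 and 85 both hash to slot 1, so inserting 85 causes a collision.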
How to handle Collisions?
There are two main methods to handle collisions: 1) Separate Chaining 2) Open Addressing
1. Separate Chaining
• The idea is to make each cell of the hash table point to a linked list of the records that have the same hash function value.
• Consider the simple hash function "key mod 7" and the sequence of keys 50, 700, 76, 85, 92, 73, 101. The keys hash to 1, 0, 6, 1, 1, 3, 3, so slot 1 holds the chain 50, 85, 92, slot 3 holds the chain 73, 101, and slots 0 and 6 hold 700 and 76.
2. Closed Hashing (Open Addressing)
• This collision resolution technique requires a hash table of fixed and known size. During insertion, if a collision is encountered, alternative cells are tried until an empty bucket is found.
• These techniques require the hash table to be larger than the number of objects to be stored; a load factor (stored objects divided by table size) below 1 is ideal.
• There are various methods to find these empty buckets:
  • a. Linear probing
  • b. Quadratic probing
  • c. Double hashing
Linear Probing
• The simplest approach to resolving a collision is linear probing. If a value is already stored at the location generated by h(k), a collision has occurred, and we search sequentially for an empty location.
• The idea is to place the value in the next available position. Because the search is performed sequentially, the technique is known as linear probing.
• The array (hash table) is treated as circular: when the last slot is reached without finding an empty location, the search continues from the first location of the array.
• There is an ordinary hash function h'(x) : U → {0, 1, . . ., m - 1}. In the open addressing scheme, the actual hash function h(x, i) takes the ordinary hash function h'(x) and adds a term in the probe number i, giving a linear expression.
• h'(x) = x mod m
• h(x, i) = (h'(x) + i) mod m
• The value of i = 0, 1, . . ., m - 1. We start from i = 0 and increase i until we find a free slot. Initially, when i = 0, h(x, i) is the same as h'(x).
Example
• Suppose we have a list of size 20 (m = 20) and want to insert some elements using linear probing. The elements are {96, 48, 63, 29, 87, 77, 48, 65, 69, 94, 61}.
• For instance, the second 48 finds slot 8 occupied by the first 48 and slot 9 occupied by 29, so it is placed in slot 10.
Quadratic Probing
• There is an ordinary hash function h'(x) : U → {0, 1, . . ., m - 1}. In the open addressing scheme, the actual hash function h(x, i) takes the ordinary hash function h'(x) and adds a term in the probe number i, giving a quadratic expression.
• h'(x) = x mod m
• h(x, i) = (h'(x) + i^2) mod m
• Other quadratic expressions with constant coefficients can also be used.
• The value of i = 0, 1, . . ., m - 1. We start from i = 0 and increase i until we find a free slot. Initially, when i = 0, h(x, i) is the same as h'(x).
Example
• Suppose we have a list of size 20 (m = 20) and want to insert some elements using quadratic probing. The elements are {96, 48, 63, 29, 87, 77, 48, 65, 69, 94, 61}.
• For instance, the second 48 finds slot 8 occupied (i = 0) and slot 9 occupied (i = 1), so it is placed in slot (8 + 2^2) mod 20 = 12.
Double Hashing
• There is an ordinary hash function h1(x) : U → {0, 1, . . ., m - 1}. In the open addressing scheme, when the slot given by h1(x) is not empty, a second hash function h2(x) is applied to find another slot for the insertion.
• h1(x) = x mod m
• h2(x) = x mod m'
• h(x, i) = (h1(x) + i * h2(x)) mod m
• The value of i = 0, 1, . . ., m - 1. We start from i = 0 and increase i until we find a free slot. Initially, when i = 0, h(x, i) is the same as h1(x).
Example
• Suppose we have a list of size 20 (m = 20) and want to insert some elements using double hashing. The elements are {96, 48, 63, 29, 87, 77, 48, 65, 69, 94, 61}.
• h1(x) = x mod 20
• h2(x) = x mod 13
• h(x, i) = (h1(x) + i * h2(x)) mod 20
• For instance, the second 48 has h1 = 8 and h2 = 9. Slot 8 is occupied (i = 0) and slot 17 is occupied by 77 (i = 1), so it is placed in slot (8 + 2 * 9) mod 20 = 6.
Rehashing
• Rehashing is a collision resolution technique in which the table is resized: a new table of roughly double the size is created. It is preferable if the new table size is a prime number.
• Rehashing is required in the following situations:
  • When the table is completely full.
  • With quadratic probing, when the table is half full.
  • When insertions fail due to overflow.
• In such situations, we transfer the entries from the old table to the new table by recomputing their positions with the new hash function.
• Suppose we have to insert the elements 37, 90, 55, 22, 17, 49, and 87. The table size is 10 and the hash function is:
• H(key) = key mod tablesize
• 37 % 10 = 7
• 90 % 10 = 0
• 55 % 10 = 5
• 22 % 10 = 2
• 17 % 10 = 7 (collides with 37)
• 49 % 10 = 9
• After these six insertions the table is almost full. If we try to insert more elements, collisions will occur and eventually further insertions will fail. Hence we rehash by doubling the table size. The old table size is 10, so doubling gives 20. Since 20 is not a prime number, we prefer a table size of 23, the next prime after 20.
The new hash function is:
• H(key) = key mod 23
• 37 % 23 = 14
• 90 % 23 = 21
• 55 % 23 = 9
• 22 % 23 = 22
• 17 % 23 = 17
• 49 % 23 = 3
• 87 % 23 = 18
• Now the hash table is sufficiently large to accommodate new insertions.
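The rehashing example can be replayed in code. The following is a minimal sketch, not a definitive implementation. It assumes linear probing to resolve the collision between 37 and 17 in the old table (the example does not say which method is used), and the helper names is_prime, next_prime, insert and rehash are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test; adequate for small table sizes."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime(n):
    """Return the smallest prime that is greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def insert(table, key):
    """Insert key with linear probing (an assumption) and return its slot."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m  # h(x, i) = (x mod m + i) mod m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("table is full; rehashing is required")


def rehash(table):
    """Move every key into a new table whose size is the next prime >= 2 * old size."""
    new_table = [None] * next_prime(2 * len(table))
    for key in table:
        if key is not None:
            insert(new_table, key)  # positions are recomputed with the new size
    return new_table


old_table = [None] * 10
for key in (37, 90, 55, 22, 17, 49):
    insert(old_table, key)

new_table = rehash(old_table)  # size 10 -> 20 -> next prime 23
insert(new_table, 87)
print(len(new_table), new_table.index(37), new_table.index(87))  # 23 14 18
```

In the old table 17 is pushed from slot 7 to slot 8 by the assumed linear probing. In the new table of size 23 every key sits at its home slot key mod 23, matching the values listed above.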