
HASHING AND COLLISION IN HASHING
Introduction
■ Hashing is a fundamental concept in computer science used in a variety of applications
■ Hashing applies a hash function to input data to produce a fixed-size hash value, which
serves as a compact identifier for the original data (different inputs can occasionally
share the same hash value)
■ By using hash values as identifiers, hashing enables efficient storage and retrieval of
data
■ Hashing has various applications, including data storage in databases and associative
arrays, data integrity verification, and cryptography
Advantages
■ Hashing enables quick access to data by mapping each key to a hash value that identifies
where the data is stored.
■ Data can be directly retrieved from a hash table using the hash value, eliminating the
need for searching through a large database
■ Hashing supports data integrity checks by generating hash values that act as fingerprints
of the original data.
■ By comparing hash values, it can be determined if the data has been modified or
tampered with
■ Hash tables can handle growing datasets efficiently by resizing the table and rehashing
its entries
■ Hashing helps eliminate duplicate data: items with identical content produce identical
hash values, so likely duplicates can be detected quickly and then confirmed.
■ This reduces the amount of storage space required, as duplicate data is stored only once
■ Ideal for applications that require fast and efficient data access
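The integrity check described above can be illustrated with a short sketch. It assumes SHA-256 from Python's standard hashlib module as the hash function; the helper name sha256_hex and the sample messages are made up for illustration and do not come from the slides.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Digest recorded when the data was first stored (sample data is hypothetical).
original = b"transfer 100 to alice"
stored_digest = sha256_hex(original)

# An unmodified copy produces the same digest.
unchanged = sha256_hex(b"transfer 100 to alice") == stored_digest

# A modified copy produces a different digest, which reveals the change.
tampered = sha256_hex(b"transfer 900 to alice") == stored_digest
```

Here unchanged evaluates to True and tampered to False, so comparing digests is enough to notice the modification without comparing the data itself.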
Collision in Hashing
■ A collision occurs in hashing when two different input values produce the same hash
value
■ Common factors that can influence the probability of collisions include the size of the
hash table, the quality of the hash function, and the distribution of input data
■ Collisions can impact the performance and correctness of hash-based data structures and
algorithms
■ Different techniques are used to handle collisions in hashing, such as separate chaining
and open addressing (for example, linear probing).
■ The choice of collision resolution technique depends on the specific application and the
characteristics of the input data
■ Efficient collision resolution is important for maintaining the performance of
hash-based data structures, such as hash maps and hash sets
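A collision is easy to reproduce with a deliberately weak hash function. The sketch below assumes a hypothetical toy_hash that sums character codes modulo the table size; it is chosen only because it makes the collision obvious, not because it is a good hash function.

```python
def toy_hash(key: str, table_size: int) -> int:
    """Hypothetical hash: sum of the character codes, modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size


TABLE_SIZE = 8  # assumed table size for the example

# "ab" and "ba" contain the same characters, so their sums are equal
# and both keys land on the same index: a collision.
index_ab = toy_hash("ab", TABLE_SIZE)
index_ba = toy_hash("ba", TABLE_SIZE)

# A different key may still land elsewhere.
index_ac = toy_hash("ac", TABLE_SIZE)
```

Because the function ignores character order, any two keys that are rearrangements of each other collide, which shows how the quality of the hash function influences the collision rate.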
Collision Handling Methods
■ A collision occurs when two or more keys are hashed to the same index.
■ Collision handling methods are used to resolve such collisions.
■ These methods are:
– Separate chaining
– Open Addressing
– Robin Hood Hashing
– Cuckoo Hashing
Separate Chaining
■ A method used in hashing to handle collisions.
■ It applies when two or more keys map to the same index.
■ It keeps a linked list for each hash table entry, and all keys that map to that entry are
stored in its list.
■ Implementation includes :
– insert
– search
– delete
Functioning
■ Insertion: a key-value pair is hashed to an index in the table.
■ If a linked list already exists at that index, the pair is appended to the end of the
list; otherwise a new linked list is created and the pair is inserted as the head of the list.
■ Search: the linked list at the key's index is searched until the key is found or the end
of the list is reached.
■ If the key exists, the corresponding value is returned; otherwise the search returns None.
■ Deletion: the linked list is searched until the key is found or the end of the list is reached.
■ If the key is found, the corresponding key-value pair is removed from the list.
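The insert, search, and delete operations can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name ChainedHashTable is hypothetical, each chain is a Python list standing in for a linked list, Python's built-in hash() is used as the hash function, and overwriting the value of an existing key is an added assumption.

```python
class ChainedHashTable:
    """Separate chaining sketch: each bucket holds a chain of (key, value) pairs."""

    def __init__(self, size=8):
        # One (initially empty) chain per table entry.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                # Assumption: an existing key has its value overwritten.
                bucket[i] = (key, value)
                return
        # Otherwise the pair is appended to the end of the chain.
        bucket.append((key, value))

    def search(self, key):
        # Walk the chain until the key is found or the chain ends.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return True
        return False
```

With a table of size 2, the integer keys 1 and 3 share a bucket, yet both remain retrievable because search compares keys within the chain.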
Advantages and Disadvantages
■ It requires extra memory for the linked lists
■ It allows the table to hold more keys than it has slots; the number of keys is limited
only by available memory.
■ It is a popular method for handling collisions
■ It is simple to implement.
■ It provides good performance for most applications.
Open Addressing
■ It stores every key in the table itself and uses a probe sequence to search for the next
available slot when a collision occurs.
■ A probe sequence is a set of rules that determine which slot to check next when
searching for an empty slot.
■ It has three types:
– Linear probing
– Quadratic probing
– Double hashing
Types of Open Addressing
■ Linear probing:
– The next slot is checked by incrementing the index by a fixed amount (typically 1) until
an empty slot is found.
■ Quadratic probing:
– The next slot is checked by offsetting the index by a quadratic function of the probe
number until an empty slot is found.
■ Double hashing:
– The next slot is checked by applying a second hash function to the key and using the
result as the increment until an empty slot is found.
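The three probe sequences differ only in how the offset from the home slot grows with the probe number i. The sketch below assumes integer keys, a table size of 7, and a second hash of the form 1 + key % (size - 1); the function names and these particular formulas are illustrative choices, not taken from the slides.

```python
def linear_probe(home, i, size):
    """Slot checked on probe i: step forward by 1 each time."""
    return (home + i) % size


def quadratic_probe(home, i, size):
    """Slot checked on probe i: offset grows as the square of i."""
    return (home + i * i) % size


def second_hash(key, size):
    """Hypothetical second hash; never returns 0, so probing always advances."""
    return 1 + key % (size - 1)


def double_hash_probe(home, i, size, step):
    """Slot checked on probe i: offset is i times the key-dependent step."""
    return (home + i * step) % size


SIZE = 7
key = 10
home = key % SIZE                 # first hash: home slot 3
step = second_hash(key, SIZE)     # second hash: step 5

linear = [linear_probe(home, i, SIZE) for i in range(4)]
quadratic = [quadratic_probe(home, i, SIZE) for i in range(4)]
double = [double_hash_probe(home, i, SIZE, step) for i in range(4)]
```

For this key the sequences are [3, 4, 5, 6], [3, 4, 0, 5], and [3, 1, 6, 4]. Double hashing gives different keys different steps, so keys that share a home slot do not necessarily follow the same path.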
Functioning
■ Searching:
– The probe sequence is followed from the key's home slot through the subsequent slots
until the key is found or an empty slot is encountered.
■ Deletion:
– When a key is deleted from the hash table, the slot containing the key is marked as
deleted rather than being emptied. This is because emptying the slot would break the
probe sequence: searches for keys stored later in the sequence would stop early at the
empty slot and fail to find them.
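Search and deletion with a "deleted" marker can be sketched using linear probing. The class name LinearProbingTable and the sentinels EMPTY and DELETED are hypothetical; reusing a deleted slot on insertion and overwriting existing keys are added assumptions.

```python
EMPTY = object()    # slot has never held a key
DELETED = object()  # slot held a key that was deleted (a tombstone)


class LinearProbingTable:
    def __init__(self, size=8):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Visit every slot once, starting at the key's home slot.
        home = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (home + i) % len(self.slots)

    def insert(self, key, value):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break  # the key cannot be stored beyond an empty slot
            if slot is DELETED:
                if first_free is None:
                    first_free = idx  # remember the tombstone for reuse
            elif slot[0] == key:
                self.slots[idx] = (key, value)  # overwrite existing key
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None  # an empty slot ends the search
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot[0] == key:
                self.slots[idx] = DELETED  # mark, do not empty
                return True
        return False
```

If keys 0, 4, and 8 all hash to slot 0 of a four-slot table and key 4 is deleted, the search for key 8 passes over the marker and still succeeds. Had the slot been emptied instead, that search would have stopped there.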
Advantages and Disadvantages
■ It does not require extra memory for linked lists
■ It can have performance issues when the load factor of the hash table is high, that is,
when there are many keys stored in the table relative to the size of the table.
■ At a high load factor most slots are occupied, so probe sequences become long and
insertions and searches slow down.
Robin Hood
■ It is named after the legendary outlaw Robin Hood who "robs from the rich and gives to
the poor," symbolizing the redistribution of elements in the hash table
■ Robin Hood hashing is used in scenarios where the distribution of keys is not known in
advance and collisions are likely to occur
Functioning
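■ Each element in the hash table is stored along with its distance from its ideal position
(its probing distance).
■ A collision occurs when two elements have the same ideal position.
■ During insertion, if the incoming element has already probed farther from its ideal
position than the element occupying the slot, the two are swapped: the resident with the
shorter probing distance (the "rich") is displaced to make room for the element with the
longer probing distance (the "poor").
■ The displaced element, the one with the shorter probing distance, then continues probing
further down the table.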
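A minimal sketch of Robin Hood insertion and lookup follows. It uses the usual formulation of the rule, in which the entry that has travelled farther from its ideal position keeps the slot and the entry that has travelled less moves on. The class name RobinHoodTable, the use of linear probing, and Python's built-in hash() are assumptions made for illustration.

```python
class RobinHoodTable:
    def __init__(self, size=8):
        # Each slot is None or a list [key, value, distance_from_ideal_slot].
        self.slots = [None] * size

    def insert(self, key, value):
        size = len(self.slots)
        idx = hash(key) % size
        entry = [key, value, 0]
        for _ in range(size):
            resident = self.slots[idx]
            if resident is None:
                self.slots[idx] = entry
                return
            if resident[0] == entry[0]:
                resident[1] = entry[1]  # assumption: overwrite existing key
                return
            if resident[2] < entry[2]:
                # The resident is "richer" (closer to its ideal slot), so the
                # carried entry takes the slot and the resident moves on.
                self.slots[idx], entry = entry, resident
            idx = (idx + 1) % size
            entry[2] += 1
        raise RuntimeError("table is full")

    def search(self, key):
        size = len(self.slots)
        idx = hash(key) % size
        for distance in range(size):
            resident = self.slots[idx]
            # A resident closer to its ideal slot than we are to ours means
            # the key would already have displaced it, so it is absent.
            if resident is None or resident[2] < distance:
                return None
            if resident[0] == key:
                return resident[1]
            idx = (idx + 1) % size
        return None
```

Storing the distance also lets search stop early, as shown in the search method, which is one reason lookups stay short even in a fairly full table.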
Advantages and Disadvantages
■ Provides good cache performance because elements stay close to their ideal positions in
a single contiguous table
■ Evens out probing distances across elements, which improves lookup performance
■ Requires additional space to store the distance information
■ Slightly more complex to implement compared to other collision handling techniques
Cuckoo
■ It is named after the cuckoo bird known for laying eggs in the nests of other birds,
symbolizing the displacement of elements in the hash table
■ Cuckoo hashing is commonly used in scenarios where worst-case lookup time is
important and collisions need to be handled efficiently
Functioning
■ Cuckoo hashing typically uses two hash tables, each with its own hash function; each
element is stored in one of the tables at the position given by that table's hash function.
■ If a collision occurs, the existing element is evicted from its position and moved to its
alternate position in the other hash table.
■ This process continues until a vacant position is found or a maximum number of
displacements is reached, at which point the tables are rehashed
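The eviction loop can be sketched with two tables. Everything named here is an assumption for illustration: the class name CuckooTable, the max_kicks limit, integer keys, and the two simple hash functions. When the limit is hit, this sketch hands the leftover pair back to the caller instead of rebuilding the tables.

```python
class CuckooTable:
    def __init__(self, size=8, max_kicks=16):
        self.size = size
        self.max_kicks = max_kicks
        self.tables = [[None] * size, [None] * size]

    def _index(self, which, key):
        # Two hypothetical hash functions, one per table (integer keys assumed).
        if which == 0:
            return hash(key) % self.size
        return (hash(key) // self.size) % self.size

    def search(self, key):
        # A key can only live in one of two places, so lookup checks both.
        for which in (0, 1):
            entry = self.tables[which][self._index(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        """Return None on success, or the (key, value) pair left without a
        slot when max_kicks is reached (the caller would then rehash)."""
        for which in (0, 1):
            idx = self._index(which, key)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)  # overwrite existing
                return None
        entry = (key, value)
        which = 0
        for _ in range(self.max_kicks):
            idx = self._index(which, entry[0])
            # Place the carried entry and pick up whatever was there.
            entry, self.tables[which][idx] = self.tables[which][idx], entry
            if entry is None:
                return None
            which = 1 - which  # the evicted entry tries the other table
        return entry
```

Note that the pair returned on failure may be a previously stored element rather than the one just passed in, so a full implementation would need to reinsert it after rebuilding the tables with new hash functions.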
Advantages and Disadvantages
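■ Simple and easy to implement.
■ Offers good worst-case lookup performance, since a lookup checks only a fixed number of
positions (one per hash function)
■ Requires additional memory to maintain multiple hash tables.
■ Insertion performance can degrade when the maximum number of displacements is reached,
because the tables must then be rehashed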
Deletion
■ Deletion is the process of removing a key-value pair from the hash table.
■ A hash table allows efficient insertion, retrieval, and deletion of key-value pairs.
■ With separate chaining, deletion involves removing a key-value pair from the bucket or
linked list associated with a specific hash code in the hash table
Steps:
■ Compute the hash code of the key.
■ Locate the bucket corresponding to the hash code.
■ Traverse the linked list in the bucket to find the specific key-value pair to be deleted.
■ Update the pointers or references in the linked list to remove the key-value pair.
■ If the linked list becomes empty after deletion, update the bucket to indicate that it is
empty.
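The steps above can be sketched with an explicit singly linked list, so the pointer update is visible. The names Node and delete_from_buckets are hypothetical, and the table is assumed to be a plain Python list in which each bucket holds either the head node of its list or None when empty.

```python
class Node:
    """One key-value pair in a bucket's singly linked list."""

    def __init__(self, key, value, next=None):
        self.key = key
        self.value = value
        self.next = next


def delete_from_buckets(buckets, key):
    """Remove key from a chained hash table; return True if it was found."""
    # Compute the hash code and locate the corresponding bucket.
    index = hash(key) % len(buckets)

    # Traverse the linked list, remembering the previous node.
    prev, node = None, buckets[index]
    while node is not None:
        if node.key == key:
            if prev is None:
                # Removing the head: the bucket now points at the next node,
                # which is None (an empty bucket) if the list had one element.
                buckets[index] = node.next
            else:
                # Bypass the node by updating the previous node's pointer.
                prev.next = node.next
            return True
        prev, node = node, node.next
    return False
```

Removing the head and removing an interior node are handled separately because only the head is referenced directly by the bucket.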
