
DEPARTMENT OF COMPUTER SCIENCE & APPLICATION
ATAL BIHARI VAJPAYEE VISHWVIDHALAYA, BILASPUR (C.G.)

PRESENTATION TOPIC : HASHING

BY : MUSKAN NIRMALKAR
B.Sc. Hons. CS 3rd Semester
TO : Mr. JEETENDRA KUMAR
(Assistant Professor)
SUBJECT : DATA STRUCTURE
TABLE OF CONTENTS
01 WHAT IS HASHING ?
02 COMPONENTS OF HASHING
03 TYPES OF HASH FUNCTION
04 COLLISION RESOLUTION TECHNIQUES
05 OPEN HASHING
06 CLOSED HASHING
WHAT IS HASHING ?
● Hashing is the process of mapping a large amount of data to a smaller table with the help of a hash function.
● It is a method for storing and retrieving data from a database in O(1) time on average.
● Hashing maps keys and their values into a hash table by using a hash function.
● Hashing is one of the searching techniques that takes constant time on average.
● Hashing is used to index and retrieve items in a database because it is faster to find an item using its shorter hashed key than using the original value. It is also used in many cryptographic algorithms.
● Hashing generates a fixed-size value from a text or a list of numbers using a mathematical function known as a hash function.
Why Hashing ?
● Hashing is one of the searching techniques that takes constant time. The
average time complexity of a hash lookup is O(1). So far we have seen two
searching techniques, linear search and binary search. The worst-case time
complexity is O(n) for linear search and O(log n) for binary search. In both
techniques the search time depends on the number of elements, but we want a
technique that takes constant time. Hashing provides this.
COMPONENTS OF HASHING
There are three major components of hashing :-

1. Key : A key can be anything, such as a string or an integer, that is fed as
input to the hash function, which determines an index or location for storing
an item in a data structure.
2. Hash Function : A function that converts a given numeric or alphanumeric
key to a small practical integer value. The mapped integer value is used as an
index in the hash table.
3. Hash Table : A data structure that maps keys to values using a hash
function. It stores the data associatively in an array, where each data value
is placed at the index computed from its key.
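The three components can be sketched together in a few lines of Python. This is only an illustration: the names TABLE_SIZE, hash_function, put and get are hypothetical, the string hash (sum of character codes) is a deliberately simple assumption, and collisions are ignored here.

```python
# Illustrative sketch; all names are hypothetical and collisions are ignored.
TABLE_SIZE = 10

def hash_function(key):
    """Convert a numeric or string key to a small index in range(TABLE_SIZE)."""
    if isinstance(key, str):
        # Assumption: sum of character codes, a deliberately simple choice.
        key = sum(ord(ch) for ch in key)
    return key % TABLE_SIZE

# The hash table: an array whose slots hold a (key, value) pair or None.
table = [None] * TABLE_SIZE

def put(key, value):
    """Store the pair at the index computed from the key."""
    table[hash_function(key)] = (key, value)

def get(key):
    """Return the stored value, or None if the key is not present."""
    entry = table[hash_function(key)]
    return entry[1] if entry is not None and entry[0] == key else None

put(42, "forty-two")   # 42 mod 10 = 2
put("dog", "woof")     # (100 + 111 + 103) mod 10 = 4
print(get(42), get("dog"))
```

The key goes in, the hash function turns it into an index, and the table stores the value at that index.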
Types of Hash functions :
There are many hash functions that use numeric or alphanumeric keys. Three common ones are :
● Division Method
● Mid Square Method
● Folding Method

Note : A hash function that maps every item into its own
unique slot is known as a perfect hash function.
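The three methods listed above could be sketched as follows. The function names and parameters are illustrative assumptions, and textbooks differ on exactly which digits count as the "middle" in the mid square method and on how keys are split in folding, so treat this as one possible convention.

```python
def division_hash(key, table_size):
    """Division method: the remainder of the key divided by the table size."""
    return key % table_size

def mid_square_hash(key, num_digits, table_size):
    """Mid square method: square the key and take num_digits middle digits."""
    squared = str(key * key)
    mid = len(squared) // 2
    start = max(mid - num_digits // 2, 0)
    middle = int(squared[start:start + num_digits])
    return middle % table_size

def folding_hash(key, part_size, table_size):
    """Folding method: split the key's digits into parts and add them up."""
    digits = str(key)
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    return sum(parts) % table_size

print(division_hash(24, 6))          # 24 mod 6
print(mid_square_hash(60, 2, 100))   # 60 * 60 = 3600, middle digits "60"
print(folding_hash(12345, 2, 100))   # 12 + 34 + 5 = 51
```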
What is Collision ?
A hash function maps a large key to a small index, so two different keys can
produce the same value. A collision occurs when a newly inserted key maps to
a slot that is already occupied.

Keys : 24 , 19 , 32 , 44
h(k) = k mod 6

24 mod 6 = 0
19 mod 6 = 1
32 mod 6 = 2
44 mod 6 = 2 (collision : slot 2 already holds 32)

Index   Key
0       24
1       19
2       32
3
4
5
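The collision in this example can be reproduced with a short sketch. The helper name find_collisions is hypothetical; it simply records every key whose slot is already taken, without resolving the collision.

```python
def find_collisions(keys, table_size):
    """Insert keys with h(k) = k mod table_size and report collisions."""
    table = [None] * table_size
    collisions = []
    for key in keys:
        index = key % table_size
        if table[index] is None:
            table[index] = key
        else:
            # (new key, slot index, key already stored there)
            collisions.append((key, index, table[index]))
    return table, collisions

table, collisions = find_collisions([24, 19, 32, 44], 6)
print(table)
print(collisions)
```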
How to handle Collisions?
There are mainly two methods to handle collision:
1. Separate Chaining:
2. Open Addressing:
Collision resolution techniques

Separate Chaining ( open hashing )
Open Addressing ( closed hashing ) :
● Linear Probing
● Quadratic Probing
● Double Hashing
Separate Chaining :
● When multiple elements hash to the same slot index, they are inserted
into a singly-linked list, which is known as a chain.
● The linked list data structure is used to implement this technique.

Keys : ( 42 , 19 , 10 , 12 )
Hash function : h(k) = k mod 5

42 mod 5 = 2
19 mod 5 = 4
10 mod 5 = 0
12 mod 5 = 2 (slot 2 is already filled, so 12 is added to its chain)

Index   Chain
0       10
1
2       42 -> 12
3
4       19
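A minimal sketch of separate chaining for the example above. As a simplifying assumption, each chain is a Python list rather than a hand-written singly-linked list, and the class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Separate chaining; each slot holds a chain (a Python list here)."""

    def __init__(self, size):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return key % self.size

    def insert(self, key):
        # No duplicate check, so an insert does a constant amount of work.
        self.slots[self._index(key)].append(key)

    def contains(self, key):
        # Only the chain at the key's slot is scanned.
        return key in self.slots[self._index(key)]

    def delete(self, key):
        chain = self.slots[self._index(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False

table = ChainedHashTable(5)
for key in (42, 19, 10, 12):
    table.insert(key)
print(table.slots)
```

Keys 42 and 12 end up in the same chain at index 2, which matches the table above.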
Advantage :-
Deletion is easy.
Simple to implement.
Inserts in constant time.
The hash table never fills up; we can always add more elements to a
chain.

Disadvantage :-
If a chain becomes long, search time can become O(n) in the
worst case.
It takes up extra space for the chain links, and some slots of the table
may remain empty while other chains grow.
Open Addressing
● Open Addressing is the second main collision resolution
technique. Like Separate Chaining, it is a way of dealing with
collisions.
● In Open Addressing, all elements are stored in the hash
table itself.
● This approach is also known as closed hashing.
Linear Probing
In linear probing, the hash table is searched sequentially, starting from the original
hash location.
If the location we get is already occupied, we check the next location, wrapping
around to index 0 at the end of the table.
The function used for rehashing is as follows :-
h(k) = k mod m
h'(k, i) = [ h(k) + i ] mod m
where
m = size of the hash table
h(k) = hash of the key
h'(k, i) = rehash of the key
i = probe number / collision number
[Probe number :- how many attempts have been made so far, that is, how many times
we have checked whether an index is empty.]
Example :-
Keys : ( 43 , 135 , 72 , 23 , 99 , 19 , 82 )
h(k) = k mod 10
h'(k, i) = ( h(k) + i ) mod 10

43 mod 10 = 3
135 mod 10 = 5
72 mod 10 = 2
23 mod 10 = 3 (collision with 43)
    h'(23, 1) = ( 3 + 1 ) mod 10 = 4 mod 10 = 4
99 mod 10 = 9
19 mod 10 = 9 (collision with 99)
    h'(19, 1) = ( 9 + 1 ) mod 10 = 10 mod 10 = 0
82 mod 10 = 2 (collision with 72)
    h'(82, 1) = ( 2 + 1 ) mod 10 = 3 (occupied by 43)
    h'(82, 2) = ( 2 + 2 ) mod 10 = 4 (occupied by 23)
    h'(82, 3) = ( 2 + 3 ) mod 10 = 5 (occupied by 135)
    h'(82, 4) = ( 2 + 4 ) mod 10 = 6 (free)

Index   Key
0       19
1
2       72
3       43
4       23
5       135
6       82
7
8
9       99
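The worked example can be checked with a small sketch of linear probing. The function names are hypothetical, the table is a plain Python list with None marking an empty slot, and deletion is left out.

```python
def linear_probe_insert(table, key):
    """Insert key using h'(k, i) = (h(k) + i) mod m; return the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i) % m
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("hash table is full")

def linear_probe_search(table, key):
    """Return the slot holding key, or None if the key is absent."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i) % m
        if table[index] is None:
            return None          # an empty slot ends the probe sequence
        if table[index] == key:
            return index
    return None

table = [None] * 10
placed = [linear_probe_insert(table, k) for k in (43, 135, 72, 23, 99, 19, 82)]
print(placed)
print(table)
```

Searching follows the same probe sequence as insertion and stops at the first empty slot.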
Advantage :-
It doesn't take up extra space.
It can fill the whole table.
Insertion and deletion take O(1) time in the best case.

Disadvantage :-
Searching takes longer as the table fills up.
Deletion is difficult, because emptying a slot can break the probe sequence of other keys.
It suffers from primary clustering and secondary clustering.
Insertion and deletion take O(n) time in the worst
case.
1. Primary clustering :- [cluster :- group of elements]
Occupied slots form a contiguous run, which increases the
probability that the next element lands in the slot just after the run.
Ex :- In the linear probing example, slots 2 to 6 are occupied, so any key
that hashes to 2, 3, 4, 5, 6 or 7 ends up at index 7. The probability of an
element landing at index 7 has increased.
2. Secondary clustering :- When two or more elements
compete for the same probe sequence.
Ex :- 82 hashes to index 2, so its probe sequence is 2, 3, 4, 5, 6, 7. In the
same way, 52 also hashes to index 2 and follows the same probe sequence.
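Both effects can be illustrated numerically. The sketch below assumes the final table of the linear probing example (table size 10) and uses hypothetical helper names; it counts, for each possible home index, the slot in which a new key would land.

```python
from collections import Counter

def probe_sequence(key, table_size, length):
    """First `length` slots that linear probing visits for this key."""
    home = key % table_size
    return [(home + i) % table_size for i in range(length)]

def landing_slot(table, home):
    """Slot in which a new key with this home index would be stored."""
    m = len(table)
    for i in range(m):
        index = (home + i) % m
        if table[index] is None:
            return index
    return None

# Secondary clustering: 82 and 52 share the same probe sequence.
print(probe_sequence(82, 10, 6))
print(probe_sequence(52, 10, 6))

# Primary clustering: state of the table after the linear probing example.
table = [19, None, 72, 43, 23, 135, 82, None, None, 99]
landing_counts = Counter(landing_slot(table, home) for home in range(len(table)))
print(landing_counts)
```

Six of the ten possible home indices now lead to index 7, which is why that slot is the most likely to be filled next.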
Quadratic Probing
● Quadratic probing is sometimes loosely called the mid-square method, but strictly
the mid-square method is a hash function, not a probing scheme.
● In this method, we look for the i^2-th slot after the original hash location in the
i-th iteration. We always start from the original hash location. Only if that
location is occupied do we check the other slots.
● Quadratic probing is a method with the help of which we can reduce the
problem of primary clustering.
[cluster :- group of elements]
[Clustering :- occupied slots group together, which increases the probability that
a new element lands in a particular location.]
Keys : 42 , 16 , 91 , 33 , 18 , 27 , 36
h(k) = k mod 10
h'(k, i) = ( h(k) + i^2 ) mod 10

h(k) = 42 mod 10 = 2
h(k) = 16 mod 10 = 6
h(k) = 91 mod 10 = 1
h(k) = 33 mod 10 = 3
h(k) = 18 mod 10 = 8
h(k) = 27 mod 10 = 7
h(k) = 36 mod 10 = 6 (collision with 16)
    h'(36, 1) = [ 6 + 1^2 ] mod 10 = 7 mod 10 = 7 (occupied by 27)
    h'(36, 2) = [ 6 + 2^2 ] mod 10 = 10 mod 10 = 0 (free)

Index   Key
0       36
1       91
2       42
3       33
4
5
6       16
7       27
8       18
9
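A short sketch that reproduces this example. The function name is hypothetical, and the function returns None instead of raising an error when no free slot is found along the probe sequence, since quadratic probing does not visit every slot in general.

```python
def quadratic_probe_insert(table, key):
    """Insert key using h'(k, i) = (h(k) + i^2) mod m; return the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i * i) % m
        if table[index] is None:
            table[index] = key
            return index
    return None  # no free slot found along the probe sequence

table = [None] * 10
placed = [quadratic_probe_insert(table, k) for k in (42, 16, 91, 33, 18, 27, 36)]
print(placed)
print(table)
```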
Advantage :-
It doesn't take up extra space.
It can fill the given space.
It reduces primary clustering.
Insertion and deletion take O(1) time in the best case.

Disadvantage :-
It doesn't deal with secondary clustering.
There is no guarantee of finding a free slot, even if one exists.
Searching takes longer as the table fills up.
Deletion is difficult.
Insertion and deletion take O(n) time in the worst case.
Double Hashing
● Double hashing is a technique that reduces clustering effectively.
● Double hashing applies a second hash function to
the key when a collision occurs.
● In this technique, the increments for the probing sequence are
computed by the second hash function.
● Two hash functions, h1 and h2, are used.
● The function used is as follows :-
h1(k) = k mod m
h'(k, i) = [ h1(k) + i * h2(k) ] mod m
where m is the size of the hash table and i is the probe number.
Keys : 20 , 34 , 45 , 70 , 56
h1(k) = k mod 11
h2(k) = 8 - ( k mod 8 )
h'(k, i) = ( h1(k) + i * h2(k) ) mod 11

20 mod 11 = 9
34 mod 11 = 1
45 mod 11 = 1 (collision with 34)
    h2(45) = 8 - ( 45 mod 8 ) = 8 - 5 = 3
    h'(45, 1) = [ 1 + (1) 3 ] mod 11 = 4 mod 11 = 4 (free)
70 mod 11 = 4 (collision with 45)
    h2(70) = 8 - ( 70 mod 8 ) = 8 - 6 = 2
    h'(70, 1) = [ 4 + (1) 2 ] mod 11 = 6 mod 11 = 6 (free)
56 mod 11 = 1 (collision with 34)
    h2(56) = 8 - ( 56 mod 8 ) = 8 - 0 = 8
    h'(56, 1) = [ 1 + (1) 8 ] mod 11 = 9 mod 11 = 9 (occupied by 20)
    h'(56, 2) = [ 1 + (2) 8 ] mod 11 = 17 mod 11 = 6 (occupied by 70)
    h'(56, 3) = [ 1 + (3) 8 ] mod 11 = 25 mod 11 = 3 (free)

Index   Key
0
1       34
2
3       56
4       45
5
6       70
7
8
9       20
10
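The same example as a sketch in Python. The two hash functions are the ones used in the example; the names TABLE_SIZE and double_hash_insert are hypothetical.

```python
TABLE_SIZE = 11  # table size used in the example

def h1(key):
    return key % TABLE_SIZE

def h2(key):
    # Second hash from the example; it is never 0, so every probe moves on.
    return 8 - (key % 8)

def double_hash_insert(table, key):
    """Insert key using h'(k, i) = (h1(k) + i * h2(k)) mod m."""
    m = len(table)
    for i in range(m):
        index = (h1(key) + i * h2(key)) % m
        if table[index] is None:
            table[index] = key
            return index
    return None  # every slot on the probe sequence is occupied

table = [None] * TABLE_SIZE
placed = [double_hash_insert(table, k) for k in (20, 34, 45, 70, 56)]
print(placed)
print(table)
```

Keys 34, 45 and 56 all have h1(k) = 1 but different step sizes (6, 3 and 8), so they follow different probe sequences.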
Advantage :-
● It doesn't take up extra space.
● It can fill the given space.
● It reduces both primary clustering and secondary clustering.
● Insertion and deletion take O(1) time in the best case.
● A free slot is always found if one exists, provided h2(k) and the table size have no common factor (for example, a prime table size).
Disadvantage :-
● Searching takes longer as the table fills up.
● Deletion is difficult.
● Insertion and deletion take O(n) time in the worst
case.
THANK YOU !
