By
Pravakar Ghimire
• Hashing is an efficient searching technique in which each key is
placed at a directly accessible address for rapid lookup.
• Hashing provides direct access to a record in the file no matter
where the record is stored, which eliminates unnecessary
comparisons.
• This technique uses a hash function, say h, which maps each key
to its corresponding address or location.
Hashing Performance
There are three factors that influence the
performance of hashing:
Hash function
should distribute the keys and entries evenly
throughout the entire table
should minimize collisions
Collision resolution strategy
Open addressing: store the key/entry at a different position
Separate chaining: chain several keys/entries at the same
position
Table size
Too large a table wastes memory
Too small a table causes more collisions and eventually
forces rehashing (creating a new hash table of larger size and
copying the contents of the current hash table into it)
The size should be appropriate to the hash function used and
should typically be a prime number. Why? With modular hashing,
keys that share a common factor with the table size cluster into
a few slots; a prime size avoids such shared factors.
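The effect of a non-prime table size can be seen with a short sketch (the table sizes 10 and 11 and the sample keys are my own illustrative choices, not from the slides):

```python
# Why a prime table size helps with h(x) = x mod SIZE:
# keys that share a factor with SIZE collapse into a few slots.
keys = [10, 20, 30, 40, 50, 60, 70, 80]  # all multiples of 10

for size in (10, 11):  # composite vs. prime table size
    slots = sorted({k % size for k in keys})
    print(size, slots)

# With SIZE = 10 every key lands in slot 0;
# with the prime SIZE = 11 all eight keys get distinct slots.
```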
Hash Function
• A function that transforms a key into a table
index is called a hash function.
• A common hash function is
h(x)=x mod SIZE
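A minimal sketch of this modular hash in Python (SIZE = 11 is an assumed table size, not one fixed by the slides):

```python
SIZE = 11  # assumed table size for illustration

def h(x: int) -> int:
    """Map an integer key to a table index in [0, SIZE)."""
    return x % SIZE

print(h(15))  # 4
print(h(26))  # 4 -- 15 and 26 collide: both map to index 4
```

Note that two different keys can map to the same index, which is exactly the collision problem the next slides address.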
Some Popular Hash Functions
• The hash function converts the key into a table position. It can be carried out
using:
• Modular arithmetic (the one we saw on the previous slide): compute the index by
dividing the key by some value and using the remainder as the index. This forms
the basis of the next two techniques. For example: index := key MOD table_size
• Truncation: ignore part of the key and use the rest as the array index. The
problem with this approach is that the keys may not be evenly distributed
throughout the table. For example: if student IDs are the keys, for 928324312
select just the last three digits, i.e. 312, as the index. => the table size
has to be at least 1000. Why? Because the last three digits can form any index
from 000 to 999.
• Folding: partition the key into several pieces and then combine them in some
convenient way.
.......and a lot more that I'm too lazy to describe here, see them yourself
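The truncation and folding schemes above can be sketched as follows; the helper names and the 3-digit piece width are my own assumptions, not from the slides:

```python
def truncate(key: int) -> int:
    # Truncation: keep only the last three digits of the key as the index.
    return key % 1000

def fold(key: int, piece: int = 3) -> int:
    # Folding: split the key's digits into 3-digit pieces, then add them.
    digits = str(key)
    pieces = [int(digits[i:i + piece]) for i in range(0, len(digits), piece)]
    return sum(pieces) % 1000  # keep the result inside a 1000-slot table

print(truncate(928324312))  # 312
print(fold(928324312))      # (928 + 324 + 312) mod 1000 = 564
```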
Desired Properties in Hash Function
• It should be quick and easy to compute.
• It should distribute the keys and entries evenly throughout the table.
• It should minimize collisions.
Analysis of Separate Chaining
• Collisions are very likely.
– How likely, and what is the average length of a list?
• Load factor definition:
– The ratio of the number of elements (N) in a hash table to
the hash TableSize.
• i.e. λ = N/TableSize
– The average length of a list is also λ.
– For chaining, λ is not bounded by 1; it can be > 1.
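A minimal separate-chaining table makes the load factor concrete; the class layout below is my own sketch, not an implementation from the slides:

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list (chain) of keys."""

    def __init__(self, size: int = 7):
        self.size = size
        self.table = [[] for _ in range(size)]
        self.n = 0  # number of stored keys (N)

    def insert(self, key: int) -> None:
        chain = self.table[key % self.size]
        if key not in chain:  # colliding keys chain in the same slot
            chain.append(key)
            self.n += 1

    def load_factor(self) -> float:
        return self.n / self.size  # λ = N/TableSize; can exceed 1

t = ChainedHashTable(size=7)
for k in range(10):
    t.insert(k)
print(t.load_factor())  # 10/7, about 1.43 -- greater than 1 is fine here
```

With open addressing the load factor can never exceed 1, since each slot holds at most one entry; chaining has no such bound.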
Example
• The directory is an array of size 4.
• What info does this hash data
structure maintain?
– It needs to know the current number of bits
used in the hash function
• Global Depth
– It may need to know the number of bits used
to hash into a specific bucket
• Local Depth
– Can the global depth be less than the local depth?
Points to remember
• What is the global depth of the directory?
– The maximum number of bits needed to tell which bucket an entry belongs to.
• What is the local depth of a bucket?
– The number of bits used to determine whether an entry belongs to this bucket.
• When does a bucket split cause directory doubling?
– When, before the insert, the bucket is full and its local depth equals the
global depth.
• How do we double the directory efficiently?
– The directory is doubled by copying it over and 'fixing' the pointer to the
split image page.
– Note that this works only if the directory is indexed by the least significant
bits of the hash.
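The doubling step can be sketched concretely; the bucket labels and the choice of which bucket splits are my own illustration, not from the slides:

```python
global_depth = 2
# 2**2 directory entries; entry i serves keys whose low 2 bits equal i.
directory = ['A', 'B', 'C', 'D']

# Suppose bucket 'B' (directory index 0b01) is full and its local depth
# equals the global depth, so an insert forces directory doubling.
directory = directory + directory[:]  # step 1: double by copying it over
global_depth += 1                     # 3 bits now index the directory

split_index = 0b001                                    # old index of the full bucket
image_index = split_index | (1 << (global_depth - 1))  # 0b101: the split image's slot
directory[image_index] = 'B2'         # step 2: fix only the split-image pointer

print(directory)
# ['A', 'B', 'C', 'D', 'A', 'B2', 'C', 'D'] -- every other pointer is untouched
```

This copy-then-fix trick relies on least-significant-bit indexing: after doubling, old index i and its image i + 2**(old depth) must initially point at the same buckets, which the plain copy guarantees.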