You are on page 1of 27

Hashing

Hashing
. Hashing technique is used to calculate the direct
location of a data record on the disk.
• Data is stored at the data blocks whose address is
generated by using the hashing function.
• Memory location where these records are stored is
known as data bucket or data blocks.
• Hash function can choose any of the column
value to generate the address
• Hash function uses the primary key to
generate the address of the data block
• Hash function is a simple mathematical
function to any complex mathematical
function
Hash Organization
Bucket 
• Hash file stores data in bucket format.
• Bucket is considered a unit of storage.
• Bucket typically stores one complete disk block,
which in turn can store one or more records.
Hash Function 
• Hash function, h, is a mapping function that
maps all the set of search-keys K to the
address where actual records are placed.
• It is a function from search keys to bucket
addresses.
Types of Hashing: Static Hashing
– When a search-key value is provided, the hash
function always computes the same address.
– The output address shall always be same for that
function.
– Number of buckets provided remains unchanged at
all times.
Operations of Static Hashing
• Insertion − When a record is required to be entered
using static hash, the hash function h computes the
bucket address for search key K, where the record
will be stored.
Bucket address = h(K)
• Search − When a record needs to be retrieved, the
same hash function can be used to retrieve the
address of the bucket where the data is stored.
• Delete − This is simply a search followed by a
deletion operation.
• Update a Record To update a record, first
search it using a hash function, and then the
data record is updated.
Bucket Overflow
• If we want to insert some new record into the file but
the address of a data bucket generated by the hash
function is not empty, or data already exists in that
address.
• This situation in the static hashing is known as bucket
overflow.
Overflow Chaining −
When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This
mechanism is called Closed Hashing.
Linear Probing −
• When a hash function generates an address at which data is
already stored, the next free bucket is allocated to it.
• This mechanism is called Open Hashing.
Dynamic Hashing
• The problem with static hashing is that it does not
expand or shrink dynamically as the size of the database
grows or shrinks.
• Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and on-
demand.
• Dynamic hashing is also known as extended hashing.

 
• Hash function, in dynamic hashing, is made to
produce a large number of values and only a
few are used initially.
• Consider the following grouping of keys into buckets,
depending on the prefix of their hash address:

• last two bits of 2 and 4 are 00. So it will go into bucket B0. The
last two bits of 5 and 6 are 01, so it will go into bucket B1. The
last two bits of 1 and 3 are 10, so it will go into bucket B2. The
last two bits of 7 are 11, so it will go into B3
Advantages of dynamic hashing

• Performance does not decrease as the data


grows in the system. It simply increases the
size of memory to accommodate the data.
• Memory is well utilized as it grows and shrinks
with the data. There will not be any unused
memory lying.
Disadvantages of dynamic hashing
• If the data size increases then the bucket size
is also increased.
• These addresses of data will be maintained in
the bucket address table. Because the data
address will keep changing as buckets grow
and shrink.
• If there is a huge increase in data, maintaining
the bucket address table becomes tedious.
B Tree
B Tree
• B Tree is a specialized m-way tree that can be widely
used for disk access
• B-Tree of order m can have at most m-1 keys and m
children.
• Reason for using B tree is its capability to store large
number of keys in a single node and large key values by
keeping the height of the tree relatively small.
• Number of keys in a node is one less than the number of
its children .
• Root has a minimum one key to maximum (m-1)
keys.
• Root has minimum two children to maximum m
children.
• All leaf nodes are on the same level.
• Non-leaf nodes are called internal nodes (nodes
having child).
• Leaf nodes are called external nodes (nodes having
no child).
• Every node in a B-Tree contains at most m
children.
• Every node in a B-Tree except the root node
and the leaf node contain at least m/2
children.
• The root nodes must have at least 2 nodes.
Operations
Searching :
• Searching in B Trees is similar to that in Binary
search tree. For example, if we search for an
item 120 in the following B Tree.

You might also like