Linear Search
Linear search is the simplest search algorithm. In this type of search, a sequential pass is
made over all items one by one. Every item is checked; if a match is found, that
particular item is returned, otherwise the search continues till the end of the data collection.
Algorithm
Step 1: Set i to 1
Step 2: if i > n then go to step 7
Step 3: if A[i] = x then go to step 6
Step 4: Set i to i + 1
Step 5: Go to Step 2
Step 6: Print Element x Found at index i and go to step 8
Step 7: Print element not found
Step 8: Exit
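The steps above can be sketched in Python (0-based indexing, unlike the 1-based steps):

```python
def linear_search(arr, x):
    """Check every item in turn; return its index if found, else -1."""
    for i, item in enumerate(arr):
        if item == x:
            return i
    return -1

print(linear_search([10, 14, 19, 27, 31, 33, 35, 42, 44], 31))  # prints 4
```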
Binary Search
Binary search is a fast search algorithm with run-time complexity of Ο(log n). This search
algorithm works on the principle of divide and conquer. For this algorithm to work properly,
the data collection should be in the sorted form.
Binary search looks for a particular item by comparing it with the middle-most item of the
collection. If a match occurs, the index of the item is returned. If the middle item is greater
than the target, the target is searched for in the sub-array to the left of the middle item.
Otherwise, the target is searched for in the sub-array to the right of the middle item. This
process continues on the sub-array until the size of the sub-array reduces to zero.
For a binary search to work, it is mandatory for the target array to be sorted. We shall learn
the process of binary search with an example. The following is our sorted array −
10 14 19 26 27 31 33 35 42 44 − and let us assume that we need to search for the location
of value 31 using binary search.
First we find the middle location: mid = (low + high) / 2 = (0 + 9) / 2 = 4 (integer part).
Now we compare the value stored at location 4 with the value being searched, i.e. 31. We
find that the value at location 4 is 27, which is not a match. As the target value 31 is greater
than 27 and we have a sorted array, we know that the target value must be in the upper
portion of the array.
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid =(high + low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.
The value stored at location 7 is not a match; rather, it is more than what we are looking for.
So the value must be in the lower part from this location.
We set high = mid - 1 and calculate the mid again, which is now 5.
We compare the value stored at location 5 with our target value. We find that it is a match.
Pseudocode
Procedure binary_search
   A : sorted array
   n : size of array
   x : value to be searched

   Set lowerBound = 1
   Set upperBound = n

   while x not found
      if upperBound < lowerBound
         EXIT: x does not exist

      set midPoint = (lowerBound + upperBound) / 2

      if A[midPoint] < x
         set lowerBound = midPoint + 1

      if A[midPoint] > x
         set upperBound = midPoint - 1

      if A[midPoint] = x
         EXIT: x found at location midPoint
   end while
end procedure
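The procedure translates to the following Python sketch, using the sorted array from the walkthrough:

```python
def binary_search(arr, x):
    """Iterative binary search on a sorted list; returns the index of x or -1."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == x:
            return mid
        elif arr[mid] < x:
            low = mid + 1    # target lies in the upper half
        else:
            high = mid - 1   # target lies in the lower half
    return -1

print(binary_search([10, 14, 19, 26, 27, 31, 33, 35, 42, 44], 31))  # prints 5
```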
External sorting is done on disks, outside main memory, typically because the data is huge
and cannot be stored in main memory. While sorting, the data is pulled into main memory
from disk in chunks. All the sorted chunks are later merged and stored back on disk, where
they can fit. External merge sort can be used here.
BUBBLE SORT
Bubble sort is a simple sorting algorithm. It is a comparison-based
algorithm in which each pair of adjacent elements is compared and the elements are
swapped if they are not in order. This algorithm is not suitable for large data sets as its
average and worst case complexity are of Ο(n²), where n is the number of items.
We take an unsorted array for our example. Bubble sort takes Ο(n²) time, so we're keeping
the array short.
Bubble sort starts with very first two elements, comparing them to check which one is
greater.
In this case, value 33 is greater than 14, so they are already in sorted order. Next, we
compare 33 with 27.
We find that 27 is smaller than 33, so these two values must be swapped.
Next we compare 33 and 35. We find that both are already in sorted positions.
Then we move to the next pair, 35 and 10. Since 10 is smaller than 35, they are not in sorted order.
We swap these values. We find that we have reached the end of the array. After one
iteration, the array should look like this −
To be precise, we are now showing how an array should look like after each iteration. After
the second iteration, it should look like this −
Notice that after each iteration, at least one value moves to the end.
And when no swap is required, bubble sort knows that the array is completely sorted.
Algorithm
We assume list is an array of n elements. We further assume that swap function swaps the
values of the given array elements.
begin BubbleSort(list)

   for all elements of list
      if list[i] > list[i+1]
         swap(list[i], list[i+1])
      end if
   end for

   return list

end BubbleSort
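The algorithm can be sketched in Python, with the early-exit check described above (stop once a full pass makes no swap):

```python
def bubble_sort(arr):
    """Repeatedly swap adjacent out-of-order pairs; stop early if a pass makes no swap."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):          # the last i positions are already in place
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                      # no swap in a full pass: array is sorted
            break
    return arr

print(bubble_sort([14, 33, 27, 35, 10]))  # prints [10, 14, 27, 33, 35]
```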
SELECTION SORT
Selection sort is a simple sorting algorithm. This sorting algorithm is an in-place
comparison-based algorithm in which the list is divided into two parts, the sorted part at the
left end and the unsorted part at the right end. Initially, the sorted part is empty and the
unsorted part is the entire list.
The smallest element is selected from the unsorted array and swapped with the leftmost
element, and that element becomes a part of the sorted array. This process continues moving
unsorted array boundary by one element to the right.
This algorithm is not suitable for large data sets as its average and worst case complexities
are of Ο(n²), where n is the number of items.
For the first position in the sorted list, the whole list is scanned sequentially. 14 is currently
stored at the first position; searching the whole list, we find that 10 is the lowest value.
So we swap 14 with 10. After one iteration, 10, which happens to be the minimum value in
the list, appears in the first position of the sorted list.
For the second position, where 33 is residing, we start scanning the rest of the list in a linear
manner.
We find that 14 is the second lowest value in the list and it should appear at the second
place. We swap these values.
After two iterations, two least values are positioned at the beginning in a sorted manner.
The same process is applied to the rest of the items in the array.
Following is a pictorial depiction of the entire sorting process −
Now, let us learn some programming aspects of selection sort.
Algorithm
Step 1 − Set MIN to location 0
Step 2 − Search the minimum element in the list
Step 3 − Swap with value at location MIN
Step 4 − Increment MIN to point to next element
Step 5 − Repeat until list is sorted
Pseudocode
procedure selection sort
   list : array of items
   n : size of list

   for i = 1 to n - 1
      /* set current element as minimum */
      min = i

      /* check the rest of the list for a smaller element */
      for j = i+1 to n
         if list[j] < list[min] then
            min = j
         end if
      end for

      /* swap the minimum element with the current element */
      if min != i then
         swap list[min] and list[i]
      end if
   end for
end procedure
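A Python sketch of selection sort, using the example values from the walkthrough:

```python
def selection_sort(arr):
    """Grow a sorted prefix by swapping the minimum of the unsorted suffix to its front."""
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):           # find the minimum of the unsorted part
            if arr[j] < arr[min_idx]:
                min_idx = j
        if min_idx != i:                    # move it to the end of the sorted part
            arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr

print(selection_sort([14, 33, 27, 10, 35, 19, 42, 44]))  # prints [10, 14, 19, 27, 33, 35, 42, 44]
```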
INSERTION SORT
This is an in-place comparison-based sorting algorithm. Here, a sub-list is maintained which
is always sorted. For example, the lower part of an array is maintained to be sorted. An
element which is to be 'insert'ed in this sorted sub-list, has to find its appropriate place and
then it has to be inserted there. Hence the name, insertion sort.
The array is searched sequentially and unsorted items are moved and inserted into the sorted
sub-list (in the same array). This algorithm is not suitable for large data sets as its average
and worst case complexity are of Ο(n²), where n is the number of items.
It finds that both 14 and 33 are already in ascending order. For now, 14 is in sorted sub-list.
It swaps 33 with 27. It also checks with all the elements of sorted sub-list. Here we see that
the sorted sub-list has only one element 14, and 27 is greater than 14. Hence, the sorted sub-
list remains sorted after swapping.
By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10.
We swap them again. By the end of third iteration, we have a sorted sub-list of 4 items.
This process goes on until all the unsorted values are covered in a sorted sub-list. Now we
shall see some programming aspects of insertion sort.
Algorithm
Now we have a bigger picture of how this sorting technique works, so we can derive simple
steps by which we can achieve insertion sort.
Step 1 − If it is the first element, it is already sorted. return 1;
Step 2 − Pick next element
Step 3 − Compare with all elements in the sorted sub-list
Step 4 − Shift all the elements in the sorted sub-list that is greater than the
value to be sorted
Step 5 − Insert the value
Step 6 − Repeat until list is sorted
Pseudocode
procedure insertionSort(A : array of items)
   int holePosition
   int valueToInsert

   for i = 1 to length(A) inclusive do
      /* select value to be inserted */
      valueToInsert = A[i]
      holePosition = i

      /* locate hole position for the element to be inserted */
      while holePosition > 0 and A[holePosition-1] > valueToInsert do
         A[holePosition] = A[holePosition-1]
         holePosition = holePosition - 1
      end while

      /* insert the value at hole position */
      A[holePosition] = valueToInsert
   end for
end procedure
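A Python sketch of insertion sort:

```python
def insertion_sort(arr):
    """Insert each element into its place within the sorted prefix to its left."""
    for i in range(1, len(arr)):
        value = arr[i]
        j = i
        # shift larger sorted elements one slot to the right to open a hole
        while j > 0 and arr[j - 1] > value:
            arr[j] = arr[j - 1]
            j -= 1
        arr[j] = value                       # drop the value into the hole
    return arr

print(insertion_sort([14, 33, 27, 10, 35, 19, 42, 44]))  # prints [10, 14, 19, 27, 33, 35, 42, 44]
```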
QUICK SORT
Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of data
into smaller arrays. A large array is partitioned into two arrays one of which holds values
smaller than the specified value, say pivot, based on which the partition is made and another
array holds values greater than the pivot value.
Quicksort partitions an array and then calls itself recursively twice to sort the two resulting
subarrays. This algorithm is quite efficient for large-sized data sets, as its average and worst-
case complexities are Ο(n log n) and Ο(n²), respectively.
Partition in Quick Sort
The pivot value divides the list into two parts. Then, recursively, we find a pivot for each
sub-list until all sub-lists contain only one element.
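The partition-and-recurse process can be sketched in Python using the Lomuto scheme (last element as pivot; other pivot choices are equally valid):

```python
def quick_sort(arr, low=0, high=None):
    """Partition around a pivot, then recursively sort the two sides in place."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        pivot = arr[high]                    # last element chosen as pivot
        i = low - 1
        for j in range(low, high):
            if arr[j] <= pivot:              # move values <= pivot to the left part
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[high] = arr[high], arr[i + 1]   # place pivot between the parts
        quick_sort(arr, low, i)              # sort values smaller than the pivot
        quick_sort(arr, i + 2, high)         # sort values greater than the pivot
    return arr

print(quick_sort([35, 33, 42, 10, 14, 19, 27, 44]))  # prints [10, 14, 19, 27, 33, 35, 42, 44]
```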
In 3-way merge sort, we divide the array repeatedly into three parts and then merge them to
get a sorted array.
In k-way merge sort, we divide the array into k parts and then merge them to get a sorted array.
There is a slight difference between merge sort and two-way merge sort: some people say merge
sort is a 2-way merge using recursion, while two-way merge sort is a 2-way merge using iteration.
Merge sort is a sorting technique based on divide and conquer technique. With worst-case
time complexity being Ο(n log n), it is one of the most respected algorithms.
Merge sort first divides the array into equal halves and then combines them in a sorted
manner.
We know that merge sort first divides the whole array iteratively into equal halves until
atomic values are reached. We see here that an array of 8 items is divided into two arrays of
size 4.
This does not change the sequence of appearance of items in the original. Now we divide
these two arrays into halves.
We further divide these arrays and we achieve atomic value which can no more be divided.
Now, we combine them in exactly the same manner as they were broken down. Please note
the color codes given to these lists.
We first compare the element for each list and then combine them into another list in a
sorted manner. We see that 14 and 33 are in sorted positions. We compare 27 and 10 and in
the target list of 2 values we put 10 first, followed by 27. We change the order of 19 and 35
whereas 42 and 44 are placed sequentially.
In the next iteration of the combining phase, we compare lists of two data values and merge
them into a list of four data values, placing all in sorted order.
After the final merging, the list should look like this −
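The divide-and-merge process can be sketched in Python:

```python
def merge_sort(arr):
    """Split into halves, sort each recursively, then merge the two sorted halves."""
    if len(arr) <= 1:                        # atomic value: already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):  # take the smaller head element each time
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])                  # append whatever remains
    merged.extend(right[j:])
    return merged

print(merge_sort([14, 33, 27, 10, 35, 19, 42, 44]))  # prints [10, 14, 19, 27, 33, 35, 42, 44]
```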
Heap is a special case of balanced binary tree data structure where the root-node key is
compared with its children and arranged accordingly. If α has child node β then −
key(α) ≥ key(β)
As the value of the parent is greater than that of the child, this property generates a Max
Heap. Based on this criterion, a heap can be of two types −
For Input → 35 33 42 10 14 19 27 44 26 31
Min-Heap − Where the value of the root node is less than or equal to either of its children.
Max-Heap − Where the value of the root node is greater than or equal to either of its
children.
Both trees are constructed using the same input and order of arrival.
Heaps
A heap is a tree-based data structure in which all the nodes of the tree are in a specific order.
For example, if X is the parent node of Y, then the value of X follows a specific order with
respect to the value of Y and the same order will be followed across the tree.
The maximum number of children of a node in a heap depends on the type of heap. However,
in the more commonly-used heap type, there are at most 2 children of a node and it's known
as a Binary heap.
In binary heap, if the heap is a complete binary tree with N nodes, then it has smallest
possible height which is log2N .
In the diagram above, you can observe a particular sequence, i.e. each node has a greater
value than any of its children.
Suppose there are N jobs in a queue to be done, and each job has its own priority. The job
with the maximum priority will be completed before the others. At each instant, we are
completing the job with maximum priority, and at the same time we are also interested in
inserting a new job into the queue with its own priority.
So at each instant we have to check for the job with maximum priority to complete it and also
insert if there is a new job. This task can be very easily executed using a heap by
considering N jobs as N nodes of the tree.
As you can see in the diagram below, we can use an array to store the nodes of the tree. Let’s
say we have 7 elements with values {6, 4, 5, 3, 2, 0, 1}.
Note: An array can be used to simulate a tree in the following way. If we store an
element at index i in array Arr, then its parent will be stored at index i/2 (unless it is the
root, as the root has no parent) and can be accessed by Arr[i/2]; its left child can be accessed
by Arr[2∗i] and its right child by Arr[2∗i+1]. The index of the root will be 1 in the
array.
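The index arithmetic from the note can be sketched in Python (1-based, with a placeholder at index 0):

```python
# 1-based index arithmetic for a binary heap stored in an array,
# matching the note above (root at index 1).
def parent(i):      return i // 2
def left_child(i):  return 2 * i
def right_child(i): return 2 * i + 1

# Arr[0] is an unused placeholder so the root sits at index 1.
Arr = [None, 6, 4, 5, 3, 2, 0, 1]
print(Arr[left_child(1)], Arr[right_child(1)])  # children of the root: prints 4 5
```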
There can be two types of heap:
Max Heap: In this type of heap, the value of parent node will always be greater than or equal
to the value of child node across the tree and the node with highest value will be the root
node of the tree.
Implementation:
Let’s assume that we have a heap having some elements which are stored in array Arr. The
way to convert this array into a heap structure is the following. We pick a node in the array,
check if the left sub-tree and the right sub-tree are max heaps, in themselves and the node
itself is a max heap (it’s value should be greater than all the child nodes)
To do this we will implement a function that can maintain the property of max heap (i.e each
element value should be greater than or equal to any of its child and smaller than or equal to
its parent)
Complexity: O(logN)
Example:
In the diagram below, initially the 1st node (root node) is violating the property of max-heap,
as it has a smaller value than its children, so we perform the max_heapify function on this
node, which has value 4.
Now, as we can see, we can maintain a max-heap by using the max_heapify function.
Before moving ahead, let's observe a property which states: an N-element heap stored in an
array has its leaves indexed by N/2+1, N/2+2, N/2+3, up to N.
Now let’s say we have N elements stored in the array Arr indexed from 1 to N. They are
currently not following the property of max heap. So we can use max-heapify function to
make a max heap out of the array.
How?
From the above property we observed that elements from Arr[N/2+1] to Arr[N] are leaf
nodes, and each node is a 1 element heap. We can use max_heapify function in a bottom up
manner on remaining nodes, so that we can cover each node of tree.
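The max_heapify routine and the bottom-up build described above can be sketched in Python (1-based indexing with Arr[0] unused, to match the note earlier):

```python
def max_heapify(Arr, i, n):
    """Sink Arr[i] down until the subtree rooted at i satisfies the max-heap property.
    1-based indexing: Arr[0] is unused, n is the number of heap elements."""
    largest = i
    l, r = 2 * i, 2 * i + 1
    if l <= n and Arr[l] > Arr[largest]:
        largest = l
    if r <= n and Arr[r] > Arr[largest]:
        largest = r
    if largest != i:
        Arr[i], Arr[largest] = Arr[largest], Arr[i]
        max_heapify(Arr, largest, n)         # recurse into the affected subtree

def build_max_heap(Arr, n):
    """Heapify bottom-up from the last internal node (N/2) down to the root."""
    for i in range(n // 2, 0, -1):
        max_heapify(Arr, i, n)

Arr = [None, 1, 4, 3, 7, 8, 9, 10]           # N = 7; leaves are indices 4..7
build_max_heap(Arr, 7)
print(Arr[1:])  # prints [10, 8, 9, 7, 4, 1, 3]
```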
Example:
Suppose you have 7 elements stored in array Arr.
Here N=7, so starting from node having index N/2=3, (also having value 3 in the above
diagram), we will call max_heapify from index N/2 to 1.
In step 1, in max_heapify(Arr, 3), as 10 is greater than 3, 3 and 10 are swapped, and the
further call max_heapify(Arr, 7) has no effect, as 3 is a leaf node now.
In step 2, calling max_heapify(Arr, 2) (node indexed 2 has value 4), 4 is swapped with
8, and the further call max_heapify(Arr, 5) has no effect, as 4 is a leaf node now.
In step 3, calling max_heapify(Arr, 1) (node indexed 1 has value 1), 1 is swapped with
10.
Step 4 is a subpart of step 3: after swapping 1 with 10, a recursive call
max_heapify(Arr, 3) is performed, and 1 is swapped with 9. The further call
max_heapify(Arr, 7) has no effect, as 1 is a leaf node now.
In step 5, we finally get a max-heap, and the elements in the array Arr will be:
Min Heap: In this type of heap, the value of parent node will always be less than or equal to
the value of child node across the tree and the node with lowest value will be the root node of
tree.
As you can see in the above diagram, each node has a value smaller than the value of their
children.
We can perform same operations as performed in building max_heap.
First we will make function which can maintain the min heap property, if some element is
violating it.
Complexity: O(logN) .
Example:
Suppose you have elements stored in array Arr {4, 5, 1, 6, 7, 3, 2}. As you can see in the
diagram below, the element at index 1 is violating the property of min-heap, so performing
min_heapify(Arr, 1) will restore the min-heap.
Now let’s use above function in building min-heap. We will run the above function on
remaining nodes other than leaves as leaf nodes are 1 element heap.
Complexity: O(N). The complexity calculation is similar to that of building max heap.
Example:
Consider elements in array {10, 8, 9, 7, 6, 5, 4} . We will run min_heapify on nodes indexed
from N/2 to 1. Here node indexed at N/2 has value 9. And at last, we will get a min_heap.
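A quick way to see the same result in Python is the standard heapq module, which maintains a min-heap on a 0-based list:

```python
import heapq

# heapq maintains a min-heap on a plain Python list (0-based indexing).
Arr = [10, 8, 9, 7, 6, 5, 4]
heapq.heapify(Arr)        # O(N) bottom-up build, as described above
print(Arr[0])             # the minimum sits at the root: prints 4
```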
Heaps can be considered partially ordered trees: as you can see in the above examples, the
nodes of the tree do not follow any order with their siblings (nodes on the same level). They
are mainly used when we give priority to the smallest or the largest node in the tree, as
we can extract these nodes very efficiently using heaps.
APPLICATIONS:
1) Heap Sort:
We can use heaps in sorting the elements in a specific order in efficient time.
Let’s say we want to sort elements of array Arr in ascending order. We can use max heap to
perform this operation.
Idea: We build the max heap of elements stored in Arr, and the maximum element
of Arr will always be at the root of the heap.
Processing:
Implementation:
Example:
In the diagram below,initially there is an unsorted array Arr having 6 elements. We begin by
building max-heap.
After building max-heap, the elements in the array Arr will be:
Processing:
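The processing step can be sketched in Python: after building the max heap, repeatedly swap the root with the last unsorted element and re-heapify the shrunken heap (0-based indexing here; the sift-down helper is self-contained):

```python
def heap_sort(arr):
    """Build a max heap, then repeatedly move the root (maximum) to the end
    of the unsorted region and re-heapify what remains."""
    def sift_down(a, i, n):
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < n and a[l] > a[largest]:
                largest = l
            if r < n and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):      # build the max heap bottom-up
        sift_down(arr, i, n)
    for end in range(n - 1, 0, -1):          # extract maxima one by one
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(arr, 0, end)
    return arr

print(heap_sort([35, 33, 42, 10, 14, 19, 27, 44, 26, 31]))
# prints [10, 14, 19, 26, 27, 31, 33, 35, 42, 44]
```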
2) Priority Queue:
Priority Queue is similar to queue where we insert an element from the back and remove an
element from front, but with a difference that the logical order of elements in the priority
queue depends on the priority of the elements. The element with highest priority will be
moved to the front of the queue and one with lowest priority will move to the back of the
queue. Thus it is possible that when you enqueue an element at the back in the queue, it can
move to front because of its highest priority.
Example:
Let’s say we have an array of 5 elements : {4, 8, 1, 7, 3} and we have to insert all the
elements in the max-priority queue.
First, as the priority queue is empty, 4 is inserted.
When 8 is inserted, it moves to the front, as 8 is greater than 4.
While inserting 1, as it is the current minimum element in the priority queue, it remains at
the back of the priority queue.
7 is then inserted between 8 and 4, as 7 is smaller than 8.
Finally, 3 is inserted before 1, as it is the 2nd minimum element in the priority queue. All the
steps are represented in the diagram below:
We can think of many ways to implement the priority queue.
Naive Approach:
Suppose we have N elements and we have to insert these elements in the priority queue. We
can use list and can insert elements in O(N) time and can sort them to maintain a priority
queue in O(NlogN) time.
Efficient Approach:
We can use heaps to implement the priority queue. It will take O(logN) time to insert and
delete each element in the priority queue.
Based on the heap structure, priority queues also come in two types: max-priority queue and
min-priority queue.
Max Priority Queue is based on the structure of max heap and can perform following
operations:
Implementation:
Complexity: O(1)
Extract Maximum: In this operation, the maximum element is returned; the last
element of the heap is placed at index 1, and max_heapify is performed on node 1, since
placing the last element at index 1 violates the property of max-heap.
Complexity: O(logN).
Increase Value: Increasing the value of a node may violate the property of max-
heap, so we may have to swap the parent's value with the node's value until the parent's
value is larger.
Complexity : O(logN).
Insert Value :
void insert_value (int Arr[ ], int val)
{
   length = length + 1;
   Arr[length] = -1;   /* assuming all numbers to be inserted are greater than 0 */
   increase_val (Arr, length, val);
}
Complexity: O(logN).
Example:
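As an example, a heap-backed max-priority queue can be sketched with Python's heapq (a min-heap) by storing negated keys; the class and method names here are illustrative:

```python
import heapq

class MaxPriorityQueue:
    """Max-priority queue on top of Python's min-heap heapq, by negating keys."""
    def __init__(self):
        self._heap = []

    def insert(self, val):            # O(log N)
        heapq.heappush(self._heap, -val)

    def maximum(self):                # O(1): peek without removing
        return -self._heap[0]

    def extract_max(self):            # O(log N)
        return -heapq.heappop(self._heap)

pq = MaxPriorityQueue()
for v in [4, 8, 1, 7, 3]:
    pq.insert(v)
print(pq.extract_max(), pq.extract_max())  # prints 8 7
```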
Storage device
There are two types of storage devices used with computers: a primary storage device, such
as RAM, and a secondary storage device, such as a hard drive. Secondary storage can
be removable, internal, or external.
Today, magnetic storage is one of the most common types of storage used with computers.
This technology is found mostly on extremely large HDDs or hybrid hard drives.
Floppy diskette
Hard drive
Magnetic strip
SuperDisk
Tape cassette
Zip diskette
Optical storage devices
Another common type of storage is optical storage, which uses lasers and lights as its method
of reading and writing data.
Blu-ray disc
CD-ROM disc
Flash memory has been replacing magnetic and optical media as it becomes cheaper, since it
is the more efficient and reliable solution.
CF (CompactFlash)
M.2
Memory card
MMC
NVMe
SDHC Card
SmartMedia Card
SD card
SSD
xD-Picture Card
Storing data online and in cloud storage is becoming popular as people need to access their
data from more than one device.
Cloud storage
Network media
Paper storage
Early computers had no method of using any of the technologies above for storing
information and had to rely on paper. Today, these forms of storage are rarely used or found.
In the picture is an example of a woman entering data onto a punch card using a punch card
machine.
OMR
Punch card
Why is storage needed in a computer?
Without a storage device, a computer cannot save or remember any settings or information
and would be considered a dumb terminal.
Although a computer can run with no storage device, it would only be able to view
information, unless it was connected to another computer that had storage capabilities.
Even a task, such as browsing the Internet, requires information to be stored on your
computer.
As computers advance, the technologies used to store data do too, along with higher
requirements for storage space. Because people need more and more space, and want it
faster, cheaper, and portable, new technologies have to be invented. As new storage devices
are designed and people upgrade to them, the older devices are no longer needed and stop
being used.
For example, when punch cards were first used in early computers, the magnetic media used
for floppy disks was not available. After floppy diskettes were released, they were replaced by
CD-ROM drives, which were replaced by DVD drives, which have been replaced by flash
drives. The first hard disk drive from IBM cost $50,000, held only 5 MB, and was big and
cumbersome. Today, we have smartphones with hundreds of times the capacity at a much
smaller price that we can carry in our pocket.
Each advancement of storage devices gives a computer the ability to store more data, as well
as save and access data faster.
When saving anything on a computer, it may ask you for a storage location, which is the area
where you would like to save the information. By default, most information is saved to your
computer hard drive. If you want to move the information to another computer, save it to a
removable storage device, such as a USB flash drive.
INDEXING TECHNIQUE
We know that data is stored in the form of records. Every record has a key field, which helps
it to be recognized uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database files
based on some attributes on which the indexing has been done. Indexing in database systems
is similar to what we see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
Secondary Index − Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or a non-key with duplicate
values.
Clustering Index − Clustering index is defined on an ordered data file. The data file
is ordered on a non-key field.
Ordered Indexing is of two types −
Dense Index
Sparse Index
Dense Index
In dense index, there is an index record for every search key value in the database. This
makes searching faster but requires more space to store index records itself. Index records
contain search key value and a pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here
contains a search key and a pointer to the data on the disk. To search a record, we
first proceed by the index record and reach the actual location of the data. If the data we are
looking for is not where we land by following the index, the system starts a
sequential search from there until the desired data is found.
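The sparse-index lookup described above can be sketched as follows; the index entries, block layout, and helper names are hypothetical:

```python
import bisect

# Hypothetical sparse index: one (search_key, block_number) entry per disk block.
# To find a record, locate the last index entry whose key <= target, jump to
# that block, then scan it sequentially - mirroring the description above.
index = [(10, 0), (40, 1), (70, 2)]           # assumed block anchors
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

def sparse_lookup(key):
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1  # last anchor with anchor key <= key
    if pos < 0:
        return None                            # key precedes every anchor
    block = blocks[index[pos][1]]
    return key if key in block else None       # sequential scan within the block

print(sparse_lookup(50))  # prints 50
```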
Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored on
the disk along with the actual database files. As the size of the database grows, so does the
size of the indices. There is an immense need to keep the index records in the main memory
so as to speed up the search operations. If single-level index is used, then a large size index
cannot be kept in memory which leads to multiple disk accesses.
Multi-level Index helps in breaking down the index into several smaller indices in order to
make the outermost level so small that it can be saved in a single disk block, which can
easily be accommodated anywhere in the main memory.
B Tree
B Tree is a specialized m-way tree widely used for disk access. A B-Tree of order
m can have at most m-1 keys and m children. One of the main reasons for using a B tree is
its capability to store a large number of keys in a single node, and large key values, while
keeping the height of the tree relatively small.
A B tree of order m contains all the properties of an M way tree. In addition, it contains the
following properties.
It is not necessary that all the nodes contain the same number of children, but each node
must have at least m/2 children.
Operations
Searching :
Searching in B Trees is similar to that in a binary search tree. For example, suppose we
search for item 49 in the following B Tree. The process will be something like the following:
1. Compare item 49 with root node 78. Since 49 < 78, move to its left sub-tree.
2. Since 40 < 49 < 56, traverse the right sub-tree of 40.
3. 49 > 45, move to the right. Compare 49.
4. Match found, return.
Searching in a B tree depends upon the height of the tree. The search algorithm takes O(log
n) time to search any element in a B tree.
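The search walk can be sketched in Python; the node class is minimal, and the tree built below is a hypothetical one consistent with the walkthrough (root 78, then 40/56, then 45/49):

```python
class BTreeNode:
    """Minimal B-tree node: sorted keys plus child pointers (a leaf has no children)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def btree_search(node, key):
    """Walk down the tree: find the first key >= target, descend if no match."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                    # reached a leaf without a match
        return False
    return btree_search(node.children[i], key)

root = BTreeNode([78], [BTreeNode([40, 56], [BTreeNode([30, 35]),
                                             BTreeNode([45, 49]),
                                             BTreeNode([60, 70])]),
                        BTreeNode([85, 90])])
print(btree_search(root, 49))  # prints True
```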
Inserting
Insertions are done at the leaf node level. The following algorithm needs to be followed in
order to insert an item into B Tree.
1. Traverse the B Tree in order to find the appropriate leaf node at which the node can
be inserted.
2. If the leaf node contains fewer than m-1 keys, then insert the element in increasing
order.
3. Else, if the leaf node contains m-1 keys, then follow these steps:
o Insert the new element in increasing order of elements.
o Split the node into two nodes at the median.
o Push the median element up to its parent node.
o If the parent node also contains m-1 keys, then split it too by
following the same steps.
Example:
Insert the node 8 into the B Tree of order 5 shown in the following image.
The node now contains 5 keys, which is greater than (5 - 1 = 4) keys. Therefore, split the
node at the median, i.e. 8, and push it up to its parent node, as shown below.
Deletion
Deletion is also performed at the leaf nodes. The node which is to be deleted can either be a
leaf node or an internal node. Following algorithm needs to be followed in order to delete a
node from a B tree.
If the node to be deleted is an internal node, then replace it with its in-order successor or
predecessor. Since the successor or predecessor will always be in a leaf node, the process is
the same as deleting from a leaf node.
Example 1
Delete the node 53 from the B Tree of order 5 shown in the following figure.
53 is present in the right child of element 49. Delete it.
Now, 57 is the only element left in the node, while the minimum number of elements that
must be present in a B tree of order 5 is 2. Since it has fewer than that, and the elements in
its left and right siblings are also not sufficient, merge it with the left sibling and the
intervening element of the parent, i.e. 49.
Deletion in B-Tree
For deletion in a B tree, we wish to remove from a leaf. There are three possible cases for
deletion in a B tree.
Let k be the key to be deleted and x the node containing the key. Then the cases are:
Case-I
If the key k is in a leaf node x, and removing it doesn't cause x to have too few keys,
then simply delete k from x.
6 deleted
Case-II
If key k is in node x and x is an internal node, there are three cases to consider:
Case-II-a
If the child y that precedes k in node x has at least t keys (more than the minimum), then find
the predecessor key k' in the subtree rooted at y. Recursively delete k' and replace k with k' in
x
Case-II-b
Symmetrically, if the child z that follows k in node x has at least t keys, find the successor k'
and delete and replace as before. Note that finding k' and deleting it can be performed in a
single downward pass.
13 deleted
Case-II-c
Otherwise, if both y and z have only t−1 (minimum number) keys, merge k and all of z into
y, so that both k and the pointer to z are removed from x. y now contains 2t − 1 keys, and
subsequently k is deleted.
7 deleted
Case-III
If key k is not present in an internal node x, determine the root of the appropriate subtree that
must contain k. If the root has only t − 1 keys, execute either of the following two cases to
ensure that we descend to a node containing at least t keys. Finally, recurse to the appropriate
child of x.
Case-III-a
If the root has only t−1 keys but has a sibling with t keys, give the root an extra key by
moving a key from x down to the root, moving a key from the root's immediate left or right
sibling up into x, and moving the appropriate child from the sibling to the root.
2 deleted
Case-III-b
If the root and all of its siblings have t−1 keys, merge the root with one sibling. This involves
moving a key down from x into the new merged node to become the median key for that
node.
4 deleted
Application of B tree
A B tree is used to index the data and provides fast access to the actual data stored on disk,
since access to a value stored in a large database on disk is a very time-consuming process.
Searching an unindexed and unsorted database containing n key values takes O(n) running
time in the worst case. However, if we use a B Tree to index the database, it can be searched
in O(log n) time in the worst case.
B+ Tree
B+ Tree is an extension of B Tree which allows efficient insertion, deletion and search
operations.
In a B Tree, keys and records can both be stored in internal as well as leaf nodes, whereas in
a B+ tree, records (data) can only be stored in the leaf nodes, while internal nodes store only
key values.
The leaf nodes of a B+ tree are linked together in the form of a singly linked list to make the
search queries more efficient.
B+ trees are used to store large amounts of data that cannot be stored in main memory. Since
the size of main memory is always limited, the internal nodes (keys to access records) of the
B+ tree are stored in main memory, whereas the leaf nodes are stored in secondary memory.
The internal nodes of B+ tree are often called index nodes. A B+ tree of order 3 is shown in
the following figure.
Advantages of B+ Tree
B Tree VS B+ Tree
Insertion in B+ Tree
Step 1: Find the leaf node where the new key belongs and insert it there.
Step 2: If the leaf doesn't have the required space, split the node and copy the middle
element to the next index node.
Step 3: If the index node doesn't have the required space, split the node and copy the middle
element to the next index page.
Example :
Insert the value 195 into the B+ tree of order 5 shown in the following figure.
195 will be inserted in the right sub-tree of 120 after 190. Insert it at the desired position.
The node contains greater than the maximum number of elements, i.e. 4; therefore, split it
and move the median element up to the parent.
Now, the index node contains 6 children and 5 keys which violates the B+ tree properties,
therefore we need to split it, shown as follows.
Deletion in B+ Tree
Step 1: Find the leaf node containing the key and delete it.
Step 2: If the leaf node contains fewer than the minimum number of elements, merge the
node with its sibling and delete the key in between them.
Step 3: If the index node contains fewer than the minimum number of elements, merge the
node with its sibling and move down the key in between them.
Example
Delete the key 200 from the B+ Tree shown in the following figure.
200 is present in the right sub-tree of 190, after 195. Delete it.
Merge the two nodes by using 195, 190, 154 and 129.
Now, element 120 is the single element present in the node which is violating the B+ Tree
properties. Therefore, we need to merge it by using 60, 78, 108 and 120.