
SEARCHING AND SORTING

Linear Search

Linear search is a very simple search algorithm. In this type of search, a sequential search is
made over all items one by one. Every item is checked and if a match is found then that
particular item is returned, otherwise the search continues till the end of the data collection.

Algorithm

Linear Search ( Array A, Value x)

Step 1: Set i to 1
Step 2: if i > n then go to step 7
Step 3: if A[i] = x then go to step 6
Step 4: Set i to i + 1
Step 5: Go to Step 2
Step 6: Print Element x Found at index i and go to step 8
Step 7: Print element not found
Step 8: Exit

Binary Search

Binary search is a fast search algorithm with run-time complexity of Ο(log n). This search
algorithm works on the principle of divide and conquer. For this algorithm to work properly,
the data collection should be in the sorted form.
Binary search looks for a particular item by comparing the middle most item of the
collection. If a match occurs, then the index of item is returned. If the middle item is greater
than the item, then the item is searched in the sub-array to the left of the middle item.
Otherwise, the item is searched for in the sub-array to the right of the middle item. This
process continues on the sub-array as well until the size of the sub array reduces to zero.

How Binary Search Works?

For a binary search to work, it is mandatory for the target array to be sorted. We shall learn
the process of binary search with a pictorial example. The following is our sorted array and
let us assume that we need to search the location of value 31 using binary search.

First, we shall determine half of the array by using this formula −


mid = (low + high) / 2
Here it is, (0 + 9) / 2 = 4 (the integer part of 4.5). So, 4 is the mid of the array.

Now we compare the value stored at location 4, with the value being searched, i.e. 31. We
find that the value at location 4 is 27, which is not a match. As the value is greater than 27
and we have a sorted array, so we also know that the target value must be in the upper
portion of the array.

We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid =(high + low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.

The value stored at location 7 is not a match, rather it is more than what we are looking for.
So, the value must be in the lower part from this location.

Hence, we calculate the mid again. This time it is 5.

We compare the value stored at location 5 with our target value. We find that it is a match.

We conclude that the target value 31 is stored at location 5.


Binary search halves the searchable items at every step and thus reduces the number of comparisons to be made to a very small number.
Pseudocode

The pseudocode of binary search algorithms should look like this −


Procedure binary_search
A ← sorted array
n ← size of array
x ← value to be searched

Set lowerBound = 1
Set upperBound = n

while x not found


if upperBound < lowerBound
EXIT: x does not exist.

set midPoint = ( upperBound+lowerBound ) / 2

if A[midPoint] < x
set lowerBound = midPoint + 1

if A[midPoint] > x
set upperBound = midPoint - 1

if A[midPoint] = x
EXIT: x found at location midPoint
end while

end procedure

What is internal sorting and external sorting?


In internal sorting, the data that has to be sorted resides in main memory at all times, implying faster access. The complete sort happens in main memory. Insertion sort, quick sort, heap sort, and radix sort can be used for internal sorting.

In external sorting, the data resides on disk, outside main memory. This can be because the data is huge and cannot be stored in main memory. While sorting, the data is pulled into main memory in chunks from disk. Later, all the sorted chunks are merged and stored back to disk, where the result can fit. External merge sort can be used here.

BUBBLE SORT
Bubble sort is a simple sorting algorithm. This sorting algorithm is a comparison-based algorithm in which each pair of adjacent elements is compared and the elements are swapped if they are not in order. This algorithm is not suitable for large data sets as its average and worst case complexity are of Ο(n²), where n is the number of items.

How Bubble Sort Works?

We take an unsorted array for our example. Bubble sort takes Ο(n²) time, so we're keeping it short and precise.
Bubble sort starts with very first two elements, comparing them to check which one is
greater.

In this case, value 33 is greater than 14, so the pair is already in sorted order. Next, we
compare 33 with 27.

We find that 27 is smaller than 33 and these two values must be swapped.

The new array should look like this −

Next we compare 33 and 35. We find that both are in already sorted positions.

Then we move to the next two values, 35 and 10.

We know then that 10 is smaller than 35. Hence they are not sorted.

We swap these values. We find that we have reached the end of the array. After one
iteration, the array should look like this −

To be precise, we are now showing how the array should look after each iteration. After
the second iteration, it should look like this −

Notice that after each iteration, at least one value moves to the end.
And when no swap is required, bubble sort learns that the array is completely sorted.

Now we should look into some practical aspects of bubble sort.

Algorithm

We assume list is an array of n elements. We further assume that swap function swaps the
values of the given array elements.
begin BubbleSort(list)

   repeat
      swapped = false
      for i = 0 to n-2
         if list[i] > list[i+1]
            swap(list[i], list[i+1])
            swapped = true
         end if
      end for
   until not swapped

   return list

end BubbleSort

SELECTION SORT
Selection sort is a simple sorting algorithm. This sorting algorithm is an in-place
comparison-based algorithm in which the list is divided into two parts, the sorted part at the
left end and the unsorted part at the right end. Initially, the sorted part is empty and the
unsorted part is the entire list.
The smallest element is selected from the unsorted array and swapped with the leftmost
element, and that element becomes a part of the sorted array. This process continues moving
unsorted array boundary by one element to the right.
This algorithm is not suitable for large data sets as its average and worst case complexities
are of Ο(n²), where n is the number of items.

How Selection Sort Works?

Consider the following depicted array as an example.

For the first position in the sorted list, the whole list is scanned sequentially. For the first
position, where 14 is stored presently, we search the whole list and find that 10 is the lowest
value.
So we swap 14 with 10. After one iteration 10, which happens to be the minimum value in
the list, appears in the first position of the sorted list.

For the second position, where 33 is residing, we start scanning the rest of the list in a linear
manner.

We find that 14 is the second lowest value in the list and it should appear at the second
place. We swap these values.

After two iterations, two least values are positioned at the beginning in a sorted manner.

The same process is applied to the rest of the items in the array.
Following is a pictorial depiction of the entire sorting process −
Now, let us learn some programming aspects of selection sort.
Algorithm
Step 1 − Set MIN to location 0
Step 2 − Search the minimum element in the list
Step 3 − Swap with value at location MIN
Step 4 − Increment MIN to point to next element
Step 5 − Repeat until list is sorted
Pseudocode
procedure selection sort
list : array of items
n : size of list

for i = 1 to n - 1
/* set current element as minimum*/
min = i

/* check the element to be minimum */

for j = i+1 to n
if list[j] < list[min] then
min = j;
end if
end for

/* swap the minimum element with the current element*/


if min != i then
swap list[min] and list[i]
end if
end for

end procedure
INSERTION SORT
This is an in-place comparison-based sorting algorithm. Here, a sub-list is maintained which
is always sorted. For example, the lower part of an array is maintained to be sorted. An
element which is to be inserted into this sorted sub-list has to find its appropriate place and
then be inserted there. Hence the name, insertion sort.
The array is searched sequentially and unsorted items are moved and inserted into the sorted
sub-list (in the same array). This algorithm is not suitable for large data sets as its average
and worst case complexity are of Ο(n²), where n is the number of items.

How Insertion Sort Works?

We take an unsorted array for our example.

Insertion sort compares the first two elements.

It finds that both 14 and 33 are already in ascending order. For now, 14 is in the sorted sub-list.

Insertion sort moves ahead and compares 33 with 27.

And finds that 33 is not in the correct position.

It swaps 33 with 27. It also checks 27 against all the elements of the sorted sub-list. Here we
see that the sorted sub-list has only one element, 14, and 27 is greater than 14. Hence, the
sorted sub-list remains sorted after swapping.

By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10.

These values are not in a sorted order.


So we swap them.

However, swapping makes 27 and 10 unsorted.

Hence, we swap them too.

Again we find 14 and 10 in an unsorted order.

We swap them again. By the end of third iteration, we have a sorted sub-list of 4 items.

This process goes on until all the unsorted values are covered in a sorted sub-list. Now we
shall see some programming aspects of insertion sort.
Algorithm
Now we have a bigger picture of how this sorting technique works, so we can derive simple
steps by which we can achieve insertion sort.
Step 1 − If it is the first element, it is already sorted. return 1;
Step 2 − Pick next element
Step 3 − Compare with all elements in the sorted sub-list
Step 4 − Shift all the elements in the sorted sub-list that is greater than the
value to be sorted
Step 5 − Insert the value
Step 6 − Repeat until list is sorted

Pseudocode

procedure insertionSort( A : array of items )


int holePosition
int valueToInsert

for i = 1 to length(A) inclusive do:

/* select value to be inserted */


valueToInsert = A[i]
holePosition = i

/*locate hole position for the element to be inserted */

while holePosition > 0 and A[holePosition-1] > valueToInsert do:


A[holePosition] = A[holePosition-1]
holePosition = holePosition -1
end while

/* insert the number at hole position */


A[holePosition] = valueToInsert

end for

end procedure

QUICK SORT
Quick sort is a highly efficient sorting algorithm and is based on partitioning an array of data
into smaller arrays. A large array is partitioned into two arrays, one of which holds values
smaller than a specified value, say the pivot, based on which the partition is made, and the
other of which holds values greater than the pivot value.
Quicksort partitions an array and then calls itself recursively twice to sort the two resulting
subarrays. This algorithm is quite efficient for large-sized data sets as its average and worst-
case complexity are Ο(n log n) and Ο(n²), respectively.
Partition in Quick Sort
The pivot value divides the list into two parts. And recursively, we find the pivot for each
sub-list until each list contains only one element.

Quick Sort Pivot Algorithm


Based on our understanding of partitioning in quick sort, we will now try to write an
algorithm for it, which is as follows.
Step 1 − Choose the highest index value as pivot
Step 2 − Take two variables to point left and right of the list excluding pivot
Step 3 − left points to the low index
Step 4 − right points to the high index
Step 5 − while value at left is less than pivot, move left towards the right
Step 6 − while value at right is greater than pivot, move right towards the left
Step 7 − if both step 5 and step 6 do not match, swap the values at left and right
Step 8 − if left ≥ right, the point where they met is the new pivot position
Using pivot algorithm recursively, we end up with smaller possible partitions. Each partition
is then processed for quick sort. We define recursive algorithm for quicksort as follows −
Step 1 − Make the right-most index value pivot
Step 2 − partition the array using pivot value
Step 3 − quicksort left partition recursively
Step 4 − quicksort right partition recursively

2-WAY MERGE SORT OR MERGE SORT


By default, "2-way merge sort" means merge sort, in which we sort an array by repeatedly
dividing it into two parts and then merging them to get a sorted array.

In 3-way merge sort we repeatedly divide the array into three parts and then merge them to
get a sorted array.

In k-way merge sort we divide the array into k parts and then merge them to get a sorted array.

There is a slight difference between merge sort and two-way merge: some people say merge
sort is 2-way merging using recursion, while two-way merge sort is 2-way merging using
iteration.

Merge sort is a sorting technique based on divide and conquer technique. With worst-case
time complexity being Ο(n log n), it is one of the most respected algorithms.
Merge sort first divides the array into equal halves and then combines them in a sorted
manner.

How Merge Sort Works?

To understand merge sort, we take an unsorted array as the following −

We know that merge sort first divides the whole array iteratively into equal halves until
atomic values are reached. We see here that an array of 8 items is divided into two arrays of
size 4.

This does not change the sequence of appearance of items in the original. Now we divide
these two arrays into halves.

We further divide these arrays and we achieve atomic value which can no more be divided.

Now, we combine them in exactly the same manner as they were broken down. Please note
the color codes given to these lists.
We first compare the elements of each pair of lists and then combine them into another list
in a sorted manner. We see that 14 and 33 are in sorted positions. We compare 27 and 10
and in the target list of 2 values we put 10 first, followed by 27. We change the order of 19
and 35, whereas 42 and 44 are placed sequentially.

In the next iteration of the combining phase, we compare lists of two data values, and merge
them into a list of four data values, placing all in sorted order.

After the final merging, the list should look like this −

Now we should learn some programming aspects of merge sorting.


Algorithm
Merge sort keeps on dividing the list into equal halves until it can no more be divided. By
definition, if there is only one element in the list, it is sorted. Then, merge sort combines the
smaller sorted lists keeping the new list sorted too.
Step 1 − if there is only one element in the list it is already sorted, return.
Step 2 − divide the list recursively into two halves until it can no more be divided.
Step 3 − merge the smaller lists into new list in sorted order.
HEAP SORT
Heap sort is performed on the heap data structure. We know that a heap is a complete binary
tree. A heap tree can be of two types: min-heap or max-heap. For a min-heap the root element
is the minimum and for a max-heap the root is the maximum. After forming a heap, we can
delete an element from the root and send the last element to the root. After this swapping
procedure, we need to re-heapify the array. By repeatedly deleting elements from the root we
can sort the whole array.
The complexity of Heap Sort Technique

1. Time Complexity: O(n log n)


2. Space Complexity: O(1)

Input and Output


Input:
A list of unsorted data: 30 8 99 11 24 39
Output:
Array before Sorting: 30 8 99 11 24 39
Array after Sorting: 8 11 24 30 39 99
Algorithm
heapSort(array, size)
Input: An array of data, and the total number in the array
Output: sorted array
Begin
   buildMaxHeap(array, n)
   for i := n to 2 decrease by 1 do
      swap array[1] with array[i]
      heapify(array, 1, i - 1)
   done
End

Heap is a special case of balanced binary tree data structure where the root-node key is
compared with its children and arranged accordingly. If α has child node β then −
key(α) ≥ key(β)
As the value of the parent is greater than that of the child, this property generates a max heap.
Based on this criterion, a heap can be of two types −
For Input → 35 33 42 10 14 19 27 44 26 31
Min-Heap − Where the value of the root node is less than or equal to either of its children.
Max-Heap − Where the value of the root node is greater than or equal to either of its
children.

Both trees are constructed using the same input and order of arrival.

Heaps

A heap is a tree-based data structure in which all the nodes of the tree are in a specific order.

For example, if X is the parent node of Y, then the value of X follows a specific order with
respect to the value of Y and the same order will be followed across the tree.

The maximum number of children of a node in a heap depends on the type of heap. However,
in the more commonly-used heap type, there are at most 2 children of a node and it's known
as a Binary heap.

In a binary heap, if the heap is a complete binary tree with N nodes, then it has the smallest
possible height, which is log₂N.
In the diagram above, you can observe a particular sequence, i.e each node has greater value
than any of its children.

Suppose there are N jobs in a queue to be done, and each job has its own priority. The job
with maximum priority will be completed before the others. At each instant, we are
completing the job with maximum priority, and at the same time we are also interested in
inserting a new job into the queue with its own priority.

So at each instant we have to check for the job with maximum priority to complete it and also
insert if there is a new job. This task can be very easily executed using a heap by
considering N jobs as N nodes of the tree.

As you can see in the diagram below, we can use an array to store the nodes of the tree. Let’s
say we have 7 elements with values {6, 4, 5, 3, 2, 0, 1}.

Note: An array can be used to simulate a tree in the following way. If we are storing one
element at index i in array Arr, then its parent will be stored at index i/2 (unless it is the root,
as the root has no parent) and can be accessed by Arr[i/2]; its left child can be accessed
by Arr[2∗i] and its right child by Arr[2∗i+1]. The index of the root will be 1 in the array.
There can be two types of heap:

Max Heap: In this type of heap, the value of parent node will always be greater than or equal
to the value of child node across the tree and the node with highest value will be the root
node of the tree.

Implementation:

Let’s assume that we have a heap having some elements which are stored in array Arr. The
way to convert this array into a heap structure is the following. We pick a node in the array
and check whether the left sub-tree and the right sub-tree are max heaps in themselves, and
whether the node itself forms a max heap with them (its value should be greater than that of
all its child nodes).

To do this we will implement a function that can maintain the property of max heap (i.e each
element value should be greater than or equal to any of its child and smaller than or equal to
its parent)

void max_heapify (int Arr[ ], int i, int N)
{
    int left = 2*i;                                // left child
    int right = 2*i + 1;                           // right child
    int largest;
    if (left <= N and Arr[left] > Arr[i])
        largest = left;
    else
        largest = i;
    if (right <= N and Arr[right] > Arr[largest])
        largest = right;
    if (largest != i)
    {
        swap(Arr[i], Arr[largest]);
        max_heapify(Arr, largest, N);
    }
}

Complexity: O(logN)

Example:
In the diagram below, initially the 1st node (root node) is violating the property of max-heap
as it has a smaller value than its children, so we perform the max_heapify function on this
node having value 4.

As 8 is greater than 4, 8 is swapped with 4 and max_heapify is performed again on 4, but


at a different position. Now in step 2, 6 is greater than 4, so 4 is swapped with 6 and we get
a max heap; as 4 is now a leaf node, a further call to max_heapify will have no effect on the
heap.

Now we can see that we can maintain a max-heap by using the max_heapify function.

Before moving ahead, let's observe a property which states: an N-element heap stored in an
array has its leaves at indices N/2+1, N/2+2, N/2+3, … up to N.

Let’s observe this with an example:

Let's take the above example of 7 elements having values {8, 7, 6, 3, 2, 4, 5}.


So you can see that elements 3, 2, 4, 5 are indexed by N/2+1 (i.e. 4), N/2+2 (i.e. 5),
N/2+3 (i.e. 6) and N/2+4 (i.e. 7) respectively.

Building MAX HEAP:

Now let’s say we have N elements stored in the array Arr indexed from 1 to N. They are
currently not following the property of max heap. So we can use max-heapify function to
make a max heap out of the array.

How?

From the above property we observed that elements from Arr[N/2+1] to Arr[N] are leaf
nodes, and each of them is a 1-element heap. We can use the max_heapify function in a
bottom-up manner on the remaining nodes, so that we cover each node of the tree.

void build_maxheap (int Arr[ ])
{
    for (int i = N/2; i >= 1; i--)
    {
        max_heapify(Arr, i, N);   // N is the total number of elements, assumed global
    }
}

Complexity: O(N). The max_heapify function has complexity O(logN) and build_maxheap
calls it only N/2 times; the amortized complexity of the whole function is actually linear.

Example:
Suppose you have 7 elements stored in array Arr.

Here N=7, so starting from node having index N/2=3, (also having value 3 in the above
diagram), we will call max_heapify from index N/2 to 1.

In the diagram below:

In step 1, in max_heapify(Arr, 3), as 10 is greater than 3, 3 and 10 are swapped, and the
further call to max_heapify(Arr, 7) will have no effect as 3 is a leaf node now.
In step 2, calling max_heapify(Arr, 2) (the node indexed with 2 has value 4), 4 is swapped
with 8 and the further call to max_heapify(Arr, 5) will have no effect, as 4 is a leaf node now.

In step 3, calling max_heapify(Arr, 1) (the node indexed with 1 has value 1), 1 is swapped
with 10.

Step 4 is a subpart of step 3: after swapping 1 with 10, again a recursive call to
max_heapify(Arr, 3) is performed, and 1 is swapped with 9. Now a further call to
max_heapify(Arr, 7) will have no effect, as 1 is a leaf node now.

In step 5, we finally get a max-heap and the elements in the array Arr will be:

Min Heap: In this type of heap, the value of the parent node will always be less than or equal
to the value of the child node across the tree, and the node with the lowest value will be the
root node of the tree.

As you can see in the above diagram, each node has a value smaller than the value of their
children.
We can perform the same operations as performed in building a max-heap.
First we will make a function which can maintain the min-heap property if some element
violates it.

void min_heapify (int Arr[ ] , int i, int N)


{
int left = 2*i;
int right = 2*i+1;
int smallest;
if(left <= N and Arr[left] < Arr[ i ] )
smallest = left;
else
smallest = i;
if(right <= N and Arr[right] < Arr[smallest] )
smallest = right;
if(smallest != i)
{
swap (Arr[ i ], Arr[ smallest ]);
min_heapify (Arr, smallest,N);
}
}

Complexity: O(logN) .

Example:
Suppose you have elements stored in array Arr {4, 5, 1, 6, 7, 3, 2}. As you can see in the
diagram below, the element at index 1 is violating the property of min-heap, so performing
min_heapify(Arr, 1) will maintain the min-heap.
Now let’s use above function in building min-heap. We will run the above function on
remaining nodes other than leaves as leaf nodes are 1 element heap.

void build_minheap (int Arr[ ])
{
    for (int i = N/2; i >= 1; i--)
        min_heapify(Arr, i, N);   // N is the total number of elements, assumed global
}

Complexity: O(N). The complexity calculation is similar to that of building max heap.

Example:
Consider elements in array {10, 8, 9, 7, 6, 5, 4} . We will run min_heapify on nodes indexed
from N/2 to 1. Here node indexed at N/2 has value 9. And at last, we will get a min_heap.
Heaps can be considered partially ordered trees: as you can see in the above examples, the
nodes of the tree do not follow any order with respect to their siblings (nodes on the same
level). Heaps are mainly used when we give more priority to the smallest or the largest node
in the tree, as we can extract these nodes very efficiently.

APPLICATIONS:

1) Heap Sort:

We can use heaps in sorting the elements in a specific order in efficient time.
Let’s say we want to sort elements of array Arr in ascending order. We can use max heap to
perform this operation.

Idea: We build the max heap of elements stored in Arr, and the maximum element
of Arr will always be at the root of the heap.

Leveraging this idea we can sort an array in the following manner.

Processing:

 Initially we will build a max heap of elements in Arr.


 Now the root element that is Arr[1] contains maximum element of Arr. After that, we
will exchange this element with the last element of Arr and will again build a max
heap excluding the last element which is already in its correct position and will
decrease the length of heap by one.
 We will repeat the previous step until all the elements are in their correct position.
 We will get a sorted array.

Implementation:

Suppose there are N elements stored in array Arr.

void heap_sort(int Arr[ ])


{
int heap_size = N;
build_maxheap(Arr);
for(int i = N; i>=2 ; i-- )
{
swap(Arr[ 1 ], Arr[ i ]);
heap_size = heap_size-1;
max_heapify(Arr, 1, heap_size);
}
}

Complexity: As we know max_heapify has complexity O(logN), build_maxheap has


complexity O(N) and we run max_heapify N−1 times in heap_sort function, therefore
complexity of heap_sort function is O(NlogN).

Example:
In the diagram below, initially there is an unsorted array Arr having 6 elements. We begin by
building a max-heap.

After building max-heap, the elements in the array Arr will be:
Processing:

Step 1: 8 is swapped with 5.


Step 2: 8 is disconnected from heap as 8 is in correct position now.
Step 3: Max-heap is created and 7 is swapped with 3.
Step 4: 7 is disconnected from heap.
Step 5: Max heap is created and 5 is swapped with 1.
Step 6: 5 is disconnected from heap.
Step 7: Max heap is created and 4 is swapped with 3.
Step 8: 4 is disconnected from heap.
Step 9: Max heap is created and 3 is swapped with 1.
Step 10: 3 is disconnected.
After all the steps, we will get a sorted array.

2) Priority Queue:

Priority Queue is similar to queue where we insert an element from the back and remove an
element from front, but with a difference that the logical order of elements in the priority
queue depends on the priority of the elements. The element with highest priority will be
moved to the front of the queue and one with lowest priority will move to the back of the
queue. Thus it is possible that when you enqueue an element at the back in the queue, it can
move to front because of its highest priority.

Example:
Let’s say we have an array of 5 elements : {4, 8, 1, 7, 3} and we have to insert all the
elements in the max-priority queue.
First, as the priority queue is empty, 4 will be inserted initially.
Now when 8 is inserted, it will be moved to the front, as 8 is greater than 4.
While inserting 1, as it is the current minimum element in the priority queue, it will remain at
the back of the priority queue.
Now 7 will be inserted between 8 and 4, as 7 is smaller than 8.
Now 3 will be inserted before 1, as it is the 2nd minimum element in the priority queue. All the
steps are represented in the diagram below:
We can think of many ways to implement the priority queue.

Naive Approach:
Suppose we have N elements and we have to insert these elements into the priority queue. We
can use a list, insert the elements in O(N) time, and sort them to maintain a priority queue in
O(NlogN) time.

Efficient Approach:
We can use heaps to implement the priority queue. It will take O(logN) time to insert and
delete each element in the priority queue.

Based on heap structure, priority queue also has two types: max-priority queue and min-
priority queue.

Let’s focus on Max Priority Queue.

Max Priority Queue is based on the structure of max heap and can perform following
operations:

maximum(Arr) − It returns the maximum element from Arr.


extract_maximum(Arr) − It removes and returns the maximum element from Arr.
increase_value(Arr, i, val) − It increases the key of the element stored at index i in Arr to the
new value val.
insert_value(Arr, val) − It inserts the element with value val into Arr.

Implementation:

length = number of elements in Arr.


Maximum :

int maximum(int Arr[ ])


{
return Arr[ 1 ]; //as the maximum element is the root element in the max heap.
}

Complexity: O(1)

Extract Maximum: In this operation, the maximum element will be returned and the last
element of heap will be placed at index 1 and max_heapify will be performed on node 1 as
placing last element on index 1 will violate the property of max-heap.

int extract_maximum (int Arr[ ])


{
    if(length == 0)
    {
        cout << "Can't remove element as queue is empty";
        return -1;
    }
    int max = Arr[1];
    Arr[1] = Arr[length];
    length = length - 1;
    max_heapify(Arr, 1, length);
    return max;
}

Complexity: O(logN).

Increase Value: When increasing the value of a node, the max-heap property may be violated,
so we may have to keep swapping the parent's value with the node's value until the parent
holds the larger value.

void increase_value (int Arr[ ], int i, int val)


{
if(val < Arr[ i ])
{
cout << "New value is less than current value, can't be inserted" << endl;
return;
}
Arr[ i ] = val;
while( i > 1 and Arr[ i/2 ] < Arr[ i ])
{
swap(Arr[ i/2 ], Arr[ i ]);
i = i/2;
}
}

Complexity : O(logN).

Insert Value :
void insert_value (int Arr[ ], int val)
{
    length = length + 1;
    Arr[ length ] = -1;  // assuming all the numbers to be inserted in the queue are greater than 0
    increase_value (Arr, length, val);
}

Complexity: O(logN).

Example:

Initially there are 5 elements in priority queue.


Operation: Insert Value(Arr, 6)
In the diagram below, inserting another element having value 6 violates the property of
max-priority queue, so it is swapped with its parent having value 4, thus maintaining the
max-priority queue.

Operation: Extract Maximum:


In the diagram below, after removing 8 and placing 4 at node 1, the property of max-priority
queue is violated. So max_heapify(Arr, 1) will be performed, which will maintain the property
of max-priority queue.
As discussed above, like heaps, we can use priority queues in the scheduling of jobs. When
there are N jobs in the queue, each having its own priority, the job with maximum priority is
completed first and removed from the queue, so we can use the priority queue's
extract_maximum operation here. If at any instant we have to add a new job to the queue, we
can use the insert_value operation, as it will insert the element in O(logN) and will also
maintain the property of the max heap.

Storage device

Alternatively referred to as digital storage, storage, storage media, or storage medium,


a storage device is any hardware capable of holding information either temporarily or
permanently. The picture shows an example of a Drobo, an external secondary storage device.

There are two types of storage devices used with computers: a primary storage device, such
as RAM, and a secondary storage device, such as a hard drive. Secondary storage can
be removable, internal, or external.

Examples of computer storage


Magnetic storage devices

Today, magnetic storage is one of the most common types of storage used with computers.
This technology is found mostly on extremely large HDDs or hybrid hard drives.

 Floppy diskette

 Hard drive

 Magnetic strip

 SuperDisk

 Tape cassette

 Zip diskette
Optical storage devices

Another common type of storage is optical storage, which uses lasers and lights as its method
of reading and writing data.

 Blu-ray disc

 CD-ROM disc

 CD-R and CD-RW disc.

 DVD-R, DVD+R, DVD-RW, and DVD+RW disc.

Flash memory devices

Flash memory has replaced most magnetic and optical media as it becomes cheaper, since it
is the more efficient and reliable solution.

 USB flash drive, jump drive, or thumb drive.

 CF (CompactFlash)

 M.2

 Memory card

 MMC

 NVMe

 SDHC Card

 SmartMedia Card

 Sony Memory Stick

 SD card

 SSD

 xD-Picture Card

Online and cloud

Storing data online and in cloud storage is becoming popular as people need to access their
data from more than one device.

 Cloud storage

 Network media

Paper storage

Early computers had no method of using any of the technologies above for storing
information and had to rely on paper. Today, these forms of storage are rarely used or found.
In the picture is an example of a woman entering data onto a punch card using a punch card
machine.

 OMR

 Punch card
Why is storage needed in a computer?

Without a storage device, a computer cannot save or remember any settings or information
and would be considered a dumb terminal.

Although a computer can run with no storage device, it would only be able to view
information, unless it was connected to another computer that had storage capabilities.
Even a task, such as browsing the Internet, requires information to be stored on your
computer.

Why so many different storage devices?

As computers advance, the technologies used to store data advance too, along with the
requirements for storage space. Because people need more and more space, and want it faster,
cheaper, and portable, new technologies have to be invented. As people upgrade to newly
designed storage devices, the older devices are no longer needed and fall out of use.

For example, when punch cards were first used in early computers, the magnetic media used
for floppy disks was not available. After floppy diskettes were released, they were replaced by
CD-ROM drives, which were replaced by DVD drives, which have been replaced by flash
drives. The first hard disk drive from IBM cost $50,000, stored only 5 MB, and was big and
cumbersome. Today, we have smartphones with hundreds of times the capacity at a much
smaller price, which we can carry in our pocket.

Each advancement of storage devices gives a computer the ability to store more data, as well
as save and access data faster.

What is a storage location?

When saving anything on a computer, it may ask you for a storage location, which is the area
where you would like to save the information. By default, most information is saved to your
computer hard drive. If you want to move the information to another computer, save it to a
removable storage device, such as a USB flash drive.

INDEXING TECHNIQUE
We know that data is stored in the form of records. Every record has a key field, which
allows it to be uniquely identified.
Indexing is a data structure technique to efficiently retrieve records from the database files
based on some attributes on which the indexing has been done. Indexing in database systems
is similar to what we see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
 Primary Index − Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the relation.
 Secondary Index − Secondary index may be generated from a field which is a
candidate key and has a unique value in every record, or a non-key with duplicate
values.
 Clustering Index − Clustering index is defined on an ordered data file. The data file
is ordered on a non-key field.
Ordered Indexing is of two types −

 Dense Index
 Sparse Index

Dense Index

In a dense index, there is an index record for every search key value in the database. This
makes searching faster but requires more space to store the index records themselves. Index
records contain the search key value and a pointer to the actual record on the disk.

Sparse Index

In a sparse index, index records are not created for every search key. An index record here
contains a search key and a pointer to the actual data on the disk. To search for a record, we
first follow the index record to reach the closest preceding location of the data. If the data we
are looking for is not where we land by following the index, the system performs a sequential
search from that point until the desired data is found.
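The sparse-index lookup described above can be sketched as follows. This is a minimal illustration with assumed data and an illustrative block size; a real system would index disk blocks, not list positions.

```python
# Sorted data file and a sparse index with one entry per block of 3 records.
records = [10, 20, 30, 40, 50, 60, 70, 80]
BLOCK = 3
sparse_index = [(records[i], i) for i in range(0, len(records), BLOCK)]

def sparse_search(key):
    # Follow the index: find the last entry whose key is <= the search key.
    start = 0
    for k, pos in sparse_index:
        if k <= key:
            start = pos
        else:
            break
    # Sequential search from that position until found or passed.
    for i in range(start, len(records)):
        if records[i] == key:
            return i
        if records[i] > key:
            break
    return -1

print(sparse_search(50))  # 4: the index jumps to position 3, then scans
```

Note that the index holds only 3 entries for 8 records; the price is the short sequential scan at the end.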

Multilevel Index

Index records comprise search-key values and data pointers. The index is stored on
the disk along with the actual database files. As the size of the database grows, so does the
size of the indices. There is an immense need to keep the index records in the main memory
so as to speed up the search operations. If a single-level index is used, then a large index
cannot be kept in memory, which leads to multiple disk accesses.
A multi-level index breaks the index down into several smaller indices, making the
outermost level so small that it can be saved in a single disk block, which can
easily be accommodated anywhere in the main memory.
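A two-level version of this idea can be sketched as follows. This is a minimal illustration with assumed data and block sizes: the small outer index points into the inner index, which in turn points into the data file.

```python
# Sorted data file: 0, 5, 10, ..., 95 (20 entries).
data = list(range(0, 100, 5))

# Inner index: one entry per block of 4 data records.
INNER_BLOCK = 4
inner = [(data[i], i) for i in range(0, len(data), INNER_BLOCK)]

# Outer index: one entry per block of 2 inner-index entries.
OUTER_BLOCK = 2
outer = [(inner[i][0], i) for i in range(0, len(inner), OUTER_BLOCK)]

def lookup(key):
    # Outer level: choose the inner-index block to search.
    j = 0
    for k, pos in outer:
        if k <= key:
            j = pos
    # Inner level: choose the data block to scan.
    start = 0
    for k, pos in inner[j:j + OUTER_BLOCK]:
        if k <= key:
            start = pos
    # Data level: sequential scan within the block.
    for i in range(start, min(start + INNER_BLOCK, len(data))):
        if data[i] == key:
            return i
    return -1

print(lookup(45))  # 9: two index probes, then a scan of at most 4 records
```

The outer index here has only 3 entries, small enough to stay in memory, while each lookup touches one inner-index block and one data block.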
B Tree

A B Tree is a specialized m-way tree that is widely used for disk access. A B-Tree of order
m can have at most m-1 keys and m children. One of the main reasons for using a B tree is its
capability to store a large number of keys in a single node, as well as large key values, while
keeping the height of the tree relatively small.

A B tree of order m has all the properties of an m-way tree. In addition, it has the
following properties.

1. Every node in a B-Tree contains at most m children.
2. Every node in a B-Tree, except the root node and the leaf nodes, contains at least m/2
(rounded up) children.
3. The root node must have at least 2 children, unless it is a leaf.
4. All leaf nodes must be at the same level.

It is not necessary that all nodes contain the same number of children, but each internal
node other than the root must have at least m/2 (rounded up) children.

A B tree of order 4 is shown in the following image.


While performing operations on a B Tree, a property of the B Tree may be violated, such as
the minimum number of children a node can have. To maintain the properties of the B Tree,
nodes may be split or joined.

Operations

Searching :

Searching in a B Tree is similar to searching in a binary search tree. For example, suppose we
search for the item 49 in the following B Tree. The process goes as follows:

1. Compare item 49 with the root node 78. Since 49 < 78, move to its left sub-tree.
2. Since 40 < 49 < 56, traverse the right sub-tree of 40.
3. Since 49 > 45, move to the right and compare again.
4. A match is found; return.

Searching in a B tree depends upon the height of the tree. The search algorithm takes O(log
n) time to search any element in a B tree.
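The search walk above can be sketched as follows. This is a minimal illustration with an assumed node layout (each node holds a sorted list of keys and a list of children); the small tree is built to match the 49-search example, since the original figure is not reproduced here.

```python
class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                     # sorted keys in this node
        self.children = children or []       # empty for leaf nodes

def btree_search(node, key):
    # Find the first key that is >= the search key.
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True                          # match found in this node
    if not node.children:
        return False                         # reached a leaf with no match
    return btree_search(node.children[i], key)  # descend into sub-tree i

# Assumed tree echoing the example: root 78; its left child holds 40 and 56;
# the sub-tree between 40 and 56 holds 45 and 49.
leaf = BNode([45, 49])
left = BNode([40, 56], [BNode([30]), leaf, BNode([60])])
root = BNode([78], [left, BNode([90])])
print(btree_search(root, 49))  # True
```

Each recursive call descends one level, so the running time is proportional to the height of the tree, i.e. O(log n).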
Inserting

Insertions are done at the leaf node level. The following steps are followed in
order to insert an item into a B Tree.

1. Traverse the B Tree in order to find the appropriate leaf node into which the element
can be inserted.
2. If the leaf node contains fewer than m-1 keys, insert the element in increasing
order.
3. Else, if the leaf node already contains m-1 keys, follow these steps:
o Insert the new element in increasing order of elements.
o Split the node into two nodes at the median.
o Push the median element up to its parent node.
o If the parent node also contains m-1 keys, split it too by
following the same steps.
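The split step above can be sketched as follows. This is a minimal illustration: the leaf keys are assumed (the original figure is not reproduced), chosen so that 8 becomes the median as in the example below.

```python
def split_leaf(keys, new_key):
    # Insert the new key in increasing order, then split at the median.
    keys = sorted(keys + [new_key])
    mid = len(keys) // 2
    median = keys[mid]
    # The median is pushed up to the parent; the rest form two new nodes.
    return keys[:mid], median, keys[mid + 1:]

# Order-5 leaf (4 keys, assumed values) receiving a 5th key, 8.
left, median, right = split_leaf([3, 5, 13, 17], 8)
print(left, median, right)  # [3, 5] 8 [13, 17]
```

The median key 8 goes to the parent node; if the parent also overflows, the same split is applied one level up.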

Example:

Insert the node 8 into the B Tree of order 5 shown in the following image.

8 will be inserted to the right of 5; therefore, insert 8.

The node now contains 5 keys, which is greater than the maximum of (5 - 1 = 4) keys.
Therefore, split the node at the median, i.e. 8, and push it up to its parent node, as shown below.
Deletion

Deletion is also performed at the leaf nodes. The node from which a key is to be deleted can
either be a leaf node or an internal node. The following steps are followed in order to delete a
key from a B tree.

1. Locate the leaf node.
2. If there are more than m/2 keys in the leaf node, delete the desired key from the
node.
3. If the leaf node doesn't contain m/2 keys, complete the keys by taking an element
from the right or left sibling.
o If the left sibling contains more than m/2 elements, push its largest
element up to its parent and move the intervening element down to the node
from which the key was deleted.
o If the right sibling contains more than m/2 elements, push its smallest
element up to the parent and move the intervening element down to the node
from which the key was deleted.
4. If neither sibling contains more than m/2 elements, create a new leaf node
by joining the two leaf nodes and the intervening element of the parent node.
5. If the parent is left with fewer than m/2 keys, apply the above process to the parent
too.

If the node which is to be deleted is an internal node, then replace the node with its in-
order successor or predecessor. Since the successor or predecessor will always be in a leaf
node, the process is then the same as deleting from a leaf node.
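The borrow-from-sibling step in the algorithm above can be sketched as follows. This is a minimal illustration on plain key lists with assumed values: the left sibling's largest key moves up to the parent, and the old separator moves down into the deficient node.

```python
def borrow_from_left(left_sibling, separator, node):
    # Push the left sibling's largest key up to the parent...
    borrowed = left_sibling.pop()
    # ...and move the intervening (separator) key down into the node.
    node.insert(0, separator)
    return borrowed  # this becomes the parent's new separator key

left = [10, 20, 30]   # left sibling with keys to spare
node = [50]           # underfull after a deletion; parent separator is 40
new_sep = borrow_from_left(left, 40, node)
print(left, new_sep, node)  # [10, 20] 30 [40, 50]
```

After the rotation, every key in the left sibling is still smaller than the new separator 30, and every key in the repaired node lies between the separator and the parent's next key, so the search-order property is preserved.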

Example 1

Delete the key 53 from the B Tree of order 5 shown in the following figure.
53 is present in the right child of element 49. Delete it.

Now 57 is the only element left in its node, while the minimum number of elements that
must be present in a B tree of order 5 is 2. Since the node has fewer than that, and the
elements in its left and right siblings are also not sufficient, merge it with the left sibling and
the intervening element of the parent, i.e. 49.

The final B tree is shown as follows.

Deletion in B-Tree
For deletion in a B tree, we wish to remove keys from a leaf. There are three possible cases for
deletion in a B tree.
Let k be the key to be deleted and x the node containing the key. Then the cases are:

Case-I
If the key is already in a leaf node, and removing it doesn't cause that leaf node to have too
few keys, then simply remove the key to be deleted. That is, if key k is in node x and x is a
leaf, simply delete k from x.
6 deleted

Case-II
If key k is in node x and x is an internal node, there are three cases to consider:

Case-II-a
If the child y that precedes k in node x has at least t keys (more than the minimum), then find
the predecessor key k' in the subtree rooted at y. Recursively delete k' and replace k with k' in
x.

Case-II-b
Symmetrically, if the child z that follows k in node x has at least t keys, find the successor k'
and delete and replace as before. Note that finding k' and deleting it can be performed in a
single downward pass.
13 deleted

Case-II-c
Otherwise, if both y and z have only t−1 keys (the minimum number), merge k and all of z into
y, so that x loses both k and the pointer to z. y then contains 2t − 1 keys, and k is
subsequently deleted from y.
7 deleted

Case-III
If key k is not present in internal node x, determine the root of the appropriate subtree that
must contain k. If that root has only t − 1 keys, execute one of the following two cases to
ensure that we descend to a node containing at least t keys. Finally, recurse into the
appropriate child of x.

Case-III-a
If the root has only t−1 keys but has a sibling with at least t keys, give the root an extra key
by moving a key from x down into the root, moving a key from the root's immediate left or
right sibling up into x, and moving the appropriate child pointer from that sibling into the root.
2 deleted

Case-III-b
If the root and all of its siblings have t−1 keys, merge the root with one sibling. This involves
moving a key down from x into the new merged node to become the median key for that
node.
4 deleted

Application of B tree

A B tree is used to index data and provides fast access to the actual data stored on the disk,
since accessing a value stored in a large database on disk is a very time-consuming
process.

Searching an un-indexed and unsorted database containing n key values needs O(n) running
time in the worst case. However, if we use a B Tree to index this database, it can be searched
in O(log n) time in the worst case.
B+ Tree

B+ Tree is an extension of B Tree which allows efficient insertion, deletion and search
operations.

In a B Tree, both keys and records can be stored in internal as well as leaf nodes. In a B+
tree, by contrast, records (data) can only be stored in the leaf nodes, while internal nodes
store only key values.

The leaf nodes of a B+ tree are linked together in the form of a singly linked list to make
search queries more efficient.
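The benefit of the linked leaves can be sketched with a range query. This is a minimal illustration with an assumed leaf layout and keys: once the first relevant leaf is located, following the leaf-to-leaf links yields keys in sorted order without revisiting any internal nodes.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys in this leaf
        self.next = None   # link to the next leaf in the list

def range_query(first_leaf, lo, hi):
    result, leaf = [], first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if lo <= k <= hi:
                result.append(k)
            elif k > hi:
                return result   # keys are sorted, so we can stop early
        leaf = leaf.next        # follow the leaf-to-leaf link
    return result

# Three linked leaves holding the sorted keys of a small tree.
a, b, c = Leaf([5, 10]), Leaf([15, 20]), Leaf([25, 30])
a.next, b.next = b, c
print(range_query(a, 10, 25))  # [10, 15, 20, 25]
```

In a B Tree without linked leaves, the same query would have to repeatedly go back up through internal nodes to find each successive key.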

B+ trees are used to store large amounts of data that cannot fit in main memory. Because the
size of main memory is always limited, the internal nodes (keys to access records) of the B+
tree are kept in main memory, whereas the leaf nodes are stored in secondary memory.

The internal nodes of B+ tree are often called index nodes. A B+ tree of order 3 is shown in
the following figure.

Advantages of B+ Tree

1. Records can be fetched in an equal number of disk accesses.
2. The height of the tree remains balanced and is smaller compared to a B tree.
3. The data stored in a B+ tree can be accessed sequentially as well as directly.
4. Keys are used for indexing.
5. Search queries are faster, as the data is stored only in the leaf nodes.

B Tree VS B+ Tree
1. B Tree: Search keys cannot be stored repeatedly.
   B+ Tree: Redundant search keys can be present.

2. B Tree: Data can be stored in leaf nodes as well as internal nodes.
   B+ Tree: Data can only be stored in the leaf nodes.

3. B Tree: Searching for some data is a slower process, since data can be found in internal
   nodes as well as in leaf nodes.
   B+ Tree: Searching is comparatively faster, as data can only be found in the leaf nodes.

4. B Tree: Deletion of internal nodes is complicated and time-consuming.
   B+ Tree: Deletion is never a complex process, since elements are always deleted from the
   leaf nodes.

5. B Tree: Leaf nodes cannot be linked together.
   B+ Tree: Leaf nodes are linked together to make search operations more efficient.

Insertion in B+ Tree

Step 1: Insert the new key into the appropriate leaf node.

Step 2: If the leaf doesn't have the required space, split the node and copy the middle key up
to the parent index node.

Step 3: If the index node doesn't have the required space, split the node and move the middle
key up to the parent index node.
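The leaf split in Step 2 can be sketched as follows. This is a minimal illustration with assumed keys; the point of contrast with the B tree split is that the middle key is copied up, not pushed up, because in a B+ tree all data must remain in the leaves.

```python
def split_bplus_leaf(keys, new_key):
    # Insert the new key in increasing order, then split at the middle.
    keys = sorted(keys + [new_key])
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]  # middle key STAYS in the right leaf
    return left, right[0], right          # right[0] is COPIED up to the index

# Order-5 leaf (4 keys, assumed values) receiving a 5th key, 195.
left, up, right = split_bplus_leaf([120, 130, 160, 190], 195)
print(left, up, right)  # [120, 130] 160 [160, 190, 195]
```

Note that 160 now appears both in the index node and in the right leaf; a B tree split would instead remove the median from both halves and push it up.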

Example :

Insert the value 195 into the B+ tree of order 5 shown in the following figure.

195 will be inserted in the right sub-tree of 120, after 190. Insert it at the desired position.
The node now contains more than the maximum number of elements, i.e. 4; therefore, split it
and copy the median key up to the parent.

Now the index node contains 6 children and 5 keys, which violates the B+ tree properties;
therefore, we need to split it, as shown below.

Deletion in B+ Tree

Step 1: Delete the key and its data from the leaf.

Step 2: If the leaf node now contains fewer than the minimum number of elements, merge the
node with its sibling and delete the key between them.

Step 3: If the index node contains fewer than the minimum number of elements, merge the
node with its sibling and move down the key between them.

Example

Delete the key 200 from the B+ Tree shown in the following figure.

200 is present in the right sub-tree of 190, after 195. Delete it.

Merge the two nodes using 195, 190, 154 and 129.
Now, element 120 is the only element present in its node, which violates the B+ Tree
properties. Therefore, we need to merge it using 60, 78, 108 and 120.

The height of the B+ tree will now decrease by 1.
