
SORTING TECHNIQUES

Introduction:
Sorting is one of the most important operations performed by
computers. Sorting a set of values, that is, arranging them in a fixed order, usually
alphabetical or numerical, is one of the most common computing applications.
The process of getting elements into order is called sorting. Sorting is a
good example of a task that can be performed by many different algorithms, each one
having certain advantages and disadvantages that have to be weighed against each
other in the light of the particular application. Ideally, one uses different
algorithms depending on whether one is sorting a small set or a large one, on
whether the individual elements of the set occupy a lot of storage, on how easy it is
to compare two elements to figure out which one should precede the other, and so
on. That means different applications require different sorting methods. The
degree of order in the input data also influences the performance of sorting
methods.
Sorting is generally understood to be the process of rearranging a
given set of objects in a specific order. The purpose of sorting is to facilitate the
later search for members of the sorted set. As such it is an almost universally
performed, fundamental activity. Objects are sorted in telephone books, in income
tax files, in tables of contents, in libraries, in dictionaries, in warehouses, and almost
everywhere that sorted objects have to be searched and retrieved.
Hence, sorting is a relevant and essential activity, particularly in data
processing. What else would be easier to sort than data! Nevertheless, our
primary interest in sorting is devoted to the even more fundamental techniques
used in the construction of algorithms. In particular, sorting is an ideal subject to
demonstrate a great diversity of algorithms, all having the same purpose, many of
them being optimal in some sense, and most of them having advantages over
others. It therefore also demonstrates the necessity of performance analysis of
algorithms. Moreover, the example of sorting is well suited for showing how a
very significant gain in performance may be obtained by the development of
sophisticated algorithms when obvious methods are readily available.
The dependence of the choice of an algorithm on the structure of the data
to be processed is so profound in the case of sorting that sorting methods are
generally classified into two categories, namely, sorting of arrays and sorting of
(sequential) files. The two classes are often called internal and external sorting,
because arrays are sorted in the fast, high-speed, random-access “internal” store
of computers, while files are appropriate on the slower, but more spacious,
“external” stores based on mechanically moving devices (disks and tapes). The
importance of this distinction is obvious from the example of sorting numbered
cards. Structuring the cards as an array corresponds to laying them out in front of
the sorter so that each card is visible and individually accessible. Structuring the
cards as a file, however, implies that only the card on the top of each pile is
visible. Such a restriction will evidently have serious consequences on the sorting
method to be used, but it is unavoidable if the number of cards to be laid out is
larger than the available table space.
Sorting algorithms can be computationally expensive,
particularly as data sets become large. They can be classified into simple
algorithms and sophisticated algorithms. To sort n data elements, simple
techniques require on the order of n² (O(n²)) comparisons, whereas sophisticated
techniques require O(n log n) comparisons. In general, sorting methods involve
the exchange of elements and the movement of portions of the data. For a large set
of data elements, these operations require a considerable amount of processing
time. Thus, one of the main objectives in designing sorting algorithms is to
minimize exchanges or movement of data.
Selection Sort :
The selection sort searches all of the elements in a list until it finds the
smallest element. It “swaps” this with the first element in the list. Next it finds the
smallest of the remaining elements and “swaps” it with the second element.
This process is repeated until only the last two elements in the list remain to be compared.

Explanation with an example:
Consider the data elements 5 2 1 3 6 4.
This method starts by searching for the smallest element in the list.
5 2 1 3 6 4 (smallest: 1)
Swap the smallest element with the first element.
1 2 5 3 6 4
After the smallest element is in the first position, we continue the process
by taking the second element and looking for the next smallest element in the
remaining unsorted array.

1 2 5 3 6 4 (smallest remaining: 2)
In this special case, the next smallest element is in the second position
already. Swapping the element with itself keeps it in the same position.
1 2 5 3 6 4
After the next smallest element is in the second position, we continue as
above for the next smallest element.
1 2 5 3 6 4 (smallest remaining: 3)
Swap this smallest element with the third element.
1 2 3 5 6 4 (smallest remaining: 4)
Swap this element with the fourth element.
1 2 3 4 6 5
Continuing in the same way for the next smallest element, we have
1 2 3 4 5 6
It can be noted that the largest of all the elements is in the last
position after completion of all passes. That means the maximum number of
passes required to sort N elements is N-1.
Pseudo code for sorting N elements using selection sort :
For index =1 to N-1 do
begin
Find the smallest value in position index through N
Interchange the smallest value with that in position index.
End.

Algorithm :
Procedure SelectionSort (A,N)
1. For index=1 to N-1 do
begin
2. MinPosition <-- index
3. For j=index+1 to N do
begin
4. If A[j]<A[MinPosition] then
MinPosition <-- j
end
5. swap(A[index],A[MinPosition])
end
6. Return.
In step 5 we swap the smallest element with the current index
element. However, when the array is almost sorted this swapping is often
unnecessary, and the step can be modified with a simple condition that checks
whether the position of the smallest element is the same as the current index.
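As a sketch, the modified routine in C might look like this (a minimal illustration; the guard test is the only change from the program given later in this section):

void selectionsort_guarded(int a[], int n)
{
    int index, minposition, t, j;
    for (index = 0; index < n - 1; index++)
    {
        /* find the position of the smallest element in a[index..n-1] */
        minposition = index;
        for (j = index + 1; j < n; j++)
            if (a[j] < a[minposition])
                minposition = j;
        if (minposition != index)   /* skip the redundant self-swap */
        {
            t = a[index];
            a[index] = a[minposition];
            a[minposition] = t;
        }
    }
}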
The number of comparisons is N-1 in the first pass, N-2 in the second
pass, N-3 in the third pass, and so on, and N-i in the ith pass. Hence there
are at most
(N-1)+(N-2)+(N-3)+...+1 = N(N-1)/2 comparisons.
The time complexity is therefore O(N²).

Selection sort is more efficient than Bubble sort and Insertion sort in the
sense that there are no more than N-1 actual exchanges. Thus selection sort is very
well suited to applications in which excessive swapping is a problem.
Since the number of comparisons in a given pass cannot be reduced,
it makes no difference whether the input data is completely sorted or unsorted.
That is, the performance of selection sort is the same in both the best case and the worst case.
C Program using Selection Sort :
#include <stdio.h>

void SelectionSort(int a[], int n);

int main(void)
{
    int a[20], n, i;
    printf("\n How many numbers? ");
    scanf("%d", &n);
    printf("\n Enter %d numbers: ", n);
    for (i = 0; i < n; ++i)
        scanf("%d", &a[i]);
    printf("\n Numbers before sort:\n");
    for (i = 0; i < n; ++i)
        printf("%d ", a[i]);
    SelectionSort(a, n);
    printf("\n Numbers after sort:\n");
    for (i = 0; i < n; ++i)
        printf("%d ", a[i]);
    return 0;
}

void SelectionSort(int a[], int n)
{
    int index, minposition, t, j;
    for (index = 0; index < n - 1; index++)
    {
        /* find the position of the smallest element in a[index..n-1] */
        minposition = index;
        for (j = index + 1; j < n; j++)
        {
            if (a[j] < a[minposition])
                minposition = j;
        }
        /* exchange it with the element at the current index */
        t = a[index];
        a[index] = a[minposition];
        a[minposition] = t;
    }
}
Insertion Sort :
Insertion sort resembles the way a card player arranges the cards in hand:
when a new card is picked up, it is inserted into its proper place.
One of the simplest sorting algorithms is the insertion sort. It consists of n-1
passes. For pass p=2 through n, insertion sort ensures that the elements in positions
1 through p are in sorted order. It makes use of the fact that the elements in positions
1 through p-1 are already known to be in sorted order.
Example :
original      34  8 64 51 32 21    positions moved
After p=2      8 34 64 51 32 21    1
After p=3      8 34 64 51 32 21    0
After p=4      8 34 51 64 32 21    1
After p=5      8 32 34 51 64 21    3
After p=6      8 21 32 34 51 64    4
In pass p, we move the pth element left until its correct place is found
among the first p elements.
Pseudo code for Insertion sort :
For index=2 to N do
begin
put the element value into its correct position relative to values
between positions 1 and index.
End
Algorithm :
Procedure InsertionSort(A,N)
/* Array A contains N items to be sorted. Initially the first item is
considered 'sorted'; index divides A into a sorted region (positions < index)
and an unsorted one (positions >= index). */
1. For index=2 to N do
begin
2.   temp <- A[index]
3.   position <- index
4.   while (position>1) and (A[position-1]>temp) do
begin
5.     A[position] <- A[position-1]
6.     position <- position-1
end
7.   A[position] <- temp
end
8. Return
Analysis of Insertion sort :
Because of the nested loops, each of which can take n iterations, insertion
sort is O(n²). Furthermore, this bound is tight, because input in reverse order
actually achieves it. The inner loop can be executed at most p times for
each value of p. Summing over all p gives a total of
2 + 3 + 4 + ... + n = O(n²)
On the other hand, if the input is presorted, the running time is O(n),
because the test at the top of the inner loop always fails immediately. Indeed, if
the input is almost sorted, insertion sort will run quickly. Because of this wide
variation, it is worth analyzing the average case behavior of this algorithm. It
turns out that the average case is O(n²) for insertion sort, as well as for a variety of
other sorting algorithms.
Program :
#include <stdio.h>

void insertionsort(int a[], int n);

int main(void)
{
    int a[20], n, i;
    printf("\n How many numbers? ");
    scanf("%d", &n);
    printf("\n Enter %d numbers: ", n);
    for (i = 0; i < n; ++i)
        scanf("%d", &a[i]);
    printf("\n Numbers before sort:\n");
    for (i = 0; i < n; ++i)
        printf("%d ", a[i]);
    insertionsort(a, n);
    printf("\n Numbers after sort:\n");
    for (i = 0; i < n; ++i)
        printf("%d ", a[i]);
    return 0;
}

void insertionsort(int a[], int n)
{
    int index, temp, position;
    for (index = 1; index < n; ++index)
    {
        /* insert a[index] into the sorted region a[0..index-1] */
        temp = a[index];
        position = index;
        while (position > 0 && a[position-1] > temp)
        {
            a[position] = a[position-1];   /* shift larger elements right */
            --position;
        }
        a[position] = temp;
    }
}
Merge Sort :
Merge sort runs in O(n logn) worst case running time and the number of
comparisons used is nearly optimal.
It is a fine example of recursive algorithm.
The fundamental operation in this algorithm is merging two sorted lists.
Because the lists are sorted, this can be done in one pass through the input if the output
is put in a third list.
The basic merging algorithm takes two input arrays a and b, an output array
c, and three counters, aptr, bptr and cptr, which are initially set to the beginning of their
respective arrays. The smaller of a[aptr] and b[bptr] is copied to the next entry in c, and
the appropriate counters are advanced. When either input list is exhausted, the remainder
of the other list is copied to c. For example, merging a = 1 13 24 26 with b = 2 15 27 38
produces c = 1 2 13 15 24 26 27 38.
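No C routine for merge sort is given in this section; the following is a minimal sketch of the scheme described above (the function names and the driver input are illustrative, not from the original text):

#include <stdio.h>

/* Merge the sorted runs a[lo..mid] and a[mid+1..hi] into tmp, then copy back. */
void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int aptr = lo, bptr = mid + 1, cptr = lo;
    while (aptr <= mid && bptr <= hi)        /* copy the smaller front element */
        tmp[cptr++] = (a[aptr] <= a[bptr]) ? a[aptr++] : a[bptr++];
    while (aptr <= mid)                      /* remainder of the left run */
        tmp[cptr++] = a[aptr++];
    while (bptr <= hi)                       /* remainder of the right run */
        tmp[cptr++] = a[bptr++];
    for (cptr = lo; cptr <= hi; ++cptr)
        a[cptr] = tmp[cptr];
}

void mergesort(int a[], int tmp[], int lo, int hi)
{
    if (lo < hi) {
        int mid = (lo + hi) / 2;
        mergesort(a, tmp, lo, mid);          /* sort the left half */
        mergesort(a, tmp, mid + 1, hi);      /* sort the right half */
        merge(a, tmp, lo, mid, hi);          /* merge the two sorted halves */
    }
}

int main(void)
{
    int a[] = {24, 13, 26, 1, 2, 27, 38, 15}, tmp[8], i;
    mergesort(a, tmp, 0, 7);
    for (i = 0; i < 8; ++i)
        printf("%d ", a[i]);                 /* prints 1 2 13 15 24 26 27 38 */
    return 0;
}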

Merge sort Analysis :
Merge sort is a classic example of the techniques used to analyze recursive
routines. It is not obvious that merge sort can easily be rewritten without recursion, so
we have to write a recurrence relation for the running time. We will always assume that
n is a power of 2, so that we always split into two even halves. For n=1, the time to merge
sort is constant, which we will denote by 1. Otherwise, the time to merge sort n numbers
is equal to the time to do two recursive merge sorts of size n/2, plus the time to
merge, which is linear.
T(1) = 1
T(n) = 2T(n/2) + n
This is a standard recurrence relation that can be solved several ways. The main
idea is to divide the recurrence relation through by n. This yields
T(n)/n = T(n/2)/(n/2) + 1
This equation is valid for any n that is a power of 2, so we may also write
T(n/2)/(n/2) = T(n/4)/(n/4) + 1
and
T(n/4)/(n/4) = T(n/8)/(n/8) + 1
.
.
.
T(2)/2 = T(1)/1 + 1
Now add up all the equations. After everything is added, all of the other terms
cancel; there are log n equations, and so all the 1s at the end of these equations add up
to log n. The final result is
T(n)/n = T(1)/1 + log n
Multiplying through by n gives the final answer:
T(n) = n log n + n
     = O(n log n)
Bubble Sort :
The Bubble sort is the simplest, but not the most efficient, of the sorts. It
compares adjacent elements in a list of data elements and swaps them if they are
not in order. Each pair of adjacent elements is compared and swapped until the
smallest element “bubbles up” to the top, hence the name bubble sort. This
process is repeated until all the elements are in sorted order. The technique is often called
Exchange sort, Adjacent sort or Sinking sort.
The step-by-step procedure to sort in ascending order is given below:
➔ Compare two numbers at a time, starting with the first two numbers.
➔ If the top number is larger than the second number, then exchange the two
numbers.
➔ Go down one number and compare that number to the number that follows it.
These two form a new pair.
➔ Continue this process until no exchanges have been made in an entire pass through
the list.
For example, sort the following numbers from smallest to largest
pass1
2 2 2 2 2 2 2
6 6 5 5 5 5 5
5 5 6 4 4 4 4
4 4 4 6 1 1 1
1 1 1 1 6 6 6
7 7 7 7 7 7 3
3 3 3 3 3 3 7
pass 2
2 2 2 2 2 2
5 5 4 4 4 4
4 4 5 1 1 1
1 1 1 5 5 5
6 6 6 6 6 3
3 3 3 3 3 6
7 7 7 7 7 7
pass 3
2 2 2 2 2
4 4 1 1 1
1 1 4 4 4
5 5 5 5 3
3 3 3 3 5
6 6 6 6 6
7 7 7 7 7
pass 4
2 1 1 1
1 2 2 2
4 4 4 3
3 3 3 4
5 5 5 5
6 6 6 6
7 7 7 7
pass 5
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
pass 6
1 1
2 2
3 3
4 4
5 5
6 6
7 7
Note that after the first pass the largest element is in the last position of the
array. In the ith pass, the ith largest element sinks to the ith position from the bottom of
the array. Since each pass places a new element into its proper position, N-1 passes are
required for N elements.
Pseudo code for Bubble Sort:
Repeat
Initialize NoExchanges to true
For each pair of adjacent elements do
begin
If the values of the pair are out of order then
begin
Exchange the values
Set NoExchanges to False
end
end
until the array is sorted
Algorithm :
Procedure bubblesort(A,N)
1.For pass=1 to N-1 do
begin
2.NoExchange <- true
3.For j=1 to (N-pass) do
begin
4.If a[j]>a[j+1] then
begin
5.swap(a[j],a[j+1])
6.NoExchange <- False
end
end
7.If(NoExchange is true) then
return
end
8.Return.

The algorithm requires N-1 passes; each pass places one item in its correct
place. Obviously the Nth element will then also be in its correct place. The ith pass makes N-i
comparisons. So the time is
T(N) = (N-1)+(N-2)+...+2+1 = N(N-1)/2 = O(N²)
The Bubble sort is simple but the least efficient. If the data set contains N elements,
the maximum number of passes is N-1. Similarly, the number of tests per pass ranges
between N-1 and 1, with an average value of N/2. Thus the upper bound on the total
number of tests is roughly (N-1)*N/2, which is proportional to N².
Program to implement Bubble sort :
#include <stdio.h>

#define swap(x,y) { int t = (x); (x) = (y); (y) = t; }

void bubblesort(int a[], int n);

int main(void)
{
    int a[20], n, i;
    printf("How many numbers? ");
    scanf("%d", &n);
    printf("Enter %d numbers: ", n);
    for (i = 0; i < n; ++i)
        scanf("%d", &a[i]);
    printf("\n Numbers before sort:\n");
    for (i = 0; i < n; ++i)
        printf("%d ", a[i]);
    bubblesort(a, n);
    printf("\n Numbers after sort:\n");
    for (i = 0; i < n; ++i)
        printf("%d ", a[i]);
    return 0;
}

void bubblesort(int a[], int n)
{
    int pass, j, noexchange;
    for (pass = 0; pass < n - 1; pass++)
    {
        noexchange = 1;
        /* each pass bubbles the largest remaining element to the end */
        for (j = 0; j < n - pass - 1; j++)
        {
            if (a[j] > a[j+1])
            {
                swap(a[j], a[j+1]);
                noexchange = 0;
            }
        }
        if (noexchange)      /* no swaps: the array is already sorted */
            return;
    }
}

Heap Sort :
Heap sort is a sorting algorithm which was originally described by Floyd. This
method sorts by building a heap, and then repeatedly extracting the maximum item.
A heap is a complete binary tree with the property that a parent is always greater
than or equal to either of its children. Thus, the root node always contains the largest
element.
Binary trees can be represented as arrays by first numbering the nodes from left
to right. The key values of the nodes are then assigned to array positions whose subscript
is given by the number of the node.
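With this numbering (root at position 1), the parent and children of a node reduce to simple index arithmetic; a small illustrative sketch in C:

/* 1-based array representation of a complete binary tree */
int parent(int i)     { return i / 2; }      /* parent of node i      */
int leftchild(int i)  { return 2 * i; }      /* left child of node i  */
int rightchild(int i) { return 2 * i + 1; }  /* right child of node i */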
Heap sort uses two operations, delete and insert. The delete operation removes the
root node and replaces it with the last leaf. At this point the tree might not be ordered,
since most likely the new root is not the next largest element. We then migrate the root
down the tree until we find its correct position, swapping it with one of its two
children so as to preserve the heap property.
The insert operation inserts the new element as the last, rightmost node in the binary
tree. At this point the new node will most likely not be in the correct position, so we
move it up the tree structure until its correct position is attained.
Heap sort simply uses the heap structure and these operations. First it initializes the
heap by putting the data from the array onto a heap. Then we repeatedly take the
root (maximum) element out of the heap and put it at the end of the sorted table.
Eventually the heap will be empty and the sorted table will contain all the elements.
The Heap sort method takes the input array of elements and then builds a heap
based on the elements and finally produces the sorted elements. Thus, the Heap sort
algorithm consists of two phases:
1.Building a Heap tree
2.Sorting
To build a heap, start with one element and insert the next element into the
existing heap as in the insert operation, so that again a heap results. The insertion
process is repeated until a heap is built from all the input data elements.
Once the heap is constructed, the largest element in the input must be the root of
the heap, located at position 1. In the sorting phase, the root is placed into its correct
position by swapping it with the element at the bottom of the array. Then the properties of
the heap are restored, considering all the elements excluding the largest element already
placed in its sorted position. This is repeated until the heap is empty.
The Pseudo code for Heap Sort :
1. Building Heap
(a) Start with just one element. One element will always satisfy the heap
property.
(b) Insert the next element and make this a heap.
(c) Repeat step (b) until all elements are included in the heap.
2. Sorting
(a) Exchange the root and the last element in the heap.
(b) Make this a heap again, but this time do not include the last node.
(c) Repeat steps (a) and (b) until there is no element left.
Algorithm :
Building Heap :
Procedure BuildHeap(A,N)
1. For j=2 to N do
begin
2.   key <— A[j]
3.   i <— j
4.   parent <— trunc(i/2)
5.   while (parent >= 1) and (A[parent] < key) do
begin
6.     A[i] <— A[parent]
7.     i <— parent
8.     parent <— trunc(i/2)
end
9.   A[i] <— key
end
10. Return
Re-create the Heap:
Procedure ReHeap(A,K)
1. parent <— 1
2. child <— 2
3. key <— A[parent]
4. heap <— False
5. while (child <= K) and (not heap) do
begin
6.   If (child < K) then
       If A[child + 1] > A[child] then
         child <— child + 1
7.   If A[child] > key then
begin
8.     A[parent] <— A[child]
9.     parent <— child
10.    child <— parent * 2
end
else
11.    heap <— true
end
12. A[parent] <— key
13. Return.
Heap sort :
procedure HeapSort(A,N)
1.call BuildHeap(A,N)
2.For pos = N to 2 in steps of -1 do
begin
3.swap(A[1],A[pos])
4.call ReHeap(A,pos-1)
end
5.Return
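No C program for heap sort is given in this section; the following is a minimal sketch that mirrors the BuildHeap, ReHeap and HeapSort procedures above (1-based indexing, so a[0] is unused; the names are illustrative):

#include <stdio.h>

/* Sift the root down to restore the heap property for a[1..k] (cf. ReHeap). */
void reheap(int a[], int k)
{
    int parent = 1, child = 2, key = a[1];
    while (child <= k) {
        if (child < k && a[child + 1] > a[child])
            child++;                          /* take the larger child */
        if (a[child] > key) {
            a[parent] = a[child];             /* move the child up */
            parent = child;
            child = 2 * parent;
        } else
            break;
    }
    a[parent] = key;
}

/* Build a heap by repeated insertion (cf. BuildHeap). */
void buildheap(int a[], int n)
{
    int j, i, p, key;
    for (j = 2; j <= n; ++j) {
        key = a[j];
        i = j;
        p = i / 2;
        while (p >= 1 && a[p] < key) {        /* sift the new element up */
            a[i] = a[p];
            i = p;
            p = i / 2;
        }
        a[i] = key;
    }
}

void heapsort(int a[], int n)
{
    int pos, t;
    buildheap(a, n);
    for (pos = n; pos >= 2; --pos) {
        t = a[1]; a[1] = a[pos]; a[pos] = t;  /* move root to the sorted region */
        reheap(a, pos - 1);
    }
}

int main(void)
{
    int a[] = {0, 5, 2, 1, 3, 6, 4};          /* a[0] unused (1-based heap) */
    int i;
    heapsort(a, 6);
    for (i = 1; i <= 6; ++i)
        printf("%d ", a[i]);                  /* prints 1 2 3 4 5 6 */
    return 0;
}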
Addition and deletion are both O(log n) operations. We need to perform n
additions to build the heap, which thus takes O(n log n) time. Restoring the heap n
times also takes O(n log n) time.
Hence the total time complexity is 2 · O(n log n) = O(n log n).
The average, best and worst case performances are all O(n log n), as there is no
best or worst case input for the heap sort algorithm.
The Heap sort can be used for small or large lists of items. Unlike the Quick
sort, it does not degenerate to an O(n²) algorithm when the list is sorted. Although it has
the same time complexity as the Quick sort, it performs a few more comparisons,
and is therefore a little slower. But when the list is already sorted, Heap sort is faster than
Quick sort.
Quick sort :
As the name suggests, quick sort or partition exchange sort is a fast sorting
algorithm invented by C.A.R. Hoare. It is based on the recognition that exchanges
should preferably be performed over large distances in order to be most effective. Its
average running time is O(n log n). It is very fast, mainly because of a very tight and
highly optimized inner loop. It has O(n²) worst-case performance, but this can be made
exponentially unlikely with a little effort.
The quick sort algorithm is simple to understand and prove correct, although for
many years it had the reputation of being an algorithm that could in theory be highly
optimized but in practice was impossible to code correctly.
Like Merge sort, quick sort is a Divide and Conquer recursive algorithm.
In quick sort we divide the array of items to be sorted into two partitions and then
call the quick sort procedure recursively to sort the two partitions. To partition the data
elements, a pivot element is selected such that all the items in the lower part are
less than the pivot and all those in the upper part are greater than it.
Consider an array A of N elements to be sorted. Select a pivot element among the
N elements. The selection of the pivot element is somewhat arbitrary; however, the first
element is a convenient one.
The array is divided into two partitions so that the pivot element is placed into its
proper position satisfying the following properties
1. All elements to the left of pivot are less than the pivot element.
2. All elements to the right of pivot are greater than or equal to the pivot
element.
Pseudo code for Quick sort is :
1. If the input array A is empty or has only one element, then Return.
2. Select the first element, called the pivot, of array A.
3. Partition the array into two, so that the elements less than or equal to
the pivot are towards the left and those greater than it are towards the right,
relative to the pivot.
4. Call Quick sort procedure recursively to sort left and right sub arrays
respectively.
The Quick sort can be implemented either iteratively or recursively.
The recursive algorithm is simple and easy to understand.
Algorithm :
Procedure Quicksort(A, Lowbound, Upbound)
1. If (Lowbound < Upbound) then
begin
2.   i <— Lowbound
3.   j <— Upbound
4.   pivot <— Lowbound
5.   while (i < j) do
begin
6.     while (A[i] <= A[pivot]) and (i < Upbound) do
         i <— i+1
7.     while (A[j] > A[pivot]) do
         j <— j-1
8.     if (i < j) then
         swap(A[i], A[j])
end
9.   swap(A[pivot], A[j])
10.  call Quicksort(A, Lowbound, j-1)
11.  call Quicksort(A, j+1, Upbound)
end
12. Return.
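As with the other methods, a C sketch may help; the following is a minimal implementation of the algorithm above, taking the first element as pivot (the driver input is illustrative):

#include <stdio.h>

void quicksort(int a[], int lowbound, int upbound)
{
    int i, j, pivot, t;
    if (lowbound < upbound) {
        i = lowbound;
        j = upbound;
        pivot = lowbound;                     /* first element as pivot */
        while (i < j) {
            while (a[i] <= a[pivot] && i < upbound)
                i++;                          /* scan right for a larger item */
            while (a[j] > a[pivot])
                j--;                          /* scan left for a smaller item */
            if (i < j) {                      /* exchange the out-of-place pair */
                t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        t = a[pivot]; a[pivot] = a[j]; a[j] = t;  /* pivot to its final place */
        quicksort(a, lowbound, j - 1);        /* sort the left partition */
        quicksort(a, j + 1, upbound);         /* sort the right partition */
    }
}

int main(void)
{
    int a[] = {5, 2, 1, 3, 6, 4}, i;
    quicksort(a, 0, 5);
    for (i = 0; i < 6; ++i)
        printf("%d ", a[i]);                  /* prints 1 2 3 4 5 6 */
    return 0;
}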
Analysis of Quick sort :
Like Merge sort, Quick sort is recursive and hence its analysis requires
solving a recurrence formula.
We will do this for quick sort, assuming a random pivot and no cutoff
for small files.
The running time of quick sort is equal to the running time of the two
recursive calls plus the linear time spent in the partition.
The basic quick sort relation is
T(n) = T(i) + T(n-i-1) + Cn
where i = |S1| is the number of elements in S1, the left partition.
Worst-case Analysis :
Suppose the pivot is the smallest element all the time. Then i=0, and if we ignore
T(0) = 1, which is insignificant, the recurrence is
T(n) = T(n-1) + Cn , n>1
T(n-1) = T(n-2) + C(n-1)
T(n-2) = T(n-3) + C(n-2)
.
.
T(2) = T(1) + C(2)
Adding up all these equations, we get
T(n) = T(1) + C(2+3+...+n) = O(n²)
Best-case Analysis :
In this case the pivot is in the middle.
T(n) = 2T(n/2) + Cn
Dividing both sides by n gives
T(n)/n = T(n/2)/(n/2) + C
T(n/2)/(n/2) = T(n/4)/(n/4) + C
and
T(n/4)/(n/4) = T(n/8)/(n/8) + C
.
.
.
T(2)/2 = T(1)/1 + C
Now add up all the equations. After everything is added, all of the other terms
cancel; there are log n equations, and so all the Cs at the end of these equations add up
to C log n. The final result is
T(n)/n = T(1)/1 + C log n
Multiplying through by n gives the final answer:
T(n) = Cn log n + n
     = O(n log n)