
Unit I

Sorting

Definitions of an Algorithm
An algorithm is a step-by-step procedure which defines a set
of instructions to be executed in a certain order to get the
desired output.
(Or)
An algorithm is a finite set of instructions which, if followed,
accomplish a particular task.

Characteristics of an Algorithm

In addition, every algorithm must satisfy the following criteria:


Input: It takes zero or more inputs.
Output: It produces at least one output.
Definiteness: Each step of the algorithm should be clear. A good algorithm
can be easily understood even by a non-programmer.
Finiteness: All the operations can be carried out in a finite number of steps.
Effectiveness: Each step should be basic enough to be carried out exactly in
a finite amount of time. A good algorithm also occupies as little memory
space as possible.
Unambiguous: The algorithm should be clear and unambiguous. Each of its steps
(or phases), and their inputs/outputs, should be clear and must lead to only one
meaning. Algorithms are generally created independently of underlying
languages, i.e. an algorithm can be implemented in more than one programming
language.

Analysis of an Algorithm
There are two primary methods for analyzing algorithms formally:
i) Correctness: The primary method of validating an algorithm is a
proof of correctness.
This formally verifies that the algorithm accomplishes its purpose
and terminates in a finite number of steps.
ii) Complexity: An algorithm may also be analyzed in terms of its
complexity.
Asymptotic analysis of an algorithm refers to defining the mathematical
foundation/framing of its run-time performance. Using asymptotic
analysis, we can conclude the best-case, average-case, and worst-case
scenarios of an algorithm.
This is typically done via measures of run time using big-O, big-Omega, or
big-Theta notation.

Complexity can be analyzed through three main types of analysis:

Average-Case Analysis: Average time required for program execution.


Best-Case Analysis: Minimum time required for program execution.
Worst-Case Analysis: Maximum time required for program execution.

Asymptotic Notations

The following are commonly used asymptotic notations for
calculating the running time complexity of an algorithm:
Ο Notation (Big Oh)
Ω Notation (Omega)
θ Notation (Theta)

Big Oh Notation, Ο

The notation Ο(n) is the formal way to express the upper bound of an algorithm's
running time. It measures the worst-case time complexity, or the longest amount
of time an algorithm can possibly take to complete. It is an asymptotic upper bound.

Ο(g(n)) = { f(n) : there exist positive constants c and n0 such that, for all
n ≥ n0, we have 0 ≤ f(n) ≤ c·g(n) }

Omega Notation, Ω
The notation Ω(n) is the formal way to express the lower bound of an algorithm's
running time. It measures the best-case time complexity, or the minimum amount
of time an algorithm can possibly take to complete.

Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that, for all
n ≥ n0, we have 0 ≤ c·g(n) ≤ f(n) }

Theta Notation, θ

The notation θ(n) is the formal way to express both the lower bound and the upper
bound of an algorithm's running time. It is represented as follows:

θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that, for all
n ≥ n0, we have 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) }

Relations between θ, Ω and Ο


Theorem: For any two functions g(n) and f(n),
f(n) = θ(g(n)) iff
f(n) = Ο(g(n)) and f(n) = Ω(g(n)),
i.e. θ(g(n)) = Ο(g(n)) ∩ Ω(g(n))

Definitions of Data structure


Data structure is a way to organize data in such a way that it can be
used efficiently. That is, an organized collection of data is called a data
structure.
(Or)
The data structure is a specialized format for organizing and storing
data.
Data is a set of elementary items. The possible ways in which the data
items are logically related is defined by the data structure.
The programs have to follow certain rules to access and process the
structured data. And so,
Data structure = Organized data + Allowed operations.

From the data structure point of view, the data in data structures is
processed by the operations mentioned below:
Searching: Algorithm to search for an item in a data structure.
Sorting: Algorithm to sort items in a certain order.
Insertion: Algorithm to insert a new item into a data structure.
Updating: Algorithm to update an existing item in a data structure.
Deletion: Algorithm to delete an existing item from a data structure.
Traversing: Algorithm to visit each item in a data structure at least once.
Merging: Algorithm to merge one item with another in a data structure.

Classifications of Data structures

Sorting
A process that organizes a collection of data into either ascending or
descending order.

Classification of sorting
a. External sorting
b. Internal sorting
c. Stable Sorting
External sorting
External sorting is a process of sorting in which large blocks of data
stored on storage devices are moved to main memory and then sorted. That is, a
sort is external if the records being sorted are in auxiliary storage.
Internal sorting
Internal sorting is a process of sorting the data in main memory. That is, a
sort is internal if the records being sorted are in main memory.
Stable sort
A sorting technique is called stable if, for all records i and j such that k[i]
equals k[j], whenever r[i] precedes r[j] in the original file, r[i] also precedes
r[j] in the sorted file. That is, a stable sort keeps records with the same key in
the same relative order that they were in before the sort.
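The stability property can be illustrated with a short Python sketch. Python's built-in sorted() is documented to be stable; the records below are made up for the example:

```python
# Records as (key, label) pairs; the records labelled "a" and "c"
# share the same key, 2.
records = [(2, "a"), (1, "b"), (2, "c")]

# Python's built-in sorted() is a stable sort: when we sort by the key
# alone, records with equal keys keep their original relative order.
result = sorted(records, key=lambda r: r[0])

print(result)  # "a" still precedes "c" among the key-2 records
```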

Insertion Sort

Insertion Sort (also called as Straight Insertion Sort)

One of the simplest sorting algorithms is the insertion sort.


If the first few objects are already sorted, an unsorted object can be inserted
into the sorted set in its proper place. This is called insertion sort. (Or) It is
the one that sorts a set of records by inserting records into an existing sorted file.
With each pass of an insertion sort, one or more pieces of data are inserted
into their correct location in an ordered list.
Instead of inserting an element anywhere in the list and re-sorting the whole
list, each time a new element is encountered it is inserted into its correct
position.
In this sorting, the list is divided into two parts: sorted and unsorted.
In each pass the first element of the unsorted sub-list is transferred to the
sorted list by inserting it at the appropriate place.
It will take at most n-1 passes to sort the data.

In pass p, move the pth element left until its correct place is found among the first p
elements.
Example: Card players (as they pick up each card, they insert it into the proper
sequence in their hand).
Steps
Let A be the array of n numbers.
Our aim is to sort the numbers in ascending order.
Scan the array from A[1] to A[n-1] and find A[R], where R = 1, 2, 3, ..., (N-1), and
insert it into the proper position in the previously sorted sub-array A[1], A[2], ..., A[R-1].
If R=1, the sorted sub-array is empty, so A[1] is sorted itself.
If R=2, A[2] is inserted into the previously sorted sub-array A[1], i.e., A[2] is
inserted either before A[1] or after A[1].
If R=3, A[3] is inserted into the previously sorted sub-array A[1],A[2] i.e., A[3] is
inserted either before A[1] or after A[2] or in between A[1] and A[2].
We can repeat the process (n-1) times, and finally we get the sorted array.

Example: the sort works through a list of six numbers. Sorting these data
requires five sort passes. Each pass moves the wall one element to the right
as an element is removed from the unsorted sub-list and inserted into the
sorted list.

Example of insertion sort

Algorithm for Insertion sort


Step 1 - If it is the first element, it is already sorted; return.
Step 2 - Pick the next element.
Step 3 - Compare it with all elements in the sorted sub-list.
Step 4 - Shift all the elements in the sorted sub-list that are greater than the
value to be sorted.
Step 5 - Insert the value.
Step 6 - Repeat until the list is sorted.

https://courses.cs.vt.edu/csonline/Algorithms/Lessons/InsertionCardSort/in
sertioncardsort.swf
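The steps above can be sketched in Python as follows (a minimal in-place implementation; the function name is illustrative, not from the source):

```python
def insertion_sort(a):
    """Sort the list a in ascending order, in place."""
    for p in range(1, len(a)):       # pass p inserts a[p] into the sorted sub-list a[0..p-1]
        key = a[p]                   # Step 2: pick the next element
        j = p - 1
        # Steps 3-4: shift every element of the sorted sub-list that is
        # greater than key one position to the right
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key               # Step 5: insert the value at its proper place
    return a
```

For example, insertion_sort([5, 2, 4, 6, 1, 3]) sorts the list in five passes.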

Advantage of Insertion Sort

The advantage of insertion sort is that it is relatively simple and easy to


implement.
It is in-place and needs only a constant amount of extra memory.
It is adaptive: efficient for data sets that are already (or nearly) sorted.
It is simple and efficient, especially for small arrays.
In practice it is about twice as fast as bubble sort.

Disadvantage of Insertion Sort

The disadvantage of insertion sort is that it is not efficient for a large


list or input size, since its running time grows as O(n²) in the average
and worst cases.

Selection sort

One of the easiest ways to sort a table is by selection.


Beginning with the first record in the table, a search is performed to locate the
element which has the smallest key.
When this element is found, it is interchanged with the first record in the table.
This interchange places the record with the smallest key in the first position of
the table.
A search for the second smallest key is then carried out.
This is accomplished by examining the keys from the second element onward.
The element which has the second smallest key is interchanged with the
element located in the second position of the table.
The process of searching for the record with the next smallest key and placing
it in its proper position (within the desired ordering) continues until all the
records have been sorted in ascending order.

A general algorithm for selection sort is


Repeat through step 5 a total of n-1 times.
Record the portion of the vector already sorted.
Repeat step 4 for the elements in the unsorted portion of the
vector.
Record the location of the smallest element in the unsorted vector.
Exchange the first element in the unsorted vector with the smallest element.
The list is divided into two sublists, sorted and unsorted, which
are divided by an imaginary wall.
We find the smallest element from the unsorted sublist and swap
it with the element at the beginning of the unsorted data.

After each selection and swap, the imaginary wall between


the two sublists moves one element ahead, increasing the number of
sorted elements and decreasing the number of unsorted ones.
Each time we move one element from the unsorted sublist to the
sorted sublist, we say that we have completed a sort pass.
A list of n elements requires n-1 passes to completely rearrange
the data.


Algorithm for Selection sort


Step 1 - Set MIN to location 0.
Step 2 - Search for the minimum element in the list.
Step 3 - Swap it with the value at location MIN.
Step 4 - Increment MIN to point to the next element.
Step 5 - Repeat until the list is sorted.
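A minimal Python sketch of these steps (the variable min_idx plays the role of MIN; names are illustrative):

```python
def selection_sort(a):
    """Sort the list a in ascending order, in place."""
    n = len(a)
    for i in range(n - 1):                   # n-1 passes; a[0..i-1] is the sorted sublist
        min_idx = i                          # Step 1: MIN starts at the first unsorted location
        for j in range(i + 1, n):            # Step 2: search for the minimum element
            if a[j] < a[min_idx]:
                min_idx = j
        a[i], a[min_idx] = a[min_idx], a[i]  # Step 3: swap it into place
    return a
```

Note that each pass performs at most one exchange, which is why selection sort needs only O(n) moves overall.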

The best case, the worst case, and the average case of the selection
sort algorithm are the same: all of them are O(n²).

This means that the behavior of the selection sort algorithm does not
depend on the initial organization of the data.
Since O(n²) grows so rapidly, the selection sort algorithm is
appropriate only for small n.
Although the selection sort algorithm requires O(n²) key comparisons,
it requires only O(n) moves.
A selection sort can be a good choice if data moves are costly but
key comparisons are not (short keys, long records).


Shell Sort

Shell sort is a highly efficient sorting algorithm and is based


on the insertion sort algorithm.
This algorithm avoids the large shifts that occur in insertion sort
when a smaller value is far to the right and has to move far to the left.
This algorithm uses insertion sort on widely spaced elements
first to sort them, and then sorts the less widely spaced
elements. This spacing is termed the interval. The interval is
calculated based on Knuth's formula as
h = h * 3 + 1
where h is the interval, with initial value 1.
This algorithm is quite efficient for medium-sized data sets;
its average and worst-case complexity depend on the gap sequence
(with Knuth's sequence the worst case is O(n^(3/2)), where n is
the number of items).

Example
We take the example below to get an idea of how shell sort works. We
take the same array we used in our previous examples. For our
example, and for ease of understanding, we take an interval of 4 and make
a virtual sub-list of all values located at intervals of 4 positions. Here
these values are {35, 14}, {33, 19}, {42, 27} and {10, 44}.
We compare the values in each sub-list and swap them (if necessary) in the
original array.
Then we take an interval of 2, and this gap generates two sub-lists: {14,
27, 35, 42} and {19, 10, 33, 44}.
We compare and swap the values, if required, in the original array.
After this step, the sub-lists are {14, 27, 35, 42} and {10,
19, 33, 44}.
And finally, we sort the rest of the array using an interval of value 1. Shell
sort uses insertion sort to sort the array into {10, 14, 19, 27, 33, 35, 42,
44}.
The step-by-step depiction is shown below.
We see that it required only four swaps to sort the rest of the array.

Example 2

Index: 0   1   2    3    4    5   6   7   8    9   10   11  12  13   14  15
Value: 44  68  191  119  119  37  83  82  191  45  158  130 76  153  39  25


Initial gap = length / 2 = 16 / 2 = 8
Initial sub-array indices:
{0, 8}, {1, 9}, {2, 10}, {3, 11}, {4, 12}, {5, 13}, {6, 14}, {7, 15}
next gap = 8 / 2 = 4
{0, 4, 8, 12}, {1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}
next gap = 4 / 2 = 2
{0, 2, 4, 6, 8, 10, 12, 14}, {1, 3, 5, 7, 9, 11, 13, 15}
final gap = 2 / 2 = 1

Algorithm for shell sort


We shall now see the algorithm for shell sort.
Step 1 - Initialize the value of h.
Step 2 - Divide the list into smaller sub-lists, each of equal interval h.
Step 3 - Sort these sub-lists using insertion sort.
Step 4 - Repeat until the complete list is sorted.
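A Python sketch of shell sort using Knuth's gap sequence h = h*3 + 1; the inner loop is a gapped insertion sort (function name is illustrative):

```python
def shell_sort(a):
    """Sort the list a in ascending order using Knuth's gap sequence."""
    n = len(a)
    h = 1
    while h < n // 3:         # Step 1: build the interval 1, 4, 13, 40, ...
        h = h * 3 + 1
    while h >= 1:
        # Steps 2-3: insertion-sort the sub-lists of elements h apart
        for i in range(h, n):
            key = a[i]
            j = i
            while j >= h and a[j - h] > key:
                a[j] = a[j - h]
                j -= h
            a[j] = key
        h //= 3               # Step 4: repeat with a smaller interval
    return a
```

Running shell_sort([35, 33, 42, 10, 14, 19, 27, 44]) on the array from the example yields [10, 14, 19, 27, 33, 35, 42, 44].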

Bubble Sort

Another well-known sorting method is bubble sort. It differs from
selection sort in that, instead of finding the smallest record and then
performing an interchange, two records are interchanged immediately
upon discovering that they are out of order.
When this approach is used, there are at most n-1 passes required. During
the first pass K1 and K2 are compared, and if they are out of order, then
records R1 and R2 are interchanged; this process is repeated for records
R2 and R3, R3 and R4, and so on. This method causes records with small
keys to bubble up.
After the first pass, the record with the largest key will be in the nth
position. On each successive pass, the record with the next largest key
is placed in position n-1, n-2, ..., respectively, thereby
resulting in a sorted table.
After each pass through the table, a check can be made to determine
whether any interchanges were made during that pass. If no interchanges
occurred, then the table must be sorted and no further passes are required.

A general algorithm for bubble sort is


o Repeat through step 4 a total of n-1 times.
o Repeat step 3 for the elements in the unsorted portion of the vector.
o If the current element in the vector > the next element in the
vector, then exchange the elements.
o If no exchanges were made, then return; else reduce the size of
the unsorted portion of the vector by one.

Example 1: one pass, and the array after the completion of each pass.

Example 2: the original list, and the list after each of passes 1 through 4.

Algorithm for bubble sort


We assume list is an array of n elements. We further assume that the swap
function swaps the values of the given array elements.
begin BubbleSort(list)
   for pass = 1 to n-1
      for each i in the unsorted portion of list
         if list[i] > list[i+1]
            swap(list[i], list[i+1])
         end if
      end for
   end for
   return list
end BubbleSort
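A runnable Python version of the pseudocode, including the early-exit check described above (no interchanges in a pass means the table is already sorted):

```python
def bubble_sort(a):
    """Sort the list a in ascending order, in place."""
    n = len(a)
    for i in range(n - 1):                  # at most n-1 passes
        swapped = False
        # after pass i, the i largest keys occupy the last i positions,
        # so the unsorted portion shrinks by one each pass
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:             # out of order: interchange immediately
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                      # no interchanges: already sorted
            break
    return a
```

On an already-sorted list the first pass makes no exchanges, so the function stops after a single O(n) pass, matching the best-case analysis below.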

Analysis of bubble sort


Number of comparisons (worst case):
(n-1) + (n-2) + ... + 3 + 2 + 1 = O(n²)
Number of comparisons (best case):
n - 1 = O(n)
Number of exchanges (worst case):
(n-1) + (n-2) + ... + 3 + 2 + 1 = O(n²)
Number of exchanges (best case):
0 = O(1)
Overall worst case: O(n²) + O(n²) = O(n²)

Quick Sort

Like merge sort, quick sort is also based on the divide-and-conquer
paradigm. But it uses this technique in a somewhat opposite manner,
as all the hard work is done before the recursive calls.
It works as follows:
First, it partitions an array into two parts,
Then, it sorts the parts independently,
Finally, it combines the sorted subsequences by a simple
concatenation.

QUICK_SORT(K,LB,UB)
Given a table K of N records, this recursive procedure sorts the
table, as previously described, in ascending order.
A dummy record with key K[N+1] is assumed, where
K[I] <= K[N+1] for all 1 <= I <= N. The integer parameters LB
and UB denote the lower and upper bounds of the current sub-table
being processed.
The indices I and J are used to select certain keys during the
processing of each sub table. KEY contains the key value which
is being placed in its final position within the sorted sub table.
FLAG is a logical variable which indicates the end of the process
that places a record in its final position.
When FLAG becomes false, the input sub-table has been
partitioned into two disjoint parts.

Variables used
K: Array to hold the elements
LB, UB: Denote the lower and upper bounds of the current sub-table
I, J: Used to select certain keys during processing
KEY: Holds the key value being placed in its final position
FLAG: Logical variable to indicate the end of the process

The quick-sort algorithm consists of the following three steps:


Divide: Partition the list.
To partition the list, we first choose some element from the
list for which we hope about half the elements will come
before and half after. Call this element the pivot.
Then we partition the elements so that all those with values
less than the pivot come in one sub list and all those with
greater values come in another.
Recursion: Recursively sort the sub lists separately.
Conquer: Put the sorted sub lists together.

Partitioning places the pivot in its correct position within the array.

Arranging the array elements around the pivot p generates two smaller
sorting problems.
Sort the left section of the array, and sort the right section of the array.
When these two smaller sorting problems are solved recursively, our
bigger sorting problem is solved.
First, we have to select a pivot element among the elements of the given
array, and we put this pivot into the first location of the array before
partitioning.

Which array item should be selected as pivot?


Somehow we have to select a pivot, and we hope that we
will get a good partitioning.
If the items in the array are arranged randomly, we can choose
the pivot randomly.
We can choose the first or last element as the pivot (it may
not give a good partitioning).
We can use different techniques to select the pivot.

Partition Function
Invariant for the partition algorithm: the initial state of the array, and
moving theArray[firstUnknown] into S2 by incrementing firstUnknown.
Partition in Quick sort


The pivot value divides the list into two parts. Recursively,
we find a pivot for each sub-list until all sub-lists contain
only one element.

Quick Sort Pivot Algorithm


Based on our understanding of partitioning in quick sort, we
should now try to write an algorithm for it here.
Step 1 - Choose the value at the highest index as the pivot.
Step 2 - Take two variables to point to the left and right of the list,
excluding the pivot.
Step 3 - left points to the low index.
Step 4 - right points to the high index.
Step 5 - While the value at left is less than the pivot, move right.
Step 6 - While the value at right is greater than the pivot, move left.
Step 7 - If both step 5 and step 6 do not match, swap left and right.
Step 8 - If left >= right, the point where they met is the new pivot.

QuickSort Algorithm
Using the pivot algorithm recursively, we end up with smaller and
smaller partitions. Each partition is then processed for quick sort.
We define the recursive algorithm for quicksort as below.
Step 1 - Make the right-most index value the pivot.
Step 2 - Partition the array using the pivot value.
Step 3 - Quicksort the left partition recursively.
Step 4 - Quicksort the right partition recursively.
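A Python sketch of these steps using the Lomuto partition scheme, a common variant in which the right-most element is the pivot; the two-pointer (left/right) description above can be implemented similarly, and the function names are illustrative:

```python
def partition(a, lb, ub):
    """Place the pivot a[ub] in its final position and return that index."""
    pivot = a[ub]                         # Step 1: right-most value as pivot
    i = lb - 1
    for j in range(lb, ub):
        if a[j] <= pivot:                 # smaller keys go to the left part
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[ub] = a[ub], a[i + 1]     # pivot lands between the two parts
    return i + 1

def quick_sort(a, lb=0, ub=None):
    """Sort a[lb..ub] in ascending order, in place."""
    if ub is None:
        ub = len(a) - 1
    if lb < ub:
        p = partition(a, lb, ub)          # Step 2: partition around the pivot
        quick_sort(a, lb, p - 1)          # Step 3: sort the left partition
        quick_sort(a, p + 1, ub)          # Step 4: sort the right partition
    return a
```

The "concatenation" step is implicit here: because partitioning works in place, the sorted left part, the pivot, and the sorted right part are already adjacent.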

Example for using this pivot function

A worst-case partitioning with quick sort

An average-case partitioning with quick sort

Quick sort analysis

Quicksort is O(n*log2n) in the best case and average case.

Quicksort is slow when the array is already sorted and we choose the first
element as the pivot.

Although its worst-case behavior is not so good, its average-case behavior is
much better than its worst case.
So, quicksort is one of the best sorting algorithms that use key comparisons.

Merge Sort

The operation of sorting is closely related to the process of merging.


This sort formulates the sorting algorithm based on successive
merges.
The approach is used to give two formulations of merge sort. The first
is recursive, which is easier to write and analyze. The second is
iterative, which is more complex.
First, let us see the merging of two ordered tables, which can be
combined to produce a single sorted table.
This process can be accomplished easily by successively selecting
the record with the smallest key occurring in either of the tables and
placing this record in a new table, thereby creating an ordered list.
Merge sort is one of two important divide-and-conquer
sorting algorithms (the other one is quick sort).

It is a recursive algorithm:
Divide the list into halves,
Sort each half separately, and
Then merge the sorted halves into one sorted array.
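The recursive formulation can be sketched in Python as follows (using <= in the merge keeps the sort stable; the function name is illustrative):

```python
def merge_sort(a):
    """Return a new list with the elements of a in ascending order."""
    if len(a) <= 1:                       # a list of 0 or 1 elements is sorted
        return a
    mid = len(a) // 2                     # divide the list into halves
    left = merge_sort(a[:mid])            # sort each half separately
    right = merge_sort(a[mid:])
    # merge: successively select the record with the smallest key
    # from either table and place it in the new table
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # copy whatever remains
    merged.extend(right[j:])
    return merged
```

The slices used here play the role of the extra array mentioned in the analysis below: merging needs auxiliary space proportional to the input.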

Merge sort Analysis


Merging two sorted arrays of size k:
Best case: all the elements in the first array are smaller (or larger)
than all the elements in the second array.
The number of moves: 2k + 2k
The number of key comparisons: k
Worst case:
The number of moves: 2k + 2k
The number of key comparisons: 2k - 1

Levels of recursive calls to merge sort, given an array of eight items

Merge sort is an extremely efficient algorithm with respect to time:


both the worst and average cases are O(n * log2n).
But merge sort requires an extra array whose size equals the size of
the original array.
If we use a linked list, we do not need an extra array,
but we need space for the links,
and it is more difficult to divide the list into halves (O(n)).

Radix Sort

The radix sort algorithm is different from the other sorting algorithms

we have discussed.
It does not use key comparisons to sort an array.
The radix sort:
Treats each data item as a character string.
First groups the data items according to their rightmost character, and
puts these groups into order with respect to this rightmost character.
Then combines these groups.
We repeat these grouping and combining operations for all the other
character positions in the data items, from the rightmost to the
leftmost character position.
At the end, the sort operation is complete.
Given a table of N records arranged as a linked list, where each node in
the list consists of a key field and a link field, this procedure performs
the sort.

The first node is pointed to by a pointer called FIRST. The vectors T and B are
pointers that store the addresses of the rear and front of each queue; in
particular, T[I] and B[I] denote the top and bottom of pocket I.

The pointer R is used to denote the current record, NEXT is the pointer to
the next record, and PREV is the pointer used to combine the queues. D is used
to examine the current digit.
to examine the digit.
Variables Used
FIRST: Pointer to the first node in the table.
T: Denotes the top (rear) of each queue.
B: Denotes the bottom (front) of each queue.
J: Pass index.
P: Pocket index pointing into the temporary table.
R: Pointer to store the address of the current record being handled.
NEXT: Pointer which holds the address of the next record in the table.
PREV: Pointer used to combine the pockets.
D: Current digit being handled in the current key field.

Example 1:
mom, dad, god, fat, bad, cat, mad, pat, bar, him          (original list)
(dad, god, bad, mad) (mom, him) (bar) (fat, cat, pat)     (group strings by rightmost letter)
dad, god, bad, mad, mom, him, bar, fat, cat, pat          (combine groups)
(dad, bad, mad, bar, fat, cat, pat) (him) (god, mom)      (group strings by middle letter)
dad, bad, mad, bar, fat, cat, pat, him, god, mom          (combine groups)
(bad, bar) (cat) (dad) (fat) (god) (him) (mad, mom) (pat) (group strings by first letter)
bad, bar, cat, dad, fat, god, him, mad, mom, pat          (combine groups; SORTED)
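Example 1 can be reproduced with a short Python sketch: an LSD (least-significant-digit) radix sort for fixed-width lowercase strings, using a list of 26 groups in place of the linked-list pockets described above (the function name is illustrative):

```python
def radix_sort_strings(words, width):
    """Sort fixed-width lowercase strings by grouping on each character,
    from the rightmost position to the leftmost."""
    for pos in range(width - 1, -1, -1):          # rightmost character first
        buckets = [[] for _ in range(26)]         # one group per letter a-z
        for w in words:
            buckets[ord(w[pos]) - ord("a")].append(w)
        # combine the groups, preserving order within each group
        words = [w for bucket in buckets for w in bucket]
    return words
```

Because appending to a group preserves order, each pass is stable, which is what makes the right-to-left passes produce a fully sorted list.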


Analysis of Radix sort


The radix sort algorithm requires 2*n*d moves to sort n strings of d
characters each.
So, radix sort is O(n).
Although radix sort is O(n), it is not appropriate as a general-purpose
sorting algorithm.
Its memory requirement is d * the original size of the data (because
each group should be big enough to hold the original data
collection).
For example, we need 27 groups to sort strings of uppercase letters.
The radix sort is more appropriate for a linked list than an array
(we will not need the huge memory in this case).
