
TABLE OF CONTENTS

INTRODUCTION
CLASSIFICATION OF SORTING ALGORITHMS
BUBBLE SORT
INSERTION SORT
SHELL SORT
HEAP SORT
MERGE SORT
QUICK SORT
BUCKET SORT
RADIX SORT

INTRODUCTION
A sorting algorithm is an algorithm that puts a list into a specific order. Sorting is a fundamental building block of many programs: it helps in optimizing the performance of a program and can also simplify code considerably. This has made sorting algorithms an area of interest for computer scientists. A list is an abstract data type that implements an ordered collection of values, where a value may occur more than once. The order in which a list is sorted can be numerical or lexicographical. This report aims to give a glimpse of the various sorting algorithms available. It gives a detailed (but limited) explanation of the working and the uses of some of the most widely used sorting algorithms. With regard to the different types of sorting algorithms at hand, the report presents the bases on which the algorithms are classified, some of them being complexity, stability, and the general method followed. For the sorting algorithms discussed, code fragments are provided for the reader's convenience. The report is intended as study material rather than just casual reading.

CLASSIFICATION OF SORTING ALGORITHMS


Sorting algorithms are classified based on various factors, and these factors also influence the efficiency of the sorting algorithm. They are often classified by:

Computational Complexity: Algorithms are compared by their worst, average, and best case behaviour in terms of Big O notation. For a sorting algorithm on a list of size n, good behaviour is O(n log n) and bad behaviour is O(n²). The ideal behaviour for a sort is O(n), although this is not attainable in the average case by comparison-based sorts.

Memory Usage: There are in-place algorithms and out-of-place algorithms. In-place algorithms require only O(1) or O(log n) extra memory beyond the items being sorted, so they do not need to create auxiliary locations where data is temporarily stored, as out-of-place sorting algorithms do.

Recursion: Some algorithms are either recursive or non-recursive, while others may be both (e.g., merge sort).

Stability: Stable sorting algorithms maintain the relative order of records with equal keys (i.e., values). As an example, suppose the following set of pairs is to be sorted by their first component: (2,6) (2,1) (3,5) (4,3) (7,2). They can be sorted in two ways:

(2,6) (2,1) (3,5) (4,3) (7,2) - order of the equal keys not changed
(2,1) (2,6) (3,5) (4,3) (7,2) - order of the equal keys changed

An algorithm that does not change the relative order of equal keys is a stable algorithm. An unstable algorithm can always be specially implemented to be stable, by keeping the original order of the data as the tie breaker for equal values, but this requires additional computation and memory. (A small Java illustration is given after this list.)

Comparison: Algorithms can be classified by whether or not they are comparison sorts. A comparison sort examines the data only by comparing two elements with a comparison operator. Most of the widely used sorts are comparison sorts.

General Method: According to the method followed by the sorting algorithm, it is classified as an insertion, exchange, selection, or merging sort, among others. Bubble sort is an example of an exchange sort, and heap sort is an example of a selection sort.

All these factors will be considered when we do a detailed study of the widely used and popular sorting algorithms.
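As referenced in the stability discussion above, here is a small Java illustration. It is only a sketch added for the reader: the Pair class, the StabilityDemo class, and the use of java.util.Arrays.sort (which Java guarantees to be stable for arrays of objects) are our own choices, not part of the algorithms studied in this report.

import java.util.Arrays;
import java.util.Comparator;

class Pair {
    final int key, value;
    Pair(int key, int value) { this.key = key; this.value = value; }
    public String toString() { return "(" + key + "," + value + ")"; }
}

class StabilityDemo {
    public static void main(String[] args) {
        Pair[] data = { new Pair(2, 6), new Pair(2, 1), new Pair(3, 5),
                        new Pair(4, 3), new Pair(7, 2) };
        // Arrays.sort on object arrays uses a stable merge sort,
        // so (2,6) keeps its place ahead of (2,1) when sorting by key.
        Arrays.sort(data, Comparator.comparingInt((Pair p) -> p.key));
        System.out.println(Arrays.toString(data));
        // Prints: [(2,6), (2,1), (3,5), (4,3), (7,2)]
    }
}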

BUBBLE SORT
Bubble sort is a simple and straightforward method of sorting data that is used in computer science education. It works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. The smaller elements bubble their way to the top, and hence the name bubble sort. Bubble sort is a comparison sort. Bubble sort has a worst case and average complexity of O(n²), where n is the number of items being sorted. With this algorithm, sorting 100 elements takes on the order of 10000 comparisons. There are many other sorting algorithms that perform substantially better, with a worst or average case of O(n log n). Even insertion sort, which also has a worst case of O(n²), performs better than bubble sort. Therefore the use of bubble sort is not practical when n is large. The performance of bubble sort also depends on the position of the elements. Large elements at the beginning of the list do not pose a problem, as they are quickly swapped towards the end. Small elements towards the end, however, move to the beginning extremely slowly. Cocktail sort is a variant of bubble sort that addresses this problem, but it still retains the O(n²) worst case complexity.

Let us take the array of numbers "5 1 4 2 8" and sort it from lowest to greatest using the bubble sort algorithm. In each step, the pair of adjacent elements being compared is shown, together with the swap (if any) that follows.

First Pass:
( 5 1 4 2 8 ) -> ( 1 5 4 2 8 ), the algorithm compares the first two elements and swaps them since 5 > 1
( 1 5 4 2 8 ) -> ( 1 4 5 2 8 ), swap since 5 > 4
( 1 4 5 2 8 ) -> ( 1 4 2 5 8 ), swap since 5 > 2
( 1 4 2 5 8 ) -> ( 1 4 2 5 8 ), since these elements are already in order (8 > 5), the algorithm does not swap them

Second Pass:
( 1 4 2 5 8 ) -> ( 1 4 2 5 8 )
( 1 4 2 5 8 ) -> ( 1 2 4 5 8 ), swap since 4 > 2
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )

Now the array is already sorted, but our algorithm does not know if it is completed. The algorithm needs one whole pass without any swap to know it is sorted.

Third Pass:
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )

Finally, the array is sorted, and the algorithm can terminate. The performance of bubble sort can be improved marginally. When the first pass is over, the greatest element has come to the last position, i.e. the (n-1)th position in the array. For further passes, that position need not be compared again, so each pass can be one step shorter than the previous pass. This reduces the total number of comparisons by about half, although the complexity still remains O(n²). Due to its simplicity and straightforwardness, bubble sort is often used to introduce the concept of an algorithm to introductory computer science students. The Jargon File, which famously calls bogo-sort the archetypical perversely awful algorithm, also calls bubble sort the generic bad algorithm. D. Knuth, in his popular book The Art of Computer Programming, concludes that the bubble sort "seems to have nothing to recommend it, except a catchy name". Researchers such as Owen Astrachan have shown by experimental results that insertion sort performs better even on random lists, and have gone as far as to recommend that bubble sort no longer be taught.
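The report gives code fragments for the other algorithms but not for bubble sort, so the following is a minimal Java sketch in the same style. The method name bubbleSort and the early-exit "swapped" flag (which implements the "one whole pass without any swap" check described above) are our own choices.

void bubbleSort(int[] arr) {
    // After each pass the largest remaining element is in its final
    // position, so each pass can be one element shorter than the last.
    for (int end = arr.length - 1; end > 0; end--) {
        boolean swapped = false;
        for (int i = 0; i < end; i++) {
            if (arr[i] > arr[i + 1]) {      // adjacent pair out of order
                int tmp = arr[i];           // swap them
                arr[i] = arr[i + 1];
                arr[i + 1] = tmp;
                swapped = true;
            }
        }
        if (!swapped) break;   // a full pass with no swaps: the array is sorted
    }
}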

INSERTION SORT
Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly-sorted lists, and it is often used as a part of more sophisticated algorithms. It is a comparison sort in which the sorted list is built one entry at a time. It works by taking elements from the list one by one and inserting them in their correct position into a new sorted list. The new list and the remaining elements can share the array's space, but insertion is expensive, requiring all following elements to be shifted over by one. Insertion sort has an average and worst case complexity of O(n²). The best case is when an already sorted list is sorted, the running time then being linear, i.e. O(n). The worst case is when the list is in reverse order. Since the running time is quadratic even in the average case, it is not considered suitable for large lists. However, it is one of the fastest algorithms when the list contains fewer than about 10 elements. A variant of insertion sort, Shell sort, is more efficient for larger lists; the next chapter gives more detail on Shell sort. Insertion sort is stable and is an in-place algorithm, requiring a constant amount of additional memory space. It can sort a list as it receives it. Compared to more advanced algorithms like quick sort, heap sort, or merge sort, insertion sort is much less efficient on large lists.

The following example shows the insertion sort algorithm. Let us consider the array of numbers 5 1 4 2 8. In each step, the element being inserted is compared against the sorted part of the array to its left.

First Pass:
( 5 1 4 2 8 ) -> ( 1 5 4 2 8 ), the algorithm takes 1, searches the sorted part for its position, and brings it to the front, since 1 < 5

Second Pass:
( 1 5 4 2 8 ) -> ( 1 4 5 2 8 ), 4 < 5, so 4 is brought to its position; 4 > 1, so no further shifting is done

Third Pass:
( 1 4 5 2 8 ) -> ( 1 2 4 5 8 ), 2 is smaller than 5 and 4 but greater than 1, so it is inserted after 1

Fourth Pass:
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 ), 8 is already in its correct position, and the list is now sorted

Instead of swapping, direct shifting can be done by using binary search to find where the element is to be inserted. Binary search reduces the number of comparisons, but it is only worthwhile when comparisons are more expensive than moves, since the following elements must still be shifted. Since insertion (shifting) is very tedious in arrays, one can use a linked list for the sort; but in a linked list, binary search cannot be done, as random access is not allowed. In 2004 Bender, Farach-Colton, and Mosteiro produced a new variant of insertion sort, called library sort, that leaves a small number of unused gaps spread throughout the array. The benefit is that elements need to be shifted only until a gap is reached.

void insertionSort(int[] arr) {
    int i, j, newValue;
    for (i = 1; i < arr.length; i++) {
        newValue = arr[i];          // element to be inserted into the sorted prefix
        j = i;
        // Shift larger elements one position to the right
        while (j > 0 && arr[j - 1] > newValue) {
            arr[j] = arr[j - 1];
            j--;
        }
        arr[j] = newValue;          // insert the element at its correct position
    }
}

SHELL SORT
Shell sort was invented by Donald Shell in 1959, and the sort was named after its inventor. It is an improved version of insertion sort. It combines ideas from insertion sort and bubble sort to give much more efficiency than both of these traditional algorithms. Shell sort improves insertion sort by comparing elements separated by a gap of several positions. This lets an element take "bigger steps" toward its expected position. Multiple passes over the data are taken with smaller and smaller gap sizes. The last step of Shell sort is a plain insertion sort, but by then, the array of data is guaranteed to be almost sorted.

An implementation of Shell sort can be described as arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort. The effect is that the data sequence is partially sorted. The process is repeated, but each time with a smaller number of columns; in the last step, the array consists of only one column. In practice, the data sequence is not held in a two-dimensional array, but in a one-dimensional array that is indexed appropriately.

Though Shell sort is a simple algorithm, analysing its complexity is a laborious task. The original Shell sort algorithm has O(n²) complexity for comparisons and exchanges. The gap sequence is a major factor that improves or deteriorates the performance of the algorithm. The original gap sequence suggested by Donald Shell was to begin with N/2 and halve the number until it reaches 1; with this gap sequence, the worst case running time is O(n²). Other gap sequences that are popularly used, and their worst case running times, are: O(n^(3/2)) for Hibbard's increments of 2^k - 1, O(n^(4/3)) for Sedgewick's increments, and O(n log² n) for Pratt's increments 2^i 3^j; better running times are conjectured but unproven. The existence of an O(n log n) worst-case implementation of Shell sort (which would be the optimal performance for comparison sort algorithms) was precluded by Poonen, Plaxton, and Suel.

Let 3 7 9 0 5 1 6 8 4 2 0 6 1 5 7 3 4 9 8 2 be the data sequence to be sorted. First, it is arranged in an array with 7 columns (left), then the columns are sorted (right):

3 7 9 0 5 1 6        3 3 2 0 5 1 5
8 4 2 0 6 1 5        7 4 4 0 6 1 6
7 3 4 9 8 2          8 7 9 9 8 2

Data elements 8 and 9 have now already come to the end of the sequence, but a small element (2) is also still there. In the next step, the sequence is arranged in 3 columns, which are again sorted:

3 3 2        0 0 1
0 5 1        1 2 2
5 7 4        3 3 4
4 0 6        4 5 6
1 6 8        5 6 8
7 9 9        7 7 9
8 2          8 9

Now the sequence is almost completely sorted. When arranging it in one column in the last step, only a 6, an 8 and a 9 have to move a little bit to their correct positions.

The best known gap sequence, according to research by Marcin Ciura, is 1, 4, 10, 23, 57, 132, 301, 701, 1750. This study also concluded that "comparisons rather than moves should be considered the dominant operation in Shellsort." Another sequence that performs very well on large arrays is obtained by raising the Fibonacci numbers (leaving out one of the starting 1's) to the power of twice the golden ratio, which gives the following sequence: 1, 9, 34, 182, 836, 4025, 19001, 90358, 428481, 2034035, 9651787, 45806244, 217378076, 1031612713, ...

Algorithm Shellsort

void shellsort(int[] a, int n) {
    int i, j, k, h, v;
    // Predefined sequence of decreasing gaps; each pass h-sorts the array.
    int[] cols = {1391376, 463792, 198768, 86961, 33936, 13776, 4592,
                  1968, 861, 336, 112, 48, 21, 7, 3, 1};
    for (k = 0; k < 16; k++) {
        h = cols[k];
        // Insertion sort on elements that are h positions apart.
        for (i = h; i < n; i++) {
            v = a[i];
            j = i;
            while (j >= h && a[j - h] > v) {
                a[j] = a[j - h];    // shift the larger element h positions to the right
                j = j - h;
            }
            a[j] = v;
        }
    }
}

MERGE SORT
Merge sort is a comparison sort which is very effective on large lists, with a worst case complexity of O(n log n). It was invented by John von Neumann in 1945. Merge sort is an example of a divide and conquer algorithm. The algorithm followed by merge sort is as follows:

1. If the list is of length 0 or 1, then it is already sorted. Otherwise:
2. Divide the unsorted list into two sub-lists of about half the size.
3. Sort each sub-list recursively by re-applying merge sort.
4. Merge the two sub-lists back into one sorted list.

The two main ideas behind the algorithm are that sorting small lists takes fewer steps than sorting long lists, and that creating a sorted list from two sorted lists is easier than creating one from two unsorted lists. Merge sort is a stable sort, i.e. the order of equal inputs is preserved in the sorted list. As mentioned above, merge sort has an average and worst case performance of O(n log n) when sorting n objects. Its worst case is therefore of the same order as quick sort's best case, and in the worst case merge sort does about 39% fewer comparisons than quick sort does in the average case. The main disadvantage of merge sort is that recursive implementations incur a lot of method call overhead, costing both time and memory; however, it is not difficult to code an iterative, non-recursive merge sort that avoids this overhead. Merge sort also does not sort in place, so it requires extra memory to be allocated for the sorted output. Natural variants of merge sort have O(n) complexity when the input is already sorted, which amounts to running through the list and checking that it is presorted. Sorting in place is possible using linked lists but is quite complicated; in such cases, heap sort is often preferable. Merge sort is a very stable sort as long as the merge operation is implemented properly.

Consider a list of 3, 5, 4, 9, 2 to be sorted using merge sort. First the list is divided into smaller lists:

( 3 5 4 9 2 )
( 3 5 )   ( 4 9 2 )
( 3 ) ( 5 )   ( 4 ) ( 9 2 )
( 3 ) ( 5 )   ( 4 ) ( 9 ) ( 2 )

Now let us consider the comparisons that take place in the algorithm. According to the algorithm, if a list contains 0 or 1 elements it is already sorted, and it is merged with another sorted list to form a larger sorted list. Accordingly, 3 and 5 are merged as ( 3 5 ), and 9 and 2 are merged as ( 2 9 ). Now ( 4 ) and ( 2 9 ) are two sorted lists that are to be merged; after comparisons the new sorted list is ( 2 4 9 ). Finally, the two sorted lists ( 3 5 ) and ( 2 4 9 ) are merged. The comparisons done are shown below; in each step the front elements of the two lists are compared and the smaller one is appended to the output:

remaining ( 3 5 ) and ( 2 4 9 )   output ( 2 ),        since 2 < 3
remaining ( 3 5 ) and ( 4 9 )     output ( 2 3 ),      since 3 < 4
remaining ( 5 )   and ( 4 9 )     output ( 2 3 4 ),    since 4 < 5
remaining ( 5 )   and ( 9 )       output ( 2 3 4 5 ),  since 5 < 9
remaining ( 9 )                   output ( 2 3 4 5 9 )

Thus the merged list is ( 2 3 4 5 9 ) which is the required sorted output.

Various programming languages use either merge sort or a variant of the algorithm as their in-built method for sorting.

public int[] mergeSort(int array[]) {
    if (array.length > 1) {
        // Divide the list into two sub-lists of about half the size.
        int elementsInA1 = array.length / 2;
        int elementsInA2 = elementsInA1;
        if ((array.length % 2) == 1)
            elementsInA2 += 1;
        int arr1[] = new int[elementsInA1];
        int arr2[] = new int[elementsInA2];
        for (int i = 0; i < elementsInA1; i++)
            arr1[i] = array[i];
        for (int i = elementsInA1; i < elementsInA1 + elementsInA2; i++)
            arr2[i - elementsInA1] = array[i];

        // Sort each sub-list recursively.
        arr1 = mergeSort(arr1);
        arr2 = mergeSort(arr2);

        // Merge the two sorted sub-lists back into the original array.
        int i = 0, j = 0, k = 0;
        while (arr1.length != j && arr2.length != k) {
            if (arr1[j] < arr2[k]) {
                array[i] = arr1[j];
                i++;
                j++;
            } else {
                array[i] = arr2[k];
                i++;
                k++;
            }
        }
        // Copy over whatever remains in either sub-list.
        while (arr1.length != j) {
            array[i] = arr1[j];
            i++;
            j++;
        }
        while (arr2.length != k) {
            array[i] = arr2[k];
            i++;
            k++;
        }
    }
    return array;
}

HEAPSORT
Heapsort is a much more efficient version of selection sort. It works similarly to selection sort, by determining the largest (or smallest) element of the list, placing it at the end (or beginning) of the list, and then continuing with the rest of the list, but it accomplishes this task more efficiently with the use of a data structure called a heap, a special type of binary tree. It is guaranteed that the root element of the heap is always the largest element in a max-heap (or the smallest element in a min-heap). When the largest element is removed from the heap, there is no need to search for the next largest element, as the heap rearranges itself so that the next largest element becomes the root. For the heap data structure, finding the next largest element and moving it to the top takes only O(log n) time; therefore the whole heapsort algorithm takes just O(n log n) time. A heap is a specialized tree-based data structure in which, if B is a child node of A, then key(A) >= key(B) (for a max-heap). Therefore in a max-heap the root element is always the largest element. The various operations that can be done on a heap data structure include inserting a new element, deleting the element at the root, and so on. For elementary heapsort algorithms, the binary heap data structure is widely used. The operations that can be done and their algorithms are given in Appendix B. As heapsort has O(n log n) complexity, it is often compared with quick sort and merge sort. Quick sort has an O(n²) worst case, which can be triggered and is inefficient for large lists; however, quick sort usually works better on typical inputs because of cache behaviour and other factors. Since heapsort's running time is guaranteed, it is used in embedded systems and other settings where predictability and security are a great concern. Compared with merge sort, the main advantage heapsort has is that it requires only a constant amount of auxiliary storage space, in contrast to merge sort, which requires O(n) auxiliary space. Merge sort, in turn, has many advantages over heapsort, some being that merge sort is stable and can be easily adapted to linked lists and to lists on slow media such as disks.

Let us study an example which demonstrates the working of heapsort. For the list ( 11 9 34 25 17 109 53 ), building a max-heap places the largest element, 109, at the root; the sort then repeatedly swaps the root with the last element of the heap region and restores the heap property on the remaining elements.
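No heapsort code fragment is given in the report, so here is a minimal Java sketch of the idea, using an array-based binary max-heap; the helper name siftDown and the zero-based child indexing are assumptions of this sketch. Applied to the list above, the first loop brings 109 to the root, and each iteration of the second loop moves the current maximum to the end of the array.

void heapSort(int[] a) {
    int n = a.length;
    // Build a max-heap: the largest element ends up at index 0.
    for (int i = n / 2 - 1; i >= 0; i--)
        siftDown(a, i, n);
    // Repeatedly move the root (current maximum) to the end of the
    // unsorted region and restore the heap property on the remainder.
    for (int end = n - 1; end > 0; end--) {
        int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
        siftDown(a, 0, end);
    }
}

// Restore the max-heap property for the subtree rooted at i,
// considering only the first 'size' elements of the array.
void siftDown(int[] a, int i, int size) {
    while (2 * i + 1 < size) {
        int child = 2 * i + 1;                      // left child
        if (child + 1 < size && a[child + 1] > a[child])
            child++;                                // right child is larger
        if (a[i] >= a[child]) break;                // heap property already holds
        int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
        i = child;
    }
}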

An interesting alternative to Heapsort is introsort which combines quick sort and Heapsort, keeping the worst case property of Heapsort and average case property of Quicksort.

QUICK SORT
Quicksort is a divide and conquer algorithm which relies on a partition operation: to partition an array, we choose an element, called a pivot, move all smaller elements before the pivot, and move all greater elements after it. This can be done efficiently in linear time and in-place. We then recursively sort the lesser and greater sublists. Efficient implementations of quicksort (with in-place partitioning) are typically unstable sorts and somewhat complex, but they are among the fastest sorting algorithms in practice. Together with its modest O(log n) space usage, this makes quicksort one of the most popular sorting algorithms, available in many standard libraries. The most complex issue in quicksort is choosing a good pivot element; consistently poor choices of pivots can result in drastically slower O(n²) performance, but if at each step we choose the median as the pivot, the sort runs in O(n log n).
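The report does not include a quicksort fragment, so the following is a minimal Java sketch; the Lomuto-style partition and the choice of the last element as the pivot are assumptions of this sketch, not something prescribed by the text. It would be called as quickSort(arr, 0, arr.length - 1).

void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) return;                 // zero or one element: already sorted
    int pivot = a[hi];                    // choose the last element as the pivot
    int i = lo;                           // boundary of the "less than pivot" region
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            i++;
        }
    }
    int tmp = a[i]; a[i] = a[hi]; a[hi] = tmp;   // put the pivot in its final place
    quickSort(a, lo, i - 1);              // sort the elements before the pivot
    quickSort(a, i + 1, hi);              // sort the elements after the pivot
}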

BUCKET SORT
Bucket sort is a sorting algorithm that works by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying the bucket sorting algorithm. It is therefore most effective on data whose values are limited to a known range (e.g. sorting a million integers ranging from 1 to 1000). A variation of this method, called the single buffered count sort, is faster than quicksort and takes about the same time to run on any set of data.
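As an illustration of the idea (not code taken from the report), here is a minimal Java sketch of bucket sort for non-negative integers whose maximum value is known; the number of buckets, the scatter formula, and the use of Collections.sort inside each bucket are choices made for this sketch.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

void bucketSort(int[] a, int maxValue, int bucketCount) {
    // Create the empty buckets.
    List<List<Integer>> buckets = new ArrayList<>();
    for (int b = 0; b < bucketCount; b++)
        buckets.add(new ArrayList<>());
    // Scatter: place each value into the bucket covering its range.
    for (int v : a)
        buckets.get((int) ((long) v * bucketCount / (maxValue + 1))).add(v);
    // Sort each bucket individually, then gather the buckets back in order.
    int i = 0;
    for (List<Integer> bucket : buckets) {
        Collections.sort(bucket);          // any sorting algorithm may be used here
        for (int v : bucket)
            a[i++] = v;
    }
}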

RADIX SORT
Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n·k) time by treating them as bit strings. We first sort the list by the least significant bit while preserving the relative order of the elements using a stable sort. Then we sort them by the next bit, and so on from right to left, until the list ends up sorted. Most often, the counting sort algorithm is used to accomplish the bitwise sorting, since the number of values a bit can take is minimal: only '0' or '1'.
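To complement the description, here is a minimal Java sketch of an LSD radix sort over non-negative int keys. Processing a single bit per pass with a stable two-way counting step mirrors the bit-string description above, although practical implementations usually group several bits into one digit; the method name radixSort is our own.

void radixSort(int[] a) {
    int n = a.length;
    int[] output = new int[n];
    // Process the 31 value bits of a non-negative int, least significant first.
    for (int bit = 0; bit < 31; bit++) {
        // Stable counting step on the current bit: count how many keys have a 0 bit.
        int zeros = 0;
        for (int v : a)
            if (((v >> bit) & 1) == 0) zeros++;
        // Keys with a 0 bit come first, keys with a 1 bit after them,
        // each group keeping its original relative order (stability).
        int zeroPos = 0, onePos = zeros;
        for (int v : a) {
            if (((v >> bit) & 1) == 0) output[zeroPos++] = v;
            else output[onePos++] = v;
        }
        System.arraycopy(output, 0, a, 0, n);
    }
}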
BIG O NOTATION

In mathematics, computer science, and related fields, big O notation describes the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. Big O notation allows its users to simplify functions in order to concentrate on their growth rates: different functions with the same growth rate may be represented using the same O notation. Common orders of growth, from slowest- to fastest-growing, are listed below (Notation - Name - Example):

O(1) - constant - Determining if a number is even or odd; using a constant-size lookup table or hash table
O(α(n)) - inverse Ackermann - Amortized time per operation using a disjoint set
O(log* n) - iterated logarithmic - The find algorithm of Hopcroft and Ullman on a disjoint set
O(log log n) - log-logarithmic - Amortized time per operation using a bounded priority queue
O(log n) - logarithmic - Finding an item in a sorted array with a binary search
O((log n)^c), c > 1 - polylogarithmic - Deciding if n is prime with the AKS primality test
O(n^c), 0 < c < 1 - fractional power - Searching in a kd-tree
O(n) - linear - Finding an item in an unsorted list; adding two n-digit numbers
O(n log n) - linearithmic, loglinear, or quasilinear - Performing a fast Fourier transform; heapsort, quicksort (best case), or merge sort
O(n²) - quadratic - Multiplying two n-digit numbers by a simple algorithm; adding two n×n matrices; bubble sort (worst case or naive implementation), shell sort, quicksort (worst case), or insertion sort
O(n³) - cubic - Multiplying two n×n matrices by the simple algorithm; finding the shortest paths on a weighted digraph with the Floyd-Warshall algorithm; inverting a (dense) n×n matrix using LU or Cholesky decomposition
O(n^c), c > 1 - polynomial or algebraic - Tree-adjoining grammar parsing; maximum matching for bipartite graphs (grows faster than cubic if and only if c > 3)
L_n[α, c] - L-notation - Factoring a number using the special or general number field sieve
O(c^n), c > 1 - exponential or geometric - Finding the (exact) solution to the traveling salesman problem using dynamic programming; determining if two logical statements are equivalent using brute force
O(n!) - factorial or combinatorial - Solving the traveling salesman problem via brute-force search; finding the determinant with expansion by minors
O(2^(2^n)) - double exponential - Deciding the truth of a given statement in Presburger arithmetic

