Sorting and Searching Algorithms

This is a collection of algorithms for sorting and searching. Descriptions are brief and intuitive,
with just enough theory thrown in to make you nervous. I assume you know a high-level
language, such as C, and that you are familiar with programming concepts including arrays and
pointers.
The first section introduces basic data structures and notation. The next section presents several
sorting algorithms. This is followed by a section on dictionaries, structures that allow efficient
insert, search, and delete operations. The last section describes algorithms that sort data and
implement dictionaries for very large files. Source code for each algorithm, in ANSI C, is
included.
Most algorithms have also been coded in Visual Basic. If you are programming in Visual Basic, I
recommend you read Visual Basic Collections and Hash Tables for an explanation of hashing
and node representation. A search for Algorithms at amazon.com returned over 18,000 entries.
Here is one of the best.
Insertion Sort

One of the simplest methods to sort an array is an insertion sort. An example of an insertion sort
occurs in everyday life while playing cards. To sort the cards in your hand you extract a card,
shift the remaining cards, and then insert the extracted card in the correct place. This process is
repeated until all the cards are in the correct sequence. Both the average and worst-case running
times are O(n²). For further reading, consult Knuth [1998].

Theory
Starting near the top of the array in Figure 2-1(a), we extract the 3. Then the above elements are
shifted down until we find the correct place to insert the 3. This process repeats in Figure 2-1(b)
with the next number. Finally, in Figure 2-1(c), we complete the sort by inserting 2 in the correct
place.

Figure 2-1: Insertion Sort
Assuming there are n elements in the array, we must index through n - 1 entries. For each entry,
we may need to examine and shift up to n - 1 other entries, resulting in an O(n²) algorithm. The
insertion sort is an in-place sort. That is, we sort the array in-place. No extra memory is required.
The insertion sort is also a stable sort. Stable sorts retain the original ordering of keys when
identical keys are present in the input data.
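To see stability concretely, here is a small self-contained sketch (not part of the original listing; the record layout is illustrative) that insertion-sorts records on their key only. Because the inner loop shifts only on a strictly-greater comparison, the two records sharing key 5 keep their input order:

#include <stdio.h>

typedef struct { int key; char tag; } rec;

int main(void) {
    /* two records share key 5; a stable sort keeps 'a' before 'b' */
    rec a[4] = {{7,'x'}, {5,'a'}, {5,'b'}, {2,'y'}};
    int i, j;

    for (i = 1; i < 4; i++) {            /* insertion sort on key only */
        rec t = a[i];
        for (j = i - 1; j >= 0 && a[j].key > t.key; j--)
            a[j+1] = a[j];
        a[j+1] = t;
    }

    for (i = 0; i < 4; i++)              /* prints: 2y 5a 5b 7x */
        printf("%d%c ", a[i].key, a[i].tag);
    printf("\n");
    return 0;
}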

Implementation in C
An ANSI-C implementation for insertion sort is included. Typedef T and comparison operator
compGT should be altered to reflect the data stored in the table.
/* insert sort */

#include <stdio.h>
#include <stdlib.h>

typedef int T;          /* type of item to be sorted */
typedef int tblIndex;   /* type of subscript */

#define compGT(a,b) (a > b)
void insertSort(T *a, tblIndex lb, tblIndex ub) {
    T t;
    tblIndex i, j;

   /**************************
    *  sort array a[lb..ub]  *
    **************************/

    for (i = lb + 1; i <= ub; i++) {
        t = a[i];

        /* shift elements down until insertion point found */
        for (j = i-1; j >= lb && compGT(a[j], t); j--)
            a[j+1] = a[j];

        /* insert */
        a[j+1] = t;
    }
}

void fill(T *a, tblIndex lb, tblIndex ub) {
    tblIndex i;

    srand(1);
    for (i = lb; i <= ub; i++) a[i] = rand();
}

int main(int argc, char *argv[]) {
    tblIndex maxnum, lb, ub;
    T *a;

    /* command-line:
     *
     *   ins maxnum
     *
     *   ins 2000
     *       sorts 2000 records
     */

    maxnum = atoi(argv[1]);
    lb = 0; ub = maxnum - 1;

    if ((a = malloc(maxnum * sizeof(T))) == 0) {
        fprintf (stderr, "insufficient memory (a)\n");
        exit(1);
    }

    fill(a, lb, ub);
    insertSort(a, lb, ub);

    return 0;
}

Shell Sort

Shell sort, developed by Donald L. Shell, is a non-stable in-place sort. Shell sort improves on the
efficiency of insertion sort by quickly shifting values to their destination. Average sort time is
O(n^(7/6)), while worst-case time is O(n^(4/3)). For further reading, consult Knuth [1998].

Theory
In Figure 2-2(a) we have an example of sorting by insertion. First we extract 1, shift 3 and 5
down one slot, and then insert the 1, for a count of 2 shifts. In the next frame, two shifts are

required before we can insert the 2. The process continues until the last frame, where a total of 2
+ 2 + 1 = 5 shifts have been made.
In Figure 2-2(b) an example of shell sort is illustrated. We begin by doing an insertion sort using
a spacing of two. In the first frame we examine numbers 3-1. Extracting 1, we shift 3 down one
slot for a shift count of 1. Next we examine numbers 5-2. We extract 2, shift 5 down, and then
insert 2. After sorting with a spacing of two, a final pass is made with a spacing of one. This is
simply the traditional insertion sort. The total shift count using shell sort is 1+1+1 = 3. By using
an initial spacing larger than one, we were able to quickly shift values to their proper destination.

Figure 2-2: Shell Sort
Various spacings may be used to implement a shell sort. Typically the array is sorted with a large
spacing, the spacing reduced, and the array sorted again. On the final sort, spacing is one.
Although the shell sort is easy to comprehend, formal analysis is difficult. In particular, optimal
spacing values elude theoreticians. Knuth recommends a technique, due to Sedgewick, that
determines spacing h based on the following formulas:

    h(s) = 9·2^s - 9·2^(s/2) + 1,        if s is even
    h(s) = 8·2^s - 6·2^((s+1)/2) + 1,    if s is odd

These calculations result in values (h(0), h(1), h(2), ...) = (1, 5, 19, 41, 109, 209, ...). Calculate h(t)
until 3h(t) >= N, the number of elements in the array. Then choose h(t-1) for a starting value. For
example, to sort 150 items, h(t) = 109 (3·109 >= 150), so the first spacing is h(t-1), or 41. The
second spacing is 19, then 5, and finally 1.
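As a quick check of this rule, a short helper can generate h from the formulas and pick the starting spacing. This is a sketch, not part of the sort that follows; the names h and firstSpacing are illustrative, and the shifts assume the table sizes used here fit in an unsigned int:

#include <stdio.h>

/* h(s) from Sedgewick's formulas, computed with shifts */
static unsigned int h(int s) {
    if (s % 2 == 0)
        return 9u * (1u << s) - 9u * (1u << (s / 2)) + 1;
    else
        return 8u * (1u << s) - 6u * (1u << ((s + 1) / 2)) + 1;
}

/* spacing h(t-1), where t is the first index with 3*h(t) >= n */
static unsigned int firstSpacing(unsigned int n) {
    int t = 0;
    while (3 * h(t) < n) t++;
    return t > 0 ? h(t - 1) : h(0);
}

int main(void) {
    printf("%u\n", firstSpacing(150));  /* prints 41, as in the example */
    return 0;
}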

Implementation in C
An ANSI-C implementation for shell sort is included. Typedef T and comparison operator
compGT should be altered to reflect the data stored in the array. The central portion of the
algorithm is an insertion sort with a spacing of h.

/* shell sort */

#include <stdio.h>
#include <stdlib.h>

typedef int T;          /* type of item to be sorted */
typedef int tblIndex;   /* type of subscript */

#define compGT(a,b) (a > b)

void shellSort(T *a, tblIndex lb, tblIndex ub) {
    tblIndex i, j;
    unsigned int n;
    int t;

    static const unsigned int h[] = {
        1, 5, 19, 41, 109, 209, 505, 929, 2161, 3905,
        8929, 16001, 36289, 64769, 146305, 260609,
        587521, 1045505, 2354689, 4188161, 9427969,
        16764929, 37730305, 67084289, 150958081,
        268386305, 603906049, 1073643521, 2415771649U
    };

    /* find t such that 3*h[t] >= n */
    n = ub - lb + 1;
    for (t = 0; 3*h[t] < n; t++);

    /* start with h[t-1] */
    if (t > 0) t--;

    while (t >= 0) {
        unsigned int ht;

        /* sort-by-insertion in increments of h */
        ht = h[t--];
        for (i = lb + ht; i <= ub; i++) {
            T tmp = a[i];
            for (j = i-ht; j >= lb && compGT(a[j], tmp); j -= ht)
                a[j+ht] = a[j];
            a[j+ht] = tmp;
        }
    }
}

void fill(T *a, tblIndex lb, tblIndex ub) {
    tblIndex i;

    srand(1);
    for (i = lb; i <= ub; i++) a[i] = rand();
}

int main(int argc, char *argv[]) {
    tblIndex maxnum, lb, ub;
    T *a;

    /* command-line:
     *
     *   shl maxnum
     *
     *   shl 2000
     *       sorts 2000 records
     */

    maxnum = atoi(argv[1]);
    lb = 0; ub = maxnum - 1;

    if ((a = malloc(maxnum * sizeof(T))) == 0) {
        fprintf (stderr, "insufficient memory (a)\n");
        exit(1);
    }

    fill(a, lb, ub);
    shellSort(a, lb, ub);

    return 0;
}

Quicksort

Although the shell sort algorithm is significantly better than insertion sort, there is still room for
improvement. One of the most popular sorting algorithms is quicksort. Quicksort executes in
O(n lg n) on average, and O(n²) in the worst case. However, with proper precautions, worst-case
behavior is very unlikely. Quicksort is a non-stable sort. It is not an in-place sort, as stack space
is required. For further reading, consult Cormen [2009].

Theory

The quicksort algorithm works by partitioning the array to be sorted, then recursively sorting
each partition. In Partition (Figure 2-3), one of the array elements is selected as a pivot value.
Values smaller than the pivot value are placed to the left of the pivot, while larger values are
placed to the right.

int function Partition (Array A, int Lb, int Ub);
begin
  select a pivot from A[Lb]...A[Ub];
  reorder A[Lb]...A[Ub] such that:
    all values to the left of the pivot are <= pivot
    all values to the right of the pivot are >= pivot
  return pivot position;
end;

procedure QuickSort (Array A, int Lb, int Ub);
begin
  if Lb < Ub then
    M = Partition (A, Lb, Ub);
    QuickSort (A, Lb, M - 1);
    QuickSort (A, M + 1, Ub);
end;

Figure 2-3: Quicksort Algorithm

In Figure 2-4(a), the pivot selected is 3. Indices are run starting at both ends of the array. One
index starts on the left and selects an element that is larger than the pivot, while another index
starts on the right and selects an element that is smaller than the pivot. In this case, numbers 4
and 1 are selected. These elements are then exchanged, as is shown in Figure 2-4(b). This process
repeats until all elements to the left of the pivot are <= the pivot, and all elements to the right of
the pivot are >= the pivot. QuickSort recursively sorts the two subarrays, resulting in the array
shown in Figure 2-4(c). As the process proceeds, it may be necessary to move the pivot so that
correct ordering is maintained. In this manner, QuickSort succeeds in sorting the array.

Figure 2-4: Quicksort Example

To find a pivot value, Partition could simply select the first element (A[Lb]). All other values
would be compared to the pivot value, and placed either to the left or right of the pivot as
appropriate. If we're lucky, the pivot selected will be the median of all values, equally dividing
the array. For a moment, let's assume that this is the case. Since the array is split in half at each
step, and Partition must eventually examine all n elements, the run time is O(n lg n).

However, there is one case that fails miserably. Suppose the array was originally in order.
Partition would always select the lowest value as a pivot and split the array with one element in
the left partition, and Ub - Lb elements in the other. Each recursive call to quicksort would only
diminish the size of the array to be sorted by one. Therefore n recursive calls would be required
to do the sort, resulting in an O(n²) run time. One solution to this problem is to randomly select
an item as a pivot. This would make it extremely unlikely that worst-case behavior would occur.
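A minimal sketch of that remedy (illustrative only; the implementation below uses the center element instead): swap a randomly chosen element into the middle slot before partitioning, so the usual center-element pivot selection becomes a random selection.

#include <stdlib.h>

/* hypothetical helper: pick a random index in [lb..ub] and swap that
 * element into the middle slot before calling partition */
static void randomizePivot(int *x, int lb, int ub) {
    int k = lb + rand() % (ub - lb + 1);
    int mid = (lb + ub) / 2;
    int t = x[k]; x[k] = x[mid]; x[mid] = t;
}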

Implementation in C

An ANSI-C implementation of quicksort is included. Typedef T and comparison operator
compGT should be altered to reflect the data stored in the array. Two versions of quicksort are
included: quickSort, and quickSortImproved. Enhancements include:

 - The center element is selected as a pivot in partition. If the list is partially ordered, this will
   be a good choice. Worst-case behavior occurs when the center element happens to be the
   largest or smallest element each time partition is invoked.
 - For short arrays, insertSort is called. Due to recursion and other overhead, quicksort is not an
   efficient algorithm to use on small arrays. Consequently, any array with fewer than 50
   elements is sorted using an insertion sort. Cutoff values of 12-200 are appropriate.
 - Tail recursion occurs when the last statement in a function is a call to the function itself. Tail
   recursion may be replaced by iteration, resulting in a better utilization of stack space.
 - After an array is partitioned, the smallest partition is sorted first. This results in a better
   utilization of stack space, as short partitions are quickly sorted and dispensed with.

Also included is an ANSI-C implementation of qsort, a standard C library function usually
implemented with quicksort. Recursive calls were replaced by explicit stack operations. Included
is a version of quicksort that sorts linked lists. Table 2-1 shows timing statistics and stack
utilization before and after the enhancements were applied.

/* quicksort */

#include <stdio.h>
#include <stdlib.h>

typedef int T;          /* type of item to be sorted */
typedef int tblIndex;   /* type of subscript */

#define compGT(a,b) (a > b)
#define compLT(a,b) (a < b)

void sortByInsertion(T *x, tblIndex lb, tblIndex ub) {
    tblIndex i, j;

    for (i = lb + 1; i <= ub; i++) {
        T t = x[i];

        /* shift down until insertion point found */
        for (j = i-1; j >= 0 && compGT(x[j], t); j--)
            x[j+1] = x[j];

        /* insert */
        x[j+1] = t;
    }
}

tblIndex partition(T *x, tblIndex lb, tblIndex ub) {

    /* select a pivot */
    T pivot = x[(lb+ub)/2];

    /* work from both ends, swapping to keep   */
    /* values less than pivot to the left, and */
    /* values greater than pivot to the right  */
    tblIndex i = lb - 1;
    tblIndex j = ub + 1;
    while (1) {
        T t;

        while (compGT(x[--j], pivot));
        while (compLT(x[++i], pivot));
        if (i >= j) break;

        /* swap x[i], x[j] */
        t = x[i]; x[i] = x[j]; x[j] = t;
    }
    return j;
}

void quickSort(T *x, tblIndex lb, tblIndex ub) {
    tblIndex m;

    if (lb >= ub) return;

    m = partition(x, lb, ub);
    quickSort(x, lb, m);
    quickSort(x, m + 1, ub);
}

void quickSortImproved(T *x, tblIndex lb, tblIndex ub) {
    while (lb < ub) {
        tblIndex m;

        /* quickly sort short lists */
        if (ub - lb <= 50) {
            sortByInsertion(x, lb, ub);
            return;
        }

        m = partition(x, lb, ub);

        /* eliminate tail recursion and      */
        /* sort the smallest partition first */
        /* to minimize stack requirements    */
        if (m - lb <= ub - m) {
            quickSortImproved(x, lb, m);
            lb = m + 1;
        } else {
            quickSortImproved(x, m + 1, ub);
            ub = m;
        }
    }
}

void fill(T *a, tblIndex lb, tblIndex ub) {
    tblIndex i;

    srand(1);
    for (i = lb; i <= ub; i++) a[i] = rand();
}

int main(int argc, char *argv[]) {
    tblIndex maxnum, lb, ub;
    T *a;

    /* command-line:
     *
     *   qui maxnum
     *
     *   qui 2000
     *       sorts 2000 records
     */

    maxnum = atoi(argv[1]);
    lb = 0; ub = maxnum - 1;

    if ((a = malloc(maxnum * sizeof(T))) == 0) {
        fprintf (stderr, "insufficient memory (a)\n");
        exit(1);
    }

    fill(a, lb, ub);
    quickSortImproved(a, lb, ub);

    return 0;
}

External Sort

One method for sorting a file is to load the file into memory, sort the data in memory, then write
the results. When the file cannot be loaded into memory due to resource limitations, an external
sort is applicable. We will implement an external sort using replacement selection to establish
initial runs, followed by a polyphase merge sort to merge the runs into one sorted file. I highly
recommend you consult Knuth [1998], as many details have been omitted.

Theory

For clarity, I'll assume that data is on one or more reels of magnetic tape. Figure 4-1 illustrates a
3-way polyphase merge. Initially, in phase A, all data is on tapes T1 and T2. Assume that the
beginning of each tape is at the bottom of the frame. There are two sequential runs of data on
T1: 4-8, and 6-7. Tape T2 has one run: 5-9. At phase B, we've merged the first run from tapes T1
(4-8) and T2 (5-9) into a longer run on tape T3 (4-5-8-9). Phase C simply renames the tapes, so
we may repeat the merge again. In phase D we repeat the merge, with the final output on tape T3.

Phase   T1          T2          T3
A       4-8, 6-7    5-9
B       6-7                     4-5-8-9
C       6-7         4-5-8-9
D                               4-5-6-7-8-9

Figure 4-1: Merge Sort

Several interesting details have been omitted from the previous illustration. For example, how
were the initial runs created? And did you notice that they merged perfectly, with no extra runs
on any tapes? Before I explain the method used for constructing initial runs, let me digress for a
bit.

In 1202, Leonardo Fibonacci presented the following exercise in his Liber Abbaci (Book of the
Abacus): "How many pairs of rabbits can be produced from a single pair in a year's time?"

We may assume that each pair produces a new pair of offspring every month, that each pair
becomes fertile at the age of one month, and that rabbits never die. After one month, there will
be 2 pairs of rabbits; after two months there will be 3; the following month the original pair and
the pair born during the first month will both usher in a new pair, and there will be 5 in all; and
so on. This series, where each number is the sum of the two preceding numbers, is known as the
Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

Curiously, the Fibonacci series has found wide-spread application to everything from the
arrangement of flowers on plants to studying the efficiency of Euclid's algorithm. There's even a
Fibonacci Quarterly journal. And, as you might suspect, the Fibonacci series has something to do
with establishing initial runs for external sorts.

Recall that we initially had one run on tape T2, and 2 runs on tape T1. Note that the numbers
{1,2} are two sequential numbers in the Fibonacci series. After our first merge, we had one run
on T1 and one run on T2. Note that the numbers {1,1} are two sequential numbers in the
Fibonacci series, only one notch down. We could predict, in fact, that if we had 13 runs on T2,
and 21 runs on T1 {13,21}, we would be left with 8 runs on T1 and 13 runs on T3 {8,13} after
one pass. Successive passes would result in run counts of {5,8}, {3,5}, {2,3}, {1,1}, and {0,1},
for a total of 7 passes. This arrangement is ideal, and will result in the minimum number of
passes. Should data actually be on tape, this is a big savings, as tapes must be mounted and
rewound for each pass. For more than 2 tapes, higher-order Fibonacci numbers are used.

Initially, all the data is on one tape. The tape is read, and runs are distributed to other tapes in the
system. After the initial runs are created, they are merged as described above. One method we
could use to create initial runs is to read a batch of records into memory, sort the records, and
write them out. This process would continue until we had exhausted the input tape. An
alternative algorithm, replacement selection, allows for longer runs. A buffer is allocated in
memory to act as a holding place for several records. Initially, the buffer is filled. Then, the
following steps are repeated until the input is exhausted:

 - Select the record with the smallest key that is >= the key of the last record written.
 - If all keys are smaller than the key of the last record written, then we have reached the end of
   a run. Select the record with the smallest key for the first record of the next run.
 - Write the selected record.
 - Replace the selected record with a new record from input.

Figure 4-2 illustrates replacement selection for a small file. To keep things simple, I've allocated
a 2-record buffer. Typically, such a buffer would hold thousands of records. We load the buffer
in step B, and write the record with the smallest key (6) in step C. This is replaced with the next
record (key 8). We select the smallest key >= 6 in step D. This is key 7. After writing key 7, we
replace it with key 4. This process repeats until step F, where our last key written was 8, and all
keys are less than 8. At this point, we terminate the run, and start another.

Step   Input          Buffer   Output
A      5-3-4-8-6-7
B      5-3-4-8        6-7
C      5-3-4          8-7      6
D      5-3            8-4      6-7
E      5              3-4      6-7-8
F                     5-4      6-7-8 | 3
G                     5        6-7-8 | 3-4
H                              6-7-8 | 3-4-5

Figure 4-2: Replacement Selection

This strategy simply utilizes an intermediate buffer to hold values until the appropriate time for
output. Using random numbers as input, the average length of a run is twice the length of the
buffer. However, if the data is somewhat ordered, runs can be extremely long. Thus, this method
is more effective than doing partial sorts.

When selecting the next output record, we need to find the smallest key >= the last key written.
One way to do this is to scan the entire list, searching for the appropriate key. However, when
the buffer holds thousands of records, execution time becomes prohibitive. An alternative
method is to use a binary tree structure, so that we only compare lg n items.
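To make the buffer mechanics concrete, here is a minimal linear-scan sketch of one selection step (selectNext is a hypothetical name, not part of the implementation below, which uses a selection tree so that each step costs about lg n comparisons instead of n):

#include <stdio.h>

/* Return the index of the buffer record to write next: the smallest
 * key >= lastKey if one exists (continue the run), otherwise the
 * smallest key overall (start a new run, reported via *endOfRun). */
static int selectNext(const int *buf, int n, int lastKey, int *endOfRun) {
    int best = -1, bestAny = -1, i;

    for (i = 0; i < n; i++) {
        if (buf[i] >= lastKey && (best < 0 || buf[i] < buf[best]))
            best = i;                 /* candidate to extend the run */
        if (bestAny < 0 || buf[i] < buf[bestAny])
            bestAny = i;              /* candidate for a new run */
    }
    *endOfRun = (best < 0);
    return best >= 0 ? best : bestAny;
}

int main(void) {
    int buf[2] = {3, 4};              /* the buffer from step F above */
    int eor;
    int k = selectNext(buf, 2, 8, &eor);
    printf("write %d, new run: %s\n", buf[k], eor ? "yes" : "no");
    return 0;                         /* prints: write 3, new run: yes */
}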

Implementation in C

An ANSI-C implementation of an external sort is included. Function makeRuns calls readRec to
read the next record. Function readRec employs the replacement selection algorithm (utilizing a
binary tree) to fetch the next record, and makeRuns distributes the records in a Fibonacci
distribution. If the number of runs is not a perfect Fibonacci number, dummy runs are simulated
at the beginning of each file. Function mergeSort is then called to do a polyphase merge sort on
the runs.

/* external sort */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/****************************
 * implementation dependent *
 ****************************/

/* template for workfiles (8.3 format) */
#define FNAME "_sort%03d.dat"
#define LNAME 13

/* comparison operators */
#define compLT(x,y) (x < y)
#define compGT(x,y) (x > y)

/* define the record to be sorted here */
#define LRECL 100
typedef int keyType;
typedef struct recTypeTag {
    keyType key;                        /* sort key for record */
#if LRECL
    char data[LRECL-sizeof(keyType)];   /* other fields */
#endif
} recType;

/******************************
 * implementation independent *
 ******************************/

typedef enum {false, true} bool;

typedef struct tmpFileTag {
    FILE *fp;                   /* file pointer */
    char name[LNAME];           /* filename */
    recType rec;                /* last record read */
    int dummy;                  /* number of dummy runs */
    bool eof;                   /* end-of-file flag */
    bool eor;                   /* end-of-run flag */
    bool valid;                 /* true if rec is valid */
    int fib;                    /* ideal fibonacci number */
} tmpFileType;

static tmpFileType **file;      /* array of file info for tmp files */
static int nTmpFiles;           /* number of tmp files */
static char *ifName;            /* input filename */
static char *ofName;            /* output filename */

static int level;               /* level of runs */
static int nNodes;              /* number of nodes for selection tree */

void deleteTmpFiles(void) {
    int i;

    /* delete merge files and free resources */
    if (file) {
        for (i = 0; i < nTmpFiles; i++) {
            if (file[i]) {
                if (file[i]->fp) fclose(file[i]->fp);
                if (*file[i]->name) remove(file[i]->name);
                free (file[i]);
            }
        }
        free (file);
    }
}

void termTmpFiles(int rc) {

    /* cleanup files */
    remove(ofName);
    if (rc == 0) {
        int fileT;

        /* file[T] contains results */
        fileT = nTmpFiles - 1;
        fclose(file[fileT]->fp);
        file[fileT]->fp = NULL;
        if (rename(file[fileT]->name, ofName)) {
            perror("io1");
            deleteTmpFiles();
            exit(1);
        }
        *file[fileT]->name = 0;
    }
    deleteTmpFiles();
}

void cleanExit(int rc) {

    /* cleanup tmp files and exit */
    termTmpFiles(rc);
    exit(rc);
}

void *safeMalloc(size_t size) {
    void *p;

    /* safely allocate memory and initialize to zero */
    if ((p = calloc(1, size)) == NULL) {
        printf("error: malloc failed, size = %d\n", (int)size);
        cleanExit(1);
    }
    return p;
}

void initTmpFiles(void) {
    int i;
    tmpFileType *fileInfo;

    /* initialize merge files */
    if (nTmpFiles < 3) nTmpFiles = 3;
    file = safeMalloc(nTmpFiles * sizeof(tmpFileType*));
    fileInfo = safeMalloc(nTmpFiles * sizeof(tmpFileType));
    for (i = 0; i < nTmpFiles; i++) {
        file[i] = fileInfo + i;
        sprintf(file[i]->name, FNAME, i);
        if ((file[i]->fp = fopen(file[i]->name, "w+b")) == NULL) {
            perror("io2");
            cleanExit(1);
        }
    }
}

recType *readRec(void) {

    typedef struct iNodeTag {       /* internal node */
        struct iNodeTag *parent;    /* parent of internal node */
        struct eNodeTag *loser;     /* external loser */
    } iNodeType;

    typedef struct eNodeTag {       /* external node */
        struct iNodeTag *parent;    /* parent of external node */
        recType rec;                /* input record */
        int run;                    /* run number */
        bool valid;                 /* input record is valid */
    } eNodeType;

    typedef struct nodeTag {
        iNodeType i;                /* internal node */
        eNodeType e;                /* external node */
    } nodeType;

    static nodeType *node;          /* array of selection tree nodes */
    static eNodeType *win;          /* new winner */
    static FILE *ifp;               /* input file */
    static bool eof;                /* true if end-of-file, input */
    static int maxRun;              /* maximum run number */
    static int curRun;              /* current run number */
    static bool lastKeyValid;       /* true if lastKey is valid */
    static keyType lastKey;         /* last key written */
    iNodeType *p;                   /* pointer to internal nodes */

    /* read next record using replacement selection */

    /* check for first call */
    if (node == NULL) {
        int i;

        if (nNodes < 2) nNodes = 2;
        node = safeMalloc(nNodes * sizeof(nodeType));
        for (i = 0; i < nNodes; i++) {
            node[i].i.loser = &node[i].e;
            node[i].i.parent = &node[i/2].i;
            node[i].e.parent = &node[(nNodes + i)/2].i;
            node[i].e.run = 0;
            node[i].e.valid = false;
        }
        win = &node[0].e;
        lastKeyValid = false;

        if ((ifp = fopen(ifName, "rb")) == NULL) {
            printf("error: file %s, unable to open\n", ifName);
            cleanExit(1);
        }
    }

    while (1) {

        /* replace previous winner with new record */
        if (!eof) {
            if (fread(&win->rec, sizeof(recType), 1, ifp) == 1) {
                if ((!lastKeyValid || compLT(win->rec.key, lastKey))
                && (++win->run > maxRun))
                    maxRun = win->run;
                win->valid = true;
            } else if (feof(ifp)) {
                fclose(ifp);
                eof = true;
                win->valid = false;
                win->run = maxRun + 1;
            } else {
                perror("io4");
                cleanExit(1);
            }
        } else {
            win->valid = false;
            win->run = maxRun + 1;
        }

        /* adjust loser and winner pointers */
        p = win->parent;
        do {
            bool swap;

            swap = false;
            if (p->loser->run < win->run) {
                swap = true;
            } else if (p->loser->run == win->run) {
                if (p->loser->valid && win->valid) {
                    if (compLT(p->loser->rec.key, win->rec.key))
                        swap = true;
                } else {
                    swap = true;
                }
            }
            if (swap) {
                /* p should be winner */
                eNodeType *t;

                t = p->loser;
                p->loser = win;
                win = t;
            }
            p = p->parent;
        } while (p != &node[0].i);

        /* end of run? */
        if (win->run != curRun) {

            /* win->run = curRun + 1 */
            if (win->run > maxRun) {

                /* end of output */
                free(node);
                return NULL;
            }
            curRun = win->run;
        }

        /* output top of tree */
        if (win->run) {
            lastKey = win->rec.key;
            lastKeyValid = true;
            return &win->rec;
        }
    }
}

void makeRuns(void) {
    recType *win;       /* winner */
    int fileT;          /* last file */
    int fileP;          /* next to last file */
    int j;              /* selects file[j] */

    /* Make initial runs using replacement selection.
     * Runs are written using a Fibonacci distribution.
     */

    /* initialize file structures */
    fileT = nTmpFiles - 1;
    fileP = fileT - 1;
    for (j = 0; j < fileT; j++) {
        file[j]->fib = 1;
        file[j]->dummy = 1;
    }
    file[fileT]->fib = 0;
    file[fileT]->dummy = 0;

    level = 1;

    win = readRec();
    while (win) {
        bool anyrun;

        anyrun = false;
        for (j = 0; win && j <= fileP; j++) {
            bool run;

            run = false;
            if (file[j]->valid) {
                if (!compLT(win->key, file[j]->rec.key)) {
                    /* append to an existing run */
                    run = true;
                } else if (file[j]->dummy) {
                    /* start a new run */
                    file[j]->dummy--;
                    run = true;
                }
            } else {
                /* first run in file */
                file[j]->dummy--;
                run = true;
            }

            if (run) {
                anyrun = true;

                /* flush run */
                while (1) {
                    if (fwrite(win, sizeof(recType), 1, file[j]->fp) != 1) {
                        perror("io3");
                        cleanExit(1);
                    }
                    file[j]->rec.key = win->key;
                    file[j]->valid = true;
                    if ((win = readRec()) == NULL) break;
                    if (compLT(win->key, file[j]->rec.key)) break;
                }
            }
        }

        /* if no room for runs, up a level */
        if (!anyrun) {
            int t;

            level++;
            t = file[0]->fib;
            for (j = 0; j <= fileP; j++) {
                file[j]->dummy = t + file[j+1]->fib - file[j]->fib;
                file[j]->fib = t + file[j+1]->fib;
            }
        }
    }
}

void rewindFile(int j) {

    /* rewind file[j] and read in first record */
    file[j]->eor = false;
    file[j]->eof = false;
    rewind(file[j]->fp);
    if (fread(&file[j]->rec, sizeof(recType), 1, file[j]->fp) != 1) {
        if (feof(file[j]->fp)) {
            file[j]->eor = true;
            file[j]->eof = true;
        } else {
            perror("io5");
            cleanExit(1);
        }
    }
}

void mergeSort(void) {
    int fileT;
    int fileP;
    int j;
    tmpFileType *tfile;

    /* polyphase merge sort */

    fileT = nTmpFiles - 1;
    fileP = fileT - 1;

    /* prime the files */
    for (j = 0; j < fileT; j++) {
        rewindFile(j);
    }

    /* each pass through loop merges one run */
    while (level) {
        while (1) {
            bool allDummies;
            bool anyRuns;

            /* scan for runs */
            allDummies = true;
            anyRuns = false;
            for (j = 0; j <= fileP; j++) {
                if (!file[j]->dummy) {
                    allDummies = false;
                    if (!file[j]->eof) anyRuns = true;
                }
            }

            if (anyRuns) {
                int k;
                keyType lastKey;

                /* merge 1 run file[0]..file[P] --> file[T] */

                while (1) {

                    /* each pass thru loop writes 1 record to file[fileT] */

                    /* find smallest key */
                    k = -1;
                    for (j = 0; j <= fileP; j++) {
                        if (file[j]->eor) continue;
                        if (file[j]->dummy) continue;
                        if (k < 0 ||
                           (k != j && compGT(file[k]->rec.key, file[j]->rec.key)))
                            k = j;
                    }
                    if (k < 0) break;

                    /* write record[k] to file[fileT] */
                    if (fwrite(&file[k]->rec, sizeof(recType), 1,
                            file[fileT]->fp) != 1) {
                        perror("io6");
                        cleanExit(1);
                    }

                    /* replace record[k] */
                    lastKey = file[k]->rec.key;
                    if (fread(&file[k]->rec, sizeof(recType), 1,
                            file[k]->fp) == 1) {
                        /* check for end of run on file[k] */
                        if (compLT(file[k]->rec.key, lastKey))
                            file[k]->eor = true;
                    } else if (feof(file[k]->fp)) {
                        file[k]->eof = true;
                        file[k]->eor = true;
                    } else {
                        perror("io7");
                        cleanExit(1);
                    }
                }

                /* fixup dummies */
                for (j = 0; j <= fileP; j++) {
                    if (file[j]->dummy) file[j]->dummy--;
                    if (!file[j]->eof) file[j]->eor = false;
                }

            } else if (allDummies) {
                for (j = 0; j <= fileP; j++)
                    file[j]->dummy--;
                file[fileT]->dummy++;
            }

            /* end of run */
            if (file[fileP]->eof && !file[fileP]->dummy) {

                /* completed a fibonacci-level */
                level--;
                if (!level) {

                    /* we're done, file[fileT] contains data */
                    return;
                }

                /* fileP is exhausted, reopen as new */
                fclose(file[fileP]->fp);
                if ((file[fileP]->fp = fopen(file[fileP]->name, "w+b"))
                        == NULL) {
                    perror("io8");
                    cleanExit(1);
                }
                file[fileP]->eof = false;
                file[fileP]->eor = false;

                rewindFile(fileT);

                /* f[0],f[1]...,f[fileT] <-- f[fileT],f[0]...,f[T-1] */
                tfile = file[fileT];
                memmove(file + 1, file, fileT * sizeof(tmpFileType*));
                file[0] = tfile;

                /* start new runs */
                for (j = 0; j <= fileP; j++)
                    if (!file[j]->eof) file[j]->eor = false;
            }
        }
    }
}

void extSort(void) {
    initTmpFiles();
    makeRuns();
    mergeSort();
    termTmpFiles(0);
}

int main(int argc, char *argv[]) {

    /* command-line:
     *
     *   ext ifName ofName nTmpFiles nNodes
     *
     *   ext in.dat out.dat 5 2000
     *       reads in.dat, sorts using 5 files and 2000 nodes,
     *       output to out.dat
     */

    if (argc != 5) {
        printf("%s ifName ofName nTmpFiles nNodes\n", argv[0]);
        exit(1);
    }

    ifName = argv[1];
    ofName = argv[2];
    nTmpFiles = atoi(argv[3]);
    nNodes = atoi(argv[4]);

    printf("extSort: nFiles=%d, nNodes=%d, lrecl=%d\n",
        nTmpFiles, nNodes, (int)sizeof(recType));

    extSort();

    return 0;
}

. this is true only if the tree is balanced. The second comparison finds that 16 > 7. Either child. to search for 16. with search time increasing proportional to the number of elements stored. since data was stored in an array. In this case the search time is O(n). In this respect. 16. This method is very effective. However. For this example the tree height is 2. may be missing. Binary Search Trees In the introduction we used the binary search algorithm to find data stored in an array. While it is a binary search tree. and all children to the right of the node have values larger than k. we first note that 16 < 20 and we traverse to the left child. the algorithm is similar to a binary search on an array. The height of a tree is the length of the longest path from root to leaf. so we traverse to the right child. Assuming krepresents the value of a given node. Figure 3-2 illustrates a binary search tree. its behavior is more like that of a linked list. Theory A binary search tree is a tree where each node has a left and right child. Figure 3-3 shows another tree containing the same values. and 43. then a binary search tree also has the following property: all children to the left of the node have values smaller than k. we succeed.} return 0. Binary search trees store data in nodes that are linked in a tree-like fashion. The top of a tree is known as the root. the root is node 20 and the leaves are nodes 4. as each iteration reduced the number of items to search by one-half. For randomly inserted data. Each comparison results in reducing the number of items to inspect by one-half. we start at the root and work down. For example. or both children. In Figure 3-2. search time is O(lgn). However. Worst-case behavior occurs when ordered data is inserted. For example. On the third comparison. See Cormen [2001] for a more detailed description. 37. insertions and deletions were not efficient. and the exposed nodes at the bottom are known as leaves. Figure 3-2: A Binary Search Tree To search a tree for a given value.

Figure 3-3: An Unbalanced Binary Search Tree

Insertion and Deletion

Let us examine insertions in a binary search tree to determine the conditions that can cause an
unbalanced tree. To insert an 18 in the tree in Figure 3-2, we first search for that number. This
causes us to arrive at node 16 with nowhere to go. Since 18 > 16, we simply add node 18 to the
right child of node 16 (Figure 3-4). Now we can see how an unbalanced tree can occur. If the
data is presented in an ascending sequence, each node will be added to the right of the previous
node. This will create one long chain, or linked list. However, if data is presented for insertion in
a random order, then a more balanced tree is possible.

Figure 3-4: Binary Tree After Adding Node 18

Deletions are similar, but require that the binary search tree property be maintained. For
example, if node 20 in Figure 3-4 is removed, it must be replaced by node 18 or node 37.
Assuming we replace a node by its successor, the resulting tree is shown in Figure 3-5. The
rationale for this choice is as follows. The successor for node 20 must be chosen such that all
nodes to the right are larger. Therefore we need to select the smallest valued node to the right of
node 20. To make the selection, chain once to the right (node 38), and then chain to the left until
the last node is found (node 37). This is the successor for node 20.
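That successor-selection rule translates directly into code. A minimal sketch (the helper name successor is illustrative; it uses the nodeType declared in the implementation below and assumes the node has a right child):

/* smallest-valued node to the right of z: chain once to the
 * right, then left as far as possible (assumes z->right != NULL) */
nodeType *successor(nodeType *z) {
    nodeType *y = z->right;
    while (y->left != NULL)
        y = y->left;
    return y;
}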

Figure 3-5: Binary Tree After Deleting Node 20

Implementation in C

An ANSI-C implementation for a binary search tree is included. Typedefs recType, keyType,
and comparison operators compLT and compEQ should be altered to reflect the data stored in
the tree. Each node consists of left, right, and parent pointers designating each child and the
parent. The tree is based at root, and is initially NULL. Function insert allocates a new node and
inserts it in the tree. Function delete deletes and frees a node from the tree. Function find
searches the tree for a particular value.

/* binary search tree */

#include <stdio.h>
#include <stdlib.h>

/* implementation dependent declarations */
typedef enum {
    STATUS_OK,
    STATUS_MEM_EXHAUSTED,
    STATUS_DUPLICATE_KEY,
    STATUS_KEY_NOT_FOUND
} statusEnum;

typedef int keyType;            /* type of key */

/* user data stored in tree */
typedef struct {
    int stuff;                  /* optional related data */
} recType;

#define compLT(a,b) (a < b)
#define compEQ(a,b) (a == b)

/* implementation independent declarations */
typedef struct nodeTag {
    struct nodeTag *left;       /* left child */
    struct nodeTag *right;      /* right child */
    struct nodeTag *parent;     /* parent */
    keyType key;                /* key used for searching */
    recType rec;                /* user data */
} nodeType;

nodeType *root = NULL;          /* root of binary tree */

statusEnum insert(keyType key, recType *rec) {
    nodeType *x, *current, *parent;

    /***********************************************
     *  allocate node for data and insert in tree  *
     ***********************************************/

    /* find future parent */
    current = root;
    parent = 0;
    while (current) {
        if (compEQ(key, current->key))
            return STATUS_DUPLICATE_KEY;
        parent = current;
        current = compLT(key, current->key) ?
            current->left : current->right;
    }

    /* setup new node */
    if ((x = malloc (sizeof(*x))) == 0) {
        return STATUS_MEM_EXHAUSTED;
    }
    x->parent = parent;
    x->left = NULL;
    x->right = NULL;
    x->key = key;
    x->rec = *rec;

    /* insert x in tree */
    if (parent)
        if (compLT(x->key, parent->key))
            parent->left = x;
        else
            parent->right = x;
    else
        root = x;

    return STATUS_OK;
}

statusEnum delete(keyType key) {
    nodeType *x, *y, *z;

    /***************************
     *  delete node from tree  *
     ***************************/

    /* find node in tree */
    z = root;
    while (z != NULL) {
        if (compEQ(key, z->key))
            break;
        else
            z = compLT(key, z->key) ? z->left : z->right;
    }
    if (!z) return STATUS_KEY_NOT_FOUND;

    /* find tree successor */
    if (z->left == NULL || z->right == NULL)
        y = z;
    else {
        y = z->right;
        while (y->left != NULL) y = y->left;
    }

    /* point x to a valid child of y, if it has one */
    if (y->left != NULL)
        x = y->left;
    else
        x = y->right;

    /* remove y from the parent chain */
    if (x) x->parent = y->parent;
    if (y->parent)
        if (y == y->parent->left)
            y->parent->left = x;
        else
            y->parent->right = x;
    else
        root = x;

    /* if z and y are not the same, copy y to z */
    if (y != z) {
        z->key = y->key;
        z->rec = y->rec;
    }
    free (y);

    return STATUS_OK;
}

statusEnum find(keyType key, recType *rec) {

    /*******************************
     *  find node containing data  *
     *******************************/

    nodeType *current = root;
    while (current != NULL) {
        if (compEQ(key, current->key)) {
            *rec = current->rec;
            return STATUS_OK;
        } else {
            current = compLT(key, current->key) ?
                current->left : current->right;
        }
    }
    return STATUS_KEY_NOT_FOUND;
}

int main(int argc, char **argv) {
    int i, maxnum, random;
    recType *rec;
    keyType *key;
    statusEnum status;

    /* command-line:
     *
     *   bin maxnum random
     *
     *   bin 5000     // 5000 sequential
     *   bin 2000 r   // 2000 random
     */

    maxnum = atoi(argv[1]);
    random = argc > 2;

    if ((rec = malloc(maxnum * sizeof(recType))) == 0) {
        fprintf (stderr, "insufficient memory (rec)\n");
        exit(1);
    }

    if ((key = malloc(maxnum * sizeof(keyType))) == 0) {
        fprintf (stderr, "insufficient memory (key)\n");
        exit(1);
    }

    if (random) {
        /* fill "key" with random numbers */
        for (i = 0; i < maxnum; i++) key[i] = rand();
        printf ("ran bt, %d items\n", maxnum);
    } else {
        for (i = 0; i < maxnum; i++) key[i] = i;
        printf ("seq bt, %d items\n", maxnum);
    }

    for (i = 0; i < maxnum; i++) {
        status = insert(key[i], &rec[i]);
        if (status) printf("pt1, i=%d: %d\n", i, status);
    }

    for (i = maxnum-1; i >= 0; i--) {
        status = find(key[i], &rec[i]);
        if (status) printf("pt2, i=%d: %d\n", i, status);
    }

    for (i = maxnum-1; i >= 0; i--) {
        status = delete(key[i]);
        if (status) printf("pt3, i=%d: %d\n", i, status);
    }

    return 0;
}

Merge Sort

Merge sort is based on the divide-and-conquer paradigm. Its worst-case running time has a
lower order of growth than insertion sort. Since we are dealing with subproblems, we state each
subproblem as sorting a subarray A[p .. r]. Initially, p = 1 and r = n, but these values change as
we recurse through subproblems.

To sort A[p .. r]:

1. Divide Step

   If a given array A has zero or one element, simply return; it is already sorted. Otherwise, split
   A[p .. r] into two subarrays A[p .. q] and A[q + 1 .. r], each containing about half of the
   elements of A[p .. r]. That is, q is the halfway point of A[p .. r].

2. Conquer Step

   Conquer by recursively sorting the two subarrays A[p .. q] and A[q + 1 .. r].

3. Combine Step

   Combine the elements back in A[p .. r] by merging the two sorted subarrays A[p .. q] and
   A[q + 1 .. r] into a sorted sequence. To accomplish this step, we will define a procedure
   MERGE (A, p, q, r).

Note that the recursion bottoms out when the subarray has just one element, so that it is trivially
sorted.

Algorithm: Merge Sort

To sort the entire sequence A[1 .. n], make the initial call to the procedure MERGE-SORT (A, 1, n).

MERGE-SORT (A, p, r)
1.  IF p < r                            // Check for base case
2.    THEN q = FLOOR[(p + r)/2]         // Divide step
3.      MERGE-SORT (A, p, q)            // Conquer step
4.      MERGE-SORT (A, q + 1, r)        // Conquer step
5.      MERGE (A, p, q, r)              // Combine step

Example: Bottom-up view of the above procedure for n = 8.

Merging

What remains is the MERGE procedure. The following is the input and output of the MERGE
procedure.

INPUT: Array A and indices p, q, r such that p ≤ q ≤ r, subarray A[p .. q] is sorted, and subarray
A[q + 1 .. r] is sorted. By restrictions on p, q, r, neither subarray is empty.

OUTPUT: The two subarrays are merged into a single sorted subarray in A[p .. r].

We implement it so that it takes Θ(n) time, where n = r − p + 1, which is the number of elements
being merged.

Idea Behind Linear Time Merging

Think of two piles of cards. Each pile is sorted and placed face-up on a table with the smallest
cards on top. We will merge these into a single sorted pile, face-down on the table. A basic step:

 - Choose the smaller of the two top cards.
 - Remove it from its pile, thereby exposing a new top card.
 - Place the chosen card face-down onto the output pile.
 - Repeatedly perform basic steps until one input pile is empty.
 - Once one input pile empties, just take the remaining input pile and place it face-down onto
   the output pile.

Each basic step should take constant time, since we check just the two top cards. There are at
most n basic steps, since each basic step removes one card from the input piles, and we started
with n cards in the input piles. Therefore, this procedure should take Θ(n) time.

Now the question is, do we actually need to check whether a pile is empty before each basic
step? The answer is no, we do not. Put on the bottom of each input pile a special sentinel card.

It contains a special value that we use to simplify the code. We use ∞, since that's guaranteed to
lose to any other value. The only way that ∞ cannot lose is when both piles have ∞ exposed as
their top cards. But when that happens, all the nonsentinel cards have already been placed into
the output pile. We know in advance that there are exactly r − p + 1 nonsentinel cards, so we
stop once we have performed r − p + 1 basic steps. Never a need to check for sentinels, since
they will always lose. Rather than even counting basic steps, just fill up the output array from
index p up through and including index r. That is how we have done it in the class.

The pseudocode of the MERGE procedure is as follows:

MERGE (A, p, q, r)
 1.  n1 ← q − p + 1
 2.  n2 ← r − q
 3.  Create arrays L[1 .. n1 + 1] and R[1 .. n2 + 1]
 4.  FOR i ← 1 TO n1
 5.      DO L[i] ← A[p + i − 1]
 6.  FOR j ← 1 TO n2
 7.      DO R[j] ← A[q + j]
 8.  L[n1 + 1] ← ∞
 9.  R[n2 + 1] ← ∞
10.  i ← 1
11.  j ← 1
12.  FOR k ← p TO r
13.      DO IF L[i] ≤ R[j]
14.          THEN A[k] ← L[i]
15.              i ← i + 1
16.          ELSE A[k] ← R[j]
17.              j ← j + 1

Example [from CLRS-Figure 2.3]: A call of MERGE(A, 9, 12, 16). Read the following figure
row by row.

 - The first part shows the arrays at the start of the "for k ← p to r" loop, where A[p .. q] is
   copied into L[1 .. n1] and A[q + 1 .. r] is copied into R[1 .. n2].

 - Entries in A with slashes have had their values copied to either L or R and have not had a
   value copied back in yet.
 - Succeeding parts show the situation at the start of successive iterations.
 - The last part shows that the subarrays are merged back into A[p .. r], which is now sorted, and
   that only the sentinels (∞) are exposed in the arrays L and R. Entries in L and R with slashes
   have been copied back into A.

Running Time

The first two for loops (that is, the loop in line 4 and the loop in line 6) take Θ(n1 + n2) = Θ(n)
time. The last for loop (that is, the loop in line 12) makes n iterations, each taking constant time,
for Θ(n) time. Therefore, the total running time is Θ(n).

Analyzing Merge Sort

For simplicity, assume that n is a power of 2 so that each divide step yields two subproblems,
both of size exactly n/2. The base case occurs when n = 1. When n ≥ 2, time for merge sort steps:

 - Divide: Just compute q as the average of p and r, which takes constant time, i.e., Θ(1).
 - Conquer: Recursively solve 2 subproblems, each of size n/2, which is 2T(n/2).
 - Combine: MERGE on an n-element subarray takes Θ(n) time.

Summed together they give a function that is linear in n, which is Θ(n). Therefore, the recurrence
for merge sort running time is

    T(n) = Θ(1)              if n = 1,
    T(n) = 2T(n/2) + Θ(n)    if n > 1.

Solving the Merge Sort Recurrence

By the master theorem in CLRS-Chapter 4 (page 73), we can show that this recurrence has the
solution T(n) = Θ(n lg n). Reminder: lg n stands for log2 n.

Compared to insertion sort [Θ(n²) worst-case time], merge sort is faster. Trading a factor of n for
a factor of lg n is a good deal. On small inputs, insertion sort may be faster. But for large enough
inputs, merge sort will always be faster, because its running time grows more slowly than
insertion sort's.
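As a quick check, the master-theorem computation can be written out (a sketch in standard notation, not quoted from CLRS):

% T(n) = 2T(n/2) + \Theta(n): here a = 2, b = 2, f(n) = \Theta(n).
% n^{\log_b a} = n^{\log_2 2} = n, and f(n) = \Theta(n) matches it,
% so case 2 of the master theorem applies:
\[
    T(n) = \Theta\bigl(n^{\log_b a} \lg n\bigr) = \Theta(n \lg n).
\]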

Recursion Tree

We can understand how to solve the merge-sort recurrence without the master theorem. There is
a drawing of the recursion tree on page 35 in CLRS, which shows successive expansions of the
recurrence.

The following figure (Figure 2.5b in CLRS) shows that for the original problem, we have a cost
of cn, plus the two subproblems, each costing T(n/2). The following figure (Figure 2.5c in
CLRS) shows that for each of the size-n/2 subproblems, we have a cost of cn/2, plus two
subproblems, each costing T(n/4). The following figure (Figure 2.5d in CLRS) tells us to
continue expanding until the problem sizes get down to 1.

In the above recursion tree, each level has cost cn.

 - The top level has cost cn.
 - The next level down has 2 subproblems, each contributing cost cn/2.
 - The next level has 4 subproblems, each contributing cost cn/4.
 - Each time we go down one level, the number of subproblems doubles but the cost per
   subproblem halves. Therefore, cost per level stays the same.

The height of this recursion tree is lg n and there are lg n + 1 levels.

Mathematical Induction

We use induction on the size of a given subproblem n.

Base case: n = 1. This implies that there is 1 level, and lg 1 + 1 = 0 + 1 = 1.

Inductive Step

Our inductive hypothesis is that a tree for a problem size of 2^i has lg 2^i + 1 = i + 1 levels.
Because we assume that the problem size is a power of 2, the next problem size up after 2^i is
2^(i+1). A tree for a problem size of 2^(i+1) has one more level than the size-2^i tree, implying
i + 2 levels. Since lg 2^(i+1) + 1 = i + 2, we are done with the inductive argument.

Total cost is the sum of the costs at each level of the tree. Since we have lg n + 1 levels, each
costing cn, the total cost is cn lg n + cn. Ignoring the low-order term of cn and the constant
coefficient c, we have Θ(n lg n), which is the desired result.

Implementation

void m_sort(int numbers[], int temp[], int left, int right);
void merge(int numbers[], int temp[], int left, int mid, int right);

void mergeSort(int numbers[], int temp[], int array_size) {
    m_sort(numbers, temp, 0, array_size - 1);
}

void m_sort(int numbers[], int temp[], int left, int right) {
    int mid;

    if (right > left) {
        mid = (right + left) / 2;
        m_sort(numbers, temp, left, mid);
        m_sort(numbers, temp, mid + 1, right);
        merge(numbers, temp, left, mid + 1, right);
    }
}

void merge(int numbers[], int temp[], int left, int mid, int right) {
    int i, left_end, num_elements, tmp_pos;

    left_end = mid - 1;
    tmp_pos = left;
    num_elements = right - left + 1;

    /* merge the two runs into temp */
    while ((left <= left_end) && (mid <= right)) {
        if (numbers[left] <= numbers[mid]) {
            temp[tmp_pos] = numbers[left];
            tmp_pos = tmp_pos + 1;
            left = left + 1;
        } else {
            temp[tmp_pos] = numbers[mid];
            tmp_pos = tmp_pos + 1;
            mid = mid + 1;
        }
    }

    /* copy any remaining elements of the left run */
    while (left <= left_end) {
        temp[tmp_pos] = numbers[left];
        left = left + 1;
        tmp_pos = tmp_pos + 1;
    }

    /* copy any remaining elements of the right run */
    while (mid <= right) {
        temp[tmp_pos] = numbers[mid];
        mid = mid + 1;
        tmp_pos = tmp_pos + 1;
    }

    /* copy the merged run back into numbers */
    for (i = 0; i < num_elements; i++) {
        numbers[right] = temp[right];
        right = right - 1;
    }
}
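A short driver (hypothetical, not part of the original listing) shows the calling convention: the caller supplies the scratch array temp, the same length as the data:

#include <stdio.h>

int main(void) {
    int numbers[8] = {5, 2, 4, 7, 1, 3, 2, 6};
    int temp[8];                     /* scratch space, same length */
    int i;

    mergeSort(numbers, temp, 8);
    for (i = 0; i < 8; i++)
        printf("%d ", numbers[i]);   /* prints: 1 2 2 3 4 5 6 7 */
    printf("\n");
    return 0;
}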

Merge sort

From Wikipedia, the free encyclopedia

An example of merge sort: first divide the list into the smallest unit (1 element), then compare
each element with the adjacent list to sort and merge the two adjacent lists. Finally all the
elements are sorted and merged.

Class                          Sorting algorithm
Data structure                 Array
Worst case performance         O(n log n)
Best case performance          O(n log n) typical, O(n) natural variant
Average case performance       O(n log n)
Worst case space complexity    O(n) auxiliary

Merge sort (also commonly spelled mergesort) is an O(n log n) comparison-based sorting
algorithm. Most implementations produce a stable sort, which means that the implementation
preserves the input order of equal elements in the sorted output. Merge sort is a divide and
conquer algorithm that was invented by John von Neumann in 1945.[1] A detailed description
and analysis of bottom-up mergesort appeared in a report by Goldstine and Neumann as early as
1948.[2]
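The bottom-up view described above can also be coded directly. Here is a minimal iterative sketch (the names bottomUpMergeSort and mergeRuns are illustrative, and this is not the recursive implementation shown earlier): it merges runs of width 1, 2, 4, ... between the data array and a scratch buffer, swapping their roles each pass.

#include <stdlib.h>
#include <string.h>

/* stable merge of src[lo..mid-1] and src[mid..hi-1] into dst[lo..hi-1] */
static void mergeRuns(int *src, int *dst, int lo, int mid, int hi) {
    int i = lo, j = mid, k;
    for (k = lo; k < hi; k++)
        dst[k] = (i < mid && (j >= hi || src[i] <= src[j]))
               ? src[i++] : src[j++];
}

void bottomUpMergeSort(int *a, int n) {
    int *b = malloc(n * sizeof(int));
    int width, lo;
    int *src = a, *dst = b, *tmp;

    if (b == NULL) return;
    for (width = 1; width < n; width *= 2) {
        for (lo = 0; lo < n; lo += 2 * width) {
            int mid = lo + width < n ? lo + width : n;
            int hi  = lo + 2 * width < n ? lo + 2 * width : n;
            mergeRuns(src, dst, lo, mid, hi);
        }
        tmp = src; src = dst; dst = tmp;   /* swap roles for next pass */
    }
    if (src != a) memcpy(a, src, n * sizeof(int));
    free(b);
}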

Merge Sorting

The fourth class of sorting algorithm we consider comprises algorithms that sort by merging.
Merging is the combination of two or more sorted sequences into a single sorted sequence. In a
two-way merge, two sorted sequences are merged into one. The figure below illustrates the
basic, two-way merge operation.

Figure: Two-Way Merging

Clearly, two sorted sequences each of length n can be merged into a sorted sequence of length
2n in O(2n) = O(n) steps. However, in order to do this, we need space in which to store the
result. I.e., it is not possible to merge the two sequences in place in O(n) steps.

Sorting by merging is a recursive, divide-and-conquer strategy. In the base case, we have a
sequence with exactly one element in it. Since such a sequence is already sorted, there is nothing
to be done. To sort a sequence of n > 1 elements:

1. divide the sequence into two sequences of length ⌈n/2⌉ and ⌊n/2⌋;
2. recursively sort each of the two subsequences; and then,
3. merge the sorted subsequences to obtain the final result.

The figure below illustrates the operation of the two-way merge sort algorithm.

Figure: Two-Way Merge Sorting

Implementation

Program declares the TwoWayMergeSorter<T> class template. A single member variable,
tempArray, is declared. This variable is a pointer to an Array<T> instance. Since merge
operations cannot be done in place, a second, temporary array is needed. The tempArray variable
keeps track of that array. The TwoWayMergeSorter constructor simply sets the tempArray
pointer to zero.

Program: TwoWayMergeSorter<T> Class Definition

In addition to the constructor, three protected member functions are declared: Merge and two
versions of DoSort. The purpose of the Merge function is to merge sorted subsequences of the
array to be sorted. The two DoSort routines implement the sorting algorithm itself.

7.5 Radix Sorting

The bin sorting approach can be generalised in a technique that is known as radix sorting.

An example

Assume that we have n integers in the range (0,n²) to be sorted. (For a bin sort, m = n², and we
would have an O(n+m) = O(n²) algorithm.) Sort them in two phases:

1. Using n bins, place a_i into bin a_i mod n,

2. Repeat the process using n bins, placing a_i into bin floor(a_i/n), being careful to append to
   the end of each bin.

This results in a sorted list. As an example, consider the list of integers:

36 9 0 25 1 49 64 16 81 4

n is 10 and the numbers all lie in (0,99). After the first phase, we will have:

Bin      0    1       2    3    4       5     6       7    8    9
Content  0    1 81    -    -    64 4    25    36 16   -    -    9 49

Note that in this phase, we placed each item in a bin indexed by the least significant decimal
digit. Repeating the process, being careful to add each item to the end of the bin, will produce:

Bin      0          1     2     3     4     5    6     7    8     9
Content  0 1 4 9    16    25    36    49    -    64    -    81    -

In this second phase, we used the leading decimal digit to allocate items to bins, being careful to
add each item to the end of the bin. We can apply this process to numbers of any size expressed
to any suitable base or radix.

7.5.1 Generalised Radix Sorting

We can further observe that it's not necessary to use the same radix in each phase. For example,
suppose that the sorting key is a sequence of fields, each with bounded ranges, e.g. the key is a
date using the structure:

typedef struct t_date {
    int day;
    int month;
    int year;
} date;

If the ranges for day and month are limited in the obvious way, and the range for year is suitably
constrained, e.g. 1900 < year <= 2000, then we can apply the same procedure except that we'll
employ a different number of bins in each phase. In all cases, we'll sort first using the least
significant "digit" (where "digit" here means a field with a limited range), then using the next
significant "digit", placing each item after all the items already in the bin, and so on.

Assume that the key of the item to be sorted has k fields, f_i | i = 0..k-1, and that each f_i has s_i
discrete values. Then a generalised radix sort procedure can be written:

radixsort( A, n ) {
    for (i = 0; i < k; i++) {
        for (j = 0; j < s_i; j++) bin[j] = EMPTY;       O(s_i)

        for (j = 0; j < n; j++) {                       O(n)
            move A_i to the end of bin[A_i->f_i]
        }

        for (j = 0; j < s_i; j++)                       O(s_i)
            concatenate bin[j] onto the end of A;
    }
}

Total: O(Σ(n + s_i)) = O(kn + Σ s_i)

Now if, for example, the keys are integers in (0, b^k - 1), for some constant k, then the keys can
be viewed as k-digit base-b integers. Thus, s_i = b for all i and the time complexity becomes
O(n + kb) or O(n). This result depends on k being constant.

If k is allowed to increase with n, then we have a different picture. For example, it takes log2 n
binary digits to represent an integer < n. If the key length were allowed to increase with n, so
that k = log n, then we would have:

    T(n) = O(n log n)

Another way of looking at this is to note that if the range of the key is restricted to (0, b^k - 1),
then we will be able to use the radix sort approach effectively if we allow duplicate keys when
n > b^k. However, if we need to have unique keys, then k must increase to at least log_b n. Thus,
as n increases, we need to have log n phases, each taking O(n) time, and the radix sort is the
same as quick sort!

Sample code

This sample code sorts arrays of integers on various radices: the number of bits used for each
radix can be set with the call to SetRadices. The Bins class is used in each phase to collect the
items as they are sorted. ConsBins is called to set up a set of bins: each bin must be large enough
to accommodate the whole array, so RadixSort can be very expensive in its memory usage!
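The sample code itself did not survive in this copy. As a stand-in, here is a minimal LSD radix sort sketch in C (names like RADIX_BITS and countSort are illustrative, not the SetRadices/Bins interface described above); it sorts 32-bit unsigned integers 8 bits at a time, using a single counting pass per phase and one O(n) scratch buffer rather than per-bin arrays:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RADIX_BITS 8                    /* bits per phase */
#define RADIX      (1u << RADIX_BITS)   /* number of bins */

/* one stable counting pass on the digit at position 'shift' */
static void countSort(unsigned *a, unsigned *out, size_t n, unsigned shift) {
    size_t count[RADIX] = {0};
    size_t i;

    for (i = 0; i < n; i++)             /* histogram of digit values */
        count[(a[i] >> shift) & (RADIX - 1)]++;
    for (i = 1; i < RADIX; i++)         /* prefix sums = bin boundaries */
        count[i] += count[i - 1];
    for (i = n; i-- > 0; )              /* place items back to front, stably */
        out[--count[(a[i] >> shift) & (RADIX - 1)]] = a[i];
}

/* assumes 32-bit unsigned, hence 32/RADIX_BITS phases */
void radixSort(unsigned *a, size_t n) {
    unsigned *buf = malloc(n * sizeof(unsigned));
    unsigned shift;

    if (buf == NULL) return;
    for (shift = 0; shift < 32; shift += RADIX_BITS) {
        countSort(a, buf, n, shift);
        memcpy(a, buf, n * sizeof(unsigned));
    }
    free(buf);
}

int main(void) {
    unsigned a[] = {36, 9, 0, 25, 1, 49, 64, 16, 81, 4};
    size_t i, n = sizeof(a) / sizeof(a[0]);

    radixSort(a, n);
    for (i = 0; i < n; i++)
        printf("%u ", a[i]);            /* 0 1 4 9 16 25 36 49 64 81 */
    printf("\n");
    return 0;
}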