March 16
Bucket sort
Bucket sort has two meanings. One is similar to Counting sort as described in the book. We assume that every entry to be sorted is in the set $\{0, 1, \ldots, m-1\}$. We sort the array fragment <A, 0, n> using an array of buckets B[m].

Bucket_sort(A,n,B,m)
{
  // distribution
  for (i = 0; i < n; i++)
    place A[i] in bucket B[A[i]];
  // collection
  for (i = j = 0; j < m; j++)
    while (B[j] is not empty) {
      x = element removed from B[j];
      A[i++] = x;
    }
}
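The two stages above can be sketched in C as follows. This is a minimal sketch, not the lecture's implementation: the slides leave the bucket data structure open, so here each bucket is assumed to be a FIFO list threaded through index arrays (`head`, `tail`, `next`, all hypothetical names), and the function name and return convention are likewise assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Sorts a[0..n-1], assuming every key is in {0, ..., m-1}.
   Returns 0 on success, -1 if memory allocation fails. */
int bucket_sort(unsigned *a, size_t n, unsigned m)
{
    if (n == 0) return 0;
    assert(m > 0);

    /* Node i stands for the element originally at a[i]; the value n
       plays the role of a null link. */
    size_t *head = malloc(m * sizeof *head);   /* first node of bucket j */
    size_t *tail = malloc(m * sizeof *tail);   /* last node of bucket j  */
    size_t *next = malloc(n * sizeof *next);   /* successor of node i    */
    unsigned *key = malloc(n * sizeof *key);   /* copy of the input keys */
    if (!head || !tail || !next || !key) {
        free(head); free(tail); free(next); free(key);
        return -1;
    }
    for (unsigned j = 0; j < m; j++) head[j] = n;   /* all buckets empty */

    /* Distribution: append element i to the bucket of its key. */
    for (size_t i = 0; i < n; i++) {
        assert(a[i] < m);
        key[i] = a[i];
        next[i] = n;
        if (head[key[i]] == n) head[key[i]] = i;
        else next[tail[key[i]]] = i;
        tail[key[i]] = i;
    }

    /* Collection: empty the buckets in increasing order of j. */
    size_t out = 0;
    for (unsigned j = 0; j < m; j++)
        for (size_t i = head[j]; i != n; i = next[i])
            a[out++] = key[i];

    free(head); free(tail); free(next); free(key);
    return 0;
}
```

Calling `bucket_sort(a, 10, 6)` on an array holding 3 2 4 0 1 5 2 3 4 3 leaves 0 1 2 2 3 3 3 4 4 5 in `a`. Appending at the tail of each list keeps equal keys in their input order.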
[Figure: step-by-step animation of Bucket sort with six buckets B[0], ..., B[5]. After the distribution stage B[2] holds 2 2, B[3] holds 3 3 3 and B[4] holds 4 4; the collection stage then empties the buckets in order, with B[3] shrinking from 3 3 3 to 3 3 to 3 to empty.]
Counting sort
We can use fragments of another array as buckets. If we place them appropriately, we do not need the Collection stage; instead, we need to perform a Census stage to calculate the placement of the buckets.

Counting_sort(A,B,n)
{
  int C[m+1];
  // Census
  //   Prepare counters
  for (i = 0; i < m; C[i++] = 0);
  //   Census of each bucket
  for (i = 0; i < n; i++)
    C[A[i]]++;
  //   bucket i will be <B,C[i],C[i+1]>
  C[m] = n;
  for (i = m-1; i >= 0; i--)
    C[i] = C[i+1]-C[i];
  // Transfer to buckets
  for (i = 0; i < n; i++)
    B[C[A[i]]++] = A[i];
}

In this algorithm we perform a constant amount of work for each sorted number (we look at it during Census of each bucket and move it during Transfer to buckets) and for each bucket (to Prepare counters and to compute the bucket limits). Thus the running time is $\Theta(m + n)$.

Remark. This is a stable sorting method: it does not change the relative order of numbers that are equal. This will be important later.
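A runnable C version of the pseudocode might look like the sketch below. The explicit `m` parameter, the heap-allocated counter array and the return code are assumptions added to make the function self-contained; the three stages follow the slide.

```c
#include <assert.h>
#include <stdlib.h>

/* Copies a[0..n-1] into b[0..n-1] in sorted order, assuming every key
   is in {0, ..., m-1}. Returns 0 on success, -1 if allocation fails. */
int counting_sort(const unsigned *a, unsigned *b, size_t n, unsigned m)
{
    /* Prepare counters: calloc zero-fills C[0..m]. */
    size_t *c = calloc((size_t)m + 1, sizeof *c);
    if (!c) return -1;

    /* Census of each bucket: c[v] = number of keys equal to v. */
    for (size_t i = 0; i < n; i++) {
        assert(a[i] < m);
        c[a[i]]++;
    }

    /* Bucket limits: working downwards from c[m] = n, turn each count
       into the start of its bucket, so bucket i is b[c[i] .. c[i+1]-1]. */
    c[m] = n;
    for (size_t i = m; i-- > 0; )
        c[i] = c[i + 1] - c[i];

    /* Transfer to buckets: c[v] is the next free slot of bucket v.
       Scanning a[] left to right keeps equal keys in input order. */
    for (size_t i = 0; i < n; i++)
        b[c[a[i]]++] = a[i];

    free(c);
    return 0;
}
```

With `m = 6` and the input 3 2 4 0 1 5 2 3 4 3, the bucket limits are 0 1 2 4 7 9 10 and `b` ends up as 0 1 2 2 3 3 3 4 4 5.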
[Figure: trace of Counting sort with m = 6 and n = 10.]

  C[0..5] after Prepare counters:          0 0 0 0 0 0
  C[0..5] after Census of each bucket:     1 1 2 3 2 1
  C[0..6] after computing bucket limits:   0 1 2 4 7 9 10

Transfer to buckets, one element per step ("." marks a cell of B that is not yet filled):

  moved   C[0..6]            B[0..9]
  3       0 1 2 5 7 9 10     . . . . 3 . . . . .
  2       0 1 3 5 7 9 10     . . 2 . 3 . . . . .
  4       0 1 3 5 8 9 10     . . 2 . 3 . . 4 . .
  0       1 1 3 5 8 9 10     0 . 2 . 3 . . 4 . .
  1       1 2 3 5 8 9 10     0 1 2 . 3 . . 4 . .
  5       1 2 3 5 8 10 10    0 1 2 . 3 . . 4 . 5
  2       1 2 4 5 8 10 10    0 1 2 2 3 . . 4 . 5
  3       1 2 4 6 8 10 10    0 1 2 2 3 3 . 4 . 5
  4       1 2 4 6 9 10 10    0 1 2 2 3 3 . 4 4 5
  3       1 2 4 7 9 10 10    0 1 2 2 3 3 3 4 4 5
Radix sort
We will transform Counting sort into a sorting algorithm that is good for sorting numbers in $\{0, 1, \ldots, m^3 - 1\}$. A number $k$ from this range has three digits: $k = \mathrm{digit}(k, 0) + m \cdot \mathrm{digit}(k, 1) + m^2 \cdot \mathrm{digit}(k, 2)$.

Radix_sort(A,B,n) // preliminary
{
  int C[m+1], *S = A, *T = B, *temp;
  // Census
  //   Prepare counters
  for (i = 0; i < m; C[i++] = 0);
  //   Census of each bucket
  for (i = 0; i < n; i++)
    C[digit(S[i],0)]++;
  //   bucket i will be <T,C[i],C[i+1]>
  C[m] = n;
  for (i = m-1; i >= 0; i--)
    C[i] = C[i+1]-C[i];
  // Transfer to buckets
  for (i = 0; i < n; i++)
    T[C[digit(S[i],0)]++] = S[i];
  temp = T, T = S, S = temp;
}

Example (m = 10):

  index:  0    1    2    3    4    5    6    7    8    9
          213  352  144  40   501  205  32   3    154  433
will be transformed into
          40   501  352  32   213  3    433  144  154  205
We can repeat what we did once more, but now we will look at the next digit:

Radix_sort(A,B,n) // still preliminary
{
  int C[m+1], *S = A, *T = B, *temp;
  for (d = 0; d < 2; d++) {
    // Census
    //   Prepare counters
    for (i = 0; i < m; C[i++] = 0);
    //   Census of each bucket
    for (i = 0; i < n; i++)
      C[digit(S[i],d)]++;
    //   bucket i will be <T,C[i],C[i+1]>
    C[m] = n;
    for (i = m-1; i >= 0; i--)
      C[i] = C[i+1]-C[i];
    // Transfer to buckets
    for (i = 0; i < n; i++)
      T[C[digit(S[i],d)]++] = S[i];
    temp = T, T = S, S = temp;
  }
}

Example (m = 10):

  index:  0    1    2    3    4    5    6    7    8    9
          213  352  144  40   501  205  32   3    154  433
will be transformed into
          40   501  352  32   213  3    433  144  154  205
will be transformed into
          501  3    205  213  32   433  40   144  352  154
Now we can present the final version:

Radix_sort(A,B,n)
{
  int C[m+1], *S = A, *T = B, *temp;
  for (d = 0; d < 3; d++) {
    // Census
    //   Prepare counters
    for (i = 0; i < m; C[i++] = 0);
    //   Census of each bucket
    for (i = 0; i < n; i++)
      C[digit(S[i],d)]++;
    //   bucket i will be <T,C[i],C[i+1]>
    C[m] = n;
    for (i = m-1; i >= 0; i--)
      C[i] = C[i+1]-C[i];
    // Transfer to buckets
    for (i = 0; i < n; i++)
      T[C[digit(S[i],d)]++] = S[i];
    temp = T, T = S, S = temp;
  }
}

Example (m = 10):

  index:  0    1    2    3    4    5    6    7    8    9
          213  352  144  40   501  205  32   3    154  433
will be transformed into
          40   501  352  32   213  3    433  144  154  205
will be transformed into
          501  3    205  213  32   433  40   144  352  154
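The final version can be sketched in C roughly as follows. Several details are assumptions that the slides do not fix: `m` and the number of digits are passed as parameters, `digit` is implemented by repeated division, and the result is copied back into `a` so the caller does not need to track which buffer holds the output (the pseudocode instead leaves the result in whichever array `S` points to).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* digit(k, d): the d-th base-m digit of k, counted from the least
   significant one (d = 0). */
static unsigned digit(unsigned k, unsigned d, unsigned m)
{
    while (d-- > 0) k /= m;
    return k % m;
}

/* Sorts a[0..n-1], assuming every key is below m^digits; b is scratch
   space of the same length. Returns 0 on success, -1 on allocation
   failure. */
int radix_sort(unsigned *a, unsigned *b, size_t n, unsigned m, unsigned digits)
{
    assert(m >= 2 && a != b);
    size_t *c = malloc(((size_t)m + 1) * sizeof *c);
    if (!c) return -1;

    unsigned *s = a, *t = b;               /* source and target arrays */
    for (unsigned d = 0; d < digits; d++) {
        /* Census: count the keys whose d-th digit is v. */
        memset(c, 0, ((size_t)m + 1) * sizeof *c);
        for (size_t i = 0; i < n; i++)
            c[digit(s[i], d, m)]++;

        /* Bucket limits: bucket v is t[c[v] .. c[v+1]-1]. */
        c[m] = n;
        for (size_t i = m; i-- > 0; )
            c[i] = c[i + 1] - c[i];

        /* Stable transfer, then swap the roles of the two arrays. */
        for (size_t i = 0; i < n; i++)
            t[c[digit(s[i], d, m)]++] = s[i];
        unsigned *tmp = t; t = s; s = tmp;
    }

    if (s != a) memcpy(a, s, n * sizeof *a);   /* odd number of passes */
    free(c);
    return 0;
}
```

With `m = 10`, `digits = 3` and the input 213 352 144 40 501 205 32 3 154 433, the array ends up as 3 32 40 144 154 205 213 352 433 501. The method relies on each pass being stable: ties on the current digit keep the order established by the less significant digits.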
We can apply Radix sort to keys with any number of digits. Because we compute digits very often, it is good to compute them fast. One good way is to treat the keys being sorted as strings of unsigned characters. This way m equals 256 and we do not compute the digits, we just read them. This is particularly good if the keys are indeed strings, e.g. names to be sorted alphabetically.

Interestingly, we can use this approach to sort positive floating point numbers: the exponent byte is the most significant, and when exponents are equal, we compare mantissas.

If the set to be sorted has many thousands (or millions) of elements, we prefer to have fewer passes. We can use pairs of bytes/characters, so that characters a, b define the digit a + 256 b.
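As an illustration of the m = 256 idea, here is a sketch for 32-bit unsigned keys. It is an assumption-laden variant rather than the lecture's code: the bytes are extracted with a shift and a mask instead of reading the key's memory as a character string, which avoids depending on the machine's byte order, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sorts a[0..n-1] using one Counting sort pass per byte (m = 256);
   b is scratch space of the same length. */
void radix_sort_u32(uint32_t *a, uint32_t *b, size_t n)
{
    assert(a != b);
    size_t c[257];
    uint32_t *s = a, *t = b;

    for (unsigned shift = 0; shift < 32; shift += 8) {
        /* Census on the current byte. */
        memset(c, 0, sizeof c);
        for (size_t i = 0; i < n; i++)
            c[(s[i] >> shift) & 0xFF]++;

        /* Bucket limits. */
        c[256] = n;
        for (size_t i = 256; i-- > 0; )
            c[i] = c[i + 1] - c[i];

        /* Stable transfer, then swap source and target. */
        for (size_t i = 0; i < n; i++)
            t[c[(s[i] >> shift) & 0xFF]++] = s[i];
        uint32_t *tmp = t; t = s; s = tmp;
    }
    /* Four passes is an even number, so the sorted keys are back in a. */
}
```

Using two bytes per digit, as suggested above, would halve the number of passes at the cost of a counter array with 65537 entries.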
Counting sort and Radix sort select the place where a sorted number $k$ should be moved based on a function of its value, e.g. $\mathrm{digit}(k, d)$. This assumes some knowledge about the range of the objects that we are sorting, and is not necessarily useful for every possible range. Therefore we are interested in comparison sorting, i.e. in sorting algorithms in which we do not compute any functions of the values of the sorted objects other than comparisons, which are Boolean functions on pairs of objects.

It is easy to show that an algorithm that sorts $n$ numbers must perform, in the worst case, at least $\log_2(n!)$ comparisons. We may assume that all the objects in the input are distinct. Then there exists exactly one permutation $\pi$ such that for input $a_0, a_1, \ldots, a_{n-1}$ the valid output is $a_{\pi(0)}, a_{\pi(1)}, \ldots, a_{\pi(n-1)}$. There are $n!$ possible permutations and each of them is needed for some input.

Let $\Pi$ be the set of permutations that are needed for one of the inputs that would give the answers to the comparisons that we have seen so far. Initially, before performing any comparisons, $|\Pi| = n!$ because every input is possible (and thus every permutation).

Suppose now that we are about to perform a comparison, say $a_i < a_j$. Let $\Pi_{yes}$ be the set of permutations from $\Pi$ that are consistent with the positive answer, and let $\Pi_{no}$ be the set of permutations from $\Pi$ that are consistent with the negative answer. For some $val \in \{no, yes\}$ the set $\Pi_{val}$ is at least as large as the other one. If $\Pi$ is not empty, then neither is $\Pi_{val}$, so it is possible that $val$ is the answer to the comparison $a_i < a_j$. Thus it is possible that as the result of the comparison $a_i < a_j$ we change the set $\Pi$ into $\Pi_{val}$, and $|\Pi_{val}| \ge |\Pi| \, 2^{-1}$.

Consequently, it is possible that after performing $k$ comparisons we have $|\Pi| \ge n! \, 2^{-k}$. On the other hand, after sorting is completed, we have $\Pi = \{\pi\}$ for a certain permutation $\pi$, so we have $|\Pi| = 1$.
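As a small worked instance of this halving argument (an illustration, not part of the original proof), take $n = 3$ and write $\Pi$ for the set of permutations still consistent with the answers seen so far. If every comparison receives the answer that keeps the larger half, and we use the fact that $|\Pi|$ is an integer, the sizes can shrink no faster than:

```latex
|\Pi| = 3! = 6
\;\xrightarrow{\text{1st comparison}}\; |\Pi| \ge 3
\;\xrightarrow{\text{2nd comparison}}\; |\Pi| \ge \lceil 3/2 \rceil = 2
\;\xrightarrow{\text{3rd comparison}}\; |\Pi| \ge 1
```

After two comparisons at least two permutations may still be possible, so some input forces a third comparison. This agrees with $\lceil \log_2 6 \rceil = 3$.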
Therefore, if $k$ is the largest total number of comparisons performed by our unknown algorithm for some input, then

$$n! \, 2^{-k} \le 1 \iff 1 \ge n! \, 2^{-k} \iff 2^{k} \ge n! \iff k \ge \log_2(n!) \iff k \ge \sum_{i=2}^{n} \log_2 i .$$

We can estimate the latter summation as follows:

$$\sum_{i=2}^{n} \log_2 i \;>\; \int_1^n \log_2 x \, dx \;=\; \frac{1}{\ln 2} \int_1^n \ln x \, dx \;=\; \frac{1}{\ln 2} \Big[ x \ln x - x \Big]_1^n \;=\; \frac{1}{\ln 2} \, (n \ln n - n + 1) \;=\; n \log_2 n - \frac{n-1}{\ln 2} .$$
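To see how the two quantities compare on a concrete size, here is the arithmetic for $n = 8$ (a worked example with values rounded to two decimals, added for illustration):

```latex
\log_2(8!) = \log_2 40320 = 7 + \log_2 315 \approx 15.30,
\qquad
8 \log_2 8 - \frac{8 - 1}{\ln 2} \approx 24 - 10.10 = 13.90 .
```

So any comparison sort needs at least 16 comparisons on some input of 8 distinct numbers, and the closed-form estimate $n \log_2 n - (n-1)/\ln 2$ is, as expected, a slightly weaker bound.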