CSE465, Spring 2009

March 16

Bucket sort

Bucket sort has two meanings. One is similar to the Counting sort described in the book. We assume that every entry to be sorted is in the set $\{0, 1, \dots, m-1\}$. We sort the array fragment < A, 0, n > using an array of buckets B[m].

Bucket_sort(A,n,B,m)
{
    // distribution
    for (i = 0; i < n; i++)
        place A[i] in bucket B[A[i]];
    // collection
    for (i = j = 0; j < m; j++ ) {
        while (B[j] is not empty)
            x = removed from B[j], A[i++] = x;
    }
}
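The distribution and collection stages can be sketched in C as follows. This is only one possible realization, not the course's reference code: the name bucket_sort, the FIFO buckets threaded through the helper arrays head, tail and next, and the scratch array copy are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Sort a[0..n-1], whose entries are assumed to lie in {0, ..., m-1}.
   Each bucket is a FIFO list of positions, linked through next[]. */
void bucket_sort(int *a, int n, int m)
{
    if (n <= 0)
        return;
    assert(m > 0);

    int *head = malloc((size_t)m * sizeof *head);  /* first position in each bucket */
    int *tail = malloc((size_t)m * sizeof *tail);  /* last position in each bucket  */
    int *next = malloc((size_t)n * sizeof *next);  /* successor inside a bucket     */
    int *copy = malloc((size_t)n * sizeof *copy);  /* keys saved before overwriting */
    assert(head && tail && next && copy);

    for (int j = 0; j < m; j++)
        head[j] = tail[j] = -1;                    /* every bucket starts empty */

    /* distribution: append position i to bucket a[i] */
    for (int i = 0; i < n; i++) {
        int key = a[i];
        assert(0 <= key && key < m);
        copy[i] = key;
        next[i] = -1;
        if (head[key] < 0)
            head[key] = i;
        else
            next[tail[key]] = i;
        tail[key] = i;
    }

    /* collection: empty the buckets in increasing order of j */
    int out = 0;
    for (int j = 0; j < m; j++)
        for (int i = head[j]; i >= 0; i = next[i])
            a[out++] = copy[i];

    free(head);
    free(tail);
    free(next);
    free(copy);
}
```

Calling bucket_sort(a, 10, 6) on the array 3 2 4 0 1 5 2 3 4 3 leaves a sorted. The linked buckets keep the total work proportional to m + n, at the price of three extra arrays of length n or m.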

Example. Array A[10]:

  index: 0 1 2 3 4 5 6 7 8 9
  A:     3 2 4 0 1 5 2 3 4 3

The buckets B[0], B[1], ..., B[5] are initially empty.

Distribution: the entries of A are placed into the buckets one at a time, in the order 3, 2, 4, 0, 1, 5, 2, 3, 4, 3. After the last entry has been placed:

  B[0]: 0
  B[1]: 1
  B[2]: 2 2
  B[3]: 3 3 3
  B[4]: 4 4
  B[5]: 5

Collection: the buckets are emptied in the order B[0], B[1], ..., B[5], and every removed entry is written to the next free position of A. After the last bucket has been emptied:

  index: 0 1 2 3 4 5 6 7 8 9
  A:     0 1 2 2 3 3 3 4 4 5

Counting sort

We can use fragments of another array as buckets. If we place them appropriately, we do not need the Collection stage; instead, we need to perform a Census stage to calculate the placement of the buckets.

Counting_sort(A,B,n)
{
    int C[m+1];
    // Census
    // Prepare counters
    for (i = 0; i < m; C[i++] = 0);
    // Census of each bucket
    for (i = 0; i < n; i++)
        C[A[i]]++;
    // bucket i will be <B,C[i],C[i+1]>
    C[m] = n;
    for (i = m-1; i >= 0; i--)
        C[i] = C[i+1]-C[i];
    // Transfer to buckets
    for (i = 0; i < n; i++)
        B[C[A[i]]++] = A[i];
}

In this algorithm we perform a constant amount of work for each sorted number (we look at it during Census of each bucket and move it during Transfer to buckets) and for each bucket (to Prepare counters and to compute the bucket limits). Thus the running time is $\Theta(m + n)$.

Remark. This is a stable sorting method: we do not change the relative order of numbers that are equal. This will be important later.
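Stability is easier to see when every key carries a payload. The sketch below applies the same three stages to small records. The struct item type, its tag field, the constant M = 6 and the name counting_sort_items are assumptions introduced only for this illustration.

```c
#include <assert.h>

enum { M = 6 };   /* assumed key range {0, ..., M-1} */

struct item {
    int key;      /* sort key */
    char tag;     /* payload, used only to observe the order of equal keys */
};

/* Stable counting sort of a[0..n-1] into b[0..n-1]. */
void counting_sort_items(const struct item *a, struct item *b, int n)
{
    int c[M + 1];

    /* prepare counters */
    for (int i = 0; i < M; i++)
        c[i] = 0;

    /* census of each bucket */
    for (int i = 0; i < n; i++) {
        assert(0 <= a[i].key && a[i].key < M);
        c[a[i].key]++;
    }

    /* turn counts into left ends: bucket i occupies b[c[i]..c[i+1]-1] */
    c[M] = n;
    for (int i = M - 1; i >= 0; i--)
        c[i] = c[i + 1] - c[i];

    /* transfer in input order, which is what keeps equal keys in order */
    for (int i = 0; i < n; i++)
        b[c[a[i].key]++] = a[i];
}
```

For the input (3,a) (2,b) (3,c) (0,d) (2,e) the output is (0,d) (2,b) (2,e) (3,a) (3,c): records with equal keys keep their original relative order.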

Example. Array A[10]:

  index: 0 1 2 3 4 5 6 7 8 9
  A:     3 2 4 0 1 5 2 3 4 3

Array C after Prepare counters (C[6] is not set yet):

  index: 0 1 2 3 4 5 6
  C:     0 0 0 0 0 0

After census of buckets:

  index: 0 1 2 3 4 5 6
  C:     1 1 2 3 2 1

Computing the left ends of the buckets:

  index:         0  1  2  3  4  5  6
  C[6] = 10:     1  1  2  3  2  1 10
  after i = 5:   1  1  2  3  2  9 10
  after i = 4:   1  1  2  3  7  9 10
  after i = 3:   1  1  2  4  7  9 10
  after i = 2:   1  1  2  4  7  9 10
  after i = 1:   1  1  2  4  7  9 10
  after i = 0:   0  1  2  4  7  9 10

Transfer: each step moves the next entry A[i] to B[C[A[i]]] and then increments C[A[i]]. Before the first step, C is 0 1 2 4 7 9 10 and B is empty. A dot marks a position of B that is not filled yet.

   i  A[i]   C after the step (0..6)    B after the step (0..9)
   0   3      0  1  2  5  7  9 10       . . . . 3 . . . . .
   1   2      0  1  3  5  7  9 10       . . 2 . 3 . . . . .
   2   4      0  1  3  5  8  9 10       . . 2 . 3 . . 4 . .
   3   0      1  1  3  5  8  9 10       0 . 2 . 3 . . 4 . .
   4   1      1  2  3  5  8  9 10       0 1 2 . 3 . . 4 . .
   5   5      1  2  3  5  8 10 10       0 1 2 . 3 . . 4 . 5
   6   2      1  2  4  5  8 10 10       0 1 2 2 3 . . 4 . 5
   7   3      1  2  4  6  8 10 10       0 1 2 2 3 3 . 4 . 5
   8   4      1  2  4  6  9 10 10       0 1 2 2 3 3 . 4 4 5
   9   3      1  2  4  7  9 10 10       0 1 2 2 3 3 3 4 4 5


Radix sort

We will transform Counting Sort into a sorting algorithm that is good for sorting numbers in $\{0, 1, \dots, m^3-1\}$. A number $k$ from this range has three digits, $k = \mathrm{digit}(k,0) + m\,\mathrm{digit}(k,1) + m^2\,\mathrm{digit}(k,2)$.

Radix_sort(A,B,n) // preliminary
{
    int C[m+1], *S = A, *T = B, *temp;
    // Census
    // Prepare counters
    for (i = 0; i < m; C[i++] = 0);
    // Census of each bucket
    for (i = 0; i < n; i++)
        C[digit(S[i],0)]++;
    // bucket i will be <T,C[i],C[i+1]>
    C[m] = n;
    for (i = m-1; i >= 0; i--)
        C[i] = C[i+1]-C[i];
    // Transfer to buckets
    for (i = 0; i < n; i++)
        T[C[digit(S[i],0)]++] = S[i];
    temp = T, T = S, S = temp;
}

With decimal digits (m = 10), the array

  index:    0   1   2   3   4   5   6   7   8   9
          213 352 144  40 501 205  32   3 154 433

will be transformed into

  index:    0   1   2   3   4   5   6   7   8   9
           40 501 352  32 213   3 433 144 154 205


We can repeat what we did once more, but now we will look at the next digit:

Radix_sort(A,B,n) // still preliminary
{
    int C[m+1], *S = A, *T = B, *temp;
    for (d = 0; d < 2; d++) {
        // Census
        // Prepare counters
        for (i = 0; i < m; C[i++] = 0);
        // Census of each bucket
        for (i = 0; i < n; i++)
            C[digit(S[i],d)]++;
        // bucket i will be <T,C[i],C[i+1]>
        C[m] = n;
        for (i = m-1; i >= 0; i--)
            C[i] = C[i+1]-C[i];
        // Transfer to buckets
        for (i = 0; i < n; i++)
            T[C[digit(S[i],d)]++] = S[i];
        temp = T, T = S, S = temp;
    }
}

With decimal digits (m = 10), the array

  index:    0   1   2   3   4   5   6   7   8   9
          213 352 144  40 501 205  32   3 154 433

will be transformed into

           40 501 352  32 213   3 433 144 154 205

which will be transformed into

          501   3 205 213  32 433  40 144 352 154


Now we can present the final version:

Radix_sort(A,B,n)
{
    int C[m+1], *S = A, *T = B, *temp;
    for (d = 0; d < 3; d++) {
        // Census
        // Prepare counters
        for (i = 0; i < m; C[i++] = 0);
        // Census of each bucket
        for (i = 0; i < n; i++)
            C[digit(S[i],d)]++;
        // bucket i will be <T,C[i],C[i+1]>
        C[m] = n;
        for (i = m-1; i >= 0; i--)
            C[i] = C[i+1]-C[i];
        // Transfer to buckets
        for (i = 0; i < n; i++)
            T[C[digit(S[i],d)]++] = S[i];
        temp = T, T = S, S = temp;
    }
}

With decimal digits (m = 10), the array

  index:    0   1   2   3   4   5   6   7   8   9
          213 352 144  40 501 205  32   3 154 433

will be transformed into

           40 501 352  32 213   3 433 144 154 205

which will be transformed into

          501   3 205 213  32 433  40 144 352 154

which will be transformed into

            3  32  40 144 154 205 213 352 433 501
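A compilable C version of the three-pass algorithm might look like the sketch below. The constants M = 10 and DIGITS = 3, the division-based digit helper, and the final copy back into a are assumptions of this sketch. The pseudocode itself leaves the result in whichever array S points to after the last pass.

```c
#include <assert.h>
#include <string.h>

enum { M = 10, DIGITS = 3 };   /* assumed base and number of digits */

/* digit d of k in base M, with digit 0 the least significant */
static int digit(int k, int d)
{
    while (d-- > 0)
        k /= M;
    return k % M;
}

/* Sorts a[0..n-1], entries assumed to lie in {0, ..., M^DIGITS - 1},
   using b[0..n-1] as scratch space. The sorted result ends up in a. */
void radix_sort(int *a, int *b, int n)
{
    int c[M + 1];
    int *s = a, *t = b, *temp;

    for (int d = 0; d < DIGITS; d++) {
        /* census of digit d */
        for (int i = 0; i < M; i++)
            c[i] = 0;
        for (int i = 0; i < n; i++) {
            assert(s[i] >= 0);
            c[digit(s[i], d)]++;
        }
        /* left ends of the buckets in t */
        c[M] = n;
        for (int i = M - 1; i >= 0; i--)
            c[i] = c[i + 1] - c[i];
        /* stable transfer from s to t */
        for (int i = 0; i < n; i++)
            t[c[digit(s[i], d)]++] = s[i];
        temp = t, t = s, s = temp;
    }

    /* after an odd number of passes the result is in b */
    if (s != a)
        memcpy(a, s, (size_t)n * sizeof *a);
}
```

Run on the ten numbers of the example, it produces 3 32 40 144 154 205 213 352 433 501. Each pass relies on the stability of the previous one: ties in digit d are left in the order established by digits 0, ..., d-1.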


Final remarks on Radix sort.

We can Radix sort with any number of digits. Because we compute digits very often, it is good to compute them fast. One good way is to look at the keys that are sorted as strings of unsigned characters. This way we have m equal to 256 and we do not compute the digits, just read them. This is particularly good if the keys are indeed strings, e.g. names to be sorted alphabetically.

Interestingly, we can use this approach to sort positive floating point numbers: the exponent byte is the most significant; when exponents are equal, we should compare mantissas.

If the set to be sorted has many thousands of elements (or millions), we prefer to have fewer passes. We can use pairs of bytes/characters, where characters a, b define the digit a + 256 b.
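As an illustration of the m = 256 idea, here is a sketch for 32-bit unsigned keys treated as four byte-sized digits. The name radix_sort_u32 and the choice of uint32_t keys are assumptions of this example. It extracts each byte with a shift and a mask instead of reading it from memory, which sidesteps byte-order questions while staying close to "just read the digit".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sorts a[0..n-1] using b[0..n-1] as scratch space.
   Digit d of a key is its d-th byte, counted from the least significant. */
void radix_sort_u32(uint32_t *a, uint32_t *b, size_t n)
{
    uint32_t *s = a, *t = b, *temp;

    for (int d = 0; d < 4; d++) {
        size_t c[257];
        int shift = 8 * d;

        /* census of byte d */
        for (int i = 0; i < 256; i++)
            c[i] = 0;
        for (size_t i = 0; i < n; i++)
            c[(s[i] >> shift) & 0xFF]++;

        /* left ends of the 256 buckets */
        c[256] = n;
        for (int i = 255; i >= 0; i--)
            c[i] = c[i + 1] - c[i];

        /* stable transfer from s to t */
        for (size_t i = 0; i < n; i++)
            t[c[(s[i] >> shift) & 0xFF]++] = s[i];

        temp = t, t = s, s = temp;
    }

    /* four passes is an even number, so the result is back in a */
    assert(s == a);
}
```

Using 16-bit digits instead (two passes, 65536 counters) follows the same pattern with shift = 16 * d and mask 0xFFFF. Whether that is faster in practice depends on n and on the cache behaviour of the larger counter array.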


Lower bound on comparison sorting

Counting sort and Radix sort select the place where a sorted number $k$ should be moved based on a function of its value, e.g. $\mathrm{digit}(k, d)$. This assumes some knowledge about the range of objects that we are sorting, and is not necessarily useful for every possible range. Therefore we are interested in comparison sorting, i.e. in sorting algorithms in which we do not compute any functions of the values of the sorted objects other than comparisons, Boolean functions on pairs of objects.

It is easy to show that an algorithm that sorts $n$ numbers must perform, in the worst case, at least $\log_2(n!)$ comparisons. We may assume that all the objects in the input are distinct. Then there exists exactly one permutation $\pi$ such that for input $a_0, a_1, \dots, a_{n-1}$ the valid output is $a_{\pi(0)}, a_{\pi(1)}, \dots, a_{\pi(n-1)}$. There are $n!$ possible permutations and any one of them is needed for some input.

Let $\Pi$ be the set of permutations that are needed for one of the inputs that would give the answers to comparisons that we have seen so far. Initially, before performing any comparisons, $|\Pi| = n!$ because every input is possible (and thus every permutation).

Suppose now that we are about to perform a comparison, say $a_i < a_j$. Let $\Pi_{yes}$ be the set of permutations from $\Pi$ that are consistent with the positive answer, and let $\Pi_{no}$ be the set of permutations from $\Pi$ that are consistent with the negative answer. For some $val \in \{no, yes\}$ the set $\Pi_{val}$ is at least as large as the other one. If $\Pi$ is not empty, then it is possible that $val$ is the answer to the comparison $a_i < a_j$. Thus it is possible that as the result of comparison $a_i < a_j$ we change set $\Pi$ into $\Pi_{val}$ and $|\Pi_{val}| \ge |\Pi|\,2^{-1}$. Consequently, it is possible that after performing $k$ comparisons we have $|\Pi| \ge n!\,2^{-k}$.

On the other hand, after sorting is completed, we have $\Pi = \{\pi\}$ for a certain permutation $\pi$, so we have $|\Pi| = 1$.


Therefore if $k$ is the largest total number of comparisons performed by our unknown algorithm for some input, then

$$n!\,2^{-k} \le 1 \iff 1 \ge n!\,2^{-k} \iff 2^{k} \ge n! \iff k \ge \log_2(n!) \iff k \ge \sum_{i=2}^{n} \log_2 i.$$

On the other hand, we can estimate the latter summation as follows:

$$\sum_{i=2}^{n} \log_2 i > \int_1^n \log_2 x \, dx = \frac{1}{\ln 2} \int_1^n \ln x \, dx = \frac{1}{\ln 2} \Big[ x \ln x - x \Big]_1^n = \frac{1}{\ln 2} (n \ln n - n + 1) = n \log_2 n - \frac{n-1}{\ln 2}.$$
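A quick numeric check of the two bounds, for values of $n$ chosen here purely for illustration (decimals are rounded):

```latex
\begin{align*}
n = 4:\quad    & \log_2(4!) = \log_2 24 \approx 4.585, \\
               & n \log_2 n - \frac{n-1}{\ln 2} = 8 - \frac{3}{\ln 2} \approx 8 - 4.328 = 3.672, \\
n = 1024:\quad & n \log_2 n - \frac{n-1}{\ln 2} = 10240 - \frac{1023}{\ln 2} \approx 10240 - 1475.9 = 8764.1 .
\end{align*}
```

Since the number of comparisons is an integer, the first line says that any comparison sort needs at least 5 comparisons in the worst case to sort 4 distinct objects. For $n = 4$ the integral estimate (about 3.67) is indeed below $\log_2(4!)$, as the strict inequality requires.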
