Radix Sorts
key-indexed counting
LSD radix sort
MSD radix sort
3-way radix quicksort
application: LRS
References:
Algorithms in Java, Chapter 10
http://www.cs.princeton.edu/introalgsds/61sort
algorithm        guarantee      average        extra space   operations on keys
insertion sort   N^2/2          N^2/4          no            compareTo()
selection sort   N^2/2          N^2/2          no            compareTo()
mergesort        N lg N         N lg N         N             compareTo()
quicksort        1.39 N lg N    1.39 N lg N    c lg N        compareTo()
Digital keys
Many commonly used key types are inherently digital
(sequences of fixed-length characters).
Examples:
Strings
64-bit integers
Example interface:
interface Digital
{
   public int charAt(int k);
   public int length();
}
This lecture:
refer to fixed-length vs. variable-length strings
R different characters, for some fixed value R
assume key type implements charAt() and length() methods
code works for String
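As an illustration of a digital key other than String, here is a minimal sketch of a 64-bit integer viewed as 8 characters with R = 256. The class names DigitalLongDemo and LongKey are hypothetical, and the nested interface assumes length() takes no argument.

```java
public class DigitalLongDemo {
    // Digital-key interface as described in the slide (length() assumed to take no argument).
    public interface Digital {
        int charAt(int k);
        int length();
    }

    // Hypothetical wrapper: a 64-bit integer seen as 8 characters, R = 256.
    public static final class LongKey implements Digital {
        public static final int R = 256;
        private final long value;

        public LongKey(long value) { this.value = value; }

        // Character 0 is the most significant byte, character 7 the least.
        public int charAt(int k) {
            return (int) ((value >>> (8 * (7 - k))) & 0xFF);
        }

        public int length() { return 8; }
    }

    public static void main(String[] args) {
        LongKey key = new LongKey(0x0102030405060708L);
        for (int k = 0; k < key.length(); k++)
            System.out.print(key.charAt(k) + " ");
        System.out.println();
    }
}
```

Because each byte is read as an unsigned value, sorting on these characters orders the keys as unsigned 64-bit integers; signed ordering would need extra handling of the top bit.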
Examples:
char (R = 256)
short with fixed R, enforced by client
int with fixed R, enforced by client
Key-indexed counting
Task: sort an array a[] of N integers between 0 and R-1
Plan: produce sorted result in array temp[]
1. Count frequencies of each letter using key as index
2. Compute frequency cumulates
3. Access cumulates using key as index to find record positions.
4. Copy back into original array
int N = a.length;
int[] count = new int[R];
[Code figure: the rest of the listing is annotated with the four steps (count frequencies, compute cumulates, move records, copy back) and traces a[], temp[], and count[].]
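The four steps of the plan can be sketched as follows. This is an illustrative version, not the slide's listing: the class name is hypothetical, and count[] is given R+1 entries so that the frequency of key r can be stored at index r+1 without running off the end for r = R-1.

```java
import java.util.Arrays;

public class KeyIndexedCounting {
    // Sorts a[] in place; every a[i] is assumed to lie in 0..R-1.
    public static void sort(int[] a, int R) {
        int N = a.length;
        int[] count = new int[R + 1];
        int[] temp = new int[N];

        // 1. Count frequencies, offset by one position.
        for (int i = 0; i < N; i++) count[a[i] + 1]++;

        // 2. Compute cumulates: count[r] becomes the first position for key r.
        for (int r = 0; r < R; r++) count[r + 1] += count[r];

        // 3. Move records, using the key as an index into count[].
        for (int i = 0; i < N; i++) temp[count[a[i]]++] = a[i];

        // 4. Copy back into the original array.
        for (int i = 0; i < N; i++) a[i] = temp[i];
    }

    public static void main(String[] args) {
        int[] a = {3, 0, 2, 3, 1, 0};
        sort(a, 4);
        System.out.println(Arrays.toString(a));
    }
}
```

Step 3 scans the input from left to right, so records with equal keys keep their relative order (the sort is stable).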
algorithm              guarantee      average        extra space   operations on keys
insertion sort         N^2/2          N^2/4          no            compareTo()
selection sort         N^2/2          N^2/2          no            compareTo()
mergesort              N lg N         N lg N         N             compareTo()
quicksort              1.39 N lg N    1.39 N lg N    c lg N        compareTo()
key-indexed counting   N + R          N + R          N + R         use as array index
LSD radix sort
[Figure: LSD radix sort trace. Each pass sorts on one key character d; after each pass the keys are in order by previous passes.]
[Code figure: a public method whose for loop body is key-indexed counting (count frequencies, compute cumulates, move records, copy back).]
Assumes fixed-length keys (length = W)
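A minimal sketch of LSD radix sort under that assumption: one stable key-indexed counting pass per character position, from right to left. The class name and the choice R = 256 (extended ASCII) are assumptions for illustration; keys shorter than W or characters outside 0..255 are not handled.

```java
import java.util.Arrays;

public class LSDSketch {
    private static final int R = 256; // assumed alphabet size (extended ASCII)

    // Sorts a[] of strings that all have length exactly W.
    public static void sort(String[] a, int W) {
        int N = a.length;
        String[] temp = new String[N];
        for (int d = W - 1; d >= 0; d--) {
            // Key-indexed counting with character d as the key.
            int[] count = new int[R + 1];
            for (int i = 0; i < N; i++) count[a[i].charAt(d) + 1]++;      // count frequencies
            for (int r = 0; r < R; r++) count[r + 1] += count[r];          // compute cumulates
            for (int i = 0; i < N; i++) temp[count[a[i].charAt(d)]++] = a[i]; // move records
            for (int i = 0; i < N; i++) a[i] = temp[i];                    // copy back
        }
    }

    public static void main(String[] args) {
        String[] a = {"dab", "cab", "fad", "bad", "dad", "ebb", "ace", "add", "fed", "bed", "fee", "bee"};
        sort(a, 3);
        System.out.println(Arrays.toString(a));
    }
}
```

The method relies on each pass being stable: keys that agree on character d stay in the order established by the passes on characters d+1 and beyond.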
algorithm        guarantee      average        extra space   assumptions on keys
insertion sort   N^2/2          N^2/4          no            Comparable
selection sort   N^2/2          N^2/2          no            Comparable
mergesort        N lg N         N lg N         N             Comparable
quicksort        1.39 N lg N    1.39 N lg N    c lg N        Comparable
LSD radix sort   WN             WN             N + R         digital
Sorting Challenge
Problem: sort a huge commercial database on a fixed-length key field
Ex: account number, date, SS number
B14-99-8765
756-12-AD46
CX6-92-0112
332-WX-9877
375-99-QWAX
CV2-59-0221
387-SS-0321
KJ-00-12388
715-YT-013C
MJ0-PP-983F
908-KK-33TY
BBN-63-23RE
48G-BM-912D
982-ER-9P1B
WBL-37-PB81
810-F4-J87Q
LE9-N8-XX76
908-KK-33TY
B14-99-8765
CX6-92-0112
CV2-59-0221
332-WX-23SQ
332-6A-9877
Sorting Challenge
Problem: sort huge files of random 128-bit numbers
Ex: supercomputer sort, internet router
Which sorting method to use?
1. insertion sort
2. mergesort
3. quicksort
4. LSD radix sort
[Figure: punched-card equipment: card punch, punched cards, card reader, mainframe, line printer, card sorter.]
MSD radix sort
[Figure: MSD radix sort trace. Key-indexed counting on the sort key (a[], temp[], count[]) splits the array into subfiles; sort these independently (recursive).]
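The recursive scheme can be sketched as below. This is an illustrative version under stated assumptions, not the slide's listing: the class name is hypothetical, R = 256 is assumed, and end of string is treated as a character value of -1 so that variable-length strings work.

```java
import java.util.Arrays;

public class MSDSketch {
    private static final int R = 256; // assumed alphabet size (extended ASCII)

    // Character d of s, or -1 if s has no character d (end of string).
    private static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    public static void sort(String[] a) {
        sort(a, new String[a.length], 0, a.length, 0);
    }

    // Sorts a[lo..hi) assuming all keys agree on their first d characters.
    private static void sort(String[] a, String[] temp, int lo, int hi, int d) {
        if (hi - lo <= 1) return;
        int[] count = new int[R + 2]; // one extra slot for end of string
        for (int i = lo; i < hi; i++) count[charAt(a[i], d) + 2]++;           // count frequencies
        for (int r = 0; r < R + 1; r++) count[r + 1] += count[r];              // compute cumulates
        for (int i = lo; i < hi; i++) temp[count[charAt(a[i], d) + 1]++] = a[i]; // move records
        for (int i = lo; i < hi; i++) a[i] = temp[i - lo];                     // copy back
        // After the move, count[r]..count[r+1] delimits the subfile for character r.
        for (int r = 0; r < R; r++)
            sort(a, temp, lo + count[r], lo + count[r + 1], d + 1);
    }

    public static void main(String[] args) {
        String[] a = {"she", "sells", "seashells", "by", "the", "sea", "shore", "the", "shells"};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Each recursive call allocates its own count[] of R + 2 entries, which becomes costly for many small subfiles or deep recursion; a practical version would likely switch to a simpler sort for small subfiles.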
[Figure: MSD radix sort example with \0 end-of-string markers: 19/64 (30%) of the characters examined.]
Disadvantage of quicksort.
N lg N, not linear.
Has to rescan long keys for compares
[but stay tuned]
3-way radix quicksort
[Figure: trace of qsortX(0, 12, 0). Partition on the 0th char using b (swap b's to the ends, as in 3-way quicksort). The 3-way partition on b leads to the recursive calls qsortX(0, 2, 0), qsortX(2, 5, 1), and qsortX(5, 12, 0).]
[Code figure: listing annotated with the steps: 4-way partition with equals at ends; swap equals back to middle; sort 3 pieces recursively.]
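A minimal sketch of 3-way radix quicksort follows. Note the substitution: instead of the 4-way partition with equals at the ends described in the annotations, this version uses the simpler single-pass 3-way partition with lt and gt pointers, which produces the same three pieces. The class name is hypothetical and end of string is treated as character value -1.

```java
import java.util.Arrays;

public class Quick3StringSketch {
    // Character d of s, or -1 if s has no character d (end of string).
    private static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    public static void sort(String[] a) {
        sort(a, 0, a.length - 1, 0);
    }

    // Sorts a[lo..hi] (inclusive) assuming all keys agree on their first d characters.
    private static void sort(String[] a, int lo, int hi, int d) {
        if (hi <= lo) return;
        int lt = lo, gt = hi, i = lo + 1;
        int v = charAt(a[lo], d); // partitioning character
        while (i <= gt) {
            int t = charAt(a[i], d);
            if (t < v) exch(a, lt++, i++);
            else if (t > v) exch(a, i, gt--);
            else i++;
        }
        // Now a[lo..lt-1] < v, a[lt..gt] == v, a[gt+1..hi] > v on character d.
        sort(a, lo, lt - 1, d);
        if (v >= 0) sort(a, lt, gt, d + 1); // equal piece: move on to the next character
        sort(a, gt + 1, hi, d);
    }

    private static void exch(String[] a, int i, int j) {
        String t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        String[] a = {"she", "sells", "seashells", "by", "the", "sea", "shore", "the", "shells"};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Only the middle piece advances to character d + 1, so characters already known to be equal are not rescanned.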
Application: LRS
a a c a a g t t t a c a a g c a t g a t g c t g t a c t a
g g a g a g t t a t a c t g g t c g t c a a a c c t g a a
c c t a a t c c t t g t g t g t a c a c a c a c t a c t a
c t g t c g t c g t c a t a t a t c g a g a t c a t c g a
a c c g g a a g g c c g g a c a a g g c g g g g g g t a t
a g a t a g a t a g a c c c c t a g a t a c a c a t a c a
t a g a t c t a g c t a g c t a g c t c a t c g a t a c a
c a c t c t c a c a c t c a a g a g t t a t a c t g g t c
a a c a c a c t a c t a c g a c a g a c g a c c a a c c a
g a c a g a a a a a a a a c t c t a t a t c t a t a a a a
String processing
String. Sequence of characters.
Important fundamental abstraction:
natural languages, Java programs, genomic sequences, ...
String s = "strings";          // s = "strings"
char   c = s.charAt(2);        // c = 'r'
String t = s.substring(2, 6);  // t = "ring"
String u = s + t;              // u = "stringsring"
[Code figure: java.lang.String implements Comparable<String>; its fields hold the characters, the index of the first char into the array, the length of the string, and a cache of hashCode().]
Applications
bioinformatics.
cryptanalysis.
Brute force.
Try all indices i and j for start of possible match, and check.
Time proportional to M N^2, where M is length of longest match.
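The brute-force approach can be sketched as follows; the class and method names are hypothetical. For every pair of start indices i < j it extends the match as far as possible, which gives the M N^2 behavior described above.

```java
public class BruteLRS {
    // Returns a longest substring that occurs at two different start positions
    // (occurrences may overlap); returns "" if there is none.
    public static String lrs(String s) {
        int N = s.length();
        int bestStart = 0, bestLen = 0;
        for (int i = 0; i < N; i++) {
            for (int j = i + 1; j < N; j++) {
                // Extend the match starting at i and j character by character.
                int len = 0;
                while (j + len < N && s.charAt(i + len) == s.charAt(j + len)) len++;
                if (len > bestLen) {
                    bestLen = len;
                    bestStart = i;
                }
            }
        }
        return s.substring(bestStart, bestStart + bestLen);
    }

    public static void main(String[] args) {
        System.out.println(lrs("aacaagtttacaagc"));
    }
}
```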
[Figure: the suffixes of the input string alongside the sorted suffixes.]
[Code figure: listing annotated with the steps:]
read input
create suffixes (linear time)
sort suffixes: Arrays.sort(suffixes);
find LCP
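A minimal sketch of the suffix-sort approach, with hypothetical class and method names: build the suffixes, sort them with Arrays.sort, then take the longest common prefix (LCP) over adjacent pairs. One caveat to check against your Java version: in recent releases String.substring copies its characters, so creating all suffixes this way is quadratic in time and space rather than linear.

```java
import java.util.Arrays;

public class SuffixLRS {
    // Length of the longest common prefix of s and t.
    private static int lcp(String s, String t) {
        int n = Math.min(s.length(), t.length());
        for (int i = 0; i < n; i++)
            if (s.charAt(i) != t.charAt(i)) return i;
        return n;
    }

    public static String lrs(String s) {
        int N = s.length();

        // Create suffixes.
        String[] suffixes = new String[N];
        for (int i = 0; i < N; i++) suffixes[i] = s.substring(i);

        // Sort suffixes: repeated substrings become prefixes of adjacent suffixes.
        Arrays.sort(suffixes);

        // Find the longest LCP among adjacent suffixes.
        String best = "";
        for (int i = 0; i + 1 < N; i++) {
            int len = lcp(suffixes[i], suffixes[i + 1]);
            if (len > best.length()) best = suffixes[i].substring(0, len);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(lrs("aacaagtttacaagc"));
    }
}
```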
Sorting Challenge
Problem: suffix sort a long string
Ex. Moby Dick ~1.2 million chars
algorithm          time to suffix-sort Moby Dick (seconds)
brute-force        36.000 (est.)
quicksort          9.5
LSD                not fixed-length
MSD                395
(label missing)    6.8
(label missing)    2.8
Input: "abcdefghiabcdefghi"
Sorted suffixes:
abcdefghi
abcdefghiabcdefghi
bcdefghi
bcdefghiabcdefghi
cdefghi
cdefghiabcdefghi
defghi
defghiabcdefghi
efghi
efghiabcdefghi
fghi
fghiabcdefghi
ghi
ghiabcdefghi
hi
hiabcdefghi
i
iabcdefghi
Running time
finishes after lg N phases
obvious upper bound on growth of total time: O(N (lg N)^2)
actual growth of total time (proof omitted): ~N lg N.
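To make the phase structure concrete, here is a sketch of a phase-doubling suffix sort. It is an assumption-laden illustration rather than the lecture's algorithm: each phase re-sorts with a general-purpose comparison sort, so it matches the obvious O(N (lg N)^2) bound, not the ~N lg N behavior. After a phase, rank[i] gives the position of suffix i when only its first 2h characters are considered, so two suffixes can be compared on 2h more characters by looking up rank[i + 2h] instead of rescanning. All names are hypothetical.

```java
import java.util.Arrays;

public class PrefixDoublingSketch {
    // Returns the start indices of the suffixes of s in sorted order.
    public static int[] suffixArray(String s) {
        final int n = s.length();
        Integer[] index = new Integer[n];
        int[] rank = new int[n];
        for (int i = 0; i < n; i++) { index[i] = i; rank[i] = s.charAt(i); }

        for (int h = 1; n > 1; h *= 2) {
            final int[] r = rank;
            final int step = h;
            // Compare suffixes on their first 2h characters using ranks from the previous phase.
            Arrays.sort(index, (x, y) -> {
                if (r[x] != r[y]) return Integer.compare(r[x], r[y]);
                int rx = x + step < n ? r[x + step] : -1;
                int ry = y + step < n ? r[y + step] : -1;
                return Integer.compare(rx, ry);
            });
            // Recompute ranks; suffixes that tie on 2h characters share a rank.
            int[] next = new int[n];
            next[index[0]] = 0;
            for (int i = 1; i < n; i++) {
                int p = index[i - 1], c = index[i];
                int rp = p + step < n ? r[p + step] : -1;
                int rc = c + step < n ? r[c + step] : -1;
                boolean same = r[p] == r[c] && rp == rc;
                next[c] = next[p] + (same ? 0 : 1);
            }
            rank = next;
            if (rank[index[n - 1]] == n - 1) break; // all ranks distinct: done
        }

        int[] sa = new int[n];
        for (int i = 0; i < n; i++) sa[i] = index[i];
        return sa;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(suffixArray("banana")));
    }
}
```

The number of characters compared doubles each phase, which is why the loop finishes after about lg N phases.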
Example input: babaaaabcbabaaaaa0
[Figure: trace of one phase on the suffixes of the example input, showing the suffixes, the same suffixes in sorted order with their indices, and the inverse array.]
[Figures: three further phases of the same trace, each showing the suffixes, the sorted order with indices, and the inverse array.]
[Figure: the same trace annotated with the index computations 0 + 4 = 4 and 9 + 4 = 13, alongside the sorted suffixes and the inverse array.]
                   time to suffix-sort (seconds)
algorithm          Moby Dick        AesopAesop
brute-force        36.000 (est.)    4000 (est.)
quicksort          9.5              167
MSD                395              out of memory
(label missing)    6.8              162
(label missing)    2.8              400
Manber MSD         17               8.5
Annotations: counters in deep recursion; only 2 keys in subfiles with long matches.