Wei-Mei Chen
Introduction
Motivation
• Analysis:
  • Unsuccessful search: n + 1 comparisons ⇒ O(n)
  • Average successful search:
    ∑_{i=1}^{n} i/n = (n + 1)/2 = O(n)
Binary Search
Interpolation Search
i = (k − a[1].key) / (a[n].key − a[1].key) × n
List Verification
Verifying Two Unsorted Lists
Verifying Two Sorted Lists
An Example for Stable
1. Schedule
2. Destination
Categories of Sorting Method
Insertion Sort
Insertion Sort
5 2 4 6 1 3   (initial)
2 5 4 6 1 3
2 4 5 6 1 3
2 4 5 6 1 3
1 2 4 5 6 3
1 2 3 4 5 6   (sorted)

Loop invariant: At the start of each iteration of the for loop,
A[1..j − 1] contains its original elements, but sorted.
Analysis of Insertion Sort
Quick Sort
Quick Sort
Hoare [1962]
• the best practical sorting algorithm
• in place sorting
• worst-case running time: Θ(n²) on n numbers
• average-case running time: Θ(n lg n)
Description of Quick Sort
Divide-and-conquer paradigm
• Divide:
  Partition a[p..r] into a[p..q − 1] ≤ x, a[q] = x (the pivot),
  and a[q + 1..r] > x.
• Conquer:
Sort the two subarrays a[p..q − 1] and a[q + 1..r ] by recursive calls
to quick sort.
• Combine:
Since the subarrays are sorted in place, no work is needed to
combine them
Algorithm quickSort
Example (i and j scan inward from the two ends):
26 5 37 1 61 11 59 15 48 19
Worst-Case Partitioning
T(n) = T(n − 1) + T(0) + (n − 1)
     = T(n − 2) + (n − 2) + (n − 1) + 2T(0)
     = · · ·
     = T(0) + (0 + 1 + 2 + 3 + · · · + (n − 1)) + nT(0)
     = Θ(n²)

(The recursion tree degenerates into a chain: each partition leaves
one side with 0 elements and the other with n − 1, n − 2, … elements.)
Best-Case Partitioning
T(n) = 2T(n/2) + Θ(n) = Θ(n lg n)
(Every level of the recursion tree costs Θ(n), and a balanced
partition gives Θ(lg n) levels.)
Average-Case Running Time
T(n) = (1/n) ∑_{p=1}^{n} [T(p − 1) + T(n − p)] + n − 1

Since ∑_{p=1}^{n} [T(p − 1) + T(n − p)] = 2 ∑_{p=1}^{n} T(p − 1), we have

T(n) = (2/n) ∑_{p=1}^{n} T(p − 1) + n − 1.

"×n":       nT(n) = 2 ∑_{p=1}^{n} T(p − 1) + n(n − 1)                    (1)
"n → n−1":  (n − 1)T(n − 1) = 2 ∑_{p=1}^{n−1} T(p − 1) + (n − 1)(n − 2)  (2)

Subtracting (2) from (1): nT(n) = (n + 1)T(n − 1) + 2(n − 1), i.e.

T(n)/(n + 1) = T(n − 1)/n + 2(n − 1)/(n(n + 1))
T(n)/(n + 1) = T(n − 1)/n + 2(n − 1)/(n(n + 1))
T(n − 1)/n = T(n − 2)/(n − 1) + 2(n − 2)/((n − 1)n)
  ⋮
T(2)/3 = T(1)/2 + 2/6

Note: (n − 1)/(n(n + 1)) = 2/(n + 1) − 1/n

summing ⇒ T(n)/(n + 1) = 4/(n + 1) + 2(1/n + 1/(n − 1) + · · · + 1/3) − 1
                       = 2 ∑_{k=3}^{n} 1/k + 4/(n + 1) − 1
                       = 2H_n + 4/(n + 1) − 4

T(n) ≈ 2(n + 1) ln n ∈ Θ(n lg n)
Harmonic Numbers
H_n = ∑_{k=1}^{n} 1/k = 1 + 1/2 + 1/3 + · · · + 1/n
    = ln n + γ + 1/(2n) − 1/(12n²) + 1/(120n⁴) − 1/(252n⁶) + · · ·

where γ ≈ 0.5772156649 (Euler's constant)
Variations of Quick Sort
1. Median of three:
Pick the median of the first, middle, and last keys in the
current sublist as the pivot. Thus,
pivot = median{Kl , K(l+r )/2 , Kr } .
⇒ Use the median as the pivot.
2. Randomized Quick Sort:
Pick a key randomly.
Optimal Sorting Time
How Fast Can We Sort?
Decision Trees
[Figure: decision tree for sorting three elements. The root compares
1:2; each internal node i:j compares the i-th and j-th keys, and each
leaf is a permutation giving the sorted order. Leaves: ⟨1,2,3⟩,
⟨1,3,2⟩, ⟨3,1,2⟩, ⟨2,1,3⟩, ⟨2,3,1⟩, ⟨3,2,1⟩.]
Lower Bounds
Lower Bound for Sorting
Theorem
Any algorithm that sorts only by comparisons must have a
worst-case computing time of Ω(n log n).
Merging Two Sorted Lists
Iterative Merge Sort
void mergePass(element initList[], element mergedList[], int n, int s)
{ /* one pass of merge sort: merge adjacent sublists of size s from
     initList into mergedList; n is the number of records */
  int i, j;
  for (i = 1; i <= n - 2*s + 1; i += 2*s)
    merge(initList, mergedList, i, i+s-1, i+2*s-1);
  if (i + s - 1 < n)        /* one full sublist and a partial one left */
    merge(initList, mergedList, i, i+s-1, n);
  else                      /* at most one sublist left: copy it over  */
    for (j = i; j <= n; j++)
      mergedList[j] = initList[j];
}
Example: Recursive Merge Sort

Divide:   5 2 4 7 1 3 2 6
          5 2 4 7 | 1 3 2 6
          5 2 | 4 7 | 1 3 | 2 6
          5 | 2 | 4 | 7 | 1 | 3 | 2 | 6
Merge:    2 5 | 4 7 | 1 3 | 2 6
          2 4 5 7 | 1 2 3 6
          1 2 2 3 4 5 6 7
Dividing Phase
5 2 4 7 1 3 2 6
5 2 4 7 | 1 3 2 6
5 2 | 4 7 | 1 3 | 2 6
5 | 2 | 4 | 7 | 1 | 3 | 2 | 6
Divide
Merging Phase
1 2 2 3 4 5 6 7
2 4 5 7 | 1 2 3 6
2 5 | 4 7 | 1 3 | 2 6
5 | 2 | 4 | 7 | 1 | 3 | 2 | 6
Conquer and Combine
Dividing Phase: Recursive Merge Sort
Merging Phase: Recursive Merge Sort
int listMerge(element a[], int link[], int start1, int start2)
{ /* merge two sorted chains beginning at start1 and start2;
     link[0] serves as a header node for the result */
  int last1, last2, lastResult = 0;
  for (last1 = start1, last2 = start2; last1 && last2; )
    if (a[last1] <= a[last2]) {
      link[lastResult] = last1;
      lastResult = last1;
      last1 = link[last1];
    }
    else {
      link[lastResult] = last2;
      lastResult = last2;
      last2 = link[last2];
    }
  /* attach the remaining records */
  if (last1 == 0) link[lastResult] = last2;
  else link[lastResult] = last1;
  return link[0];
}
• Time Complexity:
    T(n) = O(1)             if n = 1
         = 2T(n/2) + O(n)   if n > 1
  ⇒ T(n) = O(n log n)
Variations: Natural Merge Sort
Heap Sort
Recall: Heaps
[Figure: a max heap of height h stored level by level in an array;
the children of node i are at positions 2i and 2i + 1.]

position:  1  2  3  4  5  6  7  8  9 10
key:      16 14 10  8  7  9  3  2  4  1
Inserting 21 into a Max Heap
[Figure: starting from the heap 20 / 15 2 / 14 4, the new key 21
bubbles up past 2 and then past 20, giving 21 / 15 20 / 14 4 2.]
Deleting from a Max Heap
[Figure: deleting the max (21) from 21 / 15 20 / 14 4 2: the last
key 2 moves to the root and trickles down, giving 20 / 15 2 / 14 4;
deleting again gives 15 / 14 2 / 4.]

Building a Max Heap (1/2)

[Figure: starting from the array 4 1 3 2 16 9 10 14 8 7, adjust is
applied to the internal nodes from ⌊n/2⌋ down to 1.]
Building a Max Heap (2/2)
[Figure: the final adjust calls sink 4 below 16 and 14, producing
the max heap 16 / 14 10 / 8 7 9 3 / 2 4 1.]
Analysis of adjust
∑_{i=0}^{L−1} 2(L − i)2^i = 2L ∑_{i=0}^{L−1} 2^i − 2 ∑_{i=0}^{L−1} i·2^i
                          = 2L(2^L − 1) − 2(2^L(L − 2) + 2)
                          = 4·2^L − 2L − 4
                          = 4·2^⌊lg n⌋ − 2⌊lg n⌋ − 4
                          ≤ cn   (if c ≥ 4)

Note: ∑_{i=0}^{L−1} i·2^i = 2^L(L − 2) + 2
Compute ∑_{i=0}^{L−1} i·2^i

Let S = ∑_{i=1}^{L−1} i·2^i. Then
 S = 1·2¹ + 2·2² + 3·2³ + 4·2⁴ + · · · + (L − 1)·2^{L−1}
2S =        1·2² + 2·2³ + 3·2⁴ + · · · + (L − 2)·2^{L−1} + (L − 1)·2^L
⇒ S − 2S = 2 + 2² + 2³ + · · · + 2^{L−1} − (L − 1)·2^L
⇒ S = (L − 1)·2^L − 2(1 − 2^{L−1})/(1 − 2)
    = 2^L(L − 1) + 2 − 2·2^{L−1}
    = 2^L(L − 2) + 2
Algorithm heapSort
Example: Heap Sort
[Figure: repeated delete-max steps. After each swap of the root with
the last leaf, adjust restores the heap, placing 16, then 14, 10, 9,
8, 7, … at the end in turn until the array reads
1 2 3 4 7 8 9 10 14 16.]
Analysis of Heap Sort
∑_{i=1}^{n−1} 2⌊lg i⌋ = 2(n⌊lg n⌋ − 2^{⌊lg n⌋+1} + 2)
                      = 2n⌊lg n⌋ − 4·2^⌊lg n⌋ + 4
                      = 2n⌊lg n⌋ − cn + 4   (2 ≤ c ≤ 4)
                      = Θ(n lg n)
Compute ∑_{i=1}^{n−1} ⌊lg i⌋

⌊lg 1⌋ = 0                                  ⇒ 1 value
⌊lg 2⌋ = ⌊lg 3⌋ = 1                         ⇒ 2 values
⌊lg 4⌋ = ⌊lg 5⌋ = ⌊lg 6⌋ = ⌊lg 7⌋ = 2       ⇒ 4 values   (k·2^k per level)
⌊lg 8⌋ = · · · = ⌊lg 15⌋ = 3                ⇒ 8 values

∑_{k=0}^{m−1} k·2^k = 2^m(m − 2) + 2
Summary of Internal Sorting
Comparison
Radix Sort
Sorting Several Keys
• LSD and MSD only define the order in which the keys are to be
  sorted; they do not specify how each key is sorted.
• Bin sort can be used for sorting on each key. The complexity of
  bin sort is O(n) if there are n bins.
• LSD and MSD can be used even when there is only one key.
• If the keys are numeric, then each decimal digit may be regarded
  as a subkey. ⇒ Radix sort.
• In radix sort, we decompose the sort key using some radix r.
  • The number of bins needed is r.
  • Each key has d digits in the range 0 to r − 1.
Radix Sort
r [0] r [1] r [2] r [3] r [4] r [5] r [6] r [7] r [8] r [9]
33 859
f [0] f [1] f [2] f [3] f [4] f [5] f [6] f [7] f [8] f [9]
64/79
Example: Radix Sort (2/3)
r [0] r [1] r [2] r [3] r [4] r [5] r [6] r [7] r [8] r [9]
f [0] f [1] f [2] f [3] f [4] f [5] f [6] f [7] f [8] f [9]
65/79
Example: Radix Sort (3/3)
r [0] r [1] r [2] r [3] r [4] r [5] r [6] r [7] r [8] r [9]
93
55
33 271
f [0] f [1] f [2] f [3] f [4] f [5] f [6] f [7] f [8] f [9]
66/79
External Sorting
External Sorting
• Some lists are too large to fit in the memory of a computer, so
  internal sorting is not possible.
• The records are stored on disk (tape, etc.). The system retrieves a
  block of data from the disk at a time; a block contains multiple
  records.
• The most popular method for sorting on external storage devices is
  merge sort:
  • Segments of the input list are sorted.
  • Sorted segments (called runs) are written onto external storage.
  • Runs are merged until only one run is left.
• Three factors contribute to the read/write time of a disk:
  • seek time
  • latency time
  • transmission time
Example: External Sort (1/2)
Example: External Sort (2/2)
Analysis:

  Operation                                                  Time
  1. read 18 blocks of input (18 t_IO), internally sort      36 t_IO + 6 t_IS
     (6 t_IS), write 18 blocks (18 t_IO)
  2. merge runs 1 to 6 in pairs                              36 t_IO + 4500 t_m
  3. merge two runs of 1500 records each, 12 blocks          24 t_IO + 3000 t_m
  4. merge one run of 3000 records with one run of           36 t_IO + 4500 t_m
     1500 records
Optimal Merging of Runs
[Figure: merge trees for runs of lengths 2, 4, 5, and 15, showing
that the order in which runs are merged changes the total cost.]
Huffman Algorithm
Example: Huffman Tree
[Figure: building a Huffman tree from the weights 2, 3, 5, 7, 9, 13:
(a) combine 2 + 3 = 5; (b) 5 + 5 = 10; (c) 7 + 9 = 16;
(d) 10 + 13 = 23; (e) 16 + 23 = 39.]
Application: File Compression
Tree Representation of Code
Cost: ∑_i d_i f_i
  d_i : depth of code i
  f_i : frequency of code i

[Figures: two binary code trees over the characters a, e, i, s, t,
sp, nl; every left branch is labeled 0 and every right branch 1, so
the root-to-leaf path spells out each character's code.]
File Compression Problem
Huffman’s Codes
Character Frequency
a 10
e 15
i 12
s 3
t 4
space 13
newline 1
Steps of Huffman’s Algorithm
[Figure: successive merges on the weights 1(nl) 3(s) 4(t) 10(a)
12(i) 13(sp) 15(e): (a) initial forest; (b) nl + s → 4;
(c) 4 + t → 8; (d) 8 + a → 18; (e) i + sp → 25; (f) 18 + e → 33;
finally 25 + 33 → 58.]
Huffman Code Assignment
100010000101 ⇒ iase