
Mergesort and Sorting Stability Explained

The document discusses stability in sorting algorithms. It provides an example of sorting students by name then by section to illustrate stability. A stable sort preserves the relative order of items with equal keys. Insertion sort is proven to be a stable sorting algorithm through an example that shows items are not reordered if their keys are equal during the sorting process.

Uploaded by

Dong Raiser
Copyright
© © All Rights Reserved

Algorithms
ROBERT SEDGEWICK | KEVIN WAYNE

2.2 MERGESORT
‣ mergesort
‣ bottom-up mergesort
‣ sorting complexity
‣ comparators
‣ stability

Stability

A typical application. First, sort by name; then sort by section.

[Link](a, new [Link]());                     [Link](a, new [Link]());

Andrews  3  A  664-480-0023  097 Little      Furia    1  A  766-093-9873  101 Brown
Battle   4  C  874-088-1212  121 Whitman     Rohde    2  A  232-343-5555  343 Forbes
Chen     3  A  991-878-4944  308 Blair       Chen     3  A  991-878-4944  308 Blair
Fox      3  A  884-232-5341  11 Dickinson    Fox      3  A  884-232-5341  11 Dickinson
Furia    1  A  766-093-9873  101 Brown       Andrews  3  A  664-480-0023  097 Little
Gazsi    4  B  766-093-9873  101 Brown       Kanaga   3  B  898-122-9643  22 Brown
Kanaga   3  B  898-122-9643  22 Brown        Gazsi    4  B  766-093-9873  101 Brown
Rohde    2  A  232-343-5555  343 Forbes      Battle   4  C  874-088-1212  121 Whitman

@#%&@! Students in section 3 no longer sorted by name.

A stable sort preserves the relative order of items with equal keys.

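The two-pass idea above can be tried with a sort that is documented to be stable. The sketch below is an illustration, not the slide's code: the class `StableSortDemo`, the nested `Student` type, and the method `byNameThenSection` are hypothetical names, and it relies on `java.util.Arrays.sort` for object arrays, which the Java API documents as stable.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo class; only name and section are modeled.
class StableSortDemo {
    static final class Student {
        final String name;
        final int section;
        Student(String name, int section) { this.name = name; this.section = section; }
    }

    // Sort a copy by name, then by section. Because the second sort is stable,
    // students within one section stay in name order.
    static Student[] byNameThenSection(Student[] students) {
        Student[] a = students.clone();
        Arrays.sort(a, Comparator.comparing((Student s) -> s.name));
        Arrays.sort(a, Comparator.comparingInt((Student s) -> s.section));
        return a;
    }

    public static void main(String[] args) {
        Student[] a = {
            new Student("Kanaga", 3), new Student("Furia", 1),
            new Student("Chen", 3),   new Student("Rohde", 2),
            new Student("Andrews", 3), new Student("Battle", 4),
            new Student("Fox", 3),    new Student("Gazsi", 4)
        };
        for (Student s : byNameThenSection(a))
            System.out.println(s.section + " " + s.name);
    }
}
```

With a stable second pass, the section 3 students come out as Andrews, Chen, Fox, Kanaga, which is the order the first table would lead a reader to expect.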
Stability

Q. Which sorts are stable?
A. Need to check algorithm (and implementation).

[Figure: Stability when sorting on a second key. Three columns of city/time records: sorted by time; sorted by location (not stable), where records within a city are no longer sorted by time; sorted by location (stable), where records within a city are still sorted by time.]

Stability: insertion sort

Proposition. Insertion sort is stable.

public class Insertion
{
   public static void sort(Comparable[] a)
   {
      int N = [Link];
      for (int i = 0; i < N; i++)
         for (int j = i; j > 0 && less(a[j], a[j-1]); j--)
            exch(a, j, j-1);
   }
}

 i  j   0   1   2   3   4
 0  0   B1  A1  A2  A3  B2
 1  0   A1  B1  A2  A3  B2
 2  1   A1  A2  B1  A3  B2
 3  2   A1  A2  A3  B1  B2
 4  4   A1  A2  A3  B1  B2
        A1  A2  A3  B1  B2

Pf. Equal items never move past each other.

Stability: selection sort

Proposition. Selection sort is not stable.

public class Selection
{
   public static void sort(Comparable[] a)
   {
      int N = [Link];
      for (int i = 0; i < N; i++)
      {
         int min = i;
         for (int j = i+1; j < N; j++)
            if (less(a[j], a[min]))
               min = j;
         exch(a, i, min);
      }
   }
}

 i  min   0   1   2
 0   2    B1  B2  A
 1   1    A   B2  B1
 2   2    A   B2  B1
          A   B2  B1

Pf by counterexample. Long-distance exchange can move one equal item past another one.

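The two propositions above can be checked by running both sorts on the slides' tagged inputs. This is a minimal sketch under stated assumptions: the class name `StabilityCheck` and the comparator `BY_KEY` are hypothetical, and items are strings such as "B1" whose first character is the key and whose digit only records the original order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical harness: keys are compared by first character only.
class StabilityCheck {
    static final Comparator<String> BY_KEY =
        (x, y) -> Character.compare(x.charAt(0), y.charAt(0));

    // Insertion sort: adjacent exchanges only, and only on strictly smaller keys.
    static void insertionSort(String[] a, Comparator<String> c) {
        for (int i = 0; i < a.length; i++)
            for (int j = i; j > 0 && c.compare(a[j], a[j-1]) < 0; j--)
                exch(a, j, j-1);
    }

    // Selection sort: one long-distance exchange per pass.
    static void selectionSort(String[] a, Comparator<String> c) {
        for (int i = 0; i < a.length; i++) {
            int min = i;
            for (int j = i+1; j < a.length; j++)
                if (c.compare(a[j], a[min]) < 0) min = j;
            exch(a, i, min);
        }
    }

    private static void exch(String[] a, int i, int j) {
        String t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        String[] s = {"B1", "B2", "A"};
        selectionSort(s, BY_KEY);
        System.out.println(Arrays.toString(s));   // B2 now precedes B1

        String[] t = {"B1", "A1", "A2", "A3", "B2"};
        insertionSort(t, BY_KEY);
        System.out.println(Arrays.toString(t));   // equal keys keep their order
    }
}
```

The selection sort run reproduces the counterexample in the trace (A B2 B1), while the insertion sort run ends with A1 A2 A3 B1 B2.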
Stability: shellsort

Proposition. Shellsort is not stable.

public class Shell
{
   public static void sort(Comparable[] a)
   {
      int N = [Link];
      int h = 1;
      while (h < N/3) h = 3*h + 1;
      while (h >= 1)
      {
         for (int i = h; i < N; i++)
         {
            // j >= h (not j > h), so that a[h] is also compared with a[0]
            for (int j = i; j >= h && less(a[j], a[j-h]); j -= h)
               exch(a, j, j-h);
         }
         h = h/3;
      }
   }
}

 h   0   1   2   3   4
     B1  B2  B3  B4  A1
 4   A1  B2  B3  B4  B1
 1   A1  B2  B3  B4  B1
     A1  B2  B3  B4  B1

Pf by counterexample. Long-distance exchanges.

Stability: mergesort

Proposition. Mergesort is stable.

public class Merge
{
   private static void merge(...)
   { /* as before */ }

   private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi)
   {
      if (hi <= lo) return;
      int mid = lo + (hi - lo) / 2;
      sort(a, aux, lo, mid);
      sort(a, aux, mid+1, hi);
      merge(a, aux, lo, mid, hi);
   }

   public static void sort(Comparable[] a)
   { /* as before */ }
}

Pf. Suffices to verify that merge operation is stable.

Stability: mergesort

Proposition. Merge operation is stable.

private static void merge(...)
{
   for (int k = lo; k <= hi; k++)
      aux[k] = a[k];

   int i = lo, j = mid+1;
   for (int k = lo; k <= hi; k++)
   {
      if      (i > mid)              a[k] = aux[j++];
      else if (j > hi)               a[k] = aux[i++];
      else if (less(aux[j], aux[i])) a[k] = aux[j++];
      else                           a[k] = aux[i++];
   }
}

 0   1   2   3   4   5   6   7   8   9   10
 A1  A2  A3  B   D   A4  A5  C   E   F   G

Pf. Takes from left subarray if equal keys.

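The proof above hinges on the strict comparison in the merge: the right item is taken only when it is strictly less, so ties go to the left subarray. The following self-contained sketch puts that merge into a complete top-down mergesort; the class name `MergeStabilityDemo` and the use of a `Comparator` parameter are assumptions made for the illustration, not part of the slides.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical comparator-based variant of the slide's Merge class.
class MergeStabilityDemo {
    static <T> void sort(T[] a, Comparator<? super T> c) {
        T[] aux = a.clone();                  // scratch array of the same type
        sort(a, aux, 0, a.length - 1, c);
    }

    private static <T> void sort(T[] a, T[] aux, int lo, int hi, Comparator<? super T> c) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid, c);
        sort(a, aux, mid + 1, hi, c);
        merge(a, aux, lo, mid, hi, c);
    }

    private static <T> void merge(T[] a, T[] aux, int lo, int mid, int hi,
                                  Comparator<? super T> c) {
        for (int k = lo; k <= hi; k++)
            aux[k] = a[k];
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)                       a[k] = aux[j++];
            else if (j > hi)                        a[k] = aux[i++];
            else if (c.compare(aux[j], aux[i]) < 0) a[k] = aux[j++]; // strictly less only
            else                                    a[k] = aux[i++]; // ties: take from left
        }
    }

    public static void main(String[] args) {
        // First character is the key; the digit records the original order.
        String[] a = {"A1", "A2", "A3", "B", "D", "A4", "A5", "C", "E", "F", "G"};
        sort(a, (x, y) -> Character.compare(x.charAt(0), y.charAt(0)));
        System.out.println(Arrays.toString(a));
    }
}
```

Changing the strict test to a non-strict one (taking from the right on ties) would still sort correctly but would no longer be stable.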
Sorting summary

            inplace?  stable?  best        average   worst      remarks
selection   ✔                  ½ N^2       ½ N^2     ½ N^2      N exchanges
insertion   ✔         ✔        N           ¼ N^2     ½ N^2      use for small N or partially ordered
shell       ✔                  N log_3 N   ?         c N^(3/2)  tight code; subquadratic
merge                 ✔        ½ N lg N    N lg N    N lg N     N log N guarantee; stable
timsort               ✔        N           N lg N    N lg N     improves mergesort when preexisting order
?           ✔         ✔        N           N lg N    N lg N     holy sorting grail

Algorithms, FOURTH EDITION
ROBERT SEDGEWICK | KEVIN WAYNE

2.3 QUICKSORT
‣ quicksort
‣ selection
‣ duplicate keys
‣ system sorts

Two classic sorting algorithms: mergesort and quicksort

Critical components in the world’s computational infrastructure.
・Full scientific understanding of their properties has enabled us to develop them into practical system sorts.
・Quicksort honored as one of top 10 algorithms of 20th century in science and engineering.

Mergesort. [last lecture]

...

Quicksort. [this lecture]

...

Quicksort

Basic plan.
・Shuffle the array.
・Partition so that, for some j
   – entry a[j] is in place
   – no larger entry to the left of j
   – no smaller entry to the right of j
・Sort each subarray recursively.

input       Q U I C K S O R T E X A M P L E
shuffle     K R A T E L E P U I M Q C X O S
partition   E C A I E K L P U T M Q R X O S    (partitioning item K: not greater to its left, not less to its right)
sort left   A C E E I K L P U T M Q R X O S
sort right  A C E E I K L M O P Q R S T U X
result      A C E E I K L M O P Q R S T U X

Quicksort overview

Tony Hoare

・Invented quicksort to translate Russian into English.
・[ but couldn't explain his algorithm or implement it! ]
・Learned Algol 60 (and recursion).
・Implemented quicksort.

1980 Turing Award

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

“I call it my billion-dollar mistake. It was the invention of the null reference in 1965… This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.”

Bob Sedgewick

・Refined and popularized quicksort.
・Analyzed quicksort.

[Figure: first pages of two papers by Robert Sedgewick. "Implementing Quicksort Programs" (Programming Techniques; S. L. Graham, R. L. Rivest, Editors; Brown University) and "The Analysis of Quicksort Programs" (Acta Informatica 7, 327-355 (1977), Springer-Verlag).]

Quicksort partitioning demo

Repeat until i and j pointers cross.
・Scan i from left to right so long as (a[i] < a[lo]).
・Scan j from right to left so long as (a[j] > a[lo]).
・Exchange a[i] with a[j].

When pointers cross.
・Exchange a[lo] with a[j].

before       K R A T E L E P U I M Q C X O S
partitioned  E C A I E K L P U T M Q R X O S    (lo = 0, j = 5, hi = 15)

Quicksort: Java code for partitioning

private static int partition(Comparable[] a, int lo, int hi)
{
   int i = lo, j = hi+1;
   while (true)
   {
      while (less(a[++i], a[lo]))      // find item on left to swap
         if (i == hi) break;

      while (less(a[lo], a[--j]))      // find item on right to swap
         if (j == lo) break;

      if (i >= j) break;               // check if pointers cross
      exch(a, i, j);                   // swap
   }

   exch(a, lo, j);                     // swap with partitioning item
   return j;                           // return index of item now known to be in place
}

[Figure: Quicksort partitioning overview. Before: v at lo, rest of a[lo..hi] unexamined. During: v | ≤ v | unexamined between i and j | ≥ v. After: ≤ v | v at j | ≥ v.]

Quicksort: Java implementation

public class Quick
{
   private static int partition(Comparable[] a, int lo, int hi)
   { /* see previous slide */ }

   public static void sort(Comparable[] a)
   {
      [Link](a);                        // shuffle needed for performance guarantee (stay tuned)
      sort(a, 0, [Link] - 1);
   }

   private static void sort(Comparable[] a, int lo, int hi)
   {
      if (hi <= lo) return;
      int j = partition(a, lo, hi);
      sort(a, lo, j-1);
      sort(a, j+1, hi);
   }
}

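For readers who want to run the pieces above together, here is a minimal self-contained sketch. It is not the course's library code: the class name `QuickSketch`, the generic type parameter, and the hand-written Fisher-Yates shuffle built on `java.util.Random` are assumptions standing in for the helper calls that the slides leave out.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical stand-alone version of the slide's Quick class.
class QuickSketch {
    static <T extends Comparable<T>> void sort(T[] a) {
        shuffle(a, new Random());            // random order guards against the quadratic case
        sort(a, 0, a.length - 1);
    }

    private static <T extends Comparable<T>> void sort(T[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);        // a[j] is now in its final position
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    private static <T extends Comparable<T>> int partition(T[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        T v = a[lo];                         // partitioning item
        while (true) {
            while (a[++i].compareTo(v) < 0)  // scan right while a[i] < v
                if (i == hi) break;
            while (v.compareTo(a[--j]) < 0)  // scan left while a[j] > v
                if (j == lo) break;
            if (i >= j) break;               // pointers crossed
            exch(a, i, j);
        }
        exch(a, lo, j);                      // put partitioning item in place
        return j;
    }

    // Fisher-Yates shuffle: each permutation is equally likely.
    private static void shuffle(Object[] a, Random rnd) {
        for (int i = a.length - 1; i > 0; i--)
            exch(a, i, rnd.nextInt(i + 1));
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        String[] a = "Q U I C K S O R T E X A M P L E".split(" ");
        sort(a);
        System.out.println(String.join(" ", a));
    }
}
```

Both scans stop on keys equal to the partitioning item, which matches the slide code and matters when duplicate keys are present.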
Quicksort trace (array contents after each partition)

                lo   j  hi    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
initial values                Q  U  I  C  K  S  O  R  T  E  X  A  M  P  L  E
random shuffle                K  R  A  T  E  L  E  P  U  I  M  Q  C  X  O  S
                 0   5  15    E  C  A  I  E  K  L  P  U  T  M  Q  R  X  O  S
                 0   3   4    E  C  A  E  I  K  L  P  U  T  M  Q  R  X  O  S
                 0   2   2    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                 0   0   1    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                 1       1    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                 4       4    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                 6   6  15    A  C  E  E  I  K  L  P  U  T  M  Q  R  X  O  S
                 7   9  15    A  C  E  E  I  K  L  M  O  P  T  Q  R  X  U  S
                 7   7   8    A  C  E  E  I  K  L  M  O  P  T  Q  R  X  U  S
                 8       8    A  C  E  E  I  K  L  M  O  P  T  Q  R  X  U  S
                10  13  15    A  C  E  E  I  K  L  M  O  P  S  Q  R  T  U  X
                10  12  12    A  C  E  E  I  K  L  M  O  P  R  Q  S  T  U  X
                10  11  11    A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X
                10      10    A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X
                14  14  15    A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X
                15      15    A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X
result                        A  C  E  E  I  K  L  M  O  P  Q  R  S  T  U  X

Rows with no j value: no partition for subarrays of size 1.

Quicksort animation

[Figure: quicksort animation, 50 random items. Legend: algorithm position; in order; current subarray; not in order.]

Quicksort: implementation details

Partitioning in-place. Using an extra array makes partitioning easier (and stable), but is not worth the cost.

Terminating the loop. Testing whether the pointers cross is trickier than it might seem.

Equal keys. When duplicates are present, it is (counter-intuitively) better to stop scans on keys equal to the partitioning item's key.

Preserving randomness. Shuffling is needed for performance guarantee.

Equivalent alternative. Pick a random partitioning item in each subarray.

Quicksort: empirical analysis (1961)

Running time estimates:
・Algol 60 implementation.
・National-Elliott 405 computer.

[Table: sorting N 6-word items with 1-word keys.]
[Figure: Elliott 405 magnetic disc (16K words).]

Quicksort: empirical analysis

Running time estimates:
・Home PC executes 10^8 compares/second.
・Supercomputer executes 10^12 compares/second.

           insertion sort (N^2)             mergesort (N log N)            quicksort (N log N)
computer   thousand  million    billion     thousand  million   billion    thousand  million  billion
home       instant   2.8 hours  317 years   instant   1 second  18 min     instant   0.6 sec  12 min
super      instant   1 second   1 week      instant   instant   instant    instant   instant  instant

Lesson 1. Good algorithms are better than supercomputers.
Lesson 2. Great algorithms are better than good ones.

Quicksort: best-case analysis

Best case. Number of compares is ~ N lg N.

[Figure: best case; rows labeled initial values and random shuffle.]

Quicksort: worst-case analysis

Worst case. Number of compares is ~ ½ N^2.

[Figure: worst case; rows labeled initial values and random shuffle.]

Quicksort: summary of performance characteristics

Quicksort is a (Las Vegas) randomized algorithm.
・Guaranteed to be correct.
・Running time depends on random shuffle.

Average case. Expected number of compares is ~ 1.39 N lg N.
・39% more compares than mergesort.
・Faster than mergesort in practice because of less data movement.

Best case. Number of compares is ~ N lg N.
Worst case. Number of compares is ~ ½ N^2.
[ but more likely that lightning bolt strikes computer during execution ]

Quicksort properties

Proposition. Quicksort is an in-place sorting algorithm.
Pf.
・Partitioning: constant extra space.
・Depth of recursion: logarithmic extra space (with high probability).
  [ can guarantee logarithmic depth by recurring on smaller subarray before larger subarray (requires using an explicit stack) ]

Proposition. Quicksort is not stable.
Pf. [ by counterexample ]

 i  j   0   1   2   3
        B1  C1  C2  A1
 1  3   B1  C1  C2  A1
 1  3   B1  A1  C2  C1
 0  1   A1  B1  C2  C1

Quicksort: practical improvements

Insertion sort small subarrays.
・Even quicksort has too much overhead for tiny subarrays.
・Cutoff to insertion sort for ≈ 10 items.

private static void sort(Comparable[] a, int lo, int hi)
{
   if (hi <= lo + CUTOFF - 1)
   {
      [Link](a, lo, hi);
      return;
   }
   int j = partition(a, lo, hi);
   sort(a, lo, j-1);
   sort(a, j+1, hi);
}

Quicksort: practical improvements

Median of sample.
・Best choice of pivot item = median.
・Estimate true median by taking median of sample.
・Median-of-3 (random) items.
   ~ 12/7 N ln N compares (14% less)
   ~ 12/35 N ln N exchanges (3% more)

private static void sort(Comparable[] a, int lo, int hi)
{
   if (hi <= lo) return;

   int median = medianOf3(a, lo, lo + (hi - lo)/2, hi);
   swap(a, lo, median);

   int j = partition(a, lo, hi);
   sort(a, lo, j-1);
   sort(a, j+1, hi);
}

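The slide calls `medianOf3` without showing it. One plausible way to write such a helper is sketched below; the class name `MedianOf3Sketch`, the generic signature, and the decision to return an index (so the caller can swap that entry into position lo) are assumptions inferred from how the call is used above.

```java
// Hypothetical helper: returns the INDEX (i, j, or k) of the median entry.
class MedianOf3Sketch {
    static <T extends Comparable<T>> int medianOf3(T[] a, int i, int j, int k) {
        if (less(a[i], a[j])) {
            if (less(a[j], a[k])) return j;      // a[i] < a[j] < a[k]
            return less(a[i], a[k]) ? k : i;     // a[j] is largest: median is max of the others
        } else {
            if (less(a[k], a[j])) return j;      // a[k] < a[j] <= a[i]
            return less(a[k], a[i]) ? k : i;     // a[j] is smallest: median is min of the others
        }
    }

    private static <T extends Comparable<T>> boolean less(T v, T w) {
        return v.compareTo(w) < 0;
    }

    public static void main(String[] args) {
        Integer[] a = {30, 10, 20};
        int m = medianOf3(a, 0, 1, 2);
        System.out.println("median index = " + m + ", value = " + a[m]);
    }
}
```

The helper uses at most three compares per call, so its cost is small next to the partitioning pass it improves.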
Selection

Goal. Given an array of N items, find the kth smallest item.
Ex. Min (k = 0), max (k = N - 1), median (k = N / 2).

Applications.
・Order statistics.
・Find the "top k."

Use theory as a guide.
・Easy N log N upper bound. How?
・Easy N upper bound for k = 1, 2, 3. How?
・Easy N lower bound. Why?

Which is true?
・N log N lower bound? [ is selection as hard as sorting? ]
・N upper bound? [ is there a linear-time algorithm? ]

Quick-select

Partition array so that:
・Entry a[j] is in place.
・No larger entry to the left of j.
・No smaller entry to the right of j.

Repeat in one subarray, depending on j; finished when j equals k.

public static Comparable select(Comparable[] a, int k)
{
   [Link](a);
   int lo = 0, hi = [Link] - 1;
   while (hi > lo)
   {
      int j = partition(a, lo, hi);
      if      (j < k) lo = j + 1;
      else if (j > k) hi = j - 1;
      else            return a[k];
   }
   return a[k];
}

[Figure: Quicksort partitioning overview. Before: v at lo. During: v | ≤ v | unexamined between i and j | ≥ v. After: ≤ v | v at j | ≥ v. If a[k] is in the left part, set hi to j-1; if a[k] is in the right part, set lo to j+1.]

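The selection loop above can be exercised end to end with the following sketch. It is an illustration under stated assumptions: the class name `QuickSelectSketch`, the generic type parameter (so the caller needs no cast), the range check on k, and the hand-written shuffle are not from the slides. Note that `select` rearranges its argument.

```java
import java.util.Random;

// Hypothetical stand-alone quick-select; k is 0-based (k = 0 is the minimum).
class QuickSelectSketch {
    static <T extends Comparable<T>> T select(T[] a, int k) {
        if (k < 0 || k >= a.length)
            throw new IllegalArgumentException("k out of range: " + k);
        shuffle(a, new Random());
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);
            if      (j < k) lo = j + 1;      // kth smallest is in the right part
            else if (j > k) hi = j - 1;      // kth smallest is in the left part
            else            return a[k];     // a[j] is in place and j == k
        }
        return a[k];
    }

    private static <T extends Comparable<T>> int partition(T[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        T v = a[lo];
        while (true) {
            while (a[++i].compareTo(v) < 0) if (i == hi) break;
            while (v.compareTo(a[--j]) < 0) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static void shuffle(Object[] a, Random rnd) {
        for (int i = a.length - 1; i > 0; i--)
            exch(a, i, rnd.nextInt(i + 1));
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        Integer[] a = {7, 2, 9, 4, 1, 8, 3};
        Integer median = select(a, a.length / 2);   // no cast needed
        System.out.println("median = " + median);
    }
}
```

Only one side of each partition is examined further, which is why the expected work is linear rather than N log N.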
Quick-select: mathematical analysis

Proposition. Quick-select takes linear time on average.

Pf sketch.
・Intuitively, each partitioning step splits array approximately in half:
   N + N / 2 + N / 4 + … + 1 ~ 2N compares.
・Formal analysis similar to quicksort analysis yields:
   C_N = 2 N + 2 k ln (N / k) + 2 (N - k) ln (N / (N - k))
・Ex: with k = N / 2, (2 + 2 ln 2) N ≈ 3.38 N compares to find median.

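The median figure follows from the formula for C_N by substituting k = N/2, as this short worked step shows (the substitution is the only assumption):

```latex
\begin{aligned}
C_N &= 2N + 2k \ln\frac{N}{k} + 2(N-k)\ln\frac{N}{N-k} \\
    &= 2N + 2\cdot\frac{N}{2}\,\ln 2 + 2\cdot\frac{N}{2}\,\ln 2 \qquad (k = N/2) \\
    &= (2 + 2\ln 2)\,N \approx 3.386\,N
\end{aligned}
```

This agrees with the roughly 3.38 N compares quoted for finding the median.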
Theoretical context for selection

Proposition. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] Compare-based selection algorithm whose worst-case running time is linear.

[Figure: first page of "Time Bounds for Selection" by Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan. Abstract: The number of comparisons required to select the i-th smallest of n numbers is shown to be at most a linear function of n by analysis of a new selection algorithm -- PICK. Specifically, no more than 5.4305 n comparisons are ever required. This bound is improved for extreme values of i, and a new lower bound on the requisite number of comparisons is also proved.]

Remark. Constants are high ⇒ not used in practice.

Use theory as a guide.
・Still worthwhile to seek practical linear-time (worst-case) algorithm.
・Until one is discovered, use quick-select if you don’t need a full sort.

Generic methods

In our select() implementation, client needs a cast.

   Double[] a = new Double[N];
   for (int i = 0; i < N; i++)
      a[i] = [Link]();
   Double median = (Double) [Link](a, N/2);     // unsafe cast required in client

The compiler complains.

   % javac [Link]
   Note: [Link] uses unchecked or unsafe operations.
   Note: Recompile with -Xlint:unchecked for details.

Q. How to fix?

Generic methods

Pedantic (safe) version. Compiles cleanly, no cast needed in client.

public class QuickPedantic
{
   // <Key ...> is a generic type variable (value inferred from argument a[]);
   // return type matches array type
   public static <Key extends Comparable<Key>> Key select(Key[] a, int k)
   { /* as before */ }

   public static <Key extends Comparable<Key>> void sort(Key[] a)
   { /* as before */ }

   private static <Key extends Comparable<Key>> int partition(Key[] a, int lo, int hi)
   { /* as before */ }

   private static <Key extends Comparable<Key>> boolean less(Key v, Key w)
   { /* as before */ }

   private static <Key extends Comparable<Key>> void exch(Key[] a, int i, int j)
   { Key swap = a[i]; a[i] = a[j]; a[j] = swap; }     // can declare variables of generic type
}

Remark. Ugly code used in system sort; not in this course.

Duplicate keys

Often, purpose of sort is to bring items with equal keys together.
・Sort population by age.
・Remove duplicates from mailing list.
・Sort job applicants by college attended.

Typical characteristics of such applications.
・Huge array.
・Small number of key values.

[Figure: Stability when sorting on a second key. City/time records sorted by time, then sorted by city (unstable); within each city the records are NOT sorted by time.]

Duplicate keys

Quicksort with duplicate keys. Algorithm can go quadratic unless partitioning stops on equal keys!

S T O P O N E Q U A L K E Y S

[Figure: swap positions if we don't stop on equal keys, compared with if we stop on equal keys.]

Caveat emptor. Some textbook (and commercial) implementations go quadratic when many duplicate keys.

[Figure: Partitioning an array with all equal keys.]

Duplicate keys: the problem

Recommended. Stop scans on items equal to the partitioning item.
Consequence. ~ N lg N compares when all keys equal.

B A A B A B C C B C B        A A A A A A A A A A A

Mistake. Don't stop scans on items equal to the partitioning item.
Consequence. ~ ½ N^2 compares when all keys equal.

B A A B A B B B C C C        A A A A A A A A A A A

Desirable. Put all items equal to the partitioning item in place.

A A A B B B B B C C C        A A A A A A A A A A A

3-way partitioning

Goal. Partition array into three parts so that:
・Entries between lt and gt equal to the partition item.
・No larger entries to left of lt.
・No smaller entries to right of gt.

[Figure: 3-way partitioning. Before: v at lo, rest of a[lo..hi] unexamined. During: <v | =v from lt up to i | unexamined from i to gt | >v. After: <v | =v from lt to gt | >v.]

Dutch national flag problem. [Edsger Dijkstra]
・Conventional wisdom until mid 1990s: not worth doing.
・Now incorporated into C library qsort() and Java 6 system sort.

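The three-part goal above is commonly met with a single left-to-right pass in the style of Dijkstra's Dutch national flag solution. The sketch below is one such version, offered as an illustration only: the class name `Quick3waySketch` is hypothetical, and the initial shuffle is omitted to keep the example short and deterministic.

```java
import java.util.Arrays;

// Hypothetical 3-way quicksort sketch (no shuffle, for brevity).
class Quick3waySketch {
    static <T extends Comparable<T>> void sort(T[] a) {
        sort(a, 0, a.length - 1);
    }

    private static <T extends Comparable<T>> void sort(T[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, gt = hi, i = lo + 1;
        T v = a[lo];                                // partitioning item
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) exch(a, lt++, i++);   // smaller: grow the <v region
            else if (cmp > 0) exch(a, i, gt--);     // larger: grow the >v region, recheck a[i]
            else              i++;                  // equal: grow the =v region
        }
        // Invariant at this point: a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi].
        sort(a, lo, lt - 1);
        sort(a, gt + 1, hi);
    }

    private static void exch(Object[] a, int i, int j) {
        Object t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        String[] a = "B A A B A B C C B C B".split(" ");
        sort(a);
        System.out.println(String.join(" ", a));
    }
}
```

Entries equal to the partitioning item are never touched again after the pass, so an array with only a few distinct keys needs only a few levels of recursion.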
Sorting summary

              inplace?  stable?  best        average    worst      remarks
selection     ✔                  ½ N^2       ½ N^2      ½ N^2      N exchanges
insertion     ✔         ✔        N           ¼ N^2      ½ N^2      use for small N or partially ordered
shell         ✔                  N log_3 N   ?          c N^(3/2)  tight code; subquadratic
merge                   ✔        ½ N lg N    N lg N     N lg N     N log N guarantee; stable
timsort                 ✔        N           N lg N     N lg N     improves mergesort when preexisting order
quick         ✔                  N lg N      2 N ln N   ½ N^2      N log N probabilistic guarantee; fastest in practice
3-way quick   ✔                  N           2 N ln N   ½ N^2      improves quicksort when duplicate keys
?             ✔         ✔        N           N lg N     N lg N     holy sorting grail
