Professional Documents
Culture Documents
14AnalysisOfAlgorithms PDF
14AnalysisOfAlgorithms PDF
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
F O U R T H
E D I T I O N
mathematical models
order-of-growth classifications
theory of algorithms
memory
Cast of characters
Programmer needs to develop
a working solution.
Student might play
any or all of these
roles someday.
Theoretician wants
to understand.
Running time
Analytic Engine
3
this course
Provide guarantees.
Understand theoretical basis.
theory of algorithms
Friedrich Gauss
1805
time
quadratic
64T
32T
16T
linearithmic
8T
size
linear
1K 2K
4K
8K
5
Andrew Appel
PU '81
time
quadratic
64T
32T
16T
linearithmic
8T
size
linear
1K 2K
4K
8K
6
The challenge
Q. Will my program be able to solve a large practical input?
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
Example: 3-SUM
3-SUM. Given N distinct integers, how many triples sum to exactly zero?
% more 8ints.txt
8
30 -40 -20 -10 40 0 10 5
% java ThreeSum 8ints.txt
4
a[i]
a[j]
a[k]
sum
30
-40
10
30
-20
-10
-40
40
-10
10
A. Manual.
70
% java ThreeSum 2Kints.txt
tick tick tick tick tick tick tick tick
tick tick tick tick tick tick tick tick
tick tick tick tick tick tick tick tick
528
% java ThreeSum 4Kints.txt
4039
12
(part of stdlib.jar )
Stopwatch()
double elapsedTime()
13
Empirical analysis
Run the program for various input sizes and measure running time.
14
Empirical analysis
Run the program for various input sizes and measure running time.
time (seconds)
250
0.0
500
0.0
1,000
0.1
2,000
0.8
4,000
6.4
8,000
51.1
16,000
15
Data analysis
Standard plot. Plot running time T (N) vs. input size N.
standard plot
log-log plot
50
51.2
25.6
12.8
lg (T(N ))
40
30
20
6.4
3.2
1.6
.8
.4
10
.2
.1
1K
2K
4K
8K
problem size N
Analysis of experimental data (the running time of ThreeSum)
16
Data analysis
Log-log plot. Plot running time T (N) vs. input size N using log-log scale.
log-log plot
51.2
25.6
straight line
of slope 3
lg (T(N ))
12.8
lg(T (N)) = b lg N + c
b = 2.999
c = -33.2103
6.4
3.2
1.6
T (N) = a N b, where a = 2 c
.8
.4
.2
.1
8K
1K
2K
4K
8K
lgN
power law
slope
Predictions.
time (seconds)
8,000
51.1
8,000
51.0
8,000
51.1
16,000
410.8
validates hypothesis!
18
Doubling hypothesis
Doubling hypothesis. Quick way to estimate b in a power-law relationship.
Run program, doubling the size of the input.
N
time (seconds)
ratio
lg ratio
250
0.0
500
0.0
4.8
2.3
1,000
0.1
6.9
2.8
2,000
0.8
7.7
2.9
4,000
6.4
8.0
3.0
8,000
51.1
8.0
3.0
Doubling hypothesis
Doubling hypothesis. Quick way to estimate b in a power-law relationship.
Q. How to estimate a (assuming we know b) ?
A. Run the program (for a sufficient large value of N) and solve for a.
N
8,000
time (seconds)
51.1
51.1 = a 80003
8,000
51.0
8,000
51.1
a = 0.998 10 10
Experimental algorithmics
System independent effects.
Algorithm.
Input data.
determines exponent b
in power law
determines constant a
in power law
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
Donald Knuth
1974 Turing Award
operation
example
nanoseconds
integer add
a + b
2.1
integer multiply
a * b
2.4
integer divide
a / b
5.4
floating-point add
a + b
4.6
floating-point multiply
a * b
4.2
floating-point divide
a / b
13.5
sine
Math.sin(theta)
91.3
arctangent
Math.atan2(y, x)
129.0
...
...
...
25
operation
example
nanoseconds
variable declaration
int a
c1
assignment statement
a = b
c2
integer compare
a < b
c3
a[i]
c4
array length
a.length
c5
1D array allocation
new int[N]
c6 N
2D array allocation
new int[N][N]
string length
s.length()
c8
substring extraction
s.substring(N/2, N)
c9
string concatenation
s + t
c10 N
c7 N
Example: 1-SUM
Q. How many instructions as a function of input size N ?
int count = 0;
for (int i = 0; i < N; i++)
if (a[i] == 0)
count++;
operation
frequency
variable declaration
assignment statement
N+1
equal to compare
array access
increment
N to 2 N
27
Example: 2-SUM
Q. How many instructions as a function of input size N ?
int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
if (a[i] + a[j] == 0)
count++;
0 + 1 + 2 + . . . + (N
operation
frequency
variable declaration
N+2
assignment statement
N+2
(N + 1) (N + 2)
equal to compare
N (N 1)
1) =
=
1
N (N
2
N
2
1)
array access
N (N 1)
increment
N (N 1) to N (N 1)
28
THIS
Downloaded
SUMMARY
A number of methods of solving sets of linear equations and inverting matrices
are discussed. The theory of the rounding-off errors involved is investigated for
some of the methods. In all cases examined, including the well-known 'Gauss
elimination process', it is found that the errors are normally quite moderate: no
exponential build-up need occur.
Included amongst the methods considered is a generalization of Choleski's method
which appears to have advantages over other known methods both as regards
accuracy and convenience. This method may also be regarded as a rearrangement
of the elimination process.
29
operation
frequency
variable declaration
N+2
assignment statement
N+2
(N + 1) (N + 2)
equal to compare
N (N 1)
array access
N (N 1)
increment
N (N 1) to N (N 1)
1) =
=
1
N (N
2
N
2
1)
N 3/6
Ex 1.
N 3 + 20 N + 16
~ N3
Ex 2.
~ N
Ex 3.
N3 - N
+ 100 N
2
4/3
+ 56
+ N
N 3/6 ! N 2/2 + N /3
166,666,667
~ N3
166,167,000
N
1,000
Leading-term approximation
f (N)
= 1
g(N)
lim
31
operation
frequency
tilde notation
variable declaration
N+2
~N
assignment statement
N+2
~N
(N + 1) (N + 2)
~ N2
equal to compare
N (N 1)
~ N2
array access
N (N 1)
~ N2
increment
N (N 1) to N (N 1)
~ N2 to ~ N2
32
Example: 2-SUM
Q. Approximately how many array accesses as a function of input size N ?
int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
if (a[i] + a[j] == 0)
count++;
"inner loop"
0 + 1 + 2 + . . . + (N
1) =
A. ~ N 2 array accesses.
1
N (N
2
N
2
1)
Bottom line. Use cost model and tilde notation to simplify counts.
33
Example: 3-SUM
Q. Approximately how many array accesses as a function of input size N ?
int count = 0;
for (int i = 0; i < N; i++)
for (int j = i+1; j < N; j++)
for (int k = j+1; k < N; k++)
if (a[i] + a[j] + a[k] == 0)
count++;
"inner loop"
N
3
A. ~ N 3 array accesses.
N (N
1)(N
3!
2)
1 3
N
6
Bottom line. Use cost model and tilde notation to simplify counts.
34
Ex 1. 1 + 2 + + N.
i
N
k
i=1
1
N k+1
k+1
x=1
i=1
xk dx
1 2
N
2
x=1
i=1
Ex 2. 1k + 2k + + N k.
x dx
1
i
x=1
x=1
1
dx = ln N
x
y=x
z=y
dz dy dx
1 3
N
6
35
TN = c1 A + c2 B + c3 C + c4 D + c5 E
A=
B=
C=
D=
E=
array access
integer add
integer compare
increment
variable assignment
frequencies
(depend on algorithm, input)
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
time
Common order-of-growth
classifications
200T
order of growth discards
leading coefficient
1, log N, N, N log N, N
2,
3,
and
2N
logarithmic
constant
200K
500K
size
ic
ea
hm
lin
lin
ea
qua
r it
dra
tic
cubi
c
512T
exponential
log-log plot
time
64T
8T
4T
2T
logarithmic
constant
1K
2K
4K
8K
size
512K
name
description
example
T(2N) / T(N)
constant
a = b + c;
statement
add two
numbers
log N
logarithmic
while (N > 1)
N = N / 2; ...
divide in half
binary search
~1
linear
loop
find the
maximum
N log N
linearithmic
divide
and conquer
mergesort
~2
double loop
check all
pairs
triple loop
check all
triples
exhaustive
search
check all
subsets
T(N)
N2
quadratic
N3
cubic
2N
exponential
40
1980s
1990s
2000s
any
any
any
any
log N
any
any
any
any
millions
tens of
millions
hundreds of
millions
billions
N log N
hundreds of
thousands
millions
millions
hundreds of
millions
N2
hundreds
thousand
thousands
tens of
thousands
N3
hundred
hundreds
thousand
thousands
2N
20
20s
20s
30
Bottom line. Need linear or linearithmic alg to keep pace with Moore's law.
41
13
14
25
33
43
51
53
64
72
84
93
95
96
97
10
11
12
13
14
lo
hi
42
Invariant. If key appears in the array a[], then a[lo] key a[hi].
43
Pf sketch.
T (N)
T (N / 2) + 1
given
T (N / 4) + 1 + 1
T (N / 8) + 1 + 1 + 1
...
T (N / N) + 1 + 1 + + 1
= 1 + lg N
44
Step 1:
Step 2:
input
30 -40 -20 -10 40
sort
-40 -20 -10
0 10
5 10 30 40
60
(-40, -10)
(-40,
0)
(-40,
5)
(-40, 10)
50
40
35
30
(-40,
Step 1:
Step 2:
N2
log N.
40)
(-20, -10)
30
(-10,
0)
10
( 10,
30)
-40
( 10,
( 30,
40)
40)
-50
-70
only count if
a[i] < a[j] < a[k]
to avoid
double counting
45
Comparing programs
Hypothesis. The sorting-based N 2 log N algorithm for 3-SUM is significantly
faster in practice than the brute-force N 3 algorithm.
time (seconds)
time (seconds)
1,000
0.1
1,000
0.14
2,000
0.8
2,000
0.18
4,000
6.4
4,000
0.34
8,000
51.1
8,000
0.96
16,000
3.67
32,000
14.88
64,000
59.16
ThreeSum.java
ThreeSumDeluxe.java
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
Types of analyses
Best case. Lower bound on cost.
Best:
~ N3
Best:
~ 1
Average:
~ N3
Average:
~ lg N
Worst:
~ N3
Worst:
~ lg N
49
Types of analyses
Best case. Lower bound on cost.
Worst case. Upper bound on cost.
Average case. Expected cost.
50
Theory of algorithms
Goals.
51
notation
provides
example
shorthand for
used to
N2
Big Theta
asymptotic
order of growth
(N2)
10 N2
5 N2 + 22 N log N + 3N
classify
algorithms
10 N2
Big Oh
O(N2)
100 N
22 N log N + 3 N
develop
upper bounds
N2
Big Omega
(N2)
and larger
(N2)
N5
N3 + 22 N log N + 3 N
develop
lower bounds
52
Ex. Have to examine all N entries (any unexamined one might be 0).
Running time of the optimal algorithm for 1-SUM is (N ).
Optimal algorithm.
).
54
logN ).
Develop an algorithm.
Prove a lower bound.
Gap?
1970s-.
Steadily decreasing upper bounds for many important problems.
Many known optimal algorithms.
Caveats.
Commonly-used notations
notation
Tilde
Big Theta
provides
leading term
asymptotic
growth rate
example
~ 10 N2
shorthand for
used to
10 N2
provide
approximate
model
10 N2 + 22 N log N
10 N2 + 2 N + 37
N2
(N2)
10 N2
Big Oh
O(N2)
classify
algorithms
10 N2
5 N2 + 22 N log N + 3N
100 N
22 N log N + 3 N
develop
upper bounds
N2
Big Omega
(N2)
N5
N3 + 22 N log N + 3 N
develop
lower bounds
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
Basics
Bit. 0 or 1.
NIST
Byte. 8 bits.
Megabyte (MB). 1 million or 220 bytes.
Gigabyte (GB).
60
type
bytes
type
bytes
boolean
char[]
2N + 24
byte
int[]
4N + 24
char
double[]
8N + 24
int
float
long
double
type
bytes
char[][]
~2MN
int[][]
~4MN
double[][]
~8MN
24 bytes
multiple of 8 bytes.
object
overhead
int
x
value
bytes padding
of memory.
32 bytes
object
overhead
day
month
year
padding
4 bytes (int)
int
values
4 bytes (int)
4 bytes (int)
4 bytes (padding)
32 bytes
counter object
public class Counter
{
32 bytes
62
40 bytes
object
overhead
value
offset
count
hash
padding
substring example
String genome = "CGCCTGGCGTCTGTAC";
reference
int
values
4 bytes (int)
4 bytes (int)
4 bytes (padding)
2N + 64 bytes
63
Padding:
64
Example
Q. How much memory does WeightedQuickUnionUF use as a function of N ?
Use tilde notation to simplify your answer.
public class WeightedQuickUnionUF
{
private int[] id;
private int[] sz;
private int count;
16 bytes
(object overhead)
8 + (4N + 24) each
reference + int[] array
4 bytes (int)
4 bytes (padding)
public WeightedQuickUnionUF(int N)
{
id = new int[N];
sz = new int[N];
for (int i = 0; i < N; i++) id[i] = i;
for (int i = 0; i < N; i++) sz[i] = 1;
}
...
}
A. 8 N + 88 ~ 8 N bytes.
65
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
mathematical models
order-of-growth classifications
theory of algorithms
memory
67
Algorithms
1.4 A NALYSIS
OF
A LGORITHMS
introduction
observations
Algorithms
F O U R T H
E D I T I O N
mathematical models
order-of-growth classifications
theory of algorithms
memory