Professional Documents
Culture Documents
T02 - Algorithm Analysis and Design
T02 - Algorithm Analysis and Design
Learning objectives
1. Understand why it is important to be able to compare the
complexity of algorithms
2. Measure the complexity of algorithms
3. Analyse the performance of algorithms
4. Be able to perform empirical analyses of algorithms
5. Develop new algorithms
2
RMIT Classification: Trusted
Array Search
• 10 5
• 100 50
• 1000 500
• 1,000,000 500,000 4
RMIT Classification: Trusted
Binary Search
Items Tries
• 10 4
• 100 7
• 1000 10
• 10,000 14
• 100,000 17
• 1,000,000 20 For a sorted array of size N, the
number of tries to find a value is
• 10,000,000 24 correlated to log2N
5
RMIT Classification: Trusted
Sorting
• This is a very interesting topic that we will cover extensively
in class as many problems require using or generating
sorted data
• But to answer the question on the medical company we
need to be able to determine how fast sorting can be done
6
RMIT Classification: Trusted
Which is Faster?
• Assume that sorting takes 10 n*log(n) msec, linear search takes 2*n
msec and binary search takes 2*log(n) msec
• If you do the search 10 times on 106 which approach is best?
1. 10*2*106 = 20*106 msec = 20,000 sec
2. Sorting: 10 * 106 * log 106 ≈ 10*106 *13.8 msec ≈ 138,000 sec +
Searching: 10*2*log 106 = 0.276 sec. Total ≈ 138,000 sec
• If you do the search 1000 times and the data size is 106 then
1. 1000*2*106 = 2000*106 msec = 2,000,000 sec
2. Sorting: 10 * 106 * log 106 ≈ 10*106 *13.8 msec ≈ 138,000 secs +
Searching: 1000*2*log 106 = 27.6 seconds. Total ≈ 138,027 seconds
7
RMIT Classification: Trusted
Program Performance
• Assume that we can find a formula that, for a
specific machine, determines how long a
program takes to run on various input (N) sizes
• When do programs A and B become impractical?
o If n = 10 then A takes 20 and B takes 50 secs
o If n = 25 they both take 125 secs.
o If n = 103 then A takes 200,000 secs and B takes 5,000 secs.
o If n = 106 then A takes 2x1011 seconds and B take 5x106 seconds
8
RMIT Classification: Trusted
Finding Primes
How do you check if a number, n, is prime
• Checking if each number from 2 to n-1 divides n
• Running the loop up to n/2
• Use the fact that for p > 3, p can be written as 6n ± 1
o So if the number is not of this form, it cannot be prime
• Running the loop up to the square root of n
• CAN WE DO BETTER?
9
RMIT Classification: Trusted
Comparing Performance
• Different algorithms that solve the same problem can have very
different performance
o How do we compare the performance of two programs that solve the same
problem?
o Should we just run them and compare their time requirements?
• Need address issues such as - what computer to use, what data to use
and how much space is needed?
• An inefficient algorithm on a small data set (i.e., one that grows
quadratically) would be unfeasible to run on a large data set
11
RMIT Classification: Trusted
What to measure?
• In this lecture, we look at the ways of estimating the
running time of a program and how to compare the running
times of two programs without ever implementing them
• It is vital to analyse the resource use of an algorithm, well
before it is implemented and deployed
• Space is also important but we focus on time in this course
12
RMIT Classification: Trusted
Time Efficiency
RMIT Classification: Trusted
14
RMIT Classification: Trusted
Basic operation
Operation(s) that contribute most towards the total running
time
• Examples:
o Compare ( i != j )
o Add ( i + j )
o Multiply ( i * j )
o Divide ( i / j )
o Assignment ( i = j )
16
RMIT Classification: Trusted
17
RMIT Classification: Trusted
18
RMIT Classification: Trusted
19
RMIT Classification: Trusted
20
RMIT Classification: Trusted
21
RMIT Classification: Trusted
Runtime Complexity
• Worst Case - Given an input of n items, what is the
maximum running time for any possible input?
• Best Case - Given an input of n items, what is the
minimum running time for any possible input?
• Average Case - Given an input of n items, what is the
average running time across all possible inputs?
NOTE: Average Case is not the average of the worst and best case.
Rather, it is the average performance across all possible inputs.
22
RMIT Classification: Trusted
Runtime Complexity
• Sequential Search: search for the key by traversing values
in the array one-by-one
• Best-case:
o The best case input is when the item being searched for is
the first item in the list, so Cbest (n) = 1.
• Worst-case:
o The worst case input is when the item being searched for is
not present in the list, so Cworst (n) = n.
23
RMIT Classification: Trusted
Runtime Complexity
• Average Case: What does average case mean?
o Recall: average across all possible inputs – how to
analyse this?
o Typically not straight forward
24
RMIT Classification: Trusted
Runtime Complexity
Average Case Analysis: p is the probability of a successful
search.
If search is successful
Cavg(n) = (1 + 2 + … + n)/n = (n + 1)/2
(Assume the probability of finding the search value at each
element is the same)
If search is unsuccessful
Cavg(n) = n
25
RMIT Classification: Trusted
Summary
• Input size, basic operation
• Time complexity estimate using input size and
basic operation
• Best, worst, and average cases
26
RMIT Classification: Trusted
Asymptotic Complexity
RMIT Classification: Trusted
Asymptotic Complexity
• Problem:
o We now have a way to analyse the running time (a.k.a.
time complexity) of an algorithm, but every algorithm
has their own time complexity.
28
RMIT Classification: Trusted
Asymptotic Complexity
• Solution:
o Group them into equivalence classes (for
easier comparison and understanding), with
respect to the input size
o Focus of this part: asymptotic complexity and
equivalence classes
29
RMIT Classification: Trusted
Asymptotic Complexity
Consider the running time
estimates of two algorithms:
• Algo 1:
• Algo 2:
Asymptotic Complexity
What about the
followings:
• Algo 3:
• Algo 4:
31
RMIT Classification: Trusted
• In other words:
o As n becomes large, what is the dominant term
contributing to the running time ?
32
RMIT Classification: Trusted
33
RMIT Classification: Trusted
34
RMIT Classification: Trusted
35
RMIT Classification: Trusted
What to use if ?
• Any of the above functions are possible!
36
RMIT Classification: Trusted
Asymptotic Complexity
If an algorithm requires n2–3*n+10 seconds to solve a problem size n. If
constants k and n0 exist such that
k*n2 > n2–3*n+10 for all n n0
the algorithm is order n2 (In fact, k is 3 and n0 is 2)
3*n2 > n2–3*n+10 for all n 2
Thus, the algorithm requires no more than
k*n2 time units for n n0 where k = 3 and no = 2
So, it is O(n2)
37
RMIT Classification: Trusted
List sorted from best to worst. Many other options e.g. sqrt(n), log(log(n)), etc.
38
RMIT Classification: Trusted
Bike Tour
• Suppose you decide to ride a
bicycle around Ireland
o You will start in Dublin and your
goal is to visit Cork, Galway,
Limerick, and Belfast before
returning to Dublin
• What is the best itinerary?
o How can you minimize the
number of kilometers yet make
sure you visit all the cities?
39
RMIT Classification: Trusted
Optimal Tour
• If there are only 5 cities it’s not too hard to figure out the
optimal tour
o The shortest path is most likely a “loop”
o Any path that crosses over itself will be longer than a path that
travels in a big circle
40
RMIT Classification: Trusted
41
RMIT Classification: Trusted
6 60
310,224,200,866,619,719,680,000
7 360
8 2,520
9 20,160
10 181,440
42
RMIT Classification: Trusted
• One example of the famous P versus NP problems which are problems where
the best known solutions are really hard (exponential) to solve but are relative
easy (polynomial) to check
o Examples - subset sum, jigsaw solving
1 -3 7 2 11 -9 -6
14 5 11 -10 -1 22 -11
43
RMIT Classification: Trusted
Real-Life Applications
• It’s not likely anyone would want to plan a bike trip to 25 cities
• But the solution of many important real-world problems is the same as
finding a tour of a large number (n>1000) of cities
o Transportation: school bus routes, service calls
o Manufacturing: industrial robot drilling holes in a
printed circuit, VLSI (microchip) layout
o Planning a telecom network
• https://www.quantamagazine.org/computer-scientists-break-traveling-s
alesperson-record-20201008/
44
RMIT Classification: Trusted
Is P = NP?
• Open Question: if one can check an answer in polynomial time, is there
an algorithm to find the answer in polynomial time
• Examples : sudoku, Jigsaws, subset-sum, graph colouring, map
colouring, travelling salesman, Hamiltonian path
45
RMIT Classification: Trusted
46
RMIT Classification: Trusted
47
RMIT Classification: Trusted
48
RMIT Classification: Trusted
Growth-Rate Functions
• Ignore low-order terms in the growth-rate function
o If an algorithm is O(n3+4n2+3n+5), it is also O(n3)
• Ignore a multiplicative constant in the higher-order term
o If an algorithm is O(5n3), it is also O(n3)
• Combine two modules with growth rates O(f(n)) and O(g(n)) is O(f(n)
+g(n))
o If an algorithm is O(n3) + O(4n), it is also O(n3 +4n2), so, it is O(n3)
• We can also multiply two growth rates e.g. there is a loop inside a loop
o So a O(n2) loop inside O(n) loop is O(n3)
49
RMIT Classification: Trusted
Answer: O(n4)
50
RMIT Classification: Trusted
Analysis of Algorithms
RMIT Classification: Trusted
52
RMIT Classification: Trusted
53
RMIT Classification: Trusted
Example: an
Input size: n
C(n):
=n
54
RMIT Classification: Trusted
= n2
55
RMIT Classification: Trusted
Recursion
• Recursion is fundamental tool in computer science.
o A recursive program (or function) is one that calls itself.
o It must have a termination condition defined.
• Many interesting algorithms are simply expressed with a
recursive approach
56
RMIT Classification: Trusted
Input size: n
57
RMIT Classification: Trusted
58
RMIT Classification: Trusted
Backward Substitution
Recurrence: for , and
59
RMIT Classification: Trusted
Backward Substitution
Aim of simplification and backward substitution: Convert
to , e.g.,
60
RMIT Classification: Trusted
Backward Substitution
Recurrence: for , and
Hence
61
RMIT Classification: Trusted
Empirical Analysis
• Theoretical analysis of the complexity of an algorithm gives
an estimate of the running time and growth rate, but not the
real time
• Measuring the actual time of an implementation takes is in
the real world is very important, especially when comparing
two algorithms with the same time complexity
62
RMIT Classification: Trusted
Growth-Rate Functions – Ex 1
Cost Occurrences
i=1 c1 1
sum = 0 c2 1
while (i <= n): c3 n+1
i=i+1 c4 n
sum = sum + i c5 n
63
RMIT Classification: Trusted
Growth-Rate Functions – Ex 2
Cost Occurrences
i=1 c1 1
sum = 0 c2 1
while (i <= n): c3 n+1
j=1 c4 n
while (j <= n): c5 n*(n+1)
sum = sum + i c6 n*n
j=j+1 c7 n*n
i = i +1 c8 n
T(n) = c1 + c2 + (n+1)*c3 + n*c4 + n*(n+1)*c5+n*n*c6+n*n*c7+n*c8
= (c5+c6+c7)*n2 + (c3+c4+c5+c8)*n + (c1+c2+c3) = a*n2 + b*n + c
So, the growth-rate function for this algorithm is O(n2)
64
RMIT Classification: Trusted
65
RMIT Classification: Trusted
66
RMIT Classification: Trusted
67
RMIT Classification: Trusted
68
RMIT Classification: Trusted
Estimating Performance
• We can estimate the growth rate if we collect performance data on a number of
inputs of different sizes
• We can then apply it to other input sizes
70
RMIT Classification: Trusted
Benchmarking Algorithms
Theoretical: Merge-sort and Quick-sort have O(n log(n))
complexity, while Selection-sort is O(n2).
Empirical: Running Times (in seconds) for different sorting
algorithms on a randomised list:
Input (list size) 500 2,500 10,000
Merge sort 0.8 8.1 39.8
Quick sort 0.3 1.3 5.3
Selection sort 1.5 35.0 534.7
71
RMIT Classification: Trusted
Benchmarking Algorithms - 2
Theoretical: Merge-sort and Quick-sort have O(n log(n))
complexity, while Selection-sort is O(n2).
Empirical: Running Times (in seconds) for different sorting
algorithms on an ordered list:
72
RMIT Classification: Trusted
Question
One of your data sources to predict the weather are images
sent from satellites. When you receive this data, you find that
for 5, 10 and 20 images your code took 24 msec, 99 msec
and 401 msec respectively.
o What is order of complexity of your algorithm and justify your
answer
o Your final input file has 1,000,000 images. Without using a
calculator, can you determine approximately how many days it will
take your code to run?
73
RMIT Classification: Trusted
Best Practices
75
RMIT Classification: Trusted
76
RMIT Classification: Trusted
Other Approaches
• Heuristic algorithms
o Do not guarantee that the best will be found but usually will quickly
find a solution close to the best one.
o Sometimes these algorithms actually find the best solution, but the
algorithm is still called heuristic until this solution is proven to be the
best.
• In general if X is the found solution and Y is the optimal solution,
then
o Deterministic algorithm: X = Y
o Approximation Algorithm : X is close to Y
o Probabilistic algorithm - X is "usually" Y 77
RMIT Classification: Trusted
Other Topics
RMIT Classification: Trusted
79
RMIT Classification: Trusted
80
RMIT Classification: Trusted
81
RMIT Classification: Trusted
82
RMIT Classification: Trusted
83
RMIT Classification: Trusted
Examples
Upper bound Lower bound
t (n) O(n) O(n2) O(n3) Ω(n) Ω(n2) Ω(n3)
log2 n T T T F F F
10n + 5 T T T T F F
n(n − 1)/2 F T T T T F
(n + 1)3 F F T T T T
F F F T T T
2n
84
RMIT Classification: Trusted
Some Clarifications…
• Generally O(n) is most commonly used…
• But exact bounds Θ(n) tell us the bounds are tight and the
algorithm doesn’t have anything outside what we expect
• Lower bounds Ω(n) are useful to describe the (theoretical)
limits of whole classes of algorithms, and also sometimes
useful to state how fast can the best case reach.
85
RMIT Classification: Trusted
Some clarifications…
• O(n) is not the same thing as “Worst Case Efficiency”
• Ω(n) is not the same thing as “Best Case Efficiency”
• Θ(n) is not the same thing as “Average Case Efficiency”
86
RMIT Classification: Trusted