Analysis of Algorithms
Data Structures and Algorithms
Prof. Dr. Mohamed Amine Chatti
Social Computing Group, University of Duisburg-Essen
www.uni-due.de/soco
Agenda
Introduction
Observations
Mathematical Models
Order-of-Growth Classifications
Theory of Algorithms
http://algs4.cs.princeton.edu, Chapter 1.4
Data Structures and Algorithms - Analysis of Algorithms 2
Cast of Characters
Programmer needs to develop a working solution
Client wants to solve problem efficiently
Theoretician wants to understand
Student might play any or all of these roles someday
Running Time
“As soon as an Analytic Engine exists, it will necessarily guide the future
course of the science. Whenever any result is sought by its aid, the question
will arise—By what course of calculation can these results be arrived at by
the machine in the shortest time?” — Charles Babbage (1864)
How many times do you have to turn the crank?
(Figure: Analytic Engine)
Reasons to Analyze Algorithms
Predict performance
Compare algorithms
(covered in this course)
Provide guarantees
Understand theoretical basis
(covered in the course “Berechenbarkeit und Komplexität”, i.e., Computability and Complexity)
Primary practical reason: avoid performance issues
Client gets poor performance because programmer did not understand performance characteristics
In-class Exercise
• Assume that the running time of an algorithm is determined by how frequently its operations are executed.
• Suppose that $n$ equals 1 million. Approximately how much faster is an algorithm (A) that performs $n \cdot \lg n$ operations than another algorithm (B) that performs $n^2$ operations? Recall that lg is the base-2 logarithm function.
• (A) 20x
• (B) 1000x
• (C) 50000x
• (D) 1000000x
Hint: compute $\frac{\text{running time of B}}{\text{running time of A}}$
2 minutes
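Under the exercise's assumption that running time follows the number of operations executed, the ratio can be checked numerically. A minimal sketch; the helper name `speedup` is hypothetical, chosen for illustration only:

```python
import math

def speedup(n):
    """Operation-count ratio of algorithm B (n^2) to algorithm A (n lg n)."""
    # n^2 / (n lg n) simplifies to n / lg n
    return (n * n) / (n * math.log2(n))

ratio = speedup(1_000_000)
print(f"B performs about {ratio:.0f} times as many operations as A")
```

Since lg(10^6) ≈ 19,93, the ratio is about 10^6 / 19,93 ≈ 50 000, which points to answer (C).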
The Challenge
• Q: Will my program be able to solve a large practical input?
Why is my program so slow? Why does it run out of memory?
• Insight
• Use the scientific method to understand performance [Knuth 1970s]
Scientific Method Applied to Analysis of Algorithms
• A framework for predicting performance and comparing algorithms
• Scientific Method
• Observe some feature of the natural world
• Hypothesize a model that is consistent with the observations
• Predict events using the hypothesis
• Verify the predictions by making further observations
• Validate by repeating until the hypothesis and observations agree
• Principles
• Experiments must be reproducible
• Hypotheses must be falsifiable
Example: 3-Sum
• Given N distinct integers, how many triples sum to exactly zero?
$ cat data/8ints.txt
30 -40 -20 -10 40 0 10 5
$ python3 3sum.py data/8ints.txt
4

a[i]   a[j]   a[k]   sum
 30    -40     10     0
 30    -20    -10     0
-40     40      0     0
-10      0     10     0
3-Sum: Brute-Force Algorithm
import sys
import DSA   # course helper module (DSA.py): provides In, Stopwatch, ...

def count(a):
    N = len(a)
    total = 0
    # Check each triple i < j < k exactly once
    for i in range(0, N):
        for j in range(i+1, N):
            for k in range(j+1, N):
                if a[i] + a[j] + a[k] == 0:
                    total += 1
    return total

f = DSA.In(sys.argv[1])
a = f.readAllInts()
print(count(a))

$ cat data/8ints.txt
30 -40 -20 -10 40 0 10 5
$ python3 3sum.py data/8ints.txt
4

How to measure the running time of this program?
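`DSA` is the course's helper module and its source is not shown here. If it is not at hand, the same brute-force count can be sketched with the standard library alone. The names `count_triples` and `read_all_ints` are hypothetical, and the input format (whitespace-separated integers) is an assumption:

```python
import sys
from itertools import combinations

def count_triples(a):
    """Brute force: test every triple of positions i < j < k."""
    # combinations() yields elements in index order, like the nested i/j/k loops
    return sum(1 for x, y, z in combinations(a, 3) if x + y + z == 0)

def read_all_ints(path):
    """Assumption: the file holds whitespace-separated integers."""
    with open(path) as f:
        return [int(token) for token in f.read().split()]

if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_triples(read_all_ints(sys.argv[1])))
```

Run on data/8ints.txt, this should report the same 4 triples as the listing above; it is still a cubic algorithm, just written more compactly.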
Measuring The Running Time
• Q. How to time a program?
• A. Manual
Measuring The Running Time
class Stopwatch (part of DSA.py)
Stopwatch()             create a new stopwatch
float elapsedTime()     time since creation (in seconds)
• Q. How to time a program?
• A. Automatic
f = DSA.In(sys.argv[1])
a = f.readAllInts()
s = DSA.Stopwatch()
c = count(a)
time = s.elapsedTime()
print("elapsed time:",time,"seconds")
print(c)
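The slide does not show how `DSA.Stopwatch` is implemented. A plausible stand-in with the same two-member interface, assumed here rather than taken from DSA.py, can be built on the standard library's `time.perf_counter()`:

```python
import time

class Stopwatch:
    """Hypothetical stand-in for DSA.Stopwatch (interface assumed from the table)."""

    def __init__(self):
        # perf_counter() is a high-resolution clock meant for measuring intervals
        self._start = time.perf_counter()

    def elapsedTime(self):
        """Seconds elapsed since this stopwatch was created."""
        return time.perf_counter() - self._start

def count(a):
    """Brute-force 3-Sum, as in 3sum.py."""
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if a[i] + a[j] + a[k] == 0:
                    total += 1
    return total

s = Stopwatch()
c = count([30, -40, -20, -10, 40, 0, 10, 5])
print("elapsed time:", round(s.elapsedTime(), 2), "seconds")
print(c)
```

Starting the stopwatch after the input has been read, as the slide does, keeps file I/O out of the measurement.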
Empirical Analysis
• Run the program for various input sizes and measure running time
$ python3 3sum.py data/8ints.txt
elapsed time: 0.00 seconds
4
$ python3 3sum.py data/1Kints.txt
elapsed time: 18.62 seconds
70
$ python3 3sum.py data/1Kints.txt
elapsed time: 18.79 seconds
70
$ python3 3sum.py data/2Kints.txt
Empirical Analysis
• Run the program for various input sizes and measure running time
Input size N    Time T(N) (seconds)†
250             0
500             0
1000            0,1
2000            0,8
4000            6,4
8000            51,1
16000           ?
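Collecting such a table by hand is tedious. A small driver can time the brute-force count for doubling input sizes. This sketch assumes random integers are acceptable test inputs (the slides use fixed files such as data/1Kints.txt, and random values are not guaranteed to be distinct, which does not matter for timing); `time_trial` is a hypothetical helper. Sizes are kept small because pure-Python 3-Sum is slow, and absolute times depend on the machine and interpreter:

```python
import random
import time

def count(a):
    """Brute-force 3-Sum: number of triples i < j < k with a[i] + a[j] + a[k] == 0."""
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if a[i] + a[j] + a[k] == 0:
                    total += 1
    return total

def time_trial(n, rng, max_value=1_000_000):
    """Seconds taken by count() on n random integers from [-max_value, max_value]."""
    a = [rng.randint(-max_value, max_value) for _ in range(n)]
    start = time.perf_counter()
    count(a)
    return time.perf_counter() - start

rng = random.Random(2024)  # fixed seed so the generated inputs are reproducible
results = []
for n in (50, 100, 200):
    t = time_trial(n, rng)
    results.append((n, t))
    print(f"{n:6d} {t:10.4f}")
```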
Data Analysis – Standard plot
• Plot running time $T(N)$ vs. input size $N$
Data Analysis – Log-Log Plot
• Plot running time $T(N)$ vs. input size $N$ using a log-log scale
(Figure: log-log plot of lg(T(N)) against lg(N); the points lie on a straight line of slope 3.)
Input size N   Time T(N) (seconds)†   lg(N)     lg(T(N))
1000           0,1                    9,966     -3,322
2000           0,8                    10,966    -0,322
4000           6,4                    11,966     2,678
8000           51,1                   12,966     5,675
• Regression
• Fit a straight line through the data points: $\lg T(N) = b \cdot \lg N + c$, with slope $b = 2{,}999$ and intercept $c = -33{,}210$
• Power law: $T(N) = a \cdot N^b$, where $a = 2^c$
• Hypothesis
• The running time is about $1{,}006 \cdot 10^{-10} \cdot N^{2{,}999}$ seconds
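The regression step can be reproduced with an ordinary least-squares line through the points (lg N, lg T(N)). A minimal sketch using only the standard library; `fit_power_law` is a hypothetical helper, and the data are the four rows of the table:

```python
import math

# (N, T(N) in seconds) from the table
data = [(1000, 0.1), (2000, 0.8), (4000, 6.4), (8000, 51.1)]

def fit_power_law(points):
    """Least-squares line lg T = b * lg N + c; returns (b, c)."""
    xs = [math.log2(n) for n, _ in points]
    ys = [math.log2(t) for _, t in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # slope = covariance(x, y) / variance(x); the line passes through the means
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    c = my - b * mx
    return b, c

b, c = fit_power_law(data)
a = 2 ** c  # back-transform the intercept: T(N) = a * N^b
print(f"b = {b:.3f}, c = {c:.3f}, a = {a:.3e}")
```

It recovers the slide's values: b ≈ 2,999, c ≈ -33,210 and a = 2^c ≈ 1,006 · 10^-10.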
In-class Exercise
• How to find the value of $b$?
• How to find the value of $c$?
(Figure: log-log plot of lg(T(N)) against lg(N); a straight line of slope 3.)
Input size N   Time T(N) (seconds)†   lg(N)     lg(T(N))
1000           0,1                    9,966     -3,322
2000           0,8                    10,966    -0,322
4000           6,4                    11,966     2,678
8000           51,1                   12,966     5,675
$\lg T(N) = b \cdot \lg N + c$, with $b = ?$ and $c = ?$
2 minutes
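One possible way to answer, offered as a sketch rather than the intended solution: $b$ is the slope of the line, so any two points from the table give an estimate; $c$ then follows by rearranging the line equation at any point.

```latex
\begin{align*}
b &\approx \frac{\lg T(8000) - \lg T(4000)}{\lg 8000 - \lg 4000}
   = \frac{5{,}675 - 2{,}678}{12{,}966 - 11{,}966} = 2{,}997 \\
c &= \lg T(N) - b \cdot \lg N
   \approx 5{,}675 - 2{,}999 \cdot 12{,}966 \approx -33{,}210 \qquad (N = 8000)
\end{align*}
```

Linear regression uses all four points at once, which is why it yields $b = 2{,}999$ rather than the two-point value.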
Logarithms and Exponential Rules
Logarithm Rules
• $\lg n = \log_2 n$ (binary logarithm)
• $\ln n = \log_e n$ (natural logarithm)
• $\lg^k n = (\lg n)^k$ (exponential)
• $\lg \lg n = \lg(\lg n)$ (composition)
• $a = b^{\log_b a}$
• $\log_c(ab) = \log_c a + \log_c b$
• $\log_b a^n = n \cdot \log_b a$
• $\log_b a = \frac{\log_c a}{\log_c b}$
• $\log_b \frac{1}{a} = -\log_b a$
• $\log_b a = \frac{1}{\log_a b}$
• $a^{\log_b c} = c^{\log_b a}$
Exponential Rules
• $a^m \cdot a^n = a^{m+n}$
• $\frac{a^m}{a^n} = a^{m-n}$, $a \neq 0$
• $(ab)^m = a^m b^m$
• $\left(\frac{a}{b}\right)^m = \frac{a^m}{b^m}$, $b \neq 0$
• $(a^m)^n = a^{mn}$
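A quick numeric spot check of several of these rules, using Python's `math.log(x, base)`. The sample values are arbitrary assumptions (positive, bases different from 1), and a check at one point illustrates the identities rather than proving them:

```python
import math

# Arbitrary sample values: positive, with bases b and c different from 1
a, b, c, m, n = 5.0, 3.0, 7.0, 2.0, 4.0

checks = {
    "a = b^(log_b a)": (a, b ** math.log(a, b)),
    "log_c(ab) = log_c a + log_c b": (math.log(a * b, c), math.log(a, c) + math.log(b, c)),
    "log_b(a^n) = n log_b a": (math.log(a ** n, b), n * math.log(a, b)),
    "log_b a = log_c a / log_c b": (math.log(a, b), math.log(a, c) / math.log(b, c)),
    "log_b(1/a) = -log_b a": (math.log(1 / a, b), -math.log(a, b)),
    "log_b a = 1 / log_a b": (math.log(a, b), 1 / math.log(b, a)),
    "a^(log_b c) = c^(log_b a)": (a ** math.log(c, b), c ** math.log(a, b)),
    "a^m a^n = a^(m+n)": (a ** m * a ** n, a ** (m + n)),
    "(a^m)^n = a^(mn)": ((a ** m) ** n, a ** (m * n)),
}

for rule, (left, right) in checks.items():
    # isclose() tolerates the tiny rounding errors of floating-point arithmetic
    print(f"{rule:32s} {math.isclose(left, right)}")
```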
In-class Exercise
• Prove that $T(N) = a \cdot N^b$, where $a = 2^c$, follows from the log-log plot
• Exponentiate both sides of the equation (base 2)
• Use the rules
• $a^m \cdot a^n = a^{m+n}$
• $(a^m)^n = a^{mn}$
• $a = b^{\log_b a}$
(Figure: log-log plot of lg(T(N)) against lg(N); a straight line of slope 3.)
$\lg T(N) = b \cdot \lg N + c$, with $b = 2{,}999$ and $c = -33{,}210$
$T(N) = a \cdot N^b$, where $a = 2^c$
4 minutes
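One way to carry out the proof with the listed rules, offered as a sketch. Note that the $a$, $b$ in the rules are generic symbols, not the constants of the power law.

```latex
\begin{align*}
\lg T(N) &= b \cdot \lg N + c \\
2^{\lg T(N)} &= 2^{\,b \cdot \lg N + c} && \text{exponentiate both sides (base 2)} \\
T(N) &= 2^{\,b \cdot \lg N} \cdot 2^{c} && a = b^{\log_b a} \text{ on the left},\ a^m \cdot a^n = a^{m+n} \text{ on the right} \\
     &= \left(2^{\lg N}\right)^{b} \cdot 2^{c} && (a^m)^n = a^{mn} \\
     &= N^{b} \cdot 2^{c} = a \cdot N^{b}, \quad a = 2^{c} && a = b^{\log_b a}
\end{align*}
```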
Prediction and Validation
• Hypothesis
• The running time is about $1{,}006 \cdot 10^{-10} \cdot N^{2{,}999}$ seconds, i.e., the “order of growth” of the running time is about $N^3$ [discussed later]
• Predictions
• 51,0 seconds for $N = 8000$
• 408,1 seconds for $N = 16000$
• Observations
Input size N   T(N) (seconds)
8000           51,1
8000           51,0
8000           51,1
16000          410,8
The observations validate the hypothesis!
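The two predictions follow directly from the fitted power law. A minimal sketch that evaluates the hypothesis and compares it with the observed times (the helper `predict` is hypothetical):

```python
# Hypothesis from the log-log regression: T(N) ≈ a * N^b seconds
a = 1.006e-10
b = 2.999

def predict(n):
    """Predicted running time (seconds) of brute-force 3-Sum for input size n."""
    return a * n ** b

observations = [(8000, 51.1), (16000, 410.8)]  # measured times from the slide

for n, observed in observations:
    predicted = predict(n)
    rel_error = abs(predicted - observed) / observed
    print(f"N = {n:5d}: predicted {predicted:6.1f} s, observed {observed:6.1f} s, "
          f"relative error {rel_error:.1%}")
```

Both predictions land within about 1 % of the measurements, which is what "validates the hypothesis" means here.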
Doubling Hypothesis
$T(N) = a \cdot N^b \;\Rightarrow\; \frac{T(2N)}{T(N)} = \frac{a \cdot (2N)^b}{a \cdot N^b} = 2^b$
• Quick way to estimate $b$ in a power-law relationship
• Run program, doubling the size of the input
Input size N   Time T(N) (seconds)†   ratio   lg ratio
250            0                      -       -
500            0                      4,8     2,3
1000           0,1                    6,9     2,8
2000           0,8                    7,7     2,9
4000           6,4                    8       3       (lg(6,4/0,8) = 3,0)
8000           51,1                   8       3
lg ratio seems to converge to a constant $b \approx 3$
• Hypothesis
• Running time is about $T(N) = a \cdot N^b$ with $b = \lg \text{ratio}$, i.e., $0{,}998 \cdot 10^{-10} \cdot N^3$ seconds
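The doubling ratios can be recomputed from the measured times. A minimal sketch (`doubling_estimates` is a hypothetical helper). It uses only the rows from N = 2000 upward, because the smaller times in the table are rounded to 0 or 0,1 s and would distort the ratios; the slide's earlier ratios were presumably computed from unrounded measurements.

```python
import math

# (N, T(N)) pairs from the table, each N twice the previous one
measurements = [(2000, 0.8), (4000, 6.4), (8000, 51.1)]

def doubling_estimates(pairs):
    """Return (N, T(N) / T(N/2), lg of that ratio) for each doubling step."""
    estimates = []
    for (_, t_prev), (n, t) in zip(pairs, pairs[1:]):
        ratio = t / t_prev
        # T(2N)/T(N) = 2^b, so lg(ratio) estimates the exponent b
        estimates.append((n, ratio, math.log2(ratio)))
    return estimates

for n, ratio, b in doubling_estimates(measurements):
    print(f"N = {n:5d}  ratio = {ratio:.2f}  lg ratio = {b:.3f}")
```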
Doubling Hypothesis
$T(N) = a \cdot N^b$
• Q. How to estimate $a$ (assuming we know $b$)?
• A. Run the program (for a sufficiently large value of $N$) and solve for $a$
Input size N   Time T(N) (seconds)†
8000           51,1
8000           51,0
8000           51,1
$51{,}1 = a \times 8000^3 \;\Rightarrow\; a = 0{,}998 \cdot 10^{-10}$
• Hypothesis
• Running time is about $0{,}998 \cdot 10^{-10} \cdot N^3$ seconds
(almost identical hypothesis to the one obtained via linear regression)
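Solving for $a$ is a single division once $b$ is fixed. A minimal sketch using one measurement at N = 8000, which also derives the resulting prediction for N = 16000:

```python
# Doubling experiments suggest b = 3; solve T(N) = a * N^b for a at a large N
b = 3
n, t = 8000, 51.1          # one measurement from the table (seconds)
a = t / n ** b             # 51.1 / 8000^3

print(f"a = {a:.3e}")
print(f"predicted T(16000) = {a * 16000 ** b:.1f} s")
```

This gives a ≈ 0,998 · 10^-10 and a prediction of 8 × 51,1 = 408,8 s for N = 16000, close to the observed 410,8 s.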
Experimental Algorithmics
Power law: $T(N) = a \cdot N^b$
• System-independent effects
• Algorithm: determines the exponent in the power law ($b$)
• Input data: determines the constant in the power law ($a$)
• System-dependent effects
• Hardware: CPU, memory, cache, …
• Software: compiler, VM, …
• System: operating system, network, other apps, …
• Bad news
• Difficult to get precise measurements
• Good news
• Much easier and cheaper than other sciences, e.g., can run a huge number of experiments
What’s Next?
• Next session on April 15
• Lecture on Analysis of Algorithms (Cont.)
• On Campus Essen, room R14 R00 A04 (Audimax)
• Live stream for students on Campus Duisburg
http://algs4.cs.princeton.edu, Chapter 1.4