
Data Structure

Lebanese University – Faculty of Engineering III

Analysis of Algorithms
Dr. Ali El Masri

Adapted from slides provided with Data Structures and Algorithms in C++ by Goodrich, Tamassia, and Mount (Wiley, 2004)
• Given 2 or more algorithms to solve the same problem, how do we select the best one?

• Some criteria for selecting an algorithm


Is it easy to implement, understand, and modify?
How long does it take to run to completion?
How much computer memory does it use?

• Software engineering is primarily concerned with the first criterion

• In this course we are interested in the second and third criteria

• Time complexity
The amount of time that an algorithm needs to run to completion

• Space complexity
The amount of memory an algorithm needs to use

• We will occasionally look at space complexity, but we are mostly interested in time complexity in this course

• Thus in this course the better algorithm is the one which runs faster (has smaller time complexity)

• Most algorithms transform input objects into output objects

[Diagram: a sorting algorithm transforms the input object 5 3 1 2 into the output object 1 2 3 5]

• The running time of an algorithm typically grows with the input size

• Average case time is often difficult to determine

• We focus on the worst case running time
 Easier to analyze
 Crucial to some applications
 Knowing the worst case running time gives us a guarantee that an algorithm will not exceed a certain amount of time
 For some algorithms the worst case occurs fairly often, e.g., when searching a database for a specific piece of information, the worst case will often occur when the information is not present (which happens frequently in some applications)

[Chart: best case, average case, and worst case Running Time versus Input Size (1000-4000)]
• Write a program implementing the algorithm

• Run the program with inputs of varying size and composition

• Use a function, like the built-in clock() function, to get an accurate measure of the actual running time

• Plot the results


[Chart: measured running Time (ms) versus Input Size (0-100)]
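The steps above can be sketched in C++. Only the clock() facility mentioned on the slide is assumed; arrayMax (finding the maximum of an array, used throughout these slides) stands in for whatever algorithm is being measured:

```cpp
#include <cassert>
#include <ctime>
#include <vector>

// Stand-in algorithm to measure: find the maximum element of A.
double arrayMax(const std::vector<double>& A) {
    double currentMax = A[0];
    for (std::size_t i = 1; i < A.size(); i++)
        if (A[i] > currentMax) currentMax = A[i];
    return currentMax;
}

// Time one run of arrayMax on an input of size n, in milliseconds.
double timeRunMs(std::size_t n) {
    std::vector<double> A(n, 1.0);
    A[n / 2] = 42.0;                       // plant a known maximum
    std::clock_t start = std::clock();
    volatile double result = arrayMax(A);  // volatile keeps the call from being optimized away
    std::clock_t end = std::clock();
    (void)result;
    return 1000.0 * (end - start) / CLOCKS_PER_SEC;
}
```

Calling timeRunMs for a range of sizes (say n = 10, 20, …, 100) and plotting the results yields a chart like the one above.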
• It is necessary to implement the algorithm, which may be difficult

• Results may not be indicative of the running time on other inputs not included in the experiment
Even on inputs of the same size, running time can be very different

• In order to compare two algorithms, the same hardware and software environments must be used

• Uses a high-level description of the algorithm instead of an implementation

• Characterizes running time as a function of the input size, n

• Takes into account all possible inputs

• Allows us to evaluate the speed of an algorithm independent of the hardware/software environment

• High-level description of an algorithm
• More structured than English prose
• Less detailed than a program
• Preferred notation for describing algorithms
• Hides program design issues

Example: find the max element of an array

Algorithm arrayMax(A, n)
    Input: array A of n integers
    Output: maximum element of A
    currentMax ← A[0]
    for i ← 1 to n - 1 do
        if A[i] > currentMax then
            currentMax ← A[i]
    return currentMax
• Control flow
  • if … then … [else …]
  • while … do …
  • repeat … until …
  • for … do …
  • Indentation replaces braces

• Method declaration
  Algorithm method (arg [, arg…])
      Input …
      Output …

• Method call
  var.method (arg [, arg…])

• Return value
  return expression

• Expressions
  ← Assignment (like = in C++)
  = Equality testing (like == in C++)
  n² Superscripts and other mathematical formatting allowed
• A CPU

• A potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character

• Memory cells are numbered and accessing any cell in memory takes unit time

[Figure: memory cells numbered 0, 1, 2, …]
• Basic computations performed by an algorithm
• Assumed to take a constant amount of time in the RAM model

• Examples:
  • Evaluating an expression: x² + e^y
  • Assigning a value to a variable: cnt ← 2
  • Indexing into an array: A[5]
  • Calling a method: mySort(A, n)
  • Returning from a method: return(cnt)
• By inspecting the pseudocode (as in the arrayMax example above) or the implementation (as below), we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size

double arrayMax(double A[], int n) {      // # operations
    double currentMax = A[0];             // 2
    for (int i = 1; i < n; i++) {         // 2(n - 1) + 2 = 2n
        if (A[i] > currentMax)            // 3(n - 1)
            currentMax = A[i];            // 0 to 2(n - 1)
    }
    return currentMax;                    // 1
}
                                          // Total: 5n to 7n - 2
• Algorithm arrayMax executes 7n - 2 primitive operations in the worst case and 5n primitive
operations in the best case.
Define:
a = Time taken by the fastest primitive operation
b = Time taken by the slowest primitive operation

• Let T(n) be worst case time of arrayMax. Then


a(5n) ≤ T(n) ≤ b(7n - 2)

• Hence, the running time T(n) is bounded by two linear functions

• Changing the hardware/software environment
  • Affects T(n) by a constant factor, but
  • Does not alter the growth rate of T(n)

• The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax
• Algorithm arrayMax grows proportionally with n, with its true running time being n times a constant factor that depends on the specific computer

We focus on the big picture, which is the growth rate of an algorithm
Seven functions that often appear in algorithm analysis:

• Constant ≈ 1
• Logarithmic ≈ log n
• Linear ≈ n (arrayMax is here)
• N-Log-N ≈ n log n
• Quadratic ≈ n²
• Cubic ≈ n³
• Exponential ≈ 2ⁿ
n     log n   n     n log n   n²      n³         2ⁿ
8     3       8     24        64      512        256
16    4       16    64        256     4096       65536
32    5       32    160       1024    32768      4.3×10⁹
64    6       64    384       4096    262144     1.8×10¹⁹
128   7       128   896       16384   2097152    3.4×10³⁸
256   8       256   2048      65536   16777216   1.2×10⁷⁷
• Insertion sort takes about n²/4 operations

• Merge sort takes about 2n lg n operations

• Sort a million items?
 insertion sort takes roughly 70 hours
 merge sort takes roughly 40 seconds

• This is on a slow machine, but on a machine 100× as fast it is:
 40 minutes versus less than 0.5 seconds
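These figures can be reproduced with a little arithmetic. The slide implies a machine speed of roughly 10⁶ operations per second (an inferred assumption, not stated explicitly); a sketch:

```cpp
#include <cassert>
#include <cmath>

// Estimated time in seconds to sort n items at opsPerSec operations/second,
// using the operation counts quoted above.
double insertionSortSeconds(double n, double opsPerSec) {
    return (n * n / 4.0) / opsPerSec;            // insertion sort: ~n²/4 operations
}
double mergeSortSeconds(double n, double opsPerSec) {
    return (2.0 * n * std::log2(n)) / opsPerSec; // merge sort: ~2·n·lg n operations
}
```

For n = 10⁶ at 10⁶ ops/sec this gives about 250,000 s ≈ 70 hours for insertion sort versus about 40 seconds for merge sort, matching the slide.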

• The growth rate is not affected by
• constant factors
• lower-order terms

• Examples
• 10²n + 10⁵ is a linear function
• 10⁵n² + 10⁸n is a quadratic function

• Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n0 (n0 is an integer) such that:
  f(n) ≤ c·g(n) for all n ≥ n0

• Example: 2n + 10 is O(n)
  • 2n + 10 ≤ cn
  • (c - 2)n ≥ 10
  • n ≥ 10/(c - 2)
  • Pick c = 3 and n0 = 10
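A finite numeric check of this choice of constants (illustrative only; the algebra above is the actual proof):

```cpp
#include <cassert>

// Check f(n) = 2n + 10 <= c*n for every n from n0 up to limit.
// A finite check only illustrates the bound; the derivation proves it for all n.
bool bigOhHolds(long c, long n0, long limit) {
    for (long n = n0; n <= limit; n++)
        if (2 * n + 10 > c * n) return false;
    return true;
}
```

With c = 3 and n0 = 10 the inequality holds from n0 onward, while just below n0 it fails: 2·9 + 10 = 28 > 27 = 3·9.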

• Example: the function n² is not O(n)
  • n² ≤ cn
  • n ≤ c
  • The above inequality cannot be satisfied since c must be a constant

5n⁴ + 3n³ + 2n² + 4n + 1 is O(n⁴)
 5n⁴ + 3n³ + 2n² + 4n + 1 ≤ (5 + 3 + 2 + 4 + 1)n⁴ = cn⁴, (c = 15, n ≥ n0 = 1)

5n² + 3n log n + 2n + 5 is O(n²)
 5n² + 3n log n + 2n + 5 ≤ (5 + 3 + 2 + 5)n² = cn², (c = 15, n ≥ n0 = 1)

20n³ + 10n log n + 5 is O(n³)
 20n³ + 10n log n + 5 ≤ 35n³, (c = 35, n ≥ n0 = 1)

3 log n + 2 is O(log n)
 3 log n + 2 ≤ 5 log n, (c = 5, n ≥ n0 = 2); note that log 1 = 0

2ⁿ⁺² is O(2ⁿ)
 2ⁿ⁺² = 2ⁿ · 2² = 4 · 2ⁿ, (c = 4, n ≥ n0 = 1)

2n + 100 log n is O(n)
 2n + 100 log n ≤ 102n, (c = 102, n ≥ n0 = 1)

• The big-Oh notation gives an upper bound on the growth rate of a function

• The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n)

• We can use the big-Oh notation to rank functions according to their growth rate

                    f(n) is O(g(n))    g(n) is O(f(n))
g(n) grows more     Yes                No
f(n) grows more     No                 Yes
Same growth         Yes                Yes
• If f(n) is a polynomial of degree d, then f(n) is O(nᵈ), i.e.,
  1. Drop lower-order terms
  2. Drop constant factors

• Use the smallest possible class of functions
  • Say "2n is O(n)" instead of "2n is O(n²)"

• Use the simplest expression of the class
  • Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
• The asymptotic analysis (as n grows toward infinity) of an algorithm determines the running time in big-Oh notation

• To perform the asymptotic analysis


• We find the worst-case number of primitive operations executed as a function of the input size
• We express this function with big-Oh notation

• Example:
• We determine that algorithm arrayMax executes at most 7n - 2 primitive operations
• We say that algorithm arrayMax “runs in O(n) time”

• Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them
when counting primitive operations

• We further illustrate asymptotic analysis with two algorithms for prefix averages

• The i-th prefix average of an array X is the average of the first (i + 1) elements of X:
  A[i] = (X[0] + X[1] + … + X[i]) / (i + 1)

• Computing the array A of prefix averages of another array X has applications to financial analysis

[Chart: arrays X and A plotted for indices 1 to 7]
The following algorithm computes prefix averages in quadratic time by applying the definition

Algorithm prefixAverages1(X, n)
    Input: array X of n integers
    Output: array A of prefix averages of X
                                                 # operations
    A ← new array of n integers                  n
    for i ← 0 to n - 1 do                        n
        s ← X[0]                                 n
        for j ← 1 to i do                        1 + 2 + … + (n - 1) = (n - 1)n/2
            s ← s + X[j]                         1 + 2 + … + (n - 1) = (n - 1)n/2
        A[i] ← s / (i + 1)                       n
    return A                                     1

Algorithm prefixAverages1 runs in O(n²) time
The following algorithm computes prefix averages in linear time by keeping a running sum

Algorithm prefixAverages2(X, n)
    Input: array X of n integers
    Output: array A of prefix averages of X
                                                 # operations
    A ← new array of n integers                  n
    s ← 0                                        1
    for i ← 0 to n - 1 do                        n
        s ← s + X[i]                             n
        A[i] ← s / (i + 1)                       n
    return A                                     1

Algorithm prefixAverages2 runs in O(n) time


(better than prefixAverages1)
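Both pseudocode algorithms translate directly to C++; a minimal, self-contained sketch:

```cpp
#include <cassert>
#include <vector>

// Quadratic version (prefixAverages1): recompute each prefix sum from scratch.
std::vector<double> prefixAverages1(const std::vector<double>& X) {
    std::size_t n = X.size();
    std::vector<double> A(n);
    for (std::size_t i = 0; i < n; i++) {
        double s = X[0];
        for (std::size_t j = 1; j <= i; j++)
            s += X[j];
        A[i] = s / (i + 1);
    }
    return A;
}

// Linear version (prefixAverages2): maintain a running sum.
std::vector<double> prefixAverages2(const std::vector<double>& X) {
    std::size_t n = X.size();
    std::vector<double> A(n);
    double s = 0;
    for (std::size_t i = 0; i < n; i++) {
        s += X[i];
        A[i] = s / (i + 1);
    }
    return A;
}
```

Both return the same array; e.g. for X = {1, 2, 3, 4} the prefix averages are {1, 1.5, 2, 2.5}.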
Big-Omega
f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n0 ≥ 1 such that
f(n) ≥ c g(n) for n ≥ n0

Big-Theta
f(n) is Θ(g(n)) if there are constants c’ > 0 and c’’ > 0 and an integer constant n0 ≥ 1 such that
c’g(n) ≤ f(n) ≤ c’’g(n) for n ≥ n0
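For example, f(n) = 3n + 5 (shown earlier to be O(n)) is in fact Θ(n): take c′ = 3, c″ = 8, and n0 = 1, since 3n ≤ 3n + 5 ≤ 8n for all n ≥ 1. A finite numeric check of these constants (this example and its constants are illustrative, not from the slides):

```cpp
#include <cassert>

// Check c1*n <= f(n) <= c2*n for f(n) = 3n + 5, for every n from n0 up to limit.
// A finite check only illustrates the bound; the algebra proves it for all n.
bool thetaHolds(long c1, long c2, long n0, long limit) {
    for (long n = n0; n <= limit; n++) {
        long f = 3 * n + 5;
        if (c1 * n > f || f > c2 * n) return false;
    }
    return true;
}
```

With c″ = 7 instead of 8, the upper bound already fails at n = 1 (3·1 + 5 = 8 > 7), showing the constants matter.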

Exercise: Find the time complexity of the following programs using big-Oh notation

1.
public static void addTen(int[] a, int n) {
    for (int i = 0; i < n; i++) {    // n
        a[i] = a[i] + 10;            // n
    }
}
O(n)

2.
for (int i = 0; i < 10; i++) {       // 10
    cout << i << endl;               // 10
}
O(1)
3.
public static int search(int a[], int n, int x) {
    int i = 0;              // 1
    while (i < n) {         // n
        if (x == a[i]) {    // n
            return i;       // 0 or 1
        }
        i++;                // n
    }
    return -1;              // 1
}
O(n)

4.
sum = 0;
for (j = 1; j <= n; j++)
    for (i = 1; i <= j; i++)
        sum++;

For each value of j (j = 1, 2, 3, …, n, each reached once), the inner loop runs j times, so sum++ executes
1 + 2 + 3 + 4 + … + n = n(n + 1)/2 = n²/2 + n/2 times → O(n²)
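The count can be confirmed by running the loops and comparing against the closed form n(n + 1)/2 (a sketch wrapping the exercise's code in a function):

```cpp
#include <cassert>

// Run the nested loops from exercise 4 and return the final value of sum.
long nestedLoopCount(long n) {
    long sum = 0;
    for (long j = 1; j <= n; j++)
        for (long i = 1; i <= j; i++)
            sum++;
    return sum;  // equals n(n + 1)/2
}
```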
5.
sum = 0;
while (n > 1) {
    sum++;
    n = n / 2;
}

n is halved on every iteration (n = 2 gives 1 iteration, n = 2² gives 2, n = 2³ gives 3, n = 2⁴ gives 4, …), so for n = 2ᵏ the loop body runs k times
2ᵏ = n → k = log n → Complexity = O(log n)
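Wrapping the exercise's loop in a function and counting confirms the bound; for n = 2ᵏ the body runs exactly k times (a sketch, not from the slides):

```cpp
#include <cassert>

// Run the halving loop from exercise 5 and return the final value of sum.
long halvingCount(long n) {
    long sum = 0;
    while (n > 1) {
        sum++;
        n = n / 2;
    }
    return sum;  // equals floor(log2 n) for n >= 1
}
```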

6.
sum = 0;
for (i = 1; i <= n; i *= 2)
    for (j = 1; j <= n; j++)
        sum++;

i takes the values 1, 2, 4, 8, …, 2ᵏ = n (supposing we reach n in k steps), so the outer loop runs O(k) times
2ᵏ = n → k = log n → the complexity of the outer loop is O(log n)
For each iteration of the outer loop, the inner loop runs n times
Complexity: O(n log n)
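The same counting check for exercise 6: for n = 2ᵏ the outer loop runs k + 1 times (i = 1, 2, 4, …, 2ᵏ) and the inner loop n times for each of them, so sum ends at n·(k + 1), which is O(n log n):

```cpp
#include <cassert>

// Run the loops from exercise 6 and return the final value of sum.
long doublingCount(long n) {
    long sum = 0;
    for (long i = 1; i <= n; i *= 2)
        for (long j = 1; j <= n; j++)
            sum++;
    return sum;  // equals n * (floor(log2 n) + 1) for n >= 1
}
```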

• Data Structures and Algorithms in C++, 2nd Edition, by M.T. Goodrich, R. Tamassia, and D.M. Mount

• Data Structures and Algorithms in Java, 6th Edition, by M.T. Goodrich and R. Tamassia

