
Data Structures and Algorithms

Lecture 3: Analysis Tools
Hsuan-Tien Lin
htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering
National Taiwan University



Analysis Tools

Roadmap
1 the one where it all began

Lecture 2: Data Structures


scheme of purposefully organizing data with
access/maintenance algorithms,
such as ordered array for faster search

Lecture 3: Analysis Tools


motivation
cases of complexity analysis
asymptotic notation
usage of asymptotic notation
2 the data structures awaken
3 fantastic trees and where to find them
4 the search revolutions
5 sorting: the final frontier
motivation

Recall: Properties of a Good Program

good program: proper use of resources

Space Resources (space complexity)
• memory
• disk(s)
• transmission bandwidth

Computation Resources (time complexity)
• CPU(s)
• GPU(s)
• computation power

need: language for describing complexity


Space Complexity of GET-MIN

GET-MIN(A)
1 m = 1 // store current min. index
2 for i = 2 to A.length
3     // update if i-th element smaller
4     if A[m] > A[i]
5         m = i
6 return A[m]

• array A: pointer size s1 (not counting the actual input elements)


• integer m: size s2
• integer i: size s2

total space s1 + 2 s2: constant with respect to n = A.length within the algorithm
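The constant extra space can be seen in a direct translation of GET-MIN. This is only a sketch in Python, assuming a non-empty list and 0-based indexing (the slide's pseudocode is 1-based); the name `get_min` is chosen here for illustration.

```python
def get_min(a):
    """Return the smallest element of a non-empty list (0-based GET-MIN)."""
    m = 0  # index of the current minimum
    for i in range(1, len(a)):
        # update if the i-th element is smaller
        if a[m] > a[i]:
            m = i
    return a[m]


print(get_min([3, 1, 4, 1, 5]))
```

Besides the reference to the input, the function keeps only the two integers `m` and `i`, regardless of the input length.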

Space Complexity of GET-MIN-WASTE

GET-MIN-WASTE(A)
B = COPY(A, 1, A.length)
INSERTION-SORT(B)
return B[1]

• array A: pointer size s1 (not counting the actual input elements)
• array B:
  • pointer size s1
  • n integers with total size s2 · n, where n = A.length
• any space that INSERTION-SORT uses: s3

total space 2 s1 + s2 · n + s3: (at least) linear in n within the algorithm
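For comparison, a sketch of GET-MIN-WASTE in Python, again assuming 0-based lists. The helper names `insertion_sort` and `get_min_waste` are illustrative; `list(a)` plays the role of COPY.

```python
def insertion_sort(b):
    """Sort the list b in place."""
    for m in range(1, len(b)):
        key = b[m]
        i = m - 1
        while i >= 0 and b[i] > key:
            b[i + 1] = b[i]  # shift the larger element one slot to the right
            i -= 1
        b[i + 1] = key


def get_min_waste(a):
    """Return the smallest element of a non-empty list by sorting a full copy."""
    b = list(a)        # the copy holds n extra elements: linear extra space
    insertion_sort(b)
    return b[0]


print(get_min_waste([3, 1, 4, 1, 5]))
```

The result matches GET-MIN, but the copy alone already costs space proportional to n.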


Time Complexity of Insertion Sort


INSERTION-SORT(A)                                           cost  times
1 for m = 2 to A.length                                     d1    n
2     key = A[m]                                            d2    n - 1
3     // insert A[m] into the sorted sequence A[1 .. m-1]   0     n - 1
4     i = m - 1                                             d4    n - 1
5     while i > 0 and A[i] > key                            d5    $\sum_{m=2}^{n} t_m$
6         A[i + 1] = A[i]                                   d6    $\sum_{m=2}^{n} (t_m - 1)$
7         i = i - 1                                         d7    $\sum_{m=2}^{n} (t_m - 1)$
8     A[i + 1] = key                                        d8    n - 1

(from Introduction to Algorithms, Third Edition, Cormen et al.)

total time
\[
T(n) = d_1 n + d_2 (n-1) + d_4 (n-1) + d_5 \sum_{m=2}^{n} t_m + d_6 \sum_{m=2}^{n} (t_m - 1) + d_7 \sum_{m=2}^{n} (t_m - 1) + d_8 (n-1)
\]

actual time of each cost $d_i$ depends on machine type;
total $T(n)$ depends on $n$ and $t_m$, the number of while checks for each $m$
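The quantities t_m can be measured directly. The following is a sketch, assuming 0-based Python lists; `insertion_sort_counts` is a hypothetical helper that records, for each 1-based m, how many times the while condition is evaluated.

```python
def insertion_sort_counts(a):
    """Sort a copy of a and return (sorted_copy, t), where t[m] is the number
    of while-condition checks for the 1-based outer index m = 2..n."""
    b = list(a)
    t = {}
    for m in range(1, len(b)):   # 0-based m corresponds to 1-based m + 1
        key = b[m]
        i = m - 1
        checks = 1               # first evaluation of the while condition
        while i >= 0 and b[i] > key:
            b[i + 1] = b[i]
            i -= 1
            checks += 1          # the condition is evaluated again after each shift
        b[i + 1] = key
        t[m + 1] = checks
    return b, t


print(insertion_sort_counts([1, 2, 3, 4])[1])
print(insertion_sort_counts([4, 3, 2, 1])[1])
```

On a sorted input every t_m is 1, and on a reverse-sorted input t_m equals m.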

Fun Time
Consider running GET-MIN on an array A of length n. If line i takes a time cost of d_i, and the inequality in line 4 is TRUE t times, what is the time complexity of GET-MIN?

GET-MIN(A)
1 m = 1 // store current min. index
2 for i = 2 to A.length
3     // update if i-th element smaller
4     if A[m] > A[i]
5         m = i
6 return A[m]

(1) d1 + d2 + d4 + d5 + d6
(2) d1 + t d2 + t d4 + t d5 + d6
(3) d1 + n d2 + t d4 + t d5 + d6
(4) d1 + n d2 + (n - 1) d4 + t d5 + d6

Reference Answer: (4)
The loop (including the ending check) in line 2 is run n times; the condition in line 4 is checked n - 1 times, and t of those result in execution of line 5.
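The line counts in the reference answer can be checked by instrumenting GET-MIN. This is a sketch under the assumption that the `for` loop is unrolled into an explicit test so the final failing check of line 2 is counted; `get_min_line_counts` is a hypothetical name.

```python
def get_min_line_counts(a):
    """Run GET-MIN on a non-empty list and return how often each
    pseudocode line (1, 2, 4, 5, 6) is executed."""
    counts = {1: 0, 2: 0, 4: 0, 5: 0, 6: 0}
    n = len(a)
    m = 0
    counts[1] += 1
    i = 1
    while True:
        counts[2] += 1        # loop test, including the final failing one
        if i >= n:
            break
        counts[4] += 1        # comparison in line 4
        if a[m] > a[i]:
            counts[5] += 1    # update in line 5
            m = i
        i += 1
    counts[6] += 1
    return counts


print(get_min_line_counts([3, 1, 4, 1, 5]))
```

For this input of length n = 5, line 2 runs 5 times, line 4 runs 4 times, and line 5 runs once (t = 1).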

cases of complexity analysis

Best-case Time Complexity of Insertion Sort


INSERTION-SORT(A)                                           cost  times
1 for m = 2 to A.length                                     d1    n
2     key = A[m]                                            d2    n - 1
3     // insert A[m] into the sorted sequence A[1 .. m-1]   0     n - 1
4     i = m - 1                                             d4    n - 1
5     while i > 0 and A[i] > key                            d5    $\sum_{m=2}^{n} t_m$
6         A[i + 1] = A[i]                                   d6    $\sum_{m=2}^{n} (t_m - 1)$
7         i = i - 1                                         d7    $\sum_{m=2}^{n} (t_m - 1)$
8     A[i + 1] = key                                        d8    n - 1

(from Introduction to Algorithms, Third Edition, Cormen et al.)

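sorted A $\Longrightarrow$ $t_m = 1$
\[
\begin{aligned}
T(n) &= d_1 n + d_2 (n-1) + d_4 (n-1) + d_5 \sum_{m=2}^{n} t_m + d_6 \sum_{m=2}^{n} (t_m - 1) + d_7 \sum_{m=2}^{n} (t_m - 1) + d_8 (n-1) \\
     &= d_1 n + d_2 (n-1) + d_4 (n-1) + d_5 (n-1) + d_6 (0) + d_7 (0) + d_8 (n-1)
\end{aligned}
\]

best case: $T(n) = \beta \cdot n + \gamma$ for some constants $\beta$ and $\gamma$ (linear in n)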
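As a worked step, collecting the coefficients of n in the best-case expression makes the linear form explicit. The grouping below follows directly from the costs d_1, ..., d_8 of the cost table:

```latex
\[
T(n) = (d_1 + d_2 + d_4 + d_5 + d_8)\, n - (d_2 + d_4 + d_5 + d_8)
\]
```

The slope and the intercept depend only on the machine-dependent costs, not on n.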



Worst-case Time Complexity of Insertion Sort


INSERTION-SORT(A)                                           cost  times
1 for m = 2 to A.length                                     d1    n
2     key = A[m]                                            d2    n - 1
3     // insert A[m] into the sorted sequence A[1 .. m-1]   0     n - 1
4     i = m - 1                                             d4    n - 1
5     while i > 0 and A[i] > key                            d5    $\sum_{m=2}^{n} t_m$
6         A[i + 1] = A[i]                                   d6    $\sum_{m=2}^{n} (t_m - 1)$
7         i = i - 1                                         d7    $\sum_{m=2}^{n} (t_m - 1)$
8     A[i + 1] = key                                        d8    n - 1

(from Introduction to Algorithms, Third Edition, Cormen et al.)

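reverse-sorted A $\Longrightarrow$ $t_m = m$
\[
\begin{aligned}
T(n) &= d_1 n + d_2 (n-1) + d_4 (n-1) + d_5 \sum_{m=2}^{n} t_m + d_6 \sum_{m=2}^{n} (t_m - 1) + d_7 \sum_{m=2}^{n} (t_m - 1) + d_8 (n-1) \\
     &= d_1 n + d_2 (n-1) + d_4 (n-1) + d_5 \frac{(n+2)(n-1)}{2} + d_6 \frac{n(n-1)}{2} + d_7 \frac{n(n-1)}{2} + d_8 (n-1)
\end{aligned}
\]

worst case: $T(n) = \alpha \cdot n^2 + \beta \cdot n + \gamma$ for some constants $\alpha$, $\beta$, $\gamma$ (quadratic in n)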
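As a worked step, expanding the products in the worst-case expression and grouping by powers of n shows where the quadratic term comes from:

```latex
\[
\begin{aligned}
T(n) &= \frac{d_5 + d_6 + d_7}{2}\, n^2
      + \left( d_1 + d_2 + d_4 + \frac{d_5}{2} - \frac{d_6}{2} - \frac{d_7}{2} + d_8 \right) n
      - (d_2 + d_4 + d_5 + d_8)
\end{aligned}
\]
```

Only the while-loop lines (costs d_5, d_6, d_7) contribute to the n^2 coefficient.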



Average-case Time Complexity of Insertion Sort


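[Figure: all inputs arranged between two extremes. Best cases (e.g. A = [1, 2, 3, 4]) sit at one end, worst cases (e.g. A = [4, 3, 2, 1]) at the other, and other cases (e.g. A = [1, 2, 4, 3], A = [1, 4, 2, 3], A = [4, 3, 1, 2]) lie in between; the average case averages over all of them.]

best case ≤ average case ≤ worst case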
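The ordering of the three cases can be checked exhaustively for small inputs. This sketch assumes every permutation of [1, 2, 3, 4] is equally likely and uses the total number of while-condition checks as a stand-in for running time; `while_checks` is a hypothetical helper.

```python
from itertools import permutations


def while_checks(a):
    """Total number of while-condition checks insertion sort makes on a."""
    b = list(a)
    total = 0
    for m in range(1, len(b)):
        key = b[m]
        i = m - 1
        total += 1               # first check for this m
        while i >= 0 and b[i] > key:
            b[i + 1] = b[i]
            i -= 1
            total += 1           # re-check after each shift
        b[i + 1] = key
    return total


costs = [while_checks(p) for p in permutations([1, 2, 3, 4])]
best, worst = min(costs), max(costs)
average = sum(costs) / len(costs)
print(best, average, worst)      # prints: 3 6.0 9
```

The best value comes from the sorted input and the worst from the reverse-sorted one, with the average in between.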



Time Complexity Analysis in Practice

Common Focus: worst-case time complexity
• physically meaningful: waiting time/power consumption
• often ≈ average-case: when many near-worst-cases

Common Language: rough time needed with respect to input size n

$T(n) = \alpha \cdot n^2 + \beta \cdot n + \gamma$ (with constants $\alpha$, $\beta$, $\gamma$)

• care more about
  • larger n
  • leading term of n
• care less about
  • constants
  • other terms of n

next: language of rough notation



Fun Time
Which of the following describes the best-case time complexity of GET-MIN on an array A of length n?

GET-MIN(A)
1 m = 1 // store current min. index
2 for i = 2 to A.length
3     // update if i-th element smaller
4     if A[m] > A[i]
5         m = i
6 return A[m]

(1) constant in n
(2) linear in n
(3) quadratic in n
(4) none of the other choices

Reference Answer: (2)
Even in the best case, where line 5 is executed 0 times, the loop (including the ending check) in line 2 still needs to be run n times, and the condition in line 4 still needs to be checked n - 1 times.

asymptotic notation

‘Rough’ Notation
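goal
• care more about
  • larger n
  • leading term of n
• care less about
  • constants
  • other terms of n

roughly, $\alpha \cdot n^2 + \beta \cdot n + \gamma \sim n^2$ (with constants $\alpha$, $\beta$, $\gamma$)

notation
\[
\underbrace{\alpha \cdot n^2 + \beta \cdot n + \gamma}_{f(n)} = \Theta(\underbrace{n^2}_{g(n)})
\]
for positive $f(n)$ and $g(n)$ [when $n \in \mathbb{R}$ with $n \ge 1$]

extracting the similarity: consider $\frac{f(n)}{g(n)}$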

Modeling Rough with Asymptotic Behavior
goal
\[
\underbrace{\alpha \cdot n^2 + \beta \cdot n + \gamma}_{f(n)} = \Theta(\underbrace{n^2}_{g(n)})
\]

• growth of $\beta \cdot n + \gamma$ slower than $g(n) = n^2$: for large n, removable by dividing by $g(n)$
• asymptotically, two functions only differ by $c > 0$:
\[
\lim_{n \to \infty} \frac{f(n)}{g(n)} = c
\]
  (why do we need $c > 0$?)

‘rough’ definition version 0 (to be changed):
for positive $f(n)$ and $g(n)$, $f(n) = \Theta(g(n))$ if $\lim_{n \to \infty} \frac{f(n)}{g(n)} = c > 0$

Asymptotic Notation: Modeling Rough Growth


\[
f(n) = \Theta(g(n)) \Longleftarrow \lim_{n \to \infty} \frac{f(n)}{g(n)} = c > 0
\]

big-Θ: roughly the same

• definition meets criteria:
  • care about larger n: yes, $n \to \infty$
  • leading term more important: yes, $n + \sqrt{n} + \log n = \Theta(n)$
  • insensitive to constants: yes, $1126n = \Theta(n)$
• meaning: f(n) grows roughly the same as g(n)
• “= Θ(·)” actually means “∈”

f(n)     √n    0.1126n    n    112.6n    n^1.1    exp(n)
Θ(n)?    N     Y          Y    Y         N        N

asymptotic notation:
the most used ‘language’ for time/space complexity
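The table entries can be sanity-checked numerically by looking at f(n)/n for a few growing n. This is only a heuristic sketch, not a proof: the sample sizes 10, 100, 500 are an arbitrary choice (kept small so `math.exp` does not overflow), and slowly growing ratios such as n^0.1 can look almost flat on small samples.

```python
import math

# Candidate functions from the table, compared against g(n) = n.
candidates = {
    "sqrt(n)": lambda n: math.sqrt(n),
    "0.1126n": lambda n: 0.1126 * n,
    "n": lambda n: float(n),
    "112.6n": lambda n: 112.6 * n,
    "n^1.1": lambda n: n ** 1.1,
    "exp(n)": lambda n: math.exp(n),
}


def ratios(f, ns=(10, 100, 500)):
    """Return f(n) / n for each sample n; a numeric hint, not a proof."""
    return [f(n) / n for n in ns]


for name, f in candidates.items():
    print(name, ratios(f))
```

Ratios that stay at a positive constant suggest Θ(n); ratios that keep shrinking toward 0 or keep growing suggest the function is not Θ(n).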

Issue about the Convergence Definition


\[
f(n) = \Theta(g(n)) \Longleftarrow \lim_{n \to \infty} \frac{f(n)}{g(n)} = c > 0
\]

consider a hypothetical algorithm:
• T(n) = n for even n
• T(n) = 2n for odd n

want: $T(n) = \Theta(n)$, but $\lim_{n \to \infty} \frac{T(n)}{n}$ does not exist!

fix (formal): for asymptotically non-negative f(n) & g(n),
\[
f(n) = \Theta(g(n)) \iff \text{there exist positive } (n_0, c_1, c_2) \text{ such that } c_1\, g(n) \le f(n) \le c_2\, g(n) \text{ for } n \ge n_0
\]
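For the hypothetical even/odd algorithm, one possible witness for the formal definition is c1 = 1, c2 = 2, n0 = 1 (chosen here for illustration). The sketch below checks the sandwich on a finite range only, which illustrates the definition but is not a proof for all n.

```python
def T(n):
    """Running time of the hypothetical algorithm: n for even n, 2n for odd n."""
    return n if n % 2 == 0 else 2 * n


c1, c2, n0 = 1, 2, 1  # candidate constants for the formal definition
ok = all(c1 * n <= T(n) <= c2 * n for n in range(n0, 10_001))
print(ok)

# The ratio T(n)/n alternates between 2 and 1, so it has no limit.
print([T(n) / n for n in range(1, 7)])
```

The ratio never settles, yet T(n) stays between n and 2n, which is exactly what the formal definition asks for.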


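Convergence ‘Definition’ ⟹ Formal Definition

For asymptotically non-negative functions f(n) and g(n), if $\lim_{n \to \infty} \frac{f(n)}{g(n)} = c > 0$, then $f(n) = \Theta(g(n))$.

• by the definition of limit, for any $\epsilon > 0$ (take $\epsilon = c/2$, so that $c - \epsilon > 0$) there exists $n_0 > 0$ such that for all $n \ge n_0$, $\left| \frac{f(n)}{g(n)} - c \right| < \epsilon$.
• i.e. for all $n \ge n_0$, $c - \epsilon < \frac{f(n)}{g(n)} < c + \epsilon$.
• Let $c_1' = c - \epsilon$, $c_2' = c + \epsilon$, $n_0' = n_0$; multiplying through by $g(n) > 0$, the formal definition is satisfied with $(c_1', c_2', n_0')$. QED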

often suffices to use convergence ‘definition’ in practice

usage of asymptotic notation

The Seven Functions as g(n)


g(n) = ?
• 1: constant (meaning $c_1 \le f(n) \le c_2$ for $n \ge n_0$)
• log n: logarithmic (does base matter?)
• n: linear
• n log n
• $n^2$: quadratic
• $n^3$: cubic
• $2^n$: exponential (does base matter?)

will often encounter them in future classes


Logarithmic Function in Asymptotic Notation


Claim
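For any a > 1, b > 1, if $f(n) = \Theta(\log_a n)$, then $f(n) = \Theta(\log_b n)$.

Proof
• $f(n) = \Theta(\log_a n) \iff \exists\, (c_1, c_2, n_0)$ such that $c_1 \log_a n \le f(n) \le c_2 \log_a n$ for $n \ge n_0$
• Then, since $\log_a n = \log_a b \cdot \log_b n$, we have $c_1 \log_a b \, \log_b n \le f(n) \le c_2 \log_a b \, \log_b n$ for $n \ge n_0$
• Let $c_1' = c_1 \log_a b$, $c_2' = c_2 \log_a b$, $n_0' = n_0$ (all positive because $\log_a b > 0$); we get $f(n) = \Theta(\log_b n)$

base does not matter in Θ(log n)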
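The change-of-base step in the proof can be illustrated numerically. In this sketch the bases a = 2 and b = 10 are arbitrary choices, and `log_ratio` is a hypothetical helper; the ratio log_a(n) / log_b(n) should equal the constant log_a(b) for every n > 1, up to floating-point error.

```python
import math


def log_ratio(a, b, n):
    """Return log_a(n) / log_b(n), which should equal log_a(b) for any n > 1."""
    return math.log(n, a) / math.log(n, b)


for n in (10, 1000, 10**6):
    print(n, log_ratio(2, 10, n), math.log(10, 2))
```

Because the ratio is a constant independent of n, it can be absorbed into c1 and c2.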


Analysis of Sequential Search

SEQ-SEARCH(A, key)
for i = 1 to A.length
    // return when found
    if A[i] equals key
        return i
return NIL

• best case (i.e. key at 1): T(n) = Θ(1)
• worst case (i.e. return NIL): T(n) = Θ(n)
• average case with respect to uniform key ∈ A: E(T(n)) = Θ(n)

number of loop iterations: often the dominating cost
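The three cases can be observed by counting loop iterations. This is a sketch assuming 0-based Python lists and `None` in place of NIL; `seq_search` returning an (index, iterations) pair is a choice made here for illustration.

```python
def seq_search(a, key):
    """Return (index, iterations); index is None when key is absent (NIL)."""
    for iterations, value in enumerate(a, start=1):
        if value == key:
            return iterations - 1, iterations  # found after this many iterations
    return None, len(a)                        # scanned the whole list


a = list(range(10, 20))                        # n = 10 distinct keys
print(seq_search(a, 10))                       # best case: key at the front
print(seq_search(a, 99))                       # worst case: key absent
average = sum(seq_search(a, k)[1] for k in a) / len(a)
print(average)                                 # (n + 1) / 2 for a uniform key in a
```

With the key drawn uniformly from the list, the average iteration count is (n + 1) / 2, which is still linear in n.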


Analysis of Binary Search


BIN-SEARCH(A, key, ℓ, r)
while ℓ ≤ r
    m = floor((ℓ + r)/2)
    if A[m] equals key
        return m
    elseif A[m] > key
        r = m - 1 // cut out end
    elseif A[m] < key
        ℓ = m + 1 // cut out begin
return NIL

• best case (i.e. key at first m): T(n) = Θ(1)
• worst case (i.e. return NIL): because the range (r - ℓ + 1) is roughly halved in each while iteration, the number of iterations is roughly log2 n: T(n) = Θ(log n)

often care more about worst case, as mentioned
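The halving argument can be checked by counting while iterations. The sketch below assumes a sorted 0-based Python list, searches the whole list rather than taking ℓ and r as parameters, and returns `None` for NIL; `bin_search` returning an (index, iterations) pair is an illustrative choice.

```python
def bin_search(a, key):
    """Binary search on a sorted list; return (index, iterations)."""
    lo, hi = 0, len(a) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if a[mid] == key:
            return mid, iterations
        elif a[mid] > key:
            hi = mid - 1      # cut out the end
        else:
            lo = mid + 1      # cut out the beginning
    return None, iterations


a = list(range(1024))         # n = 1024
print(bin_search(a, 511))     # best case: key at the first midpoint
print(bin_search(a, -1))      # unsuccessful search
```

For an absent key the iteration count never exceeds floor(log2 n) + 1, which is 11 for n = 1024.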


Summary

Lecture 3: Analysis Tools


motivation
roughly quantify time or space complexity to
measure efficiency
cases of complexity analysis
often focus on worst-case with ‘rough’
notations
asymptotic notation
rough comparison of functions for large n
usage of asymptotic notation
describe f (n) (time, space) by simpler g(n)
