
The worst-case number of elementary steps of an algorithm is a precise function of its parameters. For instance, sorting n numbers with a particular sorting program has a worst-case number of elementary steps for each value of n, so it is a legitimate function.

It would be hard to figure out what this exact function is, as that would depend on the compiler and the computer you are running it on. The problem is even more difficult when we are talking about an algorithm in abstract terms, without yet giving a program that implements it or specifying the computer.

Big-O analysis, also known as asymptotic analysis, allows us to get into the ballpark. It is an analysis that groups large sets of functions into ballparks, and when two functions aren't in the same ballpark, it tells us which one is decisively faster growing. The ballparks are known as equivalence classes.

It's hard to figure out the exact number of steps that an algorithm takes in the worst case. If we try, we get an ugly polynomial, such as the one on page 27, and even then, the constants in the expression reflect engineering considerations that haven't been settled until we write out a full program and specify the compiler and the machine. It's often easy to figure out what equivalence class it falls into, however. When the running times of two algorithms for a problem are in different equivalence classes, this gives us a decisive reason for claiming that one algorithm is superior, independently of implementation details, at least for large instances of the problem.

To compare two functions f(n) and g(n), we will look at what happens to f(n)/g(n) as n goes to infinity. We can think of f(n) as trying to pull the ratio to infinity and g(n) as trying to pull it to zero.

We'll say that functions f(n) and g(n) are in the same equivalence class if, as n gets large, the ratio f(n)/g(n) settles down to a region of the y axis that lies between two positive constants, c1 ≤ c2. In class, I used the notation f(n) = g(n) to express this.

Example: f(n) = n^2, g(n) = 3n^2 − 7n + 5. The ratio f(n)/g(n) gets closer and closer to 1/3 as n gets large. For n = 100, f(n) = 10,000 and g(n) = 30,000 − 700 + 5 = 29,305. The ratio is about 0.34124. From then on out, it never gets any farther from 1/3. So we can pick, say, c1 = 0.1 and c2 = 5, and claim that for n > 100, f(n)/g(n) is always between c1 and c2. Therefore, f(n) = g(n). They are in the same equivalence class.
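You can watch the ratio settle numerically. Here is a quick Python check (not from the original notes; the sample values of n are arbitrary):

```python
# Observe f(n)/g(n) settling between two positive constants,
# for f(n) = n^2 and g(n) = 3n^2 - 7n + 5, the example above.
def f(n):
    return n * n

def g(n):
    return 3 * n * n - 7 * n + 5

for n in [10, 100, 1000, 10000]:
    print(n, f(n) / g(n))
# The ratio approaches 1/3, so c1 = 0.1 and c2 = 5 work for large n.
```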

We say that f(n) < g(n) if the limit of f(n)/g(n) as n goes to infinity is 0. That means that no matter how small a positive constant we choose for c1, the ratio eventually drops under c1 and stays under it, defeating any attempt to show that they are in the same equivalence class.

Example: f(n) = 100n, g(n) = n^3. Then the limit of f(n)/g(n) = 100/n^2 is 0. If we pick c1 to be 1/10,000, then for n > 1000, the ratio will lie below c1. No matter how small a positive constant we pick for c1, the ratio will eventually go below it and stay below it.

We say that f(n) > g(n) if the limit of f(n)/g(n) as n goes to infinity is infinity. That means that no matter how large a c2 we pick, the ratio will eventually go above c2 and stay there, defeating any effort to show they are in the same equivalence class.

Example: f(n) = n^3, g(n) = 100n. Notice that we have swapped the roles of f(n) and g(n) from the previous example, so, since f(n) was less than g(n) there, it should now be greater than g(n). The ratio is n^2/100. If we pick c2 to be something large, such as 10,000, then for n > 1000, the ratio is greater than 10,000. No matter how large a constant we pick for c2, it will suffer the same fate, defeating any attempt to show that they are in the same equivalence class.

There is a fourth case, which is none of the above three, illustrated by the exotic example of f(n) = n and g(n) = n^(1 + sin n). We could call this a forfeit. It's caused by one of the players in the contest behaving in an erratic way that we won't ever see in the worst-case performance of an algorithm. That is the last time in the course that we'll see the fourth case.

Notation

For historical reasons, we will use the following notation, which we have no choice but to memorize.

f(n) < g(n) is denoted f(n) = o(g(n)).

f(n) = g(n) is denoted f(n) = Θ(g(n)).

f(n) > g(n) is denoted f(n) = ω(g(n)).

f(n) ≤ g(n) is denoted f(n) = O(g(n)). This says, f(n) is not faster growing than g(n). It might be in the same equivalence class or it might be slower growing; we aren't taking a stand.

f(n) ≥ g(n) is denoted f(n) = Ω(g(n)). It might be in the same equivalence class or it might be faster growing.

Notice that f(n) = o(g(n)) if and only if g(n) = ω(f(n)), f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n)), and f(n) = O(g(n)) if and only if g(n) = Ω(f(n)).

Examples:

3n^2 − 5n + 24 = O(n^3);
3n^2 − 5n + 24 = Ω(n^2);
3n^2 − 5n + 24 = Θ(n^2);
3n^2 − 5n + 24 = o(n^3);
3n^2 − 5n + 24 ≠ ω(n^2);
3n^2 − 5n + 24 ≠ o(n^2).

The usual convention is to put a complicated function, which is often the worst-case running time of an algorithm, on the left, and a simple expression inside the O(), Ω(), Θ(), o(), or ω() expression.

Example: The worst-case running time of Mergesort, when run on a list of size n, is Θ(n log2 n).

The expression f(n) = Θ(g(n)) is the most specific of these relationships. Why not always use it? Sometimes, we can prove that f(n) = O(g(n)), but we don't know how to prove f(n) = Ω(g(n)), hence we can't claim f(n) = Θ(g(n)). Sometimes, we are only interested in showing upper (big-O) bounds on the running time of an algorithm, and we don't want to bother with showing lower (Ω()) bounds.

When comparing f(n) and g(n) in a class in algebra, you can change the functions in any way that doesn't change their value anywhere. That's because standard algebra is concerned with the exact values of things. There is no notion of ballparks. The algebraic identities tell you what you can get away with.

When comparing f(n) and g(n) in our big-O analysis, it is legitimate to transform them in any way that can't affect what equivalence class they fall into. This allows us a lot of tricks for simplifying them that wouldn't be allowed if we were doing an exact analysis. They would get you in trouble with your teacher in middle school, who would think you are lazy and sloppy. They make life really easy, yet they don't allow people to cheat when deriving a time bound.

Here are some examples. Convince yourself that they can't affect the outcome of the tug of war if we only apply them a constant number of times.

Increase f(n) by an additive constant. Same with g(n).

Multiply f(n) or g(n) by a constant that does not depend on n.

When f(n) has two terms, f1(n) and f2(n), and f2(n) = O(f1(n)), we can throw out the f2(n) term. Same with g(n). Example: if f(n) = 2n^3 + 3n^3 + 4n^2 + n + 5, we can throw out all but the first term.

Other key facts:

log_b n = Θ(log_c n) for all b, c > 1. This follows from the fourth identity of (3.15) on page 56: log_b n = log_c n / log_c b, and log_c b is a constant factor. This means that we don't need to specify the base of a log when we put it inside a big-O expression, big-Θ expression, little-o expression, etc.

n^k = o(c^n) for all k ≥ 0, c > 1. I will omit the proof.

log^k n = o(n^ε) for all k, ε > 0. It doesn't matter how large k is or how small ε is; the latter will eventually overtake the former and leave it in the dust.

Here is a proof: let x = log2 n. Then log^k n = x^k, and n^ε = (2^(log2 n))^ε = 2^(ε log2 n) = 2^(εx). (See the identities on pages 55-56.) Since x goes to infinity as n does, the outcome of the comparison between log^k n and n^ε will be the same as the outcome of the comparison between x^k and 2^(εx), and x^k = o(2^(εx)) by the previous fact, taking c = 2^ε > 1.
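A numerical sanity check (not from the notes; the particular k, ε, and values of n are arbitrary) shows the ratio heading to 0 even when k is large and ε is small:

```python
import math

# log(n)^k / n^eps for a large k and a small eps: the ratio
# eventually decreases toward 0, though it starts out huge.
k, eps = 5, 0.5
ratios = []
for n in [10**2, 10**6, 10**12, 10**24]:
    ratios.append(math.log2(n) ** k / n ** eps)
print(ratios)
```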

Summations

When a running time is expressed as a summation and we want to compare it to others, it is convenient to get it into closed form.

4.1 Geometric series

When f(n) = Σ_{i=0}^{n} r^i, where 0 < r and r ≠ 1, the summation is called a geometric series. The ratio of each pair of consecutive terms is a constant, r.

Here's a convenient trick for bounding a geometric series: we can throw out all but the largest term of the summation and put a big-Θ around it.

Example: 1 + 2 + 4 + 8 + 16 + 32 + 64 = 2 · 64 − 1, and 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 = 2 · 128 − 1. The summation is always just shy of two times the largest term. Throwing out all but the largest term, 2^n, of the summation Σ_{i=0}^{n} 2^i diminishes it by less than a factor of two, so it doesn't change its equivalence class. The summation is Θ(2^n).

Example: 1 + 1/2 + 1/4 + 1/8 + ... + 1/2^n < 2, so throwing out all but the largest term, 1, diminishes the summation by less than a factor of two. The summation is Θ(1).
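The factor-of-two claim is easy to check numerically; here is a small sketch (not from the notes; the sample values of n are arbitrary):

```python
# Check that sum_{i=0}^{n} 2^i stays within a factor of two of its
# largest term, 2^n.
for n in [5, 10, 20]:
    total = sum(2**i for i in range(n + 1))
    print(n, total / 2**n)  # always between 1 and 2
```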

Proof: We will first show, using mathematical induction, that f(n) = Σ_{i=0}^{n} r^i = (r^(n+1) − 1)/(r − 1). If n = 0, the sum is 1 and (r^(0+1) − 1)/(r − 1) = 1. Therefore, if the equality fails for some n, it must fail at a smallest n. Because n ≠ 0, n > 0. Now, Σ_{i=0}^{n} r^i = Σ_{i=0}^{n−1} r^i + r^n. Since n is the smallest value where the identity fails, Σ_{i=0}^{n−1} r^i = (r^((n−1)+1) − 1)/(r − 1) = (r^n − 1)/(r − 1). Therefore, f(n) = (r^n − 1)/(r − 1) + r^n = (r^n − 1 + r^(n+1) − r^n)/(r − 1) = (r^(n+1) − 1)/(r − 1), contradicting our definition of n as a place where the identity fails. It must be that there is no smallest value of n ≥ 0 where it fails, so it must be true for all n ≥ 0.

Now that we have our summation in closed form, let's simplify. Suppose r > 1. Let's mess with (r^(n+1) − 1)/(r − 1) in ways that are allowed in asymptotic analysis. We can multiply it by the constant (r − 1) to change it to r^(n+1) − 1. We can add the constant 1 to change it to r^(n+1). We can divide it by the constant r to change it to r^n, which is equal to the biggest term. If it ties with the biggest term after these changes, it must be that it tied initially with it. Therefore, Σ_{i=0}^{n} r^i = (r^(n+1) − 1)/(r − 1) = Θ(r^n), and the claim holds.

Suppose 0 < r < 1. Let's get both the numerator and denominator positive, and rewrite it as (1 − r^(n+1))/(1 − r). This gets nearer and nearer to 1/(1 − r) as n gets large, since r^(n+1) gets smaller and smaller. We can now multiply this by the constant (1 − r), and it approaches 1. Since it is Θ(1) after these steps, it must be that Σ_{i=0}^{n} r^i = (r^(n+1) − 1)/(r − 1) = Θ(1), and 1 is the largest term of the summation.
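The closed form can be verified directly against the summation; here is a quick check (not from the notes; the sample values of r and n are arbitrary, chosen to cover both the r > 1 and 0 < r < 1 cases):

```python
# Verify the closed form sum_{i=0}^{n} r^i = (r^(n+1) - 1)/(r - 1)
# against direct summation, for r on both sides of 1.
def geometric_sum(r, n):
    return sum(r**i for i in range(n + 1))

def closed_form(r, n):
    return (r**(n + 1) - 1) / (r - 1)

for r in [0.5, 3.0]:
    for n in [0, 1, 7]:
        assert abs(geometric_sum(r, n) - closed_form(r, n)) < 1e-9
print("closed form checks out")
```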

4.2 Arithmetic series

The summation f(n) = Σ_{i=1}^{n} i is not a geometric series, since the ratio of consecutive terms is not a constant r. The difference between consecutive terms is a constant. This is called an arithmetic series. We can't throw away all but the biggest term without changing its equivalence class.

Here's a proof that it's Θ(n^2). Our strategy is to show first that it's O(n^2) and then show that it's Ω(n^2). These two facts together show that it's Θ(n^2), just as i ≤ 5 and i ≥ 5 imply that i = 5.

To show it's O(n^2), we can transform it into a larger function, since this can only hurt our case. A skeptic won't object. If we can show that it is still O(n^2) after the transformation, then we've proved that the original is O(n^2). The trick is not to hurt our case so much that we change its equivalence class. Let's boost up all the terms of Σ_{i=1}^{n} i to n. We now have n terms, each of which is n, so our new sum is n^2 = O(n^2). We showed the bound in spite of hurting our case. It must be that the original is O(n^2).

To show it's Ω(n^2), we can transform it into a smaller function, since this can only hurt our case. Let's throw away the first n/2 terms. There are at least n/2 remaining terms, each of which is at least n/2. Then let's cut each of the remaining terms down to n/2. No term has been increased, so this can only hurt our case. What is left is at least (n/2) · (n/2) = n^2/4 = Ω(n^2).

We have shown that Σ_{i=1}^{n} i = Θ(n^2).
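The constants in the Θ(n^2) bound are visible numerically; here is a sketch (not from the notes; the sample values of n are arbitrary):

```python
# sum_{i=1}^{n} i compared with n^2: the ratio settles near 1/2,
# squarely between the 1/4 lower bound and the upper bound of 1
# argued above.
for n in [10, 100, 1000]:
    s = sum(range(1, n + 1))
    print(n, s / n**2)
```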

Another example: f(n) = Σ_{i=1}^{n} log2 i. The largest term is log2 n. For an upper bound, nobody will object if we change all the terms to log2 n. That could only hurt our case when trying to show an upper bound. The summation becomes Σ_{i=1}^{n} log2 n = n log2 n. Therefore, f(n) = O(n log n).

For a lower bound, suppose n is even; if n is odd, nobody would object if we throw out the last term, since n − 1 = Θ(n). (We can get away with a lot of simplifying steps when we're working toward a big-Ω bound.) Nobody will object if we throw out the terms that are smaller than log2(n/2). That could only hurt our efforts to still get our lower bound.

Our summation is now Σ_{i=n/2}^{n} log2 i. The smallest of the terms in this expression is log2(n/2) = (log2 n) − 1. Nobody will object if we change all of the terms to be equal to the smallest. This gives Σ_{i=n/2}^{n} ((log2 n) − 1) = (n/2 + 1)((log2 n) − 1) = Ω(n log n).

We have shown that f(n) = O(n log n) and f(n) = Ω(n log n), so f(n) = Θ(n log n).
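Again, the bound can be eyeballed numerically; here is a sketch (not from the notes; the sample values of n are arbitrary):

```python
import math

# sum_{i=1}^{n} log2(i) compared with n*log2(n): the ratio stays
# between positive constants, as the Theta(n log n) bound predicts.
for n in [10, 100, 10000]:
    s = sum(math.log2(i) for i in range(1, n + 1))
    print(n, s / (n * math.log2(n)))
```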

4.3 The harmonic series

The summation Σ_{i=1}^{n} 1/i is called the harmonic series. It turns out that this one is Θ(log n).

Here's a proof. Let's divide the terms into buckets, where the first term goes into the first bucket, the next two terms go into the second bucket, the next four terms go into the third, the next eight go into the fourth, and so on. That is, the first bucket is {1}, the second is {1/2, 1/3}, the third is {1/4, 1/5, 1/6, 1/7}, etc. We'll end up with O(log n) buckets, since we can double at most log2 n times before reaching n.

To show that the sum is O(log n), let's observe that the largest element of the first bucket is at most 1, the largest element of the next bucket is 1/2, the largest of the next is 1/4, etc. Let's replace all the elements in each bucket with the biggest element in the bucket. This can only hurt our case. Now the elements in the first bucket sum to 1 · 1 = 1, the elements in the second bucket sum to 2 · 1/2 = 1, the elements in the third sum to 4 · 1/4 = 1, etc. There are O(log n) buckets, each of which sums to at most 1, so our terms all sum to O(log n).

Let's now show that it is Ω(log n). If there is an incomplete bucket at the end, let's throw its terms out, since this can only hurt our case. It is easy to see that the number of complete buckets is at least (1/2) log2 n = Ω(log n). If the last element in a bucket is 1/i, let's replace all elements in the bucket with 1/(i + 1). This reduces the size of all elements in the bucket, so it can only hurt our case.

The first bucket now has one term equal to 1/2, so it sums to 1/2. The second bucket has two terms equal to 1/4, so it sums to 1/2. The third bucket has four terms equal to 1/8, so it sums to 1/2. The sum is at least 1/2 · Ω(log n) = Ω(log n).
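A quick numerical check (not from the notes; the sample values of n are arbitrary) shows the harmonic sum tracking log n to within constant factors:

```python
import math

# The harmonic sum H(n) = 1 + 1/2 + ... + 1/n compared with log2(n):
# the ratio settles near ln(2), about 0.69, between positive constants.
for n in [10, 1000, 100000]:
    h = sum(1 / i for i in range(1, n + 1))
    print(n, h / math.log2(n))
```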

A peek at Python

print("\n\nI'm starting.")

def InsertionSort(l):
    if len(l) == 1:
        return l
    else:
        l2 = InsertionSort(l[0:len(l)-1])
        t = l[len(l)-1]
        return [i for i in l2 if i <= t] + [t] + [i for i in l2 if i > t]

print("\n\nI'm exiting.")

Then, from the Linux command line, type ipython -i mySort.py. This will run the program, and the -i option tells Python to give you the command line after the program runs. The only thing the program does is print two output statements and define a method. It doesn't actually run the method; methods are defined with a command, def.

Type this at the Python command line: InsertionSort([5,3,6,4,2]). You will see that it returns a list that has the same elements, in sorted order.

Study the Python tutorial to find out about the non-Java-like statements you see. Find out

how lists can be represented with the square-bracket syntax. Find out about the len operator.

Find out what it means when l is a list and you write l[3:6]. Experiment by getting the Python

command line and doing the following:

In [2]: l = [6,9,2,1,3,4,7,8]

In [3]: l[3:6]

Out[3]: [1, 3, 4]

In [4]: l[0:2]

Out[4]: [6, 9]

In [5]: l2 = [100,150]

In [6]: l + l2

Out[6]: [6, 9, 2, 1, 3, 4, 7, 8, 100, 150]

In [7]: l

Out[7]: [6, 9, 2, 1, 3, 4, 7, 8]

In [8]: l2

Out[8]: [100, 150]


In [9]: [i for i in l if i >= 4]

Out[9]: [6, 9, 4, 7, 8]

In [10]: l

Out[10]: [6, 9, 2, 1, 3, 4, 7, 8]

Before long, you should study the entire tutorial, trying out all of the examples at the Python

prompt as you go through it.

Here are some facts about Python operations. Producing the concatenation of two lists takes time

proportional to the sum of sizes of the lists. Creating a sublist of those elements of a list that meet

a condition that takes O(1) time to evaluate takes time proportional to the length of the list. To

create a list of elements between two indices takes time proportional to the number of elements in

this sublist.

Let's think of a name for the worst-case running time of our Python method on a list of size n: we'll call it T(n). We would like to get a big-Θ bound for T(n).

If n > 1, it makes a recursive call on n − 1 elements. We don't know how long this takes, but we have a name for it: T(n − 1). It also does a fixed number k of operations that each take Θ(n) time. The total for these is kΘ(n) = Θ(n), since k is fixed. In the friendly world of big-O, we didn't even need to bother counting what k is!

If n == 1, it takes Θ(1) time.

We get the following recursive expression for the running time:

T(1) = Θ(1); T(n) = T(n − 1) + Θ(n).

Since we're doing big-Θ analysis, there is no harm in fudging this and just calling it the following, since it can only affect T() by a constant factor:

T(1) = 1; T(n) = T(n − 1) + n.

If n − 1 > 1, we can substitute T(n − 1) = T(n − 2) + (n − 1) into this expression, and get T(n) = T(n − 2) + (n − 1) + n.

If n − 2 > 1, we can substitute T(n − 2) = T(n − 3) + (n − 2) into this expression, and get T(n) = T(n − 3) + (n − 2) + (n − 1) + n.

Eventually, we will get down to T(1), and we can substitute in T(1) = 1. We get T(n) = 1 + 2 + 3 + ... + n. We've turned our recursive expression for the running time into a summation. It's still not in closed form, but it's progress.

We analyzed this summation above, and found out it was Θ(n^2). The running time of our Python program must be Θ(n^2).
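The expansion can be carried out mechanically; here is a sketch (not from the notes; the sample values of n are arbitrary) that evaluates the recurrence directly and compares it against n^2:

```python
# Evaluate T(1) = 1, T(n) = T(n-1) + n directly and compare with n^2;
# the ratio settles near 1/2, consistent with Theta(n^2).
def T(n):
    total = 1  # T(1) = 1
    for i in range(2, n + 1):
        total += i
    return total

for n in [10, 100, 1000]:
    print(n, T(n), T(n) / n**2)
```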

Study how the book derives a recurrence for mergesort. It's similar to what we have just done above.

The master theorem is a recipe that often works for getting a big-Θ bound on a recurrence of the form T(n) = aT(n/b) + f(n). The recurrence for mergesort is similar.

The master theorem says that if the ratio n^(log_b a)/f(n) is Ω(n^ε) for some ε > 0, then T(n) = Θ(n^(log_b a)). If the inverse of this ratio is Ω(n^ε), then T(n) = Θ(f(n)). If f(n) = Θ(n^(log_b a)), then T(n) = Θ(f(n) log n) = Θ(n^(log_b a) log n).

The second condition on case 3 of the master theorem applies to all recurrences we will see this semester. It is there as a disclaimer against skeptics who could otherwise show the theorem is false by tripping it up with an exotic f(n). I will explain what this is about in class.

Examples:

T(n) = 2T(n/2) + n. We're comparing n^(log2 2) = n^1 vs. f(n) = n. They're big-Θ of each other. T(n) = Θ(n log n).

T(n) = 4T(n/2) + n. We're comparing n^(log2 4) = n^2 vs. f(n) = n. The ratio is n^ε for ε = 1. T(n) = Θ(n^2).

T(n) = 2T(n/2) + n^(1/2). We're comparing n^1 vs. n^(1/2). The ratio is n^ε for ε = 1/2. T(n) = Θ(n).

T(n) = 2T(n/2) + n^2. We're comparing n^1 vs. n^2. The ratio is 1/n^1, so its inverse is n^ε for ε = 1. T(n) = Θ(n^2).

T(n) = 2T(n/2) + n log n. We're comparing n^1 vs. n log n. The ratio is 1/log n. However, since log n = o(n^ε) for all ε > 0, neither the ratio nor its inverse is Ω(n^ε), and the master theorem says you have to find another method. It punts. In this case, we have to expand the recurrence by iterative substitution, the way we did for the recurrence for insertion sort above.
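The master theorem's prediction can be sanity-checked by evaluating a recurrence directly; here is a sketch (not from the notes; the base case T(1) = 1 and the sample sizes are arbitrary) for the second example above:

```python
# Numerically check one master-theorem case: T(n) = 4T(n/2) + n with
# T(1) = 1 should be Theta(n^2). At powers of two, T(n)/n^2 settles
# near a constant (here it approaches 2).
def T(n):
    if n == 1:
        return 1
    return 4 * T(n // 2) + n

for n in [2**4, 2**8, 2**12]:
    print(n, T(n) / n**2)
```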
