
# Asymptotic notations


11/08/2012

I try to minimize things that might not be well known to my target audience, because I'm reasonably sure that when a lot of people reach some funky notation that they don't understand, they stop reading. Now, while I try to explain the good, the bad, and the ugly without using those notations as well, it's hard to realize this if you don't get to that point. ;-) The most important measures that I use are for asymptotic growth of algorithms. This article will attempt to explain the use of these notations without requiring you to have a degree in computer science. Because I'm simplifying reality, it should be noted that none of the information in this article will be completely accurate in a formal mathematical sense, so please refrain from emailing me with complaints that I'm not entirely correct. The information in this article is more than sufficient to know what I mean when I say O(N2) or Ω(Nlog2 N), and which one is better. And to be perfectly frank, once you get the basic concept, that's more than enough to live your life unless you have a burning desire to formally analyse and prove the performance properties of an algorithm. I don't really care about that, because I can get close enough with my simplified reality to know which algorithms to choose.

The very word “asymptotic” scares the bejeezus out of people because it sounds complicated. The definition doesn't serve to alleviate that fear. Something that's asymptotic relates to an asymptote, which is defined as “a line whose distance to a given curve tends toward zero”. That's damn near worthless, so let's say that something asymptotic refers to a limiting behaviour based on a single variable and a desired measure. For example, we can use the number of steps that it takes for an algorithm that works with N items to complete (aka. its time complexity) as the desired measure, and derive an asymptotic bound on the time complexity by increasing N toward infinity. In real person terms, we're just figuring out how much longer the algorithm takes when we add more items.
The most common way to go about this is to double the number of items and see how much longer the algorithm takes. Now, we could actually test this by writing the algorithm, profiling it to see how long it takes for N, then profiling it again after doubling N. The time difference is a rough estimate of the growth. This is called an empirical test. However, we can also do a theoretical test by measuring the steps that rely on the size of N and get a reasonably useful measure of how the time complexity grows. Because the steps that don't rely on N won't grow, we can remove them from the measure because at a certain point, they become so small as to be worthless. In other words, we pretend that they don't matter in all cases. This is the idea behind asymptotic notation. By removing the constants (variables that have a fixed but unknown value), we can focus on the part of the measure that grows and derive a simplified asymptotic bound on the algorithm. A common notation that removes constants is called Big O Notation, where the O means “order of” (there are variants that do something similar that we'll look at shortly). Let's look at an example:
```c
void f ( int a[], int n )
{
  int i;

  printf ( "N = %d\n", n );

  for ( i = 0; i < n; i++ )
    printf ( "%d ", a[i] );

  printf ( "\n" );
}
```

In this function, the only part that takes longer as the size of the array grows is the loop. Therefore, the two printf calls outside of the loop are said to have a constant time complexity, or O(1), as they don't rely on N. The loop itself has a number of steps equal to the size of the array, and the time complexity of the loop doubles as the size of the array doubles, so we can say that the loop has a linear time complexity, or O(N). The entire function f has a time complexity of 2 * O(1) + O(N), and because constants are removed, it's simplified to O(1) + O(N). Now, asymptotic notation also typically ignores the measures that grow more slowly, because eventually the measure that grows more quickly will dominate the time complexity as N moves toward infinity. So by ignoring the constant time complexity, which grows more slowly than the linear time complexity, we can simplify the asymptotic bound of the function to O(N). The conclusion is that f has linear time complexity.

Okay, but what does O really mean? Big O notation refers to the asymptotic upper bound, which means that it's a cap on how much the time complexity will grow. If we say that a function is O(1), then there's no growth and the function will always take a fixed amount of time to complete. If we say that a function is O(N), then if N doubles, the function's time complexity at most will double. It may be less, but never more. That's the upper bound of an algorithm, and it's the most common notation; you'll see it more often than any of the other notations.

Now, even though O notation is the most common, it's not always the most accurate measure. For example, let's say we have a sequential search of an unordered array where the items are randomly distributed, and we want both the average case growth and the worst case growth:

```c
int find ( int a[], int n, int x )
{
  int i;

  for ( i = 0; i < n; i++ ) {
    if ( a[i] == x )
      return 1;
  }

  return 0;
}
```

This algorithm is clearly O(N) because it only has one loop that relies on the size of the array, and the time complexity of the loop doubles as the size of the array doubles. However, that's the worst case upper bound. We know (because smarter people than I figured it out) that on average, only half of the array is searched before the item is found, due to the random distribution. So while the time complexity could reach O(N), it's usually less, even though we don't really know how much less.

Okay, how about a binary search instead of a sequential search? If the array is sorted, we can make the search a lot faster by splitting the array in half at each comparison and only searching the half where the item might be. That's common knowledge, but why is it faster? Here's the code for a binary search:

```c
int find ( int a[], int n, int x )
{
  int i = 0;

  while ( i < n ) {
    int mid = ( n + i ) / 2;

    if ( a[mid] < x )
      i = mid + 1;
    else if ( a[mid] > x )
      n = mid;
    else
      return 1;
  }

  return 0;
}
```

We can call this an O(N) algorithm and not be wrong, because the time complexity will never exceed O(N). But because the array is split in half each time, the number of steps is always going to be at most the base-2 logarithm of N, which is considerably less than N. So an even better choice would be to set the upper bound to log N, which is the upper limit that we know we're guaranteed never to cross. Therefore, a more accurate claim is that binary search is a logarithmic algorithm, or O(log2 N).

Okay, what if we want to know the lower bound for the binary search we just found the upper bound for? Sometimes we're interested not in an upper bound, but in a lower bound: what's the smallest time complexity that we can expect? There's a notation for the lower bound too, called Omega. Since a correct binary search is guaranteed to take log N steps to complete, we can say that the lower bound for binary search is Ω(log2 N).

Wait a second. In the best case scenario, the first item we look at would be the one we're looking for, and the search would effectively be O(1), so why is the lower bound Ω(log2 N)? Remember that we're only using one variable, the size of the array, to derive our measure. If we use other variables, such as the contents of the array and the item being searched for, we can easily say that the lower bound is O(1), because the best possible case is an immediate match. But lacking those extra variables, we can't make that assumption. Therefore, the longest time complexity possible, logarithmic, stands for both the upper and lower bounds.

Now we can have a very accurate bound on the time complexity of a binary search. The upper and lower bounds are the same! That's good, because there's a notation for the asymptotically tight bound too, called Theta. Since we know the O and Ω for binary search and they're the same, we can say that binary search is Θ(log2 N). O(log2 N) is still correct, but Θ(log2 N) is a much stronger claim.

Okay, what about a sorting algorithm? Let's start with selection sort. The algorithm is simple: find the largest item and move it to the back of the array. When you move an item to the back, decrease the size of the array so that you don't continually choose from the items that have already been selected:

```c
void jsw_selection ( int a[], int n )
{
  while ( --n > 0 ) {
    int i, max = n;

    for ( i = 0; i < n; i++ ) {
      if ( a[i] > a[max] )
        max = i;
    }

    if ( max != n )
      jsw_swap ( &a[n], &a[max] );
  }
}
```

This algorithm has two loops, one inside of the other. Both rely on the size of the array, so the algorithm is clearly O(N * N), more commonly shown as O(N2) and referred to as quadratic. The inner loop is O(N), and the outer loop that it's nested in is O(N). The fact that N decreases with each step of the outer loop is irrelevant unless you want a tight bound, and even then it's difficult to analyse. But that doesn't matter much, because the upper bound is really all we care about for an existing sorting algorithm. Because the selection part of a selection sort is Θ(N), selection sort is O(N2).

Let's look at a faster sort. The heap sort algorithm uses a tree based structure to make the selection process faster. By using a heap, where selection is O(1) and fixing the heap is Θ(log2 N), we can turn the whole process of selection into a Θ(log2 N) process:

```c
void jsw_do_heap ( int a[], int i, int n )
{
  int k = i * 2 + 1;
  int save = a[i];

  while ( k < n ) {
    if ( k + 1 < n && a[k] < a[k + 1] )
      ++k;

    if ( save >= a[k] )
      break;

    a[i] = a[k];
    i = k;
    k = i * 2 + 1;
  }

  a[i] = save;
}

void jsw_heapsort ( int a[], int n )
{
  int i = n / 2;

  while ( i-- > 0 )
    jsw_do_heap ( a, i, n );

  while ( --n > 0 ) {
    jsw_swap ( &a[0], &a[n] );
    jsw_do_heap ( a, 0, n );
  }
}
```

Because the heap is structured like a tree, jsw_do_heap is Θ(log2 N). The first loop in jsw_heapsort is O(N / 2), but because the second loop is O(N) and dominates the first, we can toss the complexity of the first loop. So we have an O(N) loop that calls a Θ(log2 N) function. We conclude that the upper bound of heap sort is O(Nlog2 N), which doesn't have a set descriptive name, but it's often shown as O(N * log2 N). However, because the lower bound of heap sort is also Ω(Nlog2 N), for the same reasons as binary search, we can safely say that heap sort has Θ(Nlog2 N) time complexity.

We've looked at the most common time complexities: O(1) for constant time, O(log2 N) for logarithmic time, O(N) for linear time, and O(N2) for quadratic time. Here are the upper bound time complexities, in order of growth from least to greatest, that you're most likely to see:

- O(1) - No growth
- O(log2 N) - Grows by one step when N doubles
- O(N) - Doubles when N doubles
- O(Nlog2 N) - Grows by the product of N and the logarithm of N; slightly more than doubles when N doubles
- O(N2) - Quadruples when N doubles
- O(N!) - A ridiculous factorial growth

Others exist, but you won't see them often. Hopefully this article clears up any confusion without delving too far into the theory. Unfortunately, algorithm analysis can only be simplified so much before it becomes useless, but I think I covered all of the foundations that would be useful in figuring out a basic time complexity for your own algorithms, as well as understanding many of the time complexities given for existing algorithms.
