
Mathematics Background

Arithmetic progressions: In an arithmetic progression, the difference between an element and its successor is the same as the difference between the element and its predecessor. The series is a, a + d, a + 2d, a + 3d, ... The sum of n terms = (n/2) * (first term + last term). Equivalently, the sum of n terms = (n/2) * [2 * first term + (n - 1) * common difference] = (n/2) * [2a + (n - 1)d].

Geometric progressions: There is a constant ratio between an element and its successor (it is the same as the ratio between an element and its predecessor). The series is a, ar, ar^2, ar^3, ...
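
As a quick sanity check of these formulas, here is a small Python sketch (the helper names ap_sum and gp_sum are ours) comparing the closed forms against brute-force summation; the GP closed form a(r^n - 1)/(r - 1), valid for r != 1, is the standard counterpart of the AP formula:

import math

def ap_sum(a, d, n):
    # Closed form: (n/2) * [2a + (n - 1)d]
    return n * (2 * a + (n - 1) * d) / 2

def gp_sum(a, r, n):
    # Closed form for r != 1: a * (r^n - 1) / (r - 1)
    return a * (r**n - 1) / (r - 1)

a, d, r, n = 3, 4, 2, 10
assert ap_sum(a, d, n) == sum(a + i * d for i in range(n))
assert gp_sum(a, r, n) == sum(a * r**i for i in range(n))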

Logarithms: The log functions grow slowly compared to linear functions. For fixed bases a and b, log_a(x) is a constant multiple of log_b(x), since log_a(x) = log_b(x) / log_b(a). Whenever lg is written, it means log base 2. Factorials: n! = 1 * 2 * 3 * ... * (n - 1) * n.

Growth of functions: The figure on the slide plots problem size on the x-axis (0 to 50) against resource usage on the y-axis (0 to 100) for a few mathematical functions. The point to observe is that log(n) grows more slowly than the other functions shown, namely n, n log(n), n^2 and 2^n. An exponential function like 2^n will ultimately overtake any polynomial function. From the graph we see that logarithmic functions grow the most slowly and exponential functions grow much faster. What are factorial functions, and what is their growth rate? Functions that grow at the rate of n! are called factorial functions. The growth rate of a factorial is so tremendous that it eventually becomes much greater than 2^n.
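
To see these growth rates concretely, a short Python sketch tabulating the functions discussed above for a few values of n (lg here is log base 2, following the convention stated earlier):

import math

print(f"{'n':>6} {'lg n':>8} {'n lg n':>10} {'n^2':>8} {'2^n':>12} {'n!':>14}")
for n in (5, 10, 20, 30):
    print(f"{n:>6} {math.log2(n):>8.1f} {n * math.log2(n):>10.1f} "
          f"{n**2:>8} {2**n:>12} {math.factorial(n):>14.2e}")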

Introduction to Algorithms

The etymology of the word "algorithm" dates back to the 8th century AD: the word is derived from the name of the Persian author Abu Jafar Mohammad ibn Musa al-Khwarizmi.

Abu Jafar Mohammad ibn Musa al-Khwarizmi was a great mathematician, born around 780 AD, who worked in Baghdad on algebra, geometry, and astronomy. His treatise on algebra, Hisab al-jabr w'al-muqabala, was the most famous and important of all of al-Khwarizmi's works; it is the title of this text that gives us the word "algebra".

What is an Algorithm?

An algorithm is defined as a finite set of instructions to accomplish a task. An algorithm has five properties:

Finiteness: An algorithm should end in a finite number of steps.
Definiteness: Every step of an algorithm should be clear and unambiguously defined.
Input: The input of an algorithm can either be given interactively by the user or generated internally.
Output: An algorithm should have at least one output.
Effectiveness: Every step in the algorithm should be basic enough to be understood and carried out using paper and pencil.

Pseudo Code
An algorithm is independent of any language or machine, whereas a program is dependent on both a language and a machine. Pseudo code fills the gap between the two.

Algorithms are developed during the design phase of software engineering. During the design phase, we first look at the problem, try to write the pseudo code, and then move towards the programming (implementation) phase. Pseudo code is a high-level description of the algorithm: it is less detailed than the program, it leaves out the implementation-level issues of the program, and it uses English-like language. Refer to the notebook for pseudo code conventions, or refer to page 5 of our prescribed text, Fundamentals of Computer Algorithms by Horowitz and Sahni.

Life Cycle of an Algorithm (refer to page 2 of the text): Design the algorithm, Validate the algorithm, Analyze the algorithm, Test the algorithm.

The life cycle of an algorithm consists of four phases: design, validate, analyze and test. (i) Design: Design techniques help in devising algorithms. Some techniques are divide and conquer, the greedy technique, dynamic programming, backtracking, branch and bound, etc. (ii) Validate: Once an algorithm is designed, it is necessary to check whether it gives the correct answer for all possible legal inputs. (iii) Analyze: Estimate the amount of time and space (considered the prime resources) required while executing the algorithm. (iv) Test: Test the algorithm for its correctness.

PERFORMANCE ANALYSIS

The primary resources available in a deterministic silicon computer are the CPU and primary memory. In this course we will focus on time (CPU utilization) and space (memory utilization). When an algorithm is designed it should be analyzed for the amount of these resources it consumes: while solving a problem, an algorithm that consumes more resources than the alternatives will not be considered in most cases.

Why performance? Since most software problems do not have a unique solution, we are always interested in finding the better solution. A better solution is judged by its performance. Performance measures include the time taken by the solution, the quality of the solution, the simplicity of the solution, etc. For any solution to a problem we would always ask: Is it feasible to use this solution; in other words, is it efficient enough to be used in practice? The efficiency measures we normally look for are time and space: How much time does this solution take? How much space (memory) does this solution occupy? The performance of a solution can be improved through better algorithm design, database design and transaction design, and by paying attention to end-user psychology. Continuous improvements in hardware and communication infrastructure also aid performance.

When a programmer designs an algorithm during the design phase of the software life cycle, he or she might not be able to implement it immediately, because programming comes in a later part of the life cycle. But there is a need to analyze the algorithm at that stage: it helps forecast how much time the algorithm will take, or how much primary memory it will occupy, once implemented. So analysis of algorithms becomes very important. The complexity of an algorithm represents the amount of resources required while executing it. There is always a tradeoff between time and space complexity: most problems that require more space will take less time to execute, and vice versa.

Example: Think of a GUI drop-down list box that displays a list of employees whose names begin with a specified sequence of characters. If the employee database is on a different machine, then there are two options. Option a: fire a SQL query and retrieve the relevant employee names each time the list is dropped down. Option b: keep the complete list of employees in memory and refer to it each time the list is dropped down. Which is the preferred option, and why? This example does not have a unique solution. It depends on various parameters, including: the number of employees, the transmission time from the database server to the client machine, the volume of data transmitted each time, the frequency of such requests, and the network bandwidth. Neither solution is simply better; the main point here is the tradeoff. Whenever we need better performance in terms of time taken, we could opt for option b, which however requires more memory. The converse is also true: when we want our solution to occupy less memory (space), we must compromise on the time taken. This tradeoff is called the space-time tradeoff, and it is a universal principle.

Analysis of Algorithms
There are two types of analysis:

A priori analysis: This is the theoretical estimation of the resources required. Here the efficiency of the algorithm is checked and, if possible, the logic of the algorithm is improved for efficiency. It is done before the algorithm is implemented on a machine, and so it is independent of any machine or software.

A posteriori analysis: This analysis is done after implementing the algorithm on a target machine. It is aimed at determining actual statistics about the algorithm's consumption of time and space (primary memory) when it is executed as a program.

E.g., an algorithm to check whether a number is prime or not:
Algo1: Divide the number n by every number from 2 to (n - 1) and check the remainder.
Algo2: Divide the number n by every number from 2 to n/2 and check the remainder.
Algo3: Divide the number n by every number from 2 to sqrt(n) and check the remainder.
Before implementing the algorithm in a programming language (a priori analysis), the best of the three algorithms is selected (Algo3 suits best if n is large). After implementing the algorithm in a programming language (a posteriori analysis), the performance is checked with the help of a profiler.

Algorithms can be analyzed along many dimensions: speed, accuracy, power consumption, and resiliency. Numerical algorithms have to be devised for adequate accuracy; only after sufficient accuracy is achieved can we look at speed. Speed itself has many dimensions: asymptotics, mean time, variance of the execution time, etc. Memory, or resource usage in general, is a dual metric. Embedded systems have to be power efficient, e.g. cell phones. Many systems, especially in banking and finance, are required to be fault tolerant, particularly against server failures, and such systems are generally geographically distributed; the resulting communication overhead can often be the dominant contribution to time.

Analysis based on the time taken to execute the algorithm is called the time complexity of the algorithm; analysis based on the memory required to execute it is called the space complexity of the algorithm.
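
Returning to the prime-checking example above, a minimal Python sketch of Algo3 (the function name is_prime is our choice); trial division only up to sqrt(n) suffices because any factor larger than sqrt(n) pairs with one smaller than sqrt(n):

import math

def is_prime(n):
    # Algo3: divide n by 2 .. sqrt(n) and check the remainder.
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print([k for k in range(2, 30) if is_prime(k)])  # 2, 3, 5, 7, 11, ...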

Space Complexity

The space needed by a program has the following components: 1) A fixed part that is independent of the characteristics of the inputs and outputs. It includes the instruction space and the data space. Instruction space: space needed to store the object code. Data space: space needed to store constants and variables. 2) A variable part that consists of the space needed by variables whose size depends on the particular problem instance. Environment stack space: space needed when functions are called; if a function fnA calls another function fnB, then the return address and all the local variables and formal parameters are to be stored. The space requirement S(P) of any algorithm may be written as S(P) = c + S_P(instance characteristics), where c is a constant. Refer to the problems dealt with in class, or refer to page 16 in the text.

Time Complexity

The time complexity of an algorithm is given by the number of steps taken by the algorithm to compute the function it was written for. Even though any specific instance may have several characteristics, we choose those characteristics which are important to us. Time complexity is therefore a varying factor that depends on the machine, the current load of the system, the compiler and other real-time factors. In principle it includes both compilation time and execution time, but compilation is done once whereas execution is done many times, so in most cases compilation time is not considered, only execution time.

a) Operation count is one way to estimate the time complexity. Example 1: Searching an array for the presence of an element; here the time complexity is estimated based on the number of search operations. Example 2: Finding the roots of a quadratic equation ax^2 + bx + c = 0.


The roots are (-b + sqrt(b^2 - 4ac)) / (2a) and (-b - sqrt(b^2 - 4ac)) / (2a). Here the number of operations can be reduced by computing the common subexpression sqrt(b^2 - 4ac) only once. The success of the operation-count method depends on identifying the exact operation(s) that contribute most to the time complexity. We could obtain an expression for t_P(n) as

t_P(n) = c_a ADD(n) + c_s SUB(n) + c_m MUL(n) + c_d DIV(n) + ...

where n denotes the instance characteristics and c_a, c_s, c_m, c_d denote the time needed for an addition, subtraction, multiplication and division respectively. Obtaining such an exact formula is a difficult task, because the time for an arithmetic operation depends on the numbers involved. Also, in a multi-user system the execution time depends on factors such as the system load, the number of other programs running on the computer, and the characteristics of those programs.

b) Step count is another way to estimate time complexity. A step is a meaningful segment of a program whose execution time is independent of the instance characteristics. Consider the code below, with the step count of each line shown on the right:

sum(array, n)                        steps
{                                    0
    tsum := 0;                       1
    for (i := 0; i < n; i++)         n + 1
        tsum := tsum + array[i];     n
    return tsum;                     1
}                                    0

Total number of steps: 2n + 3. Refer to page 25 for more examples.

Calculation of time complexity based on the nature of the algorithm: For many algorithms the time complexity does not depend only on the number of inputs or outputs or some other easily specified characteristic. Example: in a searching algorithm, if the element sought is the first one, then we find it within one step; but if the element is not present in the array, we must search the entire array to reach a conclusion. The analysis of an algorithm therefore also depends on the nature of the problem instance.
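
Returning to Example 2 on operation count: a small Python sketch that computes the common subexpression sqrt(b^2 - 4ac) only once and reuses it for both roots (assuming real roots, i.e. b^2 - 4ac >= 0):

import math

def quadratic_roots(a, b, c):
    # Compute the common subexpression sqrt(b^2 - 4ac) only once,
    # saving a repeated multiplication chain and a second square root.
    s = math.sqrt(b * b - 4 * a * c)
    return (-b + s) / (2 * a), (-b - s) / (2 * a)

print(quadratic_roots(1, -3, 2))  # (2.0, 1.0)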


Thus we have three kinds of analysis: worst case, average case and best case.

Worst case: Under what condition(s) does the algorithm, when executed, consume the maximum amount of resources? It is the maximum amount of resource the algorithm can consume for any value of the problem size.
Best case: Under what condition(s) does the algorithm, when executed, consume the minimum amount of resources?
Average case: This lies between the worst case and the best case, and it is probabilistic in nature. Average-case running times are calculated by first arriving at an understanding of the average nature of the input, and then performing a running-time analysis of the algorithm for this configuration. Average case analysis is usually done by considering every possibility to be equally likely.

Why worst case analysis? The goodness of an algorithm is most often expressed in terms of its worst-case running time: there is a need for a bound on one's pessimism, and everybody needs a guarantee. The worst case gives the maximum time an algorithm will take on a given input size, and worst-case times are comparatively easy to calculate. In critical systems we cannot rely on average or best case times. For many simple sorting algorithms, the worst case occurs when the input is in reverse order.

Determining an exact step count for the best, average or worst case is a very difficult task, because the notion of a step count is itself inexact. (For example, we count both the instructions x := y and x := y + z + (x/y) + 5(x/z) as one step, even though the latter involves more operations.) Hence we go for the order of magnitude: we treat an algorithm with running time an + b, where a and b are constants, as equivalent to another algorithm with running time cn + d, but faster than an algorithm with running time a1*n^2.

c) Order of magnitude

In calculating the order of magnitude, the lower-order terms are left out, as they are relatively insignificant. The assumptions in the example are made because we do not know on which machine the algorithm will be implemented, so we cannot say exactly how much time each statement will take; the exact time depends on the machine on which the algorithm runs. In the example, the approximation is made because for higher values of n the effect of the constant c is not significant. Thus, constants can be ignored.

In the example on the slide, the inner loop is executed m times and the outer loop n times.


Asymptotic notations for determining the order of magnitude of an algorithm

The limiting behavior of the complexity of a problem as the problem size increases is called its asymptotic complexity. The most common asymptotic notations are:
Big-oh notation: represents an upper bound on the resources required to solve a problem. It is written O.
Omega notation: represents a lower bound on the resources required to solve a problem. It is written Ω.

The goodness of an algorithm is usually expressed in terms of its worst-case running time. The worst-case running time of an algorithm is an upper bound on the execution time of that algorithm across problem sizes. An algorithm is said to have a worst-case running time of O(n^2) if its running time (execution time) is always bounded by a constant multiple of n^2, where n is the problem size. Goodness of an algorithm refers to efficiency or capability. An upper bound is also called an upper limit, the top of the range of possible values. E.g., when we consider the marks of a student out of 100, 100 is the upper bound: no student gets more than 100 marks.


When we compute the complexity of an algorithm, we consider the threshold problem size, i.e. n > n0, where n0 is the threshold problem size (break-even point) and n is the problem size, and we determine the upper bound of computation accordingly. In the graph on the slide, the dotted line (parallel to the y-axis) passing through the intersection of T(n) and f(n) marks the threshold problem size. The threshold problem size is taken into account in a priori analysis because the algorithm might have some assignment operations that cannot be neglected for lower problem sizes (i.e. for lower values of n).

Example: T(n) = (n + 1)^2, which is O(n^2). Take f(n) = n^2 and n0 = 1 (the threshold value), with c = (1 + 1)^2 = 4. Then there exist n0 and c such that T(n) <= c*f(n) for all n >= n0.

Definition of big-oh: f(n) = O(g(n)) iff there exist positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0.

Examples:
3n + 2 = O(n), since 3n + 2 <= 4n for all n >= 2.
10n^2 + 4n + 2 = O(n^2), since 10n^2 + 4n + 2 <= 11n^2 for all n >= 5.
3n + 2 is not O(1), and 10n^2 + 4n + 2 is not O(n).

Remarks: For the bound to be informative, g(n) should be as small as possible. Note that n = O(n^2) = O(n^2.5) = O(n^3) = O(2^n). O(1): constant, O(n): linear, O(n^2): quadratic, O(n^3): cubic, and O(2^n): exponential.
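
The witnesses (c, n0) in these examples can be spot-checked mechanically. A small Python sketch (the helper name witnesses_ok is ours; a finite check is evidence, not a proof):

def witnesses_ok(f, g, c, n0, upto=10_000):
    # Checks f(n) <= c * g(n) for n0 <= n < upto; catches bad constants quickly.
    return all(f(n) <= c * g(n) for n in range(n0, upto))

print(witnesses_ok(lambda n: 3 * n + 2, lambda n: n, 4, 2))                   # True
print(witnesses_ok(lambda n: 10 * n**2 + 4 * n + 2, lambda n: n**2, 11, 5))  # True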


Theta notation: If it can be proved that for two positive constants c1 and c2, T(n) lies between c1*f(n) and c2*f(n) for all n >= n0, then T(n) can be expressed as Θ(f(n)).

Omega notation: The function f(n) is a lower bound for T(n). This means that for any value of n (n >= n0), the time of computation T(n) of the algorithm is always above the graph of f(n); so f(n) serves as the lower bound for T(n).

Big-oh vs omega notation: Case (i): a project manager requires a maximum of 100 software engineers to finish the project on time. Case (ii): the project manager can start the project with a minimum of 50 software engineers, but cannot assure completion of the project on time. Case (i) is similar to big-oh notation, specifying an upper bound on the resources needed to do a task. Case (ii) is similar to omega notation, specifying a lower bound on the resources needed to do a task.

Problems
1) Consider two algorithms with time complexities f(n) = 1000n^2 and g(n) = (1/1000)n^3. Prove that g(n) != O(f(n)).
Solution: If g(n) = O(f(n)), then there exist positive constants c and n0 such that (1/1000)n^3 <= c * 1000n^2 for all n >= n0, i.e. n <= 10^6 * c = k. Therefore g(n) <= c*f(n) only when n <= k, a fixed constant, which cannot hold for all n >= n0. Hence g(n) != O(f(n)).

2) Prove that 3n + 4 != O(1).
Solution: Suppose 3n + 4 = O(1). Then there exist positive constants c and n0 such that 3n + 4 <= c * 1 for all n >= n0, i.e. 3n <= c - 4, so n <= (c - 4)/3 = k. This means 3n + 4 <= c only when n is less than some fixed constant, which cannot hold for all n >= n0. Hence the result.


Rule I: The leading coefficient of the highest power of n, all lower powers of n, and the constants are ignored in f(n). Example: T(n) = 100n^3 + 29n^2 + 19n. Representing the same in big-oh notation: T(n) = O(n^3). The constants and the slower-growing terms are ignored, as their growth rates are insignificant compared to the growth rate of the highest power. The following table highlights why we ignore the lower-order terms:

n         n^2            0.1n^2 + n + 100    n^2 + 2n + 5
10        100            120                 125
20        400            160                 445
50        2500           400                 2605
100       10000          1200                10205
1000      1000000        101100              1002005
10000     100000000      10010100            100020005
100000    10000000000    1000100100          10000200005

500n and n^2/10 meet at n = 5000 (the threshold). Unless the threshold is very high, we prefer the function with the lower growth rate.
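
A quick Python check of the threshold claim:

# 500n vs n^2/10: the quadratic term overtakes the linear one at n = 5000.
n = next(n for n in range(1, 10**6) if n * n / 10 > 500 * n)
print(n)  # 5001, i.e. the two functions meet at the threshold n = 5000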

Rule II: The time of execution of a for loop is the running time of all statements inside the loop multiplied by the number of iterations of the loop. Example:

for (i = 0 to n)
{
    x := x + 1;
    y := y + 1;
    x := x + y;
}

The for loop is executed n times, so the worst-case running time of the algorithm is T(n) = O(3n) = O(n).

Rule III: If an algorithm has a nested for loop, the analysis should start from the inner loop and move outwards towards the outer loop. Example:

for (j = 0 to m)
{
    for (i = 0 to n)
    {
        x := x + 1;
        y := y + 1;
        z := x + y;
    }
}

The worst-case running time of the inner loop is O(3n). The worst-case running time of the outer loop is O(m * 3n). The total running time is O(m * n).

Rule IV: The execution time of an if-else statement comprises the execution time for testing the condition plus the maximum of the execution times of the if branch and the else branch (whichever is larger). Example:

if (x > y)
{
    print("x is larger than y");
    print("x is the value to be selected");
    z := x;
    x := x + 1;
}
else
    print("x is smaller than y");

The execution time of this fragment is the execution time of testing (x > y) plus the execution time of the if branch, since the if branch takes longer than the else branch.
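
A runnable Python version of the Rule III example (variable names follow the pseudocode), counting the basic operations to confirm the O(m * n) behavior:

def nested(m, n):
    x = y = z = ops = 0
    for j in range(m):        # outer loop: m iterations
        for i in range(n):    # inner loop: n iterations each
            x += 1
            y += 1
            z = x + y
            ops += 3          # three statements per inner iteration
    return ops

print(nested(4, 5))  # 60 == 3 * m * n, hence O(m * n)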


Note: O(c) = O(1) for any constant c; for example, O(100) = O(1). For little-omega and little-oh, refer either to the notebook or to page 31 of the text.

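The insertion code itself appears only on the slide and is not reproduced in these notes; the following Python sketch is a plausible reconstruction (the name insert and the exact index convention are our assumptions), with the copy operations counted so the analysis below can be followed:

def insert(a, n, l, k):
    # a holds n elements with room for one more; shift a[l..n-1]
    # one place right, then copy k into position l.
    copies = 0
    for i in range(n, l, -1):   # step 2: up to n - l copy operations
        a[i] = a[i - 1]
        copies += 1
    a[l] = k                    # step 3: one more copy
    return copies + 1

a = [10, 20, 30, 40, None]
print(insert(a, 4, 1, 15), a)   # 4 copies -> [10, 15, 20, 30, 40]
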
The code above inserts a value k at position l of an array a. The basic operation here is the copy. Worst-case analysis: step 2 does n - 1 copies in the worst case, and step 3 does 1 copy, so the total number of copy operations is n - 1 + 1 = n. Hence the worst-case complexity of array insertion is O(n). Average-case analysis: with each insertion position equally likely, step 2 performs 1 copy with probability 1/n, 2 copies with probability 1/n, and so on up to n copies. Hence the average number of copies that step 2 performs is (1 + 2 + ... + n)/n = (n + 1)/2. Step 3 performs 1 more copy, so on average the array insertion performs ((n + 1)/2) + 1 copies. Hence the average-case complexity of array insertion is O(n). Best-case analysis: O(1), as only one insertion is done with no movements.


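Likewise, the deletion code is only on the slide; a plausible Python reconstruction (again, the name and index convention are our assumptions), counting copies:

def delete(a, n, i):
    # Shift a[i+1..n-1] one place left, overwriting a[i];
    # counts the copy operations the analysis below refers to.
    copies = 0
    for j in range(i, n - 1):   # step 2: n - 1 - i copy operations
        a[j] = a[j + 1]
        copies += 1
    return copies

a = [10, 20, 30, 40]
print(delete(a, 4, 1), a[:3])   # 2 copies -> [10, 30, 40]
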
The code above deletes the value at a given index i in an array a. The basic operation here is the copy. Worst-case analysis: step 2 does n - 1 copies in the worst case, so the total number of copy operations is n - 1. Hence the worst-case complexity is O(n). Average-case analysis: with each deletion position equally likely, step 2 performs 0 copies with probability 1/n, 1 copy with probability 1/n, and so on up to n - 1 copies. Hence the average number of copies that step 2 performs is (0 + 1 + ... + (n - 1))/n = (n - 1)/2, so on average the array deletion performs (n - 1)/2 copies. Hence the average-case complexity of array deletion is O(n). Best-case analysis: O(1), as only one deletion is done with no further movements.

Recurrence

A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs. A recurrence equation of the form T(n) = a*T(n/b) + f(n) arises when the given problem is divided into a subproblems, each of size n/b. Consider the example of computing the Fibonacci sequence using a recursive algorithm:

Algorithm fibrec(n)
{
    if (n < 2) then return n;
    else return fibrec(n - 1) + fibrec(n - 2);
}

Let T(n) be the time taken by a call on fibrec(n). If n >= 2, the work is spent in the two recursive calls, which take time T(n - 1) and T(n - 2); let h(n) denote the time for the addition of the values returned by the recursive calls. Therefore the recurrence equation is given by

T(n) = 1                                for n = 0 or n = 1
T(n) = T(n - 1) + T(n - 2) + h(n)       for n >= 2.

There are three methods to solve recurrence equations: the substitution method, the master theorem, and the recursion tree method. Master theorem: refer either to the class notes or to Introduction to Algorithms by Thomas Cormen. Recursion tree: refer to the notebook.
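
A direct Python transcription of fibrec, instrumented to count calls; the call count obeys the same recurrence, which is why the running time grows exponentially, like the Fibonacci numbers themselves:

calls = 0

def fibrec(n):
    # Recursive Fibonacci; T(n) = T(n-1) + T(n-2) + h(n) for n >= 2.
    global calls
    calls += 1
    if n < 2:
        return n
    return fibrec(n - 1) + fibrec(n - 2)

for n in (10, 20, 25):
    calls = 0
    print(n, fibrec(n), calls)   # the call count grows exponentially in n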
