
Mathematics Background

Arithmetic progressions:
In this series, the difference between an element and its successor is the same as the difference
between the element and its predecessor. So the series is a, a + d, a + 2d, a + 3d, …
Sum of n terms = (n/2) * (first term + last term)
Equivalently, sum of n terms = (n/2) * [2 * first term + (n - 1) * common difference]
= (n/2) * [2a + (n - 1)d]
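The closed form for the sum of an arithmetic progression can be checked against direct addition. The sketch below is only an illustration; the function names ap_sum_loop and ap_sum_formula are assumptions and do not come from the notes.

```python
def ap_sum_loop(a, d, n):
    """Sum the first n terms a, a+d, ..., a+(n-1)d by direct addition."""
    total = 0
    term = a
    for _ in range(n):
        total += term
        term += d
    return total


def ap_sum_formula(a, d, n):
    """Closed form (n/2) * [2a + (n-1)d].

    For integer a and d the product n * (2a + (n-1)d) is always even,
    so integer division keeps the result exact.
    """
    return n * (2 * a + (n - 1) * d) // 2


print(ap_sum_loop(3, 4, 10), ap_sum_formula(3, 4, 10))  # 210 210
```

The loop needs n additions, while the formula needs a fixed number of operations regardless of n.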
Geometric progressions:
There is a constant ratio between an element and its successor (it is the same as the ratio
between the element and its predecessor). So the series is a, ar, ar^2, ar^3, …

Logarithms: Log functions grow slowly compared to linear functions. log_a(x) is a constant
multiple of log_b(x) for fixed a and b. Whenever lg is written, it means log base 2.
Factorials: n! is defined as 1 * 2 * 3 * … * (n-1) * n
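The statement that logarithms to different bases differ only by a constant factor can be observed numerically. This is a small sketch using Python's standard math module; the helper name log_ratio is an assumption made for illustration.

```python
import math


def log_ratio(x, a, b):
    """Return log_a(x) / log_b(x) for x > 1 and bases a, b > 1."""
    return math.log(x, a) / math.log(x, b)


# The ratio does not depend on x: it always equals log_a(b), here log_2(10).
for x in (10, 1000, 10**6):
    print(x, round(log_ratio(x, 2, 10), 6), round(math.log2(10), 6))
```

Because the factor is a constant, the base of the logarithm does not matter when only the growth rate is of interest.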

• In the figure in the slide, the x-axis represents the problem size and the y-axis represents the
resources...
Growth of functions: The figure shows the growth of a few mathematical functions.
The x-axis varies from 0 to 50 and the y-axis varies from 0 to 100. The point to observe here
is that the growth rate of the function log(n) is smaller than that of the other functions,
namely n, n log(n), n^2 and 2^n. An exponential function like 2^n will ultimately overtake any
polynomial function. From the graph, we can see that logarithmic functions grow
slowly and exponential functions grow much faster. What are factorial
functions? What is their growth rate? Functions which grow at the rate of n! are called
factorial functions. The growth rate of a factorial function is tremendous; it is much greater
than that of 2^n.
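The ordering of these growth rates can be reproduced with a few lines of Python. This is only a sketch; the function name growth_row and the chosen problem sizes are assumptions, not values taken from the slide.

```python
import math


def growth_row(n):
    """Values of lg(n), n, n*lg(n), n^2, 2^n and n! for one problem size n >= 1."""
    return (math.log2(n), n, n * math.log2(n), n ** 2, 2 ** n, math.factorial(n))


for n in (1, 2, 4, 8, 16, 32):
    lg, lin, nlg, sq, ex, fact = growth_row(n)
    print(f"n={lin:2d} lg={lg:4.1f} n*lg={nlg:6.1f} n^2={sq:5d} 2^n={ex:11d} n!={fact:.3e}")
```

For small n the columns are close together; as n grows, 2^n leaves the polynomial columns behind and n! leaves 2^n behind.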
Introduction to Algorithms
The etymology of the word Algorithm dates back to the 8th Century AD. The word
Algorithm is derived from the name of the Persian author “Abu Jafar Mohammad ibn
Musa al Khowarizmi”.

Abu Jafar Mohammad ibn Musa al Khowarizmi was a great mathematician who was born
around 780 AD in Baghdad. He worked on algebra, geometry, and astronomy. His treatise
on algebra, Hisab al-jabr w'al-muqabala, was the most famous and important of all of
al-Khwarizmi's works. It is the title of this text that gives us the word "algebra".
What is an Algorithm?


An algorithm is defined as a “finite set of instructions to accomplish a task”. An algorithm
has five properties, as follows:
Finiteness: An algorithm should end in a finite number of steps.
Definiteness: Every step of an algorithm should be clear and unambiguously defined.
Input: The input of an algorithm can either be given interactively by the user or generated
internally.
Output: An algorithm should have at least one output.
Effectiveness: Every step in the algorithm should be basic enough to be understood and carried
out using paper and pencil.
Pseudo Code
• An algorithm is independent of any language or machine, whereas a program is dependent
on a language and machine. To fill the gap between these two, we need pseudo code.

Algorithms are developed during the design phase of software engineering. During the
design phase, we first look at the problem, try to write the pseudo code, and then move towards
the programming (implementation) phase.
Pseudo code:
• Is a high-level description of the algorithm
• Is less detailed than the program
• Does not reveal the design issues of the program
• Uses English-like language
Refer to the notebook for pseudo code conventions, or refer to page 5 of our
prescribed text “Fundamentals of Computer Algorithms” by Horowitz and Sahni.

Life Cycle of an Algorithm (refer to page 2 of the text)


• Design the Algorithm
• Validate the algorithm
• Analyze the Algorithm
• Test the Algorithm
The life cycle of an algorithm consists of four phases: Design, Validate, Analyze and Test.
(i) Design:
The design techniques help in devising the algorithms. Some techniques are Divide &
Conquer, Greedy Technique, Dynamic Programming, Backtracking, Branch and Bound, etc.
(ii) Validate: Once an algorithm is designed, it is necessary to check whether it gives the correct
answer for all possible legal inputs.
(iii) Analyze: Estimating the amount of time/space (which are considered to be the prime
resources) required while executing the algorithm.
(iv) Test: Testing the algorithm for its correctness.

PERFORMANCE ANALYSIS

The primary resources available in a deterministic silicon computer are the CPU and primary
memory. In this course we will focus on time (CPU utilization) and space
(memory utilization). When an algorithm is designed, it should be analyzed for the amount of
these resources it consumes. When solving a problem, an algorithm that consumes more
resources than the alternatives will, in most cases, not be considered.
Why Performance?
Since most software problems do not have a unique solution, we are always interested
in finding the better solution. A better solution is judged based on its performance. Some of
the performance measures include the time taken by the solution, the quality of the solution,
the simplicity of the solution, etc.
For any solution to a problem we would always ask the following questions:
“Is it feasible to use this solution?” In other words, is it efficient enough to be used in
practice? The efficiency measures we normally look at are time and space. How much
time does this solution take? How much space (memory) does this solution occupy?
The performance of a solution can be improved by improving the algorithm design,
database design and transaction design, and by paying attention to end-user psychology. Also,
continuous improvements in hardware and communication infrastructure aid in improving
the performance of a solution.

When a programmer builds an algorithm during the design phase of the software life cycle, he/she
might not be able to implement it immediately. This is because programming comes in a later
part of the software life cycle. But there is a need to analyze the algorithm at that stage. This
helps in forecasting how much time the algorithm will take or how much primary memory it
might occupy when it is implemented. So analysis of algorithms becomes very important.
The complexity of an algorithm represents the amount of resources required while executing the
algorithm. There is often a tradeoff between time and space complexity: many solutions
that use more space take less time to execute, and vice versa.
Example: Think of a GUI drop-down list box that displays a list of employees whose names
begin with a specified sequence of characters. If the employee database is on a different
machine, then there are two options:
Option a: issue an SQL query and retrieve the relevant employee names each time the list is
dropped down.
Option b: keep the complete list of employees in memory and refer to it each time the list is
dropped down.
Which is the preferred option and why?
This example does not have a unique solution. It depends on various parameters, which
include:
• The number of employees
• The transmission time from the database server to the client machine
• The volume of data transmitted each time
• The frequency of such requests
• The network bandwidth
Neither option is always the better one. The main point here is the tradeoff. Whenever we
need better performance in terms of time taken, we could opt for option b, which
however leads to higher memory requirements. The converse is also true: when we
want our solution to occupy less memory (space), we need to compromise on
efficiency in terms of time taken. This is called the space-time tradeoff, which is
a general principle.

Analysis of Algorithms
There are two types of Analysis:

Priori Analysis:
This is the theoretical estimation of the resources required. Here the efficiency of the algorithm
is checked. If possible, the logic of the algorithm can be improved for efficiency. This is done
before the implementation of the algorithm on a machine, and so it is independent of any
machine/software.
Posteriori Analysis:
This analysis is done after implementing the algorithm on a target machine. It is aimed at
determining actual statistics about the algorithm’s consumption of time and space
(primary memory) in the computer when it is being executed as a program.
E.g. an algorithm to check whether a number n is prime or not.
Algo1: Divide n by every integer from 2 to (n-1) and check the remainder
Algo2: Divide n by every integer from 2 to n/2 and check the remainder
Algo3: Divide n by every integer from 2 to sqrt(n) and check the remainder
Before implementing the algorithm in a programming language (Priori Analysis), the best
of the three algorithms is selected (Algo3 suits best if n is large).
After implementing the algorithm in a programming language (Posteriori Analysis), the
performance is checked with the help of a profiler.
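The difference between the three primality algorithms can be made concrete by counting the divisions each one performs. The following is a minimal sketch under the assumption that one remainder check counts as one operation; the function names and the test value 10007 (a prime) are illustrative choices, not part of the notes.

```python
import math


def is_prime_counting(n, limit):
    """Trial division of n by 2..limit; returns (is_prime, divisions performed)."""
    if n < 2:
        return False, 0
    divisions = 0
    for d in range(2, limit + 1):
        divisions += 1
        if n % d == 0:
            return False, divisions
    return True, divisions


def algo1(n):
    return is_prime_counting(n, n - 1)          # divisors 2 .. n-1


def algo2(n):
    return is_prime_counting(n, n // 2)         # divisors 2 .. n/2


def algo3(n):
    return is_prime_counting(n, math.isqrt(n))  # divisors 2 .. floor(sqrt(n))


n = 10007  # a prime, so every candidate divisor is tried
print(algo1(n), algo2(n), algo3(n))
```

For a prime input all three give the same answer, but the number of divisions drops from about n to about n/2 to about sqrt(n), which is the kind of comparison priori analysis makes before any profiling.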
Algorithms can be analyzed along many dimensions: speed, accuracy, power consumption, and
resiliency.
• Numerical algorithms have to be devised for adequate accuracy. Only after we get
sufficient accuracy can we look at speed.
• Speed has many dimensions: asymptotics, mean time, variance of the execution time, etc.
Memory, or in general resource usage, is a dual metric.
• Embedded systems have to be power efficient, e.g. cell phones.
• Many algorithms, especially in banking and finance, are required to be fault tolerant, especially
against server failures, etc. These systems are generally required to be geographically distributed.
The resulting communication overhead can often be the dominant contribution to time.

• Analysis based on the time taken to execute the algorithm is called the Time complexity of the
algorithm
• Analysis based on the memory required to execute the algorithm is called the Space
complexity of the algorithm

Space Complexity
The space needed by a program has the following components:
1) A fixed part that is independent of the characteristics of the inputs and outputs. It includes
the instruction space and the data space.
Instruction space: Space needed to store the object code.
Data space: Space needed to store constants & variables.
2) A variable part that consists of the space needed by the variables whose size is dependent
on the particular problem instance, together with the environment stack space.
Environment stack space: Space needed when functions are called. If the function fnA calls
another function fnB, then the return address and all the local variables and formal parameters
are to be stored.
The space requirement S(P) of any algorithm P may be written as S(P) = c + S_P(instance
characteristics), where c is a constant.
Refer to the problems that were dealt with in class, or refer to page 16 of the text.

Time Complexity
The time complexity of an algorithm is given by the number of steps taken by the algorithm
to compute the function it was written for. Even though any specific instance may have
several characteristics, we choose those characteristics which are important to us. The actual
running time is a varying factor which depends on the machine, the current load of the
system, the compiler and other real-time factors. The time taken by a program includes the
compilation time and the execution time, but compilation is done once whereas the program
may be executed many times. So in most cases the compilation time is not considered, only the
execution time.

a) Operation count is one way to estimate the Time Complexity.


• Example 1: Searching an array for the presence of an element. Here the time complexity is
estimated based on the number of search operations.
• Example 2: Finding the roots of a quadratic equation ax^2 + bx + c = 0
The roots are (-b + sqrt(b^2 - 4*a*c))/(2a) and (-b - sqrt(b^2 - 4*a*c))/(2a).
Here the number of operations can be reduced by computing the common expression
sqrt(b^2 - 4*a*c) only once.
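The saving from the common expression can be sketched as follows. This is an illustrative example only: the function name quadratic_roots is an assumption, and the code assumes a is non-zero and the discriminant is non-negative (real roots).

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0, computing sqrt(b^2 - 4ac) only once.

    Assumes a != 0 and a non-negative discriminant.
    """
    root_disc = math.sqrt(b * b - 4 * a * c)  # common subexpression, one sqrt call
    two_a = 2 * a                             # also shared by both roots
    return (-b + root_disc) / two_a, (-b - root_disc) / two_a


print(quadratic_roots(1, -3, 2))  # (2.0, 1.0)
```

Computing each root independently would take two square roots, four multiplications inside the discriminants and two more for 2a; sharing the intermediate values roughly halves that operation count.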
The success of this method (operation count) depends on identifying the exact
operation(s) that contribute most to the time complexity. So we could obtain an expression for
t_P(n) as t_P(n) = c_a*ADD(n) + c_s*SUB(n) + c_m*MUL(n) + c_d*DIV(n) + …, where n denotes the
instance characteristics; c_a, c_s, c_m, c_d denote the time needed for an addition, subtraction,
multiplication and division; and ADD, SUB, MUL, DIV give the number of operations of each kind.
Obtaining such an exact formula is a difficult task because the time for an arithmetic operation
depends on the numbers involved in it. Also, in a multi-user system the execution time
depends on factors such as the system load, the number of other programs running on the
computer and the characteristics of those programs.
b) Step count is another way to estimate time complexity.
A step is a meaningful segment of a program whose execution time is independent of the
instance characteristics.
Consider the code below:
Statement                        Total steps
sum(array, n)                    0
{                                0
  tsum := 0;                     1
  for (i := 0; i < n; i++)       n+1
    tsum := tsum + array[i];     n
  return tsum;                   1
}                                0
Total number of steps: 2n+3
Refer to page 25 of the text for more examples.
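The step count of 2n+3 can be confirmed by instrumenting the same routine. This is a sketch in Python rather than the pseudo code of the notes; the counter variable steps and the function name sum_with_steps are assumptions added for illustration.

```python
def sum_with_steps(array):
    """Return (sum, steps), charging one step per statement as in the step table."""
    n = len(array)
    steps = 0
    tsum = 0
    steps += 1          # tsum := 0
    i = 0
    while True:
        steps += 1      # loop test i < n, executed n + 1 times
        if not i < n:
            break
        tsum += array[i]
        steps += 1      # loop body, executed n times
        i += 1
    steps += 1          # return tsum
    return tsum, steps


print(sum_with_steps([4, 8, 15, 16, 23, 42]))  # (108, 15), and 2*6 + 3 = 15
```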
Calculation of time complexity based on the nature of the algorithm
For many algorithms the time complexity does not depend only on the number of inputs or
outputs or some other easily specified characteristic.
Example: In a searching algorithm, if the element being searched for is the first one, then we
find it within one step. Similarly, if the element is not present in the array, we need
to search the entire array in order to reach a conclusion. Therefore the analysis of the algorithm
also depends on the nature of the problem instance. Thus we have:

• Worst case analysis
• Average case analysis
• Best case analysis
Worst case:
The condition(s) under which the algorithm, when executed, consumes the maximum amount of
resources. It is the maximum amount of resources the algorithm can consume over all inputs of
a given problem size.
Best case:
The condition(s) under which the algorithm, when executed, consumes the minimum amount of
resources.
Average case:
This lies between the worst case and the best case. It is probabilistic in nature. Average-case
running times are calculated by first arriving at an understanding of the average nature of the
input, and then performing a running-time analysis of the algorithm for this configuration.
Average case analysis is often done by assuming that every possibility is equally likely to happen.
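The three cases can be illustrated with linear search by counting element comparisons. This is a minimal sketch: the function name linear_search is an assumption, and the average is taken over successful searches only, with every key assumed equally likely.

```python
def linear_search(array, key):
    """Return (index or -1, number of element comparisons)."""
    comparisons = 0
    for index, value in enumerate(array):
        comparisons += 1
        if value == key:
            return index, comparisons
    return -1, comparisons


data = list(range(1, 11))            # n = 10 distinct keys
best = linear_search(data, 1)[1]     # key is the first element
worst = linear_search(data, 99)[1]   # key is absent, whole array scanned
average = sum(linear_search(data, k)[1] for k in data) / len(data)
print(best, worst, average)          # 1 10 5.5
```

Under these assumptions the average is (1 + 2 + … + n)/n = (n+1)/2 comparisons, which lies between the best case of 1 and the worst case of n.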

Why Worst case analysis?


• The goodness of an algorithm is most often expressed in terms of its worst-case running
time.
• There is a need for a bound on one’s pessimism; everybody needs a guarantee. The worst case
is the maximum time an algorithm will take on a given input size.
• Worst-case times are easy to calculate.
• In the case of critical systems we cannot rely on average or best case times.
• For some sorting algorithms (for example insertion sort), the worst case occurs when the
inputs are in reverse order.

Determining an exact step count for the best/average/worst case is a very difficult task because the
notion of step count itself is inexact. (For example, we count both the
instructions x := y and x := y + z + (x/y) + 5 - (x/z) as 1 step even though the latter involves more
operations.) Hence we go for the order of magnitude. Here we treat an algorithm with running
time a*n + b, where a and b are constants, as the same as another algorithm with running time
c*n + d, but faster than an algorithm with running time a1*n^2.

c) Order of magnitude

In calculating the order of magnitude, the lower order terms are left out as they are relatively
insignificant. The assumptions in the example are made because we will not know on which
machine the algorithm is to be implemented. So we can’t exactly say how much time each
statement will take. The exact time depends on the machine on which the algorithm is run. In
the example the approximation is done because for higher values of ‘n’, the effect of ‘c’
(constant) will not be significant. Thus, constants can be ignored.

In the above example, the inner loop will be executed m times and the outer loop n times.

Asymptotic notations for determination of order of magnitude of an algorithm
The limiting behavior of the complexity of a problem as the problem size increases is called
asymptotic complexity.
The most common asymptotic notations are:
• ‘Big Oh’ notation:
It represents the upper bound of the resources required to solve a problem. It is represented by
‘O’.
• ‘Omega’ notation:
It represents the lower bound of the resources required to solve a problem. It is represented by
‘Ω’.
The goodness of an algorithm is usually expressed in terms of its worst-case running
time. The ‘worst-case running time’ of an algorithm is the ‘upper bound’ on the time of execution
of that algorithm for a given problem size. An algorithm is said to have a worst-case running
time of O(n^2) if its running time (execution time) is always bounded above by a constant
multiple of n^2 for all sufficiently large n, where n is the problem size.
Goodness of an algorithm refers to its efficiency or capability. An upper bound is also called the
upper limit or the maximum of a range of values. E.g. when we consider the marks of a student
out of 100, 100 is the upper bound and no student gets more than 100 marks.

When we compute the complexity of any algorithm, we consider only problem sizes beyond a
threshold, i.e. n > n0, where n0 is the threshold problem size (break-even point) and n is the
problem size. Accordingly we determine the upper bound of computation. In the graph, the dotted
line (parallel to the y-axis) passing through the intersection of T(n) and f(n) represents the
threshold problem size. The threshold problem size is taken into account in priori analysis
because the algorithm might have some assignment operations which can’t be neglected for a
lower problem size (i.e. for lower values of ‘n’).
Example:
T(n) = (n+1)^2, which is O(n^2).
f(n) = n^2
Let n0 = 1 (threshold value)
c = (1+1)^2 = 4
So there exist n0 and c such that T(n) <= c*f(n) for all n >= n0.
Definition of Big "oh"
– f(n) = O(g(n)) iff there exist positive constants c and n0 such that f(n) <=
c*g(n) for all n >= n0
Examples
– 3n + 2 = O(n) as 3n + 2 <= 4n for all n >= 2
– 10n^2 + 4n + 2 = O(n^2) as 10n^2 + 4n + 2 <= 11n^2 for all n >= 5
– 3n + 2 ≠ O(1), 10n^2 + 4n + 2 ≠ O(n)
Remarks
– For f(n) = O(g(n)) to be informative, g(n) should be the least upper bound. For example,
n = O(n^2), n = O(n^2.5), n = O(n^3) and n = O(2^n) are all true, but n = O(n) is the
tightest statement.
– O(1): constant, O(n): linear, O(n^2): quadratic, O(n^3): cubic, and O(2^n):
exponential
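The constants c and n0 in the Big "oh" examples can be sanity-checked numerically. The sketch below is only a finite check over a range of n, not a proof; the helper name holds_from and the upper limit n_max are assumptions made for illustration.

```python
def holds_from(f, g, c, n0, n_max=10_000):
    """Check f(n) <= c * g(n) for every integer n in [n0, n_max].

    A finite check can expose a wrong (c, n0) pair, but it does not prove the bound.
    """
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))


def quad(n):
    return 10 * n**2 + 4 * n + 2


print(holds_from(lambda n: 3 * n + 2, lambda n: n, c=4, n0=2))   # True
print(holds_from(quad, lambda n: n**2, c=11, n0=5))              # True
print(holds_from(quad, lambda n: n**2, c=11, n0=4))              # False: 178 > 176 at n = 4
```

The last line shows why the threshold matters: with c = 11 the inequality fails at n = 4 and holds only from n0 = 5 onwards.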

Theta notation:

If it can be proved that there exist positive constants c1, c2 and n0 such that T(n) lies between
c1*f(n) and c2*f(n) for all n >= n0, then T(n) can be expressed as θ(f(n)).

Omega notation:

The function f(n) is the lower bound for T(n). This means that for some positive constant c and
for every n >= n0, the time of computation of the algorithm T(n) is always on or above the graph
of c*f(n). So f(n) serves as the lower bound for T(n).

Big ‘Oh’ Vs Omega notations

Case (i): A project manager requires a maximum of 100 software engineers to finish the
project on time.
Case (ii): The project manager can start the project with a minimum of 50 software engineers
but cannot assure the completion of the project on time.
Case (i) is similar to Big Oh notation, specifying the upper bound of resources needed to do a
task.
Case (ii) is similar to Omega notation, specifying the lower bound of resources needed to do
a task.
Problems
1) Consider two algorithms with time complexities f(n) = 1000n^2 and g(n) = (1/1000)n^3. Prove
that g(n) ∉ O(f(n)).
– Suppose g(n) = O(f(n)). Then there exist positive constants c and n0 such that
(1/1000)n^3 <= c * 1000n^2 for all n >= n0, i.e. n <= 10^6 * c = k. So g(n) <= c*f(n) holds only
when n <= k, a fixed constant, and not for all n >= n0. This is a contradiction, hence
g(n) ∉ O(f(n)).
2) Prove that 3n + 4 ∉ O(1).
– Sol. Suppose 3n + 4 ∈ O(1). Then there exist positive constants c and n0 such that
3n + 4 <= c * 1 for all n >= n0, i.e. 3n <= c - 4,
so n <= (c - 4)/3 = k. This means that 3n + 4 <= c only when n is at most
some fixed constant k, and not for all n >= n0. This is a contradiction, hence the proof.

Asymptotic notation manipulations

Rule I
The leading coefficient of the highest power of ‘n’, all lower powers of ‘n’ and the constants
are ignored in f(n).
Example:
T(n) = 100n^3 + 29n^2 + 19n is represented in big Oh notation as T(n) = O(n^3).
The constants and the slower growing terms are ignored as their growth rates are
insignificant compared to the growth rate of the highest power. The following table
highlights why we ignore lower order terms.

n         n^2            0.1n^2 + n + 100    n^2 + 2n + 5
10        100            120                 125
20        400            160                 445
50        2500           400                 2605
100       10000          1200                10205
1000      1000000        101100              1002005
10000     100000000      10010100            100020005
100000    10000000000    1000100100          10000200005

500n and n^2/10 meet at n = 5000 (the threshold).
Unless the threshold is very high, we prefer the function with the lower growth rate.
ISTE STTP on Latest Advances in Algorithm Analysis and Design 2/ 2/ 2009 33

Rule II :
The time of execution of a ‘for loop’ is the ‘running time’ of all statements inside the ‘for
loop’ multiplied by number of iterations of the ‘for loop’.
Example:
for (i := 1 to n)
{
x := x + 1;
y := y + 1;
x := x + y;
}
The body of the for loop is executed n times and contains 3 statements. So the worst case
running time of the algorithm is T(n) = O(3*n) = O(n).
Rule III :
If we have a ‘nested for loop’ in an algorithm, the analysis of that algorithm should start
from the inner loop and move outwards towards the outer loop.
Example:
for (j := 1 to m) {
for (i := 1 to n) {
x := x + 1;
y := y + 1;
z := x + y;
}
}
The worst case running time of the inner loop is O(3*n). The worst case running time of the
outer loop is O(m*3*n). The total running time = O(m*n).
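The count behind Rule III can be checked by running the nested loop and tallying the statements executed. This is a Python sketch of the pseudo code above; the function name nested_loop_statements and the counter executed are assumptions added for illustration.

```python
def nested_loop_statements(m, n):
    """Count the simple statements executed by the nested loop of Rule III."""
    x = y = z = 0
    executed = 0
    for _ in range(m):          # outer loop: m iterations
        for _ in range(n):      # inner loop: n iterations per outer iteration
            x = x + 1
            y = y + 1
            z = x + y
            executed += 3       # three assignments in the loop body
    return executed


print(nested_loop_statements(4, 5))   # 60 = 3 * 4 * 5
```

The count is exactly 3*m*n; dropping the constant factor 3 gives O(m*n).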
Rule IV :
The execution time of an ‘if else statement’ in an algorithm comprises:
• The execution time for testing the condition
• The maximum of the execution times of the ‘if’ part and the ‘else’ part (whichever is
larger)
Example:
if (x > y) {
print(“x is larger than y”);
print(“x is the value to be selected”);
z := x;
x := x + 1;
}
else print(“x is not larger than y”);
The execution time of the program is the execution time of testing (x > y) plus the execution
time of the ‘if’ part, as the execution time of the ‘if’ part is more than that of the ‘else’ part.

Note
O(constant) = O(1).
For example, O(100) = O(1).
For little omega and little oh, refer to either the notebook or page 31 of the text book.

The code (shown in the slide) inserts a value k into position l in an array a. The basic operation
here is copy.
Worst Case Analysis: Step 2 does n-1 copies in the worst case. Step 3 does 1 copy. So the
total number of copy operations is n-1+1 = n. Hence the worst case complexity of array
insertion is O(n).
Average Case Analysis: On average, step 2 performs (n-1)/2 copies. This is derived as
follows: if every position is equally likely, step 2 performs j copies with probability 1/n for
each j = 0, 1, …, n-1. Hence the average number of copies that step 2 performs is
(0/n) + (1/n) + (2/n) + … + (n-1)/n = (n-1)/2. Step 3 always performs 1 copy. So on average
the array insertion performs ((n-1)/2) + 1 = (n+1)/2 copies. Hence the average case complexity
of array insertion is O(n).
Best case Analysis:
The best case complexity is O(1), as only one insertion is done with no movements.
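Since the insertion code itself is not reproduced in these notes, the following is a hypothetical reconstruction that counts copy operations. It assumes the array holds n-1 elements before the insertion and n afterwards, that "step 2" shifts elements one slot to the right and that "step 3" copies k into place; the function name insert_counting_copies is also an assumption.

```python
def insert_counting_copies(a, l, k):
    """Insert k at index l of list a; return (new list, number of copies).

    Hypothetical reconstruction: a holds n - 1 elements, the result holds n.
    Allocating the extra slot is not counted as a copy.
    """
    result = a + [None]                 # make room for one more element
    copies = 0
    for j in range(len(a), l, -1):      # "step 2": shift a[l..] one slot right
        result[j] = result[j - 1]
        copies += 1
    result[l] = k                       # "step 3": copy k into place
    copies += 1
    return result, copies


a = [10, 20, 30, 40]                     # n - 1 = 4, so n = 5
print(insert_counting_copies(a, 0, 5))   # ([5, 10, 20, 30, 40], 5): worst case
print(insert_counting_copies(a, 4, 50))  # ([10, 20, 30, 40, 50], 1): best case
```

Inserting at the front moves every existing element, while inserting at the end moves none, which is the gap between the worst and best cases.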

The code (shown in the slide) deletes the value k at a given index i in an array a. The basic
operation here is copy.
Worst Case Analysis: Step 2 does n-1 copies in the worst case. So the total number of copy
operations is n-1. Hence the worst case complexity is O(n).
Average Case Analysis: On average, step 2 performs (n-1)/2 copies. This is derived as
follows: if every index is equally likely, step 2 performs j copies with probability 1/n for
each j = 0, 1, …, n-1. Hence the average number of copies that step 2 performs is
(0/n) + (1/n) + (2/n) + … + (n-1)/n = (n-1)/2. So on average the array deletion performs
(n-1)/2 copies. Hence the average case complexity of array deletion is O(n).
Best case Analysis:
The best case complexity is O(1), as only one deletion is done with no further movements.

Recurrence
A recurrence is an equation or inequality that describes a function in terms of its value on
smaller inputs. For divide and conquer algorithms, the recurrence equation is typically of the
form T(n) = aT(n/b) + f(n) (i.e. the given problem is divided into a sub problems of size n/b).
Not every recurrence has this form. Consider the example of computing the
Fibonacci sequence using a recursive algorithm.
Algorithm fibrec(n)
{
if (n < 2) then return n
else return fibrec(n-1) + fibrec(n-2)
}
Let T(n) be the time taken by a call on fibrec(n). If n >= 2, the work is spent in the two
recursive calls, which take time T(n-1) and T(n-2). Let h(n) denote the time for the addition of
the values returned by the recursive calls. Therefore the recurrence equation is given by

T(n) = 1                          for n = 0 or n = 1
T(n) = T(n-1) + T(n-2) + h(n)     for n >= 2
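The growth described by this recurrence can be observed by counting the calls made by the recursive algorithm. This is a sketch in Python; passing a one-element list named counter to accumulate the call count is an illustrative device, not part of the original algorithm.

```python
def fibrec(n, counter):
    """Recursive Fibonacci; counter[0] accumulates the number of calls made."""
    counter[0] += 1
    if n < 2:
        return n
    return fibrec(n - 1, counter) + fibrec(n - 2, counter)


for n in (5, 10, 20):
    calls = [0]
    value = fibrec(n, calls)
    print(n, value, calls[0])
```

The number of calls C(n) satisfies C(n) = C(n-1) + C(n-2) + 1 with C(0) = C(1) = 1, which has the same shape as the recurrence for T(n) when h(n) is taken as a constant, so the running time grows about as fast as the Fibonacci numbers themselves.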
There are 3 methods to solve a recurrence equation: the substitution method, the master theorem
and the recursion tree.
Master Theorem
Refer to either the class notes or the book ‘Introduction to Algorithms’ by Thomas H. Cormen et al.
Recursion Tree: refer to the notebook.
