
DAAA ppt
by Dr. Preeti Bailke
Introduction to algorithm
 An algorithm is a high-level task in computer science
 A set of operations to solve a computational problem
 Calculation, data processing, and automated reasoning tasks
Introduction to algorithm
• Algorithm: a tool for solving a well-specified computational problem
• Problem statement: specifies in general terms the desired input/output relationship
• The algorithm describes a specific computational procedure for achieving that input/output relationship
Introduction to algorithm
Here is how we formally define the sorting problem:
• Input: A sequence of n numbers <a1, a2, …, an>
• Output: A permutation (reordering) <a1', a2', …, an'> of the input sequence such that a1' ≤ a2' ≤ … ≤ an'
Introduction to algorithm
• An algorithm is said to be correct if, for every input instance, it halts with the correct output
• A correct algorithm solves the given computational problem
• An incorrect algorithm might not halt at all on some input instances, or
• It might halt with an incorrect answer
Introduction to algorithm
• An algorithm can be specified in English, as computer pseudocode, or even as a hardware design
• The only requirement is that the specification must provide a precise description of the computational procedure to be followed
Practical applications of algorithms
• The Human Genome Project
Identifies all the 100,000 genes in human DNA
Determines sequences of the 3 billion chemical base pairs that make up human DNA
• The Internet finds good routes on which the data will travel
Practical applications of algorithms
• Search engines quickly find pages on which particular information resides
• E-commerce includes public-key cryptography and digital signatures
• Manufacturing and other commercial enterprises allocate scarce resources in the most beneficial way
Design of an algorithm
 Efficient method
 Expressed within a finite (min) amount of time and space
 Independent from any programming language
Algorithm Design
 To solve a problem, different approaches can be followed
 Efficient with respect to time consumption, OR memory efficient
 Tradeoff between time & space
Problem Development Steps
 Problem definition
 Development of a model
 Specification of an Algorithm
 Designing an Algorithm
 Checking the correctness of an Algorithm
 Analysis of an Algorithm
 Implementation of an Algorithm
 Program testing
 Documentation
Characteristics of Algorithms
 Unique name
 Explicitly defined set of inputs and outputs
 Well-ordered with unambiguous operations
 Halt in a finite amount of time
Pseudocode
 Removes ambiguity associated with plain text
 High-level description of an algorithm
 Independent of programming language
 Running time can be estimated
Theoretical analysis of algorithms
 Estimate the complexity function for arbitrarily large input
 Computational complexity theory
 Theoretical estimation of the resources an algorithm requires to solve a specific computational problem
 Amount of time and space resources
Theoretical Analysis of algorithms
 Efficiency or running time of an algorithm
 Function relating the input length to the number of steps, known as time complexity
 Volume of memory, known as space complexity
Time Complexity
It's a function describing the amount of time required to run an algorithm in terms of the size of the input.
"Time" can mean the number of memory accesses performed, the number of comparisons between integers, the number of times some inner loop is executed, or some other natural unit related to the amount of real time the algorithm will take.
Space Complexity
It's a function describing the amount of memory an algorithm takes in terms of the size of the input to the algorithm.
Space complexity is sometimes ignored because the space used is minimal and/or obvious; however, sometimes it becomes as important an issue as time.
Algorithm: Insertion-Sort
 Input: A list L of integers of length n
 Output: A sorted list L1 containing those integers present in L
 Step 1: Keep a sorted list L1 which starts off empty
 Step 2: Perform Step 3 for each element in the original list L
 Step 3: Insert it into the correct position in the sorted list L1.
 Step 4: Return the sorted list
 Step 5: Stop
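The five steps above can be sketched directly in Python. This is a literal translation that builds a new list L1 rather than sorting in place; the function name is mine, not from the slides.

```python
def insertion_sort(L):
    """Return a new sorted list L1 built by inserting each element of L
    into its correct position (Steps 1-5 above)."""
    L1 = []                                    # Step 1: sorted list, starts empty
    for x in L:                                # Step 2: for each element of L
        i = 0
        while i < len(L1) and L1[i] <= x:      # Step 3: find correct position
            i += 1
        L1.insert(i, x)                        # ...and insert there
    return L1                                  # Step 4: return the sorted list

print(insertion_sort([5, 2, 4, 6, 1, 3]))      # [1, 2, 3, 4, 5, 6]
```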
Algorithm: Insertion-Sort
Pseudocode: Insertion sort
Describes the high-level steps of the algorithm in a much more realistic manner
Time complexity of Insertion sort
• Consider a situation where we call insert and the value being inserted into a subarray is less than every element in the subarray.
• For example, if we're inserting 0 into the subarray [2, 3, 5, 7, 11], then every element in the subarray has to slide over one position to the right.
• So, in general, if we're inserting into a subarray with k elements, all k might have to slide over by one position.
Time complexity of Insertion sort
• Rather than counting exactly how many lines of code we need to test an element against a key and slide the element, let's agree that it's a constant number of lines; let's call that constant c.
• Therefore, it could take up to kc lines to insert into a subarray of k elements.
• Suppose that upon every call to insert, the value being inserted is less than every element in the subarray to its left.
Time complexity of Insertion sort
• When we call insert the first time, k = 1. The second time, k = 2. The third time, k = 3, and so on, up through the last time, when k = n-1.
• Therefore, the total time spent inserting into sorted subarrays is
c*1 + c*2 + c*3 + … + c*(n-1)
= c * (1 + 2 + 3 + … + (n-1))
Time complexity of Insertion sort
• As we know, the formula for adding 1, 2, 3, …, n is given by n(n+1)/2
• Using the above formula to calculate the sum up to n-1, we get (n-1)(n-1+1)/2 = n(n-1)/2
• Time = c * [n * (n-1) / 2] = c * [(n^2 – n) / 2]
• = cn^2/2 − cn/2
• Using big-Θ notation, we discard the low-order term cn/2 and the constant factors c and 1/2, getting the running time of insertion sort as Θ(n^2)
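The n(n−1)/2 count can be checked empirically. This sketch (function name is mine) instruments an in-place insertion sort to count element slides; on reverse-sorted input every insert slides all k earlier elements, so the total is exactly n(n−1)/2:

```python
def insertion_sort_slides(a):
    """In-place insertion sort; returns how many elements slid one position."""
    slides = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]        # slide one element to the right
            slides += 1
            i -= 1
        a[i + 1] = key
    return slides

n = 100
worst = list(range(n, 0, -1))      # reverse-sorted: the worst case
assert insertion_sort_slides(worst) == n * (n - 1) // 2
assert worst == sorted(worst)
```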
Need for Analysis
 How to choose a better algorithm for a particular problem
 A set of numbers can be sorted using different algorithms.
 The number of comparisons performed by one algorithm may differ from that of others for the same input.
 Hence, the time complexity of those algorithms may differ.
 Also, the memory space required by each algorithm is to be calculated
Need for Analysis: Merge vs Quick sort
Merge sort is more efficient and works faster than quick sort in case of larger array sizes or datasets.
Quick sort is more efficient and works faster than merge sort in case of smaller array sizes or datasets.
A recursive divide-and-conquer sorting algorithm would be the merge sort.
Compare the merge sort with another recursive sort solution, the quick sort
Need for Analysis: Bubble sort vs merge sort
Consider space complexity, as the program may run on a system where memory is limited but adequate space is available, or vice-versa
Bubble sort does not require additional memory, but merge sort requires additional space
Though the time complexity of bubble sort is higher compared to merge sort, we may need to apply bubble sort if the program needs to run in an environment where memory is very limited
Need for Analysis
By considering an algorithm for a specific problem, we can begin to develop pattern recognition so that similar types of problems can be solved with the help of this algorithm
The main concern of analysis of algorithms is the required time or performance
Need for Analysis: Worst, best, avg
Worst-case − The maximum number of steps taken on any instance of size a.
Best-case − The minimum number of steps taken on any instance of size a.
Average case − An average number of steps taken on any instance of size a.
Amortized − A sequence of operations applied to the input of size a, averaged over time.
Need for Analysis: Worst, best, avg
• When we say that an algorithm runs in time T(n), we mean that T(n) is an upper bound on the running time that holds for all inputs of size n. This is called worst-case analysis.
• The algorithm may very well take less time on some inputs of size n, but it doesn't matter.
• If an algorithm takes T(n) = c*n^2 + k steps on only a single input of each size n and only n steps on the rest, it is still a quadratic algorithm.
Need for Analysis: Worst, best, avg
• A popular alternative to worst-case analysis is average-case analysis.
• Here we do not bound the worst-case running time, but try to calculate the expected time spent on a randomly chosen input.
• This kind of analysis is generally harder, since it involves probabilistic arguments and often requires assumptions about the distribution of inputs that may be difficult to justify.
Need for Analysis: Worst, best, avg
• On the other hand, it can be more useful because sometimes the worst-case behavior of an algorithm is misleadingly bad.
• A good example of this is the popular quicksort algorithm, whose worst-case running time on an input sequence of length n is proportional to n^2 but whose expected running time is proportional to n log n.
Need for Analysis: Worst, best, avg
• The best-case analysis is bogus. Guaranteeing a lower bound on an algorithm doesn't provide any information,
• as in the worst case, an algorithm may take years to run.
• For some algorithms, all the cases are asymptotically the same, i.e., there are no worst and best cases.
• For example, Merge Sort does Θ(n log n) operations in all cases.
Need for Analysis: Worst, best, avg
• Most of the other sorting algorithms have worst and best cases.
• For example, in the typical implementation of Quick Sort (when the pivot is the corner element),
• the worst case occurs when the input array is already sorted, and the best case occurs when the pivot elements always divide the array into two halves.
• For insertion sort, the worst case occurs when the array is reverse sorted, and
• the best case occurs when the array is sorted in the same order as the output.
Theoretical analysis of algorithms
 The term "analysis of algorithms" was coined by Donald Knuth
 Estimate of complexity in the asymptotic sense
 Asymptotic: (of a function) approaching a given value as an expression containing a variable tends to infinity.
Asymptotic meaning
Suppose that we are interested in the properties of a function f(n) as n becomes very large.
If f(n) = n^2 + 3n, then as n becomes very large, the term 3n becomes insignificant compared to n^2.
The function f(n) is said to be "asymptotically equivalent to n^2, as n → ∞".
This is often written symbolically as f(n) ~ n^2, which is read as "f(n) is asymptotic to n^2".
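Numerically, f(n)/n^2 = 1 + 3/n, which approaches 1 as n grows; a quick check (illustrative sketch):

```python
def f(n):
    return n**2 + 3*n

# f(n)/n^2 = 1 + 3/n, so the ratio tends to 1 as n -> infinity
for n in [10, 1000, 10**6]:
    print(n, f(n) / n**2)

assert abs(f(10**6) / (10**6)**2 - 1) < 1e-5
```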
Asymptotic Analysis
The asymptotic behavior of a function f(n) (such as f(n) = c*n or f(n) = c*n^2, etc.) refers to the growth of f(n) as n gets large.
Ignore small values of n
Estimate how slow the program will be on large inputs
A good rule of thumb is that the slower the asymptotic growth rate, the better the algorithm. Though it's not always true.
Asymptotic Analysis
• A linear algorithm (f(n) = d*n + k) is always asymptotically better than a quadratic one (f(n) = c*n^2 + q).
• That is because for any given (positive) c, k, d, and q there is always some n at which the magnitude of c*n^2 + q overtakes d*n + k.
• For moderate values of n, the quadratic algorithm could very well take less time than the linear one, if c is significantly smaller than d and/or k is significantly smaller than q.
• However, the linear algorithm will always be better for sufficiently large inputs. Remember to THINK BIG when working with asymptotic rates of growth.
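The crossover can be found by brute force. The constants below are illustrative choices of mine (small c and q, large d and k) so that the quadratic starts out cheaper:

```python
def crossover(c=1, q=0, d=100, k=10**6):
    """Smallest n at which c*n^2 + q exceeds d*n + k (illustrative constants)."""
    n = 1
    while c * n * n + q <= d * n + k:
        n += 1
    return n

# Below this n the quadratic cost is smaller; above it the linear one wins.
print(crossover())   # 1052 with these constants
```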
Asymptotic Notations
Execution time of an algorithm depends on the instruction set, processor speed, disk I/O speed, etc. Hence, we estimate the efficiency of an algorithm asymptotically.
The time function of an algorithm is represented by T(n), where n is the input size.
Different types of asymptotic notations are used to represent the complexity of an algorithm.
Asymptotic Notations
The following asymptotic notations are used to calculate the running time complexity of an algorithm.
O − Big Oh
Ω − Big omega
θ − Big theta
o − Little Oh
ω − Little omega
Algorithm efficiency: Insertion sort vs Merge sort
• Different algorithms devised to solve the same problem often differ dramatically in their efficiency.
• These differences can be much more significant than differences due to hardware and software
• Time complexity of insertion sort?
• Time complexity of merge sort?
Algorithm efficiency: insertion sort vs merge sort
• insertion sort's running time: c1*n*n, and
• merge sort's running time: c2*n*lg n
• insertion sort has a factor of 'n' in its running time; merge sort has a factor of 'lg n', which is much smaller
• For example, when n = 1000, 'lg n' is approximately 10 (2^10 = 1024)
• and when n equals one million, 'lg n' is approximately only 20 (2^20 ≈ 10 lakh = 1 million)
Algorithm efficiency: insertion sort vs merge sort
• Log base 2 is also known as the binary logarithm
• The binary logarithm of x is the power to which the number 2 must be raised to obtain the value x.
• For example, the binary logarithm of 1 is 0, the binary logarithm of 2 is 1, and the binary logarithm of 4 is 2
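These binary-logarithm facts, and the slow growth of lg n quoted earlier, can be checked with Python's math.log2:

```python
import math

assert math.log2(1) == 0
assert math.log2(2) == 1
assert math.log2(4) == 2

# lg n grows very slowly compared to n:
print(round(math.log2(1000)))    # 10, since 2^10 = 1024
print(round(math.log2(10**6)))   # 20, since 2^20 = 1,048,576
```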
Algorithm efficiency: insertion sort vs merge sort
• Although insertion sort usually runs faster than merge sort for small input sizes,
• once the input size n becomes large enough, merge sort's advantage of lg n vs n will more than compensate for the difference in constant factors.
• No matter how much smaller c1 is than c2, there will always be a crossover point beyond which merge sort is faster.
Algorithm efficiency: insertion sort vs merge sort
• Faster computer A running insertion sort
• Slower computer B running merge sort
• They each must sort an array of 10 million numbers
• If the numbers are eight-byte integers, then the input occupies about ???
• Does it fit in the memory of even an inexpensive laptop computer?
Algorithm efficiency: insertion sort vs merge sort
• Input occupies about 80 megabytes,
• which fits in the memory of even an inexpensive laptop computer
Algorithm efficiency: insertion sort vs merge sort
• Suppose that computer A executes 10 billion instructions per second (faster than any single sequential computer at the time of this writing)
• and computer B executes only 10 million instructions per second
• so that computer A is ?? times faster than computer B in raw computing power
• [Note: 1 million = 10 lakh, 1 billion = 1000 millions]
Algorithm efficiency: insertion sort vs merge sort
Computer A is 1000 times faster than computer B in raw computing power.
Algorithm efficiency: insertion sort vs merge sort
• A CPU with a clock speed of 2 gigahertz (GHz) can carry out two thousand million (or two billion) cycles per second
• This laptop = 1.5 GHz
Algorithm efficiency: insertion sort vs merge sort
Suppose that the world's craftiest programmer codes insertion sort in machine language for computer A, and the resulting code requires 2n*n instructions to sort n numbers.
Suppose further that just an average programmer implements merge sort, using a high-level language with an inefficient compiler, with the resulting code taking 50n lg n instructions.
Which algorithm is efficient? Justify.
Algorithm efficiency: insertion sort vs merge sort
1) Insertion sort on comp A
Total numbers to sort = n = 10 million = 100 lakh = 10^7
Time = 2n*n = 2n^2
Numerator = 2 * (10^7)^2 instructions
Computer A speed: 10 billion instructions/s
Denominator = 10,000 million = 10^10 instructions/s
Algorithm efficiency: insertion sort vs merge sort
1) Insertion sort on comp A
N/D = 2 * (10^7)^2 / 10^10 = 20000 sec ≈ 5.5 hrs
Algorithm efficiency: insertion sort vs merge sort
2) Merge sort on comp B
Total numbers to sort = n = 10 million = 100 lakh = 10^7
Time = 50n lg n
Numerator = 50 * (10^7) * lg(10^7) instructions
Computer B speed: 10 million instructions/s
Denominator = 10^7 instructions/s
N/D = 50 * 23 ≈ 1150 sec (< 20 min)
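Both calculations can be reproduced in a few lines, using the slides' figures (2n^2 instructions at 10^10 instr/s for A; 50*n*lg n at 10^7 instr/s for B). Note lg(10^7) ≈ 23.25, so the exact figure is slightly above the slides' rounded 1150 s.

```python
import math

n = 10**7                                 # 10 million numbers

# Computer A: insertion sort, 2*n^2 instructions at 10^10 instructions/s
time_A = 2 * n**2 / 10**10
# Computer B: merge sort, 50*n*lg(n) instructions at 10^7 instructions/s
time_B = 50 * n * math.log2(n) / 10**7

print(f"A: {time_A:.0f} s ({time_A / 3600:.1f} h)")   # 20000 s, about 5.5 h
print(f"B: {time_B:.0f} s ({time_B / 60:.1f} min)")   # about 1163 s, under 20 min
```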
Algorithm efficiency: insertion sort vs merge sort
• By using an algorithm whose running time grows more slowly, even with a poor compiler, computer B runs more than 17 times faster than computer A!
• The advantage of merge sort is even more pronounced when we sort 100 million numbers
• where insertion sort takes more than 23 days, merge sort takes under four hours
• In general, as the problem size increases, so does the relative advantage of merge sort
Asymptotic Analysis
• Let f(x) = 6x^4 − 2x^3 + 5
• To simplify this function using O notation, we describe its growth rate as x approaches infinity.
• This function is the sum of three terms: 6x^4, −2x^3, and 5.
• Of these three terms, the one with the highest growth rate is the one with the largest exponent as a function of x, namely 6x^4.
Asymptotic Analysis
• Now one may apply the second rule: 6x^4 is a product of 6 and x^4, in which the first factor does not depend on x.
• Omitting this factor results in the simplified form x^4.
• Thus, we say that f(x) is a "big O" of x^4.
• Mathematically, we can write f(x) = O(x^4).
Asymptotic Analysis
• The study of the change in performance of the algorithm with the change in the order of the input size is defined as asymptotic analysis
• Asymptotic notations are the mathematical notations used to describe the running time of an algorithm when the input tends towards a particular value or a limiting value.
• For example: In bubble sort, when the input array is already sorted, the time taken by the algorithm is linear, i.e. the best case.
Asymptotic Analysis
• But, when the input array is in reverse condition, the algorithm takes the maximum time (quadratic) to sort the elements, i.e. the worst case.
• When the input array is neither sorted nor in reverse order, then it takes average time. These durations are denoted using asymptotic notations.
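The linear best case assumes the early-exit variant of bubble sort (stop after a pass with no swaps); a sketch counting comparisons, with names of my choosing:

```python
def bubble_comparisons(a):
    """Early-exit bubble sort; returns the number of comparisons made."""
    a = list(a)
    count = 0
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            count += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:            # a full pass with no swaps: already sorted
            break
    return count

n = 50
assert bubble_comparisons(range(n)) == n - 1                  # best: linear
assert bubble_comparisons(range(n, 0, -1)) == n * (n - 1) // 2  # worst: quadratic
```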
Asymptotic Analysis
• Suppose that an algorithm, running on an input of size n, takes 6n^2 + 100n + 300 machine instructions.
• The 6n^2 term becomes larger than the remaining terms, 100n & 300, once n becomes large enough, 20 in this case.
• Here's a chart showing values of 6n^2 + 100n + 300 for values of n from 0 to 100
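The claimed threshold of n = 20 can be verified directly (a minimal sketch; the function name is mine):

```python
def dominant_from(limit=1000):
    """First n at which the 6n^2 term exceeds the remaining 100n + 300."""
    return next(n for n in range(1, limit) if 6 * n * n > 100 * n + 300)

print(dominant_from())   # 20: 6*20^2 = 2400 > 100*20 + 300 = 2300
```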
Asymptotic Analysis
[Chart omitted: values of 6n^2 + 100n + 300 for n = 0 to 100]
Asymptotic Analysis
• We would say that the running time of this algorithm grows as n^2, dropping the coefficient 6 and the remaining terms 100n + 300.
• It doesn't really matter what coefficients we use; as long as the running time is an^2 + bn + c, for some numbers a > 0, b, and c, there will always be a value of n for which an^2 is greater than bn + c, and this difference increases as n increases.
Asymptotic Analysis
• For example, here's a chart showing values of 0.6n^2 + 1000n + 3000, so that we've reduced the coefficient of n^2 by a factor of 10 and increased the other two constants by a factor of 10
• The value of n at which 0.6n^2 becomes greater than 1000n + 3000 has increased, but there will always be such a crossover point, no matter what the constants.
Asymptotic Analysis
[Chart omitted: values of 0.6n^2 + 1000n + 3000]
Asymptotic Analysis
• By dropping the less significant terms and the constant coefficients, we can focus on the important part of an algorithm's running time, its rate of growth, without getting mired in details that complicate our understanding.
• When we drop the constant coefficients and the less significant terms, we use asymptotic notation.
Asymptotic Analysis
• By definition, f(n) is O(g(n)) if:
There exist constants k, N where k > 0, such that for all n > N:
• f(n) <= k * g(n)
• So to prove that f(x) = 4x^2 - 5x + 3 is O(x^2) we need to show that:
• There exist constants k, N where k > 0, such that for all x > N:
• f(x) <= k * g(x)
• 4x^2 - 5x + 3 <= k * x^2
Asymptotic Analysis
• The way we show that is by finding some k and some N that will work.
• The basic strategy is:
- break up f(x) into terms
- for each term, find some term with a coefficient * x^2 that is clearly equal to or larger than it
- this will show that f(x) <= the sum of the larger x^2 terms
- the coefficient for the sum of the x^2 terms will be our k
Asymptotic Analysis
• Explanation of the provided proof:
f(x) = 4x^2 - 5x + 3
A number is always <= its absolute value,
• e.g. -1 <= |-1| and 2 <= |2|,
so we can say that:
• f(x) <= |f(x)|
• f(x) <= |f(x)| = |4x^2 – 5x + 3|
• 4x^2 + 3 will always be positive, but -5x will be negative for x > 0
Asymptotic Analysis
• So we know that -5x is <= |-5x|, so we can say that:
• f(x) <= |4x^2| + |-5x| + |3|
• For x > 0, |4x^2| + |-5x| + |3| = 4x^2 + 5x + 3,
• so we can say that:
• f(x) <= 4x^2 + 5x + 3, for all x > 0
• Suppose x > 1. Multiply both sides by x to show that x^2 > x
Asymptotic Analysis
• So we can say x <= x^2.
• This lets us replace each of our x terms with x^2,
• so we can say that:
• f(x) <= 4x^2 + 5x^2 + 3x^2, for all x > 1
• 4x^2 + 5x^2 + 3x^2 = 12x^2, so we can say that:
• f(x) <= 12x^2, for all x > 1
• So our k = 12, and since we had to assume x > 1, we pick N = 1
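The conclusion f(x) <= 12x^2 for all x > 1 (k = 12, N = 1) can be spot-checked numerically; this is a sanity check over a finite range, not a substitute for the proof:

```python
def f(x):
    return 4 * x**2 - 5 * x + 3

k, N = 12, 1
# Spot-check the bound f(x) <= k * x^2 for many x > N
assert all(f(x) <= k * x * x for x in range(N + 1, 10**4))
print("bound holds for all checked x")
```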
Quick sort: Worst case
Quick sort: Avg. case
Quick sort: Avg. case
https://www.tutorialspoint.com/design_and_analysis_of_algorithms/design_and_analysis_of_algorithms_asymptotic_notations_apriori.htm