
Data Structures and Algorithms
Fundamental Concepts

Chapter 1
Fundamental Concepts

CONTENT
1.1. Algorithm and Complexity

1.2. Beginning example

1.3. Asymptotic Symbols

1.4. Pseudo Code

1.5. Algorithm analysis techniques


What is problem solving?

• Problem solving: the process of posing a problem and developing a computer program to solve the problem.

• A problem solution includes:
  • Algorithm: a sequence of steps that need to be taken to produce the output of the problem from the input data in finite time.
  • Data structure: how to organize input and output data storage.


Algorithm notation
Definition: An algorithm is a deterministic procedure consisting of a finite sequence of steps that must be performed to obtain an output for a given input of the problem.

For example:
• Cooking instructions
• Instructions for installing a device
• The rules of a game
• Directions from A to B
• Motorcycle repair instructions
• etc.

Algorithm notation
• Algorithm has the following characteristics:

• Input

• Output

• Precision

• Finiteness

• Uniqueness

• Generality
Data structure
• A set of data
• whose elements are related to one another in a way determined by the problem

• Representing a data structure in memory:
  • Internal storage
  • External storage

• Choosing the appropriate data structure and algorithm is very important:

Data structure + Algorithm = Program

Algorithm’s Complexity
• Evaluating the computational complexity of an algorithm means evaluating the amount of resources of all kinds that the algorithm requires.
• Commonly used resources:
  • Time
  • Memory
  • Bandwidth

• We are mainly interested in evaluating the time needed to execute the algorithm (the calculation time of the algorithm).
Algorithm computation time
Factors affecting calculation time:
- Computer
- Compiler
- Algorithm used
- Input data of the algorithm
• The values of the data affect the calculation time
• Usually, the size of the input data is the main factor that determines the computation time
  • For example: for the sorting problem → the number of elements to sort
  • For example: for the matrix multiplication problem → the total number of elements of the two matrices
Basic operation
• Definition. We call a basic operation an operation
that can be performed with time bounded by a
constant that does not depend on the data size.

• To calculate the algorithm's computation time: count the number of basic operations that must be performed (assignment, comparison, arithmetic operations, etc.).
Types of computation time
For a given input data size n,
• Best computation time:
  • the minimum time required to execute the algorithm over all input data sets of size n.
• Worst computation time:
  • the maximum time required to execute the algorithm over all input data sets of size n.
• Average computation time:
  • the average time required to execute the algorithm over a finite set of inputs of size n.
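As a concrete illustration not taken from the slides, a hypothetical linear search makes the three cases tangible: for the same size n, its cost depends on where (or whether) the searched value occurs.

/* Hypothetical illustration: linear search in an array of size n.
   Best case: x is at position 0 -> 1 comparison.
   Worst case: x is absent -> n comparisons.
   Average case (x present, uniformly at random): about n/2 comparisons. */
int linear_search(const int a[], int n, int x) {
    for (int i = 0; i < n; i++)
        if (a[i] == x)      /* basic operation: one comparison */
            return i;
    return -1;              /* not found */
}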
CONTENT
1.1. Algorithm and Complexity
1.2. Beginning example
1.3. Asymptotic Symbols
1.4. Pseudo Code
1.5. Algorithm analysis techniques
Beginning example
• Problem of finding the largest subsequence:
• Given a sequence of numbers
  a1, a2, …, an
• The sequence ai, ai+1, …, aj with 1 ≤ i ≤ j ≤ n is called a subsequence of the given sequence, and ∑_{k=i}^{j} a_k is called the weight of this subsequence.
• The problem is: find the maximum weight over all subsequences, that is, find the maximum value of ∑_{k=i}^{j} a_k.
• For simplicity, call the subsequence with the largest weight the largest subsequence.
• Ex: the given sequence is -2, 11, -4, 13, -5, 2. Largest weight?

→ the answer is 20 (the weight of the subsequence 11, -4, 13)


Direct algorithm

• Traverse all possible subsequences
  ai, ai+1, …, aj with 1 ≤ i ≤ j ≤ n
  and calculate the sum of each subsequence to find the largest one.

• The total number of possible subsequences of the given sequence is
  C(n,2) + n = n²/2 + n/2.

Direct algorithm
• Implementation
int maxSum = 0;
for (int i = 0; i < n; i++) {
for (int j = i; j < n; j++) {
int sum = 0;
for (int k = i; k <= j; k++)
sum += a[k];
if (sum > maxSum)
maxSum = sum;
}
}
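A minimal runnable wrapper around the code above, shown only as an illustration; the main function and the array initialisation are additions, using the example sequence from the previous slides.

#include <stdio.h>

int main(void) {
    int a[] = {-2, 11, -4, 13, -5, 2};   /* example sequence from the slides */
    int n = 6;
    int maxSum = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            int sum = 0;
            for (int k = i; k <= j; k++)   /* one addition per execution */
                sum += a[k];
            if (sum > maxSum)
                maxSum = sum;
        }
    }
    printf("Largest weight: %d\n", maxSum);   /* prints 20 */
    return 0;
}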
Direct algorithm
• Algorithm analysis: calculate the number of additions that must be performed, i.e. count how many times the line of code sum += a[k] is executed.
• Number of additions:

  ∑_{i=0}^{n-1} ∑_{j=i}^{n-1} (j − i + 1) = ∑_{i=0}^{n-1} (1 + 2 + … + (n − i)) = ∑_{i=0}^{n-1} (n − i)(n − i + 1)/2
  = (1/2) ∑_{k=1}^{n} k(k + 1) = (1/2) (∑_{k=1}^{n} k² + ∑_{k=1}^{n} k) = (1/2) (n(n + 1)(2n + 1)/6 + n(n + 1)/2)
  = n³/6 + n²/2 + n/3
Faster algorithm

• Note that the sum of the terms i to j can be obtained from the sum of the terms i to j−1 by one addition.

• We have:
  ∑_{k=i}^{j} a[k] = a[j] + ∑_{k=i}^{j-1} a[k]

• This observation allows the innermost "for" loop to be removed.
Faster algorithm (cont..1)

• Implementation:

int maxSum = a[0];
for (int i = 0; i < n; i++) {
    int sum = 0;
    for (int j = i; j < n; j++) {
        sum += a[j];
        if (sum > maxSum)
            maxSum = sum;
    }
}
Faster algorithm (cont..2)

• Algorithm analysis. Calculate the number of additions:

  ∑_{i=0}^{n-1} (n − i) = n + (n − 1) + … + 1 = n²/2 + n/2

• This number is exactly equal to the number of subsequences.
• => It seems that the resulting algorithm is very good, because each subsequence is considered exactly once.
Recursive algorithm

• Better algorithm!

• Divide and conquer technique:
  • Divide the problem to be solved into sub-problems of the same form
  • Solve each subproblem recursively
  • Combine the solutions of the subproblems to obtain the solution of the original problem.
Recursive algorithm
• Apply this technique to the problem of finding the maximum weight of subsequences:
  • Divide the given range into 2 ranges using the middle element,
  • obtaining 2 sequences of numbers with the length reduced by half
  • (referred to as the left and right subsequences).
Recursive algorithm
• To combine the solution, realize that only 1 of 3 cases can occur
respectively when the largest subsequence is located at:
• Left subsequence (left half)
• Right subsequence (right half)
• Start in the left half and end in the right half (middle).
• Denote the weight of the largest subsequence in
  • the left half by wL
  • the right half by wR
  • the middle by wM
• The weight to find is then max(wL, wR, wM).
Recursive algorithm
• Finding the weight of the largest subsequence in the left
half (wL) and right half (wR) can be done recursively.
• Find the weight wM of the largest subsequence starting
in the left half and ending in the right half:
• Calculate the weight of the largest subsequence in the left half
ending at the division point (wML) and
• Calculate the weight of the largest subsequence in the right
half starting at the division point (wMR).
• Then wM = wML + wMR.
Recursive algorithm
• m – division point of the left sequence, m+1 – division point of the right sequence

  a1, a2, …, am, am+1, am+2, …, an

• Calculate wML – the weight of the largest subsequence in the left half ending at am.
• Calculate wMR – the weight of the largest subsequence in the right half starting from am+1.
Recursive algorithm

• Calculate the weight of the largest subsequence in the left half (from a[i] to a[j]) ending at a[j]:

MaxLeft(a, i, j)
{
    maxSum = -∞; sum = 0;
    for (int k = j; k >= i; k--) {
        sum = sum + a[k];
        maxSum = max(sum, maxSum);
    }
    return maxSum;
}
Recursive algorithm
• Calculate the weight of the largest subsequence in the right half (from a[i] to a[j]) starting from a[i]:

MaxRight(a, i, j)
{
    maxSum = -∞; sum = 0;
    for (int k = i; k <= j; k++) {
        sum = sum + a[k];
        maxSum = max(sum, maxSum);
    }
    return maxSum;
}
Recursive algorithm

The recursive algorithm's outline can be described as follows:

MaxSub(a, i, j)
{
    if (i == j) return a[i];
    else {
        m = (i + j) / 2;
        wL = MaxSub(a, i, m);
        wR = MaxSub(a, m+1, j);
        wM = MaxLeft(a, i, m) + MaxRight(a, m+1, j);
        return max(wL, wR, wM);
    }
}
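A runnable C sketch of the whole divide-and-conquer algorithm, shown as an illustration only: it uses 0-based indexing, renames the routines to max_left, max_right and max_sub, and uses INT_MIN for the value written as -∞ above.

#include <stdio.h>
#include <limits.h>

/* Weight of the largest subsequence of a[i..j] that ends at a[j]. */
static int max_left(const int a[], int i, int j) {
    int maxSum = INT_MIN, sum = 0;
    for (int k = j; k >= i; k--) {
        sum += a[k];
        if (sum > maxSum) maxSum = sum;
    }
    return maxSum;
}

/* Weight of the largest subsequence of a[i..j] that starts at a[i]. */
static int max_right(const int a[], int i, int j) {
    int maxSum = INT_MIN, sum = 0;
    for (int k = i; k <= j; k++) {
        sum += a[k];
        if (sum > maxSum) maxSum = sum;
    }
    return maxSum;
}

static int max3(int x, int y, int z) {
    int m = x > y ? x : y;
    return m > z ? m : z;
}

/* Largest subsequence weight of a[i..j]. */
static int max_sub(const int a[], int i, int j) {
    if (i == j) return a[i];
    int m = (i + j) / 2;
    int wL = max_sub(a, i, m);
    int wR = max_sub(a, m + 1, j);
    int wM = max_left(a, i, m) + max_right(a, m + 1, j);
    return max3(wL, wR, wM);
}

int main(void) {
    int a[] = {-2, 11, -4, 13, -5, 2};   /* example sequence from the slides */
    printf("%d\n", max_sub(a, 0, 5));    /* prints 20 */
    return 0;
}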
Recursive algorithm
• Algorithm analysis:
• MaxLeft and MaxRight require n/2 + n/2 = n additions.
• Therefore, if we call T(n) the number of additions to be performed, we get the recursive formula:

  T(n) = 0                                    if n = 1
  T(n) = T(n/2) + T(n/2) + n = 2T(n/2) + n    if n > 1
Recursive algorithm

• We confirm that T(2k) = k.2k. We prove it by induction


• Inductive basis: If k=0 then T(20) = T(1) = 0 = 0.20.
• Inductive transfer: If k>0, suppose T(2k-1) = (k-1)2k-1 is
correct. Then
T(2k) = 2T(2k-1)+2k = 2(k-1).2k-1 + 2k = k.2k.
• Returning to the notation n, we have
T(n) = n log n .
• The results obtained are better than the second
algorithm!
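A small C check of this result, assuming only the recurrence T(n) = 2T(n/2) + n derived above: it compares T(n) with n·log2(n) for powers of two.

#include <stdio.h>

/* Number of additions of the divide-and-conquer algorithm,
   simulated directly from its recurrence T(n) = 2T(n/2) + n. */
static long T(long n) {
    if (n == 1) return 0;
    return 2 * T(n / 2) + n;
}

int main(void) {
    for (long n = 2, k = 1; n <= 1024; n *= 2, k++)
        printf("n=%4ld  T(n)=%6ld  n*log2(n)=%6ld\n", n, T(n), n * k);
    return 0;
}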
Calculation time
Time conversion table

• The following table is used to compare execution times for the common growth rates (the table itself is not reproduced here).
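A small C sketch, purely illustrative (the particular growth-rate functions and input sizes chosen here are an assumption), that prints such a comparison of operation counts:

#include <stdio.h>
#include <math.h>

/* Prints approximate operation counts for common growth rates,
   which is essentially what a time conversion table tabulates. */
int main(void) {
    double ns[] = {10, 100, 1000, 10000};
    printf("%8s %10s %12s %14s %16s\n", "n", "log2 n", "n log2 n", "n^2", "n^3");
    for (int i = 0; i < 4; i++) {
        double n = ns[i];
        printf("%8.0f %10.1f %12.0f %14.0f %16.0f\n",
               n, log2(n), n * log2(n), n * n, n * n * n);
    }
    return 0;
}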


Dynamic Programming algorithm

Algorithm development based on DP includes 3 stages:


1. Decomposition: Divide the problem to be solved into
smaller sub-problems of the same form as the original
problem.
2. Record solutions: Store the solutions of subproblems in a
table.
3. Synthesize solutions: from the solutions of smaller-sized sub-problems, build in turn the solution of the larger-sized problem, until the solution of the starting problem (which is the sub-problem with the largest size) is obtained.
Dynamic Programming algorithm

• Decomposition: let si be the weight of the largest subsequence in the sequence a1, a2, ..., ai, i = 1, 2, ..., n. Obviously sn is the value to find.
• Synthesize solutions.
• We have
  s1 = a1.
• Suppose i > 1 and sk is known for k = 1, 2, ..., i-1. We need to calculate si as the weight of the largest subsequence of the sequence
  a1, a2, ..., ai-1, ai.
Dynamic Programming algorithm
• Since the largest subsequence of this sequence either contains the element ai or does not contain it, it can only be one of 2 sequences:
  • the largest subsequence of a1, a2, ..., ai-1
  • the largest subsequence of a1, a2, ..., ai that ends at ai.
• From this we infer
  si = max {si-1, ei}, i = 2, …, n,
  where ei is the weight of the largest subsequence of a1, a2, ..., ai ending at ai.
• To calculate ei, use the following recursive formula:
  • e1 = a1;
  • ei = max {ai, ei-1 + ai}, i = 2, ..., n.
MaxSub(a)    (* dynamic programming algorithm *)
{
    smax = a[1];          (* smax – weight of the largest subsequence *)
    maxendhere = a[1];    (* maxendhere – weight of the largest subsequence ending at a[i] *)
    imax = 1;             (* imax – end position of the largest subsequence *)
    for i = 2 to n {
        u = maxendhere + a[i];
        v = a[i];
        if (u > v) maxendhere = u;
        else maxendhere = v;
        if (maxendhere > smax) then {
            smax = maxendhere;
            imax = i;
        }
    }
}
It is easy to see that the number of addition operations performed by the algorithm (the number of times the statement u = maxendhere + a[i]; is executed) is n − 1, i.e. about n.
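A runnable C sketch of the same dynamic-programming algorithm; the 0-based indexing, the main function and the example array are additions for illustration.

#include <stdio.h>

int main(void) {
    int a[] = {-2, 11, -4, 13, -5, 2};   /* example sequence from the slides */
    int n = 6;
    int smax = a[0];          /* weight of the largest subsequence seen so far */
    int maxendhere = a[0];    /* weight of the largest subsequence ending at a[i] */
    int imax = 0;             /* end position of the largest subsequence */
    for (int i = 1; i < n; i++) {
        int u = maxendhere + a[i];   /* the single addition counted in the analysis */
        int v = a[i];
        maxendhere = (u > v) ? u : v;
        if (maxendhere > smax) {
            smax = maxendhere;
            imax = i;
        }
    }
    printf("largest weight %d, ending at index %d\n", smax, imax);  /* 20, index 3 */
    return 0;
}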
CONTENT
1.1. Algorithm and Complexity
1.2. Beginning example
1.3. Asymptotic Symbols
1.4. Pseudo Code
1.5. Algorithm analysis techniques
Asymptotic Notation

Θ, Ω, O

• Used to describe the calculation time of an algorithm
• Instead of stating the exact calculation time, we say Θ(n²)
• Defined for functions taking non-negative integer values
• Used to compare the growth rates of two functions
Θ symbol
For a function g(n), the symbol Θ(g(n)) is the set of functions
Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that
  0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n), for every n ≥ n0}

g(n) is an asymptotically tight estimate for f(n)


Example
10n² − 3n = Θ(n²)?

• For what values of the constants n0, c1, and c2 is the inequality in the definition true?

• Taking c1 smaller than the coefficient of the term with the highest exponent, and taking c2 larger, we have
  n² ≤ 10n² − 3n ≤ 11n², for every n ≥ 1 (c1 = 1, c2 = 11, n0 = 1).

• For polynomial functions: to compare growth rates, you only need to look at the term with the highest exponent.
Symbol O (pronounced big O)
For a given function g(n), we denote by O(g(n)) the set of functions
O(g(n)) = {f(n): there exist positive constants c and n0 such that
  f(n) ≤ c·g(n), for every n ≥ n0}

We say g(n) is an asymptotic upper bound of f(n).


Symbol Ω
For a given function g(n), we denote by Ω(g(n)) the set of functions
Ω(g(n)) = {f(n): there exist positive constants c and n0 such that
  f(n) ≥ c·g(n), for every n ≥ n0}

We say g(n) is an asymptotic lower bound of f(n).


Relationship between Θ, Ω, O

For any two functions g(n) and f(n),
  f(n) = Θ(g(n))
if and only if
  f(n) = O(g(n)) and f(n) = Ω(g(n)).
This means
  Θ(g(n)) = O(g(n)) ∩ Ω(g(n))
How to use these symbols
• Saying "the running time of this algorithm is O(f(n))" means: the worst-case running time is O(f(n)).

• "The running time is Ω(f(n))" means: the best-case running time is Ω(f(n)).
Asymptotic notation in equalities

• Used to replace lower-order (slowly growing) terms in expressions
• Example
  4n³ + 3n² + 2n + 1 = 4n³ + 3n² + Θ(n)
  = 4n³ + Θ(n²) = Θ(n³)
• In such equations, Θ(f(n)) stands for some function g(n) ∈ Θ(f(n))
• In the example above, Θ(n²) stands for 3n² + 2n + 1
Graphs of some basic functions
Similarities between comparing functions and comparing numbers

  f vs. g  ↔  a vs. b

  f(n) = O(g(n))  ≈  a ≤ b
  f(n) = Ω(g(n))  ≈  a ≥ b
  f(n) = Θ(g(n))  ≈  a = b
Properties
• Transitivity
  f(n) = Θ(g(n)) & g(n) = Θ(h(n)) ⇒ f(n) = Θ(h(n))
  f(n) = O(g(n)) & g(n) = O(h(n)) ⇒ f(n) = O(h(n))
  f(n) = Ω(g(n)) & g(n) = Ω(h(n)) ⇒ f(n) = Ω(h(n))
• Symmetry
  f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n))
• Transpose symmetry
  f(n) = O(g(n)) if and only if g(n) = Ω(f(n))
Example
      A                 B
• 5n² + 100n       3n² + 2

• log3(n²)         log2(n³)
Recall some logarithmic functions

  x^a = b  ⇔  log_x b = a

  log(ab) = log a + log b

  log_a b = log_m b / log_m a

  log(a^b) = b·log a

  a^(log n) = n^(log a)

  log^b a = (log a)^b ≠ log(a^b)

  d/dx (ln x) = 1/x
N vs lognlogn
Example
      A                 B
• A = 5n² + 100n,  B = 3n² + 2:   A ∈ Θ(B)
  A ∈ Θ(n²), n² ∈ Θ(B) ⇒ A ∈ Θ(B)

• A = log3(n²),  B = log2(n³):   A ∈ Θ(B)
  log_b a = log_c a / log_c b;
  A = 2·lg n / lg 3,  B = 3·lg n,
  A/B = 2/(3·lg 3), a constant
Example

• 2n2 = O(n3): 2n2 ≤ cn3  2 ≤ cn  c = 1 and n0= 2

• n2 = O(n2): n2 ≤ cn2  c ≥ 1  c = 1 and n0= 1

• 1000n2+1000n = O(n2):

1000n2+1000n ≤ cn2  c=1001 and n0 = 1000

• n = O(n2): n ≤ cn2  cn ≥ 1  c = 1 and n0= 1


Example
• 5n² = Ω(n)
  ∃ c, n0 such that: 0 ≤ c·n ≤ 5n² ⇒ c = 1 and n0 = 1

• 100n + 5 ≠ Ω(n²)
  Assume: ∃ c, n0 such that: 0 ≤ c·n² ≤ 100n + 5.
  We have: 100n + 5 ≤ 100n + 5n = 105n (for n ≥ 1)
  It follows: c·n² ≤ 105n ⇒ n(c·n − 105) ≤ 0
  Because n is positive ⇒ c·n − 105 ≤ 0 ⇒ n ≤ 105/c
  Contradiction: n cannot be bounded above by a constant

• n = Ω(2n), n³ = Ω(n²), n = Ω(log n)
Attention
• The values of n0 and c in the proof of an asymptotic formula are not unique

• Prove that 100n + 5 = O(n²)

  • 100n + 5 ≤ 100n + n = 101n ≤ 101n² for every n ≥ 5
    n0 = 5 and c = 101 are suitable constants

  • 100n + 5 ≤ 100n + 5n = 105n ≤ 105n² for every n ≥ 1
    n0 = 1 and c = 105 are also suitable constants

It is enough to find some constants c and n0 that satisfy the inequality in the definition of the asymptotic formula.
Some special algorithm classes

• O(1): constant
• O(log n): logarithmic
• O(n): linear
• O(n log n): superlinear
• O(n²): quadratic
• O(n³): cubic
• O(aⁿ): exponential (a > 1)
• O(nᵏ): polynomial (k ≥ 1)
CONTENT
1.1. Algorithm and Complexity
1.2. Beginning example
1.3. Asymptotic Symbols
1.4. Pseudo Code
1.5. Algorithm analysis techniques
Algorithm description: pseudo-language

• Using a specific programming language to describe an algorithm can make the description complicated and difficult to grasp.
• → instead, use:
  • Block diagrams (flowcharts)
  • Pseudo language
Block diagram
Control instructions can be:
- Instruction block
- Conditional instruction
- Repeat instruction

Flowchart symbols: begin or end, assignment instruction, input/output, condition, execution flow.

Example flowchart (parity check): Begin → Input n → R = n % 2 → "R is 0?" → if yes, output "Even"; if no, output "Odd" → continue → End.
Block of instructions

Syntax:
{
    S1;
    S2;
    S3;
}
(the statements are executed in sequence: S1 → S2 → S3)
Condition instruction
Syntax:
if (condition)
    action
(if the condition is true, the action is performed; otherwise it is skipped)
Condition instruction
Syntax:
if (B) then
    S1;
else
    S2;
(if B is true, S1 is executed; otherwise S2 is executed)
Loop instruction
Syntax:
while (B) do
    S;
(S is repeated as long as B remains true)
For loop
Syntax:
for (initialization; condition; update)
    action
(flow: initialization → test condition → if true, perform action, then update and test again; if false, exit the loop)
do-while loop
Syntax:
do
    action
while (condition)
(the action is performed first, then repeated as long as the condition remains true)
Algorithm description: pseudo-language

• Pseudo language
  • allows describing algorithms both in everyday language and with command structures similar to those of programming languages.
Algorithm description: pseudo-language

• Declare variable
integer x,y;
real u, v;
boolean a, b;
char c, d;
datatype x;
• Assignment statement
x = expression;
or
x ← expression;
Ex: x ← 1+4;
y = a*y+2;
Algorithm description: pseudo-language

• Control structure
• if condition then
sequence of instructions
else
sequence of instructions
endif;

while condition do
sequence of instructions
endwhile;
Algorithm description: pseudo-language

repeat
    sequence of instructions;
until condition;

for i = n1 to n2 [step d]
    sequence of instructions;
endfor;

Case statement:
case
    cond1: stat1;
    cond2: stat2;
    ...
    condn: statn;
endcase;

• Input-Output
read(X);   /* X can be a single variable or an array */
print(data) or print(message)
Algorithm description: pseudo-language

• Function and procedure

Function name(parameters)
begin
    variable declaration;
    statements in the body of the function;
    return (value)
end;

Procedure name(parameters)
begin
    variable declaration;
    statements in the body of the procedure;
end;

• Passing parameters: by value or by reference
• Variables: local or global
Algorithm description: pseudo-language

• Example: algorithm to find the largest element in an array A(1:n)

Function max(A(1:n))
begin
    datatype x;   /* to keep the maximum value found */
    integer i;
    x = A[1];
    for i = 2 to n do
        if x < A[i] then
            x = A[i];
        endif
    endfor;
    return (x);
end max;
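A minimal C sketch of the same function, assuming an int array and 0-based indexing (the original pseudocode is 1-indexed):

/* Returns the largest element of a[0..n-1]; assumes n >= 1.
   usage: int m = max_element(a, n); */
int max_element(const int a[], int n) {
    int x = a[0];                 /* best value found so far */
    for (int i = 1; i < n; i++)
        if (x < a[i])
            x = a[i];
    return x;
}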
Algorithm description: pseudo-language

• Example: the two-variable content swap algorithm

Procedure swap(x, y)
begin
    temp = x;
    x = y;
    y = temp;
end swap;
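In C, this swap only has an effect for the caller if the parameters are passed by reference; a minimal sketch using pointers:

/* Swaps the contents of two int variables; the addresses are passed,
   so the change is visible to the caller (pass by reference).
   usage: int a = 1, b = 2; swap(&a, &b); */
void swap(int *x, int *y) {
    int temp = *x;
    *x = *y;
    *y = temp;
}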
Algorithm description: pseudo-language

• Example: find a prime number greater than the positive integer n.
• First, we build a function Is_prime to check whether a positive integer m is prime or not.
• Using this function, we build an algorithm to solve the given problem.
• If m = a*b with 1 < a, b < m, then at least one of the factors a, b does not exceed √m.
  → m is prime if it has no divisors among the positive integers from 2 to √m.
Algorithm description: pseudo-language

• Algorithm to check whether a positive integer is prime or not.
• Input: positive integer m.
• Output: true if m is prime, false otherwise.

function Is_prime(m)
begin
    i = 2;
    while (i*i <= m) and (m mod i ≠ 0) do i = i + 1;
    Is_prime = i > sqrt(m);
end Is_prime;
Algorithm description: pseudo-language

• Algorithm to find a prime number greater than the positive integer n.
• The algorithm uses Is_prime as a subroutine.
• Input: positive integer n.
• Output: m – a prime number greater than n.

procedure Large_Prime(n)
begin
    m = n + 1;
    while not Is_prime(m) do m = m + 1;
end;

• Since the set of prime numbers is infinite, the Large_Prime algorithm terminates.
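A runnable C sketch of both routines; the m <= 1 guard and the main function are additions not present in the pseudocode.

#include <stdio.h>
#include <stdbool.h>

/* true if m is prime: no divisor among 2..sqrt(m). */
bool is_prime(int m) {
    if (m <= 1) return false;           /* added guard, not in the pseudocode */
    int i = 2;
    while (i * i <= m && m % i != 0)
        i = i + 1;
    return i * i > m;                   /* no divisor found up to sqrt(m) */
}

/* Smallest prime greater than n; terminates because there are infinitely many primes. */
int large_prime(int n) {
    int m = n + 1;
    while (!is_prime(m))
        m = m + 1;
    return m;
}

int main(void) {
    printf("%d\n", large_prime(100));   /* prints 101 */
    return 0;
}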
CONTENT
1.1. Algorithm and Complexity
1.2. Beginning example
1.3. Asymptotic Symbols
1.4. Pseudo Code
1.5. Algorithm analysis techniques
Basic techniques for analyzing algorithm complexity

Sequential structure.
• Suppose P and Q are two segments of the algorithm,
  • each can be a single command or a sub-algorithm.
• Time(P), Time(Q): the calculation times of P and Q respectively.

• Sequence rule: the calculation time required by "P; Q", meaning P is executed first, followed by Q, is
  Time(P; Q) = Time(P) + Time(Q),
  or in Theta notation:
  Time(P; Q) = Θ(max(Time(P), Time(Q))).
For loop

for i = 1 to m do P(i);

• Suppose the execution time of P(i) is t(i).
• Then the execution time of the for loop is

  ∑_{i=1}^{m} t(i)
Typical statement
• Definition: A typical statement is a statement that is
executed at least as often as any other statement in
the algorithm.
• If we assume that the execution time of each
statement is bounded by a constant
• => To evaluate the calculation time, you can count
the number of times a typical command is executed
Example: FibIter

function Fibiter(n)
begin
    i := 0; j := 1;
    for k := 1 to n do
    begin
        j := j + i;   /* typical instruction */
        i := j - i;
    end;
    Fibiter := j;
end;

• The number of times the typical statement is executed is n.
• The calculation time of the algorithm is O(n).
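A direct C translation of Fibiter (the function name in lower case and the int return type are choices made here; overflow for large n is ignored):

/* Iterative Fibonacci computation, following the pseudocode above.
   The typical statement j = j + i is executed n times, so the time is O(n).
   usage: int f = fibiter(10); */
int fibiter(int n) {
    int i = 0, j = 1;
    for (int k = 1; k <= n; k++) {
        j = j + i;
        i = j - i;
    }
    return j;
}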
Example: Algorithm 1 of the beginning example

int maxSum = 0;
for (int i = 0; i < n; i++) {
    for (int j = i; j < n; j++) {
        int sum = 0;
        for (int k = i; k <= j; k++)
            sum += a[k];
        if (sum > maxSum)
            maxSum = sum;
    }
}

Select the typical statement sum += a[k].
=> The calculation time of the algorithm is evaluated as O(n³).
Questions?
