Chapter 1
Introduction to Data Structures and Algorithms
Jimma, Ethiopia.
Outline
Introduction to Data Structures
Abstract Data Types
Algorithms
Properties of an algorithm
Algorithm analysis concepts
Complexity analysis
Asymptotic Analysis
Overviews of Data Structures
Data structures include arrays, linked lists, stacks, binary trees, and
hash tables.
Data structures are the building blocks of a program; choosing the right data
structure helps the programmer design more efficient programs, which matters
because the complexity and volume of the problems solved by computers is
steadily increasing.
With abstraction you create a well-defined entity that can be properly handled. These
entities define the data structure of the program.
An entity with the properties just described is called an Abstract Data Type (ADT).
Abstract Data Types (ADT)
There are many formalized and standard abstract data types, such as
stacks, queues, trees, etc.
Cont...
The interface doesn’t give any specific details about how something
should be implemented or in what programming language.
It is concerned not only with that, but also with maintaining the logical
properties of the data.
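As an illustration (a minimal sketch, not taken from the slides), an ADT's interface can be written in C++ as an abstract class; the class name Stack and its member functions here are assumed for the example, not prescribed by the text. The interface names the operations without committing to how the data is stored.

// Hypothetical Stack ADT: the interface lists the operations only;
// how the data is stored (array, linked list, ...) is left to the implementation.
template <typename T>
class Stack {
public:
    virtual ~Stack() {}
    virtual void push(const T& item) = 0;   // add an item on top
    virtual T pop() = 0;                     // remove and return the top item
    virtual T& top() = 0;                    // look at the top item
    virtual bool isEmpty() const = 0;        // true if the stack has no items
};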
Cont...
Data structures are broadly classified into two types:
1. Primitive data structure
2. Non-primitive data structure
Cont...
Primitive data structure:
also known as basic data structures.
predefined types of data, which are supported by the programming
language.
Non-Primitive data structure:
are highly developed complex data structures.
are not defined by the programming language, but are instead
created by the programmer.
are responsible for organizing groups of homogeneous and
heterogeneous data elements.
Cont...
Non-Primitive data structure:
Non-primitive data structures are further subclassified into linear and non-linear
data structures.
Linear data structure arranges the data into a sequence and follows some sort of
order.
Non-linear data structure does not organize the data in a sequential manner.
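As a rough sketch (illustrative only, not from the slides), the linear/non-linear distinction shows up in how nodes reference each other in C++; the struct names below are hypothetical. A linear structure chains each element to a single successor, while a non-linear structure such as a binary tree lets one element branch to several others.

// Hypothetical node types sketching the linear / non-linear distinction.
struct ListNode {        // linear: each node has exactly one successor
    int data;
    ListNode* next;
};

struct TreeNode {        // non-linear: a node can branch to several children
    int data;
    TreeNode* left;
    TreeNode* right;
};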
Algorithm
An algorithm is a well-defined, finite sequence of steps for solving a particular problem.
Properties of Algorithm
Definiteness: Each step must be clearly defined, having one and only
one interpretation. At each point in computation, one should be able to
tell exactly what happens next.
Cont...
There are many factors that affect the running time of a program.
Among these are the algorithm itself, the input data, and the
computer system used to run the program.
The performance of a computer is determined by
the hardware:
• processor used (type and speed),
• memory available (cache and RAM), and
• disk available;
the programming language in which the algorithm is specified;
the language compiler/interpreter used; and
the computer operating system software.
Cont...
Determine how running time increases as the size of the problem increases.
Complexity Analysis
Is the systematic study of the cost of computation, measured in
• time units,
• operations performed, or
• the amount of storage space required.
The goal is to have a meaningful measure that permits comparison of
algorithms independent of operating platform.
Complexity: a measure of the performance of an algorithm. An algorithm's
performance depends on internal and external factors.
Internal factors: the algorithm's efficiency, in terms of
• Time required to run
• Space (memory storage) required to run
External factors:
• Size of the input to the algorithm
• Speed of the computer on which it is run
• Quality of the compiler
Complexity Analysis
Two things to consider: Time Complexity and Space Complexity.
Complexity Analysis
Time Complexity:
The time complexity of an algorithm or a program is the amount of time it
needs to run to completion.
The exact time will depend on the implementation of the algorithm, the
programming language, the optimizing capabilities of the compiler used,
the CPU speed, other hardware characteristics/specifications, and so on.
To measure the time complexity accurately, we have to count all
sorts of operations performed in an algorithm.
If we know the time for each one of the primitive operations
performed in a given computer, we can easily compute the time
taken by an algorithm to complete its execution.
• This time will vary from machine to machine.
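For instance (an illustrative sketch, not part of the original slides), directly measuring wall-clock time shows this machine dependence, which is one reason the analysis that follows counts operations instead:

#include <chrono>
#include <iostream>

int main() {
    long long sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 1; i <= 1000000; i++)   // the same loop...
        sum += i;
    auto stop = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    // ...takes a different number of microseconds on every machine,
    // but always performs about n additions.
    std::cout << "sum = " << sum << ", time = " << us.count() << " us\n";
    return 0;
}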
Complexity Analysis
When analyzing an algorithm, it is hard to come up with the exact time
required.
To find the exact time complexity, we would need to know the exact
instructions executed by the hardware and the time required for each
instruction.
The time complexity also depends on the amount of data input to
an algorithm.
But we can calculate the order of magnitude for the time
required.
Complexity Analysis
Space Complexity
The space complexity of an algorithm or program is the amount of
memory it needs to run to completion.
Some of the reasons for studying space complexity are:
If the program is to run on a multi-user system, it may be required to
specify the amount of memory to be allocated to the program.
We may be interested to know in advance whether sufficient
memory is available to run the program.
There may be several possible solutions with different space
requirements.
Can be used to estimate the size of the largest problem that a program
can solve.
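As a small illustration (a hedged sketch, not from the slides), two correct solutions to the same problem can differ only in the space they need; both hypothetical functions below sum the first n integers, one with O(n) extra space and one with O(1).

#include <vector>

// Hypothetical example: O(n) extra space - materialize all values, then add them.
long long sumWithArray(int n) {
    std::vector<long long> values;
    for (int i = 1; i <= n; i++)
        values.push_back(i);
    long long sum = 0;
    for (long long v : values)
        sum += v;
    return sum;
}

// O(1) extra space - keep only the running total.
long long sumInPlace(int n) {
    long long sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}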
Complexity Analysis
When we analyze an algorithm, the running time depends on the input data; there are three cases:
In the best case, the amount of time a program might be expected to take on the best possible
input data.
In the average case, the amount of time a program might be expected to take on typical (or
average) input data.
In the worst case, the amount of time a program would take on the worst possible input
configuration.
In order to determine the running time of an algorithm, it is possible to define three
corresponding functions: Tbest(n), Tavg(n), and Tworst(n).
Complexity Analysis
Worst case analysis
Assumes that the data are arranged in the most disadvantageous order.
It also assumes an arbitrarily large input size.
E.g.
For sorting – the data are arranged in the opposite of the required order.
For searching – the required item is found at the end of the list, or the
item is missing.
It gives the upper bound of T(n) and corresponds to the maximum number of
operations executed.
Worst Case (Tworst): The amount of time the algorithm takes on the
worst possible set of inputs.
Complexity Analysis
Average case Analysis
Assumes that data are found in random order.
It also assumes random or average input size.
E.g.
For sorting – data are in random order.
For searching – the required item is found at any position or missing.
Average Case (Tavg): The amount of time the algorithm takes on an "average"
set of inputs.
It also corresponds to the average number of operations executed.
Best case and average case are generally not used to estimate (determine) the complexity
of algorithms;
worst case is the best way to determine the complexity of algorithms.
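To make the three cases concrete (an illustrative sketch, not from the slides), consider linear search over an array: the best case finds the key immediately, the worst case scans everything, and the average case inspects about half of the elements.

// Linear search: returns the index of key in arr[0..n-1], or -1 if absent.
int linearSearch(const int arr[], int n, int key) {
    for (int i = 0; i < n; i++) {     // best case: 1 comparison (key at arr[0])
        if (arr[i] == key)            // average case: about n/2 comparisons
            return i;
    }
    return -1;                        // worst case: n comparisons (key last or missing)
}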
Complexity Analysis
Benefits
Provides a shorthand method for finding, approximately, how much time an
algorithm takes.
Complexity Analysis
Analysis Rules:
1. Execution of one of the following operations takes time 1:
• Assignment Operation
• Single Input/output Operation
• Single Boolean Operations
• Single Arithmetic Operations
• Function Return
2. Running time of a selection statement (if, switch) is the time for the condition
evaluation + the maximum of the running times for the individual clauses in the
selection
Complexity Analysis
3. Loops: Running time for a loop is equal to the running time for the statements
inside the loop * number of iterations
Remark:
The total running time of a statement inside a group of nested loops is
• The running time of the statements multiplied by the product of the
sizes of all the loops.
For nested loops, analyze inside out.
Always assume that the loop executes the maximum number of
iterations possible.
4. Running time of a function call is 1 for setup + the time for any parameter
calculations + the time required for the execution of the function body.
Example
Counting the time units to compute for each statement of a sample function void func().
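A sketch in that spirit (an assumed example, not the original one from the slides) applies the analysis rules above to a hypothetical function sumOfFirstN: assignments, comparisons, and increments each cost one time unit, and loop bodies are multiplied by the number of iterations.

// Hypothetical example of counting time units with the rules above.
int sumOfFirstN(int n) {
    int sum = 0;                     // 1 time unit (assignment)
    for (int i = 1; i <= n; i++) {   // 1 init + (n+1) tests + n increments
        sum = sum + i;               // n * 2 (addition + assignment)
    }
    return sum;                      // 1 time unit (return)
}
// Total: T(n) = 1 + 1 + (n+1) + n + 2n + 1 = 4n + 4, which is O(n).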
Exercise
What are the time units to compute for each of the following fragments?

Fragment 1:
sum = 0;
if (i == 0) {
    sum = n*(n+1)/2;
    cout << sum;
} else {
    for (i = 1; i <= n; i++) {
        sum = sum + i;
    }
    cout << sum;
}

Fragment 2:
k = 0;
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        k++;

Fragment 3:
for (i = 1; i < n; ) {
    i = i * 2;
}
Asymptotic Analysis
Using asymptotic analysis, we can very well conclude the best-case,
average-case, and worst-case scenarios of an algorithm.
Cont...
What exactly does it mean for one function, say TA(n), to be better than another function,
TB(n)?
For example, suppose the problem size is n0 and TA(n0) < TB(n0). Then clearly
algorithm A is better than algorithm B for problem size n0.
Rather than comparing at a single problem size, we consider the asymptotic behavior of the
two functions for very large problem sizes; then it is a fairly simple matter to compare the
two functions TA(n) and TB(n).
Cont...
If the size of the input is n, then f(n), a function of n, denotes the time
complexity: f(n) represents the number of instructions executed for an input
of size n.
We can compare the data structure for a particular operation by comparing
their f(n) values.
We are interested in the growth rate of f(n) with respect to n because it might be
possible that for smaller input size, one data structure may seem better than the
other but for larger input size it may not.
This concept is applicable when comparing two algorithms as well.
Examining the exact running time is not the best solution to calculate the time
complexity
Measuring the actual running time is not practical at all
The running time generally depends on the size of the input
Cont...
Consider the two functions f(n) = n² and g(n) = n² – 3n + 2.
Around n = 0, they look very different; for large n, they grow at essentially
the same rate.
The justification for considering the two polynomials similar is that, in both cases,
the leading term is the same: n².
Suppose, however, that the coefficients of the leading terms were different. In that
case, both functions would still exhibit the same rate of growth; however, one
would always be proportionally larger.
Cont...
Example: f(n) = 5n² + 6n + 12. What percentage of the running time does each
term contribute as n grows?
For n = 1, f(1) = 23, so the % running time of 5n² = 5/23 ≈ 21.74%, of 6n = 6/23 ≈ 26.09%,
and of 12 = 12/23 ≈ 52.17%.

   n        % of 5n²     % of 6n     % of 12
   1         21.74        26.09       52.17
   10        87.41        10.49        2.09
   100       98.79         1.19        0.02
   1000      99.88         0.12        0.0002

As n grows, the leading term 5n² dominates the running time.
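One way to reproduce such a table (an illustrative sketch, not from the slides) is to compute each term's share of f(n) directly:

#include <iostream>

int main() {
    // Percentage contribution of each term of f(n) = 5n^2 + 6n + 12.
    double sizes[] = {1, 10, 100, 1000};
    for (double n : sizes) {
        double f = 5*n*n + 6*n + 12;
        std::cout << "n=" << n
                  << "  5n^2: " << 100 * 5*n*n / f << "%"
                  << "  6n: "   << 100 * 6*n   / f << "%"
                  << "  12: "   << 100 * 12    / f << "%\n";
    }
    return 0;
}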
There are five notations used to describe a running-time function. These are:
Big-O (O), Big-Omega (Ω), Theta (Θ), Little-o (o), and Little-Omega (ω).
Big-O Notation (O)
Formal Definition: f(n) = O(g(n)) if there exist c, k ∊ ℛ⁺ such that for all n ≥ k,
f(n) ≤ c·g(n).
Cont…
1 < log n < n < n log n < n² < n³ < … < 2ⁿ < 3ⁿ < … < nⁿ
Cont...
Example: f(n) = 8n + 128. Show that f(n) = O(n²).
To show that f(n) is O(g(n)), we must find constants c and k such that
f(n) ≤ c·g(n) for all n ≥ k. Here g(n) = n².
Try c = 1: 8n + 128 ≤ 1·n²
8n + 128 ≤ n²  ⇒  0 ≤ n² – 8n – 128  ⇒  0 ≤ (n + 8)(n – 16)
Since (n + 8) > 0 for all n ≥ 0, we need (n – 16) ≥ 0, that is, k = 16.
So, for c = 1 and k = 16, f(n) ≤ c·n² for all integers n ≥ k; hence,
f(n) = O(n²).
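As a hedged sanity check (illustrative only, not part of the slides), the inequality can also be verified numerically over a range of n:

#include <cassert>

int main() {
    // Check 8n + 128 <= 1 * n^2 for all n >= k with c = 1, k = 16.
    for (long long n = 16; n <= 1000000; n++)
        assert(8*n + 128 <= n*n);
    return 0;   // no assertion fires: the bound holds on the tested range
}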
Exercise
2. f(n) = 3n² + 4n + 1. Show that f(n) = O(n²).
Cont...
An upper bound is the best algorithmic solution that has been found for
a problem.
Cont...
Big-O Theorems
Theorem 1: A constant k is O(1).
Theorem 2: A polynomial is O(the term containing the highest
power of n).
Theorem 3: k*f(n) is O(f(n)); constant factors may be ignored.
Theorem 4 (Transitivity): If f(n) is O(g(n)) and g(n) is O(h(n)),
then f(n) is O(h(n)).
Theorem 5: For any base b, log_b(n) is O(log n).
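For instance (a small worked illustration, not from the slides): for f(n) = 7n³ + 2n² + 5, Theorem 2 keeps only the term with the highest power of n, giving O(7n³), and Theorem 3 discards the constant factor 7, so f(n) = O(n³). Similarly, by Theorem 5, log₂(n) and log₁₀(n) are both simply O(log n).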
Cont...
Worst case is the best way to determine the time complexity of
algorithms. In other words, Big-O notation determines the upper
bound of the function.
What is the time complexity of the following program?
void fun(int n){
   int i, j, k;                  // declarations: constant time O(1)
   for (i = 1; i <= n; i++){     // outer loop executes n times
      for (j = 1; j <= n; j++){  // inner loop executes n times per outer iteration
         cout << "hello";        // constant time
      }
   }
}
Total time complexity is: 1 + N*N = O(n²)
Cont...
What is the time complexity of each of the following programs?

Program 1:
void fun(int n){
   int i, j;
   for (i = 1; i <= n/3; i++){
      for (j = 1; j <= n; j += 4){
         cout << "Hello";
      }
   }
}

Program 2:
void fun(int n){
   int i, j, k, count = 0;
   for (i = n/2; i <= n; i++){
      for (j = 1; j + n/2 <= n; j++){
         for (k = 1; k <= n; k = k*2){
            count++;
         }
      }
   }
}
Big-Omega Notation (Ω)
Formal Definition: f(n) = Ω(g(n)) if there exist c, k ∊ ℛ⁺ such that f(n) ≥ c·g(n)
for all n ≥ k.
Example: f(n) = 2n + 3. Try c = 1: 2n + 3 ≥ 1·n for all n ≥ 1; therefore, f(n) = Ω(n),
and also f(n) = Ω(log n).
Example: f(n) = 5n² – 64n + 256. Show that f(n) = Ω(n²). In order to show this we need
to find an integer n0 and a constant c > 0 such that f(n) ≥ c·n² for all integers n ≥ n0.
Try c = 1: 5n² – 64n + 256 ≥ n²  ⇒  4n² – 64n + 256 ≥ 0  ⇒  4(n – 8)² ≥ 0, which holds
for every n.
So, we have that for c = 1 and n0 = 0, f(n) ≥ c·n² for all integers n ≥ n0; hence,
f(n) = Ω(n²).
Theta Notation (Θ)
A function f(n) belongs to the set Θ(g(n)) if there exist positive constants c1
and c2 such that it can be sandwiched between c1·g(n) and c2·g(n), for
sufficiently large values of n.
Formal Definition: f(n) = Θ(g(n)) if
there exist constants c1, c2, and k > 0 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for
all n ≥ k.
In simple terms, f(n) = Θ(g(n)) means that f(n) and g(n) have the same rate
of growth.
Cont...
Show that if f(n) = 2n + 1, then f(n) = Θ(n).
We need c1·g(n) ≤ f(n) ≤ c2·g(n), i.e.,
c1·n ≤ 2n + 1 ≤ c2·n.
Take c1 = 1 and c2 = 3:
n ≤ 2n + 1 ≤ 3n, where c1 = 1, c2 = 3, and n ≥ 1.
Little-o Notation (o)
f(n)=o(g(n)) means for all c>0 there exists some k>0 such that
f(n)<c.g(n) for all n>=k.
Informally, f(n)=o(g(n)) means f(n) becomes insignificant relative to
g(n) as n approaches infinity.
Little-Omega Notation (ω)
f(n) = ω(g(n)) means for all c > 0 there exists some k > 0 such that
f(n) > c·g(n) for all n ≥ k.
Informally, f(n) = ω(g(n)) means g(n) becomes insignificant relative to
f(n) as n approaches infinity.
The End !!