Professional Documents
Culture Documents
• The way data are organized in a computer’s memory is said to be Data Structure and
of the problem.
This process of modeling is called abstraction.
This implies that the model focuses only on problem related stuff and that a programmer tries
It affects the design of both the structural and function aspect of the program.
2. Non primitive data structure: - is more sophisticated data structure emphasizing on structuring of a
ADT specifies:
What can be stored in the Abstract Data Type
What operations can be done on/by the Abstract Data Type.
This ADT:
Stores employees with their relevant attributes and discarding irrelevant attributes.
Supports hiring, firing, retiring … operations.
How do data structures model the world or some part of the world?
The value held by a data structure represents some specific characteristic of the world
The characteristic being modeled restricts the possible values held by a data structure
An algorithm transforms data structures from one state to another state in
two ways:
An algorithm may change the value held by a data structure
Definiteness: Each step must be clearly defined, having one and only one
Sequence: Each step must have a unique defined preceding and succeeding step.
The first step (start step) and last step (halt step) must be clearly noted.
Correctness: It must compute correct answer for all possible legal inputs.
Effectiveness: It must be possible to perform each step exactly and in a finite amount of
time.
Efficiency: It must solve with the least amount of computational resources such as time
and space.
Input/output: There must be a specified number of input values, and one or more result
values.
10/24/2021 Chapter 1 by: Kokeb A. 12
Algorithm Analysis Concepts
It’s a process of predicting the resource requirement of algorithms in a given environment. (Time and Space
required)
Memory Usage
Communication Bandwidth
Running time is usually treated as the most important since computational time is the most precious
resource in most problem domains.
Theoretical: Determining the quantity of resources required mathematically (Execution time, memory space, etc.)
needed by each algorithm.Chapter 1
10/24/2021 by: Kokeb A. 13
Algorithm Analysis Concepts Cont…
It is difficult to use actual clock-time as a consistent measure of an algorithm’s
efficiency, because clock-time can vary based:
Specific processor speed
Input Properties
Operating Environment
Space Complexity: Determine the approximate memory required to solve a problem of size n.
Order of Magnitude Analysis: Analysis of the function T (n) to determine the general complexity category to
which it belongs.
Function Return
Running time of a selection statement (if, switch) is the time for the condition evaluation +
the maximum of the running times for the individual clauses in the selection.
Always assume that the loop executes the maximum number of iterations possible.
Running time of a function call is 1 for setup + the time for any parameter
calculations + the time required for the execution of the function body.
Chapter 1
10/24/2021 17
by: Kokeb A.
Eg.
Examples:
1. int count()
{
int k=0;
cout<< “Enter an integer”; T (n)= 1+1+1+(1+n+1+n)+2n+1 = 4n+6 =
cin>>n; O(n)
for (i=0;i<n;i++)
k=k+1;
return 0;
}
Again, count the number of additions. The outer summation is for the
outer for loop
Conditionals: Formally
If (test) s1 else s2: Compute the maximum of the running time for s1 and s2.
Example: Suppose we have hardware capable of executing 106 instructions per second.
How long would it take to execute an algorithm whose complexity function was:
T (n) = 2n2 on an input size of n= 108?
The total number of operations to be performed would be T (108):
T(108) = 2*(108)2 =2*1016
The required number of seconds required would be given by T(108)/106 so:
Running time =2* 1016 /106 = 2* 1010
The number of seconds per day is 86,400 so this is about 231,480 days (634 years).
Chapter 1
10/24/2021 23
by: Kokeb A.
Measures of Time
In order to determine the running time of an algorithm it is possible to define three functions
Tbest(n), Tavg(n) and Tworst(n) as the best, the average and the worst case running time of the
algorithm respectively.
Average Case (Tavg): The amount of time the algorithm takes on an "average" set of inputs.
Worst Case (Tworst): The amount of time the algorithm takes on the worst possible set of inputs.
Best Case (Tbest): The amount of time the algorithm takes on the smallest possible set of inputs.
We are interested in the worst-case time, since it provides a bound for all input – this is called the
“Big-Oh” estimate.
These are:
Big-Oh Notation (O)
• It’s only concerned with what happens for a very large value of n.
• Hence the n term is ignored. Of course, for small 1 values of n, it may be important.
However, Big-Oh is mainly concerned with large values of n.
10/24/2021 Chapter 1 by: Kokeb A. 26
Big-Oh Notation Cont…
Formal Definition: f (n)= O (g (n)) if there exist c, k ∊ ℛ+ such that for all n≥ k, f
(n) ≤ c.g (n).
Examples: The following points are facts that you can use for Big-Oh problems:
1<=n for all n>=1
To show that f(n) is O(g(n)) we must show that constants c and k such that
f(n) <=c.g(n) for all n>=k
(c=15,k=1).
Demonstrating that a function f(n) is big-O of a function g(n) requires that we find specific
constants c and k for which the inequality holds (and show that the inequality does in fact hold).
Big-O expresses an upper bound on the growth rate of a function, for sufficiently large values of
n.
An upper bound is the best algorithmic solution that has been found for a problem.
Exercise:
f(n) = (3/2)n2+(5/2)n+3
Theorem 1: k is O(1)
Formal Definition: A function f(n) is ( g (n)) if there exist constants c and k ∊ ℛ+ such that
f(n)= ( g (n)) means that f(n) is greater than or equal to some constant multiple of g(n) for all
values of n greater than or equal to some k.
In simple terms, f(n)= ( g (n)) means that the growth rate of f(n) is greater than or equal to g(n).
Example:
f(n)=O(n4)
f(n)=O(n3)
f(n)=O(n2)
All these are technically correct, but the last expression is the best and tight one. Since 2n2 and n2 have
2n2 = O(n2)=O(n3)
f(n)=o(g(n)) means for all c>0 there exists some k>0 such that f(n)<c.g(n) for all
n>=k. Informally, f(n)=o(g(n)) means f(n) becomes insignificant relative to g(n) as n
approaches infinity.
In simple terms, f(n) has less growth rate compared to g(n).
Transpose symmetry
f(n)=O(g(n)) if and only if g(n)= (f(n)),
Reflexivity
f(n)= (f(n)),
f(n)=O(f(n)),
f(n)= (f(n)).
The mathematical model of a Stack is LIFO. Data placed in the Stack is accessed through one path.
The standard operations on Stack are:
Push(s, data): push ‘data’ in stack ‘s’
POP(S): withdraw next available data from stack ‘S’
Makenull(S): clear stack ‘S’ of all data
Empty(S):return ‘true’ if Stack ‘s’ is empty and ‘false’ otherwise.
Top(S): view the next available data on stack ‘S’
All operation can be implemented in O(1) time/
Queue Enqueue(Q)
Dequeue(Q)
Front(Q)
Rear(Q)
Makenull(Q)
The mathematical model of Queue is FIFO. Data placed in the Queue goes through one path, while data
withdrawn goes t;hrough another path. The next available data is the first data placed in the queue.
The standard operations on queue are:
Enqueue(Q, data): put ‘data’ in the rear path of queue ‘Q’
Dequeue(Q): withdraw next available data from front path of queue ‘Q’
Front(Q): view the next available data on queue ‘Q’
Rear(Q): view the last data entered on queue ‘Q’
Makenull(Q): clear queue ‘Q’ of all data
A binary tree is an important type of tree structure which occurs very often.
It is characterized by the fact that any node can have at most two children, i.e. there is
no node with degree greater than two.
For binary trees we distinguish between the subtree on the left and on the right,
whereas for trees the order of the subtrees was irrelevant.
Furthermore a binary tree is allowed to have zero nodes while a tree must have at least
one node.
A data structure which provides for these two operations is called a priority queue.
Many algorithms need to make use of priority queues and so an efficient way to
We might first consider using a queue since inserting new elements would be very
efficient.
But finding the largest element would necessitate a scan of the entire queue.
But an insertion could require moving all of the items in the list.
efficiently.
A heap is a complete binary tree with the property that the value at each
node is at least as large as the values at its children (if they exist).
22 14 17 5 13 16 2 2
The two conditions for defining a heap ensure that the array has no
gaps (i.e. spaces where there could be a node but isn’t).
Recall that priority queues are required to dequeue the element with
the highest priority.
heapEnqueue(el):
Put el at the end of the heap (i.e. in the next free space)
While el is not the root and el > parent(el)
Swap el with its parent
heapDequeue():
Extract the element from the root
Move the element at the last leaf to the root
p = the root
While p is not a leaf and p < any of its children
Swap p with its larger child
Because this may violate the first heap condition, the new node is continually
swapped with its parent until the heap condition is satisfied again.
In the heapDequeue() operation, the root element (i.e. the one with the highest
priority) is extracted, then the last leaf node is moved to the root.
Because this will also violate the first heap condition, it is continually swapped
with its larger child until it the condition is satisfied again.
However, it is more often the case that we are given n items, say n integers
and we are free to choose whatever shape binary tree seems most desirable.
In this case the complete binary tree is chosen and represented sequentially.
This is why in the definition of heap we insist that a complete binary tree is
used.
10/24/2021 Chapter 1 by: Kokeb A. 55
Heap cont…
This is because the edges have no direction: an edge from node A to node B is
the same as an edge from node B to node A.
If the graph were a digraph then this would not be the case.
To represent a weighted graph, the array elements would contain the weights
of their respective edges.
Examples of graphs: (a-c) simple graphs; (d) a complete graph; (e) a multigraph;
(f) a pseudograph; (g) a digraph; (h) a weighted graph
• Each GraphVertex consists of a name and a linked list of edges. Each EdgeNode contains a
pointer to a GraphNode (i.e. an edge) and a link to the next edge emanating from this
vertex.
Chapter 1
10/24/2021 63
by: Kokeb A.
Set operation
• There are many set operations that are quite useful for algorithm
study. Some of the set operations are as follows.
• 1. Intersection:- this operation returns the common elements of a set.
• Example. A={1,3,5,7}
• B={3}
A intersection B={3}.
Chapter 1
10/24/2021 64
by: Kokeb A.
Hashing
• Hashing is a technique used for storing and retrieving keys in a rapid
manner.
• It is a concept of distributing keys in an array called a hash table using
predetermined function called a hash function.
Chapter 1
10/24/2021 65
by: Kokeb A.
.
Hash Functions
Mid-Square
Division (Modulus)
Folding
Extraction:
Chapter 1
10/24/2021 67
by: Kokeb A.
Division
• The home bucket is obtained by using the modulo (%) operator. The
• key x is divided by some number M, and the remainder is used as the
• home bucket for x.
• f(x) = x mod M
ex)
Chapter 1
10/24/2021 68
by: Kokeb A.
Folding
In this method the key k is partitioned into several parts, all but
possibly the last being of the same length. These partitions are then
added together to obtain the hash address for k.
There are two ways of carrying out this addition. (1) Shift (2) Boundary
ex1)
We assume the key = 12320324111220 , and hashing table has 1000 buckets.
123|203|241|112|20
(1) 123+203+241+112+020=699
(2) 123+302+241+211+020=897
69
Folding
ex2)
For the key 123456789, this method might use the first four digits (1234), or the last
four (6789), or the first two and last two (1289).
Extraction methods can be satisfactory so long as the omitted portion of the key is not
significant in distinguishing the keys.
For example, at MTU University many student ID numbers begin with the letters
“SCIR”, so the first four letters can be omitted and the following numbers used to
generate the key using one of the other hash function techniques.
10/24/2021 Chapter 1 by: Kokeb A. 71
Collision Resolution
If the hash function being used is not a perfect hash function (which is
usually the case), then the problem of collisions will arise.
For this reason, any hashing system should adopt a collision resolution
strategy.
10/24/2021 Chapter 1 by: Kokeb A. 72
Collision Handling
Linear Probing(Linear Open Addressing)
When the overflow occurs, we search the hash table buckets in the
order (H(x)+1, H(x)+2…), until the hash table is full or reaching the
first unfilled bucket.
ex) 0 10 0 10 0 10
1 1 1
Insert 55 Insert 25
2 75 2 75 2 75
3 3 55 3 55
4 4 4 25
5 43 5 43 5 43
6 6 6
73
Chaining
In chaining, each address in the table refers to a list, or chain, of data values.
If a collision occurs the new data is simply added to end of the chain.
Provided that the lists do not become very long, chaining is an efficient technique.
0 10 0 10
Ex.
1 1
Insert 25
Insert 55
2 75 2 75
55
3 3
25
4 4
5 43 5 43
6 6
74
Quadratic Probing
When the overflow occurs, we search the hash table buckets by using
75
Question1:
Suppose the hashing function f(x) = x mod 11 is used to hash a list of input
value (in the given order) into a hash table implemented by the array
bucked[0],bucket[1],…bucket[10]. The inputs are
10,100,32,45,126,3,24,200,and 53. Each bucket can hold only one number.
Overflow is resolved by quadratic probing, which examines buckets f(x),
(f(x)+ ) mod 11,and (f(x)- ) mod 11, i=1 to 5.Show2 the final contents in
bucket[0] to bucket[10].
2
i
i
76
Ans: 0 32
1 100
2 45
3 3
4
5 126
6 24
7
8 53
9 200
10 10
77
Question2:
78
(1) (2)
0 0 32
1 →34 1 34
2 2 21
3
3
4 26
4 →26→15
5 15
5 →17
6 17
6
7
7
8
8
9 20
9 →20→9 10 9
10 →32→21 79