
Introduction

An algorithm, named after the ninth-century scholar Abu Ja'far Muhammad ibn Musa al-Khwarizmi, can be defined, roughly speaking, in any of the following ways:

• An algorithm is a set of rules for carrying out calculations either by hand or on a machine.
• An algorithm is a finite step-by-step procedure to achieve a required result.
• An algorithm is a sequence of computational steps that transform the input into the output.
• An algorithm is a sequence of operations performed on data that have to be organized in data structures.
• An algorithm is an abstraction of a program to be executed on a physical machine (model of computation).

One of the most famous algorithms in history dates back to the ancient Greeks: Euclid's algorithm for calculating the greatest common divisor of two integers.
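As a concrete illustration, here is a minimal Python sketch of Euclid's algorithm (the function name euclid_gcd is ours, not from the source):

def euclid_gcd(a, b):
    # Repeatedly replace (a, b) by (b, a mod b); the GCD is unchanged
    # at every step and the second argument strictly decreases.
    while b != 0:
        a, b = b, a % b
    return a

print(euclid_gcd(252, 198))   # prints 18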

The Classic Multiplication Algorithm


1. Multiplication, the American way:

Multiply the multiplicand one after another by each digit of the multiplier
taken from right to left.

2. Multiplication, the English way:

Multiply the multiplicand one after another by each digit of the multiplier
taken from left to right.
Algorithmics is the branch of computer science that consists of designing and analyzing computer algorithms.

The “design” pertains to

• the description of the algorithm at an abstract level by means of a pseudo-language, and
• a proof of correctness, that is, a proof that the algorithm solves the given problem in all cases.

The “analysis” deals with performance evaluation (complexity analysis).

We start by defining the model of computation, which is usually the Random Access Machine (RAM) model, but other models of computation such as the PRAM can be used. Once the model of computation has been defined, an algorithm can be described using a simple language (or pseudo-language) whose syntax is close to a programming language such as C or Java.

Algorithm's Performance
Two important ways to characterize the effectiveness of an algorithm are its
space complexity and time complexity. Time complexity of an algorithm
concerns determining an expression of the number of steps needed as a function
of the problem size. Since the step count measure is somewhat coarse, one does
not aim at obtaining an exact step count. Instead, one attempts only to get
asymptotic bounds on the step count. Asymptotic analysis makes use of the O
(Big Oh) notation. Two other notational constructs used by computer scientists
in the analysis of algorithms are Θ (Big Theta) notation and Ω (Big Omega)
notation.
The performance evaluation of an algorithm is obtained by totaling the number
of occurrences of each operation when running the algorithm. The performance
of an algorithm is evaluated as a function of the input size n and is to be
considered modulo a multiplicative constant.

The following notations are commonly used in performance analysis to characterize the complexity of an algorithm.

Θ-Notation (Same order)

This notation bounds a function to within constant factors. We say f(n) = Θ(g(n)) if
there exist positive constants n0, c1 and c2 such that to the right of n0 the value of f(n)
always lies between c1g(n) and c2g(n) inclusive.

O-Notation (Upper Bound)

This notation gives an upper bound for a function to within a constant factor. We
write f(n) = O(g(n)) if there are positive constants n0 and c such that to the right of
n0, the value of f(n) always lies on or below cg(n).
Ω-Notation (Lower Bound)

This notation gives a lower bound for a function to within a constant factor. We
write f(n) = Ω(g(n)) if there are positive constants n0 and c such that to the right of
n0, the value of f(n) always lies on or above cg(n).
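Stated formally, the three notations above can be written as follows (this is only a restatement of the prose definitions in standard set notation):

\Theta(g(n)) = \{\, f(n) : \exists\, c_1, c_2, n_0 > 0 \ \text{such that}\ c_1 g(n) \le f(n) \le c_2 g(n) \ \text{for all}\ n \ge n_0 \,\}

O(g(n)) = \{\, f(n) : \exists\, c, n_0 > 0 \ \text{such that}\ f(n) \le c\, g(n) \ \text{for all}\ n \ge n_0 \,\}

\Omega(g(n)) = \{\, f(n) : \exists\, c, n_0 > 0 \ \text{such that}\ f(n) \ge c\, g(n) \ \text{for all}\ n \ge n_0 \,\}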

Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives an upper bound on the number of operations (or running time) performed by the algorithm when the input size is n.
There are two interpretations of upper bound.

Worst-case Complexity

The running time for any given size input will be lower than the upper bound except
possibly for some values of the input where the maximum is reached.

Average-case Complexity

The running time for any given size input will be the average number of operations
over all problem instances for a given size.

Because it is quite difficult to estimate the statistical behavior of the input, most of the time we content ourselves with worst-case behavior. Most of the time, the complexity g(n) is approximated by its family O(f(n)), where f(n) is one of the following functions: n (linear complexity), log n (logarithmic complexity), n^a where a ≥ 2 (polynomial complexity), a^n (exponential complexity).

Optimality
Once the complexity of an algorithm has been estimated, the question arises whether this algorithm is optimal. An algorithm for a given problem is optimal if its complexity reaches the lower bound over all the algorithms solving this problem. For example, any algorithm solving "the intersection of n segments" problem will execute at least n^2 operations in the worst case, even if it does nothing but print the output. This is abbreviated by saying that the problem has Ω(n^2) complexity. If one finds an O(n^2) algorithm that solves this problem, it will be optimal and of complexity Θ(n^2).

Reduction
Another technique for estimating the complexity of a problem is the
transformation of problems, also called problem reduction. As an example,
suppose we know a lower bound for a problem A, and that we would like to
estimate a lower bound for a problem B. If we can transform A into B by a
transformation step whose cost is less than that for solving A, then B has the
same bound as A.

The convex hull problem nicely illustrates the "reduction" technique. A lower bound for the convex hull problem is established by reducing the sorting problem (complexity Θ(n log n)) to the convex hull problem.

Mathematics for Algorithmic

Set

A set is a collection of different things (distinguishable or distinct objects) represented as a unit. The objects in a set are called its elements or members. If an object x is a member of a set S, we write x ∈ S. On the other hand, if x is not a member of S, we write x ∉ S. A set cannot contain the same object more than once, and its elements are not ordered.

For example, consider the set S = {7, 21, 57}. Then 7 ∈ {7, 21, 57} and 8 ∉ {7, 21, 57}, or equivalently, 7 ∈ S and 8 ∉ S.

We can also describe a set containing elements according to some rule. We write

{n : rule about n}

Thus, {n : n = m^2 for some m ∈ N} is the set of perfect squares.

Set Cardinality
The number of elements in a set is called the cardinality or size of the set, denoted |S| or sometimes n(S). Two sets have the same cardinality if their elements can be put into a one-to-one correspondence. It is easy to see that the cardinality of the empty set is zero, i.e., |∅| = 0.

Multiset
If we do want to take the number of occurrences of members into account, we call the group a multiset.
For example, {7} and {7, 7} are identical as sets, but {7} and {7, 7} are different as multisets.
Infinite Set
A set containing infinitely many elements. For example, the set of negative integers, the set of integers, etc.

Empty Set
A set containing no members, denoted ∅ or {}.

Subset
For two sets A and B, we say that A is a subset of B, written A ⊆ B, if every member of A is also a member of B.

Formally, A ⊆ B if

x ∈ A implies x ∈ B,

written

x ∈ A => x ∈ B.

Proper Subset

Set A is a proper subset of B, written A ⊂ B, if A is a subset of B and not equal to B.

That is, A is a proper subset of B if A ⊆ B but A ≠ B.

Equal Sets

The sets A and B are equal, written A = B, if each is a subset of the other. Rephrasing the definition: let A and B be sets; A = B if A ⊆ B and B ⊆ A.
Power Set

Let A be a set. The power set of A, written P(A) or 2^A, is the set of all subsets of A. That is, P(A) = {B : B ⊆ A}.

For example, consider A = {0, 1}. The power set of A is {{}, {0}, {1}, {0, 1}}. Equivalently, the power set of A can be viewed as the set of all pairs (2-tuples) whose elements are 0 and 1, namely {(0, 0), (0, 1), (1, 0), (1, 1)}, where each pair records which elements of A are included in the subset.

Disjoint Sets

Let A and B be sets. A and B are disjoint if A ∩ B = ∅.

Union of Sets

The union of A and B, written A ∪ B, is the set we get by combining all elements of A and B into a single set. That is,

A ∪ B = {x : x ∈ A or x ∈ B}.

For two finite sets A and B, we have the identity

|A ∪ B| = |A| + |B| - |A ∩ B|

We can conclude that

|A ∪ B| ≤ |A| + |B|

That is,

if |A ∩ B| = 0 then |A ∪ B| = |A| + |B|, and if A ⊆ B then |A| ≤ |B|.


Intersection of Sets

The intersection of sets A and B, written A ∩ B, is the set of elements that are in both A and B. That is,

A ∩ B = {x : x ∈ A and x ∈ B}.

Partition of a Set
A collection {Si} of nonempty sets forms a partition of a set S if

i. the sets are pairwise disjoint, that is, i ≠ j implies Si ∩ Sj = ∅, and

ii. their union is S, that is, S = ∪i Si.

In other words, {Si} forms a partition of S if each element of S appears in exactly one Si.

Difference of Sets
Let A and B be sets. The difference of A and B is

A - B = {x : x ∈ A and x ∉ B}.

For example, let A = {1, 2, 3} and B = {2, 4, 6, 8}. The set difference A - B = {1, 3}, while B - A = {4, 6, 8}.
Complement of a Set

All sets under consideration are subsets of some larger set U called the universal set. Given a universal set U, the complement of A, written A', is the set of all elements under consideration that are not in A.

Formally, let A be a subset of the universal set U. The complement of A in U is

A' = U - A

or

A' = {x : x ∈ U and x ∉ A}.

For any set A ⊆ U, we have the following laws:

i. A'' = A

ii. A ∩ A' = ∅

iii. A ∪ A' = U

Symmetric difference

Let A and B be sets. The symmetric difference of A and B is

A ⊕ B = {x : x ∈ A or x ∈ B but not both}

Therefore,

A ⊕ B = (A ∪ B) - (A ∩ B)

As an example, consider the two sets A = {1, 2, 3} and B = {2, 4, 6, 8}. The symmetric difference is A ⊕ B = {1, 3, 4, 6, 8}.
Sequences
A sequence of objects is a list of objects in some order. For example, the sequence 7,
21, 57 would be written as (7, 21, 57). In a set the order does not matter but in a
sequence it does.

Hence, (7, 21, 57) ≠ (57, 7, 21) as sequences, but {7, 21, 57} = {57, 7, 21} as sets.

Repetition is not permitted in a set but is permitted in a sequence. So, (7, 7, 21, 57) is a different sequence from (7, 21, 57), while as sets {7, 7, 21, 57} and {7, 21, 57} are identical.

Tuples
Finite sequences are often called tuples. For example,

(7, 21) 2-tuple or pair


(7, 21, 57) 3-tuple
(7, 21, ..., k ) k-tuple

An ordered pair of two elements a and b is denoted (a, b) and can be defined as (a, b) = {a, {a, b}}.

Cartesian Product or Cross Product


If A and B are two sets, the cross product of A and B, written A×B, is the set of all pairs whose first element is a member of the set A and whose second element is a member of the set B. Formally,

A×B = {(a, b) : a ∈ A, b ∈ B}.

For example, let A = {1, 2} and B = {x, y, z}. Then A×B = {(1, x), (1, y), (1, z), (2, x),
(2, y), (2, z)}.

When A and B are finite sets, the cardinality of their product is


|A×B| = |A| . |B|

n-tuples

The cartesian product of n sets A1, A2, ..., An is the set of n-tuples

A1 × A2 × ... × An = {(a1, ..., an) : ai ∈ Ai, i = 1, 2, ..., n}

whose cardinality is

|A1 × A2 × ... × An| = |A1| · |A2| ··· |An|

if all the sets are finite. We denote an n-fold cartesian product over a single set A by

A^n = A × A × ... × A

whose cardinality is

|A^n| = |A|^n
if A is finite.
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/MathAlgor/sets.html

Greedy Introduction

Greedy algorithms are simple and straightforward. They are shortsighted in their approach in the sense that they make decisions on the basis of information at hand without worrying about the effect these decisions may have in the future. They are easy to invent, easy to implement and most of the time quite efficient. Many problems cannot be solved correctly by the greedy approach. Greedy algorithms are used to solve optimization problems.

Greedy Approach
Greedy Algorithm works by making the decision that seems most promising at
any moment; it never reconsiders this decision, whatever situation may
arise later.

As an example consider the problem of "Making Change".

Coins available are:


dollars (100 cents)

quarters (25 cents)

dimes (10 cents)

nickels (5 cents)

pennies (1 cent)

Problem  Make change for a given amount using the smallest possible number of coins.

Informal Algorithm
Start with nothing. At every stage, without passing the given amount, add the largest coin available to the coins already chosen.

Formal Algorithm

Make change for n units using the least possible number of coins.

MAKE-CHANGE (n)
C ← {100, 25, 10, 5, 1}   // constant: the available denominations
S ← {}                    // multiset that will hold the solution
sum ← 0                   // sum of the items in the solution set
WHILE sum ≠ n
    x ← largest item in C such that sum + x ≤ n
    IF no such item THEN
        RETURN "No Solution"
    S ← S ∪ {x}
    sum ← sum + x
RETURN S
Example  Make change for $2.89 (289 cents); here n = 289, and the solution contains 2 dollars, 3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the largest coin without worrying about the consequences. Moreover, it never changes its mind in the sense that once a coin has been included in the solution set, it remains there.
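A minimal Python version of MAKE-CHANGE behaves the same way on the 289-cent example (the function name make_change and the default coin set are ours):

def make_change(n, coins=(100, 25, 10, 5, 1)):
    solution = []          # multiset of coins chosen so far
    total = 0
    while total != n:
        # largest coin that does not overshoot the target amount
        candidates = [c for c in coins if total + c <= n]
        if not candidates:
            return None    # "No solution" with this coin set
        x = max(candidates)
        solution.append(x)
        total += x
    return solution

print(make_change(289))   # [100, 100, 25, 25, 25, 10, 1, 1, 1, 1]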

Characteristics and Features of Problems Solved by Greedy Algorithms

To construct the solution in an optimal way, the algorithm maintains two sets: one contains the chosen items and the other contains the rejected items.

The greedy algorithm consists of four (4) functions.


1. A function that checks whether a chosen set of items provides a solution.

2. A function that checks the feasibility of a set.

3. The selection function, which tells which of the candidates is the most promising.

4. An objective function, which does not appear explicitly, but gives the value of a solution.

Structure Greedy Algorithm


Initially the set of chosen items is empty, i.e., the solution set is empty.

At each step:

• an item is added to the solution set using the selection function;
• IF the set would no longer be feasible, the item under consideration is rejected (and never considered again);
• ELSE IF the set is still feasible, THEN the current item is kept.

Definitions of feasibility
A feasible set (of candidates) is promising if it can be extended to produce not merely a solution, but an optimal solution to the problem. In particular, the empty set is always promising. Why? Because an optimal solution always exists.

Unlike dynamic programming, which solves the subproblems bottom-up, a greedy strategy usually progresses in a top-down fashion, making one greedy choice after another, reducing each problem to a smaller one.

Greedy-Choice Property

The "greedy-choice property" and "optimal substructure" are two ingredients in


the problem that lend to a greedy strategy.

Greedy-Choice Property

It says that a globally optimal solution can be arrived at by making a locally


optimal choice.

Knapsack Problem
Statement  A thief robbing a store can carry a maximum weight of W in a knapsack. There are n items; the i-th item weighs wi and is worth vi dollars. What items should the thief take?

There are two versions of the problem.


I. Fractional knapsack problem

The setup is the same, but the thief can take fractions of items, meaning that the items can be broken into smaller pieces so that the thief may decide to carry only a fraction xi of item i, where 0 ≤ xi ≤ 1.

Exhibits the greedy-choice property.

⇒ A greedy algorithm exists.

Exhibits the optimal substructure property.

II. 0-1 knapsack problem
The setup is the same, but the items may not be broken into smaller pieces, so the thief may decide either to take an item or to leave it (a binary choice), but may not take a fraction of an item.

Exhibits no greedy-choice property.

⇒ No greedy algorithm exists.

Exhibits the optimal substructure property.

⇒ Only a dynamic programming algorithm exists.


Dynamic-Programming Solution
to the 0-1 Knapsack Problem
Let i be the highest-numbered item in an optimal solution S for W pounds. Then S' = S - {i} is an optimal solution for W - wi pounds, and the value of the solution S is vi plus the value of the subproblem.

We can express this fact in the following formula: define c[i, w] to be the
solution for items 1,2, . . . , i and maximum weight w. Then

c[i, w] = 0                                            if i = 0 or w = 0
c[i, w] = c[i-1, w]                                    if wi > w
c[i, w] = max(vi + c[i-1, w-wi], c[i-1, w])            if i > 0 and w ≥ wi

This says that the value of the solution for i items either includes the i-th item, in which case it is vi plus a subproblem solution for (i-1) items and the weight excluding wi, or does not include the i-th item, in which case it is a subproblem's solution for (i-1) items and the same weight. That is, if the thief picks item i, the thief takes vi value and can then choose from items 1, 2, . . . , i-1 up to the weight limit w - wi, getting c[i-1, w-wi] additional value. On the other hand, if the thief decides not to take item i, the thief can choose from items 1, 2, . . . , i-1 up to the weight limit w, getting c[i-1, w] value. The better of these two choices should be made.
The above formula for c is similar to the LCS formula: boundary values are 0, and other values are computed from the input and "earlier" values of c. So the 0-1 knapsack algorithm is like the LCS-length algorithm given in the CLR book for finding a longest common subsequence of two sequences.

The algorithm takes as input the maximum weight W, the number of items n,
and the two sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . , wn>. It stores
the c[i, j] values in the table, that is, a two dimensional array, c[0 . . n, 0 . . w]
whose entries are computed in a row-major order. That is, the first row of c is
filled in from left to right, then the second row, and so on. At the end of the
computation, c[n, w] contains the maximum value that can be picked into the
knapsack.

Dynamic-0-1-knapsack (v, w, n, W)
for w = 0 to W
    do c[0, w] = 0
for i = 1 to n
    do c[i, 0] = 0
       for w = 1 to W
           do if wi ≤ w
                 then if vi + c[i-1, w-wi] > c[i-1, w]
                         then c[i, w] = vi + c[i-1, w-wi]
                         else c[i, w] = c[i-1, w]
                 else c[i, w] = c[i-1, w]
The set of items to take can be deduced from the table, starting at c[n, w] and tracing backwards where the optimal values came from. If c[i, w] = c[i-1, w], item i is not part of the solution, and we continue tracing with c[i-1, w]. Otherwise item i is part of the solution, and we continue tracing with c[i-1, w-wi].
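The recurrence and the traceback translate directly into Python; the sketch below (names are ours; v and w are 0-indexed lists, so item i corresponds to v[i-1], w[i-1]) returns both the optimal value and the chosen items:

def knapsack_01(v, w, W):
    n = len(v)
    # c[i][j] = best value achievable using items 1..i with capacity j
    c = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, W + 1):
            if w[i - 1] <= j:
                c[i][j] = max(v[i - 1] + c[i - 1][j - w[i - 1]], c[i - 1][j])
            else:
                c[i][j] = c[i - 1][j]
    # Trace back which items were taken.
    taken, j = [], W
    for i in range(n, 0, -1):
        if c[i][j] != c[i - 1][j]:
            taken.append(i)          # item i is part of the solution
            j -= w[i - 1]
    return c[n][W], sorted(taken)

print(knapsack_01([3, 4, 5], [2, 3, 4], 5))   # (7, [1, 2])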

Analysis
This dynamic-0-1-knapsack algorithm takes Θ(nW) time, broken up as follows:

Θ(nW) time to fill the c-table, which has (n+1)·(W+1) entries, each requiring Θ(1) time to compute. O(n) time to trace the solution, because the tracing process starts in row n of the table and moves up one row at each step.

Greedy Solution to the Fractional Knapsack Problem
There are n items in a store. For i =1,2, . . . , n, item i has weight wi > 0 and
worth vi > 0. Thief can carry a maximum weight of W pounds in a knapsack.
In this version of the problem the items can be broken into smaller pieces, so the thief may decide to carry only a fraction xi of object i, where 0 ≤ xi ≤ 1. Item i then contributes xiwi to the total weight in the knapsack, and xivi to the value of the load.

In symbols, the fractional knapsack problem can be stated as follows:

maximize Σ(i=1 to n) xivi  subject to the constraint  Σ(i=1 to n) xiwi ≤ W

It is clear that an optimal solution must fill the knapsack exactly, for otherwise we could add a fraction of one of the remaining objects and increase the value of the load. Thus in an optimal solution Σ(i=1 to n) xiwi = W.

Greedy-fractional-knapsack (w, v, W)
FOR i = 1 to n
    do x[i] = 0
weight = 0
while weight < W
    do i = best remaining item       // largest remaining ratio v[i]/w[i]
       IF weight + w[i] ≤ W
           then x[i] = 1
                weight = weight + w[i]
           else x[i] = (W - weight) / w[i]
                weight = W
return x
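A minimal Python sketch of the same greedy procedure, with the items considered in decreasing order of v[i]/w[i] (the function name is ours):

def fractional_knapsack(v, w, W):
    n = len(v)
    x = [0.0] * n
    # Consider items in decreasing order of value density v[i]/w[i].
    order = sorted(range(n), key=lambda i: v[i] / w[i], reverse=True)
    weight = 0.0
    for i in order:
        if weight >= W:
            break
        if weight + w[i] <= W:
            x[i] = 1.0                       # take the whole item
            weight += w[i]
        else:
            x[i] = (W - weight) / w[i]       # take a fraction and stop
            weight = W
    return x, sum(x[i] * v[i] for i in range(n))

print(fractional_knapsack([60, 100, 120], [10, 20, 30], 50))
# ([1.0, 1.0, 0.666...], 240.0)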

Analysis
If the items are already sorted into decreasing order of vi/wi, then the while-loop takes time in O(n); therefore, the total time including the sort is in O(n log n).

If we keep the items in a heap with the largest vi/wi at the root, then

creating the heap takes O(n) time, and

each iteration of the while-loop now takes O(log n) time (since the heap property must be restored after the removal of the root).

Although this data structure does not alter the worst case, it may be faster if only a small number of items are needed to fill the knapsack.

One variant of the 0-1 knapsack problem arises when the order of the items sorted by increasing weight is the same as their order sorted by decreasing value.

The optimal solution to this variant is to sort the items by value in decreasing order, then pick the most valuable item, which also has the least weight, provided its weight is less than the remaining weight that can be carried, and deduct the weight of the item just picked from the remaining capacity. The second item to pick is the most valuable item among those remaining. Keep following the same strategy until the thief cannot carry more items (due to weight).

Proof
One way to prove the correctness of the above algorithm is to prove the greedy-choice property and the optimal substructure property. It consists of two steps. First, prove that there exists an optimal solution that begins with the greedy choice given above. The second part proves that if A is an optimal solution to the original problem S, then A - a is also an optimal solution to the problem S - s, where a is the item the thief picked as the greedy choice and S - s is the subproblem after the first greedy choice has been made. The second part is easy to prove since the more valuable items have less weight.
Note that the item with the best ratio v'/w' can replace any other item: since w' < w it fits wherever the other item did, while it increases the value because v' > v. □

Theorem  The fractional knapsack problem has the greedy-choice property.

Proof  Let the ratio v'/w' be maximal. This supposition implies that v'/w' ≥ v/w for any pair (v, w), so v'w/w' ≥ v for any (v, w). Now suppose a solution does not contain the full weight w' of the best-ratio item. Then replacing an amount of any other item w with more of w' will improve the value. □


An Activity Selection Problem


An activity-selection problem is the problem of scheduling a resource among several competing activities.

Problem Statement
Given a set S of n activities, where si is the start time and fi the finish time of the i-th activity, find a maximum-size set of mutually compatible activities.

Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj) do not overlap, that is, i and j are compatible if si ≥ fj or sj ≥ fi.

Greedy Algorithm for the Selection Problem

I. Sort the input activities by increasing finishing time:
   f1 ≤ f2 ≤ . . . ≤ fn
II. Call GREEDY-ACTIVITY-SELECTOR (s, f)

1. n = length[s]
2. A = {1}
3. j = 1
4. for i = 2 to n
5.     do if si ≥ fj
6.           then A = A ∪ {i}
7.                j = i
8. return set A

Operation of the algorithm


Suppose 11 activities are given, S = {p, q, r, s, t, u, v, w, x, y, z}, with start and finish times for the proposed activities of (1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13) and (12, 14).

A = {p}             Initialization at line 2
A = {p, s}          line 6 - 1st iteration of FOR-loop
A = {p, s, w}       line 6 - 2nd iteration of FOR-loop
A = {p, s, w, z}    line 6 - 3rd iteration of FOR-loop
Out of the FOR-loop, return A = {p, s, w, z}
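The trace above can be reproduced with a short Python sketch (the activity names and the function name are ours; the activities are assumed to be already sorted by finish time):

def greedy_activity_selector(activities):
    # activities: list of (name, start, finish), sorted by finish time
    selected = [activities[0]]
    last_finish = activities[0][2]
    for name, s, f in activities[1:]:
        if s >= last_finish:          # compatible with the last chosen activity
            selected.append((name, s, f))
            last_finish = f
    return [name for name, _, _ in selected]

acts = [('p', 1, 4), ('q', 3, 5), ('r', 0, 6), ('s', 5, 7), ('t', 3, 8),
        ('u', 5, 9), ('v', 6, 10), ('w', 8, 11), ('x', 8, 12), ('y', 2, 13),
        ('z', 12, 14)]
print(greedy_activity_selector(acts))   # ['p', 's', 'w', 'z']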

Analysis
Part I requires O(n lg n) time (use merge sort or heapsort).
Part II requires Θ(n) time, assuming that the activities were already sorted in part I by their finish times.

Correctness
Note that greedy algorithms do not always produce optimal solutions, but GREEDY-ACTIVITY-SELECTOR does.

Theorem  Algorithm GREEDY-ACTIVITY-SELECTOR produces a solution of maximum size for the activity-selection problem.

Proof Idea  Show that the activity-selection problem satisfies

I. the greedy-choice property, and

II. the optimal substructure property.

Proof
I. Let S = {1, 2, . . . , n} be the set of activities. Since the activities are in order by finish time, activity 1 has the earliest finish time.

Suppose A ⊆ S is an optimal solution and let the activities in A be ordered by finish time. Suppose the first activity in A is k.

If k = 1, then A begins with the greedy choice and we are done (or, to be precise, there is nothing to prove here).

If k ≠ 1, we want to show that there is another optimal solution B that begins with the greedy choice, activity 1.

Let B = (A - {k}) ∪ {1}. Because f1 ≤ fk, the activities in B are disjoint, and since B has the same number of activities as A, i.e., |A| = |B|, B is also optimal.

II. Once the greedy choice is made, the problem reduces to finding an optimal solution for the remaining subproblem. If A is an optimal solution to the original problem S, then A' = A - {1} is an optimal solution to the activity-selection problem S' = {i ∈ S : si ≥ f1}.

Why? Because if we could find a solution B' to S' with more activities than A', adding 1 to B' would yield a solution B to S with more activities than A, thereby contradicting the optimality of A. □

As an example, consider the problem of scheduling a set of activities among lecture halls: schedule all the activities using as few lecture halls as possible.
In order to determine which activity should use which lecture hall, the
algorithm uses the GREEDY-ACTIVITY-SELECTOR to calculate the
activities in the first lecture hall. If there are some activities yet to be scheduled,
a new lecture hall is selected and GREEDY-ACTIVITY-SELECTOR is called
again. This continues until all activities have been scheduled.

LECTURE-HALL-ASSIGNMENT (s, f)
n = length[s]
for i = 1 to n
    do HALL[i] = NIL
k = 1
while (NOT empty(s))
    do HALL[k] = GREEDY-ACTIVITY-SELECTOR (s, f, n)
       k = k + 1
return HALL
The following changes can be made in GREEDY-ACTIVITY-SELECTOR (s, f) (see CLR): activities already assigned to a hall are marked "-" and skipped, so that each call returns the activities for one hall.

GREEDY-ACTIVITY-SELECTOR (s, f, n)
j = first unmarked activity of s
A = {j}
for i = j + 1 to n
    do if s[i] ≠ "-"
          then if s[i] ≥ f[j]
                  then A = A ∪ {i}
                       s[i] = "-"
                       j = i
return A

Correctness
The algorithm can be shown to be correct and optimal. As a contradiction, assume the number of lecture halls is not optimal, that is, the algorithm allocates more halls than necessary. Then there exists a set of activities B which have been wrongly allocated. An activity b belonging to B which has been allocated to hall H[i] should have optimally been allocated to H[k]. This implies that the activities for lecture hall H[k] have not been allocated optimally, as GREEDY-ACTIVITY-SELECTOR produces the optimal set of activities for a particular lecture hall.

Analysis
In the worst case, the number of lecture halls required is n. GREEDY-ACTIVITY-SELECTOR runs in Θ(n) time, so the running time of this algorithm is O(n^2).

Two important Observations


Choosing the activity of least duration will not always produce an optimal solution. For example, for the set of activities {(3, 5), (6, 8), (1, 4), (4, 7), (7, 10)}, either (3, 5) or (6, 8) will be picked first, which will prevent the optimal solution {(1, 4), (4, 7), (7, 10)} from being found.

Choosing the activity with the least overlap will not always produce an optimal solution either. For example, for the set of activities {(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10), (0, 3), (0, 2), (7, 10), (8, 10)}, the one with the least overlap with other activities is (4, 6), so it will be picked first. But that would prevent the optimal solution {(0, 1), (1, 5), (5, 9), (9, 10)} from being found.

Dynamic-Programming Algorithm for the Activity-Selection Problem

Huffman Codes

Huffman code is a technique for compressing data. Huffman's greedy algorithm looks at the frequency of occurrence of each character and represents each character as a binary string in an optimal way.

Example
Suppose we have a file consisting of 100,000 characters that we want to compress. The characters in the file occur with the following frequencies.

          a       b       c       d       e      f
Frequency 45,000  13,000  12,000  16,000  9,000  5,000
Consider the problem of designing a "binary character code" in which each
character is represented by a unique binary string.

Fixed Length Code


A fixed-length code needs 3 bits to represent six (6) characters.

                   a       b       c       d       e      f
Frequency          45,000  13,000  12,000  16,000  9,000  5,000
Fixed-length code  000     001     010     011     100    101

This method requires 300,000 bits to code the entire file.

How do we get 300,000?

The total number of characters is 45,000 + 13,000 + 12,000 + 16,000 + 9,000 + 5,000 = 100,000.

Each character is assigned a 3-bit codeword => 3 * 100,000 = 300,000 bits.

Conclusion
Fixed-length code requires 300,000 bits while variable code
requires 224,000 bits.

=> Saving of approximately 25%.

Prefix Codes
A prefix code is one in which no codeword is a prefix of another codeword. The reason prefix codes are desirable is that they simplify encoding (compression) and decoding.

Can we do better?

A variable-length code can do better by giving frequent characters short codewords and infrequent characters long codewords.

                      a       b       c       d       e      f
Frequency             45,000  13,000  12,000  16,000  9,000  5,000
Variable-length code  0       101     100     111     1101   1100

There are 45,000 occurrences of character 'a':

each 'a' is assigned a 1-bit codeword,
1 * 45,000 = 45,000 bits.

There are 13,000 + 12,000 + 16,000 = 41,000 occurrences of characters b, c, d:

each is assigned a 3-bit codeword,
3 * 41,000 = 123,000 bits.

There are 9,000 + 5,000 = 14,000 occurrences of characters e, f:

each is assigned a 4-bit codeword,
4 * 14,000 = 56,000 bits.

This implies that the total number of bits is: 45,000 + 123,000 + 56,000 = 224,000 bits.
Encoding: concatenate the codewords representing each character of the file.

String Encoding
TEA 10 00 010
SEA 011 00 010
TEN 10 00 110

Example  Using the variable-length code table above, we code the 3-character file abc as:

a b c

0 101 100 => 0.101.100 = 0101100

Decoding
Since no codeword is a prefix of another, the codeword that begins an encoded file is unambiguous. To decode (translate back to the original characters), repeatedly identify the initial codeword, translate it, and remove it from the encoded file. For example, using the "variable-length code" table, the string 001011101 parses uniquely as 0.0.101.1101, which decodes to aabe.
The "decoding process" can be represented by a binary tree whose leaves are the characters. We interpret the binary codeword for a character as the path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child". Note that an optimal code for a file is always represented by a full binary tree.

Theorem  A binary tree that is not full cannot correspond to an optimal prefix code.

Proof  Let T be a binary tree corresponding to a prefix code such that T is not full. Then there must exist an internal node, say x, that has only one child, y. Construct another binary tree T' which has the same leaves as T and the same depths, except for the leaves in the subtree rooted at y in T: these leaves will have smaller depth in T', which implies T cannot correspond to an optimal prefix code.
To obtain T', simply merge x and y into a single node z, where z is a child of the parent of x (if a parent exists) and z is a parent to any children of y. Then T' has the desired properties: it corresponds to a code on the same alphabet, and the leaves that are in the subtree rooted at y in T have depth in T' strictly less (by one) than their depth in T.
This completes the proof. □

                      a       b       c       d       e      f
Frequency             45,000  13,000  12,000  16,000  9,000  5,000
Fixed-length code     000     001     010     011     100    101
Variable-length code  0       101     100     111     1101   1100


Fixed-length code is not optimal since binary tree is not full.

Figure

Optimal prefix code because tree is full binary

Figure

From now on, we consider only full binary trees.

If C is the alphabet from which the characters are drawn, then the tree for an optimal prefix code has exactly |C| leaves (one for each letter) and exactly |C| - 1 internal nodes. Given a tree T corresponding to a prefix code, we can compute the number of bits required to encode a file. For each character c in C, let f(c) be the frequency of c and let dT(c) denote the depth of c's leaf. Note that dT(c) is also the length of c's codeword. The number of bits to encode the file is

B(T) = Σ(c in C) f(c) dT(c)

which we define as the cost of the tree T.

For example, the cost of the above tree is

B(T) = Σ(c in C) f(c) dT(c)
     = 45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4
     = 224

Therefore, the cost of the tree corresponding to the optimal prefix code is 224
(224*1000 = 224000).

Constructing a Huffman code


A greedy algorithm that constructs an optimal prefix code is called a Huffman code. The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. It begins with a set of |C| leaves and performs |C| - 1 "merging" operations to create the final tree.

Data structure used: a priority queue Q

HUFFMAN (C)
n = |C|
Q = C
for i = 1 to n-1
    do z = Allocate-Node()
       x = left[z] = EXTRACT-MIN(Q)
       y = right[z] = EXTRACT-MIN(Q)
       f[z] = f[x] + f[y]
       INSERT(Q, z)
return EXTRACT-MIN(Q)

Analysis
Q is implemented as a binary heap.

Line 2 (Q = C) can be performed using BUILD-HEAP (p. 145, CLR) in O(n) time.

The FOR loop is executed n - 1 times, and since each heap operation requires O(lg n) time,

=> the FOR loop contributes (n - 1) · O(lg n)

=> O(n lg n)

Thus the total running time of HUFFMAN on a set of n characters is O(n lg n).
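A compact Python sketch of the same procedure, using the standard library heapq as the priority queue (the function name and the tuple-based tree representation are ours):

import heapq

def huffman(freq):
    # freq: dict mapping character -> frequency.
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # character (leaf) or a pair (left_subtree, right_subtree).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)    # two least-frequent trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")    # left edge labelled 0
            walk(tree[1], prefix + "1")    # right edge labelled 1
        else:
            codes[tree] = prefix
    walk(heap[0][2])
    return codes

print(huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))
# 'a' receives a 1-bit codeword; 'e' and 'f' receive 4-bit codewords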

Operation of the Algorithm

Consider an optimal Huffman code for the following set of frequencies:

a:1  b:1  c:2  d:3  e:5  f:8  g:13  h:21

Note that the frequencies are based on the Fibonacci numbers.

Since there are 8 letters in the alphabet, the initial queue size is n = 8, and 7 merge steps are required to build the tree. The final tree represents the optimal prefix code.

Figure

The codeword for a letter is the sequence of the edge labels on the path from
the root to the letter. Thus, the optimal Huffman code is as follows:

h: 1

g: 1 0
f: 1 1 0
e: 1 1 1 0
d: 1 1 1 1 0

c: 1 1 1 1 1 0
b: 1 1 1 1 1 1 0

a: 1 1 1 1 1 1 1

As we can see, the tree is one long limb with leaves hanging off. This is true for Fibonacci weights in general, because the Fibonacci recurrence

F(i+1) = F(i) + F(i-1)   implies that   Σ(j=0 to i) F(j) = F(i+2) - 1.

To prove this, write F(j) as F(j+1) - F(j-1) and sum from 0 to i, noting that F(-1) = 0.
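Written out, the telescoping sum is (using the convention F(-1) = 0 and F(0) = 1, so the frequencies above are F(0), F(1), . . . , F(7)):

\sum_{j=0}^{i} F_j = \sum_{j=0}^{i} \left( F_{j+1} - F_{j-1} \right)
                   = \left( F_{i+1} + F_i \right) - \left( F_0 + F_{-1} \right)
                   = F_{i+2} - 1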
Correctness of Huffman Code Algorithm
Proof Idea

Step 1: Show that this problem satisfies the greedy-choice property, that is, if a greedy choice is made by Huffman's algorithm, an optimal solution remains possible.

Step 2: Show that this problem has an optimal substructure property, that is, an optimal solution to Huffman's algorithm contains optimal solutions to subproblems.

Step 3: Conclude correctness of Huffman's algorithm using steps 1 and 2.

Lemma - Greedy-Choice Property  Let C be an alphabet in which each character c has frequency f[c]. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.

Proof Idea
Take the tree T representing an optimal prefix code and transform T into a tree T' representing another optimal prefix code such that the characters x and y appear as sibling leaves of maximum depth in T'. If we can do this, then their codewords will have the same length and differ only in the last bit.

Figures
Proof
Let characters b and c be sibling leaves of maximum depth in tree T. Without loss of generality, assume that f[b] ≤ f[c] and f[x] ≤ f[y]. Since f[x] and f[y] are the two lowest leaf frequencies, in order, and f[b] and f[c] are two arbitrary frequencies, in order, we have f[x] ≤ f[b] and f[y] ≤ f[c]. As shown in the figure, exchange the positions of the leaves to get first T' (swapping x and b) and then T'' (swapping y and c).
By the formula B(T) = Σ(c in C) f(c) dT(c), the difference in cost between T and T' is

B(T) - B(T') = f[x]dT(x) + f[b]dT(b) - [f[x]dT'(x) + f[b]dT'(b)]
             = f[x]dT(x) + f[b]dT(b) - [f[x]dT(b) + f[b]dT(x)]
             = (f[b] - f[x])(dT(b) - dT(x))
             = (non-negative)(non-negative)
             ≥ 0

Two Important Points


The reason f[b] - f[x] is non-negative is that x is a minimum-frequency leaf in tree T, and the reason dT(b) - dT(x) is non-negative is that b is a leaf of maximum depth in T.
Similarly, exchanging y and c does not increase the cost, which implies that B(T') - B(T'') ≥ 0. These facts in turn imply that B(T'') ≤ B(T), and since T is optimal by supposition, B(T) ≤ B(T''), which implies B(T'') = B(T). Therefore, T'' is optimal, and in T'' the characters x and y are sibling leaves of maximum depth, from which the greedy-choice property follows. This completes the proof. □
Lemma - Optimal Substructure Property  Let T be a full binary tree representing an optimal prefix code over an alphabet C, where frequency f[c] is defined for each character c belonging to the set C. Consider any two characters x and y that appear as sibling leaves in the tree T, and let z be their parent. Then, considering z as a character with frequency f[z] = f[x] + f[y], the tree T' = T - {x, y} represents an optimal prefix code for the alphabet C' = C - {x, y} ∪ {z}.

Proof Idea

Figure

Proof
We show that the cost B(T) of tree T can be expressed in terms of the cost B(T') of tree T'. For each c belonging to C - {x, y}, we have dT(c) = dT'(c), so

Σ(c in C) f[c]dT(c) = Σ(c in C-{x,y}) f[c]dT'(c) + f[x](dT'(z) + 1) + f[y](dT'(z) + 1)
                    = Σ(c in C-{x,y}) f[c]dT'(c) + (f[x] + f[y])dT'(z) + f[x] + f[y]

and therefore B(T) = B(T') + f[x] + f[y].

If T' is a non-optimal prefix code for C', then there exists a tree T'' whose leaves are the characters belonging to C' such that B(T'') < B(T'). Now, if x and y are added to T'' as children of z, then we get a prefix code for the alphabet C with cost B(T'') + f[x] + f[y] < B(T), contradicting the optimality of T. This implies that the tree T' must be optimal for the alphabet C'. □

Theorem  Procedure HUFFMAN produces an optimal prefix code.

Proof
Let S be the set of integers n ≥ 2 for which the Huffman procedure produces a tree representing an optimal prefix code for any frequencies f and alphabet C with |C| = n.
If C = {x, y}, then Huffman produces one of the following optimal trees.

figure

This clearly shows that 2 is a member of S. Next, assume that n belongs to S and show that (n+1) also belongs to S.
Let C be an alphabet with |C| = n + 1. By the lemma 'greedy-choice property', there exists an optimal code tree T for the alphabet C in which, without loss of generality, if x and y are characters with minimal frequencies, then
a. x and y are at maximal depth in tree T, and
b. x and y have a common parent z.
Suppose that T' = T - {x, y} and C' = C - {x, y} ∪ {z}; then by the lemma 'optimal substructure property' (step 2), tree T' is an optimal code tree for C'. Since |C'| = n and n belongs to S, the Huffman procedure produces an optimal code tree T* for C'. Now let T** be the tree obtained from T* by attaching x and y as leaves to z.
Without loss of generality, T** is the tree constructed for C by the Huffman procedure. Now suppose Huffman selects a and b from the alphabet C in its first step so that f[a] = f[x] and f[b] = f[y]. Then the tree constructed by Huffman can be altered, as in the proof of the lemma 'greedy-choice property', to give an equivalent tree with x and y as siblings of maximum depth. Since T' and T* are both optimal for C', we have B(T') = B(T*), and also B(T**) = B(T). Why? Because

B(T**) = B(T*) - f[z]dT*(z) + (f[x] + f[y])(dT*(z) + 1)
       = B(T*) + f[x] + f[y]

Since the tree T is optimal for the alphabet C, so is T**, and T** is the tree constructed by the Huffman procedure.
And this completes the proof. □

Theorem  The total cost of a tree for a code can be computed as the sum, over all internal nodes, of the combined frequencies of the two children of the node.

Proof
Let T be a full binary tree with n leaves. Apply the induction hypothesis on the number of leaves in T. When n = 2 (the case n = 1 is trivially true), there are two leaves x and y (say) with the same parent z, and the cost of T is

B(T) = f[x]dT(x) + f[y]dT(y)
     = f[x] + f[y]                      since dT(x) = dT(y) = 1
     = f[child1 of z] + f[child2 of z].
Thus, the statement of the theorem is true for the base case. Now suppose n > 2, and also suppose that the theorem is true for trees with n-1 leaves.
Let c1 and c2 be two sibling leaves in T with the same parent p. Letting T' be the tree obtained by deleting c1 and c2 (so that p becomes a leaf with frequency f[c1] + f[c2]), we know by induction that

B(T') = Σ(leaves l' in T') f[l']dT'(l')
      = Σ(internal nodes i' in T') (f[child1 of i'] + f[child2 of i'])

Using this information, we calculate the cost of T:

B(T) = Σ(leaves l in T) f[l]dT(l)
     = Σ(l ≠ c1, c2) f[l]dT(l) + f[c1](dT(c1) - 1) + f[c2](dT(c2) - 1) + f[c1] + f[c2]
     = Σ(leaves l' in T') f[l']dT'(l') + f[c1] + f[c2]
     = Σ(internal nodes i' in T') (f[child1 of i'] + f[child2 of i']) + f[c1] + f[c2]
     = Σ(internal nodes i in T) (f[child1 of i] + f[child2 of i])

Thus the statement is true. And this completes the proof.

The question is whether Huffman's algorithm can be generalized to handle ternary codewords, that is, codewords using the symbols 0, 1 and 2. Restated: does some generalized version of Huffman's algorithm yield optimal ternary codes? Basically, the algorithm is similar to the binary-code example given in the CLR text book. That is, pick the three nodes (not two) which have the least frequency and form a new node with frequency equal to the sum of these three frequencies. Then repeat the procedure. However, when the number of nodes is an even number, a full ternary tree is not possible, so take care of this by inserting a null node with zero frequency.
Correctness
Proof is immediate from the greedy choice property and an optimal
substructure property. In other words, the proof is similar to the correctness
proof of Huffman's algorithm in the CLR.

Spanning Tree and Minimum Spanning Tree

Spanning Trees
A spanning tree of a graph is any tree that includes every vertex in the graph. A little more formally, a spanning tree of a graph G is a subgraph of G that is a tree and contains all the vertices of G. An edge of a spanning tree is called a branch; an edge in the graph that is not in the spanning tree is called a chord. We construct spanning trees whenever we want to find a simple, cheap and yet efficient way to connect a set of terminals (computers, cities, factories, etc.). Spanning trees are important for the following reasons.

Spanning trees construct a sparse subgraph that tells a lot about the original graph.

Spanning trees are very important in designing efficient routing algorithms.

Some hard problems (e.g., the Steiner tree problem and the traveling salesman problem) can be solved approximately by using spanning trees.

Spanning trees have wide applications in many areas, such as network design.

Greedy Spanning Tree Algorithm


One of the most elegant spanning tree algorithms that I know of is as follows:

Examine the edges of the graph in any arbitrary sequence.

Decide whether each edge will be included in the spanning tree.

Note that each time a step of the algorithm is performed, one edge is examined. If there is only a finite number of edges in the graph, the algorithm must halt after a finite number of steps. Thus, the time complexity of this algorithm is clearly O(n), where n is the number of edges in the graph.

Some important facts about spanning trees are as follows:

Any two vertices in a tree are connected by a unique path.

Let T be a spanning tree of a graph G, and let e be an edge of G not in T. Then T + e contains a unique cycle.

Lemma  The number of spanning trees in the complete graph Kn is n^(n-2).

Greediness It is easy to see that this algorithm has the property that
each edge is examined at most once. Algorithms, like this one, which examine
each entity at most once and decide its fate once and for all during that
examination are called greedy algorithms. The obvious advantage of greedy
approach is that we do not have to spend time reexamining entities.

Consider the problem of finding a spanning tree with the smallest possible
weight or the largest possible weight, respectively called a minimum spanning
tree and a maximum spanning tree. It is easy to see that if a graph possesses a
spanning tree, it must have a minimum spanning tree and also a maximum
spanning tree. These spanning trees can be constructed by performing the
spanning tree algorithm (e.g., above mentioned algorithm) with an appropriate
ordering of the edges.
Minimum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of nondecreasing weight (smallest first, largest last). If two or more edges have the same weight, order them arbitrarily.

Maximum Spanning Tree Algorithm


Perform the spanning tree algorithm (above) by examining the edges in order of nonincreasing weight (largest first, smallest last). If two or more edges have the same weight, order them arbitrarily.

Minimum Spanning Trees


A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edge-weight sum is minimum. In other words, an MST is a tree formed from a subset of the edges in a given undirected graph, with two properties:

it spans the graph, i.e., it includes every vertex of the graph, and

it is minimum, i.e., the total weight of all the edges is as low as possible.

Let G = (V, E) be a connected, undirected graph, where V is a set of vertices (nodes) and E is the set of edges. Each edge has a given nonnegative length.

Problem Find a subset T of the edges of G such that all the vertices
remain connected when only the edges T are used, and the sum of the lengths
of the edges in T is as small as possible.

Let G' = (V, T) be the partial graph formed by the vertices of G and the edges in T. [Note: a connected graph with n vertices must have at least n-1 edges, and more than n-1 edges implies at least one cycle.] So n-1 is the minimum number of edges in T. Hence if G' is connected and T has more than n-1 edges, we can remove at least one of these edges without disconnecting G' (choose an edge that is part of a cycle). This will decrease the total length of the edges in T, so the new solution is preferable to the old one.

Thus, a T with n vertices and more than n-1 edges cannot be an optimal solution. It follows that T must have exactly n-1 edges, and since G' is connected, it must be a tree. The graph G' is called a minimum spanning tree (MST).

Kruskal's Algorithm

In Kruskal's algorithm the selection function chooses edges in increasing order of length without worrying too much about their connection to previously chosen edges, except that it never forms a cycle. The result is a forest of trees that grows until all the trees in the forest (all the components) merge into a single tree.
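A minimal Python sketch of Kruskal's algorithm, using a simple union-find structure to detect cycles (all names are ours):

def kruskal(n, edges):
    # n: number of vertices labelled 0..n-1; edges: list of (weight, u, v)
    parent = list(range(n))
    def find(x):                        # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):       # edges in nondecreasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # accepting the edge creates no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))   # [(0, 1, 1), (1, 3, 2), (1, 2, 3)]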

Prim's Algorithm

This algorithm was first proposed by Jarnik, but is typically attributed to Prim. It starts from an arbitrary vertex (root) and at each stage adds a new branch (edge) to the tree already constructed; the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy in the sense that at each step the partial spanning tree is augmented with an edge that is the smallest among all possible adjacent edges.

MST-PRIM

Input: A weighted, undirected graph G=(V, E, w)


Output: A minimum spanning tree T.

T = {}
Let r be an arbitrarily chosen vertex from V.
U = {r}
WHILE |U| < n
    DO find u in U and v in V - U such that the edge (u, v) is a smallest edge between U and V - U
       T = T ∪ {(u, v)}
       U = U ∪ {v}
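A straightforward Python sketch of MST-PRIM on an adjacency-list graph, using a binary heap to find the smallest edge leaving the tree (all names are ours):

import heapq

def prim(adj, r=0):
    # adj: adjacency list, adj[u] = list of (weight, v); r: arbitrary root
    n = len(adj)
    in_tree = [False] * n
    in_tree[r] = True
    heap = [(w, r, v) for w, v in adj[r]]
    heapq.heapify(heap)
    tree = []
    while heap and len(tree) < n - 1:
        w, u, v = heapq.heappop(heap)   # smallest edge between U and V-U
        if in_tree[v]:
            continue
        in_tree[v] = True
        tree.append((u, v, w))
        for w2, x in adj[v]:
            if not in_tree[x]:
                heapq.heappush(heap, (w2, v, x))
    return tree

# Undirected 4-vertex example; each edge appears in both adjacency lists.
adj = [[(1, 1), (4, 2)], [(1, 0), (3, 2), (2, 3)],
       [(4, 0), (3, 1), (5, 3)], [(2, 1), (5, 2)]]
print(prim(adj))   # [(0, 1, 1), (1, 3, 2), (1, 2, 3)]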

Analysis
The algorithm spends most of its time finding the smallest edge, so the running time basically depends on how we search for this edge.

Straightforward method
Just find the smallest edge by searching the adjacency lists of the vertices in V. In this case, each iteration costs O(m) time, yielding a total running time of O(mn).

Binary heap
By using binary heaps, the algorithm runs in O(m log n).

Fibonacci heap
By using Fibonacci heaps, the algorithm runs in O(m + n log n) time.

Dijkstra's Algorithm (Shortest Path)

Consider a directed graph G = (V, E).

Problem  Determine the length of the shortest path from the source to each of the other nodes of the graph. This problem can be solved by a greedy algorithm often called Dijkstra's algorithm.

The algorithm maintains two sets of vertices, S and C. At every stage the set S contains those vertices that have already been selected and the set C contains all the other vertices. Hence we have the invariant property V = S ∪ C. When the algorithm starts, S contains only the source vertex, and when the algorithm halts, S contains all the vertices of the graph and the problem is solved. At each step the algorithm chooses the vertex in C whose distance to the source is least and adds it to S.
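A short Python sketch of Dijkstra's algorithm, with a binary heap holding the candidate vertices of C (all names are ours; the graph is an adjacency list with non-negative edge lengths):

import heapq

def dijkstra(adj, source=0):
    # adj[u] = list of (length, v); returns the shortest distance to every node
    n = len(adj)
    dist = [float('inf')] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)      # vertex of C closest to the source
        if d > dist[u]:
            continue                    # stale entry; u was already settled
        for length, v in adj[u]:
            if d + length < dist[v]:    # shorter path to v found through u
                dist[v] = d + length
                heapq.heappush(heap, (dist[v], v))
    return dist

adj = [[(7, 1), (9, 2), (14, 3)], [(10, 2), (15, 4)],
       [(2, 3), (11, 4)], [(9, 4)], []]
print(dijkstra(adj))   # [0, 7, 9, 11, 20]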

Divide-and-Conquer Algorithm

Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing the problem into smaller subproblems, hoping that the solutions of the subproblems are easier to find, and then composing the partial solutions into the solution of the original problem.
A little more formally, the divide-and-conquer paradigm consists of the following major phases:

Breaking the problem into several sub-problems that are similar to the original problem but smaller in size,

Solving the sub-problems recursively (successively and independently), and then

Combining these solutions to the subproblems to create a solution to the original problem.

Binary Search (simplest application of divide-and-conquer)


Binary search is an extremely well-known instance of the divide-and-conquer paradigm. Given an ordered array of n elements, the basic idea of binary search is that for a given element we "probe" the middle element of the array. We continue in either the lower or upper segment of the array, depending on the outcome of the probe, until we reach the required (given) element.

Problem  Let A[1 . . n] be an array in non-decreasing sorted order; that is, A[i] ≤ A[j] whenever 1 ≤ i ≤ j ≤ n. Let 'q' be the query point. The problem consists of finding 'q' in the array A. If q is not in A, then find the position where 'q' might be inserted.

Formally, find the index i such that 1 ≤ i ≤ n+1 and A[i-1] < q ≤ A[i].

Sequential Search
Look sequentially at each element of A until either we reach at the end of an
array A or find an item no smaller than 'q'.

Sequential search for 'q' in array A


for i = 1 to n do
if A [i] ≥ q then
return index i
return n + 1

Analysis
This algorithm clearly takes Θ(r) time, where r is the index returned. This is Ω(n) in the worst case and O(1) in the best case.
If the elements of the array A are distinct and the query point q is indeed in the array, then the loop executes (n + 1)/2 times on average. On average (as well as in the worst case), sequential search therefore takes Θ(n) time.

Binary Search
Look for 'q' either in the first half or in the second half of the array A. Compare 'q' to the element in the middle of the array; let k = ⌊n/2⌋. If q ≤ A[k], then search in A[1 . . k]; otherwise search A[k+1 . . n] for 'q'. Binary search for q in the subarray A[i . . j] proceeds with the promise that

A[i-1] < q ≤ A[j]

If i = j then
    return i (index)
k = ⌊(i + j)/2⌋
if q ≤ A[k]
    then return Binary Search [A[i . . k], q]
    else return Binary Search [A[k+1 . . j], q]

Analysis
Binary search can be accomplished in logarithmic time in the worst case, i.e., T(n) = Θ(log n). This version of binary search also takes logarithmic time in the best case.

Iterative Version of Binary Search

Iterative binary search for q in array A[1 . . n]:

if q > A[n]
    then return n + 1
i = 1
j = n
while i < j do
    k = ⌊(i + j)/2⌋
    if q ≤ A[k]
        then j = k
        else i = k + 1
return i (the index)
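The iterative version translates directly into Python (the function name is ours; 0-based indexing is used, so the function returns the smallest index i with A[i] ≥ q, or len(A) if there is none):

def binary_search(A, q):
    # Mirrors the iterative pseudocode above with 0-based indexing.
    if not A or q > A[-1]:
        return len(A)
    i, j = 0, len(A) - 1
    while i < j:
        k = (i + j) // 2
        if q <= A[k]:
            j = k            # the answer lies in A[i..k]
        else:
            i = k + 1        # the answer lies in A[k+1..j]
    return i

A = [2, 5, 7, 11, 13, 17]
print(binary_search(A, 11))   # 3
print(binary_search(A, 12))   # 4 (12 would be inserted before 13)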

Analysis
The analysis of the iterative algorithm is identical to that of its recursive counterpart.
Dynamic Programming Algorithms

Dynamic programming is a stage-wise search method suitable for optimization problems whose solutions may be viewed as the result of a sequence of decisions. The most attractive property of this strategy is that during the search for a solution it avoids full enumeration by pruning early partial decision solutions that cannot possibly lead to an optimal solution. In many practical situations, this strategy hits the optimal solution in a polynomial number of decision steps. However, in the worst case, such a strategy may end up performing full enumeration.

Dynamic programming takes advantage of this duplication and arranges to solve each subproblem only once, saving the solution (in a table or elsewhere) for later use. The underlying idea of dynamic programming is: avoid calculating the same thing twice, usually by keeping a table of known results of subproblems. Unlike divide-and-conquer, which solves the subproblems top-down, dynamic programming is a bottom-up technique.

Bottom-up means
i. Start with the smallest subproblems.

ii. Combining their solutions, obtain the solutions to subproblems of increasing size,

iii. until we arrive at the solution of the original problem.

The Principle of Optimality


Dynamic programming relies on a principle of optimality. This principle states that in an optimal sequence of decisions or choices, each subsequence must also be optimal. For example, in the matrix-chain multiplication problem, not only is the value we are interested in optimal, but all the other entries in the table also represent optimal values for their subproblems.

The principle can be restated as follows: the optimal solution to a problem is a combination of optimal solutions to some of its subproblems.

The difficulty in turning the principle of optimality into an algorithm is that it is not usually obvious which subproblems are relevant to the problem under consideration.

Dynamic-Programming Solution
to the 0-1 Knapsack Problem

Problem Statement  A thief robbing a store can carry a maximum weight of W in a knapsack. There are n items; the i-th item weighs wi and is worth vi dollars. What items should the thief take?

There are two versions of the problem:

Fractional knapsack problem  The setup is the same,
but the thief can take fractions of items, meaning that
the items can be broken into smaller pieces so that the
thief may decide to carry only a fraction xi of item
i, where 0 ≤ xi ≤ 1.

0-1 knapsack problem  The setup is the same, but
the items may not be broken into smaller pieces, so the
thief may decide either to take an item or to leave it
(binary choice), but may not take a fraction of an
item.

Fractional knapsack problem
Exhibits the greedy-choice property. ⇒ A greedy algorithm exists.
Exhibits the optimal-substructure property.

0-1 knapsack problem
Does not exhibit the greedy-choice property. ⇒ No greedy algorithm exists.
Exhibits the optimal-substructure property.
⇒ Only a dynamic programming algorithm exists.

Dynamic-Programming Solution to the 0-1 Knapsack


Problem

Let i be the highest-numbered item in an optimal solution S for weight W.
Then S' = S - {i} is an optimal solution for weight W - wi, and the value of
solution S is vi plus the value of the subproblem solution S'.

We can express this fact in the following formula: define c[i, w] to be the value of the
solution for items 1, 2, . . . , i and maximum weight w. Then

c[i, w] = 0                                        if i = 0 or w = 0
c[i, w] = c[i-1, w]                                if i > 0 and wi > w
c[i, w] = max( vi + c[i-1, w-wi], c[i-1, w] )      if i > 0 and w ≥ wi
This says that the value of a solution for i items either includes the ith item, in
which case it is vi plus a subproblem solution for (i - 1) items and the weight
excluding wi, or does not include the ith item, in which case it is a subproblem
solution for (i - 1) items and the same weight. That is, if the thief picks item i, he
takes vi value, can then choose from items 1, 2, . . . , i - 1 up to the weight limit
w - wi, and gets c[i - 1, w - wi] additional value. On the other hand, if the thief decides not to take item i,
he can choose from items 1, 2, . . . , i - 1 up to the weight limit w, and gets c[i - 1, w] value.
The better of these two choices should be made.
Although we are dealing with the 0-1 knapsack problem, the above formula for c is similar to the LCS
formula: boundary values are 0, and other values are computed from the input
and "earlier" values of c. So the 0-1 knapsack algorithm is like the LCS-length
algorithm given in CLR for finding a longest common subsequence of two
sequences.

The algorithm takes as input the maximum weight W, the number of items n,
and the two sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . ,
wn>. It stores the c[i, j] values in a table, that is, a two-dimensional array
c[0 . . n, 0 . . W], whose entries are computed in row-major order. That is,
the first row of c is filled in from left to right, then the second row, and so on.
At the end of the computation, c[n, W] contains the maximum value that can
be packed into the knapsack.

Dynamic-0-1-Knapsack (v, w, n, W)
FOR w = 0 TO W
    DO c[0, w] = 0
FOR i = 1 TO n
    DO c[i, 0] = 0
    FOR w = 1 TO W
        DO IF wi ≤ w
            THEN IF vi + c[i-1, w-wi] > c[i-1, w]
                THEN c[i, w] = vi + c[i-1, w-wi]
                ELSE c[i, w] = c[i-1, w]
            ELSE c[i, w] = c[i-1, w]

The set of items to take can be deduced from the table, starting at c[n, W] and
tracing backwards where the optimal values came from. If c[i, w] = c[i-1, w],
item i is not part of the solution, and we continue tracing with c[i-1, w].
Otherwise item i is part of the solution, and we continue tracing with
c[i-1, w-wi].

Analysis
This Dynamic-0-1-Knapsack algorithm takes Θ(nW) time, broken up as
follows: Θ(nW) time to fill the c-table, which has (n + 1)·(W + 1) entries,
each requiring Θ(1) time to compute, and O(n) time to trace the solution, because
the tracing process starts in row n of the table and moves up one row at each step.
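A minimal runnable sketch of the table fill and traceback in Python (my own illustrative translation of the pseudocode above; the item values and weights in the example call are made up):

def knapsack_01(v, w, W):
    """0-1 knapsack by dynamic programming.
    v[i], w[i]: value and weight of item i (0-based); W: capacity.
    Returns (best value, list of chosen item indices)."""
    n = len(v)
    # c[i][cap] = best value using items 0..i-1 with capacity cap
    c = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for cap in range(W + 1):
            c[i][cap] = c[i - 1][cap]                       # skip item i-1
            if w[i - 1] <= cap:
                take = v[i - 1] + c[i - 1][cap - w[i - 1]]  # take item i-1
                if take > c[i][cap]:
                    c[i][cap] = take
    # Trace back which items were taken.
    items, cap = [], W
    for i in range(n, 0, -1):
        if c[i][cap] != c[i - 1][cap]:
            items.append(i - 1)
            cap -= w[i - 1]
    return c[n][W], sorted(items)

print(knapsack_01([60, 100, 120], [10, 20, 30], 50))   # (220, [1, 2])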
Dynamic-Programming Algorithm
for the Activity-Selection Problem

An activity-selection problem is the problem of scheduling a resource among several
competing activities.

Problem Statement  Given a set S of n activities, with si the start time
and fi the finish time of the ith activity, find a maximum-size set of mutually
compatible activities.

Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj) do
not overlap, that is, i and j are compatible if si ≥ fj or sj ≥ fi.

Dynamic-Programming Algorithm
The finish times are in a sorted array f[i] and the start times are in array
s[i]. The array m[i] will store the value mi, where mi is the size of the largest set
of mutually compatible activities among activities {1, 2, . . . , i}. Let
BINARY-SEARCH(f, s) return the index of a number i in the sorted
array f such that f(i) ≤ s ≤ f[i + 1].

for i = 1 to n
    do m[i] = max( m[i-1], 1 + m[BINARY-SEARCH(f, s[i])] )

We let P[i] = 1 if activity i is in the optimal selection, and P[i] = 0 otherwise.

i = n
while i > 0
    do if m[i] = m[i-1]
        then P[i] = 0
             i = i - 1
        else P[i] = 1
             i = BINARY-SEARCH(f, s[i])

Analysis
The running time of this algorithm is O(n lg n), because each of the n iterations
performs a binary search, which takes O(lg n) time, as opposed to the O(n) running time of the greedy
algorithm (after sorting). Like the greedy algorithm, this algorithm assumes that the activities are already
sorted by increasing finish time.

Amortized Analysis
In an amortized analysis, the time required to perform a sequence of data-structure
operations is averaged over all operations performed. Amortized
analysis can be used to show that the average cost of an operation is small, if one
averages over a sequence of operations, even though a single operation might be
expensive. Unlike an average-case analysis based on a probability distribution over inputs, amortized
analysis guarantees the 'average' performance of each operation in the worst
case.

CLR covers the three most common techniques used in amortized analysis. The
main difference between them is the way the cost is assigned.

1. Aggregate Method
Computes an upper bound T(n) on the total cost of a sequence of n operations.

2. Accounting Method
Overcharges some operations early in the sequence. This 'overcharge' is used later in the sequence to pay for operations
that are charged less than they actually cost.

3. Potential Method
Maintains the credit as 'potential energy' that can be used to pay for future operations.

Aggregate Method

Aggregate Method Characteristics


It computes the worst case time T(n) for a sequence of n operations.

The amortized cost is T(n)/n per operation.

It gives the average performance of each operation in the worst case.

This method is less precise than other methods, as all operations are assigned the same cost.

Application 1: Stack operations


In the following pseudocode, the operation STACK-EMPTY returns TRUE if
there are no object currently on the stack, and FALSE otherwise.

MULTIPOP(s, k)
while (.NOT. STACK-
EMPTY(s) and k ≠ 0)
do pop(s)
k = k-1

Analysis

i. The worst-case cost of a single MULTIPOP is O(n), so a naive bound for n successive operations would be O(n²). This O(n²) bound is not tight, because each item can be popped at most once for each time it is pushed.

ii. In a sequence of n mixed operations, the total number of pops (including the pops performed inside MULTIPOP) is at most the total number of pushes, which is at most n. Since the cost of PUSH and POP is O(1), the cost of n stack operations is O(n). Therefore, the amortized cost of an operation is the average: O(n)/n = O(1).
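A small illustrative simulation of this argument in Python (the random mix of operations is my own assumption, not from the text): it counts elementary pushes and pops and checks that the total stays within 2n, since every pop is paid for by an earlier push.

import random

def simulate(n_ops, seed=0):
    """Run n_ops random PUSH/MULTIPOP operations; count elementary steps."""
    random.seed(seed)
    stack, steps = [], 0
    for _ in range(n_ops):
        if random.random() < 0.7:          # PUSH
            stack.append(1)
            steps += 1
        else:                               # MULTIPOP(s, k)
            k = random.randint(1, 5)
            while stack and k > 0:
                stack.pop()
                k -= 1
                steps += 1
    return steps

n = 10_000
print(simulate(n), "<=", 2 * n)   # total elementary steps never exceed 2n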

Application 2: Binary Counter


We use an array A[0 . . k-1] of bits, where length[A] = k, as the counter. A
binary number x that is stored in the counter has its lowest-order bit in A[0] and
its highest-order bit in A[k-1], so that x = Σ(i=0..k-1) A[i]·2^i. Initially, x = 0, and thus A[i]
= 0 for i = 0, 1, . . . , k-1.
To add 1 (modulus 2k ) to the value in the counter, use the following
pseudocode.
INCREMENT (A)
i=0
while i < length [A] and A[i] = 1
do A[i] = 0
i = i+1
if i < length [A]
then A[i] = 1

A single execution of INCREMENT takes O(k) in worst case when Array A


contains all 1's. Thus, a sequence of n INCREMENT operation on an initially
zero counter takes O(nk) in the worst case. This bound is correct but not tight.

Amortized Analysis

We can tighten the analysis to get a worst-case cost for a sequence of n
INCREMENTs by observing that not all bits flip each time INCREMENT is
called.

Bit A[0] flips n times (on every call).
Bit A[1] flips ⌊n/2⌋ times (on every other call).
Bit A[2] flips ⌊n/4⌋ times.
.
.
.
Bit A[i] flips ⌊n/2^i⌋ times.

In general, for i = 0, 1, . . . , ⌊lg n⌋, bit A[i] flips ⌊n/2^i⌋ times in a
sequence of n INCREMENT operations on an initially zero counter.
For i > ⌊lg n⌋, bit A[i] never flips at all. The total number of flips in the
sequence is thus

Σ(i=0..⌊lg n⌋) ⌊n/2^i⌋ < n Σ(i=0..∞) 1/2^i = 2n

Therefore, the worst-case time for a sequence of n INCREMENT operations on
an initially zero counter is O(n), so the amortized cost of each
operation is O(n)/n = O(1).
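A small runnable check of this bound in Python (illustrative only; the counter width of 32 bits is an assumption made for the example):

def increment(A):
    """Add 1 to the bit-vector counter A (lowest-order bit in A[0]).
    Returns the number of bits flipped by this call."""
    flips, i = 0, 0
    while i < len(A) and A[i] == 1:
        A[i] = 0
        flips += 1
        i += 1
    if i < len(A):
        A[i] = 1
        flips += 1
    return flips

n = 1000
A = [0] * 32
total_flips = sum(increment(A) for _ in range(n))
print(total_flips, "<", 2 * n)   # 1994 < 2000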

Accounting Method

In this method, we assign charges to different operations, with some
operations charged more or less than they actually cost. In other words, we
assign artificial charges to different operations.

Any overcharge for an operation on an item is stored (in a 'bank account') reserved for that item.

Later, a different operation on that item can pay for its cost with the credit for that item.

The balance in the (bank) account is not allowed to become negative.

The sum of the amortized costs for any sequence of operations is an upper bound on the actual total cost of these operations.

The amortized cost of each operation must be chosen wisely in order to pay for each operation at or before the cost is incurred.

Application 1: Stack Operation


Recall the actual costs of stack operations were:

PUSH (s, x) 1
POP (s) 1
MULTIPOP (s, k) min(k,s)
The amortized cost assignments are

PUSH 2
POP 0
MULTIPOP 0
Observe that the amortized cost of each operation is O(1). We must show that
one can pay for any sequence of stack operations by charging the amortized
costs.

The two units costs collected for each PUSH is used as follows:
1 unit is used to pay the cost of the PUSH.

1 unit is collected in advanced to pay for a potential future POP.

Therefore, for any sequence of n PUSH, POP, and MULTIPOP operations, the
total amortized cost is an upper bound on the total actual cost. Since the total amortized cost is
O(n), so is the total actual cost.

As an example, consider a sequence of n operations performed on a data
structure in which the ith operation costs i if i is an exact power of 2, and 1 otherwise.
With the accounting method, let the amortized cost per operation be 3. The credit Ci after the ith operation is then

Ci = 3i - (total actual cost of the first i operations)
   = 3i - (2^(⌊lg i⌋+1) + i - ⌊lg i⌋ - 2)

If i = 2^k, where k ≥ 0, then
Ci = 3i - (2^(k+1) + i - k - 2) = k + 2

If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then
Ci = 3i - (2^(k+1) + i - k - 2) = 2j + k + 2

Since k ≥ 0 and j ≥ 1, the credit Ci is always greater than zero. Hence, the total
amortized cost 3n, which is O(n), is an upper bound on the total actual cost, and
the amortized cost of each operation is O(n)/n = O(1).

Another example, consider a sequence of stack operations on a stack whose


size never exceeds k. After every k operations, a copy of the entire stack is
made. We must show that the cost of n stack operations, including copying the
stack, is O(n) by assigning suitable amortized costs to the various stack
operations.
There are, ofcourse, many ways to assign amortized cost to stack operations.
One way is:

PUSH 4,
POP 0,
MULTIPOP 0,
STACK-COPY 0.

Every time we PUSH, we pay 1 dollar (unit) to perform the actual operation
and store 1 dollar in the bank. That leaves us with 2 dollars, which are
placed on the pushed element, x say. When we POP element x off the stack, one of the two
dollars is used to pay for the POP operation and the other dollar is again put into the
bank account. The money in the bank is used to pay for the STACK-COPY
operations. Since after every k operations there are at least k dollars in the bank, and the stack size never exceeds
k, there are enough dollars (units) in the bank (storage) to pay for the STACK-COPY
operations. The cost of n stack operations, including copying the stack, is
therefore O(n).

Application 2: Binary Counter


We observed in the aggregate method that the running time of an INCREMENT operation on a
binary counter is proportional to the number of bits flipped. We shall use this
number as our cost here.

For the amortized analysis, charge an amortized cost of 2 dollars to set a bit to 1.
When a bit is set, use 1 dollar (out of the 2 dollars already charged) to pay for actually
setting the bit, and place the other dollar on the bit as credit, so that when we
reset the bit to zero, we need not charge anything.

The amortized cost of psuedocode INCREMENT can now be evaluated:

INCREMENT (A)
1. i = 0
2. while i < length[A]
and A[i] = 1
3. do A[i] = 0
4. i = i +1
5. if i < length [A]
6. then A[i] = 1
Within the while loop, the cost of resetting the bits is paid for by the dollars on
the bits that are reset.At most one bit is set, in line 6 above, and therefore the
amortized cost of an INCREMENT operation is at most 2 dollars (units). Thus,
for n INCREMENT operation, the total amortized cost is O(n), which bounds
the total actual cost.

Consider a Variant
Let us implement a binary counter as a bit vector so that any sequence of n
INCREMENT and RESET operations takes O(n) time on an initially zero
counter. The goal here is not only to increment the counter but also to reset it to
zero, that is, make all bits in the binary counter zero. A new field, max[A],
holds the index of the high-order 1 in A. Initially, max[A] is set to -1, and
max[A] is updated appropriately when the counter is incremented (or reset). By
limiting the cost of RESET to an amount that can be covered by credit
from earlier INCREMENTs, we obtain the O(n) bound.

INCREMENT (A)
1. i = 0
2. while i < length[A] and A[i] = 1
3.     do A[i] = 0
4.        i = i + 1
5. if i < length[A]
6.     then A[i] = 1
7.          if i > max[A]
8.              then max[A] = i
9.     else max[A] = -1

Note that lines 7, 8 and 9 are added to the CLR algorithm for the binary counter.

RESET(A)

For i = 0 to max[A]
do A[i] = 0
max[A] = -1
For the counter in CLR we assume that it costs 1 dollar to flip a bit. In
addition, we assume that we need 1 dollar to update max[A]. Setting and
resetting of bits work exactly as in the binary counter of CLR: pay 1 dollar to set a
bit to 1 and place another dollar on the same bit as credit, so that the credit
on each bit pays to reset that bit during later increments.
In addition, use 1 dollar to update max[A], and, if max[A] increases, place 1
dollar as credit on the new high-order 1. (If max[A] does not increase, we simply
waste that dollar.) Since RESET manipulates only bits at positions up to max[A],
and every such bit acquired one dollar of credit when it was set, the zeroing of bits by RESET
can be completely paid for by the credit stored on the bits. We just need one extra
dollar to pay for resetting max[A].
Thus, charging 4 dollars for each INCREMENT and 1 dollar for each RESET
is sufficient, so a sequence of n INCREMENT and RESET operations takes
O(n) amortized time.

Potential Method

This method stores prepayments as potential, or 'potential energy', that can be
released to pay for future operations. The stored potential is associated with the
entire data structure rather than with specific objects within the data structure.

Notation:
• D0 is the initial data structure (e.g., a stack).
• Di is the data structure after the ith operation.
• ci is the actual cost of the ith operation.
• The potential function Ψ maps each Di to its potential value Ψ(Di).

The amortized cost ^ci of the ith operation with respect to the potential function Ψ is defined
by

^ci = ci + Ψ(Di) - Ψ(Di-1)          (1)

The amortized cost of each operation is therefore

^ci = [actual operation cost] + [change in potential].

By equation (1), the total amortized cost of the n operations is

Σ(i=1..n) ^ci = Σ(i=1..n) (ci + Ψ(Di) - Ψ(Di-1))
             = Σ(i=1..n) ci + [Ψ(D1) + Ψ(D2) + . . . + Ψ(Dn)] - [Ψ(D0) + Ψ(D1) + . . . + Ψ(Dn-1)]
             = Σ(i=1..n) ci + Ψ(Dn) - Ψ(D0)          (2)

If we define a potential function Ψ so that Ψ(Dn) ≥ Ψ(D0), then the total
amortized cost Σ(i=1..n) ^ci is an upper bound on the total actual cost.

As an example, consider a sequence of n operations performed on a data
structure, where the ith operation costs i if i is an exact power of 2 and 1 otherwise.
The potential method determines the amortized cost per operation
as follows.

Let Ψ(Di) = 2i - 2^(⌊lg i⌋+1) + 1 for i > 0, and Ψ(D0) = 0. Since 2^(⌊lg i⌋+1) ≤ 2i for i > 0,
we have Ψ(Di) ≥ 0 = Ψ(D0).

If i = 2^k, where k ≥ 0, then 2^(⌊lg i⌋+1) = 2^(k+1) = 2i and 2^(⌊lg(i-1)⌋+1) = 2^k = i, so

^ci = ci + Ψ(Di) - Ψ(Di-1)
    = i + (2i - 2i + 1) - (2(i-1) - i + 1)
    = 2

If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then ⌊lg i⌋ = ⌊lg(i-1)⌋, so Ψ(Di) - Ψ(Di-1) = 2 and

^ci = ci + Ψ(Di) - Ψ(Di-1) = 1 + 2 = 3

Because Σ(i=1..n) ^ci = Σ(i=1..n) ci + Ψ(Dn) - Ψ(D0)
and Ψ(Di) ≥ Ψ(D0) for all i, the total amortized cost of the n operations is an upper
bound on the total actual cost. Therefore, the total amortized cost of a sequence
of n operations is O(n), and the amortized cost per operation is O(n)/n =
O(1).

Application 1- Stack Operations


Define the potential function Ψ on a stack to be the number of objects in the
stack. For empty stack D0 , we have Ψ(D0) = 0. Since the number of objects in
the stack can not be negative, the stack Di after the ith operation has nonnegative
potential, and thus

Ψ(Di) ≥ 0 = Ψ(D0).

Therefore, the total amortized cost of n operations w.r.t. function Ψ represents


an upper bound on the actual cost.

Amortized costs of stack operations are:


PUSH

If the ith operation on a stack containing s objects is a
PUSH operation, then the potential difference is

Ψ(Di) - Ψ(Di-1) = (s + 1) - s = 1

In simple words, if the ith operation is a PUSH, the stack after the (i-1)th
operation holds one object fewer. By equation (1), the amortized cost of this
PUSH operation is

^ci = ci + Ψ(Di) - Ψ(Di-1) = 1 + 1 = 2

MULTIPOP

If the ith operation on the stack is MULTIPOP(S, k), then k' = min(k, s) objects are popped off the
stack. The actual cost of the operation is k', and the potential difference is

Ψ(Di) - Ψ(Di-1) = -k'

Why is this negative? Because we are taking items off
the stack. Thus, the amortized cost of the
MULTIPOP operation is

^ci = ci + Ψ(Di) - Ψ(Di-1)
    = k' - k' = 0

POP
Similarly, the amortized cost of a POP operation is
0.

Analysis
Since amortized cost of each of the three operations is O(1), therefore, the total
amortized cost of n operations is O(n). The total amortized cost of n operations
is an upper bound on the total actual cost.

Lemma  Suppose the data structure is a binary heap. Show that there is a
potential function Ψ with Ψ(D) = O(n lg n) such that the amortized cost of
EXTRACT-MIN is constant.

Proof
We know that the amortized cost ^ci of operation i is defined as

^ci = ci + Ψ(Di) - Ψ(Di-1)

For the heap operations, this gives us

c1 lg n = c2 lg(n + c3) + Ψ(Di) - Ψ(Di-1)      (INSERT)        (1)

c4 = c5 lg(n + c6) + Ψ(Di) - Ψ(Di-1)           (EXTRACT-MIN)   (2)

Consider the potential function Ψ(D) = lg(n!), where n is the number of items
in D.

From equation (1), we have

(c1 - c2) lg(n + c3) = lg(n!) - lg((n-1)!) = lg n,

which clearly holds if c1 = c2 + 1 and c3 = 0.

From equation (2), we have

c4 - c5 lg(n + c6) = lg(n!) - lg((n+1)!) = -lg(n + 1),

which clearly holds if c4 = 0 and c5 = c6 = 1.

Recall that Stirling's approximation tells us that lg(n!) = Θ(n lg n), so

Ψ(D) = Θ(n lg n)

and this completes the proof.

Application 2: Binary Counter


Define the potential of the counter after the ith INCREMENT operation to be bi,
the number of 1's in the counter after the ith operation.

Let the ith INCREMENT operation reset ti bits. This implies that the actual cost is at
most ti + 1. Why? Because in addition to resetting ti bits it also sets at most one bit to 1.
Therefore, the number of 1's in the counter after the ith operation is
bi ≤ bi-1 - ti + 1, and the potential difference is

Ψ(Di) - Ψ(Di-1) ≤ (bi-1 - ti + 1) - bi-1 = 1 - ti

Putting this value in equation (1), we get

^ci = ci + Ψ(Di) - Ψ(Di-1)
    ≤ (ti + 1) + (1 - ti)
    = 2

If the counter starts at zero, then Ψ(D0) = 0. Since Ψ(Di) ≥ 0 for all i, the
total amortized cost of a sequence of n INCREMENT operations is an upper
bound on the total actual cost, and so the worst-case cost of n INCREMENT
operations is O(n).

If the counter does not start at zero, then the initial number of 1's is b0, and after
n INCREMENT operations the number of 1's is bn, where 0 ≤ b0, bn ≤ k.

Since Σ(i=1..n) ^ci = Σ(i=1..n) (ci + Ψ(Di) - Ψ(Di-1)), we have
   Σ(i=1..n) ^ci = Σ(i=1..n) ci + Ψ(Dn) - Ψ(D0)
   Σ(i=1..n) ci = Σ(i=1..n) ^ci - Ψ(Dn) + Ψ(D0)

We have ^ci ≤ 2 for all 1 ≤ i ≤ n. Since Ψ(D0) = b0 and Ψ(Dn) = bn, the
total cost of the n INCREMENT operations is

Σ(i=1..n) ci = Σ(i=1..n) ^ci - Ψ(Dn) + Ψ(D0)
            ≤ Σ(i=1..n) 2 - bn + b0        (because ^ci ≤ 2)
            = 2n - bn + b0

Note that since b0 ≤ k, if we execute at least n = Ω(k) INCREMENT
operations, the total actual cost is O(n), no matter what the initial value of the
counter is.

Implementation of a queue with two stacks, such that the amortized cost of
each ENQUEUE and each DEQUEUE operation is O(1): ENQUEUE pushes
an object onto the first stack. DEQUEUE pops an object off the second stack
if it is not empty. If the second stack is empty, DEQUEUE transfers all objects
from the first stack to the second stack and then pops off the
first object. The goal is to show that this implementation has an O(1)
amortized cost for each ENQUEUE and DEQUEUE operation. Suppose Di
denotes the state of the stacks after the ith operation. Define Ψ(Di) to be the
number of elements in the first stack. Clearly, Ψ(D0) = 0 and Ψ(Di) ≥ Ψ(D0)
for all i. If the ith operation is an ENQUEUE operation, then Ψ(Di) - Ψ(Di-1) = 1.
Since the actual cost of an ENQUEUE operation is 1, the amortized cost of an
ENQUEUE operation is 2. If the ith operation is a DEQUEUE, then there are
two cases to consider.

Case i: The second stack is not empty. In this case we have Ψ(Di) - Ψ(Di-1) = 0 and the actual cost of the DEQUEUE operation is 1.

Case ii: The second stack is empty. In this case, we have Ψ(Di) - Ψ(Di-1) = -Ψ(Di-1) and the actual cost of the DEQUEUE operation is Ψ(Di-1) + 1.

In either case, the amortized cost of the DEQUEUE operation is 1. It follows that
each operation has O(1) amortized cost.
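A minimal runnable sketch of this two-stack queue in Python (illustrative; the class and method names are my own):

class TwoStackQueue:
    """Queue built from two stacks, as described above."""
    def __init__(self):
        self.inbox = []    # first stack: receives ENQUEUEd items
        self.outbox = []   # second stack: items ready to DEQUEUE

    def enqueue(self, x):
        self.inbox.append(x)               # actual cost 1

    def dequeue(self):
        if not self.outbox:                # transfer only when outbox is empty
            while self.inbox:
                self.outbox.append(self.inbox.pop())
        return self.outbox.pop()

q = TwoStackQueue()
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue(), q.dequeue(), q.dequeue())   # 1 2 3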

Dynamic Table

If the allocated space for the table is not enough, we must copy the table into a
larger table. Similarly, if a large number of elements is erased from the table, it
is a good idea to reallocate the table with a smaller size. Using amortized analysis
we shall show that the amortized cost of insertion and deletion is constant and that the
unused space in a dynamic table never exceeds a constant fraction of the total
space.

Assume that the dynamic table supports following two operations:


TABLE-INSERT

This operation add an item in the table


by copying into the unused single slot.
The cost of insertion is 1.

TABLE-DELETE

This operation removes an item from


the table by freeing a slot. The cost of
deletion is 1.

Load Factor
The number of items stored in the table, n, divided by the size of the table, m, is
defined as the load factor and denoted α(T) = n/m.
The load factor of the empty table (size m = 0) is defined to be 1.
A table is full when there are no unused slots, that is, the number of items
stored in the table equals the number of available slots (n = m). In this case

Load factor α(T) = n/m = 1

Proposed Algorithm
1. Initialize the table size to m = 1.

2. Keep inserting elements as long as the number of items in the table is less than its size, i.e., n < m.

3. When the table becomes full, generate a new table of size 2m and set m ← 2m.

4. Copy the items (by elementary insertions) from the old table into the new one.

5. GOTO step 2.

Analysis
If n elementary insert operations are performed in line 4, the worst-case cost of
an operation is O(n), which leads to an upper bound of O(n2) on the total
running time for n operations.

Aggregate Analysis
The ith insert operation causes an expansion only when i - 1 is an exact power of 2.
Let ci be the cost of the ith insert operation. Then

ci = i    if i - 1 is an exact power of 2
ci = 1    otherwise
As an example, consider the following illustration.

INSERTION i    TABLE SIZE m    COST ci
1              1               1
2              2               1 + 1
3              4               1 + 2
4              4               1
5              8               1 + 4
6              8               1
7              8               1
8              8               1
9              16              1 + 8
10             16              1
The total cost of n insert operations is therefore

Σ(i=1..n) ci ≤ n + Σ(j=0..⌊lg n⌋) 2^j
            = n + [2^(⌊lg n⌋+1) - 1]/[2 - 1]        since Σ(k=0..m) x^k = [x^(m+1) - 1]/[x - 1]
            ≤ n + 2·2^(lg n) - 1
            = n + 2n - 1
            = 3n - 1
            < 3n
Therefore, the amortized cost of a single operation is

(total cost)/(number of operations) = 3n/n = 3

Asymptotically, the cost per operation for a dynamic table is O(1), which is the same as that of a
table of fixed size.
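A small runnable check of this 3n bound in Python (illustrative; the table is modelled as a plain list whose capacity is managed by hand):

def table_inserts(n):
    """Insert n items into a doubling table; return total elementary insertions."""
    size, num, cost = 1, 0, 0
    table = [None] * size
    for item in range(n):
        if num == size:                 # table full: expand
            size *= 2
            new_table = [None] * size
            for j in range(num):        # re-insert every old item
                new_table[j] = table[j]
                cost += 1
            table = new_table
        table[num] = item               # elementary insertion of the new item
        num += 1
        cost += 1
    return cost

n = 1000
print(table_inserts(n), "<", 3 * n)     # 2023 < 3000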

Accounting Method
Here we charge 3 dollars per insertion. Intuitively, each item pays for 3 elementary
insertions:
1. $1 for inserting the item itself.

2. $1 for moving the item (re-inserting it) when the table is next expanded.

3. $1 for moving another item (one that has already been moved once) when the table is expanded.

Potential Method
Define a potential function Φ that is 0 immediately after an expansion but
potential builds to the table size by the time the table is full.

Φ(T) = 2 . num[T] - size[T]

Immediately after an expansion (but before any insertion) we have num[T] = size[T]/2,
which implies

Φ(T) = 2·num[T] - 2·num[T] = 0

Immediately before an expansion, we have num[T] = size[T],
which implies

Φ(T) = 2·num[T] - num[T] = num[T]

The initial value of the potential function is zero, i.e., Φ(T) = 0, and the table is always at least half full,
i.e., num[T] ≥ size[T]/2, or 2·num[T] ≥ size[T],
which implies

Φ(T) = 2·num[T] - size[T] ≥ 0

That is, Φ(T) is always nonnegative.

Before analyzing the amortized cost of the ith TABLE-INSERT operation, define the
following.

Let
numi = number of elements in the table after the ith operation,
sizei = size of the table after the ith operation,
Φi = value of the potential function after the ith operation.

Initially, we have num0 = size0 = 0 and Φ0 = 0.

If the ith insertion does not trigger an expansion, then sizei = sizei-1 and the
amortized cost of the operation is

^ci = ci + Φi - Φi-1
    = 1 + [2·numi - sizei] - [2·numi-1 - sizei-1]
    = 1 + 2·numi - sizei - 2(numi - 1) + sizei
    = 3

If the ith insertion does trigger an expansion (the size of the table doubles), then
sizei = 2·sizei-1 and sizei-1 = numi-1 = numi - 1, and the actual cost of the operation
is ci = numi (one elementary insertion for the new item plus numi - 1 re-insertions).
The amortized cost of the operation is

^ci = ci + Φi - Φi-1
    = numi + [2·numi - sizei] - [2·numi-1 - sizei-1]
    = numi + [2·numi - 2(numi - 1)] - [2(numi - 1) - (numi - 1)]
    = numi + 2 - (numi - 1)
    = 3

What is the catch? This shows how the potential builds (from zero) to pay for the next
table expansion.
Dynamic Table Expansion and Contraction
When the load factor of the table, α(T) = n/m, becomes too small, we want
to preserve the following two properties:
1. Keep the load factor of the dynamic table bounded below by a constant.

2. Keep the amortized cost of the dynamic-table operations bounded above by a constant.

Proposed Strategy
Event: An item is inserted into a full table.
Action: Double the size of the table, i.e., m ← 2m.
Event: Removing an item leaves the table less than half full.
Action: Halve the size of the table, i.e., m ← m/2.

The problem with this strategy is thrashing. We can avoid this problem by
allowing the load factor of the table, α(T) = n/m, to drop below 1/2 before
contracting it. By contracting the table only when the load factor falls below 1/4,
we maintain the lower bound α(T) ≥ 1/4, i.e., the load factor is bounded below by the
constant 1/4.

Load Factor
The load factor α(T) of a non-empty table T is defined as the number of items
stored in T divided by the size of T, which is the number of slots in T, i.e.,

α(T) = num[T] / size[T]

For an empty table we have num[T] = size[T] = 0, and we define α(T) = 1,

which implies that we always have

num[T] = α(T) · size[T]

whether the table is empty or not.

Analysis by Potential Method


We start by defining a potential function Φ that is
1. 0 immediately after an expansion and builds as the load factor, α(T), increases to 1;

2. 0 immediately after a contraction and builds as the load factor, α(T), decreases to 1/4.

Φ(T) = 2·num[T] - size[T]      if α(T) ≥ 1/2
Φ(T) = size[T]/2 - num[T]      if α(T) < 1/2

Note that the potential function is never negative.

Properties of the Potential Function

When α(T) = 1/2:
α(T) = num[T]/size[T] = 1/2 implies size[T] = 2·num[T], and the potential function is
Φ(T) = 2·num[T] - size[T] = 2·num[T] - 2·num[T] = 0.

When α(T) = 1:
α(T) = num[T]/size[T] = 1 implies size[T] = num[T], and the potential function is
Φ(T) = 2·num[T] - size[T] = 2·num[T] - num[T] = num[T],
which indicates that the potential can pay for an expansion if an item is inserted.

When α(T) = 1/4:
α(T) = num[T]/size[T] = 1/4 implies size[T] = 4·num[T], and the potential function is
Φ(T) = size[T]/2 - num[T] = 4·num[T]/2 - num[T] = num[T],
which indicates that the potential can pay for a contraction if an item is deleted.

Notation
The subscript i is used in the existing notation to denote values after the ith
operation. That is, ^ci, ci, numi, sizei, αi and Φi indicate values after the
ith operation.

Initially, num0 = size0 = Φ0 = 0 and α0 = 1.
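A minimal runnable sketch of the expand/contract policy in Python (illustrative; it tracks num and size explicitly and keeps the load factor of a non-empty table between 1/4 and 1):

class DynamicTable:
    """Doubles when an insert finds the table full; halves when a delete
    drops the load factor below 1/4 (the strategy described above)."""
    def __init__(self):
        self.size, self.num = 0, 0
        self.slots = []

    def _resize(self, new_size):
        new_slots = [None] * new_size
        for j in range(self.num):           # re-insert surviving items
            new_slots[j] = self.slots[j]
        self.slots, self.size = new_slots, new_size

    def insert(self, x):
        if self.num == self.size:            # full: expand
            self._resize(max(1, 2 * self.size))
        self.slots[self.num] = x
        self.num += 1

    def delete(self):
        self.num -= 1
        self.slots[self.num] = None
        if 0 < self.num < self.size // 4:    # too empty: contract
            self._resize(self.size // 2)

t = DynamicTable()
for x in range(20):
    t.insert(x)
for _ in range(17):
    t.delete()
print(t.num, t.size)   # 3 8 -- contracted twice, load factor stays >= 1/4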

Hash Table

Direct-address table
If the keys are drawn from a reasonably small universe U = {0, 1, . . . , m-1} of
keys, a solution is to use a table T[0 . . m-1], indexed by keys. To represent the
dynamic set, we use an array, or direct-address table, denoted by T[0 . . m-1],
in which each slot corresponds to a key in the universe.

The figure (omitted here) illustrates the approach: each key in the universe U corresponds to an index in the table
T[0 . . m-1]. Using this approach, all three basic dictionary operations take Θ(1) time in the worst case.
Hash Tables
When the size of the universe is much larger, the same approach (a direct-address
table) could still work in principle, but the size of the table would make it
impractical. A solution is to map the keys onto a small range, using a function
called a hash function. The resulting data structure is called a hash table.

With direct addressing, an element with key k is stored in slot k. With hashing,
this same element is stored in slot h(k); that is, we use a hash function h to
compute the slot from the key. The hash function maps the universe U of keys into
the slots of a hash table T[0 . . m-1]:

h: U → {0, 1, . . ., m-1}

More formally, suppose we want to store a set of size n in a table of size m.
The ratio α = n/m is called the load factor, that is, the average number of
elements stored per slot. Assume we have a hash function h that maps each
key k ∈ U to an integer h(k) ∈ [0 . . m-1]. The basic idea is to store
key k in location T[h(k)].

Typically, hash functions generate "random looking" values. For example, the
following function usually works well:

h(k) = k mod m    where m is a prime number.

Is there any point to the hash function? Yes, the point of the hash function is to
reduce the range of array indices that need to be handled.

Collision
As keys are inserted into the table, it is possible that two keys hash to the
same table slot. If the hash function distributes the elements uniformly over the
table, the number of collisions cannot be too large on average, but the
birthday paradox makes it very likely that there will be at least one collision,
even for a lightly loaded table.

A hash function h may map two keys k and j to the same slot, so they collide.

There are two basic methods for handling collisions in a hash table: chaining
and open addressing.

Collision Resolution by Chaining

When there is a collision (two keys hash to the same slot), the incoming key is
stored in the linked list attached to that slot; the corresponding record is appended to the end
of the list. Each slot T[j] contains a linked list of all the keys whose hash value is j. For
example, h(k1) = h(kn) and h(k5) = h(k2) = h(k7).

 The worst-case running time for insertion is O(1).

 Deletion of an element x can be accomplished in O(1) time if the lists are doubly linked.

 In the worst-case behavior of chained hashing, all n keys hash to the same slot, creating a list of length n. The worst-case time for search
is thus Θ(n) plus the time to compute the hash function.

keys: 5, 28, 19, 15, 20, 33, 12, 17, 10

slots: 9
hash function: h(k) = k mod 9

h(5) = 5 mod 9 = 5
h(28) = 28 mod 9 = 1
h(19) = 19 mod 9 = 1
h(15) = 15 mod 9 = 6
h(20) = 20 mod 9 = 2
h(33) = 33 mod 9 = 6
h(12) = 12 mod 9 = 3
h(17) = 17 mod 9 = 8
h(10) = 10 mod 9 = 1
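A minimal chained hash table in Python, run on exactly these keys (the class itself is my own illustrative sketch; each chain is modelled as a Python list):

class ChainedHashTable:
    """Hash table with collision resolution by chaining: slot j holds the
    list of keys with h(k) = j."""
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def h(self, k):
        return k % self.m

    def insert(self, k):
        self.slots[self.h(k)].append(k)     # append to the chain, O(1)

    def search(self, k):
        return k in self.slots[self.h(k)]   # scan one chain only

table = ChainedHashTable(9)
for k in (5, 28, 19, 15, 20, 33, 12, 17, 10):
    table.insert(k)
print(table.slots[1])      # [28, 19, 10] -- the three colliding keys
print(table.search(33))    # True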

A good hash function satisfies the assumption of simple uniform hashing: each
element is equally likely to hash into any of the m slots, independently of
where any other element has hashed to. Unfortunately, it is usually not possible to check this
condition, because one rarely knows the probability distribution according to
which the keys are drawn.

In practice, we use heuristic techniques to create a hash function that performs
well. One good approach is to derive the hash value in a way that is expected to
be independent of any patterns that might exist in the data (the division method).

Most hash functions assume that the universe of keys is the set of natural
numbers. Thus, if the keys are not natural numbers, we must find a way to interpret them as natural numbers.

Method for Creating Hash Function

1. The division method.

2. The multiplication method.

3. Universal hashing.

1. The Division Method


Map a key k into one of m slots by taking the remainder of k divided by m.
That is, the hash function is

h(k) = k mod m.

Example:
If table size m = 12
key k = 100

than
h(100) = 100 mod 12
=4

Poor choices of m
m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k.
A value such as m = 2^p - 1 may also be a poor choice when the keys are character strings interpreted in radix 2^p, because permuting the characters of k does not change its hash value.

Good choice of m
A prime not too close to an exact power of 2.

2. The Multiplication Method

This is a two-step process.
Step 1:
Multiply the key k by a constant A, 0 < A < 1, and extract the fractional part of kA.
Step 2:
Multiply this fractional part by m and take the floor of the result.

The hash function using the multiplication method is:

h(k) = ⌊m (kA mod 1)⌋

where "kA mod 1" means the fractional part of kA, that is, kA - ⌊kA⌋.

An advantage of this method is that the value of m is not critical, and it can be
implemented easily on most computers.
A reasonable value for the constant A is

A ≈ (√5 - 1)/2

as suggested by Knuth in The Art of Computer Programming.
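A small illustrative sketch of the multiplication method in Python (the specific keys and table size below are made up for the example):

import math

A = (math.sqrt(5) - 1) / 2      # Knuth's suggested constant, about 0.618

def hash_mul(k, m):
    """Multiplication method: take the fractional part of k*A, scale by m."""
    frac = (k * A) % 1.0        # kA mod 1
    return math.floor(m * frac)

m = 16                          # m need not be prime for this method
for k in (123456, 123457, 200000):
    print(k, "->", hash_mul(k, m))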

3. Universal Hashing
Open Addressing

This is another way to deal with collisions.

In this technique all elements are stored in the hash table itself. That is, each
table entry contains either an element or NIL. When searching for an element (or an
empty slot), we systematically examine slots until we find the element (or an
empty slot). There are no lists and no elements stored outside the table. That
implies that the table can completely "fill up", so the load factor α can never exceed
1. An advantage of this technique is that it avoids pointers (pointers need space
too). Instead of chasing pointers, we compute the sequence of slots to be
examined. To perform an insertion, we successively examine, or probe, the hash
table until we find an empty slot. The sequence of slots probed depends upon
the key being inserted. To determine which slots to probe, the hash function
takes the probe number as a second input. Thus, the hash function becomes

h: U × {0, 1, . . . , m-1} → {0, 1, . . . , m-1}

and the probe sequence

< h(k, 0), h(k, 1), . . . , h(k, m-1) >

in which every slot is eventually considered.

Pseudocode for Insertion


HASH-INSERT (T, k)
i = 0
repeat j ← h(k, i)
    if T[j] = NIL
        then T[j] = k
             return j
        else i = i + 1
until i = m
error "table overflow"

Pseudocode for Search


HASH-SEARCH (T, k)

i=0
Repeat j <-- h(k, i)
if T[j] = k
then return j
i = i +1
until T[j] = NIL or i = m
Return NIL

Pseudocode for Deletion

Following are the three techniques to compute the probe


sequences.
1. Linear probing.

2. Quadratic probing.

3. Double hashing.
These techniques guarantee that

< h(k, 0), h(k, 1), . . . , h(k, m-1) >

is a permutation of < 0, 1, . . . , m-1 > for each key k.

However, the requirement of uniform hashing is not met, since none of these techniques is capable
of generating more than m² probe sequences (instead of m!).

Uniform Hashing
Each key is equally likely to have any of the m! permutation of < 0, 1, . . . , m-1> as its probe sequence.

Note that uniform hashing generalizes the notion of simple uniform hashing.

1. Linear Probing
This method uses the hash function of the form:

h(k, i) = (h`(k) + i) mod m for i = 0, 1, 2, . . . , m-1


where h` is an auxiliary hash function. Linear probing suffers primary
clustering problem.

2. Quadratic Probing
This method uses the hash function of the form

h(k, i) = (h'(k) + c1·i + c2·i²) mod m      for i = 0, 1, 2, . . . , m-1

where h' is an auxiliary hash function, and c1 and c2 ≠ 0 are auxiliary constants.
This method works much better than linear probing.

Quadratic probing suffers a milder form of clustering, called secondary


clustering.

3. Double Hashing
This method produces probe sequences that are very close to random permutations. It
uses a hash function of the form

h(k, i) = (h1(k) + i·h2(k)) mod m

where h1 and h2 are auxiliary hash functions.

The probe sequence here depends in two ways on the key k, the initial probe
position and the offset.
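A minimal open-addressing table using double hashing (my own illustrative sketch; the particular choices of h1 and h2 below are common textbook-style choices, not prescribed by the text):

class DoubleHashTable:
    """Open addressing with double hashing: probe i goes to (h1(k) + i*h2(k)) mod m."""
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def h1(self, k):
        return k % self.m

    def h2(self, k):
        return 1 + (k % (self.m - 1))    # never 0, so probing keeps moving

    def insert(self, k):
        for i in range(self.m):
            j = (self.h1(k) + i * self.h2(k)) % self.m
            if self.slots[j] is None:
                self.slots[j] = k
                return j
        raise RuntimeError("table overflow")

    def search(self, k):
        for i in range(self.m):
            j = (self.h1(k) + i * self.h2(k)) % self.m
            if self.slots[j] is None:
                return None
            if self.slots[j] == k:
                return j
        return None

t = DoubleHashTable(13)           # m = 13, a prime
for k in (5, 18, 31, 7):          # 5, 18 and 31 all collide on h1
    t.insert(k)
print(t.search(31), t.search(4))  # 0 None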

Binary Search Tree

A binary search tree is a binary tree in which each internal node x stores an element such that the elements stored in the left subtree of x
are less than or equal to x and the elements stored in the right subtree of x are greater than or equal to x. This is called the binary-search-tree
property.

The basic operations on a binary search tree take time proportional to the height
of the tree. For a complete binary tree with n nodes, such operations run in
Θ(lg n) worst-case time. If the tree is a linear chain of n nodes, however, the
same operations take Θ(n) worst-case time.
The height of a binary search tree equals the number of links from
the root node to the deepest node.

Implementation of Binary Search Tree


A binary search tree can be implemented as a linked data structure in which
each node is an object with three pointer fields. The three pointer fields left,
right and p point to the nodes corresponding to the left child, the right child and the
parent, respectively. NIL in any pointer field signifies that there exists no
corresponding child or parent. The root node is the only node in the BST
structure with NIL in its p field.
Inorder Tree Walk
During this type of walk, we visit the root of a subtree between visiting its left subtree
and its right subtree.

INORDER-TREE-WALK (x)
if x ≠ NIL then
    INORDER-TREE-WALK (left[x])
    print key[x]
    INORDER-TREE-WALK (right[x])

It takes Θ(n) time to walk a tree of n nodes. Note that the binary-search-tree property allows us to print out all the elements in the
binary search tree in sorted order.
Preorder Tree Walk
In which we visit the root node before the nodes in either subtree.

PREORDER-TREE-
WALK (x)
If x not equal NIL then
PRINT key[x]
PREORDER-TREE-
WALK (left[x])
PREORDER-TREE-
WALK (right[x])

Postorder Tree Walk

In this walk we visit the root node after the nodes in its subtrees.

POSTORDER-TREE-WALK (x)
if x ≠ NIL then
    POSTORDER-TREE-WALK (left[x])
    POSTORDER-TREE-WALK (right[x])
    print key[x]

It takes O(n) time to walk (inorder, preorder or postorder) a tree of n nodes.

Binary-Search-Tree Property vs. Heap Property

In a heap, a node's key is greater than or equal to both of its children's keys. In a
binary search tree, a node's key is greater than or equal to its left child's key but less
than or equal to its right child's key; furthermore, this applies to the entire subtrees in
the binary search tree case. It is very important to note that the heap property
does not help print the nodes in sorted order, because this property does not tell
us in which subtree the next item is. If the heap property could be used to print the
keys in sorted order in O(n) time, this would
contradict our known lower bound on comparison sorting.

The last statement implies that since sorting n elements takes Ω(n lg n) time
in the worst case in the comparison model, any comparison-based algorithm for
constructing a binary search tree from an arbitrary list of n elements takes Ω(n lg n)
time in the worst case.
We can show the validity of this argument (in case you are thinking of beating
Ω(n lg n) bound) as follows: let c(n) be the worst-case running time for
constructing a binary tree of a set of n elements. Given an n-node BST, the
inorder walk in the tree outputs the keys in sorted order (shown above). Since
the worst-case running time of any computation based sorting algorithm is
Ω(n lg n) , we have

c(n) + O(n) = Ω(n lgn)


Therefore, c(n) = Ω(n lgn).

Querying a Binary Search Tree


The most common operation performed on a BST is searching for a key stored
in the tree. Other operations are MINIMUM, MAXIMUM, SUCCESSOR and
PREDECESSOR. These operations run in O(h) time, where h is the height of
the tree, i.e., the number of links from the root node to the deepest node.

The TREE-SEARCH (x, k) algorithm searches the tree rooted at x for a node
whose key value equals k. It returns a pointer to the node if it exists, and
NIL otherwise.

TREE-SEARCH (x, k)
if x = NIL .OR.
k = key[x]
then return x
if k < key[x]
then return
TREE-SEARCH
(left[x], k)
else return
TREE-SEARCH
(right[x], k)

Clearly, this algorithm runs in O(h) time where h is the height of the tree.

The iterative version of above algorithm is very easy to implement.

ITERATIVE-TREE-SEARCH (x, k)
1. while x ≠ NIL .AND. k ≠ key[x] do
2.     if k < key[x]
3.         then x ← left[x]
4.         else x ← right[x]
5. return x

The TREE-MINIMUM (x) algorithm returns a pointer to the node of the tree rooted at x
whose key value is the minimum of all keys in the tree. Due to the BST property,
the minimum element can always be found by following left child pointers from
the root until a NIL is encountered.

TREE-MINIMUM (x)
while left[x] ≠ NIL
do
x ← left [x]
return x
Clearly, it runs in O(h) time where h is the height of the tree. Again thanks to
BST property, an element in a binary search tree whose key is a maximum can
always be found by following right child pointers from root until a NIL is
encountered.

TREE-MAXIMUM (x)
while right[x] ≠
NIL do
x ← right [x]
return x
Clearly, it runs in O(h) time where h is the height of the tree.

The TREE-SUCCESSOR (x) algorithm returns a pointer to the node in the tree
whose key value is next higher than key [x].
TREE-SUCCESSOR (x)
if right [x] ≠ NIL
then return
TREE-MINIMUM
(right[x])
else y ← p[x]
while y ≠ NIL
.AND. x =
right[y] do
x←y
y ← p[y]
return y
Note that algorithm TREE-MINIMUM, TRE-MAXIMUM, TREE-
SUCCESSOR, and TREE-PREDESSOR never look at the keys.

An inorder tree walk of an n-node BST can be implemented in (n)-time by


finding the minimum element in the tree with TREE-MINIMUM (x) algorithm
and then making n-1 calls to TREE-SUCCESSOR (x).

Another way of Implementing Inorder walk on Binary


Search Tree
Algorithm

find the minimum element in the tree with TREE-



MINIMUM
Make n-1 calls to TREE-SUCCESSOR

Let us show that this algorithm runs in (n) time. For a tree T, let mT be the
number of edges that are traversed by the above algorithm. The running time of
the algorithm for T is (mT). We make following claim:

mT is zero if T has at most one node


and 2e - r otherwise, where e is
the number of edges in the tree and r is
the length of the path from
root to the node holding the maximum
key.

Note that e = n - 1 for any tree with at least one node. This allows us to
prove the claim by induction on e (and therefore, on n).

Base case Suppose that e = 0. Then, either the tree is empty or consists
only of a single node. So, e = r = 0. Therefore, the claim holds.

Inductive step Suppose e > 0 and assume that the claim holds for all e' <
e. Let T be a binary search tree with e edges. Let x be the root, and T1 and T2
respectively be the left and right subtree of x. Since T has at least one edge,
either T1 or T2 respectively is nonempty. For each i = 1, 2, let ei be the
number of edges in Ti, pi the node holding the maximum key in Ti, and ri the
distance from pi to the root of Ti. Similarly, let e, p and r be the
correspounding values for T. First assume that both T1 and T2 are nonempty.
Then e = e1 + e2 + 2, p = p2, and r = r2 + 1. The action of the enumeration is as
follows:

• Upon being called, TREE-MINIMUM(x) traverses the left branch of x and enters T1.

• Once the root of T1 is visited, the edges of T1 are traversed as if T1 were the input tree. This continues until p1 is visited.

• When TREE-SUCCESSOR is called from p1, the upward path from p1 to x is traversed, and x is discovered to hold the successor.

• When TREE-SUCCESSOR is called from x, the right branch of x is taken.

• Once the root of T2 is visited, the edges of T2 are traversed as if T2 were the input tree. This continues until p2 is
reached, whereupon the algorithm halts.
By the above analysis, the number of edges that are traversed by the above
algorithm, mT, is

mT = 1 + (2e1 - r1) + (r1 + 1) +


1 + (2e2 - r2)
= 2(e1 + e2 + 2) - (r2 + 1)
= 2e -r
Therefore, the claim clearly holds for this case.

Next suppose that T2 is emply. Since e > 0, T1 is nonempty. Then e = e1 +


1. Since x does not have a right child, x holds the maximum. Therefore, p =
x and r = 0. The action of the enumeration algorithm is the first two steps.
Therefore, the number of edges that are traversed by the algorithm in question
is

mT = 1 + (2e1 - r1) + ( r1 +1)


= 2(e1 + 1) - 0
= 2e - r
Therefore, the claim holds for this case.

Finally, assume that T1 is empty. Then T2 is nonempty. It holds that e = e2 +


1, p = p2, and r = r2 + 1. This time x holds the minimum key and the
action of the enumeration algorithm is the last two steps. Therefore, the number
of edges that are traversed by the algorithm is

mT = 1 + (2e2 - r2)
= 2(e2+1) - (r2 + 1)
= 2e -r
Therefore, the claim holds for this case.

The claim is proven since e = n - 1 , mT 2n. On the other hand, at least


one edge has to be traversed when going from on node to another, so mT n
- 1. Therefore, the running time of the above algorithm is (n).

Consider any binary search tree T, let x be a leaf, and let y be the parent of x. Our goal
is to show that key[y] is

either the smallest key in T larger than key[x]

or the largest key in T smaller than key[x].

Proof  Suppose that x is a left child of y. Since key[y] ≥ key[x], we only
have to show that there is no node z with key[y] > key[z] > key[x]. Assume, to
the contrary, that there is such a z, and choose z so that it holds the smallest key
among such nodes. Note that for every node u ≠ z, x we have key[z] ≤ key[u] if and only if
key[x] ≤ key[u]. Hence, if we search for key[z], the search path is identical to that of
key[x] until the path reaches z or x. Since x is a leaf (meaning it has no
children), the search path for key[z] never reaches x strictly below z; therefore z is an ancestor of x.
Since y is the parent of x and y ≠ z, z must also be a proper ancestor of y. Because key[x] < key[z],
the node x lies in the left subtree of z, and hence so does y, which implies key[y] ≤ key[z]. This
contradicts the assumption that key[y] > key[z]. Therefore, there is no such z.

The case when x is a right child of y is symmetric.

INSERTION
To insert a node into a BST
1. find a leaf st the appropriate place and

2. connect the node to the parent of the leaf.

TREE-INSERT (T, z)
y ← NIL
x ← root [T]
while x ≠ NIL do
y←x
if key [z] <
key[x]
then x ←
left[x]
else x ←
right[x]
p[z] ← y
if y = NIL
then root [T] ←
z
else if key [z] <
key [y]
then left [y] ←
z
else right [y]
←z

Like other primitive operations on search trees, this algorithm begins at the root
of the tree and traces a path downward. Clearly, it runs in O(h) time on a tree
of height h.
Sorting
We can sort a given set of n numbers by first building a binary search tree
containing these numbers (using the TREE-INSERT procedure repeatedly to
insert the numbers one by one) and then printing the numbers with an inorder tree
walk.
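A compact runnable sketch of this sorting idea in Python (illustrative; the node has only key/left/right fields, with the parent pointer omitted for brevity, and the input keys are made up):

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def tree_insert(root, key):
    """Insert key by walking down from the root, as in TREE-INSERT."""
    if root is None:
        return Node(key)
    x = root
    while True:
        if key < x.key:
            if x.left is None:
                x.left = Node(key)
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = Node(key)
                return root
            x = x.right

def inorder(x, out):
    """INORDER-TREE-WALK: left subtree, root, right subtree."""
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)

root = None
for k in (12, 5, 18, 2, 9, 15, 19):
    root = tree_insert(root, k)
keys = []
inorder(root, keys)
print(keys)   # [2, 5, 9, 12, 15, 18, 19] -- sorted order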

Analysis
Best-case running time

Printing takes O(n) time, and the n insertions cost O(lg n) each (the tree is balanced, so each insertion goes to depth at most about lg n).
This gives a best-case running time of O(n lg n).

Worst-case running time

Printing still takes O(n) time, but the n insertions cost O(n) each (the tree is a single chain of nodes), which gives O(n²). The insertions
cost 1, 2, 3, . . . , n, which is an arithmetic series summing to about n²/2.

Deletion
Removing a node from a BST is a bit more complex, since we do not want to
create any "holes" in the tree. If the node has one child, then that child is
spliced to the parent of the node. If the node has two children, then its successor
has no left child; we copy the successor into the node and delete the successor
instead. TREE-DELETE (T, z) removes the node pointed to by z from the tree
T. It returns a pointer to the node removed so that the node can be put on a
free-node list, etc.

TREE-DELETE (T, z)
1. if left [z] = NIL .OR. right[z]
= NIL

2. then y ← z
3. else y ← TREE-SUCCESSOR
(z)

4. if left [y] ≠ NIL

5. then x ← left[y]

6. else x ← right [y]

7. if x ≠ NIL

8. then p[x] ← p[y]

9. if p[y] = NIL

10. then root [T] ← x

11. else if y = left [p[y]]

12. then left [p[y]] ← x

13. else right [p[y]] ← x

14. if y ≠ z

15. then key [z] ← key [y]

16. if y has other field, copy


them, too

17. return y

The procedure runs in O(h) time on a tree of height h.

Graph Algorithms

Graph Theory is an area of mathematics that deals with following types of


problems
Connection problems

Scheduling problems

Transportation problems

Network analysis

Games and Puzzles.

The Graph Theory has important applications in Critical path analysis, Social
psychology, Matrix theory, Set theory, Topology, Group theory, Molecular
chemistry, and Searching.

Those who would like to take a quick tour of essentials of graph theory please
go directly to "Graph Theory" from here.

Digraph
A directed graph, or digraph, G consists of a finite nonempty set of vertices V
and a finite set of edges E, where an edge is an ordered pair of vertices in V.
Vertices are also commonly referred to as nodes; edges are sometimes referred
to as arcs.

As an example, we could define a graph G = (V, E) as follows:

V = {1, 2, 3, 4}
E = {(1, 2), (2, 4), (4, 2), (4, 1)}

(The pictorial representation of this graph is omitted here.)

The definition of a graph implies that a graph can be drawn just from its
vertex set and its edge set. For example, an undirected graph on the vertex set
V = {1, 2, 3, 4} can be given by the edge set E = {(1,2), (2,4), (4,3),
(3,1), (1,4), (2,1), (4,2), (3,4), (1,3), (4,1)}. Notice that each edge seems to be listed
twice, once in each direction.

Another commonly drawn example is the Petersen graph (its figure is likewise omitted here).

We'll quickly covers following three important topics from algorithmic


perspective.
1. Transpose
2. Square

3. Incidence Matrix

1. Transpose
If graph G = (V, E) is a directed graph, its transpose GT = (V, ET) is the same
as graph G with all arrows reversed. We define the transpose of an adjacency
matrix A = (aij) to be the adjacency matrix AT = (aTij) given by aTij = aji. In other
words, rows of matrix A become columns of matrix AT, and columns of matrix
A become rows of matrix AT. Since in an undirected graph (u, v) and (v, u)
represent the same edge, the adjacency matrix A of an undirected graph is its
own transpose: A = AT.

Formally, the transpose of a directed graph G = (V, E) is the graph GT = (V, ET),
where ET = {(u, v) ∈ V×V : (v, u) ∈ E}. Thus, GT is G with all its edges reversed.

We can compute GT from G in the adjacency matrix representations and


adjacency list representations of graph G.

Algorithm for computing GT from G in representation of graph G is

ALGORITHM MATRIX
TRANSPOSE (G, GT)
For i = 0 to i < V[G]
For j = 0 to j V[G]
GT (j, i) = G(i, j)
j = j + 1;
i=i+1

To see why it works notice that if GT(i, j) is equal to G(j, i), the same thing is
achieved. The time complexity is clearly O(V2).
Algorithm for Computing GT from G in Adjacency-List
Representation
In this representation, a new adjacency list must be constructed for transpose of
G. Every list in adjacency list is scanned. While scanning adjacency list of v
(say), if we encounter u, we put v in adjacency-list of u.

ALGORITHM LIST
TRANSPOSE [G]
for u = 1 to V[G]
for each element v∈Adj[u]
Insert u into the front of
Adj[v]

To see why it works, notice if an edge exists from u to v, i.e., v is in the


adjacency list of u, then u is present in the adjacency list of v in the transpose of
G.

2. Square
The square of a directed graph G = (V, E) is the graph G² = (V, E²) such that (a,
b) ∈ E² if and only if for some vertex c ∈ V, both (a, c) ∈ E and (c, b) ∈ E. That is,
G² contains an edge between vertex a and vertex b whenever G contains a path
with exactly two edges from vertex a to vertex b.

Algorithms for Computing G2 from G in the


Adjacency-List Representation of G
Create a new array Adj'(A),
indexed by V[G]
For each v in V[G] do
For each u in Adj[v] do
\\ v has a path of length 2.
\\ to each of the neighbors
of u
make a copy of Adj[u] and
append it to Adj'[v]
Return Adj'(A).

For each vertex, we must make a copy of at most |E| list elements. The total
time is O(|V| * |E|).

Algorithm for Computing G2 from G in the


Adjacency-Matrix representation of G.
For i = 1 to V[G]
For j = 1 to V[G]
For k = 1 to V[G]
c[i, j] = c[i, j] + c[i,
k] * c[k, j]

Because of three nested loops, the running time is O(V3).

3. Incidence Matrix
The incidence matrix of a directed graph G = (V, E) is a |V| × |E| matrix B = (bij)
such that

bij = -1 if edge j leaves vertex i
bij =  1 if edge j enters vertex i
bij =  0 otherwise

If B is the incidence matrix and BT is its transpose, the diagonal of the product
matrix BBT represents the degrees of all the nodes, i.e., if P is the product matrix
BBT, then P[i, i] represents the degree of node i.

Specifically, we have

BBT(i, j) = Σ(e ∈ E) bie bTej = Σ(e ∈ E) bie bje

Now,

• If i = j, then bie bje = 1 whenever edge e enters or leaves vertex i, and 0 otherwise.
• If i ≠ j, then bie bje = -1 when e = (i, j) or e = (j, i), and 0 otherwise.

Therefore

BBT(i, j) = deg(i) = in_deg(i) + out_deg(i)        if i = j
BBT(i, j) = -(number of edges connecting i and j)  if i ≠ j

Breadth First Search (BFS)

Breadth First Search algorithm used in


Prim's MST algorithm.

Dijkstra's single source shortest path algorithm.

Like depth first search, BFS traverse a connected component of a given graph
and defines a spanning tree.

Algorithm Breadth First Search


BFS starts at a given vertex s, which is at level 0. In the first stage, we visit all
vertices at level 1, i.e., all vertices adjacent to s. In the second stage, we visit all
vertices at level 2: the new vertices adjacent to level 1 vertices; and so on. The BFS
traversal terminates when every vertex has been visited.

BREADTH-FIRST-SEARCH (G, s)
Input: a graph G and a start vertex s.
Output: edges labeled as discovery and cross edges in the connected component.

Create a queue Q.
ENQUEUE (Q, s)                 // insert s into Q
while Q is not empty do
    for each vertex v in Q do
        for all edges e incident on v do
            if edge e is unexplored then
                let w be the other endpoint of e
                if vertex w is unexplored then
                    - mark e as a discovery edge
                    - insert w into Q
                else
                    - mark e as a cross edge

BFS label each vertex by the length of a shortest path (in terms of number of
edges) from the start vertex.

Example (CLR): the sequence of figures (Steps 1 through 9) illustrating the BFS traversal is omitted here.
The starting vertex (node) is s.

Solid edge = discovery edge.
Dashed edge = cross edge (since none of them connects a vertex to one of its
ancestors).

As with depth-first search (DFS), the discovery edges form a spanning tree,
which in this case we call the BFS tree.
BFS can be used to solve the following problems:
Testing whether graph is
connected.

Computing a spanning
forest of graph.

Computing, for every


vertex in graph, a path
with the minimum
number of edges between
start vertex and current
vertex or reporting that no
such path exists.

Computing a cycle in
graph or reporting that no
such cycle exists.

Analysis
Total running time of BFS is O(V + E).
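A minimal runnable BFS in Python over an adjacency-list graph (illustrative; it records the distance, in edges, from the start vertex, which is how BFS labels vertices as described above, and the sample graph is made up):

from collections import deque

def bfs(adj, s):
    """Breadth-first search from s; returns a dict of shortest edge-distances."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:          # v is still unvisited
                dist[v] = dist[u] + 1  # (u, v) is a discovery edge
                q.append(v)
    return dist

adj = {1: [2, 4], 2: [1, 4], 3: [4], 4: [1, 2, 3]}
print(bfs(adj, 1))   # {1: 0, 2: 1, 4: 1, 3: 2}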

Bipartite Graph
We define a bipartite graph as follows: a bipartite graph is an undirected graph
G = (V, E) in which V can be partitioned into two sets V1 and V2 such that (u,
v) ∈ E implies either u ∈ V1 and v ∈ V2, or u ∈ V2 and v ∈ V1. That is, all edges go
between the two sets V1 and V2.

In order to determine whether a graph G = (V, E) is bipartite, we perform a BFS on it
with a small modification: whenever the BFS is at a vertex u and
encounters a vertex v that is already 'gray', our modified BFS checks
whether the depths of u and v are both even, or both odd. If either of
these conditions holds, which implies that d[u] and d[v] have the same parity,
then the graph is not bipartite. Note that this modification does not change the
running time of BFS, which remains O(V + E).

Formally, to check if the given graph is bipartite, the algorithm traverses the
graph, labeling the vertices 0, 1, or 2 corresponding to unvisited, partition 1 and
partition 2 nodes. If an edge is detected between two vertices in the same
partition, the algorithm returns 0.

Formally, to check whether the given graph is bipartite, the algorithm traverses
the graph labeling the vertices 0, 1, or 2, corresponding to unvisited, partition
1, and partition 2 nodes. If an edge is detected between two vertices in the same
partition, the algorithm returns 0.

ALGORITHM: BIPARTITE (G, s)

    for each vertex u ∈ V[G] - {s} do
        color[u] = WHITE
        d[u] = ∞
        partition[u] = 0
    color[s] = GRAY
    partition[s] = 1
    d[s] = 0
    Q = [s]
    while queue Q is not empty do
        u = head[Q]
        for each v in Adj[u] do
            if partition[u] = partition[v] then
                return 0
            else if color[v] = WHITE then
                color[v] = GRAY
                d[v] = d[u] + 1
                partition[v] = 3 - partition[u]
                ENQUEUE (Q, v)
        DEQUEUE (Q)
        color[u] = BLACK
    return 1

Correctness
As Bipartite (G, S) traverse the graph it labels the vertices with a partition
number consisted with the graph being bipartite. If at any vertex, algorithm
detects an inconsistency, it shows with an invalid return value,. Partition value
of u will always be a valid number as it was enqueued at some point and its
partition was assigned at that point. AT line 19, partition of v will unchanged if
it already set, otherwise it will be set to a value opposite to that of vertex u.

Analysis
The lines added to the BFS algorithm take constant time to execute, so the
running time is the same as that of BFS, which is O(V + E).
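A compact Python sketch of this check (illustrative only; it assumes the graph is an adjacency-list dictionary, uses the same 1/2 partition labels as the pseudocode, and loops over all vertices so disconnected graphs are also handled):

from collections import deque

# Illustrative sketch: BFS-based two-coloring; returns False as soon as two
# adjacent vertices fall into the same partition.
def is_bipartite(graph):
    partition = {}                       # vertex -> 1 or 2
    for s in graph:                      # handle disconnected graphs too
        if s in partition:
            continue
        partition[s] = 1
        q = deque([s])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in partition:
                    partition[v] = 3 - partition[u]   # opposite side, as in the pseudocode
                    q.append(v)
                elif partition[v] == partition[u]:
                    return False                      # odd cycle found
    return True

print(is_bipartite({0: [1], 1: [0, 2], 2: [1]}))        # True  (a path)
print(is_bipartite({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # False (a triangle)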
Diameter of Tree
The diameter of a tree T = (V, E) is the largest of all shortest-path distances
in the tree and is given by max[dist(u, v)]. As we have mentioned, BFS can be
used to compute, for every vertex in a graph, a path with the minimum number
of edges between the start vertex and that vertex. It is therefore quite easy to
compute the diameter of a tree: for each vertex in the tree, we run the BFS
algorithm to get its shortest-path distances, and with a global variable we
record the largest of all the shortest paths found. This clearly takes
O(V(V + E)) time.

ALGORITHM: TREE_DIAMETER (T)

    maxlength = 0
    for s = 0 to |V[T]| - 1 do
        temp = BFS(T, s)        // largest shortest-path distance from s
        if maxlength < temp then
            maxlength = temp
    return maxlength
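Here is a minimal Python sketch of the same idea (illustrative; the tree is assumed to be an adjacency-list dictionary, and bfs_ecc is a hypothetical helper that returns the largest BFS distance, i.e. the eccentricity, of a start vertex):

from collections import deque

# Illustrative sketch of TREE_DIAMETER: run BFS from every vertex and keep
# the largest shortest-path distance seen; overall cost is O(V(V + E)).
def bfs_ecc(tree, start):
    """Return the largest BFS distance (eccentricity) from `start`."""
    dist = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in tree[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def tree_diameter(tree):
    return max(bfs_ecc(tree, s) for s in tree)

# Path 0 - 1 - 2 - 3: the diameter is 3 edges.
tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(tree_diameter(tree))   # 3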

Depth First Search (DFS)

Depth first search (DFS) is useful for

• Finding a path from one vertex to another.
• Determining whether or not a graph is connected.
• Computing a spanning tree of a connected graph.

DFS uses the backtracking technique.

Algorithm Depth First Search

The algorithm starts at a specific vertex S in G, which becomes the current
vertex. The algorithm then traverses the graph by considering an edge (u, v)
incident to the current vertex u. If the edge (u, v) leads to an already visited
vertex v, then we backtrack to the current vertex u. If, on the other hand, edge
(u, v) leads to an unvisited vertex v, then we go to v and v becomes our current
vertex. We proceed in this manner until we reach a "dead end". At this point we
start backtracking. The process terminates when backtracking leads back to the
start vertex.

Edges that lead to a new vertex are called discovery or tree edges, and edges
that lead to an already visited vertex are called back edges.

DEPTH FIRST SEARCH (G, v)
Input: A graph G and a vertex v.
Output: Edges labeled as discovery and back edges in the connected component.

    for all edges e incident on v do
        if edge e is unexplored then
            w ← opposite(v, e)        // the endpoint of e distinct from v
            if vertex w is unexplored then
                mark e as a discovery edge
                recursively call DFS (G, w)
            else
                mark e as a back edge

Example (CLR)
Solid Edge = discovery or tree edge
Dashed Edge = back edge.

Each vertex has two time stamps: the first time stamp records when the vertex
is first discovered, and the second time stamp records when the search finishes
examining the vertex's adjacency list.
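The following Python sketch records exactly these two time stamps, d[u] at discovery and f[u] at finish (illustrative only; the graph is assumed to be a dictionary of adjacency lists and the function names are made up):

# Illustrative sketch: recursive DFS recording the discovery time d[u] and
# the finish time f[u] for every vertex of an adjacency-list graph.
def dfs_timestamps(graph):
    d, f = {}, {}
    time = 0
    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                  # first time stamp: u is discovered
        for v in graph[u]:
            if v not in d:           # (u, v) is a discovery (tree) edge
                visit(v)
        time += 1
        f[u] = time                  # second time stamp: adjacency list exhausted
    for u in graph:
        if u not in d:
            visit(u)
    return d, f

graph = {'u': ['v', 'x'], 'v': ['y'], 'x': ['v'], 'y': ['x'], 'w': ['y', 'z'], 'z': ['z']}
d, f = dfs_timestamps(graph)
print(d['u'], f['u'])   # 1 8 : u is discovered first and finished last in its DFS-tree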
The DFS algorithm can be used to solve the following problems:

• Testing whether a graph is connected.
• Computing a spanning forest of a graph.
• Computing a path between two vertices of a graph, or equivalently reporting
  that no such path exists.
• Computing a cycle in a graph, or equivalently reporting that no such cycle
  exists.

Analysis
The running time of DFS is O(V + E).

Consider vertices u and v in V[G] after a DFS. Suppose vertex v is a descendant
of vertex u in some DFS-tree. Then we have d[u] < d[v] < f[v] < f[u] because of
the following reasons:

1. Vertex u was discovered before vertex v; and

2. Vertex v was fully explored before vertex u was fully explored.

Note that the converse also holds: if d[u] < d[v] < f[v] < f[u], then vertex v
is in the same DFS-tree as vertex u and vertex v is a descendant of vertex u.

Now suppose vertex u and vertex v are in different DFS-trees, or suppose they
are in the same DFS-tree but neither vertex is the descendant of the other. Then
one vertex was discovered and fully explored before the other was discovered,
i.e., f[u] < d[v] or f[v] < d[u].

Consider a directed graph G = (V, E). After a DFS of graph G we can put each
edge into one of four classes:

• A tree edge is an edge in a DFS-tree.

• A back edge connects a vertex to an ancestor in a DFS-tree. Note that a
  self-loop is a back edge.

• A forward edge is a nontree edge that connects a vertex to a descendant in a
  DFS-tree.

• A cross edge is any other edge in graph G. It connects vertices in two
  different DFS-trees, or two vertices in the same DFS-tree neither of which is
  the ancestor of the other.
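These four classes can be recovered mechanically during the DFS itself. The Python sketch below (illustrative, not from the text) classifies each edge using the colour of its far endpoint and the discovery times: a WHITE endpoint gives a tree edge, a GRAY endpoint a back edge, and a BLACK endpoint a forward edge if it was discovered later than the current vertex and a cross edge otherwise.

# Illustrative sketch: classify each directed edge during a DFS using the
# colour of the endpoint and the discovery times, as described above.
WHITE, GRAY, BLACK = 0, 1, 2

def classify_edges(graph):
    color = {u: WHITE for u in graph}
    d = {}
    time = 0
    kinds = {}                                   # (u, v) -> edge type
    def visit(u):
        nonlocal time
        color[u] = GRAY
        time += 1
        d[u] = time
        for v in graph[u]:
            if color[v] == WHITE:
                kinds[(u, v)] = 'tree'
                visit(v)
            elif color[v] == GRAY:
                kinds[(u, v)] = 'back'           # v is an ancestor of u
            elif d[u] < d[v]:
                kinds[(u, v)] = 'forward'        # v is a finished descendant of u
            else:
                kinds[(u, v)] = 'cross'
        color[u] = BLACK
    for u in graph:
        if color[u] == WHITE:
            visit(u)
    return kinds

g = {1: [2, 3], 2: [3], 3: [], 4: [3]}
print(classify_edges(g))
# {(1, 2): 'tree', (2, 3): 'tree', (1, 3): 'forward', (4, 3): 'cross'}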

Lemma 1 An edge (u, v) is a back edge if and only if
d[v] < d[u] < f[u] < f[v].
Proof
(=> direction) From the definition of a back edge, it connects vertex u to an
ancestor vertex v in a DFS-tree. Hence, vertex u is a descendant of vertex v.
Corollary 23.7 in CLR states that vertex u is a proper descendant of vertex v
if and only if d[v] < d[u] < f[u] < f[v]. This proves the forward direction. □

(<= direction) Again by Corollary 23.7 (CLR), vertex u is a proper descendant
of vertex v. Hence if an edge (u, v) exists from u to v, then it is an edge
connecting a descendant vertex u to its ancestor vertex v, and so it is a back
edge. This proves the backward direction.

Conclusion: Immediate from both directions.


Lemma 2 An edge (u, v) is a cross edge if and only if
d[v] < f[v] < d[u] < f[u].
Proof
First, take the => direction.

Observation 1 For an edge (u, v), d[u] < f[u] and d[v] < f[v], since any
vertex has to be discovered before we can finish exploring it.

Observation 2 From the definition of a cross edge, it is an edge which is not a
tree edge, a forward edge, or a back edge. This implies that neither the
relationship for a forward edge [ d[u] < d[v] < f[v] < f[u] ] nor the one
for a back edge [ d[v] < d[u] < f[u] < f[v] ] can hold for a cross edge.
From the above two observations we conclude that the only two possibilities
are:

1. d[u] < f[u] < d[v] < f[v], and

2. d[v] < f[v] < d[u] < f[u].

When the cross edge (u, v) is discovered we must be at vertex u, and vertex v
must be black. The reason is that if v were white then edge (u, v) would be a
tree edge, and if v were gray then edge (u, v) would be a back edge. Therefore,
d[v] < d[u], and hence possibility (2) holds.

Now take the <= direction.

We can prove this direction by eliminating the possible edge types that the
given relation allows. If d[v] < f[v] < d[u] < f[u], then edge (u, v) cannot
be a tree edge or a forward edge. It also cannot be a back edge, by Lemma 1.
Since edge (u, v) is neither a tree, forward, nor back edge, it must be a cross
edge (see the definition of a cross edge above).

Conclusion: Immediate from both directions.

Let us now determine whether or not an undirected graph contains a cycle. It is
not difficult to see that the algorithm for this problem is very similar to
DFS(G), except that when the vertex adjacent to the current vertex is already
GRAY, a cycle is detected. While doing this, the algorithm also takes care that
it does not report a cycle when the GRAY vertex is simply the other endpoint of
the tree edge just traversed from an ancestor to a descendant.

ALGORITHM DFS_DETECT_CYCLES (G)

    for each vertex u in V[G] do
        color[u] = WHITE
        predecessor[u] = NIL
    time = 0
    for each vertex u in V[G] do
        if color[u] = WHITE then
            DFS_visit(u)
The subalgorithm DFS_visit(u) is as follows:

DFS_visit(u)
    color[u] = GRAY
    d[u] = time = time + 1
    for each v in Adj[u] do
        if color[v] = GRAY and predecessor[u] ≠ v then
            return "cycle exists"
        if color[v] = WHITE then
            predecessor[v] = u
            recursively call DFS_visit(v)
    color[u] = BLACK
    f[u] = time = time + 1

Correctness
To see why this algorithm works, suppose the node v about to be visited is a
gray node. Then there are two possibilities:

1. The node v is the parent node of u, and we are going back along the tree
   edge that we traversed from v to u. In that case it is not a cycle.

2. The node v has already been encountered during DFS_visit, and what we are
   traversing now is a back edge; hence a cycle is detected.


Time Complexity
If the graph G has no cycle, then it has at most |V| - 1 edges. Hence, if G does
contain a cycle, the algorithm will detect it at the latest when the |V|-th edge
is examined, if not before. Therefore, the algorithm runs in O(V) time.
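A minimal Python sketch of this cycle test (illustrative; the undirected graph is assumed to be an adjacency-list dictionary, and the parent is passed explicitly instead of keeping a predecessor array):

# Illustrative sketch of DFS_DETECT_CYCLES for an undirected graph: a GRAY
# neighbour that is not the parent means a back edge, i.e. a cycle.
WHITE, GRAY, BLACK = 0, 1, 2

def has_cycle(graph):
    color = {u: WHITE for u in graph}
    def visit(u, parent):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY and v != parent:
                return True                  # back edge: cycle detected
            if color[v] == WHITE and visit(v, u):
                return True
        color[u] = BLACK
        return False
    return any(color[u] == WHITE and visit(u, None) for u in graph)

print(has_cycle({0: [1], 1: [0, 2], 2: [1]}))          # False (a path)
print(has_cycle({0: [1, 2], 1: [0, 2], 2: [0, 1]}))    # True  (a triangle)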

Topological Sort

A topological sort of a directed acyclic graph (DAG) G is an ordering of the
vertices of G such that for every edge (vi, vj) of G we have i < j. That is, a
topological sort is a linear ordering of all the vertices such that if DAG G
contains an edge (vi, vj), then vi appears before vj in the ordering. If the
graph is cyclic, then no linear ordering is possible.

In simple words, a topological ordering is an ordering such that any directed
path in DAG G traverses the vertices in increasing order.

It is important to note that if the graph is not acyclic, then no linear ordering is
possible. That is, we must not have circularities in the directed graph. For
example, in order to get a job you need to have work experience, but in order to
get work experience you need to have a job (sounds familiar?).

Theorem A directed graph has a topological ordering if and only if it is
acyclic.

Proof:
Part 1. If G has a topological ordering, then G is acyclic.

Suppose G has a topological ordering, and suppose, for contradiction, that G
has a cycle through vertices vi0, vi1, . . . , vik-1.
Because we have a topological ordering, we must have i0 < i1 < . . . < ik-1 < i0,
which is clearly impossible.
Therefore, G must be acyclic.

Part 2. If G is acyclic, then G has a topological ordering.

Suppose G is acyclic.
Since G is acyclic, it must have a vertex with no incoming edges.
Let v1 be such a vertex. If we remove v1 from the graph, together with
its outgoing edges, the resulting digraph is still acyclic. Hence the resulting
digraph also has a vertex with no incoming edges, and we let v2 be such a
vertex. By repeating this process until the digraph becomes empty, we obtain an
ordering v1 < v2 < . . . < vn of the vertices of G. Because of the construction,
if (vi, vj) is an edge of G, then vi must be deleted before vj can be deleted,
and thus i < j. Thus, v1, . . . , vn is a topological sorting.

Algorithm Topological Sort

TOPOLOGICAL_SORT (G)
1. For each vertex, find the finish time by calling DFS(G).
2. Insert each finished vertex into the front of a linked list.
3. Return the linked list.

Example
Given a graph G with start node u (diagram omitted).

The total running time of topological sort is O(V + E), since the DFS(G) search
takes O(V + E) time and it takes O(1) time to insert each of the |V| vertices
onto the front of the linked list.
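A short Python sketch of this procedure (illustrative only; the DAG is given as an adjacency-list dictionary, and the "linked list" is modelled by prepending finished vertices to an ordinary list):

# Illustrative sketch of TOPOLOGICAL_SORT: run DFS and push each vertex onto
# the front of a list as it finishes.
def topological_sort(graph):
    order = []                      # plays the role of the linked list
    visited = set()
    def visit(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v)
        order.insert(0, u)          # insert the finished vertex at the front
    for u in graph:
        if u not in visited:
            visit(u)
    return order

dag = {'shirt': ['tie'], 'tie': ['jacket'], 'trousers': ['shoes', 'jacket'],
       'jacket': [], 'shoes': []}
print(topological_sort(dag))
# ['trousers', 'shoes', 'shirt', 'tie', 'jacket'] -- every edge goes left to right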
