
DATA STRUCTURE AND ALGORITHMS

WHY DO WE STUDY DATA STRUCTURE?


We study data structures so that we can learn to write more efficient
programs.

But why must programs be efficient when new computers are faster
every year?

The reason is that our ambitions grow with our capabilities.

Instead of rendering efficiency needs obsolete, the modern revolution in computing power and storage capability merely raises the efficiency stakes as we computerize more complex tasks.

The quest for program efficiency need not and should not conflict with
sound design and clear coding.

Creating efficient programs has little to do with “programming tricks” but rather is based on good organization of information and good algorithms.

A programmer who has not mastered the basic principles of clear design is not likely to write efficient programs.

Representing information is fundamental to computer science.

The primary purpose of most computer programs is not to perform calculations, but to store and retrieve information — usually as fast as possible.

For this reason, the study of data structures and the algorithms that
manipulate them is at the heart of computer science.

We have three primary goals:

• The first is to present the commonly used data structures.

These form a programmer’s basic data structure “toolkit.” For many problems, some data structure in the toolkit provides a good solution.
• The second goal is to introduce the idea of tradeoffs and
reinforce the concept that there are costs and benefits
associated with every data structure.

This is done by describing, for each data structure, the amount of space and
time required for typical operations.

• The third goal is to teach how to measure the effectiveness of a data structure or algorithm.

Only through such measurement can you determine which data structure in
your toolkit is most appropriate for a new problem.

What Is a Data Structure?

A data structure is a way of organizing input data and operations which can
be performed on this data (e.g., add, delete, search).

Goals:

• Make a particular algorithm more efficient in terms of computational time and memory use.

• Reuse a given data structure for many different applications.

A data structure is a theoretical concept of representing data and operations on them which can be implemented in any programming language (C, C++, Java, etc.).

In the most general sense, a data structure is any data representation and its associated operations.

• Even an integer or floating point number stored on the computer can be viewed as a simple data structure.

More typically, a data structure is meant to be an organization or structuring for a collection of data items.

• A sorted list of integers stored in an array is an example of such a structuring.

Using the proper data structure can make the difference between a
program running in a few seconds and one requiring many days.
A solution is said to be efficient if it solves the problem within the
required resource constraints.

• Examples of resource constraints include the total space available to store the data — possibly divided into separate main memory and disk space constraints — and the time allowed to perform each subtask.

The cost of a solution is the amount of resources that the solution consumes.

• Most often, cost is measured in terms of one key resource such as time, with the implied assumption that the solution meets the other resource constraints.

When selecting a data structure to solve a problem, you should follow these
steps:

1. Analyze your problem to determine the basic operations that must be supported.

Examples of basic operations include inserting a data item into the data
structure, deleting a data item from the data structure, and finding a
specified data item.

2. Quantify the resource constraints for each operation.

3. Select the data structure that best meets these requirements.

This three-step approach to selecting a data structure operationalizes a data-centered view of the design process.

Costs and Benefits

Each data structure has associated costs and benefits.

In practice, it is hardly ever true that one data structure is better than
another for use in all situations.

If one data structure or algorithm is superior to another in all respects, the inferior one will usually have long been forgotten.
A data structure requires: a certain amount of space for each data item it
stores, a certain amount of time to perform a single basic operation, and
a certain amount of programming effort.

Each problem has constraints on available space and time.

Each solution to a problem makes use of the basic operations in some relative proportion, and the data structure selection process must account for this.

Example 1

A bank must support many types of transactions with its customers, but we
will examine a simple model where customers wish to open accounts, close
accounts, and add money or withdraw money from accounts.

We can consider this problem at two distinct levels:

(1) the requirements for the physical infrastructure and workflow process
that the bank uses in its interactions with its customers, and

(2) the requirements for the database system that manages the accounts.

The typical customer opens and closes accounts far less often than he or
she accesses the account.

Customers are willing to wait many minutes while accounts are created
or deleted but are typically not willing to wait more than a brief time for
individual account transactions such as a deposit or withdrawal. These
observations can be considered as informal specifications for the time
constraints on the problem.

It is common practice for banks to provide two tiers of service. Human tellers or automated teller machines (ATMs) support customer access to account balances and updates such as deposits and withdrawals.

Teller and ATM transactions are expected to take little time. Opening or
closing an account can take much longer (perhaps up to an hour from the
customer’s perspective).

For simplicity, assume that if money is added or removed, this transaction simply changes the value stored in an account record.
Adding a new account to the database is allowed to take several minutes.
Deleting an account need have no time constraint, because from the
customer’s point of view all that matters is that all the money be returned
(equivalent to a withdrawal).

When considering the choice of data structure to use in the database system
that manages customer accounts, we see that a data structure that has
little concern for the cost of deletion, but is highly efficient for search
and moderately efficient for insertion, should meet the resource
constraints imposed by this problem.

Records are accessible by unique account number (sometimes called an exact-match query).

One data structure that meets these requirements is the hash table.

Hash tables allow for extremely fast exact-match search. A record can
be modified quickly when the modification does not affect its space
requirements.

Hash tables also support efficient insertion of new records. While deletions can also be supported efficiently, too many deletions lead to some degradation in performance for the remaining operations.

However, the hash table can be reorganized periodically to restore the system to peak efficiency.

Abstract Data Types and Data Structures

A type is a collection of values. For example, the Boolean type consists of the values true and false.

The integers also form a type. An integer is a simple type because its
values contain no subparts.

A bank account record will typically contain several pieces of information such as name, address, account number, and account balance. Such a record is an example of an aggregate type or composite type.

A data item is a piece of information or a record whose value is drawn from a type. A data item is said to be a member of a type.
A data type is a type together with a collection of operations to manipulate
the type. For example, an integer variable is a member of the integer data
type.

Addition is an example of an operation on the integer data type.

ABSTRACT DATA TYPE (ADT)

To understand the design of a data structure, we use an abstract model called an abstract data type (ADT) that specifies

• the type of data stored

• the operations that support the data

An abstract data type (ADT) is the realization of a data type as a software component.

The interface of the ADT is defined in terms of a type and a set of operations on that type.

The behaviour of each operation is determined by its inputs and outputs. An ADT does not specify how the data type is implemented.

These implementation details are hidden from the user of the ADT and
protected from outside access, a concept referred to as encapsulation.

A data structure is the implementation of an ADT. In an object-oriented language such as C++, an ADT and its implementation together make up a class.

Each operation associated with the ADT is implemented by a member function or method. The variables that define the space required by a data item are referred to as data members.

An object is an instance of a class, that is, something that is created and takes up storage during the execution of a computer program.

The term “data structure” often refers to data stored in a computer’s main
memory.

The related term file structure often refers to the organization of data on
peripheral storage, such as a disk drive or CD-ROM.
Examples of ADTs – linked lists, queues, stacks, trees, heaps,
graphs.

Data Organization

The relation between particular elements of a data structure can be:

• linear – list, array, vector, linked list, sequences, queue, stack, deque

• hierarchical – tree, heap

• arbitrary – graph, network

Algorithm

An algorithm is a way of solving a problem by:


• performing an unambiguous sequence of instructions in a
finite amount of time and then halting.
• transforming input data into output data in finite time.
An algorithm can be considered as an abstraction of a computer program
and may be implemented in any programming language.

The analysis of an algorithm should be done theoretically before it is implemented. We want to
• establish the algorithm efficiency using asymptotic notation
techniques
– get a relation between the number of its basic operations and the
size or magnitude of input data
– provide an upper bound on the running time function.
• prove the correctness using different proving techniques
– a correct algorithm, for any acceptable input data, should return
correct output and halt.

An algorithm can be defined as a step-by-step procedure for solving a problem. It helps the user arrive at the correct result in a finite number of steps.

Consider the following step-by-step procedure to display the first 10 natural numbers:-

1. Set the value of counter to 1

2. Display counter

3. Increment counter by 1

4. If counter <=10, go to step 2

The preceding step-by-step procedure is an algorithm because it produces the correct result in a finite number of steps.
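For concreteness, here is how the four steps above might look as a small C++ program (a minimal sketch; the while test plays the role of the "go to step 2" check):

#include <iostream>

int main() {
    int counter = 1;                      // 1. Set the value of counter to 1
    while (counter <= 10) {               // 4. If counter <= 10, go to step 2
        std::cout << counter << '\n';     // 2. Display counter
        counter = counter + 1;            // 3. Increment counter by 1
    }
    return 0;
}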

An algorithm has five important properties:-

Finiteness: An algorithm terminates after a finite number of steps.

Definiteness: Each step in an algorithm is unambiguous. This means that the action specified by the step cannot be interpreted in multiple ways and can be performed without any confusion.

Input: An algorithm accepts zero or more inputs.

Output: An algorithm produces at least one output.

Effectiveness: An algorithm consists of basic instructions that are realizable. This means that the instructions can be performed by using the given inputs in a finite amount of time.

A problem can be solved by using a computer only if an algorithm can be written for it. In addition, the use of algorithms provides many other benefits:-

While writing an algorithm, you identify the step-by-step procedure, the major decision points, and the variables necessary to solve the problem. This helps you in the development of the corresponding program.

Identification of the procedure and the decision points reduces the problem to a series of smaller problems of more manageable size. Therefore, problems that would be difficult or impossible to solve as a whole can be approached as a series of small, solvable subproblems.

With the use of an algorithm, decision making becomes a more rational process. This is because algorithms are composed of subtasks, where each subtask is atomic in nature and is supported by facts.

With the use of an algorithm, the same specified steps are used
for performing the tasks. This makes the process more
consistent and reliable.

Role of Data Structures

Multiple algorithms can be designed to solve a particular problem. However, the algorithms may differ in how efficiently they can solve the problem. In such a situation, the algorithm that provides the maximum efficiency should be used for solving the problem. Efficiency here means that the algorithm should work in minimal time and use minimal memory.

One of the basic techniques for improving the efficiency of algorithms is to structure the data that they operate on in such a way that the resulting operations can be performed efficiently.

The way in which the various data elements are organized in memory with
respect to each other is called a data structure.

Data can be organized in many different ways; therefore, you can create as
many data structures as you want. However, there are some standard data
structures that have proved useful over the years. These include arrays,
linked lists, stacks, queues, trees and graphs.

Suppose you have to write an algorithm that enables a printer to service the requests of multiple users on a first-come-first-served (FCFS) basis. In this case, using a data structure that stores and retrieves the requests in the order of their arrival would be much more efficient than a data structure that stores and retrieves the requests in a random order.

In addition to improving the efficiency of an algorithm, the use of appropriate data structures also allows you to overcome some other programming challenges, such as:-

Simplifying complex problems

Creating standard, reusable code components

Creating programs that are easy to understand and maintain

Consider an example where you have to find the maximum value in a set of 50 numbers. In this scenario, you can either use 50 variables or a data structure, such as an array of size 50, to store the numbers. When 50 different variables are used to store the numbers, the algorithm to determine the maximum value among the numbers can be written as:

Accept 50 numbers and store them in num1, num2, num3, ..., num50

Set max = num1

If num2 > max then: max = num2

If num3 > max then: max = num3

...

If num50 > max then: max = num50

Display max

On the other hand, when an array of size 50 is used, the algorithm can be written as:-

1. Set max = num[0]
2. Repeat step 3, varying i from 1 to 49
3. If num[i] > max then: max = num[i]
4. Display max
From the preceding two algorithms, it can be seen that the algorithm using
an array manipulates memory much more efficiently than the algorithm
using 50 variables.

Also, the algorithm using an array involves fewer steps and is, therefore, easier to understand and implement as compared to the algorithm that uses 50 variables.
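A direct C++ translation of the array version might look as follows (a minimal sketch; reading the 50 numbers from standard input is an assumption, since the pseudocode does not say where they come from):

#include <iostream>

int main() {
    int num[50];
    for (int i = 0; i < 50; ++i)          // accept 50 numbers
        std::cin >> num[i];

    int max = num[0];                     // Set max = num[0]
    for (int i = 1; i <= 49; ++i)         // vary i from 1 to 49
        if (num[i] > max)                 // If num[i] > max
            max = num[i];                 //     then: max = num[i]

    std::cout << max << '\n';             // Display max
    return 0;
}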

Designing Algorithms and measuring their efficiency

Designing an algorithm for a given problem is a difficult intellectual exercise. This is because there is no systematic method for designing an algorithm. Moreover, there may exist more than one algorithm to solve a problem. Writing an effective algorithm for a new problem, or writing a better algorithm for an already existing one, is an art as well as a science, because it requires both creativity and insight.

Classification of Algorithms

Generally, there are two types of algorithms:

iterative and recursive.

Classification of algorithms based on design techniques:

1. divide and conquer

2. dynamic programming

3. greedy

4. brute force

5. backtracking

6. branch and bound

7. Randomized

Divide and Conquer Approach

The divide and conquer approach is an algorithm design technique that involves breaking down the problem recursively into subproblems until the subproblems become so small that they can directly be solved. The solutions to the subproblems are then combined to give a solution to the original problem.

Divide and conquer is a powerful approach for solving conceptually difficult problems. It simply requires you to find a way of:-

Breaking the problem into subproblems

Solving the trivial cases

Combining the solutions of the subproblems to solve the original problem

Divide and conquer often provides a natural way to design efficient algorithms.

EXAMPLE
Consider an example where you have to find the minimum value in a list of numbers. The list is as shown in the following figure:-

To find the minimum value, you can divide the list into two halves, as shown
in the following figure:-

Again, divide each of the two lists into two halves as shown in the following
figure:-

Now, there are only two elements in each list. At this stage, compare the two elements in each list to find the minimum of the two.

The minimum values from each of the four lists are shown in the following figure:-

3 2 1 8
Minimum values in the four lists

Again, compare the first two minimum values to determine their minimum.

Also compare the last two minimum values to determine their minimum. The
two minimum values thus obtained are shown in the following figure:-

2 1
Minimum values in the two halves of the original list

Again, compare the two final minimum values to obtain the overall minimum
value, which is 1 in the preceding example.
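The halving-and-combining scheme just described can be sketched recursively in C++ (a minimal sketch; the eight-element list is illustrative, chosen so that the four pairwise minima are 3, 2, 1 and 8, as in the figures above):

#include <algorithm>
#include <iostream>

// Divide the range [lo, hi] in half, find the minimum of each half
// recursively, and combine the two results.
int minValue(const int a[], int lo, int hi) {
    if (lo == hi)                         // a one-element list is its own minimum
        return a[lo];
    int mid = (lo + hi) / 2;
    int leftMin  = minValue(a, lo, mid);
    int rightMin = minValue(a, mid + 1, hi);
    return std::min(leftMin, rightMin);   // combine step
}

int main() {
    int a[] = {3, 5, 2, 6, 4, 1, 9, 8};       // pairwise minima: 3, 2, 1, 8
    std::cout << minValue(a, 0, 7) << '\n';   // prints 1
}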

Greedy Approach
The greedy approach is an algorithm design technique that selects the best
possible option at any given time. Algorithms based on the greedy
approach are used for solving optimization problems, where you need to
maximize profits or minimize costs under a given set of conditions. Some
examples of optimization problems are:-

Finding the shortest distance from an originating city to a set of destination cities, given the distances between the pairs of cities.

Finding the minimum number of currency notes required for an amount, where an arbitrary number of notes for each denomination are available.

Selecting items with maximum value from a given set of items, where the total weight of the selected items cannot exceed a given value.

Consider an example where you have to fill a bag of capacity 10 kg by selecting items (from a set of items) whose weights and values are given in the following table.
Item   Weight (in kg)   Value (in $/kg)   Total Value (in $)
A      2                200               400
B      3                150               450
C      4                200               800
D      1                50                50
E      5                100               500

Weights and Values of Items

A greedy algorithm acts greedily, and therefore selects the item with the maximum total value at each stage. Therefore, first of all, item C with a total value of $800 and weight 4 kg will be selected. Next, item E with a total value of $500 and weight 5 kg will be selected. The next item with the highest value is item B with a total value of $450 and weight 3 kg. However, if this item is selected, the total weight of the selected items will be 12 kg (4 + 5 + 3), which is more than the capacity of the bag.

Therefore, we discard item B and search for the item with the next highest value. This is item A, having a total value of $400 and a total weight of 2 kg. However, this item also cannot be selected, because if it is selected, the total weight of the selected items will be 11 kg (4 + 5 + 2). Now there is only one item left, that is, item D with a total value of $50 and a weight of 1 kg. This item can be selected, as it makes the total weight equal to 10 kg.

The selected items and their total weights are listed in the following table.
Item    Weight (in kg)   Total value (in $)
C       4                800
E       5                500
D       1                50
Total   10               1350

Items selected using Greedy Approach

For most problems, greedy algorithms usually fail to find the globally optimal solution. This is because they usually don’t operate exhaustively on all the data. They can make commitments to certain choices too early, which prevents them from finding the best overall solution later.

This can be seen in the preceding example, where the greedy algorithm selects items with a total value of only $1350. However, if the items were selected in the sequence depicted by the following table, the total value would have been much greater, with the weight still being 10 kg.

Item    Weight (in kg)   Total value (in $)
C       4                800
B       3                450
A       2                400
D       1                50
Total   10               1700

In the preceding example you can observe that the greedy approach
commits to item E very early.

This prevents it from determining the best overall solution later.

Nevertheless, the greedy approach is useful because it is quick and easy to implement.

Moreover, it often gives a good approximation to the optimal value.

Brute Force algorithms
Brute Force algorithms use non-sophisticated approaches to solve a given
problem. Typically they are useful for small domains due to the total cost of
examining all possible solutions.

Examples: sequential search of a sorted array, or finding a Hamiltonian circuit.

Dynamic programming algorithmic techniques

Dynamic programming algorithmic techniques are used for solving optimization problems by performing the following steps:

- partition a problem into overlapping subproblems,

- recursively solve the subproblems and memoize their solutions to avoid solving the same subproblems repeatedly,

- use an optimal substructure approach to be sure that the optimal solutions of local subproblems lead to the optimal solution of the global problem.

Two approaches are used: top-down and bottom-up.

The dynamic programming algorithm ends up with a better run time than the brute force algorithm.

Examples: Fibonacci numbers, Floyd’s all-pairs shortest paths algorithm, matrix chain multiplication, longest common subsequence, activity scheduling problem.

Backtracking algorithm
A backtracking algorithm views the problem to be solved as a sequence of decisions and systematically considers all possible outcomes for each decision to solve the overall problem.
For example, it finds a solution to the first subproblem and then attempts to recursively solve the other subproblems based on this first solution. If it cannot, it backtracks and tries the next possible solution to the first subproblem, and so on.

Backtracking terminates when there are no more solutions to the first subproblem. In this sense, backtracking algorithms are like brute-force algorithms.

However, backtracking algorithms are distinguished by the way in which the space of possible solutions is explored. Sometimes a backtracking algorithm can detect that an exhaustive search is unnecessary and, therefore, it can perform much better.

An "intelligent backtracking" keeps track of the dependencies between


sub-problems and only re-solves those which depend on an earlier solutions
which have changed.

Examples: topological sort, Depth First Search, n-queens problem.

Branch and bound algorithms

Branch and bound algorithms find an optimal solution by keeping track of the best solution found so far. If a partial solution is not better than the current best, it is abandoned. The algorithm traverses a spanning tree of the solution space and prunes the solution tree, thereby reducing the number of solutions to be considered.

Examples: Linear programming and optimization problems.

Randomized approach
Randomized approach: Any algorithm that makes some random (or
pseudo-random) choices.

Examples: randomized quick sort, pseudo-random number generators, probabilistic algorithms.
Efficiency of An Algorithm
Determining the Efficiency of an Algorithm

The greatest difficulty in solving programming problems is not how to solve the problem, but how to solve the problem efficiently. Factors that affect the efficiency of a program include:

the speed of the machine,

the compiler

the operating system,

the programming language, and

the size of the input.

However, in addition to these factors, the way the data of a program is organized and the algorithm used to solve the problem also have a significant impact on the efficiency of a program.

The efficiency of an algorithm can be computed by determining the amount of resources it consumes. The primary resources that an algorithm consumes are:

Time: The CPU time required to execute the algorithm

Space: The amount of memory used by the algorithm for execution

The fewer resources an algorithm uses, the more efficient it is.

The efficiency analysis concentrates on basic operations:

• data interchanges (swaps)

• comparisons (<,>,≤,≥,==, !=)

• arithmetic operations (+,∗,−,/)

These operations are performed on data. Therefore the number of data items (= n) is directly connected with the number of operations performed by an algorithm, and this is expressed as a running time function, f(n).

EXAMPLE
Compute the sum of 1+2+3+...+n for any integer n > 0.

Algorithm 1:

1. sum = 0; // one assignment
2. for i = 1 to n // n iterations of the for loop
3. sum = sum + i; // 1 assignment + 1 addition per iteration

Algorithm 1 requires no more than f(n) = (n+1)·t_= + n·t_+ time units, where t_= is the time for one assignment and t_+ the time for one addition.

If t = max(t_=, t_+), then f(n) = (2n+1)·t time units.

Usually we drop the t time units and focus on the number of operations, because operation counts are platform and software independent; the time units do not change the order of the function, but only play the role of a scaling factor.
Algorithm 2:

sum = 0; // one assignment
for (i = 1; i <= n; i++) {
    for (j = 1; j <= i; j++)
        sum = sum + 1; // executed n(n+1)/2 times
}

It requires (n/2)·(n+1) operations.

Algorithm 3:

sum = n*(n+1)/2; // one assignment + one addition + one multiplication + one division

It requires 4 operations.

Why n+(n−1)+(n−2)+...+1 = n(n+1)/2?

You can use induction to prove it.
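For comparison, the three algorithms can be written side by side in C++ (a sketch; long long is used so that larger values of n do not overflow):

// Algorithm 1: one assignment plus n loop iterations -- f(n) grows linearly.
long long sum1(int n) {
    long long sum = 0;
    for (int i = 1; i <= n; ++i)
        sum = sum + i;
    return sum;
}

// Algorithm 2: the innermost statement runs n(n+1)/2 times -- quadratic growth.
long long sum2(int n) {
    long long sum = 0;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= i; ++j)
            sum = sum + 1;
    return sum;
}

// Algorithm 3: four operations regardless of n -- constant time.
long long sum3(int n) {
    return static_cast<long long>(n) * (n + 1) / 2;
}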

Time/Space Tradeoff

To solve a given programming problem, many different algorithms may be used. Some of these algorithms may be extremely time-efficient and others extremely space-efficient.

Time/space tradeoff refers to a situation where you can reduce the use of
memory at the cost of slower program execution, or reduce the running
time at the cost of increased memory usage.

An example of a situation where a time/space tradeoff can be applied is that of data storage. If data is stored in a compressed form, the memory usage is less because data compression reduces the amount of space required. However, it is more time consuming because some additional time is required to run the compression algorithm. Similarly, if data is stored in its uncompressed form, the memory usage is more, but the running time is less.

Memory is generally perceived to be extensible because you can increase the memory of your computer. Time, however, is not extensible. Therefore, time considerations generally override memory considerations.

To understand how the nature of an algorithm affects the execution, consider a simple example.

Suppose assignment, comparison, write, and increment statements take a, b, c, and d time units to execute, respectively. Now, consider the following code used to display the elements stored in an array:-

Set I = 0 // 1 assignment

While(I < n): // n comparisons

Display a[I] // n writes

Increment I by 1 // n increments

The execution time required for the preceding algorithm is given by:-

T = a + b·n + c·n + d·n

T = a + n(b + c + d)

Here, T is the total running time of the algorithm, expressed as a linear function of the number of elements (n) in the array. From the preceding expression, it is clear that T is directly proportional to n.

In fact, the total running time T is directly proportional to the number of iterations involved in the algorithm. The number of iterations can be determined by counting the number of comparisons involved in the algorithm.

Designing Algorithms using Recursion

Recursion refers to the technique of defining a process in terms of itself. It is used to solve complex programming problems that are repetitive in nature.

The basic idea behind recursion is to break a problem into smaller versions of itself, and then build up a solution for the entire problem. This may sound similar to the divide and conquer technique. However, recursion is not the same as divide and conquer: divide and conquer is a design concept that may be implemented in a computer program with the help of recursion.

Recursion is implemented in a program by using a recursive procedure or function. A recursive procedure is a function which invokes itself.

Consider a function f(n), which is the sum of the first n natural numbers. This function can be defined in several different ways. In mathematics, the function will be defined as:-

f(n) = 1 + 2 + 3 + ... + n

However, the same function can be defined in a recursive manner as:-

f(n) = f(n – 1) + n

where n > 1; and f(1) = 1

In this case, the recursive definition of the function f(n) calls the same function, but with its argument reduced by one. The recursion will end when n = 1, for which f(1) = 1 has been defined.
To understand this concept, consider a factorial function. A factorial function is defined as:-

n! = 1 x 2 x 3 x 4 x ... x n

This same factorial function can be redefined as:-

n! = (n – 1)! x n

where n ≥ 1; and 0! = 1

This definition of n! is recursive because it refers to itself when it uses (n – 1)!.

The value of n! is explicitly given when n = 0; and the value of n! for arbitrary n is defined in terms of a smaller value of n, which is closer to the base value 0.

If you have to calculate 3! by using recursion, you first define 3! in terms of 2!:-

3! = (3 x 2!)

Now, you will define 2! in terms of 1!:-

3! = (3 x (2 x 1!))

Now, 1! will be defined in terms of 0!:-

3! = (3 x (2 x (1 x 0!)))

As, 0! is defined as 1, the expression becomes:-

3! = (3 x (2 x (1 x 1)))

3! = (3 x (2 x 1 ))

3! = (3 x 2)

3! = 6

This recursive algorithm for determining the factorial of a number n can be written as:-

Algorithm: Factorial(n)

If n = 0, then: //Terminating condition

Return (1)
Return (n x Factorial(n – 1))

Please note that every recursive algorithm should have a terminating condition. Otherwise, the algorithm will keep on calling itself infinitely.
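In C++, the algorithm above might be realized as follows (a minimal sketch; the n == 0 test is the terminating condition):

// Recursive factorial: n! = (n - 1)! x n, with 0! = 1.
long long factorial(int n) {
    if (n == 0)                           // terminating condition
        return 1;
    return n * factorial(n - 1);          // recursive case
}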

The main advantage of recursion is that it is useful in writing clear, short, and simple programs. One of the most common and interesting problems that can be solved using recursion is the Tower of Hanoi problem.

Tower of Hanoi

Tower of Hanoi is a classical problem, which consists of n different-sized disks and three pins on which these disks can be mounted. All the disks are placed on the first pin, with the largest disk at the bottom and the remaining disks in decreasing order of their size, as shown in the following figure:-

The objective of the game is to move all disks from the first pin to the third
pin in the least number of moves by using the second pin as an
intermediary.

To play this game, you need to follow these rules:-

Only one disk can be moved at a time

A larger disk cannot be placed over a smaller one


Let n be the number of the discs. If n = 3, it will require seven moves to
transfer all discs from pin one to pin three, as shown in the table below.

Steps Moves

1. move top disc from pin 1 to pin 3

2. move top disc from pin 1 to pin 2

3. move top disc from pin 3 to pin 2

4. move top disc from pin 1 to pin 3

5. move top disc from pin 2 to pin 1

6. move top disc from pin 2 to pin 3

7. move top disc from pin 1 to pin 3

When n = 2, we should move the top disc from pin 1 to pin 2, move the top disc from pin 1 to pin 3, and then move the top disc from pin 2 to pin 3.

The solution for n = 1 will be to move the disc from pin 1 to pin 3.

In general, to move n discs from pin 1 to pin 3 using pin 2 as an intermediary, you first need to move the top n – 1 discs from pin 1 to pin 2, using pin 3 as intermediary.

The following algorithm can be used to move the top n discs from the first pin START to the final pin FINISH through the temporary pin TEMP:-

Move (n, START, TEMP, FINISH):

When n = 1:
    Move a disc from START to FINISH
    Return

Move the top n – 1 discs from START to TEMP, using FINISH as an intermediary [Move (n – 1, START, FINISH, TEMP)]

Move the top disc from START to FINISH

Move the top n – 1 discs from TEMP to FINISH, using START as an intermediary [Move (n – 1, TEMP, START, FINISH)]

In general, this solution requires 2^n – 1 moves for n discs.
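A C++ rendering of the Move algorithm might look like this (a sketch; pins are named by characters, and for n = 3 it prints exactly the seven moves tabulated above):

#include <iostream>

// Move n discs from pin 'start' to pin 'finish', using pin 'temp'.
void move(int n, char start, char temp, char finish) {
    if (n == 1) {                                     // base case: one disc
        std::cout << "move top disc from pin " << start
                  << " to pin " << finish << '\n';
        return;
    }
    move(n - 1, start, finish, temp);                 // n-1 discs: start -> temp
    std::cout << "move top disc from pin " << start
              << " to pin " << finish << '\n';        // the largest disc
    move(n - 1, temp, start, finish);                 // n-1 discs: temp -> finish
}

int main() {
    move(3, '1', '2', '3');                           // 2^3 - 1 = 7 moves
}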

Efficiency of An Algorithm

The efficiency of an algorithm can be measured by how much of a computer’s time and memory is utilized by its implementation. This can be studied in an abstract way, without considering implementation details, focusing only on the algorithm’s properties that affect its execution time.

The efficiency analysis concentrates on basic operations:

• data interchanges (swaps)

• comparisons (<,>,≤,≥,==, !=)

• arithmetic operations (+,∗,−,/)

These operations are performed on data. Therefore the number of data items (= n) is directly connected with the number of operations performed by an algorithm, and this is expressed as a running time function, f(n).

Usually, this function is quite complex, so to classify an algorithm we use its upper bound C·g(n), where g(n) is a simple function and C is a positive constant. In short we write f(n) = O(g(n)).
Properties of Data
Data Relationships

While it is a simple enough procedure to store information, or data, it is just as important to store the relationships between the individual data elements.

For example, consider the two following scenarios:

A bank (the server) has customers (clients) in the branch. The customers must not only be in the branch, but must also be in a line-up, or queue, so that those customers who got there first are served first.

A web server has a number of requests from web browsers (clients) who are requesting specific pages. The server can only deal with one request at a time, and therefore, like the customers in a bank branch, it is reasonable to deal with the requests in the order in which they arrived.

In each case there is an ordering to the elements (in this case, the ordering
is based on time of arrival).

A telephone book contains an ordered list of names, where the order is determined by what we will call lexicographical ordering, or dictionary ordering.

Even the characters which form the sentences on this web page are ordered
where the author specifies the order (hopefully in an attempt to convey an
idea to the intended audience).

There are four types of relationships between data :

Unordered,

Well-ordered relationships,

Partially-ordered relationships,

and Adjacency relationships.


Unordered

It is possible for data to be completely unassociated. For example, we may wish to store information about all the chips produced by a number of manufacturers. While an order could be imposed on these products (lexicographically, based on the name of the product), there may be no justification for this ordering.

Another example is the units converter in Maple: associated with each unit, such as the metre (m), the gram (g), the pound (lb), and the galileo (Gal), is certain information, such as dimension and the relationship to other base units.

There is no useful ordering of units, and therefore, we can consider each unit to represent an unordered data element.

Well-Ordered Relationships (or simply Ordered)

A lot of data is well-ordered, that is, given any two elements, it is possible to
determine if one precedes the other.

This is often represented by the relationship &apos;<&apos;, where a < b


indicates that a precedes b.

The actual rule that determines this well-ordering may be an implicit rule, for example:

• In a telephone book, the order is based on spelling of the name

• In a bank, the order of the customers is based on the arrival time.

Both of these are implicit rules based on the characteristics of the data.

Alternately, we can also have explicit orderings, where the order is dictated. The easiest example of this is the characters at a command line. The order of the characters is determined by the user who is typing them.

Partially-Ordered Relationships

A less common ordering is that of a partial ordering, where only some pairs of elements may be related.

A straightforward example of this is a family tree, where a < b if a is a descendant of b. Such a tree is shown in the figure.

In this case, we see that Karen < Roger and Roger < Susan, and therefore we may also infer that Karen < Susan. This is called the transitive property of a partial ordering. In this example there is one element, Susan, which all other elements in this tree precede, and it is therefore called a maximal element.

Note, however, that not all elements are comparable: we cannot compare
Karen and Julie, as neither Karen < Julie nor Julie < Karen is true.

Adjacency Relationships
The layout of the intersections of streets in a city or circuit elements on a
chip may be described by their locations, however, what is much more
useful is knowing which nodes are adjacent to others.

For example, the essential behaviour of a chip is determined by the adjacency relations between the circuit elements. Other properties, such as wire length (cost, heat, and power consumption) and the location of wires (cross-talk), are features which are minimized after the chip has been designed.
Fundamental Data Structures
Types of Data Structures

Data structures can be classified under the following two categories:

Static: These are data structures whose size is fixed at compile time, and
does not grow or shrink at runtime. An example of a static structure is an
array. Suppose you declare an array of size 50, but store only 5 elements in
it; the memory space allocated for the remaining 45 elements will be
wasted. Similarly, if you have declared an array of size 50 but later want to
store 20 more elements, you will not be able to store these extra required
elements because of the fixed size of an array.

Dynamic: These are data structures whose size is not fixed at compile time
and that can grow and shrink at run time to make efficient use of memory.
An example of a dynamic data structure would be a list of items for which
memory is not allocated in advance. As and when items are added to the
lists, memory is allocated for those elements. Similarly, when items are
removed from the list, memory allocated to those elements is de-allocated.
Such a list is called a linked list.

Arrays, Lists, Stacks, and Queues

Arrays

An array data structure, or simply array, is a data structure consisting of a collection of similar elements (values or variables), each identified by one or more integer indices, stored so that the address of each element can be computed from its index tuple by a simple mathematical formula.

The position of an element in the array is called the index. In C++ arrays
always begin with 0:

0 1 2 3 4 5 (indices)

12 -3 24 65 92 11 (array values)

For example, an array of 10 integer variables, with indices 0 through 9, may be stored as 10 words at memory addresses 2000, 2004, 2008, ..., 2036 (this memory allocation can vary because some computers use other than 4 bytes to store integer-type variables), so that the element with index i has address 2000 + 4 × i.
The total number of items inserted into an array is called the logical size of
the array.

– If the logical size is different than the physical size of an array (= the maximum size of the array), a special variable must be used to keep track of the current number of items.

Array structures are the computer analog of the mathematical concepts of vector, matrix, and, to a certain extent, tensor.

Indeed, an array with one or two indices is often called a vector or matrix
structure, respectively.

Arrays are often used to implement tables, especially lookup tables; so the word table is sometimes used as a synonym of array.

Arrays are among the oldest and most important data structures; they are used by almost every program and are used to implement many other data structures, such as lists and strings.

They effectively exploit the addressing machinery of computers; indeed, in most modern computers (and many external storage devices), the memory is a one-dimensional array of words, whose indices are their addresses.

Processors, especially vector processors, are often optimized for array operations.

The terms array and array structure are often used to mean array data
type, a kind of data type provided by most high-level programming
languages that consists of a collection of values or variables that can be
selected by one or more indices computed at run-time.

Array types are often implemented by array structures; however, in some languages they may be implemented by hash tables, linked lists, search trees, or other data structures.

The terms are also used, especially in the description of algorithms, to mean associative array or "abstract array", a theoretical computer science model (an abstract data type or ADT) intended to capture the essential properties of arrays.
Applications
Arrays are used to implement mathematical vectors and matrices, as
well as other kinds of rectangular tables. Many databases, small and
large, consist of (or include) one-dimensional arrays whose elements are
records.

Arrays are used to implement other data structures, such as heaps, hash
tables, deques, queues, stacks, strings, and VLists.

One or more large arrays are sometimes used to emulate in-program dynamic memory allocation, particularly memory pool allocation. Historically, this has sometimes been the only way to allocate "dynamic memory" portably.

Arrays can be used to determine partial or complete control flow in programs, as a compact alternative to otherwise repetitive multiple IF statements. They are known in this context as control tables and are used in conjunction with a purpose-built interpreter whose control flow is altered according to values contained in the array. The array may contain subroutine pointers (or relative subroutine numbers that can be acted upon by SWITCH statements) that direct the path of the execution.

Lists

If your program needs to store a few things — numbers, payroll records, or job descriptions for example — the simplest and most effective approach might be to put them in a list.

We define a list to be a finite, ordered sequence of data items known as elements.

“Ordered” in this definition means that each element has a position in the
list.

In the simple list implementations, all elements of the list have the
same data type, although there is no conceptual objection to lists
whose elements have differing data types if the application requires it.

The operations defined as part of the list ADT do not depend on the
elemental data type.

For example, the list ADT can be used for lists of integers, lists of characters, lists of payroll records, even lists of lists.


A list is said to be empty when it contains no elements.

The number of elements currently stored is called the length of the list.

The beginning of the list is called the head, the end of the list is
called the tail. There might or might not be some relationship between
the value of an element and its position in the list.

For example, sorted lists have their elements positioned in ascending order of value, while unsorted lists have no particular relationship between element values and positions.

If there are n elements in the list, they are given positions 0 through n − 1, as ⟨a0, a1, ..., an−1⟩.

The subscript indicates an element’s position within the list.

Using this notation, the empty list would appear as ⟨⟩.

Before selecting a list implementation, a program designer should first consider what basic operations the implementation must support.

Our common intuition about lists tells us that:

A list should be able to grow and shrink in size as we insert and remove elements.

We should be able to insert and remove elements from anywhere in the list.

We should be able to gain access to any element’s value, either to read it or to change it.

We must be able to create and clear (or reinitialize) lists.

It is also convenient to access the next or previous element from the “current” one.


Many programming languages provide support for list data types, and have
special syntax and semantics for lists and list operations.

Often a list can be constructed by writing the items in sequence, separated by commas, semicolons, or spaces, within a pair of delimiters such as parentheses '()', brackets '[]', braces '{}', or angle brackets '<>'.

Some languages may allow list types to be indexed or sliced like array
types. In object-oriented programming languages, lists are usually provided
as instances of subclasses of a generic "list" class.

List data types are often implemented using arrays or linked lists of some
sort, but other data structures may be more appropriate for some
applications. In some contexts, such as in Lisp programming, the term list
may refer specifically to a linked list rather than an array.

Operations
Implementation of the list data structure may provide some of the following
operations:

• a constructor for creating an empty list;

• an operation for testing whether or not a list is empty;

• an operation for prepending an entity to a list;

• an operation for appending an entity to a list;

• an operation for determining the first component (or the "head") of a list;

• an operation for referring to the list consisting of all the components of a list except for its first (this is called the "tail" of the list).

Characteristics
Lists have the following properties:

• The size of the list. It indicates how many elements there are in the list.

• Equality of lists:

• In mathematics, sometimes equality of lists is defined simply in terms of object identity: two lists are equal if and only if they are the same object.

• In modern programming languages, equality of lists is normally defined in terms of structural equality of the corresponding entries, except that if the lists are typed, then the list types may also be relevant.

• Lists may be typed. This implies that the entries in a list must have
types that are compatible with the list's type. It is common that lists
are typed when they are implemented using arrays.

• Each element in the list has an index. The first element commonly has index 0 or 1 (or some other predefined integer). Subsequent elements have indices that are 1 higher than the previous element. The last element has index <initial index> + <size> − 1.

• It is possible to retrieve the element at a particular index.

• It is possible to traverse the list in the order of increasing index.

• It is possible to change the element at a particular index to a different value, without affecting any other elements.

• It is possible to insert an element at a particular index. The indices of the elements at that and higher positions are increased by 1.

• It is possible to remove an element at a particular index. The indices of the elements at higher positions are decreased by 1.

Implementations
• Lists are typically implemented either as linked lists (either singly- or doubly-linked) or as arrays, usually variable-length or dynamic arrays.
• A linked list is a data structure that consists of a sequence of data
records such that in each record there is a field that contains a
reference (i.e., a link) to the next record in the sequence.

• A linked list whose nodes contain two fields: an integer value and a
link to the next node.

• Linked lists are among the simplest and most common data
structures; they provide an easy implementation for several important
abstract data structures, including stacks, queues, associative
arrays, and symbolic expressions.

• The principal benefit of a linked list over a conventional array is that the order of the linked items may be different from the order in which the data items are stored in memory or on disk. For that reason, linked lists allow insertion and removal of nodes at any point in the list, with a constant number of operations.

Linked lists can be implemented in most languages. Functional languages such as Lisp and Scheme have the data structure built in, along with operations to access the linked list.

Procedural languages, such as C, or object-oriented languages, such as C++ and Java, typically rely on mutable references to create linked lists.
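For example, in C++ a singly-linked node and a prepend operation can be written as follows (a minimal sketch; node cleanup is omitted for brevity):

#include <iostream>

struct Node {
    int   value;                          // the data (payload) field
    Node* next;                           // link to the next node
};

// Prepend a value in front of the current head; returns the new head.
Node* prepend(Node* head, int value) {
    return new Node{value, head};
}

int main() {
    Node* head = nullptr;                 // an empty list
    for (int v : {3, 2, 1})
        head = prepend(head, v);          // builds 1 -> 2 -> 3
    for (Node* p = head; p != nullptr; p = p->next)
        std::cout << p->value << ' ';     // prints 1 2 3
}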
Basic concepts and nomenclature

Each record of a linked list is often called an element or node.

The field of each node that contains the address of the next node is usually
called the next link or next pointer. The remaining fields are known as the
data, information, value, cargo, or payload fields.

The head of a list is its first node, and the tail is the list minus that node
(or a pointer thereto).

Linear and circular lists

In the last node of a list, the link field often contains a null reference, a special value that is interpreted by programs as meaning "there is no such node".
A less common convention is to make it point to the first node of the list; in
that case the list is said to be circular or circularly linked; otherwise it is
said to be open or linear.

A circular linked list

Singly-, doubly-, and multiply-linked lists

Singly-linked lists contain nodes which have a data field as well as a next field, which points to the next node in the linked list.

A singly-linked list whose nodes contain two fields: an integer value and a
link to the next node

In a doubly-linked list, each node contains, besides the next-node link, a second link field pointing to the previous node in the sequence. The two links may be called forward(s) and backward(s), or next and prev(ious).

A doubly-linked list whose nodes contain three fields: an integer value, the
link forward to the next node, and the link backward to the previous node

The technique known as XOR-linking allows a doubly-linked list to be implemented using a single link field in each node. However, this technique requires the ability to do bit operations on addresses, and therefore may not be available in some high-level languages.

In a multiply-linked list, each node contains two or more link fields, each field being used to connect the same set of data records in a different order (e.g., by name, by department, by date of birth, etc.). (While doubly-linked lists can be seen as special cases of multiply-linked lists, the fact that the two orders are opposite to each other leads to simpler and more efficient algorithms, so they are usually treated as a separate case.)
Operations on Sorted Lists
Suppose that we have a sorted list of size N in which we are storing n < N sorted objects. There are three different operations we will focus on:

• Inserting a new object into the sorted list,

• Accessing/finding/modifying an object which is currently already in the sorted list, and

• Deleting an object which is currently in the sorted list.

We will look at these run times under three circumstances:

• The operation affects the front of the sorted list (e.g., deleting the
smallest element),

• The operation affects an arbitrary location in the list, and

• The operation affects the back of the sorted list (e.g., inserting an
element larger than any element currently in the sorted list).

Sorted List as an Array

• The first implementation we will look at is an array.

• Suppose that we have an array of size N in which we are storing n < N sorted objects. We assume N > n so that we do not have to create a new array when inserting a new element.

• We will look at an implementation where the smallest element is in the entry array[0], the next smallest in array[1], and so on.

Finding an arbitrary element, however, must be done by using a binary search. In a binary search, we find the middle element. If that is the element we are looking for, we are done; otherwise, we search the left or right half, depending on whether the element we are looking for is less than or greater than the middle element, respectively.

As an example of a binary search, suppose we are attempting to find 17 in the array given in the figure below.
2 5 7 8 12 17 19 21 25 26 28 31 33 34 39

The middle element (at location 7) is 21, and because 17 < 21, we exclude
the right half, as is shown in Figure below.

Next, we find the middle element of the left half: 8. Because 17 > 8, we
search the right half of the new list, as is shown in Figure below

2 5 7 8 12 17 19 21 25 26 28 31 33 34 39

Now, the middle element is 17, so we have successfully concluded our search.

If we denote the runtime of a binary search by T(n), then the run time for
searching a list of size n is a constant number of operations to find and test
the middle element and then the time it takes to search either the left or the
right halves (but only one).

That is, T(n) = T(n/2) + Θ(1). In the special case where n = 1, we know that the run time is O(1): all we must do is check the one element. To determine the run time, we assume n = 2^k, and therefore k = log2(n):

T(n) = T(2^k)
     = T(2^k / 2) + 1 = T(2^(k−1)) + 1
     = (T(2^(k−1) / 2) + 1) + 1 = T(2^(k−2)) + 2
     = (T(2^(k−2) / 2) + 1) + 2 = T(2^(k−3)) + 3

After k steps, the pattern suggests that we will get:

T(n) = T(2^(k−k)) + k
     = T(1) + k
     = 1 + k

Note, however, that k = log2(n), and therefore T(n) = O(1 + log2(n)) = O(log2(n)). Thus, it follows that the time it takes to find an arbitrary object is O(log(n)).

Stacks
A stack is a linear list in which all additions and deletions are restricted to one end, called the top.

If you inserted objects into a stack and then you removed them, the order
of the objects is reversed. Numbers inserted as 1, 2, 3, 4, 5 are removed as
5, 4, 3, 2, 1.

This reversing attribute is the reason why stacks are called a LIFO (last-in, first-out) data structure.

It behaves very much like the common stack of plates or stack of newspapers. The last object added to the stack is placed on the top and is easily accessible, while objects that have been in the stack for a while are more difficult to access.

Thus the stack is appropriate if we expect to access only the top object, all
other objects are inaccessible.

The three natural operations on a stack are push (insert), pop (delete),
and top (view).

The method push adds an object to the top of the stack.

The method pop removes an object from the top of the stack.

The method top reads the value on the top of the stack (it does not remove
it).

If useful, we can combine top and pop and get the method topAndPop which
returns and removes the most recent object from the stack.
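Using the standard C++ std::stack, the push/pop/top operations and the LIFO reversal look like this (a sketch; note that std::stack separates top and pop, so topAndPop is simply the two calls in sequence):

#include <iostream>
#include <stack>

int main() {
    std::stack<int> s;
    for (int i = 1; i <= 5; ++i)
        s.push(i);                        // push: insert 1, 2, 3, 4, 5

    while (!s.empty()) {
        std::cout << s.top() << ' ';      // top: read without removing
        s.pop();                          // pop: remove the top object
    }
    // prints 5 4 3 2 1 -- the LIFO reversal described above
}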
Applications
There are numerous applications of stacks:

Creating an undo and redo feature for an application

Implementing forward and back in a web browser,

Matching parentheses when parsing code,

Allowing recursive function calls,

Reversing items in a list,

Tracking local variables in parsing,

Reverse-Polish calculator, and

Recording unallocated items.

Creating an Undo and redo Feature

Most applications allow the user to undo the previous action performed. This is usually implemented by recording what changes must be made to revert the current state to the previous state. These are packaged together and stored as a single object on the stack; if undo is selected, then the top set of instructions is popped from the stack, the reversions are applied, and the application is now in the previous state.

Usually, when an undo is performed, a corresponding redo action is pushed onto another stack, allowing the user to undo the undo. This second stack is emptied whenever a new change is pushed onto the undo stack.

Forward and Back in Web Browsers

Implementing forward and back is similar to undo and redo, except that the URLs, perhaps represented as strings, are pushed and popped from the stacks.

Each time a new link or page is visited, the previous link is pushed onto the
back stack while the forward stack is emptied. If the back button is selected,
the current URL is pushed onto the forward stack and the last URL is popped
from the back stack. If the forward button is selected, the current URL is
pushed onto the back stack, and the next URL is popped from the forward
stack.
In a browser, the back and forward arrows are disabled if the corresponding
stacks are empty. Visiting a new page or selecting an active forward button
will always make the back stack non-empty, while selecting an active back
button will always make the forward stack non-empty.

Matching Parentheses
In most programming languages today, the grammar for the language
requires that delimiters be matched in order, including parentheses (),
brackets [], angled brackets <>, and braces {}.

A pair of delimiters is said to be matched if all other delimiters contained
between the opening element and the closing element are themselves matched.

For example, the delimiters in (abc), [abc], <abc>, and {abc} are vacuously
matched, while the parentheses in (abc[def]ghi) are matched because the
brackets contained within them are also matched.

The delimiters in (abc[def)ghi] are not matched because the parentheses
contain an opening bracket which is not matched with a corresponding
closing bracket within them.
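A stack solves this directly: push each opening delimiter; on each closing
delimiter, pop and check that the popped opener matches it. A minimal C
sketch (the fixed stack size and the decision to ignore every non-delimiter
character are simplifying assumptions):

    #include <string.h>

    /* Returns 1 if every (), [], {} and <> in s is properly matched. */
    int delimiters_match(const char *s) {
        char stack[256];                        /* assumes nesting depth < 256 */
        int size = 0;
        for (; *s != '\0'; ++s) {
            if (strchr("([{<", *s)) {
                stack[size++] = *s;             /* push the opening delimiter */
            } else if (strchr(")]}>", *s)) {
                if (size == 0) return 0;        /* closing with nothing open */
                char open = stack[--size];      /* pop the most recent opener */
                if ((open == '(' && *s != ')') || (open == '[' && *s != ']') ||
                    (open == '{' && *s != '}') || (open == '<' && *s != '>'))
                    return 0;                   /* mismatched pair */
            }
        }
        return size == 0;                       /* nothing may be left open */
    }

With this sketch, delimiters_match("(abc[def]ghi)") returns 1 while
delimiters_match("(abc[def)ghi]") returns 0.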

Reversing Items in a List

Place all the items into a stack, in order, and then pop them from the
stack, placing them into the order in which they come out.

Of course, this is extremely suboptimal if you are using an array to store the
data you are trying to reverse (just swap the appropriate entries), but it may
be useful if the number of items to be reversed is unknown.

Reverse-Polish Notation (RPN) Calculator
Implementations of RPN are stack-based; that is, operands are popped from
a stack, and calculation results are pushed back onto it.
Although this concept may seem obscure at first, RPN has the advantage of
being extremely easy, and therefore fast, for a computer to analyze, because
it is a regular grammar.

Practical implications

• Calculations proceed from left to right.

• There are no brackets or parentheses, as they are unnecessary.

• Operands precede operators; they are removed as the operation is evaluated.

• When an operation is performed, the result becomes an operand itself (for
later operators).

• There is no hidden state: no need to wonder whether you have hit an
operator or not.

Example

• The calculation ((1 + 2) * 4) + 3 can be written down like this in RPN:

1 2 + 4 * 3 +

• The expression is evaluated in the following way (the stack is displayed
after each operation has taken place):
Input   Stack    Operation
1       1        Push operand
2       1, 2     Push operand
+       3        Addition
4       3, 4     Push operand
*       12       Multiplication
3       12, 3    Push operand
+       15       Addition
This method of writing expressions does not require parentheses: there is no
possibility of the ambiguity present in 5 + 3 * 4 - 7, which may be
interpreted in as many as 3! = 6 different ways (not all of them yielding
distinct values). The associations and their RPN forms are given below.

((5 + 3) * 4) - 7 = 25      5 3 + 4 * 7 -

(5 + 3) * (4 - 7) = -24     5 3 + 4 7 - *

(5 + (3 * 4)) - 7 = 10      3 4 * 5 + 7 -

5 + ((3 * 4) - 7) = 10      3 4 * 7 - 5 +

(4 - 7) * (5 + 3) = -24     4 7 - 5 3 + *

5 + (3 * (4 - 7)) = -4      4 7 - 3 * 5 +
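The evaluation procedure in the table above can be sketched in C as follows;
the space-separated integer tokens, the fixed stack size, and the assumption
of well-formed input are all simplifications:

    #include <stdlib.h>
    #include <string.h>

    /* Evaluates a space-separated RPN expression such as "1 2 + 4 * 3 +". */
    int eval_rpn(const char *expr) {
        int stack[64], size = 0;
        char buf[256];
        strncpy(buf, expr, sizeof buf - 1);     /* strtok modifies its input */
        buf[sizeof buf - 1] = '\0';
        for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
            if (strchr("+-*/", tok[0]) && tok[1] == '\0') {
                int b = stack[--size];          /* right operand */
                int a = stack[--size];          /* left operand */
                switch (tok[0]) {
                    case '+': stack[size++] = a + b; break;
                    case '-': stack[size++] = a - b; break;
                    case '*': stack[size++] = a * b; break;
                    case '/': stack[size++] = a / b; break;
                }
            } else {
                stack[size++] = atoi(tok);      /* push the operand */
            }
        }
        return stack[0];                        /* the single remaining value */
    }

For example, eval_rpn("1 2 + 4 * 3 +") returns 15, matching the table, and
eval_rpn("5 3 + 4 7 - *") returns -24.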

Queues
The concept of a queue is quite familiar to humanity: entering a lineup in
anticipation of receiving a service. We can generalize this concept as
follows:

A queue is an abstract data type which has the following properties:

• A queue is a container which can store zero or more items,

• There are two operations on a queue: inserting an object into the queue
(termed enqueuing onto the queue) and removing an object from the queue
(termed dequeuing from the queue), and

• Enqueuing onto and dequeuing from the queue obey the following rule:
the object dequeued from the queue is the object which has been in the
queue the longest.

This behaviour is often summarized as first in, first out or FIFO.

Recursion

Many examples of the use of recursion may be found: the technique is
useful both for the definition of mathematical functions and for the
definition of data structures. Naturally, if a data structure may be defined
recursively, it may be processed by a recursive function!

Recursive functions
Many mathematical functions can be defined recursively:

• factorial

• Fibonacci

• Euclid's GCD (greatest common divisor)

• Fourier Transform

Many problems can be solved recursively, e.g. games of all types, from simple
ones like the Towers of Hanoi problem to complex ones like chess. In
games, recursive solutions are particularly convenient because, having
solved the problem by a series of recursive calls, you want to find out how
you got to the solution. By keeping track of the move chosen at any point,
the program call stack does this housekeeping for you! This is explained in
more detail later.
Example: Factorial
One of the simplest examples of a recursive definition is that for the factorial
function:

factorial( n ) = if ( n = 0 ) then 1 else n * factorial( n-1 )

A natural way to calculate factorials is to write a recursive function which
matches this definition, for example in C:

    int fact(int n) {
        if (n == 0) return 1;    /* termination condition: 0! = 1 */
        return n * fact(n - 1);  /* recursive case */
    }

Note how this function calls itself to evaluate the next term.

Eventually it will reach the termination condition and exit. However, before
it reaches the termination condition, it will have pushed n stack frames onto
the program's run-time stack.

The termination condition is obviously extremely important when dealing
with recursive functions. If it is omitted, then the function will continue to
call itself until the program runs out of stack space - usually with moderately
unpleasant results!

Another commonly used (and abused!) example of a recursive function is
the calculation of Fibonacci numbers. Following the definition:

fib( n ) = if ( n = 0 ) then 1
           else if ( n = 1 ) then 1
           else fib( n-1 ) + fib( n-2 )

one can write, again in C:

    int fib(int n) {
        if (n == 0 || n == 1) return 1;  /* the two base cases */
        return fib(n - 1) + fib(n - 2);  /* the recursive case */
    }

Short and elegant, it uses recursion to provide a neat solution. It is abused,
however, because this version recomputes the same Fibonacci numbers many
times over, and its running time grows exponentially in n.

Data structures may also be recursively defined. One of the most important
classes of structure - trees - allows recursive definitions which lead to simple
(and efficient) recursive functions for manipulating them.

Searching
Computer systems are often used to store large amounts of data from which
individual records must be retrieved according to some search criterion.
Thus the efficient storage of data to facilitate fast searching is an important
issue. In this section, we shall investigate the performance of some
searching algorithms and the data structures which they use.

Sequential Searches
Let's examine how long it will take to find an item matching a key in the
collections we have discussed so far. We're interested in:

1. the average time,

2. the worst-case time, and

3. the best possible time.

However, we will generally be most concerned with the worst-case time, as
calculations based on worst-case times can lead to guaranteed performance
predictions. Conveniently, the worst-case times are generally easier to
calculate than average times.

If there are n items in our collection - whether it is stored as an array or as
a linked list - then it is obvious that in the worst case, when there is no item
in the collection with the desired key, n comparisons of the key with the
keys of the items in the collection will have to be made.

To simplify analysis and comparison of algorithms, we look for a dominant
operation and count the number of times that dominant operation has to be
performed.

In the case of searching, the dominant operation is the comparison. Since
the search requires n comparisons in the worst case, we say this is an O(n)
(pronounced "big-Oh-n" or "Oh-n") algorithm.

The best case - in which the first comparison returns a match - requires a
single comparison and is O(1).

The average time depends on the probability that the key will be found in
the collection - this is something that we would not expect to know in the
majority of cases.

Thus in this case, as in most others, estimation of the average time is of
little utility. If the performance of the system is vital, i.e. it is part of a
life-critical system, then we must use the worst case in our design
calculations, as it represents the best guaranteed performance.
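For comparison with the binary search given earlier, the sequential search
can be sketched in C as:

    /* Sequential search: index of the first item equal to key in a[0..n-1],
       or -1 if no such item exists. */
    int sequential_search(const int a[], int n, int key) {
        for (int i = 0; i < n; ++i)
            if (a[i] == key)  /* the dominant operation: one comparison */
                return i;     /* best case: a match at i == 0, so O(1) */
        return -1;            /* worst case: all n comparisons fail, so O(n) */
    }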
Implementations of a Queue
A. Queue with a Singly Linked List

We can implement a queue with a singly linked list:

• The front element is stored at the first node, and

• The rear element is stored at the last node.

The space used is O(n) and each operation of the Queue ADT
takes O(1) time.
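A minimal C sketch of such a queue; keeping both a front and a rear pointer
is exactly what makes both operations O(1) (error handling, such as
dequeuing from an empty queue, is omitted):

    #include <stdlib.h>

    typedef struct Node {
        int value;
        struct Node *next;
    } Node;

    typedef struct {
        Node *front;   /* first node: items are dequeued here */
        Node *rear;    /* last node: items are enqueued here */
    } Queue;

    void enqueue(Queue *q, int x) {
        Node *n = malloc(sizeof *n);
        n->value = x;
        n->next = NULL;
        if (q->rear == NULL) q->front = q->rear = n;  /* queue was empty */
        else { q->rear->next = n; q->rear = n; }      /* append at the rear */
    }

    int dequeue(Queue *q) {
        Node *n = q->front;                  /* the object longest in the queue */
        int x = n->value;
        q->front = n->next;
        if (q->front == NULL) q->rear = NULL;         /* queue is now empty */
        free(n);
        return x;
    }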

Applications of queues
There are numerous applications of queues. The most common is where
there are one or more servers satisfying requests from any number of
clients. Examples of this are:

• A web server where each request comes from a separate web browser,

• An operating system (OS) assigning the CPU to one of many processes
(a process is a program which is executing),

• An OS assigning a resource such as a printer to a specific process,

• A bank with one or more tellers and zero or more clients waiting to
access their accounts, and

• A store with one or more check-out counters and zero or more customers
waiting to pay for their groceries.

Other applications include:

• Temporary storage for communication from one application to another,
where the speed of the applications may be different or where both
applications are competing for the CPU (such queues are also called
pipes), and

• The operating system recording keystrokes.

INTRODUCTION TO TREES
In computer science, a tree is an abstract model of a hierarchical structure.

Trees are extremely important in computer science because:

• they provide us with a means of organizing information so that it can be
accessed very quickly: any node in a tree can be reached in very few steps
(the time required to get from the "root" to a node is proportional to the
depth of that node), and

• the "branching factor" of a tree permits a high degree of organization of
the information which it contains (think of the organization of files and
sub-directories in a directory tree).

We can represent a tree as a construction consisting of nodes, and edges
which represent a relationship between two nodes.

The node which is at the top (if such a node exists) is termed the
root node.

The nodes to which it is joined by an edge are its children, and it is the
parent of those nodes.

Each of these children, in turn, may have one, many, or no children;
however, each node always has exactly one parent (except, of course, the
root node, which has none).

Children who share the same parent are called siblings.

A node with no children is referred to as a leaf.

Nodes which are the same distance from the root (have the same number of
edges lying between them and the root) are at the same level.

The height of the tree is the maximum distance between the root and any
node.

The degree of a node is the number of its children.


Structure of Njala University

As an example, the organizational structure of Njala University forms a
tree: the Secretariat is the root; the Njala and Bo campuses are its
children; each campus contains schools; and each school contains
departments.

A tree consists of nodes with a parent-child relation

Applications:

• Organization chart

• File systems

• Programming environments
The terminology for a tree is based on one of three possible analogies:

• Physical trees,

• Family trees, and

• Graph theory.

For each node in a linked list, if we consider the next node to be its
successor, then we may redefine a non-empty linked list as follows:

1. There is a first (or head) node,

2. Each node has either zero or one successor (a node which follows it), and

3. For each node, except for the head, there is a node which has that node
as its successor.

In a finite linked list, there is exactly one node which has no successor, and
that node is called the tail node of the linked list. An example of such a
linked list is shown in Figure.

In this definition, we restrict the number of successors of each node to a
maximum of one. Suppose, however, that we relax this description to allow
multiple successors, that is, any node may have an arbitrary number of
successors (including none). An example of such a structure is shown in
Figure.

As suggested by the shape (standing it on end), we will call the first element
the root and the entire structure a tree.

Generalizing the definition of a linked list, we may define a non-empty tree
as follows:

There is a root node (the tree equivalent of a head node in a linked list),

Each node has zero or more successors (nodes which follow it), and

For each node, except for the root, there is a node which has that
node as its successor.

The terms root and leaf come from botany; however, the standard
representation of a tree is to draw it with the root node at the top, as shown.

From this form comes the terminology of family trees, as shown.

Any individual within a family tree may be considered a node. Given any
node, it has zero or more successors. These successors are termed
children, while for each node within the family tree (with the exception of
the top-most, or root, node), there exists exactly one node which is its parent.

We define the following:

• If a node Y is a successor of a node X, then:

• Y is said to be a child of X, and

• X is said to be the parent of Y; and

• If nodes Y and Z share a common parent X, then Y and Z are said to
be siblings.

A path is a sequence of nodes (N0, N1, ..., Nn) such that N(k-1) is a parent
of Nk for k = 1, 2, ..., n. The length of this path is n. If there exists a path
from X to Y, then X is said to be an ancestor of Y, while Y is said to be
a descendant of X.

From this comes the peculiar definition that each node is both an ancestor
and a descendant of itself. To avoid this, we further say that if there exists
a non-trivial path (i.e., a path of length n ≥ 1) from X to Y, then X is said
to be a proper ancestor of Y, while Y is said to be a proper
descendant of X.

The size of a tree is the total number of nodes of the tree.

Nodes that have no children are called leaves or external nodes. All
other nodes are called internal nodes.

Depth of a node in a tree is the length of the path from the root to the
node. Height of a node is the length of the longest path from a given node
to the deepest leaf. The height of a tree is equal to the height of the root.

A tree with n nodes must have n−1 edges.

A general rooted tree is a set of nodes that has a designated node called
the root, from which zero or more subtrees descend.

Each subtree itself satisfies the definition of a tree.

Every node (except the root) is connected by an edge from exactly one
other node. There is a unique path from the root to each node. An empty
tree has no nodes.

For the tree shown in the figure:

• C is the sibling of B,

• D, E, F, G and I are external nodes, or leaves,

• A, B, C and H are internal nodes,

• The depth (level) of E is 2,

• The height of the tree is 3, and

• The degree of node B is 2.

Examples

Describe the tree and the node D shown below


The root node is A.

The height of the tree (the maximum depth of any node within the tree) is
5.

The degree of the root node is 2 and its children are B and C.

The parent of node D is C; the degree of D is 3 and its children are E, F and G.

The proper ancestors of D are C and A,

The unique path from the root to D is (A, C, D) and

The length of this path is 2 and therefore the depth of D is 2.

The proper descendants of D include all nodes from E to K.

The height of the subtree with root D is 3.

The number of edges is equal to the number of nodes minus one.

Tree Traversals
It is straight-forward to visit all the elements in either a linked list or an
array: start at the front and step through the elements one at a time.

What to do for trees is less clear: how do you visit all of the nodes within a
tree in an ordered fashion?

Consider the simple tree shown in Figure

We would like to visit each of the nodes within the tree. A scheme for
visiting all of the nodes within a tree is termed a traversal of the tree. The
term visit is used to describe a point where some form of operation or
function is applied to a node.

The simplest way to traverse the nodes of this tree would be to visit all
nodes at level 0 (the root node) and then all nodes at level 1 (B-E).

Such a traversal is shown in Figure , where the nodes are visited in the
order A-B-C-D-E. Because all the nodes at each level are visited before the
level is incremented, such a traversal is termed a breadth-first traversal.

Such a traversal is easy to describe and to implement; however, it isn't as
useful as some of the other traversals we will examine.

By definition, a tree does not have any loops, and therefore, we can follow
a path along the outside of all of the nodes. For convenience, we will begin
and end at the root.

Such a walk is termed an Euler walk and is shown in Figure

Rather than visiting all the nodes at each level, such a walk around the
nodes of a tree immediately goes as deep as possible into the tree, and
therefore such traversals are collectively termed depth-first traversals.

Figure above emphasizes that each node is approached at least twice: a
first time and a last time. These two approaches are shown below.

If we restrict ourselves to actually visiting a node only on the first approach,
the traversal is termed a pre-order depth-first traversal, while if we visit
the node only on the last approach, the traversal is termed a post-order
depth-first traversal.

The first thing to notice is that with a pre-order traversal, the parent
node is visited before any child nodes, while with a post-order traversal,
the parent node is only visited after all the child nodes have been visited.
This is critical in applications: in some cases, it is necessary to know
information about the children before the parent can be processed. In other
cases, the children must have information about the parent before they can
be processed.
Figure shows a depth-first traversal of a more complex tree.

The order in which the nodes are visited using a pre-order traversal is:

A B F G C D H E

The order in which the nodes are visited using a post-order traversal is:

F G B C H D E A

Note that in a pre-order traversal, all the ancestors of a given node have
been visited before the node itself (e.g., A and D are visited before H, and
the root node A, the ancestor of all nodes in the tree, is visited first), while
in a post-order traversal, all the descendants of a given node are visited
before the node itself (e.g., F and G are visited before node B, and all the
nodes are visited before A).
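Both traversals are naturally recursive. A sketch in C, using a hypothetical
TreeNode type with a fixed-size child array (a linked list of children would
serve equally well):

    #include <stdio.h>

    typedef struct TreeNode {
        char label;
        int num_children;
        struct TreeNode *children[8];  /* an arbitrary illustrative limit */
    } TreeNode;

    /* Pre-order: visit the parent before any of its children. */
    void preorder(const TreeNode *node) {
        if (node == NULL) return;
        printf("%c ", node->label);    /* visit on the first approach */
        for (int i = 0; i < node->num_children; ++i)
            preorder(node->children[i]);
    }

    /* Post-order: visit the parent only after all of its children. */
    void postorder(const TreeNode *node) {
        if (node == NULL) return;
        for (int i = 0; i < node->num_children; ++i)
            postorder(node->children[i]);
        printf("%c ", node->label);    /* visit on the last approach */
    }

The only difference between the two is whether the node is processed before
or after the recursive calls on its children.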

Example 1

Perform a pre-order and a post-order depth-first traversal and a
breadth-first traversal of the tree shown in Figure.

Pre-order depth-first traversal

35 25 12 7 16 28 26 33 74 63 42 68 87 79 94

Post-order depth-first traversal

7 16 12 26 33 28 25 42 68 63 79 94 87 74 35

Breadth-first traversal

35 25 74 12 28 63 87 7 16 26 33 42 68 79 94

Example 2:

Perform a pre-order and a post-order depth-first traversal and a
breadth-first traversal of the tree shown.

Pre-order depth-first traversal

1 2 6 7 8 D E 9 3 A B C D 4 5

Post-order depth-first traversal

6 7 D E 8 9 2 A B D C 3 4 5 1

Breadth-first traversal

1 2 3 4 5 6 7 8 9 A B C D E F

Amortized Analysis of Tree Traversal

The time taken in a pre-order or post-order traversal of an n-node tree is
proportional to the sum, taken over each node v in the tree, of the time
needed for the recursive call for v.

The call for v costs c(v) + 1 cyber-dollars, where c(v) is the number of
children of v.

• For the call for v, charge one cyber-dollar to v and charge one
cyber-dollar to each child of v.

• Each node (except the root) gets charged twice:
once for its own call and once for its parent's call.

• Therefore, the traversal time is O(n).

Types of trees
Free Trees

A free tree is a connected, acyclic graph.

It has no root.

It is connected in that any node in the graph can be reached from any
other node by exactly one path.

It does not contain any cycles (also known as circuits, or closed paths),
which would imply the existence of more than one path between two nodes.

This is the most general kind of tree, and it may be converted into the more
familiar form by designating a node as the root.

All the following types of trees are rooted trees.


Ordered Trees
An ordered tree consists of a root node and any number of children which
are ordered from "oldest" to "youngest", hence the name. Each of the
children may be the root of a sub-tree.

Note the recursive nature of this definition; it is one of the salient points of
the tree structure.

Note also that it is possible to have an unordered rooted tree, e.g. a tree
representing the possible moves in a game of chess.

Position Trees

This type of tree superficially resembles an ordered tree. However, the
children are not identified by their age, but by their position. Thus, any two
ordered trees consisting of only two nodes are identical: they both consist of
a root and an oldest child.

Two two-node position trees, on the other hand, may be quite different.
They both have roots, but one may have a child in position #1, and the
other a child in position #2.

In general, a k-ary position tree is a position tree with a branch factor of
k. The most important of these are binary trees (k = 2).

Binary Trees
Algorithms for general trees tend to be complex, as each node may have a
different number of children.
Additionally, the children must be stored in some form of list, either an array
or a linked list. For a given node N, accessing and modifying these
structures is O(deg(N)), where deg(N) is the degree of the node (the
number of children).

Restricting the number of children to exactly 2, where a sub-tree may
optionally be empty, simplifies many operations.

A binary tree is a tree in which each node has exactly two subtrees: the
left subtree and right subtree, either or both of which may be empty.

Each of these children in turn is the root of a binary tree.

This gives the recursive definition of the binary tree: it is either empty or
consists of a root, a left tree, and a right tree.

We will identify the two children as the left sub-tree and the right
sub-tree of a given node. This reflects the common presentation of such
trees, as is shown in Figure, each node having one sub-tree to the left and
the other sub-tree to the right. Figure explicitly shows empty sub-trees
using a ∅. For the purposes of order, we will consider the left sub-tree to
precede the right sub-tree.

A binary tree

We will use the convention of not drawing in null sub-trees, as is shown in
Figure below. This should be intuitive; however, it is still necessary to
remember that each node has two sub-trees, either of which may be empty.
A binary tree shown in the previous Figure without any null sub-trees drawn

Certain definitions must be appropriately modified, in the obvious manner:

• A binary node is a leaf node if both sub-trees are empty.

• The degree of a node is the number of non-empty sub-trees.
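In C, a binary tree node is conveniently declared with two pointers, using
NULL for an empty sub-tree; the following sketch also expresses the two
modified definitions above:

    #include <stddef.h>

    typedef struct BinaryNode {
        int value;
        struct BinaryNode *left;   /* left sub-tree, or NULL if empty */
        struct BinaryNode *right;  /* right sub-tree, or NULL if empty */
    } BinaryNode;

    /* A binary node is a leaf if both sub-trees are empty. */
    int is_leaf(const BinaryNode *n) {
        return n->left == NULL && n->right == NULL;
    }

    /* The degree of a binary node is its number of non-empty sub-trees. */
    int degree(const BinaryNode *n) {
        return (n->left != NULL) + (n->right != NULL);
    }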

Applications
Expression Trees

Any algebraic expression, for example (3x + 4 + y)(5 - z/2), is composed
of both binary operators (+, ×, -, /) and objects (2, 3, 4, 5, x, y, z).

As each operator has two operands, we can store any operator as a binary
tree node with the two operands forming the two sub-trees.

Objects have no children.

For example, the above expression may be represented by the binary tree in
Figure below:
Figure below:
One observation about Figure above is that this is an ordered tree:
the interpretation of the tree depends on the order of the children, at least
for some operators. For example, if the two children of / were reversed, it
would represent the expression (3x + 4 + y)(5 - 2/z).

If we were to restrict ourselves to commutative operators, such as + and ×,
then we could store this tree as an unordered tree.

Another observation is that if all of the nodes are numeric values, we may
use a post-order depth-first traversal of this tree together with a stack to
evaluate the tree:

When a node is visited during the post-order traversal, one of two things
occurs:

• If the node is storing a number, we push the number onto the stack,
and

• If the node is storing an operator, we pop the last two elements off of
the stack, apply the operation, and put the result onto the stack.
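The same evaluation can be written as a recursive post-order function, in
which the run-time call stack plays the role of the explicit stack. A sketch,
using a hypothetical ExprNode type in which a leaf stores a number and an
internal node stores an operator character:

    typedef struct ExprNode {
        char op;                  /* '\0' for a leaf holding a number */
        double value;             /* used only when op == '\0' */
        struct ExprNode *left, *right;
    } ExprNode;

    /* Post-order evaluation: evaluate both sub-trees, then apply the operator. */
    double eval(const ExprNode *n) {
        if (n->op == '\0') return n->value;  /* a number: "push" it */
        double a = eval(n->left);            /* left operand */
        double b = eval(n->right);           /* right operand */
        switch (n->op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default : return a / b;          /* the remaining case, '/' */
        }
    }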

Constructing an Expression Tree

The following algebraic expression, a/b ∗ (c+(d−e)), was converted to its
postfix form, ab/cde−+∗, by using an algorithm based on a queue and a stack.

A binary tree is a convenient data structure in which to store an algebraic
expression. The following algorithm is used to generate such a tree from a
given postfix form.

Algorithm

1. Read the postfix form of the algebraic expression token by token and
initialize a new node with the token.

2. Test the token.

(a) If the token is an operand, push it on a stack.

(b) If the token is an operator, pop two values from the stack and make the
first one the right child and the second one the left child of the operator.
Push the operator node on the stack.

(c) Repeat the two previous steps until no tokens are left in the postfix form
of the algebraic expression.
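A C sketch of this algorithm for single-character tokens; the PNode type
and the make_node helper are illustrative assumptions, and the input is
assumed to be a well-formed postfix string:

    #include <stdlib.h>
    #include <string.h>

    typedef struct PNode {
        char symbol;               /* an operand letter or an operator */
        struct PNode *left, *right;
    } PNode;

    static PNode *make_node(char symbol) {
        PNode *n = malloc(sizeof *n);
        n->symbol = symbol;
        n->left = n->right = NULL;
        return n;
    }

    /* Builds an expression tree from a postfix string such as "ab/cde-+*". */
    PNode *build_expression_tree(const char *postfix) {
        PNode *stack[64];          /* assumes at most 64 pending sub-trees */
        int size = 0;
        for (; *postfix != '\0'; ++postfix) {
            PNode *n = make_node(*postfix);
            if (strchr("+-*/", *postfix)) {
                n->right = stack[--size];  /* first pop: the right child */
                n->left  = stack[--size];  /* second pop: the left child */
            }
            stack[size++] = n;     /* push the operand leaf or operator node */
        }
        return stack[0];           /* the root of the completed tree */
    }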

For example, consider the post-order depth-first traversal shown in Figure.
The order in which the nodes are visited is indicated by the red line.

Figures below show how the stack changes with the visit of each node,
ultimately resulting in a single value on the stack: the result of the
evaluation.
Visit   Stack
3       3
7       3, 7
×       21
4       21, 4
+       25
8       25, 8
2       25, 8, 2
×       25, 16

EXAMPLE

Expression tree for the arithmetic expression ((6−(12−(3+7)))/(1+0)+2) (2 (3+1))

Perfect Binary Trees

A perfect binary tree is one where each leaf node is at the same depth.
While these are not common in practice, they give lower bounds on certain
operations and are easy to analyze.

A perfect binary tree is a binary tree where all the leaves are at the same
depth, as is demonstrated in Figure.
Perfect trees of height 0, 1, 2, 3, and 4.

An empty binary tree is vacuously perfect, as all nodes (there are none) are
at the same depth. A binary tree with a single node is also perfect, as there
is only one node.

A recursive definition of a non-empty perfect binary tree is:

• A binary tree consisting of a single node (height 0) is perfect, and

• A binary tree of height h > 0 is perfect if both the left and right
sub-trees are perfect binary trees of height h − 1.

A perfect binary tree of height 4 is thus a node with two perfect binary
trees of height 3.
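Using the BinaryNode structure sketched earlier, this recursive definition
translates directly into code; the height helper and the convention that an
empty tree has height -1 are the assumptions here:

    #include <stddef.h>

    /* Height of a binary tree: -1 for an empty tree, 0 for a single node. */
    int height(const BinaryNode *n) {
        if (n == NULL) return -1;
        int hl = height(n->left), hr = height(n->right);
        return 1 + (hl > hr ? hl : hr);
    }

    /* A tree is perfect if it is empty, or if both sub-trees are perfect
       and of equal height (each is then a perfect tree of height h - 1). */
    int is_perfect(const BinaryNode *n) {
        if (n == NULL) return 1;  /* vacuously perfect */
        return height(n->left) == height(n->right)
            && is_perfect(n->left) && is_perfect(n->right);
    }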
Complete Binary Trees
A complete binary tree is one where the nodes are filled in breadth-first
traversal order. Consequently, all leaves in a complete binary tree of height
h must be at a depth of either h or h − 1.

Unlike perfect trees, which exist only for certain numbers of nodes, complete
trees exist for every number of nodes, and it is possible to represent certain
abstract data types as complete trees. Specifically, we will be looking at
heaps.

A complete binary tree is a binary tree where the nodes are filled in
breadth-first order. This means that all the leaves of a complete binary tree
of height h are at depth h or h − 1. Unlike a perfect tree, there exist
complete binary trees with an arbitrary number of nodes. The complete
binary trees for 1 through 10 nodes are shown in Figure.

Complete trees with 1 through 10 nodes

An empty binary tree is vacuously complete, as all nodes (there are none)
follow the breadth-first order. A binary tree with a single node is also
complete, as there is only one node.

A recursive definition of a non-empty complete binary tree is:

• A binary tree consisting of a single node (height 0) is complete, and

• A binary tree of height h > 0 is complete if:

• The left sub-tree is a complete binary tree of height h − 1 and the
right sub-tree is a perfect binary tree of height h − 2, or

• The left sub-tree is a perfect binary tree of height h − 1 and the
right sub-tree is a complete binary tree (also) of height h − 1.
