
Unit 1

Programming Fundamentals
Structure:

1.1 Introduction

1.2 Basics of Programming Concepts

1.3 Types of Programming Languages

1.4 Identifiers, Keywords, and Data Types

1.5 Variables, Constants, and Operators

1.6 Input and Output Statements

1.7 Concept of Object Oriented Programming

The text is adapted by Symbiosis Centre for Distance Learning under a Creative Commons
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) as requested by the work’s creator or
licensees. This license is available at https://creativecommons.org/licenses/by-sa/4.0/.

Objectives

After going through this unit, you will be able to:

• Understand the fundamentals of programming
• Differentiate between variables and constants
• Compare different operators
• Explain the concept of OOP

1.1 INTRODUCTION

Computers are part of our day-to-day life, and it is hard to imagine life without them. From railway
reservations, banks, and government offices to schools and the Internet, computers are an
indispensable part of our life.

Computers have made a huge impact on the world. They operate at very high speed and seldom make
errors. However, computers are less flexible than humans: they must be programmed to complete a
specified task. On its own, a computer is only a collection of hardware components. Programs, or
software, are required to run the computer and its peripheral components. A program is a list of
instructions to be executed by the computer, and programs are collectively known as the software of
the computer.

In this unit, we are going to discuss programming concepts and constructs such as identifiers and
keywords, data types, constants, variables, and input and output statements, as well as OOP
concepts.

1.2 BASICS OF PROGRAMMING CONCEPTS

A program is a set of instructions to be executed by the computer to accomplish a given task/work.
The basic unit of a program is data. A program is a computer solution to a problem. Before
programming starts, correct information about the task to be performed by the computer, the
customer’s needs, goals, and constraints should be available, so that the program can be designed
according to the customer’s needs and requirements. A program can be defined either as a set of
written instructions created by a programmer or as an executable piece of software.

The term computer programming is defined as “the series of steps involved in solving a problem on a
computer”. In computer jargon, data is simply the value assigned to a variable. The term variable
refers to the name of a memory location which can hold only one value at any point in time. Data
can be of different types, such as character, number, or Boolean.

In Computer Application and business environment, data can be the name of the employee, employee
code, roll number of students, marks of the students in subjects, name of the person, etc.

For example, a program to add two numbers requires three variables. In program terminology, the
steps to perform the addition could look as follows:

1. Declare the variables A, B, and C.
2. Get the first number in variable A.
3. Get the second number in variable B.
4. Add the value stored in variable A to the value stored in variable B.
5. Store the result in variable C.
6. Display the final result, i.e. the value stored in variable C.

Generally, the languages that a computer can understand to solve a given problem are known as
programming languages. The above instructions can be written in such a language to solve the
problem of adding two numbers. Once the instructions are written in a programming language, the
program must be executed to get the desired results. Different programming languages are available
for working with computers.

Programming techniques

• Unstructured Programming

In this technique, simple and small programs are written using one main function. Program
statements access and modify data which is global throughout the whole program, so the main program
operates directly on global data. This technique has serious disadvantages once the program gets
sufficiently large. For example, if the same sequence of statements is needed at different locations
within the program, the sequence must be copied each time. This led to the idea of writing
procedures and functions.

• Procedural Programming

Procedural programming allows the programmer to write procedures for the different sub-tasks
involved in the problem definition. A procedure call is used to invoke the procedure. After the
statements in the procedure are processed, program control returns to the position where the call
was made. A program can be viewed as a sequence of procedure calls. The main program is responsible
for passing data to the individual calls, the data is processed by the procedures and, once the
program has finished, the resulting data is presented. Writing procedures makes program or software
development more structured and less error-prone. For example, if a written procedure is correct, it
produces correct results every time it is used in the program. Debugging and testing a structured
program is therefore much easier.

• Modular Programming

A program is divided into modules, and each module performs a subtask associated with the main
problem statement. In modular programming, procedures with a common functionality are grouped
together into separate modules. Each module can have its own data. The main program coordinates
calls to procedures in separate modules and hands over appropriate data as parameters.

1.3 TYPES OF PROGRAMMING LANGUAGES

Programs, as defined, are a set of instructions to be executed by the computer system to perform or
accomplish a given task. Programs are generally written to solve a given problem. The computer can
understand only the language of 0 and 1.

A computer is essentially just a big bunch of tiny electronic switches that are either on or off. By setting
different combinations of these switches, you can make the computer do something, for example,
display something on the screen or make a sound. That’s what programming is at its most basic i.e.
telling a computer what to do.

Understanding which combination of switches will make the computer do what you want would be a
daunting task and that is where programming languages come in.

A programming language acts as a translator between you and the computer. Rather than learning
the computer’s native language (known as machine language), you can use a programming language
to instruct the computer in a way that is easier to learn and understand.

The following are the types of programming languages:

1) Machine Language:

Machine language is the only language understood by the computer. It is written in binary, i.e. as a
sequence of 0’s and 1’s. Early programs were written in machine language as strings of 0’s and 1’s.

For example: 001101110011

Programming in machine language was highly difficult, complicated, and error-prone. A machine
language program is machine dependent, i.e. different computers use different machine languages.

Nowadays, programs are written in high-level programming languages like C, C++, and Java, but these
programs must be translated into the machine language of the computer before they are executed.

2) Assembly Language:

This is one level higher than machine language. Mnemonics are used to simplify the machine language
instructions. Each assembly language instruction may have three parts: a programmer-defined label,
the opcode, and the operand.

For example: ADD A1 B1

The above instruction adds the content of B1 to A1.

3) High Level Language:

These languages resemble ordinary English instructions. They require a special program called a
compiler, which translates programs written in a high-level language into machine language. The
program written in the high-level language is known as the source program and, after its translation
into machine language, it is called the object program. Today most programs are written in
high-level languages as they are easy to write and understand.

For example: FORTRAN, COBOL, BASIC, C, etc.

High-level languages have the following advantages:

a. Simplicity

As the programming instructions/commands are written in an English-like language, they are simple
and easy to understand.

b. Standardisation

Programs written in a high-level language are largely independent of the underlying hardware or
machine configuration. Hence a program can run on any computer system which has a compiler for that
language to convert it into machine language.

c. Debugging and Error Finding

A high-level language has its own syntax and set of rules which govern the writing of
instructions/statements in the language. The compiler checks the syntax and rules before translating
the program into machine language, so errors of this kind are reported by the compiler. Thus error
detection and debugging are easier with high-level languages.

Assemblers, Compilers, and Interpreters

a. Assemblers

An assembler is a program which translates the program written in the assembly language into
machine level language. It is mainly used with the assembly language program.

b. Compilers

A compiler is a special program which processes the statements/instructions written in a high-level
language and translates them into machine language, i.e. code which the processor can understand
and execute.

For example, if the program is written in C++, the C++ compiler first checks for syntax errors and
then builds the final object code from the given source code. The object code is the output of the
compilation process; it is the machine code which the processor can understand and execute.

c. Interpreter

The interpreter translates a program written in a programming language into machine language line by
line. If it encounters an error at any line, it halts at that line. Command interpreters are the
part of the operating system that interprets the command typed by the user and executes it.

For example: the shell command interpreter of the UNIX operating system.

Check your Progress 1

Match the Following.


i. Compilers a. Translates Mnemonic codes into machine language
ii. Interpreters b. Translates the whole program at a time
iii. Assemblers c. Translates line by line

Fill in the Blanks.


1. _________ language has its own syntax and set of rules.
2. Compilers translate programs written in a ________ language into ______ language.
3. A _________ is the set of instructions to be executed by the computer to accomplish a given
task/work.
4. The basic unit of program is ________.
5. _________ is defined as “the series of steps involved in solving a problem on a computer”.

Activity 1

1. Write some examples of machine, assembly, and high-level languages.
2. Write the steps to input any 4 numbers and find their average.

1.4 IDENTIFIERS, KEYWORDS, AND DATA TYPES

• Identifiers

Identifiers are names given to program elements such as variables, arrays, and functions. They are
formed by using a sequence of letters (both uppercase and lowercase), numerals, and underscores.

The rules for forming identifier names are as follows:

1. Identifiers cannot include any special characters or punctuation marks (like #, $, ^, ?, ., etc.)
except the underscore “_”.
2. Two successive underscores should not be used.
3. Keywords cannot be used as identifiers.
4. The case of alphabetic characters that form the identifier name is significant. For example,
‘FIRST’ is different from ‘first’ and ‘First’.
5. Identifiers must begin with a letter or an underscore. However, the use of an underscore as the
first character should be avoided because several compiler-defined identifiers in the standard
C library have an underscore as their first character, so inadvertently duplicated names may
cause definition conflicts.

• Keywords

Every computer language has a set of reserved words, known as keywords, that cannot be used as
identifiers. Keywords are sequences of characters that have a fixed meaning. In C, keywords are
case-sensitive and are written in lowercase letters, for example int, float, for, while, break,
struct, and return.

• Basic Data Types

The data type determines the set of values that a data item can take and the operations that can be
performed on the item. Almost all programming languages explicitly include the notion of data type,
though different languages may use different terminology. Common data types include:

• integers
• booleans
• characters
• floating-point numbers
• alphanumeric strings

For example, in the Java programming language, the type int represents the set of 32-bit integers
ranging in value from −2,147,483,648 to 2,147,483,647, as well as the operations that can be
performed on integers, such as addition, subtraction, and multiplication. Colours, on the other
hand, can be represented by three bytes denoting the amounts of red, green, and blue, and one string
representing the colour’s name; allowable operations include addition and subtraction, but not
multiplication.

Most programming languages also allow the programmer to define additional data types, usually by
combining multiple elements of other types and defining the valid operations of the new data type.
For example, a programmer might create a new data type named "complex number" that would
include real and imaginary parts.

Example: int a; float b; long c;

Table 1.1 Data types with size

1.5 VARIABLES, CONSTANTS, AND OPERATORS

A variable is defined as a meaningful name given to a data storage location in the computer memory.
When using a variable, we actually refer to the address of the memory where the data is stored.

• Numeric Variables

Numeric variables can be used to store either integer values or floating point values. Modifiers
like short, long, signed, and unsigned can also be used with numeric variables. The difference
between signed and unsigned numeric variables is that signed variables can be either negative or
positive, whereas unsigned variables can hold only zero and positive values. Therefore, by using an
unsigned variable, we can increase the maximum positive range. When we omit the signed/unsigned
modifier from an integer variable such as int, the C language makes it a signed variable. To declare
an unsigned variable, the unsigned modifier must be added explicitly in the declaration of the
variable.

• Character Variables

Character variables hold a single character, written within single quotes. The character could be
any character from the ASCII character set: letters (‘a’, ‘A’), numerals (‘2’), or special
characters (‘&’).

• Declaring Variables

To declare a variable, specify the data type of the variable followed by its name. The data type
indicates the kind of values that the variable can store. Variable names should always be meaningful
and should reflect the purpose of their usage in the program. In C, a variable declaration always
ends with a semicolon.

For example,

int emp_num;
float salary;
char grade;
double balance_amount;
unsigned short int acc_no;

In C, variables can be declared at any place in the program but two things must be kept in mind. First,
variables should be declared before using them. Second, variables should be declared closest to their
first point of use so that the source code is easier to maintain.

• Initializing Variables

While declaring the variables, we can also initialize them with some value. For example,

int emp_num = 7;
float salary = 9800.99;
char grade = 'A';
double balance_amount = 100000000;

• Constants

Constants are identifiers whose values do not change. While values of variables can be changed at any
time, values of constants can never be changed. Constants are used to define fixed values like pi or
the charge on an electron so that their value does not get changed in the program even by mistake.

To declare a constant, precede the normal variable declaration with const keyword and assign it a
value.

const float pi = 3.14;

• Operators

Operators in programming languages are symbols that perform predefined operations. The types of
operators are:

1. Arithmetic Operators
2. Relational Operators
3. Logical Operators

1. Arithmetic Operators

Arithmetic operators are the symbols used for arithmetic operations. The following table shows the
various arithmetic operators used in programming languages with their precedence (highest to lowest)
and associativity:

2. Relational Operators

Relational operators are used to test the relationship between two variables or between a variable
and a constant. The following table shows the various relational operators used in programming
languages.

For example, the expression (age==35) means “is the value contained in the variable named age
equal to 35?” The answer could be either ‘true’ or ‘false’, i.e. ‘yes’ or ‘no’.

3. Logical Operators

Logical operators are used to combine conditions or decision statements. The following table shows
the various logical operators used in programming languages.

For example, the expression (age == 38 AND children <= 2) evaluates to true only if the variable
‘age’ contains the value 38 and the variable ‘children’ contains a value less than or equal to 2.

In the case of the OR logical operator, the expression evaluates to true if any of the conditions
inside it is true.

The logical operator NOT reverses the result of a condition. If the condition is true, the NOT
operator makes it false, and if the condition is false, the NOT operator makes it true.

1.6 INPUT AND OUTPUT STATEMENTS

In any programming language, we need input statements to read data from the user and output
statements to display the result of processing to the user. In the C programming language, output
is produced with the printf() statement and input is read with the scanf() statement.

The general form of the printf() statement is:

printf("format specifiers", variable list);

The variable list contains the names of the variables whose values need to be displayed. Format
specifiers are required to display the values of variables of different data types. For the
different data types, the C programming language provides the following format specifiers.

Thus if we want to display the values of an integer, a float, and a character variable, the
printf() statement will be:

printf("%d %f %c", i, f, c);

The above statement will display the values of the integer variable ‘i’, the floating point
variable ‘f’, and the character variable ‘c’ on the output screen.

To read a value from the user as input, the scanf() statement is used as shown below:

scanf("format specifiers", address of variable list);

For example, if we need to read an integer value into the variable ‘a’, the statements would be:

printf("please enter the integer value for variable a:");

scanf("%d", &a);

1.7 CONCEPT OF OBJECT ORIENTED PROGRAMMING


Object-oriented programming is based on ‘objects’. The objects are components having attributes
and methods. For example, files, forms, buttons, windows, etc. are objects because they have
properties such as size, colour, caption, etc. In object-oriented programming, program elements are
considered as objects.

What is an Object?

We have seen that an object is a component having attributes. These attributes are specified in
terms of two parameters: state and behaviour. For example, bicycles have state (gear, pedal cadence,
wheels, etc.) and behaviour (braking, accelerating, slowing down, changing gears, etc.). On similar
grounds, a software object maintains its state in variables and implements its behaviour with
methods. Alternatively, an object has attributes (properties) and methods (functions).

Modern languages such as C++, VC++, VB, Java, .NET, Python, and JavaScript are based on the
concept of OOPS (Object Oriented Programming Systems).

A number of programming languages have been developed from the concept of classes and interfaces.
A class defines a set of data structures called ‘variables’, and the operations referred to as ‘methods’,
that are permitted on the variables. An interface defines a collection of methods that are to be
implemented by a class. The classes and interfaces contained in compilation units are organised into
‘packages’. Packages are used to group related classes and interfaces and to avoid naming conflicts.
Java, C++, Smalltalk, and some other object-oriented languages follow a class-based approach.
This approach allows us to declare classes that serve as a template from which objects are created.
A class defines the type of data that is contained in an object and the methods that are used to
access this data. A class also defines methods to be used to create objects that are instances of the
class. An instance can also be understood as an object in its memory space.
The analogy with ‘blank and filled-in application forms’
The concept of class and objects can be understood better with the analogy of a prescribed
application form. Suppose we have a blank application form. This application format comprises usual
fields such as name, address, qualifications, date of birth, etc. This blank application form can be
treated as a class. When we circulate copies of this form to a number of students in the class, each
student fills in the form. Now, although the fields of each form are the same, the information against
these fields is different. For example, the name field will have different records such as Rajesh, Amit,
Prasad, Pooja, etc. These filled-in forms can be called Objects. These objects are an instance of the
class ‘application form’.

Access modifiers

A class is a set of data and functions. A structure is also a set of data and functions. However,
the main difference between a class and a structure is that the class supports OOPS features such as
encapsulation, inheritance, and polymorphism, whereas the structure does not.

The members of a class are divided into two subsets: public and private. ‘Public’ members have
global access, i.e. they can be accessed from outside the class. ‘Private’ members cannot be
accessed from outside the class. Typically, functions are public and data variables are private.
The class layout is shown in the figure below.

Encapsulation:
Encapsulation is defined as packaging an object’s variables within the protective custody of its
methods. Typically, encapsulation is used to hide unimportant implementation details from other
objects. When you want to change gears on your bicycle, you don’t need to know how the gear
mechanism works, you just need to know which lever to move. Similarly, in software programs, you
don’t need to know how a class is implemented, you just need to know which methods to invoke.
The concept of encapsulation was not available in conventional structured programming; it was
introduced in object-oriented programming. In a Java program, data fields and methods are the two
main elements. Encapsulation enables the user to hide, inside the object, both the data fields and
the methods that act on that data. Thus the user can control access to the data. In object-oriented
programming, an object’s data is normally private to the object, and other parts of a program
should not have direct access to that data. Thus encapsulation is closely tied to data hiding.
We can always hide data inside functions, just by making that data local to the function. If the
data is to be made available to other functions, make the data global to the program, which gives any
function access to it. This concept is available in conventional structured programming and will not
serve the purpose of data hiding. The other way is to make our data global to the functions that need
it but still prevent other functions from gaining access. This role is performed by Encapsulation. In an
object, the encapsulated data members are global to the object’s methods, yet they are local to the
object. However, they are not global variables.
One of the characteristics of object-oriented programming that is often touted in discussions of the
subject is encapsulation. The term carries the connotation of an object being enclosed in some sort of
container - and that is exactly what it means. Encapsulation is the combining of data and the code that
manipulates that data into a single component - that is, an object. Encapsulation also refers to the
control of access to the details of an object’s implementation. Object access is limited to a well-
defined, controlled interface. This allows objects to be self-contained and protects them from
accidental misuse, both of which are important to reliable software design.

The advantages of encapsulation are:

1) Data hiding: The object has a public interface through which other objects can communicate with
it. However, the object can maintain private data and methods that can be changed at any time
without affecting the other objects that depend on it. For example, one does not need to understand
the construction and working principle of a bike in order to use it.
2) Modularity: The Java language is a modular language. The source code for an object can be written
and maintained independently of the source code for other objects. We can also easily pass objects
around within a program. For example, we can give our bicycle to others and it will still work.

Inheritance:

Inheritance enables you to create a class that is similar to a previously defined class, but one
that still has some properties of its own. Suppose that you have a class for a regular two-wheeler
bike, and that you have to create a bike that has an additional high-speed gear. In conventional
structured programming, we would be required to modify the program extensively, which might
introduce bugs into the program.

In order to avoid these problems, we can create a new class by inheritance. This new class inherits
all the data and methods from the base class bike. Keywords like public, private, and protected
can be used to control the level of inheritance.

The concept of inheritance is derived from the fact that children inherit many of their attributes
from their parents. In addition to these inherited characteristics, they also have characteristics
of their own. In object-oriented programming, a base class is analogous to a parent and a derived
class is analogous to a child.

The advantages of Inheritance are as follows:
1) We can implement super-classes called abstract classes.
2) By using inheritance, we can reuse the code in the superclass a number of times.

Polymorphism:

Polymorphism is the ability to assume different forms. In programming, it is treated as the ability
of objects to have many methods of the same name, but with different forms.

By using polymorphism, we can create new objects that perform the same functions as the base object
but perform one or more of these functions in a different way. For example, consider an area
object that finds the area of a circle. By using polymorphism, you can create an area object that
computes the area of a rectangle, by creating a new method that calculates the area of a rectangle.
Both the old circle-area method and the new rectangle-area method have the same name, say area.

An example of encapsulation, inheritance, and polymorphism is given below.

We have seen above that the bike is an object which possesses various attributes such as direction,
position, and speed. The object bike has various means such as handle, kick, and brakes, and these
means act on the attributes. In order to construct a class for a bike object, the attributes
(direction, position, and speed) are considered as the class’s data fields and the means (handle,
kick, and brakes) represent methods.

The first step in creating an object is to define its class.


class bike
{
data position;
data direction;
data speed;
method Handle();
method Kick();
method Brake();
}

The above example shows how a class is created. It also shows how encapsulation works. In the above
example, bike is the base class. It has three data fields, viz. direction, position, and speed.
These three data fields can be operated on by the three methods Handle(), Kick(), and Brake(). The
Handle() method is used to change the direction of the bike. Similarly, the method Kick() is used to
start the bike and the method Brake() stops the bike. Thus position, direction, and speed can be
controlled by these three methods. The data fields and methods are encapsulated inside the class.
The data fields are private to the class and therefore cannot be accessed directly from outside the
class. Only the three methods of the class can access the data fields.

Now, let us create a new bike that has a special gear. To do this, we can use OOP inheritance to
derive a new class from the bike base class.
class newbike inherits from bike
{
method newstyle();
}
The class newbike implicitly inherits all the data fields and methods from the base class bike.
It adds a method called newstyle(). In addition, it has the direction, position, and speed
data fields and the three methods Handle(), Kick(), and Brake(). This is an example of inheritance.

Now let us suppose that we want a new kind of bike that has all the characteristics of a newbike,
except that its additional gear is twice as fast as the gear of newbike.
class Fastbike inherits from newbike
{
method newstyle();
}
The class Fastbike looks exactly like the original class newbike. However, rather than just
inheriting the newstyle() method, it defines its own version. This new version makes the bike move
twice as fast as the newstyle() method of class newbike.

In this way, the Fastbike class implements the same functionality as the newbike class, but it
implements that functionality a little differently. Because the Fastbike class inherits from
newbike, which itself inherits from bike, a Fastbike also inherits all the data fields and methods
of the bike class. We can control inheritance using the public, protected, and private keywords.

For example, the book is the object that provides access to information (data). The functions open,
close, read, etc. are associated with the term book.
The process of creation of objects is provided by the constructor. Using Objects, the program can be
simplified. Moreover, the software developed can be reused by us.

Messages

In a pure object-oriented programming model, such as that used by Smalltalk, objects interact by
sending messages to each other. When an object receives a message, the object invokes a method to
process the message. The method may change the state of the object, return information contained
in the object, or cause objects to be created or deleted.
The object model used by Java is consistent with the concept of message passing but does not
emphasize it. In the Java model, objects interact by invoking each other’s methods. Methods provide
access to the information contained in an object. The type of access varies depending on the method.
Software objects interact and communicate with each other by sending messages to each other. When
object A wants object B to perform one of B’s methods, object A sends a message to object B.

The message comprises the following three components:


1) The object to whom the message is addressed.
2) The name of the method to perform.
3) Any other parameters needed by the method.
These three components are enough information for the receiving object to perform the desired
method. No other information or context is required.
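In a language such as C++, the three components of a message map directly onto an ordinary method call. The following is a minimal sketch only; the class Account and the names deposit and sendDepositMessage are hypothetical and do not come from the text.

```cpp
#include <cassert>

// Hypothetical receiving object (object B): it holds a balance and exposes one method.
class Account {
public:
    // The method named in the message; 'amount' is the parameter the method needs.
    int deposit(int amount) {
        balance += amount;   // the method changes the state of the object
        return balance;      // and returns information contained in the object
    }
private:
    int balance = 0;
};

// Object A's side: send the message "deposit 50" to object b.
int sendDepositMessage(Account& b) {
    //     receiver . method ( parameter )
    return b.deposit(50);
}
```

Here b is the object addressed, deposit is the method to perform, and 50 is the parameter; nothing else is needed for the receiver to act.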
Dynamic Binding
Sometimes a communication link must be established between objects of many different classes, some
of which are not known when the program is compiled. Deciding which object and method to use while
the program is running is known as dynamic binding. For example, when a Java applet is executed by
the web browser, the applet requires the loading of classes located on other computers connected
across the Internet. Dynamic binding allows new and modified objects to be used without
recompilation: the compiler produces executable code that interfaces dynamically with unknown
objects during program execution.
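One common form of dynamic binding in C++ is the virtual function, where the method that runs is chosen at run time from the actual type of the object. The sketch below illustrates only this run-time choice of method, not the loading of classes over a network described for Java applets. The class names Shape, Circle and Square and the function describe are hypothetical.

```cpp
#include <cassert>
#include <string>

// Base class: 'virtual' asks for the method to be selected at run time.
class Shape {
public:
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string name() const override { return "square"; }
};

// This function is compiled without knowing which kind of Shape it will receive.
std::string describe(const Shape& s) {
    return s.name();   // bound to the object's own version while the program runs
}
```

A new class derived from Shape can be passed to describe without changing or recompiling describe itself.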

Nowadays, object-oriented programming languages are widely used in software development. They
are mainly used in the GUI (Graphical User Interface) designing and development. The main use of
object-oriented programming languages is in the real time business systems development.

Object-oriented programming is used in the following program development:

1. Expert Systems Development
2. Artificial Intelligence System Development
3. Real-Time System Development
4. CAD/CAM Systems Development
5. Artificial Neural Network
6. Parallel Programming
7. Object-Oriented Databases
8. Decision Support Systems
9. Automation Systems

Check your Progress 2


State True or False.
1. Keywords cannot be used as identifiers.
2. Constants are identifiers whose values can be changed.
3. Inheritance is defined as packaging an object’s variables within the protective custody of
its methods.

Summary
 Computer programming is defined as “the series of steps involved in solving a problem on a
computer”.
 Machine level language is written in 0 and 1. The programming in machine level language is
highly difficult and complicated to write and understand. Assembly level language has three
parts: programmer defined instruction, op-code, and the operand.
 High-level languages are like our normal English language instructions.
 An assembler is a program which translates the program written in the assembly language
into machine level language.
 Identifiers are basically names given to program elements such as variables, arrays, and
functions.
 The data type determines the set of values that a data item can take and the operations that
can be performed on the item.
 A variable is defined as a meaningful name given to a data storage location in the computer
memory.
 Operators in programming languages are tools for some predefined operations, the types of
operators are arithmetic, relational, and logical.
 Object-oriented programming ties data more closely to the functions that operate on them
and protects data from accidental modifications from outside functions.
 Core OOPS concepts are Class, Object, Inheritance, Polymorphism, Abstraction, and
Encapsulation.

Keywords
 Programming: The series of steps involved in solving a problem on a computer.
 Operators: Tools for some predefined operations in programming languages.

Self-Assessment Questions
1. What do you mean by a Computer Program?

2. Explain the different types of computer languages in brief.

3. Explain the difference between assemblers and interpreters.

4. Explain the OOP concept.

5. What are identifiers? List the rules of defining identifiers.

6. Differentiate between logical and relational operators.

Answers to Check your Progress


Check your Progress 1

Match the Following.

i. → b.

ii. → c.

iii. → a.

Fill in the Blanks.

1. High-level language has its own syntax and set of rules.

2. Compilers translate programs written in a high-level language into machine language.

3. A program is the set of instructions to be executed by the computer to accomplish a given
task/work.

4. The basic unit of program is data.

5. Computer programming is defined as “the series of steps involved in solving a problem on a
computer”.

Check your Progress 2

State True or False.

1. True
2. False
3. False

Suggested Reading
1. Balagurusamy, E. Object Oriented Programming with C++. New Delhi: Tata McGraw-Hill.
2. Rajaraman, V. Fundamentals of Computers. New Delhi: PHI Learning.
3. Thareja, R. Data Structures Using C. Oxford University Press.
Unit 2
Control Flow
Structure

2.1 Introduction

2.2 If Statement

2.3 The Concept of Loops

2.4 The Continue and Break Statements

2.5 “Switch” - Decision Structure

Summary

Keywords

Self-Assessment Questions

Answers to Check Your Progress

Suggested Reading

The text is adapted by Symbiosis Centre for Distance Learning under a Creative Commons
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) as requested by the work’s creator or
licensees. This license is available at https://creativecommons.org/licenses/by-sa/4.0/.
Objectives

After going through this unit, you will be able to:

• Explain decision-making in programming
• Describe the different forms of ‘if’ statements
• Define the term loop/iteration
• Differentiate between continue and break statements
• Explain nesting of various loop constructs
• Describe switch-case construct of programming
• Describe the usage of the switch-case-default construct

2.1 INTRODUCTION
In real life, too, we make a number of decisions based on situations or conditions.

For example:

1. If I get more than 90 percent in maths, physics and chemistry, then I will join an
engineering course.
2. If I get more than 90 percent in biology, physics and chemistry, then I will join a medical
course.

Notice that in each of the sentences above, the final action depends on a certain condition being
fulfilled.

In programming, too, we often come across conditions on which the results or the flow of the program
depend. We also come across situations where we need to execute some instructions repeatedly based
on certain conditions; that is, the execution of the instructions in a block is repeated based on
some decision. This is where the concept of iteration, or the loop, comes into programming.
Iteration or loop is the most powerful construct of computer programming. In this unit, we will study
the decision-making structures, including the special decision-making structure called “switch-case”,
and the various loop or iteration constructs of programming.

2.2 IF STATEMENT
This is one of the most important fundamentals in computer programming. It is used in making
decisions while writing the programs. The ‘if’ construct makes use of relational and logical operators
for decision making. If the condition is true, then program flow is directed to take one course of action
and if the condition is false, then the program flow is directed to do something else.

The general form of ‘if’ construct is:

if (condition)

Statement1;

Statement2;

End if

The keyword if tells the compiler that what follows is a decision instruction whose condition is
enclosed in parentheses. If the condition enclosed in the pair of parentheses is true, then the
statements are executed. If the condition is not true, then those statements are skipped. This is the
simple “if construct”, which can be represented in a flowchart as follows:

Fig. 2.1 Illustrating If construct with the help of flow-chart

From the above flow-chart, it is clear that if the condition in the “if construct” is true, then
“processing X” will be executed. If the condition is false, then program control will not execute the
“processing X” block; it will go directly to the “processing Y” block.

Generally, relational operators are used in conditional statements. A relational operator allows us
to compare two values or check for a condition. The following table shows different conditional
statements built with the help of relational operators.
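As an illustration, the six relational operators of C++ can each be used as the condition of an ‘if’ statement. This is a sketch only; the function name compare and the wording of the report are hypothetical.

```cpp
#include <cassert>
#include <string>

// Builds a short report of how x compares with y, one relational operator per line.
std::string compare(int x, int y) {
    std::string report;
    if (x == y) report += "equal ";             // equal to
    if (x != y) report += "not-equal ";         // not equal to
    if (x <  y) report += "less ";              // less than
    if (x >  y) report += "greater ";           // greater than
    if (x <= y) report += "less-or-equal ";     // less than or equal to
    if (x >= y) report += "greater-or-equal ";  // greater than or equal to
    return report;
}
```

For example, compare(3, 5) yields the words for the three conditions that hold: not-equal, less and less-or-equal.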

Let us write down a program to find out whether the entered number is less than 100 or not by using
the ‘if’ statements.

Problem Statement:

Write down a pseudo code to find out whether the entered number is less than 100 or not.

Pseudo code:

Start
Declare variable num

Display “enter the number”

Accept num

if num < 100

Start if

Display “the entered number is less than 100”.

End if

End
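The same check can be sketched in C++ as a small function that tests whether the number is less than 100. The function name checkNumber is hypothetical, and returning the message instead of displaying it is an assumption made to keep the sketch easy to test.

```cpp
#include <cassert>
#include <string>

// Returns the message when num is less than 100, otherwise an empty string.
std::string checkNumber(int num) {
    std::string message;   // stays empty when the condition is false
    if (num < 100) {
        // This statement runs only when the condition is true.
        message = "the entered number is less than 100";
    }
    return message;
}
```

In a complete program, num would be read with std::cin and the message printed with std::cout.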

 if-else Statement

The simple ‘if statement’ structure will execute or run the statements under it only if the given
condition is true. If the condition is false, then it simply will not execute those statements which are
under ‘if’. When we want to execute the statements when the condition is not true, then we need to
use ‘if-else’ structure. The general form of ‘if-else’ structure is:

if (condition)
{
Statement1;
Statement2;
Statement3;
}
else
{
Statement4;
Statement5;
Statement6;
}
This “if-else construct” can be represented in flowcharts as follows:

Fig. 2.2 Illustrating If-else construct with the help of flow-chart


From the above flow-chart, it is clear that, if the condition in the “if construct” is true, then “processing
X” will be executed. If the condition is false, then “processing Z” will be executed. And at the end,
program flow or control will execute “processing Y” block. Let us write down a program to illustrate
the if-else construct.

Problem Statement:

If the marks obtained in the exam are greater than or equal to 95 percent, then join the Medical
course; otherwise, go for the Engineering course. Write down a pseudo code using the if-else construct.

Pseudo code:

Start
Declare variable marks
Display “enter the marks”
Accept marks
if marks >= 95
Start if
Display “you got 95 percent or more......go...join Medical course”.
End if
else
Start else
Display “go...join Engineering course”.
End else
End
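A C++ version of this decision might look like the sketch below. The function name chooseCourse is hypothetical, and the course is returned as text rather than displayed.

```cpp
#include <cassert>
#include <string>

// Picks a course from the marks: exactly one of the two branches runs.
std::string chooseCourse(int marks) {
    std::string course;
    if (marks >= 95) {
        course = "Medical";       // condition is true
    } else {
        course = "Engineering";   // condition is false
    }
    return course;
}
```

Note that marks of exactly 95 take the first branch because the condition uses >=.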

 Nested if-else

The ‘if’ construct can be nested. In other words, an ‘if’ construct can be present inside the body of
another ‘if’ construct. We can write an entire ‘if-else’ construct within the body of the ‘if’ statements
or inside the body of the ‘else’ statements. It can be represented as:

if (condition)
{
Statements;
if (condition)
{
Statements;
}
else
{
Statement;
}
}
else
{
if (condition)
{
Statements;
}
else
{
Statements;
}
}

Let us write down a program to illustrate the nested ‘if else’ constructs.

Problem statements:

If you get 95 marks or more, you will join the medical course; otherwise you will join the
engineering course. In both the medical and the engineering course, you will be given division A or B
based on your marks in English. If your marks in English are 75 or more, you will be given division A;
otherwise you will be given division B.

Pseudo code:

Start
Declare variable marks
Declare variable English_marks
Display “enter the marks”
Accept marks
Display “enter English marks”
Accept English_marks
if marks >= 95
Start if
Display “you got 95 percent or more......go...join Medical course”.
if (English_marks >= 75)
Start if
Display “division A is allotted”.
End if
else
Start else
Display “division B is allotted”.
End else
End if
else
Start else
Display “go...join Engineering course”.
if (English_marks >= 75)
Start if
Display “division A is allotted”.
End if
else
Start else
Display “division B is allotted”.
End else
End else
End

Thus, while writing or handling nested “if-else” constructs, we need to be careful about opening and
closing braces, as these may create confusion in large programs. Proper comments should therefore be
inserted to increase the readability of the program. The best programming practice is to write the
opening and closing braces at the same time so that neither is missed.
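The nested decision about course and division can be sketched in C++ as follows, with a comment on each closing brace as suggested above. The function name allot and the format of the returned text are hypothetical.

```cpp
#include <cassert>
#include <string>

// Chooses the course from 'marks' and the division from 'englishMarks'.
std::string allot(int marks, int englishMarks) {
    std::string result;
    if (marks >= 95) {
        result = "Medical";
        if (englishMarks >= 75) {
            result += ", division A";
        } else {
            result += ", division B";
        }   // end of inner if-else
    } else {
        result = "Engineering";
        if (englishMarks >= 75) {
            result += ", division A";
        } else {
            result += ", division B";
        }   // end of inner if-else
    }       // end of outer if-else
    return result;
}
```

Because the inner if-else is identical in both branches, it could also be written once after the outer if-else; it is repeated here only to mirror the nested form.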
Use of logical operators

Logical operators, as studied in previous units, can be used while constructing the condition or
decision making statements in ‘if’ constructs. The following logical operators can be used in condition
building.

1. AND (&&)

2. OR (||)

3. NOT (!)

When we want to combine two conditions in single ‘if’ constructs, then use of logical operators is
required. If we want to execute the group of statements when the condition1 and condition2 both are
true, then we must use ‘AND’ logical operator. In this case, statements will be executed when both
the conditions listed are true.

If we want to execute the group of statements when the condition1 or condition2 any one of them is
true, then we must use ‘OR’ logical operator. In this case, statements will be executed when any of
the conditions listed is true.
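The three logical operators can be sketched in C++ as below. The function names and the threshold values are hypothetical and are chosen only for illustration.

```cpp
#include <cassert>

// AND: true only when both conditions are true.
bool eligibleForScholarship(int marks, int attendance) {
    return marks >= 75 && attendance >= 80;
}

// OR: true when at least one of the conditions is true.
bool getsDiscount(int age) {
    return age < 12 || age >= 60;
}

// NOT: reverses the result of the condition that follows it.
bool isInvalidMark(int mark) {
    return !(mark >= 0 && mark <= 100);
}
```

Each function returns a value that can be used directly as the condition of an ‘if’ construct, for example if (getsDiscount(age)).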

Check your Progress 1


Fill in the Blanks.
1. The ‘if’ construct makes use of ____________and __________ operators for decision-making.
2. The ________ operator allows us to compare two values or check for any conditions.
3. The “else statement” is written exactly _______________ the “if statements” block.
4. The simple ‘if statement’ structure will execute or run the statements under it only if the given
condition is _______.
5. Proper __________ are inserted to increase the readability of the program.

Activity 1
1. Write a pseudo code to find out whether the entered number is equal to 99 or not.
2. Write a program to input the marks of student and display the percentage based on those
marks.
3. Write a program to input the salary of employees and based on the given table calculate
the bonus amount to be paid to them.
 Salary is <=3000 then bonus amount is 10% of Salary
 Salary is >3000 and <=5000 then bonus amount is 20% of Salary
 Salary is >5000 and <=10,000 then bonus amount is 25% of Salary
 Salary is >10,000 then bonus amount is 30% of Salary

2.3 THE CONCEPT OF LOOPS


The versatility of a computer is its ability to execute a series of instructions repeatedly. This
leads to the concept of loops, which give the programmer great flexibility in controlling the number
of times a specific task is repeated.
This repetitive operation is done through a loop control structure. The set of instructions or group
of statements is repeated either a specified number of times or until a given condition is satisfied.

There are three methods by which a part of a program can be repeated:

1. Using a While statement
2. Using a Do-While statement
3. Using a For statement

The segment of a program that is executed repeatedly is called a loop. The loop concept is essential
to good problem-solving techniques. There are situations where we need to repeat a certain group of
statements in a program based on conditions, such as:

1. Calculating the marks of 100 students and finding their percentages
2. Adding the marks of five subjects for 50 students
3. Adding the first 50 numbers

• The While Loop

The general form of while loop is as follows:

While (condition)
{
Statement1;
Statement2;
:
:
}
The operation of the while loop will be as follows:

The condition in the parentheses is evaluated. If the condition evaluates to true, then the
statements within the body of the while loop are executed. Then the condition is evaluated again;
this is where the repetition comes in. If the condition is again true, the statements within the
while loop body are executed again. This process of repetition continues until the test condition
becomes false. At that point, the loop is terminated and the statements following the while loop are
executed. Thus the statements in the loop are repeated for as long as the condition remains true.
The condition in a ‘while loop’ can be a compound condition using logical and relational operators.
The working of the while loop is shown in the following flow-chart:
Fig. 2.3 Flow chart illustrating general while loop construct of programming

From the flow chart, it is clear that when the condition in while loop is true, then processing X and
processing Y and processing Z blocks are executed. After that, again the control will go back to the
decision box to check for the condition. If the condition is again true, then again those processing
blocks are repeated. Thus, iteration of those instructions which are inside the while loop is done based
on fulfilment of the condition. If the condition is false, then remaining processing is done as shown in
the Fig. 2.3.

Let us write down a pseudo code to illustrate the while loop concept of programming.

Problem statement:

Write down a program which will accept the English marks of 20 students and find their average.

Pseudo code:

Start
Declare variable Counter;
Declare variable E_marks;
Declare variable Sum;
Declare variable Average;
Assign value 0 to Counter;
Assign value 0 to Sum;
Assign value 0 to Average;
While Counter is less than 20
Start of while loop
Display “enter English marks:”
Accept E_marks
Sum = Sum + E_marks
Counter = Counter + 1
End of while loop
Average = Sum/20
Display “average of English marks of 20 students”
Display Average
End

Explanation of the pseudo code:

In the above pseudo code, the Counter variable is initialised to 0. Since we want to take the English
marks of 20 students, we repeat the statements inside the while loop body 20 times by checking that
Counter is less than 20. The value of Counter is increased by 1 in each execution of the while loop.
Thus the while loop is executed 20 times. Once Counter reaches the value 20, control comes out of the
while loop and the statements outside the while loop are executed. We also calculate the sum of all
20 students’ English marks inside the loop and then, outside the while loop, calculate the average of
those marks.
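The counting pattern described above can be sketched in C++ as follows. As an assumption made for easy testing, the marks are passed in as a list instead of being typed in one by one, so the loop runs once per mark in the list rather than exactly 20 times; the function name averageMarks is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages the marks with a while loop; 'marks' stands in for keyboard input.
double averageMarks(const std::vector<int>& marks) {
    if (marks.empty()) {
        return 0.0;                       // nothing to average, avoid dividing by zero
    }
    std::size_t counter = 0;              // 1. initialise the counter
    int sum = 0;
    while (counter < marks.size()) {      // 2. test the counter
        sum = sum + marks[counter];
        counter = counter + 1;            // 3. increment the counter
    }
    return static_cast<double>(sum) / marks.size();
}
```

If the increment on the last line of the loop body were left out, the condition would never become false and the loop would not end.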

Thus, when using a while loop, the following things are important:

1. Initialisation of counter variable
2. Checking the value of counter in while condition
3. Incrementing or decrementing of counter variable

And it can be written as follows:

Initialise the counter variable;


While (condition using counter variable)
{
Statements;
Statements;
Increment or decrement counter variable;
}

The statements inside the while loop keep on executing until the condition becomes false. If the
condition never becomes false, the statements inside the while loop are repeated indefinitely. Care
should therefore be taken that the condition eventually becomes false, for example by updating a
counter variable inside the loop as shown above.

The condition being tested may use relational or logical operators as discussed earlier. Some examples
showing those are:

 While (y <= 25)
 While (a >= 10 && b >= 20)
 While (y <= 90 || z > 100)

• The do-while loop

The general form of do-while loop is as follows:

do
{
Statements;
Statements;
:
Statements;
} while (condition);

In a do-while loop, the statements inside the body of the loop are executed at least once, because
the condition is tested only after the body has run. Irrespective of whether the condition is true or
false, the statements within the do-while loop are executed a first time.

In a while loop, by contrast, the statements inside the loop are executed only if the condition is
true. The do-while loop will execute the statements inside the loop once even if the condition fails
at the very first test, whereas the ‘while loop’ will not execute them at all in that case. Apart
from this difference, the ‘do-while loop’ works in the same way as the ‘while loop’. The following
flow-chart illustrates the execution of the do-while loop:

Fig. 2.4 Flow-chart showing the operations involved in do-while loop
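The difference between the two loops shows up when the condition is false at the very first test. The following C++ sketch counts how many times each loop body runs; the function names countWhile and countDoWhile are hypothetical.

```cpp
#include <cassert>

// Counts the executions of a while loop body for i = start, start+1, ... below limit.
int countWhile(int start, int limit) {
    int executions = 0;
    int i = start;
    while (i < limit) {          // condition is tested before the body
        executions = executions + 1;
        i = i + 1;
    }
    return executions;
}

// Same loop written as do-while: the body runs before the first test.
int countDoWhile(int start, int limit) {
    int executions = 0;
    int i = start;
    do {
        executions = executions + 1;
        i = i + 1;
    } while (i < limit);         // condition is tested after the body
    return executions;
}
```

With start 0 and limit 3 both loops run three times. With start 5 and limit 3 the while loop body never runs, while the do-while body runs once.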

• The FOR loop

The ‘for loop’ is the most popular loop or iteration construct in the programming language.

The general form of the ‘for loop’ is as follows:

for (initialisation of counter; condition or test counter; increment or decrement counter)
{
Statements;
Statements;
:
Statements;
}

The ‘for loop’ construct allows us to specify three things:

1. Setting the loop counter
2. Testing the loop counter for a particular condition
3. Incrementing or decrementing the counter

All three can be specified in a single line inside the parentheses following the ‘for’ keyword, as
shown. The same can be shown in a flow-chart as follows:

Fig. 2.5 Flow-chart showing the operations involved in for loop

The difference between the while and for loops lies in where these three steps are written. In a
while loop, initialisation is done before the start of the loop, only the condition is checked at the
start of the loop, and the increment or decrement of the condition variable is done inside the body
of the loop.

In a for loop, initialisation, condition testing and increment or decrement of the condition variable
are all done in a single statement inside the parentheses of the for loop, as shown in the general
form of the for loop.

Let us write down a pseudo code to illustrate the for loop concept of programming.

Problem statement:

Write down a program which will accept the English marks of 20 students and find their average.

Pseudo code:

Start
Declare variable Counter;
Declare variable E_marks;
Declare variable Sum;
Declare variable Average;
Assign value 0 to Sum;
Assign value 0 to Average;
For (Counter=0; Counter<20; Counter=Counter+1)
Start of for loop
Display “enter English marks:”
Accept E_marks
Sum = Sum + E_marks
End of for loop
Average = Sum/20
Display “average of English marks of 20 students”
Display Average
End

Explanation of the pseudo code:

In the above pseudo code, the Counter variable is initialised to 0. Since we want to take the English
marks of 20 students, we repeat the statements inside the for loop body 20 times by checking that
Counter is less than 20. The value of Counter is increased by 1 in each execution of the for loop.
Thus the ‘for loop’ is executed 20 times. Once Counter reaches the value 20, control comes out of the
for loop and the statements outside the ‘for loop’ are executed. We also calculate the sum of all 20
students’ English marks inside the loop and then, outside the for loop, calculate the average of
those marks.
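In C++ the same average can be sketched with a for loop, which gathers initialisation, test and increment on one line. As an assumption, the marks are supplied as a list rather than read from the keyboard; the function name averageWithFor is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages the marks with a for loop; 'marks' stands in for keyboard input.
double averageWithFor(const std::vector<int>& marks) {
    if (marks.empty()) {
        return 0.0;   // nothing to average, avoid dividing by zero
    }
    int sum = 0;
    // initialisation;         test;                   increment
    for (std::size_t counter = 0; counter < marks.size(); counter = counter + 1) {
        sum = sum + marks[counter];
    }
    return static_cast<double>(sum) / marks.size();
}
```

Compared with the while loop form, no separate line is needed before the loop to set the counter or inside the body to increase it.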

• Nesting of Loops

Just as if statements can be nested, i.e. one if statement can appear inside another, the while,
do-while and for loops can also be nested. That is, a ‘while loop’ can be present inside a ‘for loop’,
a ‘for loop’ can be present inside a ‘while loop’, and a ‘while loop’ may be nested inside another
‘while loop’ or inside a ‘do-while loop’.

The various nesting can be of following types:

1. for (initialisation of counter; condition or test counter; increment or decrement counter)
{//outer for loop
Statements;
for (initialisation of counter; condition or test counter; increment or decrement counter)
{//inner for loop
Statements;
Statements;
:
Statements;
}
Statements;
}
2. for (initialisation of counter; condition or test counter; increment or decrement counter)
{
Statements;
While (condition)
{
Statements;
:
}
Statements;
}
3. while (condition)
{
Statements;
for (initialisation of counter; condition or test counter; increment or decrement counter)
{
Statements;
Statements;
:
Statements;
}
Statements;
}
4. do
{
Statements;
for (initialisation of counter; condition or test counter; increment or decrement counter)
{
Statements;
Statements;
:
Statements;
}
Statements;
} while(condition);

Thus we have seen the nesting of a for loop inside a while loop and vice versa. While writing nested
loops, care should be taken that the condition of each loop is checked and each loop is closed
properly, to avoid indefinite looping.
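As a sketch of the first form (a for loop inside a for loop), the C++ function below adds up the subject marks of each student, in the spirit of the earlier example of adding the marks of several subjects for many students. The function name totalPerStudent and the list-of-lists layout of the data are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// marks[s][j] holds the marks of student s in subject j.
std::vector<int> totalPerStudent(const std::vector<std::vector<int>>& marks) {
    std::vector<int> totals;
    for (std::size_t s = 0; s < marks.size(); s = s + 1) {          // outer loop: one pass per student
        int total = 0;
        for (std::size_t j = 0; j < marks[s].size(); j = j + 1) {   // inner loop: one pass per subject
            total = total + marks[s][j];
        }
        totals.push_back(total);   // runs once per student, after the inner loop ends
    }
    return totals;
}
```

The inner loop runs to completion for every single pass of the outer loop, and each loop has its own counter and its own condition.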

2.4 THE CONTINUE AND BREAK STATEMENTS


 The break statement

There are some situations where we want to jump out of a loop instantly without waiting for the
condition to become false. The keyword break allows us to do this. When keyword break is
encountered inside any loop (while or for or do-while loop), the program control is automatically
transferred to the first statement after the loop.

The break keyword is usually associated with an ‘if construct’.

For example:

program statements;
for (condition statements)
{
Program statements;
:
Program statements;
if (condition)
{
program statements;
break;
}
program statements;
:
}
Program statements;
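A typical use of break is to stop searching as soon as the item looked for has been found. The following C++ sketch is illustrative only; the function name firstNegativeIndex and the convention of returning -1 when nothing is found are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the position of the first negative value, or -1 if there is none.
int firstNegativeIndex(const std::vector<int>& values) {
    int found = -1;
    for (std::size_t i = 0; i < values.size(); i = i + 1) {
        if (values[i] < 0) {
            found = static_cast<int>(i);
            break;            // leave the loop at once; later values are not examined
        }
    }
    return found;             // first statement after the loop
}
```

For the list 3, 7, -2, 5, -9 the loop stops at position 2 and never looks at the last two values.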

 The continue statement

During programming, there are some situations where we want to take the control to the beginning
of the loop, bypassing the statements inside the loop which have not yet executed. The keyword
‘continue’ allows us to do this. When keyword ‘continue’ is encountered inside any loop (while or for
or do-while loop), the program control is automatically transferred to the beginning of the loop.

The ‘continue’ keyword is usually associated with an ‘if construct’.

For Example:

program statements;
for (condition statements)
{
Program statements;
:
Program statements;
if (condition)
{
program statements;
continue;
}
program statements;
:
}
Program statements;
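A typical use of continue is to skip items that should not be processed while still going on with the rest. The C++ sketch below adds up only the marks that lie between 0 and 100; the function name sumOfValidMarks and the validity range are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds up the marks, skipping any value outside the range 0 to 100.
int sumOfValidMarks(const std::vector<int>& marks) {
    int sum = 0;
    for (std::size_t i = 0; i < marks.size(); i = i + 1) {
        if (marks[i] < 0 || marks[i] > 100) {
            continue;         // skip the rest of the body and go on to the next value
        }
        sum = sum + marks[i]; // reached only for valid marks
    }
    return sum;
}
```

In a for loop, the counter is still incremented after continue before the condition is tested again, so the loop does not get stuck on the skipped value.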

Check your Progress 2


Fill in the blanks.
1. The segment of program that is executed repeatedly is called ________.
2. The versatility of a computer is its ability to execute a _________ of instructions
repeatedly.
3. The while loop is also called_______________.
4. With _______ loop, first the condition is checked and then the statements are evaluated.
5. The ________ loop will execute the statements inside the loop at least once, even if the condition fails.
6. In for loop, initialisation, condition testing and decrement or increment of the condition
variable is done in the ___________ statement.
7. When keyword ___________ is encountered inside any loop, the program control is
automatically transferred to the beginning of the loop.
Activity 2
1. Write five examples, which can be solved using loops.
2. Using FOR loop, input 50 numbers and find their sum.
3. Write a program to illustrate the usage of break and continue statements.

2.5 “SWITCH” - DECISION STRUCTURE


The control statement which allows us to make a decision from a number of choices is known as a
“switch”. It is also called a “switch-case-default” decision-making structure. The three keywords
“switch”, “case” and “default” together make up this control statement.

The general form of the switch decision structure is:

switch (integer value)
{
case constant value 1:
program statements;
case constant value 2:
program statements;
case constant value 3:
program statements;
case constant value 4:
program statements;
default:
program statements;
}

Explanation:

The integer value inside the parentheses after the “switch” keyword is any integer expression or any
integer value such as 1, 2, 3 or 4. Based on that value, the matching case is selected and the
program statements under that case block are executed. In languages such as C and C++, execution then
continues into the case blocks that follow unless a break statement sends program control out of the
switch block.

General form of Switch-Case-default construct

Thus the general form of the “switch-case-default” construct would be:

switch (choice)
{
case 1:
program statements;
:
break;
case 2:
program statements;
:
break;
case 3:
program statements;
:
break;
case 4:
program statements;
:
break;
case 5:
program statements;
:
break;
default:
program statements;
:
}

The operation of “switch-case-default” construct can be shown in pictorial form as shown in below
flow-chart:

Fig. 2.6 Flow-chart illustrating switch-case control structure
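The effect of break inside a switch can be sketched in C++ with two small functions. Both names, calculate and withoutBreak, are hypothetical, and returning 0 for an unknown choice or a division by zero is an assumption made for this sketch.

```cpp
#include <cassert>

// With break: only the statements of the selected case block are executed.
int calculate(int choice, int a, int b) {
    int result = 0;
    switch (choice) {
    case 1:
        result = a + b;
        break;
    case 2:
        result = a - b;
        break;
    case 3:
        result = a * b;
        break;
    case 4:
        if (b != 0) {
            result = a / b;   // integer division; skipped when b is 0
        }
        break;
    default:
        result = 0;           // any other choice
    }
    return result;
}

// Without break: execution carries on into every case block that follows.
int withoutBreak(int choice) {
    int executed = 0;
    switch (choice) {
    case 1:
        executed = executed + 1;   // no break here, so case 2 runs as well
    case 2:
        executed = executed + 1;   // no break here, so default runs as well
    default:
        executed = executed + 1;
    }
    return executed;
}
```

Calling withoutBreak(1) executes three blocks, withoutBreak(2) two, and any other value only the default block, which is why a break normally ends each case.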

Usage of “switch-case-default construct”

The main use of the switch-case-default construct is in writing menu-driven programs. The following
program, written in C++, illustrates this.

For example:

#include <iostream>
#include <cstdlib>   // declares exit()
using namespace std;

int main()
{
    int answer;
    while(1)
    {//start of while loop..this is indefinite loop...
        cout<<"1. Addition \n";
        cout<<"2. Subtraction \n";
        cout<<"3. Multiplication \n";
        cout<<"4. Division \n";
        cout<<"5. Exiting the program \n";
        cout<<"please enter your choice: \n";
        cin>>answer;
        switch (answer)
        {
        case 1:
            // enter logic for addition of two numbers
            break;
        case 2:
            // enter logic for subtraction of two numbers
            break;
        case 3:
            // enter logic for multiplication of two numbers
            break;
        case 4:
            // enter logic for division of two numbers
            break;
        case 5:
            // exiting the program
            exit(0);
            // this is a standard library function used to come out of the program
        default:
            cout<<"please enter correct choice....\n";
        }
    }//end of while loop....
}

When this program is executed, the menu is provided to the user.

1. Addition
2. Subtraction
3. Multiplication
4. Division
5. Exiting the program
Please enter your choice:

Based on the choice entered by the user, the respective case block is executed. For example, if
choice 3 is selected for multiplication, then case 3 will be selected and the logic for
multiplication will be executed. At the end of the case block the menu is displayed again, since
while(1) is an indefinite loop: it keeps executing the block of statements inside the loop until the
exit(0) function is called explicitly.
Check your Progress 3
Fill in the Blanks.
1. The control statement, which allows us to choose from the number of available choices is
known as a _______________.
2. Switch statement is also called a _________ decision-making structure.
State True or False.
1. The break statement is used with switch-case statement to send the control of program
out of case block.
2. The main use of switch-case-default program construct is in writing menu-driven
program.

Activity 3
1. Write down a menu-driven program using switch-programming construct. The menu of
the program would be like:
i. Mumbai
ii. Pune
iii. Bangalore
iv. New Delhi
v. Exit
Based on the user choice entered, display the appropriate and respective message like
“you are in Mumbai”.
2. Write the syntax of switch case statement.
3. Write a program using switch case to input two numbers and find their sum, difference,
multiplication and division.

Summary
 The ‘if’ construct makes use of relational and logical operators for decision making. This is one
of the most important fundamentals in computer programming. It is used in making decisions
while writing the programs. If the condition is true, then program flow is directed to take one
course of action and if the condition is false, then the program flow is directed to do something
else.
 The relational operator allows us to compare two values or check for any conditions.
 Iteration or loop is the most powerful construct of computer programming.
 The repetitive operation is done through a loop control structure. The set of instructions or
group of statements are repeated either a specified number of times or until a given condition
is being satisfied.
 There are some situations where we want to jump out of a loop instantly without waiting for
the condition to become false. The keyword break allows us to do this.
 During programming, there are some situations where we want to take the control to the
beginning of the loop, bypassing the statements inside the loop which have not yet been
executed. The keyword ‘continue’ allows us to do this.
 The control statement which allows us to make a decision from the number of choices is
known as a “switch”. It is also called as a “switch-case default” decision making structure.
These three keywords “switch”, “case” and “default” collectively make up the given control
statement.
 To execute only the statements of the selected case block, we need to use the keyword break to
come out of the switch block. Using a “break” statement after every case block makes sure
that no other case blocks are executed and that control leaves the switch block.
 The “switch-case-default” construct can also be implemented using ‘if’ statements. For each
‘case block’ an equivalent ‘if’ statement is written. But this becomes tedious and
complex when the number of case blocks is large.
 Normally the choice between the “switch-case-default construct” and “if constructs” is left
to the programmer.

Keywords
 Loop: The segment of program that is executed repeatedly.
 Iteration: Set of instructions or group of statements are repeated either a specified number
of times or until a given condition is being satisfied.
 Switch-case: The control statement which allows us to make a decision from the number of
choices.

Self-Assessment Questions
1. Explain the importance of decision making in real life as well as in computer programming.
2. What do you understand by the term simple “if construct”? Explain with the help of flow-
charts.
3. Explain the different forms of “if construct” with the help of suitable examples.
4. Explain the concept of loop in programming.
5. Explain the while loop with the help of flowcharts.
6. Explain the do-while loop with the help of flowcharts.
7. Explain the ‘for loop’ with the help of flowcharts.
8. What are the different types of nesting that can be possible using while, do-while and for
loop?
9. Explain what you understand by the term “switch-case-default” programming construct.
10. What is the main use of “switch-case-default” programming construct?
11. When should one use “if statements” instead of “switch-case-default” programming
construct?

Answers to Check your Progress


Check your progress 1

Fill in the Blanks.

1. The ‘if’ construct makes use of logical and relational operators for decision making.

2. The if operator allows us to compare two values or check for any conditions.

3. The “else statement” is written exactly below the “if statements” block.

4. The simple ‘if statement’ structure will execute or run the statements under it only if the given
condition is true.

5. Proper comments are inserted to increase the readability of the program.


Check your progress 2

Fill in the Blanks.

1. The segment of program that is executed repeatedly is called loop.


2. The versatility of a computer is its ability to execute a series of instructions repeatedly.
3. The while loop is also called entry controlled loop.
4. With while loop, first the condition is checked and then the statements are evaluated.
5. The do-while loop will execute the statements inside the loop at least once, even if the condition fails.
6. In for loop, initialisation, condition testing and decrement or increment of the condition
variable is done in the single statement.
7. When keyword continue is encountered inside any loop, the program control is
automatically transferred to the beginning of the loop.

Check your progress 3

Fill in the Blanks.

1. The control statement, which allows us to choose from the number of available choices is
known as a switch.
2. Switch statement is also called a switch-case-default decision-making structure.

State True or False.

1. True
2. True

Suggested Reading
1. Balagurusamy, E. Object Oriented Programming with C++. New Delhi: Tata McGraw-Hill.
2. Rajaraman, V. Fundamentals of Computers. New Delhi: PHI Learning.
3. Data Structures Using C by Reema Thareja, Oxford University Press.
4. C Programming Language, by Brian W. Kernighan, Dennis Ritchie, Prentice Hall.
Unit 3
Arrays and Pointers
Structure:
3.1 Introduction

3.2 Arrays

3.3 Array Initialisation

3.4 Introduction to Character Arrays

3.5 Pointers

Summary

Keywords

Self-Assessment Questions

Answers to Check Your Progress

Suggested Reading

The text is adapted by Symbiosis Centre for Distance Learning under a Creative Commons
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) as requested by the work’s creator or
licensees. This license is available at https://creativecommons.org/licenses/by-sa/4.0/.
Objectives

After going through this unit, you will be able to:

• Define the term array
• Discuss the array handling rules
• Explain array initialisation techniques
• Define the term pointer
• Describe the use and advantages of pointers
• Describe the & and * operators used in handling pointers

3.1 INTRODUCTION

Programming or coding means writing programs, that is, sets of instructions in a computer programming
language, to solve a given problem statement. We have studied the different data types, constants and
variables used while programming. Variables are used to hold data in memory during program
execution. Before any variable is used, it should be declared with the proper data type. During
programming, we may come across situations that call for 100 or more variables to store
data, and declaring each of them separately is not convenient for the programmer. This is where
the concept of arrays comes in.

In this unit, we will study arrays in detail: how they are created and handled, and why they are
important in programming. We will also study pointers, their use in programming and the advantages
they offer. Many programmers find pointers difficult to use and difficult to follow when reading
programs. This unit tries to simplify the topic so that pointers become understandable while
programming.

3.2 ARRAYS

Consider a situation in which we have 100 students in a class and we have been asked to write a
program that reads and prints the marks of all the 100 students. In this program, we will need 100
integer variables with different names. Now to read the values of these 100 variables, we must have
100 read statements. Similarly, to print the value of these variables, we need 100 write statements. If
it is just a matter of 10 variables, then it might be acceptable for the user to follow this approach. But
would it be possible to follow this approach if we have to read and print the marks of 100 or more
than 100 students?

The answer is no; it would be far too inconvenient. To process a large amount of data, we need
a data structure known as an array.

Definition of an Array: An array is a collection of similar data elements.

These similar data elements could be all int, or all float or all char. An array is a group of data elements
that has a common name and array elements are differentiated from one another by their position or
subscript within the array.

Arrays play an important role in problem solving when large numbers of variables are to be used in
programming. One should remember that all data elements of any given array must be of the same
data type.

Declaration of Arrays

An array can be declared as:

data_type name_of_the_array[size];


For example, an integer array of size 50 with the name numb can be declared as:

int numb[50];

Thus, an array of size 50 of integer data type is declared. This array will store 50 values of the integer
data type. The size or dimension of the array tells the computer how many memory locations to
reserve for values of the given data type during program execution. Here, in this case, 50
locations are reserved in program memory to store integer data.

Subscripts are used to manipulate the individual array elements. Subscripts start from 0. Thus, if the
size of an array is 10 then the subscripts to handle individual array elements will be from 0 to 9.

For example:

int a[5];

Here, a[0] will represent the first element of the array, a[1] will represent the second element of the
array, a[2] will represent the third element of the array, a[3] will represent the fourth element of the
array and a[4] will represent the fifth element of the array.
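
The subscript rule above can be tried out with a short C sketch. The function name third_element and the stored values are assumptions chosen only for illustration; they do not come from the text.

```c
#include <stdio.h>

/* Illustrative helper (hypothetical name): builds a 5-element array
   and returns its third element, which lives at subscript 2. */
int third_element(void)
{
    int a[5];                   /* valid subscripts are 0, 1, 2, 3 and 4 */
    int i;

    for (i = 0; i < 5; i = i + 1)
        a[i] = (i + 1) * 10;    /* a[0]=10, a[1]=20, ... a[4]=50 */

    printf("first = %d, last = %d\n", a[0], a[4]);
    return a[2];                /* the third element */
}
```

Calling third_element() from main() returns 30, the value stored at subscript 2. Using a[5] here would be an error, because the last valid subscript of a 5-element array is 4.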

Rules of Array handling

Some basic rules while handling or using arrays in programming are listed below.

 The dimension (size) of the array should be a positive integer.
 The name of the array should be unique.
 The name of an array variable also follows the variable naming rules.
 All elements of any array must be of the same data type.
 If an array is declared with the integer data type, it cannot contain elements of any other data type.
 Each element or subscripted variable in the array can be referred to anywhere in the
program just like an independent variable.
 Subscripts are used to manipulate the individual array elements. Subscripts start from 0.
 An entire array cannot be read or written all at once.
 Each element of an array must be read or written separately, as if it were a separate
individual variable, with the help of the array name and subscript.

Calculating the Address of Array Elements

We may want to know where an individual element of an array is located in the memory. The answer
is that the array name is a symbolic reference to the address of the first byte of the array. When we
use the array name, we are actually referring to the first byte of the array.

The subscript or the index represents the offset from the beginning of the array to the element being
referenced. That is, with just the array name and the index, C can calculate the address of any element
in the array.

Since an array stores all its data elements in consecutive memory locations, storing just the base
address, that is the address of the first element in the array, is sufficient. The address of other data
elements can simply be calculated using the base address. The formula to perform this calculation is,

Address of data element, A[k] = BA(A) + w(k – lower_bound)

Here, A is the array, k is the index of the element whose address we have to calculate, BA is the
base address of the array A, and w is the size of one element in memory in bytes. For example, the
size of an int is 2 bytes on older 16-bit compilers and typically 4 bytes on current ones.
The length of an array is given by the number of elements stored in it. The general formula to calculate
the length of an array is,

Length = upper_bound – lower_bound + 1

Where upper_bound is the index of the last element and lower_bound is the index of the first element
in the array.
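
The two formulas above can be checked with a small C sketch. The helper names element_address and array_length are hypothetical. The sketch assumes an int array, for which C always uses a lower bound of 0, and takes w from sizeof(int) instead of assuming a fixed size.

```c
#include <stddef.h>

/* Hypothetical helper: applies BA(A) + w * (k - lower_bound)
   using byte arithmetic on the base address. */
const int *element_address(const int *base, size_t k)
{
    const unsigned char *ba = (const unsigned char *)base; /* base address, in bytes */
    size_t w = sizeof(int);                                /* size of one element    */
    size_t lower_bound = 0;                                /* always 0 in C          */

    return (const int *)(ba + w * (k - lower_bound));
}

/* Length = upper_bound - lower_bound + 1 */
size_t array_length(size_t lower_bound, size_t upper_bound)
{
    return upper_bound - lower_bound + 1;
}
```

For an array declared as int A[10], element_address(A, 3) gives the same address as &A[3], and array_length(0, 9) gives 10. In everyday C code this arithmetic is written simply as A[k]; the compiler performs the calculation.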

Writing and Reading data into an array

Writing data into an array:

While entering data into an array, a ‘for’ loop is generally used. The general form of a for
loop that enters data into the array elements looks as follows:

for (counter=0; counter<size of an array; counter=counter+1)

Program statements to enter the data inside the array element

Reading data from an array:

Similarly, while reading data from an array, a ‘for’ loop is used. The general form of a for loop
that reads the data stored in the array elements looks as follows:

for (counter=0; counter<size of an array; counter=counter+1)

Program statements to read/display the data inside the array element

Let us now write a sample program that writes data into array elements and reads it back from the
array elements using a ‘for’ loop:

Pseudo code:

Start
Declare a Variable Counter;
Declare Sample_Array of Size 20;
for Counter=0 to Counter<20 and Counter=Counter+1
Start for Loop
Display “Enter the Data”;
Accept Sample_Array[Counter];
End of for Loop
for Counter=0 to Counter<20 and Counter=Counter+1
Start for Loop
Display “Displaying the Data”;
Display Sample_Array[Counter];
End of for Loop
End
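
The pseudo code above could be written in C roughly as follows. This is a sketch under stated assumptions: the function names enter_data and display_data are hypothetical, the data is assumed to be integers, and display_data also returns the total of the values, which the pseudo code does not ask for, so that the result can be checked.

```c
#include <stdio.h>

#define SIZE 20

/* Writing: accept up to SIZE integers from the keyboard, one per element.
   Returns the number of elements actually read. */
int enter_data(int sample_array[SIZE])
{
    int counter;

    for (counter = 0; counter < SIZE; counter = counter + 1) {
        printf("Enter the Data: ");
        if (scanf("%d", &sample_array[counter]) != 1)
            break;                          /* stop on invalid input */
    }
    return counter;
}

/* Reading: display the first count elements and return their total. */
int display_data(const int sample_array[], int count)
{
    int counter;
    int total = 0;

    for (counter = 0; counter < count; counter = counter + 1) {
        printf("Displaying the Data: %d\n", sample_array[counter]);
        total = total + sample_array[counter];
    }
    return total;
}
```

A main() function would declare int sample_array[SIZE], call enter_data(sample_array) and pass the returned count to display_data.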
Usage of while loop:

While loops can also be used to enter data into array elements and read data from array
elements. The general form of a while loop for entering data into an array is:

Counter=0;
While(Counter < size of the array)
{
Enter the data into array elements;
Counter=Counter+1;
}

The general form of a while loop for reading/displaying data from an array is:

Counter=0;
While(Counter < size of the array)
{
Read or display the data from array elements;
Counter=Counter+1;
}

Let us now write a sample program that writes data into array elements and reads it back from the
array elements using a “while loop”:

Pseudo code:

Start
Declare a Variable Counter;
Declare Sample_Array of Size 20;
Counter=0
While Counter<20
Start While Loop
Display “Enter the Data”;
Accept Sample_Array[Counter];
Counter=Counter+1
End Of While Loop
Counter=0
While Counter<20
Start While Loop
Display “Displaying the Data”;
Display Sample_Array[Counter];
Counter=Counter+1;
End Of While Loop
End
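
The same write-then-read pattern with while loops can be sketched in C. To keep the example free of keyboard input, it stores the numbers 1 to 20 instead of accepting them from the user; that substitution, the function name count_even and the counting of even values are assumptions made for illustration.

```c
#include <stdio.h>

#define SIZE 20

/* Hypothetical helper: fills the array with 1..SIZE using a while loop,
   then reads the elements back with a second while loop and returns
   how many of them are even. */
int count_even(void)
{
    int sample_array[SIZE];
    int counter;
    int evens = 0;

    counter = 0;
    while (counter < SIZE) {                /* writing */
        sample_array[counter] = counter + 1;
        counter = counter + 1;
    }

    counter = 0;
    while (counter < SIZE) {                /* reading */
        if (sample_array[counter] % 2 == 0)
            evens = evens + 1;
        counter = counter + 1;
    }
    printf("even values: %d\n", evens);
    return evens;
}
```

Note that the counter must be reset to 0 before the second loop, exactly as in the pseudo code; otherwise the reading loop would never run.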

3.3 ARRAY INITIALISATION

1. Array members or elements can be initialised individually.

For example:

int a[5];
a[0]=10;
a[1]=20;
a[2]=30;
a[3]=40;
a[4]=50;

2. An array can also be initialised at the time of declaration.

For example:

int a[5] = {12,23,45,56,67};

3. When an array is initialised at the time of declaration, mentioning the dimension
or size of the array is optional.

For example:

int a[] = {12,23,45,56,67,78,89,90};

In the above example, the size of the array would be 8, because eight values are listed.

4. Arrays can be of two or three dimensions. These 2-D or 3-D arrays are used for matrix programming
and also help in solving complex mathematical problems.

For example:

Two dimensional arrays can be declared as:

int num_array[3][4];

This means that the array can store 12 data elements (3 rows of 4 columns) of integer data type.
A 2-D array is initialised as follows, shown here for an array of 3 rows and 3 columns:

int num_array[3][3] = {
{1,2,3},
{4,5,6},
{7,8,9}
};
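
The initialisation forms in this section can be checked with a short C sketch. The function names count_initialised and diagonal_sum are hypothetical, and summing the diagonal is only an illustrative way of visiting elements of the 2-D array.

```c
#include <stddef.h>

/* When the size is omitted, the compiler counts the initialisers.
   The element count can be recovered with sizeof. */
size_t count_initialised(void)
{
    int a[] = {12, 23, 45, 56, 67, 78, 89, 90};

    return sizeof(a) / sizeof(a[0]);    /* total bytes / bytes per element */
}

/* Adds the main diagonal of the 3 x 3 array from the text. */
int diagonal_sum(void)
{
    int num_array[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };
    int row;
    int sum = 0;

    for (row = 0; row < 3; row = row + 1)
        sum = sum + num_array[row][row];    /* 1 + 5 + 9 */
    return sum;
}
```

Here count_initialised() returns 8 and diagonal_sum() returns 15. In a 2-D array the first subscript selects the row and the second selects the column.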

3.4 INTRODUCTION TO CHARACTER ARRAYS

Arrays are not limited to the integer data type. We can also have character
arrays as well as floating point arrays. A character
array is a group of values of the character data type. A group of characters is also called a “string” in
programming. The general form of declaring a character array is:

char name_of_the_array[size_of_the_array];


For example:

char c[10];

This declares a character array which is capable of storing 10 values of the character
data type. Values are assigned to the elements of a character array as follows:

char c[5];
c[0] = 'a';
c[1] = 'b';
c[2] = 'c';
c[3] = 'd';
c[4] = 'e';

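A character array can also be initialised with a string at the time of declaration. In C, an array
cannot be assigned a string with ‘=’ after it has been declared, and the size must leave room for the
string terminator described below:

char c[6] = "abcde";

A character array may also be larger than the string it holds:

char country[10] = "INDIA";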

String terminator:

 The end of the string is marked with the special character ‘\0’.
 This character is known as the string terminator or null character, and it marks the end of a
string of any length.
 Thus, while declaring a character array, the string terminator must be
counted.
 The character array declared should be large enough to accommodate the string terminator
character ‘\0’.
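
The effect of the string terminator on array size can be seen in a small C sketch. The function names visible_length and storage_needed are hypothetical; strlen and sizeof are standard C.

```c
#include <string.h>

/* Number of characters before the terminator in "INDIA". */
size_t visible_length(void)
{
    char country[10] = "INDIA";

    return strlen(country);         /* counts I, N, D, I, A */
}

/* Number of bytes needed to store "INDIA" including '\0'. */
size_t storage_needed(void)
{
    char country[] = "INDIA";       /* size deduced by the compiler */

    return sizeof(country);         /* 5 letters + 1 terminator */
}
```

visible_length() returns 5 while storage_needed() returns 6; the difference of one is the string terminator that must be counted when the array is declared.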

Let us write down a program to illustrate the concept of a character array.

Problem statement:

Write down a program to accept the name of your city in character array and display the same on the
screen.

Pseudo code:

Start
Declare character array cityname of size 30;
Display “enter the name of your city”;
Accept cityname;
Display “the name of your city entered is:” ;
Display cityname;
End

Let us write down a program to find the length of the string entered using the concept of character
array.

Problem statement:
Write down a program to accept the name of your city in character array and display the length of the
city name on the screen.

Pseudo code:

Start
Declare character array cityname of size 30;
Declare counter;
Declare citylength;
Display “enter the name of your city”;
Accept cityname;
counter = 0;
citylength = 0;
While cityname[counter] is not equal to ‘\0’
Start while loop
Increment citylength by one;
Increment counter by one;
End while loop
Display “the length of your cityname entered is:”;
Display citylength;
End
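
The two pseudo codes of this section can be sketched in C as a pair of functions. The names city_length and accept_city are hypothetical. The sketch assumes the city name is typed on one line and uses the standard fgets function, which never writes more than size characters into the array including the terminator.

```c
#include <stdio.h>
#include <string.h>

/* Counts characters up to, but not including, the terminator '\0'. */
int city_length(const char cityname[])
{
    int counter = 0;

    while (cityname[counter] != '\0')
        counter = counter + 1;
    return counter;
}

/* Reads one line into cityname (at most size - 1 characters) and
   removes the trailing newline that fgets keeps. Returns 1 on success. */
int accept_city(char cityname[], int size)
{
    if (fgets(cityname, size, stdin) == NULL)
        return 0;
    cityname[strcspn(cityname, "\n")] = '\0';
    return 1;
}
```

A main() function would declare char cityname[30], call accept_city(cityname, 30) and then display city_length(cityname). For the name "Pune", city_length returns 4.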

Check your Progress 1

Fill in the blanks.


1. An array is a collection of ___________ data elements.
2. ____________ are used to manipulate the individual array elements.
3. While entering the data into the array variable, ________ are used for the same.

State True or False.


1. The dimensions of the array should be a negative integer.
2. An entire array cannot be read or written all at once.
3. All elements of any array must be of the different data type.
4. The name of the array should be unique.
5. Arrays can be of two or three dimensions.
6. Character arrays are a group of integer data types.
7. The end of the string is marked with the special character ‘\0’.

Activity 1
1. Find the advantages and disadvantages of using arrays.
2. Write a program to accept the marks of 100 students and find their average using the
concept of an array.
3. Find the method of declaring two-dimensional arrays.
4. Write a program to accept your first name and last name using the concept of the character
array and display the same on the screen.

3.5 POINTERS

Pointers are special computer program variables which hold the address of a variable of their data
type. As they point to a variable of their data type, they are known as pointers. Consider the
following declaration:
int a = 50;

This declaration informs the program language compiler to do the following things:

 ‘a’ is a variable of integer data type
 Reserve the space in the computer memory to hold integer variable data
 Store the value 50 at the memory location

The same can be represented in the following way:

Thus, any variable declared in computer programming should have:

 Name of the variable to identify the memory location
 The value stored at the variable
 Associated Memory location to store the data or value (address)

Pointer variables are widely used in the C programming language. The C programming language is
mainly used for system programming. It is used to write operating systems, device drivers and other
system related software/programs. A pointer variable stores the address of another variable and so
gives direct access to that memory location, which is why pointers are used mainly in systems programming.

The & and * operator

The ‘&’ and ‘*’ operators are associated with pointers. The ‘&’ is known as the “address of” operator.
This operator, when used with a variable, gives us the address of the memory location where the value
is stored during program execution.

The ‘*’ is known as the “value at address” operator. It is also known as the indirection operator. This
operator, when used with a pointer variable, gives us the value stored at the memory location address
held in that pointer during program execution.

Pointer declaration

Pointers are declared by using * in front of the variable name. The symbol ‘*’ marks the variable
as a pointer. It informs the compiler that this is a special variable which will hold the memory
location address of another variable in the program.

For example:

int *a;

The above declaration informs the compiler that ‘a’ is a pointer
variable which is capable of storing the address of an integer variable. In other words, this special
variable ‘a’ will point to an integer value and hence it is called a pointer. Thus, this declared pointer
‘a’ will store the address of the integer variable.
The address of the other variable declared in the program can be associated with this pointer variable
as shown below:

int *a;
int b;
a=&b;

Thus, the pointer variable ‘a’ is assigned the address of the integer variable ‘b’ using the “&” operator.

As memory location addresses are whole numbers, a pointer variable
always holds a whole number.
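
Besides reading a value, the ‘*’ operator can also be used to change the variable a pointer points to. The sketch below reuses the names a and b from the declarations above; the function name change_through_pointer and the value 75 are assumptions made for illustration.

```c
/* Hypothetical helper: changes the variable b through the pointer a. */
int change_through_pointer(void)
{
    int b = 50;
    int *a = &b;    /* a holds the address of b */

    *a = 75;        /* writes to the location a points to, that is, b */
    return b;
}
```

The function returns 75 even though b is never assigned directly after its declaration, because *a and b refer to the same memory location.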

Let us write down a program to illustrate the concept of pointers widely used in the C Programming.

C Program to illustrate the usage of pointers in the program:

#include <stdio.h>

int main(void)
{
    int a;      /* ordinary integer variable */
    int *b;     /* pointer to an integer */
    a=50;
    b=&a;       /* b now holds the address of a */
    printf("value stored in variable a : %d\n", a);
    printf("address of variable a : %p\n", (void *)&a);
    printf("value stored in variable a : %d\n", *b);
    printf("address of variable a : %p\n", (void *)b);
    return 0;
}

The written program would give output of the following form. The addresses shown are only
illustrative: actual addresses differ between machines and runs, and %p usually prints them in
hexadecimal.

value stored in variable a : 50
address of variable a : 4356
value stored in variable a : 50
address of variable a : 4356

Explanation of the program:

int a; - This declares the variable ‘a’ with the integer data type.

int *b; - This declares the pointer variable ‘b’, which can store the address of an
integer variable.

a=50; - This statement assigns the value 50 to the variable ‘a’.

b=&a; - This assigns the address of the variable ‘a’ to the pointer variable ‘b’.

So b now stores the address of the variable ‘a’, which in this case is 4356 as displayed in the
output of the program.

printf("value stored in variable a : %d\n", a); - This statement prints the value stored in the variable
‘a’, which in this case is 50.

printf("address of variable a : %p\n", (void *)&a); - This statement prints the address of the variable
‘a’, which in this case is 4356. The %p format prints an address and expects a void pointer, hence the cast.

printf("value stored in variable a : %d\n", *b); - This statement prints the value stored in the variable
‘a’ through the ‘*’ operator applied to the pointer variable ‘b’, which in this case is 50.

printf("address of variable a : %p\n", (void *)b); - This statement prints the address of the variable
‘a’ through the pointer variable ‘b’, which in this case is 4356.

The pointer variable ‘b’ is itself stored at a memory location and so has an address of its own. To
find the address of the pointer variable, we write ‘&b’.

Let us write one more program to illustrate the pointer concept in more detail:

#include <stdio.h>

int main(void)
{
    int a;
    int *b;
    a=50;
    b=&a;
    printf("value stored in variable a : %d\n", a);
    printf("address of variable a : %p\n", (void *)&a);
    printf("value stored in variable a : %d\n", *b);
    printf("address of variable a : %p\n", (void *)b);
    printf("value stored in variable a : %d\n", *(&a));
    printf("value stored in variable b : %p\n", (void *)b);
    printf("address of variable b : %p\n", (void *)&b);
    return 0;
}

The output of the above program has the following form (the addresses are illustrative and will
differ from one machine and run to another):

value stored in variable a : 50
address of variable a : 4356
value stored in variable a : 50
address of variable a : 4356
value stored in variable a : 50
value stored in variable b : 4356
address of variable b : 3987

printf("value stored in variable a : %d\n", *(&a)); - This statement takes the address of ‘a’ with ‘&’ and
then applies ‘*’ to it, which gives back the value stored in the variable ‘a’, in this case 50.

printf("value stored in variable b : %p\n", (void *)b); - This statement displays the value stored in the
variable ‘b’, which is nothing but the address of the variable ‘a’, and in this case, it is 4356.

printf("address of variable b : %p\n", (void *)&b); - This statement prints the
address of the pointer variable ‘b’ itself, which in this case happens to be 3987.

Char, Int and Float pointers

Just as an integer pointer points to an integer variable, we can have
character pointers and float pointers as well.

1. Character Pointer Variables:

As the name suggests, the character pointer will point to the character variable. For example:
char *c;

Here ‘c’ is the character pointer which will point to the character variable. It means that it can store
the memory location address of the character type variable.

For example:

char *c;
char a;
a='w';
c=&a;

Now, the pointer variable ‘c’ will point to the character variable ‘a’. Thus,

The value stored in variable ‘a’ is: character ‘w’

The value represented by *c is: character ‘w’

The value represented by &a is: memory location address of character variable ‘a’

2. Integer pointer variables:

As the name suggests, integer pointer will point to the integer variable. For example:

int *c;

Here, ‘c’ is the integer pointer which will point to the integer variable.

It means that it can store the memory location address of the integer type of variable.

For example:

int *i;
int a;
a=23;
i=&a;

Now, the pointer variable ‘i’ will point to the integer variable ‘a’. Thus,

The value stored in variable ‘a’ is: 23

The value represented by *i is: 23

The value represented by &a is: memory location address of integer variable ‘a’

3. Floating point pointer variable:

As the name suggests, the float pointer will point to the floating point variable. For example:

float *f;

Here, ‘f’ is the float pointer which will point to the float variable. It means that it can store the memory
location address of the float type of variable.

For example:

float *f;
float a;
a=13.5;
f=&a;

Now, the pointer variable ‘f’ will point to the float variable ‘a’. Thus,

The value stored in variable ‘a’ is: 13.5

The value represented by *f is: 13.5

The value represented by &a is: memory location address of float variable ‘a’

Consider the following program statements:

int i;
float f;
char c;
int *w;
float *y;
char *z;

The following assignments would therefore be invalid: an integer pointer cannot hold the address of a
floating point or character variable, a character pointer cannot point to an
integer or floating point variable, and a floating point pointer cannot
point to a character or integer variable:

z=&i;
z=&f;
y=&i;
y=&c;
w=&f;
w=&c;
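
For contrast, the valid pairings can be sketched in C, with each pointer assigned the address of a variable of its own type. The variable names follow the statements above; the function name pointers_match and the stored values are assumptions made for illustration.

```c
/* Hypothetical helper: pairs each pointer with a variable of its own
   type and checks that every value reads back through its pointer. */
int pointers_match(void)
{
    int i = 23;
    float f = 13.5f;
    char c = 'w';
    int *w = &i;        /* valid: int pointer, int variable     */
    float *y = &f;      /* valid: float pointer, float variable */
    char *z = &c;       /* valid: char pointer, char variable   */

    /* An assignment such as w = &f; mixes types and is
       diagnosed by the compiler. */
    return *w == 23 && *y == 13.5f && *z == 'w';
}
```

The function returns 1, showing that each value can be read back through the pointer of the matching type.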

Check your Progress 2

Fill in the Blanks.


1. __________ are special computer program variables, which hold the address of the variable
of their data type.
2. _________ are used to write operating systems, device drivers and other system related
software/programs.
3. The ‘&’ is known as ___________ operator.
4. The ‘*’ is known as ___________ operator.

State True or False.


1. Pointers are declared by using * in front of the variable name.
Activity 2
1. Write a program to show the usage of pointers.
2. Consider the following statements:
int *a;
int b;
b=50;
a=&b;

a) What would &a yield?


b) What would *a give?

Summary
 An array is a group of similar data elements that has a common name, and array elements are
differentiated from one another by their position within the array.
 Arrays play an important role in problem solving when a large number of variables are to be
used in programming. One should remember that all data elements of any given array must
be of the same data type.
 For and while loops can be used to enter data into array elements and read data from
array elements. Using a loop is more compact and easier than handling each element separately.
 Character arrays are a group of character data types. A group of characters is also called a
‘string’ in programming.
 The end of the string is marked with the special character ‘\0’. This is known as a string
terminator.
 Pointers are special computer program variables which hold the address of a variable of
their data type. As they point to a variable of their data type, they are known as pointers.
 The ‘&’ is known as the “address of” operator. This operator, when used with a variable,
gives us the memory location address where the value is stored during program execution.
 The ‘*’ is known as the “value at address” operator. This is also known as the indirection operator.
This operator, when used with a pointer variable, gives us the value stored at the memory
location address held in the pointer during program execution.

Keywords
 Variables: Named memory locations used to hold data during program execution.
 Strings: A group of characters in programming.
 Pointers: Special computer program variables which hold the address of a variable of their
data type.
 “address of” Operator: The operator ‘&’ which, when used with a variable, gives us the memory
location address where the value is stored during program execution.
 Indirection Operator: The operator ‘*’ which, when used with a pointer variable, gives us the
value stored at the memory location address during program execution.

Self-Assessment Questions
1. Define the term ‘arrays’ used in programming.
2. Explain the general declaration of an array in programming.
3. Explain with the help of examples, how ‘for loop’ can be used for writing and reading
operation in context with arrays.
4. Explain with the help of examples, how ‘while loop’ can be used for writing and reading
operation in context with arrays.
5. What do you understand by the term ‘String’? Explain with the help of examples.
6. Discuss the different ways of array initialisation.
7. Explain the 2-D arrays with the help of examples. Where are they used generally?
8. Define the term “pointer” used in programming.
9. Which are the different operators used while handling pointers in programming?
10. Discuss the advantages of using pointers in computer programming.

Answers to Check your Progress


Check your Progress 1

Fill in the Blanks.

1. An array is a collection of similar data elements.


2. Subscripts are used to manipulate the individual array elements.
3. While entering the data into the array variable, loops are used for the same.

State True or False.

1. False
2. True
3. False
4. True
5. True
6. False
7. True

Check your Progress 2

Fill in the Blanks.

1. Pointers are special computer program variables, which hold the address of the variable of
their data type.
2. Pointers are used to write operating systems, device drivers and other system related
software/programs.
3. The ‘&’ is known as “Address of” operator.
4. The ‘*’ is known as “Value at address” operator.

State True or False.

1. True

Suggested Reading
1. Balagurusamy, E. Object Oriented Programming with C++. New Delhi: Tata McGraw-Hill.
2. Rajaraman, V. Fundamentals of Computers. New Delhi: PHI Learning.
3. Data Structures Using C by Reema Thareja, Oxford University Press.
4. C Programming Language, by Brian W. Kernighan, Dennis Ritchie, Prentice Hall.
Unit 4
Functions
Structure:

4.1 Introduction

4.2 Basic Concept of Functions

4.3 General Form of the Functions

4.4 Functions Terminology

4.5 Advantages of Writing Functions

4.6 Use of Functions

Summary

Key Words

Self-Assessment Questions

Answers to Check your Progress

Suggested Reading

The text is adapted by Symbiosis Centre for Distance Learning under a Creative Commons Attribution-
ShareAlike 4.0 International (CC BY-SA 4.0) as requested by the work’s creator or licensees. This license
is available at https://creativecommons.org/licenses/by-sa/4.0/.
Objectives

After going through this unit, you will be able to:

• Describe the use of the functions
• Differentiate between declaring a function and defining a function
• Describe the advantages of the functions in programming

4.1 INTRODUCTION
We have studied the different loop constructs and decision constructs used in computer
programming. We have also studied the use of arrays and importance of arrays in computer
programming.

Structured coding allows us to work from the general aspects of the code to the more specific.
Program coding does not end with getting the required results; the code also has to be
maintained and improved in future based on customer requirements. Programs written by
you will also be used by other programmers, so a program needs a proper
structure that a fellow software engineer can understand.

Programs which are structured correctly offer many advantages over unstructured programs.
The principles of structured programming, when applied, help programmers to easily modify and
adapt the program for a particular situation. Structured coding or programming increases readability
and clarity within individual modules and thus increases the maintainability of each module. Structured
coding allows the coding of the outer block first and then the inner blocks. Each successive block allows
us to narrow our focus and concentrate on finer and finer details of the program.

Structured programming can be achieved by dividing the main problem statement into a number
of subtasks which are independent of each other and which, when solved individually and then
combined, solve the given problem. These subtasks can be implemented in programming languages with
the help of functions. In this unit, we will focus on one very essential part of a program, namely
functions, which help in writing structured programs and increase the modularity of the program.

4.2 BASIC CONCEPT OF FUNCTIONS


A function can be defined as a group or block of statements which performs a given task
as part of solving a problem. Every program can be thought of as a collection of functions. Each
function performs one coherent task. A function is a self-contained block of
statements written to solve the given problem.

Functions are written to increase the modularity of the program: the program is divided into ‘n’
sub-modules and each of those sub-modules is implemented with the help of
functions. Writing functions for the small tasks or subtasks of the program increases the readability of
the program.

Functions help in improving the maintenance of the program. As we saw while studying
structured programming, a program can be divided into a number of sub-programs or sub-modules
to solve the given problem statement. In a programming language, each of these sub-programs,
subtasks or sub-modules is implemented as a separate function written to accomplish its part of
the problem. The same can be represented in the form of a diagram, as shown in the figure below.
Fig. 4.1 Modular approach of solving the program

As shown in Fig. 4.1, the main program is divided into a number of sub-programs, sub-modules or sub-tasks. This division of the problem into subtasks helps in writing structured programs. Modular or structured programming thus provides a way to break up a long, continuous program into a series of individual modules that are related to each other in a specified fashion. It is usually easier to break down a difficult task into a series of smaller tasks and then to solve each of these sub-tasks separately. A smaller program can handle each such subtask. These smaller programs are called subprograms or functions in programming languages.

Check your Progress 1


Fill in the Blanks.
1. _______________ is a self-contained block of statements written to solve the given
problem.
2. Functions help in improving the ___________ of the program.

4.3 GENERAL FORM OF THE FUNCTION


The general form of a function is written as follows:

return_type function_name(data_type argument1, data_type argument2, ..., data_type argumentN)
{
    program statements
    program statements
    :
    program statements
}

Consider the following C program which illustrates the use of functions in programming:

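#include <stdio.h>

void Sample_Function(void);   /* declaration of the user-defined function */

int main(void)
{
    printf("hello....we are in main functions\n");
    Sample_Function();
    printf("Oops...we are again back to main function\n");
    return 0;
}

void Sample_Function(void)
{
    printf("hurray...we are inside the sample functions....\n");
    return;
}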

The output of the program would be:

hello....we are in main functions

hurray...we are inside the sample functions....

Oops...we are again back to main function

Observations from the given program:

main( ) is a special function: its name is fixed by the language and the system calls it when the program starts, although its body is written by the programmer. This text groups it with the system-defined functions. System-defined functions are the functions which are pre-defined by the system or by the programming language. These are readymade functions, such as printf( ), already written in the language's libraries and used for solving common problems. Sample_Function( ) is a user-defined function.

User-defined functions are the functions which are declared and defined by the user to solve their own problems. In older C (C89), if the return data type of a function is not specified, it is taken to be an integer by default; C99 and later, and C++, require the return type to be written explicitly.
The common return types of functions are:

• void - indicates that the function returns nothing
• int - indicates that the function returns an integer value
• float - indicates that the function returns a floating point value
• char - indicates that the function returns a character value

A function returns at most one value. The return keyword passes this value, together with program control, back to the calling function (the main function in the example above).

During program execution, the first function to be executed is the main() function (in the C and C++ programming languages). There can be more than one user-defined function in a given program. A user-defined function can use other functions inside its body; that is, we can call system-defined functions or user-defined functions from the body of any user-defined function.

Check your Progress 2

Fill in the Blanks.


1. main() is also the function of program known as _________ functions.
2. Sample_Function() is the function of program known as _______________ functions.

Match the Following.


i.   void     a. Indicates that the function returns a character value
ii.  int      b. Indicates that the function returns nothing
iii. float    c. Indicates that the function returns an integer value
iv.  char     d. Indicates that the function returns a floating point value

Activity 1
1. Find out the differences between system-defined and user-defined functions.

4.4 FUNCTIONS TERMINOLOGY


Declaring a function:

Declaring a function means announcing the function to the compiler before it is used. The declaration gives the language compiler advance information about the return type, the name and the arguments of the function. It does not contain the body of the function.

For example:

int Sample_Function( );

The above declaration indicates that the program contains a function Sample_Function() whose return type is an integer value, and that the function is defined elsewhere in the program.

Defining a function:

Defining a function means writing the body of the function. Here we actually write the program statements that accomplish the given task.

For example:

int Sample_Function()
{
    printf("hurray...we are inside the sample functions....");
    return 0;
}

The above shows the definition of Sample_Function(), where the actual code is written in the programming language. This is the process of writing the body of the function.
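To make the difference between declaring and defining concrete, here is a minimal sketch in C. The function names square and sum_of_squares are hypothetical and chosen only for illustration. The declaration lets sum_of_squares call square even though the body of square appears later in the file.

```c
/* Declaration (prototype): gives the compiler the name, the return
   type and the argument types. No body is written here. */
int square(int n);

/* This function calls square() before the compiler has seen its body.
   The call compiles because of the declaration above. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* Definition: the body of the function that was declared earlier. */
int square(int n)
{
    return n * n;
}
```

Calling sum_of_squares(3, 4) returns 25. If the declaration were removed, a modern C compiler would reject the call inside sum_of_squares, because square would be used before it is known.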

Check your Progress 3


State True or False.
1. Declaring a function means just announcing the function to the compiler, without writing its body.
2. Defining a function means writing the body of the functions.

Activity 2
1. Find the purpose of using function declaration statement in programs.

4.5 ADVANTAGES OF WRITING FUNCTIONS


There are several advantages of writing functions. A function increases the re-usability of the code or program. By writing functions, we can avoid rewriting the same code over and over. We can simply call the function to accomplish the specific task whenever it is required in the program.

For example:

If we need to calculate the area of a square and of a circle 10 times in a program, then without functions we must write the same program statements 10 times. Instead, we write one function for the area of a square and one for the area of a circle, and call them whenever they are required. Thus redundancy is avoided by using functions.
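The area example can be sketched in C as follows. The names area_of_square, area_of_circle and total_square_area, and the approximate value of PI, are assumptions made for this illustration. Each formula is written once and then reused by calling the function.

```c
#define PI 3.14159   /* approximate value, assumed for this sketch */

/* Area of a square with the given side length. */
double area_of_square(double side)
{
    return side * side;
}

/* Area of a circle with the given radius. */
double area_of_circle(double radius)
{
    return PI * radius * radius;
}

/* Reuse: adds up the areas of 'count' squares by calling
   area_of_square() once per square instead of repeating the formula. */
double total_square_area(const double sides[], int count)
{
    double total = 0.0;
    int i;

    for (i = 0; i < count; i++)
        total += area_of_square(sides[i]);
    return total;
}
```

If the formula for an area ever has to change, only the body of one function changes; every call picks up the correction automatically.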

A function increases the modularity of the program. The problem statement is divided into several modules and each module is implemented as a function in the program. This makes it easy to keep track of the program logic and to follow the flow of the program. Debugging and testing are easier because each function can be checked separately, and maintenance is simpler and faster because a change is usually confined to one function.

Check your Progress 4


State True or False.
1. A function increases the re-usability of the code or program.
2. A function decreases the modularity of the program.

Activity 3
1. Find the advantages of using functions in programs.
4.6 USE OF FUNCTIONS
The following program will explain the use of the functions in developing the program.

Problem statement:

Write down a C++ program which will display the menu for doing addition, subtraction, multiplication,
division, square of a number using the functions.

C++ Program:

#include <iostream>
#include <cstdlib>   // for exit()
using namespace std;

// declarations of the individual functions
void ADDFunction();
void SUBFunction();
void MULTIFunction();
void DIVIFunction();
void SQUAREFunction();

int main()
{
    int answer;
    answer = 0;
    while (answer != 6)
    { // start of while loop; it repeats until the user chooses 6
        cout << "1. Addition \n";
        cout << "2. Subtraction \n";
        cout << "3. Multiplication \n";
        cout << "4. Division \n";
        cout << "5. Square \n";
        cout << "6. Exiting the program \n";
        cout << "please enter your choice: \n";
        cin >> answer;
        switch (answer)
        {
        case 1:
            // execute logic for addition of two numbers
            ADDFunction();
            break;
        case 2:
            // execute logic for subtraction of two numbers
            SUBFunction();
            break;
        case 3:
            // execute logic for multiplication of two numbers
            MULTIFunction();
            break;
        case 4:
            // execute logic for division of two numbers
            DIVIFunction();
            break;
        case 5:
            // execute logic for square of a number
            SQUAREFunction();
            break;
        case 6:
            // exiting the program
            exit(0);
        default:
            cout << "please enter correct choice....\n";
        } // end of switch
    } // end of while loop....
    return 0;
}

// writing individual functions
void ADDFunction()
{
    int num1;
    int num2;
    int num3;
    cout << "please enter number 1";
    cin >> num1;
    cout << "please enter number 2";
    cin >> num2;
    num3 = num1 + num2;
    cout << "addition of two number is:";
    cout << num3 << "\n";
    return;
}

void SUBFunction()
{
    int num1, num2, num3;
    cout << "please enter number 1";
    cin >> num1;
    cout << "please enter number 2";
    cin >> num2;
    num3 = num1 - num2;
    cout << "subtraction of two number is:";
    cout << num3 << "\n";
    return;
}

void MULTIFunction()
{
    int num1, num2, num3;
    cout << "please enter number 1";
    cin >> num1;
    cout << "please enter number 2";
    cin >> num2;
    num3 = num1 * num2;
    cout << "Multiplication of two number is:";
    cout << num3 << "\n";
    return;
}

void DIVIFunction()
{
    int num1, num2, num3;
    cout << "please enter number 1";
    cin >> num1;
    cout << "please enter number 2";
    cin >> num2;
    if (num2 == 0)
    { // division by zero is not allowed
        cout << "cannot divide by zero\n";
        return;
    }
    num3 = num1 / num2; // integer division: the fractional part is dropped
    cout << "Division of two number is:";
    cout << num3 << "\n";
    return;
}

void SQUAREFunction()
{
    int num1, num2;
    cout << "please enter number 1";
    cin >> num1;
    num2 = num1 * num1;
    cout << "Square of the number is:";
    cout << num2 << "\n";
    return;
}

Thus the menu-driven C++ program is written using the functions. On execution of the above program,
it would display on the screen the menu as follows:

1. Addition
2. Subtraction
3. Multiplication
4. Division
5. Square
6. Exiting the program

Upon entering the correct choice, the respective function is called to perform the given operation
according to the entered data.

Check your Progress 5


State True or False.
1. Functions are widely used for menu-driven programs.
Activity 4
1. Write a program to show the usage of functions.

Summary
• A function can be defined as a group of statements or block of statements which performs a
given task of problem solving. Each and every program can be thought of as a collection of
functions.
• Functions are written to increase the modularity of the program as the program is divided into
‘n’ number of sub-modules and each of those submodules will be executed with the help of
functions.
• Writing functions for the small task or subtasks of the program increases the readability of the
program.
• System defined functions are the functions which are pre-defined by the systems or in the
programming languages. These are readymade functions, already defined and written inside
the programming languages and used for solving common problems.
• The flow of a program can be followed easily when functions are used in the program.
• Debugging and testing are easier when functions are used in the program.
• Maintenance of a program is simpler and faster when functions are used in the program.

Keywords
• Function: A group of statements or block of statements which performs a given task of
problem solving.
• System Defined Functions: The functions which are pre-defined by the systems or already
defined by the programming languages.
• User Defined Functions: The functions which are declared and defined by the user to solve
their own problems.

Self-Assessment Questions
1. Define the term “functions” in the programming language.
2. Discuss the main advantages of using functions in computer programming.
3. Discuss how functions can help in increasing the modularity of the program. Explain with the
help of a neat diagram.
4. Explain what you understand by declaring a function.
5. Discuss the general form of the functions in the computer programming.

Answers to Check your Progress


Check your Progress 1

Fill in the Blanks.

1. Function is a self-contained block of statements written to solve the given problem.


2. Functions help in improving the maintenance of the program.

Check your Progress 2

Fill in the Blanks.

1. main() is also the function of program known as system-defined functions.


2. Sample_Function() is the function of program known as user-defined functions.

Match the Following.

i. – b.
ii. – c.
iii. – d.
iv. – a.

Check your Progress 3

State True or False.

1. True
2. True

Check your Progress 4

State True or False.

1. True
2. False

Check your Progress 5

State True or False.

1. True

Suggested Reading
1. Balagurusamy, E. Object Oriented Programming with C++. New Delhi: Tata McGraw-Hill.
2. Rajaraman, V. Fundamentals of Computers. New Delhi: PHI Learning.
3. Thareja, Reema. Data Structures Using C. Oxford University Press.
4. Kernighan, Brian W. and Dennis Ritchie. The C Programming Language. Prentice Hall.
Unit 5
Stacks and Queues
Structure:

5.1 Introduction

5.2 Stacks

5.3 Operations on the Stack

5.4 Applications of Stack

5.5 Queue

5.6 Types of Queue

5.7 Applications of Queue

Summary

Keywords

Self-Assessment Questions

Answer to Check Your Progress

Suggested Reading

Objectives

After going through this unit, you will be able to:

• Understand stacks and queues
• Explain the applications of stack and queue
• Implement operations of stack and queue using array and linked list

5.1 INTRODUCTION
Abstract data types (ADTs) are mathematical specifications of a set of data and the set of operations that can be performed on that data. They are abstract in the sense that the focus is on the definitions of the constructor, which returns an abstract handle that represents the data, and of the various operations with their arguments. The actual implementation is not defined, and does not affect the use of the ADT. For example, rational numbers (numbers that can be written in the form a/b where a and b are integers) cannot be represented natively in a computer; a rational-number ADT would specify how such a number is constructed from a and b and which operations (such as addition) it supports, without fixing how it is stored.

In this unit, we will discuss the stack and queue ADTs, which are important data structures used extensively in computer applications. We will study the operations that can be performed on a stack and a queue, and discuss the implementation of a stack and a queue using both arrays and linked lists. We will also look at different types of queues such as the deque, the circular queue and the priority queue.

5.2 STACKS
Stack is an important data structure where the elements are stored in an ordered manner. The
elements of the stack are added and removed at/from only one end i.e. top.

E Top
D
C
B
A

The stack is a linear data structure which uses the same principle as a pile of plates, where the plates are arranged in bottom-to-top order. A new plate is added on top of the pile and a plate is removed from the top-most position. A stack is also called a LIFO (Last-In-First-Out) structure.
Representation of Stacks using array:

A stack can be represented as a linear array. Every stack has a variable called top, which represents the position where an element will be added or deleted. There is another variable called MAX, which is used to store the maximum number of elements of the stack. Consider the representation of a stack below, where MAX = 10 and top = 4.

Element: A  B  C  D  E
Index:   0  1  2  3  4  5  6  7  8  9

5.3 OPERATIONS ON THE STACK:


1. InitStack(Stack): Creates an empty stack. In the array implementation given in this section, top is initialised to -1 (in a linked-list implementation, top would be NULL).

2. IsEmpty(Stack): Checks whether the stack is empty; it returns true when top = -1 (top = NULL in a linked-list implementation).

3. IsFull(Stack): Checks whether the stack is full; it returns true when top = MAX-1, in which case no new element can be added.

4. Push(element): Inserts the new element on the top of the stack.

5. Pop(Stack): Removes the element from the top of the stack and returns it. In the example below, X = 15.

6. Top(Stack): Returns the element at the top of the stack without removing it. In the example below, it returns X = 15.

15  <- Top
8
10
2
3

Implementation of stack in C:

#include <stdio.h>

#define MAX 5 // size of stack

int stack[MAX], top = -1;

// function declarations
void push(int stack[], int val);
int pop(int stack[]);
int top_element(int stack[]);
void display(int stack[]);

int main()
{
    int val, choice;
    do
    {
        printf("\n *****Select Operations on the Stack*****");
        printf("\n 1. Push");
        printf("\n 2. Pop");
        printf("\n 3. Top Element");
        printf("\n 4. Display");
        printf("\n 5. Exit");
        printf("\n Enter your choice: ");
        scanf("%d", &choice);
        switch (choice)
        {
        case 1:
            printf("\n Enter the number to be pushed on stack: ");
            scanf("%d", &val);
            push(stack, val);
            break;
        case 2:
            val = pop(stack);
            if (val != -1) // -1 is used here as the "stack is empty" signal
                printf("\n The value deleted from stack is: %d", val);
            break;
        case 3:
            val = top_element(stack);
            if (val != -1)
                printf("\n The value stored at top of stack is: %d", val);
            break;
        case 4:
            display(stack);
            break;
        case 5: // exit: nothing to do, the loop condition ends the program
            break;
        default:
            printf("\nPlease select the correct choice");
            break;
        } // End of switch
    } while (choice != 5); // End of do_while
    return 0;
} // End of Main

void push(int stack[], int val)
{
    if (top == MAX - 1)
        printf("\n Stack Overflow");
    else
    {
        top++;
        stack[top] = val;
    }
}

int pop(int stack[])
{
    int val;
    if (top == -1)
    {
        printf("\n Stack Underflow");
        return -1;
    }
    else
    {
        val = stack[top];
        top--;
        return val;
    }
}

void display(int stack[])
{
    int i;
    if (top == -1)
        printf("\n Stack Underflow");
    else
    {
        for (i = top; i >= 0; i--) // print from the top down to the bottom
            printf("\n %d", stack[i]);
    }
}

int top_element(int stack[])
{
    if (top == -1)
    {
        printf("\n Stack Underflow");
        return -1;
    }
    else
        return (stack[top]);
}

5.4 APPLICATIONS OF STACK


Applications of the stack are:

1. Reversing a list
2. Parentheses checker
3. Conversion of an infix expression into a postfix expression
4. Evaluation of a postfix expression
5. Conversion of an infix expression into a prefix expression
6. Recursion
7. Tower of Hanoi
1. Reversing a list

A list of numbers can be reversed by reading each number from an array starting from the first
index and pushing it on a stack. Once all the numbers have been read, the numbers can be
popped one at a time and then stored in the array starting from the first index.
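The reversal described above can be sketched in C as follows. The function name reverse_with_stack and the capacity MAX are assumptions made for this illustration, and the stack is a local array rather than the global one used in the earlier listing.

```c
#define MAX 100   /* capacity of the stack; an assumed limit */

/* Reverses arr[0..n-1] in place with the help of a stack.
   Returns 0 on success, or -1 if n does not fit in the stack. */
int reverse_with_stack(int arr[], int n)
{
    int stack[MAX];
    int top = -1;
    int i;

    if (n < 0 || n > MAX)
        return -1;

    /* Read the numbers from the first index and push each one. */
    for (i = 0; i < n; i++)
        stack[++top] = arr[i];

    /* Pop one at a time; the last number pushed comes out first,
       so storing from the first index reverses the list. */
    for (i = 0; i < n; i++)
        arr[i] = stack[top--];

    return 0;
}
```

For the array {1, 2, 3, 4, 5}, the call reverse_with_stack(arr, 5) leaves the array holding {5, 4, 3, 2, 1}.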

2. Parentheses checker

Stacks can be used to check whether the parentheses used in an algebraic expression are valid or not. An algebraic expression is valid if every opening bracket has a corresponding closing bracket of the same type and the brackets are properly nested. For example, the expression (A+B)*({C-D}/E) is valid, but the expression {A + (B - C} is invalid because the bracket opened with ( is closed with }.
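A possible C sketch of such a checker is given below. The names is_pair and is_balanced and the limit MAX are hypothetical. Every opening bracket is pushed; every closing bracket must match the bracket on the top of the stack, and the stack must be empty at the end.

```c
#define MAX 100   /* maximum nesting depth handled by this sketch */

/* Returns 1 if 'open' is the opening bracket that matches 'close'. */
static int is_pair(char open, char close)
{
    return (open == '(' && close == ')') ||
           (open == '{' && close == '}') ||
           (open == '[' && close == ']');
}

/* Returns 1 if the brackets in expr are balanced, 0 otherwise. */
int is_balanced(const char *expr)
{
    char stack[MAX];
    int top = -1;
    int i;

    for (i = 0; expr[i] != '\0'; i++) {
        char c = expr[i];

        if (c == '(' || c == '{' || c == '[') {
            if (top == MAX - 1)
                return 0;               /* nested too deeply for this sketch */
            stack[++top] = c;           /* remember the opening bracket */
        } else if (c == ')' || c == '}' || c == ']') {
            if (top == -1 || !is_pair(stack[top], c))
                return 0;               /* nothing to close, or wrong type */
            top--;                      /* matched: discard the opener */
        }
        /* operands and operators are ignored */
    }
    return top == -1;                   /* leftover openers mean invalid */
}
```

With this sketch, is_balanced("(A+B)*({C-D}/E)") returns 1 and is_balanced("{A+(B-C}") returns 0, in agreement with the two expressions discussed above.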

3. Conversion of an infix expression into a postfix expression

Algebraic expressions can be written using three different but equivalent notations: infix, postfix and prefix. In infix notation, the operator is placed between the operands, and it is easy to write an expression using this notation. In postfix notation, the operator is placed after the operands, and in prefix notation it is placed before the operands.

For Example:

Infix expression: (A+B)*(C-D)

Equivalent Postfix expression: AB+CD-*

Equivalent Prefix expression: *+AB-CD

Here the operator precedence and associativity rules are followed.

The algorithm to convert an infix expression into a postfix expression is given below. It accepts an infix expression containing operators, operands and parentheses, and uses a stack to hold the operators temporarily.

1. Push "(" onto Stack, and add ")" to the end of the infix expression.
2. Scan the infix expression from left to right and repeat Steps 3 to 6 for each element of the infix expression until the Stack is empty.
3. If an operand is encountered, add it to the postfix expression.
4. If a left parenthesis is encountered, push it onto Stack.
5. If an operator is encountered, then:
   a. Repeatedly pop from Stack and add to the postfix expression each operator (on the top of Stack) which has the same precedence as or higher precedence than the operator. (For a right-associative operator such as ^, pop only operators of higher precedence.)
   b. Add the operator to Stack.
6. If a right parenthesis is encountered, then:
   a. Repeatedly pop from Stack and add to the postfix expression each operator (on the top of Stack) until a left parenthesis is encountered.
   b. Remove the left parenthesis.
7. END.

Example:

Infix expression: A+ (B*C-(D/E^F)*G)*H

Resultant Postfix expression: ABC*DEF^/G*-H*+
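The conversion can be sketched in C as shown below. This is an illustrative variant rather than a literal transcription of the steps: instead of pushing "(" at the start and appending ")", it empties the stack when the input ends, which has the same effect. The names precedence and infix_to_postfix are hypothetical, operands are assumed to be single letters or digits, and ^ is treated as right-associative.

```c
#include <ctype.h>

#define MAX 100   /* capacity of the operator stack (assumed) */

/* Precedence of an operator; 0 for '(' and anything else. */
static int precedence(char op)
{
    switch (op) {
    case '^':           return 3;
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;
    }
}

/* Writes the postfix form of infix into postfix, which must have room
   for strlen(infix) + 1 characters. Returns 0 on success, -1 on error. */
int infix_to_postfix(const char *infix, char *postfix)
{
    char stack[MAX];
    int top = -1, i, k = 0;

    for (i = 0; infix[i] != '\0'; i++) {
        char c = infix[i];

        if (isspace((unsigned char)c))
            continue;
        if (isalnum((unsigned char)c)) {
            postfix[k++] = c;                     /* operand: output it */
        } else if (c == '(') {
            if (top == MAX - 1) return -1;
            stack[++top] = c;
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(')
                postfix[k++] = stack[top--];
            if (top < 0) return -1;               /* no matching '(' */
            top--;                                /* remove the '(' */
        } else if (precedence(c) > 0) {
            /* Pop higher precedence, and equal precedence unless the
               incoming operator is the right-associative '^'. */
            while (top >= 0 &&
                   (precedence(stack[top]) > precedence(c) ||
                    (precedence(stack[top]) == precedence(c) && c != '^')))
                postfix[k++] = stack[top--];
            if (top == MAX - 1) return -1;
            stack[++top] = c;
        } else {
            return -1;                            /* unknown character */
        }
    }
    while (top >= 0) {                            /* flush remaining operators */
        if (stack[top] == '(') return -1;         /* unmatched '(' */
        postfix[k++] = stack[top--];
    }
    postfix[k] = '\0';
    return 0;
}
```

Given the infix string "A+(B*C-(D/E^F)*G)*H", this sketch produces "ABC*DEF^/G*-H*+", the same result as the example above.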

4. Evaluation of a postfix expression

An algebraic expression written in infix notation can be evaluated by first converting it into the equivalent postfix notation and then evaluating the postfix expression. Both these tasks make extensive use of stacks as the primary tool. Any postfix expression can be evaluated very easily using a stack. The postfix expression is read from left to right, character by character. If the element is an operand, it is pushed onto the stack. If the element is an operator, two operands are popped from the stack, the operator is applied to them, and the result is pushed back. This is repeated till the end of the expression.

Algorithm to evaluate postfix expression:

1. Add ')' to the end of the postfix expression.
2. Read the postfix expression from left to right until ')' is encountered.
3. If an operand is encountered, push it onto Stack.
4. If an operator is encountered, pop two elements:
   a. A -> top element
   b. B -> next-to-top element
   c. Evaluate B operator A and push the result onto Stack.
5. Set result = pop.
6. END

Example: Postfix expression is 456*+

Push 4, 5 and 6. On reading *, pop A = 6 and B = 5 and push 5 * 6 = 30. On reading +, pop A = 30 and B = 4 and push 4 + 30 = 34. The result is 34.
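A small C sketch of the evaluation is given below. It assumes that every operand is a single digit and that only the operators +, -, * and / occur. The function name evaluate_postfix and the way errors are reported are choices made for this illustration. The end of the string plays the role of the ')' marker.

```c
#include <ctype.h>

#define MAX 100   /* capacity of the operand stack (assumed) */

/* Evaluates a postfix expression with single-digit operands.
   Stores the answer in *result and returns 0, or returns -1 if the
   expression is malformed or divides by zero. */
int evaluate_postfix(const char *postfix, int *result)
{
    int stack[MAX];
    int top = -1, i;

    for (i = 0; postfix[i] != '\0'; i++) {
        char c = postfix[i];

        if (isdigit((unsigned char)c)) {
            if (top == MAX - 1) return -1;
            stack[++top] = c - '0';           /* operand: push its value */
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            int a, b;

            if (top < 1) return -1;           /* an operator needs two operands */
            a = stack[top--];                 /* A: top element */
            b = stack[top--];                 /* B: next-to-top element */
            switch (c) {
            case '+': stack[++top] = b + a; break;
            case '-': stack[++top] = b - a; break;
            case '*': stack[++top] = b * a; break;
            case '/':
                if (a == 0) return -1;
                stack[++top] = b / a;         /* integer division */
                break;
            }
        } else if (!isspace((unsigned char)c)) {
            return -1;                        /* unknown character */
        }
    }
    if (top != 0) return -1;                  /* exactly one value must remain */
    *result = stack[top];
    return 0;
}
```

For the expression "456*+" the sketch stores 34 in *result: 5 * 6 is computed first, and 4 is then added to the product.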

5. Conversion of an infix expression into a prefix expression

Conversion of infix into a prefix expression is very similar to the method of converting Infix to
Postfix but the only difference is that here we need to reverse the input string before conversion
and then reverse the final output string before displaying it.

Algorithm to convert Infix to Prefix (Method 1):

1. Push ")" onto STACK, and add "(" to the beginning of the infix expression.
2. Scan the infix expression from right to left and repeat steps 3 to 6 for each element until the STACK is empty.
3. If an operand is encountered, add it to the prefix expression.
4. If a right parenthesis is encountered, push it onto STACK.
5. If an operator is encountered, then:
   a. Repeatedly pop from STACK and add to the prefix expression each operator (on the top of STACK) which has higher precedence than the operator. Operators of the same precedence stay on STACK; popping them would change the grouping of left-associative operators such as - and /.
   b. Add the operator to STACK.
6. If a left parenthesis is encountered, then:
   a. Repeatedly pop from STACK and add to the prefix expression each operator (on the top of STACK) until a right parenthesis is encountered.
   b. Remove the right parenthesis.
7. Reverse the string obtained to get the prefix expression.
8. END

Method 2:

1. Reverse the infix string and interchange left and right parentheses.

2. Obtain the postfix expression of the infix expression obtained as a result of step 1. In this step, when an operator is encountered, pop only operators of higher precedence, so that operators of equal precedence keep their left-to-right grouping in the final prefix expression.

3. Reverse the postfix expression to get the prefix expression.

Example: (A - B * C) * (D / E + F)

After the execution of step 1: (F + E / D) * (C * B - A)

After the execution of step 2: FED/+CB*A-*

After the execution of step 3: *-A*BC+/DEF
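Method 2 can be sketched in C as follows. The names prec, reverse_in_place and infix_to_prefix are hypothetical, operands are assumed to be single letters or digits, and only the left-associative operators +, -, * and / are handled. During the postfix step the sketch pops only operators of strictly higher precedence, so that an expression such as A-B-C keeps its left-to-right grouping.

```c
#include <ctype.h>
#include <string.h>

#define MAX 100   /* longest expression handled by this sketch */

static int prec(char op)
{
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;                               /* '(' and anything else */
}

static void reverse_in_place(char *s)
{
    size_t i, n = strlen(s);

    for (i = 0; i < n / 2; i++) {
        char t = s[i];
        s[i] = s[n - 1 - i];
        s[n - 1 - i] = t;
    }
}

/* Writes the prefix form of infix into prefix (room for
   strlen(infix) + 1 characters). Returns 0 on success, -1 on error. */
int infix_to_prefix(const char *infix, char *prefix)
{
    char rev[MAX], stack[MAX];
    int top = -1, k = 0;
    size_t i, n = strlen(infix);

    if (n >= MAX) return -1;

    /* Step 1: reverse the string and interchange the parentheses. */
    for (i = 0; i < n; i++) {
        char c = infix[n - 1 - i];
        rev[i] = (c == '(') ? ')' : (c == ')') ? '(' : c;
    }
    rev[n] = '\0';

    /* Step 2: postfix conversion of the reversed string. */
    for (i = 0; i < n; i++) {
        char c = rev[i];

        if (isspace((unsigned char)c)) continue;
        if (isalnum((unsigned char)c)) {
            prefix[k++] = c;
        } else if (c == '(') {
            stack[++top] = c;
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(')
                prefix[k++] = stack[top--];
            if (top < 0) return -1;          /* unbalanced parentheses */
            top--;
        } else if (prec(c) > 0) {
            while (top >= 0 && prec(stack[top]) > prec(c))
                prefix[k++] = stack[top--];  /* strictly higher only */
            stack[++top] = c;
        } else {
            return -1;                       /* unknown character */
        }
    }
    while (top >= 0) {
        if (stack[top] == '(') return -1;
        prefix[k++] = stack[top--];
    }
    prefix[k] = '\0';

    /* Step 3: reverse the postfix string to obtain the prefix form. */
    reverse_in_place(prefix);
    return 0;
}
```

For "(A-B*C)*(D/E+F)" the sketch yields "*-A*BC+/DEF", matching the worked example, and for "A-B-C" it yields "--ABC", which corresponds to (A-B)-C.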

6. Recursion

A recursive function is a function which calls itself. Such a function must always have an exit condition; otherwise it keeps calling itself without end until the system stack is exhausted. Each call to a subroutine requires that the subprogram have a storage area where it can keep its local variables, its calling parameters and its return address. For a recursive function, the storage areas for the subprogram calls are kept in a stack. Therefore any recursive function may be rewritten in a non-recursive form using a stack.

For example, finding the factorial of a given number:

int Fact(int n)
{
    if (n <= 1)   // exit condition; also covers n = 0
        return 1;
    else
        return (n * Fact(n - 1));
}
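As an illustration of rewriting a recursive function with an explicit stack, here is a hedged C sketch of the factorial. The name fact_with_stack and the capacity MAX are assumptions. The array plays the part of the storage that the recursive version keeps on the system stack.

```c
#define MAX 32   /* capacity of the explicit stack (assumed) */

/* Computes n! without recursion. Results for n > 12 may overflow on
   systems where unsigned long is 32 bits wide. */
unsigned long fact_with_stack(unsigned int n)
{
    unsigned int stack[MAX];
    int top = -1;
    unsigned long result = 1;

    /* "Going down": push each pending value of n, as the chain of
       recursive calls Fact(n), Fact(n-1), ... would remember it. */
    while (n > 1 && top < MAX - 1)
        stack[++top] = n--;

    /* "Coming back": pop the values and multiply, as the recursive
       calls would do while returning. */
    while (top >= 0)
        result *= stack[top--];

    return result;
}
```

For example, fact_with_stack(5) pushes 5, 4, 3 and 2, then multiplies them while popping and returns 120.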

7. Tower of Hanoi

Tower of Hanoi is a historical problem which can be easily expressed using recursion. There are N disks of decreasing size stacked on one needle, with the largest at the bottom, and two other empty needles. It is required to stack all the disks onto a second needle in the same decreasing order of size. The third needle can be used as temporary storage.

The movement of the disks must conform to the following rules:

1. Only one disk may be moved at a time.

2. A disk can be moved from any needle to any other.

3. A larger disk must not rest upon a smaller one.
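A recursive C sketch of the solution is shown below. The function name hanoi and the use of characters to label the needles are choices made for this illustration. To move n disks, the top n-1 disks are first moved to the spare needle, the largest disk is moved to the target, and the n-1 disks are then moved on top of it.

```c
#include <stdio.h>

/* Moves n disks from needle 'from' to needle 'to', using 'via' as
   temporary storage. Prints each move and returns the number of moves. */
long hanoi(int n, char from, char to, char via)
{
    long moves;

    if (n <= 0)
        return 0;                              /* exit condition */

    moves = hanoi(n - 1, from, via, to);       /* clear the way */
    printf("Move disk %d from %c to %c\n", n, from, to);
    moves += 1;
    moves += hanoi(n - 1, via, to, from);      /* rebuild on the target */
    return moves;
}
```

Calling hanoi(3, 'A', 'B', 'C') prints seven moves; in general the recursion makes 2^N - 1 moves for N disks.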


Check your Progress 1
Fill in the Blanks.

1. The order of evaluation of a postfix expression is from ______.


2. The stack is empty when ________.
3. ______ is used to convert an infix expression into a postfix expression.

5.5 QUEUE
A queue is a sequential list of elements, like a stack. The only difference between a stack and a queue is that in a stack, elements are added and removed at the same end (the top), whereas in a queue, elements are inserted at one end (the rear) and deleted from the other end (the front). A queue is also called a First-In-First-Out (FIFO) list. For example, consider people standing in line outside the ticket window of a cinema hall: the first person in the line gets the ticket first and is therefore the first one to move out of the line. Consider the following representation of a queue:

Element: A  B  C  D  E  F
Index:   0  1  2  3  4  5  6  7  8  9
Front = 0, Rear = 5

Operations on Queue

Variable MAX is used to store the maximum number of elements of the Queue.

1. InitQueue(Q): Creates an empty queue and sets front = rear = -1.
2. IsEmpty(Q): Checks whether the queue is empty; it returns true if front == -1 OR front > rear, else it returns false.
3. IsFull(Q): Checks whether the queue is full; it returns true if rear == MAX-1, else it returns false. For example, the queue below is full (MAX = 10):

Element: A  B  C  D  E  F  G  H  I  J
Index:   0  1  2  3  4  5  6  7  8  9
Front = 0, Rear = 9

4. Insert(element): Inserts a new element at the rear of the queue if the queue is not full.

Before insertion:

Element: A  B  C  D  E  F
Index:   0  1  2  3  4  5  6  7  8  9
Front = 0, Rear = 5

After insertion of element G:

Element: A  B  C  D  E  F  G
Index:   0  1  2  3  4  5  6  7  8  9
Front = 0, Rear = 6

5. Remove(Q): Deletes the element at the front of the queue if the queue is not empty. In the example below, X = A.

Before deletion of an element:

Element: A  B  C  D  E  F  G
Index:   0  1  2  3  4  5  6  7  8  9
Front = 0, Rear = 6

After deletion of an element:

Element:    B  C  D  E  F  G
Index:   0  1  2  3  4  5  6  7  8  9
Front = 1, Rear = 6

Implementation of queue in C:

#include <stdio.h>

#define MAX 10 // size of queue

int queue[MAX];
int front = -1, rear = -1;

// function declarations
void insert(void);
int delete_element(void);
void display(void);

int main()
{
    int choice, val;
    do
    {
        printf("\n ***** Select Queue Operation *****");
        printf("\n 1. Insert an element");
        printf("\n 2. Delete an element");
        printf("\n 3. Display the queue");
        printf("\n 4. Exit");
        printf("\n Enter your choice : ");
        scanf("%d", &choice);
        switch (choice)
        {
        case 1:
            insert();
            break;
        case 2:
            val = delete_element();
            if (val != -1) // -1 is used here as the "queue is empty" signal
                printf("\n The number deleted is : %d", val);
            break;
        case 3:
            display();
            break;
        case 4: // exit: nothing to do, the loop condition ends the program
            break;
        default:
            printf("\nPlease select the correct choice");
            break;
        } // End of switch
    } while (choice != 4); // End of do_while
    return 0;
} // End of Main

void insert(void)
{
    int num;
    printf("\n Enter the number to be inserted in the queue : ");
    scanf("%d", &num);
    if (rear == MAX - 1)
        printf("\n Queue is full");
    else
    {
        if (front == -1) // first insertion: front now marks the first element
            front = 0;
        rear++;
        queue[rear] = num;
    }
}

int delete_element(void)
{
    int val;
    if (front == -1 || front > rear)
    {
        printf("\n Queue is empty");
        return -1;
    }
    val = queue[front]; // take the element at the front
    front++;            // the next element becomes the new front
    return val;
}

void display(void)
{
    int i;
    printf("\n");
    if (front == -1 || front > rear)
        printf("\n Queue is empty");
    else
    {
        for (i = front; i <= rear; i++)
            printf("\t %d", queue[i]);
    }
}


5.6 TYPES OF QUEUE


Following are the types of the queue:

1. Circular Queue
2. Priority Queue
3. Deque (Double Ended Queue)

1. Circular Queue

A circular queue is the same as a queue, except that the last position is connected back to the first position to make a circle. Positions freed at the front by deletions can therefore be reused for new insertions.

Code for Circular queue in C:

#include <stdio.h>

#define MAX 10

int queue[MAX];
int front = -1, rear = -1;

// function declarations
void insert(void);
int delete_element(void);
void display(void);

int main()
{
    int choice, val;
    do
    {
        printf("\n ***** Select Queue Operation *****");
        printf("\n 1. Insert an element");
        printf("\n 2. Delete an element");
        printf("\n 3. Display the queue");
        printf("\n 4. Exit");
        printf("\n Enter your choice : ");
        scanf("%d", &choice);
        switch (choice)
        {
        case 1:
            insert();
            break;
        case 2:
            val = delete_element();
            if (val != -1)
                printf("\n The number deleted is : %d", val);
            break;
        case 3:
            display();
            break;
        case 4: // exit: nothing to do, the loop condition ends the program
            break;
        default:
            printf("\n Please select the correct choice");
            break;
        } // End of switch
    } while (choice != 4); // End of do_while
    return 0;
}

void insert(void)
{
    int num;
    printf("\n Enter the number to be inserted in the queue : ");
    scanf("%d", &num);
    // full: rear is at the last index with front at 0,
    // or rear has wrapped around to just behind front
    if ((front == 0 && rear == MAX - 1) || rear == front - 1)
        printf("\n Queue is full");
    else if (front == -1 && rear == -1)
    { // first element
        front = rear = 0;
        queue[rear] = num;
    }
    else if (rear == MAX - 1 && front != 0)
    { // wrap around to the first position
        rear = 0;
        queue[rear] = num;
    }
    else
    {
        rear++;
        queue[rear] = num;
    }
}

int delete_element(void)
{
    int val;
    if (front == -1 && rear == -1)
    {
        printf("\n Queue is empty");
        return -1;
    }
    val = queue[front];
    if (front == rear) // the last element was removed
        front = rear = -1;
    else
    {
        if (front == MAX - 1)
            front = 0; // wrap around
        else
            front++;
    }
    return val;
}

void display(void)
{
    int i;
    printf("\n");
    if (front == -1 && rear == -1)
        printf("\n Queue is empty");
    else
    {
        if (front <= rear)
        {
            for (i = front; i <= rear; i++)
                printf("\t %d", queue[i]);
        }
        else
        { // the queue has wrapped around
            for (i = front; i < MAX; i++)
                printf("\t %d", queue[i]);
            for (i = 0; i <= rear; i++)
                printf("\t %d", queue[i]);
        }
    }
}

2. Priority Queue

A priority queue is one in which each element has a priority associated with it. The element with the highest priority is the one that is processed/deleted first. If two or more elements have the same priority, they are processed in the same order in which they were entered into the queue.
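A simple array-based priority queue might look like the C sketch below. The names pq_item, pq_insert and pq_remove are hypothetical, and the sketch assumes that a larger number means a higher priority. The array is kept ordered on insertion, so deletion always takes the first element.

```c
#define MAX 10   /* capacity of the priority queue (assumed) */

struct pq_item {
    int value;
    int priority;    /* assumption: larger number = higher priority */
};

static struct pq_item pq[MAX];
static int pq_count = 0;

/* Inserts value with the given priority. Returns 0, or -1 if full. */
int pq_insert(int value, int priority)
{
    int i;

    if (pq_count == MAX)
        return -1;

    /* Shift elements of strictly lower priority one place to the right.
       Elements of equal priority stay ahead of the new one, so they are
       served in their order of arrival. */
    i = pq_count - 1;
    while (i >= 0 && pq[i].priority < priority) {
        pq[i + 1] = pq[i];
        i--;
    }
    pq[i + 1].value = value;
    pq[i + 1].priority = priority;
    pq_count++;
    return 0;
}

/* Removes the highest-priority element and stores it in *value.
   Returns 0, or -1 if the queue is empty. */
int pq_remove(int *value)
{
    int i;

    if (pq_count == 0)
        return -1;

    *value = pq[0].value;
    for (i = 1; i < pq_count; i++)     /* close the gap at the front */
        pq[i - 1] = pq[i];
    pq_count--;
    return 0;
}
```

If the values 10, 20, 30 and 40 are inserted with priorities 1, 3, 3 and 2, successive calls to pq_remove return 20, 30, 40 and 10: the two elements with priority 3 come out in their order of arrival.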

3. Deque

A deque, or double-ended queue, is a queue in which insertions and deletions can happen at both ends. It is a data structure which unites the properties of a queue and a stack. As with a stack, items can be pushed into the deque; once items are in the deque, the last item pushed in may be extracted from one side (popped, as in a stack) and the first item pushed in may be pulled out of the other side (as in a queue).

In an input-restricted deque, insertion of elements is allowed at one end only, but deletion can be done at both ends. In an output-restricted deque, deletion of elements is allowed at one end only, but insertion can be done at both ends.
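The four basic deque operations can be sketched in C with a circular array, as below. All the names (dq_insert_rear, dq_insert_front, dq_delete_front, dq_delete_rear) are hypothetical. Unlike the queue listings in this unit, the sketch keeps a count of stored elements instead of a rear index, which is simply one convenient choice.

```c
#define MAX 10   /* capacity of the deque (assumed) */

static int dq[MAX];
static int dq_front = 0;   /* index of the first element */
static int dq_count = 0;   /* number of elements stored */

/* Each function returns 0 on success and -1 if the deque is
   full (insertions) or empty (deletions). */

int dq_insert_rear(int v)
{
    if (dq_count == MAX) return -1;
    dq[(dq_front + dq_count) % MAX] = v;       /* first free slot at the rear */
    dq_count++;
    return 0;
}

int dq_insert_front(int v)
{
    if (dq_count == MAX) return -1;
    dq_front = (dq_front + MAX - 1) % MAX;     /* step back, wrapping around */
    dq[dq_front] = v;
    dq_count++;
    return 0;
}

int dq_delete_front(int *v)
{
    if (dq_count == 0) return -1;
    *v = dq[dq_front];
    dq_front = (dq_front + 1) % MAX;
    dq_count--;
    return 0;
}

int dq_delete_rear(int *v)
{
    if (dq_count == 0) return -1;
    *v = dq[(dq_front + dq_count - 1) % MAX];  /* last occupied slot */
    dq_count--;
    return 0;
}
```

An input-restricted deque would offer only one of the two insert functions, and an output-restricted deque only one of the two delete functions.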

5.7 APPLICATIONS OF QUEUE


1. Queues are widely used as waiting lists for a single shared resource like printer, disk, CPU.
2. Queues are used to transfer data asynchronously (data not necessarily received at same rate
as sent) between two processes (IO buffers), e.g., pipes, file IO, sockets.
3. Queues are used in operating system for handling interrupts. When programming a real-
time system that can be interrupted, for example, by a mouse click, it is necessary to process
the interrupts immediately, before proceeding with the current job. If the interrupts have to
be handled in the order of arrival, then a FIFO queue is the appropriate data structure.

Josephus Problem

Let us see how queues can be used for finding a solution to the Josephus problem.

In Josephus problem, n people stand in a circle waiting to be executed. The counting starts at some
point in the circle and proceeds in a specific direction around the circle. In each step, a certain
number of people are skipped and the next person is executed (or eliminated). The elimination of
people makes the circle smaller and smaller. At the last step, only one person remains who is
declared the ‘winner’. Therefore, if there are n number of people and a number k which indicates
that k–1 people are skipped and kth person in the circle is eliminated, then the problem is to choose
a position in the initial circle so that the given person becomes the winner.

For example, if there are 5 (n) people and every second (k) person is eliminated, then first the
person at position 2 is eliminated followed by the person at position 4 followed by person at position
1 and finally the person at position 5 is eliminated. Therefore, the person at position 3 becomes the
winner.
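One way to simulate the Josephus problem with a queue is sketched below in C. The function name josephus and the limit MAX are assumptions. The people are placed in a circular queue; each skipped person is removed from the front and inserted again at the rear, and every k-th person is removed for good.

```c
#define MAX 100   /* largest circle handled by this sketch */

/* Returns the 1-based starting position of the winner for n people
   when every k-th person is eliminated, or -1 for invalid input. */
int josephus(int n, int k)
{
    int queue[MAX];
    int front = 0, count, i;

    if (n < 1 || n > MAX || k < 1)
        return -1;

    for (i = 0; i < n; i++)            /* positions 1..n enter the queue */
        queue[i] = i + 1;
    count = n;

    while (count > 1) {
        /* Skip k-1 people: each one leaves the front and rejoins
           the queue at the rear. */
        for (i = 0; i < k - 1; i++) {
            int person = queue[front];
            front = (front + 1) % MAX;
            queue[(front + count - 1) % MAX] = person;
        }
        /* The k-th person is eliminated: delete from the front. */
        front = (front + 1) % MAX;
        count--;
    }
    return queue[front];
}
```

For the example above, josephus(5, 2) eliminates positions 2, 4, 1 and 5 in that order and returns 3.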
Check your Progress 2
Multiple Choice Single Response.
1. In a queue, insertion is done at
a. Rear
b. Front
c. Back
d. Top
2. A circular queue that has not yet wrapped around will be full when
a. FRONT = MAX –1 and REAR = Max –1
b. FRONT = 0 and REAR = Max –1
c. FRONT = MAX –1 and REAR = 0
d. FRONT = 0 and REAR = 0
3. A line in a bank represents a
a. Stack
b. Linked List
c. Queue
d. Array

Activity 1

Draw the queue structure in each case when the following operations are performed on an empty
queue.
a. Add 1, 2, 3, 4, 6, 8
b. Delete two numbers
c. Add 10
d. Add 15
e. Delete three numbers

Summary
• A stack is a linear data structure in which elements are added and removed only from one end, which is called the top. It is called a LIFO (Last-In, First-Out) data structure as the element that is inserted last is the first one to be removed.
• A stack is used to evaluate postfix expressions and to convert infix expressions to postfix/prefix.
• Recursive functions are implemented using the system stack.
• A queue is a FIFO data structure in which the element that is inserted first is the first one to be removed. Here, the elements are added at the rear end and removed from the front.
• Circular queue, priority queue and deque are the types of queue.
• Both stack and queue can be implemented using an array or a linked list.

Keywords
• Data Structure: It is a data organization, management and storage format that enables efficient access and modification.

Self-Assessment Questions
1. How does a stack implemented using a linked list differ from a stack implemented using an array?
2. Explain the terms infix expression, prefix expression, and postfix expression. Convert the
following infix expressions to their postfix equivalents:

(a) A – B + C (b) A * B + C / D

(c) (A – B) + C * D / E – C

(d) (A * B) + (C / D) – ( D + E)

(e) ((A – B) + D / ((E + F) * G))

3. Explain the applications of queue.


4. Explain the concept of a circular queue. How is it better than a linear queue?
5. Write a program to calculate the number of items in a queue.

Answers to Check your Progress


Check your Progress 1

Fill in the Blanks.

1. The order of evaluation of a postfix expression is from left to right.


2. The stack is full when top=MAX-1
3. Stack is used to convert an infix expression into a postfix expression.

Check your Progress 2

Multiple Choice Single Response.

1. In a queue, insertion is done at

a. Rear

2. The circular queue will be full only when

b. FRONT = 0 and REAR = Max –1

3. A line in a bank represents a

c. Queue

Suggested Reading
1. Data Structures Using C by Reema Thareja, Oxford University Press
2. M. A. Weiss, Data Structures and Algorithm Analysis in C, Pearson Education Asia
3. A. V. Aho, J. E. Hopcroft and J. D. Ullman, Data Structures and Algorithms
Unit 6
Linked Lists
Structure

6.1 Introduction

6.2 Linked Lists

6.3 Operations on Linked List

6.4 Circular Linked List

6.5 Doubly Linked List

6.6 Polynomial Representation

Summary

Key Words

Self-Assessment Questions

Answers to Check your Progress

Suggested Reading

The text is adapted by Symbiosis Centre for Distance Learning under a Creative Commons Attribution-
ShareAlike 4.0 International (CC BY-SA 4.0) as requested by the work’s creator or licensees. This license
is available at https://creativecommons.org/licenses/by-sa/4.0/.
Objective

After going through this unit, you will be able to:

 Understand the concept of linked list


 Explain the types of linked list with its operations
 Represent polynomial using linked list

6.1 INTRODUCTION
An array is a linear collection of data elements stored in consecutive memory locations. When we
declare an array, we must specify its size, which restricts the number of elements it can hold. This
is known as static memory allocation. To use memory more efficiently, elements should be allowed to
reside at any available location rather than only in consecutive locations (dynamic allocation). To
write an efficient program, we therefore need a data structure that supports this. A linked list is a
data structure that is free from the aforementioned restrictions. It does not store elements in
consecutive memory locations, and the user can add any number of elements to it. However, unlike an
array, a linked list does not allow random access to its data. In this unit, we discuss linked lists
in detail, along with their types and the operations performed on them.

6.2 LINKED LISTS


A linked list is a sequence of elements called nodes, which are connected together via links
(pointers). Each node stores a data item, called an element, and a link, called next, that contains
the address of the next node. A linked list can be visualized as a chain of nodes, where every node
points to the next node.

Following are the important points to be considered:

 A linked list is made up of linked elements called nodes.
 Head/Start points to the first node of the linked list.
 Each node carries one or more data items and a link field called next.
 Each node is linked to the next node through its next field.
 The last node carries a NULL link to mark the end of the list.

Types of Linked List:

Following are the various types of linked list.

 Simple Linked List – Element navigation is forward only.


 Doubly Linked List − Elements can be navigated forward and backward. Each node contains an
extra pointer, typically called the previous pointer, which points to the previous node, together
with the next pointer and data found in a singly linked list.

 Circular Linked List – Last node contains link of the first node as next.

6.3 OPERATIONS ON LINKED LIST

Following are the basic operations supported by a Linked List.


 Insertion − Adds an element in the list.
 Deletion − Deletes an element from the list.
 Display − Displays the complete list.
 Search − Searches an element using the given key.

Insertion Operation
Adding a new node to a linked list takes more than one step. First, create a node using
the same structure, and then find the location where it has to be inserted. We can insert an
element at the beginning, at the end, or in the middle (before or after a given node) of the list.

 At the Beginning of the Linked List:

Consider the linked list shown in Fig. 6.1. Suppose we want to add a new node with data 9 and
add it as the first node of the list. Then the following changes will be done in the linked list.

newnode->data = 9;
newnode->next = NULL;
newnode->next = START;
START = newnode;
Fig. 6.1

 At the End of the Linked List:

Consider the linked list shown in Fig. 6.2. Suppose we want to add a new node with data 9 as
the last node of the list. Then the following changes will be done in the linked list.

newnode->data = 9;
newnode->next = NULL;
PTR = START;
while (PTR->next != NULL)
    PTR = PTR->next;
PTR->next = newnode;

Fig. 6.2
 At the middle of the Linked List:

After the given node:


Consider the linked list shown in Fig. 6.3. Suppose we want to add a new node with value 9
after the node containing data 3. Then the following changes will be done in the linked list.

newnode->data = 9;
newnode->next = NULL;
PTR = START;
while (PTR->data != value) // here, value = 3
    PTR = PTR->next;
newnode->next = PTR->next;
PTR->next = newnode;

Fig. 6.3

Before the given node:

Consider the linked list shown in Fig. 6.4. Suppose we want to add a new node with value 9
before the node containing data 3. Then the following changes will be done in the linked list.
newnode->data = 9;
newnode->next = NULL;
PREPTR = PTR = START;
while (PTR->data != value) // here, value = 3
{
    PREPTR = PTR;
    PTR = PTR->next;
}
newnode->next = PTR;
if (PTR == START) // the given node is the first node
    START = newnode;
else
    PREPTR->next = newnode;

Fig. 6.4

Deletion Operation

Whenever we delete a node from a linked list we need to check for underflow condition.
Underflow is a condition that occurs when we try to delete a node from a linked list that is
empty. This happens when START = NULL or when there are no more nodes to delete.
Note that when we delete a node from a linked list, we actually have to free the memory
occupied by that node. The memory is returned to the free pool so that it can be used to store
other programs and data.
Delete the first node from the linked list:
Consider the linked list shown in Fig. 6.5. Suppose we want to delete a node from the
beginning of the list, then the following changes will be done in the linked list.

if (START == NULL)
    printf("Linked List is empty");
else
{
    P = START;
    START = START->next;
    P->next = NULL;
    free(P);
}

Fig. 6.5

Delete the last node from the linked list:

Consider the linked list shown in Fig. 6.6. Suppose we want to delete a last node from the list,
then the following changes will be done in the linked list.

if (START == NULL)
    printf("Linked List is empty");
else
{
    PREPTR = PTR = START;
    while (PTR->next != NULL)
    {
        PREPTR = PTR;
        PTR = PTR->next;
    }
    if (PTR == START) // only one node in the list
        START = NULL;
    else
        PREPTR->next = NULL;
    free(PTR);
}
Fig. 6.6

Deleting the node after a given node in a Linked List:

Consider the linked list shown in Fig. 6.7. Suppose we want to delete the node that succeeds
the node which contains data value 4. Then the following changes will be done in the linked
list.

if (START == NULL)
    printf("Linked List is empty");
else
{
    PREPTR = START;
    while (PREPTR->data != VAL) // here VAL = 4
        PREPTR = PREPTR->next;
    PTR = PREPTR->next; // the node to be deleted
    PREPTR->next = PTR->next;
    PTR->next = NULL;
    free(PTR);
}
Fig. 6.7

Display Operation

In this operation, we traverse the linked list using the next pointer and display the data part
of each node. This is also called traversing a linked list. We also need to check for the
underflow condition.

if (START == NULL)
    printf("Linked List is empty");
else
{
    PTR = START;
    while (PTR != NULL)
    {
        printf("%d\t", PTR->data);
        PTR = PTR->next;
    }
}
6.4 CIRCULAR LINKED LIST
In a circular linked list, the last node contains a pointer to the first node of the list. We can have a
circular singly linked list as well as a circular doubly linked list. While traversing a circular linked
list, we can begin at any node and traverse the list forward (or, in a circular doubly linked list,
backward as well) until we reach the same node where we started. Thus, a circular linked list has no
NULL values in the next pointer of any of its nodes. Figure 6.8 shows a circular linked list.

Fig. 6.8 Circular Linked List

We traverse a circular linked list until we reach the last node, whose next pointer points back to the first node.

Insertion Operation

Consider the linked list shown in Fig. 6.9. Suppose we want to add a new node with data 9 as the first
node of the list. Then the following changes will be done in the linked list.

newnode->data = 9;
newnode->next = NULL;
PTR = START;
while (PTR->next != START)
    PTR = PTR->next;
newnode->next = START;
PTR->next = newnode;
START = newnode;

Fig. 6.9
Consider the linked list shown in Fig. 6.10. Suppose we want to add a new node with data 9 at the end
of the list. Then the following changes will be done in the linked list.

newnode->data = 9;
newnode->next = NULL;
PTR = START;
while (PTR->next != START)
    PTR = PTR->next;
PTR->next = newnode;
newnode->next = START;

Fig. 6.10

Deletion Operation

When we want to delete a node from the beginning of the list, then the following changes will be
done in the linked list. Consider the circular linked list shown in Fig. 6.11

if (START == NULL)
    printf("CLL is empty");
else if (START->next == START) // only one node in the list
{
    free(START);
    START = NULL;
}
else
{
    PTR = START;
    while (PTR->next != START)
        PTR = PTR->next;
    PTR->next = START->next;
    P = START;
    START = PTR->next;
    P->next = NULL;
    free(P);
}
Fig. 6.11

When we want to delete a last node of the list, then the following changes will be done in the linked
list. Consider the circular linked list shown in Fig. 6.12

if (START == NULL)
    printf("CLL is empty");
else if (START->next == START) // only one node in the list
{
    free(START);
    START = NULL;
}
else
{
    PREPTR = PTR = START;
    while (PTR->next != START)
    {
        PREPTR = PTR;
        PTR = PTR->next;
    }
    PREPTR->next = START;
    PTR->next = NULL;
    free(PTR);
}
Fig. 6.12

6.5 DOUBLY LINKED LIST


A doubly linked list or a two-way linked list is a more complex type of linked list which contains a
pointer to the next as well as the previous node in the sequence. Therefore, it consists of three parts—
data, a pointer to the next node, and a pointer to the previous node.

The PREV field of the first node and the NEXT field of the last node will contain NULL. The PREV field
is used to store the address of the preceding node, which enables us to traverse the list in the
backward direction.

Thus, we see that a doubly linked list calls for more space per node and more expensive basic
operations. However, a doubly linked list makes it easier to manipulate the elements of the list, as
it maintains pointers to nodes in both directions (forward and backward). An often-cited advantage of
a doubly linked list is that it can make searching more efficient, because the list can be traversed
in either direction.

Insertion Operation

Consider the doubly linked list shown in Fig. 6.13. Suppose we want to add a new node with data 9 as
the first node of the list. Then the following changes will be done in the linked list.

newnode->data = 9;
newnode->prev = newnode->next = NULL;
newnode->next = START;
if (START != NULL) // the list is not empty
    START->prev = newnode;
START = newnode;

Fig. 6.13

Consider the doubly linked list shown in Fig. 6.14. Suppose we want to add a new node with data 9 as
the last node of the list. Then the following changes will be done in the linked list.

newnode ->data =9;


newnode->prev = newnode->next = NULL;
PTR = START;
while (PTR ->next != NULL)
{
PTR = PTR ->next;
}
PTR->next = newnode;
newnode->prev = PTR;

Fig. 6.14

Consider the doubly linked list shown in Fig. 6.15. Suppose we want to add a new node with value 9
after the node containing 3.

newnode ->data =9;


newnode->prev = newnode->next = NULL;
PTR = START;
while (PTR ->data != VAL) // here VAL =3
{
PTR = PTR ->next;
}
newnode->next = PTR->next;
newnode->prev = PTR;
if (PTR->next != NULL) // PTR is not the last node
    PTR->next->prev = newnode;
PTR->next = newnode;

Fig. 6.15

Consider the doubly linked list shown in Fig. 6.16. Suppose we want to add a new node with value 9
before the node containing 3.

newnode ->data =9;


newnode->prev = newnode->next = NULL;
PTR = START;
while (PTR ->data != VAL) // here VAL =3
{
PTR = PTR ->next;
}
newnode->next = PTR;
newnode->prev = PTR->prev;
if (PTR->prev != NULL)
    PTR->prev->next = newnode;
else // PTR was the first node
    START = newnode;
PTR->prev = newnode;
Fig. 6.16

Deletion Operation

Consider the doubly linked list shown in Fig. 6.17. When we want to delete a node from the beginning
of the list, then the following changes will be done in the linked list.

if (START == NULL)
{
    printf("DLL is empty");
}
else
{
    P = START;
    START = START->next;
    if (START != NULL) // the list had more than one node
        START->prev = NULL;
    P->next = P->prev = NULL;
    free(P);
}

Fig. 6.17

Consider the doubly linked list shown in Fig. 6.18. When we want to delete a last node of the list, then
the following changes will be done in the linked list.

if (START == NULL)
{
    printf("DLL is empty");
}
else
{
    PTR = START;
    while (PTR->next != NULL)
        PTR = PTR->next;
    if (PTR->prev != NULL)
        PTR->prev->next = NULL;
    else // PTR was the only node
        START = NULL;
    PTR->prev = NULL;
    free(PTR);
}

Fig. 6.18

Consider the doubly linked list shown in Fig. 6.19. Suppose we want to delete the node that succeeds
the node which contains data value 4. Then the following changes will be done in the linked list.

if (START == NULL)
{
    printf("DLL is empty");
}
else
{
    PTR = START;
    while (PTR->data != VAL) // here VAL = 4
        PTR = PTR->next;
    P = PTR->next; // the node to be deleted
    PTR->next = P->next;
    if (P->next != NULL) // P is not the last node
        P->next->prev = PTR;
    P->prev = P->next = NULL;
    free(P);
}
Fig. 6.19

Consider the doubly linked list shown in Fig. 6.20. Suppose we want to delete the node preceding the
node with value 4. Then the following changes will be done in the linked list.

if (START == NULL)
{
    printf("DLL is empty");
}
else
{
    PTR = START;
    while (PTR->data != VAL) // here VAL = 4
        PTR = PTR->next;
    P = PTR->prev; // the node to be deleted
    if (P->prev != NULL)
        P->prev->next = PTR;
    else // P was the first node
        START = PTR;
    PTR->prev = P->prev;
    P->prev = P->next = NULL;
    free(P);
}
Fig. 6.20

6.6 POLYNOMIAL REPRESENTATION


One of the most important applications of linked lists is the representation of a polynomial in
memory. Although a polynomial can be represented using a linear linked list, the common and preferred
way is to use a circular linked list with a header node.

Polynomial Representation: Header linked lists are frequently used for maintaining polynomials in
memory. The header node plays an important part in this representation since it is needed to
represent the zero polynomial. Specifically, the information part of each node is divided into two
fields representing, respectively, the coefficient and the exponent of the corresponding polynomial
term, and the nodes are linked in order of decreasing degree. Polynomials stored this way can be
added and multiplied.

Consider the polynomial 6x^3 + 9x^2 + 7x + 1. Its representation using a linked list is shown below:

Check your Progress 1


Multiple Choice Single Response.
1. Which type of linked list contains a pointer to the next as well as the previous node in the
sequence?
(a) Singly linked list (b) Circular linked list
(c) Doubly linked list (d) All of these
2. Which type of linked list does not store NULL in next field?
(a) Singly linked list (b) Circular linked list
(c) Doubly linked list (d) All of these
Fill in the Blanks.
1. Each element in a linked list is known as a ______.
2. First node in the linked list is called the _____.
3. Underflow occurs when linked list is ______ .
4. In a circular linked list, the last node contains a pointer to the ____ node of the list.
Activity 1

1. Give the linked representation of the polynomial: 9x^3 – 10x^2 + 4x – 4


2. Write a program to delete the kth node from a linked list.
3. Write a program to concatenate two doubly linked lists.

Summary
 A linked list is a linear collection of data elements called as nodes in which linear representation
is given by links from one node to another.
 Before we delete a node from a linked list, we must first check for UNDERFLOW condition which
occurs when we try to delete a node from a linked list that is empty.
 When we delete a node from a linked list, we have to actually free the memory occupied by
that node. The memory is returned back to the free pool so that it can be used to store other
programs and data.
 In a circular linked list, the last node contains a pointer to the first node of the list.
 A doubly linked list or a two-way linked list is a linked list which contains a pointer to the next
as well as the previous node in the sequence.

Keywords
 Polynomial: It is an expression consisting of variables and coefficients
 Pointer: It is an object that stores the memory address of another value located in computer
memory.

Self-Assessment Questions
1. Make a comparison between a linked list and a linear array. Which one will you prefer to use
and when?
2. Why is a doubly linked list more useful than a singly linked list?
3. Give the advantages and uses of a circular linked list.
4. Explain the difference between a circular linked list and a singly linked list.

Answers to Check your Progress


Check your Progress 1

Multiple Choice Single Response.

1. Which type of linked list contains a pointer to the next as well as the previous node in the
sequence?

(c) Doubly linked list

2. Which type of linked list does not store NULL in next field?

(b) Circular linked list


Fill in the Blanks.
1. Each element in a linked list is known as a node.
2. First node in the linked list is called the head node.
3. Underflow occurs when linked list is empty.
4. In a circular linked list, the last node contains a pointer to the first node of the list.

Suggested Reading
1. Data Structures Using C by Reema Thareja, Oxford University Press.
2. M. A. Weiss, Data Structures and Algorithm Analysis in C, Pearson Education Asia.
3. A. V. Aho, J. E. Hopcroft and J. D. Ullman, Data Structures and Algorithms.
Unit 7
Trees
Structure:

7.1 Introduction

7.2 Trees Terminology

7.3 Binary Tree

7.3.1 Binary Tree Representation

7.3.2 Binary Tree Traversal

7.4 Introduction of Binary Search Trees

7.5 AVL Trees

7.6 B-Tree

Summary

Key Words

Self-Assessment Questions

Answers to Check your Progress

Suggested Reading

The text is adapted by Symbiosis Centre for Distance Learning under a Creative Commons Attribution-
ShareAlike 4.0 International (CC BY-SA 4.0) as requested by the work’s creator or licensees. This license
is available at https://creativecommons.org/licenses/by-sa/4.0/.
Objectives:

After going through this unit, you will be able to:

 Understand the concept of binary tree and binary search tree


 Demonstrate different traversal methods of a tree
 Explain the AVL and B-tree

7.1 INTRODUCTION
In this unit, we are going to introduce one of the most fundamental structures i.e. trees. The use of
the word tree here comes from the fact that, when we draw them, the resultant drawing often
resembles the trees found in a forest. In computer science, a 'tree' is a widely used abstract data type
(ADT) or data structure implementing this ADT that simulates a hierarchical tree structure, with a root
value and subtrees of children with a parent node, represented as a set of linked nodes.

A tree data structure can be defined recursively as a collection of nodes (starting at a root node),
where each node is a data structure consisting of a value, together with a list of references to nodes
(the "children"), with the constraints that no reference is duplicated, and none points to the root.

Alternatively, a tree can be defined abstractly as a whole (globally) as an ordered tree, with a value
assigned to each node. Both these perspectives are useful: while a tree can be analysed
mathematically as a whole, when actually represented as a data structure it is usually represented and
worked with separately by node. For example, looking at a tree as a whole, one can talk about "the
parent node" of a given node, but in general as a data structure a given node only contains the list of
its children, but does not contain a reference to its parent.

7.2 TREES TERMINOLOGY


A node is a structure which may contain a value, a condition, or represent a separate data structure
(which could be a tree of its own). Each node in a tree has zero or more child nodes, which are below
it in the tree. A node that has a child is called the child's parent node (or ancestor node, or superior).
A node has at most one parent.

Nodes that do not have any children are called leaf nodes. They are also referred to as terminal nodes.
The height of a node is the length of the longest downward path to a leaf from that node. The height
of the root is the height of the tree. The depth of a node is the length of the path to its root (i.e., its
root path).
The topmost node in a tree is called the root node. Being the topmost node, the root node will not
have parents. It is the node at which operations on the tree commonly begin. All other nodes can be
reached from it by following edges or links. In diagrams, it is typically drawn at the top. In some trees,
such as heaps, the root node has special properties. Every node in a tree can be seen as the root node
of the subtree rooted at that node.

An internal node or inner node is any node of a tree that has child nodes and is thus not a leaf node.
A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T. The subtree
corresponding to the root node is the entire tree; the subtree corresponding to any other node is
called a proper subtree.

Summary:

 Root: The top node in a tree.


 Child: A node directly connected to another node when moving away from the root.
 Parent: The converse notion of a child.
 Siblings: A group of nodes with the same parent.
 Descendant: A node reachable by repeated proceeding from parent to child. Also known as
sub-child.
 Ancestor: A node reachable by repeated proceeding from child to parent.
 Leaf/External node: A node with no children.
 Branch node/Internal node: A node with at least one child.
 Degree: For a given node, its number of children. A leaf is necessarily degree zero.
 Edge: The connection between one node and another.
 Path: A sequence of nodes and edges connecting a node with a descendant.
 Level: 1 + the number of edges between the node and the root.
 Depth: The depth of a node is defined as the number of edges between the node and the root.
 Height of node: The height of a node is the number of edges on the longest path between
that node and a descendant leaf.
 Height of tree: The height of a tree is the height of its root node.
 Forest: A forest is a set of n ≥ 0 disjoint trees.
7.3 BINARY TREE
A binary tree consists of a finite set of nodes that is either empty, or consists of one specially
designated node called the root of the binary tree, and the elements of two disjoint binary trees called
the left subtree and right subtree of the root. Note that the definition above is recursive: we have
defined a binary tree in terms of binary trees. This is appropriate since recursion is an innate
characteristic of tree structures.

Fig. 7.1 Binary Tree

If every non-leaf node in a binary tree has nonempty left and right subtrees, the tree is termed a
strictly binary tree. Or, to put it another way, all of the nodes in a strictly binary tree are of degree
zero or two, never degree one. A strictly binary tree with N leaves always contains 2N – 1 nodes. Some
texts call this a "full" binary tree. (Fig. 7.1)

A complete binary tree of depth d is the strictly binary tree all of whose leaves are at level d. The total
number of nodes in a complete binary tree of depth d equals $2^{d+1} - 1$. Since all leaves in such a tree
are at level d, the tree contains $2^d$ leaves and, therefore, $2^d - 1$ internal nodes.

Fig. 7.2 Complete Binary Tree

A binary tree of depth d is an almost complete binary tree if:

 Each leaf in the tree is either at level d or at level d – 1.


 For any node nd in the tree with a right descendant at level d, all the left descendants of nd
that are leaves are also at level d.

Fig. 7.3 An almost complete binary tree


7.3.1 Binary Tree Representation

A full binary tree of depth k is a binary tree of depth k having $2^k - 1$ nodes. This is the maximum number
of the nodes such a binary tree can have. A very elegant sequential representation for such binary
trees results from sequentially numbering the nodes, starting with nodes on level 1, then those on
level 2 and so on. Nodes on any level are numbered from left to right. This numbering scheme gives
us the definition of a complete binary tree. A binary tree with n nodes and a depth k is complete if its
nodes correspond to the nodes which are numbered 1 to n in the full binary tree of depth k. The nodes
may be represented in an array or using a linked list.

Array Representation

This method is easy to understand and implement. It's very useful for certain kinds of tree applications,
such as heaps, and fairly useless for others.

Steps to implement binary trees using arrays: Take a complete binary tree and number its nodes from
top to bottom, left to right. The root is 0, the left child 1, the right child 2, the left child of the left child
3, etc. Put the data for node i of this tree in the ith element of an Array. If we have a partial (incomplete)
binary tree, and node i is absent, put some value that represents "no data" in the ith position of the
array.

Three simple formulae allow us to go from the index of the parent to the index of its children and vice
versa:

1. if index(parent) = N, index(left child) = 2*N+1


2. if index(parent) = N, index(right child) = 2*N+2
3. if index(child) = N, index(parent) = (N-1)/2 (integer division with truncation)

For a complete or almost complete binary tree, storing the binary tree as an array may be a good
choice. If this scheme is used to store a binary tree that is not complete or almost complete, we can
end up with a great deal of wasted space in the array. For example, the following binary tree:
would be stored like:

Advantages of linear representation:

1. Simplicity.
2. Given the location of the child (say, k), the location of the parent is easy to determine
((k – 1) / 2 with the 0-based numbering used above, using integer division).

Disadvantages of linear representation:

1. Additions and deletions of nodes are inefficient, because of the data movements in the array.
2. Space is wasted if the binary tree is not complete. That is, the linear representation is useful
if the number of missing nodes is small.

Linked Representation
For a linked representation, a node in the tree

1. has a data field


2. a left child field with a pointer to another tree node
3. a right child field with a pointer to another tree node
4. optionally, a parent field with a pointer to the parent node

The most important thing we must remember about the linked representation is that a tree is
represented by the pointer to the root node, not a node. The empty tree is simply the NULL pointer,
not an empty node.

7.3.2 Binary Tree Traversal

Traversal is the process of visiting all the nodes of a tree, possibly printing their values as well.
Because all nodes are connected via edges (links), we always start from the root (head) node; we
cannot randomly access a node in a tree. There are three ways to traverse a tree:

Types of Traversals are:

 Pre-order traversal
1. Visit the root
2. Traverse the left sub tree
3. Traverse the right sub tree

 In-order traversal
1. Traverse the left sub tree
2. Visit the root
3. Traverse the right sub tree

 Post-order traversal
1. Traverse the left subtree
2. Traverse the right subtree
3. Visit the root

Generally, we traverse a tree to search or locate a given item or key in the tree or to print all the values
it contains.

PreOrder - 8, 5, 9, 7, 1, 12, 2, 4, 11, 3


InOrder - 9, 5, 1, 7, 2, 12, 8, 4, 3, 11
PostOrder - 9, 1, 2, 12, 7, 5, 3, 11, 4, 8
The following diagram demonstrates the order of node visitation. Number 1 denotes the first node in a
particular traversal and 7 denotes the last node.

Recursive functions for traversing:

void Inorder(struct Node* node)
{
    if (node == NULL)
        return;
    Inorder(node->left);
    printf("%d ", node->data);
    Inorder(node->right);
}

void Preorder(struct Node* node)
{
    if (node == NULL)
        return;
    printf("%d ", node->data);
    Preorder(node->left);
    Preorder(node->right);
}

void Postorder(struct Node* node)
{
    if (node == NULL)
        return;
    Postorder(node->left);
    Postorder(node->right);
    printf("%d ", node->data);
}

7.4 INTRODUCTION OF BINARY SEARCH TREES


A binary search tree is a binary tree in which the data in the nodes are ordered in a particular way. To
be precise, starting at any given node, the data in any nodes of its left sub tree must all be less than
the item in the given node, and the data in any nodes of its right sub tree must be greater than or
equal to the data in the given node. For numbers this can obviously be done. For strings, alphabetical
ordering is often used. For records of data, a comparison based on a particular field (the key field) is
often used. An example of a binary search tree is shown in Figure.

The above figure shows a binary search tree of size 9 and depth 3, with root 7 and leaves 1, 4, 6 and
12. Every node (object) in a binary tree contains information divided into two parts. The first one is
proper to the structure of the tree, that is, it contains a key field (the part of information used to order
the elements), a parent field, a left child field, and a right child field. The second part is the object data
itself. It can be endogenous (that is, data resides inside the tree) or exogenous (that is, nodes only
contain a reference to the object's data). The root node of the tree has its parent field set to null.
Whenever a node does not have a right child or a left child, then the corresponding field is set to null.
A binary search tree is a binary tree with more constraints. If x is a node with key value key [x] and it
is not the root of the tree, then the node can have a left child (denoted by left [x]), a right child (right
[x]) and a parent (p [x]). Every node of a tree possesses the following Binary Search Tree properties:

1. For all nodes y in left sub tree of x, key [y] < key [x]
2. For all nodes y in right sub tree of x, key[y] > key [x]

Operations on Binary Search Tree:

1. Searching

The minimum element of a binary search tree is the last node on the leftmost path from the root, and
its maximum element is the last node on the rightmost path. Therefore, we can find the minimum and the
maximum by following left children and right children, respectively, until an empty sub tree is reached.

We can search for a desirable value in a Binary search tree through the following procedure.

Search for a matching node:

1. Start at the root node as current node

2. If the search key's value matches the current node's key, then a match is found

3. If the search key's value is greater than the current node's key,

1. If the current node has a right child, search right

2. Else, no matching node in the tree

4. If the search key's value is less than the current node's key,

1. If the current node has a left child, search left

2. Else, no matching node in the tree

Example:

Search for 45 in the below tree:

1. Start at the root, 45 is greater than 7, search in right sub tree

2. 45 is greater than 10, search in 10's right sub tree

3. 45 is greater than 14, but 14 has no right sub tree so 45 is not in the BST

2. Insertion

Both insertion and deletion operations cause changes in the data structure of the dynamic set
represented by a binary search tree. For a standard insertion operation, we only need to search for
the proper position that the element should be put in and replace NIL by the element. If there are n
elements in the binary search tree, then the tree contains (n+1) NIL pointers. Therefore, for the (n+1)th
element to be inserted, there are (n+1) possible places available. In the insertion algorithm, we start
by using two pointers: p and q, where q is always the parent of p. Starting from the top of the tree,
the two pointers are moved following the algorithm of searching the element to be inserted.
Consequently, we end with p=NIL (assuming that the key is not already in the tree). The insertion
operation is then finished by replacing the NIL key with the element to be inserted.
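
The two-pointer description above can be sketched in C as follows. The names bst_node and bst_insert are assumptions for this sketch, and two policies are choices made here rather than statements from the text: duplicate keys are ignored, and the tree is left unchanged if memory allocation fails.

```c
#include <stdlib.h>

struct bst_node {
    int key;
    struct bst_node *left;
    struct bst_node *right;
};

/* Inserts `key` and returns the (possibly new) root of the tree. */
struct bst_node *bst_insert(struct bst_node *root, int key)
{
    struct bst_node *p = root;   /* walks down the search path */
    struct bst_node *q = NULL;   /* always the parent of p */

    while (p != NULL) {
        if (key == p->key)
            return root;         /* key already present: nothing to do */
        q = p;
        p = (key < p->key) ? p->left : p->right;
    }

    /* p is now NIL: this is the empty position where the key belongs. */
    struct bst_node *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return root;             /* allocation failed: tree unchanged */
    fresh->key = key;
    fresh->left = NULL;
    fresh->right = NULL;

    if (q == NULL)
        return fresh;            /* the tree was empty: new node is the root */
    if (key < q->key)
        q->left = fresh;
    else
        q->right = fresh;
    return root;
}
```

Because q trails one step behind p, the new node can be attached to its parent without storing a parent field in every node.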

We can insert a node in the binary search tree using the following procedure:
1. Always insert the new node as a leaf node.
2. Start at the root node as the current node.
3. If the new node's key < the current node's key:
   a. If the current node has a left child, search left.
   b. Else, add the new node as the current node's left child.
4. If the new node's key > the current node's key:
   a. If the current node has a right child, search right.
   b. Else, add the new node as the current node's right child.

Example:

Insert 7 in the below BST:


1. Start at the root: 7 is less than 8, so search for the position in the left sub tree
2. 7 is greater than 3, so search for the position in 3's right sub tree
3. 7 is greater than 5, so search in 5's right sub tree; 5 has no right sub tree, so insert 7 as 5's right child

3. Deletion

Deleting an item from a binary search tree is a little harder than inserting one. There are several cases
to consider. The node to be deleted:
• is not in the tree;
• is a leaf;
• has only one child;
• has two children.

Case I: Deleting a Node that is not in a tree

Search the tree for the node to be deleted. If the node is not in the tree, then there is nothing to delete.

Case II: Deleting a leaf node (a node that has no children)

Look at the binary search tree given below. To delete node 78, we can simply remove the node
and set the corresponding child pointer of its parent to NULL. This is the simplest case of deletion.
Step 1: Start searching for 78 from the root
Step 2: 78 is greater than 45, so traverse 45’s right subtree
Step 3: 78 is greater than 56, so traverse 56’s right subtree
Step 4: Delete 78 and set 56’s right child to NULL

Case III: Deleting a Node with only one child

To handle this case, the node’s child is set as the child of the node’s parent. In other words, replace
the node with its child. Now, if the node is the left child of its parent, the node’s child becomes the
left child of the node’s parent. Correspondingly, if the node is the right child of its parent, the node’s
child becomes the right child of the node’s parent. Look at the binary search tree shown below and
see how deletion of node 54 is handled.

Case IV: Deleting a Node with two children

In this case, replace the node’s value with its in-order predecessor (largest value in the left sub-tree)
or in-order successor (smallest value in the right sub-tree). The in-order predecessor or the successor
can then be deleted using any of the above cases. The following figure shows how the deletion of the node
with value 56 is handled in both ways.

4. Other operations on BST:
The following are a few more operations which we can perform on a BST:

a) Find the height of the BST
b) Determine the number of nodes
c) Find the mirror image of a BST (which is obtained by interchanging the left sub-tree with
the right sub-tree at every node of the tree)
d) Find the smallest and largest node in a BST
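
The operations listed above can each be written in a few lines of C. The following is a minimal sketch; the names bst_node, bst_height, bst_count, bst_mirror, bst_smallest, and bst_largest are assumptions for this illustration, and height is counted in levels (an empty tree has height 0, a single node has height 1).

```c
#include <stddef.h>

struct bst_node {
    int key;
    struct bst_node *left;
    struct bst_node *right;
};

/* (a) Height in levels: 0 for an empty tree, 1 for a single node. */
int bst_height(const struct bst_node *root)
{
    if (root == NULL)
        return 0;
    int lh = bst_height(root->left);
    int rh = bst_height(root->right);
    return 1 + (lh > rh ? lh : rh);
}

/* (b) Number of nodes in the tree. */
int bst_count(const struct bst_node *root)
{
    if (root == NULL)
        return 0;
    return 1 + bst_count(root->left) + bst_count(root->right);
}

/* (c) Mirror image: swap the left and right sub-trees at every node. */
void bst_mirror(struct bst_node *root)
{
    if (root == NULL)
        return;
    struct bst_node *tmp = root->left;
    root->left = root->right;
    root->right = tmp;
    bst_mirror(root->left);
    bst_mirror(root->right);
}

/* (d) Smallest node: keep following left children. */
const struct bst_node *bst_smallest(const struct bst_node *root)
{
    while (root != NULL && root->left != NULL)
        root = root->left;
    return root;
}

/* (d) Largest node: keep following right children. */
const struct bst_node *bst_largest(const struct bst_node *root)
{
    while (root != NULL && root->right != NULL)
        root = root->right;
    return root;
}
```

Note that a mirrored tree holds its keys in descending in-order sequence, so the usual BST search no longer applies to it until it is mirrored back.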

BST implementation:

#include <stdio.h>
#include <stdlib.h>

struct node
{
    int data;            // node will store an integer
    struct node *right;  // right child
    struct node *left;   // left child
};

// function to search for x; returns the matching node, or NULL if x is absent
struct node* search(struct node *root, int x)
{
    if(root==NULL || root->data==x) // empty sub-tree (not found) or element found
        return root;
    else if(x>root->data) // x is greater, so we will search the right subtree
        return search(root->right, x);
    else // x is smaller than the data, so we will search the left subtree
        return search(root->left,x);
}

// function to find the node with the minimum value in a subtree
struct node* find_minimum(struct node *root)
{
    if(root == NULL)
        return NULL;
    else if(root->left != NULL) // node with minimum value will have no left child
        return find_minimum(root->left); // left most element will be minimum
    return root;
}
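
// function to create a node
struct node* new_node(int x)
{
    struct node *p;
    p = malloc(sizeof(struct node));
    if(p == NULL) // stop if memory could not be allocated
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    p->data = x;
    p->left = NULL;
    p->right = NULL;
    return p;
}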

// function to insert x; returns the root of the updated subtree
struct node* insert(struct node *root, int x)
{
    // searching for the place to insert
    if(root==NULL)
        return new_node(x);
    else if(x > root->data) // x is greater, so it is inserted to the right
        root->right = insert(root->right, x);
    else // x is smaller (or equal), so it is inserted to the left
        root->left = insert(root->left,x);
    return root;
}

// function to delete a node; returns the root of the updated subtree
struct node* delete(struct node *root, int x)
{
    // searching for the item to be deleted
    if(root==NULL)
        return NULL;
    if (x > root->data)
        root->right = delete(root->right, x);
    else if(x < root->data)
        root->left = delete(root->left, x);
    else
    {
        // No children: free the node; the parent's link becomes NULL
        if(root->left == NULL && root->right == NULL)
        {
            free(root);
            return NULL;
        }

        // One child: the child takes the place of the deleted node
        else if(root->left == NULL || root->right == NULL)
        {
            struct node *temp;
            if(root->left==NULL)
                temp = root->right;
            else
                temp = root->left;
            free(root);
            return temp;
        }

        // Two children: copy the in-order successor's value, then delete the successor
        else
        {
            struct node *temp = find_minimum(root->right);
            root->data = temp->data;
            root->right = delete(root->right, temp->data);
        }
    }
    return root;
}

// function to print the tree in ascending order (inorder traversal)
void inorder(struct node *root)
{
    if(root!=NULL) // checking if the root is not null
    {
        inorder(root->left); // visiting left child
        printf(" %d ", root->data); // printing data at root
        inorder(root->right); // visiting right child
    }
}

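int main()
{
    struct node *root;
    root = new_node(20);
    insert(root,5);
    insert(root,1);
    insert(root,15);
    insert(root,9);
    insert(root,7);
    insert(root,12);
    insert(root,30);
    insert(root,25);
    insert(root,40);
    insert(root, 45);
    insert(root, 42);
    inorder(root); // prints the keys in ascending order
    printf("\n");
    root = delete(root, 1);  // 1 is a leaf node
    root = delete(root, 40); // 40 has only one child (45)
    inorder(root); // prints the keys that remain after the deletions
    printf("\n");
    return 0;
}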

7.5 AVL TREES


A self-balancing or height-balanced binary search tree is any node-based binary search tree that
automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary
item insertions and deletions. These structures provide efficient implementations for mutable ordered
lists, and can be used for other abstract data structures such as associative arrays, priority queues and
sets. Examples are AVL tree, red-black tree.

An AVL tree, named after its inventors Adelson-Velsky and Landis, is a self-balancing binary search tree.
It was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees
of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to
restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and
worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions
may require the tree to be rebalanced by one or more tree rotations.

An AVL tree is a binary search tree which has the following properties:
1. The sub-trees of every node differ in height by at most one.
2. Every sub-tree is an AVL tree.

The structure of an AVL tree is the same as that of a binary search tree, with one addition: every node
stores an extra variable called the balance factor. The balance factor of a node is calculated by
subtracting the height of its right sub-tree from the height of its left sub-tree. A binary search tree
in which every node has a balance factor of –1, 0, or 1 is said to be height balanced. A node with any
other balance factor is considered to be unbalanced and requires rebalancing of the tree.

Balance factor = Height (left sub-tree) – Height (right sub-tree)

• If the balance factor of a node is 1, then the left sub-tree of the node is one level
higher than its right sub-tree.
• If the balance factor of a node is 0, then the height of the left sub-tree (longest
path in the left sub-tree) is equal to the height of the right sub-tree.
• If the balance factor of a node is –1, then the left sub-tree of the node is one level
lower than its right sub-tree.
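
The balance factor formula can be sketched in C as shown below. The names avl_node, avl_height, balance_factor, and is_height_balanced are assumptions for this illustration. Heights are counted in levels, so an empty sub-tree has height 0 and a single node has height 1.

```c
#include <stddef.h>

struct avl_node {
    int key;
    struct avl_node *left;
    struct avl_node *right;
};

/* Height in levels: 0 for an empty sub-tree, 1 for a single node. */
int avl_height(const struct avl_node *n)
{
    if (n == NULL)
        return 0;
    int lh = avl_height(n->left);
    int rh = avl_height(n->right);
    return 1 + (lh > rh ? lh : rh);
}

/* Balance factor = Height(left sub-tree) - Height(right sub-tree). */
int balance_factor(const struct avl_node *n)
{
    if (n == NULL)
        return 0;
    return avl_height(n->left) - avl_height(n->right);
}

/* Returns 1 if every node has a balance factor of -1, 0, or 1; otherwise 0. */
int is_height_balanced(const struct avl_node *n)
{
    if (n == NULL)
        return 1;
    int bf = balance_factor(n);
    return bf >= -1 && bf <= 1
        && is_height_balanced(n->left)
        && is_height_balanced(n->right);
}
```

Recomputing heights on every call keeps the sketch short but is slow on large trees; a practical implementation would more likely store the height or the balance factor in each node and update it as the tree changes.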

Fig. 7.4 (a) Left-heavy AVL tree, (b) right-heavy tree, (c) balanced tree

Rotations
When the tree structure changes (e.g., insertion or deletion), we need to transform the tree to restore
the AVL tree property. This is done using single rotations or double rotations.
Look at Fig. 7.4. Note that the nodes 18, 39, 54, and 72 have no children, so their balance factor = 0.
Node 27 has one left child and zero right child. So, the height of left sub-tree = 1, whereas the height
of right sub-tree = 0. Thus, its balance factor = 1. Look at node 36, it has a left sub-tree with height =
2, whereas the height of right sub-tree = 1. Thus, its balance factor = 2 – 1 = 1. Similarly, the balance
factor of node 45 = 3 – 2 =1; and node 63 has a balance factor of 0 (1 – 1).
Now, look at Figs 7.4 (b) and (c), which show a right-heavy AVL tree and a balanced AVL tree. The trees
given in Fig. 7.4 are typical candidates of AVL trees because the balancing factor of every node is either
1, 0, or –1. However, insertions and deletions from an AVL tree may disturb the balance factor of the
nodes and, thus, rebalancing of the tree may have to be done. The tree is rebalanced by performing
rotation at the critical node. There are four types of rotations: LL rotation, RR rotation, LR rotation,
and RL rotation. The type of rotation that has to be done will vary depending on the particular
situation.

Searching for a Node in an AVL Tree


Searching in an AVL tree is performed exactly the same way as it is performed in a binary search tree.
Due to the height-balancing of the tree, the search operation takes O(log n) time to complete. Since
the operation does not modify the structure of the tree, no special provisions are required.

Inserting a New Node in an AVL Tree


Insertion in an AVL tree is also done in the same way as it is done in a binary search tree. In the AVL
tree, the new node is always inserted as the leaf node. But the step of insertion is usually followed by
an additional step of rotation. Rotation is done to restore the balance of the tree.
However, if insertion of the new node does not disturb the balance factor, that is, if the balance factor
of every node is still –1, 0, or 1, then rotations are not required.
During insertion, the new node is inserted as the leaf node, so it will always have a balance factor
equal to zero. The only nodes whose balance factors will change are those which lie in the path
between the root of the tree and the newly inserted node. The possible changes which may take place
in any node on the path are as follows:
1. Initially, the node was either left- or right-heavy and after insertion, it becomes balanced.
2. Initially, the node was balanced and after insertion, it becomes either left- or right-heavy.
3. Initially, the node was heavy (either left or right) and the new node has been inserted in the
heavy sub-tree, thereby creating an unbalanced sub-tree. Such a node is said to be a critical
node.
Consider the AVL tree given in Fig. 7.5.
If we insert a new node with the value 30, then the new tree will still be balanced and no rotations
will be required in this case. Look at the tree given in Fig. 7.6, which shows the tree after inserting
node 30.

Fig. 7.5 Fig. 7.6

Let us take another example to see how insertion can disturb the balance factors of the nodes and
how rotations are done to restore the AVL property of a tree. Look at the tree given in Fig. 7.5.
After inserting a new node with the value 71, the new tree will be as shown in Fig. 7.7. Note that there
are three nodes in the tree that have their balance factors 2, –2, and –2, thereby disturbing the
AVLness of the tree. So, here comes the need to perform rotation. To perform rotation, our first task
is to find the critical node. Critical node is the nearest ancestor node on the path from the inserted
node to the root whose balance factor is neither –1, 0, nor 1.

Fig. 7.7

In the tree given above, the critical node is 72. The second task in rebalancing the tree is to determine
which type of rotation has to be done. There are four types of rebalancing rotations and application
of these rotations depends on the position of the inserted node with reference to the critical node.
The four categories of rotations are:
1. LL rotation: The new node is inserted in the left sub-tree of the left sub-tree of the critical
node.
2. RR rotation: The new node is inserted in the right sub-tree of the right sub-tree of the critical
node.
3. LR rotation: The new node is inserted in the right sub-tree of the left sub-tree of the critical
node.
4. RL rotation: The new node is inserted in the left sub-tree of the right sub-tree of the critical
node.
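
Choosing among the four categories comes down to two key comparisons. The sketch below assumes distinct integer keys and assumes that the critical node has already been located; the names avl_node, rotation_kind, and rotation_for_insert are hypothetical names introduced for this illustration.

```c
#include <stddef.h>

struct avl_node {
    int key;
    struct avl_node *left;
    struct avl_node *right;
};

enum rotation_kind { ROT_NONE, ROT_LL, ROT_RR, ROT_LR, ROT_RL };

/* Given the critical node and the key that was just inserted below it,
   report which rebalancing rotation applies. */
enum rotation_kind rotation_for_insert(const struct avl_node *critical, int key)
{
    if (critical == NULL)
        return ROT_NONE;

    if (key < critical->key) {                 /* inserted in the left sub-tree */
        const struct avl_node *child = critical->left;
        if (child == NULL)
            return ROT_NONE;                   /* defensive: not a valid critical node */
        return (key < child->key) ? ROT_LL     /* left of left */
                                  : ROT_LR;    /* right of left */
    }
    if (key > critical->key) {                 /* inserted in the right sub-tree */
        const struct avl_node *child = critical->right;
        if (child == NULL)
            return ROT_NONE;
        return (key > child->key) ? ROT_RR     /* right of right */
                                  : ROT_RL;    /* left of right */
    }
    return ROT_NONE;                           /* key equals the critical node's key */
}
```

The first comparison selects the sub-tree of the critical node and the second selects the sub-tree of that child, which mirrors the wording of the four definitions.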
LL Rotation:
Let us study each of these rotations in detail. First, we will see where and how LL rotation is applied.
Consider the tree given in Fig. 7.8 which shows an AVL tree.

Fig. 7.8 LL rotation in an AVL tree

Tree (a) is an AVL tree. In tree (b), a new node is inserted in the left sub-tree of the left sub-tree of the
critical node A (node A is the critical node because it is the closest ancestor whose balance factor is
not –1, 0, or 1), so we apply LL rotation as shown in tree (c). During the rotation, node B becomes the root,
with T1 and A as its left and right children. T2 and T3 become the left and right sub-trees of A.

RR Rotation:
Let us now discuss where and how RR rotation is applied. Consider the tree given in Fig. 7.9 which
shows an AVL tree.

Fig. 7.9 RR rotation in an AVL tree

Tree (a) is an AVL tree. In tree (b), a new node is inserted in the right sub-tree of the right sub-tree of
the critical node A (node A is the critical node because it is the closest ancestor whose balance factor
is not –1, 0, or 1), so we apply RR rotation as shown in tree (c). Note that the new node has now
become a part of tree T3.
During the rotation, node B becomes the root, with A and T3 as its left and right children. T1 and T2 become
the left and right sub-trees of A.
LR and RL Rotations
Consider the AVL tree given in Fig. 7.10 and see how LR rotation is done to rebalance the tree.

Fig. 7.10 LR rotation in an AVL tree

Tree (a) is an AVL tree. In tree (b), a new node is inserted in the right sub-tree of the left sub-tree of
the critical node A (node A is the critical node because it is the closest ancestor whose balance factor
is not –1, 0 or 1), so we apply LR rotation as shown in tree (c). Note that the new node has now become
a part of tree T2.
During the rotation, node C becomes the root, with B and A as its left and right children. Node B has T1 and
T2 as its left and right sub-trees and T3 and T4 become the left and right sub-trees of node A.
Now, consider the AVL tree given in Fig. 7.11 and see how RL rotation is done to rebalance the tree.

Fig. 7.11 RL rotation in an AVL tree

Tree (a) is an AVL tree. In tree (b), a new node is inserted in the left sub-tree of the right sub-tree of
the critical node A (node A is the critical node because it is the closest ancestor whose balance factor
is not –1, 0, or 1), so we apply RL rotation as shown in tree (c). Note that the new node has now
become a part of tree T2.
During the rotation, node C becomes the root, with A and B as its left and right children. Node A has T1 and
T2 as its left and right sub-trees and T3 and T4 become the left and right sub-trees of node B.

Example: Construct an AVL tree by inserting the following elements in the given order.
63, 9, 19, 27, 18, 108, 99, 81.
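
One way to check a hand-built answer to this example is to code the insertion with all four rotations. The following is a minimal, self-contained sketch rather than a definitive implementation. It assumes distinct integer keys, stores a height (in levels) in each node instead of a balance factor, and silently drops an insertion if memory allocation fails. All names in it are introduced for this illustration.

```c
#include <stdlib.h>

struct avl_node {
    int key;
    int height;                  /* levels in this sub-tree; a leaf has height 1 */
    struct avl_node *left;
    struct avl_node *right;
};

static int height(const struct avl_node *n) { return n ? n->height : 0; }

static void update(struct avl_node *n)
{
    int lh = height(n->left), rh = height(n->right);
    n->height = 1 + (lh > rh ? lh : rh);
}

static int balance(const struct avl_node *n)
{
    return n ? height(n->left) - height(n->right) : 0;
}

/* LL case: the left child B rises and A becomes B's right child. */
static struct avl_node *rotate_right(struct avl_node *a)
{
    struct avl_node *b = a->left;
    a->left = b->right;          /* B's right sub-tree moves under A */
    b->right = a;
    update(a);
    update(b);
    return b;
}

/* RR case: the right child B rises and A becomes B's left child. */
static struct avl_node *rotate_left(struct avl_node *a)
{
    struct avl_node *b = a->right;
    a->right = b->left;          /* B's left sub-tree moves under A */
    b->left = a;
    update(a);
    update(b);
    return b;
}

/* Inserts `key` and returns the root of the rebalanced sub-tree. */
struct avl_node *avl_insert(struct avl_node *root, int key)
{
    if (root == NULL) {
        struct avl_node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;         /* allocation failed: nothing is inserted */
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = avl_insert(root->left, key);
    else if (key > root->key)
        root->right = avl_insert(root->right, key);
    else
        return root;             /* duplicate key: ignored */

    update(root);
    int bf = balance(root);

    if (bf > 1 && key < root->left->key)       /* LL rotation */
        return rotate_right(root);
    if (bf < -1 && key > root->right->key)     /* RR rotation */
        return rotate_left(root);
    if (bf > 1) {                              /* LR rotation */
        root->left = rotate_left(root->left);
        return rotate_right(root);
    }
    if (bf < -1) {                             /* RL rotation */
        root->right = rotate_right(root->right);
        return rotate_left(root);
    }
    return root;
}
```

Calling avl_insert once for each of 63, 9, 19, 27, 18, 108, 99, 81, starting from an empty tree (root = NULL) and assigning the result back to root each time, triggers an LR rotation when 19 is inserted and an LL rotation when 81 is inserted. The resulting tree has 19 at the root, 9 (with right child 18) on the left, and 63 on the right with children 27 and 99, where 99 has children 81 and 108.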
Deleting a Node from an AVL Tree
Deletion of a node in an AVL tree is similar to that of binary search trees. But it goes one step ahead.
Deletion may disturb the AVLness of the tree, so to rebalance the AVL tree, we need to perform
rotations. There are two classes of rotations that can be performed on an AVL tree after deleting a
given node. These rotations are R rotation and L rotation.
On deletion of node X from the AVL tree, if node A becomes the critical node (closest ancestor node
on the path from X to the root node that does not have its balance factor as 1, 0, or –1), then the type
of rotation depends on whether X is in the left sub-tree of A or in its right sub-tree. If the node to be
deleted is present in the left sub-tree of A, then L rotation is applied, else if X is in the right sub-tree,
R rotation is performed.
Further, there are three categories of L and R rotations. The variations of L rotation are L–1, L0, and
L1 rotation. Correspondingly for R rotation, there are R0, R–1, and R1 rotations. In this section, we will
discuss only R rotation. L rotations are the mirror images of R rotations.

R0 Rotation
Let B be the root of the left sub-tree of the critical node A (for an L rotation, B would be the root of
the right sub-tree). R0 rotation is applied if the balance factor of B is 0. This is illustrated in Fig. 7.12.

Fig. 7.12 R0 rotation in an AVL tree

Tree (a) is an AVL tree. In tree (b), the node X is to be deleted from the right sub-tree of the critical
node A (node A is the critical node because it is the closest ancestor whose balance factor is not –1, 0,
or 1). Since the balance factor of node B is 0, we apply R0 rotation as shown in tree (c).
During the process of rotation, node B becomes the root, with T1 and A as its left and right child. T2
and T3 become the left and right sub-trees of A.

R1 Rotation
Let B be the root of the left sub-tree of A (critical node). R1 rotation is applied if the balance
factor of B is 1. Observe that R0 and R1 rotations are similar to LL rotations; the only difference is that
R0 and R1 rotations yield different balance factors. This is illustrated in Fig. 7.13.

Fig. 7.13 R1 rotation in an AVL tree

Tree (a) is an AVL tree. In tree (b), the node X is to be deleted from the right sub-tree of the critical
node A (node A is the critical node because it is the closest ancestor whose balance factor is not –1,
0, or 1). Since the balance factor of node B is 1, we apply R1 rotation as shown in tree (c).
During the process of rotation, node B becomes the root, with T1 and A as its left and right children.
T2 and T3 become the left and right sub-trees of A.

R–1 Rotation
Let B be the root of the left sub-tree of A (critical node). R–1 rotation is applied if the balance
factor of B is –1. Observe that R–1 rotation is similar to LR rotation. This is illustrated in Fig. 7.14.

Fig. 7.14 R–1 rotation in an AVL tree

Tree (a) is an AVL tree. In tree (b), the node X is to be deleted from the right sub-tree of the critical
node A (node A is the critical node because it is the closest ancestor whose balance factor is not –1, 0,
or 1). Since the balance factor of node B is –1, we apply R–1 rotation as shown in tree (c). During the
rotation, node C (the right child of B) becomes the root, with B and A as its left and right children, just
as in an LR rotation. The former left and right sub-trees of C become the right sub-tree of B and the left
sub-tree of A, respectively.
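
The three R rotations can be sketched in C as a single rebalancing step applied at the critical node. This is an illustration under stated assumptions: heights are recomputed on demand to keep the code short, and the names avl_node, rotate_right, rotate_left, and rebalance_r are introduced here rather than taken from the text. The L rotations would be the mirror image.

```c
#include <stddef.h>

struct avl_node {
    int key;
    struct avl_node *left;
    struct avl_node *right;
};

/* Height in levels, recomputed on demand. */
static int height(const struct avl_node *n)
{
    if (n == NULL)
        return 0;
    int lh = height(n->left), rh = height(n->right);
    return 1 + (lh > rh ? lh : rh);
}

static int balance(const struct avl_node *n)
{
    return height(n->left) - height(n->right);
}

/* The left child rises; the old root becomes its right child. */
static struct avl_node *rotate_right(struct avl_node *a)
{
    struct avl_node *b = a->left;
    a->left = b->right;
    b->right = a;
    return b;
}

/* The right child rises; the old root becomes its left child. */
static struct avl_node *rotate_left(struct avl_node *a)
{
    struct avl_node *c = a->right;
    a->right = c->left;
    c->left = a;
    return c;
}

/* Call on node A after a deletion from A's right sub-tree.
   Returns the root of the rebalanced sub-tree. */
struct avl_node *rebalance_r(struct avl_node *a)
{
    if (a == NULL || balance(a) <= 1)
        return a;                    /* A is not critical: nothing to do */

    struct avl_node *b = a->left;    /* root of A's left sub-tree */
    if (balance(b) >= 0)
        return rotate_right(a);      /* R0 or R1: single rotation, as in LL */

    a->left = rotate_left(b);        /* R-1: double rotation, as in LR */
    return rotate_right(a);
}
```

R0 and R1 share the same pointer changes and differ only in the balance factors that result, which is why they fall into the same branch here.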

Example: Delete nodes 52, 36, and 61 from the AVL tree given below:

7.6 B-TREE
A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches,
sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a
binary search tree in that a node can have more than two children. Unlike self-balancing binary
search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of
data, such as disks. It is commonly used in databases and file systems.

According to Knuth's definition, a B-tree of order m is a tree which satisfies the following properties:
• Every node has at most m children.
• Every non-leaf node (except root) has at least ⌈m/2⌉ child nodes.
• The root has at least two children if it is not a leaf node.
• A non-leaf node with k children contains k − 1 keys.
• All leaves appear in the same level and carry no information.

Each internal node’s keys act as separation values which divide its subtrees. For example, if an internal
node has 3 child nodes (or subtrees) then it must have 2 keys: a1 and a2. All values in the leftmost
subtree will be less than a1, all values in the middle subtree will be between a1 and a2, and all values
in the rightmost subtree will be greater than a2.
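
The role of the separation values can be sketched in C. The function below only picks the child to descend into within one node; it is not a full B-tree search, and the name btree_child_index and the plain sorted integer array used for the keys are assumptions made for this illustration.

```c
/* keys[0..nkeys-1] are the node's separation values in ascending order.
   Returns the index (0..nkeys) of the child sub-tree that may contain x.
   If the returned index i satisfies i < nkeys and keys[i] == x, then x is
   one of this node's own keys and no descent is needed. */
int btree_child_index(const int keys[], int nkeys, int x)
{
    int i = 0;
    while (i < nkeys && x > keys[i])
        i++;                     /* skip every separation value smaller than x */
    return i;
}
```

For a node with the two keys a1 = 10 and a2 = 20, a search value of 5 selects child 0 (the leftmost sub-tree), 15 selects child 1 (the middle sub-tree), and 25 selects child 2 (the rightmost sub-tree).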

Advantages of B-tree usage for databases:

A B-tree:

1. keeps keys in sorted order for sequential traversing
2. uses a hierarchical index to minimize the number of disk reads
3. uses partially full blocks to speed insertions and deletions
4. keeps the index balanced with a recursive algorithm

In addition, a B-tree minimizes waste by making sure the interior nodes are at least half full. A B-tree
can handle an arbitrary number of insertions and deletions.

Fig. 7.15 B-Tree of order 4

Check your Progress 1

Fill in the Blanks.
1. The ______ node in a tree is called the root node.
2. The ______ of a node is defined as the number of edges between the node and the root.
3. If every non-leaf node in a binary tree has nonempty left and right subtrees, the tree is
termed a _______.

Activity 1
1. Write a program to create a BST and traverse the same using preorder, inorder and
postorder.
2. Consider the below BST and perform following operations:
a. Insert node 11, 77, 23, 59, 9
b. Traverse the tree using preorder traversal method
c. Delete the node 56

3. Create an AVL tree using the following sequence of data:
19, 20, 29, 12, 36, 45, 91, 73, 79.

Summary
• A tree data structure can be defined recursively as a collection of nodes (starting at a root
node), where each node is a data structure consisting of a value, together with a list of
references to nodes (the "children"), with the constraints that no reference is duplicated, and
none points to the root.
• A binary tree consists of a finite set of nodes that is either empty, or consists of one specially
designated node called the root of the binary tree, and the elements of two disjoint binary
trees called the left subtree and right subtree of the root.
• A binary tree can be represented using an array or a linked list.
• Traversal is the process of visiting all the nodes of a tree, optionally printing their values. The
types of traversal are preorder, inorder, and postorder.
• In a BST, starting at any given node, the data in every node of its left sub tree must be less
than the data in the given node, and the data in every node of its right sub tree must be greater
than the data in the given node.
• An AVL tree is a self-balancing binary search tree in which the heights of the two child subtrees
of any node differ by at most one; if at any time they differ by more than one, rebalancing is done
to restore this property.

Keywords
• Node: A node is a structure which may contain a value, a condition, or represent a separate
data structure.
• Traversal: The process of visiting all the nodes of a tree, optionally printing their values.
• Self-balancing tree: Also called a height-balanced binary search tree; it automatically
keeps its height (maximal number of levels below the root) small in the face of arbitrary item
insertions and deletions.

Self-Assessment Questions
1. Explain tree and its terminologies.
2. State the difference between Binary tree and Binary search tree.
3. Explain the different types of tree traversal with example.
4. Write a short note on AVL tree.
5. What is B-Tree?

Answers to Check your Progress

Check your Progress 1

Fill in the Blanks.

1. The topmost node in a tree is called the root node.
2. The depth of a node is defined as the number of edges between the node and the root.
3. If every non-leaf node in a binary tree has nonempty left and right subtrees, the tree is termed
a strictly binary tree.

Suggested Reading
1. Balagurusamy, E. Object Oriented Programming with C++. New Delhi: Tata McGraw-Hill.
2. Rajaraman, V. Fundamentals of Computers. New Delhi: PHI Learning.
3. Thareja, Reema. Data Structures Using C. Oxford University Press.
4. Kernighan, Brian W., and Dennis Ritchie. The C Programming Language. Prentice Hall.
Unit 8
Searching Algorithms
Structure:

8.1 Introduction

8.2 Linear Search

8.3 Binary Search

Summary

Keywords

Self-Assessment Questions

Answer to Check Your Progress

Suggested Reading

Objectives

After going through this unit, you will be able to:

• Explain the concept of searching
• Discuss the types of searching techniques
• Discuss the complexity of the searching algorithms

8.1 INTRODUCTION
Searching is the algorithmic process of finding a particular item in a collection of items. A search
typically answers either True or False as to whether the item is present. On occasion, it may be
modified to return where the item is found. Computer systems are often used to store large amounts
of data from which individual records must be retrieved according to some search criterion. Thus the
efficient storage of data to facilitate fast searching is an important issue. In this unit, we shall
investigate the performance of some searching algorithms and discuss the complexity of the same.

In computer science, a search data structure is any data structure that allows the efficient retrieval of
specific items from a set of items, such as a specific record from a database. The simplest, most
general, and least efficient search structure is merely an unordered sequential list of all the items.
Locating the desired item in such a list, by the linear search method, inevitably requires a number of
operations proportional to the number n of items, in the worst case as well as in the average case.
Useful search data structures allow faster retrieval; however, they are limited to queries of some
specific kind. Moreover, since the cost of building such structures is at least proportional to n, they
only pay off if several queries are to be performed on the same database (or on a database that
changes little between queries).

Static search structures are designed for answering many queries on a fixed database; dynamic
structures also allow insertion, deletion, or modification of items between successive queries. In the
dynamic case, one must also consider the cost of fixing the search structure to account for the changes
in the database. The two common methods of searching are linear and binary searching. These two
searching algorithms are discussed in the subsequent topics.

8.2 LINEAR SEARCH


This is the most natural searching method. When data items are stored in a collection such as a list,
we say that they have a linear or sequential relationship. Each data item is stored in a position relative
to the others. These relative positions are the index values of the individual items. Since these index
values are ordered, it is possible for us to visit them in sequence. This process gives rise to our first
searching technique, the sequential search.

Figure 8.1 shows how this search works. Starting at the first item in the list, we simply move from item
to item, following the underlying sequential ordering until we either find what we are looking for or
run out of items. If we run out of items, we have discovered that the item we were searching for was
not present.

Fig. 8.1 Sequential search of a list of integers


The program for a sequential search in C is presented below.

#include <stdio.h>

// returns the index of x in arr[0..n-1], or -1 if x is not present
int linear_search(int arr[], int n, int x)
{
    int i;
    for (i = 0; i < n; i++)
        if (arr[i] == x)
            return i;
    return -1;
}

int main(void)
{
    int arr[10] = {54, 26, 93, 17, 77, 31, 44, 55, 20, 65};
    int x = 50; // item to be searched in the list arr
    int n = 10; // size of array
    int result = linear_search(arr, n, x);
    if (result == -1)
        printf("Element is not present in array\n");
    else
        printf("Element is present at index: %d\n", result);
    return 0;
}

This code describes a typical variant of linear search, where the result of the search is supposed to
be either the location of the list item where the desired value was found or an invalid location to
indicate that the desired element does not occur in the list.

Applications

Linear search is usually very simple to implement and is practical when the list has only a few elements
or when performing a single search in an unordered list. When many values have to be searched in
the same list, it often pays to pre-process the list in order to use a faster method. For example, one
may sort the list and use binary search or build any efficient search data structure from it. Should the
content of the list change frequently, repeated re-organization may be more trouble than it is worth.

As a result, even though other search algorithms (for instance, binary search) are faster than linear
search in theory, in practice it may not be worthwhile to use anything else on small to medium-sized
arrays (around 100 items or fewer). On larger arrays, it makes sense to use a faster search method only
if the data is large enough, because the initial time to prepare (sort) the data is comparable to the
time taken by many linear searches.

Complexity of Linear Search

The complexity of an algorithm is the amount of a resource, such as time, that the algorithm requires.
It is a measure of how ‘good’ the algorithm is at solving the problem. The complexity of a problem is
defined as the complexity of the best algorithm that solves it.

For searching, it makes sense to count the number of comparisons performed. Each comparison may
or may not discover the item we are looking for. In addition, we make another assumption here. The
list of items is not ordered in any way. The items have been placed randomly into the list. In other
words, the probability that the item we are looking for is in any particular position is exactly the same
for each position of the list.

If the item is not in the list, the only way to know it is to compare it against every item present. If there
are n items, then the sequential search requires n comparisons to discover that the item is not there.
In the case where the item is in the list, the analysis is not so straightforward. There are actually three
different scenarios that can occur. In the best case we will find the item in the first place we look, at
the beginning of the list. We will need only one comparison. In the worst case, we will not discover
the item until the very last comparison, the nth comparison.

What about the average case? On average, we will find the item about halfway into the list; that is,
we will compare against n/2 items. Recall, however, that as n gets large, the coefficients, no matter what
they are, become insignificant in our approximation, so the complexity of the sequential search is
O(n). Tables 1 and 2 below show the number of comparisons in each case for a sequential search of an
unordered and an ordered list.
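
The n/2 figure can be made precise with a short calculation. It relies on the assumption stated earlier that the item is present and equally likely to be in each of the n positions, so that finding it at position i costs i comparisons:

```latex
\[
  \text{average comparisons}
  = \frac{1}{n}\sum_{i=1}^{n} i
  = \frac{1}{n}\cdot\frac{n(n+1)}{2}
  = \frac{n+1}{2}
  \approx \frac{n}{2}
\]
```

For example, with n = 10 the average is 5.5 comparisons. Dropping the constant factor 1/2 leaves a cost that grows linearly with n, that is, O(n).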

8.3 BINARY SEARCH


All of the sequential search algorithms have the same problem; they walk over the entire list. Some of
our improvements work to minimize the cost of traversing the whole data set, but those
improvements only cover up what is really a problem with the algorithm. By thinking of the data in a
different way, we can make speed improvements that are much better than anything sequential
search can guarantee.

Consider a list in ascending sorted order. It would work to search from the beginning until an item is
found or the end is reached, but it makes more sense to remove as much of the working data set as
possible so that the item is found more quickly. If we started at the middle of the list we could
determine which half the item is in (because the list is sorted). This effectively divides the working
range in half with a single test. By repeating the procedure, the result is a highly efficient search
algorithm called binary search.

The Binary search or half-interval search algorithm finds the position of the specified input value
within an array, which is sorted by key value. In each step, the algorithm compares the search key
value with the key value of the middle element of the array. If the key value is matched, then a
matching element has been found and its index or position is returned.

Otherwise, if the search key is less than the middle element’s key, then the algorithm repeats its action
on the sub-array to the left of the middle element or, if the search key is greater, on the sub-array to
the right. If the remaining array to be searched is empty, then the key cannot be found in the array
and a message “not found” is displayed.

A binary search halves the number of items to check with each iteration, so locating an item (or
determining its absence) takes logarithmic time. A binary search is a dichotomic divide and conquer
search algorithm.

For example, a dictionary is a sorted list of word definitions. Given a word, one can find its definition.
A telephone book is a sorted list of people’s names, addresses and telephone numbers. Knowing
someone’s name allows one to quickly find their telephone number and address. If the list to be
searched contains more than a few items (a dozen, say), a binary search will require far fewer
comparisons than a linear search, but it imposes the requirement that the list be sorted.

Algorithm

Given an array A of n elements with values or records $A_0, A_1, A_2, \ldots, A_{n-1}$ sorted such that
$A_0 \le A_1 \le A_2 \le \cdots \le A_{n-1}$, and a target value T, the following subroutine uses binary
search to find the index of T in A.

1. Set L to 0 and R to n-1.
2. If L > R, the search terminates as unsuccessful.
3. Set m (the position of the middle element) to the floor of (L+R)/2, which is the greatest integer
less than or equal to (L+R)/2.
4. If $A_m < T$, set L to m+1 and go to step 2.
5. If $A_m > T$, set R to m-1 and go to step 2.
6. Now $A_m = T$, so the search is done; return the value of m.

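Iterative function of Binary Search (the sorted array a is assumed to be declared elsewhere, for
example as a global array):

int binarySearch(int low, int high, int key)
{
    while (low <= high)
    {
        int mid = (low + high) / 2;   // position of the middle element
        if (a[mid] < key)
        {
            low = mid + 1;            // key can only be in the right half
        }
        else if (a[mid] > key)
        {
            high = mid - 1;           // key can only be in the left half
        }
        else
        {
            return mid;               // a[mid] == key
        }
    }
    return -1; // key not found
}

The loop repeats this procedure until low > high. If at any iteration we get a[mid] = key, we return
the value of mid. This is the position of key in the array. If key is not present in the array, we return -1.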
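
The listing above is written with a while loop. The same steps can also be expressed recursively, as in the sketch below. The function name binary_search_recursive and the choice to pass the array as a parameter are assumptions made for this example.

```c
/* Recursive binary search on the sorted range a[low..high].
 * Returns the index of key, or -1 if key is not present. */
int binary_search_recursive(const int a[], int low, int high, int key)
{
    if (low > high)
        return -1;                     /* empty range: key not found */

    /* Same midpoint as (low + high) / 2 for valid indices,
     * written so that the addition cannot overflow. */
    int mid = low + (high - low) / 2;

    if (a[mid] < key)
        return binary_search_recursive(a, mid + 1, high, key); /* right half */
    if (a[mid] > key)
        return binary_search_recursive(a, low, mid - 1, key);  /* left half */
    return mid;                        /* a[mid] == key */
}
```

For the sorted array 10, 25, 36, 48, 62, 100, a call with low = 0, high = 5 and key = 62 first inspects index 2 (value 36), then index 4 (value 62), and returns 4.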

Limitations of the Binary Search Algorithm

The algorithm requires two conditions:

(1) The list must be sorted.

(2) One must have direct access to the middle element in any sublist.

Complexity of Binary Search

The complexity of binary search in worst and average cases is O(log n) and it is O(1) in best case.

Check your Progress 1

Fill in the Blanks.


1. ______________ is the algorithmic process of finding a particular item.
2. A binary search or half-interval search algorithm finds the ______________ of a specified
input value.
3. Searching a _______________ collection is a common task.
4. The complexity of binary search in worst and average cases is ______________.

State True or False.


1. In linear search, each data item is stored in a position relative to the others.
2. Linear search can also be described as a non-recursive algorithm.
3. The complexity of an algorithm is the amount of a resource, such as time, that the algorithm
requires.
4. Linear searches require the collection to be sorted.

Activity 1
1. Consider an array of 10 elements: 23, 52, 9, 34, 77, 18, 10, 42, 55, 29
2. Write the pseudocode to find the position of 10 and display it as “The position of element
10 is _____” (Hint: Use Sequential search method.)
3. Using sequential search method, write a program to create an array to store any 5 numbers
and display the position of all those numbers, which are divisible by 5.
4. Suppose you are doing a sequential search of the list [15, 18, 2, 19, 18, 0, 8, 14, 19, 14]. How
many comparisons would you need to do in order to find the key 18?
5. Store the following elements in an array: 10, 25, 36, 48, 62, 100. Using Binary search
method, write the pseudocode to find the position of 62.

Summary
• Searching is a process of finding the position of an element in a list.
• Linear search is the process of traversing the entire list from beginning to end.
• Binary search method is the process of searching for the specified item in a sorted list.
Keywords
• Floor: Round a number down to the nearest integer
• Complexity: Complexity of an algorithm is a measure of the amount of time and/or space
required by an algorithm for an input of a given size (n).

Self-Assessment Questions
1. Explain searching.
2. Discuss the complexities of linear and binary search algorithms with examples.
3. Can binary search be performed on a non-sorted list? Illustrate your answer.

Answers to Check your Progress


Check your Progress 1

Fill in the Blanks.

1. Searching is the algorithmic process of finding a particular item.


2. A binary search or half-interval search algorithm finds the position of a specified input value.
3. Searching a sorted collection is a common task.
4. The complexity of binary search in worst and average cases is O(log n).

State True or False.

1. True
2. False
3. True
4. False

Suggested Reading
1. Anany Levitin. Introduction to the Design and Analysis of Algorithms.
2. Karumanchi, Narsimha. Data Structures and Algorithms Made Easy.
3. Data Structures Using C by Reema Thareja, Oxford University Press.
4. M. A. Weiss, Data Structures and Algorithm Analysis in C, Pearson Education Asia.
5. A. V. Aho, J. E. Hopcroft and J. D. Ullman, Data Structures and Algorithms.
6. Pat Morin, Open Data Structures (in C++) - Edition 0.1G.
Unit 9
Sorting Algorithms
Structure:

9.1 Introduction

9.2 Types of Sorting Algorithms

9.3 Selection Sort

9.4 Bubble Sort

9.5 Merge-Sort

9.6 Quicksort

9.7 Counting Sort

9.8 Radix-Sort

Summary

Keywords

Self-Assessment Questions

Answers to Check your Progress

Suggested Reading

Objectives

After going through this unit, you will be able to:

• Explain the concept of sorting and identify its need
• Discuss the different sorting algorithms
• Analyse the usage of these sorting algorithms in different scenarios

9.1 INTRODUCTION
Sorting is the technique of arranging elements or records in a particular order or sequence. Numerical
data is most often arranged in ascending or descending order, and character data in alphabetical
order. In ascending order, the elements are arranged in increasing order; in descending order, they
are arranged in decreasing order. Sorting is of major importance in computer science because it
speeds up the process of searching for an element or a record.

Sorting enables faster and more efficient searching in a database. The reverse of sorting is shuffling,
which means arranging the elements in a random or unordered way. The advantage of sorting is that it
arranges the data in a meaningful order, but it can also be a time-consuming process.

Let A be a list of N elements $A_1, A_2, A_3, \ldots, A_N$ in memory. Sorting of A means arranging the
elements of A numerically or alphabetically so that they are either in ascending or descending order, i.e.,

$A_1 \le A_2 \le A_3 \le \cdots \le A_N$ or $A_1 \ge A_2 \ge A_3 \ge \cdots \ge A_N$

In this unit, we are going to discuss algorithms for sorting a set of n items.

9.2 TYPES OF SORTING ALGORITHMS


To perform sorting, the computer needs to follow an algorithm, i.e., a sequence of predefined steps.
A sorting algorithm is used to arrange a group of unordered elements in a particular order. Different
sorting algorithms have different efficiency and performance in terms of time taken and the space
occupied.

There are many sorting algorithms, and the selection of a particular algorithm depends on the problem
to be solved and on the complexity involved. Sorting can be comparison based or non-comparison
based. It can also be internal or external. In internal sorting, all the data to be sorted is held in
primary memory. In external sorting, the data to be sorted is held in secondary memory, and only a
portion of it is brought into primary memory at a time. Apart from the problem to be solved, there
are other factors that should be kept in mind while selecting an algorithm, such as:

• Type of data to be sorted
• Performance expected
• Length and complexity of code
• Memory requirements

The type of data to be sorted indicates whether the sorting is applied to characters, words, numbers,
etc. Different sorting algorithms behave differently depending on the type of data. The performance
criterion also varies between scenarios: some problems require the algorithm to run in the minimum
amount of time, while others require low memory usage. So the algorithm selected also depends on the
performance criterion of concern. Although some algorithms give better performance, their code is at
times more complex. Therefore, the length and complexity of the code vary from algorithm to algorithm.

There are commonly two types of sorting algorithms:

• Internal sorting
• External sorting

Internal sorting is applied in situations where the data to be sorted is small enough to fit into the
computer's main memory, so that no auxiliary storage is required. Examples of internal sorting are
insertion sort, bubble sort and selection sort.

External sorting is applied in situations where the data to be sorted is too large to fit into main
memory. The source data and the final result are stored on hard disks or tapes, i.e., it requires
auxiliary storage. The data is brought into memory for processing a portion at a time and can be
accessed sequentially. Merge sort is one example of a sorting technique that is used for external
sorting. Sorting with tapes is very similar to sorting with disks, the only difference being that some
amount of time is spent in sequentially searching the data.

When the data resides in internal memory, the data access time is less than the computation time, so
the aim is to reduce the number of CPU operations. When the data resides in external memory, the data
access time is much greater than the computation time, so the aim is to reduce the number of disk
accesses.

9.3 SELECTION SORT


Selection sort works by repeatedly selecting an element and placing it in its correct position in the
sequence. For example, to arrange the elements in ascending order, the smallest element is selected
from the list and placed at the first position. Then the next smallest element is selected from the
remaining list and placed at the second position, and so on till the list is sorted.

As seen in the above example, in the first iteration, the smallest element in the entire list is 11. This
element is swapped with the element at the first position since we are arranging the list in ascending
order. Therefore, the first iteration ensures that the first element is at its correct position in the
list. In the second iteration, the smallest element is found in the remaining list, leaving out the first
element since it is already at its proper position. The smallest element in the remaining list is 12,
which is swapped with the element in the second position, hence placing 12 at its correct position in
the list. This process is continued till the last element in the list. The underlined elements are the
ones that are swapped. The number of iterations is directly proportional to the number of elements in
the list. In the above example, there are 10 elements and hence 9 iterations are required to get a
sorted list. This method also does not require additional storage.

If the size of the array is n, then the number of comparisons needed to find the smallest element in
each sub-array is (n-1)+(n-2)+....+2+1, which is equal to n(n-1)/2 comparisons.

At most one swap is performed in each iteration. So if the number of elements is n, then at most (n-1)
swaps are required in all.

Algorithm

1. for I=1 to N-1
2. min=A[I], Loc=I
3. for K=I+1 to N
4. if (min>A[K])
5. min=A[K], Loc=K
6. Swap (A[Loc], A[I]) (after the inner loop ends)
7. Exit (after the outer loop ends)
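
The steps above can be sketched in C as follows. This is a minimal illustration using 0-based array indices; the function name selection_sort is an assumption made for this example.

```c
/* Selection sort: arranges a[0..n-1] in ascending order, in place. */
void selection_sort(int a[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int loc = i;                     /* index of the smallest element seen so far */
        for (int k = i + 1; k < n; k++) {
            if (a[k] < a[loc])
                loc = k;
        }
        if (loc != i) {                  /* at most one swap per iteration */
            int t = a[i];
            a[i] = a[loc];
            a[loc] = t;
        }
    }
}
```

The outer loop runs n-1 times, and iteration i makes n-1-i comparisons, which gives the n(n-1)/2 total discussed above.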

9.4 BUBBLE SORT


Bubble sort works by repeatedly swapping adjacent elements that are out of order, till the entire list
is in the required order.

As can be seen from the above example, each iteration requires a set of comparisons to be made.
After the first iteration, the largest number reaches its correct position in the list and hence
need not be compared in the next iteration. So the number of comparisons keeps on decreasing with
each iteration.

The total number of iterations is one less than the total number of elements. So for the above example,
the total number of iterations is 9, which is the same as selection sort. The number of comparisons,
n(n-1)/2, is also the same as in selection sort, but bubble sort usually performs many more swaps.

Number of comparisons=9+8+7+6+5+4+3+2+1=45

Number of iterations=10-1=9

Algorithm

1. for I=1 to N-1 (for pass)
2. for K=1 to N-I (for comparison)
3. if A[K]>A[K+1]
4. swap (A[K], A[K+1])
5. End if
6. End For
7. End For
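
The algorithm above can be sketched in C as shown below. The function name bubble_sort and the returned comparison count are assumptions added for illustration, so that the count can be checked against the arithmetic given earlier.

```c
/* Bubble sort: arranges a[0..n-1] in ascending order, in place.
 * Returns the number of comparisons made (for illustration only). */
int bubble_sort(int a[], int n)
{
    int comparisons = 0;
    for (int pass = 1; pass <= n - 1; pass++) {
        /* After each pass the largest remaining element is in place,
         * so one fewer comparison is needed in the next pass. */
        for (int k = 0; k < n - pass; k++) {
            comparisons++;
            if (a[k] > a[k + 1]) {
                int t = a[k];
                a[k] = a[k + 1];
                a[k + 1] = t;
            }
        }
    }
    return comparisons;
}
```

For an array of 10 elements this version always makes 9+8+...+1 = 45 comparisons, whatever the initial order.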

9.5 MERGE-SORT
The merge-sort algorithm is a classic example of recursive divide and conquer. In this approach, a
larger problem is divided into smaller sub-problems and the solution is applied to these. Initially,
the entire list is divided into two smaller sub-lists, each of which is in turn divided into two smaller
sub-lists, and so on till a sub-list contains at most two numbers. The two numbers are sorted. The
smaller sub-lists are then combined to get sorted sub-lists of four numbers. This process is repeated
till the entire list is combined. Merge sort is commonly used for external sorting. The entire process
of dividing the list and combining the smaller sub-lists is recursive.

If the length of a is at most 1, then a is already sorted, so we do nothing. Otherwise, we split a into
two halves, $a_0$ = a[0], ..., a[n/2-1] and $a_1$ = a[n/2], ..., a[n-1]. We recursively sort $a_0$ and
$a_1$, and then we merge (the now sorted) $a_0$ and $a_1$ to get our fully sorted array a:

Algorithm and Example

MergeSort(array A, int p, int r)
{
    if (p < r)
    {                                   // we have at least 2 items
        q = (p + r)/2
        MergeSort(A, p, q)              // sort A[p..q]
        MergeSort(A, q+1, r)            // sort A[q+1..r]
        Merge(A, p, q, r)               // merge everything together
    }
}

Merge(array A, int p, int q, int r)     // merges A[p..q] with A[q+1..r]
{
    array B[p..r]
    i = k = p                           // initialize pointers
    j = q+1
    while (i <= q and j <= r)           // while both subarrays are nonempty
        if (A[i] <= A[j])
            B[k++] = A[i++]             // copy from left subarray
        else
            B[k++] = A[j++]             // copy from right subarray
    while (i <= q)
        B[k++] = A[i++]                 // copy any leftover to B
    while (j <= r)
        B[k++] = A[j++]
    for i = p to r do
        A[i] = B[i]                     // copy B back to A
}

An example is shown in Fig. 9.1.

Compared to sorting, merging the two sorted arrays a0 and a1 is fairly easy. We add elements to a
one at a time. If a0 or a1 is empty, then we add the next elements from the other (non-empty) array.
Otherwise, we take the minimum of the next element in a0 and the next element in a1 and add it to
a.
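
The pseudocode and the merging step described above can be sketched in C as follows. This is one possible arrangement, not the only one: the names merge_sort, merge_sort_range and merge, and the decision to allocate a single temporary buffer once, are assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Merges the sorted ranges a[p..q] and a[q+1..r] using tmp as scratch space. */
static void merge(int a[], int p, int q, int r, int tmp[])
{
    int i = p, j = q + 1, k = p;
    while (i <= q && j <= r)                 /* both sub-arrays are non-empty */
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= q)                           /* leftovers from the left half */
        tmp[k++] = a[i++];
    while (j <= r)                           /* leftovers from the right half */
        tmp[k++] = a[j++];
    memcpy(&a[p], &tmp[p], (size_t)(r - p + 1) * sizeof a[0]);
}

/* Recursively sorts a[p..r]. */
static void merge_sort_range(int a[], int p, int r, int tmp[])
{
    if (p < r) {                             /* at least 2 items */
        int q = p + (r - p) / 2;
        merge_sort_range(a, p, q, tmp);
        merge_sort_range(a, q + 1, r, tmp);
        merge(a, p, q, r, tmp);
    }
}

/* Sorts a[0..n-1]; returns 0 on success, -1 if memory allocation fails. */
int merge_sort(int a[], int n)
{
    if (n < 2)
        return 0;                            /* already sorted */
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    merge_sort_range(a, 0, n - 1, tmp);
    free(tmp);
    return 0;
}
```

Using `<=` when comparing a[i] and a[j] takes equal elements from the left half first, which keeps the sort stable.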

Fig. 9.1 Merge sort


Notice that this algorithm performs at most n - 1 comparisons before running out of elements in one
of $a_0$ or $a_1$. To understand the running time of merge-sort, it is easiest to think of it in terms
of its recursion tree. Suppose for now that n is a power of two, so that $n = 2^{\log n}$, and log n is an
integer. Refer to Fig. 9.2. Merge sort turns the problem of sorting n elements into two problems, each
of sorting n/2 elements.

Fig. 9.2 The merge-sort recursion tree

These two sub-problems are then turned into two problems each, for a total of four sub-problems,
each of size n/4. These four sub-problems become eight sub-problems, each of size n/8, and so on. At
the bottom of this process, n/2 sub-problems, each of size two, are converted into n problems, each
of size one. For each sub-problem of size $n/2^i$, the time spent merging and copying data is
$O(n/2^i)$. Since there are $2^i$ sub-problems of size $n/2^i$, the total time spent working on problems
of size $n/2^i$, not counting recursive calls, is

$2^i \times O(n/2^i) = O(n)$

Therefore, the total amount of time taken by merge-sort is

$\sum_{i=0}^{\log n} O(n) = O(n \log n)$

9.6 QUICKSORT
The quicksort algorithm is another classic divide and conquer algorithm. Unlike merge-sort, which
does merging after solving the two sub-problems, quicksort does all of its work upfront. Quicksort is
simple to describe: Pick a random pivot element, x, from a; partition a into the set of elements less
than x, the set of elements equal to x, and the set of elements greater than x; and, finally, recursively
sort the first and third sets in this partition. An example is shown in Fig. 9.3.
Fig. 9.3 Quicksort

Implementation of Quick Sort Algorithm in C (this version uses the last element of each sub-array as
the pivot instead of a randomly chosen one):

#include <stdio.h>

// to swap two numbers
void swap(int* a, int* b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

// places the pivot at its final position and returns that position
int partition(int arr[], int low, int high)
{
    int pivot = arr[high];  // selecting last element as pivot
    int i = (low - 1);      // index of smaller element

    for (int j = low; j <= high - 1; j++)
    {
        // If the current element is smaller than or equal to pivot
        if (arr[j] <= pivot)
        {
            i++;            // increment index of smaller element
            swap(&arr[i], &arr[j]);
        }
    }
    swap(&arr[i + 1], &arr[high]);
    return (i + 1);
}

/*
a[] is the array, p is starting index, that is 0,
and r is the last index of array.
*/
void quicksort(int a[], int p, int r)
{
    if (p < r)
    {
        int q;
        q = partition(a, p, r);
        quicksort(a, p, q - 1);   // sort elements before the pivot
        quicksort(a, q + 1, r);   // sort elements after the pivot
    }
}

// function to print the array
void printArray(int a[], int size)
{
    int i;
    for (i = 0; i < size; i++)
    {
        printf("%d ", a[i]);
    }
    printf("\n");
}

int main()
{
    int arr[] = {9, 7, 5, 11, 12, 2, 14, 3, 10, 6};
    int n = sizeof(arr)/sizeof(arr[0]);

    // call quicksort function
    quicksort(arr, 0, n - 1);

    printf("Sorted array: \n");
    printArray(arr, n);
    return 0;
}

Quicksort is very closely related to the random binary search trees. In fact, if the input to quicksort
consists of n distinct elements, then the quicksort recursion tree is a random binary search tree. To
see this, recall that when constructing a random binary search tree the first thing we do is pick a
random element x and make it the root of the tree. After this, every element will eventually be
compared to x, with smaller elements going into the left subtree and larger elements into the right.

In quicksort, we select a random element x and immediately compare everything to x, putting the
smaller elements at the beginning of the array and larger elements at the end of the array. Quicksort
then recursively sorts the beginning of the array and the end of the array, while the random binary
search tree recursively inserts smaller elements in the left subtree of the root and larger elements in
the right subtree of the root.

9.7 COUNTING SORT

Suppose we have an input array a consisting of n integers, each in the range 0,…,k-1. The counting-
sort algorithm sorts a using an auxiliary array c of counters. It outputs a sorted version of a as an
auxiliary array b. The idea behind counting-sort is simple: For each i ∈ {0,…,k-1}, count the number of
occurrences of i in a and store this in c[i]. Now, after sorting, the output will look like c[0] occurrences
of 0, followed by c[1] occurrences of 1, followed by c[2] occurrences of 2, ..., followed by c[k-1]
occurrences of k-1. The code that does this is very slick, and its execution is illustrated in Fig. 9.4:

Algorithm

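#include <stdlib.h>

// sorts a[0..n-1], where every element is in the range 0..k-1
void countingSort(int a[], int n, int k)
{
    int *c = calloc(k, sizeof(int));    // counters, all initially zero
    int *b = malloc(n * sizeof(int));   // output array

    for (int i = 0; i < n; i++)
        c[a[i]]++;                      // count occurrences of each value
    for (int i = 1; i < k; i++)
        c[i] += c[i-1];                 // running sum of the counters
    for (int i = n-1; i >= 0; i--)
        b[--c[a[i]]] = a[i];            // place elements, scanning a backwards
    for (int i = 0; i < n; i++)
        a[i] = b[i];                    // copy the sorted output back into a

    free(b);
    free(c);
}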

The first for loop in this code sets each counter c[i] so that it counts the number of occurrences of i in
a. By using the values of a as indices, these counters can all be computed in O(n) time with a single for
loop. At this point, we could use c to fill in the output array b directly. However, this would not work
if the elements of a have associated data.

Therefore we spend a little extra effort to copy the elements of a into b. The next for loop, which takes
O(k) time, computes a running-sum of the counters so that c[i] becomes the number of elements in a
that are less than or equal to i. In particular, for every i ∈ {0,…,k-1}, the output array, b, will have

b[c[i -1]] = b[c[i -1] + 1] = … = b[c[i] -1] = i

Finally, the algorithm scans a backwards to place its elements, in order, into an output array b. When
scanning, the element a[i] = j is placed at location b[c[j] - 1] and the value c[j] is decremented.
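
The remark about associated data can be made concrete with a small sketch. Here each element is a record with an integer key and a one-character tag; the Record type, its field names and the function name counting_sort_records are hypothetical and exist only for this illustration. Because the input is scanned backwards, records with equal keys keep their original relative order.

```c
#include <stdlib.h>

/* Hypothetical record type: key is in the range 0..k-1, tag is associated data. */
typedef struct {
    int key;
    char tag;
} Record;

/* Stable counting sort of a[0..n-1] by key.
 * Returns 0 on success, -1 if memory allocation fails. */
int counting_sort_records(Record a[], int n, int k)
{
    if (n <= 0 || k <= 0)
        return 0;                              /* nothing to sort */

    int *c = calloc((size_t)k, sizeof *c);     /* counters start at zero */
    Record *b = malloc((size_t)n * sizeof *b); /* output array */
    if (c == NULL || b == NULL) {
        free(c);
        free(b);
        return -1;
    }

    for (int i = 0; i < n; i++)
        c[a[i].key]++;                         /* count each key */
    for (int i = 1; i < k; i++)
        c[i] += c[i - 1];                      /* c[i] = number of keys <= i */
    for (int i = n - 1; i >= 0; i--)
        b[--c[a[i].key]] = a[i];               /* whole record moves with its key */
    for (int i = 0; i < n; i++)
        a[i] = b[i];

    free(b);
    free(c);
    return 0;
}
```

Sorting the records (2,a), (0,b), (2,c), (1,d), (0,e) with k = 3 gives (0,b), (0,e), (1,d), (2,a), (2,c): the two records with key 0 and the two with key 2 stay in their input order.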

Fig. 9.4 Counting sort


9.8 RADIX-SORT
Counting-sort is very efficient for sorting an array of integers when the length, n, of the array is not
much smaller than the maximum value, k-1, that appears in the array. The radix-sort algorithm, which
we now describe, uses several passes of counting-sort to allow for a much greater range of maximum
values.

Radix-sort sorts w-bit integers by using w/d passes of counting-sort to sort these integers d bits at a
time. More precisely, radix sort first sorts the integers by their least significant d bits, then their next
significant d bits, and so on until, in the last pass, the integers are sorted by their most significant d
bits.
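
The bit-based description above can be sketched in C as follows, assuming w = 32 and d = 8, which gives w/d = 4 passes of counting-sort over 256 possible digit values. The constants W and D and the function name radix_sort_bits are assumptions chosen for this example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { W = 32, D = 8 };   /* assumed word size and digit size, in bits */

/* Sorts n unsigned 32-bit integers, D bits at a time.
 * Returns 0 on success, -1 if memory allocation fails. */
int radix_sort_bits(uint32_t a[], size_t n)
{
    if (n < 2)
        return 0;                                  /* already sorted */
    uint32_t *b = malloc(n * sizeof *b);
    if (b == NULL)
        return -1;

    const uint32_t mask = (1u << D) - 1;           /* extracts one D-bit digit */

    for (int p = 0; p < W / D; p++) {              /* least significant digit first */
        size_t c[1 << D];
        int shift = p * D;
        memset(c, 0, sizeof c);

        for (size_t i = 0; i < n; i++)
            c[(a[i] >> shift) & mask]++;           /* count each digit value */
        for (size_t i = 1; i < (1u << D); i++)
            c[i] += c[i - 1];                      /* running sum */
        for (size_t i = n; i-- > 0; )
            b[--c[(a[i] >> shift) & mask]] = a[i]; /* stable placement */
        memcpy(a, b, n * sizeof *a);
    }
    free(b);
    return 0;
}
```

Each pass must be stable, which is why counting-sort with a backwards scan is used: the order established by the earlier, less significant digits is preserved whenever two integers agree on the current digit.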

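Function of radix sort (this version works on non-negative integers and processes one decimal digit
per pass):

void radix_sort(int *a, int n)
{
    int i, m = 0, exp = 1;
    int b[n];                           // temporary output array; assumes n >= 1

    for (i = 0; i < n; i++)             // find the maximum value m
        if (a[i] > m)
            m = a[i];

    while (m / exp > 0)                 // one pass per decimal digit of m
    {
        int box[10] = { 0 };            // counters for the digits 0..9

        for (i = 0; i < n; i++)
            box[a[i] / exp % 10]++;
        for (i = 1; i < 10; i++)
            box[i] += box[i - 1];
        for (i = n - 1; i >= 0; i--)
            b[--box[a[i] / exp % 10]] = a[i];
        for (i = 0; i < n; i++)
            a[i] = b[i];

        exp *= 10;
    }
}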
An example is shown in Fig. 9.5.

Fig. 9.5 Radix sort

Check your Progress 1


State True or False.
1. Merge sort has a linear running time complexity.
2. The average case running time complexity of quick sort is O(n log n).
Multiple Choice Single Response
1. Which algorithm uses the divide, conquer, and combine algorithmic paradigm?
(a) Selection sort (b) Insertion sort
(c) Merge sort (d) Radix sort
2. In which sorting, consecutive adjacent pairs of elements in the array are compared with
each other?
(a) Bubble sort (b) Selection sort
(c) Merge sort (d) Radix sort
3. Which of the following techniques deals with sorting the data stored in the computer’s
memory?
(a) Insertion sort (b) Internal sort
(c) External sort (d) Radix sort

Activity 1
1. Sort the following sequence of numbers using quick sort and show the iterations of the
sorting process.
42, 34, 75, 23, 21, 18, 90, 67, 78
2. Write a program to implement a sort technique that works on the principle of divide and
conquer strategy.
3. Write a program to sort an array of names using the bucket sort.

Summary
• Sorting is a very important process and has many applications in various fields. Sorting helps
in finding a record or data easily in a set of data.
• The selection of a sorting algorithm is dependent on various factors including the type of data,
etc.
• Internal sorting deals with sorting the data stored in the memory, whereas external sorting
deals with sorting the data stored in files.
• Merge sort and quick sort both work by using a divide-and-conquer strategy.

Keywords
• Sorting: Arranging elements in a particular order.
• Internal sorting: Data sorting process that takes place entirely within the main memory of a
computer.
• External sorting: Class of sorting algorithms that can handle massive amounts of data.

Self-Assessment Questions
1. What is the importance of sorting?
2. Explain the difference between merge sort and quick sort. Which one is more efficient?
3. What is sorting? Explain the types of sorting algorithms.

Answers to Check your Progress


Check your Progress 1

State True or False.

1. False
2. True

Multiple Choice Single Response.

1. Which algorithm uses the divide, conquer, and combine algorithmic paradigm?

(c) Merge sort

2. In which sorting, consecutive adjacent pairs of elements in the array are compared with each
other?

(a) Bubble sort

3. Which of the following techniques deals with sorting the data stored in the computer’s
memory?

(b) Internal sort

Suggested Reading
1. Data Structures Using C by Reema Thareja, Oxford University Press.
2. M. A. Weiss, Data Structures and Algorithm Analysis in C, Pearson Education Asia.
3. A. V. Aho, J. E. Hopcroft and J. D. Ullman, Data Structures and Algorithms.
4. Pat Morin, Open Data Structures (in C++) - Edition 0.1G.
Unit 10
Graphs
Structure:

10.1 Introduction
10.2 Definition of Graph
10.3 Adjacency Matrix: Representing a Graph by a Matrix
10.4 Adjacency Lists: Representing a Graph by Lists
10.5 Graph Traversal
10.5.1 Breadth-First Search
10.5.2 Depth-First Search
10.6 Shortest Path Algorithms
10.6.1 Minimum Spanning Trees
10.6.2 Dijkstra's Algorithm
Summary
Key Words
Self-Assessment Questions
Answers to Check your Progress
Suggested Reading

Objectives

After going through this unit, you will be able to:

• Define graph
• Implement a graph using adjacency matrix and lists
• Traverse a graph using BFS and DFS traversal

10.1 INTRODUCTION
A graph is an abstract data structure that is used to implement the mathematical concept of graphs.
It is basically a collection of vertices (also called nodes) and edges that connect these vertices. A graph
is often viewed as a generalisation of the tree structure, where instead of having a purely parent-to-
child relationship between tree nodes, any kind of complex relationship can exist. In this unit, we
study two representations of graphs and basic algorithms that use these representations.

10.2 DEFINITION OF GRAPH


Mathematically, a (directed) graph is a pair G = (V, E) where V is a set of vertices and E is a set of
ordered pairs of vertices called edges. An edge (i, j) is directed from i to j; i is called the source of the
edge and j is called the target. A path in G is a sequence of vertices $v_0, \ldots, v_k$ such that, for every
$i \in \{1, \ldots, k\}$, the edge $(v_{i-1}, v_i)$ is in E. A path $v_0, \ldots, v_k$ is a cycle if, additionally, the
edge $(v_k, v_0)$ is in E. A path (or cycle) is simple if all of its vertices are unique. If there is a path from
some vertex $v_i$ to some vertex $v_j$ then we say that $v_j$ is reachable from $v_i$.

In an undirected graph, edges do not have any direction associated with them. That is, if an edge is
drawn between nodes A and B, then the nodes can be traversed from A to B as well as from B to A.
An example of a graph is shown in Figure 10.1.

(a) (b)

Fig. 10.1 (a) Directed graph with twelve vertices, (b) Undirected graph with 5 vertices

Graph Terminology:

• Adjacent nodes or neighbours: For every edge, e = (u, v) that connects nodes u and v, the
nodes u and v are the end-points and are said to be the adjacent nodes or neighbours.
• Out-degree of a node: The out-degree of a node u, written as outdeg(u), is the number of
edges that originate at u.
• In-degree of a node: The in-degree of a node u, written as indeg(u), is the number of edges
that terminate at u.
• Degree of a node: Degree of a node u, deg(u), is the total number of edges containing the
node u. If deg(u) = 0, it means that u does not belong to any edge and such a node is known
as an isolated node. deg(u) = indeg(u) + outdeg(u).
• Regular graph: It is a graph where each vertex has the same number of neighbours. That is,
every node has the same degree. A regular graph with vertices of degree k is called a k–regular
graph or a regular graph of degree k.
• Path: A path P, written as P = {$v_0, v_1, v_2, \ldots, v_n$}, of length n from a node u to v is defined
as a sequence of (n+1) nodes. Here, u = $v_0$, v = $v_n$ and $v_{i-1}$ is adjacent to $v_i$ for i = 1, 2, 3, ..., n.
• Closed path: A path P is known as a closed path if it has the same end-points. That is,
if $v_0 = v_n$.
• Simple path: A path P is known as a simple path if all the nodes in the path are distinct with
an exception that $v_0$ may be equal to $v_n$. If $v_0 = v_n$, then the path is called a closed simple path.
• Cycle: A path in which the first and the last vertices are same. A simple cycle has no repeated
edges or vertices (except the first and last vertices).
• Connected graph: A graph is said to be connected if for any two vertices (u, v) in V there is a
path from u to v. That is to say that there are no isolated nodes in a connected graph. A
connected graph that does not have any cycle is called a tree. Therefore, a tree is treated as a
special graph.
• Complete graph: A graph G is said to be complete if all its nodes are fully connected. That is,
there is an edge between every pair of distinct nodes in the graph. A complete graph has
n(n–1)/2 edges, where n is the number of nodes in G.
• Clique: In an undirected graph G = (V, E), clique is a subset of the vertex set C ⊆ V, such that
for every two vertices in C, there is an edge that connects two vertices.
• Labelled graph or weighted graph: A graph is said to be labelled if every edge in the graph is
assigned some data. In a weighted graph, the edges of the graph are assigned some weight or
length. The weight of an edge denoted by w(e) is a positive value which indicates the cost of
traversing the edge.
• Multiple edges: Distinct edges which connect the same end-points are called multiple edges.
That is, e = (u, v) and e' = (u, v) are known as multiple edges of G.
• Loop: An edge that has identical end-points is called a loop. That is, e = (u, u).
• Multi-graph: A graph with multiple edges and/or loops is called a multi-graph.
• Size of a graph: The size of a graph is the total number of edges in it.
• Tree: It is an acyclic connected graph.
Complete graph
Due to their ability to model so many phenomena, graphs have an enormous number of applications.
There are many obvious examples. Computer networks can be modelled as graphs, with vertices
corresponding to computers and edges corresponding to (directed) communication links between
those computers. City streets can be modelled as graphs, with vertices representing intersections and
edges representing streets joining consecutive intersections.

Less obvious examples occur as soon as we realize that graphs can model any pairwise relationships
within a set. For example, in a university setting we might have a timetable conflict graph whose
vertices represent courses offered in the university and in which the edge (i,j) is present if and only if
there is at least one student that is taking both class i and class j. Thus, an edge indicates that the exam
for class i should not be scheduled at the same time as the exam for class j.

Throughout this section, we will use n to denote the number of vertices of G and m to denote the
number of edges of G. That is, n = |V| and m = |E|. Furthermore, we will assume that V = {0,….,n-1}.
Any other data that we would like to associate with the elements of V can be stored in an array of
length n.

Some typical operations performed on graphs are:

• addEdge(i,j): Add the edge (i,j) to E.
• removeEdge(i,j): Remove the edge (i,j) from E.
• hasEdge(i,j): Check if the edge (i,j) ∈ E.
• outEdges(i): Return a List of all integers j such that (i,j) ∈ E.
• inEdges(i): Return a List of all integers j such that (j,i) ∈ E.

Note that these operations are not terribly difficult to implement efficiently. For example, the first
three operations can be implemented directly by keeping the edge set E in a set data structure such as
a hash table. The last two operations can be implemented in constant time by storing, for each vertex,
a list of its adjacent vertices.

However, different applications of graphs have different performance requirements for these
operations and, ideally, we can use the simplest implementation that satisfies all the application’s
requirements. For this reason, we discuss two broad categories of graph representations.
10.3 ADJACENCY MATRIX: REPRESENTING A GRAPH BY A MATRIX

An adjacency matrix is a way of representing an n-vertex graph G = (V, E) by an n x n matrix, a, whose
entries are boolean values. The matrix entry a[i][j] is defined as:

a[i][j] = true if (i,j) ∈ E, and false otherwise.

The adjacency matrix for the graph in Fig. 10.1(a) is shown in Fig. 10.2. In this representation, the
operations addEdge(i,j), removeEdge(i,j), and hasEdge(i,j) just involve setting or reading the matrix
entry a[i][j]:

void addEdge(int i, int j)
{
    a[i][j] = true;
}
void removeEdge(int i, int j)
{
    a[i][j] = false;
}
bool hasEdge(int i, int j)
{
    return a[i][j];
}

Fig. 10.2 Graphs and their representation using Adjacency Matrix

These operations clearly take constant time per operation. Where the adjacency matrix performs
poorly is with the outEdges(i) and inEdges(i) operations. To implement these, we must scan all n
entries in the corresponding row or column of a and gather up all the indices, j, where a[i][j],
respectively a[j][i], is true.

void outEdges(int i, List &edges)
{
    for (int j = 0; j < n; j++)
        if (a[i][j])
            edges.add(j);
}
void inEdges(int i, List &edges)
{
    for (int j = 0; j < n; j++)
        if (a[j][i])
            edges.add(j);
}

These operations clearly take O(n) time per operation. Another drawback of the adjacency matrix
representation is that it is large. It stores an n x n boolean matrix, so it requires at least n^2 bits of
memory. The implementation here uses a matrix of bool values so it actually uses on the order of n^2
bytes of memory. A more careful implementation, which packs w boolean values into each word of
memory, could reduce this space usage to O(n^2/w) words of memory.
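
The word-packing idea mentioned above can be sketched as follows. This is only an illustration, assuming a word size of w = 64 bits; the type name PackedAdjacencyMatrix and its members are hypothetical and do not come from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-packed adjacency matrix: 64 boolean entries per word.
struct PackedAdjacencyMatrix {
    std::size_t n;                    // number of vertices
    std::size_t wordsPerRow;          // ceil(n / 64) words hold one row
    std::vector<std::uint64_t> bits;  // n * wordsPerRow words in total

    explicit PackedAdjacencyMatrix(std::size_t vertices)
        : n(vertices),
          wordsPerRow((vertices + 63) / 64),
          bits(vertices * ((vertices + 63) / 64), 0) {}

    // Set bit j of row i.
    void addEdge(std::size_t i, std::size_t j) {
        bits[i * wordsPerRow + j / 64] |= (std::uint64_t{1} << (j % 64));
    }
    // Clear bit j of row i.
    void removeEdge(std::size_t i, std::size_t j) {
        bits[i * wordsPerRow + j / 64] &= ~(std::uint64_t{1} << (j % 64));
    }
    // Read bit j of row i.
    bool hasEdge(std::size_t i, std::size_t j) const {
        return ((bits[i * wordsPerRow + j / 64] >> (j % 64)) & 1u) != 0;
    }
};
```

All three operations still take constant time; only the storage layout changes, so a graph with 100 vertices needs 200 words instead of 10,000 bool values.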

An Adjacency Matrix supports the operations

• addEdge(i,j), removeEdge(i,j), and hasEdge(i,j) in constant time per operation and
• inEdges(i), and outEdges(i) in O(n) time per operation.

The space used by an Adjacency Matrix is O(n^2).

Despite its high memory requirements and poor performance of the inEdges(i) and outEdges(i)
operations, an Adjacency Matrix can still be useful for some applications. In particular, when the graph
G is dense, i.e., it has close to n^2 edges, then a memory usage of n^2 may be acceptable.

The Adjacency Matrix data structure is also commonly used because algebraic operations on the
matrix a can be used to efficiently compute properties of the graph G. This is a topic for a course on
algorithms, but we point out one such property here: If we treat the entries of a as integers (1 for true
and 0 for false) and multiply a by itself using matrix multiplication then we get the matrix a^2. Recall,
from the definition of matrix multiplication, that

a^2[i][j] = Σ (k = 0 to n-1) a[i][k] · a[k][j]

Interpreting this sum in terms of the graph G, this formula counts the number of vertices, k, such that
G contains both edges (i,k) and (k,j). That is, it counts the number of paths from i to j (through
intermediate vertices, k) whose length is exactly two. This observation is the foundation of an
algorithm that computes the shortest paths between all pairs of vertices in G using only O (log n)
matrix multiplications.
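
The path-counting interpretation of a^2 can be checked with a small sketch. The function name squareAdjacency and the Matrix alias are hypothetical; the code simply evaluates the sum from the definition of matrix multiplication on a 0/1 matrix.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Returns a^2 for an n x n adjacency matrix a with entries 0 or 1.
// Entry [i][j] of the result counts the paths i -> k -> j of length two.
Matrix squareAdjacency(const Matrix &a) {
    const std::size_t n = a.size();
    Matrix result(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t k = 0; k < n; k++)
            if (a[i][k])                      // edge (i,k) exists
                for (std::size_t j = 0; j < n; j++)
                    result[i][j] += a[k][j];  // add 1 if edge (k,j) exists
    return result;
}
```

For a graph with the edges (0,1), (0,2), (1,3) and (2,3), the entry for the pair (0,3) is 2, because vertex 3 can be reached from vertex 0 through either vertex 1 or vertex 2.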

10.4 ADJACENCY LISTS: REPRESENTING A GRAPH BY LISTS


Adjacency list representations of graphs take a more vertex-centric approach. There are many possible
implementations of adjacency lists. In this section, we present a simple one. At the end of the section,
we discuss different possibilities. In an adjacency list representation, the graph G = (V,E) is represented
as an array, adj, of lists. The list adj[i] contains all the vertices adjacent to vertex i. That is, it
contains every index j such that (i,j) ∈ E. Each list adj[i] can be implemented as a singly or doubly linked list.

The addEdge(i,j) operation just appends the value j to the list adj[i]:

Adjacency Lists

void addEdge(int i, int j)
{
    adj[i].add(j);
}

This takes constant time.

Fig. 10.3 Graphs and their representation using Adjacency List

The removeEdge(i,j) operation searches through the list adj[i] until it finds j and then removes it:

Adjacency Lists

void removeEdge(int i, int j)
{
    for (int k = 0; k < adj[i].size(); k++)
    {
        if (adj[i].get(k) == j)
        {
            adj[i].remove(k);
            return;
        }
    }
}
This takes O(deg(i)) time, where deg(i) (the degree of i) counts the number of edges in E that have i as
their source. The hasEdge(i,j) operation is similar; it searches through the list adj[i] until it finds j (and
returns true), or reaches the end of the list (and returns false):

Adjacency Lists

bool hasEdge(int i, int j)
{
    return adj[i].contains(j);
}
This also takes O(deg(i)) time.

The outEdges(i) operation is very simple; it copies the values in adj[i] into the output list:

Adjacency Lists

void outEdges(int i, List &edges)
{
    for (int k = 0; k < adj[i].size(); k++)
        edges.add(adj[i].get(k));
}
This clearly takes O(deg(i)) time.

The inEdges(i) operation is much more work. It scans over every vertex j checking if the edge (j,i) exists
and, if so, adding j to the output list:

Adjacency Lists

void inEdges(int i, List &edges)
{
    for (int j = 0; j < n; j++)
        if (adj[j].contains(i))
            edges.add(j);
}
This operation is very slow. It scans the adjacency list of every vertex, so it takes O(n + m) time.

An Adjacency Lists data structure supports the operations

• addEdge(i,j) in constant time per operation;
• removeEdge(i,j) and hasEdge(i,j) in O(deg(i)) time per operation;
• outEdges(i) in O(deg(i)) time per operation; and
• inEdges(i) in O(n + m) time per operation.

The space used by an Adjacency Lists data structure is O(n + m).
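
The operations of this section can be collected into one self-contained sketch. It is an illustration under stated assumptions rather than the implementation used in the text: each list adj[i] is stored as a std::vector instead of a linked list, and outEdges and inEdges return their result instead of filling a List argument.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical adjacency-list graph on the vertices 0, ..., n-1.
struct AdjacencyLists {
    int n;                              // number of vertices
    std::vector<std::vector<int>> adj;  // adj[i] lists every j with (i,j) in E

    explicit AdjacencyLists(int vertices) : n(vertices), adj(vertices) {}

    // Constant (amortised) time: append j to the list of i.
    void addEdge(int i, int j) { adj[i].push_back(j); }

    // O(deg(i)) time: search adj[i] for j and erase it if present.
    void removeEdge(int i, int j) {
        auto it = std::find(adj[i].begin(), adj[i].end(), j);
        if (it != adj[i].end())
            adj[i].erase(it);
    }

    // O(deg(i)) time: linear search through adj[i].
    bool hasEdge(int i, int j) const {
        return std::find(adj[i].begin(), adj[i].end(), j) != adj[i].end();
    }

    // O(deg(i)) time: copy of the list of i.
    std::vector<int> outEdges(int i) const { return adj[i]; }

    // O(n + m) time: scan the list of every vertex j for i.
    std::vector<int> inEdges(int i) const {
        std::vector<int> edges;
        for (int j = 0; j < n; j++)
            if (hasEdge(j, i))
                edges.push_back(j);
        return edges;
    }
};
```

If inEdges(i) is needed often, one common design choice is to keep a second array of lists that stores the incoming edges of each vertex, at the cost of doubling the space.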
10.5 GRAPH TRAVERSAL
In this section we present two algorithms for exploring a graph, starting at one of its vertices, i, and
finding all vertices that are reachable from i. Both of these algorithms are best suited to graphs
represented using an adjacency list representation. Therefore, when analysing these algorithms we
will assume that the underlying representation is an Adjacency Lists.

10.5.1 Breadth-First Search

The breadth-first-search (BFS) algorithm starts at a vertex i and visits first the neighbours of i, then the
neighbours of the neighbours of i, then the neighbours of the neighbours of the neighbours of i, and
so on.

This algorithm is a generalization of the breadth-first traversal algorithm for binary trees, and is very
similar; it uses a queue, q, which initially contains only i. It then repeatedly extracts an element from
q and adds its neighbours to q, provided that these neighbours have never been in q before. The only
major difference between the breadth-first-search algorithm for graphs and the one for trees is that
the algorithm for graphs has to ensure that it does not add the same vertex to q more than once. It
does this by using an auxiliary boolean array, seen, that tracks which vertices have already been
discovered.

Algorithm for BFS:

Step 1: SET STATUS = 1 (ready state) for each node in G

Step 2: Add the starting node A in a queue and set its STATUS = 2 (waiting state)

Step 3: Repeat Steps 4 and 5 until QUEUE is empty

Step 4: Delete a node N from queue. Process it and set its STATUS = 3 (processed state)

Step 5: Add all the neighbours of node N that are in the ready state in a queue (whose STATUS = 1)
and set their STATUS = 2 (waiting state) [END OF LOOP]

Step 6: EXIT
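
The steps above can be sketched on an adjacency list representation as follows. This is a minimal sketch: the function name bfs is hypothetical, the graph is assumed to be given as a vector of neighbour lists, and the boolean array seen plays the role of the STATUS values (a vertex is "ready" while seen is false).

```cpp
#include <queue>
#include <vector>

// Returns the vertices reachable from start in the order BFS processes them.
std::vector<int> bfs(const std::vector<std::vector<int>> &adj, int start) {
    std::vector<bool> seen(adj.size(), false);  // has the vertex ever been queued?
    std::vector<int> order;
    std::queue<int> q;
    q.push(start);
    seen[start] = true;
    while (!q.empty()) {
        int i = q.front();  // remove the vertex at the front of the queue
        q.pop();
        order.push_back(i); // process it
        for (int j : adj[i]) {
            if (!seen[j]) { // queue each neighbour at most once
                seen[j] = true;
                q.push(j);
            }
        }
    }
    return order;
}
```

Each vertex enters the queue at most once and each adjacency list is scanned once, which is where the O(n + m) running time on adjacency lists comes from.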

Example:

Let's see how the breadth-first search algorithm works with an example. We use an undirected graph
with 5 vertices, numbered 0 to 4.

We start from vertex 0: the BFS algorithm puts it in the Visited list and puts all its
adjacent vertices in the queue.
Next, we visit the element at the front of the queue, i.e. 1, and go to its adjacent nodes. Since 0 has already
been visited, we visit 2 instead, and we continue the same step till the queue becomes empty.

Now, the queue is empty and we have visited all the vertices of the graph. So the BFS traversal sequence of
the graph is 0, 1, 2, 3 and 4.
Fig. 10.4 Breadth First Traversal

Function of BFS using adjacency matrix in C:

void breadth_first_search(int adj[][MAX], int visited[], int start) // initial value of start = 0
{
    int queue[MAX], rear = -1, front = -1, i;
    queue[++rear] = start;               // enqueue the starting vertex
    visited[start] = 1;
    while (rear != front)                // repeat until the queue is empty
    {
        start = queue[++front];          // dequeue the next vertex
        printf("%c \t", start + 65);     // print vertices as A, B, C, ...
        for (i = 0; i < MAX; i++)
        {
            if (adj[start][i] == 1 && visited[i] == 0)
            {
                queue[++rear] = i;       // enqueue each unvisited neighbour
                visited[i] = 1;
            }
        }
    }
}

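With an adjacency list representation, the time complexity can be expressed as O(|V| + |E|), since every
vertex and every edge will be explored in the worst case. |V| is the number of vertices and |E| is the
number of edges in the graph. The adjacency matrix function above scans a full row of the matrix for
every vertex it removes from the queue, so it takes O(|V|^2) time.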

Completeness: Breadth-first search is said to be a complete algorithm because if there is a solution,
breadth-first search will find it regardless of the kind of graph. But in case of an infinite graph where
there is no possible solution, it will diverge.

Applications of Breadth-First Search Algorithm

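Breadth-first search can be used to solve many problems such as:

• Finding all connected components of a graph G.
• Finding all nodes within an individual connected component.
• Finding the shortest path (the one with the fewest edges) between two nodes, u and v, of an
unweighted graph. In a weighted graph, BFS does not in general find the path of minimum total weight.
• Computing the maximum flow in a flow network, where BFS is used as a subroutine to find
augmenting paths.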

10.5.2 Depth-First Search

The depth-first-search (DFS) algorithm is similar to the standard algorithm for traversing binary trees;
it first fully explores one subtree before returning to the current node and then exploring the other
subtree. Another way to think of depth-first-search is by saying that it is similar to breadth-first search
except that it uses a stack instead of a queue.

During the execution of the depth-first-search algorithm, each vertex, i, is assigned a colour, c[i]: white
if we have never seen the vertex before, grey if we are currently visiting that vertex, and black if we
are done visiting that vertex. The easiest way to think of depth-first-search is as a recursive algorithm.
It starts by visiting the starting vertex, r. When visiting a vertex i, we first mark i as grey. Next, we scan i’s adjacency list
and recursively visit any white vertex we find in this list. Finally, we are done processing i, so we colour
i black and return.
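
The recursive description above can be sketched as follows. The names Colour, dfsVisit and dfs are hypothetical, and the graph is assumed to be given as a vector of neighbour lists; the order vector is added only so that the visiting order can be inspected.

```cpp
#include <vector>

enum class Colour { White, Grey, Black };

// Visit vertex i: colour it grey, recurse into white neighbours, then colour it black.
void dfsVisit(const std::vector<std::vector<int>> &adj, int i,
              std::vector<Colour> &c, std::vector<int> &order) {
    c[i] = Colour::Grey;         // currently visiting i
    order.push_back(i);
    for (int j : adj[i])
        if (c[j] == Colour::White)
            dfsVisit(adj, j, c, order);
    c[i] = Colour::Black;        // done visiting i
}

// Returns the vertices reachable from r in the order they are first visited.
std::vector<int> dfs(const std::vector<std::vector<int>> &adj, int r) {
    std::vector<Colour> c(adj.size(), Colour::White);
    std::vector<int> order;
    dfsVisit(adj, r, c, order);
    return order;
}
```

The recursion depth can grow as large as the number of vertices, for example on a graph that is one long path.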

Fig. 10.5 Depth First Search

An example of the execution of this algorithm is shown in Fig. 10.5. Although depth-first-search may
best be thought of as a recursive algorithm, recursion is not the best way to implement it. Indeed, a
recursive implementation can fail on large graphs by causing a stack overflow.

Algorithm for DFS:

Step 1: SET STATUS = 1 (ready state) for each node in G

Step 2: Push the starting node A on the stack and set its STATUS = 2 (waiting state)

Step 3: Repeat Steps 4 and 5 until STACK is empty

Step 4: Pop the top node N. Process it and set its STATUS = 3 (processed state)

Step 5: Push on the stack all the neighbours of N that are in the ready state (whose STATUS = 1) and
set their STATUS = 2 (waiting state) [END OF LOOP]

Step 6: EXIT
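
The steps above can be sketched with an explicit stack. This is a minimal sketch, assuming the graph is given as a vector of neighbour lists; the names Status and dfsIterative are hypothetical.

```cpp
#include <stack>
#include <vector>

enum Status { READY = 1, WAITING = 2, PROCESSED = 3 };

// Returns the vertices reachable from start in the order they are processed.
std::vector<int> dfsIterative(const std::vector<std::vector<int>> &adj, int start) {
    std::vector<Status> status(adj.size(), READY);  // Step 1
    std::vector<int> order;
    std::stack<int> st;
    st.push(start);                                 // Step 2
    status[start] = WAITING;
    while (!st.empty()) {                           // Step 3
        int node = st.top();                        // Step 4: pop and process
        st.pop();
        order.push_back(node);
        status[node] = PROCESSED;
        for (int nb : adj[node]) {                  // Step 5: push ready neighbours
            if (status[nb] == READY) {
                st.push(nb);
                status[nb] = WAITING;
            }
        }
    }
    return order;
}
```

Because a vertex is marked as waiting when it is pushed, it is never pushed twice, so the stack never holds more than n vertices.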
Example:

Consider the graph and adjacency list below. We want to print all the nodes that can be reached
from the node H.

Adjacency List:

1. Push H onto the stack.

STACK (top first): H

2. Pop and print the top element of the STACK, that is, H. Push all the neighbours of H onto the
stack that are in the ready state.

PRINT: H
STACK (top first): I, E

3. Pop and print the top element of the STACK, that is, I. Push all the neighbours of I onto the
stack that are in the ready state.

PRINT: I
STACK (top first): F, E

4. Pop and print the top element of the STACK, that is, F. Push all the neighbours of F onto the
stack that are in the ready state.

PRINT: F
STACK (top first): C, E

5. Pop and print the top element of the STACK, that is, C. Push all the neighbours of C onto the
stack that are in the ready state.

PRINT: C
STACK (top first): G, B, E

6. Pop and print the top element of the STACK, that is, G. Push all the neighbours of G onto the
stack that are in the ready state. Since there are no neighbours of G that are in the ready state,
no push operation is performed.

PRINT: G
STACK (top first): B, E

7. Pop and print the top element of the STACK, that is, B. Push all the neighbours of B onto the
stack that are in the ready state. Since there are no neighbours of B that are in the ready state,
no push operation is performed.

PRINT: B
STACK (top first): E

8. Pop and print the top element of the STACK, that is, E. Push all the neighbours of E onto the
stack that are in the ready state. Since there are no neighbours of E that are in the ready state,
no push operation is performed.

PRINT: E
STACK: empty

Since the STACK is now empty, the depth-first search starting at node H is complete. The nodes are
visited in the order H, I, F, C, G, B, E.
Function of DFS using adjacency matrix in C:

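void depth_first_search(int adj[][MAX], int visited[], int start)
{
    int stack[MAX];
    int top = -1, i;
    printf("%c-", start + 65);           // print vertices as A, B, C, ...
    visited[start] = 1;
    stack[++top] = start;
    while (top != -1)                    // repeat until the stack is empty
    {
        start = stack[top];              // look at the vertex on top of the stack
        for (i = 0; i < MAX; i++)
        {
            if (adj[start][i] && visited[i] == 0)
            {
                stack[++top] = i;        // go deeper into the first unvisited neighbour
                printf("%c-", i + 65);
                visited[i] = 1;
                break;
            }
        }
        if (i == MAX)                    // no unvisited neighbour left: backtrack
            top--;
    }
}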

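The space complexity of a depth-first search is often lower than that of a breadth-first search, because
it typically needs to remember little more than the path it is currently exploring; in the worst case,
both need space proportional to the number of vertices. The time complexity of a depth-first search is
proportional to the number of vertices plus the number of edges in the graph that is traversed. With an
adjacency list representation it can be given as O(|V| + |E|).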

Completeness: Depth-first search is complete on finite graphs: if there is a solution, depth-first
search will find it, provided that visited vertices are marked. On an infinite graph it may follow a
single path forever and diverge, even when a solution exists.

Applications of Depth-First Search Algorithm

Depth-first search is useful for:

• Finding a path between two specified nodes, u and v, of a weighted or unweighted graph.
• Finding whether a graph is connected or not.
• Computing the spanning tree of a connected graph.

10.6 SHORTEST PATH ALGORITHMS


The shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph
such that the sum of the weights of its constituent edges is minimized. In this section, we will discuss
different algorithms to calculate the shortest path between the vertices of a graph G.
10.6.1 Minimum Spanning Trees

Given an undirected and connected graph G, a spanning tree of G is a tree that spans G (that
is, it includes every vertex of G) and is a subgraph of G (every edge in the tree belongs to G).

Minimum Spanning Tree: The cost of a spanning tree is the sum of the weights of all the edges in
the tree. A graph can have many spanning trees. A minimum spanning tree is a spanning tree whose
cost is minimum among all the spanning trees. There can also be more than one minimum spanning tree.

Minimum spanning tree has direct application in the design of networks. It is used in algorithms
approximating the travelling salesman problem, multi-terminal minimum cut problem and minimum-
cost weighted perfect matching. Other practical applications are:

1. Cluster Analysis
2. Handwriting recognition
3. Image segmentation

There are two famous algorithms for finding the Minimum Spanning Tree:

Kruskal’s Algorithm:

Kruskal’s algorithm builds the spanning tree by adding edges one by one into a growing spanning tree.
Kruskal's algorithm follows a greedy approach: in each iteration it finds the edge with the least weight
that does not create a cycle and adds it to the growing spanning tree.

Algorithm Steps:

1. Sort the graph edges with respect to their weights.
2. Consider the edges for addition to the MST in order, from the edge with the smallest weight to
the edge with the largest weight.
3. Only add edges which do not form a cycle, that is, edges which connect two components that
are not yet connected.

So now the question is: how do we check whether two vertices are already connected?

This could be done using DFS, starting from the first vertex and then checking whether the second
vertex has been visited. But running DFS for every edge makes the time complexity large, as each search
takes O(V + E) time, where V is the number of vertices and E is the number of edges. A better solution
is to use disjoint sets: sets whose intersection is the empty set, which means that they have no element
in common. Each connected component of the growing forest is kept as one set, and an edge creates a
cycle exactly when both of its end-points lie in the same set.
Consider the following example:

In Kruskal’s algorithm, at each iteration we select the edge with the lowest weight. So, we
start with the lowest weighted edge first, i.e., the edge with weight 1. After that we select the
second lowest weighted edge, i.e., the edge with weight 2. Notice that these two edges are totally disjoint.
Now, the next edge is the third lowest weighted edge, i.e., the edge with weight 3, which connects
the two disjoint pieces of the graph. Now, we are not allowed to pick the edge with weight 4, because it
would create a cycle and we can’t have any cycles. So we select the fifth lowest weighted edge, i.e., the edge
with weight 5. The other two edges would also create cycles, so we ignore them. In the end, we end
up with a minimum spanning tree with total cost 11 (= 1 + 2 + 3 + 5).

In Kruskal’s algorithm, the most time-consuming operation is sorting the edges, which takes
O(E log E) = O(E log V) time. The disjoint-set operations together take at most O(E log V) time, so
the overall time complexity of the algorithm is O(E log V).
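
Kruskal's algorithm with disjoint sets can be sketched as follows. This is an illustration under stated assumptions: the names Edge, DisjointSets and kruskal are hypothetical, vertices are numbered 0 to n-1, and the disjoint sets use union by rank with path halving, which is one common way to implement them.

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

struct Edge { int u, v, w; };  // undirected edge (u,v) with weight w

struct DisjointSets {
    std::vector<int> parent, rank_;
    explicit DisjointSets(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);  // every vertex is its own set
    }
    int find(int x) {  // representative of the set containing x
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    bool unite(int a, int b) {  // returns false if a and b were already connected
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) rank_[a]++;
        return true;
    }
};

// Appends the chosen edges to mst and returns their total weight.
int kruskal(int n, std::vector<Edge> edges, std::vector<Edge> &mst) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge &a, const Edge &b) { return a.w < b.w; });
    DisjointSets sets(n);
    int total = 0;
    for (const Edge &e : edges) {
        if (sets.unite(e.u, e.v)) {  // skip edges that would close a cycle
            mst.push_back(e);
            total += e.w;
        }
    }
    return total;
}
```

On a hypothetical five-vertex graph whose edges have the weights 1 to 7, arranged so that the edges of weight 4, 6 and 7 each close a cycle, the function returns 11, the same total as in the example above.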

Prim’s Algorithm

Prim’s algorithm also uses a greedy approach to find the minimum spanning tree. In Prim’s algorithm
we grow the spanning tree from a starting position. Whereas Kruskal's algorithm adds an edge in each
step, Prim's algorithm adds a vertex to the growing spanning tree.

Algorithm Steps:

• Maintain two disjoint sets of vertices: one containing the vertices that are in the growing spanning
tree and the other containing the vertices that are not.
• Select the cheapest vertex (that is, the vertex reached by the cheapest edge leaving the tree) that is
connected to the growing spanning tree but is not yet in it, and add it to the growing spanning
tree. This can be done using a priority queue: insert the vertices that are connected to the
growing spanning tree into the priority queue.
• Check for cycles. To do that, mark the nodes which have already been selected and insert only
those nodes in the priority queue that are not marked.

Consider the example below:

In Prim’s algorithm, we start with an arbitrary node (it doesn’t matter which one) and mark it. In
each iteration we mark a new vertex that is adjacent to one of the vertices that we have already marked. As
a greedy algorithm, Prim’s algorithm selects the cheapest such edge and marks the vertex. So we
simply choose the edge with weight 1. In the next iteration we have three options: the edges with
weight 2, 3 and 4. So, we select the edge with weight 2 and mark the vertex. Now again we have
three options: the edges with weight 3, 4 and 5. But we can’t choose the edge with weight 3, as it would
create a cycle. So we select the edge with weight 4 and we end up with the minimum spanning tree of
total cost 7 (= 1 + 2 + 4).

The time complexity of Prim’s algorithm is O((V + E) log V), because each vertex is marked only once,
each edge causes at most a constant number of insertions into the priority queue, and every insertion
or removal takes logarithmic time.
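
Prim's algorithm with a priority queue can be sketched as follows. This is a minimal sketch, assuming a connected undirected graph stored as lists of (neighbour, weight) pairs; the function name prim is hypothetical, and the sketch returns only the total cost of the tree.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbour, weight) pairs; every undirected edge appears in both lists.
// Returns the total weight of a minimum spanning tree grown from start.
int prim(const std::vector<std::vector<std::pair<int, int>>> &adj, int start) {
    std::vector<bool> marked(adj.size(), false);
    using Item = std::pair<int, int>;  // (weight of connecting edge, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    pq.push({0, start});
    int total = 0;
    while (!pq.empty()) {
        Item top = pq.top();  // cheapest candidate connected to the tree
        pq.pop();
        int w = top.first;
        int u = top.second;
        if (marked[u]) continue;  // already in the tree: adding it would form a cycle
        marked[u] = true;
        total += w;
        for (const auto &edge : adj[u])
            if (!marked[edge.first])
                pq.push({edge.second, edge.first});
    }
    return total;
}
```

On a hypothetical four-vertex graph with edge weights 1 to 5, in which the edge of weight 3 closes a cycle, the function returns 7 (= 1 + 2 + 4), the same total as in the example above.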

10.6.2 Dijkstra's Algorithm

Dijkstra's algorithm has many variants but the most common one is to find the shortest paths from
the source vertex to all other vertices in the graph.

Algorithm Steps:

1. Set all vertices distances = infinity except for the source vertex, set the source distance = 0.
2. Push the source vertex in a min-priority queue in the form (distance, vertex), as the
comparison in the min-priority queue will be according to vertices distances.
3. Pop the vertex with the minimum distance from the priority queue (at first the popped vertex
= source).
4. Update the distances of the connected vertices to the popped vertex in case of "current vertex
distance + edge weight < next vertex distance", then push the vertex with the new distance
to the priority queue.
5. If the popped vertex is visited before, just continue without using it.
6. Apply the same algorithm again until the priority queue is empty.

The time complexity of Dijkstra's algorithm is O(V^2) when the vertex with the minimum distance is
found by scanning all vertices, but with a min-priority queue it drops to O(V + E log V).

Example:

We want to find the shortest path from node 1 to all other nodes using Dijkstra’s algorithm.

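• Each node has a state (d, s), where d is its current distance value and s is its status label: p for
permanent or t for temporary. Node 1 is designated as the current node, so the state of node 1 is
(0, p); every other node has state (∞, t).

• Nodes 2, 3, and 6 can be reached from the current node 1. Update the distance values for these
nodes: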

d2 = min{∞, 0+7} = 7

d3 = min{∞, 0+9} = 9

d6 = min{∞, 0+14} = 14

Now, among the nodes 2, 3, and 6, node 2 has the smallest distance value, so the status label of
node 2 changes to permanent and its state is (7, p), while the status of 3 and 6 remains
temporary. Node 2 becomes the current node.
• We are not done, since not all nodes have been reached from node 1, so we perform another
iteration.
• Nodes 3 and 4 can be reached from the current node 2. Update the distance values for these
nodes:

d3 = min{9, 7+10} = 9

d4 = min{∞, 7+15} = 22

Now, between the nodes 3 and 4, node 3 has the smallest distance value, so the status label of
node 3 changes to permanent, while the status of 4 remains temporary. Node 3 becomes
the current node. We are not done, so we perform another iteration.

• Nodes 6 and 4 can be reached from the current node 3. Update the distance values for them:

d4 = min{22, 9+11} = 20

d6 = min{14, 9+2} = 11

Now, between the nodes 6 and 4, node 6 has the smallest distance value, so the status label of
node 6 changes to permanent, while the status of 4 remains temporary. Node 6 becomes
the current node. We are not done, so we perform another iteration.

• Node 5 can be reached from the current node 6. Update the distance value for node 5:

d5 = min{∞, 11+9} = 20

Now, node 5 is the only candidate reachable from the current node, so its status changes to
permanent. Node 5 becomes the current node. From node 5 we cannot reach any other node. Hence,
node 4 gets permanently labelled with distance 20 and we are done. The final shortest distances
from node 1 are d2 = 7, d3 = 9, d6 = 11, d4 = 20 and d5 = 20.
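
The steps of Dijkstra's algorithm can be sketched with a min-priority queue as follows. This is an illustration under stated assumptions: the function name dijkstra is hypothetical, the graph is stored as lists of (neighbour, weight) pairs, all weights are non-negative, and unreachable vertices keep the largest long long value as their distance.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (neighbour, weight) pairs. Returns the shortest distance from source to each vertex.
std::vector<long long> dijkstra(const std::vector<std::vector<std::pair<int, int>>> &adj,
                                int source) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);  // Step 1: all distances infinite
    std::vector<bool> visited(adj.size(), false);
    using Item = std::pair<long long, int>;        // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0;
    pq.push({0, source});                          // Step 2
    while (!pq.empty()) {                          // Step 6
        int u = pq.top().second;                   // Step 3: vertex with minimum distance
        pq.pop();
        if (visited[u]) continue;                  // Step 5: skip stale queue entries
        visited[u] = true;
        for (const auto &edge : adj[u]) {          // Step 4: update the neighbours
            int v = edge.first;
            long long candidate = dist[u] + edge.second;
            if (candidate < dist[v]) {
                dist[v] = candidate;
                pq.push({candidate, v});
            }
        }
    }
    return dist;
}
```

Running this sketch from node 1 on the edges named in the example above, treated as directed edges (an assumption, since the figure is not reproduced here), gives the distances 7, 9, 20, 20 and 11 for nodes 2, 3, 4, 5 and 6.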
Check your Progress 1
Fill in the Blanks.
1. An edge that has identical end-points is called a _____ .
2. The number of edges that originate at u is called the ________ of u.
3. A ______ of a connected, undirected graph G is a sub-graph of G which is a tree that
connects all the vertices together.

Activity 1
1. Consider the graph given below and represent it using adjacency list:

2. Write a program to create and print a graph using BFS and DFS.
3. Consider the graph given below. Find the minimum spanning tree of this graph using
(a) Prim’s algorithm and
(b) Kruskal’s algorithm.
(c) Then find the shortest paths from a source vertex of your choice using Dijkstra’s algorithm.
Summary
• A graph is a collection of vertices or nodes and edges that connect these vertices.
• A connected graph is a graph in which there exists a path between any two of its nodes.
• A graph can be represented using an adjacency matrix or adjacency lists.
• Breadth-first search is a graph search algorithm that begins at the root node and explores all
the neighbouring nodes. Then for each of those nearest nodes, the algorithm explores their
unexplored neighbour nodes, and so on, until it finds the goal.
• The depth-first search algorithm progresses by expanding the starting node of G and then
going deeper and deeper until a goal node is found, or until a node that has no children is
encountered.
• A spanning tree of the graph G is a tree that spans G (that is, it includes every vertex of G) and
is a subgraph of G.
• Kruskal’s algorithm and Prim’s algorithm both use a greedy approach to find the minimum
spanning tree.
• Dijkstra's algorithm has many variants, but the most common one finds the shortest paths
from the source vertex to all other vertices in the graph.

Keywords
• Matrix: It is a rectangular array of numbers, symbols, or expressions, arranged in rows and
columns.
• Shortest Path: Finding a path between two vertices (or nodes) in a graph such that the sum
of the weights of its constituent edges is minimized.

Self-Assessment Questions
1. What is a graph? Explain the basic terms associated with graphs.
2. Explain the graph traversal algorithms with examples.
3. Explain Dijkstra's algorithm.
4. State the difference between Prim’s and Kruskal’s algorithm.

Answers to Check your Progress

Check your Progress 1

Fill in the Blanks.

1. An edge that has identical end-points is called a loop.
2. The number of edges that originate at u is called the out-degree of u.
3. A spanning tree of a connected, undirected graph G is a sub-graph of G which is a tree that
connects all the vertices together.

Suggested Reading
1. Balagurusamy, E. Object Oriented Programming with C++. New Delhi: Tata McGraw-Hill.
2. Rajaraman, V. Fundamentals of Computers. New Delhi: PHI Learning.
3. Data Structures Using C by Reema Thareja, Oxford University Press.
4. C Programming Language, by Brian W. Kernighan, Dennis Ritchie, Prentice Hall.
