
UNIT - I

Introduction: Data structures


 Classification of data structures
 Abstract Data Types
 Analysis of Algorithms
 Recursion: Examples illustrating Recursion (Factorial, Binary Search)
 Analyzing Recursive Algorithms
 Sorting: Quick sort, Merge Sort, Selection Sort, Radix sort
 Comparison of Sorting Algorithms
Necessity is the mother of innovation

Each innovation in data structures was driven by the need to solve a fundamental
problem that the preceding structure solved poorly or could not solve at all.

Progress versus Perfection

Perfection focuses on what's not working, the flaws, the not enough.

Progress looks at what is working, the improvements, the discoveries,
the aha moments that come from looking at things in a new way.
Why Should you Learn Data Structures and Algorithms?

Most Computer Science students and working professionals tend to skip
learning DSA, especially in India, because they find it complicated and do not
fully understand its benefits. What they fail to understand is that DSA has profound
uses in many walks of life, not just in making an application more efficient.
Programmers need to realize the importance of DSA as early as possible in their
careers, not just to become better programmers but to contribute significantly to
their companies by solving real problems.

Reasons to learn DSA: Many people consider DSA just another subject in
computer science. This is where they get it wrong. DSA is much more than that: it
teaches you a way to be a better programmer and a way to think better. It is a skill
that will help you throughout your life, not a skill to learn just to pass a
subject. Let us dive deeper into the various reasons why one should learn DSA –
Role of DSA in Solving Real-World Problems

You may be surprised to know that DSA plays quite an important role even in
solving real-world problems. Problems that would otherwise take months can be
solved in minutes using the knowledge of DSA.

Let us say you want to find a set of people in the same age group within a large
collection of data. Assuming this data is sorted, you can solve this easily with
the binary search algorithm, which works on the principles of DSA. Binary search
is logarithmically scalable, unlike traditional methods that are only linearly
scalable. This means that if the number of data points in the database is squared,
the time taken for the same task with binary search is only doubled.
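
A quick numeric check of that claim (a minimal sketch, not from the slides; it
just evaluates log2 with the standard C math library):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double n = 1e6;  /* one million sorted records */

    /* binary search needs about log2(n) comparisons */
    printf("n   = 1e6  -> ~%.0f comparisons\n", ceil(log2(n)));     /* ~20 */
    printf("n^2 = 1e12 -> ~%.0f comparisons\n", ceil(log2(n * n))); /* ~40 */
    return 0;
}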
Why Should You Learn Data Structures and Algorithms?

1. Another real-world problem that DSA could solve is the Rubik’s cube.

 Most of you would have used, or at least seen, a Rubik’s cube.
 But do you know that a simple object like a Rubik’s cube has confounded even the
most outstanding mathematicians?
 It is known that a Rubik’s cube has a total of 43,252,003,274,489,856,000 (~43
quintillion) different possible positions (Ref: https://www.youtube.com/watch?v=z2-
d0x_qxSM). Then imagine the total number of paths to reach all these positions.
 Thankfully, a solution was found using Dijkstra’s algorithm, which is based on the
concepts of DSA. It helps to solve the problem in linear time, which means you can
reach the solved position in the minimum number of states.
Why Should you Learn Data Structures and Algorithms?
2. Role of DSA in Machine Learning
Can you imagine that a concept as advanced and futuristic as Machine Learning (ML) needs
engineers with knowledge of DSA?

 Apart from solving real-world problems, these engineers can design amazing products using the
combination of their ML and DSA knowledge. The knowledge of DSA is the basic building block
of algorithmic thinking and logical capability in any field of computer science, and ML is no
exception.
 An ML engineer spends a considerable part of their time collecting data, which can lead to
various complex challenges that are solved easily using the knowledge of DSA.
 Let us assume you are creating an ML product whose dataset has an address as one of its
columns. Now suppose you want to retrieve a portion of this data, say the street name; ML
cannot work on the string directly. You would need DSA to implement a string algorithm
to retrieve the required data.
Why Should You Learn Data Structures and Algorithms?

3. The core of computer science

 Data Structures and Algorithms are often considered the root, or the foundation, of
computer science. With advancements in the field, more and more data is being stored and
processed. These huge volumes of data can slow down the processing time of systems. This is
where DSA helps: effective utilization of the stored data improves the processing power of
the systems. DSA also helps in tasks like data search, which plays an important role in any
application.
 DSA shifts the focus of programming from syntax to approach. If you notice, most computer
science curricula include a chapter or a course on DSA.
 Learners can use the concepts of DSA in any programming language of their choice,
and also learn how to store and manipulate data in it to get the desired outcome.
Do you have questions about why you should study all this complicated stuff such as Arrays, Linked Lists,
Stacks, Queues, Searching, Sorting, Trees, Graphs, etc.,
if it has absolutely no use in real life?
Why do companies ask questions related to data structures and algorithms if they’re not useful in a daily job?
Now I will ask you a simple question and you need to find the solution to it.
 How would you search for your roll number in a 20000-page PDF document?
 How would you search for your roll number in a 20000-page PDF document if the roll numbers are arranged
in a particular order?
If you try to search randomly or sequentially, it will take too much time; you might get frustrated after some
time. Instead, you can try the solution given below…
Go to page 10000. If your roll number is not there, but all the roll numbers on that page are smaller than yours,
go to page 15000. If your roll number is still not there, but this time all the roll numbers are greater than yours,
go to page 12500. Continue the same process, and within 30-40 seconds you will find your roll number.
Congratulations…
You just used the Binary Search algorithm without realizing it.
To Crack the Interviews of the Top Product-Based Companies
Do you know that under the hood all your SQL and Linux commands are algorithms and data structures?
You might not realize it, but that’s how software works.
Data structures and algorithms play a major role in implementing software, and in the hiring process as well. A lot of
students and professionals wonder why these companies’ interviews focus on DSA instead of
language-, framework-, or tool-specific questions.
Let us explain why it happens…
When you ask someone to make a decision, a good engineer will be able to tell you: “I chose to do X
because it’s better than A and B in these ways. I could have gone with C, but I felt this was a better choice because of this.”
In our daily lives, we always go with the person who can complete the task in a short amount of time, efficiently,
and using fewer resources. The same thing happens at these companies. The problems faced by these companies are
much harder and at a much larger scale, and software developers have to make the right decisions when
solving them.
Knowledge of DSA, like hash tables, trees, graphs, and the various algorithms, goes a long way in solving these problems
efficiently, and interviewers are more interested in seeing how candidates use these tools to solve a problem. Just
as a car mechanic needs the right tool to fix a car and make it run properly, a programmer needs the right tool
(algorithm and data structure) to make the software run properly. So the interviewer wants to find a candidate who can
apply the right set of tools to solve the given problem. If you know the characteristics of one data structure in contrast
to another, you will be able to choose the right data structure to solve a problem.
To Crack the Interviews of the Top Product-Based Companies
Engineers working at companies such as Google, Microsoft, Facebook, and Amazon are
different from others and are paid higher compared to other companies… but why?
 In these companies, coding is just the implementation and takes roughly 20-30% of
the time allotted to a project.
 Most of the time goes into designing things with the best and most optimal algorithms to
save on the company’s resources (servers, computation power, etc.). This is the main
reason interviews at these companies focus on algorithms: they want people who can
think out of the box and design algorithms that can save the company thousands of dollars.
 YouTube, Facebook, Twitter, Instagram, Google Maps: these sites have the
highest numbers of users in the world. Handling more users on these sites requires
more optimization, and that is why product-based companies only hire candidates
who can optimize their software as per user demand.
To Crack the Interviews of the Top Product-Based Companies
Example: Suppose you are working at a company like Facebook. You come up with an optimal solution
to a problem (like sorting a list of users from India) with time complexity O(n log n) instead of
O(n^2), and assume that n for this problem in a real-life scenario is 100 million
(a fair assumption, considering the number of users registered on Facebook exceeds 1 billion).
Taking the logarithm base 10, n log n would be 800 million operations, while n^2 would be 10^7 billion. In cost
terms, the efficiency has been improved more than 10^7 times, which could be a huge saving in terms of server
cost and time.
Now you might have understood why companies want to hire smart developers who can make the right
decisions and save company resources, time, and money. So before you propose using a
hash table instead of a list to solve a specific problem, think about the large scale and all the case
scenarios carefully: the choice can generate revenue for the company, or the company can lose a huge
amount of money.
Data structures and algorithms help in understanding the nature of a problem at a deeper level,
and thereby give a better understanding of the world.
Data structure and algorithms help in understanding the nature of the problem at a deeper level
and thereby a better understanding of the world.
To Solve Some Real-World Complex Problems

• Have you ever been scolded by your parents when you were unable to find your book
or clothes in your messed-up room?

 Definitely yes… and your parents are right when they advise you to keep everything
in its right place, so that next time you can get to your stuff easily. You need to
arrange and keep everything (data) in such a structure that whenever you need to
search for something, you find it easily and as soon as possible. This example gives
a clear idea of how important it is to arrange or structure data in real life.

• Now take the example of a library. If you need to find a book on Python, you will go
to the CSE section first, then the Programming Languages section. If books were not
organized in this manner and were just distributed randomly, it would be frustrating
to find a specific book. So data structures refer to the way we organize information
on our computers. Computer scientists look for the best ways to organize the data we
have, so that it can be processed effectively for a given input.
To Solve Some Real-World Complex Problems
A lot of newbie programmers have this question: where do we use all this data
structures and algorithms stuff in our daily lives, and how is it useful in solving real-world
complex problems? Whether or not you are interested in getting into the top tech
giants, DSA concepts still help a lot in your day-to-day life. Don’t you
believe us? Let’s consider some examples…
1. Facebook (yes… we are talking about your favorite application).
 Can you imagine that your friends on Facebook, friends of friends, and mutual
friends can all be represented easily by a graph?
 Relax… sit for a couple of moments and think again… you can apply a graph to
represent friend connections on Facebook.
2. If you need to keep a deck of cards and arrange it properly, how would you do that?
 Would you throw the cards down randomly, or would you arrange them one over another
to form a proper deck? You can use a Stack here to arrange the cards one over
another.
To Solve Some Real-World Complex Problems

3. If you need to search for a word in the dictionary, what would be your approach?
 Do you go page by page, or do you open some page and, if the word is not found,
open a page before or after the current one depending on the order of words
(Binary Search)?
The first two were good examples of choosing the right data structure for a real-world
problem, and the third is a good example of choosing the right algorithm to solve a
specific problem in less time.
All the above examples give you a clear understanding of how important the organization of data
is in our day-to-day life. Arranging data in a specific structure saves a lot of time
and makes the data easier to manipulate and use. The same goes for algorithms: we all
want to save our time, energy, and resources, and we all want to choose the
best approach to solve the problems in our daily lives.
Introduction to Data Structures
WHAT IS DATA?

Data is the collection of different numbers, symbols, and alphabets used to represent information.

WHAT IS A DATA STRUCTURE?

A data structure is a group of data elements that provides the easiest way to store and perform
different actions on the data in the computer. A data structure is a particular way of organizing
data in a computer so that it can be used effectively. The idea is to reduce the space and time
complexities of different tasks.

The choice of a good data structure makes it possible to perform a variety of critical operations
effectively. An efficient data structure also uses minimum memory space and execution time to
process the structure.

A data structure is also an instance of an ABSTRACT DATA TYPE. (In computer science, an abstract
data type is a mathematical model for data types. An abstract data type is defined by its behavior
from the point of view of a user of the data, specifically in terms of possible values, possible
operations on data of this type, and the behavior of these operations.)

Types of data structure:

 Linear Data Structure: elements are arranged in one dimension, also known as the linear
dimension. Examples: lists, stacks, queues, etc.
 Non-Linear Data Structure: elements are arranged in one-many, many-one and many-many
dimensions. Examples: trees, graphs, tables, etc.

Data structures are used in various fields such as: operating systems, graphics, computer design,
blockchain, genetics, image processing, simulation, etc.
Need for Data Structures: As applications are getting more complex and the amount of data is
increasing day by day, the following problems may arise:

 Processor speed: To handle very large amounts of data, high-speed processing is required;
but as the data grows day by day to billions of files per entity, the processor may fail to
deal with that much data.

 Data search: Consider an inventory of 10^6 items in a store. If our application needs to
search for a particular item, it has to traverse 10^6 items every time, slowing down the
search process.

 Multiple requests: If thousands of users are searching the data simultaneously on a web
server, then there is a chance that even a very large server can fail during the process.

In order to solve the above problems, data structures are used. Data is organized into a data
structure in such a way that not all items need to be searched, and the required data can be
found almost instantly.

Advantages of Data Structures

 Efficiency: The efficiency of a program depends upon the choice of data structures. For
example, suppose we have some data and we need to search for a particular record. If we
organize our data in an array, we will have to search sequentially, element by element;
hence, an array may not be very efficient here. There are better data structures that can
make the search process efficient, like ordered arrays, binary search trees, or hash tables.

 Reusability: Data structures are reusable, i.e., once we have implemented a particular data
structure, we can use it in any other place. Implementations of data structures can be
compiled into libraries that can be used by different clients.

 Abstraction: A data structure is specified by an ADT, which provides a level of abstraction.
The client program uses the data structure through its interface only, without getting into
the implementation details.
 Data structures: an introduction: Data structures are essential elements of computer science
that enable effective data storage, organization, and manipulation. They offer a method for
efficiently managing and retrieving data. The effectiveness of algorithms and programs can be
significantly impacted by selecting the appropriate data structure for a given problem.
Different types of data structures are created for specific jobs.

 Data structure classification: Linear and non-linear data structures are the two primary
categories into which data structures may be divided. Data elements are ordered sequentially
in linear data structures, and each element has a direct predecessor and successor. Examples
include queues, stacks, linked lists, and arrays. In non-linear data structures, data elements
are not sequentially arranged; instead, they are arranged in hierarchical connections. Graphs
and trees are two examples.

 Abstract Data Types: The term "Abstract Data Type" refers to a high-level definition of a
data structure that emphasizes its behavior and operations above the specifics of how it is
implemented. ADTs specify the operations that can be carried out on the data structure as
well as the rules for doing so. ADTs frequently take the form of stacks, queues, lists, and
dictionaries (a minimal stack sketch follows this list).

 Algorithm analysis: Algorithm analysis examines an algorithm's performance in terms of its
time and space complexity. It aids our comprehension of how an algorithm's efficiency changes
as the size of the input increases. The objective is to create efficient and accurate
algorithms.

 Time complexity: This metric assesses how long an algorithm takes to execute in relation to
the size of the input. It is typically expressed in "Big O" notation, which provides an upper
bound on the growth rate of an algorithm's running time.

 Space complexity: This gauges how much memory an algorithm consumes in relation to the size
of the input. It is expressed in Big O notation, just like time complexity.

 Best, worst, and average case analysis: Depending on the type of input, algorithms might
respond in various ways. The performance of an algorithm is characterized by its best-case
time complexity, its worst-case time complexity, and average-case analysis, which takes into
account the expected performance over a variety of inputs.
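
To make the ADT idea concrete, here is a minimal sketch (our illustration, not from the
slides) of a stack ADT in C: callers see only the operations push/pop/isEmpty, while the
array and top index are the hidden representation.

#include <stdio.h>

#define CAP 100

typedef struct {
    int items[CAP];  /* hidden representation */
    int top;         /* index of the top element; -1 when empty */
} Stack;

void push(Stack *s, int x)   { s->items[++s->top] = x; }
int  pop(Stack *s)           { return s->items[s->top--]; }
int  isEmpty(const Stack *s) { return s->top == -1; }

int main(void)
{
    Stack s = { .top = -1 };
    push(&s, 10); push(&s, 20); push(&s, 30);
    while (!isEmpty(&s))
        printf("%d ", pop(&s));  /* prints 30 20 10: the ADT's behavior */
    printf("\n");
    return 0;
}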
Introduction to Data Structures: Data Structure Classification
Linear Data Structures: A data structure is called linear if all of its elements are arranged in a linear order. In
linear data structures, the elements are stored in a non-hierarchical way where each element has successors
and predecessors except the first and last elements.

Types of Linear Data Structures are given below:

 Arrays: An array is a collection of similar types of data items and each data item is called an element of the
array. The data type of the element may be any valid data type like char, int, float, or double. The elements of
the array share the same variable name but each one carries a different index number known as subscript.
The array can be one-dimensional, two-dimensional, or multi-dimensional.

 For example, for an array age of 100 elements, the individual elements are: age[0], age[1], age[2], age[3], ......, age[98], age[99]

 Stack: A stack is a linear list in which insertions and deletions are allowed only at one end,
called the top. A stack is an abstract data type (ADT) that can be implemented in most programming
languages. It is named a stack because it behaves like a real-world stack, for example, a pile of
plates or a deck of cards.

 Queue: A queue is a linear list in which elements can be inserted only at one end, called the
rear, and deleted only at the other end, called the front. It is an abstract data structure,
similar to a stack. A queue is open at both ends, and therefore it follows the First-In-First-Out
(FIFO) methodology for storing data items. A minimal sketch of this FIFO behavior follows.
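
A minimal sketch (our illustration, not from the slides) of the FIFO behavior described
above, using a simple array-based queue with no wrap-around:

#include <stdio.h>

#define CAP 100

typedef struct {
    int items[CAP];
    int front, rear;  /* front: next element to leave; rear: next free slot */
} Queue;

void enqueue(Queue *q, int x) { q->items[q->rear++] = x; }
int  dequeue(Queue *q)        { return q->items[q->front++]; }
int  isEmpty(const Queue *q)  { return q->front == q->rear; }

int main(void)
{
    Queue q = { .front = 0, .rear = 0 };
    enqueue(&q, 10); enqueue(&q, 20); enqueue(&q, 30);
    while (!isEmpty(&q))
        printf("%d ", dequeue(&q));  /* prints 10 20 30: First-In-First-Out */
    printf("\n");
    return 0;
}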
Non-Linear Data Structures: This data structure does not form a sequence i.e. each item or element is
connected with two or more other items in a non-linear arrangement. The data elements are not arranged
in a sequential structure.

Types of Non-Linear Data Structures are given below:

 Trees: Trees are multilevel data structures with a hierarchical relationship among their
elements, known as nodes. The bottommost nodes in the hierarchy are called leaf nodes, while the
topmost node is called the root node. Each node contains pointers to its adjacent nodes.

 The tree data structure is based on the parent-child relationship among the nodes. Each node in
the tree can have more than one child, except the leaf nodes, and each node has exactly one
parent, except the root node. Trees can be classified into many categories, which will be
discussed later in this tutorial.

 Graphs: Graphs can be defined as the pictorial representation of a set of elements (represented
by vertices) connected by links known as edges. A graph is different from a tree in the sense that
a graph can have a cycle while a tree cannot. A minimal node-and-pointer sketch follows.
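
A minimal sketch (our illustration, not from the slides) of the node-and-pointer layout
described above: each tree node holds data and pointers to its child nodes.

#include <stdio.h>

struct TreeNode {
    int data;
    struct TreeNode *left, *right;  /* pointers to adjacent (child) nodes */
};

int main(void)
{
    struct TreeNode leaf1 = { 2, NULL, NULL };      /* leaf nodes       */
    struct TreeNode leaf2 = { 3, NULL, NULL };
    struct TreeNode root  = { 1, &leaf1, &leaf2 };  /* the topmost node */

    printf("root %d -> children %d and %d\n",
           root.data, root.left->data, root.right->data);
    return 0;
}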
Introduction to Data Structures: Operations on data structure

1) Traversing: Every data structure contains a set of data elements. Traversing the data structure means visiting each element
of the data structure in order to perform some specific operation like searching or sorting.

 Example: If we need to calculate the average of the marks obtained by a student in 6 different subjects, we need to traverse
the complete array of marks and calculate the total sum, then we will divide that sum by the number of subjects i.e. 6, in
order to find the average.

2) Insertion: Insertion can be defined as the process of adding elements to the data structure at any location. If the
capacity of the data structure is n, then it can hold at most n data elements.

3) Deletion: The process of removing an element from the data structure is called Deletion. We can delete an element from
the data structure at any random location. If we try to delete an element from an empty data structure then underflow occurs.

4) Searching: The process of finding the location of an element within the data structure is called Searching. There are two
algorithms to perform searching, Linear Search and Binary Search. We will discuss each one of them later in this tutorial.

5) Sorting: The process of arranging the data structure in a specific order is known as Sorting. There are many algorithms that
can be used to perform sorting, for example, insertion sort, selection sort, bubble sort, etc.

6) Merging: When two lists, List A and List B, of sizes M and N respectively and holding elements of a similar type, are
clubbed or joined to produce a third list, List C, of size (M+N), the process is called merging. A minimal sketch is given below.
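
A minimal sketch (our illustration, not from the slides) of the merging operation: two
sorted lists A (size M) and B (size N) are clubbed into a third list C of size M+N.

#include <stdio.h>

int main(void)
{
    int A[] = { 2, 5, 9 }, B[] = { 1, 6, 7, 10 };
    int M = 3, N = 4;
    int C[7];
    int i = 0, j = 0, k = 0;

    while (i < M && j < N)                      /* take the smaller head each time */
        C[k++] = (A[i] <= B[j]) ? A[i++] : B[j++];
    while (i < M) C[k++] = A[i++];              /* copy any leftover elements */
    while (j < N) C[k++] = B[j++];

    for (k = 0; k < M + N; k++)
        printf("%d ", C[k]);                    /* prints: 1 2 5 6 7 9 10 */
    printf("\n");
    return 0;
}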
Array / List based representation and operations

In this session, we will discuss the array in the data structure. Arrays are defined as the collection
of similar types of data items stored at contiguous memory locations. It is one of the simplest data
structures where each data element can be randomly accessed by using its index number.

In C programming, they are the derived data types that can store the primitive type of data such as
int, char, double, float, etc. For example, if we want to store the marks of a student in 6 subjects,
then we don't need to define a different variable for the marks in different subjects. Instead, we can
define an array that can store the marks in each subject at the contiguous memory locations.

Properties of array

•Each element in an array is of the same data type and occupies the same amount of memory
(for example, 4 bytes for a typical int).
•Elements in the array are stored at contiguous memory locations, with the first element
stored at the smallest memory address.
•Elements of the array can be randomly accessed, since we can calculate the address of each
element from the given base address and the size of a data element.
Array / List-based representation and operations: Representation of an array

We can represent an array in various ways in different programming languages. As an
illustration, consider the declaration of an array in C (the slide's figure is not
reproduced here; see the sketch below). As per the illustration, there are some
important points:

• Index starts with 0.
• The array's length is 10, which means we can store 10 elements.
• Each element in the array can be accessed via its index.

Why are arrays required? Arrays are useful because –

•Sorting and searching a value in an array is easier.
•Arrays are best for processing multiple values quickly and easily.
•Arrays are good for storing multiple values in a single variable. In computer
programming, most cases require storing a large amount of data of a similar type. To
store such an amount of data, we would need to define a large number of variables, and
it would be very difficult to remember the names of all the variables while writing the
program. Instead of naming all the variables differently, it is better to define an
array and store all the elements in it.
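
Since the original figure is missing, here is a plain C declaration (our sketch)
matching the points above: length 10, zero-based indices.

#include <stdio.h>

int main(void)
{
    int arr[10] = { 0 };  /* room for 10 elements, indexed arr[0] .. arr[9] */
    arr[0] = 5;           /* index starts with 0 */
    arr[9] = 42;          /* the last valid index is length - 1 */
    printf("%d %d\n", arr[0], arr[9]);
    return 0;
}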
Array / List-based representation and operations: Memory allocation of an array

As stated above, all the data elements of an array are stored at contiguous locations in the main memory. The name of
the array represents the base address, i.e., the address of the first element in main memory. Each element of the
array is referenced by proper indexing.

We can define the indexing of an array in the following ways:

• 0 (zero-based indexing): the first element of the array is arr[0].
• 1 (one-based indexing): the first element of the array is arr[1].
• n (n-based indexing): the first element of the array can reside at any chosen index.

Consider the memory allocation of an array arr of size 5 (the slide's figure is not reproduced here). The array
follows a 0-based indexing approach, and its base address is 100. That is the address of arr[0]. Here, the size of
the data type used is 4 bytes; therefore, each element takes 4 bytes in memory.
Array / List-based representation and operations: Basic operations

How to access an element from the array? We require the following information to access any
random element of an array:
• Base address of the array.
• Size of an element in bytes.
• Type of indexing the array follows.

The formula to calculate the address of an array element:
Byte address of element A[i] = base address + size * (i - first index)

Here, size represents the memory taken by the primitive data type; for instance, int takes
2 bytes and float takes 4 bytes on the 16-bit C compilers this example assumes (on most
modern systems an int is 4 bytes).

Example: Suppose an array A[-10 ..... +2] has base address (BA) = 999 and element size = 2
bytes; find the location of A[-1].

L(A[-1]) = 999 + 2 x [(-1) - (-10)] = 999 + 18 = 1017

Now, let's discuss the basic operations supported on the array (a sketch checking the
address formula follows):

•Traversal - This operation is used to print the elements of the array.
•Insertion - It is used to add an element at a particular index.
•Deletion - It is used to delete an element from a particular index.
•Search - It is used to search for an element using the given index or by value.
•Update - It updates an element at a particular index.
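
A minimal sketch (our illustration, not from the slides) that checks the byte-address
formula against the address the compiler actually assigns for a 0-based int array:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int arr[5] = { 18, 30, 15, 70, 12 };
    int i = 3;
    uintptr_t base = (uintptr_t)&arr[0];

    /* byte address of arr[i] = base address + size * (i - first index) */
    uintptr_t computed = base + sizeof(int) * (i - 0);

    printf("computed = %lu, actual = %lu\n",
           (unsigned long)computed, (unsigned long)(uintptr_t)&arr[i]);
    return 0;
}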


Traversal operation: This operation is performed to traverse through the array elements. It
prints all array elements one after another. We can understand it with the program below.

#include <stdio.h>

int main(void)
{
    int Arr[5] = {18, 30, 15, 70, 12};
    int i;
    printf("Elements of the array are:\n");
    for (i = 0; i < 5; i++)
    {
        printf("Arr[%d] = %d, ", i, Arr[i]);
    }
    return 0;
}

Output

Elements of the array are:
Arr[0] = 18, Arr[1] = 30, Arr[2] = 15, Arr[3] = 70, Arr[4] = 12,
Insertion operation: This operation is performed to insert one or more elements into the
array. As per the requirements, an element can be added at the beginning, the end, or any
index of the array. Now, let's see an implementation of inserting an element into the array.

#include <stdio.h>

int main(void)
{
    int arr[20] = { 18, 30, 15, 70, 12 };
    int i, x, pos, n = 5;

    printf("Array elements before insertion\n");
    for (i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");

    x = 50;   // element to be inserted
    pos = 4;  // 1-based position at which to insert
    n++;

    // shift elements from the insertion point one place to the right
    for (i = n - 1; i >= pos; i--)
        arr[i] = arr[i - 1];
    arr[pos - 1] = x;

    printf("Array elements after insertion\n");
    for (i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
    return 0;
}

Output

Array elements before insertion
18 30 15 70 12
Array elements after insertion
18 30 15 50 70 12
Deletion operation: As the name implies, this operation removes an element from the array
and then reorganizes the remaining elements.

#include <stdio.h>

int main(void)
{
    int arr[] = {18, 30, 15, 70, 12};
    int k = 30, n = 5;   // k is the value to delete
    int i, j;

    printf("Given array elements are :\n");
    for (i = 0; i < n; i++)
    {
        printf("arr[%d] = %d, ", i, arr[i]);
    }

    // find the index of the element to delete
    for (j = 0; j < n; j++)
        if (arr[j] == k)
            break;

    // shift every following element one position to the left
    while (j < n - 1)
    {
        arr[j] = arr[j + 1];
        j = j + 1;
    }
    n = n - 1;

    printf("\nElements of array after deletion:\n");
    for (i = 0; i < n; i++)
    {
        printf("arr[%d] = %d, ", i, arr[i]);
    }
    return 0;
}

Output

Given array elements are :
arr[0] = 18, arr[1] = 30, arr[2] = 15, arr[3] = 70, arr[4] = 12,
Elements of array after deletion:
arr[0] = 18, arr[1] = 15, arr[2] = 70, arr[3] = 12,
Search operation: This operation is performed to search for an element in the array by
value (or by index).

#include <stdio.h>

int main(void)
{
    int arr[5] = {18, 30, 15, 70, 12};
    int item = 70, i, j = 0;

    printf("Given array elements are :\n");
    for (i = 0; i < 5; i++)
    {
        printf("arr[%d] = %d, ", i, arr[i]);
    }
    printf("\nElement to be searched = %d", item);

    while (j < 5)
    {
        if (arr[j] == item)
            break;
        j = j + 1;
    }

    if (j == 5)
        printf("\nElement %d is not present in the array", item);
    else
        printf("\nElement %d is found at position %d", item, j + 1);
    return 0;
}

Output

Given array elements are :
arr[0] = 18, arr[1] = 30, arr[2] = 15, arr[3] = 70, arr[4] = 12,
Element to be searched = 70
Element 70 is found at position 4
Update operation: This operation is performed to update an existing array element at a
given index.

#include <stdio.h>

int main(void)
{
    int arr[5] = {18, 30, 15, 70, 12};
    int item = 50, i, pos = 3;   // replace the element at 1-based position 3

    printf("Given array elements are :\n");
    for (i = 0; i < 5; i++)
    {
        printf("arr[%d] = %d, ", i, arr[i]);
    }

    arr[pos - 1] = item;

    printf("\nArray elements after updation :\n");
    for (i = 0; i < 5; i++)
    {
        printf("arr[%d] = %d, ", i, arr[i]);
    }
    return 0;
}

Output

Given array elements are :
arr[0] = 18, arr[1] = 30, arr[2] = 15, arr[3] = 70, arr[4] = 12,
Array elements after updation :
arr[0] = 18, arr[1] = 30, arr[2] = 50, arr[3] = 70, arr[4] = 12,
Array / List-based representation and operations: The complexity of array operations

Time and space complexity of various array operations are described in the following table.

Time Complexity

Operation   Average Case   Worst Case
Access      O(1)           O(1)
Search      O(n)           O(n)
Insertion   O(n)           O(n)
Deletion    O(n)           O(n)

Space Complexity: For an array, the worst-case space complexity is O(n).

Advantages of Array
•An array provides a single name for a group of variables of the same type, so it is easy
to refer to all the elements of the collection.
•Traversing an array is a very simple process; we just need to increment the base address
of the array to visit each element one by one.
•Any element in the array can be accessed directly by using its index.

Disadvantages of Array
•An array is homogeneous: only elements of the same data type can be stored in it.
•An array uses static memory allocation, i.e., the size of the array cannot be altered.
•There will be wastage of memory if we store fewer elements than the declared size.

Conclusion: In this session, we have discussed the array data structure and the basic
operations performed on it. Arrays provide a unique way to structure the stored data such
that it can be easily accessed and queried to fetch a value using the index.
What is Searching in Data Structure?

Searching in data structure refers to the process of finding the required information from a collection
of items stored as elements in the computer memory. These sets of items are in different forms,
such as an array, linked list, graph, or tree. Another way to define searching in the data structures is
by locating the desired element of specific characteristics in a collection of items.

Based on the type of search operation, these algorithms are generally classified into two categories:

Sequential Search:

In this, the list or array is traversed sequentially, and every element is checked. For example: Linear
Search.

Interval Search:

These algorithms are specifically designed for searching in sorted data structures. These
types of searching algorithms are much more efficient than linear search, as they repeatedly
target the center of the search structure and divide the search space in half. For example:
Binary Search.
Linear Search Algorithm

Linear search is also called the sequential search algorithm. It is the simplest searching
algorithm. In linear search, we simply traverse the list completely and match each element
of the list with the item whose location is to be found. If a match is found, then the
location of the item is returned; otherwise, the algorithm returns -1.

It is widely used to search an element in an unordered list, i.e., a list in which items
are not sorted. The worst-case time complexity of linear search is O(n).

The steps used in the implementation of Linear Search are listed as follows -

• First, we have to traverse the array elements using a for loop.


• In each iteration of for loop, compare the search element with the current array element, and -
 If the element matches, then return the index of the corresponding array element.
 If the element does not match, then move to the next element.
• If there is no match or the search element is not present in the given array, return -1.
Linear Search Algorithm

Linear_Search(a, n, val)  // 'a' is the given array, 'n' is the size of the given array,
                          // 'val' is the value to search

Step 1: set pos = -1
Step 2: set i = 1
Step 3: repeat step 4 while i <= n
Step 4:   if a[i] == val
              set pos = i
              print pos
              go to step 6
          [end of if]
          set i = i + 1
          [end of loop]
Step 5: if pos = -1
              print "value is not present in the array"
          [end of if]
Step 6: exit

Working of linear search: Now, let's see the working of the linear search algorithm. To
understand it, let's take an unsorted array (the slide's array figure is not reproduced
here). Let the element to be searched be K = 41.

Now, start from the first element and compare K with each element of the array. The value
of K, i.e., 41, does not match the first element of the array, so we move to the next
element, and we follow the same process until the matching element is found.
Now, when the element being searched is found, the algorithm returns the index of the
matched element.

Linear Search complexity: Now, let's see the time complexity of linear search in the best
case, average case, and worst case. We will also see the space complexity of linear search.

1. Time Complexity

Case           Time Complexity
Best Case      O(1)
Average Case   O(n)
Worst Case     O(n)

•Best Case Complexity - In linear search, the best case occurs when the element we are
searching for is at the first position of the array. The best-case time complexity of
linear search is O(1).
•Average Case Complexity - The average case time complexity of linear search is O(n).
•Worst Case Complexity - In linear search, the worst case occurs when the element we are
looking for is at the end of the array, or when the target element is not present in the
given array at all and we have to traverse the entire array. The worst-case time complexity
of linear search is O(n).

The time complexity of linear search is O(n) because every element in the array is
compared only once.
Linear Search Algorithm

2. Space Complexity

Space Complexity   O(1)

Implementation of Linear Search: Now, let's see the program for linear search.

Program: Write a program to implement linear search in C language.

#include <stdio.h>

int Linear_search(int arr[], int n, int key)
{
    int i;
    for (i = 0; i < n; i++)
        if (arr[i] == key)   // key is the element to be searched
            return i;
    return -1;               // the element is not present in the array
}

int main(void)
{
    int arr[] = { 5, 3, 6, 2, 20, 7 };
    int key = 20;
    int n = sizeof(arr) / sizeof(arr[0]);
    int result = Linear_search(arr, n, key);
    if (result == -1)
        printf("Element is not present in array");
    else
        printf("Element is present at index %d", result);
    return 0;
}
Binary Search is a searching algorithm used on a sorted array; it works by repeatedly
dividing the search interval in half. The idea of binary search is to use the information
that the array is sorted to reduce the time complexity to O(log n).

Binary Search Algorithm: The basic steps to perform Binary Search are:
•Begin with the mid element of the whole array as the search position.
•If the value of the search key is equal to the item, then return the index of the search
position.
•Or, if the value of the search key is less than the item in the middle of the interval,
narrow the interval to the lower half.
•Otherwise, narrow it to the upper half.
•Repeat from the second point until the value is found or the interval is empty.

Binary Search can be implemented in the following two ways: iterative and recursive.
// Iterative Binary Search in C
#include <stdio.h>

int binarySearch(int array[], int x, int low, int high)
{
    // Repeat until the pointers low and high meet each other
    while (low <= high)
    {
        int mid = low + (high - low) / 2;

        if (array[mid] == x)
            return mid;

        if (array[mid] < x)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}

int main(void)
{
    int array[] = {3, 4, 5, 6, 7, 8, 9};
    int x = 4;
    int n = sizeof(array) / sizeof(array[0]);
    int result = binarySearch(array, x, 0, n - 1);
    if (result == -1)
        printf("Not found");
    else
        printf("Element is found at index %d", result);
    return 0;
}

// Recursive Binary Search in C
#include <stdio.h>

int binarySearch(int arr[], int l, int r, int x)
{
    if (r >= l)
    {
        int mid = l + (r - l) / 2;

        // If the element is present at the middle itself
        if (arr[mid] == x)
            return mid;

        // If the element is smaller than mid, it must be in the left subarray
        if (arr[mid] > x)
            return binarySearch(arr, l, mid - 1, x);

        // Else the element can only be present in the right subarray
        return binarySearch(arr, mid + 1, r, x);
    }
    // We reach here when the element is not present in the array
    return -1;
}

int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int x = 10;
    int n = sizeof(arr) / sizeof(arr[0]);
    int result = binarySearch(arr, 0, n - 1, x);
    (result == -1)
        ? printf("Element is not present in array")
        : printf("Element is present at index %d", result);
    return 0;
}
Illustration of Binary Search Algorithm: (the slide's worked figure is not reproduced here)

Step-by-step Binary Search Algorithm: We basically ignore half of the elements after just
one comparison.
1. Compare x with the middle element.
2. If x matches the middle element, we return the mid index.
3. Else, if x is greater than the mid element, then x can only lie in the right half
subarray after the mid element. So we recur for the right half.
4. Else (x is smaller), recur for the left half.

Complexity Analysis of Binary Search: the time complexity is O(log n); the auxiliary space
is O(1) for the iterative version and O(log n) for the recursive version (call stack).
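
A short standard derivation (not from the slides) of the O(log n) bound: each comparison
discards half of the remaining elements, so the running time satisfies

T(n) = T(n/2) + c, with T(1) = c
     = T(n/4) + 2c = ... = c * (log2 n + 1) = O(log n)

which is why the 20000-page roll-number example needs only about 15 probes (2^15 > 20000).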
Sorting Techniques: Insertion Sort

• Insertion sort is a sorting algorithm that places an unsorted element at its suitable
place in each iteration.

• Insertion sort works similarly to the way we sort cards in our hands in a card game. We
assume that the first card is already sorted; then we select an unsorted card. If the
unsorted card is greater than the card in hand, it is placed to the right; otherwise, to
the left. In the same way, the other unsorted cards are taken and put in their right place.

• A similar approach is used by insertion sort.

Characteristics of Insertion Sort:

•This algorithm is one of the simplest algorithms, with a simple implementation.
•Insertion sort is efficient for small data sets.
•Insertion sort is adaptive in nature, i.e., it is appropriate for data sets that are
already partially sorted.

How Does Insertion Sort Work?

Suppose we need to sort the following array (the slide's figure is not reproduced here).

1. The first element in the array is assumed to be sorted. Take the second element and
store it separately in key. Compare the key with the first element. If the first element
is greater than the key, then the key is placed in front of the first element.
Sorting Techniques: Insertion Sort (cont.)

2. Now, the first two elements are sorted. Take the third element and compare it with the
elements to the left of it. Place it just behind the element smaller than it. If there is
no element smaller than it, then place it at the beginning of the array.

3. Similarly, place every unsorted element at its correct position.
Sorting Techniques: Insertion Sort (cont.)

// Insertion sort in C
#include <stdio.h>

void insertionSort(int array[], int size)
{
    for (int step = 1; step < size; step++)
    {
        int key = array[step];
        int j = step - 1;

        // Compare key with each element on the left of it until an element
        // smaller than it is found.
        // For descending order, change key < array[j] to key > array[j].
        // (j >= 0 must be checked first to avoid reading array[-1].)
        while (j >= 0 && key < array[j])
        {
            array[j + 1] = array[j];
            --j;
        }
        array[j + 1] = key;
    }
}

// Function to print an array
void printArray(int array[], int size)
{
    for (int i = 0; i < size; i++)
    {
        printf("%d ", array[i]);
    }
    printf("\n");
}

// Driver code
int main()
{
    int data[] = {9, 5, 1, 4, 3};
    int size = sizeof(data) / sizeof(data[0]);
    insertionSort(data, size);
    printf("Sorted array in ascending order:\n");
    printArray(data, size);
    return 0;
}
Sorting Techniques: Insertion Sort (cont.)

# Insertion sort in Python
def insertionSort(array):
    for step in range(1, len(array)):
        key = array[step]
        j = step - 1

        # Compare key with each element on the left of it until an
        # element smaller than it is found.
        # For descending order, change key < array[j] to key > array[j].
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]
            j = j - 1

        # Place key after the element just smaller than it.
        array[j + 1] = key

data = [9, 5, 1, 4, 3]
insertionSort(data)
print('Sorted Array in Ascending Order:')
print(data)
Sorting Techniques: Bubble Sort

Bubble Sort is the simplest sorting algorithm; it works by repeatedly swapping adjacent
elements if they are in the wrong order. This algorithm is not suitable for large data
sets, as its average and worst-case time complexity is quite high.

Working of Bubble Sort: Suppose we are trying to sort the elements in ascending order.

1. First Iteration (Compare and Swap)
 Starting from the first index, compare the first and the second elements.
 If the first element is greater than the second element, they are swapped.
 Now, compare the second and the third elements. Swap them if they are not in order.
 The above process goes on until the last element.

2. Remaining Iterations: The same process goes on for the remaining iterations. After each
iteration, the largest element among the unsorted elements is placed at the end.
Sorting Techniques: Bubble Sort (cont.)

In each iteration, the comparison takes place up to the last unsorted element. The array
is sorted when all the unsorted elements are placed at their correct positions.

// Bubble sort in C
#include <stdio.h>

// perform the bubble sort
void bubbleSort(int array[], int size)
{
    // loop to access each array element
    for (int step = 0; step < size - 1; ++step)
    {
        // loop to compare array elements
        for (int i = 0; i < size - step - 1; ++i)
        {
            // compare two adjacent elements; change > to < to sort in descending order
            if (array[i] > array[i + 1])
            {
                // swapping occurs if elements are not in the intended order
                int temp = array[i];
                array[i] = array[i + 1];
                array[i + 1] = temp;
            }
        }
    }
}

// print array
void printArray(int array[], int size)
{
    for (int i = 0; i < size; ++i)
    {
        printf("%d ", array[i]);
    }
    printf("\n");
}

int main()
{
    int data[] = {-2, 45, 0, 11, -9};
    // find the array's length
    int size = sizeof(data) / sizeof(data[0]);
    bubbleSort(data, size);
    printf("Sorted Array in Ascending Order:\n");
    printArray(data, size);
    return 0;
}
# Bubble sort in Python
def bubbleSort(array):
    # loop to access each array element
    for i in range(len(array)):
        # loop to compare array elements
        for j in range(0, len(array) - i - 1):
            # compare two adjacent elements
            # change > to < to sort in descending order
            if array[j] > array[j + 1]:
                # swapping elements if elements are not in the intended order
                temp = array[j]
                array[j] = array[j + 1]
                array[j + 1] = temp

data = [-2, 45, 0, 11, -9]
bubbleSort(data)
print('Sorted Array in Ascending Order:')
print(data)
Sorting Techniques: Selection Sort

In selection sort, the smallest value among the unsorted elements of the array is selected
in every pass and inserted at its appropriate position in the array. It is one of the
simplest sorting algorithms, and it is an in-place comparison sort.

In this algorithm, the array is divided into two parts: the sorted part and the unsorted
part. Initially, the sorted part of the array is empty, and the unsorted part is the given
array. The sorted part is on the left, while the unsorted part is on the right.

In selection sort, the smallest element is selected from the unsorted array and placed at
the first position. After that, the second smallest element is selected and placed at the
second position. The process continues until the array is entirely sorted.

Selection sort is generally used when –

•A small array is to be sorted
•Swapping cost doesn't matter
•It is compulsory to check all elements
Sorting Techniques: Selection Sort (cont.): Working of Selection Sort

1. Set the first element as minimum.

2. Compare the minimum with the second element. If the second element is smaller than the
minimum, assign the second element as the minimum. Then compare the minimum with the third
element. Again, if the third element is smaller, assign it as the minimum; otherwise do
nothing. The process goes on until the last element.

3. After each iteration, the minimum is placed at the front of the unsorted list.

4. For each iteration, indexing starts from the first unsorted element. Steps 1 to 3 are
repeated until all the elements are placed at their correct positions.
Sorting Techniques: Selection Sort (cont.): Code

// Selection sort in C
#include <stdio.h>

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

void selectionSort(int array[], int size)
{
    for (int step = 0; step < size - 1; step++)
    {
        int min_idx = step;
        for (int i = step + 1; i < size; i++)
        {
            // select the minimum element in each loop
            // (to sort in descending order, change < to > in this line)
            if (array[i] < array[min_idx])
                min_idx = i;
        }
        // put min at the correct position
        swap(&array[min_idx], &array[step]);
    }
}

void printArray(int array[], int size)
{
    for (int i = 0; i < size; ++i)
    {
        printf("%d ", array[i]);
    }
    printf("\n");
}

int main()
{
    int data[] = {20, 12, 10, 15, 2};
    int size = sizeof(data) / sizeof(data[0]);
    selectionSort(data, size);
    printf("Sorted array in Ascending Order:\n");
    printArray(data, size);
    return 0;
}

# Selection sort in Python
def selectionSort(array, size):
    for step in range(size):
        min_idx = step
        for i in range(step + 1, size):
            # select the minimum element in each loop
            # (to sort in descending order, change < to > in this line)
            if array[i] < array[min_idx]:
                min_idx = i
        # put min at the correct position
        (array[step], array[min_idx]) = (array[min_idx], array[step])

data = [-2, 45, 0, 11, -9]
size = len(data)
selectionSort(data, size)
print('Sorted Array in Ascending Order:')
print(data)
Sorting Techniques: Selection Sort (cont.)

Time Complexity: The time complexity of selection sort is the same in all cases: O(n^2) in
the best, average, and worst case (see the derivation sketch below). At every step, you
have to find the minimum element and put it in the right place, and the minimum element is
not known until the end of the array is reached.

Space Complexity: The space complexity is O(1), because only one extra variable (temp) is
used.

Selection Sort Applications

Selection sort is used when
•a small list is to be sorted
•the cost of swapping does not matter
•checking all the elements is compulsory
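
A short worked count (standard analysis, not from the slides) behind the O(n^2) figure:
pass k scans the remaining n-k elements to find the minimum, so the total number of
comparisons is

(n-1) + (n-2) + ... + 1 = n(n-1)/2 = O(n^2)

and this count is identical whether the input is sorted, reversed, or random, which is
why all three cases coincide.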
Sorting Techniques: Merge Sort

Merge Sort is one of the most popular sorting algorithms, based on the principle of the
Divide and Conquer strategy. Here, a problem is divided into multiple sub-problems, each
sub-problem is solved individually, and finally the sub-problems are combined to form the
final solution.

Merge Sort Working Process: Think of it as a recursive algorithm that continuously splits
the array in half until it cannot be divided further. This means that if the array becomes
empty or has only one element left, the dividing stops, i.e., that is the base case of the
recursion. If the array has multiple elements, split the array into halves and recursively
invoke merge sort on each of the halves. Finally, when both halves are sorted, the merge
operation is applied. The merge operation takes two smaller sorted arrays and combines
them into a larger one.

Illustration: To understand merge sort, let's consider an array arr[] = {38, 27, 43, 3, 9, 82, 10}.

•At first, check if the left index of the array is less than the right index; if yes,
calculate its midpoint.
•Now, as we already know, merge sort first divides the whole array iteratively into equal
halves, until the atomic values are reached.
•Here, we see that an array of 7 items is divided into two arrays of sizes 4 and 3
respectively.
•Now, again check whether the left index is less than the right index for both subarrays;
if so, again calculate the midpoints of both arrays.
Sorting Techniques: Merge Sort (cont.)

•Now, further divide these two arrays into halves, until the atomic units of the array are
reached and further division is not possible.
•After dividing the array into its smallest units, start merging the elements again, based
on comparison of the elements.
•First, compare the elements of each pair of lists, then combine them into another list in
sorted order.
•After the final merging, the list is fully sorted.

The slide's diagram (not reproduced here) shows the complete merge sort process for the
example array {38, 27, 43, 3, 9, 82, 10}. If we take a closer look at the diagram, we can
see that the array is recursively divided into two halves until the size becomes 1. Once
the size becomes 1, the merge processes come into action and start merging arrays back
until the complete array is merged.
/* C program for Merge Sort */
#include <stdio.h>

/* Merges two subarrays of arr[].
   First subarray is arr[l..m].
   Second subarray is arr[m+1..r]. */
void merge(int arr[], int l, int m, int r)
{
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;
    int L[n1], R[n2]; /* create temp arrays */

    /* Copy data to temp arrays L[] and R[] */
    for (i = 0; i < n1; i++) L[i] = arr[l + i];
    for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j];

    /* Merge the temp arrays back into arr[l..r] */
    i = 0; // Initial index of first subarray
    j = 0; // Initial index of second subarray
    k = l; // Initial index of merged subarray
    while (i < n1 && j < n2)
    {
        if (L[i] <= R[j])
        {
            arr[k] = L[i]; i++;
        }
        else
        {
            arr[k] = R[j]; j++;
        }
        k++;
    }

    /* Copy the remaining elements of L[], if there are any */
    while (i < n1)
    {
        arr[k] = L[i]; i++; k++;
    }

    /* Copy the remaining elements of R[], if there are any */
    while (j < n2)
    {
        arr[k] = R[j]; j++; k++;
    }
}

/* l is for left index and r is right index of the sub-array of arr to be sorted */
void mergeSort(int arr[], int l, int r)
{
    if (l < r)
    {
        // Same as (l+r)/2, but avoids overflow for large l and r
        int m = l + (r - l) / 2;

        // Sort first and second halves
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);
        merge(arr, l, m, r);
    }
}

/* UTILITY FUNCTIONS */
/* Function to print an array */
void printArray(int A[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", A[i]);
    printf("\n");
}

/* Driver code */
int main()
{
    int arr[] = { 12, 11, 13, 5, 6, 7 };
    int arr_size = sizeof(arr) / sizeof(arr[0]);

    printf("Given array is \n");
    printArray(arr, arr_size);

    mergeSort(arr, 0, arr_size - 1);

    printf("\nSorted array is \n");
    printArray(arr, arr_size);
    return 0;
}
# Python program for implementation of MergeSort
def mergeSort(arr):
    if len(arr) > 1:
        mid = len(arr) // 2
        L = arr[:mid]
        R = arr[mid:]

        # Sorting the first half
        mergeSort(L)
        # Sorting the second half
        mergeSort(R)

        i = j = k = 0

        # Copy data back from temp arrays L[] and R[]
        while i < len(L) and j < len(R):
            if L[i] < R[j]:
                arr[k] = L[i]
                i += 1
            else:
                arr[k] = R[j]
                j += 1
            k += 1

        # Checking if any element was left
        while i < len(L):
            arr[k] = L[i]
            i += 1
            k += 1
        while j < len(R):
            arr[k] = R[j]
            j += 1
            k += 1

# Code to print the list
def printList(arr):
    for i in range(len(arr)):
        print(arr[i], end=" ")
    print()

# Driver Code
if __name__ == '__main__':
    arr = [12, 11, 13, 5, 6, 7]
    print("Given array is", end="\n")
    printList(arr)
    mergeSort(arr)
    print("Sorted array is: ", end="\n")
    printList(arr)
Time Complexity: O(N log N) in all cases. Merge Sort is a recursive algorithm, and its
time complexity can be expressed by the following recurrence relation:

T(n) = 2T(n/2) + θ(n)

The above recurrence can be solved either using the Recurrence Tree method or the Master
method. It falls into case II of the Master method, and the solution of the recurrence is
θ(N log N).

The time complexity of Merge Sort is θ(N log N) in all 3 cases (worst, average, and best),
as merge sort always divides the array into two halves and takes linear time to merge the
two halves.

Auxiliary Space: O(N). In merge sort, all elements are copied into an auxiliary array, so
N auxiliary space is required.
Sorting Techniques: Quick Sort

Like merge sort, quick sort is also based on the divide and conquer approach. It picks an
element as a pivot and partitions the given array around the picked pivot. There are many
different versions of quick sort that pick the pivot in different ways:

•Always pick the first element as the pivot.
•Always pick the last element as the pivot (implemented below).
•Pick a random element as the pivot.
•Pick the median as the pivot.

The key process in quick sort is partition(). The target of partition is: given an array
and an element x of the array as the pivot, put x at its correct position in the sorted
array, put all smaller elements (smaller than x) before x, and put all greater elements
(greater than x) after x. All of this should be done in linear time.

Partition Algorithm: There can be many ways to do the partition; the following pseudo-code
adopts the method given in the CLRS book. The logic is simple: we start from the leftmost
element and keep track of the index of smaller (or equal) elements as i. While traversing,
if we find a smaller element, we swap the current element with arr[i]; otherwise, we
ignore the current element.
Sorting Techniques: Working of Quick Sort

Let the elements of the array be as in the slide's figure (not reproduced here). In the
given array, we consider the leftmost element as the pivot.

So, in this case, a[left] = 24, a[right] = 27 and a[pivot] = 24. Since the pivot is at the
left, the algorithm starts from the right and moves toward the left.

Now, a[pivot] < a[right], so the algorithm moves one position towards the left, i.e. -

Now, a[left] = 24, a[right] = 19, and a[pivot] = 24. Because a[pivot] > a[right], the
algorithm swaps a[pivot] with a[right], and the pivot moves to the right, as -

Now, a[left] = 19, a[right] = 24, and a[pivot] = 24. Since the pivot is at the right, the
algorithm starts from the left and moves to the right. As a[pivot] > a[left], the
algorithm moves one position to the right, as -

Now, a[left] = 9, a[right] = 24, and a[pivot] = 24. As a[pivot] > a[left], the algorithm
moves one position to the right, as -

Now, a[left] = 29, a[right] = 24, and a[pivot] = 24. As a[pivot] < a[left], swap a[pivot]
and a[left]; the pivot is now at the left, i.e. -
Sorting Techniques: Working of Quick Sort (cont.)

Since the pivot is at the left, the algorithm starts from the right and moves to the left.
Now, a[left] = 24, a[right] = 29, and a[pivot] = 24. As a[pivot] < a[right], the algorithm
moves one position to the left, as -

Now, a[pivot] = 24, a[left] = 24, and a[right] = 14. As a[pivot] > a[right], swap a[pivot]
and a[right]; the pivot is now at the right, i.e. -

Now, a[pivot] = 24, a[left] = 14, and a[right] = 24. The pivot is at the right, so the
algorithm starts from the left and moves to the right.

Now, a[pivot] = 24, a[left] = 24, and a[right] = 24. So pivot, left, and right are all
pointing to the same element; this marks the termination of the procedure. Element 24, the
pivot element, is placed at its exact position: the elements to the right of 24 are greater
than it, and the elements to the left of 24 are smaller than it.

Now, in a similar manner, the quick sort algorithm is separately applied to the left and
right sub-arrays. After sorting is done, the array will be fully sorted.

Case            Time Complexity
Best Case       O(n*logn)
Average Case    O(n*logn)
Worst Case      O(n^2)

A short recurrence sketch for these bounds follows.
// Quick sort in C
#include <stdio.h>

// function to swap elements
void swap(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}

// function to find the partition position
int partition(int array[], int low, int high)
{
    int pivot = array[high]; // select the rightmost element as pivot
    int i = (low - 1);       // pointer for greater element

    // traverse each element of the array and compare them with the pivot
    for (int j = low; j < high; j++)
    {
        if (array[j] <= pivot)
        {
            // if an element smaller than the pivot is found,
            // swap it with the greater element pointed by i
            i++;
            swap(&array[i], &array[j]);
        }
    }
    // swap the pivot element with the greater element at i
    swap(&array[i + 1], &array[high]);
    return (i + 1); // return the partition point
}

void quickSort(int array[], int low, int high)
{
    if (low < high)
    {
        // find the pivot element such that elements smaller than the pivot are on
        // the left of the pivot and elements greater than the pivot are on its right
        int pi = partition(array, low, high);
        quickSort(array, low, pi - 1);  // recursive call on the left of pivot
        quickSort(array, pi + 1, high); // recursive call on the right of pivot
    }
}

// function to print array elements
void printArray(int array[], int size)
{
    for (int i = 0; i < size; ++i)
        printf("%d ", array[i]);
    printf("\n");
}

int main()
{
    int data[] = {8, 7, 2, 1, 0, 9, 6};
    int n = sizeof(data) / sizeof(data[0]);
    printf("Unsorted Array\n");
    printArray(data, n);
    quickSort(data, 0, n - 1); // perform quicksort on data
    printf("Sorted array in ascending order: \n");
    printArray(data, n);
    return 0;
}
# Quick sort in Python

# function to find the partition position
def partition(array, low, high):
    # choose the rightmost element as pivot
    pivot = array[high]
    # pointer for greater element
    i = low - 1
    # traverse through all elements, compare each element with the pivot
    for j in range(low, high):
        if array[j] <= pivot:
            # if an element smaller than the pivot is found,
            # swap it with the greater element pointed by i
            i = i + 1
            (array[i], array[j]) = (array[j], array[i])
    # swap the pivot element with the greater element specified by i
    (array[i + 1], array[high]) = (array[high], array[i + 1])
    # return the position from where partition is done
    return i + 1

# function to perform quicksort
def quickSort(array, low, high):
    if low < high:
        # find the pivot element such that elements smaller than the pivot
        # are on the left and elements greater than the pivot are on the right
        pi = partition(array, low, high)
        # recursive call on the left of pivot
        quickSort(array, low, pi - 1)
        # recursive call on the right of pivot
        quickSort(array, pi + 1, high)

data = [8, 7, 2, 1, 0, 9, 6]
print("Unsorted Array")
print(data)

size = len(data)
quickSort(data, 0, size - 1)

print('Sorted Array in Ascending Order:')
print(data)

Quicksort Applications
The quicksort algorithm is used when
•the programming language is good for recursion
•time complexity matters
•space complexity matters
// Quick sort in C (first element as pivot)
#include <stdio.h>

void quicksort(int number[25], int first, int last)
{
    int i, j, pivot, temp;
    if (first < last)
    {
        pivot = first; i = first; j = last;
        while (i < j)
        {
            while (number[i] <= number[pivot] && i < last)
                i++;
            while (number[j] > number[pivot])
                j--;
            if (i < j)
            {
                temp = number[i];
                number[i] = number[j];
                number[j] = temp;
            }
        }
        temp = number[pivot];
        number[pivot] = number[j];
        number[j] = temp;
        quicksort(number, first, j - 1);
        quicksort(number, j + 1, last);
    }
}

int main()
{
    int i, count, number[25];
    printf("How many elements are u going to enter?: ");
    scanf("%d", &count);
    printf("Enter %d elements: ", count);
    for (i = 0; i < count; i++)
        scanf("%d", &number[i]);
    quicksort(number, 0, count - 1);
    printf("Order of Sorted elements: ");
    for (i = 0; i < count; i++)
        printf(" %d", number[i]);
    return 0;
}
UNIT-1
Overview of Python, Concept of Class, and objects; NumPy: The Basics of NumPy Arrays,
Aggregations; Pandas: Pandas Objects, Data Indexing and Selection; Visualisation: Simple Line Plots,
Simple Scatter Plots, Histograms, Binnings, and Density.

OVERVIEW OF PYTHON:


Python is a high-level, interpreted programming language known for its simplicity and readability.
It was created by Guido van Rossum and first released in 1991. Python has gained immense
popularity in various domains, including web development, data science, artificial intelligence,
scientific computing, and more.
Some features:
1. Readability: Python's syntax emphasizes code readability, making it easier for developers
to write and maintain code.
2. Versatility: Python is a versatile language with a vast standard library and numerous third-
party packages, making it suitable for a wide range of applications.
3. Interpreted: Python is an interpreted language, which means you can write and execute
code without a separate compilation step.
4. Object-Oriented: Python is an object-oriented programming (OOP) language, which
means it supports the creation and manipulation of objects, providing a clean and organized
way to structure code.
We already know there are five OOP concepts; let us discuss them:
1) Class:
A class is a blueprint or a template for creating objects. It defines the attributes (data
members) and behaviors (methods) that its objects will have. Classes are used to model
real-world entities and encapsulate their properties and actions into a single unit. In Python,
you can define a class using the class keyword.

Syntax:
class ClassName:
# Class attributes (optional)
class_attribute = "I am a class attribute"
Example:
class Person:
def __init__(self, name, age):
self.name = name
self.age = age

def greet(self):
print(f"Hello, my name is {self.name} and I am {self.age} years old.")
# Creating an instance of the Person class
john = Person("John Doe", 30)
# Accessing attributes and calling methods
print(john.name) # Output: John Doe
print(john.age) # Output: 30
john.greet() # Output: Hello, my name is John Doe and I am 30 years old.

In this example, we define a class Person with an __init__ method (which is called a constructor)
that initializes the name and age attributes. We also define a greet method which prints a greeting
using the attributes.
2) Objects:
An object is an instance of a class. It is a concrete realization of the class blueprint, with
its own set of attributes and the ability to perform actions defined by the class's methods.
You can create multiple objects from the same class, and each object will have its unique
state.
Syntax:
# Creating objects (instances) of the class
object1 = ClassName(arg1, arg2)
object2 = ClassName(arg3, arg4)
In the example above, john is an object of the Person class. It has its own name and age
attributes.
Remaining Key OOP Concepts:
3) Inheritance:
Classes can inherit attributes and methods from other classes, allowing for code reuse and
the creation of hierarchies.
4) Encapsulation:
Encapsulation refers to the practice of bundling the data (attributes) and the methods that
operate on that data into a single unit (class). It helps in data hiding and maintaining the
integrity of the object.
5) Polymorphism:
Polymorphism allows objects of different classes to be treated as objects of a common
superclass. This concept enables flexibility and extensibility in your code.
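A short sketch tying these three concepts together (the Student subclass below is a hypothetical
example for illustration, not part of the code above):

class Person:
    def __init__(self, name, age):
        self.name = name      # encapsulation: state is bundled inside the object
        self.age = age

    def greet(self):
        print(f"Hello, my name is {self.name} and I am {self.age} years old.")

class Student(Person):        # inheritance: Student reuses Person's attributes and methods
    def __init__(self, name, age, roll):
        super().__init__(name, age)
        self.roll = roll

    def greet(self):          # overriding greet() enables polymorphism
        print(f"Hi, I am student {self.name} (roll number {self.roll}).")

# Polymorphism: the same greet() call behaves differently depending on the object
for p in [Person("John Doe", 30), Student("Rakesh", 20, 42)]:
    p.greet()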
Differences between a class and an object:

• Definition: a class is a blueprint or template for creating objects; an object is a concrete
  instance created from a class.
• Purpose: a class defines the attributes and methods that its objects will have; an object
  represents a specific item or entity with its own data and behavior.
• Example: class Car: (a class) versus my_car = Car() or person1 = Person("rakesh") (objects).
• Attributes: a class describes what data the objects will store; an object contains actual data
  specific to that object.
• Methods: a class describes what actions objects can perform; an object executes those specific
  behaviors.
• Multiple instances: you can create multiple objects from the same class; each object is a
  distinct instance with its own data.

Numpy:
NumPy (Numerical Python) is a fundamental package for numerical computations in Python. It
provides support for arrays (both 1-dimensional and multi-dimensional), as well as a large
collection of high-level mathematical functions to operate on these arrays.
Or
NumPy is a fundamental library in Python used for numerical and scientific computing. It provides
support for large, multi-dimensional arrays and matrices, along with a collection of mathematical
functions to operate on these arrays. NumPy is widely used in fields like data analysis, machine
learning, and scientific research due to its efficiency and versatility.

Note: import NumPy using np as an alias: import numpy as np


Simple syntax:
import numpy as np # Importing NumPy with an alias
my_list = [1, 2, 3, 4, 5] # Creating a NumPy array from a list
my_array = np.array(my_list)
print(my_array)
We can use np.array to create arrays from Python lists.

An integer array:
In [8]: np.array([1, 4, 2, 5, 3])
Out[8]: array([1, 4, 2, 5, 3])

If the types do not match, NumPy will upcast where possible (here, integers are upcast to
floating point):
In [9]: np.array([3.14, 4, 2, 3])
Out[9]: array([ 3.14, 4. , 2. , 3. ])

Nested lists result in multidimensional arrays:
In [11]: np.array([range(i, i + 3) for i in [2, 4, 6]])
Out[11]: array([[2, 3, 4],
                [4, 5, 6],
                [6, 7, 8]])

In[12]: np.zeros(10, dtype=int) # Create a length-10 integer array filled with zeros

Out[12]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In[13]: np.ones((3, 5), dtype=float) # Create a 3x5 floating-point array filled with 1s
Out[13]: array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])

# Create a 3x5 array filled with 3.14


In[14]: np.full((3, 5), 3.14)
Out[14]: array([[ 3.14, 3.14, 3.14, 3.14, 3.14],
[ 3.14, 3.14, 3.14, 3.14, 3.14],
[ 3.14, 3.14, 3.14, 3.14, 3.14]])

Standard numpy data types:

type Description
• bool_ Boolean (True or False) stored as a byte
• int_ Default integer type (same as C long; normally either int64 or int32)
• intc Identical to C int (normally int32 or int64)
• intp Integer used for indexing (same as C ssize_t; normally either int32 or int64)
• int8 Byte (–128 to 127)
• int16 Integer (–32768 to 32767)
• int32 Integer (–2147483648 to 2147483647)
• int64 Integer (–9223372036854775808 to 9223372036854775807)
• uint8 Unsigned integer (0 to 255)
• uint16 Unsigned integer (0 to 65535)
• uint32 Unsigned integer (0 to 4294967295)
• uint64 Unsigned integer (0 to 18446744073709551615)
• float_ Shorthand for float64
• float16 Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
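As a quick illustration of these types (a small sketch reusing the construction routines shown
above), the dtype can be set explicitly when creating an array:

import numpy as np
a = np.zeros(10, dtype='int16')        # 16-bit integers
b = np.ones((2, 3), dtype=np.float16)  # half-precision floats
print(a.dtype)  # int16
print(b.dtype)  # float16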

The Basics of NumPy Arrays

1) Attributes of arrays
Definition: Determining the size, shape, memory consumption, and data types of arrays
Syntax:
“array_name.attribute_name”

Example 1:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Getting the shape of the array
dtype = arr.dtype # Getting the data type of the elements
size = arr.size # Number of elements in the array
nbytes = arr.nbytes # Total memory consumption

Explanation:
 shape returns a tuple representing the dimensions of the array (e.g., (2, 3) for a 2x3 array).
 dtype returns the data type of the elements in the array (e.g., int64 for 64-bit integers).

Example 2:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape) # Output: (2, 3)


print(arr.ndim) # Output: 2
print(arr.size) # Output: 6
print(arr.dtype) # Output: int64
2) Indexing of arrays
Definition: Getting and setting the value of individual array elements
Syntax:
array_name[index]

Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
element = arr[2] # Accessing the third element (index 2)

Explanation:
 You can access individual elements of the array by specifying their index inside square
brackets.

Example 2:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr[2]) # Output: 3

3) Slicing of arrays
Definition: Getting and setting smaller subarrays within a larger array
Syntax:
array_name[start:stop:step]

Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
subset = arr[1:4] # Slicing to get elements from index 1 to 3

Explanation:
 Slicing allows you to extract a portion of an array based on a start index, stop index
(exclusive), and an optional step size.

Example 2:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr[1:4]) # Output: [2 3 4]
4) Reshaping of arrays
Definition: Changing the shape of a given array
Syntax:
array_name.reshape(new_shape)

Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape((2, 3)) # Reshaping into a 2x3 array

Explanation:
 reshape() changes the shape of the array to the specified new_shape. The total number of
elements in the original and reshaped arrays must be the same.

Example 2:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
reshaped_arr = arr.reshape((3, 2))
print(reshaped_arr)
# Output:
# [[1 2]
# [3 4]
# [5 6]]

5) Joining and splitting of arrays


Definition: Combining multiple arrays into one, and splitting one array into many

Joining Arrays:
Syntax:
np.concatenate((array1, array2), axis=0)

Example 1:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
concatenated_arr = np.concatenate((arr1, arr2), axis=0) # Concatenating along the
first axis

Explanation:
 np.concatenate() combines two or more arrays along a specified axis. In the example, we
concatenate two 1-dimensional arrays along the first axis (rows).
Example 2:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = np.concatenate((arr1, arr2))
print(result) # Output: [1 2 3 4 5 6]

Splitting Arrays:
Syntax:
np.split(array, indices_or_sections, axis=0)

Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, 2) # Splitting into two equal parts

Explanation:
 np.split() splits an array into multiple subarrays along a specified axis. In the example, we
split a 1-dimensional array into two equal parts.

Example 2:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
result = np.split(arr, 3)
print(result)
# Output:
# [array([1, 2]), array([3, 4]), array([5, 6])]
Aggregations:
There are many aggregation functions, such as:
min(), max(), sum(), mean(), median(), argmin(), argmax(), percentile (Q1, Q2, Q3), and more.

Aggregation functions in NumPy allow you to perform calculations on arrays to summarize their
data. They are useful for gaining insights into data, extracting statistical information, and more.
Here are some common aggregation functions along with examples:

***important_Note: The way the axis is specified here can be confusing to users coming from
other languages. The axis keyword specifies the dimension of the array that will be collapsed,
rather than the dimension that will be returned. So specifying axis=0 means that the first axis will
be collapsed: for two-dimensional arrays, this means that values within each column will be
aggregated.***
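A small sketch of this axis behaviour on a 2x3 array:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr, axis=0))  # collapses the first axis: column sums -> [5 7 9]
print(np.sum(arr, axis=1))  # collapses the second axis: row sums -> [ 6 15]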
1. np.sum()
Syntax:
np.sum(array,axis =None)
Example:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
total_sum = np.sum(arr) # Sum of all elements in the array
print(total_sum) #output =21
Use:
 np.sum() computes the sum of all elements in an array.
 It can also calculate the sum along a specific axis (e.g., rows or columns) by specifying the
axis parameter.
2. np.mean()
Syntax:
np.mean(array, axis=None)
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
average = np.mean(arr) # Average of the elements
print (average) #output 3.0
Use:
 np.mean() calculates the mean (average) of the elements in an array.
 Similar to np.sum(), it can compute the mean along a specified axis.

3. np.min() and np.max()


Syntax:
np.min(array, axis=None) np.max(array, axis=None)
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
min_val = np.min(arr) # Minimum value in the array
max_val = np.max(arr) # Maximum value in the array
print (min_val,max_val) #output: 1 5

Use:
 np.min() finds the minimum value in an array.
 np.max() finds the maximum value in an array.
 Both functions can also work along a specific axis.
4. np.median()
Syntax:
np.median(array, axis=None)
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
median = np.median(arr) # Median of the elements
print(median) #output : 3.0
Use:
 np.median() calculates the median of the elements in an array.
 Median is the middle value in a sorted list of numbers and is useful for understanding the
central tendency of data.
5.np.argmin() and np.argmax()
These functions return the indices of the minimum and maximum values in an array, respectively.
Syntax:
np.argmin(array)
np.argmax(array)
Example:
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
min_index = np.argmin(arr) # Index of the minimum value
max_index = np.argmax(arr) # Index of the maximum value
print(f"Index of the minimum value: {min_index}")
print(f"Index of the maximum value: {max_index}")
output:
Index of the minimum value: 1
Index of the maximum value: 5

6.np.ptp()
The peak-to-peak (ptp) function calculates the range of values (maximum - minimum) in an array.
Syntax:
np.ptp(array)
Example:
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
range_val = np.ptp(arr) # Range of values
print(f"Range of values: {range_val}")
output:
Range of values: 8
7.np.percentile()
This function calculates the nth percentile of an array, which is a value below which a given
percentage of the data falls.
Syntax:
np.percentile(array, percentile)
Example:
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
percentile_25 = np.percentile(arr, 25) # 25th percentile
percentile_75 = np.percentile(arr, 75) # 75th percentile
print(f"25th percentile: {percentile_25}")
print(f"75th percentile: {percentile_75}")
output:
25th percentile: 2.5
75th percentile: 5.0

Aggregation functions in NumPy are essential for summarizing data and conducting statistical
analyses. They help in understanding the central tendency, spread, and distribution of data, making
them valuable tools in data analysis, scientific research, and various numerical computations.

8.Multidimensional aggregates
One common type of aggregation operation is an aggregate along a row or column.
Say you have some data stored in a two-dimensional array:
In[9]: M = np.random.random((3, 4))
print(M)
out[9]:[[ 0.8967576 0.03783739 0.75952519 0.06682827]
[ 0.8354065 0.99196818 0.19544769 0.43447084]
[ 0.66859307 0.15038721 0.37911423 0.6687194 ]]
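Aggregates can then be taken over the whole array or along one axis (continuing the session
above; the printed values will differ because M is random):

print(M.sum())        # sum of all 12 entries
print(M.min(axis=0))  # minimum of each column (4 values)
print(M.max(axis=1))  # maximum of each row (3 values)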
A sample program on Aggregation
import numpy as np

# Sample daily temperature data (in Celsius)
temperature_data = np.array([25, 26, 27, 24, 23, 26, 28, 29, 31, 33])

# Explore the dataset using aggregation functions


min_temp = np.min(temperature_data)
max_temp = np.max(temperature_data)
range_temp = np.ptp(temperature_data)
median_temp = np.median(temperature_data)
percentile_25 = np.percentile(temperature_data, 25)
percentile_75 = np.percentile(temperature_data, 75)

# Display the results


print(f"Minimum temperature: {min_temp}°C")
print(f"Maximum temperature: {max_temp}°C")
print(f"Temperature range: {range_temp}°C")
print(f"Median temperature: {median_temp}°C")
print(f"25th percentile temperature: {percentile_25}°C")
print(f"75th percentile temperature: {percentile_75}°C")

output:
Minimum temperature: 23°C
Maximum temperature: 33°C
Temperature range: 10°C
Median temperature: 26.5°C
25th percentile temperature: 25.25°C
75th percentile temperature: 28.75°C
PANDAS:
Pandas is an open-source data manipulation and analysis library for Python. It provides easy-to-
use data structures (Pandas objects) and data analysis tools, making it one of the most popular
libraries for working with structured data, such as spreadsheets or SQL tables.
Key features of Pandas include:
 Data structures: Pandas offers two primary data structures: Series (for 1D data) and
DataFrame (for 2D data).
 Data cleaning and preparation: Pandas allows you to clean, transform, and prepare data
for analysis.
 Data analysis: You can perform various data analysis tasks, including aggregation,
grouping, filtering, and more.
 Data visualization: Pandas can work seamlessly with data visualization libraries like
Matplotlib and Seaborn.
At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays
in which the rows and columns are identified with labels rather than simple integer indices. As we will see
during the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top
of the basic data structures, but nearly everything that follows will require an understanding of what these
structures are. Thus, before we go any further, let's introduce these three fundamental Pandas data
structures: the Series, DataFrame, and Index.

We will start our code sessions with the standard NumPy and Pandas imports:

In [1]:import numpy as np
import pandas as pd

Series:
 A Series is a one-dimensional array-like object that can hold various data types.
 It is similar to a column in a spreadsheet or a single column in a SQL table.
Syntax:
import pandas as pd
series = pd.Series(data, index=index)
In [16]: pd.Series({2:'a', 1:'b', 3:'c'})
Out[16]: 2 a
1 b
3 c
dtype: object
In each case, the index can be explicitly set if a different result is preferred:
In [17]:pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])
Out[17]:3 c
2 a
dtype: object

EXAMPLE 1: ## Series as generalized NumPy array


In [7]:data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
data
Out[7]:
a 0.25
b 0.50
c 0.75
d 1.00
dtype: float64
And the item access works as expected:

In [8]:data['b']
Out[8]:0.5
We can even use non-contiguous or non-sequential indices:

In [9]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=[2, 5, 3, 7])
data
Out[9]:
2 0.25
5 0.50
3 0.75
7 1.00
dtype: float64

In [10]:data[5]
Out[10]:0.5

EXAMPLE 2:
import pandas as pd
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
output:
0 1
1 2
2 3
3 4
4 5
dtype: int64
Example 3:
In [2]:data = pd.Series([0.25, 0.5, 0.75, 1.0])
data
Out[2]:
0 0.25
1 0.50
2 0.75
3 1.00
dtype: float64
As we see in the output, the Series wraps both a sequence of values and a sequence of indices,
which we can access with the values and index attributes. The values are simply a familiar
NumPy array:

In [3]:data.values
Out[3]:array([ 0.25, 0.5 , 0.75, 1. ])
The index is an array-like object of type pd.Index, which we'll discuss in more detail
momentarily.

In [4]:data.index
Out[4]:RangeIndex(start=0, stop=4, step=1)
Like with a NumPy array, data can be accessed by the associated index via the familiar Python
square-bracket notation:

In [5]:data[1]
Out[5]:0.5
In [6]:data[1:3]
Out[6]:1 0.50
2 0.75
dtype: float64
As we will see, though, the Pandas Series is much more general and flexible than the one-
dimensional NumPy array that it emulates.
Example 4: ##Series as specialized dictionary
In [11]:population_dict = {'California': 38332521,
'Texas': 26448193,
'New York': 19651127,
'Florida': 19552860,
'Illinois': 12882135}
population = pd.Series(population_dict)
population

Out[11]:
California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
By default, a Series will be created where the index is drawn from the sorted keys. From here,
typical dictionary-style item access can be performed:

In [12]:population['California']
Out[12]:38332521
Unlike a dictionary, though, the Series also supports array-style operations such as slicing:

In [13]:population['California':'Illinois']
Out[13]:
California 38332521
Florida 19552860
Illinois 12882135
dtype: int64
DataFrame:
 A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular
data structure.
 It is similar to a spreadsheet or a SQL table.
Syntax:
import pandas as pd
df = pd.DataFrame(data, columns=columns)

example 1:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}


df = pd.DataFrame(data)
print(df)

output:
Name Age
0 Alice 25
1 Bob 30
2 Charlie 35

Example 2:
import pandas as pd

data = {'Name': ['John', 'Jane', 'Jim', 'Jill'],


'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)

print(df)

output:
Name Age
0 John 25
1 Jane 30
2 Jim 35
3 Jill 40
***Note: If a Series is an analog of a one-dimensional array with flexible indices, a DataFrame is an
analog of a two-dimensional array with both flexible row indices and flexible column names. Just as you
might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you
can think of a DataFrame as a sequence of aligned Series objects. Here, by "aligned" we mean that they share
the same index.

Example 3:
in[18]:area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area
Out[18]:
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
dtype: int64
Now that we have this along with the population Series from before, we can use a dictionary to
construct a single two-dimensional object containing this information:

In [19]:states = pd.DataFrame({'population': population,


'area': area})
states
Out[19]:
              area  population
California  423967    38332521
Florida     170312    19552860
Illinois    149995    12882135
New York    141297    19651127
Texas       695662    26448193

Like the Series object, the DataFrame has an index attribute that gives access to the index labels:

In [20]:states.index
Out[20]:
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')

Additionally, the DataFrame has a columns attribute, which is an Index object holding the column
labels:
In [21]: states.columns
Out[21]:Index(['area', 'population'], dtype='object')
Thus the DataFrame can be thought of as a generalization of a two-dimensional NumPy array,
where both the rows and columns have a generalized index for accessing the data.

****Similarly, we can also think of a DataFrame as a specialization of a dictionary. Where a


dictionary maps a key to a value, a DataFrame maps a column name to a Series of column data. For
example, asking for the 'area' attribute returns the Series object containing the areas we saw earlier:

In [22]:states['area']
Out[22]:
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64
Notice the potential point of confusion here: in a two-dimensional NumPy array, data[0] will
return the first row. For a DataFrame, data['col0'] will return the first column. Because of this, it is
probably better to think about DataFrames as generalized dictionaries rather than generalized
arrays, though both ways of looking at the situation can be useful. We'll explore more flexible
means of indexing DataFrames in Data Indexing and Selection.

## From a two-dimensional NumPy array

Given a two-dimensional array of data, we can create a DataFrame with any specified column and
index names. If omitted, an integer index will be used for each:

In [27]:pd.DataFrame(np.random.rand(3, 2),
columns=['foo', 'bar'],
index=['a', 'b', 'c'])
Out[27]:
        foo       bar
a  0.865257  0.213169
b  0.442759  0.108267
c  0.047110  0.905718
The Pandas Index Object

We have seen here that both the Series and DataFrame objects contain an explicit index that lets you
reference and modify data. This Index object is an interesting structure in itself, and it can be
thought of either as an immutable array or as an ordered set (technically a multi-set,
as Index objects may contain repeated values). Those views have some interesting consequences in
the operations available on Index objects. As a simple example, let's construct an Index from a list
of integers:

In [30]:ind = pd.Index([2, 3, 5, 7, 11])


ind
Out[30]:Int64Index([2, 3, 5, 7, 11], dtype='int64')

## Index as immutable array

The Index in many ways operates like an array. For example, we can use standard Python
indexing notation to retrieve values or slices:

In [31]:ind[1]
Out[31]:3

In [32]:ind[::2]
Out[32]:Int64Index([2, 5, 11], dtype='int64')

Index objects also have many of the attributes familiar from NumPy arrays:

In [33]:print(ind.size, ind.shape, ind.ndim, ind.dtype)


Out[33]:5 (5,) 1 int64

One difference between Index objects and NumPy arrays is that indices are immutable–that is,
they cannot be modified via the normal means:
In [34]:ind[1] = 0
Out[34]:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-40e631c82e8a> in <module>()
----> 1 ind[1] = 0

/Users/jakevdp/anaconda/lib/python3.5/site-packages/pandas/indexes/base.py in __setitem__(self, key, value)


1243
1244 def __setitem__(self, key, value):
-> 1245 raise TypeError("Index does not support mutable operations")
1246
1247 def __getitem__(self, key):

TypeError: Index does not support mutable operations

This immutability makes it safer to share indices between multiple DataFrames and arrays, without
the potential for side effects from inadvertent index modification.

## Index as ordered set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend
on many aspects of set arithmetic. The Index object follows many of the conventions used by
Python's built-in set data structure, so that unions, intersections, differences, and other
combinations can be computed in a familiar way:

In [35]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [36]:indA & indB # intersection


Out[36]:Int64Index([3, 5, 7], dtype='int64')

In [37]:indA | indB # union


Out[37]:Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [38]:indA ^ indB # symmetric difference


Out[38]:Int64Index([1, 2, 9, 11], dtype='int64')
Sample program:
import pandas as pd

data = {'Name': ['John', 'Jane', 'Jim', 'Jill'],


'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

jane_age = df.loc['B', 'Age'] # Label-based indexing


print(jane_age)

jim_age = df.iloc[2, 1] # Integer-location based indexing


print(jim_age)

output:
30
35

Data Indexing and Selection:


Data indexing and selection in Pandas involve extracting specific portions of data
from a Pandas object (Series or DataFrame) based on various criteria. Pandas
provides several methods for this purpose:

1)Selection by Label:
You can use labels (column or index names) to select data.
Syntax for DataFrame:
df.loc[row_label, column_label]
Example:
# Selecting a specific cell by label
value = df.loc[1, 'Age']
2)Selection by Position:
You can use integer-based positions to select data.
Syntax for DataFrame:
df.iloc[row_position, column_position]
Example:
# Selecting a specific cell by position
value = df.iloc[1, 1]
3)Boolean Indexing:
You can use boolean conditions to filter data.
Syntax for DataFrame:
df[df['Column_Name'] < value]
Example:
# Selecting rows where Age is less than 30
filtered_df = df[df['Age'] < 30]
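Putting the three methods together in one runnable sketch (df here is a small example DataFrame,
assumed for illustration):

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

print(df.loc[1, 'Age'])    # selection by label -> 30
print(df.iloc[1, 1])       # selection by position -> 30
print(df[df['Age'] < 30])  # boolean indexing -> only Alice's row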
DATA VISUALIZATIONS
Visualisation: Simple Line Plots, Simple Scatter Plots, Histograms, Binnings, and Density.

Data Visualization is the graphical representation of data to provide insights,


patterns, and relationships within datasets. It involves the use of various visual
elements such as charts, graphs, maps, and tables to make data more
understandable and accessible to the human brain. Data visualization is an essential
part of data analysis, as it helps in:
 Identifying trends and patterns.
 Discovering outliers and anomalies.
 Communicating complex information effectively.
 Supporting decision-making processes.
 Conveying insights to a broader audience.
There are several popular data visualization libraries and packages in Python. Here
are some of the commonly used ones:
1. Matplotlib: Matplotlib is one of the most widely used libraries for creating
static, animated, and interactive visualizations. It provides extensive
customization options and is the foundation for many other visualization
libraries.
Syntax: import matplotlib.pyplot as plt
2. Seaborn: Seaborn is built on top of Matplotlib and offers a higher-level
interface for creating attractive and informative statistical graphics. It
simplifies many complex tasks and has built-in support for various plot
types.
Syntax: import seaborn as sns
3. Plotly: Plotly is a versatile library for creating interactive and web-based
visualizations. It's suitable for dashboards and web applications.
Syntax: import plotly.express as px
4. Bokeh: Bokeh is another library for creating interactive web-based
visualizations. It's designed for building interactive and scalable dashboards.
Syntax: from bokeh.plotting import figure, output_file, show
5. ggplot (Plotnine): Inspired by the R ggplot2 library, ggplot (Plotnine) offers
a high-level interface for creating complex and attractive visualizations.
Syntax: from plotnine import ggplot, aes, geom_*
6. Folium: Folium is specifically designed for creating interactive maps and
visualizing geospatial data.
Syntax: import folium

Simple Line Plots:


Definition: Line plots are used to visualize data points connected by straight lines.
They are often used to display trends or changes in data over time or continuous
variables.
Or
Line plots are used to visualize the relationship between two continuous variables.
They are effective for showing trends and patterns in data.
Syntax:import matplotlib.pyplot as plt
plt.plot(x_values, y_values)
plt.title("Title")
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.show()
Example 1:import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 14, 8, 15, 12]
# Creating a line plot
plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Output: (line plot figure omitted)
Example 2:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

plt.plot(x, y, label='Prime Numbers')


plt.xlabel('X')
plt.ylabel('Y')
plt.title('Prime Numbers Line Plot')
plt.legend()
plt.show()

Output: (line plot figure omitted)
Simple Scatter Plots:
Definition: Scatter plots are used to visualize individual data points as dots on a 2D
plane. They are useful for displaying relationships between two continuous
variables.
Or
Scatter plots are used to visualize the relationship between two continuous variables.
Each data point is represented as a dot.
Syntax:import matplotlib.pyplot as plt
plt.scatter(x_values, y_values)
plt.title("Title")
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.show()
Example 1:
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 14, 8, 15, 12]

# Creating a scatter plot


plt.scatter(x, y)
plt.title("Simple Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Example 2:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.scatter(x, y, c='blue', marker='o', label='Prime Numbers')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Prime Numbers Scatter Plot')
plt.legend()
plt.show()
Histograms:
Definition: Histograms are used to visualize the distribution of a single variable.
They group data into bins and display the frequency of data points in each bin.
Or
Histograms are used to visualize the distribution of a continuous variable. It divides
the range of values into intervals (bins) and shows the frequency of data points
falling into each bin.
Syntax:
import matplotlib.pyplot as plt
plt.hist(data, bins=n_bins, edgecolor='k')
plt.title("Title")
plt.xlabel("X Label")
plt.ylabel("Frequency")
plt.show()
Example 1:
import matplotlib.pyplot as plt
import numpy as np
# Sample data (random values)
data = np.random.randn(1000)
# Creating a histogram
plt.hist(data, bins=20, edgecolor='k')
plt.title("Histogram")
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.show()
Output: (histogram figure omitted)

Example 2:
import matplotlib.pyplot as plt
import numpy as np

# Generate random data for demonstration


data = np.random.randn(1000)
plt.hist(data, bins='auto', edgecolor='black', alpha=0.7, label='Random Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Random Data Histogram')
plt.legend()
plt.show()
Output: (histogram figure omitted)

Binnings:
Definition: Binnings refer to the division of data into discrete intervals (bins) in a
histogram or bar chart. Binning helps in understanding the distribution of data.
Or
Binnings refer to the process of dividing data into bins
Syntax :
import matplotlib.pyplot as plt
plt.hist(data, bins=bin_edges, edgecolor='k')
plt.title("Title")
plt.xlabel("X Label")
plt.ylabel("Frequency")
plt.show()
Example :
import matplotlib.pyplot as plt
import numpy as np
# Sample data (random values)
data = np.random.randn(1000)
# Specifying custom bin edges
bin_edges = [-3, -2, -1, 0, 1, 2, 3]
# Creating a histogram with custom bins
plt.hist(data, bins=bin_edges, edgecolor='k')
plt.title("Histogram with Custom Bins")
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.show()

Output: (histogram with custom bins figure omitted)
Density Plots:
Definition: Density plots are used to estimate the probability density function of a
continuous random variable. They provide a smoother representation of data
distribution compared to histograms.
Or
while density plots represent the distribution of data in a smoothed manner.

Syntax:
import seaborn as sns
import matplotlib.pyplot as plt
sns.kdeplot(data, shade=True)
plt.title("Title")
plt.xlabel("X Label")
plt.ylabel("Density")
plt.show()
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Sample data (random values)
data = np.random.randn(1000)
# Creating a density plot
sns.kdeplot(data, shade=True)
plt.title("Density Plot")
plt.xlabel("Values")
plt.ylabel("Density")
plt.show()
Output: (density plot figure omitted)

Note: These visualizations provide different ways to explore and represent data.
Line plots show trends, scatter plots reveal relationships, histograms display
distributions, and density plots offer smoothed distributions. They are essential
tools in data analysis and visualization.
Experiment or assignment :Binnings and Density:
Binnings refer to the process of dividing data into bins, while density plots
represent the distribution of data in a smoothed manner.
Syntax (Density Plot):
import seaborn as sns
sns.kdeplot(data, shade=True, label='label_name')
plt.xlabel('x_label')
plt.ylabel('Density')
plt.title('Density Plot Title')
plt.legend()
plt.show()
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Generate random data for demonstration


data = np.random.randn(1000)

sns.kdeplot(data, shade=True, label='Random Data')


plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Random Data Density Plot')
plt.legend()
plt.show()
Unit-2
Linked Lists: Introduction, Linked lists and types, Representation of
linked list, operations on linked list, Comparison of Linked Lists with
Arrays and Dynamic Arrays.
Disadvantage of Array
• Arrays and lists are simple data structures used to hold sequence of data.
• Disadvantage:
1. In an array, elements are stored in consecutive memory locations. To occupy adjacent space, the
block of memory required for the array must be allocated beforehand.
2. Once memory is allocated, it cannot be extended any more; this is why an array is called a static
data structure.
3. Wastage of memory is more in arrays.
How Linked List will help?
1. Linked list is able to grow in size as needed
2. Does not require the shifting of items during insertions and deletions.
3. No wastage of memory.
4. Data elements need not be stored in consecutive memory locations.
Linked List
• A linked list is a collection of data elements called nodes in which the linear
representation is given by links from one node to the next node.
• It is a linear data structure that supports the dynamic memory allocation .
• Linked list is used to hold sequence of data values but data values need not be
stored in adjacent memory cells.
• An element in a linked list is known as a node. A node contains a data part and
one or two pointer part which contains the address of the neighborhood nodes in
the list. The left part of the node which contains data may include a simple data
type, an array, or a structure.
• The START always points to the first node in the list.
• A NULL pointer used to mark the end of the linked list .
• Linked list is a data structure which in turn can be used to implement other data
structures like stacks, queues etc.
LL example
• Linked lists contain a pointer variable START that stores the address of the first
node in the list.
• If START = NULL, then the linked list is empty and contains no nodes.
• Every node contains a pointer to another node which is of the same type, it is also
called a self-referential data type.
• Self-referential structures are those which have structure pointer(s) of the same
type as their member(s).

struct student {
    char name[20];
    int roll;
    char gender;
    int marks[5];
    struct student *next;
};

struct node {
    int data;
    struct node *next;
};
Operations on LL (Basic Operations)
1. Insertion
2. Deletion
3. Traverse
4. Search
5. Destroy
6. IsEmpty
7. Reverse a linked list
8. Copying
9. Merging (combine two linked lists)
Types of Linked List
• The pointers are maintained based on the requirements and accordingly linked list can be classified
into three groups,
1. Singly linked lists
2. Circular linked lists
3. Doubly linked lists
Singly Linked List
• Generally “linked list” means a singly linked list.
• This list consists of a number of nodes in which each node has a next pointer to the following
element. The link of the last node in the list is NULL, which indicates the end of the list.
Insertion in SLL

There are various positions where node can be inserted.


Case-1: Insert at front ( as a first element)
Case-2: Insert at end ( as a last node)
Case-3: Insert at given position
Case-4: Insert after a given node
Case-5: Insert before a given node
Insert at front
• The new node is always added before the first node of the given Linked List. And
newly added node becomes the new head of the Linked List.

Before insertion:  HEAD = 1000 -> [7 | 1010] -> [18 | 3000] -> [14 | NULL]
New node:          address 2000, data 5
After insertion:   HEAD = 2000 -> [5 | 1000] -> [7 | 1010] -> [18 | 3000] -> [14 | NULL]


• Steps to insert a node at the beginning of the list:
1.Create new node
2.Update the next pointer of new node, to point to the current head.
3.Update head pointer to point to the new node.

Code for inserting a new node at the beginning of the list:

// 1. create a new node
struct Node* new_node = (struct Node*) malloc(sizeof(struct Node));
// store data 5 into the new node
new_node->data = 5;
// 2. make next of new node point to the current head
new_node->next = head;
// 3. move the head pointer to point to the new node
head = new_node;
Insert new node at end of the linked list
• In this case, we need to modify two next pointers (last nodes next pointer and new
nodes next pointer).

Before insertion:  HEAD = 1000 -> [10 | 1020] -> [30 | 1100] -> [40 | NULL]
New node:          address 1500, data 60
After insertion:   HEAD = 1000 -> [10 | 1020] -> [30 | 1100] -> [40 | 1500] -> [60 | NULL]

Steps:
1. Create a new node.
2. The new node's next pointer points to NULL.
3. The last node's next pointer points to the new node.

Code for inserting a new node at the end of the list:

struct Node* ptr;
// 1. create a new node
struct Node* newNode = (struct Node*) malloc(sizeof(struct Node));
// store data 40 into the new node
newNode->data = 40;
newNode->next = NULL;
// 2. traverse to the current last node
ptr = header;
while (ptr->next != NULL)
    ptr = ptr->next;
// 3. make the last node point to the new node
ptr->next = newNode;
Insert new node at any position in linked list
Let us assume that we are given a position where we want to insert the new node.
•In this case also, we need to modify two next pointers.
• If we want to add an element at position 3. Then we traverse 2 nodes and insert the
new node.
Algorithm For Inserting new node at any position in linked list
Step- 1: Create new node
Step-2: ptr=header
Step-3: for(i=1;i<position-1;i++)
ptr=ptr->next;
Step-4: newnode->next=ptr->next;
Step-5: ptr->next=newnode;
step-6: newnode->data= value (in our eg. 5)
Step-7: stop
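The steps above can be written in C as follows (a sketch assuming the struct node and global
header pointer used elsewhere in this unit, and a valid position >= 2; position 1 is an insert
at front):

void insertAtPosition(int value, int position)
{
    struct node *new_node = malloc(sizeof(struct node));
    struct node *ptr = header;
    int i;
    new_node->data = value;
    for (i = 1; i < position - 1; i++)   // stop at the node before the given position
        ptr = ptr->next;
    new_node->next = ptr->next;          // link the new node to the rest of the list
    ptr->next = new_node;                // the previous node now points to the new node
}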
Insert a new node before a given node in the linked list
(example: insert a new node before the node with data 30; see case 5 in the program below)
#include<stdio.h>
#include<malloc.h>
#include <stdlib.h>
void search();
void display();
void deletion();
void insertion();
struct node
{
    int data;
    struct node *next;
};
struct node *header;

void main()
{
    int option;
    header = NULL;
    while(1)
    {
        printf("\n****Menu****");
        printf("\n1.Insertion\n2.Deletion\n3.Display\n4.Search\n5.exit\n");
        printf("\nEnter your option : ");
        scanf("%d", &option);
        switch(option){
            case 1: insertion();
                    break;
            case 2: deletion();
                    break;
            case 3: display();
                    break;
            case 4: search();
                    break;
            case 5: exit(0);
            default: printf("\nInvalid option!!\n");
        }//switch
    }//while
}//main

//insertion function
void insertion()
{
    int value, opt, position, i, element;
    struct node *new_node, *ptr, *preptr;
    //creating new node -- allocating memory for the new node
    new_node = malloc(sizeof(struct node));
    printf("\nEnter the value to be inserted : ");
    scanf("%d", &value);
    new_node->data = value;
    //if list is empty
    if(header == NULL)
    {
        new_node->next = NULL;
        header = new_node;
    }//if
    else
    {
        // select the case to insert the new node
        printf("\t1.Insert at front \n\t2.Insert at given Position \n\t3.Insert at the end \n");
        printf("\t4.Insert after a given node \n\t5.Insert before a given node\n");
        printf("\n\tSelect the case to insert the new value :");
        scanf("%d", &opt);
        switch(opt){
        case 1:
            //inserting new node at front
            new_node->next = header;
            header = new_node;
            break;
        case 2:
            //inserting the new node at a given position
            ptr = header;
            printf("\t\tEnter the position at which new node to be inserted: ");
            scanf("%d", &position);
            //searching for the position node, to insert the new node after it
            for(i = 1; i < position - 1; i++)
                ptr = ptr->next;
            //inserting the new node after the position node
            new_node->next = ptr->next;
            ptr->next = new_node;
            break;
        case 3:
            //insert the new node at the end
            ptr = header;
            //search for the current last node
            while(ptr->next != NULL)
                ptr = ptr->next;
            //placing the new node after the current last node
            new_node->next = NULL;
            ptr->next = new_node;
            break;
        case 4:
            //insert new node after a given node
            printf("\t\tEnter an existing node after which the new node has to be inserted:");
            scanf("%d", &element);
            ptr = header;
            //searching for the position node (after this node the new node has to be inserted)
            while((ptr->next != NULL) && (ptr->data != element))
            {
                ptr = ptr->next;
            }
            //inserting new node after the node with the given element
            new_node->next = ptr->next;
            ptr->next = new_node;
            break;
        case 5:
            //insert new node before a given node
            printf("\t\tEnter an existing node before which the new node has to be inserted:");
            scanf("%d", &element);
            ptr = header;
            preptr = header;
            //searching for the position node (before this node the new node has to be inserted)
            while((ptr->next != NULL) && (ptr->data != element))
            {
                preptr = ptr;
                ptr = ptr->next;
            }
            //inserting new node before the node with the given element
            new_node->next = ptr;
            preptr->next = new_node;
            break;
        default:
            printf("\n\tInvalid option");
        }
    }//else
    printf("\n\t\tLinked list After inserting new node %d", value);
    display();
}//insertion
Deletion of a Node from Singly Linked List (SLL)

Case-1: Deleting a node at front


Case-2: Deleting a node at end ( as a last node)
Case-3: Deleting a node at given position
Case-4: Deleting a given node
Case-5: Deleting a node after a given node
1. Deleting the First Node in a Singly Linked List

• To delete a node from the beginning of the list.
• Steps to delete the first node from the list:
  1. Create a temporary pointer which points to the first node of the linked list.
  2. Move the header pointer to the next node.
  3. Dispose of the temporary node.
Algorithm
Step-1: If header==NULL
print “list is empty”
goto step 5
Step-2: Set ptr = header
Step-3: Set header = header->NEXT
Step-4: Free Ptr
Step-5: stop
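A C sketch of this algorithm (assuming the global header pointer used in this unit):

void deleteFirst(void)
{
    if (header == NULL) {        // Step 1: list is empty
        printf("list is empty\n");
        return;
    }
    struct node *ptr = header;   // Step 2: temporary pointer to the first node
    header = header->next;       // Step 3: move header to the next node
    free(ptr);                   // Step 4: dispose of the old first node
}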
2. Deleting the Last Node in Singly Linked List

• Steps to delete last node from SLL:


1. Traverse the list with 2 pointers. When
we reach the end of the list, one pointer
points to the last node and the other
pointer points to the node before the last
node.
• Update previous node’s next pointer with
NULL.
• Dispose of the last node.
• Algorithm to delete last node from the linked list
Step -1: IF header = NULL
print “List is empty”
Go to Step 8
[END OF IF]
Step -2: SET ptr = header
Step-3: Repeat Steps-4 and 5 while ptr -> next != NULL
Step 4: SET preptr = ptr
Step 5: SET ptr = ptr-> next
[END OF LOOP]
Step 6: set preptr-> next = NULL
Step 7: FREE ptr
Step 8: EXIT
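A C sketch of this algorithm (same assumptions as above, with an extra check for a
single-node list):

void deleteLast(void)
{
    if (header == NULL) {            // list is empty
        printf("List is empty\n");
        return;
    }
    if (header->next == NULL) {      // only one node: the header node is the last node
        free(header);
        header = NULL;
        return;
    }
    struct node *preptr = header, *ptr = header;
    while (ptr->next != NULL) {      // walk two pointers to the end of the list
        preptr = ptr;
        ptr = ptr->next;
    }
    preptr->next = NULL;             // the node before the last becomes the new last
    free(ptr);                       // dispose of the old last node
}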
3. Deleting a node at any position in the list

• In this case, the node to be


removed is always located between
two nodes.
• Steps to delete a middle node from
SLL:
1. Traverse the list with 2 pointers.
Once we find the node to be deleted,
change the previous node’s next
pointer to the next pointer of the
node to be deleted.
2.Dispose of the current node to be
deleted.
Algorithm to delete a node at any location from the linked list
• Step -1: IF header = NULL
print “List is empty”
Go to Step 8
[END OF IF]
Step-2: SET ptr = header
        SET preptr = header
        SET currentPosition = 1
Step-3: Repeat Steps 4 and 5 while currentPosition < givenPosition
Step-4: SET preptr = ptr
Step-5: SET ptr = ptr->next
        currentPosition = currentPosition + 1
[END OF LOOP]
Step-6: SET preptr->next = ptr->next
Step-7: FREE ptr
Step-8: Stop
4. Deleting a given node
Algorithm to delete a given node from the
linked list
Step -1: IF header = NULL
print “List is empty”
Go to Step 8
[END OF IF]
Step -2: SET ptr = header
Step-3: Repeat Steps-4 and 5 while ptr ->data !=
value
Step 4: SET preptr = ptr
Step 5: SET ptr = ptr->next
[END OF LOOP]
Step 6: set preptr->next =ptr->next
Step 7: FREE ptr
Step 8: Stop
Traversing a Linked List

• Traversing a linked list means accessing the nodes of the list in order to perform
some processing on them.
• Example: Displaying the contents of Linked list, Counting number of nodes in the
linked list, etc..
Algorithm for Traversing a SLL:
Step -1: ptr = header
Step-2: Repeat Steps-3 and 4 while ptr != NULL
Step-3: Apply Process on ptr ->data
Step-4: Set ptr = ptr->next
[END OF LOOP]
Step 5: Stop
Code for Displaying Linked List:
if (header == NULL)
    print "List is empty";
else
    for (ptr = header; ptr != NULL; ptr = ptr->next)
        print ptr->data;
Searching for a Value in a Linked List

• Searching a linked list means to find a particular element in the linked list.
• A linked list consists of nodes with two parts, the data part and the pointer part.
• In linked list, searching means finding whether a given value is present in the data
part of the node or not. If it is present, then return the address of the node that
contains the search value.
Algorithm to search for a value in Linked list
Step-1: ptr = header
Step-2: Repeat Step-3 while ptr != NULL
Step-3: IF val = ptr->data
SET pos = ptr // val is found in the list.
Go To Step 5
ELSE
SET ptr = ptr->next
[END OF IF]
[END OF LOOP]
Step-4: SET ptr = NULL // val is not found in the linked list.
Step-5: return ptr and stop
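A C sketch of this search (assuming the global header pointer used in this unit):

struct node* search(int val)
{
    struct node *ptr = header;
    while (ptr != NULL) {
        if (ptr->data == val)
            return ptr;      // val found: return the address of this node
        ptr = ptr->next;
    }
    return NULL;             // val is not present in the list
}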
Doubly linked list (or) Two-way Linked List

• In a singly linked list we can move from the header node to any node in one direction only (left-right).
• A doubly linked list is a two-way list because one can move in either direction. That is, either from left to
right or from right to left.
• It contains a pointer to the next as well as to the previous node in the sequence. Therefore, it consists of three
parts—data, a pointer to the next node, and a pointer to the previous node
Where, DATA field stores the element or data,
PREVIOUS field contains the address of its previous node, NEXT field contains the address of its next node.

PREVIOUS DATA NEXT


Basic Operations on Doubly Linked List

• Insertion
• Deletion
• Traverse
• Search

• The node structure of a doubly linked list in C is shown below.
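A typical definition (a sketch; the field names follow the PREVIOUS / DATA / NEXT description
above):

struct node
{
    struct node *prev;   // address of the previous node
    int data;            // the element stored in this node
    struct node *next;   // address of the next node
};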
Insertion on doubly linked list

• There are 3 different cases of Insertion into a doubly-linked list:


• Inserting a new node at the front.
Inserting a new node at the end of the list.
Inserting a new node at the middle of the list(at any position)
Extra cases:
• Inserting a new node before given node
• Inserting a new node after given node
Insertion of a Node at the Front of DLL
Update head node’s left pointer to point to the new node and
make new node as head.
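A C sketch of this case (assuming the doubly linked node structure above and a global header
pointer):

void dllInsertFront(int value)
{
    struct node *new_node = malloc(sizeof(struct node));
    new_node->data = value;
    new_node->prev = NULL;         // the new node becomes the first node
    new_node->next = header;
    if (header != NULL)
        header->prev = new_node;   // old head's left pointer points to the new node
    header = new_node;             // make the new node the head
}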
Insertion of a node at the end of DLL
Update right pointer of last node to point to new
node
Insertion of a Node at any position in DLL
Position node(node pointed by ptr) NEXT pointer points to the new node and
the next node of position node PREV pointer points to new node.
Deletion on doubly linked list
• Deleting a node at the Front of DLL
Deleting a Last node from DLL
Algorithm to delete a node at any position from DLL
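A C sketch of this case (same assumptions as above; position >= 2 and the node to delete lies
between two nodes, or at the end):

void dllDeleteAtPosition(int position)
{
    struct node *ptr = header;
    int i;
    for (i = 1; i < position; i++)   // walk to the node at the given position
        ptr = ptr->next;
    ptr->prev->next = ptr->next;     // bypass ptr in the forward direction
    if (ptr->next != NULL)
        ptr->next->prev = ptr->prev; // bypass ptr in the backward direction
    free(ptr);
}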
UNIT 2 – INTRODUCTION

Data structure:

Data structures are a way of organizing data so that it can be accessed more efficiently,
depending upon the situation. Data structures are the fundamental building blocks around which
a program is built.

General data structure types include arrays, files, linked lists, stacks, queues, trees, graphs, and so on.

The data structures differ based on mutability and order

Linear data structure: A data structure in which the elements are arranged sequentially (linearly),
with each element attached to its previous and next adjacent elements, is called a linear data structure.

Examples: Linked Lists, Stacks and Queues.

Non-linear data structure: Elements of this data structure are stored/accessed in a non-linear order.

Examples: Trees and graphs

Abstract Data Types (ADTs) :

In general, user defined data types are defined along with their operations.

To simplify the process of solving problems, we combine the data structures with their operations and
call this an Abstract Data Type (ADT). An ADT consists of two parts:

1. Declaration of data

2. Declaration of operations

Commonly used ADTs include: Linked Lists, Stacks, Queues, Priority Queues, Binary Trees, Dictionaries,
Disjoint Sets (Union and Find), Hash Tables, Graphs, and many others.

For example, a stack uses the LIFO (Last-In First-Out) mechanism while storing data: the last element
inserted into the stack is the first element that gets deleted. Common operations on it are: creating the
stack, pushing an element onto the stack, popping an element from the stack, finding the current top of
the stack, and finding the number of elements in the stack.
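A minimal sketch of this stack ADT in Python (list-based, for illustration only):

class Stack:
    def __init__(self):
        self.items = []             # create the stack

    def push(self, x):
        self.items.append(x)        # push an element onto the stack

    def pop(self):
        return self.items.pop()     # remove and return the last-in element

    def top(self):
        return self.items[-1]       # current top of the stack

    def size(self):
        return len(self.items)      # number of elements in the stack

s = Stack()
s.push(10)
s.push(20)
print(s.pop())   # 20 -- last in, first out
print(s.top())   # 10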

While defining ADTs, do not worry about the implementation details; they come into the picture only
when we want to use them. Different kinds of ADTs are suited to different kinds of applications, and
some are highly specialized to specific tasks.

ALGORITHM:

A set of rules that must be followed when solving a particular problem

Analysis of algorithm: The goal of the analysis of algorithms is to compare algorithms (or solutions)
mainly in terms of running time but also in terms of other factors (e.g., memory, developer effort, etc.)

What is Running Time Analysis?


It is the process of determining how processing time increases as the size of the problem (input size)
increases.

Input size is the number of elements in the input, and depending on the problem type, the input may be
of different types. The following are the common types of inputs.

• Size of an array

• Polynomial degree

• Number of elements in a matrix

• Number of bits in the binary representation of the input

• Vertices and edges in a graph.

What is Rate of Growth?

The rate at which the running time increases as a function of the input is called the rate of growth.
TYPES OF ANALYSIS:

To analyze the given algorithm, we need to know with which inputs the algorithm takes less time
(performing well) and with which inputs the algorithm takes a long time. We have already seen that an
algorithm can be represented in the form of an expression. That means we represent the algorithm with
multiple expressions: one for the case where it takes less time and another for the case where it takes
more time.

• Worst case
  - Defines the input for which the algorithm takes a long time.
  - The input is the one for which the algorithm runs the slowest.

• Best case
  - Defines the input for which the algorithm takes the least time.
  - The input is the one for which the algorithm runs the fastest.

• Average case
  - Provides a prediction about the running time of the algorithm.
  - Assumes that the input is random.

Lower Bound <= Average Time <= Upper Bound

Asymptotic Notations :

Asymptotic notations are mathematical tools to represent the time complexity of algorithms for
asymptotic analysis.

There are mainly three asymptotic notations:

Big-O Notation (O-notation)

Omega Notation (Ω-notation)

Theta Notation (Θ-notation)


1. Theta Notation (Θ-Notation):

Theta notation encloses the function from above and below. Since it represents both the upper and the lower bound of the running time of an algorithm, it is used for analyzing the average-case complexity of an algorithm. In the average case, you add the running times for each possible input combination and take the average.

Let g and f be functions from the set of natural numbers to itself. The function f is said to be Θ(g) if there are constants c1, c2 > 0 and a natural number n0 such that c1*g(n) ≤ f(n) ≤ c2*g(n) for all n ≥ n0.

Note: Θ provides an exact (tight) bound.
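
For instance, f(n) = 3n + 2 is Θ(n): taking c1 = 3, c2 = 4 and n0 = 2, we have 3n ≤ 3n + 2 ≤ 4n for all n ≥ 2.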

2.Big-O Notation (O-notation):

It gives the highest possible growth rate (an upper bound) of a function for a given input size.

The execution time serves as an upper bound on the algorithm's time complexity.

Mathematical representation of Big-O notation:

O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0 }

Note: O(n²) also covers linear time; if f(n) = O(n), then f(n) = O(n²) as well.

The worst-case time complexity of Insertion Sort is Θ(n²).

The best-case time complexity of Insertion Sort is Θ(n).

3.Omega Notation (Ω-Notation):

The execution time serves as a lower bound on the algorithm's time complexity.

It describes the minimum amount of time the algorithm needs to complete: the best case.

Mathematical representation of Omega notation:

Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }

Properties of Asymptotic Notations:

1. General Properties:

If f(n) is O(g(n)) then a*f(n) is also O(g(n)), where a is a constant.

Example:

f(n) = 2n² + 5 is O(n²).

Then 7*f(n) = 7(2n² + 5) = 14n² + 35 is also O(n²).

This property also holds for Θ and Ω notation.

2. Transitive Properties:

If f(n) is O(g(n)) and g(n) is O(h(n)) then f(n) = O(h(n)).

Example:
If f(n) = n, g(n) = n² and h(n) = n³:

n is O(n²) and n² is O(n³); then n is O(n³).

This property also holds for Θ and Ω notation.

3. Reflexive Properties:

Reflexive properties are easy to understand after transitivity.

If f(n) is given, then f(n) is O(f(n)), since the maximum value of f(n) is f(n) itself.

Hence f(n) and O(f(n)) are always tied in a reflexive relation.

Example:

f(n) = n² is O(n²), i.e., O(f(n)).

This property also holds for Θ and Ω notation.

4. Symmetric Properties:

If f(n) is Θ(g(n)) then g(n) is Θ(f(n)).

Example:

If f(n) = n² and g(n) = n²,

then f(n) = Θ(n²) and g(n) = Θ(n²).

This property holds only for Θ notation.

5. Transpose Symmetric Properties:

If f(n) is O(g(n)) then g(n) is Ω (f(n)).

Example:

If f(n) = n and g(n) = n²,

then n is O(n²) and n² is Ω(n).

This property holds only for O and Ω notations.

6. Some More Properties:

1. If f(n) = O(g(n)) and f(n) = Ω(g(n)) then f(n) = Θ(g(n))

2. If f(n) = O(g(n)) and d(n)=O(e(n)) then f(n) + d(n) = O( max( g(n), e(n) ))

Example:

f(n) = n, i.e., O(n)

d(n) = n², i.e., O(n²)

Then f(n) + d(n) = n + n², i.e., O(n²).


3. If f(n) = O(g(n)) and d(n) = O(e(n)) then f(n) * d(n) = O(g(n) * e(n))

Example:

f(n) = n, i.e., O(n)

d(n) = n², i.e., O(n²)

Then f(n) * d(n) = n * n² = n³, i.e., O(n³).

Note: If f(n) = O(g(n)) then g(n) = Ω(f(n))

SORTING TECHNIQUES:

Selection sort:

Selection sort is a simple and efficient sorting algorithm that works by repeatedly selecting the smallest
(or largest) element from the unsorted portion of the list and moving it to the sorted portion of the list.

Code:

def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        min_index = i
        # Find the index of the minimum element in the unsorted portion
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        # Swap the minimum element with the first element in the unsorted portion
        arr[i], arr[min_index] = arr[min_index], arr[i]

# Example usage
if __name__ == "__main__":
    input_list = [64, 25, 12, 22, 11]
    print("Original List:", input_list)
    selection_sort(input_list)
    print("Sorted List:", input_list)


Time Complexity: The time complexity of Selection Sort is O(N²) as there are two nested loops:

One loop to select an element of the array one by one = O(N)

Another loop to compare that element with every other array element = O(N)

Therefore overall complexity = O(N) * O(N) = O(N*N) = O(N²)

MERGE SORT:

In simple terms, the process of merge sort is to divide the array into two halves, sort each half, and then merge the sorted halves back together. This process is repeated until the entire array is sorted.

# Python program for implementation of MergeSort

def mergeSort(arr):
    if len(arr) > 1:
        # Finding the mid of the array
        mid = len(arr) // 2
        # Dividing the array elements into 2 halves
        L = arr[:mid]
        R = arr[mid:]
        # Sorting the first half
        mergeSort(L)
        # Sorting the second half
        mergeSort(R)
        i = j = k = 0
        # Copy data back from temp arrays L[] and R[]
        while i < len(L) and j < len(R):
            if L[i] <= R[j]:
                arr[k] = L[i]
                i += 1
            else:
                arr[k] = R[j]
                j += 1
            k += 1
        # Checking if any element was left
        while i < len(L):
            arr[k] = L[i]
            i += 1
            k += 1
        while j < len(R):
            arr[k] = R[j]
            j += 1
            k += 1

# Code to print the list
def printList(arr):
    for i in range(len(arr)):
        print(arr[i], end=" ")
    print()

# Driver Code
if __name__ == '__main__':
    arr = [12, 11, 13, 5, 6, 7]
    print("Given array is")
    printList(arr)
    mergeSort(arr)
    print("\nSorted array is")
    printList(arr)

OUTPUT:

Given array is

12 11 13 5 6 7

Sorted array is

5 6 7 11 12 13

Time Complexity: O(N log N). Merge Sort is a recursive algorithm and its time complexity can be expressed by the following recurrence relation.

T(n) = 2T(n/2) + θ(n)
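
Expanding this recurrence shows where the bound comes from: T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = ... = 2^k T(n/2^k) + k*cn. After k = log₂ n levels the subproblems reach size 1, giving T(n) = n*T(1) + cn*log₂ n = O(N log N).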

Radix Sort Algorithm:

Rather than comparing elements directly, Radix Sort distributes the elements into buckets based on each
digit’s value. By repeatedly sorting the elements by their significant digits, from the least significant to
the most significant, Radix Sort achieves the final sorted order.

# Python program for implementation of Radix Sort

# A function to do counting sort of arr[] according to
# the digit represented by exp.
def countingSort(arr, exp1):
    n = len(arr)
    # The output array that will hold the sorted arr
    output = [0] * n
    # Initialize count array as 0
    count = [0] * 10
    # Store count of occurrences in count[]
    for i in range(0, n):
        index = arr[i] // exp1
        count[index % 10] += 1
    # Change count[i] so that count[i] now contains the actual
    # position of this digit in the output array
    for i in range(1, 10):
        count[i] += count[i - 1]
    # Build the output array
    i = n - 1
    while i >= 0:
        index = arr[i] // exp1
        output[count[index % 10] - 1] = arr[i]
        count[index % 10] -= 1
        i -= 1
    # Copy the output array to arr[],
    # so that arr now contains sorted numbers
    for i in range(0, len(arr)):
        arr[i] = output[i]

# Method to do Radix Sort
def radixSort(arr):
    # Find the maximum number to know the number of digits
    max1 = max(arr)
    # Do counting sort for every digit. Note that instead
    # of passing the digit number, exp is passed. exp is 10^i
    # where i is the current digit number.
    exp = 1
    while max1 // exp > 0:
        countingSort(arr, exp)
        exp *= 10

# Driver code
arr = [170, 45, 75, 90, 802, 24, 2, 66]
# Function Call
radixSort(arr)
for i in range(len(arr)):
    print(arr[i], end=" ")

Time Complexity:

Radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping
the keys by the individual digits which share the same significant position and value. It has a time
complexity of O(d * (n + b)), where d is the number of digits, n is the number of elements, and b is the
base of the number system being used.

In practical implementations, radix sort is often faster than other comparison-based sorting algorithms,
such as quicksort or merge sort, for large datasets, especially when the keys have many digits. However,
its time complexity grows linearly with the number of digits, and so it is not as efficient for small
datasets.
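
For example, for the array [170, 45, 75, 90, 802, 24, 2, 66] used in the code above, the maximum key 802 has d = 3 digits in base b = 10, so countingSort runs 3 times over n = 8 elements, roughly d * (n + b) = 3 * (8 + 10) = 54 basic steps.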

QUICK SORT:

The key process in quickSort is the partition() function. The target of partition is to place the pivot (any element can be chosen to be the pivot) at its correct position in the sorted array, putting all smaller elements to the left of the pivot and all greater elements to the right of it.

Partitioning is then done recursively on each side of the pivot after the pivot is placed in its correct position, and this finally sorts the array.

Choice of Pivot:

There are many different choices for picking pivots.

Always pick the first element as a pivot.

Always pick the last element as a pivot (implemented below)

Pick a random element as a pivot.

Pick the middle as the pivot.

Partition Algorithm:

The logic is simple, we start from the leftmost element and keep track of the index of smaller (or equal)
elements as i. While traversing, if we find a smaller element, we swap the current element with arr[i].
Otherwise, we ignore the current element.

# Python3 implementation of QuickSort

# Function to find the partition position
def partition(array, low, high):
    # Choose the rightmost element as pivot
    pivot = array[high]
    # Pointer for the greater element
    i = low - 1
    # Traverse through all elements and
    # compare each element with the pivot
    for j in range(low, high):
        if array[j] <= pivot:
            # If an element smaller than the pivot is found,
            # swap it with the greater element pointed to by i
            i = i + 1
            # Swapping element at i with element at j
            (array[i], array[j]) = (array[j], array[i])
    # Swap the pivot element with
    # the greater element specified by i
    (array[i + 1], array[high]) = (array[high], array[i + 1])
    # Return the position from where partition is done
    return i + 1

# Function to perform quicksort
def quicksort(array, low, high):
    if low < high:
        # Find pivot element such that
        # elements smaller than the pivot are on the left
        # and elements greater than the pivot are on the right
        pi = partition(array, low, high)
        # Recursive call on the left of pivot
        quicksort(array, low, pi - 1)
        # Recursive call on the right of pivot
        quicksort(array, pi + 1, high)

# Driver code
if __name__ == '__main__':
    array = [10, 7, 8, 9, 1, 5]
    N = len(array)
    # Function call
    quicksort(array, 0, N - 1)
    print('Sorted array:')
    for x in array:
        print(x, end=" ")

Output

Sorted array:

1 5 7 8 9 10

Time Complexity:

Best Case: Ω(N log N)

The best-case scenario for quicksort occurs when the pivot chosen at each step divides the array into roughly equal halves.

In this case, the algorithm makes balanced partitions, leading to efficient sorting.

Average Case: Θ(N log N)

Quicksort's average-case performance is usually very good in practice, making it one of the fastest sorting algorithms.

Worst Case: O(N²)

The worst-case scenario for quicksort occurs when the pivot at each step consistently produces highly unbalanced partitions, for example when the array is already sorted and the pivot is always chosen as the smallest or largest element. To mitigate the worst case, various techniques are used, such as choosing a good pivot (e.g., median of three) or using a randomized algorithm (Randomized Quicksort) to shuffle the elements before sorting.

Auxiliary Space: O(1) if we don't consider the recursive stack space. If we do consider the recursive stack space, then in the worst case quicksort could use O(N) space.
Selection Sort Algorithm
1. Find the minimum value in the list
2. Swap it with the value in the current position
3. Repeat this process for all the elements until the entire array is sorted

This algorithm is called selection sort since it repeatedly selects the smallest element.

Example (traced on slides; the array contents are in the accompanying figures): with a 7-element array, each iteration i scans the unsorted portion, tracking the index of the current minimum in min_pos, and then swaps that element into position i. In the traced run, min_pos ended at index 4 in iteration 1, 6 in iteration 2, 2 in iteration 3, 3 in iteration 4, 6 in iteration 5, and 6 in iteration 6, after which the array was fully sorted.
Selection Sort Time Complexity

In the above example, n = 7:

iteration 1: 6 comparisons
iteration 2: 5 comparisons
...
iteration 6: 1 comparison

Total number of comparisons = 6 + 5 + ... + 1

In terms of N, if N is the number of elements in the array:

(N-1) + (N-2) + ... + 1 = (N*(N-1))/2 = O(N*N)

In all cases (best, worst, average) the complexity is the same.


Merge Sort

MergeSort is a divide-and-conquer method of sorting.
MergeSort Algorithm

MergeSort is a recursive sorting procedure that uses at most O(n lg n) comparisons.

To sort an array of n elements, we perform the following steps in sequence:
• If n < 2 then the array is already sorted.
• Otherwise, n > 1, and we perform the following three steps in sequence:
1. Sort the left half of the array using MergeSort.
2. Sort the right half of the array using MergeSort.
3. Merge the sorted left and right halves.
How the Merge Works

Here are two lists to be merged:
First: (12, 16, 17, 20, 21, 27)
Second: (9, 10, 11, 12, 19)

Compare 12 and 9:
First: (12, 16, 17, 20, 21, 27)
Second: (10, 11, 12, 19)
New: (9)

Compare 12 and 10:
First: (12, 16, 17, 20, 21, 27)
Second: (11, 12, 19)
New: (9, 10)

Compare 12 and 11:
First: (12, 16, 17, 20, 21, 27)
Second: (12, 19)
New: (9, 10, 11)

Compare 12 and 12:
First: (16, 17, 20, 21, 27)
Second: (12, 19)
New: (9, 10, 11, 12)

Compare 16 and 12:
First: (16, 17, 20, 21, 27)
Second: (19)
New: (9, 10, 11, 12, 12)

Compare 16 and 19:
First: (17, 20, 21, 27)
Second: (19)
New: (9, 10, 11, 12, 12, 16)

Compare 17 and 19:
First: (20, 21, 27)
Second: (19)
New: (9, 10, 11, 12, 12, 16, 17)

Compare 20 and 19:
First: (20, 21, 27)
Second: ()
New: (9, 10, 11, 12, 12, 16, 17, 19)

The second list is now empty, so the rest of the first list is appended:
First: ()
Second: ()
New: (9, 10, 11, 12, 12, 16, 17, 19, 20, 21, 27)
Merge-Sort Tree

An execution of merge-sort is depicted by a binary tree:
– each node represents a recursive call of merge-sort and stores the unsorted sequence before the execution (and its partition), and the sorted sequence at the end of the execution
– the root is the initial call
– the leaves are calls on subsequences of size 0 or 1

Example tree for 7 2 9 4 → 2 4 7 9:

7 2 9 4 → 2 4 7 9
7 2 → 2 7        9 4 → 4 9
7 → 7   2 → 2    9 → 9   4 → 4
Execution Example

Sorting the sequence 7 2 9 4 3 8 6 1:

• Partition: 7 2 9 4 | 3 8 6 1
• Recursive call, partition: 7 2 | 9 4
• Recursive call, partition: 7 | 2
• Recursive call, base case: 7 → 7
• Recursive call, base case: 2 → 2
• Merge: 7 2 → 2 7
• Recursive calls, base cases, merge: 9 4 → 4 9 (from 9 → 9 and 4 → 4)
• Merge: 7 2 9 4 → 2 4 7 9
• Recursive calls and merges on the right half: 3 8 → 3 8, 6 1 → 1 6, then 3 8 6 1 → 1 3 6 8
• Final merge: 7 2 9 4 3 8 6 1 → 1 2 3 4 6 7 8 9
Mergesort Analysis
• Let T(N) be the running time for an array of N elements.
• Mergesort divides the array in half and calls itself on the two halves. After returning, it merges both halves using a temporary array.
• Each recursive call takes T(N/2) and merging takes O(N).

Mergesort Recurrence Relation
• The recurrence relation for T(N) is:
– T(1) < a
(base case: a 1-element array takes constant time)
– T(N) < 2T(N/2) + bN
(sorting N elements takes the time to sort the left half, plus the time to sort the right half, plus O(N) time to merge the two halves)
• Solving this recurrence gives T(N) = O(N log N).
Quick Sort

Quick sort is an example of the divide-and-conquer algorithmic technique. Divide-and-conquer is a very important strategy in computer science:
• Divide the problem into smaller parts
• Independently solve the parts
• Combine these solutions to get the overall solution

Divide-and-conquer strategy in quick sort: partition the array into items that are "small" and items that are "large", then recursively sort the two sets.

Example (adapted from https://learnprogramo.com/quick-sort-programs-in-c/):
With pivot 54, after partitioning:

31 26 20 17 44 54 77 55 93

This completes iteration one: the left part contains all elements < 54 and the right part contains all elements > 54. The two parts are then sorted recursively:

31 26 20 17 44 | 54 | 77 55 93
17 26 20 31 44 | 54 | 77 55 93
17 26 20 31 44 | 54 | 77 55 93
17 20 26 31 44 | 54 | 77 55 93
17 20 26 31 44 | 54 | 55 77 93
Quick Sort Analysis

Let us assume that T(n) is the complexity of Quicksort and that all elements are distinct. The recurrence for T(n) depends on two subproblem sizes, which depend on the partition element: if the pivot is the ith smallest element, then exactly (i–1) items will be in the left part and (n–i) in the right part. Let us call this an i-split. Since each element has an equal probability of being selected as the pivot, the probability of selecting the ith element is 1/n.

Best Case: Each partition splits the array in halves and gives

T(n) = 2T(n/2) + n = O(n log n)
Worst Case: Each partition gives unbalanced splits. The worst case occurs when the list is already sorted and the last element is chosen as the pivot.

T(n) = T(n–1) + n = O(n²)

Average Case: In the average case of Quick sort, we do not know where the split happens. For this reason, we take all possible values of split locations, add all their complexities, and divide by n to get the average-case complexity.
Radix Sort
Radix is a synonym for base (base 10, base 2). Radix sort is a multi-pass sorting algorithm that only looks at individual digits during each pass. It uses queues as buckets to store elements:
• Create an array of 10 queues.
• Starting with the least significant digit, place each value in the queue that matches that digit.
• Empty the queues in order back into the array.
• Repeat, moving to the next least significant digit.
Radix Sort in Action: 1s place

Original values in array:
9, 113, 70, 86, 12, 93, 37, 40, 252, 7, 79, 12

Look at the ones place and distribute the values into the queues:
q0: 70, 40       q5:
q1:              q6: 86
q2: 12, 252, 12  q7: 37, 7
q3: 113, 93      q8:
q4:              q9: 9, 79

Radix Sort in Action: 10s place

Empty the queues in order from 0 to 9 back into the array:
70, 40, 12, 252, 12, 113, 93, 86, 37, 7, 9, 79

Now look at the 10s place:
q0: 7, 9         q5: 252
q1: 12, 12, 113  q6:
q2:              q7: 70, 79
q3: 37           q8: 86
q4: 40           q9: 93

Radix Sort in Action: 100s place

Empty the queues in order from 0 to 9 back into the array:
7, 9, 12, 12, 113, 37, 40, 252, 70, 79, 86, 93

Now look at the 100s place:
_ _7, _ _9, _12, _12, 113, _37, _40, 252, _70, _79, _86, _93
q0: 7, 9, 12, 12, 37, 40, 70, 79, 86, 93
q1: 113          q6:
q2: 252          q7:
q3:              q8:
q4:              q9:
q5:

Radix Sort in Action: Final Step

Empty the queues in order from 0 to 9 back into the array:
7, 9, 12, 12, 37, 40, 70, 79, 86, 93, 113, 252
Unit-3
Stacks and Queues: Introduction to stacks, applications of stacks, implementation
and comparison of stack implementations. Introduction to queues, applications of
queues and implementations, Priority Queues and applications
Stack
• Stack is a Linear Data Structure which stores its elements in an ordered manner or sequential
order.
• Definition: A stack is an ordered list in which insertion and deletion are done at one end, called
top. The last element inserted is the first one to be deleted. Hence, it is called the Last in First out
(LIFO) or First in Last out (FILO) list.
• Stack maintains a pointer called top, which keeps track of the top most element in the stack.
• Any insertions or deletions should be based upon the value of top.
Operations on Stack
• Two operations can be made to a stack. They are,
1. Push – Inserting an element on to a stack
2. Pop – Deleting or Removing an element from the stack.
Exceptions in Stack

• Attempting the execution of an operation may sometimes cause an error condition,


called an exception.
• Exceptions are said to be “thrown” by an operation that cannot be executed.
• In the Stack ADT, operations pop and top cannot be performed if the stack is
empty. The execution of pop (or) top operation on an empty stack throws an
exception called as underflow.
• Trying to push an element in a full stack throws an exception called as overflow.
Applications of stacks

• Balancing of symbols (or) Parenthesis matching


• Infix-to-postfix conversion
• Infix to prefix conversions.
• Evaluation of postfix expression
• Implementing function calls and recursion
• Tower of Hanoi
• Matching Tags in HTML and XML
• Auxiliary data structure for other algorithms (Example: Tree traversal algorithms)
Balancing of symbols (or)parenthesis matching
• Stacks can be used to check whether the given expression has balanced symbols. This algorithm is
very useful in compilers.
• The parser reads one character at a time. If the character is an opening delimiter such as (, {, or [, it is pushed onto the stack.
• When a closing delimiter such as ), }, or ] is encountered, the stack is popped.
• The opening and closing delimiters are then compared. If they match, the parsing of the string continues. If they do not match, the parser indicates that there is an error on the line.
• NOTE: In a balanced expression, the number of left parentheses must equal the number of right parentheses.
• Example-1: ((A+B)*C)
This is a valid expression.
Because, No. of left parenthesis (2) = No. of right parenthesis (2).
• Example-2: ((A+B*C)
This is an Invalid expression.
Because, No. of left parenthesis (2) != No. of right parenthesis (1).
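
A minimal Python sketch of this balancing check (the function name and the fixed set of delimiters are illustrative choices):

def is_balanced(expression):
    """Return True if (), {} and [] are balanced in the expression."""
    pairs = {')': '(', '}': '{', ']': '['}
    stack = []
    for ch in expression:
        if ch in '({[':                          # opening delimiter: push
            stack.append(ch)
        elif ch in pairs:                        # closing delimiter: pop and compare
            if not stack or stack.pop() != pairs[ch]:
                return False                     # mismatch or extra closer
    return not stack                             # balanced only if stack is empty

print(is_balanced("((A+B)*C)"))   # True  (valid expression)
print(is_balanced("((A+B*C)"))    # False (unmatched left parenthesis)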
Conversion of expressions
Arithmetic expressions can be represented in three ways:
• Infix Expression
• Prefix Expression
• Postfix Expression
1. Infix Expression:
Operator should be placed in between the two operands.
Syntax: Operand1 Operator Operand2
 Example: A+B , (A+B) + (C-D)
2. Prefix Expression (polish notation):
• Operator is placed before both the operands
Syntax: Operator Operand1 Operand2
Examples: +AB ++AB-CD
3. Postfix Expression (Reverse Polish notation or suffix notation):
• Operator should be placed after both the operands.
Syntax: Operand1 Operand2 Operator
Example: AB+   AB+CD-+
Infix to prefix

• Step-1: Reverse the infix string. Note that while reversing the
string you must interchange left and right parentheses.
• Step-2: Obtain the postfix expression of the infix expression
obtained in Step 1.
• Step-3: Reverse the postfix expression to get prefix
expression
• Step-4: Exit
• Example:
• Given Infix Expression: (A – B/ C) * (A / K – L)
• Step 1: Reverse the infix string.
(L – K / A) * (C / B – A)
• Step 2: Obtain the corresponding postfix expression of the infix expression .
The postfix expression of (L – K / A) * (C / B – A) is
L K A / – C B / A – *
• Step 3: Reverse the postfix expression to get the prefix expression:
* – A / B C – / A K L
• Hence, the prefix expression of (A – B / C) * (A / K – L) is
* – A / B C – / A K L
Example 2: (A+B^C)*D+E^5
Ans: +*+A^BCD^E5
Evaluation of a Prefix Expression
Step-1: Create an empty stack.
Step-2: Scan Prefix expression from right to left and Repeat steps 3 and 4 for each element of the
expression LOOP
Step-3: If the scanned character is an operand then
PUSH it onto the stack.
Step-4: If the scanned character is an operator op1 then
1. Remove top two elements of stack, where A is the top and B is the next top element.
2. Evaluate, A op1 B
3. PUSH the result of evaluation onto the stack.
[END OF IF]
[END OF LOOP]
Step-5: Set RESULT to the topmost value of the stack.
Step-6: Exit
Evaluate the prefix expression:
- + 8 / 6 3 2
Evaluate the prefix expression: / * 20 * 50 + 3 6 300 2
+ - 2 7 * 8 / 4 12
Ans: 28
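
A short Python sketch of this right-to-left evaluation, assuming single-token numeric operands separated by spaces (the function name is illustrative):

def evaluate_prefix(tokens):
    """Evaluate a prefix expression given as a list of tokens."""
    stack = []
    for token in reversed(tokens):        # scan from right to left
        if token in ('+', '-', '*', '/'):
            a = stack.pop()               # A: top of the stack
            b = stack.pop()               # B: next top element
            if token == '+':
                stack.append(a + b)
            elif token == '-':
                stack.append(a - b)
            elif token == '*':
                stack.append(a * b)
            else:
                stack.append(a / b)       # evaluate A op1 B
        else:
            stack.append(int(token))      # operand: push onto the stack
    return stack[0]                       # RESULT: top of the stack

print(evaluate_prefix("- + 8 / 6 3 2".split()))   # 8.0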
Conversion of an Infix Expression into a Postfix Expression
• An algebraic expression written in infix notation may contain parentheses,
operands, and operators.
• The order of evaluation of the operators in the Infix expression can be changed by
the use of parentheses.
Properties
 Order of the numbers (or operands) is unchanged but order of operators may be
changed.
Example: Let us consider the infix expression 2 + 3*4 and its postfix equivalent
234*+. Notice that between infix and postfix the order of the numbers (or operands)
is unchanged. It is 2 3 4 in both cases. But the order of the operators * and + is
affected in the two expressions.
 Only one stack is enough to convert an infix expression to postfix expression.
 This stack will be used to change the order of operators from infix to postfix.
 This stack will only contain operators and the Left parenthesis symbol ‘(‘.
Algorithm to convert Infix to Postfix

(Running example: the infix expression A + (B*C – (D/E^F)*G) * H, where ^ is an exponentiation operator.)

Let X be an arithmetic expression written in infix notation. This algorithm finds the equivalent postfix expression Y.
1. Push "(" onto the Stack, and add ")" to the end of X.
2. Scan X from left to right and repeat Steps 3 to 6 for each element of X until the Stack is empty.
3. If an operand is encountered, add it to Y.
4. If a left parenthesis is encountered, push it onto the Stack.
5. If an operator is encountered, then:
   1. Repeatedly pop from the Stack and add to Y each operator (on the top of the Stack) which has the same precedence as or higher precedence than the operator.
   2. Push the operator onto the Stack.
   [End of If]
6. If a right parenthesis is encountered, then:
   1. Repeatedly pop from the Stack and add to Y each operator (on the top of the Stack) until a left parenthesis is encountered.
   2. Remove the left parenthesis.
   [End of If]
7. END.
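
A compact Python sketch of this conversion for single-character operands (illustrative; it treats every operator as left-associative, exactly as the algorithm above states, so ^ would need special handling for right associativity):

def infix_to_postfix(expression):
    """Convert an infix string with single-character operands to postfix."""
    precedence = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
    stack = ['(']                          # Step 1: push '(' onto the stack...
    output = []
    for token in expression + ')':         # ...and add ')' to the end of X
        if token.isalnum():                # Step 3: operand goes to output
            output.append(token)
        elif token == '(':                 # Step 4: push left parenthesis
            stack.append(token)
        elif token == ')':                 # Step 6: pop until '('
            while stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()                    # remove the left parenthesis
        elif token in precedence:          # Step 5: operator
            while (stack[-1] in precedence and
                   precedence[stack[-1]] >= precedence[token]):
                output.append(stack.pop())
            stack.append(token)
    return ''.join(output)

print(infix_to_postfix("(A-B)*(D/E)"))     # AB-DE/*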
Evaluation of a Postfix(reverse polish notation) Expression
Step-1: Create an empty stack.
Step-2: Scan expression from left to right and Repeat steps 3 and 4 for each element of the expression
LOOP
Step-3: If the scanned character is an operand then
PUSH it onto the stack.
Step-4: If the scanned character is an operator op1 then
1. Remove top two elements of stack, where A is the top and B is the next top element.
2. Evaluate, B op1 A
3.PUSH the result of evaluation onto the stack.
[END OF IF]
[END OF LOOP]
Step-5: Set RESULT to the topmost value of the stack.
Step-6: Exit
Example: 2 10 + 9 6 - /
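
A Python sketch of this left-to-right evaluation (illustrative; note that the second-popped element B is the left operand):

def evaluate_postfix(tokens):
    """Evaluate a postfix expression given as a list of tokens."""
    stack = []
    for token in tokens:
        if token in ('+', '-', '*', '/'):
            a = stack.pop()               # A: top of the stack
            b = stack.pop()               # B: next top element
            if token == '+':
                stack.append(b + a)
            elif token == '-':
                stack.append(b - a)
            elif token == '*':
                stack.append(b * a)
            else:
                stack.append(b / a)       # evaluate B op1 A
        else:
            stack.append(int(token))      # operand: push onto the stack
    return stack[0]                       # RESULT: top of the stack

print(evaluate_postfix("2 10 + 9 6 - /".split()))   # 4.0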
Evaluate the postfix expression:
9 3 4 * 8 + 4 / –
• Convert infix to postfix expression
• 1.(A-B)*(D/E)
• 2.(A+B^D)/(E-F)+G
• 3.A*(B+D)/E-F*(G+H/K)
• 4.((A+B)*D)^(E-F)
• 5.(A-B)/((D+E)*F)
• 6.((A+B)/D)^((E-F)*G)
• 7.12/(7-3)+2*(1+5)
• 8.5+3^2-8/4*3+6
• 9.6+2^3+9/3-4*5
• 10.6+2^3^2-4*5
• Evaluate the postfix expressions (',' is the separator):
• 1. 5,3,+,2,*,6,9,7,-,/,-
• 2. 3,5,+,6,4,-,*,4,1,-,2,^,+
• 3. 3,1,+,2,^,7,4,1,-,2,*,+,5,-
Array representation of Queue
-> Every queue has FRONT and REAR variables that point to the positions from where deletions and insertions can be done, respectively.

-> Before inserting an element into a queue, we must check for the overflow condition. An overflow occurs when we try to insert an element into a queue that is already full. When REAR = MAX – 1, where MAX is the size of the queue, we have an overflow condition.
1. Algorithm to insert an element in a queue

Step 1: IF REAR = MAX-1
            Write OVERFLOW
            Goto Step 4
        [END OF IF]
Step 2: IF FRONT = -1 and REAR = -1
            SET FRONT = REAR = 0
        ELSE
            SET REAR = REAR + 1
        [END OF IF]
Step 3: SET QUEUE[REAR] = NUM
Step 4: EXIT

Explanation of algorithm:
Step 1: we first check for the overflow condition.
Step 2: we check if the queue is empty. If it is empty, both FRONT and REAR are set to zero, so that the new value can be stored at the 0th location. Otherwise, if the queue already has some values, REAR is incremented so that it points to the next location in the array.
Step 3: the value is stored in the queue at the location pointed to by REAR.
-> Before deleting an element from a queue, we must check for underflow conditions. An underflow
condition occurs when we try to delete an element from a queue that is already empty. If FRONT = –1
and REAR = –1, it means there is no element in the queue.

2. Algorithm to delete an element from a queue

Step 1: IF FRONT = -1 OR FRONT > REAR
            Write UNDERFLOW
        ELSE
            SET FRONT = FRONT + 1
        [END OF IF]
Step 2: EXIT

Explanation of algorithm:
Step 1: we check for the underflow condition. An underflow occurs if FRONT = –1 or FRONT > REAR. Otherwise, if the queue has some values, FRONT is incremented so that it now points to the next value in the queue.
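
A direct Python translation of these two algorithms (a sketch; MAX and the class name are illustrative):

MAX = 5                                      # assumed queue capacity

class LinearQueue:
    def __init__(self):
        self.queue = [None] * MAX
        self.front = -1
        self.rear = -1

    def insert(self, num):
        if self.rear == MAX - 1:             # Step 1: overflow check
            print("OVERFLOW")
            return
        if self.front == -1 and self.rear == -1:
            self.front = self.rear = 0       # Step 2: first element
        else:
            self.rear += 1                   # Step 2: advance REAR
        self.queue[self.rear] = num          # Step 3: store the value

    def delete(self):
        if self.front == -1 or self.front > self.rear:
            print("UNDERFLOW")               # Step 1: underflow check
            return None
        value = self.queue[self.front]
        self.front += 1                      # advance FRONT
        return value

q = LinearQueue()
q.insert(10)
q.insert(20)
print(q.delete())    # 10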
• A queue data structure can be classified into the following types:
1. Circular Queue 2. Deque 3. Priority Queue 4. Multiple Queue
• Circular Queues(RING BUFFER )
• In Linear queues, insertions can be done only at one end called the REAR and deletions are always done
from the other end called the FRONT.

• Consider a scenario in which two successive deletions are made. Even though there is space available, the
overflow condition still exists because the condition rear = MAX – 1 still holds true. This is a major
drawback of a linear queue.
• To resolve this problem, we have two solutions. First, shift the elements to the left so that the vacant space
can be occupied and utilized efficiently. But this can be very time-consuming, especially when the queue is
quite large.
• The second option is to use a circular queue. In the circular queue, the first index comes right after the last
index.
• Queue is called circular when the last room comes just before the first room. That is, Q[0] comes after Q[n-1].
• The circular queue will be full only when front = 0 and rear = Max – 1.
• It is implemented in the same manner as a linear queue is implemented. The only difference will be in the
code that performs insertion and deletion operations.
• It uses 2 variables to keep track of first element and last element.
• Front is used to refer first element and rear is used to refer last element.
• Condition for "Circular Queue is Empty": FRONT = –1 and REAR = –1
• Condition for "Circular Queue is FULL": (FRONT = 0 and REAR = MAX – 1) or (REAR = FRONT – 1); equivalently, FRONT == (REAR + 1) % MAX
• If we try to insert an element into a queue that is already full, then an Overflow exception occurs.
• If we try to delete an element from an empty queue, it will throw an Underflow exception.
• Insertion on Circular Queue:
Steps to do Insertion on Circular Queue
1. If FRONT == (REAR+1) % MAX, then the circular queue is full; throw an overflow exception.
2. If REAR != MAX – 1, then REAR is incremented by 1 and the new value is inserted there.
3. If FRONT != 0 and REAR = MAX – 1, then the queue is not full (there is room at the start of the array). So, set REAR = 0 and insert the new element there.
Algorithm to insert new value into a Circular Queue

Step 1: IF FRONT == (REAR+1) % MAX


print “OVERFLOW”
Goto Step-4
[END OF IF]
Step 2: IF FRONT = -1 and REAR = -1
SET FRONT = REAR = 0
ELSE IF
REAR = MAX - 1 and FRONT != 0 //CASE-3
SET REAR = 0
ELSE
SET REAR = REAR + 1 //case-2
[END OF IF]
Step 3: SET queue[REAR] = value
Step 4: EXIT
Steps to do Deletion on Circular Queue
1. If FRONT = –1, then there are no elements in the queue (empty queue), so throw an underflow exception.
2. If the queue is not empty and FRONT = REAR, then after deleting the element at the front the queue becomes empty, and so FRONT and REAR are set to –1.
3. If the queue is not empty and FRONT = MAX–1, then after deleting the element at the front, FRONT is set to 0.
4. If the queue is not empty and the above 3 conditions are not met, then increment FRONT by 1.
Algorithm to delete a value from a Circular Queue
Step 1: IF FRONT == -1
print “UNDERFLOW”
Goto Step-4
[END OF IF]
Step 2: SET value = queue[FRONT]
Step 3: IF FRONT == REAR //case-2: q with single element
SET FRONT = REAR = -1
ELSE IF FRONT == MAX - 1 //case-3
SET FRONT = 0
ELSE
SET FRONT = FRONT + 1 // case -4
[END OF IF]
[END OF IF]
Step 4: EXIT
Algorithm to display the elements of a Circular Queue
Step 1: IF FRONT == -1 and REAR == -1
            print "Queue is empty"
            Goto Step 3
        [END OF IF]
Step 2: IF FRONT <= REAR
            REPEAT for i from FRONT to REAR DO
                PRINT queue[i]
        ELSE
            REPEAT for i from FRONT to MAX-1 DO
                PRINT queue[i]
            REPEAT for i from 0 to REAR DO
                PRINT queue[i]
        [END OF IF]
Step 3: EXIT
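
A Python sketch of a circular queue that follows the algorithms above (illustrative; the modulo form FRONT == (REAR+1) % MAX collapses the separate wrap-around cases into one expression):

MAX = 5                                      # assumed capacity

class CircularQueue:
    def __init__(self):
        self.queue = [None] * MAX
        self.front = self.rear = -1

    def insert(self, value):
        if self.front == (self.rear + 1) % MAX:
            print("OVERFLOW")                # queue is full
            return
        if self.front == -1:                 # empty queue: first element
            self.front = self.rear = 0
        else:
            self.rear = (self.rear + 1) % MAX   # wrap around if needed
        self.queue[self.rear] = value

    def delete(self):
        if self.front == -1:                 # empty queue
            print("UNDERFLOW")
            return None
        value = self.queue[self.front]
        if self.front == self.rear:          # single element: reset queue
            self.front = self.rear = -1
        else:
            self.front = (self.front + 1) % MAX
        return value

cq = CircularQueue()
for v in [1, 2, 3, 4, 5]:
    cq.insert(v)
cq.insert(6)         # OVERFLOW
print(cq.delete())   # 1
cq.insert(6)         # rear wraps around to index 0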
Deque
• A deque (pronounced as ‘deck’ or ‘dequeue’) is a list in which the elements can be
inserted or deleted at either end. It is also known as a head-tail linked list because
elements can be added to or removed from either the front (head) or the back (tail)
end.
• No element can be added and deleted from the middle.
• A deque is implemented using either a circular array or a circular doubly linked
list.
• Two pointers are maintained, LEFT and RIGHT, which point to either end of the
deque. The elements in a deque extend from the LEFT end to the RIGHT end and
since it is circular, Dequeue[N–1] is followed by Dequeue[0].
• There are two variants of a double-ended queue:
a. Input-restricted deque: insertions can be done only at one of the ends, while deletions can be done from both ends.
b. Output-restricted deque: deletions can be done only at one of the ends, while insertions can be done at both ends.
Operations on Deque
• InsertLeft (e): Insert a new element e at the beginning of the deque.
• InsertRight(e): Insert a new element e at the end of the deque.
• DeleteLeft(): Remove and return the first element of the deque; error occurs if the deque is empty.
• DeleteRight(): Remove and return the last element of the deque; an error occurs if the deque is
empty.
• getLeft(): Return the first element of the deque; an error occurs if the deque is empty.
• getRight(): Return the last element of the deque; an error occurs if the deque is empty.
• size(): Return the number of elements of the deque.
• isEmpty(): check if the deque is empty.
Note:
• InsertRight(e), DeleteLeft(), DeleteRight() operations are performed by input-restricted
deque
• InsertLeft(), InsertRight(), DeleteLeft() operations are performed by output-restricted deque.
(Figures: InsertRight(element) inserts at the end; DeleteLeft() removes an element from the beginning of the deque; DeleteRight() removes an element from the end.)
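
In Python these deque operations map directly onto the standard library's collections.deque, as this short sketch shows:

from collections import deque

d = deque()
d.appendleft(10)       # InsertLeft(10)
d.append(20)           # InsertRight(20)
d.appendleft(5)        # InsertLeft(5)  -> deque([5, 10, 20])
print(d.popleft())     # DeleteLeft()   -> 5
print(d.pop())         # DeleteRight()  -> 20
print(d[0], d[-1])     # getLeft(), getRight() -> 10 10
print(len(d))          # size() -> 1
print(len(d) == 0)     # isEmpty() -> False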
Unit-III
Linked List
A linked List is a data structure used for storing collections of data. A linked list has the following properties.
• Successive elements are connected by pointers
• The last element points to NULL
• Can grow or shrink in size during execution of a program
• Can be made just as long as required (until systems memory exhausts)
• Does not waste memory space (but takes some extra memory for storing addresses)

The pointers are maintained based on the requirements and accordingly linked list can be classified into three
groups,
1. Singly linked lists
2. Circular linked lists
3. Doubly linked lists

Singly Linked List


Generally "linked list" means a singly linked list. The list grows in forward direction. This list consists of a number
of nodes in which each node has a next pointer to the next element in the list. The link of the last node in the list
is NULL, which indicates the end of the list.

In a linked list, every node contains a NEXT pointer to another node which points to a node of the same type.
Hence, it is also called a self-referential data type.

Basic Operations on a List


 Traversing the list
 Inserting an item in the list
 Deleting an item from the list
Traversing the Linked List :
Let us assume that the head points to the first node of the list. To traverse the list we do the following.
 follow the pointers.
 Display the contents of the nodes (or count) as they are traversed.
 Stop when the next pointer points to NULL

Inserting an item in the list

 There are various positions where node can be inserted.


 Case-1: Insert at front ( as a first element)
 Case-2: Insert at end ( as a last node)
 Case-3: Insert at given position

Inserting At End of the list


We can use the following steps to insert a new node at end of the single linked list...
Step 1 - Create a newNode with given value and newNode → next as NULL.
Step 2 - Check whether list is Empty (head == NULL).
Step 3 - If it is Empty then, set head = newNode.
Step 4 - If it is Not Empty then, define a node pointer temp and initialize with head.
Step 5 - Keep moving the temp to its next node until it reaches to the last node in the list (until temp → next is
equal to NULL).
Step 6 - Set temp → next = newNode.
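
A standalone Python sketch of these six steps (the Node class and function name here are illustrative and kept separate from the class-based code below):

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def insert_at_end(head, data):
    """Return the (possibly new) head after appending a node."""
    new_node = Node(data)                 # Step 1: next is already None
    if head is None:                      # Steps 2-3: empty list
        return new_node
    temp = head                           # Step 4
    while temp.next is not None:          # Step 5: walk to the last node
        temp = temp.next
    temp.next = new_node                  # Step 6: link the new node
    return head

head = None
for value in [1, 2, 3]:
    head = insert_at_end(head, value)     # builds 1 -> 2 -> 3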

Basic Construction of Single Linked List


class Node:
    # constructor
    def __init__(self):
        self.data = None
        self.next = None
    # method for setting the data field of the node
    def setData(self, data):
        self.data = data
    # method for getting the data field of the node
    def getData(self):
        return self.data
    # method for setting the next field of the node
    def setNext(self, next):
        self.next = next
    # method for getting the next field of the node
    def getNext(self):
        return self.next

#Method for inserting a new node at any position in a Linked List


def insertAtPos(self, pos, data):
    if pos > self.length or pos < 0:
        return None
    else:
        if pos == 0:
            self.insertAtBeg(data)
        else:
            if pos == self.length:
                self.insertAtEnd(data)
            else:
                newNode = Node()
                newNode.setData(data)
                count = 0
                current = self.head
                while count < pos - 1:
                    count += 1
                    current = current.getNext()
                newNode.setNext(current.getNext())
                current.setNext(newNode)
                self.length += 1

We can use the following steps to delete a node from beginning of the single linked list...
Step 1 - Check whether list is Empty (head == NULL)
Step 2 - If it is Empty then, display 'List is Empty!!! Deletion is not possible' and terminate the function.
Step 3 - If it is Not Empty then, define a Node pointer 'temp' and initialize with head.
Step 4 - Check whether list is having only one node (temp → next == NULL)
Step 5 - If it is TRUE then set head = NULL and delete temp (Setting Empty list conditions)
Step 6 - If it is FALSE then set head = temp → next, and delete temp.
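
A minimal Python sketch of this deletion (illustrative; in Python, returning head.next covers both the single-node and multi-node cases, since the unlinked node is garbage-collected):

def delete_at_beginning(head):
    """Return the new head after removing the first node."""
    if head is None:                      # Steps 1-2: empty list
        print("List is Empty!!! Deletion is not possible")
        return None
    return head.next                      # Steps 3-6: advance the head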

Deleting the last element:

def deleteLastNode(self):
    if self.length == 0:
        print("The list is empty")
    else:
        currentnode = self.head
        previousnode = self.head
        while currentnode.getNext() != None:
            previousnode = currentnode
            currentnode = currentnode.getNext()
        previousnode.setNext(None)
        self.length -= 1

Similarly, we can insert and delete nodes at the other positions.

Doubly Linked Lists


The advantage of a doubly linked list (also called two - way linked list) is that given a node in the list, we can navigate in
both directions. A node in a singly linked list cannot be removed unless we have the pointer to its predecessor. But in a
doubly linked list, we can delete a node even if we don't have the previous node's address (since each node has a left
pointer pointing to the previous node and can move backward).

The primary disadvantages of doubly linked lists are:

• Each node requires an extra pointer, requiring more space.
• The insertion or deletion of a node takes a bit longer (more pointer operations).
Similar to a singly linked list, let us implement the operations of a doubly linked list.

Creating a Double Linked List:


class Node:
    # If data is not given by the user, it is taken as None
    def __init__(self, data, next=None, prev=None):
        self.data = data
        self.next = next
        self.prev = prev
    # method for setting the data field of the node
    def setData(self, data):
        self.data = data
    # method for getting the data field of the node
    def getData(self):
        return self.data

We can use the following steps to insert a new node at beginning of the double linked list...
Step 1 - Create a newNode with given value and newNode → previous as NULL.
Step 2 - Check whether list is Empty (head == NULL)
Step 3 - If it is Empty then, assign NULL to newNode → next and newNode to head.
Step 4 - If it is not Empty then, assign head to newNode → next and newNode to head.

Inserting At End of the list


Step 1 - Create a newNode with given value and newNode → next as NULL.
Step 2 - Check whether list is Empty (head == NULL)
Step 3 - If it is Empty, then assign NULL to newNode → previous and newNode to head.
Step 4 - If it is not Empty, then, define a node pointer temp and initialize with head.
Step 5 - Keep moving the temp to its next node until it reaches to the last node in the list (until
temp → next is equal to NULL).
Step 6 - Assign newNode to temp → next and temp to newNode → previous.

We can use the following steps to delete a node from beginning of the double linked list...
Step 1 - Check whether list is Empty (head == NULL)
Step 2 - If it is Empty then, display 'List is Empty!!! Deletion is not possible' and terminate the
function.
Step 3 - If it is not Empty then, define a Node pointer 'temp' and initialize with head.
Step 4 - Check whether list is having only one node (temp → previous is equal to temp → next)
Step 5 - If it is TRUE, then set head to NULL and delete temp (Setting Empty list conditions)
Step 6 - If it is FALSE, then assign temp → next to head, NULL to head → previous and delete
temp.
Deleting from End of the list
We can use the following steps to delete a node from end of the double linked list...
Step 1 - Check whether list is Empty (head == NULL)
Step 2 - If it is Empty, then display 'List is Empty!!! Deletion is not possible' and terminate the
function.
Step 3 - If it is not Empty then, define a Node pointer 'temp' and initialize with head.
Step 4 - Check whether list has only one Node (temp → previous and temp → next both are
NULL)
Step 5 - If it is TRUE, then assign NULL to head and delete temp. And terminate from the
function. (Setting Empty list condition)
Step 6 - If it is FALSE, then keep moving temp until it reaches to the last node in the list. (until
temp → next is equal to NULL)

import gc

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
        self.prev = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None

    # insert node at the front
    def insert_front(self, data):
        # allocate memory for newNode and assign data to newNode
        new_node = Node(data)
        # point newNode's next to the current head
        new_node.next = self.head
        # prev is already None (set in the constructor);
        # previous of the old head (now the second node) is newNode
        if self.head is not None:
            self.head.prev = new_node
        # head points to newNode
        self.head = new_node

    # insert a node after a specific node
    def insert_after(self, prev_node, data):
        # check if the previous node is null
        if prev_node is None:
            print("previous node cannot be null")
            return
        # allocate memory for newNode and assign data to newNode
        new_node = Node(data)
        # set next of newNode to next of prev node
        new_node.next = prev_node.next
        # set next of prev node to newNode
        prev_node.next = new_node
        # set prev of newNode to the previous node
        new_node.prev = prev_node
        # set prev of newNode's next to newNode
        if new_node.next:
            new_node.next.prev = new_node

    # insert a newNode at the end of the list
    def insert_end(self, data):
        # allocate memory for newNode and assign data to newNode
        new_node = Node(data)
        # next of newNode is already None (set in the constructor);
        # if the linked list is empty, make the newNode the head node
        if self.head is None:
            self.head = new_node
            return
        # otherwise, traverse to the end of the linked list
        temp = self.head
        while temp.next:
            temp = temp.next
        # now the last node is temp: link it and newNode in both directions
        temp.next = new_node
        new_node.prev = temp

    # delete a node from the doubly linked list
    def deleteNode(self, dele):
        # if head or dele is null, deletion is not possible
        if self.head is None or dele is None:
            return
        # if dele is the head node, point the head pointer to the next of dele
        if self.head == dele:
            self.head = dele.next
        # if dele is not the last node, point the prev of the node
        # after dele to the node before dele
        if dele.next is not None:
            dele.next.prev = dele.prev
        # if dele is not the first node, point the next of the
        # previous node to the node after dele
        if dele.prev is not None:
            dele.prev.next = dele.next
        # let the garbage collector reclaim the unlinked node
        gc.collect()

    # print the doubly linked list
    def display_list(self, node):
        while node:
            print(node.data, end="->")
            node = node.next
        print()

d_linked_list = DoublyLinkedList()
d_linked_list.insert_end(5)
d_linked_list.insert_front(1)
d_linked_list.insert_front(6)
d_linked_list.insert_end(9)
# insert 11 after head
d_linked_list.insert_after(d_linked_list.head, 11)
# insert 15 after the second node
d_linked_list.insert_after(d_linked_list.head.next, 15)
d_linked_list.display_list(d_linked_list.head)
# delete the last node
d_linked_list.deleteNode(d_linked_list.head.next.next.next.next.next)
d_linked_list.display_list(d_linked_list.head)

Circular Linked List


A circular linked list is a type of linked list in which the first and the last nodes are also connected to each other to
form a circle.

There are basically two types of circular linked list:

1. Circular Singly Linked List

Here, the address of the last node consists of the address of the first node.
2. Circular Doubly Linked List

Here, in addition to the last node storing the address of the first node, the first node will also store the address of the
last node.

Operations on circular linked lists can be performed exactly as on a singly linked list; we just have to maintain an extra check to know when we have gone through the list once. A circular linked list is a collection of nodes in which the tail node also points back to the head node. In the diagram described, node A represents the head and node D represents the tail: A points to B, B points to C, and C points to D, but what makes the list circular is that node D points back to node A.
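
A small Python sketch of traversing a circular singly linked list (the node values A, B, C are hypothetical sample data): the extra check `current is head` is what tells us we have gone around the list once.

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def traverse_circular(head):
    """Print every node of a circular singly linked list exactly once."""
    if head is None:
        return
    current = head
    while True:
        print(current.data, end=" -> ")
        current = current.next
        if current is head:               # back at the start: stop
            break
    print("(back to head)")

# build the circular list A -> B -> C -> A
a, b, c = Node("A"), Node("B"), Node("C")
a.next, b.next, c.next = b, c, a
traverse_circular(a)                      # A -> B -> C -> (back to head)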

Stacks

1. Stack is a data structure in which the addition of a new element or the deletion of an existing element always takes place at the same end. This end is often known as the top of the stack. When an item is added to a stack, the operation is called push, and when an item is removed from the stack, the operation is called pop. Stack is also called a Last-In-First-Out (LIFO) list.
2. Stacks are used in function calls. The system stack ensures a proper execution order of functions. Therefore, stacks are frequently used in situations where the order of processing is very important, especially when the processing needs to be postponed until other conditions are fulfilled.

Real time Examples

1. Pile of plates in cafeteria - The plates are added to the stack as they are cleaned and they are placed on the
top. When a plate, is required it is taken from the top of the stack. The first plate placed on the stack is the
last one to be used.
2. Stack of coins
3. Stack of Books

Operations on Stack:

There are two possible operations done on a stack. They are pop and push operations.
Push: Allows adding an element at the top of the stack.
Pop: Allows removing an element from the top of the stack.
The Stack can be implemented using both arrays and linked lists. When dynamic memory allocation is preferred,
we go for linked lists to implement the stacks.

ALGORITHM / PROCEDURE:
To push a node in the stack :
step 1. Initialise a node
step 2. Update the value of that node by data i.e. node->data = data
step 3. Now link this node to the top of the linked list
step 4. And update top pointer to the current node

To pop a node from the stack:


step 1. First Check whether there is any node present in the linked list or not, if not then return
step 2. Otherwise make pointer let say temp to the top node and move forward the top node by 1 step
step 3. Now free this temp node
Peek operation on stack
step 1. Check if there is any node present or not, if not then return.
step 2. Otherwise return the value of top/front node of the data structure(stack/queue)

Display Operation on stack


step 1. Take a temp node and initialize it with top/front pointer
step 2. Now start traversing temp till it encounters NULL
step 3. Simultaneously print the value of the temp node

Below is Python code to perform stack operations using a linked list:

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert_at_beg(self, data):
        new_node = Node(data)
        if self.head is None:
            self.head = new_node
        else:
            new_node.next = self.head
            self.head = new_node

    def insert_at_end(self, data):
        new_node = Node(data)
        if self.tail is None:
            self.head = self.tail = new_node
        else:
            self.tail.next = new_node
            self.tail = new_node

    def delete_at_beg(self):
        if self.head is None:
            return None
        else:
            delnode = self.head
            self.head = self.head.next
            return delnode.data

    def delete_at_end(self):
        if self.head is None:
            return None
        delnode = self.head
        while delnode.next != self.tail:
            delnode = delnode.next
        delnode.next = None
        self.tail = delnode

    def get_head(self):
        return self.head

    def get_tail(self):
        return self.tail

class Stack:
    def __init__(self):
        self.stack = LinkedList()
        self.top = self.stack.get_head()

    def push(self, data):
        self.stack.insert_at_beg(data)
        self.top = self.stack.get_head()

    def pop(self):
        x = self.stack.delete_at_beg()
        if x is None:
            print("Stack is Empty")
        else:
            print(f"{x} deleted from stack")
        self.top = self.stack.get_head()

    def display(self):
        if self.top is None:
            print("Stack is Empty")
        else:
            curr = self.top
            while curr:
                print(curr.data)
                curr = curr.next

s = Stack()
while True:
    print("1.Push 2.Pop 3.Display 4.Exit")
    ip = int(input("Enter your choice: "))
    if ip == 1:
        ele = int(input("Enter element: "))
        s.push(ele)
    elif ip == 2:
        s.pop()
    elif ip == 3:
        s.display()
    else:
        break

Queue

Queue:
A queue is another special kind of list, where items are inserted at one end, called the rear, and deleted at the other end, called the front. Another name for a queue is a "FIFO" (First-In-First-Out) list.
The operations for a queue are analogous to those for a stack; the difference is that insertions go at the end of the list rather than the beginning. We shall use the following operations on queues:
• enqueue: inserts an element at the end of the queue.
• dequeue: deletes an element at the start of the queue.
Representation of Queue:

The header pointer of the linked list is used as FRONT. Another pointer called REAR, which will store the address
of the last element in the queue.

•All insertions will be done at the rear end and all the deletions will be done at the front end.
• Condition for “Empty Queue” : FRONT = REAR = NULL

•Space complexity of linked list representation of the queue with n elements is O(n), and time complexity for the
operations is O(1).

1.insert Operation (enqueue):

•It inserts a new element at the end of the queue.

•First check if FRONT=NULL then allocate memory for a new node and new node will be both FRONT and
REAR.

•If FRONT!=NULL then insert the new node at the rear end of the linked queue and name this new node as REAR.
Linked List Implementation of Queue

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert_at_beg(self, data):
        new_node = Node(data)
        if self.head is None:
            self.head = self.tail = new_node
        else:
            new_node.next = self.head
            self.head = new_node

    def insert_at_end(self, data):
        new_node = Node(data)
        if self.tail is None:
            self.head = self.tail = new_node
        else:
            self.tail.next = new_node
            self.tail = new_node

    def delete_at_beg(self):
        if self.head is None:
            return None
        else:
            delnode = self.head
            self.head = self.head.next
            if self.head is None:     # queue became empty
                self.tail = None
            return delnode.data

    def delete_at_end(self):
        if self.tail is None:
            return None
        delnode = self.head
        while delnode.next != self.tail:
            delnode = delnode.next
        delnode.next = None
        self.tail = delnode

    def get_head(self):
        return self.head

    def get_tail(self):
        return self.tail

class Queue:
    def __init__(self):
        # q is a linked list; FRONT is its head and REAR is its tail
        self.q = LinkedList()
        self.head = self.q.get_head()
        self.tail = self.q.get_tail()

    def enque(self, data):
        self.q.insert_at_end(data)
        self.tail = self.q.get_tail()

    def deque(self):
        x = self.q.delete_at_beg()
        if x is None:
            print("Queue is Empty")
        else:
            print(f"{x} deleted from queue")
        self.head = self.q.get_head()
        self.tail = self.q.get_tail()

    def display(self):
        if self.q.get_head() is None:
            print("Queue is Empty")
        else:
            curr = self.q.get_head()
            while curr:
                print(curr.data)
                curr = curr.next

s = Queue()
while True:
    print("1.Enque 2.Deque 3.Display 4.Exit")
    ip = int(input("Enter your choice: "))
    if ip == 1:
        ele = int(input("Enter element: "))
        s.enque(ele)
    elif ip == 2:
        s.deque()
    elif ip == 3:
        s.display()
    else:
        break
UNIT 4 TREES: Introduction, binary trees, types of trees, properties of binary trees, binary tree traversals, binary search trees. Graphs: introduction, applications of graphs, graph representation, graph traversals.

INTRODUCTION
Tree is a non-linear data structure. It is a hierarchical data structure that has
nodes connected through links. The topmost node of the tree which has no
parent is known as the root node.

BINARY TREE
A tree consists of nodes connected by edges. It is a non-linear data structure with the following properties:
• One node is marked as the root node.
• Every node other than the root is associated with one parent node.
• Each node can have an arbitrary number of child nodes (at most two in a binary tree).

CODE
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def PrintTree(self):
        print(self.data)

root = Node(10)
root.PrintTree()

OUTPUT
10
Inserting into a Tree
CODE :
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # Compare the new value with the parent node
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # Print the tree (in-order traversal)
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data, end=" ")
        if self.right:
            self.right.PrintTree()

# Use the insert method to add nodes
root = Node(12)
root.insert(6)
root.insert(14)
root.insert(3)
root.PrintTree()
OUTPUT
3 6 12 14

Types of Binary Tree

Following are the types of Binary Tree based on the number of children:
1. Full Binary Tree
2. Degenerate Binary Tree
3. Skewed Binary Tree

1. Full Binary Tree
A Binary Tree is a full binary tree if every node has 0 or 2 children.
In other words, a full binary tree is a binary tree in which all nodes
except leaf nodes have two children, i.e., every parent node/internal
node has either two or no children. It is also known as a proper binary tree.
2. Degenerate (or pathological) tree
A tree where every internal node has one child. Such trees are
performance-wise the same as a linked list. A degenerate or
pathological tree is a tree having a single child, either left or
right.
3. Skewed Binary Tree
A skewed binary tree is a pathological/degenerate tree in which the tree is either dominated by
the left nodes or the right nodes. Thus, there are two types of skewed binary tree: left-skewed
binary tree and right-skewed binary tree.
Types of Binary Tree on the basis of the completion of levels:
1. Complete Binary Tree
2. Perfect Binary Tree
3. Balanced Binary Tree

1. Complete Binary Tree

A Binary Tree is a Complete Binary Tree if all the levels are completely filled
except possibly the last level, and the last level has all keys as left as possible.
A complete binary tree is just like a full binary tree, but with two major
differences:
1. Every level except the last level must be completely filled.
2. All the leaf elements must lean towards the left.
3. The last leaf element might not have a right sibling, i.e. a complete binary tree
doesn't have to be a full binary tree.
2. Perfect Binary Tree
A Binary Tree is a Perfect Binary Tree if every internal node has
exactly two child nodes and all the leaf nodes are at the same level.
In a Perfect Binary Tree, the number of leaf nodes is the number of internal nodes
plus 1:

L = I + 1, where L = number of leaf nodes and I = number of internal nodes.

A Perfect Binary Tree of height h (where the height of a binary tree is
the number of edges in the longest path from the root node to any leaf
node in the tree, and the height of the root node is 0) has 2^(h+1) - 1 nodes.
An example of a Perfect Binary Tree is ancestors in a family. Keep a
person at the root, parents as children, parents of parents as their children.

3. Balanced Binary Tree

A binary tree is balanced if the height of the tree is O(log n) where n is
the number of nodes. For example, the AVL tree maintains O(log n)
height by making sure that the difference between the heights of the
left and right subtrees is at most 1. Red-Black trees maintain O(log n)
height by making sure that the number of black nodes on every root-to-
leaf path is the same and that there are no adjacent red nodes.
Balanced Binary Search Trees are performance-wise good as they
provide O(log n) time for search, insert and delete.

It is a type of binary tree in which the difference between the height of the left
and the right subtree for each node is either 0 or 1.
Some Special Types of Trees:
On the basis of node values, the Binary Tree can be classified
into the following special types:
Binary Search Tree
AVL Tree
Red Black Tree
B Tree
B+ Tree
Segment Tree


1. Binary Search Tree

Binary Search Tree is a node-based binary tree data structure that has
the following properties:
1. The left subtree of a node contains only nodes with keys lesser than the node's key.
2. The right subtree of a node contains only nodes with keys greater than the node's key.
3. The left and right subtree each must also be a binary search tree.

2. AVL Tree
AVL tree is a self-balancing Binary Search Tree (BST) where the difference
between the heights of the left and right subtrees cannot be more than one for all
nodes. A tree is an AVL tree when the differences between the heights of the left and
right subtrees for every node are less than or equal to 1.
3. Red Black Tree
A red-black tree is a kind of self-balancing binary search tree where each node has
an extra bit, and that bit is often interpreted as the color (red or black). These
colors are used to ensure that the tree remains balanced during insertions and
deletions. Although the balance of the tree is not perfect, it is good enough to
reduce the searching time and maintain it around O(log n) time, where n is the
total number of elements in the tree. This tree was invented in 1972 by Rudolf
Bayer.
Properties of Binary Tree

1. The maximum number of nodes at level 'l' of a binary tree is 2^l:

Note: Here the level of a node is the number of edges on the path from the root to
that node. The level of the root is 0.
This can be proved by induction:
For the root, l = 0, number of nodes = 2^0 = 1.
Assume that the maximum number of nodes on level 'l' is 2^l.
Since in a binary tree every node has at most 2 children, the next level would have
twice as many nodes, i.e. 2 * 2^l = 2^(l+1).

2. The maximum number of nodes in a binary tree of height 'h' is 2^h - 1:

Note: Here the height of a tree is the maximum number of nodes on the root-to-
leaf path. The height of a tree with a single node is considered as 1.
This result can be derived from point 1 above. A tree has the maximum number of
nodes if all levels have the maximum number of nodes. So the maximum number of
nodes in a binary tree of height h is 1 + 2 + 4 + ... + 2^(h-1). This is a simple
geometric series with h terms, and the sum of this series is 2^h - 1.
In some books, the height of the root is considered as 0. In that convention, the
above formula becomes 2^(h+1) - 1.

3. In a binary tree with N nodes, the minimum possible height
or the minimum number of levels is ⌈log2(N+1)⌉:
Each level should have at least one node, so the height cannot be more than N.
A binary tree of height 'h' can have a maximum of 2^h - 1 nodes (previous
property). So the number of nodes will be less than or equal to this maximum
value:
N <= 2^h - 1
2^h >= N + 1
log2(2^h) >= log2(N+1)    (taking log on both sides)
h >= log2(N+1)
Since h is an integer, h >= ⌈log2(N+1)⌉.
So the minimum height possible is ⌈log2(N+1)⌉.

4. In a binary tree where every node has 0 or 2 children, the number of leaf nodes
is always one more than the number of nodes with two children:
L = T + 1
where L = number of leaf nodes and T = number of internal nodes with two children.
Proof (illustrated for a perfect binary tree of height h, counted in levels):
Number of leaf nodes, i.e. the total elements present at the bottom level of the tree:
L = 2^(h-1).
Number of internal nodes = {total number of nodes} - {leaf nodes}
= (2^h - 1) - 2^(h-1) = 2^(h-1)(2 - 1) - 1 = 2^(h-1) - 1.
So L = 2^(h-1) and T = 2^(h-1) - 1.
Therefore L = T + 1.
Hence proved.
5. In a non-empty binary tree, if n is the total number of nodes and e is
the total number of edges, then e = n - 1:
Every node in a binary tree has exactly one parent, with the exception of
the root node. So if n is the total number of nodes, then n - 1 nodes have
exactly one parent. There is only one edge between any child and its
parent. So the total number of edges is n - 1.

Some extra properties of binary tree are:


1.Each node in a binary tree can have at most two child nodes: In a binary tree,
each node can have either zero, one, or two child nodes. If a node has zero
children, it is called a leaf node. If a node has one child, it is called a unary node. If
a node has two children, it is called a binary node.

2.The node at the top of the tree is called the root node: The root node
is the first node in a binary tree and all other nodes are connected to it.
All other nodes in the tree are either child nodes or descendant nodes
of the root node.
3.Nodes that do not have any child nodes are called leaf nodes: Leaf
nodes are the endpoints of the tree and have no children. They
represent the final result of the tree.
4.The height of a binary tree is defined as the number of edges from
the root node to the deepest leaf node: The height of a binary tree is
the length of the longest path from the root node to any of the leaf
nodes. The height of a binary tree is also known as its depth.
5.In a full binary tree, every node except the leaves has exactly two
children: In a full binary tree, all non-leaf nodes have exactly two
children. This means that there are no unary nodes in a full binary tree.
6.In a complete binary tree, every level of the tree is completely filled
except for the last level, which can be partially filled: In a complete
binary tree, all levels of the tree except the last level are completely
filled. This means that there are no gaps in the tree and all nodes are
connected to their parent nodes.
7. In a balanced binary tree, the height of the left and right subtrees of
every node differ by at most 1: In a balanced binary tree, the height of
the left and right subtrees of every node is similar. This ensures that the
tree is balanced and that the height of the tree is minimized.
8. The in-order, pre-order, and post-order traversal of a binary tree are
three common ways to traverse the tree: In-order, pre-order, and post-
order are three different ways to traverse a binary tree. In-order
traversal visits the left subtree, the node itself, and then the right
subtree. Pre-order traversal visits the node itself, the left subtree, and
then the right subtree. Post-order traversal visits the left subtree, the
right subtree, and then the node itself.

Tree Traversal Techniques

Unlike linear data structures (arrays, linked lists, queues, stacks, etc.) which have
only one logical way to traverse them, trees can be traversed in different ways.
A tree data structure can be traversed in the following ways:
1. Depth First Search or DFS
   - Inorder Traversal
   - Preorder Traversal
   - Postorder Traversal
2. Level Order Traversal or Breadth First Search or BFS
3. Boundary Traversal
4. Diagonal Traversal
BINARY SEARCH TREES (BSTs)

Binary Search Tree is a node-based binary tree data structure which has the
following properties:
1. The left subtree of a node contains only nodes with keys lesser than the node's key.
2. The right subtree of a node contains only nodes with keys greater than the node's key.
3. The left and right subtree each must also be a binary search tree.

CODE
If root == NULL
    return NULL
If number == root->data
    return root->data
If number < root->data
    return search(root->left, number)
If number > root->data
    return search(root->right, number)

1. Search Operation:
The algorithm depends on the property of a BST that each left subtree has values
below the root and each right subtree has values above the root.
If the value is below the root, we can say for sure that the value is not in the right
subtree; we need to search only in the left subtree. If the value is above the
root, we can say for sure that the value is not in the left subtree; we need to
search only in the right subtree.
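
As a quick illustration, here is a minimal Python sketch of this search logic, assuming
a node object with key, left, and right attributes (as in the Python code later in this
section):

def search(root, key):
    # Found the key, or reached an empty subtree (key not present)
    if root is None or root.key == key:
        return root
    if key < root.key:
        return search(root.left, key)   # the key cannot be in the right subtree
    return search(root.right, key)      # the key cannot be in the left subtree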

2. Insert Operation
Inserting a value in the correct position is similar to searching, because we try to
maintain the rule that the left subtree is lesser than the root and the right subtree is
larger than the root.
We keep going to either the right subtree or the left subtree depending on the value, and
when we reach a point where the left or right subtree is null, we put the new node there.

Algorithm:
If node == NULL
    return createNode(data)
if (data < node->data)
    node->left = insert(node->left, data);
else if (data > node->data)
    node->right = insert(node->right, data);
return node;

3. Deletion Operation
There are three cases for deleting a node from a binary search tree.
Case I
In the first case, the node to be deleted is a leaf node. In such a case, simply
delete the node from the tree.

Case II
In the second case, the node to be deleted has a single child node. In such a
case follow the steps below:
1. Replace that node with its child node.
2. Remove the child node from its original position.

Case III
In the third case, the node to be deleted has two children. In such a case follow
the steps below:
1. Get the inorder successor of that node.
2. Replace the node with the inorder successor.
3. Remove the inorder successor from its original position.

# Binary Search Tree operations in Python

# Create a node
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

# Inorder traversal
def inorder(root):
    if root is not None:
        inorder(root.left)                    # traverse left
        print(str(root.key) + "->", end=' ')  # visit root
        inorder(root.right)                   # traverse right

# Insert a node
def insert(node, key):
    # Return a new node if the tree is empty
    if node is None:
        return Node(key)
    # Traverse to the right place and insert the node
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

# Find the inorder successor (the leftmost node of a subtree)
def minValueNode(node):
    current = node
    while current.left is not None:
        current = current.left
    return current

# Delete a node
def deleteNode(root, key):
    # Return if the tree is empty
    if root is None:
        return root
    # Find the node to be deleted
    if key < root.key:
        root.left = deleteNode(root.left, key)
    elif key > root.key:
        root.right = deleteNode(root.right, key)
    else:
        # If the node has only one child or no child,
        # replace it with that child (or None)
        if root.left is None:
            return root.right
        elif root.right is None:
            return root.left
        # If the node has two children, place the inorder
        # successor in the position of the node to be deleted
        temp = minValueNode(root.right)
        root.key = temp.key
        # Delete the inorder successor
        root.right = deleteNode(root.right, temp.key)
    return root

root = None
root = insert(root, 8)
root = insert(root, 3)
root = insert(root, 1)
root = insert(root, 6)
root = insert(root, 7)
root = insert(root, 10)
root = insert(root, 14)
root = insert(root, 4)

print("Inorder traversal: ", end=' ')
inorder(root)

print("\nDelete 10")
root = deleteNode(root, 10)
print("Inorder traversal: ", end=' ')
inorder(root)
GRAPHS:
A graph is a pictorial representation of a set of objects where some pairs of
objects are connected by links. The interconnected objects are represented by
points termed as vertices, and the links that connect the vertices are called edges.
In this chapter we are going to see how to create a graph and add various data
elements to it using a Python program. Following are the basic operations we
perform on graphs:

1. Display graph vertices
2. Display graph edges
3. Add a vertex
4. Add an edge
5. Create a graph

A graph can be easily represented using the Python dictionary data type. We
represent the vertices as the keys of the dictionary, and the connections between
the vertices (also called edges) as the values in the dictionary.
Consider the following graph:
V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}

# Create the dictionary with graph elements
graph = {
    "a" : ["b", "c"],
    "b" : ["a", "d"],
    "c" : ["a", "d"],
    "d" : ["e"],
    "e" : ["d"]
}

# Print the graph
print(graph)

Output
When the above code is executed, it produces the following result (in Python 3,
dictionaries preserve insertion order):
{'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['a', 'd'], 'd': ['e'], 'e': ['d']}

Display graph vertices

To display the graph vertices we simply find the keys of the graph dictionary. We
use the keys() method.
CODE:
class graph:
    def __init__(self, gdict=None):
        if gdict is None:
            gdict = {}            # default to an empty dictionary
        self.gdict = gdict

    # Get the keys of the dictionary
    def getVertices(self):
        return list(self.gdict.keys())

# Create the dictionary with graph elements
graph_elements = {
    "a" : ["b", "c"],
    "b" : ["a", "d"],
    "c" : ["a", "d"],
    "d" : ["e"],
    "e" : ["d"]
}

g = graph(graph_elements)
print(g.getVertices())

Output
When the above code is executed, it produces the following result:
['a', 'b', 'c', 'd', 'e']

APPLICATIONS OF GRAPHS:
1. In computer science, graphs are used to represent the flow of
computation.
2. Google Maps uses graphs for building transportation systems, where the
intersection of two (or more) roads is considered to be a vertex and
the road connecting two vertices is considered to be an edge; thus their
navigation system is based on the algorithm to calculate the shortest
path between two vertices.
3. In Facebook, users are considered to be the vertices, and if they are
friends then there is an edge running between them. Facebook's friend
suggestion algorithm uses graph theory. Facebook is an example of an
undirected graph.
4. In the World Wide Web, web pages are considered to be the vertices.
There is an edge from a page u to another page v if there is a link to page
v on page u. This is an example of a directed graph. It was the basic idea
behind the Google Page Ranking Algorithm.
5. In operating systems, we come across the Resource Allocation Graph,
where each process and each resource is considered to be a vertex. Edges
are drawn from resources to the allocated processes, or from requesting
processes to the requested resources. If this leads to the formation of a
cycle, then a deadlock will occur.
6. In mapping systems we use graphs. They are useful to find out good
places near your location. GPS also uses graphs.
7. Facebook uses graphs to suggest mutual friends, and to show lists of
followed pages, friends, and contacts.
8. Microsoft Excel uses DAGs, i.e. Directed Acyclic Graphs.
9. In Dijkstra's algorithm, we use a graph to find the shortest path
between two or more nodes.
10. On social media sites, we use graphs to track the data of the users,
like showing preferred post suggestions, recommendations, etc.
11. Graphs are used in biochemical applications such as the structuring of
proteins, DNA, etc.

Representations of Graph:
Here are the two most common ways to represent a graph:
1. Adjacency Matrix
2. Adjacency List

Adjacency Matrix:
An adjacency matrix is a way of representing a graph as a matrix of
booleans (0's and 1's).
Let's assume there are n vertices in the graph. So, create a 2D matrix
adjMat[n][n] having dimension n x n.
1. If there is an edge from vertex i to j, mark adjMat[i][j] as 1.
2. If there is no edge from vertex i to j, mark adjMat[i][j] as 0.

Representation of Undirected Graph as Adjacency Matrix:

Initially, the entire matrix is initialized to 0. If there is an edge from source to
destination, we insert 1 in both cases (adjMat[source][destination] and
adjMat[destination][source]) because we can go either way.
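
For illustration, the following minimal Python sketch (assuming vertices are numbered
0 to n-1; build_adj_matrix is an illustrative name, not from the original text) builds
such a matrix for an undirected graph from a list of edges:

def build_adj_matrix(n, edges):
    # Start with an n x n matrix of 0's
    adjMat = [[0] * n for _ in range(n)]
    for i, j in edges:
        adjMat[i][j] = 1   # edge from i to j
        adjMat[j][i] = 1   # undirected graph: mark both directions
    return adjMat

# Example: 3 vertices, edges 0-1 and 0-2
print(build_adj_matrix(3, [(0, 1), (0, 2)]))
# [[0, 1, 1], [1, 0, 0], [1, 0, 0]]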

Adjacency List
An array of lists is used to store the edges between two vertices. The size of the array is
equal to the number of vertices (i.e., n). Each index in this array represents a
specific vertex in the graph. The entry at index i of the array contains a linked
list containing the vertices that are adjacent to vertex i.
Let's assume there are n vertices in the graph. So, create an array of lists of size n,
called adjList[n].
adjList[0] will have all the nodes which are connected (neighbours) to vertex 0.
adjList[1] will have all the nodes which are connected (neighbours) to vertex 1, and
so on.

Representation of Undirected Graph as Adjacency List:

Consider an undirected graph with 3 vertices. An array of lists will be created of
size 3, where each index represents a vertex. If vertex 0 has two
neighbours (1 and 2), insert vertices 1 and 2 at index 0 of the array. Similarly,
for vertex 1, insert its neighbours at index 1 of the array, and for vertex 2, insert its
neighbours into the array of lists.

Representation of Directed Graph as Adjacency List:

Consider a directed graph with 3 vertices. An array of lists will be created of size
3, where each index represents a vertex. Suppose vertex 0 has no neighbours,
while vertex 1 has two neighbours (0 and 2); then insert vertices 0 and 2 at
index 1 of the array. Similarly, for vertex 2, insert its neighbours into the array of lists.
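
The same idea can be sketched in Python (again assuming vertices numbered 0 to n-1;
build_adj_list is an illustrative name, not from the original text):

def build_adj_list(n, edges, directed=False):
    adjList = [[] for _ in range(n)]   # one list of neighbours per vertex
    for u, v in edges:
        adjList[u].append(v)
        if not directed:
            adjList[v].append(u)       # undirected: record the edge both ways
    return adjList

# Undirected: 3 vertices, edges 0-1 and 0-2
print(build_adj_list(3, [(0, 1), (0, 2)]))          # [[1, 2], [0], [0]]
# Directed: edges 1->0 and 1->2
print(build_adj_list(3, [(1, 0), (1, 2)], True))    # [[], [0, 2], []]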

Graph Traversal in Data Structure


We can traverse a graph in two ways :
1. BFS ( Breadth First Search )
2. DFS ( Depth First Search )

BFS Graph Traversal in Data Structure:


Breadth-first search (BFS) traversal is a technique for visiting all nodes in a given
graph. This traversal algorithm selects a node and visits all of its adjacent nodes in
order; after checking all the adjacent vertices of a node, it moves on to the next
set of vertices and examines their adjacent vertices in turn. This algorithm uses a
queue as an additional data structure to store nodes for further processing. The
queue size is at most the total number of vertices in the graph.

Graph Traversal: BFS Algorithm

Pseudo Code:
def bfs(graph, start_node):
    queue = [start_node]
    visited = set()

    while queue:
        node = queue.pop(0)            # dequeue the next node
        if node not in visited:
            visited.add(node)
            print(node)
            for neighbor in graph[node]:
                queue.append(neighbor)  # enqueue neighbours for later processing

DFS Graph Traversal in Data Structure:


When traversing a graph, the DFS method goes as far as it can before turning
around. This algorithm explores the graph in depth-first order, starting with a
given source node and then recursively visiting all of its surrounding vertices
before backtracking. DFS will analyze the deepest vertices in a branch of the
graph before moving on to other branches. To implement DFS, either recursion or
an explicit stack might be utilized.

Graph Traversal: DFS Algorithm


Pseudo Code:
def dfs(graph, start_node, visited=None):
    if visited is None:                # avoid a shared mutable default argument
        visited = set()
    visited.add(start_node)
    print(start_node)
    for neighbor in graph[start_node]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)
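
As a small usage sketch, both functions above can be run on the dictionary graph built
earlier in this chapter; starting from 'a', BFS visits the vertices level by level while
DFS goes as deep as possible first:

graph = {
    "a" : ["b", "c"],
    "b" : ["a", "d"],
    "c" : ["a", "d"],
    "d" : ["e"],
    "e" : ["d"]
}

bfs(graph, "a")   # visits a, b, c, d, e (level by level)
dfs(graph, "a")   # visits a, b, d, e, c (deepest branch first)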
UNIT-IV
 Trees: General Trees, Binary Trees
 Implementing Trees
 Tree traversals
 Search Trees
 Binary Search Trees
 Balanced search trees
 AVL trees
 B-trees
 Priority Queues and Heaps
 Priority queue ADT
 Priority queue applications
 Heap Trees
 Implementing a priority queue with a Heap
 Heap Sort
An intuitive introduction to Non-Linear Data Structures
Suppose John and Sarah are two high school students. Their class teacher assigned them to give a detailed list consisting of the names of
the faculties in their year, along with their departments.
John decides to make a table consisting of the names of each person along with his/her department and designation. Sarah, on the other
hand, decides to make a tree diagram that shows all the faculties along with their designations:

Member Name    Designation
Jane           Principal
Marilla        HOD, Science Dept
Anne           Teacher, Physics
Gilbert        Teacher, Chemistry
Moody          Teacher, Maths
Mathew         HOD, Commerce Dept
Jerry          Teacher, Accountancy
Prissy         Teacher, Business Studies
Mary           Teacher, Economics

Q. Who do you think will get a better grade in their assessment?

It's Sarah, because she has represented the relationship between the faculties, while John has only provided a one-sided list that does not
show who works under whom. John's list is a linear data structure, as you might have guessed, while Sarah's tree is a non-linear data
structure.
 A Non-Linear Data Structure is one in which its elements are not connected in a linear fashion, as suggested by its name itself.
 In such a data structure, elements might be connected in a hierarchical manner like a tree or graph, or it may be non-hierarchical like in
a LinkedList. Non-linear data structures have a more complex implementation than their linear counterparts.
 This session introduces Non-Linear Data Structures, explores examples of non-linear data structures, and goes through the differences
between linear and non-linear data structures.
 The main advantage of a non-linear data structure is that it uses memory more efficiently than linear data structures.

Q. Why is LinkedList Non-Linear?
Even though it might seem that LinkedList should be linear due to its sequential connection of elements, you must remember that there is
no contiguous memory structure in a LinkedList. All the elements of a LinkedList are spread across the memory in a non-linear fashion;
hence it is a Non-Linear Data Structure.

Let us now analyze the key points of a Non-Linear Data Structure:
 Elements are not arranged sequentially.
 One element can be connected to multiple elements.
 There might be a hierarchical structure present.
 Here, memory is not allocated in a contiguous manner, unlike in a linear data structure.

Linear Data Structure                                    Non-Linear Data Structure
 Elements are connected sequentially or in a            Elements are not connected sequentially or in a
  contiguous manner.                                      contiguous manner.
 Elements are always present in a single level.         Elements may be present in single or multiple levels.
 There is no hierarchy between the elements.            There is usually a hierarchy between elements.
 They are easier to implement.                          They have a more complex implementation.
 Memory allocation is sequential.                       Memory allocation isn't sequential.
 Can be traversed in a single run.                      Requires multiple runs for traversal.
 Inefficient utilization of memory.                     Memory is utilized efficiently.
 Examples include arrays, hash tables, stacks, queues.  Examples include trees, graphs, etc.

Examples of Non-Linear Data Structures:
Some examples of non-linear data structures are LinkedList, Trees, and Graphs. We'll now go through each of them and understand why
they are called non-linear data structures.
Tree: As you might have figured, the tree is a data structure that is both non-linear as well as hierarchical. Here elements are arranged in
multiple levels, and each level can be completely or partially filled. Let us now go through some of the basic terminologies of a tree:

 Root – The topmost node of the tree.
 Parent – Each node is a parent of the nodes connected to it, below.
 Child – Each node that is a descendant of another node is called a child of that node.
 Siblings – Nodes with the same parent.
 Leaf – The nodes in the last/bottom-most level of the tree.
 Edge – A link connecting two nodes.

General Tree: A tree that can contain any number of subtrees is known as a general tree.

Binary Tree: A tree where each node has at most two children (known as the left child and right child) is known as a binary tree.
We will now see the types of trees out there:

Binary Search Tree: A Binary Search Tree is a special kind of binary tree that holds the following properties:
 All the nodes in the left subtree hold a key with a value lesser than the node's key value.
 All the nodes in the right subtree hold a key with a value greater than the node's key value.

For example, if 2 lies in the right subtree of 3 while having a lesser value than 3, the tree is not a Binary Search Tree, because in a
Binary Search Tree all the nodes in the right subtree should hold a key with a value greater than the node's key value.

AVL Tree: An AVL Tree is a special kind of Binary Search Tree where the height difference of the left and right subtrees is less than or
equal to one for every node. For example, let's analyze the height difference of the left and right subtrees for all the subtrees of a tree:

Parent Node of Subtree   Height of left subtree   Height of right subtree   Height difference
12                       3                        2                         1
8                        2                        1                         1
18                       1                        0                         1
5                        1                        0                         1
11                       0                        0                         0
17                       0                        0                         0
4                        0                        0                         0

As we can see, the maximum height difference between any left and right subtree is one. Thus the tree is an AVL Tree.
2-3 Tree: A tree where every non-leaf node has either of the following two properties:
 One data element and two children.
 Two data elements and three children.

B Tree: A B Tree helps to store data in sorted order and is commonly used in database or file systems. Below are some of the properties
that a B Tree of order m holds:
 There can be at most m children for each node.
 There should be at least ⌈m/2⌉ child nodes for each non-leaf node (except the root).
 The root should contain a minimum of 1 key.
 The depth of every leaf is the same.

B+ Tree: A B+ tree is an extension of the B Tree. It often contains a large number of children per node.

Red Black Tree: Red-black trees are self-balancing binary search trees where each node has one extra attribute which
denotes its color (either RED or BLACK). Nodes are colored to ensure that the height of the tree remains balanced after
insertion into or deletion from it.
Types of binary trees
1. Full Binary Tree: A full binary tree is a special type of binary tree in which every parent node/internal node has either two or no
children.

2. Perfect Binary Tree: A perfect binary tree is a type of binary tree in which every internal node has exactly two child nodes and all
the leaf nodes are at the same level.

3. Complete Binary Tree: It is a special type of binary tree where all the levels of the tree are filled completely, except the lowest level
nodes, which are filled from as left as possible.

4. Degenerate/Pathological Tree: A degenerate or pathological tree is a tree having a single child, either left or right.

5. Skewed Binary Tree: A skewed binary tree is a pathological/degenerate tree in which the tree is either dominated by the left nodes
or the right nodes. Thus, there are two types of skewed binary tree: the left-skewed binary tree and the right-skewed binary tree.

6. Balanced Binary Tree: It is a type of binary tree in which the difference between the height of the left and the right subtree for each
node is either 0 or 1.

Binary Tree Representation: A node of a binary tree is represented by a structure containing a data part and two pointers to other
structures of the same type.
Representation of binary trees
A binary tree means a node can have a maximum of two children. Here, the "binary" name suggests 'two';
therefore, each node can have either 0, 1, or 2 children. A binary tree data structure is represented using two
methods. Those methods are as follows:

1. Array Representation
2. Linked List Representation

1. Array Representation of Binary Tree

In the array representation of a binary tree, we use a one-dimensional array (1-D array) to represent the
binary tree. To represent a binary tree of depth 'n' using array representation, we need a one-dimensional
array with a maximum size of 2^(n+1) - 1.

How to identify the left child, right child, and parent of any node that is represented in the sequential form?

Instance 1 (array indices 0, 1, 2, ..., 20):
If a node is at the ith index, then
• the left child would be at (2*i) + 1
• the right child would be at (2*i) + 2
• the parent would be at floor((i-1)/2)

Instance 2 (array indices 1, 2, 3, ..., 21):
If a node is at the ith index, then
• the left child would be at 2*i
• the right child would be at (2*i) + 1
• the parent would be at floor(i/2)
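
A minimal Python sketch of the 0-based formulas from instance 1 (the helper names are
ours, not from the original text):

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2    # floor((i - 1) / 2)

# For the node stored at index 4:
print(left_child(4), right_child(4), parent(4))   # 9 10 1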
2. Linked List Representation of Binary Tree
We use a doubly-linked structure to represent a binary tree. In this representation, every node consists of
three fields: the first field stores the address of the left child, the second stores the actual data, and the
third stores the address of the right child.

Properties of Binary Tree
• At each level i, the maximum number of nodes is 2^i.
• The tree's height is defined as the longest path from the root node to a leaf node. A tree with three
levels (levels 0, 1, and 2) therefore has at most (1 + 2 + 4) = 7 nodes.
• Generally, the maximum number of nodes possible for height h is (2^0 + 2^1 + 2^2 + ... + 2^h) = 2^(h+1) - 1.
• The minimum number of nodes possible at height h equals h + 1.

If there are 'n' nodes in the binary tree:
The minimum height can be computed as follows. As we know,
n = 2^(h+1) - 1
n + 1 = 2^(h+1)
Taking logs on both sides,
log2(n+1) = log2(2^(h+1))
log2(n+1) = h + 1
h = log2(n+1) - 1
The maximum height can be computed as follows. As we know,
n = h + 1
h = n - 1
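
These two bounds can be checked numerically with a small Python sketch (min_height and
max_height are illustrative helper names; the ceiling handles a partially filled last
level for a general n):

import math

def min_height(n):
    return math.ceil(math.log2(n + 1)) - 1   # as compact (complete) as possible

def max_height(n):
    return n - 1                             # degenerate chain of n nodes

print(min_height(7), max_height(7))   # 2 6  (a perfect tree with 7 nodes has 3 levels)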
Binary tree operations: traversals

The main operations in a binary tree are search, insert, and delete. When we want to display a binary tree, we need to follow some
order in which all the nodes of that binary tree must be displayed. In any binary tree, the displaying order of nodes depends on the
traversal method.

Displaying (or) visiting order of nodes in a binary tree is called Binary Tree Traversal.

There are three types of binary tree traversals:
 In-Order Traversal
 Pre-Order Traversal
 Post-Order Traversal

Consider a binary tree with root A, whose left child B has children D (with children I and J) and F, and whose right child C has children
G (with right child K) and H.

1. In-Order Traversal (left child – root – right child)

In In-Order traversal, the root node is visited between the left child and the right child. In this traversal, the left child node is visited
first, then the root node is visited, and later we go for visiting the right child node. This in-order traversal is applicable for every root
node of all subtrees in the tree, i.e., it is performed recursively for all nodes in the tree.

In this example binary tree, first we try to visit the left child of root node 'A', but A's left child 'B' is a root node for the left subtree.
So we try to visit its (B's) left child 'D', and again D is a root for the subtree with nodes D, I, and J. So we try to visit its left child 'I',
and it is the leftmost child. So first we visit 'I', then go for its root node 'D', and later we visit D's right child 'J'. With this, we have
completed the left part of node B. Then we visit 'B', and next B's right child 'F' is visited. With this, we have completed the left part
of node A. Then we visit root node 'A'. With this, we have completed the left and root parts of node A. Then we go for the right part
of node A. On the right of A, again there is a subtree with root C. So we go for the left child of C, and again it is a subtree with root
G. But G does not have a left part, so we visit 'G' and then visit G's right child K. With this, we have completed the left part of node
C. Then we visit root node 'C' and next visit C's right child 'H', which is the rightmost child in the tree. So we stop the process.

That means here we have visited in the order I - D - J - B - F - A - G - K - C - H using In-Order Traversal.
2. Pre-Order Traversal (root – left child – right child)

In Pre-Order traversal, the root node is visited before the left child and right child
nodes. In this traversal, the root node is visited first, then its left child, and later its
right child. This pre-order traversal is applicable for every root node of all subtrees in
the tree.
In the same example binary tree, first we visit root node 'A', then visit its left
child 'B', which is a root for D and F. So we visit B's left child 'D', and again D is a root
for I and J. So we visit D's left child 'I', which is the leftmost child. So next, we go to
visit D's right child 'J'. With this, we have completed the root, left, and right parts of
node D and the root and left parts of node B. Next, we visit B's right child 'F'. With this,
we have completed the root and left parts of node A.

So we go for A's right child 'C', which is a root node for G and H. After visiting C, we go
for its left child 'G', which is a root for node K. So next, we visit the left of G, but it does
not have a left child, so we go for G's right child 'K'. With this, we have completed
node C's root and left parts. Next, we visit C's right child 'H', which is the rightmost child in
the tree. So we stop the process.
That means here we have visited in the order A - B - D - I - J - F - C - G - K - H using Pre-Order Traversal.
3. Post-Order Traversal (left child – right child – root)

In Post-Order traversal, the root node is visited after the left child and right child. In this traversal, the left child node is visited first, then
its right child, and then its root node. This is performed recursively until all nodes are visited.
Here we have visited in the order I - J - D - F - B - K - G - H - C - A using Post-Order Traversal.
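
To make the three orders concrete, here is a minimal Python sketch (assuming a simple
Node class, which is not part of the original text) that builds the A..K tree used in the
walkthroughs above and prints all three traversals:

class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def inorder(n):
    if n:
        inorder(n.left); print(n.data, end=' '); inorder(n.right)

def preorder(n):
    if n:
        print(n.data, end=' '); preorder(n.left); preorder(n.right)

def postorder(n):
    if n:
        postorder(n.left); postorder(n.right); print(n.data, end=' ')

root = Node('A',
            Node('B', Node('D', Node('I'), Node('J')), Node('F')),
            Node('C', Node('G', None, Node('K')), Node('H')))

inorder(root); print()     # I D J B F A G K C H
preorder(root); print()    # A B D I J F C G K H
postorder(root); print()   # I J D F B K G H C A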
C Binary Search Tree with an Example C Code (Search, Delete, Insert Nodes)
A binary tree is a data structure used to maintain data in the memory of a program. There exist many data
structures, but they are chosen for usage on the basis of the time consumed by the insert/search/delete
operations performed on them.

A binary search tree is one of the data structures that is efficient in insertion and searching operations; it
supports insert/search/delete operations in O(log N) time on average.

A binary tree is basically a tree in which each node can have two child nodes, and each child node can itself
be a small binary tree.

A binary search tree works on the rule that child nodes which are lesser than the root node are kept on the
left side, and child nodes which are greater than the root node are kept on the right side. The same rule is
followed in the child nodes as well, which are themselves sub-trees. For example, with root node 9, the nodes
(2, 4, 6) are on the left side of the root node and the nodes (12, 15, 17) are on the right side.
We will understand the binary search tree through its operations. We will cover the following
operations:
• Create binary tree
• Search into binary tree
• Delete binary tree
• Display binary tree
Creation of binary tree: A binary tree is created by inserting the root node and its
child nodes. We will use the C programming language for all the examples. Below is
the code snippet for the insert function (the full program is given at the end of this
section). It inserts nodes.

This function determines the position as per the value of the node to be added, and the
new node is added into the binary tree. The function is explained in the steps below,
and code snippet lines are mapped to the explanation steps given below.

[Lines 13-19] Check first if the tree is empty; then insert the node as root.
[Line 21] Check if the node value to be inserted is lesser than the root node value; then
• a. [Line 22] Call the insert() function recursively while there is a non-NULL left node
• b. [Lines 13-19] When the leftmost node is reached as NULL, insert the new node.
[Line 23] Check if the node value to be inserted is greater than the root node value; then
• a. [Line 24] Call the insert() function recursively while there is a non-NULL right node
• b. [Lines 13-19] When the rightmost node is reached as NULL, insert the new node.
Searching into binary tree: Searching is done as per the value of the node to
be searched, whether it is the root node or it lies in the left or right sub-tree.
Below is the code snippet for the search function. It searches for a node
in the binary tree.

This search function searches for the value of a node, i.e., whether a node
of the same value already exists in the binary tree or not. If it is found, then the
searched node is returned; otherwise NULL (i.e. no node) is returned. The function
is explained in the steps below, and code snippet lines are mapped to the
explanation steps given below.

1. [Lines 47-49] Check first if the tree is empty; then return NULL.
2. [Lines 50-51] Check if the node value to be searched is equal to the root
node value; then return the node.
3. [Lines 52-53] Check if the node value to be searched is lesser than the
root node value; then call the search() function recursively with the left
node.
4. [Lines 54-55] Check if the node value to be searched is greater than the
root node value; then call the search() function recursively with the right
node.
5. Repeat steps 2, 3, and 4 for each recursive call of this search function
until the node to be searched is found.
Deletion of binary tree: Binary tree is deleted by removing its child
nodes and root node. Below is the code snippet for deletion of binary
tree.

This function would delete all nodes of binary tree in the manner –
left node, right node and root node. Function is explained in steps
below and code snippet lines are mapped to explanation steps given
below.

[Line 39] Check first if root node is non-NULL, then


•a. [Line 40] Call deltree() function recursively while there is non-
NULL left node
•b. [Line 41] Call deltree() function recursively while there is non-
NULL right node
•c. [Line 42] Delete the node.
Displaying binary tree: Binary tree can be displayed in three forms – pre-order, in-
order and post-order.

•Pre-order displays root node, left node and then right node.
•In-order displays left node, root node and then right node.
•Post-order displays left node, right node and then root node.
These functions would display binary tree in pre-order, in-order and post-order
respectively. Function is explained in steps below and code snippet lines are mapped
to explanation steps given below.

Pre-order display

a. [Line 30] Display value of root node.


b. [Line 31] Call print_preorder() function recursively while there is non-NULL left node
c. [Line 32] Call print_preorder() function recursively while there is non-NULL right node

In-order display

a. [Line 37]Call print_inorder() function recursively while there is non-NULL left node
b. [Line 38] Display value of root node.
c. [Line 39] Call print_inorder() function recursively while there is non-NULL right node

Post-order display

a. [Line 44] Call print_postorder() function recursively while there is non-NULL left node
b. [Line 45] Call print_postorder() function recursively while there is non-NULL right node
c. [Line 46] Display value of root node.
Working program: It is noted that the above code snippets are parts of the C program below. The program below is a basic working program for a binary search tree.

#include <stdio.h>
#include <stdlib.h>

struct bin_tree {
    int data;
    struct bin_tree *right, *left;
};
typedef struct bin_tree node;

void insert(node **tree, int val) {
    node *temp = NULL;
    if (!(*tree)) {
        temp = (node *)malloc(sizeof(node));
        temp->left = temp->right = NULL;
        temp->data = val;
        *tree = temp;
        return;
    }
    if (val < (*tree)->data)
        insert(&(*tree)->left, val);
    else if (val > (*tree)->data)
        insert(&(*tree)->right, val);
}

void print_preorder(node *tree) {
    if (tree) {
        printf("%d\n", tree->data);
        print_preorder(tree->left);
        print_preorder(tree->right);
    }
}

void print_inorder(node *tree) {
    if (tree) {
        print_inorder(tree->left);
        printf("%d\n", tree->data);
        print_inorder(tree->right);
    }
}

void print_postorder(node *tree) {
    if (tree) {
        print_postorder(tree->left);
        print_postorder(tree->right);
        printf("%d\n", tree->data);
    }
}

void deltree(node *tree) {
    if (tree) {
        deltree(tree->left);
        deltree(tree->right);
        free(tree);
    }
}

node *search(node **tree, int val) {
    if (!(*tree))
        return NULL;
    if (val < (*tree)->data)
        return search(&((*tree)->left), val);   /* return so the result propagates up */
    else if (val > (*tree)->data)
        return search(&((*tree)->right), val);
    return *tree;                               /* val == (*tree)->data */
}
int main(void)
{
    node *root;
    node *tmp;
    root = NULL;

    /* Inserting nodes into tree */
    insert(&root, 9);
    insert(&root, 4);
    insert(&root, 15);
    insert(&root, 6);
    insert(&root, 12);
    insert(&root, 17);
    insert(&root, 2);

    /* Printing nodes of tree */
    printf("Pre Order Display\n");
    print_preorder(root);
    printf("In Order Display\n");
    print_inorder(root);
    printf("Post Order Display\n");
    print_postorder(root);

    /* Search node into tree */
    tmp = search(&root, 4);
    if (tmp)
        printf("Searched node=%d\n", tmp->data);
    else
        printf("Data Not found in tree.\n");

    /* Deleting all nodes of tree */
    deltree(root);
    return 0;
}

Output of Program: The display of the binary search tree in pre-order, in-order and post-order forms can be
verified against the tree built by the insertions above (root 9, with 2, 4, 6 on the left and 12, 15, 17 on the
right).
AVL tree and operations: An (Adelson-Velsky and Landis) Tree is a self-balancing binary search tree that can perform certain
operations in logarithmic time. It exhibits a height-balancing property by associating each tree node with a balance factor and ensuring
that it stays between -1 and 1 by performing specific tree rotations. This property prevents the Binary Search Tree from getting skewed,
achieving a minimal height tree that provides logarithmic time complexity for some significant operations such as searching.

Introduction: In our fast-paced daily lives, we make use of a lot of different data structures and algorithms without even noticing
them. For example, consider a scenario in which you wish to call someone from your contact list that contains a ton of data. You
need to find that individual's phone number by searching. This is internally implemented using specific data structures and uses
particular algorithms to provide you with the best results in an efficient manner. This is required as the faster the search, the more
convenience you get, and the faster you can connect with others.

With time, this search process was gradually improved by implementing and developing new data structures that eradicate or
reduce the limitations of the previously used methods. One such data structure is AVL Trees. It was developed to reduce the
limitations of the searching process implemented using a non-linear data structure known as Binary Search Trees.

Now, you may ask what exactly the limitation was and how AVL Trees overcame it, providing us with a better search process in
terms of efficiency. For this, let's take a look at the Binary Search Trees (BSTs) search process, its limitations, and how the AVL Trees
overcome them.

Searching Using Binary Search Trees: A Tree is a non-linear and hierarchical data structure that consists of a root node (the
topmost node of the tree), and each node can have some number of children nodes. It represents an actual hierarchical tree
structure, such as your Family Tree. Now, a special case of trees is the Binary Trees, i.e., trees in which every node has at most two
children. Furthermore, Binary Search Trees are a special type of binary tree in which all the elements in the left subtree of a node
are smaller than that node, whereas all the elements in the right subtree are greater in value (BST Rule).

In such an example, you can notice that all the nodes of the Binary Search Tree follow the BST rule, i.e., for all the nodes,
their left subtrees have lesser values while the right subtrees have more significant values.
Binary Search Trees are useful in searching for an element from a constantly updating data stream or in finding an element when a
combination of search and update operations are being performed on a dataset.

Here, upon each updation in the data stream, the element is inserted in the BST, and the search is performed based on the query
provided. In BSTs, to find an element, we start with the root node and check whether the given element is greater than the root node;
if it's greater, we continue searching in the right subtree; otherwise, we look in the left subtree. This process is continued until we find
a node with the same value as the given query value or we reach the leaf node of the tree, i.e., the element is not present in the tree.
In either case, the tree's depth or height (the number of edges on the longest path from the root node to a leaf node) defines the time
complexity of the search operation.

Let's understand this effect of height on the searching operation in BSTs with an example:

Case 1: Unbalanced BST: Consider a sorted array A having elements [10, 20, 30, 40, 50]. Now, when we create a BST for this array,
all the elements will be inserted in the right subtree, as each has a greater value than the previous element, i.e., the BST becomes
skewed or unbalanced (for a given node, one subtree is significantly larger than its sibling subtree).

Now, if we wish to search whether the element 50 is present in this BST, we will have to traverse all the elements present in the BST,
as 50 is present at the deepest level of the tree. Also, there is no left subtree to traverse at each level. Hence, instead of reducing the
number of checks at each level, we are just searching the element in linear time, i.e., O(n), where n is the total number of nodes
present in the BST.

Case 2: Balanced BST: Now, consider another example in which the same elements are inserted differently, such as [20, 10, 40, 30,
50]. In this case, when we create a BST using this order, it will attain minimal height.

Now, on searching the element 50 in this tree, we are reducing the number of comparisons, as in each step we are neglecting the left
subtrees at each level, i.e., we are reducing the check operations by half at each level. Hence, in this case, when the tree is not
skewed, the process of searching takes logarithmic time, i.e., O(log n), where n is the total number of nodes present in the tree.
In the above example, we can observe that when the tree was skewed in the first case, it attained the maximum height,
i.e., O(n), which is the same as the time complexity for the search operation. Also, in the second case, when the tree
attained minimal height, i.e., O(log n), the search operation took logarithmic time. Hence, we can say that the height or
the depth of the tree essentially determines the time complexity of search in a BST.

Now, you may wonder that for the same elements, we can have two Binary Search Trees having drastically different heights
and search times. Hence, there must be a way to control the height of the BST such that we always achieve logarithmic
search time complexity irrespective of the order of the elements. This can be achieved by checking when the Binary Search
Tree starts becoming skewed (Balancing Criteria) and performing certain operations to limit this skewness. This way, we
can control the tree’s height and achieve a logarithmic time complexity for almost all the operations. This is exactly
where AVL Trees come into action.

Highlights:

1.BSTs are binary trees in which all elements in the left subtree of a node are smaller while the elements in the right
subtree are larger than that node.
2.BSTs are useful for performing searches on dynamic datasets.
3.As the operations performed using BSTs always start from the root and traverse down the tree, the time complexity
of BSTs depends upon the tree’s height.
4.BSTs can be skewed (unbalanced) or balanced depending upon the order of insertion of the elements.
5.Balanced BSTs provide logarithmic time complexity because of their optimal height.
What is an AVL Tree?: AVL Tree, named after its inventors Adelson-Velsky and Landis, is a special variation of Binary Search
Tree which exhibits self-balancing property, i.e., AVL Trees automatically attain the minimal possible height of the tree after
the execution of any operation. The AVL Trees implement the self-balancing property by attaching extra information known
as the balance factor to each node of the tree, then verifying that the balance factor for all the nodes of the tree follows
certain criteria (Balancing Criteria) upon the execution of any operation that affects the height of the tree, and finally
applying certain Tree Rotations to maintain this criterion of height-balancing.
The Criterion of height balancing is a principle that determines whether a Binary Search Tree is unbalanced (skewed). It
states that:

Tip: A Binary Search Tree is considered to be balanced if any two sibling subtrees present in the tree don’t differ in height by
more than one level, i.e., the difference between the height of the left subtree and the height of the right subtree for all the
nodes of the tree should not exceed unity. If it exceeds unity, then the tree is known as an unbalanced tree.
Since skewed or unbalanced BSTs provide inefficient search operations, AVL Trees prevent unbalancing by defining a
balance factor for each node. Let's look at what exactly is this balancing factor.

Highlights:
1.AVL Trees were developed to achieve logarithmic time complexity in BSTs irrespective of the order in which the elements
were inserted.
2.AVL Tree implemented a Balancing Criteria (For all nodes, the subtrees’ height difference should be at most 1) to
overcome the limitations of BST.
3.It maintains its height by performing rotations whenever the balance factor of a node violates the Balancing Criteria. As a
result, it has self-balancing properties.
4.It exists as a balanced BST at all times, providing logarithmic time complexity for operations such as searching.
Balance Factor: The balance factor in AVL Trees is an additional value associated with each tree node that represents the height difference
between the left and the right sub-trees of a given node. The balance factor of a given node can be represented as:

balance_factor = (height of left sub-tree) - (height of right sub-tree)

Or mathematically speaking, bf = hl - hr,
where bf is the balance factor of a given node in the tree, hl represents the height of the left subtree, and hr represents the height of the
right subtree.
In the balanced tree example of the above
illustration, we can observe that the height of the
left subtree (h-1) is one greater than the height of
the right subtree (h-2) of the highlighted node i.e.,
the given node is left-heavy having the balance
factor of positive unity. Since the balance factor of
the node follows the Balancing Criteria (height
difference should be at most unity), the given tree
example is considered as a balanced tree.

Now, in the unbalanced tree example, we can observe that the tree is left-skewed i.e., the height of the left subtree is much greater than
that on the right subtree. This is clearly an unbalanced tree as it is highly skewed. This is also indicated by the balance factor of the node as it
doesn’t follow the Balancing Criteria.

Hence, AVL Trees make use of the balance factor to check whether a given node is left-heavy (height of left sub-tree is one greater than that
of right sub-tree), balanced, or right-heavy (height of right sub-tree is one greater than that of left sub-tree). Hence, using the balance factor,
we can find an unbalanced node in the tree and can locate where the height-affecting operation was performed that caused the imbalance
of the tree.
NOTE: Since the leaf nodes don't contain any subtrees, the balance factor for all the leaf nodes present in the
Binary Search Tree is equal to 0.

Upon the execution of any height-affecting operation on the tree, if the magnitude of the balance factor of a
given node exceeds unity, the specified node is said to be unbalanced as per the Balancing Criteria. This
condition can be mathematically represented with the help of the given equation:

bf = (hl - hr), such that bf ∈ {-1, 0, 1}, or equivalently |bf| = |hl - hr| <= 1

Here, the above equation indicates that the balance factor of any given node can only take the value of -1, 0,
or 1 for a height-balanced Binary Search Tree. To maintain this criterion for all the nodes, AVL Trees make
use of certain Tree Rotations that are discussed later in this article.

Highlights:

1.Balance Factor represents the height difference between a given node’s left and right sub-trees.
2.For leaf nodes, the balance factor is 0.
3.AVL balance criteria: |bf| ≤ 1 for all nodes.
4.Balance factor indicates whether a node is left heavy, right heavy, or balanced.
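
As an illustrative Python sketch (the function names are ours; height is measured in edges
so that an empty subtree has height -1, and a leaf therefore gets a balance factor of 0,
matching the highlights above):

def height(node):
    if node is None:
        return -1                    # empty subtree
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)   # bf = hl - hr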
AVL Tree Rotation: As discussed earlier, AVL Trees make use of the balance factor to check whether a given node is left-heavy (the height
of the left sub-tree is one greater than that of the right sub-tree), balanced, or right-heavy (the height of the right sub-tree is one greater
than that of the left sub-tree). If any node is unbalanced, certain Tree Rotations are performed to re-balance the tree.

Tree Rotation: It is the process of changing the tree's structure by moving smaller subtrees down and larger subtrees up, without
interfering with the order of the elements.

If the balance factor of any node doesn't follow the AVL balancing criterion, AVL Trees make use of 4 different types of tree rotations
to re-balance themselves. These rotations are classified based on the node imbalance cured by them, i.e., a specific rotation is applied to
counter the change that occurred in the balance factor of a node, making it unbalanced.

These rotations include: LL Rotation, RR Rotation, LR Rotation, and RL Rotation.
Now, let's look at all the tree rotations and understand how they can balance the tree and make it follow the AVL balance criterion.

1. LL Rotation: It is a type of single rotation that is performed when the tree gets unbalanced upon insertion of a node into the left
subtree of the left child of the imbalanced node, i.e., upon Left-Left (LL) insertion. This imbalance indicates that the tree is heavy on
the left side. Hence, a right rotation (or clockwise rotation) is applied such that this left-heaviness imbalance is countered and the tree
becomes a balanced tree. Let's understand this process using an example:

Consider a case when we wish to create a BST using elements 30, 20, and 10. Now, since these elements are given in sorted order,
the BST so formed is a left-skewed tree.

This is confirmed after calculating the balance factor of all the nodes present in the tree. As you can observe, when we insert
element 10 in the tree, the root node becomes imbalanced (balance factor = 2) because the tree becomes left-skewed upon this
operation. Also, notice that element 10 is inserted as a left child in the left subtree of the imbalanced node (here, the root node of
the tree). Hence, this is the case of L-L insertion, and we will have to perform a certain operation to counteract this left skewness.

Imagine a weighing scale in which we only have 5 kg on the left plate and nothing on the right plate. This is the case of left-heavy,
since there is nothing on the right plate to balance the weight present in the left plate. Now, to balance this scale we can just add
some weight to the right plate. Hence, to balance the weight on one side we try to increase the weight on the other side. In the
case of trees, instead of adding a new node (weight) on the lighter side, we try to rotate the structure of the tree around a pivot point,
thereby shifting the nodes from the heavier side to the lighter side.

In our example, we have extra weight on the left subtree (LL insertion); therefore we will perform a right rotation or clockwise
rotation on the imbalanced node to transfer this node to the right side and retrieve a balanced tree, i.e., we will pull the imbalanced
node down by rotating the tree in a clockwise direction along the edge of the imbalanced node, in this case the root node.
2. RR Rotation: It is similar to LL Rotation, but in this case the tree gets unbalanced upon insertion of a node into the right subtree of the right child of the imbalanced node, i.e., upon Right-Right (RR) insertion instead of LL insertion. In this case, the tree becomes right-heavy and a left rotation (or anti-clockwise rotation) is performed along the edge of the imbalanced node to counter the right skewness caused by the insertion operation. Let's understand this process with an example: Consider a case where we wish to create a BST using the elements 10, 20, and 30. Since the elements are given in sorted order, the BST so created becomes right-skewed as shown below:
Upon calculating the balance factor of all the nodes, we can confirm that the root node of the tree is imbalanced (balance factor = −2) when the element 30 is inserted via RR insertion. Hence, the tree is heavier on the right side, and we can balance it by transferring weight to the left side, applying an anti-clockwise rotation around the edge (pivot point) of the imbalanced node, in this case the root node.
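Both single rotations amount to a small pointer rearrangement. A hedged C sketch, reusing the hypothetical AVLNode structure from the earlier sketch (the rotated subtree's new root is returned so the caller can re-link it):

/* Right (clockwise) rotation, used to cure an LL imbalance:
   the left child x becomes the new root of this subtree. */
struct AVLNode *rotateRight(struct AVLNode *y)
{
    struct AVLNode *x = y->left;
    y->left = x->right;   /* x's right subtree moves under y */
    x->right = y;         /* y is pulled down to the right of x */
    return x;             /* new root of the rotated subtree */
}

/* Left (anti-clockwise) rotation, used to cure an RR imbalance. */
struct AVLNode *rotateLeft(struct AVLNode *x)
{
    struct AVLNode *y = x->right;
    x->right = y->left;   /* y's left subtree moves under x */
    y->left = x;          /* x is pulled down to the left of y */
    return y;             /* new root of the rotated subtree */
}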
3. LR Rotation: So far, we have discussed that when the tree is heavy on one side, we perform a single rotation in the opposite direction to counter the effect of the tree's skewness. But there also exist some cases where a single tree rotation isn't enough to balance the tree, i.e., we may need to perform one more rotation to finally counter the effects of the height-affecting operation.
One such case is Left-Right (LR) insertion, i.e., the tree gets unbalanced upon insertion of a node into the right subtree of the left child of the imbalanced node. Let's understand this case using an example:
Consider a situation where you create a BST using the elements 30, 10, and 20. When the elements are inserted, element 30 becomes the root, 10 becomes its left child, and when element 20 is inserted, it is inserted as the right child of the node having the value 10. This causes an imbalance in the tree, as the root node's balance factor equals 2.
Now, as per the previous discussions, you may have noticed that a positive balance factor indicates that the given node is left-heavy, while a negative one indicates that the node is right-heavy. If we look at the immediate parent of the inserted node, we notice that its balance factor is negative, i.e., it is right-heavy. Hence, you may say that we should perform a left rotation (RR rotation) on the immediate parent of the inserted node to counter this effect. Let's perform this rotation and notice the change:
As you can observe, upon applying the RR rotation the BST becomes left-skewed and is
still unbalanced. This is now the case of LL rotation and by rotating the tree along the
edge of the imbalanced node in the clockwise direction, we can retrieve a balanced BST.
Hence, a simple rotation won’t fully balance the tree but it may flip the tree in such a
manner that it gets converted into a single rotation scenario, after which we can balance
the tree by performing one more tree rotation. This process of applying two rotations
sequentially one after another is known as double rotation and since in our example the
insertion was Left-Right (LR) insertion, this combination of RR and LL rotation is known
as LR rotation. Hence, to summarize:
The LR rotation consists of 2 steps:
1.Apply RR Rotation (anti-clockwise rotation) on the left subtree of the imbalanced node
as the left child of the imbalanced node is right-heavy. This process flips the tree and
converts it into a left-skewed tree.
2.Perform LL Rotation (clockwise rotation) on the imbalanced node to balance the left-skewed tree.
Hence, LR rotation is essentially a combination of RR and LL Rotation.
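In code, the double rotations reduce to two calls on the single-rotation helpers sketched earlier (function names again illustrative):

/* LR rotation: rotate the left child left (the RR step),
   then rotate the imbalanced node right (the LL step). */
struct AVLNode *rotateLeftRight(struct AVLNode *z)
{
    z->left = rotateLeft(z->left);
    return rotateRight(z);
}

/* RL rotation is the mirror image: rotate the right child right,
   then rotate the imbalanced node left. */
struct AVLNode *rotateRightLeft(struct AVLNode *z)
{
    z->right = rotateRight(z->right);
    return rotateLeft(z);
}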
Operations on AVL Trees: Since AVL Trees are self-balancing Binary Search Trees, all the
operations carried out using AVL Trees are similar to that of Binary Search Trees. Also,
since searching an element and traversing the tree doesn’t change the tree’s structure,
these operations can't violate the height balancing property of AVL Trees. Hence,
searching and traversing operations are the same as that of Binary Search Trees.
However, upon the execution of each insertion or deletion operation, we check the
balance factor of all the nodes and perform rotations to balance the AVL Tree if
needed. Let's look at these operations in detail:

1. Insertion: In Binary Search Trees, a new node (let's say N) is inserted by traversing the tree using the BST logic to locate a node with a NULL child that can be replaced by the new node N. Hence, in BSTs a new node is always inserted as a leaf node by replacing the NULL value of some node's child.

Just like insertion in BSTs, the new node is always inserted as a leaf node in AVL Trees, i.e., the balance factor of the newly inserted node is always equal to 0. However, after each insertion in the tree, the balance factor of the ancestors of the newly inserted node is checked to verify whether the tree is still balanced. Only the ancestors of the inserted node are checked for imbalance, because a new node can only alter the height of its ancestors, thereby inducing an imbalance in the tree. This process of finding the unbalanced node by traversing the ancestors of the newly inserted node is known as retracing. If the tree becomes unbalanced after inserting a new node, retracing helps us find the location in the tree at which we need to perform the tree rotations to balance the tree.
The below gif demonstrates the retracing process upon inserting a new element in the
AVL Tree:
Let’s look at the algorithm of the insertion operation in AVL Trees:

Insertion in AVL Trees:

1. START
2. Insert the node using BST insertion logic.
3. Calculate and check the balance factor of each node.
4. If the balance factor follows the AVL criterion, go to step 6.
5. Else, perform tree rotations according to the insertion done. Once the tree is balanced, go to step 6.
6. END

For better understanding, let’s consider an example where we wish to create an AVL Tree by inserting the elements 10, 20, 30, 40, and 50. The below gif demonstrates how the given elements are inserted one by one in the AVL Tree:
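Putting the steps together, one possible C sketch of the insertion routine is given below. It reuses the hypothetical helpers from the earlier sketches, recomputes heights instead of caching them (shorter, but less efficient than a production AVL implementation), and by assumption sends duplicate keys to the right subtree:

#include <stdlib.h>   /* for malloc */

struct AVLNode *avlInsert(struct AVLNode *root, int data)
{
    int bf;
    if (root == NULL) {                 /* step 2: plain BST insertion */
        struct AVLNode *n = malloc(sizeof(struct AVLNode));
        n->data = data;
        n->left = n->right = NULL;
        return n;
    }
    if (data < root->data)
        root->left = avlInsert(root->left, data);
    else
        root->right = avlInsert(root->right, data);

    bf = balanceFactor(root);           /* steps 3-4: retrace and check */
    if (bf > 1 && data < root->left->data)     /* LL insertion */
        return rotateRight(root);
    if (bf < -1 && data > root->right->data)   /* RR insertion */
        return rotateLeft(root);
    if (bf > 1 && data > root->left->data)     /* LR insertion */
        return rotateLeftRight(root);
    if (bf < -1 && data < root->right->data)   /* RL insertion */
        return rotateRightLeft(root);
    return root;                        /* node is already balanced */
}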
2. Deletion: When an element is to be deleted from a Binary Search Tree, the tree is searched using various
comparisons via the BST rule till the currently traversed node has the same value as that of the specified
element. Suppose the element is found in the tree. In that case, there are three different cases in which the
deletion operation occurs depending upon whether the node to be deleted has any children or not:
Case 1: When the node to be deleted is a leaf node
•In this case, the node to be deleted contains no subtrees, i.e., it’s a leaf node. Hence, it can be directly
removed from the tree.
Case 2: When the node to be deleted has one subtree
•In this case, the node to be deleted is replaced by its only child, thereby removing the specified node from
the BST.
Case 3: When the node to be deleted has both subtrees.
•In this case, the node to be deleted can be replaced by one of two available nodes:
 • It can be replaced by the node having the largest value in the left subtree (its in-order predecessor).
 • Or, it can be replaced by the node having the smallest value in the right subtree (its in-order successor).
Like the deletion operation in Binary Search Trees, the elements are deleted from AVL Trees depending on
whether the node has any children. However, upon every deletion in AVL Trees, the balance factor is
checked to verify whether the tree is balanced or not. If the tree becomes unbalanced after deletion, certain
rotations are performed to balance the Tree.
Let’s look at the algorithm of the deletion operation in AVL Trees:

Deletion in AVL Trees
1.START
2.Find the node in the tree. If the element is not found, go to step 7.
3.Delete the node using BST deletion logic.
4.Calculate and check the balance factor of each node.
5.If the balance factor follows the AVL criterion, go to step 7.
6.Else, perform tree rotations to balance the unbalanced nodes. Once the tree is balanced, go to step 7.
7.END
For better understanding, let’s consider an example where we wish to delete the element having value 10 from the AVL Tree created using the
elements 10, 20, 30, and 40. The below gif demonstrates how we can delete an element from an AVL Tree:
4. RL Rotation: It is similar to LR rotation, but it is performed when the tree gets unbalanced upon insertion of a node into the left subtree of the right child of the imbalanced node, i.e., upon Right-Left (RL) insertion instead of LR insertion. In this case, the immediate parent of the inserted node becomes left-heavy, so LL rotation (right rotation or clockwise rotation) is performed first, converting the tree into a right-skewed tree. After that, RR rotation (left rotation or anti-clockwise rotation) is applied around the edge of the imbalanced node to convert this right-skewed tree into a balanced BST. Let’s now observe an example of the RL rotation:

In the example, we can observe that the tree's root node becomes imbalanced upon insertion of the node with the value 20. Since this is a type of RL insertion, we perform LL rotation on the immediate parent of the inserted node, thereby retrieving a right-skewed tree. Finally, we perform RR Rotation around the edge of the imbalanced node (in this case the root node) to get the balanced AVL tree.

Hence, RL rotation consists of two steps:
1.Apply LL Rotation (clockwise rotation) on the right subtree of the imbalanced node, as the right child of the imbalanced node is left-heavy. This process flips the tree and converts it into a right-skewed tree.
2.Perform RR Rotation (anti-clockwise rotation) on the imbalanced node to balance the right-skewed tree.

NOTE:
•Rotations are done only on three nodes (including the imbalanced node) irrespective of the size of the Binary Search Tree. Hence, in the case of a large tree, always focus on the two nodes around the imbalanced node and perform the tree rotations.
•Upon insertion of a new node, if multiple nodes get imbalanced, traverse the ancestors of the inserted node and perform rotations on the first imbalanced node encountered. Continue this process until the whole tree is balanced. This process is known as retracing, which is discussed along with the insertion operation above.

Highlights:
1.Rotations are performed to maintain the AVL balance criteria.
2.Rotation is a process of changing the structure without affecting the elements' order.
3.Rotations are done on an unbalanced node based on the location of the newly inserted node.
4.Single rotations include LL (clockwise) and RR (anti-clockwise) rotations.
5.Double rotations include LR (RR + LL) and RL (LL + RR) rotations.
6.Rotations are done only on 3 nodes, including the unbalanced node.
Heap Sort Algorithm

What is heap sort? Heapsort is a popular and efficient sorting algorithm. The concept of heap sort is to eliminate the elements one by one from the heap part of the list and then insert them into the sorted part of the list. Heapsort is an in-place sorting algorithm.

Heap sort processes the elements by creating a min-heap or max-heap using the elements of the given array. A min-heap or max-heap represents an ordering of the array in which the root element is the minimum or maximum element of the array, respectively.

Heap sort repeatedly performs two main operations:
1. Build a heap H using the elements of the array.
2. Repeatedly delete the root element of the heap formed in the first phase.

Before looking further at heap sort, let's first see a brief description of a heap.

What is a heap? A heap is a complete binary tree. A binary tree is a tree in which a node can have at most two children; a complete binary tree is a tree in which all levels except possibly the last are completely filled, and the nodes of the last level are filled from the left.

Working of the Heap sort Algorithm: In heap sorting, there are two phases involved in sorting the elements:
•The first step is the creation of a heap by adjusting the elements of the array.
•After the creation of the heap, the root element of the heap is removed repeatedly, swapping it into the sorted part at the end of the array.
Now let's see the working of heap sort in detail using an example. To make the explanation clearer, let's take an unsorted array and sort it using heap sort.

First, we construct a heap from the given array and convert it into a max heap. After converting the given heap into a max heap, the array elements are –

Next, we delete the root element (89) from the max heap. To delete this node, we swap it with the last node, i.e. (11). After deleting the root element, we again have to heapify it to convert it into a max heap. After swapping the array element 89 with 11 and converting the heap into a max heap, the elements of the array are –

In the next step, we again delete the root element (81) from the max heap. To delete this node, we swap it with the last node, i.e. (54), and heapify again. After swapping the array element 81 with 54 and converting the heap into a max heap, the elements of the array are –

In the next step, we delete the root element (76) from the max heap. To delete this node, we swap it with the last node, i.e. (9), and heapify again. After swapping the array element 76 with 9 and converting the heap into a max heap, the elements of the array are –

In the next step, we delete the root element (54) from the max heap. To delete this node, we swap it with the last node, i.e. (14), and heapify again. After swapping the array element 54 with 14 and converting the heap into a max heap, the elements of the array are –

In the next step, we delete the root element (22) from the max heap. To delete this node, we swap it with the last node, i.e. (11), and heapify again. After swapping the array element 22 with 11 and converting the heap into a max heap, the elements of the array are –

In the next step, we delete the root element (14) from the max heap. To delete this node, we swap it with the last node, i.e. (9), and heapify again. After swapping the array element 14 with 9 and converting the heap into a max heap, the elements of the array are –

In the next step, we delete the root element (11) from the max heap. To delete this node, we swap it with the last node, i.e. (9). After swapping the array element 11 with 9, the elements of the array are –

Now the heap has only one element left. After deleting it, the heap is empty. After completion of sorting, the array elements are –

Now, the array is completely sorted.
#include <stdio.h>

/* function to heapify a subtree. Here 'i' is the index of the root node of the
   subtree in array a[], and 'n' is the size of the heap. */
void heapify(int a[], int n, int i)
{
    int largest = i;        // Initialize largest as root
    int left = 2 * i + 1;   // left child
    int right = 2 * i + 2;  // right child

    // If left child is larger than root
    if (left < n && a[left] > a[largest])
        largest = left;

    // If right child is larger than the largest so far
    if (right < n && a[right] > a[largest])
        largest = right;

    // If root is not largest, swap and continue heapifying downwards
    if (largest != i) {
        int temp = a[i];
        a[i] = a[largest];
        a[largest] = temp;
        heapify(a, n, largest);
    }
}

/* Function to implement the heap sort */
void heapSort(int a[], int n)
{
    // Build the max heap
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);

    // One by one extract an element from the heap
    for (int i = n - 1; i >= 0; i--) {
        /* Move current root element to the end: swap a[0] with a[i] */
        int temp = a[0];
        a[0] = a[i];
        a[i] = temp;
        heapify(a, i, 0);   // heapify the reduced heap of size i
    }
}

/* function to print the array elements */
void printArr(int arr[], int n)
{
    for (int i = 0; i < n; ++i)
        printf("%d ", arr[i]);
}

int main()
{
    int a[] = {48, 10, 23, 43, 28, 26, 1};
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are - \n");
    printArr(a, n);
    heapSort(a, n);
    printf("\nAfter sorting array elements are - \n");
    printArr(a, n);
    return 0;
}
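For reference, building the initial heap takes O(n) time and each of the subsequent extractions costs at most O(log n) for its heapify call, so heap sort runs in O(n log n) overall; apart from the recursion stack used by heapify, it needs only constant extra space, which is why it is described above as an in-place algorithm.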
Unit-4
Trees

A tree is a data structure in which each node points to a number of other nodes. A tree is an example of a non-linear data structure. A tree structure is a way of representing the hierarchical nature of a structure in graphical form.

 Internal node: node with at least one child. Eg: A, B, C, F


 External node (leaf node): node with no children. Eg: E, I, J, K
 Path: A sequence of consecutive edges is called a path. Eg: the path from root node A to node J is A – B – F – J
 Subtree: A subtree is a portion of a tree that can be viewed as a complete tree in itself.
Any node in a tree, together with all the nodes below it, comprise a subtree of that tree.
(i.e) A tree consisting of a node and its descendants is called as sub tree. Eg: T1,T2,T3
 Level : A rank of the hierarchy. Every node in the tree is assigned a level number in such a way that the root node is at level 0 and the children of the root node are at level 1. Thus, every node is at one level higher than its parent: if a node is at level i, then its child is at level i+1 and its parent is at level i-1. This is true for all nodes except the root node.
 Edge: It is the line connecting a node N to any of its successors.
 Depth of a node: the length of the path from the root to that node . Eg: depth of G is 2,
A – C – G. Depth of root= 0
 Height of a node: the length of the path from that node to the deepest leaf. Eg: height
of B is 2, B – F – I. Height of leaf is 0
 Height of a tree: The height of a tree is a height of the root. (i.e) It is the total number
of edges on the path from the root node to the deepest leaf node in the tree. A tree with
only a root node has a height of 0.
 Degree of a node : Degree of a node is equal to the number of children that a node has.
The degree of a leaf node is zero.
 In-degree: In-degree of a node is the number of edges arriving at that node.
 Out-degree: Out-degree of a node is the number of edges leaving that node.
 Size of a node: The size of a node is the number of descendants it has including itself
Eg: the size of the sub-tree C is 3.
 Size of the tree= no. of nodes in that tree Ex.11
 Degree of a node = Number Of Children Ex. Degree of B is 2
 Degree of a Tree = the maximum degree of any node in the tree. Ex. Here it is 3
For a full binary tree, let
i = the number of internal nodes,
n = the total number of nodes,
l = the number of leaves,
λ = the number of levels. Then:

The number of leaves is i + 1.
The total number of nodes is 2i + 1.
The number of internal nodes is (n – 1) / 2.
The number of leaves is (n + 1) / 2.
The total number of nodes is 2l – 1.
The number of internal nodes is l – 1.
The number of leaves is at most 2^(λ – 1).
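For example, a full binary tree with i = 3 internal nodes has i + 1 = 4 leaves and 2i + 1 = 7 nodes in total; reading the identities the other way, a full binary tree with n = 7 nodes has (7 − 1)/2 = 3 internal nodes and (7 + 1)/2 = 4 leaves.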

TYPES OF TREES
1. General trees
2. Forests
3. Binary trees
4. Binary search trees
5. Expression trees

1.General trees
•They store elements hierarchically.
•The top node of a tree is the root node, and each node except the root has a parent.
•A node in a general tree (other than the leaf nodes) may have zero or more sub-trees. Ex: general trees which have 3 sub-trees per node are called ternary trees.
•But the number of sub-trees for any node may be variable. Ex: one node can have 1 sub-tree, whereas some other node can have 3 sub-trees.

2. Forests

•A forest is a disjoint union of trees. A set of disjoint trees (or forests) is obtained by deleting
the root and the edges connecting the root node to nodes at level 1.
Forest Tree
3. Binary Tree
A binary tree is a finite set of nodes such that
i. T contains a specially designed node called the root of T, and remaining nodes of T
form two disjoint binary trees T1 and T2 which are called left sub tree and right sub
tree respectively.
ii. Each node in binary tree has at most two children. (0,1,2)
iii. T may be empty tree (called empty binary tree)

Comparison between tree and binary tree

Tree: can never be empty; a node may have any number of children.
Binary tree: may be empty; a node may have at most two children (0, 1, or 2).

Types of Binary Trees:


1.Strict Binary Tree
2.Full binary tree
3.Complete binary tree

Strict Binary Tree:


A binary tree is called strict binary tree if each node has exactly two children or no children.
•Full binary tree
A binary tree is a full binary tree, if it contains maximum possible number of nodes in all levels.
In full binary tree, each node has exactly two children and all leaf nodes are at the same level.
Complete binary tree
A binary tree is said to be complete binary tree, if all its levels have maximum number of nodes
except the last level, and In the last level, the nodes are filled from the left to right position.

Properties of binary trees


1. In a BT, the maximum number of nodes at level i (where i >= 0) is 2^i:
at level 0 = 2^0 = 1 node
at level 1 = 2^1 = 2 nodes
2. The maximum no. of nodes possible in a binary tree of height h is 2^(h+1) − 1.
3. The minimum no. of nodes possible in a binary tree of height h is h + 1.
4. For any non-empty binary tree, if n is the number of nodes and e is the number of edges, then n = e + 1.
5. The number of nodes in a full binary tree is 2^(h+1) − 1, where h is the height of the tree.
6. The number of nodes n in a complete binary tree is between 2^h (minimum) and 2^(h+1) − 1 (maximum).
7. The number of leaf nodes in a full binary tree is 2^h.
8. The total number of nodes in a full binary tree with L leaves is 2L − 1.
9. The total no. of binary trees possible with n nodes is (1/(n+1)) × C(2n, n).

Binary Tree Traversal


1. An in-order traversal technique follows the Left Root Right policy.
Step 1: Recursively traverse the left subtree
Step 2: Now, visit the root
Step 3: Traverse the right subtree recursively

{15, 25, 28, 30, 35, 40, 45, 50, 55, 60, 70}

2. Preorder Traversal:
First, visit the root node.
Then, traverse the left subtree.
At last, traverse the right subtree.

Ans: 40, 30, 25, 15, 28, 35, 50, 45, 60, 55, 70

3. Postorder traversal:

Traverse the left subtree by calling the postorder function recursively.

Traverse the right subtree by calling the postorder function recursively.


Access the data part of the current node

Ans: {15, 28, 25, 35, 30, 45, 55, 70, 60, 50, 40}

Constructing a Binary Tree from Traversal Results


•We can construct a binary tree if we are given at least two traversal results.
•The first traversal must be the in-order traversal and the second can be either pre-order or post-
order traversal or level-order.

Steps to construct binary tree from In-order and pre-order traversal:

Step 1: Use the pre-order sequence to determine the root node of the tree. The first element
would be the root node.
Step 2 : Elements on the left side of the root node in the in-order traversal sequence form the
left sub-tree of the root node. Similarly, elements on the right side of the root node in the in-
order traversal sequence form the right sub-tree of the root node.
Step 3: Recursively select each element from pre-order traversal sequence and create its left
and right sub-trees from the in-order traversal sequence.
In binary search trees, all the left subtree elements should be less than root data and all the right
subtree elements should be greater than root data. This is called binary search tree property.

Ex1.
Ex2. Pre-order Sequence : 1 2 4 5 3 6
In-order Sequence : 4 2 5 1 6 3
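Working through this example: the first pre-order element, 1, is the root. In the in-order sequence, {4, 2, 5} lies to the left of 1 and {6, 3} to its right. Recursing on the left part with pre-order {2, 4, 5} makes 2 the left child of 1, with 4 and 5 as its left and right children; recursing on the right part with pre-order {3, 6} makes 3 the right child of 1, with 6 as its left child.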

Ex2.Postorder=[10,18,9,22,4]
Inorder = [10, 4, 18, 22, 9]

Ex3. in = { 12, 25, 30, 37, 40, 50, 60, 62, 70, 75, 87 };
post = { 12, 30, 40, 37, 25, 60, 70, 62, 87, 75, 50 }
Why Binary Search Tree?
 To search for an element in a binary tree, we need to check both the left subtree and the right subtree. Due to this, the worst-case complexity of the search operation is O(n).
 A binary search tree is built for searching. In this tree, there is a restriction on the kind of data a node can contain. As a result, it reduces the average search time to O(log n).
 In a Binary search tree, the value of left node must be smaller than the parent node,
and the value of right node must be greater than the parent node. This rule is applied
recursively to the left and right subtrees of the root.

 Since root data is always between left subtree data and right subtree data, performing
in-order traversal on binary search tree produces a sorted list.
 The basic operations that can be performed on a binary search tree (BST) are insertion of an element, deletion of an element, and searching for an element.
 While performing these operations on a BST, the height of the tree changes each time. The basic operations on a binary search tree take time proportional to the height of the tree.

Binary Search Tree Declaration

There is no difference between regular binary tree declaration and binary search tree
declaration. The difference is only in data but not in structure.

class BSTNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    # set data
    def setData(self, data):
        self.data = data

    # get data
    def getData(self):
        return self.data

    # get left child of a node
    def getLeft(self):
        return self.left

    # get right child of a node
    def getRight(self):
        return self.right

    def insert(self, data):
        if data < self.data:
            if self.left is None:
                self.left = BSTNode(data)
            else:
                self.left.insert(data)
        else:
            if self.right is None:
                self.right = BSTNode(data)
            else:
                self.right.insert(data)

    def find(self, data):
        if data < self.data:
            if self.left is None:
                return False
            return self.left.find(data)
        elif data > self.data:
            if self.right is None:
                return False
            return self.right.find(data)
        else:
            return True

    def preorder(self):
        print(self.data, end=" ")
        if self.left:
            self.left.preorder()
        if self.right:
            self.right.preorder()

    def inorder(self):
        if self.left:
            self.left.inorder()
        print(self.data, end=" ")
        if self.right:
            self.right.inorder()

    def postorder(self):
        if self.left:
            self.left.postorder()
        if self.right:
            self.right.postorder()
        print(self.data, end=" ")

    def delete(self, data):
        if data < self.data:
            # finding the position of the element: perform a search
            if self.left:
                self.left = self.left.delete(data)
            else:
                print("given node is not present")
        elif data > self.data:
            if self.right:
                self.right = self.right.delete(data)
            else:
                print("given node is not present")
        else:
            # delete this node
            if self.left is None:     # no left child: splice in the right subtree
                return self.right
            if self.right is None:    # no right child: splice in the left subtree
                return self.left
            # two children: copy the in-order successor's value here,
            # then delete the successor from the right subtree
            node = self.right
            while node.left:
                node = node.left
            self.data = node.data
            self.right = self.right.delete(node.data)
        return self


tree = BSTNode(10)
tree.insert(30)
tree.insert(3)
tree.insert(5)
tree.insert(8)
tree.insert(7)
print(tree.find(7))      # True
tree.preorder()          # 10 3 5 8 7 30
print()
tree.inorder()           # 3 5 7 8 10 30
print()
tree.postorder()         # 7 8 5 3 30 10
print()
tree = tree.delete(2)    # prints "given node is not present"

Finding an Element in Binary Search Trees

If the data we are searching for is less than the node's data, then search the left subtree of the current node; otherwise search the right subtree of the current node. If the data is not present, we end up at a NULL link.

Inserting an element in BST

To insert data into a binary search tree, first we need to find the location for that element. While finding the location, if the data is already there, we can simply ignore it and come out. Otherwise, we insert the data at the last location on the path traversed.
Delete a Node
Deletion in BST has been divided into 3 cases:
1. Node to be deleted is a leaf:
Replace the leaf node with NULL and simply free the allocated space. In the example, we delete node 85; since it is a leaf node, it is replaced with NULL.

2. Node to be deleted has only 1 child


3. Node to be deleted has 2 children.

The node which is to be deleted, is replaced with its in-order successor or predecessor
recursively until the node value (to be deleted) is placed on the leaf of the tree. After the
procedure, replace the node with NULL and free the allocated space.

Inorder Predecessor and Successor: If X has two children, then its in-order predecessor is the maximum value in its left subtree (the rightmost node in the left subtree) and its in-order successor is the minimum value in its right subtree (the leftmost node in the right subtree).

In the following image, the node 50 is to be deleted, which is the root node of the tree. The in-order traversal of the tree is given below.

6, 25, 30, 50, 52, 60, 70, 75.

replace 50 with its in-order successor 52. Now, 50 will be moved to the leaf of the tree, which
will simply be deleted.
The in-order predecessor or the successor can then be deleted using any of the case-1 or case-
2.
UNIT-IV(Trees)

Trees: Definitions and Concepts


 Operations on Binary Trees
 Representation of binary tree
Tree Traversals
Trees
General View of a Tree: (Figure: a real tree, with the root at the bottom and branches and leaves above.)

Computer Scientist’s View: (Figure: an inverted tree, with the root node at the top, branches below, and leaf nodes at the bottom.)
Trees
Definition:- Tree is a non linear data structures. It is a collection of entities called nodes.
A tree is a finite set of one or more nodes such that:
i) There is a specially designated node called the root.
ii) Remaining nodes are partitioned into ‘n’ (n>0) disjoint sets T 1,T2,..Tn, where each
Ti (i=1,2,….n) is a Tree, T1,T2,..Tn are called sub tree of the root.

structure of a tree
T1 A T3
T2

B C D

E F G H I J

K L
Tree Terminology
Root: node without parent (A)
Siblings: nodes that share the same parent. Eg: I, J, K have parent F.
Internal node: node with at least one child (A, B, C, F)
External node (leaf): node without children (E, I, J, K, G, H, D)
Ancestors of a node: parent, grandparent, grand-grandparent, etc.
Descendant of a node: child, grandchild, grand-grandchild, etc.
Subtree: tree consisting of a node and its descendants.
Level: set of nodes with the same depth; the root is at level 0. If a node is at level i, its child is at level i+1 and its parent is at level i-1. This is true for all nodes except the root.
Depth of a node: number of ancestors, i.e., the length of the path from the root to the node.
Height of a node: length of the path from that node to the deepest node.
Height of a tree: maximum depth of any node.
Degree of a node: the number of its children.
Degree of a tree: the maximum degree of any node in the tree.
(Figure: example tree with root A; B, C, D as its children; E, F under B; G, H under C; and I, J, K as children of F.)
Tree Terminology

• Path: a sequence of nodes and edges


connecting a node with descendants.
Tree Terminology

• Height of a tree: the number of edges present in the longest path of the tree.
• A leaf node will have a height of 0.
• Height of a node is the number of edges on the longest path from the node to a leaf.
• Height of node A is 3, i.e., the path A -> C -> D -> E, not the path from A to G.
Tree Terminology
• The depth of a node is the number of edges from the root node to that particular node. A root node will have a depth of 0.

Note: the depth and the height of the whole tree are the same value, but for an individual node the depth and height generally differ.
Tree Terminology
- Skew trees: a tree in which every node has exactly one child, except the leaf node, is called a skew tree.
- If every node has only a left child, it is called a left-skew tree.
- If every node has only a right child, it is called a right-skew tree.
A tree satisfies the following properties:

1. It has one designated node, called the root, that has no


parent.
2. Every node, except the root, has exactly one parent.
3. A node may have zero or more children.
4. There is a unique directed path from the root to each node.

(Figure: three example structures built from nodes 1–6 — the first is a tree; the other two are not trees.)
Types of trees
1. Binary tree
2. Binary search tree
3. Heap tree
Binary Trees
• Definition :-
A binary tree is a finite set of nodes such that
i. T is empty tree (called empty binary tree)
ii. T contains a specially designed node called the root of T, and remaining
nodes of T form two disjoint binary trees T 1 and T2 which are called left
sub tree and right sub tree respectively.
iii. Each node in binary tree has at most two children. (0,1,2)

Example of binary tree Left Right sub tree


A
sub tree
root B C

D E F G
left Right
sub sub H I J
tree tree
K
• Difference between tree and binary tree
Tree: (1) can never be empty; (2) a node may have any number of child nodes.
Binary tree: (1) may be empty; (2) a node may have at most two children (0, 1, or 2).
Three special situations of a binary tree are possible
1.Full binary tree
2.Complete binary tree.
3. Strict binary tree
Types of binary Trees
Full binary tree
A binary tree is a full binary tree, if it contains
maximum possible number of nodes in all levels.
Ie. each node will have exactly two children and all leaf
nodes at same level.
Level 0- 1node
1

2 3 Level 1-2nodes

4 5 6 7 Level 2-4 nodes

Level 3-8nodes
8 9 10 11 12 13 14 15
Number of nodes in a full binary tree is 2^(h+1) − 1

Number of leaf nodes in a full binary tree is 2^h


Types of binary Trees

Complete binary tree


A binary tree is said to be complete binary tree, if all its levels
have maximum number of nodes except possibly the last level,
and In the last level, the nodes are attached starting from the
left-most position. All leaf nodes at height h or h-1.

1 Level 0-1node
A

2 3 Level 1-2 nodes B C


4 5 6 7 Level 2- 4 nodes D E F G
8 9 Level 3- 2 nodes
H I J
K
Number of nodes in a complete binary tree is between 2^h (minimum) and 2^(h+1) − 1 (maximum)
Types of binary Trees

strict binary tree:

every node in a binary tree has exactly two children or no children, i.e., all nodes except the leaf nodes have two children.

B C

D E F G

H I
Applications of Binary Trees
 Expression trees used in compilers
 Huffman code trees used in data compression
algorithms.
 B-trees used in databases.

 Binary Search Tree (BST), which supports search,


insertion and deletion on a collection of items in
O(logn) (average).
 Priority Queue(PQ),which supports search an deletion
of minimum(or maximum) on a collection of items in
logarithmic time (in worst case).
Operations on Binary Trees
 Basic Operations
• Inserting an element into a tree
• Deleting an element from a tree
• Search for an element in a tree
• Traversing a tree (i.e., visiting all nodes)
Auxiliary Operations
• Find size , height of the tree
Binary Tree Traversals
The process of visiting all nodes of a tree is called tree traversal. Each node is processed only once, but it may be visited more than once. The four traversal techniques are -

• Preorder (DLR) Traversal


• Inorder (LDR) Traversal
• Postorder (LRD) Traversal
• Level Order Traversal
Binary Tree Traversals
Preorder (DLR) Traversal
Preorder traversal is defined as follows:
• Visit the root.
• Traverse the left subtree in Preorder.
• Traverse the right subtree in Preorder.
Binary Tree Traversals

Preorder (DLR) Traversal


Preorder of binary tree
Preorder- A,B,D,E,H,I,C,F,J,K,G

B C

D E F G

H I J
K
Binary Tree Traversals
Preorder (DLR) Traversal
Example

Binary Tree Traversals
Inorder (LDR) Traversal

Inorder traversal is defined as follows:


• Traverse the left subtree in Inorder.
• Visit the root.
• Traverse the right subtree in Inorder.

Inorder of binary tree

A Inorder- D,B,H,E,I,A,F,K,J,C,G

B C

D E F G

J
H I

K
Binary Tree Traversals
Inorder (LDR) Traversal

Example

Binary Tree Traversals
Postorder (LRD) Traversal

Postorder traversal is defined as follows:


• Traverse the left subtree in Postorder.
• Traverse the right subtree in Postorder.
• Visit the root.

Post order of binary tree

A Post order- D,H,I,E,B,K,J,F,G,C,A

B C

D E F G

J
H I

K
Binary Tree Traversals
Level Order Traversal

Level order traversal is defined as follows:


• Visit the root.
• While traversing a level, keep all the elements at that level in a queue.
• Go to the next level and visit all the nodes at that level.
• Repeat this until all levels are completed.
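A minimal C sketch of this queue-based traversal (the node structure and the fixed-capacity array used as the queue are assumptions made for this sketch):

#include <stdio.h>

#define MAX 100   /* assumed upper bound on the number of nodes */

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Visit nodes level by level, keeping each level's nodes in a queue. */
void levelorder(struct node *root)
{
    struct node *queue[MAX];
    int front = 0, rear = 0;

    if (root == NULL)
        return;
    queue[rear++] = root;                    /* start with the root */
    while (front < rear) {
        struct node *cur = queue[front++];   /* dequeue the next node */
        printf("%d ", cur->data);            /* visit it */
        if (cur->left)                       /* enqueue its children */
            queue[rear++] = cur->left;
        if (cur->right)
            queue[rear++] = cur->right;
    }
}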
Level order of binary tree

A Level order- A,B,C,D,E,F,G,H,I,J,K

B C

D E F G

J
H I

K
Binary Tree Traversals
Level Order Traversal

Example

Representing an Expression in Binary tree and
applying Traversals
Write pre,in and post order of the following tree
• (A-B) + C* (D/E)
+

- *

A B C /

D E
Preorder for the below tree

+
Pre order- +,-,A,B,*,C,/,D,E.
- *

A B C /

D E
Inorder for the below tree

+
Pre order- +,-,A,B,*,C,/,D,E.
- *

A B C /

D E

In order- A,-,B,+,C,*,D,/,E
Postorder for the below tree

• (A-B) + C* (D/E)
+

- *

A B C /
Pre order- +,-,A,B,*,C,/,D,E.

D E

In order- A,-,B,+,C,*,D,/,E

Post order- A,B,-,C,D,E,/,*,+


Example
(a+(b*c)+c)*c+d
Binary tree representation
1. Linear/sequential representation
2. Linked list representation

1. Linear/sequential representation (using arrays)

 The nodes are stored level by level, starting from the zero level
where root node is present.
the following rules can be used to decide the location of any node of
a tree in the array.
a. The root node is at location 0.
b. If a node is at a location ‘i’, then its left child is located at 2 * i + 1
and right child is located at 2 * i + 2
c. The space required by an n node binary tree is 2n+1.
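These location rules translate directly into index arithmetic; a small C sketch (the function names are illustrative, not part of the original material):

/* Array index of the left child of the node stored at index i. */
int leftChildIndex(int i)  { return 2 * i + 1; }

/* Array index of the right child of the node stored at index i. */
int rightChildIndex(int i) { return 2 * i + 2; }

/* Array index of the parent of the node stored at index i (for i > 0). */
int parentIndex(int i)     { return (i - 1) / 2; }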
Example- linear\sequential representation
1
A

2 3
B D

4 6 7
C E G

13
F

A B D C . E G . . . . . F

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Sequential representation
Advantages of linear representation
1. Any node can be accessed from any other node by calculating the index
and this is efficient from execution point of view.
2. There is no overhead of maintaining the pointers.
3. Some programming languages like BASIC, FORTRAN, where dynamic
allocation is not possible, array representation is the only way to store a
tree.

Disadvantages

1. For trees other than full binary trees, most of the memory locations remain empty.
2. It allows only static representation; there is no way to enhance the size of the tree.
3. Inserting a new node and deleting a node are inefficient with this representation, because these require considerable data movement up and down the array, which demands excessive processing time.
Representation of Binary Tree using Linked List

 The most popular way to represent a binary tree.


 Each element is represented by a node that has two link fields (leftChild
and rightChild) plus an element field .
 The space required by an n node binary tree is n * sizeof a node.

struct node { /* a node in the tree structure */


struct node *lchild;
int data ;
struct node *rchild;
};
The pointer lchild stores the address of left child node.
The pointer rchild stores the address of right child node.
If child is not available NULL is stored.
A pointer variable root represents the root of the tree.
Representation of Binary Tree using Linked List
Advantages of linked representation

 This representation is superior to the array representation as there is no


wastage of memory.

 There is no need to have prior knowledge of depth of the tree. Using dynamic
memory allocation concept one can create as much memory (node) as
required.

 Insertion and deletion which are the most common operations can be done
without moving the other nodes.

Disadvantages of linked representation


 This representation does not provide direct access to a node.
 It needs additional space in each node for storing the left and right subtrees.
 The number of NULL values in a tree which contains ‘n’ nodes is ‘n+1’
Binary Tree Insertion
1. Define structure for a node

#include <stdio.h>
#include <stdlib.h>

struct node
{
    int data;
    struct node *left;
    struct node *right;
};

2. Create new node

struct node* createNode(int data)
{
    // Allocate memory for new node
    struct node* node = (struct node*)malloc(sizeof(struct node));

    // Assign data to this node
    node->data = data;

    // Initialize left and right children as NULL
    node->left = NULL;
    node->right = NULL;
    return node;
}
int main()
{
    // insert new nodes into a binary tree
    struct node *root = createNode(20);
    root->left = createNode(30);
    root->right = createNode(40);
    root->left->left = createNode(50);
    root->right->left = createNode(60);
    return 0;
}

/* Resulting tree:
        20
       /  \
     30    40
     /     /
   50    60
*/
Tree traversals

void preorder(struct node *root)
{
    if (root)
    {
        printf("%d ", root->data);
        preorder(root->left);
        preorder(root->right);
    }
}
// Output: 20 30 50 40 60
Inorder traversal

void inorder(struct node *root)
{
    if (root)
    {
        inorder(root->left);
        printf("%d ", root->data);
        inorder(root->right);
    }
}
// Output: 50 30 20 60 40
Post order traversal

void postorder(struct node *root)
{
    if (root)
    {
        postorder(root->left);
        postorder(root->right);
        printf("%d ", root->data);
    }
}
// Output: 50 30 60 40 20
Height of Binary Tree
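As a sketch of this operation (using the same struct node as the traversal examples above, and the earlier convention that a leaf has height 0, so an empty tree is given height -1):

int height(struct node *root)
{
    int hl, hr;
    if (root == NULL)
        return -1;                    /* empty tree */
    hl = height(root->left);          /* height of the left subtree */
    hr = height(root->right);         /* height of the right subtree */
    return (hl > hr ? hl : hr) + 1;   /* one more edge up to this node */
}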
UNIT-IV(Trees)

Binary search Tree(BST).


Search operation on BST
Insert operation on BST
Create BST
Delete operation on BST
Binary search tree

 Binary search tree is one type of binary tree.


 Binary search tree is a binary tree that may be empty.
 A non-empty binary search tree satisfies the following properties:
1. Every node has a value, and no two nodes have the same value (all values are distinct).
2. The values in the left sub tree are smaller than the value of its root node.
3. The values in the right sub tree are greater than the value of its root node.
4. The left and right sub trees of the root are also binary search trees.

All<K All>K
• The following are the BST ( binary search trees)

35
18
45
15 20

12 17 25 40 50

20
15 25
12 18 22

It's not a binary search tree because it fails to satisfy properties 3 and 4.
Binary search tree representation

• Binary search tree representation is same as


binary tree representation
• Refer binary tree representation.
BST operations
1.create
2. insertion
3. deletion
4.findmax
5.findmin
6.search
7.display
1. Search operation on BST
• Searching-
suppose we wish to search an element with key or value (K), we begin at the
root. If root is NULL, the search tree contains no such element, the search
is unsuccessful.
• Otherwise, we compare K with the key value in the root node. If K is less than the key value in the root, then no element in the right sub tree can contain it, so only the left sub tree is to be searched.
• If K is greater than the key in the root, only the right sub tree is to be searched.
• This process repeated until the key element is found or reached to the leaf
nodes.
• If K= key value in the root, then search terminates successfully. The sub
tree may be searched similarly.
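As a sketch, the same search logic can also be written iteratively in C (assuming the binary tree struct node used earlier, with values arranged per the BST property):

/* Returns a pointer to the node containing key K, or NULL if absent. */
struct node *bstSearch(struct node *root, int K)
{
    while (root != NULL) {
        if (K == root->data)
            return root;          /* search terminates successfully */
        else if (K < root->data)
            root = root->left;    /* only the left subtree can hold K */
        else
            root = root->right;   /* only the right subtree can hold K */
    }
    return NULL;                  /* reached a NULL link: unsuccessful */
}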
Example- find a given key node in a following BST

22>20

20
Search node-22
15
25

12
18 22
Insertion operation on BST
• Insertion operation on a binary search tree is very simple. In fact, one step
more than the searching operation.
• To insert a node with data, say ITEM, into a tree, the tree is to be searched
starting from the root node.
• If ITEM is found, do nothing, otherwise ITEM is to be inserted at the dead
end where search halts.
6 6
Insert node- 5 2 8 2 8

1 4 1 4
3
3 5
a) Before insertion b) After inserting node 5
Construct BST with following elements
19, 55,44,98,8,23,15,6,10,82,99

Root
19

8 55

6 15
44 98

10
23
82 99
Deletion operation on BST

• Another frequently used operation on the BST is deleting a node from it. This one is slightly more complicated.
• To delete a node from a binary search tree, there are three possible cases.
case-1: if the deleted node is a leaf node (a node with no children), it can be deleted by setting its parent's pointer (left or right) to NULL.

30
30

25 40
25 40

20 20 45
35 45

Deleted node-35 b. After deletion


a. Before deletion
Case-2 if the deleted node has one child, either left child or
right child, its parent pointer needs to be changed.
8
8
Deleted After deletion
6 9 6 9
node-1

1 7
4 7
4
case-3
• If deleted node has two children
• First find out the inorder successor (X) of the deleted node ( in order
successor means the node which comes after the deleted node during the
inorder traversal).
• Delete a node X from tree by using case 1 or case 2 (it can be verified that
x never has a left child) and then replace the data content in deleted node
by the data of the x.

35
35

45 24 45
20

16 42 16 29 42
29

24 27 33
33
x
27
Insert a Node
Delete a Node
Search for a Node
Applications of Trees
Trees are among the most useful data structures in computer science. Some applications of trees are:
1. Databases: a library database, student databases in schools and colleges, patient databases in hospitals, or an employee database in an organization can all be implemented using trees.
2. The file system in your computer, i.e., folders and files, is stored as a tree.
3. When you search for a word or misspell it, the list of possible corrected words you get is generated using a tree.
4. When you watch YouTube videos or surf the internet, the information (which may be anywhere in the world) travels to your computer through many intermediate computers called routers. Routers use trees and graphs for routing data.
Properties of binary trees

1. In a BT, the maximum number of nodes at level i (where i >= 0) is n = 2^i:
at level 0 = 2^0 = 1 node
at level 1 = 2^1 = 2 nodes
2. The maximum no. of nodes possible in a binary tree of height h is Nmax = 2^(h+1) − 1. If height = 2, the maximum no. of nodes is 7.
3. The minimum no. of nodes possible in a binary tree of height h is Nmin = h + 1. If height = 3, the minimum no. of nodes is 4.

4. The minimum no. of nodes possible at every level is one node. When every parent node has only one child, such a tree is called a skew binary tree. (Fig-a, Fig-b: examples of skew binary trees.)
5. for any non empty binary tree, if n is the number of nodes and e is the number of edges, then

n= e+1

6. for any non empty binary tree T, if n0 is the no. of leaf nodes and n2 is
the no. of internal nodes (degree-2), then

n0= n2+1

No of internal nodes n2 =3
No of leaf nodes n0 = 3+1=4
7. Minimum height of a binary tree with n number of nodes is
log2(n+1)
8. Total no. of binary trees possible with n nodes is (1/(n+1)) × C(2n, n).
• Total no. of binary trees possible with 3 nodes (A, B, C) is (1/4) × C(6, 3) = 5:

A A A A
A

B B B B
B C

C C C
1 C
2 3 5
4
18CS C07 - Data Structures
Unit-4

Graphs: Introduction, Applications of graphs,


Graph representations, graph traversals,
Minimal Spanning Trees.

Introduction to Graphs
• Non linear data structure
• A graph G is defined as an ordered set (V, E), where V
represents the set of vertices and E represents the edges that
connect these vertices. A graph is a generalization of the tree
structure
• An edge e = (u,v) is a pair of vertices
• Example: a graph with five vertices (nodes) and seven edges:
V = {a, b, c, d, e}
E = {(a,b), (a,c), (a,d), (b,e), (c,d), (c,e), (d,e)}

Graph Terminology
• A graph can be directed or undirected.
• Directed edge: ordered pair of vertices (u, v). First vertex u is the
origin and second vertex v is the destination
Example: one-way road traffic
• Undirected edge: unordered pair of vertices (u, v). edges do not
have any direction associated with them.
Example: railway lines
• Undirected graph: A graph in which all the edges are undirected
• Directed graph(Digraph): A graph in which all the edges are directed

• Adjacent vertices If two nodes are connected by an edge, they are
neighbors and the nodes are adjacent to each other.
• Degree of a vertex: the number of edges connected to that vertex.
• For a directed graph, the in-degree of a vertex v is the number of edges arriving at that vertex, and the out-degree of a vertex v is the number of edges leaving that vertex. The degree of a node is the sum of its in-degree and out-degree.

Examples: (Figure: an undirected graph in which every vertex has degree 3, and a directed graph where, e.g., vertex 0 has in-degree 1 and out-degree 1, vertex 1 has in-degree 1 and out-degree 2, and vertex 2 has in-degree 1 and out-degree 0.)
• Cut vertex: A vertex which when deleted would disconnect the
remaining graph.
• isolated node : degree of a node is zero. a vertex is not an end-point
of any edge
• Parallel Edge/Multiple edge: Two distinct edges are parallel if they
connect the same pair of vertices.

• Loop: An edge that has identical end-points is called a loop. That is, e
= (A, A).

• Multi-graph: A graph with multiple edges and/or loops is called a
multi-graph.
• Simple Graph: A graph is called a simple graph if it has no loops and
no parallel edges.
• Acyclic graph: A graph without cycles is called acyclic graph.
• A directed acyclic graph [DAG] is a directed graph with no cycles

Cyclic graph Acyclic graph


A A

C B C
B

D D

• Regular graph It is a graph where each vertex has the same number
of neighbours. That is, every node has the same degree. A regular
graph with vertices of degree k is called a k–regular graph or a
regular graph of degree k.

Regular graphs

• Path: A path P written as P = {v0, v1, v2, ..., vn), of length n from a
node u to v is defined as a sequence of nodes. Here, u = v0, v = vn
and vi–1 is adjacent to vi for i = 1, 2, 3,..., n.
• Simple path A path P is known as a simple path if all the nodes in
the path are distinct . In a simple path if v0 is equal to vn , v0 = vn,
then the path is called a closed simple path.
• Cycle: A path in which the first and the last vertices are same. A
simple cycle has no repeated edges or vertices (except the first
and last vertices).
(Figure: example graphs on vertices a–e showing a simple path such as b–e–c, and a cycle such as a–c–d–a.)
• Connected graph: A graph is said to be connected if for any two
vertices (u, v) in V there is a path from u to v. That is to say that there
are no isolated nodes in a connected graph. A connected graph that
does not have any cycle is called a tree.
• strongly connected graph: A directed graph is said to be strongly
connected if for every pair of distinct vertices vi,vj in G, there is a path
from vi to vj and also from vj to vi.
• Weakly connected graph: a directed graph is not strongly connected
but it is connected

• Complete graph: A graph G is said to be complete if all its nodes are
fully connected. That is, there is a path from one node to every other
node in the graph. A complete graph has n(n–1)/2 edges, where n is
the number of nodes in G.

• Labelled graph or weighted graph: Every edge in the graph is


assigned some weight value. The weight of an edge denoted by w(e)
is a positive value which indicates the cost of traversing the edge.
Weighted Graph

• Sparse graphs: graphs with relatively few edges (generally if |E| < |V| log |V|) are called sparse graphs.
• Dense graphs: graphs with relatively few of the possible edges missing are called dense.
• Subgraph: a graph formed from a subset of the vertices and edges of a graph G.
(Figure: a graph G and some of its subgraphs G1, G2, G3, G4.)

Applications of Graph
• Representing relationships between components in
electronic circuits.
• Transportation networks: Highway network, Flight
network.
• Computer networks: Local area network, Internet,
Web.
• Databases: For representing ER (Entity Relationship)
diagrams in databases, for representing dependency
of tables in databases.

Representation of a Graph

There are two ways of representing a graph in memory:

Sequential Representation by using Adjacency


Matrix.

Linked list Representation by using Adjacency List.

Adjacency Matrix Representation
• An adjacency matrix is used to represent which nodes are adjacent to one
another. Two nodes are adjacent if there is an edge connecting them.
• For any graph G having n nodes, the adjacency matrix will have the
dimension n × n (a 2D matrix).
• In an adjacency matrix, the rows and columns are labelled by graph
vertices.
• An entry aij in the adjacency matrix will contain 1, if vertices vi and vj are
adjacent to each other. However, if the nodes are not adjacent, aij will be
set to zero.

• Since an adjacency matrix contains only 0s and 1s, it is called a bit matrix or
a Boolean matrix.
• For a simple graph (that has no loops), the adjacency matrix has 0s
on the diagonal.
• The adjacency matrix of an undirected graph is symmetric.
• The memory use of an adjacency matrix is O(n²), where n is the
number of nodes in the graph.
• The adjacency matrix for a weighted graph contains the weights of
the edges connecting the nodes.
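As a quick illustration, here is a minimal Python sketch of building such a matrix from an edge list (the function name build_adjacency_matrix and the sample edges are illustrative, not from the slides):

def build_adjacency_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix from a list of (u, v, weight) edges."""
    matrix = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        matrix[u][v] = w
        if not directed:          # an undirected graph gives a symmetric matrix
            matrix[v][u] = w
    return matrix

# Example: 4 vertices, weighted undirected edges
edges = [(0, 1, 5), (0, 2, 3), (1, 3, 2), (2, 3, 7)]
for row in build_adjacency_matrix(4, edges):
    print(row)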

Linked List Representation
• This structure consists of a list of all nodes in G and every node is
in turn linked to its own list that contains the names of all other
nodes that are adjacent to it.

[Figures: Examples 1-3 of graphs with their adjacency list representations]
In C, a node of an adjacency list can be declared as:
typedef struct node
{
    int vertex;                /* the adjacent vertex this list entry refers to */
    int weight;                /* weight of the edge to that vertex */
    struct node *next;         /* next adjacent vertex in the list */
} node;

• The advantages of using an adjacency list are:


 It is easy to follow and clearly shows the adjacent nodes of a
particular node.
 It is often used for storing graphs that have a small-to-moderate
number of edges. That is, an adjacency list is preferred for
representing sparse graphs in the computer’s memory; for dense
graphs, an adjacency matrix is a good choice.
 Adding new nodes in G is easy and straightforward when G is
represented using an adjacency list. Adding new nodes in an
adjacency matrix is a difficult task, as the size of the matrix needs
to be changed and existing nodes may have to be reordered.
[Figure: an undirected graph with its adjacency list and adjacency matrix]
[Figure: a directed graph with its adjacency list and adjacency matrix]
Comparison among various representations

• The adjacency matrix representation always requires an n × n matrix
for n vertices, regardless of the number of edges. But from
the manipulation point of view, this representation is the best.
• As far as insertion, deletion and searching are concerned, the
linked list representation has an advantage in some cases, but when
we consider the overall performance, the matrix representation
of a graph is more powerful than the adjacency list representation.

Bi-connected Components
• A vertex v of G is called an articulation point, if removing v
along with the edges incident on v, results in a graph that has
at least two connected components.
[Figure: Example 1 -- here, C is an articulation point (cut vertex), as deleting
vertex C from the graph results in two disconnected components of the original graph]
• A bi-connected graph is defined as a connected graph that has no
articulation vertices and no bridges. That is, a bi-connected graph is
connected and non-separable in the sense that even if we remove
any vertex or edge from the graph, the resultant graph is still
connected.
• An edge in a graph is called a bridge if removing that edge results in a
disconnected graph. Also, an edge in a graph that does not lie on a
cycle is a bridge.

Graph Traversal Techniques
• Traversing a graph means visiting all the vertices in the graph
exactly once.
• Graph traversal algorithm is also called as graph search algorithm.
• There are two standard graph traversal techniques:
1. Depth-First Search (DFS)
2. Breadth-First Search (BFS)

• DFS and BFS traversals result in an acyclic graph (a spanning tree of the visited vertices).


• DFS and BFS traversal on the same graph do not give the same
order of visit of vertices.

Depth-First Search (DFS)
• DFS is similar to the pre-order traversal of a tree.
• It processes the nodes depth-wise.
• The depth-first search algorithm progresses by expanding the starting
node of G and then going deeper and deeper until the goal node is found,
or until a node that has no children is encountered. When a dead end is
reached, the algorithm backtracks, returning to the most recent node that
has not been completely explored. This process is repeated until the start
node is reached while backtracking.
• A stack is used to implement depth-first search.
• Initially the starting vertex is pushed onto the stack.
• To visit a vertex, we pop a vertex from the stack and then push all
its adjacent vertices onto the stack.
• A list of the vertices already visited is maintained; it is used to
check whether a popped vertex has already been visited.
• If the popped vertex is already visited, ignore it and pop the stack for the next
vertex to be visited.
• This procedure continues until the stack becomes empty (a sketch follows below).
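A minimal Python sketch of this stack-based procedure follows; the graph dictionary is an assumed example, chosen so that the traversal reproduces the order A, B, E, F, C, D used in the example slide below, and dfs_stack is an illustrative name:

def dfs_stack(graph, start):
    """Iterative DFS: pop a vertex, and if unvisited, visit it and push its neighbours."""
    visited = []
    stack = [start]                     # initially the starting vertex is pushed
    while stack:                        # continue until the stack becomes empty
        vertex = stack.pop()
        if vertex not in visited:       # ignore vertices already visited
            visited.append(vertex)
            # push adjacent vertices; reversed() keeps a left-to-right visit order
            for neighbour in reversed(graph[vertex]):
                stack.append(neighbour)
    return visited

graph = {'A': ['B', 'C', 'D'], 'B': ['A', 'E'], 'C': ['A', 'D'],
         'D': ['A', 'C'], 'E': ['B', 'F'], 'F': ['E']}
print(dfs_stack(graph, 'A'))   # ['A', 'B', 'E', 'F', 'C', 'D']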
• Steps to do DFS:
1. Select an unvisited node, say X, visit it, and treat it as the current node.
2. Find an unvisited neighbour Y of the current node, visit it, and make
Y the new current node.
3. Repeat steps 1 and 2 until a ‘dead end’ is reached: a vertex
that has no unvisited adjacent nodes.
4. After coming to a dead end, backtrack to the parent to see
if it has another unvisited adjacent vertex other than Y, and then repeat
steps 1-3 until all vertices are visited.
5. The algorithm terminates when backtracking leads back to the starting
node.
• In DFS algorithm, the following edges are encountered:
– Tree edge/discovery edge: encounter new vertex
– Back edge: from descendent to ancestor
– Forward edge: from ancestor to descendent
– Cross edge: between vertices that are neither ancestor nor descendant of one another (e.g., between different subtrees)
• The final generated tree is called the DFS tree and the order in
which the vertices are processed is called DFS numbers of the
vertices.
• The time complexity of DFS is O(V + E) if we use adjacency lists
for representing the graph.
• If an adjacency matrix is used for the graph representation, the
complexity is O(V²).
• Applications of DFS
 Topological sorting
 To check whether the undirected graph is connected or not --
if a DFS traversal visits all the nodes then the graph is
connected
 Finding articulation points (cut vertices) of the graph
 To check the Acyclicity of the graph
 Solving puzzles such as mazes
Algorithm for Depth First Traversal (DFS)

[Algorithm figure omitted] Status values used by the algorithm:
Status = 1: unvisited node
Status = 2: node is visited but not processed (it is in the stack)
Status = 3: node is processed

Example: Perform DFS traversal on the following graph (figure omitted).

DFS traversal order: A, B, E, F, C, D

Breadth First Traversal:
• It processes the vertices breadth-wise.
• The breadth first traversal is similar to the level-order traversal
of a binary tree.
• Breadth-first search (BFS) is a graph search algorithm that begins
at the root node and explores all the neighbouring nodes. Then
for each of those nearest nodes, it explores their unexplored
neighbour nodes, and so on, until it finds the goal.
• The breadth first traversal of a graph is similar to traversing a
binary tree level by level (the nodes at each level are visited
from left to right).
• All the nodes at any level, i, are visited before visiting the nodes
at level i + 1.
• To implement the breadth first search algorithm, we use a
queue.

BFS follows the following steps:
1. Select an unvisited node x, visit it, have it be the root in a
BFS tree being formed. Its level is called the current level.
2. From each node x in the current level, visit all the unvisited
neighbors of x. The newly visited nodes from this level form
a new level that becomes the next current level.
3. Repeat step 2 until no more nodes can be visited.
4. If there are still unvisited nodes, repeat from Step 1.

Example: Perform BFS traversal on the following graph (figure omitted).

BFS traversal order: A, D, E, B, C, F, G

Algorithm for Breadth First Traversal (BFS)

[Algorithm figure omitted]
Applications of Breadth-First Search Algorithm
Breadth-first search can be used to solve many
problems such as:

• Finding all connected components in a graph G.


• Finding the shortest path between two nodes, u
and v, of an unweighted graph.
• Finding the path with the fewest edges between two nodes, u
and v, of a weighted graph (for true shortest paths by weight,
an algorithm such as Dijkstra’s is used instead).

Spanning Trees
• A spanning tree of a graph is a subgraph that contains all the
vertices and is a tree (it has no cycles).
• A graph may have many spanning trees.

[Figure: Graph A and some spanning trees derived from Graph A]
Minimum spanning tree
• Take an example that connects four cities in India -- Vijayawada, Hyderabad,
Bangalore and Mumbai -- by a graph with the following distances (figure omitted):
Hyderabad-Vijayawada 270 km, Hyderabad-Bangalore 580 km, Bangalore-Vijayawada 650 km,
Hyderabad-Mumbai 700 km, Mumbai-Bangalore 980 km, Mumbai-Vijayawada 1000 km.
• Find the shortest set of links that covers all four cities.
Minimum Spanning Trees
• The minimum spanning tree for a given graph is the spanning
tree of minimum cost for that graph.

[Figure: a weighted graph and its minimum spanning tree]

Algorithms to find minimum spanning trees are:
• Kruskal‘s Algorithm
• Prim‘s Algorithm
Kruskal’s algorithm

• Kruskal’s algorithm finds the minimum cost spanning tree by


selecting the edges one by one
• Steps to find the minimum spanning tree using Kruskal’s
algorithm:
1. Draw all the vertices of the graph.
2. Select the smallest edge from the graph and add it into
the spanning tree (initially it is empty).
3. Select the next smallest edge and add it into the
spanning tree, provided it does not form any cycle.
4. Repeat steps 2 and 3 until all the vertices are
reached.

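Below is a minimal Python sketch of these steps; the union-find helper used to detect cycles is a standard implementation detail (the slides do not show one), and the sample edge list matches the small example graph that follows:

def kruskal(vertices, edges):
    """Kruskal's MST: take edges in increasing weight order, skip those forming a cycle."""
    parent = {v: v for v in vertices}   # union-find structure for cycle detection

    def find(v):                        # find the representative of v's component
        while parent[v] != v:
            v = parent[v]
        return v

    mst, total = [], 0
    for weight, u, v in sorted(edges):  # steps 2/3: repeatedly pick the smallest edge
        ru, rv = find(u), find(v)
        if ru != rv:                    # different components: no cycle is formed
            parent[ru] = rv             # merge the two components
            mst.append((u, v, weight))
            total += weight
    return mst, total

edges = [(5, 'A', 'B'), (4, 'A', 'C'), (6, 'A', 'D'), (2, 'A', 'E'), (2, 'B', 'D'),
         (3, 'B', 'F'), (3, 'C', 'E'), (1, 'D', 'E'), (2, 'D', 'F'), (4, 'E', 'F')]
print(kruskal('ABCDEF', edges))   # MST edges with total cost 10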
Example: Consider the following weighted undirected graph on vertices A-F
(figure omitted; its edges are A-B 5, A-C 4, A-D 6, A-E 2, B-D 2, B-F 3,
C-E 3, D-E 1, D-F 2, E-F 4).

The minimum spanning tree for the above graph consists of the edges
A-E (2), B-D (2), C-E (3), D-E (1) and D-F (2), with total cost 10.
Example
Consider an undirected, weighted graph on vertices A-H (figure omitted) with edges:
(A,B) 8, (A,F) 10, (A,H) 5, (B,C) 4, (B,E) 4, (B,F) 4, (B,H) 4,
(C,D) 3, (C,F) 3, (D,E) 1, (D,F) 6, (D,G) 2, (E,G) 3, (G,H) 3.
Sort the edges by increasing edge weight:

edge dv      edge dv
(D,E) 1      (B,E) 4
(D,G) 2      (B,F) 4
(E,G) 3      (B,H) 4
(C,D) 3      (A,H) 5
(G,H) 3      (D,F) 6
(C,F) 3      (A,B) 8
(B,C) 4      (A,F) 10

Select the first |V|–1 edges which do not generate a cycle, scanning the sorted list in order:
• (D,E) 1 -- accepted
• (D,G) 2 -- accepted
• (E,G) 3 -- rejected: accepting edge (E,G) would create a cycle
• (C,D) 3 -- accepted
• (G,H) 3 -- accepted
• (C,F) 3 -- accepted
• (B,C) 4 -- accepted
• (B,E) 4 -- rejected: would create a cycle
• (B,F) 4 -- rejected: would create a cycle
• (B,H) 4 -- rejected: would create a cycle
• (A,H) 5 -- accepted

The remaining edges (D,F) 6, (A,B) 8 and (A,F) 10 are not considered,
since |V|–1 = 7 edges have already been selected.

Done
Total Cost = Σ dv = 1 + 2 + 3 + 3 + 3 + 4 + 5 = 21
Prim’s algorithm
Prim’s algorithm finds the minimum cost spanning tree by
selecting the edges one by one as follows.
1. All vertices are marked as not visited.

2. Any vertex v you like is chosen as the starting vertex and is
marked as visited (it defines a cluster C).

3. The smallest-weighted edge e = (v, u), which connects one
vertex v inside the cluster C with another vertex u outside of C,
is chosen and is added to the MST.

4. The process is repeated until a spanning tree is formed with
all the vertices of the graph.

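Below is a minimal Python sketch of this cluster-growing process using a priority queue (heapq); the function name and the small example graph are illustrative, not from the slides:

import heapq

def prim(graph, start):
    """Prim's MST: repeatedly add the cheapest edge leaving the visited cluster."""
    visited = set()
    mst, total = [], 0
    heap = [(0, start, None)]           # (edge weight, vertex, parent)
    while heap:
        weight, v, parent = heapq.heappop(heap)
        if v in visited:
            continue                    # this edge would connect two cluster vertices
        visited.add(v)
        total += weight
        if parent is not None:
            mst.append((parent, v, weight))
        for u, w in graph[v]:           # consider edges leaving the new cluster vertex
            if u not in visited:
                heapq.heappush(heap, (w, u, v))
    return mst, total

# Assumed example graph as an adjacency list of (neighbour, weight) pairs
graph = {'A': [('B', 2), ('C', 3)], 'B': [('A', 2), ('C', 1)],
         'C': [('A', 3), ('B', 1)]}
print(prim(graph, 'A'))   # ([('A', 'B', 2), ('B', 'C', 1)], 3)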
Walk-Through (on the graph of the figure, omitted here; the relevant edge weights
appear in the updates below)

Initialize the array: for every vertex, Known (K) = F, dv = ∞, pv = –.

• Start with any node, say D: mark D known, dv(D) = 0. Update the distances of
  D’s adjacent, unselected nodes: C ← 3 (via D), E ← 25 (D), F ← 18 (D), G ← 2 (D).
• Select the node with minimum distance, G (2). Update: E ← 7 (G), H ← 3 (G).
• Select C (3). Update: B ← 4 (C), F ← 3 (C).
• Select F (3). Update: A ← 10 (F), E ← 2 (F).
• Select E (2). Table entries unchanged.
• Select H (3). Update: A ← 4 (H).
• Select A (4). Table entries unchanged.
• Select B (4). Done.

Final table:

  K dv pv
A T 4  H
B T 4  C
C T 3  D
D T 0  –
E T 2  F
F T 3  C
G T 2  D
H T 3  G

Cost of Minimum Spanning Tree = Σ dv = 21
Consider again the weighted graph on vertices A-F (edges A-B 5, A-C 4, A-D 6,
A-E 2, B-D 2, B-F 3, C-E 3, D-E 1, D-F 2, E-F 4).

Adjacency matrix:

  A B C D E F
A - 5 4 6 2 -
B 5 - - 2 - 3
C 4 - - - 3 -
D 6 2 - - 1 2
E 2 - 3 1 - 4
F - 3 - 2 4 -

The minimum spanning tree for the above graph is the one found earlier:
edges A-E (2), B-D (2), C-E (3), D-E (1) and D-F (2), with total cost 10.
Graph

 A graph is a non-linear data structure


 A graph G is defined as an ordered set (V, E), where V represents the set of vertices and E represents the edges that connect
these vertices. A graph is a generalization of the tree structure
 An edge e = (u,v) is a pair of vertices
 Directed edge: It is an ordered pair of vertices (u, v), where first vertex u is the origin and second vertex v is the destination.

 Undirected edge: It is an unordered pair of vertices (u, v)

 Directed graph: all the edges are directed.

 Undirected Graph: All the edges are undirected.


 When an edge connects two vertices, the vertices are said to be adjacent to each other and the edge is incident on both
vertices.
 A graph with no cycles is called a tree. A tree is an acyclic connected graph.

 A self-loop is an edge that connects a vertex to itself.

 Two edges are parallel if they connect the same pair of vertices.

 The Degree of a vertex is the number of edges incident on it.

 A subgraph is a subset of a graph's edges (with associated vertices) that form a graph.
 A path in a graph is a sequence of adjacent vertices. Simple path is a path with no repeated vertices. In the graph below, the
dotted lines represent a path from G to E.

 A cycle is a path where the first and last vertices are the same. A simple cycle is a cycle with no repeated vertices or edges
(except the first and last vertices).

 A graph is connected if there is a path from every vertex to every other vertex.
In the following graph, it is possible to travel from one vertex to any other vertex. For example, one can traverse from vertex
‘a’ to vertex ‘e’ using the path ‘a-b-e’.

 A directed acyclic graph [DAG] is a directed graph with no cycles.


 A spanning tree can be defined as a subgraph of an undirected connected graph. It includes all the vertices along with the
least possible number of edges. If any vertex is missed, it is not a spanning tree. A spanning tree is a subset of the graph that
does not have cycles, and it also cannot be disconnected. In a spanning tree, the total number of edges is n-1, where n is the
number of vertices in the graph. A complete graph with n vertices has n^(n-2) spanning trees (Cayley's formula).


 A bipartite graph is a graph whose vertices can be divided into two sets such that all edges connect a vertex in one set
with a vertex in the other set. Or it is a set of graph vertices decomposed into two disjoint sets such that no two graph
vertices within the same set are adjacent.
Ex1.

Ex2.

Here, we partition the vertex set V= { A,B,C,D,E} into two disjoint vertex sets V1 = {A,D} and V2 = {B,C,E}.

 Graphs with all edges present are called complete graphs.

 In weighted graphs integers (weights) are assigned to each edge to represent (distances or costs).
Applications of Graphs

• Representing relationships between components in electronic circuits


• Transportation networks: Highway network, Flight network
• Computer networks: Local area network, Internet, Web
• Databases: For representing ER (Entity Relationship) diagrams in databases.

Representing Graphs

1. An adjacency matrix can be thought of as a table with rows and columns. The row labels and column labels represent the
nodes of a graph. An adjacency matrix is a square matrix where the number of rows, columns and nodes are the same. Each
cell of the matrix represents an edge or the relationship between two given nodes.

Directed Graph adjacency matrix: if there exists a directed edge from a given node to another, then the corresponding cell will be
marked one else zero.
Undirected weighted graph representation

2. An adjacency list represents a graph as an array of linked lists. The index of the array represents a vertex and each element
in its linked list represents the other vertices that form an edge with the vertex.

For example, we have a graph below for undirected graph.


Linked list representation of the graph: Here, 0, 1, 2, 3 are the vertices and each of them forms a linked list with all
of its adjacent vertices.
Ex2. Directed Graph adjacency list

Graph Traversals: graph search algorithm can be thought of as starting at some source vertex in a graph and
"searching" the graph by going through the edges and visiting the vertices.
1. Depth First Traversal:
Initially all vertices are marked unvisited (false). The DFS algorithm starts at a vertex u in the graph. By starting at vertex u it
considers the edges from u to other vertices. If an edge leads to an already visited vertex, it backtracks to the current vertex u. If
an edge leads to an unvisited vertex, it goes to that vertex and starts processing from that vertex. That means the new vertex
becomes the current vertex. Follow this process until a dead end is reached; at that point start backtracking. The process
terminates when backtracking leads back to the start vertex.

Ex.

In fig C, a directed graph, the traversal starting from A proceeds: A->B->E->G->backtrack to E->F->backtrack to E->backtrack to B->backtrack
to A->C->D
Implementation of DFS
 Implemented using adjacency list representation and dictionary, recursion
 The key in the dictionary is the node of the graph and the element is the list of nodes adjacent to the visiting node.
Ex.(Fig a) G= {A: [B,C,D], B:[A,E,D], C:[A,D], D:[A,B,C,E], E:[B,D]}

def add_node(v):
    if v in G:
        print(v, "is already present in graph")
    else:
        G[v] = []

def add_edge(v1, v2):
    if v1 not in G:
        print(v1, "is not present in graph")
    elif v2 not in G:
        print(v2, "is not present in graph")
    else:
        G[v1].append(v2)
        G[v2].append(v1)

def DFS(node, visited, G):  # node is the starting node, visited is the set of visited nodes, G is the dictionary
    if node not in G:
        print("Node is not present")
        return
    if node not in visited:
        print(node)
        visited.add(node)
        for i in G[node]:  # G[node] gives the list of nodes adjacent to the visiting node
            DFS(i, visited, G)

visited = set()
G = {}
add_node("A")
add_node("B")
add_node("C")
add_node("D")
add_node("E")
add_edge("A", "B")
add_edge("B", "E")
add_edge("A", "C")
add_edge("A", "D")
add_edge("B", "D")
add_edge("C", "D")
add_edge("E", "D")
print(G)
DFS("A", visited, G)

2. BFS: Breadth First Search


The algorithm works as follows:

 Start by putting any one of the graph's vertices at the back of a queue.
 Take the front item of the queue and add it to the visited list.
 Create a list of that vertex's adjacent nodes. Add the ones which aren't in the visited list to the back of the queue.
 Keep repeating steps 2 and 3 until the queue is empty.
Python code:

def add_node(v):
    if v in G:
        print(v, "is already present in graph")
    else:
        G[v] = []

def add_edge(v1, v2):
    if v1 not in G:
        print(v1, "is not present in graph")
    elif v2 not in G:
        print(v2, "is not present in graph")
    else:
        G[v1].append(v2)
        G[v2].append(v1)

def bfs(visited, graph, node):
    visited.append(node)
    queue.append(node)
    while queue:
        m = queue.pop(0)          # dequeue the front vertex
        print(m, end=" ")
        for i in graph[m]:        # enqueue unvisited neighbours
            if i not in visited:
                visited.append(i)
                queue.append(i)

visited = []
G = {}
queue = []
add_node("A")
add_node("B")
add_node("C")
add_node("D")
add_node("E")
add_edge("A", "B")
add_edge("B", "E")
add_edge("A", "C")
add_edge("A", "D")
add_edge("B", "D")
add_edge("C", "D")
add_edge("E", "D")
print(G)
bfs(visited, G, 'A')
UNIT-V

Graphs: Introduction, Applications of graphs, Graph representations, Graph traversals.

Hashing: Introduction, Hash Functions -- Modulo, Middle of Square, Folding; Collision Resolution Techniques -- Separate Chaining, Open Addressing: Linear Probing, Quadratic Probing, Double Hashing.


Types of Graphs in Data Structures:
 Graphs in data structures are non-linear data structures made up of a finite number of nodes or vertices and the edges that connect them.
 Graphs in data structures are used to address real-world problems in which the problem area is represented as a network, like telephone networks, circuit networks, and social networks.
 For example, a telephone network can represent a single user as a node or vertex, while a telephone link between two users represents an edge.

What Are Graphs in Data Structure?
A graph is a non-linear kind of data structure made up of nodes or vertices and edges. The edges connect any two nodes in the graph, and the nodes are also known as vertices.
 This graph has a set of vertices V = {1, 2, 3, 4, 5} and a set of edges E = {(1,2), (1,3), (2,3), (2,4), (2,5), (3,5), (4,5)}.
 Now that you’ve learned the definition of graphs in data structures, you will learn about their various types. There are different types of graphs in data structures, each of which is detailed below.

1. Finite Graph: The graph G = (V, E) is called a finite graph if the number of vertices and edges in the graph is limited in number.
2. Infinite Graph: The graph G = (V, E) is called an infinite graph if the number of vertices and edges in the graph is interminable.
3. Trivial Graph: A graph G = (V, E) is trivial if it contains only a single vertex and no edges.
4. Simple Graph: If each pair of nodes or vertices in a graph G = (V, E) has at most one edge between them, it is a simple graph. As a result, there is just one edge linking two vertices, depicting one-to-one interactions between two elements.
5. Multi Graph: The graph is referred to as a multigraph if there are multiple edges between a pair of vertices in a graph G = (V, E). There are no self-loops in a multigraph.
6. Null Graph: It is a reworked version of a trivial graph. If there are several vertices but no edges connecting them, a graph G = (V, E) is a null graph.
7. Complete Graph: A simple graph G = (V, E) is complete if every vertex is connected to every other vertex by an edge. It is also known as a full graph, because each vertex's degree must be n-1.
8. Pseudo Graph: If a graph G = (V, E) contains a self-loop besides other edges, it is a pseudograph.
9. Regular Graph: If a graph G = (V, E) is a simple graph with the same degree at each vertex, it is a regular graph. As a result, every complete graph is a regular graph.
10. Weighted Graph: A graph G = (V, E) is called a labeled or weighted graph when each edge has a value or weight representing the cost of traversing that edge.
11. Directed Graph: A directed graph, also referred to as a digraph, is a set of nodes connected by edges, each with a direction.
12. Undirected Graph: An undirected graph comprises a set of nodes and links connecting them. The order of the two connected vertices is irrelevant and has no direction. You can form an undirected graph with a finite number of vertices and edges.
13. Connected Graph: If there is a path between one vertex of a graph data structure and any other vertex, the graph is connected.
14. Disconnected Graph: When there is no edge linking some of the vertices to the rest of the graph, the graph is disconnected.
15. Cyclic Graph: If a graph contains at least one graph cycle, it is considered to be cyclic.
16. Acyclic Graph: When there are no cycles in a graph, it is called an acyclic graph.
17. Directed Acyclic Graph: Also known as a DAG, it is a graph with directed edges but no cycle. It represents the edges using an ordered pair of vertices, since it directs the vertices, and stores some data.
18. Subgraph: The vertices and edges of a graph that are subsets of another graph are known as a subgraph.
Representation of Graphs in Data Structures

Graphs in data structures are used to represent the relationships between objects. Every graph consists of a
set of points known as vertices or nodes connected by lines known as edges. The vertices in a network represent entities.

The most frequent graph representations are the two that follow:
 Adjacency matrix
 Adjacency list

You’ll look at these two representations of graphs in data structures in more detail:

Adjacency Matrix: An adjacency matrix is a sequential representation. It is used to show which nodes are adjacent
to one another, i.e., whether there is any connection between nodes in a graph. You create an M × M matrix G for
this representation. If an edge exists between vertex a and vertex b, the corresponding element of G, g(i,j) = 1,
otherwise g(i,j) = 0. If there is a weighted graph, you can record the edge’s weight instead of 1s and 0s.

[Figures: undirected graph representation, directed graph representation, and weighted undirected graph
representation -- the weight or cost is indicated at the graph’s edge, and a weighted graph represents these
values in the matrix]

Adjacency List: An adjacency list is a linked representation. In this representation, you keep a list of neighbors
for each vertex in the graph. It means that each vertex in the graph has a list of its neighboring vertices. You
have an array of vertices indexed by the vertex number, and the corresponding array member for each vertex x
points to a singly linked list of x’s neighbors.

[Figures: weighted undirected graph representation using a linked list, and using an array]

Graph Traversal: In simple words, traversal means the process of visiting every node in the graph.
 Graph traversal is a technique used for searching a vertex in a graph. The graph traversal also decides the
order in which vertices are visited in the search process.
 A graph traversal finds the edges to be used in the search process without creating loops. That means using
graph traversal we visit all the vertices of the graph without getting into a looping path.
 There are two standard methods of graph traversal: Breadth-First Search and Depth-First Search.
 Breadth First Search: Breadth-First Search (BFS) is a graph traversal algorithm where we start from a
selected (source) node and traverse the graph level by level, exploring the neighbor nodes at each level.
 Depth First Search: Depth First Search (DFS) is a graph traversal algorithm where we start from a
selected (source) node and go into the depth of this node by recursively calling the DFS function until no
children are encountered. When a dead end is reached, this algorithm backtracks and starts visiting the
other children of the current node.
BFS (Breadth First Search): BFS traversal of a graph produces a spanning
tree as the final result. A Spanning Tree is a graph without loops. We use a
Queue data structure with a maximum size of the total number of vertices
in the graph to implement BFS traversal.

We use the following steps to implement BFS traversal…

Step 1 - Define a Queue whose size is the total number of vertices in the
graph.
Step 2 - Select any vertex as a starting point for traversal. Visit that vertex
and insert it into the Queue.
Step 3 - Visit all the non-visited adjacent vertices of the vertex at the front
of the Queue and insert them into the Queue.
Step 4 - When there is no new vertex to be visited from the vertex at the
front of the Queue, then delete that vertex.
Step 5 - Repeat steps 3 and 4 until the queue becomes empty.
Step 6 - When the queue becomes empty, then produce the final
spanning tree by removing unused edges from the graph
DFS (Depth First Search): DFS traversal of a graph produces a spanning
tree as the final result.

A Spanning Tree is a graph without loops. We use a Stack data
structure with a maximum size of the total number of vertices in the graph to
implement DFS traversal.

We use the following steps to implement DFS traversal…


Step 1 - Define a Stack of size total number of vertices in the graph.
Step 2 - Select any vertex as a starting point for traversal. Visit that vertex
and push it onto the Stack.
Step 3 - Visit any one of the non-visited adjacent vertices of a vertex that is
at the top of the stack and push it onto the stack.
Step 4 - Repeat step 3 until there is no new vertex to be visited from the
vertex which is at the top of the stack.
Step 5 - When there is no new vertex to visit then, use backtracking and
pop one vertex from the stack.
Step 6 - Repeat steps 3, 4, and 5 until the stack becomes Empty.
Step 7 - When the stack becomes Empty, then produce the final spanning
tree by removing unused edges from the graph.
Hashing in the data structure is a technique of mapping a large chunk of data into small
tables using a hashing function. It is also known as the message digest function. It is a
technique that uniquely identifies a specific item from a collection of similar items.

 It uses hash tables to store the data in an array format. Each value in the array has been
assigned a unique index number. Hash tables use a technique to generate these unique
index numbers for each value stored in an array format. This technique is called the hash
technique.
 You only need to find the index of the desired item, rather than finding the data. With
indexing, you can quickly scan the entire list and retrieve the item you wish. Indexing also
helps in inserting operations when you need to insert data at a specific location. No matter
how big or small the table is, you can update and retrieve data within seconds.
 The hash table is basically an array of elements, and the hash technique of searching is
performed on a part of the item, i.e. the key. Each key is mapped to a number in the range
0 to table size - 1.
 Hashing in a data structure is a two-step process:
• The hash function converts the item into a small integer or hash value. This integer is
used as an index to store the original data.
• It stores the data in a hash table. You can use a hash key to locate data quickly.
Need for Hash data structure: Every day, the data on the internet is increasing multifold and it is
always a struggle to store this data efficiently. In day-to-day programming, this amount of data might
not be that big, but still, it needs to be stored, accessed, and processed easily and efficiently. A very
common data structure that is used for such a purpose is the Array data structure.

• Now the question arises if Array was already there, what was the need for a new data structure! The
answer to this is in the word “efficiency“. Though storing in Array takes O(1) time, searching in it
takes at least O(log n) time. This time appears to be small, but for a large data set, it can cause a lot
of problems and this, in turn, makes the Array data structure inefficient.

• So now we are looking for a data structure that can store the data and search in it in constant time,
i.e. in O(1) time. This is how Hashing data structure came into play. With the introduction of the
Hash data structure, it is now possible to easily store data in constant time and retrieve them in
constant time as well.

The following are real-life examples of hashing in the data structure –

 In schools, the teacher assigns a unique roll number to each student. Later, the teacher uses that
roll number to retrieve information about that student.
 A library has an infinite number of books. The librarian assigns a unique number to each book. This
unique number helps in identifying the position of the books on the bookshelf.
Components of Hashing: There are majorly three
components of hashing:
1. Key: A key can be any string or integer that is fed as input to the hash function, the technique
that determines an index or location for storage of
an item in a data structure.
2.Hash Function: The hash function receives the
input key and returns the index of an element in an
array called a hash table. The index is known as
the hash index.
3.Hash Table: Hash table is a data structure that
maps keys to values using a special function called a
hash function. Hash stores the data in an associative
manner in an array where each data value has its
own unique index.
How does Hashing work?
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a table. Our main objective here is to
search or update the values stored in the table quickly in O(1) time and we are not concerned about the ordering of strings in
the table. So the given set of strings can act as a key and the string itself will act as the value of the string but how to store the
value corresponding to the key?

 Step 1: We know that hash functions (which is some mathematical formula) are used to calculate the hash value which
acts as the index of the data structure where the value will be stored.
 Step 2: So, let’s assign “a” = 1, “b”=2, .. etc, to all alphabetical characters.
 Step 3: Therefore, the numerical value by summation of all characters of the string: “ab” = 1 + 2 = 3, “cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
 Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function that is used here is the sum of
the characters in key mod Table size. We can compute the location of the string in the array by taking the sum(string) mod
7.
 Step 5: So we will then store “ab” in 3 mod 7 = 3, “cd” in 7 mod 7 = 0, and “efg” in 18 mod 7 = 4.

 The above technique enables us to calculate the location of a given string by using a simple hash function and rapidly find
the value that is stored in that location. Therefore the idea of hashing seems like a great way to store (key, value) pairs of
the data in a table.
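A small Python sketch of exactly this scheme, with letter values a = 1, b = 2, ... summed and reduced mod the table size of 7 (the function name hash_index is illustrative):

def hash_index(key, table_size=7):
    """Sum the letter values (a=1, b=2, ...) and take the result mod the table size."""
    total = sum(ord(ch) - ord('a') + 1 for ch in key)
    return total % table_size

for key in ["ab", "cd", "efg"]:
    print(key, "->", hash_index(key))   # ab -> 3, cd -> 0, efg -> 4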
How does Hashing in Data Structure Works?

• In hashing, the hashing function maps strings or numbers to a


small integer value. Hash tables retrieve the item from the list
using a hashing function. The objective of hashing technique is
to distribute the data evenly across an array. Hashing assigns all
the elements a unique key. The hash table uses this key to
access the data in the list.

• Hash table stores the data in a key-value pair. The key acts as
an input to the hashing function. Hashing function then
generates a unique index number for each value stored. The
index number keeps the value that corresponds to that key. The
hash function returns a small integer value as an output. The
output of the hashing function is called the hash value.

• Let us understand hashing in a data structure with an


example. Imagine you need to store some items (arranged in a
key-value pair) inside a hash table with 30 cells. The values are:
(3,21) (1,72) (40,36) (5,30) (11,44) (15,33) (18,12) (16,80)
(38,99)
• The hash table will look like the following (figure omitted).
The process of taking data of any size and converting it into a smaller value, called a hash
value, that can be used as an index into a hash table is what defines hashing in a data structure.
What is a Hash function? The hash function creates a mapping between key and value, this
is done through the use of mathematical formulas known as hash functions. The result of the
hash function is referred to as a hash value or hash. The hash value is a representation of the
original string of characters but usually smaller than the original.

For example: Consider an array as a map where the key is the index and the value is the
value at that index. So for an array A, if we have an index i, which is treated as the key, then
we can find the value by simply looking up A[i].

Types of Hash functions: There are many hash functions that use numeric or alphanumeric
keys.

 Division Method.
 Mid Square Method.
 Folding Method.
 Multiplication Method
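Hedged Python sketches of three of these methods follow; the exact choices (which middle digits are taken in the mid-square method, the fold width in the folding method) are illustrative, since the notes do not fix them:

def division_method(key, table_size):
    """Division (modulo) method: h(k) = k mod table_size."""
    return key % table_size

def mid_square_method(key, table_size):
    """Mid-square method: square the key and take its middle digits as the index."""
    squared = str(key * key)
    mid = len(squared) // 2
    middle_digits = int(squared[max(0, mid - 1):mid + 1])  # two middle digits (a chosen width)
    return middle_digits % table_size

def folding_method(key, table_size, fold=2):
    """Folding method: split the key's digits into parts and add the parts together."""
    digits = str(key)
    parts = [int(digits[i:i + fold]) for i in range(0, len(digits), fold)]
    return sum(parts) % table_size

print(division_method(12345, 11))    # 12345 % 11 = 3
print(mid_square_method(12345, 11))  # middle digits of 12345^2, mod 11
print(folding_method(12345, 11))     # (12 + 34 + 5) % 11 = 51 % 11 = 7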
Properties of a Good hash function: A hash function that maps every item into its own unique slot is known
as a perfect hash function. We can construct a perfect hash function if we know the items and the collection
will never change but the problem is that there is no systematic way to construct a perfect hash function given
an arbitrary collection of items.

 Fortunately, we will still gain performance efficiency even if the hash function isn’t perfect. We can achieve
a perfect hash function by increasing the size of the hash table so that every possible value can be
accommodated. As a result, each item will have a unique slot. Although this approach is feasible for a small
number of items, it is not practical when the number of possibilities is large.

 So, We can construct our hash function to do the same but the things that we must be careful about while
constructing our own hash function. A good hash function should have the following properties:

1. Efficiently computable.
2. Should uniformly distribute the keys (each table position is equally likely for each key).
3. Should minimize collisions.
4. Should have a low load factor (number of items in the table divided by the size of the table).

Complexity of calculating a hash value using the hash function:
• Time complexity: O(n)
• Space complexity: O(1)
Problem with Hashing: If we consider the
above example, the hash function we used is the
sum of the letters; but if we examine the hash
function closely, the problem is easily
visualized: the same hash value is generated
for different strings.

For example: {“ab”, “ba”} both have the same


hash value, and string {“cd”,”be”} also generate
the same hash value, etc. This is known
as collision and it creates problem in searching,
insertion, deletion, and updating of value.

What is collision?

The hashing process generates a small number
for a big key, so there is a possibility that two
keys could produce the same value. The situation
where a newly inserted key maps to an already
occupied slot in the hash table is called a collision,
and it must be handled using some
collision handling technique.
How to handle Collisions?

There are mainly two methods to handle collision:

1.Open Hashing (Separate Chaining) 2. Closed Hashing (Open Addressing)


1) Separate Chaining: The idea is to make each cell of the hash table point to a linked list of records
that have the same hash function value. Chaining is simple but requires additional memory outside the table.

Example: We are given a hash function and we have to insert some elements in the hash table using the
separate chaining method as the collision resolution technique.

Hash function = key % 5,

Elements = 12, 15, 22, 25 and 37.

Let’s see a step by step approach to how to solve the above problem:

Step 1: First draw the empty hash table, which will have a possible range of hash
values from 0 to 4 according to the hash function provided.
Step 2: Now insert all the keys in the hash table one by one. The first key to be
inserted is 12, which is mapped to bucket number 2, calculated using the
hash function: 12 % 5 = 2.
Step 3: The next key is 22. It also maps to bucket number 2 because 22 % 5 = 2. But
bucket 2 is already occupied by key 12, so 22 is chained after 12.
Step 4: The next key is 15. It maps to slot number 0 because 15 % 5 = 0.
Step 5: The next key is 25. Its bucket number is 25 % 5 = 0. But bucket 0 is
already occupied by key 15, so the separate chaining method again handles the
collision by appending 25 to the linked list of bucket 0.
(Finally, key 37 maps to bucket 2 because 37 % 5 = 2 and is chained after 12 and 22.)

Hence, in this way, the separate chaining method is used as the collision resolution technique.
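A minimal Python sketch of separate chaining with the same hash function (key % 5); ordinary Python lists stand in for the linked lists:

class ChainedHashTable:
    """Each slot holds a list (chain) of the keys that hash to the same index."""
    def __init__(self, size=5):
        self.size = size
        self.table = [[] for _ in range(size)]

    def insert(self, key):
        index = key % self.size         # hash function = key % 5
        self.table[index].append(key)   # colliding keys are chained in the same slot

    def search(self, key):
        return key in self.table[key % self.size]

ht = ChainedHashTable()
for key in [12, 15, 22, 25, 37]:
    ht.insert(key)
print(ht.table)   # [[15, 25], [], [12, 22, 37], [], []]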
2) Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table
entry contains either a record or NIL. When searching for an element, we examine the table slots one
by one until the desired element is found or it is clear that the element is not in the table.

2.a) Linear Probing: In linear probing, the hash table is searched sequentially that starts from the
original location of the hash. If in case the location that we get is already occupied, then we check
for the next location.

Algorithm:

1. Calculate the hash key, i.e. key = data % size.
2. Check if hashTable[key] is empty; if so, store the value directly: hashTable[key] = data.
3. If the hash index already has some value, check for the next index using key = (key + 1) % size.
4. If the next index hashTable[key] is available, store the value there; otherwise try the next index.
5. Repeat the above process until a free space is found.
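A minimal Python sketch of this algorithm, where None marks an empty slot (the helper name and table size are illustrative):

def linear_probing_insert(table, key):
    """Insert key at key % size, stepping to the next slot until a free one is found."""
    size = len(table)
    index = key % size                  # step 1: calculate the hash key
    for _ in range(size):               # at most `size` probes before the table is full
        if table[index] is None:        # step 2: store the value if the slot is empty
            table[index] = key
            return index
        index = (index + 1) % size      # steps 3/4: try the next index
    raise RuntimeError("hash table is full")

table = [None] * 5
for key in [50, 70, 76, 85, 93]:
    linear_probing_insert(table, key)
print(table)   # [50, 70, 76, 85, 93]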
Example: Let us consider a simple hash function "key mod 5" and a sequence of keys to be inserted:
50, 70, 76, 85, 93.

Step 1: First draw the empty hash table, which will have a possible range of hash values
from 0 to 4 according to the hash function provided.
Step 2: Now insert all the keys in the hash table one by one. The first key is 50. It maps to
slot number 0 because 50 % 5 = 0, so insert it into slot 0.
Step 3: The next key is 70. It maps to slot number 0 because 70 % 5 = 0, but 50 is already
at slot 0, so search for the next empty slot and insert it there (slot 1).
Step 4: The next key is 76. It maps to slot number 1 because 76 % 5 = 1, but 70 is already
at slot 1, so search for the next empty slot and insert it there (slot 2).
Step 5: The next key is 85. It maps to slot number 0 because 85 % 5 = 0; slots 0, 1 and 2 are
occupied, so it is placed in the next free slot, slot 3.
Step 6: The next key is 93. It maps to slot number 3 because 93 % 5 = 3; slot 3 is now
occupied, so it is placed in slot 4.
2.b) Quadratic Probing: Quadratic probing is an open addressing scheme in computer programming for
resolving hash collisions in hash tables. Quadratic probing operates by taking the original hash index and
adding successive values of an arbitrary quadratic polynomial until an open slot is found. An example probe
sequence using quadratic probing is:

H, H + 1², H + 2², H + 3², ...

In this method we look for the i²-th slot in the i-th iteration, with i = 0, 1, ..., n – 1. We always start
from the original hash location; if only that location is occupied, then we check the other slots.

Let hash(x) be the slot index computed using the hash function and n be the size of the hash table.

Example: Let us consider table size = 7, hash function Hash(x) = x % 7 and collision resolution
strategy f(i) = i². Insert = 22, 30, and 50.

Step 1: Create a table of size 7.

Step 2: Insert 22 and 30.
• Hash(22) = 22 % 7 = 1. Since the cell at index 1 is empty, we can easily insert 22 at slot 1.
• Hash(30) = 30 % 7 = 2. Since the cell at index 2 is empty, we can easily insert 30 at slot 2.

Step 3: Inserting 50.
• Hash(50) = 50 % 7 = 1.
• In our hash table slot 1 is already occupied. So, we will search for slot 1 + 1², i.e. 1 + 1 = 2.
• Again slot 2 is found occupied, so we will search for cell 1 + 2², i.e. 1 + 4 = 5.
• Now, cell 5 is not occupied, so we will place 50 in slot 5.
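A minimal Python sketch of quadratic probing with the example's parameters (table size 7, Hash(x) = x % 7, f(i) = i²):

def quadratic_probing_insert(table, key):
    """Probe slots (hash + i*i) % size for i = 0, 1, 2, ... until one is free."""
    size = len(table)
    home = key % size                   # original hash location
    for i in range(size):
        index = (home + i * i) % size   # the i-th probe looks i^2 slots ahead
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found")

table = [None] * 7
for key in [22, 30, 50]:
    print(key, "->", quadratic_probing_insert(table, key))
# 22 -> 1, 30 -> 2, 50 -> 5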
2.c) Double Hashing: Double hashing is a collision-resolving technique in open-addressed hash tables.
Double hashing makes use of two hash functions:

• The first hash function is h1(k), which takes the key and gives out a location on the hash table.
If that location is empty, we can easily place our key there.
• But in case the location is occupied (a collision), we use a secondary hash function h2(k) in
combination with the first hash function h1(k) to find a new location on the hash table.
• This combination of hash functions is of the form

h(k, i) = (h1(k) + i * h2(k)) % n,

where i is a non-negative integer that indicates the collision number,
k is the element/key which is being hashed, and
n is the hash table size.

Complexity of the double hashing algorithm:
Time complexity: O(n)

Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash function
is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5).

Step 1: Insert 27.
• 27 % 7 = 6; location 6 is empty, so insert 27 into slot 6.

Step 2: Insert 43.
• 43 % 7 = 1; location 1 is empty, so insert 43 into slot 1.

Step 3: Insert 692.
• 692 % 7 = 6, but location 6 is already occupied, and this is a collision.
• So we need to resolve this collision using double hashing:

hnew = [h1(692) + i * h2(692)] % 7
     = [6 + 1 * (1 + 692 % 5)] % 7
     = 9 % 7
     = 2

Now, as 2 is an empty slot, we can insert 692 into slot 2.

Step 4: Insert 72.
• 72 % 7 = 2, but location 2 is already occupied, and this is a collision.
• So we need to resolve this collision using double hashing:

hnew = [h1(72) + i * h2(72)] % 7
     = [2 + 1 * (1 + 72 % 5)] % 7
     = 5 % 7 = 5

Now, as 5 is an empty slot, we can insert 72 into slot 5.
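A minimal Python sketch of this scheme with the example's hash functions h1(k) = k mod 7 and h2(k) = 1 + (k mod 5):

def double_hashing_insert(table, key):
    """Probe h(k, i) = (h1(k) + i * h2(k)) % n for i = 0, 1, 2, ... until a slot is free."""
    n = len(table)
    h1 = key % 7                        # first hash function
    h2 = 1 + (key % 5)                  # second hash function, used after a collision
    for i in range(n):
        index = (h1 + i * h2) % n
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found")

table = [None] * 7
for key in [27, 43, 692, 72]:
    print(key, "->", double_hashing_insert(table, key))
# 27 -> 6, 43 -> 1, 692 -> 2, 72 -> 5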


UNIT-V
STRING ALGORITHMS AND HASHING

String Matching Algorithms have greatly influenced computer science and play an essential role in various
real-world problems. They help in performing time-efficient tasks in multiple domains. These algorithms are
useful when searching for a string within another string. String matching is used in database schemas and in
networking systems.

String matching algorithms can be broadly classified into two types:


1.Exact String-Matching Algorithms
2.Approximate String-Matching Algorithms

EXACT STRING-MATCHING ALGORITHMS:


Exact string-matching algorithms is to find one, several, or all occurrences of a defined string (pattern) in
a large string (text or sequences) such that each matching is perfect
a. Algorithms based on character composition:
1. Naive Algorithm: It slides the pattern over text one by one and check for a match. If a match is found,
then slides by 1 and checks for subsequent matches.
2. KMP (Knuth Morris Pratt) Algorithm: The idea is that whenever a mismatch is detected, we already
know some of the characters in the text of the next window. So, we take advantage of this information to
avoid re-matching the characters we already know will match.
3. Boyer Moore Algorithm: This algorithm uses the best heuristics of the Naive and KMP algorithms and starts
matching from the last character of the pattern.
4. Using the Trie data structure: A trie is an efficient information retrieval data structure; it stores
keys character by character along the paths of a tree.
b. Deterministic Finite Automaton (DFA) method:
1. Automaton Matcher Algorithm: It starts from the first state of the automaton and the first character of
the text. At every step, it considers the next character of the text, looks for the next state in the built finite
automaton and moves to the new state.
c. Algorithms based on a bit (parallelism method):
1. Aho-Corasick Algorithm: It finds all words in O(n + m + z) time, where n is the length of the text, m is
the total number of characters in all words, and z is the total number of occurrences of the words in the
text. The algorithm forms the basis of the original Unix command fgrep.
d. Hashing –string matching Algorithm:
1. Rabin Karp Algorithm: It matches the hash value of the pattern with the hash value of current
substring of text, and if the hash values match then only it starts matching individual characters.

APPROXIMATE STRING-MATCHING ALGORITHMS:


Approximate String-Matching Algorithms (also known as Fuzzy String Searching) searches for substrings
of the input string. More specifically, the approximate string-matching approach is stated as follows:
Suppose that we are given two strings, text T [1....n] and pattern P [1....m]. The task is to find all the
occurrences of patterns in the text whose edit distance to the pattern is at most k. Some well-known edit
distances are- Levenshtein edit distance and Hamming edit distance.

Those techniques are used when the quality of text is low, there are spelling errors in the pattern or text,
finding DNA subsequences after mutation, heterogenous databases, etc. Some of them are:
1. Naive Approach: It slides the pattern over the text one by one and checks for approximate matches.
2. Sellers Algorithm (Dynamic Programming)
3. Shift-Or Algorithm (Bitmap Algorithm)

BRUTE FORCE ALGORITHM:


Brute Force Algorithms are exactly what they sound like – straightforward methods of solving a problem
that rely on sheer computing power and trying every possibility rather than advanced techniques to
improve efficiency.
For example, imagine you have a small padlock with 4 digits, each from 0-9. You forgot your combination,
but you don't want to buy another padlock. Since you can't remember any of the digits, you have to use a
brute force method to open the lock.
So you set all the numbers back to 0 and try them one by one: 0001, 0002, 0003, and so on until it opens.
In the worst-case scenario, it would take 10^4, or 10,000 tries to find your combination.
A classic example in computer science is the traveling salesman problem (TSP). Suppose a salesman needs
to visit 10 cities across the country. How does one determine the order in which those cities should be
visited such that the total distance travelled is minimized?
The brute force solution is simply to calculate the total distance for every possible route and then select
the shortest one. This is not particularly efficient because it is possible to eliminate many possible routes
through clever algorithms.
The time complexity of brute force string matching is O(mn), which is sometimes written as O(n*m). So, if we were to
search for a string of "n" characters in a string of "m" characters using brute force, it would take us n * m
tries in the worst case.

CODE:
def brute_force_string_matching(pattern, text):
    m, n = len(pattern), len(text)
    for i in range(n - m + 1):          # try every alignment of the pattern
        if text[i:i + m] == pattern:
            return i  # return the starting index if the pattern is found
    return -1  # return -1 if the pattern is not found

# Example usage:
pattern = "abc"
text = "abracadabra"

result = brute_force_string_matching(pattern, text)

if result != -1:
    print(f"Pattern found at index {result}.")
else:
    print("Pattern not found.")

RESULT:
Pattern found at index 0.

CODE:
import itertools

def brute_force_password_cracking(target_password,
        characters="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"):
    password_length = len(target_password)
    # try every combination of the allowed characters of the right length
    for combination in itertools.product(characters, repeat=password_length):
        attempt = ''.join(combination)
        if attempt == target_password:
            return attempt  # return the cracked password
    return None  # return None if the password is not cracked

# Example usage:
target_password = "secret123"
cracked_password = brute_force_password_cracking(target_password)
if cracked_password:
    print(f"Password cracked: {cracked_password}")
else:
    print("Password not cracked.")

RESULT:
Password not cracked.

In real world scenarios, brute force algorithms might not be practical for large datasets or more complex
problems due to their inefficiency.

RABIN KARP ALGORITHM:


Rabin-Karp algorithm is an algorithm used for searching/matching patterns in the text using a hash
function. Unlike Naive string-matching algorithm, it does not travel through every character in the initial
phase rather it filters the characters that do not match and then performs the comparison.
A hash function is a tool to map a larger input value to a smaller output value. This output value is called
the hash value.

How Does the Rabin-Karp Algorithm Work?


A window of characters is taken and checked for the possibility of containing the required string. If that
possibility exists, character matching is performed.

Let us understand the algorithm with the following steps:

1. Let the text be:

ABCCDDAEFG

And the string (pattern) to be searched in the above text be:

CDD

Let us assign a numerical value (v)/weight to each character we will be using in the problem. Here, we
have taken the first ten letters of the alphabet only, i.e. A = 1, B = 2, ..., J = 10.

Let n be the length of the pattern and m be the length of the text. Here, m = 10 and n = 3.
Let d be the number of characters in the input set. Here, we have taken the input set {A, B, C, ...,
J}, so d = 10. You can assume any suitable value for d.
Let us calculate the hash value of the pattern:

hash value for pattern (p) = Σ(v * d^(n-1)) mod 13

                           = ((3 * 10^2) + (4 * 10^1) + (4 * 10^0)) mod 13

                           = 344 mod 13

                           = 6

In the calculation above, choose a prime number (here, 13) in such a way that we can perform all the
calculations with single-precision arithmetic.
The reason for taking the modulus is given below.

Calculate the hash value for each text window of size n.

For the first window ABC,

hash value for text (t) = Σ(v * d^(n-1)) mod 13
                        = ((1 * 10^2) + (2 * 10^1) + (3 * 10^0)) mod 13
                        = 123 mod 13
                        = 6

Compare the hash value of the pattern with the hash value of the text window. If they match, character
matching is performed.
In the example above, the hash value of the first window (t = 6) matches p, so we compare the characters
of ABC and CDD. Since they do not match, we move to the next window.
We calculate the hash value of the next window, BCC, by subtracting the contribution of the dropped
character and adding that of the incoming one:

t = ((123 - 1 * 10^2) * 10 + 3 * 10^0) mod 13

  = 233 mod 13
  = 12

In order to optimize this process, we make use of the previous hash value in the following way:

t = (d * (t - v[character to be removed] * h) + v[character to be added]) mod 13

  = (10 * (6 - 1 * 9) + 3) mod 13
  = 12

where h = d^(n-1) mod 13 = 10^2 mod 13 = 9. Keeping h reduced modulo 13 is what lets every intermediate
value stay within single-precision arithmetic, and it is why 9 (not 100) appears in the calculation.

For BCC, t = 12 (≠ 6), so we move on to the next window. The same rolling computation can be checked directly, as in the sketch below.
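
A small sketch reproducing the numbers in the walkthrough above, using the assumed letter values
A = 1, B = 2, ..., J = 10 and q = 13 (variable names are ours):

d, q, n = 10, 13, 3
v = {ch: i + 1 for i, ch in enumerate("ABCDEFGHIJ")}  # A=1 ... J=10
text = "ABCCDDAEFG"

h = pow(d, n - 1, q)  # h = d^(n-1) mod q = 100 mod 13 = 9

# Hash of the first window "ABC"
t = sum(v[c] * d ** (n - 1 - i) for i, c in enumerate(text[:n])) % q
print(t)  # 6

# Roll from "ABC" to "BCC": drop 'A', bring in the next character
t = (d * (t - v[text[0]] * h) + v[text[n]]) % q
print(t)  # 12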


After a few more shifts, we get a hash match for the window CDD in the text, and character matching confirms the occurrence.

CODE:

# Rabin-Karp algorithm in Python
# Note: this implementation hashes ord() character codes rather than the
# A=1..J=10 weights used in the walkthrough, so the intermediate hash
# values differ, but the matching logic is the same.

d = 10  # number of characters in the input alphabet

def search(pattern, text, q):
    m = len(pattern)
    n = len(text)
    p = 0  # hash value of the pattern
    t = 0  # hash value of the current text window
    h = 1

    # h = d^(m-1) mod q, the weight of the leading character
    for i in range(m - 1):
        h = (h * d) % q

    # Calculate hash value for pattern and first window of text
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q

    # Find the match
    for i in range(n - m + 1):
        if p == t:
            # Hash values match: verify character by character
            for j in range(m):
                if text[i + j] != pattern[j]:
                    break
            else:
                print("Pattern is found at position: " + str(i + 1))

        # Roll the hash: drop the leading character, add the next one
        if i < n - m:
            t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q
            if t < 0:
                t = t + q

text = "ABCCDDAEFG"
pattern = "CDD"
q = 13

search(pattern, text, q)

RESULT:

Pattern is found at position: 4

Limitations of Rabin-Karp Algorithm


Spurious Hit
When the hash value of the pattern matches the hash value of a window of the text but the window is not
the actual pattern, this is called a spurious hit.
Spurious hits increase the running time of the algorithm. To minimize spurious hits, we take hash values
modulo a suitably large prime number, which greatly reduces how often they occur.
Rabin-Karp Algorithm Complexity
The average-case and best-case complexity of the Rabin-Karp algorithm is O(m + n), and the worst-case
complexity is O(mn).

The worst-case complexity occurs when spurious hits happen for all the windows.

Rabin-Karp Algorithm Applications


For pattern matching
For searching a string in a larger text

HASH TABLE ADT:


• A hash table is a data structure that efficiently implements the dictionary abstract data type
with fast insert, find and remove operations.
In previous sections we were able to make improvements in our search algorithms by taking advantage of
information about where items are stored in the collection with respect to one another. For example, by
knowing that a list was ordered, we could search in logarithmic time using a binary search. In this section
we will attempt to go one step further by building a data structure that can be searched in O(1) time. This
concept is referred to as hashing.

In order to do this, we will need to know even more about where the items might be when we go to look
for them in the collection. If every item is where it should be, then the search can use a single comparison
to discover the presence of an item. We will see, however, that this is typically not the case.
A hash table is a collection of items which are stored in such a way as to make it easy to find them later.
Each position of the hash table, often called a slot, can hold an item and is named by an integer value
starting at 0. For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on. Initially,
the hash table contains no items so every slot is empty. We can implement a hash table by using a list with
each element initialized to the special Python value None. The figure shows a hash table of size m=11. In
other words, there are m slots in the table, named 0 through 10.
The mapping between an item and the slot where that item belongs in the hash table is called the hash
function. The hash function will take any item in the collection and return an integer in the range of slot
names, between 0 and m-1. Assume that we have the set of integer items 54, 26, 93, 17, 77, and 31. Our
first hash function, sometimes referred to as the “remainder method,” simply takes an item and divides it
by the table size, returning the remainder as its hash value (h(item) = item % 11). Table 4 gives all of the hash values for
our example items. Note that this remainder method (modulo arithmetic) will typically be present in some
form in all hash functions, since the result must be in the range of slot names.
Table 4: Simple Hash Function Using Remainders

Item    Hash Value (item % 11)
54      10
26      4
93      5
17      6
77      0
31      9

Once the hash values have been computed, we can insert each item into the hash table at the designated
position as shown in Figure 5. Note that 6 of the 11 slots are now occupied. This is referred to as the load
factor and is commonly denoted by λ = (number of items) / (table size). For this example, λ = 6/11.

Now when we want to search for an item, we simply use the hash function to compute the slot name for
the item and then check the hash table to see if it is present. This searching operation is O(1), since a
constant amount of time is required to compute the hash value and then index the hash table at that
location. If everything is where it should be, we have found a constant time search algorithm.
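
As a concrete illustration, here is a minimal sketch of such a table in Python, using the remainder method
on the example items. The names (slots, hash_item, contains) are ours, and collisions are deliberately
ignored for now, which is exactly the limitation discussed next:

table_size = 11
slots = [None] * table_size

def hash_item(item):
    return item % table_size  # the "remainder method"

for item in [54, 26, 93, 17, 77, 31]:
    slots[hash_item(item)] = item

print(slots)
# [77, None, None, None, 26, 93, 17, None, None, 31, 54]

def contains(item):
    # O(1): one hash computation plus one slot lookup
    return slots[hash_item(item)] == item

print(contains(93), contains(44))  # True False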
You can probably already see that this technique is going to work only if each item maps to a unique
location in the hash table. For example, if the item 44 had been the next item in our collection, it would
have a hash value of 0 (44%11==0). Since 77 also had a hash value of 0, we would have a problem.
According to the hash function, two or more items would need to be in the same slot. This is referred to
as a collision (it may also be called a “clash”). Clearly, collisions create a problem for the hashing
technique. We will discuss them in detail later.
HASH FUNCTIONS:
Given a collection of items, a hash function that maps each item into a unique slot is referred to as a
perfect hash function.

Our goal is to create a hash function that minimizes the number of collisions, is easy to compute, and
evenly distributes the items in the hash table. There are a number of common ways to extend the simple
remainder method. We will consider a few of them here.

The folding method for constructing hash functions begins by dividing the item into equal-size pieces
(the last piece may not be of equal size). These pieces are then added together to give the resulting hash
value. For example, if our item was the phone number 436-555-4601, we would take the digits and divide
them into groups of 2 (43,65,55,46,01). After the addition, 43+65+55+46+01, we get 210. If we assume
our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the
remainder. In this case 210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1. Some folding
methods go one step further and reverse every other piece before the addition. For the above example, we
get 43+56+55+64+01=219 which gives 219 % 11=10.
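
A hedged sketch of the basic (non-reversing) folding method for the phone-number example; the function
name and the piece size parameter are our own choices:

def fold_hash(digits, table_size=11, piece=2):
    # Split the digit string into fixed-size pieces, add them up,
    # then apply the remainder step
    pieces = [int(digits[i:i + piece]) for i in range(0, len(digits), piece)]
    return sum(pieces) % table_size

print(fold_hash("4365554601"))  # (43+65+55+46+1) % 11 = 210 % 11 = 1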

Another numerical technique for constructing a hash function is called the mid-square method. We first
square the item, and then extract some portion of the resulting digits. For example, if the item were 44,
we would first compute 44^2 = 1,936. By extracting the middle two digits, 93, and performing the remainder
step, we get 5 (93 % 11). Table 5 shows the example items under both the remainder method and the mid-square
method. You should verify that you understand how these values were computed.

Table 5: Comparison of Remainder and Mid-Square Methods

Item    Remainder    Mid-Square
54      10           3
26      4            7
93      5            9
17      6            8
77      0            4
31      9            6
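
One possible implementation of the mid-square extraction described above: take the middle two digits of
even-length squares and the single middle digit otherwise. That extraction rule is our assumption, chosen
to match the values in Table 5:

def mid_square_hash(item, table_size=11):
    s = str(item * item)
    mid = len(s) // 2
    # middle two digits for even-length squares, single middle digit otherwise
    middle = s[mid - 1:mid + 1] if len(s) % 2 == 0 else s[mid]
    return int(middle) % table_size

print(mid_square_hash(44))  # 44^2 = 1936 -> "93" -> 93 % 11 = 5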

We can also create hash functions for character-based items such as strings. The word "cat" can be thought
of as a sequence of ordinal values: ord('c') = 99, ord('a') = 97, ord('t') = 116.
We can then take these three ordinal values, add them up (99 + 97 + 116 = 312), and use the remainder
method to get a hash value (312 % 11 = 4; see Figure 6). Listing 1 shows a function called hash that takes
a string and a table size and returns the hash value in the range from 0 to tablesize-1; a sketch matching
that description follows.
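
A minimal sketch of such a function, consistent with the description of Listing 1 above (the internal
variable names are ours):

def hash(a_string, table_size):
    # Sum the ordinal values of the characters, then use the remainder
    # method to map into 0 .. table_size-1.  (Note: this deliberately
    # mirrors the listing's name and shadows Python's built-in hash.)
    the_sum = 0
    for ch in a_string:
        the_sum = the_sum + ord(ch)
    return the_sum % table_size

print(hash("cat", 11))  # (99 + 97 + 116) % 11 = 312 % 11 = 4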

Figure 6: Hashing a String Using Ordinal Values

It is interesting to note that when using this hash function, anagrams will always be given the same hash
value. To remedy this, we could use the position of the character as a weight. Figure 7 shows one possible
way to use the positional value as a weighting factor. The modification to the hash function is left as an
exercise; one possible weighted variant is sketched after Figure 7.

Figure 7: Hashing a String Using Ordinal Values with Weighting
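
One possible weighted variant, our sketch of a solution to the exercise (one of several reasonable
weighting schemes):

def weighted_hash(a_string, table_size):
    # Weight each ordinal value by its 1-based position so that
    # anagrams such as "cat" and "act" no longer hash identically
    the_sum = 0
    for position, ch in enumerate(a_string, start=1):
        the_sum = the_sum + position * ord(ch)
    return the_sum % table_size

print(weighted_hash("cat", 11), weighted_hash("act", 11))  # 3 5 -- no longer equal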

You may be able to think of several additional ways to compute hash values for items in a collection. The
important thing to remember is that the hash function has to be efficient so that it does not become the
dominant part of the storage and search process. If the hash function is too complex, then it becomes more
work to compute the slot name than it would be to simply do a basic sequential or binary search as
described earlier. This would quickly defeat the purpose of hashing.

HOW TO CHOOSE A HASH FUNCTION:


The basic problems associated with the creation of hash tables are:

• An efficient hash function should be designed so that it distributes the index values of inserted
objects uniformly across the table.
• An efficient collision resolution algorithm should be designed so that it computes an alternative
index for a key whose hash index corresponds to a location previously inserted in the hash table.
• We must choose a hash function which can be calculated quickly, returns values within the range
of locations in our table, and minimizes collisions.

CHARACTERISTICS OF A GOOD HASH FUNCTION:


A good hash function should have the following characteristics:

• Minimize collisions
• Be easy and quick to compute
• Distribute key values evenly in the hash table
• Use all the information provided in the key
• Allow a high load factor for a given set of keys

LOAD FACTOR:
The load factor of a non-empty hash table is the number of items stored in the table divided by the size
of the table. It is the decision parameter used when we want to rehash or expand the existing hash table
entries, and it also helps us determine the efficiency of the hash function: that is, it tells whether
the hash function is distributing the keys uniformly or not.

COLLISIONS:
Ideally, a hash function would map each key to a distinct location, but in practice it is not possible to
create such a hash function, and the resulting problem is called collision. A collision is the condition
in which two records hash to the same location.

COLLISION RESOLUTION TECHNIQUES:


The process of finding an alternate location for a colliding key is called collision resolution. Even
though hash tables have collision problems, in many cases they are more efficient than other data
structures, such as search trees. There are several collision resolution techniques; the most popular are
direct chaining and open addressing:

• Direct chaining: an array of linked lists

  - Separate chaining

• Open addressing: array-based implementation

  - Linear probing (linear search)

  - Quadratic probing (non-linear search)

  - Double hashing (using two hash functions)

Separate Chaining:

Collision resolution by chaining combines a linked representation with the hash table: when two or more
records hash to the same location, they are linked together into a singly linked list called a chain, as
in the sketch below.
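
A minimal sketch of separate chaining, with Python lists standing in for the singly linked chains (all
names are illustrative):

table_size = 11
table = [[] for _ in range(table_size)]

def insert(key):
    chain = table[key % table_size]
    if key not in chain:
        chain.append(key)

def find(key):
    return key in table[key % table_size]

for k in [54, 26, 93, 17, 77, 31, 44]:  # 77 and 44 both hash to slot 0
    insert(k)

print(table[0])   # [77, 44] -- colliding records share one chain
print(find(44))   # True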

Open addressing:

In open addressing, all keys are stored in the hash table itself. This approach is also known as closed
hashing, and it is based on probing: a collision is resolved by probing.
Linear Probing:

The interval between probes is fixed at 1. In linear probing, we search the hash table sequentially,
starting from the original hash location. If a location is occupied, we check the next location, wrapping
around from the last table location to the first if necessary. The function for rehashing is the following:

rehash(key) = (n + 1) % tablesize

where n is the previously tried location.

One of the problems with linear probing is that table items tend to cluster together in the hash table.
This means that the table contains groups of consecutively occupied locations, called clusters.

Clusters can get close to one another and merge into a larger cluster. Thus, one part of the table might
be quite dense even though another part has relatively few items. Clustering causes long probe searches
and therefore decreases the overall efficiency. A small sketch of linear probing follows.

The next location to be probed is determined by the step size, and step sizes greater than one are
possible. The step size should be relatively prime to the table size, i.e. their greatest common divisor
should be equal to 1. If we choose the table size to be a prime number, then any step size is relatively
prime to the table size. Clustering cannot be avoided by larger step sizes.
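
A hedged sketch of linear-probing insert and search (names are illustrative; for brevity there is no
handling for a completely full table):

table_size = 11
table = [None] * table_size

def insert(key):
    i = key % table_size
    while table[i] is not None:   # occupied: try the next slot,
        i = (i + 1) % table_size  # wrapping around if necessary
    table[i] = key

def find(key):
    i = key % table_size
    while table[i] is not None:
        if table[i] == key:
            return True
        i = (i + 1) % table_size
    return False  # reached an empty slot: the key is absent

for k in [54, 26, 93, 17, 77, 31, 44]:
    insert(k)

print(table.index(44), find(44))  # 44 collides at slot 0 and lands in slot 1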

Quadratic Probing:

The interval between probes grows quadratically: the indices probed are described by a quadratic function
of the probe number. The primary clustering problem of linear probing can be eliminated by using the
quadratic probing method.

In quadratic probing, we start from the original hash location i. If a location is occupied, we check the
locations i + 1^2, i + 2^2, i + 3^2, i + 4^2, ..., wrapping around from the last table location to the
first if necessary. The function for rehashing is the following:

rehash(key) = (n + k^2) % tablesize

Even though primary clustering is avoided by quadratic probing, a milder form of clustering remains:
multiple search keys mapped to the same hash key follow the same probe sequence, so the sequence for such
keys is prolonged by repeated collisions along it. Both linear and quadratic probing use a probe sequence
that is independent of the search key, as the short sketch below illustrates.
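
A one-line sketch of the quadratic probe sequence from home slot i = 0 in an 11-slot table
(i + 1^2, i + 2^2, i + 3^2, ... modulo the table size):

i, table_size = 0, 11
print([(i + k * k) % table_size for k in range(1, 6)])  # [1, 4, 9, 5, 3]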

Double Hashing:

The interval between probes is computed by another hash function. Double hashing reduces clustering in a
better way: the increments for the probing sequence are computed by a second hash function. The second
hash function should satisfy:

h2(key) ≠ 0 and h2 ≠ h1

We first probe the location h1(key). If the location is occupied, we probe the locations h1(key) + h2(key),
h1(key) + 2 * h2(key), ..., as in the sketch below.
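
A hedged sketch of a double-hashing probe sequence. The second hash h2(key) = prime - key % prime is a
common textbook choice, our assumption rather than something given in the text, and it is never zero:

table_size, prime = 11, 7

def h1(key):
    return key % table_size

def h2(key):
    return prime - key % prime  # always in 1..prime, never 0

key = 44
print([(h1(key) + k * h2(key)) % table_size for k in range(5)])
# [0, 5, 10, 4, 9] -- the step size 5 depends on the key itself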

Comparison of Collision Resolution Techniques

Comparisons: Linear Probing vs. Double Hashing

The choice between linear probing and double hashing depends on the cost of computing the hash function
and on the load factor (number of elements per slot) of the table. Both use few probes, but double hashing
takes more time because it must compute a second hash function, which is expensive for long keys.

Comparisons: Open Addressing vs. Separate Chaining

The comparison is somewhat complicated because we must account for memory usage. Separate chaining uses
extra memory for links. Open addressing needs extra memory implicitly within the table to terminate the
probe sequence. Open-addressed hash tables also cannot be used if the data does not have unique keys; an
alternative in that case is to use separate-chained hash tables.

Hashing Techniques:

There are two types of hashing techniques: static hashing and dynamic hashing.

Static Hashing

If the data is fixed, then static hashing is useful. In static hashing, the set of keys is kept fixed and
given in advance, and the number of primary pages in the directory is kept fixed.

Dynamic Hashing

If the data is not fixed, static hashing can give bad performance, and dynamic hashing is the alternative:
the set of keys can change dynamically.

PROBLEMS FOR WHICH HASH TABLES ARE NOT SUITABLE:

• Problems for which data ordering is required

• Problems having multidimensional data

• Prefix searching, especially if the keys are long and of variable-lengths

• Problems that have dynamic data

• Problems in which the data does not have unique keys


Code No.: 20CSC06
CHAITANYA BHARATHI INSTITUTE OF TECHNOLOGY (Autonomous)
B.E. & B.Tech III Sem (Main) Examination March 2023
Basics of Data Structures
(Common to Mech, ECE, EEE & Chem)
Time: 3 Hours Max Marks: 60
Note: Answer ALL questions from Part-A & Part –B (Internal Choice) at one place in the
same order
Part - A
(5Q X 3M = 15 Marks)
M CO BT
1 How data structures are classified? (3) 1 1
2 When doubly linked list can be represented as circular linked list? (3) 2 1
3 Write the routine to delete an element from a queue. (3) 3 3
4 List out the steps involved in deleting a node from a binary search tree. (3) 4 1
5 When is a graph said to be weakly connected? (3) 5 1
Part – B
(5Q X 9M = 45 Marks)
M CO BT
6 (a) State the properties of LIST abstract data type with suitable example. (5) 1 1
(b) Define recursion. Explain it with Fibonacci series and factorial of a (4) 1 2
number.
(OR)
7 (a) Differentiate between Linear and Non-linear Data Structure? (5) 1 1
(b) Define ADT and What are the features of ADT? (4) 1 1

8 (a) Explain the steps involved in insertion and deletion into a singly and (5) 2 2
doubly linked list?
(b) What are the benefits and limitations of linked list? (4) 2 2
(OR)
9 (a) How polynomial manipulations are performed with lists? Explain its (5) 2 2
operations?
(b) What are the applications of linked list in dynamic storage management? (4) 2 2

10 (a) Explain double ended queue and its operations? (5) 3 3


(b) State & explain the algorithm to perform Quick sort. Also analyse the (4) 3 3
time complexity of the algorithm.
(OR)
11 (a) Explain how to evaluate arithmetic expressions using stacks? (5) 3 3
(b) Write an algorithm to implement selection sort with suitable example. (4) 3 2

12 (a) Construct an expression tree for the expression (a+b*c) + ((d*e+f)*g). (5) 4 3
Give the outputs when you apply inorder, preorder and postorder
traversals.
(b) Explain steps for conversion of general tree to binary tree, with an (4) 4 1
example.
(OR)
13 (a) Write a recursive algorithm for binary tree traversal with an example. (5) 4 3
(b) List out the steps involved in deleting a node from a binary search tree (4) 4 3
with an example.

14 (a) Define Graph and explain how graphs can be represented in adjacency (5) 5 1
matrix and adjacency list.
(b) Briefly explain various levels in a graph using examples. (4) 5 2
(OR)
15 (a) Explain the minimum spanning tree algorithms with an example. (5) 5 2
(b) Define Indegree and Outdegree of a graph with an example? (4) 5 1

******

