Dsup All Units
Each innovation in data structures was driven by the need to solve a fundamental
problem that the preceding structure solved poorly or could not solve at all.
Reasons to learn DSA: Many people treat DSA as just another subject in
computer science. That is where they get it wrong. DSA is much more than that: it
teaches you to be a better programmer and to think more clearly. It is a skill
that will help you throughout your life, not something to learn merely to pass a
subject. Let us dive deeper into the various reasons why one should learn DSA –
Role of DSA in Solving Real-World Problems
You will be surprised to know that DSA has quite an important role to play even in
solving real-world problems. Real-world problems that take months can be solved in
minutes using the knowledge of DSA.
Let us say you want to find the set of people in the same age group within a large
collection of data. Assuming this data is sorted, you can solve the problem easily with
the binary search algorithm, a classic application of DSA. Binary search scales
logarithmically, unlike traditional methods that scale linearly. This means that if the
number of data points in the database is squared, the number of steps binary search
takes only doubles.
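To see this claim concretely, here is a minimal sketch (plain Python; the numbers are chosen for illustration) that counts the halving steps binary search needs:

import math
# A sorted collection of n items needs about log2(n) halving steps.
# Squaring n (n -> n*n) only doubles that count:
for n in (1_000, 1_000_000):                   # 1,000,000 is 1,000 squared
    print(n, "items ->", math.ceil(math.log2(n)), "steps")
# 1000 items -> 10 steps; 1000000 items -> 20 steps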
Why Should You Learn Data Structures and Algorithms?
1. Another real-world problem that DSA could solve is the Rubik’s cube.
Most of you would have used or at least seen the Rubik’s cube.
But do you know that a simple object like a Rubik’s cube has confused even the
most outstanding mathematicians?
It is known that a Rubik's cube has a total of 43,252,003,274,489,856,000 (~43
quintillion) different possible positions (Ref: https://www.youtube.com/watch?v=z2-
d0x_qxSM). Then imagine the total number of paths to reach all these positions.
Thankfully, a solution was found through Dijkstra's algorithm, which is based on the
concepts of DSA. It finds shortest paths, which means you can reach the solved
position in the minimum number of moves.
Why Should You Learn Data Structures and Algorithms?
2. Role of DSA in Machine Learning
Can you imagine that a concept as advanced and futuristic as Machine Learning (ML) needs
engineers with knowledge of DSA?
Apart from solving real-world problems, these engineers can design amazing products by combining
their ML and DSA knowledge. Knowledge of DSA is the basic building block of algorithmic thinking
and logical capability in any field of computer science, and ML is no exception.
An ML engineer spends a considerable part of their time collecting data, which can lead to various
complex challenges that can be solved easily using the knowledge of DSA.
Let us assume you are creating an ML product that has a dataset with an address as one of its
columns. Now suppose you want to retrieve a portion of this data, say the street name. ML cannot
work on the raw string directly; you would need the help of DSA to implement a string algorithm
to retrieve the required data.
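For instance, a minimal sketch (the address format and the split-based parsing are illustrative assumptions, not a fixed ML API):

# Hypothetical address strings in a dataset column
addresses = [
    "221B Baker Street, London",
    "42 Galaxy Road, Bengaluru",
]
# Extract the street name: drop the house number, keep text before the comma
streets = [addr.split(",")[0].split(" ", 1)[1] for addr in addresses]
print(streets)  # ['Baker Street', 'Galaxy Road']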
Why Should You Learn Data Structures and Algorithms?
Data Structures and Algorithms are often considered the root, or foundation, of
computer science. With advancements in the field, more and more data is being stored
and processed. Such huge volumes of data can slow down the processing time of
systems. This is where DSA helps: by utilizing the stored data effectively, it improves a
system's processing performance. DSA also helps in tasks like data search, which plays
an important role in any application.
DSA typically shifts the focus of programming from syntax to approach. If you notice,
most computer science curricula include a chapter or a course on DSA.
Learners can use the concepts of DSA in any programming language of their choice
and also learn how to store and manipulate data in it to get the desired outcome.
Do you have questions about why you should study all this complicated stuff such as Arrays, Linked Lists, Stacks,
Queues, Searching, Sorting, Trees, Graphs, etc., if it has absolutely no use in real life?
Why do companies ask questions related to data structures and algorithms if they are not useful in a daily job?
Now I will ask you a simple question and you need to find the solution to it.
How would you search for your roll number in a 20000-page PDF document (roll
numbers are arranged in a particular order)?
If you try to search randomly or sequentially, it will take too much time, and you might get frustrated after a
while. Instead, try the solution given below...
Go to page 10000. If your roll number is not there but all the roll numbers on that page are less than yours, go
to page 15000. If your roll number is still not there, but this time all the roll numbers are greater than yours, go
to page 12500. Continue the same process and within 30-40 seconds you will find your roll number.
Congratulations...
You just used the Binary Search Algorithm unintentionally!
To Crack the Interviews of the Top Product-Based Companies
Do you know that under the hood all your SQL and Linux commands are algorithms and data structures?
You might not realize this, but that’s how the software works.
Data structures and algorithms play a major role in implementing software and in the hiring process as well. A lot of
students and professionals have the question of why these companies’ interviews are focused on DSA instead of
language/frameworks/tools-specific questions.
Let us explain why it happens…
When you ask someone to make a decision, a good engineer will be able to tell you: "I chose to do X
because it's better than A and B in these ways. I could have gone with C, but I felt this was a better choice because of this."
In our daily lives, we always go with the person who can complete the task in a short amount of time with efficiency
and using fewer resources. The same things happen with these companies. The problem faced by these companies is
much harder and on a much larger scale. Software developers also have to make the right decisions when it comes to
solving the problems of these companies.
Knowledge of DSA like Hash Tables, Trees, Graphs, and various algorithms goes a long way in solving these problems
efficiently and the interviewers are more interested in seeing how candidates use these tools to solve a problem. Just
like a car mechanic needs the right tool to fix a car and make it run properly, a programmer needs the right tool
(algorithm and data structure) to make the software run properly. So the interviewer wants to find a candidate who can
apply the right set of tools to solve the given problem. If you know the characteristics of one data structure in contrast
to another, you will be able to make the right decision in choosing the right data structure to solve a problem.
To Crack the Interviews of the Top Product-Based Companies
Engineers working at companies such as Google, Microsoft, Facebook, and Amazon are seen as
different from others and are paid higher compared to engineers elsewhere... but why?
In these companies coding is just the implementation and roughly takes 20-30% of
the time allotted to a project.
Most of the time goes into designing things with the best and most optimal algorithms to
save on the company's resources (servers, computation power, etc.). This is the main
reason why interviews in these companies focus on algorithms: they want people who
can think out of the box and design algorithms that can save the company thousands
of dollars.
YouTube, Facebook, Twitter, Instagram, and Google Maps have among the highest
numbers of users in the world. Handling more users on these sites requires more
optimization, and that is why product-based companies hire candidates who can
optimize their software as per user demand.
To Crack the Interviews of the Top Product-Based Companies
Example: Suppose you are working at Facebook. You come up with an optimal solution to a problem
(like sorting a list of users from India) with time complexity O(n log n) instead of O(n^2), and assume
that n for this problem in a real-life scenario is 100 million (a fair assumption, considering that the
number of users registered on Facebook exceeds 1 billion). n log n would then be about 800 million
operations, while n^2 would be 10^16, i.e., 10^7 billion. In cost terms, efficiency has improved by more
than a factor of 10^7, which could be a huge saving in terms of server cost and time.
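A quick back-of-the-envelope check of these numbers (using log base 10, as the slide does):

import math
n = 100_000_000                      # 10^8 users
n_log_n = n * math.log10(n)          # 8 * 10^8 = 800 million operations
n_squared = n ** 2                   # 10^16 = 10^7 billion operations
print(n_log_n, n_squared, n_squared / n_log_n)  # speedup factor = 1.25 * 10^7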
Now you might have understood why companies want to hire smart developers who can make the right
decisions and save the company resources, time, and money. So before you give the solution to use a
hash table instead of a list for a specific problem, think about the large scale and all the case
scenarios carefully. Your choice can generate revenue for the company, or it can lose the company a
huge amount of money.
Data structure and algorithms help in understanding the nature of the problem at a deeper level
and thereby a better understanding of the world.
To Solve Some Real-World Complex Problems
• Have you ever been scolded by your parents when you were unable to find your book
or clothes in your messed-up room?
Definitely yes... and your parents are right when they advise you to keep everything
in its right place, so that next time you can get your stuff easily. Here you need to
arrange and keep everything (data) in such a structure that whenever you need to
search for something, you get it easily and as quickly as possible. This example gives
a clear idea of how important it is to arrange or structure data in real life.
• Now take the example of a library. If you need to find a book on Python from a library,
you will go to the CSE section first, then the Programming Languages section. If these
books are not organized in this manner and just distributed randomly then it will be
frustrating to find a specific book. So data structures refer to the way we organize
information on our computers. Computer scientists process and look for the best way we
can organize the data we have, so it can be better processed based on the input provided.
To Solve Some Real-World Complex Problems
A lot of newbie programmers wonder where we use all this data structure and
algorithm material in our daily lives, and how it is useful in solving real-world complex
problems. Whether or not you are interested in getting into the top tech giants,
DSA concepts still help a lot in your day-to-day life. Don't you
believe us? Let's consider some examples...
1. Facebook (Yes… we are talking about your favorite application).
Can you just imagine that your friends on Facebook, friends of friends, and mutual
friends all can be represented easily by a Graph?
Relax….sit for a couple of moments and think again…you can apply a graph to
represent friends’ connections on Facebook.
2. If you need to keep a deck of cards and arrange it properly, how would you do that?
You won't throw the cards around randomly; you will arrange them one over another to
form a proper deck. You can use a Stack here to arrange the cards one over
another.
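A minimal sketch of that idea, using a Python list as the stack (the card names are illustrative):

deck = []                 # the stack: the last card placed is the first taken
deck.append("ace")        # push cards one over another
deck.append("king")
deck.append("queen")
print(deck.pop())         # queen - the topmost card comes off first (LIFO)
print(deck.pop())         # king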
To Solve Some Real-World Complex Problems
3. If you need to search for a word in the dictionary, what would be your approach?
Do you go page by page, or do you open some page in the middle and, if the word is not
found there, open a page before or after it depending on the alphabetical order of words
relative to the current page (Binary Search)?
The first two were good examples of choosing the right data structure for a real-world
problem, and the third is a good example of choosing the right algorithm to solve a
specific problem in less time.
All the above examples give you a clear understanding of how the organization of data is
really important in our day-to-day life. Arranging data in a specific structure is really helpful
in saving a lot of time and it becomes easier to manipulate or use them. The same goes for the
algorithm…we all want to save our time, energy, and resources. We all want to choose the
best approach to solve the problems in our daily lives.
Introduction to Data Structures
WHAT IS DATA?
Data is the collection of different numbers, symbols, and alphabets used to represent
information. In order to solve problems on such data, data structures are used. Data is
organized to form a data structure in such a way that not all items need to be searched
and the required data can be found instantly.

Data structures: An introduction
Data structures are essential elements of computer science that enable effective data
storage, organization, and manipulation. They offer a method for efficiently managing
and retrieving data. The effectiveness of algorithms and programs can be significantly
impacted by selecting the appropriate data structure for a given problem. Different types
of data structures are created for specific jobs.

Types of data structure:
Linear Data Structure
Non-Linear Data Structure

Data structure classification: Linear and non-linear data structures are the two primary
categories into which data structures may be divided. Data elements are ordered
sequentially in linear data structures, and each element has a direct predecessor and
successor. Examples include queues, stacks, linked lists, and arrays. In non-linear data
structures, data elements are not sequentially arranged; instead, they are arranged in
hierarchical connections. Graphs and trees are two examples.

Abstract Data Types: The term "Abstract Data Type" (ADT) refers to a high-level
definition of a data structure that emphasizes its behavior and operations above the
specifics of how it is implemented. The operations that can be carried out on the data
structure, as well as the rules for doing so, are specified by ADTs. ADTs frequently take
the form of stacks, queues, lists, and dictionaries.

Abstraction: The data structure is specified by the ADT, which provides a level of
abstraction. The client program uses the data structure through the interface only,
without getting into the implementation details.

Algorithm analysis: Algorithm analysis examines an algorithm's performance in terms of
its time and space complexity. It aids our comprehension of how an algorithm's
efficiency changes as the size of the input increases. The objective is to create efficient
and accurate algorithms.

Time Complexity: This metric assesses how long an algorithm takes to execute in
relation to the size of the input. It is typically expressed in "Big O" notation, which
provides an upper bound on the growth rate of an algorithm's running time.

Space Complexity: This gauges how much memory an algorithm consumes in relation to
the size of the input. It is expressed in Big O notation, just like time complexity.

Best, Worst, and Average Case Analysis: Depending on the type of input, algorithms
might respond in various ways. The performance of an algorithm is characterized by its
best-case time complexity, worst-case time complexity, and average-case analysis, which
takes into account expected performance over a variety of inputs.
Introduction to Data Structures: Data Structure Classification
Linear Data Structures: A data structure is called linear if all of its elements are arranged in a linear order. In
linear data structures, the elements are stored in a non-hierarchical way where each element has successors
and predecessors except the first and last elements.
Arrays: An array is a collection of similar types of data items and each data item is called an element of the
array. The data type of the element may be any valid data type like char, int, float, or double. The elements of
the array share the same variable name but each one carries a different index number known as subscript.
The array can be one-dimensional, two-dimensional, or multi-dimensional.
For example, for an array age of 100 elements, the individual elements are: age[0], age[1], age[2], age[3], ......... age[98], age[99]
Stack: Stack is a linear list in which insertion and deletions are allowed only at one end, called the top. A stack
is an abstract data type (ADT), that can be implemented in most of the programming languages. It is named a
stack because it behaves like a real-world stack, for example: - piles of plates or decks of cards, etc.
Queue: A queue is a linear list in which elements can be inserted only at one end, called the rear, and deleted
only at the other end, called the front. It is an abstract data structure, similar to a stack. A queue is open at
both ends and therefore follows the First-In-First-Out (FIFO) methodology for storing data items.
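A minimal FIFO sketch in Python using collections.deque (illustrative; a pointer-based implementation appears in the linked-list section later):

from collections import deque

queue = deque()
queue.append("first")    # insert at the rear
queue.append("second")
queue.append("third")
print(queue.popleft())   # first  - deletion happens at the front (FIFO)
print(queue.popleft())   # second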
Non-Linear Data Structures: These data structures do not form a sequence; each item or element may be
connected with two or more other items in a non-linear arrangement. The data elements are not arranged
in a sequential structure.
Trees: Trees are multilevel data structures with a hierarchical relationship among their elements, known
as nodes. The bottommost nodes in the hierarchy are called leaf nodes, while the topmost node is
called the root node. Each node contains pointers to its adjacent nodes.
The tree data structure is based on the parent-child relationship among the nodes. Each node in the tree
can have one or more children except the leaf nodes, and each node has exactly one parent except
the root node, which has none. Trees can be classified into many categories, which will be discussed later in this
tutorial.
Graphs: Graphs can be defined as the pictorial representation of a set of elements (represented by
vertices) connected by links known as edges. A graph differs from a tree in the sense that a graph
can have a cycle while a tree cannot.
Introduction to Data Structures: Operations on data structure
1) Traversing: Every data structure contains a set of data elements. Traversing the data structure means visiting each element
of the data structure in order to perform some specific operation like searching or sorting.
Example: If we need to calculate the average of the marks obtained by a student in 6 different subjects, we need to traverse
the complete array of marks and calculate the total sum, then we will divide that sum by the number of subjects i.e. 6, in
order to find the average.
2) Insertion: Insertion can be defined as the process of adding elements to the data structure at any location. An
element can be inserted only if the data structure has space left; trying to insert into a full structure causes overflow.
3) Deletion: The process of removing an element from the data structure is called Deletion. We can delete an element from
the data structure at any random location. If we try to delete an element from an empty data structure then underflow occurs.
4) Searching: The process of finding the location of an element within the data structure is called Searching. There are two
algorithms to perform searching, Linear Search and Binary Search. We will discuss each one of them later in this tutorial.
5) Sorting: The process of arranging the data structure in a specific order is known as Sorting. There are many algorithms that
can be used to perform sorting, for example, insertion sort, selection sort, bubble sort, etc.
6) Merging: When two lists, List A and List B, of sizes M and N respectively and containing similar types of elements,
are clubbed or joined to produce a third list, List C, of size (M+N), the process is called merging.
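A minimal sketch of merging two sorted lists of sizes M and N into one list of size M+N (the two-pointer technique; names are illustrative):

def merge(list_a, list_b):
    # Repeatedly take the smaller front element of the two sorted lists
    result, i, j = [], 0, 0
    while i < len(list_a) and j < len(list_b):
        if list_a[i] <= list_b[j]:
            result.append(list_a[i]); i += 1
        else:
            result.append(list_b[j]); j += 1
    return result + list_a[i:] + list_b[j:]  # append the leftover tail

print(merge([2, 5, 9], [1, 3, 7, 8]))  # [1, 2, 3, 5, 7, 8, 9]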
Array / List based representation and operations
In this session, we will discuss the array in the data structure. Arrays are defined as the collection
of similar types of data items stored at contiguous memory locations. It is one of the simplest data
structures where each data element can be randomly accessed by using its index number.
In C programming, they are the derived data types that can store the primitive type of data such as
int, char, double, float, etc. For example, if we want to store the marks of a student in 6 subjects,
then we don't need to define a different variable for the marks in different subjects. Instead, we can
define an array that can store the marks in each subject at the contiguous memory locations.
Properties of array
• Each element in an array is of the same data type and hence occupies the same amount of
memory (for example, 4 bytes for an int on many systems).
• Elements of the array are stored at contiguous memory locations, with the first element
stored at the smallest memory address.
• Elements of the array can be randomly accessed, since we can calculate the address of each
element from the given base address and the size of a data element.
Array / List-based representation and operations:
Representation of an array
We can represent an array in various ways in different programming languages. As an
illustration, here is a declaration of an array in the C language:
int arr[10]; /* an array of 10 integers */
Why are arrays required? Arrays are useful because:
• Sorting and searching a value in an array is easier.
• Arrays are best for processing multiple values quickly and easily.
As stated above, all the data elements of an array are stored at contiguous locations in the main memory. The name of
the array represents the base address or the address of the first element in the main memory. Each element of the array
is represented by proper indexing.
• Consider the memory allocation of an array arr of size 5. The array follows a 0-based
indexing approach. The base address of the array is 100; it is the address of arr[0]. Here, the
size of the data type used is 4 bytes; therefore, each element takes 4 bytes in memory.
Array / List-based representation and operations: Basic operations
Now, let's discuss the basic operations supported in the array –
• Traversal - This operation is used to print the elements of the array.
• Insertion - It is used to add an element at a particular index.
• Deletion - It is used to delete an element from a particular index.
• Search - It is used to search for an element using the given index or by value.
• Update - It updates an element at a particular index.

How to access an element from the array?
We require the information given below to access any random element from the array –
• Base address of the array.
• Size of an element in bytes.
• Type of indexing the array follows.
The formula to calculate the address of an array element is:
Byte address of element A[i] = base address + size * (i - first index)
Here, size represents the memory taken by the primitive data type. For instance, int takes
2 bytes and float takes 4 bytes of memory space in C programming (sizes vary by compiler).
We can understand it with the help of an example –
Suppose an array A[-10 ..... +2] has base address (BA) = 999 and
size of an element = 2 bytes; find the location of A[-1].
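The same formula, worked out in a short Python sketch (the function name is illustrative):

def element_address(base, size, first_index, i):
    # Byte address of A[i] = base address + size * (i - first index)
    return base + size * (i - first_index)

print(element_address(999, 2, -10, -1))  # 999 + 2 * (-1 - (-10)) = 1017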
#include <stdio.h>
void main()
{
    int Arr[5] = {18, 30, 15, 70, 12};
    int i;
    printf("Elements of the array are:\n");
    for(i = 0; i < 5; i++)
    {
        printf("Arr[%d] = %d, ", i, Arr[i]);
    }
}
Output
Elements of the array are:
Arr[0] = 18, Arr[1] = 30, Arr[2] = 15, Arr[3] = 70, Arr[4] = 12,
Insertion operation: This operation is performed to insert one or more elements into the
array. As per the requirements, an element can be added at the beginning, at the end, or at
any index of the array. Now, let's see an implementation of inserting an element into
the array.
#include <stdio.h>
int main()
{
    int arr[20] = { 18, 30, 15, 70, 12 };
    int i, x, pos, n = 5;
    printf("Array elements before insertion\n");
    for (i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
    x = 50;   /* element to be inserted */
    pos = 4;  /* 1-based position at which to insert */
    n++;
    for (i = n - 1; i >= pos; i--)   /* shift elements one place right */
        arr[i] = arr[i - 1];
    arr[pos - 1] = x;
    printf("Array elements after insertion\n");
    for (i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
    return 0;
}
Output
Array elements before insertion
18 30 15 70 12
Array elements after insertion
18 30 15 50 70 12
Deletion operation: To delete the element at (1-based) position k, shift every later element
one place to the left and shrink the list:
j = k;
while( j < n)
{
    arr[j-1] = arr[j];
    j = j + 1;
}
n = n - 1;
Search operation
This operation is performed to search for an element in the array based on its value or index.
#include <stdio.h>
void main()
{
    int arr[5] = {18, 30, 15, 70, 12}; int item = 70, i, j = 0;
    printf("Given array elements are :\n");
    for(i = 0; i < 5; i++)
    {
        printf("arr[%d] = %d, ", i, arr[i]);
    }
    printf("\nElement to be searched = %d", item);
    while( j < 5)
    {
        if( arr[j] == item )
        {
            break;
        }
        j = j + 1;
    }
    if (j < 5)
        printf("\nElement %d is found at %d position", item, j+1);
    else
        printf("\nElement %d is not found", item);
}
Output
Given array elements are :
arr[0] = 18, arr[1] = 30, arr[2] = 15, arr[3] = 70, arr[4] = 12,
Element to be searched = 70
Element 70 is found at 4 position
Update operation
This operation is performed to update an existing array
element located at the given index.
#include <stdio.h>
void main()
{
    int arr[5] = {18, 30, 15, 70, 12};
    int item = 50, i, pos = 3;
    printf("Given array elements are :\n");
    for(i = 0; i < 5; i++)
    {
        printf("arr[%d] = %d, ", i, arr[i]);
    }
    arr[pos-1] = item;   /* update the element at (1-based) position pos */
    printf("\nArray elements after updation :\n");
    for(i = 0; i < 5; i++)
    {
        printf("arr[%d] = %d, ", i, arr[i]);
    }
}
Output
Given array elements are :
arr[0] = 18, arr[1] = 30, arr[2] = 15, arr[3] = 70, arr[4] = 12,
Array elements after updation :
arr[0] = 18, arr[1] = 30, arr[2] = 50, arr[3] = 70, arr[4] = 12,
Array / List-based representation and operations:
The complexity of Array operations: The time and space complexity of various array
operations are described in the following table.

Time Complexity
Operation    Average Case    Worst Case
Access       O(1)            O(1)
Search       O(n)            O(n)
Insertion    O(n)            O(n)
Deletion     O(n)            O(n)

Space Complexity
In an array, the space complexity for the worst case is O(n).

Advantages of Array
• An array provides a single name for a group of variables of the same type. Therefore, it is
easy to remember the names of all the elements of an array.
• Traversing an array is a very simple process; we just need to increment the base address of
the array in order to visit each element one by one.
• Any element in the array can be directly accessed by using its index.

Disadvantages of Array
• An array is homogeneous, meaning that only elements of the same data type can be stored in it.
• An array uses static memory allocation, that is, the size of an array cannot be altered.
• There will be wastage of memory if we store fewer elements than the declared size.

Conclusion: In this session, we discussed a special data structure, the array, and the basic
operations performed on it. Arrays provide a unique way to structure the stored data such
that it can be easily accessed and queried to fetch a value using an index.
What is Searching in Data Structure?
Searching in data structure refers to the process of finding the required information from a collection
of items stored as elements in the computer memory. These sets of items are in different forms,
such as an array, linked list, graph, or tree. Another way to define searching in the data structures is
by locating the desired element of specific characteristics in a collection of items.
Based on the type of search operation, these algorithms are generally classified into two categories:
Sequential Search:
In this, the list or array is traversed sequentially, and every element is checked. For example: Linear
Search.
Interval Search:
These algorithms are specifically designed for searching in sorted data structures. These types of
searching algorithms are much more efficient than linear search, as they repeatedly target the
center of the search structure and divide the search space in half. For example: Binary Search.
Linear Search Algorithm
Linear search is also called the sequential search algorithm. It is the simplest searching algorithm. In
linear search, we simply traverse the list completely and match each element of the list with the
item whose location is to be found. If a match is found, the location of the item is returned;
otherwise, the algorithm returns NULL.
It is widely used to search for an element in an unordered list, i.e., a list in which the items are not
sorted. The worst-case time complexity of linear search is O(n).
The steps used in the implementation of linear search are as follows -
• Traverse the list from the first element.
• In each step, compare the current element with the item to be searched.
• If they match, return the position of the element and stop.
• If the end of the list is reached without a match, report that the item is not present.
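A minimal sketch of those steps in Python (returning -1 for "not found" instead of NULL):

def linear_search(items, target):
    # Compare every element with the target, one by one
    for index, value in enumerate(items):
        if value == target:
            return index        # location of the match
    return -1                   # target is not in the list

print(linear_search([70, 40, 30, 11, 57], 57))  # 4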
Binary Search Algorithm (iterative version):
#include <stdio.h>
int binarySearch(int array[], int x, int low, int high)
{
    while (low <= high)
    {
        int mid = (low + high) / 2;
        if (array[mid] == x) return mid;
        if (array[mid] < x) low = mid + 1;  // element can only be in right half
        else high = mid - 1;                // element can only be in left half
    }
    return -1; // We reach here when element is not present in array
}
int main(void)
{
    int array[] = {3, 4, 5, 6, 7, 8, 9}; int x = 4;
    int n = sizeof(array) / sizeof(array[0]);
    int result = binarySearch(array, x, 0, n - 1);
    if (result == -1) printf("Not found");
    else printf("Element is found at index %d", result);
    return 0;
}

Binary Search Algorithm (recursive version):
#include <stdio.h>
int binarySearch(int arr[], int l, int r, int x)
{
    if (r >= l)
    {
        int mid = l + (r - l) / 2;
        if (arr[mid] == x) return mid;
        if (arr[mid] > x) return binarySearch(arr, l, mid - 1, x);
        // Else the element can only be present in right subarray
        return binarySearch(arr, mid + 1, r, x);
    }
    return -1;
}
int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 }; int x = 10; int n = sizeof(arr) / sizeof(arr[0]);
    int result = binarySearch(arr, 0, n - 1, x);
    (result == -1) ? printf("Element is not present in array")
                   : printf("Element is present at index %d", result);
    return 0;
}
Illustration of Binary Search Algorithm: the search interval is halved at every step.
Complexity Analysis of Binary Search: because the interval halves each time, the worst-case
and average-case time complexity is O(log n); the best case (the middle element matches
immediately) is O(1).
Sorting Techniques: Insertion Sort
• Insertion sort works similarly to the way we sort cards in our hands in a card game.
• We assume that the first card is already sorted; then we select an unsorted card. If the
unsorted card is greater than the card in hand, it is placed on the right; otherwise, to the
left. In the same way, the other unsorted cards are taken and put in their right place.
In array terms: the first element in the array is assumed to be sorted. Take the second
element and store it separately in key. Compare the key with the first element; if the first
element is greater than the key, the key is placed in front of the first element.
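The snippet below calls insertionSort without showing its definition; a standard Python definition matching the card-sorting description above is:

def insertionSort(array):
    for step in range(1, len(array)):
        key = array[step]           # the card picked up
        j = step - 1
        # Shift elements of the sorted part that are greater than key
        # one position to the right
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]
            j = j - 1
        array[j + 1] = key          # place the card in its right spot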
data = [9, 5, 1, 4, 3]
insertionSort(data)
print('Sorted Array in Ascending Order:')
print(data)
Sorting Techniques: Bubble Sort
Bubble Sort is the simplest sorting algorithm; it works by repeatedly swapping adjacent
elements if they are in the wrong order. This algorithm is not suitable for large data sets,
as its average and worst-case time complexity is quite high.
1. First Iteration (Compare and Swap)
Starting from the first index, compare the first and the second elements.
If the first element is greater than the second element, they are swapped.
Now, compare the second and the third elements, and swap them if they are not in order.
The above process goes on until the last element.
2. Remaining Iterations
The same process goes on for the remaining iterations. After each iteration, the largest
element among the unsorted elements is placed at the end.
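A minimal sketch of the iterations just described (assuming a plain Python list):

def bubbleSort(array):
    n = len(array)
    for i in range(n - 1):                 # one pass per iteration
        for j in range(n - 1 - i):         # the unsorted part shrinks each pass
            if array[j] > array[j + 1]:    # adjacent elements out of order?
                array[j], array[j + 1] = array[j + 1], array[j]

data = [5, 1, 4, 2, 8]
bubbleSort(data)
print(data)  # [1, 2, 4, 5, 8]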
Sorting Techniques: Selection Sort
In this algorithm, the array is divided into two parts: the sorted part and the unsorted
part. Initially, the sorted part of the array is empty and the unsorted part is the given
array. The sorted part is placed at the left, while the unsorted part is placed at the right.
In selection sort, the smallest element is selected from the unsorted array and placed at
the first position. After that, the second smallest element is selected and placed in the
second position. The process continues until the array is entirely sorted.
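A minimal sketch of selection sort along the same lines:

def selectionSort(array):
    for i in range(len(array)):
        # Find the smallest element in the unsorted (right) part
        min_idx = i
        for j in range(i + 1, len(array)):
            if array[j] < array[min_idx]:
                min_idx = j
        # Move it to the end of the sorted (left) part
        array[i], array[min_idx] = array[min_idx], array[i]

data = [64, 25, 12, 22, 11]
selectionSort(data)
print(data)  # [11, 12, 22, 25, 64]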
Sorting Techniques: Merge Sort
If the array has multiple elements, split the array into halves and recursively invoke
merge sort on each of the halves. Finally, when both halves are sorted, the merge
operation is applied. The merge operation is the process of taking two smaller sorted
arrays and combining them to eventually make one larger sorted array.
• At each recursive step, check whether the left index is still less than the right index for
the sub-arrays; if so, again calculate the mid points and split further.
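A minimal recursive sketch of merge sort as described (operating in place on Python lists):

def mergeSort(array):
    if len(array) > 1:
        mid = len(array) // 2
        left, right = array[:mid], array[mid:]
        mergeSort(left)                 # sort each half recursively
        mergeSort(right)
        # Merge the two sorted halves back into array
        i = j = k = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                array[k] = left[i]; i += 1
            else:
                array[k] = right[j]; j += 1
            k += 1
        while i < len(left):
            array[k] = left[i]; i += 1; k += 1
        while j < len(right):
            array[k] = right[j]; j += 1; k += 1

data = [38, 27, 43, 3, 9, 82, 10]
mergeSort(data)
print(data)  # [3, 9, 10, 27, 38, 43, 82]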
Sorting Techniques: Quick Sort (partition walkthrough)
Now, a[pivot] = 24, a[left] = 24, and a[right] = 14. As a[pivot] > a[right], swap a[pivot]
and a[right]; the pivot is now at the right position.
Now, in a similar manner, the quick sort algorithm is separately applied to the left and
right sub-arrays. After sorting is done, the array is fully ordered.
// Quick sort in C
#include <stdio.h>
void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

// partition the array taking the last element as the pivot
int partition(int array[], int low, int high)
{
    int pivot = array[high];
    int i = low - 1; // pointer for the greater element
    // traverse each element of the array and compare them with the pivot
    for (int j = low; j < high; j++)
    {
        if (array[j] <= pivot)
        {
            // if element smaller than pivot is found
            // swap it with the greater element pointed by i
            i++;
            swap(&array[i], &array[j]);
        }
    }
    // swap the pivot element with the greater element at i
    swap(&array[i + 1], &array[high]);
    return (i + 1); // return the partition point
}

void quickSort(int array[], int low, int high)
{
    if (low < high)
    {
        int pi = partition(array, low, high);
        quickSort(array, low, pi - 1);   // left of pivot
        quickSort(array, pi + 1, high);  // right of pivot
    }
}

void printArray(int array[], int size) // function to print array elements
{
    for (int i = 0; i < size; ++i)
        printf("%d ", array[i]);
    printf("\n");
}

int main()
{
    int data[] = {8, 7, 2, 1, 0, 9, 6}; int n = sizeof(data) / sizeof(data[0]);
    printf("Unsorted Array\n"); printArray(data, n);
    quickSort(data, 0, n - 1); // perform quicksort on data
    printf("Sorted array in ascending order: \n");
    printArray(data, n);
    return 0;
}
# Quick sort in Python
# function to find the partition position
def partition(array, low, high):
    # choose the rightmost element as pivot
    pivot = array[high]
    # pointer for greater element
    i = low - 1
    # traverse through all elements
    # compare each element with pivot
    for j in range(low, high):
        if array[j] <= pivot:
            # if element smaller than pivot is found
            # swap it with the greater element pointed by i
            i = i + 1
            # swapping element at i with element at j
            (array[i], array[j]) = (array[j], array[i])
    # swap the pivot element with the greater element specified by i
    (array[i + 1], array[high]) = (array[high], array[i + 1])
    # return the position from where partition is done
    return i + 1

# function to perform quicksort
def quickSort(array, low, high):
    if low < high:
        # find pivot element such that
        # elements smaller than pivot are on the left
        # elements greater than pivot are on the right
        pi = partition(array, low, high)
        # recursive call on the left of pivot
        quickSort(array, low, pi - 1)
        # recursive call on the right of pivot
        quickSort(array, pi + 1, high)

data = [8, 7, 2, 1, 0, 9, 6]
print("Unsorted Array")
print(data)

size = len(data)
quickSort(data, 0, size - 1)

print('Sorted Array in Ascending Order:')
print(data)

Quicksort Applications
The quicksort algorithm is used when
• the programming language is good for recursion
• time complexity matters
• space complexity matters
#include <stdio.h>
#include <conio.h>

void quicksort(int number[25], int first, int last)
{
    int i, j, pivot, temp;
    if(first < last)
    {
        pivot = first; i = first; j = last;
        while(i < j)
        {
            while(number[i] <= number[pivot] && i < last)
                i++;
            while(number[j] > number[pivot])
                j--;
            if(i < j)
            {
                temp = number[i];
                number[i] = number[j];
                number[j] = temp;
            }
        }
        temp = number[pivot];
        number[pivot] = number[j];
        number[j] = temp;
        quicksort(number, first, j - 1);
        quicksort(number, j + 1, last);
    }
}

int main()
{
    int i, count, number[25];
    clrscr();
    printf("How many elements are u going to enter?: ");
    scanf("%d", &count);
    printf("Enter %d elements: ", count);
    for(i = 0; i < count; i++)
        scanf("%d", &number[i]);
    quicksort(number, 0, count - 1);
    printf("Order of Sorted elements: ");
    for(i = 0; i < count; i++)
        printf(" %d", number[i]);
    getch();
    return 0;
}
1) Classes:
A class is a blueprint that defines the attributes and methods its objects (instances) will have.
Syntax:
class ClassName:
# Class attributes (optional)
class_attribute = "I am a class attribute"
Example:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        print(f"Hello, my name is {self.name} and I am {self.age} years old.")
# Creating an instance of the Person class
john = Person("John Doe", 30)
# Accessing attributes and calling methods
print(john.name) # Output: John Doe
print(john.age) # Output: 30
john.greet() # Output: Hello, my name is John Doe and I am 30 years old.
In this example, we define a class Person with an __init__ method (which is called a constructor)
that initializes the name and age attributes. We also define a greet method which prints a greeting
using the attributes.
2) Objects:
An object is an instance of a class. It is a concrete realization of the class blueprint, with
its own set of attributes and the ability to perform actions defined by the class's methods.
You can create multiple objects from the same class, and each object will have its unique
state.
Syntax:
# Creating objects (instances) of the class
object1 = ClassName(arg1, arg2)
object2 = ClassName(arg3, arg4)
In the example above, john is an object of the Person class. It has its own name and age
attributes.
Remaining Key OOP Concepts:
3) Inheritance:
Classes can inherit attributes and methods from other classes, allowing for code reuse and
the creation of hierarchies.
4) Encapsulation:
Encapsulation refers to the practice of bundling the data (attributes) and the methods that
operate on that data into a single unit (class). It helps in data hiding and maintaining the
integrity of the object.
5) Polymorphism:
Polymorphism allows objects of different classes to be treated as objects of a common
superclass. This concept enables flexibility and extensibility in your code.
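A minimal sketch tying the three concepts together (the class names are illustrative):

class Animal:
    def __init__(self, name):
        self._name = name            # encapsulation: data kept inside the object

    def speak(self):
        return f"{self._name} makes a sound"

class Dog(Animal):                   # inheritance: Dog reuses Animal's code
    def speak(self):                 # polymorphism: same method, new behavior
        return f"{self._name} barks"

for animal in (Animal("Generic"), Dog("Rex")):
    print(animal.speak())            # each object responds in its own way
# Generic makes a sound
# Rex barks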
Differences between a class and an object:

Purpose:
  Class - defines the attributes and methods that objects of the class will have.
  Object - represents a specific item or entity with its own data and behavior.
Example:
  Class - class Car:
  Object - my_car = Car() or person1 = Person("rakesh")
Attributes:
  Class - describes what data the objects will store.
  Object - contains actual data specific to the object.
Methods:
  Class - describes what actions objects can perform.
  Object - allows the object to execute specific behaviors.
Multiple Instances:
  Class - you can create multiple objects from the same class.
  Object - each object is a distinct instance with its own data.
Numpy:
NumPy (Numerical Python) is a fundamental package for numerical computations in Python. It
provides support for arrays (both 1-dimensional and multi-dimensional), as well as a large
collection of high-level mathematical functions to operate on these arrays.
Or
NumPy is a fundamental library in Python used for numerical and scientific computing. It provides
support for large, multi-dimensional arrays and matrices, along with a collection of mathematical
functions to operate on these arrays. NumPy is widely used in fields like data analysis, machine
learning, and scientific research due to its efficiency and versatility.
np.array([3.14, 4, 2, 3])
Out[9]: array([ 3.14, 4. , 2. , 3. ])
In[12]: np.zeros(10, dtype=int) # Create a length-10 integer array filled with zeros
Out[12]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In[13]: np.ones((3, 5), dtype=float) # Create a 3x5 floating-point array filled with 1s
Out[13]: array([[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]])
type Description
• bool_ Boolean (True or False) stored as a byte
• int_ Default integer type (same as C long; normally either int64 or int32)
• intc Identical to C int (normally int32 or int64)
• intp Integer used for indexing (same as C ssize_t; normally either int32 or int64)
• int8 Byte (–128 to 127)
• int16 Integer (–32768 to 32767)
• int32 Integer (–2147483648 to 2147483647)
• int64 Integer (–9223372036854775808 to 9223372036854775807)
• uint8 Unsigned integer (0 to 255)
• uint16 Unsigned integer (0 to 65535)
• uint32 Unsigned integer (0 to 4294967295)
• uint64 Unsigned integer (0 to 18446744073709551615)
• float_ Shorthand for float64
• float16 Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
1) Attributes of arrays
Definition: Determining the size, shape, memory consumption, and data types of arrays
Syntax:
“array_name.attribute_name”
Example 1:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Getting the shape of the array
dtype = arr.dtype # Getting the data type of the elements
size = arr.size # Number of elements in the array
nbytes = arr.nbytes # Total memory consumption
Explanation:
shape returns a tuple representing the dimensions of the array (e.g., (2, 3) for a 2x3 array).
dtype returns the data type of the elements in the array (e.g., int64 for 64-bit integers).
Example 2:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)   # Output: (2, 3)
print(arr.size)    # Output: 6
print(arr.dtype)   # Output: int64 (platform-dependent)
2) Indexing of arrays
Definition: Getting and setting the values of individual array elements
Syntax:
array_name[index]
Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
element = arr[2] # Accessing the third element (index 2)
Explanation:
You can access individual elements of the array by specifying their index inside square
brackets.
Example 2:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr[2]) # Output: 3
3) Slicing of arrays
Definition: Getting and setting smaller subarrays within a larger array
Syntax:
array_name[start:stop:step]
Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
subset = arr[1:4] # Slicing to get elements from index 1 to 3
Explanation:
Slicing allows you to extract a portion of an array based on a start index, stop index
(exclusive), and an optional step size.
Example 2:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr[1:4]) # Output: [2 3 4]
4) Reshaping of arrays
Definition: Changing the shape of a given array
Syntax:
array_name.reshape(new_shape)
Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape((2, 3)) # Reshaping into a 2x3 array
Explanation:
reshape() changes the shape of the array to the specified new_shape. The total number of
elements in the original and reshaped arrays must be the same.
Example 2:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
reshaped_arr = arr.reshape((3, 2))
print(reshaped_arr)
# Output:
# [[1 2]
# [3 4]
# [5 6]]
Joining Arrays:
Syntax:
np.concatenate((array1, array2), axis=0)
Example 1:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
concatenated_arr = np.concatenate((arr1, arr2), axis=0) # Concatenating along the
first axis
Explanation:
np.concatenate() combines two or more arrays along a specified axis. In the example, we
concatenate two 1-dimensional arrays along the first axis (rows).
Example 2:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = np.concatenate((arr1, arr2))
print(result) # Output: [1 2 3 4 5 6]
Splitting Arrays:
Syntax:
np.split(array, indices_or_sections, axis=0)
Example 1:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, 2) # Splitting into two equal parts
Explanation:
np.split() splits an array into multiple subarrays along a specified axis. In the example, we
split a 1-dimensional array into two equal parts.
Example 2:
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
result = np.split(arr, 3)
print(result)
# Output:
# [array([1, 2]), array([3, 4]), array([5, 6])]
Aggregations:
There are many aggregation functions, such as:
min(), max(), sum(), mean(), argmin(), argmax(), percentile() (Q1, Q2, Q3), and more.
Aggregation functions in NumPy allow you to perform calculations on arrays to summarize their
data. They are useful for gaining insights into data, extracting statistical information, and more.
Here are some common aggregation functions along with examples:
***important_Note: The way the axis is specified here can be confusing to users coming from
other languages. The axis keyword specifies the dimension of the array that will be collapsed,
rather than the dimension that will be returned. So specifying axis=0 means that the first axis will
be collapsed: for two-dimensional arrays, this means that values within each column will be
aggregated.***
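A small demonstration of this axis behavior (the array values are illustrative):

import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print(np.sum(arr, axis=0))  # [5 7 9] - rows collapsed, one sum per column
print(np.sum(arr, axis=1))  # [ 6 15] - columns collapsed, one sum per row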
1. np.sum()
Syntax:
np.sum(array,axis =None)
Example:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
total_sum = np.sum(arr) # Sum of all elements in the array
print(total_sum) #output =21
Use:
np.sum() computes the sum of all elements in an array.
It can also calculate the sum along a specific axis (e.g., rows or columns) by specifying the
axis parameter.
2. np.mean()
Syntax:
np.mean(array, axis=None)
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
average = np.mean(arr) # Average of the elements
print (average) #output 3.0
Use:
np.mean() calculates the mean (average) of the elements in an array.
Similar to np.sum(), it can compute the mean along a specified axis.
3. np.min() and np.max()
Syntax:
np.min(array, axis=None)
np.max(array, axis=None)
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.min(arr)) # output: 1
print(np.max(arr)) # output: 5
Use:
np.min() finds the minimum value in an array.
np.max() finds the maximum value in an array.
Both functions can also work along a specific axis.
4. np.median()
Syntax:
np.median(array, axis=None)
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
median = np.median(arr) # Median of the elements
print(median) #output : 3.0
Use:
np.median() calculates the median of the elements in an array.
Median is the middle value in a sorted list of numbers and is useful for understanding the
central tendency of data.
5.np.argmin() and np.argmax()
These functions return the indices of the minimum and maximum values in an array, respectively.
Syntax:
np.argmin(array)
np.argmax(array)
Example:
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
min_index = np.argmin(arr) # Index of the minimum value
max_index = np.argmax(arr) # Index of the maximum value
print(f"Index of the minimum value: {min_index}")
print(f"Index of the maximum value: {max_index}")
output:
Index of the minimum value: 1
Index of the maximum value: 5
6.np.ptp()
The peak-to-peak (ptp) function calculates the range of values (maximum - minimum) in an array.
Syntax:
np.ptp(array)
Example:
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
range_val = np.ptp(arr) # Range of values
print(f"Range of values: {range_val}")
output:
Range of values: 8
7.np.percentile()
This function calculates the nth percentile of an array, which is a value below which a given
percentage of the data falls.
Syntax:
np.percentile(array, percentile)
Example:
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
percentile_25 = np.percentile(arr, 25) # 25th percentile
percentile_75 = np.percentile(arr, 75) # 75th percentile
print(f"25th percentile: {percentile_25}")
print(f"75th percentile: {percentile_75}")
output:
25th percentile: 2.5
75th percentile: 5.0
Aggregation functions in NumPy are essential for summarizing data and conducting statistical
analyses. They help in understanding the central tendency, spread, and distribution of data, making
them valuable tools in data analysis, scientific research, and various numerical computations.
8.Multidimensional aggregates
One common type of aggregation operation is an aggregate along a row or column.
Say you have some data stored in a two-dimensional array:
In[9]: M = np.random.random((3, 4))
print(M)
Out[9]: [[ 0.8967576 0.03783739 0.75952519 0.06682827]
[ 0.8354065 0.99196818 0.19544769 0.43447084]
[ 0.66859307 0.15038721 0.37911423 0.6687194 ]]
A sample program on Aggregation
import numpy as np
output:
Minimum temperature: 23°C
Maximum temperature: 33°C
Temperature range: 10°C
Median temperature: 26.5°C
25th percentile temperature: 25.25°C
75th percentile temperature: 28.75°C
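Only the import line and the printed output of this sample program survive above; a minimal sketch, with hypothetical temperature readings (not from the original slide) chosen so the statistics reproduce that output:

import numpy as np
# Hypothetical daily temperatures in °C
temps = np.array([23, 24, 25, 26, 26, 27, 28, 29, 31, 33])
print(f"Minimum temperature: {np.min(temps)}°C")
print(f"Maximum temperature: {np.max(temps)}°C")
print(f"Temperature range: {np.ptp(temps)}°C")
print(f"Median temperature: {np.median(temps)}°C")
print(f"25th percentile temperature: {np.percentile(temps, 25)}°C")
print(f"75th percentile temperature: {np.percentile(temps, 75)}°C")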
PANDAS:
Pandas is an open-source data manipulation and analysis library for Python. It provides easy-to-
use data structures (Pandas objects) and data analysis tools, making it one of the most popular
libraries for working with structured data, such as spreadsheets or SQL tables.
Key features of Pandas include:
Data structures: Pandas offers two primary data structures: Series (for 1D data) and
DataFrame (for 2D data).
Data cleaning and preparation: Pandas allows you to clean, transform, and prepare data
for analysis.
Data analysis: You can perform various data analysis tasks, including aggregation,
grouping, filtering, and more.
Data visualization: Pandas can work seamlessly with data visualization libraries like
Matplotlib and Seaborn.
At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays
in which the rows and columns are identified with labels rather than simple integer indices. As we will see
during the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top
of the basic data structures, but nearly everything that follows will require an understanding of what these
structures are. Thus, before we go any further, let's introduce these three fundamental Pandas data
structures: the Series, DataFrame, and Index.
We will start our code sessions with the standard NumPy and Pandas imports:
In [1]:import numpy as np
import pandas as pd
Series:
A Series is a one-dimensional array-like object that can hold various data types.
It is similar to a column in a spreadsheet or a single column in a SQL table.
Syntax:
import pandas as pd
series = pd.Series(data, index=index)
In [16]:pd.Series({2:'a', 1:'b', 3:'c'})
Out[16]: 2 a
1 b
3 c
dtype: object
In each case, the index can be explicitly set if a different result is preferred:
In [17]:pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])
Out[17]:3 c
2 a
dtype: object
In [7]:data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=['a', 'b', 'c', 'd'])
In [8]:data['b']
Out[8]:0.5
We can even use non-contiguous or non-sequential indices:
In [9]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
index=[2, 5, 3, 7])
data
Out[9]:
2 0.25
5 0.50
3 0.75
7 1.00
dtype: float64
In [10]:data[5]
Out[10]:0.5
EXAMPLE 2:
import pandas as pd
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print(series)
output:
0 1
1 2
2 3
3 4
4 5
dtype: int64
Example 3:
In [2]:data = pd.Series([0.25, 0.5, 0.75, 1.0])
data
Out[2]:
0 0.25
1 0.50
2 0.75
3 1.00
dtype: float64
As we see in the output, the Series wraps both a sequence of values and a sequence of indices,
which we can access with the values and index attributes. The values are simply a familiar
NumPy array:
In [3]:data.values
Out[3]:array([ 0.25, 0.5 , 0.75, 1. ])
The index is an array-like object of type pd.Index, which we'll discuss in more detail
momentarily.
In [4]:data.index
Out[4]:RangeIndex(start=0, stop=4, step=1)
Like with a NumPy array, data can be accessed by the associated index via the familiar Python
square-bracket notation:
In [5]:data[1]
Out[5]:0.5
In [6]:data[1:3]
Out[6]:1 0.50
2 0.75
dtype: float64
As we will see, though, the Pandas Series is much more general and flexible than the one-
dimensional NumPy array that it emulates.
Example 4: ##Series as specialized dictionary
In [11]:population_dict = {'California': 38332521,
'Texas': 26448193,
'New York': 19651127,
'Florida': 19552860,
'Illinois': 12882135}
population = pd.Series(population_dict)
population
Out[11]:
California 38332521
Florida 19552860
Illinois 12882135
New York 19651127
Texas 26448193
dtype: int64
By default, a Series will be created where the index is drawn from the sorted keys. From here,
typical dictionary-style item access can be performed:
In [12]:population['California']
Out[12]:38332521
Unlike a dictionary, though, the Series also supports array-style operations such as slicing:
In [13]:population['California':'Illinois']
Out[13]:
California 38332521
Florida 19552860
Illinois 12882135
dtype: int64
DataFrame:
A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular
data structure.
It is similar to a spreadsheet or a SQL table.
Syntax:
import pandas as pd
df = pd.DataFrame(data, columns=columns)
example 1:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})
print(df)
output:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
Example 2:
import pandas as pd
data = {'Name': ['John', 'Jane', 'Jim', 'Jill'],
        'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
print(df)
output:
Name Age
0 John 25
1 Jane 30
2 Jim 35
3 Jill 40
***Note: If a Series is an analog of a one-dimensional array with flexible indices, a DataFrame is an
analog of a two-dimensional array with both flexible row indices and flexible column names. Just as you
might think of a two-dimensional array as an ordered sequence of aligned one-dimensional columns, you
can think of a DataFrame as a sequence of aligned Series objects. Here, by "aligned" we mean that they share
the same index.
Example 3:
in[18]:area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
area
Out[18]:
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
dtype: int64
Now that we have this along with the population Series from before, we can use a dictionary to
construct a single two-dimensional object containing this information:
In [19]:states = pd.DataFrame({'population': population,
'area': area})
states
Out[19]:
              area  population
California  423967    38332521
Florida     170312    19552860
Illinois    149995    12882135
New York    141297    19651127
Texas       695662    26448193
In [20]:states.index
Out[20]:
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')
Additionally, the DataFrame has a columns attribute, which is an Index object holding the column
labels:
In [21]: states.columns
Out[21]:Index(['area', 'population'], dtype='object')
Thus the DataFrame can be thought of as a generalization of a two-dimensional NumPy array,
where both the rows and columns have a generalized index for accessing the data.
In [22]:states['area']
Out[22]:
California 423967
Florida 170312
Illinois 149995
New York 141297
Texas 695662
Name: area, dtype: int64
Notice the potential point of confusion here: in a two-dimensional NumPy array, data[0] will
return the first row. For a DataFrame, data['col0'] will return the first column. Because of this, it is
probably better to think about DataFrames as generalized dictionaries rather than generalized
arrays, though both ways of looking at the situation can be useful. We'll explore more flexible
means of indexing DataFrames in Data Indexing and Selection.
Given a two-dimensional array of data, we can create a DataFrame with any specified column and
index names. If omitted, an integer index will be used for each:
In [27]:pd.DataFrame(np.random.rand(3, 2),
columns=['foo', 'bar'],
index=['a', 'b', 'c'])
Out[27]:
foo bar
a 0.865257 0.213169
b 0.442759 0.108267
c 0.047110 0.905718
The Pandas Index Object
We have seen here that both the Series and DataFrame objects contain an explicit index that lets you
reference and modify data. This Index object is an interesting structure in itself, and it can be
thought of either as an immutable array or as an ordered set (technically a multi-set,
as Index objects may contain repeated values). Those views have some interesting consequences in
the operations available on Index objects. As a simple example, let's construct an Index from a list
of integers:
In [29]:ind = pd.Index([2, 3, 5, 7, 11])
Out[29]:Int64Index([2, 3, 5, 7, 11], dtype='int64')
The Index in many ways operates like an array. For example, we can use standard Python
indexing notation to retrieve values or slices:
In [31]:ind[1]
Out[31]:3
In [32]:ind[::2]
Out[32]:Int64Index([2, 5, 11], dtype='int64')
Index objects also have many of the attributes familiar from NumPy arrays:
In [33]:print(ind.size, ind.shape, ind.ndim, ind.dtype)
5 (5,) 1 int64
One difference between Index objects and NumPy arrays is that indices are immutable–that is,
they cannot be modified via the normal means:
In [34]:ind[1] = 0
Out[34]:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-40e631c82e8a> in <module>()
----> 1 ind[1] = 0
This immutability makes it safer to share indices between multiple DataFrames and arrays, without
the potential for side effects from inadvertent index modification.
Pandas objects are designed to facilitate operations such as joins across datasets, which depend
on many aspects of set arithmetic. The Index object follows many of the conventions used by
Python's built-in set data structure, so that unions, intersections, differences, and other
combinations can be computed in a familiar way:
In [35]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
In [36]:indA & indB  # intersection
Out[36]:Int64Index([3, 5, 7], dtype='int64')
1)Selection by Label:
You can use labels (column or index names) to select data.
Syntax for DataFrame:
df.loc[row_label, column_label]
Example:
# Selecting a specific cell by label
value = df.loc[1, 'Age']
2)Selection by Position:
You can use integer-based positions to select data.
Syntax for DataFrame:
df.iloc[row_position, column_position]
Example:
# Selecting a specific cell by position
value = df.iloc[1, 1]
3)Boolean Indexing:
You can use boolean conditions to filter data.
Syntax for DataFrame:
df[df['Column_Name'] < value]
Example:
# Selecting rows where Age is less than 30
filtered_df = df[df['Age'] < 30]
DATA VISUALIZATIONS
Visualisation: Simple Line Plots, Simple Scatter Plots, Histograms, Binnings, and Density.
Simple Line Plots:
Definition: Line plots display data points connected by straight lines; they are useful
for showing the trend of one variable against another.
Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.show()
output: (line plot figure)
Simple Scatter Plots:
Definition: Scatter plots are used to visualize individual data points as dots on a 2D
plane. They are useful for displaying relationships between two continuous
variables.
Or
Scatter plots are used to visualize the relationship between two continuous variables.
Each data point is represented as a dot.
Syntax:import matplotlib.pyplot as plt
plt.scatter(x_values, y_values)
plt.title("Title")
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.show()
Example 1:
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [10, 14, 8, 15, 12]
plt.scatter(x, y)
plt.title("Simple Scatter Plot")
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.show()
Example 2:
import matplotlib.pyplot as plt
import numpy as np
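The body of Example 2 did not survive; a minimal sketch of what a NumPy-based scatter plot typically looks like (the random data are purely illustrative):

import matplotlib.pyplot as plt
import numpy as np

# Random points, purely illustrative
x = np.random.randn(100)
y = np.random.randn(100)
plt.scatter(x, y)
plt.title("Scatter Plot of Random Data")
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.show()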
Binnings:
Definition: Binnings refer to the division of data into discrete intervals (bins) in a
histogram or bar chart. Binning helps in understanding the distribution of data.
Or
Binnings refer to the process of dividing data into bins
Syntax :
import matplotlib.pyplot as plt
plt.hist(data, bins=bin_edges, edgecolor='k')
plt.title("Title")
plt.xlabel("X Label")
plt.ylabel("Frequency")
plt.show()
Example :
import matplotlib.pyplot as plt
import numpy as np
# Sample data (random values)
data = np.random.randn(1000)
# Specifying custom bin edges
bin_edges = [-3, -2, -1, 0, 1, 2, 3]
# Creating a histogram with custom bins
plt.hist(data, bins=bin_edges, edgecolor='k')
plt.title("Histogram with Custom Bins")
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.show()
output: (histogram figure)
Density Plots:
Definition: Density plots are used to estimate the probability density function of a
continuous random variable. They provide a smoother representation of data
distribution compared to histograms.
Or
Density plots represent the distribution of data in a smoothed manner.
Syntax:
import matplotlib.pyplot as plt
import seaborn as sns
sns.kdeplot(data, shade=True)
plt.title("Title")
plt.xlabel("X Label")
plt.ylabel("Density")
plt.show()
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Sample data (random values)
data = np.random.randn(1000)
# Creating a density plot
sns.kdeplot(data, shade=True)
plt.title("Density Plot")
plt.xlabel("Values")
plt.ylabel("Density")
plt.show()
output:
Note: These visualizations provide different ways to explore and represent data.
Line plots show trends, scatter plots reveal relationships, histograms display
distributions, and density plots offer smoothed distributions. They are essential
tools in data analysis and visualization.
Experiment or assignment :Binnings and Density:
Binnings refer to the process of dividing data into bins, while density plots
represent the distribution of data in a smoothed manner.
Syntax (Density Plot):
import matplotlib.pyplot as plt
import seaborn as sns
sns.kdeplot(data, shade=True, label='label_name')
plt.xlabel('x_label')
plt.ylabel('Density')
plt.title('Density Plot Title')
plt.legend()
plt.show()
Example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30, density=True, alpha=0.5)
sns.kdeplot(data, shade=True, label='density')
plt.legend()
plt.show()
(Diagram: inserting a node at the end of a singly linked list — each node stores a data value and the address of the next node.)
Inserting a Node at the End of a Singly Linked List
1. Create a new node.
2. The new node's next pointer points to NULL.
3. The last node's next pointer points to the new node.
1. Deleting the First Node in Singly Linked List
Algorithm
Step-1: If header==NULL
print “list is empty”
goto step 5
Step-2: Set ptr = header
Step-3: Set header = header->NEXT
Step-4: Free Ptr
Step-5: stop
2. Deleting the Last Node in Singly Linked List
Algorithm
Step-1: If header == NULL
print “list is empty”
goto Step-6
Step-2: If header->NEXT == NULL, free header, set header = NULL and goto Step-6
Step-3: Set ptr = header
Step-4: Repeat while ptr->NEXT->NEXT != NULL
Set ptr = ptr->NEXT
[END OF LOOP]
Step-5: Free ptr->NEXT and set ptr->NEXT = NULL
Step-6: Stop
Traversing a Singly Linked List
• Traversing a linked list means accessing the nodes of the list in order to perform
some processing on them.
• Example: Displaying the contents of Linked list, Counting number of nodes in the
linked list, etc..
Algorithm for Traversing a SLL:
Step -1: ptr = header
Step-2: Repeat Steps-3 and 4 while ptr != NULL
Step-3: Apply Process on ptr ->data
Step-4: Set ptr = ptr->next
[END OF LOOP]
Step 5: Stop
Code for Displaying Linked List:
if (header == NULL)
print “List is empty”;
else
for (ptr = header; ptr != NULL; ptr = ptr->next)
print ptr->data;
Searching for a Value in a Linked List
• Searching a linked list means to find a particular element in the linked list.
• A linked list consists of nodes with two parts, the data part and the pointer part.
• In linked list, searching means finding whether a given value is present in the data
part of the node or not. If it is present, then return the address of the node that
contains the search value.
Algorithm to search for a value in Linked list
Step-1: ptr = header
Step-2: Repeat Step-3 while ptr != NULL
Step-3: IF val = ptr->data
SET pos = ptr // val is found in the list.
Go To Step 5
ELSE
SET ptr = ptr->next
[END OF IF]
[END OF LOOP]
Step-4: SET ptr = NULL // val is not found in the linked list.
Step-5: return ptr and stop
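A minimal Python sketch of this search (the Node fields mirror the singly linked lists used later in this unit; the names are assumptions):
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def search(header, val):
    ptr = header                 # Step-1: start at the header
    while ptr is not None:       # Step-2: repeat while ptr != NULL
        if ptr.data == val:      # Step-3: val found, return the node
            return ptr
        ptr = ptr.next
    return None                  # Step-4: val is not in the list

# usage: build 1 -> 2 -> 3 and search for 2
head = Node(1); head.next = Node(2); head.next.next = Node(3)
found = search(head, 2)
print(found.data if found else "not found")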
Doubly linked list (or) Two-way Linked List
• In a singly linked list we can move from the header node to any node in one direction only (left-right).
• A doubly linked list is a two-way list because one can move in either direction. That is, either from left to
right or from right to left.
• It contains a pointer to the next as well as to the previous node in the sequence. Therefore, it consists of three
parts—data, a pointer to the next node, and a pointer to the previous node
Where, DATA field stores the element or data,
PREVIOUS field contains the address of its previous node, NEXT field contains the address of its next node.
Operations on a doubly linked list:
• Insertion
• Deletion
• Traverse
• Search
Data structure:
Data Structures are a way of organizing data so that it can be accessed more efficiently depending upon
the situation. Data Structures are fundamentals of any programming language around which a program
is built.
General data Structure types include arrays, files, linked lists, stacks, queues, trees,graphs and so on…
Linear data structure: Data structure where data elements are arranged sequentially or linearly where
each and every element is attached to its previous and next adjacent is called a linear data structure.
Non-linear data structure: Elements of this data structure are stored/accessed in a non-linear order.
In general, user-defined data types are defined along with their operations.
To simplify the process of solving problems, we combine the data structures with their operations and
we call these Abstract Data Types (ADTs). An ADT consists of two parts:
1. Declaration of data
2. Declaration of operations
Commonly used ADTs include: Linked Lists, Stacks, Queues, Priority Queues, Binary Trees, Dictionaries,
Disjoint Sets (Union and Find), Hash Tables, Graphs, and many others.
For example, a stack uses a LIFO (Last-In First-Out) mechanism while storing the data: the last element
inserted into the stack is the first element that gets deleted. Common operations on it are:
creating the stack, pushing an element onto the stack, popping an element from the stack, finding the
current top of the stack, finding the number of elements in the stack, etc.
While defining ADTs, do not worry about the implementation details; they come into the picture only
when we want to use them. Different kinds of ADTs are suited to different kinds of
applications, and some are highly specialized to specific tasks.
ALGORITHM:
Analysis of algorithm: The goal of the analysis of algorithms is to compare algorithms (or solutions)
mainly in terms of running time but also in terms of other factors (e.g., memory, developer effort, etc.)
Input size is the number of elements in the input, and depending on the problem type, the input may be
of different types. The following are the common types of inputs.
• Size of an array
• Polynomial degree
The rate at which the running time increases as a function of input is called the rate of growth.
TYPES OF ANALYSIS:
To analyze the given algorithm, we need to know with which inputs the algorithm takes less time
(performs well) and with which inputs the algorithm takes a long time. We have already seen that an
algorithm can be represented in the form of an expression. That means we represent the algorithm with
multiple expressions: one for the case where it takes less time and another for the case where it takes
more time.
• Worst case
○ Defines the input for which the algorithm takes a long time.
○ Input is the one for which the algorithm runs the slowest.
• Best case
○ Defines the input for which the algorithm takes the least time.
○ Input is the one for which the algorithm runs the fastest.
• Average case
○ Provides a prediction about the running time of the algorithm on a random input.
Asymptotic Notations :
Asymptotic notations are mathematical tools to represent the time complexity of algorithms for
asymptotic analysis.
Theta notation encloses the function from above and below. Since it represents the upper and
the lower bound of the running time of an algorithm, it is used for analyzing the average-case
complexity of an algorithm.
Theta (average case): you add the running times for each possible input combination and take
the average.
Let g and f be the function from the set of natural numbers to itself. The function f is said to be
Θ(g), if there are constants c1, c2 > 0 and a natural number n0 such that c1* g(n) ≤ f(n) ≤ c2 *
g(n) for all n ≥ n0
Note: Θ provides exact bound
Big-O Notation (O-notation): Big-O notation represents the upper bound of the running time of an
algorithm. The execution time serves as an upper bound on the algorithm’s time complexity.
Mathematical Representation of Big-O Notation:
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n0 }
Omega Notation (Ω-notation): Omega notation represents the lower bound of the running time of an
algorithm. The execution time serves as a lower bound on the algorithm’s time complexity.
It is defined as the condition that allows an algorithm to complete statement execution in the shortest
amount of time.
Mathematical Representation of Omega notation :
Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n0 }
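As a worked illustration of these bounds: f(n) = 3n + 2 is O(n), since choosing c = 4 and n0 = 2 gives 3n + 2 ≤ 4n for all n ≥ 2; the same f(n) is Ω(n) with c = 3 (because 3n ≤ 3n + 2 for all n ≥ 1), and therefore f(n) is Θ(n).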
1. General Properties:
If f(n) is O(g(n)) then a*f(n) is also O(g(n)), where a is a constant.
Example:
f(n) = 2n² + 5 is O(n²); then 7*f(n) = 14n² + 35 is also O(n²).
2. Transitive Properties:
If f(n) is O(g(n)) and g(n) is O(h(n)) then f(n) is O(h(n)).
Example:
If f(n) = n, g(n) = n² and h(n) = n³,
then n is O(n²) and n² is O(n³), so n is O(n³).
3. Reflexive Properties:
If f(n) is given then f(n) is O(f(n)). Since MAXIMUM VALUE OF f(n) will be f(n) ITSELF!
Example:
f(n) = n²; then f(n) is O(n²).
4. Symmetric Properties:
If f(n) is Θ(g(n)) then g(n) is Θ(f(n)).
Example:
If f(n) = n² and g(n) = n², then f(n) is Θ(n²) and g(n) is Θ(n²).
5. Transpose Symmetric Properties:
If f(n) is O(g(n)) then g(n) is Ω(f(n)).
Example:
If f(n) = n, g(n) = n², then n is O(n²) and n² is Ω(n).
Some more properties:
1. If f(n) = O(g(n)) and f(n) = Ω(g(n)) then f(n) = Θ(g(n)).
2. If f(n) = O(g(n)) and d(n) = O(e(n)) then f(n) + d(n) = O( max( g(n), e(n) ))
Example:
f(n) = n, i.e. O(n); d(n) = n², i.e. O(n²); then f(n) + d(n) = n + n², i.e. O(n²).
3. If f(n) = O(g(n)) and d(n) = O(e(n)) then f(n) * d(n) = O(g(n) * e(n)).
Example:
f(n) = n, i.e. O(n); d(n) = n², i.e. O(n²); then f(n) * d(n) = n * n² = n³, i.e. O(n³).
SORTING TECHNIQUES:
Selection sort:
Selection sort is a simple and efficient sorting algorithm that works by repeatedly selecting the smallest
(or largest) element from the unsorted portion of the list and moving it to the sorted portion of the list.
Code:
def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        # Assume the first unsorted element is the minimum
        min_index = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        # Swap the minimum element with the first element in the unsorted portion
        arr[i], arr[min_index] = arr[min_index], arr[i]

# Example usage
if __name__ == "__main__":
    arr = [64, 25, 12, 22, 11]
    selection_sort(arr)
    print("Sorted array:", arr)
Time Complexity: The time complexity of Selection Sort is O(N²) as there are two nested loops:
• One loop to select an element of the array one by one = O(N)
• Another loop to compare that element with every other array element = O(N)
• Therefore, overall complexity = O(N) * O(N) = O(N²)
MERGE SORT:
In simple terms, we can say that the process of merge sort is to divide the array into two halves, sort
each half, and then merge the sorted halves back together. This process is repeated until the entire array
is sorted
def mergeSort(arr):
    if len(arr) > 1:
        # Finding the mid of the array
        mid = len(arr)//2
        # Dividing the array elements into 2 halves
        L = arr[:mid]
        R = arr[mid:]
        # Sorting the first half
        mergeSort(L)
        # Sorting the second half
        mergeSort(R)
        i = j = k = 0
        # Copy data back from the temp arrays L[] and R[]
        while i < len(L) and j < len(R):
            if L[i] <= R[j]:
                arr[k] = L[i]
                i += 1
            else:
                arr[k] = R[j]
                j += 1
            k += 1
        # Checking if any element was left in L
        while i < len(L):
            arr[k] = L[i]
            i += 1
            k += 1
        # Checking if any element was left in R
        while j < len(R):
            arr[k] = R[j]
            j += 1
            k += 1

def printList(arr):
    for i in range(len(arr)):
        print(arr[i], end=" ")
    print()

# Driver Code
if __name__ == '__main__':
    arr = [12, 11, 13, 5, 6, 7]
    print("Given array is")
    printList(arr)
    mergeSort(arr)
    print("\nSorted array is")
    printList(arr)
OUTPUT:
Given array is
12 11 13 5 6 7
Sorted array is
5 6 7 11 12 13
Time Complexity: O(N log(N)). Merge Sort is a recursive algorithm and its time complexity can be
expressed as the following recurrence relation: T(n) = 2T(n/2) + Θ(n).
RADIX SORT:
Rather than comparing elements directly, Radix Sort distributes the elements into buckets based on each
digit’s value. By repeatedly sorting the elements by their significant digits, from the least significant to
the most significant, Radix Sort achieves the final sorted order.
def countingSort(arr, exp):
    n = len(arr)
    output = [0] * n          # holds the sorted order for the current digit
    count = [0] * 10          # digit counts 0..9
    # Store the count of occurrences of each digit
    for i in range(n):
        index = arr[i] // exp
        count[index % 10] += 1
    # Prefix sums: count[i] now holds the final position of this digit
    for i in range(1, 10):
        count[i] += count[i - 1]
    # Build the output array (traverse from the end to keep the sort stable)
    i = n - 1
    while i >= 0:
        index = arr[i] // exp
        output[count[index % 10] - 1] = arr[i]
        count[index % 10] -= 1
        i -= 1
    # Copy the output array back to arr
    for i in range(n):
        arr[i] = output[i]

def radixSort(arr):
    max1 = max(arr)
    exp = 1
    # Do counting sort for every digit, least significant first
    while max1 // exp > 0:
        countingSort(arr, exp)
        exp *= 10

# Driver code
arr = [170, 45, 75, 90, 802, 24, 2, 66]
# Function Call
radixSort(arr)
for i in range(len(arr)):
    print(arr[i], end=" ")
Time Complexity:
Radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping
the keys by the individual digits which share the same significant position and value. It has a time
complexity of O(d * (n + b)), where d is the number of digits, n is the number of elements, and b is the
base of the number system being used.
In practical implementations, radix sort is often faster than other comparison-based sorting algorithms,
such as quicksort or merge sort, for large datasets, especially when the keys have many digits. However,
its time complexity grows linearly with the number of digits, and so it is not as efficient for small
datasets.
QUICK SORT:
The key process in quickSort is a partition(). The target of partitions is to place the pivot (any element can
be chosen to be a pivot) at its correct position in the sorted array and put all smaller elements to the left
of the pivot, and all greater elements to the right of the pivot.
Partition is done recursively on each side of the pivot after the pivot is placed in its correct position and
this finally sorts the array.
Choice of Pivot: the pivot can be chosen in different ways – always pick the first element, always pick the
last element, pick a random element, or pick the median.
Partition Algorithm:
The logic is simple, we start from the leftmost element and keep track of the index of smaller (or equal)
elements as i. While traversing, if we find a smaller element, we swap the current element with arr[i].
Otherwise, we ignore the current element.
def partition(array, low, high):
    # Choose the rightmost element as pivot
    pivot = array[high]
    # Pointer for the greater element
    i = low - 1
    # Compare each element with the pivot
    for j in range(low, high):
        if array[j] <= pivot:
            i = i + 1
            # Swapping element at i with element at j
            array[i], array[j] = array[j], array[i]
    # Put the pivot in its correct position
    array[i + 1], array[high] = array[high], array[i + 1]
    # Return the partition position
    return i + 1

def quicksort(array, low, high):
    if low < high:
        pi = partition(array, low, high)
        quicksort(array, low, pi - 1)
        quicksort(array, pi + 1, high)

# Driver code
if __name__ == '__main__':
    array = [10, 7, 8, 9, 1, 5]
    n = len(array)
    # Function call
    quicksort(array, 0, n - 1)
    print('Sorted array:')
    for x in array:
        print(x, end=" ")
Output
Sorted array:
1 5 7 8 9 10
Time Complexity:
Best case: the best-case scenario for quicksort occurs when the pivot chosen at each step divides the array
into roughly equal halves. In this case, the algorithm makes balanced partitions, leading to efficient
sorting: O(N log N).
Average case: quicksort’s average-case performance, O(N log N), is usually very good in practice, making it
one of the fastest sorting algorithms.
Worst case: the worst-case scenario for quicksort occurs when the pivot at each step consistently results in
highly unbalanced partitions, giving O(N²) – for example, when the array is already sorted and the pivot is
always chosen as the smallest or largest element. To mitigate the worst case, various techniques are used,
such as choosing a good pivot (e.g., median of three) and using a randomized algorithm (Randomized
Quicksort) to shuffle the elements before sorting.
Auxiliary Space: O(1) if we don’t consider the recursive stack space; if we do, then in the worst case
quicksort could use O(N) space.
Selection Sort
Algorithm
1. Find the minimum value in the list
2. Swap it with the value in the current position
3. Repeat this process for all the elements until the entire array is sorted
This algorithm is called selection sort since it repeatedly selects the smallest element.
Example:
(Slide-by-slide trace omitted: in each iteration, the index of the minimum element (min_pos) in the
unsorted portion is found and that element is swapped into the current position; after iteration k, the
first k elements of the array are in sorted order. The trace runs through iterations 1 to 6.)
Selection Sort
Time Complexity: O(N²)
MergeSort Algorithm
Compare 12 and 11
First: (12, 16, 17, 20, 21, 27)
Second: (12, 19)
New: (9, 10, 11)
Compare 12 and 12
First: (16, 17, 20, 21, 27)
Second: (12, 19)
New: (9, 10, 11, 12)
Compare 16 and 12
First: (16, 17, 20, 21, 27)
Second: (19)
New: (9, 10, 11, 12, 12)
Compare 16 and 19
First: (17, 20, 21, 27)
Second: (19)
New: (9, 10, 11, 12, 12, 16)
Compare 17 and 19
First: (20, 21, 27)
Second: (19)
New: (9, 10, 11, 12, 12, 16, 17)
Compare 20 and 19
First: (20, 21, 27)
Second: ()
New: (9, 10, 11, 12, 12, 16, 17, 19)
Merge-Sort Tree
An execution of merge-sort is depicted by a binary tree:
– each node represents a recursive call of merge-sort and stores
• the unsorted sequence before the execution and its partition
• the sorted sequence at the end of the execution
(Merge-sort tree and execution-example slides omitted: they trace merge-sort on the sequence
7 2 9 4 | 3 8 6 1 — partition, recursive calls down to one-element sequences, and merges back up
through 2 7 | 4 9 | 3 8 | 1 6, then 2 4 7 9 | 1 3 6 8, to the final sorted sequence 1 2 3 4 6 7 8 9.)
Mergesort Analysis
• Let T(N) be the running time for an array of N
elements
• Mergesort divides array in half and calls itself
on the two halves. After returning, it merges
both halves using a temporary array
• Each recursive call takes T(N/2) and merging
takes O(N)
Mergesort Recurrence Relation
• The recurrence relation for T(N) is:
– T(1) < a
• base case: 1 element array constant time
– T(N) < 2T(N/2) + bN
• Sorting N elements takes
– the time to sort the left half
– plus the time to sort the right half
– plus an O(N) time to merge the two halves
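Expanding this recurrence gives the familiar bound (a sketch): T(N) < 2T(N/2) + bN < 4T(N/4) + 2bN < ... < 2^k T(N/2^k) + k·bN. The halving stops after k = log₂N levels, at which point T(N) < aN + bN·log₂N, i.e. T(N) = O(N log N).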
Idea : Partition array into items that are “small” and items that are “large”,
then recursively sort the two sets
Quick Sort
Implementation
Example
https://learnprogramo.com/quick-sort-programs-in-c/
Example
This completes iteration one:
31 26 20 17 44 54 77 55 93
Now the left array contains all elements < 54 and the right array contains all elements > 54.
The two sides are then sorted recursively:
17 26 20 31 44 54 77 55 93
17 20 26 31 44 54 77 55 93
17 20 26 31 44 54 55 77 93
Quick Sort
Analysis
Let us assume that T(n) be the complexity of Quicksort and also assume that all elements
are distinct.
Recurrence for T(n) depends on two subproblem sizes which depend on partition element.
If pivot is ith smallest element then exactly (i–1) items will be in left part and (n–i) in right
part.
Let us call it an i-split. Since each element has an equal probability of being selected as pivot, the
probability of selecting the ith element is 1/n.
Best case (balanced splits): T(n) = 2T(n/2) + n = O(n log n)
Quick Sort
Worst Case: Each partition gives unbalanced splits (one side empty, the other with n–1 elements),
giving the recurrence T(n) = T(n–1) + n = O(n²). The worst case occurs when the list is already sorted
and the last element is chosen as pivot.
Average Case: In the average case of quicksort, we do not know where the split
happens. For this reason, we take all possible values of split locations, add all
their complexities and divide by n to get the average-case complexity.
Radix Sort
– radix is a synonym for base (base 10, base 2)
• A multi-pass sorting algorithm that only looks at individual digits during each pass
• Uses queues as buckets to store elements:
– create an array of 10 queues
– starting with the least significant digit, place each value in the queue that matches that digit
– empty the queues back into the array in order
– repeat, moving to the next least significant digit
Conversion of an Infix Expression into a Prefix Expression
• Step-1: Reverse the infix string. Note that while reversing the
string you must interchange left and right parentheses.
• Step-2: Obtain the postfix expression of the infix expression
obtained in Step 1.
• Step-3: Reverse the postfix expression to get prefix
expression
• Step-4: Exit
• Example:
• Given Infix Expression: (A – B/ C) * (A / K – L)
• Step 1: Reverse the infix string.
(L – K / A) * (C / B – A)
• Step 2: Obtain the corresponding postfix expression of the infix expression .
postfix expression of (L – K / A) * (C / B – A) is
L K A / – C B / A – *
• Step 3: Reverse the postfix expression to get the prefix expression
* – A / B C – / A K L
• Hence, the prefix expression of (A – B / C) * (A / K – L) is
* – A / B C – / A K L
Example 2: (A+B^C)*D+E^5
Ans: +*+A^BCD^E5
Evaluation of a Prefix Expression
Step-1: Create an empty stack.
Step-2: Scan Prefix expression from right to left and Repeat steps 3 and 4 for each element of the
expression LOOP
Step-3: If the scanned character is an operand then
PUSH it onto the stack.
Step-4: If the scanned character is an operator op1 then
1. Remove top two elements of stack, where A is the top and B is the next top element.
2. Evaluate, A op1 B
3. PUSH the result of evaluation onto the stack.
[END OF IF]
[END OF LOOP]
Step-5: Set the RESULT as the topmost value of the stack.
Step-6: Exit
Evaluate the Prefix Expression
- + 8 / 6 3 2 (Ans: (8 + 6/3) – 2 = 8)
Evaluate the Prefix Expression /*20*50+3 6 300 2
+ - 2 7 * 8 / 4 12
Ans: 28
Conversion of an Infix Expression into a Postfix Expression
• An algebraic expression written in infix notation may contain parentheses,
operands, and operators.
• The order of evaluation of the operators in the Infix expression can be changed by
the use of parentheses.
Properties
Order of the numbers (or operands) is unchanged but order of operators may be
changed.
Example: Let us consider the infix expression 2 + 3*4 and its postfix equivalent
234*+. Notice that between infix and postfix the order of the numbers (or operands)
is unchanged. It is 2 3 4 in both cases. But the order of the operators * and + is
affected in the two expressions.
Only one stack is enough to convert an infix expression to postfix expression.
This stack will be used to change the order of operators from infix to postfix.
This stack will only contain operators and the Left parenthesis symbol ‘(‘.
Algorithm to convert Infix to Postfix
Example Infix Expression: A + (B*C – (D/E^F)*G)*H, where ^ is an exponentiation operator.
Let X be an arithmetic expression written in infix
notation. This algorithm finds the equivalent postfix
expression Y.
1.Push “(“onto Stack, and add “)” to the end of X.
2.Scan X from left to right and repeat Step 3 to 6 for
each element of X until the Stack is empty.
3.If an operand is encountered, add it to Y.
4.If a left parenthesis is encountered, push it onto
Stack.
5.If an operator is encountered ,then:
1. Repeatedly pop from Stack and add to Y each
operator (on the top of Stack) which has the
same precedence as or higher precedence
than operator.
2. Add operator to Stack.
[End of If]
6.If a right parenthesis is encountered ,then:
1. Repeatedly pop from Stack and add to Y each
operator (on the top of Stack) until a left
parenthesis is encountered.
2. Remove the left Parenthesis.
[End of If]
7.END.
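To connect the algorithm to code, here is a minimal Python sketch of it (the precedence table and the demo expression are assumptions; as in the algorithm above, operators of equal or higher precedence are popped first):
def infix_to_postfix(X):
    prec = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
    stack = ['(']                # Step 1: push '(' onto the stack
    X = X + ')'                  # ... and add ')' to the end of X
    Y = []
    for token in X:              # Step 2: scan X from left to right
        if token.isalnum():      # Step 3: operand -> add to Y
            Y.append(token)
        elif token == '(':       # Step 4: left parenthesis -> push
            stack.append(token)
        elif token in prec:      # Step 5: operator
            while stack[-1] in prec and prec[stack[-1]] >= prec[token]:
                Y.append(stack.pop())
            stack.append(token)
        elif token == ')':       # Step 6: pop until a left parenthesis
            while stack[-1] != '(':
                Y.append(stack.pop())
            stack.pop()          # remove the '('
    return ''.join(Y)

print(infix_to_postfix('A+(B*C-(D/E^F)*G)*H'))   # ABC*DEF^/G*-H*+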
Evaluation of a Postfix(reverse polish notation) Expression
Step-1: Create an empty stack.
Step-2: Scan expression from left to right and Repeat steps 3 and 4 for each element of the expression
LOOP
Step-3: If the scanned character is an operand then
PUSH it onto the stack.
Step-4: If the scanned character is an operator op1 then
1. Remove top two elements of stack, where A is the top and B is the next top element.
2. Evaluate, B op1 A
3.PUSH the result of evaluation onto the stack.
[END OF IF]
[END OF LOOP]
Step-5: Set the RESULT as the topmost value of the stack.
Step-6: Exit
Example: 2 10 + 9 6 - / (evaluates to (2 + 10) / (9 – 6) = 4)
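A small Python sketch of this evaluation procedure (the operator set and the sample string are assumptions):
def eval_postfix(expr):
    stack = []
    for token in expr.split():
        if token in ('+', '-', '*', '/'):
            a = stack.pop()              # A: top element
            b = stack.pop()              # B: next top element
            # Step-4: evaluate B op A and push the result
            if token == '+': stack.append(b + a)
            elif token == '-': stack.append(b - a)
            elif token == '*': stack.append(b * a)
            else: stack.append(b / a)
        else:
            stack.append(float(token))   # operand: push onto the stack
    return stack[0]                      # RESULT: topmost value

print(eval_postfix('2 10 + 9 6 - /'))   # 4.0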
Evaluate the Postfix Expression
9 3 4 * 8 + 4 / –
• Convert infix to postfix expression
• 1.(A-B)*(D/E)
• 2.(A+B^D)/(E-F)+G
• 3.A*(B+D)/E-F*(G+H/K)
• 4.((A+B)*D)^(E-F)
• 5.(A-B)/((D+E)*F)
• 6.((A+B)/D)^((E-F)*G)
• 7.12/(7-3)+2*(1+5)
• 8.5+3^2-8/4*3+6
• 9.6+2^3+9/3-4*5
• 10.6+2^3^2-4*5
• Evaluate the postfix expression ( , is the separator )
• 1.5,3,+,2,*,6,9,7,-,/,-
• 2.3,5,+,6,4,-,*,4,1,-,2,^,+
• 3. 3,1,+,2,^,7,4,1,-,2,*,+,5,-
Array representation of Queue
-> Every queue has front and rear variables that point to the position from where deletions and insertions
can be done, respectively.
-> Before inserting an element in a queue, we must check for overflow conditions. An overflow will occur
when we try to insert an element into a queue that is already full. When REAR = MAX – 1, where MAX is
the size of the queue, we have an overflow condition
1. Algorithm to insert an element in a queue
Step 1: IF REAR = MAX-1
Write OVERFLOW
Goto Step 4
[END OF IF]
Step 2: IF FRONT = -1 and REAR = -1
SET FRONT = REAR = 0
ELSE
SET REAR = REAR + 1
[END OF IF]
Step 3: SET QUEUE[REAR] = NUM
Step 4: EXIT
Explanation of Algorithm:
In Step 1, we first check for the overflow condition.
In Step 2, we check if the queue is empty. In case the queue is empty, then both FRONT and REAR are set
to zero, so that the new value can be stored at the 0th location. Otherwise, if the queue already has some
values, then REAR is incremented so that it points to the next location in the array.
In Step 3, the value is stored in the queue at the location pointed to by REAR.
-> Before deleting an element from a queue, we must check for underflow conditions. An underflow
condition occurs when we try to delete an element from a queue that is already empty. If FRONT = –1
and REAR = –1, it means there is no element in the queue.
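A brief Python sketch of these array-queue operations (MAX and the variable names mirror the algorithm above; the class itself is an assumption for illustration):
MAX = 5

class ArrayQueue:
    def __init__(self):
        self.queue = [None] * MAX
        self.front = -1
        self.rear = -1

    def insert(self, num):
        if self.rear == MAX - 1:                   # Step 1: overflow check
            print("OVERFLOW")
            return
        if self.front == -1 and self.rear == -1:   # Step 2: empty queue
            self.front = self.rear = 0
        else:
            self.rear += 1
        self.queue[self.rear] = num                # Step 3: store the value

    def delete(self):
        if self.front == -1 and self.rear == -1:   # underflow check
            print("UNDERFLOW")
            return None
        val = self.queue[self.front]
        if self.front == self.rear:                # last element removed -> reset
            self.front = self.rear = -1
        else:
            self.front += 1
        return val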
• Consider a scenario in which two successive deletions are made. Even though there is space available, the
overflow condition still exists because the condition rear = MAX – 1 still holds true. This is a major
drawback of a linear queue.
• To resolve this problem, we have two solutions. First, shift the elements to the left so that the vacant space
can be occupied and utilized efficiently. But this can be very time-consuming, especially when the queue is
quite large.
• The second option is to use a circular queue. In the circular queue, the first index comes right after the last
index.
• Queue is called circular when the last room comes just before the first room. That is, Q[0] comes after Q[n-1].
• The circular queue will be full only when FRONT = 0 and REAR = MAX – 1, or when FRONT = REAR + 1.
• It is implemented in the same manner as a linear queue is implemented. The only difference will be in the
code that performs insertion and deletion operations.
• It uses 2 variables to keep track of first element and last element.
• Front is used to refer first element and rear is used to refer last element.
• Condition for “Circular Queue is Empty ”
FRONT = –1 and REAR=-1
• Condition for “Circular Queue is FULL”
(FRONT = 0 and REAR = MAX – 1) or FRONT = REAR + 1
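A minimal Python sketch of a circular queue using modular arithmetic (class and method names are assumptions for illustration):
class CircularQueue:
    def __init__(self, max_size):
        self.queue = [None] * max_size
        self.max = max_size
        self.front = -1
        self.rear = -1

    def is_empty(self):
        return self.front == -1 and self.rear == -1

    def is_full(self):
        # full when rear is circularly just behind front
        return (self.rear + 1) % self.max == self.front

    def enqueue(self, num):
        if self.is_full():
            print("OVERFLOW")
            return
        if self.is_empty():
            self.front = self.rear = 0
        else:
            self.rear = (self.rear + 1) % self.max   # wrap around past the last index
        self.queue[self.rear] = num

    def dequeue(self):
        if self.is_empty():
            print("UNDERFLOW")
            return None
        val = self.queue[self.front]
        if self.front == self.rear:                  # last element -> reset to empty
            self.front = self.rear = -1
        else:
            self.front = (self.front + 1) % self.max
        return val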
The pointers are maintained based on the requirements and accordingly linked list can be classified into three
groups,
1. Singly linked lists
2. Circular linked lists
3. Doubly linked lists
In a linked list, every node contains a NEXT pointer to another node which points to a node of the same type.
Hence, it is also called a self-referential data type.
We can use the following steps to delete a node from beginning of the single linked list...
Step 1 - Check whether list is Empty (head == NULL)
Step 2 - If it is Empty then, display 'List is Empty!!! Deletion is not possible' and terminate the function.
Step 3 - If it is Not Empty then, define a Node pointer 'temp' and initialize with head.
Step 4 - Check whether list is having only one node (temp → next == NULL)
Step 5 - If it is TRUE then set head = NULL and delete temp (Setting Empty list conditions)
Step 6 - If it is FALSE then set head = temp → next, and delete temp.
def deleteLastNode(self):
    if self.length == 0:
        print("The list is empty")
    else:
        currentnode = self.head
        previousnode = self.head
        while currentnode.getNext() != None:
            previousnode = currentnode
            currentnode = currentnode.getNext()
        previousnode.setNext(None)
        self.length -= 1
Similarly we can insert and delete using other options
We can use the following steps to insert a new node at beginning of the double linked list...
Step 1 - Create a newNode with given value and newNode → previous as NULL.
Step 2 - Check whether list is Empty (head == NULL)
Step 3 - If it is Empty then, assign NULL to newNode → next and newNode to head.
Step 4 - If it is not Empty then, assign head to newNode → next and newNode to head.
We can use the following steps to delete a node from beginning of the double linked list...
Step 1 - Check whether list is Empty (head == NULL)
Step 2 - If it is Empty then, display 'List is Empty!!! Deletion is not possible' and terminate the
function.
Step 3 - If it is not Empty then, define a Node pointer 'temp' and initialize with head.
Step 4 - Check whether list is having only one node (temp → previous is equal to temp → next)
Step 5 - If it is TRUE, then set head to NULL and delete temp (Setting Empty list conditions)
Step 6 - If it is FALSE, then assign temp → next to head, NULL to head → previous and delete
temp.
Deleting from End of the list
We can use the following steps to delete a node from end of the double linked list...
Step 1 - Check whether list is Empty (head == NULL)
Step 2 - If it is Empty, then display 'List is Empty!!! Deletion is not possible' and terminate the
function.
Step 3 - If it is not Empty then, define a Node pointer 'temp' and initialize with head.
Step 4 - Check whether list has only one Node (temp → previous and temp → next both are
NULL)
Step 5 - If it is TRUE, then assign NULL to head and delete temp. And terminate from the
function. (Setting Empty list condition)
Step 6 - If it is FALSE, then keep moving temp until it reaches the last node in the list (until
temp → next is equal to NULL). Then assign NULL to temp → previous → next and delete temp.
import gc

class Node:
def __init__(self, data):
self.data = data
self.next = None
self.prev = None
class DoublyLinkedList:
def __init__(self):
self.head = None
# insert node at the front
def insert_front(self, data):
# allocate memory for newNode and assign data to newNode
new_node = Node(data)
# make newNode as a head
new_node.next = self.head
# assign null to prev (prev is already none in the constructor)
# previous of head (now head is the second node) is newNode
if self.head is not None:
self.head.prev = new_node
# head points to newNode
self.head = new_node
# insert a node after a specific node
def insert_after(self, prev_node, data):
# check if previous node is null
if prev_node is None:
print("previous node cannot be null")
return
# allocate memory for newNode and assign data to newNode
new_node = Node(data)
# set next of newNode to next of prev node
new_node.next = prev_node.next
# set next of prev node to newNode
prev_node.next = new_node
# set prev of newNode to the previous node
new_node.prev = prev_node
# set prev of newNode's next to newNode
if new_node.next:
new_node.next.prev = new_node
# insert a newNode at the end of the list
def insert_end(self, data):
# allocate memory for newNode and assign data to newNode
new_node = Node(data)
# assign null to next of newNode (already done in constructor)
# if the linked list is empty, make the newNode as head node
if self.head is None:
self.head = new_node
return
# store the head node temporarily (for later use)
temp = self.head
# if the linked list is not empty, traverse to the end of the linked list
while temp.next:
temp = temp.next
# now, the last node of the linked list is temp
# assign next of the last node (temp) to newNode
temp.next = new_node
# assign prev of newNode to temp
new_node.prev = temp
return
# delete a node from the doubly linked list
def deleteNode(self, dele):
# if head or del is null, deletion is not possible
if self.head is None or dele is None:
return
# if del_node is the head node, point the head pointer to the next of del_node
if self.head == dele:
self.head = dele.next
# if del_node is not the last node, point the prev of the node next to del_node to the previous of del_node
if dele.next is not None:
dele.next.prev = dele.prev
# if del_node is not the first node, point the next of the previous node to the next node of del_node
if dele.prev is not None:
dele.prev.next = dele.next
# free the memory of del_node
gc.collect()
    # print the doubly linked list
    def display_list(self, node):
        while node:
            print(node.data, end="->")
            node = node.next
        print()
d_linked_list = DoublyLinkedList()
d_linked_list.insert_end(5)
d_linked_list.insert_front(1)
d_linked_list.insert_front(6)
d_linked_list.insert_end(9)
# insert 11 after head
d_linked_list.insert_after(d_linked_list.head, 11)
# insert 15 after the second node
d_linked_list.insert_after(d_linked_list.head.next, 15)
d_linked_list.display_list(d_linked_list.head)
# delete the last node
d_linked_list.deleteNode(d_linked_list.head.next.next.next.next.next)
print()
d_linked_list.display_list(d_linked_list.head)
1. Circular Singly Linked List
Here, the address of the last node consists of the address of the first node.
2. Circular Doubly Linked List
Here, in addition to the last node storing the address of the first node, the first node will also store the address of the
last node.
Operations on circular linked lists can be performed exactly like a singly linked list. It’s just that we have to
maintain an extra pointer to check if we have gone through the list once. The circular linked list is the collection
of nodes in which tail node also point back to head node. The diagram shown below depicts a circular linked list.
Node A represents head and node D represents tail. So, in this list, A is pointing to B, B is pointing to C and C is
pointing to D but what makes it circular is that node D is pointing back to node A.
Stacks
1. Stack is a data structure in which addition of a new element or deletion of an existing element always
takes place at the same end. This end is often known as the top of stack. When an item is added to a stack,
the operation is called push, and when an item is removed from the stack the operation is called pop. Stack
is also called a Last-In-First-Out (LIFO) list.
2. Stacks are used in function calls. The system stack ensures a proper execution order of functions.
Therefore, stacks are frequently used in situations where the order of processing is very important,
especially when the processing needs to be postponed until other conditions are fulfilled.
Applications of stacks
1. Pile of plates in a cafeteria - the plates are added to the stack as they are cleaned and they are placed on
the top. When a plate is required it is taken from the top of the stack. The first plate placed on the stack is
the last one to be used.
2. Stack of coins
3. Stack of Books
Operations on Stack:
There are two possible operations done on a stack. They are pop and push operations.
Push: Allows adding an element at the top of the stack.
Pop: Allows removing an element from the top of the stack.
The Stack can be implemented using both arrays and linked lists. When dynamic memory allocation is preferred,
we go for linked lists to implement the stacks.
Attempting the execution of an operation may sometimes cause an error condition, called an exception.
• In the Stack ADT, operations pop and top cannot be performed if the stack is empty. The execution of pop
(or) top operation on an empty stack throws an exception called underflow.
ALGORITHM / PROCEDURE:
To push a node in the stack :
step 1. Initialise a node
step 2. Update the value of that node by data i.e. node->data = data
step 3. Now link this node to the top of the linked list
step 4. And update top pointer to the current node
class Node:
def __init__(self, data):
self.data=data
self.next=None
class LinkedList:
def __init__(self):
self.head=None
self.tail=None
def insert_at_beg(self, data):
new_node=Node(data)
if self.head is None:
self.head = new_node
else:
new_node.next = self.head
self.head = new_node
    def insert_at_end(self, data):
        new_node = Node(data)
        if self.tail is None:
            self.head = new_node
            self.tail = new_node
        else:
            self.tail.next = new_node   # link the old tail to the new node
            self.tail = new_node
    def delete_at_beg(self):
        if self.head is None:
            return None
        else:
            delnode = self.head
            self.head = self.head.next
            if self.head is None:
                self.tail = None        # list became empty
            return delnode.data
    def delete_at_end(self):
        if self.tail is None:
            return None
        if self.head == self.tail:      # only one node in the list
            delnode = self.head
            self.head = self.tail = None
            return delnode.data
        prev = self.head
        while prev.next != self.tail:
            prev = prev.next
        delnode = self.tail
        prev.next = None
        self.tail = prev
        return delnode.data
def get_head(self):
if self.head!=None:
return self.head
else:
return None
def get_tail(self):
if self.tail!=None:
return self.tail
else:
return None
class Stack:
def __init__(self):
self.stack= LinkedList()
self.top=self.stack.get_head()
def push(self,data):
self.stack.insert_at_beg(data)
self.top=self.stack.get_head()
def pop(self):
x=self.stack.delete_at_beg()
if x is None:
print("Stack is Empty")
else:
print(f"{x} deleted from stack")
self.top=self.stack.get_head()
def display(self):
if self.top is None:
print("Stack is Empty")
else:
curr=self.top
while curr:
print(curr.data,end='\n')
curr = curr.next
s=Stack()
while(True):
print("1.Push 2.Pop 3.Display 4.exit")
ip=int(input("Enter the input"))
if(ip==1):
ele=int(input("Enter element"))
s.push(ele)
elif(ip==2):
ele=s.pop()
elif(ip==3):
s.display()
else:
break
Queue
Queue:
A queue is another special kind of list, where items are inserted at one end called the rear and deleted at
the other end called the front. Another name for a queue is a “FIFO” or “first-in-first-out” list.
The operations for a queue are analogous to those for a stack; the difference is that insertions go at the
end of the list, rather than the beginning. We shall use the following operations on queues:
• enqueue: which inserts an element at the end of the queue.
• dequeue: which deletes an element at the start of the queue.
Representation of Queue:
The header pointer of the linked list is used as FRONT. Another pointer called REAR, which will store the address
of the last element in the queue.
•All insertions will be done at the rear end and all the deletions will be done at the front end.
• Condition for “Empty Queue” : FRONT = REAR = NULL
•Space complexity of linked list representation of the queue with n elements is O(n), and time complexity for the
operations is O(1).
•First check if FRONT=NULL then allocate memory for a new node and new node will be both FRONT and
REAR.
•If FRONT!=NULL then insert the new node at the rear end of the linked queue and name this new node as REAR.
Linked List Implementation of Queue
class Node:
def __init__(self, data):
self.data=data
self.next=None
class LinkedList:
def __init__(self):
self.head=None
self.tail=None
def insert_at_beg(self, data):
new_node=Node(data)
if self.head is None:
self.head = new_node
else:
new_node.next = self.head
self.head = new_node
def insert_at_end(self, data):
new_node=Node(data)
if self.tail is None:
self.tail = new_node
self.head = new_node
else:
self.tail.next=new_node
self.tail = new_node
    def delete_at_beg(self):
        if self.head is None:
            return None
        else:
            delnode = self.head
            self.head = self.head.next
            if self.head is None:
                self.tail = None        # list became empty
            return delnode.data
    def delete_at_end(self):
        if self.tail is None:
            return None
        if self.head == self.tail:      # only one node in the list
            delnode = self.head
            self.head = self.tail = None
            return delnode.data
        prev = self.head
        while prev.next != self.tail:
            prev = prev.next
        delnode = self.tail
        prev.next = None
        self.tail = prev
        return delnode.data
def get_head(self):
if self.head!=None:
return self.head
else:
return None
def get_tail(self):
if self.tail!=None:
return self.tail
else:
return None
class Queue:
    def __init__(self):
        self.q = LinkedList()            # create a linked list q, the backing store of the queue
        self.head = self.q.get_head()    # front of the queue
        self.tail = self.q.get_tail()    # rear of the queue
def enque(self,data):
self.q.insert_at_end(data)
self.tail=self.q.get_tail()
    def deque(self):
        x = self.q.delete_at_beg()
        if x is None:
            print("Queue is Empty")
        else:
            print(f"{x} deleted from queue")
        self.head = self.q.get_head()
        self.tail = self.q.get_tail()    # refresh rear in case the queue became empty
def display(self):
if self.tail is None:
print("Queue is Empty")
else:
curr=self.q.get_head()
while curr :
print(curr.data,end='\n')
curr=curr.next
s=Queue()
while(True):
print("1.Enque 2.Deque 3.Display 4.exit")
ip=int(input("Enter the input"))
if(ip==1):
ele=int(input("Enter element"))
s.enque(ele)
elif(ip==2):
ele=s.deque()
elif(ip==3):
s.display()
else:
break
UNIT 4 TREES: Introduction, binary trees, types of trees, properties
of binary trees, binary tree traversals, binary search trees. GRAPHS:
introduction, applications of graphs, graph representation, graph traversals.
INTRODUCTION
Tree is a non-linear data structure. It is a hierarchical data structure that has
nodes connected through links. The topmost node of the tree which has no
parent is known as the root node.
BINARY TREE
Tree represents the nodes connected by edges. It is a non-linear
data structure. It has the following properties −
• One node is marked as Root node.
• Every node other than the root is associated with one parent
node.
• Each node in a binary tree can have at most two child nodes.
CODE
class Node:
def __init__(self, data):
self.left = None
self.right = None
self.data = data
def PrintTree(self):
print(self.data)
root = Node(10)
root.PrintTree()
OUTPUT
10
Inserting into a Tree
CODE :
class Node:
def __init__(self, data):
self.left = None
self.right = None
self.data = data
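The body of the insert operation is missing here; a minimal completion in the same style (a sketch, inserting smaller keys to the left and larger keys to the right, as the BST section below describes; it is meant to be added as a method of the Node class above):
    def insert(self, data):
        # compare the new value with the current node's value
        if data < self.data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)
        elif data > self.data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)

# usage
root = Node(10)
root.insert(5)
root.insert(15)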
Balanced Binary Tree
It is a type of binary tree in which the difference between the height of the left
and the right subtree for each node is either 0 or 1. In the figure above, the root
node having a value 0 is unbalanced with a depth of 2 units.
Some Special Types of Trees:
On the basis of node values, the Binary Tree can be classified
into the following special types:
Binary Search Tree
AVL Tree
Red Black Tree
B Tree
B+ Tree
Segment Tree
2. AVL Tree
AVL tree is a self-balancing Binary Search Tree (BST) where the difference
between heights of left and right subtrees cannot be more than one for all nodes.
Example of AVL Tree shown below:
The below tree is AVL because the differences between the heights of left and
right subtrees for every node are less than or equal to 1
3. Red Black Tree
A red-black tree is a kind of self-balancing binary search tree where each node has
an extra bit, and that bit is often interpreted as the color (red or black). These
colors are used to ensure that the tree remains balanced during insertions and
deletions. Although the balance of the tree is not perfect, it is good enough to
reduce the searching time and maintain it around O(log n) time, where n is the
total number of elements in the tree. This tree was invented in 1972 by Rudolf
Bayer.
Properties of Binary Tree
5. In a Binary tree where every node has 0 or 2 children, the number of leaf nodes
is always one more than nodes with two children:
L=T+1
Where L = Number of leaf nodes
T = Number of internal nodes with two children
Proof:
No. of leaf nodes (L), i.e. total elements present at the bottom of the tree = 2^(h-1) (h is the height of
the tree)
No. of internal nodes = {total no. of nodes} – {leaf nodes} = {2^h – 1} – {2^(h-1)} = 2^(h-1)(2 – 1) – 1 = 2^(h-1) – 1
So, L = 2^(h-1)
T = 2^(h-1) – 1
Therefore L = T + 1
Hence proved
6. In a non-empty binary tree, if n is the total number of nodes and e is
the total number of edges, then e = n-1:
Every node in a binary tree has exactly one parent with the exception of
the root node. So if n is the total number of nodes then n-1 nodes have
exactly one parent. There is only one edge between any child and its
parent. So the total number of edges is n-1.
2.The node at the top of the tree is called the root node: The root node
is the first node in a binary tree and all other nodes are connected to it.
All other nodes in the tree are either child nodes or descendant nodes
of the root node.
3.Nodes that do not have any child nodes are called leaf nodes: Leaf
nodes are the endpoints of the tree and have no children. They
represent the final result of the tree.
4.The height of a binary tree is defined as the number of edges from
the root node to the deepest leaf node: The height of a binary tree is
the length of the longest path from the root node to any of the leaf
nodes. The height of a binary tree is also known as its depth.
5.In a full binary tree, every node except the leaves has exactly two
children: In a full binary tree, all non-leaf nodes have exactly two
children. This means that there are no unary nodes in a full binary tree.
6.In a complete binary tree, every level of the tree is completely filled
except for the last level, which can be partially filled: In a complete
binary tree, all levels of the tree except the last level are completely
filled. This means that there are no gaps in the tree and all nodes are
connected to their parent nodes.
7 . In a balanced binary tree, the height of the left and right subtrees of
every node differ by at most 1: In a balanced binary tree, the height of
the left and right subtrees of every node is similar. This ensures that the
tree is balanced and that the height of the tree is minimized.
8. The in-order, pre-order, and post-order traversal of a binary tree are
three common ways to traverse the tree: In-order, pre-order, and post-
order are three different ways to traverse a binary tree. In-order
traversal visits the left subtree, the node itself, and then the right
subtree. Pre-order traversal visits the node itself, the left subtree, and
then the right subtree. Post-order traversal visits the left subtree, the
right subtree, and then the node itself.
Binary Search Tree is a node-based binary tree data structure which has the
following properties:
1. The left subtree of a node contains only nodes with keys lesser than the node’s
key.
2.The right subtree of a node contains only nodes with keys greater than the
node’s key.
3.The left and right subtree each must also be a binary search tree.
CODE
If root == NULL
return NULL;
If number == root->data
return root->data;
If number < root->data
return search(root->left, number)
If number > root->data
return search(root->right, number)
1.Search Operation:
The algorithm depends on the property of BST that if each left subtree has values
below root and each right subtree has values above the root.
If the value is below the root, we can say for sure that the value is not in the right
subtree; we need to only search in the left subtree and if the value is above the
root, we can say for sure that the value is not in the left subtree; we need to only
search in the right subtree.
2. Insert Operation
Inserting a value in the correct position is similar to searching because we try to
maintain the rule that the left subtree is lesser than root and the right subtree is
larger than root.
We keep going to either right subtree or left subtree depending on the value and
when we reach a point left or right subtree is null, we put the new node there.
Algorithm:
If node == NULL
return createNode(data)
if (data < node->data)
node->left = insert(node->left, data);
else if (data > node->data)
node->right = insert(node->right, data);
return node;
3.Deletion Operation
There are three cases for deleting a node from a binary search tree.
Case I
In the first case, the node to be deleted is the leaf node. In such a case, simply
delete the node from the tree.
Case II
In the second case, the node to be deleted has a single child node. In such a
case follow the steps below:
1.Replace that node with its child node.
2.Remove the child node from its original position.
Case III
In the third case, the node to be deleted has two children. In such a case follow
the steps below:
1.Get the inorder successor of that node.
2.Replace the node with the inorder successor.
3.Remove the inorder successor from its original position.
# Create a node
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

# Inorder traversal
def inorder(root):
    if root is not None:
        inorder(root.left)
        print(str(root.key) + "->", end=' ')
        inorder(root.right)

# Insert a node
def insert(node, key):
    if node is None:
        return Node(key)
    # Traverse to the right place and insert the node
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

# Find the inorder successor (leftmost node of a subtree)
def minValueNode(node):
    current = node
    while current.left is not None:
        current = current.left
    return current

# Deleting a node
def deleteNode(root, key):
    if root is None:
        return root
    # Find the node to be deleted
    if key < root.key:
        root.left = deleteNode(root.left, key)
    elif key > root.key:
        root.right = deleteNode(root.right, key)
    else:
        # Node with one child or no child
        if root.left is None:
            return root.right
        elif root.right is None:
            return root.left
        # Node with two children: copy the inorder successor's key
        temp = minValueNode(root.right)
        root.key = temp.key
        # Delete the inorder successor
        root.right = deleteNode(root.right, temp.key)
    return root
root = None
root = insert(root, 8)
root = insert(root, 3)
root = insert(root, 1)
root = insert(root, 6)
root = insert(root, 7)
root = insert(root, 10)
root = insert(root, 14)
root = insert(root, 4)
print("Inorder traversal: ", end=' ')
inorder(root)
print("\nDelete 10")
root = deleteNode(root, 10)
print("Inorder traversal: ", end=' ')
inorder(root)
GRAPHS :
A graph is a pictorial representation of a set of objects where some pairs of
objects are connected by links. The interconnected objects are represented by
points termed as vertices, and the links that connect the vertices are called edges.
In this chapter we are going to see how to create a graph and add various data
elements to it using a python program. Following are the basic operations we
perform on graphs.
1. Display graph vertices
2. Display graph edges
3. Add a vertex
4. Add an edge
5. Creating a graph
A graph can be easily presented using the python dictionary data types. We
represent the vertices as the keys of the dictionary and the connection between
the vertices also called edges as the values in the dictionary.
Take a look at the following graph −
In the above graph,
V = {a, b, c, d, e}
E = {ab, ac, bd, cd, de}
graph = {
   "a" : ["b","c"],
   "b" : ["a","d"],
   "c" : ["a","d"],
   "d" : ["e"],
   "e" : ["d"]
}
print(graph)
Output
When the above code is executed, it produces the following result −
{'c': ['a', 'd'], 'a': ['b', 'c'], 'e': ['d'], 'd': ['e'], 'b': ['a', 'd']}
class graph:
    def __init__(self, gdict=None):
        if gdict is None:
            gdict = {}          # default to an empty dictionary
        self.gdict = gdict
    # Get the keys of the dictionary
    def getVertices(self):
        return list(self.gdict.keys())

graph_elements = {
   "a" : ["b","c"],
   "b" : ["a","d"],
   "c" : ["a","d"],
   "d" : ["e"],
   "e" : ["d"]
}
g = graph(graph_elements)
print(g.getVertices())
Output
When the above code is executed, it produces the following result −
['d', 'b', 'e', 'c', 'a']
APPLICATIONS OF GRAPHS:
1.In Computer science graphs are used to represent the flow of
computation.
2.Google maps uses graphs for building transportation systems, where
intersection of two(or more) roads are considered to be a vertex and
the road connecting two vertices is considered to be an edge, thus their
navigation system is based on the algorithm to calculate the shortest
path between two vertices.
3.In Facebook, users are considered to be the vertices and if they are
friends then there is an edge running between them. Facebook’s Friend
suggestion algorithm uses graph theory. Facebook is an example of
undirected graph.
4.In World Wide Web, web pages are considered to be the vertices.
There is an edge from a page u to other page v if there is a link of page
v on page u. This is an example of Directed graph. It was the basic idea
behind Google Page Ranking Algorithm.
5.In Operating System, we come across the Resource Allocation Graph
where each process and resources are considered to be vertices. Edges
are drawn from resources to the allocated process, or from requesting
process to the requested resource. If this leads to any formation of a
cycle then a deadlock will occur.
6.In mapping system we use graph. It is useful to find out which is an
excellent place from the location as well as your nearby location. In GPS
we also use graphs.
7. Facebook uses graphs. Using graphs it suggests mutual friends; it shows
a list of followed pages, friends, and the contact list.
8.Microsoft Excel uses DAG means Directed Acyclic Graphs.
9.In the Dijkstra algorithm, we use a graph. we find the smallest path
between two or many nodes.
10. On social media sites, we use graphs to track the data of the users,
like showing preferred post suggestions, recommendations, etc.
11.Graphs are used in biochemical applications such as structuring of
protein, DNA etc.
Representations of Graph:
Here are the two most common ways to represent a graph :
Adjacency Matrix
Adjacency List
Adjacency Matrix:
An adjacency matrix is a way of representing a graph as a matrix of
boolean (0’s and 1’s).
Let’s assume there are n vertices in the graph So, create a 2D matrix
adjMat[n][n] having dimension n x n.
1.If there is an edge from vertex i to j, mark adjMat[i][j] as 1.
2.If there is no edge from vertex i to j, mark adjMat[i][j] as 0.
Representation of Undirected Graph to Adjacency Matrix:
The below figure shows an undirected graph. Initially, the entire matrix
is initialized to 0. If there is an edge from source to destination, we
insert 1 in both cases (adjMat[source][destination] and adjMat[destination][source])
because we can go either way.
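A short Python sketch of building such a matrix for the undirected graph used earlier (V = {a, b, c, d, e}, E = {ab, ac, bd, cd, de}; the vertex ordering is an assumption):
vertices = ["a", "b", "c", "d", "e"]
idx = {v: i for i, v in enumerate(vertices)}   # map vertex name -> matrix index
n = len(vertices)
adjMat = [[0] * n for _ in range(n)]
for u, v in [("a","b"), ("a","c"), ("b","d"), ("c","d"), ("d","e")]:
    adjMat[idx[u]][idx[v]] = 1   # edge u -> v
    adjMat[idx[v]][idx[u]] = 1   # undirected: mark both directions
for row in adjMat:
    print(row)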
Adjacency List
An array of Lists is used to store edges between two vertices. The size of array is
equal to the number of vertices (i.e, n). Each index in this array represents a
specific vertex in the graph. The entry at the index i of the array contains a linked
list containing the vertices that are adjacent to vertex i.
Let’s assume there are n vertices in the graph So, create an array of list of size n as
adjList[n].
adjList[0] will have all the nodes which are connected (neighbour) to vertex 0.
adjList[1] will have all the nodes which are connected (neighbour) to vertex 1 and
so on.
Graph traversals: a breadth-first search (BFS) explores a graph level by level, repeatedly dequeuing the
next vertex from a FIFO queue:
while queue:
    node = queue.pop(0)
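Only the loop skeleton above survives here; a minimal completion of the BFS traversal (a sketch, assuming the adjacency-dictionary representation used earlier):
def bfs(graph, start):
    visited = set([start])
    queue = [start]              # FIFO queue of vertices to explore
    order = []
    while queue:
        node = queue.pop(0)      # dequeue the next vertex (as in the fragment above)
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

# using the adjacency dictionary from earlier
print(bfs({"a": ["b","c"], "b": ["a","d"], "c": ["a","d"], "d": ["e"], "e": ["d"]}, "a"))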
It's Sarah because she has represented the relationship between the faculties while John has only provided a one-sided list that does not
show who works under whom. John's list is a linear data structure as you might have guessed, while Sarah's tree is a non-linear data
structure.
A Non-Linear Data Structure is one in which its elements are not connected in a linear fashion, as
suggested by its name itself. In such a data structure elements might be connected in a hierarchical
manner like a tree or graph, or it may be non-hierarchical like in a LinkedList. Non-linear data structures
have a more complex implementation than their linear counterparts.
Q. Why is LinkedList Non-Linear?
Even though it might seem that LinkedList should be linear due to its sequential connection of elements,
you must remember that there is no contiguous memory in a LinkedList. All the elements of a LinkedList
are spread across the memory in a non-linear fashion, hence it is a Non-Linear Data Structure.
This session introduces Non-Linear Data Structures, explores examples of non-linear data structures, and
goes through the differences between linear and non-linear data structures. The main advantage of a
non-linear data structure is that it uses memory more efficiently than linear data structures.
Let us now analyze the key points of a Non-Linear Data Structure:
• Elements are not arranged sequentially.
• One element can be connected to multiple elements.
• There might be a hierarchical structure present.
• Here, memory is not allocated in a contiguous manner, unlike a linear data structure.
Linear Data Structure | Non-Linear Data Structure
Elements are connected sequentially or in a contiguous manner. | Elements are not connected sequentially or in a contiguous manner.
Elements are always present in a single level. | Elements may be present in single or multiple levels.
There is no hierarchy between the elements. | There is usually a hierarchy between elements.
They are easier to implement. | They have a more complex implementation.
Memory allocation is sequential. | Memory allocation isn’t sequential.
Can be traversed in a single run. | Requires multiple runs for traversal.
Inefficient utilization of memory. | Memory is utilized efficiently.
Examples include arrays, hash tables, stack, queue. | Examples include trees, graphs etc.
Examples of Non-Linear Data Structure:
Some examples of non-linear data structures are LinkedList, Trees, and Graphs. We'll now go through
each of them and understand why they are called non-linear data structures.
Tree: As you might have figured it, the tree is a data structure that is both non-linear as well as
hierarchical. Here elements are arranged in multiple levels, and each level can be completely or partially
filled. Let us now go through some of the basic terminologies of a tree.
General Tree: A tree that can contain any number of subtrees is known as a general tree.
However, for the leftmost figure, 2 lies in the right subtree of 3 and has a lesser value than 3, whereas, in
a Binary Search Tree, all the nodes in the right subtree should hold a key with a value greater than the
node's key value. Hence, the leftmost figure is not a Binary Search Tree.
Parent Node of Subtree | Height of left subtree | Height of right subtree | Height difference
12 | 3 | 2 | 1
8 | 2 | 1 | 1
18 | 1 | 0 | 1
5 | 1 | 0 | 1
11 | 0 | 0 | 0
17 | 0 | 0 | 0
4 | 0 | 0 | 0
4. Degenerate/Pathological Tree
A degenerate or pathological tree is
the tree having a single child either
left or right.
Representation of binary trees
The binary tree means the node can have a maximum of two children. Here, the binary name suggests
‘two’; therefore, each node can have either 0, 1, or 2 children. A binary tree data structure is represented
using two methods. Those methods are as follows:
1. Array (sequential) representation
2. Linked list representation
How to identify the left child, right child, and parent of any node that is represented in the sequential
form? For a node stored at array index i (with indices 0 1 2 3 ... 20 as in the figure), its left child is at
index 2i + 1, its right child is at index 2i + 2, and its parent is at index (i – 1) / 2.
The main operations in a binary tree are: search, insert and delete. When we want to display a binary
tree, we need to follow some order in which all the nodes of that binary tree must be displayed. In any
binary tree, the displaying order of nodes depends on the traversal method.
Displaying (or) visiting order of nodes in a binary tree is called Binary Tree Traversal.
There are three types of binary tree traversals:
In - Order Traversal
Pre - Order Traversal
Post - Order Traversal
Consider the following binary tree...
1. In - Order Traversal ( left child – root – right child )
In In-Order traversal, the root node is visited between the left child and the right child. In this traversal,
the left child node is visited first, then the root node is visited, and later, we go for visiting the right child
node. This in-order traversal is applicable for every root node of all subtrees in the tree. This is performed
recursively for all nodes in the tree.
In the above example of a binary tree, first, we try to visit the left child of root node ‘A’, but A’s left child
‘B’ is a root node for the left subtree. So we try to visit its (B’s) left child ‘D’, and again D is a root for the
subtree with nodes D, I, and J. So we try to visit its left child, 'I', and it is the leftmost child. So first we
visit 'I’, then go for its root node 'D' and later we visit D's right child 'J'. With this, we have completed the
left part of node B. Then visit 'B’, and next B's right child 'F' is visited. With this, we have completed the
left part of node A. Then visit root node 'A'. With this, we have completed the left and root parts of node
A. Then we go for the right part of node A. In the right of A again, there is a subtree with root C. So go for
the left child of C, and again it is a subtree with root G. But G does not have a left part, so we visit ’G’ and
then visit G’s right child K. With this, we have completed the left part of node C. Then visit root node 'C'
and next visit C's right child 'H' which is the rightmost child in the tree. So we stop the process.
That means here we have visited in the order of I - D - J - B - F - A - G - K - C - H using In-Order Traversal.
2. Pre - Order Traversal ( root – left child – right child )
In Pre-Order traversal, the root node is visited before the left child and right child nodes. In this traversal, the root node is visited first, then its left child and later its right child. This pre-order traversal is applicable for every root node of all subtrees in the tree.
In the above example of a binary tree, first, we visit root node 'A' then visit its left
child 'B' which is a root for D and F. So we visit B's left child 'D’, and again D is a root
for I and J. So we visit D’s left child, ’I’ which is the leftmost child. So next, we go to
visit D’s right child ’J’. With this, we have completed the root, left, and right parts of
node D and the root, and left parts of node B. Next, visit B’s right child ’F’. With this,
we have completed the root and left parts of node A.
So we go for A's right child 'C' which is a root node for G and H. After visiting C, we go
for its left child 'G' which is a root for node K. So next, we visit the left of G, but it does
not have the left child, so we go for G’s right child, ’K’. With this, we have completed
node C’s root and left parts. Next, visit C's right child 'H' which is the rightmost child in
the tree. So we stop the process.
That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order Traversal.
3. Post - Order Traversal ( left child – right child – root )
In Post-Order traversal, the root node is visited after the left child and right child. In this traversal, the left child node is visited first, then its
right child, and then its root node. This is recursively performed until the rightmost node is visited.
Here we have visited in the order of I - J - D - F - B - K - G - H - C - A using Post-Order Traversal.
C Binary Search Tree with an Example C Code (Search, Delete, Insert Nodes)
A binary tree is a data structure that maintains data in the memory of a program. Many data structures exist, and they are chosen for usage on the basis of the time consumed by the insert/search/delete operations performed on them. A binary tree is basically a tree in which each node can have two child nodes, and each child node can itself be a small binary tree. To understand it, below is an example figure of a binary tree.

A binary search tree works on the rule that child nodes which are lesser than the root node are kept on the left side, and child nodes which are greater than the root node are kept on the right side. The same rule is followed in the child nodes as well, which are themselves sub-trees. As in the above figure, nodes (2, 4, 6) are on the left side of root node (9) and nodes (12, 15, 17) are on the right side of root node (9).
We will understand binary tree through its operations. We will cover following
operations.
•Create binary tree
•Search into binary tree
•Delete binary tree
•Displaying binary tree
Creation of binary tree: A binary tree is created by inserting the root node and its child nodes. We will use the C programming language for all the examples. Below is the code snippet for the insert function. It will insert nodes.
This function determines the position for the node to be added as per its value, and the new node is added into the binary tree. The function is explained in the steps below, and the code snippet lines are mapped to the explanation steps.
[Lines 13-19] First check if the tree is empty; if so, insert the node as root.
[Line 21] Check if the node value to be inserted is lesser than the root node value; if so,
•a. [Line 22] Call the insert() function recursively while there is a non-NULL left node
•b. [Lines 13-19] When the leftmost node is reached as NULL, insert the new node.
[Line 23] Check if the node value to be inserted is greater than the root node value; if so,
• a. [Line 24] Call the insert() function recursively while there is a non-NULL right node
• b. [Lines 13-19] When the rightmost node is reached as NULL, insert the new node.
Searching in the binary tree: Searching is done as per the value of the node to be searched, whether it is the root node or it lies in the left or right sub-tree. Below is the code snippet for the search function. It will search for a node in the binary tree.
This search function checks whether a node of the same value already exists in the binary tree or not. If it is found, then the searched node is returned; otherwise NULL (i.e. no node) is returned. The function is explained in the steps below, and the code snippet lines are mapped to the explanation steps.

Deleting the binary tree: This function deletes all nodes of the binary tree in the manner – left node, right node and root node. The function is explained in the steps below, and the code snippet lines are mapped to the explanation steps.
•Pre-order displays root node, left node and then right node.
•In-order displays left node, root node and then right node.
•Post-order displays left node, right node and then root node.
These functions display the binary tree in pre-order, in-order and post-order respectively. Each function is explained in the steps below, and the code snippet lines are mapped to the explanation steps.
Pre-order display
In-order display
a. [Line 37] Call print_inorder() function recursively while there is a non-NULL left node
b. [Line 38] Display value of root node.
c. [Line 39] Call print_inorder() function recursively while there is a non-NULL right node
Post-order display
a. [Line 44] Call print_postorder() function recursively while there is a non-NULL left node
b. [Line 45] Call print_postorder() function recursively while there is a non-NULL right node
c. [Line 46] Display value of root node.
Working program: Note that the above code snippets are parts of the C program below. This program is a working, basic program for a binary tree.

#include <stdio.h>
#include <stdlib.h>

struct bin_tree {
    int data;
    struct bin_tree * right, * left;
};
typedef struct bin_tree node;

void insert(node ** tree, int val)
{
    node *temp = NULL;
    if(!(*tree))
    {
        temp = (node *)malloc(sizeof(node));
        temp->left = temp->right = NULL;
        temp->data = val;
        *tree = temp;
        return;
    }
    if(val < (*tree)->data)
    {
        insert(&(*tree)->left, val);
    }
    else if(val > (*tree)->data)
    {
        insert(&(*tree)->right, val);
    }
}

void print_preorder(node * tree)
{
    if (tree)
    {
        printf("%d\n", tree->data);
        print_preorder(tree->left);
        print_preorder(tree->right);
    }
}

void print_inorder(node * tree)
{
    if (tree)
    {
        print_inorder(tree->left);
        printf("%d\n", tree->data);
        print_inorder(tree->right);
    }
}

void print_postorder(node * tree)
{
    if (tree)
    {
        print_postorder(tree->left);
        print_postorder(tree->right);
        printf("%d\n", tree->data);
    }
}

void deltree(node * tree)
{
    if (tree)
    {
        deltree(tree->left);
        deltree(tree->right);
        free(tree);
    }
}

node* search(node ** tree, int val)
{
    if(!(*tree))
    {
        return NULL;
    }
    if(val < (*tree)->data)
    {
        return search(&((*tree)->left), val);
    }
    else if(val > (*tree)->data)
    {
        return search(&((*tree)->right), val);
    }
    else   /* val == (*tree)->data */
    {
        return *tree;
    }
}

int main()
{
    node *root;
    node *tmp;
    root = NULL;
    /* minimal driver, assumed for completeness: insert the values from the
       example figure (root 9; 2, 4, 6 on the left; 12, 15, 17 on the right),
       search for one node, display the tree, then free it */
    insert(&root, 9); insert(&root, 4); insert(&root, 15);
    insert(&root, 6); insert(&root, 12); insert(&root, 17); insert(&root, 2);
    tmp = search(&root, 12);
    if (tmp) printf("searched node = %d\n", tmp->data);
    printf("Pre Order Display\n");  print_preorder(root);
    printf("In Order Display\n");   print_inorder(root);
    printf("Post Order Display\n"); print_postorder(root);
    deltree(root);
    return 0;
}

Output of Program: Note that the binary tree figure used at the top of the article can be referred to for the output of the program and the display of the binary tree in pre-order, in-order and post-order forms.
Now, you may notice that for the same elements, we can have two Binary Search Trees having drastically different heights
and search times. Hence, there must be a way to control the height of the BST such that we always achieve logarithmic
search time complexity irrespective of the order of the elements. This can be achieved by checking when the Binary Search
Tree starts becoming skewed (Balancing Criteria) and performing certain operations to limit this skewness. This way, we
can control the tree’s height and achieve a logarithmic time complexity for almost all the operations. This is exactly
where AVL Trees come into action.
Highlights:
1.BSTs are binary trees in which all elements in the left subtree of a node are smaller while the elements in the right
subtree are larger than that node.
2.BSTs are useful for performing searches on dynamic datasets.
3.As the operations performed using BSTs always start from the root and traverse down the tree, the time complexity
of BSTs depends upon the tree’s height.
4.BSTs can be skewed (unbalanced) or balanced depending upon the order of insertion of the elements.
5.Balanced BSTs provide logarithmic time complexity because of their optimal height.
What is an AVL Tree?: AVL Tree, named after its inventors Adelson-Velsky and Landis, is a special variation of Binary Search
Tree which exhibits self-balancing property, i.e., AVL Trees automatically attain the minimal possible height of the tree after
the execution of any operation. The AVL Trees implement the self-balancing property by attaching extra information known
as the balance factor to each node of the tree, then verifying that the balance factor for all the nodes of the tree follows
certain criteria (Balancing Criteria) upon the execution of any operation that affects the height of the tree, and finally
applying certain Tree Rotations to maintain this criterion of height-balancing.
The Criterion of height balancing is a principle that determines whether a Binary Search Tree is unbalanced (skewed). It
states that:
Tip: A Binary Search Tree is considered to be balanced if any two sibling subtrees present in the tree don’t differ in height by
more than one level, i.e., the difference between the height of the left subtree and the height of the right subtree for all the
nodes of the tree should not exceed unity. If it exceeds unity, then the tree is known as an unbalanced tree.
Since skewed or unbalanced BSTs provide inefficient search operations, AVL Trees prevent unbalancing by defining a
balance factor for each node. Let's look at what exactly is this balancing factor.
Highlights:
1.AVL Trees were developed to achieve logarithmic time complexity in BSTs irrespective of the order in which the elements
were inserted.
2.AVL Trees implement a Balancing Criterion (for all nodes, the subtrees' height difference should be at most 1) to overcome the limitations of BSTs.
3.It maintains its height by performing rotations whenever the balance factor of a node violates the Balancing Criteria. As a
result, it has self-balancing properties.
4.It exists as a balanced BST at all times, providing logarithmic time complexity for operations such as searching.
Balance Factor: The balance factor in AVL Trees is an additional value associated with each tree node that represents the height difference between the left and the right sub-trees of a given node. The balance factor of a given node can be represented as:

balance factor (bf) = height of left subtree − height of right subtree

Now, in the unbalanced tree example, we can observe that the tree is left-skewed, i.e., the height of the left subtree is much greater than that of the right subtree. This is clearly an unbalanced tree, as it is highly skewed. This is also indicated by the balance factor of the node, as it doesn't follow the Balancing Criteria.
Hence, AVL Trees make use of the balance factor to check whether a given node is left-heavy (height of left sub-tree is one greater than that
of right sub-tree), balanced, or right-heavy (height of right sub-tree is one greater than that of left sub-tree). Hence, using the balance factor,
we can find an unbalanced node in the tree and can locate where the height-affecting operation was performed that caused the imbalance
of the tree.
NOTE: Since the leaf nodes don't contain any subtrees, the balance factor for all the leaf nodes present in the
Binary Search Tree is equal to 0.
Upon the execution of any height-affecting operation on the tree, if the magnitude of the balance factor of a
given node exceeds unity, the specified node is said to be unbalanced as per the Balancing Criteria. This
condition can be mathematically represented with the help of the given equation:
bf = h_l − h_r, such that bf ∈ {−1, 0, 1}, or equivalently |bf| = |h_l − h_r| ≤ 1

Here, the above equation indicates that the balance factor of any given node can only take the values −1, 0, and 1 for a height-balanced Binary Search Tree. To maintain this criterion for all the nodes, AVL Trees make use of certain Tree Rotations that are discussed later in this article.
Highlights:
1.Balance Factor represents the height difference between a given node’s left and right sub-trees.
2.For leaf nodes, the balance factor is 0.
3.AVL balance criteria: |bf| ≤ 1 for all nodes.
4.Balance factor indicates whether a node is left heavy, right heavy, or balanced.
AVL Tree Rotation: As discussed earlier, the AVL Trees make use of the balance factor to check whether a given node is left-heavy (height
of left sub-tree is one greater than that of right sub-tree), balanced, or right-heavy (height of right sub-tree is one greater than that of left
sub-tree). If any node is unbalanced, it performs certain Tree Rotations to re-balance the tree.
Tree Rotations: It is the process of changing the tree’s structure by moving smaller subtrees down and larger subtrees up, without
interfering with the order of the elements.
If the balance factor of any node doesn't follow the AVL Balancing criterion, the AVL Trees make use of 4 different types of Tree rotations
to re-balance themselves. These rotations are classified based on the node imbalance cured by them i.e., a specific rotation is applied to
counter the change that occurred in the balance factor of a node making it unbalanced.
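As an illustration, here is a minimal Python sketch of one such rotation, a right rotation applied at a left-heavy node (the class and function names below are ours, not from the text); a left rotation is the mirror image:

class AVLNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
        self.height = 0              # height of this node; a leaf has height 0

def height(node):
    return node.height if node else -1   # empty subtree counts as height -1

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)

def right_rotate(z):
    # z is left-heavy: lift its left child y, and move y's right subtree under z
    y = z.left
    z.left = y.right
    y.right = z
    update_height(z)                 # z is now below y, so update z first
    update_height(y)
    return y                         # y becomes the new root of this subtree

After an insertion, if retracing finds a node z with balance factor +2 whose left child is not right-heavy, right_rotate(z) restores |bf| ≤ 1 for that subtree.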
1. Insertion: In Binary Search Trees, the new node (let say N) was inserted in the tree
by traversing it using BST logic to locate a node with NULL as its child that can be
replaced to insert the new node N. Hence, in BSTs a new node is always inserted as a
leaf node by replacing the NULL value of a node’s child.
Just like the insertion in BSTs, the new node is always inserted as a leaf node in AVL
Trees i.e., the balance factor of the newly inserted node is always equal to 0. However,
after each insertion in the tree, the balance factor for the ancestors of the newly
inserted node is checked to verify whether the tree is still balanced. Here, only the
ancestors of the inserted node are checked for imbalance because when a new node is
inserted, it only alters the height of its ancestors, thereby inducing an imbalance in the
tree. This process of finding the unbalanced node by traversing the ancestors of the
newly inserted node is known as retracing. If the tree becomes unbalanced after
inserting a new node, retracing helps us find the node’s location in the tree at which
we need to perform the tree rotations to balance the tree.
The figure below demonstrates the retracing process upon inserting a new element in the AVL Tree:
Let's look at the algorithm of the insertion operation in AVL Trees:
1. START
2. Insert the new node using the standard BST insertion logic; the new node is always inserted as a leaf with balance factor 0.
3. Retrace the path from the new node towards the root, updating the balance factor of each ancestor.
4. If the balance factor of every ancestor satisfies |bf| ≤ 1, the tree is still balanced.
5. Otherwise, apply the appropriate tree rotation at the first unbalanced node to restore the Balancing Criteria.
6. END
After swapping the array element 81 with 54 and converting the heap into a max-heap, the elements of the array are –

After converting the given heap into a max heap, the array elements are –

Next, we must delete the root element (89) from the max heap. To delete this node, we have to swap it with the last node, i.e. (11). After deleting the root element, we must heapify it to convert it into a max heap.

In the next step, we have to delete the root element (76) from the max heap again. To delete this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it into a max heap.

After swapping the array element 22 with 11 and converting the heap into a max-heap, the elements of the array are –

Now, the heap has only one element left. After deleting it, the heap will be empty.

After completion of sorting, the array elements are –

Now, the array is completely sorted.
#include <stdio.h>

/* function to heapify a subtree. Here 'i' is the index of the root node in
   array a[], and 'n' is the size of the heap. */
void heapify(int a[], int n, int i)
{
    int largest = i;          // Initialize largest as root
    int left = 2 * i + 1;     // left child
    int right = 2 * i + 2;    // right child
    // If left child is larger than root
    if (left < n && a[left] > a[largest])
        largest = left;
    // If right child is larger than root
    if (right < n && a[right] > a[largest])
        largest = right;
    // If root is not largest
    if (largest != i) {
        // swap a[i] with a[largest]
        int temp = a[i];
        a[i] = a[largest];
        a[largest] = temp;
        heapify(a, n, largest);
    }
}

/* Function to implement the heap sort */
void heapSort(int a[], int n)
{
    // build the max heap
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
    // One by one extract an element from the heap
    for (int i = n - 1; i >= 0; i--) {
        /* Move current root element to end: swap a[0] with a[i] */
        int temp = a[0];
        a[0] = a[i];
        a[i] = temp;
        heapify(a, i, 0);
    }
}

/* function to print the array elements */
void printArr(int arr[], int n)
{
    for (int i = 0; i < n; ++i)
    {
        printf("%d", arr[i]);
        printf(" ");
    }
}

int main()
{
    int a[] = {48, 10, 23, 43, 28, 26, 1};
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are - \n");
    printArr(a, n);
    heapSort(a, n);
    printf("\nAfter sorting array elements are - \n");
    printArr(a, n);
    return 0;
}
Unit-4
Trees
A tree is a data structure with each node pointing to a number of nodes. A tree is an example of
a non-linear data structure. A tree structure is a way of representing the hierarchical nature of a
structure in a graphical form.
TYPES OF TREES
1. General trees
2. Forests
3. Binary trees
4. Binary search trees
5. Expression trees
1. General trees
•They store elements hierarchically.
•The top node of a tree is the root node, and each node, except the root, has a parent.
•A node in a general tree, except the leaf nodes, may have zero or more sub-trees. Ex: general trees which have 3 sub-trees per node are called ternary trees.
•But the number of sub-trees for any node may be variable. Ex: a node can have 1 sub-tree, whereas some other node can have 3 sub-trees.
2. Forests
•A forest is a disjoint union of trees. A set of disjoint trees (or forests) is obtained by deleting
the root and the edges connecting the root node to nodes at level 1.
Forest Tree
3. Binary Tree
A binary tree is a finite set of nodes such that
i. T contains a specially designed node called the root of T, and remaining nodes of T
form two disjoint binary trees T1 and T2 which are called left sub tree and right sub
tree respectively.
ii. Each node in binary tree has at most two children. (0,1,2)
iii. T may be empty tree (called empty binary tree)
1. Inorder Traversal:
Ans: {15, 25, 28, 30, 35, 40, 45, 50, 55, 60, 70}
2. Preorder Traversal: First, visit the root node.
Ans: 40, 30, 25, 15, 28, 35, 50, 45, 60, 55, 70
3. Postorder traversal:
Ans: {15, 28, 25, 35, 30, 45, 55, 70, 60, 50, 40}
Step 1: Use the pre-order sequence to determine the root node of the tree. The first element
would be the root node.
Step 2 : Elements on the left side of the root node in the in-order traversal sequence form the
left sub-tree of the root node. Similarly, elements on the right side of the root node in the in-
order traversal sequence form the right sub-tree of the root node.
Step 3: Recursively select each element from pre-order traversal sequence and create its left
and right sub-trees from the in-order traversal sequence.
In binary search trees, all the left subtree elements should be less than root data and all the right
subtree elements should be greater than root data. This is called binary search tree property.
Ex1. Pre-order Sequence : 1 2 4 5 3 6
     In-order Sequence : 4 2 5 1 6 3
Ex2. Postorder = [10, 18, 9, 22, 4]
     Inorder = [10, 4, 18, 22, 9]
Ex3. in = { 12, 25, 30, 37, 40, 50, 60, 62, 70, 75, 87 };
     post = { 12, 30, 40, 37, 25, 60, 70, 62, 87, 75, 50 }
Why Binary Search Tree?
To search for an element in binary tree, we need to check both in left subtree and in
right subtree. Due to this, the worst-case complexity of search operation is O(n).
A binary search tree is designed for searching. In this tree, there is a restriction on the kind of data a node can contain. As a result, it reduces the average search time to O(log n).
In a Binary search tree, the value of left node must be smaller than the parent node,
and the value of right node must be greater than the parent node. This rule is applied
recursively to the left and right subtrees of the root.
Since root data is always between left subtree data and right subtree data, performing
in-order traversal on binary search tree produces a sorted list.
The basic operations that can be performed on a binary search tree (BST) are insertion of an element, deletion of an element, and searching for an element.
While performing these operations on BST the height of the tree gets changed each
time. The basic operations on a binary search tree take time proportional to the height
of the tree.
There is no difference between regular binary tree declaration and binary search tree
declaration. The difference is only in data but not in structure.
class BSTNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    # set data
    def setData(self, data):
        self.data = data

    # get data
    def getData(self):
        return self.data

    # get left child of a node
    def getLeft(self):
        return self.left

    # get right child of a node
    def getRight(self):
        return self.right

    def insert(self, data):
        if data < self.data:
            if self.left is None:
                self.left = BSTNode(data)
            else:
                self.left.insert(data)
        else:
            if self.right is None:
                self.right = BSTNode(data)
            else:
                self.right.insert(data)

    def find(self, data):
        if data < self.data:
            if self.left is None:
                return False
            else:
                return self.left.find(data)
        elif data > self.data:
            if self.right is None:
                return False
            else:
                return self.right.find(data)
        else:
            return True

    def preorder(self):
        print(self.data, end=" ")
        if self.left:
            self.left.preorder()
        if self.right:
            self.right.preorder()

    def inorder(self):
        if self.left:
            self.left.inorder()
        print(self.data, end=" ")
        if self.right:
            self.right.inorder()

    def postorder(self):
        if self.left:
            self.left.postorder()
        if self.right:
            self.right.postorder()
        print(self.data, end=" ")

    def delete(self, data):
        if self.data is None:
            print("Tree is empty")
            return
        if data < self.data:   # finding the position of the element: perform a search
            if self.left:
                self.left = self.left.delete(data)
            else:
                print("given node is not present")
        elif data > self.data:
            if self.right:
                self.right = self.right.delete(data)
            else:
                print("given node is not present")
        else:   # delete this node
            if self.left is None:    # no left child: lift the right subtree
                temp = self.right
                self = None
                return temp
            if self.right is None:   # no right child: lift the left subtree
                temp = self.left
                self = None
                return temp
            # two children: copy the inorder successor's data, then delete it
            node = self.right
            while node.left:
                node = node.left
            self.data = node.data
            self.right = self.right.delete(node.data)
        return self

tree = BSTNode(10)
tree.insert(30)
tree.insert(3)
tree.insert(5)
tree.insert(8)
tree.insert(7)
print(tree.left.right.right.data)   # 8
print(tree.find(7))                 # True
tree.preorder()
print()
tree.inorder()
print()
tree.postorder()
print()
tree = tree.delete(2)               # prints "given node is not present"
If the data we are searching is less than nodes data then search left subtree of current node;
otherwise search right subtree of current node. If the data is not present, we end up in a NULL
link.
To insert data into binary search tree, first we need to find the location for that element. While
finding the location, if the data is already there then we can simply neglect and come out.
Otherwise, insert data at the last location on the path
traversed.
Delete a Node
Deletion in BST has been divided into 3 cases:
1. Node to be deleted is a leaf:
Replace the leaf node with NULL and simply free the allocated space. If we are deleting the node 85, since the node is a leaf node, the node is replaced with NULL.
2. Node to be deleted has a single child:
Replace the node with its only child and free the allocated space of the deleted node.
3. Node to be deleted has two children:
The node which is to be deleted is replaced with its in-order successor or predecessor recursively, until the node value (to be deleted) is placed at a leaf of the tree. After the procedure, replace the node with NULL and free the allocated space.
Inorder Predecessor and Successor: If X has two children, then its inorder predecessor is the maximum value in its left subtree (the rightmost node in the left subtree) and its inorder successor is the minimum value in its right subtree (the leftmost node in the right subtree).
In the following image, the node 50, which is the root node of the tree, is to be deleted. The in-order traversal of the tree is given below.
Replace 50 with its in-order successor 52. Now, 50 will be moved to a leaf of the tree, where it can simply be deleted.
The in-order predecessor or the successor can then be deleted using any of the case-1 or case-
2.
UNIT-IV (Trees)
[Figure: a natural tree with root, branches and leaves, alongside the Computer Scientist's View: the root drawn at the top, with branches leading to leaf nodes]
Trees
Definition:- A tree is a non-linear data structure. It is a collection of entities called nodes.
A tree is a finite set of one or more nodes such that:
i) There is a specially designated node called the root.
ii) The remaining nodes are partitioned into 'n' (n>0) disjoint sets T1, T2, ..., Tn, where each
Ti (i = 1, 2, ..., n) is a tree; T1, T2, ..., Tn are called sub-trees of the root.
Structure of a tree:
[Figure: root A with sub-trees T1, T2, T3; children B, C, D; below them E, F, G, H, I, J; and K, L at the lowest level]
Tree Terminology
Root: node without a parent (A)
Internal node: node with at least one child (A, B, C, F)
External node (leaf): node without children (E, I, J, K, G, H, D)
Ancestors of a node: parent, grandparent, grand-grandparent, etc.
Descendant of a node: child, grandchild, grand-grandchild, etc.
Siblings: nodes that share the same parent; e.g. I, J, K have the same parent F.
Subtree: tree consisting of a node and its descendants.
Level: set of nodes with the same depth; the root is at level 0. If a node is at level i, then its child is at level i+1 and its parent is at level i-1. This is true for all nodes except the root.
Depth of a node: number of ancestors, i.e. the length of the path from the root to the node.
Height of a node: length of the path from that node to the deepest node.
Height of a tree: maximum depth of any node.
Degree of a node: the number of its children.
Degree of a tree: the maximum degree among its nodes.
[Figure: tree with root A; children B, C, D; E and F under B; G and H under C; I, J, K under F]
Tree Terminology
• Height of a tree:
• The height of a tree is the number of edges present in the longest path of the tree.
• A leaf node will have a height of 0.
• The height of a node is the number of edges on the longest path from the node to a leaf.
• The height of node A is 3, i.e. the path A->C->D->E, not the path from A to G.
Tree Terminology
• The depth of a node is the number of edges from root node to that particular node.
A root node will have a depth of 0.t
5 5
5
1
3 2 3 2
3 2
4 1 6 4 6
4 1 6
tree Not a tree Not a tree
Types of trees
1. Binary tree
2. Binary search tree
3. Heap tree

Binary Trees
• Definition:-
A binary tree is a finite set of nodes such that
i. T is an empty tree (called the empty binary tree), or
ii. T contains a specially designated node called the root of T, and the remaining nodes of T form two disjoint binary trees T1 and T2, which are called the left sub-tree and right sub-tree respectively.
iii. Each node in a binary tree has at most two children (0, 1, or 2).
[Figure: a binary tree with nodes D, E, F, G, H, I, J, K, with the left and right sub-trees of the root marked]
• Difference between a tree and a binary tree
Tree:
1. A tree can never be empty.
2. A node may have any number of child nodes.
Binary tree:
1. May be empty.
2. A node may have at most two children (0, 1, or 2).

Three special situations of a binary tree are possible:
1. Full binary tree
2. Complete binary tree
3. Strict binary tree
Types of binary Trees
Full binary tree
A binary tree is a full binary tree if it contains the maximum possible number of nodes in all levels, i.e. each node has exactly two children and all leaf nodes are at the same level.
[Figure: full binary tree on nodes 1 to 15; Level 0 - 1 node, Level 1 - 2 nodes, Level 2 - 4 nodes, Level 3 - 8 nodes]
Number of nodes in a full binary tree is 2^(h+1) - 1.

Complete binary tree
[Figure: complete binary tree on nodes 1 to 9 with root A; Level 0 - 1 node, Level 1 - 2 nodes, Level 2 - 4 nodes, Level 3 - 2 nodes]
Number of nodes in a complete binary tree is between 2^h (minimum) and 2^(h+1) - 1 (maximum).
Types of binary Trees
[Figure: a binary tree with nodes B, C, D, E, F, G, H, I]

Applications of Binary Trees
Expression trees used in compilers.
Huffman code trees used in data compression algorithms.
B-trees used in databases.
[Figure: a binary tree with nodes B, C, D, E, F, G, H, I, J, K]
Binary Tree Traversals
Preorder (DLR) Traversal
[Figure: preorder traversal example on a tree rooted at 8]

Inorder (LDR) Traversal
[Figure: inorder traversal example on a tree rooted at 8]

Inorder of binary tree
[Figure: tree with root A; children B and C; D and E under B; H and I under E; F and G under C; J under F; K under J]
Inorder - D, B, H, E, I, A, F, K, J, C, G

Postorder (LRD) Traversal
[Figure: postorder traversal example on a tree rooted at 8]

Postorder of binary tree
[Figure: the same tree with root A as above]
Postorder - D, H, I, E, B, K, J, F, G, C, A

Level Order Traversal
[Figure: the same tree with root A as above]
Level order - A, B, C, D, E, F, G, H, I, J, K
[Figure: level order traversal example on a tree rooted at 8]
Representing an Expression in a Binary tree and applying Traversals
Write the pre, in and post order of the following tree:
• (A-B) + C*(D/E)
[Figure: expression tree; root +; left child - with children A and B; right child * with children C and /; / has children D and E]
Pre order - +, -, A, B, *, C, /, D, E
In order - A, -, B, +, C, *, D, /, E
Post order - A, B, -, C, D, E, /, *, +
The nodes are stored level by level, starting from the zero level, where the root node is present.
The following rules can be used to decide the location of any node of a tree in the array.
a. The root node is at location 0.
b. If a node is at location 'i', then its left child is located at 2*i + 1 and its right child is located at 2*i + 2.
c. In the worst case (a skewed tree), an n-node binary tree may require up to 2^n - 1 locations.
Example - linear/sequential representation
[Figure: tree with root A; A's children B and D; B's left child C; D's children E and G; E's right child F]
Array: A B D C . E G . . . . . F
Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
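The index arithmetic of rules (a) and (b) can be written as a small Python sketch (the function names are ours):

def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2 if i > 0 else None   # the root has no parent

# For the array above: left_child(0) = 1 ('B'), right_child(0) = 2 ('D'),
# and parent(5) = 2, i.e. 'D' is the parent of 'E'.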
Sequential representation
Advantages of linear representation
1. Any node can be accessed from any other node by calculating the index
and this is efficient from execution point of view.
2. There is no overhead of maintaining the pointers.
3. Some programming languages like BASIC, FORTRAN, where dynamic
allocation is not possible, array representation is the only way to store a
tree.
Disadvantages
1. Other than full binary tree, if you store any tree most of the memory locations are
empty.
2. Allows only static representation. There is no possible way to enhance the size of the
tree.
3. Inserting a new node and deleting a node from it are inefficient with this
representation because these require considerable data movement up and down the
array which demand excessive processing time.
Representation of Binary Tree using Linked List
There is no need to have prior knowledge of depth of the tree. Using dynamic
memory allocation concept one can create as much memory (node) as
required.
Insertion and deletion which are the most common operations can be done
without moving the other nodes.
struct node { int data; struct node *lchild, *rchild; };
/* createNode(val) is assumed to allocate a node, set its data to val,
   and set both child pointers to NULL */
int main()
{
    struct node *root = createNode(20);
    root->lchild = createNode(30);
    root->rchild = createNode(40);
    root->lchild->lchild = createNode(50);
    root->rchild->lchild = createNode(60);
    return 0;
}
/* resulting tree: 20 as root; 30 and 40 as its children; 50 under 30; 60 under 40 */
Tree traversals
void preorder(struct node *root)
{
    if(root)
    {
        printf("%d ", root->data);
        preorder(root->lchild);
        preorder(root->rchild);
    }
}
Output: 20 30 50 40 60
Inorder traversal
void inorder(struct node* root)
{
    if(root)
    {
        inorder(root->lchild);
        printf("%d ", root->data);
        inorder(root->rchild);
    }
}
Output: 50 30 20 60 40
Post order traversal
void postorder(struct node *root)
{
    if(root)
    {
        postorder(root->lchild);
        postorder(root->rchild);
        printf("%d ", root->data);
    }
}
Output: 50 30 60 40 20
Height of Binary Tree
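A minimal recursive sketch in Python, consistent with the definitions above (tree_height is our name; it works on any node with left/right children, such as the BSTNode class in this unit). An empty tree is given height -1 so that a leaf gets height 0:

def tree_height(node):
    # height = number of edges on the longest root-to-leaf path
    if node is None:
        return -1
    return 1 + max(tree_height(node.left), tree_height(node.right))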
UNIT-IV (Trees)
Binary Search Tree property: for a node with key K, all keys in the left subtree are < K and all keys in the right subtree are > K.
• The following are BSTs (binary search trees):
[Figure: a BST with root 35; internal nodes 18 and 45; leaves 12, 17, 25, 40, 50]
[Figure: a tree with root 20, children 15 and 25, and lower nodes 12, 18, 22 that is not a binary search tree because it fails to satisfy the BST property]

Binary search tree representation
Search for node 22: since 22 > 20, move to the right subtree of the root; 22 < 25, so move to the left child of 25; node 22 is found.
[Figure: BST with root 20, children 15 and 25; 12 and 18 under 15; 22 under 25]
Insertion operation on BST
• Insertion operation on a binary search tree is very simple. In fact, it is one step more than the searching operation.
• To insert a node with data, say ITEM, into a tree, the tree is searched starting from the root node.
• If ITEM is found, do nothing; otherwise ITEM is inserted at the dead end where the search halts.
[Figure: a) before insertion, a BST with root 6, children 2 and 8, and nodes 1, 4, 3; b) after inserting node 5 as the right child of 4]
Construct a BST with the following elements:
19, 55, 44, 98, 8, 23, 15, 6, 10, 82, 99
[Figure: resulting BST; root 19; left child 8 with children 6 and 15, and 10 under 15; right child 55 with children 44 and 98; 23 under 44; 82 and 99 under 98]
Deletion operation on BST
• Another frequently used operation on the BST is to delete any node from it. This is a slightly complicated one.
• To delete a node from a binary search tree, there are three possible cases.
case-1 - If the deleted node is a leaf node (a node with no children), it can be deleted by making its parent's pointer (left or right) NULL.
[Figure: before and after deleting leaf node 35; root 30 with children 25 (20) and 40 (35, 45); after deletion, 40 keeps only its right child 45]
case-2 - If the deleted node has one child, the node is removed and its child is linked directly to the deleted node's parent.
[Figure: deleting a node with one child from a tree containing the nodes 1, 4, 7]
case-3
• If the deleted node has two children:
• First find the inorder successor (X) of the deleted node (the inorder successor is the node which comes immediately after the deleted node during the inorder traversal).
• Delete node X from the tree using case-1 or case-2 (it can be verified that X never has a left child), and then replace the data content of the deleted node with the data of X.
[Figure: deleting node 20 from a BST with root 35; 20 has children 16 and 29, with 24 (right child 27) and 33 under 29, and 45 (with 42) on the right of 35; the inorder successor X = 24 replaces 20]
Applications of Trees
Trees are among the most useful data structures in computer science. Some of the applications of trees are:
1. The library database in a library, the student database in schools and colleges, the patient database in hospitals, the employee database in an organization, or any database can be implemented using trees.
2. The file system in your computer, i.e. folders and all files, is stored as a tree.
3. When you search for a word, or misspell one and get a list of possible corrected words, you are using a tree.
4. When you watch YouTube videos or surf the internet, the information that reaches your computer from somewhere in the world travels through many intermediate computers called routers. Routers use trees and graphs for routing data.
Properties of binary trees
4. The minimum number of nodes possible at every level is one node. When every parent node has only one child, such a tree is called a skew binary tree.
[Fig-a, Fig-b: left-skewed and right-skewed binary trees]
5. For any non-empty binary tree, if n is the number of nodes and e is the number of edges, then n = e + 1.
6. For any non-empty binary tree T, if n0 is the number of leaf nodes and n2 is the number of internal nodes of degree 2, then n0 = n2 + 1.
Example: number of degree-2 internal nodes n2 = 3, so number of leaf nodes n0 = 3 + 1 = 4.
7. The minimum height of a binary tree with n nodes is log2(n+1).
8. The total number of binary trees possible with n nodes is (1/(n+1)) * C(2n, n).
• The total number of binary trees possible with 3 nodes (A, B, C) is (1/4) * C(6, 3) = 5.
[Figure: the five distinct binary tree shapes on the nodes A, B, C, numbered 1 to 5]
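This count can be checked directly from the formula with a few lines of Python (num_binary_trees is our name):

from math import comb

def num_binary_trees(n):
    # Catalan number: C(2n, n) / (n + 1)
    return comb(2 * n, n) // (n + 1)

print(num_binary_trees(3))   # 20 / 4 = 5, matching the five shapes above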
18CS C07 - Data Structures
Unit-4
Examples
[Figure: an undirected graph and a directed graph on vertices 0-3. For the directed graph: vertex 0 has in-degree 1 and out-degree 1; vertex 1 has in-degree 1 and out-degree 2; vertex 2 has in-degree 1 and out-degree 0]
• Cut vertex: A vertex which, when deleted, would disconnect the remaining graph.
• Isolated node: the degree of the node is zero; the vertex is not an end-point of any edge.
• Parallel Edge/Multiple edge: Two distinct edges are parallel if they connect the same pair of vertices.
• Loop: An edge that has identical end-points is called a loop. That is, e = (A, A).
[Figure: examples of parallel edges and a loop on vertices B, C, D]
Regular graphs
[Figure: example paths (a-c-d and b-e-c), a cycle, and a simple path]
• Connected graph: A graph is said to be connected if for any two
vertices (u, v) in V there is a path from u to v. That is to say that there
are no isolated nodes in a connected graph. A connected graph that
does not have any cycle is called a tree.
• strongly connected graph: A directed graph is said to be strongly
connected if for every pair of distinct vertices vi,vj in G, there is a path
from vi to vj and also from vj to vi.
• Weakly connected graph: a directed graph is not strongly connected
but it is connected
[Figure: example graphs G1, G2, G3 and G4 on vertices 0-3, illustrating connected, strongly connected and weakly connected graphs]
• Since an adjacency matrix contains only 0s and 1s, it is called a bit matrix or
a Boolean matrix.
• For a simple graph (that has no loops), the adjacency matrix has 0s
on the diagonal.
• The adjacency matrix of an undirected graph is symmetric.
• The memory use of an adjacency matrix is O(n2), where n is the
number of nodes in the graph.
• The adjacency matrix for a weighted graph contains the weights of
the edges connecting the nodes.
Examples 1-3
[Figure: three example weighted graphs]
Example: [Figure: weighted graph of four cities, Hyderabad, Vijayawada, Bangalore and Mumbai, with distances of 270 km, 580 km, 650 km, 700 km, 980 km and 1000 km between pairs of cities]
Find a short distance to cover all four cities.
[Figure: a weighted graph on vertices A-F and a spanning tree of it]
Example
Consider an undirected, weighted graph:
[Figure: graph on vertices A-H with edges (D,E) 1, (D,G) 2, (E,G) 3, (C,D) 3, (G,H) 3, (C,F) 3, (B,C) 4, (B,E) 4, (B,F) 4, (B,H) 4, (A,H) 5, (D,F) 6, (A,B) 8, (A,F) 10]
The edges are first sorted in ascending order of weight:

(D,E) 1    (B,E) 4
(D,G) 2    (B,F) 4
(E,G) 3    (B,H) 4
(C,D) 3    (A,H) 5
(G,H) 3    (D,F) 6
(C,F) 3    (A,B) 8
(B,C) 4    (A,F) 10

The edges are then examined one by one in this order; an edge is added to the spanning tree if it does not form a cycle with the edges already chosen, otherwise it is not considered.

Accepted: (D,E) 1, (D,G) 2, (C,D) 3, (G,H) 3, (C,F) 3, (B,C) 4, (A,H) 5
Not considered (each would form a cycle): (E,G), (B,E), (B,F), (B,H), (D,F), (A,B), (A,F)

Done. Total Cost = 21
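A compact Python sketch of this edge-by-edge construction (Kruskal's algorithm), using a simple union-find to detect cycles; the edge list is the one tabulated above, and the function names are ours:

def kruskal(vertices, edges):
    parent = {v: v for v in vertices}

    def find(v):                      # find the representative of v's component
        while parent[v] != v:
            v = parent[v]
        return v

    mst, total = [], 0
    for w, u, v in sorted(edges):     # consider edges in increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # no cycle: accept the edge
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(1,'D','E'), (2,'D','G'), (3,'E','G'), (3,'C','D'), (3,'G','H'),
         (3,'C','F'), (4,'B','C'), (4,'B','E'), (4,'B','F'), (4,'B','H'),
         (5,'A','H'), (6,'D','F'), (8,'A','B'), (10,'A','F')]
mst, total = kruskal('ABCDEFGH', edges)
print(mst, total)                     # total cost = 21, as in the walkthrough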
Done
3 1 2
E F
4
Adjacency matrix Minimum Spanning Tree for the above graph is:
A B C D E F
A - 5 4 6 2 -
A B
B 5 - - 2 - 3 2
C 4 - - - 3 - 2
C D
D 6 2 - - 1 2
E 2 - 3 1 - 4 3 1 2
F - 3 - 2 4 - B.Poonguzharselvi,Assistant Professor, CSE, E F
CBIT
Graph
Two edges are parallel if they connect the same pair of vertices.
A subgraph is a subset of a graph's edges (with associated vertices) that form a graph.
A path in a graph is a sequence of adjacent vertices. A simple path is a path with no repeated vertices. In the graph below, the
dotted lines represent a path from G to E.
A cycle is a path where the first and last vertices are the same. A simple cycle is a cycle with no repeated vertices or edges
(except the first and last vertices).
A graph is connected if there is a path from every vertex to every other vertex.
In the following graph, it is possible to travel from one vertex to any other vertex. For example, one can traverse from vertex
‘a’ to vertex ‘e’ using the path ‘a-b-e’.
A bipartite graph is a graph whose vertices can be divided into two sets such that all edges connect a vertex in one set
with a vertex in the other set. Or it is a set of graph vertices decomposed into two disjoint sets such that no two graph
vertices within the same set are adjacent.
Ex1.
Ex2.
Here, we partition the vertex set V= { A,B,C,D,E} into two disjoint vertex sets V1 = {A,D} and V2 = {B,C,E}.
In weighted graphs, integers (weights) are assigned to each edge to represent distances or costs.
Applications of Graphs
Representing Graphs
1. An adjacency matrix can be thought of as a table with rows and columns. The row labels and column labels represent the
nodes of a graph. An adjacency matrix is a square matrix where the number of rows, columns and nodes are the same. Each
cell of the matrix represents an edge or the relationship between two given nodes.
Directed Graph adjacency matrix: if there exists a directed edge from a given node to another, then the corresponding cell will be
marked one else zero.
Undirected weighted graph representation
2. An adjacency list represents a graph as an array of linked lists. The index of the array represents a vertex and each element
in its linked list represents the other vertices that form an edge with the vertex.
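As a sketch, here are both representations for the five-vertex undirected graph used in the traversal examples below (the variable names are ours):

vertices = ["A", "B", "C", "D", "E"]
edges = [("A","B"), ("B","E"), ("A","C"), ("A","D"), ("B","D"), ("C","D"), ("E","D")]

# adjacency matrix: matrix[i][j] = 1 if an edge joins vertices[i] and vertices[j]
n = len(vertices)
index = {v: i for i, v in enumerate(vertices)}
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1    # symmetric, since the graph is undirected

# adjacency list: each vertex maps to the list of vertices sharing an edge with it
adj = {v: [] for v in vertices}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[index["A"]])   # row for A: [0, 1, 1, 1, 0]
print(adj["A"])             # ['B', 'C', 'D']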
Graph Traversals: graph search algorithm can be thought of as starting at some source vertex in a graph and
"searching" the graph by going through the edges and visiting the vertices.
1. Depth First Traversal:
Initially all vertices are marked unvisited (false). The DFS algorithm starts at a vertex u in the graph. By starting at vertex u it
considers the edges from u to other vertices. If the edge leads to an already visited vertex, then backtrack to current vertex u. If
an edge leads to an unvisited vertex, then go to that vertex and start processing from that vertex. That means the new vertex
becomes the current vertex. Follow this process until the dead-end is reached. At this point start backtracking. The process
terminates when backtracking leads back to the start vertex.
Ex.
def add_node(v):
    if v in G:
        print(v, "is already present in graph")
    else:
        G[v] = []

def add_edge(v1, v2):
    if v1 not in G:
        print(v1, "is not present in graph")
    elif v2 not in G:
        print(v2, "is not present in graph")
    else:
        G[v1].append(v2)
        G[v2].append(v1)

def DFS(node, visited, G):   # node is the starting node, visited is the set of visited nodes, G is the dictionary
    if node not in G:
        print("Node is not present")
        return
    if node not in visited:
        print(node)
        visited.add(node)
        for i in G[node]:    # G[node] gives the list of nodes adjacent to node
            DFS(i, visited, G)

visited = set()
G = {}
add_node("A")
add_node("B")
add_node("C")
add_node("D")
add_node("E")
add_edge("A", "B")
add_edge("B", "E")
add_edge("A", "C")
add_edge("A", "D")
add_edge("B", "D")
add_edge("C", "D")
add_edge("E", "D")
print(G)
DFS("A", visited, G)
1. Start by putting any one of the graph's vertices at the back of a queue.
2. Take the front item of the queue and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones which aren't in the visited list to the back of the queue.
4. Keep repeating steps 2 and 3 until the queue is empty.
Python code:
def add_node(v):
if v in G:
print(v, "is already present in graph")
else:
G[v] = []
def add_edge(v1,v2):
if v1 not in G:
print(v1,"is not present in graph")
elif v2 not in G:
print(v2,"is not present in graph")
else:
G[v1].append(v2)
G[v2].append(v1)
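# The bfs function called in the driver below is not shown in the original
# listing; this is a minimal version consistent with the four steps above
# (the names s and neighbour are ours):
def bfs(visited, G, node):
    visited.append(node)              # mark the start vertex as visited
    queue.append(node)                # and put it at the back of the queue
    while queue:                      # repeat until the queue is empty
        s = queue.pop(0)              # take the front item of the queue
        print(s, end=" ")
        for neighbour in G[s]:        # enqueue unvisited adjacent nodes
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)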
visited = []
G={}
queue = []
add_node("A")
add_node("B")
add_node("C")
add_node("D")
add_node("E")
add_edge("A","B")
add_edge("B","E")
add_edge("A","C")
add_edge("A","D")
add_edge("B","D")
add_edge("C","D")
add_edge("E","D")
print(G)
bfs(visited,G,'A')
UNIT-V
6. Null Graph: It's a reworked version of a trivial graph. If there are several vertices but no edges connecting them, a graph G = (V, E) is a null graph.

7. Complete Graph: A simple graph G = (V, E) is complete if every pair of its n vertices is connected by an edge. It's also known as a full graph because each vertex's degree must be n-1.

8. Pseudo Graph: If a graph G = (V, E) contains a self-loop besides other edges, it is a pseudograph.

11. Directed Graph: A directed graph, also referred to as a digraph, is a set of nodes connected by edges, each with a direction.

12. Undirected Graph: An undirected graph also comprises a set of nodes and links connecting them. The order of the two connected vertices is irrelevant, and edges have no direction. You can form an undirected graph with a finite number of vertices and edges.

13. Connected Graph: If there is a path between one vertex of a graph data structure and any other vertex, the graph is connected.

18. Subgraph: The vertices and edges of a graph that are subsets of another graph are known as a subgraph.

Representation of Graphs in Data Structures
The two most frequent graph representations are the following: the adjacency matrix and the adjacency list.
[Figures: undirected graph representation; directed graph representation]
Step 1 - Define a Queue whose size is the total number of vertices in the graph.
Step 2 - Select any vertex as a starting point for traversal. Visit that vertex
and insert it into the Queue.
Step 3 - Visit all the non-visited adjacent vertices of the vertex at the front
of the Queue and insert them into the Queue.
Step 4 - When there is no new vertex to be visited from the vertex at the
front of the Queue, then delete that vertex.
Step 5 - Repeat steps 3 and 4 until the queue becomes empty.
Step 6 - When the queue becomes empty, then produce the final
spanning tree by removing unused edges from the graph
DFS (Depth First Search): DFS traversal of a graph produces a spanning
tree as the final result.
It uses hash tables to store the data in an array format. Each value in the array has been
assigned a unique index number. Hash tables use a technique to generate these unique
index numbers for each value stored in an array format. This technique is called the hash
technique.
You only need to find the index of the desired item, rather than finding the data. With
indexing, you can quickly scan the entire list and retrieve the item you wish. Indexing also
helps in inserting operations when you need to insert data at a specific location. No matter
how big or small the table is, you can update and retrieve data within seconds.
The hash table is basically an array of elements, and the hash-based search is performed on a part of the item, i.e. the key. Each key is mapped to a number; the range remains from 0 to table size - 1.
Hashing in a data structure is a two-step process:
• The hash function converts the item into a small integer or hash value. This integer is
used as an index to store the original data.
• It stores the data in a hash table. You can use a hash key to locate data quickly.
Need for Hash data structure: Every day, the data on the internet is increasing multifold and it is
always a struggle to store this data efficiently. In day-to-day programming, this amount of data might
not be that big, but still, it needs to be stored, accessed, and processed easily and efficiently. A very
common data structure that is used for such a purpose is the Array data structure.
• Now the question arises: if the Array was already there, what was the need for a new data structure? The answer to this is in the word "efficiency". Though storing in an Array takes O(1) time, searching in it takes at least O(log n) time (binary search on a sorted array). This time appears to be small, but for a large data set, it can cause a lot of problems, and this, in turn, makes the Array data structure inefficient.
• So now we are looking for a data structure that can store the data and search in it in constant time,
i.e. in O(1) time. This is how Hashing data structure came into play. With the introduction of the
Hash data structure, it is now possible to easily store data in constant time and retrieve them in
constant time as well.
In schools, the teacher assigns a unique roll number to each student. Later, the teacher uses that
roll number to retrieve information about that student.
A library has an infinite number of books. The librarian assigns a unique number to each book. This
unique number helps in identifying the position of the books on the bookshelf.
Components of Hashing: There are majorly three
components of hashing:
1.Key: A key can be any string or integer which is fed as input to the hash function, the technique that determines an index or location for storage of an item in a data structure.
2.Hash Function: The hash function receives the
input key and returns the index of an element in an
array called a hash table. The index is known as
the hash index.
3.Hash Table: Hash table is a data structure that
maps keys to values using a special function called a
hash function. Hash stores the data in an associative
manner in an array where each data value has its
own unique index.
How does Hashing work?
Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to store it in a table. Our main objective here is to
search or update the values stored in the table quickly in O(1) time and we are not concerned about the ordering of strings in
the table. So the given set of strings can act as a key and the string itself will act as the value of the string but how to store the
value corresponding to the key?
Step 1: We know that hash functions (which is some mathematical formula) are used to calculate the hash value which
acts as the index of the data structure where the value will be stored.
Step 2: So, let’s assign “a” = 1, “b”=2, .. etc, to all alphabetical characters.
Step 3: Therefore, the numerical value by summation of all characters of the string: “ab” = 1 + 2 = 3, “cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash function that is used here is the sum of
the characters in key mod Table size. We can compute the location of the string in the array by taking the sum(string) mod
7.
Step 5: So we will then store “ab” in 3 mod 7 = 3, “cd” in 7 mod 7 = 0, and “efg” in 18 mod 7 = 4.
The above technique enables us to calculate the location of a given string by using a simple hash function and rapidly find
the value that is stored in that location. Therefore the idea of hashing seems like a great way to store (key, value) pairs of
the data in a table.
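The five steps above can be checked with a few lines of Python (a sketch; sum_hash is our name):

def sum_hash(key, table_size):
    # steps 2-3: a = 1, b = 2, ...; sum the letter values of the string
    total = sum(ord(ch) - ord('a') + 1 for ch in key)
    # step 4: sum(string) mod table size gives the slot
    return total % table_size

table = [None] * 7
for key in ["ab", "cd", "efg"]:
    table[sum_hash(key, 7)] = key

print(table)   # ['cd', None, None, 'ab', 'efg', None, None], as in step 5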
How does Hashing in Data Structure Works?
• Hash table stores the data in a key-value pair. The key acts as
an input to the hashing function. Hashing function then
generates a unique index number for each value stored. The
index number keeps the value that corresponds to that key. The
hash function returns a small integer value as an output. The
output of the hashing function is called the hash value.
For example: Consider an array as a Map where the key is the index and the value is the value at that index. So for an array A, if we have an index i which will be treated as the key, then we can find the value by simply looking up A[i].
Types of Hash functions: There are many hash functions that use numeric or alphanumeric
keys.
Division Method.
Mid Square Method.
Folding Method.
Multiplication Method
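Rough one-line sketches of the first three methods in Python (the table size m = 10 and the digit positions chosen here are our own illustrative assumptions):

m = 10   # table size (illustrative)

def division_method(k):
    return k % m                                  # remainder on division by table size

def mid_square_method(k):
    squared = str(k * k)
    mid = len(squared) // 2
    return int(squared[mid - 1 : mid + 1]) % m    # take the middle digits of k*k

def folding_method(k, part_size=2):
    digits = str(k)
    parts = [int(digits[i:i + part_size]) for i in range(0, len(digits), part_size)]
    return sum(parts) % m                         # split the key into parts and add them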
Properties of a Good hash function: A hash function that maps every item into its own unique slot is known
as a perfect hash function. We can construct a perfect hash function if we know the items and the collection
will never change but the problem is that there is no systematic way to construct a perfect hash function given
an arbitrary collection of items.
Fortunately, we will still gain performance efficiency even if the hash function isn’t perfect. We can achieve
a perfect hash function by increasing the size of the hash table so that every possible value can be
accommodated. As a result, each item will have a unique slot. Although this approach is feasible for a small
number of items, it is not practical when the number of possibilities is large.
So, We can construct our hash function to do the same but the things that we must be careful about while
constructing our own hash function. A good hash function should have the following properties:
1. Efficiently computable.
2. Should uniformly distribute the keys (each table position is equally likely for each key).
3. Should minimize collisions.
4. Should have a low load factor (number of items in the table divided by the size of the table).
What is collision? A collision occurs when the hash function maps two different keys to the same slot of the hash table. Since the number of possible keys is usually much larger than the table size, collisions cannot be avoided entirely; they have to be resolved.
1) Separate Chaining: each slot of the hash table holds a chain (linked list) of all the items whose keys hash to that slot. Hence, in this way, the separate chaining method is used as the collision resolution technique.
2) Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table
entry contains either a record or NIL. When searching for an element, we examine the table slots one
by one until the desired element is found or it is clear that the element is not in the table.
2.a) Linear Probing: In linear probing, the hash table is searched sequentially, starting from the original hash location. If the location that we get is already occupied, then we check the next location.
Algorithm:
2.b) Quadratic Probing: instead of checking the next slot, the i-th probe checks the slot at an offset of i^2 from the original hash location.
Example: insert key 50 into a hash table of size 7.
• Hash(50) = 50 % 7 = 1
• In our hash table, slot 1 is already occupied. So we search slot 1 + 1^2, i.e. 1 + 1 = 2.
• Again slot 2 is found occupied, so we search cell 1 + 2^2, i.e. 1 + 4 = 5.
• Now, cell 5 is not occupied, so we place 50 in slot 5.
2.c) Double Hashing: Double hashing is a collision-resolving technique in open-addressed hash tables. Double hashing makes use of two hash functions.
• The first hash function is h1(k), which takes the key and gives a location on the hash table. If that location is empty, we can easily place our key there; otherwise the second hash function h2(k) decides the step size, and we probe the locations (h1(k) + i*h2(k)) mod table size for i = 1, 2, ...

Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash function is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5).

Step 1: Insert 27
• 27 % 7 = 6; location 6 is empty, so insert 27 into slot 6.
Step 2: Insert 43
• 43 % 7 = 1; location 1 is empty, so insert 43 into slot 1.
Step 3: Insert 692
• 692 % 7 = 6; location 6 is occupied, so compute h2(692) = 1 + (692 % 5) = 3 and probe (6 + 3) % 7 = 2; location 2 is empty, so insert 692 into slot 2.
Step 4: Insert 72
• 72 % 7 = 2; location 2 is occupied, so compute h2(72) = 1 + (72 % 5) = 3 and probe (2 + 3) % 7 = 5; location 5 is empty, so insert 72 into slot 5.
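A small Python sketch of this scheme, using the standard probe sequence (h1(k) + i*h2(k)) mod size (the insert function and printout are ours):

SIZE = 7
table = [None] * SIZE

def h1(k): return k % 7
def h2(k): return 1 + (k % 5)

def insert(k):
    for i in range(SIZE):
        slot = (h1(k) + i * h2(k)) % SIZE   # i = 0 tries h1(k) itself
        if table[slot] is None:
            table[slot] = k
            return slot
    return None                              # table is full

for key in (27, 43, 692, 72):
    print(key, "->", insert(key))
# 27 -> 6, 43 -> 1, 692 -> 2, 72 -> 5, matching the steps above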
String Matching Algorithms have greatly influenced computer science and play an essential role in various
real-world problems. It helps in performing time-efficient tasks in multiple domains. These algorithms are
useful in case of searching a string within a string. String matching is used in the Database schema,
networking systems.
Those techniques are used when the quality of text is low, there are spelling errors in the pattern or text,
finding DNA subsequences after mutation, heterogenous databases, etc. Some of them are:
1. Naive Approach: It slides the pattern over the text one position at a time and checks for subsequent approximate matches.
Other approximate-matching techniques include:
2. Sellers Algorithm (Dynamic Programming)
3. Shift-Or Algorithm (Bitmap Algorithm)
CODE:
def brute_force_string_matching(pattern, text):
    m, n = len(pattern), len(text)
    for i in range(n - m + 1):           # slide the pattern over the text
        if text[i:i + m] == pattern:     # compare the current window with the pattern
            return i                     # return the index of the first match
    return -1                            # pattern not found

# Example usage:
pattern = "abra"    # "abra" occurs in the text at indices 0 and 7; the first match is returned
text = "abracadabra"
result = brute_force_string_matching(pattern, text)
if result != -1:
    print(f"Pattern found at index {result}.")
else:
    print("Pattern not found.")
RESULT:
Pattern found at index 0.
CODE:
import itertools

def brute_force_password_cracking(target_password,
        characters="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
        max_attempts=1_000_000):
    password_length = len(target_password)
    # try candidate strings of the target length, one by one;
    # the attempt cap is an added assumption so that the demo terminates,
    # since exhaustively trying all 62^9 candidates is infeasible
    candidates = itertools.product(characters, repeat=password_length)
    for guess in itertools.islice(candidates, max_attempts):
        if "".join(guess) == target_password:
            return "".join(guess)
    return None

# Example usage:
target_password = "secret123"
cracked_password = brute_force_password_cracking(target_password)
if cracked_password:
    print(f"Password cracked: {cracked_password}")
else:
    print("Password not cracked.")
RESULT:
Password not cracked.
In real world scenarios, brute force algorithms might not be practical for large datasets or more complex
problems due to their inefficiency.
[Figure: the text string and the pattern string to be matched]
Let us assign a numerical value(v)/weight for the characters we will be using in the problem. Here, we
have taken first ten alphabets only (i.e. A to J).
Let n be the length of the pattern and m be the length of the text. Here, m = 10 and n = 3.
Let d be the number of characters in the input set. Here, we have taken input set {A, B, C, ...,
J}. So, d = 10. You can assume any suitable value for d.
Let us calculate the hash value of the pattern "CDD":
hash(CDD) = 3*10^2 + 4*10^1 + 4*10^0 = 344
344 mod 13 = 6
In the calculation above, choose a prime number (here, 13) in such a way that we can perform all the calculations with single-precision arithmetic. The reason for calculating the modulus is given below.
Compare the hash value of the pattern with the hash value of the current window of the text. If they match, character-by-character matching is performed.
In the example above, the hash value of the first window "ABC" (i.e. t) is (1·10² + 2·10¹ + 3·10⁰) mod 13 = 123 mod 13 = 6, which matches the hash value of the pattern p, so we go for character matching between ABC and CDD. Since the characters do not match (this is a spurious hit), we move on to the next window.
We calculate the hash value of the next window by subtracting the contribution of the outgoing character and adding the contribution of the incoming character. In order to optimize this process, we reuse the previous hash value in the following way:
t(next window) = (d · (t − value(outgoing character) · h) + value(incoming character)) mod q, where h = d^(m−1) mod q.
For example, given txt = "AABAACAADAABAABA" and pat = "AABA", the expected output is:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 12
CODE:
d = 10  # number of characters in the input alphabet

def search(pattern, text, q):
    # Note: this code uses ord() character codes as the numerical values,
    # rather than the A=1..J weights used in the worked example above.
    m = len(pattern)
    n = len(text)
    p = 0  # hash value of the pattern
    t = 0  # hash value of the current window of the text
    h = 1

    # h = d^(m-1) mod q, the weight of the leading character of a window
    for i in range(m - 1):
        h = (h * d) % q

    # initial hash values of the pattern and of the first window of the text
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q

    # slide the pattern over the text one position at a time
    for i in range(n - m + 1):
        if p == t:
            # on a hash match, verify the characters one by one
            for j in range(m):
                if text[i + j] != pattern[j]:
                    break
                j += 1
            if j == m:
                print("Pattern found at index", i)
        # compute the hash value of the next window from the current one
        if i < n - m:
            t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q
            if t < 0:
                t = t + q

text = "ABCCDDAEFG"
pattern = "CDD"
q = 13  # a prime number
search(pattern, text, q)
RESULT:
Pattern found at index 3
The worst-case complexity, O(m·n), occurs when spurious hits occur for all the windows, forcing a character-by-character comparison at every position.
To search a collection in constant rather than logarithmic time, we will need to know even more about where the items might be when we go to look for them in the collection. If every item is where it should be, then the search can use a single comparison to discover the presence of an item. We will see, however, that this is typically not the case.
A hash table is a collection of items which are stored in such a way as to make it easy to find them later.
Each position of the hash table, often called a slot, can hold an item and is named by an integer value
starting at 0. For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on. Initially,
the hash table contains no items so every slot is empty. We can implement a hash table by using a list with
each element initialized to the special Python value None. The figure shows a hash table of size m=11; in other words, there are m slots in the table, named 0 through 10.
The mapping between an item and the slot where that item belongs in the hash table is called the hash
function. The hash function will take any item in the collection and return an integer in the range of slot
names, between 0 and m-1. Assume that we have the set of integer items 54, 26, 93, 17, 77, and 31. Our first hash function, sometimes referred to as the “remainder method,” simply takes an item and divides it by the table size, returning the remainder as its hash value (h(item) = item % 11). Table 4 gives all of the hash values for our example items. Note that this remainder method (modulo arithmetic) will typically be present in some form in all hash functions, since the result must be in the range of slot names.
Table 4: Simple Hash Function Using Remainders
Item    Hash Value (item % 11)
54      10
26      4
93      5
17      6
77      0
31      9
Once the hash values have been computed, we can insert each item into the hash table at the designated
position as shown in Figure 5. Note that 6 of the 11 slots are now occupied. This is referred to as the load
factor and is commonly denoted by lambda = number of items/tablesize. For this example, lambda=6/11.
Now when we want to search for an item, we simply use the hash function to compute the slot name for
the item and then check the hash table to see if it is present. This searching operation is O(1), since a
constant amount of time is required to compute the hash value and then index the hash table at that
location. If everything is where it should be, we have found a constant time search algorithm.
You can probably already see that this technique is going to work only if each item maps to a unique
location in the hash table. For example, if the item 44 had been the next item in our collection, it would
have a hash value of 0 (44%11==0). Since 77 also had a hash value of 0, we would have a problem.
According to the hash function, two or more items would need to be in the same slot. This is referred to
as a collision (it may also be called a “clash”). Clearly, collisions create a problem for the hashing
technique. We will discuss them in detail later.
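To make the discussion concrete, here is a minimal Python sketch (illustrative only) that computes the remainder-method slots for the example items and shows the collision that item 44 would cause:

items = [54, 26, 93, 17, 77, 31]
table_size = 11

slots = {item: item % table_size for item in items}   # remainder-method hash
print(slots)                      # {54: 10, 26: 4, 93: 5, 17: 6, 77: 0, 31: 9}
print(len(slots) / table_size)    # load factor lambda = 6/11

# 44 hashes to the same slot as 77, producing a collision
print(44 % table_size, 77 % table_size)   # 0 0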
HASH FUNCTIONS:
Given a collection of items, a hash function that maps each item into a unique slot is referred to as a
perfect hash function.
Our goal is to create a hash function that minimizes the number of collisions, is easy to compute, and
evenly distributes the items in the hash table. There are a number of common ways to extend the simple
remainder method. We will consider a few of them here.
The folding method for constructing hash functions begins by dividing the item into equal-size pieces
(the last piece may not be of equal size). These pieces are then added together to give the resulting hash
value. For example, if our item was the phone number 436-555-4601, we would take the digits and divide
them into groups of 2 (43,65,55,46,01). After the addition, 43+65+55+46+01, we get 210. If we assume
our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the
remainder. In this case 210 % 11 is 1, so the phone number 436-555-4601 hashes to slot 1. Some folding
methods go one step further and reverse every other piece before the addition. For the above example, we
get 43+56+55+64+01=219 which gives 219 % 11=10.
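A minimal sketch of the folding method in Python, assuming two-digit groups as in the phone-number example (the function name and parameters are illustrative):

def fold_hash(number_string, table_size, group_size=2):
    digits = number_string.replace("-", "")
    # split the digits into equal-size pieces (the last piece may be shorter)
    pieces = [int(digits[i:i + group_size]) for i in range(0, len(digits), group_size)]
    return sum(pieces) % table_size

print(fold_hash("436-555-4601", 11))   # pieces 43+65+55+46+01 = 210, 210 % 11 = 1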
Another numerical technique for constructing a hash function is called the mid-square method. We first
square the item, and then extract some portion of the resulting digits. For example, if the item were 44,
we would first compute 44² = 1,936. By extracting the middle two digits, 93, and performing the remainder
step, we get 5 (93 % 11). Table 5 shows items under both the remainder method and the mid-square
method. You should verify that you understand how these values were computed.
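A minimal sketch of the mid-square method under the same assumptions (square the item, take the two digits around the middle of the result, then apply the remainder step); the helper name is illustrative:

def mid_square_hash(item, table_size):
    squared = str(item * item)                  # e.g. 44 -> "1936"
    mid = len(squared) // 2
    middle_two = int(squared[mid - 1:mid + 1])  # e.g. "93"
    return middle_two % table_size

print(mid_square_hash(44, 11))   # 93 % 11 = 5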
We can also create hash functions for character-based items such as strings. The word “cat” can be thought of as a sequence of ordinal values: ord('c') = 99, ord('a') = 97, ord('t') = 116.
We can then take these three ordinal values, add them up (99 + 97 + 116 = 312), and use the remainder method to get a hash value (312 % 11 = 4, see Figure 6). Listing 1 shows a function called hash that takes a string and a table size and returns the hash value in the range from 0 to tablesize-1.
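The listing itself is not reproduced here; a minimal sketch of such a function, assuming it simply sums the ordinal values as described above, might look like this:

def hash(a_string, table_size):
    # sum the ordinal values of the characters, then map into the table range
    the_sum = 0
    for ch in a_string:
        the_sum = the_sum + ord(ch)
    return the_sum % table_size

print(hash("cat", 11))   # 312 % 11 = 4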
It is interesting to note that when using this hash function, anagrams will always be given the same hash
value. To remedy this, we could use the position of the character as a weight. Figure 7 shows one possible
way to use the positional value as a weighting factor. The modification to the hash function is left as an
exercise.
You may be able to think of several additional ways to compute hash values for items in a collection. The
important thing to remember is that the hash function has to be efficient so that it does not become the
dominant part of the storage and search process. If the hash function is too complex, then it becomes more
work to compute the slot name than it would be to simply do a basic sequential or binary search as
described earlier. This would quickly defeat the purpose of hashing.
• An efficient hash function should be designed so that it distributes the index values of inserted
objects uniformly across the table.
• An efficient collision resolution algorithm should be designed so that it computes an alternative
index for a key whose hash index corresponds to a location previously inserted in the hash table.
• We must choose a hash function which can be calculated quickly, returns values within the range of locations in our table, and minimizes collisions. In other words, a good hash function should:
• Minimize collisions
• Be easy and quick to compute
• Distribute key values evenly in the hash table
• Use all the information provided in the key
• Have a high load factor for a given set of keys
LOAD FACTOR:
The load factor of a non-empty hash table is the number of items stored in the table divided by the size of the table. It is the decision parameter used when we want to rehash or expand the existing hash table. It also helps us determine the efficiency of the hash function; that is, it tells us whether the hash function is distributing the keys uniformly or not.
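For example, a minimal sketch of a load-factor check that triggers a resize; the threshold 0.75 and the resizing rule are assumptions for illustration, not taken from the text:

def load_factor(num_items, table_size):
    return num_items / table_size

num_items, table_size = 6, 11
if load_factor(num_items, table_size) > 0.75:   # assumed rehash threshold
    table_size = 2 * table_size + 1             # grow the table, e.g. to the next odd size
print(load_factor(num_items, table_size))       # 6/11 ~ 0.545, below the threshold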
COLLISIONS:
Ideally, a hash function would map each key to a distinct location, but in practice it is not possible to guarantee this, and the resulting problem is called a collision. A collision is the condition in which two different records hash to, and would be stored in, the same location.
The two main collision-resolution techniques are separate chaining and open addressing.
Separate Chaining:
Collision resolution by chaining combines a linked representation with the hash table. When two or more records hash to the same location, these records form a singly linked list called a chain.
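A minimal sketch of separate chaining in Python, using a list of buckets where each bucket holds the keys that hash to that slot (the names are illustrative, and Python lists stand in for the singly linked chains):

class ChainedHashTable:
    def __init__(self, size=11):
        self.size = size
        self.buckets = [[] for _ in range(size)]   # each bucket is a chain

    def insert(self, key):
        self.buckets[key % self.size].append(key)  # append to the chain for this slot

    def search(self, key):
        return key in self.buckets[key % self.size]

table = ChainedHashTable()
for k in (54, 26, 93, 17, 77, 31, 44):   # 44 collides with 77 in slot 0
    table.insert(k)
print(table.buckets[0])      # [77, 44]
print(table.search(44))      # True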
Open addressing:
In open addressing, all keys are stored in the hash table itself. This approach is also known as closed hashing. The procedure is based on probing: a collision is resolved by probing for another open slot.
Linear Probing:
The interval between probes is fixed at 1. In linear probing, we search the hash table sequentially, starting from the original hash location. If a location is occupied, we check the next location, wrapping around from the last table location to the first table location if necessary. The function for rehashing is the following:
rehash(pos) = (pos + 1) % tablesize
One of the problems with linear probing is that table items tend to cluster together in the hash table. This means that the table contains groups of consecutively occupied locations, which are called clusters. Clusters can get close to one another and merge into a larger cluster. Thus, one part of the table might be quite dense even though another part has relatively few items. Clustering causes long probe searches and therefore decreases the overall efficiency.
The next location to be probed is determined by the step-size, where step-sizes other than one are possible. The step-size should be relatively prime to the table size, i.e. their greatest common divisor should be equal to 1. If we choose the table size to be a prime number, then any step-size is relatively prime to the table size. Clustering cannot be avoided simply by using larger step-sizes.
Quadratic Probing:
The interval between probes grows with the probe number (the offsets are 1², 2², 3², ..., so the probed indices are described by a quadratic function). The problem of primary clustering can be eliminated by using the quadratic probing method.
In quadratic probing, we start from the original hash location i. If a location is occupied, we check the locations i+1², i+2², i+3², i+4², ..., wrapping around from the last table location to the first table location if necessary. The function for rehashing is the following:
rehash(key, k) = (hash(key) + k²) % tablesize
Double Hashing:
The interval between probes is computed by another hash function. Double hashing reduces clustering better than linear or quadratic probing. The increments for the probing sequence are computed with a second hash function, which should never evaluate to zero and should be different from the first hash function, so that all table locations can eventually be probed.
We first probe the location h1(key). If that location is occupied, we probe the locations h1(key) + h2(key), h1(key) + 2 * h2(key), ... (all taken modulo the table size).
The choice between linear probing and double hashing depends on the cost of computing the hash functions and on the load factor (the number of elements per slot) of the table. Both use few probes, but double hashing takes more time because it must compute two hash functions for each key.
Hashing Techniques:
There are two types of hashing techniques: static hashing and dynamic hashing
Static Hashing
If the data is fixed, then static hashing is useful. In static hashing, the set of keys is kept fixed and given in advance, and the number of primary pages in the directory is kept fixed.
Dynamic Hashing
If the data is not fixed, static hashing can give bad performance, so dynamic hashing is the alternative; in dynamic hashing the set of keys can change dynamically.
8 (a) Explain the steps involved in insertion and deletion into a singly and doubly linked list. (5) 2 2
(b) What are the benefits and limitations of a linked list? (4) 2 2
(OR)
9 (a) How are polynomial manipulations performed with lists? Explain the operations. (5) 2 2
(b) What are the applications of linked lists in dynamic storage management? (4) 2 2
12 (a) Construct an expression tree for the expression (a+b*c) + ((d*e+f)*g). Give the outputs when you apply inorder, preorder and postorder traversals. (5) 4 3
(b) Explain the steps for conversion of a general tree to a binary tree, with an example. (4) 4 1
(OR)
13 (a) Write a recursive algorithm for binary tree traversal with an example. (5) 4 3
(b) List out the steps involved in deleting a node from a binary search tree, with an example. (4) 4 3
14 (a) Define a graph and explain how graphs can be represented with an adjacency matrix and an adjacency list. (5) 5 1
(b) Briefly explain the various levels in a graph using examples. (4) 5 2
(OR)
15 (a) Explain the minimum spanning tree algorithms with an example. (5) 5 2
(b) Define indegree and outdegree of a graph with an example. (4) 5 1
******