

ADVANCED DATA STRUCTURES AND
ALGORITHMS

Editors:

- Prof. Sanjay Agal

www.xoffencerpublication.in

Copyright © 2023 Xoffencer

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis
or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive
use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the
provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must
always be obtained from the Publisher. Permissions for use may be obtained through Rights Link at the Copyright
Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

ISBN-13: 978-81-19534-88-3 (paperback)

Publication Date: 30 October 2023

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every
occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion
and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not
identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary
rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither
the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may
be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

MRP: 450/-

Published by:

Xoffencer International Publication

Behind shyamviharvatika, laxmi colony

Dabra, Gwalior, M.P. – 475110

Cover Page Designed by:

Satyam soni

Contact us:

Email: mr.xoffencer@gmail.com

Visit us: www.xofferncerpublication.in

Copyright © 2023 Xoffencer

Author Details

Prof. Sanjay Agal


Prof. Sanjay Agal is working as Principal at Dr. V. R. Godhania College of Engineering and Technology, Porbandar, Gujarat. He received his PhD degree in Computer Engineering from Pacific University, Udaipur in 2016. He has published 16 international research papers and holds two patents in artificial intelligence. He is a keen observer, planner and implementer with a track record of developing operational policies, norms, systems and controls, motivational schemes and education standards for professionals over the span of his career. Prof. Sanjay Agal is a renowned computer engineer and an esteemed author in the field of computer engineering and technology. With a lifelong passion for computers and technology, Prof. Agal has dedicated his career to advancing the field and imparting his vast knowledge to aspiring engineers.

Prof. Agal's journey in computer engineering began during his undergraduate studies, where he developed a deep fascination for the inner workings of computers and their transformative potential. Eager to explore the field further, he pursued a Master's degree in Computer Science, delving into topics such as programming languages, algorithms, and system design. Driven by his thirst for knowledge and a desire to make meaningful contributions, Dr. Agal went on to earn his Ph.D. in Computer Engineering. His doctoral research focused on developing innovative solutions for improving the performance and efficiency of computer systems. His groundbreaking work in the field earned him recognition and accolades from leading academic institutions and industry experts.

Throughout his career, Dr. Agal has held prominent positions in academia. As a professor of computer engineering, he has mentored and inspired countless students, sharing his expertise and guiding them in their pursuit of knowledge. His teaching style is known for its clarity, precision, and ability to distil complex concepts into easily understandable principles. In addition to his academic endeavours, Dr. Agal has collaborated with industry leaders to implement cutting-edge technologies and address real-world challenges. His hands-on experience in the field has provided him with valuable insights into the practical applications of computer engineering principles.

Preface

The text has been written in simple language and style, in a well-organized and systematic way, and the utmost care has been taken to cover the entire prescribed material for science students.

We express our sincere gratitude to the authors, not only for their effort in preparing the material for the present volume, but also for their patience in waiting to see their work in print. Finally, we are also thankful to our publishers, Xoffencer Publishers, Gwalior, Madhya Pradesh, for taking all the efforts in bringing out this volume in a short span of time.

Contents
Chapter No. Chapter Names Page No.
Chapter 1 Introduction 1-12

Chapter 2 Data Structure 13-44

Chapter 3 Linear Data Structure 45-91

Chapter 4 Nonlinear Data Structure 92-119

Chapter 5 Sorting And Searching 120-162

Chapter 6 Hashing And File Structures 163-192

CHAPTER 1

INTRODUCTION

1.1 ALGORITHM SPECIFICATIONS

One way to think of an algorithm is as a finite set of instructions that, when carried out in the proper sequence, accomplish a particular task. Any algorithm must satisfy the following criteria. Input: an algorithm may take zero or more inputs, drawn from a specified set of objects. Output: an algorithm produces one or more outputs, each of which is related to its inputs. Definiteness: every step must have a clearly stated purpose, and every instruction must be unambiguous. Finiteness: the algorithm must terminate after a fixed, finite number of steps. Effectiveness: every operation to be performed must be basic enough that it can be carried out exactly and within a finite amount of time. An algorithm may be expressed in a variety of different ways.

• Natural language: the algorithm can be described in a natural language such as English.
• Flowchart: a graphical representation of the algorithm; flowcharts are best reserved for techniques that are simple and uncomplicated.
• Pseudo code: pseudo code avoids most of the ambiguity of natural language, while its syntax is not bound by the strict rules of any particular programming language.

The following is an illustration of one possible method for calculating the factorial
value of an integer.
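The original listing is not reproduced in this extract; the following is a minimal sketch of one such method, written here in Python for concreteness:

def factorial(n):
    # Multiply 1 * 2 * ... * n iteratively.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial(5))   # prints 120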

1.1.1 Recursive Algorithms

Recursive algorithms work by invoking themselves: the value returned by one call is often used again as a parameter of the next recursive call. That parameter is treated as the input, while the value returned by the function is treated as the output.

A recursive algorithm is an approach to problem-solving that reduces a difficult problem by breaking it into smaller sub-problems of the same form. The result of one round of recursion serves as the input for the next round, and the recursion proceeds until a base case is reached. The algorithm is run with progressively smaller values of its input, and the overall result is obtained by combining the results of these smaller instances. Computing factorials and the Fibonacci number sequence are two classic examples of recursive algorithms.
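As a hedged sketch (the book's own listings are not reproduced here), both examples can be written recursively in Python as follows; the base cases are what guarantee termination:

def factorial(n):
    if n <= 1:                        # base case: 0! = 1! = 1
        return 1
    return n * factorial(n - 1)       # reuse the result of the smaller sub-problem

def fibonacci(n):
    if n < 2:                         # base cases: fib(0) = 0, fib(1) = 1
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(factorial(5))    # 120
print(fibonacci(7))    # 13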

1.2 PERFORMANCE ANALYSIS AND MEASUREMENT

1. Big-O Notation

The Big-O notation is used to estimate the time complexity of an algorithm in the worst case. It states that the set of functions grows no faster than the given expression: it may grow more slowly or at the same rate. In other words, it describes the maximum amount of time an algorithm can require over all possible inputs.

2. Omega Notation

The Omega notation describes the best-case time complexity of an algorithm. It states that the set of functions grows at least as fast as the given expression: more quickly or at the same rate. In other words, it describes the minimum amount of time an algorithm requires over all possible inputs.

3. Theta Notation

The Theta notation is used when the set of functions lies in both O(expression) and Omega(expression), since it bounds the growth rate from above and below. It is the notation conventionally used when describing the average-case time complexity of an algorithm.
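For reference, the three notations can be stated formally as follows (standard definitions, not taken verbatim from this book):

f(n) = O(g(n))      \iff \exists\, c > 0,\ n_0 : f(n) \le c \cdot g(n) \ \text{for all } n \ge n_0
f(n) = \Omega(g(n)) \iff \exists\, c > 0,\ n_0 : f(n) \ge c \cdot g(n) \ \text{for all } n \ge n_0
f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and} \ f(n) = \Omega(g(n))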

1.2.1 Measurement of Complexity of an Algorithm

There are three distinct approaches to analyzing algorithms, and each of these
approaches is based on a different one of the temporal complexity notations shown
earlier:

1. Worst Case Analysis (Mostly used)

In worst-case analysis we determine an upper bound on the running time of a program. We must identify the case that causes the maximum number of operations to be executed. For linear search, the worst case occurs when the element to be searched (x) is not present in the array at all.

When x is not present, the search() function compares x with every element of arr[], one by one, starting from the first element. The worst-case time complexity of linear search is therefore O(n), since all n comparisons are performed.

2. Best Case Analysis (Very Rarely used)

In best-case analysis we determine a lower bound on the running time of an algorithm. We must identify the case that causes the minimum number of operations to be executed. For linear search, the best case occurs when x is located at the very first position. The number of operations in this case does not depend on n, so the best-case time complexity is O(1).

3. Average Case Analysis (Rarely used)

In average-case analysis we consider all possible inputs and compute the running time for each of them. We then sum all of these values and divide the total by the number of inputs. To do this we must know, or be able to predict, the distribution of cases.

For the linear-search problem, assume that all cases are uniformly distributed, including the case in which x is not present in the array. We therefore sum the costs of all the cases and divide that sum by (n+1); the resulting quantity is the average-case time complexity.
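Under the uniform-distribution assumption described above, the calculation for linear search works out as follows (a standard derivation, sketched here for completeness):

\text{Average cost} = \frac{\sum_{i=1}^{n+1} \theta(i)}{n+1} = \frac{\theta\big((n+1)(n+2)/2\big)}{n+1} = \theta(n)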

1.2.2 Which Complexity analysis is generally used?

The three kinds of complexity analysis are discussed below, in order of how often they are used in practice:

1. Worst Case Analysis:

When we are doing research on algorithms, the vast majority of the time, we look at
the worst-case scenarios. In the most pessimistic analysis, we are able to guarantee an
upper limit on the amount of time it takes an algorithm to do its work, which is useful
information. In other words, we can predict how long it will take an algorithm to finish
its job.

2. Average Case Analysis

The average-case analysis is not used as often, because it is difficult to apply in most real-world settings. To analyse the average case we must know the mathematical distribution of all possible inputs, or at least be able to make an educated guess about it.

3. Best Case Analysis

Best-case analysis is of little practical value. Guaranteeing a lower bound on an algorithm's running time provides no useful information, because in the worst case the same algorithm may still take years to finish.

1.2.3 Interesting information about asymptotic notations:

A) For some algorithms, all the cases -- the worst case, the best case, and the average case -- are asymptotically the same. In other words, there is no separately distinguishable "best case" or "worst case".

 Example: the number of operations performed by Merge Sort is Θ(n log(n)) in every case.

B) Most other sorting algorithms, on the other hand, have asymptotically different worst-case and best-case behaviour.

• Example: in a common implementation of Quick Sort (one in which a corner element is picked as the pivot), the worst case occurs when the input array is already sorted, and the best case occurs when the pivot always splits the array into two halves.
• Example: for Insertion Sort, the worst case occurs when the array is sorted in the reverse order, and the best case occurs when the array is already sorted in the same order as the output.
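The linear-search listing that produces the output quoted below is not reproduced in this extract; a minimal Python sketch, assuming the array [10, 20, 30, 40] and the search key 30, would be:

def search(arr, x):
    # Compare x with each element in turn; return its index, or -1 if absent.
    for i in range(len(arr)):
        if arr[i] == x:
            return i
    return -1

arr = [10, 20, 30, 40]
x = 30
result = search(arr, x)
if result != -1:
    print(x, "is present at index", result)
else:
    print(x, "is not present in the list")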

Output: 30 is present at index 2

1.2.4 Time Complexity Analysis: (In Big-O notation)

 Best Case: O (1), This will take place if the element to be searched is on the
first index of the given list. So, the number of comparisons, in this case, is 1.

 Average Case: O(n), This will take place if the element to be searched is on
the middle index of the given list.

Worst Case: O(n), This will take place if:

• The element to be searched is on the last index


• The element to be searched is not present on the list

2. In this example, we will take an array of length (n) and deal with the following
cases:

• If (n) is even then our output will be 0


• If (n) is odd then our output will be the sum of the elements of the array.

Below is the implementation of the given problem:
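The original listing is not reproduced here; a minimal Python sketch of the problem, assuming for illustration the odd-length array [1, 2, 3, 4, 5], is:

def process(arr):
    n = len(arr)
    if n % 2 == 0:        # even length: constant amount of work
        return 0
    return sum(arr)       # odd length: one pass over the whole array

print(process([1, 2, 3, 4, 5]))   # odd length, so the sum 15 is printed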

Output

15

1.2.5 Time Complexity Analysis:

 Best Case: The order of growth will be constant because in the best case we are
assuming that (n) is even.

 Average Case: In this case, we will assume that even and odd are equally
likely, therefore Order of growth will be linear

 Worst Case: The order of growth will be linear because in this case, we are
assuming that (n) is always odd.

Worst-case, average-case, and best-case analysis of algorithms is one strategy for studying the efficiency of algorithms in a variety of situations. The advantages, limitations, key points, and reference material connected with this method of analysis are listed below:

1.2.6 Advantages:

1. Software developers are able to obtain an understanding of the performance of
algorithms in a number of scenarios by using this approach, which may help
them make well-informed choices regarding the algorithm that should be
utilized for a certain task.
2. The study of the worst-case scenario provides a guarantee on the maximum limit
of the running time of an algorithm. This assurance may be helpful in the
process of creating algorithms that are trustworthy and efficient.
3. When applied to scenarios that occur in the real world, the findings of an
average case study provide a more realistic estimate of the amount of time it
takes for an algorithm to accomplish its work. This is knowledge that may be
beneficial when applying the algorithm in question to those situations.

1.2.7 Disadvantages:

1. This technique might be time-consuming, and it also requires a comprehensive


understanding of the algorithm that is being researched in order to be successful.
2. When applied to circumstances that occur in the real world, worst-case analysis
could be a disadvantage due to the fact that it does not provide any information
on the typical amount of time it takes for an algorithm to complete its tasks.
3. In order to do an analysis of the average case, you need to have a working
understanding of the probability distribution of the input data, which is not
something that is usually easily available.

1.2.8 Important points:

1. The worst-case analysis of an algorithm establishes a maximum limit on the


amount of time it will take for the algorithm to do the job it was given,
regardless of the amount of data that was initially fed into the method.
2. Using an algorithm's average case analysis, one may gain an estimate of the
amount of time it takes for an algorithm to finish its work when given a random
input. This time estimate can be used to help optimize the algorithm.
3. The best-case scenario analysis of an algorithm places a lower restriction on the
amount of time the algorithm requires to do its work. This is true regardless of
the quantity of the data that is being fed into the algorithm.
4. The big-O notation is generally used whenever there is an effort made to determine
the longest possible execution time of an algorithm.
5. It is possible that different algorithms will have differing best-case, average-
case, and worst-case running times for their execution.

1.2.9 Time Complexity and Space Complexity

In computer science there is rarely just one technique for solving a problem; usually there are several different algorithms from which to choose. It is therefore essential to have a method for comparing the solutions and establishing which one is preferable in terms of optimality. Such a method should have the following properties:

• Independent of the machine, and of the configuration on which the algorithm runs.
• Shows a direct correlation with the size of the input.
• Can distinguish two algorithms clearly, without ambiguity.

Two such measures are used: time complexity and space complexity. The time complexity of an algorithm is a measure of the total amount of time the algorithm takes to run, expressed as a function of the length of the data it processes. It is essential to keep in mind that this is a function of the length of the input, and not the actual execution time on the particular machine on which the program is run.

When an algorithm is executed, it requires a finite amount of time to complete its task. The time complexity of an algorithm is defined as the amount of time that elapses before the algorithm solves a given problem, and it is an enormously useful metric when evaluating algorithms. To estimate the execution time accurately, it is vital to consider both the cost of each fundamental instruction and the number of times that instruction is executed. An example involving the addition of two scalar variables is worked through below.
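The original listing is not shown in this extract; a one-line Python sketch of the idea is:

def add(a, b):
    # A single addition: the work does not depend on any input size, so T(n) = O(1).
    return a + b

print(add(2, 3))   # 5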

Adding two scalar values requires only a single addition operation, so the time complexity of this technique is constant: T(n) = O(1). To calculate the time complexity of an algorithm, we assume that each individual operation takes the same constant amount of time, denoted c. The total number of operations required to process an input of length N is then counted. Consider the following example to see how the calculation works: suppose the task is to determine whether a pair of values (X, Y) whose sum is Z can be found in an array of N entries. The simplest approach is to examine every possible pair of elements and check whether it satisfies the condition. Written as pseudo-code, the approach looks like this:
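The book's pseudo-code is not reproduced here; a Python sketch of the brute-force approach, with hypothetical names, is:

def has_pair_with_sum(arr, z):
    n = len(arr)
    for i in range(n):              # outer loop runs N times
        for j in range(n):          # inner loop runs N times for each i
            if i != j and arr[i] + arr[j] == z:
                return True         # found a pair whose sum is Z
    return False                    # worst case: no such pair, all N*N checks made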

Assume, as before, that each machine operation takes roughly the same constant amount of time c. The total number of operations executed depends on the value of Z, because the search can stop early once a matching pair is found. Analyses of an algorithm almost always consider the worst case, which here is the situation in which there is not a single pair of elements whose combined value equals Z. In the worst case,

 N*c operations are required for input.


 The outer loop i loop runs N times.
 For each i, the inner loop j loop runs N times.
As a result, the total execution time is N*c + N*N*c + c. We now drop the lower-order terms, since they are insignificant for large inputs, and keep only the highest-order term (without its constant factor), which in this case is N*N. The limiting behaviour of a function can be expressed with several different notations; because the worst case is being considered here, the big-O notation is used to express the time complexity of this problem.
The time complexity of the procedure described above is therefore written O(N^2). It is vital to bear in mind that the time complexity depends entirely on the number of items in the array, that is, on the input length; as the amount of data in the array grows, the execution time grows with it. The order of growth describes this relationship between the length of the input and the time required to finish the task: here the running time is proportional to the square of the size of the array.
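The snippet the author analyses next is not reproduced in this extract; the analysis below fits a nested loop of the following shape (a hedged reconstruction):

def count_operations(N):
    count = 0
    i = N
    while i > 0:                # outer loop: i takes the values N, N/2, N/4, ..., 1
        for j in range(i):      # inner loop runs i times
            count += 1
        i //= 2
    return count

print(count_operations(16))     # 16 + 8 + 4 + 2 + 1 = 31, roughly 2 * N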

The order of growth helps to simplify the estimate of the total running time. The nested loop above is a trickier case. At first glance the complexity appears to be O(N * log N) -- N for the inner j loop and log(N) for the outer i loop -- but that estimate is wrong. Why? Consider the total number of times the count++ statement will be executed.

 When i = N, it will run N times.


 When i = N / 2, it will run N / 2 times.
 When i = N / 4, it will run N / 4 times.
 And so on.
count++ is therefore executed a total of N + N/2 + N/4 + ... + 1 times, which is approximately 2N. The amount of time required is therefore O(N). In competitive programming, the following table lists some common time complexities along with the input sizes for which they are usually acceptable:

Input Length   Worst Accepted Time Complexity   Usual type of solutions
10-12          O(N!)                            Recursion and backtracking
15-18          O(2^N * N)                       Recursion, backtracking, and bit manipulation
18-22          O(2^N * N)                       Recursion, backtracking, and bit manipulation
30-40          O(2^(N/2) * N)                   Meet in the middle, Divide and Conquer
100            O(N^4)                           Dynamic programming, Constructive
400            O(N^3)                           Dynamic programming, Constructive
2K             O(N^2 * log N)                   Dynamic programming, Binary Search, Sorting, Divide and Conquer
10K            O(N^2)                           Dynamic programming, Graph, Trees, Constructive
1M             O(N * log N)                     Sorting, Binary Search, Divide and Conquer
100M           O(N), O(log N), O(1)             Constructive, Mathematical, Greedy Algorithms

1.2.10 Space Complexity:

The use of a computer to solve issues calls for an adequate amount of memory, which
may be used to store either temporary data or the final solution while the program is
being executed. The amount of memory that is required by an algorithm in order for it
to properly solve a certain problem is referred to as the algorithm's space complexity.
The space complexity of a method is determined by the length of the input and is equal
to the amount of space that the algorithm uses while it is operating.

This number is referred to as the space complexity of the method. Consider the
following as an illustration: Imagine that there is a situation in which you need to figure
out the frequency of the elements in the array. It is a measure of the amount of memory
that must be accessible in order for an algorithm to be completed successfully.
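For the frequency example just mentioned, a minimal Python sketch makes the extra memory visible; the dictionary below is the auxiliary space the algorithm needs:

def element_frequencies(arr):
    freq = {}                        # auxiliary storage: up to O(n) extra space
    for x in arr:                    # a single pass over the input
        freq[x] = freq.get(x, 0) + 1
    return freq

print(element_frequencies([2, 3, 2, 5, 3, 2]))   # {2: 3, 3: 2, 5: 1}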

To be able to offer an accurate prediction of the amount of memory that is needed, we


need to focus on two aspects:

1. A fixed part that does not depend on the size of the input. It includes the memory
needed for instructions (the code), constants, simple variables, and other such items.
2. A variable part whose size depends on the amount of input data. It includes memory
for the recursion stack, memory for referenced variables, and so on.

CHAPTER 2

DATA STRUCTURE

The area of computer science encompasses a wide variety of specialized subfields, one of which is the study of data structures. Studying data structures gives us an understanding of how data is organized and how data flow is managed, both of which are essential for improving the effectiveness of any process or program. The phrase "data structure" refers to a particular way of storing and organizing data in the memory of a computer, with the objective of making the data easier to access and more effective to use whenever it is required at some point in the future.

A data structure can be thought of as a logical or mathematical model for a particular arrangement of data; the notion is generally credited to the computer scientist Donald Knuth. Such a model is one of many means by which data may be processed, and depending on how the model is used the data may be handled in a number of different ways, so when it comes to managing one's data there are many options from which to pick.

When determining the level of coverage that a certain data model provides, two distinct
aspects are taken into account. Both of these aspects are equally important:

1. First, the structure must contain enough information to correctly reflect the definite
relationship of the data with a real-world object. To achieve this, all of the necessary
information must be present in the structure, which may require gathering information
from a wide range of sources.
2. Second, the structure should be straightforward and simple enough that it can always
be modified, if that turns out to be necessary, for the efficient processing of the data.
In other words, the structure should be kept as simple as the task allows; meeting this
second condition is essential for the structure to be a good candidate for the job.
3. The term "data structure" covers a large variety of objects, such as arrays, linked
lists, stacks, queues, trees, and many other structures. Data structures have applications
in almost every subfield of computer science, including the development of operating
systems and compilers, as well as computer graphics, artificial intelligence, and a great
many other fields.

The vast majority of algorithms used in computer science rely on data structures as an essential component, because data structures make it possible for programmers to organize and manage data efficiently. The performance of a program or piece of software depends heavily on its ability to store and retrieve the user's data as quickly as possible: the faster the software can do these things, the better its performance will be.

The effective selection of a data structure opens the way for the efficient performance
of a broad variety of actions that are crucial to the operation of an organization. When
determining whether or not a data structure is efficient, two aspects are taken into
account: the length of time required to process the structure, as well as the amount of
memory space required to store the structure. A data structure that is both effective and
efficient will meet both of these requirements while returning the lowest possible value.
A data structure may be put to use for more than simply arranging the data; it can also
be used for a range of additional reasons. One of these applications is to store a variety
of different types of information. In addition, it plays a role in the operations of data
processing, data retrieval, and data storage, in that order.

When it comes to classifying various kinds of data structures, one may choose from a
wide variety of distinct categories. These are broken down into three categories: simple,
intermediate, and complicated data structures. These categories are used in some way
by almost every program and software system that has ever been developed. Because
of this, we need to have an in-depth knowledge of the various data structures. This is a
direct result of the previous point.

 Need of Data Structure:

The structure of the algorithm and the arrangement of the data need to be thought about
in connection to one another before either can be considered. In order to promote
efficient execution of the action by both the developer and the end user, it is vital that
the display of data be clear and easy. This is a prerequisite for effective execution. The
processes of storing, retrieving, managing, and organizing data may all be made easier
with the assistance of various data structures. The following list provides further
information on the data requirements.

• Data structure modification is easy.


• It requires less time.
• Save storage memory space.
• Data representation is easy.
• Easy access to the large database

Classification/Types of Data Structures:

1. Linear Data Structure


2. Non-Linear Data Structure.

Linear Data Structure:

• Elements are arranged in one dimension, also known as linear dimension.


• Example: lists, stack, queue, etc.

Non-Linear Data Structure

 Elements are arranged in one-many, many-one and many-many dimensions.


 Example: tree, graph, table, etc.
Most Popular Data Structures:

1. Array:

An array is a collection of data elements stored in contiguous memory locations. The idea is to keep a number of items of the same type together in one place. This makes it easy to compute the location of each element by simply adding an offset to a base value, namely the memory address of the first element of the array (often represented by the name of the array), so the position of any element can be determined quickly.
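A short Python sketch of constant-time indexing; the address arithmetic is handled by the language, but the principle is the one described above:

arr = [10, 20, 30, 40, 50]
# Conceptually, element i lives at base_address + i * element_size,
# which is why reading arr[i] takes the same time for every i.
print(arr[0])   # 10
print(arr[3])   # 40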

2. Linked Lists:

A linked list is a linear data structure. Unlike an array, a linked list does not store its elements in one contiguous block of memory; instead, pointers are used to join the elements of the list together, whereas an array keeps its elements in a single continuous area.
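A minimal singly linked list sketch in Python, illustrating how pointers (references) join the elements together:

class Node:
    # One node of a singly linked list.
    def __init__(self, data):
        self.data = data     # the stored value
        self.next = None     # reference to the next node (None marks the end)

# Build the list 1 -> 2 -> 3 and traverse it by following the references.
head = Node(1)
head.next = Node(2)
head.next.next = Node(3)

node = head
while node is not None:
    print(node.data)
    node = node.next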

3. Stack:

A stack is a linear data structure that performs its operations in a specific order. The order may be described as FILO, "first in, last out", or equivalently LIFO, "last in, first out". In a stack, elements can only be added and removed at one end of the list, called the top.

Processing Carried Out on Stacks:

 push (): This operation adds a new element on top of the stack.

 pop (): An item from the top of the stack is removed whenever this operation is
performed, and the item that was removed is the one that is returned.

 top (): This action will return the element that was most recently inserted, which
will keep it in its current place at the top without removing it.

 Empty (): This operation will disclose whether there is something currently on
top of the stack or not.

 Size (): This operation returns the size of the stack, that is, the total number of
elements currently contained within the stack.
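A minimal Python sketch of these operations, using a list as the underlying storage:

class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):         # add a new element on top
        self._items.append(item)

    def pop(self):                # remove and return the top element
        return self._items.pop()

    def top(self):                # read the top element without removing it
        return self._items[-1]

    def empty(self):              # True if nothing is on the stack
        return len(self._items) == 0

    def size(self):               # number of elements currently stored
        return len(self._items)

s = Stack()
s.push(10); s.push(20); s.push(30)
print(s.pop())   # 30  (last in, first out)
print(s.top())   # 20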

4. Queue:

Like the stack, the queue is a linear structure whose operations are performed in a particular order; in this case the order is FIFO, "first in, first out". Items are added at one end of the queue (the rear) and removed from the other end (the front). A helpful illustration of a queue is a line of people waiting for a resource, where the person who arrived first is served first. The difference between queues and stacks lies in how items are removed: in a stack we remove the item that was added most recently, while in a queue we remove the item that was added least recently.

Activities pertaining to the Queue:

 Enqueue (): Adds (or stores) an element to the end of the queue.

 Dequeue (): Removal of elements from the queue.

 Peek () or front (): Acquires the data element available at the front node of the
queue without deleting it.

 Rear (): This operation returns the element at the rear end without removing it.

 Full (): Validates if the queue is full.

 Null (): Checks if the queue is empty.
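A minimal Python sketch of these operations, using collections.deque as the underlying storage:

from collections import deque

class Queue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, item):      # add at the rear
        self._items.append(item)

    def dequeue(self):            # remove from the front
        return self._items.popleft()

    def front(self):              # read the front element without removing it
        return self._items[0]

    def rear(self):               # read the rear element without removing it
        return self._items[-1]

    def empty(self):              # True if the queue holds no elements
        return len(self._items) == 0

q = Queue()
q.enqueue('a'); q.enqueue('b'); q.enqueue('c')
print(q.dequeue())   # 'a'  (first in, first out)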

5. Binary Tree:

In contrast to other popular data structures such as arrays, linked lists, stacks, and queues, which are linear, trees are hierarchical in nature. In a binary tree, each node has at most two children, referred to as the left child and the right child of that node.

A binary tree is a tree-based data structure in which links between nodes play an essential role in the implementation. A binary tree is represented by a pointer to the node at the very top of the tree, the root; if the tree has no nodes, the value of root is NULL. Each node of a binary tree contains a data element together with pointers to its left and right children.
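A minimal Python sketch of such a node and of a three-node tree:

class TreeNode:
    # One node of a binary tree: a data field plus left/right child pointers.
    def __init__(self, data):
        self.data = data
        self.left = None      # left child, or None
        self.right = None     # right child, or None

root = TreeNode(1)        # the pointer to the topmost node
root.left = TreeNode(2)
root.right = TreeNode(3)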

1. Binary Search Tree:

A Binary Search Tree is an extension of the data structure known as a Binary Tree. It
features the following additional attributes in addition to those already present in the
Binary Tree:

• The left subtree of the root node contains only keys whose values are less than the
key of the root node itself.
• The right subtree of the root node contains only keys whose values are greater than
the key of the root node itself.
• The tree contains no duplicate keys.

A binary tree is said to be a "binary search tree" (BST) if it satisfies the criteria listed above.
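Using the TreeNode class sketched above, a search that follows these rules can be written as:

def bst_search(node, key):
    # Smaller keys live in the left subtree, larger keys in the right subtree.
    if node is None or node.data == key:
        return node
    if key < node.data:
        return bst_search(node.left, key)
    return bst_search(node.right, key)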

2. Heap:

A heap is a special tree-based data structure in which the tree is a complete binary tree. In general there are two distinct kinds of heap:

• Max-Heap: the key stored at the root node must be greater than or equal to the keys
stored at all of its children, and, recursively, the same property must hold for every
subtree of that binary tree.
• Min-Heap: the key stored at the root node must be less than or equal to the keys
stored at all of its children, and, recursively, the same property must hold for every
subtree of that binary tree.
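Python's standard heapq module maintains a min-heap over a list; a tiny sketch of the property in action:

import heapq

data = [5, 1, 9, 3]
heapq.heapify(data)           # rearrange the list so the smallest key sits at the root
print(data[0])                # 1  (the root of a min-heap is the minimum)
heapq.heappush(data, 0)
print(heapq.heappop(data))    # 0  (pop always removes the current minimum)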

8. Hashing Data Structure:

A hash table is a useful data structure built around a particular operation called the hash function, which maps a given key to a position in the table so that the associated information can be accessed more quickly. The efficiency of the mapping depends directly on the efficiency of the hash function used. Suppose, for example, that a hash function H(x) maps a value x to the index x % 10 in an array. Then the list of values [11, 12, 13, 14, 15] will be stored in the array (the hash table) at positions 1, 2, 3, 4, and 5 respectively.
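A direct Python sketch of the example above, with H(x) = x % 10:

def hash_function(x):
    return x % 10                 # H(x) maps a value to one of 10 slots

table = [None] * 10
for value in [11, 12, 13, 14, 15]:
    table[hash_function(value)] = value   # 11 -> slot 1, 12 -> slot 2, ...

print(table)   # [None, 11, 12, 13, 14, 15, None, None, None, None]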

9. Matrix:

A matrix is a collection of numbers arranged in rows and columns in a particular order. The elements of a matrix are enclosed in parentheses or brackets when the matrix is written out. For example, a matrix with 3 rows and 3 columns contains 9 separate elements.

10. Trie:

A trie is a data structure that makes it possible to retrieve information efficiently. Using a trie, the complexity of a search can be brought down to an optimal limit, namely the length of the key. If the keys were instead stored in a binary search tree, a well-balanced BST would need time proportional to M * log N, where M is the maximum string length and N is the number of keys in the tree. With a trie, the key can be searched in O(M) time. The trade-off is the amount of storage space the trie requires.
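A minimal Python sketch of a trie with insert and O(M) search (illustrative only; the book's own implementation is not reproduced here):

class TrieNode:
    def __init__(self):
        self.children = {}     # maps a character to the next node
        self.is_end = False    # True if a stored key ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, key):     # O(M), where M is the length of the key
        node = self.root
        for ch in key:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end

t = Trie()
t.insert("tree")
t.insert("trie")
print(t.search("trie"))   # True
print(t.search("tri"))    # False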

2.1 DATA MANAGEMENT CONCEPTS

A data definition describes a particular piece of data in terms of the following qualities.
 Atomic: the definition should describe a single concept or principle.
 Traceable: the definition should be able to be mapped to at least one data element.
 Accurate: the definition should not be open to interpretation.
 Clear and concise: the definition should be straightforward and easy to understand.

Data Type

A data type is a way of classifying the various kinds of data, such as integers, strings, and so on. It determines both the values that can be used with each type of data and the operations that can be performed on it. There are two distinct kinds of data type:

 Built-in Data Type


 Derived Data Type

Built-in Data Type

Built-in data types are the data types for which a language already offers direct support. For example, the majority of programming languages include the following data types built in:

 Integers
 Boolean (true, false)
 Floating (Decimal numbers)
 Character and Strings

Derived Data Type

Derived data types are implementation-independent: they can be implemented in more than one way, and either approach is valid. They are normally built by combining the primary (built-in) data types with the operations associated with them. Examples include the following:

 List
 Array
 Stack
 Queue

Basic Operations

The data stored in a data structure is processed by means of certain operations, and the particular data structure chosen depends to a considerable extent on how frequently each of these operations needs to be performed on it.

 Traversing
 Searching
 Insertion
 Deletion
 Sorting
 Merging

2.1.1 Basic Terminology

Data structures are among the most essential components of any piece of software or computer program, and one of the most challenging problems a programmer faces is deciding which data structure would be most effective for a particular application. The following is a glossary of fundamental terminology used whenever data structures are discussed:

1. Data: data can be viewed either as a single fundamental value or as a collection
of distinct values. For example, the data connected with an employee would
include the individual's name and ID number, stored in the employee's
personnel file.
2. Data Item: the phrase "data item" refers to a single unit of value.
3. Group Items: group items are a special kind of data item that can contain other
data items as subordinates. For example, an employee's name may be composed
of a first name, a middle name, and a last name.
4. Elementary Items: data items are considered elementary when they cannot be
divided into further items. Consider, for example, the identity card number
carried by an employee.
5. Entity and Attribute: In computer science, an Entity is a representation of a
certain class of objects, while an Attribute is a way to describe that class. It is
composed of a wide range of qualities or aspects in their various forms. One and
only one Attribute can be used to describe the specific trait that an Entity holds.
Take for instance the following:

Attributes:   ID     Name             Gender   Job Title
Values:       1234   Stacey M. Hill   Female   Software Developer

To create an entity set, you need entities that have attributes in common with one
another. A collection of all the various possible values that might be assigned to a
certain attribute is referred to as a range of values, and it is connected to each attribute
that constitutes an entity set. An entity set can be thought of as a collection of attributes.
Sometimes, the term "information" will be used when referring to data that has been
given the attributes of meaningful data or processed data. This is because these data
have been given the qualities of "information."

1. Field: a field is a single unit of information that represents one attribute of an
entity. More complex information can be created by grouping fields together.
2. Record: a record is a collection of related data items kept together in a single
place for storage. For example, the record for an employee entity may be created
by compiling the individual's name, ID, address, and job title into a single set.
3. File: a file is a collection of separate records that all belong to the same kind of
entity. For example, if there are one hundred employees, the related file will
contain one hundred records, each holding the information about one employee.

One possible definition of a data structure is "basically a group of data elements that
are put together under one name." This is an explanation of a specific method for

storing and organizing data in a computer in such a way that it may be accessed and
utilized efficiently. The majority of programs and pieces of software cannot function
properly without the utilization of data structures. Examples of common data structures
include the array, the linked list, the stack, the queue, the tree, and several other graphs.

The following is a list of the fields that make extensive use of data structures.

 Compiler Design
 Operating System
 DBMS
 Artificial Intelligence
 Graphics
 Simulation
 Numerical Analysis Etc.

2.1.2 Classification of data structures

 Primitive data structures

Primitive data structures are the most fundamental data types that a programming language can handle directly, which is where their name comes from. Examples of these fundamental data types are integers, characters, Booleans, pointers, floats, doubles, and reals.

 Non-primitive data structures

Non-primitive data structures are those that can be built with the help of the basic (primitive) data types. Arrays, linked lists, and trees are all examples of non-primitive data structures.

 Linear data structures

When the components of a data structure are organized so that they can be read and written in a linear sequence, the arrangement is called a linear data structure. Arrays, stacks, queues, and linked lists are all examples of linear data structures. Linear data structures are stored in sequential memory locations, which preserves the linear link between the different components of the structure.

 Non-linear data structures

The term "non-linear data structure" refers to a data structure whose constituent parts are not stored in a sequential order; graphs and trees are the typical examples. The memory used to store non-linear data structures need not be contiguous, and elements may be placed at arbitrary locations.

2.1.3 Need for Data Structures

As applications grow more complex and the volume of data keeps increasing, problems can arise with data searching, processing speed, the handling of multiple simultaneous requests, and a great deal more. Data structures provide a wide variety of efficient techniques for organizing, managing, and storing data, and with their support we can traverse the data items very quickly. Using data structures can improve the efficiency, reusability, and abstraction of the underlying data. Why, then, is it essential for us to learn about data structures?

1. Data structures and algorithm design are typically considered to be two of the
most essential aspects in computer science.
2. While data structures allow us to organize and store data, algorithms allow us
to process data in a meaningful way by making use of the data that we have
already saved. Both of these capabilities are essential to our work.
3. When it comes to programming, our level of expertise will directly correlate to
the amount of time we spend studying data structures and algorithms.
4. To a larger extent, we will be able to design code that is both effective and
dependable in its operation.
5. In addition to this, we will be able to address issues in a manner that is both
more prompt and more effective.

Understanding the Objectives of Data Structures:

Data structures are able to accomplish two aims that are mutually supportive of one
another:

1. Accuracy: Data structures are built to operate correctly for all different kinds
of inputs, based on the area of interest that is being examined. This is done so
that the data may be as accurate as possible. To put it another way, achieving
accuracy is the primary objective of data structure, and whether or not this
objective is met depends on the questions that the data structure is designed to
answer.

2. The design of a data structure also requires another attribute, and that quality is
efficiency. It is important that the data be processed quickly while utilizing as
few of the available computer resources, such as memory space, as possible.
The efficiency of a data structure is one of the most essential aspects that will
determine the success or failure of a process while it is being carried out in a
real-time setting.

Understanding some Key Features of Data Structures

The following is a list of some of the essential qualities of data structures:

1. Robustness: In general, the objective of all computer programmers is to design


software that, in addition to running efficiently on all different kinds of
hardware platforms, also generates the right result for every possible input.
Robustness refers to the ability of the software to withstand a wide range of
inputs and still deliver the expected output. This form of trustworthy software
has to be able to handle both accurate and wrong inputs in order to function
properly.

2. Capability to Adapt: The process of developing software programs such as


web browsers, word processors, and internet search engines includes the
building of massive software systems that, in order to be run, need to work
correctly and effectively for a number of years. Additionally, the development
of software can be attributed to the appearance of new technologies or the
continually evolving conditions of the market.

3. Along with flexibility, reusability is one of those qualities that naturally occurs
in conjunction with a variety of other characteristics. It is a well-known fact that
in order for a programmer to create any piece of software, he or she requires
access to a vast number of resources, which results in the attempt being a costly

one. On the other hand, if the software is constructed in a manner that makes it
adaptable and reusable, then it will be able to be implemented in the vast
majority of applications that will be created in the years to come. As a result of
this, it is possible to build reusable software by executing high-quality data
structures, which has the potential to save both money and time.

2.1.4 Goals of Data Structure

A data structure is a way of storing and arranging a large quantity of data in the
memory of a computer, and this can be accomplished in a number of different ways. A
good arrangement ensures that the data can be reused efficiently in the future. Data may
be stored using a logical or mathematical representation, and the choice can vary with
the level of flexibility within an organization and the requirements the organization
sets. Data models are shaped with the following two factors in mind:

1. The structure should correctly portray the relationships that exist between the
data and the real-world objects they represent.
2. The degree of complexity should be kept low so that reusability and resilience
may be maximized.

When creating software, the best solution takes a variety of distinct criteria into account
at various stages of the process:

1. The solution must precisely address the problem at hand in order to be effective.
2. The solution must work in every setting in which it may be applied.
3. The data structures involved should be as efficient as practicable.
4. The solution should be more time- and cost-efficient than the alternative options.

Selecting a suitable data structure and algorithm helps reduce costs while increasing
the benefits, because it lowers the space and time complexity involved in the process.
It makes it possible to retrieve the data correctly and to modify it. The data structure
establishes the connections between the components of the solution, as well as the role
each one plays in resolving the issue; in this way it links the abstract model to the
concrete problem.

2.1.5 Fundamental of the Data Structure

A data structure is a specific format for organizing and storing data so that any user can
easily access and operate on the appropriate data set, ensuring that a program runs as
effectively as possible. In other words, it is a logical or mathematical way of arranging
the data kept in the memory of a computer; this process is known as "data structuring."
In general, the choice of a particular format is influenced by two distinct considerations.
The structure must be rich enough to reflect the real-world relationships that exist
between the data in the contexts in which they occur, and at the same time it must be
kept as simple as possible, so that the data can be processed easily whenever they need
to be used.

The following is a list of the characteristics shared by the many types of data structures:

1. Linear: A linear data structure is one whose data items are arranged in
sequential form, as in an array.

2. Non-Linear: A non-linear data structure is one whose data items are not
arranged in sequential form, as in a tree or a graph.

3. Static: In a static data structure, the size and memory layout of the collection
of data items are fixed at compile time. Example: an array.

4. Homogeneous: In a homogeneous data structure, all elements are of the same
data type. Example: an array.

5. Non-Homogeneous: In a non-homogeneous data structure, the elements may or
may not be of the same data type.

6. Dynamic: A dynamic data structure can shrink and expand while the program
is running, and the memory locations it uses can change at run time. Example:
a linked list.

7. It has some rules that define how the data items are related to each other.

8. It defines some rules to display the relationship between data items and how
they interact with each other.

9. It has some operations used to perform on data items like insertion, searching,
deletion, etc.

10. It helps in reducing the usage of memory resources.

11. Time Complexity: The execution time of operations on a data structure should
be as small as possible.

12. Space Complexity: The memory used by all the data items in a data structure
should be as low as possible.

2.1.6 Basic Operations of Data Structures

The following basic operations apply to all types of data that occur in a data structure.

1. Traversing: Traversing visits each element once; it is also referred to as visiting
the elements of the data structure.

2. Searching: It is used to find the location of a given element in the whole data
structure. Example, an array.

3. Insertion: It is used to insert an element at a specified position among the data
elements.

4. Deletion: A deletion operation is used to delete an element from a specified
location.

5. Sorting: It is used to arrange the data elements in ascending or descending
order, or in some other logical order, such as by name key or account number.

6. Merging: The merge operation joins two or more sorted collections of data
elements into a single data structure.
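To make these operations concrete, here is a minimal Java sketch, written for illustration only, that performs traversal, searching, insertion, deletion, and sorting on a plain int array; the class name, values, and positions are all made up for the example.

import java.util.Arrays;

public class ArrayOperationsDemo {
    public static void main(String[] args) {
        int[] data = {12, 7, 31, 5};

        // Traversing: visit each element once
        for (int value : data) {
            System.out.print(value + " ");
        }
        System.out.println();

        // Searching: linear search for the position of 31
        int index = -1;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 31) { index = i; break; }
        }
        System.out.println("31 found at index " + index);

        // Insertion: arrays are fixed-size, so insert by copying into a larger array,
        // shifting the elements to the right of position 2 before placing 99 there
        int[] larger = Arrays.copyOf(data, data.length + 1);
        System.arraycopy(larger, 2, larger, 3, data.length - 2);
        larger[2] = 99;
        System.out.println(Arrays.toString(larger));

        // Deletion: remove the element at index 1 by shifting the tail left
        int[] smaller = new int[larger.length - 1];
        System.arraycopy(larger, 0, smaller, 0, 1);
        System.arraycopy(larger, 2, smaller, 1, larger.length - 2);
        System.out.println(Arrays.toString(smaller));

        // Sorting: ascending order
        Arrays.sort(smaller);
        System.out.println(Arrays.toString(smaller));
    }
}

Because a plain array has a fixed length, insertion and deletion here are done by copying into a new array, which is precisely why more flexible structures such as linked lists are introduced later.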

2.1.7 Needs of the data structures

Data is a general term for information, whether fundamental facts or objects, that can
be computed or manipulated in some way. A data structure will often use two separate
kinds of data, such as numeric data and alphanumeric data, among other combinations,
and these types determine the characteristics of the data items that a particular
operation works on. A number can be represented as a numeric data type in a variety of
ways, such as an integer or a floating-point number. The data may consist of a single
value or a group of values organized in a certain format; either way, the data constitute
information. Organizing the data in memory creates logical relationships between the
various data items, and it is exactly this organization that calls for the use of data
structures.

2.1.8 Advantages of Data Structures

There are some advantages of data structure:

1. Efficiency: The efficiency and organization of a program depend on the
selection of the right data structures. Suppose we want to search for a particular
item in a collection of data records. If the data are organized linearly, as in an
array, we can perform a sequential search, element by element; this works, but
it is time-consuming because every element may have to be examined. A better
choice of data structure, such as a binary search tree or a hash table, makes the
search process far more efficient.

2. Reusability: Well-designed data structures make programs reusable. For
example, once we have written a program or implemented a particular data
structure, we can use it from any other place or project and obtain the same
results.

3. Abstraction: A data structure is specified by an abstract data type (ADT), which
provides different levels of abstraction; the client interacts with the data
structure only through its interface.

4. Data structures help to simplify the process of collecting data through software
systems.

5. They are used to store collections of data on a computer so that the data can be
shared by various programs.

2.1.9 Disadvantages of data structures

1. Only a user with deep knowledge of how a data structure works can safely make
changes to it.

2. If there is an error in a data structure, only an expert can detect the bug; an
ordinary user cannot find and fix the problem on their own.

Here is a collection of data structures, followed by some examples of how you may put
them to use:

An array data structure is a collection of items of the same data type stored in memory
locations that lie next to one another. It holds a collection of data elements whose size
is fixed in advance and cannot be changed while the program is running. Most of the
time, an array is used in a program to organize data so that related objects or values can
be searched or sorted more easily.

A linked list is a collection of data elements, referred to as nodes, in which each node
contains a data value as well as the address of the node that follows it. No component
of a linked list needs to be stored in memory locations immediately next to any other
component of the list.

To put it another way, a linked list is a series of data nodes connected to one another in
a chain. Each node consists of two components: a data section, in which a value is
stored, and a pointer that indicates where the next node can be found. All the uses of a
linked list rest on this fundamental idea. The node regarded as the head of the linked
list marks the beginning of the list, while the last node is considered its tail.

A stack is a linear data structure in which elements are added and removed at a single
end known as the Top of Stack (TOS). A stack follows the LIFO ("last in, first out")
discipline when entries are added to or removed from the stack list. Push and pop are
the two operations used to add elements to the stack and to remove them; together they
define this abstract data type. The push operation places an element on top of the list,
above any elements that are already present, or initializes the stack if it is empty.

The pop operation removes the data item at the top of the stack list. The queue is a
linear data structure in which elements are inserted at one end of the list, referred to as
the rear, and deleted at the other end, referred to as the front. It is a collection of data
items organized sequentially according to the First In, First Out (FIFO) discipline,
which specifies that the data elements added to the queue first are the first to be removed
from it.

The following description will walk you through the various queue-related procedures:

1. Enqueue (): It is a queue operation used to insert an element to the list.

2. Dequeue (): It is a queue operation used to delete an item from the list.

3. Peek (): It is used to get the first element of the queue list without removing it.

4. Is Full (): It indicates whether the queue list is full.

5. Is Empty (): It indicates whether the queue list is empty.
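A minimal sketch of these queue operations in Java, assuming java.util.ArrayDeque as one possible backing structure; the method names offer, poll, peek, and isEmpty play the roles of enqueue, dequeue, peek, and is-empty described above, and the stored strings are purely illustrative.

import java.util.ArrayDeque;

public class QueueDemo {
    public static void main(String[] args) {
        ArrayDeque<String> queue = new ArrayDeque<>();

        // Enqueue: insert elements at the rear of the queue
        queue.offer("first");
        queue.offer("second");
        queue.offer("third");

        // Peek: look at the front element without removing it
        System.out.println("Front of queue: " + queue.peek());   // first

        // Dequeue: remove elements from the front, in FIFO order
        System.out.println(queue.poll());   // first
        System.out.println(queue.poll());   // second

        // Is Empty: check whether anything is left
        System.out.println("Empty? " + queue.isEmpty());          // false
    }
}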
A graph is a non-linear data structure consisting of a finite number of vertices (nodes)
and edges that together represent a collection of objects and the connections between
them. Vertices are points on the graph joined to one another by edges, and a path
between two nodes may pass through any number of other nodes. A graph may be
either directed or undirected. In a directed graph, each edge can be traversed in only
one direction, from one node to another. In an undirected graph, each edge can be
traversed in both directions, which is why such a connection is often described as
bidirectional.
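One common way, though certainly not the only way, to realise such a graph in a program is an adjacency list. The sketch below models a small undirected graph in Java; the vertex labels and the helper method addEdge are invented purely for illustration.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GraphDemo {
    public static void main(String[] args) {
        // Adjacency list: each vertex maps to the list of vertices it is connected to
        Map<String, List<String>> adj = new HashMap<>();

        // Undirected edges are stored in both directions
        addEdge(adj, "A", "B");
        addEdge(adj, "A", "C");
        addEdge(adj, "B", "C");

        // Print every vertex together with its neighbours
        adj.forEach((vertex, neighbours) ->
                System.out.println(vertex + " -> " + neighbours));
    }

    static void addEdge(Map<String, List<String>> adj, String u, String v) {
        adj.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
        adj.computeIfAbsent(v, k -> new ArrayList<>()).add(u);
    }
}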

Trees are a kind of non-linear data structure that represent hierarchical data in a tree-
like pattern. One node is designated the root node, and the remaining nodes, each
holding a value, form subtrees beneath it; the root is the topmost element, and a subtree
may in turn contain subtrees of its own. Every node in the tree maintains a parent-child
relationship with the nodes connected to it: a node may have any number of child nodes,
but it can have only one parent. There are many different types of trees, some examples
of which are the binary tree, the binary search tree, the expression tree, the AVL tree,
and the B-tree.
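The parent-child relationship described above can be sketched with a tiny binary tree in Java; the TreeNode class and the stored values are illustrative only.

public class BinaryTreeDemo {
    // A node holds a value and references to at most two children
    static class TreeNode {
        int value;
        TreeNode left, right;
        TreeNode(int value) { this.value = value; }
    }

    public static void main(String[] args) {
        // Root node with two children, one of which has a child of its own
        TreeNode root = new TreeNode(1);
        root.left = new TreeNode(2);
        root.right = new TreeNode(3);
        root.left.left = new TreeNode(4);

        printInOrder(root);   // prints 4 2 1 3
        System.out.println();
    }

    // In-order traversal: left subtree, then the node itself, then the right subtree
    static void printInOrder(TreeNode node) {
        if (node == null) return;
        printInOrder(node.left);
        System.out.print(node.value + " ");
        printInOrder(node.right);
    }
}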

A heap is a specialized kind of complete binary tree that satisfies the heap property and
keeps its components in a particular order. There are two distinct varieties of heap data
structure: the max heap and the min heap. In a max heap, the value of the root node is
always greater than or equal to the values of all of its child nodes, so the largest element
sits at the top of the heap tree. In a min heap, the value of each parent node is less than
or equal to the values of its children, so the smallest element sits at the root.
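As a rough sketch of the min-heap and max-heap behaviour just described, the example below uses java.util.PriorityQueue, which by default behaves as a min-heap; passing a reversed comparator approximates a max-heap. The inserted numbers are arbitrary.

import java.util.Collections;
import java.util.PriorityQueue;

public class HeapDemo {
    public static void main(String[] args) {
        // Min-heap: the smallest element is always at the root
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        minHeap.add(42);
        minHeap.add(7);
        minHeap.add(19);
        System.out.println("Min-heap root: " + minHeap.peek());   // 7

        // Max-heap: reverse the ordering so the largest element sits at the root
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        maxHeap.add(42);
        maxHeap.add(7);
        maxHeap.add(19);
        System.out.println("Max-heap root: " + maxHeap.peek());   // 42
    }
}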

Hash tables are a non-linear kind of data structure that store and arrange data as key-
value pairs so that particular keys or data items can be accessed directly. A key is a
value that is not itself part of the stored data but is linked, or mapped, to an element.
The hashing that this data structure relies on makes operations such as insertion and
search simple and fast across a broad range of data items, regardless of the amount of
data involved.
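A small illustration of such key-value access, assuming java.util.HashMap as one widely available hash-table implementation; the keys and values are made up for the example.

import java.util.HashMap;
import java.util.Map;

public class HashTableDemo {
    public static void main(String[] args) {
        // Keys are mapped to values; lookups by key are close to constant time on average
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);
        ages.put("bob", 25);

        // Search by key rather than by scanning all entries
        System.out.println("alice -> " + ages.get("alice"));                 // 30
        System.out.println("contains carol? " + ages.containsKey("carol"));  // false

        // Insertion and deletion are also keyed operations
        ages.put("carol", 41);
        ages.remove("bob");
        System.out.println(ages);
    }
}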

A dictionary is a data structure that works in a manner analogous to a hash table: it
holds data elements as a collection of entries. In contrast to a hash table, however, a
dictionary is a collection of key-value pairs that may be either ordered or unordered.
Each key is matched with a specific value assigned to it, and whenever we request a
certain key from the dictionary, it returns the value associated with that key.

2.2 DATA TYPES – PRIMITIVE AND NON-PRIMITIVE

A data type is an attribute of a variable that tells the compiler or interpreter how the
programmer intends to use the variable; a data type may be, for example, a character
string or an integer. It specifies the kinds of values that can be stored as well as the
actions that may be performed on the data. This section gives a high-level overview of
the data types that may be used in Java. Based on their characteristics, data types fall
into two groups:

1. Primitive Data Types


2. Non-Primitive Data Types

Primitive Data Types: A programming language usually comes with a set of
fundamental data types that are pre-defined by the language itself. Both the range of
values a variable can hold and the characteristics of those values are fixed in advance.

Non-Primitive Data Types: These data types are created by the programmer rather
than defined by the programming language itself. Because they point to a location in
memory that holds the data, they are also referred to as "reference variables" or "object
references"; the two names are interchangeable. With that ground covered, let us look
at the primitive data types in more detail.

 Primitive Data Types

Integers, floating-point numbers, characters, and boolean values are the four broad
forms of data used in Java programs; in total, however, there are eight primitive data
types. The following is a breakdown:

 boolean data type


 byte data type
 char data type
 short data type
 int data type
 long data type
 float data type
 double data type

You can refer to the below figure to understand the different data types with respect to
the memory allocated to them. Now that we have an overview of the many types of
data, let's dive further into each one. Permit me to begin by elaborating on the concept
of a Boolean data type.

 Boolean data type

A boolean data type stores only a single piece of information, and that information can
have only a true or a false value. This data type is used to record whether a condition
is true or false. Let us put this into practice by building a basic program and analyzing
how it works.
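A minimal sketch of such a program, with illustrative variable names, might look like this:

public class BooleanDemo {
    public static void main(String[] args) {
        boolean isJavaFun = true;
        boolean isFishTasty = false;
        System.out.println("Is Java fun? " + isJavaFun);        // true
        System.out.println("Is fish tasty? " + isFishTasty);    // false

        // Booleans typically record the outcome of a condition
        int marks = 72;
        boolean passed = marks >= 40;
        System.out.println("Passed the exam? " + passed);       // true
    }
}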
That was all there was to know about the boolean data type; I hope it was not too
difficult to understand. Next, let us move on to the byte data type and look at how it
works.

 byte data type

This is a primitive data type: an 8-bit signed two's-complement integer. It stores whole
numbers ranging from -128 to 127 inclusive. The byte data type is useful when trying
to save significant amounts of memory. Now let us try writing a simple program and
see how it works.
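A minimal sketch of such a byte program, with illustrative values, might look like this:

public class ByteDemo {
    public static void main(String[] args) {
        // A byte holds whole numbers from -128 to 127 in a single 8-bit cell
        byte low = -128;
        byte high = 127;
        System.out.println("Lowest byte value: " + low);
        System.out.println("Highest byte value: " + high);
    }
}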

That was all about the byte data type. Let us now move on to the next data type, char.

 char data type

This data type holds only a single character at a time. The character must be enclosed
in single quotation marks, such as 'E' or 'e'. You can also display certain characters
based on their associated ASCII values. Let us look at a simplified example to better
understand how this works.
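A simplified sketch, with illustrative values, might look like this:

public class CharDemo {
    public static void main(String[] args) {
        // A char stores a single character, written in single quotation marks
        char grade = 'E';
        System.out.println("Grade: " + grade);

        // A char can also be set from a code point: 65 corresponds to 'A' in ASCII
        char fromCode = 65;
        System.out.println("Character for code 65: " + fromCode);
    }
}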

That was all there was to know about the char data type. I really hope that wasn't too
confusing. Now that we have covered the previous data type, let's move on to the next
one on the list, which is the short data type.

 short data type

The range of values represented by a short is larger than that of a byte but smaller than
that of an int. It holds a value anywhere in the range -32768 to 32767 and has a default
size of 2 bytes. Let us look at an example to better grasp how the short data type is
used.
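A minimal sketch, with illustrative values, might look like this:

public class ShortDemo {
    public static void main(String[] args) {
        // A short uses 2 bytes and covers the range -32768 to 32767
        short minValue = -32768;
        short maxValue = 32767;
        System.out.println("Smallest short: " + minValue);
        System.out.println("Largest short: " + maxValue);
    }
}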

Moving on, let's proceed even farther and investigate the subsequent data type, which
is the int data type.

 int data type

This data type stores whole numbers in the range -2147483648 to 2147483647,
inclusive, and occupies 4 bytes. When creating variables that will store a numerical
value, the int data type is almost always the one to choose, because this range is wide
enough for most purposes.

For example:
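A minimal sketch, with illustrative values, might look like this:

public class IntDemo {
    public static void main(String[] args) {
        // An int uses 4 bytes and covers -2147483648 to 2147483647
        int population = 1380004385;
        int balance = -250000;
        System.out.println("Population: " + population);
        System.out.println("Balance: " + balance);
    }
}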

After getting a firm hold on this idea, let's go to the next data type on the list and
investigate its characteristics.

 long data type

This data type is a 64-bit integer represented in two's complement. The default size of
a long is 64 bits (8 bytes), and the acceptable range of values runs from -2^63 to
2^63 - 1.
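A minimal sketch, with illustrative values, might look like this:

public class LongDemo {
    public static void main(String[] args) {
        // A long uses 8 bytes (64 bits); literals beyond the int range need an L suffix
        long worldPopulation = 8045311447L;
        long distanceToSunKm = 149600000L;
        System.out.println("World population: " + worldPopulation);
        System.out.println("Distance to the Sun (km): " + distanceToSunKm);
    }
}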

2.3 TYPES OF DATA STRUCTURES - LINEAR & NON-LINEAR DATA STRUCTURES

In computer science, data structures are essential constructions that play a crucial role
in the efficient organization, storage, and manipulation of data. Because they constitute
the foundation upon which algorithms are constructed, they are necessary components
in the development of software as well as in the examination of data. In the realm of
data structures, linear and non-linear data structures emerge as the overarching
categories that serve as the fundamental building blocks for the arrangement of data.
These categories govern how data pieces are stored and retrieved, and each category
has its own set of distinctive qualities and application scenarios. The name of these
kinds of data structures gives away the fact that they keep their data pieces arranged in
a linear fashion, which makes traversal and retrieval of the information quite simple.

Examples that fall into this category include arrays, linked lists, stacks, and queues.
These data structures make accessing the data easy and predictable. Non-linear data
structures, on the other hand, challenge this linear order and make it possible for data
items to have complicated connections with one another. This category include things
like trees and graphs, both of which allow the ability to describe complicated
hierarchies, networks, and interrelated data. Because the decision between linear and
non-linear data structures has a significant bearing on the effectiveness and beauty of
algorithms, it is essential for every prospective computer scientist or programmer to
have a comprehensive grasp of both types of data structures.

In the course of this investigation, we dig into the worlds of linear and non-linear data
structures, illuminating their qualities, use cases, and the essential part they play in the
field of computing. The effective arrangement, storage, and retrieval of data are made
possible by the data structures that are essential principles in the field of computer
science. They are necessary for resolving a wide range of computational issues, since
they are a critical component in the process of designing and optimizing algorithms.
Linear data structures and non-linear data structures are the two primary categories that
may be used to classify data structures in a more general sense.

1. Linear Data Structures: Linear data structures are those in which the elements
are organized in a sequential manner, and each element has a unique
predecessor and successor, except for the first and last elements. Some common
examples of linear data structures include:

Arrays: Arrays are collections of elements in which each item can be identified by an
index or a key, and they are available in most programming languages. Elements are
stored in memory in the order in which they were added, and by using their indexes it
is possible to retrieve any element in constant time.

Linked Lists: Linked lists are composed of nodes, and each node contains not only the
data but also a reference (also known as a pointer) to the next node in the sequence.
Linked lists may be singly linked, in which case each node points only to the next node
in the list, or doubly linked, in which case each node points to both the next and the
previous nodes. Singly linked lists are the most common kind of linked list.

Stacks: A stack is a kind of linear data structure that follows the Last-In-First-Out
(LIFO) principle in its operation. The top of the stack is the only location where items
may be added or removed; access anywhere else in the stack is not permitted.

Queues: A queue is a kind of linear data structure that adheres to the First-In-First-Out
(FIFO) principle when it processes information. New elements are added at the rear of
the queue, while elements are removed from the front.

2. Non-Linear Data Structures: Non-linear data structures are those in which the
elements are not organized sequentially and may have multiple predecessors
and successors. These data structures allow for more complex relationships
among elements. Some common examples of non-linear data structures include:

Trees are a kind of hierarchical data structure that are made up of nodes that are
connected to one another by edges. Edges connect nodes to one another in the tree.
They are structured with a root node at the very top of the hierarchy, and each node
may have zero, one, or more child nodes below it in the hierarchy. Trees are used in a
wide variety of applications, some of which include
hierarchical file systems, binary search trees, and more complicated structures such as
AVL trees and Red-Black trees.

Graphs are made up of nodes, often referred to as vertices, and the edges that connect
them to one another. In contrast to trees, graphs may contain cycles and need not have
a single root at all. They have a wide variety of applications and are used in many kinds
of networks, including social networks and transportation networks, and in algorithms
such as Dijkstra's shortest path algorithm. Heaps are a special kind of tree-based data
structure implemented with the intention of improving the efficiency of selection and
sorting algorithms. Max heaps are those in which each parent node is greater than or
equal to its children, while min heaps are those in which each parent node is less than
or equal to its children.

Non-linear data structures provide a higher degree of flexibility and are used in more
intricate scenarios that include arbitrary connections between different data points. For
more straightforward situations in which the data is organized in a sequential fashion,
linear data structures are the best option. In order to successfully construct algorithms
and discover answers to issues in the area of computer science, it is vital to have a
strong grasp of the features and functions of different data structures. This
understanding is essential since data structures are used in almost every aspect of
computer science.

2.3.1 Linear vs Nonlinear Data Structures

A data structure is a method of organizing and storing data which, done correctly,
makes it possible to retrieve and use the data efficiently; data structures are, for
example, what relational databases are built upon. A linear data structure organizes its
data components in a consecutive pattern, starting with the first element and continuing
through to the last, and its organization closely resembles the layout of the computer's
memory. A nonlinear data structure is constructed by attaching a data element to a
number of other data elements in such a way that it indicates a given relationship among
them; this can be done in a variety of ways, and the resulting layout is quite different
from the way memory on a computer is organized.

2.3.2 Nonlinear data structures

In contrast to linear data structures, nonlinear data structures do not organize their data
components in sequential order. Because one data item may be related to numerous
other data elements in order to express a specific relationship between them, it is not
possible to visit all of the data items in a nonlinear data structure in a single sequential
pass. Multidimensional arrays, trees, and graphs are some examples of the nonlinear
data structures that are used often nowadays.

A multidimensional array is simply a collection of one-dimensional arrays. A tree is a
data structure made up of a collection of nodes linked to one another; trees are useful
for describing the hierarchical relationships that exist between different data
components, and languages such as Java, C++, and Python all allow trees to be created.
A graph is a data structure that may have any number of vertices and edges, all related
to one another in some way. Vertices are the nodes of the graph that store the data items,
while the edges represent the connections or relationships that exist between those
vertices.

2.3.3 Linear data structures

The data items contained in linear data structures are arranged in a linear fashion,
meaning that the data pieces are added one after the other in the appropriate sequence.
In a linear data structure, the data components are traversed one after the other, and
only one of those data pieces may be accessed directly at any given time during the
traversal. Implementing linear data structures is reasonably easy and uncomplicated,
because the memory of the computer is also organized in a linear manner.

Linear data structures are used very often; common instances of these structures are
arrays, linked lists, stacks, and queues. An array is a collection of data components, any
one of which can be located by its index, and arrays are used very frequently in
computer programming. A linked list is a series of nodes, each of which consists of a
data element and a reference to the next node in the sequence (in a doubly linked list,
each node also holds a reference to the node that came before it). A stack is essentially
a list with the restriction that data components may only be added to or removed from
the very top of the list. A queue differs in that data components are added at one end of
the list and removed from the other end.

A data structure is said to be linear if its constituent data components are organized in
a sequential or linear fashion and every element in the structure is related to both its
immediate predecessor and its immediate successor. Only one level is involved when
working with a linear data structure, so a single pass is enough to visit all of the
components. Because the memory of a computer is laid out in a linear form, linear data
structures are not only easy to construct but also take up very little space. Some
examples of this type are the array, the stack, the queue, and the linked list.

1. Array

A form of data structure known as an array will only hold instances of the same type
of element. These are the data structures that are the most basic and fundamental. A
positive integer, referred to as the index of the element, is assigned to the data that is
stored in each place of an array. An array's items may have their locations within the
array more easily identified with the assistance of the index. If it turns out that we need
to save some data, like the prices of 10 different vehicles, for example, we can build
the structure of an array and keep all of the integers together in one place. Creating 10
distinct integer variables is not required for this purpose. Because of this, the number
of lines included in a piece of code is decreased, and memory is preserved. When
dealing with an array, the index value for the first element begins with 0 in most cases.

2. Stack

The stack data structure adheres to the rule of LIFO, or "Last In, First Out," which
dictates that the piece of data added most recently is removed first. The push operation
is used to add an element of data to a stack, and the pop operation is used to delete data
from a stack. The analogy of books piled one on top of another illustrates the point: to
get access to the book at the bottom of the pile, all of the books stacked on top of it
must first be carefully removed.

3. Queue

Due to the fact that the data is kept in sequential order, this structure is virtually
identical to the stack. The distinction is that the queue data structure adheres to the
FIFO (First In-First Out) rule, which states that the element that was added to the queue
first should be the one to leave the queue first. In a line, there are two points of
reference: the front and the back. The insertion action is called "enqueue," while the
deletion process is called "dequeue." The first task is completed at the end of the line,
whereas the second task is completed at the beginning of the queue. One possible way
to describe the data structure is by using the analogy of passengers waiting in line to
board a bus. The person who is first in line will be the first to leave the queue, while
the one who is last in line will be the person who leaves the queue last.

4. Linked List

Linked lists are the kinds of lists in which the information is stored in the form of nodes.
Each node in a linked list consists of a data element and a pointer to the next node in
the list; the purpose of the pointer is to indicate the node positioned immediately after
the element in question in the sequence. The data held in a linked list may take any
form, including strings, numbers, or characters, and this flexibility allows the data to
be used in a variety of applications. A linked list may store data that has been sorted or
data that has not been sorted, and it may contain unique or duplicate items.

5. Hash Tables

Hash tables can be implemented on top of either linear or non-linear data structures.
The information in them is organized into key-value pairs.

2.3.4 Non-linear Data Structure:

The term "non-linear data structures" refers to a category of data structures in which
the constituent data elements are not arranged in a sequential or linear fashion, as the
42 | P a g e
name suggests. A non-linear data structure does not need the presence of a single level
at any given instant in time. This means that we won't be able to traverse all of the
components in a single run as we had hoped to. It is well known that non-linear data
structures are notoriously difficult to put into action as compared to linear data
structures. In contrast to a linear data structure, this data organization method makes
efficient use of the RAM that is accessible on the computer. There are examples of this
type in the form of trees and graphs.

1. Trees

A tree data structure is constructed from a number of nodes, all of which are linked to
one another in some way. Because a tree is hierarchical in its structure, there is a link
between its nodes that is comparable to the relationship that exists between a parent
and a child. The structure of the tree is built in such a way that there is one link for
every parent-child node connection that is present. This ensures that the tree can be
traversed in a logical manner. From the node at the tree's root to any other node in the
tree, there should be only one possible path to take. There are many distinct types of
trees, each of which may be distinguished from the others based on their structures.
Some examples of these structures include an AVL tree, a binary tree, and a binary
search tree, amongst others.

2. Graph

Graphs are non-linear data structures distinguished by having a set number of vertices
together with edges. The data is stored in the vertices, also known as the nodes, and the
edges illustrate how the vertices are connected to one another. In contrast to a tree, a
graph does not adhere to any particular set of guidelines regarding the manner in which
its nodes are connected to one another. Graphs allow for the representation of real-
world issues such as social networks, telephone networks, and other similar systems.

2.3.5 Difference between Linear and Non-linear Data Structures:

1. In a linear data structure, data elements are arranged in a linear order, where
each element is attached to its previous and its next adjacent element. In a
non-linear data structure, data elements are attached in a hierarchical manner.

2. In a linear data structure, a single level is involved. In a non-linear data
structure, multiple levels are involved.

3. A linear data structure is easy to implement in comparison with a non-linear
data structure, whereas a non-linear data structure is complex to implement
in comparison with a linear data structure.

4. In a linear data structure, the data elements can be traversed in a single run.
In a non-linear data structure, the data elements cannot be traversed in a
single run.

5. In a linear data structure, memory is not utilized in an efficient way. In a
non-linear data structure, memory is utilized in an efficient way.

6. Examples of linear data structures are the array, stack, queue, and linked list.
Examples of non-linear data structures are trees and graphs.

7. Applications of linear data structures are mainly in application software
development. Applications of non-linear data structures are in artificial
intelligence and image processing.

8. Linear data structures are useful for simple data storage and manipulation.
Non-linear data structures are useful for representing complex relationships
and data hierarchies, such as in social networks, file systems, or computer
networks.

9. For linear data structures, performance is usually good for simple operations
like adding or removing at the ends, but slower for operations like searching
or removing elements in the middle. For non-linear data structures,
performance can vary depending on the structure and the operation, but it
can be optimized for specific operations.

CHAPTER 3

LINEAR DATA STRUCTURE

Linear data structures are a basic category of data structures that create the basis for the
efficient storing, retrieval, and manipulation of information. They are also known as
the "backbone" of data organization in the field of computer science. These structures,
which can be identified by the sequential ordering of the data pieces inside them, have
a simplicity and beauty that belies the enormous adaptability and usefulness that they
provide. Linear data structures include anything from arrays and linked lists to stacks
and queues. These structures provide the fundamental building blocks necessary for
arranging data in a linear or one-dimensional form. They are the go-to choice in
innumerable applications, ranging from the implementation of dynamic arrays in
programming languages to the management of tasks in operating systems and the
modeling of real-world situations in simulations.

In other words, they are indispensable. Because linear data structures are able to give
users with uncomplicated access and manipulation of data, they have become an
essential tool for both software developers and computer scientists. This is because of
the linear data structure's attractiveness. During this investigation of linear data
structures, we will dissect the fundamentals, characteristics, and many applications that
have made these structures a pillar in the fields of computer science and data
management. These structures have had a significant impact on the digital environment
in a variety of significant ways.

Linear data structures, the foundational building blocks of data organization in
computer science, are an essential and fundamental category of data structures. These
structures provide a straightforward and effective method of storing and manipulating
data in a sequential fashion, with each element being linked to both its predecessor and
its successor in the chain. Linear data structures are analogous to the threads that weave
through the fabric of computing methods. They
make it possible to store information in an organized fashion, retrieve it, and manipulate
it in a systematic manner. Linear data structures are the foundation upon which many
algorithms and applications are constructed. Examples of linear data structures include
the time-tested arrays and linked lists, as well as the more specialized stacks and
queues.

Their adaptability allows them to be used in a broad variety of fields, ranging from the
deft management of program execution in software engineering to the effective
archiving of information in database management systems. Not only is an
understanding of these linear data structures a vital step in one's journey through the
field of computer science, but it is also a key to unlocking the potential for efficient
data processing and algorithmic innovation. This is because linear data structures are
organized in a linear fashion. In the course of this investigation of linear data structures,
we will look into the different shapes that linear data structures may take, the
applications to which they can be put, as well as the important role that linear data
structures play in sculpting the landscape of contemporary computing and information
management.

In the field of computer science and the structuring of data, linear data structures are
an essential component of the overall infrastructure. These structures provide a basic
and easy-to-understand method of organizing and managing data by aligning it in a
linear, one-dimensional pattern. This makes the process more efficient overall. The
elegance of linear data structures rests in the fact that they are simple, flexible, and
straightforward to implement. As a result, these structures are very useful for a broad
variety of different applications. Arrays, linked lists, stacks, and queues are some of the
data structures that are particularly useful for storing and retrieving data in a way that
accurately represents the order in which it was added. Linked lists give dynamic
flexibility by linking nodes, which enables simple insertion and deletion, while arrays
provide a contiguous block of memory that is ideal for random access and rapid
element retrieval.

Stacks and queues, along with their respective Last-In-First-Out (LIFO) and First-In-
First-Out (FIFO) policies, are essential for managing data in specific ways. Stacks are
helpful for function call tracking and undo operations, whereas queues are useful for
tasks such as managing processes in operating systems and scheduling. The
administration and manipulation of data in ways that are efficient, structured, and
important to contemporary computing may be facilitated by linear data structures,
which serve as the building blocks of more complicated data structures and algorithms.
Linear data structures also constitute the backbone of computer programs and systems.
During this investigation of linear data structures, we will dig into the ideas,
implementations, and real-world applications that make them a vital and ever-present
component of the computer science environment.

3.1 ARRAY

A program can be conceived of, understood, and verified on the basis of the laws that
govern the abstractions, and it is not necessary to have further insight and knowledge
about the ways in which the abstractions are implemented and represented in a
particular computer. This is the core idea behind the use of abstractions in
programming, and it allows a program to be developed, understood, and verified more
efficiently. In spite of this, it is absolutely necessary for a professional programmer to
have an awareness of the methods that are frequently used for describing the core
notions of programming abstractions. For example, a professional programmer has to
be familiar with the fundamental data structures.

Such knowledge is useful because it enables the programmer to make reasonable
decisions about the design of the program and the data, in light not only of the abstract
properties of structures but also of their realizations on actual computers, taking a
computer's particular capabilities and limitations into account. The challenge of
accurately representing data lies in correctly mapping an abstract structure onto a
computer's memory space.
To a first approximation, a computer's store is an array of individual storage cells
referred to as bytes, each generally accepted to be a group of 8 bits. The positions of
the bytes within the store are identified by their addresses.

3.1.1 Representation of Arrays

A representation of an array structure is a mapping of the (abstract) array with
components of type T onto the store, which is an array with components of type BYTE.
It is important that the mapping be done in such a way that the calculation of the
addresses of the array components is as straightforward, and hence as efficient, as
possible. The address i of the j-th array component is computed by the linear mapping
function

i = i0 + j*s

where i0 is the address of the first component and s is the number of words that a
component occupies. Assuming that the word is the smallest individually transferable
unit of storage, it is clearly desirable for s to be a whole number, the simplest case being
s = 1. If s is not a whole number,
which is the typical scenario, the value of s is often rounded up to the next bigger integer
S. The result is that S words are utilized by each component of the array, but S-s words
are not used at all (see Figures 3.1 and 3.2). Padding refers to the practice of increasing
the required number of words to the next whole number beyond that. The storage
utilization factor, denoted by the letter u, is calculated by taking the minimum amount
of storage space required to represent a structure and dividing it by the amount of
storage space that is actually utilized.

u = s / (s rounded up to nearest integer)

Figure 3.1 Mapping an array onto a store

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Figure 3.2 Padded representation of a record

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Because an implementor is required to strive for a storage utilization that is as near to
1 as is reasonably practicable, and because accessing individual portions of words is a
procedure that is both laborious and somewhat inefficient, the implementor is forced to
make a compromise. The following factors should be taken into consideration:

1. Padding decreases storage utilization.
2. Omitting padding may make inefficient partial word access necessary.
3. Partial word access may cause the code (the compiled program) to grow, which
can cancel out any gains obtained by omitting padding.

In fact, considerations 2 and 3 are usually so dominant that compilers always employ
padding automatically. We note that the utilization factor u is always greater than 0.5
whenever s is greater than 0.5. If, on the other hand, s is less than or equal to 0.5, the
utilization factor can be increased significantly by placing more than one array
component in each word. This technique is called packing. When n components are
packed into a word, the utilization factor is determined as follows (see Fig. 3.3).

u = n*s / (n*s rounded up to nearest integer)

Figure 3.3 Packing 6 components into one word

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Access to the i-th component of a packed array requires computing the address j of the
word in which the desired component lies, as well as the position k of the component
inside that word. Both values follow from integer division and remainder: j = i DIV n
and k = i MOD n, and both must be computed before the component can be accessed.
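A tiny Java sketch, with an illustrative index and packing factor, simply evaluates these two expressions:

public class PackedArrayAddressDemo {
    public static void main(String[] args) {
        int n = 6;    // components packed into one word (as in Fig. 3.3)
        int i = 20;   // index of the desired component

        int j = i / n;   // which word holds component i
        int k = i % n;   // position of the component inside that word

        System.out.println("Component " + i + " lives in word " + j + " at slot " + k);
    }
}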

In the great majority of programming languages, the programmer is powerless to affect
the representation of the abstract data structures. However, there should be a
mechanism to indicate the utility of packing at least in those circumstances in which
more than one component would fit into a single word, which is to say, when a gain in
storage economy of a factor of 2 or more may be attained. The convention we
recommend for indicating that packing is desirable is to prefix the symbol ARRAY (or
RECORD) in the declaration with the symbol PACKED.

3.1.2 Applications of Arrays

Arrays are one of the most prevalent types of structures that you will see regardless of
where you look. Arrays have an extremely wide variety of applications; some examples
include assembly lines, the contact list on your phone, egg cartons, and online ticket
buying portals. If you are studying programming, one of the first topics that you will
have the opportunity to learn about in data structures and algorithms is the array data
structure. That is the extent of its pervasiveness and significance.

To begin, let's start with the formal definition:

 “Arrays are a type of data structure in which elements of the same data type are
stored in contiguous memory locations.”
 As we know, data structures are nothing but different means to store data in a
structured manner. An array is one of those means.

Basic Terms of An Array Data Structure

 In arrays, an element refers to a particular item that is stored.


 Each element carries an index - a location with respect to a base value.
 The base value is the memory location of the first element of the array.
 We simply add offsets to this value which makes it easier for us to use the
reference and identify items.
 Array length – the number of elements an array can store is defined as the length
of the array. It is the total space allocated in memory while declaring an array.

An array is a linear data structure holding a collection of elements of the same data
type, stored in memory regions that are next to one another. It is a non-dynamic (static)
data structure with a predetermined amount of space, and it groups data of the same
kind together.

3.1.3 Applications of Array Data Structure:

1. Implementation of Stacks and Queues

Stacks and queues are two kinds of linear data structures, used in a variety of contexts,
that may be implemented using arrays. Array-based implementations of stacks and
queues are more straightforward to construct than implementations based on linked
lists.

Let us look at how an array may be used to carry out the operations on a stack.

Pushing a new element onto the top of the stack

The push operation, often known as the action of putting an item to the very top of a
stack, is divided into two stages:

• We increment the top variable so it can now refer to the next memory location
• Then we add a new element at the position of the incremented top.
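A minimal array-backed stack in Java that follows exactly these two steps for push, together with the matching pop; the fixed capacity and the names are illustrative, and error handling is kept deliberately simple.

public class ArrayStackDemo {
    private final int[] items = new int[10];  // fixed-size backing array
    private int top = -1;                     // index of the current top element

    // Push: advance 'top' to the next memory location, then store the new element there
    void push(int value) {
        if (top == items.length - 1) throw new IllegalStateException("stack overflow");
        top = top + 1;
        items[top] = value;
    }

    // Pop: return the element at 'top' and step 'top' back by one
    int pop() {
        if (top == -1) throw new IllegalStateException("stack underflow");
        return items[top--];
    }

    public static void main(String[] args) {
        ArrayStackDemo stack = new ArrayStackDemo();
        stack.push(3);
        stack.push(8);
        System.out.println(stack.pop());  // 8 (last in, first out)
        System.out.println(stack.pop());  // 3
    }
}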

2. Implementation of other data structures

A wide variety of data structures, including lists, heaps, hash tables, strings, and VLists,
can also be constructed with the help of arrays, which makes arrays a very flexible
building block. Implementations of data structures that use arrays are simpler and more
space-efficient, with less space overhead, than tree-based data structures. On the other
hand, array-based data structures may exhibit poor complexity, particularly when they
are frequently altered or updated.

3. CPU Scheduling

CPU scheduling, also known as task scheduling, is the process by which the central
processing unit (CPU) chooses the order and manner in which multiple tasks will be
carried out when it is required to perform many activities at the same time. Arrays are
a handy data structure for this purpose: the list of processes that need to be scheduled
on the CPU can be kept in an array.

• In Linux CPU scheduling, a runnable job is considered executable if it has not
spent all of the time available in its allotted time quantum. These jobs are placed
in an active array, which is indexed by priority.
• When a job’s time slice expires, it is moved to an expired array. As part of the
transfer, the priority of the tasks may be re-assigned.
• The two arrays are exchanged when the active array gets empty.
• Runqueue structures are used to store these arrays. Each processor on a
multiprocessor computer has its own scheduler and runqueue.

4. Implementation of complete binary trees

In order for a binary tree to be considered complete, every level of the tree must be fully populated except possibly the last, and the nodes of the last level must be positioned as far to the left as possible. While constructing such a tree, an array can be used to store its data values efficiently: each data value is placed in the array location that corresponds to that node's position within the tree, so the data is organized level by level. Looking at the figure, you can see that the positions of a node's relatives inside the array follow a fixed, predetermined pattern, so simple formulas can be derived that locate the parent and children of any node X from X's index alone.
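As a small illustration (assuming 0-based array indexing, a convention not fixed by the text above, and with helper names chosen here purely for clarity), the relatives of the node stored at index i can be located with simple index arithmetic:

#include <stdio.h>

/* Index arithmetic for a complete binary tree stored in an array (0-based). */
int parent(int i)      { return (i - 1) / 2; }   /* parent of the node at index i (i > 0) */
int left_child(int i)  { return 2 * i + 1; }     /* left child of the node at index i */
int right_child(int i) { return 2 * i + 2; }     /* right child of the node at index i */

int main(void)
{
    int tree[] = {10, 20, 30, 40, 50, 60, 70};   /* an illustrative complete binary tree of 7 nodes */
    int i = 2;
    printf("node %d: parent %d, left %d, right %d\n",
           tree[i], tree[parent(i)], tree[left_child(i)], tree[right_child(i)]);
    return 0;
}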

Pointers are not required to move to the left or right child of a node. If the array is chosen to have a size of n for a tree that has n nodes, there is essentially no space overhead in this representation. For complete binary trees, arrays are therefore often the representation of choice, rather than pointer-based structures. The following is a list of examples of how arrays may be used:

 Storing and accessing data: Arrays are used to store and retrieve data in a
specific order. For example, an array can be used to store the scores of a group
of students, or the temperatures recorded by a weather station.

 Sorting: Arrays can be used to sort data in ascending or descending order.
Sorting algorithms such as bubble sort, merge sort, and quicksort rely heavily
on arrays.

 Searching: Arrays can be searched for specific elements using algorithms such
as linear search and binary search.

 Matrices: Arrays are used to represent matrices in mathematical computations such as matrix multiplication, linear algebra, and image processing.

 Stacks and queues: Arrays are used as the underlying data structure for
implementing stacks and queues, which are commonly used in algorithms and
data structures.

 Graphs: Arrays can be used to represent graphs in computer science. Each element in the array represents a node in the graph, and the relationships between the nodes are represented by the values stored in the array.

 Dynamic programming: Dynamic programming algorithms often use arrays to store intermediate results of subproblems in order to solve a larger problem.

3.1.4 Real-Time Applications of Array:

Below are some real-time applications of arrays.

 Signal Processing: Arrays are used in signal processing to represent a set of samples that are collected over time. This can be used in applications such as speech recognition, image processing, and radar systems.

 Multimedia Applications: Arrays are used in multimedia applications such as video and audio processing, where they are used to store the pixel or audio samples. For example, an array can be used to store the RGB values of an image.

 Data Mining: Arrays are used in data mining applications to represent large
datasets. This allows for efficient data access and processing, which is
important in real-time applications.

 Robotics: Arrays are used in robotics to represent the position and orientation
of objects in 3D space. This can be used in applications such as motion planning
and object recognition.

 Real-time Monitoring and Control Systems: Arrays are used in real-time
monitoring and control systems to store sensor data and control signals. This
allows for real-time processing and decision-making, which is important in
applications such as industrial automation and aerospace systems.

 Financial Analysis: Arrays are used in financial analysis to store historical stock prices and other financial data. This allows for efficient data access and analysis, which is important in real-time trading systems.

 Scientific Computing: Arrays are used in scientific computing to represent numerical data, such as measurements from experiments and simulations. This allows for efficient data processing and visualization, which is important in real-time scientific analysis and experimentation.

3.1.5 Advantages of array data structure:

 Efficient access to elements: Arrays provides direct and efficient access to any
element in the collection. Accessing an element in an array is an O (1) operation,
meaning that the time required to access an element is constant and does not
depend on the size of the array.

 Fast data retrieval: Arrays allow for fast data retrieval because the data is
stored in contiguous memory locations. This means that the data can be
accessed quickly and efficiently without the need for complex data structures
or algorithms.

 Memory efficiency: Arrays are a memory-efficient way of storing data. Because the elements of an array are stored in contiguous memory locations, the size of the array is known at compile time. This means that memory can be allocated for the entire array in one block, reducing memory fragmentation.

 Versatility: Arrays can be used to store a wide range of data types, including
integers, floating-point numbers, characters, and even complex data structures
such as objects and pointers.

 Easy to implement: Arrays are easy to implement and understand, making them an ideal choice for beginners learning computer programming.

 Compatibility with hardware: The array data structure is compatible with
most hardware architectures, making it a versatile tool for programming in a
wide range of environments.

3.1.6 Disadvantages of array data structure:

 Fixed size: Arrays have a fixed size that is determined at the time of creation.
This means that if the size of the array needs to be increased, a new array must
be created and the data must be copied from the old array to the new array,
which can be time-consuming and memory-intensive.

 Memory allocation issues: Allocating a large array can be problematic, particularly in systems with limited memory. If the size of the array is too large, the system may run out of memory, which can cause the program to crash.

 Insertion and deletion issues: Inserting or deleting an element from an array can be inefficient and time-consuming because all the elements after the insertion or deletion point must be shifted to accommodate the change.

 Wasted space: If an array is not fully populated, there can be wasted space in
the memory allocated for the array. This can be a concern if memory is limited.

 Limited data type support: Arrays have limited support for complex data
types such as objects and structures, as the elements of an array must all be of
the same data type.

 Lack of flexibility: The fixed size and limited support for complex data types
can make arrays inflexible compared to other data structures such as linked lists
and trees.

3.1.7 Advantages of Structure over Array:

• The structure can store different types of data whereas an array can only store
similar data types.
• Structure does not have limited size like an array.
• Structure elements may or may not be stored in contiguous locations but array
elements are stored in contiguous locations.
• In structures, object instantiation is possible whereas in arrays objects are not
possible.

3.1.8 Sparse Matrices

A matrix that has a significant percentage of its entries set to zero is referred to as a
sparse matrix. Utilizing specialized algorithms and data structures that make use of the
sparse structure in order to make effective use of the memory is necessary in order to
achieve this efficiency. If we apply the operations to sparse matrices using typical
matrix structures and techniques, then the execution will become more sluggish, and
the matrix will use up a significant amount of memory. It is simple to compress sparse
data, which, in turn, may greatly decrease the amount of memory that is required.

Sparse matrices may be divided into two distinct categories. In the first form, all of the entries located above the main diagonal are zero; this kind of sparse matrix is referred to as a (lower) triangular matrix, because, when seen graphically, all of the elements with a non-zero value appear on or below the diagonal. In a lower-triangular matrix, A[i][j] = 0 whenever i is less than j. A lower-triangular matrix with n rows and n columns has one non-zero element in the first row, two non-zero elements in the second row, and so on, up to n non-zero entries in the nth row. Figure 3.4 depicts a lower-triangular matrix.

Figure 3.4 Lower-triangular matrix

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

We can store a lower-triangular matrix efficiently in memory by using a one-dimensional array that holds only the non-zero elements. Either of the following strategies may be used to map the two-dimensional matrix into the one-dimensional array:

1. Row-wise mapping—Here the contents of array A[] will be {1, 5, 3, 2, 7, –1, 3, 1, 4, 2, –9, 2, –8, 1, 7}
2. Column-wise mapping—Here the contents of array A [] will be {1, 5, 2, 3, –9,
3, 7, 1, 2, –1, 4, –8, 2, 1, 7}
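As a sketch of the row-wise mapping just listed (assuming 0-based indices, which the figure may not use, and with an illustrative function name), the position of a non-zero element A[i][j] of a lower-triangular matrix in the one-dimensional array can be computed directly:

#include <stdio.h>

/* Row-wise mapping of an n x n lower-triangular matrix into a 1-D array.
   Assumes 0-based indices with i >= j (all other entries are zero). */
int lower_tri_index(int i, int j)
{
    return i * (i + 1) / 2 + j;   /* rows 0..i-1 contribute 1 + 2 + ... + i elements before row i */
}

int main(void)
{
    /* e.g., with 0-based indices, A[3][1] maps to position 3*4/2 + 1 = 7 */
    printf("%d\n", lower_tri_index(3, 1));
    return 0;
}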

Similarly, in an upper-triangular matrix A[i][j] = 0 whenever i is greater than j. An upper-triangular matrix A with n rows and n columns contains n non-zero items in the first row, n – 1 non-zero elements in the second row, and so on, down to one non-zero element in the nth row. Figure 3.5 shows an upper-triangular matrix.

Figure 3.5 Upper-triangular matrix

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

A second kind of sparse matrix is one in which non-zero elements may appear only on the main diagonal or immediately above or below it. This configuration is called a diagonal sparse matrix, and this specific sort of matrix is also known as a tri-diagonal matrix. In a tri-diagonal matrix, A[i][j] = 0 whenever the absolute difference between i and j is greater than 1; non-zero elements can therefore occur only on the following three diagonals:

1. The main diagonal, which contains non-zero elements for i = j. In all, there are n such elements.
2. Below the main diagonal, which contains non-zero elements for i = j + 1. In all, there are n – 1 such elements.
3. Above the main diagonal, which contains non-zero elements for i = j – 1. In all, there are n – 1 such elements.

Figure 3.6 shows a tri-diagonal matrix. We can store a tri-diagonal matrix efficiently in memory by using a one-dimensional array that holds only the non-zero elements. Any one of the following strategies may be used to map the two-dimensional matrix into the one-dimensional array:

1. Row-wise mapping—Here the contents of array A[] will be {4, 1, 5, 1, 2, 9, 3, 1, 4, 2, 2, 5, 1, 9, 8, 7}
2. Column-wise mapping—Here the contents of array A [] will be {4, 5, 1, 1, 9,
2, 3, 4, 1, 2, 5, 2, 1, 8, 9, 7}
3. Diagonal-wise mapping—Here the contents of array A [] will be {5, 9, 4, 5, 8,
4, 1, 3, 2, 1, 7, 1, 2, 1, 2, 9}
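A corresponding sketch for the row-wise mapping of a tri-diagonal matrix (again assuming 0-based indices; the function name is illustrative):

#include <stdio.h>

/* Row-wise mapping of a tri-diagonal matrix into a 1-D array.
   Assumes 0-based indices with |i - j| <= 1 (all other entries are zero). */
int tridiag_index(int i, int j)
{
    /* row i (for i >= 1) starts at index 3*i - 1, and A[i][j] sits j - (i - 1) places into it */
    return 2 * i + j;
}

int main(void)
{
    /* e.g., with 0-based indices, A[2][1] maps to position 2*2 + 1 = 5 */
    printf("%d\n", tridiag_index(2, 1));
    return 0;
}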

Figure 3.6 Tri-diagonal matrix

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

3.2 STACK

The last-in, first-out (LIFO) concept governs the process by which items are added to and removed from a stack, which is a container of objects. Objects may be added to a stack at any time, but only the most recently added ("last") object may be removed at any given moment. The metaphor of a stack of plates in a spring-loaded cafeteria plate dispenser is where the word "stack" comes from; the essential actions are "pushing" and "popping" the plates that are stacked on top of one another.

When we require a new plate from the dispenser, we "pop" the top plate off the stack, and when we add a plate, we "push" it down onto the stack so that it becomes the new top plate; afterwards the stack is back in a valid state. An even more entertaining metaphor is a PEZ candy dispenser, which holds mint candies in a spring-loaded container and "pops" out the top-most candy in the stack when the top of the dispenser is lifted.

3.2.1 Stack-Definitions & Concepts

A stack is a non-primitive linear data structure. It is an ordered list in which both the insertion of new data items and the removal of existing data items take place at just one end, referred to as the Top of Stack (TOS). Since all operations occur at the top of the stack, the element that was most recently added to the stack is the first one to be removed. Because of this characteristic, the stack is also referred to as a Last-In-First-Out (LIFO) kind of list. Consider the following examples:

• A common model of a stack is the pile of plates at a marriage party: fresh plates are "pushed" onto the top and "popped" off the top.
• Some of you may eat biscuits. If only one side of the cover is torn, the biscuits are taken out one by one; this is popping. Similarly, if you want to preserve some biscuits for later, you put them back into the pack through the same torn end; this is pushing.

When a stack is constructed, the base of the stack does not change. The height of the stack increases each time a new element is pushed onto the top, and it decreases each time the element at the very top of the stack is removed.

Figure 3.7 Schematic diagram of a stack

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

A stack is an ordered collection of data components that are of the same kind, and the
actions of adding and removing data elements only take place at one end of the stack.
In the context of a stack, the actions of inserting new data and removing old data have
been given the specific names of PUSH and POP, respectively, and the location inside
the stack at which these operations are carried out is referred to as the TOP of the stack.
ITEM refers to each individual piece that makes up a stack. The word "SIZE" refers to
the greatest number of components that a stack may hold at one time. A typical
overview of a stack data structure may be seen in the figure.

3.2.1.1 Representation of a Stack

There are a number of distinct strategies that may be used to store a stack in memory. The most common methods are a single-dimensional array or a singly linked list. The array representation works as follows: first, a memory block large enough to hold the full capacity of the stack is allocated. The components of the stack are then stored in that block in sequential order, starting from the first location of the memory block.

Figure 3.8 Two ways of representing stacks

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

In the figure, itemi refers to the item located at the ith position on the stack, while l and u denote the index range of the array in use; these values are normally 1 and SIZE respectively. A pointer called TOP indicates the position within the array up to which it is filled with items of the stack. With this representation, two boundary conditions are possible:

EMPTY: TOP < l

FULL: TOP ≥ u

Representation of stacks using linked lists: the array representation of stacks, although reasonably plain and easy to understand, is constrained in the sense that it can only represent stacks of a fixed size. In many contexts and applications, the height of the stack may change while the program is running, depending on the amount of data being stacked. An obvious and simple solution to this problem is to use a linked list in place of an array to represent the stack. Any stack may be accurately modeled using a singly linked list. In this case, the DATA field of each node holds the ITEM, while the LINK field, as usual, holds a reference to the node that follows it.

In the representation of a linked list, the node that includes the item that is presently at
the bottom of the stack is the last node on the list, and the node that contains the item
that is currently at the top of the stack is the first node on the list. The item that is
currently at the top of the stack is the first node on the list. A PUSH operation will
cause a new node to be added to the top of the list, whereas a POP operation will cause
an existing node to be removed from the front of the list. This is the outcome of the
way the PUSH and POP operations work.

3.2.2 Operations on Stacks

The following is a list of the fundamental operations that may be carried out on the
stack:

The act of adding a new element to the top of the stack is referred to as the push (PUSH) operation. When an element is pushed, it is placed on the top of the stack, so the top of the stack is incremented by one after each push operation. When the array has reached its capacity and cannot accept any further elements, a stack-full condition, also called stack overflow, occurs. The act of removing an element from the top of the stack is referred to as the pop (POP) operation. After each pop operation, the top of the stack is decremented by one. If a pop operation is attempted when there is no element on the stack, a stack underflow occurs.

3.2.2.1 Algorithms for push & pop for static implementation using arrays

 Algorithm for inserting an item into the stack (PUSH)

For the sake of implementing the stack, let stack[maxsize] be an array, where maxsize denotes the maximum size of the array. NUM is the element to be added to the stack, and TOP is the index of the element currently at the top of the stack.

 The function of the Stack PUSH operation in C is as follows:

Step 1: [Check for stack overflow?]

If TOP = MAXSIZE – 1, then:

Write: ‘Stack Overflow’ and return

[End of If Structure]

Step 2: Read NUM to be pushed in stack.

Step 3: Set TOP = TOP + 1 [Increases TOP by 1]

Step 4: Set STACK[TOP] = NUM [Inserts new number NUM in new TOP Position]

Step 5: Exit
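A minimal C rendering of the PUSH algorithm above; the global declarations and the exact messages are assumptions made here for illustration, and reading NUM (Step 2) is replaced by a function parameter:

#include <stdio.h>

#define MAXSIZE 100

int stack[MAXSIZE];
int top = -1;                     /* -1 means the stack is empty */

/* PUSH: insert num at the top of the stack. */
void push(int num)
{
    if (top == MAXSIZE - 1) {     /* Step 1: check for stack overflow */
        printf("Stack Overflow\n");
        return;
    }
    top = top + 1;                /* Step 3: increase TOP by 1 */
    stack[top] = num;             /* Step 4: insert NUM at the new TOP position */
}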

 Algorithm for deleting an item from the stack (POP)

For the sake of implementing the stack, we again assume that stack[maxsize] is an array, where maxsize denotes the maximum size of the array. NUM is the element that will be removed from the stack, and TOP is the index of the element currently positioned at the top of the stack.

The function of the Stack POP operation in C is as follows:

Step 1: [Check for stack underflow?]

If TOP = -1, then:

Write: ‘Stack underflow’ and return.

[End of If Structure]

Step 2: Set NUM = STACK[TOP] [Assign Top element to NUM]

Step 3: Write ‘Element popped from stack is: ‘, NUM.

Step 4: Set TOP = TOP - 1 [Decreases TOP by 1]

Step 5: Exit
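A matching C rendering of the POP algorithm, continuing the sketch above and reusing the same stack[] array and top variable; returning -1 on underflow is an assumption of this sketch:

/* POP: remove and return the element at the top of the stack. */
int pop(void)
{
    int num;
    if (top == -1) {              /* Step 1: check for stack underflow */
        printf("Stack underflow\n");
        return -1;                /* sentinel value, an assumption of this sketch */
    }
    num = stack[top];             /* Step 2: assign the top element to NUM */
    printf("Element popped from stack is: %d\n", num);   /* Step 3 */
    top = top - 1;                /* Step 4: decrease TOP by 1 */
    return num;
}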

3.2.3 Applications of Stacks

A stack is a kind of abstract data structure that is employed as a collection of several components and is available in the majority of computer languages. Operations can only be performed at one end of a stack. In this section, we will talk about the stack, its applications, and the benefits and drawbacks associated with using it.

A stack is a linear data structure that stores an ordered, linear sequence of components; it is an abstract data type. Stacks operate on the Last-In-First-Out (LIFO) principle, which stipulates that the element added to the stack most recently is removed first. Because we can only access the item on the top of the stack, implementing a stack requires keeping a reference to the top, that is, to the element that was placed last.

3.2.3.1 Stack Representation

LIFO stands for the "Last-In-First-Out" concept, which is adhered to by linear data
structures like stacks. This indicates that the item that was most recently added to the
stack will be the one that gets eliminated first. It is an ordered collection of pieces in
which the addition of new things and the removal of existing ones take place at the
same end, which is sometimes referred to as the "top." A stack may be seen as a vertical
structure with pieces placed on top of each other, much like a stack of plates. This
representation of a stack is called a stack diagram. Consider a stack to be a collection
of objects that are placed one on top of the other, similar to a stack of books in which
you may add to or take away volumes from the top.

3.2.3.2 Array Implementation

In an array-based implementation, the stack is an array together with a variable called "top" that holds the index of the element currently at the top of the stack. When you push an element, the "top" index is incremented and the newly pushed element is placed at that index, so the most recently pushed element is always found at the top of the stack.

3.2.3.3 Linked List Implementation

When a linked list is used as the foundation of the implementation, each member of the stack is a node in the linked list. When you push an element, a new node is generated and pointed at the node that is currently in the top position, and the top pointer is updated so that it refers to the newly generated node.

Operations performed on stacks: the two basic operations that may be carried out on a stack, which is essentially a container, are pushing and popping.

Push operation: a push adds an item at the very top of the stack. Imagine a pile of plates where each new plate is put on top of the one that came before it. Because a new item is always added to the top of the stack, before pushing we must check whether TOP = MAX – 1, which would mean the stack is already full.

If the stack is not full, we may proceed to add the element. If the condition TOP = MAX – 1 is satisfied, the stack is at its maximum capacity and no further items can be added; any attempt to push another element produces a notification that a stack overflow has occurred.

Pop operation: during the pop operation, the component located at the very top of the stack is removed and discarded. As a visual representation of this activity, picture yourself taking the top plate off a stack of plates. POP is used for "popping" an element off a stack. Before removing an element from the stack, we must first check whether the stack is empty, that is, whether TOP = NULL is true.

If this condition is met, the stack is empty and no deletion can take place; any attempt to delete an element produces a stack underflow warning.

Peek operation: the peek operation lets you view the top element without removing it from the structure. It is the same as looking at the plate on top of a stack without taking it off. Both the array and linked-list implementations support the peek operation by accessing the element at the "top" index or the data in the top node, respectively.

The peek operation is used when the value of the element at the top of the stack must be returned without removing that element; the name reflects the idea of "peek and preserve". The procedure starts by determining whether the stack is empty, that is, whether TOP = NULL is true. If the stack is not empty, the value of the top element is returned; otherwise, an appropriate notice is shown.
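A corresponding peek sketch in C, again reusing the stack[] array and top variable from the earlier push/pop sketches; the -1 sentinel for an empty stack is an assumption:

/* PEEK: return the value at the top of the stack without removing it. */
int peek(void)
{
    if (top == -1) {              /* the stack is empty, nothing can be returned */
        printf("Stack is empty\n");
        return -1;                /* sentinel value, an assumption of this sketch */
    }
    return stack[top];            /* the stack itself is left unchanged */
}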

Applications of Stack in Data Structure

1. Function Calls

The current state of the program is pushed onto the stack whenever a function is called in a computer program. When the called function completes, its state is popped off the stack and the execution of the function that called it is resumed.

2. Backtracking

Backtracking is one possible use for stacks, as is determining whether or not the
parentheses of an expression are compatible. The backtracking approach makes use of
stacks in order to organize and keep track of the many phases of the solution process.
After the current state has been pushed into the Stack, the algorithm will travel
backwards, at which point the previous state will be deleted from the Stack.

3. Undo/Redo Operations

Stacks are used by the undo and redo capabilities of many programs, which allow the application to remember the actions that were performed before. Each completed action is pushed onto the stack. To undo the most recent action, the item at the top of the stack is popped off and the action it records is reversed.

4. Web browser history

Your web browser uses a stack to preserve a record of the sites you visit. Every time you go to a new page, the URL of the page you were viewing is pushed onto the stack. Whenever you hit the back button, the most recent URL is popped off the stack and that page is displayed again.

5. Reverse the Data

If we want to reverse a particular piece of data, we need to reorganize it so that the first and last items are switched, the second and second-last elements are switched, and so on for all of the remaining components. By doing this, the information is displayed in the reverse order of how it was received. For example, if we reverse the string "codingNinja", the outcome will be "ajniNgnidoc". Pushing the characters onto a stack and then popping them off again produces exactly this reversal.

6. Parenthesis checking

A stack data structure is used to determine whether brackets are balanced. Each opening parenthesis is pushed onto the stack; when a closing parenthesis is encountered, the matching opening parenthesis is popped off. For the brackets to be considered balanced, the stack must be empty when the end of the expression is reached. (A small code sketch of this check appears at the end of this list.)

7. Expression Evaluation

A data structure known as a stack is used in the evaluation of expressions that are stated
using the infix, postfix, and prefix notations. The Stack may be used to store operators
and operands, and the components that are located at the top of the Stack can be used
to carry out the operations that are stored in the Stack.
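The following is a small C sketch of the parenthesis check mentioned in point 6 above. It handles only round brackets and uses a fixed-size character stack; both simplifications are assumptions made here for illustration:

#include <stdio.h>

/* Returns 1 if the round brackets in expr are balanced, 0 otherwise. */
int parens_balanced(const char *expr)
{
    char stk[256];                       /* fixed-size stack; an assumption of this sketch */
    int top = -1;
    for (int i = 0; expr[i] != '\0'; i++) {
        if (expr[i] == '(') {
            stk[++top] = '(';            /* push every opening bracket */
        } else if (expr[i] == ')') {
            if (top == -1) return 0;     /* closing bracket with nothing to match */
            top--;                       /* pop the matching opening bracket */
        }
    }
    return top == -1;                    /* balanced only if the stack ends up empty */
}

int main(void)
{
    printf("%d %d\n", parens_balanced("(a+(b*c))"), parens_balanced("(a+b))("));
    return 0;
}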

Advantages of Stack

1. Efficient use of the RAM

A stack, in contrast to many other data structures, makes effective use of memory: in the array implementation its elements occupy a single contiguous block, and no extra pointers need to be stored per element.

2. Applied to Compiler Design

Compilers are built using stack data structures at their core, and these structures are
used for a variety of functions, including parsing and analyzing the syntax of
programming languages.

3. Quick access period

When objects are added to and removed from the top of the stack, stack data structures
provide for quick access times for adding and deleting components since those
operations take place at the top of the stack.

4. Simple to implement

Stack data structures are simple to understand and create, and they can easily be implemented using arrays or linked lists. A stack element can itself hold several values, and stacks may be nested inside other types of data structures.

Disadvantages of Stack

1. Restricted Capacity

The storage capacity of stack data structures is restricted due to the fact that these
structures can only keep a certain number of components in their memory at any one
moment. Adding more items to the Stack after it has reached its storage limit might
cause it to overflow, which would result in the loss of data if it were to continue.

2. No arbitrary access

The elements of a stack cannot be accessed in arbitrary order; items can only be added to or removed from the very top of the stack. In order to access an element positioned in the middle of the stack, all of the components situated above it must first be removed.

3. Not appropriate for some uses

Stack data structures are not a good fit for applications, such as some searching or sorting algorithms, that need to access elements in the middle of the collection. Stacks work better for applications that do not need such access.

4. Limitations of the recursive function call

The stack data structure makes recursive function calls possible; however, if the recursion is too deep, the call stack can overflow and cause the program to crash.

5. Underflow and overflow in a stack

If too many components are added to the stack data structure, it may suffer from stack overflow. On the other hand, stack underflow occurs if an attempt is made to remove an element when the stack is already empty. Both of these outcomes are quite possible in practice.

3.3 QUEUE

The queue is a structure that is almost as simple as the stack. Like the stack, it stores items, but unlike the stack, it returns the items in the order in which they were added, which makes the queue a FIFO (first in, first out) storage system. Queues are a helpful organizing tool when there are recurring tasks that need to be completed. They also play an essential role as a framework in breadth-first search: the primary distinction between breadth-first search (BFS) and depth-first search (DFS) is that the former uses a queue while the latter uses a stack to store the node that is going to be explored next.

The queue should support at least the following operations:

• enqueue(obj): Insert obj at the end of the queue, making it the last item.
• dequeue(): Return the first object from the queue and remove it from the queue.
• queue_empty(): Test whether the queue is empty.

The main difference between a queue and a stack is that in a queue the modifications happen at both ends: inserts are made at one end and deletes at the other. This makes the queue somewhat more complicated than the stack. If we opt to construct the queue using an array, then the section of the array that is currently in use will move sequentially across the array.

If we had an endless array, then there would be no issue with this at all. The operations might be written as follows:
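A minimal sketch of this idea, assuming an array so large that it never runs out of space (the names used here are illustrative, not taken from the text above):

#define HUGE 1000000

int queue[HUGE];
int lower = 0, upper = 0;          /* items occupy positions lower .. upper-1 */

void enqueue(int x) { queue[upper++] = x; }      /* insert at the end */
int  dequeue(void)  { return queue[lower++]; }   /* return and remove the first item */
int  queue_empty(void) { return lower == upper; }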

A genuine implementation with a finite array will need to wrap this around in order to
function properly given that the index calculation is modulo the length of the array. The
configuration may look something like this:
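A sketch of the wrapped-around version, with the index calculation taken modulo the array length; QSIZE and the function names are assumptions of this sketch, and one slot is deliberately kept free so that a full queue can be distinguished from an empty one:

#define QSIZE 64

int queue[QSIZE];
int lower = 0, upper = 0;                 /* both indices wrap around modulo QSIZE */

int queue_empty(void) { return lower == upper; }
int queue_full(void)  { return (upper + 1) % QSIZE == lower; }  /* one slot kept free */

int enqueue(int x)
{
    if (queue_full()) return -1;          /* overflow: the fixed capacity is the weakness noted below */
    queue[upper] = x;
    upper = (upper + 1) % QSIZE;
    return 0;
}

int dequeue(int *x)
{
    if (queue_empty()) return -1;         /* underflow */
    *x = queue[lower];
    lower = (lower + 1) % QSIZE;
    return 0;
}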

Again, this has the fundamental problem of any array-based structure: it has a predetermined maximum capacity. This restriction will sooner or later lead to overflow problems, so the structure is not implemented correctly for arbitrary use. In addition, whether or not the array ever grows to its expected maximum size, space is always reserved for that maximum. The recommended choice is therefore a dynamically allocated structure, that is, a linked list. The most natural version is the following:
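A sketch of the linked-list version with a separate entry-point structure; the struct and function names are assumptions of this sketch, and malloc/free stand in for the get node and return node operations discussed below:

#include <stdlib.h>

typedef struct qnode { int item; struct qnode *next; } qnode_t;

typedef struct {                 /* entry-point structure, separate from the list nodes */
    qnode_t *remove_end;         /* front of the queue: nodes are removed here */
    qnode_t *insert_end;         /* back of the queue: nodes are inserted here */
} queue_t;

queue_t *create_queue(void)
{
    queue_t *qu = malloc(sizeof(queue_t));
    qu->remove_end = qu->insert_end = NULL;
    return qu;
}

int queue_empty(queue_t *qu) { return qu->remove_end == NULL; }

void enqueue(int x, queue_t *qu)
{
    qnode_t *node = malloc(sizeof(qnode_t));      /* stands in for get_node() */
    node->item = x;
    node->next = NULL;
    if (queue_empty(qu))
        qu->remove_end = qu->insert_end = node;   /* empty queue: both ends are the new node */
    else {
        qu->insert_end->next = node;              /* append behind the current last node */
        qu->insert_end = node;
    }
}

int dequeue(queue_t *qu)
{
    qnode_t *node = qu->remove_end;
    int x = node->item;
    qu->remove_end = node->next;
    if (qu->remove_end == NULL)
        qu->insert_end = NULL;                    /* queue became empty */
    free(node);                                   /* stands in for return_node() */
    return x;
}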

Once again, as with every dynamically allocated structure, we assume that the get node and return node operations are available; these are the only two primitive operations we need, and each of them always takes constant time. The pointers in the linked list are ordered so that they point from the front, where we delete items, to the end, where we add new ones, because that is the direction in which elements move through the queue. This obvious method has two characteristics that, from an aesthetic point of view, are not as elegant as they might be: we need an entry-point structure that is distinct from the list nodes, and we must always treat operations on an empty queue as a special case.

If an insertion is made into an empty queue, or a removal takes out the last remaining element, we have to adjust both the insertion pointer and the removal pointer; in all other circumstances we modify only one of them. The first of these drawbacks can be avoided by joining the two ends together to make a cyclic list. The pointer at the insertion end of the queue then points around to the beginning of the queue, so a separate removal pointer is no longer needed: the node after the insertion point is the removal point. As a consequence, the entry point to the queue needs only one pointer, and it is therefore of the same type as the nodes contained in the queue.

The second disadvantage can be overcome by including a placeholder node in the cyclic list, positioned between the insertion end and the removal end. The entry point continues to refer to the insertion end or, if the queue contains no items, to the placeholder node. An empty list is then no longer a special case, at least as far as insertion is concerned. A variation of the cyclic list can therefore be organized along these lines.

Alternatively, one might build the queue as a doubly linked list. This variant needs no special-case distinctions at all, but it requires two pointers for each node. The aesthetic goal of reducing the number of pointers has to be weighed against the work that has to be done in each step to keep the structure consistent, and against the memory that the structure needs. An example of a queue implemented as a doubly linked list follows:
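One possible sketch, assuming a circular doubly linked list with a sentinel node so that no special case is needed for the empty queue; all names are illustrative:

#include <stdlib.h>

typedef struct dnode { int item; struct dnode *next, *prev; } dnode_t;

/* The queue is a circular doubly linked list with one sentinel node.
   Items are inserted just before the sentinel and removed just after it. */
dnode_t *create_queue(void)
{
    dnode_t *sentinel = malloc(sizeof(dnode_t));
    sentinel->next = sentinel->prev = sentinel;
    return sentinel;
}

int queue_empty(dnode_t *qu) { return qu->next == qu; }

void enqueue(int x, dnode_t *qu)
{
    dnode_t *node = malloc(sizeof(dnode_t));
    node->item = x;
    node->next = qu;              /* insert at the rear, directly before the sentinel */
    node->prev = qu->prev;
    qu->prev->next = node;
    qu->prev = node;
}

int dequeue(dnode_t *qu)
{
    dnode_t *node = qu->next;     /* the front node, directly after the sentinel */
    int x = node->item;
    qu->next = node->next;
    node->next->prev = qu;
    free(node);
    return x;
}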

Despite the fact that all of these solutions present the same outward behaviour, choosing among the list-based versions comes down mostly to individual taste. All of the list-based solutions are slightly more complex than the stack.

The queue is a dynamic data structure that, like the stack, has the update operations enqueue and dequeue and the query operations queue empty and front element; all of these operations take constant time and are referred to collectively as queue operations. The operations to create a queue and to delete a queue are subject to the same constraints as their counterparts for the stack: creating an array-based queue requires getting a large block of memory from the underlying memory management system, whereas creating a list-based queue should require only a few get node operations; deleting an array-based queue simply involves returning that memory block to the system, whereas deleting a list-based queue requires returning every individual node still contained in it, so it takes O(n) time to delete a list-based queue that still contains n elements.

3.3.1 Operations on Queue

The following operations are performed on a queue data structure.

1. enQueue(value) - (To insert an element into the queue)
2. deQueue() - (To delete an element from the queue)
3. display() - (To display the elements of the queue)

Queue data structure can be implemented in two ways. They are as follows

1. Using Array
2. Using Linked List

If an array is used in the construction of a queue, then the queue is able to arrange a
limited number of items at one time. This is because an array is a sequential data
structure. When a queue is built with the assistance of a linked list, the resulting queue
has the capability of arranging an unlimited number of items in a sequential order.

3.3.1.1 Queue Data Structure Using Array

A queue can be implemented using a single-dimensional array. An array-based queue can hold only a limited number of data values. The construction of a queue data structure using an array is very simple: build a one-dimensional array with a size established in advance, and then insert or delete items from that array following the FIFO (First In First Out) principle with the help of the variables 'front' and 'rear'.

This accomplishes the task. At the outset, both 'front' and 'rear' are initialized to -1. When we want to add a new value to the queue, we first move 'rear' forward by one position and then store the new value at that position. When we want to remove a value from the queue, we first move 'front' forward by one position; the element at that position is the one taken out of the queue.

 Queue Operations using Array

The following is an illustration of one possible method for constructing the queue data
structure making use of arrays:

To begin, before we get started with the actual implementation of the operations, let's
establish an empty queue by following the steps below.

This should take no more than a minute. First, make sure that you have included all of
the header files that are necessary for the program. Next, define a constant that you
have dubbed "SIZE" and assign it a value that is specific to your requirements. In the
second stage, all of the user-defined functions that are going to be used in the queue
implementation need to be declared.

As the third stage in the procedure, you will need to create a one-dimensional array
with the previously determined SIZE (int queue [SIZE]). In the fourth step, you will
need to declare two integer variables that you will later refer to as "front" and "rear,"
and you will then need to set the value of each of those variables to "-1." (the value of
the front int is -1, while the value of the rear int is -1)

Step 5: Next, design the main method by providing a menu of operations list and
performing necessary function calls to carry out the activities that the user has set on
queue. This should be done in order to complete those actions.

 enQueue(value) - Inserting value into the queue

Adding a new item to a queue is the job of the enQueue() function, which operates within the context of a queue data structure. A new item is always inserted at the rear of the queue. The enQueue() function takes one integer value as input and puts that value into the queue; it accepts only one value at a time, and the value that was entered into the queue is also returned. We can use the steps outlined below to add a new item to the queue.

Step 1: Check whether the queue is currently full (rear == SIZE - 1).

Step 2: If the queue is full, display the message "Queue is Full!!! Insertion is not possible!!!" and leave the function.

Step 3: If it is not full, increment rear by one (rear++) and then store the new value at queue[rear].

 deQueue() - Deleting a value from the Queue

In the queue data structure, the deQueue() function is used to remove an element from the queue. An element is always removed from the front of the queue, that is, the element that has been waiting the longest. The deQueue() function takes no parameters. We can remove an element from the queue by following the steps given below.

Step 1: Check whether the queue is empty (front == rear).

Step 2: If the queue is empty, display the message "Queue is Empty!!! Deletion is not possible!!!" and leave the function.

Step 3: If the value of the front is NOT EMPTY, then the "front ++" operator should
be used to add one to the value of the front. After that, queue[front] will be shown as
an element that has been eliminated. The next step is to establish whether or not the
front and the back are the same thing (front == rear), and if this is the case, change the
value of both the front and the back to -1 (front = rear = -1). If the front and the back
are not the same thing, go to the next step.

 Display () - Displays the elements of a Queue

We can display the elements of a queue by following the steps detailed below.

Step 1: Check whether the queue is empty (front == rear).

Step 2: If there are no items in the queue, the function will be terminated and the
message "Queue is EMPTY!!!" will be shown on the screen.

Step 3: If the queue is not empty, declare an integer variable 'i' and initialize it to i = front + 1.

Step 4: Display the value of queue[i] and then increment 'i' by one (i++). Repeat this step as long as the value of 'i' is less than or equal to the value of rear.
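Pulling the steps above together, a compact C sketch of the array-based queue might look as follows; SIZE, the function names, and the message texts are illustrative choices:

#include <stdio.h>

#define SIZE 10

int queue[SIZE];
int front = -1, rear = -1;

void enQueue(int value)
{
    if (rear == SIZE - 1) {                  /* queue is full */
        printf("Queue is Full!!! Insertion is not possible!!!\n");
        return;
    }
    rear++;                                   /* move rear forward ... */
    queue[rear] = value;                      /* ... and store the new value there */
}

void deQueue(void)
{
    if (front == rear) {                      /* queue is empty */
        printf("Queue is Empty!!! Deletion is not possible!!!\n");
        return;
    }
    front++;                                  /* move front forward ... */
    printf("Deleted: %d\n", queue[front]);    /* ... the element left behind is discarded */
    if (front == rear)                        /* queue became empty: reset both markers */
        front = rear = -1;
}

void display(void)
{
    if (front == rear) {
        printf("Queue is EMPTY!!!\n");
        return;
    }
    for (int i = front + 1; i <= rear; i++)   /* elements live at positions front+1 .. rear */
        printf("%d ", queue[i]);
    printf("\n");
}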

3.3.1.2 Queue Using Linked List

The queue that is built using an array has one basic problem, which is that it can only
operate effectively for a limited number of data values at a time. This limitation places
severe limitations on how much data can be stored in the queue. This suggests that the
whole amount of data must be presented as soon as the procedure is started. It is
inappropriate to use a queue that is based on an array when we do not know the quantity
of the data that will be used by our system. Realizing the functionality of a queue data
structure may be accomplished via the usage of a linked list data structure.

The queue, which is implemented via the use of a linked list, has the capability of
operating for an unlimited number of value iterations without losing its effectiveness.
This demonstrates that a queue that is built on a linked list is acceptable for processing
data sets of varying sizes (it is not necessary to identify the size of the queue prior to
beginning construction).

Constructing the queue using a linked list gives us the capacity to organize an indefinite number of data items. When a queue is constructed using a linked list, 'rear' always points to the node that was most recently added to the queue, whereas 'front' always points to the node that was added first.
 Operations

Before we can go on to really implementing the operations, we need to get a few things
set up first so that we can utilize a linked list to form a queue. Once these things are
ready, we can move on to actually implementing the operations. You will need to begin
by include all of the necessary header files for the application in the very first stage of
the process. Also, declare all of the user-defined functions that are going to be used.

The second step is to define a structure that will be called a "Node," and it will have
two elements named data and next. After creating two Node pointers and naming them
"front" and "rear," go to Step 3 and assign the value NULL to each of them. Implement
the main method by displaying a menu that has a selection of operations, and make
relevant function calls inside the main method in order to carry out the operation that
the user has selected to do.
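A minimal C sketch of the structure just described, with a Node holding data and next, and global front and rear pointers initialized to NULL; everything beyond the names mentioned in the text is an assumption of this sketch:

#include <stdio.h>
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

struct Node *front = NULL;      /* oldest node: deletions happen here */
struct Node *rear  = NULL;      /* newest node: insertions happen here */

void enQueue(int value)
{
    struct Node *newNode = malloc(sizeof(struct Node));
    newNode->data = value;
    newNode->next = NULL;
    if (front == NULL)           /* first element: both pointers refer to it */
        front = rear = newNode;
    else {
        rear->next = newNode;    /* link the new node behind the current rear */
        rear = newNode;
    }
}

void deQueue(void)
{
    if (front == NULL) {
        printf("Queue is Empty!!!\n");
        return;
    }
    struct Node *temp = front;
    printf("Deleted: %d\n", temp->data);
    front = front->next;         /* the next node becomes the new front */
    if (front == NULL)
        rear = NULL;             /* queue became empty */
    free(temp);
}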

3.3.2 Circular Queue

We have discussed how, in linear queues, insertions may only take place at one end,
which is referred to as the REAR, and deletions are always conducted from the other
end, which is referred to as the FRONT. This topic was covered in more detail earlier
in this lesson. Take a look at the line that is shown in the figure numbered 3.9.

In this instance, the value of FRONT is 0, while the value of REAR is 9.

Figure 3.9 Linear queue

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

You will be unable to add another value to the queue at this point, since the queue has already reached its maximum capacity: there is no empty location in which the value could be inserted. Now consider a situation in which two deletions are carried out one after the other. The queue will then look as shown in Figure 3.10.
In this case, the front is equal to 2, while the back is equal to 9.

Figure 3.10 Queue after two successive deletions

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Imagine for a moment that we need to add another item to the queue shown in Figure 3.10. Even though there is vacant space at the front, the overflow condition still holds, because the condition rear = MAX – 1 is still true. This is one of the most important drawbacks of a linear queue: the space freed by deletions at the front cannot be reused. There are two possible courses of action to resolve this issue. The first is to shift the elements towards the left of the array so that the vacant space is moved to the rear and can be filled. However, this can be a very time-consuming operation, especially when there is a large number of items in the queue.

The use of a circular queue constitutes the second potential solution. In a circular queue, the index that comes immediately after the last index is the first index; in other words, the positions wrap around. The circular queue is considered full only when front = 0 and rear = MAX – 1, or when front = rear + 1. It should be noted that the implementation of a circular queue is otherwise identical to that of a linear queue; the only difference lies in the code that handles the insertion and deletion operations. For insertion, we now need to check for the following three conditions. When front = 0 and rear = MAX – 1, the circular queue has reached its maximum capacity; Figure 3.11 illustrates this situation.

Figure 3.11 Full queue


Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

If the current value of rear is less than MAX – 1, then rear is simply incremented and the new value is inserted at that position, as shown in Figure 3.12.

Figure 3.12 Queue with vacant locations

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

If rear = MAX – 1 but front is not equal to 0, there is still vacant space at the beginning of the array. In that case rear is set to 0 and the new element is placed in the space that has become available, as shown in Figure 3.13.

Figure 3.13 Inserting an element in a circular queue

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Figure 3.14 shows the steps that need to be taken in order to add an item to a circular queue. The first step is to check for an overflow condition. In the second step we test, first, whether the queue is empty and, if it is not, whether rear has already reached the last index while vacant positions remain before front, in which case rear wraps around to the first index. The third step consists of placing the item in the queue at the position denoted by REAR.

Figure 3.14 Algorithm to insert an element in a circular queue

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Now that we have seen how a new item is added to a circular queue, let us discuss how deletions are handled. Here, too, we check for three conditions before an element is removed.
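A C sketch consistent with the insertion and deletion conditions described above; MAX, the function names, and the use of -1 to mark an empty queue are illustrative assumptions:

#include <stdio.h>

#define MAX 10

int cqueue[MAX];
int front = -1, rear = -1;

void insert_circular(int value)
{
    if ((front == 0 && rear == MAX - 1) || front == rear + 1) {
        printf("Overflow\n");                 /* circular queue is full */
        return;
    }
    if (front == -1)                          /* queue is empty */
        front = rear = 0;
    else if (rear == MAX - 1)                 /* rear at the last index, vacant slots before front */
        rear = 0;                             /* wrap around to the first index */
    else
        rear = rear + 1;
    cqueue[rear] = value;                     /* store the item at the position given by REAR */
}

int delete_circular(void)
{
    if (front == -1) {                        /* queue is empty */
        printf("Underflow\n");
        return -1;                            /* sentinel value, an assumption of this sketch */
    }
    int value = cqueue[front];
    if (front == rear)                        /* only one element was left */
        front = rear = -1;
    else if (front == MAX - 1)                /* front at the last index */
        front = 0;                            /* wrap around */
    else
        front = front + 1;
    return value;
}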

3.3.3 Priority Queue

A data structure known as a priority queue is one that assigns a priority to each
individual member of the queue. Priorities may range from lowest to highest. Once the
order has been defined, the priority of the element will be applied to the process of
deciding the order in which the components will be dealt with. When processing the
components of a priority queue, the following is a list of some basic criteria that should
be adhered to:

• An element with higher priority is processed before an element with a lower priority.
• Two elements with the same priority are processed on a first-come-first-served (FCFS) basis.

You may think of a priority queue as a modified queue in which, if an element has to
be taken from the queue, the one with the greatest priority is fetched first. This allows
you to remove elements from the queue more efficiently. The importance of the
component may be adjusted according to a number of different criteria. Operating
systems often make use of priority queues in order to ensure that the process with the
greatest priority is completed first. It is possible to adjust the process's priority
according to the amount of CPU time that it needs in order to be entirely run.

For instance, if there are three processes, and the first process requires 5 ns to finish,
the second process requires 4 ns, and the third process requires 7 ns, then the second
process will have the greatest priority, and as a result, it will be the first process to be
carried out. However, CPU time is not the sole component that determines the priority;
rather, it is only one of numerous criteria that are taken into consideration. The
significance of one procedure in comparison to others is still another consideration. In
the event that we are required to run two processes simultaneously, one of which is
concerned with online order booking and the other with printing of stock data, it is
evident that online order booking is more significant and must be carried out before the
printing of stock details.

3.3.3.1 Implementation of a Priority Queue

There are two different approaches to implementing a priority queue. We can either keep the items in a list sorted by priority, so that when an element has to be removed the list does not need to be searched for the one with the highest priority, or we can store the elements in an unsorted list, so that insertions are always done at the end of the list.

In either case, the element that is removed is the one with the highest priority; the two approaches differ only in where the searching work is done.
When there is a need to remove an item from the list, the item with the greatest priority
will be looked for, and then it will be deleted. In a sorted list, the amount of time
required to insert a new member is O (n), yet the amount of time required to remove an
element is merely O (1).

On the other hand, inserting a new element into an unsorted list takes O(1) time, while removing an element takes O(n) time. In practice, neither extreme is ideal, and a heap-based implementation is usually used instead, which supports both insertion and removal in O(log n) time or better.

3.3.3.2 Linked Representation of a Priority Queue

A priority queue may be saved in the memory of a computer either as an array of items
or as a linked list of objects. Both of these storage formats can contain the same
information. The information or data component, the priority number of the element,
and the address of the next element in the queue will all be included in every node of a
priority queue that is implemented using a linked list. When we use a sorted linked list, the element with the higher priority appears before the element with the lower priority.

The priority level is considered to be higher when the priority number is lower. For
example, if there are two components A and B, and component A has a priority number
of 1, while component B has a priority number of 5, then component A will be
processed before component B since component A has a higher priority than
component B.
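A minimal sketch of this linked representation is given below; the node layout and the sortedInsert helper are illustrative assumptions (a smaller priority number means a higher priority, as described above), not code reproduced from the text.

    struct PQNode {
        int data;          // the information part
        int priority;      // a smaller number means a higher priority
        PQNode* next;      // address of the next node in the queue
    };

    // Insert a new node so that the list stays ordered by ascending priority number.
    PQNode* sortedInsert(PQNode* start, int data, int priority) {
        PQNode* node = new PQNode{data, priority, nullptr};
        if (start == nullptr || priority < start->priority) {
            node->next = start;            // the new node becomes the first node
            return node;
        }
        PQNode* ptr = start;
        while (ptr->next != nullptr && ptr->next->priority <= priority)
            ptr = ptr->next;               // find the last node with an equal or smaller priority number
        node->next = ptr->next;
        ptr->next = node;
        return start;
    }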

3.3.4 Array representation of Priority Queue

In the case of queues, linear arrays provide a representational tool that is both helpful
and handy. Each queue has a front variable and a rear variable, as was described before.
These variables correspond to the location inside the queue from which deletions and
insertions may be performed, respectively. These variables may be found at the very
first and very last positions in the queue. A queue is shown in the form of an array in
Figure 3.15, which gives a visual picture of a queue.

• Operations on Queues

In Fig. 3.15, the value in the FRONT position is 0, and the value in the REAR
position is 5. Let's imagine that we want to add one more component with the value
of 45. In such scenario, the value of REAR would have to be increased by one, and it
would be stored in the location that was stated to be used by REAR.

Figure 3.15 Queue

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Figure 3.16 depicts what the queue might seem to look like once the addition has
been made. In this instance, the value of FRONT is 0, while the value of REAR is 6.
When there is a new component that has to be introduced, we will continue in the
same method that we have in the past.

Figure 3.16 Queue after insertion of a new element

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

In the event that we decide to remove an item from the queue, the value of FRONT
will shift forward by one place as a direct result of this operation. When removing
items from the queue, one must start from this particular end of the line in order to be
successful. Figure 3.17 depicts the structure that the queue will have when the deletion
has been completed. In this scenario, the value of FRONT is 1, while the value of
REAR is 6.

Figure 3.17 Queue after deletion of an element

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Before we are allowed to add a new item to a queue, we first need to determine whether an overflow condition exists. An overflow occurs if we try to add another item to a queue that is already full, in which case the new item cannot be accommodated. The queue is full when REAR equals MAX - 1; because the index starts at 0, the last valid position is MAX - 1 rather than MAX. Similarly, before we are allowed to delete an entry from a queue, we are first obligated to determine whether or not the queue is experiencing an underflow condition.

An underflow condition arises whenever an attempt is made to remove an item from a queue that does not currently contain any items; the queue is empty when the value of FRONT is -1 and the value of REAR is also -1. Let us now have a look at Figs. 3.18 and 3.19, which depict the processes used to add an element to a queue and to remove an element from a queue, respectively.

Figure 3.18 Algorithm to insert an element in a queue

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Figure 3.19 Algorithm to delete an element from a queue

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Figure 3.18 depicts the process that has to be carried out in order to add an item to a queue. The very first thing that we do, in Step 1, is check whether there is an overflow. In the second step, we look at the queue to see whether it already holds any items: if the queue is empty, the FRONT and REAR variables are both set to 0 and the new value is placed in the 0th position; otherwise, the REAR variable is incremented so that it points to the next available position in the array. The third step is inserting the value into the queue at the position indicated by the REAR variable.

Figure 3.19 illustrates the procedure that must be followed in order to remove an item from a queue. The very first thing we do is check whether there is an underflow: an underflow takes place when FRONT equals -1 or when FRONT is greater than REAR. Otherwise, if the queue already has some entries in it, FRONT is incremented so that it refers to the next item in the queue.
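The two procedures of Figures 3.18 and 3.19 translate into the following C++ sketch; the array size MAX and the use of -1 to signal an empty queue are assumptions based on the description above.

    #include <iostream>

    const int MAX = 10;
    int queue_arr[MAX];
    int FRONT = -1, REAR = -1;

    void insertQueue(int value) {
        if (REAR == MAX - 1) {                 // Step 1: overflow check
            std::cout << "Overflow\n";
            return;
        }
        if (FRONT == -1)                       // Step 2: queue is empty
            FRONT = REAR = 0;
        else
            REAR = REAR + 1;                   // point REAR to the next free position
        queue_arr[REAR] = value;               // Step 3: store the value at REAR
    }

    int deleteQueue() {
        if (FRONT == -1 || FRONT > REAR) {     // underflow check
            std::cout << "Underflow\n";
            return -1;
        }
        return queue_arr[FRONT++];             // return the front item and advance FRONT
    }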

3.4 LINKED LIST

As we have seen, an array is a linear collection of data items in which the elements are stored at consecutive memory locations. When declaring an array we must specify its size, which fixes the maximum number of items the array can hold. If we declare an array as int marks[10], for example, it can store at most 10 data items and nothing more.

But what if the total number of elements is not known in advance? Moreover, to make the most efficient use of memory, it should be possible to store the elements at arbitrary locations rather than only at consecutive ones. Therefore, in order to design programs that are efficient, we need a data structure that removes the restrictions on the maximum number of elements and on where they must be stored.

A linked list is an example of a data structure that does not suffer from the limitations described above. Because linked lists do not store their elements at consecutive memory locations, users can add any number of items to the list. Unlike an array, however, a linked list does not allow its data to be accessed at random: the elements of a linked list can only be accessed sequentially, in the order in which they are linked. On the other hand, insertions and deletions may be carried out at any point in the list in a fixed (constant) amount of time.

Described in the simplest terms, a linked list is a linear collection of data elements, and these individual elements are called nodes. The linked list is itself an example of a data structure that may be used in the construction of other data structures: it can serve as a building block for structures such as stacks and queues, as well as for variations of these structures. Each node in a linked list consists of one or more data fields and a pointer to the next node in the sequence, so the list can be pictured as a train or chain of nodes.

3.4.1 Singly Linked List

The singly linked list is the most basic kind of linked list. In this form of linked list, a node stores some data together with a reference to the next node of the same type; that is, each node holds the address of the node that follows it in the chain, which is what we mean when we say that a node has a pointer to the next node. In a singly linked list, the data can be traversed in only one direction. Figure 3.20 shows a singly linked list.

Figure 3.20 Singly linked list

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

3.4.1.1 Traversing a Linked List

"Traversing" a linked list means gaining access to the nodes of the list in order to carry
out processing operations on those nodes. It is essential to bear in mind that a linked list always has a pointer variable named START, which stores the address of the first node of the list, and that the NEXT field of the last node is set to NULL (or -1) to mark the end of the list. To traverse the list we use another pointer variable, PTR, which holds the location of the node that is currently being visited. The method used to traverse a linked list is shown in Figure 3.21.

Figure 3.21 Algorithm for traversing a linked list

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

As the first step of this method, we initialize the variable PTR with the address stored in START; as a result, PTR points to the first node of the linked list. Step 2 then executes a while loop that continues until PTR has processed all of the nodes, that is, until it encounters the value NULL. In Step 3, we apply the required operation (such as print) to the current node, which is the node referred to by PTR. In Step 4, we set PTR to the node whose address is stored in the NEXT field, which moves us on to the next node in the chain.
Now that we know how to visit the nodes of a linked list, let us develop an algorithm to count them. To do this, we travel through the list nodes one at a time, incrementing a counter each time we move to the next node. Once we reach NULL, which means that every node in the list has been traversed, the final value of the counter is displayed. Figure 3.22 shows the procedure used to print the total number of nodes contained in a linked list.

Figure 3.22 Algorithm to print the number of nodes in a linked list

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)
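A compact C++ sketch of both procedures is shown below; the node structure and the function names are illustrative assumptions rather than code taken from the figures.

    #include <iostream>

    struct Node {
        int data;
        Node* next;     // NULL in the last node marks the end of the list
    };

    void traverse(Node* START) {
        Node* PTR = START;                    // Step 1: PTR points to the first node
        while (PTR != nullptr) {              // Step 2: repeat until NULL is reached
            std::cout << PTR->data << ' ';    // Step 3: process the current node
            PTR = PTR->next;                  // Step 4: move on to the next node
        }
    }

    int countNodes(Node* START) {
        int count = 0;
        for (Node* PTR = START; PTR != nullptr; PTR = PTR->next)
            ++count;                          // one more node visited
        return count;
    }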

3.4.1.2 Searching for a Value in a Linked List

To find a particular item contained in a linked list, one has to search the list. As noted earlier, a linked list is made up of nodes, and each node has two parts: the information part and the NEXT part. Searching therefore means determining whether a given value is present in the information part of some node. If the algorithm finds such a node, it returns the address of the node in which the value is stored.
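As a sketch, reusing the hypothetical Node layout from the previous example, a search routine might look like this:

    // Returns the address of the first node whose information part equals val,
    // or nullptr if the value is not present in the list.
    Node* search(Node* START, int val) {
        for (Node* PTR = START; PTR != nullptr; PTR = PTR->next) {
            if (PTR->data == val)
                return PTR;
        }
        return nullptr;
    }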

3.4.2 Circular Linked Lists

In a circular linked list, the very last node contains a reference back to the very first node of the list. A circular list may be either singly linked or doubly linked. When traversing a circular linked list, we are free to begin at any given node and move through the list in either direction, forward or backward, until we reach the node from which we started. As a direct consequence, a circular linked list has neither a beginning nor an end. Figure 3.23 shows a circular linked list.

Figure 3.23 Circular linked list

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

The primary drawback of a circular linked list is that iteration can become somewhat challenging, because the NEXT field of no node in the entire list ever holds a NULL value, so there is no natural end-of-list marker.

Circular linked lists are often used in operating systems, and one popular application
for these lists is task maintenance. An illustration of one possible application for a
circular linked list is going to be provided for our attention right now. When we are
surfing the web, we have the option to travel backwards and forwards via the pages that
we have previously been on by using the Back button and the Forward button,
respectively, when we want to browse through the sites that we have been on in the
past. How does one go about doing that? The answer is not difficult to figure out.

A circular linked list can be used to record the exact sequence in which a user explores various websites. By navigating this circular linked list in either direction (forward or backward), the browser's Back and Forward buttons let you retrace your earlier exploration. In practice, this may be implemented using either a circular stack or a circular queue. Take a look at the data shown in Fig. 3.24.

Figure 3.24 Memory representation of a circular linked list

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

We move down the list until we reach a node whose NEXT field holds the address of the node at the top of the list. This signifies that the linked list has come to an end: the node that contains the address of the START node acts as the list's last node. Traversing DATA and NEXT in the sequence described above gives the following result.
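Since no NEXT field of a circular list is NULL, a traversal stops when it returns to the node it started from. A minimal C++ sketch, with an assumed node layout, is:

    #include <iostream>

    struct CNode { int data; CNode* next; };

    void traverseCircular(CNode* START) {
        if (START == nullptr) return;       // empty list
        CNode* PTR = START;
        do {
            std::cout << PTR->data << ' ';  // process the current node
            PTR = PTR->next;
        } while (PTR != START);             // stop once we are back at the start node
    }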

CHAPTER 4

NONLINEAR DATA STRUCTURE

A graph is a non-linear data structure that gives a pictorial representation of a collection of objects by connecting a finite number of vertices (also called nodes) with edges. Vertices are the individual points that the edges of a graph link together, and they are usually drawn as circles; such diagrams of network connections are also referred to as graphs. The edges and nodes make it possible to connect any two nodes in the network, possibly by passing through other nodes as intermediaries. A set of connected nodes can be represented by either a directed or an undirected graph.

In a directed graph, the nodes are connected by edges that point in only one of the two possible directions. In an undirected graph, each edge connects its endpoints in both directions, so every node can both send and receive information along its edges; such nodes are often described as bidirectional.

Trees are a kind of non-linear data structure that represents hierarchical data in a tree-like pattern; they can also be thought of as a type of data model. One node is designated as the root node, while the remaining components of the structure, each holding a value, form its subtrees. The root node is the topmost and most significant component of the structure.

Viewed from another perspective, a subtree is itself a tree that forms part of a larger tree. The node at the topmost level of the tree is the root; every other node has exactly one parent node and is called a child of that parent, and nodes that share the same parent are called sibling nodes. Every node in the tree maintains this parent-child relationship: a node may have any number of child nodes, but it can have only one parent. There are many different types of trees, such as the binary tree, the binary search tree, the expression tree, the AVL tree, and the B tree, among others.

4.1 TREE-DEFINITIONS AND CONCEPTS

4.1.1 Tree Definitions and Properties

A tree is the name given to the abstract data type that organizes its elements into a hierarchical structure. Every element in a tree, with the exception of the one at the very top, has a parent element and zero or more elements that are its children.

Figure 4.1: A tree with 17 nodes representing the organizational structure of a


fictitious corporation.

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

The topmost element in a tree is the only one that does not have a parent element. When attempting to conceptualize a tree, it is typical practice to enclose its component pieces inside ovals or rectangles and to use lines to symbolize the ties between parents and children (for further details, please refer to Figure 4.1). The component at the very top of the tree is customarily called the "root", despite the fact that, when drawn, it appears as the highest element with the subsequent components connected below it (the exact opposite of how a biological tree is structured).

4.1.1.1 Formal Tree Definition

For the purposes of this discussion, we define a tree T as a set of nodes storing elements in a parent-child relationship with the following properties:

1. If T is nonempty, it has a special node, called the root of T, that has no parent.
2. Every node v of T other than the root has a unique parent node w; every node whose parent is w is a child of w.

Note that, according to this definition, a tree may be empty, that is, it need not contain any nodes at all, and this is something that must be kept in mind. This convention also allows us to define a tree recursively: a tree T is either empty or consists of a node r, called the root of T, together with a (possibly empty) collection of trees whose roots are the children of r. Two nodes that have the same parent are siblings. A node v with no children is an external node, while a node with one or more children is an internal node. External nodes are also commonly called leaves.

Illustration number 4.1: The vast majority of operating systems organize files into a hierarchy of nested directories (also known as folders), and present this directory structure to the user through a graphical user interface modelled as a tree (for further details, please refer to Figure 4.2). To be more precise, the nodes that are internal to the tree correspond to directories, while the external nodes correspond to ordinary files. In both the UNIX and Linux file systems, the directory at the very top of the directory hierarchy is referred to as the "root directory" and is denoted by the slash character "/".

Figure 4.2: Tree representing a portion of a file system

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

4.1.1.2 Edges and Paths in Trees

An edge of a tree T is a pair of nodes (u,v) in which either u or v is the parent of the other node in the pair. A path in T is a sequence of nodes constructed in such a manner that any two consecutive nodes in the sequence form an edge. Figure 4.2, for example, shows a tree that includes the path (cs252/, projects/, demos/, market).

Illustration number 4.2: When single inheritance is used in a C++ program, the inheritance relation between classes takes the form of a tree, with the base class representing the root of the tree.

4.1.1.3 Ordered Trees

A tree is said to be ordered if there is a predetermined linear order among the children of each node; that is, we are able to determine which of the children of a node is the first, the second, the third, and so on. When depicting such an ordering, it is common practice to draw the tree with the siblings arranged from left to right, mirroring their linear order. This ordering is determined by how the tree is going to be used, and it is shown for clarity. Ordered trees typically indicate the linear order among the children of the same parent by presenting them in a sequence or iterator in the correct order.

Illustration number 4.3: A structured document, such as a book, is organized hierarchically as a tree, with the chapters, sections, and subsections acting as the internal nodes and the paragraphs, tables, figures, and bibliography serving as the external nodes (for further details, please refer to Figure 4.3). The book itself corresponds to the root of the tree. In fact, we may conceive of expanding the tree even further so that it depicts paragraphs as being formed of sentences, sentences as being composed of words, and words as being built of characters; this would let us see the structure of the document more fully. In any case, this kind of tree is an example of an ordered tree, since there is a clearly defined ordering among the children of each node.

Figure 4.3: An ordered tree associated with a book

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

4.1.1.4 Tree Functions

The elements of the tree ADT are stored at its nodes. We do not give users direct access to the underlying nodes, since they are internal components of the implementation; instead, the nodes of the tree are accessed publicly through the position objects associated with them. Consequently, when we describe the public interfaces of functions in our ADT, we use the notation p (rather than v) to make it clear that the argument passed to the function is a position and not a node. Because the two concepts are so closely related, however, we often blur the distinction and use the terms "position" and "node" interchangeably when discussing trees.

We take advantage of the fact that C++ allows the dereferencing operator ("*") to be overloaded in order to access the element associated with a given position. Whenever a position variable p is available, *p accesses the associated element; this may be used either to read the value of the element or to change it. It is also very useful to be able to maintain collections of positions. For example, the user may be given a list of the children of a certain node in a tree. A list in which each element represents a position of a tree is called a position list.

The real power of a position in a tree comes from its ability to reach the neighbouring parts of the tree. Given a position p of a tree T, the tree itself provides the operations described above. The first two functions, size and empty, are just the standard functions defined for the other kinds of containers we have seen so far: they report the number of elements in the container and whether it is empty. The function root returns the position of the tree's root, and the function positions returns a list containing the positions of all the tree's nodes.

No particular update functions are specified for a tree within the scope of this section. Instead, we discuss several approaches to updating trees in relation to specific applications in the chapters that follow. In fact, in addition to the methods of tree update discussed in this book, a great many other varieties of tree update operations can be conceived of.
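A rough sketch of what such a position-based tree interface might look like is given below. The class and member names are illustrative assumptions; the actual interface is not reproduced in this text.

    #include <list>

    template <typename E>
    class Tree {                                   // informal, position-based tree interface
    public:
        class Position {                           // a position is associated with one node
        public:
            E& operator*();                        // access the element stored at this position
            Position parent() const;               // position of the parent node
            std::list<Position> children() const;  // positions of the children
            bool isRoot() const;
            bool isExternal() const;               // true if the node has no children
        };
        int size() const;                          // number of nodes in the tree
        bool empty() const;                        // is the tree empty?
        Position root() const;                     // position of the root
        std::list<Position> positions() const;     // list of the positions of all nodes
    };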

4.1.2 Representation of Binary Tree

Let's go through some more specific information on complete binary trees and the
method that these trees are represented in the computer.

• The Complete Binary Tree ADT

A complete binary tree can be regarded as an abstract data type. In addition to the functions offered by the binary tree ADT, T supports the following two functions:

add(e): Add to T, and return, a new external node v that stores element e, in such a manner that the resulting tree is a complete binary tree whose last node is v.

remove(): Remove the last node of T and return the element that it previously contained.

The fact that these are the only update operations performed ensures that the tree remains a complete binary tree.

• If the bottom level of T is not full, then add inserts a new node on the bottom
level of T, immediately after the rightmost node of this level (that is, the last
node); hence, T’s height remains the same.
• If the bottom level is full, then add inserts a new node as the left child of the
leftmost node of the bottom level of T; hence, T’s height increases by one.

As shown in Figure 4.4, the add operation essentially has two distinct cases, and the same may be asserted of the remove operation.

Figure 4.4: Examples of operations add and remove on a complete binary tree, where
w denotes the node inserted by add or deleted by remove

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

4.1.2.1 A C++ Implementation of a Complete Binary Tree

In Code Fragment 4.5, we present the complete binary tree abstract data type (ADT) in the form of an informal interface for the complete tree. As with all our other informal interfaces, this is not a complete C++ class; it covers only the public part of the class. The interface defines a nested class called Position, which represents a node of the tree. We provide the fundamental functions for moving around the tree, including access to the root position and to the last position. In addition to the add and remove operations, there is a function swap that exchanges the contents of any two given nodes.
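Code Fragment 4.5 itself is not reproduced here. As a hypothetical sketch, one common way to realize this ADT is to lay the complete binary tree out over a vector using level numbering, so that add and remove reduce to push_back and pop_back:

    #include <vector>

    template <typename E>
    class VectorCompleteTree {
        std::vector<E> V;                          // V[1..n] holds the nodes in level order; V[0] is unused
    public:
        VectorCompleteTree() : V(1) {}
        int size() const { return (int)V.size() - 1; }
        bool empty() const { return size() == 0; }
        void add(const E& e) { V.push_back(e); }   // becomes the new last node of the bottom level
        void remove() { V.pop_back(); }            // deletes the last node
        E& last() { return V.back(); }             // element stored at the last node
        E& root() { return V[1]; }                 // element stored at the root
        // With level numbering, node i has its parent at i/2 and its children at 2i and 2i+1.
    };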

4.1.3 Binary tree traversal (In order, Preorder a Post order)

Insertion Sort is a sorting technique that is both straightforward and very efficient, and it has been an essential part of computer science and data processing for decades. It builds a sorted sequence systematically by repeatedly inserting items from an unsorted list into their correct positions within a growing sorted sublist. Its appeal lies in its simplicity: the method is easy to understand, and it is efficient enough to be suitable for datasets ranging from small to reasonably large. Beyond its use as a sorting method, Insertion Sort also serves as an effective teaching tool for fundamental sorting concepts, helping students and hobbyists alike build a deeper understanding of algorithms and data structures.

The process of moving through a tree while stopping at each of its nodes is known as "tree traversal". Traversal means visiting the tree node by node and writing out the value of each node as one moves through the tree. Because the nodes are reachable only by following the edges from the root, we always start at the root (head) node; in other words, we cannot visit a node of the tree at random. When traversing a linear data structure such as a linked list, queue, or stack, there is only one route that can be taken. A tree, on the other hand, may be traversed via any one of a number of different routes, some of which are listed below.

• Pre order traversal
• In order traversal
• Post order traversal

Generally, we traverse a tree to search for or locate a given item or key in the tree, or to print all the values it contains.

4.1.3.1 In order Traversal

In this form of traversal, the left subtree is visited first, then the root node, and finally the right subtree. It is important to remember that every node may itself represent a subtree. When a binary search tree is traversed in order, the output consists of the key values sorted in ascending order. The nodes are visited following the pattern Left -> Root -> Right: we first move as far left as possible, continuing until there are no nodes with smaller values left; after returning to the parent node, we traverse the nodes on the right in the same way. This pattern is repeated for every node until the whole tree has been traversed. Using in-order traversal on the example tree, we first move to A's left child node B, and B is handled in the same way; this continues until every node has been explored. The result of a complete in-order traversal of this tree is:

D→B→E→A→F→C→G

Algorithm

1. Starting at the left subtree, the In order function is called on it.

2. The root node is then visited.

3. Lastly, the In order function is called on the right subtree.
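These three steps translate directly into a short recursive routine; the node structure used below is an assumption for illustration.

    #include <iostream>

    struct TreeNode { int data; TreeNode* left; TreeNode* right; };

    void inorder(TreeNode* root) {
        if (root == nullptr) return;
        inorder(root->left);                 // Step 1: traverse the left subtree
        std::cout << root->data << ' ';      // Step 2: visit the root node
        inorder(root->right);                // Step 3: traverse the right subtree
    }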

4.1.3.2 Pre order Traversal

A binary tree is traversed in preorder by visiting each node according to the pattern Root -> Left -> Right. The procedure begins with a visit to the root node, for the purpose of printing or processing it, and then continues into the left subtree. After the traversal of the left subtree is finished, the algorithm moves on to the next step, which is the traversal of the right subtree. This procedure is repeated for every subtree until all of the branches have been investigated. Because preorder traversal moves from the root node to the left subtree and then to the right subtree, this layout makes it easy to work quickly through the whole tree, from the trunk all the way out to the leaves.

In this style of traversal, the root node serves as the first point of departure, followed by the left subtree, and finally the right.

Algorithm for Preorder Traversal

1. Visit the root node.
2. Traverse the left subtree by recursively calling the preorder function on the left child.
3. Traverse the right subtree by recursively calling the preorder function on the right child.

Dry Run of Pre order Traversal

Examining the problem of the pre order traversal in a methodical manner in order to
find a solution to it.

4.1.3.3 Post order Traversal

When doing a post order traversal of a binary tree, the nodes of the tree are visited in the order Left -> Right -> Root. The technique starts by traversing the left subtree, then carries on to traverse the right subtree, and the algorithm reaches the root node only after it has completed its tour through the right subtree in its entirety. This process is carried out for each subtree until all of the nodes in the tree have been visited.

Post order traversal thus starts with the left subtree, continues on to the right subtree, and finally arrives at the root node as its last destination. Starting with the tree's branches and working up to the trunk, this pattern makes it easy to cover the whole of the tree.

Algorithm for Post order Traversal

1. Traverse the left subtree by recursively calling the post order function on the
left child.
2. Traverse the right subtree by recursively calling the post order function on the
right child.
3. Visit the root node.

Dry Run of Post order Traversal

Beginning the traverse from the root node and working our way to the left side of the
tree.

From node 2 we move on to its left child, the node with value 0. Because 0 is a leaf node, the Left-Right-Root rule tells us to add 0 to the output when it is visited or printed. Moving toward the right subtree of 2, we reach node 5.

Having gone through the method and a dry run for each of the three traversals, it is time to look at code that implements in-order, preorder, and post order traversal, both iteratively and recursively, using the binary tree built from the values 9, 2, 12, 0, 5, 11, and 16.
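The listing referred to above is not reproduced in this text. As a stand-in, here is a minimal iterative in-order traversal written in C++ using an explicit stack; iterative preorder and post order traversals can be written along the same lines.

    #include <iostream>
    #include <stack>

    struct TreeNode { int data; TreeNode* left; TreeNode* right; };

    void inorderIterative(TreeNode* root) {
        std::stack<TreeNode*> st;
        TreeNode* cur = root;
        while (cur != nullptr || !st.empty()) {
            while (cur != nullptr) {         // keep going left, saving the path on the stack
                st.push(cur);
                cur = cur->left;
            }
            cur = st.top(); st.pop();        // the leftmost unvisited node
            std::cout << cur->data << ' ';   // visit it
            cur = cur->right;                // then traverse its right subtree
        }
    }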

4.2 THREADED BINARY TREE

In a binary tree, the leaf nodes do not have any children. As a consequence, the left and right fields of the leaf nodes are both set to NULL. These NULL pointers occupy memory without carrying useful information, so to avoid wasting that space we establish threads in their place.

4.2.1 Threads

Threads are essentially links that point to a node's in-order predecessor and in-order successor. The following rules are used to create threads (a node-layout sketch is given below):

• If ptr->left child is NULL, replace ptr->left child with a pointer to the in-order predecessor of ptr.
• If ptr->right child is NULL, replace ptr->right child with a pointer to the in-order successor of ptr.
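A possible node layout for a threaded binary tree is sketched below; the field names lthread and rthread are assumptions chosen to match the lth flag used later in this section.

    struct ThreadedNode {
        int data;
        ThreadedNode* left;    // left child, or thread to the in-order predecessor
        ThreadedNode* right;   // right child, or thread to the in-order successor
        bool lthread;          // true when 'left' is a thread (or NULL) rather than a real child
        bool rthread;          // true when 'right' is a thread (or NULL) rather than a real child
    };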

The following is a representation of a binary tree, together with an illustration of the corresponding threaded binary tree.

The in-order traversal of the binary tree above is: H D I B E A F C G.

4.2.2 In order Traversal of a Threaded Binary Tree

In a binary tree threaded for in-order traversal, each node's unused left link points to the previous node in the in-order sequence and each unused right link points to the next node; this idea forms the basis of the structure. A special head node is used, and the root of the tree is attached to the left of this head node. Every node carries two supplementary fields, a left-thread flag and a right-thread flag, both of which start out as 0. To get a better grasp of the idea of in-order threading, let us build a threaded binary tree from the following values: 10, 8, 6, 12, 9, 11, 14.

The first step is to establish a root node for the tree.


Let us take 10 as the starting value and make it the first node; it becomes the root and is connected to the left of the head node. As shown, the left and right NULL links of the root node are routed back as threads.

The next value is 8. It is compared with the root 10 and, because it is smaller, it is attached as the left child of the root:

new -> left = root -> left
new -> right = root
root -> left = new
root -> lth = 1

Node 8's left link now refers to the node before it in the in-order sequence, whereas its right link refers to the node after it.

The next node, 6, is linked on the left in a similar way. The node after that has the value 12, which is greater than the root node (10), so it is joined to the right of the root node:

new -> right = root -> right
new -> left = root
root -> rth = 1
root -> right = new
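Once the threads are in place, an in-order traversal needs neither recursion nor a stack. A sketch based on the node layout suggested earlier might look like this:

    #include <iostream>

    // Leftmost node of the subtree rooted at n, following real child links only.
    ThreadedNode* leftmost(ThreadedNode* n) {
        if (n == nullptr) return nullptr;
        while (!n->lthread)
            n = n->left;
        return n;
    }

    void inorderThreaded(ThreadedNode* root) {
        ThreadedNode* cur = leftmost(root);
        while (cur != nullptr) {
            std::cout << cur->data << ' ';   // visit the node
            if (cur->rthread)
                cur = cur->right;            // follow the thread to the in-order successor
            else
                cur = leftmost(cur->right);  // otherwise descend into the right subtree
        }
    }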

4.3 CONVERSION OF GENERAL TREES TO BINARY TREES

The transition of data structures from general trees to binary trees is a watershed point.
This progression embodies the difficult skill of reducing complex hierarchical linkages
into a form that is more easily understood and organized. The fact that generic trees
may be scaled to any size makes them an extremely flexible tool that can be used to
illustrate a diverse range of hierarchical structures. These adaptable frameworks may
be used to explain a wide variety of topics, including organizational structures, file
systems, and the intricate connections that exist between components in markup
languages such as extensible markup language (XML) and hypertext markup language
(HTML). However, owing to the inherent complexity of the tree as a whole, performing
efficient tasks such as searching or traversing the tree could be challenging.

The concept of converting generic trees into binary trees comes into play at this point.
This enables one to make advantage of the characteristics of binary trees, such as the
tree's binary branching structure and the ordered placement of the nodes, in order to
accelerate and enhance a variety of operations.

Techniques such as the left-child, right-sibling representation and level-order traversal bridge the gap between the unrestricted freedom of general trees and the organized efficiency of binary trees, and in doing so they open up a multitude of options for tackling a broad range of computing problems. From the complexity of general trees to the simplicity of binary trees, we investigate the art of making order out of chaos in order to better manage and analyze data in the ever-changing field of computer science and information technology.

4.3.1 Generic Tree

Tree-shaped data structures in which each node may have up to n children, where n is any positive integer, are known as generic trees or n-ary trees. Each node contains a children vector for keeping track of its children's addresses. Only the leaf (terminal) nodes have no children.

4.3.2 Binary Tree

A binary tree is considered a hierarchical data structure because each node in the tree may have at most two children, referred to as that node's left and right children. The first node in a binary tree is the root node, and the leaf nodes, often simply called the leaves, have no children. With the groundwork for the core concepts of this chapter in place, the next thing to do is to identify the problem at hand precisely.

4.3.3 Problem Statement

We are given a general tree structure with n nodes. To complete this task, we need to convert that general tree into a binary tree.

Example

Let us look at a simple example to better understand how the conversion process works.

Input

Output:

583421

4.3.4 Pictorial Representation:

Explanation:

The figure above shows a tree that satisfies the criteria of a binary tree. Note also that the result is printed in preorder format. The transformation thus converts the general tree into a binary tree whose preorder traversal gives the output shown. Now that we are acquainted with the problem at hand, let us move on to the next stage and look at the options available to us. The problem outlined above will now be examined through the lens of one solution that can address it.

Approach

Any ordinary tree can be converted into a binary tree by applying three simple rules. Rule 1: the root of the binary tree is identical to the root of the generic tree. Rule 2: the leftmost child of a node in the generic tree becomes the left child of that node in the binary tree. Rule 3: the right sibling of a node in the generic tree becomes the right child of that node in the binary tree.

By siblings we mean children of the same parent node. Note that, because the root of the generic tree has no siblings, the root of the resulting binary tree never receives a right child. Let us check whether the algorithm described for the problem statement above is valid.

Algorithm

We will go through the algorithm step by step; a code sketch follows the list.

• Create a generic tree and call the genericToBinary function.
• Check if the root node is null. If yes, return null.
• Check if the root node has no children. If true, return the root, as the node is already a leaf node.
• Check if the root node has only one child. If yes, call the genericToBinary function recursively with the child node and set the result as the left child of the current node in the binary tree.
• If the root has multiple children, set the left child of the binary tree as the result of recursively calling the genericToBinary function with the first child node as the argument.
• Set the right child of the binary tree as the result of recursively calling the genericToBinary function with the next sibling node as the argument.
• Iterate over the remaining child nodes and set them as the left child of the rightmost node in the binary tree.
• Return the root of the binary tree.
• Print the binary tree using the preorder traversal.
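A recursive C++ sketch of this conversion is shown below. The node layouts and the use of a children vector are illustrative assumptions; the sketch follows the three rules given earlier (the leftmost child becomes the left child, and each following sibling becomes the right child of the sibling before it).

    #include <vector>

    struct GenericNode {
        int data;
        std::vector<GenericNode*> children;   // any number of children
    };

    struct BinaryNode {
        int data;
        BinaryNode* left;
        BinaryNode* right;
    };

    BinaryNode* genericToBinary(GenericNode* root) {
        if (root == nullptr) return nullptr;              // empty tree
        BinaryNode* broot = new BinaryNode{root->data, nullptr, nullptr};
        BinaryNode* prev = nullptr;
        for (GenericNode* child : root->children) {
            BinaryNode* bchild = genericToBinary(child);
            if (prev == nullptr)
                broot->left = bchild;      // Rule 2: the leftmost child becomes the left child
            else
                prev->right = bchild;      // Rule 3: each later child becomes the right child of its previous sibling
            prev = bchild;
        }
        return broot;
    }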

4.3.5 Dry Run

We now have an understanding of the problem, a working example, and an algorithm. Let us discuss a dry run of this algorithm, sticking with the same input that was used in the example. The first thing we do is check whether the root is null; it is not. Next we check whether the root node has any children; it does. Because the root has children, its leftmost child becomes the left node in the binary tree, so in the second step node 8 is added.

Again, we check all of the conditions for this node individually. Node 8 is not null and it has children, so its leftmost child in the generic tree becomes its left child; the third step therefore adds node 3. We check once more that everything is in order: for node 3 the first condition, which asks whether the node is null, evaluates to false, and the node has no children of its own, so the root of this subtree, 3, is returned. Fourth, we turn our attention to the right sibling of node 3. The check asks whether node 3 has a right sibling; it does, namely 4, so 4 becomes the right child of 3. In the fifth step, recursion examines whether node 4 meets the conditions: the node is not null and 4 has no descendants, so node 4 is a leaf node and is returned as the root of its subtree. Having finished with node 3, we return to node 8. As the sixth step, we determine whether 8 has a right sibling; it does, namely 2, so 2 becomes the right child of 8.

Seventh, we double-check the conditions once more. Node 2 is not null and it has no children of its own, but its right sibling in the generic tree is 1. This means that 1 is attached as the right child of node 2.

4.5 APPLICATIONS OF TREES

Each node of a tree stores a value and a collection of pointers to other nodes, and the nodes are connected to one another by edges (much as parents are linked to their children in a family tree). In contrast to linear structures such as stacks and queues, trees can represent hierarchical data: the structure branches out from a central root node to the other nodes. A tree is thus a non-linear data structure made up of nodes and edges that portrays information in a hierarchical way, and it is a special kind of connected graph in which no cycle or circuit is to be found anywhere in the structure.

Tree Terminologies:

• A node is the fundamental building block of a tree; it stores data and maintains connections to other nodes.
• An edge, also sometimes referred to as a branch, is the link in a tree that connects two nodes. Several edges may be incident on the same node.
• The node that comes directly above a given node in the tree is that node's parent; put simply, it is the node from which the given node branches off.
• A node that is connected from below to another node is a child of that node. Every node in the tree is a child of some node, with the exception of the root node.
• The first node of a tree, from which the tree originates, is called the root. A tree cannot have more than one root.
• A node that does not have any children is referred to as a leaf node; this kind of node is also called an external node.
• Internal nodes, also known as non-leaf nodes, are nodes that have at least one child.
• Nodes are regarded as siblings if they have the same parent.
• Nodes that are on the same level but have different parents are referred to as cousins.
• The degree of a node is the number of its children. The degree of the tree is the maximum degree of any node in the tree.
• Between any two nodes of the tree there is a unique path. The length of a path is the number of edges it contains.
• The level of a node is the number of edges on the path that links that node to the root node.
• A subtree is the portion of a tree consisting of one particular node and its descendants; subtrees are also known as child trees.

Applications of Tree:

• The file system of a computer may be conceptualized as a tree-like structure: each folder or directory represents a node, and each file represents a leaf.
• Trees are used in parsing and processing XML documents. The elements of an XML document form the nodes of such a tree, and the attributes of those elements are properties attached to the nodes.
• The organization of database indexes often makes use of trees; the B-tree and its many variants are used particularly often for this purpose.
• The syntax of programming languages is often specified via parse trees, which are also used in the design of compilers. Compilers depend on this information to determine the structure of the program and to generate machine code accordingly.
• AI: artificial intelligence often makes use of decision trees in order to assess possible courses of action in light of a number of criteria.

Real-Time Applications of Tree:

• A tree data structure is used in the process of indexing in relational databases.
• The tree data structure is used often in the administration of file systems and folders.
• The Domain Name System (DNS) uses a tree to organize its data.
• A number of board games, including chess, use game trees to represent possible moves.
• Decision-based algorithms in machine learning often make use of trees as a data structure.

Advantages of Tree:

• Trees have strong capabilities in terms of both searching and retrieving information. Even for very large data sets, searching through a balanced tree typically takes very little time, with a time complexity of O(log n).
• When nodes are either added to or removed from a tree, the size of the tree
responds by either growing or shrinking accordingly. Because of this, they are
perfect for applications in which the quantity of data that will be kept might
potentially change.
• Uncomplicated process: There are many different ways to navigate through a
tree, and each one is designed to meet the requirements of a particular use case.
Because of this, the information that is kept in a tree structure may be accessed
and processed in a very short amount of time.
• Trees make upkeep easier owing to the inherent strictness of their node-to-node
hierarchy and interrelationships with one another. Because of this, it is easy to
change specific nodes without having an effect on the remainder of the tree.
• As a result of the hierarchical structure that is intrinsic to trees, they may be
used to show a wide variety of various sorts of interactions. Due to this feature,
they are well suited for representing data structures such as hierarchies,
taxonomies, and file systems.
• In a balanced tree it takes O(log n) time to add or delete a node, so these
operations remain fast even for very large trees.

Disadvantages of Tree:

• Because of the per-node pointers and bookkeeping they require, trees can incur
noticeable memory overhead, which may be a problem for memory-constrained
applications.
• If a tree is not balanced appropriately, it may result in unequal amounts of time
spent searching. This may be a significant problem in situations when time is
of the utmost importance.

• Trees are not usually the data structures that are the simplest to understand or
put into effect due to the intricacy of the information they contain. This might
provide a barrier for those who aren't used to working with them.
• Reduced wiggle room: Despite the fact that trees may take on a variety of sizes
and shapes, they are not as versatile as other types of data structures, such as
hash tables. This might be a significant problem for applications in which the
amount of the data being stored changes often.
• When it comes to other activities like sorting and grouping, trees are not as
effective as they are when it comes to searching and retrieving information. It's
possible that there are more effective data structures available to utilize for
activities of this nature.

4.5.1 AVL trees

The AVL tree is a self-balancing binary search tree that was invented in 1962 by G.M.
Adelson-Velsky and E.M. Landis. The initials AVL were chosen to recognize those
who were responsible for creating the tree. The greatest height difference that may exist
between two of a node's children in an AVL tree is one. This is the only acceptable
value. As a result of this quality, AVL trees are sometimes referred to as "height-
balanced trees" in certain circles. The most significant advantage of using an AVL tree
is that the amount of time necessary to carry out search, insert, and delete operations is
O (log n), which applies to both the typical and the worst-case situations.

Structurally, an AVL tree is almost identical to a binary search tree. The only addition is
that each node stores one extra piece of information, which we will call the "Balance
Factor." The balance factor of a node is calculated by subtracting the height of its right
sub-tree from the height of its left sub-tree. In a height-balanced binary search tree every
node has a balance factor of -1, 0, or 1; the presence of any other balance factor on a
node indicates that the tree has to be rebalanced.


Please refer to Fig. 4.5. Since nodes 18, 39, 54, and 72 have no children, their balance
factors are all equal to zero. Node 27 has one child on its left and none on its right, so its
left sub-tree has height 1 while its right sub-tree has height 0; its balance factor is
therefore 1. The left sub-tree of node 36 has two levels while its right sub-tree has only
one, so its balance factor is also 1. The balance factor of node 45 is 1 (3 minus 2), whereas
the balance factor of node 63 is 0 (1 minus 1). Figures 4.5 (a), (b), and (c) show,
respectively, a left-heavy, a right-heavy, and a balanced AVL tree.

Figure 4.5 (a) Left-heavy AVL tree, (b) right-heavy tree, (c) balanced tree

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)

The trees shown in Figure 4.5 are valid AVL trees, since every node has a balance factor
of -1, 0, or 1. However, operations on an AVL tree, such as inserting or deleting nodes,
may disturb the balance factors of some nodes, and the tree then has to be rebalanced.
Balance is restored by rotating the critical node. Depending on the situation, the rotation
performed is classified as LL, RR, LR, or RL. The operations that may be performed on
an AVL tree, including searching, inserting, deleting, and rotating nodes, are discussed
below.
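
As a concrete illustration of the balance factor and of one of the rotations mentioned above, the sketch below shows a possible C++ representation of an AVL node, the balance-factor computation (height of the left sub-tree minus height of the right sub-tree), and the single rotation that handles the LL case. The node layout and helper names are our own, not the book's.

#include <algorithm>   // std::max

// Illustrative AVL node; the layout and helper names are our own.
struct AvlNode
{
    int      key;
    AvlNode* left   = nullptr;
    AvlNode* right  = nullptr;
    int      height = 0;                 // height of the sub-tree rooted here
};

int height(const AvlNode* t) { return t ? t->height : -1; }

// Balance factor = height of left sub-tree minus height of right sub-tree.
int balanceFactor(const AvlNode* t)
{
    return t ? height(t->left) - height(t->right) : 0;
}

// Single rotation for the LL case: the left sub-tree of the critical node k2
// has become too tall, so k2's left child k1 becomes the new sub-tree root.
AvlNode* rotateWithLeftChild(AvlNode* k2)
{
    AvlNode* k1 = k2->left;
    k2->left    = k1->right;
    k1->right   = k2;
    k2->height  = 1 + std::max(height(k2->left), height(k2->right));
    k1->height  = 1 + std::max(height(k1->left), k2->height);
    return k1;
}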

4.5.1.1 Operations on AVL Trees

Searching for a node in an AVL tree. The steps involved in searching an AVL tree are
exactly the same as those involved in searching a binary search tree. Because the tree is
height-balanced, the search takes O(log n) time, and since searching does not alter the
structure of the tree, no special provisions are needed.

Inserting a new node into an AVL tree. A new node is inserted into an AVL tree in
exactly the same way as into a binary search tree, and it is always inserted as a leaf node.
The insertion step, however, is often followed by a rotation step that corrects any
imbalance the insertion has caused. If the balance factor of every node remains -1, 0, or
1 after the new node has been added, no rotation is required.

Throughout the process of inserting new nodes, the balance factor of the new node will
remain at 0 due to the fact that it is always put as the leaf node. The nodes along the
path from the root of the tree to the newly added node are the only ones whose balance
factors will be impacted by this change. The following is a list of potential
modifications that might be applied to any particular route node:

• Initially, the node was either left- or right-heavy and after insertion, it becomes
balanced.

• Initially, the node was balanced and after insertion, it becomes either left- or
right-heavy.
• Initially, the node was heavy (either left or right) and the new node has been
inserted in the heavy sub-tree, thereby creating an unbalanced sub-tree. Such a
node is said to be a critical node.

Consider the AVL tree given in Fig. 4.6.

Figure 4.6 AVL tree

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)

If we insert a new node with the value 30 into this tree, the resulting tree remains
balanced, so no rotations are required. Examine the tree shown in Fig. 4.7, which displays
the tree after the insertion of node 30, and note the changes that have occurred.

Figure 4.7 AVL tree after inserting a node with the value 30

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)

4.5.3 2-3 Tree

For binary search trees we saw earlier that operations such as search, insert, and delete
take O(log N) time on average but O(N) time in the worst case, where N is the total
number of nodes in the tree. If, however, the tree is balanced and its height is O(log N),
then each of the three operations always takes O(log N) time. AVL trees, red-black trees,
B-trees, and 2-3 trees are examples of height-balanced trees. Having covered some of
these structures already, we now focus on the 2-3 tree. Every internal node of a 2-3 tree
has either two or three children.

• A node with two children is referred to as a "2-node." A 2-node holds one data
value and has a left child and a right child.
• A node with three children is referred to as a "3-node." A 3-node holds two
distinct data values and has a left child, a middle child, and a right child.

For this reason a 2-3 tree is not a binary tree. All of its leaf nodes are on the same level,
the level furthest from the root. A 2-3 tree is shown in Figure 4.8.

Figure 4.8 2-3 Tree

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)

4.5.3.1 Searching for an Element in a 2-3 Tree

The search operation determines whether a given data value x is contained in a 2-3 tree
T, and returns a yes or no answer. Searching a 2-3 tree is closely analogous to searching
a binary search tree: at each node the values stored there direct the search into exactly
one subtree.

The search for the data value x starts at the root. Let k1 (and, in a 3-node, k2) be the
value(s) stored in the node being examined. If the node has only two children, move to
the left child when x is smaller than k1 and to the right child when x is greater than k1.

If the node has three children, move to the left child when x is smaller than k1, to the
middle child when x lies between k1 and k2, and to the right child when x is greater than
k2. The search proceeds in this way down the tree; it succeeds if and only if a node
containing the value x is reached.
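
The search just described can be sketched in C++ as follows. One common formulation keeps data values in every node, as assumed here; the node layout (key1, key2, isThreeNode and the three child pointers) is our own illustrative choice, not the book's.

// Illustrative 2-3 tree node: a 2-node uses key1, left and right;
// a 3-node additionally uses key2 and middle. The layout is our own.
struct Node23
{
    int     key1 = 0, key2 = 0;
    bool    isThreeNode = false;         // true when key2 is in use
    Node23* left   = nullptr;
    Node23* middle = nullptr;            // only used by 3-nodes
    Node23* right  = nullptr;
};

// Returns true if x is stored somewhere in the tree rooted at t.
bool search(const Node23* t, int x)
{
    if (t == nullptr)
        return false;                    // fell off the tree: x is not present
    if (x == t->key1 || (t->isThreeNode && x == t->key2))
        return true;                     // x found in this node

    if (x < t->key1)
        return search(t->left, x);       // smaller than every key in the node
    if (!t->isThreeNode)
        return search(t->right, x);      // 2-node and x > key1
    if (x < t->key2)
        return search(t->middle, x);     // 3-node and k1 < x < k2
    return search(t->right, x);          // 3-node and x > k2
}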

4.5.3.2 Deleting an Element from a 2-3 Tree

The delete operation removes a specific data value from the 2-3 tree. If removing a value
from a node violates a property of the tree, for instance if the node is left with no data
value at all, then two nodes have to be merged in order to preserve the defining
properties of a 2-3 tree.

During insertion the new value is always added to a leaf node, but a deleted value need
not come from a leaf; it may be removed from any node. To delete a value x stored in
an internal node, x is replaced by its successor in the ordering and the successor's original
copy is then removed. After a value has been removed from a node, the node is checked
to see whether it has become empty; if it has, it is merged with another node in order to
restore the properties of the tree.

CHAPTER 5

SORTING AND SEARCHING

In both computer science and the administration of data, the concepts of sorting and
searching are essential building blocks. They are the fundamental building blocks of a
wide variety of algorithms and applications, which provide us with the capability to
efficiently organize, retrieve, and modify information. The process of arranging data in
a certain order, which is referred to as sorting, is essential for a broad variety of jobs,
such as ensuring that databases are well-organized and increasing the effectiveness of
search engines. Structuring data in a way that simplifies later processing, making it easier
to locate, analyse, and use, is therefore a step of critical importance.
When you search, on the other hand, you are looking for specific pieces of information
that are contained inside a dataset.

Finding a specific record in a database, scouring the internet for information, or even
just moving through the items on a sorted list are all examples of activities that require
efficient searching algorithms. Combining sorting and searching is at the heart of many
of the challenges that arise in the course of computational work. Together, they provide
us with solutions that make it possible for us to deal with and retrieve enormous
amounts of data quickly and accurately. During this inquiry, we will delve into the
complexity of sorting and searching by exploring their underlying principles, multiple
algorithms, and real-world applications, which are what make them essential
components of the present digital era. Specifically, we will focus on how these
components are used to organize and find information. In the fields of computer science
and data management, two of the most fundamental operations are searching and
sorting.

Sorting is regarded to be one of the most essential tasks. These two procedures are at
the core of the great majority of the many apps that are now available. When a group
of data is placed in a specific order, often either ascending or descending order, this
process is known as sorting the data. This makes it much easier to gain access to the
information, assess it, and recover it in a timely way. On the other hand, searching is
the process of locating a certain item inside a dataset or collection in order to have
quick and accurate access to the information that is needed.

These procedures serve as the basis for a wide range of day-to-day actions, such as
sorting a list of names into alphabetical order, looking for a certain book in a massive
library, or locating an essential entry in a huge database. There has been a meteoric rise
in the number of applications that are driven by data ever since the dawn of the digital
era, and the volume of data has expanded at a rate that is exponentially higher. As a
direct consequence of this, the efficient use of algorithms for sorting and searching is
currently more crucial than it has ever been before.

Not only can these strategies and techniques enhance the experience for the user, but
they also boost the performance and scalability of the systems that are in charge of
managing massive information. During this exploration of sorting and searching, we
will delve into the principles, techniques, and real-world applications that make these
operations key components of computer science, data processing, and information
retrieval. Specifically, we will focus on how to sort data and how to search for
information. To be more specific, we will concentrate on how to organise data as well
as how to look for information.

5.1 INSERTION SORT

The Insertion Sort is a foundational algorithm for sorting data that shows the beauty
that may be achieved by applying computational tools to issues in order to find
solutions. In the realm of sorting algorithms, the Insertion Sort stands out owing to its
fundamental mechanism, in which components are iteratively inserted into their correct
positions within a dataset, finally generating a sorted sequence. This method allows the
Insertion Sort to produce sorted sequences with a high degree of accuracy. Insertion
Sort is distinguishable from other algorithms for sorting thanks to this method. Because
of its uncomplicated nature and the ease with which it may be put into practise, some
people who are new to the world of algorithms and data structures could find that it is
a helpful learning tool.

Insertion Sort is helpful in circumstances where simplicity and efficiency for relatively
small datasets or data that has only been partially sorted are critical needs. Even if it
may not claim the speed of more complicated algorithms like Quick Sort or Merge Sort,
Insertion Sort finds its place to shine in circumstances such as these. Insertion Sort is a
valuable instructional tool that also has a number of applications in the real world,
across many different industries. It is possible, for instance, to use it to sort very small
arrays; it is also possible to use it to enhance the performance of other algorithms by

serving as a subcomponent; and it is possible to use it as a stepping stone for more
complex sorting strategies. During this research of Insertion Sort, we will explore the
inner workings of the algorithm, investigate the amount of time and space it consumes,
and discover where Insertion Sort fits into the greater landscape of sorting algorithms.
By carrying out this course of action, we will shine light on the beauty that may be
brought to the process of tackling computational difficulties by making things as basic
as possible.

The Insertion Sort is a method for sorting that is easy to understand and extremely
successful in its use. Over the course of several decades, it has served as an
indispensable component in the study of computer science and the processing of data.
It creates a sorted sequence in a systematic way by continuously inserting components
from an unsorted list into their respective positions inside a growing sorted sublist. This
results in the creation of a sorted sequence. The simplicity of this approach is what lends
it its allure. The method is not only easy to understand but also efficient enough to be
suitable for small to moderately sized datasets.

Students and amateurs alike may acquire a better grasp of algorithms and data
structures with the assistance of this essential learning aid for appreciating crucial
sorting notions. The versatility of Insertion Sort extends much beyond its capacity to
sort; in addition to that, it may be used as an effective educational tool for the mastery
of fundamental sorting principles. Throughout the course of this research of Insertion
Sort, we will expose the inner workings of this time-honored way of sorting, as well as
its virtues and shortcomings, as well as its continuous relevance in the creation of
algorithms and the teaching of computer science.

5.1.2 The Algorithm

The insertion sort consists of N-1 passes. For each pass p, from p = 1 through N-1, the
insertion sort ensures that the elements in positions 0 through p are in sorted order. It
makes use of the fact that the elements in positions 0 through p-1 are already known to
be sorted, and uses this knowledge to place the element in position p correctly. Figure
5.1 shows the array produced after each pass of the insertion sort.

Figure 5.1 Insertion sort after each pass

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)

Figure 5.2 Insertion sort routine

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)

Figure 5.1 illustrates the general strategy. In pass p, the element at position p is moved
to the left until its correct place is found among the first p+1 elements. The code in
Figure 5.2 implements this strategy. Lines 11 through 14 carry out the data movement
without explicit swaps: the element at position p is saved in the temporary variable tmp,
all larger elements before position p are shifted one position to the right, and tmp is then
placed in the position that is vacated. The same technique was used in the
implementation of binary heaps.
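
Since the code of Figure 5.2 is not reproduced in this text, the following is a minimal sketch of the routine just described: the element at position p is saved in tmp, larger elements are shifted one position to the right without explicit swaps, and tmp is then dropped into the vacated slot. It follows the usual vector-based formulation and is not the book's exact code.

#include <cstddef>   // std::size_t
#include <utility>   // std::move
#include <vector>

// Minimal sketch of insertion sort; not the book's exact Figure 5.2.
void insertionSort(std::vector<int>& a)
{
    for (std::size_t p = 1; p < a.size(); ++p)
    {
        int tmp = std::move(a[p]);               // element to be placed
        std::size_t j = p;
        for (; j > 0 && tmp < a[j - 1]; --j)
            a[j] = std::move(a[j - 1]);          // shift larger elements right
        a[j] = std::move(tmp);                   // drop tmp into its correct spot
    }
}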

5.1.3 STL Implementation of Insertion Sort

The sorting routines in the STL do not take a single parameter that is an array of
comparable items, as the routines presented in this chapter do; instead, they receive a
pair of iterators that represent the start and end marker of the range to be sorted.

The two-parameter sort takes only the two iterators and assumes that the items can be
compared directly, whereas the three-parameter sort takes a function object as its third
parameter. Converting the routine shown in Figure 5.2 into an STL-style routine raises
a few issues that have to be addressed. The most obvious ones are the following:

1. We must write both a two-parameter sort and a three-parameter sort.
Presumably, the two-parameter sort simply invokes the three-parameter sort,
passing less<Object>{} as the third parameter.
2. Array access has to be converted to iterator access.
3. Line 11 of the original code declares a temporary variable tmp of type Object; in
the new code the type Object is not directly available.

The first issue presents the most difficulty, because the template type parameter (the
generic type) of the two-parameter sort is Iterator; Object is not one of the generic type
parameters.

Figure 5.3 Two-parameter sort invokes three-parameter sort via C++11 decltype

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)

Prior to C++11, resolving this issue would have required writing additional routines.
Figure 5.3 shows how the decltype feature introduced in C++11 allows the two-parameter
version to be expressed cleanly and succinctly.

Figure 5.4 shows the main sorting code. It changes array indexing to iterator access, and
it replaces calls to operator< with calls to the lessThan function object.

Observe that in the revised insertionSort, every statement in the original code is replaced
with a corresponding statement that uses the iterators and the function object in a
straightforward way. Nevertheless, when presenting our sorting algorithms we prefer the
simpler interface to the STL interface, since the original code is much easier to follow.
Figure 5.4 Three-parameter sort using iterators

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)
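
The code of Figures 5.3 and 5.4 is only referenced above, so the sketch below shows the idea in one place: a three-parameter insertionSort that works on an iterator range with a function object, and a two-parameter version that forwards to it, using decltype to name the element type when constructing the default less<> comparator. This is a sketch in the spirit of those figures, assuming random-access iterators; it is not a verbatim copy of the book's code.

#include <functional>   // std::less
#include <utility>      // std::move

// Three-parameter version: an iterator range plus a comparator (sketch).
template <typename Iterator, typename Comparator>
void insertionSort(const Iterator& begin, const Iterator& end, Comparator lessThan)
{
    if (begin == end)
        return;
    for (Iterator p = begin + 1; p != end; ++p)
    {
        auto tmp = std::move(*p);
        Iterator j = p;
        // Shift elements that compare greater than tmp one slot to the right.
        for (; j != begin && lessThan(tmp, *(j - 1)); --j)
            *j = std::move(*(j - 1));
        *j = std::move(tmp);
    }
}

// Two-parameter version: decltype(*begin) names the element type, so the
// default less<> comparator can be built without an Object template parameter.
template <typename Iterator>
void insertionSort(const Iterator& begin, const Iterator& end)
{
    insertionSort(begin, end, std::less<decltype(*begin)>{});
}

Called as insertionSort(v.begin(), v.end()) on a std::vector, the two-parameter version behaves like the array-based routine.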

5.1.4 Analysis of Insertion Sort

Because of the nested loops, each of which can take up to N iterations, insertion sort is
O(N²). Furthermore, this bound is tight, because input in reverse order actually achieves
it. A precise calculation shows that the number of tests in the inner loop of Figure 5.2 is
at most p + 1 for each value of p; summing over all p gives a total of 2 + 3 + ... + N,
which is O(N²).

If, on the other hand, the input is already sorted, the running time is O(N), because the
test in the inner for loop always fails immediately. Indeed, if the input is almost sorted (a
term that will be made more precise in the following section), insertion sort runs quickly.
Because of this wide variation between the best and worst cases, it is worth analysing the
average-case behaviour of the algorithm. It turns out that the average case is Θ(N²) for
insertion sort, as well as for a variety of other sorting algorithms, as the following
subsection will show.

5.2 QUICK SORT

Quicksort has historically been the fastest generic sorting algorithm in practice in C++,
which is how it earned its name. Its average running time is O(N log N), and its speed is
due mainly to a very tight and highly optimised inner loop. Its worst-case running time
is O(N²), but with a little effort this case can be made exponentially unlikely. By
combining quicksort with heapsort, whose worst-case running time is O(N log N), we
can obtain quicksort's fast running time on almost all inputs while retaining an
O(N log N) worst-case guarantee, and so reap the benefits of both algorithms.

Although quicksort has had a reputation as an algorithm that can be highly optimised in
theory but was hard to code correctly in practice, it is actually quite easy to understand
and to prove correct. Like merge sort, quicksort is a divide-and-conquer recursive
algorithm, and it is one of the most widely known algorithms of this kind.

Let us begin with the following simple procedure for sorting a list. Choose an item, and
then place each of the remaining items into one of three groups: those smaller than the
chosen item, those equal to it, and those larger than it. Recursively sort the first and third
groups, and then concatenate the three groups.

The basic principles of recursion guarantee that the result is a sorted version of the
original list. Figure 5.5 shows one simple implementation of this strategy, and its
performance is at least respectable on most inputs. In fact, the performance is extremely
good if the list contains large numbers of duplicates with relatively few distinct items,
because the group of equal items never has to be sorted recursively.

Figure 5.5 Simple recursive sorting algorithm

Source: Data Structures and Algorithm Analysis in C++ by Mark Allen Weiss (March 2018)
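
The code of Figure 5.5 is not reproduced in this text; the sketch below captures the strategy it illustrates (split into smaller, equal and larger groups, sort the outer groups recursively, then concatenate). It is written for clarity rather than speed, deliberately ignores the efficiency issues discussed next, and our choice of the middle element as the chosen item is illustrative only.

#include <vector>

// Illustrative three-group recursive sort; written for clarity, not speed.
void simpleRecursiveSort(std::vector<int>& items)
{
    if (items.size() < 2)
        return;

    int pivot = items[items.size() / 2];          // the chosen item
    std::vector<int> smaller, same, larger;
    for (int x : items)
    {
        if (x < pivot)        smaller.push_back(x);
        else if (pivot < x)   larger.push_back(x);
        else                  same.push_back(x);
    }

    simpleRecursiveSort(smaller);                  // recursively sort the outer groups
    simpleRecursiveSort(larger);

    items.clear();                                 // concatenate: smaller, same, larger
    items.insert(items.end(), smaller.begin(), smaller.end());
    items.insert(items.end(), same.begin(),    same.end());
    items.insert(items.end(), larger.begin(),  larger.end());
}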

The algorithm we have just described forms the basis of quicksort. However, it is hard
to see how we have improved upon merge sort, since we are recursively creating extra
lists; in fact, so far we have not. To obtain better performance we must avoid the extra
memory and keep the inner loops clean. Consequently, quicksort is commonly written
in a way that avoids creating the second group (the equal items), and the algorithm has
numerous subtle details that affect its performance; this is where the complications arise.

Now we are going to speak about the most common approach to implement quicksort,
which is referred to as "classic quicksort." An array is used as the input in this
implementation, and the procedure does not construct any extra arrays of any kind.
Because the partition step only gives a hazy description of what should be done with
items that are equal to the pivot, this must be decided while the system is still in the
design process. A key component of a successful deployment is the management of
this particular circumstance as efficiently as possible. Intuitively, just as we would like
binary search trees to be balanced, we would hope that about half of the items that are
equal to the pivot go into S1 and the other half into S2.

The pivot point is determined to be the number 65, which is chosen at random. The
remaining items in the collection have been separated into two distinct collections that
are easier to handle. The order of the lower integers (0, 13, 26, 31, 43, and 57) was
determined by applying rule 3 of recursion to the process of sorting the collection of
integers. The arrangement of the list of really large integers follows the same pattern.
After then, the arranged arrangement of the whole collection may be restored with
comparatively little effort.

It should be clear that this algorithm works, but it is not obvious why it is any faster than
merge sort. Like merge sort, it recursively solves two subproblems and requires linear
additional work (step 3), but unlike merge sort the subproblems are not guaranteed to
be of equal size, which is potentially bad. The reason quicksort is faster is that the
partitioning step can be performed in place and very efficiently, and this efficiency more
than makes up for the lack of equal-sized recursive calls.

The algorithm as described so far leaves several details unspecified, which we now fill in.
There are many ways to implement steps 2 and 3; the method described here is the result
of extensive analysis and empirical study and represents a very efficient way to implement
quicksort. Even slight deviations from this method can produce surprisingly poor results.

5.2.1 Picking the Pivot

Despite the fact that the procedure will work properly regardless of whatever
component is chosen to serve as the pivot, it is still abundantly evident that some of the
available choices are more advantageous than others.

5.2.2 A Wrong Way

The popular, uninformed choice is to use the first element as the pivot. This is acceptable
if the input is random, but if the input is presorted or in reverse order the pivot provides
a poor partition, because either all the elements go into S1 or all of them go into S2.
Worse, this happens consistently in every recursive call.

If the first element is used as the pivot and the input is presorted, then quicksort will
spend quadratic time doing essentially nothing at all, which is quite embarrassing. The
problem can be avoided by making sure that the first element is not routinely used as
the pivot.

Because presorted input, or input with a large presorted section, is quite common, using
the first element as the pivot is a terrible idea and should be abandoned. An alternative
is to choose the larger of the first two distinct elements as the pivot, but this has the same
bad properties as choosing the first element. Neither of these strategies should be used
for picking the pivot.

A safe course is simply to choose the pivot at random. This strategy is generally perfectly
safe, since it is very unlikely that a random pivot would consistently provide a poor
partition, unless the random number generator has a flaw (which is not as uncommon as
you might think). On the other hand, random number generation is generally expensive
and does not reduce the average running time of the rest of the algorithm at all.

5.2.3 Partitioning Strategy

There are several partitioning strategies used in practice, but the one described here is
known to give good results. As we shall see, it is very easy to do this step wrongly or
inefficiently, so it is safest to follow a known method. The first step is to get the pivot
element out of the way by swapping it with the last element. The index i then starts at
the first element and the index j starts at the next-to-last element. Assuming the same
original input as before, the current situation is as follows:

We are going to proceed with the assumption, at least for the time being, that each
component is one of a kind. Concerns on what steps to take when there are several
occurrences of a certain item will be brought up at a later time. The correct behaviour
for our algorithm is required even in the most extreme scenario, which is when all of
the components are the same. It is shocking how easy it is to behave in a manner that
is not proper.

What the partitioning stage wants to do is move all the small elements to the left part of
the array and all the large elements to the right part; "small" and "large" are, of course,
relative to the pivot. While i is to the left of j, we move i to the right, skipping over
elements that are smaller than the pivot, and we move j to the left, skipping over elements
that are larger than the pivot. When i and j have stopped, i is pointing at a large element
and j is pointing at a small element. If i is still to the left of j, those two elements are
swapped; the effect is to push a large element to the right and a small element to the left.

After the scans have finished, every element in a position to the left of i must be small,
either because a small element was there to begin with or because the large element
originally in that position was moved out during a swap. A similar argument shows that
every element in a position to the right of i must be large. One important detail we must
consider is how to handle elements that are equal to the pivot.

The questions are whether i should stop when it sees an element equal to the pivot and
whether j should stop when it sees such an element. Intuitively, i and j ought to do the
same thing, since otherwise the partitioning step is biased. For instance, if i does not stop
at elements equal to the pivot but j does, then all of the elements that are equal to the
pivot will wind up in S1.

To get an idea of what might be good, we consider the case where every element in the
array is identical. If both i and j stop on equal elements, there will be many swaps between
identical elements. Although this seems useless, the positive effect is that i and j will cross
in the middle, so when the pivot is put back in place the partition creates two nearly
equal subarrays. The analysis of merge sort tells us that the total running time would
then be O(N log N).

If neither i nor j stops, and code is present to prevent them from running off the end of
the array, no swaps will be performed. Although this seems good, a correct
implementation would then place the pivot in the last spot that i touched, which would
be the next-to-last position (or the last, depending on the exact implementation). This
would create very uneven subarrays, and if all of the elements are identical the running
time is O(N²).

The effect is the same as using the first element as the pivot for presorted input: it takes
quadratic time to do nothing at all. We therefore conclude that it is better to do the
apparently unnecessary swaps and create even subarrays than to risk wildly uneven
subarrays.

Therefore, we will have both i and j stop whenever they encounter an element equal to
the pivot. This turns out to be the only one of the four possibilities considered that does
not take quadratic time for this input. At first glance, it may seem a ridiculous waste of
time to worry about a set of elements that are all identical to one another: why would
anyone want to sort 500,000 identical items? But keep in mind that quicksort is a
recursive algorithm.

Suppose there are 10,000,000 items, of which 500,000 are identical (or, more likely, are
complex elements whose sort keys are identical). Eventually quicksort will make a
recursive call on just those 500,000 elements, and then it really will be important that
500,000 identical elements can be sorted efficiently.
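
The partitioning strategy described in this subsection can be sketched as follows: the pivot (chosen at random here) is first swapped to the end, i and j scan toward each other with both stopping on elements equal to the pivot, and the pivot is finally swapped back into position i. This is an illustrative sketch under those assumptions, not the book's exact routine; refinements such as median-of-three pivot selection are omitted.

#include <cstdlib>     // std::rand (seeding omitted in this sketch)
#include <utility>     // std::swap
#include <vector>

// Illustrative in-place partition of a[left..right]; returns the final
// position of the pivot. Both scans stop on elements equal to the pivot.
int partition(std::vector<int>& a, int left, int right)
{
    int pivotIndex = left + std::rand() % (right - left + 1);  // random pivot
    std::swap(a[pivotIndex], a[right]);             // move the pivot out of the way
    int pivot = a[right];

    int i = left - 1;
    int j = right;
    for (;;)
    {
        while (a[++i] < pivot) { }                  // skip over smaller elements
        while (j > left && pivot < a[--j]) { }      // skip over larger elements
        if (i >= j)
            break;
        std::swap(a[i], a[j]);                      // small to the left, large to the right
    }
    std::swap(a[i], a[right]);                      // restore the pivot to position i
    return i;
}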

5.2.4 Small Arrays

Insertion sort has a better speed than quicksort when working with arrays with N that
is less than 20. In addition to this, because quicksort is a recursive algorithm, the
situations described above are going to occur rather frequently. When working with
very small arrays, one alternative that is frequently available is to omit utilising the
recursive feature of quicksort and instead use a sorting algorithm that is efficient when
working with arrays of a lower size, such as insertion sort. This is a common practise.
Utilising this strategy may truly save around 15 percent of the entire running time when
compared to not employing any cutoff approach at all, which is a significant amount of
time. Any cutoff between 5 and 20 is likely to provide findings that are equivalent to
one another, despite the fact that N = 10 is indicated as an appropriate cutoff range.
This makes it easier to steer clear of unfavourable and degenerate situations, such as
computing the median of three items when there are only one or two of those
components.
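
In code, the cutoff is usually applied inside the recursive driver, for example as sketched below. The constant 10 follows the suggestion above; the routine reuses the partition sketch from the previous subsection and the iterator-based insertionSort sketched in Section 5.1.3 for small subranges, so it is illustrative rather than the book's exact code.

// Illustrative recursive quicksort driver with a small-array cutoff (sketch).
const int CUTOFF = 10;

void quicksort(std::vector<int>& a, int left, int right)
{
    if (right - left + 1 <= CUTOFF)
    {
        insertionSort(a.begin() + left, a.begin() + right + 1);  // small range
        return;
    }
    int p = partition(a, left, right);   // partition sketch shown earlier
    quicksort(a, left, p - 1);
    quicksort(a, p + 1, right);
}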

5.2.5 Analysis of Quicksort

Because quicksort, like merge sort, is a recursive algorithm, the solution to a recurrence
formula is required in order to comprehend how the algorithm operates. We are going
to do the research for a quicksort with the assumptions of a random pivot (no median
of three partitions), and no cutoff for arrays that are very small. In the same manner as
merge sort, we take T(0) = T(1) = 1. The running time of quicksort is equal to the running
time of the two recursive calls plus the linear time spent in the partition (the pivot
selection takes only constant time). This gives the basic quicksort relation.

5.2.6 A Linear-Expected-Time Algorithm for Selection

Quicksort can be modified to solve the selection problem. Recall that by using a priority
queue we can find the kth largest (or smallest) element in O(N + k log N) time. For the
special case of finding the median, this gives an O(N log N) algorithm.

Since we can sort the array in O(N log N) time, one might expect to obtain a better time
bound for selection. The algorithm we present for finding the kth smallest element in a
set S is almost identical to quicksort; in fact, the first three steps are the same. We will
call this algorithm quickselect. Let |Si| denote the number of elements in Si. The steps
of quickselect are as follows:

1. If the value of "S" is 1, then "k" should be equal to "1," and the response should
be the element that was found in "S." If a cutoff is being used for very small
arrays, and if S has a larger value than CUTOFF, then sort S and return the
element that has the smallest size that is kth in the list.
2. Determine which of the components, v > S, is the most important.
3. Using quicksort, partition the S v array into the S1 and S2 subarrays as was
done in the previous step.
4. If k is less than or equal to S1, then the kth most basic element must be present
in S. If k is more than S1, then this element cannot be included. Bring back the

135 | P a g e
fast select that was used earlier (S, k). If k = 1 + |S1|, then the element that is
one size smaller than k is the pivot, and we may use that element as the answer
that we return if that is how the equation is written. In the event that this is not
the situation, the kth smallest element may be found in S, and it is the (k | S1 |
1) st smallest element in S2. Following the execution of a recursive call, the
quick select function is returned with the arguments (S, k |S1| 1) in its argument
list.

In contrast to quicksort, quickselect makes only one recursive call instead of two. The
worst case of quickselect is identical to that of quicksort and is O(N²); intuitively, this is
because the quicksort worst case occurs when either S1 or S2 is empty, so that quickselect
is not really saving a recursive call. The average running time, however, is O(N). The
analysis is similar to that of quicksort and is left as an exercise.
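
The steps above can be sketched in code as follows, working in place on a subrange and reusing the partition sketch from Section 5.2.3; the cutoff for small arrays is omitted for brevity, and k is 1-based within the subrange. This is an illustrative sketch, not the book's exact routine.

#include <vector>

// Illustrative quickselect: after the call, the kth smallest element of
// a[left..right] is located at index left + k - 1.
void quickSelect(std::vector<int>& a, int left, int right, int k)
{
    if (left >= right)
        return;                                   // zero or one element: nothing to do
    int p = partition(a, left, right);            // pivot ends up at index p
    int rank = p - left + 1;                      // pivot is the rank-th smallest in the range
    if (k < rank)
        quickSelect(a, left, p - 1, k);           // the answer lies in S1
    else if (k > rank)
        quickSelect(a, p + 1, right, k - rank);   // it is the (k - rank)th smallest of S2
    // if k == rank, the pivot itself is the answer and is already in place
}

Calling it on the whole array (left = 0, right = N - 1) leaves the kth smallest element in a[k - 1].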

5.3 MERGE SORT

We now turn our attention to merge sort. Merge sort runs in O(N log N) worst-case time,
and the number of comparisons it uses is nearly optimal. It is a fine example of a recursive
algorithm. The fundamental operation of the algorithm is merging two sorted lists into
one. Because the lists are sorted, this can be done in a single pass through the input,
provided the output is placed in a third list.

The basic merging algorithm takes two input arrays, A and B, an output array C, and
three counters, Actr, Bctr, and Cctr, which are initially set to the beginning of their
respective arrays. The smaller of A[Actr] and B[Bctr] is copied to the next entry in C,
and the appropriate counters are advanced. When either input list is exhausted, the
remainder of the other list is copied to C. An example of how the merge routine works
is given for the following input. If array A contains 1, 13, 24, 26 and array B contains 2,
15, 27, 38, the algorithm proceeds as follows: first, 1 and 2 are compared.

1 is added to C, and then 13 and 2 are compared.

2 is added to C, and then 13 and 15 are compared.

13 is added to C, and then 24 and 15 are compared. This continues until 26 and 27 are
compared.

26 is added to C; the A array is now exhausted.

The remainder of the B array is then copied to C.
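
A sketch of the merging step just traced, using the counters described above (the output counter Cctr is implicit in push_back), follows; this is an illustrative routine rather than the book's figure.

#include <vector>

// Illustrative merge of two sorted arrays A and B into a new array C.
std::vector<int> merge(const std::vector<int>& A, const std::vector<int>& B)
{
    std::vector<int> C;
    C.reserve(A.size() + B.size());
    std::size_t Actr = 0, Bctr = 0;

    while (Actr < A.size() && Bctr < B.size())
    {
        if (A[Actr] <= B[Bctr])
            C.push_back(A[Actr++]);     // copy the smaller element, advance its counter
        else
            C.push_back(B[Bctr++]);
    }
    while (Actr < A.size())             // one list is exhausted:
        C.push_back(A[Actr++]);         // copy the remainder of the other list
    while (Bctr < B.size())
        C.push_back(B[Bctr++]);
    return C;
}

With A = 1, 13, 24, 26 and B = 2, 15, 27, 38, this reproduces the trace above, producing 1, 2, 13, 15, 24, 26, 27, 38.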

The time to merge two sorted lists is clearly linear, because at most N - 1 comparisons
are made, where N is the total number of elements. To see this, note that every
comparison adds an element to C, except the last comparison, which adds at least two;
in this particular example the last comparison adds three.
The description of the merge sort algorithm is therefore simple. If N = 1, there is only
one element to sort and the answer is at hand. Otherwise, recursively merge sort the first
half and the second half; this gives two sorted halves, which can then be merged together
using the merging algorithm described above. For example, to sort the eight-element
array 24, 13, 26, 1, 2, 27, 38, 15, we recursively sort the first four and the last four
elements, obtaining 1, 13, 24, 26, 2, 15, 27, 38.

Then we merge the two halves as above, obtaining the final list 1, 2, 13, 15, 24, 26, 27,
38. This algorithm is a classic example of the divide-and-conquer strategy: the problem
is divided into smaller problems that are solved recursively, and the conquering phase
consists of patching the answers together.
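
The recursive structure just described can be sketched as follows, reusing the merge routine above. For clarity this version builds temporary vectors for the two halves rather than the single shared temporary array discussed next, so it is illustrative rather than the book's optimised code.

#include <vector>

// Illustrative recursive merge sort; builds temporary vectors for clarity.
void mergeSort(std::vector<int>& a)
{
    if (a.size() < 2)
        return;                                        // base case: already sorted

    std::size_t mid = a.size() / 2;
    std::vector<int> leftHalf(a.begin(), a.begin() + mid);
    std::vector<int> rightHalf(a.begin() + mid, a.end());

    mergeSort(leftHalf);                               // recursively sort each half
    mergeSort(rightHalf);

    a = merge(leftHalf, rightHalf);                    // combine the two sorted halves
}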

The effective divide-and-conquer strategy will provide us with a lot of opportunities


that we will be able to exploit owing to the fact that recursion will offer us many
instances of the aforementioned possibilities. The merge Sort that just has one
parameter is utilised as a driver for the recursive merge Sort that possesses four
parameters. The merge Sort being discussed here is only a driver. In most cases,
merging employs a quite unremarkable method. If a local temporary array is created
before each iteration of the merge function's recursive execution, then there can be
login unique temporary arrays in use at any given time.

This is because the merge function would allocate a new temporary array on every call.
A closer look shows that only one temporary array ever needs to be active at a time, and
that it can be created in the public mergeSort driver, because merge is performed in the
last line of mergeSort. Furthermore, we can use any part of the temporary array; in
fact, we will use the same portion as the input array, which makes the improvement
discussed in the next paragraph possible. An illustration of the merging process appears
in Figure 7.12.
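To make the arrangement just described concrete, here is a minimal C++ sketch of the
driver and the four-parameter recursive routine; the names mergeSort and merge follow the
discussion above, but the exact signatures are illustrative assumptions rather than a
reproduction of any particular library.

#include <vector>

// Merge two sorted runs a[left..center] and a[center+1..right] into tmp,
// then copy the merged run back into a.
void merge(std::vector<int>& a, std::vector<int>& tmp,
           int left, int center, int right) {
    int i = left, j = center + 1, k = left;
    while (i <= center && j <= right)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= center) tmp[k++] = a[i++];   // copy rest of the first half
    while (j <= right)  tmp[k++] = a[j++];   // copy rest of the second half
    for (k = left; k <= right; ++k) a[k] = tmp[k];
}

// Four-parameter recursive mergeSort: sorts a[left..right].
void mergeSort(std::vector<int>& a, std::vector<int>& tmp, int left, int right) {
    if (left >= right) return;               // zero or one element: already sorted
    int center = (left + right) / 2;
    mergeSort(a, tmp, left, center);          // sort the first half
    mergeSort(a, tmp, center + 1, right);     // sort the second half
    merge(a, tmp, left, center, right);       // combine the two sorted halves
}

// One-parameter driver: allocates the single temporary array once.
void mergeSort(std::vector<int>& a) {
    std::vector<int> tmp(a.size());
    mergeSort(a, tmp, 0, static_cast<int>(a.size()) - 1);
}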

5.3.1 Analysis of Merge sort

The merge sort is one of the most well-known examples of the several methods that
can be used to evaluate recursive routines. There are many different ways that this may
be done. In addition to that, it is one of the approaches that is the least complicated. In
order for us to be able to determine how long the programme will really take to execute,
it is necessary to write a recurrence relation. We will assume that N is a power of 2 so
that the list always splits into two even halves. For N = 1, the time to merge sort is
constant, which we will denote by 1. Otherwise, the time to merge sort N numbers equals
the time to perform two recursive merge sorts of size N/2, plus the time to merge, which
is linear. The following equations express this precisely:

T(1) = 1
T(N) = 2T(N/2) + N
This is a standard recurrence relation, which can be solved in several ways. We will
show two methods. The first is to divide the recurrence relation through by N. The
reason for doing this will become apparent soon. This yields

T(N)/N = T(N/2)/(N/2) + 1
Because this equation is valid for any N that is a power of 2, we can also write

T(N/2)/(N/2) = T(N/4)/(N/4) + 1

and

T(N/4)/(N/4) = T(N/8)/(N/8) + 1
...
T(2)/2 = T(1)/1 + 1
Now add up all of the equations. That is, add all of the terms on the left-hand side and
set the result equal to the sum of all of the terms on the right-hand side. Notice that
the term T(N/2)/(N/2) appears on both sides and thus cancels. In fact, virtually all of
the terms appear on both sides and cancel. This procedure is known as telescoping a sum.
After everything is added, the final result is

T(N)/N = T(1)/1 + log N
because all of the other terms cancel and there are log N equations, so the sum of the
1's at the end of those equations equals log N. Multiplying through by N gives the final
answer.
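Written out as a display (in LaTeX notation), the telescoped sum and the final answer
are:

\[
\frac{T(N)}{N} = \frac{T(1)}{1} + \log N
\quad\Longrightarrow\quad
T(N) = N\log N + N = O(N\log N).
\]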

Notice that if we had not divided through by N at the start, the sum would not have
telescoped; this is why the division by N was necessary. An alternative method is to
substitute the recurrence repeatedly into the right-hand side. We start with

T(N) = 2T(N/2) + N
Since we can substitute N/2 into the main equation,

2T(N/2) = 2(2T(N/4) + N/2) = 4T(N/4) + N

we have

T(N) = 4T(N/4) + 2N

Again, by substituting N/4 into the main equation, we see that

4T(N/4) = 4(2T(N/8) + N/4) = 8T(N/8) + N

So, we have

T(N) = 8T(N/8) + 3N

Continuing in this manner, we obtain

T(N) = 2^k T(N/2^k) + kN

Using k = log N, we obtain

T(N) = N T(1) + N log N = N log N + N
The choice of which method to use is a matter of taste. The first method tends to
produce scrap work that fits more easily on a standard 8 1/2-by-11-inch sheet of paper,
which in turn leads to fewer mathematical errors, but it requires a certain amount of
experience to apply. The second method is more of a brute-force approach.

Although merge sort's running time is O(N log N), it has the significant drawback that
merging two sorted lists uses linear extra memory. The additional work involved in
copying to and from the temporary array throughout the algorithm also slows the sort
down noticeably. This copying can be avoided by judiciously switching the roles of the
input array and the temporary array at alternate levels of the recursion. A nonrecursive
variant of merge sort can also be implemented.

When performing a generic sort in Java with a Comparator, an element comparison can be
expensive (because comparisons might not be easily inlined, so the overhead of dynamic
dispatch can slow things down), whereas moving elements is cheap (because reference
assignments, rather than copies of large objects, are being moved). Merge sort uses the
fewest comparisons of all the popular sorting algorithms, which makes it a good
candidate for general-purpose sorting in Java. In fact, it is the algorithm used in the
standard Java library for generic sorting.

In classic C++, on the other hand, comparing objects is often relatively cheap because
the compiler can aggressively perform inline optimisation, while copying objects in a
generic sort can be time-consuming and expensive, depending on the size of the objects
being copied.

In that setting it can be reasonable for an algorithm to perform a few extra comparisons
if doing so greatly reduces the amount of data movement. Quicksort, which is discussed
in the next section, achieves this trade-off and has historically been the most
frequently used sorting routine in C++ libraries. However, because the move semantics
introduced in C++11 can change this dynamic, it remains to be seen whether quicksort
will continue to be the method of choice for sorting in C++ libraries.

5.4 HEAP SORT

Priority queues can be used to sort in O(N log N) time. The algorithm based on this idea
is known as heapsort, and it gives the best Big-Oh running time we have seen so far.

The basic strategy is to first build a binary heap of N elements; this stage takes O(N)
time. We then perform N deletion operations. The elements leave the heap smallest first,
in sorted order. By recording these elements in a second array and then copying the
array back, we sort N elements. Since each deletion takes O(log N) time, the total
running time is O(N log N).

The main problem with this approach is that it uses an extra array, so the memory
requirement is doubled. This could be a problem in some instances. Notice that the extra
time spent copying the second array back to the first is only O(N), so this is not
likely to affect the running time significantly. The problem is space.

A clever way to avoid using a second array makes use of the fact that after each
deletion the heap shrinks by one. Thus the cell that was last in the heap can be used to
store the element that was just deleted. As an example, suppose we have a heap with six
elements. The first deletion produces a1. Now the heap has only five elements, so we can
place a1 in position 6. The next deletion produces a2. Since the heap will now have only
four elements, we can place a2 in position 5.

Using this strategy, after the last deletion the array will contain the elements in
decreasing sorted order. If we want the elements in the more typical increasing sorted
order, we can change the ordering property so that the parent has a larger element than
the child. Thus, we have a (max)heap.

In our implementation we will use a (max)heap but avoid the actual ADT for the sake of
speed. As usual, everything is done in an array. The first step builds the heap in
linear time. We then perform N − 1 deleteMaxes by swapping the last element in the heap
with the first, decrementing the heap size, and percolating down. When the algorithm
terminates, the array contains the elements in sorted order. For instance, consider the
input sequence 31, 41, 59, 26, 53, 58, 97. The resulting heap is shown in Figure 5.6.

Figure 5.6 (Max) heap after build Heap phase

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Figure 5.7 shows the heap that results after the first deleteMax. As the figure implies,
the last element in the heap is 31; 97 has been placed in a part of the heap array that
is technically no longer part of the heap. After five more deleteMax operations, the
heap will actually contain only one element, but the elements left in the heap array
will be in sorted order.

Figure 5.7 Heap after first delete Max

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

One minor complication is that, unlike the binary heap, in which the data begin at array
index 1, the array for heapsort contains data starting at position 0. Thus the code
differs slightly from the binary heap code, although the changes are minor.

5.4.1 Analysis of Heapsort

Building the heap uses fewer than 2N comparisons. In the second phase, the ith deleteMax
uses at most 2⌊log(N − i + 1)⌋ comparisons, for a total of at most 2N log N − O(N)
comparisons (assuming N ≥ 2). Consequently, in the worst case heapsort uses at most
2N log N − O(N) comparisons. As an exercise, you are asked to show that it is possible
for all of the deleteMax operations to achieve their worst case simultaneously.

Experiments have shown that the performance of heapsort is extremely consistent: on
average it uses only slightly fewer comparisons than the worst-case bound suggests. For
many years, nobody had been able to show nontrivial bounds on heapsort's average running
time. The problem, it seems, is that successive deleteMax operations destroy the heap's
randomness, making the probability arguments very complex. Eventually, another approach
proved successful.

Theorem

The average number of comparisons used to heapsort a random permutation of N distinct
items is 2N log N − O(N log log N).

Proof

It suffices to prove the bound for the second phase, since the heap construction phase
uses a linear number of comparisons on average. We assume that the permutation is
1, 2, ..., N. Suppose the ith deleteMax pushes the root element down d_i levels; then it
uses 2d_i comparisons. For heapsort on any input, there is a cost sequence
D: d1, d2, ..., dN that defines the cost of phase 2. That cost is given by
M_D = d1 + d2 + ... + dN, and the number of comparisons used is thus 2M_D.

Let f(N) denote the number of heaps of N items. It can be shown (as an exercise) that
f(N) > (N/(4e))^N (where e = 2.71828...). We will show that only an exponentially small
fraction of these heaps, in particular (N/16)^N, have a cost smaller than
M = N(log N − log log N − 4). When this is shown, it follows that the average value of
M_D is at least M minus a term that is o(1), and thus the average number of comparisons
is at least 2M. Consequently, our basic goal is to show that there are very few heaps
that have small cost sequences.

Because level d_i has at most 2^{d_i} nodes, there are at most 2^{d_i} possible places
that the root element can go for any d_i. Consequently, for any particular sequence D,
the number of distinct deleteMax sequences that correspond to D is at most

2^{d_1} · 2^{d_2} · ... · 2^{d_N} = 2^{M_D}
A simple algebraic manipulation shows that there are at most (log N)^N possible
sequences D, because each d_i can assume any value between 1 and log N. Thus the number
of distinct deleteMax sequences that require cost exactly equal to M is at most the
number of cost sequences of total cost M times the number of deleteMax sequences for
each of those cost sequences. A bound of (log N)^N 2^M follows immediately.

Hence the total number of heaps with a cost sequence less than M is at most the sum,
over all costs i < M, of (log N)^N 2^i, which is less than (log N)^N 2^M.
If we choose M = N(log N − log log N − 4), then the number of heaps that have a cost
sequence less than M is at most (N/16)^N, and the theorem follows from our earlier
observations. Using a more complex argument, it can be shown that heapsort always uses
at least N log N − O(N) comparisons and that there are inputs that achieve this bound.
The average-case analysis can also be improved to 2N log N − O(N) comparisons.

5.4.2 Heap Sort Algorithm

The approach to the problem is as follows: first, convert the array into a heap data
structure using heapify. Then, one by one, remove the root node of the max-heap, replace
it with the last node in the heap, and heapify the root of the heap. Repeat this process
while the size of the heap is greater than 1. (A short C++ sketch of these steps appears
after the list below.)

• Build a heap from the given input array.
• Repeat the following steps until only one element remains in the heap:
• Swap the root element of the heap (the largest element in a max-heap) with the last
element of the heap.
• Remove the last element of the heap, which is now in its correct sorted position.
• Heapify the remaining elements of the heap.
• When the loop finishes, the input array holds the elements in ascending sorted order
(if a min-heap were used instead, the order would have to be reversed to obtain the
sorted array).
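The following is a minimal C++ sketch of the in-place max-heap version described above;
the function names heapSort and percolateDown are illustrative, not taken from any
particular library.

#include <algorithm>
#include <vector>

// Percolate a[i] down within the first n elements of a (0-indexed max-heap).
void percolateDown(std::vector<int>& a, int i, int n) {
    while (2 * i + 1 < n) {                 // while node i has at least one child
        int child = 2 * i + 1;              // left child
        if (child + 1 < n && a[child + 1] > a[child])
            ++child;                        // pick the larger child
        if (a[child] <= a[i]) break;        // heap order restored
        std::swap(a[i], a[child]);
        i = child;
    }
}

void heapSort(std::vector<int>& a) {
    int n = static_cast<int>(a.size());
    for (int i = n / 2 - 1; i >= 0; --i)    // buildHeap phase: O(N)
        percolateDown(a, i, n);
    for (int i = n - 1; i > 0; --i) {       // N - 1 deleteMaxes
        std::swap(a[0], a[i]);              // move the current maximum to the end
        percolateDown(a, 0, i);             // restore heap order on the prefix
    }
}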

5.4.3 complexity Analysis of Heap Sort

The time complexity is O(N log N).

The auxiliary space required is O(log N) for a recursive implementation, because of the
call stack; an iterative implementation needs only O(1) auxiliary space.
Important things to keep in mind with reference to the Heap Sort method:

• Heap sort is an in-place algorithm.
• The standard implementation is not stable, although it can be made stable.
• It is typically about two to three times slower than a well-implemented quicksort; the
main cause of the slowdown is poor locality of reference.

The following are some of the benefits that come with using the Heap Sort:

Efficiency: heap sort runs in O(n log n) time in all cases, which makes it an efficient
way to sort very large datasets. The log n factor comes from the height of the binary
heap, and it ensures that the algorithm performs well even for a large number of items.

Low memory consumption: apart from the space needed to hold the list of items being
sorted, heap sort requires no additional memory to work correctly, so its memory usage
can be kept to a minimum.

Simplicity: compared with other equally efficient sorting algorithms, heap sort is
easier to understand because it does not rely on advanced concepts such as recursion.

The following are some of the drawbacks of using heap sort:

• Costly: heap sort can be an expensive operation in practice.
• Unstable: heap sort is not stable, so the relative order of equal elements may change.
• Not efficient for complex data: heap sort is not especially efficient when working
with highly complex data, despite being effective in general.

5.5 LINEAR SEARCH

In linear probing, f is taken to be a linear function of i, typically f(i) = i. This
amounts to trying cells sequentially (with wraparound) in search of an empty cell.
Figure 5.8 shows the result of inserting the keys 89, 18, 49, 58, 69 into a hash table
using the same hash function as in the previous example and the collision resolution
strategy f(i) = i.

The first collision occurs when 49 is inserted; it is put in the next available spot,
namely spot 0, which is open. The key 58 collides with 18, 89, and then 49 before an
empty cell is found three away. The collision for 69 is handled in a similar manner. As
long as the table is big enough, a free cell can always be found, but the time required
to find it can become quite large. Worse, even when the table is relatively empty,
blocks of occupied cells start forming.
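A minimal C++ sketch of insertion with linear probing follows; the class and function
names (LinearProbingTable, insert) and the modular hash function are illustrative
assumptions used only to make the probing sequence f(i) = i concrete.

#include <optional>
#include <vector>

// A toy hash table of ints using open addressing with linear probing.
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t size) : cells(size) {}

    bool insert(int key) {
        std::size_t start = hash(key);
        // Probe cells start, start+1, start+2, ... with wraparound (f(i) = i).
        for (std::size_t i = 0; i < cells.size(); ++i) {
            std::size_t pos = (start + i) % cells.size();
            if (!cells[pos]) {               // empty cell found
                cells[pos] = key;
                return true;
            }
            if (*cells[pos] == key)          // key already present
                return false;
        }
        return false;                        // table is full
    }

private:
    std::size_t hash(int key) const {
        return static_cast<std::size_t>(key) % cells.size();
    }
    std::vector<std::optional<int>> cells;
};

With a table of size 10, inserting 89, 18, 49, 58, 69 with this sketch reproduces the
behaviour described above: 49 wraps around to spot 0, and 58 and 69 each probe three
occupied cells before finding an empty one.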

This effect is known as primary clustering: any key that hashes into the cluster will
require several attempts to resolve the collision before it can be inserted, and it will
then add to the cluster. This discussion also touches on linear data structures more
generally. A linear data structure is one in which the components are organised so that
they can be read and written in a linear (sequential) fashion. Examples of linear data
structures include arrays, stacks, queues, and linked lists.

When a linear data structure is stored in memory, it is kept in sequential memory
locations, so there is a linear relationship between the stored components. Data
structures that do not arrange their contents in a linear pattern are referred to as
non-linear.

If the components of a data structure are not maintained in a sequential arrangement,
the structure is said to be non-linear. A tree is one example of a non-linear data
structure, as is a graph. Non-linear data structures are stored in memory at arbitrary
(non-contiguous) locations rather than sequentially.

Although we will not perform the calculations here, it can be shown that the expected
number of probes using linear probing is roughly (1/2)(1 + 1/(1 − λ)^2) for insertions
and unsuccessful searches, and (1/2)(1 + 1/(1 − λ)) for successful searches, where λ is
the load factor of the table. It is easy to see from the code that insertions and
unsuccessful searches require the same number of probes. A moment's thought also
suggests that, on average, a successful search should take less time than an
unsuccessful search.

The corresponding formulas, if clustering were not a problem, are easy to derive. We
will assume a very large table and that each probe is independent of the previous
probes. These assumptions are satisfied by a random collision resolution strategy and
are reasonable unless λ is very close to 1. First, we derive the expected number of
probes in an unsuccessful search. This is just the expected number of probes until we
find an empty cell.

Since the fraction of empty cells is 1 − λ, the number of cells we expect to probe is
1/(1 − λ). The number of probes for a successful search of a particular element is equal
to the number of probes required when that element was inserted, and an element is
inserted as the result of an unsuccessful search. Thus, we can use the cost of an
unsuccessful search to compute the average cost of a successful search.

The caveat is that λ changes from 0 to its current value, so earlier insertions are
cheaper and should bring the average down. For instance, in the table shown in
Figure 5.8, λ = 0.5, but the cost of accessing 18 is determined at the time 18 is
inserted; at that point, λ = 0.2. Since 18 was inserted into a relatively empty table,
accessing it should be easier than accessing a recently inserted element such as 69. We
can estimate the average by using an integral to calculate the mean value of the
insertion time, obtaining

I(λ) = (1/λ) ∫ from 0 to λ of 1/(1 − x) dx = (1/λ) ln(1/(1 − λ))
These formulas are clearly better than the corresponding formulas for linear probing.
Clustering is not only a theoretical problem; it actually occurs in real
implementations.

If λ = 0.75, the formula above indicates that 8.5 probes are expected for an insertion
using linear probing. If λ = 0.9, 50 probes are expected, which is unreasonable. This
compares with 4 and 10 probes for the respective load factors if clustering were not a
problem. These calculations make it clear that linear probing is not a good strategy to
apply if the table is expected to be more than half full. If λ = 0.5, however, an
insertion requires only 2.5 probes on average, and a successful search requires only 1.5
probes on average; both figures are far more acceptable.
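These figures follow directly from the insertion/unsuccessful-search formula above; in
LaTeX notation:

\[
\tfrac{1}{2}\Bigl(1 + \tfrac{1}{(1-0.75)^2}\Bigr) = \tfrac{1}{2}(1+16) = 8.5,
\qquad
\tfrac{1}{2}\Bigl(1 + \tfrac{1}{(1-0.9)^2}\Bigr) = \tfrac{1}{2}(1+100) = 50.5 \approx 50.
\]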

A linear search, also known as a sequential search, is a method for finding a particular
item in a list. This type of search algorithm examines each item on the list in turn,
either until a match is found or until the entire list has been examined. In the worst
case, a linear search takes linear time and makes at most n comparisons, where n is the
number of items on the list. If every element is equally likely to be searched for,
linear search makes (n + 1)/2 comparisons on average, although this average changes if
the search probabilities of the elements differ. Many other search algorithms and
schemes, such as binary search and hash tables, provide a noticeably faster search
operation for all but the shortest lists, so linear search is often not a practical
choice when compared with them.
The following steps implement Linear Search properly:

1. Use a for loop to examine the items contained in the array.
2. On each iteration of the loop, compare the search element with the current element of
the list. If they are equal, return the index of that element in the array.
3. If the current element does not match, proceed to the next item on the list.

5.5.1 Linear Search in Data Structure

Imagine that you have been given the task of finding a particular volume among the books
on your shelves, and that you have only a short amount of time to do it. If the stack of
books is not organised in any particular way, the only way to find the volume is to read
the titles of the books in the stack one by one. You stop as soon as you find the book
you are looking for and do not examine any further books. This real-life scenario
illustrates the concept of linear search.

Linear search is the most fundamental of the various methods for searching a data
structure. It is used to search for an element in a linear data structure such as an
array or a linked list. The Linear Search method compares the search element with each
individual element of the data structure, and if the search element is found, the
algorithm returns the position of that element. In computer science, searching is
considered one of the most important operations; a searching algorithm is a step-by-step
method for finding a specific piece of data within a large collection of data.

Every search algorithm requires the user to provide a search key in order to complete
its operation. Computer science has produced a broad range of search algorithms, and the
efficiency of a search depends on the manner in which the data being searched is
organised and used. These algorithms are separated into two categories according to the
type of search operation each performs.

 Sequential Search: This method does a search on each item in the list or array
one at a time as it proceeds in a predetermined order through the list or array.
(The rules that apply while carrying out a linear search also apply in this
scenario.)

 Interval Search: these algorithms are designed to search sorted data structures. They
are called interval searches because they repeatedly narrow the interval being examined.
Because these algorithms repeatedly aim for the middle of the search structure and
divide the search space in half, they are far more efficient than the standard Linear
Search. Binary search is one example.

5.5.2 Algorithm of Linear Search

The activity of searching refers to the process of locating a certain item that is included
in a list of other items. If the element in question cannot be identified in the list, the
method is considered to have been unsuccessful, and information on its location is not
returned. If the element is found, then the process is regarded to have been successful,
and the return value for this process is the location of the element.

The objective of Linear Search is to locate the index of the search key inside a given
array. The search begins by comparing the search key with the first element of the array
or list. If the first element does not match the search key, the search moves on to the
second element, and so on, until either a match is found or all of the elements have
been examined. If a match is found, its index is returned; otherwise, the search reaches
the end of the array or list, which signals that the search key is not present. An
example of the Linear Search method follows, so that you can better understand how it
works.
Take, for instance:

Suppose the search key is 30 and the given array contains the elements 50, 90, 30, 70,
and 60. We walk through the array iteratively and check each element against the search
key. The first element of the array is 50; since 50 does not equal 30, we move on to the
next element. The next element is 90, which is also not equal to 30, so we move on
again. The next element is 30, which matches the search key, so we return the index of
that element in the array.

In the previous example the search key was present in the array. Now consider a case in
which the search key is not present; say the search key is 10. We compare each element
of the array with the search element and fail to find a match with 50, 90, 30, 70, or
60, reaching the end of the array. As a result, either the value -1 is returned or the
message "element is not present in the array" is displayed, indicating that the search
key is not in the array.

The Explained Code and What It Means:

• The Linear Search method searches for a given element or value in an array by
traversing the array from its beginning to its end until the desired element or value is
found.
• The array is searched one element at a time; if the key element being looked for is
found, its position is returned. If the key element is not present in the array, the
value -1 is returned.
• In this C program we have not written a separate function for the Linear Search;
instead, the check for the presence of an element in the array is done in the main
function.
• We traverse the array beginning at index 0 and moving upward in index order; when the
element is found we break out of the loop and print its position in the array. If the
requested element is not present, we simply display the message "Element is not present
in the array".
• If we had written a separate Linear Search function, it would print "Element is not
present in array" or return -1 when the element could not be located.
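Since the program itself is not reproduced here, the following is a minimal sketch
consistent with the description above (written in C++ rather than C, with illustrative
variable names); it performs the search directly in main, as described.

#include <iostream>

int main() {
    int a[] = {50, 90, 30, 70, 60};          // the array from the example above
    int n = sizeof(a) / sizeof(a[0]);
    int key = 30;                            // the search key
    int position = -1;                       // -1 means "not found"

    for (int i = 0; i < n; ++i) {
        if (a[i] == key) {                   // compare each element with the key
            position = i;
            break;                           // stop at the first match
        }
    }

    if (position != -1)
        std::cout << "Element found at index " << position << '\n';
    else
        std::cout << "Element is not present in the array\n";
    return 0;
}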

5.5.3 Time Complexity of Linear Search in C Program

The time complexity of an algorithm describes how long the algorithm takes to process
the input data set it is given. The number of operations the algorithm performs can be
roughly estimated from the length of its input. Time complexity does not measure the
absolute running time; rather, it describes how the running time grows or shrinks as the
number of operations the algorithm performs changes. Let us look at the complexity of
Linear Search in three cases: the best case, the average case, and the worst case.

Case            Time Complexity
Best case       O(1)
Average case    O(n)
Worst case      O(n)

• Best-case complexity: the best case for Linear Search occurs when the element we are
looking for is located at the very beginning of the array. In that case the time
complexity of linear search is O(1).
• Average-case complexity: the average-case time complexity of linear search is O(n).
• Worst-case complexity: the worst case for Linear Search occurs when the element we are
seeking is located at the very last position in the array, or when the element is not
present in the array at all, in which case we must search through the entire array. The
worst-case time complexity of linear search is therefore O(n).

Linear Search has a time complexity of O(n) because it goes through the array in order
and compares each member of the array at most once.

5.6 BINARY SEARCH

This chapter discusses search trees, a kind of tree data structure that can be used to
implement ordered maps and dictionaries; search trees form the foundation for all of the
structures presented in this chapter. Recall that a map is a collection of key-value
entries, with each value associated with a distinct key. Keeping this in mind should
make it easier to understand what a map is. A dictionary differs from a map in that it
allows multiple values to share the same key. Both kinds of structures are covered in
this chapter, with the primary emphasis on maps.

We will assume that both dictionaries and maps provide a pointer-like object called an
iterator, which allows us to refer to and enumerate the entries of the structure. There
is a special sentinel iterator called end, which, by convention, refers to an imaginary
element positioned just beyond the last element of the structure; it is used to indicate
that an iteration has passed the final element.

Given an iterator p, the associated entry can be accessed with the notation *p, and its
key and value can be accessed with p->key() and p->value(), respectively. We also assume
that the keys can be compared by overloading the relational less-than operator in C++,
which allows a total order on the keys to be specified.

The following sections explain how to walk a binary search tree to print its values in
sorted order, how to search for a value within a binary search tree, how to find the
minimum or maximum element, how to find the predecessor or successor of an element, and
how to insert into or delete from a binary search tree.

5.6.1 Binary Search Trees and Ordered Maps

The keys stored at the nodes of a binary search tree T allow us to perform a search by
making a comparison at each internal node v; the search can continue to the left child
or the right child of v, or it can terminate at v itself. For the purposes of this
discussion, we assume that binary search trees are nonempty proper binary trees. That
is, we store items only at the internal nodes of a binary search tree, and the external
nodes serve only as "placeholders." This approach simplifies several of our search and
update algorithms. As an exception, we could allow improper binary search trees.

These trees make better use of the available storage space, but at the cost of more
complicated search and update operations. To perform the find(k) operation on a map M
implemented with a binary search tree T, we view the tree T as a decision tree.

In this case, the question asked at each internal node v of T is whether the search key
k is less than, equal to, or greater than the key stored at node v, denoted key(v). If
the answer is "smaller," the search continues in the left subtree. If the answer is
"equal," the search terminates successfully. If the answer is "greater," the search
continues in the right subtree. Finally, if we reach an external node, the search
terminates unsuccessfully.
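A minimal C++ sketch of this decision-tree style of search is shown below; the node
layout and the find helper are illustrative assumptions rather than the map interface
described above (in particular, it returns a node pointer instead of an iterator and
stores an item at every node rather than only at internal nodes).

struct Node {
    int key;
    Node* left  = nullptr;   // subtree with smaller keys
    Node* right = nullptr;   // subtree with larger keys
};

// Return a pointer to the node holding k, or nullptr if k is not in the tree.
Node* find(Node* v, int k) {
    while (v != nullptr) {
        if (k < v->key)
            v = v->left;     // "smaller": continue in the left subtree
        else if (k > v->key)
            v = v->right;    // "greater": continue in the right subtree
        else
            return v;        // "equal": search terminates successfully
    }
    return nullptr;          // reached an external position: unsuccessful
}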

5.6.2 Analysis of Binary Tree Searching

We focus on the worst case. The running time of searching in a binary search tree T is
easy to analyse. The TreeSearch algorithm is recursive, and each recursive call performs
a constant number of primitive operations.

Figure 5.8: The running time of searching in a binary search tree.

Source: Data structures and algorithm analysis in C++ data collection and processing
through by Mark Allen Weiss in March (2018)

Each recursive call of TreeSearch is made on a child of the previous node. That is,
TreeSearch is called on the nodes of a path of T that starts at the root and goes down
one level at a time. Hence the number of such nodes is bounded by h + 1, where h is the
height of T. Since we spend O(1) time per node encountered in the search, the find
operation on map M runs in O(h) time, where h is the height of the binary search tree T
used to implement M. (See Figure 5.9 for an illustration.)

5.6.3 Describing binary search

When describing an algorithm to another human being, an incomplete description is often
good enough. A recipe for a cake might omit some details: it may assume that you know
how to open the refrigerator to get the eggs out and that you know how to crack an egg.
People can intuitively fill in such gaps, but computer programs cannot. That is why we
need to describe computer algorithms completely.

To implement an algorithm in a programming language, you need to understand it down to
the details. What are the inputs to the problem? What are the outputs? What variables
should be created, and what initial values should they have? What intermediate steps
should be taken to compute other values and, ultimately, the output? Do any of these
steps repeat instructions that could be written more simply using a loop?

Let's look closely at how to describe binary search carefully. The main idea of binary
search is to keep track of the current range of reasonable guesses. Suppose, in the
spirit of the guessing game, that I'm thinking of a number between one and one hundred.
If you have already guessed 25 and I told you my number is higher, and you have already
guessed 81 and I told you my number is lower, then the only reasonable guesses are the
numbers from 26 to 80.

We can keep track of the set of reasonable guesses using a few variables. Let the
variable min be the current minimum reasonable guess for this round, and let the
variable max be the current maximum reasonable guess. The input to the problem is the
number n, the highest possible number that your opponent may be thinking of. We assume
that the lowest possible number is one, although it would be easy to modify the method
to take the lowest possible number as a second input.

Here is a step-by-step description of how to use binary search to play the guessing
game:

1. Let min = 1 and max = n.
2. Guess the average of max and min, rounded down so that it is an integer.
3. If you guessed the number, stop. You found it!
4. If the guess was too low, set min to be one larger than the guess.
5. If the guess was too high, set max to be one smaller than the guess.
6. Go back to step two.

That explanation might be made even more specific if the inputs and outputs of the
algorithm were described in greater detail, and if it were made obvious what was meant
by commands such as "guess a number" and "stop." But I think we've covered enough
ground for the time being. In the next section, we will look at how to utilise binary
search on an array, and then we will explore how to translate descriptions of algorithms
into actual code that can be used.
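Anticipating the array version mentioned above, here is a minimal C++ sketch of binary
search on a sorted array; the min/max variable names mirror the guessing-game
description, and the function name binarySearch is an illustrative assumption.

#include <vector>

// Return the index of key in the sorted vector a, or -1 if key is not present.
int binarySearch(const std::vector<int>& a, int key) {
    int min = 0;                              // lowest reasonable index
    int max = static_cast<int>(a.size()) - 1; // highest reasonable index
    while (min <= max) {
        int guess = min + (max - min) / 2;    // midpoint, rounded down
        if (a[guess] == key)
            return guess;                     // found it
        else if (a[guess] < key)
            min = guess + 1;                  // guess was too low
        else
            max = guess - 1;                  // guess was too high
    }
    return -1;                                // the range of reasonable guesses is empty
}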

CHAPTER 6

HASHING AND FILE STRUCTURES

Hashing and file structures are two of the most fundamental concepts in computer science
and information technology; much of cybersecurity depends on both of them. They play a
vital role in the processes of storing, retrieving, and
managing data. The process of hashing, which involves converting data into a string of
characters with a predetermined length, is often employed in the process of creating
efficient data structures with the goal of attaining rapid data access. In the procedures
of data deduplication, data indexing, and data security, it is a method that is absolutely
necessary. In addition, it is employed in a broad number of applications, such as the
storage of passwords, the implementation of cryptographic techniques, and the creation
of distributed databases. On the other hand, file structures decide how data is arranged
and saved on storage devices, which has an influence on the efficiency and
functionality of operations involving the storage and retrieval of data.

This is because file structures govern how data is written to the file. In order to develop
dependable and efficient software systems, database systems, and storage systems, it is
essential to have a strong grasp of the concepts and implementations of hashing and
file structures. During the course of this inquiry, we are going to delve into the intricate
world of hashing and file structures, shedding light on their significance, applications,
and the many algorithms and data structures that support them in the process. You will
be equipped with the knowledge and abilities necessary to harness the potential of data
management and storage in the digital era after completing this in-depth exploration of
hashing and file structures. This course will teach you all you need to know, regardless
of whether you are an experienced software engineer, a data scientist, or a new
programmer.

Hashing and file structures are fundamental concepts in computer science and information
technology, central to the management, storage, and retrieval of data. Hashing, which
converts data into a fixed-length string of characters, serves a broad range of
purposes, among the most important of which are the protection of sensitive information,
the improvement of data-retrieval efficiency, and the maintenance of data integrity. In
the context of file structures, hashing is widely used as a basic building block of a
broad range of data storage technologies, including hash tables, databases, and file
systems.

A file structure is the organisational arrangement of data on storage devices such as traditional hard drives, modern solid-state drives, and distributed cloud storage solutions. A thorough grasp of how these ideas interact with one another, and of how they are applied in real-world scenarios, is essential for designing effective and scalable systems that can manage huge amounts of data while preserving the data's integrity and accessibility. In this
exploration of hashing and file structures, we will delve into the underlying principles,
methodology, and practical applications that drive these critically essential parts of
computer science and data management.

6.1 HASHING: THE SYMBOL TABLE

The concept of hashing is fundamental to both the field of computer science and the
management of data. It offers a reliable approach to obtaining and storing data in an
efficient manner in a variety of formats. When it comes to the realm of symbol tables,
which is one of its most well-known applications, it plays an important role not only in
the structuring of the data but also in the retrieval of that data. This is one of the most
well-known uses of this technology. Symbol tables are fundamental data structures that
are employed in the process of associating values with keys. This makes it simple not
just to properly store information but also to search for it, modify it, and retrieve it. The
process of hashing transforms these keys into hashed values of a certain size, which
afterwards permits rapid access to the data that are associated with them.

This technology not only enables searches to be completed in the blink of an eye but
also optimises the usage of memory, which makes it an indispensable resource for a
wide variety of applications. Applications such as database management systems,
network protocols, and programming languages and compilers are included in this
category. We will learn the principles, technique, and real-world applications that make
hashing a cornerstone of modern computing. Hashing makes it feasible to manage large and dynamic datasets smoothly, and this will be the focus of our exploration of hashing within the context of symbol tables: shedding light on how hashing generates an identifier for each symbol in the table.

6.1.1 General Idea

An item might, for instance, consist of a string that serves as the key (such as a name in a large personnel record) together with other data members (such as a phone number). We shall refer to the table size as TableSize, keeping in mind that it is part of the hash data structure and not merely some variable floating around globally. The conventional arrangement is to have the table run from 0 to TableSize - 1; the reason for this convention will become clear shortly.

Each key is mapped to some number in the range 0 to TableSize - 1 and placed in the appropriate cell. The mapping is called a hash function, and ideally it should be simple to compute and should map any two distinct keys to different cells. This is clearly impossible, because there is a finite number of cells and an essentially unlimited supply of keys, so we seek a hash function that distributes the keys evenly among the cells.

The symbol table is an essential data structure created and maintained by compilers. Its purpose is to record information about the various entities that appear in a program, such as variable names, function names, objects, classes, and interfaces. Both the analysis and the synthesis phases of a compiler make use of the symbol table.

The following applications may be suitable for a symbol table; however, this will
depend on the particular language that is being used:

• To store the names of all entities in one place, in a structured and well-organised form.
• To verify that a variable has been declared before it is used.
• To implement type checking, by verifying that assignments and expressions in the source code are semantically correct.
• To determine the scope of a name (scope resolution).

A symbol table is nothing more than a table, and it can take the shape of either a linear
table or a hash table depending on the specific needs of the application. It keeps a record
of each name using the following structure for each entry.
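The exact layout of an entry depends on the compiler; as the original table layout is not reproduced here, the following is a minimal C++ sketch with hypothetical field names (name, type, scopeLevel, attributes).

    #include <string>

    // A minimal sketch of one symbol-table entry. The field names are
    // illustrative assumptions; real compilers store whatever the
    // analysis and synthesis phases need.
    struct SymbolEntry
    {
        std::string name;        // the identifier as it appears in the source
        std::string type;        // e.g. "int", "float", "function"
        int         scopeLevel;  // where the name is visible
        std::string attributes;  // value, storage class, and so on
    };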

6.1.2 Implementation

If a compiler has to process only a small amount of data, the symbol table can be implemented as an unordered list rather than as a more elaborate structure. Although such an implementation is easy to write, it is suitable only for very small tables. A symbol table can be implemented in any of the following ways:

• Linear list (sorted or unsorted)
• Binary search tree
• Hash table

Symbol tables are most often implemented as hash tables: the symbol in the source code is used as the key, and the information associated with the symbol is the value stored for that key in the table.

insert()

The insert() operation is used most frequently by the analysis phase, the first half of the compiler, which identifies tokens and stores their names in the table. With this operation, the symbol table is extended with information about the unique names found in the source code. The format or data structure used to store the names depends on the particular compiler.

The information associated with a symbol in the source code is called its attributes: the symbol's value, state, scope, and type. The symbol and its attributes are passed as arguments to insert(), which saves them in the symbol table.
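A hedged sketch of such an insert() operation is given below, using std::unordered_map as the underlying hash table. The class name SymbolTable, the entry type SymbolEntry, and the lookup helper are illustrative assumptions rather than any particular compiler's actual interface.

    #include <string>
    #include <unordered_map>

    struct SymbolEntry
    {
        std::string name;
        std::string type;
        int         scopeLevel;
        std::string attributes;
    };

    // A minimal sketch: the symbol's name is the key of the hash table,
    // and the entry holding its attributes is the stored value.
    class SymbolTable
    {
      public:
        void insert(const std::string& name, const SymbolEntry& entry)
        {
            table[name] = entry;          // add or overwrite the symbol's attributes
        }

        const SymbolEntry* lookup(const std::string& name) const
        {
            auto it = table.find(name);
            return it == table.end() ? nullptr : &it->second;
        }

      private:
        std::unordered_map<std::string, SymbolEntry> table;
    };

Here the symbol's name acts as the hash key, and its attributes travel with it as the stored value, exactly as described above.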

6.2 HASH FUNCTION

If the input keys are integers, simply returning Key mod TableSize is usually a reasonable strategy, unless Key happens to have some undesirable properties. In that case, the choice of hash function needs careful consideration. For instance, if the table size is 10 and all of the keys end in zero, the standard hash function is a poor choice. It is usually a good idea to ensure that the table size is prime, for reasons we will discuss later.

Choosing a prime table size also helps us avoid situations such as the one just described. When the input keys are random integers, this function is simple to compute and also distributes the keys evenly. Because keys are frequently strings, the hashing method for strings deserves careful thought. One option is simply to add up the ASCII values of the characters in the string; a function in this style is shown in Figure 6.1.

The hash function shown in Figure 6.1 is simple to implement and computes an answer quickly. However, if the table size is large, the function does not distribute the keys well. For instance, suppose that TableSize = 10,007 (10,007 is prime) and that all the keys are at most eight characters long. Since an ASCII character has an integer value of at most 127, the hash function can only assume values between 0 and 1,016, which is 127 * 8. This is clearly not an equitable distribution! Another hash function is shown in Figure 6.2.

This hash function assumes that Key has at least three characters. The value 27 represents the number of letters in the English alphabet plus the blank, and 729 is 27 squared. The function examines only the first three characters; if these were random and the table size were 10,007 as before, we would expect a reasonably equitable distribution. Unfortunately, English is not random.

Although there are 26^3 = 17,576 possible combinations of three characters (ignoring blanks), a check of a reasonably large online dictionary reveals that the number of distinct three-letter combinations that actually occur is only 2,861. Even if none of these combinations collided, only about 28 percent of the table could ever be hashed to. Thus, although easily computable, this function is not appropriate if the hash table is reasonably large.

Figure 6.1 A simple hash function

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)
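The body of Figure 6.1 is not reproduced in this copy, so the following is a minimal sketch of a hash function in the style the text describes: the character codes of the key are summed and the sum is reduced modulo the table size. The function name simpleHash is an assumption.

    #include <string>

    // A simple hash function: add up the character codes of the key and
    // reduce the sum modulo the table size. Easy to write, but it spreads
    // keys poorly when the table is large (see the discussion below).
    unsigned int simpleHash(const std::string& key, int tableSize)
    {
        unsigned int hashVal = 0;
        for (char ch : key)
            hashVal += static_cast<unsigned char>(ch);
        return hashVal % tableSize;
    }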

Figure 6.2 Another possible hash function—not too good

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)
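Figure 6.2 is likewise not reproduced; a sketch of the kind of function being discussed, which combines only the first three characters with weights 1, 27, and 729 and assumes the key has at least three characters, could read as follows.

    #include <string>

    // Another possible hash function, not too good: only the first three
    // characters are used, weighted by 1, 27 and 729 (27 * 27), so at most
    // 26^3 = 17,576 table positions can ever be produced, and far fewer in
    // practice for English words.
    unsigned int threeCharHash(const std::string& key, int tableSize)
    {
        return (key[0] + 27 * key[1] + 729 * key[2]) % tableSize;
    }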

Figure 6.3 shows a third attempt at a hash function. This function involves all the characters in the key and can generally be expected to distribute well: it computes

    sum over i = 0 to KeySize - 1 of Key[KeySize - i - 1] * 37^i

and brings the result into the proper range. The code computes this polynomial function of 37 by using Horner's rule. For instance, another way of computing h_k = k_0 + 37*k_1 + 37^2*k_2 is by the formula h_k = ((k_2) * 37 + k_1) * 37 + k_0. Horner's rule extends this to an nth-degree polynomial.

The hash function uses unsigned ints so that no negative values can appear, and it takes advantage of the fact that overflow of unsigned values is allowed (the value simply wraps around). The hash function described in Figure 6.3 is not necessarily the best with respect to table distribution, but it has the merit of being extremely simple and reasonably fast. If the keys are very long, the hash function will take too long to compute.

A common practice in such cases is to use only some of the characters. The choice of which characters to use would then depend on properties of the keys, such as their length. For instance, the keys might be complete street addresses: the hash function could include a couple of characters from the street address, a couple from the city name, and a couple from the ZIP code. Some programmers implement their hash function by using only the characters in the odd positions, reasoning that the time saved in computing the hash function more than makes up for a slightly less even distribution.

Figure 6.3 A good hash function

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)
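The listing in Figure 6.3 is not reproduced here; a sketch of a hash function in this style, evaluating the polynomial in 37 with Horner's rule and relying on unsigned overflow simply wrapping around, is shown below.

    #include <string>

    // A good hash function: treat the key as a polynomial in 37 and evaluate
    // it with Horner's rule. unsigned int is used deliberately so that
    // overflow wraps around instead of producing a negative value.
    unsigned int hornerHash(const std::string& key, int tableSize)
    {
        unsigned int hashVal = 0;
        for (char ch : key)
            hashVal = 37 * hashVal + static_cast<unsigned char>(ch);
        return hashVal % tableSize;
    }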

The main programming detail that remains is collision resolution. If a newly inserted element hashes to the same value as an already inserted element, we have a collision and need to resolve it. There are several ways of dealing with this. We will first discuss two of the simplest, separate chaining and open addressing, and then look at some alternatives that have been developed more recently.

6.2.1 Separate Chaining

The first strategy, commonly known as separate chaining, is to keep a list of all elements that hash to the same value. We can use the list implementation provided by the Standard Library. However, because these lists contain links and therefore consume space that is not strictly necessary, it is best to avoid them when space is at a premium. For this example, we assume that the keys are the first ten perfect squares and that the hashing function is simply hash(x) = x mod 10. (The table size is not prime, but it is used here for simplicity.) The resulting separate chaining hash table is shown in Figure 6.4.

Figure 6.4 A separate chaining hash table

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)

To perform a search, we use the hash function to determine which list to traverse and then search that list for the item. If duplicates are expected, an extra data member is usually kept and incremented whenever a match occurs. To perform an insert, we check the appropriate list to see whether the element is already present; if it is, we do nothing. If the element is new, it can be inserted at the front of the list, both because this is convenient and because newly inserted elements are often the ones most likely to be accessed in the near future.

The class interface for a separate chaining implementation is shown in Figure 6.5. The hash table stores an array of linked lists, which is allocated in the constructor.

Figure 6.5 Type declaration for separate chaining hash table

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)
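Since the declaration in Figure 6.5 is not reproduced, the following is a condensed sketch of what such an interface might look like. The member names (makeEmpty, contains, insert, remove, rehash, myhash, theLists) follow the conventions used in the surrounding text, and the default table size of 101 is an illustrative assumption. Only the declaration is shown here; sketches of the member bodies appear after the later figures.

    #include <list>
    #include <vector>

    // A condensed sketch of a separate-chaining hash table interface.
    template <typename HashedObj>
    class HashTable
    {
      public:
        explicit HashTable(int size = 101) : theLists(size), currentSize(0) {}

        void makeEmpty();
        bool contains(const HashedObj& x) const;
        bool insert(const HashedObj& x);
        bool remove(const HashedObj& x);

      private:
        std::vector<std::list<HashedObj>> theLists;   // the array of linked lists
        int currentSize;

        void rehash();
        size_t myhash(const HashedObj& x) const;
    };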

The class interface illustrates a syntax point: in the declaration of theLists, the two > characters appear together with no space between them. Prior to C++11, >> was always recognised as the shift-operator token, so the declaration required a space between the two > signs; as of C++11 this is no longer the case. Just as the binary search tree works only for objects that are Comparable, the hash tables in this chapter work only for objects that provide a hash function and equality operators (operator== or operator!=, or possibly both).

Our hash functions take the object being hashed as their only parameter and return an appropriate integral type; they do not take the table size as a second parameter. Function objects are the preferred mechanism for specifying hash functions, and C++11 adopts this technique for its own hash tables: in C++11, hash functions can be expressed through the function object template std::hash.

Default implementations of this template are provided for standard types such as int and string; thus the hash function described in Figure 6.3 could serve as the implementation for strings. The unsigned integral type size_t represents the size of an object and is therefore guaranteed to be able to store an array index. A class that implements a hash table algorithm can call the generic hash function object to obtain a value of type size_t and then scale the result into a suitable array index. In our hash tables, this is reflected in the private member function myhash, shown in Figure 6.6.

Figure 6.6 myhash member function for hash tables

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)
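Continuing the HashTable sketch given after Figure 6.5 (and therefore not compilable on its own), a myhash member in the spirit of Figure 6.6 might look like this:

    #include <functional>   // for std::hash

    // A sketch of the private myhash member: the generic hash function
    // object produces a size_t, which is then scaled into a valid index
    // for the array of lists.
    template <typename HashedObj>
    size_t HashTable<HashedObj>::myhash(const HashedObj& x) const
    {
        static std::hash<HashedObj> hf;
        return hf(x) % theLists.size();
    }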

Figure 6.7 illustrates an Employee class that can be stored in the generic hash table, using the name member as the key. The Employee class satisfies the HashedObj requirements by providing equality operators and a hash function object.

The code to implement makeEmpty, contains, and remove is shown in Figure 6.8.
Figure 6.7 Example of a class that can be used as a HashedObj

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)
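Figure 6.7 itself is not reproduced; a sketch of an Employee-like class in the style described, where equality compares names and a specialization of the hash function object delegates to the string hash, might read as follows (the member names are assumptions).

    #include <functional>
    #include <string>

    // A sketch of a class that can be stored in the generic hash table:
    // equality is defined on the name member, and hashing delegates to the
    // standard string hash.
    class Employee
    {
      public:
        explicit Employee(std::string n = "", double s = 0.0)
            : name(std::move(n)), salary(s) {}

        const std::string& getName() const { return name; }

        bool operator==(const Employee& rhs) const { return name == rhs.name; }
        bool operator!=(const Employee& rhs) const { return !(*this == rhs); }

      private:
        std::string name;
        double      salary;
    };

    namespace std
    {
        template <>
        struct hash<Employee>
        {
            size_t operator()(const Employee& item) const
            {
                return hash<string>{}(item.getName());
            }
        };
    }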

Figure 6.8 makeEmpty, contains, and remove routines for separate chaining hash table

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)

Next comes the insertion routine. If the item to be inserted is already present, we do nothing; otherwise, we add it to the list (see Figure 6.9). The element could be placed anywhere in the list, but using push_back is the most convenient choice in our case. whichList is a reference variable.

For the purpose of resolving the collisions, any approach other than linked lists might
be utilised; a binary search tree or even an additional hash table would do the task.
Basic separate chaining, on the other hand, does not make any attempts to do actions
that are unduly complex since we assume that all of the lists ought to be short if the
table is big and the hash function is efficient.

We define the load factor, λ, of a hash table to be the ratio of the number of elements in the hash table to the table size. In the example above, λ = 1.0. The average length of a list is λ. The effort required to perform a search is the constant time needed to evaluate the hash function plus the time needed to traverse the list.

In an unsuccessful search, the number of nodes to examine is λ on average. A successful search requires that about 1 + λ/2 links be traversed. To see this, notice that the list being searched contains the one node that stores the match plus zero or more other nodes. The expected number of "other nodes" in a table of N elements and M lists is (N - 1)/M = λ - 1/M, which is essentially λ, since M is presumed large. On average, half of these other nodes are searched, so combined with the matching node, the average cost of a successful search is 1 + λ/2 nodes.

Figure 6.9 insert routine for separate chaining hash table

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)
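A sketch of an insert in the style of Figure 6.9, continuing the HashTable sketch above: nothing happens if the item is already present, otherwise it is appended to the appropriate list, and the table is rehashed when the load factor exceeds 1.

    #include <algorithm>   // for std::find

    // A sketch of the separate-chaining insert. whichList is a reference to
    // the list the key hashes to, as described in the text.
    template <typename HashedObj>
    bool HashTable<HashedObj>::insert(const HashedObj& x)
    {
        auto& whichList = theLists[myhash(x)];
        if (std::find(whichList.begin(), whichList.end(), x) != whichList.end())
            return false;                              // already present: do nothing
        whichList.push_back(x);

        if (++currentSize > static_cast<int>(theLists.size()))
            rehash();                                  // load factor now exceeds 1
        return true;
    }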

This analysis shows that the load factor, rather than the table size, is what really matters. The general rule for separate chaining hashing is to make the table size about as large as the number of elements expected, that is, to keep λ close to 1. If the load factor exceeds 1, the code in Figure 6.9 expands the table by calling rehash; the rehash operation is discussed in a later section. As noted earlier, keeping the table size prime also helps ensure that the keys are distributed evenly among the lists.

6.2.2 Hash Tables Without Linked Lists

Separate chaining hashing has the disadvantage of using linked lists. This can slow the algorithm down somewhat because of the time required to allocate new cells (especially in other languages), and it essentially requires the implementation of a second data structure. An alternative to resolving collisions with linked lists is to try successive cells until an empty one is found.

More formally, cells h0(x), h1(x), h2(x), ... are tried in succession, where hi(x) = (hash(x) + f(i)) mod TableSize, with f(0) = 0. The function f is the collision resolution strategy. Because all the data go inside the table, a larger table is needed for this approach than for separate chaining hashing; generally, the load factor should be kept at or below 0.6. Tables of this kind are called probing hash tables. We now look at three common collision resolution strategies.
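A sketch of the probing idea with the simplest choice f(i) = i (linear probing) is shown below. Deletions are deliberately left unsupported, reflecting the difficulty discussed later in this section, and the table is assumed never to become completely full; the class and member names are illustrative assumptions.

    #include <functional>
    #include <string>
    #include <vector>

    // Open addressing with f(i) = i, so the probe sequence is
    // hash(x), hash(x)+1, hash(x)+2, ... (all mod the table size).
    class ProbingTable
    {
      public:
        explicit ProbingTable(int size = 101) : values(size), occupied(size, false) {}

        bool insert(const std::string& x)
        {
            size_t pos = findPos(x);
            if (occupied[pos])
                return false;              // already present
            values[pos] = x;
            occupied[pos] = true;
            return true;
        }

        bool contains(const std::string& x) const
        {
            return occupied[findPos(x)];
        }

      private:
        std::vector<std::string> values;
        std::vector<bool> occupied;

        // Probe h(x), h(x)+1, h(x)+2, ... until x or an empty slot is found.
        size_t findPos(const std::string& x) const
        {
            size_t pos = std::hash<std::string>{}(x) % values.size();
            while (occupied[pos] && values[pos] != x)
                pos = (pos + 1) % values.size();
            return pos;
        }
    };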

6.3 COLLISION-RESOLUTION TECHNIQUES

Since computers cannot comprehend human languages such as English or Spanish, users must communicate with them through programming languages. The creation of software for a computer
may be accomplished through the use of a wide variety of language families.
Computers are powerful instruments that were developed to solve complex issues; yet,
in order for them to accomplish this goal, a programming language and a programmer
are required. The operation of a computer's myriad web browsers, games, email clients,
operating systems, and applications is entirely dependent on the software installed on
that machine. Programming enables users to come up with inventive solutions to any
problem that may occur.

In computer programming, a computational language is the means by which the set of instructions for a task is communicated to a computer so that it can be carried out. Using the right groups of instructions and specialist software, it is
possible to find a solution to any problem, regardless of its level of complexity. When
it comes to creating computer programs, there are three fundamental ideas to keep in mind: sequencing, selection, and iteration. Sequencing means carrying out the steps in a given order; selection means choosing the appropriate instruction or branch; and iteration means repeating the same action. The creative work of programming consists of the programmer choosing the right instructions to solve the problem at hand.

If we map the keys of a huge universe U to a small set S = {0, ..., s - 1}, it is unavoidable that many elements of U are mapped to the same element of S, simply because S is much smaller than U. For a dictionary structure we do not need to store the whole universe; we only need to keep a set X ⊂ U of the n keys of the objects currently in the dictionary.

However, if we do not know the set X when we choose the hash function h: U → S, which is unavoidable when X is dynamic and changes through insertions and deletions, then it is possible that X contains many elements that are all mapped to the same s ∈ S. Something therefore has to be done about the elements of X that collide. Two methods have proved effective in resolving this problem:

• Keeping a secondary structure for each s ∈ S that stores all the elements x ∈ X with h(x) = s. Each of the s buckets then has its own secondary dictionary. Because each bucket should contain only a few elements, this secondary dictionary can be very simple; the simplest method is chaining, which just keeps the colliding elements in a linked list. This is the recommended approach.
• Keeping for each u ∈ U a sequence of alternative addresses in S: if h(u) = h1(u) is already occupied by a colliding element, we try h2(u), h3(u), and so on, until we find a bucket that is not full. This method is known as open addressing; although a significant amount of study has been devoted to it, it is not recommended.

The first method partitions the universe U into the preimages h⁻¹(s) and stores in the same secondary structure all elements x ∈ X that fall into the same partition class. If we can insert and delete in the secondary structure, then we can do so in the main structure as well; the function h does nothing more than direct us to the correct secondary structure. This works extremely well if the partition induced on X is good, with only a few elements in each bucket; but if many elements end up in the same bucket, performance degrades to that of the secondary structure.

We could use a balanced search tree as the secondary structure to obtain an O(log n) worst-case bound, together with O(1) time for all elements whose bucket contains only a few items. However, we will see that if the hash function is chosen well and the set S is not made too small, we can expect almost all buckets to be nearly empty, so a simple linked list is sufficient as the secondary structure.

The second method has been popular because it needs no dynamic memory allocation, since there are no linked lists. It is an implicit structure, requiring no pointers, so it was believed to be especially simple and space efficient. These small advantages, which are insignificant on today's computers, are outweighed by a fundamental drawback: the structure does not support deletions. To insert an element x, we follow the sequence of buckets h1(x), h2(x), ..., hk(x) until we find one that is empty.

As a consequence, during a find operation we must examine the same sequence of buckets until we either find the element or reach an empty bucket. If we delete an element somewhere along that sequence, its bucket becomes empty, and any later find for x will fail because we have broken the search path. This can be avoided by marking the deleted element as invalid instead of truly emptying the bucket, but doing so accumulates many invalid buckets, which can be reused by insertions but still lengthen the search paths. Alternatively, when we delete an element from bucket i, we could try to move up along its search path any other element that had bucket i on its search path but found that bucket full.

We can do this, however, only if we know where such another element might be; this requires that all elements that have bucket i on their search path use the same bucket j as the next bucket in the path. Any scheme with this property leads to clustering, the build-up of contiguous blocks of full buckets, which is particularly troublesome.

Thus the seemingly most natural choice, hi(x) = h0(x) + i, is actually a poor one. If we forgo deletions, many different sequences of functions (hi(x)), i = 1, ..., s, are possible as search paths, and these have been studied with respect to the expected length of the longest search path. Many research papers have been written on the optimal choice of such probe sequences. However, the small space saving obtained by avoiding pointers never outweighs the fundamental cost of losing deletions.

Both methods have given rise to many variants. In the chaining method, a find goes to the proper bucket and then walks down its list, inspecting successive items until it reaches the correct key. We would therefore like frequently accessed items to appear near the front of their lists; each bucket is thus an instance of the well-studied list accessing problem. The move-to-front strategy is known to be 2-competitive for this problem.

Being 2-competitive means that it accesses at most twice as many list items as the optimal ordering of the list would. Thus, moving the found item to the front of its list after each find operation is a simple modification that gives some advantage for very skewed access patterns. This idea was originally proposed as self-adjusting hash tables. It can also be combined with open addressing, but the resulting scheme is considerably more complicated. A sketch of a find operation that applies move-to-front is given below.
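The sketch below shows a self-adjusting find in which every successful search splices the matching node to the head of its list (std::list::splice makes this a constant-time operation). The class name and the assumption that insert is never given a duplicate key are illustrative simplifications.

    #include <functional>
    #include <list>
    #include <string>
    #include <vector>

    // A separate-chaining table in which every successful find moves the
    // matching node to the front of its list, so frequently accessed keys
    // drift towards the head.
    class SelfAdjustingTable
    {
      public:
        explicit SelfAdjustingTable(int size = 101) : buckets(size) {}

        void insert(const std::string& x)
        {
            buckets[bucketOf(x)].push_front(x);   // assumes x is not yet present
        }

        bool find(const std::string& x)
        {
            auto& lst = buckets[bucketOf(x)];
            for (auto it = lst.begin(); it != lst.end(); ++it)
            {
                if (*it == x)
                {
                    lst.splice(lst.begin(), lst, it);   // move-to-front
                    return true;
                }
            }
            return false;
        }

      private:
        std::vector<std::list<std::string>> buckets;

        size_t bucketOf(const std::string& x) const
        {
            return std::hash<std::string>{}(x) % buckets.size();
        }
    };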

6.3.1 Programming Languages

There is no way to convert arbitrary human language into a form a computer can execute, which is why many different programming languages exist. Every language has its own strengths and weaknesses, and some are better suited to particular tasks than others. Many kinds of professionals, including software developers, computer systems engineers, web designers, and app developers, need fluency in at least one programming language to do their jobs. Although there are well over 50 programming languages in existence, among the most commonly used are HTML, Java, and C.

6.3.2 Computational Thinking

Computational thinking can be used as a method for solving any problem by applying four fundamental patterns. If we grasp these four patterns and use them effectively, computational thinking becomes much easier to apply when programming. The first step, decomposition, is to break a problem down into its component parts and analyse each one individually; working with smaller, more manageable pieces puts us in a better position to apply the other aspects of computational thinking. The second step is pattern recognition: the problem is examined to see whether any sequential or repeating pattern occurs, and any recurring themes found are grouped into appropriate categories as far as possible.

If no patterns are found, the problem does not need to be reduced further. The third element is to simplify or generalise the problem: by stepping back from its specific details, you become better able to build a solution that is general enough to apply in a range of settings and circumstances. The fourth and final element is the algorithm, where increasingly complex problems are addressed by designing a plan for implementing the solution; an algorithm provides detailed, step-by-step instructions for approaching any task.

When a record's home position is already occupied, the collision resolution procedure used during insertion must locate a free slot elsewhere in the hash table before the record can be inserted. Any collision resolution technique can be viewed as generating a sequence of hash table slots that may potentially hold the record. The first slot in the sequence is the home position of the key. If the home position is occupied, the collision resolution policy moves to the next slot in the sequence; if that one is occupied as well, another must be tried, and so on. The sequence of slots, known as the probe sequence, is generated by a probe function, which we shall call p. Insertion proceeds by following the probe sequence until a free slot is found.

6.4 FILE STRUCTURE: CONCEPTS OF FIELDS, RECORDS AND FILES

In its most general form, a file is simply a collection of records organised in some way. How the records are arranged within the file is a significant part of file management, because the arrangement has a major influence on how efficiently records can be located and accessed. The "organisation" referred to here is not the physical layout of the file on a storage medium but the logical arrangement of the records (their ordering or, more broadly, "closeness" relations between them based on their content).

However, the accessibility of records within a file is dependent
on the physical medium (such as a hard drive or a cloud service) on which the file and
its associated data are kept. Magnetic tape is necessarily sequential because of how it
was originally designed. If you want to read a record, you have to start at the beginning
of the tape and read each record one after the other (in sequential order) until you get
to the record you want to read. Only then can you read it. This is quite comparable to the way one listens to a song recorded on an audio cassette.
It ought to go without saying that discs provide for the possibility of random access to
recordings. A comparison may be made between this distinction and the one that exists
between audio cassettes and audio compact discs.

When listening to an audio cassette, you need to begin from the very beginning and
continue moving forward on the tape until you reach the song that you are interested in
hearing. If you do not start at the beginning, you will not be able to find the music that
you are looking for. You may play the songs on a compact disc in whatever sequence
you choose, or you can fast-forward directly to the song you want to hear if there is
more than one you are interested in hearing.

In this lesson, we are going to discuss the many ways in which data may be represented
for files that are kept on external storage devices. These devices can be things like hard
drives or flash drives. Because of this, we will be able to carry out the essential
procedures (such as retrieval and update) in the shortest amount of time feasible. It is
possible for the organisational strategy that is most suited for a particular application
to change based on a number of different factors. These criteria can include the kind of external storage available, the kinds of queries allowed, the number of keys, the mode of retrieval, and the mode of update.

6.4.1 File Organization

The act of representing and storing the records that are included within a file is referred
to as "file organisation," and it is a part of the overall file management process. A file
is a named collection of data or facts that are related to one another in this context. The
sections of a table known as fields are the columns that only contain one type of
information. A collection of all the records in one place is known as a file. As a
consequence of this, a file possesses Records, and Records include fields, and Fields
contain data items, and Data items contain characters (including alphabetic characters,
numeric characters, and special characters, among other types). When it comes to data
storage, one byte is taken up by each individual character.

Figure 6.10 : Components of a File

Source: Data Structures and Algorithm Analysis in C++, Mark Allen Weiss (2018)

Within a traditional library, the author catalogue can be seen as a kind of file, an "author file." All of an author's catalogue cards, taken together, make up one record. Any section of a card, such as the author's name or the title, can be considered a field. Thus a file contains one or more records, each record contains one or more fields, and this pattern can continue to any depth. A database is a collection of files that together implement a logical data model; it can also be thought of as a repository for data. The term "file organisation" therefore refers to the method applied to the arrangement of data for storage, retrieval, and processing.

The two types of files that are likely to be present in a physical database system are
known as data files and index files respectively. The numerous data files are where the
database's data, which together make up the database, are stored. Index files (or
directories) are utilised to enable access to the data files; despite this, index files (or
directories) frequently do not store any information other than key values in their
respective databases. A database's logical structure is important in determining which
pieces of information should be available and how those different pieces of information
are connected to one another. Consider, as an illustration, a standard bibliographic
database. It might be a collection of records that are preserved in a file and contain
bibliographical information on a number of different books. It is possible that each
record will have more than one field that is exclusive to a book.

The organisation of a file, that is, the ordering of the information in storage, determines the order in which records are stored and hence the sequence in which they are read from the file. It also determines the sequence of operations that must be carried out to locate a particular record.

6.4.2 Record Access Method

The structure or organisation of a file refers to the technique used to arrange the records contained in the file. The access method is the general term for the process by which we extract data from the file, that is, the way the file is searched. These two aspects, structure/organisation and access method, are interdependent: the kind of structure determines the access techniques that are possible, and vice versa.

When deciding which organisation is best suited to a particular file, it is important to consider not only the kinds of operations that will be performed on the data, but also the operational characteristics of the storage medium that will be used; only then can the data be handled as efficiently as the circumstances allow. The most significant characteristic of a storage device is whether it permits direct access to individual record occurrences or only sequential access to them. Magnetic discs are direct access storage devices (often abbreviated DASD), whereas magnetic tape is an example of a storage medium that arranges its data in sequential order.

The following four basic file organisation strategies are covered in greater depth later in this chapter:

1. Sequential
2. Indexed sequential
3. Random (direct) access
4. Multi-key

6.4.3 Sequential File Organization

Sequential organisation is the simplest form of file organisation. The records are written one after another, in the order in which they are entered, and stored in that same order in the file. This helps to guarantee the integrity of the data. In other words, each item in the file is saved one after the other; for instance, the record with sequence number 11 is located directly after the tenth record.

Reading starts at the beginning of the file and proceeds in the order in which the entries are laid out. Therefore, the only way to retrieve data from a simple sequential file is to start at the beginning and read one record after another, in sequence, until the record being sought is reached. The search proceeds record by record, in the order in which the records were written. This can be time-consuming, especially for very large files: locating and retrieving a particular record with a sequential scan takes much longer than it would with other file organisations. A sequential file can, if required, be stored on a sequential storage medium such as magnetic tape.

In practice, however, sequential files are best reserved for storing copies of databases that are being transferred, backed up, or archived. Beyond the difficulty of access, sequential files also make inserting and deleting records awkward. Once a sequential file has been created, new records can be added only at the very end of the file; entries cannot be added in the middle without rewriting the whole file, which is impractical. Similarly, an existing record cannot be modified without rewriting the file, and to delete a record you must first locate it.

6.4.5 Indexed-Sequential File

The indexed sequential file aims to remedy the shortcomings of the sequential file. In an indexed sequential file, the file is first sequenced on a certain field, and an index for the file is then built on that same field; a file of this kind is called an indexed sequential file. In other words, an additional indexing mechanism is layered on top of a sequential file. The index provides a means of searching much more quickly: indexing associates a set of orderable quantities with the items in the file. Because of the indexed sequential arrangement, processing can be carried out either sequentially or randomly.

A sequential file (ordered on its primary keys) that also has an associated index is said to be an "indexed sequential file." The index makes it possible to retrieve entries in any order, while the sequential structure of the file makes it simple to move on to consecutive items and supports sequential processing. Another feature available in this file organisation is the overflow area, which provides extra space for inserting records without requiring a new file to be created. Before discussing the indexed sequential file structure further, let us first look at the types of indexes that are possible.

6.4.6 Random Access File Organization

Random access organisation is also referred to as relative or direct access organisation. A random-access file is made up of a group of fixed-length records stored contiguously. In contrast to the indexed sequential structure, the records are not kept in any particular order, and there is no direct relationship between a record and its address in the file. Linear addressing or hashing methods are used to correlate records with their file locations. In the most basic form of direct access, identifying numbers are assigned to the data records, and these identifiers provide a relative address within the file.

Files that use linear addressing occupy the minimum possible space on the storage medium, because there is a one-to-one relationship between each record key and a disc address. Additional records may be added to the existing ones as long as they fit within the space available on the disc. One disadvantage of linear addressing is that the values of the primary key are likely to be assigned by the database management system (DBMS) and must be updated whenever the file is reorganised. Periodic reorganisation of the file is required to reclaim the disc space lost through additions and deletions of records.

Hashing is a technique that gives quick access to a record in a relative file through its primary key. A mathematical function of some kind is applied to a field of the record, and the result is used as an address; this is often called a key-to-address transformation. If a direct file uses hashing, any type of data item can be assigned as the primary key of the file, which makes the organisation more flexible, and the primary key values are then converted into addresses automatically. For instance, a Student ID Number could be entered and a formula applied to it; the resulting value identifies the place on the disc from which the record can be retrieved. This implies that in order to fetch a particular record, we only need to know the value of its key.
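A minimal sketch of such a key-to-address transformation, assuming division-remainder hashing, a hypothetical bucket count, and a fixed record length (none of which come from the text), could look like this in Python:

```python
RECORD_LENGTH = 64          # fixed record size in bytes (illustrative)
NUMBER_OF_BUCKETS = 1000    # size of the address space (illustrative)

def key_to_address(student_id: int) -> int:
    # Division-remainder hashing: the remainder becomes the
    # relative record address for the Student ID Number.
    return student_id % NUMBER_OF_BUCKETS

def byte_offset(student_id: int) -> int:
    # The relative address times the record length gives the byte
    # position of the record slot inside the direct file.
    return key_to_address(student_id) * RECORD_LENGTH

print(key_to_address(20231042))   # 42
print(byte_offset(20231042))      # 2688
```

In practice two different keys can hash to the same address, so a collision-handling scheme (for example, a small overflow area) is needed alongside the hash function itself.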

6.5 SEQUENTIAL, INDEXED AND RELATIVE/RANDOM FILE ORGANIZATION

The term "file organisation" refers to the way in which the records contained within a file are arranged. How a file is organised determines, in several different ways, how quickly its records can be accessed. The file organisation schemes that are available are explained in the following paragraphs:

• Relative file organisation;
• Sequential file organisation;
• Indexed sequential file organisation.

The syntaxes of this module, and the terms that apply to them, are discussed only with respect to how they are used in a program. The following chapter, titled "File handling Verbs", presents a complete study of full programs that make use of these syntaxes.

6.5.1 Sequential File Organization

Records added to a sequential file are stored and retrieved in the same order in which they were originally written to the file. The fundamental properties of sequential file organisation are listed below:

• Records can be read only in the same sequence in which they were originally written. To read the tenth record, the nine records that precede it must be read first.
• Records are written in sequence. A new record cannot be inserted between existing records; when written, it is appended at the very end of the file.
• Once a record has been added to a sequential file, it cannot be deleted, nor can its length be changed or extended.
• Once the entries have been written, the order of the records cannot be changed in any way.
• A record can be updated in place: if the new record has the same length as the old one, it can overwrite it exactly (a short sketch follows this list).
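As a behavioural illustration of these properties (the class and method names below are hypothetical, and this is not the module's own declaration syntax), a small Python sketch might look like this:

```python
class SequentialFile:
    """Illustrative in-memory model of a sequential file."""

    def __init__(self):
        self.records = []              # records stay strictly in arrival order

    def write(self, data: str) -> None:
        # New records can only be appended at the very end of the file.
        self.records.append(data)

    def read_all(self):
        # Records come back in exactly the order they were written.
        return list(self.records)

    def rewrite(self, position: int, data: str) -> None:
        # In-place update is allowed only when the new record has the
        # same length as the record it overwrites.
        if len(data) != len(self.records[position]):
            raise ValueError("rewrite requires a record of identical length")
        self.records[position] = data


f = SequentialFile()
f.write("EMP001 SMITH ")
f.write("EMP002 JONES ")
f.rewrite(1, "EMP002 PATEL ")   # same length, so the overwrite is allowed
print(f.read_all())
```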

Sequential output files are also an option to consider when records are to be printed.

6.6 MULTI-KEY FILE ORGANIZATION AND ACCESS METHOD

Because of the way mass-storage technology is organised, every data record saved in a particular file is given an identification that is unique within that file. This identifier, also known as a key, may be the physical position of the record, either absolute or relative to some other location, or it may be composed of a specific combination of the record's attributes, such as the information contained in a particular field within the record. In a personnel file, therefore, the record for "John Smith" might be assigned the key "19" (if it is the nineteenth record in the file), "01230402" (if it is on device 01, cylinder 23, track 04, second record), or "15216" (if John Smith's employee number is 15216).
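As an illustration of how several alternative keys can resolve to the same stored record (the dictionaries and values below are hypothetical, echoing the example above), consider the following Python sketch:

```python
# The file itself: relative record number -> record contents.
personnel_file = {
    19: {"name": "John Smith", "employee_no": 15216, "dept": "SALES"},
}

# Alternative keys for the same record: each index maps a different kind
# of key to the record's location (its relative record number).
by_employee_no = {15216: 19}
by_device_address = {"01230402": 19}   # device 01, cylinder 23, track 04, record 2

def fetch(record_no: int) -> dict:
    # Every key ultimately resolves to a single location in the file.
    return personnel_file[record_no]

print(fetch(19))                              # by physical position
print(fetch(by_employee_no[15216]))           # by employee number
print(fetch(by_device_address["01230402"]))   # by device/cylinder/track address
```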

Which of these keys applies depends on which of those conditions holds. In every case, each key refers to one thing only: the location of the record. Every key that corresponds to an entry in a single file must, however, be unique, so no two records can share the same key value. This rule applies regardless of the method used to allocate the key: two records cannot have the same physical address, just as two records cannot occupy the same physical space.

Moreover, two records sharing the same key would create ambiguity: a command containing that non-unique key would not give the hardware enough information to decide which of the records holding the key was to be accessed. The rationale for the uniqueness constraint is therefore self-explanatory.
