
Semester: 1    Programme: M.Sc. DS
Unit: 1 - 5    Course Code: 23PDS1ES01

Basics of Data Structures and Algorithms
Elective - 1: Data Structures & Algorithms

Dr. M. Kriushanth
Assistant Professor of Data Science
Department of Data Science
St. Joseph’s College (Autonomous)
Tiruchirappalli - 2
Basic steps in the complete development of an algorithm
• The development of an algorithm involves several steps to ensure a
systematic and effective approach. Here are the basic steps involved
in the complete development of an algorithm:

• Problem Understanding: Clearly define and understand the problem you are trying to solve. Identify the input, output, and any constraints or requirements associated with the problem.
• Research and Analysis: Conduct research to gain insights into existing
solutions or approaches related to the problem. Analyze similar
algorithms or techniques that can be adapted or improved upon for
your specific problem.

• Algorithm Design: Design the algorithm by breaking down the problem into smaller, manageable subproblems. Determine the logic and steps required to solve each subproblem, considering factors such as efficiency, correctness, and scalability.
• Pseudocode or Flowchart: Write pseudocode or create a flowchart that
outlines the high-level steps and structure of the algorithm.
Pseudocode is a structured language that combines elements of
programming syntax with plain English to describe the algorithm's logic
without being tied to any specific programming language.

• Coding and Implementation: Translate the pseudocode or flowchart into a programming language of your choice. Write the actual code that implements the steps and logic of the algorithm. Follow best practices in coding, such as using meaningful variable names, commenting the code for clarity, and maintaining code readability.
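To make these two steps concrete, here is a small added illustration (not from the original slides): pseudocode for computing the average of a list of numbers, followed by a direct translation into Python, which is used here purely as an example language.

ALGORITHM Average(numbers)
    total ← 0
    FOR EACH value IN numbers
        total ← total + value
    RETURN total / LENGTH(numbers)

The same logic in Python:

def average(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    for value in numbers:        # accumulate the running sum
        total = total + value
    return total / len(numbers)  # divide by the number of elements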
• Testing and Debugging: Test the algorithm implementation with
various inputs, including edge cases and boundary conditions, to
ensure correct behavior. Identify and fix any bugs or issues that arise
during testing. Debugging tools and techniques can help in this
process.

• Performance Analysis: Evaluate the algorithm's performance by measuring factors like time complexity, space complexity, and accuracy. Analyze the efficiency of the algorithm and identify any bottlenecks or areas for improvement.
• Optimization and Refinement: Optimize the algorithm to improve its
efficiency, speed, or resource usage. Consider algorithmic
optimizations, data structure choices, or algorithmic trade-offs to
enhance the algorithm's performance.

• Documentation: Document the algorithm's design, implementation, and usage instructions. Include relevant details such as the problem statement, input-output specifications, assumptions made, and any limitations or known issues.
• Integration and Deployment: Integrate the algorithm into the larger
system or application, if applicable. Ensure that it works seamlessly
within the intended environment. Deploy the algorithm to production
or make it available for others to use, depending on the project
requirements.

• Maintenance and Updates: Monitor the algorithm's performance in real-world scenarios. Address any issues or bugs that arise during its usage. Update the algorithm as needed to adapt to changing requirements or to incorporate new features.
Analysis and complexity of an algorithm

• The analysis and complexity of an algorithm involve evaluating its performance characteristics, such as time complexity and space complexity. These measures provide insights into how the algorithm's efficiency and resource requirements scale with the input size. Let's dive into the key aspects of algorithm analysis and complexity:
• Time Complexity: Time complexity measures the amount of time an
algorithm takes to run as a function of the input size. It focuses on the
number of operations or comparisons performed by the algorithm.
Time complexity is typically expressed using big O notation (e.g., O(n),
O(n log n), O(n^2)).

• Example: A linear search algorithm that scans an array sequentially has a time complexity of O(n), indicating a linear relationship between the input size (n) and the number of operations.
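A minimal Python sketch of such a linear search (an illustrative addition):

def linear_search(arr, target):
    """Return the index of target in arr, or -1 if it is absent: O(n) time."""
    for i, value in enumerate(arr):  # one comparison per element, n in the worst case
        if value == target:
            return i
    return -1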
• Space Complexity: Space complexity refers to the amount of memory
or space required by an algorithm to solve a problem. It represents
the additional memory used as the input size grows. Space complexity
is also expressed using big O notation.

• Example: An algorithm that creates a new array of size n to store input elements has a space complexity of O(n), indicating a linear increase in memory usage with the input size.
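As an added, illustrative contrast (the function names below are hypothetical), the first sketch uses O(n) extra space while the second solves a related task in O(1) extra space:

def squares(arr):
    """O(n) extra space: builds a new list as large as the input."""
    result = []
    for x in arr:
        result.append(x * x)
    return result

def sum_of_squares(arr):
    """O(1) extra space: keeps only a single accumulator."""
    total = 0
    for x in arr:
        total = total + x * x
    return total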
• Best Case, Worst Case, and Average Case Complexity: Algorithms may
exhibit different performance characteristics based on the input
distribution or specific input values. Therefore, it's important to
consider best-case, worst-case, and average-case complexities.

• Example: For a sorting algorithm like quicksort with a naive pivot choice (e.g., always picking the first or last element), the worst case occurs when the input array is already sorted or nearly sorted, resulting in a time complexity of O(n^2). The average and best cases have a time complexity of O(n log n) when the input is randomly distributed.
• Asymptotic Analysis: Asymptotic analysis focuses on how the
algorithm's performance scales with the input size in the long run. It
considers the growth rate of the algorithm's time or space requirements
as the input size approaches infinity. Asymptotic analysis helps compare
algorithms and identify the most efficient ones.

• Example: If an algorithm has a time complexity of O(n^2), it means that the running time grows quadratically with the input size. In comparison, an algorithm with a time complexity of O(n) is more efficient for larger input sizes.
• Complexity Classes: Complexity classes classify algorithms based on
their efficiency and scalability. For example, algorithms with a time
complexity of O(1), O(log n), O(n), O(n log n), O(n^2), etc., belong to
different complexity classes. Common complexity classes include
constant time (O(1)), logarithmic time (O(log n)), linear time (O(n)),
polynomial time (O(n^k)), and exponential time (O(k^n)).

• Example: An algorithm with a time complexity of O(n^3) falls into the polynomial time complexity class, indicating that the algorithm's running time grows as a polynomial function of the input size.
Asymptotic notations
• Asymptotic notations are mathematical tools used in computer
science and mathematics to describe the performance characteristics
of algorithms and functions. They provide a way to analyze and
compare the efficiency or complexity of algorithms as the input size
approaches infinity. The three commonly used asymptotic notations
are:
Big O notation (O): It represents the upper bound or worst-case
scenario of an algorithm's time complexity. It gives an upper limit on
how the algorithm's running time grows relative to the input size.

Example: If an algorithm has a time complexity of O(n), it means that the running time of the algorithm grows at most linearly with the input size.
Omega notation (Ω): It represents the lower bound or best-case
scenario of an algorithm's time complexity. It gives a lower limit on how
the algorithm's running time grows relative to the input size.

Example: If an algorithm has a time complexity of Ω(n^2), it means that the running time of the algorithm grows at least quadratically with the input size.
Theta notation (Θ): It represents both the upper and lower bounds of
an algorithm's time complexity. It provides a tight bound on how the
algorithm's running time grows relative to the input size.

Example: If an algorithm has a time complexity of Θ(n), it means that the running time of the algorithm grows linearly with the input size, and this linear growth rate holds in both the best and worst cases.
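For reference, the standard formal definitions behind these three notations can be stated as follows:

\[ f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 \ge 1 : 0 \le f(n) \le c\,g(n) \ \text{for all } n \ge n_0 \]
\[ f(n) = \Omega(g(n)) \iff \exists\, c > 0,\ n_0 \ge 1 : 0 \le c\,g(n) \le f(n) \ \text{for all } n \ge n_0 \]
\[ f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and} \ f(n) = \Omega(g(n)) \]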
These asymptotic notations help in analyzing the scalability and
efficiency of algorithms. By focusing on the growth rate of algorithms as
the input size increases, we can compare and choose the most efficient
algorithm for a given problem. It is important to note that asymptotic
notations ignore constant factors and lower-order terms, as they
become less significant as the input size grows.
Problem Solving techniques and examples
• Brainstorming
• Root cause analysis
• Trial and Error
• Means-ends analysis
• SWOT Analysis
• Divide and conquer
• Creative problem solving
• When it comes to problem-solving in the context of data structures
and algorithms, several techniques can be employed. Here are some
commonly used problem-solving techniques along with examples:
• Brute Force: This technique involves systematically trying all possible
solutions and selecting the one that satisfies the problem
requirements.
• Example: Given an array of integers, find the maximum element using
brute force. Iterate through each element in the array and compare it
with the current maximum. Update the maximum if a larger element
is found.
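A minimal Python version of this brute-force scan (an added sketch):

def find_max(arr):
    """Return the largest element of a non-empty array by checking every element."""
    maximum = arr[0]                 # current best candidate
    for value in arr[1:]:
        if value > maximum:          # a larger element is found
            maximum = value
    return maximum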
• Divide and Conquer: This technique involves breaking down a
problem into smaller subproblems, solving them independently, and
combining their solutions to obtain the final result.

• Example: Implementing the binary search algorithm. Divide the sorted array into two halves and check if the target element is in the middle. If not, recursively search in the left or right half based on the comparison result.
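A recursive Python sketch of binary search over a sorted list (added for illustration):

def binary_search(arr, target, low=0, high=None):
    """Return an index of target in the sorted list arr, or -1 if absent."""
    if high is None:
        high = len(arr) - 1
    if low > high:                    # empty subarray: target is not present
        return -1
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    if target < arr[mid]:             # conquer the left half
        return binary_search(arr, target, low, mid - 1)
    return binary_search(arr, target, mid + 1, high)   # or the right half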
• Greedy Algorithms: Greedy algorithms make locally optimal choices at
each step to reach an overall optimal solution.

• Example: The activity selection problem, where you have a set of activities with start and end times. The goal is to find the maximum number of non-overlapping activities that can be performed. Sort the activities by end time and greedily select the activity with the earliest end time while ensuring no overlap.
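A short Python sketch of this greedy strategy, assuming activities are given as (start, end) pairs:

def select_activities(activities):
    """Greedily pick a maximum set of non-overlapping (start, end) activities."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(activities, key=lambda a: a[1]):  # earliest end time first
        if start >= last_end:        # no overlap with the previously chosen activity
            chosen.append((start, end))
            last_end = end
    return chosen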
• Dynamic Programming: This technique solves complex problems by
breaking them down into overlapping subproblems and storing their
solutions to avoid redundant calculations.

• Example: The Fibonacci sequence. Instead of recursively calculating Fibonacci numbers, use dynamic programming to store the values of previously calculated Fibonacci numbers and reuse them to compute larger Fibonacci numbers efficiently.
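A memoized Fibonacci in Python, sketched here with the standard-library cache:

from functools import lru_cache

@lru_cache(maxsize=None)             # stores previously computed values
def fib(n):
    """Return the n-th Fibonacci number; each subproblem is solved only once."""
    if n < 2:                        # base cases: fib(0) = 0, fib(1) = 1
        return n
    return fib(n - 1) + fib(n - 2)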
• Backtracking: Backtracking involves exploring all possible solutions by
incrementally building candidates and backtracking when a candidate
does not satisfy the problem constraints.

• Example: The N-Queens problem, where you have to place N queens on an N×N chessboard such that no two queens threaten each other.
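A compact backtracking sketch for counting N-Queens solutions (an added example; the list queens holds the chosen column for each placed row):

def n_queens(n):
    """Count the placements of n non-attacking queens on an n×n board."""
    def safe(queens, row, col):
        # a new queen clashes if it shares a column or a diagonal
        return all(c != col and abs(c - col) != abs(r - row)
                   for r, c in enumerate(queens))

    def place(queens):
        row = len(queens)
        if row == n:                     # every row filled: one complete solution
            return 1
        total = 0
        for col in range(n):
            if safe(queens, row, col):   # extend the candidate...
                total += place(queens + [col])
            # ...otherwise prune this branch and backtrack
        return total

    return place([])

For example, n_queens(8) returns 92, the well-known number of solutions.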
• Heuristics and Approximation Algorithms: These techniques aim to
find solutions that are not necessarily optimal but are close enough to
the optimal solution within a reasonable time frame.

• Example: The Traveling Salesman Problem. Heuristic algorithms such as the nearest neighbor algorithm can be used to find an approximate solution by iteratively selecting the nearest unvisited city until all cities are visited.
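A nearest-neighbor sketch in Python, assuming dist is a square matrix of pairwise distances:

def nearest_neighbour_tour(dist, start=0):
    """Build an approximate TSP tour by always visiting the closest unvisited city."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        current = tour[-1]
        nearest = min(unvisited, key=lambda city: dist[current][city])
        tour.append(nearest)             # greedy, possibly sub-optimal step
        unvisited.remove(nearest)
    return tour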
ADT
• In the context of data structures and algorithms, ADT stands for
Abstract Data Type. An abstract data type is a high-level description of
a set of data values and the operations that can be performed on
those values. It defines the behavior of a data type independent of its
implementation.
• ADTs provide a way to abstract away the details of how the data is
stored and manipulated, allowing programmers to focus on the
functionality and behavior of the data structure. They serve as a
blueprint for creating data structures and provide a clear interface for
interacting with the underlying data.
• Stack: An ADT that follows the Last-In-First-Out (LIFO) principle. It supports operations like push (to insert an element), pop (to remove the most recently inserted element), and peek (to access the top element without removing it). A minimal sketch of this ADT appears after this list.

• Queue: An ADT that follows the First-In-First-Out (FIFO) principle. It supports operations like enqueue (to insert an element at the end), dequeue (to remove the element from the front), and peek (to access the front element without removing it).
• Linked List: An ADT that represents a collection of nodes, where each
node contains a value and a reference to the next node. It supports
operations like insertion, deletion, and traversal.

• Binary Search Tree: An ADT that represents a binary tree with a special property: the value of each node in the left subtree is less than the value of the node itself, and the value of each node in the right subtree is greater. It supports operations like insertion, deletion, search, and traversal.
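As promised above, here is a minimal Python sketch of the stack ADT. The point is the interface: callers use push, pop, and peek without knowing that a Python list happens to be the underlying representation.

class Stack:
    """Stack ADT: the LIFO interface hides the chosen representation."""

    def __init__(self):
        self._items = []              # internal storage, hidden from users

    def push(self, item):
        self._items.append(item)      # insert at the top

    def pop(self):
        return self._items.pop()      # remove the most recently inserted element

    def peek(self):
        return self._items[-1]        # read the top element without removing it

    def is_empty(self):
        return not self._items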
Greedy Method
• A greedy algorithm is an approach for solving a problem by selecting the best option available at the moment, without considering whether this locally best choice will lead to the overall optimal result.

• The algorithm never reverses the earlier decision even if the choice is
wrong. It works in a top-down approach.

• This algorithm may not produce the best result for all problems, because it always makes the locally best choice in the hope of producing the globally best result.
• However, we can determine if the algorithm can be used with any
problem if the problem has the following properties:

1. Greedy Choice Property
• If an optimal solution to the problem can be found by making the best choice at each step, without reconsidering the previous steps once chosen, the problem can be solved using a greedy approach. This property is called the greedy choice property.
2. Optimal Substructure
If the optimal overall solution to the problem corresponds to the
optimal solution to its subproblems, then the problem can be solved
using a greedy approach. This property is called optimal substructure.
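A classic illustration of both properties is greedy coin change (an added sketch; the denominations below are an assumed "canonical" coin system for which the greedy choice happens to be optimal, which is not true for arbitrary denominations):

def make_change(amount, denominations=(25, 10, 5, 1)):
    """Pay out amount using the largest coin that fits at each step."""
    coins = []
    for coin in denominations:        # assumed sorted from largest to smallest
        while amount >= coin:         # greedy choice, never reconsidered
            coins.append(coin)
            amount -= coin
    return coins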
Advantages of Greedy Approach
• The algorithm is easier to describe.
• This algorithm can perform better than other algorithms (but not in all cases).

Drawback of Greedy Approach
• As mentioned earlier, the greedy algorithm doesn't always produce the optimal solution. This is the major disadvantage of the algorithm.
Divide and Conquer
• Divide and Conquer is an algorithmic pattern. The design approach is to take a problem with a large input, break the input into smaller pieces, solve the problem on each of the small pieces, and then merge the piecewise solutions into a global solution. This mechanism of solving the problem is called the Divide & Conquer strategy.
• A Divide and Conquer algorithm solves a problem using the following three steps:
• Divide: Break the original problem into a set of subproblems.
• Conquer: Solve every subproblem individually, recursively.
• Combine: Put together the solutions of the subproblems to get the solution to the whole problem.
• Examples: The following computer algorithms are based on the Divide & Conquer approach:
• Maximum and Minimum Problem
• Binary Search
• Sorting (merge sort, quick sort)
• Tower of Hanoi.
Fundamentals of the Divide & Conquer Strategy:
• There are two fundamentals of the Divide & Conquer strategy:
• Relational Formula
• Stopping Condition
• 1. Relational Formula: This is the formula (recurrence) we derive for the given technique, expressing the solution of a problem in terms of the solutions of its smaller subproblems. After generating the formula, we apply the D&C strategy, i.e. we break the problem recursively and solve the broken subproblems.
• 2. Stopping Condition: When we break the problem using the Divide & Conquer strategy, we need to know how long to keep applying it. The condition at which we stop the recursive steps of D&C is called the stopping condition.
Applications of Divide and Conquer
Approach:
• Binary Search
• Quicksort
• Merge Sort
• Closest Pair of Points
• Strassen's Algorithm
• Cooley-Tukey Fast Fourier Transform (FFT) algorithm
• Karatsuba algorithm for fast multiplication
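Merge sort makes all three steps and both fundamentals visible: its relational formula is T(n) = 2T(n/2) + O(n), and its stopping condition is a list of length at most one. A Python sketch:

def merge_sort(arr):
    """Sort a list in O(n log n) time using divide and conquer."""
    if len(arr) <= 1:                 # stopping condition (base case)
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])      # divide and conquer the left half
    right = merge_sort(arr[mid:])     # divide and conquer the right half
    merged, i, j = [], 0, 0           # combine: merge the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])           # append whatever remains
    merged.extend(right[j:])
    return merged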
Dynamic Programming
• Dynamic Programming is a technique in computer programming that helps to
efficiently solve a class of problems that have overlapping subproblems and
optimal substructure property.

• If a problem can be divided into subproblems, which in turn are divided into smaller subproblems, and if these subproblems overlap, then the solutions to these subproblems can be saved for future reference. In this way, the efficiency of the computation can be enhanced. This method of solving a problem is referred to as dynamic programming.

• Such problems involve repeatedly calculating the value of the same subproblems
to find the optimum solution.
Dynamic Programming Example
• Let's find the fibonacci sequence upto 5th term. A fibonacci series is
the sequence of numbers in which each number is the sum of the two
preceding ones. For example, 0,1,1, 2, 3. Here, each number is the
sum of the two preceding numbers.
Algorithm
• We are calculating the Fibonacci sequence up to the 5th term.
• The first term is 0.
• The second term is 1.
• The third term is the sum of 0 (from step 1) and 1 (from step 2), which is 1.
• The fourth term is the sum of the third term (from step 3) and the second term (from step 2), i.e. 1 + 1 = 2.
• The fifth term is the sum of the fourth term (from step 4) and the third term (from step 3), i.e. 2 + 1 = 3.
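The same steps written as a bottom-up (tabulated) Python function, an added sketch that follows the slide's 1-based numbering of terms:

def fib_term(n):
    """Return the n-th term (1-based) of the sequence 0, 1, 1, 2, 3, ..."""
    if n == 1:
        return 0
    table = [0, 1]                    # first and second terms
    for _ in range(n - 2):            # each new term is the sum of the last two
        table.append(table[-1] + table[-2])
    return table[-1]

Here fib_term(5) returns 3, matching the fifth term computed above.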
Backtracking
• A backtracking algorithm is a problem-solving algorithm that uses a brute force
approach for finding the desired output.

• The Brute force approach tries out all the possible solutions and chooses the
desired/best solutions.

• The term backtracking suggests that if the current solution is not suitable, then
backtrack and try other solutions. Thus, recursion is used in this approach.

• This approach is used to solve problems that have multiple solutions. If you want
an optimal solution, you must go for dynamic programming.
State Space Tree
• A state space tree is a tree representing all the possible states (solution or non-solution) of the problem, from the root as the initial state to the leaves as terminal states.
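Listing all permutations is perhaps the simplest walk over a state space tree (an added sketch): each recursive call is a node holding a partial solution, and each call with no remaining items is a terminal state. Constraint problems such as N-Queens add pruning, abandoning a branch as soon as the partial solution becomes invalid.

def permutations(items):
    """Enumerate all orderings of items by exploring the state space tree."""
    if not items:                      # leaf: a complete permutation
        return [[]]
    result = []
    for i, chosen in enumerate(items):
        rest = items[:i] + items[i + 1:]     # state after choosing one element
        for tail in permutations(rest):      # explore that subtree
            result.append([chosen] + tail)
    return result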
Backtracking Algorithm Applications
• To find all Hamiltonian Paths present in a graph.
• To solve the N Queen problem.
• Maze solving problem.
• The Knight's tour problem.
Branch and bound
• Branch and bound is one of the techniques used for problem solving. It is similar to backtracking since it also uses the state space tree. It is used for solving optimization problems, usually stated as minimization problems. If we are given a maximization problem, we can still apply the branch and bound technique by simply converting it into an equivalent minimization problem (for example, by negating the objective function).
Binary Tree
A binary tree is a tree data structure in which each parent node can have at most two children. Each node of a binary tree consists of three items:

• data item
• address of left child
• address of right child
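A node can be sketched in Python as follows (an illustrative addition), together with one common way of visiting the nodes:

class Node:
    """One binary tree node: a data item plus links to two children."""
    def __init__(self, data):
        self.data = data
        self.left = None               # address of (reference to) the left child
        self.right = None              # address of (reference to) the right child

def inorder(node):
    """In-order traversal: left subtree, then the node, then the right subtree."""
    if node is not None:
        inorder(node.left)
        print(node.data, end=" ")
        inorder(node.right)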
Types of Binary Tree
1. Full Binary Tree
• A full Binary tree is a special type of binary tree in which every parent
node/internal node has either two or no children.
2. Perfect Binary Tree
• A perfect binary tree is a type of binary tree in which every internal
node has exactly two child nodes and all the leaf nodes are at the
same level.
3. Complete Binary Tree
A complete binary tree is just like a full binary tree, but with two major differences:
• Every level, except possibly the last, must be completely filled.
• All the leaf elements must lean towards the left.
• The last leaf element might not have a right sibling, i.e. a complete binary tree doesn't have to be a full binary tree.
4. Degenerate or Pathological Tree
• A degenerate or pathological tree is a tree in which every parent node has only a single child, either left or right.
5. Skewed Binary Tree
• A skewed binary tree is a pathological/degenerate tree in which the
tree is either dominated by the left nodes or the right nodes. Thus,
there are two types of skewed binary tree: left-skewed binary tree
and right-skewed binary tree.
6. Balanced Binary Tree
• It is a type of binary tree in which the difference between the height
of the left and the right subtree for each node is either 0 or 1.
Binary Search Tree (BST)
A binary search tree is a data structure that allows us to quickly maintain a sorted list of numbers.

• It is called a binary tree because each tree node has a maximum of two children.
• It is called a search tree because it can be used to search for the presence of a number in O(log n) time (for a reasonably balanced tree).
• The properties that separate a binary search tree from a regular binary tree are:
• All nodes of the left subtree are less than the root node.
• All nodes of the right subtree are greater than the root node.
• Both subtrees of each node are also BSTs, i.e. they have the above two properties.
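Reusing the Node class sketched earlier, BST insertion and search can be written as short recursive functions (an added sketch; duplicates are sent to the right here, which is one of several conventions):

def insert(node, value):
    """Insert value into the subtree rooted at node and return the new root."""
    if node is None:
        return Node(value)
    if value < node.data:
        node.left = insert(node.left, value)    # smaller keys go left
    else:
        node.right = insert(node.right, value)  # larger (or equal) keys go right
    return node

def search(node, value):
    """Return True if value occurs in the subtree rooted at node."""
    if node is None:
        return False
    if value == node.data:
        return True
    if value < node.data:
        return search(node.left, value)         # discard the right subtree
    return search(node.right, value)            # discard the left subtree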
Rehashing
• The reason for rehashing is that anytime a new key-value pair is
placed into the map, the load factor rises, and as a result, the
complexity rises as well. And as the complexity of our HashMap
grows, it will no longer have a constant O(1) time complexity. As a
result, rehashing is used to disperse the items across the hashmap,
lowering both the load factor and the complexity, resulting in search()
and insert() having a constant time complexity of O(1). One should
note that the existing objects may fall into the same or a different
category after rehashing.
How Rehashing is done?
Rehashing can be done in the following way:
• Check the load factor after adding a new entry to the map.
• Rehash if it is more than the pre-defined value (or the default value of
0.75 if none is specified).
• Make a new bucket array that is double the size of the previous one
for Rehash.
• Then go through each element in the old bucket array, calling insert()
on each to add it to the new larger bucket array.
What is Load Factor?
• Load factor is a measure that helps to decide when to increase the
HashTable or HashMap capacity to maintain the operations(search
and insert) complexity of O(1). The default value of the load factor is
0.75, i.e. 75% of the HashMap size.

• The number of entries at which rehashing is triggered (the threshold) can be computed using the following formula:

• Threshold = initial capacity of the HashMap × load factor of the HashMap

• Let's understand the load factor with the help of an example.

• For example, if the initial capacity of a hashtable is 20 and the load factor of the hashtable is 0.75 (the default value), then according to the formula: 20 × 0.75 = 15.

• It means the hashtable can hold up to the 15th key-value pair while keeping its size at 20. When we insert the 16th key-value pair, the hashtable increases its size from 20 to 20 × 2 = 40 buckets.
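A toy hash map that follows exactly this policy (an added sketch using chaining; the name SimpleHashMap is hypothetical):

class SimpleHashMap:
    """Minimal chained hash map that rehashes once the load factor exceeds 0.75."""

    def __init__(self, capacity=20, load_factor=0.75):
        self.capacity = capacity
        self.load_factor = load_factor
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % self.capacity]
        for pair in bucket:
            if pair[0] == key:         # existing key: update in place
                pair[1] = value
                return
        bucket.append([key, value])
        self.size += 1
        if self.size > self.capacity * self.load_factor:
            self._rehash()             # e.g. the 16th entry at capacity 20

    def _rehash(self):
        old_buckets = self.buckets
        self.capacity *= 2             # double the bucket array
        self.buckets = [[] for _ in range(self.capacity)]
        for bucket in old_buckets:     # redistribute every existing entry
            for key, value in bucket:
                self.buckets[hash(key) % self.capacity].append([key, value])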
Extendible Hashing
• Extendible hashing is a dynamic hashing method wherein directories and buckets are used to hash data. It is an aggressively flexible method in which the hash function also experiences dynamic changes.

• Directories: The directories store addresses of the buckets in pointers. An id is assigned to each directory, which may change each time a directory expansion takes place.
• Buckets: The buckets are used to hash the actual data.
Priority Queue
• A priority queue is a type of queue that arranges elements based on
their priority values. Elements with higher priority values are typically
retrieved before elements with lower priority values.

• In a priority queue, each element has a priority value associated with it. When you add an element to the queue, it is inserted in a position based on its priority value. For example, if you add an element with a high priority value to a priority queue, it may be inserted near the front of the queue, while an element with a low priority value may be inserted near the back.
• There are several ways to implement a priority queue, including using
an array, linked list, heap, or binary search tree. Each method has its
own advantages and disadvantages, and the best choice will depend
on the specific needs of your application.

• Priority queues are often used in real-time systems, where the order in which elements are processed can have significant consequences. They are also used to improve the efficiency of algorithms such as Dijkstra’s algorithm for finding the shortest path in a graph and the A* search algorithm for pathfinding.
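Python's standard heapq module provides a heap-based priority queue out of the box; in this sketch, smaller numbers mean higher priority because heapq is a min-heap:

import heapq

tasks = []
heapq.heappush(tasks, (2, "write report"))   # (priority, item) pairs
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (3, "clean inbox"))

while tasks:
    priority, task = heapq.heappop(tasks)    # always removes the smallest priority
    print(priority, task)                    # 1 fix outage, 2 write report, 3 clean inbox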
