CS-605: Advanced Algorithm Analysis

Lecture 1: Introduction to DSA


By
Dr Syed Khaldoon Khurshid
Introduction to Teacher
• Dr. Syed Khaldoon Khurshid
• More than twenty years of teaching experience
• Information Retrieval and Data Science
Online contents of the course
Google Classroom
• https://classroom.google.com/c/NTUwMDg3NDc4MjE0
• Class code: w6kdr4w
Why are you here?
• Algorithms are fundamental.
Standard Orientation
• Algorithms are useful.
• Course is a required course.
How and why do you need Algorithms
in CS?
• Have you used Algorithms during your BS
(CS) Program?
• Where will you use Algorithms in your MSCS
program?
Algorithms are fundamental

[Figure: The Algorithmic Lens, applied to Operating Systems (CS 140), Machine Learning (CS 229), Cryptography (CS 255), Compilers (CS 143), Networking (CS 144), and Computational Biology (CS 262)]
Course goals
• The design and analysis of algorithms
– These go hand-in-hand

• In this course you will:


– Learn to think analytically about algorithms
– Learn to communicate clearly about
algorithms
Roadmap

[Figure: course roadmap. Items shown: Asymptotic Analysis, MIDTERM, Greedy Algorithms, Dynamic Programming, Graphs!, FINAL, The Future!]
Aside: the bigger picture

• Does it work?
• Is it fast?
• Can I do better?
• Should it work?
• Should it be fast?
• We want to reduce crime.
• It would be more “efficient” to put cameras in everyone’s
homes/cars/etc.
• We want advertisements to reach the people to whom they are
most relevant.
• It would be more “efficient” to make everyone’s private data public.
• We want to design algorithms that work well, on average, in the
population.
• It would be more “efficient” to focus on the majority population.
If algorithms are fundamental (which they are)

• Then some of the most consequential choices you
will make as computer scientists are:
– Deciding which problem to solve
– Deciding how to turn that problem into
something algorithmically tractable
– Deciding which algorithm to use to solve it and
what tradeoffs to accept
Algorithms: Good Approach

1. Turn a real-world problem into a formal problem.
2. Use or design an algorithm to solve the problem.
3. Write programs based on the algorithms.
Some (potentially impactful) decisions:

We often need to ignore or change aspects of a real-world
situation in order to turn it into an algorithmically solvable
problem. For example, we can write an algorithm that
sorts a numbered list without knowing what the numbers
are numbers of.
– Abstraction is when we omit details of the real-
world situation.
• Omit the kind of thing being sorted by our
algorithm, or what condition it is in, or what color
it is, or how long it has been in the list.
– Idealization is when we deliberately change aspects
of the real-world situation.
• Round the numbers being sorted to make them
whole numbers.
Course goals
• Think analytically about algorithms
• Learn to communicate clearly about
algorithms
Prerequisite courses required before AAoA
1. Programming I and Programming II
2. Introduction to Data Structures and Data Structures
3. Introduction to Algorithms and Analysis of Algorithms
1. Learn at least one Programming
language
• This should be your first step when starting to learn data
structures and algorithms. Before we learn to write a sentence
or an essay on a topic, we first learn the language itself: its
alphabet and punctuation, and how and when to use them. The
same goes for programming.
• Firstly, select a language of your choice, be it Java, C,
C++, Python, or any other language of your choice.
Before learning how to code in that language you should
learn about the building pieces of the language: the basic
syntax, the data types, variables, operators,
conditional statements, loops, functions, etc. You may
also learn the concept of OOP (Object Oriented
Programming).
2. Learn about Complexities
• Here comes one of the interesting and
important topics. The primary motive to use
DSA is to solve a problem effectively and
efficiently.
• How can you decide if a program written by
you is efficient or not? This is measured by
complexities. Complexity is of two types:
• Time Complexity: Time complexity describes how the
amount of time required to execute the code grows
with the size of the input.
• Space Complexity: Space complexity describes how the
amount of memory the code needs in order to run
successfully grows with the size of the input.
Auxiliary Space Vs Space Complexity:
• The term Space Complexity is misused for Auxiliary
Space at many places. Following are the correct
definitions of Auxiliary Space and Space Complexity.
• Auxiliary Space is the extra space or temporary
space used by an algorithm.
• The space Complexity of an algorithm is the total
space taken by the algorithm with respect to the
input size. Space complexity includes both
Auxiliary space and space used by input.
• So how can we determine which one is efficient?
The answer is the use of asymptotic notation.
Example: Auxiliary Space Vs Space Complexity
• For example, if we want to compare standard
sorting algorithms on the basis of space, then
Auxiliary Space is a better criterion than
Space Complexity. Merge Sort uses O(n) auxiliary
space, while Insertion Sort and Heap Sort use O(1)
auxiliary space. The space complexity of all these
sorting algorithms is O(n), though.
• Space complexity is a parallel concept to time
complexity. If we need to create an array of size n,
this will require O(n) space. If we create a two-
dimensional array of size n*n, this will require
O(n^2) space.
• In recursive calls, stack space also counts.
How does each call add a level to the call stack?
• Each recursive call is added to the call stack and takes up
actual memory until it returns. So a recursion that goes n
levels deep takes O(n) space.
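A minimal Python sketch of the stack-space point. The function `recursive_sum` and the `tracker` dictionary are hypothetical names chosen for illustration. The sketch records the deepest call level reached, so the growth of the stack with n can be seen directly.

```python
def recursive_sum(n, depth=1, tracker=None):
    """Return 1 + 2 + ... + n recursively.

    `tracker` is a hypothetical helper dict that records the deepest
    call-stack level reached during the recursion.
    """
    if tracker is not None:
        tracker["max_depth"] = max(tracker.get("max_depth", 0), depth)
    if n <= 0:
        return 0
    # This call stays on the stack until the deeper call returns.
    return n + recursive_sum(n - 1, depth + 1, tracker)


tracker = {}
total = recursive_sum(5, tracker=tracker)
print(total)                 # 15
print(tracker["max_depth"])  # 6: one call each for n = 5, 4, 3, 2, 1, 0
```

The depth is always n + 1 for an input n, which is why the recursion is said to use O(n) stack space.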
Asymptotic Notation:
• Asymptotic notation is a mathematical
tool that describes how the required time grows in
terms of input size, and does not require
the execution of the code.
Asymptotic Notation:
• It neglects the system-dependent constants
and depends only on the number of elementary
operations performed by the whole
program. The following 3 asymptotic notations
are mostly used to represent the time
complexity of algorithms:
• Big-O Notation (Ο) – Big-O notation gives an asymptotic
upper bound on growth. It is most often used to state the
worst-case running time.
• Omega Notation (Ω) – Omega(Ω) notation gives an asymptotic
lower bound on growth. It is often quoted for the best-case
running time.
• Theta Notation (θ) – This notation gives a tight bound: the
growth is bounded above and below by the same function. It is
not the same thing as the average case.
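Asymptotic analysis is done on paper, but a small experiment can make growth rates concrete. The sketch below uses two hypothetical helper functions, one for a single loop and one for a loop inside a loop. It counts their steps for a few input sizes.

```python
def count_single_loop(n):
    """Count the steps of one pass over n items: grows like n."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def count_nested_loops(n):
    """Count the steps of a loop inside a loop: grows like n * n."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


for n in (10, 100, 1000):
    # Multiplying n by 10 multiplies the first count by 10
    # and the second count by 100.
    print(n, count_single_loop(n), count_nested_loops(n))
```

The exact time per step depends on the machine, but the step counts do not. That machine-independent growth is what the notation captures.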
3. Data Structures
• A data structure is a meaningful way of
arranging and storing data in a computer so as to
use it efficiently.
• More precisely, a data structure is a collection of
data values, the relationships among them, and
the functions or operations that can be applied
to the data.
• Data structures provide a means of managing
large datasets, such as databases or internet
indexing services.
• Although a file system is a much more advanced data
structure, it tries to solve the same fundamental problem
that any data structure tackles, i.e. an efficient
way of storing, organizing, and maintaining data.
Basic Concepts
Data

Data
Structure

Algorithms

Analysis of
Algorithms
Asymptotic Complexity - determines how fast an algorithm can compute
(with respect to input) when applied over a data structure.
Types of the Data Structure
• Primitive data structures, also known as built-in data
types, can store the data of only one type. Examples are
integers, floating points, characters, pointers, etc.
• Non-primitive data structures can store data of more than one
type. For example, array, linked list, stack, queue, tree, graph, and
so on. These are often referred to as derived data types.
Linear Vs Non-linear

• Linear: the data have to be structured in a linear order. That
means there is no hierarchy, and elements are held together
sequentially either by pointers or contiguous memory locations.
• Non-linear: non-linear data structures are not arranged
sequentially, in that each element of such data structures can have
multiple paths to connect to other elements or form a hierarchy.
Static Vs Dynamic Data Structures
• Linear data structures are mainly classified into two
categories, static and dynamic.
• Static data structures - Here the size of the data
structure is allocated in the memory during the compile-
time thereby rendering the allocated size fixed. This
results in the maximum size of such data structures
needing to be known in advance, as memory cannot be
reallocated at a later point. That’s why they are called
static.
• Dynamic data structure - Here the size is allocated at
the runtime which makes it flexible. As the memory can
always be reallocated depending on the requirements,
these data structures are called dynamic.
• An array is a static data structure whereas a linked
list is dynamic!
Data Structure: Tree
• Trees are indeed one of the most important
data structures with a hierarchical
relationship among its elements
aka nodes which are connected by links
aka edges. If it helps, think of a tree as a
linked list but with two or more pointers.
• However, unlike a circular linked list, a tree
doesn’t have any cycle.
Need of the Graph
• But there could be instances where we do need
such cycles or the situation could be a little bit
complicated.
• In the context of social media, an expression
like this doesn't seem unheard of;
– 'I am Jane, a friend of a friend of Joe, and since
you also follow Joe, I was thinking maybe we can
be friends, please accept my request...'
• What data structure can model these elaborate
relationships?
Graph
• A graph contains nodes and edges much like
a tree, but it is more general and versatile. It
is so powerful that it has a dedicated
branch named 'Graph Theory’ in
mathematics.
• A graph can have cycles, disconnected parts, and all
kinds of flexibility on top of a tree. Hence,
it's important to remember that a tree
is a special kind of graph.
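One way to model the friend-of-a-friend situation is a graph stored as an adjacency list. The sketch below is illustrative only: "Jane" and "Joe" come from the slide, while "Ali", "Sara", and the function `friends_of_friends` are made-up names.

```python
# Undirected friendship graph as an adjacency list (dict of sets).
# "Ali" and "Sara" are hypothetical people added for the example.
friends = {
    "Jane": {"Ali"},
    "Ali": {"Jane", "Joe"},
    "Joe": {"Ali", "Sara"},
    "Sara": {"Joe"},
}


def friends_of_friends(graph, person):
    """Return the people exactly two hops away from `person`."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    # Exclude the person and anyone who is already a direct friend.
    return two_hops - direct - {person}


print(friends_of_friends(friends, "Jane"))  # {'Joe'}
```

Here Jane reaches Joe through Ali, which matches "a friend of a friend of Joe" from the slide.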
Abstract Data Type (ADT)
• In computer science, we often talk about abstraction and
programming to an interface.
• For instance, you don’t need to know automobile
engineering to drive your car to the grocery store. You
just have to know how to drive since the manufacturers
already abstracted away all the details of the car engine
and other internal mechanisms. And for driving, you do
get interfaces such as a steering wheel and a gearshift in
order to interact with the car.
• Likewise, every data structure has a corresponding
interface known as Abstract Data Type or ADT. Simply
put, an abstract data type only addresses the
interface, and data structures implement that
interface.
Queue
• For instance, a queue is an ADT that must
maintain the First In First Out (FIFO)
ordering of elements. It simply means,
the first person to get in the queue, would
be the first person to get out of the queue.
• This idea of a queue ADT might have been
inspired by a real-world standing queue in
front of a grocery store counter or similar
scenarios.
Stack
• On the other hand, Stack is another example
of ADT that conforms to the Last In First Out
(LIFO) ordering of elements.
• This idea of this ADT is probably inspired by
a real-world stack of books.
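The two orderings can be sketched with Python's built-in containers. Backing the queue with `collections.deque` and the stack with a plain list is one common choice, not the only possible implementation. The item names are illustrative.

```python
from collections import deque

# Queue ADT (FIFO) backed by a deque: add at the back, remove from the front.
queue = deque()
queue.append("first")
queue.append("second")
queue.append("third")
served = queue.popleft()   # "first" entered first, so it leaves first

# Stack ADT (LIFO) backed by a list: add and remove at the same end.
stack = []
stack.append("bottom book")
stack.append("middle book")
stack.append("top book")
taken = stack.pop()        # "top book" entered last, so it leaves first

print(served, "|", taken)
```

The ADT only fixes the ordering rule. The same queue interface could equally be implemented on top of a linked list.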
Basic Operations
• We have covered a lot of ground by discussing
several data structures such as array, linked list,
stack, queue, tree, graph, hash and probably
hinted at their respective use-cases but never
really got to the specifics.
Like, how do they operate?
• Well, it turns out almost all data structures
support a few key operations...
Search
• The most useful operation! If you cannot search for an
element, how will you ever be able to do anything with it?
• How can you ever access it if you cannot search for
it?
Basic Operations
Traverse
• Traversing means iterating through a data structure in
some particular order.
• In an array, you could access an element in constant
time i.e. O(1) provided you knew its position
beforehand!
• But that doesn't mean you can search for it in constant
time.
• You still need to start from the first index and search
for it. Hence, in order to search you need to traverse.
• So, we conclude that traversal is also as useful as
searching.
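The difference between accessing by position and searching by value can be sketched as follows. The function name `linear_search` and the contents of `cart` are illustrative assumptions.

```python
def linear_search(items, target):
    """Return the index of `target` in `items`, or -1 if it is absent."""
    for index, value in enumerate(items):  # traverse from the first index
        if value == target:
            return index
    return -1


cart = ["milk", "bread", "chocolate", "eggs"]
print(cart[2])                           # access by known position: O(1)
print(linear_search(cart, "chocolate"))  # search by value: up to n steps
print(linear_search(cart, "butter"))     # -1, after checking every element
```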
Basic Operations
Insert
• Insertion basically means to insert one or more
elements into your data structure. Elements can be
inserted at the beginning, end or at any specified
position.
• In some data structures, you don’t even need to
specify a position e.g. insertion in a heap.
• Insertion is a fundamental operation because if you
cannot insert anything into your data structure, why
bother having one?
• A cart exists for a reason to help facilitate storing
items for purchase. In order to store it, you must
insert it.
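A short sketch of the insertion cases above, using a Python list for positional inserts and the standard `heapq` module for the heap. The cart items and prices are made up for illustration.

```python
import heapq

cart = ["bread", "eggs"]
cart.insert(0, "milk")     # insert at the beginning
cart.append("chocolate")   # insert at the end
cart.insert(2, "butter")   # insert at a specified position (index 2)
print(cart)

# In a heap no position is given: the structure decides where the value goes.
prices = []
for price in (35, 10, 20):
    heapq.heappush(prices, price)
print(prices[0])           # the smallest price is always at index 0
```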
Basic Operations
Delete
• Once we are done with our shopping, we need to delete items
from the cart, show them at the counter, and put them into our bag
(another data structure).
• But you can argue, you don't need to delete a number in an array
to use it!
• Because we can access an element like this:
• float chocolate_bar = cart[35]; eat(chocolate_bar);
• So, depending on the chocolate_bar's data type, there are two
possible scenarios:
• accessing it creates another copy of the chocolate bar and then
you eat it but it's highly unlikely because it violates the law of
conservation of energy.
• eating the chocolate bar from the cart itself and leaving just the
wrapper behind.
• The second scenario is practical but highly inefficient which
contradicts the existence of data structure.
Basic Operations
Update
• This one builds on top
of searching and inserting e.g. you suddenly
realized you've purchased the wrong chocolate bar
so you search the wrong bar in the cart and replace
it with the correct bar.
• Now you might say that replacing could also be
considered an operation that builds on top of
deletion and insertion.
• cart[35] = better_chocolate_bar;
– Here we just searched the cart, found the bar at
position 35, and replaced it with better_chocolate_bar.
Basic Operations
Sort
• This is a quite advanced data structure operation and an array is highly
efficient for this kind of operation due to its constant access and linear
ordering.
• Linked lists can also be sorted by manipulating pointers and so can be trees.
• A special sorted binary tree known as the Binary Search Tree is one of the
most useful data structures out there.
• I mean, you can imagine and probably appreciate a grocery store for having all
its similar items sorted in an aisle according to their prices from left to right.
• So, if you are looking for an expensive chocolate bar you know exactly in which
direction to look!
Merge
• This is not a fundamental operation and in fact, it uses other basic ones to help
facilitate the said process e.g. merging two linked lists into one by
manipulating pointers, or two BSTs into one.
• To put into perspective, putting the items from two carts into a larger one
could be a case for merging
• Merging is also used for one of the most well-known sorting algorithms,
namely merge sort.
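The merge step that merge sort relies on can be sketched as below, assuming both inputs are already sorted. The function name `merge_sorted` is illustrative.

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sorted([1, 4, 9], [2, 3, 10]))  # [1, 2, 3, 4, 9, 10]
```

Each element is looked at once, so merging lists with n elements in total takes O(n) steps.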
Advantages of Data Structures
• Efficiency - All these various data structures exist for a reason and that
is to facilitate efficient storage and retrieval of data for various use-
cases.
• For example, searching for a value by its key is better suited for hash
maps than an array.
• A tree is more likely to be efficient to model any file system but not
efficient for representing relations in social media applications.
• Abstractions - Every data structure provides clean interfaces in the
form of operations exclusive to their respective abstract data types,
which makes their internal implementation hidden from developers.
• You don't need to know how STL's unordered_map works under the
hood to write an application using it in C++. You just need to know
what operations it supports and how you can use them to your
advantage.
• Composition - Fundamental data structures can be combined to build
more complex data structures.
• In a database management system, indexing is usually implemented
using a B+ tree which is based on top of the B tree - a special kind of n-
ary self-balancing tree data structure.
• In fact, you can think of the database itself as this huge composite data
structure capable of storing data even when a program has run its
course.
What are Data Structures
and Algorithms?
DSA
• A data structure is a named location that
can be used to store and organize data. And,
an algorithm is a collection of steps to solve
a particular problem.
Algorithm
What you learned from this
hypothetical scenario
• You learnt what an algorithm is.
• You learnt about the characteristics of an
algorithm.
• You also learnt how data flows in an
algorithm.
Algorithm
• The formal definition of an algorithm is a finite set of
steps carried out in a specific order to solve a specific
problem, especially by a computer.
So basically, it is not the complete code, rather it is the logic
that can be implemented to solve a particular problem.
• Algorithms are independent of programming languages
and are usually represented by using flowcharts or
pseudocode.
The characteristics of algorithm:
• Unambiguous– An algorithm should be clear and simple
in nature and lead to a meaningful outcome i.e. they should
be unambiguous. The recipe for baking a cake is the perfect
example for this. It tells you the step by step procedure
clearly, which leads you to bake a cake successfully. The
situation could easily turn ambiguous if the recipe only has
steps like “Add sugar.” This phrase is ambiguous or open to
multiple interpretations- “How much sugar to add?” or
“Add sugar to what?” are the kind of questions that can
cause confusion.
• Input– It should have some input values. In our scenario of
baking a cake, our ingredients were the inputs
• Output– What did the ingredients of our recipe finally give
us? A cake! The cake was our final output. So every
algorithm should have well-defined outputs
The characteristics of algorithm:
• Effectiveness– Each step of the algorithm should be
effective and efficient enough to produce results. The
effectiveness of an algorithm can be evaluated with the
help of two important parameters-
– Time Complexity-It is the amount of time taken by the
computer to run the algorithm, as a function of the
input size. It can be analysed for the best case, the
average case, or the worst case. We usually design for
the worst case, since it gives a guarantee for every
input.
– Space Complexity-It refers to the amount of
memory needed to solve an instance
of the problem statement. In simple words, it is the
total space taken by the algorithm with respect to
the input size. Lower space complexity means less memory is
needed, not necessarily a faster algorithm. Time is often
traded for space.
The characteristics of algorithm:
• Finiteness– The steps of an algorithm should be
countable, and it should terminate after a finite
number of steps. The recipe for the cake exhibits the
characteristic of finiteness, as you only need to
follow the designated number of steps to get the
desired result.
• Language Independent– Just like it doesn’t matter
whether you bake a cake in a pressure cooker or an
oven, you will eventually receive a cake as the end
product. Similarly, algorithms should be language
independent. In other words, the algorithm should
work for all programming languages and give the
same output.
Data Flow of an Algorithm
• Let us refer to our cake baking scenario once again-
• The bakeries being closed due to Covid restrictions was our
problem. Our recipe was the algorithm. The ingredients
were our input. The oven was the processing unit and
finally, our cake was the output!
Breaking down and taking a closer look
• Problem- The problem can be any real-world or
programmable problem. The problem statement usually
gives the programmer an idea of the issue at hand, the
available resources and the motivation to come up with a
plan to solve it.
• Algorithm- After analyzing the problem, the
programmer designs the step by step procedure to solve
the problem efficiently. This procedure is the algorithm.
• Input- The algorithm is designed and the relevant inputs
are supplied.
• Processing Unit- The processing unit receives these
inputs and processes them as per the designed
algorithm.
• Output- Finally, after the processing is complete, we
receive the desired output for our problem statement.
Why do we Need Algorithms?
• Primarily, we need algorithms for the following
two reasons-
• Scalability– When we have big real-world
problems, we cannot tackle them on the macro
level. We need to break them down into smaller
steps so that the problem can be analyzed
easily. Thus, algorithms facilitate scalability.
• Performance– It is never easy to break down
big problems into smaller modules. But
algorithms help us achieve this. They help us
make the problem feasible and provide
efficient performance driven solutions.
How to Write an Algorithm?
• A simple Google search would tell you that
there is no one “right way” to bake a cake.
There are countless recipes available online-
devised and tested by bakers around the world.
Likewise, there are no predefined standards on
how to write an algorithm.
• The way we write algorithms is heavily
influenced by the problem statement and the
resources available. The common construct
that is widely followed in case of algorithms is
the use of pseudocode.
• One thing that is appreciated while writing/
designing an algorithm is that the problem
domain should be well-defined.
Factors of an Algorithm
While designing an algorithm, we must consider the
following factors:
• Modularity- If a big problem can be easily broken down
into smaller ones, it facilitates modularity.
• Correctness- The analysis of the problem statement and
consequently the algorithm should be correct. The
algorithm should also work correctly for all possible test
cases of the problem at hand. A test case is nothing but a
specification of inputs, executing conditions, testing
procedure and expected results, which can be developed
from the problem statement itself
• Maintainability- The algorithm should be designed in
such a way that it should be easy to maintain and if we
wish to refine it at some point, we should be able to do
so without making major changes.
• Functionality- The steps of an algorithm should
successfully solve a real world problem.
Factors of an Algorithm
• User-friendly- It should be easily understood by
programmers.
• Simplicity- It should be the simplest possible
solution to a problem. Simplicity does not mean
ignoring efficiency: the approach of the algorithm
should be simple, yet it should produce the desired
results in an efficient manner, keeping both
time and space complexities in mind.
• Extensible- It should be extensible i.e. the algorithm
should facilitate reusability. In other words, other
programs should be able to reuse it or extend for
their own problem statement too.
Importance of Algorithms
• The importance of algorithms can be classified as-
• Theoretical Importance- The best way to deal with
any real-world problem is to break it down into
smaller modules. The theoretical knowledge from
pre-existing algorithms often helps us to do so.
• Practical Importance- Just designing an algorithm
theoretically or using some aspects of pre-existing
ones is not enough. The real-world problems can
only be considered to be solved if we manage to get
practical results from it.
• Hence, an algorithm can be said to have both
theoretical and practical importance.
Issues While Working on
Algorithms
• The most common issue that we face while
working on algorithms is- “How do I design it?”
• Not all problems will have an easy solution and
along with that, we need the solutions to be
efficient as well. Sometimes, this may cause the
programmers to feel stuck.
Why Learn Data Structures and Algorithms?

• You actually need an algorithm to build the
data structure in the first place. Even if we give
it a pass, and assume it exists!
• Example: lossless data compression
algorithms! To shed some light, let’s take a
string “aaaaaaabbbcddddddd” of length 18 and
a simple compression algorithm that encodes
our string to something like “a7b3c1d7”
of length 8. Notice how the information about
the string is still preserved and can be decoded
back. Hence, it’s called “lossless”.
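A minimal sketch of such an encoder and its decoder, using `itertools.groupby` to find runs of repeated characters. The function names are illustrative. The sketch assumes the input text contains no digits, since a digit in the text would be confused with a run count when decoding.

```python
from itertools import groupby


def rle_encode(text):
    """Encode runs of repeated characters, e.g. 'aaab' becomes 'a3b1'."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded):
    """Invert rle_encode. Assumes the original text had no digits."""
    decoded = []
    i = 0
    while i < len(encoded):
        char = encoded[i]
        i += 1
        digits = ""
        # Collect the (possibly multi-digit) run count after the character.
        while i < len(encoded) and encoded[i].isdigit():
            digits += encoded[i]
            i += 1
        decoded.append(char * int(digits))
    return "".join(decoded)


original = "aaaaaaabbbcddddddd"
encoded = rle_encode(original)
print(encoded, len(original), len(encoded))  # a7b3c1d7 18 8
print(rle_decode(encoded) == original)       # True
```

The round trip back to the original string is what makes the scheme lossless.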
Using DSA to Make Your Code Scalable
• Let’s take a scenario – your manager asks you
to rank customers according to their purchase
history, and the higher the purchase, the better
the rank should be.
• The record of their purchase is present on
a CSV file.
• After giving it some thought, you come up with a
feasible solution, namely sorting the records
according to purchase in descending order.
• Since sorting goes well with contiguous
memory, you first load the data into a suitable
data structure like an array that stores related
data items in a sequential address space.
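The scenario can be sketched with Python's standard `csv` module. The column names `customer` and `purchase` and the sample rows are invented for illustration. An in-memory string stands in for the CSV file so the example is self-contained.

```python
import csv
import io

# Stand-in for the CSV file; column names and values are made up.
csv_text = """customer,purchase
Asha,120.50
Bilal,310.00
Chen,75.25
"""

# Load the records into a list (an array-like structure).
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Sort by purchase amount in descending order: highest purchase ranks first.
ranked = sorted(rows, key=lambda row: float(row["purchase"]), reverse=True)

for rank, row in enumerate(ranked, start=1):
    print(rank, row["customer"], row["purchase"])
```

The built-in `sorted` is used here only to show the ranking step. Which sorting algorithm to use is discussed in the following slides.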
Using DSA to Make Your Code Scalable
• Next, you have a plethora of sorting
algorithms to choose from:
Using DSA to Make Your Code Scalable
• Let’s say you pick insertion sort and apply the
algorithm to the array.
• It works.
• Next thing you know, your team rolls out a new
feature that rewards customers based on their
ranking which in turn creates positive
reinforcement and the sales go through the roof.
However, on a blissful Saturday night, your code
breaks, and the production is down!
• Even though you had solved the problem, it turns
out that the insertion sort is pretty costly and
makes your program run slower and slower as the
size of the customer record becomes larger and
larger.
Using DSA to Make Your Code Scalable
• More formally, insertion sort scales quadratically
with respect to the size of the data set.
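The quadratic growth can be observed by counting element shifts in a straightforward insertion sort. This is a sketch for illustration: the function returns the shift count alongside the sorted list, and reversed input is used because it is the worst case for this algorithm.

```python
def insertion_sort(values):
    """Sort a copy of `values`; return (sorted_list, number_of_shifts)."""
    items = list(values)
    shifts = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        # Shift larger elements one slot right to make room for `key`.
        while j >= 0 and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
            shifts += 1
        items[j + 1] = key
    return items, shifts


for n in (10, 20, 40):
    _, shifts = insertion_sort(range(n, 0, -1))  # reversed input: worst case
    print(n, shifts)  # 45, 190, 780: n * (n - 1) / 2 shifts
```

Doubling the input size roughly quadruples the work, which is the practical meaning of quadratic scaling.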
Using DSA to Make Your Code Scalable
• In the end, you are advised to pick a more
efficient sorting algorithm such as merge
sort and modify your code to handle the scale.
• But, these scalability issues can also arise
from a memory standpoint where we might
have to trade off time for optimizing space.
• In that context, an algorithm
like quicksort might be a better choice since it’s
an in-place sorting algorithm: apart from its recursion
stack, it doesn’t need extra space in memory to sort,
unlike merge sort (“extra space” refers to
memory apart from the record itself).
• Data structures and algorithms play a
crucial part in handling the scalability of a
software system.
Why Learn Data Structures and Algorithms?
• The purpose of data structures and algorithms
is to make your software as efficient as possible.
• “We need to learn data structures and
algorithms since it enhances our ability to solve
problems much more efficiently and helps us
think through a scenario methodically.”
• Data structures allow information to be stored and
provide the means to manage large data sets such as
databases. They work together with algorithms and are
necessary for making them efficient. They also support
safe storage and easier processing of data, and make it
possible to access data over the internet at any time.
