2022-2023
Data Structures and Algorithms with Python
Contents
Chapter One Introduction
Chapter Two Complexity Analysis
Chapter Three Searching and Sorting
Chapter Four Python Programming Introduction
Chapter Five Python Collections
Chapter Six Stack and Queue
Chapter Seven Arrays and Linked lists
Chapter Eight Trees
Chapter Nine Graphs
Chapter Ten Recursion
Chapter Eleven Hashing
Chapter Twelve Strings Algorithms
String Operations
The Boyer-Moore Algorithm
The Knuth-Morris-Pratt Algorithm
Hash-Based Lexicon Matching
Tries
Chapter Thirteen Heaps
Chapter Fourteen Exercises
COURSE OBJECTIVES:
The course should enable the students to:
1- Learn the basic techniques of algorithm analysis.
2- Demonstrate several searching and sorting algorithms.
3- Implement linear and non-linear data structures.
4- Demonstrate various tree and graph traversal algorithms.
5- Analyze and choose appropriate data structures to solve problems
in the real world.
1. Introduction
Data Types:
Computer memory is filled with zeros and ones. If we have a
problem and we want to code it, it is very difficult to provide the
solution in terms of zeros and ones. To help users, programming
languages and compilers provide us with data types. For example,
integer takes 2 bytes (actual value depends on compiler), float takes
4 bytes, etc. At the top level, there are two types of data types:
➢ System-defined data types (also called Primitive data types)
Data types that are defined by the system. The primitive data types
provided by many programming languages are: int, float, char,
double, bool, etc. The number of bits allocated for each primitive
data type depends on the programming language, the compiler, and
the operating system. For the same primitive data type, different
languages may use different sizes. Depending on the size of the data
type, the total available values (domain) will also change.
Algorithms
➢ What is an Algorithm?
An algorithm is an explicit, precise, unambiguous sequence of
elementary instructions to be followed to solve a problem.
Normally, people write algorithms only for difficult tasks. Algorithms
explain, for example, how to find the solution to a complicated
algebra problem. When studying an algorithm, ask: what are the main
techniques it uses, and can you reuse those techniques to solve
similar problems?
Note: If the program is run on a large data set, then the running time
becomes an issue.
Algorithms Properties:
1) Precision:
The steps are precisely defined and understandable. E.g., the
instruction (Add 6 or 7 to x) is not allowed, as it is not clear.
2) Correctness: The output is correct for each input as defined by
the problem.
3) Finiteness: The algorithm produces the output after a finite
number of steps for each input.
4) Determinism
5) Generality:
Applicable for all problems of the desired form, not a specific
case.
The importance of algorithm analysis: you should be able to find
the complexity of any given algorithm.
An algorithm is a step-by-step procedure for performing some task in
a finite amount of time, and a data structure is a systematic way of
organizing and accessing data. These concepts are central to
computing.
A primary analysis tool is to characterize the running time of an
algorithm or data structure operation, with space usage also being of
interest. It is an important consideration in economic and scientific
Second Level Students 2021/2022 6
Data Structures and Algorithms with Python
Fig. 1-1 Running time against input size n for O(n), O(log n), and O(1)
Another factor in algorithm efficiency is memory (storage space). A
double variable takes a relatively large amount of space; if a program
uses 10 double variables, the storage required grows accordingly.
Data structures:
(a) Three variables x, y, z.
(b) An array nums[0..2].
- Accessing an array item takes more time than accessing a simple
variable. (To access nums[i], the executable code must compute its
address, addr(nums[i]) = addr(nums[0]) + i*sizeof(int), which involves
1 addition and 1 multiplication.)
Example: Selection problem
Given a list of N numbers, determine the kth largest, where k ≤ N.
Algorithm 1
1- Read N numbers into an array.
2- Sort the array in descending order by a simple algorithm.
3- Return the element in position k.
Algorithm 2
1- Read the first k elements into an array.
2- Sort them in descending order.
3- Each remaining element is read one by one:
- if smaller than the kth element, then it is ignored.
- otherwise, it is placed in its correct spot in the array, bumping
one element out of the array.
4- The element in the kth position is returned as the answer.
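The two approaches to the selection problem can be sketched in Python as follows. This is only an illustration: the function names `kth_largest_by_sorting` and `kth_largest_by_window` are hypothetical, and Python's built-in `sorted` stands in for the "simple algorithm" mentioned in Algorithm 1.

```python
def kth_largest_by_sorting(numbers, k):
    """Algorithm 1: sort everything in descending order, take position k."""
    ordered = sorted(numbers, reverse=True)  # assumption: built-in sort
    return ordered[k - 1]                    # position k, counting from 1


def kth_largest_by_window(numbers, k):
    """Algorithm 2: keep only the k largest items seen so far."""
    window = sorted(numbers[:k], reverse=True)
    for value in numbers[k:]:
        if value > window[-1]:
            # Find the correct spot, insert, and bump the smallest item out.
            pos = 0
            while window[pos] >= value:
                pos += 1
            window.insert(pos, value)
            window.pop()
        # Otherwise the value is no larger than the kth element: ignore it.
    return window[-1]
```

Both functions return the same answer; the second one never holds more than k + 1 items at a time, which matters when N is much larger than k.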
Analyzing Algorithms
The efficiency (running time) of an algorithm or data structure
depends on a number of factors. If an algorithm has been
implemented, we can study its running time by executing it on
various test inputs and recording the actual time spent in each
execution. Such measurements can be taken in an accurate manner
by using system calls that are built into the language or operating
system for which the algorithm is written. (Experiments)
In general, the running time of an algorithm or data structure method
increases with the input size, although it may also vary for distinct
inputs of the same size.
Also, the running time is affected by the hardware environment
(processor, clock rate, memory, disk, etc.) and software
environment (operating system, programming language, compiler,
interpreter, etc.) in which the algorithm is implemented, compiled,
and executed. All other factors being equal, the running time of the
same algorithm on the same input data will be smaller if the
computer has, say, a much faster processor or if the implementation
is done in a program compiled into native machine code instead of
an interpreted implementation run on a virtual machine.
- We only analyse correct algorithms.
- An algorithm is correct iff, for every input instance, it produces
the correct output.
• Cubic algorithms work well only if the number of elements is not
more than about 1,000.
Pseudocode:
1. Initialize positiveCount = 0.
2. For each nums[i] > 0, increment positiveCount by one.
3. Let negativeCount = n − positiveCount.
Code:
positiveCount = negativeCount = 0;
for (i = 0; i < n; i++)   // assumes each nums[i] ≠ 0
    if (0 < nums[i]) positiveCount++;
    else negativeCount++;
An Example of Pseudo-Code
The array-maximum problem is a simple problem of finding the
maximum element in an array A storing n integers. To solve this
problem, we can use an algorithm called arrayMax, which scans
through the elements of A using a for loop.
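A minimal Python sketch of the arrayMax idea described above. The name `array_max` and the assumption that A is non-empty are illustrative choices, not taken from the original listing.

```python
def array_max(A):
    """Return the maximum element of a non-empty list A."""
    current_max = A[0]          # assume the first element is the maximum
    for value in A[1:]:         # scan the remaining elements once
        if value > current_max:
            current_max = value
    return current_max
```

The loop visits each element once, so the running time grows linearly with n.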
Programming
Q: Are programming and a programming language the same thing?
Ans: No.
A programming language is a tool we use to program. Ex: Photoshop
is not the image.
You learn programming over years. Is it so difficult? No, you just need
some time to become an efficient programmer. While learning
programming, you need to learn algorithms (Mohamed Ibn Mosa Al
Khawarezmy).
Algorithms are simply logical steps for solving a particular problem.
A simple example of an everyday algorithm: an algorithm for printing a Word document.
As a programmer, I can write the same program in different ways, but how do I judge which is best? ➔ algorithm
It is not important to say "I learned Python"; what matters more is whether you know
how to use your tools well and think better. That is how you become a programmer.
Fig. 1-2
2. Complexity Analysis
After completing this chapter, you will be able to:
✓ Analyze an algorithm's computational complexity.
Index: 0 1 2 3 4 5
Value: 10 5 15 2 25 55
Best case, average case, worst case
import time
problemSize = 10000000
print("%12s%16s" % ("Problem Size", "Seconds"))
for count in range(5):
    start = time.time()
    # The start of the algorithm
    work = 1
    for x in range(problemSize):
        work += 1
        work -= 1
    # The end of the algorithm
    elapsed = time.time() - start
    print("%12d%16.3f" % (problemSize, elapsed))
    problemSize *= 2
out:
Problem Size Seconds
10000000 1.234
The program uses the time() function in the time module to track the
running time. This function returns the number of seconds that have
elapsed between the current time on the computer’s clock and
January 1, 1970 (also called The Epoch). Thus, the difference
between the results of two calls of time.time() represents the elapsed
time in seconds.
Counting Instructions
Another technique used to estimate the efficiency of an algorithm is
to count the instructions executed with different problem sizes.
These counts provide a good predictor of the amount of abstract
work an algorithm performs, no matter what platform the algorithm
runs on. Keep in mind, however, that when you count instructions,
you are counting the instructions in the high-level code in which the
algorithm is written, not instructions in the executable machine
language program.
When analyzing an algorithm in this way, you distinguish between
two classes of instructions:
➢ Instructions that execute the same number of times regardless
of the problem size
➢ Instructions whose execution count varies with the problem
size.
For now, you ignore instructions in the first class, because they do
not figure significantly in this kind of analysis. The instructions in the
second class normally are found in loops or recursive functions. In
problemSize = 1000
print("%12s%15s" % ("Problem Size", "Iterations"))
for count in range(5):
    number = 0
    # The start of the algorithm
    work = 1
    for j in range(problemSize):
        for k in range(problemSize):
            number += 1
            work += 1
            work -= 1
    # The end of the algorithm
    print("%12d%15d" % (problemSize, number))
    problemSize *= 2
As you can see, the number of iterations is the square of the problem
size (Fig. 2-1).
Questions:
Write a program that counts and displays the number of
iterations of the following loop:
while problemSize > 0:
    problemSize = problemSize // 2
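One possible way to approach this question is sketched below. The helper name `count_halvings` is hypothetical; it simply wraps the loop from the question and adds a counter, in the same spirit as the iteration-counting program shown earlier.

```python
def count_halvings(problem_size):
    """Count how many times the halving loop runs for a given size."""
    iterations = 0
    while problem_size > 0:
        problem_size = problem_size // 2
        iterations += 1
    return iterations


print("%12s%15s" % ("Problem Size", "Iterations"))
for size in (1000, 2000, 4000, 8000, 16000):
    print("%12d%15d" % (size, count_halvings(size)))
```

Each doubling of the problem size adds only one iteration, which is the signature of logarithmic growth.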
Step execution is a way of evaluating algorithms that relies on counting the number of
program steps (execution steps). Instead of running the code on several computers and
comparing the results, where the hardware specifications would influence the evaluation,
it is better to use big O.
When computing big O, some lines of the program are counted and others are not.
For example, int c is not counted, whereas int c=10 is an assignment statement and counts as 1.
public void max(int a, int b)     no
{                                 no
    …program instructions
}                                 no
Examples:
(1) 1+4N = O(N)
(4) 3 log n + 2 is O(log n), since 3 log n + 2 ≤ 5 log n for n ≥ 2. Note that log n
is zero for n = 1.
(6) 2^(n+2) is O(2^n), since 2^(n+2) = 2^n · 2^2 = 4 · 2^n; hence, we can take c = 4 and n0
= 1 in this case.
(7) 2n + 100 log n is O(n), since 2n + 100 log n ≤ 102n for n ≥ n0 = 1;
hence, we can take c = 102 in this case.
(8) sin N = O(1), 10 = O(1), 10^10 = O(1)
(9) $\sum_{i=1}^{N} i \le N \cdot N = O(N^2)$
Arithmetic operations: *, /, %, +, -
SE (step execution)
Example 2-1:
Write a method to calculate total = 1 + 2 + 3 + … + n
                                    Number of iterations
public int sum (int n)              0
{                                   0
    int i, total;                   0
    total = 0;                      1
    for (i = 1; i <= n; i++)        n+1
    {                               0
        total = total + i;          2*n   # 2 because assign, sum
    }                               0
    return total;                   1
}
For example, if n = 6, the body of the for loop executes 6 times and its condition is tested 7 times.
So SE = 1 + (n+1) + 2n + 1 = 3n+3 ➔ O(n) (constants are not
considered).
So in this code, as n increases, the execution time of the code increases.
Example 2-2:
Write a method to calculate $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$
public int sum (int n)          0
{                               0
    int total;                  0
    total = n*(n+1)/2;          1+1+1+1   # assign, mul, add, div
    return total;               1
}
SE = 4+1 = 5 = O(5) ➔ O(1) (number of statements)
Note: O(1000000) = O(1). This does not mean a single statement; it means that the
running time is not a function of n: however the code is run, it takes the same time
(it does not depend on n).
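The contrast between Example 2-1 and Example 2-2 can be checked with a short Python sketch. The names `sum_loop` and `sum_formula` are hypothetical; the first mirrors the O(n) loop, the second the O(1) closed form.

```python
def sum_loop(n):
    """O(n): add the numbers 1..n one at a time."""
    total = 0
    for i in range(1, n + 1):
        total = total + i
    return total


def sum_formula(n):
    """O(1): use the closed form n(n+1)/2, independent of n."""
    return n * (n + 1) // 2


for n in (6, 20, 1000):
    print(n, sum_loop(n), sum_formula(n))
```

Both functions return the same value, but only the first does more work as n grows.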
Example 2-3:
Write a method to calculate $\sum_{i=1}^{20} i$, i.e. sum = 1+2+3+…+20
public int sum ( )                 0
{                                  0
    int i, total;                  0
    total = 0;                     1
    for (i = 1; i <= 20; i++)      20+1
    {                              0
        total = total + i;         2*20   # assign, add, 20 times
    }                              0
    return total;                  1
}                                  0
SE = 1+21+40+1 = 63
Step execution O(63) = O(1) or O(c)
Example 2-4:
int sumList(int A[ ], int n)       0
{                                  0
    int sum = 0, i;                1
    for (i = 0; i < n; i++)        n+1
        sum = sum + A[i];          3n   # array access, sum, assign
    return sum;                    1
}
Total = 1 + (n+1) + 3n + 1 = 4n+3 ➔ O(n)
Example 2-5:
Compute SE (step execution) and big O for the following code:
x = 0;                             1
for (i = -2; i < n; i++)           2+n+1
{                                  0
    x = x + i;                     2*(n+2)
    y = x + 2;                     2*(n+2)
}                                  0
print(x);                          1
SE = 1 + (n+3) + (2n+4) + (2n+4) + 1 = 5n+13 ➔ O(n)
Example 2-6:
Compute SE (step execution) and big Oh for the following code:
float sum (float list[ ], int n)   0
{
    float total = 0;               1
    int i;                         0
    for (i = 0; i < n; i++)        n+1
        total += list[i];          3n   # array access, sum, assign
    return total;                  1
}
SE = 1 + (n+1) + 3n + 1 = 4n+3 ➔ Time Complexity: O(n)
Example 2-7:
Compute SE (step execution) and big O for the following code:
int i;                             0
i = 1;                             1
for (i; i < n; i = i*2)            log2(n)   # i doubles each time, so the log base is 2
    print(i);                      log2(n)   # runs once per loop iteration
SE = 1 + log2(n) + log2(n) = 2 log2(n) + 1 ➔ O(log2(n))
Example 2-8:
Write a method to calculate $\sum_{i=1}^{N} i^3$. Compute SE (step execution) and
big O.
int sum (int n)
{
    int total;                          0
    total = 0;                          1   (one operation)
    for (int i = 1; i <= n; i++)        n+1
        total += i*i*i;                 4n  (4 operations: assign, add, 2 mul; n times)
    return total;                       1
}
SE = 1 + (n+1) + 4n + 1 = 5n+3 ➔ O(n)
Example 2-9:
Add two matrices C(m, n)= A(m, n)+ B(m, n)
Algorithm add(A, B, m, n)
Input: +ve integers m, n and two dimensional arrays of numbers A and
B (rows 1:m, cols 1:n)
Output: a two-dimensional array C, the sum of A and B
end for
end for
return C mn (access matrix C)
SE = m+1+n+1+6mn ➔ O(mn), which is O(n^2) when m = n
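The matrix addition of Example 2-9 might look like this in Python. This is a sketch under assumptions: matrices are stored as lists of rows, indices start at 0 rather than 1, and the name `add_matrices` is illustrative.

```python
def add_matrices(A, B):
    """Return C = A + B for two m-by-n matrices given as lists of rows."""
    m, n = len(A), len(A[0])
    C = [[0] * n for _ in range(m)]     # m rows of n zeros
    for i in range(m):                  # outer loop: m iterations
        for j in range(n):              # inner loop: n iterations per row
            C[i][j] = A[i][j] + B[i][j]
    return C


print(add_matrices([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
```

The assignment in the innermost position runs m·n times, which is where the mn term in the step count comes from.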
Example 2-10:
work = 1                          1
for x in range(problemSize):      n
    work += 1                     2n
work -= 1                         1
SE = 1 + n + 2n + 1 = 3n+2 ➔ O(n)
Example 2-12:
What is the time complexity of the following code?
I = n
while I > 0:
    for j in range(n):       # n
        print(j)
    I = I // 2               # integer halving: the while loop runs about log n times
O(n log n)
Example 2-13: (nested for)
Compute SE (step execution) and big O for the following code:
int func(a[ ], n)                      0
{                                      0
    int x = 5;                         1
    for (i = 1; i <= n; i++)           n+1
    {                                  0
        for (j = 1; j < n; j++)        n
        {                              0
            x = x + i + j;             n-1   (times n; the repeated addition is ignored)
            print(x);                  n-1   (number of times the for body executes)
        }                              0
    }                                  0
}                                      0
SE = 1 + (n+1) + n*[n + (n-1) + (n-1)] = n + 2 + 3n^2 - 2n = 3n^2 - n + 2
O(n^2)
Example 2-14:
Estimate the time complexity (big O) for the following codes:
def func1(n):
    for var1 in range(n):
        if n % 2 == 0:
            for var2 in range(n):
                print(n + var1 + var2)
    return 0
nested for ➔ O(n^2)
Example 2-15:
def func2(n):
    for var1 in range(n):
        print(var1)
    print('Hello')
    return 0
O(n)
Example 2-16:
def func3(n):
    for j in range(n):
        for k in range(n):
            for x in range(n):
                print(j, k, x)
    return 0
nested for ➔ O(n^3)
Example 2-17:
for (int i = 0; i < n; i++) {
    for (int j = 1; j <= i; j *= 2) {
        print(i, j);
    }
}
O(n log n)
Note:
L is a list, so the following operations are O(1), i.e. O(c):
L[1]          O(1)
L[i] = 0      O(1)
len(L)        O(1)
L.append(6)   O(1) (amortized)
Q3)
def linear_algo(items):
    for item in items:        # executes 4 times for the call below
        print(item)
linear_algo([4, 5, 6, 8])     # O(n)
A single loop from 1 to N has linear complexity, O(N).
Q4)
def quadratic_algo(items):
    for item in items:
        for item2 in items:
Q5)
for i in range(5):
    print("Python is awesome")
O(5) = O(1)
Q6)
def fact(n):
    product = 1
    for i in range(n):
        product = product * (i+1)
    return product
print(fact(5))
Q7)
def fact2(n):
    if n == 0:
        return 1
    else:
        return n * fact2(n-1)
print(fact2(5))
Q8)
for (int i = 1; i <= N; i++) {             N^2
    for (int j = 1; j <= N; j++) {
        statement1;
    }
}
for (int i = 1; i <= N; i++) {
    statement2;                            4N
    statement3;
    statement4;
    statement5;
}
N^2 + 4N ➔ O(N^2)
How many statements will execute if N = 10? If N = 1000?
{                    0                  because j depends on i
    x = i+j;         (n(n+1)/2) * 2     in this case, the sum of an arithmetic series
    y = x+3;         (n(n+1)/2) * 2
}                    0
}                    0
SE = 1 + (n+1) + n + n(n+1)/2 + 1 + n(n+1) + n(n+1)
O(n^2)
Example:
Compute SE (step execution) and big O for the following code:
for (int i = 0; i < n; i++)               n
    for (int j = 0; j < n; j++)           n
        for (int k = 1; k < n; k = k*2)   log2 n   (k must start at 1; from 0, k*2 stays 0)
            print i+j+k;
SE = n * n * log2 n
O(n^2 log2 n)
Example:
Compute SE (step execution) and big O for the following code:
for (int i = n/2; i < n; i++)             n/2
    for (int k = 1; k < n; k = k*2)       log2 n   (k and j must start at 1; from 0 they never grow)
        for (int j = 1; j < n; j = j*2)   log2 n
            print i+j+k;
SE = (n/2) * log2 n * log2 n
O(n (log2 n)^2)
Example:
long Factorial(int n)
{
    if (n == 0)
    {
        return 1;
    }
    else
    {
        return n * Factorial(n-1);
    }
}
O(N)
Example:
long Fibonacci(int n)
{
if(n==0)
{
return 1;
}
else if (n==1)
{
return 1;
}
else
{
Example: (while)
Compute SE (step execution) and big O for the following code:
int i=0;
while(i<5) { //runs 5 times
i++;
}
O(11) O(c)
Example:
int i = 1;
do
{
i++;
} while(i<=n);
O(n)
Example:
def fib(n, counter):
    """Count the number of iterations in the Fibonacci function."""
    total = 1
    first = 1
    second = 1
    count = 3
    while count <= n:
        counter.increment()   # counter is any object with an increment() method
        total = first + second
        first = second
        second = total
        count += 1
    return total
Example: (if/Else)
Compute SE and big O for the following code:
if (condition) {
sequence of statements 1
}
else {
sequence of statements 2
}
Exercises
1- Use big-O notation to classify:
(a) 2n + 4n^2 + n + 1
(b) 3n^2 + 6
(c) n^3 + n^2 - n
2- For problem size n, algorithms A and B perform n^2 and ½ n^2 + ½ n
min() Parameters
Output
The smallest number is: 2
Output
Question:
Write code that returns the index of the minimum item of a list,
assuming the list is not empty and the items are in arbitrary order.
Here is the code for the algorithm, in function indexOfMin:
def indexOfMin(lyst):
    """Returns the index of the minimum item."""
    minIndex = 0
    currentIndex = 1
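The listing above stops after the two index variables are initialized. A minimal sketch of how the function might continue, assuming a straightforward left-to-right scan that remembers the position of the smallest item seen so far:

```python
def indexOfMin(lyst):
    """Returns the index of the minimum item (list must be non-empty)."""
    minIndex = 0
    currentIndex = 1
    # Assumed continuation: compare every remaining item with the
    # smallest one found so far.
    while currentIndex < len(lyst):
        if lyst[currentIndex] < lyst[minIndex]:
            minIndex = currentIndex
        currentIndex += 1
    return minIndex
```

Every item is examined exactly once, so this search makes n - 1 comparisons for a list of size n and is O(n).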
begins in the same manner, starting with the first element, until the
desired element is found. In linear search, we cannot conclude that
a given search value is absent from the sequence until the entire
list has been traversed.
The search process stops when the target is found, or the current
beginning position is greater than the current ending position.
Example:
Suppose we want to search for 10 in a sorted array of elements. We
first determine the middle element of the array. As the middle item
contains 18, which is greater than the target value 10, we can discard
the second half of the list and repeat the process on the first half of
the array. This process is repeated until the desired target item is
located in the list. If the item is found, the search returns True,
otherwise False.
L = 0, R = 10
Mid = int(L + (R - L)/2) = int(0 + (10 - 0)/2) = 5
Arr[5] = 18 > 10, go left: L = 0, R = Mid - 1 = 4
Mid = int(L + (R - L)/2) = 2
Arr[2] < 10, go right: L = Mid + 1 = 3
L = 3, R = 4
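The search traced above can be sketched in Python as follows. The function name `binary_search` and the sample list are hypothetical; the list was chosen only so that index 5 holds 18 and index 2 holds a value below 10, as in the trace.

```python
def binary_search(arr, target):
    """Return True if target is in the sorted list arr, otherwise False."""
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return True
        elif arr[mid] > target:
            right = mid - 1      # discard the second half
        else:
            left = mid + 1       # discard the first half
    return False


# Hypothetical sorted data with 11 items (index positions 0 through 10).
sample = [2, 5, 8, 10, 14, 18, 21, 25, 30, 33, 40]
print(binary_search(sample, 10))
```

The loop ends either when the target is found or when left becomes greater than right, which matches the stopping rule stated earlier.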
Once again, the worst case occurs when the target is not in the list.
How many times does the loop run in the worst case? This is equal to
the number of times the size of the list can be divided by 2 until the
quotient is 1.
The binary search for the target item 10 requires four comparisons,
whereas a sequential search would have required 10 comparisons.
This algorithm actually appears to perform better as the problem
size gets larger. Our list of nine items requires at most four
comparisons, whereas a list of 1,000,000 items requires at most only
20 comparisons!
Binary search is certainly more efficient than sequential search.
However, the kind of search algorithm you choose depends on the
organization of the data in the list. There is an additional overall cost
to a binary search, having to do with keeping the list in sorted order.
In a moment, you’ll examine some strategies for sorting a list and
analyze their complexity. But first read a few words about comparing
data items.
Exercise
Suppose that a list contains the values 20, 44, 48, 55, 62, 66, 74, 88,
93, 99 at index positions 0 through 9. Trace the values of the variables
left, right, and midpoint in a binary search of this list for the target
value 90. Repeat for the target value 44.
        if arr[i] == arr[j]:
            print(arr[i], end=" ")
        j += 1
    i += 1
O(n^2)
Exercises:
1- Given a list of n numbers, find the element, which appears
maximum number of times.
2- Given a list of n elements. Find the majority element, which
appears more than n/2 times. Return 0 in case there is no majority
element.
3- Given two list X and Y. Find a pair of elements (xi, yi) such that
xi∈X and yi∈Y where xi+yi=value.
4- Given a sorted list arr[] find the number of occurrences of a
number.
5- Given a list of n elements, write an algorithm to find three
elements in a list whose sum is a given value.
6- Given a sorted list, find a given number. If found return the index if
not, find the index of that number if it is inserted into the list.
Fibonacci Search:
It is a comparison-based technique that uses Fibonacci numbers to
search for an element in a sorted array. It has O(log n) time complexity.
• F(n) = F(n-1) + F(n-2), F(0) = 0, F(1) = 1 is a way to define
Fibonacci numbers recursively.
• The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55,
89, 144, …
Let the element to be searched for be x. The idea is to first find the
smallest Fibonacci number that is greater than or equal to the length of
the given array. Let this be the nth Fibonacci number, fib.
Use the (n-2)th Fibonacci number as an index i, then compare
a[i] with x. If they are equal, return i. Otherwise, if x is greater, search
the subarray after i; else search the subarray before i.
# call
arr = [10, 22, 35, 40, 45, 50, 80, 82, 85, 90, 100]
n = len(arr)
x = 80
print("Found at index:", fibMonaccianSearch(arr, x, n))
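The call above expects a function fibMonaccianSearch(arr, x, n) whose definition is not shown. A minimal sketch consistent with that call is given below; the body is an assumption based on the description of the method, returning the index of x or -1 when x is absent. In a script, the definition would need to appear before the call.

```python
def fibMonaccianSearch(arr, x, n):
    """Return the index of x in sorted arr[0..n-1], or -1 if not present."""
    fib2, fib1 = 0, 1            # F(m-2) and F(m-1)
    fib = fib2 + fib1            # F(m)
    # Find the smallest Fibonacci number >= n.
    while fib < n:
        fib2, fib1 = fib1, fib
        fib = fib2 + fib1
    offset = -1                  # everything up to offset is eliminated
    while fib > 1:
        i = min(offset + fib2, n - 1)
        if arr[i] < x:
            # x lies after i: move the three numbers one step down.
            fib, fib1, fib2 = fib1, fib2, fib1 - fib2
            offset = i
        elif arr[i] > x:
            # x lies before i: move the three numbers two steps down.
            fib, fib1, fib2 = fib2, fib1 - fib2, fib2 - (fib1 - fib2)
        else:
            return i
    # One element may remain to be compared.
    if fib1 and offset + 1 < n and arr[offset + 1] == x:
        return offset + 1
    return -1
```

For the list in the call above, the probes land on indices 4, 7, 5, and then 6, where 80 is found.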
Jump Search:
▪ For sorted arrays
▪ To Check fewer elements (than linear search) by jumping fixed
steps
▪ Or skipping some elements
Jump Search Algorithm
▪ Step 1 Calculate the jump size.
▪ Step 2 Jump from index i to index i+jump.
▪ Step 3 If x == arr[i+jump], return its index.
Else, once arr[i+jump] is greater than x, jump back a step.
▪ Step 4 Perform a linear search within that block.
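The four steps can be sketched in Python as below. The function name `jump_search` is hypothetical, and the jump size is assumed to be the integer square root of the list length, a common choice that the text itself does not prescribe.

```python
import math


def jump_search(arr, x):
    """Return the index of x in the sorted list arr, or -1 if absent."""
    n = len(arr)
    if n == 0:
        return -1
    step = max(1, math.isqrt(n))     # Step 1: jump size (assumed sqrt(n))
    prev = 0
    # Step 2: jump block by block while the block's last item is below x.
    while prev < n and arr[min(prev + step, n) - 1] < x:
        prev += step
    # Steps 3-4: linear search inside the block that may contain x.
    for i in range(prev, min(prev + step, n)):
        if arr[i] == x:
            return i
    return -1
```

With a jump size near the square root of n, the search makes roughly sqrt(n) jumps plus at most sqrt(n) linear steps.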
Example:
➢ Assume a sorted array with index positions 0 through 15.
➢ Search for 77.
❑ Calculate: jump size = 3
❑ Jump by 3: Arr[3] < 77
❑ Jump by 3: Arr[6] < 77
❑ Jump by 3
❑ Jump by 3
Selection Sort
Perhaps the simplest strategy is to search the entire list for the
position of the smallest item.
If that position does not equal the first position, the algorithm swaps
the items at those positions. The algorithm then returns to the
second position and repeats this process, swapping the smallest item
with the item at the second position, if necessary. When the
algorithm reaches the last position in the overall process, the list is
sorted. The algorithm is called selection sort because each pass
through the main loop selects a single item to be moved.
Fig. 5-3 shows the states of a list of five items after each search and
swap pass of a selection sort. The two items just swapped on each
pass have asterisks next to them, and the sorted portion of the list is
shaded.
Unsorted List   After 1st Pass   After 2nd Pass   After 3rd Pass   After 4th Pass
5               1*               1                1                1
3               3                2*               2                2
1               5*               5                3*               3
2               2                3*               5*               4*
4               4                4                4                5*
Fig. 5-3 Data during Selection sort
This function includes a nested loop. For a list of size n, the outer
loop executes n- 1 times. On the first pass through the outer loop, the
inner loop executes n -1 times. On the
second pass through the outer loop, the inner loop executes n -2
times. On the last pass through the outer loop, the inner loop
executes once. Thus, the total number of comparisons for a list of
size n is the following:
(n-1) + (n-2) + … + 1 = n(n-1)/2 = ½ n^2 - ½ n
Time Complexity O(n^2)
Example:
Use the selection sort to order the following integers:
64 25 12 22 11
Solution:
64 25 12 22 11
11 25 12 22 64
11 12 25 22 64
11 12 22 25 64
11 12 22 25 64
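The selection sort strategy described in this section can be sketched as follows. The function name `selection_sort` is an illustrative choice; the sketch sorts the list in place and also returns it for convenience.

```python
def selection_sort(lyst):
    """Sort lyst in ascending order using selection sort."""
    n = len(lyst)
    for i in range(n - 1):                 # outer loop: n - 1 passes
        min_index = i
        for j in range(i + 1, n):          # search the unsorted part
            if lyst[j] < lyst[min_index]:
                min_index = j
        if min_index != i:                 # swap only if necessary
            lyst[i], lyst[min_index] = lyst[min_index], lyst[i]
    return lyst


print(selection_sort([64, 25, 12, 22, 11]))
```

Printing the list at the end of each outer-loop pass reproduces the rows of the worked example above.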
Bubble Sort
Another sort algorithm that is relatively easy to conceive and code is
called a bubble sort.
Its strategy is to start at the beginning of the list and compare pairs of
data items as it moves down to the end. Each time the items in the
pair are out of order, the algorithm swaps them.
This process has the effect of bubbling the largest items to the end of
the list. The algorithm then repeats the process from the beginning of
the list and goes to the next-to-last item, and so on, until it begins
with the last item. At that point, the list is sorted.
Unsorted List   After 1st Pass   After 2nd Pass   After 3rd Pass   After 4th Pass
5               4*               4                4                4
4               5*               2*               2                2
2               2                5*               1*               1
1               1                1                5*               3*
3               3                3                3                5*
Fig. 5-4 Data during Bubble sort
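def swap(lyst, i, j):
    """Exchanges the items at positions i and j (helper assumed by bubbleSort)."""
    lyst[i], lyst[j] = lyst[j], lyst[i]

def bubbleSort(lyst):
    n = len(lyst)
    while n > 1:                          # Do n - 1 bubbles
        i = 1                             # Start each bubble
        while i < n:
            if lyst[i] < lyst[i - 1]:     # Exchange if needed
                swap(lyst, i, i - 1)
            i += 1
        n -= 1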
OR
def bubbleSort(arr):
    n = len(arr)
    # Traverse through all array elements
    for i in range(n):
        # Last i elements are already in place
        for j in range(0, n-i-1):
            # traverse the array from 0 to n-i-1
            # Swap if the element found is greater
            # than the next element
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
# call
arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Sorted array is:")
for i in range(len(arr)):
    print("%d" % arr[i])
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), Here, the algorithm compares the first two elements, and
swaps since 5 > 1.
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), the
algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), Swap since 4 > 2
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now, the array is already sorted, but our algorithm does not know if it
is completed. The algorithm needs one whole pass without any swap
to know it is sorted.
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
You can make a minor adjustment to the bubble sort to improve its
best-case performance to linear. If no swaps occur during a pass
through the main loop, then the list is sorted. This can happen on any
pass and in the best case will happen on the first pass. You can track
the presence of swapping with a Boolean flag and return from the
function when the inner loop does not set this flag.
def bubbleSortWithTweak(lyst):
    n = len(lyst)
    while n > 1:
        swapped = False
        i = 1
        while i < n:
            if lyst[i] < lyst[i - 1]:     # Exchange if needed
                swap(lyst, i, i - 1)
                swapped = True
            i += 1
        if not swapped: return            # Return if no swaps
        n -= 1
Note that this modification only improves best-case behavior. On the
average, the behavior of this version of bubble sort is still O(n^2).
Insertion Sort
Our modified bubble sort performs better than a selection sort for
lists that are already sorted. But our modified bubble sort can still
perform poorly if many items are out of order in the list. Another
algorithm, called an insertion sort, attempts to exploit the partial
ordering of the list in a different way. It builds the sorted sequence
one number at a time. This is the way many people sort a hand of
playing cards.
Insertion sort provides several advantages:
✓ Simple implementation
✓ Efficient for (quite) small data sets
✓ Adaptive (i.e., efficient) for data sets that are already
substantially sorted: the time complexity is O(n + d), where d is
the number of inversions.
✓ More efficient in practice than most other simple quadratic
(i.e., O(n^2)) algorithms such as selection sort or bubble sort; the
best case (nearly sorted input) is O(n)
✓ Stable; i.e., does not change the relative order of elements
with equal keys
OR
# Python program for implementation of Insertion Sort
def insertionSort(arr):
    # Traverse through 1 to len(arr)
    for i in range(1, len(arr)):
        key = arr[i]
        # Move elements of arr[0..i-1], that are
        # greater than key, to one position ahead
        # of their current position
        j = i-1
        while j >= 0 and key < arr[j]:
            arr[j+1] = arr[j]
            j -= 1
        arr[j+1] = key
# Call
arr = [12, 11, 13, 5, 6]
insertionSort(arr)
print("Sorted array is:")
for i in range(len(arr)):
    print("%d" % arr[i])
Time Complexity:
✓ Worst Case Performance O(n^2)
✓ Best Case Performance (nearly sorted input) O(n)
✓ Average Case Performance O(n^2)
The analysis focuses on the nested loop. The outer loop executes n −
1 times. In the worst case, when all the data are out of order, the
inner loop iterates once on the first pass through the outer loop,
twice on the second pass, and so on, for a total of ½ n^2 − ½ n times.
Thus, the worst-case behavior of insertion sort is O(n^2).
The more items in the list that are in order, the better insertion sort
gets until, in the best case of a sorted list, the sort’s behavior is linear.
In the average case, however, insertion sort is still quadratic.
Fig. 5-4 shows the states of a list of five items after each pass through
the outer loop of an insertion sort. The item to be inserted on the
next pass is marked with an arrow; after it is inserted, this item is
marked with an asterisk.
Unsorted List   After 1st Pass       After 2nd Pass   After 3rd Pass   After 4th Pass
2               2                    1*               1                1
5               5 (no insertion)     2                2                2
1               1                    5                4*               3*
4               4                    4                5                4
3               3                    3                3                5
Fig. 5-4 Data during Insertion sort
Faster Sorting
The three sort algorithms considered thus far have O(n^2) running
times. There are several variations on these sort algorithms, some of
which are marginally faster, but they, too, are O(n^2) in the worst and
average cases. However, you can take advantage of some better
algorithms that are O(n log n). The secret to these better algorithms is
a divide-and-conquer strategy. That is, each algorithm finds a way of
breaking the list into smaller sublists.
These sublists are then sorted recursively. Ideally, if the number of
these subdivisions is log(n) and the amount of work needed to
rearrange the data on each subdivision is n, then the total complexity
of such a sort algorithm is O(n log n). In Table 5-1, you can see that
the growth rate of work of an O(n log n) algorithm is much slower
than that of an O(n^2) algorithm.
Quick sort
Quick sort is a divide and conquer algorithm. Quick sort first divides a
large list into two smaller sublists: the low elements and the high
elements. Quick sort can then recursively sort the sub-lists.
Advantages:
- One of the fastest algorithms on average.
- Does not need additional memory (the sorting takes place in
the array - this is called in-place processing).
Example:
Figure 5-5 illustrates these steps as applied to the numbers 12 19 17
18 14 11 15 13 16.
Step   Action carried out
       Start with the list and choose the pivot, 14.
1      Swap the pivot with the last item.
2      Establish the boundary before the first item.
3      Scan for the first item less than the pivot.
4      Swap this item with the first item after the boundary. In this
       example the item is swapped with itself.
5      Advance the boundary.
6      Scan for the next item less than the pivot.
7      Swap this item with the first item after the boundary.
8      Advance the boundary.
9      Scan for the next item less than the pivot.
10     Swap this item with the first item after the boundary.
11     Advance the boundary.
12     Scan for the next item less than the pivot (there is none).
13     Interchange the pivot with the first item after the boundary.
Partitioning a sublist
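The partitioning steps listed above can be traced with the following sketch. It assumes the pivot is the item at the midpoint of the sublist (14 in the example); the function name `partition_midpoint` is hypothetical.

```python
def partition_midpoint(lyst, left, right):
    """Partition lyst[left..right] around its midpoint item.

    Returns the final position of the pivot.
    """
    middle = (left + right) // 2
    pivot = lyst[middle]
    # Step 1: swap the pivot with the last item.
    lyst[middle], lyst[right] = lyst[right], lyst[middle]
    # Step 2: establish the boundary before the first item.
    boundary = left
    # Steps 3-12: scan for items less than the pivot.
    for index in range(left, right):
        if lyst[index] < pivot:
            lyst[index], lyst[boundary] = lyst[boundary], lyst[index]
            boundary += 1
    # Step 13: interchange the pivot with the first item after the boundary.
    lyst[right], lyst[boundary] = lyst[boundary], lyst[right]
    return boundary


data = [12, 19, 17, 18, 14, 11, 15, 13, 16]
position = partition_midpoint(data, 0, len(data) - 1)
print(position, data)
```

After the call, every item to the left of the returned position is less than the pivot and every item to its right is greater than or equal to it.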
Implementation of Quicksort
The quicksort algorithm is most easily coded using a recursive
approach. The following script runs quicksort on a short list of
integers.
def partition(arr, low, high):
    i = low - 1            # index of smaller element
    pivot = arr[high]      # pivot
    for j in range(low, high):
        # If current element is smaller than or
        # equal to pivot
        if arr[j] <= pivot:
            # increment index of smaller element
            i = i + 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return i + 1

# The main function that implements QuickSort
# arr[] --> Array to be sorted,
# low --> Starting index,
# high --> Ending index
def quickSort(arr, low, high):
    if low < high:
        # pi is the partitioning index; arr[pi] is now in its final place
        pi = partition(arr, low, high)
        # Sort the items before and after the partition separately
        quickSort(arr, low, pi - 1)
        quickSort(arr, pi + 1, high)

# call
arr = [10, 7, 8, 9, 1, 5]
n = len(arr)
quickSort(arr, 0, n-1)
print("Sorted array is:")
for i in range(n):
    print("%d" % arr[i])
Example:
Time Complexity:
➢ Worst Case Performance O(n^2)
➢ Best Case Performance(nearly) O(n log2 n)
➢ Average Case Performance O(n log2 n)
Merge Sort
Merge sort is based on the divide-and-conquer method. It takes the list
to be sorted and divides it in half to create two unsorted lists. The two
unsorted lists are then sorted and merged to get a sorted list. The
two unsorted lists are sorted by recursively calling the merge-sort
algorithm; we eventually get lists of size 1, which are already sorted.
The lists of size 1 are then merged.
Merge() function:
It takes the array, left-most , middle and right-most index of the array
to be merged as arguments.
Finally copy back the sorted array to the original array.
break
return result
def mergesort(list):
    if len(list) < 2:
        return list
    middle = int(len(list)/2)
    left = mergesort(list[:middle])
    right = mergesort(list[middle:])
# Call
seq = [12, 11, 13, 5, 6, 7]
print("Given array is")
print(seq)
print("\n")
print("Sorted array is")
print(mergesort(seq))
OR
def mergeSort(lyst):
The merge function combines two sorted sublists into a larger sorted
sublist. The first Sublist lies between low and middle and the second
between middle + 1 and high. The process consists of three steps:
1. Set up index pointers to the first items in each sublist. These are at
positions low and middle + 1.
2. Starting with the first item in each sublist, repeatedly compare
items. Copy the smaller item from its sublist to the copy buffer and
advance to the next item in the sublist. Repeat until all items have
been copied from both sublists. If the end of one sublist is reached
before the other’s, finish by copying the remaining items from the
other sublist.
3. Copy the portion of copyBuffer between low and high back to the
corresponding positions in lyst.
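The three steps above can be sketched as follows. This is a minimal illustration rather than the original listing, which is truncated in this section: the names merge_sort, merge_sort_helper, and copy_buffer are assumptions chosen to match the prose.

```python
def merge(lyst, copy_buffer, low, middle, high):
    """Merge the sorted sublists lyst[low..middle] and lyst[middle+1..high]."""
    # Step 1: index pointers to the first item of each sublist
    i1 = low
    i2 = middle + 1
    # Step 2: repeatedly copy the smaller item into the copy buffer
    for i in range(low, high + 1):
        if i1 > middle:                  # first sublist exhausted
            copy_buffer[i] = lyst[i2]
            i2 += 1
        elif i2 > high:                  # second sublist exhausted
            copy_buffer[i] = lyst[i1]
            i1 += 1
        elif lyst[i1] <= lyst[i2]:
            copy_buffer[i] = lyst[i1]
            i1 += 1
        else:
            copy_buffer[i] = lyst[i2]
            i2 += 1
    # Step 3: copy the merged portion back into lyst
    for i in range(low, high + 1):
        lyst[i] = copy_buffer[i]


def merge_sort_helper(lyst, copy_buffer, low, high):
    if low < high:
        middle = (low + high) // 2
        merge_sort_helper(lyst, copy_buffer, low, middle)
        merge_sort_helper(lyst, copy_buffer, middle + 1, high)
        merge(lyst, copy_buffer, low, middle, high)


def merge_sort(lyst):
    copy_buffer = [None] * len(lyst)     # O(n) extra space
    merge_sort_helper(lyst, copy_buffer, 0, len(lyst) - 1)


data = [12, 11, 13, 5, 6, 7]
merge_sort(data)
print(data)  # [5, 6, 7, 11, 12, 13]
```

Unlike the version that returns a new list, this variant sorts the list in place and reuses a single buffer for every merge.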
Time Complexity:
➢ Worst Case Performance O(n log2 n)
➢ Best Case Performance(nearly) O(n log2 n)
➢ Average Case Performance O(n log2 n)
The running time of the merge function is dominated by the two for
statements, each of which loops (high − low + 1) times.
Consequently, the function’s running time is O(high − low), and all
the merges at a single level take O(n) time. Because
mergeSortHelper splits sublists as evenly as possible at each level,
the number of levels is O(log n), and the maximum running time for
this function is O(n log n) in all cases.
The merge sort has two space requirements that depend on the list’s
size. First, O(log n) space is required on the call stack to support
recursive calls. Second, O(n) space is used by the copy buffer.
Questions:
1- Why is merge sort an O(n log n) algorithm in the worst case?
2- Describe the strategy of quicksort and explain why it can reduce the
time complexity of sorting from O(n^2) to O(n log n).
3- Explain why quicksort is not O(n log n) in all cases? Describe the
worst-case situation for quicksort and give a list of 10 integers, that
would produce this behavior.
4- The partition operation in quicksort chooses the item at the midpoint
as the pivot. Describe two other strategies for selecting a pivot value.
def fib(n):
    """The recursive Fibonacci function."""
    if n < 3:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
Note that fib(4) requires only 4 recursive calls, which seems linear,
but fib(6) requires 2 calls of fib(4), among a total of 14 recursive
calls. Indeed, it gets much worse as the problem size grows.
Represent as balanced tree.
The number of recursive calls generally is 2^(n+1) - 2 in fully balanced
call trees, where n is the argument at the top or the root of the call tree.
This is clearly the behavior of an exponential, O(k^n) algorithm.
Although the bottom two levels of the call tree for recursive Fibonacci
are not completely filled in, its call tree is close enough in shape to a
fully balanced tree to rank recursive Fibonacci as an exponential
algorithm. The constant k for recursive Fibonacci is approximately
1.63.
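The call counts quoted above can be checked with a small instrumented version of the function. This is only a sketch; fib_calls is a hypothetical helper that returns the Fibonacci value together with the number of recursive calls made below the top-level call.

```python
def fib_calls(n):
    """Return (fib(n), number of recursive calls below the top-level call)."""
    if n < 3:
        return 1, 0
    left_value, left_calls = fib_calls(n - 1)
    right_value, right_calls = fib_calls(n - 2)
    # two direct recursive calls, plus every call they triggered
    return left_value + right_value, 2 + left_calls + right_calls


for n in (4, 6, 10, 20):
    value, calls = fib_calls(n)
    print("n =", n, "fib =", value, "recursive calls =", calls)
```

For n = 4 and n = 6 the function reports 4 and 14 recursive calls, matching the figures in the text.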
The sum at the end of the loop is the nth Fibonacci number. Here is
the pseudocode for this algorithm:
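Set sum to 1
Set first to 1
Set second to 1
Set count to 3
While count <= N
    Set sum to first + second
    Set first to second
    Set second to sum
    Increment count
The same algorithm as a Python function:
def fib(n):
    """The iterative Fibonacci function."""
    sum = 1
    first = 1
    second = 1
    count = 3
    while count <= n:
        sum = first + second
        first = second
        second = sum
        count += 1
    return sum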
Questions:
Q1) Choose the correct answer:
1- A binary search assumes that the data are:
a. Arranged in no particular order
b. Sorted
2- A selection sort makes at most:
a. n^2 exchanges of data items b. n exchanges of data items
3- An example of an algorithm whose best-case, average-case, and worst-
case behaviors are the same is:
a. Sequential search
b. Selection sort
c. Quicksort
4- The recursive Fibonacci function makes approximately:
a. n^2 recursive calls for problems of a large size n
b. 2^n recursive calls for problems of a large size n
Q2) The list method reverse reverses the elements in the list. Define
a function named reverse that reverses the elements in its list
argument (without using the method reverse). Try to make this
function as efficient as possible, and state its computational
complexity using big-O notation.
Q4) An alternative strategy for the expo function uses the following
recursive definition:
expo(number, exponent)
= 1, when exponent = 0
= number * expo(number, exponent - 1), when exponent is odd
= (expo(number, exponent / 2))^2, when exponent is even
Define a recursive function expo that uses this strategy, and state its
computational complexity using big-O notation.
As your program gets longer, split it into several files for easier
maintenance. A Python program consists of one or more
modules. You may also want to use your own function that you
have written in several programs without copying its definition
into each program.
Importing a module, for example with import fibo, enters only the
module name fibo in the current namespace, not the functions fib and
fib2 defined in it. Use the module name to access the functions:
import fibo
fibo.fib(1000)
# out: 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
print(fibo.fib2(100))
# out: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(fibo.__name__)
# out: fibo
OR:
from fibo import *
fib(500)   # fib prints the series itself, so no print() is needed
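The import examples above assume a file named fibo.py in the current directory. Its contents are not shown in this section; a minimal sketch, modelled on the module example in the official Python tutorial and consistent with the outputs listed above, might look like this:

```python
# fibo.py: Fibonacci numbers module (assumed contents)

def fib(n):
    """Print the Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a + b
    print()


def fib2(n):
    """Return the Fibonacci series up to n as a list."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result
```

Saving these definitions in fibo.py makes them available to any other program through import fibo.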
By 2009, most new system software performed the majority of its
operations through library files. These libraries provided the
functions that modern applications need, so that most of the code of
those applications came to be written in the libraries of the system
itself. When we first started programming, we wrote all the code in a
single file and in a single block; that is, the whole program lived in
one function.
This was fine at first because we only wrote simple operations, but it
became troublesome as the problems to be solved, and their solutions,
grew larger and required long code containing many details and
operations. Such code was hard for the reader to understand and
unpleasant to look at, not to mention the high degree of repetition of
many operations.
It therefore became necessary to isolate everything that could be
computed or done on its own into "functions", each dedicated to
performing one specific task. This is what we used to call the "module
system". At first we wrote these functions in the same file as the
program and called them from the program's main function.
The question then was: if I need one of these functions in another
program, how can I use it without having to rewrite it?
This is where the library comes in. The first time I wrote a library
was in the C language. The concept was simple: collect all the
functions that you think you will need more than once, save them in
one file, and request that file whenever it is needed.
Note
Python keywords and names are case-sensitive. Thus, while is a
keyword, whereas While is not.
Python keywords are spelled in lowercase letters (with the exceptions
of True, False, and None) and are color-coded in orange in an IDLE
window.
if __name__ == "__main__":
    <do something>
sentences.
>>> type(2)
<class 'int'>
Some data structures, such as strings, tuples, lists, and
dictionaries, also have literals, as you will see later.
String Literals
You can enclose strings in single quotes, double quotes, or sets of
three double quotes or three single quotes (this notation is useful for a
string containing multiple lines) of text. Character values are single-
character strings.
The \ character is used to escape.
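The quoting styles described above can be tried out with a few short literals. The variable names below are chosen only for illustration.

```python
single = 'Hello'
double = "Hello"
multi = """Line one
Line two"""                      # triple quotes may span several lines
escaped = 'It\'s a tab:\there'   # \' is a quote character, \t is a tab
char = 'A'                       # a character is a one-character string

print(single == double)          # True: the quoting style does not matter
print(multi)
print(escaped)
print(type(char), len(char))     # <class 'str'> 1
```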
NOTE:
if a==1:
print("hey")
if a==2:
print("bye")
Error
>>> print("Hello")
To end the interactive session, you can use either exit() or Ctrl-D
(i.e. EOF). The parentheses after the exit function are crucial.
>>> 2+2
>>>4.567 * 8.323 * 17
Python follows the usual order of operations in expressions. The
standard order of operations is expressed in the following
enumeration: (operator precedence)
>>>3 + 2 * 4 # OUTPUT: 11
Variables
# out: 30
# variables
balance=300 # integer
print(balance)
balance = balance + 10 # add 10 to balance
print(balance) # 310
<identifier> = <expression>
a, b = b, a # swap a,b
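# a trailing backslash continues the statement on the next line;
# nothing, not even a comment, may follow it on the same line
value = min(100,
            200) \
        * 3
print(value) # 300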
Example:
➢ So:
Operators
Strings
city = """
... Toronto is the largest city in Canada
... and the provincial capital of Ontario.
... It is located in Southern Ontario on the
... northwestern shore of Lake Ontario.
... """
print(city)
OUTPUT: '.-..-..-..-.'
Functions
Define and Call. Python presents some standard functions like
print, input, abs, and min. Many other functions are available by
import from modules.
Example:
radius = float(input("Radius: ")) # conversion type
print("The area is", 3.14 * radius ** 2)
Function Arguments:
x = round(3.1674)
print(x)          # out: 3
print("x=", x)    # out: x= 3
x = round(3.1674, 1)
print("x=", x)    # out: x= 3.2
Example:
class Class2:
    def __init__(self):
        X = [1, 2]
        self.func(X)
        print(X)   # [1, 2, 5]: func changed the same list object
    def func(self, Y):
        Y.append(5)
Control Statements
if <Boolean expression>:
    <sequence of statements>
if <Boolean expression>:
    <sequence of statements>
else:
    <sequence of statements>
if <Boolean expression>:
    <sequence of statements>
elif <Boolean expression>:
    <sequence of statements>
else:
    <sequence of statements>
Loop Statements
while <Boolean expression>:
    <sequence of statements>
Example:
# This code prints the product of the numbers from 1 to 10:
prod = 1
value = 1
while value <= 10:
    prod *= value # same as prod= prod*value
    value += 1
print(prod)
Example:
# This code prints the product of the numbers from 1 to 10:
prod = 1
for value in range(1, 11):   # the for loop advances value by itself
    prod *= value # same as prod= prod*value
print(prod)
Exercises:
1. Write a program that takes the radius of a sphere (a floating-point
number) as input and outputs the sphere’s diameter, circumference,
surface area, and volume.
2. If π/4= 1-1/3+ 1/5- 1/7 + ……
Write a program that allows the user to specify the number of
iterations used in this approximation and displays the resulting value.
3. Write a program using a while loop that asks the user for a number,
and prints a countdown from that number to zero.
4. Write a program to generate Fibonacci numbers less than 500. The
Fibonacci sequence is generated by adding the previous two terms,
as 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do
not exceed 1000, find the sum of the even-valued terms.
5. Collections
Chapter Objectives
After completing this chapter, you will be able to:
• Define the four general categories of collections—linear,
hierarchical, graph, and unordered.
• List the specific types of collections that belong to each of the
four categories of collections.
• Recognize which collections are appropriate for given
applications.
• Describe the commonly used operations on every type of
Collection.
• Describe the difference between an abstract collection type
and its implementations.
Collection Types
Python includes several built-in collection types: the string,
the list, the tuple, the set, and the dictionary.
• list – A mutable container data type that can store elements of
different types. It allows duplicates because items are stored by
index. Lists are written using square brackets [].
• defaultdict
• Counter
• namedtuple
• deque
• ChainMap
• UserDict
• UserList
• UserString
Note: 1- To use the data structures listed above, you need to import
the collections module.
2- Normal dict: each key is separated from its value by a colon (:), the
items are separated by commas, and the whole thing is enclosed in
curly braces. An empty dictionary without any items is written with
just two curly braces, like this: {}.
Keys are unique within a dictionary while values may not be. The
values of a dictionary can be of any type, but the keys must be of an
immutable data type such as strings, numbers, or tuples.
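The rules for keys can be illustrated with a short sketch. The dictionary name grades and its contents are made up for this illustration.

```python
# Keys of different immutable types: a string, a number, and a tuple
grades = {"Ahmed": 90, 7: "seven", (1, 2): "a tuple key"}

# Keys are unique: assigning to an existing key replaces its value
grades["Ahmed"] = 95
print(grades["Ahmed"])

# A mutable object such as a list cannot be used as a key
try:
    grades[[1, 2]] = "a list key"
except TypeError as error:
    print("TypeError:", error)
```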
Example:
dict = {'Name': 'Ahmed', 'Age': 7, 'Class': 'First'}
print ("dict['Name']: ", dict['Name'])
print ("dict['Age']: ", dict['Age'])
output:
dict['Name']: Ahmed
dict['Age']: 7
Example:
dict = {'Name': 'Ahmed', 'Age': 7, 'Class': 'First'}
Example:
# normal dictionary test
dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
#dic2.py
# create dic
squares = {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
print(squares.pop(4)) # remove a particular item out:16
print(squares) #out: {1: 1, 2: 4, 3: 9, 5: 25}
print(squares.popitem()) #will remove 5:25 out: (5,25)
print(squares) #out: {1: 1, 2: 4, 3: 9}
squares.clear() # remove all item
print(squares) # out: {}
del squares # delete the dic itself
print(squares) # Error: name 'squares' is not defined
Can be :
squares = {x: x*x for x in range(6)}
print(squares)
Or:
squares = {}
for x in range(6):
    squares[x] = x*x
print(squares)
------------------
#dict5.py
squares = {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}
print(1 in squares) # out: True
print(2 not in squares) # out: True
print(49 in squares) #out: False
------------------
#dict6.py
squares = {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}
for i in squares:
    print(squares[i])
# out:
#1
#9
#25
#49
#81
------------------------
#dict7.py
s = {0: 'False', 1: 'False'}
print(all(s)) #False: all() tests the keys, and the key 0 is false
s = {}
print(all(s)) #True: the dictionary is empty
# 0 is False
# '0' is True
s = {'0': 'True'}
print(all(s)) #True
------------------------
#dic10.py
# defining the dictionary
d = {'A' : [1, 2, 3, 4, 5, 6, 7, 8, 9], 'B' : 34,
'C' : 12,
'D' : [7, 8, 9, 6, 4] }
print(d)
#out:
# {'A': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'B': 34, 'C': 12, 'D': [7, 8, 9,
6, 4]}
-------------------------
Non-primitive DS
So:
Python built-in data structures
Python comes with a general set of built-in data structures:
• lists
• tuples
• string
• dictionaries
• sets
• others...
Lists
Some of the list data type methods of list objects:
list.append(x)
Add an item to the end of the list. Equivalent to a[len(a):] = [x].
list.extend(iterable)
Extend the list by appending all the items from the iterable.
Equivalent to a[len(a):] = iterable.
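list.insert(i, x)
Insert an item at a given position. The first argument is the index of
the element before which to insert, so a.insert(0, x) inserts at the
front of the list, and a.insert(len(a), x) is equivalent to a.append(x).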
list.remove(x)
Remove the first item from the list whose value is equal to x. It raises
a ValueError if there is no such item.
list.pop([i])
Remove the item at the given position in the list, and return it. If no
index is specified, a.pop() removes and returns the last item in the
list. (The square brackets around the i in the method signature
denote that the parameter is optional, not that you should type
square brackets at that position.)
list.clear()
Remove all items from the list. Equivalent to del a[:].
list.index(x[, start[, end]])
Return zero-based index in the list of the first item whose value is
equal to x. Raises a ValueError if there is no such item.
The optional arguments start and end are interpreted as in the slice
notation and are used to limit the search to a particular subsequence
of the list. The returned index is computed relative to the beginning
of the full sequence rather than the start argument.
list.count(x)
Return the number of times x appears in the list.
list.sort(key=None, reverse=False)
Sort the items of the list in place (the arguments can be used for sort
customization, see sorted() for their explanation).
list.reverse()
Reverse the elements of the list in place.
list.copy()
Return a shallow copy of the list. Equivalent to a[:].
mylist = [0, 1, 2, 3, 4]
list2 = [5, 6]
print(mylist + list2)   # Concatenate to a single list: [0, 1, 2, 3, 4, 5, 6]
print(mylist * 2)       # Repeat: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(mylist[1:4])      # Slicing: [1, 2, 3]
print(4 in mylist)      # Searching: True
mylist.reverse()        # mylist is now [4, 3, 2, 1, 0]
Examples:
ff = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana']
ff.count('apple') # 2
ff.count('tangerine') # 0
ff.index('banana') # 3
ff.index('banana', 4) # Find next banana starting at position 4: 6
ff.reverse()
print(ff) # ['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange']
ff.append('grape')
Tuple
A tuple is an immutable sequence of arbitrary data. Tuples are created
using parentheses enclosing values separated by commas.
Cake = ('C', 'a', 'k', 'e')
print(type(Cake)) # <class 'tuple'>
tup1 = (1, 2, [5, 6])
Dictionary
A Dictionary is a set of Key-Value pairs. Dictionaries are mutable.
Dictionaries are created using curly braces { }.
#dict8.py
#d = {0: 'False'} # 0 is False
#print(any(d)) #False
# 1 is True
#d = {0: 'False', 1: 'True'}
#print(any(d)) #True
# 0 and False are false
#d = {0: 'False', False: 0}
#print(any(d)) #False
# iterable is empty
#d = {}
#print(any(d)) #False
# 0 is False '0' is True
d = {'0': 'False'}
print(any(d)) #True
------------------------------
#dict8.py
testDict = {1: 'one', 2: 'two'}
print(testDict, 'length is', len(testDict)) #length is 2
testDict = {}
print(testDict, 'length is', len(testDict)) #length is 0
----------------------------------
#dict4.py
# Normal Dictionary example
odd_squares = {x: x*x for x in range(11) if x % 2 == 1}
print(odd_squares)
#output: {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}
---------------------
#dicVowels.py
# vowels keys
keys = {'a', 'e', 'i', 'o', 'u' }
value = 'vowel'
# vowels keys
keys = {'a', 'e', 'i', 'o', 'u' }
value = [1]
------------------
#dic3.py
# Normal Dictionary example
marks = {}.fromkeys(['Math', 'English', 'Science'], 0)
#dic14.py
1- OrderedDict
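An OrderedDict is a dictionary that remembers the order in which its
keys were inserted. Since Python 3.7 a normal dict also preserves
insertion order, but OrderedDict additionally offers order-aware
operations such as move_to_end() and takes order into account when two
OrderedDict objects are compared.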
Example:
from collections import OrderedDict
name = OrderedDict()
name[0] = 'b'
name[1] = 'e'
name[2] = 's'
name[3] = 'e'
name[4] = 'n'
name[5] = 't'
print("before:", name)
name[3] = 'a'
print("after:", name)
Output:
2- DefaultDict
A defaultdict is a subclass of dict that calls a factory function to
supply a value for a missing key. A normal dict cannot hold duplicate
keys: if the same key is assigned more than once, only the last value
is kept. With defaultdict(list), every value assigned to the same key
can instead be collected in a list stored under that key.
Example:
#dict10.py
from collections import defaultdict
s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)
d.items()
print(d) #out: defaultdict(<class 'list'>, {'yellow': [1, 3], 'blue': [2, 4], 'red': [1]})
3- counter
A Counter is used to count hashable objects. It is a subclass of the
dictionary in which the elements are stored as keys and their counts
as values.
It also provides some additional operations.
• elements() – returns an iterator over the elements, repeating each
one as many times as its count.
Example:
#dic12.py
from collections import Counter
a=[10,10,20,30,40,10, 20,50,40,60,10,30,60,70]
print(Counter(a))
#out:
#Counter({10: 4, 20: 2, 30: 2, 40: 2, 60: 2, 50: 1, 70: 1})
----------------------
#dic11.py
from collections import Counter
c=Counter(['a','b','c','a','b','a'])
print(c) #out:Counter({'a': 3, 'b': 2, 'c': 1})
print(c['a']) #out: 3
c1=Counter({'a':3,'b':2,'c':1})
print(c1['b']) #out: 2
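Beyond indexing, a Counter offers a few additional operations. The following sketch shows elements(), most_common(), and update() from the standard collections module on the same data as above.

```python
from collections import Counter

c = Counter(['a', 'b', 'c', 'a', 'b', 'a'])

print(sorted(c.elements()))   # ['a', 'a', 'a', 'b', 'b', 'c']
print(c.most_common(2))       # [('a', 3), ('b', 2)]
print(c['z'])                 # 0: a missing key counts as zero, no KeyError

c.update(['c', 'c'])          # count two more occurrences of 'c'
print(c['c'])                 # 3
```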
Collections notes:
The string and the list are probably the most common and
fundamental types of collections. Other important types of
collections include stacks, queues, priority queues, binary search
trees, heaps, graphs, bags, and various types of sorted collections.
Collections can be homogeneous, meaning that all items in the
collection must be of the same type, or heterogeneous, meaning the
items can be of different types. In many programming languages,
collections are homogeneous, although most Python collections can
contain multiple types of objects.
Linear Collections
The items in a linear collection, like people in a line, are
ordered by position. Each item except the first has a unique
predecessor, and each item except the last has a unique successor.
As shown in Fig. 4-1, D2’s predecessor is D1, and D2’s successor
is D3.
Hierarchical Collections
Data items in hierarchical collections are ordered in a
structure resembling an upside-down tree. Each data item except the
one at the top has just one predecessor, called its parent, but
potentially many successors, called its children. As shown in
Figure 4-2, D3’s predecessor (parent) is D1, and D3’s successors
(children) are D4, D5, and D6.
Graph Collections
A graph collection, or graph, is a collection in which each data
item can have many predecessors and many successors. As shown in
Fig. 4-3, all elements connected to D3 are considered to be both its
predecessors and its successors, and they are also called its
neighbors.
Unordered Collections
As the name implies, items in an unordered collection are not in any
order, and it is not possible to meaningfully speak of an item’s
predecessor or successor. Fig. 4-4 shows such a structure.
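A bag of marbles is an example of an unordered collection. Although
you can put marbles into a bag and take marbles out of a bag in any
order you want, within the bag, the marbles are in no particular
order.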
Sorted Collections
A sorted collection imposes a natural ordering on its items.
Examples are the entries in a phone book (the 20th century paper
variety) and the names on a class roster.
To impose a natural ordering, there must be some rule for comparing
items, such that item_i ≤ item_(i+1), for the items visited in a sorted
collection.
Although a sorted list is the most common example of a sorted
collection, sorted collections need not be linear or ordered by
position. From the client’s perspective, sets, bags, and dictionaries
may be sorted, even though their items are not accessible by
position.
A special type of hierarchical collection, known as a binary search
tree, also imposes a natural ordering on its items.
A sorted collection allows the client to visit all its items in sorted
order. Some operations, such as searching, may be more efficient on
a sorted collection than on its unsorted cousin.
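As a small illustration of why searching can be faster on a sorted collection, the sketch below uses the standard bisect module to run a binary search on a sorted class roster. The function name sorted_contains and the roster names are made up for this example.

```python
from bisect import bisect_left


def sorted_contains(sorted_items, target):
    """Binary search: return True if target occurs in the sorted list."""
    position = bisect_left(sorted_items, target)
    return position < len(sorted_items) and sorted_items[position] == target


roster = sorted(["Mona", "Ali", "Sara", "Omar", "Huda"])
print(roster)                            # ['Ali', 'Huda', 'Mona', 'Omar', 'Sara']
print(sorted_contains(roster, "Omar"))   # True
print(sorted_contains(roster, "Zaid"))   # False
```

The search inspects O(log n) items, whereas a search of an unsorted list must examine up to n items.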
collections
Convert to another      Create a new collection with the same items
collection type         (cloning is a special case)
Insert an item          Add item
Remove an item
Replace an item
Access or retrieve
an item
Type Conversion
Example:
The following session makes a copy of a list and then compares the two
lists using the is and == operators. Because the two lists are not the
same object, is returns False. Because the two lists are distinct
objects but are of the same type and have the same structure (each pair
of elements is the same at each position in the two lists), == returns
True.
>>> lyst1 = [2, 4, 8]
>>> lyst2 = list(lyst1)
>>> lyst1 is lyst2
False
>>> lyst1 == lyst2
True
Not only do the two lists in this example have the same structure, but
they share the same items. That is, the list function makes a shallow
copy of its argument list. These items are not themselves cloned
before being added to the new list; instead, mere references to these
objects are copied. This policy does not cause problems when the
items are immutable (numbers, strings, or Python tuples).
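When the items are mutable, the sharing becomes visible. The sketch below contrasts the shallow copy made by list with a deep copy made by the standard copy module; the variable names are chosen for illustration.

```python
import copy

lyst1 = [[1, 2], [3, 4]]          # the items are mutable lists
shallow = list(lyst1)             # new outer list, same inner lists
deep = copy.deepcopy(lyst1)       # new outer list and new inner lists

lyst1[0].append(99)               # change an item through the original

print(shallow[0])                 # [1, 2, 99]: the item is shared
print(deep[0])                    # [1, 2]: the deep copy is unaffected
print(shallow[0] is lyst1[0])     # True
```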
Implementations of Collections
Naturally, programmers who work with programs that include
collections have a rather different perspective on those collections
than the programmers who are responsible for implementing them in
the first place.
Programmers who use collections need to know how to instantiate
and use each type of collection. From their perspective, a collection
is a means for storing and accessing data items in some
predetermined manner, without concern for the details of the
collection’s implementation.
In other words, from a user’s perspective, a collection is an
abstraction, and for this reason, in computer science, collections are
also called abstract data types (ADTs). The user of an ADT is
concerned only with learning its interface, or the set of operations
that objects of that type recognize.
Developers of collections, on the other hand, are concerned with
implementing a collection’s behavior in the most efficient manner
possible, with the goal of providing
the best performance to users of the collections. Numerous
implementations are usually possible. However, many of these take
so much space or run so slowly that they can be dismissed as
pointless. Those that remain tend to be based on several underlying
approaches to organizing and accessing computer memory.
Some programming languages, like Python, provide only one
implementation of each of the available collection types. Other
languages, like Java, provide several.
For example,
Java’s java.util package includes two implementations of lists,
named ArrayList and LinkedList; and two implementations of sets
4- Namedtuple():
Normal Tuple:
A tuple in Python is like a list. The difference between a tuple and a
list is that we cannot change the elements of a tuple once it is
assigned, whereas the elements of a list can be changed.
#tuple1.py
my_tuple = () #empty tuple
print(my_tuple) # out: ()
# nested tuple
my_tuple = ("mouse", [8, 4, 6], (1, 2, 3))
print(my_tuple) #out: ('mouse', [8, 4, 6], (1, 2, 3))
# tuple packing and unpacking
my_tuple = 3, 4.6, "dog"
a, b, c = my_tuple
print(a) # 3
print(b) # 4.6
print(c) # dog
----------------
#tuple3.py
# Accessing tuple elements using indexing
my_tuple = ('p','e','r','m','i','t')
print(my_tuple[0]) # 'p'
print(my_tuple[5]) # 't'
# nested tuple
n_tuple = ("mouse", [8, 4, 6], (1, 2, 3))
# nested index
print(n_tuple[0][3]) # 's'
print(n_tuple[1][1]) # 4
----------------------
#tuple4.py
# Negative indexing for accessing tuple elements
my_tuple = ('p', 'e', 'r', 'm', 'i', 't')
# Output: 't'
print(my_tuple[-1])
# Output: 'p'
print(my_tuple[-6])
# Accessing tuple elements using slicing
my_tuple = ('p','r','o','g','r','a','m','i','z')
# Output: ('p', 'r', 'o', 'g', 'r', 'a', 'm', 'i', 'z')
print(my_tuple[:])
----------------------
Unlike lists, tuples are immutable (elements of a tuple cannot be
changed once they have been assigned). But, if the element is itself a
mutable data type like list, its nested items can be changed. We can
also assign a tuple to different values (reassignment).
#tuple5.py
# Changing tuple values
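my_tuple = (4, 2, 3, [6, 5])
my_tuple[3][0] = 9   # an item of a mutable element can be changed
print(my_tuple)      # Output: (4, 2, 3, [9, 5])
# A tuple can be reassigned
my_tuple = ('p', 'r', 'o', 'g', 'r', 'a', 'm', 'i', 'z')
print(my_tuple)      # Output: ('p', 'r', 'o', 'g', 'r', 'a', 'm', 'i', 'z')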
-----------
#tuple6.py
# Concatenation
# Output: (1, 2, 3, 4, 5, 6)
print((1, 2, 3) + (4, 5, 6))
# Repeat
# Output: ('Repeat', 'Repeat', 'Repeat')
print(("Repeat",) * 3)
y=("Repeat",) * 3
print(y) #out: ('Repeat', 'Repeat', 'Repeat')
---------------------
As discussed above, we cannot change the elements in a tuple. It
means that we cannot delete or remove items from a tuple.
my_tuple = ('a', 'p', 'p', 'l', 'e',)
print(my_tuple.count('p')) # Output: 2
print(my_tuple.index('l')) # Output: 3
----------------------
#tuple9.py
# Membership test in tuple
my_tuple = ('a', 'p', 'p', 'l', 'e',)
# In operation
print('a' in my_tuple) #True
print('b' in my_tuple) #False
# Not in operation
print('g' not in my_tuple) # True
---------------------
#tuple10.py
# Using a for loop to iterate through a tuple
for name in ('John', 'Kate'):
    print("Hello", name)
#output:
# Hello John
# Hello Kate
from collections import namedtuple
Student = namedtuple('Student',['name','age','DOB'])
# Adding values
S = Student('Nandini','19','2541997')
print("S=")
print(S) # Student(name='Nandini', age='19', DOB='2541997')
5- deque
6- ChainMap
7- UserDict
8- UserList
9- UserString
This class acts as a wrapper around string objects. Because a normal
string is immutable and does not allow in-place changes such as adding
or removing characters, UserString is mainly used as a base class for
defining string-like classes with custom behavior. Python 2 also
provided a MutableString class, but it was rarely used and was removed
in Python 3.
Questions:
1. Examples of linear collections are:
a. Sets and trees b. Lists and stacks
2. Examples of unordered collections are:
a. Queues and lists b. Sets and dictionaries
3. A hierarchical collection can represent a:
a. Line of customers at a bank b. File directory system
4. A graph collection best represents a:
a. Set of numbers
b. Map of flight paths between cities
5. In Python, a type conversion operation for two collections:
a. Creates copies of the objects in the source collection and adds
these new objects to a new instance of the destination collection
The operations for putting items on and removing items from a stack
are called push and pop, respectively.
➢ Push: if (top == MAX), display Stack overflow; otherwise read the
data, set stack[top] = data, and increment top (top++).
➢ Pop: if (top == 0), display Stack underflow; otherwise decrement
top (top--) and print the element that was at the top of the stack,
stack[top].
➢ Display: if (top == 0), display Stack is empty; otherwise print the
elements in the stack from stack[0] to stack[top - 1].
Undo stack
You can see that the stack now has an Add Function operation on it.
After adding the function, you delete a word from a comment. This
also gets added to the undo stack:
Notice how the Delete Word item is placed on top of the stack. Finally,
you indent a comment so that it’s lined up properly:
Your editor undoes the indent, and the undo stack now contains two
items. This operation is the opposite of push and is commonly
called pop.
When you hit undo again, the next item is popped off the stack:
Stack Implementation
We will discuss the implementation of a stack using data structures and
modules from the Python library. A stack in Python can be implemented
in the following ways:
• list
• collections.deque
• queue.LifoQueue
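The list-based approach is shown in detail below. For the other two options, a minimal sketch of the corresponding push and pop calls from the standard library is:

```python
from collections import deque
from queue import LifoQueue

# collections.deque: append() pushes, pop() removes from the same end
d = deque()
d.append('a')
d.append('b')
print(d.pop())        # b

# queue.LifoQueue: put() pushes and get() pops; it adds locking so that
# several threads can share the stack safely
q = LifoQueue(maxsize=3)
q.put('a')
q.put('b')
print(q.get())        # b
print(q.qsize())      # 1
print(q.empty())      # False
```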
Like other collections, the stack can also include the clear, isEmpty,
len, str, in, and + operations, as well as an iterator.
These operations are listed as Python methods in Table 6-1, where
the variable s refers to a stack.
Note that the methods pop and peek have an important
precondition and raise an exception if the user of the stack does not
satisfy that precondition.
Stack method            What it Does
s.isEmpty()             Returns True if s is empty or False otherwise.
s.__len__()             Same as len(s). Returns the number of items in s.
s.__str__()             Same as str(s). Returns the string representation
                        of s.
s.__iter__()            Same as iter(s) or for item in s:. Visits each item
                        in s from bottom to top.
s.__contains__(item)    Same as item in s. Returns True if item is in s
                        or False otherwise.
s1.__add__(s2)          Same as s1 + s2. Returns a new stack
                        containing the items in s1 and s2.
s.__eq__(anyObject)     Same as s == anyObject. Returns True if s
                        equals anyObject or False otherwise. Two
                        stacks are equal if the items at corresponding
                        positions are equal.
s.clear()               Makes s become empty.
s.peek()                Returns the item at the top of s. Precondition:
                        s must not be empty; raises a KeyError if the
                        stack is empty.
s.push(item)            Adds item to the top of s.
s.pop()                 Removes and returns the item at the top of s.
                        Precondition: s must not be empty; raises a
                        KeyError if the stack is empty.
Table 6-1
stack = []
stack.append('a')
stack.append('b')
stack.append('c')
print('Initial stack')
print(stack)
# after three calls to stack.pop() the stack is empty;
# one more call to stack.pop() would then cause an IndexError
Output:
Initial stack
['a', 'b', 'c']
The list methods make it very easy to use a list as a stack, where the
last element added is the first element retrieved (“last-in, first-out”).
To add an item to the top of the stack, use append(). To retrieve an
item from the top of the stack, use pop() without an explicit index.
For example:
>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]
---------------------------------------
# Module: stack3.py
class Stack:
    def __init__(self):
        self.items = []
    def isEmpty(self):
        return self.items == []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        return self.items.pop()
    def peek(self):
        return self.items[len(self.items)-1]
    def size(self):
        return len(self.items)
#Module: stack4.py
from stack3 import Stack
s=Stack()
print(s.isEmpty())
s.push(4)
s.push('dog')
print(s.peek())
s.push(True)
print(s.size())
print(s.isEmpty())
s.push(8.4)
print(s.pop())
print(s.pop())
print(s.size())
Table 6-2
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None
class Stack_linkedList:
    def __init__(self):
        self.head = None
    def push_val(self, data):
        if self.head is None:
            self.head = Node(data)
        else:
            newNode = Node(data)
            newNode.next = self.head
            self.head = newNode
    def pop_val(self):
        if self.head is None:
            return None
        else:
            del_Val = self.head.data
            self.head = self.head.next
            return del_Val
my_instance = Stack_linkedList()
while True:
    print('push <value>')
    print('pop')
    print('quit')
    my_input = input('What action would you like to perform ? ').split()
    operation = my_input[0].strip().lower()
    if operation == 'push':
        my_instance.push_val(int(my_input[1]))
    elif operation == 'pop':
        del_Val = my_instance.pop_val()
        if del_Val is None:
            print('The stack is empty.')
        else:
            print('The deleted value is : ', int(del_Val))
    elif operation == 'quit':
        break
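class StackNode:
    def __init__(self, data):
        self.data = data
        self.next = None
class Stack:
    def __init__(self):
        self.root = None
    def isEmpty(self):
        return self.root is None
    def push(self, data):
        newNode = StackNode(data)
        newNode.next = self.root
        self.root = newNode
        print("%d pushed to stack" % (data))
    def pop(self):
        if self.isEmpty():
            return float("-inf")
        temp = self.root
        self.root = self.root.next
        popped = temp.data
        return popped
    def peek(self):
        if self.isEmpty():
            return float("-inf")
        return self.root.data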
class Stack:
    # Initializing a stack.
    # Use a dummy node, which is
    # easier for handling edge cases.
    def __init__(self):
        self.head = Node("head")
        self.size = 0
# Driver Code
if __name__ == "__main__":
    stack = Stack()
    for i in range(1, 11):
        stack.push(i)
    print(f"Stack: {stack}")
Output:
Stack: 10 -> 9 -> 8 -> 7 -> 6 -> 5 -> 4 -> 3 -> 2 -> 1
Pop: 10
Pop: 9
Pop: 8
Pop: 7
Pop: 6
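Only the constructor and the driver of this dummy-node version survive in the listing above. The following is a minimal sketch of how the missing pieces could be filled in so that the output shown is produced. The Node class, the method bodies, and the use of IndexError are assumptions, not the original code.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class Stack:
    def __init__(self):
        self.head = Node("head")   # dummy node; real items follow it
        self.size = 0

    def __str__(self):
        values = []
        current = self.head.next
        while current is not None:
            values.append(str(current.value))
            current = current.next
        return " -> ".join(values)

    def is_empty(self):
        return self.size == 0

    def push(self, value):
        node = Node(value)
        node.next = self.head.next   # new node goes right after the dummy
        self.head.next = node
        self.size += 1

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from an empty stack")
        node = self.head.next
        self.head.next = node.next
        self.size -= 1
        return node.value


stack = Stack()
for i in range(1, 11):
    stack.push(i)
print(f"Stack: {stack}")
for _ in range(5):
    print(f"Pop: {stack.pop()}")
```

Because the dummy node is always present, push and pop never need a special case for an empty list.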
Another code:
# Python program for implementation of stack
from sys import maxsize
def createStack():
    stack = []
    return stack
def isEmpty(stack):
    return len(stack) == 0
def push(stack, item):
    stack.append(item)
    print(item + " pushed to stack")
def pop(stack):
    if isEmpty(stack):
        return str(-maxsize - 1)  # sentinel returned for an empty stack
    return stack.pop()
# Call
stack = createStack()
print("maximum size of array is", maxsize)
push(stack, str(10))
push(stack, str(20))
push(stack, str(30))
print(pop(stack) + " popped from stack")
print(pop(stack) + " popped from stack")
print(pop(stack) + " popped from stack")
print(pop(stack) + " popped from stack")
push(stack, str(10))
push(stack, str(20))
push(stack, str(30))
print(pop(stack) + " popped from stack")
# stackTestBalancParentheses.py
# Python3 code to Check for balanced parentheses in an expression
open_list = ["[","{","("]
close_list = ["]","}",")"]
def check(myStr):
    stack = []
    for i in myStr:
        if i in open_list:
            stack.append(i)
        elif i in close_list:
            pos = close_list.index(i)
            if ((len(stack) > 0) and
                (open_list[pos] == stack[len(stack)-1])):
                stack.pop()
            else:
                return "Unbalanced"
    if len(stack) == 0:
        return "Balanced"
    else:
        return "Unbalanced"
# Main code
string = "{[]{()}}"
print(string,"-", check(string))
string = "[{}{})(]"
print(string,"-", check(string))
string = "((()"
print(string,"-",check(string))
Exercises
Infix evaluation: 34 + 22 * 2 → 34 + 44 → 78
Postfix evaluation: 34 22 2 * + → 34 44 + → 78
Example:
Evaluate the postfix expression: 6 5 2 3 + 8 * + 3 + *
Table 6-5
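The stack-based evaluation described above can be sketched as follows. This is a minimal illustration, not the course's own program: the function name evaluate_postfix is hypothetical, and it assumes the tokens are separated by spaces and the operands are integers, so that multi-digit operands such as 34 and 22 can be handled.

```python
def evaluate_postfix(expression):
    """Evaluate a space-separated postfix expression (integer operands assumed)."""
    operations = {
        "+": lambda left, right: left + right,
        "-": lambda left, right: left - right,
        "*": lambda left, right: left * right,
        "/": lambda left, right: left / right,
    }
    stack = []
    for token in expression.split():
        if token in operations:
            right = stack.pop()   # the most recently pushed operand is the right-hand one
            left = stack.pop()
            stack.append(operations[token](left, right))
        else:
            stack.append(int(token))
    return stack.pop()


print(evaluate_postfix("34 22 2 * +"))            # 78
print(evaluate_postfix("6 5 2 3 + 8 * + 3 + *"))  # 288
```

Operands are pushed as they are read; each operator pops two operands and pushes the result, so the final value is the only item left on the stack.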
Code:
# postfixEvaluate2.py
# Evaluates postfix expressions whose operands are single digits.
class Evaluate:
    def __init__(self):
        self.array = []
        self.size = -1
    def isEmpty(self):
        return self.array == []
    def pop(self):
        if not self.isEmpty():
            return self.array.pop()
        else:
            return "empty"
    def push(self, op):
        self.array.append(op)
    def Postfix(self, exp):
        self.array = []  # start every evaluation with an empty stack
        for i in exp:
            if i.isdigit():
                self.push(i)
            else:
                val1 = self.pop()
                val2 = self.pop()
                result = self.cal(val2, val1, i)
                self.push(result)
        return self.pop()
    def cal(self, op2, op1, i):
        if i == '*':
            return int(op2)*int(op1)
        elif i == '/':
            return int(op2)/int(op1)
        elif i == '+':
            return int(op2)+int(op1)
        elif i == '-':
            return int(op2)-int(op1)
obj = Evaluate()
#exp=input('enter the postfix expression')
# Note: "45*4+6" is not a complete postfix expression. The final operand 6
# has no operator, so 6 is returned and 24 is left behind on the stack.
exp = "45*4+6"
value = obj.Postfix(exp)
print('the result of postfix expression',exp,'is',value)
exp = "12+3+4+5+"
value = obj.Postfix(exp)
print('the result of postfix expression',exp,'is',value)
exp = "12345*+*+"
value = obj.Postfix(exp)
print('the result of postfix expression',exp,'is',value)
exp = "45*4+6/"
value = obj.Postfix(exp)
print('the result of postfix expression',exp,'is',value)
def isEmpty(self):
return True if self.top == -1 else False
# Return the value of the top of the stack
def peek(self):
return self.array[-1]
# add it to output
if self.isOperand(i):
self.output.append(i)
result= "".join(self.output)
print(result)
# Test program
exp = "a+b*(c^d-e)^(f+g*h)-i"
obj = Conversion(len(exp))
obj.infixToPostfix(exp)
#infixTopostfix.py
# program to convert an infix expression to a postfix expression
def isOperator(input):
switch = {
'+': 1,
'-': 1,
'*': 1,
'/': 1,
'%': 1,
'(': 1,
}
return 0
switch = {
'+': 2,
'-': 2,
'*': 4,
'/': 4,
'%': 4,
'(': 0,
}
return switch.get(input, 0)
switch = {
'+': 1,
'-': 1,
'*': 3,
'/': 3,
'%': 3,
'(': 100,
}
return switch.get(input, 0)
i=0
s = []
# If character an operand
if (isOperand(input[i]) == 1):
print(input[i], end = "")
inPrec(s[-1])):
s.append(input[i])
else:
while(len(s) > 0 and
outPrec(input[i]) <
inPrec(s[-1])):
print(s[-1], end = "")
s.pop()
s.append(input[i])
i += 1
s.pop()
# Main code
input = "a+b*c-(d/e+f*g*h)"
Exercise
Backtracking
A backtracking algorithm begins in a predefined starting state and
then moves from state to state in search of a desired ending state. At
any point along the way, when there is a choice between several
alternative states, the algorithm picks one, possibly at random, and
continues. If the algorithm reaches a state that represents an
undesirable outcome, it backs up to the last point at which there was
an unexplored alternative and tries it. In this way, the algorithm
either exhaustively searches all states, or it reaches the desired
ending state.
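As an illustration of the idea, the sketch below uses a stack to remember unexplored alternatives while searching a small grid. The function name find_path, the grid encoding (0 for an open cell, 1 for a wall), and the sample maze are all assumptions made for this example.

```python
def find_path(grid, start, goal):
    """Backtracking search on a grid of 0 (open) and 1 (wall) cells."""
    stack = [start]      # states that still have to be explored
    visited = {start}
    while stack:
        row, col = stack.pop()   # back up to the most recent unexplored alternative
        if (row, col) == goal:
            return True          # the desired ending state was reached
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            in_bounds = 0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
            if in_bounds and grid[nxt[0]][nxt[1]] == 0 and nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return False                 # every reachable state was searched


maze = [[0, 1, 0],
        [0, 0, 0],
        [1, 1, 0]]
print(find_path(maze, (0, 0), (2, 2)))  # True
print(find_path(maze, (0, 0), (2, 0)))  # False
```

Popping from the stack is what performs the "backing up": when a cell has no unvisited neighbours, the next pop returns to the last point at which an alternative was recorded.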
Queues
queue: Retrieves elements in the order they were added.
– First-In, First-Out ("FIFO")
– Elements are stored in order of insertion but don't have
indexes.
– Client can only add to the end of the queue, and can only
examine/remove the front of the queue.
Queues Applications
• Operating systems:
• Programming:
Can you add any new element now? No, even though there are two
free positions. To overcome this problem, the elements of the queue
are shifted towards the beginning of the queue so that a
vacant position is created at the rear end. Then FRONT and REAR are
adjusted accordingly. The element 66 can then be inserted at the rear end.
Rear = 4, Front =0
Example:
#Queue_deque
print(queue)
queue.append("Terry") # Terry arrives
print('append')
print(queue)
queue.append("Graham") # Graham arrives
print('append')
print(queue)
queue.popleft() # The first to arrive
print('remove left "front"')
print(queue)
queue.popleft() # The second to arrive now leaves
print('remove left "front"')
print(queue) # Remaining queue in order of arrival
front is always 1 less than the actual front of the queue and rear
always points to the last element in the queue. Thus, front = rear if
and only if there are no elements in the queue. The initial condition
then is front = rear = 0.
deque is short for Double Ended Queue - a generalized queue that can
get the first or last element that's stored:
from collections import deque
numbers=deque()
print(numbers)
numbers.append(99)
print(numbers)
numbers.append(15)
print(numbers)
numbers.append(82)
print(numbers)
numbers.append(50)
print(numbers)
numbers.append(47)
print(numbers)
last_item=numbers.pop()
print('delete the item:')
print(last_item) #47
print(numbers) #[99, 15, 82, 50]
print('delete the item:')
first_item= numbers.popleft()
print(first_item) #99
print('last numbers')
print(numbers) # [15, 82, 50]
stack = deque()
print('Initial stack:')
print(stack)
# uncommenting print(stack.pop())
# will cause an IndexError
# as the stack is now empty
Output:
Initial stack:
deque(['a', 'b', 'c'])
The queue module also provides LifoQueue, which is basically a stack. Data
is inserted with the put() function, and get() takes data out
of the queue.
# Initializing a stack
stack = LifoQueue(maxsize = 3)
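A short, self-contained sketch of LifoQueue in use is given here. The item values are arbitrary and chosen only for illustration.

```python
from queue import LifoQueue

stack = LifoQueue(maxsize=3)   # holds at most three items
for item in ("a", "b", "c"):
    stack.put(item)            # put() pushes onto the stack

is_full = stack.full()         # True once maxsize items are stored
top = stack.get()              # get() removes the most recently added item
print(is_full, top, stack.qsize())  # True c 2
```

Note that put() on a full LifoQueue blocks by default, so the size should be checked with full() first in single-threaded code.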
Code:
mymax = 3  # maximum number of items the queue may hold
# Function to create a queue. It initializes the size of the queue as 0
def createQueue():
    queue = []
    return queue
# The queue is empty when it holds no items
def isEmpty(queue):
    return len(queue) == 0
# Add an item at the rear of the queue
def enqueue(queue, item):
    queue.append(item)
# Remove and return the item at the front of the queue
def dequeue(queue):
    if (isEmpty(queue)):
        return "Queue is empty"
    item=queue[0]
    del queue[0]
    return item
# Test program
queue = createQueue()
while True:
    print("1 Enqueue")
    print("2 Dequeue")
    print("3 Display")
    print("4 Quit")
    ch=int(input("Enter choice"))
    if(ch==1):
        # Compare the current length with the limit, so that
        # dequeued positions become available again
        if(len(queue) < mymax):
            item=input("enter item")
            enqueue(queue, item)
        else:
            print("Queue is full")
    elif(ch==2):
        print(dequeue(queue))
    elif(ch==3):
        print(queue)
    else:
        break
The output-restricted DEQUE allows deletions from only one end, and the
input-restricted DEQUE allows insertions at only one end. The DEQUE
can be constructed in two ways:
1) Using array
2) Using linked list
Operations in DEQUE
1. Insert element at back
2. Insert element at front
3. Remove element at front
4. Remove element at back
Insert_front
Is an operation used to push an element into the front of the Deque.
Insert_back
Is an operation used to push an element into the back of the Deque.
Remove_front
Is an operation used to pop an element from the front of the Deque.
Remove_back
Is an operation used to pop an element from the back of the Deque.
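The four operations can be tried out with Python's collections.deque, as in the sketch below. The mapping of Insert_front, Insert_back, Remove_front, and Remove_back to deque methods is our own reading of the list above, and the values are arbitrary.

```python
from collections import deque

d = deque()
d.append(10)         # Insert_back   -> deque([10])
d.append(20)         # Insert_back   -> deque([10, 20])
d.appendleft(5)      # Insert_front  -> deque([5, 10, 20])
front = d.popleft()  # Remove_front, returns 5
back = d.pop()       # Remove_back, returns 20
print(front, back, d)  # 5 20 deque([10])
```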
Applications of DEQUE
1. The A-Steal algorithm implements task scheduling for several
processors (multiprocessor scheduling).
2. The processor gets the first element from the deque.
3. When one of the processors completes execution of its own threads,
it can steal a thread from another processor.
4. It gets the last element from the deque of another processor and
executes it.
Circular Queue:
A circular queue is a linear data structure that follows the FIFO principle. In
a circular queue, the last position is connected back to the first to
form a circle.
• A circular queue follows the First In First Out principle.
• Elements are added at the rear end and deleted
at the front end of the queue.
• Initially, both the front and the rear pointers point to the beginning of the
array.
• It is also called a “ring buffer”.
• Items can be inserted into and deleted from the queue in O(1) time.
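A minimal array-based sketch of these properties follows. The class name CircularQueue and its attribute names are hypothetical, and it assumes one particular convention: rear holds the index of the next free slot, and a count field distinguishes a full queue from an empty one.

```python
class CircularQueue:
    def __init__(self, capacity):
        self.items = [None] * capacity
        self.front = 0   # index of the item to be removed next
        self.rear = 0    # index of the next free slot (assumed convention)
        self.count = 0   # number of stored items

    def enqueue(self, item):
        if self.count == len(self.items):
            raise OverflowError("queue is full")
        self.items[self.rear] = item
        self.rear = (self.rear + 1) % len(self.items)  # wrap around the array
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue is empty")
        item = self.items[self.front]
        self.items[self.front] = None
        self.front = (self.front + 1) % len(self.items)
        self.count -= 1
        return item


q = CircularQueue(3)
for value in (11, 22, 33):
    q.enqueue(value)
print(q.dequeue())  # 11
q.enqueue(44)       # reuses the slot freed by 11
print(q.items)      # [44, 22, 33]
```

Because both indexes advance with the modulo operator, no elements are ever shifted, which is why insertion and deletion take constant time.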
Queue Empty
MAX = 6
Front = Rear = 0
Count = 0
Front= (Front + 1) % 6 = 1
Rear = 5
Count = Count - 1 = 4
Again, delete an element. The element to be deleted is always
pointed to by the FRONT pointer. So, 22 is deleted.
Front = (Front + 1) % 6 = 2
Front = 2, Rear = 2
Rear = Rear % 6 = 2
Count = 6
Exercises:
Objectives:
✓ Create arrays
✓ Perform various operations on arrays
✓ Determine the running times and memory usage of array operations
✓ Describe how costs and benefits of array operations depend on how
arrays are represented in computer memory
✓ Create linked structures using singly linked nodes
✓ Perform various operations on linked structures with singly linked nodes
The class defines methods that allow clients to use the subscript
operator [ ], the len function, the str function, and the for loop with
array objects. The Array methods needed for these operations are
listed in Table 7-1. The variable a in the left column refers to an Array
object.
Table 7-1 Array Operations and the methods of the Array Class
Creating an Array
import array as arr
a = arr.array('d', [1.1, 3.5, 4.5])
print(a)
# array2.py
import array as arr
a = arr.array('i', [2, 4, 6, 8]) #'i' signed integer
First element: 2
Second element: 4
Last element: 8
# array3.py
Output:
numbers[0] = 0
Output
We can add one item to the array using the append() method, or add
several items using the extend() method.
numbers.append(4)
numbers.extend([5, 6, 7])
Output
"""
An Array is like a list, but use only [], len, iter, and str.
To instantiate, use
#1
for item in a: # Traverse the array to print all
print(item)
#1
#2
#3
#4
#5
As you can see, an array is a very restricted version of a list.
Arrays operations
We can also concatenate two arrays using + operator.
print(numbers)
Output
Output
numbers.remove(12)
print(numbers) # Output: array('i', [10, 11, 12, 13])
print(numbers.pop(2)) # Output: 12
print(numbers) # Output: array('i', [10, 11, 13])
Output
If you create arrays using the array module, all elements of the array
must be of the same numeric type.
Output
➢ Lists are much more flexible than arrays. They can store
elements of different data types including strings. And, if you
need to do mathematical computation on arrays and matrices,
you are much better off using something like NumPy.
import array
# initializing array with array values
# initializes array with signed integers
arr= array.array('i',[1, 2, 3, 1, 2, 5])
print ("\r")
Output:
The new created array is : 1 2 3 1 2 5
The index of 1st occurrence of 2 is : 1
The array after reversing is : 5 2 1 3 2 1
if logicalSize == len(a):
    temp = Array(len(a) + 1)       # Create a new array
    for i in range(logicalSize):   # Copy data from the old
        temp[i] = a[i]             # array to the new array
    a = temp                       # Reset the old array variable
                                   # to the new array
Suppose this grid is named grid. To access an item in grid, you use
two subscripts to specify its row and column positions, remembering
that indexes start at 0:
x = grid[2][3] # Set x to 23, the value in (row 2, column 3)
sum = 0
for row in range(grid.getHeight()):          # Go through rows
    for column in range(grid.getWidth()):    # Go through columns
        sum += grid[row][column]
Example:
>>> matrix = [
... [1, 2, 3, 4],
... [5, 6, 7, 8],
... [9, 10, 11, 12],
... ]
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
OR
>>> transposed = []
>>> for i in range(4):
...     # the following 3 lines implement the nested listcomp
...     transposed_row = []
...     for row in matrix:
...         transposed_row.append(row[i])
...     transposed.append(transposed_row)
...
>>> transposed
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
The zip() function would do a great job for this use case:
>>> list(zip(*matrix))
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]
del statement
There is a way to remove an item from a list given its index instead of
its value: the del statement. This differs from the pop() method
which returns a value. The del statement can also be used to remove
slices from a list or clear the entire list (which we did earlier by
assignment of an empty list to the slice). For example:
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]
>>> del a
Questions:
1- Write a code segment that searches a Grid object for a negative
integer. The loop should terminate at the first instance of a
negative integer in the grid, and the variables row and column
should be set to the position of that integer. Otherwise, the
variables row and column should equal the number of rows
and columns in the grid.
2- Describe the contents of the grid after you run the following
code segment:
matrix = Grid(3, 3)
for row in range(matrix.getHeight()):
    for column in range(matrix.getWidth()):
        matrix[row][column] = row * column
Though tuples may seem similar to lists, they are often used in
different situations and for different purposes. Tuples are immutable,
and usually contain a heterogeneous sequence of elements that are
accessed via unpacking (see later in this section) or indexing (or even
by attribute in the case of namedtuples). Lists are mutable, and their
elements are usually homogeneous and are accessed by iterating
over the list.
For example:
>>> empty = ()
>>> singleton = 'hello', # <-- note trailing comma
>>> len(empty)
0
>>> len(singleton)
1
>>> singleton
('hello',)
>>> x, y, z = t
This is called, appropriately enough, sequence unpacking and works for any
sequence on the right-hand side. Sequence unpacking requires that there
are as many variables on the left side of the equals sign as there are
elements in the sequence.
Sets
Python also includes a data type for sets. A set is an unordered
collection with no duplicate elements. Basic uses include
membership testing and eliminating duplicate entries. Set objects
also support mathematical operations like union, intersection,
difference, and symmetric difference.
Curly braces or the set() function can be used to create sets. Note: to
create an empty set you have to use set(), not {}; the latter creates an
empty dictionary, a data structure that we discuss later.
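The difference between set() and {} can be checked directly. The following is a small sketch; the variable names are chosen only for illustration.

```python
empty_set = set()    # an empty set
empty_dict = {}      # an empty dictionary, not a set
print(type(empty_set).__name__, type(empty_dict).__name__)  # set dict

letters = set("banana")   # duplicates are removed
print(sorted(letters))    # ['a', 'b', 'n']
print("a" in letters)     # True (membership testing)
```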
>>> a | b                              # letters in a or b or both
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b                              # letters in both a and b
{'a', 'c'}
>>> a ^ b                              # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
Dictionaries
Another useful data type built into Python is the dictionary. Dictionaries
are sometimes found in other languages as “associative memories”
or “associative arrays”. Unlike sequences, which are indexed by a
range of numbers, dictionaries are indexed by keys, which can be any
immutable type; strings and numbers can always be keys. Tuples can
be used as keys if they contain only strings, numbers, or tuples; if a
tuple contains any mutable object either directly or indirectly, it
cannot be used as a key. You can’t use lists as keys, since lists can be
modified in place using index assignments, slice assignments, or
methods like append() and extend().
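The rule about keys can be seen in a few lines. The dictionary below and its contents are made up for this sketch.

```python
cell_labels = {(0, 0): "origin", (1, 2): "target"}   # tuples of numbers are valid keys
print(cell_labels[(1, 2)])  # target

try:
    cell_labels[[3, 4]] = "invalid"   # a list is mutable, so it cannot be a key
    list_key_accepted = True
except TypeError:
    list_key_accepted = False
    print("a list cannot be used as a key")
```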
Looping Techniques
When looping through dictionaries, the key and corresponding value
can be retrieved at the same time using the items() method.
To loop over two or more sequences at the same time, the entries can be
paired with the zip() function.
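Both techniques are sketched below with small sample data; the names and values are illustrative only.

```python
knights = {"gallahad": "the pure", "robin": "the brave"}
for name, title in knights.items():   # key and value are retrieved together
    print(name, title)

questions = ["name", "quest"]
answers = ["lancelot", "the holy grail"]
pairs = list(zip(questions, answers))  # entries paired position by position
for question, answer in pairs:
    print(f"What is your {question}? It is {answer}.")
```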
More on Conditions
The conditions used in while and if statements can contain any
operators, not just comparisons.
same object; this only matters for mutable objects like lists. All
comparison operators have the same priority, which is lower than
that of all numerical operators.
Note that comparing objects of different types with < or > is legal
provided that the objects have appropriate comparison methods. For
example, mixed numeric types are compared according to their
numeric value, so 0 equals 0.0, etc. Otherwise, rather than providing
an arbitrary ordering, the interpreter will raise a TypeError exception.
The last item in either type of linked structure has no link to the next
item. The figure indicates the absence of a link, called an empty link,
by means of a slash instead of an arrow. Note also that the first item
in a doubly linked structure has no link to the preceding item.
Like arrays, these linked structures represent linear sequences of
items. However, the programmer who uses a linked structure cannot
immediately access an item by specifying its index position. Instead,
the programmer must start at one end of the structure and follow the
links until the desired position (or item) is reached. This property of
linked structures has important consequences for several operations,
as discussed shortly.
Figure 7-3 shows a singly linked node and a doubly linked node
whose internal links are empty.
class Node(object):
"""Represents a singly linked node."""
def __init__(self, data, next = None):
"""Instantiates a Node with a default next of None."""
self.data = data
self.next = next
Figure 7-6 shows the state of the three variables after this code is
run.
Use the Node class to create a singly linked structure and print its
contents:
Exercise
Write a code segment that transfers items from a full array to a singly
linked structure. The operation should preserve the ordering of the
items.
Traversal
Many applications simply need to visit each node without deleting it.
This operation, called a traversal, uses a temporary pointer variable
named probe.
probe = head
while probe != None:
    <use or modify probe.data>
    probe = probe.next
Searching
The sequential search of a linked structure resembles a traversal in
that you must start at the first node and follow the links until you
reach a sentinel. However, in this case, there are two possible
sentinels:
• The empty link, indicating that there are no more data items to
examine
• A data item that equals the target item, indicating a successful
search
Here is the form of the search for a given item:
probe = head
You can assume that 0 <= i< n, where n is the number of nodes in
the linked structure. Here is the form for accessing the ith item:
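The sequential search and the access to the ith item can both be sketched as short functions. This is only an illustration under stated assumptions: the function names search and get_item are hypothetical, and the Node class is repeated here so that the example runs on its own.

```python
class Node:
    """A singly linked node."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def search(head, target):
    """Return True if target occurs in the structure that starts at head."""
    probe = head
    while probe is not None and probe.data != target:
        probe = probe.next
    return probe is not None   # stopped on a match, or ran into the empty link


def get_item(head, i):
    """Return the ith data item, assuming 0 <= i < n."""
    probe = head
    while i > 0:
        probe = probe.next
        i -= 1
    return probe.data


head = Node(10, Node(20, Node(30)))
print(search(head, 20))    # True
print(search(head, 99))    # False
print(get_item(head, 2))   # 30
```

Both operations follow links from the first node, so they take time proportional to the number of nodes visited.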
# linkedList6.py
# Node class
class Node:
def __init__(self):
self.head = None # Initialize head as None
current = current.next
if llist.search(21):
print("Yes")
else:
print("No")
# linkedList7.py
# Recursive Python program to search an element in linked list
# Node class
class Node:
# Function to initialise
# the node object
def __init__(self, data):
self.data = data # Assign data
self.next = None # Initialize next as null
class LinkedList:
def __init__(self):
self.head = None # Initialize head as None
# Base case
if(not li):
return False
# If key is present in
# current node, return true
if(li.data == key):
return True
li = LinkedList()
li.push(1)
li.push(2)
li.push(3)
li.push(4)
key = 4
if li.search(li.head,key):
print("Yes")
else:
print("No")
Replacement
The replacement operations in a singly linked structure also employ
the traversal pattern.
In these cases, you search for a given item or a given position in the
linked structure and replace the item with a new item. The first
operation, replacing a given item, need not assume that the target
item is in the linked structure.
If the target item is not present, no replacement occurs and the
operation returns False. If the target is present, the new item
replaces it and the operation returns True. Here is the form of the
operation:
probe = head
while probe != None and targetItem != probe.data:
    probe = probe.next
if probe != None:
    probe.data = newItem
    return True
else:
    return False
The operation to replace the ith item assumes that 0 <= i <n.
Some operations make a linked list preferable to an array.
Fig. 7-8 Two cases of inserting an item at the beginning of a linked list
newNode = Node(newItem)
if head is None:
    head = newNode
else:
    probe = head
    while probe.next != None:
        probe = probe.next
    probe.next = newNode
Assume that there is at least one node in the structure. The operation
returns the removed item.
head = head.next
return removedItem
The operation uses constant time and memory, unlike the same
operation for arrays.
In either case, the code returns the data item contained in the
deleted node. Here is the form:
removedItem = head.data
if head.next is None:
    head = None
else:
    probe = head
    while probe.next.next != None:
        probe = probe.next
    removedItem = probe.next.data
    probe.next = None
return removedItem
The main advantage of the singly linked structure over the array is
not time performance but memory performance. Resizing an array,
when this must occur, is linear in time and memory. Resizing a linked
structure, which occurs upon each insertion or removal,
is constant in time and memory. Moreover, no memory ever goes to
waste in a linked structure. The physical size of the structure never
exceeds the logical size. Linked structures
do have an extra memory cost in that a singly linked structure must
use n cells of memory for the pointers. This cost increases for doubly
linked structures, whose nodes have two links.
Questions:
Assume that the position of an item to be removed from a singly
linked structure has been located. State the run-time complexity for
completing the removal operation from that point.
link from the last node back to the first node in the structure. There is
always at least one node in this implementation. This node, the
dummy header node, contains no data but serves as a marker for the
beginning and the end of the linked structure.
The first node to contain data is located after the dummy header
node. This node’s next pointer then points back to the dummy
header node in a circular fashion, as shown in Figure 7-14.
Fig. 7-14 A circular linked list after inserting the first node
The search for the ith node begins with the node after the dummy
header node. Assume that the empty linked structure is initialized as
follows:
Here is the code for insertions at the ith position using this new
representation of a linked structure:
class Node(object):
    def __init__(self, data, next = None):
        """Instantiates a Node with default next of None"""
        self.data = data
        self.next = next
class TwoWayNode(Node):
    def __init__(self, data, previous = None, next = None):
        """Instantiates a TwoWayNode."""
        Node.__init__(self, data, next)
        self.previous = previous
Questions:
8. Trees
Exercises:
1- Write a Python program to create a Balanced Binary Search
Tree (BST) using an array (given) elements where array
elements are sorted in ascending order.
2- Write a Python program to find the closest value of a given
target value in a given non-empty Binary Search Tree (BST) of
unique values.
3- Write a Python program to check whether a given binary tree
is a valid binary search tree (BST) or not.
Let a binary search tree (BST) be defined as follows:
The left subtree of a node contains only nodes with keys less
than the node's key.
The right subtree of a node contains only nodes with keys greater
than the node's key.
Both the left and right subtrees must also be binary search
trees.
4- Write a Python program to delete a node with the given key in a
given Binary search tree (BST). First, search for the node to be
deleted, if found, delete it.
5- Consider the following tree, traverse it using BFS algorithm,
DFS algorithms (Preorder, inorder and PostOrder)
6- Write a function that returns the maximum value of all the keys
in a binary tree. Assume all values are nonnegative; return -1 if
the tree is empty.
7- Write a function that returns the sum of all the keys in a binary
tree.
A tree T is a set of nodes storing elements such that the nodes have a
parent-child relationship that satisfies the following:
• If T is not empty, T has a special node called the root that has no
parent.
• Each node v of T different than the root has a unique parent node
w; each node with parent w is a child of w.
• The tree is one of the most important nonlinear data
structures.
• Trees make many algorithms much faster than linear data structures, such as
array-based or linked lists.
• Trees are used in file systems, graphical user interfaces, databases, web
sites, and other computer systems.
❖ The depth of a node is the length of the path (or the number of
edges) from the root to that node.
❖ The height of a node is the longest path from that node to its
leaves.
❖ The height of a tree is the height of the root. A leaf node has no
children -- its only path is up to its parent.
Tree Terminology
Leaf node
A node with no children is called a leaf (or external node). A node
which is not a leaf is called an internal node.
Path: A sequence of nodes n_1, n_2, ..., n_k such that n_i is the parent of
n_(i+1) for i = 1, 2, ..., k - 1. The length of a path is 1 less than the number of
nodes on the path. Thus there is a path of length zero from a node to
itself.
Siblings: The children of the same parent are called siblings.
Ancestor and Descendent If there is a path from node A to node B,
then A is called an ancestor of B and B is called a descendent of A.
Level The level of the node refers to its distance from the root. The
root of the tree has level 0, and the level of any other node in the tree
is one more than the level of its parent.
Height The maximum level in a tree determines its height. (The height of a
node is the length of a longest path from the node to a leaf.)
The height of the tree is also called its depth.
The depth of a node is the number of edges along the path from the root
to that node.
Code:
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data
    def PrintTree(self):
        print(self.data)
root = Node(10)
root.PrintTree()
A full binary tree of height h has all its leaves at level h. Alternatively,
all non-leaf nodes of a full binary tree have two children, and the leaf
nodes have no children.
A full binary tree with height h has 2^(h+1) - 1 nodes. A full binary tree of
height h is a strictly binary tree all of whose leaves are at level h.
For example, a full binary tree of height 3 contains 2^(3+1) - 1 = 15 nodes.
A complete binary tree of height h looks like a full binary tree down to
level h-1, and the level h is filled from left to right.
A binary tree is a perfect binary tree if all its internal nodes have
two children and all its leaves are at the same level.
Example:
The root 3 is the 0th element while its left child 5 is the 1st element of
the array.
Node 6 does not have any children, so its children, i.e. the 7th and 8th elements
of the array, are shown as Null values.
If n is the index of a node, then its left child
occurs at position (2n + 1) and its right child at position (2n + 2) of
the array. If a node lacks a child, then a null value is
stored at the corresponding index of the array.
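The index arithmetic can be sketched as follows. Only the root 3, its left child 5, and the childless node 6 at index 3 come from the example; the remaining values in the array, and the function name child_values, are assumptions made for illustration.

```python
def child_values(tree, n):
    """Return (left, right) values of the node stored at index n, or None where absent."""
    result = []
    for index in (2 * n + 1, 2 * n + 2):
        result.append(tree[index] if index < len(tree) else None)
    return tuple(result)


# Index:  0  1  2  3  4  5   6   7     8
tree = [3, 5, 9, 6, 8, 20, 10, None, None]   # values other than 3, 5, 6 are made up

print(child_values(tree, 0))   # (5, 9)
print(child_values(tree, 3))   # (None, None)
print((4 - 1) // 2)            # 1, the index of the parent of the node at index 4
```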
    if parent[i] == -1:
        root[0] = created[i] # root[0] denotes root of the tree
        return
    # If parent is not created, then create parent first
    if created[parent[i]] is None:
        createNode(parent, parent[i], created, root)
    # Find parent pointer
    p = created[parent[i]]
    # If this is first child of parent
    if p.left is None:
        p.left = created[i]
    # If second child
    else:
        p.right = created[i]
# Creates tree from parent[0..n-1] and returns root of the created tree
def createTree(parent):
    n = len(parent)
    # Create an array created[] to keep track
    # of created nodes, initialize all entries as None
    created = [None for i in range(n+1)]
    root = [None]
    for i in range(n):
        createNode(parent, i, created, root)
    return root[0]
# Test
parent = [-1, 0, 0, 1, 1, 3, 5]
root = createTree(parent)
print("Inorder Traversal of constructed tree")
inorder(root)
or right child empty then it will have in its respective link field, a null
value. A leaf node has null value in both of its links.
Code:
# Python program to create a Complete BT from with linked list
class ListNode:
self.next = None
new_node = ListNode(new_data)
# Make next of new node as head
new_node.next = self.head
# Move the head to point to new node
self.head = new_node
def convertList2Binary(self):
# Queue to store the parent nodes
q = []
# Base Case
if self.head is None:
self.root = None
return
# 1.) The first node is always the root node,
# and add it to the queue
self.root = BinaryTreeNode(self.head.data)
q.append(self.root)
# Advance the pointer to the next node
self.head = self.head.next
if(self.head):
rightChild = BinaryTreeNode(self.head.data)
q.append(rightChild)
self.head = self.head.next
# 2.b) Assign the left and right children of parent
parent.left = leftChild
parent.right = rightChild
if(root):
self.inorderTraversal(root.left)
print(root.data, end=" ")
self.inorderTraversal(root.right)
# Test Program
conv = Conversion()
conv.push(36)
conv.push(30)
conv.push(25)
conv.push(15)
conv.push(12)
conv.push(10)
conv.convertList2Binary()
print("Inorder Traversal of the constructed Binary Tree is:")
conv.inorderTraversal(conv.root)
1. Add the node at the root to a queue. For instance, in the above
example, the node A will be added to the queue.
2. Pop an item from the queue and print/process it.
3. This is important-- add all the children of the node popped in
step two to the queue. At this point in time, the queue will
contain the children of node A:
import collections
class Tree_Node:
    def __init__(self,root_value,children_nodes):
        self.value = root_value
        self.children_nodes = children_nodes
# nodes_dic maps each node value to the list of its children
def breadth_first_search(Root_Node):
    queue = collections.deque()
    queue.append(Root_Node.value)
    while queue:
        node_value = queue.popleft()
        print(node_value)
        children_nodes = nodes_dic[node_value]
        for i in children_nodes:
            if i is None:
                continue
            queue.append(i)
node. There are three main ways to apply Depth First Search to a
tree.
➢ Inorder Traversal
➢ Preorder Traversal
➢ Postorder Traversal
1- Inorder Traversal
2- Preorder Traversal
3- Post-order Traversal
The code
class Node:
def __init__(self,key):
self.left = None
self.right = None
self.val = key
if root:
# First recur on left child
printPostorder(root.left)
# the recur on right child
printPostorder(root.right)
# now print the data of node
print(root.val),
if root:
# Test code
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
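For comparison, the three traversals can be written so that they return lists instead of printing. This is a self-contained sketch on the same five-node tree; the function names inorder, preorder, and postorder are our own.

```python
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key


def inorder(root):
    """Left subtree, then the node, then the right subtree."""
    return inorder(root.left) + [root.val] + inorder(root.right) if root else []


def preorder(root):
    """The node, then the left subtree, then the right subtree."""
    return [root.val] + preorder(root.left) + preorder(root.right) if root else []


def postorder(root):
    """Left subtree, then the right subtree, then the node."""
    return postorder(root.left) + postorder(root.right) + [root.val] if root else []


root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)

print(inorder(root))    # [4, 2, 5, 1, 3]
print(preorder(root))   # [1, 2, 4, 5, 3]
print(postorder(root))  # [4, 5, 2, 3, 1]
```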
Or
Python Implementation for DFS
class Tree_Node:
def __init__(self,root_value,children_nodes):
self.value = root_value
self.left_child = children_nodes[0]
self.right_child = children_nodes[1]
def depth_first_search(Root_Node):
if Root_Node.value is None:
return
stack = []
stack.append(Root_Node.value)
node_value = stack.pop()
print (node_value)
children_nodes = nodes_dic[node_value]
nodes_dic = {"A":["B","C"],
"B":["D","E"],
"C":["F","G"],
"D":[None],
"E":["H","I"],
"F":[None],
"G":["J", None],
"H":[None],
"I":[None],
"J":[None]}
root_node_value = next(iter(nodes_dic.keys()))
root_node_children = next(iter(nodes_dic.values()))
root_node = Tree_Node(root_node_value ,root_node_children )
depth_first_search(root_node)
Searching a key
To search a given key in BST, we first compare it with root, if the key
is present at root, we return root.
If key is greater than root’s key, we recur for right sub-tree of root
node.
Otherwise we recur for left sub-tree.
Code:
# A utility function to search a given key in BST
def search(root,key):
    # Base Cases: root is null or key is present at root
    if root is None or root.val == key:
        return root
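A complete, runnable version of the search described above might look like the sketch below. The insert helper and the sample keys are assumptions added only so that there is a tree to search.

```python
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key


def insert(root, key):
    """Insert key into the BST rooted at root and return the root (helper for this sketch)."""
    if root is None:
        return Node(key)
    if key < root.val:
        root.left = insert(root.left, key)
    elif key > root.val:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    # Base cases: root is null or key is present at root
    if root is None or root.val == key:
        return root
    # Key is greater than root's key: recur on the right subtree
    if key > root.val:
        return search(root.right, key)
    # Otherwise recur on the left subtree
    return search(root.left, key)


root = None
for key in (50, 30, 70, 20, 40):
    root = insert(root, key)

print(search(root, 40).val)  # 40
print(search(root, 60))      # None
```

Each comparison discards one subtree, so the number of steps is bounded by the height of the tree.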
Priority Queues
A priority queue is an extension of a queue with the following properties.
1) Every item has a priority associated with it.
2) An element with high priority is dequeued before an element with
low priority.
3) If two elements have the same priority, they are served according
to their order in the queue.
A typical priority queue supports following operations.
insert(item, priority): Inserts an item with given priority.
getHighestPriority(): Returns the highest priority item.
deleteHighestPriority(): Removes the highest priority item.
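The three operations above can be sketched with Python's standard heapq module. This is a minimal illustration, not the only possible design: the class name PriorityQueue and the snake_case method names are assumptions made for this example. Because heapq is a min-heap, the priority is negated, and an insertion counter keeps elements of equal priority in queue order.

```python
import heapq
import itertools


class PriorityQueue:
    """Max-priority queue on top of heapq; FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # insertion order, used to break ties

    def insert(self, item, priority):
        # heapq pops the smallest tuple, so the priority is negated
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def get_highest_priority(self):
        # Peek at the highest-priority item without removing it
        return self._heap[0][2]

    def delete_highest_priority(self):
        # Remove and return the highest-priority item
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


pq = PriorityQueue()
pq.insert("write report", 2)
pq.insert("fix outage", 5)
pq.insert("restart server", 5)
print(pq.get_highest_priority())      # fix outage
print(pq.delete_highest_priority())   # fix outage
print(pq.delete_highest_priority())   # restart server
print(pq.delete_highest_priority())   # write report
```

Both insert and delete run in O(log n) time with this representation, while reading the highest-priority item is O(1).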
[Figure: a file system shown as a simple tree, with "/" as the root]
9. Graph
Graphs are used to represent many real-life applications, most
commonly networks.
The networks may include paths in a city, a telephone network, or a
circuit network. Graphs are also used in social networks such as
LinkedIn and Facebook.
For example, in Facebook, each person is represented by a vertex
(or node). Each node is a structure that contains information such as
person id, name, gender, and locale.
The two most commonly used representations of a graph are:
1. Adjacency Matrix
2. Adjacency List
There are other representations as well, such as the Incidence Matrix
and the Incidence List. The choice of graph representation is
situation-specific: it depends on the type of operations to be
performed and on ease of use.
Adjacency Matrix
Adjacency Matrix is a 2D array of size V x V where V is the number of
vertices in a graph. Let the 2D array be adj[][], a slot adj[i][j] = 1
indicates that there is an edge from vertex i to vertex j.
Adjacency matrix for undirected graph is always symmetric.
Adjacency Matrix is also used to represent weighted graphs. If adj[i][j]
= w, then there is an edge from vertex i to vertex j with weight w.
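As a minimal sketch of this representation, the function below builds the matrix from a list of (i, j, w) edges. The function name build_adjacency_matrix and the sample graph are assumptions made for illustration; a weight of 1 stands for an unweighted edge and 0 for "no edge".

```python
def build_adjacency_matrix(num_vertices, edges, directed=False):
    """Return a V x V matrix where adj[i][j] holds the weight of edge i -> j (0 if absent)."""
    adj = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v, w in edges:
        adj[u][v] = w
        if not directed:
            adj[v][u] = w  # undirected edges are stored in both directions
    return adj


# Hypothetical 4-vertex undirected graph
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 1)]
adj = build_adjacency_matrix(4, edges)
for row in adj:
    print(row)

# The matrix of an undirected graph equals its own transpose
is_symmetric = all(adj[i][j] == adj[j][i] for i in range(4) for j in range(4))
print(is_symmetric)  # True
```

Checking whether an edge exists is O(1), but the matrix always uses V x V space, even when the graph has few edges.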
Adjacency List
An array of linked lists is used. Size of the array is equal to number of
vertices. Let the array be array[].
An entry array[i] represents the linked list of vertices adjacent to the
ith vertex. This representation can also be used to represent a
weighted graph. The weights of edges can be stored in nodes of
linked lists.
Following is adjacency list representation of the above graph.
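A minimal sketch of an adjacency list in Python is given here. It assumes a dictionary of Python lists in place of an array of linked lists, and the function name build_adjacency_list and the sample edges are hypothetical. Each list entry stores the neighbouring vertex together with the edge weight.

```python
from collections import defaultdict


def build_adjacency_list(edges, directed=False):
    """Map each vertex to a list of (neighbour, weight) pairs."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        if not directed:
            adj[v].append((u, w))  # store the reverse direction as well
    return adj


# Hypothetical weighted, undirected graph
edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (2, 3, 5)]
adj = build_adjacency_list(edges)
for vertex in sorted(adj):
    print(vertex, "->", adj[vertex])
```

The space used is proportional to the number of vertices plus the number of edges, which is why this form is usually preferred for sparse graphs.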
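Python 3 code
# Python 3 program to print the BFS traversal from a given
# source vertex. BFS(s) traverses the vertices reachable from s.
from collections import defaultdict

class Graph:
    # Constructor
    def __init__(self):
        self.graph = defaultdict(list)
    def addEdge(self, u, v):
        self.graph[u].append(v)
    def BFS(self, s):
        visited = {s}
        queue = [s]
        while queue:
            s = queue.pop(0)
            print(s, end=" ")
            for i in self.graph[s]:
                if i not in visited:
                    visited.add(i)
                    queue.append(i)

# Test code
g = Graph()
for u, v in [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 3)]:
    g.addEdge(u, v)
g.BFS(2)
Output:
Following is Breadth First Traversal (starting from vertex 2)
2 0 3 1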
11) Path Finding: We can use either Breadth First or Depth First
Traversal to find whether there is a path between two vertices.
Graph Algorithms
Depth First Traversal for a Graph-DFT
Depth First Traversal (or Search) for a graph is similar to Depth First
Traversal of a tree. The only catch here is, unlike trees, graphs may
contain cycles, so we may come to the same node again. To avoid
processing a node more than once, we use a boolean visited array.
For example, in the following graph, we start traversal from vertex 2.
When we come to vertex 0, we look at all of its adjacent vertices.
Vertex 2 is also adjacent to 0. If we don't mark visited vertices, then
2 will be processed again and the traversal will never terminate.
The Depth First Traversal of the following graph is 2, 0, 1, 3.
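The traversal described above can be sketched as follows. This is an illustration under stated assumptions: the example graph is taken to have the directed edges used in this chapter's test code, the method names add_edge and dfs are chosen for this sketch, and a Python set plays the role of the boolean visited array.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.graph = defaultdict(list)  # vertex -> list of adjacent vertices

    def add_edge(self, u, v):
        self.graph[u].append(v)

    def dfs(self, start):
        """Return the vertices reachable from start in depth-first order."""
        visited = set()
        order = []

        def visit(v):
            visited.add(v)  # mark before recursing so that cycles terminate
            order.append(v)
            for neighbour in self.graph[v]:
                if neighbour not in visited:
                    visit(neighbour)

        visit(start)
        return order


g = Graph()
for u, v in [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 3)]:
    g.add_edge(u, v)
print(g.dfs(2))  # [2, 0, 1, 3]
```

Without the visited check, the cycle 2 -> 0 -> 2 would make visit call itself until Python's recursion limit is hit.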
DFS (V, E)
???
4) Topological Sorting
DFS starts from the root node, then traverses into the left child node
and continues. If the item is found it stops; otherwise it continues.
An advantage of DFS is that it typically requires less memory than
Breadth First Search (BFS).
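from collections import defaultdict

class Graph:
    # Constructor
    def __init__(self):
        # default dictionary to store graph
        self.graph = defaultdict(list)

    # Add an edge from vertex u to vertex v
    def addEdge(self, u, v):
        self.graph[u].append(v)

    # Print the BFS traversal from source vertex s
    def BFS(self, s):
        visited = {s}
        queue = [s]
        while queue:
            # Dequeue a vertex from queue and print it
            s = queue.pop(0)
            print(s, end=" ")
            # Enqueue the unvisited neighbours of s
            for i in self.graph[s]:
                if i not in visited:
                    visited.add(i)
                    queue.append(i)

# Test code
# Create a graph given in the above diagram
g = Graph()
g.addEdge(0, 1)
g.addEdge(0, 2)
g.addEdge(1, 2)
g.addEdge(2, 0)
g.addEdge(2, 3)
g.addEdge(3, 3)
g.BFS(2)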
10. Recursion
A function is recursive if it calls itself and has a termination
condition. Why a termination condition? To stop the function from
calling itself to infinity.
Recursion examples:
print(sum([5,7,3,8,10]))
To do this recursively:
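A minimal sketch of a recursive list sum is given here. The function name recursive_sum is an assumption made for this example. The empty list is the termination condition, and every call works on a list that is one element shorter.

```python
def recursive_sum(numbers):
    """Sum a list by adding the first element to the sum of the rest."""
    if not numbers:  # termination condition: an empty list sums to 0
        return 0
    return numbers[0] + recursive_sum(numbers[1:])


print(recursive_sum([5, 7, 3, 8, 10]))  # 33
```

The same pattern, a termination condition plus a call on a smaller input, appears in the factorial function: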
def factorial(n):
    # n <= 1 also stops the recursion for n = 0
    if n <= 1:
        return 1
    else:
        return n * factorial(n - 1)
print(factorial(3))
Example:
def tri_recursion(k):
    if k > 0:
        result = k + tri_recursion(k - 1)
        print(result)
    else:
        result = 0
    return result
Limitations of recursions
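def factorial(n):
    if n <= 1:
        return 1
    else:
        return n * factorial(n - 1)

# Fails with RecursionError: 3000 nested calls exceed Python's
# default recursion limit (usually 1000)
print(factorial(3000))

The recursion limit can be raised with sys.setrecursionlimit:

import sys
sys.setrecursionlimit(5000)

def factorial(n):
    if n <= 1:
        return 1
    else:
        return n * factorial(n - 1)

# Note: recent Python versions also cap int-to-string conversion, so
# printing a result this large may need sys.set_int_max_str_digits
print(factorial(3000))

With the higher limit the recursion completes, but keep in mind there
is still a limit to the input for the factorial function. For this
reason, you should use recursion wisely. As the factorial problem
shows, a recursive function is not always the best solution. For other
problems, such as traversing a directory, recursion may be a good
solution.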
Questions:
Return False
What is the run time of this new version?
Sol:
The outer loop still executes O(N) times in the algorithm.
When the outer loop's counter is i, the inner loop executes O(N - i) times. If you add
up the number of times that the inner loop executes, the
result is $(N-1) + (N-2) + \dots + 1 = N(N-1)/2 = (N^2 - N)/2$. This
is still $O(N^2)$.
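The sum can be checked numerically. The loop structure below is an assumption, since the original algorithm is not reproduced here: the outer counter i runs over all N positions and the inner loop starts just past i. The function name count_inner_iterations is hypothetical.

```python
def count_inner_iterations(n):
    """Count inner-loop iterations when the inner loop starts just past the outer index."""
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 1
    return count


# Each row shows N, the counted iterations, and the closed form N(N - 1)/2
for n in (1, 5, 10, 100):
    print(n, count_inner_iterations(n), n * (n - 1) // 2)
```

The two right-hand columns agree for every N, and the count still grows quadratically.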
In the left one 5 is not greater than 6. In the right one 6 is not greater
than 7.
The reason binary-search trees are important is that the following
operations can be implemented efficiently using a BST:
BST Properties
3. The keys in the right subtree are greater than the key in its
parent node.
4. Duplicate node keys are not allowed.
Inserting a node
A naive algorithm for inserting a node into a BST is as follows. We
start from the root node; if the node to insert is less than the root,
we go to the left child, and otherwise we go to the right child of the
root. We continue this process (each node is the root of some subtree)
until we find a null pointer (or leaf node) where we cannot go any
further. We then insert the node as the left or right child of the leaf
node, depending on whether it is less than or greater than the leaf
node. Note that a new node is always inserted as a leaf node.
A recursive algorithm for inserting a node into a BST is as follows.
Assume we insert a node N into tree T. If the tree is empty, we return
the new node N as the tree. Otherwise, the problem of inserting is
reduced to inserting the node N into the left or right subtree of T,
depending on whether N is less than or greater than T. A definition is
as follows.
Insert(N, T) = N if T is empty
= Insert(N, T.left) if N < T
= Insert(N, T.right) if N > T
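The recursive definition above can be sketched in Python as follows. The Node class with a key field and the inorder helper are assumptions made for this example. Equal keys are ignored, which matches the rule that duplicate keys are not allowed.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the BST rooted at root and return the (possibly new) root."""
    if root is None:  # empty tree: the new node becomes the tree
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # equal keys are ignored, since duplicates are not allowed
    return root


def inorder(root):
    """Return the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for key in [38, 15, 45, 9, 20, 15]:
    root = insert(root, key)
print(inorder(root))  # [9, 15, 20, 38, 45]
```

An inorder traversal of the result is sorted, which is a quick way to check that the order property still holds after the insertions.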
Deleting a node
A BST is a connected structure. That is, all nodes in a tree are
connected to some other node. For example, each node has a parent,
unless node is the root. Therefore deleting a node could affect all sub
trees of that node. For example, deleting node 5 from the tree could
result in losing sub trees that are rooted at 1 and 9.
N, then find the largest node in the left subtree of N or the smallest
node in the right subtree of N. These are the two candidates that can
replace the node to be deleted without losing the order property. For
example, consider the following tree and suppose we need to delete
the root 38.
Then we find the largest node in the left subtree (15) or the smallest
node in the right subtree (45), replace the root with that node,
and then delete that node. The following set of images demonstrates
this process.
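A minimal sketch of deletion is given here, under stated assumptions: the Node class and the sample tree are hypothetical, and of the two candidates the sketch always picks the smallest node of the right subtree. Nodes with at most one child are simply replaced by that child.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def delete(root, key):
    """Delete key from the BST rooted at root and return the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Zero or one child: splice the node out
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: copy the smallest key of the right subtree,
        # then delete that key from the right subtree
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    return root


def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)


# Hypothetical tree: root 38, with 15 the largest key on the left
# and 45 the smallest key on the right
tree = Node(38, Node(9, None, Node(15)), Node(50, Node(45), None))
tree = delete(tree, 38)
print(tree.key)       # 45
print(inorder(tree))  # [9, 15, 45, 50]
```

Using the largest node of the left subtree instead would be symmetric and equally valid.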
AVL Trees
An AVL tree is another balanced BST. Like red-black trees, AVL trees
are not perfectly balanced, but the heights of the two subtrees of any
node differ by at most 1, which maintains an O(log n) search time.
Insertion and deletion operations also take O(log n) time.
Balance requirement for an AVL tree: the left and right sub-trees
differ by at most 1 in height.
Yes, this is an AVL tree. Examination shows that each left sub-tree has
a height 1 greater than each right sub-tree.
No this is not an AVL tree. Sub-tree with root 8 has height 4 and sub-
tree with root 18 has height 2.
An AVL tree implements the Map abstract data type just like a regular
binary search tree, the only difference is in how the tree performs. To
implement our AVL tree we need to keep track of a balance factor for
each node in the tree. We do this by looking at the heights of the left
and right subtrees for each node. More formally, we define the
balance factor for a node as the difference between the height of the
left subtree and the height of the right subtree.
balanceFactor = height(leftSubTree) − height(rightSubTree)
Using the definition for balance factor given above we say that a
subtree is left-heavy if the balance factor is greater than zero. If the
balance factor is less than zero, then the subtree is right heavy.
If the balance factor is zero, then the tree is perfectly in balance. For
purposes of implementing an AVL tree and gaining the benefit of
having a balanced tree we will define a tree to be in balance if the
balance factor is -1, 0, or 1.
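The definition can be sketched directly in Python. The Node class and the function names are assumptions for this example. The height is recomputed on every call for clarity, whereas a practical AVL tree would store it (or the balance factor) in each node.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height counted in nodes: an empty subtree has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    return height(node.left) - height(node.right)


def is_avl_balanced(node):
    """True if every node has a balance factor of -1, 0, or 1."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_avl_balanced(node.left)
            and is_avl_balanced(node.right))


balanced = Node(10, Node(5), Node(20, None, Node(30)))
right_heavy = Node(10, None, Node(20, None, Node(30)))
print(balance_factor(balanced), is_avl_balanced(balanced))        # -1 True
print(balance_factor(right_heavy), is_avl_balanced(right_heavy))  # -2 False
```

The second tree is right-heavy with a balance factor of -2 at the root, so it is outside the allowed range and would need rebalancing.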
Once the balance factor of a node in a tree is outside this range, we
will need to have a procedure to bring the tree back into balance.
Figure shows an example of an unbalanced, right-heavy tree and the
balance factors of each node.
When storing an AVL tree, a field must be added to each node with
one of three values: 1, 0, or -1. A value of 1 in this field means that the
left subtree has a height one more than the right subtree.
A value of -1 denotes the opposite. A value of 0 indicates that the
heights of both subtrees are the same. Updates of AVL trees may require
up to O(log n) rotations (the worst case arises on deletion), whereas
red-black trees can be updated with a constant number of rotations,
plus up to O(log n) color changes.
For this reason, AVL trees are considered a bit obsolete by some.
To make the processing of m-way trees easier some type of order will
be imposed on the keys within each node, resulting in a multiway
search tree of order m (or an m-way search tree).
By definition an m-way search tree is a m-way tree in which:
M-way search trees give the same advantages to m-way trees that
binary search trees gave to binary trees - they provide fast
information retrieval and update. However, they also have the same
problems that binary search trees had - they can become
unbalanced, which means that the construction of the tree becomes
of vital importance.
B Trees
An extension of a multiway search tree of order m is a B-tree of order
m. This type of tree will be used when the data to be accessed/stored
is located on secondary storage devices because they allow for large
amounts of data to be stored in a node.
A B-tree of order m is a multiway search tree in which:
1. The root has at least two subtrees unless it is the only node in the
tree.
Searching a B-tree
An algorithm for finding a key in B-tree is simple. Start at the root and
determine which pointer to follow based on a comparison between
the search value and key fields in the root node.
Follow the appropriate pointer to a child node. Examine the key
fields in the child node and continue to follow the appropriate
pointers until the search value is found or a leaf node is reached that
does not contain the desired search value.
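The search procedure can be sketched as follows. The BTreeNode class is a hypothetical, simplified node layout (sorted keys plus one more child pointer than keys), and bisect_left from the standard library picks which pointer to follow.

```python
from bisect import bisect_left


class BTreeNode:
    """Hypothetical B-tree node: sorted keys and len(keys) + 1 children (none in a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

    @property
    def is_leaf(self):
        return not self.children


def btree_search(node, key):
    """Return the node containing key, or None if the key is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)  # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node
        if node.is_leaf:
            return None  # reached a leaf without finding the key
        node = node.children[i]  # follow the pointer to the matching key range
    return None


root = BTreeNode([10, 20], [
    BTreeNode([3, 7]),
    BTreeNode([12, 15, 18]),
    BTreeNode([25, 30]),
])
print(btree_search(root, 15).keys)  # [12, 15, 18]
print(btree_search(root, 20).keys)  # [10, 20]
print(btree_search(root, 8))        # None
```

Each step moves one level down, so the number of nodes examined is bounded by the height of the tree.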
The condition that all leaves must be on the same level forces a
characteristic behavior of B-trees, namely that B-trees are not
allowed to grow at their leaves; instead they are forced to grow at
the root.
When inserting into a B-tree, a value is inserted directly into a leaf.
This leads to three common situations that can occur:
1. A key is placed into a leaf that still has room.
2. The leaf in which a key is to be placed is full.
3. The root of the B-tree is full.
This is the easiest of the cases to solve because the value is simply
inserted into the correct sorted position in the leaf node.
Results in:
The 15 needs to be moved to the root node but it is full. This means
that the root needs to be divided:
The 15 is inserted into the parent, which means that it becomes the
new root node:
results in:
1b) If the leaf is less than half full after deleting the desired value
(known as underflow), two things could happen:
Deleting 7 from the tree above results in:
separator key from the parent to the leaf and moving the middle key
from the node and the sibling combined to the parent.
1b-2) If the number of keys in the sibling does not exceed the
minimum requirement, then the leaf and sibling are merged by
putting the keys from the leaf, the sibling, and the separator from
the parent into the leaf.
The sibling node is discarded and the keys in the parent are moved to
"fill the gap". It's possible that this will cause the parent to
underflow. If that is the case, treat the parent as a leaf and continue
repeating step 1b-2 until the minimum requirement is met or the root
of the tree is reached.
The values in the left sibling are combined with the separator key (18)
and the remaining values.
They are divided between the two nodes:
Hashing Function
A hashing function is a function that uses some algorithm to compute
the key K for every data element in the set U, such that K is of a
fixed size.
The same key K can be used to map data to a hash table, and all
operations such as insertion, deletion, and searching should be possible.
The values returned by a hash function are also referred to as hash
values, hash codes, hash sums, or hashes.
Hash Collision
A hash collision occurs when the hashes of two or more data elements
in the data set U map to the same location in the hash table.
In such a situation, two or more data elements would
qualify to be stored at (mapped to) the same location in the hash table.
1. Linear probing
2. Quadratic probing
3. Double hashing (in short, in case of a collision another hashing
function is used, with the key value as input, to identify where in
the open addressing scheme the data should actually be stored)
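As an illustration of the first scheme, the sketch below implements a small open-addressing table with linear probing. The class name LinearProbingTable and its methods are assumptions for this example, and deletion and resizing are deliberately left out. On a collision, the next slot is tried until a free one is found.

```python
class LinearProbingTable:
    """Fixed-size open-addressing hash table; a sketch without deletion or resizing."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds a (key, value) pair or None

    def _probe(self, key):
        """Yield slot indices starting at the hashed position, wrapping around once."""
        start = hash(key) % self.capacity
        for step in range(self.capacity):
            yield (start + step) % self.capacity

    def put(self, key, value):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                self.slots[index] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key, default=None):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return default  # an empty slot ends the probe sequence
            if slot[0] == key:
                return slot[1]
        return default


table = LinearProbingTable(capacity=8)
table.put(1, "one")
table.put(9, "nine")  # 9 % 8 == 1, so this collides with key 1 and moves to slot 2
print(table.get(1), table.get(9), table.get(17))  # one nine None
```

Quadratic probing and double hashing change only how _probe steps from one slot to the next.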
Applications of Hashing
Hashing is used in the common hash table, which uses a hash function
to index into the correct bucket in the hash table and then compares
each element in the bucket to find a match. In error checking,
hashes (checksums, message digests, etc.) are used to detect errors
caused by either hardware or software.
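The error-checking use can be illustrated with two functions from Python's standard library, zlib.crc32 (a checksum) and hashlib.sha256 (a message digest). The two byte strings are made-up sample data.

```python
import hashlib
import zlib

original = b"transfer 100 to account 42"
corrupted = b"transfer 900 to account 42"  # one byte changed in transit

# CRC-32 checksum: cheap, intended for detecting accidental errors
print(zlib.crc32(original) == zlib.crc32(corrupted))    # False

# SHA-256 message digest: fixed 256-bit output regardless of input size
digest = hashlib.sha256(original).hexdigest()
print(len(digest))                                      # 64
print(digest == hashlib.sha256(corrupted).hexdigest())  # False
```

The receiver recomputes the hash of the data it received and compares it with the hash that was sent; a mismatch signals that the data was altered.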