
INTRODUCTION TO

COMPUTING SCIENCE AND PROGRAMMING


Lecture 11.1
Searching and Sorting

CMPT 120, Spring 2023, Mohammad Tayebi


Class Agenda

• Last Time
  • NumPy and Pandas
  • MovieLens Dataset Exploration
• Today
  • Time Complexity
  • Big-O Notation
  • Search Algorithms
  • Sorting Algorithms
• Reading
  • Lecture Slides



Analysis of Algorithms

• Dilemma: you have two (or more) solutions to a problem. How do you choose the best one?
• Simple but unreliable approach: implement each algorithm in Python and measure how long each takes to complete.
  • An algorithm's runtime can differ across settings (hardware, input, system load).
• Difficult but reliable approach: assess performance in an abstract way.
  • Idea: analyze algorithm performance as the size of the input grows.
  • We need an approach that shows the rate of growth of a program's running time so that we can compare algorithms.
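The "simple but unreliable" approach can be sketched as follows. This is only an illustration: `sum_with_loop` and `sum_with_formula` are hypothetical names for two algorithms solving the same problem, and the measured times will vary from run to run and machine to machine, which is exactly why this approach is unreliable.

```python
import time

def sum_with_loop(n):
    # O(n): add the numbers 0..n-1 one at a time
    total = 0
    for i in range(n):
        total += i
    return total

def sum_with_formula(n):
    # O(1): use the closed-form formula n(n-1)/2
    return n * (n - 1) // 2

for n in (10_000, 100_000, 1_000_000):
    t0 = time.perf_counter()
    sum_with_loop(n)
    loop_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    sum_with_formula(n)
    formula_time = time.perf_counter() - t0

    print(n, loop_time, formula_time)  # timings differ on every run
```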



Complexity
• Complexity has a special meaning in computer science: the amount of
computational resources a program requires in order to run

• Main computational resources:
  • Time: how long the program takes to execute
    • Given two programs performing the same task, the one that executes faster is more efficient.
  • Space: how much computer memory the program consumes
    • Given two programs performing the same task, the one consuming less storage is more efficient.
• Often, there is a tradeoff between time and space
  • e.g. we can build a program that uses less memory if we don't mind that it needs more time to complete a task (and vice versa)
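A minimal sketch of the time/space tradeoff, assuming a hypothetical primality-testing task (not part of the lecture code): a precomputed lookup table answers queries instantly but occupies memory, while recomputing each answer saves memory at the cost of time.

```python
def is_prime(n):
    # Trial division: O(sqrt(n)) time, no extra memory
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

LIMIT = 10_000
# More space: store an answer for every n below LIMIT...
prime_table = [is_prime(n) for n in range(LIMIT)]

def is_prime_fast(n):
    # ...so each query is a single O(1) list access
    return prime_table[n]

def is_prime_slow(n):
    # No extra memory, but every query pays the O(sqrt(n)) cost again
    return is_prime(n)
```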



Time Complexity - 1

• Time complexity is usually more important than space complexity
  • Building programs that run fast
• The time complexity of a program is not computed directly from its execution time, as this measure can be affected by:
  • Size of the input
  • Speed of the computer's hardware
  • Other programs running at the same time
  • Operating system performance



Time Complexity - 2
• The time complexity of an algorithm is a function that gives the amount of time that the
algorithm takes to complete.
• Time complexity is measured based on the growth rate as the input size increases.
• Examples of the input size:
  • the number of names to sort
  • the number of numbers to add
  • the number of students to search
• The time complexity of an algorithm is also called its cost function and is denoted f(n), where
n is the size of the input.
  • Cost: the amount of time it takes the algorithm to complete
• Time complexity is non-decreasing
  • The amount of time needed by an algorithm cannot decrease as the size of the input increases.
  • e.g. finding a number in a larger list cannot take less time than finding a number in a smaller list.



Asymptotic Growth
• Let f(n) = 3n² + 5n be the cost function of an algorithm. The following table shows f(n) for different
values of n.

  Input size n | 3n²               | 5n        | 3n² + 5n
  1            | 3                 | 5         | 8
  2            | 12                | 10        | 22
  10           | 300               | 50        | 350
  100          | 30,000            | 500       | 30,500
  10,000       | 300,000,000       | 50,000    | 300,050,000
  1,000,000    | 3,000,000,000,000 | 5,000,000 | 3,000,005,000,000

• The time complexity of an algorithm is generally analyzed for large values of n.
  • Assume the time needed to run a basic operation is 1 microsecond = 0.000001 second.
  • For the example above, for values of n up to 100 the running time is well under 1 second, but for
n = 1 million the running time is about 35 days.
• For large values of n, the value of the cost function depends mainly on the largest term in
the function.
  • In other words, when analyzing algorithms, we only care about the term that grows the fastest.
  • In the example above, 3n² is the largest term.
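The table above can be reproduced with a short script; this is just an illustration of how the largest term comes to dominate the cost function, not part of the lecture code. As n grows, the ratio of 3n² to the full f(n) approaches 1.

```python
def f(n):
    # The cost function from the table: f(n) = 3n^2 + 5n
    return 3 * n**2 + 5 * n

for n in (1, 2, 10, 100, 10_000, 1_000_000):
    # Fraction of the total cost contributed by the largest term
    ratio = (3 * n**2) / f(n)
    print(f"n={n:>9,}  f(n)={f(n):>19,}  3n²/f(n)={ratio:.6f}")
```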
Big-O Notation
• Order of magnitude, or Big-O notation, is a mathematical notation for describing
the asymptotic growth of a time complexity function.
  • The letter O stands for "Order".
• With Big-O notation, instead of using the exact cost functions we use functions that
approximate the actual cost.
  • Example: if f(n) = n³ + 2n + 6, then f(n) is of order n³, or O(n³), as n³ dominates the
cost function for large values of n.
• Big-O notation expresses the relative performance of an algorithm, not its
absolute performance (being an approximation).
  • Algorithms with the same complexity, e.g. O(n³), may have different absolute execution times.
  • e.g. f(n) = n³ + 2n + 6, f(n) = n³, and f(n) = 2n³ + 5n² + 7n + 2 are all O(n³).



Determining Time Complexity
• Focus on the most expensive part of the algorithm and count the number
of times a basic operation is executed
  • Dominant parts of the algorithm
    • Generally, operations in loops and recursive calls
• Basic operations
  • An operation that takes constant time and is independent of the size of the input
  • e.g. assigning a value to a variable, comparing the values of two variables, and
adding/multiplying/subtracting two values
• Ignore constants
  • Operations that are independent of the input size
• Ignore lower-order terms
  • e.g. n² and n when there is n³ in the cost function



Example 1: Calculating Time Complexity
1 def some_function(n):
2     x = 1
3     y = 2
4     x = x + y
5     for i in range(n):
6         x = x + 3
7         y += 1

• Constant operations
  • Lines 2 - 4
• Basic operations depending on n
  • Addition (lines 6 and 7)
• Dominant part of the program
  • 2 * n (lines 6 and 7 are each repeated n times)
• Exact time complexity
  • f(n) = 2n + 3
• Approximate time complexity
  • O(n)
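The hand count for Example 1 can be verified by instrumenting the function with an operation counter; this is a sketch for checking the arithmetic, not part of the lecture code.

```python
def count_operations(n):
    # The function from Example 1, with a counter incremented
    # once per basic operation
    ops = 0
    x = 1; ops += 1        # line 2
    y = 2; ops += 1        # line 3
    x = x + y; ops += 1    # line 4
    for i in range(n):
        x = x + 3; ops += 1   # line 6, runs n times
        y += 1; ops += 1      # line 7, runs n times
    return ops

for n in (0, 10, 1000):
    assert count_operations(n) == 2 * n + 3
print("f(n) = 2n + 3 confirmed")
```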
Example 2: Calculating Time Complexity
1 def some_function(n):
2     x = 1
3     for i in range(n):
4         x = x + 1
5     for j in range(2 * n):
6         x = x + 3
7         x = x * 2

• Constant operations
  • Line 2
• Basic operations depending on n
  • Addition and multiplication (lines 4, 6 and 7)
• Dominant part of the program
  • n (line 4 is repeated n times)
  • 2 * 2 * n (lines 6 and 7 are each repeated 2n times)
• Exact time complexity
  • f(n) = 5n + 1
• Approximate time complexity
  • O(n)
Example 3: Calculating Time Complexity
1 def some_function(n):
2     x = 1
3     for i in range(n):
4         x = x + 1
5     for i in range(n):
6         for j in range(n):
7             x = x + 3

• Constant operations
  • Line 2
• Basic operations depending on n
  • Addition (lines 4 and 7)
• Dominant part of the program
  • n (line 4 is repeated n times)
  • n * n (line 7 is repeated n² times)
• Exact time complexity
  • f(n) = n² + n + 1
• Approximate time complexity
  • O(n²)
Example 4: Calculating Time Complexity - 1
• Not as simple as the previous examples: the while loop does not run n times! It needs deeper analysis.

1 def some_function(n):
2     x = 1
3     while n > 1:
4         x = x + 1
5         n = n / 2

• Instead of decreasing by 1 in each iteration, n is halved until it becomes 1.
• To find the complexity of the algorithm, we need to know the number of divisions until n becomes 1.
• Suppose n is a power of 2, e.g. 2¹⁰.
  • If not, n lies between two such numbers and the same approach still works.
• Let n = 2ᵏ. Then what does k mean? k is the number of times n is divided by 2 until it becomes 1.
  • If we know what k is, we know the complexity of the algorithm.
  • If n = 2ᵏ then k = log₂ n.
• Note that our complexity analysis should be based on the size of the input, meaning it should be
based on the variable n, not k.
  • e.g. if n = 2¹⁰, then k is equal to 10 and the while loop iterates 10 times; the algorithm's
complexity is log₂ n.
• All logarithm functions (whatever the base) are in the same order of complexity.
  • O(log₂ n) = O(log₅ n) = O(log₃₀ n) = O(log n)


Example 4: Calculating Time Complexity - 2
1 def some_function(n):
2     x = 1
3     while n > 1:
4         x = x + 1
5         n = n / 2

• Constant operations
  • Line 2
• Basic operations depending on n
  • Addition and division (lines 4 and 5)
• Dominant part of the program
  • log₂ n (lines 4 and 5 are each repeated log₂ n times, 2 log₂ n operations in total)
• Exact time complexity
  • f(n) = 2 log₂ n + 1
• Approximate time complexity
  • O(log n)
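The claim that the while loop from Example 4 runs log₂ n times can be checked directly by counting the halvings; a small sketch for verification, not part of the lecture code.

```python
import math

def count_halvings(n):
    # Mirror the loop from Example 4: halve n until it reaches 1,
    # counting iterations
    count = 0
    while n > 1:
        n = n / 2
        count += 1
    return count

# For n = 2^k the loop runs exactly k = log2(n) times
for k in (1, 5, 10, 20):
    n = 2 ** k
    assert count_halvings(n) == k == int(math.log2(n))
print("loop runs log2(n) times")
```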
Searching Algorithms

• Searching a collection of data for a particular value is one of the most
frequently used computer algorithms.
• Searching algorithms all accomplish the same goal: finding an
element that matches a given search key, if such a value exists.
• The major difference is the amount of effort (time complexity) each
algorithm requires to complete the search.
  • A proper way to describe this effort is Big-O notation.
  • For searching algorithms, this is particularly dependent on the number of data
elements.



Linear Search

• A linear search traverses the list until the desired element is found.
• Algorithm: Check the items of the list in order, until the key is found, or
the end of the list is reached.



Linear Search - Implementation

def linear_search(lst, x):
    for i in range(len(lst)):
        if lst[i] == x:
            return i
    return -1

The complexity of a linear search is linear, O(n), where n is the size of the given list.

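A quick usage check of linear search (the function is repeated here so the example is self-contained; the list contents are just illustrative).

```python
def linear_search(lst, x):
    # Check each item in order; return its index, or -1 if not found
    for i in range(len(lst)):
        if lst[i] == x:
            return i
    return -1

names = ["ana", "raj", "mei", "tom"]
print(linear_search(names, "mei"))   # 2: found at index 2
print(linear_search(names, "zoe"))   # -1: not in the list
```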


Binary Search
• The binary search algorithm is more efficient than the linear search algorithm, but
it requires that the list first be sorted.
• Binary search is only worthwhile when the list, once sorted, will be searched many
times.
• The algorithm is given a sorted list in ascending order:
• The first iteration of binary search tests the middle list item.
• If it matches the search key, the algorithm ends.
• If the search key is less than the middle item, the algorithm continues with only the first half.
• If the search key is greater than the middle item, the algorithm continues with only the second
half.
• Each iteration tests the middle value of the list’s remaining items. If the item does not match
the search key, the algorithm eliminates half of the remaining items.
• The algorithm ends either by finding an item that matches the search key or by reducing the
sublist to zero size.



Binary Search - Example
Search key: 41

3 5 12 14 41 56 71     middle element is 14, and 41 > 14 → keep the second half

41 56 71               middle element is 56, and 41 < 56 → keep the first half

41                     middle element matches the search key → found



Binary Search - Implementation
1.  def binary_search(lst, x):
2.      start = 0
3.      end = len(lst) - 1
4.      middle = 0
5.      while start <= end:
6.          middle = (end + start) // 2
7.          if x > lst[middle]:        # if x is greater, ignore the first half
8.              start = middle + 1
9.          elif x < lst[middle]:      # if x is smaller, ignore the second half
10.             end = middle - 1
11.         else:                      # x is present in the list as the middle element
12.             return middle
13.     return -1

1.  lst = [4, 7, 8, 12, 15]
2.  x = 7
3.  result = binary_search(lst, x)
4.  if result != -1:
5.      print("Search item is present at index", result)
6.  else:
7.      print("Search item is not present in the given list.")
Binary Search - Complexity

• Binary search complexity corresponds to the height of the tree that
represents the list of elements.
  • The branching factor b is equal to 2, the size of each division.
• It is the number of times we divide the input size n by 2 until we get to
a list of size 1.
• O(log n)

Image source: https://stackoverflow.com/questions/30116387/time-complexity-of-bst

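The log₂ n behaviour can be observed by instrumenting binary search with an iteration counter; a sketch for illustration (an instrumented variant of the implementation shown earlier, not the lecture's version).

```python
def binary_search_count(lst, x):
    # Binary search that also reports how many loop iterations it used
    start, end = 0, len(lst) - 1
    iterations = 0
    while start <= end:
        iterations += 1
        middle = (start + end) // 2
        if x > lst[middle]:
            start = middle + 1
        elif x < lst[middle]:
            end = middle - 1
        else:
            return middle, iterations
    return -1, iterations

lst = list(range(2 ** 20))                # 1,048,576 sorted elements
_, iters = binary_search_count(lst, 0)    # search for the smallest element
print(iters)  # roughly log2(n) ≈ 20 iterations, never a million
```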


Binary Search vs Linear Search
• Binary search is a tremendous performance
improvement over linear search.
  • For a list of 1,048,576 (2²⁰) elements, it takes
the binary search algorithm a maximum of 20
comparisons to find the search key, while a
linear search could take more than one
million comparisons.
  • For a list of one billion elements, it takes the
binary search algorithm a maximum of 30
comparisons to find the search key. For the
same task, the linear search algorithm needs
one billion comparisons in the worst case.

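The comparison counts on this slide follow directly from the logarithm; a one-liner check, for illustration only.

```python
import math

# Worst-case comparisons for binary search on n elements is about log2(n)
for n in (2 ** 20, 10 ** 9):
    print(n, math.ceil(math.log2(n)))  # 20 for ~1 million, 30 for 1 billion
```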


Sorting Algorithms
• Sorting data, which means placing the data into some particular order (such
as ascending or descending), is one of the most important computing tasks.
• All sorting algorithms accomplish the same task but have different time and space
complexity:
  • Insertion sort
  • Bubble sort
  • Selection sort
  • Merge sort
  • Quick sort
  • Heap sort
  • …
• To apply binary search, we need to sort the given list first.



Bubble Sort

• Bubble sort scans the list of numbers and swaps any adjacent numbers that are
not in sorted order.
• To sort the whole list, this scanning-and-swapping pass must be repeated up to n times.



Bubble Sort – Implementation

1.  def bubble_sort(lst):
2.      n = len(lst)
3.      for i in range(n):               # scan all elements of the list
4.          for j in range(n - i - 1):   # elements already in place at the end
                                         # of the list are not checked again
5.              if lst[j] > lst[j + 1]:
6.                  lst[j], lst[j + 1] = lst[j + 1], lst[j]   # swap elements that are
                                                              # not in the correct positions

Image source: https://www.geeksforgeeks.org/bubble-sort/

The time complexity of bubble sort is O(n²).
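A runnable version of the bubble sort above, with one common optimization that is not part of the slide's version: stop as soon as a full pass makes no swaps, which helps on nearly sorted input.

```python
def bubble_sort(lst):
    n = len(lst)
    for i in range(n):
        swapped = False
        for j in range(n - i - 1):
            if lst[j] > lst[j + 1]:
                # Swap adjacent elements that are out of order
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        if not swapped:
            # A pass with no swaps means the list is already sorted
            break

data = [5, 1, 4, 2, 8]
bubble_sort(data)
print(data)  # [1, 2, 4, 5, 8]
```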


Insertion Sort

• Insertion sort starts with the second element of the list:
  • Compare it to the first element of the list. If it is smaller, swap these two elements.
  • Now the first two elements of the list are in sorted order, and the rest of the list is
still unsorted.
• Then it moves on to the third element and slides any elements in
the sorted part of the list to the right until it finds the right place to insert
the current element.
• It repeats these steps until the list is completely sorted.



Insertion Sort – Implementation
1.  def insertion_sort(lst):
2.      for i in range(1, len(lst)):
3.          temp = lst[i]
4.          j = i
5.          while j > 0 and lst[j - 1] > temp:
6.              lst[j] = lst[j - 1]
7.              j = j - 1
8.          lst[j] = temp

The time complexity of insertion sort is O(n²).


Image source: https://www.geeksforgeeks.org/insertion-sort/



Selection Sort

• Selection sort repeatedly finds the smallest element in the unsorted
part of the list and builds up a sorted list.
• The selection sort algorithm maintains two sublists of a given list: a sublist
of already sorted elements and a sublist of unsorted elements.



Selection Sort - Implementation

1.  def selection_sort(lst):
2.      n = len(lst)
3.      for i in range(n):
4.          min_index = i
5.          for j in range(i + 1, n):          # lines 5-7: find the element with
6.              if lst[j] < lst[min_index]:    # the minimum value in each step
7.                  min_index = j
8.          lst[i], lst[min_index] = lst[min_index], lst[i]   # put the minimum element
                                                              # in the correct position

Image source: https://www.geeksforgeeks.org/selection-sort-vs-bubble-sort/

The time complexity of selection sort is O(n²).



Merge Sort

• Merge sort works by breaking the original list down into smaller lists,
sorting them, and merging the sorted lists until the whole list is sorted.
• Merge sort can be implemented in two ways:
  • Iterative (bottom-up)
  • Recursive (top-down)
    • We learn about this approach in the recursion lecture.



Iterative (bottom-up) approach



Iterative Merge Sort - Example

5 -3 | 60 1 | 18 30 | -10 6        unsorted list

-3 5 | 1 60 | 18 30 | -10 6        merge blocks of size 1

-3 1 5 60 | -10 6 18 30            merge blocks of size 2

-10 -3 1 5 6 18 30 60              merge blocks of size 4: sorted



Merge Sort – Implementation 1

1.  def merge_sort(lst):
2.      size = len(lst) - 1
3.      temp = lst.copy()
4.      k = 1
5.      while k <= size:                      # divide the list into blocks of size k,
                                              # k = 1, 2, 4, 8, 16, …
6.          for i in range(0, size, 2 * k):   # for k = 1, i = 0, 2, 4, 6, …
                                              # for k = 2, i = 0, 4, 8, 12, …
                                              # for k = 4, i = 0, 8, 16, …
7.              start = i
8.              middle = i + k - 1
9.              end = min(i + 2 * k - 1, size)
10.             merge(lst, temp, start, middle, end)
11.         k = 2 * k

1.  # iterative merge sort
2.  lst = [5, 7, 19, 13, -4, 2, 10, 1]
3.  print("Original list: ", lst)
4.  merge_sort(lst)
5.  print("Sorted list: ", lst)
Merge Sort – Implementation 2

1.  def merge(lst, temp, start, middle, end):
2.      m = start
3.      i = start
4.      j = middle + 1
5.      while i <= middle and j <= end:   # compare elements to the left and
6.          if lst[i] < lst[j]:           # right of the middle element
7.              temp[m] = lst[i]
8.              i = i + 1
9.          else:
10.             temp[m] = lst[j]
11.             j = j + 1
12.         m = m + 1
13.     while i <= middle:                # copy the remaining elements
14.         temp[m] = lst[i]
15.         m = m + 1
16.         i = i + 1
17.     for i in range(start, end + 1):   # copy the sorted part back to lst
18.         lst[i] = temp[i]

• The number of merge passes performed is log n.
• To complete each pass, n comparisons are needed.
• The time complexity of merge sort is O(n log n).
Concluding Remark

• Given a problem that can be solved with different algorithms, the algorithm
with the better time complexity is preferred.



Next Lecture

Searching and Sorting


Reading: Lecture Slides

