
INTRODUCTION TO

COMPUTING SCIENCE AND PROGRAMMING


Lecture 11.1
Searching and Sorting

CMPT 120, Spring 2023, Mohammad Tayebi


Class Agenda

• Last Time
  • NumPy and Pandas
  • MovieLens Dataset Exploration
• Today
  • Time Complexity
  • Big-O Notation
  • Search Algorithms
  • Sorting Algorithms
• Reading
  • Lecture Slides



Analysis of Algorithms

• Dilemma: you have two (or more) solutions to a problem. How do you choose the best one?
• Simple but unreliable approach: implement each algorithm in Python and measure how long each takes to complete.
  • An algorithm's runtime can differ across settings (hardware, input, system load).
• Difficult but reliable approach: assess performance in an abstract way.
  • Idea: analyze algorithm performance as the size of the input grows.
  • We need an approach that shows the rate of growth of a program's running time so that we can compare algorithms.
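The "simple but unreliable" approach can be sketched as follows. This is only an illustration: `sum_with_loop` and `sum_with_formula` are hypothetical names for two algorithms solving the same problem, and the measured times will vary from run to run and machine to machine, which is exactly why this approach is unreliable.

```python
import time

def sum_with_loop(n):
    # O(n): add the numbers 0..n-1 one at a time
    total = 0
    for i in range(n):
        total += i
    return total

def sum_with_formula(n):
    # O(1): use the closed-form formula n(n-1)/2
    return n * (n - 1) // 2

for n in (10_000, 100_000, 1_000_000):
    t0 = time.perf_counter()
    sum_with_loop(n)
    loop_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    sum_with_formula(n)
    formula_time = time.perf_counter() - t0

    print(n, loop_time, formula_time)  # timings differ on every run
```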



Complexity
• Complexity has a special meaning in computer science: the amount of
computational resources a program requires in order to run

• Main computational resources:
  • Time: how long the program takes to execute
    • Given two programs performing the same task, the one that executes faster is more efficient.
  • Space: how much computer memory the program consumes
    • Given two programs performing the same task, the one consuming less storage is more efficient.
• Often, there is a tradeoff between time and space
  • e.g. we can build a program that uses less memory if we don't mind that it needs more time to complete a task (and vice versa)
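A minimal sketch of the time/space tradeoff, assuming a hypothetical primality-testing task (not part of the lecture code): a precomputed lookup table answers queries instantly but occupies memory, while recomputing each answer saves memory at the cost of time.

```python
def is_prime(n):
    # Trial division: O(sqrt(n)) time, no extra memory
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

LIMIT = 10_000
# More space: store an answer for every n below LIMIT...
prime_table = [is_prime(n) for n in range(LIMIT)]

def is_prime_fast(n):
    # ...so each query is a single O(1) list access
    return prime_table[n]

def is_prime_slow(n):
    # No extra memory, but every query pays the O(sqrt(n)) cost again
    return is_prime(n)
```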



Time Complexity - 1

• Time complexity is usually more important than space complexity
  • Building programs that run fast
• The time complexity of a program is not computed directly from its execution time, as this measure can be affected by:
  • Size of the input
  • Speed of the computer's hardware
  • Other programs running at the same time
  • Operating system performance



Time Complexity - 2
• The time complexity of an algorithm is a function that gives the amount of time that the
algorithm takes to complete.
• Time complexity is measured based on the growth rate as the input size increases.
• Examples of the input size:
  • the number of names to sort
  • the number of numbers to add
  • the number of students to search
• The time complexity of an algorithm is also called its cost function and is denoted f(n), where
n is the size of the input.
  • Cost: the amount of time it takes the algorithm to complete
• Time complexity is non-decreasing
  • The amount of time needed by an algorithm cannot decrease as the size of the input increases.
  • e.g. finding a number in a larger list cannot take less time than finding a number in a smaller list.



Asymptotic Growth
• Let f(n) = 3n² + 5n be the cost function of an algorithm. The following table shows f(n) for different
values of n.

  Input size n | 3n²               | 5n        | 3n² + 5n
  1            | 3                 | 5         | 8
  2            | 12                | 10        | 22
  10           | 300               | 50        | 350
  100          | 30,000            | 500       | 30,500
  10,000       | 300,000,000       | 50,000    | 300,050,000
  1,000,000    | 3,000,000,000,000 | 5,000,000 | 3,000,005,000,000

• The time complexity of an algorithm is generally analyzed for large values of n.
  • Assume the time needed to run a basic operation is 1 microsecond = 0.000001 second.
  • For the example above, for values of n up to 100 the running time is well under 1 second, but for
n = 1 million the running time is about 35 days.
• For large values of n, the value of the cost function depends mainly on the largest term in
the function.
  • In other words, when analyzing algorithms, we only care about the term that grows the fastest.
  • In the example above, 3n² is the largest term.
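The table above can be reproduced with a short script; this is just an illustration of how the largest term comes to dominate the cost function, not part of the lecture code. As n grows, the ratio of 3n² to the full f(n) approaches 1.

```python
def f(n):
    # The cost function from the table: f(n) = 3n^2 + 5n
    return 3 * n**2 + 5 * n

for n in (1, 2, 10, 100, 10_000, 1_000_000):
    # Fraction of the total cost contributed by the largest term
    ratio = (3 * n**2) / f(n)
    print(f"n={n:>9,}  f(n)={f(n):>19,}  3n²/f(n)={ratio:.6f}")
```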
Big-O Notation
• Order of magnitude, or Big-O notation, is a mathematical notation for describing
the asymptotic growth of a time complexity function.
  • The letter O stands for "Order".
• With Big-O notation, instead of using the exact cost functions we use functions that
approximate the actual cost.
  • Example: if f(n) = n³ + 2n + 6, then f(n) is of order n³, or O(n³), as n³ dominates the
cost function for large values of n.
• Big-O notation expresses the relative performance of an algorithm, not its
absolute performance (being an approximation).
  • Algorithms with the same complexity, e.g. O(n³), may have different absolute execution times.
  • e.g. f(n) = n³ + 2n + 6, f(n) = n³, and f(n) = 2n³ + 5n² + 7n + 2 are all O(n³).



Determining Time Complexity
• Focus on the most expensive part of the algorithm and count the number
of times a basic operation is executed
  • Dominant parts of the algorithm
    • Generally, operations in loops and recursive calls
• Basic operations
  • An operation that takes constant time and is independent of the size of the input
  • e.g. assigning a value to a variable, comparing the values of two variables, and
adding/multiplying/subtracting two values
• Ignore constants
  • Operations that are independent of the input size
• Ignore lower-order terms
  • e.g. n² and n when there is n³ in the cost function



Example 1: Calculating Time Complexity
1 def some_function(n):
2     x = 1
3     y = 2
4     x = x + y
5     for i in range(n):
6         x = x + 3
7         y += 1

• Constant operations
  • Lines 2 - 4
• Basic operations depending on n
  • Addition (lines 6 and 7)
• Dominant part of the program
  • 2 * n (lines 6 and 7 are each repeated n times)
• Exact time complexity
  • f(n) = 2n + 3
• Approximate time complexity
  • O(n)
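The hand count for Example 1 can be verified by instrumenting the function with an operation counter; this is a sketch for checking the arithmetic, not part of the lecture code.

```python
def count_operations(n):
    # The function from Example 1, with a counter incremented
    # once per basic operation
    ops = 0
    x = 1; ops += 1        # line 2
    y = 2; ops += 1        # line 3
    x = x + y; ops += 1    # line 4
    for i in range(n):
        x = x + 3; ops += 1   # line 6, runs n times
        y += 1; ops += 1      # line 7, runs n times
    return ops

for n in (0, 10, 1000):
    assert count_operations(n) == 2 * n + 3
print("f(n) = 2n + 3 confirmed")
```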
Example 2: Calculating Time Complexity
1 def some_function(n):
2     x = 1
3     for i in range(n):
4         x = x + 1
5     for j in range(2 * n):
6         x = x + 3
7         x = x * 2

• Constant operations
  • Line 2
• Basic operations depending on n
  • Addition and multiplication (lines 4, 6 and 7)
• Dominant part of the program
  • n (line 4 is repeated n times)
  • 2 * 2 * n (lines 6 and 7 are each repeated 2n times)
• Exact time complexity
  • f(n) = 5n + 1
• Approximate time complexity
  • O(n)
Example 3: Calculating Time Complexity
1 def some_function(n):
2     x = 1
3     for i in range(n):
4         x = x + 1
5     for i in range(n):
6         for j in range(n):
7             x = x + 3

• Constant operations
  • Line 2
• Basic operations depending on n
  • Addition (lines 4 and 7)
• Dominant part of the program
  • n (line 4 is repeated n times)
  • n * n (line 7 is repeated n² times)
• Exact time complexity
  • f(n) = n² + n + 1
• Approximate time complexity
  • O(n²)
Example 4: Calculating Time Complexity - 1
• Not as simple as the previous examples: the while loop does not run n times! It needs deeper analysis.

1 def some_function(n):
2     x = 1
3     while n > 1:
4         x = x + 1
5         n = n / 2

• Instead of decreasing by 1 in each iteration, n is halved until it becomes 1.
• To find the complexity of the algorithm, we need to know the number of divisions until n becomes 1.
• Suppose n is a power of 2, e.g. 2¹⁰.
  • If not, n lies between two such numbers and the same approach still works.
• Let n = 2ᵏ. Then what does k mean? k is the number of times n is divided by 2 until it becomes 1.
  • If we know what k is, we know the complexity of the algorithm.
  • If n = 2ᵏ then k = log₂ n.
• Note that our complexity analysis should be based on the size of the input, meaning it should be
based on the variable n, not k.
  • e.g. if n = 2¹⁰, then k is equal to 10 and the while loop iterates 10 times; the algorithm's
complexity is log₂ n.
• All logarithm functions (whatever the base) are in the same order of complexity.
  • O(log₂ n) = O(log₅ n) = O(log₃₀ n) = O(log n)


Example 4: Calculating Time Complexity - 2
1 def some_function(n):
2     x = 1
3     while n > 1:
4         x = x + 1
5         n = n / 2

• Constant operations
  • Line 2
• Basic operations depending on n
  • Addition and division (lines 4 and 5)
• Dominant part of the program
  • log₂ n (lines 4 and 5 are each repeated log₂ n times, 2 log₂ n operations in total)
• Exact time complexity
  • f(n) = 2 log₂ n + 1
• Approximate time complexity
  • O(log n)
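The claim that the while loop from Example 4 runs log₂ n times can be checked directly by counting the halvings; a small sketch for verification, not part of the lecture code.

```python
import math

def count_halvings(n):
    # Mirror the loop from Example 4: halve n until it reaches 1,
    # counting iterations
    count = 0
    while n > 1:
        n = n / 2
        count += 1
    return count

# For n = 2^k the loop runs exactly k = log2(n) times
for k in (1, 5, 10, 20):
    n = 2 ** k
    assert count_halvings(n) == k == int(math.log2(n))
print("loop runs log2(n) times")
```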
Searching Algorithms

• Searching a collection of data for a particular value is one of the most
frequently used computer algorithms.
• Searching algorithms all accomplish the same goal: finding an
element that matches a given search key, if such a value exists.
• The major difference is the amount of effort (time complexity) each
algorithm requires to complete the search.
  • A proper way to describe this effort is Big-O notation.
  • For searching algorithms, this is particularly dependent on the number of data
elements.



Linear Search

• A linear search traverses the list until the desired element is found.
• Algorithm: Check the items of the list in order, until the key is found, or
the end of the list is reached.



Linear Search - Implementation

def linear_search(lst, x):
    for i in range(len(lst)):
        if lst[i] == x:
            return i
    return -1

The complexity of a linear search is linear, O(n), where n is the size of the given list.

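A quick usage check of linear search (the function is repeated here so the example is self-contained; the list contents are just illustrative).

```python
def linear_search(lst, x):
    # Check each item in order; return its index, or -1 if not found
    for i in range(len(lst)):
        if lst[i] == x:
            return i
    return -1

names = ["ana", "raj", "mei", "tom"]
print(linear_search(names, "mei"))   # 2: found at index 2
print(linear_search(names, "zoe"))   # -1: not in the list
```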


Binary Search
• The binary search algorithm is more efficient than the linear search algorithm, but
it requires that the list first be sorted.
• Binary search is only worthwhile when the list, once sorted, will be searched many
times.
• The algorithm is given a sorted list in ascending order:
• The first iteration of binary search tests the middle list item.
• If it matches the search key, the algorithm ends.
• If the search key is less than the middle item, the algorithm continues with only the first half.
• If the search key is greater than the middle item, the algorithm continues with only the second
half.
• Each iteration tests the middle value of the list’s remaining items. If the item does not match
the search key, the algorithm eliminates half of the remaining items.
• The algorithm ends either by finding an item that matches the search key or by reducing the
sublist to zero size.



Binary Search - Example
Search key: 41

3 5 12 14 41 56 71     middle element is 14, and 41 > 14 → keep the second half

41 56 71               middle element is 56, and 41 < 56 → keep the first half

41                     middle element matches the search key → found



Binary Search - Implementation
1.  def binary_search(lst, x):
2.      start = 0
3.      end = len(lst) - 1
4.      middle = 0
5.      while start <= end:
6.          middle = (end + start) // 2
7.          if x > lst[middle]:        # if x is greater, ignore the first half
8.              start = middle + 1
9.          elif x < lst[middle]:      # if x is smaller, ignore the second half
10.             end = middle - 1
11.         else:                      # x is present in the list as the middle element
12.             return middle
13.     return -1

1.  lst = [4, 7, 8, 12, 15]
2.  x = 7
3.  result = binary_search(lst, x)
4.  if result != -1:
5.      print("Search item is present at index", result)
6.  else:
7.      print("Search item is not present in the given list.")
Binary Search - Complexity

• Binary search complexity corresponds to the height of the tree that
represents the list of elements.
  • The branching factor b is equal to 2, the size of each division.
• It is the number of times we divide the input size n by 2 until we get to
a list of size 1.
• O(log n)

Image source: https://stackoverflow.com/questions/30116387/time-complexity-of-bst

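The log₂ n behaviour can be observed by instrumenting binary search with an iteration counter; a sketch for illustration (an instrumented variant of the implementation shown earlier, not the lecture's version).

```python
def binary_search_count(lst, x):
    # Binary search that also reports how many loop iterations it used
    start, end = 0, len(lst) - 1
    iterations = 0
    while start <= end:
        iterations += 1
        middle = (start + end) // 2
        if x > lst[middle]:
            start = middle + 1
        elif x < lst[middle]:
            end = middle - 1
        else:
            return middle, iterations
    return -1, iterations

lst = list(range(2 ** 20))                # 1,048,576 sorted elements
_, iters = binary_search_count(lst, 0)    # search for the smallest element
print(iters)  # roughly log2(n) ≈ 20 iterations, never a million
```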


Binary Search vs Linear Search
• Binary search is a tremendous performance
improvement over linear search.
  • For a list of 1,048,576 (2²⁰) elements, it takes
the binary search algorithm a maximum of 20
comparisons to find the search key, while a
linear search could take more than one
million comparisons.
  • For a list of one billion elements, it takes the
binary search algorithm a maximum of 30
comparisons to find the search key. For the
same task, the linear search algorithm needs
one billion comparisons in the worst case.

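The comparison counts on this slide follow directly from the logarithm; a one-liner check, for illustration only.

```python
import math

# Worst-case comparisons for binary search on n elements is about log2(n)
for n in (2 ** 20, 10 ** 9):
    print(n, math.ceil(math.log2(n)))  # 20 for ~1 million, 30 for 1 billion
```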


Sorting Algorithms
• Sorting data, which means placing the data into some particular order (such
as ascending or descending), is one of the most important computing tasks.
• All sorting algorithms accomplish the same task but have different time and space
complexity:
  • Insertion sort
  • Bubble sort
  • Selection sort
  • Merge sort
  • Quick sort
  • Heap sort
  • …
• To apply binary search, we need to sort the given list first.



Bubble Sort

• Bubble sort scans the list of numbers and swaps any adjacent numbers that are
not in sorted order.
• To sort the whole list, this scanning-and-swapping pass must be repeated up to n times.



Bubble Sort – Implementation

1.  def bubble_sort(lst):
2.      n = len(lst)
3.      for i in range(n):               # scan all elements of the list
4.          for j in range(n - i - 1):   # elements already in place at the end
                                         # of the list are not checked again
5.              if lst[j] > lst[j + 1]:
6.                  lst[j], lst[j + 1] = lst[j + 1], lst[j]   # swap elements that are
                                                              # not in the correct positions

Image source: https://www.geeksforgeeks.org/bubble-sort/

The time complexity of bubble sort is O(n²).
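A runnable version of the bubble sort above, with one common optimization that is not part of the slide's version: stop as soon as a full pass makes no swaps, which helps on nearly sorted input.

```python
def bubble_sort(lst):
    n = len(lst)
    for i in range(n):
        swapped = False
        for j in range(n - i - 1):
            if lst[j] > lst[j + 1]:
                # Swap adjacent elements that are out of order
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        if not swapped:
            # A pass with no swaps means the list is already sorted
            break

data = [5, 1, 4, 2, 8]
bubble_sort(data)
print(data)  # [1, 2, 4, 5, 8]
```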


Insertion Sort

• Insertion sort starts with the second element of the list:
  • Compare it to the first element of the list. If it is smaller, swap these two elements.
  • Now the first two elements of the list are in sorted order, and the rest of the list is
still unsorted.
• Then it moves on to the third element and slides any elements in
the sorted part of the list to the right until it finds the right place to insert
the current element.
• It repeats these steps until the list is completely sorted.



Insertion Sort – Implementation
1.  def insertion_sort(lst):
2.      for i in range(1, len(lst)):
3.          temp = lst[i]
4.          j = i
5.          while j > 0 and lst[j - 1] > temp:
6.              lst[j] = lst[j - 1]
7.              j = j - 1
8.          lst[j] = temp

The time complexity of insertion sort is O(n²).


Image source: https://www.geeksforgeeks.org/insertion-sort/



Selection Sort

• Selection sort repeatedly finds the smallest element in the unsorted
part of the list and builds up a sorted list.
• The selection sort algorithm maintains two sublists of a given list: a sublist
of already sorted elements and a sublist of unsorted elements.



Selection Sort - Implementation

1.  def selection_sort(lst):
2.      n = len(lst)
3.      for i in range(n):
4.          min_index = i
5.          for j in range(i + 1, n):          # lines 5-7: find the element with
6.              if lst[j] < lst[min_index]:    # the minimum value in each step
7.                  min_index = j
8.          lst[i], lst[min_index] = lst[min_index], lst[i]   # put the minimum element
                                                              # in the correct position

Image source: https://www.geeksforgeeks.org/selection-sort-vs-bubble-sort/

The time complexity of selection sort is O(n²).



Merge Sort

• Merge sort works by breaking the original list down into smaller lists,
sorting them, and merging the sorted lists until the whole list is sorted.
• Merge sort can be implemented in two ways:
  • Iterative (bottom-up)
  • Recursive (top-down)
    • We learn about this approach in the recursion lecture.



Iterative (bottom-up) approach



Iterative Merge Sort - Example

5 -3 | 60 1 | 18 30 | -10 6        unsorted list

-3 5 | 1 60 | 18 30 | -10 6        merge blocks of size 1

-3 1 5 60 | -10 6 18 30            merge blocks of size 2

-10 -3 1 5 6 18 30 60              merge blocks of size 4: sorted



Merge Sort – Implementation 1

1.  def merge_sort(lst):
2.      size = len(lst) - 1
3.      temp = lst.copy()
4.      k = 1
5.      while k <= size:                      # divide the list into blocks of size k,
                                              # k = 1, 2, 4, 8, 16, …
6.          for i in range(0, size, 2 * k):   # for k = 1, i = 0, 2, 4, 6, …
                                              # for k = 2, i = 0, 4, 8, 12, …
                                              # for k = 4, i = 0, 8, 16, …
7.              start = i
8.              middle = i + k - 1
9.              end = min(i + 2 * k - 1, size)
10.             merge(lst, temp, start, middle, end)
11.         k = 2 * k

1.  # iterative merge sort
2.  lst = [5, 7, 19, 13, -4, 2, 10, 1]
3.  print("Original list: ", lst)
4.  merge_sort(lst)
5.  print("Sorted list: ", lst)
Merge Sort – Implementation 2

1.  def merge(lst, temp, start, middle, end):
2.      m = start
3.      i = start
4.      j = middle + 1
5.      while i <= middle and j <= end:   # compare elements to the left and
6.          if lst[i] < lst[j]:           # right of the middle element
7.              temp[m] = lst[i]
8.              i = i + 1
9.          else:
10.             temp[m] = lst[j]
11.             j = j + 1
12.         m = m + 1
13.     while i <= middle:                # copy the remaining elements
14.         temp[m] = lst[i]
15.         m = m + 1
16.         i = i + 1
17.     for i in range(start, end + 1):   # copy the sorted part back to lst
18.         lst[i] = temp[i]

• The number of merge passes performed is log n.
• To complete each pass, n comparisons are needed.
• The time complexity of merge sort is O(n log n).
Concluding Remark

• Given a problem that can be solved with different algorithms, the algorithm
with the better time complexity is preferred.



Next Lecture

Searching and Sorting


Reading: Lecture Slides

