
Simple sorting algorithms

Krzysztof Simiński
Algorithms and Data Structures [Macro]
lecture 03, 13th March 2020

1 Problem statement
input: a sequence of n numbers ⟨a1, a2, …, an⟩ and an operator ≤
output: a permutation ⟨ai1, ai2, …, ain⟩ such that ai1 ≤ ai2 ≤ … ≤ ain

1.1 Stability
Definition 1. A sorting algorithm is stable if the relative order of equal keys in the input
sequence is preserved in the output sequence.
Example 1. Let's compare the outputs of a stable and an unstable sorting algorithm. The
two equal keys are marked with subscripts to tell them apart.
input data:

6 5a 1 9 5b 8 4 2

A stable algorithm does not change the order of the equal keys:

1 2 4 5a 5b 6 8 9

An unstable algorithm may change the order of the equal keys:

1 2 4 5b 5a 6 8 9

Stability of a sorting algorithm is a very important feature when sorting by two
(or more) keys.

Example 2. Let's sort students by surnames. If two (or more) students have the same
surname, they are sorted by first name.
This task can be easily solved with two sorts: first by first names and then by sur-
names. If we use a stable sorting algorithm, we get:

unsorted data sorted by first names sorted by surnames
Irene Yellow Ann Red Helen Blue
Hugh Magenta Calliope Brown Calliope Brown
Chris Red Chris Red Hugh Magenta
Doris Pink Doris Pink John Magenta
Calliope Brown Helen Blue Doris Pink
Helen Blue Hugh Magenta Ann Red
John Magenta Irene Yellow Chris Red
Ann Red John Magenta Irene Yellow

The first sort (by first names) sets Ann Red before Chris Red. The second stable sort
by surnames does not change the relative order of items with the same key (here: Red).
Thus Ann Red is still before Chris Red. The same is valid for a pair: Hugh Magenta and
John Magenta.
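The same two-pass scheme can be written directly with a stable library sort. A minimal C++ sketch (the Student record and the field names are only an illustration, not part of the lecture):

#include <algorithm>
#include <string>
#include <vector>

struct Student { std::string first_name, surname; };

// Two-pass sort with a stable algorithm: first by first names, then by surnames.
// Because std::stable_sort preserves the order of equal keys, students sharing
// a surname stay ordered by first name after the second pass.
void sort_students(std::vector<Student>& s) {
    std::stable_sort(s.begin(), s.end(),
        [](const Student& a, const Student& b) { return a.first_name < b.first_name; });
    std::stable_sort(s.begin(), s.end(),
        [](const Student& a, const Student& b) { return a.surname < b.surname; });
}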

1.2 Natural behaviour


Definition 2. A sorting algorithm has natural behaviour if its complexity is the lowest
for an already sorted sequence and the highest for a sequence sorted in reverse order.

2 Simple sorting algorithms


All algorithms discussed in this section are in-place sorting algorithms.

Definition 3. An in-place (in situ) sorting algorithm requires at most O(1) extra space to
run.
Comparison is the dominant operation in all algorithms discussed in this lecture.

2.1 Permutation sort


This is a brute-force algorithm. It tests permutations until a sorted one is found.
1 procedure permutation_sort;
2    while array A is not sorted do
3       A ← generate next permutation(A);
4    end while;
5 end procedure;

An n-element array has n! possible permutations. In each iteration we have n − 1
comparisons, so the time complexity is T(n) ∈ O(n!). This is why this algorithm is not
used in practice.
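For illustration, the brute-force idea could be sketched in C++ with the standard library; std::next_permutation plays the role of "generate next permutation" (when it passes the lexicographically last permutation it wraps around to the sorted one, so the loop terminates):

#include <algorithm>
#include <vector>

// Brute-force permutation sort: keep generating permutations
// until the sequence happens to be sorted.
void permutation_sort(std::vector<int>& a) {
    while (!std::is_sorted(a.begin(), a.end())) {
        std::next_permutation(a.begin(), a.end());
    }
}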

2.2 Bubble sort


Bubble sort iterates through an array and compares all adjacent pairs of values. If
a pair is out of order, it is swapped.
Example 3. An array to sort:

1 procedure bubble_sort;
2    for i ← 1 to n do
3       for j ← 2 to n do
4          if A[j − 1] > A[j] then
5             swap(A[j − 1], A[j]);
6          end if;
7       end for;
8    end for;
9 end procedure;

Figure 1: Bubble sort

6 3 1 9 5 8 4 2

i=1

3 6 1 9 5 8 4 2

3 1 6 9 5 8 4 2

3 1 6 9 5 8 4 2

3 1 6 5 9 8 4 2

3 1 6 5 8 9 4 2

3 1 6 5 8 4 9 2

3 1 6 5 8 4 2 9

i=2

1 3 6 5 8 4 2 9

1 3 6 5 8 4 2 9

1 3 5 6 8 4 2 9

1 3 5 6 8 4 2 9

1 3 5 6 4 8 2 9

1 3 5 6 4 2 8 9

1 3 5 6 4 2 8 9

If we follow this pattern, we finally sort an array.

2.2.1 Complexity analysis


The for loop in lines 2-8 is run n times. In each iteration the internal loop (lines
3-7) is run n − 1 times. Thus T(n) = n(n − 1) = n² − n ∈ O(n²).
Example 3 shows that the algorithm can be easily modified. In each iteration the
local maximal value is moved to its final position (in the 1st iteration: the nth value,
in the 2nd iteration: the (n − 1)th value, in the ith iteration: the (n − i + 1)th value).
Modifying the condition in line 3 reduces the number of comparisons. How should it
be modified? What is the complexity of the modified algorithm?

2.2.2 Natural behaviour


The complexity is independent of the nature of the data to sort. The algorithm does
not have natural behaviour.

2.2.3 Stability
In the pseudocode in Fig. 1 (line 4) we use a strict inequality (>). If two equal keys
are compared, the condition is false, the keys are not swapped, and the algorithm
is stable. But if we used a weak inequality (≥), equal keys would be swapped and
the algorithm would not be stable.
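For reference, a possible C++ rendering of the pseudocode from Fig. 1 (a sketch using a 0-indexed std::vector); the strict comparison keeps equal keys in place, so this version is stable:

#include <utility>
#include <vector>

// Bubble sort following Fig. 1: n passes, each comparing all adjacent pairs.
void bubble_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 1; j < n; ++j) {
            if (a[j - 1] > a[j]) {          // strict comparison: equal keys are not swapped
                std::swap(a[j - 1], a[j]);
            }
        }
    }
}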

2.3 Selection sort


Selection sort is a simple sorting algorithm. It finds the minimum in the unsorted
subarray and swaps it with the first item of this subarray.
Example 4. An array to sort:

6 3 1 9 5 8 4 2

For i = 1 we search for the minimum in the whole array and swap it with the first item of
the array.

6 3 1 9 5 8 4 2

sorted to sort

1 3 6 9 5 8 4 2

1  procedure selection_sort;
2     for i ← 1 to n − 1 do
3        index_min ← i;
4        value_min ← A[i];
5        for j ← i + 1 to n do
6           if A[j] < value_min then
7              index_min ← j;
8              value_min ← A[j];
9           end if;
10       end for;
11       swap(A[i], A[index_min]);
12    end for;
13 end procedure;

Figure 2: Selection sort

The first cell of an array holds the minimal value or (in other words) the minimal value
is in its final location.
In the second iteration we search for a local minimum in a subarray with indices
2, 3, . . . , n and swap it with the second cell.

sorted to sort

1 3 6 9 5 8 4 2

sorted to sort

1 2 6 9 5 8 4 3

We follow this pattern:

sorted to sort

1 2 6 9 5 8 4 3

sorted to sort

1 2 3 9 5 8 4 6

sorted to sort

1 2 3 4 5 8 9 6

sorted to sort

1 2 3 4 5 8 9 6

sorted to sort

1 2 3 4 5 6 9 8

1 2 3 4 5 6 8 9

2.3.1 Complexity analysis


Comparisons (line 6) are executed in the internal loop. The number of iterations
in the internal loop is not constant in all iterations of the external loop:

external loop number of comparisons


1 n−1
2 n−2
3 n−3
.. ..
. .
n−2 2
n−1 1
Comparison is the dominant operation, thus the time complexity is:

T(n) = 1 + 2 + … + (n − 2) + (n − 1) = Σ_{i=1}^{n−1} i.    (1)

To evaluate the sum let's use the counterflow method. The sum is written in columns
twice: once normally and once in reverse (counterflow). The sum of the items in each
row is n and the number of rows is n − 1.

n−1 + 1   = n
n−2 + 2   = n
n−3 + 3   = n
 …    …     …
2   + n−2 = n
1   + n−1 = n

Σ_{i=1}^{n−1} i + Σ_{i=1}^{n−1} i = 2·Σ_{i=1}^{n−1} i = n(n − 1)

Σ_{i=1}^{n−1} i = n(n − 1)/2    (2)

The time complexity of the selection sort algorithm is T(n) = n(n − 1)/2 ∈ O(n²).

2.3.2 Natural behaviour


Time complexity does not depend on the data. The algorithm does not have natural
behaviour.

2.3.3 Stability
Let's analyse a certain situation. We search for a local minimum in a subarray.

sorted to sort

5l 5r 4

Having found the minimum we swap it with the first item of the subarray. Unfortunately
the first item holds a repeating key. After the swap the order of the repeating keys is reversed.

sorted to sort

4 5r 5l

The selection sort is not a stable algorithm.
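A possible C++ rendering of the pseudocode from Fig. 2 (a sketch using a 0-indexed vector):

#include <utility>
#include <vector>

// Selection sort following Fig. 2. As discussed above, the final swap
// may jump an equal key over another one, so the algorithm is not stable.
void selection_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::size_t index_min = i;                // position of the minimum of a[i..n-1]
        for (std::size_t j = i + 1; j < n; ++j) {
            if (a[j] < a[index_min]) {
                index_min = j;
            }
        }
        std::swap(a[i], a[index_min]);            // move the minimum to its final cell
    }
}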

2.4 Insertion sort


Insertion sort splits an array into a sorted part and a part to sort. In each iteration
one item is taken from the part to sort and placed in its correct position in the sorted
part. Insertion between some values needs shifting items to the right.

Example 5. An array to sort:

6 3 1 9 5 8 4 2

The first cell is a sorted part of the array.

sorted to sort

6 3 1 9 5 8 4 2

We start with i = 2 and try to insert the ith item into a correct position in the sorted
part.

6 3 1 9 5 8 4 2

sorted to sort

3 6 1 9 5 8 4 2

Again we try to insert the first item from the unsorted part of the array.

sorted to sort

3 6 1 9 5 8 4 2

And we follow the pattern.

sorted to sort

1 3 6 9 5 8 4 2

sorted to sort

1 3 6 9 5 8 4 2

sorted to sort

1 3 6 9 5 8 4 2

sorted to sort

1 3 5 6 9 8 4 2

sorted to sort

1 3 5 6 8 9 4 2

1  procedure insertion_sort;
2     for i ← 2 to n do
3        value_min ← A[i];
4        j ← i − 1;
5        while j > 0 cand value_min < A[j] do
6           A[j + 1] ← A[j];
7           j ← j − 1;
8        end while;
9        A[j + 1] ← value_min;
10    end for;
11 end procedure;

Figure 3: Insertion sort

sorted to sort

1 3 4 5 6 8 9 2

1 2 3 4 5 6 8 9

2.4.1 Complexity analysis


Comparison (line 5) is simultaneously the condition of the inner loop (lines 5-8). The
comparison is tested at least once in each iteration of the external loop.

Optimistic case In such a case the condition (= comparison) is tested only once in
each iteration of the external loop. It means an array is already sorted. The external
loop is run n − 1 times. Finally Tbest (n) = n − 1 ∈ O(n).

Pessimistic case In the pessimistic case the inner loop is run maximally. An item
is inserted always into the first cell. It means an array is sorted in the reversed order.

external loop i     number of comparisons

i = 2               1
i = 3               2
i = 4               3
 …                  …
i = n − 1           n − 2
i = n               n − 1

Tworst(n) = Σ_{i=1}^{n−1} i = n(n − 1)/2 ∈ O(n²).    (3)

Average case For an average case we have to assume some model of data. Let’s
assume each permutation of keys in an input array has the same probability.
Let’s discuss the insertion of the ith item into a sorted part of the array. It may be
inserted in i positions (before 1st, 2nd, . . . , before (i − 1)th, or is left unmoved).
If an item is not moved, only one comparison is needed. If an item is to be moved
by one cell, two comparisons are needed, and so on. If an item is to be moved into the
second cell, i − 1 comparisons are needed. If an item is to be moved into the first cell,
also i − 1 comparisons are needed. The number of comparisons for the 1st and 2nd
cells is exactly the same, because we distinguish the 1st and 2nd cells with the same
comparison: the last comparison is true for the 1st cell and false for the 2nd cell.
We assume each final location of the ith item has the same probability p = 1/i.
The expected number E of comparisons for the ith item is:
E(i) = (1/i)·1 + (1/i)·2 + (1/i)·3 + … + (1/i)·(i − 1) + (1/i)·(i − 1)    (4)
     = (1/i)·(1 + 2 + 3 + … + (i − 1)) + (1/i)·(i − 1)    (5)
     = (1/i)·(i(i − 1)/2) + (i − 1)/i    (6)
     = (i − 1)/2 + 1 − 1/i = (i + 1)/2 − 1/i    (7)

Now we only have to sum up the expected numbers of comparisons for all i in the ex-
ternal loop:

Tavg(n) = Σ_{i=2}^{n} ((i + 1)/2 − 1/i) = Σ_{i=2}^{n} (i + 1)/2 − Σ_{i=2}^{n} 1/i    (8)
        = (1/2)·Σ_{i=2}^{n} (i + 1) − (−1 + Σ_{i=1}^{n} 1/i)    (9)
        = (1/2)·Σ_{i=2}^{n} i + (1/2)·Σ_{i=2}^{n} 1 − (Hn − 1)    (10)
        = (1/4)·(n − 1)(n + 2) + (n − 1)/2 − (Hn − 1)    (11)
        = (1/4)·(n² + 3n − 4) − (Hn − 1)    (12)
        = (1/4)·(n² + 3n) − Hn    (13)
        ≈ (1/4)·(n² + 3n) − ln n − γ ∈ O(n²)    (14)

where Hn = 1 + 1/2 + … + 1/n is the nth harmonic number, Hn ≈ ln n + γ for large n,
and γ ≈ 0.577 is the Euler constant.

2.4.2 Natural behaviour


The algorithm has the best complexity for sorted arrays and the worst for arrays
sorted in reverse order. Thus it has natural behaviour.

2.4.3 Stability
For comparison (line 5) we use a strict operator (<). The algorithm does not swap
items with the same key. This is a stable algorithm.
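A possible C++ rendering of the pseudocode from Fig. 3 (a sketch using a 0-indexed vector):

#include <vector>

// Insertion sort following Fig. 3. The strict comparison value < a[j - 1]
// never moves an equal key past another, so the algorithm is stable.
void insertion_sort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int value = a[i];                       // item to insert into the sorted part
        std::size_t j = i;
        while (j > 0 && value < a[j - 1]) {     // short-circuit "cand" from the pseudocode
            a[j] = a[j - 1];                    // shift larger items to the right
            --j;
        }
        a[j] = value;
    }
}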

Fast sorting algorithms
Krzysztof Simiński
Algorithms and data structures
lecture 03, 20th March 2020

Simple sorting algorithms compare items in an array and move them into final
positions. Unfortunately these are short distance shifts. Sometimes these algorithms
are called turtle sorting algorithms. This brings an idea to make turtles into rabbits that
jump at long distances in each iteration.

1 Shellsort
Shellsort algorithm is based on simple sorting algorithms. In this approach data in
an array are split into subarrays with gap h. A subarray holds items with indices i, i +
h, i+2h, . . . , i+kh. Subarrays are interleaved. Each subarray is sorted independently
from other subarrays in an array. Sorting of all h-subarrays is called h-sorting.
Example 1. Use Shellsort to sort an array. Use the following sequence of gaps: (. . . , 32,
16, 8, 4, 2, 1).

6 15 12 9 5 8 4 2 3 1

Initial gap is the maximal value from the provided sequence of gaps that does not
exceed the half of length of an array. In our example it is h = 4. Every hth element
belongs to the same subarray. The subarrays are interleaved. The initial value of h makes
each subarray hold at least two elements. We have four subarrays: blue, red, green, and
black.

6 15 12 9 5 8 4 2 3 1

Each series is sorted independently. We can see that some elements (eg. 6, 3, 1, 15) have
been moved to distant positions (rabbits, not turtles).

3 1 4 2 5 8 12 9 6 15

The array is not sorted yet, but we can notice that elements are closer to their final po-
sitions than before sorting. The left part of the array is dominated by small values, the
right part – by large values.
Then we take the next h value. In our example it is h = 2. We create new subarrays
and sort them independently.

3 1 4 2 5 8 12 9 6 15

Please notice that new subarrays are almost sorted. In the green subarray two elements
need swapping. The red subarray is sorted.

3 1 4 2 5 8 6 9 12 15

We take the next value h = 1. All elements are in the same subarray and we just sort the
array.

3 1 4 2 5 8 6 9 12 15

Final sorted array:

1 2 3 4 5 6 8 9 12 15

We have to ask a very important question: is there any sense in shellsort? First
we run h-sorts and finally we always sort with h = 1. But a 1-sort is just an ordinary
sort, and it would be enough to run the 1-sort alone to get a sorted array, without all
the previous h-sorts. Thus, why do we sort subarrays if the last h-sort is always a plain
sort?
To answer this question we have to recall the natural behaviour of sorting algorithms.
To make shellsort useful we have to use a natural sorting algorithm. Such an algorithm
has lower complexity and execution time for (almost) sorted arrays. In shellsort each
h-sorting makes the array almost sorted (not sorted, but almost sorted) and only a few
more swaps are needed. In such a case a natural sorting algorithm runs faster than for
a random array. This is why in shellsort we use insertion sort and do not use selection
sort.
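A possible C++ sketch of shellsort built on gap-h insertion sort, using the simplest halving gap sequence (n/2, n/4, …, 1) listed in Sec. 1.1; the single loop over i from h to n − 1 sorts all h interleaved subarrays at once:

#include <vector>

// Shellsort: for each gap h run an insertion sort with step h.
void shellsort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t h = n / 2; h > 0; h /= 2) {       // gaps: n/2, n/4, ..., 1
        for (std::size_t i = h; i < n; ++i) {          // gap-h insertion sort
            int value = a[i];
            std::size_t j = i;
            while (j >= h && value < a[j - h]) {
                a[j] = a[j - h];                        // shift within the h-subarray
                j -= h;
            }
            a[j] = value;
        }
    }
}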

1 procedure shellsort;
2    h ← initial gap;
3    while h > 0 do
4       for i ← 0 to h − 1 do
5          sort subarray ⟨A[i], A[i + h], A[i + 2h], …, A[i + ki·h]⟩;
6       end for;
7       h ← next value(h);
8    end while;
9 end procedure;

Figure 1: Shellsort

Problem 1. Is it possible to find such an initial permutation that after the first h-sorting
elements are more distant from their final positions than before sorting?

1.1 Computational complexity


Computational complexity of shellsort depends heavily on a gap sequence:
• …, n/2, n/4, n/8, …, 1: pessimistic complexity Tpes(n) ∈ O(n²)

• …, 364, 121, 40, 13, 4, 1: Tpes(n) ∈ O(n^(3/2))

• …, 1073, 281, 77, 23, 8, 1: Tpes(n) ∈ O(n^(4/3))
• …, 36, 24, 16, 27, 18, 12, 8, 9, 6, 4, 3, 2, 1: Tpes(n) ∈ O(n(log n)²), it is a very
interesting sequence – it is not a descending sequence!

Unfortunately the best sequence is not known. We do not know if there exists a
sequence for Tpes (n) ∈ O(n log n). The analysis of average complexity is even more
complicated. Experiments show that shellsort is faster than simple sorting algorithms,
but its complexity is higher than O(n log n).

1.2 Stability
Shellsort is not stable. The same keys may end up in different subarrays and are moved
independently. Thus their relative order may change.

1.3 Natural behaviour


Shellsort has natural behaviour because it is based on a natural sorting algorithm.

2 Quicksort
Quicksort is a recursive algorithm. It is an example of the “divide and conquer”
approach. This approach divides a problem into subproblems, solves each subproblem,
and merges solutions of subproblems into a final solution.
The idea of quicksort is to split an array into two parts (subarrays): the left with values
less than a pivot and the right with values not less than the pivot. The pivot is just an element of

1  procedure quicksort(l, r);
2     if l < r then
3        pivot ← A[l];                  // choose pivot
4        s ← l;
5        for i ← l + 1 to r do          // rearrange elements of the array
6           if A[i] < pivot then
7              s ← s + 1;
8              swap(A[s], A[i]);
9           end if;
10       end for;
11       swap(A[s], A[l]);              // put pivot in its final position
12       quicksort(l, s − 1);           // sort left subarray
13       quicksort(s + 1, r);           // sort right subarray
14    end if;
15 end procedure;

Figure 2: Quicksort

an array chosen before the split (we discuss how to choose pivot in sec. 2.4). Subarrays
are not sorted. Then the algorithm is run recursively for both subarrays independently
(Fig. 2).
Example 2. Use quicksort for an array:

6 15 12 9 5 8 4 2 3 1

The implementation in Fig. 2 chooses the first element as a pivot. In our example it is 6.

pivot = 6 6 15 12 9 5 8 4 2 3 1

We search for an element less than the pivot at its right (lines 5-10).

pivot = 6 6 15 12 9 5 8 4 2 3 1

s i

pivot = 6 5 15 12 9 6 8 4 2 3 1

s i

pivot = 6 5 4 12 9 6 8 15 2 3 1

s i

pivot = 6 5 4 2 9 6 8 15 12 3 1

s i

pivot = 6 5 4 2 3 6 8 15 12 9 1

s i

pivot = 6 5 4 2 3 1 8 15 12 9 6

s i

And now we put the pivot in its final position (line 11).

pivot = 6 5 4 2 3 1 6 15 12 9 8

s i

We get an array split into three parts: the left with items less than the pivot, the pivot,
and the right part with elements not less than the pivot.

<6 >6

5 4 2 3 1 6 15 12 9 8

The pivot is in its final position. It will not be moved from this position. The algorithm is
run for the left and the right parts independently.

2.1 Computational complexity


Comparison of elements is the dominant operation. The first step of the algorithm
is the rearrangement of items. Each element of the array has to be touched: we need n − 1 ∈
O(n) comparisons. In the second step the algorithm is called recursively for the left and
right subarrays.

Optimistic complexity   In the best case an array is split into halves. In the first
call of the algorithm the array has n elements, in the second calls – n/2, then n/4, n/8,
n/16, etc., until it is not possible to split it any more. The depth of recursion is O(log n)
and at each level there are O(n) comparisons. Thus in total the optimistic complexity is
Topt(n) = O(n log n).

Pessimistic complexity   In the worst case the final position of the pivot is the first (or last)
cell of the array. In the recursive call a subarray is one element shorter. The depth of
recursion is O(n). Thus the pessimistic complexity is Tpes(n) = O(n²).

Problem 2. In Fig. 2 a pivot is always the first item of an array. Does it mean this
implementation has always pessimistic complexity?

Average complexity   Let's denote the average complexity for an n-element array by
T̄n. After the rearrangement of an array the pivot may be located in any position.
For an empty or 1-element array there are no comparisons:

T0 = T1 = 0    (1)

For longer arrays, if the pivot's final position is i, then the complexity Tn is:

Tn = (n − 1) + Ti−1 + Tn−i,    (2)

where n − 1 is the number of comparisons needed to rearrange the array. The average
complexity:

T̄n = (n − 1) + (1/n)·Σ_{i=1}^{n} (T̄i−1 + T̄n−i)    (3)
n·T̄n = n(n − 1) + Σ_{i=1}^{n} (T̄i−1 + T̄n−i)    (4)
n·T̄n = n(n − 1) + Σ_{i=1}^{n} T̄i−1 + Σ_{i=1}^{n} T̄n−i    (5)

Let's calculate the two last sums:

Σ_{i=1}^{n} T̄i−1 = T̄0 + T̄1 + … + T̄n−1 = Σ_{i=0}^{n−1} T̄i    (6)
Σ_{i=1}^{n} T̄n−i = T̄n−1 + T̄n−2 + … + T̄1 + T̄0 = Σ_{i=0}^{n−1} T̄i    (7)

And back to (5):

n·T̄n = n(n − 1) + 2·Σ_{i=0}^{n−1} T̄i    (8)

Let's write Eq. (8) for n − 1:

(n − 1)·T̄n−1 = (n − 1)(n − 2) + 2·Σ_{i=0}^{n−2} T̄i    (9)
Now let's subtract (9) from (8):

n·T̄n − (n − 1)·T̄n−1 = n(n − 1) + 2·Σ_{i=0}^{n−1} T̄i − (n − 1)(n − 2) − 2·Σ_{i=0}^{n−2} T̄i    (10)
n·T̄n − (n − 1)·T̄n−1 = n(n − 1) − (n − 1)(n − 2) + 2·Σ_{i=0}^{n−1} T̄i − 2·Σ_{i=0}^{n−2} T̄i    (11)
n·T̄n − (n − 1)·T̄n−1 = (n − 1)(n − n + 2) + 2·Σ_{i=0}^{n−2} T̄i + 2·T̄n−1 − 2·Σ_{i=0}^{n−2} T̄i    (12)
n·T̄n − (n − 1)·T̄n−1 = 2(n − 1) + 2·T̄n−1    (13)
n·T̄n = (n + 1)·T̄n−1 + 2(n − 1)    (14)

Now we apply a summing factor technique. We multiply both sides of the equation
by a summing factor sn ≠ 0:

sn·n·T̄n = sn·(n + 1)·T̄n−1 + 2·sn·(n − 1)    (15)

On the left side we have only index n (ie. sn·n·T̄n), while in the first term on the right side,
sn·(n + 1)·T̄n−1, we have n, n + 1 and n − 1. Let's mimic the left side and write
the first term with n − 1 only:

sn·n·T̄n = sn−1·(n − 1)·T̄n−1 + 2·sn·(n − 1)    (16)

We want (15) to equal (16), thus:

sn·(n + 1) = sn−1·(n − 1)    (17)

Let's calculate this expression:

sn = ((n − 1)/(n + 1))·sn−1    (18)
sn = ((n − 1)/(n + 1))·((n − 2)/n)·sn−2    (19)
sn = ((n − 1)/(n + 1))·((n − 2)/n)·((n − 3)/(n − 1))·((n − 4)/(n − 2))·…·(3/5)·(2/4)·(1/3)·s1    (20)

Eq. (20) can be easily simplified (the product telescopes):

sn = (2/((n + 1)·n))·s1    (21)
And back to (16):

sn·n·T̄n = sn−1·(n − 1)·T̄n−1 + 2·sn·(n − 1)    (22)
        = sn−2·(n − 2)·T̄n−2 + 2·sn−1·(n − 2) + 2·sn·(n − 1)    (23)
        = sn−3·(n − 3)·T̄n−3 + 2·sn−2·(n − 3) + 2·sn−1·(n − 2) + 2·sn·(n − 1)    (24)
        = s1·1·T̄1 + 2·s2·1 + … + 2·sn−2·(n − 3) + 2·sn−1·(n − 2) + 2·sn·(n − 1)    (25)
        = s0·0·T̄0 + 2·s1·0 + 2·s2·1 + … + 2·sn−2·(n − 3) + 2·sn−1·(n − 2) + 2·sn·(n − 1)    (26)

Use Eq. (1):

sn·n·T̄n = 2·s1·0 + 2·s2·1 + … + 2·sn−2·(n − 3) + 2·sn−1·(n − 2) + 2·sn·(n − 1)    (27)
sn·n·T̄n = Σ_{i=1}^{n} 2·si·(i − 1)    (28)

and use Eq. (21):

(2/((n + 1)·n))·s1·n·T̄n = 2·Σ_{i=1}^{n} (2/((i + 1)·i))·s1·(i − 1)    (30)

Because s1 ≠ 0 we have

(1/(n + 1))·T̄n = Σ_{i=1}^{n} (2/((i + 1)·i))·(i − 1) = 2·Σ_{i=1}^{n} (i − 1)/((i + 1)·i)    (31)
              = 2·Σ_{i=1}^{n} (2/(i + 1) − 1/i) = 2·(Σ_{i=1}^{n} 2/(i + 1) − Σ_{i=1}^{n} 1/i)    (32)

The harmonic number Hn is defined as

Hn = Σ_{i=1}^{n} 1/i,    (33)

thus

(1/(n + 1))·T̄n = 4·Σ_{i=1}^{n} 1/(i + 1) − 2·Hn    (34)
              = −2·Hn + 4·(1/2 + 1/3 + … + 1/(n + 1))    (35)
              = −2·Hn + 4·(−1 + 1/1 + 1/2 + 1/3 + … + 1/n + 1/(n + 1))    (36)
              = −2·Hn + 4·(−1 + Hn + 1/(n + 1))    (37)
              = −2·Hn − 4 + 4·Hn + 4/(n + 1)    (38)
              = 2·Hn − 4 + 4/(n + 1)    (39)

T̄n = (n + 1)·(2·Hn − 4 + 4/(n + 1))    (40)
We know that for large n

Hn ≈ ln n + γ,    (41)

where γ ≈ 0.577… stands for the Euler constant. Finally

T̄n ≈ (n + 1)·(2·ln n + 2γ − 4 + 4/(n + 1))    (42)
   ∈ O(n log n)    (43)

Space complexity   Each recursive call requires O(1) extra space (for temporary
variables). On average there are O(log n) recursive calls active at the same time. Thus the
average space complexity is O(log n).

2.2 Stability
The algorithm groups elements less (greater) than a pivot in the left (right) subar-
ray in an arbitrary way. It is not a stable algorithm.

2.3 Natural behaviour


For a sorted array the implementation in Fig. 2 chooses the minimal item as a
pivot. This leads to the pessimistic complexity. The algorithm does not have a natural
behaviour.

2.4 Pivot selection


The analysis of computational complexity of the algorithms shows that pivot se-
lection is crucial. A pivot should be a median in an array. After rearrangement its
final position should be more or less in the middle of an array. So we need special
algorithms for pivot selection. But if pivot selection is too complicated it may take
too long time to run.
In practice common approaches are:
• random choice,
• first item in an arrays,
• middle item in an array,

• median of three random items,


• median of three medians of three (random) items.
In each case the pessimistic complexity is still T(n) ∈ O(n²), but we reduce the probability
of hitting the quadratic case.
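As an illustration, the median-of-three strategy could be sketched as follows; this sketch assumes the chosen pivot is moved to A[l], so the partition loop from Fig. 2 can be used unchanged:

#include <utility>
#include <vector>

// Median-of-three pivot selection for the subarray A[l..r] (r inclusive):
// take the median of the first, middle and last element and place it at A[l].
void choose_pivot_median_of_three(std::vector<int>& A, std::size_t l, std::size_t r) {
    std::size_t m = l + (r - l) / 2;
    if (A[m] < A[l]) std::swap(A[m], A[l]);
    if (A[r] < A[l]) std::swap(A[r], A[l]);
    if (A[r] < A[m]) std::swap(A[r], A[m]);   // now A[l] <= A[m] <= A[r]
    std::swap(A[l], A[m]);                    // median becomes the pivot at A[l]
}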

3 Selection of the kth smallest element


Selection of the first smallest element is just a minimum search. Selection of the
((n + 1)/2)th smallest element is a median search.
The naïve solution is just sorting the array. But we can select the kth element faster.
Sorting places all elements in their correct locations, but we need only the kth smallest
element in its correct location. We adapt quicksort for the selection of the kth smallest ele-
ment. After each rearrangement of elements a pivot is in its final position (line 14 in

1 procedure select ( l , r , k ) ;
2 i f l = r then
3 return A [ l ] ;
4 end i f ;
5

6 pivot ← A [ l ] ;
7 s ← l;
8 f o r i ← l + 1 to r do
9 i f A [ i ] < pivot then
10 s ← s + 1;
11 swap ( A [ s ] , A [ i ] ) ;
12 end i f ;
13 end f o r ;
14 swap ( A [ s ] , A [ l ] ) ;
15

16 i f s = k then // kth element found


17 return A [ s ] ;
18 e l s e i f s > k then
19 select ( l , s − 1 , k ) ; // search left subarray or . . .
20 else
21 select ( s + 1 , r , k ) ; // . . . right subarray
22 end i f ;
23 end procedure ;

Figure 3: Selection of kth smallest element

Fig. 3). If a pivot is in kth cell, the kth element is found. If a pivot’s index is greater
than k we run the algorithm only for the left subarray (line 19), otherwise for the right
one (line 21).

3.1 Computational complexity


Computational complexity depends on the array split.

Optimistic complexity   In the best case the array is always split into
halves. Let's sum all comparisons. For simplicity let's assume n = 2^k.

Topt(n) = (n − 1) + (n/2 − 1) + (n/4 − 1) + … + (2 − 1) + (1 − 1)    (44)
        = (n/2⁰ − 1) + (n/2¹ − 1) + (n/2² − 1) + … + (n/2^(log₂n − 1) − 1) + (n/2^(log₂n) − 1)    (45)
        = Σ_{i=0}^{log₂n} (n/2^i − 1) = n·Σ_{i=0}^{log₂n} (1/2)^i − Σ_{i=0}^{log₂n} 1    (46)
        = n·(1 − (1/2)^(log₂n))/(1 − 1/2) − (log₂n + 1) = 2n·(1 − (1/2)^(log₂n)) − (log₂n + 1)    (47)
        = 2n − 2 − log₂n − 1 ∈ O(n)    (48)

Pessimistic complexity   The worst case is the same as the worst case of quicksort.

Tpes(n) = (n − 1) + (n − 2) + … + 2 + 1 = n(n − 1)/2 ∈ O(n²)    (49)

Average complexity   Calculation of the average complexity is very similar to that of
quicksort. For a 1-element array there are no comparisons:

T1 = 0.    (50)

For an n-element array with the pivot in the ith position after rearrangement we have a
recursive equation:

Tn = (n + 1) + ((i − 1)/n)·Ti−1 + ((n − i)/n)·Tn−i,    (51)

where (i − 1)/n is the probability that the algorithm is called for the left subarray, and
(n − i)/n – for the right subarray. The average complexity is:

T̄n = (1/n)·Σ_{i=1}^{n} ((n + 1) + ((i − 1)/n)·T̄i−1 + ((n − i)/n)·T̄n−i).    (52)

Problem 3. Use the same approach as for the average complexity of quicksort to show
that the average time complexity of the kth smallest element search is

T̄n ∈ O(n).    (53)
Binary search trees
Krzysztof Simiński

Algorithms and data structures


lecture 04, 27th March 2020

A tree is a particular type of a graph. A tree is commonly used as a dynamic data


structure. A tree consists of nodes (Fig. 1). In our lecture we discuss binary trees. In a
binary tree a node has at most two children. Each tree node has a pointer (a reference
– it depends on implementation) to its children. We also add a pointer to a parent. It
is not obligatory. Trees with nodes without references to parent are also known and
used.

1 Binary search trees


A binary search tree is a particular type of binary tree. In a binary search tree we
have to define a relation < (or ≤, >, ≥) on the nodes, because it is a sorted binary tree.
We can choose any of these relations, but we have to stick to this decision consistently.
In our lecture we use the < relation.
A binary search tree is a rooted tree (the root has no parent). Nodes store references
to left and right children. Children are subtrees. Subtrees may be empty. A binary
search tree satisfies the binary search property: for each node N, values stored in its
left subtree are less than the value in N, and values stored in its right subtree are greater
than or equal to the value in N (Fig. 2).
Basic operations in binary search trees are search, insert, and delete. They can be
implemented iteratively or recursively.

1.1 Iterative algorithms


In pseudocodes nil stands for a null reference (pointer) to a parent or a child. Fig. 3
presents an iterative algorithm for a value search. Fig. 4 and Fig. 5 present special cases
of a search procedure for minimum and maximum values respectively. Fig. 6 presents
a pseudocode for insertion of a value into a binary search tree. A new node is never

1 record node of
2 value : type ; // stored value
3 left , right : ^ node ; // pointers (references) to children
4 parent : ^ node ; // pointer (reference) to parent
5 end of record

Figure 1: A tree node.
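As an aside, a possible C++ counterpart of the record from Fig. 1 (a sketch; raw pointers are used for brevity):

// A tree node with links to both children and to the parent.
template <typename T>
struct Node {
    T value;                  // stored value
    Node* left   = nullptr;   // pointers (references) to children
    Node* right  = nullptr;
    Node* parent = nullptr;   // pointer (reference) to parent
};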

[Figure 2: Example of a binary search tree. The root 20 has children 10 and 27; node 10
has children 2 and 17, and 17 has left child 12; node 27 has children 25 and 32, node 32
has left child 28, and 28 has right child 30. Every left subtree holds values smaller than
its parent's value, every right subtree holds larger values.]

1 procedure find ( root , searched )


2 begin
3 x ← root ;
4 while x 6= n i l and searched 6= x . value do
5 i f searched < x . value then
6 x ← x . left ;
7 else
8 x ← x . right ;
9 end i f ;
10 end while ;
11 return x ;
12 end procedure ;

Figure 3: Iterative search in a binary search tree.

inserted between nodes. It always substitutes a nil value in one of the existing nodes, with
respect to the binary search property.
Removal of a node requires some explanation. We have three cases.

• A node to be removed has no children. It is the simplest case. We have to find


its parent, test if a node to be removed is a left or right child and simply remove
it.
Example 1. Let’s remove node 12 from the binary search tree below. We only
have to find the parent of node 12 – it is 17, assign nil to the reference to 12,
and remove 12.
20 20

10 ... 10 ...

2 17 2 17

12 12

1 procedure minimum ( root )
2 begin
3 x ← root ;
4 while x . left 6= n i l do
5 x ← x . left ;
6 end while ;
7 return x ;
8 end procedure ;

Figure 4: Minimum search in a binary search tree.

1 procedure maximum ( root )


2 begin
3 x ← root ;
4 while x . right 6= n i l do
5 x ← x . right ;
6 end while ;
7 return x ;
8 end procedure ;

Figure 5: Maximum search in a binary search tree.

(finis)

• A node to be removed has one child. We have to find its parent, test if the node
to be removed is a left or right child and modify the reference to the node to be
removed. The reference now points to the only child of the node to be removed. Now we
only need to remove the node itself.
Example 2. Let’s remove node 17 from the binary search tree below. We have
to find the parent of node 17 – it is 10, move its right child reference from 17 to
12.
20 20

10 ... 10 ...

2 17 2 17

12 12
(finis)

• A node to be removed has two children. This case is more complicated.


– First we have to find a successor of a node to be removed. A successor of
a node is the next node in the sorted sequence of nodes.
– Overwrite value of the node to remove with its successor.

1 procedure insert ( root , to_add )
2 begin
3 y ← nil ;
4 x ← root ;
5

6 // new node:
7 new_node ← new node ;
8 new_node . left ← new_node . right ← n i l ;
9 new_node . value ← to_add ;
10

11 // find a leaf that will be a parent of a new node:


12 while x 6= n i l do
13 y ← x;
14 i f new_node . value < x . value then
15 x ← x . left ;
16 else
17 x ← x . right ;
18 end i f
19 end while
20

21 new_node . parent ← y ;
22 i f y = n i l then
23 root ← new_node ;
24 e l s e i f new_node . value < y . value then
25 y . left ← new_node ;
26 else
27 y . right ← new_node ;
28 end i f
29 end procedure ;

Figure 6: Insertion into a binary search tree.

– Remove original successor (it has zero or one child.)
Example 3. Let’s remove node 10 from the binary search tree below. We have
to find the successor of node 10 – it is 12.
20

10 ...

We want to remove 10. Its successor is


2 17
12.

12

14

20

12 ...

Overwrite value of 10 with 12 (10 is ac-


2 17
tually removed).

12

14

20

12 ...

2 17 Remove original 12.

12

14
(finis)

For removal of a node we need to find its successor in a binary search tree. A
successor of a node is the next node in a sorted sequence of nodes in a tree. Successor
search is presented in Fig. 9. The algorithm handles two cases.
• If a predecessor (a successor of which we search for) has a right subtree, a min-
imum in a right subtree is the searched successor.
Example 4. A successor of 10 is a minimum of 10’s right subtree. A successor
of 27 is a minimum of 27’s right subtree. (finis)

• If a predecessor has no right subtree, we have to go up to find the successor. We
keep going up as long as we move up and to the left (ie. as long as the current node
is a right child of its parent). As soon as we move up and to the right, we have
reached the successor.

1 procedure remove ( root , to_remove )
2 begin
3 v ← find ( root , to_remove ) ;
4 i f v . left = n i l or v . right = n i l then
5 y ← v ; // the node has one or no child
6 else
7 y ← successor ( v ) ; // the node has two children
8 end i f
9 i f y . left 6= n i l then
10 x ← y . left ;
11 else
12 x ← y . right ;
13 end i f
14 i f x 6= n i l then // the node had one child
15 x . parent ← y . parent ;
16 end i f
17 i f y . parent = n i l then // we remove root
18 root ← x ;
19 e l s e i f y = y . parent . left then // cut the node out
20 y . parent . left ← x ;
21 else
22 y . parent . right ← x ;
23 end i f
24 i f y 6= v then // move value
25 v . value ← y . value ;
26 end i f
27 return y ;
28 end procedure ;

Figure 7: Removal of a value from a binary search tree.

27

20 32

10 25 28

2 17 30

12 19

Figure 8: A binary search tree. Dashed arrows denote successors of nodes. Node 32
has no successor.

1 procedure successor ( predecessor )
2 begin
3 x ← predecessor ;
4 i f x . right 6= n i l then // search in the right subtree
5 r e t u r n minimum ( x . right ) ;
6 end i f
7

8 // there is no right subtree


9 y ← x . parent ;
10 while y 6= n i l and x = y . right do
11 x ← y;
12 y ← y . parent ;
13 end while ;
14

15 return y ;
16 end procedure ;

Figure 9: Successor search in a binary search tree.

Example 5. Node 19 has no right subtree. We have to go up. We go up left to


17, we go up left further to 10. Then we go up right to 20. 20 is a successor of
19.
Node 30 has no right subtree. We have to go up. We go up left to 28. Then we
go up right to 32 and 32 is a successor of 30. (finis)

1.2 Recursive algorithms
Each subtree of a tree is also a valid binary search tree. This is why often recursive
algorithms are used for binary search trees.

1 procedure find ( root , searched )


2 begin
3 i f root = n i l or root . value = searched then
4 r e t u r n root ;
5 end i f
6 i f searched < root . value then
7 r e t u r n find ( root . left , searched ) ;
8 else
9 r e t u r n find ( root . right , searched ) ;
10 end i f
11 end procedure ;

Figure 10: Recursive search in a binary search tree.

1 procedure minimum ( root )


2 begin
3 i f root = n i l then
4 return n i l ;
5 end i f
6

7 i f root . left 6= n i l then


8 r e t u r n minimum ( root . left ) ;
9 else
10 r e t u r n root . value ;
11 end i f
12 end procedure ;

Figure 11: Minimum search in a binary search tree.

1 procedure maximum ( root )
2 begin
3 i f root = n i l then
4 return n i l ;
5 end i f
6

7 i f root . right 6= n i l then


8 r e t u r n maximum ( root . right ) ;
9 else
10 r e t u r n root . value ;
11 end i f
12 end procedure ;

Figure 12: Maximum search in a binary search tree.

1 procedure print ( root )


2 begin
3 i f root 6= n i l then
4 print ( root . left ) ;
5 write ( root . value ) ;
6 print ( root . right ) ;
7 end i f
8 end procedure ;

Figure 13: Printing of values stored in a binary search tree.

1 procedure insert ( root , to_add )


2 begin
3 i f root = n i l then // empty tree
4 new_node ← new node ;
5 new_node . left ← new_node . right ← n i l ;
6 new_node . value ← to_add ;
7 r e t u r n new_node ;
8 e l s e // non empty tree
9 i f to_add < root . value then // go left
10 root . left ← insert ( root . left , to_add ) ;
11 e l s e // go right
12 root . right ← insert ( root . right , to_add ) ;
13 end i f
14 end i f
15 r e t u r n root ;
16 end procedure ;

Figure 14: Insertion into a binary search tree.

[Figure 15: A balanced binary search tree of height h = log n: level l holds 2^l nodes –
2^0 = 1, 2^1 = 2, 2^2 = 4, 2^3 = 8, and so on.]

Table 1: Complexity of operations in a binary search tree.


complexity
operation average worst
search O(log n) O(n)
insert O(log n) O(n)
delete O(log n) O(n)

1.3 Computation complexity for binary search trees


Definition 6. A length of a path in a tree is a number of edges in a path.
Definition 7. A height of a tree is a length of a longest path between a root (a node
without parent) and a leaf (a node without children).

Definition 8. A balanced tree is a tree in which a distance between a root and any leaf
is the same.
Theorem 1. A balanced tree with height h has n = 2^(h+1) − 1 nodes.
Problem 9. Prove theorem 1. 

Operations in a balanced binary tree have low computational complexity – it is
linear with regard to the tree's height, thus logarithmic with regard to the number of
nodes (Tab. 1). If a tree degenerates to a list, the complexity of operations is linear with
regard to the number of nodes.

33

15 47

10 20 38 51

5 18 36 39

Figure 16: Example of a red-black tree. Small black square denote nil references.

1 record node of
2 value : type ; // stored value
3 left , right : node ; // pointers (references) to children
4 parent : node ; // pointer (reference) to parent
5 colour : enum ( red , black ) ; // node colour
6 end of record

Figure 17: A red-black tree node.

2 Red-black trees
Binary search trees have low complexity of operations if they are balanced. Several
types of self-balancing trees have been proposed. One of them is the red-black tree.
In red-black trees each node has one more field: a colour (red or black) – Fig. 17. A
red-black tree is a binary search tree satisfying:
• Each node is red or black.

• Each null reference (nil) is black. It is treated as a black leaf.


• If a node is red, both its children are black.
• Each path from a root to any leaf has the same number of black nodes. It is the
black height of a red-black tree.

2.1 Rotations
Rotations are operations used in red-black trees (and in many self-balancing trees).
Rotations do not violate the binary search property of trees. General idea of left and
right rotations is presented in Fig. 19, an example – in Fig. 20, and pseudocode for a
left rotation in Fig. 18.

1 procedure left_rotation ( root , x )
2 begin
3 y ← x . right ;
4 x . right ← y . left ;
5 i f y . left 6= n i l then
6 y . left . parent ← x ;
7 end i f
8 y . parent ← x . parent ;
9 i f x . parent = n i l then
10 root ← y ;
11 e l s e i f x = x . parent . left then
12 x . parent . left ← y ;
13 else
14 x . parent . right ← y ;
15 end i f
16 y . left ← x ;
17 x . parent ← y ;
18 end .

Figure 18: Left rotation in a binary search tree.

Problem 10. Write a pseudocode for a right rotation in a binary search tree. 

[Figure 19: Rotations in a binary search tree. A right rotation replaces a node B whose
left child is A (A holds subtrees α and β, B holds right subtree γ) by A with right child B;
a left rotation is the inverse operation. Triangles (α, β, γ) stand for subtrees, which may
be empty (nil); p denotes the parent of the rotated node.]

[Figure 20: Example of right_rotation(root, y) and left_rotation(root, x) on a tree rooted
at 12; the two rotations are inverses of each other.]

[Figure 21: Case 1a.]

[Figure 22: Case 1b transformed into case 1c.]

2.2 Insertion
A new node is inserted as a binary search tree. A new node is always red. If a new
red node has a red parent, the tree should be transformed to restore the red-black tree
properties.

Definition 11. A sibling of a node is a node that has the same parent.
Definition 12. An uncle of a node is a sibling of node’s parent.

1. Analyse an uncle node that is a right sibling of a parent.


(a) A new node has a red uncle (line 13).
(b) A new node is a right son and has a black uncle (line 19).
(c) A new node is a left son and has a black uncle (line 24).

2. (a case symmetrical to 1) Analyse an uncle node that is a left sibling of a parent.


(a) (a case symmetrical to 1a) A new node has a red uncle.
(b) (a case symmetrical to 1b) A new node is a left son and has a black uncle.
(c) (a case symmetrical to 1c) A new node is a right son and has a black uncle.

Problem 13. Insert into an empty red-black tree values: 30, 20, 10, 15, 16, 5, 8, 12, 13,
9. 
Problem 14. How many rotations are needed to restore properties of a red-black
trees after insertion of a new node? 

[Figure 23: Case 1c.]

1 procedure insert ( root , to_add )


2 begin
3 // add as to a binary search tree
4 x ← insert ( root , to_add ) ; // x points to a new node
5 x . colour ← red ; // new node is always red
6

7 // restore features of a red-black tree


8 while x 6= root and x . parent . colour = red do
9 i f x . parent = x . parent . parent . left then
10 // case 1
11 uncle ← x . parent . parent . right ;
12 i f uncle . colour = red then
13 // both x and uncle are red – case 1a
14 x . parent . colour ← black ;
15 uncle . colour ← black ;
16 x . parent . parent . colour ← red ;
17 x ← x . parent . parent ;
18 e l s e // cases 1b and 1c
19 // case 1b
20 i f x = x . parent . right then
21 x ← x . parent ;
22 left_rotation ( root , x ) ;
23 end i f // case 1b transformed into case 1c
24 // case 1c
25 x . parent . colour ← black ;
26 x . parent . parent . colour ← red ;
27 right_rotation ( root , x . parent . parent ) ;
28 end i f
29 e l s e // case 2
30 // repeat lines 10-28 with «left» and «right» swapped
31 end i f
32 end while
33 root . colour ← black ;
34 end procedure ;

Figure 24: Node insertion into a red-black tree.

2.3 Removal
If a removed node is red, the properties of a red-black tree are not violated. The
problem is if a black node is removed.
In line 4 two references are set:
0. if a removed node had no children:
• y points to a removed node (that is not in a tree any more),
• x = nil.

1. if a removed node had one child:


• y points to a removed node (that is not in a tree any more),
• x points to the child.
2. if a removed node had two children: The value (not colour!) of a successor of a
node to remove is copied to a node to remove. Then the successor is removed
from a tree.
• y points to a successor of a removed node (y is a not a tree any more),
• x points to a child of a successor.

If
• x.colour = red, then change its colour to black.
• x.colour = black, then
1. node x is a left child of its parent, then it has a right sibling
(a) sibling is red (line 13)
(b) sibling is black and both children as well (line 21)
(c) sibling is black and its left child is red and right child is black (line
26)
(d) sibling is black and its right child is red (line 33)
2. (case symmetrical to 1) node x is the right child of its parent, so it has a left
sibling
(a) (case symmetrical to 1a) sibling is red,
(b) (case symmetrical to 1b) sibling is black and its both children as well,
(c) (case symmetrical to 1c) sibling is black and its right child is red and
left child is black,
(d) (case symmetrical to 1d) sibling is black and its left child is red.

Problem 15. Draw cases 1a, 1b, 1c, and 1d. 

Problem 16. Remove values: 5, 15, 20, 16, 30 from the tree from Problem 13. 
Problem 17. How many rotations are needed to restore properties of a red-black
trees after removal of a node? 

1 procedure remove ( root , to_remove )
2 begin
3 // remove as from a binary search tree
4 ( x , y ) ← remove ( root , to_remove ) ;
5

6 // restore features of a red-black tree


7 i f y . colour = black do
8 while x 6= root and x . colour = black do
9 i f x = x . parent . left then
10 // cases 1
11 sibling ← x . parent . right ;
12 i f sibling . colour = red then
13 // case 1a
14 sibling . colour ← black ;
15 x . parent . colour ← red ;
16 left_rotation ( root , x . parent ) ;
17 sibling ← x . parent . right ;
18 // case 1a transformed into other case
19 end i f
20 i f sibling . left . colour = black and sibling .
right . colour = black then
21 // case 1b
22 sibling . colour ← red ;
23 x ← x . parent ;
24 e l s e // cases 1c and 1d
25 i f sibling . right . colour = black then
26 // case 1c
27 sibling . left . colour ← black ;
28 sibling . colour ← red ;
29 right_rotation ( root , sibling ) ;
30 sibling ← x . parent . right ;
31 // case 1c transformed into case 1d
32 end i f
33 // case 1d
34 sibling . colour ← x . parent . colour ;
35 x . parent . colour ← black ;
36 sibling . right . colour ← black ;
37 left_rotation ( root , x . parent ) ;
38 x ← root ;
39 end i f
40 else
41 // repeat lines 10-39 with «left» and «right» swapped
42 end i f
43 end while
44 x . colour ← black ;
45 end i f
46 end .

Figure 25: Removal from a red-black tree.

Table 2: Complexity of operations in a red-black tree.
complexity
operation average and worst
search O(log n)
insert O(log n)
delete O(log n)

2.4 Computation complexity for red-black trees


Red-black trees are not balanced trees. But they are almost balanced. Black height
of all leaves is exactly the same.
Problem 18. Let's assume the black height of a red-black tree is l. What is the min-
imal and maximal number of nodes on a path from the root to a leaf? 
The same black height of all leaves is enough for a red-black tree to have logar-
ithmic average and worst-case complexity of search, insert, and delete (Tab. 2).
Red-black trees are commonly used to implement associative data containers (eg.
std::map in the C++ STL library).
Hash tables
Krzysztof Simiński
Algorithms and data structures
lecture 04, 3rd April 2020

Let’s compare two data structures: an array (a vector) and a balanced binary search
tree.
In an array the indices are consecutive numbers (there are no gaps). It is very easy
to calculate the address of the item we search for. Access to an item is very fast: O(1).
Unfortunately if we would like to store items with indices −10, 3, 7, and 64, we have
to allocate a contiguous block of memory for all indices in the interval [−10, 64]. The cells in
the allocated array are mostly unused. This is a waste of memory. So it is better to
use a balanced binary tree. We can very easily implement a container for any set of
indices without wasting memory. This solution has a disadvantage: access to an item
takes longer than in arrays – it is O(log n).
Let's join the advantages of these two approaches: a non-consecutive set of indices
and fast access. These are the features of hash tables.
A hash table is just an array, but the keys of items are not used directly as indices
in the array. First the keys are hashed. A hash function is a function that takes a key
and returns an index in the hash array. Keys may be completely different in nature from
indices.
Example 1. Our hash table is based on a 7-element array:
0 1 2 3 4 5 6

The hash function is:

h(x) = x mod 7. (1)

Let’s insert values: −10, 6, 8, 20, 35, 40, 72.

h(−10) = −10 mod 7 ≡ −10 + 7 ≡ 4


h(6) = 6 mod 7 ≡ 6
h(8) = 8 mod 7 ≡ 1
h(20) = 20 mod 7 ≡ 6
h(35) = 35 mod 7 ≡ 0
h(40) = 40 mod 7 ≡ 5
h(72) = 72 mod 7 ≡ 3

And finally the filled array:

[Figure 1: Product hashing. The w-bit representation of key k is multiplied by the w-bit
number ⌊A · 2^w⌋, where 0 < A < 1 is a constant; the p most significant bits of the lower
w-bit half of the product form the hash value h(k).]

0 1 2 3 4 5 6

35 8 20 72 −10 40 6

(finis)

1 Hash function

h : K → U, (2)
where K – set of keys, U – set of indices in a hash table.
Features of a good hash function:
1. It is easy to compute.
2. For similar keys returns dissimilar indices.

3. Returns indices with the uniform distributions (all indices have the same prob-
ability).

1.1 Modular hashing


It is a very simple hashing method. Indices are calculated with formula (k stands
for a key value, m – for array size)

h(k) = k mod m. (3)

We use this technique in Example 1.

1.2 Product hashing


It is a very interesting hashing technique. Index values are calculated with the for-
mula:

h(k) = ⌊m(kA − ⌊kA⌋)⌋.    (4)

This formula can be implemented in a very fast way as register operations (Fig. 1).
The method works for any value of the constant A. Knuth proves that

A = (√5 − 1)/2 ≈ 0.6180339887…    (5)

produces good hashing functions.

Example 2. Let k = 123456, m = 10000, A = (√5 − 1)/2.

h(k) = b10000 · (123456 · 0.6180339887 . . . − b123456 · 0.6180339887 . . .c)c =


= b10000 · (76300.0041151 . . . − b76300.0041151 . . .c)c =
= b10000 · 0.0041151 . . .c = b41.151 . . .c = 41. (6)

Let’s calculate a hash value for the next key k + 1 = 123457:

h(k + 1) = b10000 · (123457 · 0.6180339887 . . . − b123457 · 0.6180339887 . . .c)c =


= b10000 · (76300.6221429359 . . . − b76300.6221429359 . . .c)c =
= b10000 · 0.6221429359 . . .c = b6221.429359 . . .c = 6221. (7)

The function is easy to compute and for similar keys returns dissimilar indices. (finis)
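For w = 32-bit keys the scheme from Fig. 1 reduces to a few register operations. A C++ sketch, assuming the table size is m = 2^p with 1 ≤ p ≤ 32:

#include <cstdint>

// Product (multiplicative) hashing for 32-bit keys.
// The constant is floor(A * 2^32) with A = (sqrt(5) - 1) / 2.
std::uint32_t product_hash(std::uint32_t k, unsigned p) {
    const std::uint32_t A_scaled = 2654435769u;
    std::uint32_t r0 = k * A_scaled;     // lower 32 bits of the product (r0 in Fig. 1)
    return r0 >> (32 - p);               // p most significant bits of r0
}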

2 Conflicts
Unfortunately cardinality of a set of keys may be significantly larger than car-
dinality of a set of indices. In such a situation it is impossible to fit each key into a
different cell in an array. If a hash function returns the same index for two (or more)
different keys, we have a conflict.

2.1 Open hashing – closed addressing


This technique is also called chaining. In this technique conflicts are resolved with
lists in array cells (buckets).
Example 3. Let’s insert values 70, −6, 40, 35, 19, 5, 63 into a hash table with hash
function h(x) = x mod 7.

h(70) = h(63) = h(35) = 0


h(−6) = 1
h(40) = h(19) = h(5) = 5

1 procedure hash_chaining_insert ( x )
2 begin
3 k ← h(x) ;
4 insert x at the beginning of the list starting at A [ k ] ;
5 end procedure .

Figure 2: Pseudocode for inserting in chaining hashing.

1 procedure hash_chaining_search ( x )
2 begin
3 k ← h(x) ;
4 i f x is in the list starting at A [ k ] then
5 return true ;
6 else
7 return f a l s e ;
8 end i f ;
9 end procedure .

Figure 3: Pseudocode for searching in chaining hashing.

5 5 19 40

1 −6

0 63 35 70

(finis)

Time complexities
• Assuming a good hash function (uniformly distributing) time complexity de-
pends on an average length of the lists.
• When the size of the hash table is a and the number of elements is n, the time com-
plexities of searching and removing are O(1 + n/a).
• The time complexity for insertion is O(1).

1 procedure hash_chaining_remove ( x )
2 begin
3 k ← h(x) ;
4 i f x is in the list starting at A [ k ] then
5 remove x ;
6 end i f ;
7 end procedure .

Figure 4: Pseudocode for removing in chaining hashing.

• Note: When n = c × a for some small c, e.g., c < 3, all dictionary operations
work in O(1) average time!
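As an illustration of chaining and of these complexities, a minimal C++ sketch (the structure and names are purely illustrative):

#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// A minimal chaining hash table for integer keys (a sketch, not production code).
struct ChainingHashTable {
    std::vector<std::list<int>> buckets;

    explicit ChainingHashTable(std::size_t size) : buckets(size) {}

    std::size_t h(int x) const {                 // modular hash, kept non-negative
        long long m = static_cast<long long>(buckets.size());
        long long r = x % m;
        return static_cast<std::size_t>(r < 0 ? r + m : r);
    }
    void insert(int x) { buckets[h(x)].push_front(x); }            // O(1)
    bool search(int x) const {                                      // O(1 + n/a) on average
        const std::list<int>& b = buckets[h(x)];
        return std::find(b.begin(), b.end(), x) != b.end();
    }
    void remove(int x) { buckets[h(x)].remove(x); }                 // O(1 + n/a) on average
};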

2.2 Closed hashing – open addressing


In this technique conflicts are not resolved with lists. If it is impossible to insert an
item because the cell is already taken, an alternative location is searched for – this
approach is called probing.

2.2.1 Linear probing


If a position is not empty, just try the next position. For a linear probing we define
a family (set) of hash functions in a recursive way:
• h(x, 0) — basic hash function (same as in a chaining method)
• h(x, i) — hash functions defined for i = 1, 2, . . . , m − 1, where m is the size of
a hash table


h(x, 0) = …
h(x, i) = (h(x, 0) + i) mod m,   for 1 ≤ i ≤ m − 1
Example 4. Use hash function

h(x, 0) = x mod 8
h(x, i) = (h(x, 0) + 3i) mod 8,   for 1 ≤ i ≤ 7

to insert values: 57, 14, 18, 8, 111, 87, 25, 33 into a hash table.

h(57, 0) = 57 ≡ 1 mod 8    (8)
h(14, 0) = 14 ≡ 6 mod 8    (9)
h(18, 0) = 18 ≡ 2 mod 8    (10)
h(8, 0) = 8 ≡ 0 mod 8
h(111, 0) = 111 ≡ 7 mod 8    (11)
h(87, 0) = 87 ≡ 7 mod 8    (12)

0    1    2    3    4    5    6    7
8    57   18   –    –    –    14   111

Unfortunately for 87 we have two conflicts:

h(87, 0) = 87 ≡ 7 mod 8 (13)


h(87, 1) = (h(87, 0) + 3 × 1) ≡ 10 ≡ 2 mod 8 (14)
h(87, 2) = (h(87, 0) + 3 × 2) ≡ 13 ≡ 5 mod 8 (15)

0    1    2    3    4    5    6    7
8    57   18   –    –    87   14   111

h(25, 0) = 25 ≡ 1 mod 8 (16)


h(25, 1) = (h(25, 0) + 3 × 1) ≡ 4 mod 8 (17)

0    1    2    3    4    5    6    7
8    57   18   –    25   87   14   111

h(33, 0) = 33 ≡ 1 mod 8    (18)
h(33, 1) = (h(33, 0) + 3 × 1) ≡ 1 + 3 ≡ 4 mod 8    (19)
h(33, 2) = (h(33, 0) + 3 × 2) ≡ 1 + 6 ≡ 7 mod 8    (20)
h(33, 3) = (h(33, 0) + 3 × 3) ≡ 1 + 1 ≡ 2 mod 8    (21)
h(33, 4) = (h(33, 0) + 3 × 4) ≡ 1 + 4 ≡ 5 mod 8    (22)
h(33, 5) = (h(33, 0) + 3 × 5) ≡ 1 + 7 ≡ 0 mod 8    (23)
h(33, 6) = (h(33, 0) + 3 × 6) ≡ 1 + 2 ≡ 3 mod 8

0    1    2    3    4    5    6    7
8    57   18   33   25   87   14   111

(finis)

Problems
• Linear probing tends to group values in clusters.

• When h(x, 0) points into large group it is necessary to check a lot of non-empty
positions to localize an element or an empty cell.

2.3 Quadratic probing


Linear probing solves conflicts with grouping conflicting values in clusters. This
is not a good feature. We would like values to be dispersed in a hash table. Quadratic
probing is a technique that tries to disperse values in a table.

1 procedure hash_probing_linear_insert ( x )
2 begin
3 f o r i ← 0 to m − 1 do
4 k ← h(x , i) ;
5 i f A [ k ] is empty or removed then
6 A[k] ← x ;
7 return true ;
8 end i f ;
9 end f o r ;
10 return f a l s e ;
11 end procedure .

Figure 5: Pseudocode for inserting with linear probing.

1 procedure hash_probing_linear_search ( x )
2 begin
3 f o r i ← 0 to m − 1 do
4 k ← h(x , i) ;
5 i f A [ k ] = x then
6 return true ;
7 e l s e i f A [ k ] is empty then
8 return f a l s e ;
9 end i f ;
10 end f o r ;
11 return f a l s e ;
12 end procedure .

Figure 6: Pseudocode for searching with linear probing.

1 procedure hash_probing_linear_remove ( x )
2 begin
3 f o r i ← 0 to m − 1 do
4 k ← h(x , i) ;
5 i f A [ k ] = x then
6 A [ k ] ← empty ;
7 return true ;
8 e l s e i f A [ k ] is empty then
9 return f a l s e ;
10 end i f ;
11 end f o r ;
12 return f a l s e ;
13 end procedure .

Figure 7: Pseudocode for removing with linear probing.

h(x, 0) = …
h(x, i) = (h(x, 0) + c1·i + c2·i²) mod m,   for 1 ≤ i ≤ m − 1

It is important to choose the constants c1 and c2 in such a way that h(x, i) for i =
0, 1, …, m − 1 returns m different values (ie. the probe sequence visits every cell).
We do not present pseudocodes here, because they are the same as for linear prob-
ing (only a hash function is different).
Example 5. Use hash function

h(x, 0) = x mod 8
h(x, i) = (h(x, 0) + 2i² − 5i) mod m,   for 1 ≤ i ≤ m − 1

to insert values: 57, 21, 18, 5, 123, 87, 25, 33 into a hash table.

h(57, 0) = 57 ≡ 1 mod 8
h(21, 0) = 21 ≡ 5 mod 8
h(18, 0) = 18 ≡ 2 mod 8

0 1 2 3 4 5 6 7

57 18 21

h(5, 0) = 5 mod 8
h(5, 1) = h(5, 0) + 2 × 12 − 5 × 1 ≡ 5 + 2 − 5 ≡ 2 mod 8


h(5, 2) = h(5, 0) + 2 × 22 − 5 × 2 ≡ 5 + 8 − 10 ≡ 3 mod 8




0 1 2 3 4 5 6 7

57 18 5 21

h(123, 0) = 123 ≡ 3 mod 8


h(123, 1) = h(123, 0) + 2 × 12 − 5 × 1 ≡ 3 + 2 − 5 ≡ 0

mod 8

0 1 2 3 4 5 6 7

123 57 18 5 21

h(87, 0) = 87 ≡ 7 mod 8

0 1 2 3 4 5 6 7

123 57 18 5 21 87

h(25, 0) = 25 ≡ 1 mod 8
h(25, 1) = h(25, 0) + 2 × 12 − 5 × 1 ≡ 1 + 2 − 5 ≡ 6

mod 8

0 1 2 3 4 5 6 7

123 57 18 5 21 25 87

h(33, 0) = 33 ≡ 1 mod 8
h(33, 1) = h(33, 0) + 2 × 12 − 5 × 1 ≡ 1 + 2 − 5 ≡ 6 mod 8


h(33, 2) = h(33, 0) + 2 × 22 − 5 × 2 ≡ 1 + 8 − 10 ≡ 7 mod 8




h(33, 3) = h(33, 0) + 2 × 32 − 5 × 3 ≡ 1 + 18 − 15 ≡ 4 mod 8




0 1 2 3 4 5 6 7

123 57 18 5 33 21 25 87

(finis)

2.4 Double hashing


In case of a conflict a second hash function is used. It may be interpreted as
hashing within hashing. If the first function returns the same index for different keys, the second
function is used to compute the probe step (in linear probing the step is always 1).

h(x, i) = (h1(x) + i·h2(x)) mod m,   for 0 ≤ i < m


Example 6. Use hash function

h1 (x) = x mod 7

h2 (x) = (2x mod 6) + 1

h(x, i) = (h1 (x) + ih2 (x)) mod 7

to insert values: 9, 16, 8, 2, 23, 5 into a hash table. Please note that h1 and h2 use
different modulus values!

h(9, 0) = (h1 (9) + 0 × h2 (9)) ≡ 2 mod 7

0 1 2 3 4 5 6

h(16, 0) = (h1 (16) + 0 × h2 (16)) ≡ 2 mod 7


h2 (16) = (2 × 16 mod 6) + 1 = 3
h(16, 1) = (h1 (16) + 1 × h2 (16)) ≡ 2 + 1 × 3 ≡ 5 mod 7

0 1 2 3 4 5 6

9 16

h(8, 0) = (h1 (8) + 0 × h2 (8)) ≡ 1 mod 7

0 1 2 3 4 5 6

8 9 16

h(2, 0) = (h1 (2) + 0 × h2 (2)) ≡ 2 mod 7


h2 (2) = (2 × 2 mod 6) + 1 = 5
h(2, 1) = (h1 (2) + 1 × h2 (2)) ≡ 2 + 1 × 5 ≡ 0 mod 7

0 1 2 3 4 5 6

2 8 9 16

h(23, 0) = (h1 (23) + 0 × h2 (23)) ≡ 2 mod 7


h2 (23) = (2 × 23 mod 6) + 1 = 5
h(23, 1) = (h1 (23) + 1 × h2 (23)) ≡ 2 + 1 × 5 ≡ 0 mod 7
h(23, 2) = (h1 (23) + 2 × h2 (23)) ≡ 2 + 2 × 5 ≡ 5 mod 7
h(23, 3) = (h1 (23) + 3 × h2 (23)) ≡ 2 + 3 × 5 ≡ 3 mod 7

0 1 2 3 4 5 6

2 8 9 23 16

h(5, 0) = (h1(5) + 0 × h2(5)) ≡ 5 mod 7
h2(5) = (2 × 5 mod 6) + 1 = 5
h(5, 1) = (h1(5) + 1 × h2(5)) ≡ 5 + 1 × 5 ≡ 3 mod 7
h(5, 2) = (h1(5) + 2 × h2(5)) ≡ 5 + 2 × 5 ≡ 1 mod 7
h(5, 3) = (h1(5) + 3 × h2(5)) ≡ 5 + 3 × 5 ≡ 6 mod 7

0 1 2 3 4 5 6

2 8 9 23 16 5

(finis)
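For illustration, insertion with double hashing could be sketched in C++ as below. The concrete h1 and h2 here are illustrative choices (not the functions from Example 6), the keys are assumed non-negative, and visiting all cells is only guaranteed when the step and m are coprime (eg. when m is prime):

#include <cstddef>
#include <optional>
#include <vector>

// Insertion with double hashing into an open-addressing table (a sketch).
// Assumes non-negative keys and a table size m >= 2.
bool double_hash_insert(std::vector<std::optional<int>>& table, int x) {
    const std::size_t m = table.size();
    const std::size_t h1 = static_cast<std::size_t>(x) % m;
    const std::size_t h2 = static_cast<std::size_t>(x) % (m - 1) + 1;   // step is never 0
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t k = (h1 + i * h2) % m;
        if (!table[k].has_value()) {       // empty cell found
            table[k] = x;
            return true;
        }
    }
    return false;                          // all m probes failed: table is full
}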

3 Hash table resizing


In the chaining method a hash table is resized when the number of data items n > 2a (where
a stands for the size of the table). In probing methods the threshold is n > 0.8a. Resizing
requires allocating a longer array and copying all the values. There are dynamic
resizing methods that do not copy all items in one step.

4 Perfect hashing
Definition 7. A perfect hashing function returns different value for each key.
A perfect hashing function guarantees no conflicts.
Definition 8. A minimal perfect hashing function is a perfect hashing function that for
all n keys returns indices from interval [0, n − 1].
A perfect minimal hashing function guarantees no conflicts and no empty buckets
in a hash table.

5 Computation complexity for hash tables


Hash tables join advantages of two container types:
• They have low complexity of operations (as vectors) – cf. Tab. 1.

• They may have non-consecutive indices (as associative arrays).


Unfortunately there are no free lunches – hash tables are associative and fast, but the stored
keys are not ordered. Hash tables are commonly used to implement associative un-
ordered data containers (eg. std::unordered_map in the C++ STL library).

Table 1: Complexity of operations in a hash table.
complexity
operation average worst
search O(1) O(n)
insert O(1) O(n)
delete O(1) O(n)

Heaps
Krzysztof Simiński

Algorithms and data structures


lecture 07, 17th April 2020

Binary heap is a data structure similar to binary trees, but it differs from trees in
two ways (Fig. 1):
• (order property) Each node holds a value greater than or equal to values held
by its children (maximum heap).1
• (shape property) All levels in a heap are full. The only exception is the lowest
level that may be filled partially starting from the left.
Heaps are rarely stored in trees. Commonly we use arrays for heaps. (Fig. 2). A
heap is stored in an array indexed from one. On the top of a heap (in the first cell of an
array) we store the maximal value. Such a representation makes it very easy to access
a parent or children of a value.
If a value is stored in the i-th cell, then (cf. the sketch below):
• its children have indices 2i and 2i + 1;
• its parent has index ⌊i/2⌋.
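A minimal C++ sketch of this index arithmetic for a 1-based array (the function names are ours):

#include <cassert>

// 1-based heap indexing, as used throughout this lecture.
int parent(int i)      { return i / 2; }     // integer division = floor(i / 2)
int left_child(int i)  { return 2 * i; }
int right_child(int i) { return 2 * i + 1; }

int main() {
    // In the heap of Fig. 1, cell 6 holds 26; its parent (cell 3) holds 30
    // and its left child would be cell 12 (holding 21).
    assert(parent(6) == 3);
    assert(left_child(6) == 12 && right_child(6) == 13);
}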

Problem 1. What are the indices of children and a parent of a value, if a heap is
indexed from zero instead of one? 
1 It is also possible to define a minimum heap with the reversed order.

45

14 30

12 4 26 20

10 8 1 2 21

Figure 1: Example of a binary heap. Each node holds a value greater than the values of its children.
Level l = 1 has 2^{l−1} = 1 node, level l = 2 has 2^{l−1} = 2 nodes, level l = 3 has
2^{l−1} = 4 nodes. These levels are full. The only partially filled level is the last one.

1
45 14 30 12 4 26 20 10 8 1 2 21

1 2 3 4 5 6 7 8 9 10 11 12

Figure 2: An array representation of the heap presented in Fig. 1.

1 procedure sift_up ( A , value , right ) // O(log n)


2 // A new value is inserted at the last position in a heap
3 begin
4 A [ right ] ← value ;
5 child ← right ;
6 while child > 1 do
7 parent ← floor ( child / 2 ) ; // child’s parent
8 i f A [ parent ] < A [ child ] then
9 swap ( A [ parent ] , A [ child ] ) ;
10 child ← parent ;
11 e l s e // parent greater than its child
12 return ;
13 end i f
14 end while
15 end .

Figure 3: Sift-up heap operation

1 Sift-up operation
A new item is added in the lowest level of a heap just after the last item. If the
lowest level is full, a new level is started from the left. In an array implementation a
new item is inserted just after the last item. The shape property is then satisfied, but we
have to satisfy the order property as well. We use the sift-up operation (Fig. 3).
Example 2. Let’s add 35 to the heap in Figs. 1 and 2. We add a new value at the end
of a heap. If the order property is not satisfied (the new item is greater than its parent),
we have to swap the new value and its parent.

45

14 30

12 4 26 20

10 8 1 2 21 35

2
Value 35 added at the end of the heap.

45 14 30 12 4 26 20 10 8 1 2 21 35

1 2 3 4 5 6 7 8 9 10 11 12 13

Value 35 sifted up to an upper level.

45

14 30

12 4 35 20

10 8 1 2 21 26

45 14 30 12 4 35 20 10 8 1 2 21 26

1 2 3 4 5 6 7 8 9 10 11 12 13

45

14 35

12 4 30 20

10 8 1 2 21 26

45 14 35 12 4 30 20 10 8 1 2 21 26

1 2 3 4 5 6 7 8 9 10 11 12 13

Value 35 is in its final location. (finis)


Theorem 1. Time complexity of the sift-up procedure is O(log n).
Problem 3. Prove theorem 1. 
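For reference, a C++ rendering of Fig. 3 might look as follows. It is a sketch: cell 0 of the vector is left unused so that the indices match the 1-based convention of the lecture, and the new value is passed directly instead of the (value, right) pair.

#include <cassert>
#include <utility>
#include <vector>

// Sift-up following Fig. 3; A[0] is a dummy cell, the heap occupies A[1..].
void sift_up(std::vector<int>& A, int value) {
    A.push_back(value);                        // new value goes to the last position
    int child = static_cast<int>(A.size()) - 1;
    while (child > 1) {
        int parent = child / 2;                // child's parent
        if (A[parent] < A[child]) {
            std::swap(A[parent], A[child]);
            child = parent;
        } else {                               // parent not smaller: order property holds
            return;
        }
    }
}

int main() {
    // The heap from Fig. 2 (cell 0 is a dummy); add 35 as in Example 2.
    std::vector<int> heap{0, 45, 14, 30, 12, 4, 26, 20, 10, 8, 1, 2, 21};
    sift_up(heap, 35);
    assert(heap[3] == 35 && heap[6] == 30 && heap[13] == 26);
}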

3
1 procedure sift_down ( A , left , right ) // O(log n)
2 begin
3 parent ← left ;
4 while parent ∗ 2 6 right do // until a parent has at least one child
5 child ← parent ∗ 2 ;
6 greater_child ← 0 ; // index of greater child
7 i f A [ parent ] < A [ child ] then
8 greater_child ← child ;
9 end i f ;
10 i f child + 1 6 right then // parent has two children
11 i f A [ parent ] < A [ child + 1 ] and
12 A [ child ] < A [ child + 1 ] then
13 greater_child ← child + 1 ;
14 end i f ;
15 end i f ;
16 i f greater_child > 0 then // greater child exists
17 swap ( A [ parent ] , A [ greater_child ] ) ;
18 parent ← greater_child ;
19 e l s e // no greater child: heap order restored
20 return ;
21 end i f ;
22 end while ;
23 end .

Figure 4: Sift-down operation.

2 Sift-down operation
Sift-down is an essential operation in a binary heap. We place an item on the top
of a heap. If the order property is not satisfied we have to swap an item with one of
its children. The case is more complicated because we have to decide which child an
item should be swapped with (Fig. 4).
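A C++ sketch of Fig. 4, again with cell 0 unused so that the indices match the lecture:

#include <cassert>
#include <utility>
#include <vector>

// Sift the item at index `left` down within A[1..right] (maximum heap).
void sift_down(std::vector<int>& A, int left, int right) {
    int parent = left;
    while (2 * parent <= right) {              // parent has at least one child
        int child = 2 * parent;
        int greater_child = 0;                 // 0 means: no child is greater
        if (A[parent] < A[child])
            greater_child = child;
        if (child + 1 <= right &&              // parent has two children
            A[parent] < A[child + 1] && A[child] < A[child + 1])
            greater_child = child + 1;
        if (greater_child == 0) return;        // order property restored
        std::swap(A[parent], A[greater_child]);
        parent = greater_child;
    }
}

int main() {
    // Example 4: value 3 placed at the top of the heap from Fig. 1.
    std::vector<int> A{0, 3, 14, 30, 12, 4, 26, 20, 10, 8, 1, 2, 21};
    sift_down(A, 1, 12);
    assert(A[1] == 30 && A[3] == 26 && A[6] == 21 && A[12] == 3);
}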
Example 4. Value 3 has been placed at the top of the heap. Let’s sift it down.

3

14 30

12 4 26 20

10 8 1 2 21

4
3 14 30 12 4 26 20 10 8 1 2 21

1 2 3 4 5 6 7 8 9 10 11 12

Value 3 needs sifting down. We choose the greater child and swap.

30

14 3

12 4 26 20

10 8 1 2 21

30 14 3 12 4 26 20 10 8 1 2 21

1 2 3 4 5 6 7 8 9 10 11 12

Value 3 is not in its final location. We have to swap it with its greater child.

30

14 26

12 4 3 20

10 8 1 2 21

30 14 26 12 4 3 20 10 8 1 2 21

1 2 3 4 5 6 7 8 9 10 11 12

Unfortunately value 3 is still not in its final location. We have to swap it with its
greater child.

5
1 procedure heapify ( A , size ) // T (n) ∈ O(n), cf. Section 3.1
2 begin
3 start ← floor ( size / 2 ) ; // index of the parent of the last item
4 while start > 0 do
5 sift_down ( A , start , size ) ;
6 start ← start − 1 ;
7 end while
8 end .

Figure 5: Heapifying

30

14 26

12 4 21 20

10 8 1 2 3

30 14 26 12 4 21 20 10 8 1 2 3

1 2 3 4 5 6 7 8 9 10 11 12

Now value 3 is in its final location. (finis)

3 Heapifying
In the sections above we have operated on already existing heaps. Now we are
going to make a heap in an array (Fig. 5).
Example 5. Let’s heapify the array:

14 12 21 10 4 3 30 1 8 20 2 26

1 2 3 4 5 6 7 8 9 10 11 12

We start with sifting down the parent of the last item.

14 12 21 10 4 3 30 1 8 20 2 26

1 2 3 4 5 6 7 8 9 10 11 12

6
After sifting down:

14 12 21 10 4 26 30 1 8 20 2 3

1 2 3 4 5 6 7 8 9 10 11 12

We have to test the item indexed with 5:

14 12 21 10 4 26 30 1 8 20 2 3

1 2 3 4 5 6 7 8 9 10 11 12

We have to sift it down:

14 12 21 10 20 26 30 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

We have to test the item indexed with 4:

14 12 21 10 20 26 30 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

There is no need to sift it down. We test the next item, one position to the left:

14 12 21 10 20 26 30 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

We sift it down.

14 12 30 10 20 26 21 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

And the next item:

7
14 12 30 10 20 26 21 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

We swap it with its greater child:

14 20 30 10 12 26 21 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

Value 12 has children. We have to test if it needs sifting down:

14 20 30 10 12 26 21 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

Value 12 is in its location. We have to sift down value 14.

14 20 30 10 12 26 21 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

We have to sift it further down:

30 20 14 10 12 26 21 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

30 20 26 10 12 14 21 1 8 4 2 3

1 2 3 4 5 6 7 8 9 10 11 12

Finally value 14 has found its location and we have made the heap in the array. (finis)
Problem 6. Draw a tree-like version of example 5. 

8
3.1 Computation complexity
We start to heapify in the middle of an array. Cost of sifting down is O(log n).
We sift a linear number of items O(n). Thus upper estimation is O(n log n). Unfor-
tunately it is a very rough estimation. Let’s try to be more precise.
Let’s assume a heap has h levels indexed from 0 (root) up to h − 1 (leaves). Let’s
analyse the penultimate level, i.e. h − 2. Items in this level can only be sifted down to
level h − 1. Level h − 2 holds roughly 1/4 of all items in a heap. Thus a quarter
of the items may only be sifted one level down.
Sifting an item down from the i-th level takes O(h − 1 − i) operations. There are 2^i items in this level.
Let’s sum up possible sifts for all levels:
\[
\sum_{i=0}^{h-1} 2^i (h-1-i) = 2^0(h-1) + 2^1(h-2) + 2^2(h-3) + \dots + 2^{h-2}\cdot 1 + 2^{h-1}\cdot 0 \tag{1}
\]
\[
= \sum_{k=0}^{h-1} 2^{h-1-k}\, k = \sum_{k=0}^{h-1} \frac{2^{h-1}}{2^k}\, k = 2^{h-1} \sum_{k=0}^{h-1} \frac{k}{2^k} \leqslant n \sum_{k=0}^{h-1} \frac{k}{2^k} \tag{2}
\]
The sum $\sum_{k=0}^{h-1} \frac{k}{2^k}$ is convergent to a constant, so $\sum_{i=0}^{h-1} 2^i (h-1-i) \in O(n)$. It means
a heap is built in linear time.
Let’s check that the value of $\sum_{k=0}^{h-1} \frac{k}{2^k}$ is not an extremely large constant. Let’s write
the sum as a matrix:
\[
\begin{matrix}
\frac{1}{2} & & & \\
\frac{1}{4} & \frac{1}{4} & & \\
\frac{1}{8} & \frac{1}{8} & \frac{1}{8} & \\
\vdots & \vdots & \vdots & \ddots
\end{matrix} \tag{3}
\]

If we add items in rows, we get the sum. If we sum items in columns, the first column
is a geometric series with sum 1, the second a series with sum 1/2, the third with sum 1/4, etc. Eventually
we have a series of the column sums (that is also a series): 1 + 1/2 + 1/4 + 1/8 + . . . = 2.
We have just shown that the complexity of heapifying is O(n).

4 Heapsort
The idea of heapsort is very simple. First we make a heap in an array. The first
cell of the array holds the maximal value. We swap it with the last item in the array.
The maximal value is in its final position. But the value at the top of the heap needs
sifting down. So we sift it down. The second maximal value is at the top of the heap.
We swap it with the penultimate value in the array. In each iteration the heap is one
item shorter and the sorted array – one item longer.
Example 7. Let’s sort the array:

14 12 21 10 4 3 30 1 8 20 2 26

9
1 procedure heapsort ( A , size )
2 // Topt (n) = Tpes (n) = Tavg (n) ∈ O(n log n)
3 begin
4 heapify ( A , size ) ;
5 right ← size ;
6

7 while right > 1 do


8 // the top of a heap holds maximal value
9 swap ( A [ right ] , A [ 1 ] ) ;
10 right ← right − 1 ;
11 sift_down ( A , 1 , right ) ; // place the item in its location
12 end while
13 end .

Figure 6: Heapsort.

First we heapify the array (we have already done it in example 5).

heap

30 20 26 10 12 14 21 1 8 4 2 3

We swap the last item with the first one and sift value 3 down:

heap sorted

26 20 21 10 12 14 3 1 8 4 2 30

heap sorted

21 20 14 10 12 2 3 1 8 4 26 30

heap sorted array

20 12 14 10 4 2 3 1 8 21 26 30

heap sorted array

14 12 8 10 4 2 3 1 20 21 26 30

heap sorted array

12 10 8 1 4 2 3 14 20 21 26 30

10
heap sorted array

10 4 8 1 3 2 12 14 20 21 26 30

heap sorted array

8 4 2 1 3 10 12 14 20 21 26 30

heap sorted array

4 3 2 1 8 10 12 14 20 21 26 30

heap sorted array

3 1 2 4 8 10 12 14 20 21 26 30

heap sorted array

2 1 3 4 8 10 12 14 20 21 26 30

heap sorted array

1 2 3 4 8 10 12 14 20 21 26 30

sorted array

1 2 3 4 8 10 12 14 20 21 26 30

(finis)

4.1 Computational complexity


First we heapify – O(n). Then n − 1 times we swap and sift down, which makes
O(n log n). Finally, the complexity of heapsort is O(n log n). The complexity is independ-
ent of the data to sort. No extra array is needed (in situ algorithm).
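Putting the pieces together, a complete heapsort can be sketched in C++ as below. Unlike the lecture’s 1-based pseudocode, this sketch uses ordinary 0-based vectors (children of i are 2i + 1 and 2i + 2); the sift-down is folded in as a lambda.

#include <algorithm>
#include <cassert>
#include <vector>

void heapsort(std::vector<int>& A) {
    const int n = static_cast<int>(A.size());

    // Sift A[i] down within A[0..size-1].
    auto sift_down = [&A](int i, int size) {
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;
            if (child + 1 < size && A[child] < A[child + 1]) ++child;  // greater child
            if (A[i] >= A[child]) return;      // heap order restored
            std::swap(A[i], A[child]);
            i = child;
        }
    };

    for (int i = n / 2 - 1; i >= 0; --i)       // heapify: O(n)
        sift_down(i, n);

    for (int right = n - 1; right > 0; --right) {
        std::swap(A[0], A[right]);             // move the current maximum to its place
        sift_down(0, right);                   // restore the heap on A[0..right-1]
    }
}

int main() {
    std::vector<int> A{14, 12, 21, 10, 4, 3, 30, 1, 8, 20, 2, 26};   // Example 7
    heapsort(A);
    assert(std::is_sorted(A.begin(), A.end()));
}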

11
Mergesort, linear sorting
Krzysztof Simiński
Algorithms and data structures
lecture 08, 24th April 2020

1 Mergesort
Mergesort is an example of the “divide and conquer” paradigm. A task is split into
subtasks until it is trivial to solve them. Then the results of the subtasks are merged into
a final result of the initial task.
In mergesort algorithm a crucial operation is merging of sorted subarrays into one
merged sorted array. Fortunately it can be done fast.

1.1 Merging of sorted arrays


Merging of sorted arrays does not even need all data to be read into memory at
once. We can take item by item from two streams of data.
Example 1. Let’s merge two sorted arrays of numbers.

12 9 7 6 3

15 8 4

We take one value from each input array and test which one is smaller. We move the
smaller value from the input into the output.

12 9 7 6 3

15 8 4

1
12 9 7 6

15 8 4

Then we test the two values at the front of the input streams and again check which one is smaller. The
smaller value is moved to the output.

12 9 7 6

4 3

15 8

We repeat this procedure.

12 9 7

6 4 3

15 8

12 9

7 6 4 3

15 8

2
12 9

8 7 6 4 3

15

12

9 8 7 6 4 3

15

One of the streams is empty. We just move all values from the non-empty stream to the output.

12 9 8 7 6 4 3

15

Eventually we get a merged sorted array.

15 12 9 8 7 6 4 3

(finis)

Each value is taken from input array only once. One comparison is enough to
decide which value to move. Each value is put into output only once. Thus the com-
putational complexity of merging algorithm is linear.

3
1 procedure mergesort ( A , down , up ) ;
2 i f down < up then
3 s ← ( down + up ) / 2 ;
4

5 mergesort ( A , down , s ) ;
6 mergesort ( A , s + 1 , up ) ;
7

8 // merge subarrays
9 B [ down . . up ] ; // auxiliary array
10 left ← down ;
11 right ← s + 1 ;
12

13 f o r i ← down to up do
14 i f left > s then
15 B [ i ] ← A [ right ] ;
16 right ← right + 1 ;
17 e l s e i f right > up then
18 B [ i ] ← A [ left ] ;
19 left ← left + 1 ;
20 e l s e i f A [ left ] < A [ right ] then
21 B [ i ] ← A [ left ] ;
22 left ← left + 1 ;
23 else
24 B [ i ] ← A [ right ] ;
25 right ← right + 1 ;
26 end i f ;
27 end f o r ;
28

29 // copy sorted subarray


30 f o r i ← down to up do
31 A[i] ← B[i ];
32 end f o r ;
33

34 end i f ;
35 end procedure ;

Figure 1: Mergesort algorithm.

4
10 4 3 9 12 1 20 7 15 5

10 4 3 9 12 1 20 7 15 5

10 4 3 9 12 1 20 7 15 5

10 4 3 9 12 1 20 7 15 5

10 4 1 20

Figure 2: Mergesort: the first step – split into subarrays.

1.2 Sorting
The pseudocode of mergesort is presented in Fig. 1.

Example 2. Mergesort is a recursive algorithm. First we split an array into two equal
parts and call the algorithm for both parts until we get one-item arrays (Fig. 2). One-
item arrays have only one item each, so they are sorted. Now we only have to merge
sorted arrays into a final sorted array. (Fig. 3). (finis)

1.3 Time complexity


Complexity analysis of mergesort is similar to the analysis of quicksort, but much
easier. The execution of the algorithm is independent of the data in the array to sort.
Thus the worst, average, and best time complexity are exactly the same. The array is always
split into two equal parts. Then we run the algorithm for both parts recursively (Fig. 4).
The depth of recursion is O(log n). Merging of subarrays is done in linear time. The
number of operations in each level of recursion is O(n). Thus the time complexity of
mergesort is Tt (n) ∈ O(n log n).

1.4 Space complexity


Mergesort requires an auxiliary array. Space complexity is Ts ∈ O(n). Mergesort
is not an in situ algorithm.
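A C++ sketch of the algorithm from Fig. 1; the auxiliary buffer B makes the O(n) space cost explicit.

#include <algorithm>
#include <cassert>
#include <vector>

// Sort A[down..up] (both ends inclusive), following the structure of Fig. 1.
void mergesort(std::vector<int>& A, int down, int up) {
    if (down >= up) return;
    int s = (down + up) / 2;
    mergesort(A, down, s);
    mergesort(A, s + 1, up);

    std::vector<int> B(up - down + 1);         // auxiliary array
    int left = down, right = s + 1;
    for (int i = 0; i < static_cast<int>(B.size()); ++i) {
        if (left > s)                 B[i] = A[right++];   // left subarray exhausted
        else if (right > up)          B[i] = A[left++];    // right subarray exhausted
        else if (A[left] < A[right])  B[i] = A[left++];
        else                          B[i] = A[right++];
    }
    std::copy(B.begin(), B.end(), A.begin() + down);       // copy the sorted subarray back
}

int main() {
    std::vector<int> A{10, 4, 3, 9, 12, 1, 20, 7, 15, 5};  // the array from Fig. 2
    mergesort(A, 0, static_cast<int>(A.size()) - 1);
    assert(std::is_sorted(A.begin(), A.end()));
}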

5
1 3 4 5 7 9 10 12 15 20

3 4 9 10 12 1 5 7 15 20

3 4 10 9 12 1 7 20 5 15

4 10 3 9 12 1 20 7 15 5

10 4 1 20

Figure 3: Mergesort: the second step – merging of sorted subarrays.

n items

log n merges

Figure 4: Array split in recursive calls of mergesort.

6
a1 6 a2
T F

a2 6 a3 a1 6 a3
T F T F

a1 6 a2 6 a3 a1 6 a3 a1 6 a3 6 a2 a2 6 a3
T F T F

a1 6 a3 6 a2 a3 6 a1 6 a2 a3 6 a2 6 a1 a2 6 a1 6 a3

Figure 5: Decision tree for sorting values a1 , a2 , and a3 . “T” stands for true, “F” – false.
Leaves (final sorted permutations) are coloured.

2 Linear time sorting


The sorting algorithms we have discussed so far share a common feature: they are
based on comparison of values. Let’s analyse the lower limit of complexity of such
algorithms.
Example 3. Let’s assume we would like to sort a sequence of three numbers (a1 , a2 , a3 ).
We compare a1 with the others to find the final permutation. If a1 6 a2 and a2 6 a3 ,
we know the sorted permutation is (a1 , a2 , a3 ). If a1 6 a2 and a1 6 a3 , we cannot
answer the question. We have to compare a2 and a3 . All possibilities are presented as
a decision tree in Fig. 5. (finis)
Theorem 1. The height of each decision tree representing a comparison based algorithm
that sorts n items is Ω(n log n).
Proof. Let’s construct a decision tree (of height h) representing sorting of n values. The
tree has at least n! leaves, because there are n! possible permutations of n values. A binary
tree of height h has at most 2^h leaves. Thus

n! 6 2h (1)
log2 (n!) 6 h (2)

We use Stirling’s formula to estimate the factorial:
\[
\left(\frac{n}{e}\right)^n < n!, \tag{3}
\]
where e is the base of the natural logarithm.
\[
n \log_2 \frac{n}{e} < \log_2 (n!) \leqslant h \tag{4}
\]
\[
n \log_2 n - n \log_2 e < h \tag{5}
\]

We use lower asymptotic limit:

h ∈ Ω(n log n). (6)

7
1 procedure countingsort ( in_array [ 1 . . n ] , out_array [ 1 . . n ] ,
k)
2 // in_array: input array
3 // out_array: output array
4 // k: maximal value stored in an input array
5

6 f o r i ← 1 to k do // O(k)
7 counter [ i ] ← 0 ;
8 end f o r ;
9 f o r i ← 1 to n do // O(n)
10 counter [ in_array [ i ] ] ← counter [ in_array [ i ] ] + 1 ;
11 end f o r ; // counter [ i ] holds number of items equal i.
12

13 f o r i ← 2 to k do // O(k)
14 counter [ i ] ← counter [ i ] + counter [ i − 1 ] ;
15 end f o r ; // counter [ i ] holds number of items less or equal i.
16

17 f o r i ← n downto 1 do // O(n)
18 out_array [ counter [ in_array [ i ] ] ] ← in_array [ i ] ;
19 counter [ in_array [ i ] ] ← counter [ in_array [ i ] ] − 1 ;
20 end f o r ;
21 end procedure ;

Figure 6: Countsort

Each comparison is represented by an edge in a decision tree. It means the height h
of a tree is the number of comparisons made by a sorting algorithm in the worst case. Thus comparison based
sorting algorithms cannot have lower complexity than Ω(n log n).

Sorting algorithms with complexity less than O(n log n) are possible, but they do
not compare values and need some information on the data to sort.

2.1 Countsort
Countsort assumes that each of sorted numbers is an integer from interval [1, k]
for a certain k. If k ∈ O(n), complexity of countsort is O(n). The pseudocode of
countsort is presented in Fig. 6.

Example 4. We would like to sort numbers from interval [1, 6] stored in input array
in_array:

in_array 4 5 3 6 1 3 6 3 1 3

First we initialise array counter (lines 6-8 in Fig. 6):

counter 0 0 0 0 0 0

8
We count each value in array in_array and put the results in counter (lines 9-11 in
Fig. 6):

counter 2 0 4 1 1 2

In the next step we cumulate values in counter (lines 13-15):

counter 2 2 6 7 8 10

And the final step: we fill the output array out_array (lines 17-20). We fill the array from
its end!

out_array 1 1 3 3 3 3 4 5 6 6

(finis)

2.1.1 Computational complexity


In the algorithm there are four loops: two run in O(k) time and two in O(n), thus
the time complexity is O(n + k). In practice k ∈ O(n), thus T (n) ∈ O(n). Countsort has
linear time complexity.
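A C++ sketch of Fig. 6 (values are assumed to lie in [1, k]; writing the output from the end preserves stability):

#include <cassert>
#include <vector>

// Stable counting sort of values from [1, k]; O(n + k) time, O(n + k) extra space.
std::vector<int> counting_sort(const std::vector<int>& in, int k) {
    std::vector<int> counter(k + 1, 0);
    for (int v : in) ++counter[v];                      // counter[i] = number of items equal to i
    for (int i = 2; i <= k; ++i)
        counter[i] += counter[i - 1];                   // counter[i] = number of items <= i

    std::vector<int> out(in.size());
    for (int i = static_cast<int>(in.size()) - 1; i >= 0; --i)   // from the end: stability
        out[--counter[in[i]]] = in[i];
    return out;
}

int main() {
    std::vector<int> in{4, 5, 3, 6, 1, 3, 6, 3, 1, 3};            // Example 4
    std::vector<int> expected{1, 1, 3, 3, 3, 3, 4, 5, 6, 6};
    assert(counting_sort(in, 6) == expected);
}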

2.2 Stability
Countsort is stable.

Problem 1. Why? 

2.3 Radixsort
Radixsort is a sorting algorithm for numbers. We sort numbers first by the least
significant digit, then by the second least significant digit, etc. The number of possible
digit values is known exactly (in the decimal system it is 10), so we can use the countsort
algorithm (with k = 10).

Example 5. Sort numbers with radixsort.

input      sorted by      sorted by     sorted by
           units digit    tens digit    hundreds digit

329        720            720           329
457        355            329           355
657        436            436           436
839        457            839           457
436        657            355           657
720        329            457           720
355        839            657           839

9
1 procedure radixsort ( A , d )
2 // A: input array
3 // d: number of digits
4 f o r i ← 1 to d do
5 sort A with a stable sorting algorithm with regard to i-th digit ;
6 end f o r ;
7 end procedure ;

Figure 7: Radixsort

(finis)

It is necessary to use a stable sorting algorithm (like countsort) in radixsort.


Unfortunately radixsort is not an in situ algorithm.
Problem 2. Why? 

2.3.1 Computational complexity


If we sort d-digit numbers with k possible digit values and there are n numbers to sort, for
each digit position we run countsort with complexity O(n + k). There are d digit positions, thus the
complexity is O(dn + dk). If d and k are constants, we get complexity T (n) ∈ O(n).
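A C++ sketch of radixsort for non-negative integers: one stable counting pass (k = 10) per decimal digit, least significant digit first. The helper names are ours.

#include <cassert>
#include <vector>

// One stable counting pass on the digit selected by `divisor` (1, 10, 100, ...).
static void counting_pass(std::vector<int>& A, long long divisor) {
    std::vector<int> counter(10, 0), out(A.size());
    for (int v : A) ++counter[(v / divisor) % 10];
    for (int d = 1; d < 10; ++d) counter[d] += counter[d - 1];
    for (int i = static_cast<int>(A.size()) - 1; i >= 0; --i)     // from the end: stability
        out[--counter[(A[i] / divisor) % 10]] = A[i];
    A.swap(out);
}

// d passes, starting with the least significant digit.
void radixsort(std::vector<int>& A, int d) {
    long long divisor = 1;
    for (int pass = 0; pass < d; ++pass, divisor *= 10)
        counting_pass(A, divisor);
}

int main() {
    std::vector<int> A{329, 457, 657, 839, 436, 720, 355};        // Example 5
    radixsort(A, 3);
    assert((A == std::vector<int>{329, 355, 436, 457, 657, 720, 839}));
}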

2.4 Bucketsort
Bucketsort assumes numbers to sort are values from interval [0, 1). They do not
need to be integers. The algorithm assumes a uniform distribution of numbers. The
numbers are located in buckets. The number of buckets equals the number of values to
sort. Because the values have a uniform distribution, the buckets hold similar numbers
of values. The last step is sorting of values in each bucket (Fig. 9).
Example 6. Use bucketsort to sort the numbers in array A (Fig. 8). (finis)

2.4.1 Computational complexity


The number of buckets equals the number of values to sort. The values have a
uniform distribution. The average number of values in a bucket is one. This is why the
average execution time is linear. In the pessimistic case all values are located in the same
bucket. In such a case the time complexity is Tworst ∈ O(n^2), because we use insertion
sort.
The algorithm needs an auxiliary array the size of which is O(n).

10
A (input)          B (buckets after sorting each bucket)

 1  0.64            0:
 2  0.23            1: 0.14  0.18
 3  0.14            2: 0.21  0.23  0.27
 4  0.27            3: 0.37
 5  0.99            4:
 6  0.78            5:
 7  0.21            6: 0.64
 8  0.37            7: 0.74  0.78
 9  0.18            8:
10  0.74            9: 0.99

Figure 8: Example of bucketsort.

1 procedure bucketsort ( A [ 1 . . n ] )
2 // A: input array
3 f o r i ← 1 to n do
4 append A [ i ] to list B [ ⌊n · A [ i ]⌋ ] ;
5 end f o r ;
6 f o r i ← 0 to n − 1 do
7 sort list B [ i ] with insertion sort ;
8 end f o r ;
9 merge lists B [ 0 ] , B [ 1 ] , . . . , B [ n − 1 ] ;
10 end procedure ;

Figure 9: Bucketsort

11
Dynamic programming
Krzysztof Simiński
Algorithms and data structures
lecture 09, 02nd May 2020

Dynamic programming is an algorithm design paradigm similar to «divide and con-


quer» approach. It splits a task into subtasks, solves subtasks, and merges results of
subtasks into a final solution of the task.
Example 1. Quicksort is a «divide and conquer» algorithm. First it moves items in an
array, then the algorithm is run for two parts of the array. The merging step is trivial. (finis)
Example 2. Mergesort is a «divide and conquer» algorithm. First the algorithm is
run for two equal parts of an array. The split is trivial. The most complicated part of
the algorithm is merging of subresults into the final result. (finis)
In dynamic programming a task is also split into subtasks. The subtasks are solved
and the results are merged (combined) into a final result. In dynamic programming the
subtasks are not independent. The same subtask may occur many times. To avoid
solving exactly the same subtask multiple times, it is solved only once and the result is stored
in memory for future reference.
«Dynamic programming» is a historical name. Today it may be misleading. «Pro-
gramming» means «solving», e.g. «linear programming» is a procedure for solving
linear systems (often with constraints). «Dynamic» does not mean «dynamic alloca-
tion of memory», it means the subresults are stored in arrays.

1 Matrix chain multiplication problem


As an example of the dynamic programming approach we first analyse the matrix chain
multiplication problem. Matrix A with r rows and c columns is a matrix
\[
A[r, c] = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1c} \\
a_{21} & a_{22} & \dots & a_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
a_{r1} & a_{r2} & \dots & a_{rc}
\end{pmatrix}.
\]

Multiplication of two matrices is defined as
\[
A[p, q] \times B[q, r] = C[p, r] = \begin{pmatrix}
\sum_{i=1}^{q} a_{1i} b_{i1} & \sum_{i=1}^{q} a_{1i} b_{i2} & \dots & \sum_{i=1}^{q} a_{1i} b_{ir} \\
\sum_{i=1}^{q} a_{2i} b_{i1} & \sum_{i=1}^{q} a_{2i} b_{i2} & \dots & \sum_{i=1}^{q} a_{2i} b_{ir} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i=1}^{q} a_{pi} b_{i1} & \sum_{i=1}^{q} a_{pi} b_{i2} & \dots & \sum_{i=1}^{q} a_{pi} b_{ir}
\end{pmatrix}
\]

1
A1 A2 A3 A4 A5

(A1 )(A2 A3 A4 A5 ) (A1 A2 )(A3 A4 A5 ) (A1 A2 A3 )(A4 A5 ) (A1 A2 A3 A4 )(A5 )

(A3 )(A4 A5 ) (A1 )(A2 A3 )

(A3 A4 )(A5 ) (A1 A2 )(A3 )

(A2 )(A3 A4 A5 ) (A1 A2 A3 )(A4 )


(A2 A3 )(A4 A5 ) (A1 A2 )(A3 A4 )
(A2 A3 A4 )(A5 ) (A1 )(A2 A3 A4 )

Figure 1: Example of recursive solution to matrix chain multiplication problem. The


sequence A1 A2 is analysed three times.

Scalar multiplication (multiplication of floating point numbers) is the dominant op-
eration in this task. Multiplication [p, q] × [q, r] results in a matrix [p, r], each cell of
which needs q scalar multiplications. Thus the total number of scalar multiplications
is pqr, which results in computational complexity O(n^3), where n stands for the size of
one dimension of the matrices.
In the matrix chain multiplication problem we have a sequence A1 A2 A3 . . . An of
matrices to multiply. Matrix multiplication is not commutative (AB ≠ BA). We may
not swap matrices in a sequence. But matrix multiplication is associative. We can
parenthesize a sequence in any way and the result (product) is the same. If we group matrices in
a clever way, we may greatly reduce the number of scalar multiplications.
Example 3. Sequence of matrices hA1 , A2 , A3 i whose dimensions are A1 [10, 100],
A2 [100, 5], A3 [5, 50]. If we parenthesize matrices in this way (A1 A2 ) A3 , the number
of scalar multiplications is 7500. But if we group matrices in that way A1 (A2 A3 ), we
need 75000 scalar multiplications to get the final result. (finis)

Let’s define the matrix chain multiplication problem formally. Given a sequence
of n matrices (A1 , A2 , . . . , An ) find an optimal grouping of matrices that requires the
least number of scalar multiplications.
Number P (n) of different groupings of a sequence of n matrices is:
\[
P(n) = \begin{cases}
1, & \text{for } n = 1 \\
\sum_{k=1}^{n-1} P(k)\, P(n-k), & \text{for } n \geqslant 2
\end{cases} \tag{1}
\]
Or in a closed form: P (n) = C(n − 1), where the n-th Catalan number
$C(n) = \frac{1}{n+1}\binom{2n}{n} \in \Omega\!\left(\frac{4^n}{n^{3/2}}\right)$.
Let’s denote by Ai...j = Ai Ai+1 . . . Aj a product of matrices from i-th to j-th.

2
Let’s analyse a task for five matrices (without loss of generality). Given a se-
quence of matrices A1 A2 A3 A4 A5 their dimensions must fit: the number of columns of a
previous matrix equals the number of rows of the next matrix. Because the dimensions of
matrices must fit, we do not have to store numbers of rows of the (i + 1)-th matrix,
because it equals the number of columns of the i-th matrix. It is enough to store a se-
quence of numbers p0 p1 . . . pn . We can easily reconstruct dimensions of i-th matrix:
Ai [pi−1 , pi ].
If we solve the problem with «divide and conquer» paradigm, we have to test all
groupings of matrices into two groups (Fig. 1): (A1 )(A2 A3 A4 A5 ), (A1 A2 )(A3 A4 A5 ),
(A1 A2 A3 )(A4 A5 ), (A1 A2 A3 A4 )(A5 ). Then each subsequence longer than 2 has to
be grouped further. On each level we choose the minimal solution.
We have to solve one more problem. We have to merge subsolutions. If we have
computed the cost ma...b of multiplying a sequence Aa...b and the cost mc...d of a sequence
Ac...d (with c = b + 1), what is the cost for sequence Aa...b Ac...d ? Cost ma...d of multiplication
of matrices in sequence Aa...d is a sum of cost ma...b , cost mc...d , and cost M of
multiplication of the two resulting matrices. Matrix Aa...b has as many rows as matrix Aa (let’s de-
note it as rAa ) and as many columns as matrix Ab (let’s denote it as cAb ). Matrix Ac...d
has as many rows as matrix Ac (let’s denote it as rAc = cAb ) and as many columns as
matrix Ad (let’s denote it as cAd ). Multiplication cost M is M = rAa cAb cAd . Finally the
multiplication cost for the sequence is
ma...d = ma...b + mc...d + rAa cAb cAd (2)
or
ma...d = ma...b + mc...d + pa−1 pb pd . (3)
We know how to split a task into subtasks and we know how to merge subso-
lutions. We can easily solve the problem recursively. In Fig. 1 sequence (A1 A2 ) is
printed in red. We can easily notice that its cost is elaborated many times. This is why
we put the results into two arrays:
• m[i, j] – minimal number of scalar multiplications mi...j for sequence Ai...j ,
\[
m[i, j] = \begin{cases}
0, & i = j \\
\min_{k=i}^{j-1} \left( m[i, k] + m[k + 1, j] + p_{i-1}\, p_k\, p_j \right), & i < j
\end{cases} \tag{4}
\]

• s[i, j] = k, where k stands for the k-th matrix after which we put a parenthesis,
i.e. the optimal grouping of sequence Ai Ai+1 . . . Aj is (Ai Ai+1 . . . Ak ) (Ak+1 . . . Aj ).
The algorithm presented in Fig. 2 starts with the shortest sequences and merges
them into longer ones.
Example 4. Let’s find an optimal grouping for sequence A1 A2 A3 A4 of matrices
with dimensions: A1 [20, 1], A2 [1, 100], A3 [100, 2], A4 [2, 50]. First let’s encode matrix
dimensions as a sequence (p0 , p1 , p2 , p3 , p4 ) = (20, 1, 100, 2, 50). We initialize cost
array m and split array s:
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 ∞ ∞ ∞ i=1
2 0 ∞ ∞ 2
3 0 ∞ 3
4 0 4
5 5

3
1 procedure parenthesize ( hp0 , p1 , p2 , . . . , pn i )
2 f o r i ← 1 to n do
3 m[i , i] ← 0;
4 f o r j ← i + 1 to n do
5 m [ i , j ] ← ∞ ; // minimum search
6 end f o r ;
7 end f o r ;
8

9 f o r j ← 2 to n do
10 f o r i ← j − 1 downto 1 do
11 f o r k ← i to j − 1 do
12 temp ← m [ i , k ] + m [ k + 1 , j ] + pi−1 pk pj ;
13 i f temp < m [ i , j ] then
14 m [ i , j ] ← temp ;
15 s[i , j] ← k ;
16 end i f ;
17 end f o r ;
18 end f o r ;
19 end f o r ;
20 r e t u r n {m , s} ;
21 end procedure

Figure 2: Optimal matrix grouping.

And start the algorithm:


• j=2
• i=1
• k = 1 : We test multiplication cost for sequence Ai...j = A1...2 .

m[1, 1] + m[2, 2] + p0 p1 p2 = 0 + 0 + 20 · 1 · 100 = 2000

The value is less than the value in m[1, 2] = ∞, thus we overwrite it


and actualise the grouping location in s[1, 2] with k = 1.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 ∞ ∞ i=1 1
2 0 ∞ ∞ 2
3 0 ∞ 3
4 0 4
5 5
• j=3

• i=2
• k = 2 : We test multiplication cost for sequence Ai...j = A2...3 .

m[2, 2] + m[3, 3] + p1 p2 p3 = 0 + 0 + 1 · 100 · 2 = 200

4
The value is less than the value in m[2, 3] = ∞, thus we overwrite it
and actualise the grouping location in s[2, 3] with k = 2.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 ∞ ∞ i=1 1
2 0 200 ∞ 2 2
3 0 ∞ 3
4 0 4
5 5
• i = 1: Sequence A1...3 has two groupings:
• k = 1: We test multiplication cost for sequence Ai...j = A1...3 grouped
into A1...1 A2...3 .

m[1, 1] + m[2, 3] + p0 p1 p3 = 0 + 200 + 20 · 1 · 2 = 240

The value is less than the value in m[1, 3] = ∞, thus we overwrite it


and actualise the grouping location in s[1, 3] with k = 1.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 ∞ i=1 1 1
2 0 200 ∞ 2 2
3 0 ∞ 3
4 0 4
5 5
• k = 2: We test multiplication cost for sequence Ai...j = A1...3 grouped
into A1...2 A3...3 .

m[1, 2] + m[3, 3] + p0 p2 p3 = 2000 + 0 + 20 · 100 · 2 = 6000

The value is greater than the value in m[1, 3] = 240. We do not
modify it.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 ∞ i=1 1 1
2 0 200 ∞ 2 2
3 0 ∞ 3
4 0 4
5 5
• j=4
• i=3
• k = 3: We test multiplication cost for sequence Ai...j = A3...4 .

m[3, 3] + m[4, 4] + p2 p3 p4 = 0 + 0 + 100 · 2 · 50 = 10000

The value is less than the value in m[3, 4] = ∞, thus we overwrite it


and actualise the grouping location in s[3, 4] with k = 3.

5
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 ∞ i=1 1 1
2 0 200 ∞ 2 2
3 0 10000 3 3
4 0 4
5 5
• i=2
• k = 2: We test multiplication cost for sequence Ai...j = A2...4 grouped
into A2...2 A3...4 .
m[2, 2] + m[3, 4] + p1 p2 p4 = 0 + 10000 + 1 · 100 · 50 = 15000
The value is less than the value in m[2, 4] = ∞, thus we overwrite it
and actualise the grouping location in s[2, 4] with k = 2.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i = 1 0 2000 240 ∞ i=1 1 1
2 0 200 15000 2 2 2
3 0 10000 3 3
4 0 4
5 5
• k = 3: We test multiplication cost for sequence Ai...j = A2...4 grouped
into A2...3 A4...4 .
m[2, 3] + m[4, 4] + p1 p3 p4 = 200 + 0 + 1 · 2 · 50 = 300
The value is less than the value in m[2, 4] = 15000, thus we overwrite
it and actualise the grouping location in s[2, 4] with k = 3.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 ∞ i=1 1 1
2 0 200 300 2 2 3
3 0 10000 3 3
4 0 4
5 5
• i=1
• k = 1: We test multiplication cost for sequence Ai...j = A1...4 grouped
into A1...1 A2...4 .
m[1, 1] + m[2, 4] + p0 p1 p4 = 0 + 300 + 20 · 1 · 50 = 1300
The value is less than the value in m[1, 4] = ∞, thus we overwrite it
and actualise the grouping location in s[1, 4] with k = 1.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 1300 i=1 1 1 1
2 0 200 300 2 2 3
3 0 10000 3 3
4 0 4
5 5

6
1 procedure multiply_matrix_chain ( A = hA1 , A2 , . . . , An i , s , i , j
)
2 i f j > i then
3 X ← multiply_matrix_chain (A , s , i , s[i , j ]) ;
4 Y ← multiply_matrix_chain (A , s , s[i , j] + 1 , j) ;
5 r e t u r n multiplication ( X , Y ) ; // multiply matrices
6 else
7 r e t u r n Ai ;
8 end procedure ;

Figure 3: Optimal multiplication of a sequence of matrices.

• k = 2: We test multiplication cost for sequence Ai...j = A1...4 grouped


into A1...2 A3...4 .

m[1, 2] + m[3, 4] + p0 p2 p4 = 2000 + 10000 + 20 · 100 · 50 = 112000

The value is greater than the value in m[1, 4] = 1300. We do not


modify it.
• k = 3: We test multiplication cost for sequence Ai...j = A1...4 grouped
into A1...3 A4...4 .

m[1, 3] + m[4, 4] + p0 p3 p4 = 240 + 0 + 20 · 2 · 50 = 2240

The value is greater than the value in m[1, 4] = 1300. We do not


modify it.
We have calculated the costs of multiplications of all subsequences. Let’s find the
optimal grouping of sequence A1...4 (Fig. 3). First we read s[1, 4] = 1, so we split the
sequence after the first matrix: (A1 )(A2 A3 A4 ). Then we read s[2, 4] = 3, so we split
sequence A2...4 after A3 . Thus we have (A1 )((A2 A3 )A4 ). It is the optimal grouping
with 1300 scalar multiplications.
Please notice that the multiplication cost of the three-matrix sequence A1...3 (240 scalar
multiplications) is less than the cost of the shorter sequence A1...2 (2000 scalar multiplications).
By adding one more matrix to the sequence we reduce the number of scalar multiplications. (finis)

Problem 1. What are the features of a sequence A1...n and a matrix An+1 such that the
multiplication cost of A1...n+1 is less than the cost of A1...n ? 

1.1 Computational complexity


Algorithm parenthesize (Fig. 2) has time complexity Tt ∈ O(n^3) and space
complexity Ts ∈ O(n^2), where n is the number of matrices in the sequence.
Problem 2. Why? 
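For illustration, the algorithm of Fig. 2 can be sketched in C++ and checked against the data of Example 4 (the matrix dimensions are encoded as the sequence p; row and column 0 of the tables are ignored so that the indices match the lecture):

#include <cassert>
#include <limits>
#include <vector>

// Fills the m and s tables of Fig. 2; matrix A_i has dimensions p[i-1] x p[i].
void parenthesize(const std::vector<long long>& p,
                  std::vector<std::vector<long long>>& m,
                  std::vector<std::vector<int>>& s) {
    const int n = static_cast<int>(p.size()) - 1;
    const long long INF = std::numeric_limits<long long>::max();
    m.assign(n + 1, std::vector<long long>(n + 1, 0));
    s.assign(n + 1, std::vector<int>(n + 1, 0));

    for (int j = 2; j <= n; ++j)
        for (int i = j - 1; i >= 1; --i) {
            m[i][j] = INF;                     // minimum search
            for (int k = i; k <= j - 1; ++k) {
                long long temp = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                if (temp < m[i][j]) { m[i][j] = temp; s[i][j] = k; }
            }
        }
}

int main() {
    std::vector<long long> p{20, 1, 100, 2, 50};     // Example 4
    std::vector<std::vector<long long>> m;
    std::vector<std::vector<int>> s;
    parenthesize(p, m, s);
    assert(m[1][4] == 1300 && s[1][4] == 1 && s[2][4] == 3);
}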

2 Longest common subsequence problem


Definition 1. Subsequence can be obtained from a sequence by removing 0 or more
elements from any position.

7
Example 5. Sequence: (M, I, S, S, I, S, S, I, P, I), examples of subsequences: (I, I),
(M, I, S, S), (S, S, S, S), (M, I, P ). But you cannot swap elements! (P, I, M ) is not
a subsequence of (M, I, S, S, I, S, S, I, P, I). (finis)

Definition 2. Substring can be obtained from a sequence by removing 0 or more con-


secutive elements.
Example 6. Sequence: (M, I, S, S, I, S, S, I, P, I), examples of substrings: (M, I, S, S),
(I, S, S, I, P ), (S, I, S, S). (finis)

Definition 3. Longest common subsequence of two sequences is a common subsequence


of the maximal possible length.
The problem is commonly solved with dynamic programming. Given sequences
A = a1 a2 . . . an and B = b1 b2 . . . bm let’s assume we have read i − 1 symbols from
sequence A and j − 1 symbols from sequence B and have found the length c[i − 1, j − 1] of the
longest common subsequence of these prefixes. Now we would like to test the next symbol from
both sequences. If ai = bj , we just add the symbol and increase the length of the
common subsequence. If the symbols do not match (ai ≠ bj ), we take the longer of the two
subsequences elaborated so far. Let’s formalise it:

\[
c[i, j] = \begin{cases}
c[i - 1, j - 1] + 1 & \text{when } a_i = b_j \\
\max(c[i, j - 1], c[i - 1, j]) & \text{when } a_i \neq b_j
\end{cases} \tag{5}
\]

with the boundary conditions:

c[i, 0] = c[0, j] = 0 for any valid i, j (6)

Example 7. Let’s find the longest common subsequence for sequences X = (A, B, A,
C, A, A, B, A) and Y = (A, C, A, B, C, B). First we fill a matrix with the algorithm
presented in Fig. 4.

8
1 procedure find_LCS ( X = hx1 , . . . , xm i , Y = hy1 , . . . , yn i )
2 m ← length ( X ) ;
3 n ← length ( Y ) ;
4 f o r i ← 0 to m do c [ i , 0 ] ← 0 ; end f o r ;
5 f o r j ← 0 to n do c [ 0 , j ] ← 0 ; end f o r ;
6

7 f o r i ← 1 to m do
8 f o r j ← 1 to n do
9 i f xi = yj then
10 c[i , j] ← c[i − 1 , j − 1] + 1 ;
11 b[i , j] ← "- " ;
12 else
13 i f c [ i − 1 , j ] > c [ i , j −1] then
14 c[i , j] ← c[i − 1 , j ] ;
15 b[i , j] ← "↑ " ;
16 else
17 c[i , j] ← c[i , j − 1];
18 b[i , j] ← "← " ;
19 end i f ;
20 end i f ;
21 end f o r ;
22 end f o r ;
23 end procedure ;

Figure 4: Pseudocode for the longest common subsequence search algorithm.

9
Y = A C A B C B

j→ 1 2 3 4 5 6

X i↓
0 0 0 0 0 0 0
- ← - ← ← ←
A 1
0 1 1 1 1 1 1
↑ ↑ ↑ - ← ←
B 2
0 1 1 1 2 2 2
- ↑ - ↑ ↑ ↑
A 3
0 1 1 2 2 2 2
↑ - ↑ ↑ - ←
C 4
0 1 2 2 2 3 3
- ↑ - ← ↑ ↑
A 5
0 1 2 3 3 3 3
- ↑ - ↑ ↑ ↑
A 6
0 1 2 3 3 3 3
↑ ↑ ↑ - ← -
B 7
0 1 2 3 4 4 4
- ↑ - ↑ ↑ ↑
A 8
0 1 2 3 4 4 4

Fig. 5 presents an algorithm for printing the longest common subsequence. (finis)
Problem 3. The solution we have found in Example 7 is not the only common
subsequence with 4 elements. How can the algorithm be modified to get other solutions? 

If we only need the length of the longest common subsequence, we do not need all
rows of the matrix. We need only the current and the previous row.
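A sketch of this space optimisation in C++: only the length is returned and only two rows of the c table are kept.

#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Length of the longest common subsequence of X and Y; O(nm) time, O(m) extra space.
int lcs_length(const std::string& X, const std::string& Y) {
    std::vector<int> prev(Y.size() + 1, 0), curr(Y.size() + 1, 0);
    for (std::size_t i = 1; i <= X.size(); ++i) {
        for (std::size_t j = 1; j <= Y.size(); ++j) {
            if (X[i - 1] == Y[j - 1])
                curr[j] = prev[j - 1] + 1;                 // symbols match
            else
                curr[j] = std::max(curr[j - 1], prev[j]);  // take the longer prefix result
        }
        prev.swap(curr);                                   // current row becomes the previous one
    }
    return prev[Y.size()];
}

int main() {
    assert(lcs_length("ABACAABA", "ACABCB") == 4);         // the sequences of Example 7
}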

10
1 f u n c t i o n print_LCS ( b , X = hx1 , . . . , xm i , i , j )
2 i f i = 0 or j = 0 then
3 return ;
4 end i f ;
5

6 i f b [ i , j ] = " - " then


7 print_LCS ( b , X , i − 1 , j − 1 ) ;
8 print ( xi ) ;
9 else
10 i f b [ i , j ] = " ↑ " then
11 print_LCS ( b , X , i − 1 , j ) ;
12 else
13 print_LCS ( b , X , i , j − 1 ) ;
14 end i f ;
15 end i f ;
16 end procedure

Figure 5: Pseudocode for printing the longest common subsequence.

11
3 Edit distance
Edit distance is the number of single edit operations (insertions, deletions, substitu-
tions) necessary to transform one sequence into another. It is often used as a measure of
similarity of sequences, e.g. in spellchecking. It is a problem very close to the longest
common subsequence search problem. We only have to modify the cost formula. The
formula for the calculation of the edit distance:

c[i, 0] = i · cd (7)

c[0, j] = j · ci (8)

and for i ∈ [1, n] and j ∈ [1, m]:

c[i, j] = min(c[i, j − 1] + ci , c[i − 1, j] + cd , c[i − 1, j − 1] + cr (ai , bj )), (9)

where cd stands for deletion cost, ci – insertion cost, and cr – substitution cost. Vari-
ants:
• cr (x, y) can be more complex, e.g., can depend on “similarity” of symbols or
“proximity” of symbols (in spellchecking applications it is much more probable
to press ‘a’ instead of ’s’ than ’a’ instead of ’o’);
• cr (x, y) = ∞ for any pair of different symbols (a substitution must then be realised as a deletion followed by an insertion) — this is the indel distance;
• sometimes we allow to swap the neighbour symbols (as a single edit operation).
The edit distance problem is solved in a very similar way as the longest common
sequence problem – Fig. 6.
Example 8. Let’s find the edit distance between sequences X = (A, B, A, C, A, A,
B, A) and Y = (A, C, A, B, C, B). Assume costs: ci = 2, cd = 3, and cr (a, b) = 4
for a ≠ b and cr (a, a) = 0.

12
1 f u n c t i o n find_edit_distance ( X = hx1 , . . . , xm i , Y = hy1 , . . . , yn i ,
ci , cd , cr )
2 m ← length ( X ) ;
3 n ← length ( Y ) ;
4

5 for i ← 0 to m do
6 c[i , 0] ← i ∗ cd ;
7 b[i , 0] ← "↑ " ;
8 end f o r ;
9 for j ← 0 to n do
10 c[0 , j] ← j ∗ ci ;
11 b[0 , j] ← "← " ;
12 end f o r ;
13

14 f o r i ← 1 to m do
15 f o r j ← 1 to n do
16 insertion_cost = c [ i , j − 1 ] + ci ;
17 deletion_cost = c [ i − 1 , j ] + cd ;
18 replacement_cost = c [ i − 1 , j − 1 ] + cr (xi , yj ) ;
19

20 c [ i , j ] ← insertion_cost ;
21 b[i , j] ← "← " ;
22

23 if c[i , j ] > deletion_cost then


24 c[i , j ] ← deletion_cost ;
25 b[i , j] ← "↑ " ;
26 end i f ;
27 if c[i , j ] > replacement_cost then
28 c[i , j ] ← replacement_cost ;
29 b[i , j] ← "- " ;
30 end i f ;
31 end f o r ;
32 end f o r ;
33 end procedure

Figure 6: Pseudocode for edit distance problem.

13
Y = A C A B C B

j→ 1 2 3 4 5 6

← ← ← ← ← ←
X i↓
0 2 4 6 8 10 12
↑ - ← ← ← ← ←
A 1
3 0 2 4 6 8 10
↑ ↑ - ← - ← ←
B 2
6 3 4 6 4 6 8
↑ - - - ← ← ←
A 3
9 6 7 4 6 8 10
↑ ↑ - ↑ - - ←
C 4
12 9 6 7 8 6 8
↑ - ↑ - ← ↑ -
A 5
15 13 9 6 8 9 10
↑ - ↑ - - ← -
A 6
18 15 11 9 10 12 13
↑ ↑ ↑ ↑ - ← -
B 7
21 18 14 13 9 11 12
↑ - ↑ - ↑ - ←
A 8
24 21 17 14 12 13 15

X: A B A C A A B A –
Y: – – A C – A B C B

The cost of transforming sequence X into Y is 15. If we follow the arrows from the
bottom-right corner to the top-left corner, we can read off the operations; in the forward
order they are: delete A, delete B, copy A, copy C, delete A, copy A, copy B, replace A
with C, and insert B (3 + 3 + 0 + 0 + 3 + 0 + 0 + 4 + 2 = 15). (finis)
Problem 4. What is the cost of transforming sequence (A, C, B, C, A, B) into
sequence (A, A, B, C, A, C, B)? How to transform the input sequence at minimal
cost? Assume costs: ci = 3, cd = 2, and cr (a, b) = 4 for a ≠ b and cr (a, a) = 1. 

14
Graphs (part 1)
Krzysztof Simiński

Algorithms and data structures


lecture 10, 08th May 2020

Definition 1. Graph G = (V, E) is a pair of sets: a set of vertices (nodes, points) V and
a (multi)set of edges (links, arrows, arcs) E.
Number of vertices is denoted with n or |V|. Number of edges is denoted with m
or |E|.
Definition 2. An edge e = (va , vb ) is a pair of two vertices va and vb .

Some definitions allow multiple edges between the same vertices – such graphs
are called multigraphs.
Definition 3. An edge e = (va , vb ) is undirected, if (va , vb ) ∈ E and (vb , va ) ∈ E,
where E is a set of edges.

Definition 4. An undirected graph has only undirected edges.


Definition 5. An edge e = (va , vb ) is directed, if (va , vb ) ∈ E and (vb , va ) ∉ E, where E is a set of edges.
Definition 6. A directed graph has at least one directed edge.

Edge (va , vb ) joins nodes va and vb or is incident to va and vb .


Definition 7. The degree of a vertex in a graph is the number of edges incident to the
vertex.
Definition 8. The path in a graph is a sequence of vertices and edges that joins the
starting vertex and the ending vertex.
Definition 9. The length of a path is a number of edges in the path.
Definition 10. An undirected graph in which a path exists between any pair of vertices
is a connected graph.

1 Graph representation
There are two common representations of graphs in computers.
1. Adjacency list is a list of vertices and each vertex has its own list of neighbours
(vertices it has links to).

2. Adjacency matrix is a square matrix in which each vertex is represented by both a column and a row. A ‘1’ in a cell denotes that the node represented by the cell’s row is connected with the node represented by the cell’s column.
aij = 1 if (i, j) ∈ E, and aij = 0 otherwise.   (1)

Example 1. G = (V, E) is a graph with vertices: V = {v1 , v2 , v3 , v4 } and edges:


E = {(v1 , v2 ), (v1 , v3 ), (v2 , v1 ), (v2 , v4 ), (v3 , v1 ), (v3 , v2 ), (v4 , v3 )}.
Diagrammatic form of the graph:

v2 v4

v1 v3

Adjacency list:

v1 v2 v3

v2 v1 v4

v3 v1 v2

v4 v3

Adjacency matrix:
 
0 1 1 0
1 0 0 1
1 1 0 0
0 0 1 0

We can easily compute degrees of all vertices: a sum of 1’s in a row denotes an output
degree of a vertex, a sum of 1’s in a column denotes an input degree of a vertex.
(finis)
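As a small illustration (ours, not part of the lecture), both representations of the graph from Example 1 can be written down directly in Python:

# adjacency list: every vertex maps to the list of its neighbours
adjacency_list = {
    1: [2, 3],
    2: [1, 4],
    3: [1, 2],
    4: [3],
}

# adjacency matrix: a[i][j] = 1 iff the edge (v_{i+1}, v_{j+1}) exists
adjacency_matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]

out_degree_v1 = sum(adjacency_matrix[0])                  # 1's in the row of v1
in_degree_v1 = sum(row[0] for row in adjacency_matrix)    # 1's in the column of v1
print(out_degree_v1, in_degree_v1)                        # 2 2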

2 Graph searching
Graph searching is a method of visiting all vertices in a graph. There are two essential algorithms: breadth-first search (BFS) and depth-first search (DFS). They are often a base for more sophisticated graph algorithms.

1 procedure breadth_first_search ( G = (V, E), s )
2 // G: graph
3 // s: start vertex
4

5 // initialization:
6 foreach v in V \ s do
7 v . state ← unvisited ; // valid states: unvisited, visited, analysed
8 v . distance ← ∞ ; // distance from s vertex
9 v . predecessor ← null ;
10 end foreach ;
11

12 s . state ← visited ;
13 s . distance ← 0 ;
14 s . predecessor ← null ;
15

16 // breadth-first search:
17 Q . push ( s ) ; // a queue
18 while Q 6= ∅ do // while queue is not empty
19 u ← Q . pop ( ) ;
20 foreach node v incident to u do
21 i f v . state = unvisited then
22 v . state ← visited ;
23 v . distance ← u . distance + 1 ;
24 v . predecessor ← u ;
25 Q . push ( v ) ;
26 end i f ;
27 end foreach ;
28 u . state ← analysed ;
29 end while ;
30 end procedure ;

Figure 1: Breadth-first search.

2.1 Breadth first search
The breadth-first search algorithm starts from any vertex of a graph and visits all its neighbours. Having visited all the neighbours, it visits all the neighbours’ neighbours. It proceeds until all reachable vertices are visited (Fig. 1).
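A compact Python version of the pseudocode from Fig. 1 might look as follows (this is our sketch, not part of the lecture; the explicit state field is replaced by the test distance = ∞):

from collections import deque
from math import inf

def breadth_first_search(graph, s):
    # graph: dictionary vertex -> list of neighbours; s: start vertex
    distance = {v: inf for v in graph}
    predecessor = {v: None for v in graph}
    distance[s] = 0
    queue = deque([s])
    while queue:                            # while the queue is not empty
        u = queue.popleft()
        for v in graph[u]:
            if distance[v] == inf:          # v is still unvisited
                distance[v] = distance[u] + 1
                predecessor[v] = u
                queue.append(v)
    return distance, predecessor

# the graph used in Example 2 below (undirected, so every edge is listed twice)
G = {'A': ['D', 'E'], 'B': ['C'], 'C': ['B', 'E', 'F'], 'D': ['A', 'G'],
     'E': ['A', 'C', 'F'], 'F': ['C', 'E', 'H', 'I'], 'G': ['D'],
     'H': ['F'], 'I': ['F']}
d, p = breadth_first_search(G, 'E')
print(d['G'], p['G'])                       # 3 D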
Example 2. Let’s apply breadth-first search to graph G = (V, E), where: V =
{A, B, C, D, E, F, G, H, I} and edges: E = {(A, D), (A, E), (B, C), (C, F ), (D, G),
(E, F ), (F, H), (F, I)}.
Let’s use adjacency matrix (we only put 1’s):
A B C D E F G H I
A 1 1
B 1
C 1 1 1
D 1 1
E 1 1 1
F 1 1 1 1
G 1
H 1
I 1
Let’s start with vertex E (any vertex is a good choice). First we initialize all vertices
(line 6 in Fig. 1).

node state distance predecessor A B C D E F G H I


A unvisited ∞ null 1 1
B unvisited ∞ null 1
C unvisited ∞ null 1 1 1
D unvisited ∞ null 1 1
E visited 0 null 1 1 1
F unvisited ∞ null 1 1 1 1
G unvisited ∞ null 1
H unvisited ∞ null 1
I unvisited ∞ null 1
We put E into queue Q (line 17):

node state distance predecessor A B C D E F G H I


A unvisited ∞ null 1 1
B unvisited ∞ null 1
C unvisited ∞ null 1 1 1
D unvisited ∞ null 1 1
E visited 0 null 1 1 1
F unvisited ∞ null 1 1 1 1
G unvisited ∞ null 1
H unvisited ∞ null 1
I unvisited ∞ null 1
Q E
We pop E from the queue and visit all its neighbours – we push them in the queue
(line 20):

node state distance predecessor A B C D E F G H I
A visited 1 E 1 1
B unvisited ∞ null 1
C visited 1 E 1 1 1
D unvisited ∞ null 1 1
E analysed 0 null 1 1 1
F visited 1 E 1 1 1 1
G unvisited ∞ null 1
H unvisited ∞ null 1
I unvisited ∞ null 1
Q A, C, F
The queue is not empty. We pop a vertex from the queue:

node state distance predecessor A B C D E F G H I


A analysed 1 E 1 1
B unvisited ∞ null 1
C visited 1 E 1 1 1
D visited 2 A 1 1
E analysed 0 null 1 1 1
F visited 1 E 1 1 1 1
G unvisited ∞ null 1
H unvisited ∞ null 1
I unvisited ∞ null 1
Q C, F , D
A has two neighbours, but we visit only D, because E is already analysed.

node state distance predecessor A B C D E F G H I


A analysed 1 E 1 1
B visited 2 C 1
C analysed 1 E 1 1 1
D visited 2 A 1 1
E analysed 0 null 1 1 1
F visited 1 E 1 1 1 1
G unvisited ∞ null 1
H unvisited ∞ null 1
I unvisited ∞ null 1
Q F , D, B

node state distance predecessor A B C D E F G H I
A analysed 1 E 1 1
B visited 2 C 1
C analysed 1 E 1 1 1
D visited 2 A 1 1
E analysed 0 null 1 1 1
F analysed 1 E 1 1 1 1
G unvisited ∞ null 1
H visited 2 F 1
I visited 2 F 1
Q D, B, H, I

node state distance predecessor A B C D E F G H I


A analysed 1 E 1 1
B visited 2 C 1
C analysed 1 E 1 1 1
D analysed 2 A 1 1
E analysed 0 null 1 1 1
F analysed 1 E 1 1 1 1
G visited 3 D 1
H visited 2 F 1
I visited 2 F 1
Q B, H, I, G
We analyse neighbours of B. It has no unvisited neighbours.

node state distance predecessor A B C D E F G H I


A analysed 1 E 1 1
B visited 2 C 1
C analysed 1 E 1 1 1
D analysed 2 A 1 1
E analysed 0 null 1 1 1
F analysed 1 E 1 1 1 1
G visited 3 D 1
H visited 2 F 1
I visited 2 F 1
Q H, I, G
We analyse neighbours of H. It has no unvisited neighbours.

node state distance predecessor A B C D E F G H I


A analysed 1 E 1 1
B analysed 2 C 1
C analysed 1 E 1 1 1
D analysed 2 A 1 1
E analysed 0 null 1 1 1
F analysed 1 E 1 1 1 1
G visited 3 D 1
H analysed 2 F 1
I visited 2 F 1
Q I, G

We analyse neighbours of I. It has no unvisited neighbours.

node state distance predecessor A B C D E F G H I


A analysed 1 E 1 1
B analysed 2 C 1
C analysed 1 E 1 1 1
D analysed 2 A 1 1
E analysed 0 null 1 1 1
F analysed 1 E 1 1 1 1
G visited 3 D 1
H analysed 2 F 1
I analysed 2 F 1
Q G
We analyse neighbours of G. It has no unvisited neighbours.

node state distance predecessor A B C D E F G H I


A analysed 1 E 1 1
B analysed 2 C 1
C analysed 1 E 1 1 1
D analysed 2 A 1 1
E analysed 0 null 1 1 1
F analysed 1 E 1 1 1 1
G analysed 3 D 1
H analysed 2 F 1
I analysed 2 F 1
Q
The queue is empty. All vertices have been analysed. The implementation we use not only visits all vertices of a graph. It also elaborates paths from the starting vertex to all reachable vertices in the graph: the distances are computed explicitly, and the stored predecessors make it possible to reconstruct a path from the start vertex to every reachable vertex. (finis)
Problem 1. Reconstruct all paths from the starting vertex in Example 2. 

Computation complexity Each vertex is pushed into and popped from the queue at most once. Each edge is used only once. Thus Tt ∈ O(|V| + |E|).

2.2 Depth first search


The depth-first search algorithm starts with a node, visits one of its neighbours and is run recursively for that neighbour. It traverses a graph as deep as possible before backtracking (Fig. 2).
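A recursive Python sketch corresponding to Fig. 2 (our illustration; the graph is a dictionary of adjacency lists, as in the BFS sketch above):

def depth_first_search(graph, s):
    state = {v: 'unvisited' for v in graph}
    predecessor = {v: None for v in graph}

    def visit(u):
        state[u] = 'visited'
        for v in graph[u]:
            if state[v] == 'unvisited':
                predecessor[v] = u
                visit(v)
        state[u] = 'analysed'

    visit(s)
    return predecessor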
Example 3. Let’s apply depth-first search for the graph from Example 2. Again let’s
start with vertex E. We visit E. We change its state to ‘visited’. It has several neigh-
bours. We visit E’s neigbour: A (it is an arbitrary choice, we may visit any unvisited
neighbour). We change its state to ‘visited’. A has only one unvisited neighbour: D.
We visit D and change its state to ‘visited’. D has only one unvisited neighbour: G.

1 procedure depth_first_search ( G = (V, E), s )
2 // G: graph
3 // s: start vertex
4

5 // initialization:
6 foreach v in V do
7 v . state ← unvisited ; // valid states: unvisited, visited, analysed
8 v . predecessor ← null ;
9 end foreach ;
10

11 visit ( s ) ;
12 end procedure ;
13

14 procedure visit ( u )
15 u . state ← visited ;
16

17 foreach node v incident to u do


18 i f v . state = unvisited then
19 v . predecessor ← u ;
20 visit ( v ) ;
21 end i f ;
22 u . state ← analysed ;
23 end procedure ;

Figure 2: Depth-first search.

We visit G and change its state to ‘visited’. G has no unvisited neighbours, we change
its state to ‘analysed’ and trace back to D. D has no unvisited neighbours (is ‘ana-
lysed’) and we trace back to A. A has no unvisited neighbours (is ‘analysed’) and we
trace back to E. E has unvisited neighbours (C, F ). We choose one of them, eg C.
We change its state to ‘visited’. C has unvisited neighbours (B, F ). We choose one of
them, eg B. We change its state to ‘visited’. B has no unvisited neighbours (is ‘ana-
lysed’) and we trace back to C. C has one unvisited neighbour F . So we move to F .
We change its state to ‘visited’. F has unvisited neighbours (H, I). We choose one of
them, eg H. We change its state to ‘visited’. H has no unvisited neighbours (is ‘ana-
lysed’) and we trace back to F . F has one unvisited neighbour I. We move to I. We
change its state to ‘visited’. I has no unvisited neighbours (is ‘analysed’) and we trace
back to F . F has no unvisited neighbours (is ‘analysed’) and we trace back to C. C
has no unvisited neighbours (is ‘analysed’) and we trace back to E. E has no unvisited
neighbours (is ‘analysed’). All vertices have been analysed. (finis)

Problem 2. What are the values of predecessors in Example 3? 

Computation complexity Each vertex is visited only once. Each edge is traversed only once. Thus Tt ∈ O(|V| + |E|).

2.3 Connected components


A connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths. Detection of connected components is very simple. We just run the breadth-first or depth-first search algorithm. Each run detects one component. If there are still some unvisited vertices, we run the algorithm again from one of them to find the next connected component.

3 Cycles
Definition 11. The cycle is a path with the same starting and ending vertex.
Definition 12. The loop is a cycle with length 1.

Definition 13. An acyclic graph has no cycles.

3.1 Detection of cycles


We can easily detect cycles in graphs. We only have to modify the depth-first search algorithm by adding one condition (line 21 in Fig. 3). While we search a branch of the graph in depth and test the neighbours of a node, if a neighbour is already visited and is not the predecessor of the current node, we have found a cycle in the graph.
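In Python the modified depth-first search could be sketched like this (our illustration for undirected graphs; it only reports whether a cycle reachable from s exists):

def has_cycle(graph, s):
    state = {v: 'unvisited' for v in graph}
    predecessor = {v: None for v in graph}
    found = False

    def visit(u):
        nonlocal found
        state[u] = 'visited'
        for v in graph[u]:
            if state[v] == 'unvisited':
                predecessor[v] = u
                visit(v)
            elif state[v] == 'visited' and v != predecessor[u]:
                found = True        # a visited non-predecessor neighbour closes a cycle
        state[u] = 'analysed'

    visit(s)
    return found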

Problem 3. How to print nodes in a cycle? 


Problem 4. Detect (a) cycle(s) in the graph from Example 3. 

3.2 Eulerian cycle


Definition 14. An Eulerian path is a path in a finite graph that traverses every edge exactly once (vertices may be revisited).

1 procedure cycle_detection ( G = (V, E), s )
2 // G: graph
3 // s: start vertex
4

5 // initialization:
6 foreach v in V do
7 v . state ← unvisited ; // valid states: unvisited, visited, analysed
8 v . predecessor ← null ;
9 end foreach ;
10

11 visit ( s ) ;
12 end procedure ;
13

14 procedure visit ( u )
15 u . state ← visited ;
16

17 foreach node v incident to u do


18 i f v . state = unvisited then
19 v . predecessor ← u ;
20 visit ( v ) ;
21 e l s e i f v . state = visited and v 6= u . predecessor
then
22 print ( " cycle detected " ) ;
23 end i f ;
24 u . state ← analysed ;
25 end procedure ;

Figure 3: Detection of cycles

1 procedure Euler ( G = (V, E) )
2 // for any vertex v
3 Q ← ∅ ; // empty queue
4 visit ( v ) ;
5 print ( Q ) ;
6 end procedure ;
7

8 procedure visit ( v )
9 foreach vertex u incident to v do
10 remove edge ( v , u ) ;
11 visit ( u ) ;
12 Q . push ( v ) ;
13 end foreach ;
14 end procedure ;

Figure 4: Detection of Eulerian cycle.

Definition 15. An Eulerian cycle is an Eulerian path that starts and ends on the same
vertex.
The problem of the Eulerian cycle in a graph was first stated by Leonhard Euler. It is a famous problem known as the Seven Bridges of Königsberg. In Königsberg (today: Kaliningrad) two islands on the Pregel River were connected with the mainland with 7 bridges. The question is: Is it possible to cross every bridge only once and return to the starting point? Euler solved this problem in 1735 and proved it was impossible.
Theorem 1. A connected undirected graph has an Eulerian cycle if and only if each of its vertices has an even degree.
Theorem 2. A connected directed graph has an Eulerian cycle if and only if for each
vertex its input degree equals its output degree.

3.2.1 Detection of Eulerian cycle


The algorithm for detection of Eulerian cycle is presented in Fig. 4.

Example 4. Detect an Eulerian cycle in the graph below.

B D

A C E

G F

Let’s start with vertex B. We visit it. It has two neighbour. We visit C and remove
edge (B, C).

B D

A C E

G F

C has three neighbours. Let’s visit G. We remove (C, G).

B D

A C E

G F

G has only one neighbour. Let’s visit A. We remove (G, A).

B D

A C E

G F

A has only one neighbour. Let’s visit B. We remove (A, B).

B D

A C E

G F

B has no more neighbours. We push B into the queue and trace back. A has no more neighbours. We push A into the queue and trace back. G has no more neighbours. We push G into the queue and trace back. C still has two neighbours. Before we visit one of them, we push C into the queue. At this moment queue Q = (C, G, A, B). In the same way we visit D and remove (C, D).

B D

A C E

G F

We visit E.

B D

A C E

G F

We visit F .

B D

A C E

G F

We visit C.

B D

A C E

G F

Now we trace back and push vertices into the queue. Finally we print the queue Q =
(B, C, D, E, F, C, G, A, B). (finis)

3.3 Hamiltonian cycle


Definition 16. A Hamiltonian path is a path in a finite graph that traverses every vertex exactly once.

Definition 17. A Hamiltonian cycle is a Hamiltonian path that starts and ends on the
same vertex.
Determining whether Hamiltonian paths and cycles exist in a graph is an NP-complete problem. We do not know if a polynomial solution exists. We can solve it by testing all permutations of vertices.

4 Trees
There are several equivalent definitions of a tree in the graph theory.

Definition 18. A tree is a connected acyclic graph.


Definition 19. A tree is a graph in which there exists exactly one path between any two
vertices.
Definition 20. A tree is a connected graph in which removal of any edge disconnects
the graph.

Definition 21. A tree is an acyclic graph in which adding any edge makes it a cyclic graph.
Definition 22. A tree is a connected graph in which |V| = |E| + 1.
In some algorithms we use forests.
Definition 23. A forest is an acyclic graph.

4.1 Spanning tree


Definition 24. A spanning tree T of an undirected graph G is a subgraph of G that is a tree and has exactly the same vertices as graph G.
Definition 25. A weighted graph is Gw (V, E, w), where w : E → R is a function that assigns a real number to each edge. The number is called the weight (length) of an edge.
For an unweighted graph we use the depth-first search algorithm to elaborate a spanning tree. We run the algorithm and add the traversed edges to the spanning tree.
Example 5. Let’s find a spanning tree for the graph from Example 3. We start with
vertex E and transverse edge (E, A), then (A, D). (D, G). Then we trace back to
E. We transverse (E, C), (C, B). We trace back to C. We transverse (C, F ), (F, H).
Then we trace back to F and transverse (F, I). Then we trace back to E. All node have
been visited. The spanning tree has exactly the same set of nodes. The set of edges of
the spanning tree is {(E, A), (A, D), (D, G), (E, C), (C, B), (C, F ), (F, H), (F, I)}.
(finis)
For weighted graphs we define a minimal spanning tree.
Definition 26. A minimal spanning tree for a weighted graph is a spanning tree with the minimal sum of weights of edges.
The first algorithm was invented by Otakar Borůvka who used it to solve the
problem of optimisation of high voltage electric grid in Czechoslovakia in 1926.
There are several algorithms for minimal spanning trees: Borůvka’s algorithm,
Jarník-Prim algorithm, Kruskal’s algorithm. They are quite similar. We present here
the Kruskal’s algorithm because it uses an interesting data structure for fast perform-
ance.

4.1.1 Kruskal’s algorithm


Kruskal’s algorithm is a greedy algorithm. Its pseudocode is presented in Fig. 5.
Example 6. Find a minimal spanning tree for the graph below. It is an undirected
weighted graph.
3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

1 procedure Kruskal ( G = (V, E, w) )
2 // G: graph
3 // w: edge weights
4

5 // initialization:
6 // T = (Vt , Et ) // spanning tree
7 Vt ← ∅ ; // empty spanning tree
8 Et ← ∅ ;
9

10 D ← {{v1 }, {v2 }, . . . , {vn }} ; // set of disjoint sets of vertices


11

12 foreach e ∈ E do
13 Q . push ( e ) ; // priority queue
14 end foreach ;
15

16 while |D| > 1 and Q is not empty do


17 e = (u, v) ← Q . pop ( ) ; // lowest cost edge
18 i f u ∈ Du ≠ Dv ∋ v then // u and v belong to different sets in D
19 i f u ∉ Vt then
20 Vt ← Vt ∪ {u} ;
21 end i f ;
22 i f v ∉ Vt then
23 Vt ← Vt ∪ {v} ; end i f ;
24 Et ← Et ∪ {(u, v)} ;
25 replace Du and Dv in D with Du ∪ Dv ;
26 end i f ;
27 end while ;
28

29 r e t u r n T = (Vt , Et ) ;
30 end procedure ;

Figure 5: Kruskal’s algorithm for minimal spanning trees in weighted graphs.

First we put each vertex in its one-element set (line 10) and we put all edges into a
priority queue (line 13).
3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

Then we pop an edge with the smallest weight from the queue and check if the edge
joins nodes in two different sets. In our example the lightest edge joins B and E. These
vertices are in two different sets (red and violet). We add the edge to the spanning tree
(line 24) and merge (line 25) sets of nodes (now B and E are in a red set).
3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

We pop an edge with the smallest weight from the queue and check if the edge joins
nodes in two different sets. In our example it joins D and G. These vertices are in two
different sets (blue and black). We add the edge to the spanning tree and merge sets
of nodes.
3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

We pop an edge with the smallest weight from the queue and check if the edge joins
nodes in two different sets. In our example it joins A and B. These vertices are in two

different sets (yellow and red). We add the edge to the spanning tree and merge sets
of nodes.
3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

We continue with H and I.


3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

And continue with A and I.


3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

And continue with I and F.

3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

The next edge we pop from the queue has weight 7. It joins E and F. But we do not
add it to the spanning tree because both E and F are in the same set (the red set). So
we pop the next edge – its weight is 8. We can add this edge to the spanning tree.

3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

The next edge has weight 7. It joins C and E. We cannot add it. We add edge with
weight 10.

3 11
A B C

1 8
13 9

5 7
D E F

10
2
6
12 4
G H I

No more edges can be added. The minimal spanning tree has red vertices and red
edges. (finis)

Computational complexity There are two crucial points in the algorithm:


1. priority queue: pushing and popping of all edges has complexity O (|E| log |E|);

1 procedure MakeSet ( x )
2 x . parent = x ;
3 x . rank = 0;
4 end procedure ;

Figure 6: Disjoint-set data structure: make set procedure.

2. testing if two vertices belong to the same set of vertices – fortunately this procedure has low complexity due to the applied disjoint-set data structure.
The computational complexity of Kruskal’s algorithm is O (|E| log |E|).

4.1.2 Disjoint-set data structure


The crucial part of the Kruskal’s algorithm is a fast algorithm for testing if two
vertices belong to the same set. We use the disjoint-set data structure. It implements
three procedures:
• MakeSet that makes a one-element set for each item – Fig. 6;
• FindSet that returns a representative of the set an item belongs to (all items in
a set have the same representative) – Fig. 8;
• Union that merges two sets – Fig. 7.
MakeSet procedure is very simple. It creates disjoint sets with only one item each.
Each set is a one-item tree. So we have a forest of disjoint trees. Each item is its own
representative (parent).
The pivotal problem is the implementation of the Union procedure so that it updates the representatives in the merged trees fast. The height of the trees influences the speed of merging. The shallower tree is added to the root of the deeper tree. Thanks to this approach the height of the tree grows only if two trees of the same height are merged: a tree whose height is h is added to the root of a tree of height h, so the resulting tree has height h + 1. The disjoint-set data structure applies the path compression technique to make trees shallower. This is why in the Union procedure we use ranks instead of heights (Fig. 7).
The FindSet procedure returns the representative of an item. All items in a set have the same representative. A representative is an item that is its own parent (a root of a tree). So to find a representative we have to go up to the root of a tree. To make it faster we use path compression. After we have found the representative of item x, we set the representative to be the parent of all items on the path from x to the root. Thanks to this technique the amortized time per operation is O(α(n)), where α(n) is the inverse of the function n = f (x) = A(x, x), and A is the Ackermann function. The Ackermann function is defined as

A(m, n) = n + 1                          for m = 0
A(m, n) = A(m − 1, 1)                    for m > 0 and n = 0        (2)
A(m, n) = A(m − 1, A(m, n − 1))          for m > 0 and n > 0

This is an extremely fast growing function, e.g. A(4, 2) = 2^65536 − 3. Let’s denote f (n) = A(n, n). Let’s define the inverse of the function f as α = f^−1 . This is a very,
1 procedure Union ( x , y )
2

3 parent_x ← FindSet ( x ) ;
4 parent_y ← FindSet ( y ) ;
5

6 i f parent_x = parent_y then


7 r e t u r n ; // x and y belong to the same set.
8

9 // merge disjoint sets, add shorter tree to a longer one


10 i f parent_x . rank < parent_y . rank then
11 parent_x . parent ← parent_y ;
12 e l s e i f parent_x . rank > parent_y . rank then
13 parent_y . parent ← parent_x ;
14 e l s e // the same ranks
15 parent_y . parent ← parent_x ;
16 parent_x . rank ← parent_x . rank + 1 ;
17 end i f
18 end procedure

Figure 7: Disjoint-set data structure: union procedure.

1 procedure FindSet ( x )
2 i f x . parent 6= x then
3 x . parent ← FindSet ( x . parent ) ; // path compression
4 end i f ;
5

6 r e t u r n x . parent ;
7 end procedure ;

Figure 8: Disjoint-set data structure: find set procedure.

very slow growing function. α(n) is less than 5 for all remotely practical values of n,
1019729
because f (4) = A(4, 4) ≈ 22 . In practice α(n) < 5, so we treat it as a constant.
In practice the disjoint-set data set works in constant time.
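The three procedures from Figs. 6–8 translate almost literally into Python (our sketch); the short session below repeats the merges from Example 7 below:

class DisjointSet:
    def __init__(self, items):               # MakeSet for every item
        self.parent = {x: x for x in items}
        self.rank = {x: 0 for x in items}

    def find_set(self, x):                    # with path compression
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])
        return self.parent[x]

    def union(self, x, y):                    # union by rank
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return                            # x and y are already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

ds = DisjointSet("ABCDEF")
ds.union('A', 'B'); ds.union('C', 'D'); ds.union('E', 'F')
ds.union('D', 'F'); ds.union('B', 'F')
print(ds.find_set('B') == ds.find_set('E'))   # True - all six items are now in one set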
Example 7. Let’s analyse how the disjoint-set structure works. We have six items:
A, B, C, D, E, F .
First we call MakeSet procedure (Fig. 6). It creates a disjoint set for each item.
Rank of each item is zero. Each item is its own parent.

A B C D E F
0 0 0 0 0 0

Let’s merge A with B, C with D, and E with F . We call Union procedure (Fig. 7).
Union procedure calls FindSet, but it is trivial in this case.

A B C D E F
1 0 1 0 1 0

Let’s merge D and F . Union procedure calls FindSet for D and for F . C is returned
for D: C is D’s representative. And similarly E is returned for F . Both trees have
the same “height” (we do not use straight height but “height” because we use ranges
here).

A B C D E F
1 0 2 0 1 0

Let’s merge B and F . Union procedure calls FindSet for B and for F . C is returned
for F . FindSet procedure not only returns a representative but also compresses the
path. All item on the path from F to C have their parents set to C. It makes the tree
very shallow. The tree with A representative is now a subtree of the tree with higher
range.

A B C D E F
1 0 2 0 1 0

And eventually let’s find a representative for B. We call FindSet and it returns C.
FindSet procedure uses path compression. All items on the path from B to its rep-
resentative are modified and their parents are set to the representative C.

A B C D E F
1 0 2 0 1 0

(finis)

Graphs (part 2, shortest paths)
Krzysztof Simiński

Algorithms and data structures


lecture 11, 08th May 2020

Today we discuss algorithms for the calculation of the shortest paths in weighted graphs. This is a very practical problem, e.g. Dijkstra’s algorithm is used in vehicle navigation and in network routing (some protocols use the Bellman-Ford-Moore algorithm). First let’s recall some definitions.
Definition 1. Graph G = (V, E) is a pair of sets: a set of vertices (nodes, points) V and
a (multi)set of edges (links, arrows, arcs) E.
Definition 2. A weighted graph is Gw (V, E, w), where w : E → R is a function that assigns a real number to each edge. The number is called the weight (length) of an edge.
Number of vertices is denoted with n or |V|. Number of edges is denoted with m
or |E|.

shortest paths:
• between all pairs of vertices: the Floyd-Warshall algorithm;
• from one vertex to all other vertices, no negative weights: Dijkstra’s algorithm;
• from one vertex to all other vertices, negative weights allowed: the Bellman-Ford-Moore algorithm.

Figure 1: Decision tree for algorithm selection.

1 Floyd-Warshall algorithm
The Floyd-Warshall algorithm is a dynamic programming algorithm for calcula-
tion of the shortest paths between all pairs of vertices.
The idea is straightforward. Let’s assume we have found a path from vertex X to Y whose length is w. We test each vertex of the graph as an intermediate vertex: for each vertex V ∈ V we test if the path X → V → Y , composed of a part of length u and a part of length v, is shorter than the path X → Y of length w, i.e. whether u + v < w. If it is shorter, we store the information in an array (Fig. 2).
In the algorithm we use an adjacency matrix representation of a graph.
Example 1. Let’s find the shortest paths between all pairs of vertices in the graph
below.

3
B D

7
4 −1 10 2

12

A C

First we initialise the distance d0 and predecessor p0 arrays (lines 8–23 in Fig. 2). The distance array is an adjacency matrix for the graph with ∞ if an edge does not exist.
d0 A B C D p0 A B C D
A 0 4 12 ∞ A A B C 0
B −1 0 ∞ 3 B A B 0 D
C 2 7 0 10 C A B C D
D ∞ ∞ 2 0 D 0 0 C D
For each X → Y edge we test if the path via vertex A, i.e. X → A → Y , is shorter than X → Y . If it is, we update both the d (line 30) and p arrays (line 31).
dA A B C D pA A B C D
A 0 4 12 ∞ A A B C 0
B −1 0 13 3 B A B A D
C 2 6 0 10 C A A C D
D ∞ ∞ 2 0 D 0 0 C D
We repeat the same procedure for vertex B. We test if X → B → Y is shorter than
X →Y.
dB A B C D pB A B C D
A 0 4 12 7 A A B C B
B −1 0 13 3 B A B A D
C 2 6 0 9 C A A C A
D ∞ ∞ 2 0 D 0 0 C D
The same for C …

1 procedure Floyd_Warshall ( G = (V, E), w )
2 // w – edge weights
3 // d – array of distances
4 // p – array of predecessors
5

6 n ← |V| ; // number of vertices


7

8 f o r i ← 1 to n do // initialisation
9 f o r j ← 1 to n do
10 i f i = j then
11 d[i , j] ← 0;
12 e l s e i f ∃wij then
13 d [ i , j ] ← wij ;
14 else
15 d[i , j] ← ∞;
16 end i f ;
17 i f d [ i , j ] ̸= ∞ then
18 p[i , j] ← j ;
19 else
20 p[i , j] ← 0;
21 end i f ;
22 end f o r ;
23 end f o r ;
24

25 f o r k ← 1 to n do // elaboration of distances
26 f o r i ← 1 to n do
27 i f d [ i , k ] ̸= ∞ then
28 f o r j ← 1 to n do
29 i f d [ i , k ] + d [ k , j ] < d [ i , j ] then
30 d[i , j] ← d[i , k] + d[k , j ] ;
31 p[i , j] ← p[i , k ] ;
32 end i f ;
33 end f o r ;
34 end i f ;
35 end f o r ;
36 end f o r ;
37 end procedure ;

Figure 2: Floyd-Warshall algorithm

dC A B C D pC A B C D
A 0 4 12 7 A A B C B
B −1 0 13 3 B A B A D
C 2 6 0 9 C A A C A
D 4 8 2 0 D C C C D
… and D.
dD A B C D pD A B C D
A 0 4 9 7 A A B B B
B −1 0 5 3 B A B D D
C 2 6 0 9 C A A C A
D 4 8 2 0 D C C C D

(finis)

Computation complexity First we have to initialise the arrays, which takes O(n²) time. Then there are three nested loops: O(n³). Finally the time complexity is O(n³). The space complexity is O(n²) – two square arrays n × n.
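A direct Python transcription of Fig. 2 (our sketch; only the distances are computed, the predecessor array is omitted for brevity):

from math import inf

def floyd_warshall(vertices, weights):
    # weights: dictionary (u, v) -> weight of the edge u -> v
    d = {(i, j): 0 if i == j else weights.get((i, j), inf)
         for i in vertices for j in vertices}
    for k in vertices:                      # intermediate vertex
        for i in vertices:
            for j in vertices:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d

# the graph from Example 1
w = {('A', 'B'): 4, ('A', 'C'): 12, ('B', 'A'): -1, ('B', 'D'): 3,
     ('C', 'A'): 2, ('C', 'B'): 7, ('C', 'D'): 10, ('D', 'C'): 2}
d = floyd_warshall('ABCD', w)
print(d['A', 'D'], d['D', 'A'])             # 7 4, as in the final arrays of Example 1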

2 Bellman-Ford-Moore algorithm
The Bellman-Ford-Moore algorithm calculates the shortest paths from one reference vertex (the source) to all other vertices. The algorithm allows negative weights of edges (cf Fig. 1).
The algorithm is presented in Fig. 3. It uses relaxation of edges in a graph. The idea is quite simple. Let’s assume we try to relax edge (a, b) whose weight is wab. Let’s assume the distance from the source to vertex a is da and to vertex b is db. If we get to a and use edge (a, b), the distance is da + wab. If it is less than db, we use the edge in the path.
Example 2. The same graph as in Example 1. The task is to calculate the shortest paths
from vertex D (source).

3
B D

7
4 −1 10 2

12

A C

The order of edges in relaxation is: w(A, B) = 4, w(A, C) = 12, w(B, A) = −1,
w(B, D) = 3, w(C, A) = 2, w(C, B) = 7, w(C, D) = 10, w(D, C) = 2.
First we initialize two arrays: d for distance and p for predecessors (line 5 in Fig. 3):

1 procedure Bellman_Ford_Moore ( G = (V, E), w, s )
2 // w – edge weights
3 // s – start vertex
4

5 // initialisation:
6 d[s] ← 0; // array of distances
7 p[s] ← 0; // array of predecessors
8 foreach v in V \ s do
9 d[v] ← ∞; // array of distances
10 p[v] ← 0; // array of predecessors
11 end foreach ;
12

13 // relaxation of edges:
14 f o r i ← 1 to |V| − 1 do
15 foreach e = (u, v) ∈ E do // for each edge
16 i f d [ u ] + wuv < d [ v ] then
17 d [ v ] = d [ u ] + wuv ;
18 p[v] = u ;
19 end i f ;
20 end foreach ;
21 end f o r ;
22

23 end procedure ;

Figure 3: Bellman-Ford-Moore algorithm

d0 d1 d2 d3 p0 p1 p2 p3
A ∞ A 0
B ∞ B 0
C ∞ C 0
D 0 D 0

In each iteration we try to relax all edges in the following order: w(A, B) =
4, w(A, C) = 12, w(B, A) = −1, w(B, D) = 3, w(C, A) = 2, w(C, B) = 7,
w(C, D) = 10, w(D, C) = 2. We can only relax the last edge, because its starting
vertex D is the only vertex with a finite distance. The distance for D is 0. If we use
edge (D, C) with weight w = 2 we can get to C at distance 2.

d0 d1 d2 d3 p0 p1 p2 p3
A ∞ ∞ A 0 0
B ∞ ∞ B 0 0
C ∞ 2 C 0 C
D 0 0 D 0 0

In the second iteration we try to relax all edges. Having relaxed the edges we elaborate
shorter distances to vertex A and B.
d0 d1 d2 d3 p0 p1 p2 p3
A ∞ ∞ 4 A 0 0 C
B ∞ ∞ 9 B 0 0 C
C ∞ 2 2 C 0 C C
D 0 0 0 D 0 0 0
The last iteration shortens the distance to B.
d0 d1 d2 d3 p0 p1 p2 p3
A ∞ ∞ 4 4 A 0 0 C C
B ∞ ∞ 9 8 B 0 0 C A
C ∞ 2 2 2 C 0 C C C
D 0 0 0 0 D 0 0 0 0
Let’s reconstruct the shortest paths from the source vertex D.
From the left array we read the distance for D → A. It is 4. Let’s reconstruct the
path D → A. From the right array we read: if you want to get to A, you have to get
to C first: D → C → A. If you want to get to C, just go straight to C. We have just
reconstructed the path.
Let’s reconstruct the path D → B. If you want to get to B, first go to A: D →
A → B. If you want to get to A, you have to get to C first: D → C → A → B. And
the distance is 8.
The path D → C has no intermediate vertices. (finis)

Computation complexity The external loop (line 14 in Fig. 3) has |V| − 1 iterations.
The internal loop (line 15 in Fig. 3) has |E| iterations. Finally the time complexity is
O (|V||E|).
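In Python the relaxation scheme of Fig. 3 can be written as follows (our sketch):

from math import inf

def bellman_ford_moore(vertices, edges, s):
    # edges: list of (u, v, weight) triples; s: source vertex
    d = {v: inf for v in vertices}          # distances
    p = {v: None for v in vertices}         # predecessors
    d[s] = 0
    for _ in range(len(vertices) - 1):      # |V| - 1 rounds of relaxation
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                p[v] = u
    return d, p

# the graph from Example 2, source D
E = [('A', 'B', 4), ('A', 'C', 12), ('B', 'A', -1), ('B', 'D', 3),
     ('C', 'A', 2), ('C', 'B', 7), ('C', 'D', 10), ('D', 'C', 2)]
d, p = bellman_ford_moore('ABCD', E, 'D')
print(d['B'], p['B'])                       # 8 A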

1 procedure Dijkstra ( G = (V, E), w, s )
2 // w – edge weights
3 // s – start vertex
4

5 // initialisation:
6 d [ s ] ← 0 ; // array of distances
7 p [ s ] ← 0 ; // array of predecessors
8 foreach v in V \ s do
9 d[v] ← ∞; // array of distances
10 p[v] ← 0; // array of predecessors
11 end foreach ;
12

13 Q ← V ; // put all vertices into a priority queue


14

15 while queue Q is not empty do


16 u ← pop a vertex with minimal distance d from priority queue Q ;
17 foreach vertex v incident with vertex u do
18 relaxation ( u , v ) ;
19 end foreach ;
20 end while ;
21 end procedure ;
22

23 procedure relaxation ( u , v ) ;
24 // u and v are vertices
25 i f d [ u ] + wuv < d [ v ] then
26 d [ v ] ← d [ u ] + wuv ;
27 p[v] ← u ;
28 end i f ;
29 end procedure ;

Figure 4: Dijkstra’s algorithm

3 Dijkstra’s algorithm
The Dijkstra’s¹ algorithm is a greedy algorithm for elaboration of the shortest
paths from a source to all other vertices. The algorithm cannot be applied for graphs
with negative weights of edges (cf Fig. 1). The algorithm is based on the breadth-first
search (Fig. 4). The first step is initialisation of distances. All vertices are assigned with
infinite distance, but the source vertex that is assigned with zero (line 5 in Fig. 4). Then
we put all vertices into the priority queue Q (line 13). We pop vertex u with minimal
distance (line 13). Then we relax all neighbours (breadth-first approach) of vertex u
and actualise their minimal distances. We end the algorithm when the queue is empty.
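With a binary heap (Python’s heapq module) the algorithm can be sketched as follows (ours; instead of decreasing keys in the queue we simply push a vertex again and skip outdated entries). The small weighted graph used at the end is only loosely based on the figure of Example 3:

import heapq
from math import inf

def dijkstra(graph, s):
    # graph: dictionary u -> list of (v, weight) pairs; s: source vertex
    d = {v: inf for v in graph}
    p = {v: None for v in graph}
    d[s] = 0
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)         # vertex with minimal distance
        if du > d[u]:
            continue                        # outdated entry in the queue
        for v, w in graph[u]:
            if d[u] + w < d[v]:             # relaxation
                d[v] = d[u] + w
                p[v] = u
                heapq.heappush(heap, (d[v], v))
    return d, p

G = {'A': [('B', 4), ('E', 10), ('F', 15), ('D', 20)],
     'B': [('A', 4), ('C', 5), ('E', 3)],
     'C': [('B', 5), ('E', 1)],
     'D': [('A', 20), ('F', 2)],
     'E': [('A', 10), ('B', 3), ('C', 1), ('F', 4)],
     'F': [('A', 15), ('D', 2), ('E', 4)]}
d, p = dijkstra(G, 'A')
print(d['D'])                               # 13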
Example 3. Let’s run the Dijkstra’s algorithm for the graph below and for source A.
The graph has no negative weights.
¹pronunciation: [ˈdɛjkstɾa]

C 1
5
1
B E
3
10
4 4

15
A F

20 2
D

First we initialise distances and predecessors. We put all vertices in the priority queue.
Vertices in the queue are white.

∞, 0

C 1
∞, 0 5 ∞, 0
1
B E
3
10
4 4
0, 0 ∞, 0
15
A F

20 2
D
∞, 0

We pop a vertex with the minimal distance. It is A. Then we relax all its neighbours.

∞, 0

C 1
4, A 5 10, A
1
B E
3
10
4 4
0, 0 15, A
15
A F

20 2
D
∞, 0

Vertex A is analysed. We pop a vertex with the minimal distance. It is B. Then we


relax all its unvisited neighbours C and E.

9, B

C 1
4, A 5 7, B
1
B E
3
10
4 4
0, 0 15, A
15
A F

20 2
D
∞, 0

Vertex B is analysed. We pop a vertex with the minimal distance. It is E. Then we


relax all its unvisited neighbours C and F .

8, E

C 1
4, A 5 7, B
1
B E
3
10
4 4
0, 0 11, E
15
A F

20 2
D
∞, 0

Vertex E is analysed. We pop a vertex with the minimal distance. It is C. It has no


unvisited neighbours.

8, E

C 1
4, A 5 7, B
1
B E
3
10
4 4
0, 0 11, E
15
A F

20 2
D
∞, 0

We repeat the procedure for F …

8, E

C 1
4, A 5 7, B
1
B E
3
10
4 4
0, 0 11, E
15
A F

20 2
D

13, F

… and for D.
8, E

C 1
4, A 5 7, B
1
B E
3
10
4 4
0, 0 11, E
15
A F

20 2
D

13, F

The queue is empty. All vertices have been analysed.

8, E

C 1
4, A 5 7, B
1
B E
3
10
4 4
0, 0 11, E
15
A F

20 2
D

13, F

We have elaborated the shortest paths for all vertices. Let’s reconstruct the path for D. Its predecessor is F . Its predecessor is E. Its predecessor is B. Its predecessor is A. Finally A → B → E → F → D with distance 13.

Computational complexity The complexity depends on the implementation of the priority queue.

• If the priority queue is implemented with an array, popping a vertex with minimal distance needs O(|V|) time. Each vertex is popped once, so popping all vertices lasts O(|V|²). Each edge is used only once. So the time complexity is O(|V|² + |E|).

• If the priority queue is implemented with a binary heap, popping a vertex with minimal distance lasts O(log |V|). Each vertex is popped once, so popping all vertices lasts O(|V| log |V|). Each edge is used only once and may result in relaxation of the edge and an update of the distance in the incident vertex. Updating the minimal distance of a vertex implies sifting the vertex up or down, which lasts O(log |V|). All updates last O(|E| log |V|). Finally the time complexity is O((|V| + |E|) log |V|).

• If the priority queue is implemented with a Fibonacci heap, popping a vertex with minimal distance lasts O(log |V|). Each vertex is popped once, so popping all vertices lasts O(|V| log |V|). Each edge is used only once and may result in relaxation and an update of a distance. Updating the minimal distance of a vertex in a Fibonacci heap takes amortised constant time O(1), so all updates last O(|E|). Finally the time complexity is O(|V| log |V| + |E|).

(finis)

Exhaustive search
Krzysztof Simiński
Algorithms and data structures
lecture 12, 15th May 2020

Contents
1 Eight queens puzzle
2 Game trees
  2.1 Nim
    2.1.1 Min-max tree
  2.2 α-β cut
  2.3 Noughts and crosses
  2.4 Chess
3 Cryptarithms and alphametics

For some problems we cannot propose a mathematical model. We do not know
how to solve them. In such cases a last resort is just to test all possible solutions.
We call this technique exhaustive search. This approach is only applicable to discrete
problems.

Example 1. A triple (a, b, c) ∈ N³ is called Pythagorean if a² + b² = c². The boolean Pythagorean triples problem asks the question: Can the set N = {1, 2, . . .} of natural numbers be divided into two parts, such that no part contains a triple (a, b, c) with a² + b² = c²?
We do not have any theory to answer this question. The problem was solved with exhaustive search. It was simply tested that the set {1, . . . , 7824} can be partitioned into two parts, such that no part contains a Pythagorean triple, while this is impossible for {1, . . . , 7825}. There are 2^7825 > 2^(10·782) > 10^(3·782) = 10^2346 possible partitions of the set {1, . . . , 7825}. The size of the proof is 200 TB. (finis)

1 Eight queens puzzle


The eight queens puzzle is a very popular problem of placing eight queens on a chessboard so that no queen captures any other queen. A queen in chess captures in ranks (rows), files (columns), and diagonals. So each queen has to occupy its own rank, file, and diagonal. One of 92 solutions is presented in Fig. 1. There is no known formula for the exact number of solutions, or even for its asymptotic behaviour. We solve this problem with enumeration of all possible positions on a chessboard. The 27×27 board is the highest-order board that has been completely enumerated (Tab. 1).

(chessboard diagram: queens on a1, e2, h3, f4, c5, g6, b7, d8)

Figure 1: One of 92 solutions of the eight queens puzzle.

Table 1: Number of solutions of the n queens problem for various sizes of a chessboard.

size n | number of solutions      | number of positions C(n², n)
  4    | 2                        | 1 820
  5    | 10                       | 53 130
  6    | 4                        | 1 947 792
  7    | 40                       | 85 900 584
  8    | 92                       | 4 426 165 368
  9    | 352                      | 260 887 834 350
 10    | 724                      | 17 310 309 456 440
 11    | 2 680                    | 1 276 749 965 026 536
 12    | 14 200                   | 103 619 293 824 707 388
 13    | 73 712                   | 9 176 358 300 744 339 432
 14    | 365 596                  | 880 530 516 383 349 192 480
 15    | 2 279 184                | 91 005 567 811 177 478 095 440
 …     | …                        | …
 27    | 234 907 967 154 122 528  | 11 091 107 763 254 898 773 425 731 705 373 527 055 193 637 625 824

Because the number of possible positions on a chessboard is huge, we have to reduce the number of analysed potential solutions. In this problem we know each queen occupies its own rank, file, and diagonal. So we do not analyse positions with multiple queens in the same rank, file, or diagonal. This reduces the number of potential solutions. We can also notice that some solutions are rotations or symmetries.
Let’s solve the problem for a small 4×4 chessboard. First let’s encode a chessboard situation in a concise way. Because each queen is located in a separate file, for each file we encode the number of the rank with a queen.

Example 2. The situation

with queens on a2, b4, c1, d3 is encoded as (2, 4, 1, 3). (finis)

Let’s analyse all possible positions. A question mark ‘?’ denotes an empty column.
We start with the empty chessboard:
p?, ?, ?, ?q
1. p1, ?, ?, ?q
(a) p1, 1, ?, ?q
(b) p1, 2, ?, ?q
(c) p1, 3, ?, ?q
i. p1, 3, 1, ?q
ii. p1, 3, 2, ?q
iii. p1, 3, 3, ?q
iv. p1, 3, 4, ?q
(d) p1, 4, ?, ?q
i. p1, 4, 1, ?q
ii. p1, 4, 2, ?q
A. p1, 4, 2, 1q
B. p1, 4, 2, 2q
C. p1, 4, 2, 3q
D. p1, 4, 2, 4q
iii. p1, 4, 3, ?q
iv. p1, 4, 4, ?q

2. p2, ?, ?, ?q
(a) p2, 1, ?, ?q
(b) p2, 2, ?, ?q
(c) p2, 3, ?, ?q
(d) p2, 4, ?, ?q

i. p2, 4, 1, ?q
A. p2, 4, 1, 1q
B. p2, 4, 1, 2q
C. p2, 4, 1, 3q

(chessboard: queens on a2, b4, c1, d3)

D. p2, 4, 1, 4q
ii. p2, 4, 2, ?q
iii. p2, 4, 3, ?q
iv. p2, 4, 4, ?q
3. p3, ?, ?, ?q
(a) p3, 1, ?, ?q
i. p3, 1, 1, ?q
ii. p3, 1, 2, ?q
iii. p3, 1, 3, ?q
iv. p3, 1, 4, ?q
A. p3, 1, 4, 1q
B. p3, 1, 4, 2q

(chessboard: queens on a3, b1, c4, d2)

C. p3, 1, 4, 3q
D. p3, 1, 4, 4q
(b) p3, 2, ?, ?q
(c) p3, 3, ?, ?q
(d) p3, 4, ?, ?q
4. p4, ?, ?, ?q
(a) p4, 1, ?, ?q
i. p4, 1, 1, ?q
ii. p4, 1, 2, ?q
iii. p4, 1, 3, ?q
A. p4, 1, 3, 1q
B. p4, 1, 3, 2q

4
C. p4, 1, 3, 3q
D. p4, 1, 3, 4q
iv. p4, 1, 4, ?q
(b) p4, 2, ?, ?q
i. p4, 2, 1, ?q
ii. p4, 2, 2, ?q
iii. p4, 2, 3, ?q
iv. p4, 2, 4, ?q
(c) p4, 3, ?, ?q
(d) p4, 4, ?, ?q

There are only two solutions to the four queens problem. It is possible to make this procedure faster because we can find the same patterns, e.g. symmetry or rotation of partial solutions.
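A standard backtracking sketch in Python (ours, not part of the lecture) enumerates the same encodings file by file and prunes every partial solution that puts two queens in the same rank or diagonal:

def queens(n):
    # a solution is a tuple: for files 1..n it gives the rank of the queen in that file
    solutions = []

    def safe(partial, rank):
        file = len(partial) + 1
        for f, r in enumerate(partial, start=1):
            if r == rank or abs(r - rank) == abs(f - file):
                return False                # same rank or same diagonal
        return True

    def extend(partial):
        if len(partial) == n:
            solutions.append(tuple(partial))
            return
        for rank in range(1, n + 1):
            if safe(partial, rank):
                extend(partial + [rank])

    extend([])
    return solutions

print(queens(4))                            # [(2, 4, 1, 3), (3, 1, 4, 2)]
print(len(queens(8)))                       # 92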

2 Game trees
Unfortunately we do not know how to express popular games with mathematical formulae. Very often we build game trees that represent all (or only some) possible moves of players and game situations. Let’s discuss this technique with a simple game – «Nim».

2.1 Nim
The Nim game is a strategy game for two players. The game starts with 5 objects. Each of the two players alternately removes 1, 2, or 3 objects. The player who removes the last item – loses. It is an example of a misère game – the last player to play loses.
Let’s draw a game tree for the Nim game. Each node represents a situation in the
game. Each edge is a move of a player. Dashed lines represent Alice’s moves, solid
lines – Bob’s moves. Fig. 3 presents the tree for all possible situations and moves.

2.1.1 Min-max tree


We have the tree with all possible situations and moves. We can elaborate a winning strategy with the min-max technique. Let’s assume we want to find a winning strategy for Alice. If Alice wins, the result of the game is +1. If Bob wins (and Alice loses), the result of the game is −1. And finally if no one wins (a draw), the result is 0. We can assign values to the leaves of the tree. Alice wants to maximise her score, so she chooses the move with the maximal estimation. Bob wants to maximise his score, that is, to minimise Alice’s score, so he chooses the move with the minimal estimation. Now we know how to assign estimations to all nodes of the game tree. We start from the leaves, go up and apply max or min alternately until we reach the root. Having done so, we get the min-max tree for the Nim game (Fig. 3).
In the root of the tree we have a negative value. It means that if Alice starts the game, Bob wins. There is no winning strategy for Alice.
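The min-max value of this variant of Nim can also be computed with a short recursive Python function (our sketch; +1 means a win for Alice, −1 a win for Bob):

from functools import lru_cache

@lru_cache(maxsize=None)
def evaluate(objects_left, alice_to_move):
    # misere Nim with moves 1-3: the player who takes the last object loses
    if objects_left == 0:
        # the other player has just taken the last object and lost
        return +1 if alice_to_move else -1
    values = [evaluate(objects_left - k, not alice_to_move)
              for k in (1, 2, 3) if k <= objects_left]
    return max(values) if alice_to_move else min(values)

print(evaluate(5, True))                    # -1: if Alice starts with 5 objects, Bob wins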

1 2 3
Alice takes

Bob takes
1 2 3 1 2 3 1 2

Alice takes
1 2 3 1 2 1 1 2 1 1

Bob takes
1 2 1 1 1

Alice takes
1

Figure 2: Game tree for the Nim game. Each node of the tree represents a situation of
the game. Dashed edges stand for Alice’s moves, solid – Bob’s moves.

max ´1

Alice’s move

min ´1 ´1 ´1

Bob’s move

max `1 `1 ´1 `1 ´1 `1 ´1 `1

Alice’s move

min ´1 `1 ´1 `1 ´1 ´1 `1 ´1 ´1 ´1

Bob’s move

max ´1 `1 `1 `1 `1

Alice’s move

´1

Figure 3: Min-max tree for the Nim game.

2.2 α-β cut
The Nim game is very simple and we can draw a complete game tree. Unfortunately for most interesting games it is impossible to draw a complete tree. We have to limit the size of the game tree. There are two common techniques: limitation of the height of the tree and the α-β cut.
Limitation of the tree height is a simple technique, but it has a difficult consequence. If we limit the tree height, it is very possible that we do not reach the leaves of the tree. If we do not have access to the leaves, we do not know who wins. It is a serious problem. We have to estimate the chance of winning of a situation on the deepest level of the tree without actual knowledge. We commonly use some rules of thumb, heuristics, experts’ experience etc. This is why we do not use just +1, 0, and −1. We use a wider range of values.
The second technique, the α-β cut, reduces the breadth of the tree.
Example 3. Let’s apply the α-β cut for the game tree below. The height of the tree
is 3.
We estimate two situations as 4 and 8. We use max-min tree to elaborate tempor-
ary values of nodes up to the root of the tree.
4

min

Ś Ś Ś
max max max

min min min min min min min

4 8
We calculate the values of two more situations. They are 9 and 3. These values do
not change the value of the root.
4

min

max max max

4 3

min min min min min min min

4 8 9 3
We elaborate two more estimations: 12 and 8. These estimations do not change the value in the root. But let’s analyse it deeper. If the value of the blue subtree is less than 8, its value does not propagate upwards, because it is blocked by the max operator. If the value of the blue subtree is greater than 8, it is not blocked by the max operator, but it is blocked by the min operator one level above. And again the value of the blue subtree does not propagate to the root. The value of the blue subtree never propagates upward; it is blocked either by min or by max. Thus there is no point in elaborating the values in the blue subtree. We just prune it off.
4

min

4 8

max max max

4 3 8

min min min min min min min

4 8 9 3 12 8
We have a similar situation for the green and red subtrees. Their values never propagate to the root. So we do not elaborate their values. If we apply the α-β cut, we reduce the number of estimations from 15 to 8.
4

min

4 ě8 ě5

Ś Ś Ś
max max max

4 3 8 5

min min min min min min min

4 8 9 3 12 8 12 5
(finis)

2.3 Noughts and crosses


Noughts and crosses is a very popular game for two players. We can easily draw
a complete game tree. The number of leaves in the game tree is 255 168. It takes a few
seconds to generate such a tree. The tree can be reduced because some branches are
just rotations or reflections. So we can reduce the number of leaves down to 26 830.
Actually there are only 765 essentially different positions. The game estimation for a
starting player: 0 (draw).
We know the strategy, we know the game estimation for both players. We have
just spoiled the pleasure of the game.

2.4 Chess
Chess is a very complicated game. We cannot draw a complete game tree. We estimate the upper bound for the number of positions as 10^47.6 . The complexity of the game (the size of the tree) is estimated as 10^123 . We cannot draw a complete game tree, because we cannot even write the tree down. We have only about 10^80 baryons in the visible Universe. So to construct a complete game tree we would need 10^123 / 10^80 = 10^43 universes. Unfortunately we have only one Universe at our disposal. So playing chess still makes sense. We do not know the strategy.

Problem 1. Propose an exhaustive search algorithm for solving sudoku puzzles. ˛

3 Cryptarithms and alphametics


A cryptarithm is a kind of puzzle invented in 1931 by M. Vatriquant. The idea is to substitute letters with digits so that the operation is correct. Each digit is represented by one letter and each letter represents one digit. No number starts with zero. There is only one solution.
The first cryptarithm invented was:
      A B C
    ×   D E
    -------
      F E C
    D E C
    ---------
    H G B C

Problem 2. Propose an exhaustive search algorithm for solving the cryptarithm above. ˛
An alphametic is a cryptarithm in which the numbers encoded in letters produce words or sentences. The oldest alphametic is: SEND + MORE = MONEY.
If all words in an alphametic are numerals and the arithmetic and linguistic meaning is the same, it is a doubly true alphametic. The first doubly true alphametic is FORTY + TEN + TEN = SIXTY.
Some more examples:
SEVEN + SEVEN + SIX = TWENTY
TWENTY + FIFTY + NINE + ONE = EIGHTY (364832 + 75732 + 8584 + 984 = 450132)
SEVEN + TEN + ONE = THREE + NINE + SIX
(INE)² + (TATU)² = (TANO)² (in Swahili: 3² + 4² = 5²)
(TRI)² + (KVAR)² = (KVIN)² (in Esperanto: 3² + 4² = 5²)
and a humorous one: (DUŻO)² = MNÓSTWO (in Polish: 2396² = 5740816)

Complexity problems
Krzysztof Simiński

Algorithms and data structures


lecture 13, 15th May 2020

We devoted our first lecture to computational complexity. Today we are going to discuss a topic closely related to computational complexity – complexity problems. We ordered computational complexities in an ascending sequence: O(1), O(log n), O(n), O(n log n), O(n²), O(n³), O(n^k) (where k is a constant), O(2^n), O(k^n) (where k is a constant), O(n!), O(n^n). Let’s have a look at Tab. 1, which presents values of some functions. Please notice that functions “greater” than polynomial have values far beyond our imagination even for quite small arguments. Tab. 2 presents the time of execution for algorithms with various complexities. Again we can easily notice that overpolynomial complexities result in completely unimaginable time values.
In practice we split complexities into acceptable (“reasonable”, “decent” algorithms for “easy” problems) and unacceptable. Acceptable complexities are those up to polynomial. Unacceptable ones are overpolynomial. For problems with unacceptable complexity we try to find acceptable approximate solutions.

1 P problems
Definition 1. A problem is a P problem if it can be solved in polynomial time (in
polynomial space) with a deterministic Turing machine.
A deterministic Turing machine is a “classic” Turing machine. It has a head that
moves over an infinite tape. The head reads a symbol on a tape, writes a symbol on
the tape, moves left, right, or does not change its locations. The behaviour of the head
depends both on data on the tape and the machine’s program.
Polynomial functions for complexity have been chosen for a very utilitarian reason.
These are slowly increasing functions – we can solve tasks even for very large data1 .
A sum or product of polynomial functions is a polynomial function. We can add, mul-
tiply polynomial complexities and we still have polynomial complexity.

Example 1. Examples of P problems:


• O(1): access with an index to an item in an array, dictionary operations (access, add, remove) in a hash table;
• O(log n): binary search in a sorted array, dictionary operations in a balanced tree;
1 It is worth mentioning that in bioinformatics even square complexity is too high due to enormous data

sizes.

Table 1: Values of some functions. The number of protons in the Universe ≈ 10^125, number of microseconds since the Big Bang ≈ 10^23.

 n      10           50          100          300         1000
 5n     50           250         500          1500        5000
 n²     100          2500        10000        90000       10^6
 2^n    1024         ≈ 10^15     ≈ 10^30      ≈ 10^90     ≈ 10^301
 n!     ≈ 3.3·10^6   ≈ 3·10^64   ≈ 9·10^157   ≈ 10^622    ≈ 4·10^2567
 n^n    10^10        ≈ 10^84     10^200       ≈ 10^743    10^3000

Table 2: Execution time for various time complexities. Execution time of one operation t = 1 µs. Just for comparison: the Big Bang was 13.7 billion years ago.

 n      10          20               50             100               300
 n²     1/10000 s   1/2500 s         1/400 s        1/100 s           9/100 s
 n⁵     1/10 s      3.2 s            5.2 min        2.8 h             28.1 days
 2^n    1/1000 s    1 s              35.7 years     400·10^14 years   ≈ 10^76 years
 n^n    2.8 h       3.3·10^12 years  ≈ 10^71 years  ≈ 10^186 years    ≈ 10^729 years

• O(n): sum (minimum, maximum, average) of elements in an array, kth largest element in an array, breadth or depth first search in graphs, access to the kth item in a list;
• O(n log n): sorting of an array with a comparison sort algorithm, convex hull search;
• O(n²): simple sorting, space complexity of the Floyd-Warshall algorithm;
• O(n³): time complexity of the Floyd-Warshall algorithm, the matrix chain multiplication problem;
(finis)

2 NP problems
Before we discuss N P problems we have to define a decision problem.
Definition 2. A decision problem is a problem that can be answered with «yes» or «no».
The N P problem is defined in two equivalent ways.
Definition 3. A decision problem is a nondeterministic polynomial problem (N P problem) if it can be solved in polynomial time with a nondeterministic Turing machine.
Definition 4. A decision problem is a nondeterministic polynomial problem (N P prob-
lem) if its solution can be verified in polynomial time with a deterministic Turing ma-
chine.
A nondeterministic Turing machine is a Turing machine that can be simultan-
eously in many states. A deterministic Turing machine is always in exactly one state.
A nondeterministic Turing machine may be in one or more states. Having read a sym-
bol on the tape it transits from a set of states to a new set of states.

Example 2. Let’s assume a program of a nondeterministic Turing machine defines transitions from each state to two new states. If a machine starts in state 0, after the first transition it is in 2¹ = 2 states, after two transitions it is in 2² = 4 states, and after 10 transitions it may even be in 2^10 = 1024 states simultaneously. (finis)
If a machine has n states, then a deterministic machine can be in each of the n states (one at a time), whereas a nondeterministic machine can be in each of the 2^n − 1 non-empty subsets of states.
P problems can be solved in polynomial time. Solutions to P problems can be verified in polynomial time. So a P problem is an N P problem (cf. Def. 4). This is a very important conclusion. It states that

P ⊆ N P.   (1)

Unfortunately we do not know if P = N P or P ≠ N P .

Problem 1. Prove P = N P or P ≠ N P . ˛
Problem 1 was stated in 1971 by Stephen Cook. It is one of the seven Millennium Prize Problems. If you solve it you can win one million dollars and eternal glory. This is one of the six still unsolved millennium problems.
The computers we have are deterministic. In general we can simulate nondeterministic machines, but such a simulation has exponential complexity (each subset of n states – and there are 2^n subsets – in a nondeterministic machine is represented with one state in a deterministic machine). We cannot simulate nondeterministic machines in polynomial time and we do not know if it is possible. This is why N P problems are difficult for us to solve. But if we have a solution, we can quickly (in polynomial time) verify it.
Example 3. Question: Does there exist a Hamiltonian cycle shorter than d in a graph? The answer is difficult, because we have to find a path. In general we do not know an exact algorithm with complexity less than O(n!). But if someone finds a Hamiltonian cycle, the verification is trivial: we have to sum up all weights in the path. It can be done in linear time. (finis)

3 NP-complete problems
Let’s introduce a very important notion – reduction of a computational problem.

Definition 5. A Turing reduction of problem A to problem B (notation: A ≤T B) means we can solve problem A, if we can solve problem B.
Example 4. Let problem S be the computation of the area of a square (if we know the lengths of its sides). Let problem T be the computation of the area of a trapezium (if we know the lengths of its sides).
If we can calculate the area of a trapezium, we can calculate the area of a square, i.e. S ≤T T . The opposite implication is not true. (finis)
Relation ≤T is transitive. It defines a partial order. Notation C ≤T D denotes that problem C is not harder than problem D. Thanks to this we can sort problems with regard to their computational complexity.
Let A be a set of computational problems.

Definition 6. Problem D is an A-hard problem, if for each problem A ∈ A the relation A ≤T D is true.

It means that all problems in set A are not harder than problem D.
Definition 7. If problem D is an A-hard problem and simultaneously D P A, then
problem D is an A-complete.
In other words: if D is A-complete, then it is the hardest problem in the set (class) A. Each problem in set A can be reduced to it. So if we manage to solve problem D, then we solve all problems in class A. A solution of one A-complete problem solves all A-complete problems. But B-complete problems do not exist for every class B.
An uninteresting conclusion states that all P problems are P -complete. If we can
solve one P problem, we can solve all P problems. It is not very interesting because
all P problems are easy to solve. More interesting case are complete classes for which
we do not know an effective solution. An interesting example is the N P -complete
class.
Let’s use definition 7 to define N P -complete problems.
Definition 8. An N P -complete problem (N P C) is an N P problem that each N P problem can be reduced to in polynomial time.

The definition above implies that if we can solve any N P -complete problem in polynomial time, then we can solve all N P problems in polynomial time. Each N P -complete problem can be reduced to any other N P -complete problem. N P -complete problems are the hardest problems in the N P class.
Example 5. Examples of N P -complete problems:

1. Boolean satisfiability problem: For a given logical formula, do there exist
values of the Boolean variables for which the formula is true? Or simpler: is the
formula satisfiable?
Example 6. Is the formula a ∧ b satisfiable? The solution is very simple
because there are only two variables. In the worst case we have to test four
substitutions. For the values a = 1 and b = 1 the formula is true. If we had n
variables we would have to test up to 2^n cases (a brute-force check is sketched
after this list). (finis)

The Boolean satisfiability problem was the first problem proved to be NP-complete
(Cook–Levin theorem, 1971).

2. Does there exist a travelling salesman path shorter than d? In a more formal
way: Is there a Hamiltonian cycle shorter than d in a certain graph?
3. Do two natural numbers have a common divisor greater than 1?
4. Does there exist a clique of a given size k in a graph? A clique is a subgraph in which each pair
of nodes is connected with an edge.
5. The knapsack problem. Let's assume we have n objects, each with a price p_i and
a weight w_i. The problem is to pack a knapsack with objects so that the
sum of the prices of the packed objects is maximal. The knapsack has a weight limit
that we cannot exceed.

6. Are two graphs isomorphic? Two graphs are isomorphic if there exists a mapping
of the nodes of one graph onto the nodes of the other graph such that the corresponding
edges in both graphs match. The relation is symmetric. No polynomial
algorithm is known for this problem, although whether it really is NP-complete
is a long-standing open question.
7. Can a graph be coloured with n colours? Graph colouring is an assignment of
colours to the nodes of a graph so that the endpoints of each edge have
different colours.

8. Subset sum problem: For a (multi)set of integers, is there a non-empty subset
whose sum is zero?
9. . . .
(finis)
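The exponential blow-up mentioned in Example 6 is easy to see in code. The following brute-force satisfiability check is only an illustrative sketch (not part of the lecture; representing a formula as a Python predicate is an assumption): it enumerates all 2^n substitutions, which is exactly why this approach does not scale.

from itertools import product

def brute_force_sat(formula, n_vars):
    """Try all 2**n_vars assignments; 'formula' is a predicate over a tuple of Booleans."""
    for assignment in product([False, True], repeat=n_vars):
        if formula(assignment):
            return assignment          # a satisfying assignment - easy to verify
    return None                        # unsatisfiable, but only after exponentially many tests

# Example 6: the formula a AND b is satisfied by a = b = True.
print(brute_force_sat(lambda v: v[0] and v[1], 2))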

A co-NP problem is an NP problem with the answers «yes» and «no» swapped. We
do not know if co-NP = NP. We know that if NP ≠ co-NP then P ≠ NP.

4 NP-hard problems
Based on Definition 6 we define an NP-hard problem as:
Definition 9. Problem D is an NP-hard problem if each NP problem can be reduced
to it.
In other words: an NP-hard problem (NPH) is a problem at least as hard as any
NP problem.
Please notice that an NP-hard problem is not necessarily an NP problem – it
may be harder than NP.
Conclusions:
• A problem whose decision counterpart is NP-complete is NP-hard.
• An NP-hard problem is not necessarily a decision problem.
• If P ≠ NP turns out to be true, then no NP-hard problem has a polynomial
solution.
• If P = NP turns out to be true, then some (but not necessarily all!) NP-hard problems
would be easy to solve.
There exist decision problems that are NP-hard but not NP-complete. An example
is the halting problem: given a description of a computer program and its input
data, determine whether the program finishes running or runs forever.
The relations between complexity classes are presented in Fig. 1.

Example 7. Examples of NP-hard problems:

1. What is the shortest travelling salesman path in a graph? The corresponding
NP-complete problem is item 2 in Example 5.
2. Find a subset of integers whose sum is zero. The corresponding NP-complete
problem is item 8 in Example 5.

[Figure 1: Relations between complexity classes for two hypotheses: P ≠ NP (left)
and P = NP (right). The vertical axis shows complexity; under P = NP the classes
P, NP and NPC coincide, with NPH above them.]

3. Factorisation of integers (as a function of the number b of bits):
O(exp(((64/9) · b)^(1/3) · (log b)^(2/3))).

4. The longest common subsequence of N sequences. We know a dynamic programming
solution with complexity O(N · ∏_{i=1}^{N} n_i), where n_1, n_2, . . . , n_N are
the lengths of the sequences.
5. What is the minimal number of colours for a graph? The corresponding NP-complete
problem is item 7 in Example 5.
(finis)
Often NP-complete and NP-hard problems can be paired. An NP-complete
problem is a decision problem, while an NP-hard problem is not necessarily a decision
problem.
Example 8. A pair of problems:
• N P -complete: Given graph G, is there a travelling salesman path shorter than
d?
• N P -hard: What is the shortest travelling salesman path in a graph G?
(finis)

5 Heuristics
We cannot solve NP-complete and NP-hard problems in polynomial time.
We do not even know whether polynomial solutions exist. But we have to solve such
problems. In such cases we very often use heuristics2.
Definition 10. A heuristic is a polynomial algorithm that solves approximately (but
satisfactorily) a problem that we cannot solve exactly in polynomial time.
Heuristics commonly produce acceptable solutions, although we know these solutions
are not always optimal. Very often, even if a heuristic returns an optimal solution, we
do not know that it is optimal.
Example 9. An exact solution of the travelling salesman problem has complexity
O(n!). We can easily propose a heuristic that starts in any town, moves to the nearest
unvisited town, and again moves to the nearest unvisited town until all towns are
visited. This heuristic has complexity O(n²). The returned solution is not optimal, but
usually it is quite good and acceptable. (finis)
Problem 2. Prove that the exact solution of the travelling salesman problem has
complexity O(n!). ˛

2 The word heuristic originates from Greek εὑρίσϰω – ‘I find, I discover’. This verb is commonly known in
the perfect tense: εὕρηϰα, ηὕρηϰα – ‘I have found, I have discovered’. This is the word Archimedes shouted
after he had discovered Archimedes’ principle.

Greedy algorithms
Krzysztof Simiński
Algorithms and data structures
lecture 14, 19th May 2020

In our previous lecture we discussed complexity problems. We know some problems
are difficult for us to solve, because we do not know any polynomial algorithms for them
and we do not even know if such algorithms exist. But we have to solve such problems.
Today we are going to discuss a new paradigm of algorithm design – the greedy
approach.
A greedy algorithm tries to find a solution by local optimisation hic et nunc –
here and now. This approach produces quite good solutions, but not always a globally
optimal one.

1 Travelling salesman problem


The travelling salesman problem is a classical problem in optimisation. The problem
was formulated in 1930 as “find the shortest closed path for a list of towns, assuming
that there exist direct paths between all pairs of towns”. Commonly we reformulate
the problem in terms of graph theory.
Definition 1. A complete graph contains all possible edges.
An obvious conclusion from the definition above is: in a complete graph each pair
of vertices is joined by an edge.
Problem 1 (Travelling salesman problem). Find the shortest Hamiltonian cycle in a
weighted complete graph. ˛
We already know this problem is an NP-hard problem. The exact solution has
complexity O(n!). There are many heuristics and approaches used to solve this problem
as well as possible. In Fig. 1 we present a pseudocode for this problem. It is a
very straightforward approach. First we choose any vertex (because all vertices have
to be visited). Then we search for its nearest unvisited neighbour and move to it. We
repeat this procedure until all vertices are visited. The algorithm produces quite
a good solution, but it is not necessarily an optimal one.
Problem 2. Propose a graph for which the greedy algorithm does not elaborate the
optimal solution. ˛

2 Graph vertex colouring problem


Before we formulate the title problem let’s focus on a very practical problem. The
access time to a processor register is quite short. Access to a variable in computer

1  procedure greedy_travelling_salesman_problem ( G = (V, E, w) )
2  // initialization:
3  foreach v ∈ V do
4    v.state ← unvisited;   // all vertices are unvisited
5  end foreach;
6
7  s ← choose any vertex;
8  first ← s;
9  distance ← 0;
10 s.state ← visited;   // the starting vertex is already visited
11 while there exist unvisited vertices do
12   u ← choose unvisited neighbour of s with minimal distance w_su;
13   u.state ← visited;
14   distance ← distance + w_su;
15   s ← u;
16   last ← u;
17 end while;
18
19 distance ← distance + w_last,first;   // close the cycle
20
21 end procedure;

Figure 1: Greedy algorithm for the travelling salesman problem.
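A runnable counterpart of Fig. 1 (a sketch, not part of the lecture; the complete-graph weight matrix and the function name are illustrative assumptions):

def greedy_tsp(w):
    """Nearest-neighbour heuristic; w is a symmetric matrix of edge weights."""
    n = len(w)
    visited = [False] * n
    s = 0                                    # choose any starting vertex
    visited[s] = True
    tour, distance = [s], 0
    for _ in range(n - 1):
        # move to the nearest unvisited neighbour of the current vertex
        u = min((v for v in range(n) if not visited[v]), key=lambda v: w[s][v])
        distance += w[s][u]
        visited[u] = True
        tour.append(u)
        s = u
    distance += w[s][tour[0]]                # close the cycle
    return tour, distance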

memory takes longer. Access to data stored on a hard disk takes even longer. We
would like to store all variables in registers. Unfortunately the number of registers in
a processor is limited. A compiler has to optimise the allocation of variables to registers.
Let’s analyse the piece of code below:
1 procedure register_allocation
2 b Ð initialisation ;
3 c Ð initialisation ;
4

5 a Ð b + c;
6 d Ð a ∗ 5;
7 e Ð d + b ´ a;
8 f Ð c + b + 4 ∗ e;
9 g Ð f ´ 8 ∗ e;
10 h Ð d + g;
11

12 print h ;
13 end procedure ;

In a naïve approach we use a separate register for each variable. We can easily notice
that some variables may share the same register. For example we use variable a in line
5 for the first time and for the last time in line 7. Variable e is used in line 7 for
the first time and in line 9 for the last time. Please notice that a is used for the last time in
line 7 on the right-hand side of the assignment, while e is used for the first time in the same
line on the left-hand side. This is why we can use the same register for both
variables a and e.
Let’s denote the first usage of the variable with and the last usage with and
write the code once more.
b c a d e f g h
b Ð initialisation ;
c Ð initialisation ;
a Ð b + c;
d Ð a ∗ 5;
e Ð d + b ´ a;
f Ð c + b + 4 ∗ e;
g Ð f ´ 8 ∗ e;
h Ð d + g;
print h;
Now we can easily detect conflicts. Let’s use a graph to represent them. Each ver-
tex in the graph stands for a variable. If two vertices are joined with an edge it means
the variables they represent are in conflict.

[Figure: the conflict graph with vertices a, b, c, d, e, f, g, h.]

To solve the problem of register assignment we use the graph vertex colouring approach.
In this approach we colour the vertices of a graph in such a way that (1) if two
vertices are joined with an edge, they have different colours; (2) the number of colours
is minimal.
Unfortunately this is an NP-hard problem and we do not know if a polynomial
solution exists. This is why we use the greedy algorithm presented in Fig. 2. Let's
encode colours with numbers: 1 (red), 2 (green), 3 (blue), and 4 (yellow). We take the
first colour (red) and try to colour vertices (we visit vertices in alphabetical order).
So we colour a, e, g, h with red. It is not possible to colour any more vertices with
red. So we take the next colour (green) and repeat the procedure. Vertices b and f are
green. Vertices c and d need different colours.

1  procedure greedy_graph_vertex_colouring ( G = (V, E) )
2  // initialization:
3  foreach v ∈ V do
4    v.colour ← ∅;
5  end foreach;
6  colour ← 0;   // colours are numbers
7
8  // colouring:
9  while there exist vertices without colour do
10   colour ← colour + 1;   // take the next colour
11   foreach vertex v in the graph do
12     if v.colour = ∅ then
13       permission ← true;
14       foreach neighbour u of vertex v do
15         if u.colour = colour then
16           permission ← false;
17         end if;
18       end foreach;
19       if permission = true then
20         v.colour ← colour;
21       end if;
22     end if;
23   end foreach;
24 end while;
25 end procedure;

Figure 2: Greedy algorithm for the graph vertex colouring problem.
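A compact runnable version of the algorithm in Fig. 2 (a sketch, not part of the lecture; the adjacency-list input format is an illustrative assumption):

def greedy_vertex_colouring(graph):
    """graph: dict vertex -> set of neighbours; returns dict vertex -> colour (1, 2, ...)."""
    colour_of = {v: None for v in graph}
    colour = 0
    while any(c is None for c in colour_of.values()):
        colour += 1                                         # take the next colour
        for v in sorted(graph):                             # visit vertices in a fixed order
            if colour_of[v] is None and all(colour_of[u] != colour for u in graph[v]):
                colour_of[v] = colour                       # no neighbour uses this colour yet
    return colour_of

# e.g. a triangle needs three colours:
# greedy_vertex_colouring({'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b'}})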

[Figure: the conflict graph coloured with four colours.]

We have coloured the graph with 4 colours.

Problem 3. Is 4 the minimal number of colours for this graph? ˛


Let’s use register R1 for the red variables, R2 for the green variables, R3 for the
blue variables, R4 for the yellow variables,
1 procedure register_allocation
2 R2 Ð initialisation ;
3 R3 Ð initialisation ;
4

5 R1 Ð R2 + R3 ;
6 R4 Ð R1 ∗ 5;
7 R1 Ð R4 + R2 ´ R1 ;
8 R2 Ð R3 + R2 + 4 ∗ R1 ;
9 R1 Ð R2 ´ 8 ∗ R1 ;
10 R1 Ð R4 + R1 ;
11

12 print R1 ;
13 end procedure ;

Problem 4. In real-life processors the task is more difficult, because the number of
registers is limited. Thus the task is: optimise the allocation of k variables to l
registers. Try to propose a modification of the graph vertex colouring algorithm for
this task. ˛
The greedy approach does not always produce optimal solutions.

Example 1. Let’s focus on the following example. Let’s visit the vertices in the fol-
lowing sequence: A, B, C, D, and E. W choose the first colour and we colour A and
C with red. With the second colour we can colour three vertices.

[Figure: the example graph with vertices A, B, C, D, E, coloured for the visiting order A, B, C, D, E.]

Let’s run this example once again. Let’s visit the vertices in the following sequence:
E, D, C, B, and A. We choose the first colour and we colour E, D, and C with red.
With the second colour we can only colour B and we need the third colour for A.

[Figure: the same graph, coloured for the visiting order E, D, C, B, A.]

The result depends on the sequence of visited vertices. (finis)


Definition 2. A graph vertex colouring with at most k colours is called a k-colouring.
Definition 3. The chromatic1 number χ(G) of a graph G is the minimal number of colours
needed to colour the vertices of graph G.

Problem 5. What are the bounds on the chromatic number for a graph with n ver-
tices and m edges? What are the properties of a graph with the minimal (maximal)
chromatic number? ˛
Example 2. Our task is to design traffic lights at a four-street crossroads. There are four
streets labelled N(orth), E(ast), S(outh), W(est). The north street is a one-way street.
Let's draw all possible paths on the crossroads.
1 Gr. χρῶµα ‘colour’

[Figure: the crossroads and all possible paths between the streets N, E, S, W.]

Let’s draw a graph in which each path (eg. NW: north Ñ west) is a separate node.
Each edge represents a conflict (eg. path NE intersect with path EW). Let’s assume
we would like to design a very safe traffic and paths with the same destination are in
conflict (eg. NS and ES).

[Figure: the conflict graph with one vertex per path: NW, NS, NE, EW, ES, SE, SW, WS, WE.]

To solve our problem we use the graph vertex colouring algorithm. We start with vertex
WE, visit vertices clockwise, and try to colour them with the first colour (red).
Then we take the next colour (green) and start from the next uncoloured vertex (SW). Having
coloured what we can with green, we take the next colour (blue) and visit the next uncoloured
vertex (ES). The last two uncoloured vertices (NS and NE) can be coloured brown.
Finally we have coloured all vertices with four colours.

[Figure: the conflict graph coloured with four colours (red, green, blue, brown).]

We cannot colour the graph with fewer than 4 colours, because there is a four-vertex
clique in the graph (ES, NS, WE, SW).
Problem 6. Are there any more cliques in the graph? ˛

This is not the only 4-colouring of the graph.


Problem 7. Can you find all 4-colourings of the graph? ˛
Let’s draw our crossroad in colour.

[Figure: the crossroads with each path drawn in the colour of its group.]

(finis)

Problem 8. The solution we have elaborated in Example 2 has some faults. We cannot
add any more paths to the red group. But we can add path WS to the green group –
there would be no conflict. But in our solution each path belongs only to one group.
Modify the colouring algorithm to allow multiple membership of a path to several
groups. ˛

3 Graph edge colouring problem


A problem similar to vertex colouring is the edge colouring problem. The problem
is: colour each edge of a graph so that (1) all edges incident to the same vertex
have different colours, and (2) the minimal number of colours is used. It is
an NP-hard problem. The pseudocode of a greedy algorithm is presented in Fig. 3.
Example 3. We have a group of people. We would like to organise two-person meetings.
Unfortunately the relationships between them are not always friendly. We
ask each person who they would like to talk to and construct a friendship matrix.

1  procedure greedy_graph_edge_colouring ( G = (V, E) )
2  // initialization:
3  foreach e ∈ E do
4    e.colour ← ∅;
5  end foreach;
6  colour ← 0;   // colours are numbers
7  // colouring:
8  while there exist edges without colour do
9    colour ← colour + 1;   // take the next colour
10   foreach edge e = (u, v) in the graph do
11     if e.colour = ∅ then
12       permission ← true;
13       foreach (u, s) ∈ E do
14         if (u, s).colour = colour then
15           permission ← false;
16         end if;
17       end foreach;
18       foreach (v, s) ∈ E do
19         if (v, s).colour = colour then
20           permission ← false;
21         end if;
22       end foreach;
23
24       if permission = true then
25         e.colour ← colour;
26       end if;
27     end if;
28   end foreach;
29 end while;
30 end procedure;

Figure 3: Greedy algorithm for the graph edge colouring problem.
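For reference, a minimal runnable sketch of the same greedy edge-colouring idea (not part of the lecture; the edge-list representation is an illustrative assumption):

def greedy_edge_colouring(edges):
    """edges: list of pairs (u, v); returns dict edge -> colour (1, 2, ...)."""
    colour_of = {e: None for e in edges}
    colour = 0
    while any(c is None for c in colour_of.values()):
        colour += 1                                          # take the next colour
        for (u, v) in edges:
            if colour_of[(u, v)] is not None:
                continue
            # the colour is allowed if no edge sharing u or v already uses it
            clash = any(colour_of[(a, b)] == colour
                        for (a, b) in edges if {a, b} & {u, v})
            if not clash:
                colour_of[(u, v)] = colour
    return colour_of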

A B C D E F G H
A 3 3 3 3 3 3
B 3 3 3 3 3
C 3 3 3 3 3 3
D 3 3 3 3 3
E 3 3 3 3 3
F 3 3 3 3 3
G 3 3 3 3 3 3
H 3 3 3 3
We can see that the matrix is not symmetric. A likes F, but F does not like A. We
need mutual friendship, so we modify the matrix and put a checkmark only if the friendship
is mutual.
A B C D E F G H
A 3 3 3 3
B 3 3 3 3
C 3 3 3 3
D 3 3 3 3
E 3 3 3 3 3
F 3 3 3 3
G 3 3 3 3 3
H 3 3 3
Let’s use graph approach to solve this problem. Let each person be represented by
a vertex of a graph and each friendship – an edge.

[Figure: the friendship graph with vertices A–H.]

We take the first colour (red) and try to colour as many edges as possible. It
can be done in many ways; this is only one of many possible colourings.

[Figure: the friendship graph with the red edges marked.]

We take the second colour (green) and try to colour as many edges as possible.

[Figure: the friendship graph with the green edges marked.]

We take the third colour (blue) and try to colour as many edges as possible.
Because blue may be hard to distinguish from black, we draw the blue edges dashed.

[Figure: the friendship graph with the blue edges marked (drawn dashed).]

We take the fourth colour (magenta) and try to colour as many edges as possible.
Because magenta may be hard to distinguish from red, we draw the magenta edges
dashed.

[Figure: the friendship graph with the magenta edges marked (drawn dashed).]

Finally we organise 5 sessions of meetings:


red A-B, C-D, E-G
green A-C, B-E, G-H

blue A-H, B-G, D-E

magenta A-E, B-D, C-G
black C-E, F-G

Problem 9. Is this the minimal number of sessions? ˛


(finis)
Some algorithms we discussed in our previous lectures are also greedy algorithms,
e.g. Dijkstra's algorithm and the minimum spanning tree search algorithms. Although
both are greedy, they have been proved to produce globally optimal solutions.

Pattern search
Krzysztof Simiński
Algorithms and data structures
lecture 15, 22nd May 2020

Pattern search is quite a frequent task in text analysis and editing. By ‘texts’ we
mean not only natural language texts, but also sequences of amino acids or nucleotides.
A lot of algorithms are known; today we discuss only a few – each of them
represents a different approach.
The efficient pattern search algorithms have time complexity linear in the text
length and the pattern length. Texts in which we commonly search for patterns are very
long (millions of symbols). Reducing the coefficient of the linear term may have
a significant influence on the execution time of a pattern search algorithm.

1 Some terms
We do not formally define a symbol; we assume the notion of a symbol is clear
from the examples.

Example 1. A letter is a symbol: ‘a’, ‘b’, ‘c’, . . . , ‘z’, ‘α’, ‘β’, ‘γ’, . . . , ‘ω’. Digits are
symbols: ‘0’ and ‘1’. (finis)
Definition 1. An alphabet is a finite nonempty set of symbols.
Example 2. Examples of alphabets:

• {0, 1},
• {dash, dot, space},
• {a, b, c, . . . , z},
• {α, β, γ, . . . , ω},
• DNA nucleotides {A, G, C, T},
• RNA nucleotides {A, G, C, U}.
(finis)

Definition 2. A word w over alphabet A is a finite sequence of symbols of alphabet A.


Example 3. Example words over alphabet ta, b, c, . . . , zu: ‘pattern’, ‘search’. (finis)
Definition 3. The length l of word w is the number of symbols in the word. The length is
denoted l = |w|.

Definition 4. The empty word ε has no symbols. Its length is zero.
Definition 5. The concatenation of words s and t is the word whose length is |s| + |t| and
which is composed of word s followed by word t.
Example 4. The pair of words s = lla and t = ma has two concatenations: st =
llama and ts = malla. (finis)
Definition 6. Word p is a prefix of word w (denoted p ⊂ w) if w = py for some word
y. Word y may be empty. It is true that |p| ≤ |w|.

Example 5. A prefix of word w is a subword of w that starts at the first symbol of w.
Word abracadabra has 11 prefixes:
a,
ab,
abr,
abra,
abrac,
abraca,
abracad,
abracada,
abracadab,
abracadabr,
abracadabra.
Please note that the longest prefix of word abracadabra is the word abracadabra
itself. (finis)
The kth prefix (whose length is k) of word w will be denoted w[1 . . . k].

Definition 7. Word s is a suffix of word w (denoted w ⊃ s) if w = ys for some word
y. Word y may be empty. It is true that |s| ≤ |w|.
Example 6. A suffix of word w is a subword of w that ends at the last symbol of w.
Word abracadabra has 11 suffixes:
a,
ra,
bra,
abra,
dabra,
adabra,
cadabra,
acadabra,
racadabra,
bracadabra,
abracadabra.
Please note that the longest suffix of word abracadabra is the word abracadabra
itself. (finis)
Definition 8. A proper prefix (suffix) p (s) of word w is any prefix (suffix) shorter than
word w, i.e. |p| < |w| (or |s| < |w|).
Each word is its own suffix and prefix. No word is its own proper suffix or proper
prefix.

1  procedure naive ( T , P )
2  // T: text
3  // P: pattern
4
5  TextLen ← length ( T ) ;
6  PatternLen ← length ( P ) ;
7
8  for i ← 1 to TextLen − PatternLen + 1 do
9    location ← i ;
10   match ← true ;
11   for j ← 1 to PatternLen do
12     if T [ j + i − 1 ] ≠ P [ j ] then
13       match ← false ;
14       break ;
15     end if ;
16   end for ;
17   if match = true then
18     write ( " pattern starts at " , location ) ;
19   end if ;
20 end for ;
21 end procedure ;

Figure 1: Naïve algorithm.

2 Naïve algorithm
The naïve approach (Fig. 1) checks all possible alignments of the pattern in the text. In
each iteration of the external loop (lines 8-20) we test one alignment of the pattern
in the text. In the internal loop (lines 11-16) we test the match for this alignment.
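A runnable counterpart of Fig. 1 (a sketch, not part of the lecture; it uses 0-based indexing instead of the 1-based pseudocode):

def naive_search(text, pattern):
    """Report every 0-based position at which 'pattern' starts in 'text'."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):            # every possible alignment
        if text[i:i + m] == pattern:      # symbol-by-symbol comparison
            print("pattern starts at", i)

naive_search("acababbab", "ab")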

2.1 Computational complexity


Let’s denote the pattern we search for with p, and the text we search in with t. We
assume the pattern is not longer than the text: |p| ď |t|. The external loop (lines 8-20)
is run |p| ´ |t| ` 1 times. The internal loop (lines 11-16) is run at most |p| times, in
average 12 |p| times. Thus we have an precise asymptotic boundary Θpp|p|´|t|`1q|p|q.
Y ]
Problem 1. What is the complexity of the naïve algorithm if |p| “ |t| 2 ?

3 Knuth-Morris-Pratt algorithm
The naïve algorithm is not very effective. In Fig. 2 we have an example of a partial
match of pattern p and text t. Two symbols match, but the third does not. The naïve
algorithm then shifts the pattern by one position and tests the match again. But we already
know there can be no match there: we have already seen the text symbols compared with the
pattern for i = 2, so we know there cannot be any match for i = 3. Unfortunately the naïve
algorithm neglects this information and naïvely tests whether there is a match in the new
alignment. We can use this information – and so does the Knuth–Morris–Pratt algorithm (Fig. 3).

a c a b a b b a b t

i = 2
a b b t

Figure 2: An example of the partial match of pattern p and text t. Vertical lines rep-
resent matches, a zigzag – no match.

The algorithm is based on the naïve approach, but it aligns the pattern in a more efficient
way. Some alignments are skipped. In the naïve approach the pattern is shifted one
symbol right (line 8 in Fig. 1) in each iteration. In the Knuth–Morris–Pratt algorithm
the pattern is shifted by one or more symbols right (line 26 in Fig. 3). We use the prefix
function π.
Definition 9. The prefix function is the length of the longest prefix of pattern p, shorter
than q, that is simultaneously a suffix of p[1 . . . q]:

π(q, p) = max { k : k < q and p[1 . . . q] ⊃ p[1 . . . k] }.   (1)

Example 7. Let’s calculate the prefix function for pattern ‘abracadabra’. For each
prefix whose length is k we test what is the longest proper suffix of the prefix that
simultaneously is the longest prefix of pattern p.
For k “ 9 the prefix is ‘abracadab’. For the prefix we have to find the longest
proper suffix that is the prefix of the pattern.:
abracadab abracadabra
We test how many symbols that end the left word are simultaneously the prefix of the
right word. In the example the words share two-symbol sequence (it is underlined).
In the same way we calculate value for all k P r1, 11s.

q   p[1 . . . q]   p   π(q, p)
1 a abracadabra 0
2 ab abracadabra 0
3 abr abracadabra 0
4 abra abracadabra 1
5 abrac abracadabra 0
6 abraca abracadabra 1
7 abracad abracadabra 0
8 abracada abracadabra 1
9 abracadab abracadabra 2
10 abracadabr abracadabra 3
11 abracadabra abracadabra 4

(finis)
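The table above can be computed mechanically. The sketch below (not part of the lecture) computes π directly from Definition 9 by checking ever shorter prefixes; a linear-time amortised version exists, but the direct version mirrors the table best.

def prefix_function(p):
    """pi[q] = length of the longest proper prefix of p[:q] that is also its suffix."""
    pi = [0] * (len(p) + 1)                   # pi[0] is unused; indices follow the lecture
    for q in range(1, len(p) + 1):
        for k in range(q - 1, 0, -1):         # try the longest k < q first
            if p[:q].endswith(p[:k]):
                pi[q] = k
                break
    return pi

print(prefix_function("abracadabra")[1:])     # [0, 0, 0, 1, 0, 1, 0, 1, 2, 3, 4]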

Example 8. Let’s run the Knuth-Morris-Pratt algorithm for the following example.

1  procedure Knuth_Morris_Pratt ( T , P )
2  // T: text
3  // P: pattern
4  TextLen ← length ( T ) ;
5  PatternLen ← length ( P ) ;
6  π ← prefix_function ( P ) ;
7  i ← 1 ;
8
9  while i ≤ TextLen − PatternLen + 1 do
10   matched_symbols ← 0 ;   // number of matched symbols
11   for j ← 1 to PatternLen do
12     if T [ i + j − 1 ] ≠ P [ j ] then
13       break ;
14     end if ;   // otherwise the symbol matched:
15     matched_symbols ← matched_symbols + 1 ;
16   end for ;
17
18   if matched_symbols = 0 then
19     i ← i + 1 ;   // no symbols matched
20   else   // some number of symbols matched
21     if matched_symbols = PatternLen then
22       write ( " pattern starts at " , i ) ;
23     end if ;
24
25     // pattern shift
26     i ← i + matched_symbols − π [ matched_symbols ] ;
27   end if ;
28 end while ;
29 end procedure ;

Figure 3: The Knuth–Morris–Pratt algorithm.

a c a b r a b r a c b r a

i = 2
a b r a c a d a b r a

We have managed to match four symbols (‘a’, ‘b’, ‘r’, ‘a’). For the fifth symbol there
is no match. We have to shift the pattern to a new alignment. The value of the variable
matched_symbols is 4. The value of the prefix function is π(4) = 1, thus we shift the
pattern by 4 − 1 = 3 symbols (line 26 in Fig. 3). (finis)
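A runnable sketch of the search loop of Fig. 3 (not part of the lecture; 0-based positions are reported, and the prefix function is recomputed inline in its direct, quadratic form to keep the sketch self-contained):

def kmp_search(text, pattern):
    """Shift the pattern by matched - pi[matched] instead of always by one (cf. Fig. 3)."""
    m, n = len(pattern), len(text)
    pi = [0] * (m + 1)                         # prefix function as in Example 7
    for q in range(1, m + 1):
        pi[q] = max(k for k in range(q) if pattern[:q].endswith(pattern[:k]))
    i = 0
    while i <= n - m:
        matched = 0
        while matched < m and text[i + matched] == pattern[matched]:
            matched += 1
        if matched == 0:
            i += 1                             # no symbol matched
        else:
            if matched == m:
                print("pattern starts at", i)
            i += matched - pi[matched]         # skip alignments that cannot match

kmp_search("abracadabra and abracadabra", "abracadabra")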
Problem 2. Calculate prefix functions for patterns
• ‘aaaaa’,
• ‘abcde’,

• ‘abcabc’.

4 Rabin-Karp algorithm
The Rabin–Karp algorithm takes a very interesting approach to pattern search. A
pattern is a sequence of symbols, similarly as a number is a sequence of digits, e.g. 123
is composed of the three symbols ‘1’, ‘2’, ‘3’. We can treat a pattern in the same way,
i.e. as a number written in a suitable numeral system.
Example 9. Let’s analyse pattern P “‘rabarbar’. There are three symbols in the
sequence, so we have a ternary numeral system. Let’s assign symbols with numeric
values: ‘r’: 2, ‘a’: 1 i ‘b’: 0. And finally let’s elaborate the numeric value p of pattern P :

p “pppppploomo
2 on ¨3 ` loomo
1 onq ¨ 3 ` loomo
0 onq ¨ 3 ` loomo
1 onq ¨ 3 ` loomo
2 onq ¨ 3`
r a b a r
0 onq ¨ 3 ` loomo
loomo 1 onq ¨ 3 ` loomo
2 on “ 5243
b a r

(finis)
In a similar way we transform a piece of text of the same length as the pattern
into an integer. Now we only need to compare two integers. Comparison of
integers is much faster than comparison of sequences of symbols. However, there are
two problems:
• Numbers representing sequences may be too large to fit in processor registers.
• Calculating the number for the piece of text at each alignment requires iterating
through the whole piece.
The first problem is solved with modulo division. The second problem can also be
solved. Let’s have an example.

Example 10. Let’s use a decimal representation of numbers. We have a long integer.

5 1 7 9 2 5 8 0 2 1 8 4 5

The red window holds the five-digit number 92580. In the next step we want to calculate
the value of the number in the blue window (shifted one digit to the right). The procedure
is quite simple. First we subtract 90000 from 92580. Then we multiply the result by
10 (i.e. we shift the window right). We get 25800. Finally we only need to add 2 to get the
result 25802. If we have already computed the previous number, we only need three
integer operations to get the next one: a subtraction, a multiplication and an addition.
Let's write down these operations for our example: (92580 − 9 · 10^4) · 10 + 2 =
25802. (finis)

Example 11. Let’s go on with Example 9. Given the text T “‘barabarabarbarbar’


find all occurrences of pattern W “‘rabarbar’. The alphabet Σ “ t‘r’,‘a’,‘b’u. We use
modulo operation with q “ 7. Let’s assign the symbols of the alphabet with numeric
values: ‘r’: 2, ‘a’: 1 i ‘b’: 0.
First we elaborate the numeric value of the pattern modulo q “ 7 and the numeric
value of the first piece of the text (lines 14-15 in Fig. 4). We calculate the numeric value
w of the pattern W in the same way as in Example 9, but we use modulo operation:

w “pppppploomo
2 on ¨3 ` loomo
1 onq ¨ 3 ` loomo
0 onq ¨ 3 ` loomo
1 onq ¨ 3 ` loomo
2 onq ¨ 3`
r a b a r
0 onq ¨ 3 ` loomo
loomo 1 onq ¨ 3 ` loomo
2 on mod 7 ” 0.
b a r

In the same way we calculate the numeric value t_1 for the first piece of the text, ‘barabara’
(of the same length as the pattern):

t_1 = (((((((0 · 3 + 1) · 3 + 2) · 3 + 1) · 3 + 0) · 3 + 1) · 3 + 2) · 3 + 1) mod 7 ≡ 3,

where the consecutive digits 0, 1, 2, 1, 0, 1, 2, 1 correspond to the symbols b, a, r, a, b, a, r, a.

The value t_i is used to calculate the next value t_{i+1} with the formula

t_{i+1} = ((t_i − T[i] · p^{|W|−1}) · p + T[i + |W|]) mod q,   (2)

where |W| is the pattern length, T[i] stands for the ith symbol of the text T, and p is the
base of the numeral system (i.e. the number of symbols in the alphabet). In our example
|W| = 8 and p = 3, so

p^{|W|−1} ≡ 3^7 ≡ 3^3 · 3^2 · 3^2 = 27 · 9 · 9 ≡ 6 · 2 · 2 = 24 ≡ 3 (mod 7).   (3)

For our example we get

t_{i+1} = ((t_i − T[i] · 3) · 3 + T[i + 8]) mod 7.   (4)

And we calculate:
t_2 = (3 − 0 · 3) · 3 + 0 ≡ 2 (mod 7).   (5)

1  procedure Rabin_Karp ( T , P , d , q )
2  // T: text
3  // P: pattern
4  // d: number of symbols in the alphabet
5  // q: a big number
6
7  TextLen ← length ( T ) ;
8  PatternLen ← length ( P ) ;
9  h ← d^(PatternLen − 1) mod q ;
10 p ← 0 ;
11 t_1 ← 0 ;
12
13 for i ← 1 to PatternLen do
14   p ← ( d ∗ p + P [ i ] ) mod q ;
15   t_1 ← ( d ∗ t_1 + T [ i ] ) mod q ;
16 end for ;
17
18 for s ← 1 to TextLen − PatternLen + 1 do
19   if p = t_s then match ← true ;   // pattern maybe found, verify it:
20     for j ← 1 to PatternLen do
21       if T [ j + s − 1 ] ≠ P [ j ] then
22         match ← false ;
23         break ;
24       end if ;
25     end for ;
26   else match ← false ; end if ;
27
28   if match = true then
29     write ( " pattern starts at " , s ) ;
30   end if ;
31
32   if s < TextLen − PatternLen + 1 then
33     t_{s+1} ← ( d ∗ ( t_s − T [ s ] ∗ h ) + T [ s + PatternLen ] ) mod q ;
34   end if ;
35 end for ;
36
37 end procedure ;

Figure 4: The Rabin–Karp algorithm.

There is no match.

t_3 = (2 − 1 · 3) · 3 + 1 ≡ 5 (mod 7)   (6)
t_4 = (5 − 2 · 3) · 3 + 2 ≡ 6 (mod 7)   (7)
t_5 = (6 − 1 · 3) · 3 + 0 ≡ 2 (mod 7)   (8)
t_6 = (2 − 0 · 3) · 3 + 1 ≡ 0 (mod 7)   (9)

We have the same numeric value for the text window and for the pattern: w = t_6. Since we use the
modulo operation, equal numeric values for the text and for the pattern do not
imply a match of the sequences. We have to test whether W = T[6 . . . 13] (line 20 in Fig. 4).
There is no match – we get a false alarm (a false positive). Let's try further:

t_7 = (0 − 1 · 3) · 3 + 2 ≡ 0 (mod 7)   (10)

The values are equal again: w = t_7. We have to test whether W = T[7 . . . 14] = ‘rabarbar’. It
is true, so we have a match! (finis)
Problem 3. How many false alarms do we get if we search for a pattern with the Rabin–Karp
algorithm? The pattern is P = 26, the text T = 3141592653589793, and the alphabet
Σ = {0, 1, 2, . . . , 9}. Use the modulus q = 11.
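A runnable sketch of the rolling-hash search of Fig. 4 (not part of the lecture; 0-based positions are reported, and passing the symbol values as a dictionary is an illustrative choice):

def rabin_karp(text, pattern, value, d, q):
    """value: dict symbol -> digit, d: alphabet size (the base), q: the modulus."""
    n, m = len(text), len(pattern)
    h = pow(d, m - 1, q)                            # d^(m-1) mod q
    p = t = 0
    for i in range(m):                              # hash of the pattern and of the first window
        p = (d * p + value[pattern[i]]) % q
        t = (d * t + value[text[i]]) % q
    for s in range(n - m + 1):
        if p == t and text[s:s + m] == pattern:     # verify to rule out false alarms
            print("pattern starts at", s)
        if s < n - m:                               # rolling update, formula (2)
            t = (d * (t - value[text[s]] * h) + value[text[s + m]]) % q

digits = {'r': 2, 'a': 1, 'b': 0}
rabin_karp("barabarabarbarbar", "rabarbar", digits, d=3, q=7)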

5 Automaton
The last algorithm we discuss is based on a deterministic finite automaton.
Definition 10. A deterministic finite automaton M is a tuple M = (Q, q0, A, Σ, δ),
where:
• Q is a set of states,
• q0 ∈ Q is the initial state,
• A ⊆ Q is the set of accepting states,
• Σ is the alphabet,
• δ : Q × Σ → Q is the transition function.

Example 12. Let’s define an automaton with the set of states Q “ t0, 1, 2, 3, 4, 5u,
initial state 0, set of accepting states A “ t5u, alphabet Σ “ ta, bu, and transition
function δ defined as:

a b
→ 0 1 0
1 2
2 3
3 4 1
4 5
5 3 2

1  procedure automaton ( T , δ , PatternLen )
2  // T: text
3  // δ: transition function
4  // PatternLen: length of the searched pattern
5  TextLen ← length ( T ) ;
6  state ← 0 ;
7
8  for i ← 1 to TextLen do
9    state ← δ ( state , T [ i ] ) ;
10   if state is an accepting state then
11     write ( " pattern starts at " , i − PatternLen + 1 ) ;
12   end if ;
13 end for ;
14 end procedure ;

Figure 5: The pseudocode for detection of a pattern with an automaton.

[Figure: state diagram of the automaton with states 0–5; the forward edges are labelled a, a, b, a, a, and the remaining edges lead back to earlier states.]
The automaton starts in state 0. If the input is aabaa, the automaton transits to state 5,
which is an accepting state, which means that the pattern has been detected. The pseudocode
for detecting a pattern is presented in Fig. 5. (finis)
Problem 4. What patterns does the automaton in Example 12 detect?
Let’s build an automaton for pattern detection. We have to calculate the transition
function. We use the suffix function.
Definition 11. The suffix function for the pattern W and sequence B is the length of
the longest prefix of pattern W , that is simultaneously suffix of B:
σpB, W q “ max B Ą W r1 . . . ks (11)
k

Example 13. For pattern P = ab: σ(ε, P) = 0, σ(ccaca, P) = 1, σ(ccab, P) = 2.
(finis)
The automaton for pattern W[1 . . . w] has:
• the set of states Q = {0, 1, . . . , w} with the initial state q0 = 0; the only accepting state
is q_w;
• the transition function δ for state q and symbol a ∈ Σ:

δ(q, a) = σ(W[1 . . . q] + a, W),   (12)

where x + y denotes the concatenation of sequences x and y.

1  procedure transition_function ( P , Σ )
2  // P: pattern
3  // Σ: alphabet
4
5  PatternLen ← length ( P ) ;
6
7  for state ← 0 to PatternLen do
8    foreach symbol in Σ do
9      k ← min ( PatternLen + 1 , state + 2 ) ;
10     repeat
11       k ← k − 1 ;
12     until P [ 1 . . k ] is a suffix of P [ 1 . . state ] + symbol ;
13     δ ( state , symbol ) ← k ;
14   end foreach ;
15 end for ;
16
17 return δ ;   // transition function
18 end procedure ;

Figure 6: Algorithm for calculation of the transition function.

The pseudocode is presented in Fig. 6.

Example 14. Let’s elaborate an automaton for pattern ‘rabarbar’. We have to analyse
transitions for all symbols in the alphabet:
0. δp0,‘a’q “ σp`‘a’,‘rabarbar’q “ σp‘a’,‘rabarbar’q “ 0
δp0,‘b’q “ σp`‘b’,‘rabarbar’q “ σp‘b’,‘rabarbar’q “ 0
δp0,‘r’q “ σp`‘r’,‘rabarbar’q “ σp‘r’,‘rabarbar’q “ 1

1. δp1,‘a’q “ σp‘r’`‘a’,‘rabarbar’q “ σp‘ra’,‘rabarbar’q “ 2


δp1,‘b’q “ σp‘r’`‘b’,‘rabarbar’q “ σp‘rb’,‘rabarbar’q “ 0
δp1,‘r’q “ σp‘r’`‘r’,‘rabarbar’q “ σp‘rr’,‘rabarbar’q “ 1
2. δp2,‘a’q “ σp‘ra’`‘a’,‘rabarbar’q “ σp‘raa’,‘rabarbar’q “ 0
δp2,‘b’q “ σp‘ra’`‘b’,‘rabarbar’q “ σp‘rab’,‘rabarbar’q “ 3
δp2,‘r’q “ σp‘ra’`‘r’,‘rabarbar’q “ σp‘rar’,‘rabarbar’q “ 1
3. δp3,‘a’q “ σp‘rab’`‘a’,‘rabarbar’q “ σp‘raba’,‘rabarbar’q “ 4
δp3,‘b’q “ σp‘rab’`‘b’,‘rabarbar’q “ σp‘rabb’,‘rabarbar’q “ 0
δp3,‘r’q “ σp‘rab’`‘r’,‘rabarbar’q “ σp‘rabr’,‘rabarbar’q “ 1
4. δp4,‘a’q “ σp‘raba’`‘a’,‘rabarbar’q “ σp‘rabaa’,‘rabarbar’q “ 0
δp4,‘b’q “ σp‘raba’`‘b’,‘rabarbar’q “ σp‘rabab’,‘rabarbar’q “ 0
δp4,‘r’q “ σp‘raba’`‘r’,‘rabarbar’q “ σp‘rabar’,‘rabarbar’q “ 5

11
5. δp5,‘a’q “ σp‘rabar’`‘a’,‘rabarbar’q “ σp‘rabara’,‘rabarbar’q “ 2
δp5,‘b’q “ σp‘rabar’`‘b’,‘rabarbar’q “ σp‘rabarb’,‘rabarbar’q “ 6
δp5,‘r’q “ σp‘rabar’`‘r’,‘rabarbar’q “ σp‘rabarr’,‘rabarbar’q “ 1
6. δp6,‘a’q “ σp‘rabarb’`‘a’,‘rabarbar’q “ σp‘rabarba’,‘rabarbar’q “ 7
δp6,‘b’q “ σp‘rabarb’`‘b’,‘rabarbar’q “ σp‘rabarbb’,‘rabarbar’q “ 0
δp6,‘r’q “ σp‘rabarb’`‘r’,‘rabarbar’q “ σp‘rabarbr’,‘rabarbar’q “ 1

7. δp7,‘a’q “ σp‘rabarba’`‘a’,‘rabarbar’q “ σp‘rabarbaa’,‘rabarbar’q “ 0


δp7,‘b’q “ σp‘rabarba’`‘b’,‘rabarbar’q “ σp‘rabarbab’,‘rabarbar’q “ 0
δp7,‘r’q “ σp‘rabarba’`‘r’,‘rabarbar’q “ σp‘rabarbar’,‘rabarbar’q “ 8
8. δp8,‘a’q “ σp‘rabarbar’`‘a’,‘rabarbar’q “ σp‘rabarbara’,‘rabarbar’q “ 2
δp8,‘b’q “ σp‘rabarbar’`‘b’,‘rabarbar’q “ σp‘rabarbarb’,‘rabarbar’q “ 0
δp8,‘r’q “ σp‘rabarbar’`‘r’,‘rabarbar’q “ σp‘rabarbarr’,‘rabarbar’q “ 1
Finally we get the automaton:

state ‘a’ ‘b’ ‘r’


→ 0 0 0 1
1 2 0 1
2 0 3 1
3 4 0 1
4 0 0 5
5 2 6 1
6 7 0 1
7 0 0 8
8 2 0 1

(finis)
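A short runnable sketch of the construction from Fig. 6 together with the matcher from Fig. 5 (not part of the lecture; the dictionary-based transition table is an illustrative choice; positions are reported 1-based as in the pseudocode):

def transition_function(pattern, alphabet):
    """delta[(state, symbol)] per equation (12): sigma(P[1..state] + symbol, P)."""
    m = len(pattern)
    delta = {}
    for state in range(m + 1):
        for symbol in alphabet:
            piece = pattern[:state] + symbol
            k = min(m, state + 1)
            while k > 0 and not piece.endswith(pattern[:k]):
                k -= 1
            delta[(state, symbol)] = k
    return delta

def automaton_search(text, pattern, alphabet):
    delta = transition_function(pattern, alphabet)
    state = 0
    for i, symbol in enumerate(text, start=1):
        state = delta[(state, symbol)]
        if state == len(pattern):                    # the accepting state
            print("pattern starts at", i - len(pattern) + 1)

automaton_search("barabarabarbarbar", "rabarbar", "abr")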
Problem 5. What is the sequence of states visited by the automaton calculated in Ex-
ample 14 for the text ‘abarrabarbarabar’?
Problem 6. Does the automaton detect two patterns if they overlap in the text?
Problem 7. Calculate automata for patterns:
• ‘aaaaa’,

• ‘abcde’,
• ‘abcabc’.
