Krzysztof Simiński
Algorithms and Data Structures [Macro]
lecture 03, 13th March 2020
1 Problem statement
input: a sequence of n numbers ⟨a1, a2, . . . , an⟩ and an order relation ≤
output: a permutation ⟨ai1, ai2, . . . , ain⟩ such that ai1 ≤ ai2 ≤ . . . ≤ ain
1.1 Stability
Definition 1. A sorting algorithm is stable if items with equal keys appear in the output
sequence in the same relative order as in the input sequence.
Example 1. Let’s compare outputs of stable and unstable sorting algorithms.
input data:
6 5 1 9 5 8 4 2
A stable algorithm does not change the relative order of equal keys (mark the first 5 as 5l and the second as 5r):
1 2 4 5l 5r 6 8 9
An unstable algorithm may swap them:
1 2 4 5r 5l 6 8 9
unsorted data sorted by first names sorted by surnames
Irene Yellow Ann Red Helen Blue
Hugh Magenta Calliope Brown Calliope Brown
Chris Red Chris Red Hugh Magenta
Doris Pink Doris Pink John Magenta
Calliope Brown Helen Blue Doris Pink
Helen Blue Hugh Magenta Ann Red
John Magenta Irene Yellow Chris Red
Ann Red John Magenta Irene Yellow
The first sort (by first names) sets Ann Red before Chris Red. The second stable sort
by surnames does not change the relative order of items with the same key (here: Red).
Thus Ann Red is still before Chris Red. The same is valid for a pair: Hugh Magenta and
John Magenta.
Definition 3. An in-place (in situ) sorting algorithm requires at most O(1) extra space to
run.
Comparison is a dominant operation in all algorithms discussed in this lecture.
1 procedure bubble_sort;
2   for i ← 1 to n do
3     for j ← 2 to n do
4       if A[j − 1] > A[j] then
5         swap(A[j − 1], A[j]);
6       end if;
7     end for;
8   end for;
9 end procedure;
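A minimal Python sketch of the bubble sort above (0-based list indices; the function name is mine):

```python
def bubble_sort(a):
    """Bubble sort as in the pseudocode: n passes of neighbour comparisons."""
    n = len(a)
    for _ in range(n):             # outer loop: n passes
        for j in range(1, n):      # compare neighbours a[j-1] and a[j]
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
    return a

bubble_sort([6, 3, 1, 9, 5, 8, 4, 2])  # → [1, 2, 3, 4, 5, 6, 8, 9]
```

Note the strict `>` in the comparison; it is what keeps the algorithm stable.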
6 3 1 9 5 8 4 2
i=1
3 6 1 9 5 8 4 2
3 1 6 9 5 8 4 2
3 1 6 9 5 8 4 2
3 1 6 5 9 8 4 2
3 1 6 5 8 9 4 2
3 1 6 5 8 4 9 2
3 1 6 5 8 4 2 9
i=2
1 3 6 5 8 4 2 9
1 3 6 5 8 4 2 9
1 3 5 6 8 4 2 9
1 3 5 6 8 4 2 9
1 3 5 6 4 8 2 9
1 3 5 6 4 2 8 9
1 3 5 6 4 2 8 9
2.2.3 Stability
In the pseudocode in Fig. 1 (line 4) we use the strict inequality (>). If two equal keys
are compared, the condition is false, the keys are not swapped, and the algorithm
is stable. But if we used the weak inequality (≥), equal keys would be swapped and
the algorithm would not be stable.
6 3 1 9 5 8 4 2
For i = 1 we search for the minimum in the whole array and swap it with the first item of
the array.
6 3 1 9 5 8 4 2
sorted to sort
1 3 6 9 5 8 4 2
 1 procedure selection_sort;
 2   for i ← 1 to n − 1 do
 3     index_min ← i;
 4     value_min ← A[i];
 5     for j ← i + 1 to n do
 6       if A[j] < value_min then
 7         index_min ← j;
 8         value_min ← A[j];
 9       end if;
10     end for;
11     swap(A[i], A[index_min]);
12   end for;
13 end procedure;
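The same procedure as a Python sketch (0-based indices; names are mine):

```python
def selection_sort(a):
    """Selection sort: move the minimum of the unsorted part to position i."""
    n = len(a)
    for i in range(n - 1):
        index_min = i
        for j in range(i + 1, n):   # find the minimum of a[i..n-1]
            if a[j] < a[index_min]:
                index_min = j
        a[i], a[index_min] = a[index_min], a[i]
    return a

selection_sort([6, 3, 1, 9, 5, 8, 4, 2])  # → [1, 2, 3, 4, 5, 6, 8, 9]
```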
The first cell of an array holds the minimal value or (in other words) the minimal value
is in its final location.
In the second iteration we search for a local minimum in a subarray with indices
2, 3, . . . , n and swap it with the second cell.
sorted to sort
1 3 6 9 5 8 4 2
sorted to sort
1 2 6 9 5 8 4 3
sorted to sort
1 2 6 9 5 8 4 3
sorted to sort
1 2 3 9 5 8 4 6
sorted to sort
1 2 3 4 5 8 9 6
sorted to sort
1 2 3 4 5 8 9 6
sorted to sort
1 2 3 4 5 6 9 8
1 2 3 4 5 6 8 9
To elaborate the sum let's use the counterflow method. The sum to calculate is written
in columns twice: normally and in a counterflow. The sum of items in each row is n
and the number of rows is n − 1.

n−1 + 1 = n
n−2 + 2 = n
n−3 + 3 = n
  ⋮       ⋮
2 + n−2 = n
1 + n−1 = n

∑_{i=1}^{n−1} i + ∑_{i=1}^{n−1} i = 2 ∑_{i=1}^{n−1} i = n(n − 1)

∑_{i=1}^{n−1} i = n(n − 1)/2    (2)

Time complexity of the selection sort algorithm is T(n) = n(n − 1)/2 ∈ O(n²).
2.3.3 Stability
Let's analyse a certain situation. We search for a local minimum in a subarray.
sorted to sort
5l 5r 4
Having found the minimum, we swap it with the first item of the subarray. Unfortunately
the first item holds a repeated key. After the swap the order of the repeated keys is reversed.
sorted to sort
4 5r 5l
6 3 1 9 5 8 4 2
sorted to sort
6 3 1 9 5 8 4 2
We start with i = 2 and try to insert the ith item into its correct position in the sorted
part.
6 3 1 9 5 8 4 2
sorted to sort
3 6 1 9 5 8 4 2
Again we try to insert the first item from the unsorted part of the array.
sorted to sort
3 6 1 9 5 8 4 2
sorted to sort
1 3 6 9 5 8 4 2
sorted to sort
1 3 6 9 5 8 4 2
sorted to sort
1 3 6 9 5 8 4 2
sorted to sort
1 3 5 6 9 8 4 2
sorted to sort
1 3 5 6 8 9 4 2
 1 procedure insertion_sort;
 2   for i ← 2 to n do
 3     value_min ← A[i];
 4     j ← i − 1;
 5     while j > 0 cand value_min < A[j] do   // cand: short-circuit "conditional and"
 6       A[j + 1] ← A[j];
 7       j ← j − 1;
 8     end while;
 9     A[j + 1] ← value_min;
10   end for;
11 end procedure;
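A Python sketch of the same procedure (0-based; Python's short-circuit `and` plays the role of `cand`):

```python
def insertion_sort(a):
    """Insertion sort: shift greater items right, then drop the key in."""
    for i in range(1, len(a)):
        key = a[i]                    # `value_min` in the pseudocode
        j = i - 1
        while j >= 0 and key < a[j]:  # short-circuit: no access to a[-1]
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                # equal keys stay behind: stable
    return a

insertion_sort([6, 3, 1, 9, 5, 8, 4, 2])  # → [1, 2, 3, 4, 5, 6, 8, 9]
```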
sorted to sort
1 3 4 5 6 8 9 2
1 2 3 4 5 6 8 9
Optimistic case In such a case the condition (= comparison) is tested only once in
each iteration of the external loop. It means an array is already sorted. The external
loop is run n − 1 times. Finally Tbest (n) = n − 1 ∈ O(n).
Pessimistic case In the pessimistic case the inner loop is run maximally. An item
is inserted always into the first cell. It means an array is sorted in the reversed order.
Tworst(n) = ∑_{i=1}^{n−1} i = n(n − 1)/2 ∈ O(n²).    (3)
Average case For an average case we have to assume some model of data. Let’s
assume each permutation of keys in an input array has the same probability.
Let's discuss the insertion of the ith item into the sorted part of the array. It may be
inserted in i positions (before the 1st item, before the 2nd, . . . , before the (i − 1)th, or left unmoved).
If the item is not moved, only one comparison is needed. If it is moved by one cell,
two comparisons are needed. If it is moved into the second cell, i − 1 comparisons are needed,
and if it is moved into the first cell, also i − 1 comparisons are needed. The number of
comparisons for the 1st and the 2nd cell is exactly the same, because we distinguish the
1st and the 2nd cell with the same comparison: the last comparison is true for the 1st cell
and false for the 2nd cell.
We assume each final location of the ith item has the same probability p = 1/i.
The expected number E of comparisons for the ith item is:

E(i) = (1/i)·1 + (1/i)·2 + (1/i)·3 + . . . + (1/i)(i − 1) + (1/i)(i − 1) =    (4)
     = (1/i)(1 + 2 + 3 + . . . + (i − 1)) + (1/i)(i − 1) =    (5)
     = (1/i) · i(i − 1)/2 + (i − 1)/i =    (6)
     = (i − 1)/2 + 1 − 1/i = (i + 1)/2 − 1/i    (7)
Now we have only to sum up the expected numbers of comparisons for all i's in the external loop:
Tavg(n) = ∑_{i=2}^{n} ((i + 1)/2 − 1/i) = (1/2) ∑_{i=2}^{n} (i + 1) − ∑_{i=2}^{n} 1/i =    (8)
        = (1/2) ∑_{i=2}^{n} (i + 1) − (−1 + ∑_{i=1}^{n} 1/i) =    (9)
        = (1/2) ∑_{i=2}^{n} i + (1/2) ∑_{i=2}^{n} 1 − (Hn − 1) =    (10)
        = (1/4)(n − 1)(n + 2) + (n − 1)/2 − (Hn − 1) =    (11)
        = (1/4)(n² + 3n − 4) − (Hn − 1) =    (12)
        = (1/4)(n² + 3n) − Hn    (13)
        ≈ (1/4)(n² + 3n) − ln n − γ ∈ O(n²)    (14)

where Hn = ∑_{i=1}^{n} 1/i is the nth harmonic number.
2.4.3 Stability
For comparison (line 5) we use a strict operator (<). The algorithm does not swap
items with the same key. This is a stable algorithm.
Fast sorting algorithms
Krzysztof Simiński
Algorithms and data structures
lecture 03, 20th March 2020
Simple sorting algorithms compare items in an array and move them towards their final
positions. Unfortunately each swap moves an item only a short distance. Sometimes these
algorithms are called turtle sorting algorithms. This brings an idea: turn the turtles into
rabbits that jump long distances in each iteration.
1 Shellsort
The Shellsort algorithm is based on simple sorting algorithms. In this approach the data in
an array are split into subarrays with gap h. A subarray holds the items with indices i, i +
h, i + 2h, . . . , i + kh. The subarrays are interleaved. Each subarray is sorted independently
of the other subarrays. Sorting of all h-subarrays is called h-sorting.
Example 1. Use Shellsort to sort an array. Use the following sequence of gaps: (. . . , 32,
16, 8, 4, 2, 1).
6 15 12 9 5 8 4 2 3 1
The initial gap is the maximal value from the provided sequence of gaps that does not
exceed half the length of the array. In our example it is h = 4. Every hth element
belongs to the same subarray. The subarrays are interleaved. The initial value of h makes
each subarray hold at least two elements. We have four interleaved subarrays: ⟨6, 5, 3⟩,
⟨15, 8, 1⟩, ⟨12, 4⟩, and ⟨9, 2⟩.
6 15 12 9 5 8 4 2 3 1
Each subarray is sorted independently. We can see that some elements (e.g. 6, 3, 1, 15) have
been moved to distant positions (rabbits, not turtles).
3 1 4 2 5 8 12 9 6 15
The array is not sorted yet, but we can notice that the elements are closer to their final
positions than before sorting. The left part of the array is dominated by small values, the
right part by large values.
Then we take the next h value. In our example it is h = 2. We create new subarrays
and sort them independently.
3 1 4 2 5 8 12 9 6 15
Please notice that the new subarrays are almost sorted. In the subarray ⟨3, 4, 5, 12, 6⟩ two
elements need swapping; the subarray ⟨1, 2, 8, 9, 15⟩ is already sorted.
3 1 4 2 5 8 6 9 12 15
We take the next value h = 1. All elements are in the same subarray and we just sort the
array.
3 1 4 2 5 8 6 9 12 15
1 3 2 4 5 6 8 9 12 15
We have to ask a very important question: does Shellsort make any sense at all? We
run a series of h-sorts and the final pass always uses h = 1. But a 1-sort is just an
ordinary sort, so running the 1-sort alone would already produce a sorted array without
any of the previous h-sorts. Why, then, do we sort the subarrays at all?
To answer this question we have to recall the natural (adaptive) behaviour of sorting
algorithms. To make Shellsort useful we have to use a natural sorting algorithm, i.e. one
whose complexity and execution time drop for (almost) sorted arrays. In Shellsort each
h-sorting makes the array almost sorted (not sorted, but almost sorted) and only a few
more moves are needed. In such a case a natural sorting algorithm runs faster than for a
random array. This is why in Shellsort we use insertion sort and do not use selection
sort.
1 procedure shellsort;
2   h ← initial gap;
3   while h > 0 do
4     for i ← 0 to h − 1 do
5       sort subarray ⟨A[i], A[i + h], A[i + 2h], . . . , A[i + ki h]⟩;
6     end for;
7     h ← next value(h);
8   end while;
9 end procedure;
Figure 1: Shellsort
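A Python sketch of Fig. 1, assuming (as the text recommends) that the h-subarrays are sorted with insertion sort; one gapped pass handles all h interleaved subarrays at once, and the gap sequence is the one from Example 1:

```python
def shellsort(a, gaps=(32, 16, 8, 4, 2, 1)):
    """Shellsort: h-sort the array for each gap h, ending with h = 1."""
    n = len(a)
    for h in gaps:
        if h != 1 and h > n // 2:   # the initial gap must not exceed n/2
            continue
        for i in range(h, n):       # gapped insertion sort: this single
            key = a[i]              # pass sorts all h interleaved subarrays
            j = i - h
            while j >= 0 and a[j] > key:
                a[j + h] = a[j]
                j -= h
            a[j + h] = key
    return a

shellsort([6, 15, 12, 9, 5, 8, 4, 2, 3, 1])
# → [1, 2, 3, 4, 5, 6, 8, 9, 12, 15]
```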
Problem 1. Is it possible to find such an initial permutation that after the first h-sorting
elements are more distant from their final positions than before sorting?
Unfortunately the best gap sequence is not known. We do not know if there exists a
sequence for which Tpes(n) ∈ O(n log n). The analysis of the average complexity is even more
complicated. Experiments show that Shellsort is faster than simple sorting algorithms,
but its complexity is higher than O(n log n).
1.2 Stability
Shellsort is not stable. Equal keys may fall into different subarrays and are moved
independently. Thus their relative order may be swapped.
2 Quicksort
Quicksort is a recursive algorithm. It is an example of the “divide and conquer”
approach. This approach divides a problem into subproblems, solves each subproblem,
and merges solutions of subproblems into a final solution.
The idea of quicksort is to split an array into two parts (subarrays): the left with values
less than a pivot and the right with elements greater than the pivot. The pivot is just an element of
 1 procedure quicksort(l, r);
 2   if l < r then
 3     pivot ← A[l];            // choose pivot
 4     s ← l;
 5     for i ← l + 1 to r do    // rearrange elements of an array
 6       if A[i] < pivot then
 7         s ← s + 1;
 8         swap(A[s], A[i]);
 9       end if;
10     end for;
11     swap(A[s], A[l]);        // put pivot in its final position
12     quicksort(l, s − 1);     // sort left subarray
13     quicksort(s + 1, r);     // sort right subarray
14   end if;
15 end procedure;
Figure 2: Quicksort
an array chosen before the split (we discuss how to choose pivot in sec. 2.4). Subarrays
are not sorted. Then the algorithm is run recursively for both subarrays independently
(Fig. 2).
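A Python sketch of Fig. 2 (0-based indices; the recursion bounds `l`, `r` are inclusive):

```python
def quicksort(a, l=0, r=None):
    """Quicksort with the first element as pivot, following Fig. 2."""
    if r is None:
        r = len(a) - 1
    if l < r:
        pivot = a[l]                   # choose pivot
        s = l
        for i in range(l + 1, r + 1):  # rearrange elements
            if a[i] < pivot:
                s += 1
                a[s], a[i] = a[i], a[s]
        a[s], a[l] = a[l], a[s]        # put pivot in its final position
        quicksort(a, l, s - 1)         # sort left subarray
        quicksort(a, s + 1, r)         # sort right subarray
    return a

quicksort([6, 15, 12, 9, 5, 8, 4, 2, 3, 1])
# → [1, 2, 3, 4, 5, 6, 8, 9, 12, 15]
```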
Example 2. Use quicksort for an array:
6 15 12 9 5 8 4 2 3 1
The implementation in Fig. 2 chooses the first element as a pivot. In our example it is 6.
pivot = 6    6 15 12 9 5 8 4 2 3 1

We search for elements less than the pivot to its right (lines 5–10); s marks the end of the
part holding values less than the pivot, i is the scan index. Each swap gives:

i = 5, s = 2:    6 5 12 9 15 8 4 2 3 1
i = 7, s = 3:    6 5 4 9 15 8 12 2 3 1
i = 8, s = 4:    6 5 4 2 15 8 12 9 3 1
i = 9, s = 5:    6 5 4 2 3 8 12 9 15 1
i = 10, s = 6:   6 5 4 2 3 1 12 9 15 8

And now we put the pivot in its final position (line 11):

1 5 4 2 3 6 12 9 15 8

We get an array split into three parts: the left with items less than the pivot, the pivot,
and the right part with elements not less than the pivot.

< 6: 1 5 4 2 3   |   6   |   > 6: 12 9 15 8
The pivot is in its final position. It will not be moved from this position. The algorithm is
run for the left and the right parts independently.
Optimistic complexity In the best case the array is split into halves. In the first
call of the algorithm the array has n elements, in the second calls n/2, then n/4, n/8,
n/16, etc., until it is not possible to split any more. The depth of the recursion is O(log n)
and at each level there are O(n) comparisons in total. Thus the optimistic complexity is
Topt(n) = O(n log n).
Pessimistic complexity In the worst case the final position of a pivot is the first (last)
cell of the array. In each recursive call the subarray is only one element shorter. The depth of
recursion is O(n). Thus the pessimistic complexity is Tpes(n) = O(n²).
Problem 2. In Fig. 2 a pivot is always the first item of an array. Does it mean this
implementation has always pessimistic complexity?
Average complexity Let's denote the average complexity for an n-element array by
T̄_n. After the rearrangement of the array the pivot may be located in any position, each
with the same probability.
For an empty or 1-element array there are no comparisons:

T̄_0 = T̄_1 = 0    (1)
Now let's subtract (9) from (8):

nT̄_n − (n − 1)T̄_{n−1} = n(n − 1) + 2 ∑_{i=0}^{n−1} T̄_i − (n − 1)(n − 2) − 2 ∑_{i=0}^{n−2} T̄_i    (10)
nT̄_n − (n − 1)T̄_{n−1} = n(n − 1) − (n − 1)(n − 2) + 2 ∑_{i=0}^{n−1} T̄_i − 2 ∑_{i=0}^{n−2} T̄_i    (11)
nT̄_n − (n − 1)T̄_{n−1} = (n − 1)(n − (n − 2)) + 2T̄_{n−1} + 2 ∑_{i=0}^{n−2} T̄_i − 2 ∑_{i=0}^{n−2} T̄_i    (12)
nT̄_n − (n − 1)T̄_{n−1} = 2(n − 1) + 2T̄_{n−1}    (13)
nT̄_n = (n + 1)T̄_{n−1} + 2(n − 1)    (14)
Now we apply a summing-factor technique. We multiply both sides of the equation
by a summing factor s_n ≠ 0:

s_n n T̄_n = s_n (n + 1) T̄_{n−1} + 2 s_n (n − 1)

On the left side only the index n appears (i.e. s_n n T̄_n), while in the first term on the right
side, s_n (n + 1) T̄_{n−1}, the values n + 1 and n − 1 are mixed. Let's mimic the left side and
require the first term to contain n − 1 only:

s_n (n + 1) = s_{n−1} (n − 1)    (17)
Use Eq. (1):

s_n n T̄_n = ∑_{i=1}^{n} 2 s_i (i − 1)    (28)

The solution of (17) is s_n = c/(n(n + 1)) with a constant c ≠ 0. Because s_1 ≠ 0, the
constant cancels on both sides and we have

(1/(n + 1)) T̄_n = ∑_{i=1}^{n} (2/((i + 1)i)) (i − 1) = 2 ∑_{i=1}^{n} (i − 1)/((i + 1)i) =    (31)
= 2 ∑_{i=1}^{n} (2/(i + 1) − 1/i) = 2 (2 ∑_{i=1}^{n} 1/(i + 1) − ∑_{i=1}^{n} 1/i),    (32)

thus

(1/(n + 1)) T̄_n = 4 ∑_{i=1}^{n} 1/(i + 1) − 2H_n =    (34)
= −2H_n + 4 (1/2 + 1/3 + . . . + 1/(n + 1)) =    (35)
= −2H_n + 4 (−1 + 1/1 + 1/2 + 1/3 + . . . + 1/n + 1/(n + 1)) =    (36)
= −2H_n + 4 (−1 + H_n + 1/(n + 1)) =    (37)
= −2H_n − 4 + 4H_n + 4/(n + 1) =    (38)
= 2H_n − 4 + 4/(n + 1)    (39)

T̄_n = (n + 1) (2H_n − 4 + 4/(n + 1))    (40)
We know that

lim_{n→∞} (H_n − ln n) = γ,    (41)

where γ ≈ 0.577 . . . stands for the Euler–Mascheroni constant. Finally

T̄_n ≈ (n + 1) (2 ln n + 2γ − 4 + 4/(n + 1))    (42)
    ∈ O(n log n)    (43)
Space complexity Each recursive call requires O(1) extra space (for local
variables). On average there are O(log n) recursive calls active at the same time. Thus the
average space complexity is Tspace(n) ∈ O(log n).
2.2 Stability
The algorithm groups elements less (greater) than a pivot in the left (right) subarray
in an arbitrary way. It is not a stable algorithm.
The naïve solution is just to sort the array. But we can select the kth element faster.
Sorting places all elements in their correct locations, but we need only the kth smallest
element in its correct location. We adapt quicksort for selection of the kth smallest element.
After each rearrangement of elements a pivot lands in its final position (line 14 in
 1 procedure select(l, r, k);
 2   if l = r then
 3     return A[l];
 4   end if;
 5
 6   pivot ← A[l];
 7   s ← l;
 8   for i ← l + 1 to r do
 9     if A[i] < pivot then
10       s ← s + 1;
11       swap(A[s], A[i]);
12     end if;
13   end for;
14   swap(A[s], A[l]);
15
16   if s = k then
17     return A[s];
18   elseif s > k then
19     return select(l, s − 1, k);
20   else
21     return select(s + 1, r, k);
22   end if;
23 end procedure;
Fig. 3). If the pivot lands in the kth cell, the kth element is found. If the pivot's index is
greater than k, we run the algorithm only for the left subarray (line 19), otherwise for the
right one (line 21).
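A Python sketch of the selection procedure; the recursion of Fig. 3 is rewritten as a loop (k is 1-based, as in the text):

```python
def select(a, k):
    """Return the kth smallest element (1-based) using quickselect."""
    l, r = 0, len(a) - 1
    k -= 1                              # 0-based target index
    while l < r:
        pivot = a[l]
        s = l
        for i in range(l + 1, r + 1):   # the same partition as quicksort
            if a[i] < pivot:
                s += 1
                a[s], a[i] = a[i], a[s]
        a[s], a[l] = a[l], a[s]         # pivot in its final position
        if s == k:
            return a[s]
        if s > k:
            r = s - 1                   # continue in the left subarray
        else:
            l = s + 1                   # continue in the right subarray
    return a[l]

select([6, 15, 12, 9, 5, 8, 4, 2, 3, 1], 3)  # → 3
```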
Optimistic complexity In the best case the array is always split into halves.
Let's sum all comparisons. For simplicity let's assume n = 2^k.

Topt(n) = (n − 1) + (n/2 − 1) + (n/4 − 1) + . . . + (2 − 1) + (1 − 1) =    (44)
= (n/2⁰ − 1) + (n/2¹ − 1) + (n/2² − 1) + . . . + (n/2^{log₂ n − 1} − 1) + (n/2^{log₂ n} − 1) =    (45)
= ∑_{i=0}^{log₂ n} (n/2^i − 1) = n ∑_{i=0}^{log₂ n} 1/2^i − ∑_{i=0}^{log₂ n} 1 =    (46)
= n (1 − (1/2)^{log₂ n + 1})/(1 − 1/2) − (log₂ n + 1) = 2n (1 − 1/(2n)) − (log₂ n + 1) =    (47)
= 2n − 1 − log₂ n − 1 = 2n − log₂ n − 2 ∈ O(n)    (48)
Pessimistic complexity The worst case is the same as the worst case of quicksort.

Tpes(n) = (n − 1) + (n − 2) + . . . + 2 + 1 = n(n − 1)/2 ∈ O(n²)    (49)
Average complexity For a 1-element array there are no comparisons:

T1 = 0.    (50)

For an n-element array with the pivot landing in the ith position after the rearrangement
we have a recursive equation:

Tn = (n + 1) + ((i − 1)/n) T_{i−1} + ((n − i)/n) T_{n−i},    (51)

where (i − 1)/n is the probability that the algorithm is called for the left subarray, and
(n − i)/n for the right subarray. Averaging over all pivot positions, the average complexity is:

T̄_n = (1/n) ∑_{i=1}^{n} ((n + 1) + ((i − 1)/n) T̄_{i−1} + ((n − i)/n) T̄_{n−i}).    (52)
Problem 3. Use the same approach as for the average complexity of quicksort to show
that the average time complexity of the kth smallest element search is

T̄_n ∈ O(n).    (53)
Binary search trees
Krzysztof Simiński
1 record node of
2   value : type;           // stored value
3   left, right : ^node;    // pointers (references) to children
4   parent : ^node;         // pointer (reference) to parent
5 end of record
(Tree: root 20; its children are 10 and 27; 10 has children 2 and 17, and 17 has left child
12; 27 has children 25 and 32; 32 has left child 28, and 28 has right child 30. Each edge is
labelled with the binary search property: keys less than a node's key go into its left subtree,
greater keys into its right subtree.)
inserted between nodes. It always substitutes a nil reference in one of the nodes, with respect
to the binary search property.
Removal of a node requires some explanation. We have three cases.
1 procedure minimum(root)
2 begin
3   x ← root;
4   while x.left ≠ nil do
5     x ← x.left;
6   end while;
7   return x;
8 end procedure;
(finis)
• A node to be removed has one child. We have to find its parent, test whether the node
to be removed is a left or a right child, and modify the reference to the node to be
removed. The reference now points to the only child of the node to remove. Then we
only need to remove the node itself.
Example 2. Let’s remove node 17 from the binary search tree below. We have
to find the parent of node 17 – it is 10, move its right child reference from 17 to
12.
Before the removal: 10 has children 2 and 17, and 17 has left child 12. After the removal:
10 has children 2 and 12.
(finis)
 1 procedure insert(root, to_add)
 2 begin
 3   y ← nil;
 4   x ← root;
 5
 6   // new node:
 7   new_node ← new node;
 8   new_node.left ← new_node.right ← nil;
 9   new_node.value ← to_add;
10
11   // find the place for the new node:
12   while x ≠ nil do
13     y ← x;
14     if new_node.value < x.value then
15       x ← x.left;
16     else
17       x ← x.right;
18     end if
19   end while
20
21   new_node.parent ← y;
22   if y = nil then
23     root ← new_node;
24   elseif new_node.value < y.value then
25     y.left ← new_node;
26   else
27     y.right ← new_node;
28   end if
29 end procedure;
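A Python sketch of the node record and the insertion procedure (iterative descent to a nil reference; names are mine):

```python
class Node:
    """A binary search tree node, mirroring the record above."""
    def __init__(self, value):
        self.value = value
        self.left = self.right = self.parent = None

def insert(root, to_add):
    """Insert a value and return the (possibly new) root of the tree."""
    new_node = Node(to_add)
    y, x = None, root
    while x is not None:               # walk down to a nil reference
        y = x
        x = x.left if to_add < x.value else x.right
    new_node.parent = y
    if y is None:
        return new_node                # the tree was empty
    if to_add < y.value:
        y.left = new_node
    else:
        y.right = new_node
    return root

root = None
for v in (20, 10, 27, 2, 17, 25, 32):
    root = insert(root, v)
```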
– Remove original successor (it has zero or one child.)
Example 3. Let’s remove node 10 from the binary search tree below. We have
to find the successor of node 10 – it is 12.
First the successor's value 12 is copied into the node that held 10, so for a moment the
tree holds 12 twice. Then the original successor node (the lower 12) is cut out and its
child 14 takes its place.
(finis)
For removal of a node we need to find its successor in a binary search tree. A
successor of a node is the next node in a sorted sequence of nodes in a tree. Successor
search is presented in Fig. 9. The algorithm handles two cases.
• If a predecessor (the node whose successor we search for) has a right subtree, the
minimum of that right subtree is the searched successor.
Example 4. A successor of 10 is a minimum of 10’s right subtree. A successor
of 27 is a minimum of 27’s right subtree. (finis)
 1 procedure remove(root, to_remove)
 2 begin
 3   v ← find(root, to_remove);
 4   if v.left = nil or v.right = nil then
 5     y ← v;                // the node has one or no child
 6   else
 7     y ← successor(v);     // the node has two children
 8   end if
 9   if y.left ≠ nil then
10     x ← y.left;
11   else
12     x ← y.right;
13   end if
14   if x ≠ nil then         // the node had one child
15     x.parent ← y.parent;
16   end if
17   if y.parent = nil then  // we remove the root
18     root ← x;
19   elseif y = y.parent.left then  // cut the node out
20     y.parent.left ← x;
21   else
22     y.parent.right ← x;
23   end if
24   if y ≠ v then           // move value
25     v.value ← y.value;
26   end if
27   return y;
28 end procedure;
(Tree: root 27; its children are 20 and 32; 20 has children 10 and 25; 10 has children 2
and 17; 17 has children 12 and 19; 32 has left child 28, and 28 has right child 30.)
Figure 8: A binary search tree. Dashed arrows denote successors of nodes. Node 32
has no successor.
 1 procedure successor(predecessor)
 2 begin
 3   x ← predecessor;
 4   if x.right ≠ nil then   // search in the right subtree
 5     return minimum(x.right);
 6   end if
 7
 8   // otherwise: climb up until we leave a left subtree
 9   y ← x.parent;
10   while y ≠ nil and x = y.right do
11     x ← y;
12     y ← y.parent;
13   end while;
14
15   return y;
16 end procedure;
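The two cases of the successor search as a Python sketch (a minimal Node with parent pointers is assumed):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = self.right = self.parent = None

def minimum(x):
    """Leftmost node of the subtree rooted at x."""
    while x.left is not None:
        x = x.left
    return x

def successor(x):
    """Next node in sorted order, or None if x holds the maximum."""
    if x.right is not None:                # case 1: minimum of the
        return minimum(x.right)            # right subtree
    y = x.parent                           # case 2: climb up until we
    while y is not None and x is y.right:  # leave a left subtree
        x, y = y, y.parent
    return y
```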
1.2 Recursive algorithms
Each subtree of a tree is also a valid binary search tree. This is why often recursive
algorithms are used for binary search trees.
 1 procedure maximum(root)
 2 begin
 3   if root = nil then
 4     return nil;
 5   end if
 6   if root.right = nil then
 7     return root;
 8   end if
 9   return maximum(root.right);
10 end procedure;
Successive levels of a balanced tree hold 2⁰ = 1, 2¹ = 2, 2² = 4, 2³ = 8, . . . nodes, so the
height of a balanced tree with n nodes is h = O(log n).
Definition 8. A balanced tree is a tree in which the distance between the root and any leaf
is the same.
Theorem 1. A balanced tree with height h has n = 2h+1 − 1 nodes.
Problem 9. Prove theorem 1.
(Tree: root 33; its children are 15 and 47; 15 has children 10 and 20; 47 has children 38
and 51; 10 has left child 5; 20 has left child 18; 38 has children 36 and 39.)
Figure 16: Example of a red-black tree. Small black squares denote nil references.
1 record node of
2   value : type;                 // stored value
3   left, right : ^node;          // pointers (references) to children
4   parent : ^node;               // pointer (reference) to parent
5   colour : enum(red, black);    // node colour
6 end of record
2 Red-black trees
Binary search trees have low complexity of operations if they are balanced. Several
types of self-balancing trees have been proposed. One of them are red-black trees.
In red-black trees each node has one more field: colour (red or black) – Fig. 17. A
red-black tree is a binary search tree satisfying:
• Each node is red or black.
• The root is black.
• Every nil reference is treated as a black leaf.
• A red node has only black children.
• Every path from a node down to a nil leaf contains the same number of black nodes.
2.1 Rotations
Rotations are operations used in red-black trees (and in many self-balancing trees).
Rotations do not violate the binary search property of trees. General idea of left and
right rotations is presented in Fig. 19, an example – in Fig. 20, and pseudocode for a
left rotation in Fig. 18.
 1 procedure left_rotation(root, x)
 2 begin
 3   y ← x.right;
 4   x.right ← y.left;
 5   if y.left ≠ nil then
 6     y.left.parent ← x;
 7   end if
 8   y.parent ← x.parent;
 9   if x.parent = nil then
10     root ← y;
11   elseif x = x.parent.left then
12     x.parent.left ← y;
13   else
14     x.parent.right ← y;
15   end if
16   y.left ← x;
17   x.parent ← y;
18 end procedure;
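The left rotation as a Python sketch (the function returns the possibly changed root; Node is a minimal stand-in for the record above):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = self.right = self.parent = None

def left_rotation(root, x):
    """Rotate left around x; y = x.right takes x's place."""
    y = x.right
    x.right = y.left                 # move subtree β under x
    if y.left is not None:
        y.left.parent = x
    y.parent = x.parent
    if x.parent is None:
        root = y                     # x was the root
    elif x is x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x
    x.parent = y
    return root
```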
(A right rotation transforms the tree B(A(α, β), γ) into A(α, B(β, γ)); a left rotation is
the inverse transformation; p denotes the parent of the rotated pair.)
Figure 19: Rotations in a binary search tree. Triangles (α, β, γ) stand for subtrees.
Subtrees may be empty (nil).
(Example: right_rotation(root, y) transforms the tree 12(8(4(1, 7), 9(·, 11)), 15) into
12(4(1, 8(7, 9(·, 11))), 15), where y denotes node 8; left_rotation(root, x) with x = node 4
restores the original tree. A parenthesised pair lists a node's left and right children, ·
marks nil.)
(Figure: red-black restructuring cases; nodes labelled A, B, C, D, with x marking the node
being fixed.)
2.2 Insertion
A new node is inserted as a binary search tree. A new node is always red. If a new
red node has a red parent, the tree should be transformed to restore the red-black tree
properties.
Definition 11. A sibling of a node is a node that has the same parent.
Definition 12. An uncle of a node is a sibling of node’s parent.
Problem 13. Insert into an empty red-black tree values: 30, 20, 10, 15, 16, 5, 8, 12, 13,
9.
Problem 14. How many rotations are needed to restore the properties of a red-black
tree after insertion of a new node?
(Figure: the remaining insertion-fixup cases.)
2.3 Removal
If a removed node is red, the properties of a red-black tree are not violated. The
problem arises when a black node is removed.
In line 4 two references are set:
0. if a removed node had no children:
• y points to a removed node (that is not in a tree any more),
• x = nil.
If
• x.colour = red, then change its colour to black.
• x.colour = black, then
1. node x is a left child of its parent, then it has a right sibling
(a) sibling is red (line 13)
(b) sibling is black and both children as well (line 21)
(c) sibling is black and its left child is red and right child is black (line
26)
(d) sibling is black and its right child is red (line 33)
2. (case symmetrical to 1) node x is a right child of its parent, so it has
a left sibling
(a) (case symmetrical to 1a) sibling is red,
(b) (case symmetrical to 1b) sibling is black and its both children as well,
(c) (case symmetrical to 1c) sibling is black and its right child is red and
left child is black,
(d) (case symmetrical to 1d) sibling is black and its left child is red.
Problem 16. Remove values: 5, 15, 20, 16, 30 from the tree from Problem 13.
Problem 17. How many rotations are needed to restore the properties of a red-black
tree after removal of a node?
1 procedure remove(root, to_remove)
2 begin
3   // remove as from a binary search tree
4   (x, y) ← remove(root, to_remove);
5
Table 2: Complexity of operations in a red-black tree.

operation   average and worst complexity
search      O(log n)
insert      O(log n)
delete      O(log n)
Hash tables
Krzysztof Simiński
Algorithms and data structures
lecture 04, 3rd April 2020
Let’s compare two data structures: an array (a vector) and a balanced binary search
tree.
In an array the indices are consecutive numbers (there are no gaps). It is very easy
to calculate the address of the item we search for. Access to an item is very fast: O(1).
Unfortunately if we would like to store items with indices −10, 3, 7, and 64, we have
to allocate a contiguous block of memory for all indices in the interval [−10, 64]. The cells in
the allocated array are mostly unused. This is a waste of memory. So it is better to
use a balanced binary tree. We can very easily implement a container for any set of
indices without wasting memory. This solution has a disadvantage: access to an item
takes longer than in an array – it is O(log n).
Let's join the advantages of these two approaches: a non-consecutive set of indices
and fast access. These are the features of hash tables.
A hash table is just an array, but the keys of items are not used directly as indices
into the array. First the keys are hashed. A hashing function is a function that takes a key
and returns an index into the hash array. Keys may be completely different in nature from
indices.
Example 1. Our hash table is based on a 7-element array:
0 1 2 3 4 5 6
(Figure 1: multiplicative hashing implemented with register operations – a w-bit key k is
multiplied by ⌊A · 2^w⌋; the product occupies two registers r1 r0, and the hash h(k) is the
top p bits of the lower register r0.)
0 1 2 3 4 5 6
35 8 20 72 −10 40 6
(finis)
1 Hash function
h : K → U, (2)
where K is the set of keys and U is the set of indices in a hash table.
Features of a good hash function:
1. It is easy to compute.
2. For similar keys returns dissimilar indices.
3. Returns indices with a uniform distribution (all indices have the same probability).
This formula can be implemented in a very fast way as register operations (Fig. 1).
The method works for any value of the constant A. Knuth shows that

A = (√5 − 1)/2 ≈ 0.6180339887 . . .    (5)

produces good hashing functions.

Example 2. Let k = 123456, m = 10000, A = (√5 − 1)/2. Then kA = 76300.0041 . . . ,
so h(k) = ⌊m · (kA mod 1)⌋ = ⌊10000 × 0.0041 . . .⌋ = 41.
The function is easy to compute and for similar keys returns dissimilar indices. (finis)
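A Python sketch of the multiplicative method reproducing Example 2 (a floating-point version; a production implementation would use the register trick of Fig. 1):

```python
import math

def mult_hash(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplicative hashing: h(k) = floor(m * (k*A mod 1))."""
    return int(m * ((k * A) % 1.0))

mult_hash(123456, 10000)  # → 41
```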
2 Conflicts
Unfortunately the cardinality of the set of keys may be significantly larger than the
cardinality of the set of indices. In such a situation it is impossible to fit each key into a
different cell of the array. If a hash function returns the same index for two (or more)
different keys, we have a conflict.
1 procedure hash_chaining_insert(x)
2 begin
3   k ← h(x);
4   insert x at the beginning of the list starting at A[k];
5 end procedure.

1 procedure hash_chaining_search(x)
2 begin
3   k ← h(x);
4   if x is in the list starting at A[k] then
5     return true;
6   else
7     return false;
8   end if;
9 end procedure.
(Example with h(x) = x mod 7: the list at A[5] is 5 → 19 → 40, at A[1] it is 1 → −6, and
at A[0] it is 0 → 63 → 35 → 70.)
(finis)
Time complexities
• Assuming a good (uniformly distributing) hash function, the time complexity depends
on the average length of the lists.
• When the size of the hash table is a and the number of elements is n, the time
complexity of searching and removing is O(1 + n/a).
• The time complexity of insertion is O(1).
1 procedure hash_chaining_remove(x)
2 begin
3   k ← h(x);
4   if x is in the list starting at A[k] then
5     remove x;
6   end if;
7 end procedure.
• Note: When n = c × a for some small c, e.g., c < 3, all dictionary operations
work in O(1) average time!
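The three chaining operations as a Python sketch (h(x) = x mod m is an assumed example hash; class and method names are mine):

```python
class ChainingHashTable:
    """Hash table resolving conflicts by chaining (lists in buckets)."""
    def __init__(self, m=7):
        self.m = m
        self.a = [[] for _ in range(m)]

    def h(self, x):
        return x % self.m                 # works for negative keys in Python

    def insert(self, x):
        self.a[self.h(x)].insert(0, x)    # prepend to the list: O(1)

    def search(self, x):
        return x in self.a[self.h(x)]     # O(1 + n/a) on average

    def remove(self, x):
        bucket = self.a[self.h(x)]
        if x in bucket:
            bucket.remove(x)

t = ChainingHashTable(7)
for v in (5, 19, 40, -6):
    t.insert(v)
t.search(19)   # → True
```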
h(x, 0) = . . .
h(x, i) = (h(x, 0) + i) mod m,    for 1 ≤ i ≤ m − 1
Example 4. Use hash function

h(x, 0) = x mod 8
h(x, i) = (h(x, 0) + 3i) mod 8,    for 1 ≤ i ≤ 7

to insert values: 57, 14, 18, 8, 111, 87, 25, 33 into a hash table.
0 1 2 3 4 5 6 7
57 18 14 111
Unfortunately for 87 we have two conflicts:
0 1 2 3 4 5 6 7
57 18 87 14 111
0 1 2 3 4 5 6 7
57 18 25 87 14 111
0 1 2 3 4 5 6 7
33 57 18 25 87 14 111
(finis)
Problems
• Linear probing tends to group values in clusters.
• When h(x, 0) points into a large cluster, it is necessary to check a lot of non-empty
positions to locate an element or an empty cell.
 1 procedure hash_probing_linear_insert(x)
 2 begin
 3   for i ← 0 to m − 1 do
 4     k ← h(x, i);
 5     if A[k] is empty or removed then
 6       A[k] ← x;
 7       return true;
 8     end if;
 9   end for;
10   return false;
11 end procedure.

 1 procedure hash_probing_linear_search(x)
 2 begin
 3   for i ← 0 to m − 1 do
 4     k ← h(x, i);
 5     if A[k] = x then
 6       return true;
 7     elseif A[k] is empty then
 8       return false;
 9     end if;
10   end for;
11   return false;
12 end procedure.

 1 procedure hash_probing_linear_remove(x)
 2 begin
 3   for i ← 0 to m − 1 do
 4     k ← h(x, i);
 5     if A[k] = x then
 6       A[k] ← removed;  // not `empty`: an empty cell would break later searches
 7       return true;
 8     elseif A[k] is empty then
 9       return false;
10     end if;
11   end for;
12   return false;
13 end procedure.
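The three probing procedures as a Python sketch, with an explicit `removed` marker (as the insert pseudocode expects) so that removals do not break probe chains:

```python
EMPTY, REMOVED = object(), object()      # sentinel cell markers

class LinearProbing:
    """Open addressing with h(x, i) = (x mod m + i) mod m."""
    def __init__(self, m=8):
        self.m = m
        self.a = [EMPTY] * m

    def h(self, x, i):
        return (x % self.m + i) % self.m

    def insert(self, x):
        for i in range(self.m):
            k = self.h(x, i)
            if self.a[k] is EMPTY or self.a[k] is REMOVED:
                self.a[k] = x
                return True
        return False                     # the table is full

    def search(self, x):
        for i in range(self.m):
            k = self.h(x, i)
            if self.a[k] == x:
                return True
            if self.a[k] is EMPTY:       # a removed cell does NOT stop us
                return False
        return False

    def remove(self, x):
        for i in range(self.m):
            k = self.h(x, i)
            if self.a[k] == x:
                self.a[k] = REMOVED
                return True
            if self.a[k] is EMPTY:
                return False
        return False
```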
h(x, 0) = . . .
h(x, i) = (h(x, 0) + c1 i + c2 i²) mod m,    for 1 ≤ i ≤ m − 1

It is important to choose the constants c1 and c2 in such a way that h(x, i) for any i =
0, 1, . . . , m − 1 returns different values.
We do not present pseudocodes here, because they are the same as for linear probing
(only the hash function is different).
Example 5. Use hash function

h(x, 0) = x mod 8
h(x, i) = (h(x, 0) + 2i² − 5i) mod 8,    for 1 ≤ i ≤ 7

to insert values: 57, 21, 18, 5, 123, 87, 25, 33 into a hash table.
h(57, 0) = 57 ≡ 1 mod 8
h(21, 0) = 21 ≡ 5 mod 8
h(18, 0) = 18 ≡ 2 mod 8
0 1 2 3 4 5 6 7
57 18 21
h(5, 0) = 5 mod 8
h(5, 1) = h(5, 0) + 2 × 12 − 5 × 1 ≡ 5 + 2 − 5 ≡ 2 mod 8
0 1 2 3 4 5 6 7
57 18 5 21
0 1 2 3 4 5 6 7
123 57 18 5 21
h(87, 0) = 87 ≡ 7 mod 8
0 1 2 3 4 5 6 7
123 57 18 5 21 87
h(25, 0) = 25 ≡ 1 mod 8
h(25, 1) = h(25, 0) + 2 × 1² − 5 × 1 ≡ 1 + 2 − 5 ≡ 6 mod 8
0 1 2 3 4 5 6 7
123 57 18 5 21 25 87
h(33, 0) = 33 ≡ 1 mod 8
h(33, 1) = h(33, 0) + 2 × 12 − 5 × 1 ≡ 1 + 2 − 5 ≡ 6 mod 8
0 1 2 3 4 5 6 7
123 57 18 5 33 21 25 87
(finis)
to insert values: 9, 16, 8, 2, 23, 5 into a hash table. Please note that h1 and h2 use
different mod operand values!
0 1 2 3 4 5 6
0 1 2 3 4 5 6
9 16
0 1 2 3 4 5 6
8 9 16
0 1 2 3 4 5 6
2 8 9 16
0 1 2 3 4 5 6
2 8 9 23 16
h(5, 0) = (h1(5) + 0 × h2(5)) ≡ 5 mod 7
h2(5) = (2 × 5 mod 6) + 1 = 5
h(5, 1) = (h1(5) + 1 × h2(5)) ≡ 5 + 1 × 5 ≡ 3 mod 7
h(5, 2) = (h1(5) + 2 × h2(5)) ≡ 5 + 2 × 5 ≡ 1 mod 7
h(5, 3) = (h1(5) + 3 × h2(5)) ≡ 5 + 3 × 5 ≡ 6 mod 7
0 1 2 3 4 5 6
2 8 9 23 16 5
(finis)
4 Perfect hashing
Definition 7. A perfect hashing function returns a different value for each key.
A perfect hashing function guarantees no conflicts.
Definition 8. A minimal perfect hashing function is a perfect hashing function that for
all n keys returns indices from interval [0, n − 1].
A perfect minimal hashing function guarantees no conflicts and no empty buckets
in a hash table.
Table 1: Complexity of operations in a hash table.

operation   average   worst
search      O(1)      O(n)
insert      O(1)      O(n)
delete      O(1)      O(n)
Heaps
Krzysztof Simiński
A binary heap is a data structure similar to a binary tree, but it differs from trees in
two ways (Fig. 1):
• (order property) Each node holds a value greater than or equal to values held
by its children (maximum heap).1
• (shape property) All levels in a heap are full. The only exception is the lowest
level that may be filled partially starting from the left.
Heaps are rarely stored as explicit trees. Commonly we use arrays for heaps (Fig. 2).
A heap is stored in an array indexed from one. At the top of the heap (in the first cell of
the array) we store the maximal value. Such a representation makes it very easy to access
the parent or the children of a value.
If a value is stored in the i-th cell, then
• its children have indices 2i and 2i + 1;
• its parent has index ⌊i/2⌋.
Problem 1. What are the indices of children and a parent of a value, if a heap is
indexed from zero instead of one?
1 It is also possible to define a minimum heap with the reversed order.
45
14 30
12 4 26 20
10 8 1 2 21
Figure 1: Example of a binary heap. Each node holds a value greater than or equal to the
values of its children. Level l = 1 has 2^{l−1} = 1 node, level l = 2 has 2^{l−1} = 2 nodes,
level l = 3 has 2^{l−1} = 4 nodes. These levels are full. The only partially filled level is the last one.
45 14 30 12 4 26 20 10 8 1 2 21
1 2 3 4 5 6 7 8 9 10 11 12
1 Sift-up operation
A new item is added in the lowest level of a heap, just after the last item. If the
lowest level is full, a new level is started from the left. In an array implementation a
new item is inserted just after the last item. The shape property is then satisfied, but we
also have to restore the order property. We use the sift-up operation (Fig. 3).
Example 2. Let's add 35 to the heap in Figs. 1 and 2. We add the new value at the end
of the heap. As long as the order property is not satisfied (the new item is greater than its parent),
we swap the new value with its parent.
45
14 30
12 4 26 20
10 8 1 2 21 35
Value 35 added at the end of the heap.
45 14 30 12 4 26 20 10 8 1 2 21 35
1 2 3 4 5 6 7 8 9 10 11 12 13
45
14 30
12 4 35 20
10 8 1 2 21 26
45 14 30 12 4 35 20 10 8 1 2 21 26
1 2 3 4 5 6 7 8 9 10 11 12 13
45
14 35
12 4 30 20
10 8 1 2 21 26
45 14 35 12 4 30 20 10 8 1 2 21 26
1 2 3 4 5 6 7 8 9 10 11 12 13
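The sift-up of Example 2 can be sketched in Python (a minimal illustration, not the lecture's code: 1-based heap positions are mapped onto a 0-based Python list):

```python
def sift_up(heap, i):
    """Move the value at 1-based position i up until its parent is not smaller."""
    while i > 1 and heap[i // 2 - 1] < heap[i - 1]:
        # swap the value with its parent and climb one level up
        heap[i // 2 - 1], heap[i - 1] = heap[i - 1], heap[i // 2 - 1]
        i //= 2

heap = [45, 14, 30, 12, 4, 26, 20, 10, 8, 1, 2, 21]
heap.append(35)            # shape property holds, order property may not
sift_up(heap, len(heap))   # restore the order property
# heap == [45, 14, 35, 12, 4, 30, 20, 10, 8, 1, 2, 21, 26]
```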
1 procedure sift_down ( A , left , right ) // O(log n)
2 begin
3   parent ← left ;
4   while parent ∗ 2 ≤ right do // until the parent has at least one child
5     child ← parent ∗ 2 ;
6     greater_child ← 0 ; // index of the greater child
7     if A [ parent ] < A [ child ] then
8       greater_child ← child ;
9     end if ;
10    if child + 1 ≤ right then // the parent has two children
11      if A [ parent ] < A [ child + 1 ] and A [ child ] < A [ child + 1 ] then
12        greater_child ← child + 1 ;
13      end if ;
14    end if ;
15    if greater_child > 0 then // a greater child exists
16      swap ( A [ parent ] , A [ greater_child ] ) ;
17      parent ← greater_child ;
18    else // no greater child: the order property holds
19      return ;
20    end if ;
21  end while ;
22 end .
Figure 4: Sift-down operation.
2 Sift-down operation
Sift-down is an essential operation on a binary heap. We place an item at the top
of the heap. If the order property is not satisfied, we have to swap the item with one of
its children. This case is more complicated than sift-up, because we have to decide which
child the item should be swapped with (Fig. 4).
Example 4. Value 3 has been placed at the top of the heap. Let's sift it down.
3
14 30
12 4 26 20
10 8 1 2 21
3 14 30 12 4 26 20 10 8 1 2 21
1 2 3 4 5 6 7 8 9 10 11 12
Value 3 needs sifting down. We choose the greater child and swap.
30
14 3
12 4 26 20
10 8 1 2 21
30 14 3 12 4 26 20 10 8 1 2 21
1 2 3 4 5 6 7 8 9 10 11 12
Value 3 is not yet in its final location. We have to swap it with its greater child.
30
14 26
12 4 3 20
10 8 1 2 21
30 14 26 12 4 3 20 10 8 1 2 21
1 2 3 4 5 6 7 8 9 10 11 12
Unfortunately value 3 is still not in its final location. We have to swap it with its
greater child once more.
1 procedure heapify ( A , size ) // T (n) ∈ O(n), see Sect. 3.1
2 begin
3   start ← floor ( size / 2 ) ; // index of the parent of the last item
4   while start > 0 do
5     sift_down ( A , start , size ) ;
6     start ← start − 1 ;
7   end while ;
8 end .
Figure 5: Heapifying
30
14 26
12 4 21 20
10 8 1 2 3
30 14 26 12 4 21 20 10 8 1 2 3
1 2 3 4 5 6 7 8 9 10 11 12
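The sift-down of Example 4 can be sketched in Python. This is a compact variant of the pseudocode in Fig. 4: it picks the greater child directly instead of using the greater_child flag, but performs the same swaps:

```python
def sift_down(heap, left, right):
    """Sift the value at 1-based position `left` down, within heap[1..right]."""
    parent = left
    while parent * 2 <= right:
        child = parent * 2
        # if a right sibling exists and is greater, it becomes the candidate
        if child + 1 <= right and heap[child] > heap[child - 1]:
            child += 1
        if heap[parent - 1] < heap[child - 1]:
            heap[parent - 1], heap[child - 1] = heap[child - 1], heap[parent - 1]
            parent = child
        else:
            return            # order property restored

heap = [3, 14, 30, 12, 4, 26, 20, 10, 8, 1, 2, 21]
sift_down(heap, 1, len(heap))
# heap == [30, 14, 26, 12, 4, 21, 20, 10, 8, 1, 2, 3]
```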
3 Heapifying
In the sections above we have operated on already existing heaps. Now we are
going to make a heap in an array (Fig. 5).
Example 5. Let’s heapify the array:
14 12 21 10 4 3 30 1 8 20 2 26
1 2 3 4 5 6 7 8 9 10 11 12
14 12 21 10 4 3 30 1 8 20 2 26
1 2 3 4 5 6 7 8 9 10 11 12
After sifting down:
14 12 21 10 4 26 30 1 8 20 2 3
1 2 3 4 5 6 7 8 9 10 11 12
14 12 21 10 4 26 30 1 8 20 2 3
1 2 3 4 5 6 7 8 9 10 11 12
14 12 21 10 20 26 30 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
14 12 21 10 20 26 30 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
14 12 21 10 20 26 30 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
We sift it down.
14 12 30 10 20 26 21 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
14 12 30 10 20 26 21 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
14 20 30 10 12 26 21 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
14 20 30 10 12 26 21 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
14 20 30 10 12 26 21 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
30 20 14 10 12 26 21 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
30 20 26 10 12 14 21 1 8 4 2 3
1 2 3 4 5 6 7 8 9 10 11 12
Finally value 14 has found its location and we have made the heap in the array. (finis)
Problem 6. Draw a tree-like version of example 5.
3.1 Computation complexity
We start to heapify in the middle of an array. The cost of a single sift-down is O(log n).
We sift a linear number of items, O(n). Thus an upper estimate is O(n log n). Unfor-
tunately it is a very rough estimate. Let's try to be more precise.
Let's assume a heap has h levels indexed from 0 (root) up to h − 1 (leaves). Let's
analyse the penultimate level, i.e. h − 2. Items in this level can only be sifted down to
level h − 1. Level h − 2 holds roughly 1/4 of all items in the heap. Thus a quarter
of the items can be sifted at most one level down.
Sifting an item from level i takes O(h − 1 − i) steps. There are 2^i items in this level.
Let's sum up the possible sifts over all levels:
\[
\sum_{i=0}^{h-1} 2^i (h-1-i) = 2^0 (h-1) + 2^1 (h-2) + 2^2 (h-3) + \ldots + 2^{h-2} \cdot 1 + 2^{h-1} \cdot 0 \tag{1}
\]
\[
\sum_{i=0}^{h-1} 2^i (h-1-i) = \sum_{k=0}^{h-1} 2^{h-1-k}\, k = 2^{h-1} \sum_{k=0}^{h-1} \frac{k}{2^k} \leq n \sum_{k=0}^{h-1} \frac{k}{2^k} \tag{2}
\]
The sum \(\sum_{k=0}^{h-1} \frac{k}{2^k}\) converges to a constant, so \(\sum_{i=0}^{h-1} 2^i (h-1-i) \in O(n)\). It means
a heap is built in linear time.
Let's check that the value of \(\sum_{k=0}^{h-1} \frac{k}{2^k}\) is not an extremely large constant. Let's write
the sum as a matrix:
\[
\begin{matrix}
\tfrac{1}{2} \\
\tfrac{1}{4} & \tfrac{1}{4} \\
\tfrac{1}{8} & \tfrac{1}{8} & \tfrac{1}{8} \\
\vdots & \vdots & \vdots & \ddots
\end{matrix} \tag{3}
\]
If we add the items in rows, we get the sum. If we sum the items in columns, the first column
is a geometric series equal to 1, the second a series equal to 1/2, the third 1/4, etc. Eventually
we have the series of column sums (which is also a series): 1 + 1/2 + 1/4 + 1/8 + . . . = 2.
We have just shown that the complexity of heapifying is O(n).
4 Heapsort
The idea of heapsort is very simple. First we make a heap in an array. The first
cell of the array holds the maximal value. We swap it with the last item in the array.
The maximal value is in its final position. But the value at the top of the heap needs
sifting down. So we sift it down. The second maximal value is at the top of the heap.
We swap it with the penultimate value in the array. In each iteration the heap is one
item shorter and the sorted array – one item longer.
Example 7. Let’s sort the array:
14 12 21 10 4 3 30 1 8 20 2 26
1 procedure heapsort ( A , size )
2 // Topt (n) = Tpes (n) = Tavg (n) ∈ O(n log n)
3 begin
4   heapify ( A , size ) ;
5   right ← size ;
6   while right > 1 do
7     swap ( A [ 1 ] , A [ right ] ) ;
8     right ← right − 1 ;
9     sift_down ( A , 1 , right ) ;
10  end while ;
11 end .
Figure 6: Heapsort.
First we heapify the array (we have already done it in example 5).
heap
30 20 26 10 12 14 21 1 8 4 2 3
We swap the last item with the first one and sift value 3 down:
heap sorted
26 20 21 10 12 14 3 1 8 4 2 30
heap sorted
21 20 14 10 12 2 3 1 8 4 26 30
20 12 14 10 4 2 3 1 8 21 26 30
14 12 8 10 4 2 3 1 20 21 26 30
12 10 8 1 4 2 3 14 20 21 26 30
heap sorted array
10 4 8 1 3 2 12 14 20 21 26 30
8 4 2 1 3 10 12 14 20 21 26 30
4 3 2 1 8 10 12 14 20 21 26 30
3 1 2 4 8 10 12 14 20 21 26 30
2 1 3 4 8 10 12 14 20 21 26 30
1 2 3 4 8 10 12 14 20 21 26 30
sorted array
1 2 3 4 8 10 12 14 20 21 26 30
(finis)
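The whole heapsort can be sketched in Python (an illustration, not the lecture's code; sift_down is repeated here so that the snippet is self-contained):

```python
def sift_down(heap, left, right):
    """1-based sift-down on a 0-based Python list, within heap[1..right]."""
    parent = left
    while parent * 2 <= right:
        child = parent * 2
        if child + 1 <= right and heap[child] > heap[child - 1]:
            child += 1                     # the greater of the two children
        if heap[parent - 1] < heap[child - 1]:
            heap[parent - 1], heap[child - 1] = heap[child - 1], heap[parent - 1]
            parent = child
        else:
            return

def heapsort(a):
    # build the heap in O(n) ...
    for start in range(len(a) // 2, 0, -1):
        sift_down(a, start, len(a))
    # ... then repeatedly move the maximum to its final place and shrink the heap
    for right in range(len(a), 1, -1):
        a[0], a[right - 1] = a[right - 1], a[0]
        sift_down(a, 1, right - 1)

a = [14, 12, 21, 10, 4, 3, 30, 1, 8, 20, 2, 26]
heapsort(a)
# a == [1, 2, 3, 4, 8, 10, 12, 14, 20, 21, 26, 30]
```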
Mergesort, linear sorting
Krzysztof Simiński
Algorithms and data structures
lecture 08, 24th April 2020
1 Mergesort
Mergesort is an example of a “divide and conquer” paradigm. A task is split into
subtasks until it is trivial to solve them. Then the results of subtasks are merged into
a final results of the initial task.
In mergesort algorithm a crucial operation is merging of sorted subarrays into one
merged sorted array. Fortunately it can be done fast.
12 9 7 6 3
15 8 4
We compare the front value of each input array and test which one is smaller. We move the
smaller value from the input to the output.
12 9 7 6 3
15 8 4
12 9 7 6
15 8 4
Then we again compare the front values of the input streams and test which one is smaller. The
smaller value is moved to the output.
12 9 7 6
4 3
15 8
12 9 7
6 4 3
15 8
12 9
7 6 4 3
15 8
12 9
8 7 6 4 3
15
12
9 8 7 6 4 3
15
One of the streams is empty. We just move all values from the non-empty stream to the output.
12 9 8 7 6 4 3
15
15 12 9 8 7 6 4 3
(finis)
Each value is taken from an input array only once. One comparison is enough to
decide which value to move. Each value is put into the output only once. Thus the com-
putational complexity of the merging algorithm is linear.
1 procedure mergesort ( A , down , up ) ;
2   if down < up then
3     s ← ( down + up ) / 2 ;
4
5     mergesort ( A , down , s ) ;
6     mergesort ( A , s + 1 , up ) ;
7
8     // merge subarrays
9     B [ down . . up ] ; // auxiliary array
10    left ← down ;
11    right ← s + 1 ;
12
13    for i ← down to up do
14      if left > s then                     // left run exhausted
15        B [ i ] ← A [ right ] ;
16        right ← right + 1 ;
17      else if right > up then              // right run exhausted
18        B [ i ] ← A [ left ] ;
19        left ← left + 1 ;
20      else if A [ left ] ≤ A [ right ] then // ≤ keeps the sort stable
21        B [ i ] ← A [ left ] ;
22        left ← left + 1 ;
23      else
24        B [ i ] ← A [ right ] ;
25        right ← right + 1 ;
26      end if ;
27    end for ;
28
29    // copy the merged values back
30    for i ← down to up do
31      A [ i ] ← B [ i ] ;
32    end for ;
33
34  end if ;
35 end procedure ;
Figure 1: Mergesort.
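A runnable Python sketch of the pseudocode (the auxiliary array is a plain list; `<=` in the comparison keeps the sort stable):

```python
def mergesort(a, down, up):
    """Sort a[down..up] (inclusive, 0-based) recursively."""
    if down < up:
        s = (down + up) // 2
        mergesort(a, down, s)
        mergesort(a, s + 1, up)

        b = []                                # auxiliary array for the merged run
        left, right = down, s + 1
        for _ in range(down, up + 1):
            if left > s:                      # left run exhausted
                b.append(a[right]); right += 1
            elif right > up:                  # right run exhausted
                b.append(a[left]); left += 1
            elif a[left] <= a[right]:         # <= keeps the sort stable
                b.append(a[left]); left += 1
            else:
                b.append(a[right]); right += 1
        a[down:up + 1] = b                    # copy the merged values back

a = [10, 4, 3, 9, 12, 1, 20, 7, 15, 5]
mergesort(a, 0, len(a) - 1)
# a == [1, 3, 4, 5, 7, 9, 10, 12, 15, 20]
```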
Figure 2: The array 10 4 3 9 12 1 20 7 15 5 is recursively split into halves, down to one-item subarrays.
1.2 Sorting
The pseudocode of mergesort is presented in Fig. 1.
Example 2. Mergesort is a recursive algorithm. First we split an array into two equal
parts and call the algorithm for both parts until we get one-item arrays (Fig. 2). One-
item arrays have only one item each, so they are sorted. Now we only have to merge
sorted arrays into a final sorted array. (Fig. 3). (finis)
1 3 4 5 7 9 10 12 15 20
3 4 9 10 12 1 5 7 15 20
3 4 10 9 12 1 7 20 5 15
4 10 3 9 12 1 20 7 15 5
10 4 1 20
Figure 4: Each level of merges processes all n items, and there are ⌈log₂ n⌉ levels, which gives the total complexity O(n log n).
a1 6 a2
T F
a2 6 a3 a1 6 a3
T F T F
a1 6 a2 6 a3 a1 6 a3 a1 6 a3 6 a2 a2 6 a3
T F T F
a1 6 a3 6 a2 a3 6 a1 6 a2 a3 6 a2 6 a1 a2 6 a1 6 a3
Figure 5: Decision tree for sorting values a1 , a2 , and a3 . “T” stands for true, “F” – false.
Leaves (final sorted permutations) are coloured.
\[ n! \leq 2^h \tag{1} \]
\[ \log_2 (n!) \leq h \tag{2} \]
Since log₂(n!) ∈ Θ(n log n) (Stirling's approximation), the height h of the decision tree,
i.e. the worst-case number of comparisons of any comparison-based sorting algorithm, is Ω(n log n).
1 procedure countingsort ( in_array [ 1 . . n ] , out_array [ 1 . . n ] , k )
2 // in_array: input array
3 // out_array: output array
4 // k: maximal value stored in the input array
5
6 for i ← 1 to k do // O(k)
7   counter [ i ] ← 0 ;
8 end for ;
9 for i ← 1 to n do // O(n)
10  counter [ in_array [ i ] ] ← counter [ in_array [ i ] ] + 1 ;
11 end for ; // counter[i] holds the number of items equal to i.
12
13 for i ← 2 to k do // O(k)
14  counter [ i ] ← counter [ i ] + counter [ i − 1 ] ;
15 end for ; // counter[i] holds the number of items less than or equal to i.
16
17 for i ← n downto 1 do // O(n)
18  out_array [ counter [ in_array [ i ] ] ] ← in_array [ i ] ;
19  counter [ in_array [ i ] ] ← counter [ in_array [ i ] ] − 1 ;
20 end for ;
21 end procedure ;
Figure 6: Countsort
Sorting algorithms with complexity lower than O(n log n) are possible, but they do
not compare values and they need some additional information about the data they sort.
2.1 Countsort
Countsort assumes that each of sorted numbers is an integer from interval [1, k]
for a certain k. If k ∈ O(n), complexity of countsort is O(n). The pseudocode of
countsort is presented in Fig. 6.
Example 4. We would like to sort numbers from interval [1, 6] stored in input array
in_array:
in_array 4 5 3 6 1 3 6 3 1 3
counter 0 0 0 0 0 0
We count each value in array in_array and put the results in counter (lines 9-11 in
Fig. 6):
counter 2 0 4 1 1 2
counter 2 2 6 7 8 10
And the final step: we fill the output array out_array (lines 17-20). We fill the array from
its end!
out_array 1 1 3 3 3 3 4 5 6 6
(finis)
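Countsort from Fig. 6 as a Python sketch (a 0-based output list, otherwise the same three passes):

```python
def countingsort(in_array, k):
    """Counting sort for integers from [1, k]; returns a new sorted list. O(n + k)."""
    n = len(in_array)
    counter = [0] * (k + 1)            # index 0 unused, as in the lecture
    for v in in_array:
        counter[v] += 1                # counter[i]: number of items equal to i
    for i in range(2, k + 1):
        counter[i] += counter[i - 1]   # counter[i]: number of items <= i
    out_array = [0] * n
    for v in reversed(in_array):       # traversing from the end keeps the sort stable
        out_array[counter[v] - 1] = v
        counter[v] -= 1
    return out_array

print(countingsort([4, 5, 3, 6, 1, 3, 6, 3, 1, 3], 6))
# [1, 1, 3, 3, 3, 3, 4, 5, 6, 6]
```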
2.2 Stability
Countsort is stable.
Problem 1. Why?
2.3 Radixsort
Radixsort is a sorting algorithm for numbers. We sort the numbers first by the least
significant digit, then by the second least significant digit, etc. The number of possible
digit values is fixed (in the decimal system it is 10), so we can use the countsort algorithm
(with k = 10).
1 procedure radixsort ( A , d )
2 // A: input array
3 // d: number of digits
4 for i ← 1 to d do
5   sort A with a stable sorting algorithm with regard to the i-th digit ;
6 end for ;
7 end procedure ;
Figure 7: Radixsort
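A Python sketch of radixsort, where the stable sort of line 5 is a simple bucket pass over the ten digit values (the helper is an illustration, not the lecture's code):

```python
def radixsort(a, d):
    """Sort non-negative integers with at most d decimal digits,
    processing digits from the least significant to the most significant."""
    for i in range(d):
        buckets = [[] for _ in range(10)]            # one bucket per digit value
        for v in a:
            buckets[(v // 10 ** i) % 10].append(v)   # stable: order inside a bucket is kept
        a = [v for bucket in buckets for v in bucket]
    return a

print(radixsort([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839]
```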
2.4 Bucketsort
Bucketsort assumes numbers to sort are values from interval [0, 1). They do not
need to be integers. The algorithm assumes a uniform distribution of numbers. The
numbers are located in buckets. The number of buckets equals the number of values to
sort. Because the values have a uniform distribution, the buckets hold similar numbers
of values. The last step is sorting of values in each bucket (Fig. 9).
Example 6. Use bucketsort to sort the numbers in array A (Fig. 8). (finis)
Figure 8: The input array A and the bucket array B; every value A[i] ∈ [0, 1) lands in the list of bucket ⌊n · A[i]⌋.
1 procedure bucketsort ( A [ 1 . . n ] )
2 // A: input array of values from [0, 1)
3 for i ← 1 to n do
4   append A [ i ] to list B [ ⌊ n · A [ i ] ⌋ ] ;
5 end for ;
6 for i ← 0 to n − 1 do
7   sort list B [ i ] with insertion sort ;
8 end for ;
9 concatenate lists B [ 0 ] , B [ 1 ] , . . . , B [ n − 1 ] ;
10 end procedure ;
Figure 9: Bucketsort
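A Python sketch of bucketsort; Python's built-in sort stands in for the insertion sort of line 7:

```python
import math

def bucketsort(a):
    """Sort values from [0, 1); assumes a roughly uniform distribution."""
    n = len(a)
    b = [[] for _ in range(n)]
    for v in a:
        b[math.floor(n * v)].append(v)     # value v goes to bucket floor(n * v)
    for bucket in b:
        bucket.sort()                      # each bucket holds only a few values on average
    return [v for bucket in b for v in bucket]

print(bucketsort([0.64, 0.27, 0.37, 0.99, 0.78, 0.21, 0.18, 0.74]))
# [0.18, 0.21, 0.27, 0.37, 0.64, 0.74, 0.78, 0.99]
```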
Dynamic programming
Krzysztof Simiński
Algorithms and data structures
lecture 09, 02nd May 2020
A1 A2 A3 A4 A5
Let's define the matrix chain multiplication problem formally. Given a sequence
of n matrices (A1 , A2 , . . . , An ), find an optimal grouping of the matrices that requires the
least number of scalar multiplications.
The number P (n) of different groupings of a sequence of n matrices is:
\[
P(n) = \begin{cases} 1, & \text{for } n = 1 \\ \sum_{k=1}^{n-1} P(k)\, P(n-k), & \text{for } n \geq 2 \end{cases} \tag{1}
\]
Or in a closed form: P (n) = C(n − 1), where the n-th Catalan number
\( C(n) = \frac{1}{n+1} \binom{2n}{n} \in \Omega\!\left( \frac{4^n}{n^{3/2}} \right) \).
Let’s denote by Ai...j = Ai Ai+1 . . . Aj a product of matrices from i-th to j-th.
Let's analyse a task for five matrices (without loss of generality). Given a se-
quence of matrices A1 A2 A3 A4 A5 , their dimensions must fit: the number of columns of a
previous matrix equals the number of rows of the next matrix. Because the dimensions of
the matrices must fit, we do not have to store the number of rows of the (i + 1)-th matrix,
because it equals the number of columns of the i-th matrix. It is enough to store a se-
quence of numbers p0 p1 . . . pn . We can easily reconstruct the dimensions of the i-th matrix:
Ai [pi−1 , pi ].
If we solve the problem with the «divide and conquer» paradigm, we have to test all
groupings of the matrices into two groups (Fig. 1): (A1 )(A2 A3 A4 A5 ), (A1 A2 )(A3 A4 A5 ),
(A1 A2 A3 )(A4 A5 ), (A1 A2 A3 A4 )(A5 ). Then each subsequence longer than 2 has to
be subgrouped further. On each level we choose the minimal solution.
We have to solve one more problem: merging of subsolutions. If the cost of multiplying
a sequence Aa...b is ma...b and the cost of a sequence Ac...d (with c = b + 1) is mc...d ,
what is the cost for the sequence Aa...b Ac...d ? The cost ma...d of multiplication
of the matrices in the sequence Aa...d is the sum of cost ma...b , cost mc...d , and the cost M of
multiplying the two partial products. Matrix Aa...b has as many rows as matrix Aa (let's de-
note it as rAa ) and as many columns as matrix Ab (let's denote it as cAb ). Matrix Ac...d
has as many rows as matrix Ac (rAc = cAb ) and as many columns as
matrix Ad (cAd ). The multiplication cost M is M = rAa cAb cAd . Finally the
multiplication cost for the whole sequence is
ma...d = ma...b + mc...d + rAa cAb cAd (2)
or
ma...d = ma...b + mc...d + pa−1 pb pd . (3)
We know how to split a task into subtasks and we know how to merge subso-
lutions, so we can solve the problem recursively. In Fig. 1 the sequence (A1 A2 ) is
printed in red. We can easily notice that its cost is computed many times. This is why
we store the results in two arrays:
• m[i, j] – the minimal number of scalar multiplications mi...j for the sequence Ai...j :
\[
m[i, j] = \begin{cases} 0, & i = j \\ \min_{i \leq k < j} \left( m[i, k] + m[k+1, j] + p_{i-1}\, p_k\, p_j \right), & i < j \end{cases} \tag{4}
\]
• s[i, j] = k, where k stands for the k-th matrix after which we put the parenthesis,
i.e. the optimal grouping of the sequence Ai Ai+1 . . . Aj is (Ai Ai+1 . . . Ak ) (Ak+1 . . . Aj ).
The algorithm presented in Fig. 2 starts with the shortest sequences and merges
them into longer ones.
Example 4. Let’s find an optimal grouping for sequence A1 A2 A3 A4 of matrices
with dimensions: A1 [20, 1], A2 [1, 100], A3 [100, 2], A4 [2, 50]. First let’s encode matrix
dimensions as a sequence (p0 , p1 , p2 , p3 , p4 ) = (20, 1, 100, 2, 50). We initialize cost
array m and split array s:
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 ∞ ∞ ∞ i=1
2 0 ∞ ∞ 2
3 0 ∞ 3
4 0 4
5 5
1 procedure parenthesize ( ⟨p0 , p1 , p2 , . . . , pn ⟩ )
2 for i ← 1 to n do
3   m [ i , i ] ← 0 ;
4   for j ← i + 1 to n do
5     m [ i , j ] ← ∞ ; // minimum search
6   end for ;
7 end for ;
8
9 for j ← 2 to n do
10  for i ← j − 1 downto 1 do
11    for k ← i to j − 1 do
12      temp ← m [ i , k ] + m [ k + 1 , j ] + pi−1 pk pj ;
13      if temp < m [ i , j ] then
14        m [ i , j ] ← temp ;
15        s [ i , j ] ← k ;
16      end if ;
17    end for ;
18  end for ;
19 end for ;
20 return {m , s} ;
21 end procedure
Figure 2: Optimal parenthesization of a matrix chain.
• i = 2
• k = 2: We test the multiplication cost for the sequence Ai...j = A2...3 :
m[2, 2] + m[3, 3] + p1 p2 p3 = 0 + 0 + 1 · 100 · 2 = 200
The value is less than the value in m[2, 3] = ∞, thus we overwrite it
and update the grouping location in s[2, 3] with k = 2.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 ∞ ∞ i=1 1
2 0 200 ∞ 2 2
3 0 ∞ 3
4 0 4
5 5
• i = 1: Sequence A1...3 has two groupings:
• k = 1: We test the multiplication cost for the sequence Ai...j = A1...3 grouped
into A1...1 A2...3 :
m[1, 1] + m[2, 3] + p0 p1 p3 = 0 + 200 + 20 · 1 · 2 = 240
The value is less than m[1, 3] = ∞, thus we overwrite it and update s[1, 3]
with k = 1. (The other grouping, k = 2, i.e. A1...2 A3...3 , costs
m[1, 2] + m[3, 3] + p0 p2 p3 = 2000 + 0 + 20 · 100 · 2 = 6000 and does not improve the result.)
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 ∞ i=1 1 1
2 0 200 ∞ 2 2
3 0 10000 3 3
4 0 4
5 5
• i=2
• k = 2: We test multiplication cost for sequence Ai...j = A2...4 grouped
into A2...2 A3...4 .
m[2, 2] + m[3, 4] + p1 p2 p4 = 200 + 10000 + 1 · 100 · 50 = 15200
The value is less than the value in m[2, 4] = ∞, thus we overwrite it
and update the grouping location in s[2, 4] with k = 2.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i = 1 0 2000 240 ∞ i=1 1 1
2 0 200 15200 2 2 2
3 0 10000 3 3
4 0 4
5 5
• k = 3: We test multiplication cost for sequence Ai...j = A2...4 grouped
into A2...3 A4...4 .
m[2, 3] + m[4, 4] + p1 p3 p4 = 200 + 0 + 1 · 2 · 50 = 300
The value is less than the value in m[2, 4] = 15200, thus we overwrite
it and update the grouping location in s[2, 4] with k = 3.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 ∞ i=1 1 1
2 0 200 300 2 2 3
3 0 10000 3 3
4 0 4
5 5
• i=1
• k = 1: We test multiplication cost for sequence Ai...j = A1...4 grouped
into A1...1 A2...4 .
m[1, 1] + m[2, 4] + p0 p1 p4 = 0 + 300 + 20 · 1 · 50 = 1300
The value is less than the value in m[1, 4] = ∞, thus we overwrite it
and update the grouping location in s[1, 4] with k = 1.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 1300 i=1 1 1 1
2 0 200 300 2 2 3
3 0 10000 3 3
4 0 4
Reading the split array s: s[1, 4] = 1 and s[2, 4] = 3, so the optimal grouping is
(A1 )((A2 A3 )(A4 )) with 1300 scalar multiplications. (finis)
1 procedure multiply_matrix_chain ( A = ⟨A1 , A2 , . . . , An ⟩ , s , i , j )
2   if j > i then
3     X ← multiply_matrix_chain ( A , s , i , s [ i , j ] ) ;
4     Y ← multiply_matrix_chain ( A , s , s [ i , j ] + 1 , j ) ;
5     return multiplication ( X , Y ) ; // multiply the two partial products
6   else
7     return Ai ;
8   end if ;
9 end procedure ;
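A Python sketch of the parenthesization algorithm. It iterates by subchain length, which visits the same subproblems in an order equivalent to the j/i loops of Fig. 2; the tables m and s are dictionaries with the 1-based keys of the lecture:

```python
def parenthesize(p):
    """Matrix chain DP; matrix A_i has dimensions p[i-1] x p[i].
    Returns the cost table m and the split table s."""
    n = len(p) - 1
    m = {(i, i): 0 for i in range(1, n + 1)}
    s = {}
    for length in range(2, n + 1):            # length of the subchain
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i, j] = float("inf")            # minimum search
            for k in range(i, j):
                cost = m[i, k] + m[k + 1, j] + p[i - 1] * p[k] * p[j]
                if cost < m[i, j]:
                    m[i, j] = cost
                    s[i, j] = k
    return m, s

m, s = parenthesize([20, 1, 100, 2, 50])
# m[1, 4] == 1300, s[1, 4] == 1, s[2, 4] == 3,
# i.e. the optimal grouping of Example 4 is (A1)((A2 A3)(A4))
```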
Problem 1. What are the features of a sequence A1...n and a matrix An+1 such that the
multiplication cost of A1...n is less than the cost of A1...n+1 ?
Example 5. Sequence: (M, I, S, S, I, S, S, I, P, I), examples of subsequences: (I, I),
(M, I, S, S), (S, S, S, S), (M, I, P ). But you cannot swap elements! (P, I, M ) is not
a subsequence of (M, I, S, S, I, S, S, I, P, I). (finis)
\[
c[i, j] = \begin{cases} c[i-1, j-1] + 1 & \text{when } a_i = b_j \\ \max(c[i, j-1],\; c[i-1, j]) & \text{when } a_i \neq b_j \end{cases} \tag{5}
\]
Example 7. Let's find the longest common subsequence of the sequences X = (A, B, A,
C, A, A, B, A) and Y = (A, C, A, B, C, B). First we fill a matrix with the algorithm
presented in Fig. 4.
1 procedure find_LCS ( X = ⟨x1 , . . . , xm ⟩ , Y = ⟨y1 , . . . , yn ⟩ )
2   m ← length ( X ) ;
3   n ← length ( Y ) ;
4   for i ← 0 to m do c [ i , 0 ] ← 0 ; end for ;
5   for j ← 0 to n do c [ 0 , j ] ← 0 ; end for ;
6
7   for i ← 1 to m do
8     for j ← 1 to n do
9       if xi = yj then
10        c [ i , j ] ← c [ i − 1 , j − 1 ] + 1 ;
11        b [ i , j ] ← "-" ;
12      else
13        if c [ i − 1 , j ] > c [ i , j − 1 ] then
14          c [ i , j ] ← c [ i − 1 , j ] ;
15          b [ i , j ] ← "↑" ;
16        else
17          c [ i , j ] ← c [ i , j − 1 ] ;
18          b [ i , j ] ← "←" ;
19        end if ;
20      end if ;
21    end for ;
22  end for ;
23 end procedure ;
Figure 4: Filling the LCS matrix.
Y = A C A B C B
j→ 1 2 3 4 5 6
X i↓
0 0 0 0 0 0 0
- ← - ← ← ←
A 1
0 1 1 1 1 1 1
↑ ↑ ↑ - ← ←
B 2
0 1 1 1 2 2 2
- ↑ - ↑ ↑ ↑
A 3
0 1 1 2 2 2 2
↑ - ↑ ↑ - ←
C 4
0 1 2 2 2 3 3
- ↑ - ← ↑ ↑
A 5
0 1 2 3 3 3 3
- ↑ - ↑ ↑ ↑
A 6
0 1 2 3 3 3 3
↑ ↑ ↑ - ← -
B 7
0 1 2 3 4 4 4
- ↑ - ↑ ↑ ↑
A 8
0 1 2 3 4 4 4
Fig. 5 presents an algorithm for printing the longest common subsequence. (finis)
Problem 3. The solution we have found in Example 7 is not the only common
subsequence with 4 elements. How can the algorithm be modified to get the other solutions?
If we only need the length of the longest common subsequence, we do not need all
rows of the matrix. We need only the current and the previous row.
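The two-row optimisation mentioned above can be sketched in Python (the function name is mine):

```python
def lcs_length(x, y):
    """Length of the longest common subsequence, keeping only two rows of the matrix."""
    prev = [0] * (len(y) + 1)          # the previous row of c
    for xi in x:
        curr = [0]                     # the current row, c[i, 0] = 0
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(curr[j - 1], prev[j]))
        prev = curr
    return prev[-1]

print(lcs_length("ABACAABA", "ACABCB"))
# 4
```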
1 function print_LCS ( b , X = ⟨x1 , . . . , xm ⟩ , i , j )
2   if i = 0 or j = 0 then
3     return ;
4   end if ;
5   if b [ i , j ] = "-" then
6     print_LCS ( b , X , i − 1 , j − 1 ) ;
7     print xi ;
8   else if b [ i , j ] = "↑" then
9     print_LCS ( b , X , i − 1 , j ) ;
10  else
11    print_LCS ( b , X , i , j − 1 ) ;
12  end if ;
13 end function ;
Figure 5: Printing the longest common subsequence.
3 Edit distance
Edit distance is a number of single edit operations (insertions, deletions, substitu-
tions) necessary to transform a sequence into other. It is often used as a measure of
similarity of sequences, eg in spellchecking. It if a very close problem to the longest
common sequence search problem. We only have to modify the cost formula. The
formula for calculation of edit distance:
where cd stands for deletion cost, ci – insertion cost, and cr – substitution cost. Vari-
ants:
• cr (x, y) can be more complex, e.g., can depend on “similarity” of symbols or
“proximity” of symbols (in spellchecking applications it is much more probable
to press ‘a’ instead of ’s’ than ’a’ instead of ’o’);
• cr (x, y) = 0 for any pair of symbols — this is indel distance;
• sometimes we allow to swap the neighbour symbols (as a single edit operation).
The edit distance problem is solved in a very similar way as the longest common
sequence problem – Fig. 6.
Example 8. Let's find the edit distance between the sequences X = (A, B, A, C, A, A,
B, A) and Y = (A, C, A, B, C, B). Assume the costs: ci = 2, cd = 3, and cr (a, b) = 4
for a ≠ b and cr (a, a) = 0.
1 function find_edit_distance ( X = ⟨x1 , . . . , xm ⟩ , Y = ⟨y1 , . . . , yn ⟩ , ci , cd , cr )
2   m ← length ( X ) ;
3   n ← length ( Y ) ;
4
5   for i ← 0 to m do
6     c [ i , 0 ] ← i ∗ cd ;
7     b [ i , 0 ] ← "↑" ;
8   end for ;
9   for j ← 0 to n do
10    c [ 0 , j ] ← j ∗ ci ;
11    b [ 0 , j ] ← "←" ;
12  end for ;
13
14  for i ← 1 to m do
15    for j ← 1 to n do
16      insertion_cost ← c [ i , j − 1 ] + ci ;
17      deletion_cost ← c [ i − 1 , j ] + cd ;
18      replacement_cost ← c [ i − 1 , j − 1 ] + cr ( xi , yj ) ;
19
20      c [ i , j ] ← insertion_cost ;
21      b [ i , j ] ← "←" ;
22      if deletion_cost < c [ i , j ] then
23        c [ i , j ] ← deletion_cost ;
24        b [ i , j ] ← "↑" ;
25      end if ;
26      if replacement_cost < c [ i , j ] then
27        c [ i , j ] ← replacement_cost ;
28        b [ i , j ] ← "-" ;
29      end if ;
30    end for ;
31  end for ;
32  return c [ m , n ] ;
33 end function ;
Figure 6: Edit distance.
Y = A C A B C B
j→ 1 2 3 4 5 6
← ← ← ← ← ←
X i↓
0 2 4 6 8 10 12
↑ - ← ← ← ← ←
A 1
3 0 2 4 6 8 10
↑ ↑ - ← - ← ←
B 2
6 3 4 6 4 6 8
↑ - - - ← ← ←
A 3
9 6 7 4 6 8 10
↑ ↑ - ↑ - - ←
C 4
12 9 6 7 8 6 8
↑ - ↑ - ← ↑ -
A 5
15 13 9 6 8 9 10
↑ - ↑ - - ← -
A 6
18 15 11 9 10 12 13
↑ ↑ ↑ ↑ - ← -
B 7
21 18 14 13 9 11 12
↑ - ↑ - ↑ - ←
A 8
24 21 17 14 12 13 15
The edit distance between X = (A, B, A, C, A, A, B, A) and Y = (A, C, A, B, C, B)
is c[8, 6] = 15. (finis)
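A Python sketch of the edit distance computation (costs as in Example 8; the backtracking array b is omitted because only the distance is returned):

```python
def edit_distance(x, y, ci=2, cd=3, cr=4):
    """Edit distance with insertion cost ci, deletion cost cd and substitution cost cr."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        c[i][0] = i * cd                  # delete all of x[0..i)
    for j in range(n + 1):
        c[0][j] = j * ci                  # insert all of y[0..j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = min(
                c[i][j - 1] + ci,                                      # insertion
                c[i - 1][j] + cd,                                      # deletion
                c[i - 1][j - 1] + (0 if x[i - 1] == y[j - 1] else cr), # substitution
            )
    return c[m][n]

print(edit_distance("ABACAABA", "ACABCB"))
# 15
```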
Dynamic programming
Krzysztof Simiński
Algorithms and data structures
lecture 09, 02nd May 2020
1
A1 A2 A3 A4 A5
Let’s define formally the matrix chain multiplication problem. Given a sequence
of n matrices (A1 , A2 , . . . , An ) find a optimal grouping of matrices that requires the
least number of scalar multiplications.
Number P (n) of different groupings of a sequence of n matrices is:
(
1, for n = 1
P (n) = Pn−1 (1)
k=1 P (k)P (n − k), for n > 2
Or in a closed form: P (n) = C(n−1), where n-th Catalan number C(n) = n+1 1 2n
n ∈
n
Ω n43/2 .
Let’s denote by Ai...j = Ai Ai+1 . . . Aj a product of matrices from i-th to j-th.
2
Let’s analyse a task for five matrices (without a loss of generality). Given a se-
quence of matrices A1 A2 A3 A4 A5 their dimensions must fit: number of columns of a
previous matrix equals number of rows in a next matrix. Because the dimensions of
matrices must fit, we do not have to store numbers of rows of the (i + 1)-th matrix,
because it equals the number of columns of the i-th matrix. It is enough to store a se-
quence of numbers p0 p1 . . . pn . We can easily reconstruct dimensions of i-th matrix:
Ai [pi−1 , pi ].
If we solve the problem with «divide and conquer» paradigm, we have to test all
groupings of matrices into two groups (Fig. 1): (A1 )(A2 A3 A4 A5 ), (A1 A2 )(A3 A4 A5 ),
(A1 A2 A3 )(A4 A5 ), (A1 A2 A3 A4 )(A5 ). Then each subsequence longer then 2 has to
be subgrouped further. On each level we choose the minimal solution.
We have to solve one more problem. We have to merge subsolutions. If we elab-
orate the cost of multiplication of a sequence Aa...b is ma...b and the cost of sequence
Ac...d is mc...d , what is the cost for sequence Aa...b Ac...d ? Cost ma...d of multiplication
of matrices in sequence Aa...d is a sum of cost ma...b and cost mc...d and cost M of
multiplication of these matrices. Matrix Aa...b has as many rows as matrix Aa (let’s de-
note is as rAa ) and as many columns as matrix Ab (let’s denote it as cAb ). Matrix Ac...d
has as many rows as matrix Ac (let’s denote is as rAc = cAb ) and as many columns as
matrix Ad (let’s denote it as cAd ). Multiplication cost M is M = rAa cAb cAd . Finally
multiplication cost for the sequence is
ma...d = ma...b + mc...d + wAa kAb kAd (2)
or
ma...d = ma...b + mc...d + pa−1 pb pd . (3)
We know how to split a task into subtasks and we know how to merge subso-
lutions. We can easily solve the problem recursively. In Fig. 1 sequence (A1 A2 ) is
printed in red. We can easily notice that its cost is elaborated many times. This is why
we put the results into two arrays:
• m[i, j] – minimal number of scalar multiplications mi...j for sequence Ai...j ,
(
0, i=j
m[i, j] = (4)
minj−1
i=k (m[i, k] + m[k + 1, j] + p p p
i−1 k j ) , i<j
• s[i, j] = k, where k stands for the k-th matrix after which we put a parenthesis,
ie the optimal grouping of sequence Ai Ai+1 . . . Aj is (Ai Ai+1 . . . Ak ) (Ak+1 . . . Aj ).
The algorithm is presented in Fig. 2 starts with the shortest sequences and merges
them into longer ones.
Example 4. Let’s find an optimal grouping for sequence A1 A2 A3 A4 of matrices
with dimensions: A1 [20, 1], A2 [1, 100], A3 [100, 2], A4 [2, 50]. First let’s encode matrix
dimensions as a sequence (p0 , p1 , p2 , p3 , p4 ) = (20, 1, 100, 2, 50). We initialize cost
array m and split array s:
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 ∞ ∞ ∞ i=1
2 0 ∞ ∞ 2
3 0 ∞ 3
4 0 4
5 5
3
1 procedure paranthesize ( hp0 , p1 , p2 , . . . , pn i )
2 f o r i ← 1 to n do
3 m[i , i] ← 0;
4 f o r j ← i + 1 to n do
5 m [ i , j ] ← ∞ ; // minimum search
6 end f o r ;
7 end f o r ;
8
9 f o r j ← 2 to n do
10 f o r i ← j − 1 downto 1 do
11 f o r k ← i to j − 1 do
12 temp ← m [ i , k ] + m [ k + 1 , j ] + pi−1 pk pj ;
13 i f temp < m [ i , j ] then
14 m [ i , j ] ← temp ;
15 s[i , j] ← k ;
16 end i f ;
17 end f o r ;
18 end f o r ;
19 end f o r ;
20 r e t u r n {m , s} ;
21 end procedure
• i=2
• k = 2 : We test multiplication cost for sequence Ai...j = A2...3 .
4
The value is less than the value in m[2, 3] = ∞, thus we overwrite it
and actualise the grouping location in s[2, 3] with k = 2.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 ∞ ∞ i=1 1
2 0 200 ∞ 2 2
3 0 ∞ 3
4 0 4
5 5
• i = 1: Sequence A1...3 has two groupings:
• k = 1: We test multiplication cost for sequence Ai...j = A1...3 grouped
into A1...1 A2...3 .
5
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 ∞ i=1 1 1
2 0 200 ∞ 2 2
3 0 10000 3 3
4 0 4
5 5
• i=2
• k = 2: We test multiplication cost for sequence Ai...j = A2...4 grouped
into A2...2 A3...4 .
m[2, 2] + m[3, 4] + p1 p2 p4 = 200 + 10000 + 1 · 100 · 50 = 15200
The value is less than the value in m[2, 4] = ∞, thus we overwrite it
and actualise the grouping location in s[2, 4] with k = 2.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i = 1 0 2000 240 ∞ i=1 1 1
2 0 200 15200 2 2 2
3 0 10000 3 3
4 0 4
5 5
• k = 3: We test multiplication cost for sequence Ai...j = A2...4 grouped
into A2...3 A4...4 .
m[2, 3] + m[4, 4] + p1 p3 p4 = 200 + 0 + 1 · 2 · 50 = 300
The value is less than the value in m[2, 4] = 15200, thus we overwrite
it and actualise the grouping location in s[2, 4] with k = 3.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 ∞ i=1 1 1
2 0 200 300 2 2 3
3 0 10000 3 3
4 0 4
5 5
• i=1
• k = 1: We test multiplication cost for sequence Ai...j = A1...4 grouped
into A1...1 A2...4 .
m[1, 1] + m[2, 4] + p0 p1 p4 = 0 + 300 + 20 · 1 · 50 = 1300
The value is less than the value in m[1, 4] = ∞, thus we overwrite it
and actualise the grouping location in s[1, 4] with k = 1.
j j
m[i, j] 1 2 3 4 s[i, j] 1 2 3 4
i=1 0 2000 240 1300 i=1 1 1 1
2 0 200 300 2 2 3
3 0 10000 3 3
4 0 4
5 5
6
1 procedure multiply_matrix_chain ( A = hA1 , A2 , . . . , An i , s , i , j
)
2 i f j > i then
3 X ← multiply_matrix_chain (A , s , i , s[i , j ]) ;
4 Y ← multiply_matrix_chain (A , s , s[i , j] + 1 , j) ;
5 r e t u r n multiplication ( X , Y ) ; // multiply matrices
6 else
7 r e t u r n Ai ;
8 end procedure ;
Problem 1. What are the features of a sequence A1...n and matrix An+1 so that the
multiplication cost of A1...n is less then cost of A1...n+1 ?
7
Example 5. Sequence: (M, I, S, S, I, S, S, I, P, I), examples of subsequences: (I, I),
(M, I, S, S), (S, S, S, S), (M, I, P ). But you cannot swap elements! (P, I, M ) is not
a subsequence of (M, I, S, S, I, S, S, I, P, I). (finis)
when ai = bj
c[i − 1, j − 1] + 1
c[i, j] = (5)
max(c[i, j − 1], c[i − 1, j]) when ai 6= bj
Example 7. Let’s find the longest common sequence for sequences X = (A, B, A,
C, A, B, C, A) and Y = (A, C, A, B, C, B). First we fill a matrix with the algorithm
presented in Fig. 4.
8
1 procedure find_LCS ( X = hx1 , . . . , xm i , Y = hy1 , . . . , yn i )
2 m ← length ( X ) ;
3 n ← length ( Y ) ;
4 f o r i ← 0 to m do c [ i , 0 ] ← 0 ; end f o r ;
5 f o r j ← 0 to n do c [ 0 , j ] ← 0 ; end f o r ;
6
7 f o r i ← 1 to m do
8 f o r j ← 1 to n do
9 i f xi = yj then
10 c[i , j] ← c[i − 1 , j − 1] + 1 ;
11 b[i , j] ← "- " ;
12 else
13 i f c [ i − 1 , j ] > c [ i , j −1] then
14 c[i , j] ← c[i − 1 , j ] ;
15 b[i , j] ← "↑ " ;
16 else
17 c[i , j] ← c[i , j − 1];
18 b[i , j] ← "← " ;
19 end i f ;
20 end i f ;
21 end f o r ;
22 end f o r ;
23 end procedure ;
9
Y = A C A B C B
j→ 1 2 3 4 5 6
X i↓
0 0 0 0 0 0 0
- ← - ← ← ←
A 1
0 1 1 1 1 1 1
↑ ↑ ↑ - ← ←
B 2
0 1 1 1 2 2 2
- ↑ - ↑ ↑ ↑
A 3
0 1 1 2 2 2 2
↑ - ↑ ↑ - ←
C 4
0 1 2 2 2 3 3
- ↑ - ← ↑ ↑
A 5
0 1 2 3 3 3 3
- ↑ - ↑ ↑ ↑
A 6
0 1 2 3 3 3 3
↑ ↑ ↑ - ← -
B 7
0 1 2 3 4 4 4
- ↑ - ↑ ↑ ↑
A 8
0 1 2 3 4 4 4
Fig. 5 presents an algorithm for printing the longest common subsequence. (finis)
Problem 3. The solution we have found in Example 7 is not the only one common
sequence with 4 elements. How to modify the algorithm to get other solutions?
If we only need the length of the longest common sequence we do not need all
rows in the matrix. We need only the actual and the previous row.
10
1 f u n c t i o n print_LCS ( b , X = hx1 , . . . , xm i , i , j )
2 i f i = 0 or j = 0 then
3 return ;
4 end i f ;
5
11
3 Edit distance
The edit distance is the number of single edit operations (insertions, deletions, substitutions)
necessary to transform one sequence into another. It is often used as a measure of
similarity of sequences, e.g. in spellchecking. It is a problem very close to the longest
common subsequence problem; we only have to modify the cost formula. The edit
distance is calculated with the recurrence

c[i, j] = min( c[i, j − 1] + ci ,  c[i − 1, j] + cd ,  c[i − 1, j − 1] + cr (xi , yj ) ),

where cd stands for the deletion cost, ci – the insertion cost, and cr – the substitution cost. Variants:
• cr (x, y) can be more complex, e.g., can depend on “similarity” of symbols or
“proximity” of symbols (in spellchecking applications it is much more probable
to press ‘a’ instead of ’s’ than ’a’ instead of ’o’);
• cr (x, y) = ∞ for any pair of different symbols (substitutions are not allowed) — this is the indel distance;
• sometimes we allow swapping two neighbouring symbols as a single edit operation.
The edit distance problem is solved in a very similar way to the longest common
subsequence problem – Fig. 6.
Example 8. Let's find the edit distance between the sequences X = (A, B, A, C, A, A,
B, A) and Y = (A, C, A, B, C, B). Assume the costs: ci = 2, cd = 3, and cr (a, b) = 4
for a ≠ b, cr (a, a) = 0.
1 function find_edit_distance ( X = ⟨x1 , . . . , xm ⟩ , Y = ⟨y1 , . . . , yn ⟩ , ci , cd , cr )
2 m ← length ( X ) ;
3 n ← length ( Y ) ;
4
5 for i ← 0 to m do
6 c [ i , 0 ] ← i ∗ cd ;
7 b [ i , 0 ] ← "↑" ;
8 end for ;
9 for j ← 0 to n do
10 c [ 0 , j ] ← j ∗ ci ;
11 b [ 0 , j ] ← "←" ;
12 end for ;
13
14 for i ← 1 to m do
15 for j ← 1 to n do
16 insertion_cost ← c [ i , j − 1 ] + ci ;
17 deletion_cost ← c [ i − 1 , j ] + cd ;
18 replacement_cost ← c [ i − 1 , j − 1 ] + cr ( xi , yj ) ;
19
20 c [ i , j ] ← insertion_cost ; b [ i , j ] ← "←" ;
21 if deletion_cost < c [ i , j ] then
22 c [ i , j ] ← deletion_cost ; b [ i , j ] ← "↑" ;
23 end if ;
24 if replacement_cost < c [ i , j ] then
25 c [ i , j ] ← replacement_cost ; b [ i , j ] ← "-" ;
26 end if ;
27 end for ;
28 end for ;
29 end function ;
22
Y:            A   C   A   B   C   B
X   i↓ j→ 0   1   2   3   4   5   6
    0     0   2   4   6   8  10  12   (b: ← ← ← ← ← ←)
A   1     3   0   2   4   6   8  10   (b: ↑ - ← ← ← ← ←)
B   2     6   3   4   6   4   6   8   (b: ↑ ↑ - ← - ← ←)
A   3     9   6   7   4   6   8  10   (b: ↑ - - - ← ← ←)
C   4    12   9   6   7   8   6   8   (b: ↑ ↑ - ↑ - - ←)
A   5    15  13   9   6   8   9  10   (b: ↑ - ↑ - ← ↑ -)
A   6    18  15  11   9  10  12  13   (b: ↑ - ↑ - - ← -)
B   7    21  18  14  13   9  11  12   (b: ↑ ↑ ↑ ↑ - ← -)
A   8    24  21  17  14  12  13  15   (b: ↑ - ↑ - ↑ - ←)
(the compared sequences: Y = A C A B C B, X = A B A C A A B A)
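The recurrence above can be turned into a short Python function (a sketch; the default costs follow Example 8):

```python
def edit_distance(x, y, ci=2, cd=3, cr=lambda a, b: 0 if a == b else 4):
    """Edit distance between x and y with insertion cost ci,
    deletion cost cd, and substitution cost cr, as in Example 8."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        c[i][0] = i * cd                   # delete the first i symbols of x
    for j in range(n + 1):
        c[0][j] = j * ci                   # insert the first j symbols of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = min(
                c[i][j - 1] + ci,                           # insertion
                c[i - 1][j] + cd,                           # deletion
                c[i - 1][j - 1] + cr(x[i - 1], y[j - 1]))   # substitution
    return c[m][n]

print(edit_distance("ABACAABA", "ACABCB"))   # 15, the bottom-right cell of the matrix
```

With unit costs for all three operations this is the classic Levenshtein distance.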
Graphs (part 1)
Krzysztof Simiński
Definition 1. Graph G = (V, E) is a pair of sets: a set of vertices (nodes, points) V and
a (multi)set of edges (links, arrows, arcs) E.
The number of vertices is denoted by n or |V|. The number of edges is denoted by m
or |E|.
Definition 2. An edge e = (va , vb ) is a pair of two vertices va and vb .
Some definitions allow multiple edges between the same vertices – such graphs
are called multigraphs.
Definition 3. An edge e = (va , vb ) is undirected, if (va , vb ) ∈ E and (vb , va ) ∈ E,
where E is a set of edges.
1 Graph representation
There are two common representations of graphs in computers.
1. Adjacency list is a list of vertices and each vertex has its own list of neighbours
(vertices it has links to).
1
2. Adjacency matrix is a square matrix in which each vertex is represented by both
a column and a row. A ‘1’ in a cell denotes that the node represented by the cell’s row is
connected with the node represented by the cell’s column.
aij = 1, if (i, j) ∈ E; 0, otherwise.          (1)
(figure: a graph on vertices v1, v2, v3, v4)
Adjacency list:
v1 v2 v3
v2 v1 v4
v3 v1 v2
v4 v3
Adjacency matrix:
0 1 1 0
1 0 0 1
1 1 0 0
0 0 1 0
We can easily compute the degrees of all vertices: the sum of 1’s in a row is the out-degree
of a vertex, and the sum of 1’s in a column is its in-degree.
(finis)
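Both representations can be sketched in Python for the example graph (vertex vi is numbered i − 1; the variable names are illustrative):

```python
# The example graph in both representations.
adjacency_list = {
    0: [1, 2],   # v1 -> v2, v3
    1: [0, 3],   # v2 -> v1, v4
    2: [0, 1],   # v3 -> v1, v2
    3: [2],      # v4 -> v3
}

n = 4
adjacency_matrix = [[0] * n for _ in range(n)]
for v, neighbours in adjacency_list.items():
    for u in neighbours:
        adjacency_matrix[v][u] = 1

out_degree = [sum(row) for row in adjacency_matrix]        # 1's in a row
in_degree = [sum(col) for col in zip(*adjacency_matrix)]   # 1's in a column
print(adjacency_matrix)
print(out_degree, in_degree)
```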
2 Graph searching
Graph searching is a method of visiting all vertices in a graph. There are two essential
algorithms: breadth-first search (BFS) and depth-first search (DFS). They are often a
base for more sophisticated graph algorithms.
1 procedure breadth_first_search ( G = (V, E), s )
2 // G: graph
3 // s: start vertex
4
5 // initialization:
6 foreach v in V \ s do
7 v . state ← unvisited ; // valid states: unvisited, visited, analysed
8 v . distance ← ∞ ; // distance from s vertex
9 v . predecessor ← null ;
10 end foreach ;
11
12 s . state ← visited ;
13 s . distance ← 0 ;
14 s . predecessor ← null ;
15
16 Q ← ∅ ; // FIFO queue
17 Q . push ( s ) ;
18 while Q ≠ ∅ do
19 u ← Q . pop ( ) ;
20 foreach v adjacent to u do
21 if v . state = unvisited then
22 v . state ← visited ;
23 v . distance ← u . distance + 1 ;
24 v . predecessor ← u ;
25 Q . push ( v ) ;
26 end if ;
27 end foreach ;
28 u . state ← analysed ;
29 end while ;
30 end procedure ;
2.1 Breadth-first search
The breadth-first search algorithm starts from any vertex of a graph and visits all its
neighbours. Having visited all the neighbours, it visits all the neighbours’ neighbours,
and it proceeds this way until all vertices are visited (Fig. 1).
Example 2. Let’s apply breadth-first search to graph G = (V, E), where V =
{A, B, C, D, E, F, G, H, I} and E = {(A, D), (A, E), (B, C), (C, E), (C, F ), (D, G),
(E, F ), (F, H), (F, I)}.
Let’s use adjacency matrix (we only put 1’s):
A B C D E F G H I
A 1 1
B 1
C 1 1 1
D 1 1
E 1 1 1
F 1 1 1 1
G 1
H 1
I 1
Let’s start with vertex E (any vertex is a good choice). First we initialize all vertices
(line 6 in Fig. 1).
node state distance predecessor A B C D E F G H I
A visited 1 E 1 1
B unvisited ∞ null 1
C visited 1 E 1 1 1
D unvisited ∞ null 1 1
E analysed 0 null 1 1 1
F visited 1 E 1 1 1 1
G unvisited ∞ null 1
H unvisited ∞ null 1
I unvisited ∞ null 1
Q A, C, F
The queue is not empty. We pop a vertex from the queue:
node state distance predecessor A B C D E F G H I
A analysed 1 E 1 1
B visited 2 C 1
C analysed 1 E 1 1 1
D visited 2 A 1 1
E analysed 0 null 1 1 1
F analysed 1 E 1 1 1 1
G unvisited ∞ null 1
H visited 2 F 1
I visited 2 F 1
Q D, B, H, I
We analyse neighbours of I. It has no unvisited neighbours.
Computation complexity Each vertex is pushed into and popped from the queue
once. Each edge is used only once. Thus Tt ∈ O(|V| + |E|).
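A Python sketch of the procedure from Fig. 1, run on the graph of Example 2 (dictionaries replace the vertex attributes; the edge (C, E) is present in the lecture’s adjacency matrix):

```python
from collections import deque

def bfs(adj, s):
    """Breadth-first search; returns distance and predecessor maps."""
    distance = {s: 0}
    predecessor = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in distance:          # v is still 'unvisited'
                distance[v] = distance[u] + 1
                predecessor[v] = u
                queue.append(v)
    return distance, predecessor

# The graph of Example 2 (undirected: each edge is stored in both directions).
edges = [("A", "D"), ("A", "E"), ("B", "C"), ("C", "E"), ("C", "F"),
         ("D", "G"), ("E", "F"), ("F", "H"), ("F", "I")]
adj = {v: [] for v in "ABCDEFGHI"}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

dist, _ = bfs(adj, "E")
print(dist)
```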
1 procedure depth_first_search ( G = (V, E), s )
2 // G: graph
3 // s: start vertex
4
5 // initialization:
6 foreach v in V do
7 v . state ← unvisited ; // valid states: unvisited, visited, analysed
8 v . predecessor ← null ;
9 end foreach ;
10
11 visit ( s ) ;
12 end procedure ;
13
14 procedure visit ( u )
15 u . state ← visited ;
16 foreach v adjacent to u do
17 if v . state = unvisited then
18 v . predecessor ← u ;
19 visit ( v ) ;
20 end if ;
21 end foreach ;
22 u . state ← analysed ;
23 end procedure ;
We visit G and change its state to ‘visited’. G has no unvisited neighbours, so we change
its state to ‘analysed’ and trace back to D. D has no unvisited neighbours (it becomes
‘analysed’) and we trace back to A. A has no unvisited neighbours (it becomes ‘analysed’)
and we trace back to E. E has unvisited neighbours (C, F ). We choose one of them,
e.g. C, and change its state to ‘visited’. C has unvisited neighbours (B, F ). We choose
one of them, e.g. B, and change its state to ‘visited’. B has no unvisited neighbours
(it becomes ‘analysed’) and we trace back to C. C has one unvisited neighbour F , so we
move to F and change its state to ‘visited’. F has unvisited neighbours (H, I). We choose
one of them, e.g. H, and change its state to ‘visited’. H has no unvisited neighbours
(it becomes ‘analysed’) and we trace back to F . F has one unvisited neighbour I. We move
to I and change its state to ‘visited’. I has no unvisited neighbours (it becomes ‘analysed’)
and we trace back to F . F has no unvisited neighbours (it becomes ‘analysed’) and we
trace back to C. C has no unvisited neighbours (it becomes ‘analysed’) and we trace back
to E. E has no unvisited neighbours (it becomes ‘analysed’). All vertices have been analysed. (finis)
Computation complexity Each vertex is visited only once. Each edge is traversed
only once. Thus Tt ∈ O(|V| + |E|).
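A Python sketch of depth-first search; on the graph of Example 2 it reproduces the visiting order described above, assuming the neighbours are stored in the order the edges are listed:

```python
def dfs(adj, s):
    """Depth-first search; returns the vertices in visiting order."""
    visited = []
    def visit(u):
        visited.append(u)                  # u.state <- visited
        for v in adj[u]:
            if v not in visited:           # v.state = unvisited
                visit(v)
    visit(s)
    return visited

edges = [("A", "D"), ("A", "E"), ("B", "C"), ("C", "E"), ("C", "F"),
         ("D", "G"), ("E", "F"), ("F", "H"), ("F", "I")]
adj = {v: [] for v in "ABCDEFGHI"}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(dfs(adj, "E"))
```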
3 Cycles
Definition 11. A cycle is a path with the same starting and ending vertex.
Definition 12. A loop is a cycle of length 1 (an edge connecting a vertex to itself).
1 procedure cycle_detection ( G = (V, E), s )
2 // G: graph
3 // s: start vertex
4
5 // initialization:
6 foreach v in V do
7 v . state ← unvisited ; // valid states: unvisited, visited, analysed
8 v . predecessor ← null ;
9 end foreach ;
10
11 visit ( s ) ;
12 end procedure ;
13
14 procedure visit ( u )
15 u . state ← visited ;
16 foreach v adjacent to u do
17 if v . state = visited then
18 report a cycle found ; // we have reached a vertex on the current path
19 else if v . state = unvisited then
20 v . predecessor ← u ;
21 visit ( v ) ;
22 end if ;
23 end foreach ;
24 u . state ← analysed ;
25 end procedure ;
1 procedure Euler ( G = (V, E) )
2 // v: any vertex of G
3 Q ← ∅ ; // empty queue
4 visit ( v ) ;
5 print ( Q ) ;
6 end procedure ;
7
8 procedure visit ( v )
9 foreach vertex u incident to v do
10 remove edge ( v , u ) ;
11 visit ( u ) ;
12 end foreach ;
13 Q . push ( v ) ; // push v once it has no more incident edges
14 end procedure ;
Definition 15. An Eulerian cycle is an Eulerian path that starts and ends on the same
vertex.
The problem of the Eulerian cycle in a graph was first stated by Leonhard Euler. It is
the famous problem known as the Seven Bridges of Königsberg. In Königsberg (today:
Kaliningrad) two islands on the Pregel River were connected with the mainland by 7
bridges. The question is: is it possible to cross each bridge exactly once and return to the
starting point? Euler solved this problem in 1735 and proved it was impossible.
Theorem 1. A connected undirected graph has an Eulerian cycle if and only if each of its
vertices has an even degree.
Theorem 2. A connected directed graph has an Eulerian cycle if and only if for each
vertex its input degree equals its output degree.
(figure: the example graph on vertices A–G; edges A–B, A–G, B–C, C–D, C–F, C–G, D–E, E–F)
Let’s start with vertex B. We visit it. It has two neighbours. We visit C and remove
edge (B, C).
(figures: we continue through G, A, and back to B, removing the traversed edges)
B has no more neighbours. We push B into the queue and trace back. A has no more
neighbours. We push A into the queue and trace back. G has no more neighbours. We
push G into the queue and trace back. C still has two neighbours; before we visit one of
them, we push C into the queue. At this moment the queue is Q = (C, G, A, B). In
the same way we visit D and remove (C, D).
(figure: edge (C, D) removed)
We visit E.
(figure: edge (D, E) removed)
We visit F .
(figure: edge (E, F) removed)
We visit C.
(figure: edge (F, C) removed)
Now we trace back and push vertices into the queue. Finally we print the queue Q =
(B, C, D, E, F, C, G, A, B). (finis)
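The procedure above (Hierholzer’s algorithm, with the push after the loop) can be sketched in Python; the edge list below is the one read off the example figure, and the output reproduces a cycle of the same shape up to the choice of neighbours:

```python
def euler_cycle(edges, start):
    """Hierholzer's algorithm for an undirected graph given as an edge list.

    Each visit(v) consumes incident edges; v is appended once the call
    runs out of edges, so the reversed output is an Eulerian cycle."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    out = []
    def visit(v):
        while adj[v]:
            u = adj[v].pop()               # remove edge (v, u) ...
            adj[u].remove(v)               # ... in both directions
            visit(u)
        out.append(v)                      # Q.push(v)
    visit(start)
    return out[::-1]                       # the cycle, starting from start

edges = [("A", "B"), ("A", "G"), ("B", "C"), ("C", "D"),
         ("C", "F"), ("C", "G"), ("D", "E"), ("E", "F")]
cycle = euler_cycle(edges, "B")
print(cycle)
```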
Definition 17. A Hamiltonian cycle is a Hamiltonian path that starts and ends on the
same vertex.
Determining whether Hamiltonian paths and cycles exist in a graph is an NP-complete
problem. We do not know whether a polynomial solution exists. We can solve it by testing
all permutations of vertices.
4 Trees
There are several equivalent definitions of a tree in the graph theory.
Definition 21. A tree is an acyclic graph in which adding any edge creates a cycle.
Definition 22. A tree is a connected graph in which |V| = |E| + 1.
In some algorithms we use forests.
Definition 23. A forest is an acyclic graph.
(figure: the example weighted graph on vertices A–I, with edge weights 1–13)
1 procedure Kruskal ( G = (V, E, w) )
2 // G: graph
3 // w: edge weights
4
5 // initialization:
6 // T = (Vt , Et ) // spanning tree
7 Vt ← ∅ ; // empty spanning tree
8 Et ← ∅ ;
9
10 foreach v ∈ V do MakeSet ( v ) ; end foreach ;
11
12 foreach e ∈ E do
13 Q . push ( e ) ; // priority queue
14 end foreach ;
15
16 while Q ≠ ∅ do
17 e = (u, v) ← Q . pop ( ) ; // the edge with the smallest weight
18 pu ← FindSet ( u ) ;
19 pv ← FindSet ( v ) ;
20 if pu ≠ pv then
21 // u and v belong to two different trees,
22 // so edge e does not close a cycle:
23 Vt ← Vt ∪ { u , v } ;
24 Et ← Et ∪ { e } ; // add e to the spanning tree
25 Union ( u , v ) ; // merge the sets of u and v
26 end if ;
27 end while ;
28
29 return T = (Vt , Et ) ;
30 end procedure ;
First we put each vertex in its one-element set (line 10) and we put all edges into a
priority queue (line 13).
(figure: the graph before the first edge is chosen; every vertex is in its own one-element set)
Then we pop an edge with the smallest weight from the queue and check if the edge
joins nodes in two different sets. In our example the lightest edge joins B and E. These
vertices are in two different sets (red and violet). We add the edge to the spanning tree
(line 24) and merge (line 25) sets of nodes (now B and E are in a red set).
(figure: the edge B–E of weight 1 added; B and E are now in one set)
We pop an edge with the smallest weight from the queue and check if the edge joins
nodes in two different sets. In our example it joins D and G. These vertices are in two
different sets (blue and black). We add the edge to the spanning tree and merge sets
of nodes.
(figure: the edge D–G of weight 2 added)
We pop an edge with the smallest weight from the queue and check if the edge joins
nodes in two different sets. In our example it joins A and B. These vertices are in two
different sets (yellow and red). We add the edge to the spanning tree and merge sets
of nodes.
(figure: the edge A–B of weight 3 added)
(figures: the next lightest edges are added in the same way)
(figure: the state before the edge of weight 7 is considered)
The next edge we pop from the queue has weight 7. It joins E and F. But we do not
add it to the spanning tree because both E and F are in the same set (the red set). So
we pop the next edge – its weight is 8. We can add this edge to the spanning tree.
(figure: the edge of weight 8 added)
The next edge has weight 9. It joins C and E. We cannot add it. Then we add the edge
with weight 10.
(figure: the edge of weight 10 added; the spanning tree is complete)
No more edges can be added. The minimum spanning tree consists of the red vertices
and the red edges. (finis)
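A Python sketch of Kruskal’s algorithm on a small illustrative graph (not the lecture’s example, whose edge list appears only in the figures); a compact array-based disjoint set stands in for MakeSet/FindSet/Union:

```python
def kruskal(n, edges):
    """Minimum spanning tree; edges are (weight, u, v), vertices 0..n-1."""
    parent = list(range(n))                 # MakeSet for every vertex
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):           # pop edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # u and v are in different sets
            tree.append((w, u, v))          # add the edge to the spanning tree
            parent[ru] = rv                 # Union: merge the two sets
    return tree

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
print(mst, sum(w for w, _, _ in mst))
```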
1 procedure MakeSet ( x )
2 x . parent ← x ;
3 x . rank ← 0 ;
4 end procedure ;
2. testing if two vertices belong to the same set of vertices – fortunately this pro-
cedure has low complexity thanks to the disjoint-set data structure applied.
The computational complexity of the Kruskal’s algorithm is O (|E| log |E|).
This is an extremely fast growing function, e.g. A(4, 2) = 2^65536 − 3. Let’s denote
f (n) = A(n, n) and let’s define the inverse of f as α = f^−1. This is a very,
1 procedure Union ( x , y )
2
3 parent_x ← FindSet ( x ) ;
4 parent_y ← FindSet ( y ) ;
5
6 if parent_x . rank > parent_y . rank then swap ( parent_x , parent_y ) ; end if ;
7 parent_x . parent ← parent_y ; // attach the lower tree under the higher one
8 if parent_x . rank = parent_y . rank then parent_y . rank ← parent_y . rank + 1 ; end if ;
9 end procedure ;
1 procedure FindSet ( x )
2 if x . parent ≠ x then
3 x . parent ← FindSet ( x . parent ) ; // path compression
4 end if ;
5
6 return x . parent ;
7 end procedure ;
very slowly growing function: α(n) is less than 5 for all remotely practical values of n,
because f (4) = A(4, 4) ≈ 2^(2^(10^19729)). In practice α(n) < 5, so we treat it as
a constant: the disjoint-set data structure works in (amortised) constant time.
Example 7. Let’s analyse how the disjoint-set structure works. We have six items:
A, B, C, D, E, F .
First we call MakeSet procedure (Fig. 6). It creates a disjoint set for each item.
Rank of each item is zero. Each item is its own parent.
(figure: six one-element trees A, B, C, D, E, F, each of rank 0)
Let’s merge A with B, C with D, and E with F . We call Union procedure (Fig. 7).
Union procedure calls FindSet, but it is trivial in this case.
(figure: three two-element trees with roots A, C, E, each of rank 1)
Let’s merge D and F . The Union procedure calls FindSet for D and for F . C is returned
for D: C is D’s representative. Similarly, E is returned for F . Both trees have
the same rank (we use ranks rather than exact heights, because path compression can
make a tree shallower than its rank suggests).
(figure: E’s tree is attached under C; C’s rank grows to 2)
Let’s merge B and F . The Union procedure calls FindSet for B and for F . C is returned
for F . The FindSet procedure not only returns a representative but also compresses the
path: all items on the path from F to C have their parents set to C. This makes the tree
very shallow. The tree with representative A is now a subtree of the tree with the higher
rank.
(figure: A’s tree attached under C, the root of higher rank)
And eventually let’s find a representative for B. We call FindSet and it returns C.
FindSet procedure uses path compression. All items on the path from B to its rep-
resentative are modified and their parents are set to the representative C.
(figure: after path compression B’s parent is set directly to C)
(finis)
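The structure from the figures above can be sketched as a small Python class (the class and method names are illustrative):

```python
class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""
    def __init__(self, items):
        self.parent = {x: x for x in items}   # MakeSet: each item is its own parent
        self.rank = {x: 0 for x in items}
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])   # path compression
        return self.parent[x]
    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] > self.rank[ry]:
            rx, ry = ry, rx                   # attach the lower-rank tree ...
        self.parent[rx] = ry                  # ... under the higher-rank root
        if self.rank[rx] == self.rank[ry]:
            self.rank[ry] += 1

ds = DisjointSet("ABCDEF")
ds.union("A", "B"); ds.union("C", "D"); ds.union("E", "F")
ds.union("D", "F")                    # merges the C-tree and the E-tree
print(ds.find("B") == ds.find("A"), ds.find("D") == ds.find("F"))
```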
Graphs (part 2, shortest paths)
Krzysztof Simiński
1 Floyd-Warshall algorithm
The Floyd-Warshall algorithm is a dynamic programming algorithm for calculating
the shortest paths between all pairs of vertices.
The idea is straightforward. Assume we have found a path from vertex X to Y whose
length is w. We test each vertex of the graph as an intermediate vertex: for every vertex
V ∈ V we check whether the path X → V → Y , with part lengths u and v, is shorter
than the current path of length w, i.e. whether u + v < w. If it is shorter, we store this
information in an array (Fig. 2).
In the algorithm we use an adjacency matrix representation of a graph.
Example 1. Let’s find the shortest paths between all pairs of vertices in the graph
below.
3
B D
7
4 −1 10 2
12
A C
First we initialise the distance d0 and predecessor p0 arrays (line 6 in Fig. 2). The
distance array is an adjacency matrix for the graph with ∞ if an edge does not exist.
d0 A B C D p0 A B C D
A 0 4 12 ∞ A A B C 0
B −1 0 ∞ 3 B A B 0 D
C 2 7 0 10 C A B C D
D ∞ ∞ 2 0 D 0 0 C D
For each pair X, Y we test if the path via vertex A, i.e. X → A → Y , is shorter
than X → Y . If it is, we update both the d array (line 30) and the p array (line 31).
dA A B C D pA A B C D
A 0 4 12 ∞ A A B C 0
B −1 0 13 3 B A B A D
C 2 6 0 10 C A A C D
D ∞ ∞ 2 0 D 0 0 C D
We repeat the same procedure for vertex B. We test if X → B → Y is shorter than
X →Y.
dB A B C D pB A B C D
A 0 4 12 7 A A B C B
B −1 0 13 3 B A B A D
C 2 6 0 9 C A A C A
D ∞ ∞ 2 0 D 0 0 C D
The same for C …
1 procedure Floyd_Warshall ( G = (V, E), w )
2 // w – edge weights
3 // d – array of distances
4 // p – array of predecessors
5
6 n ← |V| ;
7
8 for i ← 1 to n do // initialisation
9 for j ← 1 to n do
10 if i = j then
11 d [ i , j ] ← 0 ;
12 else if ∃ wij then
13 d [ i , j ] ← wij ;
14 else
15 d [ i , j ] ← ∞ ;
16 end if ;
17 if d [ i , j ] ≠ ∞ then
18 p [ i , j ] ← j ;
19 else
20 p [ i , j ] ← 0 ;
21 end if ;
22 end for ;
23 end for ;
24
25 for k ← 1 to n do // elaboration of distances
26 for i ← 1 to n do
27 if d [ i , k ] ≠ ∞ then
28 for j ← 1 to n do
29 if d [ i , k ] + d [ k , j ] < d [ i , j ] then
30 d [ i , j ] ← d [ i , k ] + d [ k , j ] ;
31 p [ i , j ] ← p [ i , k ] ;
32 end if ;
33 end for ;
34 end if ;
35 end for ;
36 end for ;
37 end procedure ;
dC A B C D pC A B C D
A 0 4 12 7 A A B C B
B −1 0 13 3 B A B A D
C 2 6 0 9 C A A C A
D 4 8 2 0 D C C C D
… and D.
dD A B C D pD A B C D
A 0 4 9 7 A A B B B
B −1 0 5 3 B A B D D
C 2 6 0 9 C A A C A
D 4 8 2 0 D C C C D
(finis)
Computation complexity First we have to initialise the arrays, which takes O(n^2) time.
Then we have three nested loops: O(n^3). So the time complexity is O(n^3). The space
complexity is O(n^2) – two square n × n arrays.
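A Python sketch of the algorithm, run on the graph of Example 1 (only the distance array; the predecessor array is maintained analogously):

```python
INF = float("inf")

def floyd_warshall(vertices, weights):
    """All-pairs shortest distances; weights is a dict {(u, v): w}."""
    d = {(i, j): 0 if i == j else weights.get((i, j), INF)
         for i in vertices for j in vertices}
    for k in vertices:                      # intermediate vertex
        for i in vertices:
            for j in vertices:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d

w = {("A", "B"): 4, ("A", "C"): 12, ("B", "A"): -1, ("B", "D"): 3,
     ("C", "A"): 2, ("C", "B"): 7, ("C", "D"): 10, ("D", "C"): 2}
d = floyd_warshall("ABCD", w)
print(d["A", "C"], d["B", "C"], d["D", "A"])   # values from the final matrix dD
```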
2 Bellman-Ford-Moore algorithm
The Bellman-Ford-Moore algorithm calculates the shortest paths from one reference
vertex (the source) to all other vertices. The algorithm allows negative edge weights
(cf. Fig. 1).
The algorithm is presented in Fig. 3. It uses relaxation of the edges of the graph. The
idea is quite simple. Assume we try to relax an edge (a, b) whose weight is wab , and that
the distance from the source to vertex a is da and to vertex b is db . If we get to a and
then use edge (a, b), the distance is da + wab . If this is less than db , we use the edge in
the path.
Example 2. The same graph as in Example 1. The task is to calculate the shortest paths
from vertex D (source).
(figure: the graph from Example 1; edges A→B 4, A→C 12, B→A −1, B→D 3, C→A 2, C→B 7, C→D 10, D→C 2)
The order of edges in relaxation is: w(A, B) = 4, w(A, C) = 12, w(B, A) = −1,
w(B, D) = 3, w(C, A) = 2, w(C, B) = 7, w(C, D) = 10, w(D, C) = 2.
First we initialize two arrays: d for distance and p for predecessors (line 5 in Fig. 3):
1 procedure Bellman_Ford_Moore ( G = (V, E), w, s )
2 // w – edge weights
3 // s – start vertex
4
5 // initialisation:
6 d [ s ] ← 0 ; // array of distances
7 p [ s ] ← 0 ; // array of predecessors
8 foreach v in V \ s do
9 d [ v ] ← ∞ ;
10 p [ v ] ← 0 ;
11 end foreach ;
12
13 // relaxation of edges:
14 for i ← 1 to |V| − 1 do
15 foreach e = (u, v) ∈ E do // for each edge
16 if d [ u ] + wuv < d [ v ] then
17 d [ v ] ← d [ u ] + wuv ;
18 p [ v ] ← u ;
19 end if ;
20 end foreach ;
21 end for ;
22
23 end procedure ;
d0 d1 d2 d3 p0 p1 p2 p3
A ∞ A 0
B ∞ B 0
C ∞ C 0
D 0 D 0
In each iteration we try to relax all edges in the following order: w(A, B) =
4, w(A, C) = 12, w(B, A) = −1, w(B, D) = 3, w(C, A) = 2, w(C, B) = 7,
w(C, D) = 10, w(D, C) = 2. We can only relax the last edge, because its starting
vertex D is the only vertex with a finite distance. The distance for D is 0. If we use
edge (D, C) with weight w = 2 we can get to C at distance 2.
d0 d1 d2 d3 p0 p1 p2 p3
A ∞ ∞ A 0 0
B ∞ ∞ B 0 0
C ∞ 2 C 0 C
D 0 0 D 0 0
In the second iteration we try to relax all edges. Having relaxed them, we obtain
shorter distances to vertices A and B.
d0 d1 d2 d3 p0 p1 p2 p3
A ∞ ∞ 4 A 0 0 C
B ∞ ∞ 9 B 0 0 C
C ∞ 2 2 C 0 C C
D 0 0 0 D 0 0 0
The last iteration shortens the distance to B.
d0 d1 d2 d3 p0 p1 p2 p3
A ∞ ∞ 4 4 A 0 0 C C
B ∞ ∞ 9 8 B 0 0 C A
C ∞ 2 2 2 C 0 C C C
D 0 0 0 0 D 0 0 0 0
Let’s reconstruct the shortest paths from the source vertex D.
From the left array we read the distance for D → A. It is 4. Let’s reconstruct the
path D → A. From the right array we read: if you want to get to A, you have to get
to C first: D → C → A. If you want to get to C, just go straight to C. We have just
reconstructed the path.
Let’s reconstruct the path D → B. If you want to get to B, first go to A: D →
A → B. If you want to get to A, you have to get to C first: D → C → A → B. And
the distance is 8.
The path D → C has no intermediate vertices. (finis)
Computation complexity The external loop (line 14 in Fig. 3) has |V| − 1 iterations.
The internal loop (line 15 in Fig. 3) has |E| iterations. Finally the time complexity is
O (|V||E|).
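A Python sketch of the procedure from Fig. 3, run on Example 2:

```python
INF = float("inf")

def bellman_ford_moore(vertices, edges, s):
    """Single-source shortest paths; edges is a list of (u, v, w)."""
    d = {v: INF for v in vertices}
    p = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):      # |V| - 1 rounds of relaxation
        for u, v, w in edges:
            if d[u] + w < d[v]:             # relax edge (u, v)
                d[v] = d[u] + w
                p[v] = u
    return d, p

edges = [("A", "B", 4), ("A", "C", 12), ("B", "A", -1), ("B", "D", 3),
         ("C", "A", 2), ("C", "B", 7), ("C", "D", 10), ("D", "C", 2)]
d, p = bellman_ford_moore("ABCD", edges, "D")
print(d)   # distances from the source D
```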
1 procedure Dijkstra ( G = (V, E), w, s )
2 // w – edge weights
3 // s – start vertex
4
5 // initialisation:
6 d [ s ] ← 0 ; // array of distances
7 p [ s ] ← 0 ; // array of predecessors
8 foreach v in V \ s do
9 d [ v ] ← ∞ ;
10 p [ v ] ← 0 ;
11 end foreach ;
12
13 Q ← V ; // priority queue of vertices, ordered by d
14 while Q ≠ ∅ do
15 u ← Q . pop ( ) ; // the vertex with the minimal distance
16 foreach v adjacent to u do
17 relaxation ( u , v ) ;
18 end foreach ;
19 end while ;
20 end procedure ;
21
22
23 procedure relaxation ( u , v ) ;
24 // u and v are vertices
25 if d [ u ] + wuv < d [ v ] then
26 d [ v ] ← d [ u ] + wuv ;
27 p [ v ] ← u ;
28 end if ;
29 end procedure ;
3 Dijkstra’s algorithm
Dijkstra’s¹ algorithm is a greedy algorithm for finding the shortest paths from a
source to all other vertices. The algorithm cannot be applied to graphs with negative
edge weights (cf. Fig. 1). It is based on breadth-first search (Fig. 4). The first step is
the initialisation of distances: all vertices are assigned an infinite distance, except the
source vertex, which is assigned zero (line 5 in Fig. 4). Then we put all vertices into the
priority queue Q (line 13). We repeatedly pop the vertex u with the minimal distance,
relax all neighbours of u (the breadth-first approach), and update their minimal
distances. The algorithm ends when the queue is empty.
Example 3. Let’s run the Dijkstra’s algorithm for the graph below and for source A.
The graph has no negative weights.
¹pronunciation: [ˈdɛjkstɾa]
(figure: the example weighted graph; edges A–B 4, A–E 10, A–F 15, A–D 20, B–C 5, B–E 3, C–E 1, E–F 4, D–F 2)
First we initialise distances and predecessors. We put all vertices in the priority queue.
Vertices in the queue are white.
(figure: the initial state; d[A] = 0, all other distances ∞, all vertices in the queue)
We pop a vertex with the minimal distance. It is A. Then we relax all its neighbours.
(figure: after relaxing A’s neighbours: d[B] = 4, d[E] = 10, d[F] = 15, all via A)
(figure: after popping B: d[C] = 9 via B, d[E] = 7 via B)
(figure: after popping E: d[C] = 8 via E, d[F] = 11 via E)
(figure: popping C changes no distances)
We repeat the procedure for F …
(figure: after popping F: d[D] = 13 via F)
… and for D.
(figure: popping D changes no distances)
(figure: the final distances and predecessors)
We have elaborated the shortest paths for all vertices. Let’s reconstruct the path for
D. Its predecessor is F , whose predecessor is E, whose predecessor is B, whose predecessor
is A. Finally A → B → E → F → D, with distance 13.
(finis)
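A Python sketch using a binary heap as the priority queue (the edge weights are read off the figures):

```python
import heapq

def dijkstra(adj, s):
    """Shortest distances from s; adj maps a vertex to [(neighbour, weight)]."""
    dist = {s: 0}
    pred = {s: None}
    heap = [(0, s)]
    done = set()
    while heap:
        du, u = heapq.heappop(heap)        # vertex with the minimal distance
        if u in done:
            continue                       # a stale queue entry
        done.add(u)
        for v, w in adj[u]:
            if du + w < dist.get(v, float("inf")):
                dist[v] = du + w           # relaxation(u, v)
                pred[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, pred

edges = [("A", "B", 4), ("A", "E", 10), ("A", "F", 15), ("A", "D", 20),
         ("B", "C", 5), ("B", "E", 3), ("C", "E", 1), ("E", "F", 4), ("D", "F", 2)]
adj = {v: [] for v in "ABCDEF"}
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))

dist, pred = dijkstra(adj, "A")
print(dist["D"])   # 13, via A -> B -> E -> F -> D
```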
Exhaustive search
Krzysztof Simiński
Algorithms and data structures
lecture 12, 15th May 2020
Contents
1 Eight queens puzzle
2 Game trees
2.1 Nim
2.1.1 Min-max tree
2.2 α-β cut
2.3 Noughts and crosses
2.4 Chess
1 Eight queens puzzle

(figure: one solution of the eight queens puzzle – queens on a1, b7, c5, d8, e2, f4, g6, h3)
Table 1: Number of solutions of the n queens problem for various sizes of a chessboard.
                          number of
size n    solutions                  positions
4         2                          C(16, 4) = 1 820
5         10                         C(25, 5) = 53 130
6         4                          C(36, 6) = 1 947 792
7         40                         C(49, 7) = 85 900 584
8         92                         C(64, 8) = 4 426 165 368
9         352                        C(81, 9) = 260 887 834 350
10        724                        C(100, 10) = 17 310 309 456 440
11        2 680                      C(121, 11) = 1 276 749 965 026 536
12        14 200                     C(144, 12) = 103 619 293 824 707 388
13        73 712                     C(169, 13) = 9 176 358 300 744 339 432
14        365 596                    C(196, 14) = 880 530 516 383 349 192 480
15        2 279 184                  C(225, 15) = 91 005 567 811 177 478 095 440
…         …                          …
27        234 907 967 154 122 528    C(729, 27) = 11 091 107 763 254 898 773 425 731 705 373 527 055 193 637 625 824
Because the number of possible positions on a chessboard is huge, we have to re-
duce the number of analysed potential solutions. In this problem we know each queen
occupies its own rank, file, and diagonal, so we do not analyse positions with mul-
tiple queens in the same rank, file, or diagonal. This reduces the number of potential
solutions. We can also notice that some solutions are rotations or reflections of others.
Let’s solve the problem for a small 4 × 4 chessboard. First let’s encode a chessboard
situation in a concise way: because each queen is located in a separate file, for each
file we record the number of the rank holding its queen.
For example, the position with queens on a2, b4, c1, and d3 is encoded as (2, 4, 1, 3). (finis)
Let’s analyse all possible positions. A question mark ‘?’ denotes an empty column.
We start with the empty chessboard:
(?, ?, ?, ?)
1. (1, ?, ?, ?)
(a) (1, 1, ?, ?)
(b) (1, 2, ?, ?)
(c) (1, 3, ?, ?)
i. (1, 3, 1, ?)
ii. (1, 3, 2, ?)
iii. (1, 3, 3, ?)
iv. (1, 3, 4, ?)
(d) (1, 4, ?, ?)
i. (1, 4, 1, ?)
ii. (1, 4, 2, ?)
A. (1, 4, 2, 1)
B. (1, 4, 2, 2)
C. (1, 4, 2, 3)
D. (1, 4, 2, 4)
iii. (1, 4, 3, ?)
iv. (1, 4, 4, ?)
2. (2, ?, ?, ?)
(a) (2, 1, ?, ?)
(b) (2, 2, ?, ?)
(c) (2, 3, ?, ?)
(d) (2, 4, ?, ?)
i. (2, 4, 1, ?)
A. (2, 4, 1, 1)
B. (2, 4, 1, 2)
C. (2, 4, 1, 3)
(board: queens on a2, b4, c1, d3 – a solution)
D. (2, 4, 1, 4)
ii. (2, 4, 2, ?)
iii. (2, 4, 3, ?)
iv. (2, 4, 4, ?)
3. (3, ?, ?, ?)
(a) (3, 1, ?, ?)
i. (3, 1, 1, ?)
ii. (3, 1, 2, ?)
iii. (3, 1, 3, ?)
iv. (3, 1, 4, ?)
A. (3, 1, 4, 1)
B. (3, 1, 4, 2)
(board: queens on c4, a3, d2, b1 – a solution)
C. (3, 1, 4, 3)
D. (3, 1, 4, 4)
(b) (3, 2, ?, ?)
(c) (3, 3, ?, ?)
(d) (3, 4, ?, ?)
4. (4, ?, ?, ?)
(a) (4, 1, ?, ?)
i. (4, 1, 1, ?)
ii. (4, 1, 2, ?)
iii. (4, 1, 3, ?)
A. (4, 1, 3, 1)
B. (4, 1, 3, 2)
C. (4, 1, 3, 3)
D. (4, 1, 3, 4)
iv. (4, 1, 4, ?)
(b) (4, 2, ?, ?)
i. (4, 2, 1, ?)
ii. (4, 2, 2, ?)
iii. (4, 2, 3, ?)
iv. (4, 2, 4, ?)
(c) (4, 3, ?, ?)
(d) (4, 4, ?, ?)
There are only two solutions to the four queens problem. It is possible to make
this procedure faster, because we can exploit patterns, e.g., symmetries or rotations of
partial solutions.
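The pruned exhaustive search can be sketched as a short backtracking function in Python (it returns solutions in the (2, 4, 1, 3) encoding used above; the function name is illustrative):

```python
def queens(n):
    """Solutions of the n queens problem, one queen per file.

    board[i] is the rank of the queen in file i."""
    solutions = []
    def place(board):
        i = len(board)                     # the file to fill next
        if i == n:
            solutions.append(tuple(board))
            return
        for rank in range(1, n + 1):
            # skip ranks and diagonals already occupied:
            if all(rank != r and abs(rank - r) != i - f
                   for f, r in enumerate(board)):
                place(board + [rank])
    place([])
    return solutions

print(queens(4))   # the two solutions of the four queens problem
```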
2 Game trees
Unfortunately we do not know how to express popular games with mathematical
formulae. Very often we build game trees that represent all (or only some) possible
moves of the players and game situations. Let’s discuss this technique with a simple game
– «Nim».
2.1 Nim
The Nim game is a strategy game for two players. The game starts with 5 objects.
Each of the two players in turn removes 1, 2, or 3 objects. The player who removes the
last object loses. It is an example of a misère game – the player who makes the last move loses.
Let’s draw a game tree for the Nim game. Each node represents a situation in the
game; each edge is a move of a player. Dashed lines represent Alice’s moves, solid
lines – Bob’s moves. Fig. 2 presents the tree of all possible situations and moves.
(figure: the complete game tree for Nim with 5 objects; at each move a player takes 1, 2, or 3 objects)
Figure 2: Game tree for the Nim game. Each node of the tree represents a situation of
the game. Dashed edges stand for Alice’s moves, solid – Bob’s moves.
(figure: min-max evaluation of the game tree; on Alice’s levels we take the maximum, on Bob’s the minimum, the leaves are +1 or −1; the root evaluates to −1, so with optimal play Bob wins)
2.2 α-β cut
The Nim game is very simple and we can draw a complete game tree. Unfortu-
nately, for most interesting games it is impossible to draw a complete tree. We have to
limit the size of the game tree. There are two common techniques: limiting the height
of the tree and the α-β cut.
Limiting the tree height is a simple technique, but it has a difficult consequence.
If we limit the height, we very probably do not reach the leaves of the tree, and if we
do not have access to the leaves, we do not know who wins. This is a serious problem:
we have to estimate the chance of winning of a situation on the deepest level of the
tree without actual knowledge. We commonly use rules of thumb, heuristics,
experts’ experience etc. This is why we do not use just +1, 0, and −1, but a wider
range of values.
The second technique, the α-β cut, reduces the breadth of the tree.
Example 3. Let’s apply the α-β cut to the game tree below. The height of the tree
is 3.
We estimate two situations as 4 and 8. We use the max-min tree to elaborate temporary
values of nodes up to the root of the tree.
(figure: the tree with the first two leaf estimates, 4 and 8)
We calculate the values of two more situations. They are 9 and 3. These values do
not change the value of the root.
(figure: four leaves estimated – 4, 8, 9, 3; the root value is 4)
We elaborate two more estimations: 12 and 8. These estimations do not change
the value in the root. But let’s analyse it deeper. If the value of the blue subtree is less
than 8, it does not propagate upwards, because it is blocked by the max operator.
If the value of the blue subtree is greater than 8, it is not blocked by the max operator,
but it is blocked by the min operator one level above. Again the value of the blue
subtree does not propagate to the root. The value of the blue subtree never propagates
upward – it is blocked either by min or by max. Thus there is no sense in elaborating
the values in the blue subtree. We just prune it off.
(figure: leaves 12 and 8 estimated; the blue subtree is pruned)
We have a similar situation for the green and red subtrees. Their values never
propagate to the root, so we do not elaborate them. Applying the α-β cut reduces
the number of estimations from 15 to 8.
(figure: the final tree; only 8 of the 15 leaf estimations were needed – 4, 8, 9, 3, 12, 8, 12, 5 – and the remaining subtrees are cut off)
(finis)
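The α-β cut can be sketched in Python on a tree of nested lists; the tree below merely resembles the example (the exact leaf values inside the pruned subtrees are unknown, so two of them are made up):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning on a tree of nested lists.

    A leaf is a number (a heuristic estimate); an inner node is a list
    of children. A subtree is skipped as soon as alpha >= beta."""
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                      # beta cut: prune remaining children
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                          # alpha cut
    return value

# min at the root, max below, min again, then leaf estimates:
tree = [[[4, 8], [9, 3]], [[12, 8], [1, 2]], [[12, 5], [6, 7]]]
print(alphabeta(tree, maximizing=False))   # the root value, 4
```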
2.4 Chess
Chess is a very complicated game and we cannot draw a complete game tree. The
upper bound for the number of positions is estimated as 10^47.6, and the complexity of the
game (the size of the game tree) as 10^123. We cannot even write such a tree down:
there are only about 10^80 baryons in the visible Universe, so to store a complete game
tree we would need 10^123 / 10^80 = 10^43 universes. Unfortunately
we have only one Universe at our disposal. So playing chess still makes sense: we do not
know the winning strategy.
Problem 2. Propose an exhaustive search algorithm for solving the cryptarithm above.
An alphametic is a cryptarithm in which the numbers encoded in letters produce words
or sentences. The oldest alphametic is: SEND + MORE = MONEY.
If all words in an alphametic are numerals and the arithmetic and linguistic mean-
ings agree, it is a doubly true alphametic. The first doubly true alphametic is
FORTY + TEN + TEN = SIXTY.
Some more examples:
SEVEN + SEVEN + SIX = TWENTY
TWENTY + FIFTY + NINE + ONE = EIGHTY (364832 + 75732 + 8584 + 984 =
450132)
SEVEN + TEN + ONE = THREE + NINE + SIX
(INE)^2 + (TATU)^2 = (TANO)^2 (in Swahili: 3^2 + 4^2 = 5^2)
(TRI)^2 + (KVAR)^2 = (KVIN)^2 (in Esperanto: 3^2 + 4^2 = 5^2)
and the humorous (DUŻO)^2 = MNÓSTWO (in Polish: 2396^2 = 5740816)
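A brute-force solver for alphametics can be sketched in Python – it simply tries every assignment of digits to letters, in the spirit of Problem 2 (the function name is illustrative):

```python
from itertools import permutations

def solve_alphametic(words, result):
    """Brute force: try every assignment of digits to letters.

    Each letter gets a positional coefficient (+ for the summands,
    - for the result), so checking a candidate is one dot product."""
    coeff = {}
    for word in words:
        for power, ch in enumerate(reversed(word)):
            coeff[ch] = coeff.get(ch, 0) + 10 ** power
    for power, ch in enumerate(reversed(result)):
        coeff[ch] = coeff.get(ch, 0) - 10 ** power
    letters = sorted(coeff)
    leading = {w[0] for w in words + [result]}      # no leading zeros
    for digits in permutations(range(10), len(letters)):
        if sum(c * coeff[l] for l, c in zip(letters, digits)) == 0:
            d = dict(zip(letters, digits))
            if all(d[ch] != 0 for ch in leading):
                return d
    return None

sol = solve_alphametic(["SEND", "MORE"], "MONEY")
print(sol)   # the unique solution: 9567 + 1085 = 10652
```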
Complexity problems
Krzysztof Simiński
We devoted our first lecture to computational complexity. Today we are going
to discuss a closely related topic – complexity problems.
We ordered computational complexities in an ascending sequence: O(1), O(log n), O(n),
O(n log n), O(n^2), O(n^3), O(n^k) (where k is a constant), O(2^n), O(k^n) (where k is a
constant), O(n!), O(n^n). Let’s have a look at Tab. 1, which presents values of some func-
tions. Please notice that functions “greater” than polynomial have values far beyond
our imagination even for quite small arguments. Tab. 2 presents execution times for
algorithms with various complexities. Again we can easily notice that overpolynomial
complexities result in completely unimaginable time values.
In practice we split complexities into acceptable (“reasonable”, “decent” algorithms
for “easy” problems) and unacceptable. Acceptable complexities are those up to poly-
nomial; unacceptable are overpolynomial. For problems with unacceptable complexity we
try to find acceptable approximate solutions.
1 P problems
Definition 1. A problem is a P problem if it can be solved in polynomial time (in
polynomial space) with a deterministic Turing machine.
A deterministic Turing machine is a “classic” Turing machine. It has a head that
moves over an infinite tape. The head reads a symbol on the tape, writes a symbol on
the tape, and moves left, moves right, or does not change its location. The behaviour of
the head depends both on the data on the tape and on the machine’s program.
Polynomial functions for complexity have been chosen for a very utilitarian reason.
These are slowly increasing functions – we can solve tasks even for very large data1 .
A sum or product of polynomial functions is a polynomial function. We can add, mul-
tiply polynomial complexities and we still have polynomial complexity.
sizes.
1
Table 1: Values of some functions. The number of protons in the Universe ≈ 10¹²⁵, the number of microseconds since the Big Bang ≈ 10²³.

 n     10           50          100          300        1000
 5n    50           250         500          1500       5000
 n²    100          2500        10000        90000      10⁶
 2ⁿ    1024         ≈ 10¹⁵      ≈ 10³⁰       ≈ 10⁹⁰     ≈ 10³⁰¹
 n!    ≈ 3.6·10⁶    ≈ 3·10⁶⁴    ≈ 9·10¹⁵⁷    ≈ 10⁶¹⁴    ≈ 4·10²⁵⁶⁷
 nⁿ    10¹⁰         ≈ 10⁸⁴      10²⁰⁰        ≈ 10⁷⁴³    10³⁰⁰⁰
Table 2: Execution times for various time complexities, assuming one operation takes t = 1 µs. Just for comparison: the Big Bang was 13.7 billion years ago.

 n     10          20               50             100               300
 n²    1/10000 s   1/2500 s         1/400 s        1/100 s           9/100 s
 n⁵    1/10 s      3.2 s            5.2 min        2.8 h             28.1 days
 2ⁿ    1/1000 s    1 s              35.7 years     400·10¹⁴ years    ≈ 10⁷⁶ years
 nⁿ    2.8 h       3.3·10¹² years   ≈ 10⁷¹ years   ≈ 10¹⁸⁶ years     ≈ 10⁷²⁹ years
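The entries of Tab. 2 can be reproduced with a short script (a sketch; the helper name `execution_time` is ours), assuming one operation takes 1 µs:

```python
# Recompute some entries of Tab. 2, assuming one operation takes 1 microsecond.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def execution_time(operations: float) -> str:
    """Convert an operation count into a human-readable duration (1 op = 1 us)."""
    seconds = operations * 1e-6
    if seconds < 60:
        return f"{seconds:g} s"
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    if seconds < 24 * 3600:
        return f"{seconds / 3600:.1f} h"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / (24 * 3600):.1f} days"
    return f"{seconds / SECONDS_PER_YEAR:.3g} years"

print(execution_time(100 ** 2))   # n^2 for n = 100: 0.01 s
print(execution_time(20 ** 5))    # n^5 for n = 20: 3.2 s
print(execution_time(2 ** 50))    # 2^n for n = 50: 35.7 years
```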
2 NP problems
Before we discuss NP problems we have to define a decision problem.
Definition 2. A decision problem is a problem that can be answered with «yes» or «no».
The NP problem is defined in two equivalent ways.
Definition 3. A decision problem is a nondeterministic polynomial problem (NP problem) if it can be solved in polynomial time with a nondeterministic Turing machine.
Definition 4. A decision problem is a nondeterministic polynomial problem (NP problem) if its solution can be verified in polynomial time with a deterministic Turing machine.
A nondeterministic Turing machine is a Turing machine that can be simultaneously in many states. A deterministic Turing machine is always in exactly one state; a nondeterministic Turing machine may be in one or more states. Having read a symbol on the tape, it transits from a set of states to a new set of states.
Example 2. Let's assume a program of a nondeterministic Turing machine defines transitions from each state to two new states. If the machine starts in state 0, after the first transition it is in 2¹ = 2 states, after two transitions it is in 2² = 4 states, and after 10 transitions it may be in as many as 2¹⁰ = 1024 states simultaneously. (finis)
If a machine has n states, then a deterministic machine can be in each of the n states (one at a time), whereas a nondeterministic machine can be in each of the 2ⁿ − 1 nonempty subsets of states.
P problems can be solved in polynomial time. Solutions to P problems can be verified in polynomial time. So a P problem is an NP problem (cf. Def. 4). This is a very important conclusion. It states that

P ⊆ NP.     (1)
Problem 1. Prove P = NP or P ≠ NP. ˛
Problem 1 was stated in 1971 by Stephen Cook. It is one of the seven Millennium Prize Problems. If you solve it you can win one million dollars and eternal glory. It is one of the six millennium problems that are still unsolved.
The computers we have are deterministic. In general we can simulate nondeterministic machines, but such a simulation has exponential complexity (each of the 2ⁿ subsets of the states of a nondeterministic machine is represented with one state of a deterministic machine). We cannot simulate nondeterministic machines in polynomial time and we do not know if it is possible. This is why NP problems are difficult for us to solve. But if we have a solution, we can quickly (in polynomial time) verify it.
Example 3. Question: Does there exist a Hamiltonian cycle shorter than d in a graph? The answer is difficult, because we have to find such a cycle. In general we do not know an exact algorithm with complexity less than O(n!). But if someone finds a Hamiltonian cycle, the verification is trivial: we just sum up all the weights in the cycle. It can be done in linear time. (finis)
3 NP-complete problems
Let’s introduce a very important notion – reduction of a computational problem.
Definition 6. Problem D is an A-hard problem if for each problem X ∈ A the relation X ≤_T D holds.
It means that no problem in set A is harder than problem D.
Definition 7. If problem D is an A-hard problem and simultaneously D ∈ A, then problem D is A-complete.
In other words: if D is A-complete, then it is the hardest problem in the set (class) A. Each problem in set A can be reduced to it. So if we manage to solve problem D, then we solve all problems in class A. A solution of one A-complete problem solves all A-complete problems. But not for every class B do B-complete problems exist.
An uninteresting conclusion states that all P problems are P-complete. If we can solve one P problem, we can solve all P problems. It is not very interesting because all P problems are easy to solve. More interesting are complete classes for which we do not know an effective solution. An interesting example is the class of NP-complete problems.
Let's use Definition 7 to define NP-complete problems.
Definition 8. An NP-complete problem (NPC) is an NP problem that each NP problem can be reduced to in polynomial time.
The definition above implies that if we can solve any NP-complete problem in polynomial time, then we can solve all NP problems in polynomial time. Each NP-complete problem can be reduced to any other NP-complete problem. NP-complete problems are the hardest problems in the NP class.
Example 5. Examples of NP-complete problems:
1. Boolean satisfiability problem: For a given logical formula, do there exist values of the Boolean variables for which the formula is true? Or simpler: is the formula satisfiable?
Example 6. Is the formula ¬a ∧ b satisfiable? The solution is very simple because there are only two variables. In the worst case we have to test four substitutions. For the values a = 0 and b = 1 the formula is true. If we had n variables, we would have to test up to 2ⁿ cases. (finis)
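The brute-force satisfiability test of Example 6 can be sketched as follows (illustrative only; the helper `is_satisfiable` and the hard-coded formula ¬a ∧ b are ours):

```python
from itertools import product

def is_satisfiable(formula, n_vars):
    """Brute-force SAT test: try all 2^n substitutions of Boolean values."""
    return any(formula(*values) for values in product([False, True], repeat=n_vars))

# The two-variable formula of Example 6: (not a) and b.
print(is_satisfiable(lambda a, b: (not a) and b, 2))   # satisfiable, e.g. a = 0, b = 1
```

The running time is exponential in the number of variables, which is exactly why the problem is hard in general.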
2. Does there exist a travelling salesman route shorter than d? In a more formal way: Is there a Hamiltonian cycle shorter than d in a given graph?
3. Does there exist a common divisor of two natural numbers?
4. Does there exist a clique of a given size in a graph? A clique is a subgraph in which each pair of nodes is connected with an edge.
5. The knapsack problem. Let's assume we have n objects, each with a price pᵢ and a weight wᵢ. The problem is to pack a knapsack with objects so that the sum of the prices of the packed objects is maximal. The knapsack has a weight limit we cannot exceed.
6. Are two graphs isomorphic? Two graphs are isomorphic if there exists a mapping of the nodes of one graph onto the nodes of the other graph so that the corresponding edges in both graphs match. The relation is symmetric. We do not know how to solve it without testing all possible mappings.
7. Can a graph be coloured with n colours? Graph colouring is an assignment of colours to all nodes of a graph so that each edge is incident to nodes with different colours.
4 NP-hard problems
Based on Definition 6 we define an NP-hard problem as:
Definition 9. Problem D is an NP-hard problem if each NP problem can be reduced to it.
In other words: an NP-hard problem (NPH) is a problem at least as hard as any NP problem.
Please notice that an NP-hard problem is not necessarily an NP problem – it may be harder than NP.
Conclusions:
• A problem whose decision counterpart is NP-complete is NP-hard.
[Figure: the relations between the classes P, NP, NPC, and NPH – on the left under the assumption P ≠ NP, on the right under P = NP (where P, NP, and NPC coincide).]
3. factorisation of integers (as a function of the number b of bits): O(exp((64/9 · b)^(1/3) · (log b)^(2/3)))
5 Heuristics
We cannot solve NP-complete and NP-hard problems in polynomial time. We do not even know whether polynomial solutions exist. But we have to solve such problems. In such cases we very often use heuristics².
Definition 10. A heuristic is a polynomial algorithm that solves approximately (but satisfactorily) a problem that we cannot solve exactly in polynomial time.
Heuristics commonly elaborate acceptable solutions, although we know these solutions are not always optimal. Very often, even if a heuristic returns an optimal solution, we do not know that it is optimal.
Example 9. An exact solution of the travelling salesman problem has complexity O(n!). We can easily propose a heuristic that starts in any town, moves to the nearest unvisited town, and again moves to the nearest unvisited town until all towns are visited. This heuristic has complexity O(n²). The returned solution is not optimal, but usually it is quite good and acceptable. (finis)
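The nearest-neighbour heuristic of Example 9 can be sketched as follows (the towns and their coordinates are our own illustrative data):

```python
import math

def nearest_neighbour_tour(towns):
    """Greedy TSP heuristic: start anywhere, always move to the nearest
    unvisited town -- O(n^2) instead of the exact O(n!)."""
    unvisited = list(range(1, len(towns)))
    tour = [0]                       # start in an arbitrary town (here: town 0)
    while unvisited:
        here = towns[tour[-1]]
        nearest = min(unvisited, key=lambda i: math.dist(here, towns[i]))
        unvisited.remove(nearest)
        tour.append(nearest)
    return tour

print(nearest_neighbour_tour([(0, 0), (0, 1), (5, 0), (1, 1)]))   # [0, 1, 3, 2]
```

The tour found this way is not guaranteed to be the shortest one; it is only a reasonable approximation produced in polynomial time.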
Problem 2. Prove that the exact solution of the travelling salesman problem has complexity O(n!). ˛
² The word heuristic originates from the Greek εὑρίσϰω – 'I find, I discover'. This verb is commonly known in the perfect tense: εὕρηϰα, ηὕρηϰα – 'I have found, I have discovered'. This is the word Archimedes shouted after he had discovered Archimedes' principle.
Greedy algorithms
Krzysztof Simiński
Algorithms and data structures
lecture 14, 19th May 2020
1 procedure greedy_travelling_salesman_problem ( G = (V, E, w) )
2   // initialization:
3   foreach v ∈ V do
4     v.state ← unvisited ; // all vertices are unvisited
5   end foreach ;
⋮
21 end procedure ;
memory takes a longer time. Access to data stored on a hard disk takes even longer. We would like to store all variables in registers. Unfortunately the number of registers in a processor is limited. A compiler has to optimise the allocation of variables to registers. Let's analyse the piece of code below:
1 procedure register_allocation
2   b ← initialisation ;
3   c ← initialisation ;
4
5   a ← b + c ;
6   d ← a * 5 ;
7   e ← d + b − a ;
8   f ← c + b + 4 * e ;
9   g ← f − 8 * e ;
10  h ← d + g ;
11
12  print h ;
13 end procedure ;
In a naïve approach we use a separate register for each variable. But we can easily notice that some variables may share the same register. For example, variable a is used for the first time in line 5 and for the last time in line 7. Variable e is used for the first time in line 7 and for the last time in line 9. Please notice that a is used for the last time in line 7 on the right-hand side of the assignment, while e is used for the first time in the same line on the left-hand side of the assignment. This is why we can use the same register for both variables a and e.
Let's mark the first and the last usage of each variable and write the code once more.
b ← initialisation ;
c ← initialisation ;
a ← b + c ;
d ← a * 5 ;
e ← d + b − a ;
f ← c + b + 4 * e ;
g ← f − 8 * e ;
h ← d + g ;
print h ;

The resulting live ranges are: b: lines 2–8, c: lines 3–8, a: lines 5–7, d: lines 6–10, e: lines 7–9, f: lines 8–9, g: lines 9–10, h: lines 10–12.
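The first and last usage lines can be computed mechanically from the listing; a sketch (the encoding of each line as an assigned variable plus the variables it reads is ours):

```python
# Each line of the listing: line number -> (assigned variable, variables read).
code = {
    2: ("b", []), 3: ("c", []),
    5: ("a", ["b", "c"]), 6: ("d", ["a"]), 7: ("e", ["d", "b", "a"]),
    8: ("f", ["c", "b", "e"]), 9: ("g", ["f", "e"]), 10: ("h", ["d", "g"]),
    12: (None, ["h"]),               # print h
}

interval = {}                        # variable -> (first usage line, last usage line)
for line, (target, used) in sorted(code.items()):
    for v in ([target] if target else []) + used:
        first, _ = interval.get(v, (line, line))
        interval[v] = (first, line)

for v, (first, last) in sorted(interval.items(), key=lambda kv: kv[1]):
    print(f"{v}: lines {first}-{last}")
```

Two variables conflict exactly when their intervals overlap (with the boundary case of a value read and another value written in the same line, like a and e, not counting as a conflict).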
Now we can easily detect conflicts. Let's use a graph to represent them. Each vertex in the graph stands for a variable. If two vertices are joined with an edge, the variables they represent are in conflict.
[Figure: the conflict graph on the vertices a, b, c, d, e, f, g, h.]
To solve the problem of register assignment we use the graph vertex colouring approach. In this approach we colour the vertices of a graph in such a way that (1) if two vertices are joined with an edge, they have different colours; (2) the number of colours is minimal.
Unfortunately this is an NP-hard problem and we do not know if a polynomial solution exists. This is why we use the greedy algorithm presented in Fig. 2. Let's encode colours with numbers: 1 (red), 2 (green), 3 (blue), and 4 (yellow). We take the first colour (red) and try to colour vertices (we visit vertices in alphabetical order). So we colour a, e, g, h with red. It is not possible to colour any more vertices with red, so we take the next colour (green) and repeat the procedure. Vertices b and f are green. Vertices c and d need different colours.
1 procedure greedy_graph_vertex_colouring ( G = (V, E) )
2   // initialization:
3   foreach v ∈ V do
4     v.colour ← ∅ ;
5   end foreach ;
6   colour ← 0 ; // colours are numbers
7
8   // colouring:
9   while there exist vertices without colour do
10    colour ← colour + 1 ; // take the next colour
11    foreach vertex v in the graph do
12      if v.colour = ∅ then
13        permission ← true ;
14        foreach neighbour u of vertex v do
15          if u.colour = colour then
16            permission ← false ;
17          end if ;
18        end foreach ;
19        if permission = true then
20          v.colour ← colour ;
21        end if ;
22      end if ;
23    end foreach ;
24  end while ;
25 end procedure ;
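A direct transcription of the pseudocode in Fig. 2 might look like this (the example graph is our own small illustration, not the conflict graph from the figure):

```python
def greedy_vertex_colouring(graph):
    """Greedy colouring as in Fig. 2; colours are the numbers 1, 2, 3, ...
    graph: dict mapping each vertex to the set of its neighbours."""
    colour_of = {v: None for v in graph}
    colour = 0
    while any(c is None for c in colour_of.values()):
        colour += 1                                  # take the next colour
        for v in sorted(graph):                      # visit vertices in alphabetical order
            if colour_of[v] is None and all(colour_of[u] != colour for u in graph[v]):
                colour_of[v] = colour                # permission granted
    return colour_of

# A small hypothetical conflict graph:
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(greedy_vertex_colouring(g))   # {'a': 1, 'b': 2, 'c': 3, 'd': 1}
```

Note that the result depends on the order in which the vertices are visited, which is exactly the effect discussed in Example 1 below.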
[Figure: the conflict graph with coloured vertices.]
5   R1 ← R2 + R3 ;
6   R4 ← R1 * 5 ;
7   R1 ← R4 + R2 − R1 ;
8   R2 ← R3 + R2 + 4 * R1 ;
9   R1 ← R2 − 8 * R1 ;
10  R1 ← R4 + R1 ;
11
12  print R1 ;
13 end procedure ;
Problem 4. In real-life processors the task is more difficult, because the number of registers is limited. Thus the task is: Optimise the allocation of k variables to l registers. Try to propose a modification of the graph vertex colouring algorithm for this task. ˛
The greedy approach does not always elaborate optimal solutions.
Example 1. Let's focus on the following example. Let's visit the vertices in the following sequence: A, B, C, D, and E. We choose the first colour and we colour A and C with red. With the second colour we can colour the three remaining vertices.
[Figure: the graph and its colouring for the visiting order A, B, C, D, E.]
Let's run this example once again. Let's visit the vertices in the following sequence: E, D, C, B, and A. We choose the first colour and we colour E, D, and C with red. With the second colour we can only colour B, and we need the third colour for A.
[Figure: the graph and its colouring for the visiting order E, D, C, B, A.]
Problem 5. What are the bounds on the chromatic number for a graph with n ver-
tices and m edges? What are the properties of a graph with the minimal (maximal)
chromatic number? ˛
Example 2. Our task is to design traffic lights for a four-street crossroad. There are four streets, labelled N(orth), E(ast), S(outh), W(est). The north street is a one-way street. Let's draw all possible paths on the crossroad.
1 Gr. χρῶµα ‘colour’
[Figure: the crossroad with the streets N, E, S, W and all possible paths.]
Let's draw a graph in which each path (e.g. NW: north → west) is a separate node. Each edge represents a conflict (e.g. path NE intersects path EW). Let's assume we would like to design very safe traffic, so paths with the same destination are also in conflict (e.g. NS and ES).
[Figure: the conflict graph on the path vertices NW, NE, NS, EW, ES, SE, SW, WS, WE.]
To solve our problem we use the graph vertex colouring algorithm. We start with vertex WE, visit the vertices clockwise, and try to colour them with the first colour (red). Then we take the next colour (green) and start from the next uncoloured vertex (SW). Having coloured all we can with green, we take the next colour (blue) and visit the next uncoloured vertex (ES). The last two uncoloured vertices (NS and NE) can be coloured with brown. Finally we have coloured all vertices with four colours.
[Figure: the conflict graph coloured with four colours.]
We cannot colour the graph with fewer than 4 colours, because there is a four-vertex clique in the graph (ES, NS, WE, SW).
Problem 6. Are there any more cliques in the graph? ˛
[Figure: the resulting traffic-light groups drawn on the crossroad.]
(finis)
Problem 8. The solution we have elaborated in Example 2 has some faults. We cannot add any more paths to the red group, but we could add path WS to the green group – there would be no conflict. In our solution, however, each path belongs to only one group. Modify the colouring algorithm to allow a path to be a member of several groups. ˛
1 procedure greedy_graph_edge_colouring ( G = (V, E) )
2   // initialization:
3   foreach e ∈ E do
4     e.colour ← ∅ ;
5   end foreach ;
6   colour ← 0 ; // colours are numbers
7
8   // colouring:
9   while there exist edges without colour do
10    colour ← colour + 1 ; // take the next colour
11    foreach edge e = ( u , v ) in the graph do
12      if e.colour = ∅ then
13        permission ← true ;
14        foreach ( u , s ) ∈ E do
15          if ( u , s ).colour = colour then
16            permission ← false ;
17          end if ;
18        end foreach ;
19        foreach ( v , s ) ∈ E do
20          if ( v , s ).colour = colour then
21            permission ← false ;
22          end if ;
23        end foreach ;
24
25        if permission = true then
26          e.colour ← colour ;
27        end if ;
28      end if ;
29    end foreach ;
30  end while ;
31 end procedure ;
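The edge-colouring pseudocode can be transcribed similarly (the example graph, a triangle with one pendant edge, is our own illustration):

```python
def greedy_edge_colouring(edges):
    """Greedy edge colouring: an edge may take the current colour only if no
    edge sharing one of its endpoints already has that colour."""
    colour_of = {e: None for e in edges}
    colour = 0
    while any(c is None for c in colour_of.values()):
        colour += 1                                  # take the next colour
        for (u, v) in edges:
            if colour_of[(u, v)] is None:
                permission = all(colour_of[f] != colour
                                 for f in edges
                                 if f != (u, v) and set(f) & {u, v})
                if permission:
                    colour_of[(u, v)] = colour
    return colour_of

# A hypothetical graph: a triangle A-B-C plus the pendant edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(greedy_edge_colouring(edges))
```

The triangle forces three colours; the pendant edge C-D can reuse the first one.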
   A B C D E F G H
A    ✓ ✓ ✓ ✓ ✓ ✓
B    ✓ ✓ ✓ ✓ ✓
C    ✓ ✓ ✓ ✓ ✓ ✓
D    ✓ ✓ ✓ ✓ ✓
E    ✓ ✓ ✓ ✓ ✓
F    ✓ ✓ ✓ ✓ ✓
G    ✓ ✓ ✓ ✓ ✓ ✓
H    ✓ ✓ ✓ ✓
We can see the matrix is not symmetric. A likes F, but F does not like A. We need mutual friendship, so we modify the matrix and put a checkmark only if the friendship is mutual.
   A B C D E F G H
A    ✓ ✓ ✓ ✓
B    ✓ ✓ ✓ ✓
C    ✓ ✓ ✓ ✓
D    ✓ ✓ ✓ ✓
E    ✓ ✓ ✓ ✓ ✓
F    ✓ ✓ ✓ ✓
G    ✓ ✓ ✓ ✓ ✓
H    ✓ ✓ ✓
Let's use a graph approach to solve this problem. Let each person be represented by a vertex of a graph and each friendship by an edge.
[Figure: the friendship graph on the vertices A–H.]
We take the first colour (red) and try to colour as many edges as possible. It can be done in many ways; this is only one of many colourings.
[Figure: the edges coloured red.]
We take the second colour (green) and try to colour as many edges as possible.
[Figure: the edges coloured green.]
We take the third colour (blue) and try to colour as many edges as possible. Because blue may be hard to distinguish from black, we draw the blue edges dashed.
[Figure: the edges coloured blue (dashed).]
We take the fourth colour (magenta) and try to colour as many edges as possible. Because magenta may be hard to distinguish from red, we draw the magenta edges dashed.
[Figure: the edges coloured magenta (dashed).]
magenta: A–E, B–D, C–G; black: C–E, F–G
Pattern search
Krzysztof Simiński
Algorithms and data structures
lecture 15, 22nd May 2020
Pattern search is quite a frequent task in text analysis and text editing. By 'texts' we mean not only natural language texts, but also sequences of amino acids or nucleotides. A lot of algorithms are known; today we discuss only a few – each of them represents a different approach.
The pattern search algorithms discussed here run in time roughly linear in the text length and the pattern length. Texts in which we commonly search for patterns are very long (millions of symbols). Reducing the coefficient of the linear term may therefore have a significant influence on the execution time of a pattern search algorithm.
1 Some terms
We do not define a symbol formally. We assume a symbol can be understood in the same way by everybody from examples.
Example 1. A letter is a symbol: 'a', 'b', 'c', …, 'z', 'α', 'β', 'γ', …, 'ω'. Digits are symbols: '0' and '1'. (finis)
Definition 1. An alphabet is a finite nonempty set of symbols.
Example 2. Examples of alphabets:
• {0, 1},
• {dash, dot, space},
• {a, b, c, …, z},
• {α, β, γ, …, ω}
Definition 4. The empty word ε has no symbols. Its length is zero.
Definition 5. The concatenation of words s and t is a word whose length is |s| + |t| and which is composed of word s followed by word t.
Example 4. The pair of words s = lla and t = ma has two concatenations: st = llama and ts = malla. (finis)
Definition 6. Word p is a prefix of word w (noted as p ⊂ w) if w = py for some word y. Word y may be empty. It is true that |p| ≤ |w|.
Example 5. A prefix of word w is a subword that starts at the first symbol of word w. Word abracadabra has 11 prefixes:
a,
ab,
abr,
abra,
abrac,
abraca,
abracad,
abracada,
abracadab,
abracadabr,
abracadabra.
Please note that the longest prefix of word abracadabra is the word abracadabra itself. (finis)
The kth prefix (whose length is k) of word w will be noted as w[1..k].
1 procedure naive ( T , P )
2   // T: text
3   // P: pattern
4
5   TextLen ← length ( T ) ;
6   PatternLen ← length ( P ) ;
7
8   for i ← 1 to TextLen − PatternLen + 1 do
9     location ← i ;
10    match ← true ;
11    for j ← 1 to PatternLen do
12      if T [ j + i − 1 ] ≠ P [ j ] then
13        match ← false ;
14        break ;
15      end if ;
16    end for ;
17    if match = true then
18      write ( " pattern starts at " , location ) ;
19    end if ;
20  end for ;
21 end procedure ;
2 Naïve algorithm
The naïve approach (Fig. 1) checks all possible alignments of the pattern in the text. Each iteration of the external loop (lines 8–20) tests one alignment of the pattern in the text. The internal loop (lines 11–16) tests the match for that alignment.
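A direct transcription of the naïve algorithm (Fig. 1) in Python (using string slicing for the inner comparison; positions are reported 1-based, as in the pseudocode):

```python
def naive_search(text, pattern):
    """Test every alignment of the pattern in the text (cf. Fig. 1)."""
    hits = []
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i + len(pattern)] == pattern:
            hits.append(i + 1)       # report 1-based positions, as the pseudocode does
    return hits

print(naive_search("acababbab", "ab"))   # [3, 5, 8]
```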
3 Knuth-Morris-Pratt algorithm
The naïve algorithm is not very effective. In Fig. 2 we have an example of a partial match of pattern p and text t. Two symbols match, but the third does not. The naïve algorithm shifts the pattern by one symbol and tests the match again. But we already know there is no match: we have tested the symbols in the text that correspond to the pattern symbols for i = 2, so we know there cannot be any match for i = 3. Unfortunately the naïve algorithm neglects this information and naïvely tests if there is a match at that alignment. We can use the information – and so does the Knuth-Morris-Pratt algorithm (Fig. 3).
text:     a c a b a b b a b t
              i = 2
pattern:      a b b t
Figure 2: An example of the partial match of pattern p and text t. Vertical lines rep-
resent matches, a zigzag – no match.
The algorithm builds on the naïve approach, but aligns the pattern in a more efficient way: some alignments are skipped. In the naïve approach the pattern is shifted one symbol right (line 8 in Fig. 1) in each iteration. In the Knuth-Morris-Pratt algorithm the pattern is shifted one or more symbols right (line 26 in Fig. 3). We use the prefix function π.
Definition 9. The value π(k) of the prefix function is the length of the longest prefix of pattern p that is a proper suffix of p[1..k]:
π(k) = max { j : j < k and p[1..j] is a suffix of p[1..k] }.
Example 7. Let's calculate the prefix function for pattern 'abracadabra'. For each prefix of length k we test what is the longest proper suffix of this prefix that is simultaneously a prefix of pattern p.
For k = 9 the prefix is 'abracadab'. For this prefix we have to find the longest proper suffix that is a prefix of the pattern:
abracadab    abracadabra
We test how many symbols that end the left word are simultaneously a prefix of the right word. In the example the words share the two-symbol sequence 'ab'. In the same way we calculate the value for all k ∈ [1, 11].
 q   p[1..q]        p             π(q)
 1   a              abracadabra   0
 2   ab             abracadabra   0
 3   abr            abracadabra   0
 4   abra           abracadabra   1
 5   abrac          abracadabra   0
 6   abraca         abracadabra   1
 7   abracad        abracadabra   0
 8   abracada       abracadabra   1
 9   abracadab      abracadabra   2
10   abracadabr     abracadabra   3
11   abracadabra    abracadabra   4
(finis)
Example 8. Let’s run the Knuth-Morris-Pratt algorithm for the following example.
1 procedure Knuth_Morris_Pratt ( T , P )
2   // T: text
3   // P: pattern
4   TextLen ← length ( T ) ;
5   PatternLen ← length ( P ) ;
6   π ← prefix_function ( P ) ;
7   i ← 1 ;
8   while i ≤ TextLen − PatternLen + 1 do
9     matched_symbols ← 0 ;
10    for j ← 1 to PatternLen do
11      if T [ i + j − 1 ] = P [ j ] then
12        matched_symbols ← matched_symbols + 1 ;
13      else
14        break ;
15      end if ;
16    end for ;
17
18    if matched_symbols = 0 then
19      i ← i + 1 ; // no symbols matched
20    else // some number of symbols matched
21      if matched_symbols = PatternLen then
22        write ( " pattern starts at " , i ) ;
23      end if ;
24
25      // pattern shift
26      i ← i + matched_symbols − π [ matched_symbols ] ;
27    end if ;
28  end while ;
29 end procedure ;
text:     a c a b r a b r a c b r a
              i = 2
pattern:      a b r a c a d a b r a
We have managed to match four symbols ('a', 'b', 'r', 'a'). For the fifth symbol there is no match. We have to shift the pattern to a new alignment. The value of variable matched_symbols is 4. The value of the prefix function is π(4) = 1, thus we shift the pattern by 4 − 1 = 3 symbols (line 26 in Fig. 3). (finis)
Problem 2. Calculate prefix functions for patterns
• ‘aaaaa’,
• ‘abcde’,
• ‘abcabc’.
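The prefix function can be computed in linear time; a sketch in Python (0-based indices, unlike the 1-based convention of the pseudocode):

```python
def prefix_function(pattern):
    """pi[k] = length of the longest proper prefix of the pattern that is also
    a suffix of pattern[:k + 1] (a 0-based variant of Definition 9)."""
    pi = [0] * len(pattern)
    j = 0
    for k in range(1, len(pattern)):
        while j > 0 and pattern[k] != pattern[j]:
            j = pi[j - 1]            # fall back to the next shorter border
        if pattern[k] == pattern[j]:
            j += 1
        pi[k] = j
    return pi

print(prefix_function("abracadabra"))   # [0, 0, 0, 1, 0, 1, 0, 1, 2, 3, 4]
```

The printed values agree with the table of Example 7.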
4 Rabin-Karp algorithm
The Rabin-Karp algorithm takes a very interesting approach to pattern search. A pattern is a sequence of symbols, just as a number is a sequence of digits; e.g. 123 is composed of the three symbols '1', '2', '3'. We can approach patterns in the same way and treat sequences of symbols as numbers.
Example 9. Let's analyse pattern P = 'rabarbar'. There are three distinct symbols in the sequence, so we use a ternary numeral system. Let's assign numeric values to the symbols: 'r': 2, 'a': 1, and 'b': 0. And finally let's elaborate the numeric value p of pattern P:
p = ((((((2·3 + 1)·3 + 0)·3 + 1)·3 + 2)·3 + 0)·3 + 1)·3 + 2 = 5243,
where the consecutive digits 2, 1, 0, 1, 2, 0, 1, 2 stand for the symbols r, a, b, a, r, b, a, r.
(finis)
In a similar way we transform a piece of text of the same length as the pattern into an integer. Now we only need to compare two integers. Comparison of integers is much faster than comparison of sequences of symbols. However, there are two problems:
• Numbers representing sequences may be too large to fit into processor registers.
• Calculating the number for each alignment requires iterating through the whole piece of the sequence.
The first problem is solved with modulo arithmetic. The second problem can also be solved. Let's have an example.
Example 10. Let's use the decimal representation of numbers. We have a long integer:
5 1 7 9 2 5 8 0 2 1 8 4 5
The red window holds the five-digit number 92580. In the next step we want to calculate the value of the number in the blue window (shifted one digit right). The procedure is quite simple. First we subtract 90000 from 92580. Then we multiply the result by 10 (we shift the window right). We get 25800. And we only need to add 2 to get the final result 25802. If we have elaborated the previous number, we only need three integer operations to get the next one: subtraction, multiplication, and addition. Let's write down these operations for our example: (92580 − 9·10⁴)·10 + 2 = 25802. (finis)
We choose a modulus, in our example q = 7. The numeric value of the pattern modulo q:
w = ((((((2·3 + 1)·3 + 0)·3 + 1)·3 + 2)·3 + 0)·3 + 1)·3 + 2 mod 7 = 5243 mod 7 ≡ 0.
In the same way we calculate the numeric value t1 for the first piece of text 'barabara' (with the same length as the pattern):
t1 = 1312 mod 7 ≡ 3.
The value t_i is used to calculate the next value t_{i+1} with the formula

t_{i+1} = ((t_i − T[i] · p^{|W|−1}) · p + T[i + |W|]) mod q,     (2)

where |W| is the pattern length, T[i] stands for the ith symbol (digit) of the text T, and p is the base of the numeral system (i.e. the number of symbols in the alphabet). In our example: |W| = 8, p = 3, and p^{|W|−1} mod q = 3⁷ mod 7 = 3.
And we calculate:
t2 = (3 − 0·3)·3 + 0 ≡ 2 (mod 7)     (5)
1 procedure Rabin_Karp ( T , P , d , q )
2   // T: text
3   // P: pattern
4   // d: number of symbols in the alphabet
5   // q: a big number
6
7   TextLen ← length ( T ) ;
8   PatternLen ← length ( P ) ;
9   h ← d^(PatternLen − 1) mod q ;
10  p ← 0 ;
11  t_1 ← 0 ;
12
13  for i ← 1 to PatternLen do
14    p ← ( d * p + P [ i ] ) mod q ;
15    t_1 ← ( d * t_1 + T [ i ] ) mod q ;
16  end for ;
17
18  for s ← 1 to TextLen − PatternLen + 1 do
19    if p = t_s then match ← true ; // pattern maybe found
20      for j ← 1 to PatternLen do
21        if T [ j + s − 1 ] ≠ P [ j ] then
22          match ← false ;
23          break ;
24        end if ;
25      end for ;
26    else match ← false ;
27    end if ;
28
29    if match = true then
30      write ( " pattern starts at " , s ) ;
31    end if ;
32
33    if s ≤ TextLen − PatternLen then
34      t_{s+1} ← ( ( t_s − T [ s ] * h ) * d + T [ s + PatternLen ] ) mod q ;
35    end if ;
36  end for ;
37 end procedure ;
There is no match. We calculate further values:
t3 = (2 − 1·3)·3 + 1 ≡ 5 (mod 7)     (6)
t4 = (5 − 2·3)·3 + 2 ≡ 6 (mod 7)     (7)
t5 = (6 − 1·3)·3 + 0 ≡ 2 (mod 7)     (8)
t6 = (2 − 0·3)·3 + 1 ≡ 0 (mod 7)     (9)
We have the same numeric value for the text and the pattern: w = t6. We use the modulo operation, so the same numeric value for the text and for the pattern does not imply a match of the sequences. We have to test whether W = T[6..13] (line 20 in Fig. 4). There is no match; we get a false alarm (a false positive). Let's try further:
t7 = (0 − 1·3)·3 + 2 ≡ 0 (mod 7)     (10)
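The whole algorithm can be sketched in Python (the text and the small modulus q = 7 below are our own illustrative choices; false alarms are eliminated by the explicit comparison, cf. line 20 in Fig. 4):

```python
def rabin_karp(text, pattern, alphabet, q):
    """Rabin-Karp search: symbols become digits in base d = |alphabet|,
    window values are kept modulo q; every hash hit is verified explicitly."""
    d = len(alphabet)
    digit = {s: i for i, s in enumerate(alphabet)}
    m, n = len(pattern), len(text)
    h = pow(d, m - 1, q)             # d^(m-1) mod q
    p = t = 0
    for i in range(m):               # values of the pattern and of the first window
        p = (d * p + digit[pattern[i]]) % q
        t = (d * t + digit[text[i]]) % q
    hits = []
    for s in range(n - m + 1):
        if p == t and text[s:s + m] == pattern:   # verify: a hit may be a false alarm
            hits.append(s + 1)                    # 1-based position
        if s + m < n:                             # roll the window, cf. formula (2)
            t = ((t - digit[text[s]] * h) * d + digit[text[s + m]]) % q
    return hits

print(rabin_karp("barabarabarbar", "rabarbar", "bar", 7))   # [7]
```

In practice q is chosen as a large prime, so that false alarms are rare and the expected running time stays linear.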
5 Automaton
The last algorithm we discuss is based on a deterministic finite automaton.
Definition 10. A deterministic finite automaton M is a tuple M = (Q, q0, A, Σ, δ), where:
• Q is a set of states,
• q0 ∈ Q – the initial state,
• A ⊆ Q – the set of accepting states,
• Σ – the alphabet,
• δ : Q × Σ → Q – the transition function.
Example 12. Let's define an automaton with the set of states Q = {0, 1, 2, 3, 4, 5}, initial state 0, set of accepting states A = {5}, alphabet Σ = {a, b}, and transition function δ defined as:
        a   b
 → 0    1   0
   1    2
   2        3
   3    4   1
   4    5
   5    3   2
1 procedure automaton ( T , δ )
2   // T: text
3   // δ: transition function
4
5   TextLen ← length ( T ) ;
6   state ← 0 ;
7
8   for i ← 1 to TextLen do
9     state ← δ ( state , T [ i ] ) ;
10    if state is an accepting state then
11      write ( " pattern starts at " , i − PatternLen + 1 ) ;
12    end if ;
13  end for ;
14 end procedure ;
[Figure: the transition graph of the automaton; the states 0–5 form a path with forward edges labelled a, a, b, a, a, and the remaining transitions lead back to earlier states.]
The automaton starts in state 0. If the input is aabaa, the automaton transits to state 5, which is an accepting state, which means that the pattern has been detected. The pseudocode for detection of a pattern is presented in Fig. 5. (finis)
Problem 4. What patterns does the automaton in Example 12 detect?
Let's build an automaton for pattern detection. We have to calculate the transition function. We use the suffix function.
Definition 11. The suffix function for pattern W and sequence B is the length of the longest prefix of pattern W that is simultaneously a suffix of B:

σ(B, W) = max { k : W[1..k] is a suffix of B }.     (11)
1 procedure transition_function ( P , Σ )
2   // P: pattern
3   // Σ: alphabet
4
5   PatternLen ← length ( P ) ;
6
7   for state ← 0 to PatternLen do
8     foreach symbol in Σ do
9       k ← min ( PatternLen + 1 , state + 2 ) ;
10      repeat
11        k ← k − 1 ;
12      until P [ 1 . . k ] is a suffix of P [ 1 . . state ] + symbol ;
13      δ ( state , symbol ) ← k ;
14    end foreach ;
15  end for ;
16
17  return δ ; // the transition function
18 end procedure ;
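A direct transcription of the construction (0-based Python slices; the repeat-until of the pseudocode becomes a while loop):

```python
def transition_function(pattern, alphabet):
    """delta(state, c): having matched `state` symbols and reading c, go to the
    length of the longest prefix of the pattern that is a suffix of the input
    read so far, i.e. of pattern[:state] + c (cf. Definition 11)."""
    m = len(pattern)
    delta = {}
    for state in range(m + 1):
        for c in alphabet:
            k = min(m, state + 1)
            while k > 0 and not (pattern[:state] + c).endswith(pattern[:k]):
                k -= 1
            delta[(state, c)] = k
    return delta

delta = transition_function("rabarbar", "abr")
print(delta[(5, "b")])   # 6
print(delta[(6, "a")])   # 7
print(delta[(5, "a")])   # 2
```

The printed values agree with the transitions computed by hand in Example 14.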
Example 14. Let's elaborate an automaton for pattern 'rabarbar'. We have to analyse the transitions for all symbols in the alphabet:
0. δ(0,'a') = σ('' + 'a', 'rabarbar') = σ('a', 'rabarbar') = 0
   δ(0,'b') = σ('' + 'b', 'rabarbar') = σ('b', 'rabarbar') = 0
   δ(0,'r') = σ('' + 'r', 'rabarbar') = σ('r', 'rabarbar') = 1
1. δ(1,'a') = σ('r' + 'a', 'rabarbar') = σ('ra', 'rabarbar') = 2
   δ(1,'b') = σ('r' + 'b', 'rabarbar') = σ('rb', 'rabarbar') = 0
   δ(1,'r') = σ('r' + 'r', 'rabarbar') = σ('rr', 'rabarbar') = 1
2. δ(2,'a') = σ('ra' + 'a', 'rabarbar') = σ('raa', 'rabarbar') = 0
   δ(2,'b') = σ('ra' + 'b', 'rabarbar') = σ('rab', 'rabarbar') = 3
   δ(2,'r') = σ('ra' + 'r', 'rabarbar') = σ('rar', 'rabarbar') = 1
3. δ(3,'a') = σ('rab' + 'a', 'rabarbar') = σ('raba', 'rabarbar') = 4
   δ(3,'b') = σ('rab' + 'b', 'rabarbar') = σ('rabb', 'rabarbar') = 0
   δ(3,'r') = σ('rab' + 'r', 'rabarbar') = σ('rabr', 'rabarbar') = 1
4. δ(4,'a') = σ('raba' + 'a', 'rabarbar') = σ('rabaa', 'rabarbar') = 0
   δ(4,'b') = σ('raba' + 'b', 'rabarbar') = σ('rabab', 'rabarbar') = 0
   δ(4,'r') = σ('raba' + 'r', 'rabarbar') = σ('rabar', 'rabarbar') = 5
5. δ(5,'a') = σ('rabar' + 'a', 'rabarbar') = σ('rabara', 'rabarbar') = 2
   δ(5,'b') = σ('rabar' + 'b', 'rabarbar') = σ('rabarb', 'rabarbar') = 6
   δ(5,'r') = σ('rabar' + 'r', 'rabarbar') = σ('rabarr', 'rabarbar') = 1
6. δ(6,'a') = σ('rabarb' + 'a', 'rabarbar') = σ('rabarba', 'rabarbar') = 7
   δ(6,'b') = σ('rabarb' + 'b', 'rabarbar') = σ('rabarbb', 'rabarbar') = 0
   δ(6,'r') = σ('rabarb' + 'r', 'rabarbar') = σ('rabarbr', 'rabarbar') = 1
(finis)
Problem 5. What is the sequence of states visited by the automaton calculated in Ex-
ample 14 for the text ‘abarrabarbarabar’?
Problem 6. Does the automaton detect two patterns if they overlap in the text?
Problem 7. Calculate automata for patterns:
• ‘aaaaa’,
• ‘abcde’,
• ‘abcabc’.