
A Parallelized Naïve Algorithm for Pattern Matching
William Svensson

Bachelor Thesis/15 hp
Bachelor Program in Computer Science
2022
Abstract

Pattern matching is the problem of locating one string, a pattern, inside
another, a text, which is required in, for example, databases, search engines,
and text editors. Several algorithms have been created to tackle this
problem, and this thesis evaluates whether a parallel version of the Naïve
algorithm, given a reasonable number of threads for a personal computer,
can become more efficient than some state-of-the-art algorithms used to-
day. To this end, an algorithm from the Deadzone family, the Horspool al-
gorithm, and a parallel Naïve algorithm were implemented and evaluated on
two different sized alphabets. The results show that the parallel Naïve imple-
mentation is to be favoured over the Deadzone and Horspool on an alphabet
of size 4 for patterns of lengths 3 to 20. Furthermore, for an alphabet of size
256 the parallel Naïve should also be used for patterns of lengths 1 to 20.
Contents

1 Introduction

2 Background
2.1 Parallel programming
2.2 Performance Criteria for Parallel Computing
2.3 The Naïve Algorithm
2.4 The Knuth-Morris-Pratt Algorithm
2.5 The Horspool Algorithm
2.6 The Deadzone Algorithm
2.7 Related Works

3 Method
3.1 Delimitations

4 Results
4.1 Research Question Results
4.2 The Parallel Naïve
4.3 Clarifications

5 Discussion & Analysis

6 Future Work
1 Introduction

The pattern matching problem is the problem of finding one string, a pattern, in
another equally long or longer string, a text. This is a problem that is required to
be solved in an efficient manner in for example text editors, databases, and search
engines. There are two main types of string matching, exact and approximate
matching, each with their respective algorithms; this report will only concern
itself with exact matching. The need for efficient methods and algorithms for
these tasks is increasing as the demand for faster processing in, for example, data
retrieval and molecular biology increases (Singla and Garg 2012, 218). A more
precise example is the need for faster pattern matching in the medical field, as
patient records are becoming increasingly large (Singla and Garg 2012, 220).

Several algorithms and families of algorithms have been created to solve the pat-
tern matching problem. For example, the Knuth-Morris-Pratt algorithm (Knuth,
Morris, and Pratt 1977), the Boyer-Moore algorithm (Boyer and Moore 1977), and
the Horspool algorithm (Horspool 1980). However, some of the algorithms for
solving the pattern matching problem can be complicated and time-consuming to
implement, and as parallel programming becomes increasingly prominent (Pacheco
2011, 1-3), how they perform in contrast with a simpler solution should be investi-
gated. One such solution is a parallel Naïve algorithm for pattern matching, which
is relatively simple as opposed to the more complex algorithms mentioned above.

Therefore, this thesis studies the question: Can a parallelized version of the Naïve
solution for the pattern matching problem become more efficient than a state-of-
the-art algorithm from the Deadzone family and the Horspool algorithm with re-
gards to wall-clock time on a personal computer?

This is answered by implementing a parallel version of the Naïve algorithm, a
Deadzone algorithm, and the Horspool algorithm for conducting experiments on
different texts of different sized alphabets in order to examine if some applications
and/or conditions favour one algorithm more. Since the purpose of this study is to
evaluate if a parallel Naïve algorithm for pattern matching is a viable alternative
to some state-of-the-art algorithms, the more complex algorithms will be serial.
The output of the experiments, and what is used to contrast the algorithms with
each other, is the wall-clock time, meaning the time it takes the algorithms to finish
searching the texts for the patterns since wall-clock time is the most relevant for
the users of such applications.

2 Background

This section explains the theory necessary to understand the results and discussions
of this report. First, however, some terminology in the context of this report is
required. An alphabet is the set of different characters used in strings, meaning that
if a text uses an alphabet of size 4, the text can only include 4 distinct characters.
A text is the longer string to be searched in and a pattern is the potential sub-string
which is being searched for. Throughout this report two specific sizes of alphabets
will be used: an alphabet of size 4, which will be named the smaller alphabet, and
an alphabet of size 256, which will be named the greater alphabet.

Pacheco (2011, 51) describes two important terms: a critical section in a parallel
program is a section which only one thread can execute at a time, and a program
that is run on one thread is said to be serial. Another term in parallel programming
is embarrassingly parallel, which is used to describe programs that can easily be
parallelized (Pacheco 2011, 48).

2.1 Parallel programming

Parallel programming has become increasingly important since many manufactur-
ers of microprocessors started to pursue the strategy of adding several simple pro-
cessors to one chip, called multicore processors, instead of trying to create faster
processors. This gave rise to the need for software engineers to develop parallel
programs to increase the computational power, since serial programs are unable to
use several processors at once (Pacheco 2011, 1-3).

One API used for parallel programming is OpenMP, which is designed for incre-
mental parallelization of serial programs (Pacheco 2011, 209-210) and therefore
lends itself well to the experiments in this report. OpenMP also allows the developer
to state which blocks of code should be executed in parallel and to leave it to the
compiler to divide the work between the threads. For example, OpenMP has a
directive called parallel for. This directive has to be followed by a for-loop and
will divide the iterations of the loop evenly between a desired number of threads
(p.224-225). This becomes useful if the iterations of the loop are independent of
each other, meaning that iteration i + 1 can be computed before iteration i has
finished.
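As a minimal sketch of the directive (illustrative code, not taken from the thesis,
whose source is not published):

#include <stdio.h>

int main(void) {
    long squares[16];

    /* The iterations are independent: iteration i writes only
       squares[i], so OpenMP may divide the iterations evenly
       between the 8 requested threads in any order. */
    #pragma omp parallel for num_threads(8)
    for (int i = 0; i < 16; i++)
        squares[i] = (long)i * i;

    for (int i = 0; i < 16; i++)
        printf("%d squared is %ld\n", i, squares[i]);
    return 0;
}

Compiled with OpenMP support (for example gcc -fopenmp), the first loop runs
on 8 threads; without it, the pragma is ignored and the program runs serially with
the same result.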

2.2 Performance Criteria for Parallel Computing

For measuring the performance of a parallel program it is useful to look at the
Speedup, which is a measurement of how well the work is divided between the
cores (Pacheco 2011, 58). The speedup is defined in accordance with Equation 1:

    S = Ts / Tp                                                        (1)

where Ts is the amount of time the serial version of the program took to finish a
problem and Tp is the amount of time it took for the parallel version of the program
to finish the same problem, with the same problem size, using p threads.
The speedup depends on the number of processes/threads used, and a program
with a speedup equal to the number of processes used is said to have linear speedup,
which is the best possible speedup for a program, although in practice this is
unusual because of the overhead introduced in most parallel programs (Pacheco
2011, 58). Following the definition in Equation 1, one important detail to notice is
that a speedup less than 1 means that the program becomes slower as more threads
are introduced.
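As a hypothetical worked example of Equation 1: if the serial program finishes a
problem in Ts = 8 seconds and the parallel program finishes the same problem in
Tp = 2.5 seconds using p = 4 threads, then S = 8/2.5 = 3.2, which is below the
linear speedup of 4 because of the parallel overhead.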

Pacheco (2011, 63) also mentions that once development of a program is done, the
interesting and useful part of the program to time is the algorithms themselves and
not the time it takes to print the results. Pacheco continues by explaining that the
time measurement is usually the wall-clock time and not the CPU time, since CPU
time excludes the time that the threads are idle, which could lead to misleading
results. Wall-clock time includes all the time taken from the start of the timed
section of the code to the end.

2.3 The Naïve Algorithm

The Naïve algorithm is a brute force algorithm that tries to match the first character
of the pattern against every position in the text. The algorithm only moves the
pattern one character to the right regardless of how many characters in the pattern
matched, see Figure 1 for a visual example. Therefore, the time it takes is largely
independent of the alphabet size, since the pattern only moves forward one position
at a time regardless of how many matches occur before the mismatch. The
algorithm reads the text from left to right (Knuth, Morris, and Pratt 1977, 323;
Boyer and Moore 1977, 762).

Figure 1: The figure shows an example of the serial Naïve algorithm.

Algorithm 1 Naïve
Require: String text, pattern
Require: int count, patternIndex
 1: for i = 0...text length − pattern length + 1 do   ▷ Divide iterations evenly between the threads
 2:     patternIndex ← 0
 3:     for j = 0...pattern length do
 4:         if text[i + j] ≠ pattern[j] then
 5:             End inner for-loop
 6:         else
 7:             patternIndex ← patternIndex + 1
 8:         end if
 9:         if patternIndex = pattern length then
10:             count ← count + 1   ▷ Critical section
11:         end if
12:     end for
13: end for

One possible parallelization of the Naïve algorithm consists of dividing the longer
text between the threads, since the shifts are independent of each other and of how
many of the characters in the pattern matched, which allows for the use of the
parallel for-directive and a relatively simple implementation. The resulting
overhead is minimal since neither the pattern nor the text will be changed, and
therefore all threads can read all strings and tables at the same time and mini-
mal synchronization is required. The algorithm has one critical section, depend-
ing on the implementation, when count is needed, but this should only prove to be a
problem if several threads find the pattern and need to add to count at the same
time, see row 10 in Algorithm 1. The probability of several threads entering row
10 at the same time depends on the size of the pattern and the size of the alphabet.

For example, if an alphabet consists of the characters a, c, g, and t and the pattern
the algorithm is searching for is a, then several threads will, most likely, match an a
with the pattern a at the same time and therefore need to enter the critical section
at the same time. This would create a queue of threads waiting to enter the critical
section. However, this is less likely to become a problem for larger alphabets.

The serial algorithm has a time complexity of O(mn) where m is the length of
the pattern and n is the length of the text. Since the algorithm is embarrassingly
parallel, the parallel version should show close to linear speedup.
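To make the parallelization concrete, the following is a minimal C sketch using
OpenMP. The thesis does not publish its source code, so the function name and
details here are illustrative; the structure follows Algorithm 1:

#include <stdio.h>
#include <string.h>

/* Count the occurrences of pattern in text with the parallel Naïve
   algorithm: the outer loop over shift positions is divided evenly
   between the threads. */
long parallel_naive(const char *text, const char *pattern, int threads) {
    long n = (long)strlen(text);
    long m = (long)strlen(pattern);
    long count = 0;

    #pragma omp parallel for num_threads(threads)
    for (long i = 0; i <= n - m; i++) {
        long j = 0;
        while (j < m && text[i + j] == pattern[j])
            j++;
        if (j == m) {
            /* Critical section: only one thread at a time may
               update count (row 10 in Algorithm 1). */
            #pragma omp critical
            count++;
        }
    }
    return count;
}

int main(void) {
    printf("%ld\n", parallel_naive("acgtacgtaacgt", "acgt", 4)); /* prints 3 */
    return 0;
}

Note that OpenMP's reduction(+:count) clause could replace the critical section
entirely; the critical section is kept in the sketch because its cost is specifically
studied in the experiments of this thesis (see Section 3).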

2.4 The Knuth-Morris-Pratt Algorithm

Knuth, Morris, and Pratt (1977, 323-328) presented an algorithm (KMP) for pat-
tern matching which uses a prefix table. Their informal description of the algo-
rithm explains that it starts from the left and reads to the right and, depending on
where the algorithm encounters a mismatch, it will shift forward a certain number
of places, since the pattern cannot exist in any of the skipped positions. This is
done by declaring a prefix table, called next[] by them, see Table 1 for an example
on the pattern agctaagctt. The prefix table stores information about the prefixes
in the pattern.

Table 1: This table is an example of a prefix table used by the KMP algorithm for
the pattern agctaagctt.

j        0 1 2 3 4 5 6 7 8 9
pattern  a g c t a a g c t t
next[j]  0 0 0 0 1 1 2 3 4 0

Essentially, if a mismatch occurs at text[i] ≠ pattern[j] the pattern should shift
j − next[j] places to the right in the text, where j is the index of the mismatch
in the pattern, except if next[j] = 0, in which case the pattern should shift past
the current text character. For example, for a mismatch text[i] ≠ pattern[j] with
next[j] = 0 the next match attempt should start at text[i + 1], see Figure 2 for a
visual example which uses the prefix table displayed in Table 1. Therefore, the al-
gorithm's efficiency is largely dependent on the pattern and its length, meaning that
a longer pattern has the possibility to include prefixes which can be exploited
by the table and allow for greater shifts.

Figure 2: The figure shows an example of the KMP algorithm using the prefix
table displayed in Table 1.

Knuth, Morris, and Pratt (1977, 323) explain that the time complexity of the al-
gorithm is O(m + n) where m is the length of the sought pattern and n is the
length of the text to be searched. Furthermore, the time to calculate the table is
O(m) (p.325). This complexity holds regardless of the alphabet size. However,
they explain that the algorithm might not be more efficient than the Naïve solution
in a realistic setting, since its strength lies in the worst case, which is uncommon
(p.328).
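As an illustration, a prefix table such as the one in Table 1 can be computed with
the following standard construction in C (a sketch; the thesis's own implementation
may differ):

/* Build the KMP prefix table: next[j] is the length of the longest
   proper prefix of pattern[0..j] that is also a suffix of it. For
   "agctaagctt" this produces 0 0 0 0 1 1 2 3 4 0, as in Table 1. */
void build_next(const char *pattern, int m, int next[]) {
    int k = 0;                   /* length of the current prefix match */
    next[0] = 0;
    for (int j = 1; j < m; j++) {
        while (k > 0 && pattern[j] != pattern[k])
            k = next[k - 1];     /* fall back to a shorter prefix */
        if (pattern[j] == pattern[k])
            k++;
        next[j] = k;
    }
}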

2.5 The Horspool Algorithm

Boyer and Moore (1977) discuss an algorithm (BM) which reads the pattern
from right to left and uses a bad match table (delta1) and a suffix table (delta2),
similar to the prefix table in KMP. However, Horspool (1980) argues that delta2
is mostly unnecessary in a realistic setting. He writes that the table can contribute
to a faster running time for repetitive patterns but that these are rare. Therefore,
Horspool proposes a version of the BM algorithm that only uses the bad match
table, delta1, and saves the time it takes to calculate delta2 as well as the often
unnecessary comparisons between the two tables. Horspool experimentally com-
pares his version to a version closer to Boyer and Moore's original algorithm and
confirms that the delta2 table contributed minimally in a normal setting.

Algorithm 2 Horspool
 1: for i = 0...text length − pattern length + 1; i ← shift(patternIndex, i) do
 2:     patternIndex ← 0
 3:     for j = pattern length − 1...0 do
 4:         if text[i + j] ≠ pattern[j] then
 5:             Exit the inner for-loop
 6:         else
 7:             patternIndex ← patternIndex + 1
 8:         end if
 9:     end for
10:     if patternIndex = pattern length then
11:         A match has been found
12:     end if
13: end for

Essentially, delta1 is used to find whether the character in the text occurs in the
pattern, see Table 2 for an example on the pattern agctaagctt. If it does occur, the
pattern is shifted to align the next occurrence of the character in the pattern with its
place in the text. If the character does not occur, the pattern is shifted past
that point in the text, since the pattern cannot occur in that interval. To clarify,
if a mismatch occurs at text[i] ≠ pattern[j] the pattern is shifted delta1[text[r]]
steps, where r is the index in the text of the first match attempt. For an overview see
Algorithm 2 and for a visual example using the bad match table in Table 2 see Figure
3. As with the KMP algorithm, the Horspool algorithm's efficiency is dependent
on the pattern, and therefore small changes in the pattern can greatly affect the
values in the table, which consequently affects how far the pattern will shift.

Table 2: This table is an example of the bad match table (the delta1 table of the
Boyer-Moore algorithm, as used by Horspool) for the pattern agctaagctt. The
entry marked * represents all other characters in the alphabet.

character  a  g  c  t  *
shift      4  3  2  1  10

Figure 3: The figure shows an example of the Horspool algorithm using the bad
match table displayed in Table 2.

As shown by Singla and Garg (2012, 220), the Horspool algorithm's worst-case
time complexity is O(mn) where m is the length of the sought pattern and n is the
length of the text to be searched.
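As an illustration, a bad match table such as the one in Table 2 can be built in a
few lines of C (a standard construction; the thesis implementation is not published):

/* Build the Horspool bad match table over a byte alphabet: shift[c]
   is how far the pattern may be moved when the text character first
   compared against is c. For the pattern "agctaagctt" this yields
   a=4, g=3, c=2, t=1 and 10 for every other character, as in Table 2. */
void build_delta1(const unsigned char *pattern, int m, int shift[256]) {
    for (int c = 0; c < 256; c++)
        shift[c] = m;                  /* absent character: skip the whole pattern */
    for (int j = 0; j < m - 1; j++)
        shift[pattern[j]] = m - 1 - j; /* align the rightmost occurrence */
}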

2.6 The Deadzone Algorithm

Watson and Watson (1997) presented a family of algorithms which, as opposed
to KMP, BM, and Horspool, start somewhere in the text and shift the pattern to
both sides instead of strictly to the right. The idea of the algorithms is to separate
the indices of the text into two sets, a live set and a dead set. The indices in the
live set are indices where the pattern could potentially occur while the dead set
contains the indices where the pattern cannot be found.

Furthermore, Daykin et al. (2018) have published an article explaining three strate-
gies for the Deadzone algorithm. However, this report will only concern itself with
the recursive variant called DZ-R2L. They specify that the algorithm uses the
tables of the BM algorithm to shift the pattern right and the table of the KMP al-
gorithm to shift left. However, according to Kourie, Watson, and Strauss (2012,
241), which table is responsible for which shift is inconsequential with regard to
the algorithm's correctness.

Algorithm 3 Deadzone
Require: Lower bound, Upper bound
 1: if Lower < Upper then
 2:     j ← (Lower + Upper)/2
 3:     i ← 0
 4:     while i < pattern length and pattern[i] = text[j + i] do
 5:         i ← i + 1
 6:     end while
 7:     if i = pattern length then
 8:         A match has been found
 9:     end if
10:     left ← leftShift(i, j) + 1
11:     Deadzone(Lower, left)    ▷ Recursive call
12:     right ← rightShift(i, j) + 1
13:     Deadzone(right, Upper)   ▷ Recursive call
14: end if

The version of the Deadzone algorithm displayed in Algorithm 3 always begins in
the middle of the given livezone. Algorithm 3 assumes that the required tables have
already been calculated and that they are accessible to the shift functions. Upon
initializing the algorithm, Lower is the starting index of the desired livezone the
algorithm should check and Upper is the uppermost index that should be checked.

This algorithm is based on the DZ-R2L algorithm explained by Daykin et al.
(2018, 121) with some alterations. For example, Algorithm 3 reads the pattern
from left to right instead of from right to left and consequently uses KMP to shift
the pattern to the right and Horspool to shift the pattern to the left. As with the
KMP and Horspool algorithms, the efficiency of this Deadzone variant is dependent
on the pattern.
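To make the control flow of Algorithm 3 concrete, below is a runnable C sketch.
The real shift functions consult the precomputed KMP and Horspool tables; here
they are replaced by deliberately weak placeholders that only kill the attempted
position, which keeps the sketch correct but removes the pruning that makes the
Deadzone approach fast:

#include <stdio.h>
#include <string.h>

/* Weak placeholder shifts: only the attempted index j is declared
   dead. A real implementation returns larger shifts computed from
   the KMP and Horspool tables. */
static long left_shift(long i, long j)  { (void)i; return j - 1; }
static long right_shift(long i, long j) { (void)i; return j; }

/* Recursively search the live zone [lower, upper) from its middle,
   counting the matches of pattern (length m) in text (Algorithm 3). */
static void deadzone(const char *text, const char *pattern, long m,
                     long lower, long upper, long *count) {
    if (lower >= upper)
        return;
    long j = (lower + upper) / 2;          /* attempt in the middle */
    long i = 0;
    while (i < m && pattern[i] == text[j + i])
        i++;
    if (i == m)
        (*count)++;                        /* match found at position j */
    deadzone(text, pattern, m, lower, left_shift(i, j) + 1, count);  /* left zone */
    deadzone(text, pattern, m, right_shift(i, j) + 1, upper, count); /* right zone */
}

int main(void) {
    const char *text = "acgtacgtaacgt", *pattern = "acgt";
    long n = (long)strlen(text), m = (long)strlen(pattern), count = 0;
    deadzone(text, pattern, m, 0, n - m + 1, &count); /* the whole text is live */
    printf("%ld\n", count);                           /* prints 3 */
    return 0;
}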

2.7 Related Works

The act of comparing different pattern matching algorithms is not a new basis
for a scientific study. Several researchers have made contributions to the area by
comparing different algorithms under different circumstances.

For example, De V. Smit (1982) compared the number of character comparisons
made by the Naïve algorithm (named the straightforward method by him), the
KMP algorithm, and the BM algorithm. He compared the algorithms in a realistic
setting with patterns of lengths 1 to 14 and found the BM algorithm to be superior
for patterns with a length over 3, while the Naïve algorithm was almost tied with
the KMP algorithm. So much so that De V. Smit later claims that the Naïve
algorithm is preferable over the KMP algorithm because of KMP's pre-processing
phase. For patterns of length 4, the BM algorithm skips 2/3 of all characters in the
text string, if the pre-processing phase is ignored. In his conclusion he writes that
a text editor should use the BM algorithm for patterns over the length of three,
while the Naïve algorithm should be used for patterns of length three or shorter.

Another study from Mauch et al. (2012) compared five different implementa-
tions of algorithms from the Deadzone family: three versions of a recursive vari-
ant and two iterative variants. One of the recursive versions was programmed
in an object-oriented style with the rest in a C-style, although all implementations
were done in a C++ environment. Among the results they discuss "The Cost of
Object-Orientation" (p.65), which showed that the Deadzone algorithm, when im-
plemented in a C++ manner, is three times slower for patterns of length 4 than the
same variant implemented in a C-type manner. Even more strikingly, the C++-type
implementation proved to be thirty-three times slower for patterns of length 16384
in their experimental setting.

In the area of parallel string matching, Kouzinopoulos and Margaritis (2009) use
CUDA to evaluate parallel implementations of the Naïve, KMP, Horspool, and
Quick-Search algorithms for string matching on a multicore GPU. The GPU has
30 multiprocessors with a total of 240 cores and up to 30720 possible active
threads. Kouzinopoulos and Margaritis used patterns of length 25, 50, 200, and
800 on three different genomes and DNA sequences. The results are presented
in the form of speedup and showed that the Naïve algorithm gained the most from
parallelisation, with the highest speedup on all tests conducted. How much greater
the speedup was proved to be dependent on which sequence they used. For one of
the three DNA sequences the Naïve algorithm had more than double the speedup
of the other algorithms for patterns of length 25, as well as for the other pattern
lengths. For patterns of length 800 on the same DNA sequence, the Naïve
algorithm had a speedup of 14 while the other algorithms had a speedup of less
than 1.

Furthermore, the area of pattern matching has several types of algorithms not dis-
cussed in this report. One such type was introduced by Karp and Rabin (1987)
and uses hash functions. The algorithm only tries to match the pattern if a part
of the text the size of the pattern gives the same hash value as the pattern. This
gives the pattern matching part a constant time complexity in most cases.

3 Method

The Deadzone, Horspool, and Naïve algorithms are implemented in the program-
ming language C. The API used for the parallel program in these experiments
is OpenMP, see Section 2.1 and Section 2.3 for an explanation of why it is
used. The computer the experiments are conducted on has a total of 8 threads
divided over 4 cores. Therefore, the parallel Naïve should not gain a speedup
much greater than 4. However, up to 8 threads are used in order to simulate cases
such as an application where this algorithm is used and the program simply at-
tempts to use all available threads.

For one experiment, the critical section of the parallel Naïve algorithm is removed
and the resulting speedup is compared to the original's speedup, see Equation 1.
This was done to observe the effects that the critical section has on the overall
time efficiency of the implementation. The version of the program without the
critical section produces incorrect output, but it allows valuable insight into the
results for the correct implementation. An important note is that the version of the
parallel Naïve algorithm which excluded the critical section is only used in these
experiments; all other experiments discussed in the next paragraphs were run
on a version of the parallel Naïve algorithm which used the critical section.

What was measured in the implementations was everything included in Algorithm
1, Algorithm 2, and Algorithm 3, including the pre-processing phases of Algorithm
2 and Algorithm 3, meaning the time it took to compute the tables. The parsing and
saving of the pattern and text are excluded from the time measurement, as is any
output writing. As discussed in Section 2.2, these parts of the implementations are
timed because they are the relevant parts for answering the research question. The
programs were timed on wall-clock time in seconds, for a justification see Section
2.2, and the times were taken inside the programs to allow for maximum control
over what was timed. Each experiment for each implementation paired with each
pattern length was run 75 times.
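As an illustration of how such in-program wall-clock measurements can be taken,
OpenMP provides the timer omp_get_wtime; the exact timing code of the thesis is
not published, so the sketch below is an assumption of its general shape:

#include <stdio.h>
#include <omp.h>

/* Time only the search phase on wall-clock time, excluding input
   parsing and output writing, as motivated in Section 2.2. */
void time_search(void (*search)(const char *, const char *),
                 const char *text, const char *pattern) {
    double start = omp_get_wtime();    /* wall-clock seconds */
    search(text, pattern);             /* the algorithm under test */
    double elapsed = omp_get_wtime() - start;
    printf("search took %f seconds\n", elapsed);
}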

The input to the experiments is two different texts with different alphabet sizes,
one with 4 characters in its alphabet and one with 256 characters, to examine the
difference between the implementations under different conditions. The pattern
lengths used were 1 to 20, since those lengths make a sufficient impact on the
tables used. The Deadzone, Horspool, and parallel Naïve, using thread counts
from 1 to 8, implementations were all tested on the exact same patterns, and
these patterns were randomly chosen from the texts since this is the regular way
to conduct experiments such as these (Mauch et al. 2012, 62).

The text with an alphabet size of 256 is 69204125 characters long while the text
of alphabet size 4 is 4638691 characters long. The text of the greater alphabet
consists of several repetitions of an English translation of Les Misérables
(Hugo 1887) and the text of the smaller alphabet is the E. coli genome as given by
Matt Powell (2001).

3.1 Delimitations

The experiments were conducted on one computer, which limits the number of
threads that can be used to run the parallel version of the Naïve algorithm. This
could result in inconsistent results if enough threads are not available at the time
of running parts of the experiment. Furthermore, this limits the results in the
sense that testing on several different computers could yield more general results,
or reveal that the parallel Naïve algorithm should be chosen on some computers
but not on others. Conducting the experiments on a personal computer also allows
the operating system to interfere with the timings. Since the operating system of
a computer can start background processes unbeknownst to the user, these
processes could take time from the CPU and could lead to inconsistencies in the
times of the programs.

Another possible limitation is described by Pacheco (2011, 64), who mentions that
the resolution of most timer functions is limited, which restricts the lengths of
texts and patterns that can be timed for experiments such as these, meaning that
the algorithms could prove too fast for texts and patterns which are small enough.
Since the times are calculated in the programs themselves, the calculations may be
subject to subtractive cancellation, which could result in slightly inaccurate times.

There are possible experiments and conditions that were excluded due to their
irrelevance to the purpose of this study, for example, changing the lengths of
the texts for the different alphabets to be the same in order to compare the results
between the alphabets directly. This was not done because the goal of the study is
not to see how the algorithms compare directly across alphabets; the purpose is
to see if a parallel Naïve algorithm can be more efficient than some state-of-the-art
algorithms.

Furthermore, testing the algorithms on smaller texts and timing the algorithms
without their respective pre-processing phases were both excluded. As discussed
in Section 2.4, Section 2.5, and Section 2.6, the algorithms' differences in effi-
ciency are dependent on the pattern and not on the length of the text. Changing
the length of the texts would not change the algorithms' behaviour and therefore
these changes would not yield results meaningful enough to justify their inclusion.

Regarding timing the algorithms without their pre-processing phases, this would
defeat the purpose of the study as well. In a realistic setting, the pre-processing
phases are calculated when the algorithms are used, since the alternative is saving
all possible pattern tables and then looking up the correct one, which ironically
would itself require efficient pattern matching.

4 Results

This section displays the results of the experiments separately; they are discussed
together in later sections. The first subsection presents results directly relevant
to the research question; for easier readability the graphs in this subsection only
include the thread counts 1, 2, 4, and 8. The second subsection shows the results
of the parallel Naïve implementation using all thread counts from 1 to 8. The third
subsection displays clarifications of the results from the previous two subsections
which are useful for the discussion of the results in later sections.

4.1 Research Question Results

The experiments conducted on a personal computer with the previously described
implementations for the greater alphabet showed that the parallel Naïve imple-
mentation was the most efficient, with regards to wall-clock time, for all pattern
lengths tested when using 4 and 8 threads. The parallel Naïve using 2 threads was
more efficient than the Horspool and Deadzone algorithms for patterns of lengths
1 to 11 but slower than Horspool for patterns of length 15 and greater, see Figure 4.
When using 8 threads the Naïve was faster for patterns of length 1-20, see later
sections for a clarification on the longer pattern lengths. Furthermore, the results
show that the average times for the Deadzone and Horspool implementations
decrease as the length of the pattern increases for the patterns tested.

Figure 4: The figure shows a graph for the average time it took in seconds for the
Deadzone, Horspool, and Naïve, using different amounts of threads, implementa-
tions to pattern match with patterns of lengths 1 to 20 for a greater alphabet. For
example, Naïve 2 denotes the parallel Naïve using 2 threads.

The results of the experiments on the smaller alphabet showed that the Deadzone,
Horspool, and serial Naïve algorithms as implemented performed similarly to how
they performed on the greater alphabet, although Horspool's times were less
consistent, compare Figure 4 and Figure 5. The parallel Naïve implementation
proved less efficient for patterns of length 1 than all the other algorithms, with
the exception that the parallel Naïve using 2 threads gave similar results to Dead-
zone. For patterns longer than 3 the parallel Naïve gave quicker times than the
other algorithms when using 8 threads, with the thread count 2 resulting in similar
times to Horspool for a majority of the patterns longer than 7.

Figure 5: The figure shows a graph for the average time it took in seconds for
the Deadzone, Horspool, and Naïve, using different amounts of threads, imple-
mentations to pattern match with patterns of lengths 1 to 20 for a smaller
alphabet. For example, Naïve 2 denotes the parallel Naïve using 2 threads.

4.2 The Parallel Naïve

The results gained from the experiments run on a personal computer showed that
the implementation of the parallel Naïve algorithm did not become strictly faster as
more threads were used for the greater alphabet, see Figure 6. The implementation
was fastest with 8 threads, with 4 threads being a close second. The experiments
for the smaller alphabet using different amounts of threads gave the results
displayed in Figure 7, which shows that more threads gave a worse result
for small patterns but a better result for larger patterns.

Figure 6: The figure shows a graph for the average time it took in seconds for
the Naïve algorithm with numbers of threads from 1 to 8 to pattern match with
patterns of lengths 1 to 20 for a greater alphabet. For example, Naïve 2 denotes
the parallel Naïve using 2 threads.

Figure 7: The figure shows a graph for the average time it took in seconds for
the Naïve algorithm with numbers of threads from 1 to 8 to pattern match with
patterns of lengths 1 to 20 for a smaller alphabet. For example, Naïve 2 denotes
the parallel Naïve using 2 threads.

4.3 Clarifications

Figure 8 shows a clearer graph of the Horspool implementation's time in compar-
ison with the parallel Naïve implementation using 8 threads on the greater alphabet.
The graph shows that the parallel Naïve when using 8 threads was faster than Hor-
spool, although the Horspool implementation appears to become more efficient
as the pattern length increases. The different times for the Horspool algorithm as
implemented for a pattern of length 12 on the smaller alphabet are shown in Figure
9 and reveal that there are few anomalies in the calculated times.

Figure 8: The figure shows a graph for the average time it took in seconds for
the Horspool algorithm and the parallel Naïve using 8 threads to pattern match
patterns of lengths 6 to 20 for a greater alphabet.

Figure 9: The figure shows a box-plot for the different times in seconds the Hor-
spool implementation took to find a pattern of length 12 in a smaller alphabet. The
X denotes the mean.

The results for the parallel Naïve implementation for the smaller alphabet showed
that the parallel Naïve algorithm is slower using several threads for patterns of
length 1 or 2, see Section 4.2. However, the results for longer patterns are difficult
to see, and therefore Figure 10 is included to clarify them; it shows that the
parallel Naïve does not become strictly faster as more threads are introduced.
Furthermore, a box-plot of the parallel Naïve implementation when 8 threads are
used for a pattern length of 19 is shown in Figure 11 and shows that the anomalies
that occurred were below the average.

Figure 10: The figure shows a graph for the average time it took in seconds for
the Naïve algorithm with numbers of threads from 2 to 8 to pattern match with
patterns of lengths 3 to 20 for a smaller alphabet. For example, Naïve 2 denotes
the parallel Naïve using 2 threads.

Figure 11: The figure shows a box-plot for the different times the parallel Naïve
implementation using 8 threads took to find a pattern of length 19 in a smaller
alphabet.

Removing the critical section from the parallel Naïve algorithm implementation,
see row 10 in Algorithm 1, results in a different speedup for patterns of length 1,
see Figure 12, while the speedup remains similar in nature for patterns of length
20, see Figure 13.

Figure 12: The figure shows a graph for the speedup for the parallel Naïve imple-
mentation with and without the critical section from 1 to 8 threads on a pattern of
length 1 for a smaller alphabet.

Figure 13: The figure shows a graph for the speedup for the parallel Naïve imple-
mentation with and without the critical section from 1 to 8 threads on a pattern of
length 20 for a smaller alphabet.

5 Discussion & Analysis

The aim of this thesis is to determine whether a parallel implementation of the
Naïve algorithm for the pattern matching problem can become more efficient, with
regards to wall-clock time, than an implementation of the Horspool algorithm and
an algorithm from the Deadzone family, given a reasonable number of threads for
a personal computer. This section examines the results presented in Section 4 and
attempts to draw a conclusion which can answer the research question.

Firstly, examining the results for the greater alphabet shows that the parallel Naïve
algorithm is more efficient, with regards to wall-clock time when using 8 threads,
for patterns of lengths 1 to 20, while the Deadzone algorithm implementation
appears to be the least efficient for all patterns tested, see Figure 4. Examining the
other thread counts for the parallel Naïve implementation reveals it to be faster than
both Deadzone and Horspool for patterns of lengths 1 to 14 when using 3 threads
or more, compare Figure 4 and Figure 6. However, the Horspool implementation's
time appears to be steadily decreasing as the pattern length increases and would,
most likely, become faster than the parallel Naïve implementation using 8 threads
in the conditions of the experiments, see Figure 8 for a clearer display. The decrease
in time which the Horspool and Deadzone implementations follow could be at-
tributed to the increasing information the longer patterns provide, meaning that
the algorithms will be able to skip larger parts of the text, see Section 2.4 and Sec-
tion 2.5. Horspool's inconsistency in Figure 8 can be attributed to the changing
values of the table, the timer's possible inaccuracy, or the unpredictability of the
operating system.

Regarding the parallel Naïve implementation for the greater alphabet, the reason
that the Naïve algorithm implementation does not noticeably change its running
time depending on the length of the pattern can be explained by the algorithm
itself, as explained in Section 2.3. Essentially, since the patterns are small, in the
rare event that most of the pattern is matched with the text before the algorithm
can move forward, the pattern is small enough that this detour should not affect
the implementation's time much. Besides, this case is unlikely because of the
great size of the alphabet in these experiments. The reason that the parallel Naïve
implementation does not improve linearly as more threads are introduced could be
complications with the software, since the algorithm should be close to
embarrassingly parallel, see Section 2.3. Even though attempts were made to free
as many of the cores as possible, there could still be background processes started
by the computer's operating system which could use valuable time and memory.

Secondly, the results for the experiments on the alphabet of size 4 can be seen in
Figure 5 and Figure 7, with Figure 10 displaying clearer results for the parallel
Naïve implementation for patterns of lengths 3 to 20 using 2 to 8 threads. The
Horspool implementation follows a similar pattern for the smaller alphabet as for
the greater alphabet, although slightly less consistently. This inconsistency can be
attributed to the changing values of the table or the unpredictability of the operating
system. Horspool's increase in time for patterns of length 12 is not because of a
great number of anomalies, as shown in Figure 9, but rather, perhaps, because the
table quickly changes the amount one character will shift the pattern, which
happens more frequently for a smaller alphabet.

Furthermore, the parallel Naïve implementation appeared to be fastest when using
7 threads and, once again, the implementation did not become more efficient, with
regards to wall-clock time, linearly as more threads were introduced. This could
also be attributed to the unpredictability of the software. Moreover, the implemen-
tation showed an increase in time for a pattern length of 19 when using 8 threads.
According to Figure 11, the reason for this deviation is not anomalies in the run-
ning time and could therefore be explained by the operating system. In contrast
to the Horspool implementation, the parallel Naïve implementation proved to be
more efficient when using 3 threads or more for patterns of length 3 or longer.

One last detail to discuss is the increase in time for patterns of length 1 in the par-
allel experiments, see Figure 7. A culprit for this behaviour could be the Naïve
implementation's critical section, see row 10 in Algorithm 1. This is confirmed, for
the computer the experiments were conducted on, in Figure 12, where experiments
were run and the speedup calculated twice, once including the critical section and
once excluding it. However, as seen in Figure 13, this only affected the implemen-
tation minimally for patterns of length 20. The reason that the critical section
contributes greatly to a speedup less than 1 for the small alphabet could be that a
queue of threads is often waiting to enter the critical section. This becomes a
greater problem for smaller patterns on smaller alphabets, since the different
threads will encounter the pattern more often, see Section 2.3 for a more detailed
explanation of this special case.

In conclusion, the parallel Naïve algorithm evaluated in this thesis appears to be
able to become more efficient than the Deadzone and Horspool algorithms for pat-
terns of lengths 3 to 20 for both alphabet sizes 4 and 256. However, the results for
the greater alphabet suggest that the Horspool algorithm could become more effi-
cient than the most efficient parallel Naïve thread count tested in the experiments
for larger patterns, and further testing is required to investigate if this will come to
fruition. The results for the parallel version of the Naïve algorithm showed that
it becomes more efficient than the Horspool algorithm at 3 threads for a pattern
length of 20 on the smaller alphabet, while needing 4 or 8 threads for patterns of
the same length on the greater alphabet.

6 Future Work

Future work could include parallelizing the Deadzone and Horspool algorithms to
evaluate whether they gain substantial efficiency from parallelization. Another
possible extension of this study is to test the parallel Naïve algorithm on a cluster
to observe if the algorithm scales in such an environment. A future study could
also include a different, and preferably more efficient, variant of the Deadzone
algorithm. Lastly, the algorithms could be tested with larger patterns to see if their
behaviour follows the same time pattern.

References

Boyer, Robert S.; Moore, J Strother. 1977. A Fast String Searching Algorithm.
Communications of the ACM 20(10): 762-772. doi: 10.1145/359842.359859

Daykin, Jacqueline W.; Groult, Richard; Guesnet, Yannick; Lecroq, Thierry;
Lefebvre, Arnaud; Léonard, Martine; Mouchard, Laurent; Prieur-Gaston, Élise;
Watson, Bruce. 2018. Three Strategies for the Dead-Zone String Matching
Algorithm. Proceedings of the Prague Stringology Conference 2018: 117-128.

De V. Smit, G. 1982. A comparison of three string matching algorithms. Software:
Practice and Experience 12(1): 57-66. doi: 10.1002/spe.4380120106

Horspool, R. Nigel. 1980. Practical Fast Searching in Strings. Software: Practice
and Experience 10: 501-506.
http://webhome.cs.uvic.ca/~nigelh/Publications/stringsearch.pdf

Hugo, Victor. 1887. Les Misérables. Translated by Hapgood, Isabel Florence.
https://www.gutenberg.org/ebooks/135 (Retrieved 2022-04-06)

Karp, Richard M.; Rabin, Michael O. 1987. Efficient randomized pattern-matching
algorithms. IBM Journal of Research and Development 31(2): 249-260. doi:
10.1147/rd.312.0249

Knuth, Donald E.; Morris, James H.; Pratt, Vaughan R. 1977. Fast Pattern Match-
ing in Strings. SIAM Journal on Computing 6(2): 323-350.
http://static.cs.brown.edu/courses/csci1810/resources/ch2_readings/kmp_strings.pdf

Kourie, Derrick G.; Watson, Bruce W.; Strauss, Tinus. 2012. A Sequential Recur-
sive Implementation of Dead-Zone Single Keyword Pattern Matching. In Smyth,
W. F. and Arumugam, Subramanian (eds.). Combinatorial Algorithms: 23rd Inter-
national Workshop, IWOCA 2012, Krishnankoil, India, July 19-21, 2012, Revised
Selected Papers. 236-248.

Kouzinopoulos, Charalampos S.; Margaritis, Konstantinos G. 2009. String Match-
ing on a Multicore GPU Using CUDA. 2009 13th Panhellenic Conference on In-
formatics: 14-18. doi: 10.1109/PCI.2009.47

Mauch, Melanie; Kourie, Derrick G.; Watson, Bruce W.; Strauss, Tinus. 2012.
Performance Assessment of Dead-Zone Single Keyword Pattern Matching. SAIC-
SIT '12: Proceedings of the South African Institute for Computer Scientists and
Information Technologists Conference: 59-68. doi: 10.1145/2389836.2389844

Pacheco, Peter S. 2011. An Introduction to Parallel Programming. Amsterdam;
Boston: Elsevier/Morgan Kaufmann Publishers.

Powell, Matt. 2001. Descriptions of the corpora.
https://corpus.canterbury.ac.nz/descriptions/ (Retrieved 2022-04-06)

Sheik, S. S.; Aggarwal, Sumit K.; Poddar, Anindya; Balakrishnan, N.; Sekar, K.
2004. A Fast Pattern Matching Algorithm. Journal of Chemical Information and
Computer Science 44(4): 1251-1256. doi: 10.1021/ci030463z

Singla, Nimisha; Garg, Deepak. 2012. String Matching Algorithms and their Ap-
plicability in various Applications. International Journal of Soft Computing and
Engineering 1(6): 218-222.
https://www.gdeepak.com/pubs/String%20Matching%20Algorithms%20and%20their
%20Applicability%20in%20various%20Applications.pdf

Watson, Bruce W.; Watson, Richard E. 1997. A New Family of String Pattern
Matching Algorithms. In Holub, Jan (ed.). Proceedings of the Prague Stringology
Club Workshop '97: 12-23.