Textbook Computing Patterns in Strings Bill Smyth Ebook All Chapter PDF
Bill Smyth McMaster University Curtin University of Technology
In the beginning was the Word, and the Word was with God, and the
Word was God.
- John 1:1

The computation of patterns in strings is
a fundamental requirement in many areas of science and information
processing. The operation of a text editor, the lexical analysis of a
computer program, the functioning of a finite automaton, the
retrieval of information from a database — these are all activities
which may require that patterns be located and/or computed. In
other areas of science, the algorithms that compute patterns have
applications in such diverse fields as data compression, cryptology,
speech recognition, computer vision, computational geometry, and
molecular biology. And computing patterns in strings is not a topic
whose importance lies only in its current practical applications: it is a
branch of combinatorics that includes many simply-stated problems
which often turn out to have solutions — and often more solutions
than one — of great subtlety and elegance. It is surprising therefore
that academic Departments of Mathematics or Computer Science do
not generally include in their undergraduate or graduate curricula
courses which provide an introduction to this interesting, important,
and heavily researched topic. It is perhaps even more surprising that
so few texts have been written with the purpose of putting together
in a uniform way some of the basic results and algorithms that have
appeared over the past quarter-century. I know of five books [St94,
CR94, G97, SM97, CHL01] and three fairly long survey articles
[BY89a, A90, Nav01] whose subject matter overlaps significantly with
that of this volume. Of these, the survey articles, [St94], and [CR94]
are written more as summaries of research than as texts for
students; while [G97] and [SM97] focus heavily on the (important)
applications in molecular biology. The final monograph [CHL01] does
indeed function as both a monograph and a textbook on string
algorithms, and is moreover both clearly and elegantly written;
unfortunately, it is currently available only in French.

The purpose of this book then is to begin to fill a gap: to provide a
general introduction to algorithms for computing patterns in strings
that is useful to experienced researchers in this and other fields, but
that also is accessible to senior
undergraduate and graduate students. Let us linger a moment over
three of the words used in the preceding sentence: "accessible",
"algorithms", and "patterns". An overriding objective in this book is
to make the material accessible to students who have completed or
nearly completed a mathematics or computer science undergraduate
curriculum that has included some emphasis on discrete structures
and the algorithms that operate upon them. A first consequence of
this objective is that the mathematical background required to read
this book is general rather than specific to strings. It would certainly
be provided by the standard IEEE/ACM courses in Discrete
Mathematics, Data Structures, and Analysis of Algorithms. The
reader will know what stacks, queues, linked lists and arrays are, for
example, and will have some familiarity with the analysis of
algorithms and the "asymptotic complexity" notation used for this
purpose, some experience with mathematical assertions and the
methods used to establish their correctness, and some knowledge of
important algorithms on graphs and trees. In addition, the
assumption is made that the reader is familiar with some computer
programming language, and has the ability to read and understand
algorithms expressed in such a language. A second consequence is
that no claim is made to completeness: my objective is to lure the
student and the reader into a fascinating field, not to write an
encyclopaedia of algorithms that compute patterns in strings. In
particular, I have been selective in two main ways: I focus on results
that are (I believe) important and that moreover can be explained
with reasonable economy and simplicity. Inevitably, this means that
the exposition of some interesting and valuable material is omitted.
However, I hope that, both by providing references to much of this
material and by stimulating interest, I will encourage readers to
investigate the literature for themselves.

The underlying subject
matter of this book is a mathematical object called by most
computer scientists a "string" (or, in Europe and by most
mathematicians, a "word"). But the focus of this book is on
algorithms — that is, on precise methods or procedures for doing
something — and it is thus more properly thought of as a text in
computer science rather than in mathematics. This book will
therefore take quite a different approach from that of the classic
monograph [L83], and its descendants [L97, L02], that elucidates
mathematical properties above all. We shall rather be interested
primarily in algorithms that find various kinds of patterns in strings;
for the most part, only as a byproduct of that focus will interest be
displayed in the mathematical properties of the strings themselves.
This does not mean that results will not be proved rigorously, only
that the selection of those results will generally depend on their
relevance to the behaviour of some algorithm. A final remark here:
in the exposition I confine myself to sequential algorithms on strings
in one dimension, making no reference to the extensive literature on
the corresponding parallel (especially PRAM) algorithms or to the
growing literature on multi-dimensional (especially two-dimensional)
strings.

Another focus of this book will be on patterns. That is, the
algorithms we discuss will virtually all be devoted to finding some
sort of a pattern in a string. I say "some sort" of pattern, because
three main kinds will be distinguished — specific, generic, and
intrinsic — that provide this book with three of its four main
divisions. A specific pattern is one that can be specified by listing
characters in their required order; for example, if we were searching
for the pattern u = abaab in the string x = abaababaabaab we
would find it (three times), but we would not find the pattern u =
ababab.
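
As a quick check of this example, here is a minimal Python sketch that lists every starting position of a specific pattern u in a string x, counting overlapping occurrences. The function name occurrences is hypothetical, positions are reported from 1 to match the notation used here, and the sketch relies on Python's built-in str.find rather than on any algorithm from this book.

```python
def occurrences(u, x):
    """Return the 1-indexed start positions of every occurrence of u in x.

    Overlapping occurrences are included. Illustrative helper only; it
    assumes u is non-empty and uses str.find for the actual matching.
    """
    positions = []
    start = x.find(u)
    while start != -1:
        positions.append(start + 1)    # convert 0-indexed to 1-indexed
        start = x.find(u, start + 1)   # step by one so overlaps are kept
    return positions


x = "abaababaabaab"
print(occurrences("abaab", x))   # [1, 6, 9]: three occurrences
print(occurrences("ababab", x))  # []: the pattern does not occur
```

Stepping the search forward by one position, rather than by the length of u, is what lets the occurrences at positions 6 and 9 both be reported even though they overlap.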
To my parents.
Algorithms
Properties of Strings

1.1 Strings of Pearls

Words form the thread on which we string our experiences.
- Aldous HUXLEY (1894-1963), Brave New World

Consider a string of pearls. Imagine that the string is laid out
on the table before you, so that one end is on the left and the other
on the right. Suppose that there are n pearls in the string, and that
each pearl has a tiny label pasted on it. Suppose further that the
labels are integers in the range 1..n and that they satisfy the
following rules:
- the label on the leftmost pearl is 1;
- for every integer i = 1, 2, ..., n - 1, the pearl to the right of the
  pearl labelled i has label i+1.
These rules seem to satisfy our intuitive idea of what
makes a string of pearls a "string": the pearls all lie on a single well-
defined path, and the path can be traversed from one end to the
other by moving from the current pearl to an adjacent one.
Reflecting on the rules, however, we realize that we need not be so
specific. First of all, of course, we do not really need to speak of
"pearls": we can speak more generally of (undefined) elements. But
a second, more fundamental, observation is that the labels do not
need to be chosen in any specific order, and they do not need to be
integers: they could be colours, for example, or letters of the
alphabet. What really matters is that

Periodicity

[...] will mean that, over all problem instances whose size
exceeds some fixed value n₀, the algorithm requires time at least k₁n
and at most k₂n, where k₁ and k₂ are also fixed. By contrast, the
statement execution space ∈ O(n) makes the considerably weaker
assertion that, again over all problem instances of size greater than
n₀, the algorithm requires space at most k₂n. Thus, in this case, it is
possible that there may exist problem instances whose space
requirement is proportional to a quantity, perhaps log n or √n or
n/log n or even 1, that is asymptotically less than n; in fact, it
is even possible that all problem instances have space requirement
proportional to such a quantity. Throughout this book, we will use Θ
to describe algorithmic properties only when the strong condition
illustrated in (1.8) is satisfied. Usually, when we use O(f(n))
(respectively, Ω(f(n))), we will mean that there do exist collections
of problem instances for which the property is in fact exactly of
order f(n), but that also there exist other collections of problem
instances for which the property is asymptotically less than
(respectively, greater than) order f(n).

We have seen that Algorithm
1.3.1 requires Θ(n) time for its execution over all problem instances;
since any algorithm that computed the border array would need to
access each of n positions at least once, we are therefore justified in
describing Algorithm 1.3.1 as asymptotically optimal: no
algorithm could compute β[1..n] in less than Θ(n) time and constant
space for any problem instance of size n. As discussed in Section 4.1,
Algorithm 1.3.1 is in a certain sense an on-line algorithm, yielding at
each step a result for the current position i as the given
string is processed from left to right. Thus, without backtracking, the
algorithm yields a result, not only for the given string, but also for
every prefix of it. However, in a stricter sense, Algorithm 1.3.1 would
not be described as on-line: since any position i' < i in x may need
to be visited in order to assist in the calculation for i, the entire
string needs to be available for processing at every step of the
calculation.

As an example of a border calculation, suppose that the string

          1  2  3  4  5  6  7  8  9 10 11 12 13
    f =   a  b  a  a  b  a  b  a  a  b  a  a  b,

introduced in Section 1.2, is given. Then the border array β[1..n] of
f is 0011232345645. Observe how, in a string such as f with many
repetitive substrings, the values in the border array may fall back
and then rise again. For example, β[6..8] = 323 and β[11..13] =
645.

The way in which the border array β of a given string x is
calculated makes it clear that all of the borders of x (indeed, of
every prefix of x) can be computed from β. This follows from the
observation stated in (1.6) that the second-longest border of x[1..i] is
the longest border of x[1..β[i]]. We state this important result as a
theorem:
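
The example and the observation about second-longest borders can be checked with a short Python sketch. It assumes the standard linear-time recurrence for the border array; Algorithm 1.3.1 is not reproduced in this excerpt, so this should not be read as the book's own code. The names border_array and all_borders are hypothetical, and the returned list is padded at index 0 so that beta[i] matches the 1-indexed β[i] of the text.

```python
def border_array(x):
    """Return beta, where beta[i] is the length of the longest border of x[1..i].

    Index 0 is padding so that indices match the 1-indexed notation.
    Sketch of the standard linear-time recurrence (an assumption, not
    necessarily identical to Algorithm 1.3.1).
    """
    n = len(x)
    beta = [0] * (n + 1)
    for i in range(2, n + 1):
        b = beta[i - 1]
        # Fall back through successively shorter borders of x[1..i-1]
        # until one can be extended by the letter at position i.
        while b > 0 and x[b] != x[i - 1]:
            b = beta[b]
        if x[b] == x[i - 1]:
            b += 1
        beta[i] = b
    return beta


def all_borders(x):
    """Return the lengths of all borders of x, longest first, ending with 0."""
    beta = border_array(x)
    lengths = []
    b = beta[len(x)]
    while b > 0:
        lengths.append(b)
        b = beta[b]        # the next border is the longest border of x[1..b]
    lengths.append(0)      # the empty border
    return lengths


f = "abaababaabaab"
print("".join(str(v) for v in border_array(f)[1:]))  # 0011232345645
print(all_borders(f))                                # [5, 2, 0]
```

For f the chain β[13] = 5, β[5] = 2, β[2] = 0 yields the borders abaab, ab, and the empty string, so every border is read off the array without rescanning the string.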