
Computing Patterns in Strings

Bill Smyth
Bill Smyth
McMaster University
Curtin University of Technology

Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England

and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoneduc.com

First published 2003

© Pearson Education Limited 2003

The right of William F. Smyth to be identified as author of this work
has been asserted by him in accordance with the Copyright, Designs and
Patents Act 1988.

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording or
otherwise, without either the prior written permission of the
publisher or a licence permitting restricted copying in the United
Kingdom issued by the Copyright Licensing Agency Ltd, 90
Tottenham Court Road, London W1T 4LP.

All trademarks used herein are the property of their respective
owners. The use of any trademark in this text does not vest in the
author or publisher any trademark ownership rights in such trademarks,
nor does the use of such trademarks imply any affiliation with or
endorsement of this book by such owners.

ISBN 0 201 39839 7

British Library Cataloguing-in-Publication Data
A catalog record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

10 9 8 7 6 5 4 3 2 1
07 06 05 04 03

Typeset by 68
Printed and bound in Great Britain by Biddles Ltd, Guildford and
King's Lynn


Pearson Education

We work with leading authors to develop the
strongest educational materials in computer science, bringing
cutting-edge thinking and best learning practice to a global market.
Under a range of well-known imprints, including Addison-Wesley, we
craft high-quality print and electronic publications which help readers
to understand and apply their content, whether studying or at work.
To find out more about the complete range of our publishing, please
visit us on the World Wide Web at: www.pearsoneduc.com

Contents

Preface

Part I Strings and Algorithms

Chapter 1 Properties of Strings
  1.1 Strings of Pearls
  1.2 Linear Strings
  1.3 Periodicity
  1.4 Necklaces
Chapter 2 Patterns? What Patterns?
  2.1 Intrinsic Patterns (Part II)
  2.2 Specific Patterns (Part III)
  2.3 Generic Patterns (Part IV)
Chapter 3 Strings Famous and Infamous
  3.1 Avoidance Problems and Morphisms
  3.2 Thue Strings (2,3)
  3.3 Thue Strings (3,2)
  3.4 Fibostrings (2,4)
Chapter 4 Good Algorithms and Good Test Data
  4.1 Good Algorithms
  4.2 Distinct Patterns
  4.3 Distinct Borders

Part II Computing Intrinsic Patterns

Chapter 5 Trees Derived from Strings
  5.1 Border Trees
  5.2 Suffix Trees
    5.2.1 Preliminaries
    5.2.2 McCreight's Algorithm
    5.2.3 Ukkonen's Algorithm
    5.2.4 Farach's Algorithm
    5.2.5 Application and Implementation
  5.3 Alternative Suffix-Based Structures
    5.3.1 Directed Acyclic Word Graphs
    5.3.2 Suffix Arrays — Saving the Best till Last?
Chapter 6 Decomposing a String
  6.1 Lyndon Decomposition: Duval's Algorithm
  6.2 Lyndon Applications
  6.3 s-Factorization: Lempel-Ziv

Part III Computing Specific Patterns

Chapter 7 Basic Algorithms
  7.1 Knuth-Morris-Pratt
  7.2 Boyer-Moore
  7.3 Karp-Rabin
  7.4 Dömölki-(Baeza-Yates)-Gonnet
  7.5 Summary
Chapter 8 Son of BM Rides Again!
  8.1 The BM Skip Loop
  8.2 BM-Horspool
  8.3 Frequency Considerations and BM-Sunday
  8.4 BM-Galil
  8.5 Turbo-BM
  8.6 Daughter of KMP Rides Too!
  8.7 Mix Your Own Algorithm
  8.8 The Exact Complexity of Exact Pattern-Matching
Chapter 9 String Distance Algorithms
  9.1 The Basic Recurrence
  9.2 Wagner-Fischer et al.
  9.3 Hirschberg
  9.4 Hunt-Szymanski
  9.5 Ukkonen-Myers
  9.6 Summary
Chapter 10 Approximate Pattern-Matching
  10.1 A General Distance-Based Algorithm
  10.2 An Algorithm for k-Mismatches
  10.3 Algorithms for k-Differences
    10.3.1 Ukkonen's Algorithm
    10.3.2 Myers' Algorithm
  10.4 A Fast and Flexible Algorithm — Wu and Manber
  10.5 The Complexity of Approximate Pattern-Matching
Chapter 11 Regular Expressions and Multiple Patterns
  11.1 Regular Expression Algorithms
    11.1.1 Non-Deterministic FA
    11.1.2 Deterministic FA
    11.1.3 Algorithm WM Revisited
  11.2 Multiple Pattern Algorithms
    11.2.1 Aho-Corasick FA: KMP Revisited
    11.2.2 Commentz-Walter FA: BM Revisited
    11.2.3 Approximate Patterns: WM Revisited Again!
    11.2.4 Approximate Patterns: (Baeza-Yates)-Navarro

Part IV Computing Generic Patterns

Chapter 12 Repetitions (Periodicity)
  12.1 All Repetitions
    12.1.1 Crochemore
    12.1.2 Main and Lorentz
  12.2 Runs
    12.2.1 Leftmost Runs — Main
    12.2.2 All Runs — Kolpakov and Kucherov
Chapter 13 Extensions of Periodicity
  13.1 All Covers of a String — Algorithm LS
  13.2 All Repeats — Algorithm FST
    13.2.1 Computing the NE Tree
    13.2.2 Computing the NE Array
  13.3 k-Approximate Repeats — Schmidt
  13.4 k-Approximate Periods — SIPS

Bibliography
Index

Preface

In the beginning was the Word, and the Word was with God, and the
Word was God. —John 1:1

The computation of patterns in strings is
a fundamental requirement in many areas of science and information
processing. The operation of a text editor, the lexical analysis of a
computer program, the functioning of a finite automaton, the
retrieval of information from a database — these are all activities
which may require that patterns be located and/or computed. In
other areas of science, the algorithms that compute patterns have
applications in such diverse fields as data compression, cryptology,
speech recognition, computer vision, computational geometry, and
molecular biology. And computing patterns in strings is not a topic
whose importance lies only in its current practical applications: it is a
branch of combinatorics that includes many simply-stated problems
which often turn out to have solutions — and often more solutions
than one — of great subtlety and elegance. It is surprising therefore
that academic Departments of Mathematics or Computer Science do
not generally include in their undergraduate or graduate curricula
courses which provide an introduction to this interesting, important,
and heavily researched topic. It is perhaps even more surprising that
so few texts have been written with the purpose of putting together
in a uniform way some of the basic results and algorithms that have
appeared over the past quarter-century. I know of five books [St94,
CR94, G97, SM97, CHL01] and three fairly long survey articles
[BY89a, A90, Nav01] whose subject matter overlaps significantly with
that of this volume. Of these, the survey articles, [St94], and [CR94]
are written more as summaries of research than as texts for
students; while [G97] and [SM97] focus heavily on the (important)
applications in molecular biology. The final monograph [CHL01] does
indeed function as both a monograph and a textbook on string
algorithms, and is moreover both clearly and elegantly written;
unfortunately, it is currently available only in French. The purpose of
this book then is to begin to fill a gap: to provide a general
introduction to algorithms for computing patterns in strings that is
useful to experienced researchers in this and other fields, but that
also is accessible to senior
undergraduate and graduate students. Let us linger a moment over
three of the words used in the preceding sentence: "accessible",
"algorithms", and "patterns". An overriding objective in this book is
to make the material accessible to students who have completed or
nearly completed a mathematics or computer science undergraduate
curriculum that has included some emphasis on discrete structures
and the algorithms that operate upon them. A first consequence of
this objective is that the mathematical background required to read
this book is general rather than specific to strings. It would certainly
be provided by the standard IEEE/ACM courses in Discrete
Mathematics, Data Structures, and Analysis of Algorithms. The
reader will know what stacks, queues, linked lists and arrays are, for
example, and will have some familiarity with the analysis of
algorithms and the "asymptotic complexity" notation used for this
purpose, some experience with mathematical assertions and the
methods used to establish their correctness, and some knowledge of
important algorithms on graphs and trees. In addition, the
assumption is made that the reader is familiar with some computer
programming language, and has the ability to read and understand
algorithms expressed in such a language. A second consequence is
that no claim is made to completeness: my objective is to lure the
student and the reader into a fascinating field, not to write an
encyclopaedia of algorithms that compute patterns in strings. In
particular, I have been selective in two main ways: I focus on results
that are (I believe) important and that moreover can be explained
with reasonable economy and simplicity. Inevitably, this means that
the exposition of some interesting and valuable material is omitted.
However, I hope that, both by providing references to much of this
material and by stimulating interest, I will encourage readers to
investigate the literature for themselves. The underlying subject
matter of this book is a mathematical object called by most
computer scientists a "string" (or, in Europe and by most
mathematicians, a "word"). But the focus of this book is on
algorithms — that is, on precise methods or procedures for doing
something — and it is thus more properly thought of as a text in
computer science rather than in mathematics. This book will
therefore take quite a different approach from that of the classic
monograph [L83], and its descendants [L97, L02], that elucidates
mathematical properties above all. We shall rather be interested
primarily in algorithms that find various kinds of patterns in strings;
for the most part, only as a byproduct of that focus will interest be
displayed in the mathematical properties of the strings themselves.
This does not mean that results will not be proved rigorously, only
that the selection of those results will generally depend on their
relevance to the behaviour of some algorithm. A final remark here:
in the exposition I confine myself to sequential algorithms on strings
in one dimension, making no reference to the extensive literature on
the corresponding parallel (especially PRAM) algorithms or to the
growing literature on multi-dimensional (especially two-dimensional)
strings. Another focus of this book will be on patterns. That is, the
algorithms we discuss will virtually all be devoted to finding some
sort of a pattern in a string. I say "some sort" of pattern, because
three main kinds will be distinguished — specific, generic, and
intrinsic — that provide this book with three of its four main
divisions. A specific pattern is one that can be specified by listing
characters in their required order; for example, if we were searching
for the pattern u = abaab in the string x = abaababaabaab we
would find it (three times), but we would not find the pattern u =
ababab.
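
The search just described can be sketched in a few lines of Python. This is a minimal brute-force illustration, not one of the algorithms developed later in the book; the function name `occurrences` is a hypothetical choice, and positions are reported counting from 1 to match the convention used in the text.

```python
def occurrences(u, x):
    """Return the 1-based starting positions of every occurrence of u in x."""
    m, n = len(u), len(x)
    # Compare u against each window of length m in x (brute force).
    return [i + 1 for i in range(n - m + 1) if x[i:i + m] == u]


x = "abaababaabaab"
print(occurrences("abaab", x))   # [1, 6, 9]
print(occurrences("ababab", x))  # []
```

The three reported positions correspond to the three occurrences of u = abaab mentioned above, two of which overlap.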

(Sometimes the pattern that we are looking for can
contain "don't-care" symbols, and sometimes the match that we
seek need only be "approximate" in some well-defined sense, but
these are refinements to be dealt with later.) By contrast, a generic
pattern is one that is described only by structural information of
some kind, not by a specific statement of the characters in it. For
example, we might ask for all the "repetitions" in x—that is, for all
cases in which two or more adjacent substrings of x are identical.
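
As a rough illustration of this kind of query, the Python sketch below lists every place where two adjacent substrings of x are identical. It is a brute-force check under stated assumptions, not one of the repetition algorithms of Part IV: the name `squares` is hypothetical, only pairs of adjacent copies are reported (three or more adjacent copies show up as overlapping pairs), and positions are counted from 1.

```python
def squares(x):
    """Return (position, u) for every occurrence of uu in x, position 1-based."""
    n = len(x)
    found = []
    for p in range(1, n // 2 + 1):          # p is the length of u
        for i in range(n - 2 * p + 1):      # i is the 0-based start of uu
            if x[i:i + p] == x[i + p:i + 2 * p]:
                found.append((i + 1, x[i:i + p]))
    return found


for position, u in squares("abaababaabaab"):
    print(position, u)
```

Running it on the example string reports, among others, aba at position 6 and abaab at position 1, together with the three separate occurrences of aa.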
(The response in this case would be a list of repetitions including
(aba)(aba), (abaab)(abaab), aa (three separate times), and several
others — see if you can find them all.) I call the final kind of pattern
that we search for an intrinsic pattern — one that requires no
characterization, one that is inherent in the string itself. Here I
discuss various patterns that in one way or another expose the
periodic structure of a given string x; for example, normal form,
suffix tree, Lyndon decomposition, s-factorization. These intrinsic
patterns are used so frequently in algorithms that compute specific
or generic patterns as to be almost ubiquitous. Collectively, they
form the basis for the efficient processing of strings. The variety of
these intrinsic patterns is remarkable: the normal form of our
example string x is (abaababa)(abaab) while its Lyndon
decomposition is (ab)(aabab)(aab)(aab) and its s-factorization is (a)
(b)(a)(aba)(baaba)(ab) — all of these patterns are computationally
useful. The organization of this book is as follows. Part I gives basic
information about both strings and algorithms on strings. It provides
the terminology, notation and essential properties of strings
that will be used throughout; in addition, it describes the main kinds
of algorithms to be presented and illustrates some of them on
certain famous strings; these strings are also "infamous" as
examples of worst case behaviour for many string algorithms. It is
particularly Chapter 2 that provides a kind of key to the rest of the
book: it explains precisely which problems are to be solved and
directs the reader to the later sections that present the algorithms
that solve them. Thus the book may be used fairly easily by the
reader who has selective interests. Part I also discusses qualities of
"good" algorithms on strings, and raises the interesting question of
how implementation of these algorithms should in practice be
validated. Parts II-IV deal with algorithms for computing intrinsic,
specific and generic patterns, respectively, as described above.
Altogether there are 13 chapters distributed over the four parts. As
indicated in the table of contents, these are broken down into
sections, each of which ends with a collection of exercises. Where
appropriate, chapters include a summary of the topics/algorithms
covered and a discussion of related results, additional topics and
open problems.

A note about the exercises, of which there are some
500 or so: they are an integral part of the book, used for four main
purposes:

- to make sure the reader has understood;
- to clarify, or to put into a different context, principles or ideas
  that have arisen in the text (in a phrase, to make connections);
- to handle extensions or modifications of algorithms or mathematical
  results that the reader should be aware of, but should not need to
  have explained to him in detail;
- to deal with details (of algorithms, of proofs) that would otherwise
  unnecessarily clutter the presentation.

Wherever possible, by explaining in the text only what
really needs to be explained, I have tried through the exercises to
involve the reader in the development or analysis of the algorithms
presented. What I myself have discovered by taking this approach is
that for the most part the algorithms, and the improvements to
algorithms, depend on a very simple new idea, an insight that is not
complicated but that has somehow previously escaped other
researchers. Once that idea is captured, what remains consists
mainly of technicalities, tedious and convoluted perhaps,
but still a direct consequence of the main idea. This observation
seems to be true of string algorithms: I wonder what fields of study
it is not true of? A note also about the dreaded index: if on page p
of the text I have cited a work authored or co-authored by person P,
then the index entry for P should include p. I am very sensible of the
fact that sometimes I can be hasty (as Treebeard [T55] would say),
consequently error-prone. I have therefore spent a great deal of
time reworking this book in order to correct errors, rectify
oversights, or smooth over inelegancies; nevertheless I cannot
imagine that there will not be many defects to be found. I will be
maintaining a website http://www.cs.curtin.edu.au/
smyth/patterns.shtml to record corrections and suggestions for
improvement, and I would be grateful if readers would contact me at
smyth@computing.edu.au or smyth@mcmaster.ca with their
comments. The material in this book is at least sufficient to cover
two one-semester (12-14 week) courses for graduate or advanced
undergraduate students. Indeed, I hope that it is more than
sufficient: I hope that it is also suitable. The initial chapters have
already been presented several times to graduate computer science
students in the Departments of Computer Science & Systems and of
Computing & Software at McMaster University, Hamilton, Ontario,
Canada; also to graduate students in the Department of Computer
Science, University of Debrecen, Hungary. These students have
contributed materially to the book's development. I wish particularly
to express my deep gratitude to the School of Computing, Curtin
University, Perth, Western Australia, and to its past and present
Heads of School, Dennis Moore, Terry Caelli, Svetha Venkatesh and
Geoff West, for generous support and encouragement, both
intellectual and practical, over a period of several years. Most of this
book has been written during my sheltered visits to Curtin. I am
grateful also to Professor Petho Attila, Chair of the Department of
Computer Science at Debrecen, for his interest and support: it
was in Debrecen late in 2001 that the last bits of LaTeX were finally
keyed in. It is a pleasure to express my debt to my friends and
colleagues, Leila Baghdadi, Jerry Chapple, Franya Franek, Costas
Iliopoulos, Thierry Lecroq, Yin Li, Dennis Moore, Pat Ryan, Jamie
Simpson and Xiangdong Xiao for their valuable contributions. And
many thanks also to two anonymous referees whose constructive
comments have contributed materially to the final form of the book.
Finally, kudos to Jocelyn Smyth for her entertaining selection and
careful verification of "string" and "word" quotations!

W. F. S.

To my parents.

Part I Strings and Algorithms

A word in time saves nine. — Anonymous

Chapter 1 Properties of Strings

1.1 Strings of Pearls

Words form the thread on which we string our experiences.
- Aldous HUXLEY (1894-1963), Brave New World

Consider a string of pearls. Imagine that the string is laid out
on the table before you, so that one end is on the left and the other
on the right. Suppose that there are n pearls in the string, and that
each pearl has a tiny label pasted on it. Suppose further that the
labels are integers in the range 1..n and that they satisfy the
following rules:

- the label on the leftmost pearl is 1;
- for every integer i = 1, 2, ..., n - 1, the pearl to the right of the
  pearl labelled i has label i + 1.

These rules seem to satisfy our intuitive idea of what
makes a string of pearls a "string": the pearls all lie on a single well-
defined path, and the path can be traversed from one end to the
other by moving from the current pearl to an adjacent one.
Reflecting on the rules, however, we realize that we need not be so
specific. First of all, of course, we do not really need to speak of
"pearls": we can speak more generally of (undefined) elements. But
a second, more fundamental, observation is that the labels do not
need to be chosen in any specific order, and they do not need to be
integers: they could be colours, for example, or letters of the
alphabet. What really matters is that

(0) every element has a label that is unique;
(1) every element with some label x (except at most one, called the
    leftmost) has a unique determinable predecessor labelled p(x);
(2) every element with some label x (except at most one, called the
    rightmost) has a unique determinable successor labelled s(x);
(3) whenever an element with label x is not leftmost, x = s(p(x));
(4) whenever an element with label x is not rightmost, x = p(s(x));
(5) for any two distinct elements with labels x and y, there exists a
    positive integer k such that either x = s^k(y) or x = p^k(y).

These rules capture the essential idea of concatenation: each element has
either a unique predecessor or a unique successor, and, except at
the extremes, actually has both. Furthermore, by following
a finite sequence of either successors or predecessors, we can reach
any element with label y from any other element with label x.
Fortified with these observations, then, we boldly state:

Definition 1.1.1 A string is a collection of elements that satisfies
rules 0-5.

A
critical feature of this definition, not as yet discussed, is the
condition, included in rules 1 and 2, that there be at most one
leftmost or rightmost element. For consider what happens when the
clasp is fastened on the original string of pearls, forming a necklace.
Now there is no longer either a "leftmost" or a "rightmost" element
— but that turns out not to be a problem, since rules 0-5 continue to
apply. According to our definition, a necklace is also a string!
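
To make rules 0-5 concrete, the following Python sketch tests them on a finite collection of labelled elements. It is only an illustration under stated assumptions: the function name `satisfies_rules` and the representation of p and s as dictionaries `pred` and `succ` are hypothetical choices rather than notation from the text, labels are assumed never to be `None`, and rule 0 (unique labels) is taken for granted because the labels are held in a set. Infinite strings are outside its scope.

```python
def satisfies_rules(labels, pred, succ):
    """Check rules 1-5 for a finite set of labels.

    pred[x] and succ[x] play the roles of p(x) and s(x); a label missing
    from pred is leftmost, and one missing from succ is rightmost.
    """
    labels = set(labels)
    # Rules 1 and 2: at most one element lacks a predecessor (successor).
    if sum(x not in pred for x in labels) > 1:
        return False
    if sum(x not in succ for x in labels) > 1:
        return False
    for x in labels:
        # Rule 3: x = s(p(x)) whenever x is not leftmost.
        if x in pred and succ.get(pred[x]) != x:
            return False
        # Rule 4: x = p(s(x)) whenever x is not rightmost.
        if x in succ and pred.get(succ[x]) != x:
            return False

    def reachable(step, start):
        """Labels reached from start by applying step one or more times."""
        seen = set()
        cur = step.get(start)
        while cur is not None and cur not in seen:
            seen.add(cur)
            cur = step.get(cur)
        return seen

    # Rule 5: every other label is some s^k(y) or p^k(y) with k > 0.
    return all(
        labels - {y} <= reachable(succ, y) | reachable(pred, y)
        for y in labels
    )


# A linear string of four pearls labelled 1..4.
linear = satisfies_rules({1, 2, 3, 4}, {2: 1, 3: 2, 4: 3}, {1: 2, 2: 3, 3: 4})
# Three pearls with the clasp fastened: a necklace.
necklace = satisfies_rules({1, 2, 3}, {1: 3, 2: 1, 3: 2}, {1: 2, 2: 3, 3: 1})
# A linear string {1, 2} beside a separate necklace {3, 4} breaks rule 5.
split = satisfies_rules({1, 2, 3, 4}, {2: 1, 3: 4, 4: 3}, {1: 2, 3: 4, 4: 3})
print(linear, necklace, split)  # True True False
```

The necklace passes because rules 1 and 2 allow, but do not require, a leftmost or rightmost element.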
Furthermore, suppose that the number of pearls in the original string
were infinite: beginning at the lefthand edge of the table but
stretching away without end toward a forever unseen edge at the
right. We see that this infinite string also is covered by the definition:
it has a leftmost element, but no rightmost one. And, perhaps most
surprising of all, we see that even a string which extends to infinity
in both directions is covered by rules 0-5: in this case there is again
neither a leftmost nor a rightmost element. In this book we will at
various times become interested in all of these different kinds of
strings. To prevent confusion, therefore, we adopt the following
conventions. A string with a finite number of elements including both
a leftmost and a rightmost element will be called a linear string. A
string with a finite but nonzero number of elements and neither a
leftmost nor a rightmost element will be called a necklace (in the
literature also called a circular string). A string with an infinite
number of elements, of which one is leftmost, will be called an
infinite string; while a string with an infinite number of elements, of
which none is either leftmost or rightmost, will be called an infinite
necklace. When the meaning is clear from the context, we will just
use the word "string" to refer to any object satisfying Definition
1.1.1. In practice it will be easy to distinguish between linear strings
and necklaces, because necklaces will normally be defined in terms
of a corresponding linear string x and written C(x); we think of C(x)
as being the necklace formed from x by making its leftmost element
the successor of its rightmost element.

Exercises 1.1

1. Explain why concatenation rule 3 is required. Give an example of a
   mathematical object which satisfies rules 0-2 but not rule 3.
2. Can rule 4 for concatenation be derived from rules 0-3? Explain
   your answer.
3. Explain why concatenation rule 5 is required. Characterize the
   mathematical objects that satisfy rules 0-4 but not rule 5.
4. Does the infinite set {a, a^2, ...} of strings contain an infinite
   string?
5. According to Definition 1.1.1, can a string consist of a single
   element a? Could such a string be a necklace?
6. Is Definition 1.1.1 satisfied by a string with no elements in it (a
   so-called empty string)? Does the above definition of a linear
   string include the empty string? What about the definition of a
   necklace?
7. In view of the preceding exercise, how many different kinds of
   string are included in the classification of strings given in this
   section?
8. Our classification of strings omits the following cases:
   (a) a string with an infinity of elements including a rightmost one
       but no leftmost one;
   (b) a string with a finite number of elements including either a
       leftmost one or a rightmost one, but not both.
   Comment on these omissions.
9. Is there any way to prove that Definition 1.1.1 defines a string?

1.2 Linear Strings

He who has been bitten by a snake fears a piece of string.
— Persian proverb

From the discussion of the preceding section, it becomes clear that
the idea of a string, though a simple one, is also very general. A
string might be

- a word in the English language, whose elements are the upper and
  lower case English letters together with apostrophe (') and hyphen (-);
- a text file, whose elements are the ASCII characters;
- a book written in Chinese, whose elements are Chinese ideographs;
- a computer program, whose elements are certain "separators" (space,
  semicolon, colon, and so on) together with the "words" between
  separators;
- a DNA sequence, perhaps three billion elements long, containing only
  the letters C, G, A and T, standing for the nucleotides cytosine,
  guanine, adenine and thymine, respectively;
- a stream of bits beamed from a space vehicle;
- a list of the lengths of the sides of a convex polygon, whose values
  are drawn from the real numbers.

All of these
examples are instances of what we have called in Section 1.1 a
"linear string". Indeed, most of this book will deal with linear strings,
and so in this section we introduce notation and terminology useful
for talking about them. Much, but not all, of this terminology will
also apply to necklaces, infinite strings, and the empty string. The
examples make clear that an important feature of any string is the
nature of its elements: bits, members of {C, G, A, T}, real numbers,
as the case may be. It is in fact customary to describe a string by
identifying a set of which every element in the string is a member.
This set is called an alphabet, and so naturally its members are
referred to as letters — though, as we have seen, the term "letter"
must be interpreted much more broadly than is usual in English. We
say then that a string is defined on its alphabet. Of course, if a string
x is defined on an alphabet A, then x is also defined on any superset
of A, so an alphabet for x is not unique. A minimum alphabet
for x is just the set of all the distinct elements that actually occur in
x. Sometimes it is convenient to define the alphabet of a string as
the minimum one ("bits"), sometimes as a set that is far from
minimum ("real numbers"). Throughout this book A will denote an
alphabet and α = |A| its cardinality. In the common cases that α is
2, 3 or 4, we say that the alphabet A is binary, ternary or quaternary,
respectively; as we shall see, there are many interesting strings on a
binary alphabet, and a quaternary alphabet is of particular
importance because of applications to the analysis of DNA
sequences. In general, apart from the distinctness of the elements of
A that follows from the set property, we place no other restriction
upon them: the elements of the alphabet may be finite (even zero!)
in number, countably infinite (for example, the integers), or
uncountably infinite (for example, the reals). And the elements of
the alphabet may be totally ordered (so that a comparison of any
distinct pair of them yields the result "less" or "greater"), unordered,
or somewhere in between ("partially ordered"). For many of the
algorithms discussed in this book, it will be sufficient to use an
unordered alphabet; as discussed in Chapter 4, the nature of the
alphabet on which an algorithm operates is very important to the
selection of test data for the algorithm as well as to its
computational efficiency.

For a given alphabet A, let A^+ be the set of
all possible nonempty finite concatenations of the letters of A. Thus,
for example, if A = {a}, then A^+ = {a, a^2, ...}, where we write a^2 for
aa, a^3 for aaa, etc.; and if A = {0,1}, then A^+ consists of all distinct
nonempty finite sequences of bits, and so may be thought of as
including all the nonnegative integers. As suggested in Exercise
1.1.6, it is convenient also to introduce the idea of the empty string,
usually written ε, which we use to define the sets A' = A ∪ {ε} and
A* = A^+ ∪ {ε}.
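
As a small illustration of these sets, the Python sketch below lists the nonempty finite strings on an alphabet up to a chosen length, optionally together with the empty string, and computes the minimum alphabet of a given string. The names `strings_up_to` and `minimum_alphabet` are hypothetical, letters are assumed to be single-character Python strings, and the empty Python string stands in for the empty string. Since the set of nonempty strings is infinite whenever A is nonempty, only a finite part of it can ever be listed.

```python
from itertools import product


def minimum_alphabet(x):
    """Return the set of distinct letters that actually occur in x."""
    return set(x)


def strings_up_to(alphabet, max_len, include_empty=False):
    """Yield the strings on `alphabet` of length at most max_len.

    Strings are produced in order of increasing length; the empty string
    is included only when include_empty is True.
    """
    letters = sorted(alphabet)
    for n in range(0 if include_empty else 1, max_len + 1):
        for combo in product(letters, repeat=n):
            yield "".join(combo)


print(list(strings_up_to({"a"}, 3)))              # ['a', 'aa', 'aaa']
print(list(strings_up_to({"0", "1"}, 2)))         # ['0', '1', '00', '01', '10', '11']
print(list(strings_up_to({"0", "1"}, 1, True)))   # ['', '0', '1']
print(sorted(minimum_alphabet("abaababaabaab")))  # ['a', 'b']
```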

This terminology allows us to express another
definition of linear strings equivalent to that given in the previous
section:

Definition 1.2.1 An element of A^+ is called a linear string on
alphabet A. An element of A* is called a finite string on A.

Thus A^+ is the set of all linear strings on a given alphabet A. Note
that the empty string is not a linear string: after all, it has
neither a leftmost nor a rightmost element!

Throughout this book strings will
consistently be denoted by boldface letters, almost always lower
case: p, t, x, and so on. We will implement the rules 0-5 of Section
1.1 by treating strings as one-dimensional arrays; alternatively, we
might have used a linked-list or some other representation, but
arrays are a simple and natural model, as we have seen with the
string of pearls. Thus for any string, say x, containing n > 0 letters
drawn from an alphabet A, the implicit declaration will be

    x : array [1..n] of A.

In this case we will say that the string has length n = |x|
and positions 1, 2, ..., n. For any integer i ∈ 1..n, the letter in
position i is x[i], so that we may write

    x = x[1]x[2]···x[n],

which we
recognize from Definition 1.1.1 as a concatenation of n strings of
length 1. In fact, in this formulation the position i plays the role of
the label introduced in Definition 1.1.1. Note that the array model
works also for the empty string x = ε, which corresponds to an
empty array and has length 0.

Digression 1.2.1 We have said that
arrays are a "simple and natural" representation of strings, a
statement that obscures a significant computational issue. Certainly
an array is a simple data structure, but whether or not it is natural
depends on assumptions about the mechanisms by which the
elements of the array are accessed. As we have seen, strings are
defined in terms of concatenation, and so it would seem to be
"natural" to access elements using the successor (next) and
predecessor (previous) operations introduced in Section 1.1.
These are mechanisms compatible with a linked-list representation,
where access to an element at position i from the "current" position j
would at least require time proportional to |j - i|. However, an
arbitrary element in a computer array can normally be accessed in
constant time simply by specifying its location i, quite independent of
the array position j most recently visited. An array representation of
strings is thus more powerful than a linked-list representation, and so
arguably not suitable, since it could justify execution time estimates
for algorithms lower than those attainable using list processing. In
practice, algorithms on strings almost always begin either at the left
(position 1) or at the right (position n), and inspect adjacent
(successor or predecessor) positions one by one. On the other hand,
the output of string algorithms often specifies string positions, the
implicit assumption being that the user can access these positions in
constant time. One way

8 Chapter 1. Properties of Strings to reconcile these different models


of string access is to suppose that strings are initially available as
linked-lists — so that their elements are accessible only one-by-one,
either left-to-right or right-to-left — but copied as they are input into
an array. This copying, if it were necessary, would require only 0(n)
time and Q(n) additional space, and so would not affect the
asymptotic complexity of any of the algorithms considered in this
book. We therefore adopt the rather odd convention that strings are
processed as linked-lists on input, but may in some cases be
regarded as arrays on output. We promise to alert the reader if ever
we deviate from this convention. ? Equality between strings is
defined in an obvious way. A string x of length n and a string y of
length m are said to be equal (written x = y) if and only if n = m and
x[i] = y[i] for every i = 1, ..., n. Thus the empty string ε is equal
only to itself. Note further that, by this definition, prepending or
appending ε to a given string x does not change its value; thus we may
if we please write x = εxε.

Corresponding to any pair of integers i and j that satisfy
1 ≤ i ≤ j ≤ n, we may define a substring x[i..j] of x as follows:

    x[i..j] = x[i]x[i+1]···x[j].

We say that x[i..j] occurs at position i of x and that it has length
j - i + 1. If j - i + 1 < n, then x[i..j] is called a proper substring
of x. Two noteworthy kinds of substring are x = x[1..n] of length n,
and x[i] = x[i..i] of length 1. For every pair of integers i and j
such that i > j, we adopt the convention that x[i..j] = ε, a substring
of length zero. As we have already seen, since x = εxε, ε may be
regarded as a substring of any element of A*.

Let k ∈ 1..n be an integer, and consider positions
$i_k = 1, 2, \ldots, n - k + 1$ of x. Each of these n - k + 1
positions represents the starting position of a substring
$x[i_k..i_k + k - 1]$ of length k. Thus every string of length n has
n - k + 1 (not necessarily distinct) substrings of length k.

If u is a substring (respectively, proper substring) of x, then x is
said to be a superstring (respectively, proper superstring) of u. Of
course substrings and superstrings are also strings.

If A is ordered, we may use the substring notation to define a
corresponding induced order on the elements of A* called lexicographic
order, that is, dictionary order. More precisely, suppose we are given
two strings x = x[1..n] and y = y[1..m], where n ≥ 0, m ≥ 0. We say
that x < y (x is lexicographically less than y) if and only if one of
the following (mutually exclusive) conditions holds:

• n < m and x[1..n] = y[1..n] (as we shall see shortly, this is the
  case in which x is a "proper prefix" of y);
• x[1..i - 1] = y[1..i - 1] and x[i] < y[i] for some integer
  i ∈ 1..min{n, m} (this is the case in which there is a first
  position i in which x and y differ).

Then, for example, using the order of the English alphabet:

• ab < abc (because i = 2 = n < 3 = m);
• ε < a (because i = 0 = n < 1 = m);
• ab < aca (because i = 2 and b < c).
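
The two conditions above translate almost directly into code. The
following Python sketch is illustrative only: the function name
lex_less is our own, strings are indexed from 0 as in Python rather
than from 1 as in the text, and letters are compared using Python's
built-in character order, which agrees with the English alphabet for
lower-case letters.

```python
def lex_less(x: str, y: str) -> bool:
    """Return True if x < y in lexicographic order (illustrative helper)."""
    n, m = len(x), len(y)
    # Second condition: look for the first position where x and y differ.
    for i in range(min(n, m)):
        if x[i] != y[i]:
            return x[i] < y[i]
    # No difference in the common part, so x < y only if x is a
    # proper prefix of y (first condition).
    return n < m


print(lex_less("ab", "abc"))   # True: ab is a proper prefix of abc
print(lex_less("", "a"))       # True: the empty string is a proper prefix of a
print(lex_less("ab", "aca"))   # True: first difference at position 2, b < c
print(lex_less("ab", "ab"))    # False: equal strings are not less
```

For ordinary Python strings the built-in comparison x < y gives the
same answers; the explicit loop is shown only to mirror the
definition.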

Observe that this definition is valid also in cases where one or both
of n, m is infinite; that is, also for infinite strings. Based on this
definition, the other order relations (≤, >, ≥) are defined in the
usual way: x ≤ y if and only if x = y or x < y; x > y if and only if
y < x; x ≥ y if and only if y ≤ x.

Writing $x = u_1 u_2 \cdots u_k$ where the $u_i$ are nonempty
substrings, i ∈ 1..k, is called a factorization or decomposition of x
into factors $u_i$ (see Section 1.4 and Chapter 6). Thus a factor is
just a nonempty substring.

There are two special kinds of substring x[i..j] which are of
particular importance, and to which we give special names. For any
integer j ∈ 0..n, we say that x[1..j] is a prefix of x, sometimes
written pref(x); if in fact j < n, we say that x[1..j] is a proper
prefix of x. Similarly, for any integer i ∈ 1..n + 1, we say that
x[i..n] is a suffix of x, written suff(x), and a proper suffix if
i > 1. Note that, in accordance with the identity x = εxε, these
definitions allow us to include the empty string ε as both a proper
prefix and a proper suffix. Thus, for example, the string

    f = abaababaabaab

has prefixes ε, a, ab, aba, ..., f = abaababaabaab and suffixes ε, b,
ab, aab, ..., f. The proper prefixes and suffixes of f are obtained
simply by omitting f itself from these lists.

A concept whose value is not immediately apparent, but which
we will find to be useful in many different contexts, is that of a
"border".

Definition 1.2.2 A border b of x is any proper prefix of x that equals
a suffix of x.

We see that, according to this definition, x always has an empty
border b = ε of length β = 0, but that x itself is not a border of x.
In general, we use the symbol β to denote the length |b| of b. Often
we will be particularly interested in the longest border, denoted by
b* with length β* = |b*|, where 0 ≤ β* ≤ n - 1.

The string f introduced above has two nonempty borders: ab and abaab.
The string g = abaabaab has exactly the same two borders, but observe
that in this case the longest one, abaab, overlaps with itself.
Similarly, the string $a^n$ has borders $a, a^2, \ldots, a^{n-1}$, of
which, for $i = \lceil (n+1)/2 \rceil$, the borders
$a^i, a^{i+1}, \ldots, a^{n-1}$ overlap. As we shall discover
presently, overlapping borders are in fact characteristic of strings
that contain repetitive substrings: in the above example, observe that
g can be written in the form $(aba)^2 ab = (aba)(aba)ab$, thus
representing it as a string in which two occurrences of aba are
followed by a prefix of aba.

We
now apply the idea of a border to generalize this observation and to
derive what we call a "normal form" for a given nonempty string x of
length n. Suppose that a border b and its length β have been computed.
(We shall see in Section 1.3 how to compute every border of x in Θ(n)
time.) By Definition 1.2.2 it must be true that

    $x[1..\beta] = x[n-\beta+1..n]$,    (1.1)

from which we see that the quantity p = n - β ≥ 1 measures the
displacement between positions of x that are required to be equal.
(Observe that the larger the value of β, the smaller the value of p.)
Thus, for every integer i ∈ 1..β, it must be true that

    $x[i] = x[i+p]$.    (1.2)

In particular, if β = 0 (p = n), we see that (1.1) and (1.2) are
trivially true; while if 2β ≥ n (2p ≤ n), then x must contain at least
two equal adjacent substrings x[1..p] and x[p+1..2p]. More precisely,
we see that x consists of $\lfloor n/p \rfloor$ identical substrings,
each of length p, followed by a possibly empty suffix of length
n mod p. Setting r = n/p and letting u = x[1..p], we see that the
values p and r which we have derived from β permit us to express any
string x in the form

    $x = u^{\lfloor r \rfloor} u'$,    (1.3)

where $u' = x[1..n - \lfloor r \rfloor p]$ is a proper prefix
(possibly empty) of u. Alternatively, we can separate r into its
integral and fractional parts by writing
$r = \lfloor r \rfloor + k/p$ for some integer k ∈ 0..p - 1. Then,
interpreting $u^{k/p} = u[1..p]^{k/p}$ to mean simply u[1..k], we find
that (1.3) can be rewritten in the compact form

    $x = u^r$.    (1.4)

We call p a period and r an exponent of x. The prefix u = x[1..p] we
call a generator of x. Note that since every string x has an empty
border b = ε, it therefore has a trivial period p = n, a trivial
exponent r = 1, and a trivial generator x.

Looking over the previous paragraph, we see that what has essentially
been done is to compute a period p = p(β) and a corresponding exponent
r = r(β) as functions of β. It is clear that p is monotone decreasing
and r monotone increasing in β. Therefore with the choice β = β*, p
achieves its minimum value p* and r its maximum r*, the minimum period
and the maximum exponent respectively. Generally, the values p* and r*
will be the ones we are most interested in, and so, when there is no
ambiguity, we will simply refer to p* as the period and r* as the
exponent. Similarly, we refer to u = x[1..p*] as the generator.
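
As a concrete illustration of the derivation above, the following
Python sketch computes the period, the exponent and the generator from
the longest border, and rebuilds x from u and r as in (1.3). The
function names are our own, the longest border is found by a simple
brute-force search (not an efficient method), and the exponent is kept
as an exact fraction; all of these are assumptions made for the
example only.

```python
from fractions import Fraction


def longest_border_length(x: str) -> int:
    """Brute-force length of the longest border of x (illustrative only)."""
    n = len(x)
    for k in range(n - 1, 0, -1):      # try the longest candidates first
        if x[:k] == x[n - k:]:
            return k
    return 0                            # only the empty border exists


def period_exponent_generator(x: str):
    """Return (p*, r*, u) for a nonempty string x, with p* = n - beta*."""
    n = len(x)
    p = n - longest_border_length(x)
    return p, Fraction(n, p), x[:p]


def expand(u: str, r: Fraction) -> str:
    """Rebuild x from u and r: floor(r) copies of u, then the prefix u[1..k]."""
    p = len(u)
    n = int(r * p)                      # n = r * p is an integer by construction
    whole, k = divmod(n, p)
    return u * whole + u[:k]


p, r, u = period_exponent_generator("abaabaab")
print(p, r, u)                          # 3 8/3 aba
print(expand(u, r))                     # abaabaab
```

Using Fraction avoids the rounding that a floating-point exponent
such as 8/3 would introduce when the string is rebuilt.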

Definition 1.2.3 Let p* be the minimum period of x = x[1..n], and let
r* = n/p*, u = x[1..p*]. Then the decomposition

    $x = u^{r^*}$    (1.5)

is called the normal form of x.

The normal form (1.5) leads to a useful and important taxonomy, or
classification system, of strings:

• if r* = 1, we say that x is primitive; otherwise, x is periodic;
• if r* ≥ 2, we say that x is strongly periodic; if 1 < r* < 2, x is
  said to be weakly periodic;
• if r* ≥ 2 is an integer, we say that x is a repetition (or,
  equivalently, that x is repetitive); in the special cases that
  r* = 2 or 3, x is called a square or a cube, respectively.

Thus we see that x must be either primitive or periodic, and, if it is
periodic, then one of strongly periodic or weakly periodic; further,
if x is repetitive, then it must also be strongly periodic. Observe
that r* ≥ 2 if and only if x has a border of length β ≥ n/2. Here are
some examples of these definitions:

• x = aaabaabab is primitive (p* = n);
• f = abaababaabaab = (abaababa)(abaab) is weakly periodic with period
  p* = 8, exponent r* = 13/8 and generator abaababa;
• g = abaabaab = $(aba)^2 ab$ is strongly periodic with period p* = 3,
  exponent r* = 8/3 and generator aba;
• x = $(ab)^4$ is repetitive with period p* = 2, exponent r* = 4 and
  generator ab;
• x = $(abcabd)^2$ is a square with period p* = 6 and generator
  abcabd.

We remark that the normal form (1.5) is actually a kind of "intrinsic"
pattern, in the sense of Part II of this book: every string x has the
pattern called a "normal form" that can be used to assign it its place
in a taxonomy of strings. We remark further that the simple ideas
introduced in this section (e.g. border, primitive, period) will recur
again and again in our discussions of various string algorithms.

Exercises 1.2

1. Try to reconcile the definitions of linear string given in
   Sections 1.1 and 1.2; that is, to prove that a linear string
   according to one definition is necessarily a linear string
   according to the other.
2. Suppose that an alphabet A contains exactly α letters. Given some
   nonnegative integer n, how many elements of A* have length n? How
   many have length at most n?
3. It was remarked above that for A = {0,1}, A+ may be thought of as
   including all the nonnegative integers. How many times is each
   integer included?
4. Based on the definition of equality in strings, is it true that
   $\varepsilon = \varepsilon^2$? Justify your answer.
5. Using the definitions of equality and lexicographic order, prove
   that for arbitrary strings x and y on an ordered alphabet, x = y if
   and only if x ≮ y and y ≮ x. In particular, demonstrate that this
   result holds in the case x = y = ε, and thus show that ε is the
   unique lexicographically least element of A*.
6. Based on the usual ordering of the English lower-case letters,
   arrange the following strings in increasing lexicographical order:
   abbaε, abbba, abb, abc, a, $\varepsilon^2$, ab, aba, εb.
7. Prove that the operator < as we have defined it satisfies
   transitivity; that is, that x < y and y < z ⇒ x < z.
8. Give an independent definition of the order relation >, then use it
   together with the definition of < given in the text to show that
   x > y if and only if y < x.
9. Given a string x of length n, find the length of the following
   substrings of x, and state the conditions on i and k for which your
   answer is valid:
   (a) x[i..i + k - 1];
   (b) x[i - k + 1..i];
   (c) x[i + 1..k - 1];
   (d) εx[1..k].
10. What is the maximum number of distinct substrings that there can
    possibly be in a string x of length n? Give an example of a string
    that attains this maximum. Then characterize the set of strings of
    length n that attains it.
11. In the preceding exercise there is an unstated assumption that the
    alphabet size should be regarded as unbounded. Vaiyi Sandor
    suggests that the question becomes more interesting (and much more
    difficult) if the size α of the alphabet is finite and fixed. What
    progress can you make with this apparently unsolved research
    problem?
    Hint: Perhaps a good place to start is with the following
    question: for given positive integers α = |A| and k, what is the
    longest string on alphabet A that contains no substring of length
    k more than once?
12. Describe an algorithm that computes all the distinct substrings in
    x. Establish your algorithm's correctness and asymptotic
    complexity (try to achieve $\Theta(n^2)$).
13. If y is a nonempty string of length m and k is a positive integer,
    determine |x| as a function of m and k in each of the following
    cases: (a) x = $y^k$; (b) x = yM; (c) x = yM\
14. What is the length of the string
    x -= {ab)nu[ab)n-ln ¦ ¦ • (abJa[abW?
15. Are the ideas of prefix, suffix and border defined for the empty
    string ε?
16. Determine the longest border, the period, and the exponent of each
    of the following strings, and hence classify each one as
    primitive, strongly periodic, weakly periodic, or repetitive:
    (a) abaababaabaababaababu: (b) abcabacabcbacbcacb;
    (c) abcabdubcabdabcabri; Cd)
17. A string x of length n is said to be a palindrome if it reads
    backwards the same as forwards; more precisely, if for every
    $i = 1, 2, \ldots, \lfloor n/2 \rfloor$, x[i] = x[n - i + 1].
    Jamie Simpson believes every border of a palindrome is itself a
    palindrome. Prove him right or wrong.
    Remark: To inspect some nontrivial palindromes, you could consult
    the following URLs:
    www.growndodo.com/wordplay/palindrcmes/dogseesada.html
    complex.gmu.edu/neural/personnel/ernie/witty/palindromes.html
18. Show that period and exponent are "well-defined"; in other words,
    whenever $x = u^k u'$ for some string u, some positive integer k,
    and a proper prefix u′ of u, that there exists a unique
    corresponding border.
19. Show that if $u^{r^*}$ is the normal form of x, then u is not a
    repetition. (This fact becomes important in Section 2.3 when we
    consider the encoding of the repetitions in a string.)
20. Consider $(ab)^{3/2} = aba$. What is $((ab)^{3/2})^2$? Is it
    $(aba)^2$? Or is it $(ab)^{(3/2) \cdot 2} = (ab)^3$?
21. Show that no string has two distinct primitive generators.
22. Can you find a way to compute the number of strings on {a, b} that
    are of length n and primitive?

    Hint: Observe that for odd n, an arbitrary string x[1..n] can be
    formed from strings x[1..(n - 1)/2] and x[(n + 1)/2..n], and for
    even n from strings x[1..n/2] and x[n/2 + 1..n]. If on due
    reflection you still have trouble with this exercise, consult
    [GO81a, GO81b].
23. The taxonomy of strings given above is expressed in terms of
    values r*. Re-express it in terms of values of the longest border
    β*.
24. Observe that what we have called the "normal form" of x should
    really more properly be called the "left" normal form. That is,
    instead of writing x in the form $u^r u'$, where u′ is a prefix of
    u, we might just as well have written $x = v' v^s$, where v′ is a
    suffix of v.
    (a) Derive the right normal form of x.
    (b) Show that, in the taxonomy of strings, the classification of x
        according to right normal form is the same as it is according
        to left normal form.
25. Can a necklace be a substring of a linear string? Can a linear
    string be a substring of a necklace?
26. Draw a labelled tree which represents the taxonomy of strings
    introduced in this section.

1.3 Periodicity

    Be not careless in deeds, nor confused in words, nor rambling in
    thought.
    - Marcus Aurelius ANTONINUS (121-180), Meditations VIII

As we shall see in subsequent chapters, the maximum border of a string
x turns out to be a very useful quantity in numerous contexts, not
only as a means of classifying strings. Therefore, in this section we
take the time to show in detail how to compute the maximum border;
indeed, we show how to compute the maximum border of every nonempty
prefix of x, and to do so in Θ(n) time. We then go on to prove an
important lemma that describes a fundamental periodicity property of
strings.

Of course there is an obvious way to compute the length β* of the
maximum border b* of x. Begin by setting β* ← 0. Then compare x[1]
with x[n]; if equal, set β* ← 1. Next compare x[1..2] with
x[n - 1..n]; if equal, set β* ← 2. Proceed in this way, adding one to
the length of the prefixes and suffixes compared at each stage, until
finally x[1..n - 1] is compared with x[2..n], with β* set to n - 1 in
case of equality. The final value of β* is then the length of the
longest border b* of x. We do not present this algorithm formally
because it is inefficient: we leave as Exercise 1.3.1 the exact
determination of its asymptotic complexity.
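
For readers who would like to experiment, the obvious method just
described might be sketched in Python as follows. This is an informal
illustration rather than a formal presentation of the algorithm: the
function name is our own, strings are indexed from 0, and each slice
comparison may itself inspect up to k letters.

```python
def longest_border_naive(x: str) -> int:
    """Length of the longest border of x by direct comparison (sketch)."""
    n = len(x)
    beta_star = 0
    # Compare the prefix and the suffix of each length k = 1, 2, ..., n - 1,
    # remembering the largest k for which they are equal.
    for k in range(1, n):
        if x[:k] == x[n - k:]:
            beta_star = k
    return beta_star


print(longest_border_naive("abaababaabaab"))   # 5 (the border abaab)
print(longest_border_naive("abaabaab"))        # 5 (abaab, overlapping itself)
print(longest_border_naive("aaabaabab"))       # 0 (only the empty border)
```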

Imagine that successive values of β* are stored in an array (or
string!) β[1..n]; then for i = 1, 2, ..., n, β[i] gives the length of
the longest border of x[1..i]. We call β[1..n] the border array of
x[1..n]. Then β[n] is the length of the longest border of x, the
quantity required to compute the normal form of x. We make the
following observations:

• β[1] = 0 (since ε is the longest border of x[1..1]);
• for 2 ≤ i ≤ n, if x[1..i] has a border of length k > 0, then
  x[1..i - 1] has a border of length k - 1; thus, in particular, for
  1 ≤ i ≤ n - 1, β[i + 1] ≤ β[i] + 1;
• for 1 ≤ i ≤ n - 1, β[i + 1] = β[i] + 1 if and only if
  x[i + 1] = x[β[i] + 1] (since β[i] + 1 is the position of x
  immediately to the right of the prefix x[1..β[i]] which is the
  longest border of x[1..i]);
• if b is a border of x, and b′ is a border of b, then b′ is a border
  of x.

These observations, particularly the third and fourth, suggest that it
may be possible to compute β[i + 1] from β[1], β[2], ..., β[i]. The
chief difficulty evidently arises when β[i] > 0 but
x[i + 1] ≠ x[β[i] + 1]: in this case, it is necessary to look at the
second-longest border of x[1..i]. If this border is empty, then
β[i + 1] = 1 (if x[1] = x[i + 1]) or 0 (otherwise). If this border is
not empty, then, denoting its length by $\beta^2[i] \ge 1$, we will
need to compare x[i + 1] with $x[\beta^2[i] + 1]$: if equal, then
$\beta[i+1] \leftarrow \beta^2[i] + 1$; if not, then we go on to
consider the third-longest border, of length $\beta^3[i]$, of x[1..i].
And so on, until finally for some k, $\beta^k[i] = 0$.

The argument made here will be easier to follow in terms of an
example. Suppose that x = abaababa*, where the * indicates that the
9th letter is not as yet known. The border array of x is then
00112323?, where the ? indicates that the 9th entry remains to be
computed. Observe that, for i = 8, β[i] = 3. If in fact it turns out
that x[9] = x[i + 1] = x[β[i] + 1] = x[4] = a, then we immediately
have β[9] = 4. If not, however, then we must consider the second
border of x[1..8]; that is, $\beta^2[8] = \beta[\beta[8]] = \beta[3]
= 1$. If $x[8 + 1] = x[\beta^2[8] + 1] = x[2] = b$, we have β[9] = 2.
But if x[9] is neither a nor b, and instead turns out to be a new
letter c, we conclude that β[9] = 0.

In general, consider the quantities $\beta^j[i]$, j = 1, 2, ..., k,
representing the jth-longest borders of x[1..i]. (Here we take
$\beta^1[i] = \beta[i]$ and $\beta^k[i] = 0$.) Since for every
j = 2, 3, ..., k, $\beta^j[i] < \beta^{j-1}[i]$, it follows that
$x[1..\beta^j[i]]$ is a border of $x[1..\beta^{j-1}[i]]$. That is, the
jth-longest border of x[1..i] is a border of the (j - 1)th-longest
border of x[1..i], and, in particular, it is the longest border of the
(j - 1)th-longest border of x[1..i]! In symbols,

    $\beta^j[i] = \beta[\beta^{j-1}[i]]$.    (1.6)

If $x[i+1] \ne x[\beta^{j-1}[i] + 1]$, then we can determine the next
position to be tested against x[i + 1] by computing $\beta^j[i]$
according to (1.6), so that the next test would compare x[i + 1]
against $x[\beta^j[i] + 1]$. It will be useful to summarize these
observations in a lemma:

Lemma 1.3.1 For some integer n ≥ 1, let x = x[1..n] be a string with
border array β = β[1..n]. Let k be the least integer such that
$\beta^k[n] = 0$. Then
(a) for every integer j ∈ 1..k, x[1..n] has a border
    $x[1..\beta^j[n]]$;
(b) for any choice of letter λ, the only possible borders of
    x[1..n + 1] = x[1..n]λ are those whose lengths are members of the
    descending sequence

    $\langle \beta[n] + 1, \beta^2[n] + 1, \ldots, \beta^k[n] + 1, 0 \rangle$.    (1.7)
□

Equation (1.6) leads to an efficient algorithm for the computation of
the array β[1..n] which contains the lengths of the longest borders of
each of the prefixes x[1..i], i = 1, 2, ..., n, of the given string x.
This is presented as Algorithm 1.3.1. As we shall see in Chapter 7,
all of these lengths, not only β[n], are potentially useful when x is
a specific pattern to be matched against some other string: they are
used to determine how far to shift the pattern along the other string
in case of a mismatch or "failure". For this reason the computation of
β[1..n] has sometimes been called the failure function algorithm
[AHU74].

Algorithm 1.3.1 (Border Array)

- Compute the border array β of x[1..n]

    β[1] ← 0
    for i ← 1 to n - 1 do
        b ← β[i]
        while b > 0 and x[i + 1] ≠ x[b + 1] do
            b ← β[b]
        if x[i + 1] = x[b + 1] then
            β[i + 1] ← b + 1
        else
            β[i + 1] ← 0

Theorem 1.3.2 Algorithm 1.3.1 correctly computes β[1..n].

Proof First consider the while loop: this loop handles the computation
(1.6), replacing $b = \beta^{j-1}[i]$ by $b = \beta^j[i]$. Thus b is
monotone decreasing and so the while loop must terminate. Since
the for loop terminates when i = n, it follows that Algorithm 1.3.1
terminates after n - 1 steps.

Let us consider the while loop further. Exit from this loop occurs if
either b = 0 or x[i + 1] = x[b + 1], or both. If x[i + 1] = x[b + 1],
we should set β[i + 1] ← b + 1 regardless of whether or not b = 0;
otherwise, it is clear that we must set β[i + 1] ← 0. Thus the if
structure in the algorithm deals correctly with the result of the
while loop: it selects the greatest possible length from the sequence
(1.7). We conclude that Algorithm 1.3.1 does indeed correctly compute
β[i + 1] for every i ∈ 0..n - 1. □

Now we consider the time required by the algorithm. All the steps
within the for loop, except possibly the while loop, require only
constant time. Then Algorithm 1.3.1 requires Θ(n) time plus the total
time used within the while loop. To estimate this total time, consider
the values that b assumes during the execution of the algorithm: b is
initially zero, and can be increased in value only by one, and only by
the assignment β[i + 1] ← b + 1 at the end of an iteration for i,
followed by the assignment b ← β[i + 1] at the beginning of the
iteration for i + 1. Thus b can be incremented by one at most n - 2
times, and each such incrementation uses up one iteration of the for
loop. At the same time, the only way that b can be decreased is in the
while loop: since the number of decrements (always by at least one)
cannot be greater than the number of increments (always by exactly
one), it follows that the assignment statement b ← β[b] within this
loop can be executed no more than n - 2 times in total. Hence

Theorem 1.3.3 Algorithm 1.3.1 requires Θ(n) time and constant
additional space for its execution. □

Digression 1.3.1 This is
the first time of many in this book that we have occasion to discuss
the asymptotic space and time complexity of an algorithm. There are a
couple of important points to be made about the assumptions that
underlie the analysis leading to Theorem 1.3.3:

Consider the integer values i and b of Algorithm 1.3.1, as well as
those stored in the border array β: each of these values may be as
large as n - 1, and so for each one $\lceil \log_2 n \rceil$ bits need
to be reserved for storage. Thus, strictly speaking, the storage
requirement, therefore the processing requirement, for an element of β
is actually $\log_2 n / w \in O(\log n)$, where w is the word length
of the computer. This is not constant at all!
Based on this analysis, the space required for i and b is O(log n),
while that for β is O(n log n), and the overall processing time
becomes O(n log n) rather than Θ(n). We resolve this difficulty here,
and throughout this book, by making the assumption that the size of
the problem considered is not gigantic, that it is subject to some
reasonable bound. Thus, if n is the problem size, we are supposing
that there always exists a (small) constant k such that
$\log_2 n / w \le k$. Indeed, in any practical context, we will not go
far wrong if we assume that k = 1: in a modern computer, w ≥ 32, and
so $\log_2 n / w \le 1$ provided
$n \le 2^{32} \approx 4.3 \times 10^{9}$. If we are dissatisfied with
this limitation, we may instead luxuriate in the supposition that
k = 2, hence that $n \le 2^{64} \approx 1.8 \times 10^{19}$, certainly
large enough to deal with any normal person's string processing
requirements. In fact, my pocket calculator tells me that just to scan
$1.8 \times 10^{19}$ computer words (string elements) at a rate of one
billion per second will require 570 years: it is not we, but rather
our distant descendants, who will finally get to the end of such a
string! That being said, I am still not unsympathetic to the purist
who multiplies occurrences of n in the time and space complexities
quoted in this book by a log n factor. In some sense the factor should
"really" be there: it is only in deference to ordinary practical
considerations that it is omitted.

The second point to be made relates to the sets O, Ω, and Θ used
throughout this book to characterize the asymptotic time and space
requirements of algorithms. We suppose that the reader is familiar
with the definitions of these sets; if not, the seminal reference is
[K76], and [RS92] may also be found useful. But the story does not end
there. When O(n), for example, is used to describe a property of an
algorithm, an intellectual leap is made that requires justification.
If we say

    execution time ∈ O(n),

we treat the execution time of the algorithm as if it were a function
of the size n of the problem instance, when of course it is no such
thing; it is instead a function of the problem instance itself. As a
rule there will be a large number, in fact usually an infinite number,
of problem instances corresponding to any given problem size n.
Nevertheless, the approach adopted in this book will be to treat both
execution space and execution time as if they were functions of
problem size. Thus, for example, to say that

    execution time ∈ Θ(n)    (1.8)
will mean that, over all problem instances whose size exceeds some
fixed value $n_0$, the algorithm requires time at least $k_1 n$ and at
most $k_2 n$, where $k_1$ and $k_2$ are also fixed. By contrast, the
statement

    execution space ∈ O(n)

makes the considerably weaker assertion that, again over all problem
instances of size greater than $n_0$, the algorithm requires space at
most $k_2 n$. Thus, in this case, it is possible that there may exist
problem instances whose space requirement is proportional to a
quantity, perhaps log n or $\sqrt{n}$ or n/log n or even 1, that is
asymptotically less than n; in fact, it is even possible that all
problem instances have space requirement proportional to such a
quantity.

Throughout this book, we will use Θ to describe algorithmic properties
only when the strong condition illustrated in (1.8) is satisfied.
Usually, when we use O(f(n)) (respectively, Ω(f(n))), we will mean
that there do exist collections of problem instances for which the
property is in fact exactly of order f(n), but that also there exist
other collections of problem instances for which the property is
asymptotically less than (respectively, greater than) order f(n).

We have seen that Algorithm 1.3.1 requires Θ(n) time for its execution
over all problem instances; since any algorithm that computed the
border array would need to access each of n positions at least once,
we are therefore justified in describing Algorithm 1.3.1 as
asymptotically optimal: no algorithm could compute β[1..n] in less
than Θ(n) time and constant space for any problem instance of size n.

As discussed in Section 4.1, Algorithm 1.3.1 is in a certain sense an
on-line algorithm, yielding at each step a result for the current
position i as the given string is processed from left to right. Thus,
without backtracking, the algorithm yields a result, not only for the
given string, but also for every prefix of it. However, in a stricter
sense, Algorithm 1.3.1 would not be described as on-line: since any
position i′ < i in x may need to be visited in order to assist in the
calculation for i, the entire string needs to be available for
processing at every step of the calculation.

As an example of a border calculation, suppose that the string

         1  2  3  4  5  6  7  8  9 10 11 12 13
    f =  a  b  a  a  b  a  b  a  a  b  a  a  b,

introduced in Section 1.2, is given. Then the border array β[1..n] of
f is 0011232345645. Observe how, in a string such as f with many
repetitive substrings, the values in the border array may fall back
and then rise again. For example, β[6..8] = 323 and β[11..13] = 645.

The way in which the border array β of a given string x is calculated
makes it clear that all of the borders of x (indeed, of every prefix
of x) can be computed from β. This follows from the observation stated
in (1.6) that the second-longest border of x[1..i] is the longest
border of x[1..β[i]]. We state this important result as a theorem:

Theorem 1.3.4 The border array β of any string x contains all the
information required to compute all borders, periods and exponents of
any nonempty prefix of x. □

Digression 1.3.2 The border array is the first of several string data
structures that report in linear space information that may in fact be
supralinear. To see this, consider the example string f. The borders
reported by β[1..n] in this case are as follows:

     i    border
    13    5, 2, 0
    12    4, 1, 0
    11    6, 3, 1, 0
    10    5, 2, 0

The point being made here is that the user must agree to accept the
array β as a compact representation of the information provided
perhaps more conveniently or more explicitly in the above table. If
indeed he or she were to insist that we produce the table, then as
much as Θ(n log n) processing time could be required, and so the
border array computation would no longer be linear. □

We conclude this section with a result that identifies an important
arithmetic relationship between distinct periods of a string. This
result is a consequence of, inter alia, the properties of a border
array.

Theorem 1.3.5 ("The Periodicity Lemma" [FW65, LS62]) Let p and q be
two periods of x = x[1..n], and let d = gcd(p, q). If p + q ≤ n + d,
then d is also a period of x.
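
Before turning to the proof, it may help to see the statement checked
mechanically on small cases. The Python sketch below is only an
illustration under our own assumptions: border_array is a 0-indexed
transcription of Algorithm 1.3.1, periods reads every period of x off
the border array by following the border chain, and lemma_holds tests
the condition p + q ≤ n + d for every pair of periods. All three
function names are hypothetical, and an exhaustive check on short
strings is of course no substitute for the proof.

```python
from itertools import product
from math import gcd


def border_array(x: str) -> list[int]:
    """0-indexed transcription of Algorithm 1.3.1: beta[i] is the length
    of the longest border of the prefix x[0..i]."""
    n = len(x)
    beta = [0] * n
    for i in range(1, n):
        b = beta[i - 1]
        while b > 0 and x[i] != x[b]:
            b = beta[b - 1]
        beta[i] = b + 1 if x[i] == x[b] else 0
    return beta


def periods(x: str) -> list[int]:
    """All periods of a nonempty string x in increasing order, one per
    border (period = n - border length), ending with the trivial period n."""
    n = len(x)
    beta = border_array(x)
    result = []
    b = beta[n - 1]
    while b > 0:
        result.append(n - b)
        b = beta[b - 1]
    result.append(n)
    return result


def lemma_holds(x: str) -> bool:
    """Check the Periodicity Lemma for every pair of periods of x."""
    n, ps = len(x), periods(x)
    for p in ps:
        for q in ps:
            d = gcd(p, q)
            if p + q <= n + d and d not in ps:
                return False
    return True


print(periods("abaababaabaab"))      # [8, 11, 13]
print(all(lemma_holds("".join(t))
          for n in range(1, 11)
          for t in product("ab", repeat=n)))   # True
```

For the string f the periods 8 and 11 give d = 1, but
8 + 11 > 13 + 1, so the lemma makes no claim, which is consistent with
1 not being a period of f.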
Proof The proof of this theorem is interesting. It depends on reducing
the conditions as they are stated based on parameters (d, p, q, n) to
an equivalent set of conditions based on parameters
(d, p, q - p, n - p). This reduction mimics the iterative reduction
contained in the Euclidean algorithm [K73a] for the calculation of
gcd(p, q), and in fact the validity of the proof depends upon the
correctness of the Euclidean algorithm. Recall that for q > p this
algorithm computes d based on the reduction

    $d = \gcd(p, q) = \gcd(p, q - p)$,    (1.9)

terminating after a finite number of steps with d = gcd(d, d).

Let H(d, p, q, n) denote the hypothesis of the theorem as it is
stated; that is, that p and q are periods of x[1..n] with
d = gcd(p, q) and p + q ≤ n + d. Without loss of generality we suppose
that q > p. (If q = p the result holds trivially.) Hence
H(d, p, q - p, n - p)
Periodicity 21 demotes the hypothesis that p and q - p are periods of
x[l..n - p] with d = gcd(p, q — p) andj q < n — p + d. We shall
show that H(d,p,q,n)=>H(d,p,q-p,n-p), thus allowing us to replace
the hypothesis related to aj[l..n] by an analogous hypothesis related
to the shorter string x[l..n - p]. We imagine this reduction being
carried out a finite number of times until a hypothesis H(d, d, d, n')
is reached for which it is trivially true that d is a period of aj[l..n'].
We then show that d is a period of x[l..n — p] if and only if d is a
period of x; since d is a period of aj[l..n'], it therefore follows that d
is a period of x. We now suppose that H(d, p, q, n) holds. Let E\ = n
- p and /32 = n - q be the lengths of the borders 61 and 62
corresponding to p and q, respectively. Then as we have seen in
Lemma 1.3.1, 62 is necessarily a border of 61; in other words,
jc[1../3i] = x[l..n — p] has border aj[l../32], hence period
Furthermore, note that since d = gcd(p, q), it must therefore be true
that d < q — p. Then by hypothesis P < q — d <n — p, A.10) so
that, since x has period p, x[l..n — p] must also have period p. Thus
we have so far shown that x[l..n — p] has periods p and q — p. By
A.9) we know that d = gcd(p, q — p), since p + q<n + d if and only
if and we obs d see that H(d,p, g, n) implies H(d,p, q — p,n—p). t
remains to show that d is a period of x if and only if it is a period of
x[l..n — p]. Clearly if d is a period of x, it is also a period of aj[l..n -
p). To establish the converse, erve that if d is a period of a;[l..n —
p], it is also by A.10) a period of aj[l..p]; and since must divide p, it
follows that a repetition unless in fact d = p. Thus, recalling that
x[l..n—p] is a border of x, we find thai d is a period of as 1 ..p] =
x[l..d]p/d, x[l..d]p/dx[l..n -p]= x[l..p]x\p = x, quired. Thus the
reduced string x[l..n—p] has d as a period if and only if d is a period
of the original string x. It remains to remark that, by the correctness
of the Euclidean algorithm, we arrive after a finite number of such
reductions at the reduced hypothesis H(d, d, d,n'), for some integer
n' e d..n — p, where d is trivially a period of aj[l..n'], hence of x. ?
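The statement of Theorem 1.3.5 can also be checked mechanically on
small cases. The Python sketch below enumerates every string over a
two-letter alphabet up to a chosen length and confirms that whenever
periods p and q satisfy p + q <= n + gcd(p, q), the value gcd(p, q) is
again a period. The helper names (periods, lemma_holds, check_all), the
alphabet and the length bound are assumptions made for illustration; an
exhaustive check of this kind is supporting evidence only and does not
replace the proof.

```python
from itertools import product
from math import gcd


def periods(x: str) -> list:
    """All periods p, 1 <= p <= n, of x: x[i] == x[i + p] wherever both exist."""
    n = len(x)
    return [p for p in range(1, n + 1)
            if all(x[i] == x[i + p] for i in range(n - p))]


def lemma_holds(x: str) -> bool:
    """Check the Periodicity Lemma for every pair of periods of x."""
    n = len(x)
    ps = set(periods(x))
    return all(gcd(p, q) in ps
               for p in ps for q in ps
               if p + q <= n + gcd(p, q))


def check_all(max_len: int = 10, alphabet: str = "ab") -> bool:
    """Exhaustively test all strings over `alphabet` of length 1..max_len."""
    return all(lemma_holds("".join(t))
               for n in range(1, max_len + 1)
               for t in product(alphabet, repeat=n))


if __name__ == "__main__":
    print(check_all())        # True
    print(periods("abaaba"))  # [3, 5, 6]
```

For the string abaaba the periods are 3, 5 and 6. Here 3 + 5 = 8
exceeds n + gcd(3, 5) = 7 and 1 is not a period, so the bound in the
hypothesis cannot be weakened by even one.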

In Section 1.4 we shall see one of the many applications of this
result.

As an example of the Periodicity Lemma, consider the string
$x = abaaabaaabaaabaaab$ of length $n = 18$ and periods $q = 12$,
$p = 8$: since $d = \gcd(p, q) = 4$ and $p + q = 20 < n + d = 22$, we
conclude that $d = 4$ is also a period of $x$. To see that sometimes the
conclusion of the Periodicity Lemma holds even though the conditions for
it are not satisfied, consider the related example
$y = abaaabaaabaaab$ of length $n = 14$, also with periods $q = 12$ and
$p = 8$: even though in this case $p + q = 20 > 18 = n + d$,
nevertheless $d = 4$ is a period of $y$. However, to see that the
condition $p + q \le n + d$ is required in Theorem 1.3.5, consider the
example $z = abaaba$ of length $n = 6$ with periods $q = 5$ and
$p = 3$: in this case $d = \gcd(3, 5) = 1$ is not a period of $z$, and
$p + q = 8 = n + d + 1$.

The Periodicity Lemma has recently been extended to apply to strings
with three periods [CMR99].

Exercises 1.3

1. Determine the asymptotic complexity of the "obvious" algorithm for
   finding the longest border of a given string $x$. Give an example of
   a string of arbitrary length $n$ which gives rise to the algorithm's
   worst-case behaviour. In the worst case, exactly how many letter
   comparisons are required between the individual elements of $x$?
2. Since our objective is to compute the longest border of $x$, we might
   well have thought a little more about the problem in order to come up
   with a somewhat more efficient algorithm that tested possible borders
   in descending rather than ascending order of length. Present such an
   algorithm and determine its asymptotic complexity. In which cases is
   its time requirement the same as that of the obvious algorithm? In
   which cases does it execute in time linear in the string length?
3. If $\beta$ is thought of as a string, on what alphabet is it defined?
4. Prove the statement made in the text that if $b$ is a border of $x$,
   and $b'$ is a border of $b$, then $b'$ is a border of $x$.
5. Prove the following lemma (due to Jiandong Jiang): If $x = bx'b$,
   where $b$ is primitive, $x'$ is either empty or primitive, and
   $x' \ne b$, then $b$ is the only nonempty border of $x$.
6. We have regarded Lemma 1.3.1 as proved even though it has not been
   clearly demonstrated that there exists no border of $x[1..n]$ of
   length other than $\beta^j[n]$, $j = 1, 2, \ldots, k$. Complete the
   proof.
7. Software engineers seek to apply formal methods to establish the
   correctness of software. In particular, for loops they define the
   following:
   • A loop invariant is a Boolean condition
“I am going southward for a few days, to visit two or three places
further down the coast. When I come back I shall call at the Abbey to
see you: will you make me welcome for an hour?”
“Indeed I would if I might, if I could,” said she mournfully. “But I
don’t feel that I am the real mistress there; there are Crispin and his
wife.”
Her friend frowned and spoke with kindly impatience.
“I can’t bear to think of your having to put up with the
companionship and protection of those people! I shall find out your
guardian—you must have some guardian, and get him to send you
back to the convent, at least for a little while, since that seems to be
your ideal of happiness.”
“My ideal of happiness!” echoed Freda wonderingly.
“Yes, you said so the other day at the ‘Barley Mow.’ ”
“Did I!” said the girl, blushing.
“Yes, you did. Now, I suppose, it is something else.”
She hung her head.
“Some young fellow has been talking to you!”
Freda gave him a glance of terror. How horribly shrewd he was, to
touch at once upon a kind of secret she hardly knew herself yet! She
would admit nothing, yet she was afraid to be silent. He might
blunder upon some other sensitive truth if she did not speak. So she
evaded the point.
“You seem here in England,” she began proudly, “to think that
there is only one subject which can interest a girl!”
“Quite true. Everywhere else it is the same. There is only one. I
don’t want to force your confidence, but I know that you stayed at
Oldcastle Farm on the night of the journey.”
It seemed to Freda that an expression of disappointment crossed
Mr. Thurley’s face when she made no answer to this, and the next
moment he seemed suddenly in a great hurry to be off. Shaking her
hand heartily in both his, he uttered a number of good wishes, and
questions about her welfare with a bluff sincerity of interest which
touched her. She watched him as he went down the steep
churchyard without one look behind him, and the tears came into her
eyes as she felt that here was a friend, none the less real for being a
new acquaintance, going away.
Freda felt almost like a prisoner coming of his own accord back to
the confinement from which he had escaped, as she pulled the
lodge-bell and passed through the iron gates. Mrs. Bean, who was
probably on the lookout, heard the loud clang, and was ready to
open the inner gate. She did not seem in very good humour.
“You have been a long time talking with your gentleman friend,”
she said coldly. “I didn’t know those were convent manners, to
encourage every man who chooses to cast sheep’s-eyes at one!”
Poor Freda entered the dining-room thoroughly heart-sick and
disgusted. Why did they say those coarse things to her, and about
people she liked too! She felt so miserable that, instead of trying to
eat, she sat down on the hearth-rug and cried, with her head on a
chair.
Presently Crispin looked in at the window, and coming round to the
door of the room, opened it and peeped in.
“What’s the matter?” asked he.
Freda sprang from the floor, but refused to give any other
explanation than that she was tired, and had stood talking in the
churchyard.
“Talking! Who to?”
“To the gentleman who was kind to me in the train. Mrs. Bean
doesn’t seem to think it was right of me to talk to him; but he was
very kind.”
Crispin said nothing to this, but persuaded her to eat her dinner,
waiting upon her himself. When she had finished, and he was
making up the fire for her, she suddenly addressed him.
“Crispin,” she said, “I want to ask you a question. There is a thing
which some people call Free Trade, and other people call smuggling.
Which do you call it?”
Crispin, who was holding the poker in his hand, stopped short in
his work, and remained for a few seconds quite still, without looking
at her. Then he answered in a very quiet manner, and went on
making up the fire.
“Smuggling, of course. And, what did your friend of the journey call
it?”
He suddenly turned as he spoke, and under the piercing gaze
which he directed upon her, Freda fancied that all her little girlish
fancies and secrets were laid bare to his eyes.
“He called it smuggling too,” she answered.
“And what was his name?”
Freda hesitated. Such a hard, disagreeable tone seemed suddenly
to be heard in Crispin’s voice. He repeated the question.
“His name is John Thurley.”
Without asking her any more questions, seeming, in fact, to
become suddenly unconscious of her presence, Crispin abruptly left
her to herself.
CHAPTER XVI.
Freda Mulgrave had come face to face with the most difficult
problem of conduct she had ever encountered. There was now no
shirking the fact that her father was the organiser and head of a
band of men who carried on smuggling in a systematic and
determined manner. It was evident too that, if occasion came, they
were quite as ready for still guiltier exploits as their fore-runners of a
by-gone time. Whether, as she feared with a sickly horror, it was her
father who had shot Blewitt, or whether the servant had been
murdered by some one else, it was clear that his death was
connected with the nefarious enterprises in which the whole country-
side seemed to be so deeply engaged. She passed a miserable
night, awake for a great part of the time, fancying she heard in the
many night-noises of the old house, voices and footsteps, cries and
even blows.
Next morning she wrote a long letter to Sister Agnes, saying that
she had been left alone in a position of great difficulty, and asking for
the prayers of all her old friends at the convent that she might do
what was right.
Mrs. Bean, who came in while she was directing the envelope,
offered to take it to the post, and Freda, with a reluctance of which
she felt ashamed, gave it into her keeping.
Then for ten days the poor child lived on the daily hope and
expectation of an answer.
During all that time she never once saw Crispin, and although she
two or three times tried to break through the ice of Nell’s reticence,
she always failed. For blank, deaf, impervious stolidity, and an
ignorance of everything outside her kitchen which approached the
admirable, Nell could never have had an equal. Crispin was away on
business. This was the most Freda could learn from her.
So the dull days passed, the wished-for letter never coming. For
the first two days the snow remained thick on the ground, and when
it began to melt the roads were in such a bad state that it was still
impossible for Freda to go out. Nell unlocked the library and made a
fire there. And in this old room, with its quaintly moulded ceiling, its
rows upon rows of musty-smelling books, its dust and its cobwebs,
the young girl passed her time, diving for the most part in records of
the county, of ancient priory and dismantled castle. Her flesh would
creep and her breath come fast as she read of lawless deeds in the
time past, and thought that even while she read, acts just as illegal, if
not as daring, might be taking place under the very roof which
sheltered her.
At the end of the ten days, however, it seemed to Freda one
morning that the patches of green on the snow-covered fields had
grown much wider; and she said, first to herself and then to Nell, that
the roads, if not yet clear, must now be passable to and from the
town. Mrs. Bean looked at her out of the corners of her eyes.
“What you, coming from a walled-up convent, can want with walks,
is more than I can understand. However, you can go over the ruins if
you like.”
And Nell unlocked a side-door in the wall of the garden which
admitted her into the meadow in which the Abbey-church stood.
“You’ll be safe there,” said Nell, half to herself, as Freda passed
through. “You can’t do any worse harm than getting your feet wet,
and that’s your own fault.”
“Safe! Of course I shall be safe!” laughed Freda.
But it occurred to her, as she turned and noted Nell’s furtive glance
at her, that it was not with her personal safety that the housekeeper
was concerned.
Freda cared little for this; she was half-crazy with the joy of being
again by herself in the open air; and the ruins of the old church, as
they rose above her in their worn majesty against the morning sky,
filled her with delight and awe. She was approaching the old pile
from the southwest, the quarter in which least of the building
remained. Scarcely a trace was left of the south aisle or the south
transept. Between the ruined west front and the pillars on the south
side of the choir there was nothing left but grass-grown mounds of
fallen masonry and one solitary pillar, massive and erect as when,
seven hundred years ago, pious hands placed the stones which
were to defy, through long centuries, the biting sea air, the keen
north wind, the storms which beat upon the cliffs, and the waves
which, decade by decade, had sapped and swallowed up, bit by bit,
the once fertile Abbey lands. Nearer to the cliff’s edge now than in its
prime, the dismantled church still filled one of its old offices, and
formed, with its lofty choir and mouldering pinnacles, a landmark
from the sea.
Freda began to cry as she stole reverently into the roofless choir.
She had had no opportunity, in her secluded life, of visiting ruins as
showplaces; to her this was still a church, as holy as when the
monks kept watch before the altar. A sentiment of peace entered into
her for the first time since her arrival in England as she wandered
about, not heeding the fall of melting snow on her head and
shoulders, and listened to the shriek of the sea-birds as they
wheeled in the air above. She thought she had never seen anything
so beautiful as the graceful succession of pointed arches, with their
clustered shafts, and the triforium above, with the long-hidden
beauties of its carving now exposed to the light of day. Time had
mellowed the tint of the walls to a soft grey, deepening here and
there into red. Crowned kings, winged angels, stern-faced saints still
looked out to sea from the north side, with eager necks outstretched,
all the deep meaning the old monkish sculptors knew how to express
in stone still to be discerned in their weatherworn outlines. The gulls
perched upon them; in summer the wallflowers grew about them; but
still they kept watch and ward until, one by one, by storm and stress
of weather they were loosened in their places, and fell, sentinels who
had done their work, into the long grass underneath.
The north transept was still almost entire. An arcade ran round the
lower part of the wall, and in one of the arches was an old pointed
wooden door, leading by a circular staircase of steep steps, to the
passages in the walls above. This door was locked. Yet it must still
be used, thought Freda. For she noticed that the grass was worn
away before it, and that a narrow track had been beaten thence as
far as one of the windows on the north side of the nave. Here a gap
had evidently been intentionally made in the stone, and looking
through, Freda perceived that the foot-track went through the
meadow outside as far as the stone wall which bordered the road.
As she was looking at this path, she caught sight of two young
men on horseback whom, little as she could see of them above the
stone wall, she at once recognised. They were Robert and Richard
Heritage. Both saw her, raised their hats, and reined in their horses.
Freda pretended not to see them, yet she was conscious of a
great uplifting of the heart when they dismounted, tied their horses
up in the yard of a dismantled cottage at the other side of the road,
and climbing over the stone wall with the agility of cats, came along
the foot-path towards her.
“They have used that foot-path before,” thought Freda.
CHAPTER XVII.
To Freda’s perhaps rather prejudiced mind, the contrast between the
two cousins seemed even stronger than when she had seen them a
fortnight before at their own home. The fact that both were evidently
harassed and anxious only emphasised the difference between
them; for while Robert looked savage and sullen even under the
smile with which he approached her, Dick seemed to Freda’s shy
eyes to look haggard, downcast and depressed to an extent which
sent a pang through her heart.
Robert came first, cracking his riding-whip and singing, and
assuming a jauntiness belied by the expression of his face. He
raised his hat again as he came through the ruined window, and
greeted Freda with much deference. He made a feint of holding out
his hand, but the young lady took no notice of it.
“I am afraid,” began he, in a deprecating tone, “that our
acquaintance did not begin in the most auspicious possible manner,
Miss Mulgrave.”
“No, and I did not expect to see you again.”
Freda was far too unsophisticated to be otherwise than cruelly
direct of speech. Robert Heritage, however, was not easily
disconcerted.
“But if the reason of my daring to appear before you again is to
make my peace in the humblest manner?”
“There is no need to be humble to me. You said so the last time I
saw you.”
“Pray forget everything I said then, and let us begin afresh. I had
had a good deal of worry that day, and I spoke to you under a
misapprehension.”
“I would rather have you remain under it, and not speak to me
again.”
“You are very unforgiving.”
Freda hung her head. They used to tell her that at the convent. It
was true too, she felt. She had never been able to humble herself to
docile obedience—to the doctrine of forgiveness of enemies. Nothing
could be wrong in those she loved, nothing right in those she did not
love. And she did not love Robert Heritage. Guiltily, therefore, she
said, after a minute’s pause:
“I will hear what you have to say.”
Robert made a grimace to his cousin, to imply that this
insignificant little girl was giving herself great airs. As for Dick, Freda
had steadily avoided meeting his eyes, and he stood in the
background, silently watching the flying sea-mews, without taking
any active part in this interview.
“In the first place,” said Robert, still with a great show of
deference, “I came—my cousin and I came, to express our regrets at
your sad bereavement, at your father’s death, in fact.”
He looked at her rather curiously. Freda blushed.
“Thank you,” she said hurriedly.
“Yes,” he went on slowly, “we were very much shocked to hear
about it, and very much surprised too. For I was just coming over
here to inquire if Captain Mulgrave could tell me what had become of
a servant of mine, a man you saw at our house, Miss Mulgrave;
Blewitt, I dare say you remember him?”
“Yes, I do,” answered Freda, who had grown very pale.
“I sent him over here with a letter, a message to your father. From
that day to this he has never been seen, and we have been unable
to get any tidings of him. In the meantime comes the news of
Captain Mulgrave’s having committed suicide. Under the
circumstances, your father being known as a violent man, and the
message being an unwelcome one, it was impossible to help
thinking that the two events might have some connection with each
other.”
“Well,” said Freda slowly, “but as both Blewitt and my father are—
gone, I don’t see how the truth is ever to be found now; unless,
indeed, the person who knows most about it should confess.”
Robert’s face flushed a little.
“I am afraid it will be difficult to clear your father’s name from
suspicion. Already I’ve heard these ugly rumors whispered about
everywhere. Nothing would set them at rest, unless I were to say
that I myself had sent Blewitt away to his home in London.”
“That would not be true.”
“But it would save your father’s reputation.”
Freda said nothing. Her mistrust of this man made her shrewd.
After a long pause she turned and looked straight into his face.
“Why do you tell me this?”
“I wanted to know whether you would care to have your father’s
name cleared.”
“Not in such a way as that. I believe the best thing for my poor
father would be for the whole truth to come out, and though the
falsehood might seem to protect his name for the time, it would do
less real good than quietly waiting.”
“Then you wouldn’t do me any little favour, out of gratitude if I tried
to shield his name?”
“Little favour! Oh! and what is that?”
“For instance, you wouldn’t get Crispin Bean to deal with us
instead of with Josiah Kemm?”
“No!” flashed out the girl, “neither with you, nor Kemm, nor
anybody else. The Abbey’s mine now, and I won’t have it used for
smuggling, Mr. Heritage.”
Robert started violently, and his hand shook as he played with his
riding-whip.
“You are ready to accuse your own father of doing wrong then?”
“I don’t make any accusations, Mr. Heritage. I only tell you that the
Abbey is under my rule, now.”
“You think so, perhaps; but you will find yourself mistaken. The
trade will go on just the same whatever orders you may give; and it
will make no difference if I have to go away, and if my cousin Dick,
who brought you in out of the snow and was so good to you, has to
starve.”
Freda moved uneasily and shot a furtive glance at Dick, who was
outside the old walls, apparently absorbed in unpleasant thoughts.
Robert perceived the expression on the girl’s face, its coy pity and
maidenly fear. This vein, so happily struck, would bear a little further
working, he thought.
“Yes,” he went on. “Poor Dick! It has always been his lot to have a
rough time of it. When he told me this morning of the impression you
had made upon him, and asked me to put in a word for him with you
if I got a chance, I knew it would be of no use. Not that he isn’t a
good-looking, good-hearted fellow enough, but because he is Dick,
and never has any luck!”
The girl’s face underwent many changes as she listened to this
speech. Compassion, surprise, pleasure, confusion, annoyance—all
flitted over her ingenuous countenance, until at the end, suddenly
perceiving that Robert’s small light eyes were fixed upon her with
great intentness, she blushed and turned away from him even
haughtily.
“I do not believe that he asked you to speak to me!” she said.
“You don’t? Well, I’ll fetch him and make him speak for himself.”
“No, no, no,” cried the girl, crimson with confusion and distress. “I
am going indoors. I—I am tired, cold. Good-morning, Mr. Heritage.”
While Freda was crossing the meadow which lay between the ruin
and the Abbey-house, she saw Nell at an upper window, watching
her with an uneasy expression of face; by the time she reached the
side-door, the housekeeper was there to admit her.
“Who was that I saw you talking to up there in the ruins?” asked
Nell sharply. “Come, I know, for I saw you.”
“Why do you ask me then?”
“After all the trouble I’ve taken too, to prevent those young rascals
getting at you! Why, they’ve been pulling the bell nearly off every day
and sometimes twice a day.”
“Oh, they’ve been to see me before then?”
“Yes, at least Bob Heritage has, and everybody knows what a nice
acquaintance he is for a young girl! But they won’t see any more of
you, if I can help it. A pretty mess I should get myself into if they did!”
Freda passed into the house and, without waiting for another
word, went straight into the library, which was in the west wing, away
from the rest of the inhabited part. The fire was burning very low, and
the room looked cold, dusty and forlorn. A great pile of the books
with which she had been amusing herself the night before still lay
undisturbed on the hearth-rug. The books had almost become living
friends to her, in the absence of sympathetic human beings. She
threw herself down beside them and rested her arms on a stack of
calf-bound histories and biographies.
What had Robert Heritage meant by those words about the
“impression” she had made on Dick, and “putting in a good word for
him”? Innocent as she was, Freda could scarcely misunderstand the
drift of these expressions, and they roused a thought which brought
the blood to her cheeks, all alone as she was, and stirred her
strangely. She did not believe Robert; who was she, a little lame girl,
to rouse any deep interest in a big, strong, handsome man like Dick?
And with a sigh, the girl sat up among her books and tried to stir the
log fire into a blaze.
As she did so, a loud knocking on the wall behind her made her
look round. The whole of the side of the room from which the sound
came was filled with book-shelves from floor to ceiling. The knocking
went on, until suddenly Freda saw some of the books begin to shake
in a surprising manner, and a minute later six rows of books began to
move slowly forward, and then a face peered out from behind them.
It was that of Dick Heritage. Then she perceived that the books
which he had appeared to disturb were sham ones, mere leather
backs pasted on a door introduced among the genuine ones.
“How did you come in?” asked Freda in a husky whisper.
“By a way you don’t know of,” answered the young fellow, looking
at his riding-whip.
“You came in to see me?” asked Freda in a softer tone.
“Yes,” said Dick, suddenly standing erect, speaking in a full, firm
voice, and looking straight up at the dusty ceiling with flashing blue
eyes, “I came to see you, to speak to you about what that rascal Bob
said. He told you something about me, didn’t he? He made up some
ridiculous nonsense that I’d said about you?”
Freda, with her little head bending lower and lower, nodded an
affirmative very slowly.
“Well, there wasn’t a word of truth in it. I never said anything of the
sort. He only said it to serve his own interests. I was obliged to come
and tell you the moment he confessed to me what he’d done. I didn’t
wish you to think me a fool or a knave.”
Freda did not answer. When at last, after a long pause, Dick
glanced at her, he perceived that she was quietly crying. Dick looked
closer, in surprise and consternation.
“You’re not crying, are you?” asked he uneasily.
Freda shook her head. Rising from her chair, she picked up an
armful of the books that were scattered about the floor, and carrying
them back to the shelves, began to replace them very deliberately.
Dick, putting down his whip, followed with another load, which she
took from him so hastily and awkwardly that they all dropped on the
floor.
“I hope it’s not anything in what I said, or the way I said it, that
made you cry?”
He had gone down on one knee to pick up the fallen books, and
he looked up into her face with an expression which seemed to
Freda most touching.
“I am not crying, Mr. Heritage,” she said, trying to be very dignified;
“and I quite understand that you were not so foolish as to say that I
had made a pleasant impression on you.”
Dick dropped the books, and looked up at her with curiosity,
compassion, and a little admiration. For although her eyes and nose
were red with crying, she looked rather pretty as well as very pitiful.
“Oh,” he said, laughing with some embarrassment, “it’s not fair to
put it like that now, is it?”
“That is all that your cousin said to me about you.”
“No! Really? He told me that he said, implied rather, that I was
making up to you, wanted you to marry me, in fact.”
Freda blushed crimson.
“He never said anything like that to me,” she said, “if he had, I
should have known it was not true.”
Dick sprang up eagerly.
“Yes, you would, wouldn’t you? You would have known it was
impossible such an idea should enter my head!”
Freda turned away and very quietly re-arranged some of the
books she had placed on the shelves.
“Oh, yes;” and she laughed with some bitterness but more
sadness. “Did you think it possible that I, who am lame, and fit for
nothing but a convent, where I can pray, and can work with my
needle as well as the strong ones, should ever put myself on an
equality with the girls who can dance, and ride, and row?”
Dick was overwhelmed. In her innocence, as she had
misunderstood his cousin, so she was misunderstanding him.
“Now look here, Miss Mulgrave,” said he, as he brought his right
hand heavily down on one of the bookshelves. “You are quite wrong.
You have mistaken Bob’s meaning and mine altogether. Don’t you
see that what he wanted was to get some sort of hold on you
through me, since he couldn’t get it in any other way? And can’t you
understand how mean it would be of me, and absurd (mean if I had
any chance, and absurd as I haven’t) to come to you and talk about
admiration and love and marriage, when I am just in the position of a
farm-labourer about to be turned off?”
“What do you mean?”
“Why, that your father’s refusal to—to have anything more to do
with us has ruined us; so that Bob and my aunt will have to leave the
farm and go to London.”
“And you, what will you do?”
“I shall stay on at the old place.”
“But, you won’t be comfortable!”
“More comfortable than I should be anywhere else. You see I’m
not like the others, who just came to the old place when they had to
let the Hall. I was brought up at the farm, and used to spend my
holidays there. I was only annexed by my aunt and Bob when there
was some dirty work to be done and it was seen that I might prove
useful.”
Dick’s voice was so sweet and he spoke so very quietly that it was
not until some minutes after he had finished this short autobiography
that Freda perceived all the bitterness he had expressed in it.
“Oh!” she sighed out at last, in a voice full of soft reproach. “How
could you?”
Dick laughed a little.
“I don’t think I could make you understand. You are too good. I
wish none of this business had ever come to your ears.”
Freda looked thoughtful for a few minutes. Then she said:
“I don’t wish that. You see I’ve been obliged to think a great deal
lately, and I see that there is a great deal more wickedness and
unhappiness in the world than we in the convent ever thought of.
And it seems to me that to shut oneself up out of it all and to try to
make a little heaven for oneself and to keep apart from all the
difficulties and miseries outside is selfish. So that I’m glad I can’t be
so selfish any longer.”
“Now I don’t quite agree with you. By coming out you only add to
the general sum of misery in the world by one more miserable unit;
where’s the advantage to your fellow-creatures of that?”
“But I don’t intend to be miserable. I am going to try to bring some
of the convent’s happiness and peace to the people outside, or at
least to—some of them.”
“I should like to know how you propose to set about it.”
“First, I am going to try to persuade—some people to give up
doing what is wrong. I am going to try to persuade you.”
“To give up——”
“Well, Free Trade.”
“And make a virtue of necessity? You see, it has given me up.”
“Did you like—doing that?”
“Smuggling? You called it smuggling this morning, and now that it
has nothing more to do with me, I don’t mind if I give it the same
name. I was first mixed up with it when I was seventeen, before the
age when one grows either a beard or a conscience, and I can’t
honestly say that I felt anything but enjoyment of the excitement.”
“Your cousin led you into it?”
“Well, I suppose so. Somebody else led him.”
Her face fell.
“I know—my father.”
“And it went on for a long time, and one got used to the risk and
took that as a set-off against the wrong. And after all, we were only
carrying out with logical thoroughness the blessed theory of Free
Trade, of which we are told we ought as a nation to be so proud. It
has ruined us small land-owners, by making it impossible to cultivate
the land remuneratively. Who can blame us then if we try to get
compensation by taking a hair out of the tail of the dog that bit us?”
“I can’t argue with you, because I don’t know enough. But I
suppose the laws are on the whole good and just, and it is right to
obey them. It must be bad for people to live always under the feeling
that they have to hide something.”
“Why, what bad effect has it had upon me? Have you found me
such a very redoubtable ruffian?”
“Oh, no! Oh, no; you have been very good and kind.”
“Well, certainly I have wished nothing but good to you. I came with
Bob this morning only to see that he didn’t bully you, and if in any
way I could help you or get you away out of this place, I would. Is
that rough brute Crispin kind to you?”
“Yes, and no. He is very strange. Sometimes he is harsh and hard
and so disagreeable I scarcely dare speak to him, and then at other
times he will be almost tender.”
“He hasn’t got tipsy yet, and frightened you?”
“Tipsy! Oh, no!” cried Freda half in alarm and half in indignation. “I
don’t believe he would. I am sure he wouldn’t,” she added warmly.
“You speak as if you were quite fond of him,” said Dick, surprised
and laughing.
“So I am, rather. Somehow I can’t help thinking he is fond of me. It
is very strange.”
“I don’t think so. I don’t think it strange that any one seeing a good
deal of you should get fond of you. Well,” he added after a pause,
during which they both reddened and looked rather embarrassed,
“and have you tried yet to convert Crispin to your views upon
smuggling?”
“Crispin! Oh, no, I should be afraid.”
“I see, you respect him more than you do me. You think he may
smuggle from conscientious conviction? For I may tell you that he is
the right hand in all these enterprises, so that they can go on as well
without the Captain as with him, if only Crispin is there.”
“I know that.”
She paused a moment and then went on: “I haven’t seen him the
last few days. When I do I have something to say to him which will
stop his smuggling too, I think.”
“Why, what’s that?”
Freda raised her finger in sign of caution, not without a little air of
importance.
“There is a man about here sent by government to look after the
smuggling: I’m going to tell him that.”
Dick’s face changed, and became full of excitement and interest.
“Why, how came you to hear of such a thing? Are you sure of it?”
“Quite sure. I have seen him, talked with him. He is a great friend
of mine.”
“Then if he is, I warn you most solemnly to tell him not to interfere
with these men, nor to let them know what he’s up to. They’re an
awfully rough lot, these fellows. Only the Captain, and Crispin Bean,
who’s been captain of the yacht so long, can manage them.”
“The yacht!” cried Freda. “Why, that is used for the smuggling
then!”
“Oh, I don’t know that,” answered Dick hastily. “But, but—if you
don’t want to hear of any more mysterious deaths and
disappearances in the neighbourhood, remember to warn your
friend. Now I must go; good-bye.”
He held out his hand abruptly, but withdrew it with a shy laugh
before Freda could take it.
“Perhaps you would rather not shake hands with such a rascal.”
“Oh,” said Freda naïvely, as she held out both hers, “that doesn’t
matter. For all the men I know seem to be rascals.”
Dick laughed, but did not seem to like this observation. He drew
himself up a little, and a variety of emotions seemed to chase each
other across his face.
“I’m glad my poor mother isn’t alive to hear me called that,” he said
in a low voice.
Freda ran up to him, but stopped herself shyly as she was going to
take his hand.
“You used the word first, and I didn’t mean it seriously,” she
whispered, in great distress. “You could not think me so ungrateful.”
“Oh, I didn’t mean to put on airs and pretend to be insulted. But
perhaps I am not so bad as you think. At any rate, if I do wrong,
there’s a comfort in knowing I get punished for it.”
CHAPTER XVIII.
Dick disappeared through the door by which he had entered so
quickly that before Freda had had time to utter more than an
exclamation, the rows of real books and sham ones were again
unbroken, and the noise of a drawn bolt told her that it was of no use
for her to try to follow him.
She sat down again in a tumult of agitated feelings. Her heart felt
drawn out to this young fellow with what she thought must be
gratitude for his kindness. She looked with vivid interest at the
various spots in the room on which he had stood, and tried to
imagine his figure in them again. She even crossed to the
bookshelves, and laid her hand on the place where his had lain, and
touched again the books which he had handed to her. She felt so
sorry for him, so sure that in his share of the wicked enterprise of his
cousin and her father, Dick had been little more than a victim. And
then these musings gave place to more serious thoughts. She had
two duties to perform; one was to tell Crispin that there really was a
government emissary on the look out, the other was to warn John
Thurley not to betray himself. This latter duty was, however, clearly
impossible for her to fulfil without the aid of accident; but the former
might be easier.
Now during all this time that Crispin had kept himself invisible to
her eyes, the night-noises which had alarmed Freda so much at first
had been continued regularly, with only this difference: that although
she had crept out to watch the panel-door in the gallery, no one had
passed through it, and no one had been visible in the courtyard. It
seemed clear then, to the girl, that there must be, as Dick had said,
some entrance to the house which she did not know of. To ascertain
this beyond a doubt, she laid an ingenious plan, and night having by
this time fallen, she proceeded to carry it out. For if, she said to
herself, she could once find the door by which the nocturnal exits
and entrances were made, she would not only be able to waylay
Crispin as he came in or went out, but would have a very important
weapon in her hand by this knowledge.
Freda had seen, in a corner of Mrs. Bean’s wash-house, a heap of
silver sand. Watching her opportunity, she filled her skirt with this,
slipped out, and making a careful tour of the house, stables, and
outbuildings, she put two narrow lines of the sand before every door,
including that by which Crispin had once carried her into the house.
The snow had by this time melted or been swept away from the
neighbourhood of all buildings, and in such places the ground had
dried sufficiently for her purpose. To do her work the more
thoroughly, she then went the round of the outer walls of the garden
and enclosures, and repeated her sand-strewing before every door
she found, and before the iron entrance gates. Then she crept back
into the house, feeling pretty sure that she had been unseen in the
moonless night, and went to bed, tired but full of excitement.
She was too restless to sleep, so presently she got up again, put
on her dressing-gown, and waited eagerly for a repetition of the
usual sounds. She was soon satisfied: first the distant mutterings, far
underneath her feet, then the mounting of slow feet up stone steps;
the voices subdued, but nearer; the moving of heavy burdens; a
sound of weights falling; the chink of glasses; a low murmur of talk in
men’s voices, the sounds gradually dying away. That was all. An
hour, by the little clock on her mantelpiece, from the first sound to the
last. Then all was quiet till morning, when Freda, after a disturbed
night of short snatches of sleep, woke with a start to the memory of
her undertaking. Ah! She had got them now! In an hour she would
know all about it; she would be able to waylay and confront them, if
she chose. And she almost thought she would choose.
Full of these ideas, Freda dressed hastily and ran downstairs. Nell
was busy in the kitchen; the place was as deserted as usual. She
stole out of the house with a loudly beating heart, feeling refreshed
instead of chilled by the air of the keen March morning. Stealthily,
with one eye on Nell’s quarters and one on her task, she began her
tour, her excitement increasing as door after door was reached, and
there was still no sign.
At last the tour was made, the inspection ended, in bitter
disappointment.
For the sand before every door was undisturbed.
