Boyer Moore Algorithm: Idan Szpektor
Boyer and Moore
What It's About
A String Matching Algorithm
T: aaaaaaaaaaaaaaaaaaaaaaaaa
P: abaaaa
The Good Suffix Rule (GSR)
1 2 3 4 5 6 7 8 9 10 11 12 13
P: b b a b b a a b b c a b b
L: 0 0 0 0 0 0 0 0 0 0 9 2 12
Preprocessing the GSR l(i)
P: b b a b b a a b b c a b b
l: 2 2 2 2 2 2 2 2 2 2 2 1
Using L(i) and l(i) in GSR
s: b b a c d c b b a a b b c d d
Z: 1 0 0 0 0 3 1 0 0 2 1 0 0 0
s: d d c b b a a b b c d c a b b
N: 0 0 0 1 2 0 0 1 3 0 0 0 0 1
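The two tables above are mirror images of each other: N for a string equals Z for the reversed string, read backwards. A minimal sketch of that relationship follows. The function names z_naive and n_values are made up for illustration, indices are 0-based (the slides count from 1), and Z is computed here with a simple quadratic loop rather than the linear-time method.

```python
def z_naive(s):
    """z[i] = length of the longest substring starting at i that matches
    a prefix of s. z[0] is left at 0 (the slides start at position 2)."""
    n = len(s)
    z = [0] * n
    for i in range(1, n):
        k = 0
        while i + k < n and s[k] == s[i + k]:
            k += 1
        z[i] = k
    return z


def n_values(s):
    """N[j] = length of the longest suffix of s[:j+1] that is also a
    suffix of s, obtained from Z on the reversed string.
    The last entry corresponds to the unused z[0] and is 0."""
    z_rev = z_naive(s[::-1])
    return z_rev[::-1]


if __name__ == "__main__":
    print(z_naive("bbacdcbbaabbcdd")[1:])
    print(n_values("ddcbbaabbcdcabb")[:-1])
```

Run on the two example strings, the sketch reproduces the Z and N rows shown above.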
Building L(i) in O(n)
L(i): the largest index j < n such that the suffix
P[i..n] is a suffix of the prefix P[1..j], but the
longer suffix P[i-1..n] is not
for i := 1 to n, L(i) := 0
for j := 1 to n-1
    i := n - N(j) + 1
    L(i) := j
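The loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' own code: the name build_big_l is hypothetical, the N values are passed in as a 1-based list (index 0 unused), and a guard is added for N(j) = 0, where the formula would give i = n + 1, a position outside P.

```python
def build_big_l(N, n):
    """Build L(1..n) from N(1..n-1) in one pass.

    N and the returned list are 1-based; index 0 is unused.
    """
    L = [0] * (n + 1)
    for j in range(1, n):
        if N[j] > 0:              # assumption: skip N(j) = 0, i would be n + 1
            i = n - N[j] + 1
            L[i] = j              # later (larger) j overwrites earlier ones
    return L


if __name__ == "__main__":
    # Tiny example: P = "abab", N(1..3) = 0, 2, 0 (N(4) is not used).
    print(build_big_l([0, 0, 2, 0, 4], 4)[1:])
```

Because j only increases, each L(i) ends up holding the largest j that maps to it, which is what the definition asks for.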
Building l(i) in O(n)
l(i): the length of the longest suffix of P[i..n]
that is also a prefix of P
k := 0
for j := 1 to n-1
    if N(j) = j then k := j
    l(n - j + 1) := k
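A possible Python rendering of this loop is shown below. The name build_small_l is hypothetical, and N is again assumed to be supplied as a 1-based list with index 0 unused. The loop fills l(n), l(n-1), ..., l(2); l(1) is never assigned.

```python
def build_small_l(N, n):
    """Build l(2..n) from N(1..n-1) in one pass.

    N(j) == j means the prefix P[1..j] is also a suffix of P, so k
    tracks the longest such prefix of length at most j.
    """
    l = [0] * (n + 1)
    k = 0
    for j in range(1, n):
        if N[j] == j:
            k = j
        l[n - j + 1] = k
    return l


if __name__ == "__main__":
    # N(1..12) for P = bbabbaabbcabb (n = 13); index 0 and N(13) are unused here.
    N = [0, 1, 2, 0, 1, 3, 0, 0, 1, 3, 0, 0, 1, 13]
    print(build_small_l(N, 13)[2:])
```

For the pattern bbabbaabbcabb this prints eleven 2s followed by a 1, matching the l row shown in the preprocessing table.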
Building Z in O(n)
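One common way to build Z in linear time keeps track of the rightmost window of S known to match a prefix of S and reuses earlier Z values inside it. The sketch below is an assumption about what the slide illustrated, not a transcription of it; the name z_array and the 0-based indexing are choices made here.

```python
def z_array(s):
    """z[i] = length of the longest substring of s starting at i that
    matches a prefix of s (0-based, z[0] left at 0)."""
    n = len(s)
    z = [0] * n
    left = right = 0  # s[left:right] is the rightmost match with a prefix
    for i in range(1, n):
        if i < right:
            # Inside the window: copy the mirrored value, capped at the
            # window's right edge.
            z[i] = min(right - i, z[i - left])
        # Extend the match by explicit comparisons past what is known.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


if __name__ == "__main__":
    print(z_array("bbacdcbbaabbcdd")[1:])
```

Every successful comparison in the inner loop moves the right edge of the window forward, and the edge never moves back, which is where the O(n) bound comes from.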
1. Properties of strings
2. Proof of search in O(m) if P is not in T, using
only the good suffix rule.
3. Proof of search in O(m) even if P is in T,
adding the Galil rule.
Properties of Strings
If for two strings δ, γ: δγ = γδ, then there is a
string β such that δ = β^i and γ = β^j, i, j > 0
- Proof by induction
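The property can be checked on small examples. The sketch below uses the standard fact that two commuting strings are both powers of their common prefix whose length is the gcd of their lengths. The function name common_root and the variable names u and v are illustrative only.

```python
from math import gcd


def common_root(u, v):
    """If u and v are non-empty and u + v == v + u, return (root, i, j)
    with u == root * i and v == root * j. Otherwise return None."""
    if not u or not v or u + v != v + u:
        return None
    g = gcd(len(u), len(v))
    root = u[:g]
    i, j = len(u) // g, len(v) // g
    # Sanity check of the claimed decomposition.
    assert root * i == u and root * j == v
    return root, i, j


if __name__ == "__main__":
    print(common_root("abab", "ababab"))
    print(common_root("ab", "ba"))
```

The root returned here is not necessarily the shortest possible one; it is just one string that satisfies the property.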
Properties of Strings (Cont)
Proof - when P is Not Found in T
Σ s_i ≤ m
Σ f_i = m
We want to prove that g_i ≤ 3s_i (⇒ Σ g_i ≤ 3m).
Proof (Cont)
In each round that does not find P, the algorithm
matched a substring t_i and hit one bad character
x_i in T (x_i t_i is a substring of T)
T: bbacdcbaabcbbabdbabcaabcbcb
P: bdbabc
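A single round can be sketched as a right-to-left comparison that reports the matched substring and the bad character. The function name compare_round and its end parameter (the 0-based position in T under the last character of P) are assumptions made for this illustration.

```python
def compare_round(T, P, end):
    """Align P so that its last character sits under T[end] and compare
    right to left. Returns (t, x): the matched substring of T and the
    bad character, or x = None if all of P matched.
    Assumes len(P) - 1 <= end < len(T)."""
    n = len(P)
    k = 0  # number of characters matched so far
    while k < n and P[n - 1 - k] == T[end - k]:
        k += 1
    t = T[end - k + 1:end + 1]
    x = T[end - k] if k < n else None
    return t, x


if __name__ == "__main__":
    T = "bbacdcbaabcbbabdbabcaabcbcb"
    P = "bdbabc"
    print(compare_round(T, P, 10))  # ('abc', 'a')
    print(compare_round(T, P, 5))   # ('c', 'd')
```

With P ending under position 10 of T, three characters match before the comparison fails, so the round contributes a matched substring of length 3 plus one bad character.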
Proof: by Lemma 1.
Lemma 3 (|t_i| + 1 > 3s_i)
Suppose P overlapped t_i during round i. We
examine the ways in which P could have overlapped
t_i in previous rounds.
|t_i| + 1 ≤ 3s_i
Σ (matches in round i) ≤ Σ 3s_i ≤ 3m