Semester 1, 2006-2007
Pattern Matching
Dr. Andrew Davison
WiG Lab (teachers room), CoE
ad@fivedots.coe.psu.ac.th
Updated by:
Dr. Rinaldi Munir,
Informatika – STEI
Institut Teknologi Bandung
[Figure: text T = a b a c a a b with pattern P = a b a c a b shown at two alignments.]
P: “main”
T: a n d r e w
P: r e w
Donald Ervin Knuth (born January 10, 1938) is a computer scientist and Professor
Emeritus at Stanford University. He is the author of the seminal multi-volume work
The Art of Computer Programming.[3] Knuth has been called the "father" of the
analysis of algorithms. He contributed to the development of the rigorous analysis of
the computational complexity of algorithms and systematized formal mathematical
techniques for it. In the process he also popularized the asymptotic notation.
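[Figure: T and P aligned at a mismatch, with labels j = 6 and jnew = 3.]
[Figure: T = b a c b a b a b a a b c b a, P = a b a b a c a, with labels s' and k.]
Pq: a b a b a
Pk: a b a
The longest proper prefix of Pq that is also a suffix of P5 is 'aba', so b[5] = 3.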
240-301 Comp. Eng. Lab III (Software), Pattern Matching
KMP Border Function
KMP preprocesses the pattern to find matches of
prefixes of the pattern with the pattern itself.
j = mismatch position in P[]
k = position before the mismatch (k = j-1).
The border function b(k) is defined as the size of
the largest prefix of P[1..k] that is also a suffix of
P[1..k].
It is also known as the failure function (abbreviated: fail).
b(5) means
– find the size of the largest prefix of P[1..5] that
is also a suffix of P[1..5]
– find the size of the largest prefix of "abaab" that
is also a suffix of "baab"
– find the size of "ab"
=2
j      1  2  3  4  5  6  7  8  9  10
P[j]   a  b  a  b  a  b  a  b  c  a
b[j]   0  0  1  2  3  4  5  6  0  1
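The table above can be reproduced with a short routine. The following is a minimal sketch, not the slides' own code: the class name BorderFunctionSketch and the method name computeBorder are illustrative assumptions, and the array is 0-based, so the slide's b(k) corresponds to b[k-1] here.

```java
// Sketch only: names are illustrative and indices are 0-based,
// whereas the slides number pattern positions from 1.
class BorderFunctionSketch {

    // b[k] = length of the longest proper prefix of pattern[0..k]
    // that is also a suffix of pattern[0..k].
    static int[] computeBorder(String pattern) {
        int m = pattern.length();
        int[] b = new int[m];
        int j = 0; // length of the border being extended
        for (int i = 1; i < m; i++) {
            // On a mismatch, fall back to the next shorter border.
            while (j > 0 && pattern.charAt(i) != pattern.charAt(j)) {
                j = b[j - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            b[i] = j;
        }
        return b;
    }

    public static void main(String[] args) {
        // Prints [0, 0, 1, 2, 3, 4, 5, 6, 0, 1], matching the b[j] row.
        System.out.println(java.util.Arrays.toString(computeBorder("ababababca")));
    }
}
```

Each border is built from the previous one, so the whole table takes time linear in the pattern length.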
int i=0;
int j=0;
:
int m = pattern.length();
int j = 0;
int i = 1;
:
b(5) means
– find the size of the largest prefix of P[1..5] that
is also a suffix of P[1..5]
= find the size of the largest prefix of "abaca" that
is also a suffix of "baca"
= find the size of "a"
=1
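To show how the border values drive the search, here is a hedged sketch of a KMP matcher. It is not the slides' full listing: the class name KmpSearchSketch is an assumption, indices are 0-based, and the empty-pattern guard is an added convention.

```java
// Sketch only: a 0-based KMP search built on the border (failure) function.
class KmpSearchSketch {

    // Same border function as defined in the slides, with 0-based indices.
    static int[] computeBorder(String pattern) {
        int m = pattern.length();
        int[] b = new int[m];
        int j = 0;
        for (int i = 1; i < m; i++) {
            while (j > 0 && pattern.charAt(i) != pattern.charAt(j)) {
                j = b[j - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            b[i] = j;
        }
        return b;
    }

    // Returns the index where pattern first starts in text, or -1.
    static int kmpMatch(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0; // assumed convention: the empty pattern matches at 0
        }
        int[] b = computeBorder(pattern);
        int i = 0; // position in text
        int j = 0; // position in pattern
        while (i < n) {
            if (text.charAt(i) == pattern.charAt(j)) {
                if (j == m - 1) {
                    return i - m + 1; // whole pattern matched
                }
                i++;
                j++;
            } else if (j > 0) {
                j = b[j - 1]; // reuse the border of the matched prefix
            } else {
                i++; // nothing matched, advance in the text
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(kmpMatch("abacaabaccabacabaabb", "abacab")); // 10
    }
}
```

On a mismatch at pattern position j, the text index i never moves backwards; only j is reset to the border of the prefix matched so far.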
[Figure: T: a b a a b x; P: a b a a b a, shown before and after a shift.]
Basic KMP does not do this.
There are 3 possible cases, tried in order.
[Figure: T[i] = x mismatches P[j] = b; the next character, a, matches in both T and P.]
Case 1
If P contains x somewhere, then try to
shift P right to align the last occurrence
of x in P with T[i].
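Then move i and j right, so that j is at the end of P.
[Figure: before the shift, T[i] = x mismatches P[j] = b in P = x c b a; after the shift, the x in P lies under T[i], i has moved to inew, and jnew is the last position of P.]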
Case 2
If P contains x somewhere, but a shift right
to the last occurrence is not possible, then
shift P right by 1 character to T[i+1].
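Then move i and j right, so that j is at the end of P.
[Figure: T[i] = x mismatches P[j] = w in P = c w a x; the x in P is after position j, so P moves right by 1 character, i moves to inew, and jnew is the last position of P.]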
Case 3
If cases 1 and 2 do not apply, then shift P to
align P[1] with T[i+1].
Then move i and j right, so that j is at the end of P.
[Figure: P = d c b a contains no x; after the shift, P[1] lies under T[i+1], i has moved to inew, and jnew is the last position of P.]
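The three cases can be sketched as one small function. This is an illustration under stated assumptions rather than the slides' code: the name nextTextIndex and its parameters are hypothetical, indices are 0-based, and lastOcc stands for the last index of the mismatching text character x in P (or -1 if x does not occur in P).

```java
// Sketch only: the Boyer-Moore character-jump rule with 0-based indices.
class CharacterJumpSketch {

    // After a mismatch between text[i] and pattern[j], returns the text
    // index to compare next; the pattern index restarts at m - 1.
    static int nextTextIndex(int i, int j, int m, int lastOcc) {
        if (lastOcc != -1 && lastOcc < j) {
            // Case 1: align the last occurrence of x in P with text[i].
            return i + (m - 1 - lastOcc);
        }
        if (lastOcc > j) {
            // Case 2: the last x is to the right of j, so shift P by 1.
            return i + (m - j);
        }
        // Case 3: x does not occur in P, so move P completely past text[i].
        return i + m;
    }

    public static void main(String[] args) {
        // Mismatch at j = 4 in a 5-character pattern, last x at index 2.
        System.out.println(nextTextIndex(4, 4, 5, 2)); // 6
    }
}
```

All three branches agree with the single expression i + m - Math.min(j, 1 + lastOcc), which is a common compact way to write the same rule.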
Boyer-Moore Example (1)
T: a pattern matching algorithm
P: rithm
[Figure: P is slid along T and compared right to left; the character comparisons are numbered 1 to 11 in the order they are made.]
x      a   b   c   d
L(x)   5   6   4  -1
Boyer-Moore in Java
Return index where pattern starts, or -1
if (i > n-1)
return -1; // no match if pattern is
// longer than text
:
return last;
} // end of buildLast()
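Since only fragments of the listing survive above, the following is a hedged reconstruction of how such a matcher might look, not the original code. The class name BoyerMooreSketch, the 128-entry table (which assumes ASCII input), and the empty-pattern guard are assumptions; indices are 0-based, so the last-occurrence values are one less than those in the 1-based L(x) table.

```java
// Sketch only: Boyer-Moore with the last-occurrence (character-jump)
// heuristic. Assumes every character in text and pattern is ASCII (< 128).
class BoyerMooreSketch {

    // last[c] = 0-based index of the last occurrence of c in pattern, or -1.
    static int[] buildLast(String pattern) {
        int[] last = new int[128];
        java.util.Arrays.fill(last, -1);
        for (int i = 0; i < pattern.length(); i++) {
            last[pattern.charAt(i)] = i;
        }
        return last;
    }

    // Returns the index where pattern starts in text, or -1.
    static int bmMatch(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0; // assumed convention for the empty pattern
        }
        int[] last = buildLast(pattern);
        int i = m - 1;
        if (i > n - 1) {
            return -1; // no match if pattern is longer than text
        }
        int j = m - 1;
        do {
            if (pattern.charAt(j) == text.charAt(i)) {
                if (j == 0) {
                    return i; // matched back to the start of the pattern
                }
                i--; // keep comparing right to left
                j--;
            } else {
                int lo = last[text.charAt(i)]; // last occurrence of the text char
                i = i + m - Math.min(j, 1 + lo); // character jump
                j = m - 1;
            }
        } while (i <= n - 1);
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(bmMatch("a pattern matching algorithm", "rithm")); // 23
    }
}
```

For the pattern "abacab" this buildLast gives a = 4, b = 5, c = 3 and d = -1, which are the 0-based counterparts of the 1-based values 5, 6, 4 in the L(x) table.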