
Strings and Pattern Matching

Brute Force, Rabin-Karp, Knuth-Morris-Pratt
Regular Expressions

String Searching
The previous slide is not a great example of what is meant
by String Searching. Nor is it meant to ridicule people
without eyes....
The object of string searching is to find the location of a
specific text pattern within a larger body of text (e.g., a
sentence, a paragraph, a book, etc.).
As with most algorithms, the main considerations for string
searching are speed and efficiency.
There are a number of string searching algorithms in
existence today, but the three we shall review are Brute
Force, Rabin-Karp, and Knuth-Morris-Pratt.

Brute Force
The Brute Force algorithm compares the pattern to the text, one
character at a time, until unmatching characters are found.

Compared characters are italicized.
Correct matches are in boldface type.
The algorithm can be designed to stop on either the first
occurrence of the pattern, or upon reaching the end of the text.

Brute Force Pseudo-Code
Here's the pseudo-code:
do
    if (text letter == pattern letter)
        compare next letter of pattern to next letter of text
    else
        move pattern down text by one letter
while (entire pattern not found and not at end of text)
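
The pseudo-code above can be sketched in Python roughly as follows. The function name brute_force_search and the choice to return -1 when nothing is found are assumptions made for this illustration, not part of the slides. This version stops on the first occurrence.

```python
def brute_force_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern against the text.
    for start in range(n - m + 1):
        j = 0
        # Compare one character at a time until a mismatch or a full match.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            return start  # entire pattern found
        # Otherwise: move the pattern down the text by one letter.
    return -1  # reached the end of the text without a match


print(brute_force_search("a sentence, a paragraph", "para"))
```

Returning the start index keeps the sketch close to the stated goal of finding the location of the pattern within the text.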

Brute Force Complexity

Given a pattern M characters in length, and a text N characters in
length...
Worst case: compares pattern to each substring of text of length M.
For example, M=5.
This kind of case can occur for image data.

Total number of comparisons: M(N-M+1)

Worst case time complexity: O(MN)

Brute Force Complexity (cont.)
Given a pattern M characters in length, and a text N characters in
length...
Best case if pattern found: Finds pattern in first M positions of text.
For example, M=5.

Total number of comparisons: M
Best case time complexity: O(M)

Brute Force Complexity (cont.)
Given a pattern M characters in length, and a text N characters in length...
Best case if pattern not found: Always mismatch on first character. For
example, M=5.

Total number of comparisons: N-M+1 (one per alignment), which is at most N

Best case time complexity: O(N)
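
The three cases can be checked by counting character comparisons directly. The sketch below is only an illustration: the helper name count_comparisons and the sample strings are made up for this example. It stops at the first occurrence, as in the best-case slide.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by a brute-force search
    that stops at the first occurrence of pattern."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[start + j] != pattern[j]:
                break  # mismatch: slide the pattern one letter over
        else:
            return comparisons  # all m letters matched
    return comparisons


N, M = 20, 5
worst = count_comparisons("A" * N, "AAAAH")       # mismatch only on the last letter
best_found = count_comparisons("A" * N, "AAAAA")  # match at the first alignment
best_missing = count_comparisons("A" * N, "OOOOH")  # mismatch on the first letter
print(worst, best_found, best_missing)
```

With N=20 and M=5 the worst case gives M(N-M+1) comparisons. The not-found best case gives one comparison per alignment, that is N-M+1, which the O(N) bound absorbs.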

Rabin-Karp
The Rabin-Karp string searching algorithm calculates a hash value
for the pattern, and for each M-character subsequence of text to be
compared.
If the hash values are unequal, the algorithm will calculate the hash
value for the next M-character sequence.
If the hash values are equal, the algorithm will do a Brute Force
comparison between the pattern and the M-character sequence.
In this way, there is only one comparison per text subsequence,
and Brute Force is only needed when hash values match.
Perhaps an example will clarify some things...

Rabin-Karp Example

Hash value of "AAAAA" is 37
Hash value of "AAAAH" is 100

Rabin-Karp Algorithm
pattern is M characters long
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t)
        brute force comparison of pattern and selected section of text
    hash_t = hash value of next section of text, one character over
until (end of text or brute force comparison == true)


Rabin-Karp
Common Rabin-Karp questions:
What is the hash function used to calculate values for
character sequences?
Isn't it time consuming to hash every one of the M-character
sequences in the text body?
Is this going to be on the final?
To answer some of these questions, we'll have to get mathematical.


Rabin-Karp Math

Consider an M-character sequence as an M-digit number in base b, where b is the number
of letters in the alphabet. The text subsequence t[i..i+M-1] is mapped to the number

x(i) = t[i]*b^(M-1) + t[i+1]*b^(M-2) + ... + t[i+M-1]

Furthermore, given x(i) we can compute x(i+1) for the next
subsequence t[i+1..i+M] in constant time, as follows:

x(i+1) = x(i)*b - t[i]*b^M + t[i+M]

In this way, we never compute a new value from scratch. We
simply adjust the existing value as we move over one
character.


Rabin-Karp Math Example

Let's say that our alphabet consists of 10 letters.
our alphabet = a, b, c, d, e, f, g, h, i, j
Let's say that "a" corresponds to 1, "b" corresponds to 2 and so
on.
The hash value for string "cah" would be...
3*100 + 1*10 + 8*1 = 318
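
The arithmetic above, and the constant-time step from one subsequence to the next, can be tried out with a small sketch. The names ALPHABET, digit, value and roll are invented for this illustration. The text "cahb" is a made-up example that extends "cah" by one letter.

```python
ALPHABET = "abcdefghij"   # the 10-letter alphabet from the example
BASE = len(ALPHABET)      # b = 10


def digit(ch: str) -> int:
    """Map a -> 1, b -> 2, and so on."""
    return ALPHABET.index(ch) + 1


def value(s: str) -> int:
    """Read the string as a number in base b, most significant letter first."""
    x = 0
    for ch in s:
        x = x * BASE + digit(ch)
    return x


def roll(x: int, old_ch: str, new_ch: str, m: int) -> int:
    """Shift left one digit, subtract the leftmost digit, add the new rightmost digit."""
    return x * BASE - digit(old_ch) * BASE ** m + digit(new_ch)


text = "cahb"
x0 = value(text[0:3])                # value of "cah"
x1 = roll(x0, text[0], text[3], 3)   # value of "ahb", without rescanning it
print(x0, x1, value(text[1:4]))
```

The last two printed numbers agree, which is the point of the update rule: the value for the next window comes from the previous one in a fixed number of arithmetic steps.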


Rabin-Karp Mods
If M is large, then the resulting value (~b^M) will be enormous. For this reason, we
hash the value by taking it mod a prime number q.
The mod function (% in Java) is particularly useful in this case due to several of its
inherent properties:
[(x mod q) + (y mod q)] mod q = (x + y) mod q
(x mod q) mod q = x mod q
For these reasons:
h(i) = ((t[i]*b^(M-1) mod q) + (t[i+1]*b^(M-2) mod q) + ...
       + (t[i+M-1] mod q)) mod q
h(i+1) = ( h(i)*b mod q        {shift left one digit}
         - t[i]*b^M mod q      {subtract leftmost digit}
         + t[i+M] mod q )      {add new rightmost digit}
         mod q
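
Putting the rolling hash together with the earlier algorithm outline gives something like the sketch below. The function name, the base of 256, the prime 1_000_003 and the use of ord() as the letter-to-digit mapping are all assumptions chosen for illustration. Python's % always returns a non-negative result for a positive modulus, so the subtraction needs no extra correction here; other languages may need one.

```python
def rabin_karp_search(text: str, pattern: str,
                      base: int = 256, q: int = 1_000_003) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    b_m = pow(base, m, q)  # b^M mod q, used to subtract the leftmost digit
    hash_p = hash_t = 0
    for k in range(m):
        hash_p = (hash_p * base + ord(pattern[k])) % q
        hash_t = (hash_t * base + ord(text[k])) % q
    for i in range(n - m + 1):
        # Brute-force comparison only when the hash values match.
        if hash_p == hash_t and text[i:i + m] == pattern:
            return i
        if i + m < n:
            # Shift left one digit, subtract leftmost digit, add new rightmost digit.
            hash_t = (hash_t * base - ord(text[i]) * b_m + ord(text[i + m])) % q
    return -1


print(rabin_karp_search("the quick brown fox", "brown"))
```

Because every hash match is confirmed by a direct comparison, the result stays correct even for a poor choice of q. Only the running time suffers.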


Rabin-Karp Complexity
If a sufficiently large prime number is used for the hash function,
the hashed values of two different patterns will usually be distinct.
If this is the case, searching takes O(N) time, where N is the
number of characters in the larger body of text.
It is always possible to construct a scenario with a worst case
complexity of O(MN). This, however, is likely to happen only if
the prime number used for hashing is small.
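
The effect of a small prime can be seen by counting "spurious hits", that is, windows whose hash equals the pattern's hash although the text differs. The sketch below is an illustration under assumed parameters: base 256, ord() as the digit mapping, and a hand-picked text. With q = 3 and base 256 the hash reduces to the sum of the character codes mod 3, so every window of this text collides with the pattern.

```python
def count_spurious_hits(text: str, pattern: str, q: int, base: int = 256) -> int:
    """Count windows whose hash matches the pattern's hash but whose text does not."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0
    b_m = pow(base, m, q)
    hash_p = hash_t = 0
    for k in range(m):
        hash_p = (hash_p * base + ord(pattern[k])) % q
        hash_t = (hash_t * base + ord(text[k])) % q
    spurious = 0
    for i in range(n - m + 1):
        if hash_p == hash_t and text[i:i + m] != pattern:
            spurious += 1  # a wasted brute-force comparison
        if i + m < n:
            hash_t = (hash_t * base - ord(text[i]) * b_m + ord(text[i + m])) % q
    return spurious


small = count_spurious_hits("abcabcabc", "cab", q=3)
large = count_spurious_hits("abcabcabc", "cab", q=1_000_003)
print(small, large)
```

Each spurious hit costs up to M extra comparisons, which is how the O(MN) worst case arises.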


The Knuth-Morris-Pratt Algorithm
The Knuth-Morris-Pratt (KMP) string searching algorithm differs from the brute-force algorithm by
keeping track of information gained from previous comparisons.
A failure function (f) is computed that indicates how much of the last comparison can be reused if it
fails.
Specifically, f(j) is defined to be the length of the longest prefix of the pattern P[0,..,j] that is also a suffix of P[1,..,j]
Note: not a suffix of P[0,..,j]
Example: value of the
KMP failure function:

This shows how much of the beginning of the string matches up to the
portion immediately preceding a failed comparison.
If the comparison fails at (4), we know the a, b in positions 2, 3 are identical
to positions 0, 1
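
The definition can be turned into code almost word for word. The sketch below is deliberately naive (it is not the efficient construction) and exists only to make the definition concrete. The function name failure_by_definition and the sample pattern "abacab" are assumptions for this illustration; the pattern is not the one from the slide's example.

```python
def failure_by_definition(pattern: str) -> list[int]:
    """f[j] = length of the longest prefix of pattern that is a suffix of pattern[1..j]."""
    f = []
    for j in range(len(pattern)):
        tail = pattern[1:j + 1]  # P[1..j]; starting at 1 excludes the whole of P[0..j]
        best = 0
        # Try the longest candidate first and stop at the first one that fits.
        for length in range(len(tail), 0, -1):
            if pattern[:length] == tail[-length:]:
                best = length
                break
        f.append(best)
    return f


print(failure_by_definition("abacab"))
```

For "abacab", the entry at position 5 is 2 because the prefix "ab" reappears at positions 4 and 5, so after a mismatch at position 6 those two letters need not be compared again.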


The KMP Algorithm (contd.)
The KMP string matching algorithm: Pseudo-Code

Algorithm KMPMatch(T, P)
Input: Strings T (text) with n characters and P
(pattern) with m characters.
Output: Starting index of the first substring of T
matching P, or an indication that P is not a
substring of T.


Algorithm

f ← KMPFailureFunction(P) {build failure function}
i ← 0
j ← 0
while i < n do
    if P[j] = T[i] then
        if j = m-1 then
            return i-m+1 {a match}
        i ← i+1
        j ← j+1
    else if j > 0 then {no match, but we have advanced}
        j ← f(j-1) {j indexes just after matching prefix in P}
    else
        i ← i+1
return "There is no substring of T matching P"
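
A fairly direct Python rendering of the matching loop might look like this. To keep the sketch focused on the loop, the failure table is passed in as an argument rather than built inside the function. The name kmp_match, the -1 return value and the sample text and pattern are assumptions for illustration, and the table [0, 0, 1, 0, 1, 2] was worked out by hand for the pattern "abacab".

```python
def kmp_match(text: str, pattern: str, f: list[int]) -> int:
    """Return the start of the first match of pattern in text, or -1.
    f is the precomputed failure table for pattern."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    i = j = 0
    while i < n:
        if pattern[j] == text[i]:
            if j == m - 1:
                return i - m + 1  # a match
            i += 1
            j += 1
        elif j > 0:
            # No match, but we have advanced: reuse the matched prefix.
            j = f[j - 1]
        else:
            i += 1
    return -1  # there is no substring of text matching pattern


print(kmp_match("abacaabaccabacabaabb", "abacab", [0, 0, 1, 0, 1, 2]))
```

Note that i never moves backwards: after a mismatch only j is reset, which is what distinguishes this loop from the brute-force one.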


The KMP Algorithm (contd.)
The KMP failure function: Pseudo-Code
Algorithm KMPFailureFunction(P)
Input: String P (pattern) with m characters
Output: The failure function f for P, which maps j to
the length of the longest prefix of P that is a suffix
of P[1,..,j]


Algorithm

i ← 1
j ← 0
f(0) ← 0
while i ≤ m-1 do
    if P[j] = P[i] then
        {we have matched j+1 characters}
        f(i) ← j+1
        i ← i+1
        j ← j+1
    else if j > 0 then
        j ← f(j-1) {j indexes just after matching prefix in P}
    else {there is no match}
        f(i) ← 0
        i ← i+1
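
As a sketch, the failure function can be written in Python by matching the pattern against itself, starting the comparison at position 1 so that a prefix is never matched against its own position. The function name kmp_failure_function and the sample patterns are assumptions made for this illustration.

```python
def kmp_failure_function(pattern: str) -> list[int]:
    """Build the KMP failure table for pattern in O(m) time."""
    m = len(pattern)
    f = [0] * m      # f[0] is 0 by definition
    i, j = 1, 0      # i scans the pattern; j is the length of the current matched prefix
    while i < m:
        if pattern[j] == pattern[i]:
            # We have matched j + 1 characters.
            f[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            # Fall back to the next shorter prefix that could still match.
            j = f[j - 1]
        else:
            # There is no match at all at position i.
            f[i] = 0
            i += 1
    return f


print(kmp_failure_function("abacab"))
```

The loop has the same three-way shape as the matching loop, which is why the same running-time argument applies to it.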

The KMP Algorithm (contd.)
A graphical representation of the KMP string searching algorithm


The KMP Algorithm (contd.)
Time Complexity Analysis
define k = i - j
In every iteration through the while loop, one of three things happens.
1) if T[i] = P[j], then i increases by 1, as does j, so k remains the same.
2) if T[i] != P[j] and j > 0, then i does not change and k increases by at least 1,
since k changes from i - j to i - f(j-1) and f(j-1) < j
3) if T[i] != P[j] and j = 0, then i increases by 1 and k increases by 1 since j
remains the same.

Thus, each time through the loop, either i or k increases by at least 1. Since neither
i nor k can exceed n, the greatest possible number of loops is 2n.
This of course assumes that f has already been computed.
However, f is computed in much the same manner as KMPMatch so the time
complexity argument is analogous. KMPFailureFunction is O(m)
Total Time Complexity: O(n + m)
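
The 2n bound can be checked empirically by counting passes through the matching loop. The sketch below is an illustration only: the function name kmp_loop_iterations is made up, and the failure table [0, 1, 0] for the pattern "aab" is supplied by hand. The text is chosen so that the loop falls back on almost every character.

```python
def kmp_loop_iterations(text: str, pattern: str, f: list[int]) -> int:
    """Run the KMP matching loop and return how many times its body executed.
    Assumes a non-empty pattern and its precomputed failure table f."""
    n, m = len(text), len(pattern)
    i = j = 0
    iterations = 0
    while i < n:
        iterations += 1
        if pattern[j] == text[i]:
            if j == m - 1:
                break     # a match: the search would stop here
            i += 1        # case 1: i and j both advance, k = i - j unchanged
            j += 1
        elif j > 0:
            j = f[j - 1]  # case 2: i unchanged, k = i - j grows
        else:
            i += 1        # case 3: i and k both grow by 1
    return iterations


n = 10
loops = kmp_loop_iterations("a" * n, "aab", [0, 1, 0])
print(loops, "<=", 2 * n)
```

On this input the loop runs more than n times, because of the fallbacks, but stays within 2n as the argument predicts.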


Regular Expressions
notation for describing a set of strings, possibly of infinite
size
ε denotes the empty string
ab + c denotes the set {ab, c}
a* denotes the set {ε, a, aa, aaa, ...}
Examples
(a+b)* all the strings from the alphabet {a,b}
b*(ab*ab*)* strings with an even number of a's
(a+b)*sun(a+b)* strings containing the pattern "sun"
(a+b)(a+b)(a+b)a 4-letter strings ending in a
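
These examples can be tried with Python's re module, with one notational change: the "+" used above for union is written "|" in Python, while "*" keeps its meaning. The dictionary PATTERNS and the helper matches are names invented for this sketch. The even-number-of-a's pattern is written here as b*(ab*ab*)* so that b's may appear between pairs of a's, and re.fullmatch is used because each expression describes whole strings.

```python
import re

# The union operator "+" from the notation above becomes "|" in Python.
PATTERNS = {
    "any string over {a, b}": r"(a|b)*",
    "even number of a's": r"b*(ab*ab*)*",
    "contains 'sun'": r"(a|b)*sun(a|b)*",
    "4 letters ending in a": r"(a|b)(a|b)(a|b)a",
}


def matches(description: str, s: str) -> bool:
    """True if the whole string s belongs to the set described by the pattern."""
    return re.fullmatch(PATTERNS[description], s) is not None


print(matches("any string over {a, b}", ""))    # the empty string is in (a+b)*
print(matches("even number of a's", "aabaa"))   # four a's
print(matches("even number of a's", "ab"))      # one a
print(matches("4 letters ending in a", "abba"))
```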
