Strings
Brute Force, Rabin-Karp, Knuth-Morris-Pratt
Regular Expressions
String Searching
The previous slide is not a great example of what is meant by String Searching. Nor is it meant to ridicule people without eyes....
The object of string searching is to find the location of a specific text pattern within a larger body of text (e.g., a sentence, a paragraph, a book, etc.).
As with most algorithms, the main considerations for string searching are speed and efficiency.
There are a number of string searching algorithms in existence today, but the three we shall review are Brute Force, Rabin-Karp, and Knuth-Morris-Pratt.
Brute Force
The Brute Force algorithm compares the pattern to the text, one character at a time, until unmatching characters are found.
Compared characters are italicized.
Correct matches are in boldface type.
The algorithm can be designed to stop on either the first occurrence of the pattern, or upon reaching the end of the text.
Brute Force Pseudo-Code
Here's the pseudo-code:
do
    if (text letter == pattern letter)
        compare next letter of pattern to next letter of text
    else
        move pattern down text by one letter
while (entire pattern not found and not at end of text)
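The loop above can be sketched in Python roughly as follows. This is a minimal illustration, not the slides' own code: the function name `brute_force_search` and the convention of returning -1 when the pattern is absent are assumptions made for the example.

```python
def brute_force_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Hypothetical helper for illustration; the -1 convention is an assumption.
    """
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern against the text.
    for start in range(n - m + 1):
        j = 0
        # Compare letter by letter until a mismatch or the pattern is exhausted.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            return start  # entire pattern found
    return -1  # reached the end of the text without a match
```

For example, `brute_force_search("AAAAAAAH", "AAAH")` slides the pattern along one letter at a time and stops at the first full match.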
Brute Force Complexity
Given a pattern M characters in length, and a text N characters in length...
Worst case: compares pattern to each substring of text of length M. For example, M = 5.
This kind of case can occur for image data.
Total number of comparisons: $M(N - M + 1)$
Worst case time complexity: O(MN)
Brute Force Complexity (cont.)
Given a pattern M characters in length, and a text N characters in length...
Best case if pattern found: Finds pattern in first M positions of text. For example, M = 5.
Total number of comparisons: M
Best case time complexity: O(M)
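Brute Force Complexity (cont.)
Given a pattern M characters in length, and a text N characters in length...
Best case if pattern not found: Always mismatch on first character. For example, M = 5.
Total number of comparisons: about N (exactly N - M + 1 if the search stops once the pattern no longer fits in the remaining text)
Best case time complexity: O(N)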
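The three cases can be checked by counting character comparisons directly. The sketch below is illustrative only: `count_comparisons` is a hypothetical helper, and the sample texts and patterns (N = 20, M = 5) are assumptions, not the figures from the original slides.

```python
def count_comparisons(text: str, pattern: str):
    """Brute-force search that also counts character comparisons.

    Returns (index, comparisons); index is -1 if the pattern is absent.
    Hypothetical helper used only to illustrate the complexity cases.
    """
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        matched = 0
        while matched < m:
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break
            matched += 1
        if matched == m:
            return start, comparisons
    return -1, comparisons


text = "A" * 20                               # N = 20 (assumed sample text)
worst = count_comparisons(text, "AAAAH")      # mismatch only on the last letter
best_found = count_comparisons(text, "AAAAA") # match at the first alignment
best_absent = count_comparisons(text, "OOOOH")# mismatch on the first letter
```

With these inputs the worst case performs M(N - M + 1) = 5 * 16 = 80 comparisons, the best case with a match performs M = 5, and the best case without a match performs N - M + 1 = 16, which is on the order of N.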
Rabin-Karp
The Rabin-Karp string searching algorithm calculates a hash value for the pattern, and for each M-character subsequence of text to be compared.
If the hash values are unequal, the algorithm will calculate the hash value for the next M-character sequence.
If the hash values are equal, the algorithm will do a Brute Force comparison between the pattern and the M-character sequence.
In this way, there is only one comparison per text subsequence, and Brute Force is only needed when hash values match.
Perhaps an example will clarify some things...
Rabin-Karp Example
Hash value of "AAAAA" is 37
Hash value of "AAAAH" is 100
Rabin-Karp Algorithm
pattern is M characters long
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t)
        brute force comparison of pattern and selected section of text
    hash_t = hash value of next section of text, one character over
while (not at end of text and brute force comparison == false)
Rabin-Karp
Common Rabin-Karp questions:
"What is the hash function used to calculate values for character sequences?"
"Isn't it time consuming to hash every one of the M-character sequences in the text body?"
"Is this going to be on the final?"
To answer some of these questions, we'll have to get mathematical.
Rabin-Karp Math
Consider an M-character sequence as an M-digit number in base b, where b is the number of letters in the alphabet. The text subsequence t[i..i+M-1] is mapped to the number
$$x(i) = t[i]\,b^{M-1} + t[i+1]\,b^{M-2} + \dots + t[i+M-1]$$
Furthermore, given x(i) we can compute x(i+1) for the next subsequence t[i+1..i+M] in constant time, as follows:
$$x(i+1) = x(i)\,b - t[i]\,b^{M} + t[i+M]$$
In this way, we never explicitly compute a new value. We simply adjust the existing value as we move over one character.
Rabin-Karp Math Example
Let's say that our alphabet consists of 10 letters.
our alphabet = a, b, c, d, e, f, g, h, i, j
Let's say that "a" corresponds to 1, "b" corresponds to 2 and so on.
The hash value for string "cah" would be...
3*100 + 1*10 + 8*1 = 318
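The same arithmetic can be reproduced in a few lines of Python, including the constant-time update when the window slides one letter to the right. The names `letter_value`, `window_value` and `roll`, and the follow-on text "cahb", are assumptions introduced for this sketch.

```python
BASE = 10  # size of the example alphabet a..j


def letter_value(ch: str) -> int:
    """Map 'a' -> 1, 'b' -> 2, ... as in the example above."""
    return ord(ch) - ord("a") + 1


def window_value(s: str) -> int:
    """Treat the letters of s as digits of a base-10 number."""
    value = 0
    for ch in s:
        value = value * BASE + letter_value(ch)
    return value


def roll(value: int, outgoing: str, incoming: str, m: int) -> int:
    """Slide an m-letter window one position: shift left, drop the old
    leftmost digit, add the new rightmost digit."""
    return value * BASE - letter_value(outgoing) * BASE**m + letter_value(incoming)


x0 = window_value("cah")    # 3*100 + 1*10 + 8*1
x1 = roll(x0, "c", "b", 3)  # window moves from "cah" to "ahb" in the text "cahb"
```

Here `x0` is 318, and rolling gives 318*10 - 3*1000 + 2 = 182, which agrees with computing "ahb" from scratch (1*100 + 8*10 + 2*1).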
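Rabin-Karp Mods
If M is large, then the resulting value (~$b^M$) will be enormous. For this reason, we hash the value by taking it mod a prime number q.
The mod function (% in Java) is particularly useful in this case due to several of its inherent properties:
[(x mod q) + (y mod q)] mod q = (x + y) mod q
(x mod q) mod q = x mod q
For these reasons:
$$h(i) = \big( (t[i]\,b^{M-1} \bmod q) + (t[i+1]\,b^{M-2} \bmod q) + \dots + (t[i+M-1] \bmod q) \big) \bmod q$$
$$h(i+1) = \big( \underbrace{h(i)\,b \bmod q}_{\text{shift left one digit}} - \underbrace{t[i]\,b^{M} \bmod q}_{\text{subtract leftmost digit}} + \underbrace{t[i+M] \bmod q}_{\text{add new rightmost digit}} \big) \bmod q$$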
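Putting the rolling update and the mod together, a Rabin-Karp search might look like the sketch below. The defaults `base=256` and `q=1_000_003`, the use of `ord` as the letter value, and the -1 return convention are all assumptions chosen for the example, not values from the slides.

```python
def rabin_karp_search(text: str, pattern: str, base: int = 256, q: int = 1_000_003) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Illustrative sketch: base, q and the use of ord() are assumed choices.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1

    b_to_m = pow(base, m, q)  # b^M mod q, used to subtract the leftmost digit

    # Hash the pattern and the first M letters of the text.
    hash_p = hash_t = 0
    for k in range(m):
        hash_p = (hash_p * base + ord(pattern[k])) % q
        hash_t = (hash_t * base + ord(text[k])) % q

    for i in range(n - m + 1):
        # Only fall back to a letter-by-letter comparison when the hashes agree.
        if hash_p == hash_t and text[i:i + m] == pattern:
            return i
        if i + m < n:
            # Shift left one digit, subtract leftmost digit, add new rightmost digit.
            hash_t = (hash_t * base - ord(text[i]) * b_to_m + ord(text[i + m])) % q
    return -1
```

One design note: Python's `%` returns a non-negative result for a positive modulus, so the subtraction is safe as written. In Java, `%` can return a negative value for a negative left operand, so a port would typically add q before taking the remainder.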
Rabin-Karp Complexity
If a sufficiently large prime number is used for the hash function, the hashed values of two different patterns will usually be distinct.
If this is the case, searching takes O(N) time, where N is the number of characters in the larger body of text.
It is always possible to construct a scenario with a worst case complexity of O(MN). This, however, is likely to happen only if the prime number used for hashing is small.
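The effect of the prime's size can be seen by counting spurious hits, that is, windows whose hash equals the pattern's hash although the letters differ. The sketch below recomputes each window's hash from scratch for clarity rather than rolling it; `poly_hash`, `spurious_hits`, the base 256 and the sample strings are assumptions made for illustration.

```python
def poly_hash(s: str, base: int, q: int) -> int:
    """Polynomial hash of s in the given base, reduced mod q."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % q
    return h


def spurious_hits(text: str, pattern: str, q: int, base: int = 256) -> int:
    """Count windows whose hash matches the pattern's hash but whose text does not."""
    m = len(pattern)
    target = poly_hash(pattern, base, q)
    count = 0
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if poly_hash(window, base, q) == target and window != pattern:
            count += 1
    return count


small_prime = spurious_hits("abcdefgh", "ab", q=2)          # many collisions
large_prime = spurious_hits("abcdefgh", "ab", q=1_000_003)  # no collisions here
```

With q = 2 the hash only records whether the last letter's code is even, so "cd", "ef" and "gh" all collide with "ab" and each would trigger a wasted Brute Force comparison. With the larger prime, every two-letter window in this text has a distinct hash.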
The Knuth-Morris-Pratt Algorithm
The Knuth-Morris-Pratt (KMP) string searching algorithm differs from the brute-force algorithm by keeping track of information gained from previous comparisons.
A failure function (f) is computed that indicates how much of the last comparison can be reused if it fails.
Specifically, f(j) is defined to be the length of the longest prefix of the pattern P that is also a suffix of P[1,..,j].
Note: not a suffix of P[0,..,j]
Example: value of the KMP failure function:
This shows how much of the beginning of the string matches up to the portion immediately preceding a failed comparison.
If the comparison fails at (4), we know the a, b in positions 2, 3 is identical to positions 0, 1.
The KMP Algorithm (contd.)
The KMP string matching algorithm: Pseudo-Code
Algorithm KMPMatch(T, P)
Input: Strings T (text) with n characters and P (pattern) with m characters.
Output: Starting index of the first substring of T matching P, or an indication that P is not a substring of T.
Algorithm
f ← KMPFailureFunction(P) {build failure function}
i ← 0
j ← 0
while i < n do
    if P[j] = T[i] then
        if j = m - 1 then
            return i - m + 1 {a match}
        i ← i + 1
        j ← j + 1
    else if j > 0 then {no match, but we have advanced}
        j ← f(j - 1) {j indexes just after matching prefix in P}
    else
        i ← i + 1
return "There is no substring of T matching P"
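A direct Python transcription of this matching loop might look as follows. It is a sketch under a few assumptions: the function names are made up for the example, a return value of -1 stands in for "no substring of T matching P", and a compact failure-function helper is included so the block runs on its own.

```python
def kmp_failure_function(pattern: str) -> list:
    """f[j] = length of the longest prefix of pattern that is a suffix of pattern[1..j]."""
    m = len(pattern)
    f = [0] * m
    i, j = 1, 0
    while i < m:
        if pattern[j] == pattern[i]:
            f[i] = j + 1          # we have matched j + 1 characters
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]          # fall back to a shorter matching prefix
        else:
            f[i] = 0              # no match at all
            i += 1
    return f


def kmp_match(text: str, pattern: str) -> int:
    """Return the starting index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    f = kmp_failure_function(pattern)
    i = j = 0
    while i < n:
        if pattern[j] == text[i]:
            if j == m - 1:
                return i - m + 1  # a match
            i += 1
            j += 1
        elif j > 0:               # no match, but we have advanced
            j = f[j - 1]
        else:
            i += 1
    return -1
```

Note that `i` never moves backwards: after a mismatch only `j` is reset, which is what distinguishes this loop from the brute-force version.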
The KMP Algorithm (contd.)
The KMP failure function: Pseudo-Code
Algorithm KMPFailureFunction(P)
Input: String P (pattern) with m characters
Output: The failure function f for P, which maps j to the length of the longest prefix of P that is a suffix of P[1,..,j]
Algorithm
f(0) ← 0
i ← 1
j ← 0
while i ≤ m - 1 do
    if P[j] = P[i] then
        {we have matched j + 1 characters}
        f(i) ← j + 1
        i ← i + 1
        j ← j + 1
    else if j > 0 then
        j ← f(j - 1) {j indexes just after matching prefix in P}
    else {there is no match}
        f(i) ← 0
        i ← i + 1
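To make the definition concrete, the sketch below computes the failure function two ways: once with the linear-time loop (pattern compared against itself, starting at i = 1) and once straight from the definition. Both function names and the sample pattern "abacab" are assumptions chosen for illustration.

```python
def failure_function(pattern: str) -> list:
    """Linear-time failure function: the pattern is matched against itself."""
    m = len(pattern)
    f = [0] * m
    i, j = 1, 0
    while i < m:
        if pattern[j] == pattern[i]:
            f[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]
        else:
            f[i] = 0
            i += 1
    return f


def failure_by_definition(pattern: str) -> list:
    """Slow reference: longest prefix of pattern that is a suffix of pattern[1..j]."""
    f = []
    for j in range(len(pattern)):
        tail = pattern[1:j + 1]
        best = 0
        for k in range(1, len(tail) + 1):
            if pattern[:k] == tail[-k:]:
                best = k
        f.append(best)
    return f


table = failure_function("abacab")
```

For "abacab" the table is [0, 0, 1, 0, 1, 2]: for instance f(5) = 2 because the prefix "ab" is also the suffix of "bacab".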
The KMP Algorithm (contd.)
A graphical representation of the KMP string searching algorithm
The KMP Algorithm (contd.)
Time Complexity Analysis
Define k = i - j.
In every iteration through the while loop, one of three things happens.
1) If T[i] = P[j], then i increases by 1, as does j, so k remains the same.
2) If T[i] != P[j] and j > 0, then i does not change and k increases by at least 1, since k changes from i - j to i - f(j - 1).
3) If T[i] != P[j] and j = 0, then i increases by 1 and k increases by 1 since j remains the same.
Thus, each time through the loop, either i or k increases by at least 1. Since neither i nor k can exceed n, the greatest possible number of loops is 2n.
This of course assumes that f has already been computed.
However, f is computed in much the same manner as KMPMatch so the time complexity argument is analogous. KMPFailureFunction is O(m).
Total Time Complexity: O(n + m)
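The 2n bound can be observed by instrumenting the matching loop with a counter. This is only an empirical check on sample inputs, not a proof; `kmp_match_counting` is a hypothetical helper and it assumes a non-empty pattern.

```python
def kmp_match_counting(text: str, pattern: str):
    """KMP search that also reports how many times the while loop ran.

    Returns (index, iterations); index is -1 if there is no match.
    Assumes pattern is non-empty.
    """
    n, m = len(text), len(pattern)

    # Build the failure function (same loop shape as the matcher).
    f = [0] * m
    i, j = 1, 0
    while i < m:
        if pattern[j] == pattern[i]:
            f[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]
        else:
            i += 1

    i = j = 0
    iterations = 0
    while i < n:
        iterations += 1
        if pattern[j] == text[i]:
            if j == m - 1:
                return i - m + 1, iterations
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]   # i unchanged, k = i - j grows
        else:
            i += 1         # i and k both grow
    return -1, iterations


text = "a" * 30 + "b"
index, loops = kmp_match_counting(text, "aaab")
```

On this input, which forces repeated fallbacks through the failure function, `loops` stays at or below 2 * len(text).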
Regular Expressions
notation for describing a set of strings, possibly of infinite size
ε denotes the empty string
ab + c denotes the set {ab, c}
a* denotes the set {ε, a, aa, aaa, ...}
Examples
(a+b)* all the strings from the alphabet {a, b}
b*(ab*ab*)* strings with an even number of a's
(a+b)*sun(a+b)* strings containing the pattern "sun"
(a+b)(a+b)(a+b)a 4-letter strings ending in "a"
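These expressions can be tried out with Python's `re` module, with one notational change: the union written `+` above is written `|` in Python, where `+` instead means "one or more". The helper names and the particular expression used for "even number of a's" are assumptions for this sketch.

```python
import re
from itertools import product

ALL_AB = r"(a|b)*"                 # all strings over {a, b}
EVEN_A = r"b*(ab*ab*)*"            # one expression for an even number of a's
HAS_SUN = r"(a|b)*sun(a|b)*"       # strings containing the pattern "sun"
FOUR_END_A = r"(a|b)(a|b)(a|b)a"   # 4-letter strings ending in a


def in_language(regex: str, s: str) -> bool:
    """True if the whole string s belongs to the set described by regex."""
    return re.fullmatch(regex, s) is not None


def strings_up_to(alphabet: str, max_len: int):
    """Yield every string over alphabet with length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# Check the "even number of a's" expression against a direct count.
even_a_ok = all(
    in_language(EVEN_A, s) == (s.count("a") % 2 == 0)
    for s in strings_up_to("ab", 6)
)
```

`re.fullmatch` is used rather than `re.search` because a regular expression in this notation describes whole strings, not substrings.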