String Matching

We formalize the string-matching problem as follows. We are given a text array T[1 . . n] of n characters and a pattern array P[1 . . m] of m characters. The problem is to find an integer s, called a valid shift, where 0 ≤ s ≤ n−m and T[s+1 . . s+m] = P[1 . . m]. In other words, the problem is to determine whether P occurs in T, i.e., whether P is a substring of T.

The naïve approach simply tests all possible placements of the pattern P[1 . . m] relative to the text T[1 . . n]. Specifically, we try shifts s = 0, 1, . . . , n−m in succession and, for each shift s, compare T[s+1 . . s+m] with P[1 . . m].

NAÏVE_STRING_MATCHER (T, P)

1. n ← length [T]
2. m ← length [P]
3. for s ← 0 to n−m do
4.     if P[1 . . m] = T[s+1 . . s+m]
5.         then print "Pattern occurs with shift" s

Q. Write an algorithm for the naïve string matcher. What is its worst-case complexity? Show the comparisons the naïve string matcher makes for the pattern P = 0001 in the text T = 000010001010001.
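A minimal Python sketch of the naïve matcher, assuming 0-indexed strings: shift s compares `text[s:s+m]` with the pattern, which corresponds to T[s+1 . . s+m] in the notes. The comparison counter is an addition for the exercise; it counts character comparisons, stopping at the first mismatch in each shift. Since each of the n−m+1 shifts can cost up to m comparisons (for example, T = aⁿ and P = aᵐ), the worst case is O((n−m+1)m).

```python
def naive_string_matcher(text: str, pattern: str) -> tuple[list[int], int]:
    """Return (valid shifts, number of character comparisons made)."""
    n, m = len(text), len(pattern)
    shifts: list[int] = []
    comparisons = 0
    for s in range(n - m + 1):          # s = 0, 1, ..., n-m
        j = 0
        # Compare left to right, stopping at the first mismatch.
        while j < m:
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
            j += 1
        if j == m:                       # all m characters matched
            shifts.append(s)
    return shifts, comparisons


shifts, comparisons = naive_string_matcher("000010001010001", "0001")
print(shifts, comparisons)
```

For P = 0001 and T = 000010001010001 this reports the valid shifts 1, 5 and 11, with 31 character comparisons in total across the 12 shifts.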

@2008-09 Shankar Thawkar , Sr. Lect. IT dept.

2) Rabin-Karp Algorithm

Key idea: treat the pattern P[1..m] as a key and transform (hash) it into an equivalent integer p. Similarly, transform substrings of the text string T into integers: for s = 0, 1, …, n−m, transform T[s+1..s+m] into an equivalent integer t_s, using Horner's rule. The pattern occurs at shift s if and only if p = t_s. If we can compute p and the t_s quickly, the pattern-matching problem is reduced to comparing p with n−m+1 integers.

How to compute p? Using radix 2,

$$p = 2^{m-1}P[1] + 2^{m-2}P[2] + \cdots + 2P[m-1] + P[m],$$

which Horner's rule evaluates as

$$p = P[m] + 2\bigl(P[m-1] + 2\bigl(P[m-2] + \cdots + 2\bigl(P[2] + 2P[1]\bigr)\cdots\bigr)\bigr).$$

This takes O(m) time, assuming each arithmetic operation can be done in O(1) time.
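The following Python sketch evaluates p with Horner's rule and then computes every t_s. The incremental update t_{s+1} = d(t_s − d^{m−1}T[s+1]) + T[s+m+1] is the standard rolling step from CLRS rather than something stated in these notes. The names `horner` and `window_values` are hypothetical, and the text is assumed to consist of digit characters smaller than the radix d (default 2). No modulus is used yet, so equal values mean equal strings.

```python
def horner(digits: str, d: int = 2) -> int:
    """p = d^(m-1)*P[1] + ... + d*P[m-1] + P[m], evaluated by Horner's rule."""
    value = 0
    for ch in digits:
        value = d * value + int(ch)     # one multiply and one add per digit: O(m)
    return value


def window_values(text: str, m: int, d: int = 2) -> list[int]:
    """Return [t_0, ..., t_(n-m)], where t_s is the value of T[s+1..s+m]."""
    n = len(text)
    high = d ** (m - 1)                 # weight of the leading digit
    t = horner(text[:m], d)             # t_0 computed directly
    values = [t]
    for s in range(n - m):
        # Drop the leading digit T[s+1], shift left by one place, append T[s+m+1].
        t = d * (t - high * int(text[s])) + int(text[s + m])
        values.append(t)
    return values


p = horner("0001")
ts = window_values("000010001010001", 4)
print([s for s, t in enumerate(ts) if t == p])
```

Here p = 1, and the windows of 000010001010001 take the value 1 exactly at s = 1, 5 and 11, the same shifts the naïve matcher finds.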

Upper limits

Problem: For long patterns, or for large alphabets, the number representing a given string may be too large to be practical.

Solution: Use the MOD operation. Let q be a prime number such that 2q can be stored in one computer word.

How it works

Hash the pattern P into a numeric value. Let a string be represented by the sum of its digits (compare Horner's rule, § 30.1).

Example: with { A, B, C, ..., Z } → { 0, 1, 2, ..., 25 } and q = 13:

BAN → 1 + 0 + 13 = 14, and 14 mod 13 = 1, so BAN → 1
CARD → 2 + 0 + 17 + 3 = 22, and 22 mod 13 = 9, so CARD → 9
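A short Python check of the letter-sum example, assuming words contain only the letters A to Z; `letter_sum_hash` is a hypothetical helper. Because a plain sum ignores letter order, anagrams such as BAN and NAB get the same value, which is one simple way two different strings can collide.

```python
def letter_sum_hash(word: str, q: int = 13) -> int:
    """Map A..Z to 0..25, add the letter values, and reduce mod q."""
    total = sum(ord(c) - ord("A") for c in word.upper())
    return total % q


for w in ("BAN", "CARD", "NAB"):
    print(w, letter_sum_hash(w))
```

This prints BAN 1, CARD 9 and NAB 1.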


How it works

Once we use modular arithmetic, p = t_s for some s no longer guarantees that P[1..m] = T[s+1..s+m]: if the hash values match, the strings might still differ, and in those cases we have spurious hits.

Therefore, after the equality test p = t_s, we should compare P[1..m] with T[s+1..s+m] character by character to ensure that we really have a match.

As a result, the worst-case running time becomes O(nm), but in practice the method avoids many unnecessary string comparisons.

1: hsub = hash(P[1..m])          i.e. p
2: hs = hash(T[1..m])            i.e. t_0
3: for s = 0 to n − m do
4:     if hs = hsub then
5:         if T[s+1..s+m] = P then
6:             print "Pattern occurs with shift" s
7:     if s < n − m then hs = hash(T[s+2..s+m+1])     i.e. t_(s+1)

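Q. Write the Rabin-Karp algorithm for string matching. Working modulo q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 3151592653589793 when looking for the pattern P = 26?

p = P mod q = 26 mod 11 = 4. Then find t_s for each length-2 window of the text: t_0 = 31 mod 11, t_1 = 15 mod 11, and so on.

s            0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
window      31 15 51 15 59 92 26 65 53 35 58 89 97 79 93
t_s          9  4  7  4  4  4  4 10  9  2  3  1  9  2  5

t_s = p = 4 at s = 1, 3, 4, 5 and 6. Only s = 6 (window 26) is a valid match; s = 1, 3, 4 and 5 (windows 15, 15, 59, 92) are spurious hits, so the matcher encounters 4 spurious hits.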
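A minimal Python sketch of the Rabin-Karp matcher for digit strings, assuming radix d = 10 and modulus q = 11 as in the exercise. It follows the pseudocode above (hash comparison, then a character-by-character check), uses the CLRS-style rolling update to move from t_s to t_(s+1), and additionally records spurious hits so the exercise can be checked. The function name and its return shape are illustrative assumptions.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 10, q: int = 11) -> tuple[list[int], list[int]]:
    """Return (valid shifts, spurious-hit shifts) for a string of decimal digits."""
    n, m = len(text), len(pattern)
    if m > n:
        return [], []
    h = pow(d, m - 1, q)                  # d^(m-1) mod q: weight of the leading digit
    p = t = 0
    for i in range(m):                    # preprocessing: p and t_0 by Horner's rule
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    valid: list[int] = []
    spurious: list[int] = []
    for s in range(n - m + 1):
        if p == t:                        # hash hit: verify character by character
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:                     # roll the window from t_s to t_(s+1)
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


print(rabin_karp_matcher("3151592653589793", "26"))
```

For the exercise this returns `([6], [1, 3, 4, 5])`: one valid shift and four spurious hits. Python's `%` always yields a non-negative result for a positive modulus, so the subtraction in the rolling step needs no extra correction.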


The Knuth-Morris-Pratt (KMP) algorithm looks for the pattern in the text in left-to-right order (like the brute-force algorithm), but it shifts the pattern more intelligently than the brute-force algorithm.

If a mismatch occurs between the text and the pattern P at P[ j ], what is the most we can shift the pattern to avoid wasteful comparisons?

Answer: the largest prefix of P[0 .. j−1] that is a suffix of P[1 .. j−1].

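Example (figure not reproduced): a mismatch occurs at text position i against pattern position j = 5; after the shift, comparison resumes at j_new = 2.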

The KMP algorithm preprocesses the pattern P by computing a prefix function that indicates the largest possible shift s using previously performed comparisons. Specifically, the prefix function π(q) is defined as the length of the longest prefix of P that is a proper suffix of P[1 . . q].

1. m ← length[P]
2. π[1] ← 0
3. k ← 0
4. for q ← 2 to m
5.     while k > 0 and P[k+1] ≠ P[q]
6.         k ← π[k]
7.     if P[k+1] = P[q] then
8.         k ← k + 1
9.     π[q] ← k

Note that the prefix function for P, which maps q to the length of the longest prefix of P that is a suffix of P[1 . . q],

encodes repeated substrings inside the pattern itself.

As an example, consider the pattern P = a b b a b a. The prefix function computed by the algorithm above is:

q       1  2  3  4  5  6
P[q]    a  b  b  a  b  a
π[q]    0  0  0  1  2  1
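A Python transcription of the prefix-function pseudocode, shifted to 0-indexed lists: `pi[q - 1]` holds π[q], and the notes' P[k+1] becomes `pattern[k]`. It is a sketch for checking small examples by hand; the function name is illustrative.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return pi as a 0-indexed list, so that result[q - 1] == pi[q] in the notes."""
    m = len(pattern)
    pi = [0] * m                  # pi[0] plays the role of pi[1] = 0
    k = 0                         # length of the prefix currently matched
    for q in range(1, m):         # notes: q = 2 .. m
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]         # notes: k = pi[k]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("abbaba"))
```

For P = a b b a b a this prints `[0, 0, 0, 1, 2, 1]`, matching the table.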

Analysis

The running time of Knuth-Morris-Pratt algorithm is proportional to the time needed to read the characters in text and

pattern. In other words, the worst-case running time of the algorithm is O(m+n) and it requires O(m) extra space. It is

important to note that these quantities are independent of the size of the underlying alphabet.
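The notes give the prefix function but not the matching loop itself. The sketch below is a Python version of the standard KMP matcher (as in CLRS's KMP-MATCHER), with the prefix function repeated so the block runs on its own; both function names are illustrative and a non-empty pattern is assumed. Each text character is read once in the outer loop, and the inner `while` fallbacks are bounded by the amortized argument behind the O(m+n) bound above.

```python
def prefix_function(pattern: str) -> list[int]:
    """0-indexed prefix function: result[q - 1] == pi[q]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return every valid shift s with text[s:s+m] == pattern (pattern non-empty)."""
    m = len(pattern)
    pi = prefix_function(pattern)
    shifts: list[int] = []
    q = 0                                   # number of pattern characters matched
    for i, ch in enumerate(text):           # each text character is read once
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]                   # fall back using the prefix function
        if pattern[q] == ch:
            q += 1
        if q == m:                          # full match ending at text[i]
            shifts.append(i - m + 1)
            q = pi[q - 1]                   # keep looking for overlapping matches
    return shifts


print(kmp_matcher("000010001010001", "0001"))
```

On the naïve matcher's exercise data this prints `[1, 5, 11]`, and overlapping occurrences are found too, e.g. `kmp_matcher("abababa", "aba")` gives `[0, 2, 4]`.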

Q. … function. Calculate the prefix function for the pattern a b b a b a. [Ans: shown in the prefix-function example above.]

