It contains all the algorithms related to strings. Good for competitive coding.

Mar 06, 2016

Jaehyun Park

CS 97SI

Stanford University

Outline

Hash Table

Knuth-Morris-Pratt (KMP) Algorithm

Suffix Trie

Suffix Array

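String Matching Problem

Given a text $T$ and a pattern $P$, find all occurrences of $P$ within $T$

Notations:

$n$ and $m$: lengths of $T$ and $P$

$\Sigma$: set of alphabets (of constant size)

$P_i$: $i$th letter of $P$ (1-indexed)

$a, b, c$: single letters in $\Sigma$

$x, y, z$: strings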

Example

T = AGCATGCTGCAGTCATGCTTAGGCTA

P = GCT

Naive approach: initiate a string comparison at every starting point of T

Each comparison takes O(m) time, so the whole search takes O(nm) time
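The brute-force idea above can be sketched in a few lines of C++. This is only an illustration: the function name `naive_find` and the 0-based result positions are assumptions made here, not part of the slides.

```cpp
#include <string>
#include <vector>

// Returns every 0-based index i such that P occurs in T starting at i.
std::vector<int> naive_find(const std::string& T, const std::string& P) {
    std::vector<int> hits;
    const int n = T.size(), m = P.size();
    for (int i = 0; i + m <= n; i++) {           // every starting point in T
        int j = 0;
        while (j < m && T[i + j] == P[j]) j++;   // one comparison costs O(m)
        if (j == m) hits.push_back(i);
    }
    return hits;
}
```

On the example text and pattern above, this reports the 0-based positions 5, 16 and 22.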

Hash Table

Hash Function

A good hash function has few collisions

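i.e., if $x \neq y$, then $H(x) \neq H(y)$ with high probability

An easy and powerful choice is a polynomial in some base $a$, taken modulo a prime $p$

Consider each letter as a number (ASCII value is fine)

$H(x_1 \ldots x_k) = x_1 a^{k-1} + x_2 a^{k-2} + \cdots + x_{k-1} a + x_k \pmod{p}$

How do we find $H(x_2 \ldots x_{k+1})$ from $H(x_1 \ldots x_k)$?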
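One way to answer the question above: subtract the leading term $x_1 a^{k-1}$, multiply by $a$, and add $x_{k+1}$, all modulo $p$. The sketch below does this for every window of length k. The constants `A` and `MOD` and the name `window_hashes` are assumptions chosen for illustration; the slides do not fix a base or a prime.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed constants for illustration: base a = 257, prime p = 1e9 + 7.
const uint64_t A = 257, MOD = 1000000007ULL;

// Returns H(x_i ... x_{i+k-1}) for every 0-based window start i.
std::vector<uint64_t> window_hashes(const std::string& x, int k) {
    std::vector<uint64_t> out;
    const int n = x.size();
    if (k <= 0 || k > n) return out;

    uint64_t top = 1;                             // a^(k-1) mod p
    for (int i = 1; i < k; i++) top = top * A % MOD;

    uint64_t h = 0;                               // first window, by Horner's rule
    for (int i = 0; i < k; i++) h = (h * A + (unsigned char)x[i]) % MOD;
    out.push_back(h);

    for (int i = k; i < n; i++) {
        uint64_t lead = (unsigned char)x[i - k] * top % MOD;
        h = (h + MOD - lead) % MOD;               // drop the leading term
        h = (h * A + (unsigned char)x[i]) % MOD;  // shift by a, append new letter
        out.push_back(h);
    }
    return out;
}
```

Each update costs O(1), so all n - k + 1 window hashes take O(n) time in total.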

Hash Table

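Hash every substring of T of length k

k is a small constant

For each query P, hash its first k letters to retrieve all the occurrences of that prefix within T, then check each candidate against P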
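A minimal sketch of such an index follows. For brevity it leans on `std::unordered_map` with `std::string` keys (so the hashing is the standard library's, not the polynomial hash), and the type name `KGramIndex` is hypothetical. Queries shorter than k are simply not handled.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: indexes every length-k substring of T once,
// then answers queries for patterns P with |P| >= k.
struct KGramIndex {
    std::string T;
    int k;
    std::unordered_map<std::string, std::vector<int>> where;  // k-gram -> starts

    KGramIndex(const std::string& text, int k_) : T(text), k(k_) {
        for (int i = 0; i + k <= (int)T.size(); i++)
            where[T.substr(i, k)].push_back(i);
    }

    // Returns the 0-based starts of P in T, in increasing order.
    std::vector<int> find(const std::string& P) const {
        std::vector<int> hits;
        if ((int)P.size() < k) return hits;          // short queries not handled
        auto it = where.find(P.substr(0, k));
        if (it == where.end()) return hits;
        for (int i : it->second)
            if (T.compare(i, P.size(), P) == 0)      // verify the full pattern
                hits.push_back(i);
        return hits;
    }
};
```

The verification step is what keeps the answer correct even when two different k-grams share a bucket or when only the first k letters of P match.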

Pros:

Easy to implement

Significant speedup in practice

Cons:

Doesn't help the asymptotic efficiency

Can still take $\Theta(nm)$ time if the hashing is poor or the data is difficult

Knuth-Morris-Pratt (KMP) Algorithm

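Solves the string matching problem in linear time by preprocessing $P$ in $\Theta(m)$ time

Main idea is to skip some comparisons by using the previous comparison result

Uses an auxiliary array $\pi$: $\pi[i]$ is the largest integer smaller than $i$ such that $P_1 \ldots P_{\pi[i]}$ is a suffix of $P_1 \ldots P_i$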

For a pattern that begins with abababab followed by c:

e.g., $\pi[6] = 4$ since abab is a suffix of ababab

e.g., $\pi[9] = 0$ since no prefix of length $\leq 8$ ends with c
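To make the definition concrete, here is a deliberately slow sketch that evaluates it literally: for each i it tries every candidate length below i, longest first. The function name `pi_by_definition` is an assumption for illustration, and this is not the efficient computation used by KMP itself.

```cpp
#include <string>
#include <vector>

// Returns pi[0..m] for pattern P, where pi[i] describes P_1..P_i
// (1-indexed as in the slides). pi[0] is unused and left at 0.
// Cubic time in the worst case: meant for checking small examples only.
std::vector<int> pi_by_definition(const std::string& P) {
    const int m = P.size();
    std::vector<int> pi(m + 1, 0);
    for (int i = 1; i <= m; i++)
        for (int len = i - 1; len >= 1; len--)
            // Is P_1..P_len a suffix of P_1..P_i?
            if (P.compare(0, len, P, i - len, len) == 0) {
                pi[i] = len;
                break;
            }
    return pi;
}
```

For the pattern ababababc this gives pi[6] = 4 and pi[9] = 0, in line with the two examples above.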

P = ABCDABD

$\pi = (0, 0, 0, 0, 1, 2, 0)$

Thus, there is no point in starting the comparison at $T_2$, $T_3$ (crucial observation)

Mismatch at $T_4$ again!

Mismatch at $T_{11}$!

Mismatch at $T_{18}$

After recording this match (at $T_{16} \ldots T_{22}$), we shift $P$ again in order to find other matches

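Computing $\pi$

If $P_1 \ldots P_{\pi[i]}$ is a suffix of $P_1 \ldots P_i$, then $P_1 \ldots P_{\pi[i]-1}$ is a suffix of $P_1 \ldots P_{i-1}$

Well, obviously...

All the prefixes of $P$ that are suffixes of $P_1 \ldots P_i$ can be obtained by recursively applying $\pi$ to $i$

e.g., $P_1 \ldots P_{\pi[i]}$, $P_1 \ldots P_{\pi[\pi[i]]}$, $\ldots$ are all suffixes of $P_1 \ldots P_i$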

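A non-obvious conclusion:

First, let's write $\pi^{(k)}[i]$ as $\pi[\cdot]$ applied $k$ times to $i$

e.g., $\pi^{(2)}[i] = \pi[\pi[i]]$

$\pi[i]$ is equal to $\pi^{(k)}[i-1] + 1$, where $k$ is the smallest positive integer that satisfies $P_{\pi^{(k)}[i-1]+1} = P_i$

If there is no such $k$, $\pi[i] = 0$

Intuition: look at all the prefixes of $P$ that are suffixes of $P_1 \ldots P_{i-1}$, and find the longest one whose next letter matches $P_i$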
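As a worked instance of this rule, take the pattern P = ABCDABD used in these slides and suppose $\pi[1..5] = (0, 0, 0, 0, 1)$ is already known. The derivation below is a sketch of how the last two entries would follow; it assumes the `amsmath` package for `align*`.

```latex
\begin{align*}
\pi[6]:&\quad \pi^{(1)}[5] = 1,\ P_2 = \mathrm{B} = P_6
        \;\Rightarrow\; \pi[6] = 1 + 1 = 2 \\
\pi[7]:&\quad \pi^{(1)}[6] = 2,\ P_3 = \mathrm{C} \neq \mathrm{D} = P_7 \\
       &\quad \pi^{(2)}[6] = \pi[2] = 0,\ P_1 = \mathrm{A} \neq \mathrm{D} = P_7 \\
       &\quad \text{no } k \text{ works} \;\Rightarrow\; \pi[7] = 0
\end{align*}
```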

Implementation

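    // P is 1-indexed: P[1..m]. pi[0] = -1 is a sentinel that ends the fallback loop.
    pi[0] = -1;
    int k = -1;
    for (int i = 1; i <= m; i++) {
        // Fall back to shorter prefixes until P[k+1] extends one, or none is left.
        while (k >= 0 && P[k+1] != P[i])
            k = pi[k];
        pi[i] = ++k;
    }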

    // Match P[1..m] against T[1..n]; k = number of letters of P currently matched.
    int k = 0;
    for (int i = 1; i <= n; i++) {
        while (k >= 0 && P[k+1] != T[i])
            k = pi[k];
        k++;
        if (k == m) {
            // P matches T[i-m+1..i]
            k = pi[k];
        }
    }
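The two fragments above assume 1-indexed arrays `P`, `T` and `pi` declared elsewhere. A self-contained sketch that wires them together might look like the following. The function name `kmp_find` and the padding trick used to get 1-indexing are choices made here for illustration.

```cpp
#include <string>
#include <vector>

// Returns the 1-indexed starting positions of every occurrence of pat in text.
std::vector<int> kmp_find(const std::string& pat, const std::string& text) {
    const int m = pat.size(), n = text.size();
    std::vector<int> hits;
    if (m == 0) return hits;                  // empty pattern: nothing to report

    // Pad with one character so that P[1..m] and T[1..n] match the slides.
    const std::string P = " " + pat, T = " " + text;

    std::vector<int> pi(m + 1);
    pi[0] = -1;                               // sentinel
    for (int i = 1, k = -1; i <= m; i++) {
        while (k >= 0 && P[k + 1] != P[i]) k = pi[k];
        pi[i] = ++k;
    }

    for (int i = 1, k = 0; i <= n; i++) {
        while (k >= 0 && P[k + 1] != T[i]) k = pi[k];
        k++;
        if (k == m) {                         // P matches T[i-m+1..i]
            hits.push_back(i - m + 1);
            k = pi[k];                        // keep going for further matches
        }
    }
    return hits;
}
```

As a usage example, `kmp_find("ABCDABD", "ABC ABCDAB ABCDABCDABDE")` reports a single match starting at position 16. That text is an assumption, picked because it is consistent with the mismatch positions and the $T_{16} \ldots T_{22}$ match quoted in the walkthrough.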

Suffix Trie

A suffix trie of $T$ is a rooted tree that stores all of its suffixes (thus all the substrings)

Each node that corresponds to a string $ax$ has a special pointer called suffix link that leads to the node corresponding to $x$

Example (figure not recovered)

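Incremental Construction

Given the suffix trie for $T_1 \ldots T_n$

Then we append $T_{n+1} = a$ to $T$, creating necessary nodes

Start at the node $u$ corresponding to $T_1 \ldots T_n$

Create an $a$-transition to a new node $v$

Take the suffix link at $u$ to reach the node $u'$ corresponding to $T_2 \ldots T_n$

Create an $a$-transition to a new node $v'$

Create a suffix link from $v$ to $v'$

Repeat the previous process:

Take the suffix link at the current node

Make a new $a$-transition there

Create the suffix link from the previous node

Stop if the node already has an $a$-transition

Because from this point, all nodes that are reachable via suffix links already have an $a$-transition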
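The incremental procedure can be sketched as below. The names `TrieNode`, `SuffixTrie`, `append` and `contains` are hypothetical, `std::map` is used for transitions purely for brevity, and one detail is filled in as an assumption: when the walk stops at a node that already has an a-transition, the last new node gets its suffix link set to that existing child (or to the root if the walk fell off the root).

```cpp
#include <map>
#include <string>
#include <vector>

struct TrieNode {
    std::map<char, int> next;  // a-transitions, keyed by letter
    int link = -1;             // suffix link; stays -1 only at the root
};

struct SuffixTrie {
    std::vector<TrieNode> nodes;  // nodes[0] is the root (empty string)
    int deepest = 0;              // node corresponding to the whole text so far

    SuffixTrie() : nodes(1) {}

    void append(char a) {
        int cur = deepest, prev = -1;
        // Walk the suffix links, adding a-transitions until one already exists.
        while (cur != -1 && !nodes[cur].next.count(a)) {
            int v = nodes.size();
            nodes.emplace_back();
            nodes[cur].next[a] = v;
            if (prev != -1) nodes[prev].link = v;  // link from the previous new node
            prev = v;
            cur = nodes[cur].link;
        }
        if (prev != -1)
            nodes[prev].link = (cur == -1) ? 0 : nodes[cur].next[a];
        deepest = nodes[deepest].next[a];
    }

    // Walks from the root along the letters of P; true if P is a substring.
    bool contains(const std::string& P) const {
        int cur = 0;
        for (char c : P) {
            auto it = nodes[cur].next.find(c);
            if (it == nodes[cur].next.end()) return false;
            cur = it->second;
        }
        return true;
    }
};
```

Every non-root node corresponds to one distinct substring, so `nodes.size()` is a direct measure of the trie size. Building the trie for aba yields 6 nodes, and appending c adds 4 more.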

Construction Example

We want to add a new letter c (step-by-step figures not recovered)

Construction time is linear in the tree size

But the tree size can be quadratic in $n$

e.g., $T$ = aa…abb…b

To find $P$, start at the root and keep following the edges labeled with $P_1$, $P_2$, etc.

If you get stuck, $P$ does not occur in $T$

Suffix Array

Can be constructed in O(n) time (!)

If you want to see how it works, read the paper on the course website

http://cs97si.stanford.edu/suffix-array.pdf
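For orientation, here is a short sketch that builds a suffix array, assuming the usual definition: the starting positions of all suffixes of T, listed in lexicographic order of the suffixes. It uses prefix doubling with `std::sort`, which runs in O(n log² n), so it is not the O(n) construction mentioned above and is not necessarily the method in the linked paper. The function name `suffix_array` is an assumption.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Returns the 0-based start positions of the suffixes of T in sorted order.
std::vector<int> suffix_array(const std::string& T) {
    const int n = T.size();
    std::vector<int> sa(n), rnk(n), tmp(n);
    std::iota(sa.begin(), sa.end(), 0);
    for (int i = 0; i < n; i++) rnk[i] = (unsigned char)T[i];

    for (int len = 1;; len *= 2) {
        // Key of suffix i: (rank of its first len letters, rank of the next len).
        auto key = [&](int i) {
            return std::make_pair(rnk[i], i + len < n ? rnk[i + len] : -1);
        };
        std::sort(sa.begin(), sa.end(),
                  [&](int a, int b) { return key(a) < key(b); });

        // Re-rank by the first 2*len letters; equal keys share a rank.
        if (n > 0) tmp[sa[0]] = 0;
        for (int i = 1; i < n; i++)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rnk = tmp;

        // Stop once all ranks are distinct (or the key covers whole suffixes).
        if (n == 0 || rnk[sa[n - 1]] == n - 1 || len >= n) break;
    }
    return sa;
}
```

For the text banana this returns 5, 3, 1, 0, 4, 2, corresponding to the suffixes a, ana, anana, banana, na, nana.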

If a problem involves rotations of a string, consider concatenating it with itself and see if it helps

and the KMP matcher
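A small illustration of the concatenation trick, assuming the task is to decide whether one string is a rotation of another: every rotation of a string appears as a substring of the string concatenated with itself. The function name `is_rotation` is hypothetical, and `std::string::find` is used only for brevity; a linear-time matcher such as KMP could be substituted when a worst-case guarantee matters.

```cpp
#include <string>

// True if b can be obtained from a by moving some prefix of a to its end.
bool is_rotation(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    // Every rotation of a is a length-|a| substring of a + a.
    return (a + a).find(b) != std::string::npos;
}
```

For example, `is_rotation("abcde", "cdeab")` is true, while `is_rotation("abcde", "abced")` is false.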
