Term Paper Synopsis CSE 408 Algorithm

We then _nd a mismatch: P[4] 6= T[4]. : : : . we consider the following example: T = xyxxyxyxyyxyxyxyyxyxyxxy P = xyxyyxyxyxx At a high level. P[5] == T[17]. Then a mismatch is discovered: P[11] 6= T[16]. 1. 2. : : : . P[6] == T[18] and so on. 10] is matched with T[6. : : : . At a later point. The text and pattern are included in Figure 1. Based on the fact that we know T[6. Let's look at some examples of how sliding can be done. Based on our knowledge that P[1. 3] is successfully matched with T[1. 15] = P[1. the KMP algorithm is similar to the naive algorithm: it considers shifts in order from 1 to nm. the pattern slides to the right again. thus bypassing re-examination of previously matched characters. starting with the pattern aligned underneath the text at the leftmost end. Thus. we can tell that the _rst possible shift that might result in a match is 12. 15]. 11] = T[13. 23]. we repeatedly \slide" the pattern to the right and attempt to match it with the text. 3. to make it easier to follow. we will slide the pattern right. The next comparison is between P[2] and T[4]. Morris in 1977. with numbering. The di_erence is that the KMP algorithm uses information gleaned from partial matches of the pattern and text to skip over shifts that are guaranteed not to result in a match. : : : . Suppose that. : : : . 3]. 10] (and ignoring symbols of the pattern after position 10 and symbols of the text after position 15).The Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs. the word itself embodies sufficient information to determine where the next match could begin. but the three published it jointly. The algorithm was conceived by Donald Knuth and Vaughan Pratt and independently by James H. : : : . : : : . Therefore. : : : . : : : . 3]. Consider the situation when P[1. To illustrate the ideas of the algorithm. and next ask whether P[1. what can we deduce about where a potential match might be? In this case. the algorithm slides the pattern 2 positions to the right so that P[1] is lined up with T[3]. P[1. the next comparisons done are P[4] == T[16]. and ignoring symbols of the pattern and text after position 3. so that the next comparison is between P[1] and T[4]. 3] = T[1. Since P[2] 6= T[4]. and determines if the pattern matches at that shift. . : : : . as long as matches are found.

Also. The following notation is useful. : : : . each string of the form si : : : sk. A su_x s0 of s is a proper su_x if s0 6= s. q] is matched with the text T[iq+1.Sliding rule We need to make precise exactly how to implement the sliding rule. Let S = s1s2 : : : sk be a string. with the last symbol of this pre_x aligned at T[i]. i] and a mismatch then occurs: P[q+1] 6= T[i+1]. Then. q] that is also a su_x of P[1. Each string of the form s1 : : : si. Similarly. If _(q) is the number such that P[1. 1 _ i _ k is called a pre_x of s. Also. q] is now aligned with the text. _(q)] is . Suppose that P[1. : : : . slide the pattern right so that the longest possible proper pre_x of P[1. we de_ne the empty string (containing no symbols) to be a pre_x of s. : : : . 1 _ i _ k is called a su_x of s. the empty string (containing no symbols) is a su_x of s. : : : . A pre_x s0 of s is a proper pre_x if s0 6= s. : : : .