Knuth-Morris-Pratt algorithm



The Knuth-Morris-Pratt string-searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a "text string" S. It relies on the observation that when a mismatch occurs, the word itself contains enough information to determine where the next match could begin, so characters that have already been matched need not be examined again. Donald Knuth, Vaughan Pratt and J. H. Morris developed the algorithm independently, but published it jointly in 1977.

The KMP algorithm


Example of the search algorithm
To illustrate the algorithm in detail, we walk through one run of it. At any moment, the state of the algorithm is determined by two integers, m and i: m is the position in S where the current prospective match of W begins, and i is the index in W of the character currently being compared. At the start, the state is:

    m: 0
    S: ABC ABCDAB ABCDABCDABDE
    W: ABCDABD
    i: 0

We proceed by comparing successive characters of W with the corresponding characters of S, moving on to the next character while they match. S[0] and W[0] are both 'A', so we increase i:

    m: 0
    S: ABC ABCDAB ABCDABCDABDE
    W: ABCDABD
    i:  1

S[1] and W[1] are both 'B', so we increase i again:

    m: 0
    S: ABC ABCDAB ABCDABCDABDE
    W: ABCDABD
    i:   2

S[2] and W[2] are both 'C', so we increase i to 3:

    m: 0
    S: ABC ABCDAB ABCDABCDABDE
    W: ABCDABD
    i:    3

In the fourth step, however, S[3] is a space while W[3] = 'D', a mismatch. Rather than restarting the comparison at S[1], we observe that no 'A' occurs in S between positions 0 and 3 except at position 0. Since all of those characters have already been compared, we know there is no chance of a match beginning among them. We therefore move on to the next character, setting m = 4 and i = 0:

    m:     4
    S: ABC ABCDAB ABCDABCDABDE
    W:     ABCDABD
    i:     0

Continuing in the same way, we match the substring "ABCDAB", but at W[6] (against S[10]) we again find a mismatch. However, just before the end of this partial match we passed "AB", which could be the beginning of a new match, so we resume the comparison from there. Since these two characters are already known to match the two characters before the current position, we do not need to check them again: we set m = 8, i = 2 and continue matching from the current character:

    m:         8
    S: ABC ABCDAB ABCDABCDABDE
    W:         ABCDABD
    i:           2

This match fails immediately, and since W contains no space character (S[10] is a space), we move m to 11 and set i = 0:

    m:            11
    S: ABC ABCDAB ABCDABCDABDE
    W:            ABCDABD
    i:            0

Once again the two strings match on "ABCDAB", but the next character of S, 'C', does not match the 'D' in W. As before, we set m = 15 and i = 2 and continue comparing:

    m:                15
    S: ABC ABCDAB ABCDABCDABDE
    W:                ABCDABD
    i:                  2

This time we find a complete match, beginning at S[15].
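A minimal Python sketch can replay this run. The helper name `trace_alignments` is hypothetical. It follows the search loop given in the pseudocode section below and records the pair (m, i) each time a new alignment begins. It assumes the table T = [-1, 0, 0, 0, 0, 1, 2] for "ABCDABD", as derived in the table section.

```python
def trace_alignments(S: str, W: str, T: list[int]) -> list[tuple[int, int]]:
    """Replay the KMP search loop and record (m, i) whenever a new alignment starts."""
    m, i = 0, 0
    states = [(m, i)]  # the initial alignment
    while m + i < len(S):
        if W[i] == S[m + i]:
            i += 1
            if i == len(W):
                return states  # complete match found at position m
        else:
            # Mismatch: shift the alignment using the partial match table.
            m = m + i - T[i]
            i = T[i] if T[i] > -1 else 0
            states.append((m, i))
    return states  # S exhausted without a complete match


T = [-1, 0, 0, 0, 0, 1, 2]  # assumed table for "ABCDABD"
print(trace_alignments("ABC ABCDAB ABCDABCDABDE", "ABCDABD", T))
# [(0, 0), (3, 0), (4, 0), (8, 2), (10, 0), (11, 0), (15, 2)]
```

The trace also passes through m = 3 and m = 10, where the comparison with W[0] fails at once. The walkthrough above folds these steps into the moves to m = 4 and m = 11.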

The search algorithm and its pseudocode


We now assume the existence of a "partial match" table T, described below, which tells us where to resume matching after a mismatch. T is organized so that if a match starting at S[m] fails when comparing S[m + i] with W[i], the next possible match starts at index m + i - T[i] in S (T[i] is the amount of "backtracking" needed after a mismatch). Although the next possible match starts at m + i - T[i], we do not need to compare the T[i] characters after that position, as in the example above, so we simply continue comparing from W[T[i]]. We set T[0] = -1: if W[0] does not match, there is nothing to fall back on, and we move on to compare the next character of S. The following is sample pseudocode for the KMP search algorithm.

    algorithm kmp_search:
        input:
            an array of characters, S (the text to be searched)
            an array of characters, W (the word sought)
        output:
            an integer (the zero-based position in S at which W is found)
        define variables:
            an integer, m ← 0
            an integer, i ← 0
            an array of integers, T (the partial match table for W, computed separately)
        while m + i is less than the length of S, do:
            if W[i] = S[m + i],
                let i ← i + 1
                if i equals the length of W,
                    return m
            otherwise,
                let m ← m + i - T[i],
                if T[i] is greater than -1,
                    let i ← T[i]
                else
                    let i ← 0
        (if we reach here, S has been searched without finding W)
        return the length of S
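The pseudocode translates almost line for line into Python. This is a sketch, not a reference implementation: the function name `kmp_search` is our own, T is passed in precomputed (here hard-coded for "ABCDABD"), and, as in the pseudocode, the length of S is returned when W does not occur.

```python
def kmp_search(S: str, W: str, T: list[int]) -> int:
    """Return the zero-based index of the first occurrence of W in S, or len(S) if none.

    T is the partial match table for W (with T[0] == -1), computed separately.
    """
    m = 0  # start of the current candidate match in S
    i = 0  # index in W of the character being compared
    while m + i < len(S):
        if W[i] == S[m + i]:
            i += 1
            if i == len(W):
                return m
        else:
            # The first T[i] characters of W are already known to match at the new m.
            m = m + i - T[i]
            i = T[i] if T[i] > -1 else 0
    return len(S)  # sentinel meaning "not found", as in the pseudocode


T = [-1, 0, 0, 0, 0, 1, 2]  # table for "ABCDABD", hard-coded for this example
print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD", T))  # 15
```

Returning len(S) rather than, say, -1 keeps the sketch faithful to the pseudocode; a caller only has to check whether the result is less than len(S).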

Complexity of the search algorithm


Given the table T, the search portion of the Knuth-Morris-Pratt algorithm has complexity O(k), where k is the length of S. Apart from the fixed overhead of entering and leaving the function, all work is done in the while loop, so we bound the number of loop iterations. To do so we first need a property of T. By definition, T is built so that if a match beginning at S[m] fails when comparing S[m + i] with W[i], the next possible match begins at S[m + (i - T[i])]. In particular, the next possible match must begin at an index greater than m, so T[i] < i. From this we show that the loop executes at most 2k times. Each iteration takes one of the two branches of the loop. The first branch increases i and leaves m unchanged, so the index m + i of the character being examined in S increases. The second branch adds i - T[i] to m, which, as we have seen, is always positive, so m, the start of the current potential match, increases. Neither branch ever decreases m or m + i. The loop stops when m + i = k. Therefore each branch can be taken at most k times, because they increase m + i and m respectively, and m ≤ m + i: if m = k, then m + i ≥ k, and since m + i grows by at most one per iteration, it must have equaled k at some earlier point, at which the loop would already have stopped. Since the loop runs at most 2k times, the time complexity of the search algorithm is O(k).
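The two facts this argument rests on can be checked mechanically. The following sketch (the helper `search_steps` is hypothetical) runs the same loop, counts its iterations, and asserts after each one that neither m nor m + i has decreased and that at least one of them has grown.

```python
def search_steps(S: str, W: str, T: list[int]) -> int:
    """Count iterations of the KMP search loop, checking the facts used in the O(k) proof."""
    m, i, steps = 0, 0, 0
    while m + i < len(S):
        steps += 1
        old_m, old_pos = m, m + i
        if W[i] == S[m + i]:  # first branch: m + i grows by one
            i += 1
            if i == len(W):
                break  # complete match at position m
        else:  # second branch: m grows by i - T[i] > 0
            m = m + i - T[i]
            i = T[i] if T[i] > -1 else 0
        # Neither quantity decreases, and at least one of them increases.
        assert m >= old_m and m + i >= old_pos
        assert m > old_m or m + i > old_pos
    # m and m + i both stay within len(S), so there are at most 2k iterations.
    assert steps <= 2 * len(S)
    return steps


print(search_steps("ABC ABCDAB ABCDABCDABDE", "ABCDABD", [-1, 0, 0, 0, 0, 1, 2]))  # 26
```

On the example text (k = 23) the loop runs 26 times: 20 successful comparisons and 6 mismatches, well below the bound 2k = 46.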

The partial match table


The goal of the table is to let the algorithm avoid matching any character of S more than once. The key observation about the nature of a linear search that makes this possible is that, having compared a segment of the main string against an initial segment of the pattern, we know exactly at which positions before the current one a new match could begin. In other words, we "pre-search" the pattern itself and compile a list of all possible fallback positions that skip as many hopeless characters as possible without losing any potential match. For each position in W, we want the length of the longest initial segment of W that ends just before (not including) that position, excluding the whole segment from W[0]; this tells us how far back we can resume matching. Hence T[i] is the length of the longest proper suffix of W[0..i-1] (the part of W ending at W[i - 1]) that is also a prefix of W. By convention, the empty string has length 0. Since a mismatch at the very first character of the pattern leaves nothing to fall back on, we set T[0] = -1.
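The definition of T can be turned directly into a brute-force computation, which is handy for cross-checking small tables. The name `table_by_definition` is hypothetical, and this is deliberately not the KMP table-building algorithm: it tries every candidate length for each position, which takes up to cubic time.

```python
def table_by_definition(W: str) -> list[int]:
    """Partial match table computed directly from its definition (brute force).

    T[0] = -1; for i >= 1, T[i] is the length of the longest proper suffix of
    W[:i] (the part of W ending at W[i - 1]) that is also a prefix of W.
    """
    T = [-1] if W else []
    for i in range(1, len(W)):
        seen = W[:i]
        best = 0  # the empty string (length 0) always qualifies
        # Try the longest proper suffix first and stop at the first hit.
        for length in range(i - 1, 0, -1):
            if seen[i - length:] == W[:length]:
                best = length
                break
        T.append(best)
    return T


print(table_by_definition("ABCDABD"))  # [-1, 0, 0, 0, 0, 1, 2]
```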

Example of the table-building algorithm


Consider W = "ABCDABD". We will see that the table-building algorithm closely resembles the main search algorithm. We set T[0] = -1. To find T[1], we need a proper suffix of "A" that is also a prefix of W; "A" has no nonempty proper suffix, so T[1] = 0. Likewise, no proper suffix of "AB" or of "ABC" is a prefix of W, so T[2] = 0 and T[3] = 0. Next consider W[4], 'A'. It matches the first character W[0], but since T[i] is the length of the longest such segment ending at W[i - 1], this match is counted in T[5] rather than T[4]: T[4] = 0 and T[5] = 1. Similarly, W[5] matches W[1], so T[6] = 2. This gives the following table:
    i     |  0  1  2  3  4  5  6
    W[i]  |  A  B  C  D  A  B  D
    T[i]  | -1  0  0  0  0  1  2

A more complex example
For W = "PARTICIPATE IN PARACHUTE" (W[11] and W[14] are spaces):

    i     |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    W[i]  |  P  A  R  T  I  C  I  P  A  T  E     I  N     P  A  R  A  C  H  U  T  E
    T[i]  | -1  0  0  0  0  0  0  0  1  2  0  0  0  0  0  0  1  2  3  0  0  0  0  0

Pseudocode of the table-building algorithm


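The example above illustrates the general technique for building the table. The pseudocode is as follows:

    algorithm kmp_table:
        input:
            an array of characters, W (the word to be analyzed)
            an array of integers, T (the table to be filled)
        output:
            the array T
        define variables:
            an integer, pos ← 2 (the position in T currently being computed)
            an integer, cnd ← 0 (the index in W of the next character of the current candidate prefix)
        let T[0] ← -1, T[1] ← 0
        while pos is less than the length of W, do:
            (first case: the candidate prefix continues)
            if W[pos - 1] = W[cnd],
                let T[pos] ← cnd + 1, pos ← pos + 1, cnd ← cnd + 1
            (second case: it does not, but we can fall back to a shorter candidate)
            otherwise, if cnd > 0,
                let cnd ← T[cnd]
            (third case: no candidates are left; note that cnd = 0)
            otherwise,
                let T[pos] ← 0, pos ← pos + 1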
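A possible Python rendering of kmp_table follows. The guard for words shorter than two characters is an assumption added here, since the pseudocode sets T[1] unconditionally. Otherwise the three branches correspond one-to-one to the three cases of the pseudocode.

```python
def kmp_table(W: str) -> list[int]:
    """Build the partial match table T for W, following the kmp_table pseudocode."""
    if len(W) < 2:
        return [-1] * len(W)  # assumed convention: [] for "", [-1] for one character
    T = [-1, 0] + [0] * (len(W) - 2)
    pos = 2  # position in T currently being computed
    cnd = 0  # index in W of the next character of the current candidate prefix
    while pos < len(W):
        if W[pos - 1] == W[cnd]:  # first case: the candidate prefix continues
            T[pos] = cnd + 1
            pos += 1
            cnd += 1
        elif cnd > 0:  # second case: fall back to a shorter candidate
            cnd = T[cnd]
        else:  # third case: no candidate left (cnd == 0)
            T[pos] = 0
            pos += 1
    return T


print(kmp_table("ABCDABD"))  # [-1, 0, 0, 0, 0, 1, 2]
print(kmp_table("PARTICIPATE IN PARACHUTE"))
```

It reproduces both tables in the examples above.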

Complexity of the table-building algorithm


The table-building algorithm has complexity O(n), where n is the length of W. Apart from some initialization, all work is done in the while loop, so it suffices to show that the loop runs O(n) times, which we do by tracking pos and pos - cnd together. In the first case, pos - cnd is unchanged, since pos and cnd both increase by one. In the second case, cnd is replaced by T[cnd], which, as we saw above, is always smaller than cnd, so pos - cnd increases. In the third case, pos increases and cnd does not, so both pos and pos - cnd increase. Neither quantity ever decreases, and since pos ≥ pos - cnd, at every iteration either pos or a lower bound on pos increases. The algorithm terminates when pos = n, and both pos and pos - cnd start at 2 (pos = 2, cnd = 0) and never exceed n, so the loop runs at most 2n times. Therefore the table-building algorithm runs in O(n).
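This argument can also be checked mechanically. The sketch below (the helper `table_steps` is hypothetical) repeats the loop of kmp_table, counts its iterations, and asserts after each one that pos and pos - cnd have not decreased and that at least one of them has grown.

```python
def table_steps(W: str) -> int:
    """Count iterations of the table-building loop, checking the O(n) argument."""
    n = len(W)
    if n < 2:
        return 0  # the loop never runs
    T = [-1, 0] + [0] * (n - 2)
    pos, cnd, steps = 2, 0, 0
    while pos < n:
        steps += 1
        before = (pos, pos - cnd)
        if W[pos - 1] == W[cnd]:  # pos grows, pos - cnd unchanged
            T[pos] = cnd + 1
            pos += 1
            cnd += 1
        elif cnd > 0:  # cnd shrinks, so pos - cnd grows
            cnd = T[cnd]
        else:  # pos and pos - cnd both grow
            T[pos] = 0
            pos += 1
        after = (pos, pos - cnd)
        assert after[0] >= before[0] and after[1] >= before[1] and after != before
    # pos and pos - cnd start at 2 and never exceed n, hence at most 2n iterations.
    assert steps <= 2 * n
    return steps


for word in ["ABCDABD", "AAAAAAA", "AABAACAABAA", "PARTICIPATE IN PARACHUTE"]:
    print(word, len(word), table_steps(word))
```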

Complexity of the KMP algorithm


Since the two parts of the algorithm have complexities O(k) and O(n) respectively, the complexity of the whole algorithm is O(n + k). As the example above shows, the algorithm beats simpler string-matching algorithms because it can skip characters that have already been examined. The less it has to backtrack, the faster it runs, and this shows up in the table T as the presence of zeros. A word such as "ABCDEFG" suits the algorithm well because no prefix of it repeats, so its table is all zeros apart from the -1 at the start. Conversely, the algorithm performs poorly on W = "AAAAAAA", because its table is
    i     |  0  1  2  3  4  5  6
    W[i]  |  A  A  A  A  A  A  A
    T[i]  | -1  0  1  2  3  4  5

This is the worst possible pattern for T, and it can be exploited by a text such as S = "AAAAAABAAAAAABAAAAAAA". Here the algorithm tries to match each 'A' of W against every 'B' in S before giving up, which maximizes the number of steps: as the number of repetitions of "AAAAAAB" grows, the number of comparisons approaches nearly twice the length of S (13 comparisons for every 7 characters). Although the table-building routine is fast for this word (and gains nothing from it), it runs only once for a given W, whereas the search may run many times. If this W is searched for in strings like S each time, the overall efficiency is as poor as it can be. By comparison, this particular combination is a best case for the Boyer-Moore string-search algorithm. Note that in practice the KMP algorithm does not work well for searching natural-language text, because it can only skip characters when the beginning of the word matches part of the text, and in natural-language text this happens only occasionally. For example, consider how many times the string "text" appears in this paragraph.
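The worst case described above can be reproduced by counting character comparisons. The helper `count_comparisons` is hypothetical: it is the search loop with a counter added, returning the match position (or len(S)) together with the number of comparisons.

```python
def count_comparisons(S: str, W: str, T: list[int]) -> tuple[int, int]:
    """Return (match position or len(S), number of character comparisons made)."""
    m, i, comparisons = 0, 0, 0
    while m + i < len(S):
        comparisons += 1  # one comparison of W[i] with S[m + i]
        if W[i] == S[m + i]:
            i += 1
            if i == len(W):
                return m, comparisons
        else:
            m = m + i - T[i]
            i = T[i] if T[i] > -1 else 0
    return len(S), comparisons


W = "AAAAAAA"
T = [-1, 0, 1, 2, 3, 4, 5]  # the table for "AAAAAAA" shown above
print(count_comparisons("AAAAAABAAAAAABAAAAAAA", W, T))  # (14, 33)
print(count_comparisons("AAAAAAB" * 100, W, T))         # (700, 1300)
```

For the 21-character S the search makes 33 comparisons and finds W at position 14, since S ends with seven 'A's. For "AAAAAAB" repeated 100 times (700 characters, no match) it makes 1300 comparisons: each 'B' is compared against all seven positions of W.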

External links
An explanation of the algorithm [1] and sample C++ code [2] by David Eppstein
Knuth-Morris-Pratt algorithm [3]: description and C code by Christian Charras and Thierry Lecroq
Interactive animation for the Knuth-Morris-Pratt algorithm [4] by Mike Goodrich
Explanation of the algorithm from scratch [5] by FH Flensburg

References
Donald Knuth; James H. Morris, Jr; Vaughan Pratt (1977). "Fast pattern matching in strings" [6]. SIAM Journal on Computing 6 (2): 323-350. doi:10.1137/0206024.
Thomas H. Cormen; Charles E. Leiserson; Ronald L. Rivest; Clifford Stein (2001). "Section 32.4: The Knuth-Morris-Pratt algorithm". Introduction to Algorithms (Second ed.). MIT Press and McGraw-Hill. pp. 923-931. ISBN 978-0-262-03293-3.

Notes
[1] http://www.ics.uci.edu/~eppstein/161/960227.html
[2] http://www.ics.uci.edu/~eppstein/161/kmp/
[3] http://www-igm.univ-mlv.fr/~lecroq/string/node8.html
[4] http://www.ics.uci.edu/~goodrich/dsa/11strings/demos/pattern/
[5] http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
[6] http://citeseer.ist.psu.edu/context/23820/0

Article sources and contributors
Knuth-Morris-Pratt algorithm. Source: http://vi.wikipedia.org/w/index.php?oldid=5172762 Contributors: Ph phch, Pq, Tranh, Y Kpia Mlo, 1 anonymous edit

License
Creative Commons Attribution-Share Alike 3.0 Unported //creativecommons.org/licenses/by-sa/3.0/
