THAI NGUYEN UNIVERSITY
UNIVERSITY OF INFORMATION AND COMMUNICATION TECHNOLOGY
QUỲNH ANH
Digitized by the Learning Resource Center: http://www.lrc-tnu.edu.vn/
TABLE OF CONTENTS
LIST OF SYMBOLS AND ABBREVIATIONS
LIST OF FIGURES AND TABLES
INTRODUCTION
CHAPTER 1. STRING MATCHING
1.1. The concept of string matching
1.2. Historical development
1.3. Approaches
1.4. Applications of string matching
1.5. Types of string matching
1.5.1. Single-pattern matching
1.5.2. Multi-pattern matching
1.5.3. Extended pattern matching
1.5.4. Exact matching
1.5.5. Approximate matching
1.5.5.1. Problem statement
1.5.5.2. Approaches to approximate matching
1.5.5.3. Similarity between two strings
1.5. Some pattern-matching algorithms
1.5.1. The Brute Force algorithm
1.5.2. The Karp-Rabin algorithm
1.5.3. The BM (Boyer-Moore) algorithm
1.5.4. Other algorithms
1.6. String matching with finite automata
1.6.1. Finite automata
1.6.1.1. Deterministic finite automata (DFA)
1.6.1.2. Nondeterministic finite automata (NFA)
1.6.2. String-matching automata
1.6.2.1. Introduction
1.6.2.2. An algorithm for constructing the string-matching automaton
1.7. Chapter conclusion
CHAPTER 2. THE KNUTH-MORRIS-PRATT STRING MATCHING ALGORITHM
2.1. The KMP algorithm
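LIST OF SYMBOLS AND ABBREVIATIONS
DFA: Deterministic Finite Automaton
DOC: Document
FA: Finite Automaton
HTML: HyperText Markup Language
IDF: Inverse Document Frequency
KMP: Knuth-Morris-Pratt
LAN: Local Area Network
NFA: Nondeterministic Finite Automaton
TF: Term Frequency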
INTRODUCTION
1. Rationale for choosing the topic
Computers are now used in almost every field and make an important contribution to economic, social, scientific and technological development. Computers were created to serve specific human purposes. Among all the processing a computer performs to extract useful information, a particularly important problem is searching for information in large volumes of data, with high accuracy and in the shortest possible time.
With the spread of information technology, the number of electronic documents grows every day; the number of stored documents now reaches billions of pages. Mining this huge collection for the information they need has become a practical, everyday need of users. However, one of the difficulties people face is finding exactly the information they need in such a collection. To help with this task, search systems have been developed one after another to serve users' search needs.
Among the search systems that have been developed and deployed, the most widespread are keyword-based search systems. Many of them, such as Google, Bing and Yahoo!, work effectively on the Internet. However, most of these search tools are commercial products whose source code is kept secret. Desktop search systems such as Windows Search and Google Desktop meet part of users' needs and are free for personal use, but they work only on a small scale and are limited to keyword search over titles and summaries.
One effective approach to this problem is full-text matching and search. One of the classic string matching algorithms is the KMP algorithm. Although KMP is very efficient and accurate, it is still relatively unfamiliar and rarely used in Vietnam for managing, storing and processing large amounts of data. Based on this approach and the guidance of my supervisor, I chose the topic "String matching and the Knuth-Morris-Pratt algorithm".
2. Subject and scope of the research
- Concepts of string matching.
- Concepts of the KMP string matching algorithm.
- Some applications of the KMP algorithm.
3. Research directions
- Study the Knuth-Morris-Pratt search algorithm and its application to searching for information in text.
- Study technical solutions for implementing a test program.
4. Main contents
Besides the introduction, the conclusion, the table of contents and the list of references, the thesis consists of three chapters with the following content:
- Chapter 1: the concept of string matching, the approaches, the types of matching and some pattern-matching algorithms.
- Chapter 2: the KMP algorithm, the fuzzy KMP algorithm and the fuzzy KMP-BM algorithm.
- Chapter 3: the problem of searching for information in text, and the implementation and testing of the program.
5. Research methods
- Survey the published literature on information search algorithms,
String matching is widely used in many applications and fields, such as:
- The search function in text editors and web browsers.
- Search tools such as Google Search, Yahoo Search, etc.
- Molecular biology, for example matching patterns in DNA and protein sequences.
- Database matching.
- Noisy channels, where a certain level of error is acceptable.
- Matching patterns or traces of attacks, intrusions and malicious software.
- Network security and information security.
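Given a pattern string $P$ of length $m$, $P = P_1 P_2 \ldots P_m$, and a string $S$ of length $n$, $S = S_1 S_2 \ldots S_n$ ($S$ is usually long, for example a text document), both over the same alphabet $A$, find all occurrences of $P$ in $S$.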
Pattern-matching algorithms commonly use the notions of prefix, suffix and factor (substring) of a string, defined as follows. Given three strings x, y and z, we say that x is a prefix of the string xy, a suffix of the string yx, and a factor (or substring) of the string yxz.
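The three definitions can be checked mechanically. The following Python sketch tests them straight from the equations w = xy, w = yx and w = yxz; the function names are illustrative and not taken from the thesis.

```python
def is_prefix(x: str, w: str) -> bool:
    """x is a prefix of w if w = x y for some (possibly empty) y."""
    return w[:len(x)] == x


def is_suffix(x: str, w: str) -> bool:
    """x is a suffix of w if w = y x for some (possibly empty) y."""
    return len(x) <= len(w) and w[len(w) - len(x):] == x


def is_factor(x: str, w: str) -> bool:
    """x is a factor (substring) of w if w = y x z for some y, z."""
    return any(w[i:i + len(x)] == x for i in range(len(w) - len(x) + 1))


print(is_prefix("ab", "abc"), is_suffix("bc", "abc"), is_factor("ac", "abc"))  # True True False
```

In everyday Python the same checks are `w.startswith(x)`, `w.endswith(x)` and `x in w`; the explicit slices above only make the definitions visible.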
and the input string S =
If j > m then
    return s        // s is the position found
return false        // no position satisfies the condition
1.5.5. Approximate matching
string x into string y (computing it is somewhat involved). The larger the edit distance, the more the two strings differ (that is, the less similar they are), and vice versa. Edit distance is commonly used in spell checking and speech recognition. Depending on which edit operations are allowed, we obtain different kinds of edit distance, for example:
- Hamming distance: the only edit operation is character substitution.
- Levenshtein distance: the edit operations are insertion, deletion and substitution of characters.
- Damerau distance: the edit operations are insertion, deletion, substitution and transposition of adjacent characters.
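To illustrate the first two distances, here is a minimal Python sketch; `hamming` and `levenshtein` are hypothetical helper names, and the Levenshtein part uses the standard dynamic-programming recurrence, keeping only one row of the table at a time.

```python
def hamming(x: str, y: str) -> int:
    """Hamming distance: substitutions only, so x and y must have the same length."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def levenshtein(x: str, y: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    m, n = len(x), len(y)
    prev = list(range(n + 1))              # row 0: distance from "" to y[:j] is j
    for i in range(1, m + 1):
        cur = [i] + [0] * n                # column 0: distance from x[:i] to "" is i
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # delete x[i-1]
                         cur[j - 1] + 1,       # insert y[j-1]
                         prev[j - 1] + cost)   # substitute, or keep a matching character
        prev = cur
    return prev[n]


print(hamming("karolin", "kathrin"), levenshtein("kitten", "sitting"))  # 3 3
```

The Damerau distance would add a fourth case to the minimum for swapping two adjacent characters; it is omitted here to keep the sketch short.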
2) Longest common substring (or longest common factor): a string w is a substring or factor of a string x if x = uwv (u and v may be empty). A string w is a common factor of two strings x and y if w is a factor of both x and y. The longest common factor of x and y, denoted LCF(x, y), is a common factor of maximum length.
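A longest common factor can be found with a dynamic program over pairs of prefixes: cell (i, j) holds the length of the longest common suffix of x[:i] and y[:j], and LCF(x, y) is read off at the largest cell. This is a sketch of that standard formulation, not necessarily the method used in the thesis.

```python
def longest_common_factor(x: str, y: str) -> str:
    """Return one longest common factor (substring) of x and y."""
    best_len, best_end = 0, 0              # length of the best factor and where it ends in x
    prev = [0] * (len(y) + 1)              # prev[j]: longest common suffix of x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the common suffix by one character
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return x[best_end - best_len:best_end]


print(longest_common_factor("ABABC", "BABCA"))  # BABC
```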
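3) Longest common subsequence: a subsequence of a string x is a sequence of characters obtained by deleting zero, one or more characters from x. A common subsequence of two strings x and y is a subsequence of both x and y. A common subsequence of x and y of maximum length is called a longest common subsequence, LCS(x, y). With m = |x| and n = |y|, the length of the longest common subsequence gives the edit distance between x and y when only insertions and deletions are allowed (the Levenshtein distance without substitutions):
$$\mathrm{LevDistance}(x, y) = m + n - 2\,\lvert \mathrm{LCS}(x, y) \rvert$$
When substitutions are also allowed, this value is only an upper bound on the Levenshtein distance; for x = a and y = b it gives 2, while the Levenshtein distance is 1.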
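The formula can be checked with a small Python sketch that computes |LCS(x, y)| by the usual dynamic program and then applies m + n - 2|LCS(x, y)|; the function names are illustrative.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence (classic dynamic program)."""
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1           # use the matching pair x[i-1] = y[j-1]
            else:
                cur[j] = max(prev[j], cur[j - 1])  # drop x[i-1] or y[j-1]
        prev = cur
    return prev[len(y)]


def indel_distance(x: str, y: str) -> int:
    """Edit distance with insertions and deletions only: m + n - 2*|LCS(x, y)|."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


print(lcs_length("kitten", "sitting"), indel_distance("kitten", "sitting"))  # 4 5
```

For "kitten" and "sitting" the LCS is "ittn", so the insertion/deletion distance is 6 + 7 - 8 = 5, whereas the Levenshtein distance (with substitutions) is 3.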
until the whole text has been checked. The algorithm needs no preprocessing phase and no auxiliary arrays for the search. Its worst-case time complexity is O(n*m).
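function IsMatch(const X: string; m: integer;
                 const Y: string; p: integer): boolean;
{ True if X[1..m] occurs in Y starting at position p }
var i: integer;
begin
  IsMatch := false;
  Dec(p);                            { so that Y[p + i] lines up with X[i] }
  for i := 1 to m do
    if X[i] <> Y[p + i] then Exit;   { first mismatch: not an occurrence }
  IsMatch := true;
end;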
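For comparison with the Pascal IsMatch function, here is a sketch of the complete brute-force search in Python. It tries every alignment and compares characters left to right; positions are 0-based, whereas the Pascal code is 1-based.

```python
def brute_force_search(s: str, p: str) -> list[int]:
    """Return every 0-based position where pattern p occurs in text s."""
    n, m = len(s), len(p)
    positions = []
    for shift in range(n - m + 1):          # try each alignment of p against s
        i = 0
        while i < m and p[i] == s[shift + i]:
            i += 1                          # characters agree so far
        if i == m:                          # the whole pattern matched
            positions.append(shift)
    return positions


print(brute_force_search("abracadabra", "abra"))  # [0, 7]
```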
j := 1;
while j <= n - m do
begin
  if hx = hy then
    if IsMatch(X, m, Y, j) then Output(j);  {function IsMatch from the Brute Force section}
  hy := ((hy - Ord(Y[j])*dM) shl 1) + Ord(Y[j + m]);  {Rehash: drop Y[j], shift, add Y[j + m]}
  Inc(j);
end;
{last window, j = n - m + 1}
if hx = hy then
  if IsMatch(X, m, Y, j) then Output(j);
end;
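The Karp-Rabin idea can be sketched in Python as follows. This is a variant under stated assumptions, not the thesis's code: instead of the base-2 hash with `shl 1` used above, it uses base 256 reduced modulo a large prime (both parameters are arbitrary choices), and it still confirms every hash hit by a direct comparison, just as the Pascal code calls IsMatch.

```python
def karp_rabin(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return all 0-based positions of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    d_m = pow(base, m - 1, mod)            # weight of the leading character, like dM above
    hx = hy = 0
    for i in range(m):                     # hash of the pattern and of the first window
        hx = (hx * base + ord(pattern[i])) % mod
        hy = (hy * base + ord(text[i])) % mod
    found = []
    for j in range(n - m + 1):
        # equal hashes are only a hint; confirm with a direct comparison (like IsMatch)
        if hx == hy and text[j:j + m] == pattern:
            found.append(j)
        if j < n - m:                      # rehash: drop text[j], append text[j + m]
            hy = ((hy - ord(text[j]) * d_m) * base + ord(text[j + m])) % mod
    return found


print(karp_rabin("abracadabra", "abra"))  # [0, 7]
```

Reducing modulo a prime keeps the hash values bounded in any language; Pascal code that lets the shifted hash overflow relies on wrap-around instead.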
There are two possibilities for S_k:
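Thus, when $P_i \ne S_j$ while $P_{i+1}P_{i+2}\ldots P_m = S_{j+1}S_{j+2}\ldots S_{m+j-i}$, the shift $d_2[i]$ is defined as
$$d_2[i] = \min\{\, g + m - i \;:\; g \ge 1,\ (g \ge i \text{ or } P_{i-g} \ne P_i),\ \text{and } (g \ge k \text{ or } P_{k-g} = P_k) \text{ for all } i < k \le m \,\}$$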
Algorithm for computing the shift table d2
procedure ComputeD2();
begin
  for i := 1 to m do d2[i] := 2*m - i;      {largest possible shift: g = m}
  j := m; k := m + 1;
  while j > 0 do
  begin
    f[j] := k;         {f[j]: smallest k > j such that P[k..m] is a prefix of P[j..m]}
    while (k <= m) and (P[j] <> P[k]) do
    begin
      d2[k] := min(d2[k], m - j);
      k := f[k];
    end;
    j := j - 1; k := k - 1;
  end;
  for i := 1 to k do d2[i] := min(d2[i], m + k - i);
  j := f[k];
  while k <= m do
  begin
    while k <= j do
    begin
      d2[k] := min(d2[k], j - k + m);
      k := k + 1;
    end;
    j := f[j];
  end;
end;
    while (i > 0) and (S[j] = P[i]) do
      begin i := i - 1; j := j - 1; end;
    if i = 0 then
    begin
      counter := counter + 1;      {occurrence starting at position j + 1}
      j := j + m + 1;
    end
    else j := j + max(d1[S[j]], d2[i]);
  end;
  Record counter;
end;
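A compact Python sketch of the same search loop, keeping 1-based indices internally to mirror the Pascal code. Here d1[c] is m minus the rightmost position of c in P (m if c does not occur), and, as a simplification, d2[i] is computed directly from its definition (the smallest g + m - i with g ≥ 1 compatible with the matched suffix and with P_{i-g} ≠ P_i) in O(m^3) time instead of with the linear-time procedure above. Positions are returned 0-based; all names are illustrative.

```python
def bm_search(s: str, p: str) -> list[int]:
    """Boyer-Moore search with the d1 and d2 shifts; returns 0-based positions."""
    n, m = len(s), len(p)
    S, P = " " + s, " " + p          # padding makes S[1..n] and P[1..m] 1-based, as in Pascal
    # d1[c]: m minus the rightmost position of c in P (characters not in P give m)
    d1 = {c: m - pos for pos, c in enumerate(p, start=1)}
    # d2[i]: smallest g + m - i satisfying the good-suffix conditions (brute force, O(m^3))
    d2 = [0] * (m + 1)
    for i in range(1, m + 1):
        g = 1
        while not ((g >= i or P[i - g] != P[i]) and
                   all(g >= k or P[k - g] == P[k] for k in range(i + 1, m + 1))):
            g += 1                   # g = m always qualifies, so the loop stops
        d2[i] = g + m - i
    found = []
    j = m                            # text position currently aligned with P[m]
    while j <= n:
        i = m
        while i > 0 and S[j] == P[i]:  # compare from right to left
            i -= 1
            j -= 1
        if i == 0:                   # whole pattern matched; it starts at 1-based j + 1,
            found.append(j)          # which is 0-based position j
            j += m + 1               # slide the pattern one position to the right
        else:
            j += max(d1.get(S[j], m), d2[i])
    return found


print(bm_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE"))  # [17]
```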
q0 ∈ Q is the start state.
F ⊆ Q is the set of final states.
We can picture a DFA as a finite control: at each step it is in some state of Q and reads the next symbol a ∈ Σ of a string written on a tape.
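The finite-control view translates into a few lines of Python. The sketch assumes the usual five-tuple (Q, Σ, δ, q0, F) with a complete transition table; the `DFA` class and the even-number-of-1s example are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A deterministic finite automaton (Q, Σ, δ, q0, F)."""
    states: set[str]                    # Q
    alphabet: set[str]                  # Σ
    delta: dict[tuple[str, str], str]   # δ: (state, symbol) -> next state
    start: str                          # q0 ∈ Q
    finals: set[str]                    # F ⊆ Q

    def accepts(self, word: str) -> bool:
        """Read the word symbol by symbol and report whether the run ends in F."""
        q = self.start
        for a in word:
            if a not in self.alphabet:
                return False            # symbol outside Σ: reject
            q = self.delta[(q, a)]      # deterministic: exactly one next state
        return q in self.finals


# Example: binary strings containing an even number of 1s.
even_ones = DFA(
    states={"even", "odd"},
    alphabet={"0", "1"},
    delta={("even", "0"): "even", ("even", "1"): "odd",
           ("odd", "0"): "odd", ("odd", "1"): "even"},
    start="even",
    finals={"even"},
)
print(even_ones.accepts("1001"), even_ones.accepts("1101"))  # True False
```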
P[1..m] as follows:
Procedure COMPUTE-TRANSITION-FUNCTION(P, Σ)
Begin
  m := length(P);
  For q := 0 to m do
    For each character a ∈ Σ do
    Begin
      k := min(m + 1, q + 2);
      repeat k := k - 1
      until (P_k is a suffix of P_q a);
      δ(q, a) := k
    End;
End;
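The procedure above can be sketched in Python, together with the loop that runs the resulting automaton over a text. Both function names are hypothetical; the prefixes P_q are taken as Python slices `p[:q]`, and every text character is assumed to belong to the given alphabet Σ.

```python
def compute_transition_function(p: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][a] = largest k such that P_k (= p[:k]) is a suffix of P_q a."""
    m = len(p)
    delta = [{} for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            k = min(m + 1, q + 2)
            while True:                              # repeat k := k - 1 ...
                k -= 1
                if (p[:q] + a).endswith(p[:k]):      # ... until P_k is a suffix of P_q a
                    break
            delta[q][a] = k
    return delta


def automaton_match(s: str, p: str, alphabet: str) -> list[int]:
    """Feed s through the string-matching automaton; return 0-based start positions."""
    delta = compute_transition_function(p, alphabet)
    m, q, found = len(p), 0, []
    for i, c in enumerate(s):
        q = delta[q][c]              # exactly one transition per text character
        if q == m:                   # accepting state: an occurrence ends at s[i]
            found.append(i - m + 1)
    return found


print(automaton_match("abababacaba", "ababaca", "abc"))  # [2]
```

Building δ this way costs O(m^3 |Σ|) time, but the scan itself reads each text character exactly once.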
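Example: searching for W = "ABCDABD" in S = "ABC ABCDAB ABCDABCDABDE". Here m is the position in S where the current alignment of W starts and i is the position in W being compared, both counted from 0.
- m = 0: W[0..2] = "ABC" matches, but S[3] = ' ' differs from W[3] = 'D'. No 'A' occurs in S[1..3], so the search continues at m = 4, i = 0.
- m = 4: W[0..5] = "ABCDAB" matches, but S[10] = ' ' differs from W[6] = 'D'. Since "AB" is both a prefix and a suffix of "ABCDAB", the search continues at m = 8, i = 2, without comparing "AB" again.
- m = 8, i = 2: the matching attempt fails immediately, and W does not contain the character ' ', so m is increased to 11 and i is set to 0.
- m = 11: "ABCDAB" matches again, but S[17] = 'C' differs from W[6] = 'D'; as before, the search continues at m = 15, i = 2.
- m = 15, i = 2: the remaining characters match, so W occurs in S at position 15.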
i       0   1   2   3   4   5   6
W[i]    A   B   C   D   A   B   D
T[i]   -1   0   0   0   0   1   2

i      13  14  15  16  17  18  19  20  21  22  23
W[i]    N   ␣   P   A   R   A   C   H   U   T   E
T[i]    0   0   0   1   2   3   0   0   0   0   0
(case one: the current candidate substring continues)
if W[pos - 1] = W[cnd], let T[pos] ← cnd + 1, pos ← pos + 1, cnd ← cnd + 1
(case two: it does not continue, but we can fall back)
otherwise, if cnd > 0, let cnd ← T[cnd]
(case three: we have run out of candidates; note that cnd = 0)
otherwise, let T[pos] ← 0, pos ← pos + 1
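The three cases translate directly into Python. The sketch below builds T with T[0] = -1 and then uses it for the search traced in the example with S = "ABC ABCDAB ABCDABCDABDE"; positions are 0-based, as in that example. After a complete match it reuses the mismatch rule at the last position so that overlapping occurrences are not missed, which is one possible design choice and not necessarily the thesis's.

```python
def kmp_table(w: str) -> list[int]:
    """Partial-match table T (T[0] = -1) built with the three cases above."""
    if not w:
        raise ValueError("the pattern must be non-empty")
    t = [0] * len(w)
    t[0] = -1
    pos, cnd = 2, 0                 # pos: entry being filled; cnd: candidate border length
    while pos < len(w):
        if w[pos - 1] == w[cnd]:    # case one: the candidate border extends
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:               # case two: fall back to a shorter candidate
            cnd = t[cnd]
        else:                       # case three: no candidate left (cnd = 0)
            t[pos] = 0
            pos += 1
    return t


def kmp_search(s: str, w: str) -> list[int]:
    """Return every 0-based position m at which w occurs in s."""
    t = kmp_table(w)
    found = []
    m = i = 0                       # m: start of the current alignment, i: index in w
    while m + i < len(s):
        if w[i] == s[m + i]:
            if i < len(w) - 1:
                i += 1              # keep comparing
                continue
            found.append(m)         # last character matched: w occurs at m
        # mismatch at w[i], or a completed match: shift using T
        if t[i] > -1:
            m, i = m + i - t[i], t[i]
        else:
            m, i = m + 1, 0
    return found


print(kmp_table("ABCDABD"))                                 # [-1, 0, 0, 0, 0, 1, 2]
print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))     # [15]
```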
i       0   1   2   3   4   5   6
W[i]    A   A   A   A   A   A   A
T[i]   -1   0   1   2   3   4   5
is required to match the pattern as described above.
A fuzzy pattern-matching automaton is a tuple A(P) = (A, Q, q0, δ, F), where:
The input alphabet is A = A_P ∪ {#}
for i := 0 to m do
  for t := 0 to k-1 do
  begin
    if i = m then j := next[i+1]
    else j := i+1;
    while (j > 0) and (P[j] ...
[Table: next[i] and the transition function of the automaton, listed by state q ∈ Q and input symbol in A]
Search trace over text positions j = 10 to 21: occurrences are recorded at j = 11, 16 and 21, that is, at starting positions 11 - 8 + 1 = 4, 16 - 8 + 1 = 9 and 21 - 8 + 1 = 14.
[Figure: comparing P_i with S_j; on a match i and j are both increased, and on a mismatch i is replaced by next[i]]
Example: P = aababaab, with a mismatch at j = 3, i = 3.
First shift: i = next[i] = 2.
Second shift: i = next[i] = 0.
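The next[] table used in this example can be computed in Knuth's style, where next[i] also skips positions that would fail again on the same character. The Python sketch below keeps 1-based indices and adds an entry next[m + 1] (computed with a sentinel that matches no character) so that the search can continue after a full match; the names `knuth_next` and `knuth_search` are illustrative.

```python
def knuth_next(p: str) -> list[int]:
    """next[1..m+1] in Knuth's optimized form (index 0 unused)."""
    m = len(p)
    P = [None, *p, None]            # 1-based; P[m + 1] is a sentinel equal to no character
    nxt = [0] * (m + 2)             # nxt[1] = 0
    j, t = 1, 0                     # invariant: t is the fallback position for j
    while j <= m:
        while t > 0 and P[j] != P[t]:
            t = nxt[t]              # try a shorter border
        t += 1
        j += 1
        # if P[j] = P[t], falling back to t would fail on the same character, so skip further
        nxt[j] = nxt[t] if P[j] == P[t] else t
    return nxt


def knuth_search(s: str, p: str) -> list[int]:
    """All 0-based occurrences of a non-empty p in s, scanning s once with next[]."""
    m = len(p)
    nxt = knuth_next(p)
    P = [None, *p]
    found, j = [], 1                # j: pattern position (1-based) to compare next
    for k, c in enumerate(s, start=1):
        while j > 0 and c != P[j]:  # mismatch: move the pattern using next[j]
            j = nxt[j]
        j += 1
        if j > m:                   # P[1..m] ends at text position k
            found.append(k - m)     # 1-based start k - m + 1, i.e. 0-based k - m
            j = nxt[m + 1]          # continue, allowing overlapping occurrences
    return found


print(knuth_next("aababaab")[1:])  # [0, 0, 2, 0, 2, 0, 0, 2, 4]
```

The values next[3] = 2 and next[2] = 0 agree with the two shifts in the example above.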
... = TFuzz(..._{j-1}, S_j). This statement ...
No.  Pattern P    Size of S     T_KMP    T_Fuzzy-KMP
1)   aababcab     1,400 KB      17% s    11% s
2)   MDSVF6V      140,000 KB    35 s     30 s
3)   bacabccaa    1,200 KB      16% s    10% s
4)   S068FAB50    140,000 KB    37 s     30 s
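[Figure: text S with the current pointer ptr, the matched prefix P(n1) of the pattern, and the look-ahead position ptr + m - n1]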
2.4.2. Fuzzy pattern-matching automaton
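Each state is a pair q = (n1, n2) with n2 ∈ N and 0 ≤ n1 ≤ m. For q ∈ Q_s and an observed character block w, the next state q' = δ(q, w) is computed as follows:
1. If n2 > 1, set n1 = 0.
2. ...
3. ... ∈ A_P, then n2 = 1; otherwise n2 = 1 + m - n1.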
while ... n do
  j := j + qold.n2;
  Read the observed character block w; {w1 = Sj}
  {Compute q = δ(qold, w)}
  if qold.n2 > 1 then qold.n1 := 0; endif;
  q.n1 := TFuzz(qold.n1, w1);
  q.n2 := 1;
  if q.n1 = m then
    Record the occurrence position j - m + 1;
    counter := counter + 1;
  else if (q.n1 < m) and (q.n1 < qold.n1) then
    if w[1 + m - q.n1] ...
    endif;
  endif;
  qold := q;
end while;
2.3.3. Search algorithm
procedure GFSearching(); {search for the pattern}
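  j := j + n2;
  if n2 > 1 then n1 := 0;
  n1New := TFuzz[n1, index[S[j]]];
  n2New := 1;
  if n1New = m then
  begin
    counter := counter + 1;
    apr[counter] := j - m + 1;              {record the occurrence}
  end
  else if (n1New < m) and (n1New <= n1) then
  begin
    if j + m - n1New > n then return;       {no occurrence can fit in the rest of S}
    {index k stands for #, a character outside A_P: jump past it}
    if index[S[j + m - n1New]] = k then n2New := 1 + m - n1New;
  end;
  n1 := n1New; n2 := n2New;
end;
end;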
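The look-ahead idea in GFSearching can be illustrated with a Python sketch. Assumption: TFuzz is replaced by the ordinary deterministic string-matching automaton over the pattern alphabet, with every character outside A_P treated as # and sending the automaton to state 0; the fuzzy construction itself is not reproduced, and all names are hypothetical. The skip rests on one observation: every occurrence that is not yet ruled out and starts no later than position j + m - n1 must cover that position, so if S[j + m - n1] is not in A_P, the scan can resume right after it from state 0.

```python
def build_delta(p: str) -> list[dict[str, int]]:
    """delta[q][c] = length of the longest prefix of p that is a suffix of p[:q] + c."""
    m = len(p)
    delta = []
    for q in range(m + 1):
        row = {}
        for c in set(p):                      # only symbols of A_P get explicit entries
            k = min(m, q + 1)
            while not (p[:q] + c).endswith(p[:k]):
                k -= 1
            row[c] = k
        delta.append(row)                     # a missing key plays the role of '#': state 0
    return delta


def lookahead_search(s: str, p: str) -> list[int]:
    """0-based occurrences of p in s, skipping ahead past characters outside A_P."""
    m, n = len(p), len(s)
    delta, alphabet = build_delta(p), set(p)
    S = " " + s                               # 1-based text, S[1..n]
    found = []
    j, n1, n2 = 0, 0, 1                       # n1: automaton state, n2: step to the next j
    while j + n2 <= n:
        j += n2
        if n2 > 1:                            # we jumped over a character outside A_P
            n1 = 0
        n1_new = delta[n1].get(S[j], 0)       # '#' (not in A_P) always leads to state 0
        n2_new = 1
        if n1_new == m:
            found.append(j - m)               # 1-based start j - m + 1
        elif n1_new <= n1:                    # no progress: inspect position j + m - n1_new
            look = j + m - n1_new
            if look > n:
                break                         # no occurrence can fit in the rest of s
            if S[look] not in alphabet:
                n2_new = 1 + m - n1_new       # resume right after that character
        n1, n2 = n1_new, n2_new
    return found


print(lookahead_search("aabxaabaab", "aab"))  # [0, 4, 7]
```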
}
- Function that builds the comparison table:
public static int[] BuildTable(string p)
{
int[] result = new int[p.Length];
result[0] = 0;
for (int i = 1; i < p.Length - 1; i++)
{
// The prefix of p from p[0] to p[i]
string s = p.Substring(0, i + 1);
- Search function
private static int SearchKMP(int[] x, string s)
{
int n = s.Length;
int l = x.Length;
int find = 0;
Char[] charPattern = pattern.ToCharArray();
for (int i = 0; i < n; )
{
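The C# BuildTable listing above is cut off after its first lines. Below is a minimal Python sketch of what it appears to compute, assuming result[i] is the length of the longest proper prefix of p[0..i] that is also a suffix of it (an inference from result[0] = 0 and the Substring(0, i + 1) call). Unlike the printed C# loop, which stops at i = p.Length - 2, this sketch fills every entry; its simple (cubic-time) substring comparison mirrors the naive approach rather than the linear-time KMP construction.

```python
def build_table(p: str) -> list[int]:
    """result[i] = length of the longest proper prefix of p[0..i] that is also its suffix."""
    result = [0] * len(p)
    for i in range(1, len(p)):
        s = p[:i + 1]                   # the prefix p[0..i], like Substring(0, i + 1) in C#
        for k in range(i, 0, -1):       # try the longest proper border first
            if s[:k] == s[-k:]:
                result[i] = k
                break
    return result


print(build_table("ABCDABD"))  # [0, 0, 0, 0, 1, 2, 0]
```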
CONCLUSION
Evaluation of the results:
During the research, the thesis has achieved the following results:
- Introduced the basic concepts of string matching, the approaches, the types of matching and some pattern-matching algorithms.
- Presented the KMP algorithm, the fuzzy KMP algorithm and the fuzzy KMP-BM algorithm.
- Implemented the KMP algorithm in C# on the Windows operating system and tested it by searching for several keyword phrases in stored text files.
Limitations:
- The test program is still simple. It only supports searching in a few basic formats (doc, ppt, xls, html and txt) and does not yet support pdf, docx, xlsx or pptx.
- The program only searches the local machine; it does not yet support searching over a LAN or the Internet.
Future work:
Building on these results, the author proposes the following next steps:
- Continue to address the remaining issues in the test program, such as input data handling, and build a friendlier, easier-to-use interface.