
Design and Analysis of Algorithms: Course Project

Topic 8: The Longest Common Substring Problem

Student

: L Ngc Minh

Class

: Computer Science

Cohort

: 52

Student ID

: 20071946

Hanoi, November 2010

Contents
Chapter 1. Introduction .... 3
Chapter 2. Suffix Trees .... 4
2.1. Definition .... 4
2.2. Representing a Suffix Tree in Memory .... 5
2.3. Ukkonen's Suffix Tree Construction Algorithm .... 6
2.3.1. Implicit Suffix Trees .... 6
2.3.2. Building the Implicit Suffix Tree .... 7
2.3.3. Suffix Links .... 9
2.3.4. Speeding Up the Algorithm with Suffix Links .... 9
2.3.5. Building the Implicit Suffix Tree in Linear Time .... 12
2.3.6. Building the True Suffix Tree .... 13
Chapter 3. Generalized Suffix Trees .... 14
3.1. Definition .... 14
3.2. Representing a Generalized Suffix Tree in Memory .... 14
3.3. Building a Generalized Suffix Tree in Linear Time .... 15
Chapter 4. The Longest Common Substring Problem .... 16
4.1. Definitions .... 16
4.2. Longest Common Substring of Two Strings .... 16
4.3. Longest Common Substring of More Than Two Strings .... 16
4.3.1. Computing C(v) .... 17
Chapter 5. Test Program .... 18
5.1. Experimental Results .... 18
5.2. Installation Guide .... 19
5.3. User Guide .... 19
5.3.1. Visualizing Suffix Trees .... 19
5.3.2. Finding the Longest Common Substring of Large Strings .... 20
5.3.3. Generating Large Strings .... 21
5.4. Building from Source .... 21
5.5. Source Code .... 27
Appendix A. References .... 28
Appendix B. List of Figures .... 29
Appendix C. Glossary .... 30

Chapter 1. Introduction


Finding the longest common substring using suffix trees

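Chapter 2. Suffix Trees
2.1. Definition
Definition: The suffix tree T of an m-character string S is a rooted directed tree with the following properties:

The paths from the root to the leaves are in one-to-one correspondence with the suffixes of S.

Every internal node, other than the root, has at least two children.

Every edge is labeled with a nonempty substring of S.

No two edges out of the same node have labels beginning with the same character.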

For example, the string xabxac has the suffix tree shown in Figure 1. The path from the root to leaf 1 spells the whole string S = xabxac, and the path from the root to leaf 5 spells ac, the suffix that starts at position 5 of S.
The definition does not guarantee that such a tree exists for every string S. If a suffix of S is also a prefix of another suffix, the path that spells it from the root does not end at a leaf. For example, if the final c is removed from S, the suffix xa is a prefix of the suffix xabxa, so no root-to-leaf path corresponds to xa.
To make sure a suffix tree can always be built, a special character, called the terminal character, is appended to the end of S so that no suffix is a prefix of another suffix. This character must not occur in the original string; it is usually chosen from outside the alphabet and is conventionally written $.
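The need for a terminal character can be checked mechanically. The C sketch below is only an illustration (the function name has_nested_suffix is hypothetical and not part of the report's program): it tests by brute force, in O(m^3) time, whether some suffix of a string is a prefix of a longer suffix, which is exactly the situation in which no suffix tree exists.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if some suffix of s is a prefix of a
 * longer suffix of s (so s has no suffix tree without a terminal
 * character), 0 otherwise. Brute force, O(m^3); illustration only. */
int has_nested_suffix(const char *s)
{
    assert(s != NULL);
    size_t m = strlen(s);

    for (size_t i = 1; i < m; i++) {        /* shorter suffix s[i..m-1] */
        size_t len = m - i;
        for (size_t j = 0; j < i; j++) {    /* longer suffix s[j..m-1]  */
            if (strncmp(s + j, s + i, len) == 0)
                return 1;
        }
    }
    return 0;
}
```

With the strings from the text, has_nested_suffix("xabxa") returns 1 (xa is a prefix of xabxa), while has_nested_suffix("xabxac") and has_nested_suffix("xabxa$") both return 0.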
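Some notions related to suffix trees:

The label of a path starting at the root is the string obtained by concatenating, in order, the edge labels along the path.

The path-label of a node is the label of the path from the root to that node.

The string-depth of a node is the length of its path-label.

The node-depth of a node is the number of nodes on the path from the root to that node.

The label of a path that ends in the middle of an edge (u, v), splitting the label of (u, v) at some point, is the path-label of u followed by the characters of the label of (u, v) from u up to the split point.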

Figure 1: Suffix tree of the string xabxac


For example, in Figure 1 the path-label of w is xa, the path-label of u is a, and xabx is the label of the path from the root that ends in the middle of the edge (w, 1).

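2.2. Representing a Suffix Tree in Memory

In the simplest representation, every edge is labeled explicitly with a substring of S. The total length of these substrings can grow as Θ(m^2) (for example, when all characters of S are distinct), so any construction algorithm that writes the labels out explicitly needs Ω(m^2) time in the worst case.
To reduce memory use and obtain a faster algorithm, each edge is instead represented by the start and end indices of its label in S. Each edge then needs only two numbers, and since the tree has at most 2m - 1 edges, the memory required is O(m).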

Figure 2: The tree on the left is part of the suffix tree for S = abcdefabcuvw, with edge labels written explicitly. The tree on the right represents the labels with index pairs. Note that the edge labeled 2,3 could equally be labeled 8,9.
For simplicity, the figures and explanations below still treat edge labels as substrings of the string.
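A minimal C sketch of the index-pair representation, assuming 1-based inclusive indices as in Figure 2; the type EdgeLabel and the functions label_length and label_text are illustrative names, not the report's code. It shows that the labels 2,3 and 8,9 of Figure 2 denote the same string, while the tree itself stores only two integers per edge.

```c
#include <assert.h>
#include <string.h>

/* Edge label stored as a pair of 1-based, inclusive indices into S
 * (Figure 2, right) instead of a copy of the substring. */
typedef struct {
    int start;  /* index of the first character of the label in S */
    int end;    /* index of the last character of the label in S  */
} EdgeLabel;

/* Length of the label in characters: O(1), nothing is copied. */
int label_length(EdgeLabel e)
{
    return e.end - e.start + 1;
}

/* Copy the label of e out of S into buf (capacity cap, including '\0').
 * Only needed for display; the tree itself never stores the text. */
void label_text(const char *S, EdgeLabel e, char *buf, size_t cap)
{
    size_t len = (size_t)label_length(e);
    assert(e.start >= 1 && (size_t)e.end <= strlen(S) && len < cap);
    memcpy(buf, S + e.start - 1, len);
    buf[len] = '\0';
}
```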

2.3. Ukkonen's Suffix Tree Construction Algorithm

This report approaches Ukkonen's algorithm by first presenting a simple algorithm that is easy to understand but inefficient, and then speeding it up step by step with additional observations and implementation tricks.

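2.3.1. Implicit Suffix Trees

Definition: The implicit suffix tree of a string S is the tree obtained from the suffix tree of S$ by the following steps:
1. Remove every terminal character $ from the edge labels.
2. Remove every edge that is left without a label.
3. Remove every non-root internal node that is left with fewer than two children, merging its incoming and outgoing edges.
For example, the string xabxa$ has the suffix tree shown in Figure 3. Without the terminal $, some suffixes of xabxa are prefixes of other suffixes, so some edges and nodes must be removed, which gives the implicit suffix tree shown in Figure 4. For the string xabxac, on the other hand, the character c occurs only once, at the end of the string, so the implicit suffix tree is also the suffix tree.
Ukkonen's algorithm has two steps:
1. Build the implicit suffix tree of S.
2. Convert the implicit suffix tree into the true suffix tree.

Figure 3: Suffix tree for the string xabxa$

Figure 4: Implicit suffix tree for the string xabxa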

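2.3.2. Building the Implicit Suffix Tree

The algorithm consists of m phases; each phase adds one more character of S to the tree. Let I_i denote the tree after phase i. We start with the tree I_1, which has a single edge labeled S[1]. Phase i+1 is divided into i+1 extensions; extension j appends the character S[i+1] to the suffix S[j..i] of S[1..i]. In extension j, find the end of the path from the root labeled β = S[j..i] and apply one of the following three extension rules:
1. β ends at a leaf: the character S[i+1] is appended to the label of the edge leading to that leaf.
2. No path from the end of β starts with S[i+1], but at least one path continues from it: add a new edge labeled S[i+1] leading to a new leaf; if β ends in the middle of an edge, a new internal node must also be created at that point.
3. Some path from the end of β already starts with S[i+1]: do nothing and move on to the next extension.
In extension i+1 of phase i+1, β is the empty string, and the algorithm simply adds the character S[i+1] below the root (unless it is already there).
Consider the example in Figures 5 and 6: the first four suffixes end at leaves, and the last suffix, which consists of the single character x, ends inside an edge. When the sixth character b is added, the first four suffixes are extended by rule 1, the fifth suffix uses rule 2, and the sixth (empty) suffix uses rule 3.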
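To make the three rules concrete, here is a C sketch of the naive phase/extension loop. To stay short it uses an uncompressed suffix trie (one character per edge) instead of a compressed implicit suffix tree, so rule 1 ("extend a leaf") becomes adding one node below a leaf; the way each extension is classified is the same. All names (naive_build, MAX_NODES, the fixed node pool) are assumptions of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Naive construction on an uncompressed suffix trie (one character per
 * edge), used only to illustrate the three extension rules. */
#define MAX_NODES 4096

typedef struct {
    int first_child;   /* first node in the child list, -1 if none    */
    int next_sibling;  /* next node in the parent's child list, or -1 */
    char ch;           /* character on the edge into this node        */
} TrieNode;

static TrieNode pool[MAX_NODES];
static int n_nodes;

static int find_child(int v, char c)
{
    for (int u = pool[v].first_child; u != -1; u = pool[u].next_sibling)
        if (pool[u].ch == c)
            return u;
    return -1;
}

static void add_child(int v, char c)
{
    assert(n_nodes < MAX_NODES);
    pool[n_nodes] = (TrieNode){ -1, pool[v].first_child, c };
    pool[v].first_child = n_nodes++;
}

/* Run every phase on s. On return, rules[1..3] hold how many times each
 * extension rule was applied in the last phase. */
void naive_build(const char *s, int rules[4])
{
    int m = (int)strlen(s);

    pool[0] = (TrieNode){ -1, -1, '\0' };    /* the root */
    n_nodes = 1;

    for (int i = 0; i < m; i++) {            /* phase i+1 adds s[i]       */
        rules[1] = rules[2] = rules[3] = 0;
        for (int j = 0; j <= i; j++) {       /* extension for s[j..i-1]    */
            int v = 0;
            for (int k = j; k < i; k++) {    /* walk down: O(|beta|)       */
                v = find_child(v, s[k]);
                assert(v != -1);             /* all suffixes are present   */
            }
            if (find_child(v, s[i]) != -1) {
                rules[3]++;                  /* rule 3: nothing to do      */
            } else if (v != 0 && pool[v].first_child == -1) {
                add_child(v, s[i]);          /* rule 1: extend a leaf      */
                rules[1]++;
            } else {
                add_child(v, s[i]);          /* rule 2: branch a new leaf  */
                rules[2]++;
            }
        }
    }
}
```

Running naive_build("axabxb", r) leaves r[1] = 4, r[2] = 1 and r[3] = 1, the counts described for Figures 5 and 6. The inner walk is the O(|β|) cost that makes this version cubic.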

Figure 5: Implicit suffix tree for the string axabx before the sixth character, b, is added

Figure 6: The implicit suffix tree after the character b is added


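Clearly, the end of each of the i+1 suffixes of S[1..i] can be found by walking down from the root in O(|β|) time. Once the end of a suffix has been found, the extension itself takes constant time.
With this approach, since extension j of phase i+1 walks |S[j..i]| = i - j + 1 characters, the number of basic operations is

$$\sum_{i=1}^{m-1}\sum_{j=1}^{i+1}(i-j+1) = \sum_{i=1}^{m-1}\frac{i(i+1)}{2} = O(m^3),$$

so the time complexity is O(m^3). This algorithm is clearly impractical, since there are simpler ways to find the longest common substring in O(m^2) time in the worst case. The following sections show how to improve it.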

2.3.3. Suffix Links
Definition: Let xα be an arbitrary string, where x is a single character and α is a (possibly empty) string. If an internal node v has path-label xα and another node s(v) has path-label α, a pointer from v to s(v) is called a suffix link.
In Figure 1 (page 5), node w has path-label xa and node u has path-label a, so there is a suffix link from w to u. Here α consists of a single character.
In the special case where α is empty, the link points to the root. The root itself is not considered an internal node, and no suffix link starts from it.
Lemma: If a new internal node v with path-label xα is added to the current tree in extension j of phase i+1, then either the path labeled α already ends at an internal node of the tree, or a new internal node with path-label α will be created in the next extension, j+1, of the same phase i+1.
PROOF: A new internal node is created only when extension rule 2 applies, that is, when the path labeled xα ends inside an edge and continues with a single character different from S[i+1]; call that character c. In extension j+1 we add the string αS[i+1] to the tree. Clearly the path labeled α exists and continues with c. There are two cases:

If the path labeled α continues only with c and with no other character (in other words, the path αc exists but there is no path αd for any d ≠ c), then extension rule 2 applies and creates the node s(v) with path-label α.

If the path labeled α continues with two or more characters, for example c, d, ..., then a node with path-label α must already exist.
This proves the lemma.

Corollary: In Ukkonen's algorithm, every newly created internal node has a suffix link from it by the end of the next extension.
PROOF: The claim is proved by induction.

The first tree, I_1, satisfies the claim because it has no internal nodes.

Assume the claim holds for I_i; we show that it still holds for I_{i+1}.

By the lemma above, if an internal node v is created in extension j, its node s(v) is found or created in extension j+1. The last extension of a phase adds the string consisting of the single character S[i+1] below the root, so it creates no new internal node.

This proves the corollary. Since every internal node of I_{i+1} has a suffix link once that tree has been built, we obtain the next corollary.
Corollary: In any implicit suffix tree I_i, if an internal node has path-label xα, then I_i has another node with path-label α.
Suffix links make it possible to speed the algorithm up to a more practical complexity.

2.3.4. Speeding Up the Algorithm with Suffix Links

In the first algorithm, extension j of phase i+1 has to find the path labeled β = S[j..i], which takes O(|β|) time. Suffix links can be used to greatly reduce the cost of this step.

The path labeled S[1..i] must end at a leaf, because it is the longest path in I_i. While building I_i we keep a pointer to the leaf corresponding to the whole current string S[1..i]. The first extension of phase i+1 goes to this leaf and applies extension rule 1, so it takes only constant time.
Let S[1..i] = xα and let (v, 1) be the edge leading to that leaf, with edge label γ. In the next extension the algorithm must find the path labeled S[2..i] = α. If v is the root, we walk down from the root as in the previous algorithm. If v is not the root, there is a suffix link from v to a node s(v), and we start walking from s(v): the path labeled γ starting at s(v) ends exactly where the path labeled α starting at the root ends.
Extension j with j > 2 works in the same way. The only difference is that if S[j-1..i] is the path-label of an internal node, we follow that node's suffix link directly. Otherwise, when S[j-1..i] ends at a leaf or in the middle of an edge, the node just above that point is either an internal node (and therefore has a suffix link) or the root. Hence we never have to walk up more than one edge.

Figure 7: Extension j > 1 in phase i. Walk up at most one edge from the end of the path S[j-1..i] to node v, follow the suffix link to s(v), walk down the path labeled γ, and apply the appropriate extension rule to add the suffix S[j..i+1].
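Walking down the path labeled γ in the usual way takes O(|γ|) time. However, the path labeled γ is known to exist, and no two edges leaving a node have labels that start with the same character. We can therefore compare only the first character of each edge label and use the edge's length to skip over it, which finds the end of the path in time proportional to the number of nodes on the path rather than to |γ|.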
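The walk-down idea (often called the skip/count trick) can be sketched in C as follows. The tree is hand-built for S = xabxac (Figure 1) with 0-based inclusive edge labels; the node numbering, the Locus type and skip_count are illustrative assumptions, not the report's code. Only one character is compared per edge, so the cost is proportional to the number of edges traversed.

```c
#include <assert.h>

#define MAX_CHILD 4

typedef struct {
    int start, end;          /* label S[start..end] of the incoming edge */
    int child[MAX_CHILD];
    int n_child;
} Node;

static const char *S = "xabxac";

/* Hand-built suffix tree of S (Figure 1). Leaf k spells the suffix
 * starting at 1-based position k. */
static const Node T[] = {
    { -1, -1, { 1, 2, 3, 4 }, 4 },  /* 0: root            */
    {  0,  1, { 5, 6 }, 2 },        /* 1: w, label "xa"   */
    {  1,  1, { 7, 8 }, 2 },        /* 2: u, label "a"    */
    {  2,  5, { 0 }, 0 },           /* 3: leaf 3 "bxac"   */
    {  5,  5, { 0 }, 0 },           /* 4: leaf 6 "c"      */
    {  2,  5, { 0 }, 0 },           /* 5: leaf 1          */
    {  5,  5, { 0 }, 0 },           /* 6: leaf 4          */
    {  2,  5, { 0 }, 0 },           /* 7: leaf 2          */
    {  5,  5, { 0 }, 0 },           /* 8: leaf 5          */
};

/* End of a path: if off == 0 it ends exactly at `node`; otherwise it
 * ends `off` characters down the edge entering `node`. */
typedef struct { int node, off; } Locus;

/* Follow the path labeled S[g..g+len-1] from node v. The path must exist
 * (as gamma does in Ukkonen's algorithm): only the first character of
 * each edge is compared, and whole edges are skipped by their length. */
Locus skip_count(int v, int g, int len)
{
    while (len > 0) {
        int next = -1;
        for (int k = 0; k < T[v].n_child; k++) {
            int c = T[v].child[k];
            if (S[T[c].start] == S[g]) { next = c; break; }
        }
        assert(next != -1);                 /* the path must exist */
        int elen = T[next].end - T[next].start + 1;
        if (len < elen)
            return (Locus){ next, len };    /* ends inside this edge */
        g += elen;                          /* skip the whole edge   */
        len -= elen;
        v = next;
    }
    return (Locus){ v, 0 };                 /* ends exactly at node v */
}
```

For example, from the root the path xab ends one character into the edge leading to leaf 1 (node 5 in this numbering), and the path xa ends exactly at w (node 1). The trick is valid only because the path is known to exist; for an arbitrary pattern every character must still be compared.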
Theorem: Let (v, s(v)) be a suffix link. Then the node-depth of v is at most one greater than the node-depth of s(v).
PROOF: Every internal ancestor of v with path-label xβ has a suffix link to a node with path-label β. Since xβ is a prefix of the path-label of v, β is a prefix of the path-label of s(v); in other words, each ancestor of v has a suffix link to an ancestor of s(v). The ancestors of v have distinct path-labels (edge labels are nonempty), so the ancestors of s(v) that their suffix links point to are distinct as well. This gives an injection from the nodes on the path from the root to v, excluding the root (the size of this set is given by the node-depth of v), into the nodes on the path from the root to s(v) (the size of this set is the node-depth of s(v)). Therefore the node-depth of v is at most the node-depth of s(v) plus 1.

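Figure 8: For every node v on the path xα there is a node s(v) on the path α. However, the node-depth of v may be one greater than, equal to, or one less than the node-depth of s(v). For example, the node labeled xab has node-depth two and the node labeled ab has node-depth one; the node labeled xabcdefg has node-depth four and the node labeled abcdefg has node-depth five.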
Lemma: Each phase of Ukkonen's algorithm can be carried out in O(m) time.
PROOF: Apart from the first extension, which takes constant time, each extension walks up at most one node from the current position, follows a suffix link to a new node, and walks down some number of nodes to find the end of the path. The total number of down-walking steps can be bounded using the following remark: the depth of a suffix tree is at most m. Indeed, every edge is labeled with a nonempty substring and every root-to-leaf path spells a suffix, so a root-to-leaf path has at most m edges, that is, every node has node-depth at most m. A phase has i+1 ≤ m extensions and, apart from the first one, each walks up at most one node, so a phase walks up at most m times. Following a suffix link decreases the node-depth by at most one, so over a phase the node-depth decreases by at most 2m in total. Since the node-depth of the current node never exceeds m, the total number of down-walking steps in a phase is at most 3m. Hence each phase takes O(m) time.
Since there are m phases, we immediately obtain the next lemma:
Lemma: Ukkonen's algorithm can be carried out in O(m^2) time.
This is a considerable improvement, but a few further small observations reduce the cost to amortized constant time per phase, that is, to linear time overall.

2.3.5. Building the Implicit Suffix Tree in Linear Time
Observation 1: Rule 3 ends every phase.
PROOF: If rule 3 applies in extension j, the path labeled S[j..i] continues with the character S[i+1], so every path labeled S[k..i] with k > j does too, and rule 3 applies in all the remaining extensions of the phase.
Let j* (j*_i for phase i) be the index of the first extension in which rule 3 applies. By the observation above, the extensions k > j* need not be performed in the current phase.
Observation 2: Once a leaf, always a leaf.
PROOF: None of the three extension rules adds a new node below a leaf with label β. Therefore, once a leaf has been created, it remains a leaf until the last phase of the algorithm ends.
Let j_i be the last extension of phase i in which rule 1 or rule 2 is applied. By Observation 1, each of the extensions 1..j_i of phase i applies rule 1 (extending a leaf) or rule 2 (creating a new leaf), so for every k ≤ j_i the path labeled S[k..i] ends at a leaf. Consequently, in the next phase i+1, extensions 1..j_i all apply rule 1, and therefore j_i ≤ j_{i+1}.
The observation above suggests an efficient implementation: instead of updating the labels of leaf edges explicitly, every leaf edge is given the label (p, m) (recall that in the implementation an edge label is a pair of integers), where p is the start position of the substring and m, the length of S, stands in for the end position of the string currently being processed. As a result, in phase i+1 the first j_i extensions need not be performed explicitly.
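A minimal C sketch of this idea, assuming the common variant in which every leaf edge points to one shared end variable (its value plays the role of the end position that the report writes as m); the names Label, label_len and start_phase are illustrative. Advancing that single variable performs rule 1 on every leaf at once.

```c
#include <assert.h>

/* Edge label (start, *end), 0-based and inclusive. An internal edge points
 * to its own fixed end; every leaf edge points to the same shared leaf_end,
 * so one assignment extends all leaves (rule 1) at once. */
typedef struct {
    int start;
    const int *end;
} Label;

int label_len(Label e)
{
    assert(*e.end >= e.start - 1);   /* length is never negative */
    return *e.end - e.start + 1;
}

/* Start of the phase that adds the character at 0-based position pos. */
void start_phase(int *leaf_end, int pos)
{
    *leaf_end = pos;
}
```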
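Theorem: Using suffix links and the observations above, Ukkonen's algorithm builds the implicit suffix tree in O(m) time.
PROOF: In each phase i, only the explicit extensions from j_{i-1}+1 to j*_i need to be performed (if rule 3 does not apply in phase i, take j*_i = i + 1). Since the last extension in which rule 1 or 2 applies is the one just before rule 3 applies for the first time, j_i = j*_i - 1. The number of explicit extensions over all phases is therefore

$$\sum_{i=1}^{m}\left(j^*_i - j_{i-1}\right) = \sum_{i=1}^{m}\left(j_i - j_{i-1} + 1\right) = j_m - j_0 + m \le 2m,$$

with j_0 = 0. Moreover, the current node-depth carries over from one explicit extension to the next, even across phases, so the node-depth argument of the previous lemma bounds the total down-walking work of the whole run by O(m). Hence the running time of the algorithm is O(m).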

Figure 9: The execution of the algorithm. Each row is one phase, and each number is an explicit extension that is performed.

2.3.6. Building the True Suffix Tree

The final implicit suffix tree I_m can be converted into the true suffix tree in O(m) time: we simply append the character $ to the end of S and continue the algorithm for one more phase. The result is the implicit suffix tree of a string in which no suffix is a prefix of another suffix, which is also the true suffix tree of that string.
In summary:
Theorem: Ukkonen's algorithm builds the suffix tree of an m-character string S in O(m) time.
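Putting the pieces together, the following compact C sketch follows the widely used "active point" formulation of Ukkonen's algorithm (active node, active edge, active length and a count of remaining suffixes). It is one common way of implementing the phases, extension rules, suffix links, skip/count and shared leaf end described above; it is not the report's cs-suffix-tree code. Assumptions: a byte alphabet, inputs of at most MAXLEN characters, and a last character that occurs only once (such as a terminal $), so that the result is a true suffix tree.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 1024               /* longest input this sketch accepts       */
#define MAXN   (2 * MAXLEN + 2)   /* an implicit suffix tree has <= 2m nodes */

static const char *text;          /* the string being indexed                */
static int n_nodes, leaf_end;     /* leaf_end: shared end of all leaf edges  */
static int start_[MAXN], end_[MAXN], link_[MAXN], is_leaf[MAXN];
static int next_[MAXN][256];      /* child by first character, -1 if none    */
static int active_node, active_edge, active_len, remaining;

static int new_node(int start, int end, int leaf)
{
    assert(n_nodes < MAXN);
    int v = n_nodes++;
    start_[v] = start; end_[v] = end; is_leaf[v] = leaf;
    link_[v] = 0;                 /* suffix links default to the root        */
    memset(next_[v], -1, sizeof next_[v]);
    return v;
}

static int edge_len(int v) { return (is_leaf[v] ? leaf_end : end_[v]) - start_[v] + 1; }
static int ch(int i) { return (unsigned char)text[i]; }

/* One phase: add text[pos], performing the pending extensions. */
static void extend(int pos)
{
    int last_new = -1;            /* internal node awaiting a suffix link    */
    leaf_end = pos;               /* rule 1 on every leaf at once            */
    remaining++;
    while (remaining > 0) {
        if (active_len == 0)
            active_edge = pos;
        int nxt = next_[active_node][ch(active_edge)];
        if (nxt == -1) {          /* rule 2 at a node: add a new leaf        */
            next_[active_node][ch(active_edge)] = new_node(pos, 0, 1);
            if (last_new != -1) { link_[last_new] = active_node; last_new = -1; }
        } else {
            if (active_len >= edge_len(nxt)) {             /* skip/count */
                active_edge += edge_len(nxt);
                active_len -= edge_len(nxt);
                active_node = nxt;
                continue;
            }
            if (ch(start_[nxt] + active_len) == ch(pos)) { /* rule 3 */
                if (last_new != -1 && active_node != 0) { link_[last_new] = active_node; last_new = -1; }
                active_len++;
                break;            /* rule 3 ends the phase (Observation 1)   */
            }
            /* rule 2 inside an edge: split it and hang a new leaf */
            int split = new_node(start_[nxt], start_[nxt] + active_len - 1, 0);
            next_[active_node][ch(active_edge)] = split;
            next_[split][ch(pos)] = new_node(pos, 0, 1);
            start_[nxt] += active_len;
            next_[split][ch(start_[nxt])] = nxt;
            if (last_new != -1) link_[last_new] = split;
            last_new = split;
        }
        remaining--;
        if (active_node == 0 && active_len > 0) {
            active_len--;
            active_edge = pos - remaining + 1;
        } else if (active_node != 0) {
            active_node = link_[active_node];          /* follow suffix link */
        }
    }
}

/* Build the tree of s; the last character of s should occur only once. */
void build_suffix_tree(const char *s)
{
    int m = (int)strlen(s);
    assert(m <= MAXLEN);
    text = s;
    n_nodes = 0;
    active_node = 0; active_edge = -1; active_len = 0; remaining = 0;
    new_node(-1, -1, 0);          /* node 0 is the root                      */
    for (int i = 0; i < m; i++)
        extend(i);
}

/* Return 1 if p occurs in the indexed string (walk down from the root). */
int contains(const char *p)
{
    int v = 0, i = 0, n = (int)strlen(p);
    while (i < n) {
        int c = next_[v][(unsigned char)p[i]];
        if (c == -1)
            return 0;
        for (int k = 0; k < edge_len(c) && i < n; k++, i++)
            if (text[start_[c] + k] != p[i])
                return 0;
        v = c;
    }
    return 1;
}
```

For S = xabxac the sketch produces 9 nodes (the root, the internal nodes with path-labels xa and a, and six leaves), matching Figure 1.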

Chapter 3. Generalized Suffix Trees


3.1. Definition

The previous sections built a suffix tree for a single string step by step. To solve the longest common substring problem for two or more strings, the notion of a suffix tree must be extended so that a single data structure holds several strings.
Definition: Given a set of strings {S1, S2, ..., SK}, a generalized suffix tree for the set is a tree such that:

The paths from the root to the leaves are in one-to-one correspondence with the suffixes of the strings Si.

Every internal node, other than the root, has at least two children.

Every edge is labeled with a nonempty substring of one of the strings Si.

No two edges out of the same node have labels beginning with the same character.

To distinguish the suffixes of different strings, each string gets its own distinct terminal character that is not in the alphabet. Each leaf of the tree corresponds to a suffix of one particular string and is labeled with the index of that string. Figure 10 shows an example: the generalized suffix tree of {xabxa, abxbx}.

Figure 10: Suffix tree with suffix links for the two strings xabxa and abxbx

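3.2. Representing a Generalized Suffix Tree in Memory

As with an ordinary suffix tree, an efficient way to represent the edges of a generalized suffix tree is to store the start and end indices of each substring instead of the substring itself. Because a generalized suffix tree contains more than one string, each edge must also store the index of the string that contains the substring it represents. Figure 11 shows how the generalized suffix tree for xabxa and abxbx is laid out in memory.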

Figure 11: Generalized suffix tree for the two strings xabxa and abxbx with suffix links, as stored in memory (indices start at 0)
Leaf labels are unnecessary, because the string a leaf belongs to is determined by the single edge leading into it.
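A minimal C sketch of such an edge label, assuming 0-based indices as in Figure 11 plus one string index per edge; GEdgeLabel and glabel_text are hypothetical names, not the report's code.

```c
#include <assert.h>
#include <string.h>

/* Edge label in a generalized suffix tree: which string the substring
 * comes from, plus 0-based inclusive indices into that string. */
typedef struct {
    int str;     /* index of the string (0-based) */
    int start;   /* first character of the label  */
    int end;     /* last character of the label   */
} GEdgeLabel;

/* Copy the label of e into buf (capacity cap, including '\0'). */
void glabel_text(const char *const strs[], GEdgeLabel e, char *buf, size_t cap)
{
    size_t len = (size_t)(e.end - e.start + 1);
    assert(e.start >= 0 && (size_t)e.end < strlen(strs[e.str]) && len < cap);
    memcpy(buf, strs[e.str] + e.start, len);
    buf[len] = '\0';
}
```

The same pair of positions thus names different text depending on the string index: (1, 1, 3) is bxb in abxbx, whereas (0, 1, 3) is abx in xabxa.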

3.3. Building a Generalized Suffix Tree in Linear Time

Using Ukkonen's algorithm from the previous chapter, a generalized suffix tree can easily be built in O(N) time, where N is the total length of the strings.
First, build an ordinary suffix tree for S1. For each of the strings S2, S3, ..., SK, first find the longest prefix Sk[1..i] that already occurs in the tree, then run phases i+1, i+2, ..., m_k of Ukkonen's algorithm (where m_k is the length of Sk) to extend the generalized suffix tree so that it covers the whole string.
In more detail, finding the longest prefix present in the tree means finding the longest path from the root labeled Sk[1..i], by scanning the characters one by one along the path from the root. Two cases arise:
1. The path ends at a node v (possibly the root): add a new child of v, joined to it by an edge labeled Sk[i+1].
2. The path ends in the middle of an edge: split the edge at the point where the path ends, creating a new node v, and add a child of v joined to it by an edge labeled Sk[i+1].
Once this is done, the first extension of phase i+1 is complete, and the following extensions are performed by moving to the parent of v, following suffix links, and so on. Note that in case 2 we must also make sure that the suffix link of v is set in the next extension.
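The scan for the longest prefix of Sk already in the tree, and the choice between the two cases, can be sketched in C as below. The tree is a hand-built implicit suffix tree of S1 = xabxa (Figure 4, terminal characters omitted for brevity); all names are illustrative. Unlike the skip/count walk, every character is compared here, because the path is not known to exist.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILD 4

typedef struct {
    int start, end;          /* label S1[start..end] of the incoming edge */
    int child[MAX_CHILD];
    int n_child;
} Node;

static const char *S1 = "xabxa";

/* Hand-built implicit suffix tree of S1 (Figure 4): three leaf edges. */
static const Node T[] = {
    { -1, -1, { 1, 2, 3 }, 3 },   /* 0: root         */
    {  0,  4, { 0 }, 0 },         /* 1: leaf "xabxa" */
    {  1,  4, { 0 }, 0 },         /* 2: leaf "abxa"  */
    {  2,  4, { 0 }, 0 },         /* 3: leaf "bxa"   */
};

typedef struct {
    int len;    /* i: length of the longest prefix of Sk found in the tree  */
    int node;   /* node at, or just below, the end of that path             */
    int off;    /* 0: path ends at `node` (case 1); otherwise it ends `off`
                   characters down the edge entering `node` (case 2)        */
} PrefixEnd;

/* Scan sk character by character from the root. */
PrefixEnd longest_prefix(const char *sk)
{
    PrefixEnd r = { 0, 0, 0 };
    size_t n = strlen(sk);
    while ((size_t)r.len < n) {
        if (r.off == 0) {                        /* at a node: pick a child */
            int next = -1;
            for (int k = 0; k < T[r.node].n_child; k++) {
                int c = T[r.node].child[k];
                if (S1[T[c].start] == sk[r.len]) { next = c; break; }
            }
            if (next == -1)
                break;                           /* case 1: ends at a node  */
            r.node = next;
            r.off = 1;
        } else {                                 /* inside an edge          */
            if (S1[T[r.node].start + r.off] != sk[r.len])
                break;                           /* case 2: ends mid-edge   */
            r.off++;
        }
        r.len++;
        if (r.off == T[r.node].end - T[r.node].start + 1)
            r.off = 0;                           /* reached the node itself */
    }
    return r;
}
```

For S2 = abxbx the scan matches abx and stops inside the edge abxa, so case 2 applies with i = 3: that edge is split after abx and a new leaf edge labeled b is attached.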

Chapter 4. The Longest Common Substring Problem


4.1. Definitions

A substring of a string S is a string obtained by taking a run of consecutive characters of S. Formally, if S = S1S2...Sm, then a string Z = S_{i+1}S_{i+2}...S_{i+t} with 0 ≤ i and i+t ≤ m is a substring of S. For example, Z = bcda is a substring of S = aabcbcdabdab.
Substrings must be distinguished from subsequences. A subsequence of S is obtained by deleting some characters of S; in other words, it is a sequence of characters that occur in S in the same order but not necessarily consecutively. For example, bcbd is a subsequence of S but not a substring of S.

Given two strings S1 and S2, we say that Z is a common substring of S1 and S2 if Z is a substring of both strings.
For example, for S1 = abcdefg and S2 = bccdegf:

Z = bc is a common substring.

The string efg is not a common substring.

Z = bc, of length 2, is not the longest common substring, because cde is a common substring of length 3.

The string cde is the longest common substring, because there is no common substring of length 4.
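The examples above can be checked with a short C program. It uses plain dynamic programming (not the suffix-tree method of this report), one of the simpler quadratic approaches alluded to in Section 2.3.2, together with a subsequence test that contrasts the two notions; the function names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* 1 if p is a subsequence of s (same order, gaps allowed), else 0. */
int is_subsequence(const char *p, const char *s)
{
    for (; *s != '\0' && *p != '\0'; s++)
        if (*s == *p)
            p++;
    return *p == '\0';
}

/* Longest common substring by dynamic programming. len[j] holds the length
 * of the longest common suffix of a[0..i] and b[0..j-1]; a single row is
 * kept and updated right to left. Writes the substring to out (capacity
 * cap) and returns its length. O(|a|*|b|) time, O(|b|) extra memory. */
size_t lcs_dp(const char *a, const char *b, char *out, size_t cap)
{
    size_t n = strlen(a), m = strlen(b), best = 0, best_end = 0;
    size_t *len = calloc(m + 1, sizeof *len);
    assert(len != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = m; j >= 1; j--) {
            len[j] = (a[i] == b[j - 1]) ? len[j - 1] + 1 : 0;
            if (len[j] > best) {
                best = len[j];
                best_end = i + 1;   /* match is a[best_end-best .. best_end-1] */
            }
        }
    }
    free(len);
    assert(best < cap);
    memcpy(out, a + best_end - best, best);
    out[best] = '\0';
    return best;
}
```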

4.2. Longest Common Substring of Two Strings

In the generalized suffix tree of S1 and S2, mark an internal node with 1 (or 2) if its subtree contains a leaf labeled 1 (or 2). The path-label of every node marked with both 1 and 2 is a common substring of the two strings, and the node with the longest path-label, that is, the greatest string-depth, gives the solution.
For example, in Figure 10 (page 14) the nodes u, v, w, t correspond to the common substrings bx, abx, a, x of the two strings. The longest common substring is abx, corresponding to node v. The figure also shows that once an internal node whose subtree contains leaves of both S1 and S2 has been found, its ancestors need not be examined, because the paths to its ancestors are necessarily shorter than the path to that node. In this example, examining node w is unnecessary.
A single depth-first search (DFS) can traverse the tree and compute the string-depth of every node in O(|V| + |E|) time. Since the tree has O(N) nodes and edges, where N is the total length of the two strings, this gives a linear-time algorithm for the longest common substring of two strings.
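The two-string method can be sketched in C as below. To keep the sketch short it uses an uncompressed generalized suffix trie (O(N^2) nodes) instead of a Ukkonen tree, and it records a mark at the node where each suffix ends rather than at a leaf; the DFS that ORs the marks upward and the choice of the deepest node carrying both marks follow the text. The names, the ASCII alphabet and the MAX_NODES limit are assumptions.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 4096

static int child[MAX_NODES][128];  /* child by character, -1 if none         */
static int end_mark[MAX_NODES];    /* bit b set: a suffix of string b ends here */
static int depth[MAX_NODES];       /* string-depth of the node               */
static int parent[MAX_NODES];
static char in_ch[MAX_NODES];      /* character on the edge into the node    */
static int n_nodes;

static int new_node(int p, char c)
{
    assert(n_nodes < MAX_NODES);
    int v = n_nodes++;
    memset(child[v], -1, sizeof child[v]);
    end_mark[v] = 0;
    parent[v] = p;
    in_ch[v] = c;
    depth[v] = (v == 0) ? 0 : depth[p] + 1;
    return v;
}

/* Insert every suffix of s, marking the node where it ends with bit. */
static void add_suffixes(const char *s, int bit)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        int v = 0;
        for (size_t k = i; s[k] != '\0'; k++) {
            int c = (unsigned char)s[k];
            assert(c < 128);
            if (child[v][c] == -1)
                child[v][c] = new_node(v, (char)c);
            v = child[v][c];
        }
        end_mark[v] |= bit;
    }
}

/* Post-order DFS: returns the marks found in v's subtree and keeps in
 * *best the deepest node whose subtree carries both marks (1 | 2 == 3). */
static int dfs(int v, int *best)
{
    int m = end_mark[v];
    for (int c = 0; c < 128; c++)
        if (child[v][c] != -1)
            m |= dfs(child[v][c], best);
    if (m == 3 && depth[v] > depth[*best])
        *best = v;
    return m;
}

/* Longest common substring of s1 and s2, written to out (capacity cap). */
size_t lcs_trie(const char *s1, const char *s2, char *out, size_t cap)
{
    n_nodes = 0;
    new_node(0, '\0');                  /* root */
    add_suffixes(s1, 1);
    add_suffixes(s2, 2);
    int best = 0;
    dfs(0, &best);
    size_t len = (size_t)depth[best];
    assert(len < cap);
    out[len] = '\0';
    for (int v = best; v != 0; v = parent[v])
        out[--len] = in_ch[v];          /* spell the path-label backwards */
    return (size_t)depth[best];
}
```

With the strings of Figure 10, lcs_trie("xabxa", "abxbx", ...) yields abx, the same answer as node v in the text.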

4.3. Longest Common Substring of More Than Two Strings

Definition: Given K strings S1, S2, ..., SK, for each k from 2 to K let l(k) be the length of the longest substring common to at least k of the given strings.
For example, for the set of strings {sandollar, sandlot, handler, grand, pantry}, the values of l(k) are:

k    l(k)    substring
2    4       sand
3    3       and
4    3       and
5    2       an

There is an algorithm that finds all values l(k), together with the corresponding substrings, in linear time O(N) [7]. It is remarkable that so much information can be extracted in time proportional to the time needed just to read the strings. That algorithm is complex, however, so this report only presents an algorithm with O(KN) time complexity.
Definition: For each internal node v of the tree T, let C(v) be the number of distinct string indices that appear at the leaves of the subtree rooted at v.
Once C(v) is known for every internal node, l(k) for all k = 2, 3, ..., K, together with the positions of the corresponding substrings, is easily computed in linear time by traversing the tree. During the traversal we build an array V in which V(k) holds the string-depth (and the substring) of the deepest node with C(v) = k. Once built, V(k) holds the longest substring that occurs in exactly k of the given strings, so V(k) ≤ l(k). To obtain l(k), scan V from the end towards the start and record the largest value seen so far; in other words, if V(k) is empty or V(k) < V(k+1), set V(k) = V(k+1). The result is the required array l(k).
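The final right-to-left pass over V is a running maximum. A minimal C sketch, assuming that V[k] = 0 stands for an empty entry and that arrays are indexed so that entries 2..K are used; v_to_l is an illustrative name. For the example above, the deepest nodes with C(v) = 2, 3, 4, 5 have string-depths 4 (sand), 1 (the string l, found only in sandollar, sandlot and handler), 3 (and) and 2 (an), and the pass turns these into the l(k) values of the table.

```c
#include <assert.h>

/* Turn V (V[k] = string-depth of the deepest node with C(v) == k, 0 if
 * none) into l (l[k] = length of the longest substring common to at least
 * k strings) by a right-to-left running maximum over k = K..2. */
void v_to_l(const int V[], int l[], int K)
{
    assert(K >= 2);
    l[K] = V[K];
    for (int k = K - 1; k >= 2; k--)
        l[k] = (V[k] > l[k + 1]) ? V[k] : l[k + 1];
}
```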

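4.3.1. Computing C(v)
The simplest way to compute C(v) is to set C(v) = 0 for every internal node and then traverse the whole tree K times; in pass k, find every internal node v whose subtree contains a leaf with string index k and increase C(v) by one.
Each pass takes O(N) time, since the number of nodes in the tree grows linearly with N, so the K passes take O(KN) time in total. This is also the time complexity of the whole algorithm for finding the longest common substrings of K strings.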

Chapter 5. Test Program


5.1. Experimental Results
Figure 12 shows that the time needed to build the suffix tree grows linearly with the length of the string.

Figure 12: Suffix tree construction time versus string size. Size in units of 4 KB, time in seconds.
Figure 13 shows the time needed to solve the k-common-substring problem for 2, 3, 4 and 5 strings of equal length. The plot clearly shows that the running time grows linearly with the product of the number of strings and the string length.
The test program is written in C and was run on an Intel Core Duo machine with 1 GB of RAM.

Figure 13: Time to solve the k-common-substring problem versus string length. The strings are randomly generated and have equal length (in KB).

5.2. Installation Guide

The cs-suffix-tree program uses the cross-platform GTK+ toolkit and the GraphViz software to visualize suffix trees. The program runs on *nix and Windows.
A copy of GTK is distributed with the program; to run the program with this copy, use the file cs-suffix-tree.bat. If GTK is already installed and its bin directory is in the Windows PATH environment variable, users can run cs-suffix-tree.exe directly.
The GraphViz installer is shipped with the program and must be run before the program is started. After installing GraphViz, users should also add its bin directory (for example C:\Program Files\Graphviz2.26.3\bin) to the Windows PATH environment variable for convenience.

5.3. User Guide

5.3.1. Visualizing Suffix Trees

Start the program by running cs-suffix-tree.bat or cs-suffix-tree.exe without arguments.
The program window has two parts:

The information panel on the left

The image panel on the right

The user enters at least one string in the text boxes labeled S1, S2, ..., S5 and presses the Compute (Tính) button. The program generates images of the suffix tree at each step of the construction and automatically displays the final result: the generalized suffix tree of the (nonempty) strings that were entered.
The user can press the Previous / Next / First / Last (Trước / Sau / Đầu / Cuối) buttons, or enter an image number, to browse the generated sequence of images.
The longest common substrings of at least k of the given strings are also displayed below the navigation buttons.
The Path to dot (Đường dẫn đến dot) field holds the path to the executable of the dot program from the GraphViz package. If the GraphViz bin directory is in PATH, the value can be left as dot; otherwise the user must enter the path to dot.exe in this field.

5.3.2. Finding the Longest Common Substring of Large Strings

Run cs-suffix-tree.bat or cs-suffix-tree.exe with the syntax:


cs-suffix-tree -c file1 file2 [file3 [file4 [...] ] ]

where:

-c (compare) is the function code.

filen is the path to a file to compare; at least two files must be given.

The program compares the k strings provided (the content of each file is one string) and reports the longest common substring of at least 2, 3, 4, ..., k of them. The result is printed on the screen; to save it to a file, redirect the output with > outfile.
Limits: each file must be at most 1 MB, and at most 20 files may be given.

5.3.3. Generating Large Strings

Run cs-suffix-tree.bat or cs-suffix-tree.exe with the syntax:


cs-suffix-tree -g fileSize fileCount filePattern

where:

fileSize is the file size in bytes.

fileCount is the number of files to generate.

filePattern is the file name pattern in the format of the C printf function; it must contain %d.

The program picks words at random from a built-in (Vietnamese) word list to build each string. The string length is not fixed but is chosen at random between fileSize/2 and fileSize.
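The report does not show the generator's code. The C sketch below only illustrates two details stated above: choosing a length in [fileSize/2, fileSize] and building file names from a printf-style pattern containing %d. The helpers pick_length and file_name are hypothetical, and the word-list logic is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a random value r (e.g. from rand()) to a length
 * in [fileSize/2, fileSize]. Modulo bias is ignored in this sketch. */
long pick_length(long fileSize, unsigned long r)
{
    assert(fileSize > 0);
    long lo = fileSize / 2;
    return lo + (long)(r % (unsigned long)(fileSize - lo + 1));
}

/* Hypothetical helper: name of the i-th file from a printf-style pattern
 * such as "input%d.txt". Returns 0 on success, -1 if the pattern has no
 * %d or the name does not fit. A real program should also reject patterns
 * with other conversions, since the pattern comes from the command line. */
int file_name(const char *pattern, int i, char *buf, size_t cap)
{
    if (strstr(pattern, "%d") == NULL)
        return -1;
    int n = snprintf(buf, cap, pattern, i);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```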

5.4. Building from Source


To build the source code on Linux, you only need Eclipse with the CDT plugin.

To build the source code on Windows, in addition to Eclipse + CDT you need to set up a development environment consisting of the following (the installers are in the dev directory of the distribution package):

MinGW (http://www.mingw.org/ or mingw-get-inst-20101030.exe): provides a GNU C build environment on Windows.

MSYS (http://www.mingw.org/wiki/MSYS, bundled with the MinGW installer): tools needed to build GNU C programs.

GTK (http://www.gtk.org/download-windows.html or gtk+-bundle_2.22.0-20101016_win32.zip): the GUI library; unzip it into C:\GTK.

The program's user interface was designed with Glade and is stored in the file ui.glade. To view this file, install Glade 3 (http://ftp.gnome.org/pub/GNOME/binaries/win32/glade3/3.6/ or the file glade3-3.6.7-with-GTK+.exe in the dev directory).
After installing the packages above, configure Eclipse so that it can find the executables and the header files of the libraries. It is best to add the bin directories of MinGW, of MSYS (inside MinGW) and of GTK to the Windows PATH environment variable.
The following screenshots show how to configure the project in Eclipse (right-click the project name and choose Properties...):

Figure 14: Linker configuration: add the -mms-bitfields flag

Figure 15: Assembler configuration: add the -mms-bitfields flag

Figure 16: Compiler configuration: add the -mms-bitfields flag

Figure 17: Adding the paths to the header files

Figure 18: Adding libraries for the linker

5.5. Source Code


The program uses the following libraries and software:

GTK+ (http://www.gtk.org/, GNU LGPL 2.1)

GraphViz (http://www.graphviz.org/, Common Public License)

The source code will be uploaded to Google Code (http://code.google.com/p/cs-suffix-tree/) under the GNU GPL v3 license.

Appendix A. References


[1] Dan Gusfield, Algorithms on Strings, Trees and Sequences, 1997.
[2] Nguyễn Đức Nghĩa, lecture slides for the course Design and Analysis of Algorithms, semester 1, academic year 2010-2011.
[3] Wikipedia contributors, "Longest common substring problem", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Longest_common_substring_problem&oldid=398399925 (accessed November 30, 2010).
[4] Wikipedia contributors, "Generalised suffix tree", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Generalised_suffix_tree&oldid=397462588 (accessed November 30, 2010).
[5] Wikipedia contributors, "Suffix tree", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Suffix_tree&oldid=396896972 (accessed November 30, 2010).
[6] Wikipedia contributors, "Substring", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Substring&oldid=385496932 (accessed November 30, 2010).
[7] L. Hui, Color set size problem with applications to string matching, Proc. 3rd Symp. on Combinatorial Pattern Matching, Springer LNCS 644, 1992.

Appendix B. List of Figures
Figure 1: Suffix tree of the string xabxac .... 5
Figure 2: The tree on the left is part of the suffix tree for S = abcdefabcuvw, with edge labels written explicitly. The tree on the right represents the labels with index pairs. Note that the edge labeled 2,3 could equally be labeled 8,9 .... 5
Figure 3: Suffix tree for the string xabxa$ .... 6
Figure 4: Implicit suffix tree for the string xabxa .... 7
Figure 5: Implicit suffix tree for the string axabx before the sixth character, b, is added .... 8
Figure 6: The implicit suffix tree after the character b is added .... 8
Figure 7: Extension j > 1 in phase i. Walk up at most one edge from the end of the path S[j-1..i] to node v, follow the suffix link to s(v), walk down the path labeled γ, and apply the appropriate extension rule to add the suffix S[j..i+1] .... 10
Figure 8: For every node v on the path xα there is a node s(v) on the path α. However, the node-depth of v may be one greater than, equal to, or one less than the node-depth of s(v). For example, the node labeled xab has node-depth two and the node labeled ab has node-depth one; the node labeled xabcdefg has node-depth four and the node labeled abcdefg has node-depth five .... 11
Figure 9: The execution of the algorithm. Each row is one phase, and each number is an explicit extension that is performed .... 13
Figure 10: Suffix tree with suffix links for the two strings xabxa and abxbx .... 14
Figure 11: Generalized suffix tree for the two strings xabxa and abxbx with suffix links, as stored in memory (indices start at 0) .... 15
Figure 12: Suffix tree construction time versus string size. Size in units of 4 KB, time in seconds .... 18
Figure 13: Time to solve the k-common-substring problem versus string length. The strings are randomly generated and have equal length (in KB) .... 19
Figure 14: Linker configuration: add the -mms-bitfields flag .... 23
Figure 15: Assembler configuration: add the -mms-bitfields flag .... 24
Figure 16: Compiler configuration: add the -mms-bitfields flag .... 25
Figure 17: Adding the paths to the header files .... 26
Figure 18: Adding libraries for the linker .... 27

Appendix C. Glossary
common substring .... 16
extension rule .... 7
generalized suffix tree .... 14
implicit suffix tree .... 6
label .... 4
node-depth .... 4
path-label .... 4
string-depth .... 4
subsequence .... 16
substring .... 16
suffix link .... 9
suffix tree .... 4
