
CHAPTER 2: FREQUENT ITEMSETS
2.1 Frequent itemsets
2.1.1 Definitions:
Definition of support: Let the database consist of a set of items I and a set of transactions D, and let X ⊆ I be an itemset. The support of X in D, denoted sup(X), is the number of transactions in D that contain X.
Example: For the sample database in Table 1.1, the set of items is I = { A, C, D, T, W } and D consists of 6 transactions: { ACTW, CDW, ACTW, ACDW, ACDTW, CDT }. The support of the itemset X1 = { A } is the number of transactions in D that contain { A }, so sup(X1) = 4. Similarly, for X2 = { A, T } we have sup(X2) = 3.
Definition of a frequent itemset: An itemset X ⊆ I is called frequent if sup(X) ≥ minSup, where minSup is a threshold chosen by the user.
Example: For the same sample database with minSup = 3 (50%), X2 = { A, T } is frequent because sup(X2) = 3 ≥ minSup. Similarly, X3 = { A, C, T } has sup(X3) = 3 ≥ minSup, so X3 is also frequent. In contrast, X4 = { C, D, T } has sup(X4) = 2 < minSup, so X4 is not frequent.
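The two definitions above can be checked directly on the sample database. The following is a minimal sketch; the names DB, sup and is_frequent are illustrative and do not come from the original text.

```python
# Sample database of Table 1.1: transaction id -> set of items.
DB = {
    1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
    4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT"),
}


def sup(itemset, db=DB):
    """Support of an itemset: number of transactions containing all its items."""
    items = set(itemset)
    return sum(1 for transaction in db.values() if items <= transaction)


def is_frequent(itemset, min_sup, db=DB):
    """An itemset is frequent when its support reaches the threshold."""
    return sup(itemset, db) >= min_sup


if __name__ == "__main__":
    for x in ("A", "AT", "ACT", "CDT"):
        print(x, sup(x), is_frequent(x, min_sup=3))
```

Each call scans every transaction, which is exactly the cost that the vertical layout of Section 2.1.3 tries to avoid.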


2.1.2 Properties:
Property 1: The support of a subset is at least the support of its superset. For two itemsets X, Y with X ⊆ Y, sup(X) ≥ sup(Y).
Property 2: Every subset of a frequent itemset is frequent. If X is frequent and Y ⊆ X, then sup(Y) ≥ sup(X) ≥ minSup, so Y is also frequent.
Property 3: Every superset of an infrequent itemset is infrequent. If X is infrequent and Y ⊇ X, then sup(Y) ≤ sup(X) < minSup, so Y is not frequent either.

2.1.3 Data layout: Conventional relational databases usually store data horizontally: a two-dimensional table has N rows corresponding to the transactions and M columns corresponding to the items. The horizontal layout makes it simple and fast to determine which items belong to a given transaction. However, determining which transactions contain a particular item is difficult in this layout, because every transaction in the database must be scanned and those containing the item recorded.
Example: In the sample database of Table 1.1, to find the transactions that contain item A we must scan all transactions and list those that contain A; the result is the set { 1, 3, 4, 5 }.


For large databases, determining which transactions contain a given item is therefore very expensive, yet this operation is needed constantly when computing the support of an itemset. To speed up frequent itemset mining we can use a vertical data layout, in which the table is transposed: rows become columns and vice versa.
Example: The sample database of Table 1.1 in vertical layout:

Table 2.1 Data table in vertical layout
Item code          C    W    A    D    T
Transaction ids    1    1    1    2    1
                   2    2    3    4    3
                   3    3    4    5    5
                   4    4    5    6    6
                   5    5
                   6

Finding the support of item A is now simple: look up the entry of the table for item code A; the result is the transaction set { 1, 3, 4, 5 } and sup(A) = 4. However, the vertical layout makes memory management harder, because the table grows horizontally (one more entry per item for each new transaction) instead of vertically.
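The conversion from the horizontal to the vertical layout can be sketched as follows. The names HORIZONTAL, to_vertical and VERTICAL are illustrative assumptions, not part of the original text.

```python
from collections import defaultdict

# Horizontal layout of Table 1.1: transaction id -> items.
HORIZONTAL = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def to_vertical(horizontal):
    """Transpose the table: item -> set of ids of the transactions containing it."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


VERTICAL = to_vertical(HORIZONTAL)

if __name__ == "__main__":
    for item in "CWADT":
        print(item, sorted(VERTICAL[item]), "sup =", len(VERTICAL[item]))
```

After the single pass that builds VERTICAL, the support of an item is just the size of its transaction-id set.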


2.2 Closed frequent itemsets
2.2.1 Galois connection
Binary relation: Let I = { i1, i2, ..., in } be the set of all items and T = { t1, t2, ..., tm } the set of all transactions in the transaction database D. The database can be viewed as a binary relation δ ⊆ I x T. If item i ∈ I occurs in transaction t ∈ T, we write (i, t) ∈ δ, also denoted i δ t.
Example: For the sample database of Table 1.1, the binary relation gives:
- The first transaction is represented as { A1, C1, T1, W1 }
- The second transaction is represented as { C2, D2, W2 }
- The third transaction is represented as { A3, C3, T3, W3 }
- The fourth transaction is represented as { A4, C4, D4, W4 }
- The fifth transaction is represented as { A5, C5, D5, T5, W5 }
- The last transaction is represented as { C6, D6, T6 }

Table 2.2 Illustration of the binary relation
Transaction id    Transaction content
1                 A, C, T, W
2                 C, D, W
3                 A, C, T, W
4                 A, C, D, W
5                 A, C, D, T, W
6                 C, D, T

Item code    Transactions containing the item
A            1, 3, 4, 5
C            1, 2, 3, 4, 5, 6
D            2, 4, 5, 6
T            1, 3, 5, 6
W            1, 2, 3, 4, 5


Definition of the Galois connection: Let the binary relation δ ⊆ I x T hold the database to be mined, and let X ⊆ I and Y ⊆ T. Let P(S) denote the set of all subsets of S. The following two mappings between P(I) and P(T) are called a Galois connection:

a. t : P(I) → P(T), t(X) = { y ∈ T | ∀x ∈ X, x δ y }
b. i : P(T) → P(I), i(Y) = { x ∈ I | ∀y ∈ Y, x δ y }

Figure 2.1 Galois connection

The two mappings are illustrated in Figure 2.1: t(X) is the set of all transactions that contain the itemset X (also called the Tidset of X), and i(Y) is the set of all items that occur in every transaction of Y. An itemset X together with its transaction set t(X) is written X x t(X) and is called an IT-pair. Likewise, a transaction set Y together with i(Y) is written i(Y) x Y.
Example:
t(ACW) = t(A) ∩ t(C) ∩ t(W) = 1345 ∩ 123456 ∩ 12345 = 1345
i(245) = i(2) ∩ i(4) ∩ i(5) = CDW ∩ ACDW ∩ ACDTW = CDW
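The two mappings translate almost literally into set operations. This is a small sketch on the sample database; the function names t and i follow the text, while DB and ITEMS are illustrative names.

```python
# Sample database of Table 1.1: transaction id -> set of items.
DB = {
    1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
    4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT"),
}
ITEMS = set("ACDTW")


def t(X, db=DB):
    """Tidset of X: ids of the transactions that contain every item of X."""
    wanted = set(X)
    return {tid for tid, items in db.items() if wanted <= items}


def i(Y, db=DB, all_items=ITEMS):
    """Items common to all transactions whose ids are in Y."""
    common = set(all_items)
    for tid in Y:
        common &= db[tid]
    return common


if __name__ == "__main__":
    print(sorted(t("ACW")))      # tidset of ACW
    print(sorted(i({2, 4, 5})))  # items shared by transactions 2, 4 and 5
```

By the definition, i of the empty transaction set is the whole item set, which the loop above reproduces.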


2.2.2 Closure operator and closed sets:
Closure operator: Let X ⊆ I and let c : P(I) → P(I) be the mapping c(X) = i(t(X)). The mapping c is called the closure operator.
Example: For the sample database of Table 1.1:
c(AW) = i(t(AW)) = i(1345) = ACW
c(ACW) = i(t(ACW)) = i(1345) = ACW
Closed set: Let X ⊆ I. X is called a closed set if and only if c(X) = X.
Example: For the sample database of Table 1.1:
Since c(AW) = i(t(AW)) = i(1345) = ACW, AW is not a closed set.
Since c(ACW) = i(t(ACW)) = i(1345) = ACW, ACW is a closed set.

2.2.3 Definition of a closed frequent itemset: X is called a closed frequent itemset if X is frequent and X is a closed set.
Example: For the sample database of Table 1.1 with minSup = 50% = 50% * 6 = 3:
Since t(AW) = 1345, sup(AW) = 4 > minSup, so AW is frequent.
Since t(ACW) = 1345, sup(ACW) = 4 > minSup, so ACW is frequent.
However, only ACW is a closed frequent itemset; AW is not.
In short, a frequent itemset X is a closed frequent itemset when there is no superset X' with X ⊂ X' and sup(X) = sup(X').
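The closure operator and the closedness test can be sketched on top of the two Galois mappings. The brute-force enumeration at the end is only an illustration for this five-item example, not an efficient algorithm; all helper names are assumptions.

```python
from itertools import combinations

DB = {
    1: set("ACTW"), 2: set("CDW"), 3: set("ACTW"),
    4: set("ACDW"), 5: set("ACDTW"), 6: set("CDT"),
}
ITEMS = "ACDTW"


def t(X):
    """Tidset of X."""
    return {tid for tid, items in DB.items() if set(X) <= items}


def i(Y):
    """Items common to all transactions in Y."""
    common = set(ITEMS)
    for tid in Y:
        common &= DB[tid]
    return common


def c(X):
    """Closure operator c(X) = i(t(X))."""
    return i(t(X))


def is_closed(X):
    return c(X) == set(X)


def is_frequent_closed(X, min_sup):
    return len(t(X)) >= min_sup and is_closed(X)


def frequent_closed_itemsets(min_sup):
    """Brute force over all non-empty itemsets (fine for 5 items only)."""
    found = {}
    for k in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            if is_frequent_closed(combo, min_sup):
                found["".join(combo)] = len(t(combo))
    return found
```

With minSup = 3 the enumeration returns seven closed frequent itemsets, far fewer than the frequent itemsets of the same database.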


Significance of closed frequent itemsets: In large databases the number of closed frequent itemsets is much smaller than the number of ordinary frequent itemsets. Therefore, to reduce the time spent in phase 1 of association rule mining, we can search for the closed frequent itemsets instead of all frequent itemsets and extract the association rules from the closed ones.

2.2.4 Properties of closed frequent itemsets:
Property 1 (item merging): Let X be a frequent itemset and Trans the set of all transactions that contain X. Suppose every transaction in Trans also contains an itemset Y with Y ∩ X = ∅, and there is no itemset Y' with the same property such that Y ⊂ Y' (that is, Y is the largest such itemset). Then X ∪ Y is a closed frequent itemset with sup(X ∪ Y) = |Trans|, and any frequent itemset that contains X but not Y cannot be a closed frequent itemset.
Example: In Table 1.1 every transaction that contains {W} also contains {C}, and there is no itemset Y' with {C} ⊂ Y' such that every transaction containing {W} also contains Y'. Hence {CW} is a closed itemset. We can also observe that the frequent itemset {TW}, which contains W but not C, is not a closed frequent itemset.
Property 2 (sub-itemset pruning): Let X be a frequent itemset and Y an itemset with Y ⊂ X and sup(Y) = sup(X). Then every closed frequent itemset that contains Y also contains X; equivalently, an itemset that contains Y but not X cannot be a closed frequent itemset.
Example: In Table 1.1 the itemset {CTW} is frequent with support 3, and its subset {TW} also has support 3. The closed frequent itemset {ACTW}, which contains {TW}, therefore also contains {CTW}, and the itemsets {ATW} and {DTW} cannot be closed frequent itemsets.


2.3 Methods for finding frequent itemsets

2.3.1 Candidate generation: the Apriori algorithm
The candidate-generation approach to finding frequent itemsets was proposed by Agrawal [13] in 1993 with the Apriori algorithm. The idea of Apriori rests on the following observation: if an itemset is frequent, then all of its subsets must also be frequent (Property 2 of frequent itemsets). A frequent itemset therefore cannot have an infrequent subset; in other words, frequent itemsets with more items can only be built from frequent itemsets with fewer items. Apriori accordingly works as follows:
- Start from the frequent itemsets that have a single item.
- Generate the frequent itemsets with k items from the frequent itemsets with (k-1) items.
The algorithm:
Lk: the set of frequent itemsets with k items (k-itemsets). Each element has two fields: the itemset and its support.
Ck: the set of candidate itemsets with k items. Each element has two fields: the itemset and its support.

    L1 = {frequent 1-itemsets};
    for (k = 2; Lk-1 ≠ ∅; k++) do begin
        Ck = apriori-gen(Lk-1);          // generate the new candidates
        forall transactions t ∈ D do begin   // scan the database
            Ct = subset(Ck, t);          // candidates contained in transaction t
            forall candidates c ∈ Ct do
                c.count++;
        end
        Lk = {c ∈ Ck | c.count ≥ minsup}
    end
    Answer = ∪k Lk;                      // return the set of all frequent itemsets


Explanation of the algorithm:
- The first step simply counts the occurrences of each item to determine the set of frequent 1-itemsets.
- Iteration k:
+ The set Lk-1 is used to build the candidate set Ck with the function apriori-gen, which takes Lk-1 (the set of frequent (k-1)-itemsets) as input and returns Ck, the set of all k-itemsets obtained by taking unions of the (k-1)-itemsets in Lk-1. Note that for a candidate to belong to Ck, all of its (k-1)-item subsets must be present in Lk-1 (by Property 2 of frequent itemsets).
+ Next, the database is scanned to compute the support of the candidates in Ck, which yields Lk.
+ If Lk = ∅, stop.
- The union of all Lk is the set of frequent itemsets sought.

Illustrative example: Consider the sample database of Table 1.1 with minSup = 50% = 50% * 6 = 3.
Step 1: Scan the database to find L1, the frequent 1-itemsets with support ≥ 3:

Table 2.3 Frequent itemsets with 1 item
Database (D)
TID    Content
1      A, C, T, W
2      C, D, W
3      A, C, T, W
4      A, C, D, W
5      A, C, D, T, W
6      C, D, T

L1
Itemset    Support
A          4
C          6
D          4
T          4
W          5

Step 2: Compute the candidate set C2 from L1 by taking unions. Then scan the database to compute the support of the candidates in C2 and discard those whose support is below minSup, which gives L2, the frequent 2-itemsets.

Table 2.4 Frequent itemsets with 2 items
C2
Itemset    Support
AC         4
AD         2
AT         3
AW         4
CD         4
CT         4
CW         5
DT         2
DW         3
TW         3

L2
Itemset    Support
AC         4
AT         3
AW         4
CD         4
CT         4
CW         5
DW         3
TW         3

Step 3: As in step 2, build C3 and L3 from L2. When building C3, discard every candidate that has a 2-item subset not in L2. One way to build C3 from L2 is to collect all items occurring in L2 and then check each 3-item combination of them against this rule.

Table 2.5 Frequent itemsets with 3 items
C3
Itemset    Support
ACT        3
ACW        4
ATW        3
CDW        3
CTW        3

L3
Itemset    Support
ACT        3
ACW        4
ATW        3
CDW        3
CTW        3

Step 4: Build C4 and L4 from L3.

Table 2.6 Frequent itemsets with 4 items
C4
Itemset    Support
ACTW       3

L4
Itemset    Support
ACTW       3

Step 5: Build C5 and L5 from L4. We obtain C5 = ∅, so the algorithm stops at step 5 with L5 = ∅.
Result: with minSup = 50%, the set of frequent itemsets is L1 ∪ L2 ∪ L3 ∪ L4 = { A, C, D, T, W, AC, AT, AW, CD, CT, CW, DW, TW, ACT, ACW, ATW, CDW, CTW, ACTW }.

Limitations of Apriori: To determine the support of the candidates, Apriori has to rescan all transactions in the database, which takes a great deal of time, especially when the number of items is large. In [12] the authors propose the AprioriTid algorithm, whose basic difference is that the database is read only in the first pass; from the second step on, AprioriTid works on the sets Ck. Supports are computed from the sets Ck, which avoids reading the database many times. However, during the early passes the size of Ck is very large, mostly comparable to the size of the original database, so the running time is comparable to that of Apriori; in addition, AprioriTid incurs extra cost when Ck exceeds main memory and external storage must be used.
In general, then, candidate-generation methods for finding frequent itemsets are inefficient, because they read the database many times and must generate and test a large number of candidates. These limitations are addressed by the methods without candidate generation discussed in the following sections.
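The level-wise loop of Apriori can be sketched compactly as follows. This is a simplified sketch, not the original apriori-gen: candidates are formed by taking unions of any two frequent (k-1)-itemsets and pruning by Property 2, and supports are counted by a plain scan. All names are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # L1: count single items in one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_sup}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Candidate generation: union of two (k-1)-itemsets, kept only if
        # every (k-1)-subset is frequent (Property 2).
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one database scan per level.
        counts = {cand: 0 for cand in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        level = {cand: n for cand, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
    return frequent


SAMPLE = ["ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT"]
RESULT = apriori(SAMPLE, min_sup=3)
```

On the sample database with minSup = 3 this reproduces the 19 itemsets of the worked example, at the price of one full scan per level.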


2.3.2 Methods based on the FP-tree
2.3.2.1 Structure of the FP-tree
The FP-tree (Frequent Pattern tree) structure was first introduced by J. Han, J. Pei and Y. Yin in [5]. It overcomes the weakness of Apriori, namely the need to generate and test a large number of candidates. The FP-tree reorganizes the database so that frequent itemsets can be found more conveniently, and at the same time the information is compressed in the tree at a fairly high ratio. The advantages are:
- Infrequent items are eliminated at the very beginning, so the search for frequent itemsets works on far fewer items than the full item set.
- Many transactions are compressed together in the FP-tree, which saves many operations when determining the support of an itemset.
- The FP-tree supports depth-first search and a divide-and-conquer strategy quite effectively.
An FP-tree is a tree with the following characteristics:
- It has a root node labeled NULL. The children attached to the root are the item prefix subtrees, which hold the common parts of many transactions compressed together. Alongside the tree there is an array of frequent single items (the frequent-item header table).
- Each node in an item prefix subtree has three fields: the item code, the accumulated count, and the link pointer. The item code identifies the item the node represents; the accumulated count is the number of transactions sharing this part of the path; the link pointer connects two nodes that represent the same item in two different item prefix subtrees. The link pointer is empty when the node is the last one in its chain.
- Each entry of the frequent-item header table has two fields: the item code and a link pointer to the first node of the chain of nodes that represent this item.

2.3.2.2 Building the FP-tree
Algorithm for building the FP-tree: Scan the whole database, determine the order of the items by decreasing support, and store it in the f-list. The user-supplied support threshold determines which items are created in the FP-tree, and the items of each transaction are sorted in f-list order. The FP-tree is then built by processing, one at a time, the transactions of the database after sorting and after removal of the items that do not reach the support threshold.

Function createFPtree()
INPUT: the database and the support threshold min_sup.
OUTPUT: the FP-tree data structure of the database.
Steps:
Step 1: Scan the whole database and compute the support of each item. Then determine the items whose support is at least min_sup and sort them by decreasing support in the f-list.
Step 2: Create an FP-tree with a single root node labeled null, denoted root.
Step 3:
+ For each transaction in the database, select the frequent items and sort them in f-list order.
+ The transaction under consideration is written [p | rlist], with two parts: p is the first item, and rlist is the remaining items to the right (items below the support threshold are not included).
+ Call insert_tree([p | rlist], root).

Function insert_tree(List: [p | rlist], Node)
Steps:
Step 1:
+ Compare p with the children of Node. If the label of p matches the label of a child (p.item-name = child.item-name), increase the count of that child by 1.
+ If p differs from the labels of all children, or there are no children, create a new child, initialize its count to 1, and link it to the node with the same label already in the tree.
Step 2: If rlist is not empty, call insert_tree(rlist, child).

Illustrative example of building the FP-tree: The construction is shown on the sample database of Table 1.1, for finding the frequent itemsets that satisfy minSup = 2.
Finding the frequent single items: Scan the database to compute the support of each item and sort the items by decreasing support into the f-list; the items whose support is not less than minSup will be created in the FP-tree:

Table 2.7 Ordered array of frequent single items (f-list)
No.    Item code    Support    Link pointer
1      C            6          Null
2      W            5          Null
3      A            4          Null
4      D            4          Null
5      T            4          Null

Sorting the items of each transaction in f-list order: Scan the database a second time; for each transaction, select the items and reorder them following the order of Table 2.7. The database after sorting is shown in Table 2.8.


Table 2.8 Database after sorting in f-list order
Transaction id    Transaction content    Content after sorting in the new order
1                 A, C, T, W             C, W, A, T
2                 C, D, W                C, W, D
3                 A, C, T, W             C, W, A, T
4                 A, C, D, W             C, W, A, D
5                 A, C, D, T, W          C, W, A, D, T
6                 C, D, T                C, D, T

The FP-tree when first initialized:

Figure 2.2 Newly initialized FP-tree

The FP-tree after reading transaction 1: CWAT

Figure 2.3 FP-tree after reading transaction CWAT

The FP-tree after reading transaction 2: CWD

Figure 2.4 FP-tree after reading transaction CWD

The FP-tree after reading transaction 3: CWAT

Figure 2.5 FP-tree after reading transaction CWAT

The FP-tree after reading transaction 4: CWAD

Figure 2.6 FP-tree after reading transaction CWAD

The FP-tree after reading transaction 5: CWADT

Figure 2.7 FP-tree after reading transaction CWADT

After reading the last transaction, CDT, we obtain the FP-tree for minSup = 2:

Figure 2.8 Global FP-tree
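The construction traced in the figures can be sketched as follows. This is a minimal sketch, assuming ties in the f-list are broken alphabetically (which reproduces the order C, W, A, D, T) and using an iterative loop in place of the recursive insert_tree; the class and function names are illustrative.

```python
from collections import Counter


class Node:
    """FP-tree node: item code, accumulated count, link pointer."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}
        self.link = None  # next node in the chain of nodes with the same item


def create_fp_tree(transactions, min_sup):
    # Step 1: supports and f-list (decreasing support, ties alphabetical).
    support = Counter(item for t in transactions for item in set(t))
    flist = [item for item, n in sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
             if n >= min_sup]
    rank = {item: r for r, item in enumerate(flist)}
    # Step 2: root labeled null (None here) and an empty header table.
    root = Node(None, None)
    header = {item: None for item in flist}
    # Step 3: insert each transaction, frequent items only, in f-list order.
    for t in transactions:
        node = root
        for item in sorted((x for x in set(t) if x in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                child.link, header[item] = header[item], child  # add to chain
            child.count += 1
            node = child
    return root, header, flist


def chain_total(header, item):
    """Sum the counts along the link chain of an item (equals its support)."""
    total, node = 0, header[item]
    while node is not None:
        total += node.count
        node = node.link
    return total


SAMPLE = ["ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT"]
ROOT, HEADER, FLIST = create_fp_tree(SAMPLE, min_sup=2)
```

New nodes are prepended to their item's chain here, so the header table always points at the most recently created node; the original text only requires that nodes with the same label be linked.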

2.3.2.3 Projection on the FP-tree
Once the FP-tree has been built for the whole database, containing only the single items that satisfy the support threshold, the tree must be traversed to find the frequent itemsets that satisfy minSup. The efficiency of mining depends heavily on the traversal method, which must meet these requirements:
- The resulting set of frequent itemsets is complete.
- The resulting frequent itemsets contain no duplicates.
- Every generated itemset satisfies the threshold minSup.

To traverse the FP-tree we can use one of the two projections below.
Bottom-up projection: Following the order of the f-list, choose the seed item starting from the item with the smallest support that satisfies minSup up to the item with the largest support. On the FP-tree, walk from the nodes holding the seed item towards the root to build the local f-list and the local FP-tree of the seed item. When all items of the f-list have been processed, backtrack one step and continue.
Top-down projection: Following the order of the f-list, choose the seed item starting from the item with the largest support down to the item with the smallest support that satisfies minSup. On the FP-tree, walk from the nodes holding the seed item down towards the leaves and build the local f-list of the seed item. Record, in the local f-list, the positions of the direct children of the nodes holding the seed item. When the local f-list has no items left, backtrack one step and continue.


2.3.2.4 Finding frequent itemsets with the FP-growth algorithm
In [5] the authors introduce the FP-growth algorithm, which traverses the FP-tree quite efficiently. FP-growth uses the bottom-up projection, in depth-first order and following a divide-and-conquer strategy.
The FP-growth algorithm:

Function gen-FreqItemset()
INPUT: the transaction database DB and the support threshold minSup
OUTPUT: the set FI of frequent itemsets that satisfy minSup
Steps:
Step 1: Tree0 = createFPtree(DB, minSup)
Step 2: FP-growth(Tree0, null, minSup)

Function FP-growth(FP-tree, prefix, minSup)
Steps:
Step 1: If the FP-tree consists of a single path P, generate the frequent itemsets formed by prefix together with each combination of the items in P; the support of each is the smallest accumulated count among the nodes taking part in the combination. Then return.
Step 2: Otherwise, consider each element a of the f-list of the FP-tree in turn and generate the frequent itemset (prefix ∪ a) with support sup(a).
Step 2.1: Traverse the FP-tree along the chain of nodes that represent item a, starting from the link pointer of a in the f-list and walking up towards the root. The traversal yields the database of transactions projected on the itemset (prefix ∪ a), denoted DB{prefix ∪ a}.
Step 2.2: Tree{prefix ∪ a} = createFPtree(DB{prefix ∪ a}, minSup)
Step 2.3: If Tree{prefix ∪ a} ≠ ∅, call:
    FP-growth(Tree{prefix ∪ a}, (prefix ∪ a), minSup)


Illustrative example of FP-growth: Run FP-growth to find the frequent itemsets of the sample database of Table 1.1 with support threshold minSup = 2. After Tree0 has been built, call: FP-growth(Tree0, null, minSup)

Figure 2.9 The tree Tree0

Since Tree0 is not a single path, consider item (T) in the f-list and generate the frequent itemset (T : 4). Then project on Tree0 to build the projected database DB{T}.
Projection on the itemset (T : 4): the local database of the transactions containing { T }.

Table 2.9 Content of DB{T}
Remaining content of the transactions containing { T }    Accumulated count
C, W, A                                                    2
C, W, A, D                                                 1
C, D                                                       1

Build the local FP-tree corresponding to DB{T}:

Figure 2.10: Local Tree{T} corresponding to DB{T}

Since Tree{T} ≠ ∅, recurse depth-first with FP-growth(Tree{T}, (T), minSup).
Since Tree{T} is not a single path, consider item (D) in the f-list and generate the frequent itemset (TD : 2).
Projection on the itemset (TD : 2): the local database of the transactions containing { T, D }.

Table 2.10: Content of DB{TD}
Remaining content of the transactions containing { T, D }    Accumulated count
C                                                             1
C, W, A                                                       1

Build the local FP-tree corresponding to DB{TD}:

Figure 2.11: Local Tree{TD} corresponding to DB{TD}

Since Tree{TD} ≠ ∅, recurse with FP-growth(Tree{TD}, (TD), minSup).
Since Tree{TD} is a single path, generate the frequent itemset (TDC : 2).
Return to Tree{T}, consider the next item (A) in the f-list of Tree{T}, and generate the frequent itemset (TA : 3).
Projection on the itemset (TA : 3): the local database of the transactions containing { T, A }.

Table 2.11: Content of DB{TA}
Remaining content of the transactions containing { T, A }    Accumulated count
C, W                                                          3

Build the local FP-tree corresponding to DB{TA}:

Figure 2.12: Local Tree{TA} corresponding to DB{TA}

The recursive call FP-growth(Tree{TA}, (TA), minSup) generates the frequent itemsets (TAC : 3); (TAW : 3); (TACW : 3).

Return to Tree{T}, consider the next item (W) in the f-list of Tree{T}, and generate the frequent itemset (TW : 3).
Projection on the itemset (TW : 3): the local database of the transactions containing { T, W }.

Table 2.12: Content of DB{TW}
Remaining content of the transactions containing { T, W }    Accumulated count
C                                                             3

Build the local FP-tree corresponding to DB{TW}:

Figure 2.13: Local Tree{TW} corresponding to DB{TW}

The recursive call FP-growth(Tree{TW}, (TW), minSup) generates the frequent itemset (TWC : 3).
Return to Tree{T}, consider the next item (C) in the f-list of Tree{T}, and generate the frequent itemset (TC : 4).
Projection on the itemset (TC : 4): DB{TC} = ∅, hence Tree{TC} = ∅.
Return to Tree0, consider the next item (D) in the f-list of Tree0, and generate the frequent itemset (D : 4).


Projection on the itemset (D : 4): we obtain the local database of the transactions containing { D }.

Table 2.13: Content of DB{D}
Remaining content of the transactions containing { D }    Accumulated count
C, W, A                                                    2
C, W                                                       1
C                                                          1

Build the local FP-tree corresponding to DB{D}:

Figure 2.14: Local Tree{D} corresponding to DB{D}

The recursive call FP-growth(Tree{D}, (D), minSup) generates the frequent itemsets: (DA : 2); (DW : 3); (DC : 4); (DAW : 2); (DAC : 2); (DWC : 3); (DAWC : 2).
Return to Tree0, consider the next item (A) in the f-list of Tree0, and generate the frequent itemset (A : 4).

Projection on the itemset (A : 4): we obtain the local database of the transactions containing { A }.

Table 2.14: Content of DB{A}
Remaining content of the transactions containing { A }    Accumulated count
C, W                                                       4

Build the local FP-tree corresponding to DB{A}:

Figure 2.15: Local Tree{A} corresponding to DB{A}

The recursive call FP-growth(Tree{A}, (A), minSup) generates the frequent itemsets: (AW : 4); (AC : 4); (AWC : 4).
Return to Tree0, consider the next item (W) in the f-list of Tree0, and generate the frequent itemset (W : 5).

Projection on the itemset (W : 5): we obtain the local database of the transactions containing { W }.

Table 2.15: Content of DB{W}
Remaining content of the transactions containing { W }    Accumulated count
C                                                          5

Build the local FP-tree corresponding to DB{W}:

Figure 2.16: Local Tree{W} corresponding to DB{W}

The recursive call FP-growth(Tree{W}, (W), minSup) generates the frequent itemset (WC : 5).
Return to Tree0, consider the next item (C) in the f-list of Tree0, and generate the frequent itemset (C : 6).
Projection on the itemset (C : 6): the local database of the transactions containing { C } is DB{C} = ∅, and the corresponding local FP-tree is Tree{C} = ∅.

The algorithm terminates.


Frequent itemsets obtained by running FP-growth:

Table 2.16: Frequent itemsets satisfying minSup = 2
No.    Frequent itemset    Support (transactions)    Support (%)
1.     T                   4                         66.67%
2.     T, D                2                         33.33%
3.     T, D, C             2                         33.33%
4.     T, A                3                         50.00%
5.     T, A, W             3                         50.00%
6.     T, A, C             3                         50.00%
7.     T, A, W, C          3                         50.00%
8.     T, W                3                         50.00%
9.     T, W, C             3                         50.00%
10.    T, C                4                         66.67%
11.    D                   4                         66.67%
12.    D, A                2                         33.33%
13.    D, W                3                         50.00%
14.    D, W, A             2                         33.33%
15.    D, C                4                         66.67%
16.    D, A, C             2                         33.33%
17.    D, W, C             3                         50.00%
18.    D, A, W, C          2                         33.33%
19.    A                   4                         66.67%
20.    A, W                4                         66.67%
21.    A, C                4                         66.67%
22.    A, C, W             4                         66.67%
23.    W                   5                         83.33%
24.    W, C                5                         83.33%
25.    C                   6                         100.00%
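The recursion over projected databases in the example can be sketched as below. This is a simplification and an assumption of this sketch: each projected database is kept as a table of item tuples with accumulated counts, like Tables 2.9 to 2.15, rather than being compressed into a local FP-tree, and the single-path shortcut of step 1 is omitted. The divide-and-conquer, bottom-up order is the same; the names are illustrative.

```python
from collections import Counter


def pattern_growth(weighted_db, min_sup, prefix=(), found=None):
    """Mine a projected database given as {tuple of items: accumulated count}."""
    if found is None:
        found = {}
    support = Counter()
    for items, count in weighted_db.items():
        for item in items:
            support[item] += count
    # Local f-list: frequent items by decreasing support, ties alphabetical.
    flist = sorted((x for x in support if support[x] >= min_sup),
                   key=lambda x: (-support[x], x))
    rank = {item: r for r, item in enumerate(flist)}
    for seed in reversed(flist):  # bottom-up: least frequent seed first
        pattern = prefix + (seed,)
        found[frozenset(pattern)] = support[seed]
        # Project on the seed: keep only items that precede it in the f-list.
        projected = Counter()
        for items, count in weighted_db.items():
            if seed in items:
                rest = tuple(x for x in flist if x in items and rank[x] < rank[seed])
                if rest:
                    projected[rest] += count
        if projected:
            pattern_growth(projected, min_sup, pattern, found)
    return found


TRANSACTIONS = ["ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT"]
FREQUENT = pattern_growth(Counter(tuple(t) for t in TRANSACTIONS), min_sup=2)
```

For the seed T at the top level, the projected table built by the inner loop is exactly Table 2.9: (C, W, A) with count 2, (C, W, A, D) with count 1 and (C, D) with count 1.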


Remark: FP-growth is quite efficient because it uses the FP-tree structure and a depth-first traversal with a divide-and-conquer strategy. However, because FP-growth works on all frequent itemsets, it is still slower than some other FP-tree-based algorithms that work only on closed frequent itemsets, such as CLOSET and CLOSET+.

2.3.2.5 Finding closed frequent itemsets with the CLOSET+ algorithm
In [6] and [4] the authors build on the idea of mining association rules through closed frequent itemsets and present the algorithms CLOSET and CLOSET+ for finding closed frequent itemsets. The number of closed frequent itemsets is much smaller than the number of ordinary frequent itemsets, yet the closed frequent itemsets still carry all the association rules present in the database, so mining association rules from closed frequent itemsets is very effective.
Below we look at CLOSET+, which finds the closed frequent itemsets on the FP-tree. From the properties of closed frequent itemsets we have the following remarks:
Remark 1: If a frequent item appears in local f-lists at several levels with the same support, it can be dropped and need not be considered in the local f-lists of the earlier levels.
Remark 2: While generating closed frequent itemsets, closedness is guaranteed by two checks:
+ Superset checking: Among the closed frequent itemsets already found, if one is a subset of the new frequent itemset and has the same support, remove it.
+ Subset checking: Among the closed frequent itemsets already found, if one is a superset of the new frequent itemset and has the same support, the new frequent itemset is not a closed frequent itemset.
Because CLOSET+ uses a depth-first strategy and a divide-and-conquer model together with the properties of closed frequent itemsets, it does not need superset checking and still guarantees that the itemsets it finds are closed.


The CLOSET+ algorithm

Function CLOSET+()
INPUT: the transaction database and the support threshold minSup
OUTPUT: the set FCI of closed frequent itemsets that satisfy minSup
Steps:

Step 1: Scan the database and build the FP-tree Tree0.
Note: while building Tree0, compute the average accumulated count of a node, and use it to judge whether the database has a high or a low compression ratio.

Step 2: If Tree0 has a high compression ratio:
+ Use the bottom-up projection to mine the frequent itemsets.
+ During mining, use Property 1 and Property 2 of closed frequent itemsets to restrict the search space.
+ Call the procedure Checking_BottomUp each time a candidate closed frequent itemset is generated, to verify that the result is closed.

Step 3: If Tree0 has a low compression ratio:
+ Use the top-down projection to mine the frequent itemsets.
+ During mining, use Property 1 and Property 2 of closed frequent itemsets to restrict the search space.
+ Call the procedure Checking_TopDown each time a candidate closed frequent itemset is generated, to verify that the result is closed.

Explanation of the algorithm: For a database with a high compression ratio, the global FP-tree can be many times smaller than the original database. Likewise, projections on the FP-tree produce local FP-trees much smaller than the tree itself, so mining the frequent itemsets is quite efficient. In addition, the algorithm uses Property 1 (item merging) and Property 2 (sub-itemset pruning) to reduce the search space.


Example: Take the sample database of Table 1.1 and minSup = 2. Figure 2.17(a) shows Tree{T}, obtained by projecting the global FP-tree on the itemset { T }.

Figure 2.17: Applying the properties of closed frequent itemsets

Performing the bottom-up projection with the itemset {TA : 3} gives Tree{TA} as in Figure 2.17(b), with projected database DB{TA} = (CW : 3). Since every transaction containing {TA : 3} also contains {CW : 3} and {CW} is the largest such itemset, the item merging property says that {TA} and {CW} combine into the closed frequent itemset {TACW : 3}. The frequent itemsets {TAC} and {TAW} need not be considered, because they are certainly not closed.
Next, when the itemset {TW : 3} is considered, the sub-itemset pruning property applies: {TW} is a subset of the closed frequent itemset {TACW} just found and has the same support 3, so no projection on {TW} is needed.
We can see that when the database has a high compression ratio, local FP-trees are likely to be single paths, so the item merging property can be applied many times, reducing the search space.


After a candidate closed frequent itemset has been found, the procedure Checking_BottomUp or Checking_TopDown is used to verify that the candidate is closed.
Checking_BottomUp works on a result tree. The result tree is a data structure that stores the closed frequent itemsets; it has two levels of indexing:
- The first is the item name, with the items of each closed frequent itemset placed in the tree in f-list order.
- The second is the support of the item, based on the largest support.
When a new closed frequent itemset is added to the tree, traverse from the root: if a node already exists, keep the largest value instead of accumulating; if the node does not exist, create a new node whose support equals the support of the closed itemset being added.
Example: From the closed frequent itemsets {CDT:2}, {CWAT:3}, {CWA:4}, and with the f-list order {CWADT}, we build the result tree as follows:

Figure 2.18: Building the result tree


To check whether the frequent itemset {CWT:3} is closed, we locate in the tree its last element in f-list order, the node {T:3}, and then walk towards the root. All elements of {CWT:3} are found on this path, so {CWT:3} is not a closed frequent itemset.

Function Checking_BottomUp()
INPUT: the result tree RTree storing the closed frequent itemsets, and a frequent itemset X
OUTPUT: whether X is a closed frequent itemset
Steps:
Step 1: Take the last item of X and find its occurrences in the tree whose value equals sup(X).
Step 2: From each such node, walk towards the root and check whether all items of X appear along the path.
Step 3: If they do not all appear, conclude that X is a closed frequent itemset and add it to RTree. Otherwise X is not a closed frequent itemset.

When the database has a low compression ratio, the procedure Checking_TopDown is used to check, on the global FP-tree, whether a frequent itemset is closed.

Function Checking_TopDown()
INPUT: the global FP-tree and a frequent itemset X
OUTPUT: whether X is a closed frequent itemset
Steps:
Step 1: max = the largest f-list position among the items of X.
Step 2: Using the FP-tree, visit in turn the transactions Ti that contain X, and for every item whose f-list position is smaller than max, accumulate the support of that item.
Step 3: If any item from step 2 has accumulated support equal to sup(X), then X is not a closed frequent itemset. Otherwise X is closed.
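The test that Checking_BottomUp performs can be illustrated without the result tree. The sketch below is an assumption-laden simplification: the closed itemsets already found are kept in a flat dictionary instead of a tree, and a candidate is rejected when one of them is a proper superset with the same support (the subset checking of Remark 2). The names are illustrative.

```python
def is_subsumed(candidate, support, closed_found):
    """True if an already found closed itemset strictly contains the
    candidate and has the same support, so the candidate is not closed."""
    x = frozenset(candidate)
    return any(x < y and s == support for y, s in closed_found.items())


# Closed frequent itemsets used in the result-tree example.
CLOSED_FOUND = {
    frozenset("CDT"): 2,
    frozenset("CWAT"): 3,
    frozenset("CWA"): 4,
}

if __name__ == "__main__":
    print(is_subsumed("CWT", 3, CLOSED_FOUND))  # subsumed by CWAT:3
    print(is_subsumed("CW", 5, CLOSED_FOUND))   # no superset with support 5
```

The linear scan over all stored itemsets is what the result tree avoids: there, only the paths ending at the candidate's last item need to be inspected.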


2.3.3 Methods based on the IT-tree
2.3.3.1 Structure of the IT-tree
For X ⊆ I, define the function p(X, k) = X[1 : k], the first k elements of X, and the prefix-based equivalence relation:

∀X, Y ∈ P(I), X ≡k Y ⇔ p(X, k) = p(Y, k)

Each node of the IT-tree has two components:
- The itemset X
- The Tidset t(X): the set of transactions that contain X
The pair X x t(X) is called an IT-pair.
Properties of IT-pairs: Let Xi x t(Xi) and Xj x t(Xj) be two IT-pairs. The following four properties hold:
1. If t(Xi) = t(Xj) then c(Xi) = c(Xj) = c(Xi ∪ Xj)
2. If t(Xi) ⊂ t(Xj) then c(Xi) ≠ c(Xj) but c(Xi) = c(Xi ∪ Xj)
3. If t(Xi) ⊃ t(Xj) then c(Xi) ≠ c(Xj) but c(Xj) = c(Xi ∪ Xj)
4. If t(Xi) ⊄ t(Xj) and t(Xj) ⊄ t(Xi) then c(Xi) ≠ c(Xj) ≠ c(Xi ∪ Xj)

2.3.3.2 Building the IT-tree
Consider the sample database of Table 2.17 with minSup = 3 (50%).


Table 2.17 Database used to illustrate building the IT-tree
Transaction id    Transaction content
1                 A, C, T, W
2                 C, D, W
3                 A, C, T, W
4                 A, C, D, W
5                 A, C, D, T, W
6                 C, D, T

Item code    Transactions containing the item
A            1, 3, 4, 5
C            1, 2, 3, 4, 5, 6
D            2, 4, 5, 6
T            1, 3, 5, 6
W            1, 2, 3, 4, 5

The IT-tree is built as follows:
The root node Root is empty.
Level 1 consists of the child nodes whose itemset is a single item, taken from the frequent items that satisfy minSup:
A x t(A) = A x 1345 ; C x t(C) = C x 123456 ; D x t(D) = D x 2456 ; T x t(T) = T x 1356 ; W x t(W) = W x 12345

Figure 2.19: Level 1 of the IT-tree

Level 2 consists of the child nodes whose itemset has 2 items, built by combining the itemsets of the level-1 nodes, with the Tidset computed from the Tidsets of the level-1 nodes:
t(AC) = t(A) ∩ t(C) = 1345 ∩ 123456 = 1345, so sup(AC) > minSup and AC x 1345 becomes a new node at level 2.
t(AD) = t(A) ∩ t(D) = 1345 ∩ 2456 = 45, so sup(AD) < minSup and AD x 45 does not become a node at level 2.


Figure 2.20: IT-tree using Tidsets with minSup = 3

All frequent itemsets are thus listed in the nodes of the IT-tree; building the IT-tree amounts to finding the frequent itemsets of the database.

2.3.3.3 Finding frequent itemsets on the IT-tree
Algorithm:

Function createITtree()
    [Gen] = { i ∈ I | sup(i) ≥ minSup }
    enumerateFI([Gen])

Function enumerateFI([P])
    for all pi ∈ [P] do
        [Pi] = ∅
        for all pj ∈ [P] with j > i do
            Iij = pi ∪ pj
            Tij = t(pi) ∩ t(pj)
            if |Tij| ≥ minSup then
                [Pi] = [Pi] ∪ { Iij x Tij }
        enumerateFI([Pi])
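The enumeration can be sketched in a few lines. This is a minimal sketch of enumerateFI over IT-pairs; recording each visited node in a result dictionary is an assumption of the sketch (the pseudocode leaves the output implicit), and the names are illustrative.

```python
def enumerate_fi(nodes, min_sup, found):
    """nodes: list of IT-pairs (itemset, tidset) belonging to the same class."""
    for idx, (x_i, t_i) in enumerate(nodes):
        found[x_i] = len(t_i)  # every node of the tree is a frequent itemset
        children = []
        for x_j, t_j in nodes[idx + 1:]:
            t_ij = t_i & t_j   # support comes from the Tidset intersection
            if len(t_ij) >= min_sup:
                children.append((x_i | x_j, t_ij))
        if children:
            enumerate_fi(children, min_sup, found)
    return found


def create_it_tree(db, min_sup):
    """db: transaction id -> items. Scans the database once to build Tidsets."""
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    gen = [(frozenset([item]), tids)
           for item, tids in sorted(tidsets.items()) if len(tids) >= min_sup]
    return enumerate_fi(gen, min_sup, {})


DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
FI = create_it_tree(DB, min_sup=3)
```

The database is read only while the level-1 Tidsets are built; every later support is obtained from set intersections.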
Explanation of the algorithm: The algorithm starts by generating the set Gen of frequent itemsets with a single item. Gen forms the first level of the IT-tree. Level k of the IT-tree is formed by considering, in order, each pair of elements of level k-1 and computing the support from the Tidsets; if the threshold minSup is met, the IT-pair is added to level k. If level k is not empty, the algorithm recurses to compute level k+1.
Remark: The algorithm relies on intersecting Tidsets to compute supports quickly, so the database needs to be scanned only once; however, storing the Tidsets costs space. When the number of frequent itemsets is large, the mining time is large as well. The notion of a Diffset can be used to compute supports quickly while reducing the space needed for Tidsets.
The notion of Diffset: The Diffset of X with respect to Y, denoted d(XY), is defined as d(XY) = t(X) - t(Y). The Diffset of X with respect to Y is thus the set of transactions that contain X but not Y.
Properties of Diffsets:
(1). sup(XY) = sup(X) - | d(XY) |
(2). d(PXY) = d(PY) - d(PX)
(3). Diffsets are usually quite small compared with Tidsets.
From properties (1), (2), (3), Diffsets can be used in place of Tidsets when building the IT-tree.
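Properties (1) and (2) can be checked numerically on the sample database. The sketch below takes P = C, X = D, Y = T as an illustrative choice; the helper names are assumptions.

```python
TIDSETS = {
    "A": {1, 3, 4, 5}, "C": {1, 2, 3, 4, 5, 6}, "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6}, "W": {1, 2, 3, 4, 5},
}


def t(itemset):
    """Tidset of an itemset, as the intersection of its items' Tidsets."""
    result = set(range(1, 7))
    for item in itemset:
        result &= TIDSETS[item]
    return result


def d(x, y):
    """Diffset of X with respect to Y: transactions containing X but not Y."""
    return t(x) - t(y)


# Property (1): sup(CD) = sup(C) - |d(CD)|
sup_CD = len(t("C")) - len(d("C", "D"))

# Property (2) with P = C, X = D, Y = T: d(CDT) = d(CT) - d(CD)
d_CDT = d("C", "T") - d("C", "D")

# Property (1) again, one level deeper: sup(CDT) = sup(CD) - |d(CDT)|
sup_CDT = sup_CD - len(d_CDT)
```

Here d(CD) and d(CT) each hold two transaction ids, whereas t(CD) and t(CT) hold four, which illustrates property (3) on a small scale.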

Figure 2.21: IT-tree using Diffsets with minSup = 50% (illustrative example)

2.3.3.4 Finding closed frequent itemsets on the IT-tree
Remarks on IT-pairs: From the properties of IT-pairs we obtain the following remarks:
(1). If t(Xi) = t(Xj), then | t(Xi) | = | t(Xj) | = | t(Xi ∪ Xj) | with Xi ⊂ (Xi ∪ Xj) and Xj ⊂ (Xi ∪ Xj), so Xi and Xj are not closed sets.
(2). If t(Xi) ⊂ t(Xj), then c(Xi) = c(Xi ∪ Xj), so Xi is not a closed set; and since t(Xi) ≠ t(Xj), Xi and Xj belong to two different closed sets.
(3). If t(Xi) ⊃ t(Xj), then c(Xj) = c(Xi ∪ Xj), so Xj is not a closed set; and since t(Xi) ≠ t(Xj), Xi and Xj belong to two different closed sets.
(4). If t(Xi) ⊄ t(Xj) and t(Xj) ⊄ t(Xi), then c(Xi) ≠ c(Xj) ≠ c(Xi ∪ Xj), so Xi, Xj and Xi ∪ Xj belong to three different closed sets.

Based on these remarks, M.J. Zaki and C. Hsiao present in [9] the CHARM algorithm for finding closed frequent itemsets on the IT-tree.
The CHARM algorithm:

CHARM (D, minSup)
    [∅] = { ki x t(ki) : ki ∈ I ∧ sup(ki) ≥ minSup }
    CHARM-EXTEND ([∅], C = ∅)
    return C

CHARM-EXTEND ([P], C)
    for each ki x t(ki) in [P] do
        Pi = P ∪ ki and [Pi] = ∅
        for each kj x t(kj) in [P] with j > i do
            X = kj
            Y = t(ki) ∩ t(kj)
            CHARM-PROPERTY (X x Y, ki, kj, [Pi], [P])
        SUBSUMPTION-CHECK (C, Pi)
        CHARM-EXTEND ([Pi], C)
        delete ([Pi])

CHARM-PROPERTY (X x Y, ki, kj, [Pi], [P])
    if |Y| ≥ minSup then
        if t(ki) = t(kj) then
            Remove kj from [P]
            Pi = Pi ∪ kj
        elseif t(ki) ⊂ t(kj) then
            Pi = Pi ∪ kj
        elseif t(ki) ⊃ t(kj) then
            Remove kj from [P]
            Add X x Y to [Pi]
        else
            Add X x Y to [Pi]

SUBSUMPTION-CHECK (C, P)
    for all Y ∈ HASHTABLE[ |t(P)| ] do
        if sup(P) = sup(Y) and P ⊆ Y then
            return          // P is subsumed by Y, so P is not closed
    C = C ∪ P

Explanation: When adding a frequent itemset to the IT-tree, CHARM uses the function CHARM-PROPERTY (X x Y, ki, kj, [Pi], [P]) to check, based on the remarks on IT-pairs above, whether the frequent itemset can be a closed set. It then uses the function SUBSUMPTION-CHECK (C, P) to verify that the set P is closed, using the hash table HASHTABLE.

Illustration of the algorithm: Use the database of Table 2.18 with minSup = 50%.


Table 2.18 Database used to illustrate finding closed frequent itemsets on the IT-tree
Transaction id    Transaction content
1                 A, C, T, W
2                 C, D, W
3                 A, C, T, W
4                 A, C, D, W
5                 A, C, D, T, W
6                 C, D, T

Item code    Transactions containing the item
A            1, 3, 4, 5
C            1, 2, 3, 4, 5, 6
D            2, 4, 5, 6
T            1, 3, 5, 6
W            1, 2, 3, 4, 5

The IT-tree holding the closed frequent itemsets (FCI) is built as in Figure 2.22.

Figure 2.22: Building the IT-tree with CHARM

First, the set I = { A, C, D, T, W } is sorted by increasing support into K = { D, T, A, W, C }.
With ki = D: combine with the kj ∈ { T, A, W, C } to form the child nodes { DT, DA, DW, DC }.
Since sup(DT) = sup(DA) = 2 < minSup, these are not generated at the next level.
Since sup(DW) = 3 = minSup, the node (DW x 245) is added to the tree.
Since t(D) ⊂ t(C), by property 2 of IT-pairs D cannot be a closed set, so we replace (D x 2456) by (DC x 2456) and (DW x 245) by (DWC x 245).
With ki = T: combine with the kj ∈ { A, W, C } to form the child nodes { TA, TW, TC }.
Since sup(TA) = 3 = minSup, the node (TA x 135) is added to the tree.
Since sup(TW) = 3 = minSup, the node (TW x 135) is added to the tree.
Since t(T) ⊂ t(C), by property 2 of IT-pairs T cannot be a closed set, so we replace (T x 1356) by (TC x 1356), (TA x 135) by (TAC x 135) and (TW x 135) by (TWC x 135).
Since t(TAC) = t(TWC), replace (TAC x 135) by (TACW x 135) and delete (TWC x 135).
With ki = A: combine with the kj ∈ { W, C } to form the child nodes { AW, AC }.
Since t(A) ⊂ t(W), delete (A x 1345) and replace it by (AW x 1345).
Since t(AW) ⊂ t(C), delete (AW x 1345) and replace it by (AWC x 1345).
With ki = W: combine with the kj ∈ { C } to form the child node { WC }.
Since t(W) ⊂ t(C), delete (W x 12345) and replace it by (WC x 12345).
The result is 7 closed frequent itemsets satisfying minSup = 50%: { C:6; DC:4; TC:4; CW:5; AWC:4; DWC:3; TAWC:3 }

Remark: The number of closed frequent itemsets is usually much smaller than the number of frequent itemsets, so mining rules from them is more efficient. Moreover, the search depth on the IT-tree is lower for FCI than for FI, so the memory needed for the recursion is smaller.
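The walk-through can be reproduced with a compact sketch of CHARM. This is a hedged simplification, not the authors' implementation: the subsumption check is a linear scan over the closed sets found so far instead of a hash table, and ties in the initial ordering are broken alphabetically (A, D, T, W, C), so nodes are visited in a slightly different order than in the walk-through. All names are illustrative.

```python
def charm(db, min_sup):
    """db: transaction id -> items. Returns {closed itemset: support}."""
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    nodes = [(frozenset([item]), frozenset(tids))
             for item, tids in tidsets.items() if len(tids) >= min_sup]
    nodes.sort(key=lambda n: (len(n[1]), sorted(n[0])))  # increasing support
    closed = {}
    charm_extend(nodes, closed, min_sup)
    return closed


def charm_extend(nodes, closed, min_sup):
    nodes = list(nodes)
    i = 0
    while i < len(nodes):
        x_i, t_i = nodes[i]
        children = []
        j = i + 1
        while j < len(nodes):
            x_j, t_j = nodes[j]
            t_ij = t_i & t_j
            if len(t_ij) >= min_sup:
                if t_i == t_j:    # property 1: merge x_j into x_i, drop node j
                    del nodes[j]
                    x_i = x_i | x_j
                    children = [(x | x_j, t) for x, t in children]
                    continue
                if t_i < t_j:     # property 2: x_i absorbs x_j, node j stays
                    x_i = x_i | x_j
                    children = [(x | x_j, t) for x, t in children]
                elif t_i > t_j:   # property 3: node j moves under x_i
                    del nodes[j]
                    children.append((x_i | x_j, t_ij))
                    continue
                else:             # property 4: a new child node
                    children.append((x_i | x_j, t_ij))
            j += 1
        if children:
            children.sort(key=lambda n: len(n[1]))
            charm_extend(children, closed, min_sup)
        # Subsumption check: keep x_i unless a stored set with the same
        # support contains it.
        if not any(x_i <= y and len(t_i) == s for y, s in closed.items()):
            closed[x_i] = len(t_i)
        i += 1


DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
CLOSED = charm(DB, min_sup=3)
```

On the sample database the sketch returns the same seven closed frequent itemsets as the walk-through, and the candidate CTW:3 is discarded by the subsumption check because ACTW:3 is already stored.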


2.4 Conclusion
Finding the frequent itemsets is an important phase and the most time-consuming one in association rule mining, so a great deal of research has concentrated on it. Most of this work addresses the following goals:
- Read the database as few times as possible.
- Use as little memory as possible for the recursion.
- Find closed frequent itemsets instead of all frequent itemsets, which makes rule mining easier.
The Apriori family of algorithms is the original approach, proposed by Agrawal [13]. It finds frequent itemsets by reading the database repeatedly to generate candidates, which makes it time-consuming and inefficient.
To overcome the limitations of candidate generation, researchers proposed the FP-tree and IT-tree structures, on which frequent itemsets can be mined much more efficiently.
The FP-tree structure, introduced by J. Han and colleagues in [5], reorganizes the database to make the search for frequent itemsets more convenient. Its benefit is that all the information of the database is stored in the tree at a high compression ratio, in a form suited to depth-first traversal and to a divide-and-conquer strategy. Later, in [6] and [4], J. Han and colleagues applied the properties of closed sets to the FP-tree to mine closed frequent itemsets with the algorithms CLOSET and CLOSET+. Mining only closed frequent itemsets confines the search to a smaller space and makes rule mining more efficient.
The IT-tree structure stores the frequent itemsets directly in the nodes of the tree. Building the IT-tree relies on intersecting Tidsets to compute supports quickly, which makes it quite efficient. Diffsets can be used instead of Tidsets to reduce storage. Finding closed frequent itemsets with the IT-tree is presented by M.J. Zaki in [9] with the CHARM algorithm.
In addition, to make rule mining more convenient, researchers have also built the frequent itemset lattice and the closed frequent itemset lattice, with algorithms such as CHARM-L or hybrid methods.
