
Chapter 4: EXPERIMENTS


4.1 PROGRAM SYSTEM
The program is implemented in C# 2005 and uses the SQL Server database management system. It relies on the kmlocal component, a collection of K-means clustering programs by David M. Mount and colleagues [5] at the University of Maryland. Because kmlocal is written in VC++ 2008, running the system also requires VC++ 2008 to be installed. The parts of the program are shown in the following figures:

Figure 4.1: Selecting the database

As Figure 4.1 shows, the program works only with the SQL Server database management system and lets the user choose either of the two login mechanisms that SQL Server provides.

Figure 4.2: The data-reading tab

The data-reading tab describes the newly selected data in detail. It reports the total number of items and the total number of transactions in the database that record occurrences of those items. The database consists of two tables: an item table and a transaction table, where each transaction records which items occur together in one purchase. The contents of both tables can be viewed in this tab.

Figure 4.3: The K-means clustering algorithms tab

The tab in Figure 4.3 lets the user cluster the item table with K-means algorithms. Four K-means clustering algorithms are available: Lloyds, Swap, EZ_Hybrid, and Hybrid. The user first enters the number of clusters, then selects one of the four algorithms and presses the Run clustering button. The tab has two panes: the left pane shows the centroid of each cluster, and the right pane lists the items assigned to each cluster once the algorithm has finished.
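As a rough illustration of what the Lloyds option computes, here is a minimal sketch of Lloyd's iteration in Python. It is not the kmlocal implementation that the program calls; the function name `lloyd_kmeans`, the sample points, and the decision to seed the centers with the first k points are assumptions made for this example only.

```python
import math

def lloyd_kmeans(points, k, max_iter=100):
    """Plain Lloyd's iteration: assign each point to its nearest center,
    then move each center to the mean of its points, until nothing changes."""
    # Assumption for this sketch: seed the centers with the first k points.
    centers = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each center becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centers, assignment = lloyd_kmeans(points, k=2)
```

In terms of the tab, `centers` corresponds to what the left pane displays and the grouping implied by `assignment` corresponds to the right pane.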

Figure 4.4: The bridging-set tab

Figure 4.4 shows the steps for determining bridging sets. The user first specifies an attribute occurrence threshold, which decides which item attributes serve as the features used to partition the items; this yields the feature attribute set BA. The universe U is then partitioned on BA. The resulting partition is the basis for finding bridging sets, that is, sets containing items that lie in two different clusters. The output of this tab is the collection of bridging sets.

Figure 4.5: The ordinary association rules tab

Figure 4.5 shows the function that finds ordinary association rules satisfying two measures, support and confidence, using the Apriori method.
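To make the two measures concrete, the sketch below computes the support and confidence of a single rule over a toy transaction list. The transactions, item names, and helper names (`support`, `confidence`) are invented for illustration; the program itself generates candidate rules with Apriori, which this sketch does not reproduce.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X union Y) / support(X) for the rule X -> Y."""
    sup_x = support(antecedent, transactions)
    if sup_x == 0:
        return 0.0  # the rule never fires, so treat its confidence as 0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / sup_x

# Hypothetical transactions over three items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
rule_support = support({"a", "b"}, transactions)          # 2 of 4 transactions
rule_confidence = confidence({"a"}, {"b"}, transactions)  # 2 of the 3 containing "a"
```

A rule is kept only when both values reach the user's thresholds, the Minsup and Minconf columns reported in the experimental tables.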

Figure 4.6: The interesting rules tab

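Figure 4.6 shows the steps for finding interesting rules. First, for each bridging set, we find the rules that satisfy the given support and confidence. Next, from this set of association rules, we find the interesting rules using the minsimi measure (the similarity threshold) together with either the entropy or the mutual measure. Rules in the association rule set that satisfy both thresholds are interesting rules. To search by the entropy measure, select the entropy button; otherwise select mutual.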

4.2 EXPERIMENTAL RESULTS


The experiments for this thesis were implemented in C# 2005 with the SQL Server 2005 database management system. The thesis draws on the clustering algorithms of David M. Mount (University of Maryland), which are implemented in Visual C++ 2008; integrating them therefore requires Visual C++ 2008 to be installed. The experiments ran on a machine with a 2.2 GHz dual-core Intel CPU, 3 GB of RAM, and Windows XP. The first experiment used a synthetic database with 50 items, each with 10 attributes, and a transaction table of 50,000 transactions. The data were grouped into 10 clusters, the proposed attribute occurrence frequency was 42/50, minsimi = 5, and minentro = 0.5. The results are shown in Table 4.1.

The second experiment used the Mushroom database with 119 items, each with 10 attributes, and a transaction table of 8,124 transactions. The data were grouped into 12 clusters, and the proposed attribute occurrence frequency was 105/119. With minsimi = 8 and minentro = 1, the results are shown in Table 4.2.
Table 4.1: Experimental results on the synthetic database.

No.  Minsup  Minconf  Ordinary association rules  Bridging rules  Interesting bridging rules
1    0.2     0.3      2450                        14              6
2    0.2     0.4      2450                        14              6
3    0.2     0.5      1390                        5               4
4    0.2     0.6      0                           0               0
5    0.25    0.4      1483                        6               4
6    0.25    0.5      1234                        5               4
7    0.3     0.3      0                           0               0

Table 4.2: Experimental results on the Mushroom database.

No.  Minsup  Minconf  Ordinary association rules  Bridging rules  Interesting bridging rules
1    0.3     0.65     63129                       5               3
2    0.4     0.7      3828                        3               1
3    0.45    0.7      1907                        2               1
4    0.5     0.7      667                         2               0
5    0.6     0.7      223                         0               0

At present there are few algorithms for determining bridging sets, so the approach cannot be compared against others in terms of either the number of rules found or running time. Instead, the experiments compare the number of bridging rules found with the number of ordinary association rules across several support and confidence values. The results show that the number of bridging rules is very small compared with the number of ordinary association rules. This indicates that the interesting bridging rules found are exactly the exceptions we are looking for.
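The size of that gap can be checked with a short calculation. The sketch below takes the rule counts reported in Tables 4.1 and 4.2 and computes bridging rules as a percentage of ordinary rules. It assumes a particular reading of the table columns (the larger of the two bridging counts in each row is taken as the bridging-rule column), skips rows where no ordinary rules were found, and uses a hypothetical helper name, `bridging_share`.

```python
# (ordinary rules, bridging rules) per row, for rows with ordinary rules > 0.
# Column assignment is an assumption about how the tables are laid out.
table_4_1 = [(2450, 14), (2450, 14), (1390, 5), (1483, 6), (1234, 5)]
table_4_2 = [(63129, 5), (3828, 3), (1907, 2), (667, 2), (223, 0)]

def bridging_share(rows):
    """Bridging rules as a percentage of ordinary rules, row by row."""
    return [100.0 * bridging / ordinary for ordinary, bridging in rows]

shares_4_1 = bridging_share(table_4_1)
shares_4_2 = bridging_share(table_4_2)
largest_share = max(shares_4_1 + shares_4_2)
```

Under this reading, the share stays below 1% in every row of both tables.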
Choosing threshold values: the threshold values depend on the specific database. The Chess database has 75 items and 3,196 transactions; it was grouped into 10 clusters with a proposed attribute occurrence frequency of 65/75. The experiments show that minsimi varies within [7, 9] and minentro within (0.5, 0.66). The Connect database has 129 items and 67,557 transactions; it was grouped into 15 clusters with a proposed attribute occurrence frequency of 110/129. The experiments show that minsimi varies within [7, 10] and minentro within (0.8, 1.68).

4.3 CONCLUSION AND FUTURE WORK


In the course of this work, the thesis studied several necessary topics: unsupervised learning, specifically the family of k-means clustering algorithms; association rule mining problems; the basics of rough set theory, used for the partitioning that finds bridging sets; and an analysis of several measures for deciding which bridging rules are interesting. In addition, the thesis implemented a demo that finds bridging sets in a database, compares their proportion with that of traditional association rules, and identifies the interesting bridging rules, which are the notable exceptions linking two entirely different groups of data. Within this work, the analysis of measures is indispensable for determining the thresholds on which the search for interesting bridging rules is based.

Analysis of similarity measures:

There are many formulas for computing the similarity between sets of objects. This thesis uses two similarity measures: (1) counting the number of attributes that the items share; (2) the Euclidean distance. Because the mined database has been normalized so that every attribute is interval-valued, using these two measures is reasonable. Each has its own limitations, however. Recall that a bridging rule is a rule X → Y in which X and Y are sets of items belonging to two different concept classes. Measure (1) therefore runs into difficulty when at least one of X and Y contains more than one item; it is effective only when X and Y each contain a single item. Measure (2) can handle every case of X and Y, but its computation time grows considerably when the database has many attributes. The similarity between X and Y expresses the interestingness of the bridging rule X → Y. With measure (1), the smaller the similarity, the more interesting the bridging rule. With measure (2) the direction is reversed: the larger the distance between X and Y, the more interesting the bridging rule X → Y.
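The two measures can be sketched for the single-item case as follows. The attribute vectors and function names are hypothetical, and the sketch deliberately covers only one item on each side, since the text does not spell out how attribute vectors are combined when X or Y contains several items.

```python
import math

def matching_attributes(x, y):
    """Measure (1): number of positions where two items have the same attribute value."""
    return sum(1 for a, b in zip(x, y) if a == b)

def euclidean_distance(x, y):
    """Measure (2): Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two hypothetical items, each described by four normalized attributes.
item_x = [1, 0, 3, 2]
item_y = [1, 4, 3, 5]

shared = matching_attributes(item_x, item_y)   # positions 0 and 2 agree
distance = euclidean_distance(item_x, item_y)  # sqrt(0 + 16 + 0 + 9)
```

A low `shared` count and a high `distance` both point the same way: the two items are dissimilar, which is what makes a bridging rule between them interesting.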
Analysis of interestingness measures:

Entropy measures the uncertainty of a random variable, while the mutual information between two random variables X and Y measures how much information one variable carries about the other. These two measures can therefore be used to find relationships between concept classes. However, because the similarity measures, entropy, and mutual information are all symmetric, the two association rules X → Y and Y → X receive the same interestingness, which does not match reality. For this reason, the thesis studied further measures and proposes additionally using importance.

Mining interesting bridging rules is a new research direction, so this thesis stops at mining all interesting bridging rules on a transaction database. The thesis nevertheless finds that the direction can be developed further, both in theory and in useful applications for many fields. Further study of the measures is planned in order to find a more suitable measure for each type of problem.

In some problems a very large number of interesting bridging rules may be generated, so the most important ones among them need to be identified. Closed frequent itemsets are used to find non-redundant association rules, so the thesis can be extended toward finding non-redundant interesting rules based on closed frequent itemsets. A method for finding non-redundant interesting rules can be proposed as follows:
Step 1: Determine the bridging sets.
Step 2: Determine the closed frequent itemsets within the bridging sets.
Step 3: Find the rules that satisfy the bridging property from the closed frequent itemsets.
Step 4: Determine the interesting rules based on similarity and interestingness.

In the past, association rule mining algorithms usually worked on transaction databases; that is, for each transaction we know which items occur. In recent years many algorithms have appeared for mining association rules on quantitative databases, where each transaction records not only which items occur but also the quantity of each item. Building on interesting bridging rules over quantitative databases, we can develop the problem of weighted interesting bridging rules. This problem needs an additional table describing the weight of each item, which states the importance of one item relative to another. The database then has three tables: a table describing the attributes of each item, a table describing the weight of each item, and a table of quantitative transactions.

Recently many studies have combined genetic algorithms and fuzzy logic with association rule mining. In this approach, the best support and confidence thresholds are found with a genetic algorithm and fuzzy logic. From this we can develop the problem of finding interesting bridging rules that uses genetic algorithms and fuzzy logic to determine the best thresholds, instead of requiring the user to set them. Fuzzy mining approaches aimed at deriving fuzzy knowledge have been widely discussed recently. Because items can have their own characteristics, different items can be given different minimum supports and different membership functions. Many genetic-fuzzy data mining algorithms have been proposed to extract minimum supports and membership functions for items from quantitative databases.

In summary, the thesis can be extended in five directions: studying measures for each specific type of problem, finding non-redundant interesting bridging rules, mining interesting bridging sets on weighted databases, mining weighted interesting bridging sets, and mining interesting bridging sets with genetic algorithms and fuzzy logic. Mining interesting bridging rules is therefore a promising research direction that can be pursued further.
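The symmetry noted in the analysis of interestingness measures can be checked numerically. The sketch below computes Shannon entropy and mutual information from a joint distribution of two binary variables; the joint probabilities and function names are made up for illustration, and the thesis's exact formulas behind the minentro threshold are not reproduced here.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    px = [sum(row) for row in joint]        # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]  # marginal distribution of Y
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

# Hypothetical joint distribution p(x, y); rows index X, columns index Y.
joint = [[0.4, 0.1],
         [0.2, 0.3]]
transposed = [list(col) for col in zip(*joint)]  # swaps the roles of X and Y

mi_xy = mutual_information(joint)
mi_yx = mutual_information(transposed)
h_x = entropy([sum(row) for row in joint])
```

Swapping X and Y leaves the mutual information unchanged, so a measure of this kind cannot tell X → Y from Y → X; this is the limitation that motivates an additional, direction-sensitive measure such as importance.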
