Data Mining: Association Rules
I. Data Mining

Today we often hear that we are drowning in data yet short of knowledge. Databases keep growing to enormous volumes, which requires us, the users, to know how to exploit and select the data that is useful to us from this sea of data. That is why Data Mining (DM) came into being.

1. Definition

Data mining is the discovery of interesting regularities and of useful information and data in the course of working with huge volumes of data.

2. Purpose

Data mining analyzes data and uses software techniques to find the patterns and rules in the data, and from them gains knowledge and understanding of the data at hand. The main purpose of DM is therefore not merely to retrieve information already stored in the data; what matters is the insight gained from that data. That insight is the important information to aim for.

3. Functions of DM

- Data classification
- Data association
- Data sequencing...

4. Comparison with DBMS

- DBMS: queries the database. Example: find the purchase records that contain both item A and item B.
- DM: analyzes and discovers insights from the results of such queries. Example: why do the companies that produce items A and B partner and share profits?

5. Applications

- Commercial transactions: finding association rules in commercial transactions
- Medicine: gene analysis
- Banking: analysis related to customers' payment cards

II. Association Rules

In sales transactions, the variety of items is very large. Nevertheless, the number of transaction records that contain a particular group of items together makes up a noteworthy proportion. We do not know who the buyers are, so the question is whether this co-occurrence is a random coincidence or whether some regularity or basis lies behind it. This question is the premise for association rules.

1. Definition

An association rule is a rule that expresses the relationship between two or more objects (the objects considered here are items for sale). A rule has the following structure: A=>B (sup, con). It means that the presence of A implies B, on the basis of a support and a confidence, where:

sup = support: the proportion of transactions that contain both items A and B.
con = confidence: the proportion of transactions containing item B among the transactions that contain item A.
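The two measures just defined can be computed directly from a list of transactions. The following is a minimal sketch; the function names `support` and `confidence` and the toy transactions are illustrative assumptions, not part of the original text.

```python
def support(transactions, items):
    """Proportion of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Proportion of transactions containing `antecedent` that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
    {"C"},
]

sup_ab = support(transactions, {"A", "B"})        # 2 of 5 transactions
con_ab = confidence(transactions, {"A"}, {"B"})   # 2 of the 3 transactions with A
```

Here the rule A=>B would be reported with sup = 2/5 and con = 2/3.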
Example of an association rule: bread=>milk (40%, 45%). It means that bread implies milk on the following basis: 40% of all transactions contain both bread and milk, and among the records that contain bread, 45% also contain milk.

Not every association rule between items is meaningful, however. We are only interested in rules that meet a certain basis, also called a threshold. One commonly used threshold is the minimum support, min_sup. Example: we only consider association rules whose support is at least min_sup, so that the rules found are of higher value.

2. Significance

An important application of association rules is market analysis: analyzing customers' buying habits to find associations between the different items in a single purchase.

Example: returning to the example above, if a customer buys bread during a visit to the supermarket, they will usually buy milk as well. Such information can guide the seller in choosing items and their positions on the shelves. The seller can place milk and bread close together to encourage customers to buy both. Recognizing which items are often bought together helps the seller sell more goods and thus increase revenue.

Association rule mining aims to find interesting associations or correlations in a large set of objects. In commercial transactions, discovering relationships in a large number of transaction records can help businesses deal with questions such as how to design a catalog...

III. The Apriori Algorithm

The questions are: how can association rules be found in a huge volume of data? Where in the data do the relationships show up? Which association rules are the most interesting? How are interesting association rules found?

1. Function

Apriori is a powerful algorithm for mining frequent itemsets for logical (Boolean) association rules. Its function is to find the frequent itemsets, from which the association rules are then built.

2. Frequent itemsets

A frequent itemset is an itemset whose support satisfies the specified threshold. Example: the itemset {A,B} satisfies the threshold when Sup(A,B) >= min_sup.

Property: every non-empty subset of a frequent itemset is also a frequent itemset.

3. Analysis

3.1. Finding frequent itemsets using candidate generation

a. Property

Apriori uses what is already known about frequent itemsets: the frequent k-itemsets are used to explore and produce the (k+1)-itemsets. First, the frequent 1-itemsets are found (the set L1). From L1, the set L2 of frequent 2-itemsets is found. L2 is then used to find L3 ... Finding each Lk requires one full scan of the database.
From the property of frequent itemsets we can reason as follows. If an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent, that is, P(I) < min_sup. If an item A is added to the itemset I, the resulting itemset I ∪ A cannot occur more often than I, so I ∪ A is not frequent either, that is, P(I ∪ A) < min_sup.

b. Generating Lk from Lk-1 proceeds as follows:

Step 1: Join (find Ck)

Ck is the set of candidate k-itemsets generated by joining Lk-1 with itself. The join of Lk-1 with Lk-1 is defined as follows: li is the i-th itemset in Lk-1, and li[j] is the j-th item of li (so li[k-1] is its last item). Two itemsets li and lt in Lk-1 are joined if and only if their first (k-2) items are the same. The condition li[k-1] < lt[k-1] ensures that no duplicate candidates are generated in Ck. The result of joining li and lt is li[1] li[2] ... li[k-2] li[k-1] lt[k-1].

Step 2: Prune

Ck is a superset of Lk: its members may or may not be frequent, but every frequent k-itemset is contained in Ck. Scanning the database and counting the occurrences of each candidate in Ck eliminates the candidates that do not meet the support threshold and yields Lk. The size of Ck is reduced beforehand as follows:

- No infrequent (k-1)-itemset can be a subset of a frequent k-itemset.
- If any (k-1)-subset of a candidate k-itemset is not in Lk-1, that candidate cannot be frequent and is removed from Ck.
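
The join and prune steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the names `apriori_gen` and `filter_frequent`, the representation of itemsets as sorted tuples, and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Build the candidate set Ck from Lk-1 (itemsets are sorted tuples)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for a in range(len(prev)):
        for b in range(a + 1, len(prev)):
            li, lt = prev[a], prev[b]
            # Join: same first k-2 items, and li[k-1] < lt[k-1].
            if li[:-1] == lt[:-1] and li[-1] < lt[-1]:
                cand = li + (lt[-1],)
                # Prune: every (k-1)-subset of the candidate must be in Lk-1.
                subsets = combinations(cand, len(cand) - 1)
                if all(sub in prev_set for sub in subsets):
                    candidates.append(cand)
    return candidates


def filter_frequent(transactions, candidates, min_sup):
    """Scan the transactions and keep candidates with support >= min_sup."""
    n = len(transactions)
    kept = []
    for cand in candidates:
        count = sum(1 for t in transactions if set(cand) <= t)
        if n and count / n >= min_sup:
            kept.append(cand)
    return kept


# Hypothetical toy data and threshold.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"B", "D"},
    {"B", "D"},
    {"C", "D"},
]
min_sup = 0.3

items = sorted({item for t in transactions for item in t})
L1 = filter_frequent(transactions, [(item,) for item in items], min_sup)
L2 = filter_frequent(transactions, apriori_gen(L1), min_sup)
C3 = apriori_gen(L2)
L3 = filter_frequent(transactions, C3, min_sup)
```

In this toy run, joining (B, C) with (B, D) would give (B, C, D), but its subset (C, D) is not in L2, so the candidate is pruned before the database is scanned. Only (A, B, C) remains in C3.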