
Data Mining

Nguyễn Nhật Quang

quangnn-fit@mail.hut.edu.vn
Hanoi University of Science and Technology
School of Information and Communication Technology
Academic year 2011-2012

Course contents:

Introduction to data mining

Introduction to the WEKA tool

Data preprocessing

Association rule mining

Classification and prediction techniques

Clustering techniques

Association rule mining: Introduction

The association rule mining problem
Given a set of transactions, find rules that predict the occurrence of an item in a transaction based on the occurrences of other items in the same transaction.

Sample transaction database:

TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke

Examples of association rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Basic definitions (1)

(Examples refer to the sample transaction database.)

Itemset
- A collection of one or more items, e.g., {Milk, Bread, Diaper}
- k-itemset: an itemset containing k items

Support count (σ)
- The number of transactions that contain an itemset
- Example: $\sigma(\{Milk, Bread, Diaper\}) = 2$

Support (s)
- The fraction of transactions that contain an itemset
- Example: $s(\{Milk, Bread, Diaper\}) = 2/5$

Frequent (large) itemset
- An itemset whose support is greater than or equal to a threshold minsup
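A minimal Python sketch of the two support measures on the sample database. The names `TRANSACTIONS`, `support_count` and `support` are illustrative, not from the slides. Each transaction is stored as a set, so "t contains X" becomes a subset test.

```python
from fractions import Fraction

# Sample transaction database from the slides (TIDs 1..5)
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset, transactions):
    """s(X): fraction of transactions that contain X, kept as an exact fraction."""
    return Fraction(support_count(itemset, transactions), len(transactions))

x = {"Milk", "Bread", "Diaper"}
print(support_count(x, TRANSACTIONS))  # 2
print(support(x, TRANSACTIONS))        # 2/5
```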

Basic definitions (2)

Association rule
- An implication expression of the form X → Y, where X and Y are disjoint itemsets
- Example: {Milk, Diaper} → {Beer}

Rule evaluation metrics (on the sample transaction database T)
- Support (s): the fraction of all transactions that contain both X and Y
- Confidence (c): among the transactions that contain X, the fraction that also contain Y (i.e., contain both X and Y)

Example: {Milk, Diaper} → {Beer}
$s = \frac{\sigma(\{Milk, Diaper, Beer\})}{|T|} = \frac{2}{5} = 0.4$
$c = \frac{\sigma(\{Milk, Diaper, Beer\})}{\sigma(\{Milk, Diaper\})} = \frac{2}{3} = 0.67$
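The same example, computed in a small sketch. `sigma` and `rule_metrics` are illustrative names; confidence follows the definition above, σ(X ∪ Y) / σ(X).

```python
# Sample transaction database from the slides
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: number of transactions containing all items of itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def rule_metrics(X, Y):
    """Return (support, confidence) of the rule X -> Y."""
    both = sigma(X | Y)
    return both / len(TRANSACTIONS), both / sigma(X)

s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"})
print(f"s={s:.2f}, c={c:.2f}")  # s=0.40, c=0.67
```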

Association rule mining

Given a set of transactions T, the goal of association rule mining is to find all rules having
- support ≥ a threshold minsup, and
- confidence ≥ a threshold minconf

Brute-force approach
- List all possible association rules
- Compute the support and confidence of each rule
- Prune the rules whose support is below minsup or whose confidence is below minconf

This brute-force approach is computationally prohibitive and cannot be used in practice!
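To see why brute force does not scale, one can count the candidate rules over d items. The sketch below (helper names are hypothetical) enumerates them by assigning each item to the antecedent, the consequent, or neither, and compares the result with the closed form $3^d - 2^{d+1} + 1$ that this counting argument yields (the formula is our addition, not on the slide). Even the six items of the sample database give 602 candidate rules.

```python
from itertools import product

def count_rules_brute_force(d):
    """Count rules X -> Y over d items with X, Y non-empty and disjoint.

    Each item is assigned to the antecedent (0), the consequent (1) or neither (2);
    a valid rule needs at least one item on each side."""
    return sum(1 for assignment in product((0, 1, 2), repeat=d)
               if 0 in assignment and 1 in assignment)

def count_rules_formula(d):
    # 3^d assignments, minus those with an empty antecedent or consequent (2^d each),
    # plus the single assignment counted twice (both sides empty)
    return 3**d - 2**(d + 1) + 1

for d in range(1, 7):
    print(d, count_rules_brute_force(d), count_rules_formula(d))
```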

Association rule mining

Association rules from the sample transaction database:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)

All the rules above are binary partitions (splits into two subsets) of the same itemset {Milk, Diaper, Beer}

Rules originating from the same itemset have identical support but can have different confidence

Therefore, association rule mining can decouple the support and confidence requirements
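A sketch that enumerates every binary partition of {Milk, Diaper, Beer} and recomputes the figures above. It makes the observation concrete: s depends only on the itemset, while c depends on the antecedent. Helper names are illustrative.

```python
from itertools import combinations

# Sample transaction database from the slides
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def binary_partitions(itemset):
    """Yield every rule X -> (itemset minus X) with both sides non-empty."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for lhs in combinations(items, r):
            X = frozenset(lhs)
            yield X, frozenset(itemset) - X

L = frozenset({"Milk", "Diaper", "Beer"})
n = len(TRANSACTIONS)
for X, Y in binary_partitions(L):
    s = sigma(L) / n            # identical for every rule built from L
    c = sigma(L) / sigma(X)     # depends on the antecedent only
    print(sorted(X), "->", sorted(Y), f"(s={s:.1f}, c={c:.2f})")
```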

Association rule mining

Association rule mining is a two-step process:
- Frequent itemset generation
  - Generate all itemsets whose support ≥ minsup
- Rule generation
  - From each frequent itemset (found in the previous step), generate all rules with high confidence (≥ minconf)
  - Each rule is a binary partition (a split into two parts) of a frequent itemset

Frequent itemset generation (step 1) is still computationally expensive!

The lattice of candidate itemsets

[Figure: the itemset lattice over the items A, B, C, D, E, from null through all 2-itemsets (AB ... DE), 3-itemsets (ABC ... CDE) and 4-itemsets (ABCD ... BCDE) up to ABCDE]

Given d items, there are $2^d$ possible candidate itemsets!
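The lattice can be generated level by level with `itertools.combinations`. This short sketch for d = 5 confirms that level k holds C(5, k) itemsets and that the lattice has $2^5 = 32$ nodes, counting null (represented here by the empty string, an arbitrary choice).

```python
from itertools import combinations
from math import comb

items = ["A", "B", "C", "D", "E"]
d = len(items)

# Level k of the lattice holds the C(d, k) itemsets of size k; k = 0 is null
levels = [["".join(c) for c in combinations(items, k)] for k in range(d + 1)]
sizes = [len(level) for level in levels]
print(sizes)        # [1, 5, 10, 10, 5, 1]
print(sum(sizes))   # 32
assert sizes == [comb(d, k) for k in range(d + 1)]
```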

Frequent itemset generation

Brute-force approach (on the sample transaction database)
- Every itemset in the lattice is a candidate frequent itemset
- Count the support of each candidate by scanning all the transactions
- Match each transaction against every candidate
- Complexity ~ $O(N \cdot M \cdot w)$, where N is the number of transactions, M the number of candidates and w the maximum transaction width
- With $M = 2^d$, this complexity is far too high!
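A deliberately naive sketch of the brute-force method on the sample database, assuming minsup = 3 as a support count (the value used in the Apriori example later). It counts one comparison per (transaction, candidate) pair, so with N = 5 transactions and M = 2^6 - 1 = 63 non-empty candidates it performs N·M = 315 subset tests.

```python
from itertools import combinations

# Sample transaction database from the slides
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
MINSUP = 3  # support count threshold (assumed)

items = sorted(set().union(*TRANSACTIONS))
# Every non-empty itemset in the lattice is a candidate
candidates = [frozenset(c) for k in range(1, len(items) + 1)
              for c in combinations(items, k)]

counts = {c: 0 for c in candidates}
comparisons = 0
for t in TRANSACTIONS:          # N transactions
    for c in candidates:        # M candidates per transaction
        comparisons += 1        # each subset test itself costs up to O(w)
        if c <= t:
            counts[c] += 1

frequent = {c: n for c, n in counts.items() if n >= MINSUP}
print(len(candidates), comparisons, len(frequent))  # 63 315 8
```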

Frequent itemset generation strategies

Reduce the number of candidates (M)
- Complete search: $M = 2^d$
- Use pruning techniques to reduce M

Reduce the number of transactions (N)
- Reduce N as the size (number of items) of the itemsets increases

Reduce the number of comparisons ($N \cdot M$) between candidates and transactions
- Use efficient data structures to store the candidates or the transactions
- No need to match every candidate against every transaction

Reducing the number of candidates

Apriori principle: support-based pruning
- If an itemset is frequent, then all of its subsets must also be frequent
- Equivalently, if an itemset is infrequent, then all of its supersets must also be infrequent

The Apriori principle holds due to the anti-monotone property of support:
$\forall X, Y: (X \subseteq Y) \Rightarrow s(X) \ge s(Y)$
- The support of an itemset never exceeds the support of any of its subsets
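The anti-monotone property can be sanity-checked by brute force on the sample database. This is a check on one dataset, not a proof (the proof is that every transaction containing Y also contains any X ⊆ Y). Names are illustrative.

```python
from itertools import combinations

# Sample transaction database from the slides
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    return sum(1 for t in TRANSACTIONS if itemset <= t)

items = sorted(set().union(*TRANSACTIONS))
lattice = [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

# Look for any pair X subset-of Y where X has lower support than Y
violations = [(X, Y) for X in lattice for Y in lattice
              if X <= Y and sigma(X) < sigma(Y)]
print(len(lattice), len(violations))  # 64 0
```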

Apriori: support-based pruning

[Figure: in the itemset lattice, itemset AB is found to be infrequent, so all supersets of AB are pruned]

Apriori: support-based pruning

minsup = 3 (as a support count)

1-itemsets:
Item     Count
Bread    4
Coke     2
Milk     4
Beer     3
Diaper   4
Eggs     1

2-itemsets (no need to consider itemsets containing Coke or Eggs):
Itemset            Count
{Bread, Milk}      3
{Bread, Beer}      2
{Bread, Diaper}    3
{Milk, Beer}       2
{Milk, Diaper}     3
{Beer, Diaper}     3

3-itemsets:
Itemset                  Count
{Bread, Milk, Diaper}    2

(Only TIDs 4 and 5 contain all three items, so this candidate is not frequent at minsup = 3.)

If every possible itemset were considered: $\binom{6}{1} + \binom{6}{2} + \binom{6}{3} = 6 + 15 + 20 = 41$
With support-based pruning: 6 + 6 + 1 = 13

The Apriori algorithm

Generate all frequent 1-itemsets (frequent itemsets containing a single item)
- Set k = 1
- Repeat until no new frequent itemsets are found:
  - From the frequent k-itemsets, generate the candidate (k+1)-itemsets
  - Prune the candidate (k+1)-itemsets that contain any infrequent k-subset
  - Count the support of each candidate (k+1)-itemset by scanning all the transactions
  - Eliminate the infrequent candidates, which leaves the frequent (k+1)-itemsets
  - Increment k
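A compact sketch of the loop above on the sample database. It assumes the common way of generating candidates, merging two frequent k-itemsets that share their first k-1 items; the slide does not specify the generation method. Function and variable names are illustrative. With minsup = 3 it counts exactly the 6 + 6 + 1 = 13 candidates of the previous example.

```python
from itertools import combinations

# Sample transaction database from the slides
TRANSACTIONS = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
    ["Bread", "Milk", "Diaper", "Beer"],
    ["Bread", "Milk", "Diaper", "Coke"],
]

def apriori(transactions, minsup_count):
    """Level-wise frequent itemset generation.

    Returns (frequent, n_candidates): frequent maps each frequent itemset
    (a sorted tuple) to its support count; n_candidates is the number of
    candidates whose support had to be counted."""
    tsets = [set(t) for t in transactions]
    candidates = [(item,) for item in sorted(set().union(*tsets))]  # k = 1
    frequent, n_candidates = {}, 0
    while candidates:
        n_candidates += len(candidates)
        # Support counting: one pass over the database per level
        counts = {c: sum(1 for t in tsets if set(c) <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup_count}
        frequent.update(level)
        # Candidate generation: merge frequent k-itemsets sharing the first k-1 items
        prev = sorted(level)
        merged = [a + (b[-1],) for a, b in combinations(prev, 2) if a[:-1] == b[:-1]]
        # Candidate pruning: drop (k+1)-itemsets that have an infrequent k-subset
        candidates = [c for c in merged
                      if all(s in level for s in combinations(c, len(c) - 1))]
    return frequent, n_candidates

frequent, n_candidates = apriori(TRANSACTIONS, minsup_count=3)
print(n_candidates)   # 13
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```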

Reducing the number of comparisons

Comparisons (matchings) between candidate itemsets and transactions
- To count the support of each candidate itemset, all the transactions must be scanned

To reduce the number of comparisons, store the candidate itemsets in a hash structure
- Instead of matching each transaction against every candidate, match it only against the candidates stored in the hashed buckets

Generating a hash tree

Suppose we have 15 candidate 3-itemsets:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

To build the hash tree we need:
- A hash function, e.g., h(p) = p mod 3
- A maximum leaf size: the maximum number of itemsets stored in a leaf node (if the number of candidate itemsets exceeds this value, the node is split), e.g., max leaf size = 3

[Figure: the resulting hash tree. Every internal node has three branches, one per hash value: 1,4,7 / 2,5,8 / 3,6,9. Leaves: {1 4 5} | {1 2 4}, {4 5 7} | {1 2 5}, {4 5 8} | {1 5 9} | {1 3 6} | {2 3 4}, {5 6 7} | {3 4 5} | {3 5 6}, {3 5 7}, {6 8 9} | {3 6 7}, {3 6 8}]

Association rule mining with a hash tree (1)

Hash tree storing the candidate itemsets

[Figure: the hash tree from the previous slide, with the branch followed when hashing on 1, 4 or 7 highlighted]

Association rule mining with a hash tree (2)

Hash tree storing the candidate itemsets

[Figure: the same hash tree, with the branch followed when hashing on 2, 5 or 8 highlighted]

Association rule mining with a hash tree (3)

Hash tree storing the candidate itemsets

[Figure: the same hash tree, with the branch followed when hashing on 3, 6 or 9 highlighted]

The k-itemsets contained in a transaction

Given a transaction t, which 3-itemsets does it contain?

Assume that within each itemset the items are listed in lexicographic order.
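Enumerating the 3-subsets of a transaction is what the hash tree traversal does implicitly. The sketch uses t = {1 2 3 5 6}, the transaction of the following slides. Because the items are sorted, `itertools.combinations` emits the subsets in lexicographic order, grouped by first item as 1 + 2356 (6 subsets), 2 + 356 (3 subsets) and 3 + 56 (1 subset).

```python
from itertools import combinations
from collections import Counter

t = [1, 2, 3, 5, 6]            # transaction, items in sorted order
subsets = list(combinations(t, 3))
print(len(subsets))            # 10
print(subsets[:3])             # [(1, 2, 3), (1, 2, 5), (1, 2, 6)]
# How many 3-subsets start with each item
print(Counter(s[0] for s in subsets))
```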

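Identifying itemsets with the hash tree (1)

Transaction t: 1 2 3 5 6

At the root, t is split by its first item: 1 + 2 3 5 6, 2 + 3 5 6, 3 + 5 6; each first item (1, 2, 3) is hashed to choose a branch.

[Figure: the hash tree with branches 1,4,7 / 2,5,8 / 3,6,9 (the hash function) and the 15 candidates stored in its leaves]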

Identifying itemsets with the hash tree (2)

Transaction t: 1 2 3 5 6

Root level: 1 + 2 3 5 6, 2 + 3 5 6, 3 + 5 6
Next level, below item 1: 12 + 3 5 6, 13 + 5 6, 15 + 6

[Figure: the same hash tree, traversed with these prefixes]

Transaction t only needs to be compared against 11 (out of 15) candidate itemsets!
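The hash tree (max leaf size 3, h(p) = p mod 3) and the traversal of t = 1 2 3 5 6 can be replayed in a short sketch. `Node` and `visit_leaves` are our own names, not from the slides. The sketch counts the candidates in every leaf it reaches. The leaf {1 5 9} is reached three times (via 12 + 3, 12 + 6 and 15 + 6), which gives 11 comparisons. Only 9 distinct candidates are involved, and 3 of them are contained in t.

```python
MAX_LEAF = 3   # max leaf size from the slides
K = 3          # candidates are 3-itemsets

def h(item):
    """Hash function from the slides: branches {1,4,7}, {2,5,8}, {3,6,9}."""
    return item % 3

class Node:
    def __init__(self, depth):
        self.depth = depth        # item position this node hashes on
        self.itemsets = []        # candidates stored here while it is a leaf
        self.children = None      # hash value -> Node, once the node is split

    def insert(self, itemset):
        if self.children is not None:             # internal node: route down
            key = h(itemset[self.depth])
            self.children.setdefault(key, Node(self.depth + 1)).insert(itemset)
            return
        self.itemsets.append(itemset)
        # Split an overfull leaf, unless all K positions have already been hashed
        if len(self.itemsets) > MAX_LEAF and self.depth < K:
            pending, self.itemsets, self.children = self.itemsets, [], {}
            for s in pending:
                self.insert(s)

def visit_leaves(node, t, start, out):
    """Subset operation: reach every leaf that may hold a K-subset of t."""
    if node.children is None:
        out.append(node)                          # the same leaf may be reached twice
        return
    still_needed = K - node.depth                 # items left to pick, including this one
    for i in range(start, len(t) - still_needed + 1):
        child = node.children.get(h(t[i]))
        if child is not None:
            visit_leaves(child, t, i + 1, out)

CANDIDATES = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
              (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
              (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]
root = Node(0)
for c in CANDIDATES:
    root.insert(c)

t = (1, 2, 3, 5, 6)
leaves = []
visit_leaves(root, t, 0, leaves)
comparisons = sum(len(leaf.itemsets) for leaf in leaves)
distinct = {c for leaf in leaves for c in leaf.itemsets}
matched = {c for c in distinct if set(c) <= set(t)}
print(comparisons, len(distinct), sorted(matched))
# 11 9 [(1, 2, 5), (1, 3, 6), (3, 5, 6)]
```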

Apriori: factors affecting complexity

Choice of the minsup threshold
- A minsup value that is too low yields many frequent itemsets
- This may increase the number of candidates and the maximum length (size) of the frequent itemsets

Number of items in the database (the transactions)
- More memory is needed to store the support count of each item
- If the number of frequent items (frequent 1-itemsets) increases, both the computation cost and the I/O cost (scanning the transactions) may also increase

Size of the database (number of transactions)
- Apriori makes multiple passes over the database, so its run time increases with the number of transactions

Average transaction width
- As the average width (number of items) of the transactions increases, the maximum length of the frequent itemsets increases, and so does the cost of traversing the hash tree

Representing frequent itemsets

In practice, the number of frequent itemsets generated from a transaction database can be very large

A compact representation is needed
- A (small) set of representative frequent itemsets from which all the other frequent itemsets can be derived

There are two such representations:
- Maximal frequent itemsets
- Closed frequent itemsets

Maximal frequent itemsets

A frequent itemset is maximal if every one of its supersets is infrequent

[Figure: itemset lattice showing the maximal frequent itemsets, the infrequent itemsets, and the border between frequent and infrequent itemsets]

Closed frequent itemsets

A frequent itemset is closed if none of its supersets has the same support as it

TID   Items
1     {A,B}
2     {B,C,D}
3     {A,B,C,D}
4     {A,B,D}
5     {A,B,C,D}

Itemset   Support
{A}       4
{B}       5
{C}       3
{D}       4
{A,B}     4
{A,C}     2
{A,D}     3
{B,C}     3
{B,D}     4
{C,D}     3

Itemset     Support
{A,B,C}     2
{A,B,D}     3
{A,C,D}     2
{B,C,D}     3
{A,B,C,D}   2

Frequent itemsets: maximal vs. closed (1)

TID   Items
1     ABC
2     ABCD
3     BCE
4     ACDE
5     DE

[Figure: the itemset lattice over A, B, C, D, E, each node annotated with the IDs of the transactions that contain it (e.g., A: 124, B: 123, C: 1234, D: 245, E: 345, AB: 12, AC: 124); itemsets such as ABCE and ABCDE are not supported by any transaction]

Frequent itemsets: maximal vs. closed (2)

minsup = 2

[Figure: the same TID-annotated lattice, with each frequent itemset marked as "closed but not maximal" or "closed and maximal"]

Closed and maximal: {A,B,C}, {A,C,D}, {C,E}, {D,E}
Closed but not maximal: {C}, {D}, {E}, {A,C}, {B,C}

# Closed = 9
# Maximal = 4
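A brute-force sketch that recomputes both counts for this database with minsup = 2 (a support count). It only compares an itemset with its frequent supersets. That is enough: a superset with the same support as a frequent itemset is frequent too. Variable names are illustrative.

```python
from itertools import combinations

# Transaction database from the "maximal vs. closed" slides
TRANSACTIONS = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]
MINSUP = 2

items = sorted(set().union(*TRANSACTIONS))
support = {}                      # frequent itemset -> support count
for k in range(1, len(items) + 1):
    for c in combinations(items, k):
        n = sum(1 for t in TRANSACTIONS if set(c) <= t)
        if n >= MINSUP:
            support[frozenset(c)] = n

def is_closed(x):
    # no (frequent) proper superset with the same support
    return all(not (x < y and support[y] == support[x]) for y in support)

def is_maximal(x):
    # no frequent proper superset at all
    return all(not x < y for y in support)

closed = {x for x in support if is_closed(x)}
maximal = {x for x in support if is_maximal(x)}
print(len(support), len(closed), len(maximal))  # 14 9 4
```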

Frequent itemsets: maximal vs. closed (3)

Every maximal frequent itemset is also a closed frequent itemset

The representation based on maximal frequent itemsets does not retain the support information of their subsets

The FP-Growth algorithm

An alternative approach to finding frequent itemsets

Recall: Apriori uses a generate-and-test strategy (it generates candidate itemsets, then tests whether each one is frequent)

FP-Growth represents the transaction data with a data structure called an FP-tree

FP-Growth uses the FP-tree to extract the frequent itemsets directly

FP-tree representation

Each transaction is mapped onto a path in the FP-tree

Transactions that share a common prefix of items (in the chosen item order) share the corresponding initial segment of their paths

The more the paths overlap, the more compact (compressed) the FP-tree representation

If the FP-tree is small enough to fit in main memory, FP-Growth can extract the frequent itemsets directly from the in-memory FP-tree
- No need to repeatedly scan the data stored on disk

Building an FP-tree (1)

Initially, the FP-tree contains only the root node (denoted null)

First pass over the transaction database: determine (count) the support of each item
- Infrequent items are discarded
- Frequent items are sorted in decreasing order of support
- In the example (following slides), the decreasing support order is: A, B, C, D, E

Second pass over the transaction database: build the FP-tree

Building an FP-tree (2)

Transaction database:
TID   Items
1     {A,B}
2     {B,C,D}
3     {A,C,D,E}
4     {A,D,E}
5     {A,B,C}
6     {A,B,C,D}
7     {A}
8     {A,B,C}
9     {A,B,D}
10    {B,C,E}

After reading TID 1:
null
  A:1
    B:1

After reading TID 2:
null
  A:1
    B:1
  B:1
    C:1
      D:1

After reading TID 3:
null
  A:2
    B:1
    C:1
      D:1
        E:1
  B:1
    C:1
      D:1
Building an FP-tree (3)

Transaction database: as on the previous slide

Header (pointer) table: one entry per item A, B, C, D, E, pointing to the nodes of that item in the tree

After reading TID 10:
null
  A:8
    B:5
      C:3
        D:1
      D:1
    C:1
      D:1
        E:1
    D:1
      E:1
  B:2
    C:2
      D:1
      E:1

The pointers are used by FP-Growth when generating the frequent itemsets
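A sketch of the two-pass construction on the 10-transaction database. Ties in support are assumed to be broken alphabetically; no ties occur among the frequent items here. `FPNode`, `build_fp_tree` and `show` are illustrative names, and `header[item]` plays the role of the pointer table. Printing the tree reproduces the figure after TID 10.

```python
from collections import defaultdict

# Transaction database from the FP-tree slides (TIDs 1..10)
TRANSACTIONS = ["AB", "BCD", "ACDE", "ADE", "ABC", "ABCD", "A", "ABC", "ABD", "BCE"]

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_fp_tree(transactions, minsup):
    # Pass 1: support of each item
    support = defaultdict(int)
    for t in transactions:
        for item in set(t):
            support[item] += 1
    # Keep frequent items, ordered by decreasing support (ties broken by name)
    order = sorted((i for i in support if support[i] >= minsup),
                   key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(order)}
    # Pass 2: insert each transaction as a path; header[item] lists its nodes
    root, header = FPNode(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
                header[item].append(child)    # pointer (node-link) used by FP-Growth
            child.count += 1
            node = child
    return root, header, order

def show(node, indent=0):
    """Print the tree below node, one 'item:count' per line."""
    for child in node.children.values():
        print(" " * indent + f"{child.item}:{child.count}")
        show(child, indent + 2)

root, header, order = build_fp_tree(TRANSACTIONS, minsup=2)
print(order)   # ['A', 'B', 'C', 'D', 'E']
show(root)
```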

FP-Growth: generating frequent itemsets

FP-Growth generates the frequent itemsets directly from the FP-tree, working from the leaves toward the root (bottom-up)

In the example above, FP-Growth first finds the frequent itemsets ending in E, then those ending in D, then C, then B, and finally A

Since every transaction is mapped onto a path in the FP-tree, the frequent itemsets ending in a given item (e.g., E) can be found by examining only the paths that contain that item
- These paths are easily accessed through the pointers associated with that item (e.g., E)

Paths ending in a given item

[Figure: five sub-trees of the FP-tree, showing the paths ending in e, in d, in c, in b, and in a]

Finding the frequent itemsets

FP-Growth finds all the frequent itemsets ending in a given item using a divide-and-conquer strategy

Example: find all the frequent itemsets ending in e
- First, check whether the 1-itemset {e} is frequent
- If it is, solve the subproblems: find the frequent itemsets ending in de, in ce, in be, and in ae
- Each of these subproblems is in turn decomposed into smaller subproblems
- Combining the solutions of the subproblems yields all the frequent itemsets ending in e

Example: frequent itemsets ending in e

Collect all the paths in the FP-tree that end in e: the prefix paths for e
- null → a:8 → c:1 → d:1 → e:1
- null → a:8 → d:1 → e:1
- null → b:2 → c:2 → e:1

Using the prefix paths for e, determine the support of e by summing the support counts of the e nodes

Assuming minsup = 2, {e} is a frequent itemset (its support is 3 ≥ minsup)

Example: frequent itemsets ending in e

Since {e} is frequent, FP-Growth has to solve the subproblems: find the frequent itemsets ending in de, in ce, in be, and in ae

First, the prefix paths of e are converted into a conditional FP-tree
- Its structure is similar to an FP-tree
- It is used to find the frequent itemsets ending in a given suffix

Building a conditional FP-tree

Update the support counts along the prefix paths
- Some counts include transactions that do not contain e
- E.g., the path null → b:2 → c:2 → e:1 also counts a transaction ({b,c,d}) that contains b and c but not e. The counts must therefore be set to 1, the number of transactions containing {b,c,e}

Remove the e nodes from the prefix paths

After the support counts along the prefix paths are updated, some items may become infrequent; they are removed
- E.g., node b now has support 1, so b is removed

[Figure: the conditional FP-tree for e]

Example: frequent itemsets ending in e

FP-Growth uses the conditional FP-tree for e to solve the subproblems: find the frequent itemsets ending in de, in ce, in be, and in ae

E.g., to find the frequent itemsets ending in de, the prefix paths for d are built from the conditional FP-tree for e
- Summing the support counts of the d nodes gives the support of {d,e}
- The support of {d,e} is 2, so it is a frequent itemset

[Figure: the prefix paths for de]
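The whole procedure fits in a short recursive sketch: collect the prefix paths, update the counts, remove infrequent items and recurse on the conditional tree. This is an illustrative implementation under our own naming, not a reference one. It rebuilds a small FP-tree from each conditional pattern base, and that rebuild performs the count updates and the removal of infrequent items described above. With minsup = 2 it finds exactly the itemsets ending in e: {e}, {d,e}, {a,d,e}, {c,e} and {a,e}.

```python
from collections import defaultdict

# Transaction database from the FP-tree slides (TIDs 1..10), items as letters
TRANSACTIONS = ["AB", "BCD", "ACDE", "ADE", "ABC", "ABCD", "A", "ABC", "ABD", "BCE"]

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_fp_tree(weighted, minsup):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> list of its nodes), the item supports,
    and the frequent items in tree order (decreasing support, ties by name)."""
    support = defaultdict(int)
    for items, n in weighted:
        for item in set(items):
            support[item] += n
    order = sorted((i for i in support if support[i] >= minsup),
                   key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root, header = FPNode(None, None), defaultdict(list)
    for items, n in weighted:
        node = root
        for item in sorted((i for i in set(items) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
                header[item].append(child)
            child.count += n
            node = child
    return header, support, order

def fp_growth(weighted, minsup, suffix=()):
    """Yield (itemset, support) for every frequent itemset, bottom-up."""
    header, support, order = build_fp_tree(weighted, minsup)
    for item in reversed(order):                  # least frequent item first
        itemset = (item,) + suffix
        yield itemset, support[item]
        # Conditional pattern base: the prefix path of every node of `item`,
        # weighted by that node's count (not by the counts higher up the path)
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, minsup, itemset)

for itemset, sup in fp_growth([(t, 1) for t in TRANSACTIONS], minsup=2):
    if "E" in itemset:
        print("".join(sorted(itemset)), sup)
# E 3 / DE 2 / ADE 2 / CE 2 / AE 2 (one per line)
```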

Rule generation (1)

Given a frequent itemset L, find all non-empty subsets $f \subset L$ such that the rule $f \rightarrow L \setminus f$ satisfies the minimum confidence requirement

E.g., for the frequent itemset {A,B,C,D}, the candidate rules are:
ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD,
BD → AC, CD → AB

If |L| = k, there are $2^k - 2$ candidate association rules (ignoring $L \rightarrow \emptyset$ and $\emptyset \rightarrow L$)
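A sketch enumerating the $2^k - 2$ candidate rules of a frequent itemset. The function name is illustrative. For {A,B,C,D} it lists the same 14 rules as above.

```python
from itertools import combinations

def candidate_rules(L):
    """All rules f -> (L minus f) where f is a non-empty proper subset of L."""
    items = sorted(L)
    rules = []
    for r in range(1, len(items)):
        for lhs in combinations(items, r):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append(("".join(lhs), "".join(rhs)))
    return rules

rules = candidate_rules("ABCD")
print(len(rules))   # 14
for lhs, rhs in rules:
    print(f"{lhs} -> {rhs}")
```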

Rule generation (2)

How can rules be generated efficiently from the frequent itemsets?

In general, confidence does not have the anti-monotone property
- $c(ABC \rightarrow D)$ can be larger or smaller than $c(AB \rightarrow D)$

But the confidence of rules generated from the same frequent itemset does have the anti-monotone property
- E.g., for L = {A,B,C,D}: $c(ABC \rightarrow D) \ge c(AB \rightarrow CD) \ge c(A \rightarrow BCD)$
- Confidence is anti-monotone with respect to the number of items on the right-hand side (the consequent) of the rule
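Why the property holds for rules drawn from the same itemset: every such rule has the form X → L \ X. Its numerator σ(L) is fixed, so its confidence depends only on the support of the antecedent. A short derivation:

```latex
c(X \rightarrow L \setminus X) = \frac{\sigma(L)}{\sigma(X)}, \qquad
X' \subseteq X \;\Rightarrow\; \sigma(X') \ge \sigma(X)
\;\Rightarrow\; c(X' \rightarrow L \setminus X') \le c(X \rightarrow L \setminus X)
```

With L = {A,B,C,D} and {A} ⊆ {A,B} ⊆ {A,B,C}, this gives c(A → BCD) ≤ c(AB → CD) ≤ c(ABC → D). Moving items from the antecedent to the consequent can only lower the confidence.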

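Apriori: rule generation (1)

[Figure: the lattice of rules generated from one frequent itemset; once a low-confidence rule is found, the rules below it in the lattice (with larger consequents) are pruned]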

Apriori: rule generation (2)

A candidate rule is generated by merging two rules that share the same prefix in the rule consequent
- E.g., merging (CD → AB, BD → AC) produces the candidate rule D → ABC

Prune rule D → ABC if any of its sub-rules (AD → BC, BCD → A, ...) does not have high confidence (< minconf)
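A sketch of level-wise rule generation for a single frequent itemset. It assumes the merge step described above: consequents of size m+1 are formed from two confident consequents of size m that share their first m-1 items, then pruned by their size-m sub-consequents. It uses the sample database; `gen_rules` and `sigma` are illustrative names. For {Milk, Diaper, Beer} with minconf = 0.6 it keeps the four rules with c ≥ 0.67 listed earlier. With minconf = 0.7 only {Milk, Beer} → {Diaper} survives the first level, so no two-item consequent is generated at all.

```python
from itertools import combinations

# Sample transaction database from the slides
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def gen_rules(L, minconf):
    """Level-wise rule generation from one frequent itemset L.

    Consequents are sorted tuples. Two confident consequents of size m that
    share their first m-1 items are merged into a size m+1 consequent, which
    is kept only if all its size-m sub-consequents gave confident rules."""
    L = frozenset(L)
    sigma_L = sigma(L)
    rules = []
    candidates = [(item,) for item in sorted(L)]
    while candidates and len(candidates[0]) < len(L):
        confident = []
        for conseq in candidates:
            antecedent = L - frozenset(conseq)
            conf = sigma_L / sigma(antecedent)
            if conf >= minconf:
                rules.append((antecedent, frozenset(conseq), conf))
                confident.append(conseq)
        kept = set(confident)
        merged = [a + (b[-1],) for a, b in combinations(confident, 2) if a[:-1] == b[:-1]]
        candidates = sorted(c for c in merged
                            if all(s in kept for s in combinations(c, len(c) - 1)))
    return rules

for X, Y, c in gen_rules({"Milk", "Diaper", "Beer"}, minconf=0.6):
    print(sorted(X), "->", sorted(Y), f"c={c:.2f}")
```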

References

P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining (Chapter 6). Addison-Wesley, 2005.
