Data Mining - Association Rule Discovery
Course contents:
Introduction to the WEKA tool
Data preprocessing
Classification and prediction techniques
Data Mining
Given a set of transactions, find the rules that predict the occurrence of certain items in a transaction based on the occurrence of other items in that transaction
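TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Examples of association rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}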
Basic definitions (1)
Itemset
A collection of one or more items
k-itemset
An itemset that contains k items
Support count (σ)
The number of occurrences of an itemset
Example: σ({Milk, Bread, Diaper}) = 2
Support (s)
The fraction of transactions that contain an itemset
Frequent (large) itemset
An itemset whose support is greater than or equal to a minsup threshold
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
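The definitions above can be checked directly on the five example transactions. The following is a minimal sketch; the function names (`support_count`, `support`, `is_frequent`) are illustrative choices, and minsup is assumed here to be given as a fraction.

```python
# The five example transactions used throughout this section.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def is_frequent(itemset, transactions, minsup):
    """True when the support reaches the minsup threshold (a fraction here)."""
    return support(itemset, transactions) >= minsup

print(support_count({"Milk", "Bread", "Diaper"}, transactions))  # 2
print(support({"Milk", "Bread", "Diaper"}, transactions))        # 0.4
```

Representing each transaction as a Python set keeps the containment test (`itemset <= t`) close to the definition.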
Basic definitions (2)
Association rule
An implication expression of the form X → Y, where X and Y are itemsets
Example: {Milk, Diaper} → {Beer}
Rule evaluation metrics
Support (s)
The fraction of all transactions that contain both X and Y
Confidence (c)
The fraction of the transactions containing X that contain both X and Y
For {Milk, Diaper} → {Beer}:
$s = \frac{\sigma(\{Milk, Diaper, Beer\})}{|T|} = \frac{2}{5} = 0.4$
$c = \frac{\sigma(\{Milk, Diaper, Beer\})}{\sigma(\{Milk, Diaper\})} = \frac{2}{3} \approx 0.67$
The rules to be found must have:
support ≥ the minsup threshold, and
confidence ≥ the minconf threshold
List all possible association rules
Compute the support and confidence of each rule
Discard the rules whose support is below minsup or whose confidence is below minconf
Association rules derived from the itemset {Milk, Diaper, Beer}:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
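The support and confidence values of rules over the five example transactions can be recomputed with a few lines of code. This is only a sketch: `sigma` and `rule_metrics` are illustrative names, and the rule list is the set of rules that can be formed from {Milk, Diaper, Beer}.

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rule_metrics(x, y):
    """Return (support, confidence) of the rule X -> Y.

    Assumes X occurs in at least one transaction, otherwise the
    confidence is undefined (division by zero).
    """
    both = sigma(set(x) | set(y))
    return both / len(transactions), both / sigma(x)

rules = [
    ({"Milk", "Diaper"}, {"Beer"}),
    ({"Milk", "Beer"}, {"Diaper"}),
    ({"Diaper", "Beer"}, {"Milk"}),
    ({"Beer"}, {"Milk", "Diaper"}),
    ({"Diaper"}, {"Milk", "Beer"}),
    ({"Milk"}, {"Diaper", "Beer"}),
]
for x, y in rules:
    s, c = rule_metrics(x, y)
    print(sorted(x), "->", sorted(y), f"s={s:.1f}, c={c:.2f}")
```

All six rules share the same support because they come from the same itemset; only the confidence depends on which items sit on the left-hand side.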
[Figure: lattice of all itemsets over the items A, B, C, D, E, from the 2-itemsets (AB, AC, ..., DE) up to ABCDE]
With d items, up to 2^d candidate itemsets must be considered!
Reduce the number of transactions to be examined (N)
Reduce N as the size (number of items) of the itemsets increases
Reduce the number of comparisons (matchings) between itemsets and transactions (N×M, with M the number of candidate itemsets)
Use suitable (efficient) data structures to store the candidate itemsets or the transactions
There is then no need to compare every itemset against every transaction
Reduce the number of candidate itemsets
$\forall X, Y: (X \subseteq Y) \Rightarrow s(X) \ge s(Y)$
The support of an itemset never exceeds the support of any of its subsets
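The property above can be verified exhaustively on the small example database. The sketch below checks every itemset against its immediate subsets (which is enough, by transitivity); `violations` is a hypothetical helper name.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
items = sorted(set().union(*transactions))

def sigma(itemset):
    """Support count of an itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def violations():
    """Return the (X, Y) pairs with X a subset of Y but sigma(X) < sigma(Y)."""
    bad = []
    for k in range(2, len(items) + 1):
        for y in combinations(items, k):
            for x in combinations(y, k - 1):
                if sigma(x) < sigma(y):
                    bad.append((x, y))
    return bad

print(violations())  # [] : no itemset is more frequent than one of its subsets
```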
[Figure: itemset lattice in which AB is an infrequent itemset, so all supersets of AB are pruned]
minsup = 3
Level-1 itemsets (1-itemsets)
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1
Level-2 itemsets (2-itemsets)
(No need to consider itemsets containing Coke or Eggs)
Itemset          Count
{Bread,Milk}     3
{Bread,Beer}     2
{Bread,Diaper}   3
{Milk,Beer}      2
{Milk,Diaper}    3
{Beer,Diaper}    3
Level-3 itemsets (3-itemsets)
Itemset                Count
{Bread,Milk,Diaper}    2
If all possible itemsets are considered:
$\binom{6}{1} + \binom{6}{2} + \binom{6}{3} = 6 + 15 + 20 = 41$
With support-based pruning:
6 + 6 + 1 = 13
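The level-by-level pruning illustrated above can be sketched as follows. This is a simplified Apriori-style loop written for this example, not a reference implementation: candidates of size k+1 are formed by joining frequent k-itemsets, and a candidate is dropped when any of its k-subsets is infrequent. The function name `apriori` and the returned structure are illustrative.

```python
from itertools import combinations
from math import comb

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
MINSUP = 3  # absolute support count, as in the example

def sigma(itemset):
    """Support count of an itemset (one scan of the transactions)."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(minsup):
    """Return {k: {candidate k-itemset: support count}} for each level."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]
    counted = {}
    k = 1
    while candidates:
        counts = {c: sigma(c) for c in candidates}
        counted[k] = counts
        frequent = {c for c, n in counts.items() if n >= minsup}
        # Join frequent k-itemsets into (k+1)-itemsets ...
        joined = {a | b for a in frequent for b in frequent
                  if len(a | b) == k + 1}
        # ... and prune candidates that have an infrequent k-subset.
        candidates = [c for c in joined
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k))]
        k += 1
    return counted

counted = apriori(MINSUP)
print([len(level) for level in counted.values()])  # [6, 6, 1]
print(comb(6, 1) + comb(6, 2) + comb(6, 3))        # 41
```

The first printed list gives the number of candidates whose support is actually counted at each level (13 in total), against 41 itemsets of size up to 3 without pruning.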
All transactions must be scanned in order to compute the support of each candidate itemset
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Suppose there are 15 candidate 3-itemsets:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}
Hash function: items 1, 4, 7 hash to one branch, items 2, 5, 8 to a second branch, and items 3, 6, 9 to a third branch
[Figure: hash tree whose leaves store the 15 candidate itemsets]
Hash tree storing the candidate itemsets
Hash on 1, 4, or 7
[Figure: hash tree with the branch for items 1, 4, 7 highlighted; it leads to the candidates 1 4 5, 1 3 6, 1 2 4, 1 2 5, 4 5 7, 4 5 8, 1 5 9]
Hash on 2, 5, or 8
[Figure: hash tree with the branch for items 2, 5, 8 highlighted; it leads to the candidates 2 3 4, 5 6 7]
Hash on 3, 6, or 9
[Figure: hash tree with the branch for items 3, 6, 9 highlighted; it leads to the candidates 3 4 5, 3 5 6, 3 5 7, 3 6 7, 3 6 8, 6 8 9]
Given a transaction t, which 3-itemsets does it contain?
Assume that the items within each itemset are listed in lexicographic order
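Identifying the itemsets using the hash tree (1)
Transaction t = 1 2 3 5 6
At the root, t is split on its first item: 1 + 2 3 5 6, 2 + 3 5 6, 3 + 5 6
[Figure: hash function and candidate hash tree, with the three splits of t routed to the root's branches]
Identifying the itemsets using the hash tree (2)
Transaction t = 1 2 3 5 6
At the next level, 1 + 2 3 5 6 is split on the second item: 1 2 + 3 5 6, 1 3 + 5 6, 1 5 + 6
[Figure: hash function and candidate hash tree, with the second-level splits of t routed further down the tree]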
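The hash tree and the subset operation described above can be sketched in code. The assumptions here are mine: the hash function is taken to be `(item - 1) % 3` (which groups 1,4,7 / 2,5,8 / 3,6,9 as in the figures), a leaf is split when it holds more than 3 itemsets, and the class and function names are illustrative.

```python
MAX_LEAF = 3  # assumed maximum number of itemsets per leaf

def h(item):
    """Assumed hash function: 1,4,7 -> 0; 2,5,8 -> 1; 3,6,9 -> 2."""
    return (item - 1) % 3

class Node:
    def __init__(self):
        self.children = {}   # branch index -> Node (interior nodes)
        self.itemsets = []   # stored candidates (leaf nodes)
        self.is_leaf = True

def insert(node, itemset, depth=0):
    """Insert a sorted candidate, hashing on the item at position `depth`."""
    if node.is_leaf:
        node.itemsets.append(itemset)
        # Split an overfull leaf, unless every item was already hashed.
        if len(node.itemsets) > MAX_LEAF and depth < len(itemset):
            stored, node.itemsets, node.is_leaf = node.itemsets, [], False
            for s in stored:
                insert(node, s, depth)
    else:
        child = node.children.setdefault(h(itemset[depth]), Node())
        insert(child, itemset, depth + 1)

def contained(node, t, start=0, found=None):
    """Collect the stored candidates that are subsets of transaction t."""
    if found is None:
        found = set()
    if node.is_leaf:
        found.update(c for c in node.itemsets if set(c) <= set(t))
        return found
    # Hash on each remaining item of t and descend into that branch.
    for i in range(start, len(t)):
        child = node.children.get(h(t[i]))
        if child is not None:
            contained(child, t, i + 1, found)
    return found

candidates = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
              (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
              (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]
root = Node()
for c in candidates:
    insert(root, c)

print(sorted(contained(root, (1, 2, 3, 5, 6))))
# [(1, 2, 5), (1, 3, 6), (3, 5, 6)]
```

Only the leaves reached by hashing the items of t are inspected, so the transaction is not compared against every candidate.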
More memory is needed to store the support count of each item
If the number of frequent items (frequent 1-itemsets) increases, both the computation cost and the I/O cost (scanning the transactions) increase as well
[Figure: itemset lattice with the border that separates the frequent itemsets from the infrequent itemsets]
Closed frequent itemsets
TID  Items
1    {A,B}
2    {B,C,D}
3    {A,B,C,D}
4    {A,B,D}
5    {A,B,C,D}
Itemset    Support
{A}        4
{B}        5
{C}        3
{D}        4
{A,B}      4
{A,C}      2
{A,D}      3
{B,C}      3
{B,D}      4
{C,D}      3
Itemset    Support
{A,B,C}    2
{A,B,D}    3
{A,C,D}    2
{B,C,D}    3
{A,B,C,D}  2
TID  Items
1    ABC
2    ABCD
3    BCE
4    ACDE
5    DE
[Figure: itemset lattice from null to ABCDE; each itemset is labeled with the TIDs of the transactions that contain it (e.g. A: 1 2 4, B: 1 2 3, C: 1 2 3 4, D: 2 4 5, E: 3 4 5, AB: 1 2, DE: 4 5). Some itemsets are not supported by any transaction]
[Figure: the same TID-labeled lattice, with the closed frequent itemsets marked and those that are both closed and maximal highlighted]
# Closed = 9
# Maximal = 4
Every maximal frequent itemset is also a closed frequent itemset
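The counts of closed and maximal frequent itemsets for the five-transaction example (ABC, ABCD, BCE, ACDE, DE) can be reproduced by brute force. The sketch below assumes a minimum support count of 2, which is the value that yields 9 closed and 4 maximal itemsets; the helper names are illustrative.

```python
from itertools import combinations

transactions = [set("ABC"), set("ABCD"), set("BCE"), set("ACDE"), set("DE")]
MINSUP = 2  # assumed minimum support count

def sigma(itemset):
    """Support count of an itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

items = sorted(set().union(*transactions))
frequent = {
    frozenset(c): sigma(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if sigma(c) >= MINSUP
}

def immediate_supersets(itemset):
    """All itemsets obtained by adding exactly one item."""
    return [itemset | {i} for i in items if i not in itemset]

# Closed: no immediate superset has the same support count.
closed = {x for x, n in frequent.items()
          if all(sigma(y) < n for y in immediate_supersets(x))}
# Maximal: no immediate superset is frequent.
maximal = {x for x in frequent
           if all(y not in frequent for y in immediate_supersets(x))}

print(len(closed), len(maximal))  # 9 4
print(maximal <= closed)          # True
```

Checking immediate supersets is sufficient because support never increases when items are added.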
The frequent items are sorted in decreasing order of support
TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {A}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}
After reading transaction 1:
null → A:1 → B:1
After reading transaction 2:
null → A:1 → B:1
null → B:1 → C:1 → D:1
After reading transaction 3:
null → A:2 → B:1
null → A:2 → C:1 → D:1 → E:1
null → B:1 → C:1 → D:1
FP-tree built from the whole transaction database
Pointer table: one (Item, Pointer) entry for each of the items A, B, C, D, E
null
  A:8
    B:5
      C:3
        D:1
      D:1
    C:1
      D:1
        E:1
    D:1
      E:1
  B:2
    C:2
      D:1
      E:1
The paths that contain a given item (e.g. E) are easily identified using the pointers attached to its nodes
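The construction of the FP-tree from the ten example transactions can be sketched as below. This is a simplified illustration: the header (pointer) table is kept as a plain list of nodes per item rather than as linked node pointers, and the names `FPNode` and `build_fp_tree` are hypothetical.

```python
from collections import Counter

# The ten example transactions, one string of items per transaction.
transactions = ["AB", "BCD", "ACDE", "ADE", "ABC",
                "ABCD", "A", "ABC", "ABD", "BCE"]

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def build_fp_tree(transactions, minsup=1):
    """Return (root, header); header maps each item to the list of its nodes."""
    freq = Counter(i for t in transactions for i in set(t))
    root, header = FPNode(None), {}
    for t in transactions:
        # Keep frequent items, sorted by decreasing support (ties by name).
        path = sorted((i for i in set(t) if freq[i] >= minsup),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            node = node.children[item]
            node.count += 1  # one more transaction shares this prefix
    return root, header

root, header = build_fp_tree(transactions)
print(root.children["A"].count, root.children["B"].count)  # 8 2
print(sum(n.count for n in header["E"]))                   # 3
```

Summing the counts of all nodes of an item, reached through the header table, gives that item's support count.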
Paths ending in a given item
[Figures: the paths of the FP-tree that end in e, in d, in c, in b, and in a]
Finding the frequent itemsets ending in e involves the subproblems of finding those ending in de, ce, be, and ae
Each of these subproblems is in turn decomposed into smaller subproblems
Merging the solutions of the subproblems yields the frequent itemsets ending in e
Identify all the paths in the FP-tree that end in e
From the prefix paths for e, compute the support of e by summing the support counts attached to the e nodes
[Figure: prefix paths for e]
Assuming minsup = 2, the itemset {e} is a frequent itemset (its support is 3, which is greater than minsup)
Used to find the frequent itemsets ending in a given item
Update the support counts along the prefix paths, because some of these counts include transactions that do not contain item e
Example: the path null → b:2 → c:2 → e:1 also counts a transaction that contains b and c but not e. The counts on this path must therefore be set to 1, the number of transactions that contain {b,c,e}
Conditional FP-tree for e
E.g. node b now has a support count of 1, which is below minsup = 2, so node b is removed
E.g. to find the frequent itemsets ending in de, the prefix paths for d are built from the conditional FP-tree for e
[Figure: prefix paths for de]
The support of {d,e} is 2, so it is a frequent itemset
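The divide-and-conquer idea described above (solve the problem for itemsets ending in e, then de, and so on) can be sketched without an explicit tree. The code below is a simplification, not FP-growth proper: each conditional pattern base is stored as a plain list of (prefix, count) pairs instead of a conditional FP-tree. The function name `grow` and minsup = 2 are assumptions taken from the example, and the alphabetical item order is used because it coincides with decreasing support in this dataset.

```python
from collections import Counter

transactions = ["ab", "bcd", "acde", "ade", "abc",
                "abcd", "a", "abc", "abd", "bce"]
MINSUP = 2

def grow(pattern_base, suffix, minsup, out):
    """Recursively extend `suffix` with items found in its pattern base.

    pattern_base: list of (items, count), items being a sorted tuple.
    out: dict mapping each frequent itemset to its support count.
    """
    counts = Counter()
    for items, n in pattern_base:
        for i in items:
            counts[i] += n
    for item, n in counts.items():
        if n < minsup:
            continue  # item is infrequent together with this suffix
        itemset = frozenset(suffix) | {item}
        out[itemset] = n
        # Conditional pattern base of `item`: the prefix that precedes it
        # on each path, weighted by the count of that path.
        conditional = [(items[:items.index(item)], c)
                       for items, c in pattern_base if item in items]
        grow(conditional, itemset, minsup, out)
    return out

base = [(tuple(sorted(t)), 1) for t in transactions]
frequent = grow(base, (), MINSUP, {})

print(frequent[frozenset("e")], frequent[frozenset("de")])  # 3 2
print(sorted("".join(sorted(s)) for s in frequent if max(s) == "e"))
# ['ade', 'ae', 'ce', 'de', 'e']
```

Because each recursive call only looks at the items preceding the chosen one, every frequent itemset is produced exactly once, from its last item backwards.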
For each frequent itemset L, find all the non-empty subsets f ⊂ L such that the rule f → L \ f satisfies the minimum confidence requirement
E.g. for the frequent itemset {A,B,C,D}, the candidate rules are:
ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD,
BD → AC, CD → AB
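The enumeration of candidate rules from a frequent itemset can be sketched with `itertools.combinations`. For an itemset of k items this yields 2^k - 2 rules (14 for {A,B,C,D}). The confidence filter below uses the five-transaction A/B/C/D example from the closed-itemset table, and the threshold 0.8 is an arbitrary value chosen for illustration; the function names are hypothetical.

```python
from itertools import combinations

# Five-transaction example over the items A, B, C, D.
transactions = [set("AB"), set("BCD"), set("ABCD"), set("ABD"), set("ABCD")]

def sigma(itemset):
    """Support count of an itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def candidate_rules(itemset):
    """Yield (f, L minus f) for every non-empty proper subset f of L."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for f in combinations(items, k):
            yield frozenset(f), frozenset(items) - frozenset(f)

def confident_rules(itemset, minconf):
    """Keep the rules whose confidence sigma(L) / sigma(f) reaches minconf."""
    total = sigma(itemset)
    return [(f, rest) for f, rest in candidate_rules(itemset)
            if total / sigma(f) >= minconf]

rules = list(candidate_rules("ABCD"))
print(len(rules))  # 14
for f, rest in confident_rules("ABCD", 0.8):
    print("".join(sorted(f)), "->", "".join(sorted(rest)))
```

Every rule from the same itemset has the same support, so only the confidence needs to be tested at this stage.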
Example: for L = {A,B,C,D}:
c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
For rules generated from the same itemset, confidence is anti-monotone with respect to the number of items on the right-hand side of the rule
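The inequality chain follows from the definition of confidence and the fact that support counts never increase when items are added. A short derivation, using σ for the support count as in the basic definitions:

```latex
c(ABC \to D) = \frac{\sigma(ABCD)}{\sigma(ABC)}, \qquad
c(AB \to CD) = \frac{\sigma(ABCD)}{\sigma(AB)}, \qquad
c(A \to BCD) = \frac{\sigma(ABCD)}{\sigma(A)}

A \subset AB \subset ABC
\;\Rightarrow\; \sigma(A) \ge \sigma(AB) \ge \sigma(ABC)
\;\Rightarrow\; c(ABC \to D) \ge c(AB \to CD) \ge c(A \to BCD)
```

The numerator is the same in all three rules, so moving items from the left-hand side to the right-hand side can only enlarge the denominator.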
[Figure: lattice of rules, with the pruned rules marked]
Example: merging the two rules (CD → AB, BD → AC) produces the candidate rule D → ABC
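The merging step in the example can be sketched as follows. The assumptions are mine: a rule is represented as an (antecedent, consequent) pair of frozensets, two rules are merged only if they come from the same itemset and their consequents differ in exactly one item, and `merge_rules` is a hypothetical helper name.

```python
def merge_rules(rule1, rule2):
    """Merge two rules of the same itemset into one with a larger consequent."""
    (x1, y1), (x2, y2) = rule1, rule2
    itemset = x1 | y1
    if itemset != x2 | y2:
        raise ValueError("both rules must come from the same frequent itemset")
    consequent = y1 | y2
    if len(y1) != len(y2) or len(consequent) != len(y1) + 1:
        raise ValueError("consequents must differ in exactly one item")
    # The antecedent is whatever remains of the itemset.
    return itemset - consequent, consequent

r1 = (frozenset("CD"), frozenset("AB"))
r2 = (frozenset("BD"), frozenset("AC"))
antecedent, consequent = merge_rules(r1, r2)
print("".join(sorted(antecedent)), "->", "".join(sorted(consequent)))  # D -> ABC
```

The merged rule is only a candidate: its confidence still has to be checked against minconf.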