Chapter 3: Fuzzy association rules
3.1. Association rules
Association rules were introduced in 1993, and the problem of mining them has
since attracted a great deal of attention from researchers. Today, mining such
rules remains an important area of data mining. Association rules help us find
relationships among the data items of a database [11]. They are fairly simple
in form yet very effective. The information these rules carry is important and
gives considerable support to the decision-making process.
3.1.1. Concepts
3.1.1.1. Association rules
Itemset
AB DE
BC E
AB
ABC E
AB DE
B D
Support and confidence
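$$\mathrm{Supp}(X) = \frac{|\{T \in D : T \supseteq X\}|}{|D|} \tag{3.1}$$
We have 0 ≤ Supp(X) ≤ 1 for every itemset X.
Definition 3.2: The support of an association rule X → Y is the ratio of the
number of records containing X ∪ Y to the total number of records in D,
denoted Supp(X → Y).
$$\mathrm{Supp}(X \to Y) = \frac{|\{T \in D : T \supseteq X \cup Y\}|}{|D|} \tag{3.2}$$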
(3.3)
Itemsets         Support
B                100% (6/6)
E, BE            83% (5/6)
A, AB, AE, ABE   67% (4/6)
D, BD            50% (3/6)
Rule       Confidence (c)
A → BE     Supp(ABE)/Supp(A) = 100%
B → AE     Supp(ABE)/Supp(B) = 67%
E → AB     Supp(ABE)/Supp(E) = 80%
AB → E     Supp(ABE)/Supp(AB) = 100%
AE → B     Supp(ABE)/Supp(AE) = 100%
BE → A     Supp(ABE)/Supp(BE) = 80%
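The confidence ratios in the table can be checked with a few lines of Python. This is only a minimal sketch: the support values in `supp` are read off the ratios listed above (six transactions in total), and the names `supp`, `confidence` and `percent` are illustrative, not taken from the source.

```python
from fractions import Fraction

# Supports implied by the ratios in the confidence table (6 transactions).
supp = {
    frozenset("B"): Fraction(6, 6),
    frozenset("E"): Fraction(5, 6),
    frozenset("BE"): Fraction(5, 6),
    frozenset("A"): Fraction(4, 6),
    frozenset("AB"): Fraction(4, 6),
    frozenset("AE"): Fraction(4, 6),
    frozenset("ABE"): Fraction(4, 6),
}


def confidence(antecedent: str, consequent: str) -> Fraction:
    """Conf(X -> Y) = Supp(X ∪ Y) / Supp(X)."""
    x = frozenset(antecedent)
    return supp[x | frozenset(consequent)] / supp[x]


def percent(value: Fraction) -> int:
    """Round a ratio to a whole percentage, as the table does."""
    return round(float(value) * 100)


for lhs, rhs in [("A", "BE"), ("B", "AE"), ("E", "AB")]:
    print(f"{lhs} -> {rhs}: {percent(confidence(lhs, rhs))}%")
```

Exact fractions are used so that ratios such as 4/5 are not affected by floating-point rounding before they are converted to percentages.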
3.2. Fuzzy association rules
3.2.1. Fuzzifying the data
Binary association rule mining algorithms can only be applied to relational
databases with binary attributes or to transactional databases; they cannot be
applied directly to databases with numeric and categorical attributes. We must
therefore fuzzify the numeric and categorical attributes in order to convert
them into binary-style attributes.
To fuzzify the data we must determine the membership functions that represent
the linguistic values of the fuzzy variables. This, however, lies outside the
scope and research task of fuzzy set theory. At most, an expert can offer an
opinion on the domains of the fuzzy sets. The design of the membership
functions of the fuzzy sets is therefore left entirely to the designer.
The following illustrates two methods of discretizing students' grades in
mathematics, physics, chemistry, biology, literature, history, geography and
English, together with their conduct and graduation pass level.
3.2.1.1. Fuzzifying the grade attributes
Symbol  Fuzzy set
1       DT_Gioi
2       DT_Kha
3       DT_TB
4       DT_Yeu
5       DT_Kem
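$$\mu_{Kem}(x) = \begin{cases} \dfrac{3 - x}{3}, & 0 \le x \le 3 \\ 0, & \text{otherwise} \end{cases} \tag{3.4}$$
$$\mu_{Yeu}(x) = \begin{cases} \dfrac{x}{3}, & 0 \le x \le 3 \\ \dfrac{5 - x}{2}, & 3 < x \le 5 \\ 0, & \text{otherwise} \end{cases} \tag{3.5}$$
$$\mu_{TB}(x) = \begin{cases} \dfrac{x - 3}{2}, & 3 \le x \le 5 \\ \dfrac{7 - x}{2}, & 5 < x \le 7 \\ 0, & \text{otherwise} \end{cases} \tag{3.6}$$
$$\mu_{Kha}(x) = \begin{cases} \dfrac{x - 5}{2}, & 5 \le x \le 7 \\ \dfrac{10 - x}{3}, & 7 < x \le 10 \\ 0, & \text{otherwise} \end{cases} \tag{3.7}$$
$$\mu_{Gioi}(x) = \begin{cases} \dfrac{x - 7}{3}, & 7 \le x \le 10 \\ 0, & \text{otherwise} \end{cases} \tag{3.8}$$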
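Example 3.4: Fuzzifying the mathematics grade attribute (DT) with formulas
(3.4), (3.5), (3.6), (3.7) and (3.8) gives the following fuzzy values:
Table 3.4. Example of fuzzifying the mathematics grade attribute with triangular membership functions.
TID  DT   DT_Gioi  DT_Kha  DT_TB  DT_Yeu  DT_Kem
1    3.3  0.00     0.00    0.15   0.85    0.00
2    4.3  0.00     0.00    0.65   0.35    0.00
3    3.8  0.00     0.00    0.40   0.60    0.00
4    6.2  0.00     0.60    0.40   0.00    0.00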
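The triangular membership functions (3.4) to (3.8) translate directly into code. The sketch below is one possible transcription, assuming the breakpoints 0, 3, 5, 7 and 10 of those formulas; the function names and the rounding to two decimals are illustrative choices, not part of the source.

```python
def mu_kem(x):
    """Formula (3.4): fuzzy set Kem."""
    return (3 - x) / 3 if 0 <= x <= 3 else 0.0


def mu_yeu(x):
    """Formula (3.5): fuzzy set Yeu."""
    if 0 <= x <= 3:
        return x / 3
    if 3 < x <= 5:
        return (5 - x) / 2
    return 0.0


def mu_tb(x):
    """Formula (3.6): fuzzy set TB."""
    if 3 <= x <= 5:
        return (x - 3) / 2
    if 5 < x <= 7:
        return (7 - x) / 2
    return 0.0


def mu_kha(x):
    """Formula (3.7): fuzzy set Kha."""
    if 5 <= x <= 7:
        return (x - 5) / 2
    if 7 < x <= 10:
        return (10 - x) / 3
    return 0.0


def mu_gioi(x):
    """Formula (3.8): fuzzy set Gioi."""
    return (x - 7) / 3 if 7 <= x <= 10 else 0.0


def fuzzify_grade(x):
    """Map one grade to its five membership degrees, rounded to 2 decimals."""
    functions = {"Gioi": mu_gioi, "Kha": mu_kha, "TB": mu_tb,
                 "Yeu": mu_yeu, "Kem": mu_kem}
    return {name: round(f(x), 2) for name, f in functions.items()}


for grade in (3.3, 4.3, 3.8, 6.2):
    print(grade, fuzzify_grade(grade))
```

With these breakpoints, neighbouring fuzzy sets overlap so that the degrees of any grade between 0 and 10 add up to 1.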
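$$\mu_{Kem}(x) = \begin{cases} 0, & x < 0 \text{ or } x > 3 \\ 1, & 0 \le x \le 2 \\ 3 - x, & 2 < x \le 3 \end{cases} \tag{3.9}$$
$$\mu_{Yeu}(x) = \begin{cases} 0, & x < 2 \text{ or } x > 5 \\ x - 2, & 2 \le x \le 3 \\ 1, & 3 < x \le 4 \\ 5 - x, & 4 < x \le 5 \end{cases} \tag{3.10}$$
$$\mu_{TB}(x) = \begin{cases} 0, & x < 4 \text{ or } x > 7 \\ x - 4, & 4 \le x \le 5 \\ 1, & 5 < x \le 6 \\ 7 - x, & 6 < x \le 7 \end{cases} \tag{3.11}$$
$$\mu_{Kha}(x) = \begin{cases} 0, & x < 6 \text{ or } x > 9 \\ x - 6, & 6 \le x \le 7 \\ 1, & 7 < x \le 8 \\ 9 - x, & 8 < x \le 9 \end{cases} \tag{3.12}$$
$$\mu_{Gioi}(x) = \begin{cases} 0, & x < 8 \text{ or } x > 10 \\ x - 8, & 8 \le x \le 9 \\ 1, & 9 < x \le 10 \end{cases} \tag{3.13}$$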
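Example 3.5: Fuzzifying the mathematics grade attribute (DT) with formulas
(3.9), (3.10), (3.11), (3.12) and (3.13) gives the corresponding fuzzy values:
Table 3.5. Example of fuzzifying the mathematics grade attribute with trapezoidal membership functions.
TID  DT
1    3.3
2    4.3
3    3.8
4    6.2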
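The trapezoidal formulas (3.9) to (3.13) all share one shape, so they can be sketched with a single helper parameterised by four breakpoints. The helper `trapezoid` and the table `GRADE_SETS` are illustrative names; the breakpoints are the ones appearing in the formulas, with the two outer sets treated as shoulders.

```python
def trapezoid(x, a, b, c, d):
    """Membership that rises on [a, b], equals 1 on [b, c], falls on [c, d].

    Setting a == b or c == d gives a left or right shoulder.
    """
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Breakpoints taken from formulas (3.9) to (3.13).
GRADE_SETS = {
    "Kem": (0, 0, 2, 3),
    "Yeu": (2, 3, 4, 5),
    "TB": (4, 5, 6, 7),
    "Kha": (6, 7, 8, 9),
    "Gioi": (8, 9, 10, 10),
}


def fuzzify_grade_trapezoid(x):
    """Membership degrees of one grade, rounded to 2 decimals."""
    return {name: round(trapezoid(x, *pts), 2)
            for name, pts in GRADE_SETS.items()}


for grade in (3.3, 4.3, 3.8, 6.2):
    print(grade, fuzzify_grade_trapezoid(grade))
```

Compared with the triangular shape, each set keeps full membership over a one-point-wide plateau, so a grade such as 3.3 belongs to a single set with degree 1.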
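3.2.1.2. Fuzzifying the conduct attribute
Each conduct fuzzy set S in {Tot, Kha, TB, Yeu} is crisp: a record belongs to
it with degree 1 when its conduct value x equals the code of S (for example
'TB'), and with degree 0 otherwise:
$$\mu_{S}(x) = \begin{cases} 1, & \text{if } x = \text{code of } S \\ 0, & \text{if } x \ne \text{code of } S \end{cases} \qquad S = Tot\ (3.14),\ Kha\ (3.15),\ TB\ (3.16),\ Yeu\ (3.17)$$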
HK
TB
4
T
3.2.1.3. Fuzzifying the graduation pass level attribute
$$MDD = \frac{DiemTN \times 100}{60}\ (\%) \tag{3.18}$$
Figure 3.3. Triangular membership functions of the fuzzy sets Ratcao, Cao, TB, Thap and Ratthap (legend: MDD_Ratcao, MDD_Cao, MDD_TB, MDD_Thap, MDD_Ratthap).
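$$\mu_{Ratthap}(x) = \begin{cases} \dfrac{30 - x}{30}, & 0 \le x \le 30 \\ 0, & \text{otherwise} \end{cases} \tag{3.19}$$
$$\mu_{Thap}(x) = \begin{cases} \dfrac{x}{30}, & 0 \le x \le 30 \\ \dfrac{50 - x}{20}, & 30 < x \le 50 \\ 0, & \text{otherwise} \end{cases} \tag{3.20}$$
$$\mu_{TB}(x) = \begin{cases} \dfrac{x - 30}{20}, & 30 \le x \le 50 \\ \dfrac{70 - x}{20}, & 50 < x \le 70 \\ 0, & \text{otherwise} \end{cases} \tag{3.21}$$
$$\mu_{Cao}(x) = \begin{cases} \dfrac{x - 50}{20}, & 50 \le x \le 70 \\ \dfrac{100 - x}{30}, & 70 < x \le 100 \\ 0, & \text{otherwise} \end{cases} \tag{3.22}$$
$$\mu_{Ratcao}(x) = \begin{cases} \dfrac{x - 70}{30}, & 70 \le x \le 100 \\ 0, & \text{otherwise} \end{cases} \tag{3.23}$$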
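Formula (3.18) and the triangular sets (3.19) to (3.23) can be combined into one short pipeline. The following is a sketch under stated assumptions: `pass_level`, `triangle`, `MDD_SETS` and `fuzzify_mdd` are illustrative names, the breakpoints 0, 30, 50, 70 and 100 are the ones in the formulas, and the input score 31.5 is a hypothetical value chosen only because it yields a pass level of 52.5.

```python
def pass_level(diem_tn, max_total=60):
    """Formula (3.18): graduation score expressed as a percentage."""
    return diem_tn * 100 / max_total


def triangle(x, left, peak, right):
    """Triangular membership; left == peak or peak == right gives a shoulder."""
    if x < left or x > right:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# (left, peak, right) for formulas (3.19) to (3.23).
MDD_SETS = {
    "Ratthap": (0, 0, 30),
    "Thap": (0, 30, 50),
    "TB": (30, 50, 70),
    "Cao": (50, 70, 100),
    "Ratcao": (70, 100, 100),
}


def fuzzify_mdd(x):
    """Membership degrees of one pass level in the five MDD fuzzy sets."""
    return {name: triangle(x, *pts) for name, pts in MDD_SETS.items()}


level = pass_level(31.5)          # hypothetical score, gives 52.5
print(level, fuzzify_mdd(level))  # TB = 0.875, Cao = 0.125
```

Rounded to two decimals, the degrees for 52.5 are 0.88 for TB and 0.13 for Cao.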
MDD
52.5
56.7
46.7
47.5
Figure 3.4. Trapezoidal membership functions of the fuzzy sets Ratcao, Cao, TB, Thap and Ratthap (legend: MDD_Ratcao, MDD_Cao, MDD_TB, MDD_Thap, MDD_Ratthap).
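$$\mu_{Ratcao}(x) = \begin{cases} 0, & \text{if } x < 80 \text{ or } x > 100 \\ \dfrac{x - 80}{10}, & \text{if } 80 \le x \le 90 \\ 1, & \text{if } x > 90 \end{cases} \tag{3.24}$$
$$\mu_{Cao}(x) = \begin{cases} 0, & \text{if } x < 60 \text{ or } x > 90 \\ \dfrac{x - 60}{10}, & \text{if } 60 \le x \le 70 \\ 1, & \text{if } 70 < x \le 80 \\ \dfrac{90 - x}{10}, & \text{if } 80 < x \le 90 \end{cases} \tag{3.25}$$
$$\mu_{TB}(x) = \begin{cases} 0, & \text{if } x < 40 \text{ or } x > 70 \\ \dfrac{x - 40}{10}, & \text{if } 40 \le x \le 50 \\ 1, & \text{if } 50 < x \le 60 \\ \dfrac{70 - x}{10}, & \text{if } 60 < x \le 70 \end{cases} \tag{3.26}$$
$$\mu_{Thap}(x) = \begin{cases} 0, & \text{if } x < 20 \text{ or } x > 50 \\ \dfrac{x - 20}{10}, & \text{if } 20 \le x \le 30 \\ 1, & \text{if } 30 < x \le 40 \\ \dfrac{50 - x}{10}, & \text{if } 40 < x \le 50 \end{cases} \tag{3.27}$$
$$\mu_{Ratthap}(x) = \begin{cases} 1, & \text{if } 0 \le x \le 20 \\ \dfrac{30 - x}{10}, & \text{if } 20 < x \le 30 \\ 0, & \text{if } x > 30 \text{ or } x < 0 \end{cases} \tag{3.28}$$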
MDD
52.5
56.7
46.7
47.5
Example 3.9:
FDT = {f1DT_Gioi, f2DT_Kha, f3DT_TB, f4DT_Yeu, f5DT_Kem} (with k = 5)
FDL = {f1DL_Gioi, f2DL_Kha, f3DL_TB, f4DL_Yeu, f5DL_Kem} (with k = 5)
FDH = {f1DH_Gioi, f2DH_Kha, f3DH_TB, f4DH_Yeu, f5DH_Kem} (with k = 5)
FDS = {f1DS_Gioi, f2DS_Kha, f3DS_TB, f4DS_Yeu, f5DS_Kem} (with k = 5)
FDV = {f1DV_Gioi, f2DV_Kha, f3DV_TB, f4DV_Yeu, f5DV_Kem} (with k = 5)
FDSU = {f1DSU_Gioi, f2DSU_Kha, f3DSU_TB, f4DSU_Yeu, f5DSU_Kem} (with k = 5)
FDD = {f1DD_Gioi, f2DD_Kha, f3DD_TB, f4DD_Yeu, f5DD_Kem} (with k = 5)
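$$fs(\langle X, A \rangle) = \frac{1}{|T|} \sum_{v=1}^{m} \left\{ \alpha_{x_1}(t_v[x_1]) \otimes \alpha_{x_2}(t_v[x_2]) \otimes \dots \otimes \alpha_{x_p}(t_v[x_p]) \right\} \tag{3.29}$$
Where:
X = {x1, x2, ..., xp}: the set of attributes; xi is the i-th attribute of X.
T = {t1, t2, ..., tm}: the set of records (transactions); tv is the v-th record of T.
$$\alpha_{x_i}(t_v[x_i]) = \begin{cases} \mu_{x_i}(t_v[x_i]), & \text{if } \mu_{x_i}(t_v[x_i]) \ge w_{x_i} \\ 0, & \text{otherwise} \end{cases} \tag{3.30}$$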
like the logical AND operation in classical logic. There are several ways to choose the T-norm operator, for example:
Minimum: a ⊗ b = min(a, b)
Algebraic product: a ⊗ b = a·b
Bounded product: a ⊗ b = max(0, a + b − 1)
Drastic product: a ⊗ b = a if b = 1; b if a = 1; 0 if a, b < 1
Yager intersection: a ⊗ b = 1 − min[1, ((1 − a)^w + (1 − b)^w)^(1/w)] (w > 0)
In practice, the minimum and the algebraic product are the two most suitable
operators, because they are convenient to compute and reflect the close
relationship among the attributes in the frequent itemsets.
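The four basic T-norm operators listed above are easy to compare side by side. The sketch below implements them as plain functions on degrees in [0, 1]; the function names are illustrative.

```python
def t_min(a, b):
    """Minimum T-norm."""
    return min(a, b)


def t_product(a, b):
    """Algebraic product T-norm."""
    return a * b


def t_bounded(a, b):
    """Bounded product T-norm."""
    return max(0.0, a + b - 1)


def t_drastic(a, b):
    """Drastic product T-norm."""
    if b == 1:
        return a
    if a == 1:
        return b
    return 0.0


for name, op in [("min", t_min), ("product", t_product),
                 ("bounded", t_bounded), ("drastic", t_drastic)]:
    print(name, op(0.7, 0.6), op(1.0, 0.4))
```

All four agree whenever one argument is 1 and return the other argument unchanged, which mirrors the behaviour of logical AND on crisp values. For any other pair of degrees, the minimum gives the largest result and the drastic product the smallest.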
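When the minimum is chosen as the T-norm operator, formula (3.29) becomes:
$$fs(\langle X, A \rangle) = \frac{1}{|T|} \sum_{v=1}^{m} \min\left\{ \alpha_{x_u}(t_v[x_u]) \mid x_u \in X \right\} \tag{3.31}$$
When the algebraic product is chosen as the T-norm operator, formula (3.29) becomes:
$$fs(\langle X, A \rangle) = \frac{1}{|T|} \sum_{v=1}^{m} \prod_{x_u \in X} \left\{ \alpha_{x_u}(t_v[x_u]) \right\} \tag{3.32}$$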
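The fuzzy support with thresholding can be sketched as follows. The two columns are the membership degrees of the fuzzy attributes 8 (DL_TB) and 13 (DH_TB) from Table 3.12. The membership threshold of 0.5 is an assumption: the text does not state the value used in the running example, but 0.5 reproduces the supports reported there. The names `alpha` and `fuzzy_support` are illustrative.

```python
from math import prod

# Membership degrees of the ten records in Table 3.12.
DL_TB = [0.85, 0.50, 0.45, 0.70, 0.00, 0.70, 0.95, 0.95, 0.30, 0.80]  # symbol 8
DH_TB = [0.85, 0.50, 0.50, 0.75, 0.00, 0.85, 1.00, 0.75, 0.40, 0.40]  # symbol 13


def alpha(mu, w):
    """Formula (3.30): keep a degree only if it reaches the threshold w."""
    return mu if mu >= w else 0.0


def fuzzy_support(columns, w, tnorm=min):
    """Formulas (3.31)/(3.32): average over records of the combined degrees.

    `columns` holds one list of degrees per fuzzy attribute; `tnorm` takes
    the list of thresholded degrees of one record (min or math.prod).
    """
    n_records = len(columns[0])
    total = 0.0
    for row in zip(*columns):
        total += tnorm([alpha(mu, w) for mu in row])
    return total / n_records


W = 0.5  # assumed membership threshold
print(round(fuzzy_support([DL_TB], W) * 100, 1))              # 54.5
print(round(fuzzy_support([DL_TB, DH_TB], W) * 100, 1))       # 44.5
print(round(fuzzy_support([DL_TB, DH_TB], W, prod) * 100, 2)) # 37.55
```

The first two results match the supports given for item 8 and for the pair {8, 13} in the running example, which suggests that the example uses the minimum operator. The algebraic product yields a lower support for the same pair.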
$$fs(\langle X, A \rangle) \ge fminsup \tag{3.33}$$
(3.34)
(3.35)
(3.36)
Where:
X' is a non-empty subset of X.
X \ X' is the difference of the two sets X and X'.
fc is the confidence of the rule, satisfying fc ≥ fminconf (specified by the user).
A' is a non-empty subset of A and is the set of fuzzy sets corresponding to the attributes in X'.
A \ A' is the difference of the two sets A and A'.
Inputs of the algorithm: the database D with attribute set I and records T,
the membership threshold wf, the minimum support fminsup, the minimum
confidence fminconf, and the T-norm operator (⊗).
Outputs of the algorithm: the set of all confident fuzzy association rules.
Symbol    Meaning
DF
IF
TF
fminsup   Minimum support.
fminconf  Minimum confidence.
Ck
Lk
FR
{
Ck = To_L_k(Lk-1);
Lk = Tnh_Support_K(Ck, DF, fminsup, wf);
FRk = Tm_lut(L, Lk, fminconf);
L = L ∪ Lk;
FR = FR ∪ FRk;
k = k +1;
}
End
The algorithm uses the following subroutines:
Subroutine M_ha_d_liu (D, I, T): this function converts the attributes of the
original database D into fuzzy attributes for the database DF. DF consists of
the attributes of D with fuzzy sets attached, and the attribute values of the
records T in D are mapped to values in the interval [0, 1] through the
membership functions of the fuzzy sets corresponding to the attributes.
Example 3.10: let the database D be as follows:
Table 3.10. Information on 10 students.
TID  DT   DL   DH   DS   DV   DSU  DD   DAV  HK  MDD
1    3.3  4.7  4.7  6.4  5.5  5.1  6.1  3.5  TB  52.5
2    4.3  6.0  6.0  7.9  5.8  6.3  7.2  4.6  T   56.7
3    3.8  3.9  4.0  3.8  3.6  3.5  4.9  4.1  K   46.7
4    6.2  5.6  5.5  5.5  4.7  5.2  5.8  4.3  T   47.5
5    5.9  7.8  7.6  8.9  6.5  7.7  8.5  4.4  T   60.8
6    3.4  5.6  5.3  7.1  5.2  5.1  5.6  2.7  K   40.0
7    4.1  4.9  5.0  6.7  5.0  5.1  6.2  4.0  K   51.7
8    4.7  5.1  5.5  6.3  5.2  5.4  6.4  4.5  K   60.0
9    3.6  6.4  6.2  8.4  5.3  6.0  7.9  2.6  TB  67.5
10   5.2  4.6  6.2  7.2  6.8  5.8  7.0  5.5  T   45.0
IF              Symbol  IF                   Symbol
[DT, DT_Gioi]   1       [DSU, DSU_Gioi]      26
[DT, DT_Kha]    2       [DSU, DSU_Kha]       27
[DT, DT_TB]     3       [DSU, DSU_TB]        28
[DT, DT_Yeu]    4       [DSU, DSU_Yeu]       29
[DT, DT_Kem]    5       [DSU, DSU_Kem]       30
[DL, DL_Gioi]   6       [DD, DD_Gioi]        31
[DL, DL_Kha]    7       [DD, DD_Kha]         32
[DL, DL_TB]     8       [DD, DD_TB]          33
[DL, DL_Yeu]    9       [DD, DD_Yeu]         34
[DL, DL_Kem]    10      [DD, DD_Kem]         35
[DH, DH_Gioi]   11      [DAV, DAV_Gioi]      36
[DH, DH_Kha]    12      [DAV, DAV_Kha]       37
[DH, DH_TB]     13      [DAV, DAV_TB]        38
[DH, DH_Yeu]    14      [DAV, DAV_Yeu]       39
[DH, DH_Kem]    15      [DAV, DAV_Kem]       40
[DS, DS_Gioi]   16      [HK, HK_Tot]         41
[DS, DS_Kha]    17      [HK, HK_Kha]         42
[DS, DS_TB]     18      [HK, HK_TB]          43
[DS, DS_Yeu]    19      [HK, HK_Yeu]         44
[DS, DS_Kem]    20      [MDD, MDD_Ratcao]    45
[DV, DV_Gioi]   21      [MDD, MDD_Cao]       46
[DV, DV_Kha]    22      [MDD, MDD_TB]        47
[DV, DV_TB]     23      [MDD, MDD_Thap]      48
[DV, DV_Yeu]    24      [MDD, MDD_Ratthap]   49
[DV, DV_Kem]    25
Using the triangular fuzzification method on the data of Table 3.10, we
obtain the following result:
Table 3.12. Fuzzified attribute values.
DT   1     2     3     4     5     DL   6     7     8     9     10
3.3  0.00  0.00  0.15  0.85  0.00  4.7  0.00  0.00  0.85  0.15  0.00
4.3  0.00  0.00  0.65  0.35  0.00  6.0  0.00  0.50  0.50  0.00  0.00
3.8  0.00  0.00  0.40  0.60  0.00  3.9  0.00  0.00  0.45  0.55  0.00
6.2  0.00  0.60  0.40  0.00  0.00  5.6  0.00  0.30  0.70  0.00  0.00
5.9  0.00  0.45  0.55  0.00  0.00  7.8  0.27  0.73  0.00  0.00  0.00
3.4  0.00  0.00  0.20  0.80  0.00  5.6  0.00  0.30  0.70  0.00  0.00
4.1  0.00  0.00  0.55  0.45  0.00  4.9  0.00  0.00  0.95  0.05  0.00
4.7  0.00  0.00  0.85  0.15  0.00  5.1  0.00  0.05  0.95  0.00  0.00
3.6  0.00  0.00  0.30  0.70  0.00  6.4  0.00  0.70  0.30  0.00  0.00
5.2  0.00  0.10  0.90  0.00  0.00  4.6  0.00  0.00  0.80  0.20  0.00

DH   11    12    13    14    15    DS   16    17    18    19    20
4.7  0.00  0.00  0.85  0.15  0.00  6.4  0.00  0.70  0.30  0.00  0.00
6.0  0.00  0.50  0.50  0.00  0.00  7.9  0.30  0.70  0.00  0.00  0.00
4.0  0.00  0.00  0.50  0.50  0.00  3.8  0.00  0.00  0.40  0.60  0.00
5.5  0.00  0.25  0.75  0.00  0.00  5.5  0.00  0.25  0.75  0.00  0.00
7.6  0.20  0.80  0.00  0.00  0.00  8.9  0.63  0.37  0.00  0.00  0.00
5.3  0.00  0.15  0.85  0.00  0.00  7.1  0.03  0.97  0.00  0.00  0.00
5.0  0.00  0.00  1.00  0.00  0.00  6.7  0.00  0.85  0.15  0.00  0.00
5.5  0.00  0.25  0.75  0.00  0.00  6.3  0.00  0.65  0.35  0.00  0.00
6.2  0.00  0.60  0.40  0.00  0.00  8.4  0.47  0.53  0.00  0.00  0.00
6.2  0.00  0.60  0.40  0.00  0.00  7.2  0.07  0.93  0.00  0.00  0.00
DV   21    22    23    24    25    DSU  26    27    28    29    30
5.5  0.00  0.25  0.75  0.00  0.00  5.1  0.00  0.05  0.95  0.00  0.00
5.8  0.00  0.40  0.60  0.00  0.00  6.3  0.00  0.65  0.35  0.00  0.00
3.6  0.00  0.00  0.30  0.70  0.00  3.5  0.00  0.00  0.25  0.75  0.00
4.7  0.00  0.00  0.85  0.15  0.00  5.2  0.00  0.10  0.90  0.00  0.00
6.5  0.00  0.75  0.25  0.00  0.00  7.7  0.23  0.77  0.00  0.00  0.00
5.2  0.00  0.10  0.90  0.00  0.00  5.1  0.00  0.05  0.95  0.00  0.00
5.0  0.00  0.00  1.00  0.00  0.00  5.1  0.00  0.05  0.95  0.00  0.00
5.2  0.00  0.10  0.90  0.00  0.00  5.4  0.00  0.20  0.80  0.00  0.00
5.3  0.00  0.15  0.85  0.00  0.00  6.0  0.00  0.50  0.50  0.00  0.00
6.8  0.00  0.90  0.10  0.00  0.00  5.8  0.00  0.40  0.60  0.00  0.00

DD   31    32    33    34    35    DAV  36    37    38    39    40
6.1  0.00  0.55  0.45  0.00  0.00  3.5  0.00  0.00  0.25  0.75  0.00
7.2  0.07  0.93  0.00  0.00  0.00  4.6  0.00  0.00  0.80  0.20  0.00
4.9  0.00  0.00  0.95  0.05  0.00  4.1  0.00  0.00  0.55  0.45  0.00
5.8  0.00  0.40  0.60  0.00  0.00  4.3  0.00  0.00  0.65  0.35  0.00
8.5  0.50  0.50  0.00  0.00  0.00  4.4  0.00  0.00  0.70  0.30  0.00
5.6  0.00  0.30  0.70  0.00  0.00  2.7  0.00  0.00  0.00  0.90  0.10
6.2  0.00  0.60  0.40  0.00  0.00  4.0  0.00  0.00  0.50  0.50  0.00
6.4  0.00  0.70  0.30  0.00  0.00  4.5  0.00  0.00  0.75  0.25  0.00
7.9  0.30  0.70  0.00  0.00  0.00  2.6  0.00  0.00  0.00  0.87  0.13
7.0  0.00  1.00  0.00  0.00  0.00  5.5  0.00  0.25  0.75  0.00  0.00
HK   41    42    43    44    MDD   45    46    47    48    49
TB   0.00  0.00  1.00  0.00  52.5  0.00  0.13  0.88  0.00  0.00
T    1.00  0.00  0.00  0.00  56.7  0.00  0.34  0.67  0.00  0.00
K    0.00  1.00  0.00  0.00  46.7  0.00  0.00  0.84  0.17  0.00
T    1.00  0.00  0.00  0.00  47.5  0.00  0.00  0.88  0.13  0.00
T    1.00  0.00  0.00  0.00  60.8  0.00  0.54  0.46  0.00  0.00
K    0.00  1.00  0.00  0.00  40.0  0.00  0.00  0.50  0.50  0.00
K    0.00  1.00  0.00  0.00  51.7  0.00  0.09  0.92  0.00  0.00
K    0.00  1.00  0.00  0.00  60.0  0.00  0.50  0.50  0.00  0.00
TB   0.00  0.00  1.00  0.00  67.5  0.00  0.88  0.13  0.00  0.00
T    1.00  0.00  0.00  0.00  45.0  0.00  0.00  0.75  0.25  0.00
Item  Support
8     54.5%
13    52.0%
17    53.3%
23    58.5%
28    56.5%
32    49.8%
38    47.0%
47    59.2%
EndIf
EndFor
Return Ck
End
In the algorithm above, i.itemk-1.fieldroot and j.itemk-1.fieldroot are the
root (original) attributes; i.itemk and j.itemk are the identifiers of the k-th fuzzy attribute.
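Only the tail of the candidate-generation pseudocode survives here, so the join rule below is an assumption rather than the author's exact procedure. It extends an itemset i with the last item of another itemset j when that item is larger than the last item of i and comes from a different root attribute. Under this assumption the sketch reproduces the 2-item and 3-item candidates of the running example. The names `root` and `generate_candidates` are illustrative, and `root` relies on the symbol numbering 1 to 49 used for the fuzzy attributes.

```python
def root(symbol):
    """Index of the root attribute of a fuzzy attribute symbol (1..49)."""
    if symbol <= 40:
        return (symbol - 1) // 5   # eight grade attributes, five sets each
    return 8 if symbol <= 44 else 9  # HK has four fuzzy sets, MDD has five


def generate_candidates(prev_level):
    """Build k-item candidates from frequent (k-1)-itemsets (sorted tuples)."""
    candidates = set()
    for i in prev_level:
        roots_i = {root(s) for s in i}
        for j in prev_level:
            last = j[-1]
            # Never combine two fuzzy sets of the same root attribute.
            if i[-1] < last and root(last) not in roots_i:
                candidates.add(i + (last,))
    return sorted(candidates)


L1 = [(8,), (13,), (17,), (23,), (28,), (32,), (38,), (47,)]
L2 = [(8, 13), (8, 23), (8, 28), (8, 47), (13, 23), (13, 47), (23, 28)]

C2 = generate_candidates(L1)
C3 = generate_candidates(L2)
print(len(C2), len(C3))
print(C3)
```

The root check is what distinguishes this step from the classical Apriori join: two fuzzy sets of the same original attribute, such as DT_Kha and DT_TB, are never placed in one itemset.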
Example 3.12: with L1 as follows:
Item1  Item2
8      13
8      17
8      23
8      28
8      32
8      38
8      47
13     17
13     23
13     28
13     32
13     38
13     47
17     23
17     28
17     32
17     38
17     47
23     28
23     32
23     38
23     47
28     32
28     38
28     47
32     38
32     47
38     47
Item1  Item2  Support
8      13     44.5%
8      17     42.0%
8      23     45.0%
8      28     46.0%
8      32     31.5%
8      38     31.5%
8      47     47.2%
13     17     35.5%
13     23     46.0%
13     28     41.5%
13     32     23.5%
13     38     29.0%
13     47     45.2%
17     23     42.3%
17     28     42.5%
17     32     39.7%
17     38     26.0%
17     47     39.7%
23     28     47.5%
23     32     31.5%
23     38     25.0%
23     47     41.2%
28     32     29.5%
28     38     25.0%
28     47     42.7%
32     38     32.5%
32     47     30.7%
38     47     28.7%
Table 3.17. L2 - The frequent 2-attribute itemsets with support >= fminsup.
Item1  Item2  Support
8      13     44.5%
8      23     45.0%
8      28     46.0%
8      47     47.2%
13     23     46.0%
13     47     45.2%
23     28     47.5%
Thus, L2 = {{8,13},{8,23},{8,28},{8,47},{13,23},{13,47},{23,28}}
With k = 3, the set C3 of candidate 3-attribute itemsets is generated from L2, with the corresponding supports:
Table 3.18. The 3-attribute itemsets and their supports.
item1  item2  item3  Support
8      13     23     43.5%
8      13     28     39.5%
8      13     47     39.7%
8      23     28     39.0%
8      23     47     38.7%
8      28     47     40.7%
13     23     28     40.5%
13     23     47     39.2%
23     28     47     35.2%
Subroutine FRk = Tm_lut(L, Lk, fminconf): this subroutine generates the
confident fuzzy association rules from the frequent itemsets Lk.
Algorithm for FRk = Tm_lut(L, Lk, fminconf):
Begin
FRk = ∅;
For each X ∈ Lk do
For each Y ⊂ X and X ≠ Y and Y ≠ ∅ do
r = X \ Y → Y
If fc(r) ≥ fminconf then
FRk = FRk ∪ {r}
EndIf
EndFor
EndFor
End
Example 3.14: executing the subroutine FRk = Tm_lut(L, Lk, fminconf) with k = 2 and
a minimum confidence of fminconf = 80% gives the following fuzzy rules FRk generated from Lk:
Table 3.19. The fuzzy rules generated from the database in Table 3.12.
No.  Rule               Support  Confidence
1    DL_TB → DH_TB      44.5%    81.7%
2    DH_TB → DL_TB      44.5%    85.6%
3    DL_TB → DV_TB      45.0%    82.6%
4    DL_TB → DSU_TB     46.0%    84.4%
5    DSU_TB → DL_TB     46.0%    81.4%
6    DL_TB → MDD_TB     47.2%    86.5%
7    MDD_TB → DL_TB     47.2%    80.0%
8    DH_TB → DV_TB      46.0%    88.5%
9    DH_TB → MDD_TB     45.2%    86.8%
10   DV_TB → DSU_TB     47.5%    81.2%
11   DSU_TB → DV_TB     47.5%    84.1%
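The rule-generation step can be sketched as below. The confidence of a rule X \ Y → Y is taken as the support of X divided by the support of X \ Y, which is consistent with the ratios in Table 3.19. The supports are the rounded percentages reported for the 1-item and 2-item frequent itemsets of the running example, and the names `SUPPORT` and `generate_rules` are illustrative.

```python
from itertools import combinations

# Rounded supports (in percent) of the frequent itemsets used below.
SUPPORT = {
    frozenset({8}): 54.5, frozenset({13}): 52.0, frozenset({23}): 58.5,
    frozenset({28}): 56.5, frozenset({47}): 59.2,
    frozenset({8, 13}): 44.5, frozenset({8, 23}): 45.0,
    frozenset({8, 28}): 46.0, frozenset({8, 47}): 47.2,
    frozenset({13, 23}): 46.0, frozenset({13, 47}): 45.2,
    frozenset({23, 28}): 47.5,
}


def generate_rules(itemsets, support, fminconf):
    """Return {(antecedent, consequent): confidence} for confident rules."""
    rules = {}
    for x in itemsets:
        for size in range(1, len(x)):
            for y in combinations(sorted(x), size):
                consequent = frozenset(y)
                antecedent = x - consequent
                conf = support[x] / support[antecedent]
                if conf >= fminconf:
                    key = (tuple(sorted(antecedent)), tuple(sorted(consequent)))
                    rules[key] = conf
    return rules


L2 = [s for s in SUPPORT if len(s) == 2]
rules = generate_rules(L2, SUPPORT, 0.80)
for (lhs, rhs), conf in sorted(rules.items()):
    print(lhs, "->", rhs, f"{conf * 100:.1f}%")
```

Because the sketch works from rounded supports, borderline rules may differ from the table: the rule 47 → 8 (MDD_TB → DL_TB) evaluates to about 79.7% here and falls just under the 80% threshold, whereas Table 3.19 lists it at 80.0%.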