You are on page 1of 38

1

1
KHAI THC
D LIU &
NG DNG
(DATA MINING)
GV : NGUYN HONG T ANH
2
BI 4 PHN 1
PHN LP D
LIU
2
3
NI DUNG
1. Gii thiu
2. Phng php da trn cy
quyt nh
3. Phng php da trn lut
4
GII THIU
1. Phn lp :
Cho tp cc mu phn lp trc, xy
dng m hnh cho tng lp
Mc ch : Gn cc mu mi vo cc lp
vi chnh xc cao nht c th.
Cho CSDL D={t
1
,t
2
,,t
n
} v tp cc lp
C={C
1
,,C
m
}, phn lp l bi ton xc
nh nh x f : DgC sao cho mi t
i
c
gn vo mt lp.
3
5
GII THIU
V d Phn lp :
Phn lp khch hng (trong
ngn hng) cho vay hay
khng
D on t bo khi u l lnh
tnh hay c tnh
Phn loi giao dch th tn dng
l hp php hay gian ln
Phn loi tin tc thuc lnh vc
ti chnh, thi tit, gii tr, th
thao,
D on khi no sng c l
Chun on y khoa
6
GII THIU
2. Qui trnh phn lp :
Bc 1 : Xy dng m hnh
M t tp cc lp xc nh trc
Tp hun luyn : cc mu / b dnh
cho xy dng m hnh
Mi mu/ b thuc v mt lp
nh ngha trc
Tm lut phn lp, cy quyt nh
hoc cng thc ton m t lp
4
7
GII THIU
2. Qui trnh phn lp (tt) :
Bc 2 : S dng m hnh
Phn lp cc i tng cha bit
Xc nh chnh xc ca m hnh, s
dng tp DL kim tra c lp
chnh xc chp nhn c -> p
dng m hnh phn lp cc mu/b
cha xc nh c nhn lp
8
V d : XD m hnh
Training
Data
NAME RANK YEARS TENURED
Mike Assistant Prof 3 no
Mary Assistant Prof 7 yes
Bill Professor 2 yes
Jim Associate Prof 7 yes
Dave Assistant Prof 6 no
Anne Associate Prof 3 no
Classification
Algorithms
IF rank = professor
OR years > 6
THEN tenured = yes
Classifier
(Model)
5
9
V d : S dng m hnh
Classifier
Testing
Data
NAME RANK YEARS TENURED
Tom Assistant Prof 2 no
Merlisa Associate Prof 7 no
George Professor 5 yes
Joseph Assistant Prof 7 yes
Unseen Data
(Jeff, Professor, 4)
Tenured?
10
GII THIU
3. Cc k thut phn lp :
Phng php da trn cy quyt
nh
Phng php da trn lut
Phng php Nave Bayes
Phng php da trn th hin
Mng Nron
SVM (support vector machine)
Tp th
6
11
NI DUNG
1. Gii thiu
2. Phng php da
trn cy quyt nh
3. Phng php da trn lut
12
CY QUYT NH
1. nh ngha
2. Xy dng cy quyt nh
3. Thut ton xy dng cy quyt nh
4. Cch phn chia mu
o la chn thuc tnh
5. Vn qu ph hp vi DL
6. u im
7
13
CY QUYT NH
1. nh ngha
Cy quyt nh l mt
cu trc phn cp ca
cc nt v cc nhnh
3 loi nt trn cy:
Nt gc
Nt ni b : mang tn
thuc tnh ca CSDL
Nt l : mang tn lp
C
i
Nhnh : mang gi tr
ca thuc tnh
Nt gc Nt ni b
Nt l
14
CY QUYT NH
2. Xy dng cy quyt nh
Gm 2 bc :
Bc 1 : Thit lp cy quyt nh
Bt u t gc
Kim tra cc gi tr ca thuc tnh v phn
chia cc mu qui

Bc 2 : Ta bt cy
Xc nh v loi b bt cc nhnh khng n
nh hoc c bit
8
15
V D 1: D liu hun luyn
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
3140 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
3140 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
3140 medium no excellent yes
3140 high yes fair yes
>40 medium no excellent no
16
V D 1 : CY QUYT NH
age?
student? credit rating?
no yes fair excellent
<=30
>40
no no yes yes
yes
31..40
9
17
CY QUYT NH
3. Thut ton xy dng cy quyt
nh
Hunts Algorithm
CART
ID3, C4.5
SLIQ, SPRINT
18
CY QUYT NH
3. Thut ton xy dng cy quyt nh
tng chnh :
Phng php tham lam (greedy)
Phn chia tp mu da trn thuc tnh cho
kt qu ti u ha tiu chun
Vn :
Xc nh cch phn chia cc mu
Da trn o s ng nht ca d liu
iu kin dng
10
19
CY QUYT NH
3. Thut ton xy dng cy quyt nh
(tt)
iu kin dng :
Tt c cc mu ri vo mt nt thuc v
cng mt lp
Khng cn thuc tnh no c th dng
phn chia mu na
Khng cn li mu no ti nt
20
CY QUYT NH
4. Cch phn chia cc mu
Tiu chun phn chia : to ra cc nhm sao cho
mt lp chim u th trong tng nhm
Thuc tnh c chn l thuc tnh cho o tt
nht, c li nht cho qu trnh phn lp
o nh gi cht lng phn chia l
o s ng nht
Entropy (Information Gain)
Information Gain Ratio
Gini Index
11
21
CY QUYT NH
li thng tin (Information gain) : ID3 /
C4.5
Chn thuc tnh c li thng tin cao nht
Gi s :
D : tp hun luyn
C
i,D
: tp cc mu ca D thuc lp C
i
vi i = {1, , m}
|C
i, D
|, |D| : lc lng ca tp C
i,D
v D tng ng
p
i
l xc sut mt mu bt k ca D thuc v lp C
i
Thng tin k vng phn lp mt mu trong D l :
) ( log ) (
2
1
i
m
i
i
p p D Info

=
=
D
C
p
D i
i
,
=
22
V D 1: D liu hun luyn
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
3140 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
3140 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
3140 medium no excellent yes
3140 high yes fair yes
>40 medium no excellent no
12
23
CY QUYT NH
li thng tin (Information gain) :
Trong VD1 : 14 mu tin, trong c 9 mua my
tnh
|D| = 14; m = 2; C
1
=Mua ; C
2
=Khng mua
|C
1, D
|= 9, |C
2, D
|=5
Thng tin k vng phn lp mt mu trong D
l :
940 . 0
14
5
log
14
5
14
9
log
14
9
) 5 , 9 I(
2 2
= = = Info(D)
24
CY QUYT NH
Thuc tnh A c cc gi tr :{a
1
, a
2
, ,a
v
}
Dng thuc tnh A phn chia tp hun luyn D
thnh v tp con {D
1
, D
2
, , D
v
}
Thng tin cn thit phn chia D theo thuc tnh A :
) ( ) (
1
j
v
j
j
A
D I
D
D
D Info

=
=
li thng tin (information gain) da trn phn
chia theo thuc tnh A :
) ( ) ( D Info D Info Gain(A)
A
=
13
25
V D 1 : INFORMATION GAIN
K hiu :
Lp P: buys_computer = Yes
Lp N: buys_computer = No
Info(D) = I(9, 5) =0.940
Tnh li thng tin cho thuc tnh age ?
age p
j
n
j
I(p
j
, n
j
)
<=30 2 3 0.971
3140 4 0 0
>40 3 2 0.971
971 . 0
5
3
log
5
3
5
2
log
5
2
) 3 , 2 (
2 2
= = I
0
4
0
log
4
0
4
4
log
4
4
) 0 , 4 (
2 2
= = I
971 . 0
5
2
log
5
2
5
3
log
5
3
) 2 , 3 (
2 2
= = I
26
V D 1 : INFORMATION GAIN
Tnh li thng tin cho thuc tnh age ?
Khi :
694 . 0 ) 2 , 3 (
14
5
) 0 , 4 (
14
4
) 3 , 2 (
14
5
) ( = + + = I I I D Info
age
246 . 0 ) ( ) ( ) ( = = D Info D Info age Gain
age
Suy ra :
14
27
BI TP
Thi gian: 10
Cho tp DL nh trong v d 1
K hiu:
Lp P: buys_computer = Yes
Lp N: buys_computer = No
Tnh li thng tin da trn phn chia theo
thuc tnh
income:
student:
credit_rating:
28
V D 1: IG
age?
student? credit rating?
no yes fair excellent
<=30
>40
no no yes yes
yes
31..40
15
29
CY QUYT NH
Information Gain Ratio: C4.5
o Gain c xu hng thin v cho cc
thuc tnh c nhiu gi tr -> cn chun ha
o Gain
Chn thuc tnh c o Gain Ratio ln
nht
GainRatio(A) = Gain(A)/SplitInfo
A
(D)
)
| |
| |
( log
| |
| |
) (
2
1
D
D
D
D
D SplitInfo
j
v
j
j
A
=

=
30
CY QUYT NH

=
=
m
i
p
i
D gini
1
2
1 ) (
Ch mc Gini (Gini index) : CART, SLIQ,
SPRINT
Tp hun luyn D cha cc mu ca m lp.
Ch mc Gini ca tp D gini(D) l :

vi p
i
l tn sut ca lp C
i
trong D
Cho tp DL ca v d 1, ta c gini(D) l :
459 . 0
14
5
14
9
1 ) (
2 2
=
|
.
|

\
|

|
.
|

\
|
= D gini
16
31
CY QUYT NH
Ch mc Gini (Gini index) :
Thuc tnh A c cc gi tr :{a
1
, a
2
, ,a
v
}
Dng thuc tnh A phn chia tp hun luyn D
thnh v tp con {D
1
, D
2
, , D
v
}
Ch mc Gini ca phn chia D theo thuc tnh
A :

) ( ) (
j
1
D gini
D
D
D gini
v
j
j
A
=
=
Ti mi cp, chng ta chn thuc tnh c ch
mc Gini nh nht phn chia tp d liu
32
V D 1 : GINI INDEX
Lp P: buys_computer =
Yes
Lp N: buys_computer =
No
gini(D) =0.459
Tnh ch mc gini cho thuc
tnh age ?
Suy ra :
age p
j
n
j
gini(p
j
, n
j
)
<=30 2 3 0.48
3140 4 0 0
>40 3 2 0.48
343 . 0
) 2 , 3 (
14
5
) 0 , 4 (
14
4
) 3 , 2 (
14
5
) (
=
+ + = gini gini gini D gini
age
17
33 33
V d : GINI INDEX
Sau khi tnh o ch mc Gini da
trn phn chia theo thuc tnh :
Gini
age
(D) = 0.343
Gini
income
(D) = 0.44
Gini
student
(D) = 0.367
Gini
credit_rating
(D) =0.429
o ch mc Gini da trn phn chia
theo thuc tnh age l nh nht nn ta
s chia DL theo thuc tnh age
34 34
V d : GINI INDEX
age?
<=30 >40 31..40
income student credit_rating buys_computer
high no fair no
high no excellent no
medium no fair no
low yes fair yes
medium yes excellent yes
income student credit_rating buys_computer
medium no fair yes
low yes fair yes
low yes excellent no
medium yes fair yes
medium no excellent no
Lm tng
t
Lm tng
t
yes
18
35
CY QUYT NH
5. Vn qu ph hp vi DL (overfitting)
C 500 im DL
hnh trn and 500
im hnh tam
gic.

Cc im hnh trn :
0.5 s sqrt(x
1
2
+x
2
2
) s 1
Cc im hnh tam
gic:
sqrt(x
1
2
+x
2
2
) > 0.5 or
sqrt(x
1
2
+x
2
2
) < 1
36
CY QUYT NH
5. Vn qu ph hp vi DL (overfitting)
Overfitting
19
37
CY QUYT NH
Cy to ra c th
qu ph hp vi
DL hun luyn :
Qu nhiu
nhnh do
nhiu hoc c
bit
38
CY QUYT NH
Cy to ra c th
qu ph hp vi DL
hun luyn do thiu
mu
Thiu cc im
DL na di
ca biu gy
kh khn cho
vic d on lp
chnh xc ca
vng ny.
20
39
CY QUYT NH
Kt qu : chnh xc km khi phn lp cho mu mi
Hai phng php trnh qu PH HP DL :
Lai b trc : Dng thm nhnh cy sm, ngay khi
n c th to ra o di ngng no
Rt kh chn ngng thch hp
Loi b sau : Loi bt nhnh t cy hon chnh (t
di ln)
S dng tp DL c lp kim tra v loi bt
Xc nh chnh xc kch thc cy kt qu nh th no ?
Phn chia : tp hun luyn (2/3), tp test (1/3)
S dng nh gi cho ( cross-validation)
40
CY QUYT NH
6. u im :
D dng xy dng cy
Phn lp mu mi nhanh
D dng din gii cho cc cy c kch thc nh
chnh xc chp nhn c so vi cc k thut phn lp khc
trn nhiu tp DL n
V d : C4.5
Dng li thng tin
Sp xp thuc tnh s ti tng nt
Yu cu ton b DL cha va trong b nh
Khng ph hp vi cc tp d liu ln
http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz

21
41
VD : Biu din kt qu phn loi
(DBMiner)
42
NI DUNG
1. Gii thiu
2. Phng php da trn cy
quyt nh
3. Phng php da
trn lut
22
43
GII THIU
S dng cc lut IF-THEN phn loi
Lut c dng : IF (iu kin) Then Y
Vi iu kin : kt hp cc thuc tnh
Y l nhn lp
V d :
IF age = youth AND student = yes THEN
buys_computer = yes
Lut R ph mt mu x nu cc thuc tnh
ca mu tha mn iu kin ca lut
44
GII THIU
ph ca lut :
coverage(R)
T l cc mu tha mn iu
kin (v tri) ca lut
chnh xc ca lut :
accuracy(R)
T l cc mu tha mn c
iu kin v kt lun (2 v tri,
phi) ca lut
V d :
IF (Marital Status=Single)
No
Coverage(R) = 40%
Accuracy(R) = 50%
Tid Refund Marital
Status
Taxable
Income
Class
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
10

23
45
V D 2
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds
Cho tp DL hun luyn sau :

46
V D 2
Tp lut :
R1: (Give Birth = no) . (Can Fly = yes) Birds
R2: (Give Birth = no) . (Live in Water = yes) Fishes
R3: (Give Birth = yes) . (Blood Type = warm) Mammals
R4: (Give Birth = no) . (Can Fly = no) Reptiles
R5: (Live in Water = sometimes) Amphibians
S dng tp lut xc nh lp cho cc mu mi
sau :

Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?
24
47
V D 2
Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?
Mu lemur ph bi lut R3, nn c phn vo lp
Mammals
Mu turtle ph bi c lut R4 v R5
Mu dogfish shark khng c ph bi bt k lut
no.
Cch gii quyt ?
48
GII THIU
Cch gii quyt ?
Xp hng cc lut theo u tin:
Theo kch thc ca lut : cc lut c tp iu kin
ln hn s c u tin cao hn
Theo lut : cc lut c xp hng theo o cht
lng lut hoc theo kin chuyn gia
Theo lp : gom cc lut thuc cng mt lp
Nu mt mu c ph bi nhiu lut th chn
lut c th hng cao nht
Nu khng ph bi bt k lut no th gn vo
lp mc nh
25
49
GII THIU
Xy dng lut phn lp :
Phng php trc tip :
Rt cc lut trc tip t d liu
RIPPER, CN2, ILA, FOIL, AQ,
Phng php gin tip :
Rt lut t cc m hnh phn lp khc nh
cy quyt nh, mng nron,
Lut C4.5
50
GiI THIU
Phng php gin tip :
Rt trch lut t cy quyt nh
Lut to ra t tng ng dn t gc n l
Mi cp gi tr thuc tnh dc theo ng dn
to nn php kt
Cc nt l mang tn ca lp
V d 1
26
51
V D 1 : CY QUYT NH
age?
student? credit rating?
no yes fair excellent
<=30
>40
no no yes yes
yes
31..40
52
GiI THIU
Phng php gin tip :
Cc lut ca V d 1:
IF age = <=30 AND student = no
THEN buys_computer = no
IF age = <=30 AND student = yes
THEN buys_computer = yes
IF age = 3140
THEN buys_computer = yes
IF age = >40 AND credit_rating = excellent THEN
buys_computer = no
IF age = >40 AND credit_rating = fair
THEN buys_computer = yes
27
53
GiI THIU
Phng php trc tip :
Rt trch lut trc tip t DL : tht ton ph tun t
Cc lut c hc tun t. Mi lut trong lp C
i
s ph nhiu
mu ca C
i
nhng khng ph (hoc ph t) mu ca cc lp khc
Xy dng lut :
Bt u t lut rng
S dng hm Learn-One-Rule pht trin lut
Thm thuc tnh lm tng cht lng ca lut ( ph,
chnh xc)
Loi cc mu b ph bi lut ra khi DL
Lp li qu trnh trn cho n khi gp iu kin dng (khng
cn mu hoc o cht lng thp hn ngng do ngi
dng xc nh)
54
GiI THIU
(i) Original Data (ii) Step 1
V d : Ph tun t
28
55
GiI THIU
(iii) Step 2
R1
(iv) Step 3
R1
R2
V d : Ph tun t
56
GiI THIU
u im ca phng php da trn
lut :
D hiu nh cy quyt nh
D xy dng lut
Phn lp mu mi nhanh
Thi gian thi hnh tng t nh phng php
da trn cy quyt nh
29
57
LUT QUI NP - ILA
M.Tolun, 1998, ILA inductive learning
algorithm
Xc nh cc lut IF-THEN trc tip t tp
hun luyn (pht trin lut theo hng t
tng qut -> c th)
Chia tp hun luyn thnh cc bng con
theo tng gi tr ca lp.
Thc hin vic so snh cc gi tr ca
thuc tnh trong tng bng con v tnh s
ln xut hin.
Thuc tnh c dng phi s, gi tr ri rc
58
THUT TON ILA
Bc 1 : Chia bng c cha m mu thnh n bng con
(ng vi n gi tr ca thuc tnh lp)
( Bc 2 n bc 8 s c lp li cho mi bng
con)
Bc 2 : Khi to s lng thuc tnh kt hp j=1
Bc 3 : Xt tng bng con, to danh sch cc thuc
tnh kt hp (phn t danh sch c j thuc tnh)
Bc 4 : Vi mi phn t trong danh sch trn, m
s ln xut hin cc gi tr ca thuc tnh cc dng
cha nh du ca bng con ang xt, nhng gi tr
khng c xut hin nhng bng con khc.

Chn phn t kt hp u tin c s ln xut hin
ca gi tr thuc tnh nhiu nht v t tn l
max-combination.
30
59
THUT TON ILA
Bc 5 : Nu max-combination = 0 th j=j+1 v quay li
bc 3
Bc 6 : Trong bng con ang xt, nh du cc dng
c xut hin gi tr ca max-combination
Bc 7 : to lut
IF AND(thuc tnh = gi tr) (thuc max-combination)
THEN gi tr ca thuc tnh lp ng vi bng con ang
xt
Bc 8 :
Nu tt c cc dng u nh du
Nu cn bng con th chuyn qua bng con tip theo
v lp li t bc 2
Ngc li : chm dt thut ton
Ngc li (cn dng cha nh du) th quay li bc
4
60
V D 3 : THUT TON ILA
No Size Color Shape Decision
1 Va Xanh dng Hp Yes
2 Nh Nn No
3 Nh Cu Yes
4 Ln Nn No
5 Ln Xanh l cy Tr Yes
6 Ln Tr No
7 Ln Xanh l cy Cu Yes
31
61
V D 3 : BC 1
Bng con 1
No Size Color Shape Decision
1 (1) Va Xanh dng Hp Yes
2 (3) Nh Cu Yes
3 (5) Ln Xanh l cy Tr Yes
4 (7) Ln Xanh l cy Cu Yes
Bng con 2
No Size Color Shape Decision
1 (2) Nh Nn No
2 (4) Ln Nn No
3 (6) Ln Tr No
62
V D 3 : THUT TON ILA
Bng con 1
No Size Color Shape Decision
1 (1) Va Xanh dng Hp Yes
2 (3) Nh Cu Yes
3 (5) Ln Xanh l cy Tr Yes
4 (7) Ln Xanh l cy Cu Yes
B2 : j = 1
B3 : {[size], [color], [shape]}
B4 : max-combination = xanh l cy
B6 : nh du dng 3,4
B7 : R1 : IF color = xanh l cy THEN decision =Yes
B8 : Quay li B4
32
63
V D 3 : THUT TON ILA
Bng con 1
No Size Color Shape Decision
1 (1) Va Xanh dng Hp Yes
2 (3) Nh Cu Yes
3 (5) Ln Xanh l cy Tr Yes
4 (7) Ln Xanh l cy Cu Yes
B4 : max-combination = va
B6 : nh du dng 1
B7 : R2 : IF size = va THEN decision =Yes
B8 : quay li B4
64
V D 3 : THUT TON ILA
Bng con 1
No Size Color Shape Decision
1 (1) Va Xanh dng Hp Yes
2 (3) Nh Cu Yes
3 (5) Ln Xanh l cy Tr Yes
4 (7) Ln Xanh l cy Cu Yes
B4 : max-combination = cu
B6 : nh du dng 2
B7 : R3 : IF shape = cu THEN decision =Yes
B8 : chuyn qua bng con 2 vs bt du t B2
33
65
V D 3 : THUT TON ILA
Bng con 2
No Size Color Shape Decision
1 (2) Nh Nn No
2 (4) Ln Nn No
3 (6) Ln Tr No
66
V D 3 : THUT TON ILA
Bng con 2
No Size Color Shape Decision
1 (2) Nh Nn No
2 (4) Ln Nn No
3 (6) Ln Tr No
B2 : j = 1
B3 : {[size], [color], [shape]}
B4 : max-combination = Nn
B6 : nh du dng 1,2
B7 : R4 : IF shape = Nn THEN decision =No
B8 : Quay li B4
34
67
V D 3 : THUT TON ILA
Bng con 2
No Size Color Shape Decision
1 (2) Nh Nn No
2 (4) Ln Nn No
3 (6) Ln Tr No
B4 : max-combination = {}
B5 : j = j +1 =2 v quay li B3
B3 : {[size, color], [size, shape], [color, shape]}
B4 : max-combination = Ln
B6 : nh du dng 3
B7 : R5 : IF size = Ln AND color = THEN decision =No
B8 : kt thc
68
V D 3 : TP LUT
R1 : IF color = xanh l cy
THEN decision =Yes
R2 : IF size = va THEN decision =Yes
R3 : IF shape = cu
THEN decision =Yes
R4 : IF shape = Nn THEN decision =No
R5 : IF size = Ln AND color =
THEN decision =No
35
69
CC CNG VIC CN LM
1. Chun b bi 4 : Phn lp d liu
Xem ni dung bi 4 Phn 2.
Chun b BT.
Cch thc hin :
c slide, xem cc v d
Tham kho trn Internet v ti liu tham
kho
70
BI TP PHN 1
1. Hy cho bit chi tit cc bc ca phng php phn lp da trn
cy quyt nh, da trn lut (ILA).
2. Cho cy quyt nh, bn c 2 la chn :
a) Bin i cy thnh lut, sau loi bt lut kt qu
b) Loi bt nhnh ca cy, sau bin i cy thnh lut.
Hy cho bit cc u th ca a) so vi b) ?
3. Cho tp hun luyn nh trong v d 1 (mua, khng mua my
tnh)
a) S dng chi mc gini xy dng cy quyt nh. So snh kt
qu vi cy s dng li thng tin.
b) p dng thut ton ILA cho v d 1, so snh kt qu vi tp lut
rt ra t phng php cy quyt nh
c) Xc nh lp cho mu X =<age =<=30, income = medium,
student =yes, credit_rating =fair > da trn tp lut thu
c.
36
71
BI TP PHN 1
4. Cho tp hun luyn nh trong v d 2.
a) p dng phng php cy quyt nh ln v d 2
(khng s dng ct thuc tnh name phn chia
DL) v rt tp lut t cy. So snh vi tp lut c,
nhn xt.
b) S dng tp lut thu c t cy quyt nh xc
nh lp cho cc mu mi sau. So snh kt qu vi
vic s dng tp lut c trong bi ging.
Name Blood Type Give Birth Can Fly Live in Water Class
hawk warm no yes no ?
grizzly bear warm yes no no ?
Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?
72
BI TP PHN 1
5. Cho tp hun luyn sau. Gi s Chi Tennis l thuc tnh
lp.
a) S dng ln lt o Gain, ch mc gini xy dng
cy quyt nh. Bin i cy thnh lut. So snh.
b) S dng phng php ILA xc nh lut. So snh vi
cc lut tthu c cu a). Nhn xt ?
c) S dng ln lt cc tp lut thu c t cu a) , b)
xc nh lp cho mu mi. So snh kt qu.
Quang cnh Nhit m Sc gi Chi Tennis
Ma TB BT Mnh ?
Nng TB Cao Mnh ?
37
73
BI TP PHN 1
74
TI LIU THAM KHO
1. C. Apte and S. Weiss. Data mining with decision trees
and decision rules. Future Generation Computer
Systems, 13, 1997.
2. M. Kamber, L. Winstone, W. Gong, S. Cheng, and J.
Han. Generalization and decision tree induction:
Efficient classification in data mining. In Proc. 1997
Int. Workshop Research Issues on Data Engineering
(RIDE'97), pages 111-120, Birmingham, England, April
1997.
3. Mehmet R. Tolun, Saleh M. Abu-Soud. ILA, an
inductive learning algorithm for rule extraction. ESA
14(3), 4/1998, 361-370
38
75
Q & A

You might also like