
Definition of a decision tree

- A decision tree is a type of predictive model.
- The machine learning technique that uses decision trees is called decision tree learning, or simply decision trees.
- It is a descriptive tool for computing conditional probabilities.
- It combines mathematical and computational techniques to help describe, classify, and generalize a given data set.

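Definition of a decision tree
A decision tree is a hierarchical structure of nodes and branches.
There are 3 types of node in the tree:
- Root node
- Internal node: carries the name of an attribute of the database
- Leaf node: carries the name of a class Ci
Branch: carries one possible value of the attribute.

A decision tree classifies an object by traversing from the root node until a leaf node is reached; that leaf gives the class of the object.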

Example
David is the manager of a famous golf club. He has trouble predicting whether members will show up. On some days everyone wants to play golf and the club does not have enough staff to serve them. On other days, for no apparent reason, nobody comes and the club is overstaffed. David's goal is to optimize the number of staff on duty each day by using the weather forecast to predict when people will come to play. To do that, he needs to understand why customers decide to play and whether that decision can be explained. So for two weeks he records:
- Outlook: sunny, overcast, or raining
- Temperature, in degrees F
- Humidity
- Wind: whether or not there is strong wind
- And, of course, the number of people who came to play golf that day
David ends up with a data set of 14 rows and 5 columns.

Example
Day  Outlook   Temp.  Humidity  Wind    Play?
1    Sunny     Hot    High      Weak    No
2    Sunny     Hot    High      Strong  No
3    Overcast  Hot    High      Weak    Yes
4    Rain      Mild   High      Weak    Yes
5    Rain      Cool   Normal    Weak    Yes
6    Rain      Cool   Normal    Strong  No
7    Overcast  Cool   Normal    Strong  Yes
8    Sunny     Mild   High      Weak    No
9    Sunny     Cool   Normal    Weak    Yes
10   Rain      Mild   Normal    Weak    Yes
11   Sunny     Mild   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes
14   Rain      Mild   High      Strong  No

Example
Checking when to play golf and when not to play

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Example
Checking when to play golf and when not to play

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast
  Rain

- Each node carries an attribute (independent variable).
- Each branch corresponds to one value of the attribute.
- Each leaf node is a class (dependent variable).

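Traversing the decision tree
Record to classify:
Day  Outlook  Temp.  Humidity  Wind  Play?
1    Sunny    Hot    High      Weak  No

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Starting at the root, Outlook = Sunny leads to the Humidity node, and Humidity = High leads to the leaf No, which agrees with the recorded Play? = No.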
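The traversal can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the nested-dictionary layout and the names `TREE` and `classify` are assumptions chosen for the example.

```python
# The tree from the slides as nested dicts (layout is an assumption):
# an internal node is {"attribute": name, "branches": {value: subtree}},
# a leaf is simply the class label.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}


def classify(tree, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    node = tree
    while isinstance(node, dict):          # still at an internal node
        value = record[node["attribute"]]  # value of the attribute tested here
        node = node["branches"][value]     # follow the matching branch
    return node                            # a leaf: the class label


day1 = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Weak"}
print(classify(TREE, day1))  # No
```

Attributes that the path never tests (here Temp and Wind) are simply ignored.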

Logical expressions
Outlook=Sunny ∧ Wind=Weak
(∧ = AND, ∨ = OR)

Outlook
  Sunny -> Wind
    Strong -> No
    Weak -> Yes
  Overcast -> No
  Rain -> No

Logical expressions
Outlook=Sunny ∨ Wind=Weak

Outlook
  Sunny -> Yes
  Overcast -> Wind
    Strong -> No
    Weak -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Logical expressions
(Outlook=Sunny ∧ Humidity=Normal) ∨ Outlook=Overcast ∨ (Outlook=Rain ∧ Wind=Weak)

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes
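The equivalence between the tree and its logical expression can be checked by brute force. The sketch below is only an illustration; the function names `play` and `play_by_tree` are made up for this example. It compares the expression (one AND-term per path that ends in Yes, joined by OR) with the tree written as nested tests, over every combination of attribute values.

```python
from itertools import product


def play(outlook, humidity, wind):
    """The logical expression: one AND-term per root-to-leaf path ending in Yes."""
    return (
        (outlook == "Sunny" and humidity == "Normal")
        or outlook == "Overcast"
        or (outlook == "Rain" and wind == "Weak")
    )


def play_by_tree(outlook, humidity, wind):
    """The same decision written as nested tests that mirror the tree."""
    if outlook == "Sunny":
        return humidity == "Normal"
    if outlook == "Overcast":
        return True
    return wind == "Weak"  # remaining case: outlook == "Rain"


combos = list(product(["Sunny", "Overcast", "Rain"],
                      ["High", "Normal"],
                      ["Strong", "Weak"]))
agree = all(play(*c) == play_by_tree(*c) for c in combos)
print(len(combos), agree)  # 12 True
```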

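Building a decision tree
- The tree is built top-down.
- Attributes are discretized into categorical (non-numeric) form.
- All training samples start at the root of the tree.
- An attribute is chosen to split the samples into branches. The attribute is selected using a statistical or heuristic measure.
- Tree construction is then repeated for each branch.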

Building a decision tree
Stopping conditions:
- All samples that fall into a node belong to the same class (the node becomes a leaf).
- No attribute remains that can be used to split the samples further.
- No samples remain at the node.
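The top-down procedure and its three stopping conditions can be sketched as a short recursive function. This is a hedged illustration rather than the slides' own algorithm. It uses information gain as the selection measure. It assumes the 14 rows are the standard PlayTennis values for days 1 to 14. It also assumes a majority vote when no attribute is left, a rule the slides do not state. The names `build`, `gain`, and `entropy` are illustrative.

```python
import math
from collections import Counter

COLUMNS = ["Outlook", "Temp", "Humidity", "Wind", "Play"]
# Assumption: the standard 14-row PlayTennis values, Day 1 to Day 14.
DATA = """\
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in DATA.splitlines()]


def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, attr, target):
    """Information gain obtained by splitting rows on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def build(rows, attrs, target, default=None):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    if not rows:                      # stop: no samples left at this node
        return default
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:         # stop: all samples share one class
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:                     # stop: no attribute left (majority vote assumed)
        return majority
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    values = sorted({r[best] for r in rows})
    return {best: {v: build([r for r in rows if r[best] == v], rest, target, majority)
                   for v in values}}


tree = build(ROWS, COLUMNS[:-1], "Play")
print(tree)
```

With these rows the result has Outlook at the root, Humidity under Sunny, Wind under Rain, and a Yes leaf under Overcast.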

Attribute selection
Selection measure: the attribute chosen is the one that is most useful for classification (the one that produces the smallest tree).
There are 2 commonly used measures:
1. Information gain
- Assumes all attributes are non-numeric (categorical).
- Can be adapted to numeric attributes.

2. Gini index
- Assumes all attributes are numeric.
- Assumes there exist several values that can split the range of each attribute.
- Can be adapted to non-numeric attributes.

Information gain
- s: the number of samples in the training set S
- si: the number of samples of S that belong to class Ci, for i = 1, ..., m
- Information needed to classify a sample:

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$$
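The formula translates directly into code. The sketch below is a minimal illustration; the function name `info` is an assumption chosen to mirror the symbol I. Classes with zero samples are skipped, following the usual convention that 0 · log2 0 counts as 0.

```python
import math


def info(*counts):
    """I(s1, ..., sm): information, in bits, needed to classify one sample."""
    s = sum(counts)
    # Empty classes are skipped: x * log2(x) tends to 0 as x goes to 0.
    return sum(-(c / s) * math.log2(c / s) for c in counts if c > 0)


print(f"{info(9, 5):.3f}")  # 0.940
print(f"{info(7, 7):.3f}")  # 1.000
```

Two evenly mixed classes need one full bit, and a node that contains a single class needs none.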

Information gain
- Attribute A has the values {a1, a2, ..., an}.
- Attribute A is used to partition the training set into n subsets {S1, S2, ..., Sn}.
- sij: the number of samples of class Ci in subset Sj (where A = aj)
- Entropy of attribute A:

$$E(A) = \sum_{j=1}^{n} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})$$

- Information gain from branching on attribute A:

$$G(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$

- At each level, the attribute with the largest gain is chosen to split the current tree.


Information gain, example
We have:
- s = 14
- m = 2, with C1 = Yes and C2 = No
- s1 = 9, s2 = 5

$$I(s_1, s_2) = I(9, 5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$$

Information gain, example
Humidity
- High: [3+, 4-]
- Normal: [6+, 1-]

$$E_{High} = -\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7} = 0.985$$

$$E_{Normal} = -\frac{6}{7}\log_2\frac{6}{7} - \frac{1}{7}\log_2\frac{1}{7} = 0.592$$

Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151

Note: to compute log2 5 on a pocket calculator, press: 5 log / 2 log =
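The calculator recipe is the change-of-base rule, log2 x = log x / log 2. A quick check in Python, using only the standard `math` module, shows that the base-10 route gives the same value as a direct base-2 logarithm:

```python
import math

# Change of base: log2(5) = log10(5) / log10(2), as in the calculator key sequence.
via_log10 = math.log10(5) / math.log10(2)
direct = math.log2(5)

print(f"{via_log10:.4f}")  # 2.3219
print(math.isclose(via_log10, direct))  # True
```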

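Information gain, example
Wind
- Weak: [6+, 2-]
- Strong: [3+, 3-]

$$E_{Weak} = -\frac{6}{8}\log_2\frac{6}{8} - \frac{2}{8}\log_2\frac{2}{8} = 0.811$$

$$E_{Strong} = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1.000$$

Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.000 = 0.048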

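Information gain, example
Outlook
- Sunny: [2+, 3-], E = 0.971
- Overcast: [4+, 0-], E = 0.000
- Rain: [3+, 2-], E = 0.971

Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247

For comparison:
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048

Outlook has the largest gain, so it is chosen to split the root.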
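The comparison can be reproduced from the class counts alone. The sketch below is an illustration under stated assumptions: the `COUNTS` dictionary and the function names are made up for this example, and the (Yes, No) counts are the bracketed [+, -] figures from the slides. Because the code does not round intermediate entropies, the third decimal can differ slightly from the hand calculation (Humidity comes out near 0.152 instead of 0.151).

```python
import math


def info(counts):
    """Entropy, in bits, of a tuple of class counts."""
    s = sum(counts)
    return sum(-(c / s) * math.log2(c / s) for c in counts if c > 0)


def gain(branches):
    """G(A) = I(totals) - E(A), where branches[j] holds the counts of subset S_j."""
    totals = [sum(col) for col in zip(*branches)]
    s = sum(totals)
    expected = sum(sum(b) / s * info(b) for b in branches)  # E(A)
    return info(totals) - expected


# (Yes, No) counts per attribute value, as given on the slides.
COUNTS = {
    "Humidity": {"High": (3, 4), "Normal": (6, 1)},
    "Wind": {"Weak": (6, 2), "Strong": (3, 3)},
    "Outlook": {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)},
}

gains = {attr: gain(list(values.values())) for attr, values in COUNTS.items()}
for attr, g in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{attr}: {g:.3f}")

best = max(gains, key=gains.get)
print("split on:", best)  # split on: Outlook
```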

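Gini index
Gini index of node t:

$$GINI(t) = 1 - \sum_{j} [p(j \mid t)]^2$$

where p(j | t) is the relative frequency of class j at node t.
- The maximum is 1 - 1/nc, reached when the samples are evenly distributed over the nc classes.
- The minimum is 0, reached when all samples belong to a single class.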

Gini index, example

$$GINI(t) = 1 - \sum_{j} [p(j \mid t)]^2$$

C1  C2  P(C1)  P(C2)  GINI
0   6   0/6    6/6    1 - (0^2 + 1^2) = 0
1   5   1/6    5/6    1 - (1/6)^2 - (5/6)^2 = 0.278
2   4   2/6    4/6    1 - (2/6)^2 - (4/6)^2 = 0.444
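These values are easy to confirm in code. The sketch below is a minimal illustration; the function name `gini` is an assumption. It evaluates the three example nodes and adds an evenly split node (3, 3), which reaches the two-class maximum of 1 - 1/2.

```python
def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2, with p(j|t) = count_j / total."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


for counts in [(0, 6), (1, 5), (2, 4), (3, 3)]:
    print(counts, f"{gini(counts):.3f}")
# (0, 6) 0.000
# (1, 5) 0.278
# (2, 4) 0.444
# (3, 3) 0.500
```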

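Splitting with the Gini index
When a node p is split into k branches, the quality of the split is computed as:

$$GINI_{split} = \sum_{i=1}^{k} \frac{n_i}{n} \, GINI(i)$$

where n_i is the number of samples at child node i and n is the number of samples at node p.

Choose the attribute with the smallest GINI_split for branching.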

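Splitting on a binary attribute
Only two branches are produced.

Attribute A splits the parent node p into the child nodes N1 and N2:

        Parent p   N1      N2
C1      6          5       1
C2      6          2       4
Gini    0.500      0.408   0.320

Each child's class frequencies are taken over that child's own size (7 samples in N1, 5 in N2):

Gini(N1) = 1 - (5/7)^2 - (2/7)^2 = 0.408
Gini(N2) = 1 - (1/5)^2 - (4/5)^2 = 0.320
Gini_split = 7/12 * 0.408 + 5/12 * 0.320 = 0.371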
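The weighted Gini of a split can be computed from the class counts of the child nodes. The sketch below is an illustration with made-up function names. Each child's class frequencies are divided by that child's own size (7 samples for N1, 5 for N2), and each child is then weighted by its share of the 12 parent samples.

```python
def gini(counts):
    """Gini index of one node, from its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_split(children):
    """Weighted Gini of a split: sum over children of (n_i / n) * GINI(i)."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)


parent = (6, 6)        # (C1, C2) counts at node p
n1, n2 = (5, 2), (1, 4)  # (C1, C2) counts at N1 and N2

print(f"{gini(parent):.3f}")            # 0.500
print(f"{gini(n1):.3f}")                # 0.408
print(f"{gini(n2):.3f}")                # 0.320
print(f"{gini_split([n1, n2]):.3f}")    # 0.371
```

The split lowers the Gini index from 0.500 at the parent to about 0.371, so branching on A makes the nodes purer.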

Splitting on a continuous-valued attribute
- Use one value for a binary split.
- Use several values for a multi-way split.
- For each value v, count the samples of each class with A < v and with A > v.
- Simple way to choose v: compute the Gini index for every value v in the database and take the value with the smallest Gini. This is inefficient.

TID  Refund  Marital   Tax   Cheat
1    Yes     Single    125K  No
2    No      Married   100K  No
3    No      Single    70K   No
4    Yes     Married   120K  No
5    No      Divorced  95K   Yes
6    No      Married   60K   No
7    Yes     Divorced  220K  No
8    No      Single    85K   Yes
9    No      Married   75K   No
10   No      Single    90K   Yes

Example split: Tax > 80K versus Tax < 80K

Splitting on a continuous-valued attribute
Efficient way to choose v:
- Sort the values in ascending order.
- Take the midpoint between each pair of adjacent values as a candidate split and compute its Gini index.
- Choose the split value with the lowest Gini index.
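The steps above can be sketched on the Tax column of the example table. This is a simplified illustration with made-up function names. For clarity it recomputes the class counts for every candidate, whereas a tuned implementation would update them incrementally while scanning the sorted values.

```python
def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Try the midpoint between adjacent sorted values; return (v, weighted Gini)."""
    pairs = sorted(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        v = (lo + hi) / 2                              # candidate split value
        left = [y for x, y in pairs if x <= v]
        right = [y for x, y in pairs if x > v]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (v, score)
    return best


# Tax (in thousands) and Cheat for TID 1..10, from the table in the slides.
tax = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
cheat = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

v, score = best_threshold(tax, cheat)
print(v, f"{score:.3f}")  # 97.5 0.300
```

On this data the lowest weighted Gini is obtained by splitting between 95K and 100K.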

Converting a decision tree into rules
- Knowledge is represented as IF-THEN rules.
- One rule is created for each path from the root to a leaf.
- Each attribute-value pair along the path forms a conjunction (AND).
- The leaf nodes carry the class name.

Converting a decision tree into rules

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

R1: If (Outlook=Sunny) ∧ (Humidity=High) Then Play=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then Play=Yes
R3: If (Outlook=Overcast) Then Play=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then Play=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then Play=Yes
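Rule extraction amounts to enumerating every root-to-leaf path. The sketch below is an illustration only: the nested-dictionary tree layout and the names `TREE` and `rules` are assumptions, not part of the slides.

```python
# Assumed layout: internal node = {"attribute": ..., "branches": {...}}, leaf = label.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}


def rules(tree, conditions=()):
    """Yield (conditions, label) for every path from the root to a leaf."""
    if not isinstance(tree, dict):       # reached a leaf: emit one rule
        yield conditions, tree
        return
    for value, subtree in tree["branches"].items():
        # Each attribute-value pair on the path becomes one AND-ed condition.
        yield from rules(subtree, conditions + ((tree["attribute"], value),))


lines = []
for i, (conds, label) in enumerate(rules(TREE), start=1):
    body = " AND ".join(f"({a}={v})" for a, v in conds)
    lines.append(f"R{i}: If {body} Then Play={label}")
print("\n".join(lines))
```

Because dictionaries keep insertion order, the five rules come out in the same order as R1 to R5.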

Advantages of decision trees
- Decision trees are easy to understand.
- Data preparation for a decision tree is basic or unnecessary.
- Decision trees can handle both numeric data and categorical data.
- A decision tree is a white-box model.
- A model can be validated with statistical tests.
- Decision trees can process a large amount of data well in a short time.
