2016w - 09 - Decision Trees-Solution
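Let $S = \{(x_k, d_k)\}_{k=1}^{N}$ be a set of $N$ labeled examples, where each label satisfies $d_k \in \{c_1, \dots, c_C\}$. The relative frequency of class $c_j$ in $S$ is

\[ p_j = \frac{1}{N} \sum_{k=1}^{N} I\{d_k = c_j\}, \qquad j = 1, \dots, C. \]

Impurity measures for $S$:

1. Misclassification error: $Q(S) = 1 - \max_{j \in \{1, \dots, C\}} p_j$
2. Gini: $Q(S) = \sum_j p_j (1 - p_j)$
3. Entropy: $Q(S) = H(S) = \sum_j p_j \log_2 \frac{1}{p_j} = -\sum_j p_j \log_2 p_j$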
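Properties of $Q(S)$:

1. $Q(S) = 0$ when $S$ is pure ($p_j = 1$ for some $j$).
2. $Q(S)$ is maximal when the classes are equally frequent ($p_j = 1/C$ for all $j$).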
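The three measures and their two properties can be checked numerically. The following is a minimal sketch, not part of the original solution; the function names and the toy label lists are illustrative assumptions.

```python
import math

def class_frequencies(labels):
    """Relative frequency p_j of every distinct label in the list."""
    n = len(labels)
    return [labels.count(c) / n for c in sorted(set(labels))]

def misclassification(p):
    return 1.0 - max(p)

def gini(p):
    return sum(pj * (1.0 - pj) for pj in p)

def entropy(p):
    # Classes with p_j = 0 are skipped, i.e. 0 * log(1/0) is taken as 0.
    return sum(pj * math.log2(1.0 / pj) for pj in p if pj > 0)

# Hypothetical label lists: one pure set and one uniform set with C = 4.
pure = class_frequencies(["a", "a", "a"])
uniform = class_frequencies(["a", "b", "c", "d"])

for name, q in [("error", misclassification), ("gini", gini), ("entropy", entropy)]:
    print(name, q(pure), q(uniform))
```

For the pure set all three measures are zero, and for the uniform set each reaches its maximum for $C = 4$.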
:
A S - . -
, Sm , m 1,2,..., M M . A
- } {Sm :
Q Sm
Sm
N
m 1
Q S | A
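Here $Q(S_m)$ is the impurity of the subset $S_m$. The reduction in impurity obtained by splitting $S$ according to $A$ is

\[ \Delta Q(S \mid A) = Q(S) - Q(S \mid A). \]

For every partition $\{S_m\}$ we have $Q(S \mid A) \le Q(S)$, so the reduction is never negative. When $Q$ is the entropy, this reduction is the information gain of $A$. The attribute $A$ chosen for a split is the one that maximizes $\Delta Q(S \mid A)$, or equivalently the one that minimizes $Q(S \mid A)$.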
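The weighted impurity and the gain can be sketched in a few lines. This is an illustrative sketch that assumes each subset $S_m$ is described by a tuple of per-class counts; the function names and the two toy partitions are assumptions, not taken from the original.

```python
import math

def entropy(counts):
    """Entropy in bits of a subset given its per-class counts."""
    n = sum(counts)
    return sum(c / n * math.log2(n / c) for c in counts if c > 0)

def conditional_impurity(subsets, q=entropy):
    """Q(S|A): impurity of every subset S_m weighted by |S_m| / N."""
    n = sum(sum(s) for s in subsets)
    # Empty subsets carry zero weight and are skipped.
    return sum(sum(s) / n * q(s) for s in subsets if sum(s) > 0)

def gain(subsets, q=entropy):
    """Delta Q(S|A) = Q(S) - Q(S|A), where S is the union of the subsets."""
    totals = [sum(col) for col in zip(*subsets)]
    return q(totals) - conditional_impurity(subsets, q)

# Hypothetical partitions of a set with 4 positive and 4 negative examples.
perfect_split = [(4, 0), (0, 4)]
useless_split = [(2, 2), (2, 2)]
print(gain(perfect_split), gain(useless_split))
```

A split into pure subsets recovers the full entropy of $S$ as gain, while a split that leaves the class proportions unchanged has zero gain.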
The training set:

Name    Hair     Height    Weight    Result (dependent / decision attribute)
Sarah   blonde   average   light     sunburned (positive)
Dana    blonde   tall      average   none (negative)
Alex    brown    short     average   none
Annie   blonde   short     average   sunburned
Emily   red      average   heavy     sunburned
Pete    brown    tall      heavy     none
John    brown    average   heavy     none
Katie   blonde   short     light     none
The entropy of the full set $S$ (3 sunburned, 5 none):

\[ H(S) = -\frac{3}{8}\log_2\frac{3}{8} - \frac{5}{8}\log_2\frac{5}{8} = 0.954 \]
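As a quick numerical check of this value, the following sketch recomputes the entropy directly from the eight Result labels. The variable and function names are illustrative.

```python
import math
from collections import Counter

# Result column in row order, Sarah through Katie.
results = ["sunburned", "none", "none", "sunburned",
           "sunburned", "none", "none", "none"]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(c / n * math.log2(n / c) for c in Counter(labels).values())

h_s = entropy(results)
print(round(h_s, 3))  # 0.954
```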
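Conditional entropy for each candidate attribute (counts are written +sunburned/-none, with $0 \log 0 = 0$):

Hair: Blonde +2/-2, Brown 0/-3, Red +1/0

\[ H(S \mid \text{Hair} = \text{blonde}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1 \]
\[ H(S \mid \text{Hair} = \text{brown}) = -0 \log_2 0 - 1 \log_2 1 = 0 \]
\[ H(S \mid \text{Hair} = \text{red}) = -1 \log_2 1 - 0 \log_2 0 = 0 \]

Information gain for Hair:

\[ \Delta H(S \mid \text{Hair}) = H(S) - H(S \mid \text{Hair}) = H(S) - \left( \frac{1}{2} \cdot 1 + \frac{3}{8} \cdot 0 + \frac{1}{8} \cdot 0 \right) = H(S) - \frac{1}{2} \]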
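Height: Short +1/-2, Average +2/-1, Tall 0/-2

\[ H(S \mid \text{Height} = \text{short}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.918 \]
\[ H(S \mid \text{Height} = \text{average}) = -\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3} = 0.918 \]
\[ H(S \mid \text{Height} = \text{tall}) = 0 \]

Information gain for Height:

\[ \Delta H(S \mid \text{Height}) = H(S) - \left( \frac{3}{8} \cdot 0.918 + \frac{3}{8} \cdot 0.918 + \frac{2}{8} \cdot 0 \right) = H(S) - 0.69 \]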
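Weight: Light +1/-1, Average +1/-2, Heavy +1/-2

\[ H(S \mid \text{Weight} = \text{light}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1 \]
\[ H(S \mid \text{Weight} = \text{average}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.918 \]
\[ H(S \mid \text{Weight} = \text{heavy}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.918 \]

Information gain for Weight:

\[ \Delta H(S \mid \text{Weight}) = H(S) - \left( \frac{2}{8} \cdot 1 + \frac{3}{8} \cdot 0.918 + \frac{3}{8} \cdot 0.918 \right) = H(S) - 0.9385 \]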
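Lotion: No +3/-2, Yes 0/-3

\[ H(S \mid \text{Lotion} = \text{no}) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.97 \]
\[ H(S \mid \text{Lotion} = \text{yes}) = 0 \]

Information gain for Lotion:

\[ \Delta H(S \mid \text{Lotion}) = H(S) - \left( \frac{5}{8} \cdot 0.97 + \frac{3}{8} \cdot 0 \right) = H(S) - 0.606 \]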
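The four hand calculations can be cross-checked with a short script. This is a sketch that takes the (sunburned, none) counts per attribute value exactly as tallied in the text; the dictionary layout and names are illustrative assumptions. It uses unrounded subset entropies, so the results can differ from the hand-rounded figures in the third decimal place.

```python
import math

def entropy(counts):
    """Entropy in bits of a subset given its per-class counts."""
    n = sum(counts)
    return sum(c / n * math.log2(n / c) for c in counts if c > 0)

def conditional_entropy(subsets):
    """H(S|A): subset entropies weighted by subset size."""
    n = sum(sum(s) for s in subsets)
    return sum(sum(s) / n * entropy(s) for s in subsets if sum(s) > 0)

# (sunburned, none) counts for every value of every attribute.
splits = {
    "Hair":   [(2, 2), (0, 3), (1, 0)],  # blonde, brown, red
    "Height": [(1, 2), (2, 1), (0, 2)],  # short, average, tall
    "Weight": [(1, 1), (1, 2), (1, 2)],  # light, average, heavy
    "Lotion": [(3, 2), (0, 3)],          # no, yes
}

h_s = entropy((3, 5))
cond = {a: conditional_entropy(s) for a, s in splits.items()}
gains = {a: h_s - h for a, h in cond.items()}
best = max(gains, key=gains.get)

for a in splits:
    print(f"{a}: H(S|A) = {cond[a]:.3f}, gain = {gains[a]:.3f}")
print("best attribute:", best)
```

With these counts the script selects Hair, the attribute with the smallest conditional entropy.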
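Hair gives the smallest conditional entropy (the largest information gain), so Hair is chosen as the first splitting criterion.
The branches Hair=red and Hair=brown contain examples of a single class, so they become leaves.
Only the branch Hair=blonde has to be split further.
The remaining examples: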
Name    Hair     Height    Weight    Lotion   Result
Sarah   blonde   average   light     No       sunburned (positive)
Dana    blonde   tall      average   Yes      none (negative)
Annie   blonde   short     average   No       sunburned
Katie   blonde   short     light     Yes      none
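Height: Short +1/-1, Average +1/0, Tall 0/-1

\[ H(S, \text{Hair} \mid \text{Height} = \text{short}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1 \]
\[ H(S, \text{Hair} \mid \text{Height} = \text{average}) = 0 \]
\[ H(S, \text{Hair} \mid \text{Height} = \text{tall}) = 0 \]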
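Weight: Light +1/-1, Average +1/-1, Heavy 0/0

\[ H(S, \text{Hair} \mid \text{Weight} = \text{light}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1 \]
\[ H(S, \text{Hair} \mid \text{Weight} = \text{average}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1 \]
\[ H(S, \text{Hair} \mid \text{Weight} = \text{heavy}) = 0 \]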
Lotion: No +2/0, Yes 0/-2

Both subsets are pure, so Lotion is chosen as the second splitting criterion.
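The second-level choice can be reproduced from the four blonde rows themselves. The sketch below groups the rows by each remaining attribute and computes the weighted entropy; the tuple layout and the names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

# The Hair=blonde subset: (name, height, weight, lotion, result).
rows = [
    ("Sarah", "average", "light",   "no",  "sunburned"),
    ("Dana",  "tall",    "average", "yes", "none"),
    ("Annie", "short",   "average", "no",  "sunburned"),
    ("Katie", "short",   "light",   "yes", "none"),
]
COLUMNS = {"Height": 1, "Weight": 2, "Lotion": 3}

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(c / n * math.log2(n / c) for c in Counter(labels).values())

def conditional_entropy(rows, col):
    """Weighted entropy of the Result column after grouping by column col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

cond = {a: conditional_entropy(rows, i) for a, i in COLUMNS.items()}
best = min(cond, key=cond.get)
print(cond, best)
```

Values that do not occur in the subset, such as Weight=heavy, simply produce no group and therefore contribute nothing to the weighted sum.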
The resulting decision tree:

Hair (first splitting criterion)
    Blonde (+2/-2): Lotion (second splitting criterion)
        No:  prediction Sunburn
        Yes: prediction No sunburn
    Brown (0/-3): prediction No sunburn
    Red (+1/0): prediction Sunburn
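One way to check the tree is to encode it as a nested structure and classify the eight training examples. This is a minimal sketch; representing internal nodes as (attribute, branches) pairs is an assumption made for illustration. Lotion is only needed on the blonde branch, so it is listed only for the blonde rows.

```python
# Internal nodes are (attribute, branches) pairs; leaves are class labels.
TREE = ("Hair", {
    "blonde": ("Lotion", {"no": "sunburned", "yes": "none"}),
    "brown": "none",
    "red": "sunburned",
})

def predict(node, example):
    """Follow the branches that match the example until a leaf is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[example[attribute]]
    return node

training = [
    ({"Hair": "blonde", "Lotion": "no"},  "sunburned"),  # Sarah
    ({"Hair": "blonde", "Lotion": "yes"}, "none"),       # Dana
    ({"Hair": "brown"},                   "none"),       # Alex
    ({"Hair": "blonde", "Lotion": "no"},  "sunburned"),  # Annie
    ({"Hair": "red"},                     "sunburned"),  # Emily
    ({"Hair": "brown"},                   "none"),       # Pete
    ({"Hair": "brown"},                   "none"),       # John
    ({"Hair": "blonde", "Lotion": "yes"}, "none"),       # Katie
]

predictions = [predict(TREE, x) for x, _ in training]
errors = sum(p != label for p, (_, label) in zip(predictions, training))
print("training errors:", errors)
```

The tree reproduces every training label, which is expected because every leaf was created from a pure subset.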
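Overfitting

For an attribute $A$, the gain $\Delta Q(S \mid A)$ is normalized by a split term:

\[ \frac{\Delta Q(S \mid A)}{\mathrm{Split}(S, A)} \]

where $\mathrm{Split}(S, A)$ is defined as: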