Mathematical Procedure
Entropy formula
\[ E(S) = -\sum_{i=1}^{n} P_i \log_2(P_i) \]
Gain:
\[ G(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} E(S_v) \]
Computing the entropy:
\[ E(S) = -\sum_{i=1}^{n} P_i \log_2(P_i) = -\frac{17}{24}\log_2\left(\frac{17}{24}\right) - \frac{7}{24}\log_2\left(\frac{7}{24}\right) \]
\[ E(S) = 0.87 \]
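Evaluating the two terms numerically makes the rounding explicit (fractions and logarithms rounded to four decimals):

```latex
\[
E(S) = -(0.7083)(-0.4975) - (0.2917)(-1.7776) \approx 0.3524 + 0.5185 = 0.8709 \approx 0.87
\]
```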
Computing the gain for each attribute:
\[ G(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} E(S_v) \]
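For Sex
\[ |S_{Male}| = 7, \qquad E(S_{Male}) = -\frac{6}{7}\log_2\left(\frac{6}{7}\right) - \frac{1}{7}\log_2\left(\frac{1}{7}\right) = 0.59 \]
\[ |S_{Female}| = 17, \qquad E(S_{Female}) = -\frac{11}{17}\log_2\left(\frac{11}{17}\right) - \frac{6}{17}\log_2\left(\frac{6}{17}\right) = 0.94 \]
\[ G(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} E(S_v) = 0.87 - \frac{17}{24}(0.94) - \frac{7}{24}(0.59) \]
\[ G(S, A) = 0.03 \]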
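As a cross-check on the Sex split, the following minimal sketch recomputes the entropies and the gain directly from the class counts in the Sex by Failure Year table (Male: 1 Yes, 6 No; Female: 6 Yes, 11 No). The function names `entropy` and `information_gain` are illustrative choices, not part of the original procedure.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    # Empty classes are skipped: 0 * log2(0) is taken as 0 by convention.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, subsets):
    """G(S, A) = E(S) - sum over v of |S_v| / |S| * E(S_v).

    `subsets` holds one list of class counts per value v of the attribute.
    """
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent_counts) - weighted


# Class counts [Yes, No] for Failure Year, read from the Sex table.
whole = [7, 17]
male = [1, 6]
female = [6, 11]

e_s = entropy(whole)
e_male = entropy(male)
e_female = entropy(female)
gain_sex = information_gain(whole, [male, female])

print(f"E(S)        = {e_s:.3f}")
print(f"E(S_Male)   = {e_male:.3f}")
print(f"E(S_Female) = {e_female:.3f}")
print(f"G(S, Sex)   = {gain_sex:.3f}")
```

Run as written, this gives E(S) ≈ 0.871, E(S_Male) ≈ 0.592, E(S_Female) ≈ 0.937 and G(S, Sex) ≈ 0.035. The same two functions can be reused for the other attributes once their class counts per value are known.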
\[ |S_{vgood}| = 9, \qquad E(S_{vgood}) = -\frac{6}{9}\log_2\left(\frac{6}{9}\right) - \frac{3}{9}\log_2\left(\frac{3}{9}\right) = 0.92 \]
\[ G(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} E(S_v) = 0.87 - \frac{5}{24}(0.97) - \frac{9}{24}(0.92) - \frac{10}{24}(0.15) \]
\[ G(S, A) = 0.26 \]
\[ |S_{vgood}| = 9, \qquad E(S_{vgood}) = -\frac{8}{9}\log_2\left(\frac{8}{9}\right) - \frac{1}{9}\log_2\left(\frac{1}{9}\right) = 0.50 \]
\[ |S_{excellent}| = 1, \qquad E(S_{excellent}) = -\frac{1}{1}\log_2\left(\frac{1}{1}\right) - \frac{0}{1}\log_2\left(\frac{0}{1}\right) = 0.0 \]
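A single-class subset such as this one contains a term with probability 0, and the expression 0 · log2(0) is undefined as written. The usual convention, sketched below, is to treat that term as 0 because of the limit, which is what makes the entropy of a pure subset equal to zero:

```latex
\[
\lim_{p \to 0^{+}} p \log_2 p = 0
\quad\Longrightarrow\quad
E(S_{excellent}) = -1 \cdot \log_2(1) - 0 = 0
\]
```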
\[ G(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} E(S_v) = 0.87 - \frac{14}{24}(0.99) - \frac{9}{24}(0.5) - \frac{1}{24}(0.0) \]
\[ G(S, A) \approx 0.11 \]
\[ |S_{yes}| = 23, \qquad E(S_{yes}) = -\frac{17}{23}\log_2\left(\frac{17}{23}\right) - \frac{6}{23}\log_2\left(\frac{6}{23}\right) = 0.83 \]
\[ G(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} E(S_v) = 0.87 - \frac{1}{24}(0) - \frac{23}{24}(0.83) \]
\[ G(S, A) \approx 0.07 \]
\[ |S_{average}| = 2, \qquad E(S_{average}) = -\frac{2}{2}\log_2\left(\frac{2}{2}\right) - \frac{0}{2}\log_2\left(\frac{0}{2}\right) = 0.0 \]
\[ |S_{family}| = 1, \qquad E(S_{family}) = -\frac{1}{1}\log_2\left(\frac{1}{1}\right) - \frac{0}{1}\log_2\left(\frac{0}{1}\right) = 0.0 \]
\[ G(S, A) = E(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} E(S_v) = 0.87 - \frac{21}{24}(0.92) - \frac{2}{24}(0.0) - \frac{1}{24}(0.0) \]
\[ G(S, A) = 0.065 \]
We sort the attributes from highest to lowest gain.
The data come from the previous dataset. We will analyze whether the attribute Sex should be dropped because it has little importance for the variable Failure Year.
            Failure Year
Sex       Yes    No    Total
Male        1     6        7
Female      6    11       17
Total       7    17       24
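Expected frequencies (observed value → row total × column total / grand total):
\[ 1 \rightarrow \frac{7 \cdot 7}{24} = 2.04; \qquad 6 \rightarrow \frac{7 \cdot 17}{24} = 4.96 \]
\[ 6 \rightarrow \frac{17 \cdot 7}{24} = 4.96; \qquad 11 \rightarrow \frac{17 \cdot 17}{24} = 12.04 \]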
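The expected frequencies can be reproduced from the marginal totals of the Sex by Failure Year table. The sketch below assumes the usual independence formula (row total × column total / grand total); the variable names are illustrative.

```python
# Observed counts from the Sex by Failure Year table; columns are [Yes, No].
observed = {"Male": [1, 6], "Female": [6, 11]}

row_totals = {sex: sum(cells) for sex, cells in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    sex: [row_totals[sex] * col / grand_total for col in col_totals]
    for sex in observed
}

for sex, cells in expected.items():
    print(sex, [round(e, 2) for e in cells])
```

Each row of expected counts adds up to the same row total as the observed counts (7 for Male, 17 for Female), which is a quick way to catch a slip in any single cell.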
Step 6:
Step 7:
\[ \chi^2 > \chi^2_{table} \;\; (\text{reject } H_0) \qquad\qquad \chi^2 < \chi^2_{table} \;\; (\text{do not reject } H_0) \]
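The comparison against the table value can be sketched in a few lines. This is a minimal illustration that computes Pearson's statistic (without continuity correction) from the Sex by Failure Year table. It assumes a significance level of α = 0.05, which the text does not state, and hard-codes the corresponding standard table value for 1 degree of freedom, 3.841.

```python
# Observed counts: rows are Male, Female; columns are Yes, No (Failure Year).
observed = [[1, 6], [6, 11]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson's chi-square: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi2 += (obs - exp) ** 2 / exp

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: alpha = 0.05; 3.841 is the tabulated critical value for dof = 1.
CRITICAL_VALUE = 3.841
reject_h0 = chi2 > CRITICAL_VALUE

print(f"chi2 = {chi2:.3f}, dof = {dof}, reject H0: {reject_h0}")
```

For this table the statistic comes out at about 1.06 with 1 degree of freedom. A different significance level only changes `CRITICAL_VALUE`; the rest of the calculation stays the same.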
Step 8: Interpretation
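With the data from our study, χ² ≈ 1.06 (1 degree of freedom) is below the tabulated value at any conventional significance level (3.84 at α = 0.05), so we do not have sufficient evidence to reject H0, the hypothesis that there is NO association between the variables Sex and Failure Year.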