
Question 1: Build the decision tree

Sex     Family Relationship  Family Economic Level  Study willing  Reason of study  Failure Year
Female  good                 good                   yes            you              no
Female  excellent            excellent              yes            average          no
Male    excellent            vgood                  yes            average          no
Female  vgood                vgood                  yes            you              no
Male    vgood                vgood                  yes            you              no
Female  vgood                vgood                  yes            you              yes
Female  excellent            vgood                  yes            you              no
Male    vgood                good                   yes            you              yes
Female  good                 good                   yes            you              yes
Female  excellent            good                   yes            you              yes
Male    vgood                vgood                  yes            family           no
Female  good                 good                   yes            you              no
Male    excellent            good                   yes            you              no
Female  vgood                vgood                  yes            you              no
Female  vgood                good                   yes            you              yes
Female  vgood                vgood                  yes            you              no
Female  excellent            good                   yes            you              no
Male    excellent            good                   yes            you              no
Female  good                 good                   no             you              yes
Female  vgood                good                   yes            you              no
Female  good                 good                   yes            you              yes
Female  excellent            vgood                  yes            you              no
Male    excellent            good                   yes            you              no
Female  excellent            good                   yes            you              no

Entropy formula:

$$E(S) = -\sum_{i=1}^{n} P_i \log_2(P_i)$$

Gain:

$$G(S, A) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} E(S_v)$$

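Compute the entropy:

Of the 24 records, 17 have Failure Year = no and 7 have Failure Year = yes.

$$E(S) = -\sum_{i=1}^{n} P_i \log_2(P_i) = -\frac{17}{24}\log_2\left(\frac{17}{24}\right) - \frac{7}{24}\log_2\left(\frac{7}{24}\right)$$

$$E(S) = 0.87$$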
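The entropy value can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `entropy_from_counts` is an assumption introduced for illustration and is not part of the original solution.

```python
from math import log2

def entropy_from_counts(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    # Empty classes are skipped: the limit of p*log2(p) as p -> 0 is 0.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# 17 records with Failure Year = no and 7 with Failure Year = yes
e_s = entropy_from_counts([17, 7])
print(round(e_s, 2))  # 0.87
```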
Compute the gain for each attribute:

$$G(S, A) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} E(S_v)$$
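For Sex

$$|S| = 24, \quad |S_{\text{Female}}| = 17 \ (11 \text{ no}, 6 \text{ yes}), \quad |S_{\text{Male}}| = 7 \ (6 \text{ no}, 1 \text{ yes})$$

$$E(S_{\text{Female}}) = -\frac{11}{17}\log_2\left(\frac{11}{17}\right) - \frac{6}{17}\log_2\left(\frac{6}{17}\right) = 0.937$$

$$E(S_{\text{Male}}) = -\frac{6}{7}\log_2\left(\frac{6}{7}\right) - \frac{1}{7}\log_2\left(\frac{1}{7}\right) = 0.592$$

$$G(S, \text{Sex}) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} E(S_v) = 0.871 - \frac{17}{24}(0.937) - \frac{7}{24}(0.592) \approx 0.03$$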
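The gain for a single attribute can be reproduced from the class counts of each subset. This is a sketch under the assumption that counts are written as (no, yes) pairs for Failure Year; both function names are illustrative, not taken from the original.

```python
from math import log2

def entropy_from_counts(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def gain_from_counts(parent_counts, subsets):
    """Information gain from the parent's class counts and each subset's counts."""
    total = sum(parent_counts)
    # Weighted average of the subset entropies, weights |S_v| / |S|.
    weighted = sum(sum(s) / total * entropy_from_counts(s) for s in subsets)
    return entropy_from_counts(parent_counts) - weighted

female = (11, 6)  # (no, yes) among the 17 Female records
male = (6, 1)     # (no, yes) among the 7 Male records
gain_sex = gain_from_counts((17, 7), [female, male])

print(round(entropy_from_counts(female), 3))  # 0.937
print(round(entropy_from_counts(male), 3))    # 0.592
print(round(gain_sex, 2))                     # 0.03
```

The same call works for the other attributes by passing one (no, yes) pair per attribute value.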

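For Family Relationship

$$|S| = 24, \quad |S_{\text{good}}| = 5 \ (2 \text{ no}, 3 \text{ yes})$$

$$E(S_{\text{good}}) = -\frac{2}{5}\log_2\left(\frac{2}{5}\right) - \frac{3}{5}\log_2\left(\frac{3}{5}\right) = 0.971$$

$$|S_{\text{vgood}}| = 9 \ (6 \text{ no}, 3 \text{ yes})$$

$$E(S_{\text{vgood}}) = -\frac{6}{9}\log_2\left(\frac{6}{9}\right) - \frac{3}{9}\log_2\left(\frac{3}{9}\right) = 0.918$$

$$|S_{\text{excellent}}| = 10 \ (9 \text{ no}, 1 \text{ yes})$$

$$E(S_{\text{excellent}}) = -\frac{9}{10}\log_2\left(\frac{9}{10}\right) - \frac{1}{10}\log_2\left(\frac{1}{10}\right) = 0.469$$

$$G(S, \text{Family Relationship}) = 0.871 - \frac{5}{24}(0.971) - \frac{9}{24}(0.918) - \frac{10}{24}(0.469) \approx 0.13$$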

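For Family Economic Level

$$|S| = 24, \quad |S_{\text{good}}| = 14 \ (8 \text{ no}, 6 \text{ yes})$$

$$E(S_{\text{good}}) = -\frac{8}{14}\log_2\left(\frac{8}{14}\right) - \frac{6}{14}\log_2\left(\frac{6}{14}\right) = 0.985$$

$$|S_{\text{vgood}}| = 9 \ (8 \text{ no}, 1 \text{ yes})$$

$$E(S_{\text{vgood}}) = -\frac{8}{9}\log_2\left(\frac{8}{9}\right) - \frac{1}{9}\log_2\left(\frac{1}{9}\right) = 0.503$$

$$|S_{\text{excellent}}| = 1 \ (1 \text{ no}, 0 \text{ yes})$$

$$E(S_{\text{excellent}}) = -\frac{1}{1}\log_2\left(\frac{1}{1}\right) - 0 = 0 \quad (\text{taking } 0 \log_2 0 = 0)$$

$$G(S, \text{Family Economic Level}) = 0.871 - \frac{14}{24}(0.985) - \frac{9}{24}(0.503) - \frac{1}{24}(0) \approx 0.11$$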

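For Study willing

$$|S| = 24, \quad |S_{\text{no}}| = 1 \ (0 \text{ no}, 1 \text{ yes})$$

$$E(S_{\text{no}}) = -0 - \frac{1}{1}\log_2\left(\frac{1}{1}\right) = 0 \quad (\text{taking } 0 \log_2 0 = 0)$$

$$|S_{\text{yes}}| = 23 \ (17 \text{ no}, 6 \text{ yes})$$

$$E(S_{\text{yes}}) = -\frac{17}{23}\log_2\left(\frac{17}{23}\right) - \frac{6}{23}\log_2\left(\frac{6}{23}\right) = 0.828$$

$$G(S, \text{Study willing}) = 0.871 - \frac{1}{24}(0) - \frac{23}{24}(0.828) \approx 0.08$$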

For Reason of study

$$|S| = 24, \quad |S_{\text{you}}| = 21 \ (14 \text{ no}, 7 \text{ yes})$$

$$E(S_{\text{you}}) = -\frac{14}{21}\log_2\left(\frac{14}{21}\right) - \frac{7}{21}\log_2\left(\frac{7}{21}\right) = 0.918$$

$$|S_{\text{average}}| = 2 \ (2 \text{ no}, 0 \text{ yes})$$

$$E(S_{\text{average}}) = -\frac{2}{2}\log_2\left(\frac{2}{2}\right) - 0 = 0$$

$$|S_{\text{family}}| = 1 \ (1 \text{ no}, 0 \text{ yes})$$

$$E(S_{\text{family}}) = -\frac{1}{1}\log_2\left(\frac{1}{1}\right) - 0 = 0$$

$$G(S, \text{Reason of study}) = 0.871 - \frac{21}{24}(0.918) - \frac{2}{24}(0) - \frac{1}{24}(0) \approx 0.07$$
We order the attributes from highest to lowest gain:

- Family Relationship: 0.13
- Family Economic Level: 0.11
- Study willing: 0.08
- Reason of study: 0.07
- Sex: 0.03
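The ranking can also be derived directly from the 24 records rather than from hand-counted subsets. The sketch below assumes the table is transcribed as whitespace-separated rows with Failure Year in the last position; `DATA`, `ROWS`, `entropy` and `gain` are names chosen for this illustration.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Sex", "Family Relationship", "Family Economic Level",
              "Study willing", "Reason of study"]

# The 24 records of the table; the last field is the class, Failure Year.
DATA = """
Female good good yes you no
Female excellent excellent yes average no
Male excellent vgood yes average no
Female vgood vgood yes you no
Male vgood vgood yes you no
Female vgood vgood yes you yes
Female excellent vgood yes you no
Male vgood good yes you yes
Female good good yes you yes
Female excellent good yes you yes
Male vgood vgood yes family no
Female good good yes you no
Male excellent good yes you no
Female vgood vgood yes you no
Female vgood good yes you yes
Female vgood vgood yes you no
Female excellent good yes you no
Male excellent good yes you no
Female good good no you yes
Female vgood good yes you no
Female good good yes you yes
Female excellent vgood yes you no
Male excellent good yes you no
Female excellent good yes you no
"""
ROWS = [line.split() for line in DATA.strip().splitlines()]

def entropy(rows):
    """Entropy of the class column (last field) over the given rows."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

def gain(rows, col):
    """Information gain obtained by splitting rows on column index col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder

gains = {name: gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
ranking = sorted(gains, key=gains.get, reverse=True)
for name in ranking:
    print(f"{name}: {gains[name]:.2f}")
```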
Finally, we build the decision tree, with the highest-gain attribute, Family Relationship, at the root.
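One way to construct such a tree is the standard ID3 recursion, sketched below. This is an assumption about the construction method, not necessarily the one used for the original figure: ID3 recomputes the gains inside every branch, so the attribute order below the root can differ from a single global ranking. All names (`build_tree`, `show`, `DATA`) are illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Sex", "Family Relationship", "Family Economic Level",
              "Study willing", "Reason of study"]

# The 24 records of the table; the last field is the class, Failure Year.
DATA = """
Female good good yes you no
Female excellent excellent yes average no
Male excellent vgood yes average no
Female vgood vgood yes you no
Male vgood vgood yes you no
Female vgood vgood yes you yes
Female excellent vgood yes you no
Male vgood good yes you yes
Female good good yes you yes
Female excellent good yes you yes
Male vgood vgood yes family no
Female good good yes you no
Male excellent good yes you no
Female vgood vgood yes you no
Female vgood good yes you yes
Female vgood vgood yes you no
Female excellent good yes you no
Male excellent good yes you no
Female good good no you yes
Female vgood good yes you no
Female good good yes you yes
Female excellent vgood yes you no
Male excellent good yes you no
Female excellent good yes you no
"""
ROWS = [line.split() for line in DATA.strip().splitlines()]

def entropy(rows):
    counts = Counter(row[-1] for row in rows)
    return sum(-(c / len(rows)) * log2(c / len(rows)) for c in counts.values())

def gain(rows, col):
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return entropy(rows) - sum(len(g) / len(rows) * entropy(g)
                               for g in groups.values())

def build_tree(rows, cols):
    """Return a class label (leaf) or a pair (attribute name, {value: subtree})."""
    labels = [row[-1] for row in rows]
    # Leaf: the node is pure, or no attribute is left (majority class, ties
    # resolved by first occurrence; the data contains conflicting duplicates).
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    remaining = [c for c in cols if c != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (ATTRIBUTES[best], branches)

def show(tree, indent=""):
    """Print the tree as an indented outline."""
    if isinstance(tree, str):
        print(f"{indent}-> Failure Year = {tree}")
        return
    name, branches = tree
    for value, subtree in branches.items():
        print(f"{indent}{name} = {value}")
        show(subtree, indent + "    ")

tree = build_tree(ROWS, list(range(len(ATTRIBUTES))))
show(tree)
```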

Question 2: Apply the chi-square (χ²) reduction technique

The data come from the previous dataset. We analyze whether the attribute Sex should be removed because it has little importance for the variable Failure Year.

        Failure Year
Sex     Yes  No   Total
Male    1    6    7
Female  6    11   17
Total   7    17   24

Step 1: Define the hypotheses

$$H_0: \text{there is no association between the variables Sex and Failure Year}$$

$$H_1: \text{there is an association between the variables Sex and Failure Year}$$
Step 2: Define the significance level for the test

$$\alpha = 0.05$$
Step 3: Compute the degrees of freedom

$$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$

Step 4: Establish the critical value for $H_0$ from the $\chi^2$ distribution

$$\chi^2_{1;\,0.05} = 3.8415$$
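The tabulated critical value can be reproduced with the Python standard library. The sketch relies on a property that holds only for 1 degree of freedom: a chi-square variable is then the square of a standard normal variable, so the upper 5% point is the square of the two-sided 5% z value.

```python
from statistics import NormalDist

alpha = 0.05
# Valid for 1 degree of freedom only: chi-square(1) is the square of N(0, 1),
# so P(chi2 > z**2) = P(|Z| > z) = alpha when z is the 1 - alpha/2 quantile.
z = NormalDist().inv_cdf(1 - alpha / 2)
critical = z ** 2
print(round(critical, 4))  # 3.8415
```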
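Step 5: Compute the test statistic

First, the expected frequency of each cell is its row total times its column total, divided by the grand total:

$$\begin{aligned}
\text{Male, Yes (observed 1)}: &\quad \frac{7 \cdot 7}{24} = 2.04 \\
\text{Male, No (observed 6)}: &\quad \frac{7 \cdot 17}{24} = 4.96 \\
\text{Female, Yes (observed 6)}: &\quad \frac{17 \cdot 7}{24} = 4.96 \\
\text{Female, No (observed 11)}: &\quad \frac{17 \cdot 17}{24} = 12.04
\end{aligned}$$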
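The expected frequencies follow mechanically from the margins of the contingency table. A minimal sketch, with the nested dictionary layout chosen only for readability:

```python
# Observed counts from the Sex x Failure Year contingency table.
observed = {
    "Male": {"Yes": 1, "No": 6},
    "Female": {"Yes": 6, "No": 11},
}

row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}
col_totals = {
    outcome: sum(cells[outcome] for cells in observed.values())
    for outcome in ("Yes", "No")
}
n = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    sex: {outcome: row_totals[sex] * col_totals[outcome] / n
          for outcome in col_totals}
    for sex in observed
}

for sex, cells in expected.items():
    for outcome, value in cells.items():
        print(f"{sex}, {outcome}: {value:.2f}")
```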
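Step 6:

With $f$ the observed frequency and $f_t$ the expected frequency of each cell:

$$\chi^2 = \sum \frac{(f - f_t)^2}{f_t} = \frac{(1 - 2.04)^2}{2.04} + \frac{(6 - 4.96)^2}{4.96} + \frac{(6 - 4.96)^2}{4.96} + \frac{(11 - 12.04)^2}{12.04} = 1.06$$

Step 7:

$H_0$ is rejected if $\chi^2 > \chi^2_{\text{table}}$ and is not rejected if $\chi^2 < \chi^2_{\text{table}}$.

Since the result obtained, 1.06, is less than the critical value 3.8415, $H_0$ is not rejected.

Step 8: Interpretation

With the data from our study, we do not have sufficient evidence to reject $H_0$, that there is NO association between the variables Sex and Failure Year. Sex therefore appears to have little importance for Failure Year and can be removed.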
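The whole test can be checked in a few lines. This is a sketch of the Pearson chi-square statistic without continuity correction, with the critical value hard-coded from the table lookup in Step 4; the variable names are illustrative.

```python
# Observed counts: rows are Male, Female; columns are Yes, No (Failure Year).
observed = [[1, 6],
            [6, 11]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        # Each cell contributes (observed - expected)^2 / expected,
        # so the statistic can never be negative.
        chi2 += (obs - exp) ** 2 / exp

critical = 3.8415  # chi-square table, 1 degree of freedom, alpha = 0.05
reject_h0 = chi2 > critical

print(round(chi2, 2))  # 1.06
print(reject_h0)       # False
```

Because each term is squared, a negative result signals a missed square in the numerator.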
