
Workshop 5

The following table contains training data from an employee database. The data have been
generalized; for example, "31-35" for age represents the age range 31 to 35. For a given row
entry, count represents the number of data tuples having the values for department, status, age
and salary given in that row.
department  status  age    salary   count
sales       senior  31-35  46K-50K  30
sales       junior  26-30  26K-30K  40
sales       junior  31-35  31K-35K  40
systems     junior  21-25  46K-50K  20
systems     senior  31-35  66K-70K  5
systems     junior  26-30  46K-50K  3
systems     senior  41-45  66K-70K  3
marketing   senior  36-40  46K-50K  10
marketing   junior  31-35  41K-45K  4
secretary   senior  46-50  36K-40K  4
secretary   junior  26-30  26K-30K  6

Let status be the class label attribute.

1) Using Weka:
a. Build the tree using ID3, J48 and random forest. Compare the results.
b. Bayes net
c. Multilayer perceptron
d. LibSVM: try 4 different kernel types. Compare the results.
e.
2) Another method for solving Bayesian networks is belief propagation, also known as
sum-product message passing. Briefly describe what it consists of.
3) Do exercise 9.1 from the book.
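Task 1d varies the LibSVM kernel. LibSVM documents four built-in kernel types: linear u.v, polynomial (gamma*u.v + coef0)^degree, radial basis function exp(-gamma*|u-v|^2) and sigmoid tanh(gamma*u.v + coef0), with documented defaults degree = 3, coef0 = 0 and gamma = 1/number of features. A minimal sketch of the four functions on plain lists of floats follows; the feature vectors are illustrative only, since this report does not show how Weka encodes the nominal attributes before passing them to LibSVM.

```python
import math

# Sketch of LibSVM's four kernel types on plain lists of floats.
# gamma, coef0 and degree follow LibSVM's parameter names.


def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, gamma, coef0=0.0, degree=3):
    """K(u, v) = (gamma * u . v + coef0) ** degree"""
    return (gamma * linear_kernel(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    """K(u, v) = tanh(gamma * u . v + coef0)"""
    return math.tanh(gamma * linear_kernel(u, v) + coef0)


if __name__ == "__main__":
    u, v = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]  # illustrative feature vectors
    gamma = 1.0 / len(u)  # LibSVM's documented default: 1 / number of features
    print("linear    ", linear_kernel(u, v))
    print("polynomial", polynomial_kernel(u, v, gamma))
    print("rbf       ", rbf_kernel(u, v, gamma))
    print("sigmoid   ", sigmoid_kernel(u, v, gamma))
```

The kernel is the only thing that changes between the four LibSVM runs, so differences in their results come from how each kernel measures similarity between tuples.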

Solution

a. Id3
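ID3 splits on the attribute with the highest information gain, Gain(A) = H(status) - sum over values v of (|D_v|/|D|) * H(status | A = v); J48 (C4.5) uses the gain ratio, a normalized variant. A minimal sketch of the ID3 criterion on the weighted table, where each row counts as many times as its count column says. Salary ranges are spelled with a capital K throughout; this is an assumption, since the Weka trees that follow show 46K-50k and 46K-50K (and 41K-45k) as separate values, as if the data file mixed spellings. Merging the spellings does not change which attribute wins.

```python
from collections import Counter, defaultdict
from math import log2

# (department, status, age, salary, count) rows of the generalized table.
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]
ATTRIBUTES = {"department": 0, "age": 2, "salary": 3}
CLASS_INDEX = 1  # status is the class label


def entropy(class_counts):
    """Shannon entropy (bits) of a class distribution given as a Counter."""
    total = sum(class_counts.values())
    return -sum((c / total) * log2(c / total) for c in class_counts.values() if c)


def class_counts(rows):
    """Class distribution, weighting every row by its count column."""
    counts = Counter()
    for row in rows:
        counts[row[CLASS_INDEX]] += row[-1]
    return counts


def information_gain(rows, attr_index):
    """H(class) minus the count-weighted entropy of each partition on attr_index."""
    total = sum(row[-1] for row in rows)
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr_index]].append(row)
    remainder = sum(
        sum(r[-1] for r in part) / total * entropy(class_counts(part))
        for part in partitions.values()
    )
    return entropy(class_counts(rows)) - remainder


if __name__ == "__main__":
    print(class_counts(ROWS))
    for name, idx in ATTRIBUTES.items():
        print(f"Gain({name}) = {information_gain(ROWS, idx):.4f}")
```

The class split is 52 senior and 113 junior tuples, and salary ranks first, then age, then department, which is consistent with salary being the root of the ID3 and J48 trees.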

=== Classifier model (full training set) ===

Id3

salary = 46K-50k: senior
salary = 26K-30K: junior
salary = 31K-35K: junior
salary = 46K-50K
| department = sales: null
| department = systems: junior
| department = marketing: senior
| department = secretary: null
salary = 66K-70K: senior
salary = 41K-45k: junior
salary = 36K-40K: senior

Time taken to build model: 0.03 seconds
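Every run in this report ends with a Kappa statistic and a confusion matrix whose rows are the actual class and whose columns are the predicted class (a = senior, b = junior); the LibSVM runs also report MCC. A minimal sketch of both statistics computed from the confusion matrix with the standard formulas (Cohen's kappa and the Matthews correlation coefficient). That Weka computes them exactly this way internally is an assumption, but the sketch reproduces the values printed for the Bayes net and linear-kernel runs.

```python
import math


def kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance: product of each class's row and column totals.
    expected = sum(sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient of a 2x2 confusion matrix for one positive class."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    bayes_net = [[52, 0], [4, 109]]  # rows: actual senior, actual junior
    libsvm_linear = [[18, 34], [6, 107]]
    print(round(kappa(bayes_net), 4), round(kappa(libsvm_linear), 4))
    # senior as the positive class: tp=18, fn=34, fp=6, tn=107
    print(round(mcc(18, 34, 6, 107), 3))
```

For example, kappa([[52, 0], [4, 109]]) is about 0.945 (Bayes net), while kappa([[18, 34], [6, 107]]) is about 0.3429 and mcc(18, 34, 6, 107) about 0.386 (LibSVM, linear kernel).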

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         165              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        senior
                 1         0          1         1         1          1        junior
Weighted Avg.    1         0          1         1         1          1

=== Confusion Matrix ===

   a   b   <-- classified as
  52   0 |   a = senior
   0 113 |   b = junior

J48

=== Classifier model (full training set) ===

J48 pruned tree

------------------

salary = 46K-50k: senior (30.0)
salary = 26K-30K: junior (46.0)
salary = 31K-35K: junior (40.0)
salary = 46K-50K
|   department = sales: junior (0.0)
|   department = systems: junior (23.0)
|   department = marketing: senior (10.0)
|   department = secretary: junior (0.0)
salary = 66K-70K: senior (8.0)
salary = 41K-45k: junior (4.0)
salary = 36K-40K: senior (4.0)

Number of Leaves  :   10

Size of the tree :    12

Time taken to build model: 0.02 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         165              100      %

Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        senior
                 1         0          1         1         1          1        junior
Weighted Avg.    1         0          1         1         1          1

=== Confusion Matrix ===

   a   b   <-- classified as
  52   0 |   a = senior
   0 113 |   b = junior

Random forest

=== Classifier model (full training set) ===

Random forest of 10 trees, each constructed while considering 3 random features.
Out of bag error: 0
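A random forest combines its trees by averaging their class distributions (a soft vote; we believe this is how Weka's RandomForest, which builds on bagging, combines its 10 trees). A minimal sketch with hypothetical tree outputs shows why this can make the mean absolute error positive even when every instance is classified correctly, which is the difference between the J48 and random forest error figures in this report. The per-instance error averages |p_k - y_k| over the classes; that this matches Weka's mean absolute error for class distributions is our understanding, not something the report states.

```python
def average_distributions(tree_distributions):
    """Soft vote: the ensemble's class distribution is the mean of its trees' distributions."""
    n = len(tree_distributions)
    return [sum(d[k] for d in tree_distributions) / n for k in range(len(tree_distributions[0]))]


def absolute_error(distribution, true_class):
    """Per-instance absolute error against the 0/1 target vector, averaged over classes."""
    return sum(
        abs(p - (1.0 if k == true_class else 0.0)) for k, p in enumerate(distribution)
    ) / len(distribution)


if __name__ == "__main__":
    # Hypothetical votes for one senior tuple (class index 0): 9 trees are sure, 1 is not.
    votes = [[1.0, 0.0]] * 9 + [[0.3, 0.7]]
    dist = average_distributions(votes)
    predicted = max(range(len(dist)), key=dist.__getitem__)
    print(dist, predicted, absolute_error(dist, 0))
```

Here the forest still predicts senior, so the instance counts as correctly classified, but it contributes an absolute error of about 0.07 instead of 0; a single pure J48 leaf would contribute exactly 0.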

Time taken to build model: 0.04 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         165              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0.0029
Root mean squared error                  0.0199
Relative absolute error                  0.6805 %
Root relative squared error              4.2893 %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0          1         1         1          1        senior
                 1         0          1         1         1          1        junior
Weighted Avg.    1         0          1         1         1          1

=== Confusion Matrix ===

   a   b   <-- classified as
  52   0 |   a = senior
   0 113 |   b = junior

J48 annotates each leaf with the number of training tuples that reach it (for example, salary = 46K-50k: senior (30.0)) and reports the number of leaves (10) and the size of the tree (12); ID3 reports none of these. ID3 and J48 both classify all 165 instances correctly under cross-validation, and all of their error measures are 0. Random forest also classifies every instance correctly, but its mean absolute error (0.0029) and root mean squared error (0.0199) are not 0: its prediction is the average of the class distributions of its 10 trees, and those do not always put probability 1 on the correct class. In that sense J48 fits these data more precisely than random forest, although all three reach the same accuracy.

b. Bayes Net

=== Classifier model (full training set) ===

Bayes Network Classifier
not using ADTree
#attributes=4 #classindex=1
Network structure (nodes followed by parents)
department(4): status
status(2):
age(6): status
salary(7): status
LogScore Bayes: -671.2178322167754

LogScore BDeu: -734.4529682985715
LogScore MDL: -741.0…
LogScore ENTROPY: -667.0007744477259
LogScore AIC: -696.0…

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         161               97.5758 %
Incorrectly Classified Instances         4                2.4242 %
Kappa statistic                          0.945
Mean absolute error                      0.0273
Root mean squared error                  0.0912
Relative absolute error                  6.3016 %
Root relative squared error             19.6156 %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 1         0.035      0.929     1         0.963      1        senior
                 0.965     0          1         0.965     0.982      1        junior
Weighted Avg.    0.976     0.011      0.977     0.976     0.976      1

=== Confusion Matrix ===

   a   b   <-- classified as
  52   0 |   a = senior
   4 109 |   b = junior

The learned network has status as the only parent of department, age and salary, that is, a naive Bayes structure. The classifier reports the log score of this network under several scoring criteria (Bayes, BDeu, MDL, ENTROPY and AIC). It also shows how many instances were classified correctly (161) and incorrectly (4); the 4 errors are junior tuples predicted as senior.

c. Multilayer perceptron

=== Classifier model (full training set) ===

[Weight listing for Sigmoid Node 0 to Sigmoid Node 10; the individual values are not legible in this copy.
Output nodes: Node 0 = class senior, Node 1 = class junior; each has a Threshold and one weight per hidden node (Node 2 to Node 10).
Hidden nodes: Node 2 to Node 10; each has a Threshold and one weight per input: Attrib department=sales, systems, marketing, secretary; Attrib age=31-35, 26-30, 21-25, 41-45, 36-40, 46-50; Attrib salary=46K-50k, 26K-30K, 31K-35K, 46K-50K, 66K-70K, 41K-45k, 36K-40K.]
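Each Sigmoid Node in the weight listing adds its Threshold to the weighted sum of its inputs and passes the result through the logistic function: the hidden nodes read the binary attribute inputs, and the output nodes (Node 0 for senior, Node 1 for junior) read the hidden nodes' outputs. A minimal sketch of that forward pass, reading the printed Threshold as an additive bias. The weights in the example are hypothetical, not the ones Weka learned, and the 0/1 input coding is an assumption made for illustration (Weka may rescale the binary inputs before training).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def node_output(threshold, weights, inputs):
    """One sigmoid unit: sigmoid(threshold + sum(weight * input)).

    weights and inputs are dicts keyed by input name ("department=sales", "Node 2", ...);
    inputs missing from `inputs` count as 0.
    """
    return sigmoid(threshold + sum(w * inputs.get(name, 0.0) for name, w in weights.items()))


def one_hot(department, age, salary):
    """Binary inputs for one tuple (0/1 coding assumed for illustration)."""
    return {f"department={department}": 1.0, f"age={age}": 1.0, f"salary={salary}": 1.0}


def forward(hidden, output, inputs):
    """hidden/output map node name -> (threshold, weights); returns output activations."""
    hidden_out = {name: node_output(t, w, inputs) for name, (t, w) in hidden.items()}
    return {name: node_output(t, w, hidden_out) for name, (t, w) in output.items()}


if __name__ == "__main__":
    # Hypothetical one-hidden-node network, NOT the weights Weka learned.
    hidden = {"Node 2": (0.0, {"salary=66K-70K": 4.0})}
    output = {"Node 0": (-1.0, {"Node 2": 3.0}),   # class senior
              "Node 1": (1.0, {"Node 2": -3.0})}   # class junior
    print(forward(hidden, output, one_hot("systems", "41-45", "66K-70K")))
```

The predicted class is the output node with the largest activation; with these made-up weights a 66K-70K tuple activates Node 0 (senior) more strongly than Node 1 (junior).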

Time taken to build model: 2.… seconds

With this algorithm we can see every node, its input attributes and the weight assigned to each input.

d. LibSVM, kernel type: Linear

=== Classifier model (full training set) ===

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

Time taken to build model: 0.… seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         125               75.7576 %
Incorrectly Classified Instances        40               24.2424 %
Kappa statistic                          0.3429
Mean absolute error                      0.2424
Root mean squared error                  0.4924
Relative absolute error                 56.0304 %
Root relative squared error            105.9548 %
Coverage of cases (0.95 level)          75.7576 %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.346    0.053    0.750      0.346   0.474      0.386  0.647     0.466     senior
                 0.947    0.654    0.759      0.947   0.843      0.386  0.647     0.755     junior
Weighted Avg.    0.758    0.465    0.756      0.758   0.726      0.386  0.647     0.664

=== Confusion Matrix ===

   a   b   <-- classified as
  18  34 |   a = senior
   6 107 |   b = junior

LibSVM, kernel type: Polynomial

=== Classifier model (full training set) ===

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

Time taken to build model: 0.06 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         132               80      %
Incorrectly Classified Instances        33               20      %
Kappa statistic                          0.4409
Mean absolute error                      0.2
Root mean squared error                  0.4472

Relative absolute error                 46.2251 %
Root relative squared error             96.2382 %
Coverage of cases (0.95 level)          80      %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.365    0.000    1.000      0.365   0.535      0.532  0.683     0.565     senior
                 1.000    0.635    0.774      1.000   0.873      0.532  0.683     0.774     junior
Weighted Avg.    0.800    0.435    0.845      0.800   0.766      0.532  0.683     0.708

=== Confusion Matrix ===

   a   b   <-- classified as
  19  33 |   a = senior
   0 113 |   b = junior

LibSVM, kernel type: Radial basis function

=== Classifier model (full training set) ===

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

Time taken to build model: 0.03 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         165              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1
Mean absolute error                      0
Root mean squared error                  0
Relative absolute error                  0      %
Root relative squared error              0      %
Coverage of cases (0.95 level)         100      %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     senior
                 1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     junior
Weighted Avg.    1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000

=== Confusion Matrix ===

   a   b   <-- classified as
  52   0 |   a = senior
   0 113 |   b = junior

LibSVM, kernel type: Sigmoid

=== Classifier model (full training set) ===

LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)

Time taken to build model: 0.04 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         143               86.6667 %
Incorrectly Classified Instances        22               13.3333 %
Kappa statistic                          0.6513
Mean absolute error                      0.1333
Root mean squared error                  0.3651
Relative absolute error                 30.8167 %
Root relative squared error             78.5782 %
Coverage of cases (0.95 level)          86.6667 %
Mean rel. region size (0.95 level)      50      %
Total Number of Instances              165

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.577    0.000    1.000      0.577   0.732      0.695  0.788     0.710     senior
                 1.000    0.423    0.837      1.000   0.911      0.695  0.788     0.837     junior
Weighted Avg.    0.867    0.290    0.888      0.867   0.855      0.695  0.788     0.797

=== Confusion Matrix ===

   a   b   <-- classified as
  30  22 |   a = senior
   0 113 |   b = junior

For each kernel, LibSVM reports the number and percentage of instances classified correctly and incorrectly, the mean absolute and root mean squared errors, the relative absolute and root relative squared errors, and the coverage of cases at the 0.95 level. The detailed accuracy table gives, per class, the TP rate, FP rate, precision, recall, F-measure, MCC, ROC area and PRC area; all of them lie between 0 and 1 except MCC, which ranges from -1 to 1. Because each kernel changes how many instances are classified correctly and incorrectly, all of these figures change from one kernel to another: the radial basis function kernel is best (100 % correct), followed by sigmoid (86.6667 %), polynomial (80 %) and linear (75.7576 %). With the radial basis function kernel no instance was misclassified, so every error measure, including the root mean squared error, is 0, the FP rate is 0, and the TP rate, precision, recall, F-measure and MCC are all 1.

2. Belief propagation

Belief propagation is an algorithm for performing inference on graphical models such as Bayesian networks and Markov random fields: it computes the marginal distribution of each unobserved node, given the observed ones, by passing messages between neighboring nodes. The message a node sends to a neighbor sums, over the sender's states, the product of the local factors and the messages the sender has received from its other neighbors, hence the name sum-product. On tree-structured graphs the resulting marginals are exact; on general graphs with cycles, running the same updates iteratively (loopy belief propagation) has proven useful as an approximate algorithm. It is used in artificial intelligence and information theory.