Exercises for submission - First call

Abstract

This document contains the exercises that have to be submitted for the evaluation of the subject. Some of the exercises are included in the tutorials of the subject. In these cases, here we include a reference to where they can be found.

The deadline for the submission is the 25th of December. Delayed submissions will be accepted, but the highest attainable mark will be reduced by a factor of 0.98 every day (considering a whole day from the beginning of the day). That is, for a work submitted on the 26th of December (at any time), the highest possible mark will be 9.8 and for work submitted on the 27th of December, it will be 9.6.

All the submissions must include the code needed to solve the proposed problem, properly documented to clearly explain the proposed solution. Also, the results (when applicable) obtained should be properly presented and discussed. All this can be included in a document or in a PDF generated with knitr or RMarkdown, this being the preferred option.

Exercise 1

An outbreak of chickenpox has been detected in the autonomous community of the Basque Country. Since it is a disease that in most cases is contracted only once, we are interested in estimating the proportion of people in the UPV/EHU who have already had it, to assess the need to limit access to the university for a few days. For this purpose, a sample of 00 people (students and employees) was taken at UPV/EHU and asked if they had ever had chickenpox. 443 of them answered yes. On the other hand, according to a report by the Carlos III Health Institute, the average prevalence in Spain is 95% in the adult population, with a variability of 0.2. With those data, what conclusions would you draw? Why?

Exercise 2

Considering the data available in the data set medcost.RData, we are interested in adjusting a Bayesian linear regression model for the response variable charges.
To do so, first study the distribution of the response variable and model it consistently, considering the effect of the available covariates. Is it necessary to include all covariates in the model? Analyze which is the best model to explain the response variable, and interpret and justify the results obtained. The following is a summary of the variables available in the database:

• age: age of primary beneficiary
• sex: insurance contractor gender (female / male)
• bmi: Body mass index, providing an understanding of body weights that are relatively high or low relative to height; objective index of body weight (kg/m²) using the ratio of height to weight, ideally 18.5 to 24.9
• children: Number of children covered by health insurance
• smoker: Smoking (yes / no)
• region: the beneficiary's residential area in the US: northeast, southeast, southwest, northwest
• charges: Individual medical costs billed by health insurance

Exercise 3 (7.1 in Tutorial #7)

Using the Markov blanket is not the only way to select the most relevant variables for the selective Naive Bayes. Another alternative is using independence tests between the predicting variables and the class, including in the model only the variables where the null hypothesis (the variables being independent) can be rejected. In this exercise you have to implement a function similar to the learnSelectiveNB function above (same arguments), but using the hypothesis testing approach. In particular, the output of the function has to be compatible with the predictSelectiveNB function. Once implemented, use it in the mushroom dataset and compare the result with the version in the tutorial. For the independence test, there are a couple of alternatives.
You can use the $\chi^2$ test, implemented in the chisq.test function, or the test based on the mutual information, implemented in the ci.test function, available in the bnlearn package.

Exercise 4 (8.1 in Tutorial #8)

Using the code in the Plackett-Luce section of Tutorial #8 as a starting point:

• a) Implement a function to sample a Plackett-Luce model following the algorithm described in the section. The function should have this definition: samplePL(nsims, coefficients), where nsims indicates the number of samples to obtain and coefficients the coefficients of the model (that is, the $\alpha_i$ parameters). The function has to return a matrix where each row is a permutation.
• b) Get a sampling based approximation of the expected ranking of each algorithm.
• c) Get a sampling based approximation of the probability of the mode.
• d) Calculate the exact value of the expected ranking of each algorithm.

Exercise 5 (8.2 in Tutorial #8)

Taking into account that, in a Plackett-Luce model, the probability of an item $i$ being ranked before item $j$ is $\alpha_i / (\alpha_i + \alpha_j)$, calculate these probabilities for all the pairs of algorithms in the model trained in the Plackett-Luce section of Tutorial #8, and check them with the results obtained directly from the data.

Exercise 6 (8.3 in Tutorial #8)

Using the code in the Distance based models section of Tutorial #8 as a starting point:

• a) Get a sampling based approximation of the expected ranking of each algorithm for both the Mallows and the Generalized Mallows models. For the sampling, the PerMallows package includes two functions, rmm and rgmm.
• b) Calculate the exact value of the expected ranking of each algorithm.
• c) Get the exact probability (under both the Mallows and the Generalized Mallows model) that GA and IRS are the two worst algorithms. Approximate this probability using a sample of the distribution and compare it with the exact value.
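
As a rough illustration of the Plackett-Luce quantities that Exercises 4 and 5 refer to, the sketch below draws rankings by repeatedly picking the next item with probability proportional to its weight among the items not yet placed. It then compares the empirical frequency of item i preceding item j with the ratio alpha_i / (alpha_i + alpha_j). It is written in Python rather than R, uses only the standard library, and rests on several assumptions: the weights are made up (they are not the coefficients of the Tutorial #8 model), sample_pl is a hypothetical counterpart of samplePL, each row is read as an ordering (first position = best item), and the sampling scheme is the usual sequential one, which may differ in detail from the algorithm described in the tutorial.

```python
import random
from itertools import combinations


def sample_pl(nsims, coefficients, rng=None):
    """Draw nsims orderings from a Plackett-Luce model with positive weights.

    Hypothetical helper: each returned row lists item indices from the
    first (best) position to the last one.
    """
    rng = rng or random.Random()
    rows = []
    for _ in range(nsims):
        remaining = list(range(len(coefficients)))
        ordering = []
        while remaining:
            weights = [coefficients[i] for i in remaining]
            # Pick the next item with probability proportional to its
            # weight among the items that have not been placed yet.
            chosen = rng.choices(remaining, weights=weights, k=1)[0]
            ordering.append(chosen)
            remaining.remove(chosen)
        rows.append(ordering)
    return rows


def pairwise_exact(coefficients, i, j):
    """Closed-form probability that item i precedes item j."""
    return coefficients[i] / (coefficients[i] + coefficients[j])


def pairwise_empirical(rows, i, j):
    """Fraction of sampled orderings in which item i precedes item j."""
    wins = sum(1 for row in rows if row.index(i) < row.index(j))
    return wins / len(rows)


# Assumed, made-up weights for four items (not taken from the tutorial).
weights = [4.0, 2.0, 1.0, 1.0]
rows = sample_pl(20000, weights, random.Random(0))

for i, j in combinations(range(len(weights)), 2):
    exact = pairwise_exact(weights, i, j)
    approx = pairwise_empirical(rows, i, j)
    print(f"P({i} before {j}): exact={exact:.3f} sampled={approx:.3f}")
```

With a sample of this size the two columns should agree to roughly two decimal places. The same sampled rows could also be averaged position by position to approximate expected rankings, under the same ordering convention.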
