Exercises

1.
Consider the data set in heart.csv.

1. Perform an exploratory analysis of the variables.
2. Fit a logistic regression model using target as the dependent variable, with all
other variables as predictors.
3. Perform a residual analysis for the model, commenting on its goodness of fit.
4. Perform a test to ascertain the validity of the model.
5. Calculate a 95% confidence interval for the age coefficient.
6. Holding all other variables fixed, calculate the odds ratio for gender. Comment.
7. Using a stepwise variable selection algorithm, indicate a subset of variables that
produces a good model fit.
8. Fit a logistic regression model using the variables selected in the previous item
on a sample consisting of 80% of the observations in the dataset. Build a ROC curve
by validating on the remaining 20%, report its area under the curve and
comment on the prediction accuracy of the model.
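
For items 5 and 6, one common approach is the Wald interval (estimate plus or minus a normal quantile times the standard error), with the odds ratio obtained by exponentiating the coefficient. The sketch below is only an illustration: the helper names and the numeric values are hypothetical placeholders, not estimates from heart.csv.

```python
import math
from statistics import NormalDist


def wald_interval(beta, se, level=0.95):
    """Wald confidence interval for a coefficient on the log-odds scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return beta - z * se, beta + z * se


def odds_ratio(beta):
    """Odds ratio for a one-unit increase, other predictors held fixed."""
    return math.exp(beta)


# Hypothetical coefficient and standard error (NOT fitted to heart.csv).
beta_age, se_age = -0.005, 0.023
low, high = wald_interval(beta_age, se_age)
print(f"95% CI for the coefficient: ({low:.4f}, {high:.4f})")
print(f"Odds ratio: {odds_ratio(beta_age):.4f}")
print(f"95% CI for the odds ratio: ({math.exp(low):.4f}, {math.exp(high):.4f})")
```

Exponentiating the interval endpoints gives an interval for the odds ratio itself; an interval that contains 1 indicates no clear effect at that level.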

2.
Fit a multinomial regression model to the iris.csv data set to predict the species. Taking the
class with the highest predicted probability as the predicted outcome, build a classification matrix.
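
The maximum-probability rule can be sketched as follows, assuming the fitted model already returns one row of class probabilities per observation. The function name and the probabilities are made up for illustration and do not come from iris.csv.

```python
from collections import Counter


def classification_matrix(probabilities, actual, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter()
    for probs, truth in zip(probabilities, actual):
        # Predicted class = the one with the highest probability.
        best = max(range(len(labels)), key=lambda k: probs[k])
        counts[(truth, labels[best])] += 1
    return [[counts[(t, p)] for p in labels] for t in labels]


labels = ["setosa", "versicolor", "virginica"]
# Hypothetical predicted probabilities, one row per flower.
probs = [
    [0.90, 0.08, 0.02],
    [0.10, 0.60, 0.30],
    [0.05, 0.45, 0.50],
    [0.20, 0.35, 0.45],
]
actual = ["setosa", "versicolor", "virginica", "versicolor"]

matrix = classification_matrix(probs, actual, labels)
for label, row in zip(labels, matrix):
    print(f"{label:<12}{row}")
```

The diagonal holds the correctly classified observations; off-diagonal cells show which species are confused with each other.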

3.
The dataset bdiag.csv includes several imaging details from patients who had a biopsy to test
for breast cancer.

The variable Diagnosis classifies the biopsied tissue as M = malignant or B = benign.

1. Fit a logistic regression to predict Diagnosis using texture_mean and radius_mean.
2. Build the confusion matrix for the model above.
3. Plot the ROC curve for the fitted model and calculate the area under it.
4. Draw a scatter plot of texture_mean against radius_mean and add the decision boundary
for the prediction of Diagnosis based on the fitted model.
5. If you wanted to use the model above to predict the result of the biopsy, but wanted to
decrease the chances of a false negative test, what strategy could you use?
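
For item 3, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that calculation follows; the function name and the scores are hypothetical and not taken from bdiag.csv.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of malignancy (1 = M, 0 = B).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(f"AUC = {roc_auc(scores, labels):.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for a data set of this size; a library routine would normally be used in practice.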

4.
The SBI.csv dataset contains information on more than 2300 children who attended the
emergency services with fever and were tested for serious bacterial infection. The outcome sbi
has 4 categories: Not Applicable (no infection) / UTI / Pneum / Bact.

1. Build a multinomial model using wcc, age, prevAB, pct, and crp to predict sbi.
2. Compute the confusion matrix.
3. How does the model classify a 1-year-old child with wcc=29, pct=5, crp=200 and no
prevAB?

5.
The responses of the tobacco budworm Heliothis virescens to doses of the pyrethroid
trans-cypermethrin were recorded (budworm.csv) in a small experiment. Twenty male and twenty
female moths were exposed at each of six doses of the pyrethroid, and the outcome (killed or
not killed) was recorded.

1. Plot survival proportions against dose, distinguishing male and female moths.
2. Fit a logistic regression on the survival outcome with explanatory variables log(Dose)
and Gender.
3. Check the model for goodness of fit and calculate the McFadden R², commenting on its
value.
4. Determine the odds ratio comparing the odds of a male moth dying to the odds of a
female moth dying.
5. Determine if there is any evidence of a difference in the mortality rates between the
male and female moths.
6. Determine the 90% confidence interval for the gender effect.
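
McFadden's R² compares the log-likelihood of the fitted model with that of an intercept-only model: R² = 1 - logL(model) / logL(null). The sketch below evaluates both log-likelihoods on the individual (Bernoulli) scale, treating each moth as one observation. The counts, fitted probabilities and function names are hypothetical and do not come from budworm.csv.

```python
import math


def bernoulli_log_likelihood(events, trials, probabilities):
    """Log-likelihood of grouped binary outcomes, one term per moth."""
    return sum(
        k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k, n, p in zip(events, trials, probabilities)
    )


def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R-squared from two log-likelihoods."""
    return 1.0 - ll_model / ll_null


# Hypothetical data: moths killed out of moths exposed at three doses.
killed = [2, 10, 18]
exposed = [20, 20, 20]
fitted = [0.12, 0.50, 0.88]  # hypothetical fitted probabilities

# The intercept-only model predicts the overall proportion everywhere.
overall = sum(killed) / sum(exposed)
ll_model = bernoulli_log_likelihood(killed, exposed, fitted)
ll_null = bernoulli_log_likelihood(killed, exposed, [overall] * len(killed))
r2 = mcfadden_r2(ll_model, ll_null)
print(f"McFadden R2 = {r2:.3f}")
```

Values of this measure tend to be much lower than the R² of a linear regression, so they should not be read on the same scale.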

6.
After the explosion of the space shuttle Challenger on January 28, 1986, a study was conducted
[1, 4] to determine if previously-collected data about the ambient air temperature at the time
of launch could have been used to foresee potential problems with the launch (data set:
shuttles.csv).

1. Plot the data.
2. Fit and interpret a model for O-ring failure using temperature as an explanatory
variable.
3. Check the model for goodness of fit.
4. Perform a diagnostic analysis.
5. On the day of the Challenger launch, the forecast temperature was 31°F. What is the
predicted probability of an O-ring failure?
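
Once a logistic model has been fitted, the predicted probability at a given temperature is the inverse logit of the linear predictor. The sketch below shows only that final step; the function name and the coefficients are hypothetical placeholders, not estimates from shuttles.csv.

```python
import math


def failure_probability(temperature_f, intercept, slope):
    """Inverse logit of intercept + slope * temperature."""
    eta = intercept + slope * temperature_f
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients (NOT fitted to shuttles.csv).
intercept, slope = 10.0, -0.15

p31 = failure_probability(31.0, intercept, slope)
print(f"Predicted probability of failure at 31 F: {p31:.4f}")
```

If 31°F lies below the lowest temperature in the data, the prediction is an extrapolation and deserves a comment on how far it can be trusted.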
