You are on page 1of 1

Part 2 Tasks

1. Break your data in to random parts train and test, train should contina 70% of the observation. Make
sure this random sampling is reproducible. [10]
2. Remove predictor variables with VIF>5 from the train data. [10]
3. Build logistic regression model for the response y with remaining variables. Use function step to drop
insignificant variables from your model. Check if p-values in resulting model for all the variables are
less than 0.05. If they are not , get rid of such variables sequentially. For thus obtained final model, get
scores and save them in the train data.[30]
4. Get cutoff of the score using KS method. Check performance of the model thus obtained on the test
data.[20]
5. Build a random Forest model on the same train data and report if performance of this model is better
than logistic regression model on the test data. Get a variable importance plot. [30]
6. Bonus Question: [ it is not mandatory to attempt this, if you do , you get bonus marks ] Pick top
6 variables from the variable importance plot and build a logistic regression model with just those
variables [ get cutoff using KS method]. See how it performs on the test data. [20]

You might also like