Homework 3

(due: 23:59, 13/Dec/2021)


Use the telecom_Churn.xlsx dataset (on Moodle) to:
1. [40 points] obtain a model estimating the likelihood of customer churn using different
modeling approaches that you know,
2. [40 points] compare several model evaluation metrics and justify your model selection
decision based on performance across those measures (e.g. ROC AUC, recall, etc.),
3. [20 points] interpret the results of the final model.
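
For reference, the measures named above are conventionally defined as follows. This is a sketch of the standard definitions, not notation taken from the course material; it assumes churn is the positive class, and the symbols TP, FN, TN, FP (confusion-matrix counts) and s(x) (the model's predicted churn score) are introduced here for illustration.

```latex
\begin{align*}
\text{Sensitivity (recall)} &= \frac{TP}{TP + FN}, \\
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{ROC AUC} &= P\bigl(s(x^{+}) > s(x^{-})\bigr)
  \quad \text{for a random churner } x^{+} \text{ and a random non-churner } x^{-}.
\end{align*}
```

Sensitivity and specificity depend on the probability cut-off chosen for classification, whereas ROC AUC summarizes ranking quality over all cut-offs (the pairwise form above assumes no tied scores).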

Below are the steps you need to take in Python.


1. Read the dataset.
2. Check whether there are any missing values or duplicates and, if so, deal with them
accordingly.
3. Check whether any variables have only one value and, if so, remove them.
4. Check whether any features are correlated and, if so, deal with them
accordingly.
5. Do some descriptive analysis (visualizations), which may help to improve the model
afterwards.
6. Calculate the percentage of churn in your dataset to get a sense of the baseline
accuracy you could achieve without using any model (naive approach).
7. Develop a model which estimates the probability of churn as accurately as possible (you
may do variable transformations).
8. Calculate Sensitivity, Specificity, ROC AUC and any other measure you find appropriate
for evaluating model performance, and choose a winning model.
9. Interpret the winning model.
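
As an illustration of steps 6 and 8, the sketch below computes the naive benchmark and the three named measures from scratch. It is a minimal example under stated assumptions, not a required implementation: the labels and scores are hypothetical toy values (not taken from telecom_Churn.xlsx), churn is coded as 1, the 0.5 cut-off is an arbitrary choice, and all function names are made up for this example.

```python
# Hypothetical toy data: 1 = churned, 0 = stayed (NOT from telecom_Churn.xlsx).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
# Hypothetical predicted churn probabilities from some fitted model.
y_score = [0.10, 0.20, 0.15, 0.40, 0.05, 0.30, 0.80, 0.35, 0.60, 0.90]


def naive_accuracy(y):
    """Accuracy of always predicting the majority class (step 6 benchmark)."""
    churn_rate = sum(y) / len(y)
    return max(churn_rate, 1 - churn_rate)


def sensitivity_specificity(y, scores, threshold=0.5):
    """Sensitivity and specificity at a given probability cut-off.

    Assumes both classes are present in y.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y, preds) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y, scores):
    """ROC AUC as the share of (churner, non-churner) pairs in which the
    churner gets the higher score; tied pairs count as one half."""
    pos = [s for t, s in zip(y, scores) if t == 1]
    neg = [s for t, s in zip(y, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


baseline = naive_accuracy(y_true)
sens, spec = sensitivity_specificity(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(f"naive accuracy: {baseline:.3f}")
print(f"sensitivity:    {sens:.3f}")
print(f"specificity:    {spec:.3f}")
print(f"ROC AUC:        {auc:.3f}")
```

On imbalanced churn data the naive accuracy can already look high, which is why the comparison in step 8 relies on sensitivity, specificity and ROC AUC rather than accuracy alone. In practice a library such as scikit-learn provides equivalent, better-tested functions.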
