We carry out modeling and prediction for X Education Company and look for ways to improve the user
experience. By understanding the data better and drawing conclusions from it, we aim to build better
teams and increase conversions. The steps are discussed below:
1. EDA:
• A quick check of the percentage of missing values per column flags the columns with more than
45% missing values.
• Rows with null values still carry a lot of data and are important. Therefore, instead of dropping
them, we replace the NaN values with "Not Given".
• Since India is the most common value among the entries that are present, we assign all "Not Given"
entries to India.
• We later found that India then accounted for almost all of the values (almost 97% of the data), so
this column was dropped.
• Training and testing data are split between 70% and 30% respectively.
• ['Total Number of Visits', 'Number of Page Views per Visit', 'Total Time Spent on the Website']
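The EDA steps above can be sketched in pandas. This is a minimal illustration on made-up data, not the original notebook: the toy rows, the column names `Country` and `Mostly Empty`, the 0.95 dominance threshold, and the random seed are all assumptions chosen for the example. Only the 45% missing-value threshold, the "Not Given" label, and the 70/30 split come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative, not the real dataset's.
df = pd.DataFrame({
    "Country": ["India", np.nan, "India", "India", np.nan,
                "India", "India", "India", "India", "India"],
    "Mostly Empty": [np.nan, np.nan, np.nan, np.nan, np.nan,
                     1.0, np.nan, 2.0, np.nan, 3.0],
    "Converted": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
})

# Percentage of missing values in each column.
missing_pct = df.isna().mean() * 100

# Columns above the 45% threshold mentioned in the text.
high_missing = missing_pct[missing_pct > 45].index.tolist()

# Replace NaN with the "Not Given" label instead of dropping the rows.
df["Country"] = df["Country"].fillna("Not Given")

# Reassign "Not Given" to the most common value (India in the text).
df["Country"] = df["Country"].replace("Not Given", "India")

# A column dominated by one value carries little signal, so drop it.
# The 0.95 threshold is an assumption; the text reports almost 97%.
top_share = df["Country"].value_counts(normalize=True).iloc[0]
if top_share > 0.95:
    df = df.drop(columns=["Country"])

# 70% / 30% train-test split (the seed is arbitrary).
train = df.sample(frac=0.7, random_state=42)
test = df.drop(train.index)
```

On a real dataset the same calls apply unchanged; only the column names and thresholds would need to match the data at hand.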
3. Model Design
• Variables were then removed manually based on their VIF values and p-values.
• The overall accuracy was checked by creating a confusion matrix and the result was 80.91%.
• Sensitivity – Specificity
If we go with the sensitivity-specificity evaluation, we get:
o The optimal cut-off is found using the ROC curve. The area under the ROC curve is 0.88.
o After plotting the graph, we found that the best cut-off value is 0.35, which gives:
Accuracy 80.91%
Sensitivity 79.94%
Specificity 81.50%
o We achieved
Accuracy 80.02%
Sensitivity 79.23%
Specificity 80.50%
• Precision – Recall:
o At a cut-off value of 0.35, we achieve precision and recall of 79.29% and 70.22%, respectively.
o Therefore, we need to change the cut-off value to improve these percentages. After plotting
the graph, we found that the best cut-off value is 0.44, which gives:
Accuracy 81.80%
Precision 75.71%
Recall 76.32%
o We achieved:
Accuracy 80.57%
Precision 74.87%
Recall 73.26%
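The VIF screening mentioned at the start of this section can be sketched as follows. The variance inflation factor of a variable is 1 / (1 - R²), where R² comes from regressing that variable on all the others. The helper name `vif` and the synthetic data are assumptions for illustration; the text does not state which VIF threshold was used for removal.

```python
from typing import List

import numpy as np


def vif(X: np.ndarray) -> List[float]:
    """Variance inflation factor of each column of X (rows = observations)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        factors.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return factors


# Synthetic example: x3 is almost a copy of x1, x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
```

Here `x1` and `x3` get large factors because each is nearly explained by the other, while `x2` stays close to 1. Variables with a high factor are the candidates for manual removal.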
5. If we go with the Sensitivity – Specificity evaluation, the optimal cut-off value is 0.35, and
if we go with the Precision – Recall evaluation, the optimal cut-off value is 0.44.
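The two evaluations can be sketched in plain Python. The first function computes the confusion-matrix metrics at a given cut-off. The second picks the cut-off at which two chosen metrics are closest, which is one common way of reading such a plot. That selection rule is an assumption, since the text only says the value was found after plotting. The function names and the toy labels and probabilities are hypothetical.

```python
from typing import Dict, Optional, Sequence


def metrics_at_cutoff(y_true: Sequence[int], y_prob: Sequence[float],
                      cutoff: float) -> Dict[str, float]:
    """Confusion-matrix metrics when probabilities >= cutoff are labelled 1."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # same as recall
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def best_cutoff(y_true: Sequence[int], y_prob: Sequence[float],
                first: str, second: str,
                grid: Optional[Sequence[float]] = None) -> float:
    """Cut-off on the grid where the two named metrics are closest."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]

    def gap(c: float) -> float:
        m = metrics_at_cutoff(y_true, y_prob, c)
        return abs(m[first] - m[second])

    return min(grid, key=gap)


# Toy labels and predicted probabilities (not the report's data).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.15, 0.25, 0.30, 0.40, 0.45, 0.55, 0.70, 0.80, 0.95]

m = metrics_at_cutoff(y_true, y_prob, 0.35)
cut_ss = best_cutoff(y_true, y_prob, "sensitivity", "specificity")
cut_pr = best_cutoff(y_true, y_prob, "precision", "sensitivity")
```

Recall is the same quantity as sensitivity, so the precision-recall cut-off is found by balancing `precision` against `sensitivity`.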
Result
o Total visits
o Direct Traffic
o Google
o Welingak Site
o Organic Search
o Recommended Sites
Last Activity:
This model appears to predict conversions well, and the company should be able to rely on it to
make good calls.