You are on page 1of 4

Summary

We do modeling and forecasting for X Education Company and look for ways to change the user
experience. We will better understand, use data, and draw conclusions to build better teams and
increase conversions. Let's discuss the following steps:

1. EDA:

• Quick check of percentage Critical Columns that are inferentially checked as critical with more than
45% missing values.

• We also see that rows with null values receive more data and are important. Therefore, we replace
the Nan values with "Not Given".

• Since most of the results are missing in India, we give all the results not given to India.

• We later found that the number of values for India was too high (almost 97% of the data), so this
row was deleted.

• We also looked at different numbers, ratios, and ratios.

2. Training-Test Split and Scaling:

• Training and testing data are split between 70% and 30% respectively.

• ['Total Number of Visits', 'Number of Page Views per Visit', 'Total Time Spent on the Website']

We will do minimum-maximum scaling of 3 variables.

3. Model Design

• RFE for feature selection.

• Then get the first 15 different RFEs.

• Then, variables were manually removed based on their VIF values and p values.

• The overall accuracy was checked by creating a confusion matrix and the result was 80.91%.

4. Evaluation of the model

• Sensitivity-Specificity
If we accept the sensitivity-specificity evaluation. We will get:

▪ Regarding the training data

o Find the optimal bound using the ROC curve. The area under the ROC curve is 0.88.

o After drawing the graph, we found that the best cut-off value is 0.35,

Accuracy 80.91%

Sensitivity 79.94%

Specificity 81.50%.

▪ Prediction on test data

o We achieved

Accuracy 80.02%

Sensitivity 79.23%

Specificity 80.50%

• Precision – Return:

If we take the Sensitivity – Return evaluation

▪ Regarding the training data

o At a cut-off value of 0.35, we achieve precision and return of 79.29% and 70.22%, respectively.

o Therefore, we need to change the exemption rate to increase the above percentage. After plotting
the graph, we found the best cut-off value: 0.44,

Accuracy 81.80%
Precision 75.71%

Recall 76.32%

▪ Regarding test data prediction

o We Achieve

Accuracy 80.57%

Precision 74.87%

73.26%

5. So if we go with Sensitivity-Specificity Evaluation the optimal cut-off value would be 0.35 &

If we go with Precision – Recall Evaluation the optimal cut-off value would be 0.44

Result

Variables that contribute the most to conversion:

• Potential customer location:

o Total visits

o Total time spent on site

• Potential customer sources:

o Potential Customer New Form

• Potential Customer Sources:

o Direct Traffic

o Google

o Welingak Site

o Organic Search

o Recommended Sites

Last Activity:

• Don't Send Email Yes


• Latest Activity Email Back

• Olark Chat Conversations

This model seems to work well at predicting price changes, and trust companies to make good calls
based on this We need to give the model.

You might also like