1.8) Based on your analysis and working on the business problem, detail out
appropriate insights and recommendations to help the management solve the
business objective.
Inferences
- Logistic Regression performed the best out of all the models built.
- Logistic Regression equation for the model:
(3.05008) * Intercept + (-0.01891) * age + (0.41855) * economic_cond_national + (0.06714) * economic_cond_household + (0.62627) * Blair + (-0.83974) * Hague + (-0.21413) * Europe + (-0.40331) * political_knowledge + (0.10881) * gender
The above equation helps in understanding the model and the feature importance, i.e. how each feature contributes to the predicted output.
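As a sketch, the equation above can be applied directly to score a single voter: the linear combination of the features is passed through the sigmoid function to get the probability of the positive class (assumed here to be Labour). The voter's feature values below are hypothetical, purely for illustration.

```python
import math

# Coefficients taken from the fitted Logistic Regression equation above
intercept = 3.05008
coeffs = {
    "age": -0.01891,
    "economic_cond_national": 0.41855,
    "economic_cond_household": 0.06714,
    "Blair": 0.62627,
    "Hague": -0.83974,
    "Europe": -0.21413,
    "political_knowledge": -0.40331,
    "gender": 0.10881,
}

def predict_proba(voter):
    """Probability of the positive class (assumed: Labour) for one voter."""
    z = intercept + sum(coeffs[f] * voter[f] for f in coeffs)
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# Hypothetical voter: these values are illustrative, not from the dataset
voter = {
    "age": 43, "economic_cond_national": 4, "economic_cond_household": 3,
    "Blair": 4, "Hague": 2, "Europe": 5, "political_knowledge": 2, "gender": 1,
}
print(round(predict_proba(voter), 3))
```

Note how the signs behave as the equation implies: raising the Blair assessment (positive coefficient) pushes the probability up, while raising the Hague assessment (negative coefficient) pushes it down.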
Top 5 features in the Logistic Regression model, in order of decreasing importance:
1. Hague: |-0.8181846212178241|
2. Blair: |0.5460018962250501|
3. economic_cond_national: |0.37700497490783885|
4. political_knowledge: |-0.3459485608005413|
5. Europe: |-0.19691071679312278|
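The ranking above amounts to sorting the coefficients by absolute value; a minimal sketch, using the coefficient values reported above (the remaining, less important features are omitted):

```python
# Coefficients of the Logistic Regression model as reported above
coeffs = {
    "Hague": -0.8181846212178241,
    "Blair": 0.5460018962250501,
    "economic_cond_national": 0.37700497490783885,
    "political_knowledge": -0.3459485608005413,
    "Europe": -0.19691071679312278,
}

# Rank features by the magnitude (absolute value) of their coefficient
ranked = sorted(coeffs.items(), key=lambda kv: abs(kv[1]), reverse=True)
for rank, (name, coef) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: |{coef}|")
```

Sorting by magnitude rather than raw value matters here: Hague's coefficient is the most negative, yet it is the single most influential feature.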
Insights and Recommendations
Our main Business Objective is - “To build a model, to predict which party a voter will vote for on the basis
of the given information, to create an exit poll that will help in predicting overall win and seats covered by a
particular party.”
- Use the Logistic Regression model without scaling for predicting the outcome, as it has the best optimised performance.
- Hyper-parameter tuning is an important aspect of model building. It has limitations, as processing these combinations requires a huge amount of processing power, but if tuning can be done with many sets of parameters, then we might get even better results.
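A small grid search over a handful of parameter sets, as suggested above, can be sketched with scikit-learn. The data here is synthetic (the real election dataset is not reproduced), and the grid is deliberately tiny, since every extra parameter value multiplies the number of model fits and hence the processing cost:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the election dataset
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Deliberately small grid: 3 values of C x 3-fold CV = 9 fits
param_grid = {"C": [0.1, 1.0, 10.0], "penalty": ["l2"], "solver": ["lbfgs"]}

grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                    cv=3, scoring="f1")
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The same pattern scales to richer grids (and to the boosting models discussed below) when more processing power is available.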
- Gathering more data will also help in training the models and thus improve their predictive power.
- Boosting models can also perform well; for example, CatBoost performed well even without tuning. Thus, if we perform hyper-parameter tuning we might get even better results.
- We can also create a function in which all the models predict the outcome in sequence. This will help in better understanding and in estimating the probability of what the outcome will be.

1.8 Based on these predictions, what are the insights?
- Accuracy of all the models is similar on train data and test data.
- AUC and ROC curves appear similar on train data and test data.
- Model scores of all the models for train data and test data are similar and close to each other.
- From the summary of the confusion matrix, we can see that the actual and the predicted data are very close to each other. This reflects a right-fit model.
- F1 scores of all the models for train data and test data are almost the same.
- Model tuning gives better results, but bagging performs well on both train data and test data.
- The boosting technique shows good performance.
- Based on the overall performance of all the models, we can come to the conclusion that there are no overfitting or underfitting issues in this case study.

8) Based on these predictions, what are the insights?
1) Comparing all the performance measures, the Naive Bayes model from the second iteration is performing best. There are some other models, such as SVM and Extreme Boosting, which perform almost the same as Naive Bayes, but the Naive Bayes model is very consistent when train and test results are compared with each other. Other parameters such as the Recall value, AUC score and AUC-ROC curve were also pretty good for this model.
2) The Labour party is performing better than the Conservative party by a huge margin.
3) Female voter turnout is greater than male voter turnout.
4) Those who perceive better national economic conditions prefer to vote for the Labour party.
5) Persons with higher Eurosceptic sentiment prefer to vote for the Conservative party.
6) Those who have higher political knowledge have voted for the Conservative party.
7) Looking at the assessment of both leaders, the Labour leader is performing well, as he has received better ratings in the assessment.