PGDM 2019-2021
Name SAP ID
Saurabh Pratap Singh 80203190169
Aditya Yadav 80203190184
Yogeshkumar Shankariya 80203190158
Logistic Regression:
Excel Results:
Cutoff = 0.3

CONFUSION MATRIX    ACTUAL 1    ACTUAL 0
PREDICTED 1              156          49
PREDICTED 0                9          89

Sensitivity          0.945455
Specificity          0.644928
Accuracy             0.808581
Misclassification    0.191419
Cutoff = 0.4

CONFUSION MATRIX    ACTUAL 1    ACTUAL 0
PREDICTED 1              154          34
PREDICTED 0               11         104

Sensitivity          0.933333
Specificity          0.753623
Accuracy             0.851485
Misclassification    0.148515
Cutoff = 0.5

CONFUSION MATRIX    ACTUAL 1    ACTUAL 0
PREDICTED 1              148          29
PREDICTED 0               17         109

Sensitivity          0.89697
Specificity          0.789855
Accuracy             0.848185
Misclassification    0.151815
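Each metric above follows directly from the four cells of its confusion matrix. The sketch below is in Python, although the report itself used Excel and R, and the helper name `metrics` is an illustrative assumption rather than anything from the original workbook. It recomputes the values for the three Excel cutoffs from the counts shown above.

```python
def metrics(tp, fp, fn, tn):
    """Return the four summary metrics for one confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
    }

# Counts read from the Excel confusion matrices: (TP, FP, FN, TN)
excel_counts = {
    0.3: (156, 49, 9, 89),
    0.4: (154, 34, 11, 104),
    0.5: (148, 29, 17, 109),
}

results = {cutoff: metrics(*counts) for cutoff, counts in excel_counts.items()}
for cutoff, m in results.items():
    print(cutoff, {name: round(value, 6) for name, value in m.items()})
```

Lowering the cutoff raises sensitivity at the cost of specificity, which is the trade-off visible across the three tables.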
R Results:
Concordance 0.8971429
Discordance 0.1028571
Tied 1.387779e-17
Pairs 2100
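Concordance is computed over every pair made of one actual positive and one actual negative. A pair is concordant when the positive case gets the higher predicted probability, discordant when it gets the lower one, and tied otherwise. The 2100 pairs match the R confusion matrix for cutoff 0.397, which has 50 actual positives and 42 actual negatives (50 x 42 = 2100). The Python sketch below illustrates the calculation on hypothetical scores, not the report's data; the function name `concordance` is an assumption for illustration.

```python
def concordance(pos_scores, neg_scores):
    """Return concordant, discordant and tied fractions plus the pair count."""
    conc = disc = tied = 0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                conc += 1      # positive case ranked above negative case
            elif p < n:
                disc += 1      # positive case ranked below negative case
            else:
                tied += 1      # identical predicted probabilities
    pairs = len(pos_scores) * len(neg_scores)
    return conc / pairs, disc / pairs, tied / pairs, pairs

# Hypothetical predicted probabilities, for illustration only
pos = [0.9, 0.8, 0.6, 0.4]        # cases that actually have the disease
neg = [0.7, 0.4, 0.3, 0.2, 0.1]   # cases that actually do not

conc, disc, tied, pairs = concordance(pos, neg)
print(conc, disc, tied, pairs)
```

The three fractions always sum to 1, which is why the reported concordance and discordance add up to 1 once the negligible tied value is included.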
Cutoff = 0.397

CONFUSION MATRIX    ACTUAL 1    ACTUAL 0
PREDICTED 1               49          11
PREDICTED 0                1          31

Sensitivity          0.97698
Specificity          0.73781
Accuracy             0.86956
Misclassification    0.13042
Observations:
In both R and Excel, the optimal cutoff is around 0.4, which gives the best
trade-off between sensitivity and accuracy for the data provided.
Specificity: It is the ability of the test to correctly identify those without the
disease.
Sensitivity, specificity, accuracy and misclassification rate for each cutoff
are shown in the tables above.
The ROC curve plots true positive rate against false positive rate, giving a
picture of the whole spectrum of such tradeoffs.
A larger area under the ROC curve implies a better model. Our ROC curve has an
area of 0.8976.
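To show how an area under the ROC curve is obtained, here is a minimal Python sketch. It sweeps the cutoff over a set of hypothetical scores, collects the (false positive rate, true positive rate) points, and integrates them with the trapezoidal rule. The data and the function names `roc_points` and `auc` are illustrative assumptions, not the report's model output.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping the cutoff over all scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels (1 = disease) and predicted probabilities
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.4, 0.3, 0.2, 0.1]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(area)
```

On this toy data the area is 0.875. An area of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.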
Should the model be sensitive or specific?
First of all, it depends on the business problem, and the answer varies from
one business model to another. In some cases sensitivity is of utmost
importance, and in others specificity is vital.
So we should understand the business problem before deciding whether it calls
for sensitivity or specificity.
Here, the problem is the medical diagnosis of heart disease, where sensitivity
is critical: a missed case could prove fatal for the patient. Sensitivity
measures how often a test correctly gives a positive result for people who
have the condition being tested for. If a person has heart disease but the
model fails to predict it, that case is a false negative; it lowers the
sensitivity and puts the person's life at risk. The model should therefore be
designed for high sensitivity, because high sensitivity implies a low false
negative rate: it minimises false negatives and maximises true positives. So,
for heart disease prediction, sensitivity is of utmost importance.
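The link between sensitivity and false negatives can be made concrete: the false negative rate is FN / (TP + FN), which equals 1 minus sensitivity. The Python sketch below applies this to the actual-positive counts from the three Excel confusion matrices, to show how many of the 165 patients with heart disease each cutoff would miss. The variable names are illustrative assumptions.

```python
# (TP, FN) among actual positives at each Excel cutoff
positives = {0.3: (156, 9), 0.4: (154, 11), 0.5: (148, 17)}

fnr = {}
for cutoff, (tp, fn) in positives.items():
    sensitivity = tp / (tp + fn)
    fnr[cutoff] = fn / (tp + fn)   # share of diseased patients missed
    # The false negative rate is the complement of sensitivity
    assert abs(fnr[cutoff] - (1 - sensitivity)) < 1e-12
    print(f"cutoff {cutoff}: missed {fn} of {tp + fn} patients, "
          f"false negative rate {fnr[cutoff]:.4f}")
```

Moving the cutoff from 0.5 down to 0.3 reduces the missed cases from 17 to 9, at the cost of more false positives.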