Diagnostics
U Dinesh Kumar
[Figure: Objective Fitness Test, comparing the observed value of Y with the predicted value of Y without the variable in the model and with the variable in the model.]
Lecture Outline
R² in Logistic Regression
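Wald Test – Challenger Data
Wald test hypothesis: $H_0: \beta_1 = 0$ versus $H_A: \beta_1 \neq 0$.
The Wald statistic $W = \left( \hat{\beta}_1 / SE(\hat{\beta}_1) \right)^2$ is a chi-square statistic with 1 degree of freedom.
Wald $= (0.036/0.006)^2 = 33.066$ (the reported value uses the unrounded estimate and standard error; the rounded values shown give 36).
The p-value is less than 0.05, so $H_0$ is rejected and the variable is statistically significant.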
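A minimal Python sketch of the Wald test for one coefficient, using only the standard library; the function name `wald_test` is hypothetical. For one degree of freedom the chi-square tail probability equals `erfc(sqrt(W/2))`, so no statistics package is needed. With the rounded inputs 0.036 and 0.006 it returns W = 36, not the 33.066 printed from the unrounded output.

```python
import math


def wald_test(beta: float, se: float):
    """Return the Wald statistic W = (beta / se)^2 and its p-value.

    Under H0: beta = 0, W is chi-square with 1 df, and
    P(chi2_1 > W) = P(|Z| > sqrt(W)) = erfc(sqrt(W / 2)).
    """
    w = (beta / se) ** 2
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Rounded coefficient and standard error as shown on the slide.
w, p = wald_test(0.036, 0.006)
print(f"Wald = {w:.3f}, p-value = {p:.2e}, significant at 5%: {p < 0.05}")
```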
For a statistically significant variable, the confidence interval for Exp(β) will not contain the value 1.
Deviance (goodness of fit test)
$$L(\beta_0, \beta_1) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$
$$\ln L(\beta_0, \beta_1) = LL(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i \ln(\pi_i) + (1 - y_i) \ln(1 - \pi_i) \right]$$
$$-2LL(\beta_0, \beta_1) = -2 \sum_{i=1}^{n} \left[ y_i \ln(\pi_i) + (1 - y_i) \ln(1 - \pi_i) \right]$$
where
$$\pi_i = \frac{e^{-1.635 + 0.036 X_i}}{1 + e^{-1.635 + 0.036 X_i}}$$
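A minimal sketch of how −2LL is evaluated for the fitted model above. The coefficients −1.635 and 0.036 come from the slide; the five observations are hypothetical and are not the data set used in the lecture, and the function names are illustrative only.

```python
import math


def fitted_prob(x: float, b0: float = -1.635, b1: float = 0.036) -> float:
    """pi = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)), in the equivalent form 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def minus_two_ll(xs, ys, b0: float = -1.635, b1: float = 0.036) -> float:
    """-2LL = -2 * sum over i of [y_i ln(pi_i) + (1 - y_i) ln(1 - pi_i)]."""
    ll = 0.0
    for x, y in zip(xs, ys):
        pi = fitted_prob(x, b0, b1)
        ll += y * math.log(pi) + (1 - y) * math.log(1.0 - pi)
    return -2.0 * ll


# Hypothetical observations (X values and 0/1 outcomes), for illustration only.
xs = [12, 24, 36, 48, 60]
ys = [0, 0, 1, 1, 1]
print(f"-2LL = {minus_two_ll(xs, ys):.3f}")
```

A smaller −2LL means the model assigns higher probability to the observed outcomes.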
-2LL before and after adding the variable “duration”: $-2LL_0$ for the model without “duration” and $-2LL$ for the model with it.
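Model Chi-Square
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_A$: Not all $\beta$s are zero
Null model: the model without any predictor variable ($-2LL_0$).
Given model: the model with the predictor variables ($-2LL$).
$$G = \chi^2 = (-2LL_0) - (-2LL)$$
Under $H_0$, G follows a chi-square distribution with k degrees of freedom.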
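A minimal standard-library sketch of the model chi-square test. `chi2_sf` uses the closed-form chi-square tail probability for integer degrees of freedom (a Poisson-type sum for even df, an `erfc`-based series for odd df), so it only covers integer df; the −2LL values in the example are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi-square_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^(-x/2) * sum_i x^((2i-1)/2) / (1*3*...*(2i-1))
    term, total = math.sqrt(x), 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def model_chi_square(minus2ll_null: float, minus2ll_model: float, k: int):
    """G = (-2LL0) - (-2LL) and its p-value with k degrees of freedom."""
    g = minus2ll_null - minus2ll_model
    return g, chi2_sf(g, k)


# Hypothetical -2LL values for a model with k = 3 predictors.
g, p = model_chi_square(1000.0, 985.0, k=3)
print(f"G = {g:.2f}, p-value = {p:.4f}")
```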
Sensitivity & Specificity
Sensitivity = Number of true positives / (Number of true positives + Number of false negatives)
Specificity = Number of true negatives / (Number of true negatives + Number of false positives)
$$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{115}{115 + 124} = 48.1\%$$
$$\text{Specificity} = \frac{TN}{TN + FP} = \frac{507}{507 + 54} = 90.4\%$$
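$$\text{Precision} = \frac{TP}{TP + FP} = \frac{19}{19 + 49} = 27.94\%$$
$$\text{F-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where Recall is the same as Sensitivity.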
Measures of Classification
Values of Various Measures: German Credit Rating Example
Measure        Value*
Sensitivity    48.1%
Specificity    90.4%
Precision      68.04%
F-Score        56.35%
* cut-off = 0.50
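The values in this table can be recomputed from the confusion-matrix counts used on the sensitivity and specificity slide (TP = 115, FN = 124, TN = 507, FP = 54). A minimal sketch, with a hypothetical helper name; the results agree with the table up to rounding in the last digit.

```python
def classification_measures(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, precision and F-score from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# German credit counts at cut-off 0.50.
measures = classification_measures(tp=115, fn=124, tn=507, fp=54)
for name, value in measures.items():
    print(f"{name:12s} {100 * value:6.2f}%")
```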
Receiver Operating Characteristics (ROC) Curve
The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) over all cut-off values and compares it with random classification (the diagonal line).
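The larger the area under the ROC curve (AUC), the better the prediction ability; an AUC of 0.5 indicates no discrimination.
Concordant and Discordant Pairs
Divide the dataset into positives (y=1) and negatives (y=0), and pair every positive with every negative. A pair is concordant if the positive has the higher predicted probability, discordant if it has the lower one, and tied if the two are equal. The AUC equals the proportion of concordant pairs, with tied pairs counted as one half.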
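A brute-force sketch of the concordant/discordant pair count and the resulting AUC. The predicted probabilities and labels are hypothetical; comparing every positive with every negative is O(n_pos × n_neg), which is fine for small examples but not for large data sets.

```python
def concordance_auc(probs, labels):
    """Count concordant, discordant and tied (positive, negative) pairs; AUC = (C + 0.5 T) / pairs."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    concordant = discordant = ties = 0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                concordant += 1
            elif pp < pn:
                discordant += 1
            else:
                ties += 1
    auc = (concordant + 0.5 * ties) / (len(pos) * len(neg))
    return concordant, discordant, ties, auc


# Hypothetical predicted probabilities and observed outcomes.
probs = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
c, d, t, auc = concordance_auc(probs, labels)
print(f"concordant={c}, discordant={d}, ties={t}, AUC={auc:.3f}")
```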
[Figure: Classification Plot: Challenger Crash. Frequency of the observed groups (0 and 1) against the predicted probability from 0 to 1, with cases on the wrong side of the 0.5 cut-off marked as misclassification.]
Youden’s Index
Youden's index is a measure of diagnostic accuracy. It is also a global measure of test performance, used to evaluate the overall discriminative power of a diagnostic procedure.
Youden's index is calculated by subtracting 1 from the sum of the test's sensitivity and specificity.
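$$J = \text{Sensitivity} + \text{Specificity} - 1 = \frac{N_{11}}{N_{10} + N_{11}} + \frac{N_{00}}{N_{00} + N_{01}} - 1$$
where $N_{ij}$ is the number of observations with actual class i that are classified as class j.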
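A minimal sketch of choosing the cut-off that maximises Youden's index over a grid of candidate values. It assumes an observation is classified as positive when its predicted probability is at least the cut-off; the data and the 0.1 grid are hypothetical.

```python
def youden_index(probs, labels, cutoff: float) -> float:
    """J = sensitivity + specificity - 1, classifying as positive when prob >= cutoff."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < cutoff)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cutoff)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= cutoff)
    return tp / (tp + fn) + tn / (tn + fp) - 1


def best_youden_cutoff(probs, labels, cutoffs):
    """Return the candidate cut-off with the largest Youden's index."""
    return max(cutoffs, key=lambda c: youden_index(probs, labels, c))


# Hypothetical predicted probabilities and observed outcomes.
probs = [0.9, 0.8, 0.65, 0.35, 0.55, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
cutoffs = [i / 10 for i in range(1, 10)]
best = best_youden_cutoff(probs, labels, cutoffs)
print(f"best cut-off = {best}, J = {youden_index(probs, labels, best):.2f}")
```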
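Optimal cut-off: choose the cut-off probability p that minimises the total misclassification cost
$$\min_{p} \; C_{01} N_{01} + C_{10} N_{10}$$
where $C_{01}$ and $C_{10}$ are the costs of the two types of misclassification and $N_{01}$, $N_{10}$ are the corresponding numbers of misclassified observations at cut-off p.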
German Credit Rating – Cost-based cut-off
Cut-off Probability   P00    P01    P10    P11    C01      C10      Cost     Youden's Index
0.05                  0.15   0.85   0.01   0.99   100.00   200.00   88.00    0.13
0.10                  0.32   0.68   0.07   0.93   100.00   200.00   81.80    0.25
0.15                  0.48   0.52   0.11   0.89   100.00   200.00   74.20    0.37
0.20                  0.56   0.44   0.14   0.86   100.00   200.00   72.30    0.42
0.25                  0.63   0.37   0.21   0.80   100.00   200.00   78.10    0.42
0.28                  0.67   0.33   0.23   0.77   100.00   200.00   77.80    0.45
0.30                  0.71   0.29   0.24   0.76   100.00   200.00   77.30    0.47
0.35                  0.76   0.24   0.76   0.70   100.00   200.00   175.30   0.46
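A sketch that recomputes the Cost and Youden's Index columns from the probabilities in the table. It assumes P00 is specificity, P11 is sensitivity, and P01 and P10 are the two misclassification rates, with cost taken as C01·P01 + C10·P10; this interpretation is inferred from the table (it matches the Youden's Index column and approximately reproduces the Cost column), not stated on the slide. Both criteria agree with the table: the lowest cost is at 0.20 and the highest Youden's index at 0.30.

```python
# (cut-off, P00, P01, P10, P11) as listed in the table above.
rows = [
    (0.05, 0.15, 0.85, 0.01, 0.99),
    (0.10, 0.32, 0.68, 0.07, 0.93),
    (0.15, 0.48, 0.52, 0.11, 0.89),
    (0.20, 0.56, 0.44, 0.14, 0.86),
    (0.25, 0.63, 0.37, 0.21, 0.80),
    (0.28, 0.67, 0.33, 0.23, 0.77),
    (0.30, 0.71, 0.29, 0.24, 0.76),
    (0.35, 0.76, 0.24, 0.76, 0.70),
]
C01, C10 = 100.0, 200.0  # misclassification costs from the table


def expected_cost(row) -> float:
    """Assumed cost formula: C01 * P01 + C10 * P10."""
    _, _, p01, p10, _ = row
    return C01 * p01 + C10 * p10


def youden(row) -> float:
    """Specificity (P00) + sensitivity (P11) - 1."""
    _, p00, _, _, p11 = row
    return p00 + p11 - 1


for row in rows:
    print(f"cut-off {row[0]:.2f}: cost {expected_cost(row):7.2f}, Youden {youden(row):.2f}")

print("minimum-cost cut-off:", min(rows, key=expected_cost)[0])
print("maximum-Youden cut-off:", max(rows, key=youden)[0])
```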
Gain Chart
Decile   Cumulative Observations   Cumulative Positives without Model   Positives using Model   Cumulative Positives   Gain
1        452.1                     52.1                                 223                     223                    0.428
2        904.2                     104.2                                122                     345                    0.662
3        1356.3                    156.3                                74                      419                    0.804
4        1808.4                    208.4                                38                      457                    0.877
5        2260.5                    260.5                                27                      484                    0.929
6        2712.6                    312.6                                11                      495                    0.950
7        3164.7                    364.7                                18                      513                    0.985
8        3616.8                    416.8                                3                       516                    0.990
9        4068.9                    468.9                                4                       520                    0.998
10       4521.0                    521.0                                1                       521                    1.000
[Figure: cumulative gain of the model compared with the random model]
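The Gain column is the cumulative number of positives captured by the model divided by the total number of positives (521). A short sketch using the per-decile counts from the table; the function name is illustrative.

```python
from itertools import accumulate


def cumulative_gain(positives_per_decile):
    """Cumulative share of all positives captured after each decile."""
    total = sum(positives_per_decile)
    return [c / total for c in accumulate(positives_per_decile)]


# Positives captured by the model in each decile (from the table above).
positives = [223, 122, 74, 38, 27, 11, 18, 3, 4, 1]
gains = cumulative_gain(positives)
for decile, gain in enumerate(gains, start=1):
    print(decile, round(gain, 3))
```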
Lift
Decile   Cumulative Observations   Cumulative Positives without Model   Positives using Model   Cumulative Positives   Gain    Lift
1        452.1                     52.1                                 223                     223                    0.428   4.280
2        904.2                     104.2                                122                     345                    0.662   3.311
3        1356.3                    156.3                                74                      419                    0.804   2.681
4        1808.4                    208.4                                38                      457                    0.877   2.193
5        2260.5                    260.5                                27                      484                    0.929   1.858
6        2712.6                    312.6                                11                      495                    0.950   1.583
7        3164.7                    364.7                                18                      513                    0.985   1.407
8        3616.8                    416.8                                3                       516                    0.990   1.238
9        4068.9                    468.9                                4                       520                    0.998   1.109
10       4521.0                    521.0                                1                       521                    1.000   1.000
[Figure: lift chart by decile]
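The Lift column divides the cumulative gain by the fraction of observations covered so far (decile/10), i.e. how many times better the model is than random selection. A short sketch reproducing it from the per-decile counts in the table; the function name is illustrative.

```python
from itertools import accumulate


def lift_by_decile(positives_per_decile):
    """Lift = cumulative gain / fraction of observations covered (decile / number of deciles)."""
    total = sum(positives_per_decile)
    n = len(positives_per_decile)
    return [
        (cum / total) / (decile / n)
        for decile, cum in enumerate(accumulate(positives_per_decile), start=1)
    ]


positives = [223, 122, 74, 38, 27, 11, 18, 3, 4, 1]
lifts = lift_by_decile(positives)
for decile, lift in enumerate(lifts, start=1):
    print(decile, round(lift, 3))
```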
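R² in Logistic Regression
Cox and Snell R²:
$$R^2_{CS} = 1 - \left( \frac{L(\text{Null Model})}{L(\text{Model})} \right)^{2/n}$$
Nagelkerke R²:
$$R^2_{N} = \frac{1 - \left( L(\text{Null Model}) / L(\text{Full Model}) \right)^{2/n}}{1 - L(\text{Null Model})^{2/n}}$$
Here L denotes the likelihood of a model (not its log-likelihood) and n is the number of observations.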
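Because the likelihoods themselves are tiny numbers, a sketch is easier to write in terms of log-likelihoods: $(L_0/L_M)^{2/n} = e^{(2/n)(LL_0 - LL_M)}$. The log-likelihood values and n below are hypothetical, and the function names are illustrative.

```python
import math


def cox_snell_r2(ll_null: float, ll_model: float, n: int) -> float:
    """1 - (L_null / L_model)^(2/n), computed as 1 - exp((2/n)(LL_null - LL_model))."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Cox and Snell R^2 divided by its maximum possible value, 1 - L_null^(2/n)."""
    return cox_snell_r2(ll_null, ll_model, n) / (1.0 - math.exp((2.0 / n) * ll_null))


# Hypothetical log-likelihoods (LL, not -2LL) for n = 50 observations.
ll0, ll1, n = -30.0, -22.0, 50
print(f"Cox & Snell R2 = {cox_snell_r2(ll0, ll1, n):.3f}")
print(f"Nagelkerke R2  = {nagelkerke_r2(ll0, ll1, n):.3f}")
```

The Nagelkerke rescaling lets the statistic reach 1 for a perfect fit, which the Cox and Snell version cannot.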
R-square for Challenger Data
Model Summary
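CI for Challenger Beta Value
$$\hat{\beta}_1 \pm z_{1-\alpha/2} \, SE(\hat{\beta}_1)$$
$$\hat{\beta}_0 \pm z_{1-\alpha/2} \, SE(\hat{\beta}_0)$$
95% CI for $\beta_1$: (0.02424, 0.0477)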
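Confidence Intervals for Exp(β1)
The confidence interval for the odds ratio, $\exp(\beta_1)$, can be obtained by transforming the confidence interval for $\beta_1$:
$$\left( e^{\hat{\beta}_1 - z_{1-\alpha/2} SE(\hat{\beta}_1)}, \; e^{\hat{\beta}_1 + z_{1-\alpha/2} SE(\hat{\beta}_1)} \right) = (1.024, 1.050)$$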
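A minimal sketch of both intervals using `statistics.NormalDist` for the z quantile. With the rounded inputs 0.036 and 0.006 it gives roughly (0.0242, 0.0478) for β1 and about (1.025, 1.049) for the odds ratio, close to the slide's values; the small differences come from rounding of the inputs. The function name `wald_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_ci(beta: float, se: float, level: float = 0.95):
    """beta ± z_(1 - alpha/2) * SE(beta)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return beta - z * se, beta + z * se


lo, hi = wald_ci(0.036, 0.006)
or_lo, or_hi = math.exp(lo), math.exp(hi)  # CI for the odds ratio exp(beta1)
print(f"beta1:      ({lo:.5f}, {hi:.5f})")
print(f"exp(beta1): ({or_lo:.3f}, {or_hi:.3f}), contains 1: {or_lo <= 1 <= or_hi}")
```

An odds-ratio interval that excludes 1 corresponds to a β1 interval that excludes 0, which is why a significant variable's CI for Exp(β) does not contain 1.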
Influential Observations
Cook’s distance should be less than 1; an observation with Cook’s distance greater than 1 is classified as an influential observation.
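The slide gives only the rule of thumb. One common way to compute Cook's distance for a logistic regression uses each observation's Pearson residual $r_i$ and leverage $h_i$: $D_i = r_i^2 h_i / (p (1 - h_i)^2)$, where p is the number of estimated coefficients. A minimal sketch of this one-step approximation; the residuals and leverages are hypothetical (in practice they come from the fitted model), and the helper name is illustrative.

```python
def cooks_distance(pearson_resid: float, leverage: float, n_params: int) -> float:
    """D_i = r_i^2 * h_i / (p * (1 - h_i)^2) for a binomial GLM (dispersion fixed at 1)."""
    return pearson_resid ** 2 * leverage / (n_params * (1.0 - leverage) ** 2)


# Hypothetical Pearson residuals and leverages; p = 2 (beta0 and beta1).
resid = [0.5, -1.2, 3.5]
lev = [0.05, 0.10, 0.40]
distances = [cooks_distance(r, h, 2) for r, h in zip(resid, lev)]
flags = [d > 1 for d in distances]  # rule of thumb: D > 1 means influential
for d, flag in zip(distances, flags):
    print(f"D = {d:.3f}, influential: {flag}")
```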