
Logistic Regression – Inference and Diagnostics
U Dinesh Kumar
Objective Fitness Test

[Figure: the observed value of Y compared with the predicted value of Y without the variable in the model and the predicted value of Y with the variable in the model]
Lecture Outline

• Testing individual regression parameters (Wald's test).
• Deviance (deviation from a perfect model).
• Null deviance (-2LL0) and model deviance (-2LL or -2LLM).
• Likelihood ratio test.
• R² in logistic regression.
• Confidence intervals for parameters and probabilities.

Significance of Individual Parameters – Wald's Test

• The Wald test is used to check the significance of individual explanatory variables (similar to the t-statistic in linear regression).
• The Wald test statistic is given by:

$$ W = \left( \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)} \right)^2 $$

• Under the null hypothesis, W follows a chi-square distribution with 1 degree of freedom (see the sketch below).
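As a quick illustration, here is a minimal Python sketch of the Wald computation. The 0.036 and 0.006 are the rounded coefficient and standard error from the German credit example that follows; the slide's value of 33.066 comes from unrounded estimates, while these rounded inputs give 36.0.

```python
from scipy import stats

beta_hat = 0.036  # rounded coefficient estimate (from the slides)
se_beta = 0.006   # rounded standard error (from the slides)

# Wald statistic: W = (beta / SE(beta))^2, chi-square with 1 df under H0
W = (beta_hat / se_beta) ** 2
p_value = stats.chi2.sf(W, df=1)  # survival function = 1 - CDF

# Rounded inputs give W = 36.0; the slide's 33.066 uses unrounded estimates
print(f"W = {W:.3f}, p-value = {p_value:.3g}")
```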
Wald Test Hypotheses

• Null hypothesis H0: β1 = 0
• Alternative hypothesis H1: β1 ≠ 0


Wald Test – German Credit Rating

Wald = (0.036/0.006)² = 33.066 (the displayed coefficient and standard error are rounded; the unrounded estimates yield 33.066). The p-value is less than 0.05, so the variable is statistically significant.
Wald Test – Challenger Data

For a statistically significant variable, the confidence interval for Exp(B) will not contain the value 1.
Deviance

Deviance (goodness-of-fit test)

• Deviance measures the deviation from the perfect model (also known as the saturated model).
• The larger the value of deviance, the worse the fit.
• Null deviance is similar to the total sum of squares (SST) in linear regression, and model deviance is similar to the sum of squared errors (SSE) in linear regression.
Null Deviance (-2LL0)

• Null deviance (-2LL0, i.e., -2 times the log-likelihood of the model with no predictors) is the deviance value when no predictor variables are added to the model.
• Null deviance = prediction without using any features.
• It is similar to the total sum of squares (SST) in multiple linear regression.
Likelihood, Log-Likelihood, and -2LL

The likelihood function for logistic regression is given by:

$$ L(\beta_0, \beta_1) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} $$

$$ \ln L(\beta_0, \beta_1) = LL(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i \ln(\pi_i) + (1 - y_i) \ln(1 - \pi_i) \right] $$

$$ -2LL(\beta_0, \beta_1) = -2 \sum_{i=1}^{n} \left[ y_i \ln(\pi_i) + (1 - y_i) \ln(1 - \pi_i) \right] $$

-2LL is used for calculating deviance.
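The following minimal Python sketch (the helper name is illustrative) computes -2LL directly from the formula above, given observed outcomes and predicted probabilities:

```python
import numpy as np

def neg2_log_likelihood(y, pi):
    """Return -2LL for binary outcomes y (0/1) and predicted probabilities pi."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return -2.0 * np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))

# Small worked example with three observations
print(neg2_log_likelihood([1, 0, 1], [0.9, 0.1, 0.8]))  # ~0.868
```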


Null Model

$$ -2LL_0 = -2\left[ N_0 \ln\left(\frac{N_0}{N}\right) + N_1 \ln\left(\frac{N_1}{N}\right) \right] $$

where N0 and N1 are the numbers of 0s and 1s in the data and N = N0 + N1.

In the training sample there are 561 good credits (N0) and 239 bad credits (N1):

-2LL0 = -2 [561 × ln(561/800) + 239 × ln(239/800)] = 975.682
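A quick check of this arithmetic in Python:

```python
import numpy as np

N0, N1 = 561, 239               # good credits (0s) and bad credits (1s)
N = N0 + N1                     # 800 observations in the training sample
null_deviance = -2 * (N0 * np.log(N0 / N) + N1 * np.log(N1 / N))
print(round(null_deviance, 3))  # 975.682
```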


Model Deviance (-2LL)

• Model deviance is the value of deviance after adding features (predictors) to the model.

$$ -2LL(\beta_0, \beta_1) = -2 \sum_{i=1}^{n} \left[ y_i \ln(\pi_i) + (1 - y_i) \ln(1 - \pi_i) \right] $$

where

$$ \pi_i = \frac{e^{-1.635 + 0.036 X_i}}{1 + e^{-1.635 + 0.036 X_i}} $$

-2LL before and after adding the variable "duration":

-2LL0 − (-2LL) = 975.682 − 941.754 = 33.928


Likelihood Ratio Test

• H0: β1 = β2 = … = βk = 0
• HA: Not all βs are zero

$$ G = \chi^2 = -2 \ln\left( \frac{L(\text{Null Model})}{L(\text{Given Model})} \right) = (-2LL_0) - (-2LL) $$

where the null model is the model without any predictor variables and the given model is the model with predictor variables.
Model Chi-Square

-2LL0 − (-2LL) = 975.682 − 941.754 = 33.928

CHIDIST(33.928, 1) = 5.71 × 10⁻⁹
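The same model chi-square and its p-value can be reproduced in Python (Excel's CHIDIST corresponds to the chi-square survival function):

```python
from scipy import stats

null_dev, model_dev = 975.682, 941.754  # -2LL0 and -2LL from the slides
G = null_dev - model_dev                # model chi-square; 1 df for one predictor
print(G, stats.chi2.sf(G, df=1))        # 33.928, ~5.7e-09
```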


Feature Selection in Logistic Regression

• The change in the value of deviance is used for selecting features (predictors).
• The variable that results in the maximum reduction in deviance will be chosen for adding to the model (provided the variable is statistically significant).

$$ G = -2 \ln\left( \frac{\text{likelihood without the variable}}{\text{likelihood with the variable}} \right) $$

G gives the decrease in deviance after adding a variable.
German Credit Rating – Final Model

Variables in the Equation (Step 14)

| Variable | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| Duration | .029 | .010 | 9.079 | 1 | .003 | 1.030 |
| @0DM | 1.851 | .247 | 56.230 | 1 | .000 | 6.366 |
| lessthan200DM | 1.551 | .245 | 39.995 | 1 | .000 | 4.715 |
| over200DM | .843 | .401 | 4.431 | 1 | .035 | 2.324 |
| critical | -.781 | .248 | 9.905 | 1 | .002 | .458 |
| Bankpaid | 1.001 | .394 | 6.466 | 1 | .011 | 2.722 |
| CreditAmount | .000 | .000 | 4.724 | 1 | .030 | 1.000 |
| lessthan100 | .801 | .225 | 12.697 | 1 | .000 | 2.228 |
| less500 | .630 | .325 | 3.762 | 1 | .052 | 1.878 |
| SevenYears | -.708 | .258 | 7.559 | 1 | .006 | .492 |
| Install_rate | .338 | .090 | 14.038 | 1 | .000 | 1.402 |
| MaritalStatusSM | -.708 | .187 | 14.388 | 1 | .000 | .493 |
| CoapplicantGaurantor | -1.245 | .442 | 7.924 | 1 | .005 | .288 |
| Num_Credits | .359 | .180 | 3.971 | 1 | .046 | 1.432 |
| Constant | -4.349 | .487 | 79.694 | 1 | .000 | .013 |
Classification Table (Step 14)

| Observed | Predicted: 0 (Negative) | Predicted: 1 (Positive) | Total | Percentage Correct |
|---|---|---|---|---|
| Credit Rating = 0 (Negative) | 507 (TN) | 54 (FP) | 561 | 90.4 |
| Credit Rating = 1 (Positive) | 124 (FN) | 115 (TP) | 239 | 48.1 |
| Overall Percentage | | | | 77.8 |

$$ \text{Sensitivity} = \frac{TP}{TP + FN} = \frac{115}{115 + 124} = 48.1\% $$

$$ \text{Specificity} = \frac{TN}{TN + FP} = \frac{507}{507 + 54} = 90.4\% $$
Sensitivity & Specificity

$$ \text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false negatives}} $$

Sensitivity is the conditional probability that the predicted value of y is 1 given that the observed value is 1 (also known as recall).

$$ \text{Specificity} = \frac{\text{Number of true negatives}}{\text{Number of true negatives} + \text{Number of false positives}} $$

Specificity is the conditional probability that the predicted value of y is 0 given that the observed value is 0.
Precision

• Precision measures the ratio of true positives among the cases that are classified as positive:

Precision = True Positives / (True Positives + False Positives)


Precision Is a Useful Measure for an Imbalanced Data Set

• Assume that a test can identify the presence of a disease with 95% accuracy.
• In the population, only 2% are known to be affected by the disease.
• If a population of 1000 people is tested using this test, then the number of false positives will far exceed the number of true positives.
Precision

| Observed | Predicted: 0 (Negative) | Predicted: 1 (Positive) | Total | Percentage Correct |
|---|---|---|---|---|
| Disease = 0 (Negative) | 931 (TN) | 49 (FP) | 980 | 95.0% |
| Disease = 1 (Positive) | 1 (FN) | 19 (TP) | 20 | 95.0% |
| Overall Percentage | | | | 95.0% |

$$ \text{Precision} = \frac{TP}{TP + FP} = \frac{19}{19 + 49} = 27.94\% $$

Overall accuracy is 95%; however, precision is only 27.94%.

F-Score (or F1 Score)

• The F-score is a measure that combines precision and recall (it is the harmonic mean of precision and recall) and is given by:

$$ F\text{-}Score = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$
Measures of Classification

| Measure | Interpretation |
|---|---|
| Sensitivity (aka Recall) | P(predicted class is positive given class is positive) |
| Specificity | P(predicted class is negative given class is negative) |
| Precision | P(actual class is positive given predicted class is positive) |
| F-Score | Harmonic mean of precision and recall |

Values of Various Measures – German Credit Rating Example

| Measure | Value* |
|---|---|
| Sensitivity (aka Recall) | 48.10% |
| Specificity | 90.4% |
| Precision | 68.04% |
| F-Score | 56.35% |

\* cut-off = 0.50
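All of these measures can be reproduced from the confusion-matrix counts in the classification table above; a minimal Python sketch follows (its outputs differ from the slide values only by rounding):

```python
# Confusion-matrix counts from the German credit classification table
TN, FP, FN, TP = 507, 54, 124, 115

sensitivity = TP / (TP + FN)                # recall
specificity = TN / (TN + FP)
precision = TP / (TP + FP)
f_score = 2 * precision * sensitivity / (precision + sensitivity)
accuracy = (TP + TN) / (TP + TN + FP + FN)

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("Precision", precision), ("F-score", f_score),
                    ("Accuracy", accuracy)]:
    print(f"{name}: {value:.2%}")  # ~48.12%, 90.37%, 68.05%, 56.37%, 77.75%
```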
Receiver Operating Characteristic (ROC) Curve

• The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) and compares the model with random classification.
• The higher the area under the ROC curve, the better the prediction ability.
Concordant and Discordant Pairs

• Divide the dataset into positives (y = 1) and negatives (y = 0).
• For a randomly chosen positive and negative, if the predicted probability of the positive (obtained using the logistic regression model) is greater than the predicted probability of the negative, the pair is called a concordant pair.
• For a randomly chosen positive and negative, if the predicted probability of the positive is less than that of the negative, the pair is called a discordant pair.
• The area under the ROC curve is the proportion of concordant pairs in the dataset (see the sketch below).
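Here is a minimal Python sketch of this pairwise (concordance) view of the AUC, checked against scikit-learn's roc_auc_score on toy data; the function name is illustrative and scikit-learn is assumed to be installed:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_concordance(y, p_hat):
    """AUC as the proportion of concordant (positive, negative) pairs;
    ties count as half-concordant."""
    y, p_hat = np.asarray(y), np.asarray(p_hat)
    p_pos = p_hat[y == 1]                   # predicted probabilities of positives
    p_neg = p_hat[y == 0]                   # predicted probabilities of negatives
    diff = p_pos[:, None] - p_neg[None, :]  # all n1 x n2 pairwise differences
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y = np.array([0, 0, 1, 1, 0, 1])
p = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
print(auc_concordance(y, p), roc_auc_score(y, p))  # both ~0.889
```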
ROC and Area Under ROC

[Figure: ROC curve for German Credit Rating with Duration as the only covariate, AUC = 0.629]

[Figure: ROC curve for German Credit Rating after inclusion of all variables, AUC = 0.801]
Area Under the ROC Curve

• The area under the ROC curve (AUC) is interpreted as the probability that the model will rank a randomly chosen positive higher than a randomly chosen negative.
• If n1 is the number of positives (1s) and n2 is the number of negatives (0s), then the AUC is the proportion of the n1 × n2 possible (positive, negative) pairs in which the positive has the higher predicted probability.

$$ AUC = P\left[ \hat{P}(\text{random positive}) > \hat{P}(\text{random negative}) \right] $$

• The AUC is thus a measure of the ability of the logistic regression model to discriminate positives and negatives correctly.
ROC Curve

• General rule for acceptance of the model, based on the area under the ROC curve:
  • ROC area = 0.5: no discrimination
  • 0.7 ≤ ROC area < 0.8: acceptable discrimination
  • 0.8 ≤ ROC area < 0.9: excellent discrimination
  • ROC area ≥ 0.9: outstanding discrimination

Optimal Cut-off Probabilities

• Using classification plots.
• Youden's index.
• Cost-based optimization.
Classification Plots

Classification Plot: Challenger Crash

[Figure: SPSS classification plot (step 1) showing the frequency of observed groups (0s and 1s) against predicted probabilities. Observed 0s cluster at low predicted probabilities and observed 1s at high predicted probabilities; observations that fall on the wrong side of the cut-off are misclassifications.]
Youden's Index

• Youden's index is a measure of diagnostic accuracy. It is also a global measure of test performance, used to evaluate the overall discriminative power of a diagnostic procedure.
• Youden's index is calculated by deducting 1 from the sum of the test's sensitivity and specificity (a cut-off search is sketched below):

$$ \text{Youden's Index: } J(p) = \text{Sensitivity}(p) + \text{Specificity}(p) - 1 $$
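A minimal Python sketch (the helper name and cut-off grid are illustrative) that scans cut-offs and returns the one maximizing Youden's index, assuming arrays y (observed 0/1 outcomes) and p_hat (model probabilities):

```python
import numpy as np

def best_youden_cutoff(y, p_hat, cutoffs=np.arange(0.05, 1.0, 0.05)):
    """Return the cut-off with maximum J(p) = sensitivity + specificity - 1."""
    y, p_hat = np.asarray(y), np.asarray(p_hat)
    best_c, best_j = None, -1.0
    for c in cutoffs:
        pred = (p_hat >= c).astype(int)
        tp = np.sum((pred == 1) & (y == 1)); fn = np.sum((pred == 0) & (y == 1))
        tn = np.sum((pred == 0) & (y == 0)); fp = np.sum((pred == 1) & (y == 0))
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```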


Cost-Based Model for Optimal Cut-off

| Observed | Predicted: 0 | Predicted: 1 |
|---|---|---|
| 0 | N00 | N01 |
| 1 | N10 | N11 |

C00 = cost of classifying 0 as 0
C01 = cost of classifying 0 as 1
C10 = cost of classifying 1 as 0
C11 = cost of classifying 1 as 1

The optimal cut-off probability solves:

$$ \min_{p} \left[ C_{01} N_{01}(p) + C_{10} N_{10}(p) \right] $$
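A minimal Python sketch of this cost minimization over a grid of cut-offs (the helper name and grid are illustrative; C00 and C11 are taken as zero, as in the example below):

```python
import numpy as np

def min_cost_cutoff(y, p_hat, c01, c10, cutoffs=np.arange(0.05, 1.0, 0.01)):
    """Cut-off minimizing C01*N01(p) + C10*N10(p); C00 = C11 = 0."""
    y, p_hat = np.asarray(y), np.asarray(p_hat)
    costs = []
    for c in cutoffs:
        pred = (p_hat >= c).astype(int)
        n01 = np.sum((y == 0) & (pred == 1))  # 0s classified as 1 (false positives)
        n10 = np.sum((y == 1) & (pred == 0))  # 1s classified as 0 (false negatives)
        costs.append(c01 * n01 + c10 * n10)
    i = int(np.argmin(costs))
    return cutoffs[i], costs[i]
```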
German Credit Rating – Cost-Based Cut-off

C00 = cost of classifying 0 as 0 = 0
C01 = cost of classifying 0 as 1 = 100
C10 = cost of classifying 1 as 0 = 200
C11 = cost of classifying 1 as 1 = 0
Optimal Cut-off Probability

| Cut-off Probability | P00 | P01 | P10 | P11 | C01 | C10 | Cost | Youden's Index |
|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.15 | 0.85 | 0.01 | 0.99 | 100.00 | 200.00 | 88.00 | 0.13 |
| 0.10 | 0.32 | 0.68 | 0.07 | 0.93 | 100.00 | 200.00 | 81.80 | 0.25 |
| 0.15 | 0.48 | 0.52 | 0.11 | 0.89 | 100.00 | 200.00 | 74.20 | 0.37 |
| 0.20 | 0.56 | 0.44 | 0.14 | 0.86 | 100.00 | 200.00 | 72.3 | 0.42 |
| 0.25 | 0.63 | 0.37 | 0.21 | 0.80 | 100.00 | 200.00 | 78.1 | 0.42 |
| 0.28 | 0.67 | 0.33 | 0.23 | 0.77 | 100.00 | 200.00 | 77.8 | 0.45 |
| 0.30 | 0.71 | 0.29 | 0.24 | 0.76 | 100.00 | 200.00 | 77.3 | 0.47 |
| 0.35 | 0.76 | 0.24 | 0.76 | 0.70 | 100.00 | 200.00 | 175.3 | 0.46 |

The minimum-cost cut-off is 0.20 (cost = 72.3); the maximum Youden's index among these cut-offs occurs at 0.30 (J = 0.47).
Bank Term Deposit Data – Target Marketing

Total data = 4521; number of positives = 521; number of negatives = 4000.

Conversion rate = 521/4521 = 11.5%

Gain and Lift

$$ \text{Gain} = \frac{\text{Cumulative number of positive observations up to decile } i}{\text{Total number of positive observations in the data}} $$

$$ \text{Lift} = \frac{\text{Cumulative gain using the LR model}}{\text{Cumulative gain using the random model}} $$
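A minimal pandas sketch (the helper name is illustrative) that builds such a gain/lift table by sorting observations by predicted probability and splitting them into deciles, assuming arrays y (0/1 outcomes) and p_hat (predicted probabilities):

```python
import numpy as np
import pandas as pd

def gain_lift_table(y, p_hat, n_deciles=10):
    """Gain and lift by decile, sorting by predicted probability (descending)."""
    df = pd.DataFrame({"y": y, "p": p_hat}).sort_values("p", ascending=False)
    # Decile 1 holds the top 10% of observations by predicted probability
    df["decile"] = pd.qcut(np.arange(len(df)), n_deciles, labels=False) + 1
    tab = df.groupby("decile")["y"].sum().to_frame("positives")
    tab["cum_positives"] = tab["positives"].cumsum()
    tab["gain"] = tab["cum_positives"] / df["y"].sum()
    # A random model captures decile/n_deciles of the positives cumulatively
    tab["lift"] = tab["gain"] / (tab.index / n_deciles)
    return tab
```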
Gain

| Decile | Number of Observations | Positives Without Model | Positives Using Model | Cumulative Positives | Gain |
|---|---|---|---|---|---|
| 1 | 452.1 | 52.1 | 223 | 223 | 0.4280 |
| 2 | 904.2 | 104.2 | 122 | 345 | 0.6622 |
| 3 | 1356.3 | 156.3 | 74 | 419 | 0.8042 |
| 4 | 1808.4 | 208.4 | 38 | 457 | 0.8772 |
| 5 | 2260.5 | 260.5 | 27 | 484 | 0.9290 |
| 6 | 2712.6 | 312.6 | 11 | 495 | 0.9501 |
| 7 | 3164.7 | 364.7 | 18 | 513 | 0.9846 |
| 8 | 3616.8 | 416.8 | 3 | 516 | 0.9904 |
| 9 | 4068.9 | 468.9 | 4 | 520 | 0.9981 |
| 10 | 4521 | 521 | 1 | 521 | 1.0000 |
Gain Chart

[Figure: gain chart comparing the logistic regression model with the random model]
Lift

| Decile | Number of Observations | Positives Without Model | Positives Using Model | Cumulative Positives | Gain | Lift |
|---|---|---|---|---|---|---|
| 1 | 452.1 | 52.1 | 223 | 223 | 0.4280 | 4.280 |
| 2 | 904.2 | 104.2 | 122 | 345 | 0.6622 | 3.311 |
| 3 | 1356.3 | 156.3 | 74 | 419 | 0.8042 | 2.681 |
| 4 | 1808.4 | 208.4 | 38 | 457 | 0.8772 | 2.193 |
| 5 | 2260.5 | 260.5 | 27 | 484 | 0.9290 | 1.858 |
| 6 | 2712.6 | 312.6 | 11 | 495 | 0.9501 | 1.583 |
| 7 | 3164.7 | 364.7 | 18 | 513 | 0.9846 | 1.407 |
| 8 | 3616.8 | 416.8 | 3 | 516 | 0.9904 | 1.238 |
| 9 | 4068.9 | 468.9 | 4 | 520 | 0.9981 | 1.109 |
| 10 | 4521 | 521 | 1 | 521 | 1.0000 | 1.000 |
Lift Chart

[Figure: lift by decile for the logistic regression model versus the random model]
R² in Logistic Regression

• In linear regression, R² is the proportion of variation explained by the regression model.
• It is not possible to develop an R²-type measure for logistic regression in the same way, since the variance of the error term is not constant.
• Many pseudo-R² values are used in logistic regression. A pseudo-R² is an indicator of the strength of the relationship.
R² in Logistic Regression

• R-squared is a measure of improvement from the null model to the fitted model. The denominator of the ratio can be thought of as the sum of squared errors from the null model, a model that predicts the dependent variable without any independent variables.
• In the null model, each y value is predicted to be the mean of the y values.
McFadden's R²

$$ \text{McFadden's } R^2 = 1 - \frac{LL(\text{Model with predictors})}{LL(\text{Intercept-only model})} $$

A value above 0.2 is generally considered acceptable.

Cox and Snell R²

• Based on the likelihood ratio:

$$ R^2 = 1 - \left( \frac{L(\text{Null Model})}{L(\text{Full Model})} \right)^{2/n} $$

where L denotes the likelihood (not the log-likelihood), the null model is the model without predictors, the full model is the model with predictors, and n is the number of observations.

Nagelkerke R²

• The maximum value of the Cox and Snell R² may not be 1. Nagelkerke modified the Cox and Snell R² so that the maximum value equals 1:

$$ \text{Nagelkerke } R^2 = \frac{1 - \left( \dfrac{L(\text{Null Model})}{L(\text{Full Model})} \right)^{2/n}}{1 - L(\text{Null Model})^{2/n}} $$
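The three pseudo-R² measures above can be computed together from the null- and full-model log-likelihoods; a minimal sketch (the helper name is illustrative):

```python
import numpy as np

def pseudo_r2(ll_null, ll_full, n):
    """McFadden, Cox & Snell, and Nagelkerke pseudo R-squared values,
    given the log-likelihoods of the null and full models and sample size n."""
    mcfadden = 1 - ll_full / ll_null
    # Cox & Snell: 1 - (L0/L1)^(2/n) = 1 - exp((2/n) * (LL0 - LL1))
    cox_snell = 1 - np.exp((2.0 / n) * (ll_null - ll_full))
    # Nagelkerke rescales Cox & Snell by its maximum, 1 - L0^(2/n)
    nagelkerke = cox_snell / (1 - np.exp((2.0 / n) * ll_null))
    return mcfadden, cox_snell, nagelkerke
```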
R-Square for Challenger Data

Model Summary

| Step | -2 Log Likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 20.371ᵃ | .301 | .430 |

a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
Confidence Intervals for Beta Values

The 100(1 − α)% confidence interval for β1 and β0 is given by:

$$ \hat{\beta}_1 \pm z_{1-\alpha/2} \, SE(\hat{\beta}_1) $$

$$ \hat{\beta}_0 \pm z_{1-\alpha/2} \, SE(\hat{\beta}_0) $$
CI for Challenger Beta Value

0.036 − 1.96 × 0.006 ≤ β1 ≤ 0.036 + 1.96 × 0.006

0.02424 ≤ β1 ≤ 0.04776
Confidence Intervals for Exp(β1)

• The confidence interval for the odds ratio, exp(β1), can be obtained by transforming the confidence interval for β1 (see the sketch below).
• If β1 is significant, the confidence interval will NOT contain the value 1.

$$ \left( e^{\hat{\beta}_1 - z \, SE(\hat{\beta}_1)}, \; e^{\hat{\beta}_1 + z \, SE(\hat{\beta}_1)} \right) = (1.024, 1.050) $$
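A minimal Python sketch reproducing these intervals; small differences from the slide's (1.024, 1.050) are due to rounding of the displayed coefficient and standard error:

```python
import numpy as np
from scipy import stats

beta_hat, se = 0.036, 0.006  # estimate and standard error from the slides
z = stats.norm.ppf(0.975)    # ~1.96 for a 95% interval
lo, hi = beta_hat - z * se, beta_hat + z * se

print(f"95% CI for beta:   ({lo:.5f}, {hi:.5f})")                  # (0.02424, 0.04776)
# Slide reports (1.024, 1.050) from unrounded inputs; rounding gives ~(1.025, 1.049)
print(f"95% CI for exp(b): ({np.exp(lo):.3f}, {np.exp(hi):.3f})")
```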
Influential Observations

• Cook's distance should be less than 1 (otherwise the observation is classified as an influential observation).
• Leverage values should be less than 3 times the average leverage value.
• The absolute value of the standardized residual should be less than 3.


DFBeta

• DFBeta is the change in the estimated beta value when an observation is removed from the sample:

DFBeta = Beta (with the observation in the sample) − Beta (without the observation in the sample)
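A minimal leave-one-out sketch of DFBeta using statsmodels (the helper name is illustrative), assuming y and X are NumPy arrays and X already includes a constant column:

```python
import numpy as np
import statsmodels.api as sm

def dfbeta(y, X, i):
    """DFBeta for observation i: beta (full sample) - beta (sample without i)."""
    full = sm.Logit(y, X).fit(disp=0).params        # fit on the full sample
    mask = np.ones(len(y), dtype=bool)
    mask[i] = False                                  # drop observation i
    reduced = sm.Logit(y[mask], X[mask]).fit(disp=0).params
    return full - reduced
```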
Recommended Readings

• D. W. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley, 2000.
• Thomas P. Ryan, Modern Regression Methods, John Wiley, 2009.
LR Resources on the Web

• http://faculty.chass.ncsu.edu/garson/PA765/logistic.htm
• http://www.ats.ucla.edu/stat/Spss/topics/logistic_regression.htm
