
Developing the Decision Model through Logistic Regression

As part of Weekly Assignments

To be Submitted to:
PROF. SREEDHARA R.

Presented by:
Anindya Biswas
1527605, M1
QUESTION:
A pharmaceutical firm that developed a particular drug for women wants to understand the characteristics that cause some of them to have an adverse reaction to it. They collect data on 15 women who had such a reaction and 15 who did not. The variables measured are:

1. Systolic Blood Pressure
2. Cholesterol Level
3. Age of the person
4. Whether or not the woman was pregnant (1 = No, 2 = Yes)

The dependent variable indicates whether there was an adverse reaction (1 = Yes; 0 = No).
BP    Cholesterol   Age   Pregnant   DrugReaction
100   150           20    1          0
120   160           16    1          0
110   150           18    1          0
100   175           25    1          0
 95   250           36    1          0
110   200           56    1          0
120   180           59    1          0
150   175           45    1          0
160   185           40    1          0
125   195           20    2          0
135   190           18    2          0
165   200           25    2          0
145   175           30    2          0
120   180           28    2          0
100   180           21    2          0
100   160           19    2          1
 95   250           18    2          1
120   200           30    2          1
125   240           29    2          1
130   172           30    2          1
120   130           35    2          1
120   140           38    2          1
125   160           32    2          1
115   185           40    2          1
150   195           65    1          1
130   175           72    1          1
170   200           56    1          1
145   210           58    1          1
180   200           81    1          1
140   190           73    1          1

Results & Analysis


Classification Table (a)

                                              Predicted
                                        Drug Reaction           Percentage
Observed                              No Reaction   Reaction    Correct
Step 1   Drug Reaction   No Reaction          11          4       73.3
                         Reaction              2         13       86.7
         Overall Percentage                                       80.0

a. The cut value is .500

Classification Table:
The classification table displays the overall prediction accuracy of the model. Its elements are:
Observed: This indicates the number of 0's and 1's observed in the dependent variable (which in our case is Drug Reaction).
Predicted: These are the predicted values of the dependent variable based on the full logistic regression model. The table shows how many cases are correctly predicted and how many are not.
In our model, of the 15 women with no reaction, the model correctly identifies 11 as not likely to have one. Similarly, of the 15 who did have a reaction, the model correctly identifies 13 as likely to have one.
Overall Percentage: This gives the overall percentage of cases correctly predicted by the model. The accuracy has increased from 50% in the null (Block 0) model to 80% overall in the Block 1 classification table.
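As an illustrative sketch, and assuming the fitted model, X and y from the earlier Python snippet, the classification table and overall accuracy at the .500 cut value could be reproduced as follows:

import pandas as pd

pred_prob = model.predict(X)                  # predicted probability of a reaction
pred_group = (pred_prob >= 0.5).astype(int)   # apply the .500 cut value

# Cross-tabulate observed vs. predicted outcomes (the classification table)
print(pd.crosstab(y, pred_group, rownames=["Observed"], colnames=["Predicted"]))

# Overall percentage of correctly classified cases
print(f"Overall percentage correct: {(pred_group == y).mean():.1%}")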

Hosmer and Lemeshow Test

Step   Chi-square   df   Sig.
1      4.412         8   .818

Hosmer and Lemeshow Test:


The Hosmer and Lemeshow test divides subjects into 10 ordered groups and then compares the number actually observed in each group (observed) with the number predicted by the logistic regression model (predicted).
The 10 ordered groups are created based on the estimated probability: those with an estimated probability below .1 form one group, and so on, up to those with a probability of .9 to 1.0.
If the significance of the Hosmer and Lemeshow statistic is greater than .05, as is the rule of thumb for well-fitting models, we fail to reject the null hypothesis that there is no significant difference between the observed and predicted values.
In our case, the significance of the Hosmer and Lemeshow test is .818, which is greater than .05. The test is therefore not statistically significant, and our model is quite a good fit.
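The grouping idea can be sketched in a few lines of Python. This is only an approximation of the SPSS procedure (it uses ten equal-sized groups sorted by predicted probability, and very small expected counts can make the statistic unstable), so the exact figures may differ slightly:

import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_obs, p_hat, groups=10):
    # Sort cases by predicted probability and split them into ordered groups
    order = np.argsort(p_hat)
    stat = 0.0
    for idx in np.array_split(order, groups):
        n = len(idx)
        obs_events = y_obs[idx].sum()   # observed reactions in the group
        exp_events = p_hat[idx].sum()   # expected reactions in the group
        # Chi-square contributions from events and from non-events
        stat += (obs_events - exp_events) ** 2 / exp_events
        stat += ((n - obs_events) - (n - exp_events)) ** 2 / (n - exp_events)
    df = groups - 2
    return stat, chi2.sf(stat, df)      # statistic and its significance (p-value)

# Example, using arrays from the earlier snippets:
# hosmer_lemeshow(y.to_numpy(), pred_prob.to_numpy())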

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      21.841a             .482                   .643

a. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.

Model Summary:
Cox and Snell's R-Square: It attempts to imitate the multiple R-Square based on the likelihood, but its maximum can be (and usually is) less than 1.0, making it difficult to interpret. In our case, the Cox & Snell R Square is .482, which means that 48.2% of the variation in the dependent variable (DrugReaction) can be explained by our predictor variables.
The Nagelkerke modification, which does range from 0 to 1, is a more reliable measure of the relationship. Nagelkerke's R Square will normally be higher than the Cox and Snell measure; it is part of the SPSS output in the Model Summary table and is the most commonly reported of the R-squared estimates.
In our case, it is indeed higher (.643) than the Cox & Snell R Square, meaning that a larger share of the variance in the dependent variable is explained by the predictor (independent) variables.
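These two pseudo R-squared values can be checked by hand from the log-likelihoods. The short calculation below assumes the null-model log-likelihood equals 30·ln(0.5), which holds here because the sample is split 15/15 on the dependent variable:

import numpy as np

n = 30
ll_model = -21.841 / 2       # from the -2 Log likelihood in the Model Summary
ll_null = n * np.log(0.5)    # null model: both outcomes equally likely

cox_snell = 1 - np.exp((2 / n) * (ll_null - ll_model))
nagelkerke = cox_snell / (1 - np.exp((2 / n) * ll_null))
print(round(cox_snell, 3), round(nagelkerke, 3))   # approximately 0.482 and 0.643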

Variables in the Equation

                            B        S.E.     Wald    df   Sig.     Exp(B)
Step 1a   BP              -.018     .027      .463    1    .496       .982
          Cholesterol      .027     .025     1.182    1    .277      1.027
          Age              .265     .114     5.404    1    .020      1.304
          Pregnant        8.501    3.884     4.790    1    .029   4918.147
          Constant      -26.375   13.680     3.717    1    .054       .000

a. Variable(s) entered on step 1: BP, Cholesterol, Age, Pregnant.

Variables in the Equation:

Since BP and Cholesterol show up as not significant, one can try running the regression again without those variables to see how that affects the prediction accuracy. However, since the sample size is small, one cannot conclude that they are truly insignificant; Wald's test is best suited to large sample sizes.
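As a quick interpretation aid, Exp(B) is simply e raised to the coefficient, i.e. the factor by which the odds of a reaction change for a one-unit increase in that predictor. Taking Age as a worked example:

$$\mathrm{Exp}(B_{\mathrm{Age}}) = e^{0.265} \approx 1.304$$

so each additional year of age multiplies the odds of an adverse reaction by roughly 1.3, holding the other variables constant.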

The prediction equation is:

$$\log(\text{odds of reaction to the drug}) = -26.375 - 0.018\,(\mathrm{BP}) + 0.027\,(\mathrm{Cholesterol}) + 0.265\,(\mathrm{Age}) + 8.501\,(\mathrm{Pregnant})$$
As with any regression, positive coefficients indicate a positive relationship with the dependent variable. Here we can say that with age women develop sensitivities to certain drugs, which might be why Age has a positive effect on the odds of a reaction to the drug. Similarly, a pregnant woman might be allergic to many drugs because the infant she is carrying might react against them, which possibly explains the positive impact of the pregnancy factor in the equation.
We shall now calculate the predicted probability by substituting the values of BP, Cholesterol, Age and Pregnant into the above equation. We shall also calculate the predicted group with the cut-off at .500 and check the accuracy for ourselves, as shown below:
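The following is a minimal sketch of that calculation in Python (the coefficients come from the Variables in the Equation table; the function name predict_reaction is ours):

import numpy as np

def predict_reaction(bp, cholesterol, age, pregnant, cutoff=0.5):
    # Linear predictor from the fitted equation
    logit = -26.375 - 0.018 * bp + 0.027 * cholesterol + 0.265 * age + 8.501 * pregnant
    prob = 1 / (1 + np.exp(-logit))     # logistic function gives the predicted probability
    return prob, int(prob >= cutoff)    # predicted group at the .500 cut value

# Example: the first woman in the table (BP 100, Cholesterol 150, Age 20, Pregnant = 1)
print(predict_reaction(100, 150, 20, 1))   # roughly (0.00003, 0), matching the table below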
BP    Cholesterol   Age   Pregnant   Drug Reaction   Predicted Probability   Predicted Group
100   150           20    1          0               0.00003                 0
120   160           16    1          0               0.00001                 0
110   150           18    1          0               0.00002                 0
100   175           25    1          0               0.00023                 0
 95   250           36    1          0               0.03352                 0
110   200           56    1          0               0.58319                 1
120   180           59    1          0               0.60219                 1
150   175           45    1          0               0.01829                 0
160   185           40    1          0               0.00535                 0
125   195           20    2          0               0.24475                 0
135   190           18    2          0               0.12197                 0
165   200           25    2          0               0.40238                 0
145   175           30    2          0               0.65193                 1
120   180           28    2          0               0.66520                 1
100   180           21    2          0               0.30860                 0
100   160           19    2          1               0.13323                 0
 95   250           18    2          1               0.58936                 1
120   200           30    2          1               0.85228                 1
125   240           29    2          1               0.92175                 1
130   172           30    2          1               0.69443                 1
120   130           35    2          1               0.76972                 1
120   140           38    2          1               0.90642                 1
125   160           32    2          1               0.75435                 1
115   185           40    2          1               0.98365                 1
150   195           65    1          1               0.86545                 1
130   175           72    1          1               0.97205                 1
170   200           56    1          1               0.31892                 0
145   210           58    1          1               0.62148                 1
180   200           81    1          1               0.99665                 1
140   190           73    1          1               0.98260                 1

As we can see, with the cutoff at .500, cases whose predicted probability lies below .500 are classified as 0, i.e. those who do not have a reaction to the drug, whereas cases whose predicted probability is above .500 are classified as 1, i.e. those expected to have an adverse reaction to the drug. This can be represented graphically as shown below:

[Figure: predicted probabilities on a 0.000 to 1.000 scale with a cutoff at 0.500; cases below the cutoff are labelled "No Reaction to Drug" and cases above it "Adverse Reaction to Drug".]

Thus, from the above table we can see that six cases have been incorrectly predicted, bringing the number of correct predictions down to 24 out of 30, or 80%, as shown in the classification table earlier.

FINAL VERDICT:
The model has considerable discriminating power (80% correct classification); however, it treats factors such as Systolic Blood Pressure and the respondent's Cholesterol level as insignificant, which is certainly not the case in real-life scenarios, where these are often the deciding factors in whether a particular drug can be administered to a patient.
However, this is mainly because the sample size is too small; if the sample size is increased, we believe that eventually even these factors might prove to be significant in the research process.
