To be submitted to:
PROF. SREEDHARA R.
Presented by:
Anindya Biswas
1527605, M1
QUESTION:
A pharmaceutical firm that developed a particular drug for women wants to understand the
characteristics that cause some of them to have an adverse reaction to the drug. They
collect data on 15 women who had such a reaction and 15 who did not. The variables measured
are:
1. Systolic blood pressure (BP)
2. Cholesterol level (Cholesterol)
3. Age
4. Whether or not the woman is pregnant (Pregnant)
The dependent variable, DrugReaction, indicates whether there was an adverse reaction (1 = Yes; 0 = No).
BP     Cholesterol   Age   Pregnant   DrugReaction
100    150           20    1          0
120    160           16    1          0
110    150           18    1          0
100    175           25    1          0
95     250           36    1          0
110    200           56    1          0
120    180           59    1          0
150    175           45    1          0
160    185           40    1          0
125    195           20    2          0
135    190           18    2          0
165    200           25    2          0
145    175           30    2          0
120    180           28    2          0
100    180           21    2          0
100    160           19    2          1
95     250           18    2          1
120    200           30    2          1
125    240           29    2          1
130    172           30    2          1
120    130           35    2          1
120    140           38    2          1
125    160           32    2          1
115    185           40    2          1
150    195           65    1          1
130    175           72    1          1
170    200           56    1          1
145    210           58    1          1
180    200           81    1          1
140    190           73    1          1
Classification Table
                                           Predicted
                                           Drug Reaction               Percentage
Observed                                   No Reaction    Reaction     Correct
Step 1   Drug Reaction   No Reaction       11             4            73.3
                         Reaction          2              13           86.7
         Overall Percentage                                            80.0
Classification Table:
The classification table shows how accurately the model predicts the outcome.
Its components are:
Observed: the actual values (0 or 1) of the dependent variable, which in our case is Drug Reaction;
each row of the table corresponds to one observed group.
Predicted: the values of the dependent variable predicted by the full logistic regression model.
The table shows how many cases are predicted correctly and how many are not.
In our model, of the 15 women with no reaction, the model correctly identified 11 as not likely
to have one. Similarly, of the 15 who did have a reaction, the model correctly identified 13 as
likely to have one.
Overall Percentage: the overall percentage of cases that the model predicts correctly. Accuracy
rises from 50% in the null (intercept-only) model to 80% overall in the Block 1 classification table.
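The percentages in the classification table can be checked directly from its counts. This minimal sketch assumes the off-diagonal counts are 4 (no reaction, predicted reaction) and 2 (reaction, predicted no reaction), which follow from the 15 cases in each group; the dictionary name `classification` is illustrative only. The null-model baseline is taken as predicting the same group for every woman, which is right for 15 of the 30 cases.

```python
# Assumed reconstruction of the Block 1 classification table:
# outer key = observed group, inner key = predicted group (0 = No Reaction, 1 = Reaction)
classification = {
    0: {0: 11, 1: 4},   # 15 women with no reaction
    1: {0: 2, 1: 13},   # 15 women with a reaction
}

n_total = sum(sum(row.values()) for row in classification.values())

# Percentage correct within each observed group: diagonal cell over the row total
for observed, row in classification.items():
    pct = 100 * row[observed] / sum(row.values())
    print(f"Observed {observed}: {pct:.1f}% correct")

# Overall percentage: all diagonal cells over all cases
n_correct = sum(classification[k][k] for k in classification)
overall_pct = 100 * n_correct / n_total
print(f"Overall: {overall_pct:.1f}%")

# Null (intercept-only) model: predicts the larger observed group for everyone
null_pct = 100 * max(sum(row.values()) for row in classification.values()) / n_total
print(f"Null model: {null_pct:.1f}%")
```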
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      4.412        8    .818
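This table is the Hosmer and Lemeshow goodness-of-fit test that SPSS reports for logistic regression; a Sig. value above .05, as here (.818), means there is no evidence that the model fits the data poorly. The Sig. value is the upper-tail probability of a chi-square distribution with 8 df evaluated at 4.412. The sketch below checks this with the standard library only, using the closed form that holds for an even number of degrees of freedom; the function name is our own, not part of any library.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For even df the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_upper_tail_even_df(4.412, 8)
print(f"Sig. = {p_value:.3f}")
```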
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      21.841              .482                   .643
Model Summary:
Cox and Snell's R Square: This attempts to imitate the multiple R Square of linear regression
using likelihoods, but its maximum can be (and usually is) less than 1.0, which makes it difficult
to interpret. In our case, the value of Cox & Snell R Square is .482. Read by analogy with R Square,
this suggests that the predictor variables account for a substantial share (roughly 48.2%) of the
variation in the dependent variable (DrugReaction), although pseudo R Square values are not
literal proportions of explained variance.
Nagelkerke's modification rescales the Cox and Snell measure so that it ranges from 0 to 1, which
makes it easier to interpret. Because it divides the Cox and Snell value by that measure's maximum
possible value, Nagelkerke's R Square is always at least as high as the Cox and Snell measure.
Nagelkerke's R Square is part of the SPSS output in the Model Summary table and is the most commonly
reported of the pseudo R Square estimates.
In our case, it is indeed higher (.643) than the Cox & Snell R Square. The difference reflects the
rescaling rather than additional explanatory power; on the 0 to 1 scale, the predictor variables
show a fairly strong relationship with the dependent variable.
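Both pseudo R Square values can be recovered from the -2 Log likelihood alone. This sketch assumes the standard definitions: Cox and Snell's R Square is 1 - exp(-(D0 - D)/n), where D is the model's -2 log likelihood, D0 that of the intercept-only model and n the sample size, and Nagelkerke's R Square divides it by its maximum, 1 - exp(-D0/n). With 15 reactions out of 30 cases, D0 = 60 ln 2, about 41.589. The function name `pseudo_r_squares` is hypothetical.

```python
import math

def pseudo_r_squares(neg2ll_model, n, n_events):
    """Return (Cox & Snell R^2, Nagelkerke R^2) from a model's -2 log likelihood."""
    p = n_events / n
    # -2 log likelihood of the intercept-only (null) model, which predicts p for every case
    neg2ll_null = -2 * (n_events * math.log(p) + (n - n_events) * math.log(1 - p))
    cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_model) / n)
    # Largest value Cox & Snell can reach for this sample (a perfect fit, -2LL = 0)
    max_cox_snell = 1 - math.exp(-neg2ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke

cox_snell, nagelkerke = pseudo_r_squares(neg2ll_model=21.841, n=30, n_events=15)
print(f"Cox & Snell R Square = {cox_snell:.3f}, Nagelkerke R Square = {nagelkerke:.3f}")
```

Here the maximum Cox and Snell value is exactly 0.75 because the two groups are equal in size, which is why Nagelkerke's value is one third larger.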
Variables in the Equation
                     B        S.E.     Wald    df   Sig.   Exp(B)
Step 1  BP           -.018    .027     .463    1    .496   .982
        Cholesterol  .027     .025     1.182   1    .277   1.027
        Age          .265     .114     5.404   1    .020   1.304
        Pregnant     8.501    3.884    4.790   1    .029   4918.147
        Constant     -26.375  13.680   3.717   1    .054   .000
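Each Sig. value in the coefficient table comes from the Wald statistic compared with a chi-square distribution on 1 df, and Exp(B) is e raised to B, i.e. the odds ratio. A chi-square variable with 1 df is the square of a standard normal, so its upper tail is erfc(sqrt(W/2)). This sketch recomputes both from the rounded table values, so Exp(B) for Pregnant comes out slightly different from the reported 4918.147; the dictionary and function names are illustrative.

```python
import math

# Rounded (B, Wald) values as reported in the coefficient table (df = 1 for each)
reported = {
    "BP": (-0.018, 0.463),
    "Cholesterol": (0.027, 1.182),
    "Age": (0.265, 5.404),
    "Pregnant": (8.501, 4.790),
    "Constant": (-26.375, 3.717),
}

def wald_p_value(wald):
    """Upper-tail probability of a chi-square with 1 df: P(Z^2 > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(wald / 2))

for name, (b, wald) in reported.items():
    # Exp(B): odds ratio for a one-unit increase (for the constant, the baseline odds)
    odds_ratio = math.exp(b)
    print(f"{name:<12} Sig. = {wald_p_value(wald):.3f}   Exp(B) = {odds_ratio:.3f}")
```

Only Age (.020) and Pregnant (.029) fall below the conventional .05 level, which is the basis for treating BP and Cholesterol as insignificant.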
BP    Cholesterol   Age   Pregnant   Drug Reaction   Predicted Probability   Predicted Group
100   150           20    1          0               0.00003                 0
120   160           16    1          0               0.00001                 0
110   150           18    1          0               0.00002                 0
100   175           25    1          0               0.00023                 0
95    250           36    1          0               0.03352                 0
110   200           56    1          0               0.58319                 1
120   180           59    1          0               0.60219                 1
150   175           45    1          0               0.01829                 0
160   185           40    1          0               0.00535                 0
125   195           20    2          0               0.24475                 0
135   190           18    2          0               0.12197                 0
165   200           25    2          0               0.40238                 0
145   175           30    2          0               0.65193                 1
120   180           28    2          0               0.66520                 1
100   180           21    2          0               0.30860                 0
100   160           19    2          1               0.13323                 0
95    250           18    2          1               0.58936                 1
120   200           30    2          1               0.85228                 1
125   240           29    2          1               0.92175                 1
130   172           30    2          1               0.69443                 1
120   130           35    2          1               0.76972                 1
120   140           38    2          1               0.90642                 1
125   160           32    2          1               0.75435                 1
115   185           40    2          1               0.98365                 1
150   195           65    1          1               0.86545                 1
130   175           72    1          1               0.97205                 1
170   200           56    1          1               0.31892                 0
145   210           58    1          1               0.62148                 1
180   200           81    1          1               0.99665                 1
140   190           73    1          1               0.98260                 1
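The predicted probabilities come from the fitted logistic equation p = 1 / (1 + e^(-z)), where z is the constant plus each B multiplied by its predictor. This sketch plugs the rounded B values from the coefficient table into that equation for the 1st and 27th rows of the table above. Because the coefficients are rounded to three decimals, the results are close to, but not identical with, the SPSS values (0.00003 and 0.31892); the names `COEFFICIENTS` and `predicted_probability` are hypothetical.

```python
import math

# Rounded coefficients (B) from the coefficient table
COEFFICIENTS = {
    "Constant": -26.375,
    "BP": -0.018,
    "Cholesterol": 0.027,
    "Age": 0.265,
    "Pregnant": 8.501,
}

def predicted_probability(bp, cholesterol, age, pregnant):
    """Logistic model: p = 1 / (1 + exp(-z)), with z the linear predictor."""
    z = (COEFFICIENTS["Constant"]
         + COEFFICIENTS["BP"] * bp
         + COEFFICIENTS["Cholesterol"] * cholesterol
         + COEFFICIENTS["Age"] * age
         + COEFFICIENTS["Pregnant"] * pregnant)
    return 1 / (1 + math.exp(-z))

# Row 1: BP 100, Cholesterol 150, Age 20, Pregnant 1 (no reaction observed)
p_row1 = predicted_probability(100, 150, 20, 1)
# Row 27: BP 170, Cholesterol 200, Age 56, Pregnant 1 (reaction observed)
p_row27 = predicted_probability(170, 200, 56, 1)
print(f"Row 1: {p_row1:.5f}   Row 27: {p_row27:.5f}")
```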
As we can see, with the cutoff at .500, cases whose predicted probability lies below .500 are
classified as 0 (women who do not have a reaction to the drug), whereas cases whose predicted
probability is above .500 are classified as 1 (women who would have an adverse reaction to the
drug). This can be represented graphically as shown below:
[Figure: classification of predicted probabilities; axis from 0.000, Cutoff = 0.500, region label "No Reaction to Drug"]
Thus, from the table above, six cases are misclassified, which brings the number of correct
predictions down to 24 out of 30, or 80%, as shown in the classification table earlier.
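The cutoff rule can be sketched in a few lines: each reported predicted probability above .500 is assigned to group 1, and the result is compared with the observed Drug Reaction values. Row numbers here are simply positions in the predicted-probability table (1 to 30), an assumption made for illustration.

```python
# Predicted probabilities (rows 1-30) as listed in the predicted-probability table
probabilities = [
    0.00003, 0.00001, 0.00002, 0.00023, 0.03352, 0.58319, 0.60219, 0.01829,
    0.00535, 0.24475, 0.12197, 0.40238, 0.65193, 0.66520, 0.30860,
    0.13323, 0.58936, 0.85228, 0.92175, 0.69443, 0.76972, 0.90642, 0.75435,
    0.98365, 0.86545, 0.97205, 0.31892, 0.62148, 0.99665, 0.98260,
]
observed = [0] * 15 + [1] * 15  # first 15 rows: no reaction; last 15: reaction

CUTOFF = 0.5

# Classify each case: 1 (reaction) if the probability exceeds the cutoff, else 0
predicted = [1 if p > CUTOFF else 0 for p in probabilities]

# Row numbers (1-based) where the predicted group differs from the observed one
misclassified = [i + 1 for i, (y, y_hat) in enumerate(zip(observed, predicted)) if y != y_hat]
n_correct = len(probabilities) - len(misclassified)

print("Misclassified rows:", misclassified)
print(f"Correct: {n_correct}/{len(probabilities)} = {100 * n_correct / len(probabilities):.1f}%")
```

Four of the six errors are women without a reaction who were predicted to react, matching the 11 of 15 correct in the no-reaction row of the classification table.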
FINAL VERDICT:
The model has considerable discriminating power, correctly classifying 80% of cases. However, it
treats certain factors, such as systolic blood pressure (Sig. = .496) and the respondent's
cholesterol level (Sig. = .277), as insignificant. This is certainly not so in real-life
scenarios, where these factors are in many cases decisive in whether a particular drug can be
administered to a patient.
This is most likely due to the small sample size (30 women). If the sample size were increased, we
believe these factors might eventually prove significant as well.