
AMBO UNIVERSITY WOLISO CAMPUS

Advanced Biostatistics Assignment: Logistic Regression and Survival Analysis

Submitted to Tariku Dejene (Associate Professor, PhD fellow)

Submission date April 13, 202

Group Members' Names and ID Numbers

1. Abdi Alemayo MPHE/PGW/001/14
2. Abiyot Bekele MPHE/PGW/003/14
3. Asefa Dumesa MPHE/PGW/004/14
4. Bekan Feyisa MPHE/PGW/005/14
5. Dabelo Deresa MPHE/PGW/______
6. Dejene Fita MPHE/PGW/005/14
7. Gudeta Taresa GMPH/PGW/009/14
8. Mengistu Nuguse MPHG/PGW/011/14
9. Setegn Gemechu MPHE/PGW/018/14
10. Endale Wondimu MPH/RH/PGW/008/14
11. Workinesh Ejata MPH/RH/PGW/019/14
12. Adane Tsegay MPH/RH/PGW/001/14
13. Said Kebede MPHE/PGW/014/14

PART I: Categorical Data Analysis
1. Use regular health checkup, BMI, age and exercise as predictors and describe your finding
(NB: Both crude and adjusted coefficients shall be reported and the variable BMI should be
included as a categorical variable).
A. Heart attack versus exercise
H0: There is no association between heart attack and exercise
HA: There is an association between heart attack and exercise
Response variable: heart attack
Independent variable: exercise
No exercise = 0 (reference category); exercises regularly = 1
Table 1: SPSS output for logistic regression
Variables in the Equation
                                                                                    95% C.I. for Exp(B)
                                  B        S.E.     Wald      df   Sig.   Exp(B)    Lower    Upper
Step 1a  exercises regularly(1)   -2.522   .210     143.833   1    .000   .080      .053     .121
         Constant                 2.053    .172     141.909   1    .000   7.789
a. Variable(s) entered on step 1: exercises regularly.

Logistic regression equation:

Log odds (heart attack) = 2.053 - 2.522 (regular exercise)

The SPSS output in Table 1 shows that the estimates of β0 and β1 are b0 = 2.053 and b1 = -2.522, respectively. From these, the estimated odds of heart attack among those who do not exercise regularly (the reference category) are exp(2.053) = 7.789; that is, in this group a heart attack is about eight times as likely as no heart attack. This value is the odds within one group, not a comparison between groups. The estimated odds ratio for those who exercise regularly is exp(-2.522) = 0.080. In other words, the odds of heart attack are 92% lower among those who exercise regularly (COR = 0.080; 95% CI: 0.053, 0.121) than among those who do not. Equivalently, those who do not exercise regularly have about 12.5 times (1/0.080) the odds of heart attack.
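The arithmetic behind this interpretation can be checked with a short Python sketch. It uses only the two coefficients reported in Table 1; the helper name `odds_to_probability` is introduced here for illustration and is not part of the SPSS output.

```python
import math

# Coefficients copied from Table 1 (crude model with exercise only).
b0 = 2.053   # intercept: log odds of heart attack without regular exercise
b1 = -2.522  # change in log odds for those who exercise regularly


def odds_to_probability(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


odds_no_exercise = math.exp(b0)                # odds in the reference group
odds_ratio = math.exp(b1)                      # crude odds ratio (COR)
odds_exercise = odds_no_exercise * odds_ratio  # odds among regular exercisers

print(f"Odds without regular exercise: {odds_no_exercise:.3f}")
print(f"Crude odds ratio for exercise: {odds_ratio:.3f}")
print(f"Odds with regular exercise:    {odds_exercise:.3f}")
print(f"P(heart attack | no exercise): {odds_to_probability(odds_no_exercise):.3f}")
print(f"P(heart attack | exercise):    {odds_to_probability(odds_exercise):.3f}")
```

Because the coefficients are rounded to three decimals, the recomputed values can differ from the SPSS output in the last digit.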
Table 2: SPSS output for logistic regression
Omnibus Tests of Model Coefficients
Chi-square df Sig.
Step 1 Step 182.581 1 .000
Block 182.581 1 .000
Model 182.581 1 .000

In Table 2 the model chi-square is highly significant (chi-square = 182.581, df = 1, p < 0.001). So the model with exercise as a predictor fits significantly better than the null model (the model without independent variables).
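As a rough check on the test statistics, the sketch below recomputes the Wald statistic from Table 1 and the p-value of the omnibus chi-square in Table 2. It assumes 1 degree of freedom, for which the chi-square upper tail equals erfc(sqrt(x/2)); the function names are illustrative.

```python
import math


def wald_statistic(b, se):
    """Wald chi-square for a single coefficient: (B / S.E.) squared."""
    return (b / se) ** 2


def chi2_p_value_df1(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


wald_exercise = wald_statistic(-2.522, 0.210)  # Table 1 reports 143.833
p_omnibus = chi2_p_value_df1(182.581)          # omnibus chi-square from Table 2

print(f"Wald statistic from rounded B and S.E.: {wald_exercise:.2f}")
print(f"Omnibus p-value: {p_omnibus:.3g}")
```

The recomputed Wald statistic differs slightly from 143.833 because B and S.E. are rounded in the table. SPSS prints very small p-values as .000; they are better reported as p < 0.001.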

B. Heart attack and BMI_cat

H0: There is no association between heart attack and BMI

HA: There is an association between heart attack and BMI

Outcome variable: heart attack

Independent variable: BMI category (BMI_cat)

BMI < 30 kg/m2 = 0 (reference category); BMI ≥ 30 kg/m2 = 1

Table 3: SPSS output for logistic regression

Variables in the Equation
                                                                                    95% C.I. for Exp(B)
                                  B        S.E.     Wald      df   Sig.   Exp(B)    Lower    Upper
Step 1a  BMI_cat(1)               2.095    .342     37.529    1    .000   8.125     4.156    15.881
         Constant                 .347     .091     14.644    1    .000   1.415
a. Variable(s) entered on step 1: BMI_cat.

Logistic regression equation:

Log odds (heart attack) = 0.347 + 2.095 (BMI_cat ≥ 30)

Individuals with BMI ≥ 30 kg/m2 have about eight times the odds of heart attack compared with those with BMI below 30 kg/m2 (COR = 8.125; 95% CI: 4.156, 15.881). BMI category has a statistically significant relationship with heart attack (p < 0.001).
Table 4: SPSS output for logistic regression

Omnibus Tests of Model Coefficients


Chi-square df Sig.
Step 1 Step 59.146 1 .000
Block 59.146 1 .000
Model 59.146 1 .000

So the model with BMI_cat as a predictor fits significantly better than the model without independent variables (chi-square = 59.146, df = 1, p < 0.001).

C. Heart attack and regular checkup

Table 5: SPSS output for logistic regression

Categorical Variables Codings
                                  Frequency   Parameter coding (1)
receives regular checkups   no    268         .000
                            yes   357         1.000

Table 6: SPSS output for logistic regression


Variables in the Equation
                                                                                          95% C.I. for Exp(B)
                                        B        S.E.     Wald      df   Sig.   Exp(B)    Lower    Upper
Step 1a  receives regular checkups(1)   .027     .170     .026      1    .872   1.028     .737     1.433
         Constant                       .616     .128     23.140    1    .000   1.851
a. Variable(s) entered on step 1: receives regular checkups.
Logistic regression equation:
Log odds (heart attack) = 0.616 + 0.027 (regular checkup)
There is no statistically significant association between heart attack and regular checkup (p = 0.872). The odds of heart attack are 2.8% higher among those who receive regular checkups than among those who do not (COR = 1.028; 95% CI: 0.737, 1.433), but the confidence interval includes 1.
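The confidence interval reported by SPSS can be reproduced from B and S.E. with a Wald interval built on the log scale. The sketch below assumes the usual normal quantile of 1.96 for a 95% interval; the function name `wald_interval` is illustrative.

```python
import math


def wald_interval(b, se, z=1.96):
    """Return (exp(b), lower, upper) using a Wald interval built on the log scale."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# B and S.E. for regular checkups, copied from Table 6.
cor, lower, upper = wald_interval(0.027, 0.170)
includes_one = lower < 1.0 < upper

print(f"COR = {cor:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
print("Interval includes 1:", includes_one)
```

An interval that includes 1 is consistent with the non-significant p-value of 0.872.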
Table 7: SPSS output for logistic regression
Omnibus Tests of Model Coefficients
Chi-square df Sig.
Step 1 Step .026 1 .872
Block .026 1 .872
Model .026 1 .872

D. Heart attack and age
H0: There is no association between heart attack and age
HA: There is an association between heart attack and age
Outcome variable: heart attack
Independent variable: age in years

Logistic regression equation:

Log odds (heart attack) = -13.314 + 0.257 (age)

Table 8: SPSS output for logistic regression


Variables in the Equation
                                                                                    95% C.I. for Exp(B)
                                  B        S.E.     Wald      df   Sig.   Exp(B)    Lower    Upper
Step 1a  age                      .257     .040     40.437    1    .000   1.293     1.195    1.400
         Constant                 -13.314  2.187    37.061    1    .000   .000
a. Variable(s) entered on step 1: age in years.

Age is a statistically significant risk factor for heart attack in the crude analysis (p < 0.001). The odds of heart attack increase by 29.3% for each one-year increase in age (COR = 1.293; 95% CI: 1.195, 1.400).
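The per-year odds ratio compounds multiplicatively over larger age differences. A minimal sketch, assuming the log odds are linear in age as the model imposes (the output does not verify this); `odds_ratio_for_age_gap` is an illustrative name:

```python
import math

B_AGE = 0.257  # crude coefficient for age in years, copied from Table 8


def odds_ratio_for_age_gap(years, b=B_AGE):
    """Odds ratio comparing two people whose ages differ by `years`."""
    return math.exp(b * years)


for gap in (1, 5, 10):
    print(f"{gap:>2}-year difference: OR = {odds_ratio_for_age_gap(gap):.2f}")
```

A 5-year difference therefore corresponds to 1.293 raised to the power 5, not to five times 29.3%.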

Table 9: SPSS output for logistic regression

Hosmer and Lemeshow Test


Step Chi-square df Sig.
1 2.880 6 .824
The Hosmer and Lemeshow goodness-of-fit test is non-significant (chi-square = 2.880, df = 6, p = 0.824), suggesting that the model fits the data adequately.

Table 10: SPSS output for logistic regression

Omnibus Tests of Model Coefficients


Chi-square df Sig.
Step 1 Step 43.646 1 .000
Block 43.646 1 .000
Model 43.646 1 .000

Table 10 shows that the chi-square is highly significant (chi-square = 43.646, df = 1, p < 0.001). So the model with age as a predictor fits significantly better than the null model (the model without independent variables).

E. Multivariable Logistic regression

In the multivariable analysis, multiple logistic regression was performed using the forward stepwise method to identify independent predictors of heart attack. Exercise, age and BMI_cat entered the model (Table 11); regular checkup did not.

Table 11: SPSS output for multivariable logistic regression

Variables in the Equation
                                                                                    95% C.I. for Exp(B)
                                  B        S.E.     Wald      df   Sig.   Exp(B)    Lower    Upper
Step 1a  exercises regularly(1)   -2.522   .210     143.833   1    .000   .080      .053     .121
         Constant                 2.053    .172     141.909   1    .000   7.789
Step 2b  exercises regularly(1)   -2.920   .241     147.026   1    .000   .054      .034     .087
         age in years             .391     .052     56.599    1    .000   1.478     1.335    1.637
         Constant                 -18.947  2.765    46.946    1    .000   .000
Step 3c  exercises regularly(1)   -2.762   .246     126.165   1    .000   .063      .039     .102
         BMI_cat(1)               1.704    .384     19.696    1    .000   5.495     2.589    11.662
         age in years             .396     .053     55.361    1    .000   1.486     1.339    1.650
         Constant                 -19.553  2.839    47.425    1    .000   .000
a. Variable(s) entered on step 1: exercises regularly.
b. Variable(s) entered on step 2: age in years.
c. Variable(s) entered on step 3: BMI_cat.

Model 1 can be compared with model 2 and model 3 to evaluate the potential confounding effect of the variables age and BMI_cat. The Exp(B) for regular exercise in model 1 is 0.080 (COR). After the variable age is included, it decreases to 0.054 (AOR).

The crude model (model 1) therefore yields an estimated odds ratio that is somewhat higher (closer to 1) than the corresponding estimate obtained when we adjust for age. Since the AOR differs from the COR (0.054 versus 0.080), age appears to confound the crude association, and model 2 controls for it. When we compare model 1 with model 3, the OR for regular exercise is 0.063 (AOR).
The width of the confidence interval for regular exercise is 0.068 in model 1, 0.053 in model 2 and 0.063 in model 3. Therefore, model 2 gives a more precise estimate of the odds ratio than models 1 and 3.
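The comparison of crude and adjusted estimates can be made explicit with a short sketch. The values are copied from Table 11; treating a relative change of more than about 10% as a sign of confounding is a common rule of thumb and an assumption here, not something stated in the assignment.

```python
# Exp(B) and 95% CI limits for regular exercise, copied from Table 11.
models = {
    "model 1": {"or": 0.080, "lower": 0.053, "upper": 0.121},
    "model 2": {"or": 0.054, "lower": 0.034, "upper": 0.087},
    "model 3": {"or": 0.063, "lower": 0.039, "upper": 0.102},
}


def percent_change(crude, adjusted):
    """Relative change of the adjusted estimate from the crude one, in percent."""
    return (adjusted - crude) / crude * 100.0


def ci_width(model):
    """Width of the confidence interval on the odds-ratio scale."""
    return round(model["upper"] - model["lower"], 3)


crude_or = models["model 1"]["or"]
for name, m in models.items():
    change = percent_change(crude_or, m["or"])
    print(f"{name}: OR = {m['or']:.3f}, change from crude = {change:+.1f}%, "
          f"CI width = {ci_width(m):.3f}")

narrowest = min(models, key=lambda name: ci_width(models[name]))
print("Narrowest interval:", narrowest)
```

Widths are compared on the odds-ratio scale because that is how the report compares them; comparing on the log scale would be an alternative.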
Table 12: SPSS output for logistic regression
Omnibus Tests of Model Coefficients
Chi-square df Sig.
Step 1 Step 182.581 1 .000
Block 182.581 1 .000
Model 182.581 1 .000
Step 2 Step 67.061 1 .000
Block 249.642 2 .000
Model 249.642 2 .000
Step 3 Step 24.995 1 .000
Block 274.637 3 .000
Model 274.637 3 .000

Table: 13 SPSS Output for logistic regression


Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      624.539a            .253                   .349
2      557.478a            .329                   .454
3      532.484b            .356                   .490
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
b. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

Table 13 shows that the -2 log likelihood (-2LL) decreases from model 1 (624.539) to model 3 (532.484), so each added variable improves the fit. Nagelkerke's R2 indicates that model 3 explains about 49% of the variation in heart attack. The chi-square is significant for all three models (Table 12). For these reasons, model 3 is the best model.
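The step chi-squares in Table 12 are likelihood ratio statistics, that is, differences between successive -2LL values in Table 13. The sketch below checks this; the -2LL of the null model is not printed in the output and is only implied by the step 1 chi-square.

```python
# -2 log likelihood for each step, copied from Table 13.
neg2ll = {1: 624.539, 2: 557.478, 3: 532.484}
# Step chi-squares reported in Table 12.
step_chi2 = {1: 182.581, 2: 67.061, 3: 24.995}

# Implied -2LL of the null model (an inference, not a printed value).
neg2ll_null = neg2ll[1] + step_chi2[1]

lr_statistics = {}
previous = neg2ll_null
for step in (1, 2, 3):
    lr_statistics[step] = previous - neg2ll[step]  # likelihood ratio chi-square
    previous = neg2ll[step]

for step, lr in lr_statistics.items():
    print(f"Step {step}: LR chi-square = {lr:.3f} (Table 12: {step_chi2[step]:.3f})")
print(f"Implied -2LL of the null model: {neg2ll_null:.3f}")
```

Small differences in the third decimal come from rounding in the SPSS tables.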

Logistic regression equation for the fitted model:

Log odds (heart attack) = -19.553 + 1.704 (BMI_cat ≥ 30) - 2.762 (regular exercise) + 0.396 (age)

Looking first at the results for BMI_cat(1), there is a significant effect (Wald = 19.696, df = 1, p < 0.001). BMI_cat(1) (BMI ≥ 30) makes a significant contribution to heart attack compared with BMI < 30. Participants with BMI ≥ 30 have about 5.5 times the odds of heart attack, controlling for the other independent variables (AOR = 5.495; 95% CI: 2.589, 11.662).

Regular exercise reduces the odds of heart attack by 93.7% (AOR = 0.063; 95% CI: 0.039, 0.102), controlling for the other variables, and its effect on heart attack is significant (p < 0.001). The odds of heart attack increase by 48.6% for each one-year increase in age (AOR = 1.486; 95% CI: 1.339, 1.650) after adjusting for the other variables, and age is a significant risk factor (p < 0.001).
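To show how the fitted equation turns into predicted probabilities, the sketch below applies the inverse logit to the step 3 coefficients. The two 50-year-old profiles are hypothetical illustrations, not observations from the data set, and `predicted_probability` is an illustrative name.

```python
import math

# Coefficients of the final model (step 3 of Table 11).
INTERCEPT = -19.553
B_BMI = 1.704        # BMI >= 30 versus BMI < 30
B_EXERCISE = -2.762  # regular exercise versus no regular exercise
B_AGE = 0.396        # per year of age


def predicted_probability(age, exercises, bmi_30_plus):
    """Predicted probability of heart attack from the fitted logistic model."""
    log_odds = (INTERCEPT + B_BMI * int(bmi_30_plus)
                + B_EXERCISE * int(exercises) + B_AGE * age)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical 50-year-olds with BMI < 30 (illustrative profiles, not study data).
p_sedentary = predicted_probability(50, exercises=False, bmi_30_plus=False)
p_active = predicted_probability(50, exercises=True, bmi_30_plus=False)
print(f"No regular exercise: {p_sedentary:.3f}")
print(f"Regular exercise:    {p_active:.3f}")
```

The ratio of the odds for the two profiles equals exp(-2.762), the adjusted odds ratio for regular exercise, whatever age is chosen.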

Conclusion: Regular exercise (p < 0.001), BMI_cat (p < 0.001) and age (p < 0.001) are statistically significant predictors of heart attack. Higher BMI and older age increase the odds, while regular exercise is protective.

2. In place of exercise, use intensity of exercise as a predictor and describe the result of the
revised model.

Reference categories: exercise intensity = no exercise; BMI_cat = BMI < 30

Table 14: SPSS output for logistic regression

Variables in the Equation
                                                                                    95% C.I. for Exp(B)
                                  B        S.E.     Wald      df   Sig.   Exp(B)    Lower    Upper
Step 1a  exercise intensity                         147.951   2    .000
         exercise intensity(1)    -2.303   .223     106.257   1    .000   .100      .065     .155
         exercise intensity(2)    -3.064   .294     108.544   1    .000   .047      .026     .083
         Constant                 2.053    .172     141.909   1    .000   7.789
Step 2b  age in years             .416     .054     59.333    1    .000   1.516     1.364    1.685
         exercise intensity                         150.677   2    .000
         exercise intensity(1)    -2.653   .252     110.686   1    .000   .070      .043     .115
         exercise intensity(2)    -3.718   .344     116.640   1    .000   .024      .012     .048
         Constant                 -20.288  2.872    49.919    1    .000   .000
Step 3c  BMI_cat(1)               1.716    .392     19.158    1    .000   5.562     2.580    11.995
         age in years             .420     .055     58.041    1    .000   1.521     1.366    1.695
         exercise intensity                         130.761   2    .000
         exercise intensity(1)    -2.488   .258     93.037    1    .000   .083      .050     .138
         exercise intensity(2)    -3.553   .349     103.509   1    .000   .029      .014     .057
         Constant                 -20.797  2.933    50.279    1    .000   .000
a. Variable(s) entered on step 1: exercise intensity.
b. Variable(s) entered on step 2: age in years.
c. Variable(s) entered on step 3: BMI_cat.

The adjusted odds ratio shows that individuals with BMI ≥ 30 kg/m2 have about 5.6 times the odds of heart attack compared with those with BMI below 30 kg/m2 (reference category) (AOR = 5.562; 95% CI: 2.580, 11.995), controlling for the other independent variables. Individuals who exercise at moderate intensity have 91.7% lower odds of heart attack (AOR = 0.083; 95% CI: 0.050, 0.138), and those who exercise at heavy intensity have 97.1% lower odds (AOR = 0.029; 95% CI: 0.014, 0.057), compared with those who do not exercise. Exercise intensity has a statistically significant relationship with heart attack (p < 0.001).
3. Has the performance of the model improved? How did you make the judgment or arrive at the conclusion?
The performance of the model has not improved. The confidence intervals in the two models (the model with regular exercise and the model with exercise intensity) have almost the same width, so the estimates are neither more nor less precise. In addition, the AORs for age and BMI are almost the same in both models.

PART II: Survival Analysis

1. Describe the survival experience of under-five children by selected characteristics of mothers and children using a plot. Test whether the survival curves are different using an appropriate test.
A. Survival analysis for survival time and place of residence

H0: Rural and urban under-five children have the same survival experience
HA: Rural and urban under-five children have different survival experience

Figure 1: KM curves of survival analysis

The KM curves for urban and rural children are plotted on the same graph. The KM curve for urban children is consistently higher than the KM curve for rural children, which indicates that urban under-five children have a better survival experience than rural under-five children. As the number of months increases, the two curves appear to move farther apart, suggesting that the survival advantage of living in an urban area grows as the child gets older.
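For readers unfamiliar with how a KM curve is built, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. The toy follow-up times and event indicators are hypothetical and are not taken from the assignment data; SPSS performs the same calculation on the full data set.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times:  follow-up time in months for each child
    events: 1 if the child died at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy data (not from the study).
toy_times = [2, 5, 5, 8, 12, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0, 1, 0, 0]

for month, s in kaplan_meier(toy_times, toy_events):
    print(f"month {month:>2}: S(t) = {s:.3f}")
```

Children censored at the same time as a death are counted as still at risk at that time, which is the usual convention.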

Table 15: SPSS output for survival analysis

Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 1.253 1 .263
Test of equality of survival distributions for the different levels of type of place of residence.

The log-rank statistic is 1.253 and the corresponding p-value is 0.263. With this p-value we fail to reject the null hypothesis that there is no overall difference between the two survival curves.
Conclusion: There is no statistically significant difference in survival experience between rural and urban under-five children, even though the urban curve lies above the rural curve in Figure 1.
B. Survival analysis for survival time and sex of child
H0: Male and female under-five children have the same survival experience

HA: Male and female under-five children have different survival experience

Figure 2: KM curves of survival analysis


The graph shows that the survivor function for females lies above that for males, which indicates that female children have a better survival experience than male children. The two curves are close together in the first few months but thereafter spread apart. At 30 months, the cumulative survival proportion is 94.2% for males and about 95.8% for females.

Table 16: SPSS output for survival analysis
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 7.063 1 .008
Test of equality of survival distributions for the different levels of sex of child.

The survival distributions differ significantly by sex of child (chi-square = 7.063, df = 1, p = 0.008), so we reject the null hypothesis.
Conclusion: Male and female under-five children have different survival experience.
C. Survival analysis for survival time and mother's educational level
H0: There is no difference in survival experience across mothers' educational levels
HA: Survival experience differs across mothers' educational levels

Figure 3: KM curves of survival analysis


Under-five children whose mothers have secondary or higher education have relatively high and similar survival experience. Under-five children whose mothers have no education or primary education have similar survival experience to each other, and lower survival than children whose mothers have secondary or higher education.

Table 17: SPSS output for survival analysis
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 5.277 3 .153
Test of equality of survival distributions for the different levels of highest educational level.

Decision: Since the p-value (0.153) is greater than 0.05, we fail to reject the null hypothesis.
Conclusion: There is no statistically significant difference in the survival experience of under-five children across mothers' educational levels.
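The three log-rank decisions can be checked by converting each chi-square statistic to a p-value. The sketch below implements the chi-square upper tail for integer degrees of freedom using only the standard library; `chi2_sf` is an illustrative helper, not an SPSS function, and the 0.05 threshold is the significance level used in this report.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 0.5  # exact tail for df = 1
    else:
        tail, k = math.exp(-half), 1.0             # exact tail for df = 2
    # Each added term raises the degrees of freedom by 2.
    while 2 * k < df:
        tail += half ** k * math.exp(-half) / math.gamma(k + 1.0)
        k += 1.0
    return tail


# Log-rank chi-square statistics and degrees of freedom from Tables 15, 16 and 17.
log_rank_tests = {
    "place of residence": (1.253, 1),
    "sex of child": (7.063, 1),
    "mother's educational level": (5.277, 3),
}

p_values = {name: chi2_sf(stat, df) for name, (stat, df) in log_rank_tests.items()}
for name, p in p_values.items():
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: p = {p:.3f} -> {decision}")
```

Only sex of child gives a p-value below 0.05, which matches the conclusions drawn from Tables 15 to 17.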

2. Fit cox regression model for the factors identified in the previous steps. [Please produce both
crude and adjusted effect measures and also check all the assumptions of the model].
A. Cox regression for sex of child
Reference category = male (0)
Table 18: SPSS output for Cox regression
Variables in the Equation
95.0% CI for Exp(B)
B SE Wald df Sig. Exp(B) Lower Upper
sex of child -.285 .108 6.925 1 .008 .752 .608 .930
Females have a 24.8% (1 - 0.752) lower hazard of dying before the age of five than males (crude HR = 0.752; 95% CI: 0.608, 0.930), and the difference is statistically significant (p = 0.008). The proportional hazards assumption appears to be met because the survival curves of the two groups do not cross each other (Figure 2).

D. Cox regression for type of place of residence


Reference category = urban
Table 19: SPSS output for Cox regression

Variables in the Equation
                                                                                    95.0% CI for Exp(B)
                                  B        SE       Wald      df   Sig.   Exp(B)    Lower    Upper
type of place of residence        .156     .140     1.235     1    .267   1.168     .888     1.538

Under-five children living in rural areas have a 16.8% higher hazard of dying than under-five children living in urban areas (crude HR = 1.168; 95% CI: 0.888, 1.538). Place of residence is not a significant risk factor for death among under-five children (p = 0.267).
The proportional hazards assumption appears to be met because the curves of the two groups do not cross each other (Figure 1).
E. Cox regression for highest educational level
Reference category = no education
Mother's educational level has no significant effect on the survival experience of under-five children: the table below shows that the overall p-value is greater than 0.05 and that the confidence interval of each category includes 1. Overall, mother's educational status is not a significant risk factor for the hazard of dying among under-five children.

Table 20: SPSS output for Cox regression

Variables in the Equation
                                                                                    95.0% CI for Exp(B)
                                  B        SE       Wald      df   Sig.   Exp(B)    Lower    Upper
highest educational level                           5.024     3    .170
highest educational level(1)      -.038    .121     .100      1    .752   .962      .758     1.221
highest educational level(2)      -.610    .359     2.878     1    .090   .544      .269     1.099
highest educational level(3)      -.757    .504     2.256     1    .133   .469      .175     1.260

As shown in Figure 4 below, the assumption of the Cox PH model is not met: the curves for no education and primary education touch each other, so the relative hazard between groups is not constant over time.

Figure 4: Cox regression curves for survival function of educational level

F. Cox regression analysis for independent variable “child is twin”


Reference category = single birth
Table 21: SPSS output for Cox regression analysis

Variables in the Equation
                                                                                    95.0% CI for Exp(B)
                                  B        SE       Wald      df   Sig.   Exp(B)    Lower    Upper
child is twin                                       12.903    2    .002
child is twin(1)                  .980     .273     12.898    1    .000   2.664     1.561    4.547
child is twin(2)                  -6.018   85.655   .005      1    .944   .002      .000     1.977E+70

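The hazard of dying for under-five children who are the first of a multiple birth is about 2.7 times the hazard for single births (crude HR = 2.664; 95% CI: 1.561, 4.547). The estimate for the second of a multiple birth (crude HR = 0.002; p = 0.944) has a very large standard error (85.655) and a confidence interval running from .000 to 1.977E+70, so it is too unstable to interpret and shows no significant difference from single births.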
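The instability of the second category can be seen by rebuilding its confidence interval from B and S.E. A pattern like this often arises when a category contains very few children or no deaths, although the output shown here does not confirm the cause. The 1.96 quantile is an assumption of the sketch.

```python
import math

# B and S.E. for "child is twin(2)", copied from Table 21.
b, se = -6.018, 85.655

z = 1.96
log_lower, log_upper = b - z * se, b + z * se
hazard_ratio = math.exp(b)
upper_limit = math.exp(log_upper)

print(f"HR = {hazard_ratio:.3f}")
print(f"95% CI on the log scale: ({log_lower:.1f}, {log_upper:.1f})")
print(f"Upper limit of the HR: {upper_limit:.3e}")
print(f"S.E. is {se / abs(b):.1f} times the size of the coefficient")
```

An interval this wide carries essentially no information about the hazard in that category.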

Figure 5: SPSS output for Cox regression analysis

The proportional hazards assumption appears to be met because the curves do not cross each other.

G. Cox regression analysis for independent variable “wealth index”

Reference category = poorest (0)
Table 22: SPSS output for Cox PH analysis
Variables in the Equation
                                  B        SE       Wald      df   Sig.   Exp(B)
wealth index                                        3.527     4    .474
wealth index(1)                   -.038    .157     .059      1    .808   .963
wealth index(2)                   -.161    .168     .920      1    .337   .851
wealth index(3)                   -.077    .163     .225      1    .635   .926
wealth index(4)                   -.274    .156     3.068     1    .080   .761

The hazard of dying for under-five children whose family is richer (crude HR = 0.926) is 7.4% lower than the hazard for under-five children whose family is poorest. Under-five children whose family has middle income (crude HR = 0.851) have a 14.9% lower hazard of dying than those in the poorest families. Neither difference is statistically significant, and wealth index overall is not a significant risk factor for the hazard of dying among under-five children (p = 0.474).
The assumption of the Cox PH model is not met because the survival curves cross each other.
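The percentage changes quoted for each wealth category follow directly from the coefficients in Table 22. In the sketch below, the labels for wealth index(1) and wealth index(4) are assumed from the usual five-level ordering, since the text names only the middle and richer categories.

```python
import math

# Crude Cox coefficients from Table 22 (reference category: poorest).
# Labels for (1) and (4) are assumptions; the report names only (2) and (3).
wealth_coefficients = {
    "poorer (1)": -0.038,
    "middle (2)": -0.161,
    "richer (3)": -0.077,
    "richest (4)": -0.274,
}


def percent_change_in_hazard(b):
    """Percent change in hazard relative to the reference: (exp(B) - 1) * 100."""
    return (math.exp(b) - 1.0) * 100.0


changes = {name: percent_change_in_hazard(b) for name, b in wealth_coefficients.items()}
for name, change in changes.items():
    hr = math.exp(wealth_coefficients[name])
    print(f"{name}: HR = {hr:.3f}, change in hazard = {change:+.1f}%")
```

A negative change means a lower hazard than in the poorest category; none of these differences is statistically significant in Table 22.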

Figure 6: SPSS output for Cox regression analysis

H. Multivariable Cox regression PH model

In the multivariable analysis, Cox regression was performed using the forward stepwise (conditional LR) method to identify independent risk factors for the hazard of dying among under-five children.

Reference categories: child is twin = single birth; sex of child = male

Table 23: SPSS output for Cox PH analysis


Variables in the Equation
                                                                                    95.0% CI for Exp(B)
                                  B        SE       Wald      df   Sig.   Exp(B)    Lower    Upper
Step 1  child is twin                               12.903    2    .002
        child is twin(1)          .980     .273     12.898    1    .000   2.664     1.561    4.547
        child is twin(2)          -6.018   85.655   .005      1    .944   .002      .000     1.977E+70
Step 2  child is twin                               12.809    2    .002
        child is twin(1)          .976     .273     12.803    1    .000   2.654     1.555    4.531
        child is twin(2)          -6.158   86.075   .005      1    .943   .002      .000     3.914E+70
        sex of child              -.284    .108     6.880     1    .009   .752      .608     .931

When we compare model 1 (step 1) with model 2 (step 2) to evaluate the potential confounding effect of the variable “sex of child”, the HR for the “child is twin” variable is almost the same in both models (2.664 versus 2.654). This means there is no confounding due to sex of the child: the crude model yields an estimated hazard ratio that is essentially the same as that of the adjusted model.
The confidence intervals for “child is twin” in each model are shown in Table 23 above. The interval for first multiple birth has a width of 2.986 in model 1 and 2.976 in model 2; therefore, model 2 gives a slightly more precise estimate of the hazard ratio than model 1. The fitted Cox regression model is model 2.
Children who are the first of a multiple birth have about 2.7 times the hazard of dying compared with single births (AHR = 2.654; 95% CI: 1.555, 4.531). Female under-five children have a 24.8% lower hazard of dying than males (AHR = 0.752; 95% CI: 0.608, 0.931).
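In the fitted Cox model the two effects combine multiplicatively on the hazard scale. The sketch below computes the hazard of each profile relative to the reference profile (male, single birth) from the step 2 coefficients. It assumes no interaction between sex and birth type, since none was fitted, and `relative_hazard` is an illustrative name.

```python
import math

# Adjusted coefficients from step 2 of Table 23.
B_FIRST_MULTIPLE = 0.976  # child is twin(1) versus single birth
B_FEMALE = -0.284         # female versus male


def relative_hazard(first_multiple, female):
    """Hazard relative to the reference profile (male, single birth)."""
    log_hr = B_FIRST_MULTIPLE * int(first_multiple) + B_FEMALE * int(female)
    return math.exp(log_hr)


for first_multiple in (False, True):
    for female in (False, True):
        hr = relative_hazard(first_multiple, female)
        print(f"first multiple={first_multiple!s:5}  female={female!s:5}  HR = {hr:.3f}")
```

The second multiple category is left out of the sketch because its estimate in Table 23 is too unstable to use.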
