Biostatistics in Practice 1
Contents
Statistical test for binary (or categorical) data
FREQ procedure
Simple and multiple logistic regression analysis
LOGISTIC procedure
Chi-square test: independent binary data
Clinical trial data (Hamada [浜田], 2017; p. 115)
Group: placebo, active
Outcome: effective, ineffective
Type of data set
Subject-level data (one row per subject)
ID  Group    Outcome
1   Placebo  1
2   Active   0
3   Placebo  0
4   Placebo  1
5   :        :
Frequency data (one row per cell)
Group    Outcome  N
Placebo  0        51
Placebo  1        4
Active   0        38
Active   1        14
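The two layouts carry the same information, and one can be derived from the other. The following is a minimal sketch, assuming a subject-level data set that holds only the four rows visible on the slide (the remaining rows are elided in the source and are not filled in here). The data set names subj and freqdat are hypothetical.

```sas
/* Subject-level data: only the four rows shown on the slide */
data subj;
  input id group $ outcome;
  datalines;
1 Placebo 1
2 Active 0
3 Placebo 0
4 Placebo 1
;
run;

/* Collapse to frequency form: OUT= writes one row per group*outcome cell */
proc freq data=subj noprint;
  tables group * outcome / out=freqdat(drop=percent rename=(count=n));
run;

proc print data=freqdat;
run;
```

The OUT= data set stores the cell count in a variable named COUNT, which is renamed to n here so that it can be used directly in a WEIGHT statement.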
FREQ procedure: chi-square test
Provides frequency tables and statistical tests for categorical data
Syntax
data datchisq;
  length group $ 100 outcome n 8;
  input group outcome n;
  cards;
Placebo 0 51
Placebo 1 4
Active 0 38
Active 1 14
;
run;
proc freq data=datchisq;
  table group * outcome / chisq nocol nopercent;
  /* If you use frequency data, specify the frequency variable in the WEIGHT statement */
  weight n;
run;
FREQ procedure: results for chi-square test
Yates’s correction (reported as "Continuity Adj. Chi-Square" in the output)
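As a hand check on the two statistics, the 2x2 shortcut formula can be applied to the frequency data given earlier (a = 51, b = 4, c = 38, d = 14, N = 107; row totals 55 and 52, column totals 89 and 18). This is a sketch of the arithmetic only; the values should correspond to the uncorrected and continuity-adjusted chi-square rows of the output, each with 1 degree of freedom.

```latex
\chi^2 = \frac{N\,(ad - bc)^2}{r_1 r_2 c_1 c_2}
       = \frac{107\,(51\cdot 14 - 4\cdot 38)^2}{55\cdot 52\cdot 89\cdot 18}
       = \frac{107 \cdot 562^2}{4{,}581{,}720} \approx 7.38

\chi^2_{\text{Yates}} = \frac{N\,\bigl(|ad - bc| - N/2\bigr)^2}{r_1 r_2 c_1 c_2}
       = \frac{107\,(562 - 53.5)^2}{4{,}581{,}720} \approx 6.04
```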
McNemar test: paired binary data
Clinical trial data (Hamada [浜田], 2017; p. 130)
Compare the presence/absence of symptoms before and after treatment
Type of data set
Subject-level data (one row per subject)
ID  Pre  Post
1   1    1
2   0    0
3   1    0
4   0    1
5   :    :
Frequency data (one row per cell)
Pre  Post  N
0    0     7
0    1     19
1    0     6
1    1     16
FREQ procedure: McNemar’s test
Syntax
proc freq data=datmcnemar;
  table pre * post / agree norow nocol;
  exact agree;   /* exact McNemar's test */
  weight n;
run;
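The call above assumes that a data set named datmcnemar already exists. A minimal sketch of a DATA step that builds it in frequency form, using the four cell counts listed under "Type of data set":

```sas
/* Frequency-form paired data: one row per (pre, post) combination */
data datmcnemar;
  input pre post n;
  datalines;
0 0 7
0 1 19
1 0 6
1 1 16
;
run;
```

Because each row stands for n subjects rather than one, the WEIGHT statement in the PROC FREQ step is required with this layout.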
FREQ procedure: Results for McNemar’s test
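McNemar's statistic uses only the discordant pairs. As a hand check, with the counts from the frequency data (pre = 0, post = 1: 19 pairs; pre = 1, post = 0: 6 pairs), the asymptotic statistic without continuity correction works out as follows, on 1 degree of freedom:

```latex
\chi^2_{\text{McNemar}} = \frac{(n_{01} - n_{10})^2}{n_{01} + n_{10}}
                        = \frac{(19 - 6)^2}{19 + 6}
                        = \frac{169}{25} = 6.76
```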
Logistic regression: model for binary data
Clinical trial data (Hamada [浜田], 2017; p. 164)
Group: placebo(0), active(1)
Outcome: effective(1), ineffective(0)
Covariate: sex (0: female, 1: male)
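With the coding above, a main-effects logistic model for these data can be written as follows. This is a sketch of the standard formulation; the coefficient symbols are notation introduced here, not taken from the slides.

```latex
\log\frac{\Pr(\text{outcome}=1)}{1-\Pr(\text{outcome}=1)}
  = \beta_0 + \beta_1\,\text{group} + \beta_2\,\text{sex},
\qquad
\mathrm{OR}_{\text{active vs. placebo}} = e^{\beta_1}
```

Here exp(beta_1) is the odds ratio for an effective outcome in the active group relative to placebo, adjusted for sex.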
LOGISTIC procedure
Syntax
proc logistic data=datlogis;
  model outcome (event = "1") = group sex / lackfit;   /* LACKFIT: Hosmer-Lemeshow test */
  freq n;
run;
LOGISTIC procedure: results
Model information
LOGISTIC procedure: results
Used for model selection
Regression coefficients
In this case the event proportion is high, so the odds ratio cannot be interpreted as a risk ratio.
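The reason can be seen from the relationship between the two measures in a simple two-group comparison. Writing p_0 for the event proportion in the reference group and p_1 for the comparison group (symbols introduced here for illustration):

```latex
\mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)},
\qquad
\mathrm{RR} = \frac{p_1}{p_0} = \frac{\mathrm{OR}}{1 - p_0 + p_0\,\mathrm{OR}}
```

When p_0 is close to 0 the denominator is close to 1 and OR is approximately RR; as p_0 grows, the odds ratio moves further from 1 than the risk ratio does.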
LOGISTIC procedure: results
Goodness of fit
c index (0.5 to 1)
0.5: no better than a coin toss (no discrimination)
1.0: perfect discrimination
Hosmer-Lemeshow test
Not significant -> no evidence of lack of fit
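For reference, both quantities can be written out as follows. This is a sketch of the usual definitions: n_c and n_d are the numbers of concordant and discordant pairs among the t pairs of observations with different outcomes, and the Hosmer-Lemeshow statistic sums over G groups formed by ordering the predicted probabilities, with O_g observed events, n_g observations, and mean predicted probability \bar{\pi}_g in group g.

```latex
c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t}

\chi^2_{HL} = \sum_{g=1}^{G}
  \frac{(O_g - n_g\bar{\pi}_g)^2}{n_g\,\bar{\pi}_g\,(1 - \bar{\pi}_g)},
\qquad \text{df} = G - 2
```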
Today’s exercise: EX15-9
1. A study was conducted to evaluate the relative efficacy of supplementation
with calcium versus calcitriol in the treatment of postmenopausal
osteoporosis. Calcitriol is an agent that has the ability to increase
gastrointestinal absorption of calcium. A number of patients withdrew from
this study prematurely due to the adverse effects of treatment, which
include thirst, skin problems, and neurologic symptoms. The relevant data
appear below.
              Withdrawal
Treatment     Yes    No     Total
Calcitriol    27     287    314
Calcium       20     288    308
Total         47     575    622
(a) Compute the sample proportion of subjects who withdrew from the study in each
treatment group.
(b) Test the null hypothesis that there is no association between treatment group and
withdrawal from the study at the 0.05 level of significance. What do you conclude?
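One possible way to set this exercise up with the FREQ procedure, following the same pattern as the chi-square example earlier. This is a sketch only: the data set name withdraw and the variable names trt, withdrawal, and n are hypothetical, and the counts are copied from the table above.

```sas
/* Frequency-form data for the calcitriol vs. calcium comparison */
data withdraw;
  length trt $ 10;
  input trt $ withdrawal $ n;
  datalines;
Calcitriol Yes 27
Calcitriol No 287
Calcium Yes 20
Calcium No 288
;
run;

/* Row percentages give the withdrawal proportion per treatment (part a);
   CHISQ requests the test of association (part b) */
proc freq data=withdraw;
  table trt * withdrawal / chisq nocol nopercent;
  weight n;
run;
```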
Today’s exercise: EX 19-8
2. Consider the following data, taken from a study investigating
the relationship between smoking and aortic stenosis, a
narrowing or stricture of the aorta that impedes the flow of
blood to the body. Since gender is associated with both of
these variables, we suspect that it might influence the
observed relationship between them.
Male
           Smoker
Disease    Yes    No    Total
Yes        37     25    62
No         24     20    44
Total      61     45    106
Female
           Smoker
Disease    Yes    No    Total
Yes        14     29    43
No         19     47    66
Total      33     76    109
Today’s exercise: EX 19-8 (continued)
(a) Using the presence of aortic stenosis as the response, fit a logistic
regression model with smoking status as the single explanatory variable.
Interpret the estimated coefficient of smoking status.
(b) What are the estimated odds of suffering from aortic stenosis for
individuals who smoke relative to those who do not?
(c) Construct a 95% confidence interval for the population odds ratio.
Does this interval contain the value 1? What does this tell you?
(d) Add the explanatory variable gender to the model that already contains
smoking status. What are the estimated relative odds of aortic stenosis for
smokers versus nonsmokers, adjusting for gender?
(e) Construct a 95% confidence interval for the population odds ratio that
adjusts for gender. What do you conclude?
(f) Do you believe that the relationship between the presence of aortic
stenosis and smoking status differs for males and females? Explain.
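One possible way to set this exercise up with the LOGISTIC procedure, following the pattern of the earlier example. This is a sketch only: the data set name stenosis, the variable names, and the 0/1 coding (male = 1 for males, smoker = 1 for smokers, disease = 1 for aortic stenosis) are assumptions made here, and the counts are copied from the two tables in the exercise.

```sas
/* Frequency-form data: one row per sex * smoking * disease cell */
data stenosis;
  input male smoker disease n;
  datalines;
1 1 1 37
1 0 1 25
1 1 0 24
1 0 0 20
0 1 1 14
0 0 1 29
0 1 0 19
0 0 0 47
;
run;

/* Parts (a)-(c): smoking status as the single explanatory variable */
proc logistic data=stenosis;
  model disease (event = "1") = smoker;
  freq n;
run;

/* Parts (d)-(e): add gender to the model */
proc logistic data=stenosis;
  model disease (event = "1") = smoker male;
  freq n;
run;

/* Part (f): one option is to add a smoking-by-gender interaction term */
proc logistic data=stenosis;
  model disease (event = "1") = smoker male smoker*male;
  freq n;
run;
```

The default output of the first two calls includes an odds ratio table with 95% Wald confidence limits, which is the quantity asked for in parts (c) and (e).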