
Biostatistics Practice

7. Categorical data analysis


Kazushi Maruo
maruo@md.tsukuba.ac.jp

Biostatistics in Practice 1
Contents
Statistical test for binary (or categorical) data
FREQ procedure
Simple and multiple logistic regression analysis
LOGISTIC procedure

Chi-square test: independent binary data
Clinical trial data (Hamada, 2017; p. 115)
Group: placebo, active
Outcome: effective, ineffective

Group | Effect   Effective   Ineffective
Placebo                  4            51
Active                  14            38

Type of data set
Per-subject data
ID   Group     Outcome
1    Placebo   1
2    Active    0
3    Placebo   0
4    Placebo   1
5    :         :

Frequency data
Group     Outcome   N
Placebo   0         51
Placebo   1         4
Active    0         38
Active    1         14

Outcome: 0 = Ineffective, 1 = Effective

Either type of data set can be used

FREQ procedure: chi-square test
Provides frequency tables and statistical tests for categorical data
Syntax
data datchisq;
  length group $ 100 outcome n 8;
  input group outcome n;
  cards;
Placebo 0 51
Placebo 1 4
Active 0 38
Active 1 14
;
run;
proc freq data=datchisq;
  table group * outcome / chisq nocol nopercent;
  weight n;  /* for frequency data, name the count variable in the WEIGHT statement */
run;
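Because either data layout can be used, the same test can also be run on per-subject data. The following is a minimal sketch, assuming the frequency data set datchisq from the syntax above has already been created. The names datsubject and i are hypothetical and introduced only for this illustration.

```sas
/* Sketch: expand the frequency data set datchisq into one row per subject.
   datsubject and i are hypothetical names used for illustration. */
data datsubject;
  set datchisq;
  do i = 1 to n;
    output;        /* write one row for each counted subject */
  end;
  drop i n;
run;

/* With one row per subject, no WEIGHT statement is needed */
proc freq data=datsubject;
  table group * outcome / chisq nocol nopercent;
run;
```

Both calls should produce the same 2x2 table and test statistics, since the WEIGHT statement only tells the FREQ procedure how many subjects each row represents.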

FREQ procedure: results for chi-square test

Yates's correction: shown in the output as the continuity-adjusted chi-square
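The output listing is not reproduced in this text, so the two chi-square statistics can be checked by hand from the clinical trial table (placebo: 4 effective, 51 ineffective; active: 14 effective, 38 ineffective). The symbols a, b, c, d and n below are labels introduced for this sketch, not names from the SAS output.

```latex
% a = 4, b = 51, c = 14, d = 38, n = a + b + c + d = 107
\chi^2 = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
       = \frac{107\,(4 \cdot 38 - 51 \cdot 14)^2}{55 \cdot 52 \cdot 18 \cdot 89}
       \approx 7.38

% Yates's continuity correction subtracts n/2 from |ad - bc|
\chi^2_{\mathrm{Yates}} = \frac{n\,\bigl(|ad - bc| - n/2\bigr)^2}{(a+b)(c+d)(a+c)(b+d)}
       = \frac{107\,(562 - 53.5)^2}{55 \cdot 52 \cdot 18 \cdot 89}
       \approx 6.04
```

Both statistics are referred to a chi-square distribution with 1 degree of freedom.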

McNemar test: paired binary data
Clinical trial data (Hamada, 2017; p. 130)
Compare the presence or absence of symptoms before and after treatment

Pre | Post      Show symptoms   No symptoms   Total
Show symptoms              16             6      22
No symptoms                19             7      26
Total                      35            13      48

Type of data set
Per-subject data
ID   Pre   Post
1    1     1
2    0     0
3    1     0
4    0     1
5    :     :

Frequency data
Pre   Post   N
0     0      7
0     1      19
1     0      6
1     1      16

Outcome: 0 = No symptoms, 1 = Show symptoms
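The McNemar syntax that follows reads a data set named datmcnemar, but the DATA step that creates it is not shown. A minimal sketch, assuming the frequency layout above and the same style as the datchisq example, might look like this:

```sas
/* Sketch: frequency data for the paired pre/post table.
   0 = no symptoms, 1 = show symptoms; n = number of subjects. */
data datmcnemar;
  input pre post n;
  cards;
0 0 7
0 1 19
1 0 6
1 1 16
;
run;
```

The four counts sum to 48, matching the total in the pre/post table.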

FREQ procedure: McNemar’s test
Syntax
proc freq data=datmcnemar;
  table pre * post / agree norow nocol;
  exact agree;  /* exact McNemar test */
  weight n;
run;

FREQ procedure: Results for McNemar’s test

Kappa coefficient (also reported by the AGREE option): used for reliability research
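The output listing is not reproduced in this text, so the asymptotic McNemar statistic can be checked by hand. Only the discordant pairs contribute: 6 subjects went from symptoms to no symptoms and 19 went from no symptoms to symptoms. The symbols b and c are labels introduced for this sketch.

```latex
% b = 6  (pre: symptoms,    post: no symptoms)
% c = 19 (pre: no symptoms, post: symptoms)
\chi^2_{\mathrm{McNemar}} = \frac{(b - c)^2}{b + c}
                          = \frac{(6 - 19)^2}{6 + 19}
                          = \frac{169}{25}
                          = 6.76
```

This value is referred to a chi-square distribution with 1 degree of freedom. The exact test instead uses a binomial distribution for the 25 discordant pairs.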

Logistic regression: model for binary data
Clinical trial data (Hamada, 2017; p. 164)
Group: placebo(0), active(1)
Outcome: effective(1), ineffective(0)
Covariate: sex (0: female, 1: male)

The data set types are the same as for the FREQ procedure

LOGISTIC procedure
Syntax
proc logistic data=datlogis;
  model outcome (event = "1") = group sex / lackfit;  /* lackfit: Hosmer-Lemeshow test */
  freq n;  /* same role as the WEIGHT statement in the FREQ procedure */
run;

The event level can be specified optionally with event= (default: first level (0))
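For orientation, the model fitted by this call can be written out as follows. The coefficient symbols are notation introduced here for illustration and do not appear in the SAS output under these names.

```latex
% p = P(outcome = 1 | group, sex)
\log\frac{p}{1 - p} = \beta_0 + \beta_1\,\mathrm{group} + \beta_2\,\mathrm{sex}

% Odds ratio for active (group = 1) versus placebo (group = 0), adjusted for sex
\mathrm{OR}_{\mathrm{group}} = \exp(\beta_1)
```

Specifying event = "1" makes p the probability of an effective outcome. With the default (first level, 0), the signs of the coefficients would be reversed.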

LOGISTIC procedure: results
Model information

LOGISTIC procedure: results
Used for model selection

Null hypothesis: all regression coefficients = 0

Regression coefficients
In this case the event proportion is high, so the odds ratio cannot be interpreted as a risk ratio
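The logistic regression data set is not reproduced in this text, so the gap between the two measures is illustrated here with the 2x2 table from the chi-square example instead (active: 14 of 52 effective; placebo: 4 of 55 effective). This is only an illustration of the arithmetic, not a result from the logistic model.

```latex
% Odds ratio, active versus placebo
\mathrm{OR} = \frac{14/38}{4/51} = \frac{14 \cdot 51}{38 \cdot 4} = \frac{714}{152} \approx 4.70

% Risk ratio, active versus placebo
\mathrm{RR} = \frac{14/52}{4/55} = \frac{14 \cdot 55}{52 \cdot 4} = \frac{770}{208} \approx 3.70
```

The odds ratio is further from 1 than the risk ratio, and the gap widens as the event proportion grows. The two are close only when the event is rare in both groups.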

LOGISTIC procedure: results
Goodness of fit

c index (0.5 to 1)
0.5: no better than a coin toss (uninformative)
1.0: perfect discrimination

Hosmer-Lemeshow test
Not significant -> no evidence of lack of fit

Today’s exercise: EX 15-9
1. A study was conducted to evaluate the relative efficacy of supplementation
with calcium versus calcitriol in the treatment of postmenopausal
osteoporosis. Calcitriol is an agent that has the ability to increase
gastrointestinal absorption of calcium. A number of patients withdrew from
this study prematurely due to the adverse effects of treatment, which
include thirst, skin problems, and neurologic symptoms. The relevant data
appear below.
Trt | Withdrawal   Yes    No   Total
Calcitriol          27   287     314
Calcium             20   288     308
Total               47   575     622
(a) Compute the sample proportion of subjects who withdrew from the study in each
treatment group.
(b) Test the null hypothesis that there is no association between treatment group and
withdrawal from the study at the 0.05 level of significance. What do you conclude?
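One possible way to set up this exercise follows the chi-square example from earlier in the lecture. This is a sketch only: the data set name datwithdraw and the 0/1 coding of withdrawal (1 = yes, 0 = no) are assumptions made for this illustration.

```sas
/* Sketch: frequency data for EX 15-9.
   datwithdraw is a hypothetical name; withdrawal: 1 = yes, 0 = no. */
data datwithdraw;
  length trt $ 20 withdrawal n 8;
  input trt withdrawal n;
  cards;
Calcitriol 1 27
Calcitriol 0 287
Calcium 1 20
Calcium 0 288
;
run;

proc freq data=datwithdraw;
  table trt * withdrawal / chisq nocol nopercent;  /* row percentages stay in the table */
  weight n;
run;
```

Row percentages are not suppressed, so the withdrawal = 1 column should show the sample proportion for each treatment group asked for in part (a). The CHISQ output addresses part (b).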

Today’s exercise: EX 19-8
2. Consider the following data, taken from a study investigating
the relationship between smoking and aortic stenosis, a
narrowing or stricture of the aorta that impedes the flow of
blood to the body. Since gender is associated with both of
these variables, we suspect that it might influence the
observed relationship between them.
Male
Disease | Smoker   Yes   No   Total
Yes                 37   25      62
No                  24   20      44
Total               61   45     106

Female
Disease | Smoker   Yes   No   Total
Yes                 14   29      43
No                  19   47      66
Total               33   76     109

Today’s exercise: EX 19-8 (continued)
(a) Using the presence of aortic stenosis as the response, fit a logistic
regression model with smoking status as the single explanatory variable.
Interpret the estimated coefficient of smoking status.
(b) What are the estimated odds of suffering from aortic stenosis for
individuals who smoke relative to those who do not?
(c) Construct a 95% confidence interval for the population odds ratio.
Does this interval contain the value 1? What does this tell you?
(d) Add the explanatory variable gender to the model that already contains
smoking status. What is the estimated relative odds of aortic stenosis for
smokers versus nonsmokers, adjusting for gender?
(e) Construct a 95% confidence interval for the population odds ratio that
adjusts for gender. What do you conclude?
(f) Do you believe that the relationship between the presence of aortic
stenosis and smoking status differs for males and females? Explain.
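One possible way to set up this exercise with the LOGISTIC procedure is sketched below. The data set name datstenosis, the variable names, and the 0/1 codings (male: 1 = male, 0 = female; smoker and disease: 1 = yes, 0 = no) are assumptions made for this illustration. The counts come from the two tables in the exercise, read with disease status in the rows and smoking status in the columns.

```sas
/* Sketch: frequency data for EX 19-8 (hypothetical names and codings). */
data datstenosis;
  input male smoker disease n;
  cards;
1 1 1 37
1 0 1 25
1 1 0 24
1 0 0 20
0 1 1 14
0 0 1 29
0 1 0 19
0 0 0 47
;
run;

/* Parts (a)-(c): smoking status as the single explanatory variable */
proc logistic data=datstenosis;
  model disease (event = "1") = smoker / clodds=wald;
  freq n;
run;

/* Parts (d)-(e): add gender to the model */
proc logistic data=datstenosis;
  model disease (event = "1") = smoker male / clodds=wald;
  freq n;
run;

/* One way to explore the last part: add a smoking-by-gender interaction */
proc logistic data=datstenosis;
  model disease (event = "1") = smoker male smoker*male;
  freq n;
run;
```

The CLODDS=WALD option requests Wald confidence limits for the odds ratios. In the interaction model, the test of the smoker*male term indicates whether the smoking effect differs between males and females.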