Example 1

Data from 8 male patients who underwent a thyroidectomy


$Y$ = change in hemoglobin (%)
$X_1$ = duration of the operation (min)
$X_2$ = blood loss (ml)
Data:
Patient    $Y$     $X_1$   $X_2$
1          -1.7    105     503
2          -4.6     80     490
3          -9.8     86     471
4          -1.1    112     505
5          -4.1    109     482
6          -3.3    100     490
7           0.4     96     513
8          -2.9    120     464
Estimated Models:
Model A: $\hat{y}_i = -3.3875$
Model B: $\hat{y}_i = -14.520 + 0.110 x_{1i}$, $R^2 = 0.23$
Model C: $\hat{y}_i = -65.886 + 0.128 x_{2i}$, $R^2 = 0.50$
Model D: $\hat{y}_i = -84.256 + 0.129 x_{1i} + 0.139 x_{2i}$, $R^2 = 0.81$
Conclusion: Model D fits substantially better (higher $R^2$) than
Models A, B, and C.
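The four fits can be checked by hand from the data table. The following is a minimal sketch in Python (not the SAS procedure used in these slides), assuming ordinary least squares and the closed-form solution of the normal equations; the helper names `cross`, `fit_simple`, and `fit_two` are hypothetical. It prints the estimates and $R^2$ for comparison, up to rounding, with the values listed above.

```python
# Thyroidectomy data from the table above (8 male patients).
y = [-1.7, -4.6, -9.8, -1.1, -4.1, -3.3, 0.4, -2.9]   # change in hemoglobin (%)
x1 = [105, 80, 86, 112, 109, 100, 96, 120]            # operation time (min)
x2 = [503, 490, 471, 505, 482, 490, 513, 464]         # blood loss (ml)


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Corrected sum of cross-products: sum of (u_i - ubar)(v_i - vbar)."""
    ub, vb = mean(u), mean(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))


def fit_simple(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1, R^2)."""
    b1 = cross(x, y) / cross(x, x)
    b0 = mean(y) - b1 * mean(x)
    r2 = b1 * cross(x, y) / cross(y, y)
    return b0, b1, r2


def fit_two(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2, R^2)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2            # determinant of the 2x2 normal equations
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    r2 = (b1 * s1y + b2 * s2y) / cross(y, y)
    return b0, b1, b2, r2


model_a = mean(y)                  # Model A: intercept only
model_b = fit_simple(x1, y)        # Model B: x1 only
model_c = fit_simple(x2, y)        # Model C: x2 only
model_d = fit_two(x1, x2, y)       # Model D: x1 and x2
for name, est in [("A", (model_a,)), ("B", model_b), ("C", model_c), ("D", model_d)]:
    print(name, [round(v, 3) for v in est])
```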
For Model D:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + e_i$, where $e_i \overset{iid}{\sim} N(0, \sigma^2)$.
SAS code 1:
proc glm;
  model y = x1 x2;
run;
SAS code 2:
proc glm;
  model y = x2 x1;
run;
proc glm;
  model y = x1 x2;
run;
Thyroidectomy ANOVA (Type I)
Source of variation         d.f.   SS     MS     F      p-val
reg on $x_1$                1      15.2   15.2   6.2    0.055
reg on $x_2$ after $x_1$    1      38.4   38.4   15.8   0.011
error                       5      12.1   2.44
corrected total             7      65.7
Conclusions: (1) Without adjusting for linear effects of blood
loss ($X_2$), the linear relationship between hemoglobin change
($Y$) and operation time ($X_1$) was not quite significant. (2)
Blood loss ($X_2$) has a significant linear association with
hemoglobin change ($Y$) after adjusting for linear effects of
operation time ($X_1$).
proc glm;
  model y = x2 x1;
run;
Thyroidectomy ANOVA (Type I)
Source of variation         d.f.   SS     MS     F      p-val
reg on $x_2$                1      32.9   32.9   13.5   0.014
reg on $x_1$ after $x_2$    1      20.6   20.6   8.5    0.033
error                       5      12.1   2.44
corrected total             7      65.7
Conclusions: (1) Operation time ($X_1$) has a significant linear
association with $Y$ after adjusting for $X_2$. (2) Blood loss ($X_2$)
has a significant linear association with hemoglobin change
($Y$) ignoring operation time ($X_1$).
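The two Type I tables differ only in the order in which the regression sum of squares is split. As an illustrative sketch (plain Python rather than SAS, assuming the Example 1 data and a closed-form two-predictor least squares fit; all variable names are hypothetical), the sequential sums of squares and F statistics for both feeding orders can be computed as follows and should agree with the tables above up to rounding.

```python
# Thyroidectomy data from Example 1.
y = [-1.7, -4.6, -9.8, -1.1, -4.1, -3.3, 0.4, -2.9]
x1 = [105, 80, 86, 112, 109, 100, 96, 120]
x2 = [503, 490, 471, 505, 482, 490, 513, 464]


def cross(u, v):
    """Corrected sum of cross-products: sum of (u_i - ubar)(v_i - vbar)."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))


n = len(y)
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y, ssy = cross(x1, y), cross(x2, y), cross(y, y)

ssr_x1 = s1y ** 2 / s11                 # SS(x1 | 1)
ssr_x2 = s2y ** 2 / s22                 # SS(x2 | 1)

# Regression SS with both predictors, from the 2x2 normal equations.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
ssr_full = b1 * s1y + b2 * s2y
mse = (ssy - ssr_full) / (n - 3)        # error d.f. = 8 - 3 = 5

tables = {
    "model y = x1 x2": [("x1", ssr_x1), ("x2 after x1", ssr_full - ssr_x1)],
    "model y = x2 x1": [("x2", ssr_x2), ("x1 after x2", ssr_full - ssr_x2)],
}
for model, rows in tables.items():
    print(model)
    for label, ss in rows:
        print(f"  reg on {label}: SS = {ss:.1f}, F = {ss / mse:.1f}")
```

Note that the error line and the corrected total do not depend on the feeding order; only the split of the regression SS does.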
Example 2: Chapter 9 problem 5 (HW)
An experiment was conducted regarding a quantitative analysis
of factors found in high-density lipoprotein (HDL) in a sample
of human blood serum. Three variables thought to be
predictive of, or associated with, HDL measurement ($Y$) were
the total cholesterol ($X_1$) and total triglyceride ($X_2$)
concentrations in the sample, plus the presence or absence of a
certain sticky component of the serum called sinking pre-beta,
or SPB ($X_3$), coded as 0 if absent and 1 if present. The data
obtained are shown in the book on page 156.
HDL ($Y$); cholesterol ($X_1$), triglyceride ($X_2$), SPB ($X_3$), $X_1X_3$, $X_2X_3$, $X_1X_2$, etc.
Question a: Test whether $X_1$ alone significantly helps to
predict $Y$.
Solution:
Full Model: $y = \beta_0 + \beta_1 x_1 + e$;
Reduced Model: $y = \beta_0 + e$.
(Simple linear regression in chapter 5)
proc glm; model y = x1; run; or
proc reg; model y = x1; run;
Use the ANOVA table F test of model significance or the t-test
on $\beta_1$.
proc glm; model y = x1; run;
Question b: Test whether $X_1$, $X_2$, $X_3$ together significantly
help to predict $Y$.
Solution:
Full Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + e$;
Reduced Model: $y = \beta_0 + e$.
proc glm; model y = x1 x2 x3; run; or
proc reg; model y = x1 x2 x3; run;
Use the ANOVA table F test of model significance.
proc glm; model y = x1 x2 x3; run;
Question c: Test whether the true coefficients of the product
terms $X_1X_3$ and $X_2X_3$ are simultaneously 0 in the model
containing $X_1$, $X_2$, and $X_3$ plus these product terms.
Solution:
Full Model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_2 x_3 + e$;
Reduced Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + e$.
proc glm; model y = x1 x2 x3 x1*x3 x2*x3; run;
Use the Type I ANOVA table to compute $F_{obs}$ and compare it
to an F distribution with the correct degrees of freedom.
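As a sketch of that computation, the multiple-partial F statistic for the two product terms can be assembled from the last two rows of the Type I table. The symbols $n$ (number of observations) and $MSE_{full}$ (error mean square of the full model) are notation assumed here, not taken from the slides.

```latex
F_{obs} = \frac{\left[\, SS(x_1 x_3 \mid 1, x_1, x_2, x_3)
          + SS(x_2 x_3 \mid 1, x_1, x_2, x_3, x_1 x_3) \,\right] / 2}
          {MSE_{full}}
\;\sim\; F_{2,\; n-6} \quad \text{under } H_0:\ \beta_4 = \beta_5 = 0 .
```

The numerator has 2 degrees of freedom because two terms are tested, and the denominator has $n - 6$ because the full model has six parameters including the intercept.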
proc glm; model y = x1 x2 x3 x1*x3 x2*x3; run;
Question d: Test whether $X_3$ is associated with $Y$, after
taking into account the combined contribution of $X_1$ and $X_2$.
Solution:
Full Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + e$;
Reduced Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e$.
proc glm; model y = x1 x2 x3; run; or
proc reg; model y = x1 x2 x3; run;
Use the Type III ANOVA table (reg on $x_3$ after $x_1$ and $x_2$) or
the t-test on $\beta_3$.
Use output from question (b).
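The two routes named above give the same test. A brief sketch of why, using assumed notation ($n$ for the number of observations, $MSE_{full}$ for the error mean square of the full model, $SE(\hat{\beta}_3)$ for the standard error reported with the estimate):

```latex
F_{obs} = \frac{SS(x_3 \mid 1, x_1, x_2)}{MSE_{full}}
\;\sim\; F_{1,\; n-4} \quad \text{under } H_0:\ \beta_3 = 0,
\qquad
t_{obs}^2 = \left( \frac{\hat{\beta}_3}{SE(\hat{\beta}_3)} \right)^2 = F_{obs} .
```

Because the numerator has a single degree of freedom, the partial F test and the two-sided t-test yield the same p-value.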
Conclusions:
Be aware of which models are the full and reduced models in a test.
Fit the model so that the SS needed for the test appears in
the output, or can be computed from the output.
If the SAS code is: proc glm; model y = x2 x1 x3; run;
The Type I SS table contains (variable-added-in-order):
SS(x2|1), SS(x1|1, x2), SS(x3|1, x2, x1).
The Type III SS table contains (variable-added-last):
SS(x2|1, x1, x3), SS(x1|1, x2, x3), SS(x3|1, x1, x2).
The t-test is based on the Type III SS.
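The difference between the two tables can be made concrete with a small sketch. The HDL data are in the book and not reproduced here, so the code below reuses the Example 1 thyroidectomy data with the feeding order x2, x1 as an assumption for illustration. It is plain Python, not SAS, and the helper names `sse`, `type1_ss`, and `type3_ss` are hypothetical. Each SS is obtained as a difference of residual sums of squares between a reduced and a larger model.

```python
# Thyroidectomy data from Example 1, reused here purely for illustration.
y = [-1.7, -4.6, -9.8, -1.1, -4.1, -3.3, 0.4, -2.9]
x1 = [105, 80, 86, 112, 109, 100, 96, 120]
x2 = [503, 490, 471, 505, 482, 490, 513, 464]


def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def sse(columns, y):
    """Residual SS from regressing y on an intercept plus the given columns."""
    yc = centered(y)
    cols = [centered(c) for c in columns]   # centering absorbs the intercept
    p, n = len(cols), len(y)
    if p == 0:
        return sum(a * a for a in yc)       # intercept-only model: SSE = SSY
    # Augmented normal equations [X'X | X'y] for the centered data.
    a = [[sum(cols[r][i] * cols[c][i] for i in range(n)) for c in range(p)]
         + [sum(cols[r][i] * yc[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[piv] = a[piv], a[k]
        for r in range(p):
            if r != k:
                f = a[r][k] / a[k][k]
                a[r] = [u - f * w for u, w in zip(a[r], a[k])]
    b = [a[r][p] / a[r][r] for r in range(p)]
    fitted = [sum(b[j] * cols[j][i] for j in range(p)) for i in range(n)]
    return sum((yc[i] - fitted[i]) ** 2 for i in range(n))


def type1_ss(columns, y):
    """Variable-added-in-order SS: each column given all columns before it."""
    return [sse(columns[:k], y) - sse(columns[:k + 1], y)
            for k in range(len(columns))]


def type3_ss(columns, y):
    """Variable-added-last SS: each column given all the other columns."""
    full = sse(columns, y)
    return [sse(columns[:k] + columns[k + 1:], y) - full
            for k in range(len(columns))]


order = [x2, x1]                 # corresponds to: model y = x2 x1
t1 = type1_ss(order, y)          # SS(x2|1), SS(x1|1, x2)
t3 = type3_ss(order, y)          # SS(x2|1, x1), SS(x1|1, x2)
print([round(v, 2) for v in t1], [round(v, 2) for v in t3])
```

For the variable entered last, the Type I and Type III entries coincide, since both condition on all the other variables.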
Example 3
Yield of a chemical process.
$Y$ = Yield (%)
$X_1$ = Temperature (F)
$X_2$ = Time (hours)
Data:
$Y$    $X_1$   $X_2$
77     160     1
79     160     2
82     165     1
83     165     2
85     170     1
88     170     2
90     175     1
93     175     2
$$r_{x_1,x_2} = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{\sqrt{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2 \sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2}} = 0$$
Source                      d.f.   Type I SS   MS        F       p-val
reg on $x_1$                1      198.025     198.025   574.0   0.0001
reg on $x_2$ after $x_1$    1      10.125      10.125    29.3    0.0029
error                       5      1.725       0.345
corrected total             7      209.875

Source                      d.f.   Type I SS   MS        F       p-val
reg on $x_2$                1      10.125      10.125    29.3    0.0029
reg on $x_1$ after $x_2$    1      198.025     198.025   574.0   0.0001
error                       5      1.725       0.345
corrected total             7      209.875
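The zero correlation and the order-invariant sums of squares can be verified directly from the Example 3 data. The sketch below is plain Python rather than SAS, and the names `cross`, `r12`, and `ss_*` are hypothetical. It computes the sample correlation of the two covariates and the sequential SS in both orders.

```python
import math

# Chemical process data from Example 3.
y = [77, 79, 82, 83, 85, 88, 90, 93]             # yield (%)
x1 = [160, 160, 165, 165, 170, 170, 175, 175]    # temperature
x2 = [1, 2, 1, 2, 1, 2, 1, 2]                    # time (hours)


def cross(u, v):
    """Corrected sum of cross-products: sum of (u_i - ubar)(v_i - vbar)."""
    ub, vb = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))


s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)

# Sample correlation between the two covariates.
r12 = s12 / math.sqrt(s11 * s22)

# Single-predictor regression SS.
ss_x1 = s1y ** 2 / s11                  # SS(x1 | 1)
ss_x2 = s2y ** 2 / s22                  # SS(x2 | 1)

# Regression SS with both predictors, from the 2x2 normal equations.
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
ssr_full = b1 * s1y + b2 * s2y

ss_x2_after_x1 = ssr_full - ss_x1       # SS(x2 | 1, x1)
ss_x1_after_x2 = ssr_full - ss_x2       # SS(x1 | 1, x2)

print(r12, ss_x1, ss_x1_after_x2, ss_x2, ss_x2_after_x1)
```

With `s12` equal to zero the normal equations decouple, so the regression SS of the two-predictor model is simply the sum of the two single-predictor SS values.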
Conclusions:
Generally speaking, for multiple regression, a different
feeding order results in a different Type I SS table; i.e.
proc glm; model y = x1 x2; compared to
proc glm; model y = x2 x1;.
If (and, in general, only if) the covariates (Xs) are uncorrelated,
the Type I SS table remains the same regardless of the feeding
order; i.e. the contribution of one variable $X_j$ (the extra SS that
$X_j$ explains) does not change whether or not we adjust for $X_k$
($j \neq k$). The opposite scenario, confounding, will be
discussed later.
What will not change regardless of the feeding order and the
relationships among the covariates?
$\hat{\beta}_j$ for all $j$, SSR, SSE, SSY, and the
Type III table, whether we use model y = x1 x2 or
model y = x2 x1.
Test Intercept
Variable-added-last test: Test whether $\beta_0$ is necessary
after fitting $X_1, \ldots, X_k$ in the model.
proc glm; model y = x1 x2 ... xk; run;
Use the t-test for the intercept; i.e. use SS(1|$x_1, \ldots, x_k$) in the test.
Variable-added-in-order test: Test whether $\beta_0$ alone is
significant.
$$F_{obs} = \frac{n\bar{Y}^2}{SSY/(n-1)}$$
Look for SSY in the ANOVA table, and use proc means to get
$\bar{Y}$ (or calculate $\bar{Y}$ by hand). Then compare $F_{obs}$ with
$F_{1,n-1,1-\alpha}$ for a level $\alpha$ test.
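As a worked illustration of the variable-added-in-order statistic, the sketch below evaluates $F_{obs} = n\bar{Y}^2 / (SSY/(n-1))$ in plain Python. Using the Example 1 hemoglobin changes as input is an assumption made here for illustration; the slides do not apply this test to that data set. The critical value is left to an F table or software.

```python
# Hemoglobin changes from Example 1, used here only to illustrate the formula.
y = [-1.7, -4.6, -9.8, -1.1, -4.1, -3.3, 0.4, -2.9]

n = len(y)
y_bar = sum(y) / n                                # sample mean of Y
ssy = sum((v - y_bar) ** 2 for v in y)            # corrected total SS
f_obs = n * y_bar ** 2 / (ssy / (n - 1))          # SS(intercept) over SSY/(n-1)

# Compare f_obs with the upper critical value of F with (1, n-1) d.f.
print(round(y_bar, 4), round(ssy, 3), round(f_obs, 2))
```

The numerator $n\bar{Y}^2$ is the sum of squares explained by the intercept alone, and the denominator is the residual mean square of the intercept-only model.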
Summary
Multiple Regression (Important knowledge points)
Meaning of the $\beta_j$s, $j = 1, \ldots, k$.
LSE estimates of the $\beta_j$s; find them in the SAS output.
Test the significance of each covariate $X_j$, $j = 1, \ldots, k$, in
predicting $Y$, with or without adjusting for the contribution
of the other covariates.
Test the simultaneous significance of several
covariates in predicting $Y$, with or without adjusting for the
contribution of other covariates.
SAS GLM procedure: statements, output, Type I and Type
III output, t-test for each covariate.