$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$
and represents the proportion of variation in $Y$ “explained” by the
multiple linear regression model with predictors $x_1, x_2, \ldots, x_{p-1}$.
- However, $R^2$ always increases (or stays the same) as more predictors
are added to a multiple linear regression model, even if the predictors
added are unrelated to the response variable.
- An alternative measure,
$$\text{adjusted } R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p},$$
does not necessarily increase as more predictors are added, and can
be used to help us identify which predictors should be included in a
model and which should be excluded, as the R sketch below illustrates.
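A minimal R sketch of this contrast, using simulated data (all names and
numbers here are illustrative, not taken from any example in these notes):

# Response depends on x1 only; z is a pure-noise predictor
set.seed(1)
n  <- 50
x1 <- rnorm(n)
z  <- rnorm(n)                      # unrelated to the response
y  <- 2 + 3 * x1 + rnorm(n)

fit1 <- lm(y ~ x1)
fit2 <- lm(y ~ x1 + z)

summary(fit1)$r.squared             # R-squared never decreases ...
summary(fit2)$r.squared
summary(fit1)$adj.r.squared         # ... but adjusted R-squared can
summary(fit2)$adj.r.squared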
Significance Testing of Each Variable
- Suppose we have the multiple linear regression model
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$$
- As an example, to determine whether variable $x_1$ is a useful predictor
variable in this model, we could test
$$H_0: \beta_1 = 0 \quad \text{vs} \quad H_1: \beta_1 \neq 0$$
using the test statistic
$$t^* = \frac{b_1 - 0}{se(b_1)} = \frac{b_1}{se(b_1)}$$
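In R, this t-statistic appears in the coefficient table produced by
summary(). A minimal sketch (the data frame dat and the variable names
here are hypothetical):

fit   <- lm(y ~ x1 + x2 + x3, data = dat)
coefs <- summary(fit)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
coefs["x1", ]
coefs["x1", "Estimate"] / coefs["x1", "Std. Error"]   # reproduces the t value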
Consider the simple linear regression model
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n,$$
that is,
$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 x_1 + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 x_2 + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 x_n + \varepsilon_n
\end{aligned}$$
- We can formulate the above simple linear regression function in
matrix notation:
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} =
\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\tag{1}$$
or, compactly,
$$Y = X\beta + \varepsilon$$
Least Squares Estimates in Matrix Notation
- $X$ is an $n \times 2$ matrix.
- $Y$ is an $n \times 1$ column vector, $\beta$ is a $2 \times 1$ column vector, and $\varepsilon$ is an
$n \times 1$ column vector.
- The matrix $X$ and vector $\beta$ are multiplied together using the
techniques of matrix multiplication.
- The vector $X\beta$ is added to the vector $\varepsilon$ using the techniques of
matrix addition.
We can get the least squares estimates $b_0$ and $b_1$ using matrix notation:
$$b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = (X'X)^{-1} X'Y \tag{2}$$
where
- $X$ is an $n \times 2$ matrix.
- $X'$ is the transpose of the matrix $X$.
- $(X'X)^{-1}$ is the inverse of the matrix $X'X$.
- $Y$ is an $n \times 1$ column vector.
In simple linear regression, $X'X$ is a $2 \times 2$ matrix:
$$X'X = \begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$$
More generally, in multiple linear regression the same formula applies, where
- $X$ is an $n \times p$ matrix.
- $X'$ is the transpose of the matrix $X$.
- $(X'X)^{-1}$ is the inverse of the matrix $X'X$.
- $Y$ is an $n \times 1$ column vector.
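Formula (2) can be checked numerically in R. A minimal sketch with
simulated data (names and values are illustrative):

# Simulated simple linear regression data
set.seed(2)
n <- 30
x <- runif(n, 0, 10)
Y <- 5 + 1.5 * x + rnorm(n)

X <- cbind(1, x)                        # design matrix: intercept column plus x
b <- solve(t(X) %*% X) %*% t(X) %*% Y   # b = (X'X)^{-1} X'Y
b
coef(lm(Y ~ x))                         # matches the lm() estimates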
Topics in Today’s Class
- Does the mean size of the infarcted area differ among the three
treatment groups – no cooling, early cooling, and late cooling – when
controlling for the size of the region at risk for infarction?
- If we translate this question into a model, it is:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i$$
where:
- $Y_i$ is the size of the infarcted area (in grams) of rabbit $i$
- $x_{i1}$ is the size of the region at risk (in grams) of rabbit $i$
- $x_{i2} = 1$ if rabbit $i$ received early cooling, 0 if not
- $x_{i3} = 1$ if rabbit $i$ received late cooling, 0 if not
and the independent error terms $\varepsilon_i$ follow a normal distribution with mean
0 and equal variance $\sigma^2$.
Categorical Variable
Substituting the indicator values for each treatment group, the model reduces to:
- Early cooling ($x_{i2} = 1$, $x_{i3} = 0$): $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 + \varepsilon_i$
- Late cooling ($x_{i2} = 0$, $x_{i3} = 1$): $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_3 + \varepsilon_i$
- No cooling ($x_{i2} = 0$, $x_{i3} = 0$): $Y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$
- Thus, $\beta_2$ represents the difference in mean size of the infarcted
area – controlling for the size of the region at risk – between
“early cooling” and “no cooling” rabbits.
- $\beta_3$ represents the difference in mean size of the infarcted area –
controlling for the size of the region at risk – between “late
cooling” and “no cooling” rabbits.
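A minimal sketch of how the indicator variables might be constructed and
the model fit in R, assuming a hypothetical data frame rabbits with
columns Infarcted, Area, and a factor Cooling with levels "none",
"early", and "late" (the actual study data are not reproduced here):

rabbits$X2 <- as.numeric(rabbits$Cooling == "early")  # early-cooling indicator
rabbits$X3 <- as.numeric(rabbits$Cooling == "late")   # late-cooling indicator
fit <- lm(Infarcted ~ Area + X2 + X3, data = rabbits)

# Equivalently, let R build the indicators from the factor, with
# "none" as the reference level:
rabbits$Cooling <- relevel(rabbits$Cooling, ref = "none")
fit_alt <- lm(Infarcted ~ Area + Cooling, data = rabbits)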
Fitting the model to the rabbits’ data, the summary table in R is
summary(fit)
##
## Call:
## lm(formula = Infarcted ~ Area + X2 + X3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29410 -0.06511 -0.01329 0.07855 0.35949
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.13454 0.10402 -1.293 0.206459
## Area 0.61265 0.10705 5.723 3.87e-06 ***
## X2 -0.24348 0.06229 -3.909 0.000536 ***
## X3 -0.06566 0.06507 -1.009 0.321602
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1395 on 28 degrees of freedom
## Multiple R-squared: 0.6377, Adjusted R-squared: 0.5989
## F-statistic: 16.43 on 3 and 28 DF, p-value: 2.363e-06
As always, the researchers aren’t just interested in this sample. They want
to be able to answer their research question for the whole population of
rabbits.
Recall that the research question is: Does the mean size of the infarcted
area differ among the three treatment groups – no cooling, early cooling,
and late cooling – when controlling for the size of the region at risk for
infarction?
How Could the Researchers Use the Above Regression
Model to Answer Their Research Question?
- Note that the estimated slope coefficients are $b_2 = -0.2435$ and
$b_3 = -0.0657$. If the estimated coefficients $b_2$ and $b_3$ were instead
both 0, then the average size of the infarcted area would be the same
for the three groups of rabbits in this sample.
- If the two slopes $\beta_2$ and $\beta_3$ simultaneously equal 0, then the mean
size of the infarcted area would be the same for the whole population
of rabbits – controlling for the size of the region at risk.
- That is, the researchers’ question reduces to testing the hypothesis
$$H_0: \beta_2 = \beta_3 = 0 \quad \text{vs} \quad H_1: \text{at least one } \beta_k \neq 0 \ (k = 2, 3)$$
In this case, the researchers are interested in testing whether a
subset – two of the three – of the slope parameters are simultaneously zero.
We’ll soon see that this null hypothesis is tested using the analysis of
variance F-test.
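In R, such a test can be carried out by fitting the reduced and full
models and comparing them with anova(). A minimal sketch, again assuming
the hypothetical rabbits data frame with the indicators X2 and X3
constructed earlier:

reduced <- lm(Infarcted ~ Area, data = rabbits)           # H0: beta2 = beta3 = 0
full    <- lm(Infarcted ~ Area + X2 + X3, data = rabbits)
anova(reduced, full)   # general linear F-test comparing the nested models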
A Final Research Question
$$H_0: \beta_1 = 0 \quad \text{and} \quad H_1: \beta_1 \neq 0$$
In this case, the researchers are interested in testing that just one of
the three slope parameters is zero. Wouldn’t this just involve
performing a t-test for β1 ?
We’ll soon learn how to think about the t-test for a single slope
parameter in the multiple regression framework.
The General Linear F-Test
Consider simple linear regression, where the full model is
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
and the reduced model, obtained by setting $\beta_1 = 0$, is
$$Y_i = \beta_0 + \varepsilon_i$$
- “Fit the full model” to the data: obtain the least squares estimates
of $\beta_0$ and $\beta_1$. Determine the error sum of squares, which we denote
“SSE(F).”
- “Fit the reduced model” to the data: obtain the least squares
estimate of $\beta_0$. Determine the error sum of squares, which we denote
“SSE(R).”
Where are we going with this general linear F-test approach? In short:
- The general linear F-test involves a comparison between SSE(R) and
SSE(F).
- SSE(R) can never be smaller than SSE(F); it is always larger than
(or possibly the same as) SSE(F).
The hypotheses can be stated in terms of the reduced and full models,
$$H_0: Y_i = \beta_0 + \varepsilon_i \quad \text{and} \quad H_1: Y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i,$$
or equivalently as
$$H_0: \beta_1 = 0 \quad \text{and} \quad H_1: \beta_1 \neq 0$$
For simple linear regression, it turns out that the general linear
F-test is just the same ANOVA F-test that we learned before.
The formula for each entry is summarized in the following analysis of
variance table:

Source        df       SS      MS                   F
Regression    1        SSR     MSR = SSR/1          F* = MSR/MSE
Error         n − 2    SSE     MSE = SSE/(n − 2)
Total         n − 1    SSTO
F-test for the Slope Parameter β1
- Hypothesis test:
$$H_0: \beta_1 = 0 \quad \text{and} \quad H_1: \beta_1 \neq 0$$
- Test statistic:
$$F^* = \frac{SSR/1}{SSE/(n-2)} = \frac{MSR}{MSE}$$
The degrees of freedom associated with the error sum of squares for the
full model is $n - 2$, and:
$$SSE(F) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = SSE$$
That is, the general linear F-statistic reduces to the ANOVA F-statistic:
$$F^* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}
     = \frac{SSR}{(n-1) - (n-2)} \div \frac{SSE}{n-2} = \frac{MSR}{MSE}$$
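A quick numerical check of this equivalence in R (simulated data; all
names are illustrative):

set.seed(3)
n <- 40
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

full    <- lm(y ~ x)      # full model
reduced <- lm(y ~ 1)      # reduced (intercept-only) model

SSE_F <- sum(resid(full)^2)
SSE_R <- sum(resid(reduced)^2)

# General linear F-statistic with df_R = n - 1 and df_F = n - 2
((SSE_R - SSE_F) / ((n - 1) - (n - 2))) / (SSE_F / (n - 2))
anova(full)["x", "F value"]   # matches the ANOVA F-statistic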
Sequential (or Extra) Sums of Squares
- SSE(x1) denotes the error sum of squares when x1 is the only
predictor in the model.
- SSR(x1, x2) denotes the regression sum of squares when x1 and x2
are both in the model.
- SSR(x2 | x1) denotes the sequential sum of squares obtained by
adding x2 to a model already containing only the predictor x1.
- The vertical bar “|” is read as “given” – that is, “x2 | x1” is read as
“x2 given x1.”
Here are a few more examples of the notation: SSE(x1, x2) denotes the
error sum of squares when x1 and x2 are both in the model, and
SSR(x3 | x1, x2) denotes the sequential sum of squares obtained by
adding x3 to a model already containing x1 and x2.
Let’s try out the notation and the two alternative definitions of a
sequential sum of squares on an example.
An “Allen Cognitive Level” (ACL) study investigated the relationship
between ACL test scores and level of psychopathology. Researchers
collected the following data on each of 69 patients in a hospital
psychiatry unit:
[Scatterplot matrix of the four variables ACL, Vocab, Abstract, and
SDMT, with pairwise correlations: Corr(ACL, Vocab) = 0.250*,
Corr(ACL, Abstract) = 0.354**, Corr(ACL, SDMT) = 0.521***,
Corr(Vocab, Abstract) = 0.698***, Corr(Vocab, SDMT) = 0.556***,
Corr(Abstract, SDMT) = 0.577***.]
If we estimate the regression function with Y = ACL score as the response
and x1 = Vocab as the predictor, that is, if we “regress Y = ACL on x1 =
Vocab,” we obtain:
anova(fit1)
Noting that x1 is the only predictor in the model, the output tells us that:
SSR(x1) = 2.691, SSE(x1) = 40.359, and SSTO = 43.050.
If we regress Y = ACL on x1 = Vocab and x3 = SDMT, we obtain:
##
## Call:
## lm(formula = ACL ~ Vocab + SDMT, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.55758 -0.44619 -0.01027 0.34114 1.55955
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.845292 0.324380 11.854 < 2e-16 ***
## Vocab -0.006840 0.015045 -0.455 0.651
## SDMT 0.029795 0.006803 4.379 4.35e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6883 on 66 degrees of freedom
## Multiple R-squared: 0.2736, Adjusted R-squared: 0.2516
## F-statistic: 12.43 on 2 and 66 DF, p-value: 2.624e-05
Noting that x1 and x3 are the predictors in the model, the estimated
regression equation is
$$\hat{y} = 3.8453 - 0.0068\,x_1 + 0.0298\,x_3$$
For a given data set, the total sum of squares is always the same
regardless of the predictors in the model, because the total sum of
squares quantifies how much the observed responses Yi vary around Ȳ,
which has nothing to do with which predictors are in the model.
Now, how much has the error sum of squares (SSE) decreased, and how much
has the regression sum of squares (SSR) increased?
Noting that x1 and x3 are the two predictors in the model, the output tells us that:
SSR(x1, x3) = 11.7778 and SSE(x1, x3) = 31.2717.
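These sums of squares can also be read from R’s sequential ANOVA table;
a minimal sketch (the object name fit2 is illustrative):

fit2 <- lm(ACL ~ Vocab + SDMT, data = dat)
anova(fit2)   # 'Vocab' row: SSR(x1); 'SDMT' row: SSR(x3 | x1);
              # 'Residuals' row: SSE(x1, x3)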
How much did the error sum of squares decrease – or alternatively, the
regression sum of squares increase?
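Putting the two fits side by side answers the question (up to rounding):
$$SSR(x_1) + SSE(x_1) = 2.691 + 40.359 = 43.050 = 11.778 + 31.272 = SSR(x_1, x_3) + SSE(x_1, x_3)$$
confirming that SSTO is unchanged, while
$$SSR(x_3 \mid x_1) = SSE(x_1) - SSE(x_1, x_3) = 40.359 - 31.272 = 9.087
= SSR(x_1, x_3) - SSR(x_1)$$
That is, adding x3 = SDMT decreased the error sum of squares – and
increased the regression sum of squares – by the sequential sum of
squares SSR(x3 | x1) ≈ 9.09.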