

LECTURE 3

1 The ANOVA Table for Straight-line Regression

An overall summary of the results of any regression analysis, whether straight-line or not, can be provided by a table called an analysis-of-variance (ANOVA) table. The basic information in an ANOVA table consists of several estimates of variance. The simplest version of the ANOVA table for straight-line regression is given below.
Analysis of Variance

                               Sum of          Mean
Source            DF          Squares        Square    F Value    Pr > F

Model              1       6394.02269    6394.02269      21.33    <.0001
Error             28       8393.44398     299.76586
Corrected Total   29            14787

Each mean-square term is equal to its sum of squares divided by its degrees of freedom.
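
For example, from the error row of the table above, the mean square is

MS(Error) = SSE/df = 8393.44398/28 ≈ 299.766,

which is the value shown in the Mean Square column.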
SSY: the total unexplained variation, or the total sum of squares about the mean, representing the total variation in Y before accounting for the linear effect of the variable X,

SSY = Σ_{i=1}^{n} (Yi − Ȳ)²

SSE: the sum of squares of deviations of the observed Y's from the fitted regression line, measuring the variation in the observed Y's that remains after accounting for the linear effect of X,

SSE = Σ_{i=1}^{n} (Yi − Ŷi)²

SSR = SSY − SSE: the sum of squares due to regression, representing the deviation of the predicted values Ŷi from the mean Ȳ,

SSR = Σ_{i=1}^{n} (Ŷi − Ȳ)²

Thus, we have the following result,

Σ_{i=1}^{n} (Yi − Ȳ)² = Σ_{i=1}^{n} (Ŷi − Ȳ)² + Σ_{i=1}^{n} (Yi − Ŷi)²    (1)

or

Total variation = Variation due to regression + Unexplained (residual) variation.

Equation (1) is called the fundamental equation of regression analysis and holds for any general regression case.
It can be shown that the mean-square residual and mean-square regression terms are statistically independent of one another. Thus, if H0: β1 = 0 is true, the ratio of these terms represents the ratio of two independent estimates of the same variance σ². Under the normality and independence assumptions about the Y's, such a ratio has the F distribution, and this F statistic can be used to test the hypothesis H0: "No significant straight-line relationship of Y on X" (i.e., H0: β1 = 0).
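
As a check against the ANOVA table above,

F = MS(Model)/MS(Error) = 6394.02269/299.76586 ≈ 21.33,

which exceeds the critical value F_{1,28,0.95} ≈ 4.20, so H0: β1 = 0 is rejected at the 0.05 level, consistent with the reported p-value of <.0001.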


2 Multiple Regression Analysis

Multiple regression analysis can be viewed as an extension of straight-line regression analysis to the case where more than one independent variable must be considered in the model. The following are reasons that dealing with several independent variables simultaneously in a regression analysis is relatively more difficult than with a single independent variable:

1. Since there might exist several reasonable candidates, it is more difficult to choose the best model.

2. Since it is not possible to draw the graph of the data or the fitted model directly, it is more difficult to visualize what the final model looks like.

3. It may not be easy to interpret some terms in the best-fitting model.

4. It is not possible to do the computations without a computer program.

2.1 Multiple Regression Models

Y = β0 + β1X1 + β2X2 + · · · + βk Xk + ε,

where β0, β1, . . . , βk are the regression coefficients that need to be estimated, and X1, X2, . . . , Xk are independent variables. The linear statistical model for an observation is

Yi = β0 + β1X1i + β2X2i + · · · + βk Xki + εi.

The assumptions are a generalization of those for simple linear regression:

1. Existence: For each specific combination of values of the independent variables X1, X2, . . . , Xk, Y is a random variable with a certain probability distribution having finite mean and variance.

2. Independence: The Y observations are statistically independent of one another.

3. Linearity: The mean value of Y for each specific combination of X1, X2, . . . , Xk is a linear function of β0, β1, . . . , βk. That is,

   µ_{Y|X1,X2,...,Xk} = β0 + β1X1 + β2X2 + · · · + βk Xk

4. Homoscedasticity: The variance of Y is the same and unknown for any fixed combination of X1, X2, . . . , Xk. That is,

   σ²_{Y|X1,X2,...,Xk} = Var(Y|X1, X2, . . . , Xk) ≡ σ²

5. Normality: For any fixed combination of X1, X2, . . . , Xk, the variable Y is normally distributed,

   Y ∼ N(µ_{Y|X1,X2,...,Xk}, σ²)

Notation: The estimated error êi is usually called a residual,

êi = Yi − Ŷi = Yi − (β̂0 + β̂1X1i + β̂2X2i + · · · + β̂k Xki),

where β̂0, β̂1, . . . , β̂k are the sample coefficients (estimates).

2.2 Determining the Best Estimate of the Multiple Regression Equation

We want to do the same things as with simple regression:

1. Evaluate assumptions

2. Fit the model (estimate parameters)

3. Test hypotheses

4. Use the model to predict/estimate, including measures of precision, such as standard errors and confidence intervals

The formulas used to obtain the intercept and slope coefficients are far more complex than those for simple linear regression and are best expressed using matrix algebra. We will not cover this part in this course and instead just assume that we obtain the results from a computer program.

The criterion used here is the same as for simple linear regression: we use the least-squares approach, minimizing the sum of squares of the distances between the observed responses and those predicted by the fitted model,

SSE = Σ_{i=1}^{n} (Yi − Ŷi)² = Σ_{i=1}^{n} (Yi − β̂0 − β̂1X1i − β̂2X2i − · · · − β̂k Xki)²
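
For reference only (this matrix formulation is not required for this course), the least-squares solution has a compact form: if X denotes the n × (k + 1) design matrix (a leading column of 1's followed by the columns of X1, . . . , Xk) and Y the vector of responses, then the vector of estimated coefficients is

β̂ = (XᵀX)⁻¹ XᵀY,

which is exactly what the computer program computes for us.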

Example: Suppose that we want to know how weight (WGT) varies with height (HGT) and age (AGE) for children with a particular kind of nutritional deficiency. The dependent variable is Y = WGT, and the two independent variables are X1 = HGT and X2 = AGE.

The following ANOVA table is based on the model for WGT using HGT, AGE, and (AGE)² as independent variables.
Table 1: ANOVA table for WGT regressed on HGT, AGE, and AGE²

Source        d.f.             SS                    MS        F       R²
Regression    k = 3            SSY − SSE = 693.06    231.02    9.47    0.7803
Residual      n − k − 1 = 8    SSE = 195.19           24.40
Total         n − 1 = 11       SSY = 888.25

SSY = Σ_{i=1}^{n} (Yi − Ȳ)² = 888.25: the total sum of squares, representing the total variability in the Y observations before accounting for the joint effect of the independent variables.

SSE = Σ_{i=1}^{n} (Yi − Ŷi)² = 195.19: the residual sum of squares, representing the amount of Y variation left unexplained after the independent variables have been used in the regression equation to predict Y.

SSR = SSY − SSE = Σ_{i=1}^{n} (Ŷi − Ȳ)² = 693.06: the regression sum of squares, measuring the variation in Y due to the independent variables.

The regression degrees of freedom = k (the number of independent variables in the model)

The residual degrees of freedom = n − k − 1

The total degrees of freedom = n − 1
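
In the example of Table 1, n = 12 and k = 3, so the degrees of freedom are 3 for regression, 12 − 3 − 1 = 8 for residual, and 12 − 1 = 11 for total.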

R² = (SSY − SSE)/SSY (between 0 and 1): measuring how well the fitted model containing the variables HGT, AGE, and AGE² predicts the dependent variable WGT.
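
Plugging in the sums of squares from Table 1,

R² = (888.25 − 195.19)/888.25 = 693.06/888.25 ≈ 0.7803,

so about 78% of the variability in WGT is explained by the fitted model.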

The computer program will usually give us all of the sums of squares, mean squares, intercept, slopes, etc.

The F test shown in the ANOVA table tests the null hypothesis

H0: β1 = β2 = . . . = βk = 0
HA: at least one βj ≠ 0 (j = 1, . . . , k)


So in this example, we are testing H0: β1 = β2 = β3 = 0.

data temp;            * name of the new data set;
   set stat.wgt;      * name of the old (existing) data set;
   agesq = age*age;
   label WGT = "Weight"
         HGT = "Height"
         agesq = "Age Squared";
run;
* above is called the data step;

proc reg data=temp;
   model wgt = hgt age agesq;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: WGT Weight

                        Analysis of Variance

                               Sum of          Mean
Source            DF          Squares        Square    F Value    Pr > F

Model              3        693.06046     231.02015       9.47    0.0052
Error              8        195.18954      24.39869
Corrected Total   11        888.25000


Root MSE             4.93950    R-Square    0.7803
Dependent Mean      62.75000    Adj R-Sq    0.6978
Coeff Var            7.87172


                        Parameter Estimates

                     Parameter      Standard
Variable     DF       Estimate         Error    t Value    Pr > |t|

Intercept     1        3.43843      33.61082       0.10      0.9210
HGT           1        0.72369       0.27696       2.61      0.0310
AGE           1        2.77687       7.42728       0.37      0.7182
agesq         1       -0.04171       0.42241      -0.10      0.9238


We also want to test each individual slope coefficient in order to see which independent variables are contributing to predicting the dependent variable. To test

H0: βi = 0
HA: βi ≠ 0

Let βi0 be the value of βi under H0; usually, βi0 = 0. We form a t statistic,

t_obs = (β̂i − βi0) / s_{β̂i},

where the degrees of freedom are n − k − 1 and s_{β̂i} is the standard error of the estimate for the ith slope coefficient.

A 100(1 − α)% confidence interval for βi is given by β̂i ± t_{1−α/2, n−k−1} s_{β̂i}.
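
For example, for the HGT coefficient in the full model (df = 12 − 3 − 1 = 8, t_{0.975,8} ≈ 2.306), the observed t statistic is t_obs = 0.72369/0.27696 ≈ 2.61, matching the SAS output, and a 95% confidence interval is

0.72369 ± 2.306 × 0.27696 ≈ 0.72369 ± 0.639, i.e., approximately (0.085, 1.362).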


Remarks:

1. For a particular model, we fit the parameters, perform the hypothesis tests, calculate the standard errors, and estimate the confidence intervals.

2. If you change the model, everything will be changed. Thus, never add or remove more than one variable at a time.

3. Sometimes you may reject H0: R² = 0, but fail to reject any of the individual H0: βi = 0.

4. More frequently, you may fail to reject H0: R² = 0, but reject one or more H0: βi = 0.

5. Commonly, multiple regression is used for two kinds of modelling:

   (a) Predictive modelling: use of the model to obtain the best predictive values you can, whether you understand the model or not.

   (b) Analytical modelling: use of the model to try to understand the relationships, i.e., to evaluate how the independent variables are related to the dependent variable.


2.3 Numerical Examples

Returning to the example, we reject the null hypothesis at the 0.05 significance level since the p-value for the F test is 0.0052. We now know that at least one slope coefficient is non-zero, but we do not know which one(s). We can then examine the estimates of the regression coefficients, and use the standard errors to obtain p-values and confidence intervals for the parameter estimates; the fitted multiple regression model is

Ŷ = 3.44 + 0.72 HGT + 2.78 AGE − 0.04 AGESQ
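
As an illustration only, with hypothetical values HGT = 50 and AGE = 8 (chosen for arithmetic convenience, not taken from the data set), the fitted model predicts

Ŷ = 3.44 + 0.72(50) + 2.78(8) − 0.04(8²) = 3.44 + 36.00 + 22.24 − 2.56 ≈ 59.1.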

We can fit other possible models.

Model: WGT = β0 + β1 HGT + β2 AGE + ε

proc reg data=temp;
   model wgt = hgt age;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: WGT

                        Analysis of Variance

                               Sum of          Mean
Source            DF          Squares        Square    F Value    Pr > F

Model              2        692.82261     346.41130      15.95    0.0011
Error              9        195.42739      21.71415
Corrected Total   11        888.25000


Root MSE             4.65984    R-Square    0.7800
Dependent Mean      62.75000    Adj R-Sq    0.7311
Coeff Var            7.42605


                        Parameter Estimates

                     Parameter      Standard
Variable     DF       Estimate         Error    t Value    Pr > |t|

Intercept     1        6.55305      10.94483       0.60      0.5641
HGT           1        0.72204       0.26081       2.77      0.0218
AGE           1        2.05013       0.93723       2.19      0.0565

Clearly, the model above is preferable to the previous one: it achieves essentially the same R² (0.7800 versus 0.7803) with one fewer term, so by the criteria of R² and model simplicity we would select this simpler model.


3 Testing Hypotheses in Multiple Regression

Three basic types of tests:

1. Overall test: Does the entire set of independent variables (or the fitted model itself) contribute significantly to the prediction of Y?

2. Test for addition of a single variable: Does adding one particular independent variable contribute significantly to the prediction of Y, given that the other independent variables are already in the model?

3. Test for addition of a group of variables: Does adding some group of independent variables contribute significantly to the prediction of Y, given that the other independent variables are already in the model?

We can answer the above questions by performing statistical tests of hypotheses.

One characteristic of these tests is that each test can be interpreted as a comparison of two models. One of these models is always referred to as the full or complete model; the other is called the reduced model. Consider the following two models:

Y = β0 + β1X1 + β2X2 + ε

and

Y = β0 + β1X1 + ε

Under H0: β2 = 0, the full model reduces to the reduced model. A test of H0: β2 = 0 is then equivalent to determining which of these two models is more appropriate.

3.1 Test for Significant Overall Regression

Consider the first question: an overall test for a model containing k independent variables,

Y = β0 + β1X1 + β2X2 + · · · + βk Xk + ε

We can express the null hypothesis as

H0: All k independent variables considered together do not explain a significant amount of the variation in Y,

or

H0: There is no significant overall regression using all k independent variables in the model,

or

H0: β1 = β2 = . . . = βk = 0.
We use the mean-square quantities provided in the ANOVA table to perform the test. First, the F statistic is

F = (Regression MS)/(Residual MS) = [(SSY − SSE)/k] / [SSE/(n − k − 1)],

where SSY = Σ_{i=1}^{n} (Yi − Ȳ)² and SSE = Σ_{i=1}^{n} (Yi − Ŷi)². The computed F can be compared with the critical point F_{k, n−k−1, 1−α}, where α is the preselected significance level. We reject H0 if the computed F exceeds the critical value.
Using the previous example, the regression of WGT on HGT, AGE, and AGE², we have k = 3. From the formula, F = 231.02/24.40 = 9.47. The critical point for α = 0.01 is F_{3,8,0.99} = 7.59. Since the p-value is less than 0.01, we would reject H0 at α = 0.01. Interpreting the results, we can conclude that the set of independent variables HGT, AGE, and AGE² contributes significantly to predicting WGT. However, this conclusion does not mean that all three variables are needed to predict Y; maybe only one or two are sufficient. In other words, we may need further tests to determine the final model.

3.2 Partial F Test

Suppose that we want to test whether adding a variable X* significantly improves the prediction of Y, given that the variables X1, X2, . . . , Xk are already in the model. The null hypothesis can then be stated as H0: β* = 0 in the model

Y = β0 + β1X1 + β2X2 + · · · + βk Xk + β*X* + ε.

We first compute the extra sum of squares from adding X* given X1, X2, . . . , Xk, using the following formula,

SS(X*|X1, X2, . . . , Xk) = SSR(X1, X2, . . . , Xk, X*) − SSR(X1, X2, . . . , Xk)

Again, using the previous example, we have

SS(X3|X1, X2) = SSR(X1, X2, X3) − SSR(X1, X2) = 693.06 − 692.82 = 0.24

We then compute

F(X*|X1, X2, . . . , Xk) = SS(X*|X1, X2, . . . , Xk) / Residual MS(X1, X2, . . . , Xk, X*)

for testing the null hypothesis that the addition of X* to a model containing X1, X2, . . . , Xk does not significantly improve the prediction of Y. For our example, the partial F statistic is

F(X3|X1, X2) = SS(X3|X1, X2) / Residual MS(X1, X2, X3) = 0.24/24.40 = 0.01

The statistic F(X3|X1, X2) is 0.01, so we fail to reject H0 regardless of the significance level. We therefore conclude that adding X3 = AGE² to the model already containing HGT and AGE does not significantly improve the prediction of WGT.
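
As a sketch, the same partial F test can also be requested directly in SAS with a TEST statement in PROC REG (assuming the temp data set created in the earlier data step):

proc reg data=temp;
   model wgt = hgt age agesq;
   test agesq = 0;   * partial F test for AGESQ given HGT and AGE;
run;
quit;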

3.3 t Test Alternative

Performing the partial F test is equivalent to using a t test. The t test focuses on testing the null hypothesis H0: β* = 0; the test statistic is

T = β̂* / s_{β̂*}.

Using the results from our example, we compute

T = β̂3 / s_{β̂3} = −0.0417/0.4224 = −0.10.

Clearly, T² = 0.01 = the partial F(X3|X1, X2).

3.4 Multiple Partial F Test

This testing procedure is used to assess the additional contribution of two or more independent variables added to a model that already contains other variables. The test procedure is a straightforward extension of the partial F test.
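
As a sketch of that extension: to test the addition of q variables X*1, . . . , X*q to a model already containing X1, . . . , Xk, the extra regression sum of squares is divided by q and compared with the residual mean square of the larger model,

F = {[SSR(X1, . . . , Xk, X*1, . . . , X*q) − SSR(X1, . . . , Xk)] / q} / Residual MS(X1, . . . , Xk, X*1, . . . , X*q),

which under H0: β*1 = · · · = β*q = 0 has an F distribution with q and n − k − q − 1 degrees of freedom.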

