Cochran's Theorem provides the basis for the distribution theory. Suppose SSTO = Q1 + Q2 + … + Qs, where SSTO/σ² follows a chi-square distribution with N − 1 degrees of freedom. Then Q1/σ², …, Qs/σ² are independent chi-square random variables if and only if the degrees of freedom of the Q's sum to N − 1.
Chapter 3
Multiple Linear Regression
Figure 3.1 (a) The regression plane for the model E(y) = 50 + 10x1 + 7x2. (b) The contour plot.
Notation
n – number of observations available
k – number of regressor variables
y – response or dependent variable
xij – ith observation or level of regressor j.
The ordinary least squares estimators are the solutions to the normal
equations.
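As an illustrative sketch (using made-up numbers, not the delivery time data), the normal equations (X'X)β̂ = X'y can be solved directly with NumPy:

```python
import numpy as np

# Hypothetical small dataset: n = 5 observations, k = 2 regressors.
# X is the model matrix with a leading column of ones for the intercept.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0]])
y = np.array([6.0, 8.0, 13.0, 14.0, 19.0])

# Normal equations: (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values and residuals follow directly.
y_hat = X @ beta_hat
residuals = y - y_hat
```

A defining property of the least squares solution is that the residuals are orthogonal to every column of X, which gives a quick sanity check on the fit.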
Linear Regression Analysis 6E Montgomery, Peck & Vining
Example 3.1: The Delivery Time Data
Figure 3.4 Scatterplot matrix for the delivery time data from Example 3.1.
MINITAB Output
• Statistical Properties
E(β̂) = β
Cov(β̂) = σ²(X'X)⁻¹
• Variances/Covariances
Var(β̂j) = σ²Cjj
Cov(β̂i, β̂j) = σ²Cij
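A quick numerical sketch of these properties, with a hypothetical model matrix and an assumed error variance (neither comes from the book's example):

```python
import numpy as np

# Illustrative model matrix (hypothetical data, not from the book).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
sigma2 = 2.0  # assume the error variance is known, for illustration only

# C = (X'X)^{-1}; Cov(beta_hat) = sigma^2 * C
C = np.linalg.inv(X.T @ X)
cov_beta = sigma2 * C

# Var(beta_hat_j) = sigma^2 * C_jj (the diagonal of the covariance matrix)
var_beta = sigma2 * np.diag(C)
```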
3.2.4 Estimation of σ²
SSRes = y'y − β̂'X'y
• The residual mean square for the model with p parameters is:
MSRes = SSRes/(n − p) = σ̂²
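A minimal sketch of this computation, assuming hypothetical data; the identity SSRes = y'y − β̂'X'y should agree with the direct sum of squared residuals:

```python
import numpy as np

# Hypothetical data; p = number of parameters (intercept + regressors).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n, p = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# SS_Res = y'y - beta_hat' X' y
ss_res = y @ y - beta_hat @ (X.T @ y)

# MS_Res = SS_Res / (n - p) is the estimate of sigma^2
sigma2_hat = ss_res / (n - p)
```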
3.2.4 Estimation of σ²
• Recall that the estimator of σ² is model dependent – that is, change the form of the model and the estimate of σ² will invariably change.
– Note that the variance estimate is a function of the errors;
“unexplained noise about the fitted regression line”
F0 = [SSR/k] / [SSRes/(n − k − 1)] = MSR/MSRes
ANOVA Table:
Reject H0 if F0 > Fα,k,n−k−1
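A sketch of the test for significance of regression, with illustrative ANOVA quantities (n = 15, k = 3, SSR = 155, SSRes = 90; these are assumed values, not taken from the book's worked example):

```python
from scipy import stats

# Illustrative ANOVA quantities for a model with k = 3 regressors, n = 15.
n, k = 15, 3
ss_r, ss_res = 155.0, 90.0

ms_r = ss_r / k                    # regression mean square
ms_res = ss_res / (n - k - 1)      # residual mean square
f0 = ms_r / ms_res                 # observed F statistic

# Reject H0: beta_1 = ... = beta_k = 0 if F0 exceeds the critical value.
alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, k, n - k - 1)
reject = f0 > f_crit
```

Software typically reports the p-value P(F > F0) instead of comparing against a tabled critical value; the decision is the same.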
• Adjusted R²
– Penalizes you for adding terms to the model that are not significant
R²adj = 1 − [SSRes/(n − p)] / [SST/(n − 1)]
• Adjusted R² Example
– Say SST = 245.00, n = 15
– Suppose that for a model with three regressors (p = 4), SSRes = 90.00; then
R²adj = 1 − [90/(15 − 4)] / [245/(15 − 1)] = 0.53
– Now suppose that a fourth regressor has been added (p = 5), and SSRes = 88.00:
R²adj = 1 − [88/(15 − 5)] / [245/(15 − 1)] = 0.50
– Adjusted R² went down, so the fourth regressor did not improve the model.
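The arithmetic in this example can be checked directly:

```python
# Verifying the adjusted R^2 arithmetic from the example above.
sst, n = 245.0, 15

def r2_adj(ss_res, p):
    """Adjusted R^2 = 1 - [SS_Res/(n - p)] / [SS_T/(n - 1)]."""
    return 1 - (ss_res / (n - p)) / (sst / (n - 1))

r2_three = r2_adj(90.0, 4)   # three regressors -> p = 4; about 0.53
r2_four = r2_adj(88.0, 5)    # four regressors  -> p = 5; about 0.50
```

Even though SSRes dropped from 90 to 88, the penalty for the extra parameter outweighs the gain, so the adjusted R² decreases.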
The Extra Sum of Squares method can also be used to test hypotheses on individual model parameters or groups of parameters, by comparing the full model with a reduced model.
Example
• Consider a dataset with four regressor variables and a single response.
• Fit the equation with all regressors and find that:
Example
• The model must be refit with the insignificant regressors left
out of the model.
• The regression equation is
y = - 24.9 + 0.0117x1 + 31.0x2 - 0.217x4
• The refitting must be done since the coefficient estimate for an individual regressor depends on all of the other regressors in the model.
β̂j − tα/2,n−p √(σ̂²Cjj) ≤ βj ≤ β̂j + tα/2,n−p √(σ̂²Cjj)
Or, equivalently, β̂j ± tα/2,n−p se(β̂j), where se(β̂j) = √(σ̂²Cjj).
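A sketch of computing this interval, with assumed values for β̂j, σ̂², and Cjj (none of these numbers come from the book's example):

```python
import numpy as np
from scipy import stats

# Assumed quantities, for illustration only:
beta_j_hat = 2.5    # estimated coefficient beta_hat_j
sigma2_hat = 1.2    # MS_Res estimate of sigma^2
c_jj = 0.04         # j-th diagonal element of (X'X)^{-1}
n, p = 25, 3
alpha = 0.05

# 100(1 - alpha)% CI: beta_hat_j +/- t_{alpha/2, n-p} * sqrt(sigma2_hat * C_jj)
t_crit = stats.t.ppf(1 - alpha / 2, n - p)
half_width = t_crit * np.sqrt(sigma2_hat * c_jj)
ci = (beta_j_hat - half_width, beta_j_hat + half_width)
```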
• 100(1 − α) percent CI on the mean response at the point x01, x02, …, x0k is ŷ0 ± tα/2,n−p √(σ̂² x0'(X'X)⁻¹x0)
From this result, the joint confidence region for all parameters in β is
• The set of points x (not necessarily data points used to fit the
model) that satisfy
Example 3.13
Consider prediction or estimation at the points a, b, c, and d shown in Figure 3.10.
Figure 3.10 Scatterplot of cases and distance for the delivery time data.
3.11 Multicollinearity
• A serious problem that may dramatically impact the usefulness of a
regression model is multicollinearity, or near-linear dependence
among the regression variables.
• Multicollinearity implies near-linear dependence among the
regressors. The regressors are the columns of the X matrix, so clearly
an exact linear dependence would result in a singular X’X.
• The presence of multicollinearity can dramatically impact the ability
to estimate regression coefficients and other uses of the regression
model.
3.11 Multicollinearity
• The main diagonal elements of the inverse of the X'X matrix in correlation form, (W'W)⁻¹, are often called variance inflation factors (VIFs), and they are an important multicollinearity diagnostic.
• For the soft drink delivery data,
3.11 Multicollinearity
• The variance inflation factors can also be written as VIFj = 1/(1 − R²j), where R²j is the coefficient of determination obtained when xj is regressed on the remaining regressors.
• Regression coefficients may have the wrong sign for the following
reasons:
1. The range of some of the regressors is too small.
2. Important regressors have not been included in the model.
3. Multicollinearity is present.
4. Computational errors have been made.
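A sketch of computing VIFs from the 1/(1 − R²j) definition, using synthetic data constructed so that x1 and x2 are nearly collinear (all names and numbers are hypothetical):

```python
import numpy as np

# Synthetic regressors: x2 is nearly a linear function of x1,
# so multicollinearity is present by construction; x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    x_j on the remaining regressors (with an intercept)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
```

The collinear pair x1, x2 should produce very large VIFs, while the independent regressor x3 stays near the minimum value of 1; a common rule of thumb flags VIFs above 5 or 10.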