
Page 87

Cochran's Theorem provides the basis for the distribution theory. Under the assumption of normality of the error term in the regression model, $SS_{TO}/\sigma^2$ follows a chi-square distribution with N − 1 degrees of freedom (df).

Suppose $SS_{TO}/\sigma^2 = Q_1 + Q_2 + \cdots + Q_s$. The Q's are independent chi-square random variables if and only if the degrees of freedom of the Q's sum to N − 1.

Distribution theory: if $Q_1$ has $q_1$ df, $Q_2$ has $q_2$ df, and $Q_1$ and $Q_2$ are independent, then $(Q_1/q_1)/(Q_2/q_2)$ follows an F distribution with $q_1$ and $q_2$ df.
Page 88

Chapter 3
Multiple Linear Regression

Linear Regression Analysis, 6th Edition (Montgomery, Peck & Vining)
Page 89

3.1 Multiple Regression Models


• Suppose that the yield in pounds of conversion in a chemical process
depends on temperature and the catalyst concentration. A multiple
regression model that might describe this relationship is
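
A sketch of that model, consistent with the two-variable description below and with Figure 3.1, is presumably

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where $x_1$ denotes temperature and $x_2$ the catalyst concentration.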

• This is a multiple linear regression model in two variables.

Page 90

3.1 Multiple Regression Models

Figure 3.1 (a) The regression plane for the model E(y) = 50 + 10x1 + 7x2. (b) The contour plot.

Page 91

3.1 Multiple Regression Models

In general, the multiple linear regression model with k regressors is
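
In the standard notation this model is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$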

Page 92

3.1 Multiple Regression Models

Page 93

3.1 Multiple Regression Models

Linear regression models may also contain interaction effects:

If we let x3 = x1x2 and β3 = β12, then the model can be written in the form
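
The two equations implied here take the standard form

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$

and, after the substitution $x_3 = x_1 x_2$, $\beta_3 = \beta_{12}$,

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$

which is again linear in the parameters.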

Page 94

3.1 Multiple Regression Models

Page 95

Page 96

3.2 Estimation of the Model Parameters


3.2.1 Least Squares Estimation of the Regression Coefficients

Notation
n – number of observations available
k – number of regressor variables
y – response or dependent variable
xij – ith observation or level of regressor j.

Page 97

3.2.1 Least Squares Estimation of Regression Coefficients

Page 98

3.2.1 Least Squares Estimation of the Regression Coefficients

The sample regression model can be written as
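
In the notation just defined, the sample regression model is

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, 2, \ldots, n$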

Page 99

3.2.1 Least Squares Estimation of the Regression Coefficients

The least squares function is
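
Written out, that function is

$S(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Bigr)^2$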

The function S must be minimized with respect to the coefficients.

Page 100

3.2.1 Least Squares Estimation of the Regression Coefficients

The least squares estimates of the coefficients must satisfy
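
These are the usual first-order conditions:

$\left.\dfrac{\partial S}{\partial \beta_0}\right|_{\hat\beta} = -2 \sum_{i=1}^{n} \Bigl( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \Bigr) = 0$

$\left.\dfrac{\partial S}{\partial \beta_j}\right|_{\hat\beta} = -2 \sum_{i=1}^{n} \Bigl( y_i - \hat\beta_0 - \sum_{l=1}^{k} \hat\beta_l x_{il} \Bigr) x_{ij} = 0, \qquad j = 1, 2, \ldots, k$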

Page 101

3.2.1 Least Squares Estimation of the Regression Coefficients


Simplifying, we obtain the least squares normal equations:
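
In summation form the normal equations are

$n\hat\beta_0 + \hat\beta_1 \sum_i x_{i1} + \cdots + \hat\beta_k \sum_i x_{ik} = \sum_i y_i$

$\hat\beta_0 \sum_i x_{ij} + \hat\beta_1 \sum_i x_{ij} x_{i1} + \cdots + \hat\beta_k \sum_i x_{ij} x_{ik} = \sum_i x_{ij} y_i, \qquad j = 1, 2, \ldots, k$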

The ordinary least squares estimators are the solutions to the normal
equations.
Page 102

3.2.1 Least Squares Estimation of the Regression Coefficients


Matrix notation is typically used:
Let
where
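
A sketch of the standard matrix setup (with p = k + 1 parameters): the model is

$y = X\beta + \varepsilon$

where y is the n × 1 vector of observations, X is the n × p model matrix whose ith row is $(1, x_{i1}, x_{i2}, \ldots, x_{ik})$, β is the p × 1 vector of regression coefficients, and ε is the n × 1 vector of random errors.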

Page 103

3.2.1 Least Squares Estimation of the Regression Coefficients

Page 104

3.2.1 Least Squares Estimation of the Regression Coefficients

These are the least-squares normal equations. The solution is
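
In matrix form the normal equations and their solution (assuming X'X is nonsingular) are

$X'X\hat\beta = X'y \qquad\Longrightarrow\qquad \hat\beta = (X'X)^{-1}X'y$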

Page 105

3.2.1 Least Squares Estimation of the Regression Coefficients

Page 106

3.2.1 Least Squares Estimation of the Regression Coefficients

The n residuals can be written in matrix form as

There will be some situations where an alternative form is useful
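
In the notation above, the residual vector and its alternative ("hat matrix") form are

$e = y - \hat y = y - X\hat\beta = (I - H)y, \qquad H = X(X'X)^{-1}X'$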

Page 107

Example 3-1. The Delivery Time Data

The model of interest is


y = β0 + β1x1+ β2x2 + ε
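
As a quick numerical illustration of fitting such a model by least squares, here is a minimal Python/NumPy sketch. The data below are simulated placeholders, not the actual delivery time observations of Example 3.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data only (NOT the delivery time data of Example 3.1):
# simulate n observations from a known two-regressor model.
n = 25
x1 = rng.uniform(2, 30, n)          # e.g., number of cases
x2 = rng.uniform(50, 1500, n)       # e.g., distance in feet
y = 2.3 + 1.6 * x1 + 0.014 * x2 + rng.normal(0.0, 3.0, n)

X = np.column_stack([np.ones(n), x1, x2])         # model matrix [1, x1, x2]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate of (b0, b1, b2)

e = y - X @ beta_hat          # residuals
p = X.shape[1]                # number of parameters
ms_res = (e @ e) / (n - p)    # residual mean square, the estimate of sigma^2

print("beta_hat =", beta_hat)
print("MS_Res   =", ms_res)
```

np.linalg.lstsq solves the same problem as the normal-equation solution β̂ = (X'X)⁻¹X'y, but in a numerically safer way.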

Page 108

Example 3-1. The Delivery Time Data

Figure 3.4 Scatterplot matrix for the delivery time data from Example 3.1.

Page 109

Example 3-1 The Delivery Time Data

Figure 3.5 Three-dimensional scatterplot of the delivery time data from Example 3.1.
Page 110

Example 3-1 The Delivery Time Data

Page 111

Example 3-1 The Delivery Time Data

Page 112

Example 3-1 The Delivery Time Data

Page 113

Page 114

MINITAB Output

Page 115

3.2.2 Geometrical Interpretation of Least Squares

Page 116

3.2.3 Properties of Least-Squares Estimators

• Statistical Properties

  $E(\hat\beta) = \beta$
  $\mathrm{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1}$

• Variances/Covariances

  $\mathrm{Var}(\hat\beta_j) = \sigma^2 C_{jj}$
  $\mathrm{Cov}(\hat\beta_i, \hat\beta_j) = \sigma^2 C_{ij}$

Page 117

3.2.4 Estimation of σ2

• The residual sum of squares can be shown to be:

  $SS_{Res} = y'y - \hat\beta' X'y$

• The residual mean square for the model with p parameters is:

  $MS_{Res} = \dfrac{SS_{Res}}{n - p} = \hat\sigma^2$
Page 118

3.2.4 Estimation of σ2
• Recall that the estimator of σ2 is model dependent: change the form of the model and the estimate of σ2 will invariably change.
 – Note that the variance estimate is a function of the residuals, the "unexplained noise about the fitted regression line."

Page 119

Example 3.2 Delivery Time Data

Page 120

Example 3.2 Delivery Time Data

Page 121

3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression

• Scatter diagrams of the regressor variable(s) against the response may be of little value in multiple regression.
 – These plots can actually be misleading.
 – If there is an interdependency between two or more regressor variables, the true relationship between xi and y may be masked.

Page 122

An Illustration of the Inadequacy of Scatter Diagrams in Multiple Regression

Page 123

3.3 Hypothesis Testing in Multiple Linear Regression

Once we have estimated the parameters in the model, we face two immediate questions:
1. What is the overall adequacy of the model?
2. Which specific regressors seem important?

Page 124

3.3 Hypothesis Testing in Multiple Linear Regression

This section considers four cases:

• Test for Significance of Regression (sometimes called the global test of model adequacy)
• Tests on Individual Regression Coefficients (or groups of coefficients)
• Special Case of Hypothesis Testing with Orthogonal Columns in X
• Testing the General Linear Hypothesis

Page 125

3.3.1 Test for Significance of Regression


• The test for significance of regression determines whether there is a linear relationship between the response and any of the regressor variables.
• The hypotheses are
H0: β1 = β2 = …= βk = 0
H1: βj ≠ 0 for at least one j

Page 126

3.3.1 Test for Significance of Regression


• As in Chapter 2, the total sum of squares can be partitioned into two parts:

  $SS_T = SS_R + SS_{Res}$

• This leads to an ANOVA procedure with the test (F) statistic

  $F_0 = \dfrac{SS_R / k}{SS_{Res} / (n - k - 1)} = \dfrac{MS_R}{MS_{Res}}$

Page 127

3.3.1 Test for Significance of Regression


• The standard ANOVA is conducted with
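
The computational forms typically used here are, in the notation above,

$SS_T = y'y - \dfrac{\bigl(\sum_{i=1}^{n} y_i\bigr)^2}{n}, \qquad SS_R = \hat\beta' X'y - \dfrac{\bigl(\sum_{i=1}^{n} y_i\bigr)^2}{n}, \qquad SS_{Res} = SS_T - SS_R = y'y - \hat\beta' X'y$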

Page 128

3.3.1 Test for Significance of Regression

ANOVA Table:
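
The standard layout, with the quantities defined on the previous slides, is:

Source of Variation | Sum of Squares | df        | Mean Square | F0
Regression          | SS_R           | k         | MS_R        | MS_R / MS_Res
Residual            | SS_Res         | n - k - 1 | MS_Res      |
Total               | SS_T           | n - 1     |             |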

Reject H0 if $F_0 > F_{\alpha,\, k,\, n-k-1}$.

Page 129

Example 3.3 Delivery Time Data
Page 130

Page 131

Example 3.3 Delivery Time Data

To test H0: β1 = β2 = 0, we calculate the F–statistic:

Example 3.3 Delivery Time Data
Page 132

Page 133

3.3.1 Test for Significance of Regression


• R2
– R2 is calculated exactly as in simple linear regression
– R2 can be inflated simply by adding more terms to the model (even
insignificant terms)

• Adjusted R2
 – Penalizes you for adding terms to the model that are not significant

  $R^2_{adj} = 1 - \dfrac{SS_{Res}/(n - p)}{SS_T/(n - 1)}$

Page 134

3.3.1 Test for Significance of Regression

• Adjusted R2 Example
 – Say SST = 245.00, n = 15
 – Suppose that for a model with three regressors, SSRes = 90.00; then

   $R^2_{adj} = 1 - \dfrac{90/(15 - 4)}{245/(15 - 1)} = 0.53$

 – Now suppose that a fourth regressor has been added, and SSRes = 88.00:

   $R^2_{adj} = 1 - \dfrac{88/(15 - 5)}{245/(15 - 1)} = 0.49$
Page 135

3.3.2 Tests on Individual Regression Coefficients

• Hypothesis test on any single regression coefficient:

  $H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0$

• Test statistic (see Example 3.4, pg. 90, text):

  $t_0 = \dfrac{\hat\beta_j}{\sqrt{\hat\sigma^2 C_{jj}}} = \dfrac{\hat\beta_j}{se(\hat\beta_j)}$

 – Reject H0 if $|t_0| > t_{\alpha/2,\, n-k-1}$
 – This is a partial or marginal test!

Page 136

The Extra Sum of Squares method can also be used to test hypotheses
on individual model parameters or groups of parameters

Full model
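
A sketch of the standard extra-sum-of-squares setup, in the book's notation: partition the coefficient vector as $\beta = (\beta_1', \beta_2')'$ with $X = [X_1 \; X_2]$, where $\beta_2$ contains the r coefficients being tested.

Full model: $y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$

Reduced model (under $H_0: \beta_2 = 0$): $y = X_1\beta_1 + \varepsilon$

Extra sum of squares: $SS_R(\beta_2 \mid \beta_1) = SS_R(\beta) - SS_R(\beta_1)$, with r degrees of freedom

Test statistic: $F_0 = \dfrac{SS_R(\beta_2 \mid \beta_1)/r}{MS_{Res}}$, rejecting $H_0$ if $F_0 > F_{\alpha,\, r,\, n-p}$.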

Page 137

Page 138

Page 139

3.3.3 Special Case of Orthogonal Columns in X

• If the columns of X1 are orthogonal to the columns of X2, then the sum of squares due to β2, SSR(β2|β1), equals SSR(β2); that is, it is free of any dependence on the regressors in X1.

Page 140

Example
• Consider a dataset with four regressor variables and a single response.
• Fit the equation with all regressors and find that:

  y = −19.9 + 0.0123x1 + 27.3x2 − 0.0655x3 − 0.196x4

• Looking at the t-tests, suppose that x3 is insignificant. So it is removed. What is the equation now?
• Generally, it is not

  y = −19.9 + 0.0123x1 + 27.3x2 − 0.196x4

Page 141

Example
• The model must be refit with the insignificant regressors left
out of the model.
• The regression equation is
y = - 24.9 + 0.0117x1 + 31.0x2 - 0.217x4
• The refitting must be done because the coefficient estimate for each individual regressor xj depends on all of the regressors in the model.

Page 142

Example

• However, if the columns are orthogonal to each other, then there is no need to refit.

• Can you think of some situations where we would have orthogonal columns?

Page 143

3.4.1. Confidence Intervals on the Regression Coefficients

A 100(1 − α) percent C.I. for the regression coefficient βj is:

  $\hat\beta_j - t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2 C_{jj}} \;\le\; \beta_j \;\le\; \hat\beta_j + t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2 C_{jj}}$

or, equivalently,

  $\hat\beta_j - t_{\alpha/2,\, n-p}\, se(\hat\beta_j) \;\le\; \beta_j \;\le\; \hat\beta_j + t_{\alpha/2,\, n-p}\, se(\hat\beta_j)$

Page 144

Page 145

3.4.2. Confidence Interval Estimation of the Mean Response

• 100(1-α) percent CI on the mean response at the point x01, x02, …, x0k is

  $\hat y_0 - t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2\, x_0'(X'X)^{-1}x_0} \;\le\; E(y \mid x_0) \;\le\; \hat y_0 + t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2\, x_0'(X'X)^{-1}x_0}$

• See Example 3-9 on page 95 and the discussion that follows

Page 146

Page 147

Page 148

Page 149

3.4.3. Simultaneous Confidence Intervals on Regression Coefficients

It can be shown that

  $\dfrac{(\hat\beta - \beta)'\, X'X\, (\hat\beta - \beta)}{p\, MS_{Res}} \;\sim\; F_{p,\, n-p}$

From this result, the joint confidence region for all parameters in β is

  $\dfrac{(\hat\beta - \beta)'\, X'X\, (\hat\beta - \beta)}{p\, MS_{Res}} \;\le\; F_{\alpha,\, p,\, n-p}$
Page 150

3.5 Prediction of New Observations

• A 100(1 − α) percent prediction interval for a future observation is

  $\hat y_0 - t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2\bigl(1 + x_0'(X'X)^{-1}x_0\bigr)} \;\le\; y_0 \;\le\; \hat y_0 + t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2\bigl(1 + x_0'(X'X)^{-1}x_0\bigr)}$

Page 151

Page 152

3.6 Hidden Extrapolation in Multiple Regression


• In prediction, exercise care about potentially extrapolating beyond the
region containing the original observations.

Figure 3.10 An example of extrapolation in multiple regression.

Page 153

3.6 Hidden Extrapolation in Multiple Regression

• We will define the smallest convex set containing all of the original n data points (xi1, xi2, …, xik), i = 1, 2, …, n, as the regressor variable hull (RVH).

• If a point x01, x02, …, x0k lies inside or on the boundary of the RVH, then prediction or estimation involves interpolation, while if this point lies outside the RVH, extrapolation is required.

Page 154

3.6 Hidden Extrapolation in Multiple Regression

• Diagonal elements of the hat matrix H = X(X'X)⁻¹X' can aid in determining if hidden extrapolation exists.

• The set of points x (not necessarily data points used to fit the model) that satisfy

  $x'(X'X)^{-1}x \le h_{\max}$

  where $h_{\max}$ is the largest diagonal element of H, is an ellipsoid enclosing all points inside the RVH.

Page 155

Two Interesting Examples, Continued


• 3.6 A Multiple Regression Model for the Patient
Satisfaction Data
– Does adding patient age to the model improve the fit?
– Is there any indication that this model is a better predictor?
• 3.7 Does Pitching and Defense Win Baseball Games?
– Does adding number of errors to the model improve the fit?
– Is there any indication that this model is a better predictor?

Page 156

3.6 Hidden Extrapolation in Multiple Regression

• Let x0 be a point at which prediction or estimation is of interest. Then

  $h_{00} = x_0'(X'X)^{-1}x_0$

• If $h_{00} > h_{\max}$, then the point lies outside the ellipsoid enclosing the RVH and is a point of extrapolation.

Page 157

Example 3.13

Consider prediction or
estimation at:

Page 158

Figure 3.10 Scatterplot of cases and distance for the delivery time data.
Page 159

3.10 Standardized Regression Coefficients


• It is often difficult to directly compare regression coefficients due to
possible varying dimensions.
• It may be beneficial to work with dimensionless regression
coefficients.
• Dimensionless regression coefficients are often referred to as
standardized regression coefficients.
• Two common methods of scaling:
1. Unit normal scaling
2. Unit length scaling

Page 160

3.10 Standardized Regression Coefficients

Unit Normal Scaling


The first approach employs unit normal scaling for the
regressors and the response variable. That is,
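
In the usual notation, the unit normal (z-score) scaling is

$z_{ij} = \dfrac{x_{ij} - \bar{x}_j}{s_j}, \qquad y_i^{*} = \dfrac{y_i - \bar{y}}{s_y}, \qquad i = 1, \ldots, n, \; j = 1, \ldots, k$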

Page 161

3.10 Standardized Regression Coefficients

Unit Normal Scaling

where
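
The quantities referred to here are, in the usual notation, the sample variances of regressor j and of the response:

$s_j^2 = \dfrac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}{n - 1}, \qquad s_y^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$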

Page 162

3.10 Standardized Regression Coefficients

Unit Normal Scaling


• All of the scaled regressors and the scaled response have sample mean
equal to zero and sample variance equal to 1.
• The model becomes

• The least squares estimator:
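
In this scaling the intercept drops out (all variables are centered); the model and its least-squares estimator are, in standard form,

$y_i^{*} = b_1 z_{i1} + b_2 z_{i2} + \cdots + b_k z_{ik} + \varepsilon_i, \qquad \hat{b} = (Z'Z)^{-1} Z'y^{*}$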

Page 163

3.10 Standardized Regression Coefficients


Unit Length Scaling
• In unit length scaling:
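
The standard unit length scaling is

$w_{ij} = \dfrac{x_{ij} - \bar{x}_j}{S_{jj}^{1/2}}, \qquad y_i^{0} = \dfrac{y_i - \bar{y}}{SS_T^{1/2}}, \qquad \text{where } S_{jj} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \text{ and } SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$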

Page 164

3.10 Standardized Regression Coefficients


Unit Length Scaling
• Each regressor has mean 0 and unit length, $\sqrt{\textstyle\sum_{i=1}^{n} w_{ij}^2} = 1$

• The regression model becomes

• The vector of least squares regression coefficients:
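
Written out, the scaled model and its least-squares solution take the standard form

$y_i^{0} = b_1 w_{i1} + b_2 w_{i2} + \cdots + b_k w_{ik} + \varepsilon_i, \qquad \hat{b} = (W'W)^{-1} W'y^{0}$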

Page 165

3.10 Standardized Regression Coefficients

Unit Length Scaling


• In unit length scaling, the W’W matrix is in the form of a correlation
matrix:

where rij is the simple correlation between xi and xj.
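
Concretely, the matrices take the form

$W'W = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1k} \\ r_{12} & 1 & \cdots & r_{2k} \\ \vdots & \vdots & & \vdots \\ r_{1k} & r_{2k} & \cdots & 1 \end{bmatrix}, \qquad W'y^{0} = \begin{bmatrix} r_{1y} \\ r_{2y} \\ \vdots \\ r_{ky} \end{bmatrix}$

where $r_{jy}$ is the simple correlation between xj and the response y.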


Page 166

3.10 Standardized Regression Coefficients

Example 3.14
Page 167

Example 3.14
Page 168

Example 3.14
Page 169

Example 3.14
Page 170

Page 171

3.11 Multicollinearity
• A serious problem that may dramatically impact the usefulness of a
regression model is multicollinearity, or near-linear dependence
among the regression variables.
• Multicollinearity implies near-linear dependence among the
regressors. The regressors are the columns of the X matrix, so clearly
an exact linear dependence would result in a singular X’X.
• The presence of multicollinearity can dramatically impact the ability
to estimate regression coefficients and other uses of the regression
model.

Page 172

3.11 Multicollinearity

Page 173

3.11 Multicollinearity
• The main diagonal elements of the inverse of the X'X matrix in correlation form, (W'W)⁻¹, are often called variance inflation factors (VIFs), and they are an important multicollinearity diagnostic.
• For the soft drink delivery data,

Page 174

3.11 Multicollinearity
• The variance inflation factors can also be written as

  $VIF_j = \dfrac{1}{1 - R_j^2}$

  where $R_j^2$ is the coefficient of multiple determination obtained from regressing xj on the other regressor variables.
• If xj is highly correlated with any other regressor variable, then $R_j^2$ (and hence VIFj) will be large.
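
A minimal Python/NumPy sketch of that computation (function name and data are illustrative, not from the text): regress each regressor on all of the others, take $R_j^2$, and form $VIF_j = 1/(1 - R_j^2)$.

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (regressors only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        # Regress x_j on the remaining regressors (plus an intercept)
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, xj, rcond=None)
        resid = xj - Z @ b
        r2_j = 1.0 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

# Example call on a hypothetical regressor matrix with columns x1 and x2:
# print(vif(np.column_stack([x1, x2])))
```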

Page 175

3.12 Why Do Regression Coefficients Have The Wrong Sign?

• Regression coefficients may have the wrong sign for the following
reasons:
1. The range of some of the regressors is too small.
2. Important regressors have not been included in the model.
3. Multicollinearity is present.
4. Computational errors have been made.

Page 176

3.12 Why Do Regression Coefficients Have The Wrong Sign?

