
9/24/2021

Assessing the Accuracy of the Model

• Once we have rejected the null hypothesis in favor of the alternative hypothesis, it is natural to want to quantify the extent to which the model fits the data.

• The quality of a linear regression fit is typically assessed using two related quantities:
1. The residual standard error (RSE), and
2. The R² statistic.


1. Residual Standard Error


• Recall from the population line equation that associated with each observation is an error term ε.
• Due to the presence of these error terms, even if we knew the true regression line, we would not be able to perfectly predict Y from X.
• The RSE is an estimate of the standard deviation of ε.
• It is the average amount that the response will deviate from the true regression line.
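As a minimal sketch of the computation, the RSE can be estimated as √(RSS/(n−2)) after a least squares fit. The data below are synthetic (the Advertising data are not reproduced here), and the intercept, slope, and noise level are illustrative assumptions:

```python
import numpy as np

# Minimal sketch: estimating the RSE for a simple linear regression.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 300, n)           # hypothetical TV advertising budgets
eps = rng.normal(0, 3.26, n)         # the irreducible error term epsilon
y = 7.0 + 0.05 * x + eps             # hypothetical true regression line

# Least squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# RSE = sqrt(RSS / (n - 2)): an estimate of the standard deviation of epsilon
rss = np.sum((y - (b0 + b1 * x)) ** 2)
rse = np.sqrt(rss / (n - 2))
print(rse)  # should fall near the true sigma (3.26) for this simulation
```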



The least squares model for the regression of number of units sold on TV advertising budget:

• Residual standard error = 3.26.
• In other words, actual sales in each market deviate from the true regression line by approximately 3,260 units, on average.
• In the Advertising data set, the mean value of sales over all markets is approximately 14,000 units, and so the percentage error is 3,260/14,000 ≈ 23%.
• The RSE is considered a measure of the lack of fit of the model to the data.
• Since the RSE is measured in the units of Y, it is not always clear what constitutes a good RSE.


2. Coefficient of Determination: R²

• Some of the variation in Y can be explained by variation in the X's and some cannot.

• R² measures the proportion of variability in Y that can be explained using X:

R² = 1 − RSS / TSS,  where TSS = Σᵢ (Yᵢ − Ȳ)² is the total sum of squares.

• R² is always between 0 and 1. Zero means no variance has been explained; one means it has all been explained (a perfect fit to the data).
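The definition can be sketched directly in code. The data here are synthetic and the coefficients are illustrative assumptions, not the Advertising fit:

```python
import numpy as np

# Sketch: computing R^2 = 1 - RSS/TSS for a simple linear regression.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=100)  # assumed toy model

# Least squares fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
r2 = 1 - rss / tss
print(r2)                             # always between 0 and 1
```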



• An R² statistic that is close to 1 indicates that a large proportion of the variability in the response has been explained by the regression.

• A number near 0 indicates that the regression did not explain much of the variability in the response; this might occur because the linear model is wrong, or the inherent error σ² is high, or both.

• R² = 0.612
• About 61% (just under two-thirds) of the variability in sales is explained by a linear regression on TV.

• In the simple linear regression setting, R² = r², where r is the sample correlation between X and Y.
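The identity R² = r² in the simple setting can be checked numerically; the data below are a synthetic sketch, not the Advertising data:

```python
import numpy as np

# Quick numerical check that R^2 equals the squared sample correlation r^2
# in simple linear regression (synthetic, assumed toy coefficients).
rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=50)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
rss = np.sum((y - (b0 + b1 * x)) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss

r = np.corrcoef(x, y)[0, 1]       # sample correlation between X and Y
print(r2, r ** 2)                 # the two values agree
```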


Multiple Linear Regression (MLR)


• Simple linear regression is a useful approach for predicting a response on the basis of a single predictor variable. However, in practice we often have more than one predictor.

• How can we extend our analysis to accommodate additional predictors?

• One option is to run separate simple linear regressions for each predictor; this is not entirely satisfactory:
1. It is difficult to make a single prediction given the different predictors.
2. Each regression ignores the other predictors. What if the various predictors are correlated? That can lead to very misleading estimates of the individual predictors' effects on the response.



Multiple Linear Regression (2)

Population line:     Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + β_pX_p + ε

Least squares line:  Ŷ = β̂₀ + β̂₁X₁ + β̂₂X₂ + ⋯ + β̂_pX_p

• The parameters in the linear regression model are very easy to interpret.
• β₀ is the intercept (i.e., the average value of Y when all the X's are zero); βⱼ is the slope for the jth variable Xⱼ.
• βⱼ is the average increase in Y when Xⱼ is increased by one unit and all other X's are held constant.
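This interpretation can be illustrated with a least squares fit on synthetic data; the p = 2 predictors and the coefficient values (3, 2, −1) are assumptions for the sketch, not the Advertising model:

```python
import numpy as np

# Sketch of a multiple linear regression fit by least squares.
rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 2))                       # predictors X1, X2
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Design matrix with a leading column of ones for the intercept beta_0
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# beta_hat[j] estimates the average change in Y per unit increase in X_j,
# holding the other predictors fixed.
print(np.round(beta_hat, 1))  # should land near the assumed [3, 2, -1]
```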


MLR on advertising data



Observations
• Simple and multiple regression coefficients can be quite different.
• Does it make sense for the multiple regression to suggest no
relationship between sales and newspaper while the simple
linear regression implies the opposite?

• Newspaper advertising acts as a surrogate for radio advertising; newspaper gets "credit" for the effect of radio on sales.
• Almost all the explaining that newspaper could do in the simple regression has already been done by TV and radio in the multiple regression!
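The surrogate effect can be reproduced in a small simulation; the variable names and coefficients below are assumptions for the sketch (x1 plays the role of radio, x2 of newspaper):

```python
import numpy as np

# Sketch of the surrogate effect: x2 is correlated with x1 but has no
# direct effect on y. Simple regression of y on x2 shows a large slope;
# the multiple regression assigns x2 a coefficient near zero.
rng = np.random.default_rng(4)
n = 2000
x1 = rng.normal(size=n)                              # true driver
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)        # correlated surrogate
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)   # x2 plays no role

# Simple regression of y on x2 alone: x2 "credits" itself with x1's effect
slope_simple = (np.sum((x2 - x2.mean()) * (y - y.mean()))
                / np.sum((x2 - x2.mean()) ** 2))

# Multiple regression on both predictors
A = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

print(slope_simple)   # clearly nonzero (spurious)
print(beta_hat[2])    # near zero: x2's apparent effect disappears
```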


