
A PowerPoint Presentation Package to Accompany

Applied Statistics in Business & Economics, 6th edition

David P. Doane and Lori E. Seward

Prepared by Lloyd R. Jaisingh

Copyright ©2019 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the
prior written consent of McGraw-Hill Education. 12-1
Chapter 12
Simple Regression

Chapter Contents

12.1 Visual Displays and Correlation Analysis


12.2 Simple Regression
12.3 Regression Models
12.4 Ordinary Least Squares Formulas
12.5 Tests for Significance
12.6 Analysis of Variance: Overall Fit
12.7 Confidence and Prediction Intervals for Y

Simple Regression (continued)

Chapter Contents, continued

12.8 Residual Tests


12.9 Unusual Observations
12.10 Other Regression Problems (Optional)
12.11 Logistic Regression (Optional)

Simple Regression (continued, 2)

Chapter Learning Objectives (LOs)

LO12-1: Calculate and test a correlation coefficient for significance.


LO12-2: Interpret a regression equation and use it to make predictions.
LO12-3: Explain the form and assumptions of a simple regression model.
LO12-4: Explain the least squares method, apply formulas for coefficients,
and interpret R².
LO12-5: Construct confidence intervals and test hypotheses for the slope
and intercept.
LO12-6: Interpret the ANOVA table and use it to compute F, R², and
standard error.

Simple Regression (continued, 3)

Chapter Learning Objectives (LOs), continued

LO12-7: Distinguish between confidence and prediction intervals for Y.


LO12-8: Calculate residuals and perform tests of regression
assumptions.
LO12-9: Identify unusual residuals and tell when they are outliers.
LO12-10: Define leverage and identify high-leverage observations.
LO12-11: Improve data conditioning and use transformations if needed
(optional).
LO12-12: Identify when logistic regression is appropriate and calculate
predictions for a binary response variable.

12.1 Visual Displays and Correlation
Analysis
LO12-1: Calculate and test a correlation coefficient for
significance.

Visual Displays
 Begin the analysis of bivariate data (i.e., two variables) with a
scatter plot.
 A scatter plot
 displays each observed data pair (xi, yi) as a dot on an X-Y
grid.
 indicates visually the strength of the relationship or
association between the two variables.
 A scatter plot is typically the precursor to more complex analytical
techniques.
LO12-1: Calculate and test a correlation coefficient for
significance (continued).

Visual Displays, continued


 Figure 12.1 shows a scatter plot comparing the average price per
gallon of regular unleaded gasoline to the average price per
gallon of premium gasoline for all 50 states. The scatter plot shows a
positive association.

LO12-1: Calculate and test a correlation coefficient for
significance (continued, 2).

Correlation Coefficient, r
• A visual display is a good first step in analysis, but we
would also like to quantify the strength of the association
between two variables.
• Therefore, accompanying the scatter plot is the sample
correlation coefficient (also called the Pearson
correlation coefficient).
• This statistic measures the degree of linearity in the
relationship between two random variables X and Y and
is denoted r.
• Its value will fall in the interval [−1, 1].

LO12-1: Calculate and test a correlation coefficient for
significance (continued, 3).

Correlation Coefficient, r (continued)


• When r is near 0, there is little or no linear
relationship between X and Y.
• An r value near +1 indicates a strong positive
relationship, while an r value near −1 indicates a
strong negative relationship.

LO12-1: Calculate and test a correlation coefficient for
significance (continued, 4).

Correlation Coefficient, r (continued)


• The formula used to compute the linear correlation coefficient is
shown below:

r = Σ(xi − x̄)(yi − ȳ) / √[ Σ(xi − x̄)² · Σ(yi − ȳ)² ]

Note: −1 ≤ r ≤ +1

r = 0 indicates no linear relationship

LO12-1: Calculate and test a correlation coefficient for
significance (continued, 5).

Correlation Coefficient, r (continued)


• To simplify the notation here and elsewhere in this
chapter, we define three terms called sums of squares:

SSxx = Σ(xi − x̄)²    SSyy = Σ(yi − ȳ)²    SSxy = Σ(xi − x̄)(yi − ȳ)

• Using this notation, the formula for the sample
correlation coefficient can be written as:

r = SSxy / √(SSxx · SSyy)
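The sums-of-squares formulas above can be traced numerically in a few lines of Python. This is a minimal sketch using a small hypothetical data set (the numbers are illustrative, not the textbook's gasoline data):

```python
import math

# Hypothetical bivariate sample (illustrative values only)
x = [2.0, 3.0, 5.0, 7.0, 8.0]
y = [50.0, 55.0, 60.0, 70.0, 75.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Sums of squares: SSxx, SSyy, SSxy
ss_xx = sum((xi - xbar) ** 2 for xi in x)
ss_yy = sum((yi - ybar) ** 2 for yi in y)
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Sample correlation coefficient: r = SSxy / sqrt(SSxx * SSyy)
r = ss_xy / math.sqrt(ss_xx * ss_yy)
```

For this small sample the computed r is close to +1, consistent with the strongly positive pattern the points would show on a scatter plot.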
LO12-1: Calculate and test a correlation coefficient for
significance (continued, 6).
Examples of Scatter Plots Showing Various Correlation Values

LO12-1: Calculate and test a correlation coefficient for
significance (continued, 7).

Tests for Significant Correlation Using Student’s t


 Step 1: State the Hypotheses
Determine whether you are using a one- or two-tailed test
and the level of significance (α).
H0: ρ = 0 (population correlation coefficient equals zero)
H1: ρ ≠ 0
 Step 2: Specify the Decision Rule
For degrees of freedom df = n − 2, look up the critical value tα
in Appendix D.
• Note: r is an estimate of the population correlation coefficient ρ (rho).

LO12-1: Calculate and test a correlation coefficient for
significance (continued, 8).

Tests for Significant Correlation Using Student’s t


(continued)
 Step 3: Calculate the Test Statistic

tcalc = r √(n − 2) / √(1 − r²)

 Step 4: Make the Decision

Reject H0 if tcalc > tα/2 or if tcalc < −tα/2.
 Also, reject H0 if the p-value ≤ α.

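The four-step test can be sketched in Python. This example uses hypothetical values r = 0.50 and n = 30 (so d.f. = 28); the critical value 2.048 is the familiar two-tailed t value for α = 0.05 with 28 degrees of freedom, as would be read from a table like Appendix D:

```python
import math

# Hypothetical sample correlation and sample size
r = 0.50
n = 30

# Step 3: test statistic t = r*sqrt(n - 2)/sqrt(1 - r^2)
t_calc = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Step 4: two-tailed decision at alpha = 0.05 with d.f. = n - 2 = 28
t_crit = 2.048
reject_h0 = abs(t_calc) > t_crit
```

Here tcalc is about 3.06, which exceeds 2.048, so H0: ρ = 0 would be rejected.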
LO12-1: Calculate and test a correlation coefficient for
significance (continued, 9).
Critical Value for Correlation Coefficient (Tests for
Significance)
 Equivalently, you can calculate the critical value for the correlation
coefficient using

rcrit = ± tα/2 / √(tα/2² + n − 2)

 This method gives a benchmark for the correlation coefficient.
 However, this method provides no p-value and is inflexible if you
change your mind about α.
 MegaStat uses this method, giving two-tail critical values for
α = 0.05 and α = 0.01.
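The benchmark can be computed directly. A sketch with a hypothetical n = 30 at α = 0.05 (two-tailed), where the t table gives about 2.048 for d.f. = 28:

```python
import math

# Hypothetical sample size and tabled two-tailed critical t value
n = 30
t_crit = 2.048

# Solving tcalc = r*sqrt(n - 2)/sqrt(1 - r^2) for r gives the benchmark
r_crit = t_crit / math.sqrt(t_crit ** 2 + n - 2)
```

Any sample |r| exceeding this benchmark (about 0.36 here) would be judged significant at α = 0.05.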
LO12-1: Calculate and test a correlation coefficient for
significance (continued, 10).
Critical Value for Correlation Coefficient (Tests for
Significance) (continued)

• Table 12.1 (on next slide) shows that, as sample size
increases, the critical value of r becomes smaller.
• Thus, in very large samples, even very small sample
correlations can be considered "significant."
• While a larger sample does give a better estimate of the true
value of ρ, a larger sample does not mean that the
correlation is stronger, nor does its increased significance
imply increased importance.

LO12-1: Calculate and test a correlation coefficient for
significance (continued, 11).
Critical Value for Correlation Coefficient (Tests for
Significance) (continued, 2)

LO12-1: Calculate and test a correlation coefficient for
significance (continued, 12).

Significance versus Importance


In large samples, small correlations may be
significant, even if the scatter plot shows little
evidence of linearity. Thus, a significant
correlation may lack practical importance.

12.2 Simple Regression
LO12-2: Interpret a regression equation and use it to
make predictions.
What Is Simple Regression?
 Simple Regression analyzes the relationship between two
variables.
 It specifies one dependent (response) variable and one
independent (predictor) variable.
 The hypothesized relationship here will be linear, of the form
Y = y-intercept + slope × X.

LO12-2: Interpret a regression equation and use it to
make predictions (continued).

Response or Predictor?
 The response variable is the dependent variable. This is the Y
variable.
 The predictor variable is the independent variable. This is the X
variable.
 Only the dependent variable (not the independent variable) is
treated as a random variable.

LO12-2: Interpret a regression equation and use it to
make predictions (continued, 2).
Interpreting an Estimated Regression Equation: Examples

• The intercept and slope of an estimated regression
equation can provide useful information.
• The slope tells us how much, and in which direction,
the response variable will change for each one unit
increase in the explanatory variable.
• However, it is important to interpret the intercept
with caution because it is meaningful only if the
explanatory variable would reasonably have a value
equal to zero. See next slides for examples.

LO12-2: Interpret a regression equation and use it to
make predictions (continued, 3).
Interpreting an Estimated Regression Equation: Examples
Sales = 268 + 7.37 Ads
Each extra $1 million of advertising will generate $7.37 million of
sales on average. The firm would average $268 million of sales with
zero advertising. However, the intercept may not be meaningful
because Ads = 0 may be outside the range of observed data.

DrugCost = 410 + 550 Dependents
Each extra dependent raises the mean annual prescription drug cost
by $550. An employee with zero dependents averages $410 in
prescription drugs.
LO12-2: Interpret a regression equation and use it to
make predictions (continued, 4).
Interpreting an Estimated Regression Equation: Examples

Rent = 150 + 1.05 SqFt
Each extra square foot adds $1.05 to monthly apartment rent. The
intercept is not meaningful because no apartment can have SqFt = 0.

MPG = 49.22 − 0.079 Horsepower
Each unit increase in engine horsepower decreases the fuel
efficiency by 0.079 mile per gallon. The intercept is not meaningful
because a zero-horsepower engine does not exist.
LO12-2: Interpret a regression equation and use it to
make predictions (continued, 5).

Cause and Effect?

• When interpreting a regression model, one must


remember that the proposed relationship between
the explanatory variable and response variable is not
an assumption of causation.
• One cannot conclude that the explanatory variable
causes a change in the response variable.
• Consider a regression equation with unemployment
rate per capita as the explanatory variable and crime
rate per capita as the response variable (on next
slide). 
LO12-2: Interpret a regression equation and use it to
make predictions (continued, 6).

Cause and Effect? (continued)

• Crime Rate = 0.125 + 0.031 Unemployment Rate

• The slope value, 0.031, means that for each one unit
increase in the unemployment rate, we expect to see
an increase of .031 in the crime rate.
• Does this mean being out of work causes crime to
increase?
• No, there are many lurking variables that could
further explain the change in crime rates (e.g.,
poverty rate, education level, or police presence).
LO12-2: Interpret a regression equation and use it to
make predictions (continued, 7).

Cause and Effect? (continued)

• When we propose a regression model, we might


have a causal mechanism in mind, but cause and
effect are not proven by a simple regression. We
cannot assume that the explanatory variable is
“causing” the variation we see in the response
variable.

LO12-2: Interpret a regression equation and use it to
make predictions (continued, 8).

Prediction Using Regression

• One of the main uses of regression is to make


predictions.
• Once we have a fitted regression equation that
shows the estimated relationship between X (the
independent variable) and Y (the dependent
variable), we can plug in any value of X to obtain the
prediction for Y.
• For example (see next slide).

LO12-2: Interpret a regression equation and use it to
make predictions (continued, 9).

Prediction Using Regression (continued)

Sales = 268 + 7.37 Ads
If the firm spends $10 million on advertising, its predicted sales
would be $341.7 million; that is, Sales = 268 + 7.37(10) = 341.7.

DrugCost = 410 + 550 Dependents
If an employee has four dependents, the predicted annual drug cost
would be $2,610; that is, DrugCost = 410 + 550(4) = 2,610.

Rent = 150 + 1.05 SqFt
The predicted rent on an 800-square-foot apartment is $990; that is,
Rent = 150 + 1.05(800) = 990.

MPG = 49.22 − 0.079 Horsepower
If an engine has 200 horsepower, the predicted fuel efficiency is
33.42 mpg; that is, MPG = 49.22 − 0.079(200) = 33.42.
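Plug-in prediction is just evaluating the fitted line at a chosen x. A minimal sketch using the four fitted equations from this slide (the helper name `predict` is our own, not from the text):

```python
# Evaluate a fitted line yhat = b0 + b1*x at a given x
def predict(intercept, slope, x):
    """Predicted Y for a given X from a fitted regression line."""
    return intercept + slope * x

sales = predict(268, 7.37, 10)      # Sales = 268 + 7.37(10)
drug_cost = predict(410, 550, 4)    # DrugCost = 410 + 550(4)
rent = predict(150, 1.05, 800)      # Rent = 150 + 1.05(800)
mpg = predict(49.22, -0.079, 200)   # MPG = 49.22 - 0.079(200)
```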
LO12-2: Interpret a regression equation and use it to
make predictions (continued, 10).

Extrapolation Outside the Range of X

• Predictions from our fitted regression model are


stronger within the range of our sample x values.
• The relationship seen in the scatter plot may not be
true for values far outside our observed x range.
• Extrapolation outside the observed range of x is
always tempting but should be approached with
caution.

12.3 Regression Models
LO12-3: Explain the form and assumptions of a simple
regression model.

Model and Parameters

 The assumed model for a linear relationship is

y = β0 + β1x + ε

• The relationship holds for all pairs (xi, yi).
 The error term ε is not observable and is assumed to be
independently normally distributed with mean 0 and standard
deviation σ.
 The unknown parameters are:
β0  Intercept
β1  Slope
LO12-3: Explain the form and assumptions of a simple
regression model (continued).

Model and Parameters (continued)


 The fitted model or regression model is used to predict the
expected value of Y for a given value of X and is given below:

ŷ = b0 + b1x

 The fitted coefficients are
b0  the estimated intercept
b1  the estimated slope

LO12-3: Explain the form and assumptions of a simple
regression model (continued, 2).

What Is a Residual?
A residual is calculated as the observed value
of y minus the estimated value of y:

ei = yi − ŷi   (residual)

The n residuals are used to estimate σ, the
standard deviation of the errors.

LO12-3: Explain the form and assumptions of a simple
regression model (continued, 3).

Fitting a Regression on a Scatter Plot

• From a scatter plot, we could visually estimate the


slope and intercept.
• Although this method is inexact, experiments
suggest that people are pretty good at “eyeball” line
fitting.
• We instinctively try to adjust the line to ensure that
the line passes through the “center” of the scatter of
data points, to match the data as closely as possible.
• In other words, we try to minimize the vertical
distances between the fitted line and the observed y
values.
LO12-3: Explain the form and assumptions of a simple
regression model (continued, 4).

Fitting a Regression on a Scatter Plot (continued)

• A more precise method is to let Excel calculate the


estimates.
• We enter observations on the independent variable
x1, x2, . . . , xn and the dependent variable y1, y2, . . . ,
yn into separate columns, and let Excel fit the
regression equation, as illustrated in Figure 12.6 on
the next slide.
• Excel will choose the regression coefficients so as to
produce a good fit.

LO12-3: Explain the form and assumptions of a simple
regression model (continued, 5).

Fitting a Regression on a Scatter Plot (continued)

Figure 12.6 shows a


sample of miles per
gallon and horsepower
for 15 engines. The
Excel graph and its
fitted regression
equation are also
shown.

LO12-3: Explain the form and assumptions of a simple
regression model (continued, 6).
Slope and Intercept Interpretations

• Slope Interpretation: The slope of -0.0785 says that for


each additional unit of engine horsepower, the miles per
gallon decreases by 0.0785 mile. This estimated slope is
a statistic because a different sample might yield a
different estimate of the slope.
• Intercept Interpretation: The intercept value of 49.216
suggests that when the engine has no horsepower, the
fuel efficiency would be quite high. However, the
intercept has little meaning in this case, not only
because zero horsepower makes no logical sense, but
also because extrapolating to x = 0 is beyond the range
of the observed data.
LO12-3: Explain the form and assumptions of a simple
regression model (continued, 7).
Regression Caveats

• The “fit” of the regression does not depend on the sign


of its slope. The sign of the fitted slope merely tells
whether X has a positive or negative association with Y.

• View the intercept with skepticism unless x = 0 is


logically possible and is within the observed range of X.

• Regression does not demonstrate cause and effect


between X and Y. A good fit only shows that X and Y
vary together. Both could be affected by another
variable or by the way the data are defined.
12.4 Ordinary Least Squares (OLS)
Formulas
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R².
Slope and Intercept
 The ordinary least squares (OLS) method estimates the slope
and intercept of the regression line so that the sum of squared
residuals is minimized, which ensures the best fit.
 The sum of the residuals is zero: Σei = 0.

 The sum of the squared residuals is SSE: SSE = Σei² = Σ(yi − ŷi)².
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued).

Slope and Intercept (continued)

 This is an optimization problem that can be solved for b0 and b1


by using Excel’s Solver Add-In. However, we also can use
calculus to solve for b0 and b1.

 The OLS estimator for the slope is:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²    or    b1 = SSxy / SSxx
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 2).

Slope and Intercept (continued)

• The OLS estimator for the intercept is:

b0 = ȳ − b1x̄
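The OLS formulas can be applied by hand in Python. A sketch on a small hypothetical data set (not the textbook's engine sample), which also verifies that the residuals sum to zero:

```python
# Hypothetical bivariate sample (illustrative values only)
x = [2.0, 3.0, 5.0, 7.0, 8.0]
y = [50.0, 55.0, 60.0, 70.0, 75.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Slope: b1 = SSxy / SSxx
ss_xx = sum((xi - xbar) ** 2 for xi in x)
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx

# Intercept: b0 = ybar - b1*xbar, so the line passes through (xbar, ybar)
b0 = ybar - b1 * xbar

# OLS residuals always sum to (essentially) zero
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```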
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 3).

Slope and Intercept (continued)


The OLS formulas give unbiased and consistent estimates* of β0 and
β1. The OLS regression line always passes through the point (x̄, ȳ) for
any data, as illustrated in Figure 12.8.

*Recall from Chapter 8 that an unbiased estimator's expected value is
the true parameter and that a consistent estimator approaches ever
closer to the true parameter as the sample size increases.
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 4).
Sources of Variation in Y

In a regression, we seek to explain the variation in the dependent
variable around its mean. We express the total variation as a sum of
squares (denoted SST):

SST = Σ(yi − ȳ)²

We can split the total variation into two parts:

SST = SSR + SSE
(total variation = explained variation + unexplained variation)
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 5).
Sources of Variation in Y (continued)

The explained variation in Y (denoted SSR) is the sum of the
squared differences between the conditional mean ŷi (conditioned on
a given value xi) and the unconditional mean ȳ (same for all i):

SSR = Σ(ŷi − ȳ)²
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 6).

Sources of Variation in Y (continued)


The unexplained variation in Y (denoted SSE) is the sum of
squared residuals, sometimes referred to as the error sum
of squares:

SSE = Σ(yi − ŷi)²

Note: If the fit is good, SSE will be relatively small compared
to SST. If each observed data value is exactly the same as its
estimate (i.e., a perfect fit), then SSE will be zero. There is no
upper limit on SSE.
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 7).
Assessing Fit

 Because the magnitude of SSE is dependent on sample size and on


the units of measurement (e.g., dollars, kilograms, ounces), we want
a unit-free benchmark to assess the fit of the regression equation.
 We can obtain a measure of relative fit by comparing SST to SSR.
Recall that total variation in Y can be expressed as

SST = SSR + SSE
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 8).

Assessing Fit (continued)


 By dividing both sides by SST, we now have the sum of two
proportions on the right-hand side:

1 = SSR/SST + SSE/SST
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 9).

Coefficient of Determination
• The first proportion, SSR/SST, has a special name: the
coefficient of determination, or R². You can calculate this
statistic in two ways:

R² = SSR / SST    or    R² = 1 − SSE / SST

• The range of the coefficient of determination is 0 ≤ R² ≤ 1. The
highest possible R² is 1 because, if the regression gives a
perfect fit, then SSE = 0.

LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 10).

Coefficient of Determination (continued)

• The lowest possible R² is 0 because, if knowing the value of X
does not help predict the value of Y, then SSE = SST.
• Because a coefficient of determination always lies in the range
0 ≤ R² ≤ 1, it is often expressed as a percent of variation
explained.
• The unexplained variation reflects factors not included in our
model or just plain random variation.

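The decomposition SST = SSR + SSE and both R² formulas can be checked numerically. A sketch continuing the same hypothetical five-point data set used earlier (illustrative values, not textbook data):

```python
# Hypothetical bivariate sample (illustrative values only)
x = [2.0, 3.0, 5.0, 7.0, 8.0]
y = [50.0, 55.0, 60.0, 70.0, 75.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# OLS fit: b1 = SSxy/SSxx, b0 = ybar - b1*xbar
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained variation

r_squared = ssr / sst            # R^2 = SSR/SST
r_squared_alt = 1 - sse / sst    # equivalently, R^2 = 1 - SSE/SST
```

Both formulas give the same R², and (up to rounding) SSR + SSE reproduces SST.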
LO12-4: Explain the least squares method, apply
formulas for coefficients, and interpret R²
(continued, 11).
R² and r

• In a bivariate regression, R² is the square of the correlation
coefficient r.
• Thus, if r = .50, then R² = .25.
• For this reason, MegaStat (and some textbooks) denotes the
coefficient of determination as r² instead of R².
• In this textbook, the uppercase notation R² is used to indicate
the difference in their definitions.
• It is tempting to think that a low R² indicates that the model is
not useful. Yet in some applications (e.g., predicting crude oil
futures prices), even a slight improvement in predictive power
can translate into millions of dollars.

12.5 Tests for Significance
LO12-5: Construct confidence intervals and test
hypotheses for the slope and intercept.
Standard Error of Regression
 The standard error (se) is an overall measure of model fit:

se = √( SSE / (n − 2) )

• If the fitted model's predictions are perfect
(SSE = 0), then se = 0. Thus, a small se indicates a better fit.
• The standard error se is used to construct confidence intervals.
• The magnitude of se depends on the units of measurement of Y and on
data magnitude.
LO12-5: Construct confidence intervals and test
hypotheses for the slope and intercept
(continued).
Confidence Intervals for Slope and Intercept

 Once we have the standard error se, we construct confidence
intervals for the coefficients from the formulas shown below:

sb1 = se / √( Σ(xi − x̄)² )            (standard error of the slope)
sb0 = se √( 1/n + x̄² / Σ(xi − x̄)² )   (standard error of the intercept)

LO12-5: Construct confidence intervals and test
hypotheses for the slope and intercept
(continued, 2).
Confidence Intervals for Slope and Intercept (continued)

These standard errors are used to construct confidence intervals
for the true slope and intercept, using Student's t with d.f. = n − 2
degrees of freedom and any desired confidence level:

b1 ± tα/2 sb1   (confidence interval for the slope β1)
b0 ± tα/2 sb0   (confidence interval for the intercept β0)

• Note: One can use Excel, Minitab, MegaStat or other software


to compute these intervals and do hypothesis tests relating to
linear regression.
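The steps above (standard error of regression, standard errors of the coefficients, then t-based intervals) can be sketched end to end. The data are the same hypothetical five points as before; with n = 5 the intervals use d.f. = 3, for which the two-tailed t value at 95% confidence is about 3.182:

```python
import math

# Hypothetical bivariate sample (illustrative values only)
x = [2.0, 3.0, 5.0, 7.0, 8.0]
y = [50.0, 55.0, 60.0, 70.0, 75.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
ss_xx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_xx
b0 = ybar - b1 * xbar

# Standard error of regression: se = sqrt(SSE/(n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))

s_b1 = se / math.sqrt(ss_xx)                       # std. error of slope
s_b0 = se * math.sqrt(1 / n + xbar ** 2 / ss_xx)   # std. error of intercept

t_crit = 3.182                                     # two-tailed t, d.f. = 3
ci_slope = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
ci_intercept = (b0 - t_crit * s_b0, b0 + t_crit * s_b0)
```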
LO12-5: Construct confidence intervals and test
hypotheses for the slope and intercept
(continued, 3).

Hypothesis Tests
 Is the true slope different from zero? This is an important question
because if β1 = 0, then X is not associated with Y and the
regression model collapses to a constant β0 plus a random error
term:

y = β0 + ε

 The hypotheses (for zero slope and/or intercept) to be tested are:

H0: β1 = 0 versus H1: β1 ≠ 0   (slope)
H0: β0 = 0 versus H1: β0 ≠ 0   (intercept)
LO12-5: Construct confidence intervals and test
hypotheses for the slope and intercept
(continued, 4).
Hypothesis Tests, continued
For testing either coefficient, we use a t test with d.f. = n − 2 degrees of freedom.
Usually we are interested in testing whether the parameter is equal to zero, as
shown here, but you may substitute another value in place of 0 if you wish. The
hypotheses and their test statistics are

H0: β1 = 0, H1: β1 ≠ 0, with tcalc = (b1 − 0) / sb1   (slope)
H0: β0 = 0, H1: β0 ≠ 0, with tcalc = (b0 − 0) / sb0   (intercept)
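A minimal sketch of the slope test, using hypothetical summary values (a fitted slope and its standard error, as they might appear in regression output) with n = 5, so the two-tailed 5% critical t with d.f. = 3 is about 3.182:

```python
# Hypothetical regression output values (illustrative only)
b1 = 4.0385    # estimated slope
s_b1 = 0.2765  # standard error of the slope
n = 5          # sample size, so d.f. = n - 2 = 3

# Test statistic for H0: beta1 = 0
t_calc = (b1 - 0) / s_b1

# Two-tailed decision at alpha = 0.05 with d.f. = 3
reject_h0 = abs(t_calc) > 3.182
```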
LO12-5: Construct confidence intervals and test
hypotheses for the slope and intercept
(continued, 5).
Slope versus Correlation
• NOTE: The test for zero slope is the same as the test for zero correlation. That is, the t test for zero slope will always yield exactly the same tcalc as the t test for zero correlation.
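This equivalence is easy to verify numerically. The sketch below computes tcalc both ways, from the slope (b1/sb1) and from the correlation (r√(n − 2)/√(1 − r²)), on a small hypothetical data set:

```python
import math

# Hypothetical data to verify that t for zero slope equals t for zero correlation
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# t statistic from the slope: b1 / se(b1)
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_slope = b1 / se_b1

# t statistic from the correlation: r * sqrt(n - 2) / sqrt(1 - r^2)
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
```

Both routes give the same tcalc (about 2.121 for this sample), so the two tests always reach the same conclusion.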

12.6 Analysis of Variance: Overall Fit

LO12-6: Interpret the ANOVA table and use it to calculate F, R2, and standard error.

Decomposition of Variance
• The decomposition of variance may be written as SST = SSR + SSE, that is, Σ(yi − ȳ)2 = Σ(ŷi − ȳ)2 + Σ(yi − ŷi)2 (total variation = explained variation + unexplained variation).
LO12-6: Interpret the ANOVA table and use it to calculate
F, R2, and standard error (continued).

F Test for Overall Fit
• To test a regression for overall significance, we use an F test to compare the explained (SSR) and unexplained (SSE) sums of squares.
• We divide each sum by its respective degrees of freedom to obtain mean squares (MSR and MSE).
• The F statistic is the ratio of these two mean squares: F = MSR/MSE.
• Calculations of the F statistic are arranged in a table called the analysis of variance or ANOVA table (see Table 12.4 on the next slide).
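The steps above can be sketched directly: build the sums of squares, divide by their degrees of freedom, and form the F ratio. The data are hypothetical, invented only so the arithmetic is easy to follow:

```python
# Hypothetical data: build the one-predictor ANOVA quantities by hand
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # explained (regression)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained (error)

msr = ssr / 1          # d.f. = 1 for a simple regression
mse = sse / (n - 2)    # d.f. = n - 2
f = msr / mse          # overall-fit F statistic
r_squared = ssr / sst  # coefficient of determination
```

For this sample F = 4.5 and R² = 0.6; note that in simple regression F equals the square of the t statistic for the slope.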

LO12-6: Interpret the ANOVA table and use it to calculate
F, R2, and standard error (continued, 2).

F Test for Overall Fit (continued)
• ANOVA table (for a simple regression):
Source of Variation | Sum of Squares | d.f. | Mean Square | F
Regression (explained) | SSR | 1 | MSR = SSR/1 | F = MSR/MSE
Residual (unexplained) | SSE | n − 2 | MSE = SSE/(n − 2) |
Total | SST | n − 1 | |

12.7 Confidence and Prediction
Intervals for Y
LO12-7: Distinguish between confidence and prediction
intervals for Y.
How to Construct an Interval Estimate for Y
 The regression line is an estimate of the conditional mean of Y,
that is, the expected value of Y for a given value
of X, denoted E(Y | xi).
 But the estimate may be too high or too low.
 To make this point estimate more useful, we need an interval
estimate to show a range of likely values.
 To do this, we insert the xi value into the fitted regression equation, calculate the estimated ŷi, and use the following formulas.
LO12-7: Distinguish between confidence and prediction
intervals for Y (continued).

Confidence and Prediction Interval Estimate for Y
 Confidence interval for the conditional mean of Y: ŷi ± tα/2 se √(1/n + (xi − x̄)2/Σ(xi − x̄)2).
 Prediction interval for an individual value of Y: ŷi ± tα/2 se √(1 + 1/n + (xi − x̄)2/Σ(xi − x̄)2).
 Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
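A short sketch makes the width difference visible. Using the same hypothetical five-point sample as before and a t-table critical value, it computes both half-widths at xi = 3 (the mean of X, where both intervals are narrowest):

```python
import math

# Hypothetical data; compare CI (mean of Y) and PI (individual Y) at x_i = 3
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))  # standard error of the estimate

xi = 3.0
yhat = b0 + b1 * xi
t_crit = 3.182  # Student's t, 95% two-tailed, d.f. = 3 (from a t table)

half_ci = t_crit * se * math.sqrt(1 / n + (xi - xbar) ** 2 / sxx)      # conditional mean of Y
half_pi = t_crit * se * math.sqrt(1 + 1 / n + (xi - xbar) ** 2 / sxx)  # individual Y value
```

The extra "1 +" under the square root is the variance of an individual Y around its mean, which is why the prediction interval is always the wider of the two.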

12.8 Residual Tests
LO12-8: Calculate residuals and perform tests of
regression assumptions.
Three Important Assumptions
1. The errors (residuals) are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
Violation of Assumption 1: Non-normal Errors
• Non-normality of errors is a mild violation since the regression
parameter estimates b0 and b1 and their variances remain
unbiased and consistent.
• Confidence intervals for the parameters may be untrustworthy, because the normality assumption is used to justify using Student’s t distribution.
LO12-8: Calculate residuals and perform tests of
regression assumptions (continued).

Non-normal Errors
• A large sample size would compensate.
• Outliers could pose serious problems.

Normal Probability Plot
• The normal probability plot tests the hypotheses
H0: Errors are normally distributed
H1: Errors are not normally distributed
• If H0 is true, the residual probability plot should be linear, as shown in the example.
LO12-8: Calculate residuals and perform tests of
regression assumptions (continued, 2).

What to Do About Non-Normality?
1. Trim outliers only if they clearly are mistakes.
2. Increase the sample size if possible.
3. Try a logarithmic transformation of both X and Y.

Non-normality is not considered a major violation because the parameter estimates remain unbiased and consistent. Don’t worry too much about it unless you have major outliers.

Violation of Assumption 2: Nonconstant Variance
• The ideal condition is constant error variance (i.e., errors are homoscedastic).
LO12-8: Calculate residuals and perform tests of
regression assumptions (continued, 3).

Violation of Assumption 2: Nonconstant Variance
• Heteroscedastic (nonconstant) errors increase or decrease with X.
• In the most common form of heteroscedasticity, the variances of the estimators are likely to be understated.
• This results in overstated t statistics and artificially narrow confidence intervals.

Tests for Heteroscedasticity
• Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.

LO12-8: Calculate residuals and perform tests of
regression assumptions (continued, 4).
Tests for Heteroscedasticity
• Although many patterns of non-constant variance might exist, the
“fan-out” pattern (increasing residual variance) is most common.
• Less frequently, we might see a “funnel-in” pattern, which shows
decreasing residual variance.
• The residuals always have a mean of zero, whether the residuals
exhibit homoscedasticity or heteroscedasticity.
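The residual plot is the primary tool here, but a crude numeric screen can supplement it: if |residual| is strongly correlated with X, the spread is changing ("fan-out" if positive, "funnel-in" if negative). This is only an informal illustration, not a formal test, and the residuals below are hypothetical:

```python
import math

# Crude heteroscedasticity screen: does |residual| trend with X?
# (A visual residual plot is the primary tool; this is a rough supplement.)
x = [1, 2, 3, 4, 5]
resid = [-0.8, 0.6, 1.0, -0.6, -0.2]  # hypothetical OLS residuals (they sum to zero)

abs_e = [abs(e) for e in resid]
n = len(x)
xbar = sum(x) / n
ebar = sum(abs_e) / n

# Pearson correlation between |e| and x
cov = sum((xi - xbar) * (ei - ebar) for xi, ei in zip(x, abs_e))
corr = cov / math.sqrt(
    sum((xi - xbar) ** 2 for xi in x) * sum((ei - ebar) ** 2 for ei in abs_e)
)
# A large positive corr would hint at fan-out; large negative at funnel-in.
```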

LO12-8: Calculate residuals and perform tests of
regression assumptions (continued, 5).

What to Do About Heteroscedasticity?
• Transform both X and Y, for example, by taking logs.
• Although violation of the constant variance assumption can widen the confidence intervals for the coefficients, heteroscedasticity does not bias the estimates.
• At this stage of your training, it is sufficient just to recognize its existence.

LO12-8: Calculate residuals and perform tests of
regression assumptions (continued, 6).

Violation of Assumption 3: Autocorrelated Errors
• Autocorrelation is a pattern of non-independent errors.
• In first-order autocorrelation, et is correlated with et−1.
• The estimated variances of the OLS estimators are biased, resulting in confidence intervals that are too narrow, overstating the model’s fit.

Runs Test for Autocorrelation
• In the runs test, count the number of sign reversals in the residuals (i.e., how often does the residual cross the zero centerline?).
• If the pattern is random, the number of sign changes should be about n/2.
• Fewer than n/2 would suggest positive autocorrelation.
• More than n/2 would suggest negative autocorrelation.
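The sign-reversal count is a one-liner once the residuals are in time order. The residual sequence here is hypothetical:

```python
# Runs check: count sign changes as the residuals cross the zero centerline
resid = [-0.8, 0.6, 1.0, -0.6, -0.2]  # hypothetical residuals in time order
n = len(resid)

sign_changes = sum(
    1 for e_prev, e_next in zip(resid, resid[1:])
    if (e_prev < 0) != (e_next < 0)   # True when the sign flips between neighbors
)
expected = n / 2  # roughly n/2 changes if the pattern is random
```

Here there are 2 changes against an expected 2.5, which is unremarkable; markedly fewer changes would hint at positive autocorrelation, markedly more at negative.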
LO12-8: Calculate residuals and perform tests of
regression assumptions (continued, 7).

Durbin-Watson (DW) Test
• Tests for autocorrelation under the hypotheses
H0: Errors are non-autocorrelated
H1: Errors are autocorrelated
• The DW statistic, DW = Σ(et − et−1)2 / Σet2, will range from 0 to 4.
DW < 2 suggests positive autocorrelation
DW = 2 suggests no autocorrelation (ideal)
DW > 2 suggests negative autocorrelation

Autocorrelation is a concern with time-series data. Although it can widen the confidence intervals for the coefficients, autocorrelation does not bias the estimates. At this stage of your training, it is sufficient just to recognize when you have autocorrelation.
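The DW statistic itself is a short ratio of sums. A sketch on a hypothetical residual sequence:

```python
# Durbin-Watson statistic: sum of squared successive differences / sum of squares
resid = [-0.8, 0.6, 1.0, -0.6, -0.2]  # hypothetical residuals in time order

num = sum((e2 - e1) ** 2 for e1, e2 in zip(resid, resid[1:]))
den = sum(e ** 2 for e in resid)
dw = num / den  # values near 2 suggest no first-order autocorrelation
```

For this sequence DW ≈ 2.02, close to the ideal value of 2.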

12.9 Unusual Observations
LO12-9: Identify unusual residuals and tell when they are
outliers.
Standardized Residuals
• In a regression, we look for observations that are unusual.
• An observation could be unusual because its Y value is poorly predicted by the regression model (unusual residual) or because its unusual X value greatly affects the regression line (high leverage).
• Tests for unusual residuals and high leverage are important diagnostic tools in evaluating the fitted regression.

LO12-9: Identify unusual residuals and tell when they are
outliers (continued).
Standardized Residuals (continued)
• One can use Excel, Minitab, MegaStat or other software to compute
standardized residuals.
• If the absolute value of any standardized residual is at least 2, then it is
classified as unusual.
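The simplest version of this calculation divides each residual by the standard error of the estimate and flags values of magnitude 2 or more. Note that statistical packages usually apply a leverage adjustment (studentized residuals); the unadjusted sketch below, on hypothetical residuals, is only a first approximation:

```python
import math

# Simple standardized residuals: e_i / s_e, flagging |value| >= 2 as unusual.
# (Software typically leverage-adjusts these; this is an unadjusted sketch.)
resid = [-0.8, 0.6, 1.0, -0.6, -0.2]  # hypothetical residuals, n = 5
n = len(resid)
se = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # standard error of estimate

standardized = [e / se for e in resid]
unusual = [i for i, z in enumerate(standardized) if abs(z) >= 2]
```

For these residuals no standardized value reaches 2, so none would be classified as unusual.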

LO12-10: Define leverage and identify high leverage
observations.
High Leverage

• A high leverage statistic indicates that the observation is far from the mean of X.
• Such observations have great influence on the regression estimates because they are at the “end of the lever.”
• The leverage for observation i is denoted hi and is calculated as hi = 1/n + (xi − x̄)2 / Σ(xj − x̄)2.

LO12-10: Define leverage and identify high leverage
observations (continued).
High Leverage (continued)
• Figure 12.27 illustrates this concept of high leverage.
• One individual worked 65 hours, while the others worked between
12 and 42 hours. This individual will have a big effect on the slope
estimate because he is so far above the mean of X.

LO12-10: Define leverage and identify high leverage
observations (continued, 2).

High Leverage (continued)
• As a rule of thumb for a simple regression, a leverage statistic that exceeds 4/n is unusual (if xi = x̄, the leverage statistic is 1/n, so the rule of thumb is just four times this value).
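Applying the leverage formula and the 4/n rule together takes only a few lines. The X values here are hypothetical, echoing the hours-worked example, with one value far above the rest:

```python
# Leverage h_i = 1/n + (x_i - xbar)^2 / sum((x_j - xbar)^2), flagged when > 4/n
x = [1, 2, 3, 4, 15]  # hypothetical hours; one observation far from the mean
n = len(x)
xbar = sum(x) / n
sxx = sum((xj - xbar) ** 2 for xj in x)

leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
high = [xi for xi, h in zip(x, leverage) if h > 4 / n]  # rule-of-thumb flag
```

Only x = 15 exceeds 4/n = 0.8; as a side check, the leverages sum to 2, the number of estimated coefficients in a simple regression.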

12.10 Other Regression Problems
(optional)
LO12-11: Improve data conditioning and use
transformations if needed (optional).
Outliers
Outliers may be caused by
• an error in recording data (fix: delete the observation)
• impossible data (fix: delete the data)
• an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t (fix: formulate a multiple regression model that includes the lurking variable).

LO12-11: Improve data conditioning and use
transformations if needed (optional) (continued).
Model Misspecification
• If a relevant predictor has been omitted, then the model is
misspecified.
• Use multiple regression instead of bivariate regression.

Ill-Conditioned Data
• Well-conditioned data values are of the same general order of
magnitude.
• Ill-conditioned data have unusually large or small data values and
can cause loss of regression accuracy or awkward estimates.
• Avoid mixing magnitudes by adjusting the magnitude of your data
before running the regression.
LO12-11: Improve data conditioning and use
transformations if needed (optional) (continued, 2).

Spurious Correlation
• In a spurious correlation, two variables appear related because
of the way they are defined.
• For example, consider the hypothesis that a state’s spending on
education is a linear function of its prison population. Such a
hypothesis seems absurd, and we would expect the regression to
be insignificant. But if the variables are defined as totals without
adjusting for population, we will observe significant correlation.
• This phenomenon is called the size effect or the problem of
totals.

LO12-11: Improve data conditioning and use
transformations if needed (optional) (continued, 3).

Model Form and Variable Transforms
• Sometimes a nonlinear model is a better fit than a linear model.
• Excel offers many model forms.
• Variables may be transformed (e.g., logarithmic or exponential functions) in order to provide a better fit.
• Log transformations can reduce heteroscedasticity.
• Nonlinear models may be difficult to interpret.

12.11 Logistic Regression (Optional)
LO12-12: Identify when logistic regression is appropriate
and calculate predictions for a binary response
variable.
Binary Response Variable
• Sometimes we need to predict something that has only two possible
values (a binary dependent variable).
• For example, will a Chase bank customer use online banking (Y = 1) or
not (Y = 0)? Will an Amazon customer make another purchase within the
next six months (Y = 1) or not (Y = 0)?
• Such research questions would seem to be candidates for regression
modeling because we could define possible predictors such as a
customer’s age, gender, length of time as an existing customer, or past
transaction history.
LO12-12: Identify when logistic regression is appropriate
and calculate predictions for a binary response
variable (continued).
Why Not Use Least Squares?

• Unfortunately, if you perform an ordinary least-squares regression with a binary response variable, there will be complications.
• While the actual value of Y can only be 1 (if the event occurs) or 0 (if the event does not occur), the predicted value of Y should be a number between 0 and 1, denoting the probability of the event of interest.
• Using Excel’s linear regression, the predicted Y values could be greater than one or less than zero, which would be illogical.

LO12-12: Identify when logistic regression is appropriate
and calculate predictions for a binary response
variable (continued, 2).
Why Not Use Least Squares? (continued)
• Another issue is that your regression errors will violate the assumptions
of homoscedasticity (constant variance) because as the
predicted Y values vary from .50 (in either direction), the variance of the
errors will decrease and approach zero.
• Finally, significance tests assume normally distributed errors, which
cannot be the case when Y has only two values (Y = 0 or Y = 1).
• Therefore, tests for significance would be in doubt if you used linear
regression with a binary response variable.

LO12-12: Identify when logistic regression is appropriate
and calculate predictions for a binary response
variable (continued, 3).
Why Not Use Least Squares? (continued)
• The solution is to choose logistic regression, using the nonlinear regression model ŷ = 1/(1 + exp(−(b0 + b1x))). This equation predicts the probability that Y = 1 for any specified value of the independent variable, and its form ensures predictions within the range 0 < ŷ < 1.
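A quick sketch shows why this form keeps predictions inside (0, 1). The coefficients below are hypothetical placeholders; in practice they would come from maximum likelihood estimation, discussed later in this section:

```python
import math

# Logistic prediction: yhat = 1 / (1 + exp(-(b0 + b1 * x)))
# Hypothetical coefficients (real ones come from maximum likelihood).
b0, b1 = -3.0, 1.0

def predict(x):
    """Predicted probability that Y = 1 at a given x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

probs = [predict(x) for x in (0, 3, 10)]
```

Whatever x is fed in, the output stays strictly between 0 and 1 and rises in the S-shaped pattern of Figure 12.40; at b0 + b1x = 0 (here x = 3) the predicted probability is exactly 0.5.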

LO12-12: Identify when logistic regression is appropriate
and calculate predictions for a binary response
variable (continued, 4).
Why Not Use Least Squares? (continued)
• The logistic regression model has an S-shaped form, as illustrated in
Figure 12.40. The logistic function approaches 1 as the value of the
independent variable increases.

LO12-12: Identify when logistic regression is appropriate
and calculate predictions for a binary response
variable (continued, 5).
Estimating a Logistic Regression Model

• The underlying model is the Bernoulli (binary) distribution.


• The event of interest either occurs (probability π) or does not occur
(probability 1 – π).
• Instead of the least squares method, we estimate the parameters using
the method of maximum likelihood.
• This method chooses values of the regression parameters that will
maximize the probability of obtaining the observed sample data. 

LO12-12: Identify when logistic regression is appropriate
and calculate predictions for a binary response
variable (continued, 6).
Estimating a Logistic Regression Model (continued)
• While easy to state in words, the computational procedure requires
specialized software.
• Any major statistical package will safely perform logistic regression
(sometimes called logit for short) and will provide p-values for the
estimated coefficients and predictions for Y.
• An iterative process is required because there is no simple formula
for the parameter estimates.
• What is important at this stage of training is for you to recognize the
need for a specialized tool when Y is a binary (0, 1) variable. 
