Copyright ©2019 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the
prior written consent of McGraw-Hill Education. 12-1
Chapter 12
Simple Regression
12.1 Visual Displays and Correlation Analysis
LO12-1: Calculate and test a correlation coefficient for significance.
Visual Displays
Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
A scatter plot
• displays each observed data pair (xi, yi) as a dot on an X-Y grid.
• indicates visually the strength of the relationship or association between the two variables.
A scatter plot is typically the precursor to more complex analytical techniques.
Correlation Coefficient, r
• A visual display is a good first step in analysis, but we would also like to quantify the strength of the association between two variables.
• Therefore, the scatter plot is usually accompanied by the sample correlation coefficient (also called the Pearson correlation coefficient).
• This statistic, denoted r, measures the degree of linearity in the relationship between two random variables X and Y.
• Its value always falls in the interval [−1, 1].
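A minimal Python sketch of the sample correlation coefficient, assuming the usual definition r = Σ(xi − x̄)(yi − ȳ) / √[Σ(xi − x̄)² · Σ(yi − ȳ)²]. The data values are made up for illustration, and `pearson_r` is a hypothetical helper name, not a library function.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient for paired data x, y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # prints 0.7746
```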
Note: -1 ≤ r ≤ +1
Examples of Scatter Plots Showing Various Correlation Values
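Critical Value for Correlation Coefficient (Tests for Significance)
Equivalently, you can calculate the critical value for the correlation coefficient using
$$r_{\alpha} = \frac{t_{\alpha/2}}{\sqrt{t_{\alpha/2}^{2} + n - 2}}$$
where $t_{\alpha/2}$ is the critical value of Student's t with d.f. = n − 2. Reject H0: ρ = 0 if |r| exceeds $r_{\alpha}$.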
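As a sketch of the equivalence, the two-tailed test can be run either by comparing t = r·√[(n − 2)/(1 − r²)] with t_{α/2}, or by comparing |r| with the critical r; both give the same decision. The t critical value is supplied rather than computed (2.306 is the tabled value for d.f. = 8, α = .05, two-tailed), and the sample r is hypothetical.

```python
import math


def t_stat_for_r(r, n):
    """t statistic for H0: rho = 0, with d.f. = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def critical_r(t_crit, n):
    """Critical |r| implied by a two-tailed t critical value with d.f. = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


# Assumed inputs: n = 10 pairs, alpha = .05 two-tailed, t_crit taken from a t table (d.f. = 8)
n, t_crit = 10, 2.306
r = 0.70  # hypothetical sample correlation

t_calc = t_stat_for_r(r, n)
r_crit = critical_r(t_crit, n)
print(f"t = {t_calc:.3f} vs t_crit = {t_crit}; |r| = {abs(r):.2f} vs r_crit = {r_crit:.3f}")
print("Reject H0" if abs(r) > r_crit else "Do not reject H0")
```

Comparing |r| with a tabled critical r is convenient when you already have r, while the t form generalizes to other hypothesized values.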
12.2 Simple Regression
LO12-2: Interpret a regression equation and use it to make predictions.
What Is Simple Regression?
Simple regression analyzes the relationship between two variables.
It specifies one dependent (response) variable and one independent (predictor) variable.
The hypothesized relationship here is linear, of the form Y = slope × X + y-intercept.
Response or Predictor?
The response variable is the dependent variable. This is the Y
variable.
The predictor variable is the independent variable. This is the X
variable.
Only the dependent variable (not the independent variable) is
treated as a random variable.
Interpreting an Estimated Regression Equation: Examples
Sales = 268 + 7.37 Ads
Each extra $1 million of advertising will generate $7.37 million of sales on average. The firm would average $268 million of sales with zero advertising. However, the intercept may not be meaningful because Ads = 0 may be outside the range of observed data.
• The slope value, 0.031, means that for each one-unit increase in the unemployment rate, we expect to see an increase of 0.031 in the crime rate.
• Does this mean being out of work causes crime to increase?
• Not necessarily. A regression slope shows association, not causation, and there are many lurking variables that could further explain the change in crime rates (e.g., poverty rate, education level, or police presence).
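Using the fitted equation Sales = 268 + 7.37 Ads to make predictions is a matter of plugging in Ads values. This sketch assumes Ads and Sales are both in $ millions, as in the interpretation above; the advertising levels are hypothetical and would need to lie within the range of the observed data (not given here) for the predictions to be trustworthy.

```python
def predict_sales(ads, b0=268.0, b1=7.37):
    """Predicted sales ($ millions) from Sales = 268 + 7.37 Ads, with Ads in $ millions."""
    return b0 + b1 * ads


# Hypothetical advertising budgets ($ millions)
for ads in (5, 10, 15):
    print(f"Ads = {ads}: predicted Sales = {predict_sales(ads):.2f}")
```

Each extra unit of Ads adds exactly b1 = 7.37 to the prediction, which is the slope interpretation in code form.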
12.3 Regression Models
LO12-3: Explain the form and assumptions of a simple regression model.
What Is a Residual?
A residual is calculated as the observed value of y minus the estimated value of y:
$$e_i = y_i - \hat{y}_i \qquad \text{(residual)}$$
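A short sketch of the residual calculation, assuming a line ŷ = b0 + b1x has already been fitted. The data and the coefficients b0 = 2.2, b1 = 0.6 are illustrative only (they happen to be the least squares fit for these five points).

```python
def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - y_hat_i for the fitted line y_hat = b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical data and fitted coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
e = residuals(x, y, b0=2.2, b1=0.6)
print([round(ei, 2) for ei in e])
```

A positive residual means the point lies above the fitted line; a negative one means it lies below.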
Slope and Intercept Interpretations
LO12-4: Explain the least squares method, apply formulas for coefficients, and interpret R².
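The least squares method chooses b0 and b1 to minimize the sum of squared residuals Σ(yi − ŷi)². For simple regression, the standard closed-form solution is b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1·x̄. A minimal Python sketch of those formulas on hypothetical data (`least_squares` is an illustrative name, not a library function):

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: cross-products over the sum of squares of X, both taken about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # The fitted line always passes through the point (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```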
Sources of Variation in Y (continued)
Coefficient of Determination
• The first proportion, SSR/SST, has a special name: the coefficient of determination, or R². You can calculate this statistic in two ways:
$$R^2 = \frac{SSR}{SST} \qquad \text{or} \qquad R^2 = 1 - \frac{SSE}{SST}$$
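A sketch that computes the three sums of squares, SST = Σ(yi − ȳ)², SSR = Σ(ŷi − ȳ)², and SSE = Σ(yi − ŷi)², and confirms that both routes to R² agree. The data are hypothetical.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares line fitted to (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (error)
    return sst, ssr, sse


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
sst, ssr, sse = sums_of_squares(x, y)
print(f"SSR/SST = {ssr / sst:.4f}, 1 - SSE/SST = {1 - sse / sst:.4f}")
```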
R² and r
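In simple regression, R² equals the square of the sample correlation coefficient r, and r has the same sign as the slope b1. A small check on hypothetical data with a negative relationship:

```python
import math


def r_and_r_squared(x, y):
    """Return (b1, r, R^2) for a least squares fit of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is SST
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return b1, r, ssr / s_yy


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [9, 7, 8, 4, 3]
b1, r, r2 = r_and_r_squared(x, y)
print(f"b1 = {b1:.3f}, r = {r:.4f}, r^2 = {r * r:.4f}, R^2 = {r2:.4f}")
```

Because squaring discards the sign, R² alone cannot tell you whether the relationship is positive or negative; the sign of b1 (or r) does.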
12.5 Test for Significance
LO12-5: Construct confidence intervals and test hypotheses for the slope and intercept.
Standard Error of Regression
The standard error ($s_e$) is an overall measure of model fit.
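The standard error is usually computed as s_e = √[SSE / (n − 2)] = √[Σ(yi − ŷi)² / (n − 2)], dividing by n − 2 because two coefficients were estimated. A minimal sketch on hypothetical data:

```python
import math


def standard_error(x, y):
    """Standard error of the regression: s_e = sqrt(SSE / (n - 2))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))  # n - 2 d.f.: two coefficients were estimated


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(round(standard_error(x, y), 4))
```

Since s_e is in the units of Y, it can be compared directly with typical values of Y; a perfect fit gives s_e = 0.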
Confidence Intervals for Slope and Intercept (continued)
Hypothesis Tests
Is the true slope different from zero? This is an important question because if β1 = 0, then X is not associated with Y and the regression model collapses to a constant β0 plus a random error term:
$$y_i = \beta_0 + (0)x_i + \varepsilon_i = \beta_0 + \varepsilon_i$$
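For testing either coefficient, we use a t test with d.f. = n − 2. Usually we are interested in testing whether the parameter is equal to zero, as shown here, but you may substitute another value in place of 0 if you wish. The hypotheses and their test statistics are
Slope: H0: β1 = 0 versus H1: β1 ≠ 0, with $t_{calc} = \dfrac{b_1 - 0}{s_{b_1}}$
Intercept: H0: β0 = 0 versus H1: β0 ≠ 0, with $t_{calc} = \dfrac{b_0 - 0}{s_{b_0}}$
where $s_{b_1}$ and $s_{b_0}$ are the standard errors of the estimated slope and intercept.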
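A sketch of the slope and intercept tests together with the matching confidence intervals b ± t_{α/2}·s_b. It assumes the standard formulas s_b1 = s_e / √Σ(xi − x̄)² and s_b0 = s_e·√[1/n + x̄² / Σ(xi − x̄)²]. The data are hypothetical, and the critical value 3.182 (two-tailed, α = .05, d.f. = 3) is supplied from a t table rather than computed.

```python
import math


def coefficient_tests(x, y, t_crit):
    """t statistics (H0: parameter = 0) and confidence intervals for b1 and b0, d.f. = n - 2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # Standard errors of the estimated coefficients
    s_b1 = s_e / math.sqrt(s_xx)
    s_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    results = {}
    for name, b, s_b in (("slope", b1, s_b1), ("intercept", b0, s_b0)):
        t_calc = (b - 0) / s_b  # hypothesized value is 0
        ci = (b - t_crit * s_b, b + t_crit * s_b)
        results[name] = (b, s_b, t_calc, ci)
    return results


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
res = coefficient_tests(x, y, t_crit=3.182)
for name, (b, s_b, t_calc, ci) in res.items():
    print(f"{name}: b = {b:.3f}, s_b = {s_b:.3f}, t = {t_calc:.3f}, "
          f"CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The test and the interval agree: |t| < t_crit exactly when the interval contains 0.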
Slope versus Correlation
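In simple regression, the t test for H0: β1 = 0 and the t test for H0: ρ = 0 give identical t statistics, so they always reach the same conclusion. The sketch below computes both on hypothetical data to show that they match.

```python
import math


def slope_t_and_correlation_t(x, y):
    """Return (t for H0: beta1 = 0, t for H0: rho = 0); both use d.f. = n - 2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = s_xy / s_xx
    sse = s_yy - b1 * s_xy  # SSE = SST - SSR, with SSR = b1 * s_xy
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return b1 / s_b1, r * math.sqrt((n - 2) / (1 - r ** 2))


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
t_slope, t_corr = slope_t_and_correlation_t(x, y)
print(round(t_slope, 4), round(t_corr, 4))
```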
12.6 Analysis of Variance: Overall Fit
LO12-6: Interpret the ANOVA table and use it to calculate F, R², and standard error.
Decomposition of Variance
• The decomposition of variance may be written as
$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$
that is, SST (total variation about the mean) = SSR (variation explained by the regression) + SSE (unexplained error variation).
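A sketch of the quantities in a simple regression ANOVA table, assuming the usual layout: regression d.f. = 1, error d.f. = n − 2, MSR = SSR/1, MSE = SSE/(n − 2), F = MSR/MSE, R² = SSR/SST, and s_e = √MSE. The data are hypothetical. In simple regression, F equals the square of the slope's t statistic.

```python
def anova_table(x, y):
    """ANOVA quantities for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    msr = ssr / 1        # regression d.f. = 1
    mse = sse / (n - 2)  # error d.f. = n - 2
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "F": msr / mse, "R2": ssr / sst, "s_e": mse ** 0.5}


tab = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
for key, value in tab.items():
    print(f"{key:>3}: {value:.4f}")
```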
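12.7 Confidence and Prediction Intervals for Y
LO12-7: Distinguish between confidence and prediction intervals for Y.
How to Construct an Interval Estimate for Y
The regression line is an estimate of the conditional mean of Y, that is, the expected value of Y for a given value of X, denoted E(Y | xi).
But the estimate may be too high or too low.
To make this point estimate more useful, we need an interval estimate to show a range of likely values.
To do this, we insert the xi value into the fitted regression equation, calculate the estimated $\hat{y}_i$, and use the following formulas, where $t_{\alpha/2}$ has d.f. = n − 2.
Confidence interval for the conditional mean of Y:
$$\hat{y}_i \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum_{j=1}^{n}(x_j-\bar{x})^2}}$$
Prediction interval for an individual Y:
$$\hat{y}_i \pm t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum_{j=1}^{n}(x_j-\bar{x})^2}}$$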
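A sketch of both interval estimates at a chosen x value, assuming the standard formulas ŷ ± t·s_e·√[1/n + (x − x̄)²/Σ(xj − x̄)²] for the conditional mean of Y and ŷ ± t·s_e·√[1 + 1/n + (x − x̄)²/Σ(xj − x̄)²] for an individual Y. The data, the x value, and the critical value (3.182 for d.f. = 3, α = .05, taken from a t table) are illustrative assumptions.

```python
import math


def interval_estimates(x, y, x_new, t_crit):
    """Point estimate, CI for E(Y | x_new), and prediction interval for one Y at x_new."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    s_e = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = b0 + b1 * x_new
    core = 1 / n + (x_new - x_bar) ** 2 / s_xx
    ci_half = t_crit * s_e * math.sqrt(core)      # conditional mean of Y
    pi_half = t_crit * s_e * math.sqrt(1 + core)  # one individual Y
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
y_hat, ci, pi = interval_estimates(x, y, x_new=4, t_crit=3.182)
print(f"y_hat = {y_hat:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f}), PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The extra 1 under the square root is why the prediction interval is always wider: it adds the scatter of an individual Y around its mean to the uncertainty in the estimated mean. Both intervals also widen as x_new moves away from x̄.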
12.8 Residual Tests
LO12-8: Calculate residuals and perform tests of regression assumptions.
Three Important Assumptions
1. The errors (residuals) are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
Violation of Assumption 1: Non-normal Errors
• Non-normality of errors is a mild violation since the regression parameter estimates b0 and b1 and their variances remain unbiased and consistent.
• Confidence intervals for the parameters may be untrustworthy because the normality assumption is used to justify using Student’s t distribution.
Non-normal Errors
• A large sample size would compensate.
• Outliers could pose serious problems.
Tests for Heteroscedasticity
• Although many patterns of non-constant variance might exist, the
“fan-out” pattern (increasing residual variance) is most common.
• Less frequently, we might see a “funnel-in” pattern, which shows
decreasing residual variance.
• The residuals always have a mean of zero, whether the residuals
exhibit homoscedasticity or heteroscedasticity.
12.9 Unusual Observations
LO12-9: Identify unusual residuals and tell when they are outliers.
Standardized Residuals
• Tests for unusual residuals and high leverage are important diagnostic
tools in evaluating the fitted regression.
• One can use Excel, Minitab, MegaStat or other software to compute
standardized residuals.
• If the absolute value of any standardized residual is at least 2, then it is
classified as unusual.
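Software packages differ slightly in how they standardize residuals. This sketch assumes the common definition e_i / (s_e·√(1 − h_i)), where h_i = 1/n + (xi − x̄)²/Σ(xj − x̄)² is the observation's leverage, and applies the |value| ≥ 2 rule. The data are hypothetical: they lie exactly on y = 2x + 1 except for one point placed 6 units too high.

```python
import math


def standardized_residuals(x, y):
    """Residuals e_i scaled by their estimated standard deviation s_e * sqrt(1 - h_i)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(ei * ei for ei in e) / (n - 2))
    h = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]  # leverage of each observation
    return [ei / (s_e * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]


# Hypothetical data: exactly y = 2x + 1 except the fifth point, which is 6 units too high
x = list(range(1, 11))
y = [3, 5, 7, 9, 17, 13, 15, 17, 19, 21]
z = standardized_residuals(x, y)
unusual = [i for i, zi in enumerate(z) if abs(zi) >= 2]
print("unusual observations (0-based index):", unusual)
```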
LO12-10: Define leverage and identify high-leverage observations.
High Leverage
• Figure 12.27 illustrates this concept of high leverage.
• One individual worked 65 hours, while the others worked between
12 and 42 hours. This individual will have a big effect on the slope
estimate because he is so far above the mean of X.
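Leverage depends only on X: h_i = 1/n + (xi − x̄)²/Σ(xj − x̄)². A widely used rule of thumb (an assumption here, not stated in the text above) flags h_i > 4/n in simple regression. The hours below are hypothetical values chosen to mimic the description: most between 12 and 42, and one at 65.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum((x_j - x_bar)^2) for each observation."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


hours = [12, 18, 25, 30, 35, 42, 65]  # hypothetical hours worked
h = leverages(hours)
cutoff = 4 / len(hours)  # assumed rule of thumb for one predictor
high = [xi for xi, hi in zip(hours, h) if hi > cutoff]
print("high-leverage X values:", high)
```

In simple regression the leverages always sum to 2 (one for each estimated coefficient), so an average observation has h_i = 2/n, and 4/n is twice that average.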
12.10 Other Regression Problems (Optional)
LO12-11: Improve data conditioning and use transformations if needed (optional).
Outliers
Outliers may be caused by
• an error in recording data
• impossible data
• an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t.
To fix the problem,
• delete the observation(s)
• delete the data
• formulate a multiple regression model that includes the lurking variable.
Model Misspecification
• If a relevant predictor has been omitted, then the model is
misspecified.
• Use multiple regression instead of bivariate regression.
Ill-Conditioned Data
• Well-conditioned data values are of the same general order of
magnitude.
• Ill-conditioned data have unusually large or small data values and
can cause loss of regression accuracy or awkward estimates.
• Avoid mixing magnitudes by adjusting the magnitude of your data
before running the regression.
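A sketch of rescaling before fitting, with hypothetical advertising data recorded in dollars. Converting X to millions of dollars multiplies the slope by 1,000,000 and leaves the intercept and fitted values unchanged, so the two models are equivalent; the rescaled one simply reports coefficients of a convenient magnitude. (With this tiny data set and centered formulas, both fits are numerically fine; the point is the change in units.)

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


# Hypothetical data: advertising recorded in dollars, sales in units
ads_dollars = [1_200_000, 2_500_000, 3_100_000, 4_800_000, 5_300_000]
sales = [41, 60, 68, 95, 99]
ads_millions = [a / 1_000_000 for a in ads_dollars]  # same data, rescaled

b0_d, b1_d = least_squares(ads_dollars, sales)
b0_m, b1_m = least_squares(ads_millions, sales)
print(f"per dollar:     b0 = {b0_d:.3f}, b1 = {b1_d:.3e}")
print(f"per $1 million: b0 = {b0_m:.3f}, b1 = {b1_m:.3f}")
```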
Spurious Correlation
• In a spurious correlation, two variables appear related because
of the way they are defined.
• For example, consider the hypothesis that a state’s spending on
education is a linear function of its prison population. Such a
hypothesis seems absurd, and we would expect the regression to
be insignificant. But if the variables are defined as totals without
adjusting for population, we will observe significant correlation.
• This phenomenon is called the size effect or the problem of
totals.
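A deterministic illustration of the size effect, using hypothetical numbers for five states. Per-capita education spending and per-capita prison population are only weakly related, but multiplying both by population produces totals that are strongly correlated simply because larger states have more of everything.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient for paired data x, y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical states
population = [1, 2, 4, 8, 16]        # millions of residents
edu_per_capita = [5, 3, 4, 3, 5]     # education spending per resident
prison_per_capita = [2, 4, 3, 2, 3]  # prisoners per resident (arbitrary units)

# Totals: each per-capita figure multiplied by population
edu_total = [p * e for p, e in zip(population, edu_per_capita)]
prison_total = [p * q for p, q in zip(population, prison_per_capita)]

r_pc = pearson_r(edu_per_capita, prison_per_capita)
r_tot = pearson_r(edu_total, prison_total)
print(f"per capita r = {r_pc:.3f}, totals r = {r_tot:.3f}")
```

Dividing by population (or another size measure) before running the regression removes the shared size component.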
12.11 Logistic Regression (Optional)
LO12-12: Identify when logistic regression is appropriate and calculate predictions for a binary response variable.
Binary Response Variable
• Sometimes we need to predict something that has only two possible
values (a binary dependent variable).
• For example, will a Chase bank customer use online banking (Y = 1) or
not (Y = 0)? Will an Amazon customer make another purchase within the
next six months (Y = 1) or not (Y = 0)?
• Such research questions would seem to be candidates for regression
modeling because we could define possible predictors such as a
customer’s age, gender, length of time as an existing customer, or past
transaction history.
Why Not Use Least Squares?
• Another issue is that your regression errors will violate the assumption of homoscedasticity (constant variance): as the predicted Y values move away from .50 (in either direction), the variance of the errors will decrease and approach zero.
• Finally, significance tests assume normally distributed errors, which cannot be the case when Y has only two values (Y = 0 or Y = 1).
• Therefore, tests for significance would be in doubt if you used linear regression with a binary response variable.
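For a binary Y, a linear model's predicted value p plays the role of P(Y = 1). The error is then 1 − p with probability p and −p with probability 1 − p, so its variance is p(1 − p). This sketch shows that variance peaking at p = .50 and shrinking toward zero as p moves toward 0 or 1.

```python
def binary_error_variance(p):
    """Variance of a 0/1 outcome (and of its regression error) when P(Y = 1) = p."""
    return p * (1 - p)


for p in (0.50, 0.70, 0.90, 0.99):
    print(f"p = {p:.2f}: error variance = {binary_error_variance(p):.4f}")
```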
• The logistic regression model has an S-shaped form, as illustrated in Figure 12.40. When the slope coefficient is positive, the logistic function approaches 1 as the value of the independent variable increases (and approaches 0 as it decreases).
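The logistic model is commonly written P(Y = 1 | x) = 1 / (1 + e^−(b0 + b1x)). A sketch with hypothetical coefficients b0 = −4 and b1 = 0.8 shows the S shape: the probability is .50 where b0 + b1x = 0 (here x = 5) and rises toward 1 as x increases. The branch on the sign of z only avoids overflow in `math.exp` for large |z|.

```python
import math


def logistic_probability(x, b0, b1):
    """P(Y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))), computed without overflow."""
    z = b0 + b1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients (b1 > 0), for illustration only
b0, b1 = -4.0, 0.8
for x in (0, 2.5, 5, 7.5, 10):
    print(f"x = {x}: P(Y = 1) = {logistic_probability(x, b0, b1):.3f}")
```

Unlike a straight line, these predictions always stay between 0 and 1.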
Estimating a Logistic Regression Model
• While easy to state in words, the computational procedure requires
specialized software.
• Any major statistical package can perform logistic regression (sometimes called logit for short) and will provide p-values for the estimated coefficients and predictions for Y.
• An iterative process is required because there is no simple formula
for the parameter estimates.
• What is important at this stage of training is for you to recognize the
need for a specialized tool when Y is a binary (0, 1) variable.
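Logistic regression coefficients are usually estimated by maximum likelihood. One standard iterative method is Newton-Raphson, sketched below for a single predictor. This illustrates the idea and is not the exact algorithm any particular package uses. The function names and the data are hypothetical; the data are chosen so that the 0s and 1s overlap, which ensures finite estimates exist.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood (b0, b1) for P(Y = 1 | x) = sigmoid(b0 + b1 * x) via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[h00, h01], [h01, h11]] with weights w_i = p_i * (1 - p_i)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [[h00, h01], [h01, h11]] @ [d0, d1] = [g0, g1]
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: the 0s and 1s overlap, so finite estimates exist
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"P(Y = 1 | x = 4.5) = {sigmoid(b0 + b1 * 4.5):.3f}")
```

Each pass re-weights the observations by p(1 − p) and solves a small linear system, which is why no single closed-form formula exists; convergence is reached when the gradient of the log-likelihood is essentially zero.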