McGraw-Hill/Irwin Copyright © 2015 by The McGraw-Hill Companies, Inc. All rights reserved.
Simple Regression
Chapter Contents
Begin the analysis of bivariate data (i.e., two variables) with a scatter plot.
A scatter plot
- displays each observed data pair (xi, yi) as a dot on an X/Y grid.
- indicates visually the strength of the relationship between the two variables.
Note: The sample correlation coefficient r satisfies -1 ≤ r ≤ +1, and r = 0 indicates no linear relationship.
LO12-1 12.1 Visual Displays and Correlation Analysis
Scatter Plots Showing Various Correlation Values
Tests for Significant Correlation Using Student’s t
Step 1: State the Hypotheses
Determine whether you are using a one- or two-tailed test and the level of significance (α).
H0: ρ = 0
H1: ρ ≠ 0
Step 2: Specify the Decision Rule
For degrees of freedom df = n - 2, look up the critical value tα in Appendix D.
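As a rough illustration of the test, the sketch below computes the sample correlation r and the usual test statistic t = r·sqrt((n - 2)/(1 - r²)) for H0: ρ = 0. The function name `correlation_t` and the five data pairs are made up for this example; the critical value still has to come from a t table or software.

```python
import math

def correlation_t(x, y):
    """Return (r, t_calc) for paired samples x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    # Test statistic for H0: rho = 0, with df = n - 2.
    t_calc = r * math.sqrt((n - 2) / (1 - r ** 2))
    return r, t_calc

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, t_calc = correlation_t(x, y)
print(f"r = {r:.4f}, t_calc = {t_calc:.4f}, df = {len(x) - 2}")
```

For a two-tailed test at α = .05 with df = 3, the tabled critical value is 3.182, so |t_calc| would be compared against that number.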
LO12-3 12.3 Regression Models
LO12-3: Explain the form and assumptions of a simple
regression model.
The fitted model or regression model is used to predict the expected value of Y for a given value of X:
$\hat{y} = b_0 + b_1 x$
The fitted coefficients are
b0, the estimated intercept
b1, the estimated slope
The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the sum of squared residuals is minimized, which gives the best fit in the least squares sense.
The sum of the residuals = 0.
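The OLS estimator for the slope is:
$b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
The OLS estimator for the intercept is:
$b_0 = \bar{y} - b_1 \bar{x}$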
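A minimal sketch of the OLS calculation follows. It assumes the usual closed-form estimates (slope as the sum of cross-products of deviations divided by the sum of squared x deviations, intercept as the mean of y minus the slope times the mean of x). The helper name `ols_fit` and the data are invented for illustration.

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)

# Residuals are observed minus fitted values; under OLS they sum to zero
# (up to floating-point rounding).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, sum of residuals = {sum(residuals):.2e}")
```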
LO12-4 12.4 Ordinary Least Squares (OLS) Formulas
Slope and Intercept
*Recall from Chapter 8 that an unbiased estimator’s expected value is the true parameter
and that a consistent estimator approaches ever closer to the true parameter as the sample
size increases.
Assessing Fit
We want to explain the total variation in Y around its mean (SST for Total
Sums of Squares).
The standard error of the regression is $s = \sqrt{SSE / (n - 2)}$.
If the fitted model’s predictions are perfect (SSE = 0), then s = 0. Thus, a small s indicates a better fit.
s is used to construct confidence intervals.
The magnitude of s depends on the units of measurement of Y and on data magnitude.
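The dependence of the standard error on the units of Y can be checked numerically. The sketch below assumes the standard error is computed as sqrt(SSE/(n - 2)); the function name `regression_std_error` and the data are hypothetical. Multiplying every Y value by 1000 (for example, switching from thousands of dollars to dollars) multiplies the standard error by 1000 without changing the quality of the fit.

```python
import math

def regression_std_error(x, y):
    """Standard error of a simple regression: sqrt(SSE / (n - 2))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # SSE is the sum of squared residuals (unexplained variation).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s = regression_std_error(x, y)
s_rescaled = regression_std_error(x, [1000 * yi for yi in y])
print(f"s = {s:.4f}, s with Y in units 1000 times smaller = {s_rescaled:.1f}")
```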
LO12-5 12.5 Test For Significance
Confidence Intervals for Slope and Intercept
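Standard errors of the slope and intercept, where s is the standard error of the regression:
$s_{b_1} = \dfrac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
$s_{b_0} = s \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$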
With df = n - 2, reject H0 if |tcalc| > tα/2 or if p-value ≤ α.
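The decision rule can be sketched for the slope as follows, assuming a two-tailed test of a zero slope and the standard error of the slope computed as s divided by the square root of the sum of squared x deviations. The function name `slope_inference`, the data, and the constant `T_CRIT` are illustrative; `T_CRIT` is simply the tabled value for α = .05 and df = 3.

```python
import math

def slope_inference(x, y, t_crit):
    """Return (b1, se_b1, t_calc, ci) for the slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))      # standard error of the regression
    se_b1 = s / math.sqrt(sxx)        # standard error of the slope
    t_calc = b1 / se_b1               # test statistic for a zero slope
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_calc, ci

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # from a t table: two-tailed alpha = .05, df = n - 2 = 3

b1, se_b1, t_calc, ci = slope_inference(x, y, T_CRIT)
reject = abs(t_calc) > T_CRIT
print(f"t_calc = {t_calc:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```

The confidence interval and the two-tailed test agree: H0 is rejected exactly when the interval excludes zero.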
LO12-6 12.6 Analysis of Variance: Overall Fit
LO12-6: Interpret the ANOVA table and use it to calculate F, R2, and the
standard error.
Decomposition of Variance
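The decomposition of variance may be written as
SST (total) = SSR (explained) + SSE (unexplained)
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$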
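A quick numerical check of the decomposition, using an OLS fit on made-up data. The variable names are chosen for this example only. The sketch also computes R² as SSR/SST, the share of total variation that is explained.

```python
# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# OLS fit from the closed-form formulas.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained by the regression
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (residual)

r_squared = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R2 = {r_squared:.2f}")
```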
F Test for Overall Fit
To test a regression for overall significance, we use an F test to compare
the explained (SSR) and unexplained (SSE) sums of squares.
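One way to sketch the F test is shown below. It assumes the usual ANOVA layout for simple regression, with 1 degree of freedom for the regression and n - 2 for the error, so that F = MSR/MSE. The function name `f_statistic` and the data are illustrative, and the critical value of F would still be looked up separately.

```python
def f_statistic(x, y):
    """Return (F, df1, df2) for the overall fit of a simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    msr = ssr / 1          # mean square for regression (one predictor)
    mse = sse / (n - 2)    # mean square for error
    return msr / mse, 1, n - 2

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f_calc, df1, df2 = f_statistic(x, y)
print(f"F = {f_calc:.3f} with df = ({df1}, {df2})")
```

In a simple regression with one predictor, this F statistic equals the square of the t statistic for the slope, so the two tests give the same conclusion.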
LO12-7 12.7 Confidence and Prediction Intervals for Y
LO12-7: Distinguish between confidence and prediction
intervals for Y.
How to Construct an Interval Estimate for Y
Confidence Interval for the conditional mean of Y.
Prediction intervals are wider than confidence intervals because individual
Y values vary more than the mean of Y.
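The difference in width can be seen in a small sketch. It assumes the standard interval formulas: the half-width for the conditional mean uses sqrt(1/n + (x0 - x̄)²/Σ(xi - x̄)²), and the prediction interval adds 1 under the square root. The function name `interval_half_widths`, the data, the chosen point x0 = 4, and the tabled value `T_CRIT` are all illustrative.

```python
import math

def interval_half_widths(x, y, x0, t_crit):
    """Return (y_hat, ci_half, pi_half) at X = x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    leverage_term = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(leverage_term)      # mean of Y at x0
    pi_half = t_crit * s * math.sqrt(1 + leverage_term)  # individual Y at x0
    return b0 + b1 * x0, ci_half, pi_half

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # from a t table: two-tailed alpha = .05, df = 3

y_hat, ci_half, pi_half = interval_half_widths(x, y, x0=4, t_crit=T_CRIT)
print(f"y_hat = {y_hat:.2f}, CI = +/- {ci_half:.3f}, PI = +/- {pi_half:.3f}")
```

Both intervals are narrowest at x0 = x̄ and widen as x0 moves away from the mean of X.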
LO12-8 12.8 Residual Tests
LO12-8: Calculate residuals and perform tests of
regression assumptions.
Three Important Assumptions
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
Non-normal Errors
A large sample size would compensate for non-normality of the errors.
Outliers could pose serious problems.
The Durbin-Watson (DW) test checks for autocorrelation under the hypotheses
H0: Errors are nonautocorrelated
H1: Errors are autocorrelated
The DW statistic will range from 0 to 4.
DW < 2 suggests positive autocorrelation
DW = 2 suggests no autocorrelation (ideal)
DW > 2 suggests negative autocorrelation
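The DW statistic itself is simple to compute from the residuals in time order: the sum of squared successive differences divided by the sum of squared residuals. The sketch below uses a hypothetical helper `durbin_watson` and made-up residual series; judging significance still requires DW critical values or software.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residual series, for illustration only.
mixed = [-0.8, 0.6, 1.0, -0.6, -0.2]            # no obvious pattern
runs = [1.0, 0.9, 0.8, -0.8, -0.9, -1.0]        # long runs of the same sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0] # sign flips every period

dw_mixed = durbin_watson(mixed)
dw_runs = durbin_watson(runs)
dw_alternating = durbin_watson(alternating)
print(f"mixed: {dw_mixed:.3f}, runs: {dw_runs:.3f}, alternating: {dw_alternating:.3f}")
```

Residuals that stay on one side of zero for long runs push DW toward 0, while residuals that alternate in sign push it toward 4.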
High Leverage
LO12-11 12.10 Other Regression Problems (optional)
Model Misspecification
If a relevant predictor has been omitted, then the model is misspecified.
Use multiple regression instead of bivariate regression.
Ill-Conditioned Data
Well-conditioned data values are of the same general order of magnitude.
Ill-conditioned data have unusually large or small data values and can
cause loss of regression accuracy or awkward estimates.
Avoid mixing magnitudes by adjusting the magnitude of your data before
running the regression.
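As a small illustration of adjusting magnitudes, the sketch below fits the same made-up data twice: once with X recorded in dollars and once with X rescaled to millions of dollars. The helper `ols_fit` and the data are hypothetical. The fitted values are the same either way; only the scale of the slope changes, which makes the rescaled coefficient easier to read and report.

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope for y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

# Made-up data, for illustration only: X in dollars, Y in single digits.
x_dollars = [1_000_000, 2_000_000, 3_000_000, 4_000_000, 5_000_000]
y = [2, 4, 5, 4, 5]

# Rescale X to millions of dollars so X and Y have similar magnitudes.
x_millions = [xi / 1_000_000 for xi in x_dollars]

b0_raw, b1_raw = ols_fit(x_dollars, y)
b0_scaled, b1_scaled = ols_fit(x_millions, y)
print(f"raw slope = {b1_raw:.3e}, rescaled slope = {b1_scaled:.3f}")
```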
Spurious Correlation
In a spurious correlation, two variables appear related because of the way they are defined.
This problem is called the size effect or the problem of totals.