Simple Linear Regression
[Figure: scatter plots of Y against X]
Copyright © 2016 Pearson Education, Ltd. Chapter 12, Slide 4
Types of Relationships DCOVA
(continued)
Strong relationships Weak relationships
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]
No relationship
[Figure: scatter plot of Y against X showing no relationship]
Introduction to Regression Analysis
Regression analysis is used to:
  Predict the value of a dependent variable based on the value of at least one independent variable
  Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
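Population regression model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
  Yi = dependent variable
  β0 = population Y intercept
  β1 = population slope coefficient
  Xi = independent variable
  εi = random error term
β0 + β1Xi is the linear component; εi is the random error component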
[Figure: the line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ plotted against X, with intercept = β0 and slope = β1. At a given Xi, the random error εi is the vertical distance between the observed value of Y for Xi and the predicted value of Y for Xi.]
Simple Linear Regression Equation
(Prediction Line)
$$\hat{Y}_i = b_0 + b_1 X_i$$
where
  Ŷi = estimated (or predicted) Y value for observation i
  b0 = estimate of the regression intercept
  b1 = estimate of the regression slope
  Xi = value of X for observation i
[Figure: plot with Square Feet on the horizontal axis (0 to 3000) and a vertical axis from 0 to 350]
ANOVA
              df    SS            MS            F          Significance F
Regression     1    18934.9348    18934.9348    11.0848    0.01039
Residual       8    13665.5652     1708.1957
Total          9    32600.5000

Analysis of Variance
Source            DF    SS       MS       F        P
Regression         1    18935    18935    11.08    0.010
Residual Error     8    13666     1708
Total              9    32600
[Figure: plot against Square Feet (0 to 3000) with the fitted line; Intercept = 98.248, Slope = 0.10977]
$$\hat{Y} = 98.25 + 0.1098(2000) = 317.85$$
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850
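The arithmetic above can be checked with a short Python sketch. It assumes the rounded coefficients shown on the slide (b0 = 98.25, b1 = 0.1098); the function name `predict_price` is made up for this illustration.

```python
# Coefficients as rounded on the slide.
B0 = 98.25    # estimated intercept, in $1000s
B1 = 0.1098   # estimated slope, in $1000s per square foot


def predict_price(square_feet: float) -> float:
    """Return the predicted house price in $1000s from the prediction line."""
    return B0 + B1 * square_feet


price = predict_price(2000)
print(f"{price:.2f} ($1000s) = ${price * 1000:,.0f}")  # 317.85 ($1000s) = $317,850
```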
Simple Linear Regression
Example: Making Predictions
When using a regression model for prediction, only predict within the relevant range of data
[Figure: House Price ($1000s) plotted against Square Feet (0 to 3000), marking the relevant range for interpolation]
Do not try to extrapolate beyond the range of observed X’s
Measures of Variation
Total variation is made up of two parts:
$$SST = SSR + SSE$$
SST = total sum of squares (Total Variation)
  Measures the variation of the Yi values around their mean Ȳ
SSR = regression sum of squares (Explained Variation)
  Variation attributable to the relationship between X and Y
SSE = error sum of squares (Unexplained Variation)
  Variation in Y attributable to factors other than X
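$$SST = \sum (Y_i - \bar{Y})^2 \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$$
[Figure: at a given Xi, the distances from Yi to Ȳ (SST), from Ŷi to Ȳ (SSR), and from Yi to Ŷi (SSE) shown on a plot of Y against X]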
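As a rough illustration of the three sums of squares, the sketch below fits a least-squares line to a small hypothetical data set and computes SST, SSR and SSE. The data are invented for this example (they are not the house-price sample), the helper names are made up, and the least-squares formulas for b0 and b1 are the standard ones rather than something stated on these slides.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def variation(x, y):
    """Return (SST, SSR, SSE) for the least-squares line through (x, y)."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return sst, ssr, sse


# Hypothetical toy data, NOT the house-price sample from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sst, ssr, sse = variation(x, y)
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
```

For a least-squares fit, SSR + SSE should reproduce SST up to floating-point rounding.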
Coefficient of Determination, r²
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
The coefficient of determination is also called r-square and is denoted as r²
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
note: $0 \le r^2 \le 1$
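A minimal sketch of this ratio, using the SSR and SST values from the ANOVA output in these slides. The function name `r_squared` is made up for the illustration.

```python
SSR = 18934.9348   # regression sum of squares, from the ANOVA output
SST = 32600.5000   # total sum of squares, from the ANOVA output


def r_squared(ssr: float, sst: float) -> float:
    """Coefficient of determination: explained variation over total variation."""
    if sst <= 0:
        raise ValueError("SST must be positive")
    return ssr / sst


r2 = r_squared(SSR, SST)
sse = SST - SSR   # unexplained variation, since SST = SSR + SSE
print(f"r^2 = {r2:.5f}, SSE = {sse:.4f}")  # r^2 = 0.58082, SSE = 13665.5652
```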
[Figure: scatter plot of Y against X with r² = 1]
Examples of Approximate r² Values
[Figure: scatter plot of Y against X with 0 < r² < 1]
r² = 0
No linear relationship between X and Y
Source            DF    SS       MS       F        P
Regression         1    18935    18935    11.08    0.010
Residual Error     8    13666     1708
Total              9    32600

58.08% of the variation in house prices is explained by variation in square feet
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
Where
  SSE = error sum of squares
  n = sample size
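The formula can be checked against the house-price output, where SSE = 13665.5652 and n = 10. This is a sketch; the function name is made up for the illustration.

```python
import math


def standard_error_of_estimate(sse: float, n: int) -> float:
    """S_YX = sqrt(SSE / (n - 2)) for simple linear regression."""
    if n <= 2:
        raise ValueError("need more than two observations")
    return math.sqrt(sse / (n - 2))


s_yx = standard_error_of_estimate(13665.5652, 10)
print(f"S_YX = {s_yx:.5f}")  # S_YX = 41.33032
```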
$S_{YX} = 41.33032$
Square Feet: coefficient 0.10977, standard error 0.03297, t = 3.33, P = 0.010
Independence of Errors
  Error values are statistically independent
  period of time
Normality of Error
  Error values are normally distributed for any given value of X
Equal Variance (also called homoscedasticity)
  The probability distribution of the errors has constant variance
Residual Analysis
$$e_i = Y_i - \hat{Y}_i$$
The residual for observation i, ei, is the difference between its observed and predicted value
Check the assumptions of regression by examining the residuals
  Examine for linearity assumption
  Evaluate independence assumption
  Evaluate normal distribution assumption
  Examine for constant variance for all levels of X (homoscedasticity)
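A small sketch of the residual calculation for observations 2 to 4 of the house-price example. The predicted values come from the residual output that appears later in these slides. The observed prices are not listed in this text, so the values below are an assumption: they were inferred as predicted value plus residual and rounded to whole $1000s.

```python
def residuals(observed, predicted):
    """Return e_i = Y_i - Yhat_i for each observation."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


# Predicted prices ($1000s) for observations 2-4, from the residual output.
predicted = [273.87671, 284.85348, 304.06284]
# Assumed observed prices ($1000s), inferred as predicted + residual (not in the slides).
observed = [312, 279, 308]

for obs_number, e in zip([2, 3, 4], residuals(observed, predicted)):
    print(f"observation {obs_number}: residual = {e:.5f}")
```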
[Figure: plots of Y against x and of residuals against x for a relationship that is not linear and for one that is linear]
Residual Analysis for
Independence
Cyclical pattern: not independent
No cyclical pattern: independent
[Figure: residuals plotted against X, with and without a cyclical pattern]
[Figure: normal probability plot of the residuals; Percent (0 to 100) against Residual (-3 to 3)]
[Figure: plots of Y against x and of residuals against x]
Observation    Predicted value    Residual
2              273.87671           38.12329
3              284.85348           -5.853484
4              304.06284            3.937162
5              218.99284          -19.99284
6              268.38832          -49.38832
7              356.20251           48.79749
8              367.17929          -43.17929
9              254.6674            64.33264
[Figure: residuals (-60 to 60) plotted against a horizontal axis from 0 to 3000]
[Figure: residual plot panels: Percent against Residual, Residual against Fitted Value, a panel with Residual on the horizontal axis, and Residual against Observation Order]
The Durbin-Watson statistic D:
$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
D less than 2 may signal positive autocorrelation, D greater than 2 may signal negative autocorrelation
[Figure: number line for D marking 0, dL, dU, and 2]
Testing for Positive
Autocorrelation (continued)
Suppose we have the following time series data:
[Figure: Sales plotted against Time (0 to 30) with fitted line y = 30.65 + 4.7038x, R² = 0.8976]
Is there autocorrelation?
Testing for Positive Autocorrelation (continued)
Example with n = 25:
[Figure: Sales plotted against Time (0 to 30) with fitted line y = 30.65 + 4.7038x, R² = 0.8976]

Durbin-Watson Calculations
Sum of Squared Difference of Residuals    3296.18
Sum of Squared Residuals                  3279.98
Durbin-Watson Statistic                   1.00494

$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{3296.18}{3279.98} = 1.00494$$
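A minimal sketch of the Durbin-Watson calculation. The 25 individual residuals are not reproduced in these slides, so the first check only redoes the ratio of the two reported sums; the second call uses a short hypothetical residual sequence, invented here, to exercise the function.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Ratio of the two sums reported in the Durbin-Watson calculations.
d_slide = 3296.18 / 3279.98
print(f"D = {d_slide:.5f}")  # D = 1.00494

# Hypothetical residuals that alternate in sign (illustration only).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```

The alternating sequence gives a value above 2, in line with the statement that D greater than 2 may signal negative autocorrelation.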
$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$$
where:
  Sb1 = Estimate of the standard error of the slope
$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
H0: β1 = 0
H1: β1 ≠ 0
Test Statistic: tSTAT = 3.329
d.f. = 10 - 2 = 8
α/2 = .025 in each tail
Decision: Reject H0
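The test can be sketched in Python as follows. The slope and its standard error are the values used above; the critical value 2.3060 (8 degrees of freedom, α/2 = .025) is the one quoted for the t-test for a correlation coefficient later in these slides. The function name is made up for the illustration.

```python
def slope_t_stat(b1: float, s_b1: float, beta1_null: float = 0.0) -> float:
    """t statistic for H0: beta1 = beta1_null."""
    return (b1 - beta1_null) / s_b1


T_CRITICAL = 2.3060   # two-tail critical value, alpha = .05, 8 d.f. (from the slides)

t_stat = slope_t_stat(0.10977, 0.03297)
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t_STAT = {t_stat:.3f}, reject H0: {reject_h0}")  # t_STAT = 3.329, reject H0: True
```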
$$F_{STAT} = \frac{MSR}{MSE}$$
where
$$MSR = \frac{SSR}{k} \qquad MSE = \frac{SSE}{n-k-1}$$
where FSTAT follows an F distribution with k numerator and (n - k - 1) denominator degrees of freedom
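A short sketch of the F statistic for the house-price example, using SSR and SSE from the ANOVA output with n = 10. It assumes k = 1, which matches the 1 and 8 degrees of freedom in that output; the function name is made up for the illustration.

```python
def f_stat(ssr: float, sse: float, n: int, k: int = 1) -> float:
    """F_STAT = MSR / MSE with MSR = SSR / k and MSE = SSE / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("n must exceed k + 1")
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


f_value = f_stat(ssr=18934.9348, sse=13665.5652, n=10, k=1)
print(f"F_STAT = {f_value:.2f} with 1 and 8 degrees of freedom")
```

The result should agree with the 11.0848 reported in the ANOVA output up to rounding.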
Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error        41.33032
Observations          10

$$F_{STAT} = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$$
With 1 and 8 degrees of freedom; Significance F is the p-value for the F-Test

ANOVA
              df    SS            MS            F          Significance F
Regression     1    18934.9348    18934.9348    11.0848    0.01039
Residual       8    13665.5652     1708.1957
Total          9    32600.5000
$$b_1 \pm t_{\alpha/2} S_{b_1}$$
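As an illustration, the interval can be evaluated for the house-price slope. The sketch assumes a 95% confidence level with 8 degrees of freedom, so it reuses the critical value 2.3060 quoted elsewhere in these slides; the function name is made up.

```python
def slope_confidence_interval(b1: float, s_b1: float, t_crit: float):
    """Return (lower, upper) limits of b1 +/- t_crit * S_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


low, high = slope_confidence_interval(b1=0.10977, s_b1=0.03297, t_crit=2.3060)
print(f"({low:.4f}, {high:.4f})")  # (0.0337, 0.1858)
```

Because this interval does not contain 0, it is consistent with rejecting H0: β1 = 0 at the .05 level.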
Test statistic
$$t_{STAT} = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
(with n - 2 degrees of freedom)
where
$$r = +\sqrt{r^2} \ \text{if } b_1 > 0 \qquad r = -\sqrt{r^2} \ \text{if } b_1 < 0$$
t-test For A Correlation Coefficient (continued)
Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?
$$t_{STAT} = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.762 - 0}{\sqrt{\dfrac{1 - .762^2}{10 - 2}}} = 3.329$$
d.f. = 10 - 2 = 8
α/2 = .025 in each tail
[Figure: t distribution with "Reject H0" regions below -tα/2 = -2.3060 and above tα/2 = 2.3060 and "Do not reject H0" between them; tSTAT = 3.329 falls in the upper rejection region]
Decision: Reject H0
Conclusion: There is evidence of a linear association at the 5% level of significance
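The test can be reproduced with a few lines of Python. This sketch uses r = 0.76211, the "Multiple R" value from the regression statistics, rather than the rounded .762 shown in the formula; with the rounded value the statistic comes out slightly lower, at about 3.328. The function name is made up for the illustration.

```python
import math


def correlation_t_stat(r: float, n: int, rho: float = 0.0) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return (r - rho) / math.sqrt((1 - r ** 2) / (n - 2))


T_CRITICAL = 2.3060   # two-tail critical value, alpha = .05, 8 d.f. (from the slides)

t_stat = correlation_t_stat(r=0.76211, n=10)
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t_STAT = {t_stat:.3f}, reject H0: {reject_h0}")
```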
Estimating Mean Values and
Predicting Individual Values
Goal: Form intervals around Y to express uncertainty about the value of Y for a given Xi
[Figure: the prediction line Ŷ = b0 + b1Xi plotted against X, showing at a given Xi the confidence interval for the mean of Y and the prediction interval for an individual Y]
Confidence Interval for
the Average Y, Given X
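Confidence interval estimate for the mean value of Y given a particular Xi:
$$\hat{Y} \pm t_{\alpha/2} S_{YX} \sqrt{h_i}$$
Prediction interval estimate for an individual Y given a particular Xi:
$$\hat{Y} \pm t_{\alpha/2} S_{YX} \sqrt{1 + h_i}$$
where
$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}$$
For the house with 2000 square feet, the 95% confidence interval for the mean is
$$\hat{Y} \pm t_{0.025} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 37.12$$
and the 95% prediction interval for an individual house is
$$\hat{Y} \pm t_{0.025} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 102.28$$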
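The two half-widths can be approximated without the raw square-footage data, which are not reproduced in these slides. The sketch below assumes that the "SE Fit" value of 16.1 in the interval output equals S_YX times the square root of h_i, and backs h_i out from it. Because 16.1 is rounded to one decimal, the results should match the slide values only to about two decimals. All names are made up for the illustration.

```python
import math

T_CRIT = 2.3060    # t critical value, alpha/2 = .025, 8 d.f. (from the slides)
S_YX = 41.33032    # standard error of the estimate (from the slides)
SE_FIT = 16.1      # "SE Fit" from the interval output, rounded to one decimal
Y_HAT = 317.85     # predicted price ($1000s) at 2000 square feet


def interval_half_widths(t_crit: float, s_yx: float, h_i: float):
    """Half-widths of the confidence interval (mean Y) and prediction interval (individual Y)."""
    ci = t_crit * s_yx * math.sqrt(h_i)
    pi = t_crit * s_yx * math.sqrt(1 + h_i)
    return ci, pi


# Assumption: SE Fit = S_YX * sqrt(h_i), so h_i can be recovered from the output.
h_i = (SE_FIT / S_YX) ** 2

ci_half, pi_half = interval_half_widths(T_CRIT, S_YX, h_i)
print(f"confidence interval: {Y_HAT} +/- {ci_half:.2f}")
print(f"prediction interval: {Y_HAT} +/- {pi_half:.2f}")
```

The prediction interval is always the wider of the two, since it adds the variability of an individual Y around its mean to the uncertainty in the estimated mean.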
Check the “confidence and prediction interval for X=” box and enter the X-value and confidence level desired

New Obs    Fit      SE Fit    95% CI            95% PI
1          317.8    16.1      (280.7, 354.9)    (215.5, 420.1)