Copyright © 2014 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Learning Objectives
• Understand the goals of simple linear regression analysis
• Consider what the error term contains
• Define the population regression model and the sample regression function
• Estimate the sample regression function
• Interpret the estimated sample regression function
• Predict outcomes based on our estimated sample regression function
• Assess the goodness-of-fit of the estimated sample regression function
• Understand how to read regression output in Excel
• Understand the difference between correlation and causation
Understand the Goals of Simple
Linear Regression Analysis
Regression analysis is used to:
– Obtain the marginal effect that a one-unit change in the independent variable has on the dependent variable
– Predict the value of a dependent variable based on the value of the independent variable
Population Linear Regression Model
y = β₀ + β₁x + ε

where β₀ + β₁x is the linear component and ε is the random error component.
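A minimal sketch of this model in Python (all parameter values and data below are illustrative assumptions, not from the slides): data are simulated from an assumed population line, and a least-squares fit recovers the parameters.

```python
import numpy as np

# Simulate from an assumed population model y = b0 + b1*x + eps.
rng = np.random.default_rng(0)
beta0, beta1 = 5.0, 2.0            # hypothetical population parameters
x = rng.uniform(0, 10, size=500)   # independent variable
eps = rng.normal(0, 1, size=500)   # random error component
y = beta0 + beta1 * x + eps        # linear component + error

# With a large sample, least squares recovers the parameters closely.
b1_hat, b0_hat = np.polyfit(x, y, 1)  # slope first, then intercept
```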
Consider What the Random Error
Component, ε, Contains
• Omitted Variables – independent variables that are related to the dependent variable, y, but are not included in the regression model (i.e., they are omitted).
Consider What the Random Error
Component, ε, Contains
• Incorrect Functional Form – the wrong model is fit to
the data. For example, a linear function is fit
between y and x but the true relationship is
quadratic.
Estimated Regression Function
The sample regression line provides an estimate of the
population regression line
What is a Residual?
A residual is the difference between the observed value of y and the predicted value of y. It is an estimate of the error term, ε: the error term resides in the population, while the residual comes from the sample.

eᵢ = yᵢ − ŷᵢ

where yᵢ is the observed value and ŷᵢ is the predicted value.
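As a quick sketch (toy data, assumed values), residuals can be computed from a fitted line; with an intercept in the model, least-squares residuals sum to zero.

```python
import numpy as np

# Toy data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

b1, b0 = np.polyfit(x, y, 1)   # fitted slope and intercept
y_hat = b0 + b1 * x            # predicted values
residuals = y - y_hat          # e_i = y_i - yhat_i

# With an intercept, least-squares residuals sum to zero (up to rounding).
```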
Graph of the Sample Regression
Function
Graph of Predictions and Residuals
for Multiple Observations
Estimate the Sample Regression
Function
• β̂₀ and β̂₁ are obtained by minimizing the sum of the squared residuals with respect to β̂₀ and β̂₁:

min Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²
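A sketch of this minimization property on toy data (all values assumed): the least-squares pair of coefficients achieves a smaller sum of squared residuals than any perturbed pair.

```python
import numpy as np

def ssr(b0, b1, x, y):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    return np.sum((y - b0 - b1 * x) ** 2)

# Toy data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
b1_hat, b0_hat = np.polyfit(x, y, 1)

# The least-squares coefficients minimize SSR: perturbing them can only
# increase the sum of squared residuals.
best = ssr(b0_hat, b1_hat, x, y)
perturbed = min(ssr(b0_hat + d0, b1_hat + d1, x, y)
                for d0 in (-0.5, 0.5) for d1 in (-0.2, 0.2))
```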
The Least Squares Equation
• The formulas for β̂₁ and β̂₀ are:

β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)² = Cov(x, y) / Var(x)

and

β̂₀ = ȳ − β̂₁x̄
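The formulas above translate directly into code. This sketch (toy data, assumed values) cross-checks the deviation-from-mean formulas against NumPy's built-in degree-1 fit.

```python
import numpy as np

def ols_slope_intercept(x, y):
    """Least-squares estimates from the deviation-from-mean formulas."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Toy data (illustrative values only).
x = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
y = np.array([30.0, 42.0, 48.0, 61.0, 70.0, 79.0])
b0, b1 = ols_slope_intercept(x, y)

# Cross-check against NumPy's polynomial fit of degree 1.
b1_ref, b0_ref = np.polyfit(x, y, 1)
```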
Interpretation of the
Slope and the Intercept
Salary (y) vs. Education (x)
Example in salary.xls
Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = 743,000

Σᵢ₌₁ⁿ (xᵢ − x̄)² = 66

x̄ = 16 and ȳ = 58,800
Example continued
β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)² = 743,000 / 66 = 11,257.5758
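The slide's arithmetic can be replicated directly. The sums are taken from the example; the intercept below is implied by the example's means (β̂₀ = ȳ − β̂₁x̄) rather than stated explicitly on the slides.

```python
# Sums from the salary example.
sxy = 743_000        # sum of (x_i - xbar)(y_i - ybar)
sxx = 66             # sum of (x_i - xbar)^2, consistent with the slope below
x_bar, y_bar = 16, 58_800

b1 = sxy / sxx             # slope: 11,257.5758
b0 = y_bar - b1 * x_bar    # intercept implied by b0 = ybar - b1*xbar
```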
A Graphical Representation of the
Estimated Regression Line
[Scatter plot titled "Salary (Dollars) vs. Experience": salary (dollars) from 0 to 160,000 on the vertical axis, experience (years) from 10 to 22 on the horizontal axis, with the estimated regression line]
Using Excel to Compute the Estimated
Regression Equation in a Scatter Plot
• Create a scatter diagram in Excel
• Position the mouse over any data point and right
click
• Select Add Trendline option
• When the Add Trendline dialog box appears:
On the Type tab select Linear (it is the default)
On the Options tab select the Display equation
on chart box (note the equation is displayed with
the slope first and the intercept second)
Click OK
Interpret the Estimated Sample
Regression Function
Predict Outcomes Based on our
Estimated Sample Regression Function
Say we want to predict salary for a person with 12 years of education. We would substitute this value of x into the sample regression function:

ŷ = β̂₀ + β̂₁(12)
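As a check, the prediction for x = 12 can be computed with the estimates implied by the example's sums (the intercept is derived from the example's means, not stated on the slides):

```python
# Estimates implied by the salary example's sums: b1 = 743,000/66 and
# b0 = ybar - b1*xbar, with xbar = 16 and ybar = 58,800.
b1 = 743_000 / 66
b0 = 58_800 - b1 * 16

# Predicted salary for 12 years of education.
y_hat_12 = b0 + b1 * 12
```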
Assess the Goodness-of-Fit of the
Estimated Regression Function
Comparing the Goodness-of-Fit of
Two Hypothetical Data Sets
A Venn Diagram Demonstrating
Joint Variation between y and x
The Sample Regression Function
Explains None of the Variation in y
The Sample Regression Function
Explains All of the Variation in y
Explained and Unexplained
Variation
• Total variation in the dependent variable is made up of two parts: the variation explained by the regression and the variation left unexplained, so that TSS = ESS + USS.
Explained and Unexplained
Variation
[Figure: decomposition at xᵢ of the deviation of yᵢ from ȳ into explained and unexplained parts]

TSS = Σ(yᵢ − ȳ)²  (total sum of squares)
ESS = Σ(ŷᵢ − ȳ)²  (explained sum of squares)
USS = Σ(yᵢ − ŷᵢ)²  (unexplained sum of squares)
Coefficient of Determination, R²
Coefficient of determination:

R² = ESS / TSS = (sum of squares explained by the regression) / (total sum of squares)

R² = r²ₓᵧ

where:
R² = coefficient of determination
rₓᵧ = correlation coefficient between x and y
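A sketch on toy data (assumed values) confirming the two equivalent computations of R²:

```python
import numpy as np

# Toy data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2, 6.8])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
r2 = ess / tss                         # coefficient of determination

r_xy = np.corrcoef(x, y)[0, 1]         # sample correlation coefficient
# In simple linear regression, R^2 equals the squared correlation.
```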
How are the Correlation Coefficient
and the Coefficient of Determination
Related?
R² = r²ₓᵧ

rₓᵧ = (sign of β̂₁) · √R²
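A sketch with a downward-sloping toy data set (assumed values), showing that the sign of the slope recovers the sign of the correlation:

```python
import numpy as np

# Toy data with a negative relationship (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([9.8, 8.1, 6.2, 3.9, 2.1])

b1, b0 = np.polyfit(x, y, 1)      # slope is negative here
r_xy = np.corrcoef(x, y)[0, 1]    # correlation is negative too
r2 = r_xy ** 2

# Recover the correlation from R^2 and the sign of the slope.
r_recovered = np.sign(b1) * np.sqrt(r2)
```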
What is the Intuition Behind This
Relationship?
• In the case of a linear relationship between two variables, both the coefficient of determination and the sample correlation coefficient provide measures of the strength of the relationship.
• The coefficient of determination provides a measure
between 0 and 1 whereas the correlation coefficient
provides a measure between -1 and 1.
• The coefficient of determination can be used for nonlinear
relationships and for relationships that have two or more
independent variables.
• Why might the correlation coefficient be preferred to the
coefficient of determination?
Examples of Approximate
R2 Values
[Figure: scatter plot with all points on a straight line, R² = 1]

[Figure: scatter plot with no linear relationship between x and y, R² = 0]
What does R2 mean?
• R² means that R² × 100% of the variation in y is explained by x.
Calculating R² for the salary.xls example

R² = ESS / TSS = 8,364,378,788 / 13,125,600,000 = 0.6373

This says 63.73% of the variation in salary is explained by education.
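The arithmetic of this slide can be checked directly (sums of squares taken from the slides):

```python
# Sums of squares from the salary example.
ess = 8_364_378_788       # explained sum of squares
tss = 13_125_600_000      # total sum of squares

r2 = ess / tss            # coefficient of determination
uss = tss - ess           # unexplained sum of squares
```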
Using Excel to Compute the
Coefficient of Determination
• Position the mouse pointer over any data
point in the scatter diagram and right click to
display the chart menu.
• Select Add Trendline option
• When the Add Trendline dialog box appears: On the Options tab select the Display R-squared value on chart box and click OK.
The Standard Error of the Estimated
Sample Regression Function
The standard error of the regression function
measures, on average, how far the points fall
away from the regression line.
s_{y|x} = √( Unexplained SS / (n − k − 1) ) = √( Σ(yᵢ − ŷᵢ)² / (n − k − 1) )

where k = the number of explanatory variables. In simple linear regression, k = 1.
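This formula translates directly into code; a sketch on toy data (assumed values):

```python
import numpy as np

def std_error_of_regression(x, y):
    """s_{y|x} = sqrt(sum((y - yhat)^2) / (n - k - 1)), with k = 1 predictor."""
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    n, k = len(y), 1
    return np.sqrt(np.sum(resid ** 2) / (n - k - 1))

# Toy data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.2, 5.7, 8.3, 9.6, 12.1])
s = std_error_of_regression(x, y)
# For a close fit, s is much smaller than the standard deviation of y.
```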
Calculation of the Standard Error for
the salary.xls Example
Reading Regression Output in Excel: R2
R² = ESS / TSS = 8,364,378,788 / 13,125,600,000 = 0.6373

63.73% of the variation in salary is explained by the variation in education.

Sums of squares from the Excel output:
Explained: 8,364,378,788
Unexplained: 4,761,221,212
Total: 13,125,600,000
Reading Regression Output in Excel:
Standard Error
s_{y|x} = √( USS / (n − k − 1) ) = √( 4,761,221,212 / 8 ) = 24,395.75
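A quick check of this slide's arithmetic (values taken from the slides; n = 10 is implied by n − k − 1 = 8):

```python
import math

# Unexplained sum of squares from the salary example.
uss = 4_761_221_212
n, k = 10, 1                           # n implied by n - k - 1 = 8

s_y_x = math.sqrt(uss / (n - k - 1))   # standard error of the regression
```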
Excel’s Regression Tool
• Select the Tools menu
• Choose the Data Analysis option
• Choose Regression from the list of Analysis Tools
• Input y into the Input Y Range
• Input x into the Input X Range
• Select Labels
• Select Output Range in the sheet
• Click OK
Understand the Difference between
Correlation and Causation
Correlation exists when there is a linear relationship between two random variables.

Causation occurs between two random variables when changes in one variable (say, x) cause changes in another variable (say, y).

Spurious correlation occurs when correlation between two random variables results from their common relationship with a third random variable.
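A sketch of spurious correlation (all variable names and values are illustrative assumptions): a third variable z drives both x and y, so x and y are correlated even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(1)

# z is the lurking third variable (e.g., summer temperature).
z = rng.normal(size=2000)
x = z + rng.normal(scale=0.5, size=2000)  # driven by z, not by y
y = z + rng.normal(scale=0.5, size=2000)  # driven by z, not by x

# x and y are strongly correlated despite no causal link between them.
r_xy = np.corrcoef(x, y)[0, 1]
```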
Understand the Difference
between Correlation and Causation
Just because there is correlation between two random variables does not mean there is causation.
Examples:
More firemen at a fire is linked to greater monetary damage from the fire.
The number of shark attacks and ice cream sales are positively related.
Students who are tutored tend to get worse grades than students who are not tutored.
See Google Correlate for more real-world examples of this phenomenon.