SESSION 10*
SIMPLE LINEAR
REGRESSION
WMY Chapter 9
Parts 1-5
(Chapter 11 of Notes)
Learning Objectives
Simple Linear Regression Model
Least Squares Method of Estimation
Measuring Goodness of Fit
Inference on Regression Coefficients
Predicting with the Model
SMU Classification: Restricted
Introduction
We are interested in the relationship between two numerical
variables X and Y.
• One of these variables, say X, is known in advance, called
the explanatory variable, or independent variable.
• The other variable, Y, is a random variable and its values or
its general random behavior is of interest. For this, Y is
called the response variable, or dependent variable.
• If there is a strong relationship between X and Y, one can
predict a future value of Y from the known future value of X
through this relationship.
• To study the relation, n pairs of observations on (X, Y) are
collected, denoted as (x1, y1), (x2, y2), . . . , (xn, yn).
• The Least Squares Method helps to find such a relation.
Example 1
Prices of used cars and the odometer readings.
• A car dealer wants to find the relationship between the
odometer reading and the selling price of used cars.
• A random sample of 100 cars is selected, and the data
recorded.
• Construct a scatter plot of the data.

First 6 of the 100 observations:

Car   Odometer (X)   Price (Y)
1     37388          14636
2     44758          14122
3     45833          14016
4     30862          15590
5     31705          15568
6     34010          14718
. . . . . .          . . .
Summary Statistics
Besides the graphical display of the data, some numerical
measures can be used to measure the direction and
strength of the linear relationship between two variables.
• Sample means: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
• Sample variances: $s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ and $s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$
• Sample covariance: $\mathrm{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
• Sample correlation coefficient: $r = \frac{\mathrm{Cov}(X,Y)}{s_X s_Y}$
Shortcut formulas:
$\mathrm{Cov}(X,Y) = \frac{1}{n-1}\left[\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}\right]$
$s_X^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]$ ; $s_Y^2 = \frac{1}{n-1}\left[\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right]$
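The summary statistics can be sketched in a few lines of Python. The sketch below uses only the six cars listed in Example 1, so its numbers are illustrative and will not match the full 100-car sample. The function names `summary_statistics` and `shortcut_statistics` are hypothetical helpers, not part of the course material.

```python
from math import sqrt

# First six (odometer, price) pairs listed in Example 1.
# The full sample has 100 cars, so these results are illustrative only.
odometer = [37388, 44758, 45833, 30862, 31705, 34010]
price = [14636, 14122, 14016, 15590, 15568, 14718]


def summary_statistics(x, y):
    """Means, sample variances, sample covariance and correlation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    r = cov_xy / sqrt(var_x * var_y)
    return {"x_bar": x_bar, "y_bar": y_bar, "var_x": var_x,
            "var_y": var_y, "cov_xy": cov_xy, "r": r}


def shortcut_statistics(x, y):
    """Same covariance and variances via the shortcut formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    cov_xy = (sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n) / (n - 1)
    var_x = (sum(xi ** 2 for xi in x) - sum_x ** 2 / n) / (n - 1)
    var_y = (sum(yi ** 2 for yi in y) - sum_y ** 2 / n) / (n - 1)
    return cov_xy, var_x, var_y


stats = summary_statistics(odometer, price)
short_cov, short_var_x, short_var_y = shortcut_statistics(odometer, price)
print(stats)
print(short_cov, short_var_x, short_var_y)
```

The definitional and shortcut formulas should agree up to floating-point rounding; the shortcut form is mainly convenient for hand calculation.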
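Example 1 (continued)
Continuing Example 1, find the five summary statistics
and comment on the linear relationship between
price and odometer reading.
Solution: The values used in the later parts of this example are
$\bar{x} = 36{,}009.45$, $\bar{y} = 14{,}822.82$, $s_X^2 = 43{,}528{,}690$, $s_Y^2 = 259{,}996$, $\mathrm{Cov}(X,Y) = -2{,}712{,}511$.
The covariance is negative, so price tends to fall as the odometer
reading rises; these values give $r = \mathrm{Cov}(X,Y)/(s_X s_Y) \approx -0.806$,
a fairly strong negative linear relationship.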
Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$
Y = dependent variable
x = independent variable
β0 = intercept parameter
β1 = slope parameter (rise/run)
ϵ = random error/random disturbance
It has 2 parts. The 1st part is the straight line given by
β0 + β1 xi; the 2nd part is the random error ϵi.
[Figure: a line with intercept β0 and slope β1. At a given xi, the observed value of Y differs from the predicted value of Y on the line by the random error ϵi.]
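i.e. $Y_1 = \beta_0 + \beta_1 x_1 + \epsilon_1$, $Y_2 = \beta_0 + \beta_1 x_2 + \epsilon_2$,
and so on.
Note that the intercept and slope do not change from one observation to the next.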
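A small simulation can make the two parts of the model concrete. This is only a sketch: the parameter values, the normal distribution for the errors, and the function name `simulate_responses` are all assumptions chosen for illustration and are not given in the slides.

```python
import random


def simulate_responses(x_values, beta0, beta1, sigma, seed=0):
    """Generate Y_i = beta0 + beta1 * x_i + eps_i for each x_i.

    Assumption: errors are drawn independently from Normal(0, sigma).
    """
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sigma) for _ in x_values]
    # The same intercept and slope are used for every observation;
    # only the random error differs from one observation to the next.
    responses = [beta0 + beta1 * x + e for x, e in zip(x_values, errors)]
    return responses, errors


# Hypothetical values, loosely in the range of the used-car example.
xs = [30000, 35000, 40000, 45000]
ys, eps = simulate_responses(xs, beta0=17000.0, beta1=-0.06, sigma=300.0)
for x, y, e in zip(xs, ys, eps):
    print(x, round(y, 1), round(e, 1))
```

Each simulated response is the point on the line at x plus its own error, which is exactly the decomposition the model describes.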
The error (residual) for the i-th observation is the observed value minus the fitted value:
$e_i = y_i - \hat{y}_i$
[Figure: scatter plots of Y against X with candidate lines and the errors between the points and each line.]
There is a line that minimizes the sum of squared errors,
and in this sense it is the best line.
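The least squares estimates of the slope and intercept are
$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{Cov}(X,Y)}{s_X^2}$
$b_0 = \bar{y} - b_1 \bar{x}$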
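The slope and intercept formulas translate directly into code. The sketch below fits a line to the six cars listed in Example 1 (not the full sample of 100, so the coefficients will differ from the ones reported for the example) and checks numerically that nudging either coefficient does not reduce the sum of squared errors. The helper names `least_squares` and `sum_squared_errors` are hypothetical.

```python
# First six (odometer, price) pairs listed in Example 1 (illustrative subset).
odometer = [37388, 44758, 45833, 30862, 31705, 34010]
price = [14636, 14122, 14016, 15590, 15568, 14718]


def least_squares(x, y):
    """Return (b0, b1) with b1 = Cov(X, Y) / s_X^2 and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # The 1/(n-1) factors in Cov(X, Y) and s_X^2 cancel in the ratio.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_errors(x, y, b0, b1):
    """Sum of squared errors e_i = y_i - (b0 + b1 * x_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares(odometer, price)
best = sum_squared_errors(odometer, price, b0, b1)
print(b0, b1, best)

# Moving either coefficient away from the estimate should not lower the SSE.
for d0, d1 in [(50, 0), (-50, 0), (0, 0.01), (0, -0.01)]:
    assert sum_squared_errors(odometer, price, b0 + d0, b1 + d1) >= best
```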
Example 1 (continued)
Continuing Example 1, find the least squares line
relating odometer reading to the price of the used car.
Solution: The estimated coefficients are
$b_1 = \frac{\mathrm{Cov}(X,Y)}{s_X^2} = \frac{-2{,}712{,}511}{43{,}528{,}690} = -0.06232$
$b_0 = \bar{y} - b_1 \bar{x} = 14{,}822.82 - (-0.06232)(36{,}009.45) = 17{,}067$
$\hat{y} = 17{,}067 - 0.0623x$
[Figure: scatter plot of Price (13000 to 16000) against Odometer with the fitted line; there are no data near Odometer = 0.]
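The arithmetic in this solution can be checked from the reported summary statistics. The sketch assumes the covariance is negative (price falls as the odometer reading rises, which is the only sign consistent with an intercept of about 17,067). The `predict` helper and the odometer reading of 40,000 are hypothetical illustrations.

```python
# Summary statistics reported for the 100-car sample in Example 1.
cov_xy = -2_712_511      # sample covariance (sign assumed negative, see lead-in)
var_x = 43_528_690       # sample variance of odometer readings
x_bar = 36_009.45        # mean odometer reading
y_bar = 14_822.82        # mean price

b1 = cov_xy / var_x              # slope
b0 = y_bar - b1 * x_bar          # intercept


def predict(odometer_reading):
    """Fitted price on the least squares line for a given odometer reading."""
    return b0 + b1 * odometer_reading


print(round(b1, 5), round(b0))
print(predict(40_000))   # hypothetical reading inside the observed range
```

Because no cars in the sample have readings near zero, the intercept is best read as a fitting constant rather than the price of a car with an odometer reading of 0.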
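Coefficient of Determination R²
Three sums of squares are involved:
Sum of squares for error: $SSE = \sum (y_i - \hat{y}_i)^2$
Total sum of squares: $SST = \sum (y_i - \bar{y})^2$
Sum of squares for regression: $SSR = \sum (\hat{y}_i - \bar{y})^2$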
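Coefficient of Determination R² (continued)
To understand the significance of the coefficient of
determination, note:
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
That is, the total variation in Y splits into the variation explained by the
fitted line and the unexplained variation (SSE).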
Coefficient of Determination R² (continued)
R² is a measure of the strength of the linear relationship
between the response Y and the explanatory variable(s) X,
and is defined as
$R^2 = 1 - \frac{SSE}{\sum (y_i - \bar{y})^2} = 1 - \frac{SSE}{SST}$ or $R^2 = \frac{[\mathrm{Cov}(X,Y)]^2}{s_X^2 s_Y^2}$
where $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$.
• The first definition is a general one and applies to linear
regression models with multiple predictors.
• It simplifies to the second definition when there is only one
predictor X.
• In the case of simple linear regression, R² is also the square
of the sample correlation coefficient r.
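The two definitions of R² and the sum-of-squares decomposition can be checked numerically. The sketch below again uses only the six cars listed in Example 1, so the value of R² is illustrative and is not the value for the full sample. The function name `fit_and_r_squared` is a hypothetical helper.

```python
# First six (odometer, price) pairs listed in Example 1 (illustrative subset).
odometer = [37388, 44758, 45833, 30862, 31705, 34010]
price = [14636, 14122, 14016, 15590, 15568, 14718]


def fit_and_r_squared(x, y):
    """Fit the least squares line and compute R^2 by both definitions."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
    r2_from_sse = 1 - sse / sst
    # Cov^2 / (s_X^2 * s_Y^2): the 1/(n-1) factors cancel.
    r2_from_cov = sxy ** 2 / (sxx * sst)
    return {"sst": sst, "ssr": ssr, "sse": sse,
            "r2_from_sse": r2_from_sse, "r2_from_cov": r2_from_cov}


result = fit_and_r_squared(odometer, price)
print(result)
```

Both R² values should agree up to rounding, and `sst` should equal `ssr + sse`, which is the decomposition that motivates reading R² as the proportion of variation in Y explained by the line.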
The sum of squares for error (SSE) can serve as a measure of how well the line fits the data.
SSE is defined by
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
A shortcut formula:
$SSE = (n-1)\left[s_Y^2 - \frac{[\mathrm{Cov}(X,Y)]^2}{s_X^2}\right]$
The standard deviation of the random error ϵ is estimated by
$s_\epsilon = \sqrt{\frac{SSE}{n-2}}$
Example 1 (continued)
Calculate the estimate of the error standard deviation and the
coefficient of determination for Example 1, and describe
what they tell you about the model fit.
Solution:
$s_Y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} = 259{,}996$ (calculated earlier)
$SSE = (n-1)\left[s_Y^2 - \frac{[\mathrm{Cov}(X,Y)]^2}{s_X^2}\right] = 99\left[259{,}996 - \frac{(-2{,}712{,}511)^2}{43{,}528{,}690}\right] = 9{,}005{,}450$
$s_\epsilon = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{9{,}005{,}450}{98}} = 303.13$
It is hard to assess the model based on $s_\epsilon$ even when compared with the
mean value of Y: $s_\epsilon = 303.1$, $\bar{y} = 14{,}823$.
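These figures can be reproduced from the summary statistics. In the sketch below the inputs are the rounded values reported in the example, so the computed SSE may differ slightly from 9,005,450. The R² line applies the definition 1 - SSE/SST with SST = (n-1)·s_Y², and the variable names are illustrative.

```python
from math import sqrt

# Summary statistics reported for the 100-car sample in Example 1.
n = 100
var_y = 259_996          # sample variance of price
var_x = 43_528_690       # sample variance of odometer readings
cov_xy = -2_712_511      # sample covariance (the sign does not matter once squared)

# Shortcut formula for the sum of squares for error.
sse = (n - 1) * (var_y - cov_xy ** 2 / var_x)

# Estimate of the error standard deviation.
s_eps = sqrt(sse / (n - 2))

# Coefficient of determination, using SST = (n - 1) * s_Y^2.
sst = (n - 1) * var_y
r_squared = 1 - sse / sst

print(round(sse), round(s_eps, 2), round(r_squared, 4))
```

An R² of roughly 0.65 says that about 65% of the variation in price is explained by the odometer reading, which is easier to interpret than the standard deviation estimate on its own.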
Example 1 (continued)
Test to determine whether there is enough evidence to infer
that there is a linear relationship between the car auction
price and the odometer reading for all three-year-old cars.
Use α = 5%.
Example 1 (continued)
A 95% CI for β1:
$-0.0623 \pm 1.984 \times 0.00462 = \{-0.0715, -0.0531\}$
With ν = n − 2 = 98 degrees of freedom, the rejection region is
$t > t_{0.025,\,98}$
or
$t < -t_{0.025,\,98}$
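The interval and the test decision can be checked with a short calculation. The sketch takes 0.00462 to be the standard error of b1 and 1.984 to be the critical value t(0.025, 98), as the interval above suggests. It also assumes the usual test statistic t = b1 / s_b1 for H0: β1 = 0, which does not appear in the extracted slides.

```python
# Values taken from the example.
b1 = -0.0623          # estimated slope
se_b1 = 0.00462       # standard error of b1 (as used in the interval above)
t_crit = 1.984        # t critical value for alpha/2 = 0.025 with 98 df

# 95% confidence interval for the slope.
margin = t_crit * se_b1
ci_lower = b1 - margin
ci_upper = b1 + margin

# Assumed test statistic for H0: beta1 = 0 against a two-sided alternative.
t_stat = b1 / se_b1
reject_h0 = abs(t_stat) > t_crit

print(round(ci_lower, 4), round(ci_upper, 4))
print(round(t_stat, 2), reject_h0)
```

The interval lies entirely below zero and the test statistic falls in the rejection region, so both approaches point to the same conclusion at the 5% level: there is evidence of a linear relationship between price and odometer reading.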