
SMU Classification: Restricted

SESSION 10*
SIMPLE LINEAR REGRESSION
WMY Chapter 9, Parts 1-5
(Chapter 11 of Notes)

*Most Slides from Prof Yang Zhenlin



Learning Objectives
• Simple Linear Regression Model
• Least Squares Method of Estimation
• Measuring Goodness of Fit
• Inference on Regression Coefficients
• Predicting with the Model


Introduction
We are interested in the relationship between two numerical
variables X and Y.
• One of these variables, say X, is known in advance, called
the explanatory variable, or independent variable.
• The other variable, Y, is a random variable whose values, or general random behavior, are of interest. For this reason, Y is called the response variable, or dependent variable.
• If there is a strong relationship between X and Y, one can predict a future random variable Y, based on the known future value of X, through such a “relationship”.
• To study the relation, n pairs of observations on (X, Y) are collected, denoted (x1, y1), (x2, y2), . . . , (xn, yn).
• The Least Squares Method helps find such a relation.

Describing the Relationship

Scatter diagram: a plot of the pairs of observed values (x1, y1), (x2, y2), . . . , (xn, yn) of variables X and Y. It is a very effective graphical tool for “revealing” the relationship between variables.

[Figure: scatter diagram of Y against X]

Example 1
Prices of used cars and their odometer readings.
• A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
• A random sample of 100 cars is selected, and the data recorded.
• Construct a scatter plot of the data.

Car   Odometer (X)   Price (Y)
1     37388          14636
2     44758          14122
3     45833          14016
4     30862          15590
5     31705          15568
6     34010          14718
...   ...            ...

(Only the first six rows of the full 100-car data set are shown.)

Example 1 (continued)

[Figure: scatter plot of Price against Odometer]

The plot indeed shows a negative linear relation between the price and the odometer reading.

Summary Statistics
Besides the graphical display of the data, some numerical measures can be used to describe the direction and strength of the linear relationship between two variables.

• Sample means: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

• Sample variances: $s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ and $s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$

• Sample covariance: $\mathrm{Cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

• Sample correlation coefficient: $r = \dfrac{\mathrm{Cov}(X, Y)}{s_X s_Y}$

This is called the ‘five statistics summary’ of the data.
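As a quick illustration, the five statistics can be computed directly in a few lines. The sketch below uses small made-up numbers (not the textbook's car data, which is not reproduced here) and, assuming NumPy is available, cross-checks against NumPy's built-in estimators, which also use the n − 1 divisor.

```python
import numpy as np

# Hypothetical sample for illustration only.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([5.0, 4.0, 3.5, 2.0, 1.0])
n = len(x)

x_bar, y_bar = x.mean(), y.mean()                     # sample means
s2_x = ((x - x_bar) ** 2).sum() / (n - 1)             # sample variance of X
s2_y = ((y - y_bar) ** 2).sum() / (n - 1)             # sample variance of Y
cov_xy = ((x - x_bar) * (y - y_bar)).sum() / (n - 1)  # sample covariance
r = cov_xy / np.sqrt(s2_x * s2_y)                     # sample correlation

# Cross-check against NumPy's built-ins (ddof=1 gives the n-1 divisor).
assert np.isclose(s2_x, x.var(ddof=1))
assert np.isclose(cov_xy, np.cov(x, y, ddof=1)[0, 1])
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

Here the covariance and correlation come out negative, matching the downward trend of the made-up data.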



Formulas for Covariance

$\mathrm{Cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

Shortcut formulas:

$\mathrm{Cov}(X, Y) = \frac{1}{n-1}\left[\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}\right]$

$s_X^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]; \quad s_Y^2 = \frac{1}{n-1}\left[\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right]$

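A minimal numerical check that the shortcut forms agree with the definitional forms, on arbitrary made-up values (assuming NumPy is available):

```python
import numpy as np

# Arbitrary illustrative sample.
x = np.array([1.0, 3.0, 4.0, 7.0, 9.0, 12.0])
y = np.array([2.0, 3.0, 6.0, 8.0, 9.0, 14.0])
n = len(x)

# Definitional forms (deviations from the means).
cov_def = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
s2x_def = ((x - x.mean()) ** 2).sum() / (n - 1)

# Shortcut forms (raw sums only, no deviations needed).
cov_short = (np.sum(x * y) - x.sum() * y.sum() / n) / (n - 1)
s2x_short = (np.sum(x ** 2) - x.sum() ** 2 / n) / (n - 1)

assert np.isclose(cov_def, cov_short)
assert np.isclose(s2x_def, s2x_short)
```

The shortcut forms are convenient when only running totals of $x_i$, $y_i$, $x_i^2$, $y_i^2$, and $x_i y_i$ are kept, since no second pass over the data is needed.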

Example 1 (continued)
Continuing with Example 1, find the five statistics summary and comment on the linear relationship between price and odometer reading.

Solution:

$\bar{x} = 36{,}009.45; \quad s_X^2 = 43{,}528{,}690; \quad s_Y^2 = 259{,}996$

$\bar{y} = 14{,}822.823; \quad \mathrm{Cov}(X, Y) = -2{,}712{,}511; \quad r = -0.8063$

As r = -0.8063, there exists a strong negative linear relation between price and odometer reading.


Sample Coefficient of Correlation

• r near +1: strong positive linear relationship; the scatter diagram shows a clear upward trend. Cov(X, Y) > 0.
• r = 0: no linear relationship; the scatter diagram shows either no pattern, or a non-linear pattern. Cov(X, Y) = 0.
• r near -1: strong negative linear relationship; the scatter diagram shows a clear downward trend. Cov(X, Y) < 0.

Simple Linear Regression Model

The simple linear regression model takes the form:

$Y = \beta_0 + \beta_1 x + \epsilon$

where
  Y = dependent variable
  x = independent variable
  $\beta_0$ = intercept parameter
  $\beta_1$ = slope parameter
  $\epsilon$ = random error / random disturbance

[Figure: a straight line with intercept $\beta_0$ and slope $\beta_1$ = Rise/Run]

It has 2 parts. The 1st part is the straight line given by $\beta_0 + \beta_1 x$.

$\beta_0$ and $\beta_1$ are unknown population parameters, and therefore need to be estimated from the data.

Simple Linear Regression Model

Since the actual y values do not fall on the straight line, we add the 2nd part of the model, which is the error term $\epsilon$.
$\epsilon$ is a random variable assumed to be normally distributed with $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$.

$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

[Figure: for a given $x_i$, the observed value of Y differs from the predicted value on the line (intercept $\beta_0$, slope $\beta_1$) by the random error $\epsilon_i$]

Simple Linear Regression Model

To estimate the parameters $\beta_0$ and $\beta_1$, a random sample of n experimental units is selected, and the values of (X, Y) for each unit are observed to give (x1, y1), (x2, y2), . . . , (xn, yn). These n pairs of observations satisfy:

$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$

i.e.
$Y_1 = \beta_0 + \beta_1 x_1 + \epsilon_1$
$Y_2 = \beta_0 + \beta_1 x_2 + \epsilon_2$
and so on.
Note there is no change to the intercept and slope.

Simple Linear Regression Model

Y is a random variable since $\epsilon$ is one. Due to the random sampling mechanism, the {$Y_i$} must be independent, and so are the {$\epsilon_i$}. Further, it is reasonable to assume that
$E(\epsilon_i) = 0, \quad i = 1, 2, \ldots, n.$
Thus, $E(Y_i) = \beta_0 + \beta_1 x_i, \quad i = 1, 2, \ldots, n.$

[Figure: as before — the observed value of Y for $x_i$ differs from the predicted value on the line (intercept $\beta_0$, slope $\beta_1$) by the random error $\epsilon_i$]

Learning Objectives
• Simple Linear Regression Model
• Least Squares Method of Estimation
• Measuring Goodness of Fit
• Inference about Regression Coefficients
• Predicting with the Model


Least Squares Estimation

Based on the observed data, we are seeking a line that best fits the data when two variables are related to one another:

$\hat{y} = b_0 + b_1 x, \qquad e_i = y_i - \hat{y}_i$

We define the “best fit line” as the line for which the sum of squared differences between it and the data points is minimized.

Least Squares Estimation

Different lines generate different errors, and thus different sums of squared errors:

$SSE = \sum_{i=1}^{n} e_i^2$

[Figure: two scatter plots with different candidate lines; each line leaves different errors between the points and the line]

There is a line that minimizes the sum of squared errors, and in this sense it is the best line.

Least Squares Estimation

Let $\hat{y} = b_0 + b_1 x$ be a fitted line. Finding the best line that minimizes the sum of squared errors (SSE) is equivalent to finding the intercept $b_0$ and the slope $b_1$ which minimize

$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

where $y_i$ is the actual Y value of point i, and $\hat{y}_i = b_0 + b_1 x_i$ is the value of point i calculated from the equation. That is, to minimize

$SSE = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

Least Squares Estimators

$b_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \dfrac{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \dfrac{\mathrm{Cov}(X, Y)}{s_X^2}$

$b_0 = \bar{y} - b_1 \bar{x}$

This gives the least squares equation: $\hat{y} = b_0 + b_1 x$
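Plugging the Example 1 summary statistics quoted on these slides into the estimator formulas reproduces the coefficient estimates directly. This is a sketch using only the summary figures, since the raw 100-car data set is not reproduced here.

```python
# Summary statistics for Example 1 as reported on the slides.
x_bar, y_bar = 36_009.45, 14_822.82
s2_x, s2_y = 43_528_690.0, 259_996.0
cov_xy = -2_712_511.0

b1 = cov_xy / s2_x        # slope estimate: Cov(X, Y) / s_X^2
b0 = y_bar - b1 * x_bar   # intercept estimate: y-bar - b1 * x-bar
r = cov_xy / (s2_x * s2_y) ** 0.5   # sample correlation, for reference
```

This reproduces b1 ≈ -0.0623, b0 ≈ 17,067, and r ≈ -0.8063 from the slides.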


Example 1 (continued)
Continuing with Example 1, find the least squares line relating odometer reading to the price of the used car.
Solution: The estimated coefficients are

$b_1 = \dfrac{\mathrm{Cov}(X, Y)}{s_X^2} = \dfrac{-2{,}712{,}511}{43{,}528{,}690} = -0.06232$

$b_0 = \bar{y} - b_1 \bar{x} = 14{,}822.82 - (-0.06232)(36{,}009.45) = 17{,}067$

The least squares equation is

$\hat{y} = b_0 + b_1 x = 17{,}067 - 0.0623x$

Interpretation of $b_1 = -0.0623$: for one additional mile on the odometer, it is estimated that the average price of the cars decreases by $0.0623.

Interpreting the Linear Regression Equation

$\hat{y} = 17{,}067 - 0.0623x$

• The intercept is estimated as $17,067. Do not interpret the intercept as the “price of cars that have not been driven”!
• The estimated slope of the line is -0.0623: for each additional mile on the odometer, the price decreases by an average of $0.0623.

Interpreting the Linear Regression Equation

[Figure: Odometer Line Fit Plot — the fitted line meets the price axis at $17,067, but there are no data near an odometer reading of 0, so the intercept should not be interpreted there.]

Properties of the Least Squares Estimators
For the simple linear regression model:

$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$

where the {$\epsilon_i$} are independent with $E(\epsilon_i) = 0$, the least squares estimators $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$.
Under certain assumptions of the model, the least squares estimators are the best linear unbiased estimators (BLUE).

Learning Objectives
• Simple Linear Regression Model
• Least Squares Method of Estimation
• Measuring Goodness of Fit
• Inference about Regression Coefficients
• Predicting with the Model


Coefficient of Determination R2

[Figure: scatter plot with fitted line, decomposing the deviations:
  $\sum (y_i - \hat{y}_i)^2$ — deviation of each point from the fitted line (SSE),
  $\sum (y_i - \bar{y})^2$ — total deviation from the mean (SST),
  $\sum (\hat{y}_i - \bar{y})^2$ — deviation of each fitted value from the mean (SSR)]

Coefficient of Determination R2
To understand the significance of the coefficient of determination, note:

$\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE}$

SST: Total corrected sum of squares. Represents the variation in the response values that ideally would be explained by the model.
SSR: Regression sum of squares. Reflects the amount of variation in the y-values explained by the model.
SSE: Error sum of squares. Is the variation in response due to the error, or variation unexplained.
It follows that R2 = 1 - SSE/SST = SSR/SST.
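The decomposition can be checked numerically. The sketch below, assuming NumPy is available, fits a least-squares line to arbitrary made-up data with np.polyfit and confirms SST = SSR + SSE; the identity holds precisely because the fitted values come from the least-squares fit.

```python
import numpy as np

# Arbitrary illustrative data (not from the textbook).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

b1, b0 = np.polyfit(x, y, 1)   # least-squares slope and intercept
y_hat = b0 + b1 * x            # fitted values

sst = np.sum((y - y.mean()) ** 2)      # total corrected sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares

r2 = ssr / sst   # coefficient of determination
```

For simple linear regression, r2 here also equals the squared sample correlation between x and y.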

Coefficient of Determination R2
It is a measure of the strength of the linear relationship between the response Y and the explanatory variable(s) X, and is defined as

$R^2 = 1 - \dfrac{SSE}{\sum_i (y_i - \bar{y})^2} \quad \text{or} \quad R^2 = \dfrac{\mathrm{Cov}(X, Y)^2}{s_X^2 s_Y^2}$

where $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$.

• The first definition is a general one and applies to linear regression models with multiple predictors.
• It simplifies to the second definition when there is only one predictor X.
• In the case of simple linear regression, R2 is also the square of the sample correlation coefficient r.

Coefficient of Determination R2

• R2 measures the proportion of variability in the response Y explained by the variation in X, or by the fitted model.
• R2 takes on any value between zero and one.
  R2 = 1: fit is perfect; there is a perfect match between the line and the data points.
  R2 = 0: there is no linear relationship between X and Y.

Sum of Squares due to Errors (SSE)

This is the sum of squared differences between the points and the regression line. It can serve as a measure of how well the line fits the data. SSE is defined by

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

A shortcut formula:

$SSE = (n - 1)\left[ s_Y^2 - \dfrac{\mathrm{Cov}(X, Y)^2}{s_X^2} \right]$
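Applying the shortcut to the Example 1 summary statistics quoted earlier on these slides gives the SSE and the estimated error standard deviation in one step. This is a sketch from the summary figures only; the small difference from the slide's value of 9,005,450 comes from rounding in the quoted covariance.

```python
import math

# Example 1 summary statistics as reported on the slides.
n = 100
s2_y = 259_996.0          # sample variance of price
s2_x = 43_528_690.0       # sample variance of odometer reading
cov_xy = -2_712_511.0     # sample covariance

sse = (n - 1) * (s2_y - cov_xy**2 / s2_x)   # shortcut SSE
s_e = math.sqrt(sse / (n - 2))              # estimated error standard deviation
```

This reproduces SSE ≈ 9,005,450 and s_e ≈ 303.1 from the slides.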

Example 1 (continued)

$SST = (n - 1) s_Y^2 = 99 \times 259{,}996$

$R^2 = 1 - SSE/SST = 1 - 9{,}005{,}450/(99 \times 259{,}996) = 0.6501$

65% of the variation in the auction selling price is explained by the variation in odometer reading. The rest (35%) remains unexplained by this model.

Learning Objectives
• Simple Linear Regression Model
• Least Squares Method of Estimation
• Measuring Goodness of Fit
• Inference about Regression Coefficients
• Predicting with the Model


Estimate of Error Standard Deviation

• The mean error is equal to zero. If the error standard deviation $\sigma_\epsilon$ is small, the errors tend to be close to zero (close to the mean error), and the model fits the data well.
• Therefore, we can also use $\sigma_\epsilon$ as a measure of the suitability of using a linear model.
• However, $\sigma_\epsilon$ is unknown and has to be estimated.

Estimate of the error standard deviation:

$s_e = \sqrt{\dfrac{SSE}{n-2}}, \qquad s_e^2 = \dfrac{\sum_i (y_i - \hat{y}_i)^2}{n-2}$

Compare with $s_Y^2 = \dfrac{\sum_i (y_i - \bar{y})^2}{n-1}$.

Example 1 (continued)
Calculate the estimate of the error standard deviation and the coefficient of determination for Example 1, and describe what they tell you about the model fit.
Solution:

$s_Y^2 = \dfrac{\sum_i (y_i - \bar{y})^2}{n-1} = 259{,}996$ (calculated earlier)

$SSE = (n - 1)\left[ s_Y^2 - \dfrac{\mathrm{Cov}(X, Y)^2}{s_X^2} \right] = 99\left[ 259{,}996 - \dfrac{(-2{,}712{,}511)^2}{43{,}528{,}690} \right] = 9{,}005{,}450$

$s_e = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{9{,}005{,}450}{98}} = 303.13$

It is hard to assess the model based on $s_e$, even when compared with the mean value of Y: $s_e = 303.1$, $\bar{y} = 14{,}823$.

Testing the Slope

Testing whether there is a linear relationship between X and Y is the same as testing whether the slope = 0.

[Figure: scatter diagram with a flat (zero-slope) fitted line, illustrating no linear relationship between X and Y]

Testing the Slope

We can draw inference about $\beta_1$ from $b_1$ by testing
H0: $\beta_1 = 0$ versus H1: $\beta_1 \neq 0$ (or < 0, or > 0)
The implication of this test is clear: if H0 is rejected, one can conclude that there is sufficient evidence to show that Y and X are linearly related; otherwise, there is not sufficient evidence of a linear relation. The same question can be answered by constructing a confidence interval for $\beta_1$. From the theoretical result and our knowledge of the t-distribution, it is immediate to see that

$\dfrac{b_1 - \beta_1}{s_e / \sqrt{(n-1) s_X^2}} \sim t_{n-2}$

This is a statistic for testing the slope parameter or constructing a confidence interval for it. (Note: TB uses)

Confidence Interval of Slope Coefficient

Apparently, the quantity $\dfrac{s_e}{\sqrt{(n-1) s_X^2}}$ is an estimate of the standard deviation of $b_1$, and is thus referred to as the estimated standard error of $b_1$.

A 100(1 - α)% confidence interval for $\beta_1$ is given as

$\left( b_1 - t_{\alpha/2,\, n-2} \dfrac{s_e}{\sqrt{(n-1) s_X^2}},\ \ b_1 + t_{\alpha/2,\, n-2} \dfrac{s_e}{\sqrt{(n-1) s_X^2}} \right)$

Inference concerning the intercept parameter $\beta_0$ can be carried out in a similar manner, but it is not as interesting and important as for the slope parameter $\beta_1$.

Example 1 (continued)
Test to determine whether there is enough evidence to infer that there is a linear relationship between the car auction price and the odometer reading for all three-year-old cars. Use α = 5%.

Solution: H0: $\beta_1 = 0$ vs H1: $\beta_1 \neq 0$

$b_1 = -0.0623$

$\dfrac{s_e}{\sqrt{(n-1) s_X^2}} = \dfrac{303.1}{\sqrt{(99)(43{,}528{,}690)}} = 0.00462$

$t = \dfrac{b_1 - 0}{s_e / \sqrt{(n-1) s_X^2}} = \dfrac{-0.0623 - 0}{0.00462} = -13.49$

Example 1 (continued)
With $\nu = n - 2 = 98$, the rejection region is
$t > t_{0.025,\,98}$ or $t < -t_{0.025,\,98}$, where $t_{0.025,\,98} \approx 1.984$.

As t = -13.49 < -1.984, reject H0 at the 5% level of significance. Yes, there is enough evidence to infer a linear relationship between price and odometer reading.

A 95% CI for $\beta_1$:
$-0.0623 \pm 1.984 \times 0.00462 = (-0.0715,\ -0.0531)$
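The whole test and interval can be reproduced from the summary figures on these slides. This is a sketch; the critical value 1.984 ≈ t_{0.025, 98} is taken from the slide rather than computed from a t-distribution routine.

```python
import math

# Example 1 summary figures as reported on the slides.
n = 100
b1 = -0.0623              # slope estimate
s_e = 303.1               # estimated error standard deviation
s2_x = 43_528_690.0       # sample variance of odometer reading
t_crit = 1.984            # t_{0.025, 98}, taken from the slide

se_b1 = s_e / math.sqrt((n - 1) * s2_x)   # estimated standard error of b1
t_stat = (b1 - 0) / se_b1                 # test statistic for H0: beta1 = 0

ci_low = b1 - t_crit * se_b1              # 95% CI for beta1, lower bound
ci_high = b1 + t_crit * se_b1             # 95% CI for beta1, upper bound

reject_h0 = abs(t_stat) > t_crit          # two-sided test at the 5% level
```

This reproduces t ≈ -13.49 and the 95% CI (-0.0715, -0.0531); since 0 lies outside the interval, the CI leads to the same rejection of H0.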


TEXTBOOK REFERENCES

Chapter 9: Linear Regression
Relevant Sections: 1 - 5

Section   Remarks
1
2
3         Excluding: “Mean and Variance of Estimators”, “Statistical Inference on the Intercept”
4
5         Partitioning of the total corrected sum of squares of y only (pg 397-398)
