
Chapter 2.

Simple linear regression

1. Definition of simple linear regression

1.1. Standard model


Y = β0 + β1 X + ε
• Y dependent variable
• X independent variable
• ε random error term
• Parameters
o β0 intercept
o β1 slope

(a) Data setting


yi = β0 + β1 xi + εi, i =1, 2, …, n

• β0, β1
o constant parameters
• xi
o nonrandom, observed with negligible error
• εi
o Random
o Zero mean, E(εi) = 0
o Constant variance, Var(εi) = σ2
o Uncorrelated, Cov(εi, εj) = 0, for i ≠ j
o Usually assumed independent identically distributed (i.i.d.)

(b) Properties
(i) Dependent variable yi
• yi is a random variable
• E(yi) = β0 + β1 xi
• E(Y|X) = β0 + β1 X
o Mean of Y conditional on X
• Var(yi) = σ2
o Independent of xi
• Cov(yi, yj) = 0, for i ≠ j
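A minimal simulation sketch of these properties in Python. The parameter values β0 = 10, β1 = 2, σ = 3 and the value x = 40 are illustrative choices, not taken from any data set in these notes: at a fixed x, yi is random with mean β0 + β1 x and variance σ².

import numpy as np

# Simulate the model y_i = beta0 + beta1*x_i + eps_i at one fixed x value.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 10.0, 2.0, 3.0   # illustrative values
x_fixed = 40.0                          # nonrandom, observed without error

# Many replications of y at the same x: the errors are i.i.d. with mean 0
# and variance sigma^2, so y is random even though x is not.
eps = rng.normal(0.0, sigma, size=100_000)
y = beta0 + beta1 * x_fixed + eps

print(y.mean())   # ~ beta0 + beta1*x = 90   (E(y_i) = beta0 + beta1*x_i)
print(y.var())    # ~ sigma^2 = 9            (Var(y_i) = sigma^2, free of x_i)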

(ii) Regression parameters


• β0 and β1 are called regression coefficients
o Depend on the units used for Y and X
• β1 is the change in E(Y) per unit increase of X
• The interpretation of β0 depends on the range of X in the data
o When X = 0 is included, β0 = E(Y|X=0)
o When X = 0 is far away from the data, β0 has no particular meaning
o Usually just call β0 the intercept

1.2. Alternative model


• Dummy variable
yi = β0 xi0 + β1 xi + εi
o xi0 ≡ 1 for all i

• Centered linear regression model
yi = β0* + β1(xi − x̄) + εi
o β0* = ?  Express it in terms of the components of the base model.

2. Estimation

2.1. Least squares estimates for β0 and β1


• The linear regression model assumes that the relation between X and Y is linear.
o What would be the exact relation between X and Y?
• β0 and β1 are the true but unknown parameters under the linear relationship.
o What are the actual values of the parameters?

(a) Definitions
• b0 = β̂0 and b1 = β̂1 are the estimators of β0 and β1
• Fitted value
ŷi = b0 + b1 xi
• Residual
ei = yi − ŷi
• Fitted regression line
ŷ = b0 + b1 x
o Is it the true relation once the realizations of b0 and b1 are obtained?

(b) Least squares method


• Estimate β0 and β1 so that the fitted regression line (based on the estimated parameters) lies
“closest” to the data
• The parameter estimates are obtained by minimizing the residual sum of squares
SSE = ∑_{i=1}^{n} (yi − ŷi)² = ∑_{i=1}^{n} (yi − f(xi | b0, b1))²

o In the simple linear regression model, f(x | b0, b1) = b0 + b1 x


o The estimators are called the least squares estimators
o SSE is the sum of the squared vertical distances from the observed values to the fitted line

• For the simple linear regression model, the LSE b0 and b1 satisfy
∂/∂b0 [∑_{i=1}^{n} (yi − b0 − b1 xi)²] = 0
∂/∂b1 [∑_{i=1}^{n} (yi − b0 − b1 xi)²] = 0

⇒ −∑_{i=1}^{n} 2(yi − b0 − b1 xi) = 0
  −∑_{i=1}^{n} 2 xi (yi − b0 − b1 xi) = 0

⇒ ∑_{i=1}^{n} yi = n b0 + b1 ∑_{i=1}^{n} xi
  ∑_{i=1}^{n} xi yi = b0 ∑_{i=1}^{n} xi + b1 ∑_{i=1}^{n} xi²   (the normal equations)

o From the 1st equation


b0 = (1/n)(∑_{i=1}^{n} yi − b1 ∑_{i=1}^{n} xi) = ȳ − b1 x̄
o Substitute b0 into the 2nd equation
∑_{i=1}^{n} xi yi = ∑_{i=1}^{n} (ȳ − b1 x̄) xi + b1 ∑_{i=1}^{n} xi²
∑_{i=1}^{n} xi (yi − ȳ) = b1 ∑_{i=1}^{n} xi (xi − x̄)
∑_{i=1}^{n} (xi − x̄)(yi − ȳ) + x̄ ∑_{i=1}^{n} (yi − ȳ) = b1 [∑_{i=1}^{n} (xi − x̄)(xi − x̄) + x̄ ∑_{i=1}^{n} (xi − x̄)]
SXY = b1 SXX      (since ∑_{i=1}^{n} (yi − ȳ) = ∑_{i=1}^{n} (xi − x̄) = 0)
b1 = SXY / SXX
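A small Python sketch of the closed-form solution just derived; the function name least_squares is our own, not part of any library. Fitted values are then b0 + b1·x and residuals y − (b0 + b1·x).

import numpy as np

def least_squares(x, y):
    # b1 = SXY / SXX and b0 = ybar - b1*xbar, as derived above
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    sxy = np.sum((x - xbar) * (y - ybar))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1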

• The least squares estimators b0 and b1 are linear estimators as they are linear combinations
of yi
b1 = ∑_{i=1}^{n} (xi − x̄)(yi − ȳ) / SXX = [∑_{i=1}^{n} (xi − x̄) yi − ȳ ∑_{i=1}^{n} (xi − x̄)] / SXX = ∑_{i=1}^{n} (xi − x̄) yi / SXX = ∑_{i=1}^{n} ki yi

o where ki = (xi − x̄) / SXX is independent of yi

b0 = (1/n)(∑_{i=1}^{n} yi − b1 ∑_{i=1}^{n} xi) = (1/n)[∑_{i=1}^{n} yi − (∑_{i=1}^{n} ki yi)(∑_{i=1}^{n} xi)] = ∑_{i=1}^{n} (1/n − ki x̄) yi
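A quick numerical check in Python, using arbitrary made-up data, that b1 is indeed a linear combination of the yi with weights ki = (xi − x̄)/SXX:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=15)                 # arbitrary fixed design
y = 3.0 + 0.5 * x + rng.normal(0, 1, size=15)   # arbitrary responses

xbar = x.mean()
sxx = np.sum((x - xbar) ** 2)
k = (x - xbar) / sxx                            # weights depend only on the x's

b1_direct = np.sum((x - xbar) * (y - y.mean())) / sxx
b1_linear = np.sum(k * y)                       # same value: b1 is linear in the y_i
print(b1_direct, b1_linear)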

(c) Distribution of the least squares estimators when the linear model is true
• b0 and b1 are unbiased estimators for β0 and β1
E(b1) = E(∑_{i=1}^{n} (xi − x̄) yi / SXX) = ∑_{i=1}^{n} (xi − x̄) E(yi) / SXX = ∑_{i=1}^{n} (xi − x̄)(β0 + β1 xi) / SXX
= [β0 ∑_{i=1}^{n} (xi − x̄) + β1 ∑_{i=1}^{n} (xi − x̄) xi] / SXX = β1 ∑_{i=1}^{n} (xi − x̄)² / SXX
= β1

E(b0) = E(ȳ − b1 x̄) = E(ȳ) − x̄ E(b1)
= (1/n) ∑_{i=1}^{n} E(yi) − x̄ β1 = (1/n) ∑_{i=1}^{n} (β0 + β1 xi) − β1 x̄
= β0 + β1 (1/n) ∑_{i=1}^{n} xi − β1 x̄
= β0
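A Monte Carlo sketch of the unbiasedness just shown, in Python with illustrative parameter values (β0 = 5, β1 = 1.5, σ = 2): averaging b0 and b1 over many simulated samples from the model recovers β0 and β1.

import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 5.0, 1.5, 2.0      # illustrative values
x = np.linspace(0, 10, 20)               # fixed design, reused in every replication

b0_hat, b1_hat = [], []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = sxy / sxx
    b0 = y.mean() - b1 * x.mean()
    b0_hat.append(b0)
    b1_hat.append(b1)

print(np.mean(b0_hat), np.mean(b1_hat))  # ~ (5.0, 1.5): E(b0) = beta0, E(b1) = beta1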

Example

Westwood company data


• Man-hours: dependent variable
• Lot size: independent variable
• Regression model
o Man-hours = β0 + β1 Lot size + ε
• Least squares estimates
o b1 = SXY / SXX = 6800 / 3400 = 2
o b0 = ȳ − b1 x̄ = 110 − 2 × 50 = 10
• Estimated regression line
o Man-hours = 10 + 2 Lot size
[Scatter plot of Man-hours against Lot size with the fitted line y = 2x + 10]
• b1 = +2
o Man-hours increase with lot size
o When lot size increases by 1 unit, man-hours increase by 2 units
• b0 = 10
o When lot size = 0, man-hours = 10 units
o Not reliable as the data range for X excludes zero
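A sketch reproducing these estimates in Python from the Westwood data (the ten observations tabulated in Section 2.2 below):

import numpy as np

# Westwood company data (lot size x, man-hours y), as tabulated in Section 2.2
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

xbar, ybar = x.mean(), y.mean()            # 50, 110
sxx = np.sum((x - xbar) ** 2)              # 3400
sxy = np.sum((x - xbar) * (y - ybar))      # 6800
b1 = sxy / sxx                             # 2.0
b0 = ybar - b1 * xbar                      # 10.0
print(b0, b1)                              # fitted line: Man-hours = 10 + 2*Lot size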

Example

Shocks data
• All observations are considered
• Time: dependent variable
• Shocks: independent variable
• Regression model
o Time = β0 + β1 Shocks + ε
• Least squares estimates
o b1 = SXY / SXX = −208.4 / 340 = −0.6129
o b0 = ȳ − b1 x̄ = 5.8875 − (−0.6129 × 7.5) = 10.4846
• Estimated regression line
o Time = 10.4846 − 0.6129 × Shocks
[Scatter plot of Time against number of Shocks with the fitted line]
• b1 = −0.6129
o Time decreases with the number of shocks
o When the number of shocks increases by 1, time decreases by 0.6129 seconds
• b0 = 10.48
o When the number of shocks = 0, time = 10.48 seconds
o Data range for X includes zero

• Variance of the sampling distribution of b1


Var(b1) = Var(∑_{i=1}^{n} (xi − x̄) yi / SXX) = ∑_{i=1}^{n} (xi − x̄)² Var(yi) / (SXX)² = σ² ∑_{i=1}^{n} (xi − x̄)² / (SXX)² = σ² / SXX

o since Cov(yi, yj) = 0, i ≠ j

• Variance of the sampling distribution of b0


o Consider
Cov(ȳ, b1) = Cov(ȳ, ∑_{i=1}^{n} (xi − x̄) yi / SXX) = (1/SXX) ∑_{i=1}^{n} (xi − x̄) Cov(ȳ, yi)
= (1/SXX) ∑_{i=1}^{n} (xi − x̄) Cov((1/n) ∑_{j=1}^{n} yj, yi) = (1/SXX) ∑_{i=1}^{n} (xi − x̄) Cov(yi/n, yi)
= (σ² / (n SXX)) ∑_{i=1}^{n} (xi − x̄)
= 0
o Therefore
Var(b0) = Var(ȳ − b1 x̄)
= Var(ȳ) + x̄² Var(b1) − 2 x̄ Cov(ȳ, b1)
= σ² (1/n + x̄²/SXX)

• Covariance between b0 and b1
Cov(b0, b1) = Cov(ȳ − b1 x̄, b1) = Cov(ȳ, b1) − x̄ Cov(b1, b1) = −x̄ σ² / SXX
o We obtain SXX and x̄ from the data, but how about σ²?
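A Monte Carlo sketch in Python, with illustrative parameter values, checking the three sampling formulas above: Var(b1) = σ²/SXX, Var(b0) = σ²(1/n + x̄²/SXX) and Cov(b0, b1) = −x̄σ²/SXX.

import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 5.0, 1.5, 2.0      # illustrative values
x = np.linspace(0, 10, 20)
n, xbar = x.size, x.mean()
sxx = np.sum((x - xbar) ** 2)

est = []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - xbar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * xbar
    est.append((b0, b1))
est = np.array(est)

print(est[:, 1].var(), sigma**2 / sxx)                              # Var(b1)
print(est[:, 0].var(), sigma**2 * (1/n + xbar**2 / sxx))            # Var(b0)
print(np.cov(est[:, 0], est[:, 1])[0, 1], -xbar * sigma**2 / sxx)   # Cov(b0, b1)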

Gauss-Markov theorem

The least squares estimators b0 and b1 are unbiased and have the minimum variance among all unbiased linear estimators. (Exercise)
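A Monte Carlo illustration (not a proof) in Python: the two-group comparison estimator below is our own choice of an alternative linear unbiased estimator of β1, and the parameter values are illustrative. Both estimators average to β1, but the least squares estimator has the smaller variance.

import numpy as np

# Compare the LSE of beta1 with another linear unbiased estimator:
# (mean y of the upper half of the x's - mean y of the lower half) divided by
# the corresponding difference in mean x.
rng = np.random.default_rng(3)
beta0, beta1, sigma = 5.0, 1.5, 2.0
x = np.linspace(0, 10, 20)
lo, hi = x < np.median(x), x >= np.median(x)

b1_lse, b1_grp = [], []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    b1_lse.append(np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2))
    b1_grp.append((y[hi].mean() - y[lo].mean()) / (x[hi].mean() - x[lo].mean()))

print(np.mean(b1_lse), np.mean(b1_grp))   # both ~ 1.5: both estimators are unbiased
print(np.var(b1_lse), np.var(b1_grp))     # the LSE has the smaller variance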

(d) Remarks
• Inference and prediction based on the fitted line are valid only for X values within the range of the data set.
• A linear relationship between two variables can exist without causation.
• The simple linear regression model applies only if the true relationship between the two
variables is a straight-line relationship.
• When the magnitude of the slope estimate b1 is close to zero, the fitted regression line will be nearly parallel to the x-axis. The explanatory variable X will then be of little use for predicting Y.

2.2. Estimate of error variance


• The error variance
σ² = Var(εi) = Var(yi − (β0 + β1 xi))
• Error mean square (or mean square error, MSE) is defined as
MSE = s² = ∑_{i=1}^{n} (yi − ŷi)² / (n − 2)
o Error (or residual) degrees of freedom (df) = n − 2
o s² is unbiased under the important assumption that the model is correct: E(s²) = σ²
• Estimates of the variances and covariance of b0 and b1
o Obtained by replacing σ² with s²
s²{b1} = s² / SXX
s²{b0} = s² (1/n + x̄²/SXX)
Ĉov(b0, b1) = −x̄ s² / SXX
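A Python sketch collecting the estimation formulas of this subsection; the function name is our own. Applied to the Westwood data it reproduces the values quoted in the example that follows (s² = 7.5, SE(b1) ≈ 0.047, SE(b0) ≈ 2.503).

import numpy as np

def fit_with_standard_errors(x, y):
    # Least squares fit plus s^2-based estimates of Var(b1), Var(b0), Cov(b0, b1)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, xbar, ybar = x.size, x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / sxx
    b0 = ybar - b1 * xbar
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (n - 2)              # MSE, df = n - 2
    var_b1 = s2 / sxx
    var_b0 = s2 * (1.0 / n + xbar ** 2 / sxx)
    cov_b0_b1 = -xbar * s2 / sxx
    return b0, b1, s2, var_b0, var_b1, cov_b0_b1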

Example

Westwood company data


• b0 = 10, b1 = 2
o ŷi = 10 + 2 xi

Production run  Lot size (X)  Man-hours (Y)  Predicted Man-hours (Ŷ)
1 30 73 70
2 20 50 50
3 60 128 130
4 80 170 170
5 40 87 90
6 50 108 110
7 60 135 130
8 30 69 70
9 70 148 150
10 60 132 130

• s² = (1/(10 − 2)) [(73 − 70)² + (50 − 50)² + ⋯ + (132 − 130)²] = 60/8 = 7.5
o s = √7.5 = 2.74
o Degrees of freedom = n − 2 = 8
• Sample variance of b1
o s²{b1} = s²/SXX = 7.5/3400 = 0.002206
o SE(b1) = √0.002206 = 0.046967
• Sample variance of b0
o s²{b0} = s² (1/n + x̄²/SXX) = 7.5 × (1/10 + 50²/3400) = 6.264706
o SE(b0) = √6.264706 = 2.502939
• Sample covariance of b0 and b1
o Ĉov(b0, b1) = −x̄ s²/SXX = −(50 × 7.5)/3400 = −0.1103

Example

Shocks data
• b0 = 10.4846, b1 = -0.6129
o ŷ = 10.4846 − 0.6129 x

X Y Predicted Y
0 11.4 10.4846
1 11.9 9.8716
2 7.1 9.2587
3 14.2 8.6457
4 5.9 8.0328
5 6.1 7.4199
… … …

• s² = (1/(16 − 2)) [(11.4 − 10.48)² + (11.9 − 9.87)² + ⋯] = 5.0943
o s = √5.0943 = 2.257

o df = 14

• Sample variance for b1


o s²{b1} = 5.0943/340 = 0.0150
o SE(b1) = √0.0150 = 0.1224
• Sample variance for b0
o s²{b0} = 5.0943 × (1/16 + 7.5²/340) = 1.1612
o SE(b0) = √1.1612 = 1.0776
• Sample covariance of b0 and b1
o Ĉov(b0, b1) = −(7.5 × 5.0943)/340 = −0.1124

2.3. Maximum likelihood estimation


(a) Likelihood function
• Assume εi ~ i.i.d. N(0, σ²) with probability density function
f(εi) = φ(εi)
• The joint density function / likelihood function
L = ∏_{i=1}^{n} f(εi) = ∏_{i=1}^{n} φ(εi)

• The pdf for the normal distribution with mean 0 is


φ(x) = (1/((2π)^{1/2} σ)) exp(−x²/(2σ²))
• The likelihood function is
L = L(β0, β1, σ² | x, y)
  = (1/((2π)^{n/2} σⁿ)) exp(−(1/(2σ²)) ∑_{i=1}^{n} εi²)
  = (1/((2π)^{n/2} σⁿ)) exp(−(1/(2σ²)) ∑_{i=1}^{n} (yi − β0 − β1 xi)²)

(b) Maximum likelihood estimates (MLE) for β0 and β1


• The MLEs of β0 and β1 maximize L, i.e. they minimize
(1/(2σ²)) ∑_{i=1}^{n} (yi − β0 − β1 xi)²
o Equivalent to minimizing SSE
o Under the normal theory assumption, the MLEs of the regression coefficients β0 and β1 are the least squares estimators

(c) MLE for error variance


• The log-likelihood is given as
l = log(L) = k − (n/2) log(σ²) − (1/(2σ²)) ∑_{i=1}^{n} (yi − β0 − β1 xi)²

o where k is free of σ²

• Substituting b0 and b1:
l = k − (n/2) log(σ²) − (1/(2σ²)) ∑_{i=1}^{n} (yi − ŷi)²

• The MLE of σ² maximizes l:
∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑_{i=1}^{n} (yi − ŷi)² = 0
−n + (1/σ̂²) ∑_{i=1}^{n} (yi − ŷi)² = 0
σ̂² = ∑_{i=1}^{n} (yi − ŷi)² / n = ((n − 2)/n) s²
o σ̂² is a biased estimator of σ²
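A numerical sketch (Python with scipy, using the Westwood data from Section 2.2) that maximizing the normal likelihood reproduces the least squares estimates and gives σ̂² = ((n − 2)/n) s² = 0.8 × 7.5 = 6. The starting values and optimizer settings are our own illustrative choices.

import numpy as np
from scipy.optimize import minimize

# Westwood data (lot size x, man-hours y), as tabulated in Section 2.2
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
n = x.size

def neg_log_lik(theta):
    # negative log-likelihood, dropping the constant k; sigma^2 kept positive via log scale
    b0, b1, log_sig2 = theta
    sig2 = np.exp(log_sig2)
    resid = y - b0 - b1 * x
    return 0.5 * n * np.log(sig2) + np.sum(resid ** 2) / (2.0 * sig2)

res = minimize(neg_log_lik,
               x0=np.array([y.mean(), 0.0, np.log(y.var())]),
               method="Nelder-Mead",
               options={"maxfev": 20000, "xatol": 1e-10, "fatol": 1e-10})
b0_mle, b1_mle, sig2_mle = res.x[0], res.x[1], np.exp(res.x[2])
print(b0_mle, b1_mle)   # ~ (10, 2): the least squares estimates
print(sig2_mle)         # ~ 6.0 = (n - 2)/n * s^2, smaller than s^2 = 7.5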
