• β0, β1
o constant parameters
• xi
o nonrandom, observed with negligible error
• εi
o Random
o Zero mean, E(εi) = 0
o Constant variance, Var(εi) = σ²
o Uncorrelated, Cov(εi, εj) = 0, for i ≠ j
o Usually assumed independent identically distributed (i.i.d.)
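These assumptions can be made concrete with a small simulation; a minimal sketch in Python, with illustrative values β0 = 10, β1 = 2, σ = 3 that are not taken from the notes:

```python
import random

def simulate(x_values, beta0, beta1, sigma, rng):
    """Draw one sample from y_i = beta0 + beta1*x_i + eps_i with i.i.d. N(0, sigma^2) errors."""
    return [beta0 + beta1 * x + rng.gauss(0, sigma) for x in x_values]

rng = random.Random(0)                        # fixed seed for reproducibility
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60] # nonrandom design points (illustrative)
y = simulate(x, beta0=10, beta1=2, sigma=3, rng=rng)
print(y[:3])  # three noisy responses near 10 + 2*x
```

The x values here happen to match the lot-size example later in these notes, purely for convenience.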
(b) Properties
(i) Dependent variable yi
• yi is a random variable
• E(yi) = β0 + β1 xi
• E(Y|X) = β0 + β1 X
o Mean of Y conditional on X
• Var(yi) = σ²
o Independent of xi
• Cov(yi, yj) = 0, for i ≠ j
• Centered linear regression model
yi = β0* + β1 (xi − x̄) + εi
o β0* = ?; express it in terms of the components of the base model.
2. Estimation
(a) Definitions
• b0 = β̂0 and b1 = β̂1 are estimators of β0 and β1
• Fitted value
ŷi = b0 + b1 xi
• Residual
ei = yi − ŷi
• Fitted regression line
ŷ = b0 + b1 x
o Is it the true relation once the realizations of b0 and b1 are obtained?
• For the simple linear regression model, the least squares estimators (LSE) b0 and b1 minimize
Q(b0, b1) = Σᵢ (yi − b0 − b1 xi)², where sums run over i = 1, …, n
• Setting the partial derivatives to zero:
∂Q/∂b0 = −2 Σᵢ (yi − b0 − b1 xi) = 0
∂Q/∂b1 = −2 Σᵢ xi (yi − b0 − b1 xi) = 0
• Normal equations:
Σᵢ yi = n b0 + b1 Σᵢ xi
Σᵢ xi yi = b0 Σᵢ xi + b1 Σᵢ xi²
• The first equation gives b0 = ȳ − b1 x̄. Substituting into the second:
Σᵢ xi yi = Σᵢ (ȳ − b1 x̄) xi + b1 Σᵢ xi²
Σᵢ xi (yi − ȳ) = b1 Σᵢ xi (xi − x̄)
Σᵢ (xi − x̄)(yi − ȳ) = b1 Σᵢ (xi − x̄)²  (since Σᵢ (xi − x̄) = 0)
S_XY = b1 S_XX
• Hence
b1 = S_XY / S_XX
b0 = ȳ − b1 x̄
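The closed-form solution can be computed directly. A sketch using the lot-size data from the example later in these notes:

```python
def least_squares(x, y):
    # b1 = S_XY / S_XX,  b0 = ybar - b1 * xbar
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]        # lot sizes
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]  # man-hours
b0, b1 = least_squares(x, y)
print(b0, b1)  # → 10.0 2.0
```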
• The least squares estimators b0 and b1 are linear estimators, as they are linear combinations of the yi:
b1 = Σᵢ (xi − x̄)(yi − ȳ) / S_XX
= [Σᵢ (xi − x̄) yi − ȳ Σᵢ (xi − x̄)] / S_XX
= Σᵢ (xi − x̄) yi / S_XX
= Σᵢ ki yi
o where ki = (xi − x̄) / S_XX does not depend on yi
b0 = (1/n)(Σᵢ yi − b1 Σᵢ xi) = (1/n) Σᵢ yi − x̄ Σᵢ ki yi = Σᵢ (1/n − x̄ ki) yi
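The weight representation can be checked numerically; the ki depend only on the design points, never on the responses (same lot-size data as in the example later in these notes):

```python
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# k_i = (x_i - xbar) / S_XX: fixed weights, free of the responses y_i
k = [(xi - xbar) / sxx for xi in x]

b1 = sum(ki * yi for ki, yi in zip(k, y))                   # b1 = sum_i k_i y_i
b0 = sum((1 / n - xbar * ki) * yi for ki, yi in zip(k, y))  # b0 = sum_i (1/n - xbar*k_i) y_i
print(b1, b0)  # close to 2 and 10, matching b1 = S_XY/S_XX and b0 = ybar - b1*xbar
```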
(c) Distribution of the least squares estimators when the linear model is true
• b0 and b1 are unbiased estimators for β0 and β1
E(b1) = E(Σᵢ (xi − x̄) yi / S_XX)
= Σᵢ (xi − x̄) E(yi) / S_XX
= Σᵢ (xi − x̄)(β0 + β1 xi) / S_XX
= [β0 Σᵢ (xi − x̄) + β1 Σᵢ (xi − x̄) xi] / S_XX
= β1 Σᵢ (xi − x̄)² / S_XX
= β1
E(b0) = E(ȳ − b1 x̄) = E(ȳ) − x̄ E(b1)
= (1/n) Σᵢ E(yi) − x̄ E(b1)
= (1/n) Σᵢ (β0 + β1 xi) − β1 x̄
= β0 + β1 x̄ − β1 x̄
= β0
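Unbiasedness can be illustrated by Monte Carlo: averaging b0 and b1 over many simulated datasets should recover β0 and β1. Parameter values below are illustrative, not from the notes:

```python
import random

rng = random.Random(42)
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
beta0, beta1, sigma = 10, 2, 3
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

reps = 5000
sum_b0 = sum_b1 = 0.0
for _ in range(reps):
    # one new dataset drawn from the true model
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    sum_b1 += b1
    sum_b0 += ybar - b1 * xbar

print(sum_b1 / reps, sum_b0 / reps)  # averages close to beta1 = 2 and beta0 = 10
```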
Example
• Least squares estimates
o b1 = S_XY / S_XX = 6800 / 3400 = 2
o b0 = ȳ − b1 x̄ = 110 − 2 × 50 = 10
• Estimated regression line
o Man-hours = 10 + 2 × Lot size
[Scatter plot of Man-hours against Lot size with the fitted line]
• b1 = +2
• b0 = 10
o When lot size = 0, man-hours = 10 units
o Not reliable, as the data range for X excludes zero
Example
Shocks data
• All observations are considered
• Time: dependent variable
• Shocks: independent variable
• Regression model
o Time = β0 + β1 Shocks + ε
• Least squares estimates
o b1 = S_XY / S_XX = −208.4 / 340 = −0.6129
o b0 = ȳ − b1 x̄ = 5.8875 − (−0.6129 × 7.5) = 10.4846
• Estimated regression line
o Time = 10.4846 − 0.6129 × Shocks
[Scatter plot of Time against Shocks with the fitted line]
• b1 = −0.6129
o Time decreases with the number of shocks
o When the number of shocks increases by 1, time decreases by 0.6129 seconds
• b0 = 10.48
o When the number of shocks = 0, time = 10.48 seconds
o The data range for X includes zero, so this interpretation is meaningful
• Variance of b0
o Cov(ȳ, b1) = Cov((1/n) Σⱼ yj, Σᵢ ki yi)
= (1/S_XX) Σᵢ (xi − x̄) Cov((1/n) Σⱼ yj, yi)
= (1/S_XX) Σᵢ (xi − x̄) Cov(yi/n, yi)  (the yi are uncorrelated)
= σ²/(n S_XX) Σᵢ (xi − x̄)
= 0
o Therefore, using Var(ȳ) = σ²/n and Var(b1) = σ²/S_XX,
Var(b0) = Var(ȳ − b1 x̄)
= Var(ȳ) + x̄² Var(b1) − 2 x̄ Cov(ȳ, b1)
= σ² (1/n + x̄²/S_XX)
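The variance formulas Var(b1) = σ²/S_XX and Var(b0) = σ²(1/n + x̄²/S_XX) can likewise be checked by simulation (illustrative parameters, not from the notes):

```python
import random

rng = random.Random(7)
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
beta0, beta1, sigma = 10, 2, 3
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

reps = 20000
b0s, b1s = [], []
for _ in range(reps):
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1s.append(b1)
    b0s.append(ybar - b1 * xbar)

def var(v):
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / len(v)

print(var(b1s), sigma**2 / sxx)                      # empirical vs. theoretical Var(b1)
print(var(b0s), sigma**2 * (1 / n + xbar**2 / sxx))  # empirical vs. theoretical Var(b0)
```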
• Covariance between b0 and b1
Cov(b0, b1) = Cov(ȳ − b1 x̄, b1) = Cov(ȳ, b1) − x̄ Cov(b1, b1) = −x̄ σ² / S_XX
o We obtain S_XX and x̄ from the data; how about σ²?
Gauss–Markov theorem
The least squares estimators b0 and b1 are unbiased and have minimum variance among all unbiased linear estimators. (Exercise)
(d) Remarks
• The inference and prediction by the fitted line are only valid for X values in the range of the
data set.
• A linear relationship between two variables can exist without causation.
• The simple linear regression model applies only if the true relationship between the two
variables is a straight-line relationship.
• When the magnitude of the slope estimate b1 is close to zero, the fitted regression line will be nearly parallel to the x-axis. Then the explanatory variable X will be of little use for the prediction of Y.
• Estimate of σ²: the error (or residual) mean square
MSE = s² = Σᵢ (yi − ŷi)² / (n − 2)
o Error (or residual) degrees of freedom (df) = n − 2
o s² is unbiased under the important assumption that the model is correct:
E(s²) = σ²
• Estimates of the variances and covariance of b0 and b1
o Obtained by replacing σ² with s²:
s²{b1} = s² / S_XX
s²{b0} = s² (1/n + x̄²/S_XX)
Ĉov(b0, b1) = −x̄ s² / S_XX
Example
Production run  Lot size (X)  Man-hours (Y)  Predicted Man-hours (Ŷ)
1 30 73 70
2 20 50 50
3 60 128 130
4 80 170 170
5 40 87 90
6 50 108 110
7 60 135 130
8 30 69 70
9 70 148 150
10 60 132 130
• s² = [(73 − 70)² + (50 − 50)² + ⋯ + (132 − 130)²] / (10 − 2) = 60/8 = 7.5
o s = √7.5 = 2.74
o Degrees of freedom = n − 2 = 8
• Sample variance of b1
o s²/S_XX = 7.5/3400 = 0.002206
o SE(b1) = √0.002206 = 0.046967
• Sample variance of b0
o s² (1/n + x̄²/S_XX) = 7.5 × (1/10 + 50²/3400) = 6.264706
o SE(b0) = √6.264706 = 2.502939
• Sample covariance of b0 and b1
o −x̄ s²/S_XX = −(50 × 7.5)/3400 = −0.1103
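The numbers in this example can be reproduced end to end:

```python
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]        # lot sizes
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]  # man-hours
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# MSE with n - 2 error degrees of freedom
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)

se_b1 = (s2 / sxx) ** 0.5
se_b0 = (s2 * (1 / n + xbar**2 / sxx)) ** 0.5
cov_b0_b1 = -xbar * s2 / sxx
print(s2, se_b1, se_b0, cov_b0_b1)  # ≈ 7.5, 0.0470, 2.5029, -0.1103 as in the notes
```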
Example
Shocks data
• b0 = 10.4846, b1 = −0.6129
o ŷ = 10.4846 − 0.6129 x
X Y Predicted Y
0 11.4 10.4846
1 11.9 9.8716
2 7.1 9.2587
3 14.2 8.6457
4 5.9 8.0328
5 6.1 7.4199
… … …
• s² = [(11.4 − 10.48)² + (11.9 − 9.87)² + ⋯] / (16 − 2) = 5.0943
o s = √5.0943 = 2.257
o df = 14
• Maximum likelihood estimation of σ² (assuming normal errors)
o Log-likelihood:
l = k − (n/2) log σ² − (1/(2σ²)) Σᵢ (yi − β0 − β1 xi)²
o where k is free of σ²
• Substituting b0 and b1 for β0 and β1:
l = k − (n/2) log σ² − (1/(2σ²)) Σᵢ (yi − ŷi)²
• Setting ∂l/∂σ² = 0 and solving:
σ̂² = Σᵢ (yi − ŷi)² / n = ((n − 2)/n) s²
o σ̂² is a biased estimator of σ²
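The bias factor (n − 2)/n is easy to see numerically with the lot-size data from the earlier example:

```python
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(x)

# residuals from the fitted line yhat = 10 + 2x of the lot-size example
sse = sum((yi - (10 + 2 * xi)) ** 2 for xi, yi in zip(x, y))

s2 = sse / (n - 2)    # unbiased estimator (MSE)
sigma2_hat = sse / n  # ML estimator: smaller by the factor (n - 2)/n
print(s2, sigma2_hat)  # → 7.5 6.0
```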