
Outline

 Least Squares Methods


 Estimation: Least Squares
 Interpretation of estimators
 Properties of OLS estimators
 Variance of Y, b, and a
 Hypothesis Test of b and a
 ANOVA table
 Goodness-of-Fit and R2



Linear regression model
Y = 2 + .5X
[Figure: the line Y = 2 + .5X plotted for x from -1 to 5, with y on the vertical axis and x on the horizontal axis; the intercept is 2 and the slope is .5]



Terminology

 Dependent variable (DV) = response variable = left-hand side (LHS) variable
 Independent variables (IV) = explanatory variables = right-hand side (RHS) variables = regressors (excluding a or b0)
 a (b0) is an estimator of the parameter α (β0)
 b (b1) is an estimator of the parameter β (β1)
 a and b are the intercept and slope



Least Squares Method
 How do we draw such a line based on observed data points?
 Suppose an imaginary line y = a + bX
 Imagine the vertical distance (or error) between the line and a data point: ε = Y − E(Y)
 This error (or gap) is the deviation of the data point from the imaginary line, the regression line
 What are the best values of a and b?
 The a and b that minimize the sum of such errors (deviations of individual data points from the line)



Least Squares Method
[Figure: three data points x1, x2, x3 scattered around the regression line E(Y) = a + bX, with vertical errors e1, e2, e3 drawn between each point and the line]



Least Squares Method

 Deviations do not have good properties for computation
 Why do we use squared deviations? (e.g., as in the variance)
 Let us get the a and b that minimize the sum of squared deviations rather than the sum of deviations
 This method is called least squares



Least Squares Method
 The least squares method minimizes the sum of squared errors (deviations of individual data points from the regression line)
 Such a and b are called least squares estimators (estimators of the parameters α and β)
 The process of getting parameter estimators (e.g., a and b) is called estimation
 “Regress Y on X”
 The least squares method is the estimation method of ordinary least squares (OLS)



Ordinary Least Squares

 Ordinary least squares (OLS) = linear regression model = classical linear regression model
 Linear relationship between Y and Xs
 Constant slopes (coefficients of Xs)
 Least squares method
 Xs are fixed; Y is conditional on Xs
 Error is not related to Xs
 Constant variance of errors



Least Squares Method 1
Y = α + βX + ε
E(Y) = Ŷ = a + bX
ε = Y − Ŷ = Y − (a + bX) = Y − a − bX
ε² = (Y − Ŷ)² = (Y − a − bX)²
(Y − a − bX)² = Y² + a² + b²X² − 2aY − 2bXY + 2abX
Σε² = Σ(Y − Ŷ)² = Σ(Y − a − bX)²
Min Σε² = Min Σ(Y − a − bX)²

 How to get a and b that can minimize the sum of squares of errors?
Least Squares Method 2
• Linear algebraic solution
• Compute a and b so that partial derivatives
with respect to a and b are equal to zero


∂Σε²/∂a = ∂Σ(Y − a − bX)²/∂a = 2na − 2ΣY + 2bΣX = 0
na = ΣY − bΣX
a = ΣY/n − b·ΣX/n = Ȳ − bX̄



Least Squares Method 3
Take the partial derivative with respect to b and plug in the a you got: a = Ȳ − bX̄

∂Σε²/∂b = ∂Σ(Y − a − bX)²/∂b = 2bΣX² − 2ΣXY + 2aΣX = 0
bΣX² − ΣXY + aΣX = 0
bΣX² − ΣXY + (Ȳ − bX̄)ΣX = 0
bΣX² − ΣXY + (ΣY/n)ΣX − b(ΣX/n)ΣX = 0
b[ΣX² − (ΣX)²/n] − [ΣXY − ΣX·ΣY/n] = 0
b = [ΣXY − ΣX·ΣY/n] / [ΣX² − (ΣX)²/n] = [nΣXY − ΣX·ΣY] / [nΣX² − (ΣX)²]

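To make the first-order conditions concrete, here is a small illustrative sketch (not from the original slides) that builds the sum of squared errors symbolically and solves ∂SSE/∂a = 0 and ∂SSE/∂b = 0; it assumes sympy is available and uses the Example 10-5 data that appears later in these slides.

```python
# Hypothetical sketch: solve the two first-order conditions symbolically.
import sympy as sp

a, b = sp.symbols('a b')
x = [43, 48, 56, 61, 67, 70]        # Example 10-5 data (later slides)
y = [128, 120, 135, 143, 141, 152]

# Sum of squared errors as a function of a and b
sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# Set d(SSE)/da = 0 and d(SSE)/db = 0, then solve for a and b
solution = sp.solve([sp.diff(sse, a), sp.diff(sse, b)], [a, b])
print(solution)  # a is about 81.048, b is about .9644
```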


Least Squares Method 4
Least squares method is an algebraic solution
that minimizes the sum of squares of errors
(variance component of error)
n XY   X  Y  ( X  X )(Y  Y ) SP
b   xy

n X 2   X  (X  X )
2 2
SS x

a Y  b  X  Y  bX
n n

a     X  XY
Y X 2

n X 2   X 
2 Not recommended

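As a minimal sketch of the formulas above (an illustration, not part of the original slides), plain Python reproduces the Example 10-5 estimates computed on the next slides.

```python
# Minimal sketch of the OLS formulas: b = SPxy / SSx, a = Ybar - b * Xbar.
x = [43, 48, 56, 61, 67, 70]        # Example 10-5 data (next slides)
y = [128, 120, 135, 143, 141, 152]

n = len(x)
x_bar = sum(x) / n                  # 57.5
y_bar = sum(y) / n                  # 136.5

sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # SPxy = 541.5
ss_x = sum((xi - x_bar) ** 2 for xi in x)                          # SSx  = 561.5

b = sp_xy / ss_x                    # slope:     .9644
a = y_bar - b * x_bar               # intercept: 81.0481
print(a, b)
```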


OLS: Example 10-5 (1)
No x y x-xbar y-ybar (x-xb)(y-yb) (x-xbar)^2
1 43 128 -14.5 -8.5 123.25 210.25
2 48 120 -9.5 -16.5 156.75 90.25
3 56 135 -1.5 -1.5 2.25 2.25
4 61 143 3.5 6.5 22.75 12.25
5 67 141 9.5 4.5 42.75 90.25
6 70 152 12.5 15.5 193.75 156.25

Mean 57.5 136.5


Sum 345 819 541.5 561.5

b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = SPxy / SSx = 541.5 / 561.5 = .9644

a = Ȳ − bX̄ = 136.5 − .9644 × 57.5 = 81.0481


OLS: Example 10-5 (2), NO!
No x y xy x^2
1 43 128 5504 1849
2 48 120 5760 2304
3 56 135 7560 3136
4 61 143 8723 3721
5 67 141 9447 4489
6 70 152 10640 4900

Mean 57.5 136.5


Sum 345 819 47634 20399
n XY   X  Y 6  47634  345  819
b   .964
n X 2   X  6  20399  3452
2

a 
Y X   X  XY 819  20399  345  47634
2

  81.048
n X   X  6  20399  345
2 2 2

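The raw-sum (computational) form on this slide gives the same estimates; a brief illustrative sketch:

```python
# Computational (raw-sum) form of the OLS formulas from this slide.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                      # 345, 819
sum_xy = sum(xi * yi for xi, yi in zip(x, y))      # 47634
sum_x2 = sum(xi ** 2 for xi in x)                  # 20399

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)        # .964
a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)   # 81.048
print(a, b)
```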


OLS: Example 10-5 (3)
Ŷ = 81.048 + .964X
[Figure: scatter plot of y (about 120 to 150) against x (about 40 to 70) with the fitted regression line overlaid]



What Are a and b ?

 a is an estimator of its parameter α
 a is the intercept, the point of y where the regression line meets the y axis
 b is an estimator of its parameter β
 b is the slope of the regression line
 b is constant regardless of values of Xs
 b is more important than a since that is what researchers want to know



How to interpret b?

 For a unit increase in x, the expected change in y is b, holding other things (variables) constant
 For a unit increase in x, we expect y to increase by b, holding other things (variables) constant
 For a unit increase in x, we expect y to increase by .964, holding other variables constant



Properties of OLS estimators

 The outcome of the least squares method is the OLS parameter estimators a and b
 OLS estimators are linear
 OLS estimators are unbiased (their expected values equal the parameters)
 OLS estimators are efficient (small variance)
 Gauss-Markov Theorem: Among linear unbiased estimators, the least squares (OLS) estimator has minimum variance: BLUE (best linear unbiased estimator)



Hypothesis Test of a and b

 How reliable are the a and b we compute?
 A t-test (a Wald test in general) can answer this
 The test statistic is the standardized effect size (effect size / standard error)
 The effect size is a − 0 and b − 0, assuming 0 is the hypothesized value; H0: α=0, H0: β=0
 Degrees of freedom is N − K, where K is the number of regressors + 1
 How do we compute the standard error (standard deviation)?



Variance of b (1)
 b is a random variable that changes across
samples.
 b is a weighted sum (a linear combination) of the random variables Yi

b = β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = [ΣXY − nX̄Ȳ] / Σ(X − X̄)² = Σ(Xi − X̄)Yi / Σ(X − X̄)² = Σ wiYi
  = w1Y1 + w2Y2 + … + wnYn, where wi = (Xi − X̄) / Σ(Xi − X̄)²

Note: Σ(X − X̄)(Y − Ȳ) = Σ(XY − X̄Y − XȲ + X̄Ȳ) = ΣXY − X̄ΣY − ȲΣX + nX̄Ȳ = ΣXY − nX̄Ȳ − nX̄Ȳ + nX̄Ȳ = ΣXY − nX̄Ȳ
Variance of b (2)
 Variance of Y (error) is σ²
 Var(kY) = k²Var(Y) = k²σ²

b = β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = Σ wiYi

Var(β̂) = Var(Σ wiYi) = w1²Var(Y1) + w2²Var(Y2) + … + wn²Var(Yn)
       = w1²σ² + w2²σ² + … + wn²σ² = σ² Σ wi²

where wi = (Xi − X̄) / Σ(Xi − X̄)²

σ² Σ wi² = σ² Σ [(Xi − X̄) / Σ(Xi − X̄)²]² = σ² Σ(Xi − X̄)² / [Σ(Xi − X̄)²]² = σ² / Σ(Xi − X̄)²
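A quick numeric check (illustrative, using the Example 10-5 data) that the weights wi reproduce b and that Σwi² = 1/SSx, which is what makes Var(b) = σ²/SSx:

```python
# Check that b = sum(w_i * Y_i) with w_i = (x_i - xbar)/SSx, and sum(w_i^2) = 1/SSx.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)
x_bar = sum(x) / n

ss_x = sum((xi - x_bar) ** 2 for xi in x)        # SSx = 561.5
w = [(xi - x_bar) / ss_x for xi in x]            # weights w_i

b = sum(wi * yi for wi, yi in zip(w, y))         # .9644, same slope as before
sum_w2 = sum(wi ** 2 for wi in w)                # equals 1 / 561.5
print(b, sum_w2, 1 / ss_x)
```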


Variance of a
 a = Ȳ − bX̄
 Var(b) = σ²/SSx, where SSx = Σ(X − X̄)²
 Var(ΣY) = Var(Y1) + Var(Y2) + … + Var(Yn) = nσ²

Var(α̂) = Var(Ȳ − bX̄) = Var(Ȳ) + X̄²Var(b) − 2Cov(Ȳ, bX̄)      (the covariance term is zero)
       = Var(ΣY/n) + X̄² · σ² / Σ(Xi − X̄)²
       = (1/n²)·nσ² + X̄²σ² / Σ(Xi − X̄)²
       = σ² [1/n + X̄² / Σ(Xi − X̄)²]

 Now, how do we compute the variance of Y, σ²?
Variance of Y or error
 Variance of Y is based on residuals (errors), Y-
Yhat
 “Hat” means an estimator of the parameter
 Y hat is predicted (by a + bX) value of Y; plug
in x given a and b to get Y hat
 Since a regression model includes K
parameters (a and b in simple regression), the
degrees of freedom is N-K
 Numerator is SSE in the ANOVA table

σ̂² = se² = Σ(Yi − Ŷi)² / (N − K) = SSE / (N − K) = MSE
Illustration (1)
No x y x-xbar y-ybar (x-xb)(y-yb) (x-xbar)^2 yhat (y-yhat)^2
1 43 128 -14.5 -8.5 123.25 210.25 122.52 30.07
2 48 120 -9.5 -16.5 156.75 90.25 127.34 53.85
3 56 135 -1.5 -1.5 2.25 2.25 135.05 0.00
4 61 143 3.5 6.5 22.75 12.25 139.88 9.76
5 67 141 9.5 4.5 42.75 90.25 145.66 21.73
6 70 152 12.5 15.5 193.75 156.25 148.55 11.87
Mean 57.5 136.5
Sum 345 819 541.5 561.5 127.2876

se² = Σ(Yi − Ŷi)² / (N − K) = 127.2876 / (6 − 2) = 31.8219      (SSE = 127.2876, MSE = 31.8219)

Var(b) = Var(β̂) = σ̂² / Σ(Xi − X̄)² = 31.8219 / 561.5 = .0567 = .2381²

Var(a) = σ̂² [1/n + X̄² / Σ(Xi − X̄)²] = 31.8219 × (1/6 + 57.5²/561.5) = 13.8809²

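The whole illustration can be reproduced in a few lines of Python (a sketch with the same data, not from the original slides):

```python
import math

x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n, k = len(x), 2                                   # two parameters: a and b

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]                          # predicted values
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # SSE = 127.2876
mse = sse / (n - k)                                       # MSE = 31.8219

var_b = mse / ss_x                                 # .0567
var_a = mse * (1 / n + x_bar ** 2 / ss_x)          # about 192.7
print(math.sqrt(var_b), math.sqrt(var_a))          # SE(b) = .2381, SE(a) = 13.8809
```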


Illustration (2): Test b
 How to test whether β is zero (no effect)?
 Like Y, the estimators a and b are normally distributed; with the estimated standard errors, their test statistics follow the t distribution
 b = .9644, SE(b) = .2381, df = N − K = 6 − 2 = 4
 Hypothesis Testing
 1. H0: β=0 (no effect), Ha: β≠0 (two-tailed)
 2. Significance level = .05, CV = 2.776, df = 6 − 2 = 4
 3. TS = (.9644 − 0)/.2381 = 4.0510 ~ t(N−K)
 4. TS (4.051) > CV (2.776), Reject H0
 5. β (not b) is not zero. There is a significant impact of X on Y

β1: b1 ± tα/2 · se(b1) = .9644 ± 2.776 × .2381 = (.3034, 1.6254)

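A short scripted version of this test (the critical value 2.776 is the tabled t with 4 df used on the slide):

```python
# t test of H0: beta = 0, using the numbers from this slide.
b, se_b, df = 0.9644, 0.2381, 4
cv = 2.776                            # t critical value, alpha = .05 two-tailed, 4 df

ts = (b - 0) / se_b                   # 4.051
reject = abs(ts) > cv                 # True -> reject H0
ci = (b - cv * se_b, b + cv * se_b)   # 95% CI: about (.30, 1.63)
print(ts, reject, ci)
```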


Illustration (3): Test a
 How to test whether α is zero?
 Like Y, the estimators a and b are normally distributed; with the estimated standard errors, their test statistics follow the t distribution
 a = 81.0481, SE(a) = 13.8809, df = N − K = 6 − 2 = 4
 Hypothesis Testing
 1. H0: α=0, Ha: α≠0 (two-tailed)
 2. Significance level = .05, CV = 2.776
 3. TS = (81.0481 − 0)/13.8809 = 5.8388 ~ t(N−K)
 4. TS (5.839) > CV (2.776), Reject H0
 5. α (not a) is not zero. The intercept is discernible from zero (significant intercept).

α: a ± tα/2 · se(a) = 81.0481 ± 2.776 × 13.8809 = (42.52, 119.58)

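The same pattern, applied to the intercept (illustrative sketch):

```python
# t test of H0: alpha = 0, using the numbers from this slide.
a, se_a, df = 81.0481, 13.8809, 4
cv = 2.776                            # t critical value, alpha = .05 two-tailed, 4 df

ts = (a - 0) / se_a                   # 5.839
reject = abs(ts) > cv                 # True -> reject H0
ci = (a - cv * se_a, a + cv * se_a)   # 95% CI: about (42.5, 119.6)
print(ts, reject, ci)
```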


Questions

 How do we test H0: β0(α) = β1 = β2 = … = 0?
 Remember that a t-test compares only two group means, while ANOVA compares more than two group means simultaneously
 The same logic applies in linear regression
 Construct the ANOVA table by partitioning the variance of Y; the F test examines the H0 above
 The ANOVA table provides key information about a regression model



Partitioning Variance of Y (1)
Ŷ = 81.048 + .964X, Ȳ = 136.5
[Figure: scatter of y against x with the fitted line Ŷ = 81 + .96X and the horizontal line Ȳ = 136.5, showing each Yi relative to both]



Partitioning Variance of Y (2)
yi − ȳ = (ŷi − ȳ) + (yi − ŷi)
Total = Model + Residual (Error)

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
Total = Model + Residual (Error)

SST = SSy = Σ(Yi − Ȳ)² = ΣYi² − nȲ²
SSM = Σ(Ŷi − Ȳ)²
SSE = Σ(Yi − Ŷi)²
s² = Σ(Yi − Ŷi)² / (N − K) = SSE/(N − K) = MSE



Partitioning Variance of Y (3)
81+.96X
No x y yhat (y-ybar)^2 (yhat-ybar)^2 (y-yhat)^2
1 43 128 122.52 72.25 195.54 30.07
2 48 120 127.34 272.25 83.94 53.85
3 56 135 135.05 2.25 2.09 0.00
4 61 143 139.88 42.25 11.39 9.76
5 67 141 145.66 20.25 83.94 21.73
6 70 152 148.55 240.25 145.32 11.87
Mean 57.5 136.5 SST SSM SSE
Sum 345 819 649.5000 522.2124 127.2876

•122.52 = 81.048 + .964×43, 148.55 = 81.048 + .964×70
•SST = SSM + SSE, 649.5 = 522.2 + 127.3

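A brief sketch (same data, illustrative) confirming the partition SST = SSM + SSE:

```python
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)
y_bar = sum(y) / n

a, b = 81.0481, 0.9644                                   # estimates from earlier slides
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # 649.5
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)             # about 522.2
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # about 127.3
print(sst, ssm + sse)                                    # equal up to rounding
```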


ANOVA Table
 H0: all parameters are zero, β0 = β1 = 0
 Ha: at least one parameter is not zero
 CV is 12.22 with (1, 4) degrees of freedom; TS > CV, reject H0

Sources Sum of Squares DF Mean Squares F


Model SSM K-1 MSM=SSM/(K-1) MSM/MSE
Residual SSE N-K MSE=SSE/(N-K)
Total SST N-1
Sources Sum of Squares DF Mean Squares F
Model 522.2124 1 522.2124 16.41047
Residual 127.2876 4 31.8219
Total 649.5000 5

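The entries of the ANOVA table follow directly from the sums of squares; a minimal sketch:

```python
# ANOVA quantities for the simple regression (K = 2 parameters, N = 6).
ssm, sse, sst = 522.2124, 127.2876, 649.5000
n, k = 6, 2

msm = ssm / (k - 1)          # 522.2124
mse = sse / (n - k)          # 31.8219
f = msm / mse                # 16.41
print(msm, mse, f)
```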


R2 and Goodness-of-fit
 Goodness-of-fit measures evaluate how well a regression model fits the data
 The smaller the SSE, the better the model fits
 The F test examines whether all parameters are zero (a large F and a small p-value indicate good fit)
 R² (coefficient of determination) is SSM/SST; it measures how much of the overall variance of Y the model explains
 R² = SSM/SST = 522.2/649.5 = .80
 A large R² means the model fits the data well



Myth and Misunderstanding in R2

 R square is the squared Karl Pearson correlation coefficient: r² = .8967² = .80
 If a regression model includes many regressors, R² is less useful, if not useless
 Adding any regressor always increases R², regardless of the relevance of the regressor
 Adjusted R² gives a penalty for adding regressors: Adj. R² = 1 − [(N−1)/(N−K)](1−R²)
 R² is not a panacea, although its interpretation is intuitive; if the intercept is omitted, R² is incorrect
 Check the specification, F, SSE, and individual parameter estimators to evaluate your model; a model with smaller R² can be better in some cases

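Both R² and the adjusted R² penalty are easy to compute from the ANOVA quantities (illustrative sketch):

```python
# R-squared and adjusted R-squared from the ANOVA sums of squares.
ssm, sst = 522.2124, 649.5000
n, k = 6, 2

r2 = ssm / sst                                   # about .80
adj_r2 = 1 - (n - 1) / (n - k) * (1 - r2)        # penalizes extra regressors
print(r2, adj_r2)
```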


Interpolation and Extrapolation
 Confidence interval of E(Y|X), where x is within the range of the data x: interpolation
 Confidence interval of Y|X, where x is beyond the range of the data x: extrapolation
 Extrapolation involves a penalty and danger, which widens the confidence interval; it is less reliable

ŷ ± tα/2 · s · √(1/n + (x − x̄)²/SSx)
ŷ ± tα/2 · s · √(1 + 1/n + (x − x̄)²/SSx)

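A sketch of both interval formulas; the evaluation points x0 = 60 and x0 = 90 are hypothetical choices for illustration (60 lies inside the observed x range, 90 outside):

```python
import math

# Interval half-widths from the two formulas above (illustrative).
x = [43, 48, 56, 61, 67, 70]
n = len(x)
x_bar = sum(x) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)       # 561.5

a, b = 81.0481, 0.9644
s = math.sqrt(31.8219)                          # sqrt(MSE)
t = 2.776                                       # t critical value, 4 df, alpha = .05

def interval(x0, individual):
    """Interval for the mean (individual=False) or for a new Y (individual=True)."""
    extra = 1.0 if individual else 0.0
    half = t * s * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / ss_x)
    y0 = a + b * x0
    return (y0 - half, y0 + half)

print(interval(60, False))   # x0 = 60 inside the data range: interpolation
print(interval(90, True))    # x0 = 90 outside the range: extrapolation, much wider
```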
