
Outline

 Least Squares Methods


 Estimation: Least Squares
 Interpretation of estimators
 Properties of OLS estimators
 Variance of Y, b, and a
 Hypothesis Test of b and a
 ANOVA table
 Goodness-of-Fit and R2



Linear regression model
Y = 2 + .5X
[Figure: the line Y = 2 + .5X plotted for x from -1 to 5, with y on the vertical axis and x on the horizontal axis; the intercept is 2 and the slope is .5]



Terminology

 Dependent variable (DV) = response variable = left-hand side (LHS) variable
 Independent variables (IV) = explanatory variables = right-hand side (RHS) variables = regressors (excluding a or b0)
 a (b0) is an estimator of the parameter α (β0)
 b (b1) is an estimator of the parameter β (β1)
 a and b are the intercept and slope



Least Squares Method
 How do we draw such a line based on observed data points?
 Suppose an imaginary line y = a + bX
 Imagine the vertical distance (or error) between the line and a data point: ε = Y − E(Y)
 This error (or gap) is the deviation of the data point from the imaginary line, the regression line
 What are the best values of a and b?
 The a and b that minimize the sum of such errors (deviations of individual data points from the line)



Least Squares Method
[Figure: three data points x1, x2, x3 scattered around the regression line E(Y) = a + bX, with vertical errors e1, e2, e3 drawn between each point and the line]



Least Squares Method

 Deviations do not have good properties for computation
 Why do we use squared deviations? (e.g., as in the variance)
 Let us get the a and b that minimize the sum of squared deviations rather than the sum of deviations
 This method is called least squares



Least Squares Method
 The least squares method minimizes the sum of squared errors (deviations of individual data points from the regression line)
 Such a and b are called least squares estimators (estimators of the parameters α and β)
 The process of getting parameter estimators (e.g., a and b) is called estimation
 “Regress Y on X”
 The least squares method is the estimation method of ordinary least squares (OLS)



Ordinary Least Squares

 Ordinary least squares (OLS) = linear regression model = classical linear regression model
 Linear relationship between Y and Xs
 Constant slopes (coefficients of Xs)
 Least squares method
 Xs are fixed; Y is conditional on Xs
 Error is not related to Xs
 Constant variance of errors



Least Squares Method 1
Y = α + βX + ε
E(Y) = Ŷ = a + bX
ε = Y − Ŷ = Y − (a + bX) = Y − a − bX
ε² = (Y − Ŷ)² = (Y − a − bX)²
(Y − a − bX)² = Y² + a² + b²X² − 2aY − 2bXY + 2abX
Σε² = Σ(Y − Ŷ)² = Σ(Y − a − bX)²
Min Σε² = Min Σ(Y − a − bX)²

 How to get a and b that can minimize the sum of squares of errors?
Least Squares Method 2
• Linear algebraic solution
• Compute a and b so that partial derivatives
with respect to a and b are equal to zero


∂Σε²/∂a = ∂Σ(Y − a − bX)²/∂a = 2na − 2ΣY + 2bΣX = 0
na = ΣY − bΣX
a = ΣY/n − b·ΣX/n = Ȳ − bX̄



Least Squares Method 3
Take the partial derivative with respect to b and plug in the a you got: a = Ȳ − bX̄

∂Σε²/∂b = ∂Σ(Y − a − bX)²/∂b = 2bΣX² − 2ΣXY + 2aΣX = 0
bΣX² − ΣXY + aΣX = 0
bΣX² − ΣXY + (Ȳ − bX̄)ΣX = 0
bΣX² − ΣXY + (ΣY/n)ΣX − b(ΣX/n)ΣX = 0
b[ΣX² − (ΣX)²/n] − [ΣXY − ΣX·ΣY/n] = 0
b = [ΣXY − ΣX·ΣY/n] / [ΣX² − (ΣX)²/n] = [nΣXY − ΣX·ΣY] / [nΣX² − (ΣX)²]

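To make the first-order conditions concrete, here is a small illustrative sketch (not from the original slides) that builds the sum of squared errors symbolically and solves ∂SSE/∂a = 0 and ∂SSE/∂b = 0; it assumes sympy is available and uses the Example 10-5 data that appears later in these slides.

```python
# Hypothetical sketch: solve the two first-order conditions symbolically.
import sympy as sp

a, b = sp.symbols('a b')
x = [43, 48, 56, 61, 67, 70]        # Example 10-5 data (later slides)
y = [128, 120, 135, 143, 141, 152]

# Sum of squared errors as a function of a and b
sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# Set d(SSE)/da = 0 and d(SSE)/db = 0, then solve for a and b
solution = sp.solve([sp.diff(sse, a), sp.diff(sse, b)], [a, b])
print(solution)  # a is about 81.048, b is about .9644
```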


Least Squares Method 4
Least squares method is an algebraic solution
that minimizes the sum of squares of errors
(variance component of error)
n XY   X  Y  ( X  X )(Y  Y ) SP
b   xy

n X 2   X  (X  X )
2 2
SS x

a Y  b  X  Y  bX
n n

a     X  XY
Y X 2

n X 2   X 
2 Not recommended

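As a minimal sketch of the formulas above (an illustration, not part of the original slides), plain Python reproduces the Example 10-5 estimates computed on the next slides.

```python
# Minimal sketch of the OLS formulas: b = SPxy / SSx, a = Ybar - b * Xbar.
x = [43, 48, 56, 61, 67, 70]        # Example 10-5 data (next slides)
y = [128, 120, 135, 143, 141, 152]

n = len(x)
x_bar = sum(x) / n                  # 57.5
y_bar = sum(y) / n                  # 136.5

sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # SPxy = 541.5
ss_x = sum((xi - x_bar) ** 2 for xi in x)                          # SSx  = 561.5

b = sp_xy / ss_x                    # slope:     .9644
a = y_bar - b * x_bar               # intercept: 81.0481
print(a, b)
```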


OLS: Example 10-5 (1)
No x y x-xbar y-ybar (x-xb)(y-yb) (x-xbar)^2
1 43 128 -14.5 -8.5 123.25 210.25
2 48 120 -9.5 -16.5 156.75 90.25
3 56 135 -1.5 -1.5 2.25 2.25
4 61 143 3.5 6.5 22.75 12.25
5 67 141 9.5 4.5 42.75 90.25
6 70 152 12.5 15.5 193.75 156.25

Mean 57.5 136.5


Sum 345 819 541.5 561.5

b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = SPxy / SSx = 541.5 / 561.5 = .9644

a = Ȳ − bX̄ = 136.5 − .9644 × 57.5 = 81.0481


OLS: Example 10-5 (2), NO!
No x y xy x^2
1 43 128 5504 1849
2 48 120 5760 2304
3 56 135 7560 3136
4 61 143 8723 3721
5 67 141 9447 4489
6 70 152 10640 4900

Mean 57.5 136.5


Sum 345 819 47634 20399
n XY   X  Y 6  47634  345  819
b   .964
n X 2   X  6  20399  3452
2

a 
Y X   X  XY 819  20399  345  47634
2

  81.048
n X   X  6  20399  345
2 2 2

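The raw-sum (computational) form on this slide gives the same estimates; a brief illustrative sketch:

```python
# Computational (raw-sum) form of the OLS formulas from this slide.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                      # 345, 819
sum_xy = sum(xi * yi for xi, yi in zip(x, y))      # 47634
sum_x2 = sum(xi ** 2 for xi in x)                  # 20399

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)        # .964
a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)   # 81.048
print(a, b)
```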


OLS: Example 10-5 (3)
Ŷ = 81.048 + .964X
[Figure: scatter plot of y (about 120 to 150) against x (about 40 to 70) with the fitted regression line overlaid]



What Are a and b ?

 a is an estimator of its parameter α
 a is the intercept, the point of y where the regression line meets the y axis
 b is an estimator of its parameter β
 b is the slope of the regression line
 b is constant regardless of values of Xs
 b is more important than a since that is what researchers want to know



How to interpret b?

 For a unit increase in x, the expected change in y is b, holding other things (variables) constant
 For a unit increase in x, we expect y to increase by b, holding other things (variables) constant
 For a unit increase in x, we expect y to increase by .964, holding other variables constant



Properties of OLS estimators

 The outcome of the least squares method is the OLS parameter estimators a and b
 OLS estimators are linear
 OLS estimators are unbiased (their expected values equal the parameters)
 OLS estimators are efficient (small variance)
 Gauss-Markov Theorem: Among linear unbiased estimators, the least squares (OLS) estimator has minimum variance: BLUE (best linear unbiased estimator)



Hypothesis Test of a and b

 How reliable are the a and b we compute?
 A t-test (a Wald test in general) can answer this
 The test statistic is the standardized effect size (effect size / standard error)
 The effect size is a − 0 and b − 0, assuming 0 is the hypothesized value; H0: α=0, H0: β=0
 Degrees of freedom is N − K, where K is the number of regressors + 1
 How do we compute the standard error (standard deviation)?



Variance of b (1)
 b is a random variable that changes across
samples.
 b is a weighted sum (a linear combination) of the random variables Yi

b = β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = [ΣXY − nX̄Ȳ] / Σ(X − X̄)² = Σ(Xi − X̄)Yi / Σ(X − X̄)² = Σ wiYi
  = w1Y1 + w2Y2 + … + wnYn, where wi = (Xi − X̄) / Σ(Xi − X̄)²

Note: Σ(X − X̄)(Y − Ȳ) = Σ(XY − X̄Y − XȲ + X̄Ȳ) = ΣXY − X̄ΣY − ȲΣX + nX̄Ȳ = ΣXY − nX̄Ȳ − nX̄Ȳ + nX̄Ȳ = ΣXY − nX̄Ȳ
Variance of b (2)
 Variance of Y (error) is σ²
 Var(kY) = k²Var(Y) = k²σ²

b = β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = Σ wiYi

Var(β̂) = Var(Σ wiYi) = w1²Var(Y1) + w2²Var(Y2) + … + wn²Var(Yn)
       = w1²σ² + w2²σ² + … + wn²σ² = σ² Σ wi²

where wi = (Xi − X̄) / Σ(Xi − X̄)²

σ² Σ wi² = σ² Σ [(Xi − X̄) / Σ(Xi − X̄)²]² = σ² Σ(Xi − X̄)² / [Σ(Xi − X̄)²]² = σ² / Σ(Xi − X̄)²
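A quick numeric check (illustrative, using the Example 10-5 data) that the weights wi reproduce b and that Σwi² = 1/SSx, which is what makes Var(b) = σ²/SSx:

```python
# Check that b = sum(w_i * Y_i) with w_i = (x_i - xbar)/SSx, and sum(w_i^2) = 1/SSx.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)
x_bar = sum(x) / n

ss_x = sum((xi - x_bar) ** 2 for xi in x)        # SSx = 561.5
w = [(xi - x_bar) / ss_x for xi in x]            # weights w_i

b = sum(wi * yi for wi, yi in zip(w, y))         # .9644, same slope as before
sum_w2 = sum(wi ** 2 for wi in w)                # equals 1 / 561.5
print(b, sum_w2, 1 / ss_x)
```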


Variance of a
 a = Ȳ − bX̄
 Var(b) = σ²/SSx, where SSx = Σ(X − X̄)²
 Var(ΣY) = Var(Y1) + Var(Y2) + … + Var(Yn) = nσ²

Var(α̂) = Var(Ȳ − bX̄) = Var(Ȳ) + X̄²Var(b) − 2Cov(Ȳ, bX̄)      (the covariance term is zero)
       = Var(ΣY/n) + X̄² · σ² / Σ(Xi − X̄)²
       = (1/n²)·nσ² + X̄²σ² / Σ(Xi − X̄)²
       = σ² [1/n + X̄² / Σ(Xi − X̄)²]

 Now, how do we compute the variance of Y, σ²?
Variance of Y or error
 Variance of Y is based on residuals (errors), Y-
Yhat
 “Hat” means an estimator of the parameter
 Y hat is predicted (by a + bX) value of Y; plug
in x given a and b to get Y hat
 Since a regression model includes K
parameters (a and b in simple regression), the
degrees of freedom is N-K
 Numerator is SSE in the ANOVA table

σ̂² = se² = Σ(Yi − Ŷi)² / (N − K) = SSE / (N − K) = MSE
Illustration (1)
No x y x-xbar y-ybar (x-xb)(y-yb) (x-xbar)^2 yhat (y-yhat)^2
1 43 128 -14.5 -8.5 123.25 210.25 122.52 30.07
2 48 120 -9.5 -16.5 156.75 90.25 127.34 53.85
3 56 135 -1.5 -1.5 2.25 2.25 135.05 0.00
4 61 143 3.5 6.5 22.75 12.25 139.88 9.76
5 67 141 9.5 4.5 42.75 90.25 145.66 21.73
6 70 152 12.5 15.5 193.75 156.25 148.55 11.87
Mean 57.5 136.5
Sum 345 819 541.5 561.5 127.2876

se² = Σ(Yi − Ŷi)² / (N − K) = 127.2876 / (6 − 2) = 31.8219      (SSE = 127.2876, MSE = 31.8219)

Var(b) = Var(β̂) = σ̂² / Σ(Xi − X̄)² = 31.8219 / 561.5 = .0567 = .2381²

Var(a) = σ̂² [1/n + X̄² / Σ(Xi − X̄)²] = 31.8219 × (1/6 + 57.5²/561.5) = 13.8809²

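The whole illustration can be reproduced in a few lines of Python (a sketch with the same data, not from the original slides):

```python
import math

x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n, k = len(x), 2                                   # two parameters: a and b

x_bar, y_bar = sum(x) / n, sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]                          # predicted values
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # SSE = 127.2876
mse = sse / (n - k)                                       # MSE = 31.8219

var_b = mse / ss_x                                 # .0567
var_a = mse * (1 / n + x_bar ** 2 / ss_x)          # about 192.7
print(math.sqrt(var_b), math.sqrt(var_a))          # SE(b) = .2381, SE(a) = 13.8809
```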


Illustration (2): Test b
 How to test whether β is zero (no effect)?
 Like Y, the estimators a and b are normally distributed; with the estimated standard errors, their test statistics follow the t distribution
 b = .9644, SE(b) = .2381, df = N − K = 6 − 2 = 4
 Hypothesis Testing
 1. H0: β=0 (no effect), Ha: β≠0 (two-tailed)
 2. Significance level = .05, CV = 2.776, df = 6 − 2 = 4
 3. TS = (.9644 − 0)/.2381 = 4.0510 ~ t(N−K)
 4. TS (4.051) > CV (2.776), Reject H0
 5. β (not b) is not zero. There is a significant impact of X on Y

β1: b1 ± tα/2 · se(b1) = .9644 ± 2.776 × .2381 = (.3034, 1.6254)

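A short scripted version of this test (the critical value 2.776 is the tabled t with 4 df used on the slide):

```python
# t test of H0: beta = 0, using the numbers from this slide.
b, se_b, df = 0.9644, 0.2381, 4
cv = 2.776                            # t critical value, alpha = .05 two-tailed, 4 df

ts = (b - 0) / se_b                   # 4.051
reject = abs(ts) > cv                 # True -> reject H0
ci = (b - cv * se_b, b + cv * se_b)   # 95% CI: about (.30, 1.63)
print(ts, reject, ci)
```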


Illustration (3): Test a
 How to test whether α is zero?
 Like Y, the estimators a and b are normally distributed; with the estimated standard errors, their test statistics follow the t distribution
 a = 81.0481, SE(a) = 13.8809, df = N − K = 6 − 2 = 4
 Hypothesis Testing
 1. H0: α=0, Ha: α≠0 (two-tailed)
 2. Significance level = .05, CV = 2.776
 3. TS = (81.0481 − 0)/13.8809 = 5.8388 ~ t(N−K)
 4. TS (5.839) > CV (2.776), Reject H0
 5. α (not a) is not zero. The intercept is discernible from zero (significant intercept).

α: a ± tα/2 · se(a) = 81.0481 ± 2.776 × 13.8809 = (42.52, 119.58)

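The same pattern, applied to the intercept (illustrative sketch):

```python
# t test of H0: alpha = 0, using the numbers from this slide.
a, se_a, df = 81.0481, 13.8809, 4
cv = 2.776                            # t critical value, alpha = .05 two-tailed, 4 df

ts = (a - 0) / se_a                   # 5.839
reject = abs(ts) > cv                 # True -> reject H0
ci = (a - cv * se_a, a + cv * se_a)   # 95% CI: about (42.5, 119.6)
print(ts, reject, ci)
```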


Questions

 How do we test H0: β0(α) = β1 = β2 = … = 0?
 Remember that a t-test compares only two group means, while ANOVA compares more than two group means simultaneously
 The same logic applies in linear regression
 Construct the ANOVA table by partitioning the variance of Y; the F test examines the H0 above
 The ANOVA table provides key information about a regression model



Partitioning Variance of Y (1)
Ŷ = 81.048 + .964X, Ȳ = 136.5
[Figure: scatter of y against x with the fitted line Ŷ = 81 + .96X and the horizontal line Ȳ = 136.5, showing each Yi relative to both]



Partitioning Variance of Y (2)
yi − ȳ = (ŷi − ȳ) + (yi − ŷi)
Total = Model + Residual (Error)

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
Total = Model + Residual (Error)

SST = SSy = Σ(Yi − Ȳ)² = ΣYi² − nȲ²
SSM = Σ(Ŷi − Ȳ)²
SSE = Σ(Yi − Ŷi)²
s² = Σ(Yi − Ŷi)² / (N − K) = SSE/(N − K) = MSE



Partitioning Variance of Y (3)
81+.96X
No x y yhat (y-ybar)^2 (yhat-ybar)^2 (y-yhat)^2
1 43 128 122.52 72.25 195.54 30.07
2 48 120 127.34 272.25 83.94 53.85
3 56 135 135.05 2.25 2.09 0.00
4 61 143 139.88 42.25 11.39 9.76
5 67 141 145.66 20.25 83.94 21.73
6 70 152 148.55 240.25 145.32 11.87
Mean 57.5 136.5 SST SSM SSE
Sum 345 819 649.5000 522.2124 127.2876

•122.52 = 81.048 + .964×43, 148.55 = 81.048 + .964×70
•SST = SSM + SSE, 649.5 = 522.2 + 127.3

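A brief sketch (same data, illustrative) confirming the partition SST = SSM + SSE:

```python
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)
y_bar = sum(y) / n

a, b = 81.0481, 0.9644                                   # estimates from earlier slides
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # 649.5
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)             # about 522.2
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # about 127.3
print(sst, ssm + sse)                                    # equal up to rounding
```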


ANOVA Table
 H0: all parameters are zero, β0 = β1 = 0
 Ha: at least one parameter is not zero
 CV is 12.22 with (1, 4) degrees of freedom; TS > CV, reject H0

Sources Sum of Squares DF Mean Squares F


Model SSM K-1 MSM=SSM/(K-1) MSM/MSE
Residual SSE N-K MSE=SSE/(N-K)
Total SST N-1
Sources Sum of Squares DF Mean Squares F
Model 522.2124 1 522.2124 16.41047
Residual 127.2876 4 31.8219
Total 649.5000 5

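The entries of the ANOVA table follow directly from the sums of squares; a minimal sketch:

```python
# ANOVA quantities for the simple regression (K = 2 parameters, N = 6).
ssm, sse, sst = 522.2124, 127.2876, 649.5000
n, k = 6, 2

msm = ssm / (k - 1)          # 522.2124
mse = sse / (n - k)          # 31.8219
f = msm / mse                # 16.41
print(msm, mse, f)
```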


R2 and Goodness-of-fit
 Goodness-of-fit measures evaluate how well a regression model fits the data
 The smaller the SSE, the better the model fits
 The F test examines whether all parameters are zero (a large F and a small p-value indicate good fit)
 R² (coefficient of determination) is SSM/SST; it measures how much of the overall variance of Y the model explains
 R² = SSM/SST = 522.2/649.5 = .80
 A large R² means the model fits the data well



Myth and Misunderstanding in R2

 R square is the squared Karl Pearson correlation coefficient: r² = .8967² = .80
 If a regression model includes many regressors, R² is less useful, if not useless
 Adding any regressor always increases R², regardless of the relevance of the regressor
 Adjusted R² gives a penalty for adding regressors: Adj. R² = 1 − [(N−1)/(N−K)](1−R²)
 R² is not a panacea, although its interpretation is intuitive; if the intercept is omitted, R² is incorrect
 Check the specification, F, SSE, and individual parameter estimators to evaluate your model; a model with smaller R² can be better in some cases

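Both R² and the adjusted R² penalty are easy to compute from the ANOVA quantities (illustrative sketch):

```python
# R-squared and adjusted R-squared from the ANOVA sums of squares.
ssm, sst = 522.2124, 649.5000
n, k = 6, 2

r2 = ssm / sst                                   # about .80
adj_r2 = 1 - (n - 1) / (n - k) * (1 - r2)        # penalizes extra regressors
print(r2, adj_r2)
```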


Interpolation and Extrapolation
 Confidence interval of E(Y|X), where x is within the range of the data x: interpolation
 Confidence interval of Y|X, where x is beyond the range of the data x: extrapolation
 Extrapolation involves a penalty and danger, which widens the confidence interval; it is less reliable

ŷ ± tα/2 · s · √(1/n + (x − x̄)²/SSx)
ŷ ± tα/2 · s · √(1 + 1/n + (x − x̄)²/SSx)

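A sketch of both interval formulas; the evaluation points x0 = 60 and x0 = 90 are hypothetical choices for illustration (60 lies inside the observed x range, 90 outside):

```python
import math

# Interval half-widths from the two formulas above (illustrative).
x = [43, 48, 56, 61, 67, 70]
n = len(x)
x_bar = sum(x) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)       # 561.5

a, b = 81.0481, 0.9644
s = math.sqrt(31.8219)                          # sqrt(MSE)
t = 2.776                                       # t critical value, 4 df, alpha = .05

def interval(x0, individual):
    """Interval for the mean (individual=False) or for a new Y (individual=True)."""
    extra = 1.0 if individual else 0.0
    half = t * s * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / ss_x)
    y0 = a + b * x0
    return (y0 - half, y0 + half)

print(interval(60, False))   # x0 = 60 inside the data range: interpolation
print(interval(90, True))    # x0 = 90 outside the range: extrapolation, much wider
```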
