
Chapter 3.

Two-Variable Regression Model:
The Problem of Estimation
Ordinary Least Squares Method (OLS)
Recall the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$

Thus, since the PRF is not directly observable, it is estimated by the SRF; that is,

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$$

and

$$\hat{u}_i = Y_i - \hat{Y}_i$$
More on the Error Term

If

$$Y_i = \hat{Y}_i + \hat{u}_i$$

then

$$\hat{u}_i = Y_i - \hat{Y}_i$$

and

$$\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$$
More on the Error Term
We need to choose the SRF in such a way that the error terms are as small as possible.

That is, the sum of residuals,

$$\sum \hat{u}_i = \sum \left( Y_i - \hat{Y}_i \right),$$

 should be as SMALL as possible.

More on Error Terms
Therefore, the essential task is to find a criterion that minimizes the error disturbances in the SRF, so that all of the errors are as close as possible to the central line of the SRF.

However, the simple sum of residuals is a weak criterion, because large positive and large negative residuals can cancel each other out.

Then, the Least Squares Criterion Comes as a Solution

The Least Squares Criterion is based on:

$$\sum \hat{u}_i^2 = \sum \left( Y_i - \hat{Y}_i \right)^2 = \sum \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \right)^2$$

Thus,

$$\sum \hat{u}_i^2 = f\left( \hat{\beta}_1, \hat{\beta}_2 \right)$$

Example of the Least Squares Criterion

Which model is better, and why? The sum of squares of the error disturbances of the second model is lower, so the second model fits the data better under this criterion.
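To make the comparison concrete, here is a minimal Python sketch (the numbers are hypothetical, not the data from the slides) that evaluates the residual sum of squares for two candidate regression lines; under the least squares criterion, the line with the smaller sum fits the data better.

```python
# Minimal sketch of the least squares criterion (hypothetical data).
# The residual sum of squares depends only on the chosen coefficients: RSS = f(b1, b2).

X = [80, 100, 120, 140, 160]   # hypothetical income levels
Y = [65, 70, 84, 93, 107]      # hypothetical consumption levels

def rss(b1, b2):
    """Sum of squared residuals for a candidate line Y = b1 + b2*X."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(X, Y))

# Two arbitrary candidate lines: the one with the smaller RSS fits the data better.
print("RSS of model 1:", rss(10.0, 0.50))
print("RSS of model 2:", rss(20.0, 0.55))
```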
Regression Equation

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$$

The OLS estimators are:

$$\hat{\beta}_2 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2}$$

$$\hat{\beta}_1 = \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \bar{Y} - \hat{\beta}_2 \bar{X}$$

where $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y, and $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$ are deviations from those means.
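As a sketch of how these formulas are applied in practice (again with hypothetical numbers), the following computes β̂2 from the deviation form Σxiyi / Σxi² and then β̂1 = Ȳ - β̂2X̄.

```python
# Minimal sketch: OLS estimates from the closed-form formulas (hypothetical data).

X = [80, 100, 120, 140, 160]   # hypothetical income levels
Y = [65, 70, 84, 93, 107]      # hypothetical consumption levels

n = len(X)
x_bar = sum(X) / n             # sample mean of X
y_bar = sum(Y) / n             # sample mean of Y

# Deviations from the sample means: x_i = X_i - X_bar, y_i = Y_i - Y_bar
x = [xi - x_bar for xi in X]
y = [yi - y_bar for yi in Y]

# Slope and intercept from the formulas above
beta2_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
beta1_hat = y_bar - beta2_hat * x_bar

print(f"beta1_hat = {beta1_hat:.4f}, beta2_hat = {beta2_hat:.4f}")
```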
The Classical Linear Regression Model (CLRM):
The Assumptions Underlying The Method of Least Squares

 The inferences about the true β1 and β2 are important because their estimated values need to be as close as possible to the population values.

 Therefore the CLRM, which is the cornerstone of most econometric theory, makes 10 assumptions.
Assumptions of CLRM:
 Assumption 1. Linear Regression Model
 The regression model is linear in the parameters, that is:
Yi = 1 + 2 Xi + ui

 Assumption 2. X values are fixed in repeated sampling.
 More technically, X is assumed to be non-stochastic.
 X: $80 income level  Y: $60 weekly consumption of a family
 X: $80 income level  Y: $75 weekly consumption of another family

Because of Assumption 2, our analysis is known as conditional regression analysis, that is, analysis conditional on the given values of the regressor(s) X.
Assumption 3. Zero Mean Value of disturbance ui:

$$E(u_i \mid X_i) = 0$$

Assumption 4. Homoscedasticity or Equal Variance of ui:

$$\operatorname{var}(u_i \mid X_i) = E\left[u_i - E(u_i \mid X_i)\right]^2 = E(u_i^2 \mid X_i) = \sigma^2,$$

where the second equality follows from Assumption 3 and var stands for variance.
Homoscedasticity vs. Heteroscedasticity

Under homoscedasticity, $\operatorname{var}(u_i \mid X_i) = \sigma^2$ for all i; under heteroscedasticity, $\operatorname{var}(u_i \mid X_i) = \sigma_i^2$, that is, the variance differs across observations.
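For intuition, here is a minimal simulation sketch (hypothetical, using numpy, which the slides do not reference) that generates disturbances with a constant variance and disturbances whose variance grows with X.

```python
# Sketch: homoscedastic vs. heteroscedastic disturbances (hypothetical simulation).
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(10, 100, 200)

# Homoscedastic: the same constant standard deviation for every X_i.
u_homo = rng.normal(0.0, 5.0, size=X.size)

# Heteroscedastic: the standard deviation (hence variance) grows with X_i.
u_hetero = rng.normal(0.0, 0.1 * X)

print(f"homoscedastic spread, low vs high X:  {u_homo[:100].std():.2f} vs {u_homo[100:].std():.2f}")
print(f"heteroscedastic spread, low vs high X: {u_hetero[:100].std():.2f} vs {u_hetero[100:].std():.2f}")
```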
Assumption 5. No Autocorrelation between the disturbances

If
PRF: $Y_t = \beta_1 + \beta_2 X_t + u_t$
and if $u_t$ and $u_{t-1}$ are correlated, then $Y_t$ depends not only on $X_t$ but also on $u_{t-1}$.

Autocorrelation in Graphs (figures)
Assumption 6. Zero Covariance between ui and Xi.
Assumption 7. The number of observations n must be greater than the number of parameters to be estimated.
Assumption 8. Variability in X values.
Assumption 9. The regression model is correctly specified.
Assumption 10. There is No Perfect Multicollinearity
 That is, there is no perfect linear relationship among the explanatory variables.

$$Y_t = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + u_t$$

 High correlation among the independent variables causes multicollinearity, which inflates the standard errors of the estimates and makes hypothesis tests unreliable (low t values), etc.
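As a quick illustration of how near-perfect collinearity can be detected (a hypothetical two-regressor example, beyond the two-variable model of this chapter), one can simply inspect the correlation between the explanatory variables.

```python
# Sketch: checking for high correlation between two regressors (hypothetical data).
import numpy as np

X1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X2 = 2.0 * X1 + np.array([0.1, -0.2, 0.0, 0.3, -0.1])   # almost an exact linear function of X1

corr = np.corrcoef(X1, X2)[0, 1]
print(f"correlation between X1 and X2: {corr:.4f}")      # close to 1 signals near-perfect multicollinearity
```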
Properties of the Least-Squares Estimators:
The Gauss-Markov Theorem
 The Gauss-Markov Theorem combines the least squares approach of Gauss (1821) with the minimum variance approach of Markov (1900).

 The standard error of estimate is simply the standard deviation of the Y values about the estimated regression line and is often used as a summary measure of the "goodness of fit" of the estimated regression line.
BLUE (Best Linear Unbiased Estimator)

1. An estimator is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model.
2. An estimator is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value, β2.
3. An estimator has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.

Therefore, in the regression context it can be proved that the OLS estimators are BLUE, which is precisely what the Gauss-Markov Theorem states.
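As an informal check of the unbiasedness property, the following Monte Carlo sketch (a hypothetical setup with assumed "true" parameters) draws many samples with fixed X values and shows that the average of the OLS slope estimates is close to the true β2.

```python
# Sketch: Monte Carlo check that E(beta2_hat) is close to beta2 (hypothetical setup).
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2, sigma = 3.0, 0.5, 2.0        # assumed "true" population parameters
X = np.linspace(10, 100, 50)               # X fixed in repeated sampling (Assumption 2)

estimates = []
for _ in range(5000):
    u = rng.normal(0.0, sigma, size=X.size)            # zero-mean, homoscedastic errors
    Y = beta1 + beta2 * X + u
    x, y = X - X.mean(), Y - Y.mean()
    estimates.append((x * y).sum() / (x ** 2).sum())   # OLS slope for this sample

print(f"average beta2_hat over samples: {np.mean(estimates):.4f} (true beta2 = {beta2})")
```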
The Coefficient of Determination, r2:
A Measure of “Goodness of Fit”

The coefficient of determination, r² (two-variable case) or R² (multiple regression), is a summary measure that tells how well the sample regression line fits the data.
The Ballentine View of R2

See Peter Kennedy, "Ballentine: A Graphical Aid for Econometrics", Australian Economic Papers, Vol. 20, 1981, pp. 414-416. The name Ballentine is derived from the emblem of the well-known Ballantine beer with its circles.
Coefficient of Determination, r2
TSS = ESS + RSS

where:
TSS = total sum of squares
ESS = explained sum of squares
RSS = residual sum of squares

If TSS = ESS + RSS, then:

$$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$$

that is,

$$\sum \left( Y_i - \bar{Y} \right)^2 = \sum \left( \hat{Y}_i - \bar{Y} \right)^2 + \sum \hat{u}_i^2$$
More on r²:
r² indicates the explained part of the regression model; therefore,

$$r^2 = \frac{ESS}{TSS}$$

and

$$r^2 = \frac{\sum \left( \hat{Y}_i - \bar{Y} \right)^2}{\sum \left( Y_i - \bar{Y} \right)^2}$$

Alternatively,

$$r^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum \left( Y_i - \bar{Y} \right)^2} = 1 - \frac{RSS}{TSS}$$
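Here is a minimal sketch (hypothetical data, continuing the earlier numerical example) that computes TSS, ESS, and RSS and verifies that the two expressions for r² agree.

```python
# Sketch: TSS = ESS + RSS decomposition and r^2 (hypothetical data).

X = [80, 100, 120, 140, 160]
Y = [65, 70, 84, 93, 107]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# OLS coefficients from the deviation-form formulas
beta2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) \
        / sum((xi - x_bar) ** 2 for xi in X)
beta1 = y_bar - beta2 * x_bar
Y_hat = [beta1 + beta2 * xi for xi in X]

TSS = sum((yi - y_bar) ** 2 for yi in Y)                 # total sum of squares
ESS = sum((fi - y_bar) ** 2 for fi in Y_hat)             # explained sum of squares
RSS = sum((yi - fi) ** 2 for yi, fi in zip(Y, Y_hat))    # residual sum of squares

print(f"r^2 = ESS/TSS     = {ESS / TSS:.4f}")
print(f"r^2 = 1 - RSS/TSS = {1 - RSS / TSS:.4f}")        # the two forms agree
```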
Coefficient of Determination (figures)

Data Examples (tables)
Illustrative Examples
HW # Assignment 1: (CLO-1)
Take some two-variable cross-sectional data from your area of interest (based on some economic/finance model) and perform both a manual and a computer-based regression analysis. You are also required to bring the data to the next class for practice and evaluation of your individual assignments.
Caution: Every student must make sure that he/she has a different data set for doing this assignment.
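As a starting point for the computer-based part, here is a minimal sketch using the statsmodels library (the series below are placeholders; substitute your own data set).

```python
# Starter sketch for the computer-based regression (replace with your own data).
import statsmodels.api as sm

# Placeholder two-variable cross-sectional data: substitute your own series.
income = [80, 100, 120, 140, 160, 180, 200]
consumption = [65, 70, 84, 93, 107, 115, 136]

X = sm.add_constant(income)            # adds the intercept term (beta_1)
model = sm.OLS(consumption, X).fit()   # estimates Y = beta_1 + beta_2 * X + u

print(model.params)                    # estimated beta_1 and beta_2
print(model.rsquared)                  # coefficient of determination r^2
print(model.summary())                 # full regression output
```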
