
Chapter 3.

Two-Variable Regression Model:
The Problem of Estimation
Ordinary Least Squares Method (OLS)
Recall the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$

Thus, since the PRF is not directly observable, it is estimated by the SRF; that is,

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$$

and

$$\hat{u}_i = Y_i - \hat{Y}_i$$
More on the Error Term

If

$$Y_i = \hat{Y}_i + \hat{u}_i$$

then

$$\hat{u}_i = Y_i - \hat{Y}_i$$

and

$$\hat{u}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$$
More on the Error Term
We need to choose the SRF in such a way that the error terms are as small as possible.

That is, the sum of residuals,

$$\sum \hat{u}_i = \sum \left( Y_i - \hat{Y}_i \right),$$

 should be as SMALL as possible.

More on Error Terms
Therefore, the essential task is to find a criterion that minimizes the error disturbances in the SRF, so that all of the errors are as close as possible to the central line of the SRF.

However, the simple sum of residuals is a weak criterion, because large positive and large negative residuals can cancel each other out.

Then, the Least Squares Criterion Comes as a Solution

The Least Squares Criterion is based on:

$$\sum \hat{u}_i^2 = \sum \left( Y_i - \hat{Y}_i \right)^2 = \sum \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i \right)^2$$

Thus,

$$\sum \hat{u}_i^2 = f\left( \hat{\beta}_1, \hat{\beta}_2 \right)$$

Example of the Least Squares Criterion

Which model is better, and why? The sum of squares of the error disturbances of the second model is lower, so the second model fits the data better under this criterion.
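To make the comparison concrete, here is a minimal Python sketch (the numbers are hypothetical, not the data from the slides) that evaluates the residual sum of squares for two candidate regression lines; under the least squares criterion, the line with the smaller sum fits the data better.

```python
# Minimal sketch of the least squares criterion (hypothetical data).
# The residual sum of squares depends only on the chosen coefficients: RSS = f(b1, b2).

X = [80, 100, 120, 140, 160]   # hypothetical income levels
Y = [65, 70, 84, 93, 107]      # hypothetical consumption levels

def rss(b1, b2):
    """Sum of squared residuals for a candidate line Y = b1 + b2*X."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(X, Y))

# Two arbitrary candidate lines: the one with the smaller RSS fits the data better.
print("RSS of model 1:", rss(10.0, 0.50))
print("RSS of model 2:", rss(20.0, 0.55))
```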
Regression Equation

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$$

The OLS estimators are:

$$\hat{\beta}_2 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2}$$

$$\hat{\beta}_1 = \frac{\sum X_i^2 \sum Y_i - \sum X_i \sum X_i Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2} = \bar{Y} - \hat{\beta}_2 \bar{X}$$

where $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y, and $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$ are deviations from those means.
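As a sketch of how these formulas are applied in practice (again with hypothetical numbers), the following computes β̂2 from the deviation form Σxiyi / Σxi² and then β̂1 = Ȳ - β̂2X̄.

```python
# Minimal sketch: OLS estimates from the closed-form formulas (hypothetical data).

X = [80, 100, 120, 140, 160]   # hypothetical income levels
Y = [65, 70, 84, 93, 107]      # hypothetical consumption levels

n = len(X)
x_bar = sum(X) / n             # sample mean of X
y_bar = sum(Y) / n             # sample mean of Y

# Deviations from the sample means: x_i = X_i - X_bar, y_i = Y_i - Y_bar
x = [xi - x_bar for xi in X]
y = [yi - y_bar for yi in Y]

# Slope and intercept from the formulas above
beta2_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
beta1_hat = y_bar - beta2_hat * x_bar

print(f"beta1_hat = {beta1_hat:.4f}, beta2_hat = {beta2_hat:.4f}")
```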
The Classical Linear Regression Model (CLRM):
The Assumptions Underlying The Method of Least Squares

 The inferences about the true β1 and β2 are important because their estimated values need to be as close as possible to the population values.

 Therefore the CLRM, which is the cornerstone of most econometric theory, makes 10 assumptions.
Assumptions of CLRM:
 Assumption 1. Linear Regression Model
 The regression model is linear in the parameters, that is:
Yi = 1 + 2 Xi + ui

 Assumption 2. X values are fixed in repeated sampling.
 More technically, X is assumed to be non-stochastic.
 X: $80 income level  Y: $60 weekly consumption of a family
 X: $80 income level  Y: $75 weekly consumption of another family

Because of Assumption 2, our analysis is known as conditional regression analysis, that is, analysis conditional on the given values of the regressor(s) X.
Assumption 3. Zero Mean Value of disturbance ui:

$$E(u_i \mid X_i) = 0$$

Assumption 4. Homoscedasticity or Equal Variance of ui:

$$\operatorname{var}(u_i \mid X_i) = E\left[u_i - E(u_i \mid X_i)\right]^2 = E(u_i^2 \mid X_i) = \sigma^2,$$

where the second equality follows from Assumption 3 and var stands for variance.
Homoscedasticity vs. Heteroscedasticity

Under homoscedasticity, $\operatorname{var}(u_i \mid X_i) = \sigma^2$ for all i; under heteroscedasticity, $\operatorname{var}(u_i \mid X_i) = \sigma_i^2$, that is, the variance differs across observations.
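For intuition, here is a minimal simulation sketch (hypothetical, using numpy, which the slides do not reference) that generates disturbances with a constant variance and disturbances whose variance grows with X.

```python
# Sketch: homoscedastic vs. heteroscedastic disturbances (hypothetical simulation).
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(10, 100, 200)

# Homoscedastic: the same constant standard deviation for every X_i.
u_homo = rng.normal(0.0, 5.0, size=X.size)

# Heteroscedastic: the standard deviation (hence variance) grows with X_i.
u_hetero = rng.normal(0.0, 0.1 * X)

print(f"homoscedastic spread, low vs high X:  {u_homo[:100].std():.2f} vs {u_homo[100:].std():.2f}")
print(f"heteroscedastic spread, low vs high X: {u_hetero[:100].std():.2f} vs {u_hetero[100:].std():.2f}")
```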
Assumption 5. No Autocorrelation between the disturbances

If
PRF: $Y_t = \beta_1 + \beta_2 X_t + u_t$
and if $u_t$ and $u_{t-1}$ are correlated, then $Y_t$ depends not only on $X_t$ but also on $u_{t-1}$.

Autocorrelation in Graphs (figures)
Assumption 6. Zero Covariance between ui and Xi.
Assumption 7. The number of observations n must be greater than the number of parameters to be estimated.
Assumption 8. Variability in X values.
Assumption 9. The regression model is correctly specified.
Assumption 10. There is No Perfect Multicollinearity
 That is, there is no perfect linear relationship among the explanatory variables.

$$Y_t = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + u_t$$

 High correlation among the independent variables causes multicollinearity, which inflates the standard errors of the estimates and makes hypothesis tests unreliable (low t values), etc.
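As a quick illustration of how near-perfect collinearity can be detected (a hypothetical two-regressor example, beyond the two-variable model of this chapter), one can simply inspect the correlation between the explanatory variables.

```python
# Sketch: checking for high correlation between two regressors (hypothetical data).
import numpy as np

X1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
X2 = 2.0 * X1 + np.array([0.1, -0.2, 0.0, 0.3, -0.1])   # almost an exact linear function of X1

corr = np.corrcoef(X1, X2)[0, 1]
print(f"correlation between X1 and X2: {corr:.4f}")      # close to 1 signals near-perfect multicollinearity
```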
Properties of the Least-Squares Estimators:
The Gauss-Markov Theorem
 The Gauss-Markov Theorem combines the least squares approach of Gauss (1821) with the minimum variance approach of Markov (1900).

 The standard error of estimate is simply the standard deviation of the Y values about the estimated regression line and is often used as a summary measure of the "goodness of fit" of the estimated regression line.
BLUE (Best Linear Unbiased Estimator)

1. An estimator is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model.
2. An estimator is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value, β2.
3. An estimator has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.

Therefore, in the regression context it can be proved that the OLS estimators are BLUE, which is precisely what the Gauss-Markov Theorem states.
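As an informal check of the unbiasedness property, the following Monte Carlo sketch (a hypothetical setup with assumed "true" parameters) draws many samples with fixed X values and shows that the average of the OLS slope estimates is close to the true β2.

```python
# Sketch: Monte Carlo check that E(beta2_hat) is close to beta2 (hypothetical setup).
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2, sigma = 3.0, 0.5, 2.0        # assumed "true" population parameters
X = np.linspace(10, 100, 50)               # X fixed in repeated sampling (Assumption 2)

estimates = []
for _ in range(5000):
    u = rng.normal(0.0, sigma, size=X.size)            # zero-mean, homoscedastic errors
    Y = beta1 + beta2 * X + u
    x, y = X - X.mean(), Y - Y.mean()
    estimates.append((x * y).sum() / (x ** 2).sum())   # OLS slope for this sample

print(f"average beta2_hat over samples: {np.mean(estimates):.4f} (true beta2 = {beta2})")
```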
The Coefficient of Determination, r2:
A Measure of “Goodness of Fit”

The coefficient of determination, r² (two-variable case) or R² (multiple regression), is a summary measure that tells how well the sample regression line fits the data.
The Ballentine View of R2

See Peter Kennedy, "Ballentine: A Graphical Aid for Econometrics", Australian Economic Papers, Vol. 20, 1981, pp. 414-416. The name Ballentine is derived from the emblem of the well-known Ballantine beer with its circles.
Coefficient of Determination, r2
TSS = ESS + RSS

where:
TSS = total sum of squares
ESS = explained sum of squares
RSS = residual sum of squares

If TSS = ESS + RSS, then:

$$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS}$$

that is,

$$\sum \left( Y_i - \bar{Y} \right)^2 = \sum \left( \hat{Y}_i - \bar{Y} \right)^2 + \sum \hat{u}_i^2$$
More on r²:
r² indicates the explained part of the regression model; therefore,

$$r^2 = \frac{ESS}{TSS}$$

and

$$r^2 = \frac{\sum \left( \hat{Y}_i - \bar{Y} \right)^2}{\sum \left( Y_i - \bar{Y} \right)^2}$$

Alternatively,

$$r^2 = 1 - \frac{\sum \hat{u}_i^2}{\sum \left( Y_i - \bar{Y} \right)^2} = 1 - \frac{RSS}{TSS}$$
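Here is a minimal sketch (hypothetical data, continuing the earlier numerical example) that computes TSS, ESS, and RSS and verifies that the two expressions for r² agree.

```python
# Sketch: TSS = ESS + RSS decomposition and r^2 (hypothetical data).

X = [80, 100, 120, 140, 160]
Y = [65, 70, 84, 93, 107]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# OLS coefficients from the deviation-form formulas
beta2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) \
        / sum((xi - x_bar) ** 2 for xi in X)
beta1 = y_bar - beta2 * x_bar
Y_hat = [beta1 + beta2 * xi for xi in X]

TSS = sum((yi - y_bar) ** 2 for yi in Y)                 # total sum of squares
ESS = sum((fi - y_bar) ** 2 for fi in Y_hat)             # explained sum of squares
RSS = sum((yi - fi) ** 2 for yi, fi in zip(Y, Y_hat))    # residual sum of squares

print(f"r^2 = ESS/TSS     = {ESS / TSS:.4f}")
print(f"r^2 = 1 - RSS/TSS = {1 - RSS / TSS:.4f}")        # the two forms agree
```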
Coefficient of Determination (figures)

Data Examples (tables)
Illustrative Examples
HW # Assignment 1: (CLO-1)
Take some two-variable cross-sectional data from your area of interest (based on some economic/finance model) and perform both a manual and a computer-based regression analysis. You are also required to bring the data to the next class for practice and evaluation of your individual assignments.
Caution: Every student must make sure that he/she has a different data set for doing this assignment.
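As a starting point for the computer-based part, here is a minimal sketch using the statsmodels library (the series below are placeholders; substitute your own data set).

```python
# Starter sketch for the computer-based regression (replace with your own data).
import statsmodels.api as sm

# Placeholder two-variable cross-sectional data: substitute your own series.
income = [80, 100, 120, 140, 160, 180, 200]
consumption = [65, 70, 84, 93, 107, 115, 136]

X = sm.add_constant(income)            # adds the intercept term (beta_1)
model = sm.OLS(consumption, X).fit()   # estimates Y = beta_1 + beta_2 * X + u

print(model.params)                    # estimated beta_1 and beta_2
print(model.rsquared)                  # coefficient of determination r^2
print(model.summary())                 # full regression output
```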
