TABLE OF CONTENTS
1. INTRODUCTION
2. ASSUMPTIONS OF GAUSS MARKOV THEOREM
3. GAUSS MARKOV THEOREM AND PROOF
3.1. PROOF THAT OLS ESTIMATORS ARE LINEAR AND UNBIASED
3.2. PROOF THAT OLS ESTIMATOR IS EFFICIENT
3.3. PROOF THAT OLS ESTIMATOR IS CONSISTENT
4. GOODNESS OF FIT
4.1. MEASURES OF VARIATION
4.2. COEFFICIENT OF DETERMINATION
4.3. COEFFICIENT OF CORRELATION
5. SUMMARY
1. INTRODUCTION
Using OLS we estimate the parameters $\hat{\beta}_1$ and $\hat{\beta}_2$ of the sample regression function. However, these estimates come from the sample regression function, so we need to make some assumptions about the population regression function in order that the sample estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ can be used to make inferences about the population parameters $\beta_1$ and $\beta_2$. These sets of assumptions are known as the Classical Linear Regression Model (CLRM) assumptions.
Under these assumptions the OLS estimators have very good statistical properties, so these assumptions are also known as the Gauss Markov Theorem assumptions. We now look at those Gauss Markov assumptions for the Classical Linear Regression Model (CLRM).
2. ASSUMPTIONS OF GAUSS MARKOV THEOREM
Assumption 1: (Linearity): The regression model is linear in the parameters:
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
Assumption 2: (Non-stochastic X): The values of the explanatory variable(s) are fixed in repeated sampling, that is, non-stochastic.
Assumption 3: (Zero Mean of Disturbance): Given the value of $X_i$, the mean value of the disturbance term is zero:
$$E(u_i \mid X_i) = 0$$
This assumption also implies that information which is not captured by the explanatory variable(s), and which therefore falls into the error term, is not related to the explanatory variable(s) and hence does not systematically affect the dependent variable.
Assumption 4: (Homoscedasticity): The variance of the disturbance term is the same for all observations:
$$\mathrm{var}(u_i \mid X_i) = \sigma^2$$
By definition,
$$\mathrm{var}(u_i \mid X_i) = E\big[u_i - E(u_i \mid X_i)\big]^2 = E(u_i^2 \mid X_i) = \sigma^2$$
Assumption 5: (No Autocorrelation): The correlation between any two disturbance terms $u_i$ and $u_j$ ($i \neq j$), given any two values $X_i$ and $X_j$, is zero:
$$\mathrm{cov}(u_i, u_j \mid X_i, X_j) = E\big\{[u_i - E(u_i)] \mid X_i\big\}\big\{[u_j - E(u_j)] \mid X_j\big\} = E(u_i \mid X_i)\,E(u_j \mid X_j) = 0$$
Assumption 6: (Zero Covariance between $u_i$ and $X_i$): The disturbance term and the explanatory variable are uncorrelated:
$$\mathrm{cov}(u_i, X_i) = E[u_i - E(u_i)][X_i - E(X_i)]$$
$$= E[u_i (X_i - E(X_i))] \quad \text{since } E(u_i) = 0$$
$$= E(u_i X_i) - E(X_i)\,E(u_i) = E(u_i X_i) = 0$$
This basically says that the explanatory variables are uncorrelated with the disturbance term. So the values of the explanatory variables tell us nothing about the disturbance term.
Assumption 7: (Identification):
To find unique estimates from the normal equations, the number of observations must be greater than the number of parameters to be estimated. Otherwise it would not be possible to find unique OLS estimates of the parameters.
Assumption 8: (Variability in X values): The $X$ values in a given sample must not all be the same; the sample variance of $X$ must be a finite positive number:
$$0 < \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} < \infty$$
If all the values of $X_i$ are the same, we have $\sum_{i=1}^{n}(X_i - \bar{X})^2 = 0$, and thus it will not be possible to compute the OLS estimates.
Assumption 9: (Normality):
$$u_i \sim NID(0, \sigma^2), \quad i = 1, 2, \ldots, n$$
where NID stands for Normally Independently Distributed. The normality assumption on the disturbance term implies that the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are also normally distributed. This assumption is necessary for constructing confidence intervals for $\beta_1$ and $\beta_2$ and hence for conducting hypothesis testing.
Assumption 10: (Correct Specification): The functional form of the regression model needs to be correctly specified. Otherwise there will be specification bias or specification error in the estimation of the regression model.
Assumption 11: (No Multicollinearity): When the regression model has more than one explanatory variable, there should not be any perfect linear relationship between any of these variables.
The above assumptions about the regression model relate to the population regression function. Since we can only observe the sample regression function and not the population regression function, we cannot really know whether the above assumptions are actually valid.
3. GAUSS MARKOV THEOREM AND PROOF
The Gauss Markov Theorem basically states that under the assumptions of the Classical Linear Regression Model (assumptions 1-8), the least squares estimators are the minimum variance estimators among the class of unbiased linear estimators; that is, they are BLUE.
3.1. PROOF THAT OLS ESTIMATORS ARE LINEAR AND UNBIASED
The OLS estimator $\hat{\beta}_2$ is unbiased if its expected value is equal to the population parameter $\beta_2$. The estimator is a random variable and takes on different values from sample to sample. However, the unbiasedness property implies that on average the value of $\hat{\beta}_2$ is equal to the population parameter $\beta_2$.
The OLS estimator can be written as
$$\hat{\beta}_2 = \sum_{i=1}^{n} k_i Y_i$$
where
$$k_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
The $k_i$ have the following properties:
$$\sum_{i=1}^{n} k_i = \frac{\sum_{i=1}^{n}(X_i - \bar{X})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = 0$$
$$\sum_{i=1}^{n} k_i^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\big[\sum_{i=1}^{n}(X_i - \bar{X})^2\big]^2} = \frac{1}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$\sum_{i=1}^{n} k_i X_i = \frac{\sum_{i=1}^{n}(X_i - \bar{X}) X_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = 1$$
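As a quick check, these weight properties can be verified numerically for any sample of $X$ values. The sketch below uses NumPy with a made-up sample; the specific values are only for illustration.

```python
import numpy as np

# Hypothetical sample of X values (any values work, as long as they are not all equal)
X = np.array([2.0, 4.0, 5.0, 7.0, 9.0])

Sxx = np.sum((X - X.mean()) ** 2)      # sum of squared deviations of X
k = (X - X.mean()) / Sxx               # the OLS weights k_i

print(np.isclose(k.sum(), 0.0))            # sum of k_i equals 0
print(np.isclose((k**2).sum(), 1.0 / Sxx)) # sum of k_i^2 equals 1 / Sxx
print(np.isclose((k * X).sum(), 1.0))      # sum of k_i * X_i equals 1
```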
To prove the unbiasedness of the OLS estimator we need to rewrite our estimator in terms of the population parameters:
$$\hat{\beta}_2 = \sum_{i=1}^{n} k_i Y_i$$
$$\hat{\beta}_2 = \sum_{i=1}^{n} k_i (\beta_1 + \beta_2 X_i + u_i)$$
$$= \beta_1 \sum_{i=1}^{n} k_i + \beta_2 \sum_{i=1}^{n} k_i X_i + \sum_{i=1}^{n} k_i u_i$$
$$= \beta_2 + \sum_{i=1}^{n} k_i u_i$$
The OLS estimator $\hat{\beta}_2$ is thus a linear function of $Y_i$. The explanatory variable(s) are assumed to be non-stochastic, so the $k_i$ are non-stochastic as well. Taking the expectation operator on both sides we have
$$E(\hat{\beta}_2) = \beta_2 + \sum_{i=1}^{n} k_i E(u_i) = \beta_2$$
Therefore the OLS estimator $\hat{\beta}_2$ is an unbiased linear estimator of $\beta_2$.
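The unbiasedness property can be illustrated with a small Monte Carlo experiment: fix the true parameters, draw many samples from the population regression function, and check that the OLS slope estimates average out to the true slope. This is an illustrative sketch with made-up parameter values ($\beta_1 = 2$, $\beta_2 = 0.5$, $\sigma = 1$), not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(42)

beta1, beta2, sigma = 2.0, 0.5, 1.0   # assumed true population parameters
X = np.linspace(1, 10, 20)            # fixed (non-stochastic) regressor values
Sxx = np.sum((X - X.mean()) ** 2)

estimates = []
for _ in range(10_000):
    u = rng.normal(0.0, sigma, size=X.size)             # disturbances with E(u) = 0
    Y = beta1 + beta2 * X + u                           # population regression function
    b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx  # OLS slope estimate
    estimates.append(b2)

# The average of the estimates should be very close to the true beta2 = 0.5
print(np.mean(estimates))
```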
3.2. PROOF THAT OLS ESTIMATOR IS EFFICIENT
The OLS estimator has the second desirable property of being an efficient estimator. This efficiency property relates to the variance of the estimator. We have to prove that the OLS estimator has the smallest variance among all linear unbiased estimators. To prove this we first define an arbitrary estimator $\tilde{\beta}_2$ which is linear in $Y_i$. Secondly, we impose the restrictions implied by unbiasedness. Lastly, we show that the variance of the arbitrary estimator $\tilde{\beta}_2$ is larger than (or at least equal to) the variance of the OLS estimator.
$$\tilde{\beta}_2 = \sum_{i=1}^{n} w_i Y_i$$
Next we substitute the Population Regression Function into $\tilde{\beta}_2$:
$$\tilde{\beta}_2 = \sum_{i=1}^{n} w_i (\beta_1 + \beta_2 X_i + u_i)$$
$$= \beta_1 \sum_{i=1}^{n} w_i + \beta_2 \sum_{i=1}^{n} w_i X_i + \sum_{i=1}^{n} w_i u_i$$
Unbiasedness of $\tilde{\beta}_2$ requires
$$\sum_{i=1}^{n} w_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} w_i X_i = 1$$
so that
$$\tilde{\beta}_2 = \beta_2 + \sum_{i=1}^{n} w_i u_i$$
$$\mathrm{var}(\tilde{\beta}_2) = E\big[\tilde{\beta}_2 - \beta_2\big]^2 = E\Big[\sum_{i=1}^{n} w_i u_i\Big]^2 = \sigma^2 \sum_{i=1}^{n} w_i^2$$
Writing $w_i = (w_i - k_i) + k_i$, where $k_i = \dfrac{X_i - \bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$, we have
$$\sigma^2 \sum_{i=1}^{n} w_i^2 = \sigma^2 \sum_{i=1}^{n} (w_i - k_i)^2 + \sigma^2 \sum_{i=1}^{n} k_i^2 + 2\sigma^2 \sum_{i=1}^{n} (w_i - k_i)\, k_i$$
It can be shown that the last term in the above equation is zero:
$$2\sigma^2 \sum_{i=1}^{n} (w_i - k_i)\, k_i = 2\sigma^2 \Big[\sum_{i=1}^{n} w_i k_i - \sum_{i=1}^{n} k_i^2\Big] = 2\sigma^2 \Big[\frac{1}{\sum(X_i - \bar{X})^2} - \frac{1}{\sum(X_i - \bar{X})^2}\Big] = 0$$
since, using the unbiasedness restrictions $\sum w_i = 0$ and $\sum w_i X_i = 1$,
$$\sum_{i=1}^{n} w_i k_i = \frac{\sum_{i=1}^{n} w_i X_i - \bar{X} \sum_{i=1}^{n} w_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{1}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \sum_{i=1}^{n} k_i^2$$
Therefore
$$\mathrm{var}(\tilde{\beta}_2) = \sigma^2 \sum_{i=1}^{n} (w_i - k_i)^2 + \mathrm{var}(\hat{\beta}_2)$$
where $k_i = \dfrac{X_i - \bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$. The first term on the right hand side is always positive except when $w_i = k_i$ for all values of $i$. So
$$\mathrm{var}(\tilde{\beta}_2) \geq \mathrm{var}(\hat{\beta}_2)$$
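To make the efficiency result concrete, the sketch below compares the OLS weights $k_i$ against an alternative set of weights that also satisfies the unbiasedness restrictions ($\sum w_i = 0$, $\sum w_i X_i = 1$) but differs from $k_i$. The perturbation vector is an arbitrary made-up choice for illustration. Under homoscedasticity, the variance is $\sigma^2 \sum w_i^2$, so comparing $\sum w_i^2$ with $\sum k_i^2$ compares the variances directly.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical fixed regressor values
Sxx = np.sum((X - X.mean()) ** 2)
k = (X - X.mean()) / Sxx                        # OLS weights

# Build a perturbation d with sum(d) = 0 and sum(d * X) = 0, so that
# w = k + d still satisfies the unbiasedness restrictions.
z = np.array([1.0, -2.0, 0.5, 3.0, -1.0, -1.5])   # arbitrary made-up vector
A = np.column_stack([np.ones_like(X), X])
d = z - A @ np.linalg.lstsq(A, z, rcond=None)[0]  # residual of z on (1, X)

w = k + d
print(np.isclose(w.sum(), 0.0), np.isclose((w * X).sum(), 1.0))  # restrictions hold

# Variance comparison (up to the common factor sigma^2):
print((w**2).sum() >= (k**2).sum())   # True: the arbitrary estimator has larger variance
print((w**2).sum(), (k**2).sum())
```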
3.3. PROOF THAT OLS ESTIMATOR IS CONSISTENT
Consistency is a large sample property: as the sample size tends to infinity, the distribution of the estimator collapses to the true parameter value. Formally, $\hat{\beta}_2$ is consistent if its probability limit equals $\beta_2$, that is, $\mathrm{plim}_{n \to \infty}\, \hat{\beta}_2 = \beta_2$. A convenient feature of the plim operator is its invariance property: the plim of a continuous function of a random variable equals that function of the plim. This property of invariance does not hold for the expectation operator $E$. For instance, if $\hat{\beta}$ is an unbiased estimator of $\beta$, i.e. $E(\hat{\beta}) = \beta$, this does not mean that $\hat{\beta}^2$ is an unbiased estimator of $\beta^2$: $E(\hat{\beta}^2) \neq [E(\hat{\beta})]^2 \neq \beta^2$. This is because the expectation operator applies only to linear functions of random variables, while the plim operator is valid for any continuous function.
We know that
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \frac{\sum_{i=1}^{n}(X_i - \bar{X}) Y_i - \bar{Y} \sum_{i=1}^{n}(X_i - \bar{X})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \frac{\sum_{i=1}^{n}(X_i - \bar{X}) Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \frac{\sum_{i=1}^{n}(X_i - \bar{X})(\beta_1 + \beta_2 X_i + u_i)}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \frac{\beta_1 \sum_{i=1}^{n}(X_i - \bar{X})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} + \frac{\beta_2 \sum_{i=1}^{n}(X_i - \bar{X}) X_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} + \frac{\sum_{i=1}^{n}(X_i - \bar{X}) u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \beta_2 + \frac{\sum_{i=1}^{n}(X_i - \bar{X}) u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
Taking probability limits on both sides:
$$\operatorname{plim}_{n \to \infty} \hat{\beta}_2 = \beta_2 + \frac{\operatorname{plim}_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})\, u_i}{\operatorname{plim}_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
We divide both the numerator and the denominator in the second term by $n$ so that the summations do not go to infinity as $n \to \infty$. Next we apply the law of large numbers to both the numerator and the denominator. According to the law of large numbers, under general conditions the sample moments converge to their corresponding population moments. Therefore
$$\operatorname{plim}_{n \to \infty} \hat{\beta}_2 = \beta_2 + \frac{\operatorname{cov}(X, u)}{\operatorname{var}(X)} = \beta_2$$
provided $\operatorname{var}(X) \neq 0$. Note that $\operatorname{cov}(X, u) = E[(X_i - \bar{X})\, u_i] = E(X_i u_i) - \bar{X}\, E(u_i) = 0$ under our assumptions.
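A small simulation illustrates consistency: as the sample size grows, the OLS slope estimates concentrate around the true value. The parameter values ($\beta_1 = 2$, $\beta_2 = 0.5$) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 2.0, 0.5   # assumed true population parameters

for n in (10, 100, 1_000, 10_000, 100_000):
    X = rng.uniform(0, 10, size=n)   # regressor with positive variance
    u = rng.normal(0, 1, size=n)     # disturbances with zero mean
    Y = beta1 + beta2 * X + u
    b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    print(n, b2)   # b2 approaches 0.5 as n grows
```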
4. GOODNESS OF FIT
We have estimated our model parameters using OLS and have seen that they have various desirable statistical properties under certain assumptions. But we are still not sure whether the estimated model fits the data well. If all the observations of the sample lie on the regression line, then we say that the regression model fits the data perfectly. Usually, we will have some negative and some positive residuals. We want these residuals around the regression line to be as small as possible. The coefficient of determination $R^2$ provides a summary measure of how well the sample regression line fits the data.
4.1. MEASURES OF VARIATION
The actual value can be written as the sum of the fitted value and the residual:
$$Y_i = \hat{Y}_i + \hat{u}_i$$
Summing both sides and dividing by the sample size, we have
$$\bar{Y} = \bar{\hat{Y}} + \bar{\hat{u}} = \bar{\hat{Y}}$$
since the residuals sum to zero. Writing the identity in deviation form, squaring both sides, and summing over the sample, we have
$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat{y}_i^2 + \sum_{i=1}^{n} \hat{u}_i^2 + 2\sum_{i=1}^{n} \hat{y}_i \hat{u}_i$$
where lowercase letters denote deviations from sample means. The last term is zero by the property that the covariance of the fitted values and the residuals is zero. Hence
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} \hat{u}_i^2$$
$$TSS = ESS + RSS$$
or, Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS),
where $TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is the total variation of the actual $Y$ values about their sample mean, $ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$ is the variation of the fitted values about their mean, and $RSS = \sum_{i=1}^{n}\hat{u}_i^2$ is the residual variation.
Therefore the total variation in $Y$ can be decomposed into two parts: (1) ESS, which is the part accounted for by $X$, and (2) RSS, which is the unexplained and unaccounted part. RSS is known as the unexplained part of the variation because the residual term captures the effect of variables other than the explanatory variable that are not included in the regression model.
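The decomposition can be checked numerically on any fitted line. The sketch below uses a small made-up data set and verifies that TSS = ESS + RSS holds exactly for the OLS fit.

```python
import numpy as np

# Made-up sample data for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# OLS estimates of intercept and slope
b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X            # fitted values
u_hat = Y - Y_hat              # residuals

TSS = np.sum((Y - Y.mean()) ** 2)
ESS = np.sum((Y_hat - Y.mean()) ** 2)
RSS = np.sum(u_hat ** 2)

print(np.isclose(TSS, ESS + RSS))   # True: the decomposition holds
```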
4.2. COEFFICIENT OF DETERMINATION
We define $R^2$ as follows:
$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{RSS}{TSS}$$
So $R^2$ is equal to 1 minus the proportion of the total sum of squares that is not explained by the regression model (the Residual Sum of Squares as a share of TSS). When the observed points are close to the estimated regression line, we say that the model fits the data very well. In that case ESS will be higher and RSS will be smaller. We want $R^2$, which is a measure of the goodness of fit, to be high. When $R^2$ is low, this means that there is a lot of variation in $Y$ which cannot be explained by $X$.
There are other interpretations of $R^2$. It also equals the square of the correlation between the observed values $Y_i$ and the predicted values $\hat{Y}_i$, $r_{Y,\hat{Y}}$. Therefore
$$R^2 = \frac{\big[\mathrm{cov}(Y_i, \hat{Y}_i)\big]^2}{\mathrm{var}(Y_i)\,\mathrm{var}(\hat{Y}_i)} = r_{Y,\hat{Y}}^2$$
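Continuing with the same kind of made-up data as above, this interpretation is easy to verify numerically: the $R^2$ from the variance decomposition matches the squared correlation between observed and fitted values.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X

R2 = np.sum((Y_hat - Y.mean()) ** 2) / np.sum((Y - Y.mean()) ** 2)  # ESS / TSS
r = np.corrcoef(Y, Y_hat)[0, 1]   # correlation between observed and fitted values

print(np.isclose(R2, r ** 2))     # True: R^2 equals the squared correlation
```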
A question which commonly arises relates to the value of the goodness of fit. There is no rule which suggests what value of $R^2$ is considered high and what is considered low. For time series data the value of $R^2$ is usually high, often above 0.9. However, for cross-sectional data a value of 0.6 or 0.7 may be considered good. We should be cautious not to depend too much on the value of $R^2$; it is simply one measure of model adequacy. We should be more concerned about the signs of the regression coefficients and whether they conform to economic theory or prior information.
Properties of $R^2$:
1. $R^2$ is a non-negative number.
2. It is unit free, as both the numerator and the denominator have the same units.
3. The following relationship holds for the coefficient of determination:
$$0 \leq R^2 \leq 1$$
4.3. COEFFICIENT OF CORRELATION
The concept of the Coefficient of Correlation is quite different from that of goodness of fit; however, they are closely connected. The Coefficient of Correlation measures the degree of association between two variables. The sample correlation coefficient can be obtained as follows:
$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
4. A change in the origin and scale of measurement does not affect the coefficient of correlation. Suppose $X_i^* = aX_i + c$ and $Y_i^* = bY_i + d$, where $a > 0$, $b > 0$, and $c$, $d$ are constants. The correlation coefficient between $X_i$ and $Y_i$ and the correlation coefficient between $X_i^*$ and $Y_i^*$ are the same.
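This invariance is straightforward to check numerically. In the sketch below the scale factors and shifts ($a = 2.5$, $c = 7$, $b = 0.4$, $d = -3$) are arbitrary made-up constants.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Change origin and scale: X* = a*X + c, Y* = b*Y + d with a, b > 0
a, c, b, d = 2.5, 7.0, 0.4, -3.0
X_star = a * X + c
Y_star = b * Y + d

r = np.corrcoef(X, Y)[0, 1]
r_star = np.corrcoef(X_star, Y_star)[0, 1]
print(np.isclose(r, r_star))   # True: the correlation coefficient is unchanged
```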
5. SUMMARY
1. The Classical Linear Regression Model is based on a set of assumptions known as the Gauss Markov assumptions.
3. The assumptions under the Classical Linear Regression Model are necessary to prove the Gauss Markov Theorem. The Theorem basically states that under these assumptions, the least squares estimators are the minimum variance estimators among the class of unbiased linear estimators; that is, they are BLUE (Best Linear Unbiased Estimators).
4. The OLS estimator is unbiased if its expected value is equal to the population parameter $\beta_2$. The property of unbiasedness implies that on average the value of $\hat{\beta}_2$ is equal to the population parameter $\beta_2$.
5. The efficiency property relates to the concept of smallest variance: the OLS estimator has the smallest variance among all linear unbiased estimators.
6. The property of consistency is a large sample property which basically means that as
the sample size tends to infinity, the density function of the estimator collapses to the
parameter value.
7. The Total Variation in $Y$ (TSS) is the sum of two parts: (1) the Explained Sum of Squares (ESS), which is the part accounted for by $X$, and (2) the Residual Sum of Squares (RSS), which is the unexplained and unaccounted part.
8. The coefficient of determination measures the overall goodness of fit of the regression
model. It tells what proportion of the variation in the dependent variable is explained by
the explanatory variable.
9. The coefficient of determination lies between 0 and 1. The closer it is to 1, the better the overall goodness of fit of the model. There is no rule which says what level of the coefficient of determination is high and what level is low. The signs of the regression coefficients are also very important.
10. The Coefficient of Correlation measures the degree of association between two variables. It lies between $-1$ and $+1$. Statistical independence of two variables implies a zero correlation coefficient, but not necessarily vice versa.
11. The Coefficient of Determination and the Correlation Coefficient are related as follows: $r = \pm\sqrt{R^2}$