TABLE OF CONTENTS
1. INTRODUCTION
2. ASSUMPTIONS OF GAUSS MARKOV THEOREM
3. GAUSS MARKOV THEOREM AND PROOF
3.1. PROOF THAT OLS ESTIMATORS ARE LINEAR AND UNBIASED
3.2. PROOF THAT OLS ESTIMATOR IS EFFICIENT
3.3. PROOF THAT OLS ESTIMATOR IS CONSISTENT
4. GOODNESS OF FIT
4.1. MEASURES OF VARIATION
4.2. COEFFICIENT OF DETERMINATION
4.3. COEFFICIENT OF CORRELATION
5. SUMMARY
1. INTRODUCTION
Using OLS we estimate the parameters $\hat{\beta}_1$ and $\hat{\beta}_2$ of the sample regression function. However, these estimates come from the sample regression function, so we need to make some assumptions about the population regression function in order that the sample estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ can be used to make inferences about the population parameters $\beta_1$ and $\beta_2$. These sets of assumptions are known as the Classical Linear Regression Model (CLRM) assumptions.
Under these assumptions the OLS estimators have very good statistical properties, so these assumptions are also known as the Gauss Markov Theorem assumptions. We now look at those Gauss Markov assumptions for the Classical Linear Regression Model (CLRM).
2. ASSUMPTIONS OF GAUSS MARKOV THEOREM
Assumption 1: (Linearity): The regression model is linear in the parameters:
$$Y_i = \beta_1 + \beta_2 X_i + u_i$$
Assumption 2: (Non-stochastic X): The values of the explanatory variable(s) are fixed in repeated sampling, that is, non-stochastic.
Assumption 3: (Zero Mean of Disturbance): Given the value of $X_i$, the mean value of the disturbance term is zero:
$$E(u_i \mid X_i) = 0$$
This assumption also implies that information which is not captured by the explanatory variable(s), and which therefore falls into the error term, is not related to the explanatory variable(s) and hence does not systematically affect the dependent variable.
Assumption 4: (Homoscedasticity): The variance of the disturbance term is the same for all observations:
$$\mathrm{var}(u_i \mid X_i) = \sigma^2$$
By definition,
$$\mathrm{var}(u_i \mid X_i) = E\big[u_i - E(u_i \mid X_i)\big]^2 = E(u_i^2 \mid X_i) = \sigma^2$$
Assumption 5: (No Autocorrelation): The correlation between any two disturbance terms $u_i$ and $u_j$ ($i \neq j$), given any two values $X_i$ and $X_j$, is zero:
$$\mathrm{cov}(u_i, u_j \mid X_i, X_j) = E\big\{[u_i - E(u_i)] \mid X_i\big\}\big\{[u_j - E(u_j)] \mid X_j\big\} = E(u_i \mid X_i)\,E(u_j \mid X_j) = 0$$
Assumption 6: (Zero Covariance between $u_i$ and $X_i$): The disturbance term and the explanatory variable are uncorrelated:
$$\mathrm{cov}(u_i, X_i) = E[u_i - E(u_i)][X_i - E(X_i)]$$
$$= E[u_i (X_i - E(X_i))] \quad \text{since } E(u_i) = 0$$
$$= E(u_i X_i) - E(X_i)\,E(u_i) = E(u_i X_i) = 0$$
This basically says that the explanatory variables are uncorrelated with the disturbance term. So the values of the explanatory variables tell us nothing about the disturbance term.
Assumption 7: (Identification):
To find unique estimates from the normal equations, the number of observations must be greater than the number of parameters to be estimated. Otherwise it would not be possible to find unique OLS estimates of the parameters.
Assumption 8: (Variability in X values): The $X$ values in a given sample must not all be the same; the sample variance of $X$ must be a finite positive number:
$$0 < \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} < \infty$$
If all the values of $X_i$ are the same, we have $\sum_{i=1}^{n}(X_i - \bar{X})^2 = 0$, and thus it will not be possible to compute the OLS estimates.
Assumption 9: (Normality):
$$u_i \sim NID(0, \sigma^2), \quad i = 1, 2, \ldots, n$$
where NID stands for Normally Independently Distributed. The normality assumption on the disturbance term implies that the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are also normally distributed. This assumption is necessary for constructing confidence intervals for $\beta_1$ and $\beta_2$ and hence for conducting hypothesis testing.
Assumption 10: (Correct Specification): The functional form of the regression model needs to be correctly specified. Otherwise there will be specification bias or specification error in the estimation of the regression model.
Assumption 11: (No Multicollinearity): When the regression model has more than one explanatory variable, there should not be any perfect linear relationship between any of these variables.
The above assumptions about the regression model relate to the population regression function. Since we can only observe the sample regression function and not the population regression function, we cannot really know whether the above assumptions are actually valid.
3. GAUSS MARKOV THEOREM AND PROOF
The Gauss Markov Theorem basically states that under the assumptions of the Classical Linear Regression Model (assumptions 1-8), the least squares estimators are the minimum variance estimators among the class of unbiased linear estimators; that is, they are BLUE.
3.1. PROOF THAT OLS ESTIMATORS ARE LINEAR AND UNBIASED
The OLS estimator $\hat{\beta}_2$ is unbiased if its expected value is equal to the population parameter $\beta_2$. The estimator is a random variable and takes on different values from sample to sample. However, the unbiasedness property implies that on average the value of $\hat{\beta}_2$ is equal to the population parameter $\beta_2$.
The OLS estimator can be written as
$$\hat{\beta}_2 = \sum_{i=1}^{n} k_i Y_i$$
where
$$k_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
The $k_i$ have the following properties:
$$\sum_{i=1}^{n} k_i = \frac{\sum_{i=1}^{n}(X_i - \bar{X})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = 0$$
$$\sum_{i=1}^{n} k_i^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\big[\sum_{i=1}^{n}(X_i - \bar{X})^2\big]^2} = \frac{1}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$\sum_{i=1}^{n} k_i X_i = \frac{\sum_{i=1}^{n}(X_i - \bar{X}) X_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = 1$$
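As a quick check, these weight properties can be verified numerically for any sample of $X$ values. The sketch below uses NumPy with a made-up sample; the specific values are only for illustration.

```python
import numpy as np

# Hypothetical sample of X values (any values work, as long as they are not all equal)
X = np.array([2.0, 4.0, 5.0, 7.0, 9.0])

Sxx = np.sum((X - X.mean()) ** 2)      # sum of squared deviations of X
k = (X - X.mean()) / Sxx               # the OLS weights k_i

print(np.isclose(k.sum(), 0.0))            # sum of k_i equals 0
print(np.isclose((k**2).sum(), 1.0 / Sxx)) # sum of k_i^2 equals 1 / Sxx
print(np.isclose((k * X).sum(), 1.0))      # sum of k_i * X_i equals 1
```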
To prove the unbiasedness of the OLS estimator we need to rewrite our estimator in terms of the population parameters:
$$\hat{\beta}_2 = \sum_{i=1}^{n} k_i Y_i$$
$$\hat{\beta}_2 = \sum_{i=1}^{n} k_i (\beta_1 + \beta_2 X_i + u_i)$$
$$= \beta_1 \sum_{i=1}^{n} k_i + \beta_2 \sum_{i=1}^{n} k_i X_i + \sum_{i=1}^{n} k_i u_i$$
$$= \beta_2 + \sum_{i=1}^{n} k_i u_i$$
The OLS estimator $\hat{\beta}_2$ is thus a linear function of $Y_i$. The explanatory variable(s) are assumed to be non-stochastic, so the $k_i$ are non-stochastic as well. Taking the expectation operator on both sides we have
$$E(\hat{\beta}_2) = \beta_2 + \sum_{i=1}^{n} k_i E(u_i) = \beta_2$$
Therefore the OLS estimator $\hat{\beta}_2$ is an unbiased linear estimator of $\beta_2$.
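The unbiasedness property can be illustrated with a small Monte Carlo experiment: fix the true parameters, draw many samples from the population regression function, and check that the OLS slope estimates average out to the true slope. This is an illustrative sketch with made-up parameter values ($\beta_1 = 2$, $\beta_2 = 0.5$, $\sigma = 1$), not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(42)

beta1, beta2, sigma = 2.0, 0.5, 1.0   # assumed true population parameters
X = np.linspace(1, 10, 20)            # fixed (non-stochastic) regressor values
Sxx = np.sum((X - X.mean()) ** 2)

estimates = []
for _ in range(10_000):
    u = rng.normal(0.0, sigma, size=X.size)             # disturbances with E(u) = 0
    Y = beta1 + beta2 * X + u                           # population regression function
    b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx  # OLS slope estimate
    estimates.append(b2)

# The average of the estimates should be very close to the true beta2 = 0.5
print(np.mean(estimates))
```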
3.2. PROOF THAT OLS ESTIMATOR IS EFFICIENT
The OLS estimator has the second desirable property of being an efficient estimator. This efficiency property relates to the variance of the estimator. We have to prove that the OLS estimator has the smallest variance among all linear unbiased estimators. To prove this we first define an arbitrary estimator $\tilde{\beta}_2$ which is linear in $Y_i$. Secondly, we impose the restrictions implied by unbiasedness. Lastly, we show that the variance of the arbitrary estimator $\tilde{\beta}_2$ is larger than (or at least equal to) the variance of the OLS estimator.
$$\tilde{\beta}_2 = \sum_{i=1}^{n} w_i Y_i$$
Next we substitute the Population Regression Function into $\tilde{\beta}_2$:
$$\tilde{\beta}_2 = \sum_{i=1}^{n} w_i (\beta_1 + \beta_2 X_i + u_i)$$
$$= \beta_1 \sum_{i=1}^{n} w_i + \beta_2 \sum_{i=1}^{n} w_i X_i + \sum_{i=1}^{n} w_i u_i$$
Unbiasedness of $\tilde{\beta}_2$ requires
$$\sum_{i=1}^{n} w_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} w_i X_i = 1$$
so that
$$\tilde{\beta}_2 = \beta_2 + \sum_{i=1}^{n} w_i u_i$$
$$\mathrm{var}(\tilde{\beta}_2) = E\big[\tilde{\beta}_2 - \beta_2\big]^2 = E\Big[\sum_{i=1}^{n} w_i u_i\Big]^2 = \sigma^2 \sum_{i=1}^{n} w_i^2$$
Writing $w_i = (w_i - k_i) + k_i$, where $k_i = \dfrac{X_i - \bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$, we have
$$\sigma^2 \sum_{i=1}^{n} w_i^2 = \sigma^2 \sum_{i=1}^{n} (w_i - k_i)^2 + \sigma^2 \sum_{i=1}^{n} k_i^2 + 2\sigma^2 \sum_{i=1}^{n} (w_i - k_i)\, k_i$$
It can be shown that the last term in the above equation is zero:
$$2\sigma^2 \sum_{i=1}^{n} (w_i - k_i)\, k_i = 2\sigma^2 \Big[\sum_{i=1}^{n} w_i k_i - \sum_{i=1}^{n} k_i^2\Big] = 2\sigma^2 \Big[\frac{1}{\sum(X_i - \bar{X})^2} - \frac{1}{\sum(X_i - \bar{X})^2}\Big] = 0$$
since, using the unbiasedness restrictions $\sum w_i = 0$ and $\sum w_i X_i = 1$,
$$\sum_{i=1}^{n} w_i k_i = \frac{\sum_{i=1}^{n} w_i X_i - \bar{X} \sum_{i=1}^{n} w_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{1}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \sum_{i=1}^{n} k_i^2$$
Therefore
$$\mathrm{var}(\tilde{\beta}_2) = \sigma^2 \sum_{i=1}^{n} (w_i - k_i)^2 + \mathrm{var}(\hat{\beta}_2)$$
where $k_i = \dfrac{X_i - \bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$. The first term on the right hand side is always positive except when $w_i = k_i$ for all values of $i$. So
$$\mathrm{var}(\tilde{\beta}_2) \geq \mathrm{var}(\hat{\beta}_2)$$
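To make the efficiency result concrete, the sketch below compares the OLS weights $k_i$ against an alternative set of weights that also satisfies the unbiasedness restrictions ($\sum w_i = 0$, $\sum w_i X_i = 1$) but differs from $k_i$. The perturbation vector is an arbitrary made-up choice for illustration. Under homoscedasticity, the variance is $\sigma^2 \sum w_i^2$, so comparing $\sum w_i^2$ with $\sum k_i^2$ compares the variances directly.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical fixed regressor values
Sxx = np.sum((X - X.mean()) ** 2)
k = (X - X.mean()) / Sxx                        # OLS weights

# Build a perturbation d with sum(d) = 0 and sum(d * X) = 0, so that
# w = k + d still satisfies the unbiasedness restrictions.
z = np.array([1.0, -2.0, 0.5, 3.0, -1.0, -1.5])   # arbitrary made-up vector
A = np.column_stack([np.ones_like(X), X])
d = z - A @ np.linalg.lstsq(A, z, rcond=None)[0]  # residual of z on (1, X)

w = k + d
print(np.isclose(w.sum(), 0.0), np.isclose((w * X).sum(), 1.0))  # restrictions hold

# Variance comparison (up to the common factor sigma^2):
print((w**2).sum() >= (k**2).sum())   # True: the arbitrary estimator has larger variance
print((w**2).sum(), (k**2).sum())
```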
3.3. PROOF THAT OLS ESTIMATOR IS CONSISTENT
Consistency is a large sample property: as the sample size tends to infinity, the distribution of the estimator collapses to the true parameter value. Formally, $\hat{\beta}_2$ is consistent if its probability limit equals $\beta_2$, that is, $\mathrm{plim}_{n \to \infty}\, \hat{\beta}_2 = \beta_2$. A convenient feature of the plim operator is its invariance property: the plim of a continuous function of a random variable equals that function of the plim. This property of invariance does not hold for the expectation operator $E$. For instance, if $\hat{\beta}$ is an unbiased estimator of $\beta$, i.e. $E(\hat{\beta}) = \beta$, this does not mean that $\hat{\beta}^2$ is an unbiased estimator of $\beta^2$: $E(\hat{\beta}^2) \neq [E(\hat{\beta})]^2 \neq \beta^2$. This is because the expectation operator applies only to linear functions of random variables, while the plim operator is valid for any continuous function.
We know that
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \frac{\sum_{i=1}^{n}(X_i - \bar{X}) Y_i - \bar{Y} \sum_{i=1}^{n}(X_i - \bar{X})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \frac{\sum_{i=1}^{n}(X_i - \bar{X}) Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \frac{\sum_{i=1}^{n}(X_i - \bar{X})(\beta_1 + \beta_2 X_i + u_i)}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \frac{\beta_1 \sum_{i=1}^{n}(X_i - \bar{X})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} + \frac{\beta_2 \sum_{i=1}^{n}(X_i - \bar{X}) X_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} + \frac{\sum_{i=1}^{n}(X_i - \bar{X}) u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
$$= \beta_2 + \frac{\sum_{i=1}^{n}(X_i - \bar{X}) u_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
Taking probability limits on both sides:
$$\operatorname{plim}_{n \to \infty} \hat{\beta}_2 = \beta_2 + \frac{\operatorname{plim}_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})\, u_i}{\operatorname{plim}_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
We divide both the numerator and the denominator in the second term by $n$ so that the summations do not go to infinity as $n \to \infty$. Next we apply the law of large numbers to both the numerator and the denominator. According to the law of large numbers, under general conditions the sample moments converge to their corresponding population moments. Therefore
$$\operatorname{plim}_{n \to \infty} \hat{\beta}_2 = \beta_2 + \frac{\operatorname{cov}(X, u)}{\operatorname{var}(X)} = \beta_2$$
provided $\operatorname{var}(X) \neq 0$. Note that $\operatorname{cov}(X, u) = E[(X_i - \bar{X})\, u_i] = E(X_i u_i) - \bar{X}\, E(u_i) = 0$ under our assumptions.
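A small simulation illustrates consistency: as the sample size grows, the OLS slope estimates concentrate around the true value. The parameter values ($\beta_1 = 2$, $\beta_2 = 0.5$) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 2.0, 0.5   # assumed true population parameters

for n in (10, 100, 1_000, 10_000, 100_000):
    X = rng.uniform(0, 10, size=n)   # regressor with positive variance
    u = rng.normal(0, 1, size=n)     # disturbances with zero mean
    Y = beta1 + beta2 * X + u
    b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    print(n, b2)   # b2 approaches 0.5 as n grows
```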
4. GOODNESS OF FIT
We have estimated our model parameters using OLS and have seen that they have various desirable statistical properties under certain assumptions. But we are still not sure whether the estimated model fits the data well. If all the observations of the sample lie on the regression line, then we say that the regression model fits the data perfectly. Usually, we will have some negative and some positive residuals. We want these residuals around the regression line to be as small as possible. The coefficient of determination $R^2$ provides a summary measure of how well the sample regression line fits the data.
4.1. MEASURES OF VARIATION
The actual value can be written as the sum of the fitted value and the residual:
$$Y_i = \hat{Y}_i + \hat{u}_i$$
Summing both sides and dividing by the sample size, we have
$$\bar{Y} = \bar{\hat{Y}} + \bar{\hat{u}} = \bar{\hat{Y}}$$
since the residuals sum to zero. Writing the identity in deviation form, squaring both sides, and summing over the sample, we have
$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat{y}_i^2 + \sum_{i=1}^{n} \hat{u}_i^2 + 2\sum_{i=1}^{n} \hat{y}_i \hat{u}_i$$
where lowercase letters denote deviations from sample means. The last term is zero by the property that the covariance of the fitted values and the residuals is zero. Hence
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} \hat{u}_i^2$$
$$TSS = ESS + RSS$$
or, Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS),
where $TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is the total variation of the actual $Y$ values about their sample mean, $ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$ is the variation of the fitted values about their mean, and $RSS = \sum_{i=1}^{n}\hat{u}_i^2$ is the residual variation.
Therefore the total variation in $Y$ can be decomposed into two parts: (1) ESS, which is the part accounted for by $X$, and (2) RSS, which is the unexplained and unaccounted part. RSS is known as the unexplained part of the variation because the residual term captures the effect of variables other than the explanatory variable that are not included in the regression model.
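The decomposition can be checked numerically on any fitted line. The sketch below uses a small made-up data set and verifies that TSS = ESS + RSS holds exactly for the OLS fit.

```python
import numpy as np

# Made-up sample data for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# OLS estimates of intercept and slope
b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X            # fitted values
u_hat = Y - Y_hat              # residuals

TSS = np.sum((Y - Y.mean()) ** 2)
ESS = np.sum((Y_hat - Y.mean()) ** 2)
RSS = np.sum(u_hat ** 2)

print(np.isclose(TSS, ESS + RSS))   # True: the decomposition holds
```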
4.2. COEFFICIENT OF DETERMINATION
We define $R^2$ as follows:
$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{RSS}{TSS}$$
So $R^2$ is equal to 1 minus the proportion of the total sum of squares that is not explained by the regression model (the Residual Sum of Squares as a share of TSS). When the observed points are close to the estimated regression line, we say that the model fits the data very well. In that case ESS will be higher and RSS will be smaller. We want $R^2$, which is a measure of the goodness of fit, to be high. When $R^2$ is low, this means that there is a lot of variation in $Y$ which cannot be explained by $X$.
There are other interpretations of $R^2$. It also equals the square of the correlation between the observed values $Y_i$ and the predicted values $\hat{Y}_i$, $r_{Y,\hat{Y}}$. Therefore
$$R^2 = \frac{\big[\mathrm{cov}(Y_i, \hat{Y}_i)\big]^2}{\mathrm{var}(Y_i)\,\mathrm{var}(\hat{Y}_i)} = r_{Y,\hat{Y}}^2$$
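Continuing with the same kind of made-up data as above, this interpretation is easy to verify numerically: the $R^2$ from the variance decomposition matches the squared correlation between observed and fitted values.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b1 = Y.mean() - b2 * X.mean()
Y_hat = b1 + b2 * X

R2 = np.sum((Y_hat - Y.mean()) ** 2) / np.sum((Y - Y.mean()) ** 2)  # ESS / TSS
r = np.corrcoef(Y, Y_hat)[0, 1]   # correlation between observed and fitted values

print(np.isclose(R2, r ** 2))     # True: R^2 equals the squared correlation
```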
A question which commonly arises relates to the value of the goodness of fit. There is no rule which suggests what value of $R^2$ is considered high and what is considered low. For time series data the value of $R^2$ is usually high, often above 0.9. However, for cross-sectional data a value of 0.6 or 0.7 may be considered good. We should be cautious not to depend too much on the value of $R^2$; it is simply one measure of model adequacy. We should be more concerned about the signs of the regression coefficients and whether they conform to economic theory or prior information.
Properties of $R^2$:
1. $R^2$ is a non-negative number.
2. It is unit free, as both the numerator and the denominator have the same units.
3. The following relationship holds for the coefficient of determination:
$$0 \leq R^2 \leq 1$$
4.3. COEFFICIENT OF CORRELATION
The concept of the Coefficient of Correlation is quite different from that of goodness of fit; however, they are closely connected. The Coefficient of Correlation measures the degree of association between two variables. The sample correlation coefficient can be obtained as follows:
$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
4. A change in the origin and scale of measurement does not affect the coefficient of correlation. Suppose $X_i^* = aX_i + c$ and $Y_i^* = bY_i + d$, where $a > 0$, $b > 0$, and $c$, $d$ are constants. The correlation coefficient between $X_i$ and $Y_i$ and the correlation coefficient between $X_i^*$ and $Y_i^*$ are the same.
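This invariance is straightforward to check numerically. In the sketch below the scale factors and shifts ($a = 2.5$, $c = 7$, $b = 0.4$, $d = -3$) are arbitrary made-up constants.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Change origin and scale: X* = a*X + c, Y* = b*Y + d with a, b > 0
a, c, b, d = 2.5, 7.0, 0.4, -3.0
X_star = a * X + c
Y_star = b * Y + d

r = np.corrcoef(X, Y)[0, 1]
r_star = np.corrcoef(X_star, Y_star)[0, 1]
print(np.isclose(r, r_star))   # True: the correlation coefficient is unchanged
```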
5. SUMMARY
1. The Classical Linear Regression Model is based on a set of assumptions known as the Gauss Markov assumptions.
3. The assumptions under the Classical Linear Regression Model are necessary to prove the Gauss Markov Theorem. The Theorem basically states that under these assumptions, the least squares estimators are the minimum variance estimators among the class of unbiased linear estimators; that is, they are BLUE (Best Linear Unbiased Estimators).
4. The OLS estimator is unbiased if its expected value is equal to the population parameter $\beta_2$. The property of unbiasedness implies that on average the value of $\hat{\beta}_2$ is equal to the population parameter $\beta_2$.
5. The efficiency property relates to the concept of smallest variance: the OLS estimator has the smallest variance among all linear unbiased estimators.
6. The property of consistency is a large sample property which basically means that as
the sample size tends to infinity, the density function of the estimator collapses to the
parameter value.
7. The Total Variation in $Y$ (TSS) is the sum of two parts: (1) the Explained Sum of Squares (ESS), which is the part accounted for by $X$, and (2) the Residual Sum of Squares (RSS), which is the unexplained and unaccounted part.
8. The coefficient of determination measures the overall goodness of fit of the regression
model. It tells what proportion of the variation in the dependent variable is explained by
the explanatory variable.
9. The coefficient of determination lies between 0 and 1. The closer it is to 1, the better the overall goodness of fit of the model. There is no rule which says what level of the coefficient of determination is high and what level is low. The signs of the regression coefficients are also very important.
10. The Coefficient of Correlation measures the degree of association between two variables. It lies between $-1$ and $+1$. Statistical independence of two variables implies a zero correlation coefficient, but not necessarily vice versa.
11. The Coefficient of Determination and the Correlation Coefficient are related as follows: $r = \pm\sqrt{R^2}$