
CHAPTER THREE
MULTIPLE LINEAR REGRESSION ANALYSIS

3.1. The Concept of Multiple Linear Regression Analysis


So far, we have been discussing the simplest form of regression analysis, called
simple linear regression analysis. However, the most realistic representation of real-world
economic relationships is obtained from multiple regression analysis. This is
because most economic variables are determined by more than a single variable.

MLRMs also have some advantages over SLRMs. Firstly, SLRMs are ill-suited to drawing
ceteris paribus conclusions. In order to reach a ceteris paribus conclusion, the effect of
all other factors should be controlled. In SLRMs, the effect of all other factors is assumed
to be captured by the random term u. In this case, a ceteris paribus interpretation would
be possible only if all factors included in the random error term are uncorrelated with X.
This, however, is rarely realistic, as most economic variables are interdependent. In effect,
in most SLRMs the coefficient of X misrepresents the partial effect of X on Y. This
problem in econometrics is known as exclusion (omitted-variable) bias. Exclusion bias could be
minimized in econometric analysis if we could explicitly control for the effects of all
other variables that determine Y. Multiple regression analysis is more amenable to ceteris
paribus analysis because it allows us to explicitly control for many other factors that
simultaneously affect the dependent variable. This is important both for testing economic
theories and for evaluating policy effects when we must rely on non-experimental data.

MLRMs are also flexible. A single MLRM can be used to estimate the partial effects of
many variables on the dependent variable. In addition to their flexibility, MLRMs also
have higher explanatory power than SLRMs, because the larger the number of
explanatory variables in a model, the larger the part of the variation in the dependent
variable that can be explained or predicted by the model.

Consider the following relationship among four economic variables: quantity demanded
(Y), the good's own price (X₁), the price of a substitute good (X₂) and consumers' income
(X₃). Assuming a linear functional form of the relationship, the true relationship can be
modeled as follows:


Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ ……………………..(3.1)
Where,
β₀, β₁, β₂ and β₃ are fixed and unknown parameters, and uᵢ is the population disturbance term.
β₀ is the intercept.
β₁ measures the change in Y with respect to X₁, holding other factors fixed.
β₂ measures the change in Y with respect to X₂, holding other factors fixed.
β₃ measures the change in Y with respect to X₃, holding other factors fixed.
 Equation (3.1) is a multiple regression model with three explanatory
variables.

Just as in the case of simple regression, the variable uᵢ is the error term or disturbance
term. No matter how many explanatory variables we include in our model, there will
always be factors we cannot include, and these are collectively contained in uᵢ. This
disturbance term is of a similar nature to that in simple regression, reflecting:
- Omission of other relevant variables
- The random nature of human responses
- Errors of aggregation and measurement, etc.

In this chapter, we will first discuss the basic assumptions of
multiple regression analysis; we will then proceed with the case of two
explanatory variables, and finally generalize the multiple regression model to the
case of k explanatory variables using matrix algebra.

3.2. The Basic Assumptions of the Classical Regression Analysis (Revisited)


These are the conditions necessary to obtain desirable results from the application of OLS to
estimate MLRMs. They include:
1. Randomness of the error term: The variable u is a real random variable.
2. Zero mean of the error term: E(uᵢ⁄Xᵢ) = 0
3. Homoskedasticity: The variance of each uᵢ is the same for all values of X.
i.e., Var(uᵢ) = σ²ᵤ (constant variance)
4. Normality of uᵢ: The values of each uᵢ are normally distributed.
i.e., uᵢ ~ N(0, σ²ᵤ)
5. No auto or serial correlation: The values of uᵢ (corresponding to Xᵢ) are
independent from the values of any other uⱼ (corresponding to Xⱼ) for i ≠ j.


i.e., Cov(uᵢ, uⱼ) = 0 for i ≠ j
6. The values of the explanatory variables included in the model are fixed in repeated
sampling.
7. Independence of uᵢ and Xᵢ: Every disturbance term is independent of the
explanatory variables. i.e., E(uᵢX₁ᵢ) = E(uᵢX₂ᵢ) = 0
 This condition is automatically fulfilled if we assume that the values of the
X's are a set of fixed numbers in all (hypothetical) samples.
8. No perfect multicollinearity: The explanatory variables of the model are not
perfectly correlated. That is, no explanatory variable of the model is a linear
combination of the others. Perfect collinearity is a problem, because the least squares
estimator cannot separately attribute variation in Y to the individual independent
variables (a numerical sketch follows this list).
 Example: Suppose we regress weight (Y) on height measured in meters
(X₁) and height measured in centimeters (X₂). How could we
decide which regressor to attribute the changing weight to?
9. No model specification error.
10. No error of aggregation, and so on.
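To see assumption 8 concretely, here is a minimal sketch (assuming NumPy is available; the data values are invented for illustration) showing that when one regressor is an exact linear combination of another, the matrix X′X that OLS must invert is rank-deficient:

    import numpy as np

    # Hypothetical data: the same height recorded in meters and in centimeters
    height_m = np.array([1.60, 1.70, 1.75, 1.80, 1.85])
    height_cm = 100 * height_m              # exact linear combination of height_m

    # Design matrix with an intercept column
    X = np.column_stack([np.ones_like(height_m), height_m, height_cm])

    XtX = X.T @ X
    print(np.linalg.matrix_rank(XtX))       # 2, not 3: X'X is rank-deficient
    # (X'X)^(-1) does not exist, so the normal equations have no unique
    # solution: OLS cannot decide how to split the effect between the two
    # height variables.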

3.3. Estimation of a Multiple Regression Model


As in the case of SLRA, we can estimate the relationship between variables of a multiple
regression model using the method of Ordinary Least Squares (OLS). Thus, to
understand the nature of multiple regression analysis, we start our analysis with the case
of two explanatory variables, and then we shall extend this to the case of k-explanatory
variables.

3.3.1 A Model with Two Explanatory Variables


A MLRM with two explanatory variables is expressed mathematically as:
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + uᵢ ……………………… (3.2)
The conditional expected value of the above equation given the explanatory variables of
the model is called the population regression function (PRF) and is given by:
E(Yᵢ⁄X₁ᵢ, X₂ᵢ) = β₀ + β₁X₁ᵢ + β₂X₂ᵢ ……………………(3.3)
since E(uᵢ) = 0.


Where, the β's are the population parameters:

β₀ is referred to as the intercept, and
β₁ and β₂ are known as the slope coefficients of the PRF.

Since the population regression equation is unknown, it has to be estimated from sample
data. That is, (3.3) has to be estimated by the sample regression function as follows:
Ŷᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ ………………………………(3.4)
Where, the β̂'s are estimates of the β's and Ŷ is known as the predicted value of Y.

Therefore, given sample observations on Y, X₁ and X₂, we can estimate (3.3) by

(3.4) as follows:
Yᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + eᵢ ……………………...(3.5)
From (3.5) we can obtain the prediction errors/residuals of the model as:
eᵢ = Yᵢ − Ŷᵢ = Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ ……………………..(3.6)
The method of ordinary least squares chooses the estimates that minimize the squared
prediction error of the model, i.e., the sum of squared residuals. Therefore, squaring and
summing (3.6) over all sample values of the variables, we get the total squared
prediction error of the model.
That is, ∑eᵢ² = ∑(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ)² ……………………(3.7)
Therefore, to obtain expressions for the least squares estimators, we partially differentiate
∑eᵢ² with respect to β̂₀, β̂₁ and β̂₂, and set the partial derivatives equal to zero.

∂[∑eᵢ²]⁄∂β̂₀ = −2∑(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 ……………….(3.8)

∂[∑eᵢ²]⁄∂β̂₁ = −2∑X₁ᵢ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 ……………(3.9)

∂[∑eᵢ²]⁄∂β̂₂ = −2∑X₂ᵢ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 ……………(3.10)

Simple manipulation of these conditions produces the following three
equations, called the OLS Normal Equations:
∑Yᵢ = nβ̂₀ + β̂₁∑X₁ᵢ + β̂₂∑X₂ᵢ …………………………...(3.11)
∑X₁ᵢYᵢ = β̂₀∑X₁ᵢ + β̂₁∑X₁ᵢ² + β̂₂∑X₁ᵢX₂ᵢ …………..…(3.12)
∑X₂ᵢYᵢ = β̂₀∑X₂ᵢ + β̂₁∑X₁ᵢX₂ᵢ + β̂₂∑X₂ᵢ² …………..…(3.13)
From (3.11) we obtain:

β̂₀ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂ ………………………………………..(3.14)
Substituting (3.14) in (3.12), we obtain:
∑X₁ᵢYᵢ = (Ȳ − β̂₁X̄₁ − β̂₂X̄₂)∑X₁ᵢ + β̂₁∑X₁ᵢ² + β̂₂∑X₁ᵢX₂ᵢ

⇒ ∑X₁ᵢYᵢ − nX̄₁Ȳ = β̂₁(∑X₁ᵢ² − nX̄₁²) + β̂₂(∑X₁ᵢX₂ᵢ − nX̄₁X̄₂)

We know that (lowercase letters denote deviations from sample means):
∑(X₁ᵢ − X̄₁)² = ∑X₁ᵢ² − nX̄₁² = ∑x₁²
∑(X₂ᵢ − X̄₂)² = ∑X₂ᵢ² − nX̄₂² = ∑x₂²
∑(X₁ᵢ − X̄₁)(Yᵢ − Ȳ) = ∑X₁ᵢYᵢ − nX̄₁Ȳ = ∑x₁y
∑(X₁ᵢ − X̄₁)(X₂ᵢ − X̄₂) = ∑X₁ᵢX₂ᵢ − nX̄₁X̄₂ = ∑x₁x₂
Substituting the above equations, the preceding equation can be written in
deviation form as follows:
∑x₁y = β̂₁∑x₁² + β̂₂∑x₁x₂ …………………..(3.15)
Following the above procedure, if we substitute (3.14) in (3.13), we obtain:
∑x₂y = β̂₁∑x₁x₂ + β̂₂∑x₂² …………………………… (3.16)
Let's put (3.15) and (3.16) together:
∑x₁y = β̂₁∑x₁² + β̂₂∑x₁x₂
∑x₂y = β̂₁∑x₁x₂ + β̂₂∑x₂²
β̂₁ and β̂₂ can easily be solved using the matrix approach.
We can rewrite the above two equations in matrix form as follows:

⎡∑x₁²   ∑x₁x₂⎤ ⎡β̂₁⎤   ⎡∑x₁y⎤
⎣∑x₁x₂  ∑x₂² ⎦ ⎣β̂₂⎦ = ⎣∑x₂y⎦ ………….(3.17)

We can solve the above system using Cramer's rule and obtain β̂₁ and β̂₂ as follows:


β̂₁ = |A₁|⁄|A| and β̂₂ = |A₂|⁄|A|, where A is the coefficient matrix in (3.17) and A₁ (A₂)
is A with its first (second) column replaced by the right-hand-side vector.

Therefore, we obtain:

β̂₁ = (∑x₁y·∑x₂² − ∑x₂y·∑x₁x₂) ⁄ (∑x₁²·∑x₂² − (∑x₁x₂)²) ……………………..(3.18)

β̂₂ = (∑x₂y·∑x₁² − ∑x₁y·∑x₁x₂) ⁄ (∑x₁²·∑x₂² − (∑x₁x₂)²) ……………………..(3.19)
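As a quick sketch (assuming NumPy; the function and variable names are ours), formulas (3.14), (3.18) and (3.19) can be computed directly:

    import numpy as np

    def ols_two_regressors(Y, X1, X2):
        """OLS estimates for Y = b0 + b1*X1 + b2*X2 + u via (3.18)-(3.19)."""
        y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()  # deviations
        den = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2) ** 2
        b1 = (np.sum(x1 * y) * np.sum(x2**2) - np.sum(x2 * y) * np.sum(x1 * x2)) / den
        b2 = (np.sum(x2 * y) * np.sum(x1**2) - np.sum(x1 * y) * np.sum(x1 * x2)) / den
        b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()           # from (3.14)
        return b0, b1, b2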

3.3.2. Interpretation of the OLS Regression Equation


More important than the details underlying the computation of the β̂'s is the
interpretation of the estimated equation. Let's consider the case of two explanatory
variables:
Ŷ = β̂₀ + β̂₁X₁ + β̂₂X₂

The intercept β̂₀ in the above equation is the predicted value of Y when X₁ = 0 and
X₂ = 0.
Sometimes, setting X₁ and X₂ both equal to zero is an interesting scenario; in other cases,
it may not make sense. Nevertheless, the intercept is always needed to obtain a prediction
of Y from the OLS regression line.

The estimates β̂₁ and β̂₂ have partial effect, or ceteris paribus, interpretations. From
the above equation, we have
ΔŶ = β̂₁ΔX₁ + β̂₂ΔX₂
So we can obtain the predicted change in Y given the changes in X₁ and X₂. In particular,
when X₂ is held fixed, so that ΔX₂ = 0, then
ΔŶ = β̂₁ΔX₁,
holding X₂ fixed. The key point is that, by including X₂ in our model, we obtain a
coefficient on X₁ with a ceteris paribus interpretation. This is why multiple regression
analysis is so useful. Similarly,
ΔŶ = β̂₂ΔX₂,
holding X₁ fixed.


Example 3.1: Suppose an econometrician has estimated the following wage model
based on a sample of 100 individuals from a given city:

ln(wage) = β̂₀ + 0.125Educ + 0.085Exper

Where, ln(wage) is the natural logarithm of hourly wage measured in ETB,

Educ is the educational attainment of the sample respondent measured in years of
schooling, and
Exper is experience measured in years of related work experience.

How can one interpret the coefficients of Educ and Exper? (NB: the coefficients
have a percentage interpretation when multiplied by 100.)

The coefficient 0.125 means that, holding Exper fixed, another year of education is
predicted to increase hourly wage by 12.5%, on average. Alternatively, if we take two people
with the same level of experience, the coefficient on Educ is the proportionate difference
in predicted wage when their education levels differ by one year. Similarly, the
coefficient of Exper, 0.085, means that, holding Educ fixed, another year of related
work experience is predicted to increase hourly wage by 8.5%, on average.

3.3.3. The Coefficient of Determination (R²): The Case of Two Explanatory Variables


In the simple regression model, we introduced R² as a measure of the proportion of the
variation in the dependent variable that is explained by variation in the explanatory
variable. In the multiple regression model the same measure is relevant, and the same
formulas are valid, but now we talk of the proportion of the variation in the dependent
variable explained by all the explanatory variables included in the model.

The coefficient of determination is defined as:

R² = ESS⁄TSS = 1 − RSS⁄TSS = 1 − ∑e²⁄∑y² ………………………………..(3.20)

In the present model of two explanatory variables:
Yᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + eᵢ
∴ eᵢ = Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ


∑eᵢ² = ∑eᵢ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ)
     = ∑eᵢYᵢ − β̂₀∑eᵢ − β̂₁∑eᵢX₁ᵢ − β̂₂∑eᵢX₂ᵢ
From the normal equations, ∑eᵢ = ∑eᵢX₁ᵢ = ∑eᵢX₂ᵢ = 0, so that
∑eᵢ² = ∑eᵢYᵢ = ∑eᵢyᵢ (in deviation form)
∑eᵢyᵢ = ∑yᵢ(yᵢ − β̂₁x₁ᵢ − β̂₂x₂ᵢ) = ∑yᵢ² − β̂₁∑x₁y − β̂₂∑x₂y
Rearranging, the total variation decomposes as:

∑y² = β̂₁∑x₁y + β̂₂∑x₂y + ∑e² … … … … . . (3.21)

∴ R² = ESS⁄TSS = (β̂₁∑x₁y + β̂₂∑x₂y) ⁄ ∑y² … … … … … … … … … . . (3.22)

As in simple regression, R² is also viewed as a measure of the prediction ability of the

model over the sample period, or as a measure of how well the estimated regression fits
the data. If R² is high, the model is said to "fit" the data well. On the other hand, if R² is
low, the model does not fit the data well.
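As an illustrative sketch (assuming NumPy; the function name is ours, and Y_hat denotes fitted values from any estimated model), R² can be computed directly from equation (3.20):

    import numpy as np

    def r_squared(Y, Y_hat):
        """R^2 = 1 - RSS/TSS, as in equation (3.20)."""
        rss = np.sum((Y - Y_hat) ** 2)        # unexplained variation
        tss = np.sum((Y - Y.mean()) ** 2)     # total variation
        return 1 - rss / tss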

Adjusted Coefficient of Determination (R̄²)


One major limitation of R² is that it can be made large by adding more and more
variables, even if the added variables have no economic justification. Algebraically, it is
the fact that as variables are added, the sum of squared errors (RSS) goes down (it can
remain unchanged, but this is rare) and thus R² goes up. If the model contains n − 1
variables, then R² = 1. Manipulating the model just to obtain a high R² is not wise.
To overcome this limitation of R², we can "adjust" it in a way that takes into account
the number of variables included in a given model. This alternative measure of goodness
of fit, called the adjusted R² and often symbolized as R̄², is usually reported by
regression programs. It is computed as:


R̄² = 1 − (∑e²⁄(n − k)) ⁄ (∑y²⁄(n − 1)) = 1 − (1 − R²)·(n − 1)⁄(n − k) ………………(3.23)

This measure does not always go up when a variable is added, because of the
degrees-of-freedom term n − k in the correction. That is, the primary attractiveness of R̄² is that it
imposes a penalty for adding additional regressors to a model. If a regressor is added to
the model, then RSS decreases, or at least remains constant. On the other hand, the
degrees of freedom of the regression, n − k, always decrease.

An interesting algebraic fact is that if we add a new regressor to a model, R̄² increases if, and
only if, the t statistic on the new regressor is greater than 1 in absolute value. Thus, we
see immediately that R̄² can be used to decide whether a certain additional regressor
should be included in the model. R̄² has an upper bound that is equal to 1, but it does
not strictly have a lower bound, since it can take negative values. While solving one
problem, this corrected measure of goodness of fit unfortunately introduces another one:
it loses its interpretation, as R̄² is no longer the percent of variation explained.
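A minimal sketch of equation (3.23) in Python (the example numbers are invented to show the penalty at work):

    def adjusted_r_squared(r2, n, k):
        """R-bar^2 = 1 - (1 - R^2)*(n - 1)/(n - k), as in equation (3.23)."""
        return 1 - (1 - r2) * (n - 1) / (n - k)

    # Adding a weak regressor raises R^2 slightly but lowers R-bar^2:
    print(adjusted_r_squared(0.970, n=6, k=2))   # 0.9625
    print(adjusted_r_squared(0.971, n=6, k=3))   # ~0.9517: the penalty dominates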

3.3.4. General Linear Regression Model and Matrix Approach


So far we have discussed regression models containing one or two explanatory
variables. The econometric analysis of the simple regression model and of a model with
two explanatory variables was carried out with ordinary algebra. However, a model with more
than two explanatory variables is virtually intractable with that tool. For this reason, a
multiple regression model with more than two explanatory variables is more conveniently
estimated using matrix algebra.

The general linear regression model with k explanatory variables is written in the form:
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + …….. + βₖXₖᵢ + uᵢ ………………(3.24)
Where, i = 1, 2, 3, ……… , n and n is the size of the sample; β₀ is the intercept;
β₁ to βₖ are partial slope coefficients; uᵢ is the stochastic disturbance term; and i denotes the iᵗʰ
observation.


Since i represents the iᵗʰ observation, we shall have 'n' equations with 'n'

observations on each variable:
Y₁ = β₀ + β₁X₁₁ + β₂X₂₁ + …….. + βₖXₖ₁ + u₁
Y₂ = β₀ + β₁X₁₂ + β₂X₂₂ + …….. + βₖXₖ₂ + u₂
Y₃ = β₀ + β₁X₁₃ + β₂X₂₃ + …….. + βₖXₖ₃ + u₃
…………………………………………………...............
Yₙ = β₀ + β₁X₁ₙ + β₂X₂ₙ + …….. + βₖXₖₙ + uₙ
The above system of equations can be expressed in a compact form by using matrix
notation as:

⎡Y₁⎤   ⎡1  X₁₁  X₂₁ ……. Xₖ₁⎤ ⎡β₀⎤   ⎡u₁⎤
⎢Y₂⎥   ⎢1  X₁₂  X₂₂ ……. Xₖ₂⎥ ⎢β₁⎥   ⎢u₂⎥
⎢Y₃⎥ = ⎢1  X₁₃  X₂₃ ……. Xₖ₃⎥ ⎢β₂⎥ + ⎢u₃⎥
⎢ .⎥   ⎢.   .    .  …….  . ⎥ ⎢ .⎥   ⎢ .⎥
⎣Yₙ⎦   ⎣1  X₁ₙ  X₂ₙ ……. Xₖₙ⎦ ⎣βₖ⎦   ⎣uₙ⎦
  Y   =           X            β   +   U

In short, Y = Xβ + U ……………………………………………………………..(3.25)
Where: Y is an (n × 1) column vector of observed values of the dependent variable.
X is an (n × (k + 1)) matrix of observed values of the explanatory variables of the
model, where the first column of 1's represents the intercept term.
β is a ((k + 1) × 1) column vector of the population parameters
β₀, β₁, β₂, …., βₖ.
U is an (n × 1) column vector of the population random disturbance (error)
terms.

Equation (3.25) is the true population relationship among the variables in matrix format. By

taking the conditional expected value of (3.25) for given values of the explanatory
variables of the model, we get the population regression function (PRF) in matrix format
as:
E(Y) = E(Xβ) + E(U)
E(Xβ) = Xβ, E(U) = 0
Therefore, E(Y) = Xβ ………………………………(3.26)
In econometrics, equations like (3.26) are difficult to estimate, since estimation of (3.26)
requires population observations on all possible values of the variables of the model. As a
result, in most econometric analyses the true population relationship like (3.26) is
estimated by a sample relationship. The sample relationship among variables with 'k'
explanatory variables and 'n' observations can be set out in matrix
format as follows:


Y = Xβ̂ + e ……………………………………………………………..(3.27)
Where: β̂ is a ((k + 1) × 1) column vector of estimates of the true population
parameters,
Ŷ = Xβ̂ is an (n × 1) column vector of predicted values of the dependent
variable Y, and
e is an (n × 1) column vector of the sample residual (error) terms.

As in the two explanatory variables model, in the k-explanatory variable case the OLS
estimators are obtained by minimizing
∑eᵢ² = ∑(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ − ⋯.. − β̂ₖXₖᵢ)² ……………………(3.28)

Where, ∑eᵢ² is the total squared prediction error (or RSS) of the model.
In matrix notation, this amounts to minimizing e′e. That is:

                     ⎡e₁⎤
e′e = [e₁, e₂, …, eₙ]⎢e₂⎥ = e₁² + e₂² + e₃² + ⋯……… + eₙ²
                     ⎢ .⎥
                     ⎣eₙ⎦

∴ e′e = ∑eᵢ² …………………………………………..(3.29)
From (3.27) we can derive that e = Y − Ŷ = Y − Xβ̂.
Therefore, through substitution in equation (3.29), we get:
e′e = (Y − Xβ̂)′(Y − Xβ̂)
    = Y′Y − β̂′X′Y − Y′Xβ̂ + β̂′X′Xβ̂ ……………..(3.30)
The order of Y′Xβ̂ is (1 × n)(n × (k + 1))((k + 1) × 1) = (1 × 1); it is a
scalar.
Since Y′Xβ̂ is a scalar, i.e., it has only a single entry, it is equal to its transpose. That
is:
Y′Xβ̂ = (Y′Xβ̂)′ = β̂′X′Y
∴ e′e = Y′Y − 2β̂′X′Y + β̂′X′Xβ̂ ……………..(3.31)
Minimizing e′e in (3.31) with respect to β̂, we get the expression for the OLS estimates in
matrix format as follows:

∂(e′e)⁄∂β̂ = −2X′Y + 2X′Xβ̂

To get the above expression, we used the following rules of differentiation in matrix
notation:

∂(β̂′X′Y)⁄∂β̂ = X′Y,  ∂(β̂′X′Xβ̂)⁄∂β̂ = 2X′Xβ̂
Equating the above expression to the null vector, 0, we obtain:
−2X′Y + 2X′Xβ̂ = 0 ⇒ X′Xβ̂ = X′Y  OLS Normal Equations

β̂ = (X′X)⁻¹X′Y ……………………………….……….(3.32)

 Therefore, β̂ is the vector of required least squares estimators, β̂₀, β̂₁, …….., β̂ₖ
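A minimal coded sketch of (3.32) (assuming NumPy; in practice, solving the normal equations is preferred to forming the explicit inverse for numerical stability):

    import numpy as np

    def ols_matrix(X, Y):
        """OLS via the normal equations: beta-hat = (X'X)^(-1) X'Y, eq. (3.32).

        X is the n x (k+1) design matrix whose first column is all ones.
        """
        # Equivalent to np.linalg.inv(X.T @ X) @ (X.T @ Y), but more stable.
        return np.linalg.solve(X.T @ X, X.T @ Y)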

3.3.5. The Coefficient of Determination (R²): The Case of 'k' Explanatory Variables


The coefficient of multiple determination (R²) of MLRMs with 'k' explanatory
variables can be derived in matrix form as follows.

We know from (3.31) that

e′e = Y′Y − 2β̂′X′Y + β̂′X′Xβ̂
Since X′Xβ̂ = X′Y (the normal equations),
∴ e′e = Y′Y − 2β̂′X′Y + β̂′X′Y
RSS = e′e = Y′Y − β̂′X′Y ……………………………………………………(3.33)
We know that yᵢ = Yᵢ − Ȳ, and hence, ∑yᵢ² = ∑(Yᵢ − Ȳ)² = ∑Yᵢ² − nȲ²
In matrix notation,
∑yᵢ² = Y′Y − nȲ² ……………………………………………….(3.34)
Equation (3.34) is the total variation (TSS) of the model.
ESS = TSS − RSS = ∑yᵢ² − ∑eᵢ²
⇒ ESS = (Y′Y − nȲ²) − (Y′Y − β̂′X′Y)

ESS = β̂′X′Y − nȲ² ……………………………..(3.35)

Recall that Ȳ = (Y₁ + Y₂ + …… + Yₙ)⁄n = ∑Yᵢ⁄n

∴ R² = ESS⁄TSS = (β̂′X′Y − nȲ²) ⁄ (Y′Y − nȲ²) ………………………………..(3.36)


Numerical Example 2:
 As an illustration, let's rework our consumption-income example of Chapter 2.
Observations                   1   2   3   4   5   6
Consumption Expenditure (Y)    4   4   7   8   9  10
Monthly Income (X)             5   4   8  10  13  14

 Based on the above data,

a) Compute the OLS estimates using the matrix formulation
b) Compute R² using the matrix formulation
 Solution
a) For the one explanatory variable case, β̂ = (X′X)⁻¹X′Y

X′X = ⎡ n    ∑X ⎤ = ⎡ 6    54⎤
      ⎣∑X   ∑X²⎦   ⎣54   570⎦

X′Y = ⎡∑Y ⎤ = ⎡ 42⎤
      ⎣∑XY⎦   ⎣429⎦

 Now, find the inverse of the X′X matrix.
Recall that the inverse of a matrix A can be found as follows:
A⁻¹ = (1⁄|A|)·adj(A)
where adj(A) is the transpose of the cofactor matrix of A,
and each cofactor is Cᵢⱼ = (−1)ⁱ⁺ʲ|Mᵢⱼ|.
The cofactor matrix of X′X is ⎡570  −54⎤, and, being symmetric, its transpose is the same.
                              ⎣−54    6⎦

(X′X)⁻¹ = (1⁄504)⎡570  −54⎤ = ⎡ 1.1310  −0.1071⎤
                 ⎣−54    6⎦   ⎣−0.1071   0.0119⎦
Where the determinant of the X′X matrix is |X′X| = 6(570) − 54(54) = 504.
Therefore,

β̂ = (X′X)⁻¹X′Y = (1⁄504)⎡570  −54⎤⎡ 42⎤ = (1⁄504)⎡774⎤ = ⎡1.536⎤ = ⎡β̂₀⎤
                        ⎣−54    6⎦⎣429⎦          ⎣306⎦   ⎣0.607⎦   ⎣β̂₁⎦

b) Recall that R² = (β̂′X′Y − nȲ²) ⁄ (Y′Y − nȲ²)

β̂′X′Y = [1.536  0.607]⎡ 42⎤ ≈ 324.96
                       ⎣429⎦

Y′Y = ∑Yᵢ² = 4² + 4² + 7² + 8² + 9² + 10² = 326

nȲ² = 6(42⁄6)² = 6(7)² = 294

Therefore, R² = (324.96 − 294)⁄(326 − 294) = 30.96⁄32 ≅ 0.97
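The arithmetic above can be verified with a short script (a sketch assuming NumPy):

    import numpy as np

    Y = np.array([4, 4, 7, 8, 9, 10], dtype=float)
    X = np.column_stack([np.ones(6), [5, 4, 8, 10, 13, 14]])

    beta = np.linalg.solve(X.T @ X, X.T @ Y)    # normal equations, eq. (3.32)
    print(beta)                                 # approximately [1.536, 0.607]

    n, Ybar = len(Y), Y.mean()
    r2 = (beta @ X.T @ Y - n * Ybar**2) / (Y @ Y - n * Ybar**2)  # eq. (3.36)
    print(round(r2, 2))                         # 0.97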

3.4. Statistical Properties of OLS Estimators: Matrix Approach
As in the case of simple linear regression, the OLS estimators satisfy the Gauss-
Markov Theorem in multiple regression. That is, in the class of linear and unbiased
estimators, the OLS estimators are the best estimators. We are now in a position to examine the
desirable properties of the OLS estimators in matrix notation:
1. Linearity: Proposition: β̂ is linear in Y.
We know that: β̂ = (X′X)⁻¹X′Y
To show the above proposition, let C = (X′X)⁻¹X′.
∴ β̂ = CY …………………………………………………..(3.37)
Since C is a matrix of fixed values, equation (3.37) indicates that β̂ is linear in Y.
2. Unbiasedness: Proposition: E(β̂) = β.
β̂ = (X′X)⁻¹X′Y
β̂ = (X′X)⁻¹X′(Xβ + U)    (since Y = Xβ + U)
β̂ = β + (X′X)⁻¹X′U ………………………………………(3.38)
(using (X′X)⁻¹X′X = I)
E(β̂) = E(β + (X′X)⁻¹X′U)
     = E(β) + E[(X′X)⁻¹X′U]
     = β + (X′X)⁻¹X′E(U)
E(β̂) = β ……………………..……………(3.39)
since E(U) = 0.
Thus, the least squares estimators are unbiased estimators in MLRMs.


3. Minimum variance
Before showing that all the OLS estimators are best (possess the minimum variance property),
it is important to derive their variances.

We know that Var(β̂) = E[(β̂ − β)(β̂ − β)′], which written out in full is:

          ⎡E(β̂₀ − β₀)²          E(β̂₀ − β₀)(β̂₁ − β₁) …… E(β̂₀ − β₀)(β̂ₖ − βₖ)⎤
          ⎢E(β̂₁ − β₁)(β̂₀ − β₀)  E(β̂₁ − β₁)²          …… E(β̂₁ − β₁)(β̂ₖ − βₖ)⎥
Var(β̂) = ⎢        .                     .             .          .        ⎥
          ⎢        .                     .             .          .        ⎥
          ⎣E(β̂ₖ − βₖ)(β̂₀ − β₀)  E(β̂ₖ − βₖ)(β̂₁ − β₁) ……  E(β̂ₖ − βₖ)²       ⎦

          ⎡Var(β̂₀)       Cov(β̂₀, β̂₁)  ……  Cov(β̂₀, β̂ₖ)⎤
          ⎢Cov(β̂₁, β̂₀)   Var(β̂₁)      ……  Cov(β̂₁, β̂ₖ)⎥
        = ⎢     .              .         .        .     ⎥
          ⎢     .              .         .        .     ⎥
          ⎣Cov(β̂ₖ, β̂₀)   Cov(β̂ₖ, β̂₁)  ……   Var(β̂ₖ)   ⎦

The above matrix is a symmetric matrix containing variances along its main diagonal and
covariances of the estimators everywhere else. This matrix is, therefore, called the
variance-covariance matrix of the least squares estimators. Thus,

Var(β̂) = E[(β̂ − β)(β̂ − β)′] … … … … … … … … … … … . (3.40)

From (3.38), we know that β̂ = β + (X′X)⁻¹X′U

⇒ β̂ − β = (X′X)⁻¹X′U … … … … … … … … … … … . (3.41)
Substituting (3.41) in (3.40):
Var(β̂) = E[{(X′X)⁻¹X′U}{(X′X)⁻¹X′U}′]
       = E[(X′X)⁻¹X′UU′X(X′X)⁻¹]
       = (X′X)⁻¹X′E(UU′)X(X′X)⁻¹
       = (X′X)⁻¹X′(σ²ᵤIₙ)X(X′X)⁻¹
       = σ²ᵤ(X′X)⁻¹X′X(X′X)⁻¹

Var(β̂) = σ²ᵤ(X′X)⁻¹ … … … … … … … … … … … … (3.42)


Note: σ²ᵤ, being a scalar, can be moved in front of or behind a matrix, while an identity
matrix can be suppressed.

Thus, we obtain Var(β̂) = σ²ᵤ(X′X)⁻¹,

               ⎡ n     ∑X₁   ……..  ∑Xₖ  ⎤
               ⎢∑X₁    ∑X₁²  ……..  ∑X₁Xₖ⎥
Where, (X′X) = ⎢ .      .      .     .  ⎥
               ⎢ .      .      .     .  ⎥
               ⎣∑Xₖ    ∑X₁Xₖ …….   ∑Xₖ² ⎦

We can, therefore, obtain the variance of any estimator, say β̂ᵢ, by taking the iᵗʰ term from
the principal diagonal of (X′X)⁻¹ and then multiplying it by σ²ᵤ.
NB: here the X's are in their absolute (raw) form.
When the X's are in deviation form, we can write the slope estimators of the multiple regression in matrix
form as β̂ = (x′x)⁻¹x′y,

           ⎡β̂₁⎤              ⎡∑x₁²    ∑x₁x₂  …….. ∑x₁xₖ⎤
           ⎢β̂₂⎥              ⎢∑x₁x₂   ∑x₂²   …….. ∑x₂xₖ⎥
Where, β̂ = ⎢ .⎥  and (x′x) = ⎢  .       .      .     .  ⎥
           ⎢ .⎥              ⎢  .       .      .     .  ⎥
           ⎣β̂ₖ⎦              ⎣∑x₁xₖ   ∑x₂xₖ …….  ∑xₖ²  ⎦

The above column vector doesn't include the constant term β̂₀. Under such a condition,
the variances of the slope parameters in deviation form can be written as:
Var(β̂) = σ²ᵤ(x′x)⁻¹ …………………………..(3.43)
In general, for MLRMs with two explanatory variables, the variances of the OLS estimates
can be derived as follows. The fitted model is
Yᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + eᵢ
Averaging over the sample (and using ∑eᵢ = 0) gives
Ȳ = β̂₀ + β̂₁X̄₁ + β̂₂X̄₂

In this model, subtracting the second equation from the first:
Yᵢ − Ȳ = β̂₁(X₁ᵢ − X̄₁) + β̂₂(X₂ᵢ − X̄₂) + eᵢ

∴ yᵢ = β̂₁x₁ᵢ + β̂₂x₂ᵢ + eᵢ,
which is the model in deviation form.
Therefore, in the case of two explanatory variables, the deviation matrices are:

    ⎡x₁₁  x₂₁⎤
    ⎢x₁₂  x₂₂⎥           ⎡x₁₁  x₁₂ …….. x₁ₙ⎤
x = ⎢ .    . ⎥  and x′ = ⎣x₂₁  x₂₂ …….. x₂ₙ⎦
    ⎣x₁ₙ  x₂ₙ⎦

x′x = ⎡∑x₁²    ∑x₁x₂⎤
      ⎣∑x₁x₂   ∑x₂² ⎦

(x′x)⁻¹ = adj(x′x)⁄|x′x| = (1 ⁄ (∑x₁²∑x₂² − (∑x₁x₂)²))·⎡ ∑x₂²   −∑x₁x₂⎤
                                                       ⎣−∑x₁x₂   ∑x₁² ⎦

Thus, Var(β̂) = σ²ᵤ(x′x)⁻¹ gives:

∴ Var(β̂₁) = σ²ᵤ∑x₂² ⁄ (∑x₁²∑x₂² − (∑x₁x₂)²) …………………………(3.44)

Var(β̂₂) = σ²ᵤ∑x₁² ⁄ (∑x₁²∑x₂² − (∑x₁x₂)²) ……………………………..…(3.45)

and, Cov(β̂₁, β̂₂) = −σ²ᵤ∑x₁x₂ ⁄ (∑x₁²∑x₂² − (∑x₁x₂)²) ……………………….…(3.46)

The only unknown part in the variances and covariance of the estimators is σ²ᵤ. Thus, we
need an unbiased estimate of the variance of the population error u. As we
established in the simple regression model, σ̂²ᵤ = ∑eᵢ²⁄(n − 2)
is an unbiased estimator of σ²ᵤ.


For k parameters (including the constant parameter),

σ̂²ᵤ = e′e⁄(n − k) …………………………(3.47)

Where,
∑eᵢ² = e′e = ∑yᵢ² − β̂₁∑x₁y − β̂₂∑x₂y − ⋯.. − β̂ₖ∑xₖy …………….……(3.48)

For MLRMs with two explanatory variables, we have three parameters including the
constant term, and therefore,

σ̂²ᵤ = ∑eᵢ²⁄(n − 3) ………………………………….(3.49)
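Putting (3.42) and (3.47) together, the estimated standard errors can be sketched as follows (assuming NumPy; the function name is ours):

    import numpy as np

    def ols_standard_errors(X, Y):
        """Standard errors: square roots of the diagonal of sigma2*(X'X)^-1."""
        n, k = X.shape                              # k includes the intercept
        beta = np.linalg.solve(X.T @ X, X.T @ Y)    # eq. (3.32)
        e = Y - X @ beta                            # residual vector
        sigma2 = (e @ e) / (n - k)                  # unbiased estimate, (3.47)
        var_beta = sigma2 * np.linalg.inv(X.T @ X)  # eq. (3.42) with sigma2-hat
        return beta, np.sqrt(np.diag(var_beta))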

This is all about the variances and covariances of the parameters. Now it is time to see the
minimum variance property.
Minimum variance of β̂
To show that all the β̂'s in the vector β̂ are best estimators, we have to prove that the
variances obtained in (3.42) are the smallest amongst all other possible linear unbiased
estimators. We follow the same procedure as in the case of the single explanatory
variable model: we first assume an alternative linear unbiased estimator, and then
establish that its variance is greater than that of the OLS estimator.


Assume that β* is an alternative unbiased and linear estimator of β, given by

β* = [(X′X)⁻¹X′ + B]Y
Where, B is a ((k + 1) × n) arbitrary matrix that is a function of X and/or other non-stochastic
variables, but is not a function of Y.
∴ β* = [(X′X)⁻¹X′ + B][Xβ + U], since Y = Xβ + U

⇒ β* = β + BXβ + [(X′X)⁻¹X′ + B]U ………………………(3.50)
Taking expectations on both sides of the above expression, we have

E(β*) = β + BXβ + [(X′X)⁻¹X′ + B]E(U)

E(β*) = β + BXβ, [since E(U) = 0]…………………………………..(3.51)
Since our assumption regarding the alternative β* is that it has to be an unbiased estimator
of β, that is, E(β*) = β, the term BXβ should be a null vector.

Thus, BX should be zero (BX = 0) if β* = [(X′X)⁻¹X′ + B]Y is to be an unbiased
estimator.


Let us now find the variance of this alternative estimator.

Given that BX = 0, equation (3.50) can be written as β* − β = [(X′X)⁻¹X′ + B]U.

Then, Var(β*) = E(β* − β)(β* − β)′
= E{[(X′X)⁻¹X′ + B]UU′[(X′X)⁻¹X′ + B]′}
= [(X′X)⁻¹X′ + B]E(UU′)[X(X′X)⁻¹ + B′]
= σ²ᵤ[(X′X)⁻¹X′X(X′X)⁻¹ + (X′X)⁻¹X′B′ + BX(X′X)⁻¹ + BB′]
= σ²ᵤ[(X′X)⁻¹ + BB′]
(since E(UU′) = σ²ᵤIₙ and, with BX = 0, X′B′ = (BX)′ = 0)
∴ Var(β*) = σ²ᵤ(X′X)⁻¹ + σ²ᵤBB′ ……………………………….( 3.52)

Therefore, Var(β*) is greater than Var(β̂) by the expression σ²ᵤBB′ (BB′ being a positive
semi-definite matrix), and this proves that β̂ is the best estimator.

In conclusion, β̂ is the Best Linear Unbiased Estimator; that is to say, it is a BLUE

estimator.
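The unbiasedness result proved above can also be illustrated by simulation, as in this sketch (assuming NumPy; the true parameter values and sample design are invented for the experiment):

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 50, 5000
    beta_true = np.array([1.0, 2.0, -0.5])    # hypothetical beta0, beta1, beta2

    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    estimates = np.empty((reps, 3))
    for r in range(reps):
        U = rng.normal(size=n)                # fresh disturbances each sample
        Y = X @ beta_true + U                 # X held fixed in repeated samples
        estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

    print(estimates.mean(axis=0))             # close to [1.0, 2.0, -0.5]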

3.5. Evaluation of an Estimated MLRM Using Statistical Criteria


After estimation of a MLRM, the next task is evaluation of the statistical relevance of the
variables included in the model. This can be done using the usual standard testing
techniques. However, to use the standard significance tests, we need
Normality in the Probability Distributions of OLS's Parameter Estimates in
MLRMs, β̂.

3.5.1 The Probability Distributions of OLS's Parameter Estimates in MLRMs


In equation (3.37), we established that the OLS estimates in MLRMs are linear in Y.
And from the basic property of the normal probability distribution, we know that any linear
function of a normal random variable is itself normal. Thus, since β̂ is a linear
combination of Y, its probability distribution will be normal if Y is normal.

From equation (3.25), we know:

Y = Xβ + U


In the above equation, Xβ is a matrix of fixed values, because the values of the X's are fixed
in repeated samples and the β's are the true values of the parameters of the population. As a
result, Y in the above equation is linear in U, i.e., the dependent variable is a linear
combination of the values of the population random term. Consequently, since the
population random disturbance term U is normal by assumption, it follows that the
dependent variable Y is also normal. Since the β̂'s are linear combinations of another normal
random variable (Y), it then follows that the OLS estimates themselves are normal
random variables. Therefore, the sampling distributions of the
OLS estimates in MLRMs are normal.

That is, the sampling distributions of the OLS estimates in MLRMs are normal, with mean
values equal to the true values of their respective population parameters and variances given
by σ²ᵤ(X′X)⁻¹.

Symbolically:
β̂ ~ N[β, σ²ᵤ(X′X)⁻¹] … … … … … … … … … . (3.53)
Or equivalently, for each individual estimate,
β̂ᵢ ~ N(βᵢ, σ²ᵤcᵢᵢ), where cᵢᵢ is the iᵗʰ diagonal element of (X′X)⁻¹ ……………………….(3.54)

The normality of the sampling distributions of the OLS estimates around the true values of
the population parameters implies that, under the assumptions of multiple linear regression
analysis, there is an equal chance for any OLS estimate to over- or under-estimate the true
value of the population parameter in a particular sample. But the most probable value for
an estimate in a particular sample is the true value of the population parameter.

3.5.2 Statistical Significance Tests of Estimates in MLRMs (Hypothesis

Testing)
In multiple regression models, we undertake two types of significance tests: tests
of individual significance and tests of overall significance. Let's examine them one by one.
1. Tests of Individual Significance
This is the process of verifying the individual statistical significance of each parameter
estimate of a model; that is, checking whether the impact of a single explanatory
variable on the dependent variable is significant or not after taking the impact of all other
explanatory variables on the dependent variable into account. To elaborate the test of
individual significance, consider the following model of the determinants of Teff farm
productivity:
Y = β₀ + β₁X₁ + β₂X₂ + u … … … … … … … … ….(3.55)
Where: Y is total output of Teff per hectare of land, and X₁ and X₂ are the amount of
fertilizer used and rainfall, respectively.
Given the above model, suppose we need to check whether the application of fertilizer
(X₁) has a significant effect on agricultural productivity holding the effect of rainfall (X₂)
on Teff farm productivity constant, i.e., whether fertilizer (X₁) is a significant factor in
affecting Teff farm productivity after taking the impact of rainfall on Teff farm
productivity into account. In this case, we test the significance of β̂₁ holding the
influence of X₂ on Y constant. Mathematically, the test of individual significance involves
testing the following two pairs of null and alternative hypotheses:
A. H₀: β₁ = 0     B. H₀: β₂ = 0
   H₁: β₁ ≠ 0        H₁: β₂ ≠ 0

The null hypothesis in 'A' states that, holding X₂ constant, X₁ has no significant (linear)
influence on Y. Similarly, the null hypothesis in 'B' states that, holding X₁ constant, X₂
has no influence on the dependent variable Y. To test the individual significance of
parameter estimates in MLRMs, we can use the usual statistical test techniques. These
include:

A. The Standard Error Test

The standard error test can be applied if the population variances of the parameter
estimates are known, or if the sample size is sufficiently large (n > 30). It can be used
to test the individual significance of parameter estimates at the 5% level of significance. To
elaborate the procedure of the standard error test as a test of individual significance, consider
a model with two explanatory variables given by Y = β₀ + β₁X₁ + β₂X₂ + u. We will see
the test procedure only for β̂₁; the test procedure for β̂₂ can be done in the same way.
Step 1: Set the null and the alternative hypotheses empirically:

H₀: β₁ = 0 and H₁: β₁ ≠ 0
Step 2: Compute the standard error of the estimate:

SE(β̂₁) = √Var(β̂₁) = √(σ̂²ᵤ∑x₂² ⁄ (∑x₁²∑x₂² − (∑x₁x₂)²)); where, σ̂²ᵤ = ∑eᵢ²⁄(n − 3)

Step 3: Make a decision; that means, accept or reject the null hypothesis. In this case:
If SE(β̂₁) > ½|β̂₁|, accept the null hypothesis. That is, the estimate β̂₁ is not
statistically significant at the 5% level of significance. This would imply that, holding X₂
constant, X₁ has no significant linear impact on Y.
If SE(β̂₁) < ½|β̂₁|, reject the null hypothesis. That is, the estimate β̂₁ is
statistically significant at the 5% level of significance. This would imply that, holding X₂
constant, X₁ has a significant linear impact on Y.
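A minimal sketch of this decision rule (the numbers in the usage line are invented):

    def standard_error_test(beta_hat, se):
        """Rule of thumb: reject H0: beta = 0 at ~5% if SE < |beta|/2."""
        return "reject H0" if se < abs(beta_hat) / 2 else "accept H0"

    print(standard_error_test(0.607, 0.08))   # hypothetical values: reject H0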

B. The Student's t-Test

If the size of the sample is small (n ≤ 30) and the population variances of the estimates
are unknown, then we use the Student's t-test to perform the test of individual significance of
parameter estimates at any chosen level of significance. The procedure is as follows.
Step 1: State the null and alternative hypotheses empirically:
H₀: β₁ = 0 and H₁: β₁ ≠ 0
Step 2: Choose the level of significance (α).
Step 3: Determine the critical values and identify the acceptance and rejection regions of
the null hypothesis at the chosen level of significance (α) and the degrees of freedom
(n − k). To identify the critical values, divide the level of significance (α) by two, then read
the table value t_(α/2) from the t-probability table with (n − k) degrees of freedom, where n is the number of
observations and k is the number of parameters in the model including the intercept term. In
the case of the two explanatory variables model, the number of parameters is 3, so the degrees
of freedom are n − 3.
Step 4: Compute the t-statistic (t_cal) of the estimate under the null hypothesis. That is,

t_cal = (β̂₁ − β₁)⁄SE(β̂₁)

Since β₁ = 0 in the null hypothesis, the computed t-statistic (t_cal) of the estimate is



t_cal = β̂₁⁄SE(β̂₁)
Step 5: Compare and make a decision:
If t_(α/2) > |t_cal|, accept the null hypothesis. That is, β̂₁ is not significant at the chosen
level of significance. This would imply that, holding X₂ constant, X₁ has no significant
linear impact on Y.
If t_(α/2) < |t_cal|, reject the null hypothesis and hence accept the alternative one. That
is, β̂₁ is significant at the chosen level of significance. This would imply that,
holding X₂ constant, X₁ has a significant linear impact on Y.
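These steps can be sketched in code (assuming Python with SciPy available for the t critical value):

    from scipy import stats

    def t_test(beta_hat, se, n, k, alpha=0.05):
        """Two-sided t-test of H0: beta = 0 with n - k degrees of freedom."""
        t_cal = beta_hat / se                          # Step 4 (beta = 0 in H0)
        t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)  # Step 3 critical value
        return t_cal, t_crit, abs(t_cal) > t_crit      # Step 5: True => reject H0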
3.5.3 Test of the Overall Significance of MLRMs
This is the process of testing the joint significance of the parameter estimates of the model. It
involves checking whether the variation in the dependent variable of a model is significantly
explained by the variation in all the explanatory variables included in the model. To elaborate
the test of the overall significance, consider the model:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ⋯….. + βₖXₖ + u
Given this model, we may be interested to know whether the variation in the dependent
variable can be attributed to the variation in all the explanatory variables of the model or not.
If no amount of the variation in the dependent variable can be attributed to the variation
of the explanatory variables included in the model, then none of the explanatory variables
included in the model are relevant. That is, all estimates of the slope coefficients will be
statistically not different from zero. On the other hand, if it is possible to attribute a
significant proportion of the variation in the dependent variable to the variation in the
explanatory variables, then at least one of the explanatory variables included in the model
is relevant. That is, at least one of the estimates of the slope coefficients will be statistically
different from zero (significant).

Thus, this test has the following null and alternative hypotheses:
H₀: β₁ = β₂ = ……………….. = βₖ = 0
H₁: At least one of the βⱼ is different from zero
The null hypothesis is a joint hypothesis stating that none of the explanatory variables
included in the model are relevant, in the sense that no amount of the variation in Y can be
attributed to the variation in all the explanatory variables simultaneously. That means that if all
the explanatory variables of the model change simultaneously, the value of Y will be left
unchanged.
How do we approach the test of the overall significance of a MLRM?
If the null hypothesis is true, that is, if all the explanatory variables included in the model
are irrelevant, then there wouldn't be a significant difference in explanatory power between
the models with and without all the explanatory variables. Thus, the test of the overall
significance of MLRMs can be approached by testing whether the difference in the
explanatory power of the model with and without all the explanatory variables is significant
or not. In this case, if the difference is insignificant we accept the null hypothesis, and
reject it if the difference is significant.

Similarly, this test can be done by comparing the sums of squared errors (RSS) of the
model with and without all the explanatory variables. In this case, we accept the null
hypothesis if the difference between the sums of squared errors (RSS) of the model with
and without all the explanatory variables is insignificant. The intuition is
straightforward: if all the explanatory variables are irrelevant, then their inclusion
in the model contributes an insignificant amount to the explanation of the model, and as a
result the sample prediction error of the model would not fall significantly.

Let the Restricted Residual Sum of Squares (RRSS) be the sum of squared errors of the
model without the inclusion of all the explanatory variables of the model, i.e., the residual
sum of squares of the model obtained assuming that all the explanatory variables are
irrelevant (under the null hypothesis), and let the Unrestricted Residual Sum of Squares
(URSS) be the sum of squared errors of the model with the inclusion of all explanatory
variables in the model. It is always true that RRSS ≥ URSS (why?). To elaborate these
concepts, consider the following model:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ⋯….. + βₖXₖ + u
This model is called the unrestricted model. The test of the joint hypothesis is given by:
H₀: β₁ = β₂ = ……………….. = βₖ = 0
H₁: At least one of the βⱼ is different from zero
We know that:
Y = Xβ̂ + e ⇒ e = Y − Xβ̂


e′e = (Y − Xβ̂)′(Y − Xβ̂)
This sum of squared errors is called the unrestricted residual sum of squares (URSS).

However, if the null hypothesis is assumed to be true, i.e., when all the slope coefficients
are zero, the model shrinks to:
Y = β₀ + u
This model is called the restricted model. Applying OLS, we obtain:

β̂₀ = ∑Y⁄n = Ȳ ………………………………………..(3.56)
Therefore, e = Y − β̂₀, but β̂₀ = Ȳ
e = Y − Ȳ
∴ ∑e² = ∑(Yᵢ − Ȳ)² = ∑y² = TSS
The sum of squared errors when the null hypothesis is assumed to be true is called the
Restricted Residual Sum of Squares (RRSS), and this is equal to the total sum of
squares (TSS).

The ratio:

F = [(RRSS − URSS)⁄(k − 1)] ⁄ [URSS⁄(n − k)] ~ F(k − 1, n − k) …………………..(3.57)

has an F-distribution with k − 1 and n − k degrees of freedom for the numerator
and denominator, respectively.
RRSS = TSS
URSS = ∑y² − β̂₁∑x₁y − β̂₂∑x₂y − ⋯….. ………...− β̂ₖ∑xₖy = RSS
i.e., URSS = RSS

F = [(TSS − RSS)⁄(k − 1)] ⁄ [RSS⁄(n − k)] ~ F(k − 1, n − k)

F(k − 1, n − k) = [ESS⁄(k − 1)] ⁄ [RSS⁄(n − k)] …………………………………(3.58)

If we divide the numerator and denominator of the above equation by TSS, then:

F = [(ESS⁄TSS)⁄(k − 1)] ⁄ [(RSS⁄TSS)⁄(n − k)]

∴ F = [R²⁄(k − 1)] ⁄ [(1 − R²)⁄(n − k)] ………………………………………(3.59)

This implies that the computed value of F can be calculated either as a ratio of
ESS & RSS or of R² & (1 − R²). This value is compared with the table value of F which


leaves a probability of α in the upper tail of the F-distribution with (k − 1) & (n − k)
degrees of freedom.

 If the null hypothesis is not true, then the difference between RRSS and URSS
(i.e., TSS & RSS) becomes large, implying that the constraints placed on the
model by the null hypothesis have a large effect on the ability of the model to fit the
data, and the value of F tends to be large. Thus, we reject the null hypothesis if the
computed value of F (i.e., the F test statistic) becomes too large, or if the P-value for
the F-statistic is lower than any acceptable level of significance (α), and vice
versa.
In short, the Decision Rule is to
 Reject H₀ if F_cal > F_tab(k − 1, n − k), i.e., if P-value < α, and vice versa.
 Implication: rejecting H₀ implies that the parameters of the model are jointly
significant, or that the dependent variable is linearly related to at least one of the
independent variables included in the model.
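A coded sketch of the overall F-test based on (3.59) (assuming SciPy; the usage line reuses our worked example with n = 6 and k = 2):

    from scipy import stats

    def overall_f_test(r2, n, k, alpha=0.05):
        """Overall significance test computed from R^2, as in equation (3.59)."""
        f_cal = (r2 / (k - 1)) / ((1 - r2) / (n - k))
        p_value = 1 - stats.f.cdf(f_cal, dfn=k - 1, dfd=n - k)
        return f_cal, p_value, p_value < alpha    # True => reject H0

    print(overall_f_test(0.97, n=6, k=2))         # F is large: reject H0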

The Analysis of Variance (ANOVA)

When we use Stata to estimate a given model, Stata produces a table at the top of
the regression output. This table lists the results of what is called the ANOVA. The
purpose of the Analysis of Variance (ANOVA) table is to partition the total variation of
the dependent variable into component parts: one due to its systematic association with
the explanatory variables, and the second due to random error, i.e., the residuals. It is a summary of the
explanation of the variation in the dependent variable. The ANOVA table for the two
explanatory variables model is given as follows:

Source of variation          Sum of Squares (SS)      df      Mean Square (MS)
Explained variation (ESS)    β̂₁∑x₁y + β̂₂∑x₂y          2       (β̂₁∑x₁y + β̂₂∑x₂y)⁄2
Unexplained variation (RSS)  ∑e²                      n − 3   ∑e²⁄(n − 3)
Total variation (TSS)        ∑y² = ESS + RSS          n − 1

NB: From the ANOVA table above, we can obtain the F-statistic of the model as the ratio
of the explained mean square to the unexplained mean square: F = MS(ESS)⁄MS(RSS).
