CHAPTER THREE
MULTIPLE LINEAR REGRESSION ANALYSIS
MLRMs also have some advantages over SLRMs. First, SLRMs are unlikely to yield ceteris paribus conclusions. To reach a ceteris paribus conclusion, the effect of all other factors should be controlled. In SLRMs, the effect of all other factors is assumed to be captured by the random error term $u_i$. In this case a ceteris paribus interpretation would be possible only if all factors included in the random error term are uncorrelated with $X$. This, however, is rarely realistic, as most economic variables are interdependent. In effect, in most SLRMs the coefficient of $X$ misrepresents the partial effect of $X$ on $Y$. This problem in econometrics is known as exclusion (omitted-variable) bias. Exclusion bias could be minimized in econometric analysis if we could explicitly control for the effects of the other relevant variables. Multiple regression analysis is more amenable to ceteris paribus analysis because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable. This is important both for testing economic theories and for evaluating policy effects when we must rely on non-experimental data.
MLRMs are also flexible. A single MLRM can be used to estimate the partial effects of many variables on the dependent variable. In addition to their flexibility, MLRMs also have higher explanatory power than SLRMs: the larger the number of explanatory variables in a model, the larger is the part of the variation in $Y$ that can be explained or predicted by the model.
Consider the following relationship among four economic variables, say, quantity demanded ($Y$), price of the good ($X_1$), price of a substitute good ($X_2$), and consumer's income ($X_3$). Assuming a linear functional form of the relationship, the true relationship can be modeled as follows:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \quad \ldots (3.1)$$

Where,
The $\beta_j$'s are fixed and unknown parameters, and $u_i$ is the population disturbance term.
$\beta_0$ is the intercept.
$\beta_1$ measures the change in $Y$ with respect to $X_1$, holding other factors fixed.
$\beta_2$ measures the change in $Y$ with respect to $X_2$, holding other factors fixed.
$\beta_3$ measures the change in $Y$ with respect to $X_3$, holding other factors fixed.
Equation (3.1) is a multiple regression model with three explanatory variables.
Just as in the case of simple regression, the variable $u_i$ is the error term or disturbance term. No matter how many explanatory variables we include in our model, there will always be factors we cannot include, and these are collectively contained in $u_i$. This disturbance term is of a similar nature to that in simple regression, reflecting:
- Omission of other relevant variables
- The random nature of human responses
- Errors of aggregation and measurement, etc.
In this chapter, we will start our discussion with the basic assumptions of multiple regression analysis, then proceed with the case of two explanatory variables, and finally generalize the multiple regression model to the case of k explanatory variables using matrix algebra.
i.e., $\mathrm{Cov}(u_i, u_j) = 0$ for $i \neq j$.
6. The values of the explanatory variables included in the model are fixed in repeated
sampling
7. Independence of $u_i$ and $X_i$: Every disturbance term is independent of the explanatory variables, i.e., $\mathrm{Cov}(u_i, X_{1i}) = \mathrm{Cov}(u_i, X_{2i}) = 0$.
This condition is automatically fulfilled if we assume that the values of the X's are a set of fixed numbers in all (hypothetical) samples.
8. No perfect multicollinearity: The explanatory variables of the model are not perfectly correlated. That is, no explanatory variable of the model is a linear combination of the others. Perfect collinearity is a problem because the least squares estimator cannot separately attribute variation in $Y$ to the individual independent variables.
Example: Suppose we regress weight ($Y$) on height measured in meters ($X_1$) and height measured in centimeters ($X_2$). Since $X_2 = 100X_1$, how could we decide which regressor to attribute the change in weight to?
9. No model specification error
10. No error of aggregation and so on.
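The consequence of violating assumption 8 (the height example above) can be illustrated numerically. The following is a minimal sketch, assuming numpy and made-up height values; it shows that when one regressor is an exact multiple of another, the matrix $X'X$ used by OLS is singular:

```python
import numpy as np

# Hypothetical illustration of assumption 8 (no perfect multicollinearity).
# Height in meters and the same height in centimeters are perfectly collinear.
height_m = np.array([1.60, 1.65, 1.70, 1.75, 1.80])
height_cm = 100 * height_m                       # exact linear combination

X = np.column_stack([np.ones_like(height_m), height_m, height_cm])
XtX = X.T @ X

# The determinant is (numerically) zero, so (X'X)^{-1} does not exist and
# OLS cannot attribute the variation in weight to either regressor separately.
print("det(X'X) =", np.linalg.det(XtX))
print("rank(X)  =", np.linalg.matrix_rank(X))    # 2 instead of 3
```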
Since the population regression equation is unknown, it has to be estimated from sample data. That is, the population regression function has to be estimated by the sample regression function as follows:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$$

Where the $\hat{\beta}_j$'s are estimates of the $\beta_j$'s, and $\hat{Y}_i$ is known as the predicted value of $Y_i$.
The OLS estimates minimize the residual sum of squares $\sum e_i^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i})^2$. The first-order conditions with respect to the slope estimates are:

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2\sum X_{1i}\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0$$

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = -2\sum X_{2i}\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}\right) = 0$$

Simple manipulation of these conditions (together with the corresponding condition for $\hat{\beta}_0$) produces the following three equations, called the OLS normal equations:

$$\sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1\sum X_{1i} + \hat{\beta}_2\sum X_{2i} \quad \ldots (3.9)$$

$$\sum X_{1i}Y_i = \hat{\beta}_0\sum X_{1i} + \hat{\beta}_1\sum X_{1i}^2 + \hat{\beta}_2\sum X_{1i}X_{2i} \quad \ldots (3.10)$$

$$\sum X_{2i}Y_i = \hat{\beta}_0\sum X_{2i} + \hat{\beta}_1\sum X_{1i}X_{2i} + \hat{\beta}_2\sum X_{2i}^2 \quad \ldots (3.11)$$
From (3.9) we obtain:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}_1 - \hat{\beta}_2\bar{X}_2 \quad \ldots (3.12)$$

Substituting (3.12) in (3.10), we obtain:

$$\sum X_{1i}Y_i = \left(\bar{Y} - \hat{\beta}_1\bar{X}_1 - \hat{\beta}_2\bar{X}_2\right)\sum X_{1i} + \hat{\beta}_1\sum X_{1i}^2 + \hat{\beta}_2\sum X_{1i}X_{2i}$$

$$\Rightarrow \sum X_{1i}Y_i - n\bar{X}_1\bar{Y} = \hat{\beta}_1\left(\sum X_{1i}^2 - n\bar{X}_1^2\right) + \hat{\beta}_2\left(\sum X_{1i}X_{2i} - n\bar{X}_1\bar{X}_2\right)$$

We know that, in deviation form,

$$\sum (X_{1i}-\bar{X}_1)^2 = \sum X_{1i}^2 - n\bar{X}_1^2 = \sum x_{1i}^2$$
$$\sum (X_{1i}-\bar{X}_1)(Y_i-\bar{Y}) = \sum X_{1i}Y_i - n\bar{X}_1\bar{Y} = \sum x_{1i}y_i$$
$$\sum (X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2) = \sum X_{1i}X_{2i} - n\bar{X}_1\bar{X}_2 = \sum x_{1i}x_{2i}$$

Substituting these identities, the equation can be written in deviation form as follows:

$$\sum x_{1i}y_i = \hat{\beta}_1\sum x_{1i}^2 + \hat{\beta}_2\sum x_{1i}x_{2i} \quad \ldots (3.13)$$
Following the above procedure, if we substitute (3.12) into (3.11), we obtain:

$$\sum x_{2i}y_i = \hat{\beta}_1\sum x_{1i}x_{2i} + \hat{\beta}_2\sum x_{2i}^2 \quad \ldots (3.14)$$

Let us put (3.13) and (3.14) together:

$$\sum x_{1i}y_i = \hat{\beta}_1\sum x_{1i}^2 + \hat{\beta}_2\sum x_{1i}x_{2i}$$
$$\sum x_{2i}y_i = \hat{\beta}_1\sum x_{1i}x_{2i} + \hat{\beta}_2\sum x_{2i}^2$$

$\hat{\beta}_1$ and $\hat{\beta}_2$ can easily be solved using the matrix approach. We can rewrite the above two equations in matrix form as follows:

$$\begin{bmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{bmatrix}\begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} \sum x_{1i}y_i \\ \sum x_{2i}y_i \end{bmatrix}$$
We can solve the above system using Cramer's rule and obtain $\hat{\beta}_1$ and $\hat{\beta}_2$ as follows:

$$\hat{\beta}_1 = \frac{\begin{vmatrix} \sum x_{1i}y_i & \sum x_{1i}x_{2i} \\ \sum x_{2i}y_i & \sum x_{2i}^2 \end{vmatrix}}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{vmatrix}}, \qquad
\hat{\beta}_2 = \frac{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i}y_i \\ \sum x_{1i}x_{2i} & \sum x_{2i}y_i \end{vmatrix}}{\begin{vmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{vmatrix}}$$

Therefore, we obtain:

$$\hat{\beta}_1 = \frac{\sum x_{1i}y_i \cdot \sum x_{2i}^2 - \sum x_{2i}y_i \cdot \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \cdot \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$

$$\hat{\beta}_2 = \frac{\sum x_{2i}y_i \cdot \sum x_{1i}^2 - \sum x_{1i}y_i \cdot \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \cdot \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$
The intercept $\hat{\beta}_0$ in the above equation is the predicted value of $Y$ when $X_1 = 0$ and $X_2 = 0$.
Sometimes, setting $X_1$ and $X_2$ both equal to zero is an interesting scenario; in other cases, it may not make sense. Nevertheless, the intercept is always needed to obtain a prediction of $Y$ from the OLS regression line.
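For concreteness, the deviation-form slope formulas above and the intercept formula (3.12) can be evaluated directly. The sketch below is only an illustration: the data are made up and numpy is assumed; it is not part of the chapter's worked example.

```python
import numpy as np

# Hypothetical data for a model with two explanatory variables (illustration only).
Y  = np.array([10., 12., 15., 18., 20., 23., 25., 28.])
X1 = np.array([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
X2 = np.array([ 1.,  1.,  2.,  2.,  3.,  3.,  4.,  4.])

# Deviations from the sample means (lower-case x and y in the text).
y  = Y  - Y.mean()
x1 = X1 - X1.mean()
x2 = X2 - X2.mean()

# OLS slope estimates from the deviation-form formulas above.
den = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1*x2)**2
b1  = (np.sum(x1*y) * np.sum(x2**2) - np.sum(x2*y) * np.sum(x1*x2)) / den
b2  = (np.sum(x2*y) * np.sum(x1**2) - np.sum(x1*y) * np.sum(x1*x2)) / den
b0  = Y.mean() - b1 * X1.mean() - b2 * X2.mean()   # intercept from (3.12)

print(b0, b1, b2)
```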
The estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ have partial effect, or ceteris paribus, interpretations. From the above equation, we have

$$\Delta\hat{Y} = \hat{\beta}_1\Delta X_1 + \hat{\beta}_2\Delta X_2$$

So we can obtain the predicted change in $Y$ given the changes in $X_1$ and $X_2$. In particular, when $X_2$ is held fixed, so that $\Delta X_2 = 0$, then

$$\Delta\hat{Y} = \hat{\beta}_1\Delta X_1,$$

holding $X_2$ fixed. The key point is that, by including $X_2$ in our model, we obtain a coefficient on $X_1$ with a ceteris paribus interpretation. This is why multiple regression analysis is so useful. Similarly,

$$\Delta\hat{Y} = \hat{\beta}_2\Delta X_2,$$

holding $X_1$ fixed.
How can one interpret the coefficients of educ and exper? (NB: the coefficients have a percentage interpretation when multiplied by 100.)
The coefficient 0.125 means that, holding exper fixed, another year of education is predicted to increase wage by 12.5%, on average. Alternatively, if we take two people with the same levels of experience, the coefficient on educ is the proportionate difference in predicted wage when their education levels differ by one year. Similarly, the coefficient of experience, 0.085, means that, holding educ fixed, another year of related work experience is predicted to increase wage by 8.5%, on average.
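A one-line calculation makes the percentage interpretation explicit. This assumes, as the wording "proportionate difference in predicted wage" suggests, that the dependent variable is in logarithmic form:

$$\Delta \log(\widehat{wage}) = 0.125\,\Delta educ \;\Rightarrow\; \%\Delta\widehat{wage} \approx 100 \times 0.125 \times \Delta educ = 12.5\% \;\text{ for } \Delta educ = 1.$$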
Turning to the goodness of fit of the model, recall that in deviation form the residual is $e_i = y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i}$. Then

$$\sum e_i^2 = \sum e_i\left(y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i}\right) = \sum e_i y_i - \hat{\beta}_1\sum e_i x_{1i} - \hat{\beta}_2\sum e_i x_{2i} = \sum e_i y_i,$$

since $\sum e_i x_{1i} = \sum e_i x_{2i} = 0$. Moreover,

$$\sum e_i y_i = \sum y_i\left(y_i - \hat{\beta}_1 x_{1i} - \hat{\beta}_2 x_{2i}\right) = \sum y_i^2 - \hat{\beta}_1\sum x_{1i}y_i - \hat{\beta}_2\sum x_{2i}y_i$$

Therefore,

$$\sum y_i^2 = \hat{\beta}_1\sum x_{1i}y_i + \hat{\beta}_2\sum x_{2i}y_i + \sum e_i^2 \quad \ldots (3.20)$$

that is, TSS = ESS + RSS. Hence,

$$\therefore R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1\sum x_{1i}y_i + \hat{\beta}_2\sum x_{2i}y_i}{\sum y_i^2} \quad \ldots (3.21)$$
Because $R^2$ never decreases when an additional regressor is added, an adjusted version, $\bar{R}^2$, is often reported:

$$\bar{R}^2 = 1 - \frac{\sum e_i^2/(n-k)}{\sum y_i^2/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-k}$$

where $n$ is the sample size and $k$ is the number of parameters estimated (including the intercept).
This measure does not always go up when a variable is added, because of the degrees-of-freedom term $n-k$. That is, the primary attractiveness of $\bar{R}^2$ is that it imposes a penalty for adding additional regressors to a model. If a regressor is added to the model, then RSS decreases, or at least remains constant; on the other hand, the degrees of freedom of the regression, $n-k$, always decrease.
An interesting algebraic fact is that if we add a new regressor to a model, $\bar{R}^2$ increases if, and only if, the t statistic on the new regressor is greater than 1 in absolute value. Thus, we see immediately that $\bar{R}^2$ could be used to decide whether a certain additional regressor should be included in the model. The $\bar{R}^2$ has an upper bound equal to 1, but it does not strictly have a lower bound, since it can take negative values. While solving one problem, this corrected measure of goodness of fit unfortunately introduces another one: it loses its interpretation; $\bar{R}^2$ is no longer the percentage of the variation in $Y$ explained by the model.
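The penalty built into $\bar{R}^2$ is easy to see numerically. The sketch below is illustrative only (simulated data, numpy assumed): it fits a model with one relevant regressor and then adds an irrelevant one, so $R^2$ rises mechanically while $\bar{R}^2$ may fall.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: Y depends on X1 only; X2 is an irrelevant regressor.
n  = 30
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)          # unrelated to Y
Y  = 2.0 + 1.5 * X1 + rng.normal(scale=1.0, size=n)

def r2_and_adj_r2(Y, regressors):
    """OLS fit; returns (R^2, adjusted R^2), with k = number of parameters."""
    X = np.column_stack([np.ones(len(Y))] + list(regressors))
    beta = np.linalg.solve(X.T @ X, X.T @ Y)
    e    = Y - X @ beta
    rss  = e @ e
    tss  = np.sum((Y - Y.mean())**2)
    k    = X.shape[1]
    r2   = 1 - rss / tss
    adj  = 1 - (1 - r2) * (len(Y) - 1) / (len(Y) - k)
    return r2, adj

print(r2_and_adj_r2(Y, [X1]))        # model with X1 only
print(r2_and_adj_r2(Y, [X1, X2]))    # adding the irrelevant X2: R^2 rises,
                                     # adjusted R^2 may fall
```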
The general linear regression model with $k$ explanatory variables is written in the form:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i$$

Where $i = 1, 2, 3, \ldots, n$ and $n$ is the number of observations, $\beta_0$ is the intercept, $\beta_1$ to $\beta_k$ are the partial slope coefficients, and $u_i$ is the stochastic disturbance term of the $i^{th}$ observation.

In short, $Y = X\beta + U$

Where: $Y$ is an $(n \times 1)$ column vector of observed values of the dependent variable.
$X$ is an $(n \times (k+1))$ matrix of observed values of the explanatory variables of the model, where the first column of 1's represents the intercept term.
$\beta$ is a $((k+1) \times 1)$ column vector of the population parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$.
$U$ is an $(n \times 1)$ column vector of the population random disturbance (error) terms.

The corresponding sample regression is

$$Y = X\hat{\beta} + e \quad \ldots (3.28)$$

Where: $\hat{\beta}$ is a $((k+1) \times 1)$ column vector of estimates of the true population parameters.
$\hat{Y} = X\hat{\beta}$ is an $(n \times 1)$ column vector of predicted values of the dependent variable $Y$.
$e$ is an $(n \times 1)$ column vector of the sample disturbance (residual) terms.
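As a small illustration of the dimensions just described (hypothetical numbers, numpy assumed), the design matrix $X$ can be built by stacking a column of ones next to the observed regressors:

```python
import numpy as np

# Hypothetical regressors for n = 5 observations and k = 2 explanatory variables.
X1 = np.array([2., 4., 6., 8., 10.])
X2 = np.array([1., 3., 2., 5.,  4.])

# Design matrix X: first column of ones for the intercept, shape (n, k+1).
X = np.column_stack([np.ones(len(X1)), X1, X2])
print(X.shape)   # (5, 3)  ->  n x (k+1)
```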
As in the two-explanatory-variable model, in the k-explanatory-variable case the OLS estimators are obtained by minimizing

$$\sum e_i^2 = \sum\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \ldots - \hat{\beta}_k X_{ki}\right)^2$$

where $\sum e_i^2$ is the total squared prediction error (or RSS) of the model. In matrix notation, this amounts to minimizing $e'e$. That is:

$$e'e = [e_1, e_2, \ldots, e_n]\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = e_1^2 + e_2^2 + \ldots + e_n^2 = \sum e_i^2 \quad \ldots (3.30)$$

From (3.28) we can derive that $e = Y - \hat{Y} = Y - X\hat{\beta}$. Therefore, through substitution in equation (3.30), we get:

$$e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - \hat{\beta}'X'Y - Y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta} \quad \ldots (3.31)$$

The order of $\hat{\beta}'X'Y$ is $(1\times(k+1)) \times ((k+1)\times n) \times (n\times 1) = (1\times 1)$; it is a scalar.
Since $\hat{\beta}'X'Y$ is a scalar (it has only a single entry), it is equal to its own transpose. That is:

$$\hat{\beta}'X'Y = (\hat{\beta}'X'Y)' = Y'X\hat{\beta}$$
$$e'e = Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta} \quad \ldots (3.32)$$

Minimizing $e'e$ in (3.32) with respect to $\hat{\beta}$, we get the expression for the OLS estimates in matrix form as follows:

$$\frac{\partial(e'e)}{\partial\hat{\beta}} = -2X'Y + 2X'X\hat{\beta}$$

To get the above expression, we used the following rules of differentiation in matrix notation:

$$\frac{\partial(\hat{\beta}'X'Y)}{\partial\hat{\beta}} = X'Y, \qquad \frac{\partial(\hat{\beta}'X'X\hat{\beta})}{\partial\hat{\beta}} = 2X'X\hat{\beta}$$

Equating the above expression to the null vector, $0$, we obtain:

$$-2X'Y + 2X'X\hat{\beta} = 0 \;\Rightarrow\; X'X\hat{\beta} = X'Y \quad \text{(OLS normal equations)}$$

$$\hat{\beta} = (X'X)^{-1}X'Y$$

Since $X'X\hat{\beta} = X'Y$ implies $\hat{\beta}'X'X\hat{\beta} = \hat{\beta}'X'Y$, the residual sum of squares in (3.32) reduces to:

$$e'e = Y'Y - \hat{\beta}'X'Y$$

Recall that $X'Y = \begin{bmatrix} \sum Y_i \\ \sum X_{1i}Y_i \\ \vdots \\ \sum X_{ki}Y_i \end{bmatrix}$.

The coefficient of determination can then be computed in matrix form as:

$$\therefore R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}'X'Y - n\bar{Y}^2}{Y'Y - n\bar{Y}^2}$$
Numerical Example 2:
As an illustration, let us rework the consumption-income example of Chapter 2.

Observations:                    1   2   3   4   5   6
Consumption Expenditure (Y):     4   4   7   8   9  10
Monthly Income (X):              5   4   8  10  13  14

Here the model is $Y_i = \beta_0 + \beta_1 X_i + u_i$, so that

$$X = \begin{bmatrix} 1 & 5 \\ 1 & 4 \\ 1 & 8 \\ 1 & 10 \\ 1 & 13 \\ 1 & 14 \end{bmatrix}, \qquad Y = \begin{bmatrix} 4 \\ 4 \\ 7 \\ 8 \\ 9 \\ 10 \end{bmatrix}$$

Thus,

$$X'X = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} = \begin{bmatrix} 6 & 54 \\ 54 & 570 \end{bmatrix} \qquad \text{and} \qquad X'Y = \begin{bmatrix} \sum Y_i \\ \sum X_iY_i \end{bmatrix} = \begin{bmatrix} 42 \\ 429 \end{bmatrix}$$

a) Now, find the inverse of the matrix $X'X$. Recall that the inverse of a matrix $A$ can be found as follows:

$$A^{-1} = \frac{1}{|A|}\,\mathrm{adj}(A)$$

where $|A|$ is the determinant of $A$ and $\mathrm{adj}(A)$ is the adjoint of $A$, i.e., the transpose of the matrix of cofactors, with cofactor $c_{ij} = (-1)^{i+j}|M_{ij}|$.

The determinant of $X'X$ is $|X'X| = 6(570) - 54(54) = 504$, so

$$(X'X)^{-1} = \frac{1}{504}\begin{bmatrix} 570 & -54 \\ -54 & 6 \end{bmatrix} = \begin{bmatrix} 1.131 & -0.107 \\ -0.107 & 0.012 \end{bmatrix}$$

Therefore,

$$\hat{\beta} = (X'X)^{-1}X'Y = \frac{1}{504}\begin{bmatrix} 570 & -54 \\ -54 & 6 \end{bmatrix}\begin{bmatrix} 42 \\ 429 \end{bmatrix} = \begin{bmatrix} 1.536 \\ 0.607 \end{bmatrix}$$

b) Recall that

$$R^2 = \frac{\hat{\beta}'X'Y - n\bar{Y}^2}{Y'Y - n\bar{Y}^2}$$

$$\hat{\beta}'X'Y = [1.536 \;\; 0.607]\begin{bmatrix} 42 \\ 429 \end{bmatrix} \approx 324.96, \qquad Y'Y = \sum Y_i^2 = 326, \qquad n\bar{Y}^2 = 6(7)^2 = 294$$

$$\therefore R^2 = \frac{324.96 - 294}{326 - 294} = \frac{30.96}{32} \approx 0.97$$
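The matrix computations above are easy to verify numerically. A minimal numpy sketch, using only the data given in the table, is:

```python
import numpy as np

# Consumption-income data from the numerical example above.
Y = np.array([4., 4., 7., 8., 9., 10.])
X = np.column_stack([np.ones(6), [5., 4., 8., 10., 13., 14.]])

XtX = X.T @ X
XtY = X.T @ Y

print(np.linalg.det(XtX))                 # 504
beta = np.linalg.solve(XtX, XtY)
print(beta)                               # approx [1.536, 0.607]

# R^2 in matrix form: (b'X'Y - n*Ybar^2) / (Y'Y - n*Ybar^2)
n, ybar = len(Y), Y.mean()
r2 = (beta @ XtY - n * ybar**2) / (Y @ Y - n * ybar**2)
print(r2)                                 # approx 0.97
```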
3.4. Statistical Properties of OLS Estimators: Matrix Approach
As in the case of simple linear regression, the OLS estimators satisfy the Gauss-Markov theorem in multiple regression. That is, in the class of linear and unbiased estimators, the OLS estimators are the best (minimum-variance) estimators. We are now in a position to examine the desirable properties of the OLS estimators in matrix notation.

1. Linearity

We know that $\hat{\beta} = (X'X)^{-1}X'Y$. To show the proposition, let $A = (X'X)^{-1}X'$. Then

$$\hat{\beta} = AY \quad \ldots (3.39)$$

Since $A$ is a matrix of fixed values (the X's are fixed in repeated sampling), equation (3.39) indicates that $\hat{\beta}$ is linear in $Y$.

2. Unbiasedness

$$\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + U)$$

$$\hat{\beta} = \beta + (X'X)^{-1}X'U \quad \ldots (3.40)$$

since $(X'X)^{-1}X'X = I$. Taking expectations,

$$E(\hat{\beta}) = E\left[\beta + (X'X)^{-1}X'U\right] = \beta + (X'X)^{-1}X'E(U) = \beta \quad \ldots (3.41)$$

since $E(U) = 0$.
Thus, the least squares estimators are unbiased estimators in MLRMs.
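Before turning to the minimum-variance property, the unbiasedness result can be checked with a quick Monte Carlo sketch. This is only an illustration: the "true" parameters and the data-generating process below are assumed for the simulation, and numpy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of unbiasedness (illustrative; true betas are assumed).
beta_true = np.array([1.0, 0.5, -0.3])
n, reps = 50, 5000

# Fixed X in repeated sampling, as assumed in the text.
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])

estimates = np.empty((reps, 3))
for r in range(reps):
    u = rng.normal(scale=2.0, size=n)           # fresh disturbances each sample
    Y = X @ beta_true + u
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

# The average of the OLS estimates across samples is close to the true betas.
print(estimates.mean(axis=0))
```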
3. Minimum variance

Before showing that the OLS estimators are best (possess the minimum-variance property), it is important to derive their variances.

We know that $\hat{\beta} - \beta = (X'X)^{-1}X'U$, so the variance-covariance matrix of the estimators is

$$\mathrm{Var}(\hat{\beta}) = E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right] \quad \ldots (3.42)$$

Written out element by element,

$$E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right] =
\begin{bmatrix}
\mathrm{Var}(\hat{\beta}_0) & \mathrm{Cov}(\hat{\beta}_0,\hat{\beta}_1) & \cdots & \mathrm{Cov}(\hat{\beta}_0,\hat{\beta}_k) \\
\mathrm{Cov}(\hat{\beta}_1,\hat{\beta}_0) & \mathrm{Var}(\hat{\beta}_1) & \cdots & \mathrm{Cov}(\hat{\beta}_1,\hat{\beta}_k) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(\hat{\beta}_k,\hat{\beta}_0) & \mathrm{Cov}(\hat{\beta}_k,\hat{\beta}_1) & \cdots & \mathrm{Var}(\hat{\beta}_k)
\end{bmatrix}$$

The above matrix is a symmetric matrix containing the variances along its main diagonal and the covariances of the estimators everywhere else. This matrix is, therefore, called the variance-covariance matrix of the least squares estimators. Substituting $\hat{\beta} - \beta = (X'X)^{-1}X'U$ and using $E(UU') = \sigma_u^2 I_n$ (homoscedasticity and no autocorrelation), we obtain

$$\mathrm{Var}(\hat{\beta}) = E\left[(X'X)^{-1}X'UU'X(X'X)^{-1}\right] = (X'X)^{-1}X'E(UU')X(X'X)^{-1} = \sigma_u^2(X'X)^{-1} \quad \ldots (3.44)$$

Note: $\sigma_u^2$, being a scalar, can be moved in front of or behind a matrix, while the identity matrix can be suppressed.

Thus, we obtain $\mathrm{Var}(\hat{\beta}) = \sigma_u^2(X'X)^{-1}$, where

$$X'X = \begin{bmatrix}
n & \sum X_{1i} & \cdots & \sum X_{ki} \\
\sum X_{1i} & \sum X_{1i}^2 & \cdots & \sum X_{1i}X_{ki} \\
\vdots & \vdots & \ddots & \vdots \\
\sum X_{ki} & \sum X_{ki}X_{1i} & \cdots & \sum X_{ki}^2
\end{bmatrix}$$

We can, therefore, obtain the variance of any estimator, say $\hat{\beta}_i$, by taking the $i^{th}$ term from the principal diagonal of $(X'X)^{-1}$ and then multiplying it by $\sigma_u^2$.

NB: here the X's are in their absolute (original) form.
When the x's are in deviation form, we can write the variance-covariance matrix of the slope estimators as $\mathrm{Var}(\hat{\beta}) = \sigma_u^2(x'x)^{-1}$, where

$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}, \qquad
x'x = \begin{bmatrix}
\sum x_{1i}^2 & \sum x_{1i}x_{2i} & \cdots & \sum x_{1i}x_{ki} \\
\sum x_{2i}x_{1i} & \sum x_{2i}^2 & \cdots & \sum x_{2i}x_{ki} \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_{ki}x_{1i} & \sum x_{ki}x_{2i} & \cdots & \sum x_{ki}^2
\end{bmatrix}$$

The above column vector does not include the constant term $\hat{\beta}_0$. Under this condition, the variances of the slope parameters in deviation form can be written as:

$$\mathrm{Var}(\hat{\beta}) = \sigma_u^2(x'x)^{-1}$$
In general, for MLRMs with two explanatory variables, the variances of the OLS estimates can be derived as follows. Such a model can be written in deviation form as:

$$y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + e_i$$

In this case, $\hat{\beta} - \beta = (x'x)^{-1}x'u$, so that

$$E\left[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'\right] =
\begin{bmatrix}
\mathrm{Var}(\hat{\beta}_1) & \mathrm{Cov}(\hat{\beta}_1,\hat{\beta}_2) \\
\mathrm{Cov}(\hat{\beta}_1,\hat{\beta}_2) & \mathrm{Var}(\hat{\beta}_2)
\end{bmatrix}
= \sigma_u^2(x'x)^{-1}$$

Therefore, in the case of two explanatory variables, $x$ in deviation form is:

$$x = \begin{bmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ \vdots & \vdots \\ x_{1n} & x_{2n} \end{bmatrix}, \qquad
x'x = \begin{bmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{bmatrix}$$

$$(x'x)^{-1} = \frac{\mathrm{adj}(x'x)}{|x'x|} = \frac{1}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}
\begin{bmatrix} \sum x_{2i}^2 & -\sum x_{1i}x_{2i} \\ -\sum x_{1i}x_{2i} & \sum x_{1i}^2 \end{bmatrix}$$

Thus, $\mathrm{Var}(\hat{\beta}) = \sigma_u^2(x'x)^{-1}$ gives:

$$\therefore \mathrm{Var}(\hat{\beta}_1) = \frac{\sigma_u^2 \sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$

$$\mathrm{Var}(\hat{\beta}_2) = \frac{\sigma_u^2 \sum x_{1i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$

$$\mathrm{Cov}(\hat{\beta}_1,\hat{\beta}_2) = \frac{-\sigma_u^2 \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}$$
The only unknown part in the variances and covariance of the estimators is $\sigma_u^2$. Thus, we need an unbiased estimate of the variance of the population error term $u$. As we established in the simple regression model,

$$\hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-2}$$

is an unbiased estimator of $\sigma_u^2$ when one explanatory variable is used. For MLRMs with two explanatory variables, we have three parameters including the constant term, and therefore

$$\hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-3}$$
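Putting the pieces together, $\hat{\sigma}_u^2$ and the estimated variance-covariance matrix can be computed directly. The sketch below uses simulated data (numpy assumed, the true coefficients are made up) and is only illustrative of the formulas above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-regressor data set (illustration of the variance formulas).
n  = 40
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
Y  = 3.0 + 0.8 * X1 - 0.5 * X2 + rng.normal(scale=1.5, size=n)

X    = np.column_stack([np.ones(n), X1, X2])
beta = np.linalg.solve(X.T @ X, X.T @ Y)
e    = Y - X @ beta

sigma2_hat = e @ e / (n - 3)                     # sigma^2-hat = RSS / (n - k), k = 3
vcov       = sigma2_hat * np.linalg.inv(X.T @ X) # estimated Var(beta-hat)
se         = np.sqrt(np.diag(vcov))              # standard errors of the estimates

print(beta)
print(se)
```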
This completes the variance-covariance matrix of the parameter estimates. Now it is time to examine the minimum-variance property.

Minimum variance of $\hat{\beta}$

To show that all the $\hat{\beta}_i$'s in the vector $\hat{\beta}$ are best estimators, we have to prove that the variances obtained above are the smallest among all other possible linear unbiased estimators. We follow the same procedure as in the single-explanatory-variable model: we first assume an alternative linear unbiased estimator and then establish that its variance is greater than that of the OLS estimator.

Assume that $\beta^*$ is an alternative linear and unbiased estimator of $\beta$, given by

$$\beta^* = \left[(X'X)^{-1}X' + B\right]Y$$

where $B$ is a $((k+1) \times n)$ arbitrary matrix that may be a function of $X$ and/or other non-stochastic variables, but not a function of $Y$.

$$\therefore \beta^* = \left[(X'X)^{-1}X' + B\right](X\beta + U), \quad \text{since } Y = X\beta + U$$

$$\Rightarrow \beta^* = \beta + BX\beta + \left[(X'X)^{-1}X' + B\right]U$$

Taking expectations on both sides of the above expression, we have

$$E(\beta^*) = \beta + BX\beta + \left[(X'X)^{-1}X' + B\right]E(U) = \beta + BX\beta, \quad \text{since } E(U) = 0$$

Since our requirement on the alternative estimator $\beta^*$ is that it be unbiased, that is, $E(\beta^*) = \beta$, the term $BX\beta$ must vanish; in other words, $BX$ should be a null matrix. Thus, we say $BX$ should be zero if $\beta^* = \left[(X'X)^{-1}X' + B\right]Y$ is to be an unbiased estimator.

Imposing $BX = 0$ (and hence $X'B' = 0$), the variance of $\beta^*$ becomes

$$\mathrm{Var}(\beta^*) = E\left[(\beta^* - \beta)(\beta^* - \beta)'\right] = \sigma_u^2\left[(X'X)^{-1} + BB'\right]$$

Therefore, $\mathrm{Var}(\beta^*)$ is greater than $\mathrm{Var}(\hat{\beta})$ by the expression $\sigma_u^2 BB'$, a positive semi-definite matrix, and this proves that $\hat{\beta}$ is the best estimator.
In the equation $Y = X\beta + U$, $X$ is a matrix of fixed values, because the values of the X's are fixed in repeated samples, and the $\beta$'s are the true values of the population parameters. As a result, $Y$ is linear in $U$; i.e., the dependent variable is a linear combination of the values of the population random disturbance term. Consequently, since the population random disturbance term is normal by assumption, it follows that the dependent variable $Y$ is also normal. Since the $\hat{\beta}$'s are linear combinations of another normal random variable ($Y$), it follows that the OLS estimates themselves are normal random variables. Therefore, the sampling distributions of the OLS estimates in MLRMs are normal.

That is, the sampling distributions of the OLS estimates in MLRMs are normal, with means equal to the true values of their respective population parameters and variances given by $\sigma_u^2(X'X)^{-1}$. Symbolically:

$$\hat{\beta} \sim N\left[\beta,\; \sigma_u^2(X'X)^{-1}\right]$$

or, equivalently, for an individual estimate,

$$\hat{\beta}_i \sim N\left(\beta_i,\; \sigma_u^2\, c_{ii}\right)$$

where $c_{ii}$ is the $i^{th}$ diagonal element of $(X'X)^{-1}$.

The normality of the sampling distributions of the OLS estimates around the true values of the population parameters implies that, under the assumptions of multiple linear regression analysis, there is an equal chance for any OLS estimate to over- or under-estimate the true value of the population parameter in a particular sample. But the most probable value for an estimate in a particular sample is the true value of the population parameter.
3.5.1 Test of Individual Significance

This is the process of testing whether the effect of an individual explanatory variable on the dependent variable is significant or not after taking the impact of all other explanatory variables on the dependent variable into account. To elaborate the test of individual significance, consider the following model of the determinants of Teff farm productivity:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \quad \ldots (3.58)$$

Where: $Y$ is total output of Teff per hectare of land, and $X_1$ and $X_2$ are the amount of fertilizer used and rainfall, respectively.
Given the above model, suppose we need to check whether the application of fertilizer ($X_1$) has a significant effect on agricultural productivity holding the effect of rainfall ($X_2$) on Teff farm productivity constant, i.e., whether fertilizer ($X_1$) is a significant factor in affecting Teff farm productivity after taking the impact of rainfall on Teff farm productivity into account. In this case, we test the significance of $\hat{\beta}_1$ holding the influence of $X_2$ on $Y$ constant. Mathematically, the test of individual significance involves testing the following two pairs of null and alternative hypotheses:

A. $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$
B. $H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$

The null hypothesis in 'A' states that, holding $X_2$ constant, $X_1$ has no significant (linear) influence on $Y$. Similarly, the null hypothesis in 'B' states that, holding $X_1$ constant, $X_2$ has no influence on the dependent variable $Y$. To test the individual significance of parameter estimates in MLRMs, we can use the usual statistical techniques; these include the standard error test and the student's t-test, applied in the following steps.

Step 1: State the null and alternative hypotheses, e.g., $H_0: \beta_1 = 0$ and $H_1: \beta_1 \neq 0$.
Step 2: Compute the standard error of the estimate.

$$SE(\hat{\beta}_1) = \sqrt{\mathrm{Var}(\hat{\beta}_1)} = \sqrt{\frac{\hat{\sigma}_u^2\sum x_{2i}^2}{\sum x_{1i}^2\sum x_{2i}^2 - \left(\sum x_{1i}x_{2i}\right)^2}}\,, \qquad \text{where } \hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-3}$$

Step 3: Make a decision; that is, accept or reject the null hypothesis. In this case:
If $SE(\hat{\beta}_1) > \frac{1}{2}|\hat{\beta}_1|$, accept the null hypothesis. That is, the estimate $\hat{\beta}_1$ is not statistically significant at the 5% level of significance. This would imply that, holding $X_2$ constant, $X_1$ has no significant linear impact on $Y$.
If $SE(\hat{\beta}_1) < \frac{1}{2}|\hat{\beta}_1|$, reject the null hypothesis. That is, the estimate $\hat{\beta}_1$ is statistically significant at the 5% level of significance. This would imply that, holding $X_2$ constant, $X_1$ has a significant linear impact on $Y$.
Step 4: Compute the t-ratio:

$$t = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} \;\Rightarrow\; t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \quad \text{under } H_0: \beta_1 = 0$$

Step 5: Compare the computed t with the tabulated (critical) value and make a decision.
If $|t_{computed}| < t_{tabulated}$, accept the null hypothesis. That is, $\hat{\beta}_1$ is not significant at the chosen level of significance. This would imply that, holding $X_2$ constant, $X_1$ has no significant linear impact on $Y$.
If $|t_{computed}| > t_{tabulated}$, reject the null hypothesis and hence accept the alternative one. That is, $\hat{\beta}_1$ is significant at the chosen level of significance. This would imply that, holding $X_2$ constant, $X_1$ has a significant linear impact on $Y$.
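The t-test in Steps 4 and 5 is straightforward to carry out numerically. The sketch below uses made-up data in the spirit of the fertilizer/rainfall example (numpy and scipy assumed); the figures are not from the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical Teff-productivity style data: Y on fertilizer (X1) and rainfall (X2).
n  = 30
X1 = rng.uniform(0, 100, n)                      # fertilizer
X2 = rng.uniform(500, 1200, n)                   # rainfall
Y  = 5 + 0.04 * X1 + 0.002 * X2 + rng.normal(scale=1.0, size=n)

X    = np.column_stack([np.ones(n), X1, X2])
beta = np.linalg.solve(X.T @ X, X.T @ Y)
e    = Y - X @ beta
s2   = e @ e / (n - 3)                           # sigma^2-hat with k = 3 parameters
se   = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

t_calc = beta[1] / se[1]                         # t-ratio for the fertilizer coefficient
t_crit = stats.t.ppf(0.975, df=n - 3)            # two-sided 5% critical value

print(t_calc, t_crit, abs(t_calc) > t_crit)      # reject H0 if |t_calc| > t_crit
```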
3.5.2 Test of the Overall Significance of MLRMs
This is the process of testing the joint significance of the parameter estimates of the model. It involves checking whether the variation in the dependent variable of a model is significantly explained by the variation in all explanatory variables included in the model. To elaborate the test of overall significance, consider the model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + u_i$$

Given this model, we may be interested to know whether the variation in the dependent variable can be attributed to the variation in all explanatory variables of the model or not. If no amount of the variation in the dependent variable can be attributed to the variation of the explanatory variables included in the model, then none of the explanatory variables included in the model are relevant; that is, all estimates of the slope coefficients will be statistically not different from zero. On the other hand, if a significant proportion of the variation in the dependent variable can be attributed to the variation in the explanatory variables, then at least one of the explanatory variables included in the model is relevant; that is, at least one of the estimates of the slope coefficients will be statistically different from zero (significant).

Thus, this test has the following null and alternative hypotheses:

$$H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$$
$$H_1: \text{at least one of the } \beta_j \text{ is different from zero}$$

The null hypothesis is a joint hypothesis stating that none of the explanatory variables included in the model are relevant, in the sense that no amount of the variation in Y can be attributed to the variation in all explanatory variables simultaneously. That means that if all explanatory variables of the model change simultaneously, the value of Y will be left unchanged.
How should one approach the test of the overall significance of an MLRM?
If the null hypothesis is true, that is, if all the explanatory variables included in the model are irrelevant, then there would not be a significant difference in explanatory power between the models with and without all the explanatory variables. Thus, the test of the overall significance of MLRMs can be approached by testing whether the difference in explanatory power of the model with and without all explanatory variables is significant or not. In this case, if the difference is insignificant we accept the null hypothesis, and we reject it if the difference is significant.
Similarly, this test can be done by comparing the sums of squared errors (RSS) of the model with and without all explanatory variables. In this case we accept the null hypothesis if the difference between the sums of squared errors (RSS) of the model with and without all explanatory variables is insignificant. The reasoning is straightforward: if all explanatory variables are irrelevant, then their inclusion in the model contributes an insignificant amount to the explanation of the model, and as a result the sample prediction error of the model would not fall significantly.

Let the Restricted Residual Sum of Squares (RRSS) be the sum of squared errors of the model without the inclusion of all the explanatory variables of the model, i.e., the residual sum of squares of the model obtained assuming that all the explanatory variables are irrelevant (under the null hypothesis), and let the Unrestricted Residual Sum of Squares (URSS) be the sum of squared errors of the model with the inclusion of all explanatory variables in the model. It is always true that $RRSS \geq URSS$ (why?). To elaborate these concepts, consider the following model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + u_i$$

This model is called the unrestricted model. The test of the joint hypothesis is given by:

$$H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$$
$$H_1: \text{at least one of the } \beta_j \text{ is different from zero}$$
We know that:

$$Y = X\hat{\beta} + e \;\Rightarrow\; e = Y - X\hat{\beta}$$

$$e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - \hat{\beta}'X'Y$$

This sum of squared errors is called the unrestricted residual sum of squares (URSS). However, if the null hypothesis is assumed to be true, i.e., when all the slope coefficients are zero, the model shrinks to:

$$Y_i = \beta_0 + u_i$$

This model is called the restricted model. Applying OLS, we obtain:

$$\hat{\beta}_0 = \frac{\sum Y_i}{n} = \bar{Y}$$

Therefore, $e_i = Y_i - \hat{\beta}_0$, but $\hat{\beta}_0 = \bar{Y}$, so $e_i = Y_i - \bar{Y}$ and

$$\therefore \sum e_i^2 = \sum (Y_i - \bar{Y})^2 = \sum y_i^2 = TSS$$

The sum of squared errors when the null hypothesis is assumed to be true is called the Restricted Residual Sum of Squares (RRSS), and this is equal to the total sum of squares (TSS).
The ratio:

$$F = \frac{(RRSS - URSS)/(k-1)}{URSS/(n-k)} \sim F(k-1,\, n-k)$$

has an F-distribution with $k-1$ and $n-k$ degrees of freedom for the numerator and denominator, respectively (here $k$ denotes the total number of parameters estimated, including the intercept).

In this test,

$$RRSS = TSS$$
$$URSS = \sum y_i^2 - \hat{\beta}_1\sum x_{1i}y_i - \hat{\beta}_2\sum x_{2i}y_i - \ldots - \hat{\beta}_k\sum x_{ki}y_i = RSS$$

i.e., $URSS = RSS$, so that

$$F = \frac{(TSS - RSS)/(k-1)}{RSS/(n-k)} \sim F(k-1,\, n-k)$$

$$F(k-1,\, n-k) = \frac{ESS/(k-1)}{RSS/(n-k)}$$

If we divide the numerator and denominator of the above equation by $TSS$, then:

$$F = \frac{\dfrac{ESS}{TSS}\Big/(k-1)}{\dfrac{RSS}{TSS}\Big/(n-k)}$$

$$\therefore F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$

This implies that the computed value of F can be calculated either as a ratio of ESS and RSS, or of $R^2$ and $1-R^2$. This value is compared with the tabulated value of F, which leaves a probability of $\alpha$ in the upper tail of the F-distribution with $k-1$ and $n-k$ degrees of freedom.
If the null hypothesis is not true, then the difference between RRSS and URSS (i.e., between TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, reject the null hypothesis if the computed value of F (i.e., the F test statistic) becomes too large, or if the P-value for the F-statistic is lower than the chosen level of significance ($\alpha$), and vice versa.

In short, the decision rule is:
Reject $H_0$ if $F_{computed} > F_{\alpha}(k-1,\, n-k)$, or if the P-value $< \alpha$, and vice versa.

Implication: Rejecting $H_0$ implies that the parameters of the model are jointly significant, or that the dependent variable is linearly related to at least one of the independent variables included in the model.

NB: The F-statistic of the model can also be read off the ANOVA table of the regression.
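As a closing illustration, the F-statistic and its p-value can be computed from $R^2$ exactly as in the formula above. The sketch below uses simulated data (numpy and scipy assumed), not figures from the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical data for an overall-significance (F) test with two regressors.
n  = 25
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y  = 1.0 + 0.9 * X1 + 0.4 * X2 + rng.normal(size=n)

X    = np.column_stack([np.ones(n), X1, X2])
k    = X.shape[1]                                # total number of parameters
beta = np.linalg.solve(X.T @ X, X.T @ Y)
e    = Y - X @ beta

rss  = e @ e                                     # URSS
tss  = np.sum((Y - Y.mean())**2)                 # RRSS under H0: all slopes = 0
r2   = 1 - rss / tss

F      = (r2 / (k - 1)) / ((1 - r2) / (n - k))   # F = [R^2/(k-1)] / [(1-R^2)/(n-k)]
F_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)
p_val  = stats.f.sf(F, dfn=k - 1, dfd=n - k)

print(F, F_crit, p_val)                          # reject H0 if F > F_crit (or p < 0.05)
```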