
CHAPTER THREE

The Multiple Linear Regression Model


3.1 Introduction
In the preceding chapter we discussed simple linear regression, in which the dependent variable was treated as a function of a single independent variable. In real life, however, a dependent variable is typically a function of many explanatory variables. The purpose of multiple regression is therefore to study how several explanatory variables jointly explain a dependent variable. This chapter introduces this and related concepts that are useful in econometric research. For instance, in demand studies we examine the relationship between the quantity demanded of a good and the price of the good, the price of substitute goods, and the consumer's income. The model we assume is:
Yi = β0 + β1P1 + β2P2 + β3Xi + ui -------------------- (3.1)

where Yi is the quantity demanded, P1 is the price of the good, P2 is the price of substitute goods, Xi is the consumer's income, the β's are unknown parameters, and ui is the disturbance term.

Equation (3.1) is a multiple regression with three explanatory variables. In general, for k explanatory variables we can write the model as follows:

Yi = β0 + β1X1i + β2X2i + β3X3i + ......... + βkXki + ui ------- (3.2)

where Xji (j = 1, 2, 3, ......., k) are the explanatory variables, Yi is the dependent variable, βj (j = 0, 1, 2, ...., k) are unknown parameters, and ui is the disturbance term. The disturbance term is of a similar nature to that in simple regression, reflecting:
- the basic random nature of human responses
- errors of aggregation
- errors of measurement
- errors in specification of the mathematical form of the model
and any other (minor) factors, other than the X's, that might influence Y.

In this chapter we first discuss the assumptions of the multiple regression model, then carry out the analysis for the case of two explanatory variables, and finally generalize the model to k explanatory variables using matrix algebra.

3.2. Method of Ordinary Least Squares Revisited
3.2.1. Assumptions of Multiple Regression Model
In order to specify our multiple linear regression model and carry out the analysis based on it, some assumptions are required. These assumptions are the same as those of the single-explanatory-variable model developed earlier, with the addition of the assumption of no perfect multicollinearity. The assumptions are:
1. Randomness of the error term: The variable u is a real random variable.
2. Zero mean of the error term: E (ui ) = 0

3. Homoscedasticity: The variance of each ui is the same for all values of the explanatory variables, i.e. E(ui²) = σu² (a constant).

4. Normality of u: The values of each ui are normally distributed, i.e. ui ~ N(0, σu²).

5. No auto- or serial correlation: The values of ui (corresponding to Xi) are independent of the values of any other uj (corresponding to Xj) for i ≠ j, i.e. E(uiuj) = 0 for i ≠ j.

6. Independence of ui and Xi: Every disturbance term ui is independent of the explanatory variables, i.e. E(uiX1i) = E(uiX2i) = 0. This condition is automatically fulfilled if we assume that the values of the X's are a set of fixed numbers in all (hypothetical) samples.
7. No perfect multicollinearity: The explanatory variables are not perfectly linearly correlated (a small numerical check of this assumption is sketched below). Note that we cannot exhaustively list all the assumptions; the above are the basic ones that allow us to proceed with the analysis.
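To make assumption 7 concrete, the following small sketch (the numbers are invented purely for illustration) checks for perfect multicollinearity by computing the rank of the matrix of regressors:

```python
# Illustrative sketch with hypothetical data: a full-column-rank design
# matrix means there is no exact linear relation among the regressors,
# so the OLS normal equations have a unique solution.
import numpy as np

X1 = np.array([5.0, 4.5, 4.0, 4.2, 3.5, 3.0])          # e.g. own price
X2 = np.array([20.0, 22.0, 25.0, 24.0, 28.0, 30.0])    # e.g. income
X  = np.column_stack([np.ones(len(X1)), X1, X2])       # constant, X1, X2

print(np.linalg.matrix_rank(X))   # 3 => no perfect multicollinearity
```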

3.2.2. A Model with Two Explanatory Variables


In order to understand the nature of the multiple regression model more easily, we start our analysis with the case of two explanatory variables and then extend it to the case of k explanatory variables.
3.2.3 Estimation of the parameters of the two-explanatory-variable model
The model:
Y = β0 + β1X1 + β2X2 + Ui ……………………………………(3.3)
is a multiple regression with two explanatory variables. The expected value of the above model is called the population regression equation, i.e.
E(Y) = β0 + β1X1 + β2X2, since E(Ui) = 0 …………………................(3.4)

where the βj are the population parameters: β0 is referred to as the intercept, and β1 and β2 are known as the regression slopes (partial regression coefficients). Note that β2, for example, measures the effect on E(Y) of a unit change in X2 when X1 is held constant. Since the population regression equation is unknown to any investigator, it has to be estimated from sample data. Let us suppose that sample data have been used to estimate the population regression equation. We leave the method of estimation unspecified for the present and merely assume that equation (3.4) has been estimated by the sample regression equation, which we write as:

Ŷ = β̂0 + β̂1X1 + β̂2X2 ……………………………………………….(3.5)

where the β̂j are estimates of the βj and Ŷ is known as the predicted value of Y. Now it is time to state how (3.3) is estimated. Given sample observations on Y, X1 and X2, we estimate (3.3) using the method of ordinary least squares (OLS). The sample relation between Y, X1 and X2 is:

Yi = β̂0 + β̂1X1i + β̂2X2i + ei ……………………………………….(3.6)

so that the residual is

ei = Yi − Ŷi = Yi − β̂0 − β̂1X1i − β̂2X2i …………………………………..(3.7)

To obtain expressions for the least squares estimators, we partially differentiate Σei² with respect to β̂0, β̂1 and β̂2 and set the partial derivatives equal to zero:

∂(Σei²)/∂β̂0 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ………………………. (3.8)

∂(Σei²)/∂β̂1 = −2ΣX1i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ……………………. (3.9)

∂(Σei²)/∂β̂2 = −2ΣX2i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ………… ………..(3.10)

Summing over the n sample observations, these conditions yield the three normal equations:

ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i …………………………………….(3.11)

ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i …………………………(3.12)

ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ………………………...(3.13)

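As a purely illustrative sketch (the data are invented, not taken from the text), the three normal equations (3.11)–(3.13) can be set up as a 3×3 linear system and solved numerically:

```python
# Illustrative sketch: solve the normal equations (3.11)-(3.13) for the
# OLS estimates of beta0, beta1 and beta2 using hypothetical data.
import numpy as np

Y  = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 20.0])   # hypothetical dependent variable
X1 = np.array([ 5.0,  4.5,  4.0,  4.2,  3.5,  3.0])
X2 = np.array([20.0, 22.0, 25.0, 24.0, 28.0, 30.0])
n  = len(Y)

A = np.array([[n,         X1.sum(),       X2.sum()],
              [X1.sum(), (X1**2).sum(),  (X1*X2).sum()],
              [X2.sum(), (X1*X2).sum(),  (X2**2).sum()]])
b = np.array([Y.sum(), (X1*Y).sum(), (X2*Y).sum()])

b0, b1, b2 = np.linalg.solve(A, b)   # OLS estimates
print(b0, b1, b2)
```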
From (3.11) we obtain β̂0:

β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 ---------------------------------------------- (3.14)

Substituting (3.14) into (3.12), we get:

ΣX1iYi = (Ȳ − β̂1X̄1 − β̂2X̄2)ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i

⟹ ΣX1iYi − ȲΣX1i = β̂1(ΣX1i² − X̄1ΣX1i) + β̂2(ΣX1iX2i − X̄2ΣX1i)

⟹ ΣX1iYi − nX̄1Ȳ = β̂1(ΣX1i² − nX̄1²) + β̂2(ΣX1iX2i − nX̄1X̄2) ------- (3.15)

We know that:

Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − nX̄Ȳ = Σxiyi

Σ(Xi − X̄)² = ΣXi² − nX̄² = Σxi²

Substituting these results into equation (3.15), the normal equation (3.12) can be written in deviation form as follows:

Σx1y = β̂1Σx1² + β̂2Σx1x2 …………………………………………(3.16)

Using the same procedure, if we substitute (3.14) into (3.13), we get:

Σx2y = β̂1Σx1x2 + β̂2Σx2² ………………………………………..(3.17)

Let us bring (3.16) and (3.17) together:

Σx1y = β̂1Σx1² + β̂2Σx1x2 ……………………………………….(3.18)

Σx2y = β̂1Σx1x2 + β̂2Σx2² ……………………………………….(3.19)

β̂1 and β̂2 can easily be solved for using matrix algebra. We can rewrite the above two equations in matrix form as follows:

[ Σx1²   Σx1x2 ] [ β̂1 ]   [ Σx1y ]
[ Σx1x2  Σx2²  ] [ β̂2 ] = [ Σx2y ]  ………….(3.20)

If we use Cramer's rule to solve the above system, we obtain:

β̂1 = (Σx1y·Σx2² − Σx1x2·Σx2y) / (Σx1²·Σx2² − (Σx1x2)²) …………………………..…………….. (3.21)

β̂2 = (Σx2y·Σx1² − Σx1x2·Σx1y) / (Σx1²·Σx2² − (Σx1x2)²) ………………….……………………… (3.22)

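The same estimates can also be obtained directly from the deviation-form formulas (3.21) and (3.22). A sketch, continuing the invented data of the previous example:

```python
# Illustrative sketch: slope estimates from (3.21)-(3.22) and the intercept
# from (3.14), using deviations from the sample means (y, x1, x2 in the
# text's notation). Y, X1 and X2 are the hypothetical arrays defined above.
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

den = (x1**2).sum() * (x2**2).sum() - (x1*x2).sum()**2
b1  = ((x1*y).sum() * (x2**2).sum() - (x1*x2).sum() * (x2*y).sum()) / den   # (3.21)
b2  = ((x2*y).sum() * (x1**2).sum() - (x1*x2).sum() * (x1*y).sum()) / den   # (3.22)
b0  = Y.mean() - b1 * X1.mean() - b2 * X2.mean()                            # (3.14)
print(b0, b1, b2)   # identical (up to rounding) to solving the normal equations
```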
3.3. The coefficient of determination (R²): two-explanatory-variable case
In the simple regression model, we introduced R² as a measure of the proportion of variation in the dependent variable that is explained by variation in the explanatory variable. In the multiple regression model the same measure is relevant, and the same formulas are valid, but now we speak of the proportion of variation in the dependent variable that is explained by all the explanatory variables included in the model. The coefficient of determination is:
R² = ESS/TSS = 1 − RSS/TSS = 1 − Σei²/Σyi² ------------------------------------- (3.25)

⟹ R² = ESS/TSS = (β̂1Σx1iyi + β̂2Σx2iyi) / Σyi² ----------------------------------(3.27)

As in simple regression, R² can be viewed as a measure of the prediction ability of the model over the sample period, or as a measure of how well the estimated regression fits the data. The value of R² is also equal to the squared sample correlation coefficient between Ŷ and Y. Since the sample correlation coefficient measures the linear association between two variables, a high R² means there is a close association between the values of Yi and the values predicted by the model, Ŷi; in this case, the model is said to "fit" the data well. If R² is low, there is little association between the values of Yi and the values predicted by the model, Ŷi, and the model does not fit the data well.
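Continuing the same illustrative example, R² can be computed from equation (3.25) and checked against the squared sample correlation between Y and Ŷ:

```python
# Illustrative sketch: R^2 = 1 - RSS/TSS, and R^2 as the squared correlation
# between actual and fitted Y. Uses Y, X1, X2 and b0, b1, b2 from the
# hypothetical example above.
import numpy as np

Y_hat = b0 + b1*X1 + b2*X2            # fitted values
e     = Y - Y_hat                     # residuals
RSS   = (e**2).sum()
TSS   = ((Y - Y.mean())**2).sum()
R2    = 1 - RSS / TSS                 # equation (3.25)
print(R2, np.corrcoef(Y, Y_hat)[0, 1]**2)   # the two numbers coincide
```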


3.3.1 Adjusted Coefficient of Determination (R̄²)
One difficulty with R² is that it can be made large by adding more and more variables, even if the added variables have no economic justification. Algebraically, this is because as variables are added the residual sum of squares (RSS) goes down (it can remain unchanged, but this is rare) and thus R² goes up. If the model contains n − 1 explanatory variables (so that the number of estimated parameters equals the number of observations), then R² = 1. Manipulating the model just to obtain a high R² is not wise. An alternative measure of goodness of fit, called the adjusted R² and often symbolized as R̄², is usually reported by regression programs. It is computed as:

R̄² = 1 − [Σei²/(n − k)] / [Σyi²/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k) --------------------------------(3.28)

This measure does not always go up when a variable is added, because of the degrees-of-freedom term n − k. As the number of parameters k increases, RSS goes down, but so does n − k; the net effect on R̄² depends on whether RSS falls by proportionately more than the degrees of freedom. While solving one problem, this corrected measure of goodness of fit unfortunately introduces another: it loses its interpretation, since R̄² is no longer the percentage of variation explained. This modified R² is sometimes used, and misused, as a device for selecting the appropriate set of explanatory variables.
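Continuing the sketch, the adjusted R² of equation (3.28) follows in one line (here n is the number of observations and k = 3 is the number of estimated parameters, including the intercept):

```python
# Illustrative sketch: adjusted R^2 penalizes the loss of degrees of freedom.
n, k   = len(Y), 3                          # k includes the intercept
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)   # equation (3.28)
print(R2_adj)                               # never exceeds R2
```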
3.4. Hypothesis Testing in Multiple Regression Model
In multiple regression models we undertake two tests of significance. The first is the significance of the individual parameters of the model; this test is the same as that discussed for the simple regression model. The second is the overall significance of the model.
3.4.1 Tests of individual significance
If we invoke the assumption that Ui ~ N(0, σ²), then we can use either the t-test or the standard error test to test a hypothesis about any individual partial regression coefficient. To illustrate, consider the following example.

Let Yi = β̂0 + β̂1X1i + β̂2X2i + ei ………………………………… (3.51)

A. H0: β1 = 0          B. H0: β2 = 0
   H1: β1 ≠ 0             H1: β2 ≠ 0

The null hypothesis (A) states that, holding X2 constant, X1 has no (linear) influence on Y. Similarly, hypothesis (B) states that, holding X1 constant, X2 has no influence on the dependent variable Yi. To test these null hypotheses we use the following test:
The student's t-test: We compute the t-ratio for each β̂i:

t* = (β̂i − βi) / SE(β̂i) ~ t(n−k)

where n is the number of observations and k is the number of parameters. If we have 3 parameters, the degrees of freedom will be n − 3. So:

t* = (β̂2 − β2) / SE(β̂2), with n − 3 degrees of freedom.

Under our null hypothesis β2 = 0, t* becomes:

t* = β̂2 / SE(β̂2)

• If |t*| < t (tabulated), we accept the null hypothesis, i.e. we conclude that β̂2 is not significant and hence the regressor does not appear to contribute to the explanation of the variations in Y.
• If |t*| > t (tabulated), we reject the null hypothesis and accept the alternative; β̂2 is statistically significant. Thus, the greater the value of |t*|, the stronger the evidence that βi is statistically significant.
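As an illustration, the t-ratios can be computed as in the following sketch, which continues the invented example above; the standard errors are taken from the usual OLS variance estimate σ̂²(X'X)⁻¹, and SciPy is used only to obtain the critical value:

```python
# Illustrative sketch: t-ratios for H0: beta_j = 0 and the two-tailed 5%
# critical value. Y, X1, X2, b0, b1, b2 and RSS come from the sketches above.
import numpy as np
from scipy import stats

X          = np.column_stack([np.ones(len(Y)), X1, X2])   # design matrix
beta_hat   = np.array([b0, b1, b2])
df         = len(Y) - 3                                   # n - k degrees of freedom
sigma2_hat = RSS / df                                     # estimated disturbance variance
se         = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))

t_star = beta_hat / se                                    # t* under H0: beta_j = 0
t_crit = stats.t.ppf(0.975, df)                           # two-tailed test at the 5% level
print(t_star, t_crit)                                     # reject H0 where |t*| > t_crit
```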

3.4.2. Test of Overall Significance


Throughout the previous section we were concerned with testing the significance of the
estimated partial regression coefficients individually, i.e. under the separate hypothesis that each
of the true population partial regression coefficients was zero. In this section we extend this idea
to a joint test of the relevance of all the included explanatory variables. Now consider the
following:
Y = β0 + β1X1 + β2X2 + ......... + βkXk + Ui

H0: β1 = β2 = β3 = ............ = βk = 0

H1: at least one of the βj is non-zero

This null hypothesis is a joint hypothesis that β1, β2, ......., βk are jointly or simultaneously equal to zero. A test of such a hypothesis is called a test of the overall significance of the observed or estimated regression line, that is, of whether Y is linearly related to X1, X2, ......., Xk.

The test procedure for any set of hypotheses can be based on a comparison of the sum of squared errors from the original, unrestricted multiple regression model with the sum of squared errors from a regression model in which the null hypothesis is assumed to be true. When a null hypothesis is assumed to be true, we in effect place conditions, or constraints, on the values that the parameters can take, and the sum of squared errors increases. The idea of the test is that if these sums of squared errors are substantially different, then the assumption that the joint null hypothesis is true has significantly reduced the ability of the model to fit the data, and the data do not support the null hypothesis.

If the null hypothesis is true, we expect the data to be compatible with the conditions placed on the parameters; thus, there would be little change in the sum of squared errors when the null hypothesis is imposed. Let the restricted residual sum of squares (RRSS) be the sum of squared errors of the model obtained by assuming that the null hypothesis is true, and let URSS be the sum of squared errors of the original unrestricted model, i.e. the unrestricted residual sum of squares. It is always true that RRSS − URSS ≥ 0.

Consider Yi = β̂0 + β̂1X1i + β̂2X2i + ......... + β̂kXki + ei.

This model is called the unrestricted model. The test of the joint hypothesis is that:


H0: β1 = β2 = β3 = ............ = βk = 0

H1: at least one of the βj is different from zero.

We know that:

Ŷi = β̂0 + β̂1X1i + β̂2X2i + ......... + β̂kXki

Yi = Ŷi + ei, or ei = Yi − Ŷi

Σei² = Σ(Yi − Ŷi)²

This sum of squared errors is called the unrestricted residual sum of squares (URSS). This is the case when the null hypothesis is not true. If the null hypothesis is assumed to be true, i.e. when all the slope coefficients are zero, the model reduces to:

Yi = β̂0 + ei

Applying OLS, β̂0 = ΣYi / n = Ȳ …………………………….(3.52)

ei = Yi − β̂0, but β̂0 = Ȳ

ei = Yi − Ȳ

Σei² = Σ(Yi − Ȳ)² = Σy² = TSS


The sum of squared errors when the null hypothesis is assumed to be true is called the restricted residual sum of squares (RRSS), and this is equal to the total sum of squares (TSS). The ratio:

F = [(RRSS − URSS)/(k − 1)] / [URSS/(n − k)] ~ F(k−1, n−k) ……………………… (3.53)

has an F-distribution with k − 1 and n − k degrees of freedom for the numerator and denominator, respectively.

RRSS = TSS

URSS = Σei² = Σy² − β̂1Σyx1 − β̂2Σyx2 − .......... − β̂kΣyxk = RSS

F = [(TSS − RSS)/(k − 1)] / [RSS/(n − k)]  or  F = [ESS/(k − 1)] / [RSS/(n − k)] ……………………………. (3.54)

If we divide the numerator and denominator above by Σy² = TSS, then:

F = [(ESS/TSS)/(k − 1)] / [(RSS/TSS)/(n − k)]  or  F = [R²/(k − 1)] / [(1 − R²)/(n − k)] ………………………………………..(3.55)
This implies that the computed value of F can be calculated either from ESS and RSS or from R² and 1 − R². If the null hypothesis is not true, then the difference between RRSS and URSS (TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, we reject the null hypothesis if the F-test statistic becomes too large. This value is compared with the critical value of F that leaves a probability of α in the upper tail of the F-distribution with k − 1 and n − k degrees of freedom. If the computed value of F is greater than the critical value F(k−1, n−k), then the parameters of the model are jointly significant, i.e. the dependent variable Y is linearly related to the independent variables included in the model.
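Continuing the illustrative sketch, the overall F-statistic of equation (3.55) and its 5% critical value can be computed as follows:

```python
# Illustrative sketch: overall significance test, F = [R^2/(k-1)] / [(1-R^2)/(n-k)].
# Y and R2 come from the hypothetical example above.
from scipy import stats

n, k   = len(Y), 3
F_stat = (R2 / (k - 1)) / ((1 - R2) / (n - k))   # equation (3.55)
F_crit = stats.f.ppf(0.95, k - 1, n - k)         # upper 5% point of F(k-1, n-k)
print(F_stat, F_crit)                            # reject H0 if F_stat > F_crit
```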

3.6. Prediction Using Multiple Linear Regression


The formulas for prediction in multiple regression are similar to those for simple regression. Let the estimated regression be

Ŷi = α̂ + β̂1X1i + β̂2X2i ------------------------------------------------------------- (3.56)

Now consider the prediction of Y0, the value of Y corresponding to given values X10 of X1 and X20 of X2. The true value is

Y0 = α + β1X10 + β2X20 + U0, where U0 is the error term -------------------------------- (3.57)

and we predict the corresponding value of Y by

Ŷ0 = α̂ + β̂1X10 + β̂2X20

Hence, the prediction error is

Ŷ0 − Y0 = (α̂ − α) + (β̂1 − β1)X10 + (β̂2 − β2)X20 − U0 ------------------------------(3.58)

Since E(α̂ − α) = 0, E(β̂1 − β1) = 0, E(β̂2 − β2) = 0 and E(U0) = 0, we have

E(Ŷ0 − Y0) = 0

This result shows that the predictor Ŷ0 obtained from the estimated regression (3.56) is unbiased. Recall that the predictor is unbiased in the sense that E(Ŷ0) = E(Y0), since both Ŷ0 and Y0 are random variables.
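As a final illustrative sketch, the point prediction of equation (3.56) at hypothetical values X10 and X20 (chosen arbitrarily here) is simply the estimated equation evaluated at those values:

```python
# Illustrative sketch: point prediction at hypothetical regressor values,
# using the estimates b0, b1, b2 from the example above.
X10, X20 = 4.0, 26.0                  # assumed new values of X1 and X2
Y0_hat   = b0 + b1 * X10 + b2 * X20   # equation (3.56) evaluated at (X10, X20)
print(Y0_hat)                         # unbiased predictor of Y0
```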

