
CHAPTER THREE
MULTIPLE LINEAR REGRESSION ANALYSIS: ESTIMATION AND HYPOTHESIS TESTING
3.1 Introduction
So far, we have discussed the simplest form of regression analysis, simple linear regression. However, a more realistic representation of real-world economic relationships is obtained from multiple regression analysis, because most financial and economic variables are determined by more than a single variable.

In this chapter we expand the SLR model of chapter 2 to a Multiple Regression Model, which means that there
is not one explanatory variable, but there are k explanatory variables.
Yᵢ = α + β₁X₁ᵢ + β₂X₂ᵢ + ⋯ + βₖXₖᵢ + uᵢ
where Yᵢ is the value of the dependent variable for observation i, α is the constant (the intercept), βₖ is the coefficient (or slope) of the kth explanatory variable, Xₖᵢ is the value of the kth explanatory variable for observation i, and uᵢ is the error term of observation i.

There are three advantages of MLR over SLR modeling:


1. Multiple regression analysis can be used to build better models for predicting the dependent variable, because Y can now be predicted on the basis of more information (more X's). The R² of a model therefore never decreases, and typically increases, when additional independent variables are added.
2. Secondly, a single MLR model can be used to estimate the partial effects of several variables on the dependent variable. We can model the effect of various X's on Y at once (in one model).
3. Lastly, and most importantly, MLR allows us to control for other variables that may affect Y. If we, for example, want to find out the effect of age (X) on salary (Y), we will likely find a positive β̂ with SLR. However, it may not be age itself but other factors correlated with age that explain the higher salary. Such a model suffers from omitted variable bias. Relevant but omitted variables may, for example, be experience or years of schooling. If we add all three to an MLR model, we can see the effect of age on salary while controlling for experience and years of schooling. In that case, we can more easily reach a ceteris paribus conclusion (if X changes by 1, ceteris paribus, Y changes by β). Multiple regression analysis is more amenable to ceteris paribus analysis because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable. This is important both for testing financial and economic theories and for evaluating policy effects when we must rely on non-experimental data.

As discussed in chapter 2, the assumptions of linear regression include the following.


1. The model is linear in parameters.
2. The model is correctly specified & the data is correctly aggregated.
3. Randomness of the error term: The variable Uᵢ is a random variable with a mean of 0, i.e., E(Uᵢ) = 0.
4. Homoscedasticity: The variance of Uᵢ is the same for all the X values, i.e., E(Uᵢ²) = σ² (constant variance).
5. Normality of Uᵢ: The values of Uᵢ are normally distributed, i.e., Uᵢ ~ N(0, σ²).
6. No auto- or serial correlation: The value of Uᵢ (corresponding to Xᵢ) is independent of the value of any other Uⱼ (corresponding to Xⱼ) for i ≠ j, i.e., E(UᵢUⱼ) = 0 for i ≠ j.


7. Independence of Uᵢ and Xᵢ: Every disturbance term Uᵢ is independent of the explanatory variables, i.e., E(UᵢX₁ᵢ) = E(UᵢX₂ᵢ) = 0. This means that there is no omitted variable bias.
8. The X values vary in the sample.
9. No perfect multicollinearity: The explanatory variables of the model are not perfectly correlated. That is, no explanatory variable of the model is an exact linear combination of the others. Perfect collinearity is a problem because the least squares estimator cannot separately attribute variation in Y to the individual independent variables.
   - Example: Suppose we regress weight (Y) on height measured in meters (X₁) and height measured in centimeters (X₂). How could we decide which regressor to attribute the changing weight to? (A short numerical sketch of this problem follows below.)
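To make the problem concrete, here is a minimal sketch (our illustration, with made-up numbers, not part of the original text) of the height-in-meters versus height-in-centimeters case. With perfectly collinear regressors the cross-product matrix X'X is singular, so the OLS formula (X'X)⁻¹X'y cannot be computed.

```python
import numpy as np

# Hypothetical data: the same height recorded in meters and in centimeters
height_m  = np.array([1.60, 1.75, 1.82, 1.68])
height_cm = 100 * height_m                      # an exact linear combination of height_m

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(height_m), height_m, height_cm])

print(np.linalg.matrix_rank(X))     # 2, not 3: one column carries no new information
print(np.linalg.cond(X.T @ X))      # enormous condition number: X'X is (numerically) singular
```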

3.2 Estimation: method of least squares


As in the case of SLR, we can estimate the relationship between the variables of a multiple regression model using the method of Ordinary Least Squares (OLS). To understand the nature of multiple regression analysis, we start with the case of two explanatory variables. Once you have grasped this principle, the case of more than two explanatory variables is covered in the reading assignment. The examples in this text are, however, very simplified; in practice it is preferable to use software to find the best parameter estimates. Like SLR models, MLR models can also be estimated by the method of moments (MoM) and maximum likelihood (ML) methods, but explaining these in depth goes beyond the scope of this course. Therefore, we focus our discussion on estimating MLR models by OLS.

3.2.1 Estimation of the model with two explanatory variables


A MLR model with two explanatory variables is expressed mathematically as:
Yᵢ = α + β₁X₁ᵢ + β₂X₂ᵢ + Uᵢ                                            (3.1)
This equation is the population regression function (PRF) and can be estimated on the basis of a sample regression function of the following form:
Yᵢ = α̂ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + eᵢ                                            (3.2)
From (3.2) we can obtain the prediction errors/residuals of the model as:
eᵢ = Yᵢ − Ŷᵢ = Yᵢ − α̂ − β̂₁X₁ᵢ − β̂₂X₂ᵢ                                   (3.3)
The method of ordinary least squares chooses the estimates that minimize the sum of squared prediction errors/sum of squared residuals of the model. Therefore, squaring and summing (3.3) over all sample values of the variables, we get the total squared prediction error of the model:
∑eᵢ² = ∑(Yᵢ − α̂ − β̂₁X₁ᵢ − β̂₂X₂ᵢ)²                                       (3.4)
Therefore, to obtain expressions for the least squares estimators, we partially differentiate ∑eᵢ² with respect to α̂, β̂₁ and β̂₂ and set the partial derivatives equal to zero:
∂(∑eᵢ²)/∂α̂ = −2∑(Yᵢ − α̂ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0                            (3.5)
∂(∑eᵢ²)/∂β̂₁ = −2∑X₁ᵢ(Yᵢ − α̂ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0                        (3.6)
∂(∑eᵢ²)/∂β̂₂ = −2∑X₂ᵢ(Yᵢ − α̂ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0                        (3.7)
Rearranging these first-order conditions produces the following three equations, called the OLS normal equations:


∑Yᵢ = nα̂ + β̂₁∑X₁ᵢ + β̂₂∑X₂ᵢ                                              (3.8)
∑YᵢX₁ᵢ = α̂∑X₁ᵢ + β̂₁∑X₁ᵢ² + β̂₂∑X₁ᵢX₂ᵢ                                     (3.9)
∑YᵢX₂ᵢ = α̂∑X₂ᵢ + β̂₁∑X₁ᵢX₂ᵢ + β̂₂∑X₂ᵢ²                                     (3.10)
From (3.8) we obtain α̂ as:
α̂ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂                                                     (3.11)
Substituting (3.11) into (3.9), we obtain:
∑YᵢX₁ᵢ = (Ȳ − β̂₁X̄₁ − β̂₂X̄₂)∑X₁ᵢ + β̂₁∑X₁ᵢ² + β̂₂∑X₁ᵢX₂ᵢ
⇒ ∑YᵢX₁ᵢ = Ȳ∑X₁ᵢ − β̂₁X̄₁∑X₁ᵢ − β̂₂X̄₂∑X₁ᵢ + β̂₁∑X₁ᵢ² + β̂₂∑X₁ᵢX₂ᵢ
⇒ ∑YᵢX₁ᵢ − Ȳ∑X₁ᵢ = β̂₁(∑X₁ᵢ² − X̄₁∑X₁ᵢ) + β̂₂(∑X₁ᵢX₂ᵢ − X̄₂∑X₁ᵢ)
⇒ ∑YᵢX₁ᵢ − nȲX̄₁ = β̂₁(∑X₁ᵢ² − nX̄₁²) + β̂₂(∑X₁ᵢX₂ᵢ − nX̄₁X̄₂)               (3.12)


To simplify the notation, we define lower-case letters as deviations from the sample means:
xᵢ = Xᵢ − X̄  and  yᵢ = Yᵢ − Ȳ
Therefore,
∑XᵢYᵢ − nX̄Ȳ = ∑xᵢyᵢ
∑(Xᵢ − X̄)² = ∑Xᵢ² − nX̄² = ∑xᵢ²


Substituting the above expressions, equation (𝟑. 𝟏𝟐) can be written in deviation form as:
∑x₁ᵢyᵢ = β̂₁∑x₁ᵢ² + β̂₂∑x₁ᵢx₂ᵢ                                             (3.13)
If we substitute (3.11) in (3.10) and follow a similar procedure to the one above, we obtain:
∑x₂ᵢyᵢ = β̂₁∑x₁ᵢx₂ᵢ + β̂₂∑x₂ᵢ²                                             (3.14)
Let's put (3.13) and (3.14) together:
∑x₁ᵢyᵢ = β̂₁∑x₁ᵢ² + β̂₂∑x₁ᵢx₂ᵢ
∑x₂ᵢyᵢ = β̂₁∑x₁ᵢx₂ᵢ + β̂₂∑x₂ᵢ²

Therefore, β̂₁ and β̂₂ can be solved for using the matrix approach.

We can rewrite the above two equations in matrix form as follows:

[ ∑x₁ᵢ²     ∑x₁ᵢx₂ᵢ ] [ β̂₁ ]   [ ∑x₁ᵢyᵢ ]
[ ∑x₁ᵢx₂ᵢ   ∑x₂ᵢ²   ] [ β̂₂ ] = [ ∑x₂ᵢyᵢ ]                                (3.15)

We can solve this system using Cramer's rule: each slope estimate equals the determinant of the coefficient matrix with the corresponding column replaced by the right-hand-side vector (∑x₁ᵢyᵢ, ∑x₂ᵢyᵢ)′, divided by the determinant of the coefficient matrix itself.

Therefore, we obtain:

β̂₁ = (∑x₁ᵢyᵢ·∑x₂ᵢ² − ∑x₁ᵢx₂ᵢ·∑x₂ᵢyᵢ) / (∑x₁ᵢ²·∑x₂ᵢ² − (∑x₁ᵢx₂ᵢ)²)        (3.16)

β̂₂ = (∑x₂ᵢyᵢ·∑x₁ᵢ² − ∑x₁ᵢx₂ᵢ·∑x₁ᵢyᵢ) / (∑x₁ᵢ²·∑x₂ᵢ² − (∑x₁ᵢx₂ᵢ)²)        (3.17)

Numerical example of calculating MLR with two independent variables


Y (customers served per hour) | X₁ (branch size in m²) | X₂ (employees) | yᵢ | x₁ᵢ | x₂ᵢ | x₁ᵢ² | x₂ᵢ² | x₁ᵢ·yᵢ | x₂ᵢ·yᵢ | x₁ᵢ·x₂ᵢ
320 | 60 | 24 | 120 | 25 | 15 | 625 | 225 | 3000 | 1800 | 375
180 | 30 | 6 | −20 | −5 | −3 | 25 | 9 | 100 | 60 | 15
280 | 40 | 10 | 80 | 5 | 1 | 25 | 1 | 400 | 80 | 5
20 | 10 | 1 | −180 | −25 | −8 | 625 | 64 | 4500 | 1440 | 200
200 | 35 | 4 | 0 | 0 | −5 | 0 | 25 | 0 | 0 | 0
mean: 200 | mean: 35 | mean: 9 | | | SUM: | 1300 | 324 | 8000 | 3380 | 595
Consider the average number of customers served per hour in 5 branches of a certain bank. Y is the number of customers served per hour, X₁ is the floor area in square meters (the size of the branch) and X₂ is the number of employees in the branch. The aim is to predict the number of customers served per hour (Y) based on the branch size and the number of employees:
Ŷᵢ = α̂ + β̂₁X₁ᵢ + β̂₂X₂ᵢ
Now, let's find the parameter estimates by using the formulas stated above.
β̂₁ = (∑x₁ᵢyᵢ·∑x₂ᵢ² − ∑x₁ᵢx₂ᵢ·∑x₂ᵢyᵢ) / (∑x₁ᵢ²·∑x₂ᵢ² − (∑x₁ᵢx₂ᵢ)²) = (8000·324 − 595·3380) / (1300·324 − 595²) ≈ 8.65
This means that, after controlling for the number of employees, the number of customers served per hour increases by approximately 8.65 (roughly 9) if the branch size expands by 1 square meter.
β̂₂ = (∑x₂ᵢyᵢ·∑x₁ᵢ² − ∑x₁ᵢx₂ᵢ·∑x₁ᵢyᵢ) / (∑x₁ᵢ²·∑x₂ᵢ² − (∑x₁ᵢx₂ᵢ)²) = (3380·1300 − 595·8000) / (1300·324 − 595²) ≈ −5.45
This means that, after controlling for the branch size in square meters, the number of customers served per hour decreases by approximately 5.45 (roughly 5) if one employee is added to the branch.
α̂ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂ = 200 − 8.65·35 − (−5.45)·9 ≈ −53.63
This means that, for the fictive case of a branch with 0 square meters and 0 employees, the expected number of customers served per hour is negative 54 (about −53.63).
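These hand calculations can be checked with a few lines of code. The following is an illustrative sketch (not part of the original text) that applies the deviation-form formulas (3.16) and (3.17) to the data in the table above:

```python
import numpy as np

# Data from the bank-branch example above
Y  = np.array([320, 180, 280, 20, 200])   # customers served per hour
X1 = np.array([ 60,  30,  40, 10,  35])   # branch size in square meters
X2 = np.array([ 24,   6,  10,  1,   4])   # number of employees

# Deviations from the sample means
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

# Formulas (3.16) and (3.17), then (3.11) for the intercept
den   = (x1**2).sum() * (x2**2).sum() - (x1*x2).sum()**2
beta1 = ((x1*y).sum() * (x2**2).sum() - (x1*x2).sum() * (x2*y).sum()) / den
beta2 = ((x2*y).sum() * (x1**2).sum() - (x1*x2).sum() * (x1*y).sum()) / den
alpha = Y.mean() - beta1 * X1.mean() - beta2 * X2.mean()

print(round(beta1, 2), round(beta2, 2), round(alpha, 2))   # 8.65 -5.45 -53.63
```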

In general, the estimates β̂₁ and β̂₂ have partial effect, or ceteris paribus, interpretations. From the above equation, we have
ΔŶ = β̂₁ΔX₁ + β̂₂ΔX₂
So, we can obtain the predicted change in Y given the changes in X₁ and X₂. In particular, when X₂ is held fixed, so that ΔX₂ = 0, then
ΔŶ = β̂₁ΔX₁, holding X₂ fixed.
The key point is that, by including X₂ in our model, we obtain a coefficient on X₁ with a ceteris paribus interpretation. This is why multiple regression analysis is so useful. Similarly,
ΔŶ = β̂₂ΔX₂, holding X₁ fixed.


Example of parameter estimate interpretation in MLR


Suppose an econometrician has estimated the following wage model based on a sample of 100 individuals from a given city:
ln(Wageᵢ) = 0.75 + 0.125·Educᵢ + 0.085·Experᵢ
where ln(Wage) is the natural logarithm of the hourly wage measured in ETB,
Educ is the educational attainment of the sample respondent measured in years of schooling, and
Exper is experience measured in years of related work experience.

How can one interpret the coefficients of Educ and Exper? (NB: the coefficients have a percentage
interpretation when multiplied by 100)

The coefficient 0.125 means that, holding Exper fixed, another year of education is predicted to increase ln(wage) by 0.125, i.e., to increase the wage by about 12.5%, on average. Alternatively, if we take two people with the same level of experience, the coefficient on Educ is the proportionate difference in predicted wage when their education levels differ by one year. Similarly, the coefficient on Exper, 0.085, means that, holding Educ fixed, another year of related work experience is predicted to increase ln(wage) by 0.085, i.e., to increase the wage by about 8.5%, on average.
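Strictly speaking, multiplying a log-level coefficient by 100 gives an approximation that is accurate only for small coefficients; the exact proportional change is 100·(e^β − 1). Here, 100·(e^0.125 − 1) ≈ 13.3% for education and 100·(e^0.085 − 1) ≈ 8.9% for experience, so the approximate readings of 12.5% and 8.5% slightly understate the exact effects.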

Example of MLR with Stata output


The Unilever team of Lifeboy soap wants to increase sales. To this end, they made radio and television
advertisements (ads). To investigate the effect of these ads on sales, they have collected the following data:
Week            1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     16     17
Sales (Y)       17578  46978  35875  12020  74068  20497  47428  45094  18917  26030  50671  30976  92430  51963  27483  41362  100843
Radio ads (X₁)  101    97     87     26     177    59     141    259    200    240    196    84     245    202    139    242    239
TV ads (X₂)     2      28     26     32     23     26     7      36     16     27     50     50     9      6      19     39     30

They use Stata software to analyse the model

Yᵢ = α + β₁X₁ᵢ + β₂X₂ᵢ + eᵢ

where Yᵢ is the soap sales in week i, α is the constant/intercept, β₁ is the coefficient for the effect of radio ads on sales, X₁ᵢ is the number of radio ads in week i, β₂ is the coefficient for the effect of TV ads on sales, X₂ᵢ is the number of TV ads in week i, and eᵢ is the random error term. The Stata output is displayed in Figure 1. Notice that, like the SLR output seen in chapter 2, the confidence intervals of the parameter estimates are also displayed. The formula to calculate the confidence interval is the same: β̂ ± t_(α/2)·SE(β̂).

Figure 1: Predicting sales based on the number of radio and tv ads


From this output we can find that α̂ = 14661, β̂₁ = 192 and β̂₂ = −77, so, to predict the soap sales based on the number of radio and TV ads, we can use the following formula: Ŷ = 14661 + 192X₁ − 77X₂
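The same estimates can be reproduced outside Stata. Below is an illustrative Python sketch (the variable names are ours; this is not the original Stata session) that fits the model to the weekly data from the table above by ordinary least squares:

```python
import numpy as np

# Weekly data from the table above
sales = np.array([17578, 46978, 35875, 12020, 74068, 20497, 47428, 45094, 18917,
                  26030, 50671, 30976, 92430, 51963, 27483, 41362, 100843])
radio = np.array([101, 97, 87, 26, 177, 59, 141, 259, 200,
                  240, 196, 84, 245, 202, 139, 242, 239])
tv    = np.array([2, 28, 26, 32, 23, 26, 7, 36, 16,
                  27, 50, 50, 9, 6, 19, 39, 30])

# Design matrix with a constant column; OLS minimizes the sum of squared residuals
X = np.column_stack([np.ones_like(sales, dtype=float), radio, tv])
beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)

print(np.round(beta_hat, 2))   # approximately [14661.19, 191.66, -77.49]
```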


How can we interpret this estimation? α̂ = 14661 means that, if there are no radio and TV ads (X₁ = 0 and X₂ = 0), we expect the sales to be 14661 blocks of soap. β̂₁ = 192 means that if radio ads increase by one, we expect the soap sales to increase by 192 blocks, assuming that TV ads are unchanged (ceteris paribus, thus controlling for the number of TV ads). This is a significant effect (t: 2.54, p: 0.023, which is less than 0.05). β̂₂ = −77 means that if the team adds one TV ad, the soap sales are expected to drop by 77 blocks (assuming that radio ads stay the same, thus controlling for the number of radio ads). This is not in line with our expectation, as theory predicts that ads will increase sales. A possible reason may be that the TV ad is of bad quality (not convincing consumers to buy the soap). Also, looking at the t-test results, we see that β̂₂ does not differ significantly from 0 (t: −0.2, with a p-value well above 0.05). Therefore, we find no evidence that TV ads influence soap sales.

The implication of this model for management is that, to boost sales, they should focus on radio advertisement rather than on TV advertisement. If they want to use TV advertisement, they should change the content of the ad, as the current ad does not lead to increased sales. For the method of calculating MLR with more than two explanatory variables by hand, read the Reading Assignment for Chapter 3.

3.3 Coefficients of multiple determination


As you have seen in chapter 2, the strength of a linear regression model can be computed by calculating R2,
which is the explained variance as a ratio of the total variance of the dependent variable. In the multiple
regression model the same measure is relevant, and the same formulas are valid but now we talk of the
proportion of variation in the dependent variable explained by all explanatory variables included in the model.
The coefficient of determination is defined as:
R² = ESS/TSS = 1 − RSS/TSS = 1 − ∑eᵢ²/∑yᵢ²                              (3.18)

In the present model of two explanatory variables (working in deviation form):

∑eᵢ² = ∑(yᵢ − β̂₁x₁ᵢ − β̂₂x₂ᵢ)²
∑eᵢ² = ∑eᵢ(yᵢ − β̂₁x₁ᵢ − β̂₂x₂ᵢ)
     = ∑eᵢyᵢ − β̂₁∑x₁ᵢeᵢ − β̂₂∑x₂ᵢeᵢ
     = ∑eᵢyᵢ                        since ∑x₁ᵢeᵢ = ∑x₂ᵢeᵢ = 0
∑eᵢ² = ∑yᵢ(yᵢ − β̂₁x₁ᵢ − β̂₂x₂ᵢ)
∑eᵢ² = ∑yᵢ² − β̂₁∑x₁ᵢyᵢ − β̂₂∑x₂ᵢyᵢ

Rearranging,
∑yᵢ² = β̂₁∑x₁ᵢyᵢ + β̂₂∑x₂ᵢyᵢ + ∑eᵢ²                                        (3.19)
(total sum of squares = explained sum of squares + residual sum of squares)

∴ R² = ESS/TSS = (β̂₁∑x₁ᵢyᵢ + β̂₂∑x₂ᵢyᵢ) / ∑yᵢ²                            (3.20)


As in simple regression, R2 is also viewed as a measure of the prediction ability of the model over the sample
period, or as a measure of how well the estimated regression fits the data. If 𝑹𝟐 is high, the model is said to
“fit” the data well. On the other hand, if 𝑹𝟐 is low, the model does not fit the data well.

Adjusted Coefficient of Determination (R̄²)


One major limitation with 𝑹𝟐 is that it can be made large by adding more and more variables, even if the
variables added have no theoretical justification. Algebraically, it is the fact that as the variables are added, the
sum of squared errors (RSS) goes down (it can remain unchanged, but this is rare) and thus 𝑹𝟐 goes up. Adding
explanatory variables just to obtain a high R² is not wise. To overcome this limitation of R², we can "adjust" it in a way that takes into account the number of variables included in a given model. This alternative measure of goodness of fit, called the adjusted R² and often symbolized as R̄², is usually reported by regression programs. It is computed as:
R̄² = 1 − (∑eᵢ²/(n − k)) / (∑yᵢ²/(n − 1)) = 1 − (1 − R²)·(n − 1)/(n − k)        (3.21)

where n is the sample size and k is the number of parameter estimates (α̂ and the β̂'s) in the model. This measure does not necessarily go up when a variable is added, because of the degrees-of-freedom term n − k in the formula. That is, R̄² imposes a penalty for adding additional regressors to a model: if a regressor is added, RSS decreases or at least remains constant, but the degrees of freedom of the regression, n − k, always decrease.

Example reading the coefficient of multiple determinations from Stata output


[Figure 2 shows the Stata ANOVA table for this regression: the Model, Residual and Total sums of squares (with Model SS + Residual SS = Total SS), the corresponding mean squares (each SS divided by its degrees of freedom), and the reported R-squared and Adjusted R-squared.]

Figure 2: Decomposition of the variation, leading to R-squared and Adjusted R-squared

Consider the example given in Figure 2, about the effect of advertisement (radio and TV) on sales of soap. In the output, the TSS, ESS (labelled Model SS) and RSS are displayed. The R² is
R² = ESS/TSS = 3.2899e+09 / 1.0348e+10 = 0.3179


The adjusted R² considers not the SS (sums of squares) but the MS (mean squares). If the RSS is divided by n − k (in this case 17 − 3 = 14), the residual mean square (RMS) becomes 504120809. If the TSS is divided by n − 1 (in this case 17 − 1 = 16), the total mean square (TMS) becomes 646722450. This allows us to calculate the adjusted R² as follows:
R̄² = 1 − (∑eᵢ²/(n − k)) / (∑yᵢ²/(n − 1)) = 1 − (7.0577e+09/14) / (1.0348e+10/16) = 0.2205
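As an illustrative check (a sketch, not the Stata computation itself), the two goodness-of-fit measures can be recomputed from the sums of squares reported in Figure 2:

```python
# Goodness of fit from the sums of squares reported in Figure 2
n, k = 17, 3                 # 17 weeks; 3 estimated parameters (constant + 2 slopes)
TSS  = 1.0348e10
ESS  = 3.2899e09
RSS  = TSS - ESS             # about 7.058e09

r2     = ESS / TSS                                # equation (3.18)
adj_r2 = 1 - (RSS / (n - k)) / (TSS / (n - 1))    # equation (3.21)

print(round(r2, 4), round(adj_r2, 4))             # about 0.3179 and 0.2205
```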

3.4 Properties of least squares and Gauss-Markov Theorem


Like in the case of simple linear regression, the OLS estimators (α̂ and the β̂'s) satisfy the Gauss-Markov Theorem in multiple linear regression if all the assumptions stated in 3.1 are met. In other words, in the class of linear and unbiased estimators, the OLS estimators are the best (minimum-variance) estimators.
Recall that they are BLUE: Best Linear Unbiased Estimators, where "best" means efficient.
Linear indicates that the estimator is a linear function of the observations on the dependent variable; because the model is linear in parameters, OLS delivers such linear estimators.
Unbiased indicates that the β̂'s are centred on their true β equivalents. For example, if β₁ = 23.4 and we were to draw 100 samples from the population and each time calculate β̂₁, we expect the mean value of these 100 β̂₁'s to be 23.4, so that the estimates are centred on the true population parameter value.
Efficient means unbiased with minimum variance. To continue the example above, where β₁ = 23.4, the 100 estimates of β̂₁ could have a large or a small spread around the true β₁. Efficiency indicates a small spread (a small variance) of the possible β̂₁'s around the true β₁; this is what the OLS method achieves.
For extensive proof that OLS estimators are BLUE, please read the reading assignment ch 3.
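A small simulation sketch (our illustration under assumed parameter values, not part of the original text) makes the unbiasedness claim concrete: when the assumptions hold, the average of β̂₁ across repeated samples is close to the true β₁.

```python
import numpy as np

rng = np.random.default_rng(0)
true_beta1 = 23.4                        # assumed "true" slope, as in the example above
estimates = []

for _ in range(100):                     # draw 100 samples and re-estimate each time
    x1 = rng.uniform(0, 10, size=50)
    x2 = rng.uniform(0, 10, size=50)
    u  = rng.normal(0, 5, size=50)       # mean-zero, homoskedastic, independent errors
    y  = 5 + true_beta1 * x1 - 2 * x2 + u
    X  = np.column_stack([np.ones(50), x1, x2])
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    estimates.append(beta_hat[1])        # keep the estimate of beta1

print(round(np.mean(estimates), 2))      # close to 23.4
```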

3.5 Hypothesis testing and interval estimation


In multiple regression models, we undertake two types of significance tests: tests of individual significance and a test of overall significance. Let's examine them one by one.

A) Tests of Individual Significance—Student’s t-test


This is the process of verifying the individual statistical significance of each parameter estimate of a model. That is, checking whether the impact of a single explanatory variable on the dependent variable is significant or not, after taking the impact of all other explanatory variables on the dependent variable into account. To elaborate the test of individual significance, consider the following model of the determinants of Teff farm productivity.
Let Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + eᵢ                                          (3.22)
where Yᵢ is the total output of Teff per hectare of land, and X₁ and X₂ are the amount of fertilizer used and rainfall, respectively.
Given the above model suppose we need to check whether the application of fertilizer (𝑿𝟏 ) has a significant
effect on agricultural productivity, holding the effect of rainfall (𝑿𝟐 ) on Teff farm productivity constant, i.e.,
whether fertilizer (𝑿𝟏 ) is a significant factor in affecting Teff farm productivity after taking the impact of
rainfall on Teff farm productivity into account. In this case, we test the significance of 𝜷𝟏 holding the influence


of 𝑿𝟐 on Y constant. Mathematically, test of individual significance involves testing the following two pairs of
null and alternative hypotheses.
𝑨. 𝑯𝟎 : 𝜷𝟏 = 𝟎 B. 𝑯𝟎 : 𝜷𝟐 = 𝟎
𝑯𝑨 : 𝜷𝟏 ≠ 𝟎 𝑯𝑨 : 𝜷𝟐 ≠ 𝟎
The null hypothesis in 𝐴 states that holding 𝑿𝟐 constant, 𝑿𝟏 has no significant (linear) influence on 𝒀. Similarly,
the null hypothesis in ‘𝑩’ states that holding 𝑿𝟏 constant, 𝑿𝟐 has no influence on the dependent variable 𝒀. To
test the individual significance of parameter estimates in MLRMs, it is common to use the Student's t-test, as you have seen in chapter 2.

The procedure is as follows.


Step 1: State the null and alternative hypotheses empirically.
𝑯𝟎 : 𝜷𝟏 = 𝟎 and 𝑯𝑨 : 𝜷𝟏 ≠ 𝟎
Step 2: Choose the level of significance (𝜶).
Step 3: Determine the critical values and identify the acceptance and rejection regions of the null hypothesis at the chosen level of significance (α) and degrees of freedom (n − k). To identify the critical value, divide the level of significance (α) by two and read the table value t_t from the t-probability table at α/2 with (n − k) degrees of freedom, where n is the number of observations and k is the number of parameters in the model including the intercept term. In the case of a two-explanatory-variable model, the number of parameters is 3, so the degrees of freedom are n − 3.
Step 4: Compute the t-statistic (t_C) of the estimate under the null hypothesis. That is,
t_C = (β̂₁ − β₁) / SE(β̂₁)
Since β₁ = 0 under the null hypothesis, the computed t-statistic of the estimate β̂₁ is
t_C = (β̂₁ − 0) / SE(β̂₁) = β̂₁ / SE(β̂₁)
Step 5: Compare t_C and t_t and make a decision.
- If |t_C| < t_t, accept the null hypothesis. That is, β̂₁ is not significant at the chosen level of significance. This would imply that, holding X₂ constant, X₁ has no significant linear impact on Y.
- If |t_C| > t_t, reject the null hypothesis and hence accept the alternative one. That is, β̂₁ is significant at the chosen level of significance. This would imply that, holding X₂ constant, X₁ has a significant linear impact on Y.
A numerical sketch of Steps 3 to 5 is given below.
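The sketch below applies Steps 3 to 5 to the radio_ads coefficient reported in Figure 3 (an illustration using SciPy's t distribution for the table value; it is not the Stata computation itself):

```python
from scipy import stats

beta1_hat, se_beta1 = 191.6551, 75.33542   # estimate and standard error from Figure 3
n, k = 17, 3                               # observations and parameters (incl. intercept)

t_c = (beta1_hat - 0) / se_beta1           # Step 4: test statistic under H0: beta1 = 0
t_t = stats.t.ppf(1 - 0.05 / 2, df=n - k)  # Step 3: critical value at alpha = 0.05
p   = 2 * stats.t.sf(abs(t_c), df=n - k)   # two-sided p-value

# Step 5: |t_c| = 2.54 > t_t = 2.14 and p = 0.023 < 0.05, so H0 is rejected
print(round(t_c, 2), round(t_t, 2), round(p, 3))
```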

As you have seen in chapter 2, the Stata output presents the t-test statistic (t_C) and the corresponding p-value (the probability of getting this, or a more extreme, t if H₀ is true). Note that the output in Figure 3 indicates that the constant (α̂) is 14661, with a t of 0.88 and a p-value of 0.393. As the p-value of the constant is greater than 0.05, we conclude that α̂ is not statistically significantly different from 0.

β̂₁, the coefficient of the explanatory variable "radio_ads", is 191.66, with a t-value of 2.54 and a p-value of 0.023. As the p-value is less than 0.05, it can be concluded that β̂₁ is significantly different from 0.


As the coefficient is positive, we can state that, after controlling for tv_ads, radio_ads has a significant positive effect on sales. What about tv_ads? Work out the answer yourself and check it in class.

Notice that the t-value for radio_ads is t_C = (β̂₁ − 0)/SE(β̂₁) = 191.6551/75.33542 = 2.54, and the corresponding p-value is 0.023, which is below 0.05, so β̂₁ is significantly different from zero.
Figure 3: Coefficient output of Multiple Linear Regression, including t- and p-values

B) Test of the Overall Significance of MLRMs: the F-test
This is the process of testing the joint significance of parameter estimates of the model. It involves checking
whether the variation in dependent variable of a model is significantly explained by the variation in all
explanatory variables included in the model. To elaborate test of the overall significance consider a model:
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + β₃X₃ᵢ + ⋯ + βₖXₖᵢ + eᵢ
Given this model, we may be interested in knowing whether the variation in the dependent variable can be attributed to the variation in all the explanatory variables of the model or not. If no amount of the variation in the dependent variable can be attributed to the variation in the explanatory variables included in the model, then none of the explanatory variables included in the model is relevant; that is, all estimated slope coefficients will be statistically indistinguishable from zero. On the other hand, if a significant proportion of the variation in the dependent variable can be attributed to the variation in the explanatory variables, then at least one of the explanatory variables included in the model is relevant; that is, at least one of the estimated slope coefficients will be statistically different from zero (significant).

Thus, this test has the following null and alternative hypotheses to test:
H₀: β₁ = β₂ = β₃ = ⋯ = βₖ = 0
H_A: at least one of the β's is different from zero
The null hypothesis of this joint test states that none of the explanatory variables included in the model is relevant, in the sense that no amount of the variation in Y can be attributed to the variation in all the explanatory variables simultaneously. That means that if all explanatory variables of the model changed simultaneously, the expected value of Y would be left unchanged.

How do we approach the test of the overall significance of an MLRM?


If the null hypothesis is true, that is, if all the explanatory variables included in the model are irrelevant, then there would be no significant difference in explanatory power between the model with and the model without all the explanatory variables. Thus, the test of the overall significance of an MLRM can be approached by testing whether the difference in explanatory power between the model with and the model without all the explanatory variables is significant or not. In this case, if the difference is insignificant we accept the null hypothesis, and we reject it if the difference is significant.

Similarly, this test can be done by comparing the sums of squared errors (RSS) of the model with and without all the explanatory variables. In this case we accept the null hypothesis if the difference between the sums of squared errors (RSS) of the model with and without all the explanatory variables is insignificant. The idea is straightforward: if all the explanatory variables are irrelevant, then including them in the model contributes an insignificant amount to its explanatory power, and as a result the sample prediction error of the model will not fall significantly.

Let the Restricted Residual Sum of Squares (RRSS) be the sum of squared errors of the model without the inclusion of all the explanatory variables, i.e., the residual sum of squares obtained by assuming that all the explanatory variables are irrelevant (under the null hypothesis), and let the Unrestricted Residual Sum of Squares (URSS) be the sum of squared errors of the model with all the explanatory variables included. It is always true that RRSS ≥ URSS (why?). To elaborate these concepts, consider the following model:
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + β₃X₃ᵢ + ⋯ + βₖXₖᵢ + eᵢ
This model is called the unrestricted model. The test of the joint hypothesis is given by:
H₀: β₁ = β₂ = β₃ = ⋯ = βₖ = 0
H_A: at least one of the β's is different from zero
We know that:
Yᵢ = Ŷᵢ + eᵢ  ⇒  eᵢ = Yᵢ − Ŷᵢ, so ∑eᵢ² = ∑(Yᵢ − Ŷᵢ)²
This sum of squared errors is called the unrestricted residual sum of squares (URSS).

However, if the null hypothesis is assumed to be true, i.e., when all the slope coefficients are zero, the model shrinks to:
Yᵢ = β₀ + eᵢ
This model is called the restricted model. Applying OLS we obtain:
β̂₀ = ∑Yᵢ/n = Ȳ                                                           (3.23)
Therefore, eᵢ = Yᵢ − β̂₀, and since β̂₀ = Ȳ,
eᵢ = Yᵢ − Ȳ
∴ ∑eᵢ² = ∑(Yᵢ − Ȳ)² = ∑yᵢ² = TSS

The sum of squared errors when the null hypothesis is assumed to be true is called the Restricted Residual Sum of Squares (RRSS), and it is equal to the total sum of squares (TSS).
The ratio:


[(RRSS − URSS)/(K − 1)] / [URSS/(n − K)] ~ F(K − 1, n − K)               (3.24)
has an F-distribution with K − 1 and n − K degrees of freedom for the numerator and denominator, respectively.
RRSS = TSS
URSS = ∑yᵢ² − β̂₁∑x₁ᵢyᵢ − β̂₂∑x₂ᵢyᵢ − ⋯ − β̂ₖ∑xₖᵢyᵢ = RSS
i.e., URSS = RSS, so
F = [(TSS − RSS)/(K − 1)] / [RSS/(n − K)] ~ F(K − 1, n − K)
F_c(K − 1, n − K) = [ESS/(K − 1)] / [RSS/(n − K)]                         (3.25)
If we divide the numerator and denominator of the above equation by TSS, then:
F_c = [(ESS/TSS)/(K − 1)] / [(RSS/TSS)/(n − K)]
∴ F_c = [R²/(K − 1)] / [(1 − R²)/(n − K)]                                 (3.26)
This implies that the computed value of F can be calculated either from ESS and RSS or from R² and (1 − R²). This value is compared with the table value of F that leaves a probability of α in the upper tail of the F-distribution with K − 1 and n − K degrees of freedom.

- If the null hypothesis is not true, then the difference between RRSS and URSS (i.e., between TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, reject the null hypothesis if the computed value of F (i.e., the F test statistic) becomes too large, or if the p-value of the F-statistic is lower than the chosen level of significance (α), and vice versa.

In short, the decision rule is to

- Reject H₀ if F_c > F_(1−α)(k − 1, n − k), or if the p-value < α, and vice versa.

- Implication: rejection of H₀ implies that the parameters of the model are jointly significant, or that the dependent variable Y is linearly related to the independent variables included in the model.

A numerical sketch using the soap-sales model follows below.
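Below is an illustrative sketch (ours, not the Stata output) applying formula (3.26) to the soap-sales model, with R² = 0.3179, n = 17 and K = 3:

```python
from scipy import stats

r2, n, K = 0.3179, 17, 3
F_c = (r2 / (K - 1)) / ((1 - r2) / (n - K))        # formula (3.26)
F_t = stats.f.ppf(0.95, dfn=K - 1, dfd=n - K)      # table value at alpha = 0.05
p   = stats.f.sf(F_c, dfn=K - 1, dfd=n - K)        # upper-tail p-value

# F_c ~ 3.26 < F_t ~ 3.74 and p ~ 0.07 > 0.05, so with these numbers H0 of no
# joint significance is not rejected at the 5% level
print(round(F_c, 2), round(F_t, 2), round(p, 3))
```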

3.6 Point and interval forecasting using multiple linear regressions


Just as forecasting/prediction was possible in Simple Linear Regression (SLR), it can also be done in Multiple Linear Regression (MLR). If the X-values are known, Y can be predicted by substituting the X-values into the estimated equation. Include all variables and all parameter estimates, including those that are insignificant according to the t-test.

For example, based on the output in figure 4, it is found that α̂ = 14661.19, β̂₁ = 191.6551 and β̂₂ = −77.49. Therefore, if we want to forecast the sales of a week with 1500 radio ads and 25 TV ads, the predicted sales are:
Ŷ = 14661.19 + 191.6551·1500 − 77.49·25 ≈ 300,207


This is point forecasting, because it indicates one point (one exact number) as forecast. However, it is very
unlikely that the sales are exactly 300207 if the radio ads are 1500 and the tv ads 25. There is some variation
around this prediction. This can be expressed in the interval forecast. This interval forecast is a confidence
interval for the predicted value. Similar to the confidence interval, as discussed in chapter 2, it considers the t-
value and the standard error.

The t-value should be obtained from the table and depends on the desired accuracy of the interval. The accuracy is governed by α, the acceptable probability of drawing a wrong conclusion given the data (Type I error). Once α is known, the t-value can be obtained from the table: t_(1−α/2, n−k). For example, if we want to calculate a 95% prediction interval for a case based on a model estimated from a sample of 200, α = 100% − 95% = 5% = 0.05 and n = 200, so with two explanatory variables (k = 3) the table value is t_(0.975, 197) ≈ 1.96 (see table 2.5 in chapter 2).

Next, the standard error of the forecast/prediction is also needed. The standard error of the forecast can be derived as:

se(forecast) = √(RMS + RMS·Xₕᵀ(XᵀX)⁻¹Xₕ)                                  (3.27)

where RMS is the residual mean square (see section 3.3 if you forgot what that was), Xₕ is the vector of X values for the case that we want to predict/forecast (in our case (1500, 25), augmented with a 1 for the constant term), and X is the matrix with all X-values of the whole dataset from the sample.

Rather than calculating the standard error by hand, econometricians use Stata to compute this standard error of
the forecast, and to find the interval forecast. Figure 5 displays the forecast (Coef.) of the case where X1=1500
and X2=25. The predicted value is indeed 300207 soap block sales, and the standard error of this prediction is
101034. This leads to a 95% confidence interval of sales between 83510 and 516902 blocks of soap. That
means that we can be 95% sure that a new observation (a new week) with 1500 radio ads and 25 tv ads
(X1=1500 and X2=25) will result in sales between 83510 and 516902 blocks of soap. Note that this prediction
interval is very wide, which suggests that there is a lot of uncertainty about the prediction. We need to include
additional explanatory variables to better understand the effect of ad quantities on sales.
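As a final illustrative sketch (ours, not the Stata output itself), the point forecast and the 95% interval in Figure 5 can be reproduced from the reported standard error of the forecast:

```python
from scipy import stats

# Point forecast for a week with 1500 radio ads and 25 TV ads
y_hat = 14661.19 + 191.6551 * 1500 - 77.49 * 25     # about 300207

se    = 101034                                      # standard error of the forecast (Figure 5)
t_val = stats.t.ppf(0.975, df=17 - 3)               # about 2.145, with n - k = 14 df

lower, upper = y_hat - t_val * se, y_hat + t_val * se
print(round(y_hat), round(lower), round(upper))     # roughly 300207, 83510, 516903
```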

Figure 5: Forecasting/prediction of the sales (Y) if X1=1500 and X2=25
