
Chapter 3

The Multiple Linear Regression Model
Teferi D.
Department of Economics
CBE, AMU
Introduction
• Many applications of regression analysis involve situations in which there is more than one regressor variable.
• A regression model that contains more than
one regressor variable is called a multiple
regression model.
The Model
• The k-variable population regression function involving the dependent variable Y and the k−1 explanatory variables X2, X3, …, Xk may be written as:

  Yi = β1 + β2X2i + β3X3i + … + βkXki + εi

where
– β1 is the intercept
– β2, …, βk are the partial slope coefficients, βj = ∂Yi/∂Xji
  • they are the "ceteris paribus" effect of their variation on Y
– εi is the error term
• This is equivalent to


• In vector form, the econometric model can be rewritten as

  y = Xβ + ε    (4)

where
– y is the dependent variable (an n×1 column vector)
– X is the matrix of independent variables (n×k)
– β is the vector of parameters to be estimated (k×1)
– ε is the vector of error terms (an n×1 column vector)
The Gauss-Markov Assumptions for
Multiple Regression
• Linearity in Parameters

  y = β0 + β1x1 + β2x2 + … + βkxk + ε

• Random sampling
• No Perfect Collinearity
– In the sample none of the independent variables is constant, and there are no exact linear relationships among the independent variables
  • None of the independent variables is a multiple of another
  • None has perfect correlation with a linear combination of the others. Otherwise some variable would be redundant and we could not "identify" the parameters
– X has full column rank (k+1)
• Zero Conditional Mean:
– The error ε has an expected value of zero given
any values of the independent variables
E(ε|x1, x2, ... xk) = 0 implies E(ε) = 0
• Spherical disturbances:
– Constant variance-covariance matrix of the error term
  E(εε′) = σ²I, where I is the identity matrix

  E(εε′) = [σ² 0 ⋯ 0; 0 σ² ⋯ 0; ⋯; 0 0 ⋯ σ²] = σ²·[1 0 ⋯ 0; 0 1 ⋯ 0; ⋯; 0 0 ⋯ 1] = σ²I

• This is the variance-covariance matrix of the disturbance term ε.


– This involves two assumptions: homoscedasticity and no autocorrelation
• The matrix X is not stochastic
– The elements of X are fixed or nonrandom
• Normality of ε:
– The error has a multivariate normal distribution
OLS Estimation
• The method of least squares may be used to estimate the
regression coefficients in the multiple regression model

• The least squares method chooses the β's so as to minimize the sum of squared residuals over all observations:

  min L = Σi εi² = Σi (yi − β0 − β1xi1 − β2xi2 − … − βkxik)²    (5)

This equation is often called the least squares function.


• The objective function can be compressed to
  min L = Σi εi² = Σi (yi − β0 − Σj βjxij)²,  i = 1, …, n;  j = 1, …, k
Estimation… (cont’d)
• This is a simple unconstrained optimisation
problem, so that β is obtained by the following
two steps:
– differentiating (5) with respect to each βj
– solving the resulting k+1 equations (first-order conditions) simultaneously
Estimation… (cont’d)
• Step 1:
– The least squares estimates must satisfy

  ∂L/∂βj = 0,  for j = 0, 1, …, k

– This will yield k+1 normal equations


Estimation… (cont’d)
• The least squares normal equations are

  n·β̂0 + β̂1 Σxi1 + β̂2 Σxi2 + … + β̂k Σxik = Σyi
  β̂0 Σxi1 + β̂1 Σxi1² + β̂2 Σxi1xi2 + … + β̂k Σxi1xik = Σxi1yi
  ⋮
  β̂0 Σxik + β̂1 Σxikxi1 + β̂2 Σxikxi2 + … + β̂k Σxik² = Σxikyi
Estimation… (cont’d)
• Step 2: Solve the normal equations
simultaneously.
– The solution to the normal equations are the least
squares estimators of the regression coefficients.
– the second step can be difficult if k is relatively large (e.g. k > 3), and the optimisation needs to be done each time a different k arises.
– Matrix algebra simplifies things: we only have to do the optimisation once and the solution holds for all k.
Estimation… (cont’d)
• The matrix approach
– Recall that y = Xβ + ε, so ε = y − Xβ
– the sum of squared errors is then given by

  ε′ε = (y − Xβ)′(y − Xβ)

– So the least squares criterion becomes minimizing this expression with respect to β
Estimation… (cont’d)
– differentiating ε′ε with respect to β gives a k×1 vector:

  ∂(ε′ε)/∂β = −2X′y + 2X′Xβ

– by setting this vector equal to a k×1 vector of zeros and solving for β, we have

  β̂ = (X′X)⁻¹X′y    (6)
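To make (6) concrete, here is a minimal NumPy sketch (not part of the original slides; the data are simulated and all variable names are illustrative) that computes β̂ = (X′X)⁻¹X′y:

import numpy as np

rng = np.random.default_rng(0)
n = 100
# X: a column of ones (intercept) plus two regressors
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)           # y = Xb + e

# OLS estimator, equation (6): beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(beta_hat)                                   # close to beta_true

# numerically more stable alternative giving the same solution
beta_hat_ls, *_ = np.linalg.lstsq(X, y, rcond=None)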
Estimation… (cont’d)
• Notes:
– The assumption of full column rank (= k+1) for the matrix X ensures that X′X is invertible and, hence, that the OLS estimator β̂ can be obtained.
  • X′X is invertible means that (X′X)⁻¹ exists

– To check that (6) is the minimum of L(β), differentiate (5) again:

  ∂²L(β)/∂β∂β′ = 2X′X

  • which is positive definite under the assumption of full column rank, and hence β̂ corresponds to a minimum.
Estimation… (cont’d)
• The variance-covariance of β̂
– Reporting the s.e. of each estimated coefficient is one of the routines in regression analysis. These are derived from the variance-covariance matrix of β̂.
– By definition, the variance-covariance matrix of β̂ is given by

  Var(β̂) = E[(β̂ − β)(β̂ − β)′]

• So, how do we estimate the variance of β̂?
– We know that

  β̂ = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε   and   E(εε′) = σ²I

– Then, making all the necessary substitutions, we have

  Var(β̂) = E[(X′X)⁻¹X′εε′X(X′X)⁻¹] = σ²(X′X)⁻¹
• Variance of the error term (σ²)
– The variance of the β̂'s depends on the variance of the error term.
– In practice, the error variance σ² is unknown, so we must estimate it.
– Following the logic we used in the simple linear regression model, we estimate it as

  σ̂² = ε̂′ε̂/(n − k) = Σ ε̂i²/(n − k)

  where n − k is the degrees of freedom

– The standard error of the regression is given by σ̂ = √σ̂².
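To illustrate these formulas, a small self-contained NumPy sketch (simulated data, illustrative names, not from the slides): the residuals give σ̂² = ε̂′ε̂/(n − k), and σ̂²(X′X)⁻¹ yields the standard errors of the coefficients.

import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)    # OLS coefficients
e = y - X @ beta_hat                             # residuals
n_obs, k = X.shape                               # k = number of estimated parameters

sigma2_hat = (e @ e) / (n_obs - k)               # estimated error variance
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # variance-covariance matrix of beta_hat
se_beta = np.sqrt(np.diag(cov_beta))             # standard errors of the coefficients
sigma_hat = np.sqrt(sigma2_hat)                  # standard error of the regression
print(se_beta, sigma_hat)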
Properties of OLS Estimates
• Linearity
– The OLS vector β̂ is a linear function of y:

  β̂ = (X′X)⁻¹X′y = Ay,  where A = (X′X)⁻¹X′ is nonrandom under the assumption of nonstochastic X

Properties… (cont’d)
• Unbiasedness
– Under the Gauss-Markov assumptions, the OLS estimator is unbiased for β: E(β̂) = β
– Proof:
  • we know that y = Xβ + ε and β̂ = (X′X)⁻¹X′y
  • if we substitute the former into the latter, we find β̂ = β + (X′X)⁻¹X′ε, and taking expectations gives E(β̂) = β since E(ε|X) = 0
• Minimum variance
– Among all linear and unbiased estimators, β̂ has the minimum variance (Gauss-Markov theorem)

• Normality
– Since ε ∼ N(0, σ²I), it follows that:

  β̂ ∼ N[β, σ²(X′X)⁻¹]
Measure of Goodness of Fit
• By analogy with the simple regression model, we can decompose the total variation in y as

  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ ε̂i²,  i.e.  SST = SSE + SSR

Measure of Goodness of Fit
• How well does the estimated model fit the
data?
– One GOF measure is the standard error of the regression

– But the coefficient of determination (R²) is the most popular measure of goodness of fit, as in the SLR case.
• Just as in simple regression, we may calculate fitted values, or predicted values, after estimating a multiple regression:

  ŷi = β̂0 + β̂1xi1 + β̂2xi2 + … + β̂kxik
• So, this decomposition is also valid for the multiple linear regression model
– We can define the same three sums of squares (SST, SSE, SSR) as in simple regression, and R² is still the ratio of the explained sum of squares (SSE) to the total sum of squares (SST):

  R² = SSE/SST = 1 − SSR/SST
– The interpretation of R2 is similar to the SLR model.


• It measures the proportion of the variability of Y explained by our model. But it is no longer a simple correlation (e.g. ryx) squared (why?)
• In matrix language,
– the three sums of squares are

  SST = y′y − nȳ²,   SSE = β̂′X′y − nȳ²,   SSR = ε̂′ε̂ = y′y − β̂′X′y

– thus, the coefficient of determination is

  R² = SSE/SST = (β̂′X′y − nȳ²)/(y′y − nȳ²)
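These matrix expressions can be checked numerically; here is a hedged, self-contained sketch with simulated data and illustrative names only:

import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)
e = y - X @ beta_hat
y_bar = y.mean()

SST = y @ y - n * y_bar**2                   # total sum of squares
SSE = beta_hat @ (X.T @ y) - n * y_bar**2    # explained sum of squares
SSR = e @ e                                  # residual sum of squares

R2 = SSE / SST                               # = 1 - SSR/SST
print(R2, np.isclose(SSE + SSR, SST))        # decomposition SST = SSE + SSR holds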
• Properties of R2
– 0 ≤ R² ≤ 1
  • do not use R² in models that don't contain an intercept; there R² can be negative!
– The R-squared does not tell anything about causality.
• it is a measure of how much variation in the dependent variable is
explained by the estimated econometric model. No more, no less!
– R2 never decreases when an explanatory variable is added
to a regression
• no matter how irrelevant that variable may be, the R2 of the
expanded regression will be no less than that of the original
regression (why?)

• Note that
– only the residual sum of squares (SSR) depends on X. By adding more regressors, SSR will never rise (and will typically fall).
– Thus, R² can be made artificially large by using irrelevant regressors
  • So, we should not be impressed by a high value of R² in a model with a long list of explanatory variables
• The R² can be used to compare different models with the same dependent variable and the same number of explanatory variables.
• Thus, when comparing models with the same dependent variable but a different number of independent variables, R² is not a good measure.
• Adjusted-R2
– To avoid this problem, R2 can be adjusted by
accounting for the degrees of freedom
– Adjusted-R² is calculated as follows:

  adj-R² = 1 − (1 − R²)·(n − 1)/(n − k)

  where k is the number of estimated parameters (including the intercept)

• Adjusted-R² penalizes excessive use of unimportant independent variables
• Adjusted R2 is always smaller than R2 (except when R2 =1)
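A one-formula sketch of the adjustment (the numbers n, k and R² below are purely illustrative; in practice they come from the fitted regression):

# adjusted R-squared with the n - k degrees-of-freedom correction used above
n, k, R2 = 100, 3, 0.75        # illustrative sample size, parameter count, R-squared
adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k)
print(adj_R2)                  # always <= R2; the gap widens as more regressors are added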
Interpreting OLS estimators
• What does "holding other factors fixed" mean?
– Have the data been collected in a ceteris paribus manner?
• The strength of MLR is that it provides a ceteris paribus interpretation even when the data have not been collected that way.


• Step 1: Regress yi on xi2 and save the residuals
• Step 2: Regress xi1 on xi2 and save the residuals
• Step 3: Regress the residuals from Step 1 on the residuals from Step 2
• Partialling out:
– Let's focus on the case of two explanatory variables (besides the constant), so k = 2:

  yi = β0 + β1xi1 + β2xi2 + εi
• Let us estimate the effect of education on wages,
taking also into account the effect of experience
– Start by regressing education on experience (even if it
seems silly...), storing the residuals of this regression
– Then regress wages on the previously stored residuals
• What if we regress wages on education and experience directly?
  • The estimated coefficient on education is the same as the coefficient from the previous regression (which controls for experience)!
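A hedged NumPy sketch of this partialling-out result, using simulated "wage/educ/exper"-style data (all names and numbers are illustrative, not from the slides): the coefficient on the stored residuals equals the education coefficient from the full regression.

import numpy as np

rng = np.random.default_rng(1)
n = 500
exper = rng.normal(size=n)
educ = 0.5 * exper + rng.normal(size=n)               # education correlated with experience
wage = 1.0 + 2.0 * educ + 0.7 * exper + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Step 1: regress educ on exper (plus a constant) and store the residuals
g = ols(np.column_stack([ones, exper]), educ)
r_educ = educ - np.column_stack([ones, exper]) @ g

# Step 2: regress wage on the stored residuals
b_partial = ols(np.column_stack([ones, r_educ]), wage)[1]

# Direct multiple regression of wage on educ and exper
b_full = ols(np.column_stack([ones, educ, exper]), wage)[1]

print(b_partial, b_full)        # the two education coefficients coincide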
• Too many or too few?
– Assume that factors in u are uncorrelated with all the
considered variables:
1. What happens if we include variables in our
specification that don't belong?
• The estimated effects will be zero or close to zero and there
is no effect on the remaining parameter estimates, and OLS
remains unbiased
• Note: if the other factors (in u) tell us nothing about the
regressors (they are uncorrelated with the regressors) we
can still interpret the estimated effects as "ceteris paribus"
effects
2. What if we exclude a variable from our
specification that does belong?
• If this variable is uncorrelated with the included
regressors, then the OLS estimators will be unbiased
• If this variable is correlated with the included
regressors, then the OLS estimators will be biased
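A small simulation sketch of the two cases above (hypothetical variable names, not from the slides): an irrelevant regressor leaves the estimates essentially unchanged, while omitting a relevant regressor that is correlated with an included one biases the included coefficient.

import numpy as np

rng = np.random.default_rng(4)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # x2 is correlated with x1
x_irrel = rng.normal(size=n)              # truly irrelevant variable
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

def ols(cols):
    X = np.column_stack([np.ones(n)] + cols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols([x1, x2]))             # correct model: coefficients near (1, 1, 1)
print(ols([x1, x2, x_irrel]))    # irrelevant variable included: near (1, 1, 1, 0), still unbiased
print(ols([x1]))                 # relevant x2 omitted: coefficient on x1 biased (around 1.6)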
Data scaling
• Data scaling is usually used to present the coefficient
results in a meaningful and clear way.
• Instead of reporting a coefficient of .00001 one can
rescale the explanatory variable.
• For example:
– If the price increases by 1, the demand for oranges per year decreases by 0.002.
– That means that if the price of an orange increases by 1,000, the demand for oranges per year decreases by 2.
– So if we rescale the price variable suitably, instead of
having a coefficient of – 0.002, we will have a coefficient of
-2.
• To show this in a general way we can change the
scale of independent and dependent variables.
• Let b0, b1, b2 be the OLS estimates
– If we replace xi1 by x*i1 = c·xi1, where c is a constant, then
  b*0 = b0
  b*1 = c⁻¹b1 = b1/c
  b*2 = b2

• To sum up: rescaling a regressor by a constant c divides its own coefficient by c and leaves the other coefficients unchanged, while rescaling the dependent variable by c multiplies all the coefficients (including the intercept) by c.

Q? What will happen to the coefficients if both the dependent and the independent variables are rescaled simultaneously?
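A quick numerical sketch of these scaling rules (simulated data; the names and numbers are illustrative only): multiplying a regressor by c divides its coefficient by c, and multiplying y by c scales every coefficient by c.

import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 - 0.002 * x1 + 1.5 * x2 + rng.normal(scale=0.1, size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

X = np.column_stack([np.ones(n), x1, x2])
b = ols(X, y)

# rescale x1 by c = 1000: its own coefficient is divided by 1000, b0 and b2 unchanged
c = 1000.0
b_star = ols(np.column_stack([np.ones(n), c * x1, x2]), y)
print(b[1], c * b_star[1])       # equal: b1 = c * b1*

# rescale y by c: every coefficient (including the intercept) is multiplied by c
b_y = ols(X, c * y)
print(c * b, b_y)                # equal element-wise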
• Beta Coefficients
– In the standard OLS equation it is not possible to judge the relative importance of the regressors from the size of their coefficients!
– Sometimes the units will not mean much.
– For example, labor economists often include test
scores in wage equations.
– These test scores have different scales, so that it is not
a priori easy to understand what “an increase in one
unit of the test score” means.
– Solution: "standardizing the units"
  • the standardized (beta) coefficient of regressor j is obtained as b̂j(std) = b̂j·sd(xj)/sd(y), or equivalently by running OLS on the z-scored variables
• Advantage of Beta coefficients
– beta coefficients make the scale of the regressors irrelevant.
– So they put the explanatory variables on an equal basis and therefore we can compare them directly.
– Suppose we aim to test the relation between the house price
and its determinants in Addis


• How do we interpret the beta coefficients?
– This estimated equation shows that
  • a one standard deviation increase in oxide decreases price by 0.34 standard deviations, and
  • a one standard deviation increase in crime reduces price by 0.14 standard deviations.
– Thus, the same relative movement of pollution has a
larger effect on housing prices than crime does.
– Size of the house, as measured by number of rooms
(rooms), has the largest standardized effect.
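A self-contained sketch (simulated data; all names are illustrative, not the housing data above) of how beta coefficients could be obtained: run OLS on z-scored variables, or equivalently rescale each slope by sd(xj)/sd(y).

import numpy as np

rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(scale=5.0, size=n)       # regressors measured on very different scales
x2 = rng.normal(scale=0.1, size=n)
y = 1.0 + 0.2 * x1 + 30.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]         # ordinary (unstandardized) coefficients

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

# regression on standardized variables (no intercept needed: all means are zero)
Z = np.column_stack([zscore(x1), zscore(x2)])
beta_std = np.linalg.lstsq(Z, zscore(y), rcond=None)[0]

# same result from the unstandardized slopes: b_j * sd(x_j) / sd(y)
check = b[1:] * np.array([x1.std(ddof=1), x2.std(ddof=1)]) / y.std(ddof=1)
print(beta_std, check)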
Hypothesis testing and Inferences
• A hypothesis test is a statistical inference
technique that assesses whether the
information provided by the data (sample)
supports or does not support a particular
conjecture or assumption about the
population.
• Testing the significance of a single coefficient
– Hypothesis testing about individual partial regression
coefficient can be conducted using the t -statistic as
usual.
• Testing multiple coefficients
– One of the things we can do in the MLR, which we cannot do in the SLR, is to test hypotheses that involve several of the regression coefficients simultaneously.
• Post hoc t-tests
– Besides restricted least squares, we can also test linear restrictions and (within that) the equality of coefficients using a post hoc t-test.

a) Equality of regression coefficients
• This can be tested using a t-statistic.
b) Test for linear restrictions
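A hedged sketch of a post hoc t-test of H0: β1 = β2 using the estimated variance-covariance matrix (simulated data, illustrative names; SciPy is assumed available for the p-value):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 300
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.8 * x1 + 0.8 * x2 + rng.normal(size=n)     # true beta1 = beta2

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
k = X.shape[1]
sigma2_hat = (e @ e) / (n - k)
cov_b = sigma2_hat * XtX_inv

# t = (b1 - b2) / se(b1 - b2), with se^2 = var(b1) + var(b2) - 2*cov(b1, b2)
diff = b[1] - b[2]
se_diff = np.sqrt(cov_b[1, 1] + cov_b[2, 2] - 2 * cov_b[1, 2])
t_stat = diff / se_diff
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - k))
print(t_stat, p_value)           # large p-value: fail to reject beta1 = beta2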
Multiple Regression with Dummy
Independent Variable
• The multiple regression model often contains
qualitative factors, which are not measured in
any units, as independent variables:
– gender, race or nationality
– employment status or home ownership
– temperatures before 1900 and after (including)
1900
• Such qualitative factors often come in the form of binary information and are captured by defining a zero-one variable, called a dummy variable.
– it takes the value 1 in one category and the value 0 "otherwise"
– “Otherwise" can represent one or more other
categories
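A brief sketch (simulated data, hypothetical names, not from the slides) of a regression with a 0/1 dummy regressor: its coefficient measures the intercept shift for the group coded 1, holding the other regressors fixed.

import numpy as np

rng = np.random.default_rng(5)
n = 400
female = rng.integers(0, 2, size=n)              # dummy: 1 = female, 0 = otherwise
educ = rng.normal(12.0, 2.0, size=n)
wage = 2.0 + 0.5 * educ - 1.5 * female + rng.normal(size=n)

X = np.column_stack([np.ones(n), educ, female])
b = np.linalg.lstsq(X, wage, rcond=None)[0]
print(b)     # coefficient on the dummy is about -1.5: the shift for the group coded 1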
• Warning: Dummy variables can be used only
as regressors. Why?
– This is because the binary nature of such a variable violates the normality assumption on the error term, which renders the t-statistics invalid.
• Should the dependent variable be binary, you need to use Logit or Probit regression models, which employ the maximum likelihood (ML) estimator.
End of chapter 3

Thank you!
