
Multiple Regression

1
• In general the regression estimates are more
reliable if:
i) n is large (a large dataset)
ii) the sample variance of the explanatory
variable is high
iii) the variance of the error term is small
iv) the explanatory variables are not closely
related to one another.
2
Objectives of Multiple Regression
• Establish the linear equation that best predicts
values of a dependent variable Y using more
than one explanatory variable from a large set
of potential predictors {x1, x2, ... xk}.
• It uses a subset of all possible predictor
variables that explains a significant and
appreciable proportion of the variance of Y,
trading off adequacy of prediction against the
cost of measuring more predictor variables.

3
Multiple Regression
• The constant and parameters are derived in
the same way as with the simple regression
model.
• It involves minimising the sum of the squared
residuals.
• When a new variable is added, it affects the
coefficients of the existing variables

4
Model of Multiple Regression

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k$, so

$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2 + \dots + \hat{\beta}_k \Delta x_k$,

so holding $x_2, \dots, x_k$ fixed implies that

$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1$, that is, each $\beta$ has
a ceteris paribus interpretation

5
“Partialling Out” Interpretation
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$
• This equation implies that regressing y on x1
and x2 gives the same estimated effect of x1 on y
as regressing y on the residuals from a regression
of x1 on x2.
• This means only the part of xi1 that is
uncorrelated with xi2 is being related to yi, so
we’re estimating the effect of x1 on y after x2
has been “partialled out”
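
• As a hedged illustration (hypothetical variable names y, x1, x2), this can be checked in Stata:

 reg x1 x2
 predict r1, residual
 reg y r1

The coefficient on r1 in the last regression should match the coefficient on x1 from reg y x1 x2 (though the standard errors differ).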

6
More about R-squared
• R2 can never decrease when another
independent variable is added to a regression,
and usually will increase

• Because R2 will usually increase with the
number of independent variables, it is not a
good way to compare models

7
Adjusted R2
• So we'd like a measure like R2, but one that takes into account
the fact that adding extra variables always increases your
explanatory power.
• The statistic we use for this is called the Adjusted R2, and its
formula is:

n 1
R2  1 (1  R2 );
nk
n  number of observations,
k  number of independent variables.
• So the Adjusted R2 can actually fall if the variable you add
doesn't explain much of the variance.
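
• As a worked example with hypothetical numbers: if $R^2 = 0.40$, $n = 60$ and $k = 4$ estimated parameters, then
$\bar{R}^2 = 1 - (1 - 0.40)\times\dfrac{59}{56} \approx 0.37.$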
8
Stepwise Regression
• One strategy for model building is to add variables only if
they increase your adjusted R2.
• This technique is called stepwise regression.
• However, I don't want to emphasize this approach too
strongly. Just as people can fixate on R2, they can fixate on
adjusted R2.

****IMPORTANT****
If you have a theory that suggests that certain
variables are important for your analysis then
include them whether or not they increase the
adjusted R2.
Negative findings can be important!
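
• A minimal Stata sketch (hypothetical variables y, x1, x2): after each regression the adjusted R2 is stored in e(r2_a), so specifications can be compared directly:

 reg y x1
 display e(r2_a)
 reg y x1 x2
 display e(r2_a)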
9
The F-test
• The F-test is an analysis of the variance of a
regression
• It can be used to test for the significance of a group
of variables or for a restriction
• When determining the F-statistic we need to collect
either the residual sum of squares (RSS) or the R-
squared statistic
• The formula for the F-test of a group of variables can
be expressed in terms of either the residual sum of
squares (RSS) or explained sum of squares (ESS)

10
F-test of explanatory power
• This is the F-test for the goodness of fit of a
regression and in effect tests for the joint
significance of the explanatory variables.
• It is based on the R-squared statistic
• It is routinely produced by most computer
software packages

11
F-test formula
• The formula for the F-test of the goodness of fit
is:

R / k 1
2
F
(1  R ) /( n  k )
2

k 1
Fnk
12
F-distribution
• To find the critical value of the F-distribution,
in general you need to know the numerator degrees
of freedom (the number of restrictions or
coefficients being tested) and the denominator
degrees of freedom.
• The numerator degrees of freedom are read across
the top of the table, the denominator degrees of
freedom down the side. Where these two values
intersect, we find the critical value.

13
F-test critical value

Numerator df →        1       2       3       4       5
Denominator df ↓
 1                 161.4   199.5   215.7   224.6   230.2
 2                  18.5    19.0    19.2    19.3    19.3
 3                  10.1     9.6     9.3     9.1     9.0
 4                   7.7     7.0     6.6     6.4     6.3
 5                   6.6     5.8     5.4     5.2     5.1
(5% significance level)

14
F-distribution
• Both the numerator and denominator degrees of
freedom run up to infinity in the full table
• If we wanted to find the critical value for F(3,4),
it would be 6.6
• The first value (3) is often termed the
numerator, whilst the second (4) the
denominator.
• It is often written as:

$F^{3}_{4}$
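
• A one-line Stata check of the table lookup (assuming the 5% level used above):

 display invFtail(3, 4, 0.05)

which returns approximately 6.6.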
15
F-statistic
• When testing for the significance of the
goodness of fit, our null hypothesis is that the
slope coefficients on the explanatory variables are
jointly equal to 0.
• If our F-statistic is below the critical value we
fail to reject the null and therefore we say the
goodness of fit is not significant.

16
Joint Significance
• The F-test is useful for testing a number of
hypotheses and is often used to test for the
joint significance of a group of variables
• In this type of test, we often refer to ‘testing a
restriction’
• This restriction is that the coefficients on a group
of explanatory variables are jointly equal to 0

17
F-tests
• The test for joint significance has its own formula,
which takes the following form:
$F = \dfrac{(RSS_R - RSS_U)/m}{RSS_U/(n-k)}$

m = number of restrictions
k = number of parameters in the unrestricted model
$RSS_U$ = unrestricted RSS
$RSS_R$ = restricted RSS
18
Joint Significance of a group of variables

• To carry out this test you need to conduct two
separate OLS regressions: one with all the explanatory
variables included (the unrestricted equation), the other
with the variables whose joint significance is being
tested removed (the restricted equation).
• Then collect the RSS from both equations.
• Put the values into the formula
• Find the critical value and compare it with the test
statistic. The null hypothesis is that the coefficients on
the tested variables are jointly equal to 0.
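
• A hedged Stata sketch (variable names as on the next slide): the built-in test command reports the same F statistic without estimating the restricted model by hand:

 reg y w x z
 test x z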

19
Joint Significance
• If we have a 3 explanatory variable model and wish to
test for the joint significance of 2 of the variables (x
and z), we need to run the following restricted and
unrestricted models:

$y_t = \alpha_0 + \alpha_1 w_t + u_t$   (restricted)
$y_t = \beta_0 + \beta_1 w_t + \beta_2 x_t + \beta_3 z_t + u_t$   (unrestricted)
20
Example of the F-test for joint significance

• Given the following model, we wish to test the joint
significance of w and z. Having estimated them, we collect
their respective RSSs (n = 60).

$y_t = \beta_0 + \beta_1 x_t + \beta_2 w_t + \beta_3 z_t + u_t$   (unrestricted)
$\Rightarrow RSS_U = 0.75$

$y_t = \alpha_0 + \alpha_1 x_t + v_t$   (restricted)
$\Rightarrow RSS_R = 1.5$

21
Joint significance

where: $\beta_0$, $\alpha_0$ are constants;
$\beta_1, \dots, \beta_3$, $\alpha_1$ are slope parameters;
$u_t$, $v_t$ are error terms;
$x_t$, $w_t$, $z_t$ are explanatory variables.

22
Joint significance
• Having obtained the RSSs, we need to input the
values into the earlier formula (slide 18):

$F = \dfrac{(1.5 - 0.75)/2}{0.75/(60-4)} = \dfrac{0.375}{0.0134} \approx 28$

critical value: $F^{2}_{56} = 3.15$
23
Joint significance
$H_0: \beta_2 = \beta_3 = 0$
$H_1: \beta_2 \neq 0$ and/or $\beta_3 \neq 0$

• As the F statistic is greater than the critical value


(28>3.15), we reject the null hypothesis and
conclude that the variables w and z are jointly
significant and should remain in the model.

24
Conclusion
• Multiple regression analysis is similar to bivariate
analysis; however, the correlation between the x
variables needs to be taken into account
• The adjusted R-squared statistic tends to be used in
this case
• The F-test is used to test for joint explanatory power
of the whole regression or a sub-set of the variables

25
Violation of the Assumptions of
Classical Linear Regressions

26
• Major Issues
Causes
Consequences
Tests
Solutions

27
In general when the assumptions are violated we
could encounter any combination of 3 problems:

• The coefficient estimates are wrong [biased]
• The associated standard errors are wrong
• The distribution that we assumed for the
hypothesis test statistics will be
inappropriate

28
Heteroskedasticity

29
Heteroskedasticity
• Ordinary least squares assumes that all
observations are equally reliable (i.e.,
the error variance is constant).
Var(i )   2

• Heteroskedasticity is a systematic
pattern in the errors where the variances
of the errors are not constant.
30
[Figure: conditional densities f(Yi) at income levels X1, X2, X3 — the spread of the distribution is wider for rich people than for poor people.]
31
[Figure: scatter plot of consumption Yi against income Xi — the dispersion of consumption around the regression line fans out as income rises.]
32
Consequences of Heteroskedasticity
1. Ordinary least squares estimators still linear and
unbiased but not BLUE.
2. Ordinary least squares estimators are not
efficient; some other linear estimator will have a
lower variance.
3. Usual formulas give incorrect standard errors for
least squares.
4. Confidence intervals and hypothesis tests (both t-
and F- tests) based on usual standard errors are
wrong.

33
Tests for Heteroskedasticity

• There are two commonly used tests:
1. Breusch–Pagan test
2. White test

• In both tests, a large chi-square statistic indicates that
heteroskedasticity is present.

34
The Breusch–Pagan–Godfrey (BPG) heteroskedasticity test
The steps involved in this test are as follows:
Step 1: Estimate the original model by OLS and obtain the
residuals.
Step 2: Square the OLS residuals.
Step 3: Estimate an auxiliary regression with the squared residuals
as the dependent variable and all the independent variables
from Step 1 as regressors.
Step 4: Keep the R-squared from the regression in Step 3, $R^2_{\hat{u}^2}$.
Step 5: Compute the BPG statistic, which is equal to $n \cdot R^2_{\hat{u}^2}$. The
statistic is distributed as chi-square (with degrees of freedom equal
to the number of regressors in the auxiliary regression).
Step 6: Find the critical value from the chi-square distribution table
at the chosen significance level.
Step 7: The null hypothesis of homoskedasticity is rejected if the BPG
statistic is greater than the chi-square critical value.
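
• A minimal Stata sketch of these steps (assuming a dependent variable y and regressors x1 and x2):

 reg y x1 x2
 predict uhat, residual
 gen uhat2 = uhat^2
 reg uhat2 x1 x2
 display e(N)*e(r2)

The displayed value is the BPG statistic, to be compared with the chi-square critical value (here with 2 degrees of freedom).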
35
• The Breusch–Pagan test is used for linear forms of
heteroskedasticity, e.g. as $\hat{y}$ goes up, the error variances
go up.

• Stata commands for Breusch–Pagan test:


 reg y x
 estat hettest
• The p-value should be bigger than 0.05 to not
reject the null of homoscedasticity at the 5%
level.
• This test is an asymptotic test in that it is only
valid for large samples.

36
White Test
1. Estimate $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i}$
and obtain the residuals.
2. Run the following auxiliary regression:

$\hat{u}_i^2 = A_0 + A_1 X_{2i} + A_2 X_{3i} + A_3 X_{2i}^2 + A_4 X_{3i}^2 + A_5 X_{2i} X_{3i} + V_i$

3. Calculate the White test statistic from the auxiliary regression:

$nR^2 \sim \chi^2_{d.f.}$

(d.f. = number of explanatory variables in the auxiliary regression)

4. Obtain the critical value from the $\chi^2$ distribution

5. Decision rule: if the test statistic > critical $\chi^2$ value then reject the
null hypothesis of no heteroskedasticity
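
• A minimal Stata sketch of the auxiliary regression (hypothetical variables y, x2, x3 corresponding to the steps above):

 reg y x2 x3
 predict vhat, residual
 gen vhat2 = vhat^2
 gen x2sq = x2^2
 gen x3sq = x3^2
 gen x2x3 = x2*x3
 reg vhat2 x2 x3 x2sq x3sq x2x3
 display e(N)*e(r2)

The displayed value is compared with the chi-square critical value with 5 degrees of freedom (the number of regressors in the auxiliary regression).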
37
White’s Test
• White test is used for non-linear forms of
heteroskedasticity

• Stata commands:
 reg y x
 estat imtest, white

The p-value should be bigger than 0.05 to not
reject the null of homoskedasticity at the 5%
level.
38
Dealing with heteroskedasticity
1. If the form (i.e. the cause) of the heteroscedasticity is
known, then we can use an estimation method which
takes this into account (called generalised least squares,
GLS). GLS is different from OLS

2. Respecify the Model/Transform the Variables

3. Use Robust Standard Errors

 Heteroskedasticity causes the usual standard errors to be biased.
 OLS assumes that errors are both independent and
identically distributed;
 Robust standard errors relax either or both of those
assumptions. Hence, when heteroskedasticity is present,
robust standard errors tend to be more trustworthy even
if we don’t know the form of the heteroskedasticity.
39
The robust standard error is determined as follows:

$\widehat{\text{Var}}(\hat{\beta}_j) = \dfrac{\sum_{i=1}^{n} \hat{r}_{ij}^{\,2}\, \hat{u}_i^{\,2}}{SSR_j^{\,2}}$

where $\hat{r}_{ij}$ denotes the i-th residual from regressing $x_j$
on all the other independent variables, $\hat{u}_i^2$ denotes the
square of the i-th OLS residual from the initial regression of
y on the x's, and $SSR_j$ is the sum of squared residuals from
the regression of $x_j$ on the other independent variables.
The square root of this variance is called the
heteroskedasticity-robust standard error for $\hat{\beta}_j$, and it is
valid for heteroskedasticity of any form.
40
• In empirical work it is standard practice to
use heteroskedasticity-robust standard errors
• In Stata, the following command gives
heteroskedasticity-robust standard errors:

reg y x1 x2 x3, robust

41
Multicollinearity

42
 Multicollinearity refers to the degree of
correlation between the independent variables.
 It becomes an issue in multiple
regression models

 The best regression models are those in which
the predictor variables each correlate highly with
the dependent (outcome) variable but correlate,
at most, only minimally with each other

43
Perfect Collinearity can exist if:
1) One variable is a constant multiple of another
2) Logs are used inappropriately
3) One variable is a linear function of two or more
other variables
In general, all of these issues are easy to fix, once
they are identified.

44
1. One variable is a constant multiple of
another

$y = \beta_0 + \beta_1 x + \beta_2 (5x) + u$
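
• A quick Stata illustration (hypothetical variables y and x): if a second regressor is generated as an exact multiple of the first, Stata flags the collinearity and omits one of the variables:

 gen x5 = 5*x
 reg y x x5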

45
Properties of Logarithmic Functions
• The logarithmic function log(x) is defined only for
x > 0.
• log(MN) = log M + log N
• log(M/N) = log M − log N
• log(M^a) = a·log M
• log(1) = 0
• log(0) is undefined

46
2) Logs are used inappropriately

-Consider the following equation and apply the log
rules:
$y = \beta_0 + \beta_1 \log(x) + \beta_2 \log(x^2) + u$
$y = \beta_0 + \beta_1 \log(x) + 2\beta_2 \log(x) + u$
$y = \beta_0 + (\beta_1 + 2\beta_2) \log(x) + u$
A variable is included twice, causing an inability
to estimate $\beta_1$ and $\beta_2$ separately
Note that x and x² could both have been used, as
they are not linearly related. 47
3) One variable is a linear function of two or
more other variables

-Consider a person who spends all their income
on movies and clothes:
income = movies + clothes
-If income and expenditures on both movies and
clothes are in the regression, perfect
collinearity exists and the regression fails
48
The consequence of multicollinearity
• Large variance (and standard error) in
regression coefficient estimates
• Low t-statistics and insignificant
coefficients
• Misleading signs of the coefficients
• Significant F-test with insignificant
individual coefficients

49
Detection of Multicollinearity
• Correlation Matrix
 corr var1 var2 var3 var4
• Look for non-significant t tests for
individual β parameters when the F test
is significant
• Look for β coefficients with the opposite
sign to what you expected
• Variance Inflation Factor (VIF)
50
The variance inflation factor (VIF): measures
how much the variance of an estimated
regression coefficient increases if your
predictors are correlated (multicollinear).
$VIF(\hat{\beta}_i) = \dfrac{1}{1 - R_i^2} = C_{ii}$

where $R_i^2$ is the R-squared from regressing $x_i$ on the other
predictors, and $C_{ii}$ is the i-th diagonal element of the inverse
of the predictor correlation matrix.

– VIF = 1 indicates the predictor is uncorrelated
with the other predictors;
– VIF > 1 otherwise.
– When the VIF is greater than 10, the
regression coefficients are poorly estimated
and imperfect multicollinearity is likely.
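
• For example (hypothetical value): if regressing $x_i$ on the other predictors gives $R_i^2 = 0.90$, then $VIF = 1/(1-0.90) = 10$, which is exactly the conventional warning threshold.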
51
Stata commands

reg y x1 x2 x3 x4
estat vif (after regression)

Some software instead calculates the tolerance


which is just the reciprocal of the VIF.

52
Possible Solutions
1. Drop a variable if it is neither significant
nor theoretically valid nor likely to cause
omitted variable bias

2. Transform a variable

53
Normality Tests

54
• It is assumed that the distribution of residuals
is normal.
• If the normality assumption is violated our
hypothesis testing is not reliable.

55
• A normal distribution
– symmetrical, bell-shaped (so they say)

56
• The two common tests of normality are:
(1) histogram of residuals;
(2) the Jarque–Bera test.

57
Histogram of residuals
• Used to learn about the shape of the
probability density function of the residual.
• Stata command
 reg dv iv
 predict uhat, residual
 hist uhat, normal

58
• the Jarque–Bera test
• Stata command
reg dv iv
predict uhat, residual
sum uhat, detail
• Look at kurtosis and skewness.
• Skewness and Kurtosis are ingredients of the
Jarque-Bera test for normality.
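
• A related hedged option: Stata's built-in sktest combines skewness and kurtosis into a joint normality test, similar in spirit to Jarque–Bera (using the residuals uhat generated above):

 sktest uhat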

59
Indicators if it is not normally distributed
• Skew
– non-symmetrical
– one tail longer than the other
• Kurtosis
– too flat or too peaked
– kurtosed
• Outliers
– Individual cases which are far from the
distribution
60
Effects on the Mean
• Skew
– biases the mean, in direction of skew
• Kurtosis
– mean not biased
– standard deviation is
– and hence standard errors, and
significance tests are biased

61
Measurement error

62
• Measurement error is the error that arises
when a recorded value is not exactly the same
as the true value due to a flaw in the
measurement process.
• Measurement error can be divided into
systematic error and random error

63
Potential causes of measurement error

• Misuse of measurement tools,


• Poor choice of measurement tool
• Lack of training
• Carelessness
• Not possible to measure exactly
• Selection bias: Subjects recruited not representative of the
target population

64
Measurement error in explanatory
variable
• Measurement error is a source of
endogeneity which makes the
coefficient biased.
• For example, in a typical survey, the
respondents may report their annual
incomes with a lot of errors.

65
• Now, let us understand the nature of the problem.
• Suppose that you want to estimate the following
simple regression.

y=β0+β1x1* +u …………………….(1)

where x1* is the measurement-error free variable.


• Now, suppose that you only observe the error-
ridden variable x1. That is
x1=x1*+e1

66
• Our assumption about the correlation
between e1and x1* determines the effect of the
measurement error.
• Suppose that e1 is a random error uncorrelated
with x1*
• It means that, the measurement error is such that
x1=x1*+e1 …………….(2)
and
Cov(x1*, e1)=0 ………….(3)

67
• (2) and (3) is called the classical errors-in-variables (CEV)
assumption.
• Therefore, substituting x1 = x1* + e1 into (1), the model becomes:
y=β0+β1x1+(u−β1e1) …………………….(4)
 Under the CEV assumption, x1 is correlated with the
composite error term, which makes x1 endogenous.
Thus, β1 estimated using x1 will be biased and inconsistent,
that is, it remains biased even if the sample size n
increases indefinitely.

 If instead e1 were uncorrelated with the observed
variable x1 (rather than with x1*), then β1 estimated
using x1 in model (4) would be unbiased. 68
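
Under the CEV assumptions (2) and (3), the direction of the bias can be made precise; the standard attenuation-bias result (from the Wooldridge text cited on the next slide) is
$\text{plim}\,\hat{\beta}_1 = \beta_1\,\dfrac{\sigma^2_{x_1^*}}{\sigma^2_{x_1^*} + \sigma^2_{e_1}},$
so $\hat{\beta}_1$ is biased towards zero.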
Measurement error in dependent variable

• When the measurement error is in the


dependent variable, but explanatory variables
have no measurement-errors, there will be no
bias in OLS.
• However, it increases the variance of the error
term. Unless the increase is substantial,
this is generally not a serious problem.

• Please read Section “9.4 Properties of OLS under


Measurement Error” of Wooldridge: Introductory
Econometrics- A Modern Approach
69
Misspecification

70
Too Many or Too Few Variables
• What if we exclude a variable from our
specification that does belong?
 OLS will usually be biased
• What happens if we include variables in our
specification that don’t belong?
 There is no effect on our parameter
estimate, and OLS remains unbiased

71
Omitted Variable Bias

Suppose the true model is given as


$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$,

but we estimate

$y = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + \tilde{u}$

72
Omitted Variable Bias (cont)

Consider the regression of x2 on x1


$\tilde{x}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1$,

so $E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1$

73
Summary of Direction of Bias
                  Corr(x1, x2) > 0     Corr(x1, x2) < 0
β2 > 0            Positive bias        Negative bias
β2 < 0            Negative bias        Positive bias

74
Omitted Variable Bias Summary
• Two cases where the bias is equal to zero:
– β2 = 0, that is, x2 doesn’t really belong in the model
– x1 and x2 are uncorrelated in the sample

• If the correlation between x2 and x1 and the correlation
between x2 and y go in the same direction, the bias will
be positive
• If they go in opposite directions, the bias will be
negative

75
The More General Case
• Technically, we can only sign the bias for the
more general case if all of the included x’s are
uncorrelated

• Typically, then, we work through the bias


assuming the x’s are uncorrelated, as a useful
guide even if this assumption is not strictly
true

76
- A positive (negative) bias indicates that,
given random sampling, on average
your estimates will be too large (small)
- The SIZE of the bias is important, as a
small bias may not be cause for
concern
- Therefore the sizes of $\beta_2$ and $\tilde{\delta}_1$
are important

77
- Although $\beta_2$ is unknown, theory can
give us a good idea about its sign

- Likewise, the direction of the correlation
between x1 and x2 can be guessed
through theory

78
Stata command for OVB
 reg y x1 x2 x3
estat ovtest
• Rejection of the null hypothesis implies that
there are possible missing variables and the
model suffers from endogeneity, causing
biased coefficient estimates
The command “estat ovtest” performs the
Ramsey regression specification error test
(RESET) for omitted variables.

79
How to Address OVB: Using Proxy Variables

• Suppose you are interested in estimating the return to
education, so you consider the following model:

Log(Wage)=β0+β1Educ+β2Exp+u …(1)

• Ability is unobserved, so it is included in the composite
error term u. If Ability is correlated with years of
education, β1 will be biased.

Question: if ability is correlated with Educ, what is the


direction of the bias?

80
• One way to eliminate the bias is to use a
proxy variable for ability.
• Suppose that IQ is a proxy variable for
ability, and that IQ is available in your
data.

81
• Then, the basic idea is to estimate the following.

Regress Log(Wage) on Educ, Exp, and IQ ……………(2)

This is called the plug-in solution to the omitted


variables problem.
The question is under what conditions (2) produces
consistent estimates for the original regression (1).
It turns out, the following two conditions ensure that
you get consistent estimates by using the plug-in
solution.
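
• A minimal Stata sketch of the plug-in solution (hypothetical variable names lwage, educ, exper, iq):

 * omitting ability: the return to education is likely biased
 reg lwage educ exper
 * plug-in solution: IQ proxies for ability
 reg lwage educ exper iq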

82
Condition 1: u is uncorrelated with IQ. In addition, the original equation
should satisfy the usual conditions (i.e., u is also uncorrelated with Educ,
Exp, and Ability).

Condition 2: E(Ability|Educ, Exp, IQ)=E(Ability|IQ)

Condition 2 means that, once IQ is conditioned on, Educ and Exp do not
explain Ability. A simpler way to express Condition 2 is that
Ability can be written as:

Ability=δ0+δ3IQ+v3 …………(3)

where v3 is a random error which is uncorrelated with IQ, Educ, and
Exp. What it means is that Ability is a function of IQ only.

83
Irrelevant Variables in a Regression
Model

 When an independent variable that does not
actually affect y is included in the model, that
variable is an irrelevant variable and the model is
said to be OVERSPECIFIED
-Consider the model:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$

84
• where x3 has no impact on y; β3 = 0
• x3 may or may not be correlated with x2
and x1
• in terms of expectations:

$E(y \mid x_1, x_2, x_3) = E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

85
Including irrelevant variables doesn’t
affect OLS unbiasedness, but it
affects OLS variance

86
Wrong Functional Forms
[Figure: two panels comparing a linear fit and a quadratic fit of y against x. Adding one more term (the quadratic) to the model significantly improves the model fit.]
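
• A hedged Stata sketch of the comparison in the figure (hypothetical variables y and x):

 gen xsq = x^2
 reg y x
 reg y x xsq

If the quadratic term is significant and the fit improves, the linear functional form is likely wrong.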

87
Autocorrelation

88
• CLRM states that the covariances and
correlations between different disturbances are
all zero:
cov(ui, uj)=0 for all i≠j
• If this assumption is no longer valid, then the
disturbances are pairwise autocorrelated (or
Serially Correlated).
• Autocorrelation is most likely to occur in time
series data.
• This means that an error occurring at period t
may be carried over to the next period t+1.
89
Causes of Autocorrelation
• A missing variable,
• An incorrect functional form, or
• Pure serial correlation that frequently arises in
time series data.

90
What Causes Autocorrelation
1. Omitted variables
 Suppose Y is related to X2 and X3, but we
wrongfully do not include X3 in our model.
 The effect of X3 will be captured by the
disturbances ui.
 If X3, like many economic series, exhibits a
trend over time, then X3 depends on its own past
values X3,t-1, X3,t-2, and so on.
 Similarly, ut then depends on ut-1, ut-2, and
so on.
91
2. Misspecification.
Suppose Yi is related to Xi with a quadratic
relationship:
Yi=β1+β2Xi+β3Xi²+ui
but we wrongfully assume and estimate a
straight line:
Yi=β1+β2Xi+ui
Then the error term obtained from the
straight line will depend on Xi².

92
Consequences of Autocorrelation
1. The OLS estimators are still unbiased as unbiasedness
does not depend on the autocorrelation assumption.
2. The estimated variances of the regression coefficients
will be biased,
3. The OLS estimators will be inefficient and therefore no
longer BLUE.
4. Hypothesis testing is no longer valid. In most of the
cases, the R2 will be overestimated and the t-statistics
will tend to be higher

93
STATA for Reporting

94
Stata command to generate regression tables
• esttab command
eststo clear
eststo: reg y x, robust
esttab using [file path]\Table3.rtf, nogaps
stats(N F p r2, labels(N F-test sig R2)
fmt(%9.0f %9.0f %9.3f %9.3f)) se(3)
b(3) nolz

95
Outreg2 Command
reg y xi, robust
est store m1
reg y xi x2 x3, robust
est store m2
outreg2 [m1 m2] using [filepath], bdec(3) rdec(3) title(Table 3) word replace

96
