• In general the regression estimates are more
reliable if:
i) n is large (large dataset)
ii) the sample variance of the explanatory variable is high
iii) the variance of the error term is small
iv) the explanatory variables are not closely related to one another.
2
Objectives of Multiple Regression
• Establish the linear equation that best predicts
values of a dependent variable Y using more
than one explanatory variable from a large set
of potential predictors {x1, x2, ... xk}.
• It uses a subset of all possible predictor
variables that explains a significant and
appreciable proportion of the variance of Y,
trading off adequacy of prediction against the
cost of measuring more predictor variables.
3
Multiple Regression
• The constant and parameters are derived in
the same way as with the simple regression
model.
• It involves minimising the sum of the squared error terms (the residual sum of squares).
• When a new variable is added it generally changes the coefficients of the existing variables, as in the sketch below.
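A minimal Stata illustration of this point (y, x1 and x2 are placeholder variable names, not variables from these slides):
* compare the coefficient on x1 before and after adding x2
reg y x1            // simple regression
reg y x1 x2         // the estimate on x1 generally changes once x2 is included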
4
Model of Multiple Regression
• Y = β0 + β1x1 + β2x2 + … + βkxk + u
6
More about R-squared
• R2 can never decrease when another
independent variable is added to a regression,
and usually will increase
7
Adjusted R2
• We'd like a measure like R2, but one that takes into account the fact that adding extra variables can never reduce R2.
• The statistic we use for this is called the Adjusted R2, and its formula is:
Adjusted R2 = 1 − (1 − R2)·(n − 1)/(n − k)
n = number of observations,
k = number of independent variables.
• So the Adjusted R2 can actually fall if the variable you add doesn't explain much of the variance (see the Stata sketch below).
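Both measures appear in standard Stata regression output; a minimal sketch (y, x1 and x2 are placeholder variable names; e(r2) and e(r2_a) are the results that regress stores):
reg y x1 x2
display "R-squared     = " e(r2)
display "Adj R-squared = " e(r2_a)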
8
Stepwise Regression
• One strategy for model building is to add variables only if
they increase your adjusted R2.
• This technique is called stepwise regression.
• However, I don't want to emphasize this approach too strongly. Just as people can fixate on R2, they can fixate on adjusted R2.
****IMPORTANT****
If you have a theory that suggests that certain
variables are important for your analysis then
include them whether or not they increase the
adjusted R2.
Negative findings can be important!
9
The F-test
• The F-test is an analysis of the variance of a
regression
• It can be used to test for the significance of a group
of variables or for a restriction
• When determining the F-statistic we need to collect
either the residual sum of squares (RSS) or the R-
squared statistic
• The formula for the F-test of a group of variables can
be expressed in terms of either the residual sum of
squares (RSS) or explained sum of squares (ESS)
10
F-test of explanatory power
• This is the F-test for the goodness of fit of a
regression and in effect tests for the joint
significance of the explanatory variables.
• It is based on the R-squared statistic
• It is routinely produced by most computer
software packages
11
F-test formula
• The formula for the F-test of the goodness of fit
is:
F = [R2/(k − 1)] / [(1 − R2)/(n − k)] ~ F(k − 1, n − k)
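As a check, the same statistic can be recovered from the results Stata stores after regress, where e(df_m) = k − 1 and e(df_r) = n − k (y, x1 and x2 are placeholder names):
reg y x1 x2
display "F from formula = " (e(r2)/e(df_m)) / ((1 - e(r2))/e(df_r))
display "F reported     = " e(F)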
12
F-distribution
• To find the critical value of the F-distribution, in general you need to know the numerator degrees of freedom (k − 1) and the denominator degrees of freedom (n − k).
• The numerator d.o.f. is read across the top of the table, the denominator d.o.f. down the side. Where these two values intersect, we find the critical value.
13
F-test critical value
5% critical values (numerator d.o.f. across the top, denominator d.o.f. down the side):
      1       2       3       4       5
1   161.4   199.5   215.7   224.6   230.2
2    18.5    19.0    19.2    19.3    19.3
3    10.1     9.6     9.3     9.1     9.0
4     7.7     7.0     6.6     6.4     6.3
5     6.6     5.8     5.4     5.2     5.1
14
F-distribution
• Both degrees of freedom go up to infinity
• If we wanted to find the critical value for F(3,4),
it would be 6.6
• The first value (3) is often termed the
numerator, whilst the second (4) the
denominator.
• It is often written as F(3, 4), with the numerator d.o.f. first.
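The same value can be obtained directly in Stata with the inverse F-tail function invFtail(n1, n2, p):
display invFtail(3, 4, 0.05)    // roughly 6.59, the tabulated 6.6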
15
F-statistic
• When testing for the significance of the goodness of fit, our null hypothesis is that the coefficients on the explanatory variables are all jointly equal to 0.
• If our F-statistic is below the critical value we fail to reject the null, and therefore we say the goodness of fit is not significant.
16
Joint Significance
• The F-test is useful for testing a number of
hypotheses and is often used to test for the
joint significance of a group of variables
• In this type of test, we often refer to ‘testing a
restriction’
• This restriction is that the coefficients on a group of explanatory variables are jointly equal to 0
17
F-tests
• The test for joint significance has its own formula,
which takes the following form:
F = [(RSS_R − RSS_U)/m] / [RSS_U/(n − k)]
m = number of restrictions
k = parameters in the unrestricted model
RSS_U = unrestricted RSS
RSS_R = restricted RSS
18
Joint Significance of a group of variables
19
Joint Significance
• If we have a 3 explanatory variable model and wish to
test for the joint significance of 2 of the variables (x
and z), we need to run the following restricted and
unrestricted models:
yt = β0 + β1wt + ut    (restricted)
yt = β0 + β1wt + β2xt + β3zt + ut    (unrestricted)
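In Stata this can be done either by running both models and applying the formula on slide 18, or with the built-in test command after the unrestricted regression (y, w, x, z as in the slide):
reg y w x z                       // unrestricted model
scalar rss_u = e(rss)
scalar df_u  = e(df_r)            // n - k
reg y w                           // restricted model: x and z dropped
scalar rss_r = e(rss)
display "F = " ((rss_r - rss_u)/2) / (rss_u/df_u)    // m = 2 restrictions
* equivalently, after the unrestricted regression: test x z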
20
Example of the F-test for joint significance
21
Joint significance
22
Joint significance
• Having obtained the RSSs, we need to input the
values into the earlier formula (slide 18):
24
Conclusion
• Multiple regression analysis is similar to bivariate analysis; however, correlation between the x variables needs to be taken into account
• The adjusted R-squared statistic tends to be used in
this case
• The F-test is used to test for joint explanatory power
of the whole regression or a sub-set of the variables
25
Violation of the Assumptions of
Classical Linear Regressions
26
• Major Issues
Causes
Consequences
Tests
Solutions
27
In general when the assumptions are violated we
could encounter any combination of 3 problems:
28
Heteroskedasticity
29
Heteroskedasticity
• Ordinary least squares assumes that all
observations are equally reliable (i.e.,
the error variance is constant).
Var(εi) = σ2
• Heteroskedasticity is a systematic
pattern in the errors where the variances
of the errors are not constant.
30
[Figure: density f(Yi) at income levels X1, X2, X3 — the spread of the distribution is wider for rich people than for poor people]
31
[Figure: scatter plot of consumption (Yi) against income (Xi) — the dispersion of the points around the fitted line increases as income rises]
32
Consequences of Heteroskedasticity
1. Ordinary least squares estimators are still linear and unbiased, but no longer BLUE.
2. Ordinary least squares estimators are not
efficient; some other linear estimator will have a
lower variance.
3. Usual formulas give incorrect standard errors for
least squares.
4. Confidence intervals and hypothesis tests (both t-
and F- tests) based on usual standard errors are
wrong.
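A common practical response, consistent with the robust option used in the reporting commands at the end of these slides, is to compute heteroskedasticity-robust standard errors; a minimal sketch with placeholder variable names:
reg y x1 x2, vce(robust)    // same coefficients, robust standard errors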
33
Tests for Heteroskedasticity
34
The Breusch–Pagan–Godfrey (BPG) heteroskedasticity test
The steps involved in this test are as follows:
Step 1: Estimate the original model by OLS and obtain the residuals.
Step 2: Square the OLS residuals.
Step 3: Estimate an auxiliary regression taking the squared residuals as the dependent variable and including all the independent variables from Step 1.
Step 4: Keep the R-squared from the regression in Step 3 as R2_u.
Step 5: Compute the BPG statistic, which is equal to n·R2_u. The statistic is distributed as chi-square, with degrees of freedom equal to the number of independent variables.
Step 6: Find the critical value from the chi-square distribution table at significance level α.
Step 7: The null hypothesis of homoskedasticity is rejected if the BPG statistic is greater than the chi-square critical value.
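These steps can be followed by hand in Stata (y, x1, x2 are placeholder names), or via the closely related built-in test:
reg y x1 x2                    // Step 1: original model
predict uhat, residual
gen uhat2 = uhat^2             // Step 2: squared residuals
reg uhat2 x1 x2                // Step 3: auxiliary regression
display "BPG = " e(N)*e(r2)    // Step 5: n * R-squared; compare with the chi-square critical value
* built-in alternative after the original regression:
* estat hettest, rhs iid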
35
• The Breusch–Pagan test is used for linear forms of heteroskedasticity, e.g. as ŷ goes up, the error variances go up.
36
White Test
1. Estimate Ŷi = β̂1 + β̂2X2i + β̂3X3i and obtain the residuals ûi.
2. Run the following auxiliary regression:
ûi² = A0 + A1X2i + A2X3i + A3X2i² + A4X3i² + A5X2iX3i + Vi
3. The test statistic nR2 from this auxiliary regression is distributed as chi-square, with degrees of freedom equal to the number of regressors in the auxiliary regression.
• Stata commands:
reg y x
estat imtest, white
41
Multicollinearity
42
Multicollinearity indicates the degree of correlation between independent variables.
It becomes an issue for multiple regression models.
43
Perfect Collinearity can exist if:
1) One variable is a constant multiple of another
2) Logs are used inappropriately
3) One variable is a linear function of two or more
other variables
In general, all of these issues are easy to fix, once
they are identified.
44
1. One variable is a constant multiple of
another
y = β0 + β1x + β2(5x) + u
45
Properties of Logarithmic Functions
• Logarithmic function log(x) is defined only for
x>0.
• log(MN) = log M + log N
• log(M/N) = log M − log N
• log(M^a) = a·log M
• log(1) = 0
• log 0 is undefined
46
2) Logs are used inappropriately
49
Detection of Multicollinearity
• Correlation Matrix
corr var1 var2 var3 var4
• Look for non-significant t tests for
individual β parameters when the F test
is significant
• Look for opposite signs with β than you
expected
• Variance Inflation Factor (VIF)
50
The variance inflation factor (VIF): measures
how much the variance of an estimated
regression coefficient increases if your
predictors are correlated (multicollinear).
VIF(β̂i) = Cii = 1/(1 − Ri2), where Ri2 is the R-squared from regressing xi on the other explanatory variables.
reg y x1 x2 x3 x4
estat vif (after regression)
52
Possible Solutions
1. Drop a variable if it is neither significant
nor theoretically valid nor likely to cause
omitted variable bias
2. Transform a variable
53
Normality Tests
54
• It is assumed that the distribution of residuals
is normal.
• If the normality assumption is violated our
hypothesis testing is not reliable.
55
• A normal distribution
– symmetrical, bell-shaped (so they say)
56
• The two common tests of normality are:
(1) histogram of residuals;
(2) the Jarque–Bera test.
57
Histogram of residuals
• Used to learn about the shape of the
probability density function of the residual.
• Stata command
reg dv iv
predict uhat, residual
hist uhat, normal
58
• The Jarque–Bera test
• Stata command
reg dv iv
predict uhat, residual
sum uhat, detail
• Look at kurtosis and skewness.
• Skewness and Kurtosis are ingredients of the
Jarque-Bera test for normality.
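Continuing from the commands above, the Jarque–Bera statistic itself can be computed from the moments that summarize stores (a sketch, not the slides' own code):
quietly sum uhat, detail
scalar JB = r(N)/6 * (r(skewness)^2 + (r(kurtosis) - 3)^2/4)
display "Jarque-Bera = " JB    // compare with the chi-square(2) critical value, 5.99 at the 5% level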
59
Indicators that the distribution is not normal
• Skew
– non-symmetrical
– one tail longer than the other
• Kurtosis
– too flat or too peaked
– kurtosed
• Outliers
– Individual cases which are far from the
distribution
60
Effects on the Mean
• Skew
– biases the mean, in direction of skew
• Kurtosis
– mean not biased
– standard deviation is
– and hence standard errors, and
significance tests are biased
61
Measurement error
62
• Measurement error is the error that arises
when a recorded value is not exactly the same
as the true value due to a flaw in the
measurement process.
• Measurement error can be divided into
systematic error and random error
63
Potential causes of measurement error
64
Measurement error in explanatory
variable
• Measurement error is a source of
endogeneity which makes the
coefficient biased.
• For example, in a typical survey, the
respondents may report their annual
incomes with a lot of errors.
65
• Now, let us understand the nature of the problem.
• Suppose that you want to estimate the following
simple regression.
y = β0 + β1x1* + u    (1)
66
• Our assumption about the correlation between e1 and x1* determines the effect of the measurement error.
• Suppose that e1 is a random error uncorrelated with x1*.
• It means that the measurement error is such that
x1 = x1* + e1    (2)
and
Cov(x1*, e1) = 0    (3)
67
• (2) and (3) together are called the classical errors-in-variables (CEV) assumption.
• Substituting x1 = x1* + e1 into (1) gives the model we actually estimate:
y = β0 + β1x1 + (u − β1e1)    (4)
• Because e1 is part of x1, x1 is correlated with the composite error term (Cov(x1, u − β1e1) = −β1·Var(e1) ≠ 0), which makes x1 endogenous.
• Thus β̂1 estimated using x1 will be biased and inconsistent; that is, it remains biased even if the sample size n increases indefinitely.
70
Too Many or Too Few Variables
• What if we exclude a variable from our
specification that does belong?
OLS will usually be biased
• What happens if we include variables in our
specification that don’t belong?
There is no effect on our parameter
estimate, and OLS remains unbiased
71
Omitted Variable Bias
• Suppose the true model is y = β0 + β1x1 + β2x2 + u, but we estimate only
y = β̃0 + β̃1x1 + ũ
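The key algebraic result (standard omitted-variable-bias algebra): if δ̃ denotes the slope from the simple regression of x2 on x1, then E(β̃1) = β1 + β2·δ̃, so the bias is β2·δ̃ and its sign depends on the signs of β2 and Corr(x1, x2), as summarised in the table two slides below.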
72
Omitted Variable Bias (cont)
73
Summary of Direction of Bias
            Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0      positive bias        negative bias
β2 < 0      negative bias        positive bias
74
Omitted Variable Bias Summary
• Two cases where bias is equal to zero
– β2 = 0, that is, x2 doesn't really belong in the model
– x1 and x2 are uncorrelated in the sample
75
The More General Case
• Technically, we can only sign the bias in the more general case if all of the included x's are uncorrelated
76
- A positive (negative) bias indicates that, given random sampling, on average your estimates will be too large (small).
- The SIZE of the bias is important, as a small bias may not be cause for concern.
- Therefore the SIZE of β2 and δ̃ are important.
77
- Although β2 is unknown, theory can give us a good idea about its sign.
78
Stata command for OVB
reg y x1 x2 x3
estat ovtest
• Rejection of the null hypothesis implies that
there are possible missing variables and the
model suffers from endogeneity, causing
biased coefficient estimates
The command “estat ovtest” performs the
Ramsey regression specification error test
(RESET) for omitted variables.
79
How to Address OVB: Using Proxy Variables
80
• One way to eliminate the bias is to use a
proxy variable for ability.
• Suppose that IQ is a proxy variable for
ability, and that IQ is available in your
data.
81
• Then, the basic idea is to estimate the following.
82
Condition 1: u is uncorrelated with IQ. In addition, the original equation should satisfy the usual conditions (i.e., u is also uncorrelated with Educ, Exp, and Ability).
Ability = δ0 + δ3·IQ + v3    (3)
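As a sketch of the plug-in step, assuming the structural equation has the form y = β0 + β1Educ + β2Exp + β3Ability + u (the exact specification is not reproduced here), substituting (3) gives
y = (β0 + β3δ0) + β1Educ + β2Exp + β3δ3·IQ + (u + β3v3),
so regressing y on Educ, Exp and IQ still recovers β1 and β2, provided Condition 1 holds and v3 is unrelated to Educ and Exp.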
83
Irrelevant Variables in a Regression
Model
y = β0 + β1x1 + β2x2 + β3x3 + u
84
• where x3 has no impact on y; β3 = 0
• x3 may or may not be correlated with x2 and x1
• In terms of expectations:
E(y | x1, x2, x3) = E(y | x1, x2) = β0 + β1x1 + β2x2
85
Including irrelevant variables doesn't affect OLS unbiasedness, but it generally increases the variance of the OLS estimators.
86
Wrong Functional Forms
[Figure: two fits to the same data on y — a linear fit and a quadratic fit]
87
Autocorrelation
88
• CLRM states that the covariances and
correlations between different disturbances are
all zero:
cov(ui, uj)=0 for all i≠j
• If this assumption is no longer valid, then the
disturbances are pairwise autocorrelated (or
Serially Correlated).
• Autocorrelation is most likely to occur in time
series data.
• This means that an error occurring at period t
may be carried over to the next period t+1.
89
Causes of Autocorrelation
• A missing variable,
• An incorrect functional form, or
• Pure serial correlation that frequently arises in
time series data.
90
What Causes Autocorrelation
1. Omitted variables
Suppose Y is related to X2 and X3, but we
wrongfully do not include X3 in our model.
The effect of X3 will be captured by the
disturbances ui.
If X3, like many economic series, exhibits a trend over time, then X3,i depends on X3,i-1, X3,i-2 and so on.
Similarly, ui then depends on ui-1, ui-2 and so on.
91
2. Misspecification.
Suppose Yi is related to Xi with a quadratic relationship:
Yi = β1 + β2Xi + β3Xi² + ui
but we wrongfully assume and estimate a straight line:
Yi = β1 + β2Xi + ui
Then the error term obtained from the straight line will depend on Xi².
92
Consequences of Autocorrelation
1. The OLS estimators are still unbiased as unbiasedness
does not depend on the autocorrelation assumption.
2. The estimated variances of the regression coefficients
will be biased,
3. The OLS estimators will be inefficient and therefore no
longer BLUE.
4. Hypothesis testing is no longer valid. In most of the
cases, the R2 will be overestimated and the t-statistics
will tend to be higher
93
Stata for Reporting
94
Stata command to generate regression tables
• esttab command
eststo clear
eststo: reg y x, robust
esttab using [file path]\Table3.rtf, nogaps ///
    stats(N F p r2, labels(N F-test sig R2) fmt(%9.0f %9.0f %9.3f %9.3f)) ///
    se(3) b(3) nolz
95
Outreg2 Command
reg y xi, robust
est store m1
reg y xi x2 x3, robust
est store m2
outreg2 [m1 m2] using [filepath], bdec(3) rdec(3) title(Table 3) word replace
96