
ECO 401 Econometrics

SI 2021
Week 4, 28 September

Dr Syed K Abbas
Office Location: BS304
Email: Syed.Abbas@xjtlu.edu.cn
Plan
Last week, we looked at the Gauss-Markov assumptions and goodness of fit. By the end of this
session, you should be able to:

• Interpret the linear model
• Account for the diminishing marginal effect
• Define the asymptotic properties of OLS
• Define, detect and correct the multicollinearity problem
• Approach missing observations, outliers, and prediction
• Choose a model and specification
This lecture is based on Chapters 2 and 3 of your textbook by Verbeek (2017)
When is OLS BLUE? Gauss-Markov assumptions
What does BLUE mean?
• Best – minimum variance of the estimator
• Linear – within the class of linear estimators
• Unbiased – the expected value equals the 'truth': E(b) = β
• Estimator
When is OLS BLUE? Under the Gauss-Markov assumptions:
• (A1) mean zero: error terms have mean zero, E(ε_i) = 0
• (A2) independence: error terms are independent of the exogenous variables
• (A3) homoskedasticity: error terms have the same variance, V(ε_i) = σ²
• (A4) no autocorrelation: error terms are mutually uncorrelated, cov(ε_i, ε_j) = 0 for i ≠ j
• (A5) ε_i ~ NID(0, σ²), which is shorthand for: all ε_i are independent drawings from a
normal distribution with mean 0 and variance σ² ("normally and independently
distributed")
Let us start with the wage regression discussed in the lecture.
• What effect can be captured by adding an extra variable, EXPER²?
Interpretation of marginal effects

• EXPER² captures the diminishing marginal effect of experience.
• What is the marginal effect of experience on wage for a worker with 1 and 5 years of
experience?
• Take the derivative with respect to experience and plug in the values 1 and 5. This
yields 0.0321 and 0.0281.
• For workers with 1 and 5 years of experience, the marginal effects are estimated to be
approximately 3.2% and 2.8%, respectively. An extra year of experience increases
wages at a decreasing rate.
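As a minimal sketch of this calculation (the coefficients below, β ≈ 0.0331 on EXPER and γ ≈ −0.0005 on EXPER², are backed out from the two marginal effects quoted above rather than copied from the textbook table):

```python
# Marginal effect of experience in a quadratic wage equation:
#   d log(wage) / d EXPER = beta + 2 * gamma * EXPER
# Illustrative coefficients, backed out from the reported marginal
# effects (0.0321 at 1 year and 0.0281 at 5 years).
beta, gamma = 0.0331, -0.0005

def marginal_effect(exper):
    return beta + 2 * gamma * exper

for years in (1, 5):
    print(f"EXPER = {years}: marginal effect = {marginal_effect(years):.4f}")
# EXPER = 1: 0.0321 -> roughly a 3.2% wage increase per extra year
# EXPER = 5: 0.0281 -> roughly a 2.8% increase: diminishing returns
```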
Interpreting the linear model
• The linear model y_i = x_i'β + ε_i (3.1) has no meaning unless we make
some assumptions about ε_i
• We assume ε_i has expectation zero, treating x_i as given. Commonly, we state that
E(ε_i | x_i) = 0 (3.2)
• Under this assumption, the regression model describes the expected value of
y given x, that is, E(y_i | x_i) = x_i'β
• It answers: if we know x, what do we expect y to be?
• The coefficient β_k measures the expected change in y_i if x_ik changes by one
unit, but all other variables in x_i do not change. That is:
∂E(y_i | x_i) / ∂x_ik = β_k
• The statement 'other variables in x_i do not change' is a ceteris paribus
condition (other things equal). In a multiple regression, single coefficients can
only be interpreted this way; strictly speaking, we can only interpret β_k if we
know which other variables are included
Ceteris paribus – other things equal
• If we are interested in the relationship between y_i and x_ik, the other variables in x_i act
as control variables.
• For example: what is the impact of an earnings announcement upon a firm's stock
price, controlling for overall market movements?
• Sometimes, ceteris paribus is hard to maintain.
• For example: what is the impact of age upon a person's wage, keeping years of
experience fixed?
• Sometimes, ceteris paribus is impossible, for example if the model includes both age
and age-squared: the effect of age cannot be measured keeping age squared constant.
• Example: the model includes β₂ age_i + β₃ age_i²
• In this case, we can interpret the derivative: the marginal effect of changing age
(ceteris paribus) is ∂E(y_i | x_i) / ∂age_i = β₂ + 2β₃ age_i; it is the marginal effect
of changing age if the other variables in x_i are constant.
Elasticities
• Often, researchers are interested in elasticities
• An elasticity measures the relative change in the dependent variable y_i due to a
relative change in x_ik
• Elasticities can be estimated directly from a linear model formulated in natural logs
(excluding dummy variables): log y_i = (log x_i)'γ + ν_i
where log x_i is shorthand for a vector with elements (1, log x_i2, …, log x_iK)'. This
is called a loglinear model; it is assumed that E(ν_i | log x_i) = 0

• In a loglinear model, the elasticity is
(∂E(y_i | x_i) / ∂x_ik) · (x_ik / E(y_i | x_i)) ≈ ∂E(log y_i | log x_i) / ∂log x_ik = γ_k
• In a linear model, the elasticity is
(∂E(y_i | x_i) / ∂x_ik) · (x_ik / E(y_i | x_i)) = β_k · x_ik / (x_i'β)
• Thus, in a linear model elasticities are nonconstant and vary with x_i, while in a
loglinear model we have constant elasticities.
Elasticities
• Thus:
• In a linear model, the elasticity is the coefficient times a ratio that varies with x_i; if
you want to report an elasticity, specify where it is evaluated, e.g. at the mean or median
• In a loglinear model, the elasticity is the coefficient itself
• The choice of functional form is in most cases dictated by convenience of economic
interpretation
• In some cases a loglinear model may be preferred, since explaining log y_i
rather than y_i helps to reduce heteroskedasticity problems
• If a dummy variable is included in a loglinear model, its coefficient measures
the expected relative change in y_i due to an absolute (unit) change in x_ik
• It is possible to include some explanatory variables in logs and some in levels
• The interpretation of a coefficient for a level variable is the relative change in y_i resulting
from an absolute change in x_ik: a semi-elasticity
Let us look at some examples here.

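As a worked example, a minimal sketch in Python (using statsmodels on simulated data; the variable names and coefficient values are illustrative assumptions, not taken from the textbook):

```python
# Minimal sketch: elasticity in a linear vs. loglinear model, simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 500)
y = 2.0 + 0.8 * x + rng.normal(0, 0.5, 500)

linear = sm.OLS(y, sm.add_constant(x)).fit()

# Linear model: elasticity = beta_k * x_ik / (x_i'beta); it varies with x,
# so report it at a chosen point, e.g. the sample mean.
b0, b1 = linear.params
elasticity_at_mean = b1 * x.mean() / (b0 + b1 * x.mean())
print(f"linear model, elasticity at the mean: {elasticity_at_mean:.3f}")

# Loglinear model: the slope coefficient is itself the (constant) elasticity.
loglinear = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
print(f"loglinear model, constant elasticity: {loglinear.params[1]:.3f}")
```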
Asymptotic properties of OLS
• If the assumptions (A2) and (A5) are violated, the properties of the OLS
estimator may differ from those reported above.
• In many cases, the exact properties are unknown.
• We employ asymptotic theory, which asks what happens if, hypothetically,
the sample size grows infinitely large (N → ∞)
• (A6) (1/N) Σᵢ₌₁ᴺ x_i x_i' converges to a finite non-singular matrix Σ_xx.
Note that if A⁻¹ does not exist, A is called a singular matrix.
• (A7) E(x_i ε_i) = 0
Recall the variance expressions:
V̂{b} = s²(X'X)⁻¹  (2.36)
where s² = (1/(N−K)) Σᵢ₌₁ᴺ eᵢ²  and  (X'X)⁻¹ = (Σᵢ₌₁ᴺ x_i x_i')⁻¹, and
V{b₂} = [σ²/(1−r₂₃²)] · (1/N) · [(1/N) Σᵢ₌₁ᴺ (x_i2 − x̄₂)²]⁻¹  (2.37)
Asymptotic properties of OLS
• We use this to approximate the properties of our estimator in a given
sample. (In reality, sample sizes rarely grow.)
• b is a consistent estimator if b approaches β as N becomes large; we
write plim b = β; as N grows, the probability that b differs from β
becomes arbitrarily small.
• An estimator can be biased, but still consistent.
• Under assumptions (A6) and (A7), which are weaker than the Gauss-
Markov assumptions (A1)-(A4): plim b = β (see the simulation sketch below)
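A minimal simulation sketch of consistency (the design, true coefficient, and seed below are illustrative assumptions):

```python
# Minimal sketch: consistency of OLS, plim b = beta, by simulation.
import numpy as np

rng = np.random.default_rng(42)
beta_true = 1.5

for N in (50, 500, 5_000, 50_000):
    x = rng.normal(2, 1, N)
    eps = rng.normal(0, 1, N)     # independent of x, so E(x_i * eps_i) = 0 (A7)
    y = beta_true * x + eps
    b = (x @ y) / (x @ x)         # OLS slope in a model without intercept
    print(f"N = {N:>6}: b = {b:.4f}   (beta = {beta_true})")
# As N grows, b concentrates ever more tightly around beta.
```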
Asymptotic properties of OLS
• For testing purposes, asymptotic distributions are important; it can be shown
that under (A6) plus the Gauss-Markov assumptions (A1)-(A4):
√N(b − β) → N(0, σ²Σ_xx⁻¹), where '→' means 'is asymptotically distributed as'.
The estimator is consistent and asymptotically normal (CAN).
• In practice we have a finite sample and have to estimate σ² using s² and
approximate the distribution, such that (A1)-(A4) and (A6) imply that b is
approximately N(β, s²(X'X)⁻¹); this is the basis for all tests.
• Note that V̂{b} = s²(X'X)⁻¹ (2.36), where s² = (1/(N−K)) Σᵢ₌₁ᴺ eᵢ² and
(X'X)⁻¹ = (Σᵢ₌₁ᴺ x_i x_i')⁻¹, and
V{b₂} = [σ²/(1−r₂₃²)] · (1/N) · [(1/N) Σᵢ₌₁ᴺ (x_i2 − x̄₂)²]⁻¹ (2.37), as before.
Multicollinearity
• In general, there is nothing wrong with including variables in the
model that are correlated, for example
• experience and schooling
• age and experience
• inflation rate and nominal interest rate
• However, when correlations are high, it becomes hard to identify the
individual impact of each of the variables.
• Multicollinearity is used to describe the situation when an exact or
approximate linear relationship exists between the explanatory
variables (regressors).
• The problem arises when an approximate linear relationship among the
explanatory variables leads to unreliable regression estimates.
Multicollinearity
• The signs of multicollinearity are:
- High standard errors (low t-values)
- Strange signs or magnitudes of coefficients
• This could lead to misleading conclusions.
• The variance of b_k is inflated if x_k can be approximated by the other
explanatory variables.
• The Variance Inflation Factor (VIF) can be used to detect multicollinearity:
VIF(b_k) = 1/(1 − R_k²), where R_k² is the squared (multiple) correlation coefficient
between the k-th explanatory variable and the other explanatory variables; a
VIF of 10 or more is usually considered 'high' (see the sketch below)
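A minimal sketch of VIF detection in Python on simulated regressors (the names exper, age, and school and the data-generating process are illustrative assumptions; statsmodels provides variance_inflation_factor):

```python
# Minimal sketch: detecting multicollinearity with VIFs on simulated data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
exper = rng.uniform(0, 30, n)
age = exper + 18 + rng.normal(0, 2, n)   # age and experience: highly correlated
school = rng.uniform(8, 18, n)

X = sm.add_constant(np.column_stack([exper, age, school]))
# VIF(b_k) = 1 / (1 - R_k^2); a value of 10 or more is usually considered high.
for k, name in enumerate(["exper", "age", "school"], start=1):
    print(f"{name:>7}: VIF = {variance_inflation_factor(X, k):.1f}")
```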
Multicollinearity

Exact multicollinearity arises when an exact linear relationship exists between the
explanatory variables. For example: male = 1 − female.

With exact multicollinearity, the OLS estimator cannot be computed.

The natural solution is to drop one explanatory variable (or more than one, if necessary);
see the sketch below.

Some programs (e.g. Stata) do this automatically; other programs (e.g. EViews) give an
error message ["near collinear matrix"].
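A minimal sketch of the dummy-variable trap in Python (simulated data; with a constant plus both male and female dummies, X'X is exactly singular):

```python
# Minimal sketch: the dummy-variable trap. With a constant plus both
# male and female dummies, male = 1 - female, so X'X is singular and
# the OLS estimator cannot be computed.
import numpy as np

rng = np.random.default_rng(2)
female = rng.integers(0, 2, 100)        # 0/1 dummy
male = 1 - female                       # exact linear relationship
X = np.column_stack([np.ones(100), male, female])

try:
    np.linalg.inv(X.T @ X)
except np.linalg.LinAlgError as err:
    print("X'X is singular:", err)      # exact multicollinearity

# The natural solution: drop one of the dummies.
X_fixed = np.column_stack([np.ones(100), male])
print("rank of fixed design:", np.linalg.matrix_rank(X_fixed))  # 2: full rank
```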
Alternative parameterizations

Table 2.7 Alternative specifications with dummy variables
Dependent variable: wage

Specification    A                B                 C
constant         5.147 (0.081)    6.313 (0.078)     −
male             1.166 (0.112)    −                 6.313 (0.078)
female           −                −1.166 (0.112)    5.147 (0.081)
R²               0.0317           0.0317            0.0317

Note: Standard errors in parentheses.
Missing observations, outliers, and prediction
Outliers
• In calculating the OLS estimator, some observations may have a
disproportionate impact.
• An outlier is an observation that deviates markedly from the rest of the sample.
• It could be due to mistakes or problems in the data.
• An outlier becomes an influential observation if it has a substantial impact on the
estimated regression line.
• See Figure 2.3 below. What do you observe?


[Figure 2.3: Impact of outliers. © John Wiley and Sons, 2012]
Outliers
• Clearly, the inclusion of the outlier pulls down the regression line.
• The estimated slope coefficient when the outlier is included is 0.52 (with a
standard error of 0.18), and the R² is only 0.18.
• When the outlier is dropped, the estimated slope coefficient increases to
0.94 (with a standard error of 0.06), and the R² increases to 0.86.
• Approaches:
• investigate the sensitivity of results
• test for the presence of outliers
• use robust estimation methods (LAD = least absolute deviations, which estimates
the conditional median rather than the mean; see the sketch after this list)
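A minimal sketch comparing OLS with LAD (median regression via statsmodels' QuantReg) on simulated data with one planted outlier; all numbers are illustrative:

```python
# Minimal sketch: OLS vs. LAD (median regression) with one planted outlier.
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 1.0 + 0.9 * x + rng.normal(0, 0.3, 50)
x[0], y[0] = 10.0, -5.0                  # one influential outlier

X = sm.add_constant(x)
ols_slope = sm.OLS(y, X).fit().params[1]
lad_slope = QuantReg(y, X).fit(q=0.5).params[1]
print(f"OLS slope: {ols_slope:.2f}")     # pulled down by the outlier
print(f"LAD slope: {lad_slope:.2f}")     # stays close to the true 0.9
```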
Outliers: regression approach
[Figure omitted]
Missing observations, outliers, and prediction

Missing observations
• Missing observations are common, in particular in microeconomic data
• Missing values need to be properly indicated in the dataset, so that the software
does not treat them as 'zero'
• The OLS estimator may be subject to sample selection bias
• One approach that uses the complete sample is to replace missing values with the
sample average and augment the model with a missing-data indicator; the estimator
could still be biased (a sketch follows below)
• Another approach is hot-deck imputation: missing values are replaced by random
draws from the available observed values; but if the missingness is non-random,
this is not advised
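A minimal sketch of the mean-imputation-plus-indicator approach in pandas (the tiny dataset is invented purely for illustration):

```python
# Minimal sketch: mean imputation plus a missing-data indicator in pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({"wage":  [10.0, 12.5, 9.0, 15.0],
                   "exper": [2.0, np.nan, 5.0, np.nan]})

df["exper_missing"] = df["exper"].isna().astype(int)  # 1 if exper was missing
df["exper"] = df["exper"].fillna(df["exper"].mean())  # fill with the sample average
print(df)  # the indicator can then be added to the regression as a regressor
```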
Missing observations, outliers, and prediction

Prediction
One of the goals of the econometrician is to make predictions, after having produced
the coefficient estimates and corresponding standard errors.
That is, we are interested in predicting the value of the dependent variable at a
given value of the explanatory variables.
The unbiased predictor ŷ_i can be computed using the estimated coefficients b for a
given value of the regressors x_i: ŷ_i = x_i'b (use the 'predict' command in Stata;
see the sketch below)
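A minimal sketch of prediction after OLS in Python, as a stand-in for Stata's 'predict' (simulated data; statsmodels' get_prediction also reports standard errors and intervals):

```python
# Minimal sketch: the predictor y_hat = x'b after OLS estimation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 100)

fit = sm.OLS(y, sm.add_constant(x)).fit()

x_new = np.array([[1.0, 7.5]])                     # constant plus x = 7.5
print("point prediction:", fit.predict(x_new))     # y_hat = x_new' b
print(fit.get_prediction(x_new).summary_frame())   # adds std errors and intervals
```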
Model selection and misspecification

Model specification

In any econometric investigation, the choice of model is one of the first steps.

What are the important considerations when choosing a model?

What are the consequences of choosing the wrong model?

Are there ways of assessing whether a model is adequate?


Model selection and misspecification

In principle, we may define two linear models:

one describing E(y_i | x_i) = x_i'β, and
another describing E(y_i | z_i) = z_i'γ
The conditioning variables are different and both models may be correct – they just
explain something different.
However, if implicitly or explicitly it is assumed that E(y_i | x_i, z_i) = x_i'β and
E(y_i | x_i, z_i) = z_i'γ
(all relevant variables are included), the two models are in conflict and at most one of
them can be correct.
Suppose, now, that we have gathered a lot of data on an endogenous variable y_i and
a range of explanatory variables x_ik for k = 2, …, K.
We can make two types of misspecification of the regressors:
• a relevant variable is excluded from the model, or
• an irrelevant variable is included in the model.
Model selection and misspecification

These misspecifications can be illustrated with the following models:

Model 1: y_i = x_i'β + z_i'γ + ε_i
Model 2: y_i = x_i'β + ϑ_i

A relevant variable is excluded if Model 1 is the 'truth' but Model 2 is estimated.

An irrelevant variable is included if Model 2 is the 'truth' but Model 1 is estimated.

If a relevant variable is excluded, we call it omitted variable bias:

a bias in the OLS estimator owing to estimating the incorrect model because of omitted
variables (a simulation sketch follows below).

When an irrelevant variable is included and we estimate Model 1 while Model 2 is the truth, this is less
of a problem. The main disadvantage is that we now include irrelevant information for estimating β;
this increases the variance, which means β is estimated less accurately. In this case it is better to estimate
β from the restricted Model 2.
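A minimal simulation sketch of omitted-variable bias (Model 1 is the truth here; the coefficients and the correlation between x and z are illustrative assumptions):

```python
# Minimal sketch: omitted-variable bias by simulation. Model 1 (x and z)
# is the truth; estimating Model 2 (x only) biases the coefficient on x
# because x and z are correlated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 10_000
x = rng.normal(0, 1, n)
z = 0.8 * x + rng.normal(0, 1, n)       # z is correlated with x
y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(0, 1, n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
short = sm.OLS(y, sm.add_constant(x)).fit()
print(f"beta_x, full model: {full.params[1]:.3f}")   # close to the true 2.0
print(f"beta_x, z omitted : {short.params[1]:.3f}")  # close to 2.0 + 1.5*0.8 = 3.2
```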
Model selection and misspecification

Thus, including irrelevant variables increases the variance of the estimators of the
other model parameters.

Potentially relevant variables come from economic theories and arguments.

It is also possible that a chosen model omits important variables.

Our economic principles may have overlooked a variable, or a lack of data may lead us to drop a variable even when it is
prescribed by economic theory.

On the other hand, including too few variables brings the danger of biased estimates.

We need guidance on how to select the regressors.


Model specification

Consider the family income model (Eq. 6.21), in which FAMINC is explained by the
husband's education (HEDU) and the wife's education (WEDU).

If we incorrectly omit the wife's education:

Omitting WEDU leads us to overstate the effect of an extra year of
education for the husband by about $2,000.
• Omission of a relevant variable (defined as one whose coefficient is nonzero)
leads to an estimator that is biased.
• This bias is known as omitted-variable bias.
Model selection and misspecification: model specification

Write a general model as:

y = β₁ + β₂x₂ + β₃x₃ + e   (Eq. 6.22)

Omitting x₃ is equivalent to imposing the restriction β₃ = 0.

It can be viewed as an example of imposing an incorrect constraint on the parameters.
Model specification

Now consider the model with the number of young children (KL6) added as an extra
regressor.

Notice that the coefficient estimates for HEDU and WEDU have not
changed a great deal. This occurs because KL6 is not
highly correlated with the education variables.
Model selection and misspecification: model specification

You may think that a good strategy is to include as many variables as possible in
your model.

However, doing so will not only complicate your model unnecessarily but may also
inflate the variances of your estimates because of the presence of irrelevant
variables.
Model specification

Now consider a model in which irrelevant variables are added.

The inclusion of irrelevant variables reduces the precision of the estimated
coefficients for the other variables in the equation.
Model selection and misspecification

Some points for choosing a model:

1. Choose variables and a functional form on the basis of your theoretical and general
understanding of the relationship.

2. If an estimated equation has coefficients with unexpected signs or unrealistic magnitudes, this
could be caused by a misspecification such as the omission of an important variable.

3. One method for assessing whether a variable or a group of variables should be included in an
equation is to perform significance tests.

4. Consider various model selection criteria.

5. The adequacy of a model can be tested using a general specification test known as RESET
(coming soon).
Some tests for model selection

Ways to test the relative goodness of fit of statistical models


Adjusted R-squared
AIC: Akaike Information Criterion
BIC: Schwarz Bayesian Information Criterion
All three methods are based on the performance of the estimated residuals relative to the number 𝐾 of included
regressors
R̄² = 1 − [ (1/(N−K)) e'e ] / [ (1/(N−1)) (y − ȳ)'(y − ȳ) ]

AIC = log( (1/N) e'e ) + 2K/N

BIC = log( (1/N) e'e ) + (K/N) log N
AIC and BIC are used to select among different models:
• lower values are better
• they can only be used to compare models with the same endogenous variable
A sketch of such a model comparison follows below.
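A minimal sketch comparing the three criteria in Python on simulated data (the design is an illustrative assumption). Note that statsmodels computes AIC/BIC from the log-likelihood, which differs from the formulas above by a constant but ranks models the same way:

```python
# Minimal sketch: comparing models with adjusted R-squared, AIC, and BIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 300
x2 = rng.normal(0, 1, n)
x3 = rng.normal(0, 1, n)                # irrelevant regressor
y = 1.0 + 0.5 * x2 + rng.normal(0, 1, n)

m1 = sm.OLS(y, sm.add_constant(x2)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
for name, m in [("x2 only", m1), ("x2 + x3", m2)]:
    print(f"{name}: adj R2 = {m.rsquared_adj:.4f}, "
          f"AIC = {m.aic:.1f}, BIC = {m.bic:.1f}")
# Lower AIC/BIC is better; the smaller model typically wins here.
```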
Model specification

Table 6.2: Goodness-of-fit and information criteria for the family income example
[table omitted]

Based on the above criteria, which model should be chosen?
Model selection and misspecification: model specification

A model could be misspecified if we have:

• omitted important variables

• included irrelevant ones

• chosen a wrong functional form

• a model that violates the assumptions of the multiple regression model
Next week we will extend this topic and look at general specification tests
such as the RESET test (Chapter 3).
