
Econometric Modeling: Model Specification and Diagnostic Testing
Chapter 13
Introduction

• Economists' search for 'truth' has over the years given rise to the view that economists are people searching in a dark room for a non-existent black cat; econometricians are regularly accused of finding one.
• The three golden rules of econometrics are: test, test, and test.



The traditional view of econometric modeling
• Average economic regression (AER) → Gilbert (1986)
• A bottom-up approach → we start with a model containing a given number of regressors and, based on diagnostics, go on adding more variables to the model
• In recent years the AER approach has come under heavy criticism



Model selection criteria
• According to Harvey (1981):
1. Parsimony
2. Identifiability
3. Data Coherency
4. Data Admissibility
5. Theoretical consistency
6. Predictive power
7. Encompassing



Model selection criteria

• According to Hendry and Richard (1983):


1. Be data admissible
2. Be consistent with theory
3. Have weakly exogenous regressors
4. Exhibit parameter constancy
5. Exhibit data coherency
6. Be encompassing



Model selection criteria

• According to Gujarati (2003):


1. Parsimony
2. Identifiability
3. Goodness of fit
4. Theoretical consistency
5. Predictive power



Model selection criteria: summary

Criteria                          Harvey (1981)   Hendry & Richard (1983)   Gujarati (2003)
Parsimony                         ✓                                         ✓
Identifiability                   ✓                                         ✓
Data coherency                    ✓               ✓
Data admissibility                ✓               ✓
Theoretical consistency           ✓               ✓                         ✓
Predictive power                  ✓                                         ✓
Encompassing                      ✓               ✓
Weakly exogenous regressors                       ✓
Parameter constancy                               ✓
Goodness of fit                                                             ✓



Specimetrics
• According to Leamer:
  Specimetrics describes the process by which a researcher is led to choose one specification of the model rather than another; furthermore, it attempts to identify the inferences that may properly be drawn from a data set when the data-generating mechanism is ambiguous.

• The idea that a model must be tested before it can be taken to be an adequate basis for studying economic behavior has become widely accepted.

• Two approaches: Leamer's approach and Hendry's approach



Leamer's approach to model selection

• Two of his contributions:

1. He has discussed how the AER methodology conducts specification searches and how, by using Bayesian statistics, one can improve this search process
2. He has suggested how the reporting of regression results can be strengthened by undertaking an extreme bounds analysis (EBA)



Leamer's approach to model selection
There are six different reasons for model specification searches:

Type of search                    Purpose
1. Hypothesis testing             To choose a "true" model
2. Interpretive                   To interpret data involving several correlated variables
3. Simplification                 To construct a "fruitful" model
4. Proxy                          To choose between measures that purport to measure the same variable
5. Data selection                 To select the appropriate data for estimation and prediction
6. Post-data model construction   To improve an existing model
Hendry's approach to model selection

• Hendry's LSE approach is popularly known as the top-down or general-to-specific approach → one starts with a model containing several regressors and then whittles it down to a model containing only the "important" variables
• We will have to try several specifications before we finally settle on the "final" model → the TTT methodology, that is, "test, test, and test"



Types of specification errors

1. Omission of a relevant variable(s)


2. Inclusion of an unnecessary variable(s)
3. Adopting the wrong functional form
4. Errors of measurement
5. Incorrect specification of the stochastic
error term



Types of specification errors

• Distinguish between model specification errors and model mis-specification errors
  – Model specification errors → we have in mind a "true" model but somehow we do not estimate the correct model
  – Model mis-specification errors → we do not know what the true model is to begin with



Consequences of model specification errors
• Underfitting a model (omitting a relevant variable)
  The true model:      Yi = β1 + β2 X2i + β3 X3i + ui
  The estimated model: Yi = α1 + α2 X2i + vi
  The consequences of omitting X3 are:
  1. If X3 is correlated with X2, α̂1 and α̂2 are biased and inconsistent, and the bias does not disappear as the sample size gets larger



Consequences of model specification errors
  2. Even if X2 and X3 are not correlated, α̂1 is biased, although α̂2 is unbiased
  3. The disturbance variance σ² is incorrectly estimated
  4. The conventionally measured variance of α̂2 is a biased estimator of the variance of the true estimator β̂2
  5. The confidence-interval and hypothesis-testing procedures are likely to give misleading conclusions
  6. Forecasts based on the incorrect model and the forecast confidence intervals will be unreliable



Consequences of model specification errors
• Overfitting a model (including an irrelevant variable)
  1. The OLS estimators of the "incorrect" model are all unbiased and consistent
  2. The error variance σ² is correctly estimated
  3. The confidence-interval and hypothesis-testing procedures remain valid
  4. However, the estimated coefficients will generally be inefficient; that is, their variances will be larger than those of the coefficients of the true model



Consequences of model specification errors
• There is an asymmetry between the two types of specification biases.
• An unwarranted conclusion would be that it is better to include irrelevant variables than to omit relevant ones.
• In general, the best approach is to include only explanatory variables that, on theoretical grounds, directly influence the dependent variable and that are not accounted for by other included variables.



Tests of specification errors

• Detecting the presence of unnecessary variables
• Tests for omitted variables and incorrect functional form:
  – Examination of residuals → if there are specification errors, the residuals will exhibit noticeable patterns
  – The Durbin–Watson d statistic
  – Ramsey's RESET test
  – The Lagrange multiplier (LM) test for adding variables



Ramsey's RESET Test
• Test for specification error → RESET (regression specification error test)
The steps involved in RESET are:
1. Estimate the regression model and obtain the fitted values Ŷi
2. Rerun the model introducing Ŷi in some form (e.g., Ŷi² and Ŷi³) as additional regressors
3. Use the F test:

   F = [(R²new − R²old) / number of new regressors] / [(1 − R²new) / (n − number of parameters in the new model)]

4. If the computed F value is significant, we can accept the hypothesis that the model is mis-specified
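A minimal computational sketch of this F statistic in Python, using the R² values reported in the example that follows (0.8409 for the linear model, 0.9983 for the model augmented with Ŷ² and Ŷ³); the variable names are illustrative, not part of the slides.

```python
# Sketch of Ramsey's RESET F test using the example's reported R-squared values.
from scipy import stats

r2_old, r2_new = 0.8409, 0.9983   # R-squared of the restricted and augmented models
n = 10                            # sample size
m = 2                             # number of new regressors (Y-hat^2 and Y-hat^3)
k_new = 4                         # number of parameters in the augmented model

F = ((r2_new - r2_old) / m) / ((1 - r2_new) / (n - k_new))
p_value = stats.f.sf(F, m, n - k_new)   # upper-tail p-value of F(m, n - k_new)

print(f"RESET F = {F:.1f}, p-value = {p_value:.4f}")
# A significant F suggests the original (linear) specification is inadequate.
```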



Example: linear model
Dependent Variable: Y
Method: Least Squares
Sample: 1 10
Included observations: 10
Variable Coefficient Std. Error t-Statistic Prob.
C 166.4667 19.02142 8.751537 0.0000
X 19.93333 3.065580 6.502305 0.0002
R-squared 0.840891 Mean dependent var 276.1000
Adjusted R-squared 0.821002 S.D. dependent var 65.81363
S.E. of regression 27.84451 Akaike info criterion 9.668005
Sum squared resid 6202.533 Schwarz criterion 9.728522
Log likelihood -46.34002 F-statistic 42.27997
Durbin-Watson stat 0.715725 Prob(F-statistic) 0.000188



Example: cubic model
Dependent Variable: Y
Method: Least Squares
Sample: 1 10
Included observations: 10
Variable Coefficient Std. Error t-Statistic Prob.
C 2140.215 131.9893 16.21507 0.0000
X 476.5521 33.39086 14.27193 0.0000
YF^2 -0.091865 0.006192 -14.83680 0.0000
YF^3 0.000119 7.46E-06 15.89677 0.0000
R-squared 0.998339 Mean dependent var 276.1000
Adjusted R-squared 0.997509 S.D. dependent var 65.81363
S.E. of regression 3.284911 Akaike info criterion 5.505730
Sum squared resid 64.74382 Schwarz criterion 5.626764
Log likelihood -23.52865 F-statistic 1202.220
Durbin-Watson stat 2.700212 Prob(F-statistic) 0.000000



Example:

   F = [(0.9983 − 0.8409)/2] / [(1 − 0.9983)/(10 − 4)] = (0.1574/2) / (0.0017/6) = 0.0787 / 0.0002833 = 277.4035

The F value is highly significant, indicating that the linear model is mis-specified.

[Figure: residuals plotted for observations 1–10, where E1 = residual from the linear model and E3 = residual from the cubic model]



Lagrange Multiplier (LM) Test for adding variables
1. Estimate the linear (restricted) model by OLS and obtain the residuals ût
2. If the unrestricted model is the true model, the residuals obtained should be related to the omitted regressors, such as the squared and cubed terms. Regress the residuals on all the regressors:

   ût = λ1 + λ2 Xt + λ3 Xt² + λ4 Xt³ + vt

3. For a large sample size, n·R² from this auxiliary regression asymptotically follows the χ² distribution with df equal to the number of restrictions.

   If the computed chi-square value exceeds the critical chi-square value, we reject the restricted model.
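A minimal sketch of this auxiliary-regression LM test, assuming NumPy arrays `y` and `x` hold the data; the helper name and setup are illustrative, not the textbook's own code.

```python
import numpy as np
from scipy import stats

def lm_test(y, x):
    """LM test of the linear (restricted) model against a cubic alternative."""
    n = len(y)
    # 1. Estimate the restricted model y = b1 + b2*x by OLS and keep the residuals.
    X_r = np.column_stack([np.ones(n), x])
    b_r, *_ = np.linalg.lstsq(X_r, y, rcond=None)
    resid = y - X_r @ b_r
    # 2. Regress the residuals on all regressors, including x^2 and x^3.
    X_u = np.column_stack([np.ones(n), x, x**2, x**3])
    b_u, *_ = np.linalg.lstsq(X_u, resid, rcond=None)
    fitted = X_u @ b_u
    r2 = 1 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    # 3. n*R^2 is asymptotically chi-square with df = number of restrictions (here 2).
    lm_stat = n * r2
    return lm_stat, stats.chi2.sf(lm_stat, df=2)
```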



Example:
Dependent Variable: E1
Method: Least Squares
Sample: 1 10
Included observations: 10
Variable Coefficient Std. Error t-Statistic Prob.
C -24.70000 6.375322 -3.874314 0.0082
X 43.54433 4.778607 9.112348 0.0001
X^2 -12.96154 0.985665 -13.15005 0.0000
X^3 0.939588 0.059106 15.89677 0.0000
R-squared 0.989562 Mean dependent var -3.55E-15
Adjusted R-squared 0.984343 S.D. dependent var 26.25205
S.E. of regression 3.284911 Akaike info criterion 5.505730
Sum squared resid 64.74382 Schwarz criterion 5.626764
Log likelihood -23.52865 F-statistic 189.6023
Durbin-Watson stat 2.700212 Prob(F-statistic) 0.000002



Example:

   n·R² = (10)(0.9896) = 9.896
   χ²(1%, df = 2) = 9.21034

   n·R² > χ²(1%, df = 2) → reject the restricted (linear) model

• We reach a similar conclusion on the basis of Ramsey's RESET test.



Errors of measurement
• We have implicitly assumed that the dependent variable Y and the explanatory variables, the X's, are measured without any errors.
• Variables are assumed to be 'accurate': they are not guess estimates, extrapolated, interpolated, or rounded off in any systematic manner.
• This ideal is not met in practice for a variety of reasons, such as nonresponse errors, reporting errors, and computing errors.
• Errors of measurement are a potentially troublesome problem.



Errors of measurement
• Error of measurement in the dependent variable Y
  – OLS still gives unbiased estimates of the parameters, but the estimated variances are larger than in the case where there are no errors of measurement

  True model:                 Yi* = α + β Xi + ui,                    var(β̂) = σu² / Σxi²
  With measurement error εi:  Yi = Yi* + εi = α + β Xi + (ui + εi),   var(β̂) = (σu² + σε²) / Σxi²



Errors of measurement
• Error of measurement in the explanatory variable X
  – The OLS estimators are not only biased but also inconsistent; they remain biased even if the sample size n increases indefinitely

  Yi = α + β Xi* + ui   and   Xi = Xi* + wi
  Yi = α + β (Xi − wi) + ui = α + β Xi + (ui − β wi)
  Yi = α + β Xi + zi

  cov(zi, Xi) = E[zi − E(zi)][Xi − E(Xi)] = E[(ui − β wi) wi] = −β E(wi²) = −β σw²
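A small simulation, not from the slides, illustrating this inconsistency (attenuation bias): when X is measured with error w, the OLS slope stays biased toward zero even in a very large sample. The true β is assumed to be 2 here, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta = 100_000, 1.0, 2.0        # large n to show the bias does not vanish
x_true = rng.normal(0, 1, n)              # true regressor X*
y = alpha + beta * x_true + rng.normal(0, 1, n)
x_obs = x_true + rng.normal(0, 1, n)      # observed X = X* + w, measurement error w

# OLS slope of y on the mismeasured regressor
slope = np.cov(y, x_obs)[0, 1] / np.var(x_obs)
print(f"estimated slope = {slope:.3f} (true beta = {beta})")
# With var(X*) = var(w) = 1, the slope converges to beta/2, not beta.
```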


Nested vs Non-nested Models
Model A: Yi = β1 + β2 X2i + β3 X3i + β4 X4i + β5 X5i + ui
Model B: Yi = β1 + β2 X2i + β3 X3i + ui
Model C: Yi = α1 + α2 X2i + α3 X3i + ui
Model D: Yi = β1 + β2 Z2i + β3 Z3i + vi
Model E: Yi = γ1 + γ2 ln Z2i + γ3 ln Z3i + wi

Model B is nested in Model A
Models C and D are non-nested
Models D and E are non-nested



Tests of Non-nested Hypotheses
• According to Harvey (1990), there are two approaches to testing non-nested hypotheses: the discrimination approach and the discerning approach
• The discrimination approach
  – Given two or more competing models, the regressand must be the same
  – Choose between the models on the basis of some goodness-of-fit criterion
  – Criteria: R², adjusted R², Akaike's IC, Schwarz's IC, Mallows's Cp criterion, forecast chi-square



Tests of Non-nested Hypotheses
• The discerning approach
  – In investigating one model, we take into account information provided by other models
  Tests of this kind include:
  1. The non-nested F test or encompassing F test
  2. The Davidson–MacKinnon J test
  3. The Cox test
  4. The JA test
  5. The P test
  6. The Mizon–Richard encompassing test



The non-nested F test (1)

Model F: Yi = λ1 + λ2 X2i + λ3 X3i + λ4 Z2i + λ5 Z3i + ui

• Notice that Model F nests or encompasses Models C and D. But note that C is not nested in D and D is not nested in C, so they are non-nested models.
• If Model C is correct, λ4 = λ5 = 0, whereas if Model D is correct, λ2 = λ3 = 0.
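A sketch of these two restriction tests with statsmodels, assuming a pandas DataFrame `df` with columns y, x2, x3, z2, z3 (placeholder names, not from the slides).

```python
import statsmodels.formula.api as smf

# Artificially nested Model F: Y on X2, X3, Z2, Z3
model_f = smf.ols("y ~ x2 + x3 + z2 + z3", data=df).fit()

# Model C is adequate if lambda4 = lambda5 = 0 (the Z's add nothing)
print(model_f.f_test("(z2 = 0), (z3 = 0)"))

# Model D is adequate if lambda2 = lambda3 = 0 (the X's add nothing)
print(model_f.f_test("(x2 = 0), (x3 = 0)"))
```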



The non-nested F test (2)

 There are problems with this testing


procedure
1. If the X’s and the Z’s are highly correlated, it
is quite likely that one or more of the λ's are
individually statistically insignificant,
although on the basis of the F test one can
reject the hypothesis that all the slope
coefficients are simultaneously zero.



The non-nested F test (3)

2. Suppose we choose Model C as the reference hypothesis and find that all its coefficients are significant. Now we add Z2, or Z3, or both to the model and find that their incremental contribution to the ESS is statistically insignificant, so we decide to choose Model C. But had we instead started with Model D as the reference, we might have ended up retaining Model D: the outcome can depend on which model is chosen as the reference.
3. The artificially nested Model F may not have any economic meaning.



Davidson-MacKinnon J Test (1)
• The J test proceeds as follows:
1. Estimate Model D and obtain the estimated Y values, Ŷi^D
2. Add the predicted Y values as an additional regressor to Model C:

   Yi = α1 + α2 X2i + α3 X3i + α4 Ŷi^D + ui

3. Using the t test, test the hypothesis that α4 = 0
4. If the hypothesis that α4 = 0 is not rejected, we can accept Model C as the true model
5. We then reverse the roles of the hypotheses and estimate:

   Yi = β1 + β2 Z2i + β3 Z3i + β4 Ŷi^C + vi



Davidson-MacKinnon J Test (2)
• Since the tests are performed independently, we have the following likely outcomes:

                        Hypothesis: α4 = 0
  Hypothesis: β4 = 0    Do not reject          Reject
  Do not reject         Accept both C and D    Accept D, reject C
  Reject                Accept C, reject D     Reject both C and D

• The J test may not be very powerful in small samples because it tends to reject the true hypothesis.
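A sketch of the J test with statsmodels, again assuming a pandas DataFrame `df` with columns y, x2, x3, z2, z3 (placeholder names).

```python
import statsmodels.formula.api as smf

# Step 1: estimate Model D and obtain its fitted values
yhat_d = smf.ols("y ~ z2 + z3", data=df).fit().fittedvalues
df = df.assign(yhat_d=yhat_d)

# Steps 2-3: add the fitted values to Model C and t-test their coefficient (alpha4 = 0)
res_c = smf.ols("y ~ x2 + x3 + yhat_d", data=df).fit()
print("alpha4: t =", res_c.tvalues["yhat_d"], " p =", res_c.pvalues["yhat_d"])

# Step 5: reverse the roles -- add Model C's fitted values to Model D (beta4 = 0)
yhat_c = smf.ols("y ~ x2 + x3", data=df).fit().fittedvalues
res_d = smf.ols("y ~ z2 + z3 + yhat_c", data=df.assign(yhat_c=yhat_c)).fit()
print("beta4:  t =", res_d.tvalues["yhat_c"], " p =", res_d.pvalues["yhat_c"])
```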



Model selection criteria

• Criteria that have been used to choose among competing models and/or to compare models for forecasting purposes.
• We distinguish between:
  – In-sample forecasting → how the chosen model fits the data in a given sample
  – Out-of-sample forecasting → concerned with determining how a fitted model forecasts future values of the regressand, given the values of the regressors



Model selection criteria
Several criteria are used for this purpose:
1. The R² criterion
2. The adjusted R² criterion
3. The Akaike information criterion (AIC)
4. The Schwarz information criterion (SIC)
5. Mallows's Cp criterion
6. The forecast chi-square

• All these criteria aim at minimizing the residual sum of squares.



The R² criterion

   R² = ESS/TSS = 1 − RSS/TSS

There are problems with R²:
• It measures in-sample goodness of fit. There is no guarantee that it will forecast out-of-sample observations well.
• In comparing two or more R²'s, the dependent variable must be the same.
• An R² cannot fall when more variables are added to the model.



The adjusted R² criterion

   R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)],   R̄² ≤ R²

• The adjusted R² will increase only if the absolute t value of the added variable is greater than 1.
• For comparative purposes, the adjusted R² is a better measure than R².
• The regressand must be the same for the comparison to be valid.



Akaike Information Criterion (AIC)

   AIC = e^(2k/n) · Σûi²/n = e^(2k/n) · RSS/n

   ln AIC = (2k/n) + ln(RSS/n),   where 2k/n is the penalty factor

• The model with the lowest value of AIC is preferred.
• It is useful not only for the in-sample but also for the out-of-sample forecasting performance of a regression model.
• It is useful for both nested and non-nested models.
• It has also been used to determine the lag length in an AR(p) model.



Schwarz Information Criterion (SIC)

   SIC = n^(k/n) · Σûi²/n = n^(k/n) · RSS/n

   ln SIC = (k/n) ln n + ln(RSS/n)

• SIC imposes a harsher penalty than AIC.
• The lower the value of SIC, the better the model.
• SIC can be used to compare the in-sample or out-of-sample forecasting performance of a model.
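A minimal helper, not from the slides, that computes ln AIC and ln SIC from a model's RSS, sample size n, and number of parameters k, following the formulas above. Note these RSS-based values differ from the likelihood-based AIC/SIC printed by packages such as EViews in the earlier output tables, although the ranking of models is the same here.

```python
import numpy as np

def ln_aic(rss, n, k):
    # ln AIC = (2k/n) + ln(RSS/n); 2k/n is the penalty factor
    return 2 * k / n + np.log(rss / n)

def ln_sic(rss, n, k):
    # ln SIC = (k/n) ln n + ln(RSS/n); a harsher penalty than AIC
    return (k / n) * np.log(n) + np.log(rss / n)

# Example with the two models from the RESET illustration (n = 10):
print(ln_aic(6202.533, 10, 2), ln_aic(64.744, 10, 4))   # linear vs cubic: cubic is smaller
print(ln_sic(6202.533, 10, 2), ln_sic(64.744, 10, 4))
```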



Mallows's Cp Criterion (1)
• Suppose we have a model consisting of k regressors, including the intercept, but we choose only p regressors (p ≤ k) and obtain the RSS from that smaller model:

   Cp = RSSp / σ̂² − (n − 2p)

• If the model with p regressors is adequate, in that it does not suffer from lack of fit, it can be shown that E(RSSp) = (n − p)σ². So

   E(Cp) ≈ (n − p)σ²/σ² − (n − 2p) = p

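A sketch of the Cp computation, assuming σ̂² is estimated from the full model with all k regressors; the array and function names are illustrative.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def mallows_cp(y, X_sub, X_full):
    """Cp = RSS_p / sigma_hat^2 - (n - 2p), with sigma_hat^2 taken from the full k-regressor model."""
    n, k = X_full.shape
    p = X_sub.shape[1]
    sigma2 = rss(y, X_full) / (n - k)     # error variance estimate from the full model
    return rss(y, X_sub) / sigma2 - (n - 2 * p)
```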


Mallows's Cp Criterion (2)
• We would look for a model that has a low Cp value, about equal to p.
• In other words, following the principle of parsimony, we will choose a model with p regressors (p < k) that gives a fairly good fit to the data.
• Model A may be preferable to Model B, as it is closer to the Cp = p line than Model B.

[Figure: plot of Cp against p with the Cp = p line; Model A lies closer to the line than Model B]



Forecast Chi-square
• Suppose we have a regression model based on n observations and we want to use it to forecast the mean value of the regressand for an additional t observations:

   Forecast χ² = Σ(i = n+1 to n+t) ûi² / σ̂²

• If we hypothesize that the parameter values have not changed between the sample and post-sample periods, it can be shown that this statistic follows the chi-square distribution with t degrees of freedom.
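A sketch of this statistic, assuming the model is fitted on the first n observations and used to forecast the next t, with NumPy arrays `y` and `X` holding all n + t observations (illustrative names).

```python
import numpy as np
from scipy import stats

def forecast_chi2(y, X, n):
    """Fit on the first n observations, forecast the rest, and form sum(u_hat^2)/sigma_hat^2."""
    t = len(y) - n
    k = X.shape[1]
    beta, *_ = np.linalg.lstsq(X[:n], y[:n], rcond=None)
    sigma2 = np.sum((y[:n] - X[:n] @ beta) ** 2) / (n - k)   # estimation-period error variance
    u_fore = y[n:] - X[n:] @ beta                            # post-sample forecast errors
    stat = np.sum(u_fore ** 2) / sigma2
    return stat, stats.chi2.sf(stat, df=t)                   # compare with chi-square(t)
```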



Additional topics in econometric
modeling

We consider these topics:


1. Outliers, leverage, and influence
2. Recursive least squares
3. Chow’s prediction failure test



Outliers, Leverage, and Influence

• Outlier → an observation with a "large residual"
• Leverage → a data point is said to exert (high) leverage if it is disproportionately distant from the bulk of the values of the regressor(s). It matters because such a point is capable of pulling the regression line toward itself, thus distorting its slope.
• Influence → a data point is influential if its removal from the sample dramatically changes the regression line.
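A short numerical illustration of leverage, not from the slides: with a design matrix X (intercept plus regressor), the diagonal of the hat matrix H = X(X'X)⁻¹X' measures how far each observation's regressor values lie from the bulk of the data.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix; values much larger than the average k/n flag high-leverage points."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

# Example: ten ordinary x values plus one far from the rest
x = np.append(np.linspace(0, 1, 10), 5.0)
X = np.column_stack([np.ones(len(x)), x])
print(leverages(X).round(2))   # the last observation shows a much larger leverage
```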



Outliers, Leverage, and Influence

[Figure: scatter of Y against X with the OLS line for all the data and the OLS line with the outlier (*) omitted. The outlier (*) is near the mean value of X and has low leverage and little influence on the regression coefficients.]



Outliers, Leverage, and Influence

[Figure: scatter of Y against X with the OLS line for all the data and the OLS line with the outlier (*) omitted. The outlier (*) is far away from the mean value of X and has high leverage as well as substantial influence on the regression coefficients.]



Outliers, Leverage, and Influence

[Figure: scatter of Y against X with the OLS line for all the data and the OLS line with the outlier (*) omitted. The outlier (*) has high leverage but low influence on the regression coefficients because it is in line with the rest of the observations.]



Outliers, Leverage, and Influence
According to Draper and Smith (1998):

Automatic rejection of outliers is not always a wise procedure. Sometimes the outlier is providing information that other data points cannot, due to the fact that it arises from an unusual combination of circumstances which may be of vital interest and requires further investigation rather than rejection. As a general rule, outliers should be rejected out of hand only if they can be traced to causes such as errors in recording the observations or in setting up the apparatus. Otherwise, careful investigation is in order.



Recursive Least Squares
• The structural stability of a regression model can be examined with the Chow test. But what happens if we do not know the point of the structural break?
• We can use recursive least squares (RELS).
• The basic idea, for the model Yt = β1 + β2 Xt + ut:
  – Suppose we have data for the period 1970–1995
  – Estimate the model for 1970–1974, obtaining the parameter estimates
  – Estimate it for 1970–1975


Recursive Least Squares
• Estimate it for 1970–1976, and so on, adding one additional observation at a time until the entire sample is used.
• If you plot the estimated values of the parameters against each iteration, you will see how the estimates change.
• If the model under consideration is structurally stable, the changes in the estimated values of the two parameters will be small and essentially random. However, if the estimated parameter values change significantly, this indicates a structural break.
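A sketch of this expanding-window estimation, assuming NumPy arrays `y` and `x` hold the annual data for 1970–1995; statsmodels also provides a RecursiveLS estimator, but the explicit loop below makes the idea concrete.

```python
import numpy as np

def expanding_window_estimates(y, x, min_obs=5):
    """Re-estimate y = b1 + b2*x on samples 1..t for t = min_obs, ..., n and collect the estimates."""
    n = len(y)
    estimates = []
    for t in range(min_obs, n + 1):
        X = np.column_stack([np.ones(t), x[:t]])
        b, *_ = np.linalg.lstsq(X, y[:t], rcond=None)
        estimates.append(b)
    return np.array(estimates)   # plot each column against t to look for structural breaks
```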



Chow's Prediction Failure Test
• We revert to the US savings–income regression for the period 1970–1995.
• We estimate the model for the period 1970–1981 and obtain the parameter estimates.
• Using the actual values of income for the period 1982–1995, we predict the values of savings for 1982–1995.
• If there is no serious structural change in the parameter values, the savings values estimated for 1982–1995 should not be very different from the actual values.



Chow’s Prediction Failure Test
• Whether the difference between the actual and estimated savings values is large or small can be tested by the following F test:

   F = [ (Σû*t² − Σût²) / n2 ] / [ Σût² / (n1 − k) ]

   where
   n1 = number of observations in the first period (1970–1981)
   n2 = number of observations in the second period (1982–1995)
   Σû*t² = RSS when the equation is estimated for all (n1 + n2) observations
   Σût² = RSS when the equation is estimated for the first n1 observations
   k = number of estimated parameters

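A sketch of the computation, assuming `rss_all` is the RSS from the regression over all n1 + n2 observations and `rss_1` the RSS from the first n1 observations (illustrative names, following the formula above).

```python
from scipy import stats

def chow_prediction_failure(rss_all, rss_1, n1, n2, k):
    """F = [(RSS_all - RSS_1)/n2] / [RSS_1/(n1 - k)], compared with the F(n2, n1 - k) distribution."""
    F = ((rss_all - rss_1) / n2) / (rss_1 / (n1 - k))
    return F, stats.f.sf(F, n2, n1 - k)
```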


Ten Commandments of Applied
Econometrics
According to Peter Kennedy (1998):
1. use common sense and economic theory.
2. ask the right questions (i.e. put relevance before
mathematical elegance)
3. know the context (do not perform ignorant
statistical analysis)
4. inspect the data



Ten Commandments of Applied
Econometrics
5. not worship complexity. Use the KISS (keep it
stochastically simple) principle
6. look long and hard at any results
7. beware the costs of data mining
8. be willing to compromise (do not worship textbook
prescriptions)
9. not confuse significance with substance
10. confess in the presence of sensitivity (that is,
anticipate criticism)

