
On Comparing Regression Models in Levels and First Differences
Author(s): A. C. Harvey
Source: International Economic Review, Vol. 21, No. 3 (Oct., 1980), pp. 707-720
Published by: Wiley for the Economics Department of the University of Pennsylvania and the Institute of Social and Economic Research -- Osaka University
Stable URL: http://www.jstor.org/stable/2526363
Accessed: 08/09/2013 06:45


INTERNATIONAL ECONOMIC REVIEW
Vol. 21, No. 3, October, 1980

ON COMPARING REGRESSION MODELS IN LEVELS AND FIRST DIFFERENCES*

BY A. C. HARVEY1

1. INTRODUCTION

It has long been recognized that regressions involving economic variables in levels can be misleading. A high value of R² is often obtained even when there is no underlying causal relationship between the dependent and explanatory variables. This has been stressed again recently by Granger and Newbold [1974]. Spurious correlations are less likely to occur with variables in first differences, and this has led some researchers to adopt first difference formulations automatically.2 However, if the choice is between an equation in levels and a corresponding equation with the same variables in first differences, the relative merits of the two formulations should, in the absence of any a priori guidelines, be assessed on statistical grounds. The situation envisaged is therefore one in which models involving dependent variables in both levels and differences have been fitted, and it is desired to discriminate between them on the basis of goodness of fit.
The proposed criteria are based on the maximized log-likelihood function,
log L. This is a natural measure of goodness of fit; see Sargan [1964]. How-
ever, when competing models contain different numbers of parameters some
adjustment is necessary. The Akaike Information Criterion (AIC) makes an
allowance for the number of parameters estimated, v, by adopting a decision rule
to select the model for which
(1.1)    AIC = -2 log L + 2v
is a minimum; see Akaike [1973].
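The decision rule in (1.1) can be illustrated with a short numerical sketch (Python; the log-likelihood values and parameter counts below are hypothetical, chosen only to show the rule at work):

```python
def aic(log_l, v):
    # (1.1): AIC = -2 log L + 2v, where v is the number of estimated parameters
    return -2.0 * log_l + 2.0 * v

# Hypothetical maximized log-likelihoods for two competing models
aic_a = aic(log_l=-105.3, v=4)   # model A: fewer parameters, slightly worse fit
aic_b = aic(log_l=-104.1, v=6)   # model B: better fit, two extra parameters

# The decision rule selects the model with the smaller AIC
chosen = "A" if aic_a < aic_b else "B"
```

Here the better raw fit of the second model does not compensate for its two extra parameters, so the rule selects the first.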
Section 2 of the paper considers the basic problem of discriminating between
a linear regression model in levels and the corresponding model in first differences.
Several criteria are developed and then applied in the context of a simple
Keynesian model first estimated by Friedman and Meiselman [1963]. Since the
conventional R² can be misleading in these circumstances, an alternative definition is proposed in Section 4. This is defined in terms of first differences, but may readily be obtained from a regression run in levels.

* Manuscript received March 8, 1978; revised July 30, 1979.

1 The first draft of this article was written while I was visiting the Department of Economics, University of British Columbia, and was financed by a U.B.C. research grant. I would like to thank R. W. Farebrother, D. F. Hendry, A. Pagan and T. Wales for their comments and the referee for some helpful suggestions. I would also like to thank David Ryan for carrying out the Monte Carlo experiments of Section 6 with admirable efficiency. However, I am solely responsible for any errors.

2 See, for example, the arguments in Williams [1978].
The results are extended to models with ARMA disturbances in Section 5.
Discriminating between different ARIMA models in univariate time series analysis
may be regarded as a special case of this problem. Ozaki [1977] has recently
proposed a modification of the AIC in this context, but for reasons discussed in
Section 2 his criterion is not the one favored here.
The results of a series of Monte Carlo experiments are discussed in Section 6.
These focus on the problem of discriminating between a levels model with first-
order autoregressive disturbances and a model in first differences. As well as
providing an indication of the effectiveness of the criterion, they show the con-
sequences of an incorrect choice regarding whether the model should be in levels
or first differences.
Although the development in the paper is primarily in terms of static regression,
the same methodology may be applied in assessing goodness of fit in dynamic
models. This is particularly relevant in the case of transfer function models, as
described by Box and Jenkins [1976, Chapter 11]. It is arguably less important,
however, for dynamic models which are cast in the form of stochastic difference
equations. Within this context, the whole question of levels and differences can
be interpreted in a totally different way; see, for example, Hendry and Mizon
[1978].

2. THE SIMPLE LEVELS AND FIRST DIFFERENCE MODELS

The classical linear regression model in levels is

(2.1)    y_t = α + x_t'β + ε_t,    t = 1,..., T,

where y_t is the t-th observation on the dependent variable, x_t is a k × 1 vector of explanatory variables at time t, β is a k × 1 vector of unknown parameters and α is an unknown scalar parameter. The disturbance term, ε_t, is independently and normally distributed with mean zero and variance σ₀², i.e., ε_t ~ NID(0, σ₀²).
Model (2.1) is often contrasted with the classical first difference form,

(2.2)    Δy_t = (Δx_t)'γ + η_t,    t = 2,..., T,

where Δy_t = y_t - y_{t-1}, Δx_t is a k × 1 vector of first differences, γ is a k × 1 vector of unknown parameters and η_t ~ NID(0, σ₁²). Model (2.2) does not include a constant term since this would imply a linear time trend in (2.1).
Even with an allowance made for the extra parameter in (2.1), a direct com-
parison of the maximized likelihood functions of (2.1) and (2.2) would not be
valid, since model (2.2) is a hypothesis about the distribution of T- 1 first differ-
ences. There are two ways of resolving this difficulty. The first is to consider
what (2.2) implies about the distribution of the levels, while the second is to
consider what (2.1) implies about the distribution of first differences.
Re-arranging the first difference model (2.2) yields

This content downloaded from 143.88.66.66 on Sun, 8 Sep 2013 06:45:27 AM


All use subject to JSTOR Terms and Conditions
LEVELS AND FIRST DIFFERENCES 709

(2.3)    y_t = y_{t-1} + (Δx_t)'γ + η_t,    t = 2,..., T.


If y₁ is regarded as fixed, a likelihood function for y₂,..., y_T may be defined and this has exactly the same form as the likelihood function for (2.2). However a direct comparison of the maximized likelihood functions still cannot be made since (2.1) contains an extra parameter, and consists of T observations. One solution, based on the route apparently followed by Ozaki [1977] in the context of ARIMA model selection, is to construct the AIC for each model and then to gross up the AIC for the first difference formulation by a factor of T/(T-1). The two models may then be compared on the basis of the criterion

(2.4)    δ† = (σ̂₀²/σ̂₁²) exp{2(k+2)/T - 2(k+1)/(T-1)},

where σ̂₀² and σ̂₁² are the ML estimators of σ₀² and σ₁² respectively; see Ozaki [1977]. The levels model is accepted if δ† < 1.
An alternative solution is to regard y₁ as being generated from a levels equation of the form (2.1), rather than being fixed. The classical first difference model, (2.2), is then augmented by the equation

(2.5)    y₁ = α + x₁'γ + η₁,

where η₁ ~ NID(0, σ₁²). A likelihood function for all T observations, y₁, y₂,..., y_T, may now be defined from (2.5) and (2.3). However the ML estimator of γ and the residual sum of squares both remain unchanged when y₁ is added to the data set; cf. Brown, Durbin and Evans [1975, p. 153]. Since the augmented first difference model contains the same number of parameters as does the levels model, a comparison may be based directly on the maximized likelihood functions. This is equivalent to comparing SSE₀ and SSE₁, the residual sums of squares in the levels and first difference regressions respectively. Alternatively, s₀² and s₁², the unbiased estimators of σ₀² and σ₁², may be compared since both the levels and first difference regressions involve T - k - 1 degrees of freedom. The proposed criterion is therefore

(2.6)    δ = SSE₀/SSE₁ = s₀²/s₁²,

with the levels formulation selected if δ < 1.


The above criterion clearly has a more attractive form than (2.4). Furthermore
although augmenting the first difference model by (2.5) is to some extent arbitrary,
it is less open to objection than the device of grossing up the AIC to allow for a
differing number of observations.
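The δ criterion of (2.6) requires nothing beyond the two residual sums of squares. A sketch in Python (illustrative only; the data are simulated from a levels model with white noise disturbances, and all variable names are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50

# Simulate a true levels model: y_t = alpha + beta * x_t + eps_t
x = np.cumsum(rng.normal(size=T))          # a trending (random walk) regressor
y = 2.0 + 1.5 * x + rng.normal(size=T)

def sse(y, X):
    # Residual sum of squares from an OLS regression of y on X
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Levels regression (T observations, constant plus regressor): SSE0
X0 = np.column_stack([np.ones(T), x])
sse0 = sse(y, X0)

# First difference regression (T-1 observations, no constant): SSE1
dy, dx = np.diff(y), np.diff(x)
sse1 = sse(dy, dx[:, None])

# (2.6): select the levels formulation if delta < 1.  Differencing white
# noise produces an MA(1) disturbance with twice the variance, so with a
# true levels model the ratio comes out well below one here.
delta = sse0 / sse1
```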
A completely different basis for comparison between (2.1) and (2.2) is to take
first differences in (2.1). The first observation is no longer assumed to be fixed,
and the resulting model is
(2.7)    Δy_t = (Δx_t)'β + ν_t,    t = 2,..., T,

where the disturbance term, ν_t, is generated by a first-order moving average process:


(2.8)    ν_t = ε_t + θε_{t-1},

with θ = -1.
The GLS estimator of β in (2.7) is the ML estimator. If this is denoted by β̃, the residuals are defined by

(2.9)    ν̃_t = Δy_t - (Δx_t)'β̃,    t = 2,..., T.

The maximized log-likelihood function is

(2.10)    log L(Δy₂,..., Δy_T) = -½(T-1)(ln 2π + 1) - ½(T-1) ln{ν̃'Ω⁻¹ν̃/(T-1)} - ½ ln|Ω|,

where ν̃ = (ν̃₂,..., ν̃_T)' and σ²Ω is the covariance matrix of the disturbances in (2.7). The last term in (2.10) may be evaluated very easily since |Ω| = T. Comparing (2.10) with the log-likelihood for (2.2) therefore provides a basis for discriminating between the two models, since (2.2) and (2.7) contain the same number of unknown parameters. However, in the Appendix it is shown that the application of GLS to (2.7) is not necessary, since ν̃'Ω⁻¹ν̃ is identical to SSE₀. This suggests the adoption of the criterion

(2.11)    δ* = (SSE₀/SSE₁) exp{(T-1)⁻¹ ln T},

the levels formulation again being chosen if δ* < 1.
The above three criteria will tend to yield similar results if T is at all large.
However, (2.6) is the most appealing in practice. Furthermore this approach
retains its simplicity when extended to situations where higher order differencing
or seasonal differencing is involved. With quarterly observations, for example,
a levels model might be compared with one in fourth differences. If the T-4
fourth difference equations are augmented by four levels equations with quarterly
dummies, it is once again straightforward to justify the use of s² as a means of
measuring goodness of fit.
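The identity ν̃'Ω⁻¹ν̃ = SSE₀ established in the Appendix, and the fact |Ω| = T used in evaluating (2.10), can be checked numerically. A sketch under simulated data (Python; the notation and the simulated model are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 12
x = rng.normal(size=T)
y = 0.5 + 2.0 * x + rng.normal(size=T)

# SSE0 from the levels regression of y on a constant and x
X0 = np.column_stack([np.ones(T), x])
b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
sse0 = float(np.sum((y - X0 @ b0) ** 2))

# Differenced model (2.7): dy_t = dx_t * beta + v_t, v_t MA(1) with theta = -1.
# Omega (the covariance matrix of v in units of sigma^2) is tridiagonal:
# 2 on the diagonal, -1 on the off-diagonals.
dy, dX = np.diff(y), np.diff(x)[:, None]
Omega = 2.0 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)
Oinv = np.linalg.inv(Omega)

# GLS estimate of beta and the GLS weighted residual sum of squares
beta = np.linalg.solve(dX.T @ Oinv @ dX, dX.T @ Oinv @ dy)
v = dy - (dX @ beta).ravel()
gls_ss = float(v @ Oinv @ v)          # should reproduce SSE0 exactly

# |Omega| = T, as used for the last term of (2.10)
det_omega = float(np.linalg.det(Omega))
```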

3. APPLICATION

The simple Keynesian autonomous expenditure model estimated by Friedman and Meiselman [1963] provides a useful example for comparing levels and differences. The levels formulation is

(3.1)    C_t = α + βA_t + ε_t,    t = 1,..., T,

where C_t is personal consumption expenditures in current dollars and A_t is autonomous expenditure in current dollars. In the debate following the Friedman-Meiselman article, the model was sometimes estimated in levels and sometimes
in differences, but it seems that no attempt was made at statistical discrimination


in a systematic fashion.
The simple autonomous expenditure model appears from the Friedman-
Meiselman article, and from later work, to give a reasonably satisfactory 'ex-
planation' of the behavior of consumption over the period 1929-1939. It is less
impressive in other periods, however, suggesting that some change in specification
may be appropriate. Restricting attention to the 1929-1939 period, therefore,
the results are as follows:
(3.2)    C_t = 58335.9 + 2.498A_t,
                (1169.9)  (0.312)
         R² = 0.8771,  d = 0.89,  SSE = 11943 × 10⁴,

(3.3)    ΔC_t = 1.993ΔA_t,
                 (0.324)
         R² = 0.8096,  d = 1.51,  SSE = 83872 × 10³.

The figures in parentheses are standard errors.
The levels equation has a higher R² than the differenced equation. However, a comparison of R²'s is irrelevant in this context. The residual sum of squares is considerably smaller in (3.3) and this is reflected in values of 1.40, 1.49 and 1.81 for δ, δ† and δ* respectively. The evidence in favor of first differences is further supported by the Durbin-Watson d-statistic being very low in (3.2).
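For the record, the three criteria can be recomputed from the reported sums of squares with T = 11 and k = 1 (Python sketch; small discrepancies from the rounded published values are to be expected, since the paper worked with unrounded quantities):

```python
import math

T, k = 11, 1                      # annual data, 1929-1939; one regressor
sse0 = 11943e4                    # levels SSE reported in (3.2)
sse1 = 83872e3                    # first difference SSE reported in (3.3)

# delta of (2.6): plain ratio of residual sums of squares
delta = sse0 / sse1

# delta-dagger of (2.4): AIC-based, grossed up for T versus T-1 observations
delta_dagger = (sse0 / T) / (sse1 / (T - 1)) * math.exp(2 * (k + 2) / T - 2 * (k + 1) / (T - 1))

# delta-star of (2.11): allows for the MA(1) structure of the differenced levels model
delta_star = (sse0 / sse1) * math.exp(math.log(T) / (T - 1))

# All three exceed one, so each criterion favors the first difference model
favors_differences = delta > 1 and delta_dagger > 1 and delta_star > 1
```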

4. COEFFICIENT OF MULTIPLE CORRELATION FOR FIRST DIFFERENCES

The irrelevance of R² in comparing the goodness of fit of levels and first difference models suggests that some modification may be appropriate. Since R² is so open to misinterpretation for regressions involving time series data in levels, the most useful approach is to define a coefficient of multiple correlation in terms of first differences. If R₁² denotes the coefficient of multiple correlation for the first difference model, the corresponding R² for the levels model may be defined by

    R²_D = 1 - SSE₀/Σ(Δy_t - Δȳ)² = 1 - (SSE₀/SSE₁)(1 - R₁²),

where the summation runs over t = 2,..., T and Δȳ is the arithmetic mean of the Δy_t's. Thus R²_D = R₁² if SSE₀ = SSE₁, while in the example of the previous section

    R²_D = 1 - (119430/83872)(1 - 0.8096) = 0.7289.
The term SSE₀ = ν̃'Ω⁻¹ν̃ may be interpreted in this context as the sum of the squares of the standardized one-step ahead prediction errors in (2.7); cf. Harvey and Phillips [1979].
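Using the second form of the definition, R²_D for the example above follows directly from the published regression summaries (a sketch; the figures are those of (3.2) and (3.3)):

```python
# R2_D for the levels model, from quantities reported in Section 3:
# R2_D = 1 - (SSE0 / SSE1) * (1 - R2_1), where R2_1 is the R2 of the
# first difference regression (both sums of squares in the same units).
sse0 = 119430.0   # x 10^3, levels SSE from (3.2)
sse1 = 83872.0    # x 10^3, first difference SSE from (3.3)
r2_diff = 0.8096  # R2 of the first difference regression (3.3)

r2_d = 1.0 - (sse0 / sse1) * (1.0 - r2_diff)
# Reproduces the 0.7289 quoted in the text (to four decimal places)
```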
Although R²_D cannot exceed unity, it may be negative. This might seem to be an undesirable property, but, in fact, a negative value of R²_D has a rather useful interpretation. Fitting a mean to the first differences is equivalent to assuming a naive model3 which predicts that the change in y_t is equal to the average change in the past. Thus, if R²_D is negative, the levels model is giving a poorer 'explanation' of the changes in y than is the naive model. If y is a variable in logarithms this naive model is based on the average growth rate.
Since a first difference model need not contain a constant term, there is a case
for defining R₁² and R²_D without the adjustment for the mean. However, because
of the strong trends typically encountered in economic time series a naive model
which simply predicts that the current level of y is equal to its value in the previ-
ous time period would not be a particularly stringent yardstick.

5. SERIAL CORRELATION

The δ criterion may be extended to discriminate between levels and difference formulations when the disturbances have been modeled by stationary ARMA processes.
Suppose the disturbances in a levels model of the form (2.1) are denoted by u_t, t = 1,..., T, and are assumed to follow a stationary and invertible ARMA(p₀, q₀) process,

(5.1)    u_t = φ₁u_{t-1} + ... + φ_{p₀}u_{t-p₀} + ε_t + θ₁ε_{t-1} + ... + θ_{q₀}ε_{t-q₀},

in which ε_t ~ NID(0, σ²). Let σ²V₀ be the covariance matrix of (u₁,..., u_T)'. Conditional on φ₀ = (φ₁,..., φ_{p₀})' and θ₀ = (θ₁,..., θ_{q₀})', the residual sum of squares from GLS estimation is

(5.2)    SSE₀ = ũ'V₀⁻¹ũ,

where the t-th element of the T × 1 vector, ũ, is the residual,

(5.3)    ũ_t = y_t - α̃ - x_t'β̃,

and α̃ and β̃ are the GLS estimators of α and β.
The difference model (2.2) may be expanded similarly by replacing η_t by a new disturbance term, w_t, which is assumed to follow an ARMA(p₁, q₁) process and has a covariance matrix σ²V₁. The argument used to derive (2.6) may be applied again to give the criterion,

(5.4)    δ = (SSE₀/SSE₁) exp{[ln(|V₀|/|V₁|) + 2(p₀ + q₀ - p₁ - q₁)]/T},

where SSE₁ = w̃'V₁⁻¹w̃. As before, the levels model is accepted if δ is less than one.
Allowing the levels and first difference models to have ARMA disturbances
suggests the possibility of nesting the hypothesis. Taking first differences in the

3 Such a model is of the form Yt=a+yt-i+st, and is termed a random walk with drift.

This content downloaded from 143.88.66.66 on Sun, 8 Sep 2013 06:45:27 AM


All use subject to JSTOR Terms and Conditions
LEVELS AND FIRST DIFFERENCES 713

levels formulation gives a model of the form (2.7) with ν_t generated by an ARMA(p₀, q₀+1) process. If q₁ > 0, any levels model with q₀ < q₁ and p₀ < p₁ will be nested within the first difference model; cf. Plosser and Schwert [1977]. Unfortunately, the non-invertibility of the ARMA process for Δu_t means that there is no theoretical basis on which to construct a likelihood ratio test of the levels formulation. More fundamentally, nesting a levels model within a first difference model in this way is somewhat arbitrary, since there may be no a priori reason for wishing to have q₁ > q₀ and p₁ > p₀ in the specification of w_t.
If an AR(1) disturbance term is fitted to the Friedman-Meiselman model, (3.1), for 1929-1939 the result is

(5.5)    C_t = 59158 + 2.172A_t,
               (2049)  (0.290)
         φ̂ = 0.64,  d = 1.41,  SSE = 76866 × 10³.

The δ statistic is given by

    δ = (SSE₀/SSE₁) exp{[ln(1 - φ̂²)⁻¹ + 2]/T}

      = (76866/83872) × 1.054 × 1.2216 = 1.18.
This indicates a clear preference for the first difference model, in contrast to a comparison of SSE₀ with SSE₁ or σ̂₀² with σ̂₁². Both of these criteria would indicate a preference for the levels formulation since no account is taken of the extra parameter, φ, in the model. Indeed the comparison between SSE₀ and SSE₁ is reflected in the R²_D of 0.8255.
In a recent critique of the Friedman-Meiselman study, Savin [1978] argued that (3.1) should be estimated with AR(2) disturbances. However, the criteria described here again indicate a preference for a simple first difference model. The value of δ in this case is 1.12.

6. MONTE CARLO RESULTS

A series of Monte Carlo experiments were carried out in order to assess the
effectiveness of the criterion described in the previous section.
Observations on y_t were generated by a levels model of the form

(6.1)    y_t = α + βx_t + u_t,    t = 1,..., T,

in which α = β = 1, and the disturbances followed a first-order autoregressive process,

(6.2)    u_t = φu_{t-1} + ε_t,

with ε_t ~ NID(0, 0.0036). Values of φ equal to 0.5 and 0.9 were used in (6.2).
Observations were also generated for first differences of y_t using the model

(6.3)    Δy_t = γΔx_t + η_t,    t = 2,..., T,

where η_t ~ NID(0, 0.0036) and γ = 1. A series of observations in levels was then constructed by arbitrarily setting y₁ = 1.
Values of the explanatory variable, x_t, were generated artificially, but were kept fixed for repeated realizations of the disturbances. Both trending and stationary data sets were employed. In the trending case, values of the explanatory variable were generated from

(6.4)    x_t = exp(0.04t) + w_t,    t = 1,..., T,

where w_t ~ NID(0, 0.0009), while for stationary data, x_t ~ NID(0, 0.0625). These data formed the basis for both the levels and first difference models. These methods of generating explanatory variables correspond to the methods employed by Beach and MacKinnon [1978] in their study of the properties of estimators for regression models with AR(1) disturbances. However, it should be noted that the actual values taken by α, β, γ and the variances of ε_t and η_t have no effect on the overall pattern of the results.
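The data generation scheme of (6.1)-(6.4) can be sketched as follows (Python; a single trending-data realization with T = 50, and u₁ initialized at zero for simplicity, since the paper does not state the initialization used):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 50

# Trending explanatory variable, (6.4): x_t = exp(0.04 t) + w_t
t = np.arange(1, T + 1)
x = np.exp(0.04 * t) + rng.normal(scale=np.sqrt(0.0009), size=T)

def gen_levels(phi):
    # Levels model (6.1)-(6.2): y_t = 1 + x_t + u_t, u_t = phi*u_{t-1} + eps_t
    eps = rng.normal(scale=np.sqrt(0.0036), size=T)
    u = np.zeros(T)                      # u_1 set to zero for simplicity
    for s in range(1, T):
        u[s] = phi * u[s - 1] + eps[s]
    return 1.0 + x + u

def gen_differences():
    # First difference model (6.3): dy_t = dx_t + eta_t, with y_1 = 1
    eta = rng.normal(scale=np.sqrt(0.0036), size=T - 1)
    dy = np.diff(x) + eta
    return np.concatenate([[1.0], 1.0 + np.cumsum(dy)])

y_levels = gen_levels(phi=0.5)
y_diff = gen_differences()
```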
Overall, therefore, experiments were carried out for six different models: levels with AR(1) disturbances and φ = 0.5, levels with AR(1) disturbances and φ = 0.9, and first differences, for both trending and stationary data. Samples of sizes 20, 50 and 100 were employed for each model. In all cases three estimation procedures were carried out. These were as follows:
(a) Full maximum likelihood (ML) estimation of α, β and φ on the assumption that the model is of the form (6.1), (6.2); cf. Beach and MacKinnon [1978].
(b) OLS applied to the first difference model.
(c) Full ML estimation of the first difference model on the assumption that the disturbances, w_t, are generated by an AR(1) process.
Method (a) is appropriate when the true model is in levels with AR(1) disturb-
ances, while (b) is the best method for (6.3). Method (c) yields asymptotically
efficient estimators of the parameters in (6.3), but represents a case of 'overfitting'
since the AR(1) specification is unnecessary.
Procedures (a) and (c) were carried out by concentrating the log-likelihood
function and performing a grid search with respect to the autoregressive parameter.
The search was accurate to two places of decimals, the largest permitted value of
φ being 0.99.
The results of the simulations are set out in Tables 1 and 2. These are based
on two hundred independent replications for each model. Table 1 shows the
estimated probabilities of choosing a first difference formulation, the criteria δ₁ and δ₂ being based on a comparison of (a) and (b), and (a) and (c) respectively.
The main conclusions from Table 1 are as follows:
(i) For all three criteria, the probability of choosing the incorrect model
tends to fall as T increases.
(ii) The estimated probabilities of choosing the first difference formulation


TABLE 1
ESTIMATED PROBABILITIES OF CHOOSING A MODEL IN FIRST DIFFERENCES

                          Stationary Data    Trending Data
True Model          T      δ₁      δ₂         δ₁      δ₂

Levels             20     .09     .09        .05     .04
with φ = 0.5       50     .00     .00        .00     .01
                  100     .00     .00        .00     .00

Levels             20     .68     .53        .43     .21
with φ = 0.9       50     .54     .36        .32     .20
                  100     .27     .16        .13     .08

First              20     .79     .71        .43     .27
Differences        50     .86     .74        .53     .34
                  100     .88     .79        .73     .60

TABLE 2
SAMPLING PROPERTIES OF LEVELS AND FIRST DIFFERENCE ESTIMATORS

                           Stationary Data                     Trending Data
                   MSE for β (× 10³)    Mean φ̂      MSE for β (× 10³)     Mean φ̂
True Model     T    (a)    (b)    (c)   from (a)     (a)     (b)     (c)  from (a)

Levels with   20   3.413  3.245  3.343    .365      3.894  18.289  13.080   .300
φ = 0.5       50    .624   .648   .695    .445       .084    .689    .440   .437
             100    .452   .471   .492    .466      .0006   .0068   .0057   .467

Levels with   20   2.638  2.229  2.546    .698     23.041  19.062  28.853   .564
φ = 0.9       50    .643   .636   .670    .829       .752   1.097   1.104   .766
             100    .263   .262   .265    .869       .009    .018    .018   .845

First         20   1.230  1.217  1.288    .786     45.164  25.240  31.192   .535
Differences   50    .602   .603   .634    .839      3.572   2.761   2.870   .818
             100    .309   .310   .306    .955       .078    .061    .061   .925

are uniformly lower for the trending data sets.
(iii) The probability of choosing first differences falls when δ₁ is replaced by δ₂. This reflects the addition of the (unnecessary) extra parameter to the differences model.
(iv) For the levels model with φ = 0.5, the chances of choosing a first differences formulation are very small. However, when φ = 0.9 and the sample size is small, the results suggest that the criteria may be biased in the sense that they show a probability of accepting the incorrect model which is greater than one-half.
(v) When the true model is in first differences, the estimated probability of choosing this model is relatively high for stationary data. However, for trending data both δ₁ and δ₂ are biased when T = 20.
The last two findings reported above prompt questions concerning the per-
formance of a particular estimation procedure when an incorrect decision has
been made with respect to whether the model is in levels or first differences. In


order to answer these questions, various sampling statistics associated with the procedures are reported in Table 2. The table shows the mean squared error (MSE) of each of the estimators of β taken over all replications. Since the estimators are unbiased, the mean squared errors may be regarded as estimates of variance, and hence can be used as a guide to small sample efficiency. The sample means for the estimates of φ produced by (a) are also given.
The following points emerge from Table 2:
(vi) When the true model is in levels, and the x_t's are stationary, there is very little to choose between the three methods of estimating β, although for T = 20 the sample MSE of the first difference estimator (b) is actually smaller than that of the levels estimators for both φ = 0.5 and φ = 0.9. The relative efficiency of applying OLS to first differences in such cases may be explained by noting that the disturbance term in the first difference formulation is an ARMA(1, 1) process,

(6.5)    ν_t = φν_{t-1} + ε_t - ε_{t-1}.

The first order autocorrelation for this process is

    ρ₁ = -(1 - φ)/2,    |φ| < 1,

and so with φ = 0.5, ρ₁ = -0.25, while for φ = 0.9, ρ₁ = -0.05. Since higher-order autocorrelations are given by ρ_τ = ρ₁φ^{τ-1}, τ = 2, 3, 4,..., the serial correlation in ν_t is relatively weak for moderate or high values of φ.
Comparing these figures with the corresponding results in Table 1 suggests that when there is a high probability of δ₁ indicating a first difference model when the true model is in levels, the consequences are unlikely to be severe. In fact for a small sample size and a high value of φ, it seems that a straightforward application of OLS to first differences may be the best estimation procedure. Furthermore the structure of the disturbance process (6.5) means that for a given β, any increase in the MSE's of the prediction errors arising from the use of (b) will be small.
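The autocorrelations quoted in (vi) follow directly from the ARMA(1, 1) structure of (6.5); a quick check (Python):

```python
def rho(phi, tau):
    # Autocorrelations of (6.5): v_t = phi*v_{t-1} + eps_t - eps_{t-1}.
    # rho_1 = -(1 - phi)/2, and rho_tau = rho_1 * phi**(tau - 1) for tau >= 2.
    r1 = -(1.0 - phi) / 2.0
    return r1 if tau == 1 else r1 * phi ** (tau - 1)

rho_half = rho(0.5, 1)   # -0.25, as stated in the text
rho_nine = rho(0.9, 1)   # -0.05
```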
For trending data the performance of the first difference estimator, (b), is less impressive. When φ = 0.9 and T = 20, (b) still has the smallest MSE for β, but as T increases it becomes relatively inefficient. For φ = 0.5 the MSE produced by (b) is considerably greater than the corresponding MSE for (a) for all sample sizes. These findings are not entirely unexpected since for a simple linear time trend in x_t, i.e., x_t = t, it is known that the variance of the GLS estimator of β is of O(T⁻³), whereas OLS applied to first differences yields a variance of O(T⁻²); see Hannan [1960, pp. 114-115].
The implications for the model selection procedure are therefore similar to
those for the stationary data. When the δ criterion has a high probability of selecting a first difference model, OLS applied to first differences will yield a reasonably efficient estimator of β.
(vii) When the true model is in first differences, (a) estimates β surprisingly well for the stationary data sets. However, for trending data, the gain from using (b) is much more pronounced.
The overall conclusion from these results is that δ provides a relatively effective criterion for discriminating between levels and first difference models. In cases where it is difficult to discriminate - e.g., levels with φ = 0.9 - the costs of making a wrong decision are unlikely to be high. The least satisfactory performance of the criterion is for trending data and a first difference model. In small samples the chances of choosing a first difference formulation are not as high as one might wish given the losses incurred in assuming a levels model to be appropriate. Particular care should therefore be exercised for trending data
when the decision to choose a levels model is a marginal one. Finally, the results
in Table 2 provide some interesting supplementary information on estimator (c).
(viii) When the true model is (6.3), the performance of (c) shows the effect of
overparameterization. Although estimating the model under the
assumption that the disturbances are generated by an AR(1) process
does not affect asymptotic efficiency, the loss of efficiency in small
samples may be considerable. The case of trending data with T= 20
provides the most extreme example.
(ix) For a levels model with trending data and φ = 0.5 the MSE's for (c) are considerably higher than those produced by (a). Since a levels model is equivalent to a first difference model with ARMA(1, 1) disturbances, this is an indication that the assumption of an AR(1) disturbance may be totally inadequate in these circumstances. The figures for φ = 0, which are not shown in the table, illustrate the point even more dramatically. For T = 20, for example, the MSE of (c) was over six times the size of the MSE of (a) for trending data.4 These findings may have important practical implications in applied econometric work, where the assumption of AR(1) disturbances is widely employed, even in first difference models.

7. CONCLUSION

This paper gives a formal justification for the use of s², the unbiased estimator of the disturbance variance, as the basis for assessing the goodness of fit of regression models irrespective of whether the dependent variable is in levels or first differences. A coefficient of multiple correlation in terms of first differences is defined for a levels regression model. This would seem to be more useful than the conventional R² for a time series regression.
When the disturbances in a model have been modeled by an ARMA process an allowance is made for the number of parameters estimated by adopting a measure of goodness of fit based on the Akaike Information Criterion; see (5.4). Similar considerations apply in assessing the goodness of fit of dynamic transfer function models.

4 However, as with φ = 0.5 and φ = 0.9, method (c) performed much better for stationary data. For T = 20 and φ = 0, it gave a MSE only 14% greater than that of (a). Nevertheless it is the trending case which is most relevant in this context, since it is in such cases that differencing tends to be carried out automatically.
There is unlikely to be any difficulty in discriminating between levels and first
difference models when both these alternatives can be assumed to have white
noise disturbances. However, the problems arising in practice are likely to
involve models with ARMA disturbances. Discriminating between a first differ-
ence model and a levels model with AR(1) disturbances is a particularly pertinent
case in applied economics. This is the situation examined in the Monte Carlo
experiments although it must be stressed that these were very limited in scope
and furthermore, each experiment involved only two hundred replications. The
conclusion presented in the next paragraph should therefore be treated with some
caution.
For the stationary data set, the criterion gives a very high probability of selecting
the levels model when the serial correlation is moderate (φ = 0.5) and also a high
probability of selecting the first difference model when it is the true state of nature.
For φ = 0.9, the probability of choosing a first difference model is relatively high for a small sample size, but because the first difference model is more parsimonious it actually yields a better estimator of β than the correctly specified model.
For the more relevant case of trending data the estimated probabilities of selecting
the first difference model are uniformly lower in all cases. This is reasonable
insofar as the losses in efficiency in assuming a first difference model when a
levels model is appropriate are relatively higher in this case. However, in all
cases the losses incurred in making a wrong decision are far more serious than
for stationary data. When the true model is actually in first differences the
probability of choosing such a formulation is relatively low for trending data.
Furthermore, the results show that, on average, the estimates of φ will not be close to unity in small samples and so the fact that the true model is in first differences would not necessarily be apparent from the fitted equation. The conclusion
from this is that in small samples it may not always be easy to discriminate
between a levels model with AR disturbances and a first difference model. How-
ever, attempting to discriminate between models on statistical grounds is clearly
preferable to taking first differences automatically, since the loss in precision in
doing so may be considerable.
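As a stylized sketch of the kind of experiment discussed above (not a reproduction of the paper's criterion (5.4) or its experimental design), the following Python fragment simulates a levels model with AR(1) disturbances, fits both a first difference model and a levels model with a crude two-step quasi-differencing correction, and chooses between them with a Gaussian AIC. The function names, the random-walk regressor, and the simple residual-based estimator of ρ are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(T, rho, beta=1.0, alpha=0.0):
    """Levels model y_t = alpha + beta*x_t + u_t with AR(1) disturbances u_t."""
    x = np.cumsum(rng.standard_normal(T))      # slowly evolving regressor
    u = np.zeros(T)
    for t in range(1, T):
        u[t] = rho * u[t - 1] + rng.standard_normal()
    return x, alpha + beta * x + u

def aic(rss, n, p):
    # Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*(parameters)
    return n * np.log(rss / n) + 2 * p

def select_model(x, y):
    T = len(y)
    # First difference model: OLS of dy on dx, no constant (rho = 1 imposed).
    dy, dx = np.diff(y), np.diff(x)
    b = (dx @ dy) / (dx @ dx)
    aic_fd = aic(np.sum((dy - b * dx) ** 2), T - 1, p=2)

    # Levels model with AR(1) disturbances: crude two-step quasi-differencing.
    X = np.column_stack([np.ones(T), x])
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    rho = np.clip((e[:-1] @ e[1:]) / (e[:-1] @ e[:-1]), -0.99, 0.99)
    ys, Xs = y[1:] - rho * y[:-1], X[1:] - rho * X[:-1]
    coef = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    aic_lv = aic(np.sum((ys - Xs @ coef) ** 2), T - 1, p=4)
    return "levels" if aic_lv < aic_fd else "diff"

picks = [select_model(*simulate(T=50, rho=0.5)) for _ in range(200)]
print("levels model chosen in", picks.count("levels"), "of", len(picks), "runs")
```

With ρ well below unity the levels model should be selected in most replications; simulating with ρ near one shifts the balance toward the first difference model, which is the trade-off the Monte Carlo results above describe.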
London School of Economics, England

APPENDIX

Consider the levels model, (2.1), and pre-multiply y, the T×1 vector of observations
on the dependent variable, and X, the T×(k+1) matrix of independent
variables, by the T×T matrix


        [ -1    1    0   ...    0 ]
        [  0   -1    1   ...    0 ]
    H = [  .         .     .    . ]
        [  0   ...   0   -1    1 ]
        [  0   ...   0    0    1 ]
This gives the T-1 equations (2.7), together with (2.1) for t = T, i.e.,

(A.1)        Δy_t = (Δx_t)'β + v_t,        t = 2, ..., T,

             y_T = α + x_T'β + ε_T.

Note that H is an upper triangular matrix with non-zero elements on the leading
diagonal, and so it is non-singular.
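As a quick numerical check of this construction (a NumPy sketch with illustrative variable names), one can build H for a small T and confirm that pre-multiplication by H yields the first differences together with the final level, and that H is non-singular:

```python
import numpy as np

T = 5
# The T x T matrix H: rows 1..T-1 form first differences, row T keeps y_T.
H = np.zeros((T, T))
for i in range(T - 1):
    H[i, i], H[i, i + 1] = -1.0, 1.0
H[-1, -1] = 1.0

y = np.array([2.0, 5.0, 4.0, 7.0, 11.0])
Hy = H @ y
assert np.allclose(Hy[:-1], np.diff(y))   # rows 1..T-1 give y_t - y_{t-1}
assert Hy[-1] == y[-1]                    # row T gives the level y_T
# Upper triangular with non-zero diagonal entries, so |det H| = 1.
assert np.isclose(abs(np.linalg.det(H)), 1.0)
```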
The covariance matrix of the disturbances v_2, ..., v_T, ε_T in (A.1) is σ²HH'.
Since HH' is positive definite, there must be a lower triangular matrix, J, such
that J'J = (HH')⁻¹. Pre-multiplying the dependent and independent variables in
(A.1) by J leads to a new set of transformed equations in which the disturbances
have a scalar covariance matrix, i.e., σ²I. Thus OLS applied to this model will
yield exactly the same estimator of β and exactly the same residual sum of squares,
SS_T, as OLS applied to the original model (2.1). Furthermore, in view of the
lower triangular nature of J, the new transformed system of equations will have
a similar form to (A.1), in that a constant term appears in the last equation only.
An estimator of all the parameters apart from the constant term may be constructed
from the first T-1 observations. When the observations corresponding
to the final equation are added, however, the residual sum of squares remains
unchanged, since the extra parameter must be accommodated; cf. Brown, Durbin
and Evans [1975, p. 153]. Thus SS_T = SS_{T-1}.
Suppose now that the observations in the first T-1 equations of (A.1) are pre-multiplied
by J₁, the (T-1)×(T-1) sub-matrix of J obtained by deleting the
T-th row and T-th column of J. In view of the lower triangularity of J, this
yields the first T-1 equations of the system obtained by pre-multiplying the full
set of equations in (A.1) by J. The matrix Q defined in (2.10) is equal to the
sub-matrix of HH' given by deleting the last row and column in HH'. A little
algebraic manipulation shows that J₁⁻¹(J₁⁻¹)' = Q and so J₁'J₁ = Q⁻¹. The residual
sum of squares, SS_{T-1}, is therefore equal to v'Q⁻¹v, and by the argument of the
previous paragraph this is also equal to the residual sum of squares in the untransformed
levels model (2.1).
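The algebra above can be verified numerically. The NumPy sketch below (variable names are illustrative, and the reverse Cholesky construction of J is one convenient choice rather than anything prescribed in the text) builds a lower triangular J with J'J = (HH')⁻¹, checks the claim J₁'J₁ = Q⁻¹, and confirms that the weighted first-difference residual sum of squares v'Q⁻¹v equals the OLS residual sum of squares from the levels regression:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 8, 2

# Differencing matrix H: first differences in rows 1..T-1, the level in row T.
H = np.zeros((T, T))
for i in range(T - 1):
    H[i, i], H[i, i + 1] = -1.0, 1.0
H[-1, -1] = 1.0
HHt = H @ H.T
Q = HHt[:-1, :-1]                  # Q of (2.10): HH' less its last row/column

# Lower triangular J with J'J = (HH')^{-1}, via a "reverse" Cholesky:
# permute with the order-reversing matrix P, factor, and permute back.
P = np.eye(T)[::-1]
A = np.linalg.inv(HHt)
J = P @ np.linalg.cholesky(P @ A @ P).T @ P
assert np.allclose(J, np.tril(J)) and np.allclose(J.T @ J, A)

# The claim J1' J1 = Q^{-1}, with J1 = J less its last row and column.
J1 = J[:-1, :-1]
assert np.allclose(J1.T @ J1, np.linalg.inv(Q))

# Residual sums of squares: OLS in levels (with constant) versus GLS on the
# T-1 first-difference equations weighted by Q^{-1}.
x = rng.standard_normal((T, k))
y = rng.standard_normal(T)
Xc = np.column_stack([np.ones(T), x])
b_lv = np.linalg.lstsq(Xc, y, rcond=None)[0]
rss_levels = np.sum((y - Xc @ b_lv) ** 2)

dy, dX = np.diff(y), np.diff(x, axis=0)   # the constant differences away
Qinv = np.linalg.inv(Q)
b_fd = np.linalg.solve(dX.T @ Qinv @ dX, dX.T @ Qinv @ dy)
v = dy - dX @ b_fd
rss_diff = v @ Qinv @ v
assert np.isclose(rss_levels, rss_diff)   # SS_T = SS_{T-1}
assert np.allclose(b_lv[1:], b_fd)        # the slope estimates coincide
```

The last two assertions are exactly the equalities derived above: the residual sum of squares is unchanged when the final equation (with its own constant) is added, and the slope estimates from the two formulations coincide.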


REFERENCES

AKAIKE, H., "Information Theory and an Extension of the Maximum Likelihood Principle,"
in B. N. Petrov and F. Csaki, eds., 2nd International Symposium on Information Theory
(Budapest: Akademiai Kiado, 1973), 267-281.
BEACH, C. M. AND J. G. MACKINNON, "A Maximum Likelihood Procedure for Regression with
Autocorrelated Errors," Econometrica, 46 (January, 1978), 51-58.
BOX, G. E. P. AND G. M. JENKINS, Time Series Analysis: Forecasting and Control, revised edition
(San Francisco: Holden-Day Inc., 1976).
BROWN, R. L., J. DURBIN AND J. M. EVANS, "Techniques for Testing the Constancy of
Regression Relationships over Time, with Discussion," Journal of the Royal Statistical
Society, Series B, 37 (January, 1975), 149-193.
FRIEDMAN, M. AND D. MEISELMAN, "The Relative Stability of Monetary Velocity and the
Investment Multiplier in the United States, 1897-1958," in Stabilization Policies, Commission
on Money and Credit (Englewood Cliffs: Prentice-Hall, 1963).
GRANGER, C. W. J. AND P. NEWBOLD, "Spurious Regressions in Econometrics," Journal of
Econometrics, 2 (July, 1974), 111-120.
HANNAN, E. J., Time Series Analysis (London: Methuen, 1960).
HARVEY, A. C. AND G. D. A. PHILLIPS, "The Estimation of Regression Models with Autoregressive-Moving
Average Disturbances," Biometrika, 66 (April, 1979), 49-58.
HENDRY, D. F. AND G. MIZON, "Serial Correlation as a Convenient Simplification, not a
Nuisance: A Comment on a Study of the Demand for Money by the Bank of England,"
Economic Journal, 88 (September, 1978), 549-563.
OZAKI, T., "On the Order Determination of ARMA Models," Applied Statistics, 26 (no. 3,
1977), 290-301.
PLOSSER, C. I. AND G. W. SCHWERT, "Estimation of a Non-Invertible Moving Average Process:
The Case of Overdifferencing," Journal of Econometrics, 6 (September, 1977), 199-224.
SARGAN, J. D., "Wages and Prices in the United Kingdom: A Study in Econometric Methodology,"
in P. E. Hart, G. Mills and J. K. Whitaker, eds., Econometric Analysis for
National Economic Planning (London: Butterworths Scientific Publications, 1964).
SAVIN, N. E., "Friedman-Meiselman Revisited: A Study in Autocorrelation," Economic Inquiry,
16 (January, 1978), 37-52.
WILLIAMS, D., "Estimating in Levels or First Differences: A Defence of the Method Used for
Certain Demand-for-Money Equations," Economic Journal, 88 (September, 1978), 564-568.
