Professional Documents
Culture Documents
To cite this article: Benjamin Born & Jörg Breitung (2016) Testing for Serial Correlation
in Fixed-Effects Panel Data Models, Econometric Reviews, 35:7, 1290-1316, DOI:
10.1080/07474938.2014.976524
Download by: [KU Leuven University Library] Date: 18 July 2016, At: 04:57
Econometric Reviews, 35(7):1290–1316, 2016
Copyright © Taylor & Francis Group, LLC
ISSN: 0747-4938 print/1532-4168 online
DOI: 10.1080/07474938.2014.976524
In this article, we propose various tests for serial correlation in fixed-effects panel data
regression models with a small number of time periods. First, a simplified version of the
test suggested by Wooldridge (2002) and Drukker (2003) is considered. The second test is
based on the Lagrange Multiplier (LM) statistic suggested by Baltagi and Li (1995), and
the third test is a modification of the classical Durbin–Watson statistic. Under the null
hypothesis of no serial correlation, all tests possess a standard normal limiting distribution
as N tends to infinity and T is fixed. Analyzing the local power of the tests, we find that
the LM statistic has superior power properties. Furthermore, a generalization to test for
autocorrelation up to some given lag order and a test statistic that is robust against time
dependent heteroskedasticity are proposed.
1. INTRODUCTION
Panel data models are increasingly popular in applied work as they have many advantages
over cross-sectional approaches (see, e.g., Hsiao, 2003, p. 3ff). The classical linear panel
data model assumes serially uncorrelated disturbances. As argued by Baltagi (2008, p. 92),
this assumption is likely to be violated as the dynamic effect of shocks to the dependent
variable is often distributed over several time periods. In such cases, serial correlation
leads to inefficient estimates and biased standard errors. Nowadays, (cluster) robust
standard errors are readily available in econometric software like Stata, SAS, or EViews
that allow for valid inference under heteroskedasticity and autocorrelation. Therefore,
many practitioners consider a possible autocorrelation as being irrelevant, in particular, in
large data sets where efficiency gains are less important. On the other hand, a significant
Address correspondence to Jörg Breitung, Center of Econometrics and Statistics, University of Cologne,
Meister-Ekkehart-Str. 9, 50923 Cologne, Germany; E-mail: breitung@statistik.uni-koeln.de
TESTING FOR SERIAL CORRELATION 1291
test statistic provides evidence that a static model may not be appropriate and should be
replaced by a dynamic panel data model.
A number of tests for serial error correlation in panel data models have been proposed
in the literature. Bhargava et al. (1982) generalized the Durbin–Watson statistic to
the fixed-effects panel model. Baltagi and Li (1991, 1995) and Baltagi and Wu (1999)
derived Lagrange Multiplier (LM) statistics for first order serial correlation. Drukker
(2003), elaborating on an idea originally proposed by Wooldridge (2002), developed an
easily implementable test for serial correlation based on the ordinary least-squares (OLS)
residuals of the first-differenced model. This test is referred to as the Wooldridge–Drucker
(henceforth: WD) test. Inoue and Solon (2006) suggested a portmanteau statistic to test
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
2. PRELIMINARIES
Consider the usual fixed effects panel data model with serially correlated disturbances
unknown vector of associated coefficients, i is the (fixed) individual effect, and uit denotes
the regression error. In our benchmark situation, the following assumption is imposed.
Assumption 1.
(i) The error it is independently distributed across i and t with E(it ) = 0, E(2it ) =
i2 with c−1 < i2 < c for all i = 1, , N and some finite positive constant c.
4+
Furthermore, E it < ∞ for some > 0.
(ii) The vector of coefficients is estimated by a consistent estimator ˆ with ˆ − =
Op (N −1/2 ).
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
1
T N
E(xit xit ) → C,
N t=1 i=1
where C = tr(C C) < ∞.
All tests are based on the residual êit = yit − xit ˆ = i + uit − xit (ˆ − ). Assumption 1
(ii) allows us to replace the residuals êit by eit = i + uit in our asymptotic considerations.
If xit is assumed to be strictly exogenous, the usual within-group (or least-squares dummy-
variable, cf. Hsiao, 2003) estimator or the first difference estimator (cf. Wooldridge, 2002)
may be used. If xit is predetermined or endogenous, suitable instrumental variable (or
Generalized Method of Moments (GMM)) estimators of satisfy Assumption 1 (ii)
(e.g., Wooldridge, 2002). It is important to notice that the efficiency of the estimator ˆ
does not affect the asymptotic properties (i.e., size or local power) of tests based on êit .
To test the null hypothesis = 0, Bhargava et al. (1982) proposed the pooled
Durbin–Watson statistic given by
N T
t=2 (ûit − ûi,t−1 )
2
pDW = i=1
N T 2 ,
i=1 t=1 ûit
where ûit = yit − xit ˆ − ˆ i , and ˆ and ˆ i denote the least-squares dummy-variable (or
within-group) estimator of and i , respectively. A serious problem of this test is that
its null distribution depends on N and T and, therefore, the critical values are provided
in large tables depending on both dimensions in Bhargava et al. (1982). Furthermore, no
critical values are available for unbalanced panels.
Baltagi and Li (1995) derive the LM test statistic for the hypothesis = 0 assuming
normally distributed errors. The resulting test statistic is equivalent to (the LM version
of) the t-statistic of
in the regression
ûit =
ûi,t−1 + it (3)
TESTING FOR SERIAL CORRELATION 1293
Baltagi and Li (1995) show that if N → ∞ and T → ∞, the LM statistic has a standard
normal limiting distribution. However, if T is fixed and N → ∞, the application of the
respective critical values leads to severe size distortions (see Section 3.2).
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
In this section, we consider test procedures that are valid for fixed T and N → ∞. All
statistics have the general form
N
ẑTi
NT = i=1
2 , (4)
N 2
i=1 ẑTi − 1
N
N
i=1 ẑTi
ˆ and AT is a deterministic T × T
where ẑTi = êi AT êi , êi = [êi1 , , êiT ] , êit = yit − xit ,
matrix. The matrix AT is constructed such that it eliminates the individual effects from êi .
Specifically, we make the following assumption with respect to the matrix AT .
Assumption 2.
(i) Let
T denote the T × 1 vector of ones. The T × T matrix AT obeys AT
T = 0, AT
T =
0, and tr(AT ) = 0.
(ii) Let Xi = [xi1 , , xiT ] and ui = [ui1 , , uiT ] . For all i = 1, , N , it holds that
E[Xi (AT + AT )ui ] = 0.
In Lemma A.1 of the appendix, it is shown that under Assumptions 1 and 2 the
estimation error ˆ − is asymptotically negligible. Note that, in general, Assumption 2
(ii) is violated if xit is predetermined or endogenous.
The following lemma presents the limiting distribution of the test statistic under a
sequence of local alternatives.
Lemma 1.
(i) Under Assumptions 1 and 2, = 0, fixed T , and N → ∞, the test statistic NT has a
standard normal limiting distribution.
1294 B. BORN AND J. BREITUNG
√
(ii) Under the local alternative N = c/ N , it ∼ (0, i2 ), and Assumptions 1 and 2 the
limiting distribution is given by
⎛
⎞
N
c i=1 tr(A T H T )
−→ ⎝ , 1⎠ ,
d
NT
tr(A2T + AT AT )
√
where = m2 / m4 with mk = limN →∞ N −1 Ni=1 ik and HT is a T × T matrix with
(t, s)-element HT ,ts = 1 for |t − s| = 1 and zeros elsewhere.
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
Hence, to accommodate unbalanced panel data, the tests are computed with individual
specific transformation matrices ATi instead of a joint matrix AT .
To obtain a valid test statistic for fixed T , Wooldridge (2002, p. 282f) suggests to run
a least squares regression of the differenced residuals êit = êit − êi,t−1 on the lagged
differences êi,t−1 .2 Under the null hypothesis = 0, the first order autocorrelation of
êit converges in probability to −05. Since êit is serially correlated, Drukker (2003)
suggests to employ heteroskedasticity and autocorrelation consistent (HAC) standard
errors,3 yielding the test statistic
ˆ + 05
WD =
ŝ
1
As pointed out by Baltagi and Wu (1999), the case of gaps within the sequence of observations is more
complicated.
2
Since this test employs the least-squares estimator in first differences, this test assumes E(xit uit ) = 0
or E(xit uis ) = 0 for all t and s ∈ t − 1, t, t + 1.
3
This approach is also known as “robust cluster” or “panel corrected” standard errors.
TESTING FOR SERIAL CORRELATION 1295
N
T 2
i=1 t=3 ê ˆ
i,t−1 it
ŝ2 =
2 ,
N T 2
i=1 t=3 ê i,t−1
Theorem 1. Let ẑTi = Tt=3 (êit − 21 êi,t−1 − 21 êi,t−2 )(êi,t−1 − êi,t−2 ), and denote by W D the
corresponding test statistic constructed as in (4).
√
(i) Under Assumptions 1 and 2, uit ∼ (0, i2 ), T ≥ 3, and N = c/ N , it follows that
(T − 2)
d
WD −→ c √ ,1 ,
2(T − 3) + 3
The WD statistic deviates from WD by computing the HAC variance based on the
residuals obtained under the null hypothesis, i.e., 0it = êit + 05êi,t−1 instead of the
residuals ˆ it used to compute the original WD statistic.
is a test against
where rj is the jth autocorrelation of uit . This suggests that WD (and WD)
the difference between the first and second order autocorrelation of the errors. Therefore,
the test is expected to have poor power against alternatives with r1 ≈ r2 .
An important feature of the LM test suggested by Baltagi and Li (1995) is that the limit
distribution depends on T . This is due to the fact that the least-squares estimator of
in
regression (3) is biased and the errors it in (3) are autocorrelated. Under fairly restrictive
assumptions the following asymptotic null distribution is obtained.
iid
Lemma 2. Under Assumption 1 and uit ∼ (0, 2 ) for all i and t, it holds for fixed T
and N → ∞ that
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
N d (T + 1)(T − 2)2
LM + −→ 0,
T −1 (T − 1)3
It follows that applying the usual critical values derived from the 12 distribution to
the (two-sided) test statistic LM2 yields a test with actual size tending to unity as N →
∞. Note also that the limit distribution of the LM statistic tends to a standard normal
distribution if T → ∞ and N /T → 0. To obtain an asymptotically valid test for fixed T ,
the transformed statistic
(T − 1)3 N
L M= LM +
(T + 1)(T − 2)2 T −1
may be used, which has a standard normal limiting null distribution. An important
limitation of this test statistic is, however, that the result requires the errors to be normally
distributed with identical variances for all i ∈ 1, , N . To obtain a test statistic that is
valid in more general (and realistic) situations, we follow Wooldridge (2002, p. 275) and
construct a test of the least squares estimator
ˆ of
in the regression
1 1
T T
êit − êit =
êi,t−1 − êit + it (6)
T t=1 T t=1
p tr(M1 MT ) (T − 1)/T 1
ˆ −→
0 = =− =− (7)
tr(MT MT ) (T − 1)2 /T T −1
Thus the regression t-statistic is employed to test the modified null hypothesis H0 :
=
0 = −1/(T − 1). To account for autocorrelation in it , HAC robust standard errors are
TESTING FOR SERIAL CORRELATION 1297
ˆ −
0
LM∗ = ,
ṽ
where
N N
êi MT ˜ i ˜ i MT êi êi MT (M1 −
ˆ MT )êi êi (M1 −
ˆ MT ) MT êi
ṽ2 =
i=1
2 =
i=1
2
N N
i=1 êi MT MT êi i=1 êi MT MT êi
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
As for the WD test, we propose a simplified version of this test that is asymptotically
equivalent to the statistic LM∗ .
Theorem 2. Let
⎡ 2 ⎤
T T T T
ẑTi = ⎣ êit − 1 êit êi,t−1 −
1
êit +
1
êi,t−1 −
1
êit ⎦ ,
t=2
T t=1 T t=1 T −1 T t=1
∗
and denote by L M the corresponding test statistic constructed as in (4).
(i) Under Assumptions 1 and 2, uit ∼ (0, i2 ), T ≥ 3, and the local alternative N =
√ ∗
c/ N , the test statistic L M is asymptotically distributed as
⎛ ⎞
2
⎝c T − 3 + 2 , 1⎠
T −T
∗
(ii) Under the the local alternative, the statistic L M is asymptotically equivalent to LM∗ .
The pDW statistic suggested by Bhargava et al. (1982) is the ratio of the sum of squared
differences and the sum of squared residuals. Instead of the ratio (which complicates
the theoretical analysis), our variant of the Durbin–Watson test is based on the linear
combination of the numerator and denominator of the Durbin–Watson statistic. Let
and M as defined above. Using tr(MD DM) = 2(T − 1) and tr(M) = T − 1, it follows
that E(zTi ) = 0 for all i, where zTi = ei MD DMei − 2 ei Mei .
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
The panel test statistic is constructed as in (4), where ẑTi is given by (8). The resulting
test is denoted by mDW. The following theorem presents the asymptotic distribution of
the test statistic.
Theorem 3. Under√ Assumptions 1 and 2, uit ∼ (0, i2 ), T ≥ 3, and the local
alternative N = c/ N , the limiting distribution is
d T −1√
mDW −→ −c T − 2, 1
T
Theorems 1–3 allow us to compare the asymptotic power of the three different tests. Let
√
WD (T ) = (T − 2)/ 2(T − 3) + 3 denote the Pitman drift of the local power resulting
from Theorem 1(i), and denote by LM (T ) and mDW (T ) the respective terms resulting
from Theorem 2(i) and Theorem 3. The test A is asymptotically more powerful than
test B for some given value of T if A (T )/B (T ) > 1, where A, B ∈ WD, LM, mDW .
In what follows, we use the Pitman drift of the WD test as a benchmark and define
the relative Pitman drift as LM (T ) = LM (T )/WD (T ) and mDW (T ) = mDw (T )/WD (T ).
Figure 1 presents the relative Pitman drifts of both tests as a function of T . It turns out
that for the minimum value of T = 3 the mDW test is asymptotically more powerful
among the set of three alternative tests. For T ≥ 4, however, the LM test is asymptotically
most powerful, where the difference between the LM and the mDW tests diminish as T
gets large. The power gain of both tests relative to the WD test approaches 40% in terms
of the Pitman drift of the WD test.
Note that the asymptotic power of some test A ∈ WD, LM, mDW is a nonlinear
function of A (T ). For small values of c, however, the asymptotic power is approximately
linear. In Table 1 we report the asymptotic power of the tests (in parentheses) for various
values of c and T . For example, consider c = 05 and T = 50. The asymptotic power of
the LM test relative to the WD test is 0929/0683 = 136 and the relative asymptotic
power of the mDW test results as 0923/0683 = 135, which roughly corresponds to the
relative Pitman drifts LM (50) and LM (50).
TESTING FOR SERIAL CORRELATION 1299
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
Since the estimation error xit (ˆ − ) in the residuals êit = eit − xit (ˆ − ) is asymptotically
negligible, we simplify our notation and write eit instead of êit in this section.
A test for higher order serial correlation in panel data models was proposed by Inoue and
Solon (2006). Due to the fact that the covariance matrix of the demeaned error vector
∗i = E(Mei ei M) is singular with rank T − 1, we drop the kth row and column of ∗i
yielding the invertible matrix ∗i,k . The null hypothesis of no autocorrelation implies that
∗i,k = i2 Mk , where Mk results from M by dropping the kth row and column of M. Denote
by Si,k the matrix that results from deleting the kth row and columns of Si = Mei ei M. The
LM statistic is given by
N N −1
N
ISk = s̃i,k s̃i,k s̃i,k s̃i,k , (10)
i=1 i=1 i=1
1300 B. BORN AND J. BREITUNG
TABLE 1
Size and Power of Alternative Tests
∗
T WD
W D LM ∗
L M mDW
c=0
5 0.050 0.049 0.054 0.052 0.055
N = 500. Empirical values are computed using 10,000 Monte Carlo simulations. Asymptotic values (given
in parentheses) are computed according to the results in Theorems 1–3.
where
and vech(A) denotes the linear operator that stacks all elements of the m × m matrix A
below the main diagonal into a m(m − 1)/2 dimensional vector. In empirical practice the
unknown variances 12 , , N2 are replaced by the unbiased estimate ˆ i2 = (T − 1)−1 ei Mei .
TESTING FOR SERIAL CORRELATION 1301
An important problem with this test statistic is that it essentially tests a (T − 2)(T −
1)/2 dimensional null hypothesis. Thus, to yield a reliable asymptotic approximation
of the null distribution, N should be large relative to T 2 /2. Another problem is
that the power of the test is poor if only a few autocorrelation coefficients are
substantially different from zero. For example, if the errors are generated by a first order
autoregression with uit = 01ui,t−1 + it , then the second and third order autocorrelations
are as small as 0.01 and 0.001. In such cases, a test that includes all autocorrelations up
to the order T − 1 lacks power. In order to improve the power of the test statistic in such
cases, the test statistic should focus on the first order autocorrelations. Inoue and Solon
(2006) suggest to construct a test statistic based on the first p autocorrelations by forming
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
eit − ēi =
k (ei,t−k − ēi ) + it , t = k + 1, , T ; i = 1, , N (11)
In the following lemma, we present the probability limit of the OLS estimator of
k for
fixed T and N → ∞.
Lemma 3. Let
ˆ k be the pooled OLS estimator of
k in (11). Under Assumption 1 and
= 0, we have
p 1
ˆ k −→ − ,
T −1
Using this lemma, a test for zero autocorrelation at lag k can be constructed based
on the (heteroskedasticity and autocorrelation robust) t-statistic of the hypothesis
k =
−1/(T − 1). An asymptotically equivalent test statistic is obtained by letting
T
1
(k)
zTi = (eit − ēi )(ei,t−k − ēi ) + (ei,t−k − ēi )2
t=k+1
T −1
1302 B. BORN AND J. BREITUNG
and
N (k)
∗k = i=1 zTi
LM
2 (12)
N (k)2
i=1 zTi − 1
N
N
i=1
(k)
zTi
Along the lines of the proof of Theorem 2, it is not difficult to show that, under the null
∗k has a standard normal limiting distribution.
hypothesis, LM
In empirical practice it is usually more interesting to test the hypothesis that the
errors are not autocorrelated up to order p. Unfortunately, the probability limit of the
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
is a much more complicated function of T , since the regressors are mutually correlated.
Instead of presenting the respective probability limits, we therefore suggest a simple
transformation of the regressors such that the autoregressive coefficients can be estimated
unbiasedly under the null hypothesis. Specifically, we propose an implicit correction that
eliminates the bias. In matrix notation, the (transformed) autoregression is given by
Mei = 1 L1 ei + 2 L2 ei + · · · + p Lp ei + i , (13)
where
⎛ ⎞
0k×1
⎜ ei1 − ēi⎟ T −k
⎜ ⎟
Lk ei = ⎜ ⎟+ ei ,
⎝ ⎠ T2 − T
ei,T −k − ēi
where the first vector on the right-hand side represents the lagged values of the demeaned
residuals, whereas the second vector is introduced to remove the bias. It is not difficult to
show that
T −k T −1 T −k
E(ei MLk ei ) = − i +
2
i2 = 0
T T T −1
for all k = 1, , p and, therefore, under the null hypothesis, the least-squares estimators
of 1 , , p converge to zero in probability for all T and N → ∞. The null hypothesis
1 = · · · = p = 0 can therefore be tested using the associated Wald statistic
ˆ
Q(p) = ˆ
S −1 , (14)
TESTING FOR SERIAL CORRELATION 1303
where
N −1 −1
N
N
S= Zi Zi Zi ˆ i ˆ i Zi Zi Zi
i=1 i=1 i=1
is the HAC
estimator
for the covariance matrix of the vector of coefficients
ˆ ˆ ˆ ˆ The Q(p) statistic is
with = 1 , , p , Zi = L1 ei , , Lp ei , and ˆ i = Mei − Zi .
asymptotically equivalent to the statistic
N !−1
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
N
N
1
N
N
Q(p) = ei MZi Zi Mei ei MZi − Zi Mei ei MZi Zi Mei
i=1 i=1
N i=1 i=1 i=1
Along the lines of Theorem 2, it is straightforward to show that the statistics Q(p) and
Q(p) are asymptotically equivalent and have an asymptotic 2 distribution with p degrees
of freedom.
An important drawback of all test statistics considered so far is that they are not
robust against time dependent heteroskedasticity. This is due to the fact that the
implicit or explicit bias correction of the autocovariances depends on the error variances.
To overcome this drawback of the previous test statistics, we construct an unbiased
estimator of the autocorrelation coefficient. The idea is to apply backward and forward
transformations such that the products of the transformed series are uncorrelated under
the null hypothesis. Specifically, we employ the following transformations for eliminating
the individual effects:
1 " #
ẽitf = êit − êit + · · · + êiT ,
T −t+1
1 " #
ẽitb = êit − êi1 + · · · + êit
t
ẽitf = ẽi,t−1
b
+ it , t = 3, , T − 1, (15)
and
⎡ ⎤
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
T −3
0 0 T −2
− T −2
1
− T −2
1
− T −2
1
− T −2
1
⎢ ⎥
⎢0 0 0 T −4
− T −3
1
− T −3
1 1 ⎥
− T −3
⎢ T −3 ⎥
V1 = ⎢
⎢
⎥
⎥
⎢ ⎥
⎣ ⎦
0 0 0 0 0 1
2
− 21
Under the null hypothesis of no (first order) autocorrelation, we have = 0. Since the
error vector wi is heteroscedastic and autocorrelated, we again employ robust HAC
standard errors. The resulting test statistic is equivalent to the test statistic
N
= i=1 ẑTi
HR
2 ,
N 2
i=1 ẑTi − 1
N
N
i=1 ẑTi
where
T −1
ẑTi = êi AT êi = ẽitf ẽi,t−1
b
with AT = V0 V1
t=3
Since V0
T , V1
T = 0, and tr(V0 V1 ) = 0, it follows from Lemma 1 (i) that the test statistic
has a standard normal limiting null distribution.
HR
This section presents the finite sample performance of the test statistics for serial
correlation and assesses the reliability of the asymptotic results derived in Section 3. We
also conduct experiments with different forms of serial correlation and heteroskedasticity.
The benchmark data generating process for all simulations is a linear panel data model
of the form
where i = 1, , 5004 and various values of T . We set to 1 in all simulations and draw
the individual effects i from a (0, 252 ) distribution. To create correlation between
the regressor and the individual effect, we follow Drukker (2003) by drawing xit0 from a
(0, 182 ) distribution and computing xit = xit0 + 05i . The regressor is drawn once and
then held constant for all experiments. In our benchmark model, the disturbance term
follows an autoregressive process of order 1:
where it ∼ (0, 1) and we discard the first 100 observations to eliminate the influence of
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
where again it ∼ (0, 1) and we discard the first 100 observations to eliminate the
influence of the initial value. The results for three different AR(2) processes are presented
in Table 2. The Q(2) statistic has good power properties in all configurations considered.
Not surprisingly, the WD test exhibit high power when 1 = 003 and 2 = −003 or 1 =
0 and 2 = 008. As mentioned in Remark 2, however, the WD (and WD) statistic is a
test of the difference between the first and second order autocorrelation of the errors.
As a consequence, it also has power against most AR(2) alternatives. However, choosing
1 = 2 = 003, which implies equal first- and second-order autocorrelations of uit , leads
to a complete loss of power of the WD test, confirming our considerations in Remark 2.
The LM statistic has less power than the Q(2) statistic, in particular for the setup with
1 = 0 and 2 = 008.
4
The cross-sectional dimension is held constant at N = 500, results for smaller N are qualitatively similar.
Values of c = 0, c = 05, and c = 1 therefore imply = 0, = 00244, and = 00447, respectively.
5
As suggested by a referee, we also consider non-Gaussian error distributions. The results were very similar
and can be obtained on request.
1306 B. BORN AND J. BREITUNG
TABLE 2
AR(2) Processes
∗
T WD
W D LM ∗
L M mDW
Q(2) ISk IS(2)
1 = 0, 2 = 0
5 0.048 0.048 0.049 0.047 0.051 0.052 0.049 0.050
10 0.050 0.049 0.051 0.051 0.049 0.050 0.063 0.053
20 0.050 0.049 0.049 0.049 0.048 0.049 0.067 0.045
30 0.053 0.052 0.055 0.055 0.055 0.053 na 0.041
50 0.054 0.054 0.056 0.056 0.055 0.057 na 0.036
1 = 003, 2 = −003
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
N = 500. Rejection frequencies are computed using 10,000 Monte Carlo replications. Note that the IS test
statistics cannot be computed if the degrees of freedom are larger than N (na).
In the last two columns of Table 2 we present the original portmanteau statistic ISk
with k = 1 and the IS(2) statistic that is based on all autocorrelation up to to the lag
order p = 2 (see Section 4). Note that for a sample with N = 500 the inverse of ISk does
not exist for T > 20. It turns out that for the AR(2) alternatives considered in this Monte
Carlo experiment6 the Q(2) statistics has substantially more power than the variants of
the IS statistic.
Next, we consider four different types of time-dependent heteroskedasticity by using
the following error process:
uit = ht it ,
6
We also considered a wide range of other combinations of 1 and 2 . In all cases, the Q(2) turns out to
be more powerful than the IS statistics.
TESTING FOR SERIAL CORRELATION 1307
TABLE 3
Size under Temporal Heteroskedasticity
∗ ∗
T
W D
L M mDW HR
W D
L M mDW HR
N = 500. Rejection frequencies are computed using 10,000 Monte Carlo replications.
where it ∼ (0, 1). The first model implies a break in the variance function, i.e., ht = 10
for t = 1, , T /5, and ht = 1 otherwise. The second specification is a U-shaped variance
function of the form ht = (t − T /2)2 + 1. The third and fourth types of heteroskedasticity
are positive and negative exponential variance functions, i.e., ht = exp(−02t) and ht =
exp(02t), respectively. Table 3 shows that the modified Durbin–Watson test has massive
size distortions under all four types of heteroskedasticity. While the WD test has the
proper size in case of a U-shaped variance function, the break in the variance function
and the exponential variance functions lead to large size distortions. The modified LM
statistic does surprisingly well under heteroskedasticity, but for small T we observe also
some size distortions, especially for the negative exponential variance function. On the
other hand, our proposed heteroskedasticity-robust test statistic retains its correct size
under all types of heteroskedasticity considered here.
7. CONCLUSION
This article proposes various tests for serial correlation in fixed-effects panel data
regression models with a small number of time periods. First, a simplified version of the
test suggested by Wooldridge (2002) and Drukker (2003) is considered. The second test
is based on the LM statistic suggested by Baltagi and Li (1995), and the third test is a
modification of the classical Durbin–Watson statistic. Under the null hypothesis of no
serial correlation, all tests possess a standard normal limiting distribution as N → ∞ and
T is fixed. Analyzing the local power of the tests, we find that the LM statistic has superior
power properties. We also propose a generalization to test for autocorrelation up to some
given lag order and a test statistic that is robust against time dependent heteroskedasticity.
1308 B. BORN AND J. BREITUNG
APPENDIX: PROOFS
Proof of Lemma 1
(i) Let yi = [yi1 , , yiT ] and Xi = [xi1 , , xiT ] . The residual vector can be written
as
The following lemma implies that the estimation error in the residuals êit is asymptotically
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
negligible.
Proof. Let
N
N
N
êi AT êi = ei AT ei + i ,
i=1 i=1 i=1
where
(i) We have
& & & &
&1 N & &1 N &
& ˆ & & &
&
ei (AT + AT )Xi ( − )& ≤ & ui (AT + AT )Xi & · ˆ −
&N & &N &
i=1 i=1
1
N
u (AT + AT )Xi = Op (N −1/2 ),
N i=1 i
(ii) Consider
ê AT êi − e AT ei ≤ | i |,
i i
" #2 " #2
êi AT êi − ei AT ei ≤ 2 | i | · |ui AT ui | + 2i ,
with
' 2 2 (
2i ≤ 2 ui (AT + AT )Xi (ˆ − ) + (ˆ − ) Xi AT Xi (ˆ − )
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
We have
1 1
N N
|u (AT + AT )Xi (ˆ − )|2 ≤ AT + AT 2 u Xi (ˆ − ) (ˆ − )
2
N i=1 i N i=1 i
= Op (N −1 )
and
1 ˆ 2 1 2
N N
( − ) Xi AT Xi (ˆ − ) ≤ AT 2 Xi Xi ˆ − )4
N i=1 N i=1
= Op (N −2 )
It follows that
1 2
N
= Op (N −1 )
N i=1 i
Since
1/2 1/2
1 1 2 1
N N N
| i | |ui AT ui | ≤ (u AT ui )2 = Op (N −1/2 ),
N i=1 N i=1 i N i=1 i
we finally obtain
1 " #2 1 " #2
N N
êi AT êi − ei AT ei = Op (N −1/2 )
N i=1 N i=1
Since E(ui AT ui ) = i2 tr(AT ) = 0 and there exist some real number M such that 1/M <
var(ui AT ui ) < M, it follows from the Lindeberg–Feller central limit theorem that
N
ẑTi d
NT = i=1 + op (1) −→ (0, 1)
N 2
i=1 ẑTi
1310 B. BORN AND J. BREITUNG
2 c
E(ui AT ui ) = tr(AT ) = i2 tr(AT ) + √i tr(AT HT ) + O(N −1 )
N
2 c
= √i tr(AT HT ) + O(N −1 )
N
Using standard results for the variance of quadratic forms of normally distributed
random variables7 (e.g., Searle, 1971), we have
1
N
p
√ ui AT ui −→ (c m2 tr(AT HT ), m4 tr(A2T + AT AT )),
N i=1
1 1 " #2
N N
(êi AT êi )2 = ui AT ui + Op (N −1/2 )
N i=1 N i=1
1 " #2
N
= ui AT ui + Op (N −1/2 )
N i=1
p
−→ m4 tr(A2T + AT AT )
7
Note that for non-symmetric AT the expression can be rewritten in a symmetric format as ui AT ui =
05[ui (AT + AT )ui ].
TESTING FOR SERIAL CORRELATION 1311
and therefore, the second term of the denominator in (4) does not affect the limiting
distribution.
It follows that the limiting distribution of the test statistic is given by
N
√1 ui AT ui
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
i=1 d tr(AT HT )
NT = N
N " #2 + op (1) −→ c tr(A2 + A A ) , 1
1
N i=1 u i A T u i T T T
Proof of Theorem 1
(i) Let Dk denote a (T − 2) × T matrix that results from dropping the kth row of
the first difference matrix D defined in (9). Using Lemma A.1, we have N −1/2 Ni=1 ẑTi =
N −1/2 Ni=1 zTi + Op (N −1 ), where
For T ≥ 3,
⎡ ⎤
05 −05 0 0 0 0 ··· 0 0 0 0 0 0
⎢ 05 0 −05 0 0 0 ··· 0 0 0 0 0 0 ⎥
⎢ ⎥
⎢ −1 15 0 −05 0 0 ··· 0 0 0 0 0 0 ⎥
⎢ ⎥
⎢ 0 −1 15 0 −05 0 ··· 0 0 0 0 0 0 ⎥
⎢ ⎥
⎢ 0 0 −1 15 0 −05 · · · 0 0 0 0 0 0 ⎥
⎢ ⎥
AT = ⎢ ⎥
⎢ ··· ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 · · · −1 15 0 −05 0 0 ⎥
⎢ ⎥
⎢ 0 0 0 0 0 0 · · · 0 −1 15 0 −05 0 ⎥
⎢ ⎥
⎣ 0 0 0 0 0 0 ··· 0 0 −1 15 −05 0 ⎦
0 0 0 0 0 0 ··· 0 0 0 −1 1 0
For example,
⎡ ⎤ ⎡ ⎤
0 ) * −1 ) *
A3 = ⎣−1⎦ −1 1 0 + 05 ⎣ 1 ⎦ −1 1 0
1 0
⎡ ⎤ ⎡ ⎤
−05 ) * 05 −05 0
= ⎣−05⎦ −1 1 0 = ⎣05 −05 0⎦
1 −1 1 0
1312 B. BORN AND J. BREITUNG
iid
Obviously, tr(AT ) = 0, AT
T = 0, and
T AT = 0. If uit ∼ N (0, i2 ), the variance of the
quadratic form ei AT ei = ui AT ui is i4 tr(A2T + AT AT ), where
3
tr(A2T ) = − (T − 3),
2
7(T − 3)
tr(AT AT ) = 3 +
2
tr(AT HT ) = T − 2
N
T 2
1
N i=1 t=3 êi,t−1 (êit + 05êi,t−1 ) − (ˆ + 05)êt−1
2
N
√1 ẑTi
i=1 −1/2
= N
2 + Op (N )
N
T
1
N i=1 t=3 êi,t−1 (êit + 05êi,t−1 )
N
√1 ẑTi
i=1
= N
+ Op (N −1/2 ),
1 N 2
N i=1 ẑTi
where ˆ + 05 = Op (N −1/2 ) under the null hypothesis and local alternative. Furthermore,
by using the results in the proof of Lemma 1,
N
= i=1 ẑTi
WD
2
N
i=1 ẑTi −
2 1 N
N i=1 ẑTi
N
√1 ẑTi
i=1
= N
+ Op (N −1/2 )
1 N 2
N i=1 ẑTi
TESTING FOR SERIAL CORRELATION 1313
= Op (N −1/2 ).
It follows that WD − WD
Proof of Lemma 2
For the first order autocorrelation, Cox and Solomon (1988) derive the asymptotic
approximation
T
√ N
t=2 (uit − ūi )(ui,t−1 − ūi ) d 1 (T + 1)(T − 2)2
N i=1
N T −→ − ,
i=1 t=1 (uit − ūi )
2 T T 2 (T − 1)2
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
Using this result, the limiting distribution of the LM statistic is easily derived.
Proof of Theorem 2
(i) As shown in the proof of Lemma 1, we have ẑTi = ei AT ei + op (1). Using the
matrix notation introduced in Section 3.2, AT = M1 MT + T −1 1
MT MT . Let Sk denote a
selection matrix that is obtained from dropping the kth row of IT . Since MT
T = ST M
T =
0 and M1
T = S1 M
T = 0, it follows that AT
T = 0 and AT
T = 0. Since tr(M1 MT ) =
tr(MS1 ST M) = tr(ST MS1 ) = −(T − 1)/T and tr(MT MT ) = tr(ST MST ) = (T − 1) − (T −
1)/T it follows that tr(AT ) = 0. Thus,
The variance of zTi is obtained as var(zTi ) = i4 tr(A2 + AA ) by using the following results:
It follows that
2
var(zTi ) = i4 T −3+
T (T − 1)
Furthermore,
Using
tr(ST HT S1 ) = T − 1,
tr(ST HT ST ) = 0,
tr(ST
T
T HT S1 ) = tr(ST
T
T HT ST ) = 2(T − 1) − 1,
tr(ST
T
T HT
T
T S1 ) = tr(ST
T
T HT
T
T ST ) = 2(T − 1)2
yields
Downloaded by [KU Leuven University Library] at 04:57 18 July 2016
1 2
tr(AT HT ) = tr M1 MT HT − M M T HT =T −3+ ,
T −1 1 T (T − 1)
which is identical to tr(A2T + AT AT ). With these results the limiting distribution follows.
(ii) Using
ˆ + T −1
1
= Op (N −1/2 ) and êi = T −1 Tt=1 êit the LM∗ statistic can be
written as
N
ẑTi
LM∗ = i=1
N T
2
i=1 ˆ êi,t−1 − (1 −
ˆ )êi
t=2 (êi,t−1 − êi ) êit −
N
√1 ẑTi
i=1
=
N
N T
1
N i=1
T
t=2 (êit − êi )(êi,t−1 − êi ) + (T − 1)−1 t=2 (êit − êi ) + Op (N
2 −1/2 )
N
√1 ẑTi
i=1
= N + Op (N −1/2 )
1 N 2
N i=1 ẑTi
Proof of Theorem 3
N
N
N −1/2 ẑTi = N −1/2 ei M(D D − 2IT )Mei + Op (N −1 ) = ui AT ui + Op (N −1 ),
i=1 i=1
TESTING FOR SERIAL CORRELATION 1315
where
⎡ ⎤
−1 −1 0 0 ··· 0 0 0
⎢−1 0 −1 0 · · · 0 0 0⎥
⎢ ⎥
⎢ 0 −1 0 −1 · · · 0 0 0⎥
⎢ ⎥
D D − 2IT = ⎢ ⎥
⎢ ⎥
⎢ ⎥
⎣0 0 0 0 · · · −1 0 −1⎦
0 0 0 0 · · · 0 −1 −1
= 4i4 (T − 2)
Furthermore,
T −1
tr(AT HT ) = − 2(T − 2)
T
Proof of Lemma 3
L0 M êi =
k
Lk M êi + i ,
where
⎛ ⎞ ⎛ ⎞
0 ··· 0 1 0 ··· 0 1 0 ··· 0 0 ··· 0
⎜ ⎟ ⎜ ⎟
⎜0 · · · 0 0 1 ⎟ ⎜0 1 0 ···0⎟
L0 = ⎜
⎜ ⎟ Lk = ⎜
⎟
⎜
⎟
⎟
⎝
0⎠ ⎝ 0 ⎠
0 ··· 0 0 ··· 0 1 0 ··· 0 1 0 ··· 0
1316 B. BORN AND J. BREITUNG
T
" # 1 " #
= tr
LkLk − tr
T Lk
L k
T
T
T −k
= (T − 1) ,
T
it follows, as N → ∞,
p tr(M
Lk
L0 M) 1
ˆ k −→ == − ,
tr(M
Lk
Lk M) T −1
REFERENCES
Baltagi, B. H. (2008). Econometric Analysis of Panel Data. 4th ed. New York: Wiley & Sons.
Baltagi, B. H., Li, Q. (1991). A joint test for serial correlation and random individual effects. Statistics and
Probability Letters 11(3):277–280.
Baltagi, B. H., Li, Q. (1995). Testing AR(1) against MA(1) disturbances in an error component model. Journal
of Econometrics 68(1):133–151.
Baltagi, B. H., Wu, P. X. (1999). Unequally spaced panel data regressions with AR(1) disturbances.
Econometric Theory 15(06):814–823.
Bhargava, A., Franzini, L., Narendranathan, W. (1982). Serial correlation and the fixed effects model. Review
of Economic Studies 49(4):533–49.
Cox, D. R., Solomon, P. J. (1988). On testing for serial correlation in large numbers of small samples.
Biometrika 75(1):145–148.
Drukker, D. M. (2003). Testing for serial correlation in linear panel-data models. Stata Journal 3(2):168–177.
Hsiao, C. (2003). Analysis of Panel Data. 2nd ed. Cambridge: Cambridge University Press.
Inoue, A., Solon, G. (2006). A Portmanteau test for serially correlated errors in fixed effects models.
Econometric Theory 22(05):835–851.
Searle, S. (1971). Linear Models. New York: Wiley.
Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA:
MIT Press.