
Journal of the American Statistical Association

ISSN: 0162-1459 (Print) 1537-274X (Online) Journal homepage: http://www.tandfonline.com/loi/uasa20

Pooled Mean Group Estimation of Dynamic Heterogeneous Panels

M. Hashem Pesaran, Yongcheol Shin & Ron P. Smith

To cite this article: M. Hashem Pesaran, Yongcheol Shin & Ron P. Smith (1999) Pooled
Mean Group Estimation of Dynamic Heterogeneous Panels, Journal of the American Statistical
Association, 94:446, 621-634

To link to this article: https://doi.org/10.1080/01621459.1999.10474156

Published online: 17 Feb 2012.


It is now quite common to have panels in which both T, the number of time series observations, and N, the number of groups,
are quite large and of the same order of magnitude. The usual practice is either to estimate N separate regressions and calculate
the coefficient means, which we call the mean group (MG) estimator, or to pool the data and assume that the slope coefficients
and error variances are identical. In this article we propose an intermediate procedure, the pooled mean group (PMG) estimator,
which constrains long-run coefficients to be identical but allows short-run coefficients and error variances to differ across groups.
We consider both the case where the regressors are stationary and the case where they follow unit root processes, and for both
cases derive the asymptotic distribution of the PMG estimators as T tends to infinity. We also provide two empirical applications:
aggregate consumption functions for 24 Organization for Economic Cooperation and Development economies over the period
1962-1993, and energy demand functions for 10 Asian developing economies over the period 1974-1990.
KEY WORDS: Consumption functions; Energy demand; Heterogeneous dynamic panels; I(0) and I(1) regressors; Pooled mean
group estimator.

M. Hashem Pesaran is Professor, Faculty of Economics and Politics, Trinity College, University of Cambridge CB3 9DD, U.K., and University of Southern California (E-mail: hashem.pesaran@econ.cam.ac.uk). Yongcheol Shin is Lecturer, Department of Economics, University of Edinburgh, EH8 9JY, U.K. (E-mail: Yongcheol.Shin@ed.ac.uk). Ron P. Smith is Professor, Department of Economics, Birkbeck College, University of London, W1P 1PA, U.K. (E-mail: R.Smith@bbk.ac.uk). Partial financial support from the ESRC (grant H519255(03)) and the Isaac Newton Trust of Trinity College, Cambridge is gratefully acknowledged. The first author also acknowledges the hospitality of the IMF Research Department while some of the ideas behind this article were formalized. The authors are also grateful for the efforts of Sunil Sharma and Mansour Gill in compiling the OECD dataset; for helpful discussions with Michael Binder, Nadeem U. Haque, Robert MacKay, Sunil Sharma, and Richard Smith; and for valuable comments on an earlier version from the editors and a referee.

© 1999 American Statistical Association. Journal of the American Statistical Association, June 1999, Vol. 94, No. 446, Theory and Methods.

1. INTRODUCTION

Recent years have brought increasing interest in dynamic panel data models, where the number of time series observations, T, is relatively large and of the same order of magnitude as N, the number of groups. Such panels arise particularly in cross-country analyses. In most applications of this type, the parameters of interest are the long-run effects and the speed of adjustment to the long run. An example is the large literature on testing purchasing power parity in panels, where according to economic theory the long-run coefficients of the logarithms of domestic prices, foreign prices, and exchange rates should be (1, -1, -1), with the speed of adjustment being of central policy concern. (A recent review of this literature was provided in Rogoff 1996; some of the related econometric issues were discussed in Boyd and Smith 1998.) Another prominent example is the Fisher equation, which postulates a unit coefficient for the long-run effect of (expected) inflation on the nominal rate of interest, but is silent as to the magnitude of the short-run effects of changes in inflation on interest rates.

There are two procedures commonly used for such panels. At one extreme, one can estimate separate equations for each group and examine the distribution of the estimated coefficients across groups. Of particular interest will be the mean of the estimates, which we call the mean group (MG) estimator. In earlier work (Pesaran and Smith 1995), we showed that the MG estimator will produce consistent estimates of the average of the parameters. This estimator, however, does not take account of the fact that certain parameters may be the same across groups. At the other extreme are the traditional pooled estimators, such as the fixed and random effects estimators, where the intercepts are allowed to differ across groups while all other coefficients and error variances are constrained to be the same. In this paper, we consider an intermediate estimator, which we call the pooled mean group (PMG) estimator because it involves both pooling and averaging. This estimator allows the intercepts, short-run coefficients, and error variances to differ freely across groups, but constrains the long-run coefficients to be the same. There are often good reasons to expect the long-run equilibrium relationships between variables to be similar across groups, due to budget or solvency constraints, arbitrage conditions, or common technologies influencing all groups in a similar way. The reasons for assuming that short-run dynamics and error variances should be the same tend to be less compelling. Not imposing equality of short-run slope coefficients also allows the dynamic specification (e.g., the number of lags included) to differ across groups.

We use two important empirical examples to compare the MG, PMG, and dynamic fixed effect (DFE) estimators. The first is the consumption function in Organization for Economic Cooperation and Development (OECD) countries. Determination of saving rates is an important policy issue and, as with the example of purchasing power parity mentioned earlier, the theory makes strong predictions: The long-run income elasticity of consumption should be unity in all countries, despite their important institutional and cultural differences. Otherwise, national (private) saving rates will be rising or falling indefinitely. The PMG estimator allows us to estimate this common long-run coefficient without making the less plausible assumption of identical dynamics in each country. In this application, both N = 24 and T = 32 are quite large. The second example is


energy demand in developing Asian economies. Prior to the financial crises of the summer of 1997, this was the world's fastest growing region, and the growth in its energy demand will have serious implications both for world energy markets and greenhouse gas emissions. As with the consumption function example, long-run responses of energy demand to income and relative energy prices are likely to be similar across countries, although short-run adjustments in energy demand, depending on patterns of investment in energy-using equipment and supply constraints, are unlikely to be homogeneous across countries. Again, the PMG estimator allows us to investigate long-run homogeneity without imposing parameter homogeneity in the short run. In this example, both N = 10 and T = 17 are quite small. Thus there is a contrast in the panel dimensions in the two examples.

The article is organized as follows: Section 2 provides a brief review of alternative panel data estimators and discusses how the PMG estimator is related to them. Section 3 sets out the model and its underlying assumptions, and derives the log-likelihood function. Section 4 develops the general theory of PMG estimation, both for stationary regressors and for regressors that follow unit root processes. Section 5 discusses a number of modeling issues and presents the empirical applications, and Section 6 contains some concluding remarks. Mathematical proofs are provided in an Appendix.

The following notation is used throughout: The symbol $\xrightarrow{p}$ signifies convergence in probability; $\Rightarrow$, weak convergence in probability measure; $\overset{a}{\sim}$, asymptotic equality in distributions; MN, mixture normal distribution; $I_m$, an identity matrix of order m; diag[·], a diagonal matrix; and I(d), an integrated variable of order d.

2. ALTERNATIVE ESTIMATORS OF DYNAMIC PANELS

Numerous dynamic panel data estimators exist. To see the relationship of the PMG estimator advanced in this article to the ones proposed in the literature, it is convenient to divide the latter into three categories, differing in their assumptions about the relative magnitudes of N and T. First, there is the "small N, large T" time-series literature devoted to estimating long-run effects. In the case of individual time-series where N = 1, the traditional approach was to estimate an autoregressive distributed lag (ARDL) model. In recent years, the emphasis has shifted to estimating cointegrating relationships from a single time series. The relation between these two approaches has been discussed in earlier work (Pesaran and Shin 1999). For N > 1, following Zellner (1962), the seemingly unrelated regression equation (SURE) procedure is often used. The main attraction of the SURE procedure is that it allows the contemporaneous error covariances to be freely estimated. But this is possible only when N is reasonably small relative to T. When N is of the same order of magnitude as T (the case we are interested in), SURE is not feasible. (It is also worth noting that most of the SURE literature is concerned with linear cross-equation restrictions, whereas our concern with common long-run coefficients implies nonlinear restrictions across the different equations.) Other approaches must be sought. Nonzero error covariances are often likely to be caused by omitted common effects that impact all groups, an indication of misspecification rather than error correlation. In such cases it is more appropriate to model the effects that account for between-group nonzero error covariances explicitly. This can be done in a number of ways. One possibility is to include cross-sectional means of the included regressors as additional regressors in the model. For instance, in an earlier work (Pesaran and Smith 1995), aggregate output was included in addition to industry output in industry employment equations. A second possibility is to express all of the variables as deviations from their respective cross-sectional means in each period. In the special case where the slope coefficients are identical across groups, the common time-period effects will be completely eliminated by this (cross-sectional) demeaning procedure. In cases where the slope coefficients differ, such demeaning will reduce (but not eliminate) the common time-specific effects. A third possibility is using explicit spatial models of the interactions between neighbors. In what follows, we assume that the use of such procedures has made it reasonable to assume that the errors are independent between groups.

Second, there is the "small T, large N" dynamic panel literature. In the case where T is small, we (Pesaran and Smith 1995) have shown that under certain assumptions, the cross-section regression based on time-averages of the variables will provide consistent estimates of the long-run coefficients. However, these assumptions are quite strong. In particular, they require that the group-specific parameters are distributed independently of the regressors, and that the regressors are strictly exogenous. For larger T, we (Pesaran and Smith 1995) showed that the traditional procedures for estimation of pooled models, such as the fixed effects, instrumental variables, and generalized method-of-moments (GMM) estimators proposed by, among others, Ahn and Schmidt (1995), Anderson and Hsiao (1981, 1982), Arellano (1989), Arellano and Bover (1995), and Keane and Runkle (1992), can produce inconsistent and potentially very misleading estimates of the average values of the parameters in dynamic panel data models unless the slope coefficients are in fact identical. (This result holds in the case of dynamic random coefficient models even if it is assumed that the random coefficients and the regressors are independently distributed.) But tests on most panels of this sort indicate that these parameters differ significantly across groups. Thus an estimator that imposes weaker homogeneity assumptions would be useful.

The third literature, and the one closest to our concerns, is the Bayes and empirical Bayes estimators proposed by Hsiao and Tahmiscioglu (1997) and Hsiao, Pesaran and Tahmiscioglu (1999). These estimators build on the early work of Lindley and Smith (1972) and Swamy (1970). Hsiao et al. (1999) considered Bayes estimation of short-run coefficients in dynamic heterogenous panels, and established the asymptotic equivalence of the Bayes estimator and the MG estimator. In particular, they showed that the MG estimator is asymptotically normal for large N and large T as long as $\sqrt{N}/T \to 0$ as both N and T $\to \infty$. Using Monte Carlo

experiments, they also showed that although the MG estimator is consistent, it is unlikely to be a good estimator when either N or T is small. The main difference between the approach adopted in this article and the literature on Bayes estimation is that whereas we regard the parameters as fixed, this literature regards them as random: drawn from some distribution with a finite number of parameters. Given that the possibility of random parameters is admitted, the issue is not primarily one of classical versus Bayesian approaches, as classical estimators can be given a Bayesian interpretation. For example, the Swamy (1970) estimator for random coefficient models, motivated by classical generalized least squares arguments, can also be viewed as an empirical Bayes estimator.

The choice between fixed- and random-effects formulations, which has been extensively discussed in the literature, depends on a number of related considerations. First, the purpose of the exercise matters, as the two approaches try to answer distinctly different questions. Hsiao (1996, pp. 93-94) gave a number of examples in which the purpose of analysis will determine the choice between the two formulations. Stoker (1993, p. 1848) pointed out that if one wishes to make inference about macro relationships from micro estimates based on a subset of the population (a common problem), then the effects must be treated as random.

Second, the perceived degree of commonality in the parameters matters. There is a continuum, with at one extreme fixed and common parameters and the other extreme fixed and heterogenous parameters, with the random coefficient model fitting somewhere in between (see, e.g., Maddala and Hu 1996). In practice, it is often difficult to determine where on this continuum a particular case lies. The examples that we use are not samples, but rather almost the whole population of countries in a particular category: members of the OECD or Asian developing countries. This suggests treating their parameters as fixed. The difficulty of making the distinction precise was illustrated by Hsiao et al. (1995), who proposed a procedure for choosing between fixed- and random-effects models, which they evaluated by Monte Carlo methods. But in the simulations, the fixed effects are actually generated randomly either as a mixture of normal distributions or as a function of the regressors. Although this design effectively captures the situations where traditional random-effects estimators do badly, the fact that there is no obvious way to generate fixed-effects data for heterogenous panels indicates the conceptual difficulty.

Third are considerations of estimation. The fixed-effects approach uses the conditional likelihood (conditional on the particular effects); the random approach uses the unconditional or marginal likelihood. Given the correct specification, the latter will be more efficient. When N is large relative to T, the fixed-effects approach can be very inefficient.

Fourth, and an important practical issue, is whether the effects are independent of the regressors and this is the basis of a number of tests between the two specifications.

Dynamic models add further complications. The initial conditions can also be treated as fixed or random, and the long-run parameters are nonlinear functions of the short-run parameters. Thus, when using the random coefficient approach, one must ensure that the joint distribution for the short-run parameters implies a meaningful joint distribution for the long-run parameters, or vice versa. In the case that we consider (homogenous long-run parameters), the long-run parameters across the groups have a degenerate distribution, which raises some technical difficulties in treating the heterogenous short-run parameters as random. Although a number of issues remain to be resolved, and there is unlikely to be a single estimator that is appropriate for all dynamic heterogenous panel problems, we believe that the PMG estimator that we propose may be useful in a number of cases that are important in practice.

3. THE MODEL

Suppose that given data on time periods, t = 1, 2, ..., T, and groups, i = 1, 2, ..., N, we wish to estimate an ARDL(p, q, q, ..., q) model,

$$y_{it} = \sum_{j=1}^{p} \lambda_{ij}\, y_{i,t-j} + \sum_{j=0}^{q} \delta_{ij}'\, x_{i,t-j} + \mu_i + \varepsilon_{it}, \tag{1}$$

where $x_{it}$ ($k \times 1$) is the vector of explanatory variables (regressors) for group i; $\mu_i$ represent the fixed effects; the coefficients of the lagged dependent variables, $\lambda_{ij}$, are scalars; and $\delta_{ij}$ are $k \times 1$ coefficient vectors. T must be large enough such that we can estimate the model for each group separately. For notational convenience, we use a common T and p across groups and a common q across groups and regressors, but this is not necessary. Similarly, time trends or other types of fixed regressors, such as seasonal dummies, can be included in (1). But to keep the notations simple, we do not allow for such effects.

It is convenient to work with the following reparameterization of (1):

$$\Delta y_{it} = \phi_i\, y_{i,t-1} + \beta_i'\, x_{it} + \sum_{j=1}^{p-1} \lambda_{ij}^{*}\, \Delta y_{i,t-j} + \sum_{j=0}^{q-1} \delta_{ij}^{*\prime}\, \Delta x_{i,t-j} + \mu_i + \varepsilon_{it}, \tag{2}$$

i = 1, 2, ..., N, and t = 1, 2, ..., T, where $\phi_i = -\left(1 - \sum_{j=1}^{p} \lambda_{ij}\right)$, $\beta_i = \sum_{j=0}^{q} \delta_{ij}$,

$$\lambda_{ij}^{*} = -\sum_{m=j+1}^{p} \lambda_{im}, \qquad j = 1, 2, \ldots, p-1,$$

and

$$\delta_{ij}^{*} = -\sum_{m=j+1}^{q} \delta_{im}, \qquad j = 1, 2, \ldots, q-1. \tag{3}$$
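As a minimal numerical sketch of the mapping from the ARDL coefficients in (1) to the error-correction coefficients in (2) and (3), the following Python function computes $\phi_i$, $\beta_i$, $\lambda_{ij}^{*}$, and $\delta_{ij}^{*}$ for a single group, together with the implied long-run coefficients $-\beta_i/\phi_i$. The function name `ardl_to_ecm` and the array layout are illustrative assumptions and are not part of the original article. Because the sum over lagged differences of $x$ in (2) starts at j = 0, the sketch applies the formula in (3) for j = 0, ..., q - 1.

```python
import numpy as np

def ardl_to_ecm(lam, delta):
    """Map ARDL(p, q, ..., q) coefficients of one group to error-correction form.

    lam   : sequence of length p, coefficients on y_{t-1}, ..., y_{t-p}
    delta : array-like of shape (q + 1, k), rows are delta_0, ..., delta_q
    (Hypothetical helper: names and layout are assumptions for illustration.)
    """
    lam = np.asarray(lam, dtype=float)
    delta = np.asarray(delta, dtype=float)
    k = delta.shape[1]

    phi = -(1.0 - lam.sum())        # error-correction coefficient
    beta = delta.sum(axis=0)        # coefficient on the level of x_t

    # lam_star[j-1] = -(lam_{j+1} + ... + lam_p), for j = 1, ..., p-1
    lam_star = np.array([-lam[j:].sum() for j in range(1, len(lam))])

    # delta_star[j] = -(delta_{j+1} + ... + delta_q), for j = 0, ..., q-1
    delta_star = np.array(
        [-delta[j + 1:].sum(axis=0) for j in range(delta.shape[0] - 1)]
    ).reshape(-1, k)

    theta = -beta / phi             # implied long-run coefficients (needs phi != 0)
    return phi, beta, lam_star, delta_star, theta

# ARDL(1, 1) with one regressor: y_t = 0.8 y_{t-1} + 0.3 x_t - 0.1 x_{t-1}
phi, beta, lam_star, delta_star, theta = ardl_to_ecm([0.8], [[0.3], [-0.1]])
print(phi, beta, delta_star, theta)
```

In this example the error-correction form is $\Delta y_t = -0.2\, y_{t-1} + 0.2\, x_t + 0.1\, \Delta x_t$, so the implied long-run coefficient is 1 up to floating-point rounding.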

If we stack the time-series observations for each group, then (2) can be written as

$$\Delta y_i = \phi_i\, y_{i,-1} + X_i \beta_i + \sum_{j=1}^{p-1} \lambda_{ij}^{*}\, \Delta y_{i,-j} + \sum_{j=0}^{q-1} \Delta X_{i,-j}\, \delta_{ij}^{*} + \mu_i \iota + \varepsilon_i, \tag{4}$$

i = 1, 2, ..., N, where $y_i = (y_{i1}, \ldots, y_{iT})'$ is a $T \times 1$ vector of the observations on the dependent variable of the ith group, $X_i = (x_{i1}, \ldots, x_{iT})'$ is a $T \times k$ matrix of observations on the regressors that vary both across groups and time periods, $\iota = (1, \ldots, 1)'$ is a $T \times 1$ vector of 1s, $y_{i,-j}$ and $X_{i,-j}$ are j period lagged values of $y_i$ and $X_i$, and $\Delta y_i = y_i - y_{i,-1}$, $\Delta X_i = X_i - X_{i,-1}$, $\Delta y_{i,-j}$ and $\Delta X_{i,-j}$ are j period lagged values of $\Delta y_i$ and $\Delta X_i$, and $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT})'$.

We make the following assumptions.

Assumption 1. The disturbances $\varepsilon_{it}$, i = 1, 2, ..., N, t = 1, 2, ..., T, in (1) are independently distributed across i and t, with means 0, variances $\sigma_i^2 > 0$, and finite fourth-order moments. They are also distributed independently of the regressors, $x_{it}$.

The assumption that the disturbances $\varepsilon_{it}$ are distributed independently across groups was discussed in Section 2. The assumption that they are independent across time is also not very restrictive, and can be satisfied in most applications by increasing the distributed lag orders on $y_{it}$ and $x_{it}$. The independence of the disturbances and the regressors is needed for the consistent estimation of the short-run coefficients, but, as shown by Pesaran (1997), it is relatively straightforward to allow for the possible dependence of $x_{it}$ on $\varepsilon_{it}$ when estimating the long-run coefficients, as long as $x_{it}$ have finite-order autoregressive representations.

Assumption 2. The ARDL(p, q, q, ..., q) model (1) is stable in that the roots of

$$\sum_{j=1}^{p} \lambda_{ij} z^{j} = 1, \qquad i = 1, 2, \ldots, N,$$

lie outside the unit circle.

This assumption ensures that $\phi_i < 0$, and hence there exists a long-run relationship between $y_{it}$ and $x_{it}$ defined by

$$y_{it} = -(\beta_i'/\phi_i)\, x_{it} + \eta_{it},$$

for each i = 1, 2, ..., N, where $\eta_{it}$ is a stationary process. Pesaran, Shin, and R. J. Smith (1999) provide a general framework for testing assumption 2, irrespective of whether the regressors, $x_{it}$, are I(1) or I(0). Assumption 2 also ensures that the order of integration of $y_{it}$ is at most equal to that of $x_{it}$.

Assumption 3 (Long-Run Homogeneity). The long-run coefficients on $X_i$, defined by $\theta_i = -\beta_i/\phi_i$, are the same across the groups, namely

$$\theta_i = \theta, \qquad i = 1, 2, \ldots, N. \tag{5}$$

Under Assumptions 2 and 3, relation (4) can be written more compactly as

$$\Delta y_i = \phi_i\, \xi_i(\theta) + W_i \kappa_i + \varepsilon_i, \qquad i = 1, 2, \ldots, N, \tag{6}$$

where

$$\xi_i(\theta) = y_{i,-1} - X_i \theta, \qquad i = 1, 2, \ldots, N, \tag{7}$$

is the error correction component, $W_i = (\Delta y_{i,-1}, \ldots, \Delta y_{i,-p+1}, \Delta X_i, \Delta X_{i,-1}, \ldots, \Delta X_{i,-q+1}, \iota)$, and $\kappa_i = (\lambda_{i1}^{*}, \ldots, \lambda_{i,p-1}^{*}, \delta_{i0}^{*\prime}, \delta_{i1}^{*\prime}, \ldots, \delta_{i,q-1}^{*\prime}, \mu_i)'$. The group-specific equations in the panel (6) are nonlinear in $\phi_i$ and $\theta$, and since $\theta$ is common across groups the panel is subject to cross-equation parameter restrictions. In what follows we also allow the error variances, $\operatorname{var}(\varepsilon_{it}) = \sigma_i^2$, to differ across groups.

To estimate the model, we adopt a likelihood approach and initially assume that the disturbances $\varepsilon_{it}$ are normally distributed, though this assumption is not required for the asymptotic results. Under Assumption 1, the likelihood of the panel data model can be written as the product of the likelihoods for each group. Because the parameters of interest are the long-run effects and adjustment coefficients, we work directly with the concentrated log-likelihood function. Given normality, we have

$$\ell_T(\boldsymbol{\varphi}) = -\frac{T}{2} \sum_{i=1}^{N} \ln\left(2\pi\sigma_i^2\right) - \frac{1}{2} \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left\{\Delta y_i - \phi_i\, \xi_i(\theta)\right\}' H_i \left\{\Delta y_i - \phi_i\, \xi_i(\theta)\right\}, \tag{8}$$

where $H_i = I_T - W_i (W_i' W_i)^{-1} W_i'$, $\boldsymbol{\varphi} = (\theta', \boldsymbol{\phi}', \boldsymbol{\sigma}')'$, $\boldsymbol{\phi} = (\phi_1, \phi_2, \ldots, \phi_N)'$, and $\boldsymbol{\sigma} = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2)'$.

For the proof of consistency and asymptotic normality of the PMG estimators, we also require the following additional assumptions.

Assumption 4. $\boldsymbol{\varphi} \in \Theta_{\varphi} = \Theta_{\theta} \times \Theta_{\phi} \times \Theta_{\sigma}$, where $\Theta_{\varphi}$ is a compact subset of $\mathbb{R}^{n_{\varphi}}$, with $n_{\varphi} = k + 2N$. The true value of $\boldsymbol{\varphi}$, denoted by $\boldsymbol{\varphi}_0 = (\theta_0', \boldsymbol{\phi}_0', \boldsymbol{\sigma}_0')'$, is an interior point of $\Theta_{\varphi}$.

Assumption 5.

a. For a given sample size, T, and all values of $\boldsymbol{\varphi} \in \Theta_{\varphi}$, the redefined observation matrix $Z(\boldsymbol{\varphi}) = [X(\boldsymbol{\phi}, \boldsymbol{\sigma}), \xi(\theta, \boldsymbol{\sigma}), W(\boldsymbol{\sigma})]$ has a full column rank, where

$$X(\boldsymbol{\phi}, \boldsymbol{\sigma}) = -\left[(\phi_1/\sigma_1) X_1', (\phi_2/\sigma_2) X_2', \ldots, (\phi_N/\sigma_N) X_N'\right]'$$

and $W(\boldsymbol{\sigma})$ and $\xi(\theta, \boldsymbol{\sigma})$ are block diagonal matrices with their ith blocks given by $W_i/\sigma_i$ and $\xi_i(\theta)/\sigma_i$.

b. When $x_{it}$ are stationary, $T^{-1} Z'(\boldsymbol{\varphi}) Z(\boldsymbol{\varphi})$ converges in probability to a positive definite matrix as $T \to \infty$.

c. When $x_{it}$ are I(1), as $T \to \infty$, $K_z Z'(\boldsymbol{\varphi}) Z(\boldsymbol{\varphi}) K_z$ weakly converges in probability to a positive definite random matrix with probability 1, where $K_z = \operatorname{diag}\left(T^{-1} I_k,\, T^{-1/2} I_N,\, T^{-1/2} I_{N(p+kq)}\right)$.

Assumption 5 sets out standard identification conditions and rules out the possibility of exact multicollinearity. In the case where $x_{it}$ are I(1), part c of this assumption ensures that $x_{it}$ are not themselves cointegrated. Notice also that this assumption is weaker than is needed for identification of the group-specific long-run coefficients, $\theta_i$, if the long-run homogeneity assumption 3 is not imposed.

Necessary conditions for parts b and c of assumption 5 to hold are given in the Appendix, Section A.1. In the case where $x_{it}$ are I(0), the pooled cross-product observation matrix on the regressors, $N^{-1} \sum_{i=1}^{N} (\phi_i^2/\sigma_i^2)\, T^{-1} X_i' H_i X_i$, must converge in probability to a fixed positive definite matrix as $T \to \infty$. In the case where $x_{it}$ are I(1), $N^{-1} \sum_{i=1}^{N} (\phi_i^2/\sigma_i^2)\, T^{-2} X_i' H_i X_i$ must weakly converge to a random positive definite matrix with probability 1 as $T \to \infty$. These conditions also cover the case where both T and N are large. The conditions should hold for all feasible values of $\phi_i$ and $\sigma_i^2$.

4. THE POOLED MEAN GROUP ESTIMATOR

Maximum likelihood (ML) estimates of the long-run coefficients, $\theta$, and the group-specific error-correction coefficients, $\phi_i$, can be computed by maximizing (8) with respect to $\boldsymbol{\varphi}$. These ML estimators (MLEs) are termed the pooled mean group (PMG) estimators to highlight both the pooling implied by the homogeneity restrictions on the long-run coefficients and the averaging across groups used to obtain means of the estimated error-correction coefficients and the other short-run parameters of the model.

The PMG estimators can be computed by the familiar Newton-Raphson algorithm, which makes use of both the first and second derivatives. Alternatively, they can be computed by a "back-substitution" algorithm that makes use of only the first derivatives of (8). In this case, setting the first derivatives of the concentrated log-likelihood function with respect to $\boldsymbol{\varphi}$ to 0 yields the following relations in $\hat{\theta}$, $\hat{\phi}_i$, and $\hat{\sigma}_i^2$, which need to be solved iteratively:

$$\hat{\theta} = -\left\{\sum_{i=1}^{N} \frac{\hat{\phi}_i^2}{\hat{\sigma}_i^2} X_i' H_i X_i\right\}^{-1} \left\{\sum_{i=1}^{N} \frac{\hat{\phi}_i}{\hat{\sigma}_i^2} X_i' H_i \left(\Delta y_i - \hat{\phi}_i\, y_{i,-1}\right)\right\}, \tag{9}$$

$$\hat{\phi}_i = \left(\hat{\xi}_i' H_i \hat{\xi}_i\right)^{-1} \hat{\xi}_i' H_i \Delta y_i, \qquad i = 1, 2, \ldots, N, \tag{10}$$

and

$$\hat{\sigma}_i^2 = T^{-1} \left(\Delta y_i - \hat{\phi}_i \hat{\xi}_i\right)' H_i \left(\Delta y_i - \hat{\phi}_i \hat{\xi}_i\right), \qquad i = 1, 2, \ldots, N, \tag{11}$$

where $\hat{\xi}_i = y_{i,-1} - X_i \hat{\theta}$. To simplify the notations, we denote $\xi_i(\hat{\theta})$ by $\hat{\xi}_i$. Starting with an initial estimate of $\theta$, say $\hat{\theta}^{(0)}$, estimates of $\phi_i$ and $\sigma_i^2$ can be computed using (10) and (11), which can then be substituted in (9) to obtain a new estimate of $\theta$, say $\hat{\theta}^{(1)}$, and so on until convergence is achieved.

In order to derive the asymptotic distribution of the PMG estimators we distinguish between the cases of stationary and nonstationary regressors, $x_{it}$. Although in principle the same algorithm can be used to compute the PMG estimators irrespective of whether the regressors are I(0) or I(1), the underlying asymptotic theories for these two cases are fundamentally different and their derivations require separate treatments.

4.1 The Case of Stationary Regressors

In this case, under fairly standard conditions, the consistency and the asymptotic normality of the ML estimators of the parameters in (8) can be easily established. We set out the consistency and the asymptotic distribution of the MLEs of $\boldsymbol{\varphi}$, denoted by $\hat{\boldsymbol{\varphi}} = (\hat{\theta}', \hat{\boldsymbol{\phi}}', \hat{\boldsymbol{\sigma}}')'$, in the following theorem.

Theorem 1. Under assumptions 1-4, and 5(a) and (b), and assuming that the regressors $x_{it}$ are stationary, the MLE of $\boldsymbol{\varphi}$ in the dynamic heterogeneous panel data model (6) is consistent. Furthermore, as $T \to \infty$ for a fixed N, the PMG estimator of $\psi = (\theta', \boldsymbol{\phi}')'$ has the following asymptotic distribution:

$$\sqrt{T}\,(\hat{\psi} - \psi_0) \overset{a}{\sim} N\{0, J^{-1}(\psi_0)\}, \tag{12}$$

where $J(\psi_0)$ is the $(k + N) \times (k + N)$ information matrix given by

$$J(\psi_0) = \begin{bmatrix}
\sum_{i=1}^{N} \dfrac{\phi_{i0}^2}{\sigma_{i0}^2} Q_{X_i X_i} & -\dfrac{\phi_{10}}{\sigma_{10}^2} Q_{X_1 \xi_{10}} & \cdots & -\dfrac{\phi_{N0}}{\sigma_{N0}^2} Q_{X_N \xi_{N0}} \\
-\dfrac{\phi_{10}}{\sigma_{10}^2} Q_{X_1 \xi_{10}}' & \dfrac{1}{\sigma_{10}^2}\, q_{\xi_{10}\xi_{10}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\dfrac{\phi_{N0}}{\sigma_{N0}^2} Q_{X_N \xi_{N0}}' & 0 & \cdots & \dfrac{1}{\sigma_{N0}^2}\, q_{\xi_{N0}\xi_{N0}}
\end{bmatrix}, \tag{13}$$

with $Q_{X_i X_i}$, $Q_{X_i \xi_{i0}}$, and $q_{\xi_{i0}\xi_{i0}}$ being the probability limits of $T^{-1} X_i' H_i X_i$, $T^{-1} X_i' H_i \xi_{i0}$, and $T^{-1} \xi_{i0}' H_i \xi_{i0}$.

Proof. In this case the proof can be established using familiar results. (Details can be found in Pesaran, Shin, and Smith 1997, appendix.)

Remark 1. Focusing on the long-run coefficients, for a fixed N as $T \to \infty$ the pooled MLE $\hat{\theta}$, defined by (9), is asymptotically distributed as

$$\sqrt{T}\,(\hat{\theta} - \theta_0) \overset{a}{\sim} N\left\{0, \left[\sum_{i=1}^{N} \frac{\phi_{i0}^2}{\sigma_{i0}^2} \left(Q_{X_i X_i} - Q_{X_i \xi_{i0}}\, q_{\xi_{i0}\xi_{i0}}^{-1}\, Q_{X_i \xi_{i0}}'\right)\right]^{-1}\right\}. \tag{14}$$

4.2 The Case of Nonstationary Regressors

Here the underlying regressors are assumed to follow integrated processes of order 1, namely I(1). The asymptotic analysis in this case is more complicated, due to the fact

that the MLEs of the long-run and the short-run parameters, $\theta$ and $(\boldsymbol{\phi}', \boldsymbol{\sigma}')'$, converge to their true values at different rates, and the mere consistency of the MLEs is not sufficient to guarantee the weak convergence of the sample information matrix to its true value. In particular, to ensure such a convergence, the sample information matrix should satisfy the stochastic equicontinuity condition. (For a general treatment of these issues, see Pesaran and Shin 1999; Saikkonen 1995.)

The following theorem establishes the consistency, the relative rates of convergence, and the asymptotic distribution of the MLE of $\boldsymbol{\varphi}$.

Theorem 2. Under Assumptions 1-4 and 5(a) and (c), and assuming that $x_{it}$ are I(1), the MLEs of the short-run coefficients $\boldsymbol{\phi}$ and $\boldsymbol{\sigma}$ in the dynamic heterogeneous panel data model (6) are $\sqrt{T}$ consistent and the MLE of $\theta$ is T consistent, namely

$$\ldots \quad \text{and} \quad \hat{\boldsymbol{\sigma}} - \boldsymbol{\sigma}_0 = o_p(1). \tag{15}$$

Furthermore, for a fixed N and as $T \to \infty$, the MLE of $\psi = (\theta', \boldsymbol{\phi}')'$ asymptotically has the mixture-normal distribution

$$\mathcal{D}_{\psi}^{-1}(\hat{\psi} - \psi_0) \overset{a}{\sim} MN\{0, I^{-1}(\psi_0)\}, \tag{16}$$

where $\mathcal{D}_{\psi} = \operatorname{diag}(T^{-1} I_k, T^{-1/2} I_N)$ and $I(\psi_0)$ is the random information matrix defined by (A20).

Proof. See the Appendix, Section A2.

Remark 2. More specifically, the pooled MLE $\hat{\theta}$, defined by (9), has the following large T asymptotic distribution:

$$T(\hat{\theta} - \theta_0) \overset{a}{\sim} MN\left\{0, \left(\sum_{i=1}^{N} \frac{\phi_{i0}^2}{\sigma_{i0}^2} R_{X_i X_i}\right)^{-1}\right\}, \tag{17}$$

where $R_{X_i X_i}$, i = 1, 2, ..., N, are the random probability limits defined by (A1). Notice that it is not required that all the matrices $R_{X_i X_i}$ be positive definite. It is, however, easily seen that by part (c) of Assumption 5, $\sum_{i=1}^{N} (\phi_{i0}^2/\sigma_{i0}^2) R_{X_i X_i}$ is a positive definite matrix with probability 1. Also note that in the case of I(1) regressors, the MLEs of the long-run and short-run parameters are asymptotically distributed independently of each other. [For a proof of (17), see Section A3 in the Appendix.]

Once the pooled MLE of the long-run parameters, $\hat{\theta}$, is successfully computed, the short-run coefficients, including the group-specific error-correction coefficients, $\phi_i$, can be consistently estimated by running the individual ordinary least squares (OLS) regressions of $\Delta y_i$ on $(\hat{\xi}_i, W_i)$, i = 1, ..., N, where $\hat{\xi}_i = y_{i,-1} - X_i \hat{\theta}$. The covariance matrix of the MLEs, $(\hat{\theta}', \hat{\phi}_1, \ldots, \hat{\phi}_N, \hat{\kappa}_1', \ldots, \hat{\kappa}_N')'$, is then consistently estimated by the inverse of

$$\begin{bmatrix}
\sum_{i=1}^{N} \dfrac{\hat{\phi}_i^2 X_i' X_i}{\hat{\sigma}_i^2} & -\dfrac{\hat{\phi}_1 X_1' \hat{\xi}_1}{\hat{\sigma}_1^2} & \cdots & -\dfrac{\hat{\phi}_N X_N' \hat{\xi}_N}{\hat{\sigma}_N^2} & -\dfrac{\hat{\phi}_1 X_1' W_1}{\hat{\sigma}_1^2} & \cdots & -\dfrac{\hat{\phi}_N X_N' W_N}{\hat{\sigma}_N^2} \\
 & \dfrac{\hat{\xi}_1' \hat{\xi}_1}{\hat{\sigma}_1^2} & \cdots & 0 & \dfrac{\hat{\xi}_1' W_1}{\hat{\sigma}_1^2} & \cdots & 0 \\
 & & \ddots & \vdots & \vdots & \ddots & \vdots \\
 & & & \dfrac{\hat{\xi}_N' \hat{\xi}_N}{\hat{\sigma}_N^2} & 0 & \cdots & \dfrac{\hat{\xi}_N' W_N}{\hat{\sigma}_N^2} \\
 & & & & \dfrac{W_1' W_1}{\hat{\sigma}_1^2} & \cdots & 0 \\
 & & & & & \ddots & \vdots \\
 & & & & & & \dfrac{W_N' W_N}{\hat{\sigma}_N^2}
\end{bmatrix}.$$

4.3 The Case of Large T and Large N

In this case the mean of the error correction coefficients and the other short-run parameters can be estimated consistently by the unweighted average of the individual coefficients (or the MG estimators):

$$\hat{\phi}_{MG} = N^{-1} \sum_{i=1}^{N} \hat{\phi}_i, \qquad \hat{\kappa}_{MG} = N^{-1} \sum_{i=1}^{N} \hat{\kappa}_i.$$

The variance of these estimators can be consistently estimated along the lines suggested by Pesaran, Smith, and Im (1996). For example, in the case of $\hat{\phi}_{MG}$, a consistent estimator of the variance of $\hat{\phi}_{MG}$ is given by

$$\hat{\Delta}_{\hat{\phi}} = \frac{1}{N(N-1)} \sum_{i=1}^{N} \left(\hat{\phi}_i - \hat{\phi}_{MG}\right)^2.$$

Alternatively, one can use the weighted estimator proposed by Swamy (1970) in the context of the static random coefficient models. But as was shown by Hsiao, Pesaran, and Tahmiscioglu (1999) and noted earlier in this article, the Swamy estimator (also known as the empirical Bayes estimator) and the MG estimator are asymptotically equivalent as $T \to \infty$ and $N \to \infty$ such that $\sqrt{N}/T \to 0$. Under these conditions, the asymptotic distribution of $\hat{\phi}_{MG}$, for

example, is given by

$$\sqrt{N}\,(\hat{\phi}_{MG} - \phi) \overset{a}{\sim} N(0, \Delta_{\phi}),$$

where $\phi = E(\phi_i)$ and $\Delta_{\phi} = \operatorname{var}(\phi_i)$.

For the common long-run coefficients, $\theta$, the pooled MLE is consistent as long as $T \to \infty$, irrespective of whether N is large or not. To obtain the asymptotic distribution of $\hat{\theta}$ when N is large, we need to assume that N is a monotonic function of T, say N(T), such that $N \to \infty$ only as $T \to \infty$. In this setting the rate of convergence of $\hat{\theta}$ toward its true value is given by $\sqrt{NT}$ in the case where the regressors are I(0). For this result to hold, the limit of

$$N^{-1} \sum_{i=1}^{N} \frac{\phi_{i0}^2}{\sigma_{i0}^2} \left(Q_{X_i X_i} - Q_{X_i \xi_{i0}}\, q_{\xi_{i0}\xi_{i0}}^{-1}\, Q_{X_i \xi_{i0}}'\right)$$

as $N \to \infty$ must be a positive definite matrix [see (14)]. When the regressors are I(1), the rate of convergence of $\hat{\theta}$ is given by $T\sqrt{N}$ as long as

$$N^{-1} \sum_{i=1}^{N} \frac{\phi_{i0}^2}{\sigma_{i0}^2} R_{X_i X_i}$$

tends to a positive definite matrix with probability 1, as $N \to \infty$ [see (17)]. Notice that $\hat{\theta}$ will not be consistent for finite T, even if $N \to \infty$.

5. EMPIRICAL APPLICATIONS

Before considering the two examples, we briefly discuss three modeling issues. The first issue is that although

the fixed-effects estimators. (The details of such tests were discussed in Pesaran, Smith and Im 1996.) The third issue is that although the MG estimator is consistent for large N and T, for small T the familiar lagged dependent variable bias causes the estimates of $\lambda_i$ and $\phi_i = (\lambda_i - 1)$ to underestimate their true values. Large N does not help with this problem, as all of the estimates are biased in the same direction. Pesaran and Zhao (1999) proposed a bias-corrected MG estimator that directly adjusts the long-run coefficient by an estimate of its bias. In the case of the pooled estimators, the downward lagged dependent variable bias, from which they also suffer, may offset the upward heterogeneity bias discussed by Pesaran and Smith (1995). In empirical applications it may be difficult to judge the relative effect of the two biases, making inference about the speed of adjustment difficult. This is a feature of both of our examples.

5.1 The Consumption Function in the OECD

The first example that we examine is a standard consumption function of the Davidson et al. (1978) type for a sample of OECD countries. Similar specifications have also been estimated for a number of developing countries by Haque and Montiel (1989). We assume that the long-run consumption function is given by

$$\ldots, \qquad i = 1, 2, \ldots, N, \quad t = 1, 2, \ldots, T,$$

where $c_{it}$ is the logarithm of real consumption per capita, $y_{it}^{d}$ is the logarithm of real per capita disposable income, and $\pi_{it}$ is the rate of inflation. Most theories of aggregate con-
the aforementioned theory assumes that all long-run coef- sumption would suggest that 81i = 1. The PMG estimation
ficients are the same across groups, it is straightforward procedure allows us to estimate a common long-run coeffi-
to adapt the PMG estimator to allow only a subset of the cient and test whether it is unity. The inflation variable, ti«.
long-run coefficients to be the same while the others differ. is a proxy for various wealth effects, and we would expect
Details have been given in the earlier version of this article 82i < o. We assume that all of these variables are 1(1) and
(Pesaran, Shin, and Smith 1997). The second issue is that cointegrated, making Uit an 1(0) process for all i. In this
we need to choose between the alternative specifications. application we take the maximum lag as being 1; thus the
Tests of homogeneity of error variances and/or short- or autoregressive distributed lag (ARDL) (1,1,1) equation is
long-run slope coefficients can be easily carried out using
likelihood ratio and other classical statistical tests, because
Cit = I-ti + 610iyft + 61liyf t-l + 620i 1rit
the PMG and fixed-effects estimators are restricted versions + 621i 1ri,t-l + AiCi,t-1 + Cit,
of the set of individual group equations. Although it is com- and the error correction equation is
mon to use pooled estimators without testing the implied
restrictions, in the case of cross-country studies the like- t!.Cit = cPi(Ci,t-l - 80i - 81i Y it - 82i 1rit)
lihood ratio tests usually reject equality of error variances - 6 1l it!.yt - 6 21it!. 1rit + Cit,
and/or slopes (short-run or long-run) at conventional signif-
icance levels. This is the case in both of our examples. The where
interpretation of this feature has been discussed extensively I-ti
by Pesaran, Smith, and Akiyama (1998), and we return to (}Oi = 1- Ai'

this work in our conclusion. An alternative to likelihood


ratio tests would be to use Hausman (1978)-type tests. The () . _ 620i + 621i cPi = -(1 - Ai).
2t - 1 - Ai '
MG estimator provides consistent estimates of the mean
of the long-run coefficients, though these will be inefficient The foregoing error correction equations are written in
if slope homogeneity holds. Under long-run slope homo- terms of current, rather than lagged levels of the exoge-
geneity, the pooled estimators are consistent and efficient. nous regressors. This allows an ARDL(l, 0, 0) as a special
Therefore, the effect of heterogeneity on the means of the case.
coefficients can be determined by a Hausman-type test ap- We measure consumption by the logarithm of total real
plied to the difference between the MG and the PMG or private consumption per capita and inflation by the change
in the logarithm of the consumption deflator. The initial measure of income that we use is national disposable per capita income deflated by the consumption deflator, NDI. (The data are taken from the OECD National Accounts Statistics and were collected as part of an IMF project on international saving rates.) Using 1960 and 1961 to create lags, we have data for 32 years (1962-1993) for 23 countries and 31 years (1962-1992) for Belgium.

First, a common ARDL(1, 1, 1) was run for each country separately. The results are given in the earlier version of this article (Pesaran, Shin, and Smith 1997). For a long-run relationship to exist, we require that $\phi_i \neq 0$. It is clear that in a number of countries, the hypothesis of no long-run relation would not be rejected. (Pesaran, Shin, and R. J. Smith [1999] develop a test for the existence of a long-run relationship in ARDLs of this form.) The long-run income elasticities range from .62 in Switzerland to 1.19 in the United States and are significantly less than unity in nine countries and significantly greater than unity in three (Italy, the United Kingdom, and the United States). The long-run inflation coefficient is more dispersed, ranging between 1.46 in Germany and -1.25 in Canada. However, it is negative in all but four countries. More than 60% of the change in the logarithm of per capita consumption is explained in all but three countries: Luxembourg, Norway, and Austria. The standard error of regression varies from .5% in France to 3.3% in Turkey, and the equality of error variances does not seem to be an appropriate assumption, a result borne out by formal statistical tests. At the 5% level, there is evidence of serial correlation in the equations for two of the 24 countries, and of functional form misspecification in three, nonnormal errors in two, and heteroscedasticity in one. The tests have been described by Pesaran and Pesaran (1997). The RESET tests for functional form and the test of heteroscedasticity are based on the fitted values for the changes in the logarithm of per capita consumption. The fact that 17 of the 24 country-specific equations show no evidence of misspecification is reassuring.

Table 1 presents the alternative pooled estimates: MG, which imposes no restrictions; PMG, which imposes common long-run effects; and DFE, which constrains all of the slope coefficients and error variances to be the same. The PMG computations were carried out using the Newton-Raphson algorithm in a program written in Gauss. The program and data are available on http://www.econ.cam.ac.uk/faculty/pesaran. The program allows the user to start the iterations with different sets of initial values, depending on the nature of the ARDL specification across the groups. When the same ARDL is estimated across the countries, the MG and DFE estimates of the long-run coefficients can be used as initial values. In the examples considered here, these resulted in the same maximum for the pooled log-likelihood function. This has not been the case in some other examples.

Table 1. Alternative Pooled Estimates for ARDL(1, 1, 1) Consumption Functions for OECD Countries Over the Period 1962-1993 Using National Disposable Income

                                          MG             PMG            DFE
Income elasticity, θ1                 .918 (.027)    .904 (.010)    .912 (.045)
Inflation effect, θ2                 -.353 (.117)   -.466 (.063)   -.266 (.099)
Speed of adjustment, φ               -.306 (.030)   -.200 (.032)   -.179 (.042)
Maximized log likelihood                2,390          2,247          1,999
Number of parameters estimated            168            122             30

NOTE: Figures in parentheses are asymptotic standard errors. In the case of the dynamic fixed effects estimates the standard errors are corrected for possible heteroscedasticity in the cross-sectional error variances.

The DFE standard errors are corrected for the heteroscedasticity of error variances across countries; the uncorrected ones are substantially smaller. The heteroscedasticity robust standard errors are computed allowing for a general covariance matrix of the disturbances $\varepsilon_{it}$ across i (see, e.g., Baltagi 1995, pp. 12-13). The three estimates of the long-run income elasticities are very similar, close to .91 and significantly less than unity. The long-run inflation coefficients are all significantly negative, though the estimates differ somewhat. As the econometric theory suggests, imposing homogeneity causes an upward bias in the coefficient of the lagged dependent variable, and the MG estimate suggests much faster adjustment than the PMG or DFE. The MG estimate of the adjustment coefficient also has the smallest standard error.

Imposing long-run homogeneity reduces the standard errors of the long-run coefficients, but does not change the estimates very much. This is confirmed by the Hausman test statistic of 1.19, which is $\chi^2(2)$ under the null hypothesis of no difference between the MG and PMG estimators. As is clear from Table 1, likelihood ratio tests at conventional significance levels, on the other hand, would strongly reject all of the restrictions, including homogeneity of long-run coefficients. This is true for both income and inflation. If homogeneity of just the long-run income elasticities is imposed (23 restrictions), then the maximized log-likelihood (MLL) falls from 2,390 to 2,270. If homogeneity of both income and inflation long-run effects is imposed (46 restrictions), then the MLL falls to 2,247.

One advantage of the PMG over the traditional DFE model is that it can allow the short-run dynamic specification to differ from country to country. The lag order was first chosen in each country on the unrestricted model by the Schwarz Bayesian criterion (SBC), subject to a maximum lag of 1. Then, using these SBC-determined lag orders, homogeneity was imposed. The most common choice, in half of the countries, was an ARDL(1, 1, 0), one lag of consumption and income but only current inflation. Performance on the misspecification tests and estimates of the long-run effects were very similar to those obtained using an ARDL(1, 1, 1) model. Again, the Hausman test statistic does not reject equality between the MG and PMG estimates, whereas the likelihood ratio test statistic of 281 rejects equality of all of the long-run coefficients. Notice that under the null hypothesis, this statistic is asymptotically distributed as $\chi^2(46)$. In this example, estimates of the long-run coefficients are robust to the order of the ARDL. This is partly because T is large; as shown in the next example, when T is small, the choice of lag order is more important.

Although data for national disposable income is widely available, it is not the most appropriate theoretical measure
of income. Thus we also estimated an equation using a better measure, private disposable labor income (interest and dividend income are excluded). Data availability reduces the sample to 15 countries, with some countries having as few as 8 observations (the Netherlands). The characteristics of these data allow us to investigate the sensitivity of the estimators to a number of interesting features. The panel is very unbalanced, with T very small in some cases; in these cases, choosing lag order by the SBC is more reliable than using a common ARDL(1, 1, 1), which tends to show misspecification. There is also an extreme outlier. In Sweden, where the SBC chose ARDL(1, 1, 0), the speed of adjustment was -.03 (.154), the long-run income elasticity was -3.74 (25.22), and the long-run inflation coefficient was -7.768 (40.36). Although this equation passes the four misspecification tests, the fit is very poor, and it distorts the MG estimates.

Table 2 presents the alternative pooled estimates of the long-run income elasticity for three groups of countries (N = 15, 14, and 9), for three ARDL specifications. The N = 15 group includes all the countries for which there are data on private disposable labor income. The N = 14 group excludes Sweden. The N = 9 group also excludes countries for which T is 13 or less (Italy, Netherlands, New Zealand, Portugal, Spain). The effect of Sweden on the MG estimator in the N = 15 group is very obvious, completely distorting the ARDL(1, 1, 1) estimate and severely biasing the SBC estimate. The PMG estimates are much less sensitive, as one would expect, given the large standard errors on the Swedish estimates. Dropping Sweden from the sample hardly changes the PMG estimate, but changes the MG substantially. Again, the Hausman test statistic does not reject equality between the MG and PMG, whereas the likelihood ratio test statistic rejects equality of all of the long-run coefficients.

Table 2. Alternative Pooled Estimates of the Long-Run Consumption Income Elasticity Using per Capita Real Private Disposable Labor Income

 N    ARDL        MG              PMG            DFE
15    1,0,0     .948 (.070)    .933 (.010)    .936 (.022)
15    1,1,1    -12.8 (13.8)    .918 (.016)    .935 (.023)
15    SBC       .669 (.319)    .938 (.013)
14    1,0,0     .999 (.052)    .934 (.010)    .94 (.012)
14    1,1,1     .918 (.066)    .919 (.016)    .943 (.021)
14    SBC       .985 (.054)    .939 (.013)
 9    1,0,0     .993 (.060)    .926 (.011)    .931 (.014)
 9    1,1,1     .916 (.100)    .927 (.015)    .925 (.025)
 9    SBC       .995 (.059)    .941 (.013)

NOTE: Figures in parentheses are asymptotic standard errors. In the case of the dynamic fixed effects estimates the standard errors are corrected for possible heteroscedasticity in the cross-sectional error variances.

These results indicate that the PMG seems quite robust to outliers and to choice of lag order. Overall the estimates suggest that the average long-run income elasticity is rather larger than .9, but significantly less than unity on PMG standard errors. However, these standard errors may be underestimates. The average long-run inflation effect is significantly less than 0. These general conclusions seem robust to lag order, sample, and estimation method. The average adjustment coefficient seems to be significantly less than 0 and greater than -.5, but the estimates of the adjustment coefficients tend to show more sensitivity to the choice of the estimation method. The example of the consumption function was chosen to demonstrate the PMG estimator because of its familiarity and importance. However, it should be recognized that modeling consumption raises many difficult issues not captured by this simple model. Clearly, some important variables are omitted, including initial wealth, real interest rate effects, credit regulations, government budget surpluses (deficits), and demographics. This type of equation has proved structurally unstable in a number of the countries in our sample. In many of the countries one would not be able to reject the null hypothesis of no long-run relationship, though these tests have low power. The MG estimates are also subject to small T bias because of the familiar lagged dependent variable problem.

5.2 Energy Demand in Asian Developing Countries

Our second example involves explaining the logarithm of energy consumption (measured in tons of oil equivalents [TOE] per capita), $e_{it}$, by the logarithm of real per capita output in international prices taken from the Penn World Tables, $y_{it}$, and by the logarithm of real energy prices (average energy prices per TOE in local currency, deflated by the consumer price index), $p_{it}$. The data are for 10 Asian developing countries for the years 1974-1990; thus both N and T are small. This example is drawn from Pesaran, Smith, and Akiyama (1998), who discussed the data in more detail and investigated functional form, dynamics, disaggregation, and various other issues.

The initial unrestricted model is a partial adjustment (ARDL(1, 0, 0)) model:

$$e_{it} = \mu_i + \delta_{10i}\,y_{it} + \delta_{20i}\,p_{it} + \lambda_i\,e_{i,t-1} + \varepsilon_{it}.$$

Partial adjustment and homogeneity of degree zero in prices were not rejected in all but one of the 10 countries (Taiwan). In the country-specific regressions, the long-run income and price elasticity estimates for Malaysia, Sri Lanka, and Bangladesh are implausible, which could be due to data inadequacies and the high level of aggregation. Apart from these, the estimates are reasonable. The long-run income elasticity ranges from .835 to 1.564, with an average of 1.12; the long-run price elasticities range from .05 to -.488, with an average estimate of -.22. All of the estimates of the adjustment coefficients, $\phi_i$, are negative and fall in the range -.132 to -.825. The standard error of the regression varies from 1.7% in the case of Thailand to 6.3% in Sri Lanka; so again, the assumption of constant error variances across countries seems inappropriate. Diagnostic tests reveal few problems: one failure for serial correlation and one for normality at the 5% level. When the order of each lag is chosen by the SBC for each country, starting with a general ARDL(1, 1, 1) specification, the ARDL(1, 0, 0) is chosen for seven countries, a static model is chosen for India and Sri Lanka, and an unstable ARDL(1, 0, 1) is chosen for Taiwan.

The alternative pooled estimators for the ARDL(1, 0, 0) specification are presented in Table 3. The table presents estimates of the long-run output and price elasticities and the adjustment coefficients, together with the MLL, the number of observations, and the number of parameters estimated (including the error variances). As in the previous example, likelihood ratio tests reject the hypothesis of equality of long-run coefficients, whereas Hausman tests do not reject this hypothesis when the PMG and MG estimators are compared. Because the MG estimates have large standard errors, the Hausman test is likely to have low power. If the focus of the analysis is on average (across countries) income and price elasticities, then the PMG estimates are probably preferable to the MG estimates on the grounds of their better precision and the fact that they are less sensitive to outlier estimates. In this context, the Hausman test can be seen as providing formal statistical evidence that we are not in gross violation of the data by relying on PMG estimates rather than the MG estimates. The estimates suggest a long-run income elasticity of 1.18 (.039) and a price elasticity of -.34 (.033). The PMG estimates are close to the MG estimates, and the DFE estimates are rather larger. The dynamics clearly matter, and the static fixed-effects estimator has an insignificant price effect. The standard errors of both the PMG and the DFE are very much smaller than those of the MG; pooling sharpens the estimates considerably. As before, pooling also leads to a much smaller estimated speed of adjustment; the MG estimates suggest speeds of convergence to equilibrium of around 50% per year; the PMG, 30%; and the DFE, about 20%.

Table 3. Alternative Pooled Estimators for ARDL(1, 0, 0) Energy Demand Function for 10 Asian Economies Over the Period 1974-1990

                                  MG             PMG            DFE           Static fixed-effects
                                  estimators     estimators     estimators    estimators
Output elasticity (θ1)            1.228 (.183)   1.184 (.039)   1.301 (.109)  1.009 (.037)
Price elasticity (θ2)             -.261 (.118)   -.339 (.033)   -.365 (.097)  -.067 (.030)
Adjustment coefficient (φ)        -.524 (.070)   -.298 (.063)   -.235 (.040)  -1 (N/A)
Log-likelihood                    347.12         322.79         288.36        186.95
N × T                             170            170            170           170
No. of estimated parameters       50             32             15            13

NOTE: Figures in parentheses are asymptotic standard errors. In the case of the dynamic fixed effects estimates the standard errors are corrected for possible heteroscedasticity in the cross-sectional error variances.

Table 4 gives MG and PMG estimates when the order of the specification in the individual countries is chosen by the SBC. This procedure cannot be used with the fixed-effects estimator; the ARDL(1, 1, 1) estimates are given instead. Again, the PMG estimates of the long-run price and income elasticities hardly change, though the estimated speed of adjustment is rather higher, partly because the SBC chooses the static model, with instantaneous adjustment, in the case of some countries. Using the SBC gives a substantial improvement in the MLL for the MG estimator (at 357 it is close to the ARDL(1, 1, 1) MLL of 362, with far fewer parameters) but less for the PMG, where the SBC MLL is worse than the ARDL(1, 0, 0) MLL. It is clear that homogeneity restrictions and dynamic specification interact in a complex way. What might be the optimal order for the country-specific estimates may not be the optimal order when cross-country homogeneity restrictions are imposed. The rather serious policy implications of an income elasticity around 1.2 and a price elasticity around -.3 have been discussed by Pesaran, Smith, and Akiyama (1998).

Table 4. Alternative Pooled Estimators for ARDL-SBC Energy Demand Function

                                  MG             PMG            DFE estimators
                                  estimators     estimators     ARDL(1, 1, 1)
Output elasticity (θ1)            1.277 (.182)   1.171 (.036)   1.181 (.147)
Price elasticity (θ2)             -.160 (.164)   -.301 (.028)   -.324 (.0137)
Adjustment coefficient (φ)        -.502 (.113)   -.417 (.121)   -.153 (.042)
Log-likelihood                    356.81         315.41         300.27
N × T                             170            170            170
No. of estimated parameters       51             33             17

NOTE: Figures in parentheses are asymptotic standard errors. In the case of the dynamic fixed effects estimates the standard errors are corrected for possible heteroscedasticity in the cross-sectional error variances.

6. CONCLUDING REMARKS

The PMG estimator, which assumes homogeneous long-run coefficients, provides a useful intermediate alternative between estimating separate regressions, which allows all coefficients and error variances to differ across the groups, and conventional fixed-effects estimators, which assume that all slope coefficients and error variances are the same. It has the practical advantage of allowing the short-run dynamics to be data determined for each country, taking into account the number of time series observations available in each case. A number of unresolved issues remain, however. For small T, all of the estimators (group-specific, MG, PMG, and fixed-effects) will be subject to the familiar downward bias on the coefficient of the lagged dependent variable. Because the bias is in the same direction for each group, averaging or pooling does not reduce this bias. Bias corrections are available in the literature (e.g., Kiviet and Phillips 1993), but these apply to the short-run coefficients. Because the long-run coefficient is a nonlinear function of the short-run coefficients, procedures that remove the bias in the short-run coefficients can leave the long-run coefficient biased. Pesaran and Zhao (1999) discussed how the bias in the long-run coefficient can be reduced. The MG estimator used in this article is a simple unweighted mean of the coefficients. Weighted means are an obvious alternative; Hsiao et al. (1999) have considered Bayes estimation of the means of short-run coefficients in dynamic panel data models. Estimation in this article was conducted under the assumption that a long-run relationship existed. It would be useful to have a panel test for the existence of a long-run relationship when the variables are I(1), similar to the panel unit root test suggested by Im, Pesaran, and Shin (1997).
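Perhaps the most important issue is interpretation of the heterogeneity. Most studies that estimate separate relationships for a number of groups find differences in coefficients that are not only statistically significant, but also economically implausible. In our empirical examples such cases were the income elasticity of consumption out of labor income in Sweden and the income elasticity of demand for energy in Sri Lanka. Despite this, the MG and PMG estimates of the long-run coefficients tend to be sensible. One explanation is that the group-specific estimates are biased because of sample-specific omitted variables or measurement errors correlated with the regressors. If one is estimating an equation for a single group, then one might experiment with different specifications or alternative data until plausible estimates are obtained. When estimating equations for large numbers of groups, where no other data is available in many cases, this is not an option. If the coefficients really are the same and the bias-inducing correlations are not systematic (i.e., they average to 0 over groups), then pooled estimation will be appropriate despite the homogeneity restrictions being rejected. However, there is no obvious way of determining from the data that this is the case. This raises obvious problems for inference that require further analysis.

APPENDIX: MATHEMATICAL PROOFS

To save space, we give the proofs only for cases where the regressors are I(1). Proof for the I(0) case is given in the appendix to the earlier version of this article (Pesaran et al. 1997).

A.1 Preliminary Results

A.1.1 Some Useful Probability Limits

To simplify the notations in what follows, we denote $\xi_i(\theta)$ by $\xi_i$ and $\xi_i(\theta_0)$ by $\xi_{i0}$. When $x_{it}$ are all I(1), the following probability limits exist as $T \to \infty$, for each i:

$$T^{-2}X_i'H_iX_i \Rightarrow R_{X_iX_i}, \qquad T^{-1}X_i'H_i\xi_{i0} \Rightarrow R_{X_i\xi_{i0}}, \qquad T^{-1}X_i'H_i\varepsilon_i = O_p(1), \qquad (A1)$$

where $R_{X_iX_i}$ and $R_{X_i\xi_{i0}}$ are $O_p(1)$ random matrices. We also have

$$T^{-1}\xi_{i0}'H_i\xi_{i0} \overset{p}{\to} q_{\xi_{i0}\xi_{i0}}, \qquad T^{-1}\xi_{i0}'H_i\varepsilon_i \overset{p}{\to} 0,$$

and

$$T^{-1}\varepsilon_i'H_i\varepsilon_i \overset{p}{\to} \sigma_{i0}^2, \qquad i = 1, \ldots, N. \qquad (A2)$$

A.1.2 Decomposition of the Log-Likelihood Ratio

Using (8), it is easily seen that

$$T^{-1}[l_T(\varphi_0) - l_T(\varphi)] = \tfrac{1}{2}(A_T + B_T), \qquad (A3)$$

where

$$A_T = A_\sigma + C_T, \qquad (A4)$$

$$A_\sigma = \sum_{i=1}^{N}\left(\frac{\sigma_{i0}^2}{\sigma_i^2} - \ln\frac{\sigma_{i0}^2}{\sigma_i^2} - 1\right),$$

$$C_T = \sum_{i=1}^{N}\left(\frac{1}{\sigma_i^2} - \frac{1}{\sigma_{i0}^2}\right)\left(\frac{\varepsilon_i'H_i\varepsilon_i}{T} - \sigma_{i0}^2\right), \qquad (A5)$$

and

$$B_T = T^{-1}\sum_{i=1}^{N}\frac{1}{\sigma_i^2}\left[(\Delta y_i - \phi_i\xi_i)'H_i(\Delta y_i - \phi_i\xi_i) - \varepsilon_i'H_i\varepsilon_i\right]. \qquad (A6)$$

Also, using the result

$$\phi_i\xi_i = (\phi_i - \phi_{i0} + \phi_{i0})(\xi_i - \xi_{i0} + \xi_{i0}) = -\phi_iX_i(\theta - \theta_0) + (\phi_i - \phi_{i0})\xi_{i0} + \phi_{i0}\xi_{i0},$$

and noting that $\Delta y_i = \phi_{i0}\xi_{i0} + W_i\kappa_{i0} + \varepsilon_i$, we have

$$\Delta y_i - \phi_i\xi_i = \varepsilon_i + W_i\kappa_{i0} + \phi_iX_i(\theta - \theta_0) - (\phi_i - \phi_{i0})\xi_{i0}. \qquad (A7)$$

[See (6), and note that under the data-generating process, $\varphi = \varphi_0$.] Substituting (A7) in (A6), and after some algebra, we have

$$
\begin{aligned}
B_T ={}& (\theta - \theta_0)'\left(\sum_{i=1}^{N}\frac{\phi_i^2}{\sigma_i^2}\frac{X_i'H_iX_i}{T}\right)(\theta - \theta_0)
+ \sum_{i=1}^{N}\left[\frac{(\phi_i - \phi_{i0})^2}{\sigma_i^2}\right]\left(\frac{\xi_{i0}'H_i\xi_{i0}}{T}\right) \\
&- 2\sum_{i=1}^{N}\left[\frac{\phi_i(\phi_i - \phi_{i0})}{\sigma_i^2}\right]\left(\frac{\xi_{i0}'H_iX_i(\theta - \theta_0)}{T}\right)
+ 2(\theta - \theta_0)'\left(\sum_{i=1}^{N}\frac{\phi_i}{\sigma_i^2}\frac{X_i'H_i\varepsilon_i}{T}\right) \\
&- 2\sum_{i=1}^{N}\left[\frac{\phi_i - \phi_{i0}}{\sigma_i^2}\right]\left(\frac{\xi_{i0}'H_i\varepsilon_i}{T}\right), \qquad (A8)
\end{aligned}
$$

which can also be written more compactly as [recall that $\psi = (\theta', \phi')'$]

$$B_T = (\psi - \psi_0)'G_T(\psi - \psi_0) + 2(\psi - \psi_0)'f_T, \qquad (A9)$$

where

$$
G_T = \begin{pmatrix}
\sum_{i=1}^{N}\dfrac{\phi_i^2}{\sigma_i^2}\dfrac{X_i'H_iX_i}{T} & -\dfrac{\phi_1}{\sigma_1^2}\dfrac{X_1'H_1\xi_{10}}{T} & \cdots & -\dfrac{\phi_N}{\sigma_N^2}\dfrac{X_N'H_N\xi_{N0}}{T} \\
-\dfrac{\phi_1}{\sigma_1^2}\dfrac{\xi_{10}'H_1X_1}{T} & \dfrac{\xi_{10}'H_1\xi_{10}}{\sigma_1^2T} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\dfrac{\phi_N}{\sigma_N^2}\dfrac{\xi_{N0}'H_NX_N}{T} & 0 & \cdots & \dfrac{\xi_{N0}'H_N\xi_{N0}}{\sigma_N^2T}
\end{pmatrix}
$$

and

$$
f_T = \begin{pmatrix}
\sum_{i=1}^{N}\dfrac{\phi_i}{\sigma_i^2}\dfrac{X_i'H_i\varepsilon_i}{T} \\
-\dfrac{\xi_{10}'H_1\varepsilon_1}{\sigma_1^2T} \\
\vdots \\
-\dfrac{\xi_{N0}'H_N\varepsilon_N}{\sigma_N^2T}
\end{pmatrix}. \qquad (A10)
$$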
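Of the terms in this decomposition, $A_\sigma$ is the one whose sign can be settled by elementary calculus, and the consistency proof below relies on it. As a short worked step (the function $g$ and the shorthand $x_i$ are introduced here only for illustration), write $x_i = \sigma_{i0}^2/\sigma_i^2 > 0$, so that each summand of $A_\sigma$ equals $g(x_i)$ with $g(x) = x - \ln x - 1$:

```latex
\begin{aligned}
g(x)  &= x - \ln x - 1, \qquad x > 0, \\
g'(x) &= 1 - \frac{1}{x}, \qquad g''(x) = \frac{1}{x^{2}} > 0, \\
g'(x) &= 0 \iff x = 1, \qquad g(1) = 0, \\
\text{hence}\quad g(x) &> 0 \ \text{ for all } x \neq 1, \qquad
A_\sigma = \sum_{i=1}^{N} g\!\left(\frac{\sigma_{i0}^{2}}{\sigma_i^{2}}\right) \ge 0,
\end{aligned}
```

with equality if and only if $\sigma_i^2 = \sigma_{i0}^2$ for every $i$. Strict convexity of $g$ is what makes the minimum at $x = 1$ unique.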

In terms of the matrix notations used in Section 3, GT can also already been covered. Notice that on N T(00,8d ) , we also have
be written as [d] ~ s:
Let C(<Po,8 p,8 d) = UOd,op (NT (00, 8d) x B(po,8 p », where
the union is taken over all values of 8d and 8p such that 8", =
(8~ + 8~)1/2 and s, = (8~ + 8;)1/2. We now prove that for every
where ZI(<p) = [X(e'/>,O'):{(O, 0')], X(e'/>, 0') = -[(.pdaI)X~, 8", > 0,
(.p2/a2)X;, ... , (.pN IaN )X;'" )]', and W(O') and {(O,O') are lim Pr{ inf T- 1[IT(<po) -IT(<p)] > O} = 1. (AI4)
block diagonal matrices with their ith blocks given by W;jai and T--+oo ",EC(",o,Op,Od)

f.i(O)lai, with 0 = 0 0. Therefore, G T is the partitioned inverse Using (A4) and (A9) in (A3), we first note that
of the upper left-hand corner of
2infT- 1[IT(<po) -IT(<p)]

~ inf(A a ) + inf(CT) + inf[(,p - ,po)'GT(,p - ,po)]

with <p = <po. Hence, under assumption 5, G T is also a positive


+ 2inf[(,p - ,po)'fT], (AI5)
definite matrix. where it is now assumed that all of the inf operations are taken
over the set <.p E C(<Po,8 p,8d). But
A.2 Proof of Theorem 2: The Case of 1(1) Regressors N ,
Partition <p = (O',p')' into the long-run parameters, 0, and (.'I-'/, _ ./, )'f = d' ~.pi XiHiei
'1-'0 T L...J a2 T3/2
the short-run parameters, p = (e'/>',O")'. Define the open ball i=l t

B(00,8 0 ) = {O E eo: 110 - 0011 < 80 } and its complement N

13(00 , 80) = {O E 80: 110 - 0011 ~ 80}. _ ~ (.pi - .pia) f.;oHiei (AI6)
L...J a2 T
To prove the consistency of the MLE of 0 (denoted by 9), it is i=l t

sufficient to show that for all values of pEep = 8", x 8/T and Therefore, using the results in (Al) and (A.2), and recalling that
for every 80 > 0, (see, e.g., Saikkonen 1995, p. 903), d and p are defined on compact sets, it then follows that inf[(,p-
,po)'fT] = opel). Also from (A2), it is easily seen that CT = opel).
lim Pr{ _ inf T- 1[IT(<Po) -IT(<p)] > O} = 1. (AI2)
T--+oo 'l'EB(Oo ,06) «e; Using these results in (AI5), we now have

Using the results in (Al) and (A2) and under assumption 4, it 2infT- 1[IT(<po) -IT(<p)] ~ inf(A a )

is easily seen that except for the first term in (A8), the inf over
+ inf[(,p - ,po)'GT(,p - ,po)] + opel), (AI?)
<p E 13 (00 , 80 ) x 8 p of all of the other terms are at most Op (1).
Therefore, where it is again assumed that all of the inf operations are
taken over the set <P E C(<po, 8p , 8d). But the ith term in A a -
namely, (aTolal) - In(a;ola;) - l-e-attains its unique minimum
at (a;ol an = 1 and is strictly positive for all feasible values of
a; not equal to a;o, for all i = 1,2, ... , N. Hence inf(A a ) > 0 <=}
8a > O. To establish (AI4), it thus is sufficient to prove that even
if 8a = 0, there exists 8", > 0 such that
lim Pr{inf{(,p - ,po)'GT(,p - ,po)} > O} = 1.
+ Opel), T--+oo

To this end, first note that


~ T8~Amin(RT) + Opel), (Al3)
(,p - ,pO)'GT(,p - ,po) = [K1/>(,p - ,po)]'~h[K1/>(,p - ,po)],
where $\lambda_{\min}(R_T)$ is the smallest eigenvalue of $R_T = \sum_{i=1}^{N} (\phi_i^2/\sigma_i^2)(X_i'H_iX_i/T^2)$ defined over $\Theta_\varphi$. As $T \to \infty$, $R_T$ converges to $R = \sum_{i=1}^{N} (\phi_i^2/\sigma_i^2) R_{X_iX_i}$ with probability 1, where $R_{X_iX_i}$ is given by (A1). But under assumptions 4 and 5, $\sum_{i=1}^{N} (\phi_i^2/\sigma_i^2) R_{X_iX_i}$ is a positive definite matrix with probability 1 for all values of $\varphi \in \Theta_\varphi$. Hence, as $T \to \infty$, $\lambda_{\min}(R_T)$ also weakly converges to $\lambda_{\min}(R) > 0$, and the right side of (A13) will increase without bounds with probability 1, which establishes the consistency of $\hat\theta$.

To prove the superconsistency of $\hat\theta$, and the consistency of the MLEs of the other parameters, we define $B(\varphi_0, \delta_\varphi) = \{\varphi \in \Theta_\varphi : \|\varphi - \varphi_0\| < \delta_\varphi\}$, $B(\phi_0, \delta_\phi) = \{\phi \in \Theta_\phi : \|\phi - \phi_0\| < \delta_\phi\}$, $B(\sigma_0, \delta_\sigma) = \{\sigma \in \Theta_\sigma : \|\sigma - \sigma_0\| < \delta_\sigma\}$ and their complements, $\bar{B}(\varphi_0, \delta_\varphi)$, $\bar{B}(\phi_0, \delta_\phi)$, and $\bar{B}(\sigma_0, \delta_\sigma)$, and follow Saikkonen (1995) and define the open shrinking ball $N_T(\theta_0, \delta_d) = \{\theta \in \Theta_\theta : T^{1/2}\|\theta - \theta_0\| < \delta_d\}$ for $\theta$ and its complement $\bar{N}_T(\theta_0, \delta_d) = \{\theta \in \Theta_\theta : T^{1/2}\|\theta - \theta_0\| \ge \delta_d\}$. Because the consistency of $\hat\theta$ is already established, we focus our analysis on values of $\theta$ close to $\theta_0$, defined by $\theta = \theta_0 + T^{-1/2}d$, where we take $d$ to be a $k \times 1$ vector of fixed constants defined on a compact set. The case where elements of $d$ are allowed to increase without bound has

where $K_\psi = \mathrm{diag}(T^{1/2}I_k, I_N)$, $K_\psi(\psi - \psi_0) = \vartheta = (d', (\phi - \phi_0)')'$, and $Q_T = K_\psi^{-1} G_T K_\psi^{-1}$. Hence

$$\inf\{(\psi - \psi_0)' G_T (\psi - \psi_0)\} \ge \delta_\vartheta^2 \, \lambda_{\min}(Q_T),$$

where $\lambda_{\min}(Q_T)$ is the smallest eigenvalue of $Q_T$, and $\delta_\vartheta = (\delta_d^2 + \delta_\phi^2)^{1/2} > 0$. (Notice that for $\delta_d = 0$, we have $\delta_\vartheta = \delta_\phi > 0$. In the case where $\delta_\sigma > 0$, (A17) will be satisfied even if $\delta_\vartheta = 0$.) As $T \to \infty$, $\lambda_{\min}(Q_T)$ converges weakly to $\lambda_{\min}(Q) > 0$, which is the smallest eigenvalue of $Q$, where

$Q =$ [matrix display lost in extraction; the surviving entries are the leading block $\sum_{i=1}^{N} (\phi_{i0}^2/\sigma_{i0}^2) R_{X_iX_i}$ and zero off-diagonal entries]

and $R_{X_iX_i}$ and $q_{\xi_{i0}\xi_{i0}}$, $i = 1, 2, \ldots, N$, are defined in (A1) and (A2). But under assumptions 1, 4, and 5(a) and (b) $Q$ is positive definite with probability 1, and hence $\lambda_{\min}(Q) > 0$ with probability 1. Hence for sufficiently large $T$, $\inf\{(\psi - \psi_0)' G_T (\psi - \psi_0)\} > 0$. This completes the proof and establishes the desired consistency and superconsistency results given by (4.7).

To derive the asymptotic distribution of the MLEs of $\theta$ and $\phi_i$, $i = 1, \ldots, N$, we now allow for the fact that in this case the different components of the score vector and the sample information matrix converge to their true values at different rates. Let $D_\psi = \mathrm{diag}(T^{-1}I_k, T^{-1/2}I_N)$, and define

$$d_T(\psi) = D_\psi \, [\partial \ell_T(\psi)/\partial \psi], \qquad \mathcal{I}_T(\psi) = D_\psi \, [-\partial^2 \ell_T(\psi)/\partial \psi \, \partial \psi'] \, D_\psi. \tag{A18}$$

Then, using standard results from the unit root literature, it can be established (see, e.g., Phillips and Durlauf 1986) that

$$d_T(\psi_0) \overset{a}{\sim} MN\{0, \mathcal{I}(\psi_0)\}, \qquad \mathcal{I}_T(\psi_0) \Rightarrow \mathcal{I}(\psi_0), \tag{A19}$$

where the $(k + N) \times (k + N)$ matrix,

$\mathcal{I}(\psi_0) \equiv$ [matrix display lost in extraction; only zero off-diagonal entries survive] (A20)

is the random information matrix, which is positive definite with probability 1.

Now using the mean-value expansion of $\partial \ell_T(\hat\psi)/\partial \psi$ around $\psi_0$ and standardizing the result by $D_\psi$, we have

$$D_\psi^{-1}(\hat\psi - \psi_0) = \mathcal{I}_T(\bar\psi)^{-1} d_T(\psi_0), \tag{A21}$$

where the $(i, j)$ element of $\partial^2 \ell_T(\psi)/\partial \psi \, \partial \psi'$ is evaluated at $(\bar\psi_i, \bar\psi_j)$ and $\bar\psi_i$ is a convex combination of $\psi_{i0}$ and $\hat\psi_i$. Notice, however, that in the case of the models with unit root regressors the sample information matrix $\mathcal{I}_T(\bar\psi)$ need not converge weakly to $\mathcal{I}(\psi_0)$ even when $\bar\psi$ converges to $\psi_0$. To ensure that such a convergence occurs, the sample information matrix $\mathcal{I}_T(\psi)$ should satisfy the stochastic equicontinuity condition (see, e.g., Saikkonen 1995, condition SE$_0$, p. 894). This result is summarized in the following lemma.

Lemma A.1. Under assumptions 1-4, and 5(a) and (c), and assuming that the regressors $x_{it}$ in the dynamic heterogeneous panel data model (6) are I(1), the sample information matrix, $\mathcal{I}_T(\psi)$, defined by (A18), satisfies the stochastic equicontinuity condition such that

$$\sup_{\psi_* \in B(\phi_0, \delta) \times N_T(\theta_0, \delta)} \|\mathcal{I}_T(\psi_*) - \mathcal{I}_T(\psi_0)\| = o_p(1), \tag{A22}$$

where $B(\phi, \delta) = \{\phi_* \in \Theta_\phi : \|\phi_* - \phi\| < \delta\}$ and $N_T(\theta, \delta) = \{\theta_* \in \Theta_\theta : T^{1/2}\|\theta_* - \theta\| < \delta\}$, with $\delta$ a common positive real number.

Proof. See the Appendix in the earlier version of this article (Pesaran et al. 1997).

In view of the relative orders of consistency of the MLEs established in (15) and the fact that the sample information matrix satisfies the stochastic equicontinuity condition, (A22), it also follows that (see Saikkonen 1995, prop. 3.2)

[display lost in extraction] (A23)

Using this result in (A21) now yields

$$D_\psi^{-1}(\hat\psi - \psi_0) = \mathcal{I}_T(\psi_0)^{-1} d_T(\psi_0) + o_p(1). \tag{A24}$$

Using (A19) in (A24) and by the continuous mapping theorem, we obtain (16).

A.3 Asymptotic Distribution of the Long-Run Coefficients: The Case of I(1) Regressors

First, note that

[display lost in extraction] (A25)

and

$$H_i(\Delta y_i - \hat\phi_i y_{i,-1}) = H_i[-X_i \theta_0 \phi_{i0} - (\hat\phi_i - \phi_{i0}) y_{i,-1} + \varepsilon_i]. \tag{A26}$$

Using (6) and (A25) in (10), we have

$$\hat\phi_i - \phi_{i0} = (\hat\xi_i' H_i \hat\xi_i)^{-1} [\hat\xi_i' H_i X_i (\hat\theta - \theta_0) \phi_{i0} + \hat\xi_i' H_i \varepsilon_i]. \tag{A27}$$

Moreover, using (A26) in (9),

[display lost in extraction] (A28)

Substituting (A27) in (A28), and solving for $\hat\theta - \theta_0$, we obtain

[display lost in extraction] (A29)

where

[display lost in extraction] (A30)

and

$s_{NT} = -\sum_{i=1}^{N} (\hat\phi_i/\hat\sigma_i^2) X_i' H_i \varepsilon_i + \sum_{i=1}^{N}$ [summand lost in extraction] (A31)

Using the consistency of $\hat\theta; \hat\phi_1, \ldots, \hat\phi_N; \hat\sigma_1^2, \ldots, \hat\sigma_N^2$, it is also easily seen that

[display lost in extraction] (A32)

where $R_{X_i \xi_{i0}}$ and $q_{\xi_{i0}\xi_{i0}}$ are defined by (A1) and (A2). Using (A1), (A32), and the consistency of $\hat\phi_i$ and $\hat\sigma_i^2$, $i = 1, 2, \ldots, N$, in (A30) and (A31), we now have

$$T^{-2} S_{NT} \Rightarrow \sum_{i=1}^{N} \frac{\phi_{i0}^2}{\sigma_{i0}^2} R_{X_iX_i},$$

where $R_{X_iX_i}$ is defined by (A1). Using these results in (A29) and by the continuous mapping theorem, we obtain (17).

[Received August 1997. Revised December 1998.]
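The decomposition in (A27) is an exact algebraic identity, so it can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes (10) to be the least squares formula $\hat\phi_i = (\hat\xi_i'H_i\hat\xi_i)^{-1}\hat\xi_i'H_i\Delta y_i$ with $\hat\xi_i = y_{i,-1} - X_i\hat\theta$, and takes $H_i$ to be the projection that removes $W_i$, here reduced to an intercept only. These definitions do not appear in this appendix and are assumed; the function name `check_a27`, the parameter values, and the trial value of $\hat\theta$ are hypothetical.

```python
import numpy as np


def check_a27(T=200, seed=0):
    """Return (lhs, rhs) of (A27) for one simulated group (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical true values, chosen only for this illustration.
    theta0 = np.array([0.8, -0.3])   # long-run coefficients
    phi0, mu = -0.4, 0.5             # error-correction coefficient, intercept

    # I(1) regressors generated as random walks, plus i.i.d. disturbances.
    X = np.cumsum(rng.normal(size=(T, 2)), axis=0)
    eps = rng.normal(scale=0.5, size=T)

    # Error-correction recursion: dy_t = phi0 * (y_{t-1} - x_t' theta0) + mu + eps_t.
    y_lag = np.empty(T)
    dy = np.empty(T)
    y_prev = 0.0
    for t in range(T):
        y_lag[t] = y_prev
        dy[t] = phi0 * (y_prev - X[t] @ theta0) + mu + eps[t]
        y_prev += dy[t]

    # Assumed form of H_i: annihilator of W_i, with W_i a column of ones.
    W = np.ones((T, 1))
    H = np.eye(T) - W @ np.linalg.solve(W.T @ W, W.T)

    # Any trial value of theta_hat works; the identity is purely algebraic.
    theta_hat = theta0 + np.array([0.01, -0.02])
    xi_hat = y_lag - X @ theta_hat
    denom = xi_hat @ H @ xi_hat
    phi_hat = (xi_hat @ H @ dy) / denom   # assumed form of (10)

    lhs = phi_hat - phi0
    rhs = (phi0 * (xi_hat @ H @ X @ (theta_hat - theta0))
           + xi_hat @ H @ eps) / denom
    return lhs, rhs


lhs, rhs = check_a27()
print(lhs, rhs)
```

The two numbers should agree up to floating-point rounding for any seed, sample size, or trial value of $\hat\theta$, because the check does not rely on asymptotics.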
634 Journal of the American Statistical Association, June 1999

REFERENCES

Ahn, S. C., and Schmidt, P. (1995), "Efficient Estimation of Models for Dynamic Panel Data," Journal of Econometrics, 68, 29-52.
Anderson, T. W., and Hsiao, C. (1981), "Estimation of Dynamic Models With Error Components," Journal of the American Statistical Association, 76, 598-606.
Anderson, T. W., and Hsiao, C. (1982), "Formation and Estimation of Dynamic Models Using Panel Data," Journal of Econometrics, 18, 47-82.
Arellano, M. (1989), "A Note on the Anderson-Hsiao Estimator for Panel Data," Economics Letters, 31, 337-341.
Arellano, M., and Bover, O. (1995), "Another Look at the Instrumental Variable Estimation of Error-Components Models," Journal of Econometrics, 68, 29-52.
Baltagi, B. H. (1995), Econometric Analysis of Panel Data, New York: Wiley.
Boyd, D. A. C., and Smith, R. P. (in press), "Testing for Purchasing Power Parity: Econometric Issues and an Application to Developing Countries," unpublished manuscript, Manchester School.
Davidson, J. E. H., Hendry, D. F., Srba, F., and Yeo, S. (1978), "Econometric Modelling of the Aggregate Time-Series Relationship Between Consumers' Expenditure and Income in the United Kingdom," Economic Journal, 88, 661-692.
Haque, N. U., and Montiel, P. (1989), "Consumption in Developing Countries: Tests for Liquidity Constraints and Finite Horizons," Review of Economics and Statistics, 71, 408-415.
Hausman, J. (1978), "Specification Tests in Econometrics," Econometrica, 46, 1251-1271.
Hsiao, C. (1996), "Random Coefficient Models," in The Economics of Panel Data: A Handbook of Theory With Applications (2nd rev. ed.), eds. L. Matyas and P. Sevestre, Dordrecht: Kluwer, chap. 5.
Hsiao, C., Pesaran, M. H., and Tahmiscioglu, A. K. (1999), "Bayes Estimation of Short-Run Coefficients in Dynamic Panel Data Models," in Analysis of Panels and Limited Dependent Variables: A Volume in Honour of G. S. Maddala, eds. C. Hsiao, K. Lahiri, L-F. Lee, and M. H. Pesaran, Cambridge, U.K.: Cambridge University Press.
Hsiao, C., Sun, B. H., and Lightwood, J. (1995), "Fixed Versus Random Effects Specification for Panel Data Analysis," unpublished manuscript, University of Southern California.
Hsiao, C., and Tahmiscioglu, A. K. (1997), "A Panel Analysis of Liquidity Constraints and Firm Investment," Journal of the American Statistical Association, 92, 455-465.
Im, K. S., Pesaran, M. H., and Shin, Y. (1997), "Testing for Unit Roots in Heterogeneous Panels," unpublished manuscript, University of Cambridge (http://www.econ.cam.ac.uk/faculty/pesaran).
Keane, M., and Runkle, D. (1992), "On the Estimation of Panel Data Models With Serial Correlation When Instruments Are Not Strictly Exogenous," Journal of Business and Economic Statistics, 10, 1-9.
Kiviet, J. F., and Phillips, G. D. A. (1993), "Alternative Bias Approximations With a Lagged Dependent Variable," Econometric Theory, 9, 62-80.
Lindley, D. V., and Smith, A. F. M. (1972), "Bayes Estimates for the Linear Model," Journal of the Royal Statistical Society, Ser. B, 34, 1-41.
Maddala, G. S., and Hu, W. (1996), "The Pooling Problem," in The Economics of Panel Data: A Handbook of Theory With Applications (2nd rev. ed.), eds. L. Matyas and P. Sevestre, Dordrecht: Kluwer, chap. 13.
Matyas, L., and Sevestre, P. (Eds.) (1996), The Econometrics of Panel Data: A Handbook of Theory With Applications (2nd rev. ed.), Dordrecht: Kluwer.
Pesaran, M. H. (1997), "The Role of Economic Theory in Modelling the Long-Run," The Economic Journal, 107, 178-191.
Pesaran, M. H., and Pesaran, B. (1997), Microfit 4.0: Interactive Econometric Analysis, Oxford, U.K.: Oxford University Press.
Pesaran, M. H., and Shin, Y. (1999), "Long-Run Structural Modelling," unpublished manuscript, University of Cambridge (http://www.econ.cam.ac.uk/faculty/pesaran/).
Pesaran, M. H., and Shin, Y. (1999), "An Autoregressive Distributed Lag Modelling Approach to Cointegration Analysis," in Econometrics and Economic Theory in the 20th Century: The Ragnar Frisch Centennial Symposium, ed. S. Strom, Cambridge, U.K.: Cambridge University Press.
Pesaran, M. H., Shin, Y., and Smith, R. J. (1999), "Bounds Testing Approaches to the Analysis of Long Run Relationships," unpublished manuscript, University of Cambridge (http://www.econ.cam.ac.uk/faculty/pesaran/).
Pesaran, M. H., Shin, Y., and Smith, R. P. (1997), "Pooled Estimation of Long-Run Relationships in Dynamic Heterogenous Panels," DAE Working Papers Amalgamated Series 9721, University of Cambridge.
Pesaran, M. H., and Smith, R. P. (1995), "Estimating Long-Run Relationships From Dynamic Heterogeneous Panels," Journal of Econometrics, 68, 79-113.
Pesaran, M. H., Smith, R. P., and Akiyama, T. (1998), Energy Demand in Asian Economies, Oxford, U.K.: Oxford University Press.
Pesaran, M. H., Smith, R. P., and Im, K. S. (1996), "Dynamic Linear Models for Heterogeneous Panels," in The Economics of Panel Data: A Handbook of Theory With Applications (2nd rev. ed.), eds. L. Matyas and P. Sevestre, Dordrecht: Kluwer, chap. 8.
Pesaran, M. H., and Zhao, Z. (1999), "Bias Reduction in Estimating Long-Run Relationships From Dynamic Heterogeneous Panels," in Analysis of Panels and Limited Dependent Variables: A Volume in Honour of G. S. Maddala, eds. C. Hsiao, K. Lahiri, L-F. Lee, and M. H. Pesaran, Cambridge, U.K.: Cambridge University Press.
Phillips, P. C. B., and Durlauf, S. (1986), "Multiple Time Series With Integrated Variables," Review of Economic Studies, 53, 473-496.
Rogoff, K. (1996), "The Purchasing Power Parity Puzzle," Journal of Economic Literature, 34, 647-668.
Saikkonen, P. (1995), "Problems With the Asymptotic Theory of Maximum Likelihood Estimation in Integrated and Cointegrated Systems," Econometric Theory, 11, 888-911.
Stoker, T. M. (1993), "Empirical Approaches to the Problem of Aggregation Over Individuals," Journal of Economic Literature, 31, 1827-1874.
Swamy, P. A. V. B. (1970), "Efficient Inference in a Random Coefficient Regression Model," Econometrica, 38, 311-323.
Zellner, A. (1962), "An Efficient Method of Estimating Seemingly Unrelated Regressions and Test for Aggregation Bias," Journal of the American Statistical Association, 57, 348-368.
