
Estimation of spatial autoregressive panel data models with fixed effects

Lung-fei Lee                          Jihai Yu
Department of Economics               Department of Economics
Ohio State University                 University of Kentucky
lflee@econ.ohio-state.edu             jihai.yu@uky.edu
March 4, 2008

Abstract
This paper establishes asymptotic properties of quasi-maximum likelihood estimators for fixed effects SAR panel data models with SAR disturbances, where the time periods T and/or the number of spatial units n can be finite or large in all combinations except that both T and n are finite. A direct approach is to estimate all the parameters, including the fixed effects. We propose alternative estimation methods based on transformation. For the model with only individual effects, the transformation approach yields consistent estimators for all the parameters when either n or T is large, while the direct approach does not yield a consistent estimator of the variance of the disturbances unless T is large, although the estimators for the other parameters are the same as those of the transformation approach. For the model with both individual and time effects, the transformation approach yields consistent estimators of all the parameters when either n or T is large. When we estimate both individual and time effects directly, consistency of the variance parameter requires both n and T to be large, and consistency of the other parameters requires n to be large.

JEL classification: C13; C23; R15


Keywords: Spatial autoregression, Panel data, Fixed effects, Time effects, Quasi-maximum likelihood estimation, Conditional likelihood

Lee acknowledges financial support for his research from NSF under Grant No. SES-0519204.
1 Introduction
Spatial econometrics deals with the spatial interactions of economic units in cross-section and/or panel data. To capture correlation among cross-sectional units, the spatial autoregressive (SAR) model by Cliff and Ord (1973) has received the most attention in economics. It extends autocorrelation in time series to spatial dimensions and captures interactions or competition among spatial units. Early development in estimation and testing is summarized in Anselin (1988), Cressie (1993), Kelejian and Robinson (1993), and Anselin and Bera (1998), among others. The spatial correlation can be extended to panel data models (Anselin, 1988). Baltagi et al. (2003) consider the specification test of spatial correlation in a panel regression with error components and SAR disturbances. Kapoor et al. (2007) provide a rigorous theoretical analysis of a panel model with SAR disturbances which incorporate error components. Baltagi et al. (2007) generalize Baltagi et al. (2003) by allowing for spatial correlation in both the individual and error components, possibly with different spatial autoregressive parameters, which encompasses the spatial correlation specifications in Baltagi et al. (2003) and Kapoor et al. (2007). Instead of random effect error components, an alternative specification for panel data models assumes fixed effects. The fixed effects specification has the advantage of robustness in that the fixed effects are allowed to correlate with the included regressors in the model (Hausman, 1978). Yu et al. (2006, 2007) and Yu and Lee (2007) consider spatial correlation in a dynamic panel data setting, where the data generating processes (DGPs) are specified to be, respectively, stationary, partially nonstationary and nonstationary.
For panel data models with fixed individual effects, when the time dimension T is fixed, we are likely to encounter the "incidental parameters" problem discussed in Neyman and Scott (1948). This is because the introduction of fixed effects increases the number of parameters to be estimated. In a linear panel regression model or a logit panel regression model with fixed individual effects, the fixed effects can be eliminated by the method of conditional likelihood when effective sufficient statistics can be found for each of the fixed effects. For those panel models, the time average of the dependent variables provides the sufficient statistic (see Hsiao, 1986).
For the linear panel regression model with fixed effects, the direct maximum likelihood (ML) approach will estimate jointly the common parameters and fixed effects. The corresponding ML estimates (MLEs) of the regression coefficients are known as the within estimates, which happen to be the conditional likelihood estimates conditional on the time means.[1] For the SAR panel data models with individual effects, similar findings for the direct ML approach will be shown in this paper. This direct estimation approach will yield consistent estimates for the spatial and regression coefficients except for the variance of the disturbances when T is small (but n is large).[2] However, for SAR panel models with time effects, the direct estimation approach will be shown to be inconsistent for all parameters when n is small (but T is large). The inconsistent estimates are consequences of the incidental parameters (Neyman and Scott, 1948).

[1] However, effective sufficient statistics might not be available for many other models. The well-known example is the probit panel regression model, where the time average of the dependent variables does not provide the sufficient statistic even though probit and logit models are close substitutes (see Chamberlain, 1982).
In this paper, in order to avoid the incidental parameters problem, we suggest alternative estimation methods. By using the data transformation $(I_T - \frac{1}{T} l_T l_T')$ to eliminate the individual effects, the transformed disturbances are uncorrelated, although not i.i.d. in general. The transformed equation can be estimated by the quasi-maximum likelihood (QML) approach. For the more general model with both individual and time fixed effects, one may combine the transformation $(I_n - \frac{1}{n} l_n l_n')$ with the transformation $(I_T - \frac{1}{T} l_T l_T')$ to eliminate both the individual and time fixed effects. By exploring the generalized inverse of the transformed equation, one may end up with a QML approach for the transformed model.[3]
Panel regression models with SAR disturbances have been recently considered in the literature. The model considered in Baltagi et al. (2003) is $Y_{nt} = X_{nt}\beta_0 + c_{n0} + U_{nt}$, $U_{nt} = \rho_0 W_n U_{nt} + V_{nt}$, $t = 1, 2, \ldots, T$, where the elements of $V_{nt}$ are i.i.d. $(0, \sigma_0^2)$, $c_{n0}$ is an $n \times 1$ vector of individual error components and the spatial correlation is in $U_{nt}$. A different specification has been considered in Kapoor et al. (2007) with $Y_{nt} = X_{nt}\beta_0 + U_{nt}^+$ and $U_{nt}^+ = \rho_0 W_n U_{nt}^+ + d_{n0} + V_{nt}$, $t = 1, 2, \ldots, T$, where $d_{n0}$ is the vector of individual error components. Kapoor et al. (2007) propose a method of moments (MOM) procedure for the estimation of $\rho_0$ and the variance parameters of $d_{n0}$ and $V_{nt}$. The two panel models are different in terms of the variance matrices of the overall disturbances. The variance matrix in Baltagi et al. (2003) is more complicated and its inverse is computationally demanding; the variance matrix in Kapoor et al. (2007) has a special pattern and its inverse is easier to compute. Baltagi et al. (2007) allow for spatial correlation in both the individual and error components, possibly with different spatial autoregressive parameters. Both Baltagi et al. (2003) and Baltagi et al. (2007) have emphasized the testing of spatial correlation in their models. With the fixed effects specification, these panel models can have the same representation. By the transformation $(I_n - \rho_0 W_n)$, the DGP of Kapoor et al. (2007) becomes $Y_{nt} = X_{nt}\beta_0 + c_{n0} + U_{nt}$, where $c_{n0} = (I_n - \rho_0 W_n)^{-1} d_{n0}$ and $U_{nt} = U_{nt}^+ - (I_n - \rho_0 W_n)^{-1} d_{n0}$. The $U_{nt} = \rho_0 W_n U_{nt} + V_{nt}$ forms a SAR process. By regarding $(I_n - \rho_0 W_n)^{-1} d_{n0}$ as a vector of unknown fixed effect parameters, these two equations are identical to a linear panel regression with fixed effects and SAR disturbances. Hence, to generalize Baltagi et al. (2003), Baltagi et al. (2007) and Kapoor et al. (2007), where the spatial effects are in the disturbances, and to generalize the SAR panel model where the spatial effects are in the regression equation, we are going to consider the estimation of the SAR panel model with both a spatial lag and spatial disturbances. We allow the time periods T and/or the number of spatial units n to be finite or large in all combinations except that both T and n are finite. In this paper, we pay special attention to the model with individual effects when n is large but T is small. On the other hand, for the model with time effects, the special interest is in the model with large T but small n.

[2] When a dynamic effect is considered in the SAR panel data, we will have an "initial condition" problem which will cause the inconsistency of the direct likelihood estimates for all the parameters unless T is large (see Yu et al. (2006, 2007) and Yu and Lee (2007)).
[3] The use of $(I_T - \frac{1}{T} l_T l_T')$ to eliminate time fixed effects has been considered in Lee and Yu (2007a) for a spatial dynamic panel model with large T. In a group setting with group fixed effects, a similar transformation can eliminate the group effects (Lee et al., 2008).
This paper is organized as follows. In Section 2, the model with individual fixed effects is introduced and the data transformation procedure is proposed. We then establish the consistency and asymptotic distribution of the QML estimator of the transformation approach. The direct ML approach is discussed in Section 3, where the individual effects are estimated directly. Section 4 generalizes the model to include both individual and time effects. After the individual effects are eliminated, we can further eliminate the time effects, and the asymptotics are derived. Alternatively, we can estimate the transformed time effects directly, or estimate both effects directly, both of which are discussed in Section 5. Simulation results are reported in Section 6 to compare the different approaches. Section 7 concludes the paper. Proofs are collected in the Appendix.

2 Transformation Approach

The SAR panel model with SAR disturbances and individual effects is

$Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + U_{nt}, \quad U_{nt} = \rho_0 M_n U_{nt} + V_{nt}, \quad t = 1, 2, \ldots, T,$  (2.1)

where $Y_{nt} = (y_{1t}, y_{2t}, \ldots, y_{nt})'$ and $V_{nt} = (v_{1t}, v_{2t}, \ldots, v_{nt})'$ are $n \times 1$ column vectors and $v_{it}$ is i.i.d. across $i$ and $t$ with zero mean and variance $\sigma_0^2$, $W_n$ is an $n \times n$ spatial weights matrix, which is predetermined and generates the spatial dependence among cross-sectional units $y_{it}$, $X_{nt}$ is an $n \times k_X$ matrix of nonstochastic regressors, and $c_{n0}$ is an $n \times 1$ column vector of fixed effects.

In panel data models, when T is finite, we need to take care of the incidental parameters problem. In dynamic panel data, the first difference or Helmert transformation can be made to eliminate the individual effects (see Anderson and Hsiao (1981) and Arellano and Bover (1995), among others). In this paper, we use an orthogonal transformation which includes the Helmert transformation as a special case. Our asymptotic results are obtained where T and/or n can be finite or large in all combinations except that both T and n are finite.[4] Define $S_n(\lambda) = I_n - \lambda W_n$ and $R_n(\rho) = I_n - \rho M_n$ for any $\lambda$ and $\rho$. At the true parameters, $S_n = S_n(\lambda_0)$ and $R_n = R_n(\rho_0)$. Then, presuming $S_n$ and $R_n$ are invertible, (2.1) can be rewritten as

$Y_{nt} = S_n^{-1} X_{nt}\beta_0 + S_n^{-1} c_{n0} + S_n^{-1} R_n^{-1} V_{nt}.$  (2.2)
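For concreteness, (2.1)-(2.2) are easy to simulate from the reduced form. The sketch below is a hypothetical illustration, not the paper's design: the circular row-normalized weights matrix, the parameter values, and the array layout are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, k = 9, 5, 2
lam0, rho0, sig0 = 0.3, 0.2, 1.0
beta0 = np.array([1.0, -0.5])

# Hypothetical row-normalized "circular" weights matrix (an assumption):
# each unit's neighbors are the two adjacent units, each with weight 1/2.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
M = W.copy()                      # take M_n = W_n for simplicity

S = np.eye(n) - lam0 * W          # S_n = I_n - lambda_0 W_n
R = np.eye(n) - rho0 * M          # R_n = I_n - rho_0 M_n
c = rng.normal(size=n)            # fixed effects c_{n0}

X = rng.normal(size=(T, n, k))    # regressors (drawn once, then held fixed)
Y = np.empty((T, n))
for t in range(T):
    V = sig0 * rng.normal(size=n)
    U = np.linalg.solve(R, V)                        # U_nt = R_n^{-1} V_nt
    Y[t] = np.linalg.solve(S, X[t] @ beta0 + c + U)  # reduced form (2.2)
```

Row-normalization of $W_n$ together with $|\lambda_0| < 1$ keeps $S_n$ invertible, in line with Assumption 3 below.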

For our analysis of the asymptotic properties of estimators, we make the following assumptions:

Assumption 1. $W_n$ and $M_n$ are nonstochastic spatial weights matrices and their diagonal elements satisfy $w_{n,ii} = 0$ and $m_{n,ii} = 0$ for $i = 1, 2, \ldots, n$.

Assumption 2. The disturbances $\{v_{it}\}$, $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, are i.i.d. across $i$ and $t$ with zero mean, variance $\sigma_0^2$, and $E|v_{it}|^{4+\eta} < \infty$ for some $\eta > 0$.

Assumption 3. $S_n(\lambda)$ and $R_n(\rho)$ are invertible for all $\lambda \in \Lambda$ and $\rho \in P$. Furthermore, $\Lambda$ and $P$ are compact, $\lambda_0$ is in the interior of $\Lambda$ and $\rho_0$ is in the interior of $P$.

Assumption 4. The elements of $X_{nt}$ are nonstochastic and bounded,[5] uniformly in $n$ and $t$. Also, under the setting in Assumption 6, the limit of $\frac{1}{nT}\sum_{t=1}^{T} \tilde X_{nt}' \tilde X_{nt}$ exists and is nonsingular.[6]

Assumption 5. $W_n$ and $M_n$ are uniformly bounded in row and column sums in absolute value (for short, UB).[7] Also, $S_n^{-1}(\lambda)$ and $R_n^{-1}(\rho)$ are UB,[8] uniformly in $\lambda \in \Lambda$ and $\rho \in P$.

Assumption 6. (1) $n$ is large, where $T$ can be finite or large; or (2) $T$ is large, where $n$ can be finite or large.

[4] We do not have an exact finite small sample theory for the estimators with both n and T being finite.
Assumption 1 is a standard normalization assumption in spatial econometrics. This assumption helps the interpretation of the spatial effect, as self-influence shall be excluded in practice. Assumption 2 provides regularity assumptions for $v_{it}$, and our analysis is based on i.i.d. disturbances. If there is unknown heteroskedasticity, the MLE (QMLE) would not be consistent. Consistent methods such as the GMM in Lin and Lee (2005) and that in Kelejian and Prucha (2007) may be designed for the model. Invertibility of $S_n(\lambda)$ and $R_n(\rho)$ in Assumption 3 guarantees that (2.2) is valid. Also, compactness is a condition for theoretical analysis. In many empirical applications, each of the rows of $W_n$ and $M_n$ sums to 1, which ensures that all the weights are between 0 and 1. When $W_n$ and $M_n$ are row normalized, it is common to take a compact subset of (-1,1) as the parameter space. When exogenous variables $X_{nt}$ are included in the model, it is convenient to assume that the exogenous regressors are uniformly bounded, as in Assumption 4. Assumption 5 originates from Kelejian and Prucha (1998, 2001) and is also used in Lee (2004, 2007). That $W_n$, $M_n$, $S_n^{-1}(\lambda)$ and $R_n^{-1}(\rho)$ are UB is a condition that limits the spatial correlation to a manageable degree. Assumption 6 allows three cases: (i) both $n$ and $T$ are large; (ii) $T$ is fixed and $n$ is large; (iii) $n$ is fixed and $T$ is large. For (ii), we are interested in the short panel data case, in contrast to the case where $T$ needs to be large in other studies, e.g., Hahn and Kuersteiner (2002) and Yu et al. (2006). When $n$ is large and $T$ is finite, the incidental parameter problem may appear, so that careful estimation methods need to be designed. However, our suggested transformation approach for the estimation of (2.1) is general and it may also apply to the cases (i) and (iii) where $T$ can be large.
[5] If $X_{nt}$ is allowed to be stochastic and unbounded, appropriate moment conditions can be imposed instead.
[6] For notational purposes, we define $\tilde Y_{nt} = Y_{nt} - \bar Y_{nT}$ and $\tilde Y_{n,t-1} = Y_{n,t-1} - \bar Y_{nT,-1}$ for $t = 1, 2, \ldots, T$, where $\bar Y_{nT} = \frac{1}{T}\sum_{t=1}^{T} Y_{nt}$ and $\bar Y_{nT,-1} = \frac{1}{T}\sum_{t=1}^{T} Y_{n,t-1}$. Similarly, we define $\tilde X_{nt} = X_{nt} - \bar X_{nT}$ and $\tilde V_{nt} = V_{nt} - \bar V_{nT}$.
[7] We say a (sequence of $n \times n$) matrix $P_n$ is uniformly bounded in row and column sums if $\sup_{n \ge 1} \|P_n\|_{\infty} < \infty$ and $\sup_{n \ge 1} \|P_n\|_1 < \infty$, where $\|P_n\|_{\infty} = \sup_{1 \le i \le n} \sum_{j=1}^{n} |p_{ij,n}|$ is the row sum norm and $\|P_n\|_1 = \sup_{1 \le j \le n} \sum_{i=1}^{n} |p_{ij,n}|$ is the column sum norm.
[8] This assumption has effectively ruled out some cases and, hence, imposed limited dependence across spatial units. For example, if $\lambda_{0n} = 1 - 1/n$ under $n \to \infty$, it is a near unit root case for a cross-sectional spatial autoregressive model and $S_n^{-1}$ will not be UB (see Lee and Yu (2007b)).

2.1 Data Transformation and Conditional Likelihood

Let $[F_{T,T-1}, \frac{1}{\sqrt T} l_T]$ be the orthonormal matrix of the eigenvectors of $J_T = (I_T - \frac{1}{T} l_T l_T')$, where $F_{T,T-1}$ is the $T \times (T-1)$ eigenvector matrix corresponding to the eigenvalues of one,[9] and $\frac{1}{\sqrt T} l_T$ is the normalized $T$-dimensional column vector of ones. For any $n \times T$ matrix $[Z_{n1}, \ldots, Z_{nT}]$, where each $Z_{nt}$, $t = 1, \ldots, T$, is an $n$-dimensional column vector, we define the corresponding transformed $n \times (T-1)$ matrix $[Z_{n1}^*, \ldots, Z_{n,T-1}^*] = [Z_{n1}, \ldots, Z_{nT}]F_{T,T-1}$. Denote $X_{nt} = [X_{nt,1}, X_{nt,2}, \ldots, X_{nt,k_X}]$. Then, (2.1) implies

$Y_{nt}^* = \lambda_0 W_n Y_{nt}^* + X_{nt}^*\beta_0 + U_{nt}^*, \quad U_{nt}^* = \rho_0 M_n U_{nt}^* + V_{nt}^*, \quad t = 1, \ldots, T-1.$  (2.3)

Because $(V_{n1}^{*\prime}, \ldots, V_{n,T-1}^{*\prime})' = (F_{T,T-1}' \otimes I_n)(V_{n1}', \ldots, V_{nT}')'$ and $v_{it}$ is i.i.d., we have

$E\,(V_{n1}^{*\prime}, \ldots, V_{n,T-1}^{*\prime})'(V_{n1}^{*\prime}, \ldots, V_{n,T-1}^{*\prime}) = \sigma_0^2 (F_{T,T-1}' \otimes I_n)(F_{T,T-1} \otimes I_n) = \sigma_0^2 (F_{T,T-1}' F_{T,T-1} \otimes I_n) = \sigma_0^2 I_{n(T-1)}.$

Hence, the $v_{it}^*$'s are uncorrelated for all $i$ and $t$ (and independent under normality), where $v_{it}^*$ is the $i$th element of $V_{nt}^*$.
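The two properties of $F_{T,T-1}$ used here, orthonormal columns and orthogonality to $l_T$, are easy to check numerically. A minimal sketch follows; obtaining $F_{T,T-1}$ from an eigendecomposition of $J_T$ is one choice among many, and the Helmert matrix would work equally well.

```python
import numpy as np

T = 5
JT = np.eye(T) - np.ones((T, T)) / T         # J_T = I_T - (1/T) l_T l_T'
eigval, eigvec = np.linalg.eigh(JT)
F = eigvec[:, eigval > 0.5]                  # F_{T,T-1}: eigenvectors for eigenvalue one

lT = np.ones(T)
print(np.allclose(F.T @ F, np.eye(T - 1)))   # orthonormal columns -> True
print(np.allclose(F.T @ lT, 0.0))            # F'_{T,T-1} l_T = 0 -> True
# The individual effect enters every period as c_{n0}, i.e. as c_{n0} l_T',
# so post-multiplying by F_{T,T-1} gives c_{n0} (l_T' F_{T,T-1}) = 0.
```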
Denote $\theta = (\beta', \lambda, \rho, \sigma^2)'$ and $\delta = (\beta', \lambda, \rho)'$. At the true values, $\theta_0 = (\beta_0', \lambda_0, \rho_0, \sigma_0^2)'$ and $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$. The likelihood function of (2.3), as if the disturbances were normally distributed, is

$\ln L_{n,T}(\theta) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln \sigma^2 + (T-1)[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} V_{nt}^{*\prime}(\delta)V_{nt}^*(\delta),$  (2.4)
where $V_{nt}^*(\delta) = R_n(\rho)[S_n(\lambda)Y_{nt}^* - X_{nt}^*\beta]$. Thus, $V_{nt}^* = V_{nt}^*(\delta_0)$. The QMLE $\hat\theta_{nT}$ is the extremum estimator derived from the maximization of (2.4). For any $n$-dimensional column vectors $p_{nt}$ and $q_{nt}$, as

$\sum_{t=1}^{T-1} p_{nt}^{*\prime} q_{nt}^* = (p_{n1}', \ldots, p_{nT}')(F_{T,T-1} \otimes I_n)(F_{T,T-1}' \otimes I_n)(q_{n1}', \ldots, q_{nT}')' = (p_{n1}', \ldots, p_{nT}')(J_T \otimes I_n)(q_{n1}', \ldots, q_{nT}')' = \sum_{t=1}^{T} \tilde p_{nt}' \tilde q_{nt}$

by using $(\tilde p_{n1}, \ldots, \tilde p_{nT}) = (p_{n1}, \ldots, p_{nT})J_T$, (2.4) can be rewritten as

$\ln L_{n,T}(\theta) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln \sigma^2 + (T-1)[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T} \tilde V_{nt}'(\delta)\tilde V_{nt}(\delta),$  (2.5)
where $\tilde V_{nt}(\delta) = R_n(\rho)[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\beta]$. From (2.5), the first and second order derivatives of the likelihood function are (A.1) and (A.2) in Appendix A.1. At the true $\theta_0$, they are (A.3) and (A.4). We note that the likelihood function in (2.5) has a conditional likelihood interpretation. It is the conditional likelihood, conditional on $\bar Y_{nT}$, which is a sufficient statistic for $c_{n0}$ under normality. This is so as follows. (2.1) implies that $\bar Y_{nT} = \lambda_0 W_n \bar Y_{nT} + \bar X_{nT}\beta_0 + c_{n0} + \bar U_{nT}$ with $\bar U_{nT} = \rho_0 M_n \bar U_{nT} + \bar V_{nT}$, and $\tilde Y_{nt} = \lambda_0 W_n \tilde Y_{nt} + \tilde X_{nt}\beta_0 + \tilde U_{nt}$ with $\tilde U_{nt} = \rho_0 M_n \tilde U_{nt} + \tilde V_{nt}$. As $\tilde V_{nt}$, $t = 1, \ldots, T$, are independent of $\bar V_{nT}$ under normality, the likelihood in (2.5) corresponds to the density function of $\tilde Y_{nt}$, $t = 1, \ldots, T$.

[9] A special selection of $F_{T,T-1}$ gives rise to the Helmert transformation, where $V_{nt}$ is transformed to $(\frac{T-t}{T-t+1})^{1/2}[V_{nt} - \frac{1}{T-t}(V_{n,t+1} + \cdots + V_{nT})]$, which is of particular interest for dynamic panel data models.
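As a computational companion to (2.5), the sketch below codes the transformed log likelihood on within-demeaned data. It is a hypothetical implementation, not the authors' code: the function name and array shapes are assumptions, and `numpy.linalg.slogdet` replaces the determinants for numerical stability.

```python
import numpy as np

def loglik_transformed(beta, lam, rho, sig2, Y, X, W, M):
    """Log likelihood (2.5) evaluated on within-demeaned data.

    Y: (T, n) array of y_it; X: (T, n, k) array; W, M: (n, n) weights.
    """
    T, n = Y.shape
    S = np.eye(n) - lam * W                   # S_n(lambda)
    R = np.eye(n) - rho * M                   # R_n(rho)
    Yt = Y - Y.mean(axis=0)                   # tilde variables: deviations
    Xt = X - X.mean(axis=0)                   # from the time means
    ssr = 0.0
    for t in range(T):
        v = R @ (S @ Yt[t] - Xt[t] @ beta)    # V~_nt(delta)
        ssr += v @ v
    ne = n * (T - 1)                          # effective sample size n(T-1)
    return (-0.5 * ne * np.log(2 * np.pi) - 0.5 * ne * np.log(sig2)
            + (T - 1) * (np.linalg.slogdet(S)[1] + np.linalg.slogdet(R)[1])
            - ssr / (2.0 * sig2))
```

A full QMLE routine would concentrate out $\beta$ and $\sigma^2$ as in Section 3.2 and then maximize over $(\lambda, \rho)$ alone.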

2.2 Asymptotic Properties

For the likelihood function (2.5) divided by the effective sample size $n(T-1)$, the corresponding expected value function is $Q_{n,T}(\theta) = \max_{c_n} E\,\frac{1}{n(T-1)}\ln L_{n,T}(\theta, c_n)$, which is

$Q_{n,T}(\theta) = E\,\frac{1}{n(T-1)}\ln L_{n,T}(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 + \frac{1}{n}[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\frac{1}{n(T-1)}\sum_{t=1}^{T} E\,\tilde V_{nt}'(\delta)\tilde V_{nt}(\delta).$  (2.6)

To show the consistency of $\hat\theta_{nT}$, we need the following uniform convergence result.

Claim 1 Let $\Theta$ be any compact parameter space of $\theta$. Under Assumptions 1-6, $\frac{1}{n(T-1)}\ln L_{n,T}(\theta) - Q_{n,T}(\theta) \xrightarrow{p} 0$ uniformly in $\theta \in \Theta$, and $Q_{n,T}(\theta)$ is uniformly equicontinuous for $\theta \in \Theta$.
Proof. See Appendix A.2.

For local identification, a sufficient (but not necessary) condition is that the information matrix $\Sigma_{\theta_0,nT}$, where $\Sigma_{\theta_0,nT} = -E\,\frac{1}{n(T-1)}\frac{\partial^2 \ln L_{n,T}(\theta_0)}{\partial\theta\partial\theta'}$, is nonsingular and $-E\,\frac{1}{n(T-1)}\frac{\partial^2 \ln L_{n,T}(\theta)}{\partial\theta\partial\theta'}$ has full rank for any $\theta$ in some neighborhood $N(\theta_0)$ of $\theta_0$ (see Rothenberg (1971)). The $\Sigma_{\theta_0,nT}$ is derived in (A.4) of Appendix A.1 and its nonsingularity is analyzed in Appendix A.3. While the conditions for the nonsingularity of the information matrix provide local identification, the conditions in the following assumption are global ones. Denote

$H_{nT}(\rho) = \frac{1}{n(T-1)}\sum_{t=1}^{T} (\tilde X_{nt}, G_n\tilde X_{nt}\beta_0)' R_n'(\rho)R_n(\rho)(\tilde X_{nt}, G_n\tilde X_{nt}\beta_0),$

$\sigma_n^2(\rho) = \frac{\sigma_0^2}{n}\mathrm{tr}[(R_n(\rho)R_n^{-1})'(R_n(\rho)R_n^{-1})],$

$\sigma_n^2(\lambda, \rho) = \frac{\sigma_0^2}{n}\mathrm{tr}[(R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})'(R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})],$

where $G_n = W_n S_n^{-1}$.

Assumption 7. Either (a) the limit of $H_{nT}(\rho)$ is nonsingular for each possible $\rho$ in $P$ and the limit of $\frac{1}{n}\ln|\sigma_0^2 R_n^{-1\prime} R_n^{-1}| - \frac{1}{n}\ln|\sigma_n^2(\rho)R_n^{-1}(\rho)'R_n^{-1}(\rho)|$ is not zero[10] for $\rho \neq \rho_0$; or (b) the limit of

$\frac{1}{n}\ln|\sigma_0^2 R_n^{-1\prime} S_n^{-1\prime} S_n^{-1} R_n^{-1}| - \frac{1}{n}\ln|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'S_n^{-1}(\lambda)R_n^{-1}(\rho)|$

is not zero for $(\lambda, \rho) \neq (\lambda_0, \rho_0)$.[11]

[10] When $n$ is finite and $T$ is large, this inequality becomes $\frac{1}{n}\ln|\sigma_0^2 R_n^{-1\prime} R_n^{-1}| - \frac{1}{n}\ln|\sigma_n^2(\rho)R_n^{-1}(\rho)'R_n^{-1}(\rho)| \neq 0$.
[11] The inequality will be $\frac{1}{n}\ln|\sigma_0^2 R_n^{-1\prime} S_n^{-1\prime} S_n^{-1} R_n^{-1}| - \frac{1}{n}\ln|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'S_n^{-1}(\lambda)R_n^{-1}(\rho)| \neq 0$ when $n$ is finite and $T$ is large. When $M_n = W_n$ and $\lambda_0 \neq \rho_0$, this condition would not be satisfied, as $(\lambda_0, \rho_0)$ and $(\rho_0, \lambda_0)$ could not be distinguished from each other. Identification will rely on either Assumption 7(a) or extra information on the order of magnitudes of $\lambda_0$ and $\rho_0$.

6
This assumption states the identification conditions of the model, which generalize those for a cross-section SAR model in Lee and Liu (2006) to the panel case. Part (a) of Assumption 7 represents the possible identification of $\lambda_0$ and $\beta_0$ through the deterministic part of the reduced form equation of (2.3) and the identification of $\sigma_0^2$ and $\rho_0$ from the SAR process of $U_{nt}^*$ in (2.3). Part (b) of Assumption 7 provides identification through the SAR process of the reduced form disturbances of $Y_{nt}^*$. The global identification and consistency are shown in the following theorem.

Theorem 1 Under Assumptions 1-7, $\theta_0$ is globally identified and, for the extremum estimator $\hat\theta_{nT}$ derived from (2.5), $\hat\theta_{nT} \xrightarrow{p} \theta_0$.

Proof. See Appendix A.4.

The asymptotic distribution of the QMLE $\hat\theta_{nT}$ can be derived from the Taylor expansion of $\frac{\partial \ln L_{n,T}(\hat\theta_{nT})}{\partial\theta}$ around $\theta_0$. At $\theta_0$, the first order derivative of the likelihood function involves both linear and quadratic functions of $\tilde V_{nt}$ and is derived in (A.3). The variance matrix of $\frac{1}{\sqrt{n(T-1)}}\frac{\partial \ln L_{n,T}(\theta_0)}{\partial\theta}$ is equal to

$E\,\frac{1}{n(T-1)}\,\frac{\partial \ln L_{n,T}(\theta_0)}{\partial\theta}\,\frac{\partial \ln L_{n,T}(\theta_0)}{\partial\theta'} = \Sigma_{\theta_0,nT} + \Omega_{\theta_0,n},$

and

$\Omega_{\theta_0,n} = \frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\begin{pmatrix} 0_{k_X \times k_X} & & & \\ 0_{1 \times k_X} & \frac{1}{n}\sum_{i=1}^{n}\dot G_{n,ii}^2 & & \\ 0_{1 \times k_X} & \frac{1}{n}\sum_{i=1}^{n}\dot G_{n,ii}H_{n,ii} & \frac{1}{n}\sum_{i=1}^{n}H_{n,ii}^2 & \\ 0_{1 \times k_X} & \frac{1}{2\sigma_0^2 n}\mathrm{tr}\,\dot G_n & \frac{1}{2\sigma_0^2 n}\mathrm{tr}\,H_n & \frac{1}{4\sigma_0^4} \end{pmatrix}$

is a symmetric matrix, with $\mu_4$ being the fourth moment of $v_{it}$, where $\dot G_{n,ii}$ is the $(i,i)$ entry of $\dot G_n$, $H_{n,ii}$ is the $(i,i)$ entry of $H_n$ (with $H_n = M_n R_n^{-1}$), and $\dot G_n$ is a matrix transformed from $G_n$ as defined in Appendix A.1 after (A.4). When the $V_{nt}$ are normally distributed, $\Omega_{\theta_0,n} = 0$ because $\mu_4 - 3\sigma_0^4 = 0$ for a normal distribution. Denote $\Sigma_{\theta_0}$ as the limit of $\Sigma_{\theta_0,nT}$ and $\Omega_{\theta_0}$ as the limit of $\Omega_{\theta_0,n}$; then, the limiting variance matrix of $\frac{1}{\sqrt{n(T-1)}}\frac{\partial \ln L_{n,T}(\theta_0)}{\partial\theta}$ is equal to $\Sigma_{\theta_0} + \Omega_{\theta_0}$. The asymptotic distribution of $\frac{1}{\sqrt{n(T-1)}}\frac{\partial \ln L_{n,T}(\theta_0)}{\partial\theta}$ can be derived from the central limit theorem for martingale difference arrays.[12] Denote $C_n = G_n - \frac{\mathrm{tr}\,G_n}{n}I_n$ and $D_n = H_n - \frac{\mathrm{tr}\,H_n}{n}I_n$.
Assumption 8. The limit of $\frac{1}{n^2}\left[\mathrm{tr}(C_n^s C_n^s)\,\mathrm{tr}(D_n^s D_n^s) - \mathrm{tr}^2(C_n^s D_n^s)\right]$, where $A^s = A + A'$ for a square matrix $A$, is strictly positive.[13]

Assumption 8 is a condition for the nonsingularity of the limiting information matrix $\Sigma_{\theta_0}$ (see Appendix A.3). When the limit of $H_{nT}$ is singular, as long as the limit of $\frac{1}{n^2}\left[\mathrm{tr}(C_n^s C_n^s)\,\mathrm{tr}(D_n^s D_n^s) - \mathrm{tr}^2(C_n^s D_n^s)\right]$ is strictly positive, the limiting information matrix $\Sigma_{\theta_0}$ is still nonsingular. Also, its rank does not change in a small neighborhood of $\theta_0$.[14]

Claim 2 Under Assumptions 1-6 and 7(a), or 1-6, 7(b) and 8, $\frac{1}{\sqrt{n(T-1)}}\frac{\partial \ln L_{n,T}(\theta_0)}{\partial\theta} \xrightarrow{d} N(0, \Sigma_{\theta_0} + \Omega_{\theta_0})$. When $\{v_{it}\}$, $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, are normal, $\frac{1}{\sqrt{n(T-1)}}\frac{\partial \ln L_{n,T}(\theta_0)}{\partial\theta} \xrightarrow{d} N(0, \Sigma_{\theta_0})$.

[12] When $T$ is finite, we can use the central limit theorem in Kelejian and Prucha (2001). When $T$ is large, we can use the central limit theorem in Yu et al. (2006).
[13] When $n$ is finite and $T$ is large, Assumption 8 is "$\frac{1}{n^2}\left[\mathrm{tr}(C_n^s C_n^s)\,\mathrm{tr}(D_n^s D_n^s) - \mathrm{tr}^2(C_n^s D_n^s)\right] > 0$".
[14] See (C.10) in Yu et al. (2006) for the case where $T$ is large. When $T$ is finite, it still holds according to Lee (2004).

Proof. See Appendix A.5.

Also, under Assumptions 1-7, we have $\frac{1}{n(T-1)}\frac{\partial^2 \ln L_{n,T}(\theta)}{\partial\theta\partial\theta'} - \frac{1}{n(T-1)}\frac{\partial^2 \ln L_{n,T}(\theta_0)}{\partial\theta\partial\theta'} = \|\theta - \theta_0\| \cdot O_p(1)$ and $\frac{1}{n(T-1)}\frac{\partial^2 \ln L_{n,T}(\theta_0)}{\partial\theta\partial\theta'} - \frac{\partial^2 Q_{n,T}(\theta_0)}{\partial\theta\partial\theta'} = O_p\!\left(\frac{1}{\sqrt{n(T-1)}}\right)$.[15] Combined with Claim 2, we have the following theorem for the distribution of $\hat\theta_{nT}$.

Theorem 2 Under Assumptions 1-6 and 7(a), or 1-6, 7(b) and 8, for the extremum estimator $\hat\theta_{nT}$ derived from (2.5),

$\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N(0, \Sigma_{\theta_0}^{-1}(\Sigma_{\theta_0} + \Omega_{\theta_0})\Sigma_{\theta_0}^{-1}).$  (2.7)

Additionally, if $\{v_{it}\}$, $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, are normal, (2.7) becomes $\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N(0, \Sigma_{\theta_0}^{-1})$.
Proof. See Appendix A.6.

Hence, after the data transformation to eliminate the individual effects, the QMLE is consistent and asymptotically normal when either $n$ or $T$ is large.

3 The Direct Approach

For the estimation of the linear panel regression model with fixed individual effects, the ML approach which estimates the fixed effects directly provides consistent estimates of the regression coefficients, which are known as the within estimates. For the spatial panel model with fixed individual effects, one may wonder whether or not the ML approach will yield consistent estimates when $T$ is small. As we will see below, this direct approach will yield the same consistent estimator as the transformation approach for $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$; however, the estimator of $\sigma_0^2$ is inconsistent unless $T$ is large.

3.1 The Likelihood Function

The likelihood function for the model before transformation, (2.1), is

$\ln L_{n,T}^d(\theta, c_n) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T} V_{nt}'(\delta, c_n)V_{nt}(\delta, c_n),$  (3.1)

where $V_{nt}(\delta, c_n) = R_n(\rho)[S_n(\lambda)Y_{nt} - X_{nt}\beta - c_n]$. We can estimate $c_n$ directly and carry out the asymptotic analysis of the estimator of $\theta$ via the concentrated likelihood function.

Using the first order condition $\frac{\partial \ln L_{n,T}^d(\theta, c_n)}{\partial c_n} = \frac{1}{\sigma^2}R_n'(\rho)\sum_{t=1}^{T} V_{nt}(\delta, c_n) = 0$, we have $\hat c_{nT}(\delta) = \frac{1}{T}\sum_{t=1}^{T}(S_n(\lambda)Y_{nt} - X_{nt}\beta)$, and the concentrated likelihood is

$\ln L_{n,T}^d(\theta) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T} \tilde V_{nt}'(\delta)\tilde V_{nt}(\delta),$  (3.2)

[15] See (C.7) and (C.8) in Yu et al. (2006) for the case where $T$ is large. When $T$ is finite, it still holds according to Lee (2004).

with $\tilde V_{nt}(\delta)$ being the same as in (2.5). One may compare the concentrated likelihood function in (3.2) with the likelihood function from the transformation approach in (2.5). We see that the difference is in the use of $T$ in (3.2) but $(T-1)$ in (2.5). For large $T$, the two functions can be very close to each other. Therefore, we may expect that the estimates of $\theta_0$ from these two approaches could be asymptotically equivalent when $T$ is large. The interesting comparison is for the case where $T$ is finite.

3.2 Asymptotic Properties

For (3.2), we can further concentrate out $\beta$ and $\sigma^2$ and focus on $(\lambda, \rho)$. Denote

$\hat\beta_{nT}^d(\lambda, \rho) = \left[\sum_{t=1}^{T}\tilde X_{nt}'R_n'(\rho)R_n(\rho)\tilde X_{nt}\right]^{-1}\left[\sum_{t=1}^{T}\tilde X_{nt}'R_n'(\rho)R_n(\rho)S_n(\lambda)\tilde Y_{nt}\right],$

$\hat\sigma_{nT}^{d2}(\lambda, \rho) = \frac{1}{nT}\sum_{t=1}^{T}\left[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\hat\beta_{nT}^d(\lambda, \rho)\right]'R_n'(\rho)R_n(\rho)\left[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\hat\beta_{nT}^d(\lambda, \rho)\right].$

The concentrated log likelihood function of $(\lambda, \rho)$ is

$\ln L_{n,T}^d(\lambda, \rho) = -\frac{nT}{2}(\ln(2\pi) + 1) - \frac{nT}{2}\ln\hat\sigma_{nT}^{d2}(\lambda, \rho) + T[\ln|S_n(\lambda)| + \ln|R_n(\rho)|].$  (3.3)

We can compare it with the concentrated likelihood function from (2.5), where the corresponding estimates are

$\hat\beta_{nT}(\lambda, \rho) = \left[\sum_{t=1}^{T}\tilde X_{nt}'R_n'(\rho)R_n(\rho)\tilde X_{nt}\right]^{-1}\left[\sum_{t=1}^{T}\tilde X_{nt}'R_n'(\rho)R_n(\rho)S_n(\lambda)\tilde Y_{nt}\right],$

$\hat\sigma_{nT}^2(\lambda, \rho) = \frac{1}{n(T-1)}\sum_{t=1}^{T}\left[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\hat\beta_{nT}(\lambda, \rho)\right]'R_n'(\rho)R_n(\rho)\left[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\hat\beta_{nT}(\lambda, \rho)\right],$

and the $\hat\sigma_{nT}^2(\lambda, \rho)$ of the transformation approach is consistent even when $T$ is small (and $n$ goes to infinity). The concentrated log likelihood function of $(\lambda, \rho)$ from (2.5) is

$\ln L_{n,T}(\lambda, \rho) = -\frac{n(T-1)}{2}(\ln(2\pi) + 1) - \frac{n(T-1)}{2}\ln\hat\sigma_{nT}^2(\lambda, \rho) + (T-1)[\ln|S_n(\lambda)| + \ln|R_n(\rho)|].$  (3.4)

Note that $\hat\beta_{nT}^d(\lambda, \rho)$ is the same as $\hat\beta_{nT}(\lambda, \rho)$, and $\hat\sigma_{nT}^{d2}(\lambda, \rho) = \frac{T-1}{T}\hat\sigma_{nT}^2(\lambda, \rho)$. Equation (3.3) can be rewritten as $\ln L_{n,T}^d(\lambda, \rho) = -\frac{nT}{2}(\ln(2\pi) + \ln\frac{T-1}{T} + 1) - \frac{nT}{2}\ln\hat\sigma_{nT}^2(\lambda, \rho) + T[\ln|S_n(\lambda)| + \ln|R_n(\rho)|]$. By comparing (3.3) and (3.4), we can see that they will yield the same maximizer $(\hat\lambda_{nT}, \hat\rho_{nT})$. As $\hat\beta_{nT}^d(\lambda, \rho)$ has the same expression as $\hat\beta_{nT}(\lambda, \rho)$, we can conclude that the QMLE of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ from this direct approach will yield the same consistent estimate as the transformation approach. However, the estimation of $\sigma_0^2$ from the direct approach will not be consistent unless $T$ is large, which can be seen from $\hat\sigma_{nT}^{d2}(\lambda, \rho)$ and $\hat\sigma_{nT}^2(\lambda, \rho)$.[16] Hence, the ML estimation of the spatial panel model with fixed individual effects shares some common features on their estimates with those of the ML estimation of the linear panel regression model with fixed effects.[17]

[16] Note that, for the linear panel regression model with fixed effects, while the within estimates of the regression coefficients are consistent, the corresponding MLE of $\sigma_0^2$ is not, which is a consequence of the incidental parameters problem (Neyman and Scott, 1948).
[17] As the bias of the direct estimate of $\sigma_0^2$ is due to the degrees of freedom $(T-1)$ instead of $T$, one may easily correct the biased estimate to a bias corrected estimate. The bias corrected estimator will become the conditional likelihood estimator in this model.
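The degrees-of-freedom correction in footnote 17 can be seen in a stripped-down experiment: for pure noise, dividing the within sum of squares by $nT$ (the direct divisor) converges to $\frac{T-1}{T}\sigma_0^2$, while dividing by $n(T-1)$ (the transformation divisor) is consistent. The sketch below is a hypothetical illustration with assumed sample sizes, not one of the paper's simulations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, sig2 = 20000, 4, 1.0              # hypothetical sizes (an assumption)
V = rng.normal(scale=np.sqrt(sig2), size=(T, n))

Vt = V - V.mean(axis=0)                 # within (time-demeaned) disturbances
ssr = (Vt ** 2).sum()

sig2_direct = ssr / (n * T)             # direct divisor nT: biased toward (T-1)/T * sig2
sig2_transf = ssr / (n * (T - 1))       # transformation divisor n(T-1): consistent
sig2_corrected = T / (T - 1) * sig2_direct   # footnote 17's bias correction
print(sig2_direct, sig2_transf, sig2_corrected)
```

With $T = 4$, the direct estimate hovers around $0.75\sigma_0^2$, and the corrected estimate coincides with the transformation-approach one, mirroring the exact relation $\hat\sigma_{nT}^{d2} = \frac{T-1}{T}\hat\sigma_{nT}^2$.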

4 A General Model With Time Effects: Transformation Approach

Both Baltagi et al. (2003) and Kapoor et al. (2007) focus on models with only individual effects. In the panel data literature, however, there are also two-way error component regression models where we have not only unobservable individual effects but also unobservable time effects (see Wallace and Hussain (1969), Amemiya (1971), Nerlove (1971) and Hahn and Moon (2006), etc.). Hence, it is natural to generalize the model to include both individual effects and time effects. This would be useful for empirical applications where the time dummy effects might be important and should be taken into account, for example, in growth theory and regional economics (see Ertur and Koch (2007) and Foote (2007) for recent empirical applications of panel data models with both time dummy effects and spatial effects). Hence, we generalize (2.1) to

$Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + \alpha_{t0}l_n + U_{nt}, \quad U_{nt} = \rho_0 M_n U_{nt} + V_{nt}, \quad t = 1, 2, \ldots, T,$  (4.1)

where $\alpha_{t0}$ is the fixed time effect. For (4.1), we may first eliminate the individual effects by $F_{T,T-1}$, similarly to (2.3), which yields

$Y_{nt}^* = \lambda_0 W_n Y_{nt}^* + X_{nt}^*\beta_0 + \alpha_t^* l_n + U_{nt}^*, \quad U_{nt}^* = \rho_0 M_n U_{nt}^* + V_{nt}^*, \quad t = 1, 2, \ldots, T-1,$  (4.2)

where $[\alpha_1^* l_n, \alpha_2^* l_n, \ldots, \alpha_{T-1}^* l_n] = [\alpha_{10}l_n, \alpha_{20}l_n, \ldots, \alpha_{T0}l_n]F_{T,T-1}$ can be considered as the transformed time effects. We can apply a further transformation to (4.2) to eliminate the transformed time effects; this further transformation approach is investigated in this section. Alternatively, we can estimate the $\alpha_t^*$ directly. Section 5 covers the direct approach, where we will estimate the transformed time effects directly. Furthermore, we might be interested in investigating the estimators when we estimate both time effects and individual effects directly. This is also discussed in Section 5.

4.1 Data Transformation and the Likelihood Function

To eliminate the time dummy effects, we need $W_n$ and $M_n$ to be row normalized for analytical purposes.[18] Also, Assumption 4 is changed accordingly. Let $J_n = I_n - \frac{1}{n}l_n l_n'$ be the deviation from the group mean transformation over spatial units.

Assumption 1'. $W_n$ and $M_n$ are row normalized nonstochastic spatial weights matrices.

Assumption 4'. The elements of $X_{nt}$ are nonstochastic and bounded, uniformly in $n$ and $t$. Also, under the setting in Assumption 6, the limit of $\frac{1}{nT}\sum_{t=1}^{T}\tilde X_{nt}'J_n\tilde X_{nt}$ exists and is nonsingular.

Let $(F_{n,n-1}, l_n/\sqrt n)$ be the orthonormal matrix of eigenvectors of $J_n$, where $F_{n,n-1}$ corresponds to the eigenvalues of one and $l_n/\sqrt n$ corresponds to the eigenvalue zero. Similarly to Lee and Yu (2007a), we can transform the $n$-dimensional vector $Y_{nt}^*$ to an $(n-1)$-dimensional vector $Y_{nt}^{**}$ such that $Y_{nt}^{**} = F_{n,n-1}'Y_{nt}^*$.

[18] When $W_n$ and $M_n$ are not row normalized, we can still eliminate the transformed time effects; however, we will not have the representation in (4.3).

Hence, (4.2) will be transformed into

0 0
Ynt = 0 (Fn;n 1 Wn Fn;n 1 )Ynt + Xnt 0 + Unt ; Unt = 0 (Fn;n 1 Mn Fn;n 1 )Unt + Vnt , (4.3)

0 0
where Xnt;k = Fn;n 1 Xnt;k
and Vnt = Fn;n V . After the transformations, the e¤ective sample size is
0 1 1 nt 0 1
Vn1 Vn1
B . C 0 B .. C 0 0
now (n 1)(T 1). Because @ .. A = (IT 1 Fn;n 1) @ . A = (IT 1 Fn;n 1 )(FT;T 1
Vn;T 1 Vn;T 1
0 1 0 1 0 1
Vn1 Vn1 Vn1
B C B . C B . C
In ) @ ... 0
A = (FT;T 1
0
Fn;n 1 ) @ .. A, we have E @ .. A (Vn1 0 ; 0
; Vn;T 1 ) = 20 (FT;T 1
VnT VnT Vn;T 1
0
Fn;n 1 )(FT;T 1 Fn;n 1 ) = 20 (IT 1 In 1 ) = 20 I(n 1)(T 1) . Hence, vit ’s are uncorrelated for all i and t
(and independent under normality) where vit is the ith element of Vnt .
The likelihood function for (4.3) is
\[
\begin{aligned}
\ln L_{n,T}(\theta) = {}& -\frac{(n-1)(T-1)}{2}\ln 2\pi - \frac{(n-1)(T-1)}{2}\ln\sigma^2 + (T-1)\ln\bigl|I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1}\bigr| \\
& + (T-1)\ln\bigl|I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1}\bigr| - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} V_{nt}^{*\prime}(\theta) V_{nt}^*(\theta),
\end{aligned} \tag{4.4}
\]
where $V_{nt}^*(\theta) = R_n^*(\rho)\bigl[(I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1}) Y_{nt}^* - X_{nt}^* \beta\bigr]$ with $R_n^*(\rho) = I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1}$, and the determinant and inverse of $(I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1})$ are
\[
\bigl|I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1}\bigr| = \frac{1}{1-\lambda}\,|I_n - \lambda W_n|, \qquad
(I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1})^{-1} = F_{n,n-1}' (I_n - \lambda W_n)^{-1} F_{n,n-1},
\]
and similarly for $(I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1})^{-1}$ (see Lee and Yu (2007a)). For any $n$-dimensional column vectors $p_{nt}$ and $q_{nt}$, as $J_n(p_{n1},\ldots,p_{nT})J_T = J_n(\tilde p_{n1},\ldots,\tilde p_{nT})$,
\[
\begin{aligned}
\sum_{t=1}^{T-1} p_{nt}^{*\prime} q_{nt}^*
&= (p_{n1}',\ldots,p_{nT}')(F_{T,T-1} \otimes F_{n,n-1})(F_{T,T-1}' \otimes F_{n,n-1}')(q_{n1}',\ldots,q_{nT}')' \\
&= (p_{n1}',\ldots,p_{nT}')(J_T \otimes J_n)(q_{n1}',\ldots,q_{nT}')' \\
&= \sum_{t=1}^{T} \tilde p_{nt}' J_n \tilde q_{nt},
\end{aligned}
\]
where $\tilde p_{nt} = p_{nt} - \frac{1}{T}\sum_{s=1}^{T} p_{ns}$ on the right-hand side.

This implies that the likelihood function (4.4) is numerically identical to
\[
\begin{aligned}
\ln L_{n,T}(\theta) = {}& -\frac{(n-1)(T-1)}{2}\ln 2\pi - \frac{(n-1)(T-1)}{2}\ln\sigma^2 - (T-1)\ln(1-\lambda) - (T-1)\ln(1-\rho) \\
& + (T-1)\ln|S_n(\lambda)| + (T-1)\ln|R_n(\rho)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T} \tilde V_{nt}'(\theta) J_n \tilde V_{nt}(\theta),
\end{aligned} \tag{4.5}
\]
where $\tilde V_{nt}(\theta) = R_n(\rho)\bigl[(I_n - \lambda W_n)\tilde Y_{nt} - \tilde X_{nt}\beta\bigr]$.$^{19}$

$^{19}$ We note that this likelihood function is, in general, not necessarily a conditional likelihood, as the sample average over spatial units at each $t$ might not be a sufficient statistic for the time dummy.
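The likelihood (4.5) is straightforward to evaluate directly, and the determinant identity used to pass from (4.4) to (4.5) can be verified numerically. The sketch below is illustrative: the block weights matrix, the data, and the parameter values are assumed for demonstration only, and no estimation is performed.

```python
import numpy as np

# Illustrative row-normalized weights matrix (3 groups of 3 fully connected units).
rng = np.random.default_rng(0)
n, T, k = 9, 5, 2
W = np.kron(np.eye(3), np.ones((3, 3)) - np.eye(3)) / 2.0
M = W
Y = rng.standard_normal((n, T))
X = rng.standard_normal((n, T, k))
Ytil = Y - Y.mean(axis=1, keepdims=True)      # deviations from time means
Xtil = X - X.mean(axis=1, keepdims=True)
J = np.eye(n) - np.ones((n, n)) / n

def loglik(beta, lam, rho, sig2):
    """Concentrated log likelihood (4.5) for the transformed approach."""
    S = np.eye(n) - lam * W
    R = np.eye(n) - rho * M
    quad = sum((R @ (S @ Ytil[:, t] - Xtil[:, t, :] @ beta)) @ J
               @ (R @ (S @ Ytil[:, t] - Xtil[:, t, :] @ beta)) for t in range(T))
    c = (n - 1) * (T - 1) / 2.0
    _, ldS = np.linalg.slogdet(S)
    _, ldR = np.linalg.slogdet(R)
    return (-c * np.log(2 * np.pi) - c * np.log(sig2)
            - (T - 1) * np.log(1 - lam) - (T - 1) * np.log(1 - rho)
            + (T - 1) * (ldS + ldR) - quad / (2 * sig2))

val = loglik(np.zeros(k), 0.2, 0.5, 1.0)

# Determinant identity: |I_{n-1} - lam F'WF| = |I_n - lam W| / (1 - lam)
# for row-normalized W (l_n is an eigenvector of W with eigenvalue 1).
lam = 0.2
eigval, eigvec = np.linalg.eigh(J)
F = eigvec[:, 1:]
lhs = np.linalg.det(np.eye(n - 1) - lam * F.T @ W @ F)
rhs = np.linalg.det(np.eye(n) - lam * W) / (1 - lam)
```

In practice one would maximize `loglik` over $(\beta, \lambda, \rho, \sigma^2)$; the sketch only evaluates it at one point.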

4.2 Asymptotic Properties

The first and second order derivatives of (4.5) are (C.1) and (C.2) in Appendix C.1. From (C.1) and (C.2), the score is in (C.3) and the information matrix $\Sigma_{\theta_0,nT} = -\mathrm{E}\,\frac{1}{(n-1)(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}$ is in (C.4).
The following assumptions provide conditions for global identification. Denote
\[
H_{nT}(\rho) = \frac{1}{(n-1)(T-1)}\sum_{t=1}^{T} (\tilde X_{nt}, G_n\tilde X_{nt}\beta_0)'\, R_n'(\rho) J_n R_n(\rho)\, (\tilde X_{nt}, G_n\tilde X_{nt}\beta_0),
\]
\[
\sigma_n^2(\rho) = \frac{\sigma_0^2}{n-1}\,\mathrm{tr}\bigl[(R_n(\rho)R_n^{-1})' J_n (R_n(\rho)R_n^{-1})\bigr],
\]
\[
\sigma_n^2(\lambda,\rho) = \frac{\sigma_0^2}{n-1}\,\mathrm{tr}\bigl[(R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})' J_n (R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})\bigr].
\]

Assumption 7'. Either (a) the limit of $H_{nT}(\rho)$ is nonsingular for each possible $\rho$ in its parameter space and the limit of $\frac{1}{n-1}\ln\bigl|\sigma_0^2 R_n^{-1\prime} J_n R_n^{-1}\bigr| - \frac{1}{n-1}\ln\bigl|\sigma_n^2(\rho) R_n^{-1}(\rho)' J_n R_n^{-1}(\rho)\bigr|$ is not zero for $\rho \ne \rho_0$;$^{20}$ or (b) the limit of $\frac{1}{n-1}\ln\bigl|\sigma_0^2 R_n^{-1\prime} S_n^{-1\prime} J_n S_n^{-1} R_n^{-1}\bigr| - \frac{1}{n-1}\ln\bigl|\sigma_n^2(\lambda,\rho) R_n^{-1}(\rho)' S_n^{-1}(\lambda)' J_n S_n^{-1}(\lambda) R_n^{-1}(\rho)\bigr|$ is not zero for $(\lambda,\rho) \ne (\lambda_0,\rho_0)$.$^{21}$

Assumption 8'. The limit of $\frac{1}{(n-1)^2}\bigl[\mathrm{tr}(C_n^s C_n^s)\,\mathrm{tr}(D_n^s D_n^s) - \mathrm{tr}^2(C_n^s D_n^s)\bigr]$ is strictly positive, where $C_n = J_n\ddot G_n - \frac{\mathrm{tr}(J_n\ddot G_n)}{n-1} I_n$ and $D_n = J_n H_n - \frac{\mathrm{tr}(J_n H_n)}{n-1} I_n$.$^{22}$

The variance matrix of $\frac{1}{\sqrt{(n-1)(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}$ is equal to
\[
\mathrm{E}\,\frac{1}{(n-1)(T-1)}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta'} = \Sigma_{\theta_0,nT} + \Omega_{\theta_0,n},
\]
where
\[
\Omega_{\theta_0,n} = \frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}
\begin{pmatrix}
0_{k_X \times k_X} & & & \\
0_{1\times k_X} & \frac{1}{n-1}\sum_{i=1}^{n} [(J_n\ddot G_n)_{ii}]^2 & & \\
0_{1\times k_X} & \frac{1}{n-1}\sum_{i=1}^{n} (J_n\ddot G_n)_{ii}(J_n H_n)_{ii} & \frac{1}{n-1}\sum_{i=1}^{n} [(J_n H_n)_{ii}]^2 & \\
0_{1\times k_X} & \frac{\mathrm{tr}(J_n\ddot G_n)}{2\sigma_0^2 (n-1)} & \frac{\mathrm{tr}(J_n H_n)}{2\sigma_0^2 (n-1)} & \frac{1}{4\sigma_0^4}
\end{pmatrix},
\]
with $\mu_4$ the fourth moment of $v_{it}$ and only the lower triangle of the symmetric matrix displayed. The asymptotics of the transformation approach with both time and individual effects eliminated can be obtained similarly to Theorem 2.

Theorem 3 Under Assumptions 1', 2, 3, 4', 5, 6 and 7'(a), or 1', 2, 3, 4', 5, 6, 7'(b) and 8', for the extremum estimator $\hat\theta_{nT}$ derived from (4.5),
\[
\sqrt{(n-1)(T-1)}\,(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N\bigl(0,\ \Sigma_{\theta_0}^{-1}(\Sigma_{\theta_0} + \Omega_{\theta_0})\Sigma_{\theta_0}^{-1}\bigr). \tag{4.6}
\]

$^{20}$ When $n$ is finite and $T$ is large, this inequality becomes $\frac{1}{n-1}\ln|\sigma_0^2 R_n^{-1\prime} J_n R_n^{-1}| - \frac{1}{n-1}\ln|\sigma_n^2(\rho) R_n^{-1}(\rho)' J_n R_n^{-1}(\rho)| \ne 0$.

$^{21}$ The inequality will be $\frac{1}{n-1}\ln|\sigma_0^2 R_n^{-1\prime} S_n^{-1\prime} J_n S_n^{-1} R_n^{-1}| - \frac{1}{n-1}\ln|\sigma_n^2(\lambda,\rho) R_n^{-1}(\rho)' S_n^{-1}(\lambda)' J_n S_n^{-1}(\lambda) R_n^{-1}(\rho)| \ne 0$ when $n$ is finite and $T$ is large. When $M_n = W_n$ and $\lambda_0 \ne \rho_0$, this condition would not be satisfied, as $(\lambda_0, \rho_0)$ and $(\rho_0, \lambda_0)$ could not be distinguished from each other. Identification will rely on either Assumption 7'(a) or extra information on the orders of magnitude of $\lambda_0$ and $\rho_0$.

$^{22}$ When $n$ is finite and $T$ is large, Assumption 8' is "$\frac{1}{(n-1)^2}[\mathrm{tr}(C_n^s C_n^s)\,\mathrm{tr}(D_n^s D_n^s) - \mathrm{tr}^2(C_n^s D_n^s)] > 0$".

Additionally, if $\{v_{it}\}$, $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$, are normal, (4.6) becomes
\[
\sqrt{(n-1)(T-1)}\,(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N(0, \Sigma_{\theta_0}^{-1}).
\]

Proof. See Appendix C.2.

Hence, after the data transformation to eliminate both the individual effects and the time effects, the QMLE is consistent and asymptotically normal when either $n$ or $T$ is large.

5 A General Model With Time Effects: Direct Approaches

5.1 Direct Approach I: Estimation of Transformed Time Effects

Given (4.2), where the individual effects are eliminated and the time effects are still present, when $n \to \infty$ and $T$ may be finite or large, we can estimate the transformed time effects consistently. Denoting $\alpha_T = (\alpha_1, \alpha_2, \ldots, \alpha_{T-1})'$, the likelihood function for (4.2) is
\[
\ln L_{n,T}^d(\theta, \alpha_T) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} \tilde V_{nt}'(\theta,\alpha_T)\tilde V_{nt}(\theta,\alpha_T), \tag{5.1}
\]
where $\tilde V_{nt}(\theta,\alpha_T) = R_n(\rho)[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\beta - \alpha_t l_n]$. By using the first order condition, given $\theta$, the estimate of $\alpha_t$ is $\hat\alpha_t(\theta) = (l_n' R_n'(\rho)R_n(\rho)l_n)^{-1}\, l_n' R_n'(\rho)R_n(\rho)(S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\beta)$. Using $R_n(\rho)l_n = (1-\rho)l_n$, the likelihood function with $\alpha_T$ concentrated out is
\[
\ln L_{n,T}^d(\theta) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} \tilde V_{nt}'(\theta) J_n \tilde V_{nt}(\theta), \tag{5.2}
\]
where $\tilde V_{nt}(\theta) = R_n(\rho)[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\beta]$. For any $n$-dimensional column vectors $p_{nt}$ and $q_{nt}$, $t = 1, \ldots, T$, with $[\tilde p_{n1}, \ldots, \tilde p_{n,T-1}] = [p_{n1}, \ldots, p_{nT}]F_{T,T-1}$ in the sums running to $T-1$ and $\tilde p_{nt} = p_{nt} - \frac{1}{T}\sum_{s=1}^{T} p_{ns}$ in the sums running to $T$, as
\[
\begin{aligned}
\sum_{t=1}^{T-1} \tilde p_{nt}' J_n \tilde q_{nt}
&= (p_{n1}',\ldots,p_{nT}')(F_{T,T-1} \otimes I_n)(I_{T-1} \otimes J_n)(F_{T,T-1}' \otimes I_n)(q_{n1}',\ldots,q_{nT}')' \\
&= (p_{n1}',\ldots,p_{nT}')(J_T \otimes J_n)(q_{n1}',\ldots,q_{nT}')' \\
&= \sum_{t=1}^{T} \tilde p_{nt}' J_n \tilde q_{nt},
\end{aligned}
\]

the likelihood function (5.2) is numerically identical to
\[
\ln L_{n,T}^d(\theta) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T} \tilde V_{nt}'(\theta) J_n \tilde V_{nt}(\theta). \tag{5.3}
\]
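The concentration step behind (5.2) can be checked numerically: with a row-normalized $M_n$, so that $R_n(\rho)l_n = (1-\rho)l_n$, substituting $\hat\alpha_t(\theta)$ back into the residual reproduces exactly the $J_n$-projected residual. The sketch below uses an illustrative weights matrix and random data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 9
W = np.kron(np.eye(3), np.ones((3, 3)) - np.eye(3)) / 2.0   # row-normalized
M = W
lam, rho = 0.3, 0.4
S = np.eye(n) - lam * W
R = np.eye(n) - rho * M
l = np.ones(n)
J = np.eye(n) - np.outer(l, l) / n

y = rng.standard_normal(n)          # stands in for S(lam) Y_nt - X_nt beta inputs
xb = rng.standard_normal(n)
u = S @ y - xb                      # S(lambda) Y_nt - X_nt beta

# First order condition for alpha_t given theta.
alpha_hat = (l @ R.T @ R @ u) / (l @ R.T @ R @ l)
v_conc = R @ (u - alpha_hat * l)

ok_eig = np.allclose(R @ l, (1 - rho) * l)   # row normalization of M_n
ok_proj = np.allclose(v_conc, J @ R @ u)     # concentrated residual = J_n R u
```

The second assertion is the algebraic reason the concentrated likelihood (5.2) involves $\tilde V_{nt}'(\theta) J_n \tilde V_{nt}(\theta)$ rather than the full quadratic form.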
For the concentrated likelihood function (5.3), the first and second order derivatives are in (D.1) and (D.2) in Appendix D.1.

From Sections 2 and 3, we can see that for the SAR panel data model with only individual effects, both the transformation approach and the direct approach yield the same consistent estimator of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$.
But the direct approach will not yield a consistent estimator of $\sigma_0^2$ as the transformation approach does, unless $T$ is large. However, for the SAR panel model with both individual and time effects, this direct approach will not yield any consistent estimator unless $n$ is large.

For the SAR panel data with both individual and time effects, one can see the difference between the two approaches via their log likelihood functions in (4.5) and (5.3). For the direct approach, the concentrated likelihood (5.3) does not adjust the degrees of freedom in the spatial units $n$, and it also does not adjust the components on the determinants of $S_n(\lambda)$ and $R_n(\rho)$, while the likelihood of the transformation approach in (4.5) does. These differences result in inconsistent estimates of $\lambda_0$ and $\rho_0$, in addition to that of $\sigma_0^2$. Because the estimate of $\beta_0$ depends on the estimates of $\lambda_0$ and $\rho_0$, it would also be inconsistent. To be convincing, the inconsistency of the QMLE with a finite (small) $n$ can be revealed by investigating the probability limit of the normalized gradient vector $\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta}$ from (D.1) and comparing it with $\frac{1}{(n-1)(T-1)}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}$ from (C.1) of the transformation approach. As the one from (C.1) is zero because the transformation approach is consistent, the differences are in the derivatives with respect to $\lambda$, $\rho$ and $\sigma^2$. For simplicity, let plim and lim denote limits where at least one of $n$ and $T$ goes to infinity. We have
\[
\mathrm{plim}\,\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\lambda} = -\frac{1}{1-\lambda_0}\lim\frac{1}{n}, \qquad
\mathrm{plim}\,\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\rho} = -\frac{1}{1-\rho_0}\lim\frac{1}{n}, \qquad
\mathrm{plim}\,\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\sigma^2} = -\frac{1}{2\sigma_0^2}\lim\frac{1}{n}.
\]
These three limits are, in general, not zero unless $n$ is large. When $n$ is finite, $\theta_0$ does not solve the equation $\mathrm{plim}\,\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = 0$. The estimator $\hat\theta_{ml}$ which maximizes the concentrated log likelihood $\ln L_{n,T}^d(\theta)$ would solve the normal equation $\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\hat\theta_{ml})}{\partial\theta} = 0$. From the asymptotic theory of extremum (or M) estimation, $\hat\theta_{ml}$ would converge in probability to a $\theta_1$ which solves the limiting equation $\mathrm{plim}\,\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_1)}{\partial\theta} = 0$ (see, e.g., Amemiya (1985), Ch. 4). But $\theta_1 \ne \theta_0$, so the estimates from the concentrated likelihood function would not be consistent unless $n$ is large. Compared to Section 3, with time effects included, the direct approach does not give a consistent estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ when $n$ is finite ($T$ goes to infinity).
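The $O(1/n)$ discrepancies above come from the fact that the direct likelihood uses $\mathrm{tr}\,G_n(\lambda)$ where the transformed likelihood uses $\mathrm{tr}(J_nG_n(\lambda))$, and for a row-normalized $W_n$ the gap is exactly $1/(1-\lambda)$. This can be verified numerically; the weights matrix below is an illustrative row-normalized block matrix.

```python
import numpy as np

n = 16
# Row-normalized blocks: 4 groups of 4 fully connected units.
W = np.kron(np.eye(4), np.ones((4, 4)) - np.eye(4)) / 3.0
lam = 0.2
G = W @ np.linalg.inv(np.eye(n) - lam * W)   # G_n(lambda) = W_n S_n^{-1}(lambda)
J = np.eye(n) - np.ones((n, n)) / n

# tr G_n(lambda) - tr(J_n G_n(lambda)) = 1/(1 - lambda) when W_n l_n = l_n,
# since then G_n(lambda) l_n = l_n / (1 - lambda).
gap = np.trace(G) - np.trace(J @ G)
```

Dividing this fixed gap by $n$ gives the $-\frac{1}{1-\lambda_0}\lim\frac{1}{n}$ term in the limiting score, which vanishes only as $n \to \infty$.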

5.2 Direct Approach II: Estimation of Both Time and Individual Effects

We can also estimate both the time effects and the individual effects directly for (4.1). The likelihood function of (4.1) is
\[
\ln L_{n,T}^d(\theta, c_n, \alpha_T) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T} V_{nt}'(\theta,c_n,\alpha_T) V_{nt}(\theta,c_n,\alpha_T), \tag{5.4}
\]
where $V_{nt}(\theta,c_n,\alpha_T) = R_n(\rho)[S_n(\lambda)Y_{nt} - X_{nt}\beta - c_n - \alpha_t l_n]$. Using the first order conditions for $\alpha_t$ and $c_n$, the likelihood function with both $c_n$ and $\alpha_T$ concentrated out is
\[
\ln L_{n,T}^d(\theta) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\sum_{t=1}^{T} \tilde V_{nt}'(\theta) J_n \tilde V_{nt}(\theta). \tag{5.5}
\]
For (5.5), the first and second order derivatives are, respectively, (D.5) and (D.6) in Appendix D.2.

The concentrated likelihood estimates of $\theta_0$ from (5.5) can be derived from the first order conditions, which set the first order derivatives in (D.5) to zero. These first order conditions characterize the concentrated likelihood estimates of the direct approach. Denote these estimates as $\hat\beta_{nd}$, $\hat\lambda_{nd}$, $\hat\rho_{nd}$ and $\hat\sigma_{nd}^2$. For the direct estimation of the transformed time effects in Section 5.1, the estimates, denoted by $\tilde\beta_{nd}$, $\tilde\lambda_{nd}$, $\tilde\rho_{nd}$ and $\tilde\sigma_{nd}^2$, are characterized by the first order conditions from (D.1). These two sets of first order conditions are the same, except that the parameter $\sigma^2$ in (D.5) is replaced by $\frac{T-1}{T}\sigma^2$ in (D.1).$^{23}$ Thus, it follows that $(\tilde\beta_{nd}', \tilde\lambda_{nd}, \tilde\rho_{nd}) = (\hat\beta_{nd}', \hat\lambda_{nd}, \hat\rho_{nd})$ and $\tilde\sigma_{nd}^2 = \frac{T}{T-1}\hat\sigma_{nd}^2$. From Section 5.1, the direct estimation of the transformed time effects will yield inconsistent estimators for all the parameters unless $n$ is large. If we are to estimate both the time effects and the individual effects directly, the consistency of $\delta_0$ will require that $n$ is large, and the consistency of $\sigma_0^2$ requires that both $n$ and $T$ are large.$^{24}$

6 Monte Carlo

We conduct a small Monte Carlo experiment to evaluate the performance of our transformation approach and the direct ML estimators under different settings. We first check the case where there are individual effects but no time effects in the DGP (see (2.1)), comparing the performance of the transformation approach in Section 2 with the direct approach in Section 3. Then, we check the case where time effects are also included in the DGP (see (4.1)), comparing the transformation approach in Section 4 with the direct approaches in Section 5.

We first generate samples from (2.1):
\[
Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + U_{nt}, \qquad U_{nt} = \rho_0 M_n U_{nt} + V_{nt}, \qquad t = 1, 2, \ldots, T,
\]
using $\theta_0^a = (1.0, 0.2, 0.5, 1)'$ and $\theta_0^b = (1, 0.5, 0.2, 1)'$, where $\theta_0 = (\beta_0, \lambda_0, \rho_0, \sigma_0^2)'$, and $X_{nt}$, $c_{n0}$ and $V_{nt}$ are generated from independent standard normal distributions, and both the spatial weights matrices $W_n$ and $M_n$ we use are the same rook matrices.$^{25}$ We use $T = 5, 10, 50$ and $n = 9, 16, 49$. For each set of generated sample observations, we calculate the ML estimator $\hat\theta_{nT}$ and evaluate the bias $\hat\theta_{nT} - \theta_0$. We do this 1000 times to get $\frac{1}{1000}\sum_{i=1}^{1000}(\hat\theta_{nT} - \theta_0)_i$. With two different values of $\theta_0$ for each $n$ and $T$, finite sample properties of both estimators are summarized in Table 1. For each case, we report the bias (Bias), the empirical standard deviation (E-SD), the root mean square error (RMSE) and the theoretical standard deviation (T-SD).$^{26}$

Both approaches have the same estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$, while the estimator of $\sigma_0^2$ by the direct approach has a larger bias. The transformation approach yields a consistent estimator of $\sigma_0^2$ and the direct approach does not, which can be seen from the last two columns in Table 1 when $T$ is small. We can see that the Biases, E-SDs, RMSEs and T-SDs for the estimators of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ are small when either $n$ or $T$ is large. Also, the T-SDs are similar to the E-SDs, which implies that the Hessian matrix provides proper estimates for the variances of the estimators. Also, when $T$ is larger, the bias of the estimator of $\sigma_0^2$ by the direct approach decreases.

$^{23}$ Instead of the first order conditions, one may also follow the analysis in Section 3 by investigating the two concentrated likelihood functions of $(\lambda, \rho)$ obtained by concentrating out $\beta$ and $\sigma^2$.

$^{24}$ For this direct approach, the asymptotic bias will be of the order $O(\max(1/n, 1/T))$ and we can have bias corrected estimators which have centered normal distributions as long as $n/T^3 \to 0$ and $T/n^3 \to 0$. See Appendix D.2 for more details.

$^{25}$ We use the rook matrix based on an $r \times r$ board (so that $n = r^2$). The rook matrix represents a square tessellation with a connectivity of four for the inner fields on the chessboard and two and three for the corner and border fields, respectively. Most empirically observed regional structures in spatial econometrics are made up of regions with connectivity close to the range of the rook tessellation.

$^{26}$ The T-SD is obtained from the diagonal elements of the estimated Hessian matrix.
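For concreteness, the rook-matrix DGP described above can be sketched as follows (a minimal simulation of one sample from (2.1), using the $\theta_0^a$ values; the seed and sizes are illustrative, and the ML estimation step is not shown):

```python
import numpy as np

rng = np.random.default_rng(42)
r, T = 3, 5
n = r * r

# Rook contiguity on an r x r board: north/south/east/west neighbours.
W = np.zeros((n, n))
for i in range(r):
    for j in range(r):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < r and 0 <= jj < r:
                W[i * r + j, ii * r + jj] = 1.0
W /= W.sum(axis=1, keepdims=True)     # row normalization
M = W

beta0, lam0, rho0, sig0 = 1.0, 0.2, 0.5, 1.0
S_inv = np.linalg.inv(np.eye(n) - lam0 * W)
R_inv = np.linalg.inv(np.eye(n) - rho0 * M)
c = rng.standard_normal(n)            # individual effects c_n0
X = rng.standard_normal((n, T))
Y = np.empty((n, T))
for t in range(T):
    U = R_inv @ (sig0 * rng.standard_normal(n))   # SAR disturbance
    Y[:, t] = S_inv @ (X[:, t] * beta0 + c + U)
```

Corner cells end up with two neighbours and border cells with three, matching the connectivity described in footnote 25.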
We then generate samples from (4.1):
\[
Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + \alpha_t l_n + U_{nt}, \qquad U_{nt} = \rho_0 M_n U_{nt} + V_{nt}, \qquad t = 1, 2, \ldots, T,
\]
using the same $n$, $T$, $\theta_0^a$, $\theta_0^b$, $W_n$ and $M_n$. The $X_{nt}$, $c_{n0}$, $\alpha_{T0} = (\alpha_1, \alpha_2, \ldots, \alpha_T)'$ and $V_{nt}$ are generated from independent standard normal distributions. The finite sample properties of the estimators are summarized in Tables 2 and 3, where Table 2 reports the performance of the estimators from the transformation approach in Section 4, and Table 3 reports the performance of the estimators from both direct approaches discussed in Sections 5.1 and 5.2. We can see that the bias of the transformation approach is small. For the approach that estimates the transformed time effects directly, the bias is small when $n$ is large, while the bias is large when $n$ is small even though $T$ might be large. The direct approach has the same estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ as the approach that estimates the transformed time effects directly, while the bias of the estimate of $\sigma_0^2$ is small only when both $n$ and $T$ are large. This is consistent with the theoretical prediction. Also, when both $n$ and $T$ are large, the biases of all the parameters from the three approaches are small and the RMSEs are reduced.

Tables 1-3 here.

7 Conclusion

In this paper, we consider the estimation of a SAR panel model with fixed effects and SAR disturbances, where the time periods $T$ and/or the number of spatial units $n$ can be finite or large in all combinations except that both $T$ and $n$ are finite.

We first consider the SAR panel model with individual effects. If $T$ is finite but $n$ is large, we show that direct ML estimation, which estimates jointly all the parameters including the fixed effects, yields consistent estimators except for the variance of the disturbances. These features are similar to those of the direct ML estimation of the linear panel regression model with fixed individual effects. In this paper, we suggest a transformation approach, which eliminates the individual fixed effects and can provide consistent estimates for all the parameters, including the variance of the disturbances. When the individual effects are eliminated by taking deviations from time averages for each spatial unit, the resulting disturbances are correlated over the time dimension, and there is linear dependence among them. The transformation approach is motivated by a ML approach which takes into account the generalized inverse of the variance matrix of the resulting disturbances. The transformation approach is shown to be a conditional likelihood approach if the disturbances were normally distributed.

We consider next the SAR model with both individual and time fixed effects. We investigate two possible direct ML approaches for the estimation. The first direct approach is to transform the data to eliminate the individual effects and then estimate the remaining parameters, including the time effects, by the ML method. The second direct approach is to estimate both the individual and time effects directly. We show that the first direct ML approach will yield inconsistent estimates for all the parameters unless $n$ is large, and the second direct approach will yield consistent estimates of all the parameters only when both $n$ and $T$ are large. In fact, these two direct ML approaches provide identical estimates of the spatial effects and the regression coefficients; they differ only in the estimates of $\sigma_0^2$. These results are in contrast with those for the direct ML estimation of panel regression models with both individual and time effects, where the regression coefficients can be consistently estimated as long as either $n$ or $T$ is large. Consistent estimation based on transformations is available, where both the individual and time effects can be eliminated by proper transformations. All the parameter estimates are then consistent when either $n$ or $T$ is large. Monte Carlo results are provided to illustrate the finite sample properties of the various estimators with $n$ and/or $T$ being small or moderately large.

Compared with Baltagi et al. (2003), Baltagi et al. (2007) and Kapoor et al. (2007), where random effects are assumed, the SAR model in this paper has a fixed effects specification. The proposed estimation methods are robust regardless of the different specifications in Baltagi et al. (2003) and Kapoor et al. (2007), and are computationally simpler than the ML approach for the estimation of the generalized random effects model in Baltagi et al. (2007). However, when the individual effects are random in the true DGP, proper methods which take into account the random effects' variance structure can improve the efficiency of the estimates. A Hausman-type specification test of fixed effects vs. random effects may also be constructed. These topics may be investigated in future research.
Appendices
A Transformation Approach
A.1 The First and Second Order Derivatives

For the first and second order derivatives of (2.5), we have
\[
\frac{\partial\ln L_{n,T}(\theta)}{\partial\theta} =
\begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T} (R_n(\rho)\tilde X_{nt})' \tilde V_{nt}(\theta) \\[4pt]
\frac{1}{\sigma^2}\sum_{t=1}^{T} (R_n(\rho)W_n\tilde Y_{nt})' \tilde V_{nt}(\theta) - (T-1)\,\mathrm{tr}\,G_n(\lambda) \\[4pt]
\frac{1}{\sigma^2}\sum_{t=1}^{T} (H_n(\rho)\tilde V_{nt}(\theta))' \tilde V_{nt}(\theta) - (T-1)\,\mathrm{tr}\,H_n(\rho) \\[4pt]
\frac{1}{2\sigma^4}\sum_{t=1}^{T} \bigl(\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - n\tfrac{T-1}{T}\sigma^2\bigr)
\end{pmatrix}, \tag{A.1}
\]
and $-\frac{\partial^2\ln L_{n,T}(\theta)}{\partial\theta\,\partial\theta'}$ is the symmetric matrix with entries
\[
\begin{aligned}
(\beta,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T} (R_n(\rho)\tilde X_{nt})' R_n(\rho)\tilde X_{nt}, \\
(\lambda,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T} (R_n(\rho)W_n\tilde Y_{nt})' R_n(\rho)\tilde X_{nt}, \\
(\lambda,\lambda)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T} (R_n(\rho)W_n\tilde Y_{nt})' R_n(\rho)W_n\tilde Y_{nt} + (T-1)\,\mathrm{tr}(G_n^2(\lambda)), \\
(\rho,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T} \bigl[(H_n(\rho)\tilde V_{nt}(\theta))' R_n(\rho)\tilde X_{nt} + \tilde V_{nt}'(\theta) M_n \tilde X_{nt}\bigr], \\
(\rho,\lambda)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T} \bigl[(R_n(\rho)W_n\tilde Y_{nt})' H_n(\rho)\tilde V_{nt}(\theta) + (M_nW_n\tilde Y_{nt})' \tilde V_{nt}(\theta)\bigr], \\
(\rho,\rho)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T} (H_n(\rho)\tilde V_{nt}(\theta))' H_n(\rho)\tilde V_{nt}(\theta) + (T-1)\,\mathrm{tr}(H_n^2(\rho)), \\
(\sigma^2,\beta)&:\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T} \tilde V_{nt}'(\theta) R_n(\rho)\tilde X_{nt}, \qquad
(\sigma^2,\lambda):\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T} (R_n(\rho)W_n\tilde Y_{nt})' \tilde V_{nt}(\theta), \\
(\sigma^2,\rho)&:\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T} (H_n(\rho)\tilde V_{nt}(\theta))' \tilde V_{nt}(\theta), \qquad
(\sigma^2,\sigma^2):\ -\tfrac{n(T-1)}{2\sigma^4} + \tfrac{1}{\sigma^6}\textstyle\sum_{t=1}^{T} \tilde V_{nt}'(\theta)\tilde V_{nt}(\theta).
\end{aligned} \tag{A.2}
\]

At the true $\theta_0$, we have
\[
\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} =
\begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T} \ddot X_{nt}' \tilde V_{nt} \\[4pt]
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T} (\ddot G_n\ddot X_{nt}\beta_0)' \tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T} \bigl(\tilde V_{nt}'\ddot G_n'\tilde V_{nt} - \tfrac{T-1}{T}\sigma_0^2\,\mathrm{tr}\,\ddot G_n\bigr) \\[4pt]
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T} \bigl(\tilde V_{nt}'H_n'\tilde V_{nt} - \tfrac{T-1}{T}\sigma_0^2\,\mathrm{tr}\,H_n\bigr) \\[4pt]
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T} \bigl(\tilde V_{nt}'\tilde V_{nt} - n\tfrac{T-1}{T}\sigma_0^2\bigr)
\end{pmatrix}, \tag{A.3}
\]
and the information matrix is equal to $\Sigma_{\theta_0,nT} = -\mathrm{E}\,\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'} =$
\[
\frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT} & 0_{(k_X+1)\times 2} \\ 0_{2\times(k_X+1)} & 0_{2\times 2} \end{pmatrix}
+ \begin{pmatrix}
0_{k_X\times k_X} & & & \\
0_{1\times k_X} & \frac{1}{n}\mathrm{tr}(\ddot G_n^s\ddot G_n) & & \\
0_{1\times k_X} & \frac{1}{n}\mathrm{tr}(H_n^s\ddot G_n) & \frac{1}{n}\mathrm{tr}(H_n^s H_n) & \\
0_{1\times k_X} & \frac{1}{\sigma_0^2 n}\mathrm{tr}(\ddot G_n) & \frac{1}{\sigma_0^2 n}\mathrm{tr}(H_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}, \tag{A.4}
\]
where only the lower triangles of the symmetric matrices are displayed, and we denote $A_n^s = A_n' + A_n$ for any $n\times n$ matrix $A_n$, $G_n = W_n S_n^{-1}$, $\ddot W_n = R_n W_n R_n^{-1}$, $\ddot G_n = \ddot W_n (I_n - \lambda_0\ddot W_n)^{-1}$, $H_n = M_n R_n^{-1}$, $\ddot X_{nt} = R_n\tilde X_{nt}$, and $H_{nT} = \frac{1}{n(T-1)}\sum_{t=1}^{T} (\ddot X_{nt}, \ddot G_n\ddot X_{nt}\beta_0)'(\ddot X_{nt}, \ddot G_n\ddot X_{nt}\beta_0)$.

A.2 Proof of Claim 1


To prove that $\frac{1}{n(T-1)}\bigl[\ln L_{n,T}(\theta) - Q_{n,T}(\theta)\bigr] \xrightarrow{p} 0$ uniformly in $\theta$ in any compact parameter space $\Theta$:

From $\tilde V_{nt}(\theta) = R_n(\rho)[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\beta]$, we have
\[
\begin{aligned}
\tilde V_{nt}(\theta) - \tilde V_{nt}
&= R_n(\rho)[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\beta] - R_n[S_n\tilde Y_{nt} - \tilde X_{nt}\beta_0] \\
&= R_n(\rho)[S_n(\lambda)\tilde Y_{nt} - \tilde X_{nt}\beta] - R_n(\rho)[S_n\tilde Y_{nt} - \tilde X_{nt}\beta_0] + R_n(\rho)[S_n\tilde Y_{nt} - \tilde X_{nt}\beta_0] - R_n[S_n\tilde Y_{nt} - \tilde X_{nt}\beta_0] \\
&= R_n(\rho)\bigl[(\lambda_0 - \lambda)W_n\tilde Y_{nt} + \tilde X_{nt}(\beta_0 - \beta)\bigr] + (\rho_0 - \rho)M_n[S_n\tilde Y_{nt} - \tilde X_{nt}\beta_0] \\
&= R_n(\rho)\bigl[(\lambda_0 - \lambda)W_n\tilde Y_{nt} + \tilde X_{nt}(\beta_0 - \beta)\bigr] + (\rho_0 - \rho)H_n\tilde V_{nt}.
\end{aligned}
\]
Similarly to Lee (2004) and Yu et al. (2006), we can show that$^{27}$
\[
\frac{1}{n(T-1)}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - \frac{1}{n(T-1)}\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) \xrightarrow{p} 0 \quad\text{uniformly in } \theta \in \Theta.
\]
Hence, by using the fact that $\sigma^2$ is bounded away from zero in $\Theta$,
\[
\frac{1}{n(T-1)}\bigl[\ln L_{n,T}(\theta) - Q_{n,T}(\theta)\bigr] = -\frac{1}{2\sigma^2}\left(\frac{1}{n(T-1)}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - \frac{1}{n(T-1)}\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta)\right) \xrightarrow{p} 0
\]
uniformly in $\theta \in \Theta$.

To prove that $Q_{n,T}(\theta)$ is uniformly equicontinuous in $\theta$ in any compact parameter space $\Theta$: from (2.6),
\[
Q_{n,T}(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 + \frac{1}{n}[\ln|S_n(\lambda)| + \ln|R_n(\rho)|] - \frac{1}{2\sigma^2}\frac{1}{n(T-1)}\sum_{t=1}^{T}\mathrm{E}\,\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta).
\]
The uniform equicontinuity of $Q_{n,T}(\theta)$ can be shown similarly to Lee (2004) and Yu et al. (2006).

A.3 Information Matrix

We can prove the nonsingularity of the limiting information matrix by an argument by contradiction (similar to Lee (2004)). Denote the limit of $\Sigma_{\theta_0,nT}$ as $\Sigma_{\theta_0}$, where $\Sigma_{\theta_0,nT}$ is (A.4). We need to prove that $\Sigma_{\theta_0}c = 0$ implies $c = 0$, where $c = (c_1', c_2, c_3, c_4)'$, $c_2$, $c_3$, $c_4$ are scalars and $c_1$ is a $k_X \times 1$ vector. If this is true, then the columns of $\Sigma_{\theta_0}$ would be linearly independent and $\Sigma_{\theta_0}$ would be nonsingular. Denote $H_{\beta\beta}$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}\ddot X_{nt}'\ddot X_{nt}$, $H_{\beta\lambda}$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}\ddot X_{nt}'\ddot G_n\ddot X_{nt}\beta_0$, $H_{\lambda\beta} = H_{\beta\lambda}'$, and $H_{\lambda\lambda}$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}(\ddot G_n\ddot X_{nt}\beta_0)'\ddot G_n\ddot X_{nt}\beta_0$.$^{28}$ Then
\[
\Sigma_{\theta_0} = \frac{1}{\sigma_0^2}
\begin{pmatrix}
H_{\beta\beta} & H_{\beta\lambda} & 0_{k_X\times 1} & 0_{k_X\times 1} \\
H_{\lambda\beta} & H_{\lambda\lambda} + \sigma_0^2\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(\ddot G_n^s\ddot G_n) & \sigma_0^2\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n^s\ddot G_n) & \lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(\ddot G_n) \\
0_{1\times k_X} & \sigma_0^2\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n^s\ddot G_n) & \sigma_0^2\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n^s H_n) & \lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n) \\
0_{1\times k_X} & \lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(\ddot G_n) & \lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n) & \frac{1}{2\sigma_0^2}
\end{pmatrix}.
\]
Hence, $\Sigma_{\theta_0}c = 0$ implies

(1) $H_{\beta\beta}c_1 + H_{\beta\lambda}c_2 = 0$;

(2) $\frac{1}{\sigma_0^2}H_{\lambda\beta}c_1 + \bigl(\frac{1}{\sigma_0^2}H_{\lambda\lambda} + \lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(\ddot G_n^s\ddot G_n)\bigr)c_2 + \lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n^s\ddot G_n)\,c_3 + \frac{1}{\sigma_0^2}\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(\ddot G_n)\,c_4 = 0$;

(3) $\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n^s\ddot G_n)\,c_2 + \lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n^s H_n)\,c_3 + \frac{1}{\sigma_0^2}\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n)\,c_4 = 0$;

(4) $\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(\ddot G_n)\,c_2 + \lim_{n\to\infty}\frac{1}{n}\mathrm{tr}(H_n)\,c_3 + \frac{1}{2\sigma_0^2}c_4 = 0$.

The first equation implies $c_1 = -H_{\beta\beta}^{-1}H_{\beta\lambda}c_2$. Denote $C_n = \ddot G_n - \frac{\mathrm{tr}\,\ddot G_n}{n}I_n$ and $D_n = H_n - \frac{\mathrm{tr}\,H_n}{n}I_n$, so that $\frac{1}{n}\mathrm{tr}(\ddot G_n^s\ddot G_n) - 2\frac{\mathrm{tr}^2\ddot G_n}{n^2} = \frac{1}{2n}\mathrm{tr}(C_n^sC_n^s)$, $\frac{1}{n}\mathrm{tr}(H_n^sH_n) - 2\frac{\mathrm{tr}^2H_n}{n^2} = \frac{1}{2n}\mathrm{tr}(D_n^sD_n^s)$ and $\frac{1}{n}\mathrm{tr}(H_n^s\ddot G_n) - 2\frac{\mathrm{tr}\,H_n\,\mathrm{tr}\,\ddot G_n}{n^2} = \frac{1}{2n}\mathrm{tr}(C_n^sD_n^s)$. From the third and fourth equations, we have
\[
\lim_{n\to\infty}\left[\frac{1}{2n}\mathrm{tr}(C_n^sD_n^s)\,c_2 + \frac{1}{2n}\mathrm{tr}(D_n^sD_n^s)\,c_3\right] = 0,
\]
\[
\lim_{n\to\infty}\left[\frac{1}{n^2}\bigl(\mathrm{tr}(H_n^sH_n)\,\mathrm{tr}\,\ddot G_n - \mathrm{tr}(H_n^s\ddot G_n)\,\mathrm{tr}\,H_n\bigr)c_2 + \frac{1}{4\sigma_0^2 n}\mathrm{tr}(D_n^sD_n^s)\,c_4\right] = 0.
\]
By eliminating $c_1$, $c_3$ and $c_4$, the second equation becomes
\[
\lim_{n\to\infty}\left\{\frac{1}{2\sigma_0^2}\cdot\frac{1}{n}\mathrm{tr}(D_n^sD_n^s)\Bigl(H_{\lambda\lambda} - H_{\lambda\beta}H_{\beta\beta}^{-1}H_{\beta\lambda}\Bigr) + \Delta_n\right\}c_2 = 0,
\]
where
\[
\Delta_n = \frac{1}{4n^2}\bigl[\mathrm{tr}(C_n^sC_n^s)\,\mathrm{tr}(D_n^sD_n^s) - \mathrm{tr}^2(C_n^sD_n^s)\bigr] \tag{A.5}
\]
is nonnegative by the Cauchy inequality. The term $H_{\lambda\lambda} - H_{\lambda\beta}H_{\beta\beta}^{-1}H_{\beta\lambda}$ is nonnegative by the Schwartz inequality. The nonsingularity of $\Sigma_{\theta_0}$ follows from Assumption 7.

$^{27}$ When $n$ is large and $T$ is fixed, the derivation is similar to Lee (2004) for the cross-sectional SAR model. When $T$ is large and $n$ could be finite or large, the derivation is similar to Yu et al. (2006).

$^{28}$ When $n$ is finite and $T$ is large, we do not need the limit before each trace operator in the entries of $\Sigma_{\theta_0}$.

A.4 Proof of Theorem 1


As $\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'\tilde V_{nt} = n(T-1)\sigma_0^2$, at $\theta_0$, (2.6) implies $\mathrm{E}\ln L_{n,T}(\theta_0) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma_0^2 + (T-1)[\ln|S_n| + \ln|R_n|] - \frac{n(T-1)}{2}$. As $\tilde Y_{nt} = S_n^{-1}(\tilde X_{nt}\beta_0 + R_n^{-1}\tilde V_{nt})$ and $S_n(\lambda)S_n^{-1} = I_n + (\lambda_0-\lambda)G_n$, we have
\[
\tilde V_{nt}(\theta) = R_n(\rho)\bigl[S_n(\lambda)S_n^{-1}R_n^{-1}\tilde V_{nt} + (\lambda_0-\lambda)G_n\tilde X_{nt}\beta_0 + \tilde X_{nt}(\beta_0-\beta)\bigr].
\]
Denote
\[
\sigma_n^2(\rho) = \frac{\sigma_0^2}{n}\mathrm{tr}\bigl[(R_n(\rho)R_n^{-1})'(R_n(\rho)R_n^{-1})\bigr], \qquad
\sigma_n^2(\lambda,\rho) = \frac{\sigma_0^2}{n}\mathrm{tr}\bigl[(R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})'(R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})\bigr].
\]
It follows that
\[
\begin{aligned}
&\frac{1}{n(T-1)}\mathrm{E}\ln L_{n,T}(\theta) - \frac{1}{n(T-1)}\mathrm{E}\ln L_{n,T}(\theta_0) \\
&\quad= -\frac{1}{2}(\ln\sigma^2 - \ln\sigma_0^2) + \frac{1}{n}\ln|S_n(\lambda)| - \frac{1}{n}\ln|S_n| + \frac{1}{n}\ln|R_n(\rho)| - \frac{1}{n}\ln|R_n| - \frac{1}{2\sigma^2}\frac{1}{n(T-1)}\sum_{t=1}^{T}\mathrm{E}\,\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) + \frac{1}{2} \\
&\quad= T_{1,n}(\lambda,\rho,\sigma^2) - \frac{1}{2\sigma^2}T_{2,n,T}(\beta,\lambda,\rho),
\end{aligned}
\]
where
\[
T_{1,n}(\lambda,\rho,\sigma^2) = -\frac{1}{2}(\ln\sigma^2 - \ln\sigma_0^2) + \frac{1}{n}\ln|S_n(\lambda)| - \frac{1}{n}\ln|S_n| + \frac{1}{n}\ln|R_n(\rho)| - \frac{1}{n}\ln|R_n| - \frac{1}{2\sigma^2}\bigl(\sigma_n^2(\lambda,\rho) - \sigma^2\bigr),
\]
\[
T_{2,n,T}(\beta,\lambda,\rho) = \frac{1}{n(T-1)}\sum_{t=1}^{T}\Bigl\{\bigl(\tilde X_{nt}(\beta_0-\beta) + (\lambda_0-\lambda)G_n\tilde X_{nt}\beta_0\bigr)'R_n'(\rho)R_n(\rho)\bigl(\tilde X_{nt}(\beta_0-\beta) + (\lambda_0-\lambda)G_n\tilde X_{nt}\beta_0\bigr)\Bigr\}.
\]

From the pure SAR panel model with SAR disturbances, using the information inequality, $T_{1,n}(\lambda,\rho,\sigma^2) \le 0$ for any $(\lambda,\rho,\sigma^2)$. Also, $T_{2,n,T}(\beta,\lambda,\rho)$ is a quadratic function of $\beta$ and $\lambda$ given $\rho$.

Under the condition that the limit of $H_{nT}(\rho)$ is nonsingular, $T_{2,n,T}(\beta,\lambda,\rho) > 0$ given any $\rho$ whenever $(\beta,\lambda) \ne (\beta_0,\lambda_0)$. Hence, $(\beta,\lambda)$ is globally identified. Given $\lambda_0$, $\rho_0$ and $\sigma_0^2$ are the unique maximizers of the limiting function of $T_{1,n}(\lambda_0,\rho,\sigma^2)$ under the condition that the limit of $\frac{1}{n}\ln\bigl|\sigma_0^2 R_n^{-1\prime}R_n^{-1}\bigr| - \frac{1}{n}\ln\bigl|\sigma_n^2(\rho)R_n^{-1}(\rho)'R_n^{-1}(\rho)\bigr|$ is not zero for $\rho \ne \rho_0$.$^{29}$ Hence, $(\beta,\lambda,\rho,\sigma^2)$ is globally identified.

When the limit of $H_{nT}(\rho)$ is singular, $\beta_0$ and $\lambda_0$ cannot be identified from $T_{2,n,T}(\beta,\lambda,\rho)$. Global identification then requires that the limit of $T_{1,n}(\lambda,\rho,\sigma^2)$ be strictly less than zero. As $T_{1,n}(\lambda,\rho,\sigma^2) \le 0$ by the information inequality for the pure SAR model with SAR disturbances, the limit of $T_{1,n}(\lambda,\rho,\sigma^2)$ not being zero is equivalent to the limit of $\frac{1}{n}\ln\bigl|\sigma_0^2 R_n^{-1\prime}S_n^{-1\prime}S_n^{-1}R_n^{-1}\bigr| - \frac{1}{n}\ln\bigl|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'S_n^{-1}(\lambda)R_n^{-1}(\rho)\bigr|$ not being zero (similar to Lee (2004), Proof of Theorem 4.1). After $\lambda_0$, $\rho_0$ and $\sigma_0^2$ are identified, given $\lambda_0$, $\beta_0$ can be identified from $T_{2,n,T}(\beta,\lambda,\rho)$.

Combined with the uniform convergence and equicontinuity in Claim 1, the consistency follows.

A.5 Proof of Claim 2

The central limit theorem for martingale difference arrays can be applied. When $T$ is finite and $n$ is large, we can use the central limit theorem in Kelejian and Prucha (2001). When $T$ is large and $n$ could be finite or large, we can use the central limit theorem in Yu et al. (2006).

A.6 Proof of Theorem 2


By the Taylor expansion, $\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) = \bigl(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}\bigr)^{-1}\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}$, where $\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} \xrightarrow{d} N(0, \Sigma_{\theta_0} + \Omega_{\theta_0})$ and $\bar\theta_{nT}$ lies between $\theta_0$ and $\hat\theta_{nT}$. As
\[
-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}
= -\frac{1}{n(T-1)}\left[\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'} - \frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}\right]
- \left[\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'} + \Sigma_{\theta_0,nT}\right] + \Sigma_{\theta_0,nT},
\]
where the first term is $\lVert\bar\theta_{nT} - \theta_0\rVert \cdot O_p(1)$ and the second term is $O_p\bigl(1/\sqrt{n(T-1)}\bigr)$,$^{30}$ it follows that $-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'} = \lVert\bar\theta_{nT} - \theta_0\rVert O_p(1) + O_p\bigl(1/\sqrt{n(T-1)}\bigr) + \Sigma_{\theta_0,nT}$. Because $\bar\theta_{nT} - \theta_0 = o_p(1)$ and $\Sigma_{\theta_0,nT}$ is nonsingular in the limit, $-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}$ is invertible for large $n$ or $T$ and its inverse is $O_p(1)$. Then, it follows that $\hat\theta_{nT} - \theta_0 = O_p\bigl(1/\sqrt{n(T-1)}\bigr)$. Hence,
\[
\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) = \Bigl(\Sigma_{\theta_0,nT} + O_p\bigl(1/\sqrt{n(T-1)}\bigr)\Bigr)^{-1}\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}.
\]
Using the fact that $\bigl(\Sigma_{\theta_0,nT} + O_p(1/\sqrt{n(T-1)})\bigr)^{-1} = \Sigma_{\theta_0,nT}^{-1} + O_p\bigl(1/\sqrt{n(T-1)}\bigr)$, we have
\[
\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N\bigl(0,\ \Sigma_{\theta_0}^{-1}(\Sigma_{\theta_0} + \Omega_{\theta_0})\Sigma_{\theta_0}^{-1}\bigr).
\]

$^{29}$ This is equivalent to the identification of a pure SAR process. See Proof of Theorem 4.1 in Lee (2004).

$^{30}$ When $n$ is large and $T$ is fixed, the derivation is similar to Lee (2004) for the cross-sectional SAR model. When $T$ is large and $n$ could be finite or large, the derivation is similar to Yu et al. (2006).

B Direct Approach: The First and Second Order Derivatives
For the concentrated likelihood function (3.2), the first and second order derivatives are
\[
\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} =
\begin{pmatrix}
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'\tilde V_{nt}(\theta) \\[4pt]
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) - \sqrt{\tfrac{T}{n}}\,\mathrm{tr}\,G_n(\lambda) \\[4pt]
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) - \sqrt{\tfrac{T}{n}}\,\mathrm{tr}\,H_n(\rho) \\[4pt]
\frac{1}{2\sigma^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - n\sigma^2\bigr)
\end{pmatrix}, \tag{B.1}
\]
and $-\frac{\partial^2\ln L_{n,T}^d(\theta)}{\partial\theta\,\partial\theta'}$ is the symmetric matrix with entries
\[
\begin{aligned}
(\beta,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'R_n(\rho)\tilde X_{nt}, \\
(\lambda,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)\tilde X_{nt}, \\
(\lambda,\lambda)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)W_n\tilde Y_{nt} + T\,\mathrm{tr}(G_n^2(\lambda)), \\
(\rho,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}\bigl[(H_n(\rho)\tilde V_{nt}(\theta))'R_n(\rho)\tilde X_{nt} + \tilde V_{nt}'(\theta)M_n\tilde X_{nt}\bigr], \\
(\rho,\lambda)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}\bigl[(R_n(\rho)W_n\tilde Y_{nt})'H_n(\rho)\tilde V_{nt}(\theta) + (M_nW_n\tilde Y_{nt})'\tilde V_{nt}(\theta)\bigr], \\
(\rho,\rho)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'H_n(\rho)\tilde V_{nt}(\theta) + T\,\mathrm{tr}(H_n^2(\rho)), \\
(\sigma^2,\beta)&:\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T}\tilde V_{nt}'(\theta)R_n(\rho)\tilde X_{nt}, \qquad
(\sigma^2,\lambda):\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta), \\
(\sigma^2,\rho)&:\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta), \qquad
(\sigma^2,\sigma^2):\ -\tfrac{nT}{2\sigma^4} + \tfrac{1}{\sigma^6}\textstyle\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta).
\end{aligned} \tag{B.2}
\]
Hence,
\[
\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta} =
\begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\ddot X_{nt}'\tilde V_{nt} \\[4pt]
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\ddot G_n\ddot X_{nt}\beta_0)'\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'\ddot G_n\tilde V_{nt} - \sigma_0^2\,\mathrm{tr}\,\ddot G_n\bigr) \\[4pt]
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'H_n\tilde V_{nt} - \sigma_0^2\,\mathrm{tr}\,H_n\bigr) \\[4pt]
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2\bigr)
\end{pmatrix}, \tag{B.3}
\]
and the information matrix is equal to $\Sigma_{\theta_0,nT}^d = -\mathrm{E}\,\frac{1}{nT}\frac{\partial^2\ln L_{n,T}^d(\theta_0)}{\partial\theta\,\partial\theta'}$, where
\[
\Sigma_{\theta_0,nT}^d =
\begin{pmatrix}
\frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}\ddot X_{nt}'\ddot X_{nt} & & & \\
\frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}(\ddot G_n\ddot X_{nt}\beta_0)'\ddot X_{nt} & \frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}(\ddot G_n\ddot X_{nt}\beta_0)'\ddot G_n\ddot X_{nt}\beta_0 + \frac{T-1}{T}\frac{1}{n}\mathrm{tr}(\ddot G_n'\ddot G_n) + \frac{1}{n}\mathrm{tr}(\ddot G_n^2) & & \\
0_{1\times k_X} & \frac{T-1}{T}\frac{1}{n}\bigl[\mathrm{tr}(H_n'\ddot G_n) + \mathrm{tr}(H_n\ddot G_n)\bigr] & \frac{1}{n}\bigl[\frac{T-1}{T}\mathrm{tr}(H_n'H_n) + \mathrm{tr}(H_n^2)\bigr] & \\
0_{1\times k_X} & \frac{T-1}{T}\frac{1}{\sigma_0^2 n}\mathrm{tr}(\ddot G_n) & \frac{T-1}{T}\frac{1}{\sigma_0^2 n}\mathrm{tr}(H_n) & \frac{T-2}{2T\sigma_0^4}
\end{pmatrix}, \tag{B.4}
\]
with only the lower triangle of the symmetric matrix displayed.
C Transformation Approach with Time Dummy
C.1 The First and Second Order Derivatives of (4.5)
Using $\mathrm{tr}\,G_n(\lambda) - \mathrm{tr}(J_nG_n(\lambda)) = \frac{1}{1-\lambda}$ and $\mathrm{tr}(G_n^2(\lambda)) - \mathrm{tr}\bigl((J_nG_n(\lambda))^2\bigr) = \frac{1}{(1-\lambda)^2}$ (see Lee and Yu (2007a)), for the concentrated likelihood function (4.5), the first and second order derivatives are
\[
\frac{\partial\ln L_{n,T}(\theta)}{\partial\theta} =
\begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_n\tilde V_{nt}(\theta) \\[4pt]
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) - (T-1)\,\mathrm{tr}(J_nG_n(\lambda)) \\[4pt]
\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) - (T-1)\,\mathrm{tr}(J_nH_n(\rho)) \\[4pt]
\frac{1}{2\sigma^4}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta) - (n-1)\tfrac{T-1}{T}\sigma^2\bigr)
\end{pmatrix}, \tag{C.1}
\]
and $-\frac{\partial^2\ln L_{n,T}(\theta)}{\partial\theta\,\partial\theta'}$ is the symmetric matrix with entries
\[
\begin{aligned}
(\beta,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_nR_n(\rho)\tilde X_{nt}, \\
(\lambda,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)\tilde X_{nt}, \\
(\lambda,\lambda)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)W_n\tilde Y_{nt} + (T-1)\,\mathrm{tr}(J_nG_n^2(\lambda)), \\
(\rho,\beta)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}\bigl[(H_n(\rho)\tilde V_{nt}(\theta))'J_nR_n(\rho)\tilde X_{nt} + \tilde V_{nt}'(\theta)J_nM_n\tilde X_{nt}\bigr], \\
(\rho,\lambda)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}\bigl[(R_n(\rho)W_n\tilde Y_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + (M_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta)\bigr], \\
(\rho,\rho)&:\ \tfrac{1}{\sigma^2}\textstyle\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'J_nH_n(\rho)\tilde V_{nt}(\theta) + (T-1)\,\mathrm{tr}(J_nH_n^2(\rho)), \\
(\sigma^2,\beta)&:\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_nR_n(\rho)\tilde X_{nt}, \qquad
(\sigma^2,\lambda):\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta), \\
(\sigma^2,\rho)&:\ \tfrac{1}{\sigma^4}\textstyle\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta), \qquad
(\sigma^2,\sigma^2):\ -\tfrac{(n-1)(T-1)}{2\sigma^4} + \tfrac{1}{\sigma^6}\textstyle\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta).
\end{aligned} \tag{C.2}
\]
From (C.1), the score vector and the information matrix are
\[
\frac{1}{\sqrt{(n-1)(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} =
\begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\ddot X_{nt}'J_n\tilde V_{nt} \\[4pt]
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}(\ddot G_n\ddot X_{nt}\beta_0)'J_n\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'\ddot G_nJ_n\tilde V_{nt} - \tfrac{T-1}{T}\sigma_0^2\,\mathrm{tr}(J_n\ddot G_n)\bigr) \\[4pt]
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'H_nJ_n\tilde V_{nt} - \tfrac{T-1}{T}\sigma_0^2\,\mathrm{tr}(J_nH_n)\bigr) \\[4pt]
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'J_n\tilde V_{nt} - \tfrac{T-1}{T}(n-1)\sigma_0^2\bigr)
\end{pmatrix}, \tag{C.3}
\]
\[
\Sigma_{\theta_0,nT} = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT} & 0_{(k_X+1)\times 2} \\ 0_{2\times(k_X+1)} & 0_{2\times 2} \end{pmatrix}
+ \begin{pmatrix}
0_{k_X\times k_X} & & & \\
0_{1\times k_X} & \frac{1}{n-1}\mathrm{tr}(\ddot G_n^sJ_n\ddot G_n) & & \\
0_{1\times k_X} & \frac{1}{n-1}\mathrm{tr}(H_n^sJ_n\ddot G_n) & \frac{1}{n-1}\mathrm{tr}(H_n^sJ_nH_n) & \\
0_{1\times k_X} & \frac{1}{\sigma_0^2(n-1)}\mathrm{tr}(J_n\ddot G_n) & \frac{1}{\sigma_0^2(n-1)}\mathrm{tr}(J_nH_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}, \tag{C.4}
\]
where only the lower triangles of the symmetric matrices are displayed and $H_{nT} = \frac{1}{(n-1)(T-1)}\sum_{t=1}^{T}(\ddot X_{nt}, \ddot G_n\ddot X_{nt}\beta_0)'J_n(\ddot X_{nt}, \ddot G_n\ddot X_{nt}\beta_0)$.

C.2 Proof for Theorem 3

This is similar to the proofs of Theorems 1 and 2.

D Direct Approaches with Time Dummy
D.1 The First and Second Order Derivatives of (5.3)

For the concentrated likelihood function (5.3), the …rst and second order derivatives are
0 PT 1
1
2 ((Rn ( )X~ nt )0 Jn V~nt ( )
t=1
PT
@ ln Ldn;T ( ) BB 12 t=1 (Jn Rn ( )Wn Y~nt )0 V~nt ( ) (T 1)trGn ( ) C
C
= B 1 PT C, (D.1)
@ @ 2 t=1 (Jn Hn ( )V~nt ( )) V~nt ( ) (T 1)trHn ( ) A
0
1
P T ~0 ~
2 4 t=1 (Vnt ( )Jn Vnt ( ) n TT 1 2)

0 PT 1
1 ~ nt )0 Jn Rn ( )X
~ nt
2 t=1 (Rn ( )X
PT
B PT 1
)Wn Y~nt )0 Jn Rn ( )Wn Y~nt C
B 1 ~ 0 ~ 2 t=1 (Rn ( C
2
ln Ldn;T ( ) B 2 t=1 (Rn ( )Wn Ynt ) Jn Rn ( )Xnt C
@ B +(T 1)tr(G2n ( )) C
=B PT ~ nt )0 Jn Hn ( )V~nt ( ) PT C
(D.2)
@ @ 0
B 1
2 t=1P(Rn ( )X 1
2 (Rn ( )Wn Y~nt )0n Jn Hn ( )V~nt ( )
t=1 P C
B T ~ nt )0 Jn V~nt ( ) T 0 0 C
@ 1
+ 2 t=1 (Mn X + 2 t=1 (Mn Wn Y~nt )0 Jn V~nt ( )
1 A
1
P T ~0 ~ 1
P T ~ 0 ~
4 t=1 Vnt ( )Jn Rn ( )Xnt 4 t=1 (Rn ( )Wn Ynt ) Jn Vnt ( ) 0 0
0 1
0 0 0 0
B 0 0 0 0 C
B PT C
+B
B 0
1
2 t=1 (H n ( )V~nt ( ))0 Jn Hn ( )V~nt ( ) C.
C
@ 0 A
+(T 1)tr(Hn2 ( ))
PT n(T 1) PT ~ 0
0 0 1
4 (Hn ( )V~nt ( ))0 Jn V~nt ( )
t=1 2 4 + 1
6 (V (
t=1)J ~
V
n nt
nt ( ))

Hence, for the …rst order derivative evaluated at 0, we have


\[
\frac{1}{\sqrt{n(T-1)}}\frac{\partial \ln L^{d}_{n,T}(\theta_0)}{\partial\theta}
=\begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\ddot X_{nt}'J_n\tilde V_{nt}\\[4pt]
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}(\ddot G_n\ddot X_{nt}\beta_0)'J_n\tilde V_{nt}
+\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'\ddot G_n'J_n\tilde V_{nt}-\tfrac{T-1}{T}\sigma_0^2\operatorname{tr}\ddot G_n\bigr)\\[4pt]
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'H_n'J_n\tilde V_{nt}-\tfrac{T-1}{T}\sigma_0^2\operatorname{tr}H_n\bigr)\\[4pt]
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'J_n\tilde V_{nt}-\tfrac{T-1}{T}n\sigma_0^2\bigr)
\end{pmatrix}.
\tag{D.3}
\]
For the information matrix, denote $\Sigma^{d}_{\theta_0,nT}=-\mathrm{E}\,\frac{1}{n(T-1)}\frac{\partial^2\ln L^{d}_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}$; we have

\[
\Sigma^{d}_{\theta_0,nT}=\frac{1}{\sigma_0^2}
\begin{pmatrix}\mathcal H^{d}_{nT}&0_{(k_X+1)\times 2}\\0_{2\times(k_X+1)}&0_{2\times 2}\end{pmatrix}
+\begin{pmatrix}
0_{k_X\times k_X}&0_{k_X\times 1}&0_{k_X\times 1}&0_{k_X\times 1}\\
0_{1\times k_X}&\frac{1}{n}\bigl[\operatorname{tr}(\ddot G_n'J_n\ddot G_n)+\operatorname{tr}(\ddot G_n^2)\bigr]&\frac{1}{n}\operatorname{tr}(H_n^{s}J_n\ddot G_n)&\frac{1}{\sigma_0^2 n}\operatorname{tr}(J_n\ddot G_n)\\
0_{1\times k_X}&\frac{1}{n}\operatorname{tr}(H_n^{s}J_n\ddot G_n)&\frac{1}{n}\bigl[\operatorname{tr}(H_n'J_nH_n)+\operatorname{tr}(H_n^2)\bigr]&\frac{1}{\sigma_0^2 n}\operatorname{tr}(J_nH_n)\\
0_{1\times k_X}&\frac{1}{\sigma_0^2 n}\operatorname{tr}(J_n\ddot G_n)&\frac{1}{\sigma_0^2 n}\operatorname{tr}(J_nH_n)&\frac{1}{2\sigma_0^4}
\end{pmatrix},
\tag{D.4}
\]

where $\mathcal H^{d}_{nT}=\frac{1}{n(T-1)}\sum_{t=1}^{T}(\ddot X_{nt},\ddot G_n\ddot X_{nt}\beta_0)'J_n(\ddot X_{nt},\ddot G_n\ddot X_{nt}\beta_0)$.
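The trace quantities entering (D.4) are straightforward to evaluate numerically. The sketch below uses purely illustrative assumptions (a ring-shaped row-normalized $W_n=M_n$ with $n=5$ and arbitrary parameter values, none of which come from the paper) to assemble the lower-right $(\lambda,\rho,\sigma^2)$ block of the second matrix in (D.4), checking that the block is symmetric and that the similarity transform $\ddot G_n=R_nG_nR_n^{-1}$ preserves $\operatorname{tr}G_n$:

```python
import numpy as np

# Illustrative assumptions: n = 5 units on a ring, row-normalized W_n = M_n,
# and arbitrarily chosen (lambda, rho, sigma^2).
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
M = W.copy()
lam, rho, sig2 = 0.2, 0.5, 1.0

I = np.eye(n)
Jn = I - np.ones((n, n)) / n                  # J_n = I_n - (1/n) l_n l_n'
Gn = W @ np.linalg.inv(I - lam * W)           # G_n = W_n S_n^{-1}(lambda)
Hn = M @ np.linalg.inv(I - rho * M)           # H_n = M_n R_n^{-1}(rho)
Rn = I - rho * M
Gdd = Rn @ Gn @ np.linalg.inv(Rn)             # \ddot G_n = R_n G_n R_n^{-1}
Hs = Hn + Hn.T                                # H_n^s = H_n + H_n'

tr = np.trace
block = np.array([                            # (lambda, rho, sigma^2) block of (D.4)
    [(tr(Gdd.T @ Jn @ Gdd) + tr(Gdd @ Gdd)) / n, tr(Hs @ Jn @ Gdd) / n, tr(Jn @ Gdd) / (sig2 * n)],
    [tr(Hs @ Jn @ Gdd) / n, (tr(Hn.T @ Jn @ Hn) + tr(Hn @ Hn)) / n, tr(Jn @ Hn) / (sig2 * n)],
    [tr(Jn @ Gdd) / (sig2 * n), tr(Jn @ Hn) / (sig2 * n), 1 / (2 * sig2**2)],
])

assert np.allclose(block, block.T)            # the information matrix block is symmetric
assert np.isclose(tr(Gdd), tr(Gn))            # similarity preserves tr(G_n)
```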

D.2 The First and Second Order Derivatives of (5.5) and Asymptotic Bias

The first- and second-order derivatives of the concentrated log likelihood in (5.5) are

\[
\frac{\partial\ln L^{d}_{n,T}(\theta)}{\partial\theta}=\begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_n\tilde V_{nt}(\theta)\\[2pt]
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta)-T\operatorname{tr}G_n(\lambda)\\[2pt]
\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta)-T\operatorname{tr}H_n(\rho)\\[2pt]
\frac{1}{2\sigma^4}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta)-n\sigma^2\bigr)
\end{pmatrix}
\tag{D.5}
\]

and

\[
\frac{\partial^2 \ln L^{d}_{n,T}(\theta)}{\partial\theta\,\partial\theta'}
=-\begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n\tilde X_{nt})'J_nR_n\tilde X_{nt}&\ast&\ast&\ast\\[2pt]
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_nW_n\tilde Y_{nt})'J_nR_n\tilde X_{nt}&\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_nW_n\tilde Y_{nt})'J_nR_nW_n\tilde Y_{nt}+T\operatorname{tr}(G_n^2(\lambda))&\ast&\ast\\[2pt]
\frac{1}{\sigma^2}\sum_{t=1}^{T}\bigl[(R_n\tilde X_{nt})'J_nH_n\tilde V_{nt}(\theta)+(M_n\tilde X_{nt})'J_n\tilde V_{nt}(\theta)\bigr]'&\frac{1}{\sigma^2}\sum_{t=1}^{T}\bigl[(R_nW_n\tilde Y_{nt})'J_nH_n\tilde V_{nt}(\theta)+(M_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta)\bigr]&\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n\tilde V_{nt}(\theta))'J_nH_n\tilde V_{nt}(\theta)+T\operatorname{tr}(H_n^2(\rho))&\ast\\[2pt]
\frac{1}{\sigma^4}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_nR_n\tilde X_{nt}&\frac{1}{\sigma^4}\sum_{t=1}^{T}(R_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta)&\frac{1}{\sigma^4}\sum_{t=1}^{T}(H_n\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta)&-\frac{nT}{2\sigma^4}+\frac{1}{\sigma^6}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta)
\end{pmatrix},
\tag{D.6}
\]

where $R_n=R_n(\rho)$, $H_n=H_n(\rho)$, and the entries marked $\ast$ follow from the symmetry of the matrix.

For the first-order derivative evaluated at $\theta_0$, it has three components such that

\[
\frac{\partial\ln L^{d}_{n,T}(\theta_0)}{\partial\theta}
=\frac{\partial\ln L^{d,u}_{n,T}(\theta_0)}{\partial\theta}-n\,a_{\theta_0,n,1}-(T-1)\,a_{\theta_0,2},
\tag{D.7}
\]

where

\[
\frac{\partial\ln L^{d,u}_{n,T}(\theta_0)}{\partial\theta}=\begin{pmatrix}
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}\ddot X_{nt}'J_n\tilde V_{nt}\\[2pt]
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}(\ddot G_n\ddot X_{nt}\beta_0)'J_n\tilde V_{nt}+\frac{1}{\sigma_0^2}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'\ddot G_n'J_n\tilde V_{nt}-\tfrac{T-1}{T}\sigma_0^2\operatorname{tr}(\ddot G_nJ_n)\bigr)\\[2pt]
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'H_n'J_n\tilde V_{nt}-\tfrac{T-1}{T}\sigma_0^2\operatorname{tr}(H_nJ_n)\bigr)\\[2pt]
\frac{1}{2\sigma_0^4}\sum_{t=1}^{T}\bigl(\tilde V_{nt}'J_n\tilde V_{nt}-\tfrac{T-1}{T}(n-1)\sigma_0^2\bigr)
\end{pmatrix},
\]

$a_{\theta_0,n,1}=(0_{1\times k_X},\frac{1}{n}\operatorname{tr}G_n,\frac{1}{n}\operatorname{tr}H_n,\frac{1}{2\sigma_0^2})'$ and $a_{\theta_0,2}=(0_{1\times k_X},\frac{1}{n}l_n'\ddot G_nl_n,\frac{1}{n}l_n'H_nl_n,\frac{1}{2\sigma_0^2})'$. For the information matrix,

denote $\Sigma^{d}_{\theta_0,nT}=-\mathrm{E}\,\frac{1}{nT}\frac{\partial^2\ln L^{d}_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}$ and $\mathcal H^{d}_{nT}=\frac{1}{nT}\sum_{t=1}^{T}(\ddot X_{nt},\ddot G_n\ddot X_{nt}\beta_0)'J_n(\ddot X_{nt},\ddot G_n\ddot X_{nt}\beta_0)$; we have

\[
\Sigma^{d}_{\theta_0,nT}=\frac{1}{\sigma_0^2}
\begin{pmatrix}\mathcal H^{d}_{nT}&0_{(k_X+1)\times 2}\\0_{2\times(k_X+1)}&0_{2\times 2}\end{pmatrix}
+\begin{pmatrix}
0_{k_X\times k_X}&0_{k_X\times 1}&0_{k_X\times 1}&0_{k_X\times 1}\\
0_{1\times k_X}&\frac{1}{n}\bigl[\operatorname{tr}(\ddot G_n'J_n\ddot G_n)+\operatorname{tr}(\ddot G_n^2)\bigr]&\frac{1}{n}\operatorname{tr}(H_n^{s}J_n\ddot G_n)&\frac{1}{\sigma_0^2 n}\operatorname{tr}(J_n\ddot G_n)\\
0_{1\times k_X}&\frac{1}{n}\operatorname{tr}(H_n^{s}J_n\ddot G_n)&\frac{1}{n}\bigl[\operatorname{tr}(H_n'J_nH_n)+\operatorname{tr}(H_n^2)\bigr]&\frac{1}{\sigma_0^2 n}\operatorname{tr}(J_nH_n)\\
0_{1\times k_X}&\frac{1}{\sigma_0^2 n}\operatorname{tr}(J_n\ddot G_n)&\frac{1}{\sigma_0^2 n}\operatorname{tr}(J_nH_n)&\frac{1}{2\sigma_0^4}
\end{pmatrix}.
\]

As $\frac{1}{\sqrt{nT}}\frac{\partial\ln L^{d,u}_{n,T}(\theta_0)}{\partial\theta}$ will be asymptotically normally distributed, we can see that the estimators from this direct approach will have an $O(1/T)$ bias $-\frac{1}{T}(\Sigma^{d}_{\theta_0,nT})^{-1}a_{\theta_0,n,1}$ and an $O(1/n)$ bias $-\frac{1}{n}(\Sigma^{d}_{\theta_0,nT})^{-1}a_{\theta_0,2}$. Similar to

Lee and Yu (2007a), a bias correction procedure can be designed to eliminate the bias. Denote $\hat\theta_{nT}$ as the QMLE that solves (5.5). The bias-corrected estimator is
\[
\hat\theta^{d1}_{nT}=\hat\theta_{nT}-\frac{\hat B_{1,nT}}{T}-\frac{\hat B_{2,nT}}{n},
\quad\text{where }\hat B_{1,nT}=-\Bigl[(\Sigma^{d}_{\theta,nT})^{-1}a_{\theta,n,1}\Bigr]\Big|_{\theta=\hat\theta_{nT}}
\text{ and }\hat B_{2,nT}=-\Bigl[(\Sigma^{d}_{\theta,nT})^{-1}a_{\theta,2}\Bigr]\Big|_{\theta=\hat\theta_{nT}}.
\]
Similar to Lee and Yu (2007a), it can be shown that when $n/T^3\to 0$ and $T/n^3\to 0$, $\hat\theta^{d1}_{nT}$ is $\sqrt{nT}$ consistent and asymptotically centered normal.
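Once $\Sigma^{d}_{\theta,nT}$ and the two $a$ vectors have been evaluated at $\hat\theta_{nT}$, the correction itself is a single linear-algebra step. A minimal sketch (the function name and all numerical inputs are hypothetical, chosen only to make the arithmetic visible):

```python
import numpy as np

def bias_corrected(theta_hat, Sigma, a1, a2, n, T):
    """theta^{d1} = theta_hat - B1/T - B2/n, with B_j = -(Sigma^{-1} a_j)
    evaluated at theta_hat (a sketch of the correction formula only)."""
    B1 = -np.linalg.solve(Sigma, a1)
    B2 = -np.linalg.solve(Sigma, a2)
    return theta_hat - B1 / T - B2 / n

# Toy check with Sigma = I: the correction adds a1/T + a2/n back to theta_hat.
theta = np.zeros(4)
a1 = a2 = np.ones(4)
corrected = bias_corrected(theta, np.eye(4), a1, a2, n=10, T=5)
print(corrected)   # each component: 1/5 + 1/10 = 0.3
```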

References
Amemiya, T., 1971. The estimation of the variances in a variance-components model. International Economic Review 12, 1-13.
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press, Cambridge, MA.
Anderson, T.W. and C. Hsiao, 1981. Estimation of dynamic models with error components. Journal of the American Statistical Association 76, 598-606.
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic, The Netherlands.
Anselin, L. and A.K. Bera, 1998. Spatial dependence in linear regression models with an introduction to spatial econometrics, in: A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics. Marcel Dekker, New York.
Arellano, M. and O. Bover, 1995. Another look at the instrumental-variable estimation of error-components models. Journal of Econometrics 68, 29-51.
Baltagi, B., S.H. Song and W. Koh, 2003. Testing panel data regression models with spatial error correlation. Journal of Econometrics 117, 123-150.
Baltagi, B., P. Egger and M. Pfaffermayr, 2007. A generalized spatial panel data model with random effects. Working Paper, Syracuse University.
Chamberlain, G., 1982. Multivariate regression models for panel data. Journal of Econometrics 18, 5-46.
Cliff, A.D. and J.K. Ord, 1973. Spatial Autocorrelation. Pion Ltd., London.
Cressie, N., 1993. Statistics for Spatial Data. Wiley, New York.
Ertur, C. and W. Koch, 2007. Growth, technological interdependence and spatial externalities: theory and evidence. Journal of Applied Econometrics 22, 1033-1062.
Foote, C.L., 2007. Space and time in macroeconomic panel data: young workers and state-level unemployment revisited. Working Paper No. 07-10, Federal Reserve Bank of Boston.
Hahn, J. and G. Kuersteiner, 2002. Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large. Econometrica 70, 1639-1657.
Hahn, J. and H.R. Moon, 2006. Reducing bias of MLE in a dynamic panel model. Econometric Theory 22, 499-512.
Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 46, 1251-1271.
Hsiao, C., 1986. Analysis of Panel Data. Cambridge University Press.
Kapoor, M., H.H. Kelejian and I.R. Prucha, 2007. Panel data models with spatially correlated error components. Journal of Econometrics 140, 97-130.
Kelejian, H.H. and I.R. Prucha, 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17, 99-121.
Kelejian, H.H. and I.R. Prucha, 2001. On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics 104, 219-257.
Kelejian, H.H. and D. Robinson, 1993. A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditure model. Papers in Regional Science 72, 297-312.
Kelejian, H.H. and I.R. Prucha, 2007. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Forthcoming in Journal of Econometrics.
Lee, L.F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models. Econometrica 72, 1899-1925.
Lee, L.F., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Journal of Econometrics 137, 489-514.
Lee, L.F. and X. Liu, 2006. Efficient GMM estimation of a spatial autoregressive model with autoregressive disturbances. Working Paper, The Ohio State University.
Lee, L.F., X. Liu and X. Lin, 2008. Specification and estimation of social interaction models with network structure, contextual factors, correlation and fixed effects. Working Paper, The Ohio State University.
Lee, L.F. and J. Yu, 2007a. A spatial dynamic panel data model with both time and individual fixed effects. Working Paper, The Ohio State University.
Lee, L.F. and J. Yu, 2007b. Near unit root in the spatial autoregressive model. Working Paper, The Ohio State University.
Lin, X. and L.F. Lee, 2005. GMM estimation of spatial autoregressive models with unknown heteroskedasticity. Working Paper, The Ohio State University. Forthcoming in Journal of Econometrics.
Nerlove, M., 1971. A note on error components models. Econometrica 39, 383-396.
Neyman, J. and E.L. Scott, 1948. Consistent estimates based on partially consistent observations. Econometrica 16, 1-32.
Rothenberg, T.J., 1971. Identification in parametric models. Econometrica 39, 577-591.
Wallace, T.D. and A. Hussain, 1969. The use of error components models in combining cross-section and time-series data. Econometrica 37, 55-72.
Yu, J., R. de Jong and L.F. Lee, 2006. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Working Paper, The Ohio State University.
Yu, J., R. de Jong and L.F. Lee, 2007. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large: a nonstationary case. Working Paper, The Ohio State University.
Yu, J. and L.F. Lee, 2007. Estimation of unit root spatial dynamic panel data models. Working Paper, The Ohio State University.

Table 1: Transformation and Direct Approaches: Model with Individual Effects Only

       T    n    θ0              β         λ         ρ       σ1²       σ2²
(1)    5   49   θ0^a   Bias   -0.0027    0.0096   -0.0279   -0.0216   -0.2173
                       E-SD    0.0766    0.1377    0.1459    0.1067    0.0854
                       RMSE    0.0766    0.1380    0.1485    0.1089    0.2334
                       T-SD    0.0743    0.1355    0.1371    0.1043    0.0746
(2)    5   49   θ0^b   Bias   -0.0039   -0.0173    0.0021   -0.0027   -0.2182
                       E-SD    0.0736    0.1150    0.1590    0.1044    0.0835
                       RMSE    0.0737    0.1163    0.1590    0.1068    0.2336
                       T-SD    0.0718    0.1134    0.1574    0.1024    0.0733
(3)   10   49   θ0^a   Bias   -0.0005    0.0040   -0.0110   -0.0116   -0.1104
                       E-SD    0.0492    0.0948    0.0939    0.0704    0.0633
                       RMSE    0.0492    0.0949    0.0945    0.0713    0.1273
                       T-SD    0.0496    0.0925    0.0921    0.0701    0.0599
(4)   10   49   θ0^b   Bias   -0.0011   -0.0066    0.0007   -0.0120   -0.1108
                       E-SD    0.0466    0.0759    0.1053    0.0691    0.0622
                       RMSE    0.0466    0.0762    0.1053    0.0702    0.1271
                       T-SD    0.0475    0.0755    0.1069    0.0687    0.0586
(5)   50    9   θ0^a   Bias    0.0003    0.0072   -0.0126   -0.0082   -0.0280
                       E-SD    0.0501    0.0844    0.0810    0.0713    0.0699
                       RMSE    0.0501    0.0847    0.0820    0.0718    0.0753
                       T-SD    0.0499    0.0842    0.0787    0.0704    0.0683
(6)   50    9   θ0^b   Bias   -0.0010   -0.0065    0.0018   -0.0093   -0.0291
                       E-SD    0.0481    0.0664    0.0961    0.0708    0.0694
                       RMSE    0.0482    0.0668    0.0962    0.0714    0.0752
                       T-SD    0.0475    0.0645    0.0967    0.0689    0.0669
(7)   50   16   θ0^a   Bias   -0.0010    0.0021   -0.0050   -0.0079   -0.0278
                       E-SD    0.0380    0.0692    0.0660    0.0536    0.0525
                       RMSE    0.0380    0.0692    0.0662    0.0542    0.0594
                       T-SD    0.0374    0.0663    0.0641    0.0528    0.0512
(8)   50   16   θ0^b   Bias   -0.0015   -0.0037    0.0016   -0.0082   -0.0280
                       E-SD    0.0367    0.0549    0.0792    0.0526    0.0516
                       RMSE    0.0367    0.0550    0.0793    0.0532    0.0587
                       T-SD    0.0356    0.0524    0.0762    0.0516    0.0501
(9)   50   49   θ0^a   Bias   -0.0009   -0.0011   -0.0004   -0.0025   -0.0224
                       E-SD    0.0220    0.0405    0.0401    0.0305    0.0298
                       RMSE    0.0220    0.0405    0.0401    0.0306    0.0373
                       T-SD    0.0214    0.0404    0.0396    0.0303    0.0294
(10)  50   49   θ0^b   Bias   -0.0007   -0.0031    0.0026   -0.0019   -0.0219
                       E-SD    0.0212    0.0321    0.0465    0.0297    0.0291
                       RMSE    0.0212    0.0323    0.0466    0.0298    0.0365
                       T-SD    0.0203    0.0324    0.0464    0.0296    0.0287

Note: 1. θ0^a = (1, 0.2, 0.5, 1)' and θ0^b = (1, 0.5, 0.2, 1)'.
      2. The σ1² column is from the transformation approach; the σ2² column is from the direct approach.
      3. The transformation approach and the direct approach yield the same estimates of δ0 = (β0', λ0, ρ0)'.
      4. The T-SD is obtained from the transformation approach, except for σ2², which is from the direct approach.
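Within each cell of the Monte Carlo tables, the reported statistics satisfy the identity RMSE² = Bias² + E-SD² up to rounding, which provides a quick consistency check on the transcription; e.g., for two entries of row (1) of Table 1:

```python
import math

# sigma_2^2 column of row (1): Bias = -0.2173, E-SD = 0.0854, reported RMSE = 0.2334
bias, esd = -0.2173, 0.0854
assert abs(math.sqrt(bias**2 + esd**2) - 0.2334) < 1e-3   # matches up to rounding

# rho column of row (1): Bias = -0.0279, E-SD = 0.1459, reported RMSE = 0.1485
bias, esd = -0.0279, 0.1459
assert abs(math.sqrt(bias**2 + esd**2) - 0.1485) < 1e-4
```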

29
Table 2: Transformation Approach: Model with Both Time and Individual Effects

       T    n    θ0              β         λ         ρ        σ²
(1)    5   49   θ0^a   Bias   -0.0020    0.0121   -0.0300   -0.0223
                       E-SD    0.0764    0.1403    0.1529    0.1078
                       RMSE    0.0764    0.1408    0.1558    0.1100
                       T-SD    0.0751    0.1406    0.1481    0.1045
(2)    5   49   θ0^b   Bias   -0.0042   -0.0167    0.0017   -0.0242
                       E-SD    0.0737    0.1227    0.1658    0.1052
                       RMSE    0.0738    0.1238    0.1658    0.1079
                       T-SD    0.0723    0.1223    0.1654    0.1031
(3)   10   49   θ0^a   Bias   -0.0001    0.0056   -0.0137   -0.0124
                       E-SD    0.0500    0.0986    0.1031    0.0706
                       RMSE    0.0500    0.0988    0.1040    0.0717
                       T-SD    0.0502    0.0955    0.0994    0.0702
(4)   10   49   θ0^b   Bias   -0.0013   -0.0064   -0.0005   -0.0133
                       E-SD    0.0471    0.0836    0.1126    0.0700
                       RMSE    0.0471    0.0839    0.1126    0.0712
                       T-SD    0.0478    0.0816    0.1122    0.0691
(5)   50    9   θ0^a   Bias    0.0010    0.0098   -0.0102   -0.0110
                       E-SD    0.0546    0.1038    0.1260    0.0729
                       RMSE    0.0546    0.1042    0.1264    0.0738
                       T-SD    0.0540    0.1021    0.1276    0.0721
(6)   50    9   θ0^b   Bias   -0.0017   -0.0010    0.0028   -0.0121
                       E-SD    0.0512    0.1094    0.1306    0.0745
                       RMSE    0.0512    0.1094    0.1306    0.0755
                       T-SD    0.0507    0.1066    0.1314    0.0731
(7)   50   16   θ0^a   Bias   -0.0011    0.0019   -0.0046   -0.0093
                       E-SD    0.0393    0.0755    0.0845    0.0540
                       RMSE    0.0393    0.0755    0.0846    0.0548
                       T-SD    0.0390    0.0737    0.0830    0.0532
(8)   50   16   θ0^b   Bias   -0.0019   -0.0031    0.0013   -0.0095
                       E-SD    0.0373    0.0709    0.0915    0.0537
                       RMSE    0.0373    0.0710    0.0915    0.0546
                       T-SD    0.0365    0.0684    0.0894    0.0529
(9)   50   49   θ0^a   Bias   -0.0009   -0.0011   -0.0002   -0.0026
                       E-SD    0.0222    0.0422    0.0434    0.0305
                       RMSE    0.0222    0.0423    0.0434    0.0306
                       T-SD    0.0216    0.0417    0.0428    0.0304
(10)  50   49   θ0^b   Bias   -0.0008   -0.0030    0.0025   -0.0021
                       E-SD    0.0213    0.0358    0.0494    0.0298
                       RMSE    0.0213    0.0360    0.0494    0.0299
                       T-SD    0.0204    0.0351    0.0487    0.0298

Note: θ0^a = (1, 0.2, 0.5, 1)' and θ0^b = (1, 0.5, 0.2, 1)'.
Table 3: Direct Approaches: Model with Both Time and Individual Effects

       T    n    θ0              β         λ         ρ       σ1²       σ2²
(1)    5   49   θ0^a   Bias    0.0021    0.0271   -0.0904   -0.0259   -0.2207
                       E-SD    0.0749    0.1213    0.1342    0.1053    0.0843
                       RMSE    0.0749    0.1243    0.1618    0.1085    0.2362
                       T-SD    0.0662    0.1254    0.1338    0.0734    0.1026
(2)    5   49   θ0^b   Bias   -0.0017   -0.0382    0.0183   -0.0334   -0.2267
                       E-SD    0.0733    0.1063    0.1443    0.1039    0.0831
                       RMSE    0.0733    0.1129    0.1455    0.1092    0.2415
                       T-SD    0.0642    0.1090    0.1478    0.0725    0.1013
(3)   10   49   θ0^a   Bias    0.0038    0.0241   -0.0779   -0.0167   -0.1151
                       E-SD    0.0488    0.0856    0.0910    0.0692    0.0623
                       RMSE    0.0489    0.0889    0.1198    0.0712    0.1308
                       T-SD    0.0468    0.0900    0.0952    0.0588    0.0688
(4)   10   49   θ0^b   Bias    0.0001   -0.0305   -0.0178   -0.0240   -0.1216
                       E-SD    0.0471    0.0733    0.0980    0.0691    0.0622
                       RMSE    0.0471    0.0794    0.0996    0.0731    0.1366
                       T-SD    0.0450    0.0771    0.1060    0.0579    0.0679
(5)   50    9   θ0^a   Bias   -0.0014   -0.0179   -0.3438   -0.1081   -0.1260
                       E-SD    0.0519    0.0541    0.0566    0.0663    0.0649
                       RMSE    0.0520    0.0570    0.3484    0.1268    0.1417
                       T-SD    0.0488    0.0983    0.1140    0.0587    0.0605
(6)   50    9   θ0^b   Bias   -0.0091   -0.1959   -0.1330   -0.1079   -0.1258
                       E-SD    0.0526    0.0528    0.0571    0.0664    0.0651
                       RMSE    0.0534    0.2029    0.1447    0.1267    0.1416
                       T-SD    0.0479    0.0965    0.1192    0.0600    0.0619
(7)   50   16   θ0^a   Bias    0.0038    0.0262   -0.1964   -0.0417   -0.0608
                       E-SD    0.0377    0.0496    0.0551    0.0508    0.0498
                       RMSE    0.0379    0.0561    0.2040    0.0657    0.0786
                       T-SD    0.0365    0.0713    0.0803    0.0478    0.0493
(8)   50   16   θ0^b   Bias   -0.0021   -0.0948   -0.0539   -0.0502   -0.0692
                       E-SD    0.0375    0.0461    0.0578    0.0510    0.0500
                       RMSE    0.0376    0.1054    0.0791    0.0716    0.0854
                       T-SD    0.0354    0.0660    0.0862    0.0480    0.0494
(9)   50   49   θ0^a   Bias    0.0030    0.0195   -0.0671   -0.0073   -0.0272
                       E-SD    0.0217    0.0365    0.0385    0.0297    0.0291
                       RMSE    0.0219    0.0413    0.0774    0.0306    0.0398
                       T-SD    0.0210    0.0409    0.0428    0.0288    0.0297
(10)  50   49   θ0^b   Bias   -0.0002   -0.0286   -0.0132   -0.0138   -0.0335
                       E-SD    0.0213    0.0314    0.0428    0.0294    0.0288
                       RMSE    0.0213    0.0425    0.0448    0.0325    0.0442
                       T-SD    0.0201    0.0347    0.0479    0.0284    0.0293

Note: 1. θ0^a = (1, 0.2, 0.5, 1)' and θ0^b = (1, 0.5, 0.2, 1)'.
      2. The σ1² column is from direct approach I; the σ2² column is from direct approach II.
      3. The two direct approaches yield the same estimates of δ0 = (β0', λ0, ρ0)'.
      4. The T-SD is obtained from direct approach I, except for σ2², which is from direct approach II.
