Semiparametric Spatial Autoregressive Panel Data Model With Fixed Effects and Time-Varying Coefficients

Journal of Business & Economic Statistics
ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/ubes20
Xuan Liang, Jiti Gao & Xiaodong Gong
To cite this article: Xuan Liang, Jiti Gao & Xiaodong Gong (2021): Semiparametric Spatial
Autoregressive Panel Data Model with Fixed Effects and Time-Varying Coefficients, Journal of
Business & Economic Statistics, DOI: 10.1080/07350015.2021.1979564
To link to this article: https://doi.org/10.1080/07350015.2021.1979564
Published online: 15 Nov 2021.
JOURNAL OF BUSINESS & ECONOMIC STATISTICS
2021, VOL. 00, NO. 0, 1–19
https://doi.org/10.1080/10618600.2021.1979564
Xuan Liang^a, Jiti Gao^b, and Xiaodong Gong^c
^a Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, Australia; ^b Monash University, Melbourne, Australia; ^c University of Canberra and IZA, Canberra, Australia
ABSTRACT
This article considers a semiparametric spatial autoregressive (SAR) panel data model with fixed effects and time-varying coefficients. The time-varying coefficients are allowed to follow unknown functions of time, while the other parameters are assumed to be unknown constants. We propose a local linear quasi-maximum likelihood estimation method to obtain consistent estimators for the SAR coefficient, the variance of the error term, and the nonparametric time-varying coefficients. The asymptotic properties of the proposed estimators are also established. Monte Carlo simulations are conducted to evaluate the finite sample performance of our proposed method. We apply the proposed model to study labor compensation in Chinese cities. The results show significant spatial dependence among cities and the impacts of capital, investment, and the economy’s structure on labor compensation change over time.

ARTICLE HISTORY
Received November 2019
Accepted September 2021

KEYWORDS
Concentrated quasi-maximum likelihood estimation; Local linear estimation; Time-varying coefficient
CONTACT Xuan Liang xuan.liang@anu.edu.au Research School of Finance, Actuarial Studies and Statistics, The Australian National University, Canberra, ACT 2600, Australia.
Supplementary materials for this article are available online. Please go to www.tandfonline.com/UBES.
© 2021 American Statistical Association

1. Introduction

Panel data analysis has been widely used in many fields of social sciences as it usually enables strong identification and increases estimation efficiency. A comprehensive review of these methodologies can be found in Arellano (2003), Baltagi (2008), and Hsiao (2014). To model spatial dependence, spatial econometric models have drawn a lot of attention as they provide a way to model the cross-sectional dependence with a clear structure and intuitive interpretations.

A class of spatial autoregressive (SAR) models was first proposed in Cliff and Ord (1973). Since then, spatial econometrics has become an active research area in econometrics. One issue with spatial econometric models is that the spatial lag term is endogenous. Various estimation methods have been proposed to deal with this issue, for example, the instrumental variable (IV) method (Kelejian and Prucha 1998), the generalized method of moments (GMM) framework (Kelejian and Prucha 1999) and the quasi-maximum likelihood (QML) method (Lee 2004). More details of spatial econometrics can be found in classic spatial econometrics books, for example, Anselin, Florax, and Rey (2004) and LeSage and Pace (2009). As more temporal data became available, spatial panel data models have received considerable attention. Spatial panel data models with SAR disturbances are considered in Baltagi, Song, and Koh (2003) and Kapoor, Kelejian, and Prucha (2007). Fingleton (2008) studies a spatial panel data model with a SAR-dependent variable and a spatial moving average-disturbance. Lee and Yu (2010) focus on a spatial panel model with individual fixed effects. More recent studies on spatial dynamic panel data models can be found in Yu, De Jong, and Lee (2008), Lee and Yu (2014) and Li (2017).

A common feature of the aforementioned models is that they are fully parametric with a linear form in regressors, which may lead to model misspecification. To enhance model flexibility, nonparametric and semiparametric spatial econometric models have been studied in the literature. Su and Jin (2010) considered a partially linear SAR model. Su (2012) proposed an SAR model with a nonparametric regressor term. Functional-coefficient SAR models are also studied in Sun (2016) and Malikov and Sun (2017). These studies are all about cross-sectional data. In terms of nonparametric and semiparametric spatial econometric models for panel data, Zhang and Shen (2015) considered a partially linear SAR panel data model with functional coefficients and random effects. Sun and Malikov (2018) study a functional-coefficient SAR panel data model with fixed effects. It is worth noting that they focus on the case of large N and finite T. In addition, the functional coefficients in these spatial models are mostly permitted to be unknown smooth functions of some exogenous variables. Sometimes, finding such appropriate exogenous variables in practice is challenging.

It has been noted that especially when the time span of data is long, coefficients of covariates are likely to change over time in many real examples (see discussions in Cai 2007; Silvapulle et al. 2017). The reasons for such changes may include changes in the economic structure or environment, policy reform, or technology development. To accommodate such cases, time-varying coefficient models have been studied extensively in the panel data setting, where the regressor coefficients are allowed to be unknown smooth functions of time (e.g., Li, Chen, and Gao 2011; Chen, Gao, and Li 2012; Robinson 2012). One advantage of time-varying coefficient models is that the time variable can be self-explanatory and naturally captures the nonlinear time variation in the coefficients. To the best of our knowledge, time-varying coefficient models have not been studied in spatial econometrics.

In this article, we propose a semiparametric time-varying coefficient spatial panel data model with fixed effects for large N and large T. Specifically, the coefficient of the spatial lag term in the model is assumed to be constant over time, while the regressor coefficients vary with time. In addition, the regressors can be trending nonstationary. To obtain consistent estimators for both the parametric components and nonparametric time-varying coefficients, we propose a local linear concentrated quasi-maximum likelihood estimation (LLQML) method. When the time-varying coefficients are constant and the regressors are stationary, our model reduces to a fully parametric SAR panel data model that has been considered in Lee and Yu (2010).

Our model allows only the coefficients of the explanatory variables to be time-varying. A more general setting with a nonparametric coefficient for the spatial lag term would be less restrictive. The model studied in Sun and Malikov (2018) has a functional-coefficient for the spatial lag term but for fixed T. We would like to leave more general models for future research.

Our contributions in this article are summarized as follows.

(i) We propose a semiparametric time-varying coefficient spatial panel data model. This model is suitable for panel data with spatial interaction and time-varying features, as it combines the strengths from different models, including strong identification of panel data models, clear interpretation of cross-sectional dependence in spatial econometric models, and the flexibility of time-varying coefficient models. In the existing literature of spatial econometrics, the regressors are often assumed to be nonstochastic (see, e.g., Lee and Yu 2010; Su and Jin 2010). We relax such assumptions so that the regressors can be trending nonstationary, which renders our model and method of estimation more general and practically useful.

(ii) Since the model consists of both unknown parametric and nonparametric components, we propose the LLQML method to consistently estimate the parameters and the unknown time-varying functions by incorporating the local linear estimation (Fan and Gijbels 1996) into the QML estimation. We also establish the consistency and asymptotic normality of the proposed estimators.

(iii) We evaluate the finite-sample performance of our proposed model under several scenarios. We find that our estimators are consistent and robust against different model specifications, including not only time-varying models or nonstationary covariates, but also time-invariant models and stationary covariates. The results also show that if the time-varying coefficients are misspecified as constants, it would lead to severely inconsistent estimation.

(iv) As an empirical application of our model, we analyze the time-varying effects of factors on labor compensation in urban China over 1995–2009, a period that has seen continuous reforms and dramatic changes in the economy. Consistent with our conjecture, the estimated effects show quite strong time-varying features.

The rest of the article is organized as follows. Section 2 discusses the model setting and the estimation procedure. Section 3 lays out the assumptions. Asymptotic properties of the proposed estimators are established in Section 4. We report the results of Monte Carlo simulations and of the empirical application in Sections 5 and 6, respectively. In Section 7, we conclude. Appendix A provides the justification of the identification condition and then gives the proofs of the main theorems. Technical lemmas and their proofs as well as additional numerical results are given in Appendices B–D of the supplementary material.

2. Model Setting and Estimation

2.1. Model

The model we consider in this article takes the following form:

$$Y_{it} = \rho_0 \sum_{j \neq i} w_{ij} Y_{jt} + X_{it}^\top \beta_{0,t} + \alpha_{0,i} + e_{it}, \quad t = 1, \ldots, T, \; i = 1, \ldots, N, \tag{1}$$

where $Y_{it}$ is the response of location $i$ at time $t$, $X_{it} = (X_{it1}, \ldots, X_{itd})^\top$ is the $d$-dimensional covariate, $\beta_{0,t} = (\beta_{0,t1}, \ldots, \beta_{0,td})^\top$ is the corresponding $d$-dimensional time-varying coefficient, $\alpha_{0,i}$ reflects the unobserved individual fixed effect, the error component $e_{it}$ has mean zero and variance $\sigma_0^2$, and $T$ and $N$ are the time length and the number of spatial units, respectively. In this model, $w_{ij}$ describes the spatial weight of location $j$ to $i$, which can be a decreasing function of spatial distance between $i$ and $j$. The scalar parameter $\rho_0$ is a measure of the strength of spatial dependence. Hence, the term $\rho_0 \sum_{j \neq i} w_{ij} Y_{jt}$ captures the spatial interaction and $X_{it}^\top \beta_{0,t}$ measures the covariate effects over time.

When $\beta_{0,t}$ does not vary over time, it reduces to a vector of constants. In this case, model (1) reduces to the traditional SAR panel data model discussed in Lee and Yu (2010). If only some components of $\beta_{0,t}$ change over time, model (1) takes the form of a partially time-varying spatial panel data model, which means that a few covariates have effects changing over time while the effects of other covariates stay constant. In this article, we assume that $\beta_{0,t}$ is fully nonparametric and follows the specification

$$\beta_{0,t} = \beta_0(\tau_t), \quad t = 1, \ldots, T, \tag{2}$$

where $\beta_0(\cdot)$ is a $d$-dimensional vector of unknown smooth functions defined on $\mathbb{R}^d$ and $\tau_t = t/T \in (0, 1]$. The same specification is used in Li, Chen, and Gao (2011) and Chen, Gao, and Li (2012). The reason to rescale time onto the interval $(0, 1]$ is for convenience when estimating the model with the kernel method.

For the purpose of identifying $\beta_0(\tau_t)$ when the constant 1 is included in $X_{it}$, the individual fixed effects are assumed to satisfy $\sum_{i=1}^{N} \alpha_{0,i} = 0$. Such a condition is standard in the literature, for example, Su and Ullah (2006) and Chen, Gao, and Li (2012), with its justification provided in Appendix A.1.
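To make the ingredients of model (1) concrete, the following is a minimal numerical sketch rather than anything taken from the article. It assumes NumPy, an inverse-distance weight matrix with row normalization, standard normal covariates and errors, and illustrative coefficient functions; the sizes `N`, `T` and the value `rho0` are arbitrary choices. It generates the response from the reduced form and then checks that the generated data satisfy model (1) unit by unit.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, rho0 = 6, 8, 0.4            # illustrative sizes and SAR coefficient (assumed)

# Spatial weights w_ij: inverse distance between random locations, zero diagonal,
# rows normalised to sum to one (the normalisation is an assumption of this sketch).
coords = rng.uniform(size=(N, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
off_diag = ~np.eye(N, dtype=bool)
W = np.zeros((N, N))
W[off_diag] = 1.0 / dist[off_diag]
W /= W.sum(axis=1, keepdims=True)

# Rescaled time tau_t = t/T and illustrative time-varying coefficients beta_0(tau_t).
tau = np.arange(1, T + 1) / T
beta = np.column_stack([1 + 3 * tau, 1 + 2 * tau + 2 * tau**2])    # shape (T, d)

# Covariates X_it = (1, X_it2), fixed effects summing to zero, and errors e_it.
X = np.stack([np.ones((T, N)), rng.standard_normal((T, N))], axis=2)
alpha = rng.standard_normal(N)
alpha -= alpha.mean()
e = rng.standard_normal((T, N))

# Reduced form: Y_t = (I_N - rho0 W)^{-1} (X_t beta_0(tau_t) + alpha + e_t).
xb = np.einsum("tnd,td->tn", X, beta)
Y = np.linalg.solve(np.eye(N) - rho0 * W, (xb + alpha + e).T).T

# Check model (1): Y_it - rho0 * sum_j w_ij Y_jt - X_it' beta_{0,t} - alpha_i = e_it.
spatial_lag = Y @ W.T
resid = Y - rho0 * spatial_lag - xb - alpha
```

Because the spatial lag of the response appears on the right-hand side, the response has to be generated jointly across units through the reduced form; this joint determination is the source of the endogeneity of the spatial lag term.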
Let $0_n$ and $1_n$ be the vectors with $n$ elements of zeros and ones, respectively. Let $0_{m_1 \times m_2}$ denote an $m_1 \times m_2$ matrix with all zero elements and let $I_m$ denote an $m$-dimensional identity matrix. Define an $N \times N$ spatial weight matrix $W = (w_{ij})_{N \times N}$ with zero diagonal elements, that is, $w_{ii} = 0$, and an $N \times (N-1)$ matrix $D_0 = (-1_{N-1}, I_{N-1})^\top$. A clear matrix form of model (1) can be written as

$$Y_t = \rho_0 W Y_t + X_t \beta_0(\tau_t) + D_0 \alpha_0 + e_t, \quad t = 1, \ldots, T, \tag{3}$$

where $Y_t = (Y_{1t}, \ldots, Y_{Nt})^\top$, $X_t = (X_{1t}, \ldots, X_{Nt})^\top$, $\alpha_0 = (\alpha_{0,2}, \ldots, \alpha_{0,N})^\top$ and $e_t = (e_{1t}, \ldots, e_{Nt})^\top$. Define an $N \times N$ matrix $S_N(\rho) = I_N - \rho W$. Model (3) can further be written as

$$S_N(\rho_0) Y_t = X_t \beta_0(\tau_t) + D_0 \alpha_0 + e_t. \tag{4}$$

In (4), we move the spatial lag term ($\rho_0 W Y_t$) to the left side so that $S_N(\rho_0) Y_t$ would be regarded as the new response variable as if $\rho_0$ were known. The goal is to construct consistent estimators for the spatial coefficient $\rho_0$, the variance $\sigma_0^2$, and the unknown time-varying coefficient function $\beta_0(\tau)$.

2.2. Estimation

The joint quasi log-likelihood function of model (4) can be written as

$$\log L_{N,T}\{\rho, \sigma^2, \alpha, \beta(\cdot)\} = -\frac{NT}{2} \log(2\pi\sigma^2) + T \log|S_N(\rho)| - \frac{1}{2\sigma^2} \sum_{t=1}^{T} U_t^\top U_t, \tag{5}$$

where $U_t = S_N(\rho) Y_t - D_0 \alpha - X_t \beta(\tau_t)$ and $\beta(\cdot)$ is a vector of smooth functions on $\mathbb{R}^d$. If $\beta(\tau)$ is a vector of constants, the model (4) becomes fully parametric so that the traditional QML method based on Equation (5) can be used to estimate the parameters (see Lee 2004; Lee and Yu 2010 for more details). In the presence of the nonparametric time-varying component $\beta(\tau)$ in Equation (5), the traditional QML would fail. Motivated by Su and Ullah (2006) and Su and Jin (2010), we propose the LLQML method, which is a two-step procedure: (i) Estimate $\beta_0(\tau)$ for fixed $\rho$ and $\alpha$ by the local linear kernel method (Fan and Gijbels 1996) and denote it as $\hat\beta_{\rho,\alpha}(\tau)$; (ii) Plug $\hat\beta_{\rho,\alpha}(\tau)$ into (5), and obtain the QML estimators $\hat\rho$, $\hat\sigma^2$ and $\hat\alpha$. With $\rho$ and $\alpha$ estimated, the estimator of $\beta_0(\tau)$ can then be updated by $\hat\beta_{\hat\rho,\hat\alpha}(\tau)$. To be more specific:

Step 1. Assume that $\beta_0(\cdot)$ has continuous derivatives of up to the second order. Let $K(\cdot)$ and $h$ be the kernel function and the smoothing bandwidth, respectively. Applying the Taylor expansion to $\beta_0(\tau_t)$ at $\tau$ such that $|\tau_t - \tau| = o(1)$, we obtain $\beta_0(\tau_t) = \beta_0(\tau) + \beta_0'(\tau)(\tau_t - \tau) + O\{(\tau_t - \tau)^2\}$, where $\beta_0'(\cdot)$ is the first derivative of $\beta_0(\cdot)$ and $\tau \in (0, 1]$. We also have that $X_t \beta_0(\tau_t) \approx X_t \beta_0(\tau) + \frac{\tau_t - \tau}{h} X_t \, h\beta_0'(\tau)$. According to the local linear kernel method, the estimators of $\beta_0(\tau)$ and $h\beta_0'(\tau)$ for given $(\rho, \alpha)$ can be obtained by minimizing the weighted loss function $L(a, b)$ with respect to $(a^\top, b^\top)^\top$, that is,

$$\begin{pmatrix} \hat\beta_{\rho,\alpha}(\tau) \\ h\hat\beta_{\rho,\alpha}'(\tau) \end{pmatrix} = \arg\min_{(a^\top, b^\top)^\top} L(a, b),$$

where

$$L(a, b) = \sum_{t=1}^{T} K\!\left(\frac{\tau_t - \tau}{h}\right) \left\{ S_N(\rho) Y_t - D_0 \alpha - X_t a - \frac{\tau_t - \tau}{h} X_t b \right\}^\top \left\{ S_N(\rho) Y_t - D_0 \alpha - X_t a - \frac{\tau_t - \tau}{h} X_t b \right\}.$$

Define an $NT$-dimensional vector $Y = (Y_1^\top, \ldots, Y_T^\top)^\top$ and an $NT \times NT$ matrix $S_{N,T}(\rho) = I_T \otimes S_N(\rho)$, where $\otimes$ denotes the Kronecker product. Further denote an $NT$-dimensional vector $Y^*(\rho) = S_{N,T}(\rho) Y$ and an $NT \times (N-1)$ matrix $D = 1_T \otimes D_0$. Function $L(a, b)$ can be rewritten as

$$L(a, b) = \left\{ Y^*(\rho) - D\alpha - M(\tau)(a^\top, b^\top)^\top \right\}^\top \Omega(\tau) \left\{ Y^*(\rho) - D\alpha - M(\tau)(a^\top, b^\top)^\top \right\},$$

where the $NT \times 2d$ matrix $M(\tau)$ and the $NT \times NT$ matrix $\Omega(\tau)$ are defined as follows:

$$M(\tau) = \begin{pmatrix} X_1 & \frac{\tau_1 - \tau}{h} X_1 \\ \vdots & \vdots \\ X_T & \frac{\tau_T - \tau}{h} X_T \end{pmatrix} \quad \text{and} \quad \Omega(\tau) = \begin{pmatrix} K\!\left(\frac{\tau_1 - \tau}{h}\right) I_N & & \\ & \ddots & \\ & & K\!\left(\frac{\tau_T - \tau}{h}\right) I_N \end{pmatrix}.$$

Taking the first derivative of $L(a, b)$ with respect to $(a^\top, b^\top)^\top$ and equating it to zero, we obtain

$$\begin{pmatrix} \hat\beta_{\rho,\alpha}(\tau) \\ h\hat\beta_{\rho,\alpha}'(\tau) \end{pmatrix} = \left\{ M^\top(\tau) \Omega(\tau) M(\tau) \right\}^{-1} M^\top(\tau) \Omega(\tau) \left\{ Y^*(\rho) - D\alpha \right\}.$$

Denoting a $d \times NT$ matrix $\Phi(\tau) = (I_d, 0_{d \times d}) \{ M^\top(\tau) \Omega(\tau) M(\tau) \}^{-1} M^\top(\tau) \Omega(\tau)$, the estimator of the time-varying coefficient $\beta_0(\cdot)$ can be expressed by

$$\hat\beta_{\rho,\alpha}(\tau) = \Phi(\tau) \{ Y^*(\rho) - D\alpha \}. \tag{6}$$

Step 2. In this step, we plug $\hat\beta_{\rho,\alpha}(\tau)$ into the original log-likelihood (5) and estimate $\rho_0$ and $\sigma_0^2$ by maximizing the quasi log-likelihood function:

$$\begin{aligned} \log L_{N,T}(\rho, \sigma^2, \alpha) &= -\frac{NT}{2} \log(2\pi\sigma^2) + T \log|S_N(\rho)| \\ &\quad - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \left\{ S_N(\rho) Y_t - X_t \hat\beta_{\rho,\alpha}(\tau_t) - D_0 \alpha \right\}^\top \left\{ S_N(\rho) Y_t - X_t \hat\beta_{\rho,\alpha}(\tau_t) - D_0 \alpha \right\} \\ &= -\frac{NT}{2} \log(2\pi\sigma^2) + T \log|S_N(\rho)| - \frac{1}{2\sigma^2} \left\{ \tilde{Y}(\rho) - \tilde{D}\alpha \right\}^\top \left\{ \tilde{Y}(\rho) - \tilde{D}\alpha \right\}, \end{aligned} \tag{7}$$

where $\tilde{Y}(\rho) = (I_{NT} - S) Y^*(\rho)$ and $\tilde{D} = (I_{NT} - S) D$ are the smoothing versions of $Y^*(\rho)$ and $D$ by the $NT \times NT$ matrix $S = \tilde{X} \tilde{\Phi}$, in which the $NT \times dT$ matrix $\tilde{X}$ is a block diagonal matrix with $X_t$ being its $t$th diagonal block, and the $dT \times NT$ matrix $\tilde{\Phi} = (\Phi(\tau_1)^\top, \ldots, \Phi(\tau_T)^\top)^\top$. Taking the derivative of (7) with respect to $\alpha$ and setting it to be zero, we have

$$\hat\alpha(\rho) = (\tilde{D}^\top \tilde{D})^{-1} \tilde{D}^\top \tilde{Y}(\rho).$$

Define an $NT \times NT$ matrix $Q_{N,T} = I_{NT} - \tilde{D} (\tilde{D}^\top \tilde{D})^{-1} \tilde{D}^\top$. Plugging $\hat\alpha(\rho)$ into (7) leads to

$$\log L_{N,T}(\rho, \sigma^2) = -\frac{NT}{2} \log(2\pi\sigma^2) + T \log|S_N(\rho)| - \frac{1}{2\sigma^2} \tilde{Y}^\top(\rho) Q_{N,T} \tilde{Y}(\rho). \tag{8}$$

Then, taking the derivative of Equation (8) with respect to $\sigma^2$ and equating it to zero, we have the estimator of $\sigma_0^2$ as the following function of $\rho$:

$$\hat\sigma^2(\rho) = \frac{1}{NT} \tilde{Y}^\top(\rho) Q_{N,T} \tilde{Y}(\rho).$$

Replacing $\sigma^2$ with $\hat\sigma^2(\rho)$ in Equation (8), we obtain the concentrated quasi log-likelihood function:

$$\log L_{N,T}(\rho) = -\frac{NT}{2} \{ \log(2\pi) + 1 \} - \frac{NT}{2} \log\left\{ \frac{1}{NT} \tilde{Y}^\top(\rho) Q_{N,T} \tilde{Y}(\rho) \right\} + T \log|S_N(\rho)|.$$

Therefore, we estimate the parameters $\theta_0 = (\rho_0, \sigma_0^2)^\top$ and $\alpha_0$ by $\hat\theta = (\hat\rho, \hat\sigma^2)^\top$ and $\hat\alpha$ as follows:

$$\hat\rho = \arg\max_{\rho} \log L_{N,T}(\rho), \quad \hat\sigma^2 = \frac{1}{NT} \tilde{Y}^\top(\hat\rho) Q_{N,T} \tilde{Y}(\hat\rho), \quad \hat\alpha = (\tilde{D}^\top \tilde{D})^{-1} \tilde{D}^\top \tilde{Y}(\hat\rho).$$

Finally, the updated estimator of $\beta_0(\tau)$ is obtained by plugging $\hat\rho$ and $\hat\alpha$ into Equation (6):

$$\hat\beta(\tau) = \Phi(\tau) \{ Y^*(\hat\rho) - D\hat\alpha \}. \tag{9}$$

In order to establish asymptotic properties for the proposed estimators, we need to introduce the following assumptions.

3. Model Assumptions

In this section, we lay out the assumptions for our model. Denote $\|a\|_s = (\sum_{i=1}^{n} |a_i|^s)^{1/s}$ as the $s$-norm ($s \geq 1$) for any generic vector $a = (a_1, \ldots, a_n)^\top$. For any generic $m \times m$ matrix $A = (a_{ij})_{m \times m}$, define the diagonal vector of $A$ as $\mathrm{diag}(A) = (a_{11}, \ldots, a_{mm})^\top$, and define $\|A\|_1 = \max_{1 \leq j \leq m} \sum_{i=1}^{m} |a_{ij}|$ and $\|A\|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{m} |a_{ij}|$ as the column sum norm and row sum norm, respectively.

Assumption 1. Let the $d$-dimensional vector $X_{it} = g(\tau_t) + v_{it}$ contain a deterministic time trend $g(\tau) = (g_1(\tau), \ldots, g_d(\tau))^\top$ and a random component $v_{it} = (v_{it1}, \ldots, v_{itd})^\top$.

(i) Suppose that $g(\tau)$ is a continuous function for any $0 < \tau \leq 1$.

(ii) Denote $v_t = (v_{1t}, \ldots, v_{Nt})^\top$. Suppose that $\{v_t, t \geq 1\}$ is a strictly stationary sequence with mean zero and $\alpha$-mixing with mixing coefficient $\alpha_{mix,N}(t)$, and that there exist a function $\alpha_{mix}(t)$ and a constant $\delta$ such that $\alpha_{mix,N}(t) \leq \alpha_{mix}(t)$ and $\sum_{t=1}^{\infty} \alpha_{mix}(t)^{\delta/(4+\delta)} < \infty$ for some $\delta > 0$.

(iii) Let $v_{it}$ be identically distributed in index $i$ for any given $t$. In addition, we assume $E|v_{itk}|^{4+\delta} < \infty$ for $k = 1, \ldots, d$ and let $E(v_{it} v_{it}^\top) = \Sigma_v = (\sigma_v^{(k_1,k_2)})_{d \times d}$ where $\sigma_v^{(k_1,k_2)} = E(v_{itk_1} v_{itk_2})$.

Remark: Assumption 1 is a list of assumptions about the $d$-dimensional explanatory variable $X_{it}$. Assumption 1(i) assumes that the time trend $g(\tau)$ is continuous, which is a standard assumption to model the trend in $X_{it}$. With this structure, the regressors can be either stationary or nonstationary over time. In particular, if $g(\tau_t)$ reduces to a constant vector, it covers the case with stationary $X_{it}$. Otherwise, $X_{it}$ is generally nonstationary. By assuming this, we take the nonstationarity of $X_{it}$ into account when we derive the theoretical properties of the estimators. The reason why $g(\tau)$ is defined over $(0, 1]$ is to scale the time domain to a bounded set, for the same reason as for $\beta_0(\tau)$. Note that $g(\tau)$ here can be further generalized to allow for an individual time trend $g_i(\tau)$. To make theoretical derivations less complicated, we consider the homogeneous trend. The trend $g(\tau)$ can be estimated by $\hat{g}(\tau) = \frac{1}{N} \sum_{i=1}^{N} \hat{g}_i(\tau)$, where

$$\hat{g}_i(\tau) = \frac{\sum_{t=1}^{T} K\!\left(\frac{\tau_t - \tau}{h}\right) X_{it}}{\sum_{t=1}^{T} K\!\left(\frac{\tau_t - \tau}{h}\right)}.$$

To allow for serial dependence in $v_t$, we impose the stationarity and $\alpha$-mixingness of $v_t$ in Assumption 1(ii) (see, e.g., Fan and Yao 2003; Gao 2007). Since $v_t$ consists of $N$ vectors $v_{it}$ for $i = 1, \ldots, N$, its $\alpha$-mixing coefficient depends on $N$ and hence is denoted as $\alpha_{mix,N}(t)$. We further assume that there exists an upper bound $\alpha_{mix}(t)$. A similar assumption can be found in Chen, Gao, and Li (2012). Moreover, the assumption $\sum_{t=1}^{\infty} \alpha_{mix}(t)^{\delta/(4+\delta)} < \infty$ is commonly used in the literature; see, for example, Dou, Parrella, and Yao (2016). This assumption is weaker than the exponentially decaying $\alpha$-mixing coefficient $\alpha_{mix}(t) = c_\alpha \psi^t$ for $0 < c_\alpha < \infty$ and $0 < \psi < 1$; see, for example, Chen, Gao, and Li (2012) and Chen, Li, and Linton (2019).

It is worth noting that we only assume $v_{it}$ to be identically distributed in index $i$ in Assumption 1(iii), which is weaker than the iid assumption for covariates in Sun and Malikov (2018). It ensures that the cross-sectional dependence of $v_{it}$ in the vector $v_t$ can be allowed.

Meanwhile, the constant 1 term is allowed to be included in $X_{it}$. When $g_1(\tau)$ reduces to constant 1 and $v_{it1}$ degenerates to $v_{it1} \equiv 0$, $X_{it1} \equiv 1$ is exactly the constant 1 term.

Assumption 2. The error term $\{e_t = (e_{1t}, \ldots, e_{Nt})^\top : t \geq 1\}$ is a stationary process such that

(i) for some $\delta > 0$, $\sup_{i \geq 1} E(|e_{it}|^{4+\delta}) < \infty$;

(ii) $E(e_t | \mathcal{E}_{t-1}) = 0_N$ and $E(e_t e_t^\top | \mathcal{E}_{t-1}) = \sigma_0^2 I_N$, where $\mathcal{E}_{t-1} = \mathcal{F}_V \vee \sigma\langle e_1, \ldots, e_{t-1} \rangle$ is the $\sigma$-field generated by $\mathcal{F}_V \cup \sigma\langle e_1, \ldots, e_{t-1} \rangle$ and $\mathcal{F}_V = \sigma\{v_{it} : i \geq 1, t \geq 1\}$ is the $\sigma$-field generated by $\{v_{it} : i \geq 1, t \geq 1\}$;

(iii) Given $\mathcal{E}_{t-1}$, $e_i = (e_{i1}, \ldots, e_{iT})^\top$ is a vector of conditionally independent random errors with $E(e_{it}^j | \mathcal{E}_{t-1}) = E(e_{it}^j) = m_j$, almost surely for $j = 3$ and $4$.
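As a numerical companion to the two-step LLQML procedure of Section 2.2, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes NumPy, a fixed bandwidth `h` instead of a data-driven choice, and a simple grid search over a user-supplied `rho_grid` in place of a numerical optimizer; the function name `llqml`, the array layout (`Y` as a T x N array, `X` as a T x N x d array) and the toy data at the end are all hypothetical choices made for illustration. It builds the local linear smoother at each rescaled time point, concentrates out the fixed effects and the error variance, maximizes the concentrated quasi log-likelihood over the grid, and then evaluates Equation (9).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) 1{|u| <= 1}."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def llqml(Y, X, W, h, rho_grid):
    """Sketch of LLQML. Y: (T, N), X: (T, N, d), W: (N, N), h: bandwidth."""
    T, N, d = X.shape
    NT = N * T
    tau = np.arange(1, T + 1) / T
    Xs = X.reshape(NT, d)                            # stacked X_1, ..., X_T
    # Step 1: Phi(tau_s) = (I_d, 0) {M' Omega M}^{-1} M' Omega for s = 1, ..., T.
    Phi = np.empty((T, d, NT))
    for s in range(T):
        u = np.repeat((tau - tau[s]) / h, N)         # (tau_t - tau_s)/h per row
        M = np.hstack([Xs, u[:, None] * Xs])         # NT x 2d
        MtO = (M * epanechnikov(u)[:, None]).T       # M' Omega, 2d x NT
        Phi[s] = np.linalg.solve(MtO @ M, MtO)[:d]
    # Smoother S: block row s equals X_s Phi(tau_s).
    S = np.vstack([X[s] @ Phi[s] for s in range(T)])
    I_S = np.eye(NT) - S
    D0 = np.vstack([-np.ones((1, N - 1)), np.eye(N - 1)])
    D = np.tile(D0, (T, 1))                          # 1_T kron D_0
    Dt = I_S @ D                                     # smoothed D
    proj = np.linalg.solve(Dt.T @ Dt, Dt.T)          # (D~'D~)^{-1} D~'
    Q = np.eye(NT) - Dt @ proj
    y = Y.reshape(NT)
    wy = (Y @ W.T).reshape(NT)                       # (I_T kron W) Y
    r0, r1 = I_S @ y, I_S @ wy                       # Y~(rho) = r0 - rho * r1

    def conc_loglik(rho):
        yt = r0 - rho * r1
        s2 = yt @ Q @ yt / NT
        _, logdet = np.linalg.slogdet(np.eye(N) - rho * W)
        return (-0.5 * NT * (np.log(2 * np.pi) + 1)
                - 0.5 * NT * np.log(s2) + T * logdet)

    # Step 2: maximise the concentrated quasi log-likelihood over the grid.
    rho_hat = max(rho_grid, key=conc_loglik)
    yt = r0 - rho_hat * r1
    sigma2_hat = yt @ Q @ yt / NT
    alpha_hat = proj @ yt                            # estimates (alpha_2, ..., alpha_N)
    resid = y - rho_hat * wy - D @ alpha_hat         # Y*(rho_hat) - D alpha_hat
    beta_hat = np.stack([Phi[s] @ resid for s in range(T)])   # Equation (9) at tau_s
    return rho_hat, sigma2_hat, alpha_hat, beta_hat

# Toy data (all settings below are illustrative assumptions).
rng = np.random.default_rng(0)
N, T, rho0 = 15, 30, 0.3
W = np.zeros((N, N))
for i in range(N):
    for k in (1, 2):
        W[i, (i + k) % N] = W[i, (i - k) % N] = 1.0
W /= W.sum(axis=1, keepdims=True)
tau = np.arange(1, T + 1) / T
X = np.stack([np.ones((T, N)),
              2 * np.sin(np.pi * tau)[:, None] + rng.standard_normal((T, N))], axis=2)
beta0 = np.column_stack([1 + 3 * tau, 1 + 2 * tau + 2 * tau**2])
alpha0 = rng.standard_normal(N)
alpha0 -= alpha0.mean()
rhs = np.einsum("tnd,td->tn", X, beta0) + alpha0 + rng.standard_normal((T, N))
Y = np.linalg.solve(np.eye(N) - rho0 * W, rhs.T).T
rho_hat, sigma2_hat, alpha_hat, beta_hat = llqml(
    Y, X, W, h=0.4, rho_grid=np.linspace(-0.9, 0.9, 181))
```

The sketch forms dense NT x NT matrices, which is only practical for small panels; since the smoothed response is linear in the spatial coefficient, the two vectors `r0` and `r1` are computed once and reused for every grid point.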
Remark: Assumption 2 summarizes the conditions on the error term. Assumption 2(ii) ensures that the first and second conditional moments of $e_t$ given $\mathcal{E}_{t-1}$ do not depend on the information generated by $\{v_{it} : i \geq 1, t \geq 1\}$. In particular, Assumption 2(ii) implies $E(e_t | \mathcal{F}_V) = E(E(e_t | \mathcal{E}_{t-1}) | \mathcal{F}_V) = 0_N$. Note that if $\{e_t\}$ and $\{v_{it}\}$ are independent of each other, this result naturally holds for the mean zero error vector $e_t$. Assumption 2(iii) assumes the conditional independence of $e_{it}$ ($i = 1, \ldots, N$) given $\mathcal{E}_{t-1}$ as well as its conditional moments. A sufficient condition for Assumption 2(ii) and (iii) is that $e_{it}$ are independent in both $i$ and $t$ (e.g., see Assumption 2 of Yu, De Jong, and Lee 2008) and $\{e_{it}\}$ is independent of $\mathcal{F}_V$. It is worth noting that the conditional independence of $e_{it}$ in Assumption 2(iii) along with Assumption 2(ii) can help form a martingale difference array in both $i$ and $t$ in the theoretical derivations; see, for example, the proof of Theorem 2 in Appendix A.2. Further, this technique of the proof can be adapted to model (3) if a cross-sectional dependent random structure is specified. For example, we can still impose Assumption 2 but we replace $e_t$ in model (3) by a cross-sectional dependent random error $\varepsilon_t = L e_t$, where $L$ is a non-stochastic matrix and $E(\varepsilon_t \varepsilon_t^\top | \mathcal{E}_{t-1}) = \sigma_0^2 L L^\top$ can measure the cross-sectional dependence. If we assume that $L$ is uniformly bounded in both row and column sums in absolute value (analogously to Assumption 4), similar theoretical results can be established but more technical derivations are involved.

Assumption 3. The kernel function $K(\cdot)$ is a continuous and symmetric probability density function with compact support. The bandwidth is assumed to satisfy $h \to 0$ as $\min(N, T) \to \infty$, $Th \to \infty$ and $NTh^8 \to 0$.

Remark: Assumption 3 first imposes the conditions on the kernel function, which is common in the literature; see, for example, Chen, Gao, and Li (2012). Conditions on the bandwidth $h$ along with $T$ and $N$ are also considered; see similar conditions in Assumption A5 of Chen, Gao, and Li (2012).

Assumption 4. To be more specific on the dimension of the $N \times N$ spatial weight matrix $W$, we denote $W$ as $W_N$ in this assumption. We assume $W_N$ is a non-stochastic matrix with zero diagonals and is uniformly bounded in both row sum norm and column sum norm (for short, UB), that is, $\sup_{N \geq 1} \|W_N\|_1 < \infty$ and $\sup_{N \geq 1} \|W_N\|_\infty < \infty$.

Assumption 5. $S_N(\rho)$ is invertible for all $\rho \in \Delta$, where $\Delta$ is a compact interval with the true value $\rho_0$ as an interior point. Also, $S_N(\rho)$ and $S_N^{-1}(\rho)$ are both UB, uniformly in $\rho \in \Delta$.

Remark: Assumptions 4 and 5 are standard assumptions originated from Kelejian and Prucha (1998, 2001) and also used in Lee (2004). When $W$ is row-normalized, a compact subset of $(-1, 1)$ has often been taken as the parameter space for $\rho$. The UB conditions limit the spatial correlation to a manageable degree. To save space, we refer readers to Kelejian and Prucha (2001) for more discussions.

Assumption 6. The time-varying coefficient $\beta_0(\cdot)$ has continuous derivatives of up to the second order.

Assumption 7. The fixed effects satisfy that $\|D_0 \alpha_0\|_1 < \infty$.

Remark: Assumption 6 is a mild condition on the smoothness of the functions which is required by the local linear fitting procedure. Such an assumption is common for nonparametric estimation methods, for example, Li and Racine (2007, condit. 2.1), Gao (2007, assump. 2.7) and Assumption A3 of Chen, Gao, and Li (2012). Assumption 7 guarantees the uniform boundedness of the sum of absolute fixed effects.

To proceed, we need to introduce the following notations. Let $S_N = S_N(\rho_0)$, $S_{N,T} = S_{N,T}(\rho_0)$, $G_N(\rho) = W S_N^{-1}(\rho)$, $G_N = G_N(\rho_0)$, $G_{N,T} = I_T \otimes G_N$, $P_{N,T} = (I_{NT} - S)^\top Q_{N,T} (I_{NT} - S)$ and $R_{N,T} = G_{N,T}(\tilde{X} \tilde\beta_0 + D\alpha_0)$ where $\tilde\beta_0 = (\beta_0^\top(\tau_1), \ldots, \beta_0^\top(\tau_T))^\top$.

Assumption 8. $\Sigma_{R,R} = \lim_{N,T \to \infty} \frac{1}{NT} E(R_{N,T}^\top P_{N,T} R_{N,T}) > 0$.

Remark: Assumption 8 is a condition for the identification of $\rho_0$, which is similar to Lee (2004, assump. 8), Lee and Yu (2010, assump. 4), and Su and Jin (2010, assump. 7). To interpret Assumption 8, we rewrite $\Sigma_{R,R} = \lim_{N,T \to \infty} \frac{1}{NT} E(R_{N,T}^{*\top} Q_{N,T} R_{N,T}^*)$, where $R_{N,T}^* = (I_{NT} - S) R_{N,T}$. Similar to the explanation of Assumption 7 in Su and Jin (2010), $R_{N,T}^*$ can be interpreted as the remainder of $R_{N,T}$ deviated from the time-varying projection onto $X_{it}^\top \beta(\tau_t)$. In practice, one may use the estimate of $\Sigma_{R,R}$ to check this assumption.

4. Asymptotic Properties

Asymptotic consistency of $\hat\theta = (\hat\rho, \hat\sigma^2)^\top$ to $\theta_0 = (\rho_0, \sigma_0^2)^\top$ is established in Theorem 1. The asymptotic distributions of $\hat\theta$ and $\hat\beta(\tau)$ are provided in Theorems 2 and 3. The proofs of these theorems are given in Appendix A.2.

Theorem 1. Under Assumptions 1–8, $\theta_0$ is globally identifiable and $\hat\theta$ is consistent to $\theta_0$.

Denote $c_1 = \lim_{N,T \to \infty} \mathrm{tr}(G_{N,T}^2 + G_{N,T}^\top G_{N,T})/NT$, $c_2 = \lim_{N,T \to \infty} \mathrm{tr}(G_{N,T})/NT$ where the existence proofs of the limits are shown in Lemma C.7 of the supplementary material.

Theorem 2. Under Assumptions 1–8, as $T \to \infty$ and $N \to \infty$ simultaneously, then

$$\sqrt{NT} \left( \hat\theta - \theta_0 \right) \xrightarrow{d} N\left( 0_2, \; \Sigma_{\theta_0}^{-1} + \Sigma_{\theta_0}^{-1} \Omega_{\theta_0} \Sigma_{\theta_0}^{-1} \right), \tag{10}$$

where $\Omega_{\theta_0} = \lim_{N,T \to \infty} \Omega_{NT,\theta_0}$ with $\Omega_{NT,\theta_0}$ being defined by
$$\Omega_{NT,\theta_0} = \begin{pmatrix} \dfrac{2 m_3 E(R_{N,T}^\top P_{N,T} \, \mathrm{diag}(P_{N,T} G_{N,T}))}{NT\sigma_0^4} + \dfrac{(m_4 - 3\sigma_0^4) E(\sum_{i=1}^{NT} (gp)_{ii}^2)}{NT\sigma_0^4} & \dfrac{m_3 E(R_{N,T}^\top P_{N,T} \, \mathrm{diag}(P_{N,T}))}{2NT\sigma_0^6} + \dfrac{(m_4 - 3\sigma_0^4) E(\sum_{i=1}^{NT} (gp)_{ii} p_{ii})}{2NT\sigma_0^6} \\[2ex] \dfrac{m_3 E(R_{N,T}^\top P_{N,T} \, \mathrm{diag}(P_{N,T}))}{2NT\sigma_0^6} + \dfrac{(m_4 - 3\sigma_0^4) E(\sum_{i=1}^{NT} (gp)_{ii} p_{ii})}{2NT\sigma_0^6} & \dfrac{(m_4 - 3\sigma_0^4) E(\sum_{i=1}^{NT} p_{ii}^2)}{4NT\sigma_0^8} \end{pmatrix},$$

in which $p_{ii}$ and $(gp)_{ii}$ are the $i$th main diagonal elements of $P_{N,T}$ and $P_{N,T} G_{N,T}$, respectively, and

$$\Sigma_{\theta_0} = \begin{pmatrix} \dfrac{1}{\sigma_0^2} \Sigma_{R,R} + c_1 & \dfrac{c_2}{\sigma_0^2} \\[1.5ex] \dfrac{c_2}{\sigma_0^2} & \dfrac{1}{2\sigma_0^4} \end{pmatrix}$$

is positive definite as shown in Lemma C.9.

Since we use the QML method to estimate $\theta_0$, it relaxes the normality assumption on the error term but adds an additional term to the variance that is a function of the error term’s third and fourth moments. If the third and fourth moments satisfy $m_3 = 0$ and $m_4 = 3\sigma_0^4$, the asymptotic covariance matrix in Equation (10) reduces to $\Sigma_{\theta_0}^{-1}$, as shown in the following proposition.

Proposition 1. Let Assumptions 1–8 hold. Then as $T \to \infty$ and $N \to \infty$ simultaneously

$$\sqrt{NT} \left( \hat\theta - \theta_0 \right) \xrightarrow{d} N\left( 0_2, \; \Sigma_{\theta_0}^{-1} \right)$$

when $\{e_{it}, i \geq 1, t \geq 1\}$ is independent and identically normally distributed with $\Omega_{\theta_0} = 0_{2 \times 2}$ due to $m_3 = 0$ and $m_4 = 3\sigma_0^4$.

Define $\mu_j = \int u^j K(u) \, du$ and $\nu_j = \int u^j K^2(u) \, du$. Let $\beta_0''(\tau)$ be the second derivative of $\beta_0(\tau)$. The asymptotic distribution of $\hat\beta(\tau)$ is established in the following theorem.

Theorem 3. Let Assumptions 1–8 hold. As $T \to \infty$ and $N \to \infty$ simultaneously, we have

$$\sqrt{NTh} \left\{ \hat\beta(\tau) - \beta_0(\tau) - b_\beta(\tau) h^2 + o_P(h^2) \right\} \xrightarrow{d} N\left( 0_d, \; \sigma_0^2 \nu_0 \Sigma_X^{-1}(\tau) \right), \tag{11}$$

provided that $\Sigma_X(\tau)$ is positive definite for each given $\tau$ in $(0, 1]$, where $b_\beta(\tau) = \frac{1}{2} \mu_2 \beta_0''(\tau)$ and $\Sigma_X(\tau) = g(\tau) g(\tau)^\top + \Sigma_v$.

Thus, the rate of convergence of $\hat\beta(\tau)$ is $\sqrt{NTh}$, which is standard in the nonparametric estimation. It is also clear that the covariance matrix is related to $g(\tau)$ since it involves the trend of $X_{it}$. When $X_{it}$ is stationary, the asymptotic covariance matrix in (11) reduces to a constant matrix $\sigma_0^2 \nu_0 (\mu_X \mu_X^\top + \Sigma_v)^{-1}$ where $\mu_X = E(X_{it})$.

One can use the following sample version to estimate the unknown covariance matrices involved:

$$\hat\Sigma_{\theta_0} = \begin{pmatrix} \dfrac{1}{\hat\sigma^2} \hat\Sigma_{R,R} + \hat{c}_1 & \dfrac{\hat{c}_2}{\hat\sigma^2} \\[1.5ex] \dfrac{\hat{c}_2}{\hat\sigma^2} & \dfrac{1}{2\hat\sigma^4} \end{pmatrix}, \quad \hat\Omega_{\theta_0} = \hat\Omega_{NT,\theta_0} \quad \text{and} \quad \hat\Sigma_X(\tau) = \hat{g}(\tau) \hat{g}(\tau)^\top + \hat\Sigma_v,$$

where $\hat\Sigma_{R,R} = (NT)^{-1} \hat{R}_{N,T}^\top P_{N,T} \hat{R}_{N,T}$, $\hat{c}_1 = \mathrm{tr}(\hat{G}_{N,T}^2 + \hat{G}_{N,T}^\top \hat{G}_{N,T})/NT$, $\hat{c}_2 = \mathrm{tr}(\hat{G}_{N,T})/NT$,

$$\hat{g}(\tau) = \frac{\sum_{i,t} K\!\left(\frac{\tau_t - \tau}{h}\right) X_{it}}{\sum_{i,t} K\!\left(\frac{\tau_t - \tau}{h}\right)}, \quad \hat\Sigma_v = (NT)^{-1} \sum_{i,t} \hat{v}_{it} \hat{v}_{it}^\top \quad \text{and} \quad \hat{v}_{it} = X_{it} - \hat{g}(\tau_t).$$

The consistency of these sample estimators is shown in Lemma C.10 in the supplementary material.

5. Monte Carlo Simulations

We now conduct a number of simulations to evaluate the finite sample performance and the robustness of our proposed model and estimation under a rich set of scenarios, which are different in stationarity of the covariates, variation in time of the coefficients, and the degree of spatial dependence.

The simulated data are generated from the following model:

$$Y_t = \rho_0 W Y_t + X_t \beta_0(\tau_t) + D_0 \alpha_0 + e_t, \quad t = 1, \ldots, T.$$

The data generating process for our simulation is summarized below. First, the spatial matrix $W$ in the data generating process is chosen as a “$q$ step ahead and $q$ step behind” spatial weights matrix as in Kelejian and Prucha (1999) with $q = 2$ in this section. The procedure is as follows: all the units are arranged in a circle and each unit is affected only by the $q$ units immediately before it and immediately after it with the weight being 1. Then following Kelejian and Prucha (1999), we normalize the spatial weights matrix by letting the sum of each row equal to 1 so that it generates an equal weight influence from all the neighboring units to each unit. Then, the regressor is set to be $X_{it} = (1, X_{it2})^\top$ where $X_{it2} = g(\tau_t) + v_{it2}$. The component $v_{it2}$ is the $i$th element of an $N$-dimensional vector $v_t$ generated by $v_t = 0.2 v_{t-1} + N(0_N, \Sigma^*)$ with $\Sigma^* = (0.5^{|i-j|})_{N \times N}$ for $-99 \leq t \leq T$ and $v_{-100} = 0_N$. It is obvious that $\{v_{it2}\}$ is both serially and cross-sectionally dependent. The error term $e_{it}$ is independent and identically generated from the distribution of $N(0, 1)$ so that $\sigma_0^2 = 1$. The fixed effects follow $\alpha_{0,i} = T^{-1} \sum_{t=1}^{T} v_{it2}$ for $i = 2, \ldots, N$ and $\alpha_{0,1} = -\sum_{i=2}^{N} \alpha_{0,i}$. The time-varying coefficient vector is set to be $\beta_0(\tau) = (\beta_{0,1}(\tau), \beta_{0,2}(\tau))^\top$ where $\beta_{0,1}(\tau)$ and $\beta_{0,2}(\tau)$ represent the time-varying coefficients associated with the constant 1 and $X_{it2}$ in $X_{it}$, respectively. Various simulation settings are defined by changing the specification of $g(\tau)$, $\beta_0(\tau)$ and $\rho_0$. Specifically, we consider the following scenarios:

• Set I (Setting of $g(\tau)$): (I-1) $g(\tau) = 0$; (I-2) $g(\tau) = 1$; and (I-3) $g(\tau) = 2\sin(\pi\tau)$;
• Set II (Setting of $\beta_0(\tau)$): (II-1) $\beta_0(\tau) = (1, 1)^\top$; (II-2) $\beta_0(\tau) = (1, 1 + 2\tau + 2\tau^2)^\top$; (II-3) $\beta_0(\tau) = (1 + 3\tau, 1 + 2\tau + 2\tau^2)^\top$;
• Set III (Setting of spatial coefficient): (III-1) $\rho_0 = 0.3$; (III-2) $\rho_0 = 0.7$.

Each of these sets (and combinations of them) will generate data of (i) covariates of different stationarity (Set I): in Sets I-1 and I-2 $X_{it2}$ is stationary and in Set I-3 $X_{it2}$ is nonstationary; (ii) coefficient $\beta_0(\tau)$ with different time-varying features (Set II): from Set II-1 to Set II-3, $\beta_0(\tau)$ changes from time-invariant, partially time-varying to fully time-varying respectively; and (iii) different SAR coefficient or spatial dependence among cross-sectional units (Set III). For each scenario, simulations are conducted on 1000 replications. The Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2) I(|u| \leq 1)$ is used where $I(\cdot)$ is the indicator function. The bandwidth is selected through a leave-one-unit-out cross-validation method explained in Appendix A.3.

The simulated data are first estimated by our proposed model and estimation method, and then estimated by a standard time-invariant spatial panel data model considered in Lee and Yu (2010) and their proposed estimation. In short, we call it the “Lee-Yu model”. Tables 1 and 2 report the means and standard deviations (SDs) (in parentheses) of the bias for the estimates of our model for $\rho_0$ and $\sigma_0^2$ under different settings of $g(\tau)$ and
(ρ0 = 0.3, σ02 = 1).

Table 1. Means and standard deviations of bias of ρ
(a) Our model

(II-1) (II-2) (II-3)
N=10 N=15 N=30 N=10 N=15 N=30 N=10 N=15 N=30
(I-1) T=10 −0.0662 −0.0493 −0.0208 −0.0166 −0.0140 −0.0053 −0.0131 −0.0122 −0.0046
(0.1123) (0.0837) (0.0563) (0.0533) (0.0429) (0.0275) (0.0511) (0.0425) (0.0272)
T=15 −0.0403 −0.0293 −0.0131 −0.0097 −0.0071 −0.0030 −0.0078 −0.0060 −0.0026
(0.0861) (0.0678) (0.0451) (0.0428) (0.0333) (0.0226) (0.0423) (0.0332) (0.0225)
T=30 −0.0244 −0.0162 −0.0074 −0.0056 −0.0029 −0.0011 −0.0051 −0.0025 −0.0010
(0.0573) (0.0465) (0.0310) (0.0267) (0.0224) (0.0151) (0.0264) (0.0224) (0.0150)
(I-2) T=10 −0.0662 −0.0493 −0.0208 −0.0099 −0.0095 −0.0028 −0.0085 −0.0084 −0.0022
(0.1123) (0.0837) (0.0563) (0.0516) (0.0427) (0.0273) (0.0513) (0.0432) (0.0275)
T=15 −0.0403 −0.0293 −0.0131 −0.0055 −0.0040 −0.0009 −0.0051 −0.0033 −0.0003
(0.0861) (0.0678) (0.0451) (0.0428) (0.0336) (0.0228) (0.0427) (0.0335) (0.0230)
T=30 −0.0244 −0.0162 −0.0074 −0.0032 −0.0009 0.0004 −0.0026 −0.0004 0.0008
(0.0573) (0.0465) (0.0310) (0.0267) (0.0224) (0.0151) (0.0269) (0.0225) (0.0152)
(I-3) T=10 −0.0663 −0.0500 −0.0222 −0.0223 −0.0199 −0.0126 −0.0201 −0.0183 −0.0109
(0.1060) (0.0805) (0.0547) (0.0445) (0.0369) (0.0248) (0.0447) (0.0375) (0.0252)
T=15 −0.0409 −0.0301 −0.0144 −0.0162 −0.0148 −0.0111 −0.0145 −0.0129 −0.0086
(0.0847) (0.0665) (0.0435) (0.0373) (0.0297) (0.0207) (0.0378) (0.0303) (0.0212)
T=30 −0.0234 −0.0170 −0.0082 −0.0127 −0.0111 −0.0085 −0.0112 −0.0087 −0.0061
(0.0561) (0.0452) (0.0302) (0.0241) (0.0207) (0.0149) (0.0244) (0.0208) (0.0145)
(b) Lee-Yu model
(II-1) (II-2) (II-3)
N=10 N=15 N=30 N=10 N=15 N=30 N=10 N=15 N=30
(I-1) T=10 −0.0137 −0.0136 −0.0054 0.0768 0.0816 0.0878 0.1899 0.1913 0.2002
(0.0989) (0.0761) (0.0537) (0.0887) (0.0757) (0.0528) (0.0935) (0.0799) (0.0571)
T=15 −0.0073 −0.0064 −0.0026 0.0897 0.0929 0.0977 0.2018 0.2045 0.2083
(0.0808) (0.0648) (0.0437) (0.0758) (0.0616) (0.0437) (0.0809) (0.0644) (0.0467)
T=30 −0.0069 −0.0056 −0.0028 0.0982 0.1028 0.1027 0.2104 0.2125 0.2134
(0.0546) (0.0452) (0.0299) (0.0543) (0.0423) (0.0328) (0.0556) (0.0452) (0.0320)
(I-2) T=10 −0.0137 −0.0136 −0.0054 0.2608 0.2594 0.2634 0.4073 0.4054 0.4059
(0.0989) (0.0761) (0.0536) (0.0919) (0.0762) (0.0541) (0.0593) (0.0486) (0.0338)
T=15 −0.0074 −0.0064 −0.0026 0.2725 0.2716 0.2713 0.4168 0.4141 0.4121
(0.0808) (0.0648) (0.0437) (0.0773) (0.0605) (0.0440) (0.0479) (0.0391) (0.0280)
T=30 −0.0069 −0.0056 −0.0028 0.2807 0.2779 0.2755 0.4196 0.4169 0.4147
(0.0546) (0.0452) (0.0299) (0.0515) (0.0419) (0.0293) (0.0327) (0.0264) (0.0194)
(I-3) T=10 −0.0121 −0.0103 −0.0041 0.1732 0.1756 0.1861 0.3170 0.3191 0.3265
(0.0876) (0.0683) (0.0492) (0.0784) (0.0686) (0.0474) (0.0692) (0.0587) (0.0408)
T=15 −0.0051 −0.0066 −0.0040 0.1855 0.1865 0.1926 0.3271 0.3290 0.3331
(0.0742) (0.0593) (0.0392) (0.0652) (0.0551) (0.0394) (0.0586) (0.0479) (0.0347)
T=30 −0.0047 −0.0050 −0.0029 0.1902 0.1944 0.1971 0.3303 0.3340 0.3371
(0.0501) (0.0402) (0.0285) (0.0476) (0.0391) (0.0284) (0.0428) (0.0338) (0.0241)
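The scenario functions of Sets I to III and the Epanechnikov kernel used in this section can be written down directly. The following is a minimal sketch, not the authors' simulation code: the dictionary names and the grid $\tau_t = t/T$ are assumptions made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 3/4 (1 - u^2) I(|u| <= 1)."""
    u = np.asarray(u, dtype=float)
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

# Set I: trend g(tau) of the covariate X_it2
G_SETTINGS = {
    "I-1": lambda tau: np.zeros_like(tau),
    "I-2": lambda tau: np.ones_like(tau),
    "I-3": lambda tau: 2.0 * np.sin(np.pi * tau),
}

# Set II: beta_0(tau) = (beta_{0,1}(tau), beta_{0,2}(tau)); each returns shape (len(tau), 2)
BETA_SETTINGS = {
    "II-1": lambda tau: np.column_stack([np.ones_like(tau), np.ones_like(tau)]),
    "II-2": lambda tau: np.column_stack([np.ones_like(tau), 1 + 2 * tau + 2 * tau**2]),
    "II-3": lambda tau: np.column_stack([1 + 3 * tau, 1 + 2 * tau + 2 * tau**2]),
}

# Set III: spatial autoregressive coefficient rho_0
RHO_SETTINGS = {"III-1": 0.3, "III-2": 0.7}

T = 15
tau = np.arange(1, T + 1) / T  # assumed time grid tau_t = t / T on (0, 1]
beta_II3 = BETA_SETTINGS["II-3"](tau)
```

Keeping the settings in dictionaries keyed by the scenario labels makes it easy to loop over all 3 × 3 × 2 combinations reported in the tables.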
$\beta_0(\tau)$, together with those of the Lee-Yu model (with $\rho_0$ fixed at 0.3, that is, Set III-1). A few comments can be made on the results.

First, our estimates of $\rho_0$ and $\sigma_0^2$ are consistent under all settings, as the means and SDs of the bias of $\widehat{\rho}$ and $\widehat{\sigma}^2$ get smaller when either N or T increases. This shows the robustness of our model against different settings of the covariates and time-varying coefficients.

Second, if the data are generated by a time-invariant process (Set II-1), the estimates of $\rho_0$ and $\sigma_0^2$ from the Lee-Yu model are consistent, with smaller biases than ours. This makes sense, as a time-invariant spatial panel data model is a special case of our model. However, when the coefficients of the covariates have time-varying features (Set II-2 and Set II-3) in the data generating process, the estimates of $\rho_0$ and $\sigma_0^2$ from the Lee-Yu model are not consistent and exhibit large biases. For example, under the combination of Set I-2 and Set II-2, the biases are around 0.27 for $\rho_0$ and 1.9 for $\sigma_0^2$. When more coefficients have time-varying features (e.g., from Set II-2 to II-3), the biases become larger. These findings confirm that when the time-varying model is misspecified as a time-invariant model, following the estimation of the Lee-Yu model will lead to inconsistent estimates.

Third, comparing different data-generating processes, if the data are generated from a fully time-varying model (Set II-3), our estimates have the smallest biases and SDs, followed by a partially linear model (Set II-2) and then a time-invariant model (Set II-1), given the setting of $X_{it2}$. For example, when N = 15, T = 15 and $X_{it2}$ follows Set I-3, the means and SDs of biases of our estimates for $\rho_0$ in Table 1 are −0.0301 (0.0665), −0.0148 (0.0297), and −0.0129 (0.0303), respectively, from Set II-1 to Set II-3.

We also evaluate our model by examining the finite-sample performance of the estimators for the two time-varying coefficients $\beta_{0,1}(\tau)$ and $\beta_{0,2}(\tau)$. In Table 3, the means and SDs (in parentheses) of the mean squared errors (MSEs) of the estimates of the time-varying coefficients $\beta_{0,1}(\tau)$ and $\beta_{0,2}(\tau)$ are reported, where for an estimator $\widehat{\beta}_k(\tau)$ (k = 1, 2), the MSE is defined by

$$\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\big\{\widehat{\beta}_k(\tau_t) - \beta_{0,k}(\tau_t)\big\}^2.$$

The results show that both the means and SDs of MSE for $\widehat{\beta}_k(\tau)$ (k = 1, 2) decrease when either N or T increases, confirming the
Table 2. Means and standard deviations of bias of $\widehat{\sigma}^2$ ($\rho_0 = 0.3$, $\sigma_0^2 = 1$).
(a) Our model

(II-1) (II-2) (II-3)
N=10 N=15 N=30 N=10 N=15 N=30 N=10 N=15 N=30
(I-1) T=10 −0.1677 −0.1157 −0.0576 −0.1609 −0.1110 −0.0517 −0.1565 −0.1076 −0.0499
(0.1349) (0.1070) (0.0802) (0.1332) (0.1062) (0.0794) (0.1310) (0.1057) (0.0791)
T=15 −0.1392 −0.0963 −0.0479 −0.1349 −0.0907 −0.0422 −0.1317 −0.0885 −0.0414
(0.1058) (0.0933) (0.0661) (0.1052) (0.0932) (0.0656) (0.1042) (0.0927) (0.0656)
T=30 −0.1249 −0.0817 −0.0415 −0.1195 −0.0765 −0.0361 −0.1183 −0.0755 −0.0357
(0.0759) (0.0645) (0.0459) (0.0751) (0.0644) (0.0458) (0.0753) (0.0645) (0.0457)
(I-2) T=10 −0.1677 −0.1157 −0.0576 −0.1513 −0.1030 −0.0462 −0.1496 −0.1015 −0.0449
(0.1349) (0.1070) (0.0802) (0.1318) (0.1065) (0.0796) (0.1310) (0.1068) (0.0802)
T=15 −0.1392 −0.0963 −0.0479 −0.1283 −0.0852 −0.0375 −0.1276 −0.0831 −0.0360
(0.1058) (0.0933) (0.0661) (0.1046) (0.0929) (0.0660) (0.1047) (0.0935) (0.0660)
T=30 −0.1249 −0.0817 −0.0415 −0.1149 −0.0721 −0.0325 −0.1133 −0.0706 −0.0312
(0.0759) (0.0645) (0.0459) (0.0761) (0.0650) (0.0460) (0.0760) (0.0654) (0.0465)
(I-3) T=10 −0.1662 −0.1149 −0.0574 −0.1417 −0.0945 −0.0387 −0.1423 −0.0965 −0.0430
(0.1336) (0.1064) (0.0802) (0.1318) (0.1067) (0.0802) (0.1316) (0.1065) (0.0795)
T=15 −0.1388 −0.0953 −0.0473 −0.1174 −0.0756 −0.0308 −0.1203 −0.0794 −0.0363
(0.1050) (0.0933) (0.0657) (0.1044) (0.0932) (0.0654) (0.1046) (0.0931) (0.0655)
T=30 −0.1239 −0.0810 −0.0410 −0.1066 −0.0648 −0.0273 −0.1100 −0.0696 −0.0324
(0.0764) (0.0648) (0.0460) (0.0754) (0.0652) (0.0467) (0.0760) (0.0653) (0.0460)
(b) Lee-Yu model
(II-1) (II-2) (II-3)
N=10 N=15 N=30 N=10 N=15 N=30 N=10 N=15 N=30
(I-1) T=10 −0.0216 −0.0170 −0.0074 1.2871 1.3011 1.2988 1.6606 1.6689 1.6666
(0.1474) (0.1153) (0.0824) (0.4128) (0.3395) (0.2321) (0.5083) (0.4168) (0.2815)
T=15 −0.0083 −0.0072 −0.0030 1.3137 1.2993 1.2992 1.6963 1.6724 1.6611
(0.1175) (0.1003) (0.0681) (0.3459) (0.2689) (0.1943) (0.4357) (0.3400) (0.2388)
T=30 −0.0086 −0.0039 −0.0023 1.3162 1.3194 1.3020 1.6955 1.6872 1.6642
(0.0846) (0.0692) (0.0476) (0.2466) (0.1838) (0.1335) (0.2981) (0.2317) (0.1661)
(I-2) T=10 −0.0216 −0.0170 −0.0074 1.9066 1.9021 1.8800 2.3929 2.3879 2.3492
(0.1474) (0.1153) (0.0824) (0.5532) (0.4513) (0.3006) (0.6869) (0.5590) (0.3731)
T=15 −0.0083 −0.0072 −0.0030 1.9372 1.8994 1.8689 2.4061 2.3633 2.3205
(0.1175) (0.1003) (0.0681) (0.4755) (0.3698) (0.2546) (0.5856) (0.4605) (0.3123)
T=30 −0.0086 −0.0039 −0.0023 1.9383 1.9122 1.8717 2.3997 2.3658 2.3162
(0.0846) (0.0692) (0.0476) (0.3211) (0.2497) (0.1778) (0.3918) (0.3087) (0.2191)
(I-3) T=10 −0.0210 −0.0173 −0.0076 2.0818 2.0406 1.9949 2.9138 2.8141 2.6892
(0.1477) (0.1155) (0.0827) (0.5574) (0.4374) (0.2958) (0.6650) (0.5303) (0.3507)
T=15 −0.0083 −0.0068 −0.0025 2.1024 2.0328 1.9842 2.9110 2.7826 2.6595
(0.1171) (0.1005) (0.0682) (0.4473) (0.3461) (0.2520) (0.5418) (0.4245) (0.2930)
T=30 −0.0090 −0.0036 −0.0023 2.1010 2.0478 1.9840 2.9043 2.7783 2.6477
(0.0844) (0.0692) (0.0476) (0.3224) (0.2429) (0.1691) (0.3835) (0.2840) (0.2014)
consistency of our estimators. The results for a different spatial coefficient $\rho_0 = 0.7$ (Set III-2) are similar to our benchmark case of Set III-1 ($\rho_0 = 0.3$), and the corresponding simulation results can be found in Tables D.1-D.3 of the supplementary material.

6. Empirical Application

As a case study, we apply our model to analyze the level of real wages in 159 Chinese cities over the period between 1995 and 2009. The level of real wages measures the demand for labor and is closely related to productivity; see for example Van Biesebroeck (2015) for a literature review and Combes, Démurger, and Li (2017) for another example. We believe that this is an ideal empirical application for our proposed model for the following two reasons. First, China's vast internal urban labor markets are inter-linked. The wage level in each city is not only determined by the characteristics and/or performance of the city itself, but also depends on those of the cities around it. The spatial effects of wages are also discussed in the literature; see, for example, Braid (2002) and Baltagi, Blien, and Wolf (2012). Second, China has experienced unprecedented economic growth and change in the economic structure during the last four decades or so. During the period, the organization of the economy, the environment in which economic agents operated, and perhaps even the agents themselves have changed dramatically. For example, the reform of state-owned firms that started in 1997 changed the ownership of most of the firms, the way they are managed, the productivity of labor, and how remuneration was determined. The series of reforms in housing, the health system, and education also changed both the demand for and the supply of labor. In other words, it is likely that the economy has experienced "structural changes" over the years and that many of the key relations between economic variables may not remain constant over time.

In this case study, we explain the logarithm of the average wage level of a city by a number of variables, including capital (measured by asset), investment (measured by FDI), and the economic structure of the city (measured by the proportion of industries and sectors). Ideally, a variable reflecting the level of labor input should be included in the model. The only variable that is available to us is the size of the population in each city. The variable appears to be highly correlated with the time trend: the correlation coefficient between the average (log) population
Table 3. Means and standard deviations of MSE of $\widehat{\beta}(\tau) = (\widehat{\beta}_1(\tau), \widehat{\beta}_2(\tau))^{\top}$ ($\rho_0 = 0.3$, $\sigma_0^2 = 1$).
(a) $\widehat{\beta}_1(\tau)$
(II-1) (II-2) (II-3)
N=10 N=15 N=30 N=10 N=15 N=30 N=10 N=15 N=30
(I-1) T=10 0.0792 0.0480 0.0193 0.0447 0.0296 0.0124 0.0774 0.0531 0.0209
(0.1034) (0.0537) (0.0206) (0.0455) (0.0285) (0.0116) (0.0887) (0.0653) (0.0225)
T=15 0.0463 0.0287 0.0128 0.0283 0.0176 0.0081 0.0505 0.0307 0.0145
(0.0587) (0.0314) (0.0126) (0.0283) (0.0165) (0.0074) (0.0597) (0.0352) (0.0150)
T=30 0.0202 0.0124 0.0058 0.0125 0.0077 0.0036 0.0218 0.0140 0.0065
(0.0205) (0.0123) (0.0058) (0.0107) (0.0066) (0.0032) (0.0237) (0.0158) (0.0072)
(I-2) T=10 0.1983 0.1166 0.0446 0.1287 0.0840 0.0337 0.2105 0.1403 0.0540
(0.2750) (0.1498) (0.0524) (0.1491) (0.1015) (0.0347) (0.2535) (0.1922) (0.0650)
T=15 0.1116 0.0672 0.0291 0.0826 0.0493 0.0238 0.1365 0.0791 0.0384
(0.1462) (0.0792) (0.0307) (0.0881) (0.0548) (0.0239) (0.1634) (0.1010) (0.0439)
T=30 0.0480 0.0287 0.0130 0.0366 0.0224 0.0103 0.0581 0.0362 0.0163
(0.0528) (0.0301) (0.0141) (0.0383) (0.0223) (0.0101) (0.0707) (0.0428) (0.0183)
(I-3) T=10 0.1946 0.1164 0.0443 0.1016 0.0692 0.0285 0.1665 0.1158 0.0477
(0.3156) (0.1710) (0.0594) (0.1308) (0.0901) (0.0333) (0.2311) (0.1621) (0.0607)
T=15 0.1119 0.0647 0.0276 0.0647 0.0386 0.0194 0.1075 0.0648 0.0331
(0.1572) (0.0881) (0.0313) (0.0834) (0.0442) (0.0207) (0.1442) (0.0792) (0.0387)
T=30 0.0459 0.0288 0.0125 0.0288 0.0186 0.0089 0.0474 0.0313 0.0151
(0.0541) (0.0365) (0.0146) (0.0342) (0.0207) (0.0106) (0.0610) (0.0393) (0.0191)
(b) $\widehat{\beta}_2(\tau)$
(II-1) (II-2) (II-3)
N=10 N=15 N=30 N=10 N=15 N=30 N=10 N=15 N=30
(I-1) T=10 0.0419 0.0263 0.0123 0.0482 0.0326 0.0167 0.0467 0.0312 0.0169
(0.0410) (0.0263) (0.0112) (0.0405) (0.0266) (0.0122) (0.0395) (0.0253) (0.0120)
T=15 0.0289 0.0181 0.0076 0.0352 0.0236 0.0125 0.0341 0.0231 0.0127
(0.0276) (0.0182) (0.0068) (0.0288) (0.0192) (0.0085) (0.0277) (0.0183) (0.0086)
T=30 0.0139 0.0083 0.0040 0.0181 0.0127 0.0087 0.0178 0.0130 0.0089
(0.0133) (0.0076) (0.0035) (0.0141) (0.0090) (0.0054) (0.0135) (0.0091) (0.0055)
(I-2) T=10 0.0419 0.0263 0.0123 0.0464 0.0309 0.0168 0.0462 0.0305 0.0172
(0.0410) (0.0263) (0.0112) (0.0387) (0.0248) (0.0120) (0.0372) (0.0242) (0.0118)
T=15 0.0289 0.0181 0.0076 0.0344 0.0232 0.0124 0.0342 0.0232 0.0128
(0.0276) (0.0182) (0.0068) (0.0282) (0.0185) (0.0084) (0.0275) (0.0178) (0.0083)
T=30 0.0139 0.0083 0.0040 0.0177 0.0128 0.0084 0.0179 0.0131 0.0089
(0.0133) (0.0076) (0.0035) (0.0136) (0.0090) (0.0054) (0.0134) (0.0090) (0.0056)
(I-3) T=10 0.0374 0.0238 0.0111 0.0406 0.0287 0.0163 0.0403 0.0283 0.0150
(0.0388) (0.0248) (0.0102) (0.0327) (0.0217) (0.0106) (0.0313) (0.0216) (0.0109)
T=15 0.0260 0.0161 0.0070 0.0305 0.0215 0.0124 0.0299 0.0206 0.0103
(0.0254) (0.0162) (0.0065) (0.0235) (0.0152) (0.0084) (0.0236) (0.0152) (0.0077)
T=30 0.0123 0.0074 0.0035 0.0169 0.0132 0.0086 0.0158 0.0112 0.0062
(0.0123) (0.0070) (0.0032) (0.0122) (0.0088) (0.0058) (0.0120) (0.0083) (0.0044)
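The entries of Table 3 combine two steps: the MSE of an estimated coefficient curve over the time grid within one replication, and the mean and SD of that MSE across replications. A minimal sketch of both steps is given below; the function names are hypothetical, and the use of the sample SD (`ddof=1`) is an assumption, since the text does not state which SD convention was used.

```python
import numpy as np

def coefficient_mse(beta_hat, beta_true):
    """MSE = (1/T) * sum_t {beta_hat(tau_t) - beta_0(tau_t)}^2 for one coefficient curve."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    return float(np.mean((beta_hat - beta_true) ** 2))

def summarize_mse(mse_per_replication):
    """Mean and SD of the MSE across Monte Carlo replications (one table cell)."""
    m = np.asarray(mse_per_replication, dtype=float)
    return float(m.mean()), float(m.std(ddof=1))  # ddof=1 is an assumed convention

# Toy illustration with made-up numbers (not simulation output from the article)
toy_mse = coefficient_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
toy_mean, toy_sd = summarize_mse([1.0, 3.0])
```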
size of the cities and the time trend is about 0.98. Thus, we chose not to include the variable. The impact of labor input is absorbed by the coefficient of the time trend when the constant term is included in the time-varying model. Table 4 provides the definitions of these variables. Our model captures spatial inter-dependence and potential change in the effects of these explanatory variables.

Table 4. Variable definitions.
Dependent variable (Y)       Definition
log(wage)                    Log value of average wage per worker (1994 price)
Independent variables (X)    Definition
log(FDI)                     Log value of FDI (10 thousand yuans, 1994 price)
log(Asset)                   Log value of total asset (million yuans, 1994 price)
GDPm                         Proportion of GDP by the manufactural sector
GDPs                         Proportion of GDP by the service sector
Empms                        Proportion of employed persons in the manufactural sector out of the nonagricultural sector

Our data are derived from Statistic Year Books of China (various years), for 1995–2009 (T = 15), and cover 159 (N) cities, including the four cities directly administered by the central government (Beijing, Shanghai, Tianjin, and Chongqing) and 155 other cities at or above the prefectural level in China. A prefectural city in China means a city that is directly controlled by provincial governments. The geographical location of these cities can be found in Figure 1. Following the convention, we divide China into seven regions: East China (EC), South China (SC), Southwest (SW), North China (NC), Northwest China (NW), Central China (CC), and Northeast China (NE). The densities of cities in these regions are quite different: EC has 56 cities, about one-third of the cities in the sample, while cities are very sparse in the western region, reflecting the uneven distribution of cities. We use highway distances between pairs of cities in kilometers to measure the spatial distances between cities, as they can reflect economic distances between cities. These data are collected using the service provided by Google Map Services. We specify W as the inverse of highway distances between cities, standardized by its maximum eigenvalue.

As described in the assumptions of Section 3, regressors are allowed to be trending stationary. To check whether the macro-level regressors considered here are trending stationary, we first
Figure 1. Map of 159 Chinese cities where each color represents one of the following regions from Central China (CC), East China (EC), North China (NC), North East (NE),
North West (NW), South China (SC) and South West (SW).
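The construction of the spatial weights matrix W described above (inverse highway distances, scaled by the maximum eigenvalue) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, the distance matrix is assumed to be symmetric with strictly positive off-diagonal entries, and the diagonal of W is set to zero so that no city is its own neighbor.

```python
import numpy as np

def inverse_distance_weights(dist):
    """Inverse-distance spatial weights, scaled by the largest eigenvalue.

    dist: (N, N) symmetric matrix of pairwise distances (assumed > 0 off the diagonal).
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    W = np.zeros_like(dist)
    off_diag = ~np.eye(n, dtype=bool)
    W[off_diag] = 1.0 / dist[off_diag]        # w_ij = 1 / d_ij for i != j, w_ii = 0
    lam_max = np.max(np.linalg.eigvalsh(W))   # W is symmetric, so eigvalsh applies
    return W / lam_max

# Toy example with three hypothetical cities (distances in km are made up)
toy_dist = np.array([[0.0, 100.0, 250.0],
                     [100.0, 0.0, 180.0],
                     [250.0, 180.0, 0.0]])
W_toy = inverse_distance_weights(toy_dist)
```

After this scaling the largest eigenvalue of W equals one, which is a common way to keep the admissible range of the spatial coefficient easy to state.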
fit the unknown trend in the regressors with the local linear estimation method. After removing the fitted trend, we obtain the residuals in regressors. Then, we conduct the Im-Pesaran-Shin (IPS) panel unit root tests on these residuals. Refer to Im, Pesaran, and Shin (2003, eqs. (4.10) and (4.6)) for the IPS test statistics Wtbar and Ztbar, respectively. According to the test statistics and p-values in Table 5, the null hypotheses of a panel unit root for these variables are all rejected, indicating that the assumption of trend stationary regressors is valid for our dataset.

Table 5. IPS unit root test statistics and p-values for the regressor residuals.
Residuals in regressors   Wtbar      p-value    Ztbar      p-value
log(FDI)                  −17.7893   <0.0001    −17.9520   <0.0001
log(Asset)                −5.3533    <0.0001    −5.3726    <0.0001
GDPm                      −12.4843   <0.0001    −12.5689   <0.0001
GDPs                      −12.5017   <0.0001    −12.6032   <0.0001
Empms                     −13.6499   <0.0001    −13.7323   <0.0001

It is known that the choice of bandwidth is important for nonparametric kernel estimation. We first estimate the model with the optimal bandwidth (hopt = 0.4000) obtained by the leave-one-unit-out cross-validation method, and then compare the results with a number of different bandwidths around it: 2/3hopt, 4/5hopt, 5/4hopt, and 3/2hopt. Figure 2 shows that the results under different bandwidths are consistent. Table 6 reports the estimates of the spatial coefficient $\rho_0$ and variance $\sigma_0^2$, which are quite similar across these specifications. Given the robustness of the results, we decide to use the average bandwidth have = 0.4173 of those five bandwidths for the rest of the article.

We estimate the model for China as a whole and then for East China (Figure 3). Table 7 reports the estimates of the spatial coefficient $\rho_0$, $\sigma_0^2$ and the time average of $\beta_0(\tau)$ for the whole country and for East China, respectively. The significance of the spatial coefficient estimate reflects the spatial dependence and confirms the existence of spillover effects between cities. From the table we can see that, over the years, FDI contributed positively to the wage level on average: if FDI increases by one percent, the average level of real wage would increase by 0.025%. If capital increases by one percent, the average real wage would increase by 0.026%. The estimates also show that economic structure affects wages. For example, if the share of the manufactural or service sector increases by one percent, the average real wage would increase by 0.365% or 0.326%, respectively, but the increase in wage would be 0.380% higher if the size of the service sector is one percent larger relative to the manufactural sector. Comparison between the whole country and East China illustrates that the spatial dependence is much bigger in EC than in the whole country. This makes sense, as the economic connection is spatially stronger in small regions. The impact of FDI on wage is smaller in EC. This is likely due to a smaller difference in economic development between EC and the rest of the country, so that the additional effect of FDI on the local economy is not as large as for less developed parts of the country. The average effects of capital and of the manufactural share are larger in EC than in the whole country. This is because EC is the densest region in the country and its manufactural sectors are more developed.

Figure 4 displays the time-varying coefficient curves for all variables with their 95% confidence bands for the whole country. It shows that the parameters of the explanatory variables clearly evolve over time. For example, the impact of FDI on the wage level has been decreasing over time. This can be explained by the fact that in the early stage of reform, foreign investment brought advanced technology and management know-how, which also pushed up the labor demand, but as the domestic economy catches up, the impact of FDI on the labor market becomes less important. Meanwhile, the impact of capital on the wage level kept increasing over time. This could be because as
Figure 2. Estimated curves of the time-varying coefficients under different bandwidths for the whole country (159 cities).
Table 6. Estimates of parameters under different bandwidths.

hopt 2/3hopt 4/5hopt 5/4hopt 3/2hopt
ρ0 0.1214 (0.0185) 0.1052 (0.0185) 0.1126 (0.0185) 0.1332 (0.0186) 0.1418 (0.0186)
σ02 0.0062 (0.0002) 0.0061 (0.0002) 0.0062 (0.0002) 0.0064 (0.0002) 0.0064 (0.0002)
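The average bandwidth quoted in the text can be checked from the five candidate bandwidths in Table 6. The short sketch below only reproduces that arithmetic; the variable names are chosen for illustration.

```python
# Five candidate bandwidths: h_opt and the four multiples around it
h_opt = 0.4000
multipliers = [1.0, 2 / 3, 4 / 5, 5 / 4, 3 / 2]
bandwidths = [m * h_opt for m in multipliers]

# Simple average of the five bandwidths
h_ave = sum(bandwidths) / len(bandwidths)
print(round(h_ave, 4))  # 0.4173
```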
the economy becomes less labor-intensive, as the capital level increases, the demand for work also increases accordingly. The effects of economic structure on wage have also changed over time. These findings confirm that indeed the relations between economic variables changed dramatically during such a period of fast development. Figure 5 shows that the time-varying features of these variables and the intercept appear quite strong in EC as well. The results imply that if a time-invariant model were used, the impacts of these variables would be estimated with biases.

Figure 3. Map of 56 cities in Region EC where each color represents a different province.

In addition, we conduct model diagnostics. To save space, we just report the results of the model for the whole country since the results of the model for the EC region are quite similar. To check the stationarity of residuals, we implement the IPS unit root test. The two test statistics are Wtbar = −7.3741 and Ztbar = −7.4154, with p-values less than 0.0001. So we reject the null hypothesis of a panel unit root and conclude that the residuals are stationary. To further check whether there is serial correlation, we carry out the Box-Pierce test (see Box and Pierce 1970) on the estimated residuals for each city. It is worth noting that we are interested in a set of hypotheses

$$H_{i,0}: \text{the estimated residuals for city } i \text{ are white noise} \quad \text{vs.} \quad H_{i,1}: \text{otherwise}, \tag{12}$$

for cities i = 1, . . . , N. Accordingly, for each of the hypotheses above, we can calculate the Box-Pierce test statistic and its p-value. To identify whether or not there exists a significant test, we apply the multiple testing procedure of Benjamini and Hochberg (1995), which controls the false discovery rate (FDR) of (12) at the rate of 0.05. We also apply the Bonferroni correction method (see Miller Jr 1966), which controls the familywise error rate (FWER) of (12) at the rate of 0.05. Both multiple testing procedures show that the null hypotheses for all cities cannot be rejected, which means there is no serial correlation in the residuals. All the aforementioned diagnostic results support the validity of our regression model.

Table 7. Estimation results of semiparametric spatial autoregressive panel data model (the covariate coefficient estimates are calculated by the average over time).
              Whole country         East China
Intercept     7.2190                6.4822
log(FDI)      0.0250                0.0183
log(Asset)    0.0256                0.0308
GDPm          0.3645                0.4319
GDPs          0.3256                0.1663
Empms         −0.3799               −0.3717
ρ0            0.1240*** (0.0185)    0.2239*** (0.0242)
σ0²           0.0063*** (0.0002)    0.0044*** (0.0002)

7. Conclusion

We have considered a semiparametric SAR panel data model with fixed effects. This model is designed particularly for situations where covariate effects on the dependent variable change over time. The spatial dependence structure between units is assumed to be time-invariant and is represented by a parametric spatial lag term. To consistently estimate both the parametric
Figure 4. Estimated curves and 95% confidence bands of time-varying coefficients for the whole country (159 cities) with bandwidth have = 0.4173.
Figure 5. Estimated curves and 95% confidence bands of time-varying coefficients for Region of East China with bandwidth have = 0.3477.
and nonparametric components, we have proposed a local linear concentrated QML estimation method. Asymptotic properties for the estimators have been derived with parametric $\sqrt{NT}$ and nonparametric $\sqrt{NTh}$ rates of convergence, respectively, when both the cross-sectional size N and the time length T go to infinity.

The finite-sample performance of our model has been evaluated and compared with that of a time-invariant spatial panel data model using Monte Carlo simulations. The results showed that when the time-varying coefficients are misspecified to be constants, using the standard time-invariant spatial panel data model would lead to inconsistent estimates, while our proposed model remains consistent and robust across the settings considered.

We have also applied the proposed model to study labor compensation in Chinese cities. Our results have illustrated that as China became more developed, the impacts of capital, investment, and the structure of the economy on labor compensation have changed over time. The results also imply that for a fast-changing economy such as China, many important economic parameters may not be consistently estimated with a time-invariant model.

Acknowledgments

The authors are grateful to the editor, the associate editor and anonymous referees for their constructive comments and suggestions on earlier versions of this submission. The authors also acknowledge seminar participants for their comments and suggestions. This project was undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI), which is supported by the Australian Government.

Funding

Xuan Liang and Jiti Gao thank the Australian Research Council Discovery Grants Program for its financial support under grant numbers DP150101012 and DP170104421. Xuan Liang also acknowledges the financial support of the ANU RSFAS Cross-Disciplinary Grant.
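Appendix A: Justification of Identification Condition $\sum_{i=1}^{N}\alpha_{0,i} = 0$

Considering the specification of $\beta_{0,t} = \beta_0(\tau_t)$ in Equation (2), our model (1) becomes

$$Y_{it} = \rho_0\sum_{j \neq i} w_{ij}Y_{jt} + X_{it}^{\top}\beta_0(\tau_t) + \alpha_{0,i} + e_{it}, \qquad t = 1, \ldots, T,\ i = 1, \ldots, N,$$

where the constant 1 is included in the regressor $X_{it}$.

Without loss of generality, let $X_{it} = (X_{it1}, X_{it,-1}^{\top})^{\top}$ and $\beta_0(\tau_t) = (\beta_{0,1}(\tau_t), \beta_{0,-1}(\tau_t)^{\top})^{\top}$, where $X_{it1} = 1$, $X_{it,-1} = (X_{it2}, \ldots, X_{itd})^{\top}$ and $\beta_{0,-1}(\tau_t) = (\beta_{0,2}(\tau_t), \ldots, \beta_{0,d}(\tau_t))^{\top}$. Then, our model becomes

$$Y_{it} = \rho_0\sum_{j \neq i} w_{ij}Y_{jt} + \beta_{0,1}(\tau_t) + X_{it,-1}^{\top}\beta_{0,-1}(\tau_t) + \alpha_{0,i} + e_{it}, \qquad t = 1, \ldots, T,\ i = 1, \ldots, N.$$

Let $\bar{Y}_{\cdot t} = N^{-1}\sum_{i=1}^{N}Y_{it}$, $\bar{W}_{\cdot t} = N^{-1}\sum_{i=1}^{N}\sum_{j \neq i}w_{ij}Y_{jt}$, $\bar{X}_{\cdot t,-1} = N^{-1}\sum_{i=1}^{N}X_{it,-1}$, $\bar{\alpha} = N^{-1}\sum_{i=1}^{N}\alpha_{0,i}$ and $\bar{e}_{\cdot t} = N^{-1}\sum_{i=1}^{N}e_{it}$. We then have

$$\bar{Y}_{\cdot t} = \rho_0\bar{W}_{\cdot t} + \beta_{0,1}(\tau_t) + \bar{X}_{\cdot t,-1}^{\top}\beta_{0,-1}(\tau_t) + \bar{\alpha} + \bar{e}_{\cdot t}. \tag{A1}$$

Without imposing the assumption $\bar{\alpha} = 0$, model (A1) implies that there is an identification issue with the identifiability and then estimability of $\beta_{0,1}(\tau)$. Hence, in order to identify $\beta_{0,1}(\tau)$ we require the condition $\sum_{i=1}^{N}\alpha_{0,i} = 0$. In fact, the advantage of this identification condition is to allow us to estimate $\beta_{0,1}(\tau)$ as a smooth time-varying or trending effect in contrast to the fixed effects structure. Therefore, we believe that our model is more flexible and applicable.

Appendix B: Proofs of Theorems

Proof of Theorem 1. Even though we have the nonparametric terms in our model, the idea for proving the consistency of the parametric estimators and the identification can be adopted from Lee (2004). Define $Q_{N,T}(\rho) = \max_{\sigma^2}E\{\log L_{N,T}(\theta)\}$, where $\theta = (\rho, \sigma^2)^{\top}$. In order to show the consistency of $\widehat{\theta}$, it suffices to show

$$\frac{1}{NT}\{\log L_{N,T}(\rho) - Q_{N,T}(\rho)\} \xrightarrow{P} 0 \quad \text{uniformly on the parameter space of } \rho, \tag{A2}$$

and the uniqueness identification condition that

$$\limsup_{N,T\to\infty}\ \max_{\rho \in N_{\epsilon}^{c}(\rho_0)}\ \frac{1}{NT}\{Q_{N,T}(\rho) - Q_{N,T}(\rho_0)\} < 0 \quad \text{for any } \epsilon > 0, \tag{A3}$$

by using White (1996) and Lee (2004), where $N_{\epsilon}^{c}(\rho_0)$ is the complement of an open neighborhood of $\rho_0$ of diameter $\epsilon$ in the parameter space of $\rho$.

(1) Proof of Equation (A2). Observe that $Q_{N,T}(\rho) = -\frac{NT}{2}\{\log(2\pi) + 1\} - \frac{NT}{2}\log\{\sigma^{2*}(\rho)\} + T\log|S_N(\rho)|$, where $\sigma^{2*}(\rho) = (NT)^{-1}E\{\tilde{Y}(\rho)^{\top}Q_{N,T}\tilde{Y}(\rho)\} = (NT)^{-1}E\{Y^{*}(\rho)^{\top}P_{N,T}Y^{*}(\rho)\}$. Due to $P_{N,T}D = 0_{NT,N-1}$, $P_{N,T}^{\top} = P_{N,T}$ and $S_{N,T}(\rho) = S_{N,T} + (\rho_0 - \rho)I_T \otimes W$, we can rewrite $\sigma^{2*}(\rho)$ as

$$\begin{aligned}
\sigma^{2*}(\rho) ={}& \frac{1}{NT}(\rho_0 - \rho)^2E(R_{N,T}^{\top}P_{N,T}R_{N,T}) + \frac{2}{NT}(\rho_0 - \rho)E\big((\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}R_{N,T}\big) \\
&+ \frac{1}{NT}E\big((\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}(\tilde{X}\tilde{\beta}_0)\big) + \frac{\sigma_0^2}{NT}\mathrm{tr}\big\{S_{N,T}^{-1\top}S_{N,T}^{\top}(\rho)E(P_{N,T})S_{N,T}(\rho)S_{N,T}^{-1}\big\}.
\end{aligned} \tag{A4}$$

Then, $(NT)^{-1}\{\log L_{N,T}(\rho) - Q_{N,T}(\rho)\} = -\frac{1}{2}\big[\log\{\widehat{\sigma}^2(\rho)\} - \log\{\sigma^{2*}(\rho)\}\big]$, where

$$\begin{aligned}
\widehat{\sigma}^2(\rho) ={}& \frac{1}{NT}Y^{*}(\rho)^{\top}P_{N,T}Y^{*}(\rho) \\
={}& \frac{1}{NT}(\rho_0 - \rho)^2R_{N,T}^{\top}P_{N,T}R_{N,T} + \frac{2}{NT}(\rho_0 - \rho)(\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}R_{N,T} + \frac{1}{NT}(\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}(\tilde{X}\tilde{\beta}_0) \\
&+ \frac{2}{NT}(\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}S_{N,T}(\rho)S_{N,T}^{-1}e + \frac{2(\rho_0 - \rho)}{NT}R_{N,T}^{\top}P_{N,T}S_{N,T}(\rho)S_{N,T}^{-1}e \\
&+ \frac{1}{NT}e^{\top}S_{N,T}^{-1\top}S_{N,T}^{\top}(\rho)P_{N,T}S_{N,T}(\rho)S_{N,T}^{-1}e.
\end{aligned} \tag{A5}$$

To show Equation (A2), it is sufficient to show that

$$\widehat{\sigma}^2(\rho) - \sigma^{2*}(\rho) = o_P(1) \quad \text{uniformly on the parameter space of } \rho. \tag{A6}$$

According to Lemma B.2, we know

$$\begin{aligned}
\frac{1}{NT}(\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}(\tilde{X}\tilde{\beta}_0) &= o_P(1), & \frac{1}{NT}E\big((\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}(\tilde{X}\tilde{\beta}_0)\big) &= o(1), \\
\frac{1}{NT}(\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}R_{N,T} &= o_P(1), & \frac{1}{NT}E\big((\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}R_{N,T}\big) &= o(1),
\end{aligned}$$

$$\frac{1}{NT}R_{N,T}^{\top}P_{N,T}R_{N,T} = \frac{1}{NT}E(R_{N,T}^{\top}P_{N,T}R_{N,T}) + o_P(1) \to \Sigma_{R,R}.$$

By Equations (A4) and (A5),

$$\begin{aligned}
\widehat{\sigma}^2(\rho) - \sigma^{2*}(\rho) ={}& \frac{2}{NT}(\tilde{X}\tilde{\beta}_0)^{\top}P_{N,T}S_{N,T}(\rho)S_{N,T}^{-1}e + \frac{2(\rho_0 - \rho)}{NT}R_{N,T}^{\top}P_{N,T}S_{N,T}(\rho)S_{N,T}^{-1}e \\
&+ \frac{1}{NT}e^{\top}S_{N,T}^{-1\top}S_{N,T}^{\top}(\rho)P_{N,T}S_{N,T}(\rho)S_{N,T}^{-1}e - \bar{\sigma}^2(\rho) + o_P(1) \\
:={}& 2H_1(\rho) + 2(\rho_0 - \rho)H_2(\rho) + H_3(\rho) - \bar{\sigma}^2(\rho) + o_P(1),
\end{aligned}$$

where $\bar{\sigma}^2(\rho) = (NT)^{-1}\sigma_0^2\,\mathrm{tr}\{S_{N,T}^{-1\top}S_{N,T}^{\top}(\rho)E(P_{N,T})S_{N,T}(\rho)S_{N,T}^{-1}\}$. According to Lemma B.3, we get Equation (A6), so that Equation (A2) holds.

(2) Proof of Equation (A3). Consider an auxiliary SAR panel process: $Y_t = \rho_0WY_t + e_t$, where $e_t \sim N(0_N, \sigma_0^2I_N)$ and t = 1, . . . , T. Denote the log-likelihood of this model as $\log L_{N,T}^{a}(\rho, \sigma^2)$. Let $\tilde{\sigma}^2(\rho) = (NT)^{-1}\sigma_0^2\,\mathrm{tr}\{S_{N,T}^{-1\top}S_{N,T}^{\top}(\rho)S_{N,T}(\rho)S_{N,T}^{-1}\}$. It can be verified that

$$\max_{\sigma^2}E_a\{\log L_{N,T}^{a}(\rho, \sigma^2)\} = -\frac{NT}{2}\{\log(2\pi) + 1\} - \frac{NT}{2}\log\{\tilde{\sigma}^2(\rho)\} + T\log|S_N(\rho)| := \tilde{Q}_{N,T}(\rho),$$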
where Ea denotes the expectation under this auxiliary model. Hence, is not a zero vector, H42 (ρ) can be written as E(x Ax) with a positive
for any ρ ∈ , we have Q̃N,T (ρ) ≤ maxρ,σ 2 Ea {logLaN,T (ρ, σ 2 )} = semidefinite matrix A. Thus, (NT)−1 H42 (ρ) ≤ 0 uniformly on so
Ea {logLaN,T (ρ0 , σ02 )} = Q̃N,T (ρ0 ) so that that H4 (ρ) ≤ 0 holds uniformly on .
1 {Q
Hence, we have limN,T→∞ sup maxρ∈Nc (ρ0 ) NT N,T (ρ) −
1 QN,T (ρ0 )} ≤ 0 for any > 0.
{Q̃N,T (ρ) − Q̃N,T (ρ0 )} ≤ 0 uniformly on .
NT If the identification uniqueness condition was not satisfied, without
The term in Equation (A3) can be rewritten as follows: loss of generality, there would exist a sequence ρn converging to
ρ ∗ = ρ0 such that limN,T→∞ sup maxρ∈Nc (ρ0 ) NT 1 {Q
N,T (ρn ) −
1 1
{QN,T (ρ) − QN,T (ρ0 )} = {Q̃N,T (ρ) − Q̃N,T (ρ0 )} QN,T (ρ )} = 0. This will follow if (a) limN,T→∞ {σ̃ 2 (ρ ∗ ) −
∗
NT NT σ 2∗ (ρ ∗ )} = 0 and (b) limN,T→∞ (NT)−1 [Q̃N,T (ρ ∗ ) − Q̃N,T (ρ0 )] =
1 1 0. Since limN,T→∞ {σ̃ 2 (ρ) − σ 2∗ (ρ)} = R,R > 0 assumed in
+ H4 (ρ) + H5 ,
2 2 Assumption 8, (a) contradicts with this assumption. Hence, we have
where H4 (ρ) = log{σ̃ 2 (ρ)} − log{σ 2∗ (ρ)} and H5 = log{σ 2∗ (ρ0 )} − accomplished the proof.
log{σ̃ 2 (ρ0 )}. Proof of Theorem 2. In this section, we provide the proof of the
By Lemma C.5, we have (NT)−1 tr(E(PN,T )) = (NT)−1 tr(INT ) = asymptotic distribution of
θ. According to the Taylor expansion of the
1 so that first order condition of Equation (8), we obtain
1 −1
σ 2∗ (ρ0 ) − σ̃ 2 (ρ0 ) = E((X̃ β̃ 0 ) PN,T (X̃ β̃ 0 )) √ 1 ∂ 2 logLN,T (θ̃)
NT
NT(θ − θ 0 ) = −
tr(E(PN,T )) NT ∂θ∂θ
+ σ02 − 1 = o(1).
NT 1 ∂logLN,T (θ 0 )
×√ + oP (1),
NT ∂θ
Accordingly, we obtain H5 = o(1).
To show H4 (ρ) ≤ 0 uniformly on , it suffices to show σ̃ 2 (ρ) − where θ̃ lies between
θ and θ 0 and hence it converges to θ 0 in proba-
σ 2∗ (ρ) ≤ 0 uniformly on . Observe that σ̃ 2 (ρ) − σ 2∗ (ρ) = bility by Theorem 1. If
σ02 1 1 2
NT H4 (ρ) − NT H4 (ρ), where 1 ∂logLN,T (θ 0 ) d
√ → N(02 , θ 0 + θ 0 ), (A7)
1 1 tr(E(PN,T )) NT ∂θ
H (ρ) = 1 − 2
1 ∂ logLN,T (θ 0 )
NT 4 NT − − θ 0 = oP (1), (A8)
tr(GNT ) − tr(E(PN,T )GNT ) NT ∂θ∂θ
+ 2(ρ0 − ρ)
NT 1 ∂ 2 logLN,T (θ̃) 1 ∂ 2 logLN,T (θ 0 )
G ) − tr(G E(P
− = oP (1)
tr(GNT NT NT N,T )GNT ) NT ∂θ∂θ NT ∂θ∂θ
+ (ρ0 − ρ)2
NT uniformly in θ̃, (A9)

and H42 (ρ) = E {X̃ β̃ 0 + (ρ0 − ρ)RN,T } PN,T {X̃ β̃ 0 + √ d
all hold, the asymptotic normality follows: NT( θ − θ 0) →
(ρ0 − ρ)RN,T } . −1 −1 −1
N(02 , θ + θ θ 0 θ ).
By Lemma C.5, we get (NT)−1 H41 (ρ) = op (1) uniformly on .
0 0 0
(1) Proof of Equation (A7). First, we can get the first-order deriva-
Since PN,T = (INT − X̃ ˜ ) Q Q(INT − X̃ ˜ ) and X̃ β̃ 0 +(ρ0 −ρ)RN,T tive of logLN,T (θ) as follows:
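$$\frac{1}{\sqrt{NT}}\frac{\partial\log L_{N,T}(\theta_0)}{\partial\theta} = \begin{pmatrix} \frac{1}{\sqrt{NT}}\frac{\partial\log L_{N,T}(\theta_0)}{\partial\rho} \\ \frac{1}{\sqrt{NT}}\frac{\partial\log L_{N,T}(\theta_0)}{\partial\sigma^2} \end{pmatrix}$$

$$= \frac{1}{\sqrt{NT}}\begin{pmatrix} \frac{R_{N,T}'P_{N,T}e}{\sigma_0^2} + \frac{e'G_{N,T}'P_{N,T}e - T\sigma_0^2\,\mathrm{tr}(G_N)}{\sigma_0^2} + \frac{R_{N,T}'P_{N,T}\tilde{X}\tilde{\beta}_0}{\sigma_0^2} + \frac{(\tilde{X}\tilde{\beta}_0)'P_{N,T}G_{N,T}e}{\sigma_0^2} \\ \frac{e'P_{N,T}e - NT\sigma_0^2}{2\sigma_0^4} + \frac{(\tilde{X}\tilde{\beta}_0)'P_{N,T}(\tilde{X}\tilde{\beta}_0) + 2(\tilde{X}\tilde{\beta}_0)'P_{N,T}e}{2\sigma_0^4} \end{pmatrix}$$

$$= \frac{1}{\sqrt{NT}}\begin{pmatrix} \frac{1}{\sigma_0^2}\{e'G_{N,T}'P_{N,T}e - \sigma_0^2\,\mathrm{tr}(G_{N,T})\} + \frac{1}{\sigma_0^2}R_{N,T}'P_{N,T}e \\ \frac{1}{2\sigma_0^4}(e'P_{N,T}e - NT\sigma_0^2) \end{pmatrix} + o_P(1)$$

due to Lemma B.2. By Lemma C.5, we have

$$\frac{1}{\sqrt{NT}}R_{N,T}'P_{N,T}\tilde{X}\tilde{\beta}_0 = o_P(1), \quad \frac{1}{\sqrt{NT}}(\tilde{X}\tilde{\beta}_0)'P_{N,T}(\tilde{X}\tilde{\beta}_0) = o_P(1), \quad \frac{1}{\sqrt{NT}}\mathrm{tr}(P_{N,T} - I_{NT}) = o_P(1),$$

$$\frac{1}{\sqrt{NT}}(\tilde{X}\tilde{\beta}_0)'P_{N,T}e = o_P(1), \quad \frac{1}{\sqrt{NT}}(\tilde{X}\tilde{\beta}_0)'P_{N,T}G_{N,T}e = o_P(1), \quad \frac{1}{\sqrt{NT}}\mathrm{tr}(P_{N,T}G_{N,T} - G_{N,T}) = o_P(1),$$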
so that

$$\frac{1}{\sqrt{NT}}\frac{\partial\log L_{N,T}(\theta_0)}{\partial\theta} = \frac{1}{\sqrt{NT}}\begin{pmatrix} \frac{1}{\sigma_0^2}\{e'G_{N,T}'P_{N,T}e - \sigma_0^2\,\mathrm{tr}(G_{N,T}P_{N,T})\} + \frac{1}{\sigma_0^2}R_{N,T}'P_{N,T}e \\ \frac{1}{2\sigma_0^4}\{e'P_{N,T}e - \sigma_0^2\,\mathrm{tr}(P_{N,T})\} \end{pmatrix} + o_P(1) = \frac{1}{\sqrt{NT}}\{Q - E(Q|\mathcal{F}_V)\} + o_P(1),$$

where

$$Q = \begin{pmatrix} \frac{1}{\sigma_0^2}e'G_{N,T}'P_{N,T}e + \frac{1}{\sigma_0^2}R_{N,T}'P_{N,T}e \\ \frac{1}{2\sigma_0^4}e'P_{N,T}e \end{pmatrix}.$$

Next, we show the asymptotic normality of $(NT)^{-1/2}\{Q - E(Q|\mathcal{F}_V)\}$. By the Cramér-Wold device, it is sufficient to derive the asymptotic normality of $(NT)^{-1/2}(a_1, a_2)\{Q - E(Q|\mathcal{F}_V)\}$, where $(a_1, a_2)'$ is any given two-dimensional constant vector. Note that

$$(a_1, a_2)Q = e'\left(\frac{a_1}{2\sigma_0^2}G_{N,T}'P_{N,T} + \frac{a_1}{2\sigma_0^2}P_{N,T}G_{N,T} + \frac{a_2}{2\sigma_0^4}P_{N,T}\right)e + \frac{a_1}{\sigma_0^2}R_{N,T}'P_{N,T}e =: e'A_{N,T}e + b_{N,T}'e,$$

which is a linear quadratic form of $e = (e_{11}, \ldots, e_{N1}, \ldots, e_{1T}, \ldots, e_{NT})'$. Here $A_{N,T} = \frac{a_1}{2\sigma_0^2}G_{N,T}'P_{N,T} + \frac{a_1}{2\sigma_0^2}P_{N,T}G_{N,T} + \frac{a_2}{2\sigma_0^4}P_{N,T}$ and $b_{N,T}' = \frac{a_1}{\sigma_0^2}R_{N,T}'P_{N,T}$ are complicated functions of $\{X_t, t = 1, \ldots, T\}$ based on the definitions of $P_{N,T}$ and $R_{N,T}$ given before Assumption 8. Consequently, $A_{N,T}$ and $b_{N,T}$ are both random and hence different from the fixed setting in the central limit theorem for linear quadratic forms in Kelejian and Prucha (2001, Theorem 1).

In order to adapt the proof of Theorem 1 in Kelejian and Prucha (2001) in our case, the conditions for $e$ in Assumption 2 can help us establish a martingale difference array. Accordingly, the central limit theorem for martingale difference arrays used in the proof of Theorem 1 in Kelejian and Prucha (2001) can be applied to derive the asymptotic normality of the linear quadratic form $e'A_{N,T}e + b_{N,T}'e$.

To construct the martingale difference array, let $j = N(t - 1) + i$ for $1 \le i \le N$ and $1 \le t \le T$ and $e_{it} = e_{(j)}$ for $j = 1, \ldots, NT$. Consequently, we get $e = (e_{11}, \ldots, e_{N1}, \ldots, e_{1T}, \ldots, e_{NT})' = (e_{(1)}, \ldots, e_{(N)}, \ldots, e_{(N(T-1)+1)}, \ldots, e_{(NT)})'$. Let

$$\mathcal{F}_{j-1} = \mathcal{F}_V \vee \sigma(e_{(1)}, \ldots, e_{(j-1)})$$

for $j \ge 1$, where $\mathcal{F}_{j-1} := \mathcal{F}_V$ if $j = 1$, and $\mathcal{F}_V \vee \sigma(e_{(1)}, \ldots, e_{(j-1)})$ is the $\sigma$-field generated by $\mathcal{F}_V \cup \sigma(e_{(1)}, \ldots, e_{(j-1)})$. Due to the conditional independence of $e_{it}$ and $\{e_{1t}, \ldots, e_{i-1,t}\}$ given $\mathcal{E}_{t-1}$, we obtain $E(e_{(j)}|\mathcal{F}_{j-1}) = E(e_{it}|\mathcal{E}_{t-1}, e_{1t}, \ldots, e_{i-1,t}) = E(e_{it}|\mathcal{E}_{t-1}) = 0$. Thus, $\{(e_{(j)}, \mathcal{F}_j) : 1 \le j \le NT\}$ forms a martingale difference array.

To make use of the above constructed martingale difference array to derive the asymptotic normality of the linear quadratic form $e'A_{N,T}e + b_{N,T}'e$, we let $(A)_{j_1 j_2}$ be the $(j_1, j_2)$-th element of matrix $A$ and $(b)_j$ be the $j$-th element of vector $b$. Note that both $A_{N,T}, b_{N,T} \in \mathcal{F}_V$ because they are functions of $\{X_t, t = 1, \ldots, T\}$ and also $A_{N,T}$ is symmetric. Hence we obtain $e'A_{N,T}e + b_{N,T}'e - E(e'A_{N,T}e + b_{N,T}'e|\mathcal{F}_V) = \sum_{j=1}^{NT}Z_j$, where

$$Z_j = (A_{N,T})_{jj}\{e_{(j)}^2 - \sigma_0^2\} + 2e_{(j)}\sum_{j_1=1}^{j-1}(A_{N,T})_{jj_1}e_{(j_1)} + (b_{N,T})_j e_{(j)}.$$

Since $A_{N,T}, b_{N,T} \in \mathcal{F}_V$, we then have $E(Z_j|\mathcal{F}_{j-1}) = 0$. This implies that $\{(Z_j, \mathcal{F}_j) : 1 \le j \le NT\}$ forms a martingale difference array. At this stage we can apply the central limit theorem for martingale difference arrays similarly to the proof of Theorem 1 in Kelejian and Prucha (2001), and obtain that

$$\frac{1}{\sqrt{NT}}(a_1, a_2)\{Q - E(Q|\mathcal{F}_V)\} = \frac{1}{\sqrt{NT}}\sum_{j=1}^{NT}Z_j \xrightarrow{d} N\left(0, (a_1, a_2)\Sigma_Q(a_1, a_2)'\right),$$

where $\Sigma_Q = \lim_{N,T\to\infty}(NT)^{-1}E\,\mathrm{Cov}(Q|\mathcal{F}_V) = \lim_{N,T\to\infty}(\Sigma_{NT,\theta_0} + \Omega_{NT,\theta_0})$,
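$$\Sigma_{NT,\theta_0} = \begin{pmatrix} \frac{E(R_{N,T}'P_{N,T}^2R_{N,T})}{NT\sigma_0^2} + \frac{E(\mathrm{tr}(P_{N,T}G_{N,T}P_{N,T}G_{N,T} + P_{N,T}^2G_{N,T}'G_{N,T}))}{NT} & \frac{E(\mathrm{tr}(G_{N,T}P_{N,T}^2))}{NT\sigma_0^2} \\ \frac{E(\mathrm{tr}(G_{N,T}'P_{N,T}^2))}{NT\sigma_0^2} & \frac{E(\mathrm{tr}(P_{N,T}^2))}{2\sigma_0^4NT} \end{pmatrix}$$

$$= \begin{pmatrix} \frac{E(R_{N,T}'P_{N,T}R_{N,T})}{NT\sigma_0^2} + \frac{\mathrm{tr}(G_{N,T}^2) + \mathrm{tr}(G_{N,T}G_{N,T}')}{NT} & \frac{\mathrm{tr}(G_{N,T})}{NT\sigma_0^2} \\ \frac{\mathrm{tr}(G_{N,T})}{NT\sigma_0^2} & \frac{1}{2\sigma_0^4} \end{pmatrix} + o(1),$$

and

$$\Omega_{NT,\theta_0} = \begin{pmatrix} \frac{2m_3E(R_{N,T}'P_{N,T}\,\mathrm{diag}(P_{N,T}G_{N,T}))}{NT\sigma_0^4} + \frac{(m_4 - 3\sigma_0^4)E(\sum_{i=1}^{NT}(gp)_{ii}^2)}{NT\sigma_0^4} & \frac{m_3E(R_{N,T}'P_{N,T}\,\mathrm{diag}(P_{N,T}))}{2NT\sigma_0^6} + \frac{(m_4 - 3\sigma_0^4)E(\sum_{i=1}^{NT}(gp)_{ii}p_{ii})}{2NT\sigma_0^6} \\ \frac{m_3E(R_{N,T}'P_{N,T}\,\mathrm{diag}(P_{N,T}))}{2NT\sigma_0^6} + \frac{(m_4 - 3\sigma_0^4)E(\sum_{i=1}^{NT}(gp)_{ii}p_{ii})}{2NT\sigma_0^6} & \frac{(m_4 - 3\sigma_0^4)E(\sum_{i=1}^{NT}p_{ii}^2)}{4\sigma_0^8NT} \end{pmatrix},$$

where $p_{ii}$ and $(gp)_{ii}$ are the $i$-th main diagonal elements of $P_{N,T}$ and $G_{N,T}'P_{N,T}$, respectively. As

$$\frac{E(R_{N,T}'P_{N,T}R_{N,T})}{NT} \to \Sigma_{R,R}, \quad \lim_{N,T\to\infty}\frac{\mathrm{tr}(G_{N,T}^2) + \mathrm{tr}(G_{N,T}'G_{N,T})}{NT} = c_1, \quad \lim_{N,T\to\infty}\frac{\mathrm{tr}(G_{N,T})}{NT} = c_2,$$

we have

$$\lim_{N,T\to\infty}\Sigma_{NT,\theta_0} = \Sigma_{\theta_0} = \begin{pmatrix} \frac{1}{\sigma_0^2}\Sigma_{R,R} + c_1 & \frac{c_2}{\sigma_0^2} \\ \frac{c_2}{\sigma_0^2} & \frac{1}{2\sigma_0^4} \end{pmatrix}.$$

Therefore, we obtain

$$\frac{1}{\sqrt{NT}}\frac{\partial\log L_{N,T}(\theta_0)}{\partial\theta} \xrightarrow{d} N(0, \Sigma_{\theta_0} + \Omega_{\theta_0}),$$

where $\Omega_{\theta_0} = \lim_{N,T\to\infty}\Omega_{NT,\theta_0}$.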
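The martingale decomposition used in the proof of (A7) rests on an algebraic identity that can be checked numerically. The sketch below is only an illustration under stated assumptions: the matrix `A`, the vector `b`, and the errors `e` are arbitrary random draws (not the paper's $A_{N,T}$, $b_{N,T}$), the errors are taken to have conditional variance $\sigma_0^2$ so that $E(e'Ae|\mathcal{F}_V) = \sigma_0^2\mathrm{tr}(A)$, and the function names are hypothetical.

```python
import numpy as np

def stacked_index(i, t, n_units):
    """Map unit i and period t (both 1-based) to j = N(t-1) + i."""
    return n_units * (t - 1) + i

def martingale_terms(A, b, e, sigma2):
    """Return Z_1, ..., Z_n for a symmetric matrix A.

    Z_j = A_jj (e_j^2 - sigma2) + 2 e_j sum_{k<j} A_jk e_k + b_j e_j,
    so each Z_j only involves errors with index at most j.
    """
    n = e.shape[0]
    Z = np.empty(n)
    for j in range(n):
        cross = np.dot(A[j, :j], e[:j])  # empty sum (0.0) when j == 0
        Z[j] = A[j, j] * (e[j] ** 2 - sigma2) + 2.0 * e[j] * cross + b[j] * e[j]
    return Z

rng = np.random.default_rng(0)
N, T, sigma2 = 4, 3, 1.5
n = N * T
M = rng.normal(size=(n, n))
A = (M + M.T) / 2.0                      # symmetric, as required in the text
b = rng.normal(size=n)
e = rng.normal(scale=np.sqrt(sigma2), size=n)

Z = martingale_terms(A, b, e, sigma2)
# Centered linear quadratic form: e'Ae + b'e - sigma2 * tr(A)
centered = e @ A @ e + b @ e - sigma2 * np.trace(A)
```

The sum of the entries of `Z` agrees with `centered` up to floating-point error, which is the identity $e'Ae + b'e - E(e'Ae + b'e|\mathcal{F}_V) = \sum_j Z_j$ when $A$ is symmetric. The symmetry matters: it is what allows the off-diagonal part of $e'Ae$ to be written as twice the sum over indices below $j$.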
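(2) Proof of Equation (A8). The second derivative can be obtained as follows:

$$\frac{\partial^2\log L_{N,T}(\theta)}{\partial\rho^2} = -T\,\mathrm{tr}\{G_N^2(\rho)\} - \frac{1}{\sigma^2}\{(I_T \otimes W)Y\}'P_{N,T}\{(I_T \otimes W)Y\},$$

$$\frac{\partial^2\log L_{N,T}(\theta)}{\partial\rho\,\partial\sigma^2} = -\frac{\{(I_T \otimes W)Y\}'P_{N,T}S_{N,T}(\rho)Y}{\sigma^4},$$

$$\frac{\partial^2\log L_{N,T}(\theta)}{\partial\sigma^2\,\partial\sigma^2} = \frac{NT}{2\sigma^4} - \frac{\{S_{N,T}(\rho)Y\}'Q_{N,T}S_{N,T}(\rho)Y}{\sigma^6}. \tag{A10}$$

By some calculations, we have

$$\frac{1}{NT}\frac{\partial^2\log L_{N,T}(\theta_0)}{\partial\rho^2} = -\frac{\mathrm{tr}(G_{N,T}^2)}{NT} - \frac{R_{N,T}'P_{N,T}R_{N,T} + 2R_{N,T}'P_{N,T}G_{N,T}e + e'G_{N,T}'P_{N,T}G_{N,T}e}{NT\sigma_0^2},$$

$$\frac{1}{NT}\frac{\partial^2\log L_{N,T}(\theta_0)}{\partial\rho\,\partial\sigma^2} = -\frac{R_{N,T}'P_{N,T}e + e'G_{N,T}'P_{N,T}e + e'G_{N,T}'P_{N,T}\tilde{X}\tilde{\beta}_0 + R_{N,T}'P_{N,T}\tilde{X}\tilde{\beta}_0}{NT\sigma_0^4},$$

$$\frac{1}{NT}\frac{\partial^2\log L_{N,T}(\theta_0)}{\partial\sigma^2\,\partial\sigma^2} = \frac{1}{2\sigma_0^4} - \frac{e'P_{N,T}e + (\tilde{X}\tilde{\beta}_0)'P_{N,T}\tilde{X}\tilde{\beta}_0 + 2(\tilde{X}\tilde{\beta}_0)'P_{N,T}e}{NT\sigma_0^6}.$$

Thus,

$$-\frac{1}{NT}\frac{\partial^2\log L_{N,T}(\theta_0)}{\partial\theta\,\partial\theta'} = \frac{1}{NT}\begin{pmatrix} \frac{2R_{N,T}'P_{N,T}G_{N,T}e + e'G_{N,T}'P_{N,T}G_{N,T}e}{\sigma_0^2} & \frac{R_{N,T}'P_{N,T}e + e'G_{N,T}'P_{N,T}e}{\sigma_0^4} \\ \frac{R_{N,T}'P_{N,T}e + e'G_{N,T}'P_{N,T}e}{\sigma_0^4} & \frac{e'P_{N,T}e}{\sigma_0^6} \end{pmatrix} + \begin{pmatrix} \frac{\mathrm{tr}(G_{N,T}^2)}{NT} + \frac{R_{N,T}'P_{N,T}R_{N,T}}{NT\sigma_0^2} & 0 \\ 0 & -\frac{1}{2\sigma_0^4} \end{pmatrix} + o_P(1).$$

By Lemma B.2, we have

$$\frac{1}{NT}\begin{pmatrix} \frac{2R_{N,T}'P_{N,T}G_{N,T}e + e'G_{N,T}'P_{N,T}G_{N,T}e}{\sigma_0^2} & \frac{R_{N,T}'P_{N,T}e + e'G_{N,T}'P_{N,T}e}{\sigma_0^4} \\ \frac{R_{N,T}'P_{N,T}e + e'G_{N,T}'P_{N,T}e}{\sigma_0^4} & \frac{e'P_{N,T}e}{\sigma_0^6} \end{pmatrix}$$

$$= E\left\{\frac{1}{NT}\begin{pmatrix} \frac{2R_{N,T}'P_{N,T}G_{N,T}e + e'G_{N,T}'P_{N,T}G_{N,T}e}{\sigma_0^2} & \frac{R_{N,T}'P_{N,T}e + e'G_{N,T}'P_{N,T}e}{\sigma_0^4} \\ \frac{R_{N,T}'P_{N,T}e + e'G_{N,T}'P_{N,T}e}{\sigma_0^4} & \frac{e'P_{N,T}e}{\sigma_0^6} \end{pmatrix}\right\} + o_P(1)$$

$$= \begin{pmatrix} \frac{\mathrm{tr}(G_{N,T}'E(P_{N,T})G_{N,T})}{NT} & \frac{\mathrm{tr}(E(P_{N,T})G_{N,T})}{NT\sigma_0^2} \\ \frac{\mathrm{tr}(E(P_{N,T})G_{N,T})}{NT\sigma_0^2} & \frac{\mathrm{tr}(E(P_{N,T}))}{NT\sigma_0^4} \end{pmatrix} + o_P(1).$$

According to Lemma C.5, we can get

$$-\frac{1}{NT}\frac{\partial^2\log L_{N,T}(\theta_0)}{\partial\theta\,\partial\theta'} = \begin{pmatrix} \frac{\mathrm{tr}(G_{N,T}'E(P_{N,T})G_{N,T}) + \mathrm{tr}(G_{N,T}^2)}{NT} + \frac{E(R_{N,T}'P_{N,T}R_{N,T})}{NT\sigma_0^2} & \frac{\mathrm{tr}(E(P_{N,T})G_{N,T})}{NT\sigma_0^2} \\ \frac{\mathrm{tr}(E(P_{N,T})G_{N,T})}{NT\sigma_0^2} & \frac{\mathrm{tr}(E(P_{N,T}))}{NT\sigma_0^4} - \frac{1}{2\sigma_0^4} \end{pmatrix}$$

$$= \begin{pmatrix} \frac{\mathrm{tr}(G_{N,T}^2) + \mathrm{tr}(G_{N,T}'G_{N,T})}{NT} + \frac{E(R_{N,T}'P_{N,T}R_{N,T})}{NT\sigma_0^2} & \frac{\mathrm{tr}(G_{N,T})}{NT\sigma_0^2} \\ \frac{\mathrm{tr}(G_{N,T})}{NT\sigma_0^2} & \frac{1}{2\sigma_0^4} \end{pmatrix} + o(1) = \Sigma_{\theta_0} + o(1).$$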
Thus, Equation (A8) has been proved.
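Together with (A7), the limit in (A8) determines the covariance $\Sigma_{\theta_0}^{-1} + \Sigma_{\theta_0}^{-1}\Omega_{\theta_0}\Sigma_{\theta_0}^{-1}$ appearing in the proof of Theorem 2. The following minimal sketch assembles that matrix from the limit form of $\Sigma_{\theta_0}$ given above. All numerical inputs are hypothetical placeholders rather than estimates from the article, and the function names are illustrative only. The zero matrix for $\Omega_{\theta_0}$ corresponds to the case $m_3 = 0$ and $m_4 = 3\sigma_0^4$, which would hold for normal errors if, as the notation suggests, $m_3$ and $m_4$ denote the third and fourth moments of the errors.

```python
import numpy as np

def sigma_theta0(sigma_rr, c1, c2, sigma2):
    """Limit matrix Sigma_{theta_0} built from Sigma_{R,R}, c1, c2 and sigma_0^2."""
    return np.array([
        [sigma_rr / sigma2 + c1, c2 / sigma2],
        [c2 / sigma2, 1.0 / (2.0 * sigma2 ** 2)],
    ])

def sandwich_covariance(sigma, omega):
    """Return Sigma^{-1} + Sigma^{-1} Omega Sigma^{-1}."""
    sigma_inv = np.linalg.inv(sigma)
    return sigma_inv + sigma_inv @ omega @ sigma_inv

# Hypothetical values chosen only to make the example run.
Sigma = sigma_theta0(sigma_rr=2.0, c1=0.8, c2=0.3, sigma2=1.0)

# Case 1: Omega = 0, so the covariance collapses to Sigma^{-1}.
V_normal = sandwich_covariance(Sigma, np.zeros((2, 2)))

# Case 2: an arbitrary positive semidefinite Omega inflates the covariance.
Omega = np.array([[0.4, 0.1], [0.1, 0.2]])
V_general = sandwich_covariance(Sigma, Omega)
```

In the first case the result equals $\Sigma_{\theta_0}^{-1}$, the familiar inverse-information form. In the second, the difference `V_general - V_normal` is positive semidefinite, reflecting the extra term contributed by $\Omega_{\theta_0}$.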
(3) Proof of Equation (A9). By Equation (A10), we note that $1/\sigma^2$ appears either in linear, quadratic or cubic form in all the elements of the second derivative of $\log L_{N,T}(\theta)$ and $\rho$ only appears in linear form in $\partial^2\log L_{N,T}(\theta)/\partial\rho\,\partial\sigma^2$ and $\partial^2\log L_{N,T}(\theta)/\partial\sigma^2\,\partial\sigma^2$. According to the convergence of $\tilde{\theta}$ to $\theta_0$ in probability, it is easy to show Equation (A9) holds for all elements but the second derivative of $\log L_{N,T}(\theta)$ with respect to $\rho$. Since $G_N(\rho) = WS_N^{-1}(\rho)$, we have $\mathrm{tr}\{G_N^2(\tilde{\rho})\} = \mathrm{tr}(G_N^2) + 2\,\mathrm{tr}\{G_N^3(\tilde{\rho}^*)\}(\tilde{\rho}^* - \rho_0)$ for some $\tilde{\rho}^*$ between $\tilde{\rho}$ and $\rho_0$ by Assumption 5 and the mean value theorem. Consequently, we have

$$\frac{1}{NT}\frac{\partial^2\log L_{N,T}(\tilde{\theta})}{\partial\rho^2} - \frac{1}{NT}\frac{\partial^2\log L_{N,T}(\theta_0)}{\partial\rho^2} = -2\frac{\mathrm{tr}\{G_N^3(\tilde{\rho}^*)\}}{N}(\tilde{\rho}^* - \rho_0) + \left(\frac{1}{\sigma_0^2} - \frac{1}{\tilde{\sigma}^2}\right)\frac{\{(I_T \otimes W)Y\}'P_{N,T}\{(I_T \otimes W)Y\}}{NT}. \tag{A11}$$

Since $G_N(\rho)$ is UB, $N^{-1}\mathrm{tr}\{G_N^3(\tilde{\rho}^*)\}$ is bounded, implying the first term of (A11) is $o_P(1)$. The second term is also $o_P(1)$ since $(NT)^{-1}\{(I_T \otimes W)Y\}'P_{N,T}\{(I_T \otimes W)Y\} = O_P(1)$, which can be shown by Lemma B.2. Consequently, Equation (A9) holds. Hence, we have finished the whole proof.

Proof of Theorem 3. As we know $S_{N,T}(\hat{\rho}) = S_{N,T} + (\rho_0 - \hat{\rho})I_T \otimes W$ as well as the definition of $\hat{\beta}(\tau)$ in Equation (9), we have

$$\hat{\beta}(\tau) - \beta_0(\tau) = \Phi(\tau)\{Y^*(\hat{\rho}) - D\hat{\alpha}\} - \beta_0(\tau)$$

$$= \Phi(\tau)\{Y^*(\hat{\rho}) - D(\tilde{D}'\tilde{D})^{-1}\tilde{D}'\tilde{Y}(\hat{\rho})\} - \beta_0(\tau)$$

$$= \Phi(\tau)\{I_{NT} - D(\tilde{D}'\tilde{D})^{-1}\tilde{D}'(I_{NT} - S)\}\tilde{X}\tilde{\beta}_0 - \beta_0(\tau) + \Phi(\tau)\{I_{NT} - D(\tilde{D}'\tilde{D})^{-1}\tilde{D}'(I_{NT} - S)\}D\alpha$$

$$+ \Phi(\tau)\{I_{NT} - D(\tilde{D}'\tilde{D})^{-1}\tilde{D}'(I_{NT} - S)\}e + (\rho_0 - \hat{\rho})\Phi(\tau)\{I_{NT} - D(\tilde{D}'\tilde{D})^{-1}\tilde{D}'(I_{NT} - S)\}G_{N,T}(\tilde{X}\tilde{\beta}_0 + D\alpha + e)$$

$$:= \Xi_{N,T}(1) + \Xi_{N,T}(2) + \Xi_{N,T}(3) + \Xi_{N,T}(4). \tag{A12}$$

Due to $\tilde{D} = (I_{NT} - S)D$, we get

$$\Xi_{N,T}(2) = \Phi(\tau)\{D - D(\tilde{D}'\tilde{D})^{-1}\tilde{D}'\tilde{D}\}\alpha = 0_d. \tag{A13}$$

We may rewrite $\Xi_{N,T}(1)$ as $\Xi_{N,T}(1) = \Phi(\tau)\tilde{X}\tilde{\beta}_0 - \beta_0(\tau) - \Phi(\tau)D(\tilde{D}'\tilde{D})^{-1}\tilde{D}'(I_{NT} - S)\tilde{X}\tilde{\beta}_0 := \Xi_{N,T}(1,1) - \Xi_{N,T}(1,2)$. Then, according to Lemmas B.1 and B.4, we know

$$\Xi_{N,T}(1) = \frac{1}{2}\mu_2\beta_0''(\tau)h^2 + o_P(h^2). \tag{A14}$$

Also, Lemma B.5 shows that

$$\sqrt{NTh}\,\Xi_{N,T}(3) \xrightarrow{d} N(0_d, \nu_0\sigma_0^2\Sigma_X^{-1}(\tau)). \tag{A15}$$

From Theorem 2, we have

$$\sqrt{NTh}\,\Xi_{N,T}(4) = \sqrt{NTh}\,O_P(\rho_0 - \hat{\rho}) = o_P(1). \tag{A16}$$

By Equations (A12)–(A16), we have accomplished the proof.

Appendix C: Optimal Bandwidth Selection

Due to the existence of individual fixed effects, the traditional cross-validation method may not provide satisfactory results in panel data when selecting the optimal bandwidth. Hence, throughout the article, we adopt the leave-one-unit-out cross-validation method to choose the optimal bandwidth. Such a method is also used in Li, Chen, and Gao (2011) and Chen, Gao, and Li (2012). The initial value $\tilde{\rho}$ is obtained from the parametric spatial panel data model of Lee and Yu (2010).

The idea is first to use the $(N - 1)T$ observations that remain after excluding the $i$th unit $\{(X_{it}, Y_{it}), 1 \le t \le T\}$ as the training data to obtain the estimate of $\beta(\tau)$, which is denoted as $\hat{\beta}^{(-i)}_{\tilde{\rho}}(\tau)$ for each $1 \le i \le N$. The optimal bandwidth is the one that minimizes a weighted squared prediction error of the form

$$h_{opt} = \arg\min_h \left\{Z(\tilde{\rho}) - B(X, \hat{\beta}^{(-)}_{\tilde{\rho}})\right\}'M^{*\prime}M^*\left\{Z(\tilde{\rho}) - B(X, \hat{\beta}^{(-)}_{\tilde{\rho}})\right\}, \tag{A17}$$

where $M^* = I_{NT} - T^{-1}(i_Ti_T') \otimes I_N$ is used to delete the unobserved fixed effect due to $M^*D = 0_{NT\times(N-1)}$, and

$$B(X, \hat{\beta}^{(-)}_{\tilde{\rho}}) = \left(X_{11}'\hat{\beta}^{(-1)}_{\tilde{\rho}}(\tau_1), \ldots, X_{N1}'\hat{\beta}^{(-N)}_{\tilde{\rho}}(\tau_1), \ldots, X_{1T}'\hat{\beta}^{(-1)}_{\tilde{\rho}}(\tau_T), \ldots, X_{NT}'\hat{\beta}^{(-N)}_{\tilde{\rho}}(\tau_T)\right)'.$$

Supplementary Materials

The online supplementary material includes useful lemmas and additional simulation results. Specifically, Appendix B provides the main lemmas directly related to the proofs of the main theorems in the article. Appendix C contains additional lemmas which are used to show Appendix B. Last, additional numerical results are reported in the tables of Appendix D.

References

Anselin, L., Florax, R., and Rey, S. J. (2004), Advances in Spatial Econometrics: Methodology, Tools and Applications, Heidelberg: Springer-Verlag. [1]
Arellano, M. (2003), Panel Data Econometrics, Oxford: Oxford University Press. [1]
Baltagi, B. (2008), Econometric Analysis of Panel Data, Chichester: Wiley. [1]
Baltagi, B. H., Blien, U., and Wolf, K. (2012), “A Dynamic Spatial Panel Data Approach to the German Wage Curve,” Economic Modelling, 29, 12–21. [8]
Baltagi, B. H., Song, S. H., and Koh, W. (2003), “Testing Panel Data Regression Models With Spatial Error Correlation,” Journal of Econometrics, 117, 123–150. [1]
Benjamini, Y., and Hochberg, Y. (1995), “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society, Series B, 57, 289–300. [12]
Box, G. E., and Pierce, D. A. (1970), “Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models,” Journal of the American Statistical Association, 65, 1509–1526. [11]
Braid, R. M. (2002), “The Spatial Effects of Wage Or Property Tax Differentials, and Local Government Choice Between Tax Instruments,” Journal of Urban Economics, 51, 429–445. [8]
Cai, Z. (2007), “Trending Time-Varying Coefficient Time Series Models With Serially Correlated Errors,” Journal of Econometrics, 136, 163–188. [1]
Chen, J., Gao, J., and Li, D. (2012), “Semiparametric Trending Panel Data Models With Cross-Sectional Dependence,” Journal of Econometrics, 171, 71–85. [2,4,5,18]
Chen, J., Li, D., and Linton, O. (2019), “A New Semiparametric Estimation Approach for Large Dynamic Covariance Matrices With Multiple Conditioning Variables,” Journal of Econometrics, 212, 155–176. [4]
Cliff, A. D., and Ord, J. K. (1973), Spatial Autocorrelation, Monographs in Spatial Environmental Systems Analysis, London: Pion Limited. [1]
Combes, P.-P., Démurger, S., and Li, S. (2017), “Productivity Gains From Agglomeration and Migration in the People’s Republic of China Between 2002 and 2013,” Asian Development Review, 34, 184–200. [8]
Dou, B., Parrella, M. L., and Yao, Q. (2016), “Generalized Yule–Walker Estimation for Spatio-Temporal Models With Unknown Diagonal Coefficients,” Journal of Econometrics, 194, 369–382. [4]
Fan, J., and Gijbels, I. (1996), Local Polynomial Modelling and its Applications, London: Chapman & Hall/CRC. [2,3]
Fan, J., and Yao, Q. (2003), Nonlinear Time Series: Nonparametric and Parametric Methods, New York: Springer-Verlag. [4]
Fingleton, B. (2008), “A Generalized Method of Moments Estimator for a Spatial Panel Model With an Endogenous Spatial Lag and Spatial Moving Average Errors,” Spatial Economic Analysis, 3, 27–44. [1]
Gao, J. (2007), Nonlinear Time Series: Semiparametric and Nonparametric Methods, London: Chapman & Hall/CRC. [4,5]
Hsiao, C. (2014), Analysis of Panel Data, Cambridge: Cambridge University Press. [1]
Im, K. S., Pesaran, M. H., and Shin, Y. (2003), “Testing for Unit Roots in Heterogeneous Panels,” Journal of Econometrics, 115, 53–74. [10]
Kapoor, M., Kelejian, H. H., and Prucha, I. R. (2007), “Panel Data Models With Spatially Correlated Error Components,” Journal of Econometrics, 140, 97–130. [1]
Kelejian, H. H., and Prucha, I. R. (1998), “A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model With Autoregressive Disturbances,” Journal of Real Estate Finance and Economics, 17, 99–121. [1,5]
Kelejian, H. H., and Prucha, I. R. (1999), “A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model,” International Economic Review, 40, 509–533. [1,6]
Kelejian, H. H., and Prucha, I. R. (2001), “On the Asymptotic Distribution of the Moran I Test Statistic With Applications,” Journal of Econometrics, 104, 219–257. [5,16]
Lee, L.-F. (2004), “Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models,” Econometrica, 72, 1899–1925. [1,3,5,14]
Lee, L.-F., and Yu, J. (2010), “Estimation of Spatial Autoregressive Panel Data Models With Fixed Effects,” Journal of Econometrics, 154, 165–185. [1,2,3,5,6,18]
Lee, L.-F., and Yu, J. (2014), “Efficient GMM Estimation of Spatial Dynamic Panel Data Models With Fixed Effects,” Journal of Econometrics, 180, 174–197. [1]
LeSage, J., and Pace, R. K. (2009), Introduction to Spatial Econometrics, London: Chapman and Hall/CRC. [1]
Li, D., Chen, J., and Gao, J. (2011), “Non-Parametric Time-Varying Coefficient Panel Data Models With Fixed Effects,” Econometrics Journal, 14, 387–408. [2,18]
Li, K. (2017), “Fixed-Effects Dynamic Spatial Panel Data Models and Impulse Response Analysis,” Journal of Econometrics, 198, 102–121. [1]
Li, Q., and Racine, J. S. (2007), Nonparametric Econometrics: Theory and Practice, Princeton, NJ: Princeton University Press. [5]
Malikov, E., and Sun, Y. (2017), “Semiparametric Estimation and Testing of Smooth Coefficient Spatial Autoregressive Models,” Journal of Econometrics, 199, 12–34. [1]
Miller Jr, R. G. (1966), Simultaneous Statistical Inference, New York: Springer. [12]
Robinson, P. M. (2012), “Nonparametric Trending Regression With Cross-Sectional Dependence,” Journal of Econometrics, 169, 4–14. [2]
Silvapulle, P., Smyth, R., Zhang, X., and Fenech, J.-P. (2017), “Nonparametric Panel Data Model for Crude Oil and Stock Market Prices in Net Oil Importing Countries,” Energy Economics, 67, 255–267. [1]
Su, L. (2012), “Semiparametric GMM Estimation of Spatial Autoregressive Models,” Journal of Econometrics, 167, 543–560. [1]
Su, L., and Jin, S. (2010), “Profile Quasi-Maximum Likelihood Estimation of Partially Linear Spatial Autoregressive Models,” Journal of Econometrics, 157, 18–33. [1,2,3,5]
Su, L., and Ullah, A. (2006), “Profile Likelihood Estimation of Partially Linear Panel Data Models With Fixed Effects,” Economics Letters, 92, 75–81. [2,3]
Sun, Y. (2016), “Functional-Coefficient Spatial Autoregressive Models With Nonparametric Spatial Weights,” Journal of Econometrics, 195, 134–153. [1]
Sun, Y., and Malikov, E. (2018), “Estimation and Inference in Functional-Coefficient Spatial Autoregressive Panel Data Models With Fixed Effects,” Journal of Econometrics, 203, 359–378. [1,2,4]
Van Biesebroeck, J. (2015), How Tight is the Link Between Wages and Productivity?: A Survey of the Literature, ILO. [8]
White, H. (1996), Estimation, Inference and Specification Analysis, Cambridge: Cambridge University Press. [14]
Yu, J., De Jong, R., and Lee, L.-F. (2008), “Quasi-Maximum Likelihood Estimators for Spatial Dynamic Panel Data With Fixed Effects When Both n and t Are Large,” Journal of Econometrics, 146, 118–134. [1,5]
Zhang, Y., and Shen, D. (2015), “Estimation of Semi-Parametric Varying-Coefficient Spatial Panel Data Models With Random-Effects,” Journal of Statistical Planning and Inference, 159, 64–80. [1]
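As a numerical note on the bandwidth selection in Appendix C, the role of the within transformation $M^*$ in criterion (A17) can be illustrated with a short sketch. It is not the authors' implementation: the data are random placeholders, the observations are assumed to be stacked period by period as in the proofs, and `loo_fit` stands for a hypothetical user-supplied routine that returns the leave-one-unit-out fitted values for a given bandwidth.

```python
import numpy as np

def within_matrix(n_units, n_periods):
    """M* = I_NT - T^{-1} (i_T i_T') kron I_N for period-by-period stacking."""
    averaging = np.ones((n_periods, n_periods)) / n_periods
    return np.eye(n_units * n_periods) - np.kron(averaging, np.eye(n_units))

def cv_criterion(z, fitted, n_units, n_periods):
    """Squared prediction error after removing each unit's time average."""
    resid = within_matrix(n_units, n_periods) @ (z - fitted)
    return float(resid @ resid)

def select_bandwidth(grid, z, loo_fit, n_units, n_periods):
    """Return the bandwidth in `grid` with the smallest criterion value.

    `loo_fit(h)` is a hypothetical callable returning the stacked
    leave-one-unit-out predictions for bandwidth h.
    """
    scores = [cv_criterion(z, loo_fit(h), n_units, n_periods) for h in grid]
    return grid[int(np.argmin(scores))]

rng = np.random.default_rng(1)
N, T = 5, 4
alpha = rng.normal(size=N)                # unit-specific fixed effects
fixed_part = np.kron(np.ones(T), alpha)   # alpha repeated in every period
z = rng.normal(size=N * T)
fitted = rng.normal(size=N * T)

base = cv_criterion(z, fitted, N, T)
shifted = cv_criterion(z + fixed_part, fitted, N, T)
```

Here `base` and `shifted` coincide up to rounding, because $M^*$ annihilates any component that is constant over time for each unit. This is the sense in which the criterion is unaffected by the unobserved fixed effects.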
