
A Spatial Autoregressive Stochastic Frontier Model for Panel Data

Incorporating a Model of Technical Inefficiency


Takahiro Tsukamoto

Graduate School of Economics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan
E-mail: tsukamoto.takahiro@j.mbox.nagoya-u.ac.jp

January 27, 2018, Revised: July 30, 2018, Revised: November 14, 2018

Abstract
By integrating Battese and Coelli’s (1995) model and the spatial autoregressive model (SAR), a spatial
autoregressive stochastic frontier model for panel data is developed. The main feature of this frontier model is a
spatial lag term of explained variables and the joint structure of a production possibility frontier with a model of
technical inefficiency. The model addresses both spatial dependence and heteroskedastic technical inefficiency.
This study applies maximum likelihood methods considering the endogenous spatial lag term. The proposed
model nests several existing models. Further, an empirical analysis using data on the Japanese manufacturing
industry is conducted and the existing models are tested against the proposed model, which is found to be
statistically supported. The findings suggest that estimates in the existing spatial and non-spatial models may
exhibit bias because of lack of determinants of technical inefficiency, as well as a spatial lag. This bias also affects
the technical efficiency score and its ranking.

Keywords Stochastic frontier analysis (SFA), Determinants of technical inefficiency, Spatial autoregressive
dependence, Japanese manufacturing industry

JEL classification C23, C51, D24, E23

Electronic copy available at: https://ssrn.com/abstract=3111193


1. Introduction
In this study, we introduce spatial econometric techniques to stochastic frontier analysis (SFA) for simultaneously
estimating the production possibility frontier (PPF) with determinants of technical inefficiency. Specifically, by
integrating Battese and Coelli’s (1995) model with the spatial autoregressive model, we develop a spatial
autoregressive stochastic frontier model for panel data.
Production functions at the establishment, plant, firm, or regional level are often estimated using
stochastic frontier models. These models were first proposed by Aigner et al. (1977) and Meeusen and van den
Broeck (1977) almost at the same time. One feature of these models is that they have a composed error structure
consisting of two variables: one random variable that captures noise and another one that explains technical
inefficiency. If a production function is estimated using the ordinary least squares (OLS), the estimated production
function is the average production function. On the other hand, if we use a stochastic frontier model, the estimated
production function can be interpreted as the PPF. The stochastic frontier model is useful for empirical studies
based on economic models, since the production function usually assumed in microeconomics is the PPF.
However, since production activities may depend on neighboring production activities
through various externalities or Input-Output networks, such as a supply-chain network, observations on
production entities are likely to be correlated according to their geographical distances. Thus, it is doubtful that the
assumption of spatial independence, which stochastic frontier models generally make, is appropriate.
Knowledge spillover is a typical example of spatial externalities and promotes the imitation and innovation of
production technology. Spatial proximity between entities leads to the promotion of knowledge spillovers.
Therefore, if a knowledge spillover is present, production activities are spatially dependent. In addition, because
production activities affect the labor market and the intermediate goods market in a certain area and its surrounding
areas and vice versa, production activities are considered spatially dependent. Furthermore, Input-Output
networks induce spatial dependency. For example, Toyota Group’s offices and plants are located mainly in Aichi
Prefecture and its surrounding prefectures and have established a large industrial linkage network. Within the
group, both human resources and technology are shared actively. These regions are thus obviously mutually
dependent.
Econometric models to address spatial dependence have been developed in the field of spatial
econometrics. The basic spatial econometric models are the spatial autoregressive model (SAR), which includes
a spatial autocorrelation structure (spatial lag) of explained variables; spatial error model (SEM), which includes
a spatial autocorrelation structure in error terms; and spatial lag of X model (SLX), which includes the
autocorrelation structure (spatial lag) of explanatory variables, and their integrated models (Elhorst, 2014). SLX
is a model that captures the local spillovers of explanatory variables. As there are no notable estimation problems,
SLX can be estimated by OLS. Meanwhile, SEM and SAR capture global spillovers and cannot be estimated by
OLS. Glass et al. (2016) emphasize that, in SAR, spillovers are adopted explicitly and are related to the
independent variables. In LeSage and Pace (2009), spatial spillover is defined as non-zero cross-partial derivatives
𝜕𝑦𝑗 ⁄𝜕𝑥𝑖 ≠ 0, 𝑗 ≠ 𝑖, which means changes in explanatory variables in region 𝑖 impact the explained variables in
region 𝑗. In this definition, SAR allows spatial spillover, but SEM does not. The inability of SEM to address the
spatial spillover reduces its appeal (Pace and Zhu, 2012). Fingleton and López-Bazo (2006) criticized SEM, since
SEM absorbs externalities into random shocks. Many empirical studies are interested in testing whether spatial
spillovers exist (Elhorst, 2010). If an analyst is interested in spatial spillover, SAR is preferred. However, a
drawback of SAR is that omitted variables with high degree of spatial dependence could lead to an overestimation
of the magnitude of externalities (Pace and Zhu, 2012).1
Since the 2010s, studies on stochastic frontier models considering spatial dependence (spatial stochastic
frontier models) have developed rapidly. Druska and Horrace (2004) conducted the first study on spatial stochastic
frontier models. They estimated a spatial error production frontier panel data model using the generalized method
of moments by integrating Kelejian and Prucha’s (1999) stochastic frontier model and SEM. They calculated the
time-invariant technical inefficiency and concluded that the consideration of spatial correlation affects the
technical efficiency scores and their ranking. In addition, Fusco and Vidoli (2013) and Vidoli et al. (2016) estimated
spatial stochastic frontier models introducing the structure of SEM using the maximum likelihood (ML) methods.
Adetutu et al. (2015) estimated the stochastic frontier model that incorporates the structure of SLX. Estimation
procedures for typical non-spatial stochastic frontier models can be applied directly to the stochastic frontier
model with the SLX structure. Glass et al. (2016) proposed a spatial autoregressive stochastic frontier model for
panel data by integrating SAR and the half-normal stochastic frontier model proposed by Aigner et al. (1977); they
estimate it using the ML methods.2 They also introduced the concept of efficiency spillover; their term for
technical inefficiency is homoskedastic. Ramajo and Hewings (2018) developed a spatial autoregressive stochastic
frontier model for panel data with a time-varying decay efficiency specification by extending Battese
and Coelli’s (1992) model (which is different from their 1995 model). This model permits inefficiency to increase
or decrease exponentially depending only on a scalar measuring the yearly rate of technological catch-up. As with
non-frontier SAR, omitting spatially dependent variables in these existing stochastic frontier models with a SAR
structure may lead to overestimating the magnitude of externalities.

1 For more information on spatial econometrics, see Anselin (1988) and LeSage and Pace (2009).

The literature on non-spatial stochastic frontier models has indicated that estimation taking into account
the determinants of technical inefficiency is important (Kumbhakar and Lovell, 2003). Since it is possible to
estimate the score of technical efficiency for each unit in stochastic frontier models, it is natural to try to find the
determinants of the score. Early studies, such as Kalirajan (1981), adopted a “two-stage approach” to identify the
determinants of technical inefficiency: technical inefficiency was first estimated using a stochastic frontier
model, and the estimates were then regressed on candidate determinants. However, the second stage contradicts the
first-stage assumption that the random variables representing technical inefficiency are independently and
identically distributed. Subsequent studies (Kumbhakar et al., 1991;
Reifschneider and Stevenson, 1991; Caudill and Ford, 1993; Huang and Liu, 1994; Caudill et al., 1995; Battese
and Coelli, 1995) developed a (non-spatial) “single-stage approach” that simultaneously estimates the stochastic
frontier and determinants of technical inefficiency. As Kumbhakar and Lovell (2003) mentioned, if there are
determinants of technical inefficiency that correlate with explanatory variables (input quantity of production
function), the parameter estimates in usual stochastic frontier models will have a bias. This also affects the
technical efficiency score. Therefore, if technical inefficiency is not completely randomly determined or if its
determinants exist, the single-stage approach is preferable, regardless of whether analysts are interested in the
determinants of technical inefficiency. Battese and Coelli's (1995) model is the single-stage model adopted
by the largest number of empirical studies, such as Fries and Taci (2005), Srairi (2010), Saeed and Izzeldin (2016),
and many others.
Therefore, a stochastic frontier model with a SAR structure introducing a model of technical
inefficiency is expected to be able to correctly estimate parameters, including those related to spatial dependence.
Pavlyuk (2011) mentioned a spatial stochastic frontier model that simultaneously estimates the determinants of
technical inefficiency as “one possible spatial modification of the stochastic frontier.” However, Pavlyuk
presented neither the estimation method nor estimates. We are not aware of any studies that estimate single-stage
approach stochastic frontier models with SAR structure.
As such, in this study, we develop a spatial stochastic frontier model with the SAR term and features
of Battese and Coelli’s (1995) model, which simultaneously estimates the determinants of technical inefficiency.
The proposed model can identify the cause of technical inefficiency. It also has the merit of coping with the
omitted-variable bias because of the lack of determinants of technical inefficiency and the spatial lag. Moreover,
the proposed model nests many existing spatial and non-spatial stochastic frontier models. The model selection
can be easily done by a statistical test for nested structure.
The remainder of this paper is structured as follows. Section 2 presents a spatial autoregressive stochastic
frontier model for panel data that incorporates a model of technical inefficiency and its estimation method. Section
3 compares the proposed model with existing models by conducting an empirical analysis using aggregated
production panel data from the Japanese manufacturing industry, while section 4 presents the results. Section 5
offers conclusions.
2. Model
Our proposed spatial autoregressive stochastic frontier production model for panel data that incorporates a model
of technical inefficiency is as follows:

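\begin{align}
y_{it} &= \boldsymbol{x}_{it}'\boldsymbol{\beta} + \rho \sum_{j=1}^{N} w_{ij}^{t} y_{jt} + v_{it} - u_{it}, \quad i = 1, 2, \ldots, N, \; t = 1, 2, \ldots, T_i, \tag{1} \\
v_{it} &\sim \text{i.i.d. } N(0, \sigma_v^2), \tag{2} \\
u_{it} &\sim \text{i.i.d. } N^{+}(\mu_{it}, \sigma_u^2), \tag{3} \\
\mu_{it} &= \boldsymbol{z}_{it}'\boldsymbol{\delta}. \tag{4}
\end{align}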

In the above, $y_{it}$ is a scalar output of producer $i$ in period $t$ and $\boldsymbol{x}_{it}$ is a vector of inputs used by
producer $i$ in period $t$. $\boldsymbol{z}_{it}$ is a vector of determinants that may generate technical inefficiency, including
both producer-specific attributes and environmental factors. The second term on the right-hand side of Equation (1)
is the SAR term that captures spatial dependency. $w_{ij}^{t}$ is the $(i, j)$ element of the spatial weight matrix in
period $t$. The whole spatial weight matrix $\mathbf{W}$ is a block diagonal matrix of $\{\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_T\}$, where
$\mathbf{W}_t = \{w_{ij}^{t}\}$, and $\rho$ is an unknown parameter associated with the SAR term. The range of $\rho$ is
$(1/\omega_{min}, 1/\omega_{max})$, where $\omega_{min}$ and $\omega_{max}$ are the smallest and largest eigenvalues of $\mathbf{W}$, respectively.3
$v_{it}$ is random noise and $u_{it}$ represents technical inefficiency. It is assumed that $v_{it}$ and $u_{it}$ are
independent. In Equation (4), $\boldsymbol{z}_{it}$ is an $(m \times 1)$ vector of exogenous variables that explain technical
inefficiency, and $\boldsymbol{\delta}$ is an $(m \times 1)$ vector of unknown parameters.

2 Affuso’s (2010) pioneering study included the spatial lag of the explained variable among the explanatory
variables of a stochastic frontier model; however, its log-likelihood function does not address the endogeneity of
the spatial lag term.

The spatial weight matrix 𝐖 is a non-negative non-stochastic matrix that describes the strength of the
relationship between cross-sectional units. To exclude a unit's influence on itself, the diagonal elements are set to
0. Various spatial weight matrix specifications have been proposed in the field of spatial econometrics, such as a
binary adjacency matrix, or a matrix defined as a decreasing function of geographical or economic distance
between regions. If 𝐖 is row-normalized, the product 𝐖𝒚 of the spatial weight matrix and the explained variable can
be interpreted as a weighted average of the explained variable over the other units. Additionally, the row-normalized
spatial weight matrix does not depend on the unit of distance. Therefore, many spatial econometric studies use
row-normalized spatial weight matrices (Arbia, 2014).
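
As a minimal sketch of the row-normalization described above, the following Python snippet builds an inverse-distance weight matrix with a zero diagonal and forms the spatial lag 𝐖𝒚. The function names and the three-unit distance matrix are hypothetical and serve only as an illustration; they are not taken from the data used in this study.

```python
from typing import List

Matrix = List[List[float]]


def inverse_distance_weights(dist: Matrix) -> Matrix:
    """Row-normalized inverse-distance weights with a zero diagonal.

    Assumes dist[i][j] > 0 for every pair of distinct units i != j.
    """
    n = len(dist)
    weights: Matrix = []
    for i in range(n):
        # Zero on the diagonal: a unit is not its own neighbor.
        row = [0.0 if i == j else 1.0 / dist[i][j] for j in range(n)]
        row_sum = sum(row)
        weights.append([value / row_sum for value in row])  # each row sums to 1
    return weights


def spatial_lag(weights: Matrix, y: List[float]) -> List[float]:
    """Compute Wy: for each unit, the weighted average of the other units' y."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in weights]


# Hypothetical pairwise distances (km) among three units.
dist_km = [[0.0, 100.0, 200.0],
           [100.0, 0.0, 100.0],
           [200.0, 100.0, 0.0]]
W = inverse_distance_weights(dist_km)

# Measuring the same distances in meters leaves W unchanged after row-normalization.
dist_m = [[1000.0 * d for d in row] for row in dist_km]
W_m = inverse_distance_weights(dist_m)

lagged = spatial_lag(W, [1.0, 2.0, 4.0])
```

Because every row is divided by its own sum, rescaling all distances by a common factor cancels out, which is the unit-invariance property noted in the text.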
In typical linear models, 𝜷 represents marginal effects, which can be interpreted as elasticity when the
variables are logarithmic values. However, as Kelejian et al. (2006), LeSage and Pace (2009), and Glass et al. (2016),
among others, have noted, 𝜷 does not represent the marginal effects in models with a SAR structure. In our model,
the matrix of partial derivatives with respect to the 𝑟th explanatory variable 𝒙𝑟 is as follows:

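\[
\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}_r'} = (\boldsymbol{I}_{NT} - \rho \boldsymbol{W})^{-1} \beta_r = (\boldsymbol{I}_{NT} + \rho \boldsymbol{W} + \rho^2 \boldsymbol{W}^2 + \rho^3 \boldsymbol{W}^3 + \cdots)\beta_r, \tag{5}
\]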

where 𝒚 = {𝑦𝑖𝑡 } is a vector of outputs and 𝛽𝑟 is the 𝑟 th parameter of 𝜷 . The marginal effect varies across
observations. Every diagonal element of the matrix refers to the marginal effect of its own explanatory variable,
which is called a direct effect. Every non-diagonal element of the matrix refers to the marginal effect of the
explanatory variable that is not its own, which is called an indirect effect. LeSage and Pace (2009) proposed using
the average of the diagonal elements of the matrix as summary statistics of the direct effect. However, since the
average is missing a large amount of information, indices representing dispersion of the direct effect (e.g.,
maximum and minimum values) should also be reported.
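
The effect summaries described above can be sketched in Python as follows. This is an illustrative sketch only: it approximates the inverse in Equation (5) by a truncated power series, assumes a time-invariant weight matrix so that a single N × N block is enough, and uses hypothetical weights and parameter values. The helper names are not from the paper, and the average indirect effect is computed as the average total effect minus the average direct effect.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def effects_matrix(w: Matrix, rho: float, beta_r: float, terms: int = 200) -> Matrix:
    """Approximate (I - rho*W)^{-1} * beta_r by the truncated series in Equation (5)."""
    n = len(w)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # rho^0 W^0
    total = [row[:] for row in term]
    for _ in range(terms):
        # Move from rho^k W^k to rho^(k+1) W^(k+1) and accumulate it.
        term = [[rho * value for value in row] for row in matmul(term, w)]
        total = [[t + s for t, s in zip(t_row, s_row)]
                 for t_row, s_row in zip(total, term)]
    return [[beta_r * value for value in row] for row in total]


def summarize_effects(effects: Matrix) -> Tuple[float, float, float, float, float]:
    """Return (mean direct, min direct, max direct, mean indirect, mean total)."""
    n = len(effects)
    direct = [effects[i][i] for i in range(n)]
    mean_direct = sum(direct) / n
    mean_total = sum(sum(row) for row in effects) / n
    return mean_direct, min(direct), max(direct), mean_total - mean_direct, mean_total


# Hypothetical row-normalized weights and parameter values, for illustration only.
W = [[0.0, 2 / 3, 1 / 3],
     [0.5, 0.0, 0.5],
     [1 / 3, 2 / 3, 0.0]]
effects = effects_matrix(W, rho=0.2, beta_r=0.5)
mean_direct, min_direct, max_direct, mean_indirect, mean_total = summarize_effects(effects)
```

Reporting the minimum and maximum of the diagonal alongside its mean follows the suggestion in the text that dispersion of the direct effect should be shown. With a row-normalized weight matrix, every row of the inverse sums to 1/(1 − 𝜌), so the average total effect equals 𝛽𝑟/(1 − 𝜌), which gives a quick sanity check on the computation.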
The proposed model nests many existing spatial and non-spatial stochastic frontier models. If 𝜌 = 0,
our model becomes equivalent to the model suggested by Battese and Coelli (1995). If 𝒛𝑖𝑡 includes only a constant
term, it becomes a spatial stochastic frontier model assuming a homoskedastic truncated normal distribution as
the distribution that represents technical inefficiency. If 𝜹 = 𝟎 , our model becomes equivalent to the SAR
stochastic frontier model assuming a homoskedastic half-normal distribution as a distribution that represents
technical inefficiency, as proposed by Glass et al. (2016). If 𝜌 = 0 and 𝒛𝑖𝑡 has only a constant term, the model
will be a non-spatial stochastic frontier model assuming a truncated normal distribution as the distribution that
represents technical inefficiency, as proposed by Stevenson (1980). If 𝜌 = 0 and 𝜹 = 𝟎 , our model becomes
equivalent to a non-spatial stochastic frontier model assuming a homoskedastic half-normal distribution as the
distribution that represents technical inefficiency, as proposed by Aigner et al. (1977).
The spatial lag term $\sum_{j=1}^{N} w_{ij}^{t} y_{jt}$ is endogenous; thus, estimating the proposed model using the ML
methods for non-spatial stochastic frontier models causes a bias, unless $\rho = 0$. Taking the endogeneity of the
spatial lag term into consideration, we present the ML methods to estimate the spatial autoregressive stochastic
frontier model.
Following Battese and Coelli (1995), we re-parameterize as follows:

\begin{align}
\sigma^2 &:= \sigma_v^2 + \sigma_u^2, \tag{6} \\
\gamma &:= \frac{\sigma_u^2}{\sigma_v^2 + \sigma_u^2}. \tag{7}
\end{align}

3 To estimate a spatial econometric model, (𝑰𝑁𝑇 − 𝜌𝑾) needs to be a non-singular matrix (𝑰𝑁𝑇 is the identity
matrix whose dimension equals the number of observations). When 𝐖 is a symmetric matrix, the range of 𝜌 is
(1⁄𝜔𝑚𝑖𝑛 , 1⁄𝜔𝑚𝑎𝑥 ). Here, 𝜔𝑚𝑖𝑛 and 𝜔𝑚𝑎𝑥 are the smallest and largest eigenvalues of 𝐖, respectively. If 𝐖 is an
asymmetric matrix, the eigenvalues may be complex numbers. LeSage and Pace (2009) show that, if 𝐖 is
row-normalized (each row sums to 1), the range of 𝜌 should be (1⁄𝑟𝑚𝑖𝑛 , 1), where 𝑟𝑚𝑖𝑛 is the most negative purely
real eigenvalue of 𝐖. Regardless of whether 𝐖 is a symmetric or asymmetric matrix, if 𝐖 is row-normalized, the
range of 𝜌 is (1⁄𝑟𝑚𝑖𝑛 , 1).

Then, the log-likelihood function for the model in Equations (1)–(4) is as follows:

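\begin{align}
LL(\boldsymbol{\beta}, \boldsymbol{\delta}, \gamma, \rho, \sigma^2; \boldsymbol{y})
={}& \ln\left|\boldsymbol{I}_{NT} - \rho \boldsymbol{W}\right| - \frac{1}{2}\left(\sum_{i=1}^{N} T_i\right)\left[\ln \sigma^2 + \ln 2\pi\right] \notag \\
& - \frac{1}{2}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\left(\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta} + y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}}{\sigma}\right)^{2} \notag \\
& - \sum_{i=1}^{N}\sum_{t=1}^{T_i}\left[\ln \Phi(d_{it}) - \ln \Phi(d_{it}^{*})\right], \tag{8} \\
\mu_{it}^{*} :={}& \boldsymbol{z}_{it}'\boldsymbol{\delta}(1-\gamma) - \left(y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}\right)\gamma, \tag{9} \\
\sigma^{*} :={}& \sigma\sqrt{(1-\gamma)\gamma}, \tag{10} \\
d_{it}^{*} :={}& \frac{\mu_{it}^{*}}{\sigma^{*}}, \tag{11} \\
d_{it} :={}& \frac{\boldsymbol{z}_{it}'\boldsymbol{\delta}}{\sigma\sqrt{\gamma}}. \tag{12}
\end{align}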

Here, ln|𝑰𝑁𝑇 − 𝜌𝑾| comes from the Jacobian of the variable transformation from 𝜀𝑖𝑡 ≔
𝑣𝑖𝑡 − 𝑢𝑖𝑡 to 𝑦𝑖𝑡 , which accounts for the endogeneity of the spatial lag term. The derivation of the likelihood is
shown in the Appendix. The first-order conditions of the ML estimators are as follows:

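\begin{align}
\frac{\partial LL}{\partial \boldsymbol{\beta}} ={}& \sum_{i=1}^{N}\sum_{t=1}^{T_i}\left[\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta} + y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}}{\sigma^2} + \frac{\phi(d_{it}^{*})}{\Phi(d_{it}^{*})}\frac{\gamma}{\sigma^{*}}\right]\boldsymbol{x}_{it} = \boldsymbol{0}, \tag{13} \\
\frac{\partial LL}{\partial \boldsymbol{\delta}} ={}& \sum_{i=1}^{N}\sum_{t=1}^{T_i}\left[-\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta} + y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}}{\sigma^2} - \frac{\phi(d_{it})}{\Phi(d_{it})}\frac{1}{\sigma\sqrt{\gamma}} + \frac{\phi(d_{it}^{*})}{\Phi(d_{it}^{*})}\frac{(1-\gamma)}{\sigma^{*}}\right]\boldsymbol{z}_{it} = \boldsymbol{0}, \tag{14} \\
\frac{\partial LL}{\partial \sigma^2} ={}& -\frac{1}{2\sigma^2}\Bigg\{\left(\sum_{i=1}^{N} T_i\right) - \sum_{i=1}^{N}\sum_{t=1}^{T_i}\frac{\left(\boldsymbol{z}_{it}'\boldsymbol{\delta} + y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}\right)^{2}}{\sigma^2} \notag \\
& - \sum_{i=1}^{N}\sum_{t=1}^{T_i}\left[\frac{\phi(d_{it})}{\Phi(d_{it})} d_{it} - \frac{\phi(d_{it}^{*})}{\Phi(d_{it}^{*})} d_{it}^{*}\right]\Bigg\} = 0, \tag{15} \\
\frac{\partial LL}{\partial \gamma} ={}& \sum_{i=1}^{N}\sum_{t=1}^{T_i}\Bigg[\frac{\phi(d_{it})}{\Phi(d_{it})}\left(\frac{1}{2\gamma} d_{it}\right) \notag \\
& - \frac{\phi(d_{it}^{*})}{\Phi(d_{it}^{*})}\left\{\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta} + y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}}{\sigma^{*}} + \frac{(1-2\gamma) d_{it}^{*}}{2(1-\gamma)\gamma}\right\}\Bigg] = 0, \tag{16} \\
\frac{\partial LL}{\partial \rho} ={}& -\mathrm{tr}\left((\boldsymbol{I}_{NT} - \rho \boldsymbol{W})^{-1}\boldsymbol{W}\right) \notag \\
& + \sum_{i=1}^{N}\sum_{t=1}^{T_i}\sum_{j=1}^{N} w_{ij}^{t} y_{jt}\left[\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta} + y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}}{\sigma^2} + \frac{\phi(d_{it}^{*})}{\Phi(d_{it}^{*})}\frac{\gamma}{\sigma^{*}}\right] = 0, \tag{17}
\end{align}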

where 𝜙(⋅) and 𝛷(⋅) respectively indicate the probability density and cumulative distribution functions of the
standard normal distribution. As Equations (13)–(17) cannot be solved analytically, we maximize the log-likelihood
function numerically so that these first-order conditions are satisfied.
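
To make the objective concrete, the following Python sketch evaluates the log-likelihood in Equation (8) using the quantities defined in Equations (9)–(12). It is an illustration under stated assumptions rather than the estimation code used in this study: it assumes a balanced panel with one time-invariant N × N weight matrix (so that the log-determinant of the NT × NT block diagonal matrix is T times that of one block), requires 0 < 𝛾 < 1, and uses hypothetical function names. A generic numerical optimizer could then be applied to the negative of this function.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def log_det_i_minus_rho_w(w: Matrix, rho: float) -> float:
    """ln|I - rho*W| by Gaussian elimination with partial pivoting."""
    n = len(w)
    a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)] for i in range(n)]
    log_det = 0.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("I - rho*W is singular for this rho")
        a[k], a[pivot] = a[pivot], a[k]
        log_det += math.log(abs(a[k][k]))  # log of the absolute pivot
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return log_det


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def log_likelihood(y, x, z, w, beta, delta, gamma, rho, sigma2) -> float:
    """Equation (8) for a balanced panel with the same N x N matrix W in every period.

    y[t][i] is the output; x[t][i] and z[t][i] are the regressor vectors.
    """
    sigma = math.sqrt(sigma2)
    sigma_star = sigma * math.sqrt((1.0 - gamma) * gamma)      # Equation (10)
    n_periods, n_units = len(y), len(w)
    # With a time-invariant W, ln|I_NT - rho*W| is T times the N x N log-determinant.
    ll = n_periods * log_det_i_minus_rho_w(w, rho)
    ll -= 0.5 * n_periods * n_units * (math.log(sigma2) + math.log(2.0 * math.pi))
    for y_t, x_t, z_t in zip(y, x, z):
        for i in range(n_units):
            lag = dot(w[i], y_t)                                # sum_j w_ij * y_jt
            eps = y_t[i] - dot(x_t[i], beta) - rho * lag
            z_delta = dot(z_t[i], delta)
            mu_star = z_delta * (1.0 - gamma) - eps * gamma     # Equation (9)
            d_star = mu_star / sigma_star                       # Equation (11)
            d = z_delta / (sigma * math.sqrt(gamma))            # Equation (12)
            ll -= 0.5 * ((z_delta + eps) / sigma) ** 2
            ll -= math.log(std_normal_cdf(d)) - math.log(std_normal_cdf(d_star))
    return ll
```

The log-determinant is computed by direct elimination here only to keep the sketch self-contained; for large N, a precomputed eigenvalue decomposition of 𝐖 or a sparse factorization would usually be preferable.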

As Battese and Coelli (1988) proposed, technical efficiency score 𝑇𝐸𝑖𝑡 is measured by the expectation
of exp(−𝑢𝑖𝑡 ) conditional on 𝜀𝑖𝑡 = 𝑣𝑖𝑡 − 𝑢𝑖𝑡 .

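\[
TE_{it} := \mathrm{E}\left(\exp(-u_{it}) \mid \varepsilon_{it}\right) = \exp\left[-\mu_{it}^{*} + \frac{1}{2}\sigma^{*2}\right] \cdot \left\{\frac{\Phi\left(\dfrac{\mu_{it}^{*}}{\sigma^{*}} - \sigma^{*}\right)}{\Phi\left(\dfrac{\mu_{it}^{*}}{\sigma^{*}}\right)}\right\}. \tag{18}
\]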

Estimates of 𝑇𝐸𝑖𝑡 are obtained by evaluating Equation (18) with the ML estimates.
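
As an illustration of how Equation (18) is evaluated once estimates are available, the sketch below computes the technical efficiency score for a single observation. The function name and the parameter values are hypothetical; `eps` stands for the composed residual 𝜀𝑖𝑡 and `z_delta` for 𝒛′𝑖𝑡𝜹, both of which would be computed from the ML estimates in practice.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def technical_efficiency(eps: float, z_delta: float, gamma: float, sigma2: float) -> float:
    """Equation (18): E[exp(-u) | eps], with eps = y - x'beta - rho * (spatial lag)."""
    sigma_star = math.sqrt(sigma2 * (1.0 - gamma) * gamma)   # Equation (10)
    mu_star = z_delta * (1.0 - gamma) - eps * gamma          # Equation (9)
    ratio = (std_normal_cdf(mu_star / sigma_star - sigma_star)
             / std_normal_cdf(mu_star / sigma_star))
    return math.exp(-mu_star + 0.5 * sigma_star ** 2) * ratio


# Hypothetical parameter values, for illustration only.
te_low = technical_efficiency(eps=-0.2, z_delta=0.1, gamma=0.6, sigma2=0.09)
te_high = technical_efficiency(eps=0.2, z_delta=0.1, gamma=0.6, sigma2=0.09)
```

Because 𝑢𝑖𝑡 is non-negative, the score lies strictly between 0 and 1, and a larger composed residual implies a higher score for given values of the other arguments.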

3. Application to the Japanese manufacturing industry


Here, using balanced regional panel data of all 47 prefectures in Japan during the 13 years from 2002 to 2014, we
estimate the regional aggregate production function of the manufacturing industry. Each prefecture is an important
policy maker for its industry. Some studies have applied spatial autoregressive stochastic frontier models,
especially to European regional or country-level data (Glass et al., 2016 = 41 European countries; Han et al., 2016
= 21 OECD countries; Ramajo and Hewings, 2018 = 120 regions in nine Western European countries), whereas
our study is the first to apply a spatial stochastic frontier model to Japanese regional data.
The significance of this application section is twofold. First, we empirically clarify the features of our
proposed model. By comparing our model with the existing spatial and non-spatial stochastic frontier models that
our model nests using real-world data, we check whether model differences affect their estimates or technical
efficiency scores. In particular, we would like to investigate the influence of the SAR term on estimates and
technical efficiency scores. Additionally, we would like to check whether introducing a model of technical
inefficiency affects the magnitude of the scaling parameter of spatial dependence.
Second, we examine whether spatial spillover is detected in the Japanese manufacturing industry, which
provides policy implications for an industrial cluster policy. In this application, we employ the macroeconomic
assumption that each prefecture is treated as if it were a producer. This assumption is relevant if we consider an
industrial cluster policy that aims to enhance productivity by exploiting positive spatial externalities. In fact, the
Japanese government intends to form industrial clusters across a wide range of prefectures, and one of the
important factors of this policy is positive spatial externalities across prefectures. Thus, it is important to
empirically verify whether the manufacturing industry is interdependent across prefectures.
We adopt the Cobb–Douglas production function—one of the most standard production functions. The
estimation equation is as follows:
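\begin{align}
\ln y_{it} ={}& \alpha + \beta_l \ln L_{it} + \beta_k \ln K_{it} + \beta_t t + \beta_{t^2} t^2 + \rho\sum_{j=1}^{N} w_{ij} \ln y_{jt} + v_{it} - u_{it}, \notag \\
& i = 1, 2, \ldots, 47, \quad t = 0, 1, 2, \ldots, 12, \tag{19} \\
\mu_{it} ={}& \delta_0 + \delta_1 DPOP_{it} + \delta_2 DPOP_{it}^2 + \delta_3 LARGE_{it} + \delta_4 WHOURS_{it} + \delta_5 WHOURS_{it}^2. \tag{20}
\end{align}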

In the above, 𝑦𝑖𝑡 , 𝐿𝑖𝑡 , and 𝐾𝑖𝑡 are the output, labor input, and capital input, respectively. 𝛽𝑙 and 𝛽𝑘 are unknown
estimated parameters of labor and capital, respectively. As in Glass et al. (2016) and Ramajo and Hewings (2018),
we assume a Hicks-neutral technical change and add a linear time trend variable 𝑡 and its square (𝑡 is 0, with 2002
as the benchmark year, and increases by 1 for each year) in Equation (19).
The output is the value added (million yen) in manufacturing establishments with 30 or more employees
from the Census of Manufacture by the Ministry of Economy, Trade and Industry. Labor input is the number of
workers multiplied by working hours per capita. The number of employees in manufacturing establishments with
30 or more employees is taken from the Industrial Statistical Survey. The working hours are monthly average
total hours worked per capita by regular employees of manufacturing establishments with 30 or more employees;
they are taken from the Monthly Labor Survey (Regional Survey) by the Ministry of Health, Labor and Welfare.
Capital input is the “value of tangible fixed assets other than land” (million yen) in manufacturing establishments
with 30 or more employees from the Census of Manufacture.4
It is natural to think that spatial interdependence, such as externalities, diminishes with geographical
distance.5 In fact, in many urban economic studies, the property of knowledge spillover decaying with distance
is thought to lead to the formation of cities, which is also known as agglomeration (Marshall, 1890). Thus, in this
study, the spatial weight matrix is defined as the row-normalized inverse distances between prefectural offices
(km). As mentioned in Section 2, row-normalization eliminates the influence of the distance unit.

4 The index variables of monetary value are not nominal, but real. We apply a chain-linked deflator for the
manufacturing industry to the value added and that for private enterprise equipment to the value of tangible fixed
assets. These deflators are from the “National Accounts for 2015” (System of National Accounts 2008,
benchmark year = 2011).

5 As defined in the model section, our proposed model allows for a time-variant spatial weight matrix.
Stakhovych and Bijmolt (2009) have divided the specification of weights matrices into: (1) treating weights
matrices as completely exogenous constructs, (2) letting the data determine them, and (3) estimating them. In
fact, some studies define weight matrices using the strength of economic relations as (2). However, LeSage and
Pace (2011) argued that weight matrices should be exogenous, and they recommend using geographical
distance. Therefore, in this application, we use a time-invariant weight matrix based on geographical distance,
which ensures exogeneity.

Variables that represent the determinants of technical inefficiency include population density 𝐷𝑃𝑂𝑃𝑖𝑡 ,
ratio of large establishments 𝐿𝐴𝑅𝐺𝐸𝑖𝑡 , and per capita working hours 𝑊𝐻𝑂𝑈𝑅𝑆𝑖𝑡.6 𝐷𝑃𝑂𝑃𝑖𝑡 is obtained by
dividing the intercensal adjusted population of individuals aged 15 to 64 years on October 1 of each year by the
inhabitable land of each prefecture (ha) in the Population Estimates of the Ministry of Internal Affairs and
Communications. This can be regarded as a proxy variable of agglomeration. An agglomeration economy is
usually considered to have a positive influence on efficiency. However, as agglomeration progresses, its effect
may become negative because the congestion effect dominates. Considering this, we also add the squared term of
population density. 𝐿𝐴𝑅𝐺𝐸𝑖𝑡 is the ratio of the number of manufacturing establishments with 300 or more employees
to the number of those with 30 or more employees. Based on estimates using aggregated data, it is impossible to
distinguish whether the increasing returns to scale of the regional production function are caused by the
agglomeration economy or by the increasing returns to scale at the establishment level. In this regard, 𝐿𝐴𝑅𝐺𝐸𝑖𝑡 is
supposed to capture the economies of scale at the establishment level. Necessary data to compute 𝐿𝐴𝑅𝐺𝐸𝑖𝑡 are taken
from the Census of Manufacture.

6 Theoretically, the proposed model can reduce omitted-variable bias by introducing appropriate determinants
of technical inefficiency. Since we adopted variables that are statistically significant, we believe that our
specification of the determinants of technical inefficiency allows us to successfully remove omitted-variable
bias. However, our specification search is limited by the availability of data. Thus, the possibility of
misspecification cannot be completely ruled out.

In Equation (20), the squared term of 𝑊𝐻𝑂𝑈𝑅𝑆𝑖𝑡 is also an explanatory variable in the model of
technical inefficiency. This is because of the existence of optimal working hours. In Japan, long working hours
are a social problem from the viewpoint of health, work-life balance, and labor productivity. Therefore, some
policies are underway to regulate working hours to improve the welfare of workers. However, from the perspective
of labor productivity, working hours per worker that are too short may make it costly for managers to coordinate
the production plan. We measure the optimal working hours per worker in the next section.
Tables 1 and 2 show the descriptive statistics and correlation coefficient matrix of the variables in
our dataset, respectively. The explanatory variables and determinants of technical inefficiency are correlated. If
some of the significant determinants are omitted, the parameter estimates in the PPF are expected to have a bias.

Table 1 Summary statistics

              Max.      Min.      Mean      Mode   Std. dev.
ln y         16.28     11.00     13.80     11.00        0.99
ln L         18.58     14.49     16.49     14.49        0.82
ln K         15.57     11.37     13.50     11.37        0.88
DPOP         10.93      0.28      1.84      1.65        2.12
LARGE       0.1131    0.0000    0.0664    0.0625      0.0189
WHOURS      175.90    151.10    165.71    166.70        4.43
Note: y = output; L = labor input; K = capital input; DPOP = population density; LARGE = ratio of large
establishments; WHOURS = per capita working hours

Table 2 Correlation coefficient matrix

              ln y      ln L      ln K      DPOP     LARGE    WHOURS
ln y         1.000     0.514     0.528     0.234     0.276     0.028
ln L                   1.000     0.384     0.153    -0.143     0.197
ln K                             1.000    -0.234     0.095    -0.228
DPOP                                       1.000    -0.226    -0.356
LARGE                                                1.000     0.025
WHOURS                                                         1.000
Note: y = output; L = labor input; K = capital input; DPOP = population density; LARGE = ratio of large
establishments; WHOURS = per capita working hours

The proposed spatial autoregressive stochastic frontier model for panel data that incorporates a model
of technical inefficiency (hereinafter, SSFTE) nests many existing spatial and non-spatial stochastic frontier
models. Therefore, in addition to the proposed model, we estimate several models with constraints on parameters.
First, the model with 𝜌 = 𝛾 = 0 and 𝜹 = 𝟎 is a linear regression model. Second, the model with 𝛾 = 0 and 𝜹 =
𝟎 is a SAR regression model. Third, the model with 𝜌 = 0 and 𝜹 = 𝟎 is a non-spatial stochastic frontier model
with a half-normal distribution proposed by Aigner et al. (1977) (hereinafter, ALS). Fourth, the model with 𝜹 = 𝟎
is a spatial stochastic frontier model with a SAR structure and a half-normal distribution proposed by Glass et al.
(2016) (hereinafter, GKS). Fifth, the model with 𝜌 = 0 is a non-spatial stochastic frontier model that incorporates
the model of technical inefficiency proposed by Battese and Coelli (1995) (hereinafter, BC95).
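
Because each restricted model is obtained from SSFTE by parameter constraints, one common way to compare them is a likelihood ratio test. The sketch below is an assumption about how such a comparison could be carried out, not a description of the exact procedure used in this study; the log-likelihood values are hypothetical placeholders, and the function names are illustrative. Note that restrictions placing a parameter on the boundary of its range (such as 𝛾 = 0) do not follow the standard chi-square distribution, so the sketch uses the interior restriction 𝜌 = 0 (BC95 against SSFTE), which has one degree of freedom.

```python
def likelihood_ratio_statistic(ll_restricted: float, ll_unrestricted: float) -> float:
    """LR = 2 * (LL_unrestricted - LL_restricted) for nested models."""
    return 2.0 * (ll_unrestricted - ll_restricted)


def rejects_restriction(ll_restricted: float, ll_unrestricted: float,
                        critical_value: float) -> bool:
    """True if the restricted model is rejected at the level implied by critical_value."""
    return likelihood_ratio_statistic(ll_restricted, ll_unrestricted) > critical_value


# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_5PCT_DF1 = 3.841

# Hypothetical maximized log-likelihoods, for illustration only (not the paper's results).
ll_bc95 = 100.0    # restricted model: rho = 0
ll_ssfte = 104.0   # unrestricted model

lr = likelihood_ratio_statistic(ll_bc95, ll_ssfte)
reject_bc95 = rejects_restriction(ll_bc95, ll_ssfte, CHI2_5PCT_DF1)
```

The number of degrees of freedom equals the number of restrictions, so a comparison that sets the whole vector 𝜹 to zero would use a critical value with as many degrees of freedom as 𝜹 has elements.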

4. Estimation results
Table 3 shows the estimation results. In the models with a spatial lag, the coefficient 𝜌 of the spatial lag is
statistically significant at the 1% significance level, with a positive sign. This indicates that production activities
of the Japanese manufacturing industry are spatially dependent and have mutually positive externality effects.
Thus, the Japanese government’s industrial cluster policy is supported. The magnitude of the coefficient varies
depending on the models. The coefficient in SSFTE is smaller than that in the other models (𝜌 = 0.3129 in SAR
and 𝜌 = 0.3329 in GKS, whereas 𝜌 = 0.2115 in SSFTE). In models that do not consider determinants of technical
inefficiency, 𝜌 is considered to be overestimated because 𝜌 absorbs some of the heteroskedasticity of technical
inefficiency. This indicates that consideration of the determinants of technical inefficiency is also important in
spatial models.
Table 4 shows the labor elasticity of production, capital elasticity of production, degree of returns to
scale, and average annual rate of the Hicks-neutral technical change, which are calculated from the estimation
results. In the model with the spatial lag term, these values vary over observations, so their maximum, minimum,
and average values are displayed. The average values are equivalent to the summary statistics of the direct effect
in LeSage and Pace (2009). The degree of returns to scale is the sum of labor elasticity and capital elasticity of
production. The degree of returns to scale that is greater (less) than 1 indicates increasing (decreasing) returns to
scale technology.
Labor coefficient and labor elasticity in the models with spatial lag are lower than those in models
without spatial lag. This suggests that the labor elasticity value in the model without spatial lag is overestimated,
as labor input correlates with spatial spillover effects, including externalities. The degrees of returns to scale indicate
economies of scale not only in linear regression and SAR but also in stochastic frontier models without a model
of technical inefficiency. For example, the estimates by ALS and GKS are 1.08 and 1.15, respectively, indicating
increasing returns to scale. However, BC95 and SSFTE show almost constant returns to scale technology, as their
estimates of the degree of returns to scale are 1.005 and 1.01, respectively. This suggests that the coefficients of
input quantities in such models that ignore the determinants of technical inefficiency are overestimated because
of the correlation between the determinants of technical inefficiency and input amount, especially capital input.
As Table 3 shows, in all models, the coefficient of the time trend in the PPF is statistically significant
at the 5% significance level and its sign is positive. The sign of the coefficient of the squared time trend is negative,
but insignificant for all the models except BC95. The results indicate that the PPF shifts upward through technical
change during the analysis period. The annual rate of the Hicks-neutral technical change is positively constant in
all models except BC95. Looking at the average annual rate of the Hicks-neutral technical change in Table 4, the
rate in models that consider spatial dependence (SAR, GKS and SSFTE) is lower than that in the models that do
not take it into consideration.
In both BC95 and SSFTE, the coefficient on population density is significantly negative at the 1%
significance level and the sign of the coefficient of the square term of population density is positive. In BC95, the
latter is statistically significant at the 10% significance level, while it is not significant in the case of SSFTE.
Taken together, these results imply that an increase in population density raises technical efficiency over the range observed in the dataset.


Table 3 Estimation results (coefficients, with z-statistics in parentheses)

        OLS                   SAR                   ALS                   GKS                   BC95                  SSFTE
𝛼       -3.6458*** (-18.95)   -7.0903*** (-16.89)   -3.1177*** (-13.66)   -6.5592*** (-16.15)   -1.2218*** (-5.24)    -3.9661*** (-7.95)
𝛽𝑙       0.5987*** (18.31)     0.5100*** (15.89)     0.5483*** (16.10)     0.4180*** (13.44)     0.5396*** (19.05)     0.4654*** (14.88)
𝛽𝑘       0.5501*** (17.95)     0.5943*** (20.67)     0.5854*** (19.17)     0.6613*** (25.63)     0.4651*** (15.91)     0.5382*** (16.25)
𝛽𝑡       0.0327*** (3.75)      0.0175** (2.17)       0.0322*** (4.23)      0.0151** (1.99)       0.0369*** (4.87)      0.0239*** (3.28)
𝛽𝑡²     -0.0010 (-1.39)       -0.0001 (-0.22)       -0.0010 (-1.64)        0.0000 (0.00)        -0.0014** (-2.28)     -0.0006 (-1.04)
𝛿0                                                                                              49.1749*** (4.53)     45.0132*** (3.47)
𝛿1                                                                                              -0.1961*** (-6.18)    -0.1746*** (-4.51)
𝛿2                                                                                               0.0079* (1.83)        0.0041 (0.57)
𝛿3                                                                                              -7.6464*** (-9.35)    -8.1727*** (-6.87)
𝛿4                                                                                              -0.5770*** (-4.37)    -0.5280*** (-3.35)
𝛿5                                                                                               0.0017*** (4.31)      0.0016*** (3.30)
𝜌                              0.3129*** (9.16)                            0.3329*** (10.04)                           0.2115*** (6.20)
𝛾                                                    0.6119*** (6.66)      0.7463*** (11.33)     0.7692*** (14.53)     0.7334*** (14.51)
σ²       0.0468*** (17.81)     0.0415*** (17.65)     0.0765*** (8.75)      0.0785*** (9.13)      0.0496*** (10.97)     0.0529*** (7.05)
𝐿𝐿       68.1950               103.9273              72.3177               114.3961              163.8507              182.7421
𝐴𝐼𝐶     -124.3900             -193.8547             -130.6353             -212.7922             -301.7013             -337.4843


Note: OLS: Linear Regression; SAR: spatial autoregressive model; ALS: non-spatial stochastic frontier model with half-normal distribution proposed by Aigner
et al. (1977); GKS: spatial stochastic frontier model with a SAR structure and a half-normal distribution proposed by Glass et al. (2016); BC95: non-spatial
stochastic frontier model that incorporates a model of technical inefficiency proposed by Battese and Coelli (1995); SSFTE: the proposed model. Standard
errors are calculated using the inverse of the negative Hessian evaluated at maximum likelihood estimates. The second partial derivatives are computed by
finite differences. *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.
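As a quick consistency check on Table 3, the reported AIC values can be reproduced from the reported log-likelihoods with AIC = −2·LL + 2k. The parameter counts k below are an assumption: they simply count the coefficients listed for each model in Table 3, including σ².

```python
# Log-likelihood and AIC values as reported in Table 3.
reported = {
    "OLS":   (68.1950, -124.3900),
    "SAR":   (103.9273, -193.8547),
    "ALS":   (72.3177, -130.6353),
    "GKS":   (114.3961, -212.7922),
    "BC95":  (163.8507, -301.7013),
    "SSFTE": (182.7421, -337.4843),
}

# Assumed parameter counts: number of coefficients shown per column of Table 3.
n_params = {"OLS": 6, "SAR": 7, "ALS": 7, "GKS": 8, "BC95": 13, "SSFTE": 14}


def aic(log_lik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return -2.0 * log_lik + 2.0 * k


for model, (ll, aic_reported) in reported.items():
    print(f"{model:6s} computed={aic(ll, n_params[model]):10.4f} reported={aic_reported:10.4f}")
```

Under these counts the computed and reported values agree up to rounding of the log-likelihoods in the fourth decimal place.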


Table 4 Marginal effects and rate of technical change

              OLS       SAR       ALS       GKS       BC95      SSFTE
Frontier                          Yes       Yes       Yes       Yes
Spatial lag             Yes                 Yes                 Yes
Model of TE                                           Yes       Yes

Labor elasticity of production
  Max.        0.5987    0.5171    0.5483    0.4247    0.5396    0.4682
  Min.        0.5987    0.5105    0.5483    0.4184    0.5396    0.4655
  Mean        0.5987    0.5129    0.5483    0.4207    0.5396    0.4665
Capital elasticity of production
  Max.        0.5501    0.6025    0.5854    0.6719    0.4651    0.5415
  Min.        0.5501    0.5948    0.5854    0.6620    0.4651    0.5384
  Mean        0.5501    0.5976    0.5854    0.6656    0.4651    0.5395
Degrees of returns to scale
  Max.        1.1488    1.1196    1.1337    1.0966    1.0047    1.0097
  Min.        1.1488    1.1053    1.1337    1.0804    1.0047    1.0039
  Mean        1.1488    1.1105    1.1337    1.0863    1.0047    1.0060
Average annual rate of the Hicks-neutral technical change
  Max.        0.0209    0.0167    0.0204    0.0155    0.0203    0.0169
  Min.        0.0209    0.0164    0.0204    0.0152    0.0203    0.0168
  Mean        0.0209    0.0165    0.0204    0.0153    0.0203    0.0169

Note: Model of TE: model of technical inefficiency; OLS: Linear Regression; SAR: spatial autoregressive model;
ALS: non-spatial stochastic frontier model with a half-normal distribution proposed by Aigner et al. (1977); GKS:
spatial stochastic frontier model with a SAR structure and a half-normal distribution proposed by Glass et al.
(2016); BC95: non-spatial stochastic frontier model that incorporates a model of technical inefficiency proposed
by Battese and Coelli (1995); SSFTE: the proposed model; Hicks-neutral technical change rate is the mean annual
technical progress rate calculated based on the estimated time trends.

In BC95 and SSFTE, the coefficients of working hours and working hours squared are statistically
significant at the 1% significance level. The per capita working time that maximizes technical efficiency is 167.1
hours for BC95 and 167.2 hours for SSFTE. This result is thus robust to the model specification. The coefficients of
the large-scale business establishment ratio are significantly negative at the 1% significance level in BC95 and
SSFTE. This suggests that economies of scale are present at the establishment level.
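The efficiency-maximizing working time follows from a first-order condition, sketched below. The sketch assumes that the mean of technical inefficiency is quadratic in working hours h, with a linear coefficient 𝛿4 and a squared-term coefficient 𝛿5, so that inefficiency is minimized at h* = −𝛿4 / (2𝛿5). The mapping of 𝛿4 and 𝛿5 to working hours is inferred from the discussion of Table 3 and is an assumption, as is the function name.

```python
def efficiency_maximizing_hours(delta_linear, delta_squared):
    """Working hours minimizing delta_linear*h + delta_squared*h**2.

    Requires a positive squared-term coefficient so that the stationary
    point is a minimum of inefficiency (a maximum of efficiency).
    """
    if delta_squared <= 0:
        raise ValueError("the squared-term coefficient must be positive")
    return -delta_linear / (2.0 * delta_squared)


# Coefficients as printed in Table 3 (rounded to four decimals).
hours_bc95 = efficiency_maximizing_hours(-0.5770, 0.0017)
hours_ssfte = efficiency_maximizing_hours(-0.5280, 0.0016)
print(round(hours_bc95, 1), round(hours_ssfte, 1))
```

With the rounded coefficients this gives roughly 170 and 165 hours. The gap from the reported 167.1 and 167.2 hours is what one would expect from rounding 𝛿5 to four decimals, since the result is very sensitive to the last digit of that coefficient.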
Table 5 shows the results of testing the nested models against the proposed model using the likelihood
ratio (LR) test.7 The null hypothesis of no spatial lag (i.e., 𝐻0: 𝜌 = 0) is rejected at the 1% significance level, so
the model with the spatial lag is supported empirically. The null hypothesis of no determinants of technical
inefficiency (i.e., 𝐻0: 𝜹 = 𝟎) and the null hypothesis of no spatial lag and no determinants of technical
inefficiency (i.e., 𝐻0: 𝜌 = 0 and 𝜹 = 𝟎) are both rejected at the 1% significance level. Modeling the determinants
of technical inefficiency is therefore statistically supported. The null hypothesis of no technical inefficiency
(i.e., 𝐻0: 𝛾 = 0 and 𝜹 = 𝟎) is rejected at the 1% significance level. This supports the composed error structure
peculiar to the stochastic frontier model. The null hypothesis of no spatial lag, no determinants of technical
inefficiency, and no technical inefficiency (i.e., 𝐻0: 𝜌 = 𝛾 = 0 and 𝜹 = 𝟎) is also rejected at the 1% significance
level. Thus, the LR tests reject all the existing nested models, which indicates that SSFTE is preferable.

Table 5 LR test results

Model   Null hypothesis       Number of     Test        Critical value   Decision
                              constraints   statistic   (0.1% level)
OLS     𝜌 = 𝛾 = 0, 𝜹 = 𝟎      8             229.09      26.12            Reject
SAR     𝛾 = 0, 𝜹 = 𝟎          7             157.63      24.32            Reject
ALS     𝜌 = 0, 𝜹 = 𝟎          7             220.85      24.32            Reject
GKS     𝜹 = 𝟎                 6             136.69      22.46            Reject
BC95    𝜌 = 0                 1             37.78       10.83            Reject

7
The LR test statistic is defined as 𝐿𝑅𝜆 = −2{𝐿𝐿[𝐻0] − 𝐿𝐿[𝐻1]}, where 𝐿𝐿[𝐻0] and 𝐿𝐿[𝐻1] are the maximized
log-likelihoods under 𝐻0 and 𝐻1, respectively. This test statistic asymptotically follows the chi-square
distribution with degrees of freedom equal to the number of constraints.

Note: OLS: Linear Regression; SAR: spatial autoregressive model; ALS: non-spatial stochastic frontier model
with half-normal distribution proposed by Aigner et al. (1977); GKS: spatial stochastic frontier model with a
SAR structure and a half-normal distribution proposed by Glass et al. (2016); BC95: non-spatial stochastic
frontier model that incorporates a model of technical inefficiency proposed by Battese and Coelli (1995).
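The test statistics in Table 5 can be reproduced from the log-likelihoods in Table 3, as the short sketch below shows. It takes SSFTE as the unrestricted model and computes twice the difference in maximized log-likelihoods. The critical values in the code are the standard upper 1% points of the chi-square distribution for 1, 6, 7, and 8 degrees of freedom; they are smaller than the thresholds listed in Table 5, so the decisions are the same.

```python
# Maximized log-likelihoods reported in Table 3.
LL = {
    "OLS": 68.1950, "SAR": 103.9273, "ALS": 72.3177,
    "GKS": 114.3961, "BC95": 163.8507, "SSFTE": 182.7421,
}

# Number of constraints and test statistic reported in Table 5.
table5 = {"OLS": (8, 229.09), "SAR": (7, 157.63), "ALS": (7, 220.85),
          "GKS": (6, 136.69), "BC95": (1, 37.78)}

# Standard upper 1% points of the chi-square distribution, keyed by degrees of freedom.
CHI2_1PCT = {1: 6.635, 6: 16.812, 7: 18.475, 8: 20.090}


def lr_statistic(ll_restricted, ll_unrestricted):
    """Likelihood ratio statistic: twice the gain in maximized log-likelihood."""
    return -2.0 * (ll_restricted - ll_unrestricted)


results = {}
for model, (df, reported_stat) in table5.items():
    stat = lr_statistic(LL[model], LL["SSFTE"])
    results[model] = (stat, stat > CHI2_1PCT[df])
    print(f"{model:5s} df={df} LR={stat:8.2f} reported={reported_stat:8.2f} "
          f"reject at 1%: {results[model][1]}")
```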

[Insert Fig. 1 about here]


[Insert Fig. 2 about here]

Next, we compare the technical efficiency score (TE score) across the models. Although there are
several definitions of the TE score, in order to isolate the effect of the estimation model, we use the common
definition in Equation (18) for all models. Figure 1 shows histograms of the TE scores using 611 observations. The
averages of the TE scores in SSFTE, BC95, GKS, and ALS are 0.8047, 0.7530, 0.8494, and 0.8348, respectively. In
the models that consider the determinants of technical inefficiency, the distribution of the TE scores is more
dispersed. By considering spatial dependence, that is, removing the constraint 𝜌 = 0, we find that the TE scores
tend to move closer to 1.

Table 6 Spearman’s rank correlation coefficient (SRCC) and maximum rank difference ratio (MRDR)

Mean SRCC
         SSFTE   BC95    GKS     ALS
SSFTE    1.000   0.980   0.814   0.798
BC95             1.000   0.789   0.720
GKS                      1.000   0.936
ALS                              1.000

Mean MRDR
         SSFTE   BC95    GKS     ALS
SSFTE    0.000   0.177   0.512   0.604
BC95             0.000   0.506   0.622
GKS                      0.000   0.383
ALS                              0.000

Note: ALS: non-spatial stochastic frontier model with half-normal distribution proposed by Aigner et al. (1977);
GKS: spatial stochastic frontier model with a SAR structure and a half-normal distribution proposed by Glass et
al. (2016); BC95: non-spatial stochastic frontier model that incorporates a model of technical inefficiency proposed
by Battese and Coelli (1995); SSFTE: the proposed model.

Figure 2 compares the TE scores in each pair of models. For example, in the upper left diagram, the horizontal
axis represents the TE scores in SSFTE and the vertical axis represents the TE scores in BC95. If the relative rank
is the same, the points lie on one line. In addition, Table 6 shows the mean of the Spearman’s rank correlation
coefficient (SRCC) matrix and the mean of the maximum rank difference ratio (MRDR) during the sample period.
Let the TE score ranking of the 𝑖th producer in period 𝑡 in model 𝐾 be $R_{it}^{K}$; then, the MRDR of models A and B
is defined as follows:

\[
MRDR_{ABt} := \frac{\max_{i} \left| R_{it}^{A} - R_{it}^{B} \right|}{N}. \tag{21}
\]
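The two rank-based measures can be computed as in the following sketch for a single period. The TE scores are made-up illustrative numbers, ties are assumed to be absent, and the function names are hypothetical. Table 6 reports the averages of these per-period values over the sample period.

```python
def ranks(scores):
    """Rank producers by TE score, with rank 1 for the highest score (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman(rank_a, rank_b):
    """Spearman's rank correlation coefficient for untied ranks."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def mrdr(rank_a, rank_b):
    """Maximum rank difference ratio of Equation (21) for one period."""
    return max(abs(a - b) for a, b in zip(rank_a, rank_b)) / len(rank_a)


# Hypothetical TE scores of five producers under two models in one period.
te_model_a = [0.91, 0.85, 0.78, 0.66, 0.72]
te_model_b = [0.88, 0.90, 0.70, 0.69, 0.75]

rank_a, rank_b = ranks(te_model_a), ranks(te_model_b)
print(rank_a, rank_b)
print(spearman(rank_a, rank_b), mrdr(rank_a, rank_b))
```

The SRCC summarizes overall agreement between two rankings, whereas the MRDR isolates the single largest rank shift, which is why two models can have a high SRCC and still a non-negligible MRDR.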

As expected, there are positive correlations between the TE score ranking in all models. However, this
ranking changes significantly between models that use variables explaining the determinants of technical
inefficiency (SSFTE and BC95) and models that do not use those variables (GKS and ALS). As the variables
describing the determinants of technical inefficiency are statistically significant, the estimation considering
determinants of technical inefficiency is important. The presence or absence of the spatial lag does not lead to
dramatic change in rank order. The mean SRCC of GKS and ALS is 0.936 and that of SSFTE and BC95 is 0.980.
Since SSFTE and BC95 specify the determinants of technical inefficiency, the TE score ranking is similar, but the
mean MRDR of these models is 0.177, which means that there is a difference of up to 17.7% in rank order on
average. This indicates that the TE score ranking varies depending on the presence of the spatial lag. Considering
both these and the statistical test results, it is clear that the introduction of the spatial lag is significant.

[Insert Fig. 3 about here]

Figure 3 shows the regional mean of the TE scores. Because both the overall mean and the shape of the
distribution of the scores differ considerably across models, we map the scores using six quantiles.
Consistent with the discussion so far, the TE scores differ depending on the presence or absence of the spatial
lag, as well as of the determinants of technical inefficiency.


[Insert Fig. 4 about here]

To clarify the influence of the spatial lag term, Figure 4 shows the difference in the rank of prefectural TE
scores averaged over 2002–2014 (rank in SSFTE minus that in BC95). We consider the area around Aichi
Prefecture, where the automobile industry is concentrated and value added is the largest. The TE score rankings of
the prefectures around Aichi Prefecture, such as Gifu Prefecture and Mie Prefecture, are lower in SSFTE than in
BC95. This tendency also applies to the surroundings of Kanagawa Prefecture, where value added is the third
largest. In these areas, the TE scores are considered to decrease because positive spatial spillover effects
shift the PPF upward. On the other hand, areas far from these high-value-added prefectures, such as those
around Fukuoka Prefecture, are less affected by the positive spatial spillover effects, and their PPF is low.
Thus, in these areas, the ranking of the TE score is higher.8

5. Conclusions
We developed a spatial stochastic frontier model with the SAR term and the feature of Battese and Coelli’s (1995)
model, which simultaneously estimates the determinants of technical inefficiency. Then, we conducted empirical
analysis using data on the Japanese manufacturing industry. Statistical tests support the proposed model. We found
that production activities of the Japanese manufacturing industry are spatially dependent and produce mutually
positive externality effects. This implies that the Japanese government’s industrial cluster policy is justified. Our
findings suggest that the estimates, such as labor elasticity, capital elasticity, and spatial dependence, in the existing
spatial and non-spatial models are biased because of a lack of technical inefficiency determinants and the spatial
lag. This bias also affected the TE score and its ranking.
In particular, it is a significant conclusion that the scale parameter of spatial dependence, 𝜌, in models
without determinants of technical inefficiency is overestimated because 𝜌 absorbs some of the heteroskedasticity
of technical inefficiency. This finding is important because overestimation of 𝜌 indicates overestimates of spatial
spillover effects such as externalities and can lead to erroneous policy judgment. Thus, in this respect, our model
is superior to existing models, because it can measure spatial spillover while controlling for the heteroskedasticity
of technical inefficiency.
Using the proposed model, we can statistically test whether there is spatial dependence as well as
whether the determinants of technical inefficiency are necessary. If the test supports spatial independence, the
existing non-spatial stochastic frontier models such as BC95 can be used. If the test supports the idea that the
determinants of technical inefficiency are not required, an existing spatial stochastic frontier model such as GKS
can be used. There is no compelling reason to start from models that ignore either spatial dependence or the
determinants of technical inefficiency.
Our model has some extensibility. We can easily introduce a spatial lag of explanatory variables into
our model.9 Since these added variables are all exogenous, we can estimate this model directly using our estimation
method. In addition, we can potentially introduce SEM structure into the error term in our model. The SEM
structure can address spatial dependence in the error term. In addition, by adding an additional weight matrix, we
can extend our model to higher-order spatial econometric models (Lacombe, 2004; Elhorst et al., 2012).
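The first extension amounts to augmenting the regressor matrix with spatially lagged explanatory variables. The sketch below illustrates this step only. The weight matrix and regressor values are hypothetical, and the helper names are not from the paper.

```python
def spatial_lag(w, x):
    """Return the product W X for an n-by-n weight matrix and n-by-k regressors."""
    n, k = len(x), len(x[0])
    return [[sum(w[i][j] * x[j][c] for j in range(n)) for c in range(k)]
            for i in range(n)]


def durbin_design(w, x):
    """Stack the regressors and their spatial lags side by side: [X, WX]."""
    return [row + lag for row, lag in zip(x, spatial_lag(w, x))]


# Hypothetical row-normalized weights for three mutually neighboring regions.
W = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
# Hypothetical regressors (for instance, log labor and log capital).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

design = durbin_design(W, X)
for row in design:
    print(row)
```

Because the added columns are functions of exogenous variables only, they enter the frontier in the same way as the original regressors.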
Recently, there have been many studies on how to deal with endogenous explanatory variables in
stochastic frontier models (Kutlu, 2010; Amsler et al., 2016, 2017; Karakaplan and Kutlu, 2017; Tsukamoto,
2018). However, it is difficult to apply such methods to spatial dependence models, including our model. This is
because there is an enormous number of variables and parameters in the reduced-form equations under the
spatially dependent situation. This remains a future research task.
In this study, we proposed a useful spatial stochastic frontier model, but several challenges and
opportunities remain for applied work. First, we confirmed the existence of spatial
dependence by using prefectural data because it is important from a policy standpoint to show spatial dependence
across prefectures. Our proposed model also allows various other analyses of spatial spillovers. For example, an
analyst interested in interdependence among firms may obtain new findings by
conducting a firm-level analysis using the proposed model. Second, we used geographical distance for the spatial
weight matrix. By instead creating a spatial weight matrix based on the economic distance calculated from
Input-Output tables, it is possible to conduct an analysis that considers technological proximity (Dietzenbacher et
al., 2005; Yamada and Kawakami, 2016). As described above, there is room for further empirical studies. The proposed
model is expected to be applied to empirical analysis in many fields, including regional science and productivity
analysis.

8
These tendencies are robust throughout the period.
9
In the field of spatial econometrics, a model that adds both spatial lag of explained variables and spatial lag of
explanatory variables is called a spatial Durbin model. Therefore, this extended model can be called a spatial
Durbin stochastic frontier model that incorporates a model of technical inefficiency.

Acknowledgments
I would like to acknowledge the comments of Prof. Jiro Nemoto, Prof. Takafumi Kato, anonymous referees, and
the session participants at the 12th Japan Statistical Society Spring Meeting, International Conference on
Economic Structures 2018, Asia-Pacific Productivity Conference 2018, and the 29th Annual Conference of The
Pan Pacific Association of Input-Output Studies.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-
profit sectors.

References
Adetutu, M., Glass, A.J., Kenjegalieva, K., Sickles, R.C., 2015. The effects of efficiency and TFP growth on
pollution in Europe: a multistage spatial analysis. Journal of Productivity Analysis 43(3), 307–326.
https://doi.org/10.1007/s11123-014-0426-7
Affuso, E., 2010. Spatial autoregressive stochastic frontier analysis: an application to an impact evaluation study.
Auburn University Working Papers. https://dx.doi.org/10.2139/ssrn.1740382
Aigner, D., Lovell, C.K., Schmidt, P., 1977. Formulation and estimation of stochastic frontier production
function models. Journal of Econometrics 6(1), 21–37. https://doi.org/10.1016/0304-4076(77)90052-5
Amsler, C., Prokhorov, A., Schmidt, P., 2016. Endogeneity in Stochastic Frontier Models. Journal of
Econometrics 190(2), 280–288. https://doi.org/10.1016/j.jeconom.2015.06.013
Amsler, C., Prokhorov, A., Schmidt, P., 2017. Endogenous environmental variables in Stochastic Frontier
Models. Journal of Econometrics 199(2), 131–140. https://doi.org/10.1016/j.jeconom.2017.05.005
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Dordrecht: Kluwer.
Arbia, G., 2014. A primer for spatial econometrics: with applications in R. Berlin: Springer.
Battese, G.E., Coelli, T.J., 1988. Prediction of firm-level technical efficiencies with a generalized frontier
production function and panel data. Journal of Econometrics 38(3), 387–399.
https://doi.org/10.1016/0304-4076(88)90053-X
Battese, G.E., Coelli, T.J., 1992. Frontier production functions, technical efficiency and panel data: with
application to paddy farmers in India. Journal of Productivity Analysis 3(1–2), 153–169.
https://doi.org/10.1007/BF00158774
Battese, G.E., Coelli, T.J., 1995. A model for technical inefficiency effects in a stochastic frontier production
function for panel data. Empirical Economics 20(2), 325–332. https://doi.org/10.1007/BF01205442
Caudill, S.B., Ford, J.M., 1993. Biases in frontier estimation due to heteroscedasticity. Economics Letters 41(1),
17–20. https://doi.org/10.1016/0165-1765(93)90104-K
Caudill, S.B., Ford, J.M., Gropper, D.M., 1995. Frontier estimation and firm-specific inefficiency measures in
the presence of heteroscedasticity. Journal of Business & Economic Statistics 13(1), 105–111.
https://doi.org/10.2307/1392525
Dietzenbacher, E., Romero Luna, I., Bosma, N.S., 2005. Using average propagation lengths to identify
production chains in the Andalusian economy. Estudios de Economía Aplicada 23(2), 405–422.
Druska, V., Horrace, W.C., 2004. Generalized moments estimation for spatial panel data: Indonesian rice
farming. American Journal of Agricultural Economics 86(1), 185–198.
http://www.jstor.org/stable/3697883
Elhorst, J.P., 2010. Applied spatial econometrics: raising the bar. Spatial Economic Analysis 5(1), 9–28.
https://doi.org/10.1080/17421770903541772
Elhorst, J.P., 2014. Spatial Econometrics from Cross-Sectional Data to Spatial Panels. Heidelberg: Springer.
Elhorst, J.P., Lacombe, D.J., Piras, G., 2012. On model specification and parameter space definitions in higher
order spatial econometric models. Regional Science and Urban Economics 42(1–2), 211–220.
https://doi.org/10.1016/j.regsciurbeco.2011.09.003
Fingleton, B., López-Bazo, E., 2006. Empirical growth models with spatial effects. Papers in Regional Science
85(2), 177–198. https://doi.org/10.1111/j.1435-5957.2006.00074.x
Fries, S., Taci, A., 2005. Cost efficiency of banks in transition: evidence from 289 banks in 15 post-communist
countries. Journal of Banking & Finance 29(1), 55–81. https://doi.org/10.1016/j.jbankfin.2004.06.016
Fusco, E., Vidoli, F., 2013. Spatial stochastic frontier models: controlling spatial global and local heterogeneity.
International Review of Applied Economics 27(5), 679–694.
https://doi.org/10.1080/02692171.2013.804493
Glass, A.J., Kenjegalieva, K., Sickles, R.C., 2016. A spatial autoregressive stochastic frontier model for panel
data with asymmetric efficiency spillovers. Journal of Econometrics 190(2), 289–300.
https://doi.org/10.1016/j.jeconom.2015.06.011

Han, J., Ryu, D., Sickles, R.C., 2016. Spillover effects of public capital stock using spatial frontier analyses: a
first look at the data. In: Greene, W.H., Khalaf, L., Sickles, R., Veall, M., Voia, M-C. (Eds.).
Productivity and Efficiency Analysis, Springer Proceedings in Business and Economics. Cham:
Springer, 83–97.
Huang, C.J., Liu, J.T., 1994. Estimation of a non-neutral stochastic frontier production function. Journal of
Productivity Analysis 5(2), 171–180. http://www.jstor.org/stable/41769900
Kalirajan, K., 1981. An econometric analysis of yield variability in paddy production. Canadian Journal of
Agricultural Economics 29(3), 283–294. https://doi.org/10.1111/j.1744-7976.1981.tb02083.x
Karakaplan, M.U., Kutlu, L., 2017. Handling endogeneity in stochastic frontier analysis. Economics Bulletin
37(2), 889–901.
Kelejian, H.H., Prucha, I.R., 1999. A generalized moments estimator for the autoregressive parameter in a spatial
model. International Economic Review 40(2), 509–533. http://www.jstor.org/stable/2648817
Kelejian, H.H., Tavlas, G.S., Hondroyiannis, G., 2006. A spatial modelling approach to contagion among
emerging economies. Open Economies Review 17(4), 423–441.
https://doi.org/10.1007/s11079-006-0357-7
Kumbhakar, S.C., Lovell, C.K., 2003. Stochastic Frontier Analysis. Cambridge: Cambridge University Press.
Kumbhakar, S.C., Ghosh, S., McGuckin, J.T., 1991. A generalized production frontier approach for estimating
determinants of inefficiency in US dairy farms. Journal of Business & Economic Statistics 9(3), 279–286.
http://www.jstor.org/stable/1391292
Kutlu, L., 2010. Battese-Coelli estimator with endogenous regressors. Economics Letters 109(2), 79–81.
https://doi.org/10.1016/j.econlet.2010.08.008
Lacombe, D.J., 2004. Does econometric methodology matter? An analysis of public policy using spatial
econometric techniques. Geographical Analysis 36(2), 105–118.
https://doi.org/10.1111/j.1538-4632.2004.tb01128.x
LeSage, J.P., Pace, R.K., 2009. Introduction to Spatial Econometrics. Boca Raton: Chapman & Hall/CRC.
Marshall, A., 1890. Principles of Economics. London: Macmillan.
Meeusen, W., van Den Broeck, J., 1977. Efficiency estimation from Cobb-Douglas production functions with
composed error. International Economic Review 18(2), 435–444. https://doi.org/10.2307/2525757
Pace, R.K., Zhu, S., 2012. Separable spatial modeling of spillovers and disturbances. Journal of Geographical
Systems 14(1), 75–90.
Pavlyuk, D., 2011. Application of the spatial stochastic frontier model for analysis of a regional tourism sector.
Transport and Telecommunication 12(2), 28–38. Handle: RePEc:pra:mprapa:25052
Ramajo, J., Hewings, G.J., 2018. Modelling regional productivity performance across Western Europe. Regional
Studies 52(10), 1372–1387. https://doi.org/10.1080/00343404.2017.1390219
Reifschneider, D., Stevenson, R., 1991. Systematic departures from the frontier: a framework for the analysis of
firm inefficiency. International Economic Review 32(3), 715–723. https://doi.org/10.2307/2527115
Saeed, M., Izzeldin, M., 2016. Examining the relationship between default risk and efficiency in Islamic and
conventional banks. Journal of Economic Behavior & Organization 132, 127–154.
https://doi.org/10.1016/j.jebo.2014.02.014
Srairi, S.A., 2010. Cost and profit efficiency of conventional and Islamic banks in GCC countries. Journal of
Productivity Analysis 34(1), 45–62. https://www.jstor.org/stable/41766614
Stakhovych, S., Bijmolt, T.H., 2009. Specification of spatial models: A simulation study on weights matrices.
Papers in Regional Science 88(2), 389–408. https://doi.org/10.1111/j.1435-5957.2008.00213.x
Stevenson, R.E., 1980. Likelihood functions for generalized stochastic frontier estimation. Journal of
Econometrics 13(1), 57–66. https://doi.org/10.1016/0304-4076(80)90042-1
Tsukamoto, T., 2018. Endogenous inputs and environmental variables in Battese and Coelli’s (1995) stochastic
frontier model. SSRN Working Paper Series, No. 3231804. http://dx.doi.org/10.2139/ssrn.3231804
Vidoli, F., Cardillo, C., Fusco, E., Canello, J., 2016. Spatial nonstationarity in the stochastic frontier model: an
application to the Italian wine industry. Regional Science and Urban Economics 61, 153–164.
https://doi.org/10.1016/j.regsciurbeco.2016.10.003
Yamada, E., Kawakami, T., 2016. Distribution of industrial growth in Nagoya Metropolitan Area, Japan: an
exploratory analysis using geographical and technological proximities. Regional Studies 50(11),
1876–1888. https://doi.org/10.1080/00343404.2015.1072273


Appendix: Derivation of likelihood

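The probability density functions of 𝑣𝑖𝑡 and 𝑢𝑖𝑡 are

\[
f_v(v_{it}) = \frac{1}{\sqrt{2\pi\sigma_v^2}} \exp\left(-\frac{v_{it}^2}{2\sigma_v^2}\right), \tag{A1}
\]

\[
f_u(u_{it}) = \frac{1}{\sqrt{2\pi\sigma_u^2}\,\Phi\left(\frac{\mu_{it}}{\sigma_u}\right)} \exp\left(-\frac{(u_{it}-\mu_{it})^2}{2\sigma_u^2}\right), \quad u_{it} \ge 0. \tag{A2}
\]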

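Since 𝑣𝑖𝑡 and 𝑢𝑖𝑡 are independent,

\[
f_{uv}(u_{it}, v_{it}) = f_v(v_{it}) \cdot f_u(u_{it}) = \frac{1}{2\pi\sigma_u\sigma_v\,\Phi\left(\frac{\mu_{it}}{\sigma_u}\right)} \exp\left(-\frac{v_{it}^2}{2\sigma_v^2} - \frac{(u_{it}-\mu_{it})^2}{2\sigma_u^2}\right). \tag{A3}
\]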

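So, the probability density function of 𝜀𝑖𝑡 = 𝑣𝑖𝑡 − 𝑢𝑖𝑡 is

\[
\begin{aligned}
f_\varepsilon(\varepsilon_{it})
&= \int_0^\infty \frac{1}{2\pi\sigma_u\sigma_v\,\Phi\left(\frac{\mu_{it}}{\sigma_u}\right)} \exp\left(-\frac{(\varepsilon_{it}+u_{it})^2}{2\sigma_v^2} - \frac{(u_{it}-\mu_{it})^2}{2\sigma_u^2}\right) du_{it} \\
&= \left[\int_0^\infty \frac{1}{\sqrt{2\pi}\,(\sigma_v\sigma_u\sigma^{-1})} \exp\left\{-\frac{1}{2(\sigma_v\sigma_u\sigma^{-1})^2}\left(u_{it} - \frac{\mu_{it}\sigma_v^2 - \varepsilon_{it}\sigma_u^2}{\sigma^2}\right)^2\right\} du_{it}\right] \\
&\quad \cdot \frac{1}{\sigma}\,\phi\left(\frac{\mu_{it}+\varepsilon_{it}}{\sigma}\right) \cdot \left(\Phi\left(\frac{\mu_{it}}{\sigma_u}\right)\right)^{-1} \\
&= \frac{1}{\sigma}\,\phi\left(\frac{\mu_{it}+\varepsilon_{it}}{\sigma}\right) \cdot \Phi\left(\frac{\mu_{it}(1-\gamma) - \varepsilon_{it}\gamma}{\sigma\sqrt{(1-\gamma)\gamma}}\right) \cdot \left(\Phi\left(\frac{\mu_{it}}{\sigma\sqrt{\gamma}}\right)\right)^{-1}.
\end{aligned}
\tag{A4}
\]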
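The closed form in (A4) can be checked numerically against the integral on its first line. The sketch below does this with a plain trapezoidal rule, under the assumption (consistent with the last step of (A4)) that σ² = σ_v² + σ_u² and 𝛾 = σ_u²/σ². The values of σ² and 𝛾 are the SSFTE estimates from Table 3, while μ and ε are arbitrary illustrative numbers.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f_eps_closed(eps, mu, sigma2, gamma):
    """Closed-form density of eps = v - u, last line of (A4)."""
    sigma = math.sqrt(sigma2)
    d_star = (mu * (1.0 - gamma) - eps * gamma) / (sigma * math.sqrt((1.0 - gamma) * gamma))
    d = mu / (sigma * math.sqrt(gamma))
    return normal_pdf((mu + eps) / sigma) / sigma * normal_cdf(d_star) / normal_cdf(d)


def f_eps_numeric(eps, mu, sigma2, gamma, upper=3.0, steps=30000):
    """First line of (A4), integrated over u in [0, upper] by the trapezoidal rule."""
    sigma_u = math.sqrt(gamma * sigma2)
    sigma_v = math.sqrt((1.0 - gamma) * sigma2)
    const = 1.0 / (2.0 * math.pi * sigma_u * sigma_v * normal_cdf(mu / sigma_u))

    def integrand(u):
        return math.exp(-(eps + u) ** 2 / (2.0 * sigma_v ** 2)
                        - (u - mu) ** 2 / (2.0 * sigma_u ** 2))

    h = upper / steps
    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(k * h) for k in range(1, steps))
    return const * total * h


# sigma2 and gamma: SSFTE estimates in Table 3; mu and eps: illustrative values.
closed = f_eps_closed(eps=-0.15, mu=0.1, sigma2=0.0529, gamma=0.7334)
numeric = f_eps_numeric(eps=-0.15, mu=0.1, sigma2=0.0529, gamma=0.7334)
print(closed, numeric)
```

The two numbers should agree to several significant digits. The upper integration limit of 3 is far in the tail for these variance values, so the truncation error is negligible.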

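The joint probability density function of 𝜺 = {𝜀𝑖𝑡 } is

\[
f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon}) = \prod_{i=1}^{N} \prod_{t=1}^{T_i} \left[ \frac{1}{\sigma} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{\mu_{it}+\varepsilon_{it}}{\sigma}\right)^2\right\} \cdot \Phi\left(\frac{\mu_{it}(1-\gamma) - \varepsilon_{it}\gamma}{\sigma\sqrt{(1-\gamma)\gamma}}\right) \cdot \left(\Phi\left(\frac{\mu_{it}}{\sigma\sqrt{\gamma}}\right)\right)^{-1} \right]. \tag{A5}
\]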

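Since 𝑑𝜺/𝑑𝒚 = (𝑰𝑁𝑇 − 𝜌𝑾), the joint probability density function of 𝒚 = {𝑦𝑖𝑡 } is

\[
\begin{aligned}
f_{\boldsymbol{y}}(\boldsymbol{y})
&= f_{\boldsymbol{\varepsilon}}(\boldsymbol{\varepsilon}) \cdot \left|\boldsymbol{I}_{NT} - \rho\boldsymbol{W}\right| \\
&= \left|\boldsymbol{I}_{NT} - \rho\boldsymbol{W}\right| \cdot \prod_{i=1}^{N} \prod_{t=1}^{T_i} \left[ \frac{1}{\sigma} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{\mu_{it}+\varepsilon_{it}}{\sigma}\right)^2\right\} \cdot \Phi\left(\frac{\mu_{it}(1-\gamma) - \varepsilon_{it}\gamma}{\sigma\sqrt{(1-\gamma)\gamma}}\right) \cdot \left(\Phi\left(\frac{\mu_{it}}{\sigma\sqrt{\gamma}}\right)\right)^{-1} \right].
\end{aligned}
\tag{A6}
\]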

The likelihood function for the sample observations, (𝒚, 𝒙, 𝒛), is


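\[
\begin{aligned}
L(\boldsymbol{\beta}, \boldsymbol{\delta}, \gamma, \rho, \sigma^2; \boldsymbol{y})
&= \left|\boldsymbol{I}_{NT} - \rho\boldsymbol{W}\right| \cdot \prod_{i=1}^{N} \prod_{t=1}^{T_i} \left[ \frac{1}{\sigma} \cdot \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta} + y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}}{\sigma}\right)^2\right\} \right. \\
&\quad \left. \cdot\, \Phi\left(\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta}(1-\gamma) - \left(y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}\right)\gamma}{\sigma\sqrt{(1-\gamma)\gamma}}\right) \cdot \left(\Phi\left(\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta}}{\sigma\sqrt{\gamma}}\right)\right)^{-1} \right].
\end{aligned}
\tag{A7}
\]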

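The log-likelihood function is

\[
\begin{aligned}
LL(\boldsymbol{\beta}, \boldsymbol{\delta}, \gamma, \rho, \sigma^2; \boldsymbol{y})
&= \ln\left|\boldsymbol{I}_{NT} - \rho\boldsymbol{W}\right| - \frac{1}{2}\left(\sum_{i=1}^{N} T_i\right)\left[\ln\sigma^2 + \ln 2\pi\right] \\
&\quad - \frac{1}{2}\sum_{i=1}^{N}\sum_{t=1}^{T_i}\left(\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta} + y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}}{\sigma}\right)^2 \\
&\quad - \sum_{i=1}^{N}\sum_{t=1}^{T_i}\left[\ln\Phi\left(\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta}}{\sigma\sqrt{\gamma}}\right) - \ln\Phi\left(\frac{\boldsymbol{z}_{it}'\boldsymbol{\delta}(1-\gamma) - \left(y_{it} - \boldsymbol{x}_{it}'\boldsymbol{\beta} - \rho\sum_{j=1}^{N} w_{ij}^{t} y_{jt}\right)\gamma}{\sigma\sqrt{(1-\gamma)\gamma}}\right)\right].
\end{aligned}
\tag{A8}
\]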
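For readers who want to evaluate (A8) directly, a minimal pure-Python sketch follows. It assumes that all observations (i, t) are stacked into one vector of length n = ΣTᵢ and that `w` is the corresponding stacked weight matrix, so that ε = y − Xβ − ρWy. The data and parameter values are toy numbers, the function names are hypothetical, and no optimizer is included. The log-determinant routine takes the logarithm of absolute pivots, which is valid when |I − ρW| is positive, as it is for a row-normalized W with |ρ| < 1.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_det(a):
    """ln|det(A)| by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) for row in a]
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return total


def log_likelihood(y, x, z, w, beta, delta, gamma, rho, sigma2):
    """Log-likelihood (A8) for stacked observations."""
    n = len(y)
    sigma = math.sqrt(sigma2)
    a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)] for i in range(n)]
    ll = log_det(a) - 0.5 * n * (math.log(sigma2) + math.log(2.0 * math.pi))
    for i in range(n):
        mu = sum(zk * dk for zk, dk in zip(z[i], delta))            # z'delta
        eps = (y[i] - sum(xk * bk for xk, bk in zip(x[i], beta))    # y - x'beta - rho*Wy
               - rho * sum(w[i][j] * y[j] for j in range(n)))
        d = mu / (sigma * math.sqrt(gamma))
        d_star = (mu * (1.0 - gamma) - eps * gamma) / (sigma * math.sqrt((1.0 - gamma) * gamma))
        ll -= 0.5 * ((mu + eps) / sigma) ** 2
        ll -= math.log(normal_cdf(d)) - math.log(normal_cdf(d_star))
    return ll


# Toy data: four stacked observations, a constant plus two inputs, a constant plus one z variable.
y = [1.8, 2.3, 1.9, 1.3]
x = [[1.0, 1.0, 2.0], [1.0, 1.5, 2.5], [1.0, 2.0, 1.0], [1.0, 0.5, 1.5]]
z = [[1.0, 0.5], [1.0, 1.0], [1.0, 0.2], [1.0, 0.8]]
w = [[0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5], [0.0, 0.0, 1.0, 0.0]]
beta, delta = [0.5, 0.4, 0.5], [0.1, 0.05]

ll_value = log_likelihood(y, x, z, w, beta, delta, gamma=0.7, rho=0.2, sigma2=0.05)
print(ll_value)
```

In practice this function would be handed to a numerical optimizer with 𝛾 and 𝜌 constrained to their admissible ranges. For large panels the log-determinant is usually the costly step, so one would exploit the block structure of the weight matrix rather than eliminate the full stacked matrix.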


Figures

Fig. 1 Distribution of TE scores


Fig. 2 Scatter plots of TE scores


Fig. 3 Regional mean of TE scores (six quantiles). Prefectures labeled on the map: Hokkaido, Okinawa, Gifu, Fukuoka, Aichi, Tokyo, Osaka, Shizuoka.


Fig. 4 Difference in rank of prefectural TE scores averaged over 2002–2014 (rank in SSFTE minus rank in BC95). Prefectures labeled on the map: Gifu, Chiba, Tokyo, Fukuoka, Kanagawa, Aichi, Mie.
