
E62: Stochastic Frontier Models and Efficiency Analysis E-1

E62: Stochastic Frontier Models and Efficiency Analysis
E62.1 Introduction
Chapters E62-E65 present LIMDEP's programs for two types of efficiency analysis,
stochastic frontier analysis (SFA) and data envelopment analysis (DEA). To a large extent, these are
competing methodologies. No formulation has yet been devised that unifies the two in a single
analytical framework. Arguably, the former is a fully parameterized model whereas the latter is
'nonparametric,' albeit also atheoretical in nature.
The stochastic frontier model is used in a large literature of studies of production, cost,
revenue, profit and other models of goal attainment. The model as it appears in the current literature
was originally developed by Aigner, Lovell, and Schmidt (1977). The canonical formulation that
serves as the foundation for other variations is their model,

$y = \beta'x + v - u,$

where $y$ is the observed outcome (goal attainment), $\beta'x + v$ is the optimal, frontier goal (e.g.,
maximal production output or minimum cost) pursued by the individual, $\beta'x$ is the deterministic part
of the frontier and $v \sim N[0, \sigma_v^2]$ is the stochastic part. The two parts together constitute the
'stochastic frontier.' The amount by which the observed individual fails to reach the optimum (the
frontier) is $u$, where

$u = |U|$ and $U \sim N[0, \sigma_u^2]$.

(For a stochastic cost frontier, or any setting in which the optimum is a minimum, the composed
error is $v + u$ rather than $v - u$.) In this context, $u$ is the 'inefficiency.' This normal-half normal
model is the basic form of the stochastic frontier model.
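The composed error can be illustrated with a small simulation. The following is a minimal sketch in Python rather than LIMDEP; the function name, seed, sample size and the values $\sigma_v = 1$, $\sigma_u = 2$ are our own assumptions chosen only for illustration. It draws $v$ and $u = |U|$, forms $\varepsilon = v - u$, and compares the sample mean of $u$ with the half normal mean $\sigma_u\sqrt{2/\pi}$.

```python
import math
import random

def simulate_composed_error(n, sigma_v, sigma_u, seed=0):
    """Draw n triples (v, u, v - u) with v ~ N(0, sigma_v^2) and u = |U|, U ~ N(0, sigma_u^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        v = rng.gauss(0.0, sigma_v)
        u = abs(rng.gauss(0.0, sigma_u))  # half normal inefficiency, always nonnegative
        draws.append((v, u, v - u))
    return draws

# Illustrative (assumed) parameter values, not taken from any data set in this chapter.
draws = simulate_composed_error(n=50_000, sigma_v=1.0, sigma_u=2.0)
mean_u = sum(u for _, u, _ in draws) / len(draws)
eps = [e for _, _, e in draws]
mean_eps = sum(eps) / len(eps)
m3 = sum((e - mean_eps) ** 3 for e in eps) / len(eps)  # third central moment of v - u

print(f"sample mean of u     : {mean_u:.3f}")
print(f"half normal mean     : {2.0 * math.sqrt(2.0 / math.pi):.3f}")
print(f"third moment of v - u: {m3:.3f}")
```

In a production frontier the third moment of $v - u$ should be negative, since $v$ is symmetric and $u$ is skewed to the right. This sign is the basis of the skewness check discussed in Section E62.5.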
Many varieties of the stochastic frontier model have appeared in the literature. A major
survey that presents an extensive catalog of these formulations is Kumbhakar and Lovell (2000).
(See, as well, Bauer (1990), Greene (2008) and several other surveys, many of which are cited in
Kumbhakar and Lovell and in Greene.) The estimator in LIMDEP computes parameter estimates for
most single equation cross section and panel data variants of the stochastic frontier model.
Many variants of the stochastic frontier model, based on different assumptions
about the distribution of the 'inefficiency' term, u, have been proposed in the received literature.
Most of these are available in LIMDEP, as suggested in the list below. The bulk of the received
technology centers on cross section style modeling. However, recent advances include many
extensions that take advantage of the features of panel data. LIMDEP also supports a large array of
panel data estimators.

The conventional approach to deterministic frontier estimation is currently data envelopment
analysis. This is usually handled with linear programming techniques. The analysis assumes that
there is a frontier technology (in the same spirit as the stochastic frontier production model) that can
be described by a piecewise linear hull that envelops the observed outcomes. Some (efficient)
observations will be on the frontier while other (inefficient) individuals will be inside. The
technique produces a deterministic frontier that is generated by the observed data, so by construction,
some individuals are 'efficient.' This is one of the fundamental differences between DEA and SFA.
Data envelopment analysis is documented in Chapter E65.
The analysis of production, cost, etc. in the stochastic frontier framework involves two steps.
In the first, the frontier model is estimated, usually by maximum likelihood. In the second, the
estimated model is used to construct measures of inefficiency or efficiency. Individual specific
estimates are computed that provide the basis of comparison of firms either to absolute standards or
to each other. The sections of this chapter develop several model forms used in the first step.
Efficiency estimation, the second step, appears formally in Section E62.8. The general methodology
is then applied to the specifications developed up to that point and to several others proposed in the
sections that follow, as well as in Chapters E63 and E64.

E62.2 Stochastic Frontier Model Specifications


The stochastic frontier model is

$y = \beta'x + v - u, \quad u = |U|.$

In this area of study, unlike most others, estimation of the model parameters is usually not the
primary objective. Estimation and analysis of the inefficiency of individuals in the sample and of the
aggregated sample are usually of greater interest. This part of the development will present tools for
estimation of inefficiency.
Typically, the production or cost model is based on a Cobb-Douglas, translog, or other form
of logarithmic model, so that the essential form is

$\log y = \beta'x + v - u$

where the components of x are generally logs of inputs for a production model or logs of output and
input prices for a cost model, or their squares and/or cross products. In this form, then, at least for
relatively small variation, u represents the proportion by which y falls short of the goal, and has a
natural interpretation as proportional or percentage inefficiency. The numerous examples below will
demonstrate. Users are also referred to the various survey sources listed earlier.
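The 'relatively small variation' qualification can be made concrete. When $\log y$ equals the frontier minus $u$, the ratio of $y$ to its frontier value is $\exp(-u)$, so the exact proportional shortfall is $1 - \exp(-u)$, which is close to $u$ only when $u$ is small. The short Python sketch below (the function name is ours, used only for illustration) tabulates the difference.

```python
import math

def proportional_shortfall(u):
    """Exact proportion by which y falls short of the frontier when log y = frontier - u."""
    return 1.0 - math.exp(-u)

for u in (0.01, 0.05, 0.10, 0.25, 0.50):
    exact = proportional_shortfall(u)
    print(f"u = {u:.2f}   exact shortfall = {exact:.4f}   u - exact = {u - exact:.4f}")
```

For u near 0.05 the two measures agree to about a tenth of a percentage point. At u = 0.5 the exact shortfall is closer to 39% than to 50%, so the percentage interpretation of u should be used with care for large values.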
The results one obtains are, of course, critically dependent on the model assumed. Thus,
specification and estimation of model parameters, while perhaps of secondary interest, are
nonetheless a major first step in the model building process. In nearly all received formulations, the
random component, v, is assumed to be normally distributed with zero mean. In some models, v may
be heteroscedastic. But, in either form, the large majority of the different frontier models that have
been proposed result from variations on the distribution of the inefficiency term, u. The range of
specifications examined in this chapter includes the following:

• Distributional assumptions: half normal, exponential, gamma
• Partially nonparametric frontier function
• Sample selection model

The following extensions are presented in Chapter E63:

• Truncated normal with nonzero, heterogeneous mean in the underlying U
• Heteroscedasticity in v and/or u
• Heterogeneity in the parameter of the exponential or gamma distribution
• Amsler et al.'s 'scaling model'
• Alvarez et al.'s model of fixed, latent management

A number of treatments for panel data are presented in Chapter E64.

E62.3 Basic Commands for Stochastic Frontier Models


The command for all specifications of the stochastic frontier model is

FRONTIER ; Lhs = y ; Rhs = one, ... ; … other specifications $

NOTE: The constant term, one, must be the first variable in the Rhs list in all model specifications.

The default specification is Aigner, Lovell and Schmidt's canonical normal-half normal model. The
default form is a production frontier model,

$y = \beta'x + v - u, \quad u = |U|.$

That is, the right hand side of the equation specifies the maximum goal attainable. To specify a cost
frontier model or other model in which the frontier represents a minimum, so that

$y = \beta'x + v + u, \quad u = |U|,$
use
; Cost

This specification is used in all forms of the stochastic frontier model. As noted below, one
additional specification you may find useful is

; Start = values for $\beta$, $\lambda$, $\sigma$.

(The meanings of the parameters are developed below.) ALS also developed the normal-exponential
model, in which u has an exponential distribution rather than a half normal distribution. To request
the exponential model, use

; Model = Exponential (or ; Model = E )

in the FRONTIER command. For this model, the parameters are ($\beta$, $\theta$, $\sigma_v$). Further details appear
below. Several other model forms, and numerous modifications such as heteroscedasticity,
are also developed below.

This is the full list of general specifications that are applicable to this model estimator.

Controlling Output from Model Commands

; Par                  keeps ancillary parameters $\lambda$, $\sigma$, etc. with main parameter vector $\beta$ in b.
; OLS                  displays least squares starting values when (and if) they are computed.
; Table = name         saves model results to be combined later in output tables.

Robust Asymptotic Covariance Matrices

; Covariance Matrix    displays estimated asymptotic covariance matrix (normally not shown),
                       same as ; Printvc.
; Choice               uses choice based sampling (sandwich with weighting) estimated matrix.
; Cluster = spec       requests computation of the cluster form of corrected covariance estimator.

Optimization Controls for Nonlinear Optimization

; Start = list         gives starting values for a nonlinear model.
; Tlg [ = value]       sets convergence value for gradient.
; Tlf [ = value]       sets convergence value for function.
; Tlb [ = value]       sets convergence value for parameters.
; Alg = name           requests a particular algorithm, Newton, DFP, BFGS, etc.
; Maxit = n            sets the maximum iterations.
; Output = n           requests technical output during iterations; the level 'n' is 1, 2, 3 or 4.
; Set                  keeps current setting of optimization parameters as permanent.

Predictions and Residuals

; List                 displays a list of fitted values with the model estimates.
; Keep = name          keeps fitted values as a new (or replacement) variable in data set.
; Res = name           keeps residuals as a new (or replacement) variable.
; Fill                 fills missing values (outside estimating sample) for fitted values.

Hypothesis Tests and Restrictions

; Test: spec           defines a Wald test of linear restrictions.
; Wald: spec           defines a Wald test of linear restrictions, same as ; Test: spec.
; CML: spec            defines a constrained maximum likelihood estimator.
; Rst = list           specifies equality and fixed value restrictions.
; Maxit = 0 ; Start = the restricted values
                       specifies Lagrange multiplier test.

E62.3.1 Predictions, Residuals and Partial Effects


Predicted values and 'residuals' for the stochastic frontier models are computed as follows.
The same forms are used for cross section and panel data models. The predicted value is $\hat\beta'x_i$. (These
are rarely useful in this setting.) The 'residual' is computed directly as

$e_i = y_i - \hat\beta'x_i.$

This residual is usually not of interest in itself. It is, however, the crucial ingredient in the efficiency
estimator discussed in Section E62.8. The estimator of $u_i$ that we will use is computed by the
Jondrow et al. (JLMS) formula for $E[u|v-u]$, or $E[u|v+u]$ if based on a cost frontier,

$\hat{E}[u|\varepsilon] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(w)}{1-\Phi(w)} - w\right], \quad w = \varepsilon\lambda/\sigma,$

$\sigma = \sqrt{\sigma_v^2 + \sigma_u^2}, \quad \lambda = \sigma_u/\sigma_v,$

where $\varepsilon = v - u$ for a production frontier. For a cost frontier, $\varepsilon = v + u$ and the sign of
$\varepsilon$ in $w$ is reversed. In the JLMS formula, $e_i$ is the estimator of $\varepsilon_i$. The formulas and
computations are discussed in Section E62.8.
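As an illustration of the JLMS computation for the normal-half normal model, the following Python sketch evaluates the conditional mean of u given the composed error. It is not LIMDEP code, and the function names and the `cost` argument are our own labels. It assumes that $\sigma_u$ and $\sigma_v$ are already estimated and that the residual $e_i$ is used in place of $\varepsilon_i$.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_upper_tail(z):
    """1 - Phi(z) for the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def jlms(eps, sigma_u, sigma_v, cost=False):
    """E[u | eps] for the normal-half normal model.

    eps is v - u for a production frontier, v + u when cost=True.
    """
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    sign = -1.0 if cost else 1.0              # a cost frontier reverses the sign of eps
    w = sign * eps * lam / sigma
    scale = sigma * lam / (1.0 + lam * lam)   # equals sigma_u * sigma_v / sigma
    return scale * (normal_pdf(w) / normal_upper_tail(w) - w)

for e in (-1.0, 0.0, 1.0):
    print(f"eps = {e:+.1f}   E[u|eps] = {jlms(e, sigma_u=1.0, sigma_v=1.0):.4f}")
```

For a production frontier the estimate falls as the residual rises: a large negative residual is attributed mostly to inefficiency, while a large positive one is attributed mostly to noise. The estimate remains positive in every case.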
The frontier model is, save for its involved disturbance term, a linear regression model. The
conditional mean in the model is

$E[y_i|x_i] = \beta'x_i - E[u_i|x_i].$

In most cases, $E[u_i|x_i]$ is not a function of $x_i$, so the derivatives of $E[y_i|x_i]$ with respect to $x_i$ are just $\beta$.
In other cases that we will consider, the conditional mean of $u_i$ does depend on $x_i$ or other variables, so
the partial effects in the model might be more involved than this. Once again, however, these will
usually not be of direct interest in the study. But, in all cases, $\hat{E}[u|\varepsilon]$ will be an involved function of
$x_i$ and any other variables that appear anywhere else in the model. We will examine the partial
effects on the efficiency estimators in Section E62.8.

E62.3.2 Results Saved by the Frontier Estimator


The results saved by the frontier estimator are

Matrices:       b    = regression parameters, $\beta$
                varb = asymptotic covariance matrix

Scalars:        sy, ybar, nreg, kreg, and logl

Last Function:  JLMS estimator of $u_i$.



Use ; Par to add the ancillary parameters to these. The ancillary parameters that are estimated for
the various models are as follows, including the scalars saved by the estimation program:

Half and truncated normal:  estimates $\lambda$, $\sigma$, saves lmda and s = $\sigma$,
Truncated normal:           same as half normal, estimates $\mu$, saved as mu,
Exponential:                estimates $\theta$, $\sigma_v$, saves theta and s = $\sigma_v$,
Heteroscedastic model:      average value of $\sigma$ as s, average value of $\lambda$ as lmda,
Heterogeneity in mean:      estimates $\lambda$, $\sigma$, saves lmda and s = $\sigma$.

E62.4 Data for the Analysis of Frontier Models


We will use two data sets to illustrate the frontier estimators. The first, the data on U.S.
airlines, is a panel data set that we will use primarily for illustrating the stochastic frontier model.
The second, the famous WHO data on health care attainment, will be used both for the stochastic
frontier models and for the later work on data envelopment analysis.

E62.4.1 Data on U.S. Airlines


We will develop several examples in this section using a panel data set on the U.S. airline
industry from the pre-deregulation period (airlines.dat). The observations are an unbalanced panel
on 25 airlines. The original balanced panel data set contained 15 observations (1970-1984) on each
of 25 airlines. Mergers, strikes and other data problems reduced the sample to the unbalanced panel
of 256 observations. The group sizes (with the number of firms in parentheses) are 2 (4), 4 (1), 7 (1),
9 (3), 10 (3), 11 (1), 12 (2), 13 (1), 14 (3) and 15 (6). The variables in the data set are

firm   = ID, 1,...,25            year     = 1970...1984               t       = year - 1969 = 1,...,15
cost   = total cost              revenue  = revenue                   output  = total output
stage  = average stage length    points   = number of points served   loadfct = load factor
cmtl   = materials cost          mtl      = materials quantity        pm      = price of material
cfuel  = fuel cost               fuel     = fuel quantity             pf      = fuel price
ceqpt  = equipment cost          eqpt     = equipment quantity        pe      = equipment price
clabor = labor cost              labor    = labor quantity            pl      = labor price
cprop  = property cost           property = property quantity         pp      = property price
k      = capital index           pk       = capital price index

Transformed variables used in the examples are as follows:

lc    = log(cost)      cn    = cost/pp        lcn   = log(cn)
lpm   = log(pm)        lpf   = log(pf)        lpe   = log(pe)
lpl   = log(pl)        lpp   = log(pp)        lpk   = log(pk)
lpmpp = log(pm/pp)     lpfpp = log(pf/pp)     lpepp = log(pe/pp)
lplpp = log(pl/pp)     lf    = log(fuel)      lm    = log(mtl)
le    = log(eqpt)      ll    = log(labor)     lp    = log(property)
lq    = log(output)    lq2   = lq^2

E62.4.2 World Health Organization (WHO) Health Attainment Data


The data used by the WHO in their 2000 World Health Report assessment of health care
attainment by 191 countries have been used by many researchers worldwide both for developing
frontier models and for analyzing health outcomes. The data are a panel of five years, 1993-1997, on
health outcome data for 191 countries and a number of internal political units, e.g., the states of
Mexico. The main outcome variables are dale and comp (an aggregate of such measures as
efficiency and equity of health care delivery in the country). The main input variables are hexp and
educ. A variety of other variables, listed below, were observed only in 1997. The following
descriptive statistics apply to the entire data set of 840 observations:

Variable   Mean          Std. Dev.     Description
country    *             *             country number omitting internal units, 1,...,191
year       *             *             year (1993-1997)
small      *             *             internal political unit, 0 for countries, else 1,...,6
comp       75.0062726    12.2051123    composite health care attainment
dale       58.3082712    12.1442590    disability adjusted life expectancy
hexp       548.214857    694.216237    health expenditure per capita, PPP units
educ       6.31753664    2.73370613    educational attainment, years
oecd       .279761905    .449149577    OECD member country, dummy variable
gdpc       8135.10785    7891.20036    per capita GDP in PPP units
popden     953.119353    2871.84294    population density per square KM
gini       .379477914    .090206941    gini coefficient for income distribution
tropics    .463095238    .498933251    dummy variable for tropical location
pubthe     58.1553571    20.2340835    proportion of health spending paid by government
geff       .113293978    .915983955    World Bank government effectiveness measure
voice      .192624849    .952225978    World Bank measure of democratization

(The data were analyzed in Greene (2004a,b). Some of the variables, such as popden and gdpc, were
augmented from other sources in these studies.) Although the data are a five year panel – a few
countries were observed for fewer than five years – there is almost no cross year variation in any
variable. (The proportion of total variation that is within groups is less than 1% for the four time
varying variables.) We have created a cross section from these data as follows: First, we discarded the
data on internal political units. We then averaged comp, dale, hexp and educ across the five years. We
retained a sample of 191 cross sectional (country) units. The following command set creates the data set.

SAMPLE ; 1-840 $
REJECT ; small > 0 $
SETPANEL ; Group = country ; Pds = ti $
RENAME ; hc3 = educ $
CREATE ; lpubthe = log(pubthe) $
CREATE ; dalebar = Group Mean(dale, Pds = ti) $
CREATE ; compbar = Group Mean(comp, Pds = ti) $
CREATE ; educbar = Group Mean(educ, Pds = ti) $
CREATE ; hexpbar = Group Mean(hexp, Pds = ti) $
CREATE ; logdbar = Log(dalebar) ; logcbar = Log(compbar) $
CREATE ; logebar = Log(educbar) ; loghbar = Log(hexpbar) $
CREATE ; loghbar2 = loghbar^2 $
REJECT ; year # 1997 $
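For readers who want to see the logic of the command set outside LIMDEP, the following Python sketch mirrors the main steps: drop the internal political units, average each variable within country, and take logs of the group means. The function name, the output variable names and the three toy records are hypothetical. They are not the WHO data and do not reproduce the LIMDEP variable names exactly.

```python
import math
from collections import defaultdict

def collapse_to_cross_section(rows, variables):
    """Average the named variables within country, after dropping internal units (small > 0)."""
    groups = defaultdict(list)
    for row in rows:
        if row["small"] == 0:                 # same effect as REJECT ; small > 0
            groups[row["country"]].append(row)
    cross_section = []
    for country in sorted(groups):
        obs = groups[country]
        record = {"country": country}
        for name in variables:
            mean = sum(r[name] for r in obs) / len(obs)   # group mean over observed years
            record[name + "bar"] = mean
            record["log" + name + "bar"] = math.log(mean)
        cross_section.append(record)
    return cross_section

# Hypothetical toy records, used only to exercise the function.
toy = [
    {"country": 1, "year": 1996, "small": 0, "dale": 60.0},
    {"country": 1, "year": 1997, "small": 0, "dale": 62.0},
    {"country": 1, "year": 1997, "small": 2, "dale": 55.0},   # internal unit, dropped
    {"country": 2, "year": 1997, "small": 0, "dale": 50.0},
]
for record in collapse_to_cross_section(toy, ["dale"]):
    print(record)
```

The LIMDEP commands keep the 1997 row of each country to carry the group means, whereas this sketch builds one record per country directly. The resulting cross section is the same in either case.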

E62.5 Skewness of the OLS Residuals and Problems Fitting Stochastic Frontier Models
Before maximum likelihood estimation begins, the skewness of the OLS residuals in the
regression of y on x is checked. Waldman (1982) has shown that when the OLS residuals are
skewed in the wrong direction, a solution for the maximum likelihood estimator for the stochastic
frontier model is simply OLS for the slopes and for $\sigma_v^2$, and 0.0 for $\sigma_u^2$. If this condition is found, a
lengthy warning is issued. We emphasize that this is not a bug in the program, nor is it something to be
'fixed,' beyond changing the specification of the model or rethinking the stochastic frontier as the
modeling platform. This is our single most frequently posed question, so we offer an application to
demonstrate the effect. Consider the commands

CALC ; Ran(12345) $
SAMPLE ; 1-500 $
CREATE ; u = Abs(Rnn(0,2))
; v = Rnn(0,1)
; x = Rnn(0,1)
; y = x + v + u $
REGRESS ; Lhs = y ; Rhs = one,x
; Res = e $
FRONTIER ; Lhs = y ; Rhs = one,x $
KERNEL ; Rhs = e $

The CREATE command generates y exactly according to the model, except that u is
added rather than subtracted. Thus, we should expect this model to perform poorly. The estimation results
from the FRONTIER command are shown below. Note the string of warnings. Estimation is
allowed to proceed, but the results are not a 'frontier' as such. The final estimate of $\lambda$ is essentially
zero, with a huge standard error, and the reported estimate of $\sigma_u^2$ in the box above the results is
0.0000. The other estimates are, in fact, the same as OLS. The kernel density estimator for the OLS
residuals is clearly skewed in the positive, that is, the wrong direction. Once again, we emphasize,
this is a failure of the data to conform to the model.

Error 315: Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.
WARNING! OLS residuals have the wrong skewness for SFM
Other forms of the model models may also behave poorly.
In this case, one MLE for the half normal model is OLS
for beta and sigma and zero for the inefficiency term.
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Line search at iteration 30 does not improve fn. Exiting optimization.

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable Y
Log likelihood function -921.33848
Estimation based on N = 500, K = 4
Inf.Cr.AIC = 1850.7 AIC/N = 3.701
Variances: Sigma-squared(v)= 2.33375
Sigma-squared(u)= .00000
Sigma(v) = 1.52766
Sigma(u) = .00000
Sigma = Sqr[(s^2(u)+s^2(v)]= 1.52766
Gamma = sigma(u)^2/sigma^2 = .00000
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 -921.33851
Chi-sq=2*[LogL(SF)-LogL(LS)] = .000
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 1.61107 165.2912 .01 .9922 -322.35365 325.57580
X| 1.00746*** .07057 14.28 .0000 .86914 1.14578
|Variance parameters for compound error
Lambda| .10897D-05 135.6070 .00 1.0000 -.26578D+03 .26578D+03
Sigma| 1.52766*** .00242 630.99 .0000 1.52292 1.53241
--------+--------------------------------------------------------------------

Figure E62.1 Kernel Density for Least Squares Residuals
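The sign of the residual skewness that triggers the warning can be checked outside LIMDEP. The Python sketch below is an illustration under our own assumptions: it treats the second argument of Rnn as a standard deviation, uses a larger sample (2,000) than the experiment above so that the sign is stable, and uses Python's random number generator, so it does not reproduce the LIMDEP draws. It computes the third moment of the OLS residuals when u is added (the wrong direction for a production frontier) and when u is subtracted.

```python
import random

def ols_residuals(x, y):
    """Residuals from a least squares regression of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [b - intercept - slope * a for a, b in zip(x, y)]

def third_moment(e):
    """Third central moment, m3 = (1/n) * sum((e_i - mean)^3)."""
    n = len(e)
    ebar = sum(e) / n
    return sum((ei - ebar) ** 3 for ei in e) / n

rng = random.Random(12345)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [abs(rng.gauss(0.0, 2.0)) for _ in range(n)]

y_added = [a + b + c for a, b, c in zip(x, v, u)]        # y = x + v + u, as in the experiment
y_subtracted = [a + b - c for a, b, c in zip(x, v, u)]   # y = x + v - u, a proper production frontier

m3_added = third_moment(ols_residuals(x, y_added))
m3_subtracted = third_moment(ols_residuals(x, y_subtracted))
print(f"m3 with u added     : {m3_added:.3f}")
print(f"m3 with u subtracted: {m3_subtracted:.3f}")
```

A positive third moment in a production frontier is the 'wrong' skewness. With u subtracted, the third moment is negative and the frontier estimator has something to work with.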



Unfortunately, the Waldman result is a sufficient condition, not a necessary one. That is, it
has been shown that when the OLS residuals have the 'right' skewness, then the MLE for the frontier
model is unique, and you will have no trouble in estimation. When they have the 'wrong' skewness,
it is only shown that the OLS results are a local stationary point of the log likelihood, not that they
are the global maximizers. There may be another point that is yet better than OLS. Our airline data
used below provide an example. Consider the following results, where we present both the
stochastic frontier estimates and OLS. (The model, itself, is developed later, so we show only the
useful results here.) As above, we receive the initial warning about the skewness of the OLS
residuals. Then, estimation proceeds and an apparently routine solution emerges that is different
from, and better than (has a higher log likelihood) OLS.

Error 315: Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.
WARNING! OLS residuals have the wrong skewness for SFM
Other forms of the model models may also behave poorly.
In this case, one MLE for the half normal model is OLS
for beta and sigma and zero for the inefficiency term.
Normal exit: 11 iterations. Status=0, F= -105.0617
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 105.06169
Variances: Sigma-squared(v)= .02411
Sigma-squared(u)= .00457
Sigma(v) = .15527
Sigma(u) = .06757
Stochastic Production Frontier, e = v-u
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -1.05847*** .02333 -45.37 .0000 -1.10419 -1.01274
LF| .38355*** .07045 5.44 .0000 .24547 .52163
LE| .21961*** .07300 3.01 .0026 .07653 .36270
LM| .71667*** .07654 9.36 .0000 .56666 .86668
LL| -.41139*** .06382 -6.45 .0000 -.53647 -.28630
LP| .18973*** .02960 6.41 .0000 .13171 .24775
|Variance parameters for compound error
Lambda| .43515** .20117 2.16 .0305 .04086 .82944
Sigma| .16933*** .00057 295.74 .0000 .16821 .17045
--------+--------------------------------------------------------------------
Ordinary least squares regression ............
Diagnostic Log likelihood = 105.05876
Standard error of e = .16244
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error t |t|>T* Interval
--------+--------------------------------------------------------------------
Constant| -1.11237*** .01015 -109.57 .0000 -1.13227 -1.09247
LF| .38283*** .07116 5.38 .0000 .24335 .52231
LE| .21922*** .07389 2.97 .0033 .07441 .36404
LM| .71924*** .07732 9.30 .0000 .56769 .87078
LL| -.41015*** .06455 -6.35 .0000 -.53665 -.28364
LP| .18802*** .02980 6.31 .0000 .12961 .24643
--------+--------------------------------------------------------------------

There is no simple bulletproof strategy for handling this situation. You can try different
starting values with ; Start = values for $\beta$, $\lambda$, $\sigma$ that differ from OLS, but it is hard to know where
these will come from. Moreover, it is likely that you will end up at OLS anyway. As Waldman
points out, this is a potentially ill behaved log likelihood function. We offer the preceding as a
caution for the practitioner. For the particular data set used here, we can identify a specific culprit.
The 'failure' of the model emerges in the presence of the variable lm, and does not occur when lm is
omitted from the equation. We have no theory, however, for why this should be the case. Simply
deleting variables from the model until a specification without the skewness problem emerges does
not seem like an effective strategy.
We do note that the failure might signal a misspecified model. For example, for our airlines
example, the specification above omits the capital variable. When lk = log(k) is added to the model, we
obtain the following quite routine results (albeit with the wrong signs on capital and labor inputs).
Normal exit: 13 iterations. Status=0, F= -108.4392
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 108.43918
Estimation based on N = 256, K = 9
Inf.Cr.AIC = -198.9 AIC/N = -.777
Variances: Sigma-squared(v)= .01902
Sigma-squared(u)= .01692
Sigma(v) = .13791
Sigma(u) = .13007
Sigma = Sqr[(s^2(u)+s^2(v)]= .18957
Gamma = sigma(u)^2/sigma^2 = .47074
Var[u]/{Var[u]+Var[v]} = .24425
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = .730
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
| Deterministic Component of Stochastic Frontier Model
Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439
LF| .37257*** .07038 5.29 .0000 .23463 .51052
LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299
LM| .69910*** .07580 9.22 .0000 .55054 .84766
LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530
LP| .44533*** .09498 4.69 .0000 .25917 .63149
LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759
| Variance parameters for compound error
Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373
Sigma| .18957*** .00064 297.81 .0000 .18832 .19082
--------+--------------------------------------------------------------------
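The derived quantities in the header of these results follow directly from the two reported standard deviations and the two log likelihoods. The Python sketch below recomputes them from the printed values. The function name is ours, and the variance share uses the half normal result Var[u] = (1 - 2/$\pi$)$\sigma_u^2$.

```python
import math

def frontier_summary(sigma_u, sigma_v, logl_sf, logl_ols, critical_95=2.706):
    """Recompute the summary quantities reported above the coefficient table."""
    var_u_param = sigma_u ** 2                       # sigma-squared(u), a parameter, not Var[u]
    var_v = sigma_v ** 2
    var_u = (1.0 - 2.0 / math.pi) * var_u_param      # variance of the half normal u
    chi_sq = 2.0 * (logl_sf - logl_ols)
    return {
        "sigma": math.sqrt(var_u_param + var_v),
        "lambda": sigma_u / sigma_v,
        "gamma": var_u_param / (var_u_param + var_v),
        "var_share": var_u / (var_u + var_v),
        "chi_sq": chi_sq,
        "reject_at_95": chi_sq > critical_95,        # Kodde-Palm critical value from the output
    }

# Values copied from the output above (model including lk).
summary = frontier_summary(sigma_u=0.13007, sigma_v=0.13791,
                           logl_sf=108.43918, logl_ols=108.07431)
for name, value in summary.items():
    print(f"{name:12s} {value}")
```

The likelihood ratio statistic of about 0.73 is well below the 95% critical value of 2.706. So although estimation is routine, the hypothesis of no inefficiency is not rejected for this specification.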

We emphasize, the Waldman result, and this particular theoretical outcome, is specific to the
normal-half normal model. However, when it occurs, problems of a similar sort will often, but not
always, show up in other models. Thus, in spite of a warning, your fitted exponential, or panel data
model, may be quite satisfactory.

E62.6 The Ordinary Least Squares Estimator


For the simplest specification

$y = \beta'x + v - u, \quad u = |U|,$

in which $\beta$ contains a constant term and both v and U are homoscedastic and have zero means, i.e., in
the original half normal or exponential models, the OLS estimator of all elements of $\beta$ except the
constant term is consistent. It is convenient to rewrite the model as

$y = \beta_0 + \beta_1'x_1 + v - u.$

Under the assumptions, we can write the model as

$y = (\beta_0 - E[u]) + \beta_1'x_1 + v - (u - E[u])$

or

$y = \alpha + \beta_1'x_1 + e,$

in which e has zero mean and constant variance, and is orthogonal to (1, $x_1$). Thus, the model as shown
can be estimated consistently by OLS. The constant term estimates $\alpha = (\beta_0 - E[u])$. Assuming that
E[u] is estimable, therefore, estimation of $\beta$ by MLE vs. OLS is a question of efficiency, not
consistency. (However, we remain interested in estimation of u, so this may be a moot point.)
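A small simulation illustrates the point. The Python sketch below uses parameter values, a seed and a sample size that are our own assumptions for illustration. It generates data from a normal-half normal production frontier and fits OLS. The slope should be close to its true value, while the constant should be close to $\beta_0 - E[u]$, with $E[u] = \sigma_u\sqrt{2/\pi}$ for the half normal model.

```python
import math
import random

def ols_simple(x, y):
    """Least squares intercept and slope for a regression of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Assumed illustrative values: beta0 = 2, beta1 = 1, sigma_v = sigma_u = 1.
beta0, beta1, sigma_v, sigma_u = 2.0, 1.0, 1.0, 1.0
rng = random.Random(2024)
n = 20_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma_v) - abs(rng.gauss(0.0, sigma_u)) for xi in x]

a_hat, b_hat = ols_simple(x, y)
expected_u = sigma_u * math.sqrt(2.0 / math.pi)   # mean of the half normal inefficiency term

print(f"OLS slope    : {b_hat:.3f}   (true beta1 = {beta1})")
print(f"OLS constant : {a_hat:.3f}   (beta0 - E[u] = {beta0 - expected_u:.3f}, beta0 = {beta0})")
```

The gap between the OLS constant and $\beta_0$ is what the corrected and modified OLS estimators in the following subsections are designed to remove.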

E62.6.1 Corrected Ordinary Least Squares – COLS


The COLS estimator is obtained by turning the least squares estimator into a deterministic
frontier model. This is done by shifting the intercept in the OLS estimator upward (for a production
frontier) or downward (for a cost frontier) so that all points lie either below or above the estimated
function. Figure E62.2 shows the result for estimation of a simple cost frontier for the airlines data.
The function is shifted so that it rests on the single most extreme point (residual) in the data. The
COLS estimator is requested with

FRONTIER ; Lhs = goal variable
; Rhs = one, …
; Model = COLS $

Add ; Cost if the model is a cost frontier.


Efficiency values, as discussed below, are obtained as follows:

; Eff = variable name

saves the residuals from the deterministic frontier. These are the estimates of $u_i$. Note in Figure E62.2 that,
for a cost frontier, all values of $u_i$ are positive. If you fit a production frontier, then all points will lie
below the regression and all residuals will be negative; the estimated inefficiency that is saved is then
$-e_i$. Thus, in both cases, the values saved by ; Eff = variable name are the positive estimates of the size of
the deviation of the observation from the frontier, which in this model are direct estimates of $u_i$.
The estimator of technical or cost efficiency is

Efficiency = $\exp(-\hat{u}_i)$

If you fit a production frontier, use

; Techeff = variable name

to save this variable. For a cost frontier, use

; Costeff = variable name
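The intercept shift behind COLS can be sketched in a few lines. The Python function below is a minimal illustration and not LIMDEP's implementation. It takes OLS residuals, which are assumed to be already computed, shifts them so that the most extreme observation lies on the frontier, and returns the inefficiency and efficiency values. The function name and the toy residuals are hypothetical.

```python
import math

def cols_efficiency(residuals, cost=False):
    """Return (inefficiency, efficiency) lists from OLS residuals under the COLS shift."""
    if cost:
        shift = min(residuals)                     # cost frontier lies below all points
        u_hat = [e - shift for e in residuals]
    else:
        shift = max(residuals)                     # production frontier lies above all points
        u_hat = [shift - e for e in residuals]
    efficiency = [math.exp(-ui) for ui in u_hat]   # Efficiency = exp(-u_i)
    return u_hat, efficiency

# Hypothetical residuals, used only to show the mechanics.
toy_residuals = [-0.2, 0.0, 0.3]
u_cost, eff_cost = cols_efficiency(toy_residuals, cost=True)
u_prod, eff_prod = cols_efficiency(toy_residuals, cost=False)
print("cost frontier      :", [round(ui, 3) for ui in u_cost], [round(ef, 3) for ef in eff_cost])
print("production frontier:", [round(ui, 3) for ui in u_prod], [round(ef, 3) for ef in eff_prod])
```

By construction, one observation has inefficiency zero and efficiency one. This is why the COLS frontier is sensitive to a single extreme residual.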

Figure E62.2 COLS Estimator of Cost Frontier Function

The following shows computation of a COLS estimator for the airlines. The FRONTIER
command requests both the inefficiency estimates, ui, and the cost efficiency estimates, eui_cost.
The kernel density estimate for the cost efficiency is shown in Figure E62.3. The results for the
estimator begin with the standard output for least squares regression. The second panel includes
some preliminary results for the stochastic frontier model, including the chi squared test for zero
skewness, $\chi^2 = (n/6)(m_3/s^3)^2$. (In the results below, the statistic, 1.943, is less than the critical
value of 3.84, so the hypothesis of zero skewness is not rejected.) The standard normal statistic is the
signed (based on $m_3$) square root of $\chi^2$. The third panel presents descriptive statistics for $u_i$ and $\exp(-u_i)$.

CREATE ; lc = Log(cost/pp)
; lpkp = Log(pk/pp)
; lplp = Log(pl/pp)
; lpmp= Log(pm/pp)
; lpep = Log(pe/pp)
; lpfp = Log(pf/pp) $
CREATE ; lk = Log(k) $
CREATE ; ly = Log(output) ; ly2 = .5*ly*ly $
FRONTIER ; Lhs = lc ; Rhs = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp
; Cost ; Model = COLS
; Costeff = Eui_cost ; Eff = ui $
KERNEL ; Rhs = eui_cost
; Title = Estimated Cost Efficiency Based on COLS Estimator $

-----------------------------------------------------------------------------
Corrected OLS Deterministic Frontier Cost Function
LHS=LC Mean = 2.84024
Standard deviation = 1.09256
No. of observations = 256 Degrees of freedom
Regression Sum of Squares = 300.028 7
Residual Sum of Squares = 4.36487 248
Total Sum of Squares = 304.393 255
Standard error of e = .13267
Fit R-squared = .98566 R-bar squared = .98526
Model test F[ 7, 248] = 2435.25310 Prob F > F* = .00000
Diagnostic Log likelihood = 157.91523 Akaike I.C. = -4.00909
Restricted (b=0) = -385.41031 Bayes I.C. = -3.89830
Chi squared [ 7] = 1086.65108 Prob C2 > C2* = .00000
--------------------------------------------------
Skewness test for inefficiency based on residuals
Normalized skewness = m3/s^3 = .21340
Chi squared test (1 degree of freedom) 1.94294 Critical value= 3.84000
Standard normal test statistic 1.39389 Test value = +/- 1.96000
Estimated Efficiency Values Based on e(i)+Min e(i)
--------+-----------------------------------------
| Mean Std.Dev. Minimum Maximum
CostInef| .357 .133 .000 .773
Cost Eff| .706 .091 .462 1.000
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic COLS Frontier Function
Constant| 19.4363 27.45697 .71 .4790 -34.3783 73.2510
LY| .94303*** .01809 52.12 .0000 .90757 .97849
LY2| .08248*** .01236 6.67 .0000 .05825 .10671
LPKP| 1.42385 2.14849 .66 .5075 -2.78711 5.63480
LPLP| .01915 .10169 .19 .8506 -.18016 .21847
LPMP| .04504 1.41721 .03 .9746 -2.73264 2.82272
LPEP| -.57070 .67904 -.84 .4007 -1.90159 .76019
LPFP| -.04811** .01986 -2.42 .0154 -.08704 -.00919
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.

Figure E62.3 Kernel Estimator for Cost Efficiency



E62.6.2 Modified OLS and Starting Values for the MLE


Under the specific distributional assumptions of the half normal and exponential models, we
do have method of moments estimators of the underlying parameters. They are based on the moment
equations
Var[e] = Var[v] + Var[u]
and Skewness[e] = Skewness[u]

since v is symmetric. The left hand sides can be consistently estimated using the OLS residuals:

\[ m_2 = \frac{1}{n}\sum_i e_i^2 \quad\text{and}\quad m_3 = \frac{1}{n}\sum_i e_i^3 . \]

Both of the functions on the right hand side are known for the half normal and exponential models.
In particular, for the half normal model, the moment equations are

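\[ m_2 = \sigma_v^2 + [1 - 2/\pi]\,\sigma_u^2, \qquad m_3 = (2/\pi)^{1/2}[1 - 4/\pi]\,\sigma_u^3. \]

The solutions are

\[ \hat\sigma_u = \left[\frac{m_3}{(2/\pi)^{1/2}\,(1 - 4/\pi)}\right]^{1/3} \quad\text{and}\quad \hat\sigma_v^2 = m_2 - (1 - 2/\pi)\,\hat\sigma_u^2 . \]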

Note that there is no solution for $\sigma_u$ if $m_3$ is not negative, which is the problem discussed in Section
E62.5. Assuming that this problem does not arise, the corrected constant term is

\[ \hat\alpha = a + \text{Est.}E[u] = a + \hat\sigma_u\sqrt{2/\pi}. \]

This is the 'modified least squares' (MOLS) estimator that is discussed in a number of sources, such
as Greene (2005). These are the values used for starting values for the MLE, as well. Looking
ahead, note that there is no natural method of moments estimator for the mean parameter in the
truncated normal model discussed in Section E63.3. For this model, we use

\[ \hat\mu/\sigma_u = 0. \]

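For the normal-exponential model, the moment equations that correspond to the preceding are

\[ m_2 = \sigma_v^2 + 1/\theta^2, \qquad m_3 = -2/\theta^3. \]

Therefore,

\[ \hat\theta = \left(\frac{-2}{m_3}\right)^{1/3}, \qquad \hat\sigma_v^2 = m_2 - 1/\hat\theta^2, \qquad\text{and}\qquad \hat\alpha = a + 1/\hat\theta. \]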
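The method of moments solutions above can be sketched in a few lines of Python. This is only an illustration of the formulas in this section, not the program's internal code; the function names are invented here, and the inputs m2 and m3 are assumed to be the second and third moments of the (sign-adjusted) OLS residuals. The example generates the moments from known parameter values and then recovers them.

```python
import math

def mols_half_normal(m2, m3):
    """Moment estimators (sigma_u, sigma_v, Est.E[u]) for the normal-half normal model."""
    if m3 >= 0:
        raise ValueError("no solution for sigma_u: m3 must be negative")
    su = (m3 / (math.sqrt(2.0 / math.pi) * (1.0 - 4.0 / math.pi))) ** (1.0 / 3.0)
    sv2 = m2 - (1.0 - 2.0 / math.pi) * su ** 2
    if sv2 <= 0:
        raise ValueError("implied sigma_v^2 is not positive")
    return su, math.sqrt(sv2), su * math.sqrt(2.0 / math.pi)

def mols_exponential(m2, m3):
    """Moment estimators (theta, sigma_v, Est.E[u]) for the normal-exponential model."""
    if m3 >= 0:
        raise ValueError("no solution for theta: m3 must be negative")
    theta = (-2.0 / m3) ** (1.0 / 3.0)
    sv2 = m2 - 1.0 / theta ** 2
    if sv2 <= 0:
        raise ValueError("implied sigma_v^2 is not positive")
    return theta, math.sqrt(sv2), 1.0 / theta

# Round trip with made-up parameter values: build the theoretical moments, then recover.
true_su, true_sv = 0.13, 0.10
m2 = true_sv ** 2 + (1.0 - 2.0 / math.pi) * true_su ** 2
m3 = math.sqrt(2.0 / math.pi) * (1.0 - 4.0 / math.pi) * true_su ** 3
su, sv, mean_u = mols_half_normal(m2, m3)

true_theta = 13.0
theta, sv_e, mean_u_e = mols_exponential(true_sv ** 2 + 1.0 / true_theta ** 2,
                                         -2.0 / true_theta ** 3)
print(su, sv, theta, sv_e)
```

The third returned value in each case is the estimate of E[u] that enters the corrected constant term.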

The header information in the results table will display the decomposition of the variance of
the composed error in two parts. In the case of the half normal model,

\[ \text{Var}[u] = [(\pi - 2)/\pi]\,\sigma_u^2, \]

not $\sigma_u^2$. Therefore, the estimated parameters might be a bit misleading as to the relative influence of
u on the total variation in the structural disturbance.
We note, these estimators are sometimes quite far from the maximum likelihood estimators,
particularly when the sample is small. But, they are generally quite satisfactory as starting values for
the MLE. The following demonstrates these results for the airline data, where we use MOLS and
MLE to fit a normal-half normal cost frontier. (Note, the signs of the OLS residuals are reversed
because we are fitting a cost function.) In the results below, we have imposed the assumption of
linear homogeneity in prices in the cost function by normalizing the six input prices, pk, pl, pe, pp,
pm, pf, by the property price, pp. The model contains log(pj/pp). To complete the constraint, we
have also normalized total cost by pp before taking logs.

CREATE ; lpk = Log(pk) $


CREATE ; lpmpp = lpm - lpp ; lpfpp = lpf - lpp ; lpepp = lpe - lpp
; lplpp = lpl - lpp ; lpkpp = lpk - lpp $
CREATE ; lcp = lc - lpp $
NAMELIST ; x = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp $
REGRESS ; Lhs = lc ; Rhs = x ; Res = e $
CREATE ; e = -e ; e2 = e*e ; e3 = e2*e $
CALC ; m2 = Xbr(e2) ; m3 = Xbr(e3) $
CALC ; List ; su = (m3 * Sqr(pi/2) / (1-4/pi))^(1/3)
; sv = Sqr(m2 - (1-2/pi) * su^2)
; a = b(1) + su * Sqr(2/pi) ; lambda = su/sv
; sgma = Sqr(su^2 + sv^2) $
FRONTIER ; Lhs = lc ; Rhs = x ; Cost $
The first set of results below are the OLS estimates with the correction to the constant term
and the method of moments estimators of $\sigma_u$ and $\sigma_v$ used to start the MLE. The maximum likelihood
estimators are shown next. The estimates for the stochastic frontier model include the log likelihood
and the implied estimates of $\sigma_u$, $\sigma_v$ and their squares, based on the estimates of $\lambda = \sigma_u/\sigma_v$ and
$\sigma^2 = \sigma_u^2 + \sigma_v^2$, which are estimated by ML. (The reverse transformations are
$\sigma_u^2 = \sigma^2\lambda^2/(1 + \lambda^2)$ and $\sigma_v^2 = \sigma^2/(1 + \lambda^2)$.) The MLE is documented further in the next section.
-----------------------------------------------------------------------------
Ordinary least squares regression ............
LHS=LC Mean = 2.84024
Standard deviation = 1.09256
No. of observations = 256 Degrees of freedom
Regression Sum of Squares = 300.028 7
Residual Sum of Squares = 4.36487 248
Total Sum of Squares = 304.393 255
Standard error of e = .13267
Fit R-squared = .98566 R-bar squared = .98526
Model test F[ 7, 248] = 2435.25310 Prob F > F* = .00000
Diagnostic Log likelihood = 157.91523 Akaike I.C. = -4.00909
Restricted (b=0) = -385.41031 Bayes I.C. = -3.89830
Chi squared [ 7] = 1086.65108 Prob C2 > C2* = .00000

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error t |t|>T* Interval
--------+--------------------------------------------------------------------
Constant| 19.7932 27.45697 .72 .4717 -34.0214 73.6079
LY| .94303*** .01809 52.12 .0000 .90757 .97849
LY2| .08248*** .01236 6.67 .0000 .05825 .10671
LPKP| 1.42385 2.14849 .66 .5081 -2.78711 5.63480
LPLP| .01915 .10169 .19 .8508 -.18016 .21847
LPMP| .04504 1.41721 .03 .9747 -2.73264 2.82272
LPEP| -.57070 .67904 -.84 .4015 -1.90159 .76019
LPFP| -.04811** .01986 -2.42 .0161 -.08704 -.00919
--------+--------------------------------------------------------------------
[CALC] SU = .1296481
[CALC] SV = .1046056
[CALC] A = 19.8966785
[CALC] LAMBDA = 1.2393989
[CALC] SGMA = .1665862
Calculator: Computed 5 scalar results
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LCN
Log likelihood function 159.20743
Estimation based on N = 256, K = 10
Inf.Cr.AIC = -298.4 AIC/N = -1.166
Variances: Sigma-squared(v)= .01021
Sigma-squared(u)= .01890
Sigma(v) = .10103
Sigma(u) = .13746
Sigma = Sqr[(s^2(u)+s^2(v)]= .17059
Gamma = sigma(u)^2/sigma^2 = .64927
Var[u]/{Var[u]+Var[v]} = .40216
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] = 2.584
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LCN| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 19.8020 25.91115 .76 .4447 -30.9829 70.5869
LY| .95577*** .01781 53.68 .0000 .92088 .99067
LY2| .09086*** .01198 7.58 .0000 .06738 .11435
LPKP| 1.43400 2.02750 .71 .4794 -2.53982 5.40783
LPLP| .01242 .09676 .13 .8979 -.17722 .20205
LPMP| .05744 1.33747 .04 .9657 -2.56396 2.67883
LPEP| -.56860 .64356 -.88 .3770 -1.82995 .69275
LPFP| -.06002*** .01993 -3.01 .0026 -.09907 -.02096
|Variance parameters for compound error
Lambda| 1.36059*** .20306 6.70 .0000 .96261 1.75857
Sigma| .17059*** .00058 294.50 .0000 .16946 .17173
--------+--------------------------------------------------------------------

E62.7 Estimating the Normal-Half Normal and Normal-Exponential Models
ALS's canonical form of the model is the normal-half normal model,

\[ y = \beta'x + v - Su, \quad u = |U|, \quad S = +1 \text{ for production}, \; -1 \text{ for cost}, \]

\[ U \sim N[0,\sigma_u^2], \qquad v \sim N[0,\sigma_v^2]. \]

The command for estimating the stochastic frontier model is

FRONTIER ; Lhs = y ; Rhs = one, ... $

The default form is the normal-half normal model. In this form, model estimates consist of $\beta$,
$\sigma = \sqrt{\sigma_v^2 + \sigma_u^2}$ and $\lambda = \sigma_u/\sigma_v$, and the usual set of diagnostic statistics for models fit by maximum
likelihood. The other basic form in the ALS model is the exponential model,

\[ f(u) = \theta \exp(-\theta u), \quad u \ge 0, \]

which has mean inefficiency $E[u] = 1/\theta$ and standard deviation $\sigma_u = 1/\theta$. The parameters estimated in
the exponential specification are $(\beta, \theta, \sigma_v)$. The estimate of $\sigma_u$ is reported in the results as well.
The following illustrate the estimator, with a normal-half normal cost frontier and a normal-
exponential cost frontier. Both commands use ; Cost; full results are shown for each model.

FRONTIER ; Cost ; Lhs = lcn ; Rhs = x $


FRONTIER ; Cost ; Lhs = lcn; Rhs = x; Model = Exponential $

The stochastic frontier results include the standard output for MLEs. The derived estimates of $\sigma_u$, $\sigma_v$,
$\sigma_u^2$, $\sigma_v^2$ and $\sigma$ are shown as well. The value of $\gamma = \sigma_u^2/\sigma^2$ is given for comparability with other parts
of the literature. This ratio, which lies in (0,1), is sometimes reported as a variance decomposition of
$\varepsilon$. However, the variance of $u = |U|$ is $(1 - 2/\pi)\sigma_u^2$, so the appropriate decomposition is
$(1 - 2/\pi)\sigma_u^2/[\sigma_v^2 + (1 - 2/\pi)\sigma_u^2]$. This is the value shown next under $\gamma$ in the results.
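The difference between the two ratios is easy to see numerically. The sketch below, with an invented function name, computes both from the variance estimates in the header of the half normal results; it is meant only to illustrate the arithmetic described above.

```python
import math

def variance_shares(su2, sv2):
    """Return (gamma, share), where share uses Var[u] = (1 - 2/pi) * sigma_u^2."""
    gamma = su2 / (su2 + sv2)
    var_u = (1.0 - 2.0 / math.pi) * su2
    share = var_u / (var_u + sv2)
    return gamma, share

# Sigma-squared(u) and Sigma-squared(v) as reported for the airline cost frontier.
gamma, share = variance_shares(0.01890, 0.01021)
print(round(gamma, 4), round(share, 4))
```

The first value corresponds to the line labeled Gamma in the output and the second to Var[u]/{Var[u]+Var[v]}; small differences from the printed values reflect rounding of the inputs.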
A likelihood ratio test against the hypothesis of no inefficiency follows the variance
estimates. The degrees of freedom for the test are accumulated in the table. The first is for $\sigma_u$ in the
base case. The second is for the heteroscedasticity terms in Var[u] when they are introduced in the
model. Heteroscedasticity is developed in Chapter E63. The third term is for the truncation
parameters in the normal-truncated normal model, also developed in the next chapter. The "degrees
of freedom for the inefficiency model" are the sum of these three terms. The likelihood ratio statistic
is presented next. This is a nonstandard test because the null value of $\sigma_u$ is on the boundary of the
parameter space. Appropriate tables for the mixed chi squared test used here are given in Kodde and
Palm (1986). (A copy of the relevant parts of the table is kept internally by the program.) See, also,
Coelli, Rao and Battese (1998) for further details.
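For the base case with one degree of freedom, the boundary test can be sketched as follows. This is a simplified illustration, not the program's table lookup: it assumes the null distribution is a 50:50 mixture of a point mass at zero and a chi squared with one degree of freedom, which is the case that produces the 2.706 critical value at 95%. The function name is hypothetical.

```python
import math

def lr_boundary_test(logl_sf, logl_ls):
    """LR statistic and mixed chi squared p value for H0: sigma_u = 0 (1 d.f. case only)."""
    lr = 2.0 * (logl_sf - logl_ls)
    if lr <= 0.0:
        return lr, 1.0
    # P(chi2_1 > lr) = erfc(sqrt(lr/2)); the mixture halves this tail probability.
    p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

# Log likelihoods from the half normal results and the OLS (sigma(u)=0) benchmark.
lr, p_value = lr_boundary_test(159.20743, 157.91523)
print(round(lr, 3), round(p_value, 4), lr > 2.706)
```

With more than one restriction (heteroscedasticity or truncation terms), the mixture weights change and the Kodde and Palm tables should be used instead of this shortcut.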

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LCN
Log likelihood function 159.20743
Estimation based on N = 256, K = 10
Inf.Cr.AIC = -298.4 AIC/N = -1.166
Variances: Sigma-squared(v)= .01021
Sigma-squared(u)= .01890
Sigma(v) = .10103
Sigma(u) = .13746
Sigma = Sqr[(s^2(u)+s^2(v)]= .17059
Gamma = sigma(u)^2/sigma^2 = .64927
Var[u]/{Var[u]+Var[v]} = .40216
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] = 2.584
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LCN| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 19.8020 25.91115 .76 .4447 -30.9829 70.5869
LY| .95577*** .01781 53.68 .0000 .92088 .99067
LY2| .09086*** .01198 7.58 .0000 .06738 .11435
LPKP| 1.43400 2.02750 .71 .4794 -2.53982 5.40783
LPLP| .01242 .09676 .13 .8979 -.17722 .20205
LPMP| .05744 1.33747 .04 .9657 -2.56396 2.67883
LPEP| -.56860 .64356 -.88 .3770 -1.82995 .69275
LPFP| -.06002*** .01993 -3.01 .0026 -.09907 -.02096
|Variance parameters for compound error
Lambda| 1.36059*** .20306 6.70 .0000 .96261 1.75857
Sigma| .17059*** .00058 294.50 .0000 .16946 .17173
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Results for the normal-exponential model appear below. It is not possible to use a LR test to
choose between these two models. The test has zero degrees of freedom – neither model is obtained
by a restriction on the other. One possibility might be a Vuong (1989) statistic, which would be
computed as

\[ V = \frac{\sqrt{n}\,\bar{m}}{s_m}, \qquad m_i = \log(f_i \mid \text{normal}) - \log(f_i \mid \text{exponential}). \]

Results of the test are shown below the model results. The statistic is well inside the inconclusive
region.
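The Vuong computation can be sketched outside the program as follows, given the two vectors of observation-level log likelihoods. The function name is illustrative, and the use of the sample standard deviation with an n-1 divisor is an assumption about how the standard deviation is computed; with 256 observations the choice of divisor makes little difference.

```python
import math
import statistics

def vuong_statistic(logf_a, logf_b):
    """sqrt(n) * mean(m) / sd(m), with m_i = log f_a,i - log f_b,i."""
    m = [a - b for a, b in zip(logf_a, logf_b)]
    n = len(m)
    return math.sqrt(n) * statistics.mean(m) / statistics.stdev(m)

def vuong_verdict(v, critical=1.96):
    """Large positive values favor model a, large negative values favor model b."""
    if v > critical:
        return "favors a"
    if v < -critical:
        return "favors b"
    return "inconclusive"

# Tiny made-up example, followed by the value reported for the airline data.
v_demo = vuong_statistic([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(v_demo, vuong_verdict(-0.9047927))
```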

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LCN
Log likelihood function 159.89917
Estimation based on N = 256, K = 10
Inf.Cr.AIC = -299.8 AIC/N = -1.171
Exponential frontier model
Variances: Sigma-squared(v)= .01147
Sigma-squared(u)= .00568
Sigma(v) = .10709
Sigma(u) = .07539
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] = 3.968
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LCN| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 22.6569 25.48354 .89 .3740 -27.2899 72.6038
LY| .96069*** .01892 50.77 .0000 .92360 .99777
LY2| .09281*** .01249 7.43 .0000 .06832 .11729
LPKP| 1.65439 1.99409 .83 .4067 -2.25395 5.56272
LPLP| -.00962 .09785 -.10 .9217 -.20140 .18216
LPMP| -.06595 1.31569 -.05 .9600 -2.64465 2.51275
LPEP| -.62841 .63243 -.99 .3204 -1.86795 .61114
LPFP| -.06397*** .02033 -3.15 .0017 -.10381 -.02412
|Variance parameters for compound error
Theta| 13.2651*** 2.90719 4.56 .0000 7.5671 18.9630
Sigmav| .10709*** .00980 10.93 .0000 .08788 .12629
--------+--------------------------------------------------------------------

FRONTIER ; … half normal model $


CREATE ; fn = logl_obs $
FRONTIER ; … Model = Exponential $
CREATE ; fe = logl_obs
; mi = fn - fe $
CALC ; List
; vuong = Sqr(n) * Xbr(mi)/Sdv(mi) $
[CALC] VUONG = -.9047927

E62.7.1 Log Likelihoods for the Half Normal and Exponential Models
As will be evident below, different formulations of the log likelihood are most convenient
for estimation of the different forms of the frontier models. (And, different authors sometimes
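parameterize the models differently.) The base case is the normal-half normal model. In this form,
$v_i \sim N[0,\sigma_v^2]$ and $u_i = |U_i|$ where $U_i \sim N[0,\sigma_u^2]$. It follows that
$f(u_i) = (2/\sigma_u)\,\phi(u_i/\sigma_u)$, $u_i \ge 0$. The density of $\varepsilon_i = v_i - u_i$ has been shown to be

\[ f(\varepsilon_i) = (2/\sigma)\,\phi(\varepsilon_i/\sigma)\,\Phi(-\varepsilon_i\lambda/\sigma). \]

The most common form of the individual term in the log likelihood function (and the one used in
LIMDEP) is

\[ \log L_i = \tfrac{1}{2}\log(2/\pi) - \log\sigma - \tfrac{1}{2}(\varepsilon_i/\sigma)^2 + \log\Phi[-S\varepsilon_i\lambda/\sigma] \]

where
$\varepsilon_i = y_i - \beta'x_i$,
$\lambda = \sigma_u/\sigma_v$,
$\sigma^2 = \sigma_u^2 + \sigma_v^2$, $\sigma_v^2 = \sigma^2/(1 + \lambda^2)$, $\sigma_u^2 = \sigma^2\lambda^2/(1 + \lambda^2)$,
S = +1 for production frontier, -1 for cost frontier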
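A direct transcription of the individual log likelihood term into Python may help in checking hand calculations. This is a sketch of the formula above, not LIMDEP's implementation; the function names are invented, and the normal CDF is computed from the standard library's erfc. A useful sanity check is that when lambda = 0 (no inefficiency) the term collapses to the ordinary normal log density.

```python
import math

def log_norm_cdf(x):
    """log of the standard normal CDF, via erfc (adequate for moderate x)."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))

def loglik_half_normal(eps, sigma, lam, S=1):
    """log L_i for the normal-half normal model; S=+1 production, S=-1 cost."""
    z = eps / sigma
    return (0.5 * math.log(2.0 / math.pi) - math.log(sigma)
            - 0.5 * z * z + log_norm_cdf(-S * eps * lam / sigma))

def log_normal_density(eps, sigma):
    """log density of N[0, sigma^2], used only for the comparison below."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * (eps / sigma) ** 2

# With lambda = 0 the frontier term equals the normal log density (made-up inputs).
gap = loglik_half_normal(0.3, 0.5, 0.0) - log_normal_density(0.3, 0.5)
print(gap)
```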

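Olsen's transformation is used for maximizing the log likelihood. We reparameterize the function in
terms of $\theta = 1/\sigma$ and $\gamma = (1/\sigma)\beta$. Then,

\[ \log L_i = \tfrac{1}{2}\log(2/\pi) + \log\theta - \tfrac{1}{2}\tau_i^2 + \log\Phi(-S\lambda\tau_i) \]

where $\tau_i = \theta y_i - \gamma'x_i$.
Define the functions

\[ a_i = -S\lambda\tau_i, \qquad \delta_i = \phi(a_i)/\Phi(a_i), \qquad \Delta_i = -a_i\delta_i - \delta_i^2. \]

Then, the gradient and Hessian are

\[ \frac{\partial \log L_i}{\partial\begin{pmatrix}\gamma\\ \theta\\ \lambda\end{pmatrix}} = \tau_i\begin{pmatrix} x_i\\ -y_i\\ 0\end{pmatrix} + \delta_i S\begin{pmatrix}\lambda x_i\\ -\lambda y_i\\ -\tau_i\end{pmatrix} + \begin{pmatrix}0\\ 1/\theta\\ 0\end{pmatrix} \]

\[ \frac{\partial^2 \log L_i}{\partial\begin{pmatrix}\gamma\\ \theta\\ \lambda\end{pmatrix}\partial\begin{pmatrix}\gamma' & \theta & \lambda\end{pmatrix}} = -\begin{pmatrix} x_i x_i' & -y_i x_i & 0\\ -y_i x_i' & y_i^2 & 0\\ 0' & 0 & 0\end{pmatrix} + \Delta_i\begin{pmatrix}\lambda^2 x_i x_i' & -\lambda^2 y_i x_i & -\lambda\tau_i x_i\\ -\lambda^2 y_i x_i' & \lambda^2 y_i^2 & \lambda\tau_i y_i\\ -\lambda\tau_i x_i' & \lambda\tau_i y_i & \tau_i^2\end{pmatrix} + \begin{pmatrix}0 & 0 & \delta_i S x_i\\ 0' & -1/\theta^2 & -\delta_i S y_i\\ \delta_i S x_i' & -\delta_i S y_i & 0\end{pmatrix} \]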
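Analytic derivatives of this kind are easy to get wrong by a sign, so a finite difference comparison is a useful safeguard. The Python sketch below codes the log likelihood term under Olsen's parameterization (gamma, theta, lambda) together with the gradient obtained by differentiating it, and compares the two numerically. All names and the data values are made up for the illustration; the gradient shown is derived from the log likelihood as coded here and is not a copy of the program's internal code.

```python
import math

def phi(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def Phi(a):
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def loglik(params, y, x, S=1):
    """params = [gamma_1..gamma_K, theta, lambda]; tau = theta*y - gamma'x."""
    *gamma, theta, lam = params
    tau = theta * y - sum(g * xj for g, xj in zip(gamma, x))
    return (0.5 * math.log(2.0 / math.pi) + math.log(theta)
            - 0.5 * tau * tau + math.log(Phi(-S * lam * tau)))

def gradient(params, y, x, S=1):
    """Analytic gradient with respect to (gamma, theta, lambda)."""
    *gamma, theta, lam = params
    tau = theta * y - sum(g * xj for g, xj in zip(gamma, x))
    a = -S * lam * tau
    delta = phi(a) / Phi(a)
    common = tau + delta * S * lam
    return [common * xj for xj in x] + [-common * y + 1.0 / theta, -delta * S * tau]

def numeric_gradient(params, y, x, S=1, h=1e-6):
    """Central finite differences, one parameter at a time."""
    out = []
    for k in range(len(params)):
        up, dn = list(params), list(params)
        up[k] += h
        dn[k] -= h
        out.append((loglik(up, y, x, S) - loglik(dn, y, x, S)) / (2.0 * h))
    return out

params = [3.0, 2.5, 6.0, 1.4]     # two gammas, theta, lambda: made-up values
y, x = 0.9, [1.0, 0.8]            # one made-up observation
max_gap = max(abs(a - b) for a, b in
              zip(gradient(params, y, x, S=-1), numeric_gradient(params, y, x, S=-1)))
print(max_gap)
```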

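The log likelihood for the exponential model is

\[ \log L_i = \log\theta + \tfrac{1}{2}\theta^2\sigma_v^2 + \theta S\varepsilon_i + \log\Phi[-S\varepsilon_i/\sigma_v - \theta\sigma_v]. \]

The parameter $\theta$ in the exponential model is $1/\sigma_u$. The Olsen transformation is not useful for this
model. Define $c_i = -S\varepsilon_i/\sigma_v - \theta\sigma_v$, $\delta_i = \phi(c_i)/\Phi(c_i)$, $\Delta_i = -c_i\delta_i - \delta_i^2$ and
$a_i = S\varepsilon_i/\sigma_v^2 - \theta$. The gradient and Hessian for the exponential model are

\[ \frac{\partial \log L_i}{\partial\begin{pmatrix}\beta\\ \theta\\ \sigma_v\end{pmatrix}} = \delta_i\begin{pmatrix} S x_i/\sigma_v\\ -\sigma_v\\ S\varepsilon_i/\sigma_v^2 - \theta\end{pmatrix} + \begin{pmatrix} -\theta S x_i\\ 1/\theta + \theta\sigma_v^2 + S\varepsilon_i\\ \theta^2\sigma_v\end{pmatrix} \]

\[ \frac{\partial^2 \log L_i}{\partial\begin{pmatrix}\beta\\ \theta\\ \sigma_v\end{pmatrix}\partial\begin{pmatrix}\beta' & \theta & \sigma_v\end{pmatrix}} = \Delta_i\begin{pmatrix} x_i x_i'/\sigma_v^2 & -S x_i & a_i S x_i/\sigma_v\\ -S x_i' & \sigma_v^2 & -a_i\sigma_v\\ a_i S x_i'/\sigma_v & -a_i\sigma_v & a_i^2\end{pmatrix} + \begin{pmatrix} 0 & -S x_i & -\delta_i S x_i/\sigma_v^2\\ -S x_i' & -1/\theta^2 + \sigma_v^2 & 2\theta\sigma_v - \delta_i\\ -\delta_i S x_i'/\sigma_v^2 & 2\theta\sigma_v - \delta_i & \theta^2 - 2\delta_i S\varepsilon_i/\sigma_v^3\end{pmatrix} \]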
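One way to confirm that the exponential log likelihood term is a proper log density is to exponentiate it and integrate over the composed error. The following Python sketch does this with a simple trapezoid rule; the function names, the integration limits and the parameter values are choices made for the illustration only.

```python
import math

def log_Phi(c):
    """log of the standard normal CDF via erfc."""
    return math.log(0.5 * math.erfc(-c / math.sqrt(2.0)))

def loglik_exponential(eps, theta, sigma_v, S=1):
    """log L_i for the normal-exponential model; S=+1 production, S=-1 cost."""
    return (math.log(theta) + 0.5 * theta ** 2 * sigma_v ** 2 + theta * S * eps
            + log_Phi(-S * eps / sigma_v - theta * sigma_v))

def integrate_density(theta, sigma_v, S=1, lo=-3.0, hi=3.0, steps=6000):
    """Trapezoid rule for the integral of exp(log L_i) over eps in [lo, hi]."""
    h = (hi - lo) / steps
    vals = [math.exp(loglik_exponential(lo + k * h, theta, sigma_v, S))
            for k in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Made-up values: theta = 10 (so sigma_u = 0.1) and sigma_v = 0.1.
total_production = integrate_density(10.0, 0.1, S=1)
total_cost = integrate_density(10.0, 0.1, S=-1)
print(total_production, total_cost)
```

Both integrals should be very close to one. For parameter values far from these, the integration limits would need to be widened accordingly.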

E62.7.2 Alternative Parameterization


Some treatments of the normal-half normal model (e.g., Coelli (1996)) use the alternative
parameterization $\gamma = \sigma_u^2/\sigma^2$ in the formulation of the log likelihood. This does not change the
model, since it is a one to one transformation of the parameters;

\[ \gamma = \frac{\lambda^2}{1 + \lambda^2}, \quad\text{equivalently}\quad \lambda^2 = \frac{\gamma}{1 - \gamma}. \]

The parameterization in terms of $\lambda$ is more convenient but does not produce different results.

E62.7.3 Variance Estimator in Frontier 4.1


A number of researchers have used Tim Coelli's (1996) Frontier 4.1 program for estimation of
stochastic frontier models. Frontier 4.1 and LIMDEP use different methods for computing estimators
of the asymptotic covariance matrix of the ML estimator. LIMDEP uses either the BHHH estimator or
the negative inverse of the Hessian. Frontier 4.1 uses the weighting matrix used by the DFP algorithm
to approximate the inverse Hessian during the iterations. As a general proposition, we recommend
against this 'estimator,' and never use it. There is no theoretical assurance of its accuracy if
convergence is reached in a finite number of iterations. Nonetheless, we have been asked about this
many times. In the interest of methodological advance, LIMDEP provides a command switch,
; F41
that will invoke this estimator. (This is only provided for the stochastic frontier estimators.) No
indication is given in the output that this option has been used.

E62.8 Estimating Inefficiency and Efficiency Measures


The main objective of fitting the frontier models is to estimate the inefficiency terms in the
stochastic model, ui, by observation. The Jondrow et al. (JLMS) estimator of $E[u|\varepsilon]$ is the standard estimator.
This is

\[ \hat{E}[u \mid \varepsilon] = \frac{\sigma\lambda}{1 + \lambda^2}\left[\frac{\phi(w)}{1 - \Phi(w)} - w\right], \qquad \varepsilon = v - Su, \qquad w = S\varepsilon\lambda/\sigma. \]

(This is an indirect estimator of u. Unfortunately, it is not possible to estimate ui directly from any
observed sample information. The various surveys noted earlier discuss the computation of and
properties of this estimator.) The counterpart for the normal-exponential model is

\[ \hat{E}[u \mid \varepsilon] = \sigma_v\left[\frac{\phi(w)}{1 - \Phi(w)} - w\right], \qquad w = S\varepsilon/\sigma_v + \theta\sigma_v. \]
These are computed and saved as new variables in your data set with

; Eff = variable name

The ; List specification will also request a listing of this variable. This form is used for all
distributions and all variations of the stochastic frontier model.
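The two conditional mean estimators can be sketched in Python as follows. The function names are illustrative and this is not the program's code; eps stands for the estimated composed residual, and the parameter values in the example are the rounded ML estimates reported earlier for the half normal and exponential cost frontiers. The corresponding efficiency value is obtained as exp(-E[u|eps]).

```python
import math

def _phi(w):
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def _upper_tail(w):
    """1 - Phi(w), computed with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(w / math.sqrt(2.0))

def jlms_half_normal(eps, sigma, lam, S=1):
    """E[u|eps] for the normal-half normal model; S=+1 production, S=-1 cost."""
    w = S * eps * lam / sigma
    return sigma * lam / (1.0 + lam ** 2) * (_phi(w) / _upper_tail(w) - w)

def jlms_exponential(eps, sigma_v, theta, S=1):
    """E[u|eps] for the normal-exponential model."""
    w = S * eps / sigma_v + theta * sigma_v
    return sigma_v * (_phi(w) / _upper_tail(w) - w)

sigma, lam = 0.17059, 1.36059            # half normal cost frontier estimates
u_small = jlms_half_normal(-0.10, sigma, lam, S=-1)
u_large = jlms_half_normal(0.25, sigma, lam, S=-1)
u_expo = jlms_exponential(0.25, 0.10709, 13.2651, S=-1)
cost_eff = math.exp(-u_large)
print(u_small, u_large, u_expo, cost_eff)
```

For a cost frontier, a larger positive residual produces a larger estimated inefficiency and therefore a lower efficiency value.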
By adding ; Eff = u to the frontier command, then following it with

KERNEL ; Rhs = u $

we obtain the results below. (We also added the title to the command with ; Title = …) Note an
important element of the estimation. The 'Standard Deviation' reported below is 0.054895, whereas
the estimate of $\sigma_u$ is 0.13746. The difference arises because the 0.054895 is an estimate of the
standard deviation of $E[u|\varepsilon]$, not the standard deviation of u.

+---------------------------------------+
| Kernel Density Estimator for U |
| Observations = 256 |
| Points plotted = 256 |
| Bandwidth = .016298 |
| Statistics for abscissa values---- |
| Mean = .109394 |
| Standard Deviation = .054895 |
| Minimum = .030722 |
| Maximum = .350422 |
| ---------------------------------- |
| Kernel Function = Logistic |
| Cross val. M.S.E. = .000000 |
| Results matrix = KERNEL |
+---------------------------------------+

Figure E62.4 Analysis of Estimated Inefficiencies

E62.8.1 Estimating Technical or Cost Efficiency


One might be interested in estimating the 'efficiency' of the individuals in the sample. The
model is usually specified in logs, of the form

\[ \log y = \beta'x + v - u. \]

Under this assumption, the efficiency of the individual would be

\[ \text{EFF} = \frac{y}{\text{Optimal } y} = \exp(-u). \]

This can be obtained with

; Techeff = the variable name


or ; Costeff = the variable name

if you estimate a cost frontier instead. You may compute both inefficiencies and efficiency measures
in the same command. Figure E62.5 was obtained by adding

; Costeff = ecu

to the FRONTIER command, then requesting the kernel density estimator as before (with the title
changed accordingly).

Figure E62.5 Estimated Cost Efficiencies

E62.8.2 Confidence Intervals for Inefficiency and Efficiency Estimates
Horrace and Schmidt (1996, 2000) suggest a useful extension of the Jondrow result. JLMS
have shown that the distribution of $u_i|\varepsilon_i$ is that of a $N[\mu_i^*, \sigma^{*2}]$ random variable, truncated from the left
at zero, where $\mu_i^* = -\varepsilon_i\lambda^2/(1+\lambda^2)$ and $\sigma^* = \sigma\lambda/(1+\lambda^2)$. This result and standard results for the
truncated normal distribution (see, e.g., Greene (2011)) can be used to obtain the conditional mean
and variance of $u_i|\varepsilon_i$. With these in hand, one can construct some of the features of the distribution of
$u_i|\varepsilon_i$ or $E[TE_i|\varepsilon_i] = E[\exp(-u_i)|\varepsilon_i]$. The literature on this subject, including the important contributions
of Bera and Sharma (1999) and Kim and Schmidt (2000), refers generally to 'confidence intervals' for
$u_i|\varepsilon_i$. For reasons that will be clear shortly, we will not use that term – at least not yet, until we have
made more precise what we are estimating.
For locating $100(1-\alpha)\%$ of the conditional distribution of $u_i|\varepsilon_i$, we use the following system
of equations

\[ \sigma^2 = \sigma_v^2 + \sigma_u^2 \]
\[ \lambda = \sigma_u/\sigma_v \]
\[ \mu_i^* = -\varepsilon_i\sigma_u^2/\sigma^2 = -\varepsilon_i\lambda^2/(1+\lambda^2) \]
\[ \sigma^* = \sigma_u\sigma_v/\sigma = \sigma\lambda/(1 + \lambda^2) \]
\[ LB_i = \mu_i^* + \sigma^*\,\Phi^{-1}\left[1 - (1 - \tfrac{\alpha}{2})\,\Phi(\mu_i^*/\sigma^*)\right] \]
\[ UB_i = \mu_i^* + \sigma^*\,\Phi^{-1}\left[1 - \tfrac{\alpha}{2}\,\Phi(\mu_i^*/\sigma^*)\right] \]

Then, if the elements were the true parameters, the region $[LB_i, UB_i]$ would encompass $100(1-\alpha)\%$ of
the distribution of $u_i|\varepsilon_i$. For constructing 'confidence intervals' for technical efficiency, $TE_i|\varepsilon_i$, it is
necessary only to compute $TEUB_i = \exp(-LB_i)$ and $TELB_i = \exp(-UB_i)$.
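The bounds can be sketched in Python with the standard library's normal distribution object. The function names are illustrative, the sign convention is the one in the equations above (a production frontier, eps = v - u), and the parameter values are the rounded half normal estimates reported earlier, used here only as plausible inputs. The helper truncated_cdf is included so that the coverage of the resulting region can be verified directly.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def star_parameters(eps, sigma_u, sigma_v):
    """Return (mu*, sigma*) of the normal distribution that is truncated at zero."""
    s2 = sigma_u ** 2 + sigma_v ** 2
    return -eps * sigma_u ** 2 / s2, sigma_u * sigma_v / math.sqrt(s2)

def conditional_bounds(eps, sigma_u, sigma_v, alpha=0.05):
    """Lower and upper bounds enclosing 100(1-alpha)% of the distribution of u|eps."""
    mu, s = star_parameters(eps, sigma_u, sigma_v)
    p = STD_NORMAL.cdf(mu / s)
    lb = mu + s * STD_NORMAL.inv_cdf(1.0 - (1.0 - alpha / 2.0) * p)
    ub = mu + s * STD_NORMAL.inv_cdf(1.0 - (alpha / 2.0) * p)
    return lb, ub

def truncated_cdf(t, mu, s):
    """P(u <= t | eps) for u ~ N[mu, s^2] truncated from the left at zero."""
    p0 = STD_NORMAL.cdf(-mu / s)
    return (STD_NORMAL.cdf((t - mu) / s) - p0) / (1.0 - p0)

eps, su, sv = -0.05, 0.13746, 0.10103
lb, ub = conditional_bounds(eps, su, sv)
mu, s = star_parameters(eps, su, sv)
coverage = truncated_cdf(ub, mu, s) - truncated_cdf(lb, mu, s)
te_lower, te_upper = math.exp(-ub), math.exp(-lb)
print(lb, ub, coverage, te_lower, te_upper)
```

For extreme residuals, the argument passed to inv_cdf can round to one, in which case the call raises an error; a production implementation would need to guard against that.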

We note two caveats about the estimator. First, the received papers based on classical
methods have labeled this a confidence interval for ui. However, it is a range that encompasses
$100(1-\alpha)\%$ of the probability in the conditional distribution of $u_i|\varepsilon_i$. It is based on $E[u_i|\varepsilon_i]$, not ui itself.
The interval is 'centered' at the estimator of the conditional mean, $E[u_i|\varepsilon_i]$, not the estimator of ui
itself, as a conventional 'confidence interval' would be. The estimator is actually characterizing the
conditional distribution of $u_i|\varepsilon_i$, not constructing any kind of interval that brackets a particular ui –
that is not possible. Second, these limits are conditioned on known values of the parameters, so they
ignore any variation in the parameter estimates used to construct them. Thus, we regard this as a
minimal width interval.
You can request computation of these lower and upper bounds by adding

; CI(100(1-α)) = lower, upper

where 100(1-α) is one of 90, 95, or 99 and lower, upper are names for two variables that will be
created. You may use this feature with ; Eff = variable or ; Techeff = variable (or ; Costeff =
variable for a cost frontier). If you have both ; Eff and ; Techeff in the command, the confidence
intervals are computed for ; Techeff. (You can obtain the interval for ; Eff in this case by computing
the negatives of the logs with CREATE.)
We obtained these bounds for our cost function with

; Costeff = euc ; CI(95) = eucl,eucu

We followed the estimation with

PLOT ; Rhs = eucl,ecu,eucu
; Title = Upper and Lower Bound Estimates of Cost Efficiency
; Vaxis = Cost Efficiency $

to obtain Figure E62.6.

Figure E62.6 Lower and Upper Bound Estimates of Cost Efficiency



The centipede plot is also a useful device in this context. The following redraws Figure E62.6 using
a different view for the lower and upper bounds:

CREATE ; Firm_i = Trn(1,1) $


PLOT ; Lhs = firm_i ; Rhs = eucl,eucu
; Centipede ; Endpoints = 0,260 ; Grid
; Title = Confidence Limits for Cost Efficiency $

Figure E62.7 Centipede Plot of Efficiency Bounds

E62.8.3 Partial Effects on Efficiencies


The variables in the production or cost frontier function begin with either the inputs for the
production model or input prices and outputs in the cost model. Analyses of how these variables
affect technical or cost efficiency are not likely to be particularly revealing. However, if the function
includes environmental variables (we call these zi), it might be of interest to examine how variation
in these impacts efficiency. For our example, we consider

\[ \log(Cost/P_p) = \alpha + \beta_q \log Q + \beta_{qq} \log^2 Q + \sum_k \beta_k \log(P_k/P_p) + \beta_L\,\text{load factor} + \beta_N\,\text{nodes} + \beta_S \log \text{stage length} + v + u \]

In this case, it might be interesting to examine how increased load factor, route complexity, or stage
length impact efficiency.
Expressions for the technical inefficiency values appear at the beginning of Section E62.8.
In those expressions, we will use

\[ \text{Efficiency} = \exp\{-\hat{E}[u \mid \varepsilon]\}. \]

The two expressions for the normal and exponential models are functions of a $w(\varepsilon)$ that is specific to
the model. Each may be written as

\[ \text{Efficiency} = \exp\{-\sigma_m A[w_m(\varepsilon)]\} \]

where m = half normal or exponential, $\sigma_m = \sigma\lambda/(1+\lambda^2)$ for the half normal and $\sigma_v$ for the
exponential, and $w_m$ is defined earlier. We now suppose that

\[ \varepsilon = y - \beta'x - \delta'z \]

where x is the theoretical inputs to the goal and z are the environmental variables. We require the
derivatives with respect to z. For convenience, let W = -w and exploit the symmetry of the normal
density. Then, $A[w_m(\varepsilon)] = [\phi(W)/\Phi(W) + W]$. The derivative is

\[ \partial\,\text{Efficiency}/\partial z = \text{Efficiency} \times [-\sigma_m] \times [dA(W)/dW] \times [-1] \times [\partial w_m/\partial\varepsilon] \times [-\delta]. \]

The two terms that we need to complete the derivation are $\partial w_m/\partial\varepsilon = S\lambda/\sigma$ for the half normal model
and $S/\sigma_v$ for the exponential model and

\[ \frac{dA(W)}{dW} = 1 - W\,\frac{\phi(W)}{\Phi(W)} - \left[\frac{\phi(W)}{\Phi(W)}\right]^2 = D(W). \]

Collecting terms,

\[ \frac{\partial\,\text{Efficiency}}{\partial z} = \text{Efficiency} \times D(W) \times \left[\frac{\lambda^2}{1+\lambda^2} \text{ or } 1\right] \times S \times (-\delta). \]

We can sign this result, though the magnitude will be empirical. The first three terms are all between
zero and one, as is their product. S is either +1 for a production frontier or -1 for a cost frontier.
Thus, in total, the derivative is a fraction of the corresponding coefficient, which takes the same sign
for a cost frontier and the opposite sign for a production frontier.
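The collected expression for the half normal case can be checked against a finite difference in a few lines of Python. This is a sketch under the assumptions of this section (z enters only through eps = y - beta'x - delta'z); the function names are invented, and the numerical inputs are the rounded estimates of sigma, lambda and the load factor coefficient from the example that follows, with a made-up residual.

```python
import math

def phi(w):
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def Phi(w):
    return 0.5 * math.erfc(-w / math.sqrt(2.0))

def efficiency(eps, sigma, lam, S):
    """exp(-E[u|eps]) for the normal-half normal model."""
    w = S * eps * lam / sigma
    A = phi(w) / Phi(-w) - w
    return math.exp(-sigma * lam / (1.0 + lam ** 2) * A)

def d_efficiency_dz(eps, sigma, lam, S, delta):
    """Analytic partial effect: Efficiency * D(W) * lam^2/(1+lam^2) * S * (-delta)."""
    W = -S * eps * lam / sigma
    ratio = phi(W) / Phi(W)
    D = 1.0 - W * ratio - ratio ** 2
    return efficiency(eps, sigma, lam, S) * D * lam ** 2 / (1.0 + lam ** 2) * S * (-delta)

def numeric_d_efficiency_dz(eps, sigma, lam, S, delta, h=1e-6):
    """A step of h in z moves eps by -delta*h; use central differences."""
    up = efficiency(eps - delta * h, sigma, lam, S)
    dn = efficiency(eps + delta * h, sigma, lam, S)
    return (up - dn) / (2.0 * h)

eps, sigma, lam, S, delta = 0.05, 0.12539, 0.95827, -1, -0.99466
analytic = d_efficiency_dz(eps, sigma, lam, S, delta)
numeric = numeric_d_efficiency_dz(eps, sigma, lam, S, delta)
print(analytic, numeric)
```

For the cost frontier (S = -1), the computed effect has the same sign as the coefficient and is smaller in absolute value, as the discussion above indicates.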
Partial derivatives and simulations are computed with PARTIALS and SIMULATE. The
general approach would be

FRONTIER ; Cost (optional)
; Lhs = goal variable
; Rhs = one, x variables, z variables $

The command might also contain ; Eff = variable, ; Techeff = variable or ; Costeff = variable.
Then, you may follow it with

PARTIALS ; Effects: variables desired ; other options $


or SIMULATE ; Scenario … all options $

The function analyzed in these two commands is the technical or cost efficiency,

\[ \text{Efficiency} = \exp\{-\hat{E}[u \mid \varepsilon]\}. \]

The following demonstrates using the cost frontier, with variables z = (load factor, log stage length,
points served). Data on z are missing for one of the firms.

CREATE ; logstage = Log(stage) $


NAMELIST ; x = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp
; z = loadfctr,logstage,points $
FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z
; Eff = u ; Costeff = euc ; CI(95) = eucl,eucu $
SIMULATE ; Scenario: & loadfctr = .4(.025)1 ; Plot(ci) $
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LC
Log likelihood function 215.15699
Estimation based on N = 256, K = 13
Inf.Cr.AIC = -404.3 AIC/N = -1.579
Variances: Sigma-squared(v)= .00820
Sigma-squared(u)= .00753
Sigma(v) = .09054
Sigma(u) = .08676
Sigma = Sqr[(s^2(u)+s^2(v)]= .12539
Gamma = sigma(u)^2/sigma^2 = .47870
Var[u]/{Var[u]+Var[v]} = .25020
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 214.75424
Chi-sq=2*[LogL(SF)-LogL(LS)] = .806
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 9.19939 21.64273 .43 .6708 -33.21957 51.61835
LY| .97398*** .01751 55.63 .0000 .93966 1.00829
LY2| .05123*** .01029 4.98 .0000 .03106 .07140
LPKP| .49455 1.69257 .29 .7701 -2.82283 3.81193
LPLP| .13721* .08121 1.69 .0911 -.02195 .29637
LPMP| .45863 1.11624 .41 .6812 -1.72915 2.64642
LPEP| -.10302 .53634 -.19 .8477 -1.15422 .94818
LPFP| -.02090 .01794 -1.16 .2441 -.05607 .01427
LOADFCTR| -.99466*** .17446 -5.70 .0000 -1.33660 -.65273
LOGSTAGE| -.17940*** .02531 -7.09 .0000 -.22902 -.12979
POINTS| .00164*** .00031 5.20 .0000 .00102 .00225
|Variance parameters for compound error
Lambda| .95827*** .16869 5.68 .0000 .62763 1.28890
Sigma| .12539*** .00039 321.29 .0000 .12463 .12616
--------+--------------------------------------------------------------------

---------------------------------------------------------------------
Model Simulation Analysis for JLMS efficiency estimator in SF model
---------------------------------------------------------------------
Simulations are computed by average over sample observations
---------------------------------------------------------------------
User Function Function Standard
(Delta method) Value Error |t| 95% Confidence Interval
---------------------------------------------------------------------
Avrg. Function .93354 .00635 147.07 .92110 .94598
LOADFCTR= .40 .95844 .00346 277.19 .95166 .96522
LOADFCTR= .43 .95502 .00344 277.54 .94827 .96176
LOADFCTR= .45 .95123 .00357 266.70 .94424 .95822
LOADFCTR= .48 .94706 .00392 241.56 .93937 .95474
LOADFCTR= .50 .94247 .00456 206.48 .93353 .95142
LOADFCTR= .53 .93746 .00552 169.87 .92664 .94828
(some rows omitted)
LOADFCTR= .83 .84622 .03145 26.91 .78458 .90786
LOADFCTR= .85 .83696 .03384 24.73 .77063 .90329
LOADFCTR= .88 .82763 .03616 22.89 .75676 .89850
LOADFCTR= .90 .81827 .03839 21.32 .74303 .89352
LOADFCTR= .93 .80892 .04053 19.96 .72947 .88836
LOADFCTR= .95 .79958 .04259 18.78 .71611 .88305
LOADFCTR= .98 .79029 .04455 17.74 .70296 .87761

Figure E62.8 Simulated Cost Efficiency Values

We have also analyzed the partial effects.

FRONTIER ; Cost ; Lhs = lcp ; Rhs = x,z $


PARTIALS ; Effects: loadfctr & loadfctr = .4(.025)1 ; Plot(ci) $
PARTIALS ; Effects: z ; Summary $

---------------------------------------------------------------------
Partial Effects Analysis for JLMS efficiency estimator in SF model
---------------------------------------------------------------------
Effects on function with respect to LOADFCTR
Results are computed by average over sample observations
Partial effects for continuous LOADFCTR computed by differentiation
Effect is computed as derivative = df(.)/dx
---------------------------------------------------------------------
df/dLOADFCTR Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
---------------------------------------------------------------------
APE. Function -.22444 .06690 3.35 -.35557 -.09331
LOADFCTR= .40 -.13020 .02575 5.06 -.18067 -.07973
LOADFCTR= .43 -.14405 .03134 4.60 -.20547 -.08263
LOADFCTR= .45 -.15900 .03766 4.22 -.23281 -.08519
LOADFCTR= .48 -.17497 .04464 3.92 -.26246 -.08748
(Some rows omitted)
LOADFCTR= .85 -.37205 .09615 3.87 -.56051 -.18359
LOADFCTR= .88 -.37392 .09265 4.04 -.55551 -.19234
LOADFCTR= .90 -.37452 .08896 4.21 -.54887 -.20017
LOADFCTR= .93 -.37403 .08524 4.39 -.54109 -.20697
LOADFCTR= .95 -.37265 .08160 4.57 -.53259 -.21271
LOADFCTR= .98 -.37054 .07813 4.74 -.52368 -.21739

Figure E62.9 Partial Effects of Load Factor

---------------------------------------------------------------------
Partial Effects for JLMS efficiency estimator in SF model
Partial Effects Averaged Over Observations
* ==> Partial Effect for a Binary Variable
---------------------------------------------------------------------
Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
---------------------------------------------------------------------
LOADFCTR -.25723 .07389 3.48 -.40205 -.11240
LOGSTAGE -.04620 .01292 3.58 -.07153 -.02088
POINTS .00035 .00012 2.95 .00012 .00058
---------------------------------------------------------------------

E62.8.4 Partial Effects of Model Variables on Efficiencies


The preceding has examined the partial effects with respect to z in the model

\[ y = \beta'x + \delta'z + v - Su. \]

It was noted that partial effects with respect to x are not likely to be particularly interesting.
Nonetheless, they could be computed.

NOTE: Partial effects of variables in the stochastic frontier efficiency models may be computed
with respect to any variable in any model, regardless of where those variables appear in the model.
That includes x in the original frontier model, z in the means of the truncated regression formats, and
z in the variances of the heteroscedasticity models.

To continue the earlier example, the partial effect of LogQ could be computed in the cost function
using
NAMELIST ; x = one,lq,lq^2,lpmpp,lpfpp,lpepp,lplpp,lpkpp $
NAMELIST ; z = loadfctr,logstage,points $
FRONTIER ; Cost ; Lhs = lcp ; Rhs = x,z $
PARTIALS ; Effects : lq ; summary $

Note that the specification will correctly account for the fact that the square of LogQ appears in the
cost function when it computes the partial effects.

E62.8.5 Examining Ranks of Inefficiencies


Researchers often analyze outcome data in which the absolute values of the inefficiencies are
not necessarily of interest. Rather, it is the ranking of observations that they wish to analyze. The
WHO analysis of health care attainment (see Section E62.4.2) is a prominent example. LIMDEP
provides several tools for examining ranks of inefficiencies.
First, to rank the raw observations on efficiency or inefficiency, use

CREATE ; rank variable = Rnk(variable) $

The Rnk function sorts the data for you and creates the ranking variable. The observation with the
highest value gets the rank of one. The lowest gets a rank of n. Note, tied observations do not get the
same rank. Tied observations are ranked in the order in which they appear in the data. For example, in
a sample of 100, if 10 observations are tied for third place, they will receive ranks 3 through 12.
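
The ranking rule just described can be sketched in a few lines of Python. This is an illustration only, not LIMDEP's Rnk routine; the function name rank_descending is hypothetical. It assumes, as stated above, that the largest value receives rank one and that tied values are ranked in the order in which they appear in the data.

```python
def rank_descending(values):
    """Rank 1 goes to the largest value; ties keep their order of appearance."""
    # Python's sort is stable, so equal values stay in their original data order.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical efficiency values with two pairs of ties.
scores = [0.91, 0.97, 0.85, 0.97, 0.91]
print(rank_descending(scores))  # [3, 1, 5, 2, 4]
```

The two observations tied at 0.97 receive ranks 1 and 2, and the two tied at 0.91 receive ranks 3 and 4, in data order.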
Two CALC functions provide descriptive measures for ranks. For two sets of ranks, the
Spearman rank correlation coefficient is computed as

\[ \rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}, \qquad d_i = \text{variable1}_i - \text{variable2}_i . \]

The function for computing this is

CALC ; List ; Rkc(variable1,variable2) $
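
As a check on the arithmetic, the Spearman formula above can be evaluated directly. The following Python sketch is not the Rkc function itself; the function name is hypothetical, and it assumes the two inputs are already sets of ranks for the same n observations.

```python
def spearman_rank_correlation(ranks1, ranks2):
    """1 - 6 * sum(d_i^2) / [n (n^2 - 1)], with d_i the difference in ranks."""
    n = len(ranks1)
    if n != len(ranks2) or n < 2:
        raise ValueError("need two rank lists of equal length, n >= 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


print(spearman_rank_correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(spearman_rank_correlation([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```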

The rank correlation is a correlation coefficient, so it has a natural range of measurement, from -1 to +1. (See the application below.) For more than two sets of ranks, a useful statistic is Kendall's coefficient of concordance,

\[ W = \frac{12 \sum_{i=1}^{n} (S_i - \bar{S})^2}{n K^2 (n^2 - 1)}, \]

where \( S_i = \sum_k \text{rank}_{k,i} \).

To compute this measure, use

CALC ; List ; Cnc(ranks1,...,ranksK) $

The concordance coefficient is not a correlation coefficient, so its magnitude is ambiguous. It can be
used for a large sample test of discordance. Under the null hypothesis that the sets of ranks are
independent, the statistic has a large sample chi squared distribution. In particular,

\[ K(n-1)W \rightarrow \chi^2[n-1]. \]
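
The concordance coefficient and the test statistic K(n-1)W can be sketched as follows. This Python fragment is a plain transcription of the formula for W given above, not the Cnc function; the function name and the small rank sets are hypothetical.

```python
def kendall_w(rank_sets):
    """Kendall's W for K rank sets, each holding the ranks of the same n observations."""
    K = len(rank_sets)
    n = len(rank_sets[0])
    # S_i is the sum over the K sets of the rank given to observation i.
    totals = [sum(ranks[i] for ranks in rank_sets) for i in range(n)]
    mean_total = sum(totals) / n
    squared_dev = sum((t - mean_total) ** 2 for t in totals)
    return 12.0 * squared_dev / (n * K ** 2 * (n ** 2 - 1))


agreeing = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
w = kendall_w(agreeing)
statistic = len(agreeing) * (len(agreeing[0]) - 1) * w  # K(n-1)W
print(w, statistic)  # 1.0 9.0
```

With complete agreement among the K sets, W equals one; when the rank sums are all equal, W equals zero.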

To illustrate these computations, we have analyzed the WHO data described in Section
E62.4.2. We have fit identical stochastic frontier models for the two attainment variables, lcomp, the
log of the composite measure, and ldale, the log of disability adjusted life expectancy. We then
computed the ranks for the 191 countries and plotted the ranks for the two measures as well as the
raw efficiency measures. The simple correlation for the efficiency measures and the rank correlation
for the ranks are displayed. The commands are as follows:

NAMELIST ; x = one,logebar,loghbar,loghbar2 $
NAMELIST ; z = gini,lpopden,lgdpc,geff,voice,oecd,lpubthe,tropics $
FRONTIER ; Lhs = logdbar ; Rhs = x,z
; Eff = udale ; Techeff = edale $
FRONTIER ; Lhs = logcbar ; Rhs = x,z
; Eff = ucomp ; Techeff = ecomp $
CREATE ; dalerank = 192 - Rnk(edale) $
CREATE ; comprank = 192 - Rnk(ecomp) $
PLOT ; Lhs = dalerank ; Rhs = comprank
; Endpoints = 0,200 ; Limits = 0,200
; Title = Ranks of Efficiencies: DALE vs. COMP $
PLOT ; Lhs = edale ; Rhs = ecomp ; Endpoints = .8,1 ; Grid
; Title = Efficiencies: DALE vs. COMP $
CALC ; List ; Rkc(dalerank,comprank) $
CALC ; List ; Cor(edale,ecomp) $

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LOGDBAR
Log likelihood function 155.83849
Estimation based on N = 191, K = 14
Inf.Cr.AIC = -283.7 AIC/N = -1.485
Variances: Sigma-squared(v)= .00145
Sigma-squared(u)= .03288
Sigma(v) = .03808
Sigma(u) = .18134
Sigma = Sqr[(s^2(u)+s^2(v)]= .18529
Gamma = sigma(u)^2/sigma^2 = .95777
Var[u]/{Var[u]+Var[v]} = .89180
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 141.59006
Chi-sq=2*[LogL(SF)-LogL(LS)] = 28.497
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LOGDBAR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 2.60812*** .18255 14.29 .0000 2.25034 2.96590
LOGEBAR| .11227*** .01869 6.01 .0000 .07564 .14891
LOGHBAR| .30118*** .05072 5.94 .0000 .20177 .40059
LOGHBAR2| -.02710*** .00455 -5.96 .0000 -.03601 -.01818
GINI| -.30417*** .10600 -2.87 .0041 -.51192 -.09642
LPOPDEN| .00213 .00402 .53 .5955 -.00574 .01001
LGDPC| .07541*** .02424 3.11 .0019 .02789 .12293
GEFF| -.00673 .01551 -.43 .6642 -.03714 .02367
VOICE| .02093* .01113 1.88 .0601 -.00089 .04275
OECD| .01608 .03055 .53 .5987 -.04381 .07596
LPUBTHE| .00974 .01497 .65 .5150 -.01959 .03908
TROPICS| -.03703** .01714 -2.16 .0307 -.07063 -.00344
|Variance parameters for compound error
Lambda| 4.76248*** 1.22054 3.90 .0001 2.37026 7.15470
Sigma| .18529*** .00086 214.30 .0000 .18360 .18698
--------+--------------------------------------------------------------------

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LOGCBAR
Log likelihood function 248.18065
Estimation based on N = 191, K = 14
Inf.Cr.AIC = -468.4 AIC/N = -2.452
Variances: Sigma-squared(v)= .00142
Sigma-squared(u)= .00888
Sigma(v) = .03768
Sigma(u) = .09421
Sigma = Sqr[(s^2(u)+s^2(v)]= .10147
Gamma = sigma(u)^2/sigma^2 = .86207
Var[u]/{Var[u]+Var[v]} = .69429
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 241.57767
Chi-sq=2*[LogL(SF)-LogL(LS)] = 13.206
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LOGCBAR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 3.21081*** .10704 30.00 .0000 3.00101 3.42060
LOGEBAR| .06590*** .01319 4.99 .0000 .04004 .09177
LOGHBAR| .18617*** .03763 4.95 .0000 .11240 .25993
LOGHBAR2| -.01509*** .00328 -4.61 .0000 -.02151 -.00867
GINI| -.25334*** .07579 -3.34 .0008 -.40189 -.10478
LPOPDEN| .00523* .00281 1.86 .0628 -.00028 .01073
LGDPC| .05747*** .01681 3.42 .0006 .02453 .09040
GEFF| .00290 .01068 .27 .7858 -.01803 .02384
VOICE| .02082** .00872 2.39 .0170 .00373 .03791
OECD| .01699 .01946 .87 .3827 -.02115 .05513
LPUBTHE| .01798** .00903 1.99 .0466 .00027 .03568
TROPICS| -.02365** .01191 -1.99 .0471 -.04700 -.00031
|Variance parameters for compound error
Lambda| 2.50000*** .41784 5.98 .0000 1.68104 3.31896
Sigma| .10147*** .00045 224.53 .0000 .10058 .10235
--------+--------------------------------------------------------------------

[CALC] *Result*= .6353076


[CALC] *Result*= .6062125

Figure E62.10 Ranks and Estimates of Efficiency



E62.9 Partially Nonparametric Stochastic Frontier Model


The stochastic frontier is fully parametric in both the deterministic part of the frontier and the distribution of the components of \( \varepsilon_i \). This section examines a partially nonparametric model of the form

\[ y = g(x,z) + v - Su. \]

The estimator is based on the locally linear regression in Section E9.5. The underlying logic is the result that in the stochastic frontier model, apart from the constant term, OLS consistently estimates the slope parameters of the model and estimates the constant term with a known bias. For the constant, \( \alpha \), the bias is \( E[u] \), the unconditional mean, which in the stochastic frontier model is

\[ E[u] = \sigma_u \sqrt{2/\pi}. \]

Continuing this approach, then, the least squares residuals estimate \( \varepsilon_i + E[u] \). In addition, the least squares residual variance, \( e'e/n \), consistently estimates \( \text{Var}[\varepsilon_i] = \sigma_\varepsilon^2 = \sigma_v^2 + [(1 - 2/\pi)\sigma_u^2] \). The implication is that the only parameter remaining to estimate is \( \sigma_u^2 \). In Section E62.6.2, we used the third moment of the OLS residuals and the method of moments to estimate \( \sigma_u \), then used this estimate to estimate \( \alpha \), the constant term in the frontier function.
The approach proposed here uses this same method with three differences.

1. The residuals used to compute the variance estimator are based on a locally linear,
nonparametric estimator of the deterministic function.

2. The remaining parameter to be estimated in this case is \( \lambda \) rather than \( \sigma_u \). We will base the
estimation on the result \( \sigma_u^2 = \sigma^2 \lambda^2 / (1 + \lambda^2) \).

3. The approach will be based on a maximum likelihood estimator rather than the method of
moments.
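
The mapping between (σ, λ) and the variance components can be checked numerically. The Python sketch below is an illustration under the normal-half normal assumptions; the function name is hypothetical. It uses the Sigma and Lambda reported for the parametric airline cost frontier (.12539 and .95827) and recovers the Sigma(u), Sigma(v) and variance share shown in that output.

```python
import math


def variance_components(sigma, lam):
    """Recover sigma_u, sigma_v, E[u] and Var[eps] from sigma and lambda."""
    sigma_u = sigma * lam / math.sqrt(1.0 + lam ** 2)   # sigma_u^2 = sigma^2 lam^2 / (1 + lam^2)
    sigma_v = sigma / math.sqrt(1.0 + lam ** 2)
    mean_u = sigma_u * math.sqrt(2.0 / math.pi)          # E[u] for the half normal
    var_u = (1.0 - 2.0 / math.pi) * sigma_u ** 2         # Var[u] for the half normal
    var_eps = sigma_v ** 2 + var_u
    return {"sigma_u": sigma_u, "sigma_v": sigma_v, "mean_u": mean_u,
            "var_eps": var_eps, "share_u": var_u / var_eps}


parts = variance_components(0.12539, 0.95827)
for name, value in parts.items():
    print(f"{name:8s} {value:.5f}")
```

The computed sigma_u and sigma_v agree with the reported .08676 and .09054, and share_u agrees with the reported Var[u]/{Var[u]+Var[v]} = .25020, up to rounding.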

Estimation uses the following steps: We begin with estimation of the conventional normal-half normal frontier model with a linear frontier function in order to obtain an initial estimator of \( \lambda \) and of \( \sigma^2 \). The LOWESS estimator developed in Section E9.5 is then employed to estimate g(x,z) for each point in the sample. The residuals from the estimated functions are used with the estimate of \( \sigma^2 \) for estimation of \( \lambda \). With \( \sigma^2 \) and \( \lambda \) in hand, we can compute the constant term, a set of residuals, and the JLMS estimators of technical or cost efficiency. Technical details appear in Section E62.9.2.

E62.9.1 Application
We have reestimated the airlines cost frontier with the semiparametric estimator. The frontier functions differ noticeably, primarily in the parameter estimates that are statistically insignificant. The kernel estimators suggest, however, that the differences in the estimates of inefficiency are quite modest. The descriptive statistics suggest the same pattern. The final plot shows more graphically how the nonparametric function has changed the estimates. The fact that most of the estimates from the nonparametric estimator lie below the 45 degree line is consistent with the appearance that, generally, they are smaller than the parametric values. The last set of results are the ordinary (Pearson) correlation and Kendall's tau.

FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z ; Costeff = eup $
FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z ; Lowess ; Costeff = eunp $
KERNEL ; Rhs = eunp,eup
; Title = Estimated Inefficiencies from Parametric and Nonparametric Frontiers $
DSTAT ; Rhs = eup,eunp $
PLOT ; Lhs = eup ; Rhs = eunp ; Rh2 = eup ; Fill ; Grid ; Vaxis = EUNP
; Title = Nonparametric vs. Parametric Estimates $
CALC ; List; Cor(eup,eunp) ; Ktr(eup,eunp) $

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LC
Log likelihood function 215.15699
Estimation based on N = 256, K = 13
Variances: Sigma-squared(v)= .00820
Sigma-squared(u)= .00753
Sigma(v) = .09054
Sigma(u) = .08676
Sigma = Sqr[(s^2(u)+s^2(v)]= .12539
Gamma = sigma(u)^2/sigma^2 = .47870
Var[u]/{Var[u]+Var[v]} = .25020
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 214.75424
Chi-sq=2*[LogL(SF)-LogL(LS)] = .806
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
-----------------------------------------------------------------------------

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 9.19939 21.64273 .43 .6708 -33.21957 51.61835
LY| .97398*** .01751 55.63 .0000 .93966 1.00829
LY2| .05123*** .01029 4.98 .0000 .03106 .07140
LPKP| .49455 1.69257 .29 .7701 -2.82283 3.81193
LPLP| .13721* .08121 1.69 .0911 -.02195 .29637
LPMP| .45863 1.11624 .41 .6812 -1.72915 2.64642
LPEP| -.10302 .53634 -.19 .8477 -1.15422 .94818
LPFP| -.02090 .01794 -1.16 .2441 -.05607 .01427
LOADFCTR| -.99466*** .17446 -5.70 .0000 -1.33660 -.65273
LOGSTAGE| -.17940*** .02531 -7.09 .0000 -.22902 -.12979
POINTS| .00164*** .00031 5.20 .0000 .00102 .00225
|Variance parameters for compound error
Lambda| .95827*** .16869 5.68 .0000 .62763 1.28890
Sigma| .12539*** .00039 321.29 .0000 .12463 .12616
--------+--------------------------------------------------------------------
+-----------------------------------------------+
| Locally linear weighted regression estimation |
| Sample size 256 |
| Model size 11 |
| Band width .500000 |
| LOESS Sum of Squared Residuals 1.69637 |
| OLS Sum of Squared Residuals 2.79975 |
| Derivatives Matrix LOCLBETA |
+-----------------------------------------------+
Reestimating lambda using residuals based on LOWESS regression
Normal exit: 3 iterations. Status=0, F= -337.3385
-----------------------------------------------------------------------------
Partially Nonparametric Stochastic Frontier Fit by LOWESS
Dependent variable LC
Estmation based on N = 256, K = 11
Variances: Sigma-squared(u)= .00438 Sigma(u) = .06616
Sigma-squared(v)= .00504 Sigma(v) = .07096
Sigma = Sqr[(s^2(u)+s^2(v)]= .09702 Lambda = .93233
Stochastic Cost Frontier Model, e = v+u
-----------------------------------------------------------------------------
Statistical results are for the sample means of the LOWESS estimated betas.
They are not moments of an asymptotic distribution.
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
Constant| 34.8551 23.42958 1.49 .1368 -11.0661 80.7762
LY| .98897*** .05040 19.62 .0000 .89018 1.08775
LY2| .04598*** .01677 2.74 .0061 .01310 .07885
LPKP| 2.48149 1.78813 1.39 .1652 -1.02319 5.98616
LPLP| .09976 .10851 .92 .3579 -.11292 .31244
LPMP| -.85374 1.34656 -.63 .5261 -3.49295 1.78547
LPEP| -.71103 .43514 -1.63 .1023 -1.56389 .14183
LPFP| -.02183 .03324 -.66 .5114 -.08698 .04332
LOADFCTR| -.78691 .65061 -1.21 .2265 -2.06208 .48826
LOGSTAGE| -.20490* .11308 -1.81 .0700 -.42653 .01672
POINTS| .00225 .00205 1.10 .2710 -.00176 .00627
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Descriptive Statistics
--------+---------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
--------+---------------------------------------------------------------------
EUP| .933537 .025027 .812486 .975689 256 0
EUNP| .948487 .019528 .844732 .983878 256 0
--------+---------------------------------------------------------------------

[CALC] *Result*= .8690148


[CALC] *Result*= .6339461
Calculator: Computed 2 scalar results

Figure E62.11 Kernel Estimators of Inefficiency Distributions

Figure E62.12 Plot of Nonparametric Estimates vs. Parametric Estimates



E62.9.2 Technical Details


The log likelihood function for the normal-half normal model is the sum of

\[ \log L_i = \tfrac{1}{2}\log(2/\pi) - \log\sigma - \tfrac{1}{2}(\varepsilon_i/\sigma)^2 + \log\Phi[-S\lambda\varepsilon_i/\sigma]. \]

The value of \( \sigma_\varepsilon^2 = \sigma_v^2 + [(1 - 2/\pi)\sigma_u^2] \) is estimated using the squared LOWESS residuals; it is the sample variance, \( q^2 \). The LOWESS residuals, themselves, are estimates of \( \varepsilon_i + E[u_i] \). With \( q^2 \) and the residuals in hand, the log likelihood is a function only of \( \lambda \). During the iteration, we compute

\[ a = \lambda/(1+\lambda^2)^{1/2}, \]
\[ s^2 = q^2 / [1 - (2/\pi)a^2], \text{ then } s, \]
\[ m = a s \sqrt{2/\pi}, \]
\[ e_i = \text{residual}_i - m. \]

These residuals and s are used to compute \( \log L_i \) and the derivative with respect to \( \lambda \). This estimation step provides the estimator of \( \lambda \) that we need to compute the efficiencies. After estimation of \( \lambda \), computation of the JLMS estimates of inefficiency is done the same as in the parametric form of the model, using the LOWESS residuals.
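
The iteration just described can be sketched in Python. This is a minimal illustration, not LIMDEP's optimizer: it covers only the production case (S = 1), it replaces the derivative-based search with a coarse grid over λ, and the simulated residuals stand in for LOWESS residuals. All function names and data values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def scale_terms(lam, q2):
    """Return (a, s, m) as defined in the text for a trial value of lambda."""
    a = lam / math.sqrt(1.0 + lam * lam)
    s = math.sqrt(q2 / (1.0 - (2.0 / math.pi) * a * a))
    m = a * s * math.sqrt(2.0 / math.pi)
    return a, s, m


def concentrated_loglik(lam, residuals, q2):
    """Sum of log L_i with sigma replaced by s(lambda); production case, S = 1."""
    _, s, m = scale_terms(lam, q2)
    total = 0.0
    for r in residuals:
        e = r - m                                           # e_i = residual_i - m
        tail = max(STD_NORMAL.cdf(-lam * e / s), 1e-300)    # guard against log(0)
        total += (0.5 * math.log(2.0 / math.pi) - math.log(s)
                  - 0.5 * (e / s) ** 2 + math.log(tail))
    return total


# Hypothetical data: eps = v - u, centered the way regression residuals would be.
rng = random.Random(12345)
eps = [rng.gauss(0.0, 0.1) - abs(rng.gauss(0.0, 0.2)) for _ in range(500)]
mean_eps = sum(eps) / len(eps)
residuals = [e - mean_eps for e in eps]
q2 = sum(r * r for r in residuals) / len(residuals)

# Coarse grid search over lambda in place of the gradient iterations.
grid = [0.1 * k for k in range(1, 51)]
lam_hat = max(grid, key=lambda lam: concentrated_loglik(lam, residuals, q2))
print("lambda maximizing the concentrated log likelihood on the grid:", lam_hat)
```

Because s and m are recomputed from λ and q² at every trial value, the search is one dimensional, which is the point of the construction.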

E62.10 The Normal-Gamma Model


The normal-gamma model is the remaining distributional form of the stochastic frontier model. Under this specification,

\[ u_i \sim \frac{\theta^P \exp(-\theta u_i)\, u_i^{P-1}}{\Gamma(P)}, \quad u_i \ge 0,\ P > 0,\ \theta > 0. \]

This model is more flexible than the half normal or exponential model in that with two parameters, it allows both the shape and location to vary independently. (The truncation model does likewise, but it is considerably more difficult to estimate.) To specify the gamma model, use

; Model = Gamma (or ; Model = G)
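
For reference, the gamma density of the inefficiency term can be evaluated as in the following Python sketch. The function name is hypothetical, and the boundary point u = 0 is skipped for simplicity. The check at the end illustrates that the density reduces to the exponential density θ exp(-θu) when P = 1.

```python
import math


def gamma_inefficiency_pdf(u, P, theta):
    """Density theta^P exp(-theta u) u^(P-1) / Gamma(P), evaluated for u > 0."""
    if u <= 0.0:
        return 0.0  # interior points only; the boundary u = 0 is not handled here
    log_density = (P * math.log(theta) - theta * u
                   + (P - 1.0) * math.log(u) - math.lgamma(P))
    return math.exp(log_density)


u, theta = 0.3, 2.0
print(gamma_inefficiency_pdf(u, 1.0, theta))   # gamma density with P = 1
print(theta * math.exp(-theta * u))            # exponential density, same value
```

Under this parameterization the mean of u is P/θ and the variance is P/θ².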

The normal-gamma model is estimated by the method of simulated maximum likelihood. (See Greene (2000b) and the details in Section E62.10.2.) The counterpart to the JLMS estimator of the inefficiency, \( E[u|\varepsilon] \), must also be estimated by simulation.

E62.10.1 Application of the Normal-Gamma Model


We illustrate the gamma model by fitting a cost frontier model with normal-gamma
inefficiency. For comparison, we have also fit the exponential model, which results when P is
constrained to equal one. (The exponential model is fit directly by its own log likelihood, not by
constraining P to equal one in the gamma model.) We have also computed the inefficiencies for the
two models, and plotted kernel density estimators to compare them. The commands are

FRONTIER ; Lhs = lc ; Rhs = x ; Cost ; Model = Gamma ; Costeff = eucg
; Pts = 50 ; Halton $
FRONTIER ; Lhs = lc ; Rhs = x ; Cost ; Model = Exponential ; Costeff = euce $
KERNEL ; Rhs = eucg,euce
; Title = Kernel Density Estimates for E[u|e,exponential and gamma] $

Based on the Wald and likelihood ratio tests, we cannot reject the hypothesis of the exponential model (P is close to one). The similarity of the kernel density estimators is consistent with this finding.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LC
Log likelihood function 159.94270
Estimation based on N = 256, K = 11
Inf.Cr.AIC = -297.9 AIC/N = -1.164
Model estimated: Aug 22, 2011, 22:09:16
Normal-Gamma frontier model
Variances: Sigma-squared(v)= .01169
Sigma-squared(u)= .00547
Sigma(v) = .10814
Sigma(u) = .07399
Stochastic Cost Frontier Model, e = v+u
Half Normal:u(i)=|U(i)|; frontier model
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] = 4.055
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 22.9007 27.13658 .84 .3987 -30.2860 76.0874
LY| .96086*** .02028 47.38 .0000 .92112 1.00061
LY2| .09283*** .01327 7.00 .0000 .06682 .11883
LPKP| 1.67283 2.12387 .79 .4309 -2.48987 5.83553
LPLP| -.01112 .06724 -.17 .8687 -.14290 .12066
LPMP| -.07676 1.37564 -.06 .9555 -2.77297 2.61944
LPEP| -.63376 .68533 -.92 .3551 -1.97698 .70946
LPFP| -.06405*** .02311 -2.77 .0056 -.10934 -.01876
|Variance parameters for compound error
Theta| 12.4180** 5.05037 2.46 .0139 2.5194 22.3165
P| .84426 .69128 1.22 .2220 -.51062 2.19913
Sigmav| .10814*** .01148 9.42 .0000 .08563 .13064
--------+--------------------------------------------------------------------

-----------------------------------------------------------------------------
Log likelihood function 159.89917
Exponential frontier model
Variances: Sigma-squared(v)= .01147
Sigma-squared(u)= .00568
Sigma(v) = .10709
Sigma(u) = .07539
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] = 3.968
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 22.6569 25.48354 .89 .3740 -27.2899 72.6038
LY| .96069*** .01892 50.77 .0000 .92360 .99777
LY2| .09281*** .01249 7.43 .0000 .06832 .11729
LPKP| 1.65439 1.99409 .83 .4067 -2.25395 5.56272
LPLP| -.00962 .09785 -.10 .9217 -.20140 .18216
LPMP| -.06595 1.31569 -.05 .9600 -2.64465 2.51275
LPEP| -.62841 .63243 -.99 .3204 -1.86795 .61114
LPFP| -.06397*** .02033 -3.15 .0017 -.10381 -.02412
|Variance parameters for compound error
Theta| 13.2651*** 2.90719 4.56 .0000 7.5671 18.9630
Sigmav| .10709*** .00980 10.93 .0000 .08788 .12629
--------+--------------------------------------------------------------------
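
The two tests of the exponential model (P = 1) can be reproduced by hand from the values reported in the two sets of results above. The short Python calculation below is only arithmetic on those reported numbers; the variable names are ours. It also confirms that the reported Sigma-squared(u) equals P/θ² in the gamma model and 1/θ² in the exponential model.

```python
# Values copied from the normal-gamma and normal-exponential outputs above.
logl_gamma = 159.94270
logl_exponential = 159.89917
p_hat, p_se = 0.84426, 0.69128
theta_gamma, theta_exponential = 12.4180, 13.2651

# Likelihood ratio statistic for P = 1 (one restriction).
lr_statistic = 2.0 * (logl_gamma - logl_exponential)

# Wald statistic for P = 1, in z form.
wald_z = (p_hat - 1.0) / p_se

# Implied variances of u.
var_u_gamma = p_hat / theta_gamma ** 2
var_u_exponential = 1.0 / theta_exponential ** 2

print(f"LR statistic      {lr_statistic:.5f}")
print(f"Wald z            {wald_z:.5f}")
print(f"Var[u], gamma     {var_u_gamma:.5f}")
print(f"Var[u], exponent. {var_u_exponential:.5f}")
```

The LR statistic is far below 3.84, the 95% critical value for chi squared with one degree of freedom, and the absolute Wald z is far below 1.96.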

Figure E62.13 Kernel Density Estimates for Gamma and Exponential Inefficiencies

E62.10.2 Technical Details on Normal-Gamma Model


The log likelihood for this model is equal to the log likelihood for the normal-exponential model plus a term that is produced by the difference between the exponential and the gamma distributions;

\[ \log L = \log L(\text{exponential}) + n[(P-1)\log\theta - \log\Gamma(P)] + \sum_i \log h(P-1,\varepsilon_i), \]

where

\[ h(r,\varepsilon_i) = \frac{\int_0^\infty z^r\,(1/\sigma_v)\,\phi[(z-\mu_i)/\sigma_v]\,dz}{\int_0^\infty (1/\sigma_v)\,\phi[(z-\mu_i)/\sigma_v]\,dz}, \qquad \mu_i = -\varepsilon_i - \theta\sigma_v^2. \]

The normal-exponential model results if P = 1. Computation of the function \( h(r,\varepsilon_i) \) is the obstacle to estimation. Beckers and Hammond (1987) derived a closed form expression, but the result has never been operationalized; it is complex in the extreme. Greene (1990) attempted estimation by using a crude approximation with Simpson's rule, but failed to obtain reasonable results. (See Ritter and Simar (1997).)
A satisfactory solution is produced by the technique of maximum simulated likelihood. The integral and its derivatives can be estimated consistently by Monte Carlo simulation. The crucial result is that \( h(r,\varepsilon_i) \) is the expectation of a random variable;

\[ h(r,\varepsilon_i) = E[z^r \mid z \ge 0], \quad \text{where } z \sim N[\mu_i, \sigma_v^2], \quad \mu_i = -\varepsilon_i - \theta\sigma_v^2. \]

Therefore, \( h(r,\varepsilon_i) \) is the expected value of \( z^r \) where z has a normal distribution truncated at zero. Thus, we estimate \( h(r,\varepsilon_i) \) by using the mean of a sample of draws from this distribution. For given values of \( \mu_i \) and \( \varepsilon_i \) (i.e., \( y_i, x_i, \beta, \sigma_v, \theta, r \)), \( h(r,\varepsilon_i) \) is consistently estimated by

\[ \hat{h}_i = \frac{1}{Q}\sum_{q=1}^{Q} z_{iq}^{r}, \]

where \( z_{iq} \) is a random draw from the truncated normal distribution with mean parameter \( \mu_i \) and variance parameter \( \sigma_v^2 \). This produces the simulated log likelihood function

\[ \log L_S = \log L(\text{exponential}) + n[(P-1)\log\theta - \log\Gamma(P)] + \sum_i \log \hat{h}(P-1,\varepsilon_i), \]

which for a given set of draws is a smooth and continuous function of the parameters.

Random draws from the truncated distribution are obtained using Geweke's method as follows: Let

\( L \) = truncation point = 0 for this application
\( \mu \) = the mean of the untruncated distribution = \( -\varepsilon_i - \theta\sigma_v^2 \)
\( \sigma \) = the standard deviation of the untruncated distribution = \( \sigma_v \)
\( P_L = \Phi[(L - \mu)/\sigma] \)
\( F \) = one draw from U[0,1]
\( z = \mu + \sigma\Phi^{-1}[P_L + F(1 - P_L)] \)

Then, z = the draw from the truncated distribution.
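
The inverse probability transformation just listed can be sketched in Python with the standard library's NormalDist. This is an illustration under our own naming, not LIMDEP's routine; the small clamp on the probability is an added numerical safeguard and is not part of the method as stated.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_normal_draw(mu, sigma, uniform, lower=0.0):
    """Map one U[0,1] draw into a draw from N[mu, sigma^2] truncated below at `lower`."""
    p_lower = STD_NORMAL.cdf((lower - mu) / sigma)         # P_L
    prob = p_lower + uniform * (1.0 - p_lower)             # P_L + F(1 - P_L)
    prob = min(max(prob, 1e-12), 1.0 - 1e-12)              # safeguard: inv_cdf needs 0 < p < 1
    z = mu + sigma * STD_NORMAL.inv_cdf(prob)
    return max(z, lower)                                   # absorb rounding at the boundary


rng = random.Random(1)
draws = [truncated_normal_draw(-0.1, 0.2, rng.random()) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

# Analytic mean of the truncated distribution, for comparison.
alpha = (0.0 - (-0.1)) / 0.2
analytic_mean = -0.1 + 0.2 * STD_NORMAL.pdf(alpha) / (1.0 - STD_NORMAL.cdf(alpha))
print(sample_mean, analytic_mean)
```

Because each draw is a smooth function of the parameters for a fixed F, holding the uniform draws fixed across iterations keeps the simulated function smooth in the parameters.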

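Collecting all terms, then, this produces the simulated log likelihood function:

\[ \log L = n\{\log\theta + \tfrac{1}{2}\sigma_v^2\theta^2\} + \sum_i \{\theta d_i + \log\Phi[-(d_i/\sigma_v + \theta\sigma_v)]\} + n[(P-1)\log\theta - \log\Gamma(P)] + \sum_i \log\left\{ \frac{1}{Q}\sum_{q=1}^{Q} \left[ \mu_i + \sigma_v\Phi^{-1}\!\left( F_{iq} + (1 - F_{iq})\,\Phi(-\mu_i/\sigma_v) \right) \right]^{P-1} \right\}, \]

\[ \varepsilon_i = y_i - \beta'x_i, \qquad \mu_i = -\varepsilon_i - \theta\sigma_v^2, \]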

and \( F_{iq} \) is a fixed set of Q draws from U[0,1] specific to the individual. Derivatives of \( h(r,\varepsilon_i) \) and \( \log h(r,\varepsilon_i) \) are also estimated by simulation. The JLMS efficiency measure has the simple form

\[ E[u|\varepsilon_i] = h(P,\varepsilon_i) / h(P-1,\varepsilon_i). \]
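
A minimal Python sketch of this ratio follows. It is an illustration under our own naming and hypothetical parameter values, not LIMDEP's code; the same uniform draws are used in the numerator and the denominator. As a check, with P = 1 the denominator is one and the ratio is the mean of the truncated normal distribution, which is available in closed form.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def simulate_h(r, mu, sigma_v, uniforms):
    """Average of z^r over draws z from N[mu, sigma_v^2] truncated at zero."""
    p_zero = STD_NORMAL.cdf(-mu / sigma_v)
    total = 0.0
    for f in uniforms:
        prob = min(max(f + (1.0 - f) * p_zero, 1e-12), 1.0 - 1e-12)
        z = max(mu + sigma_v * STD_NORMAL.inv_cdf(prob), 1e-12)  # keep z strictly positive
        total += z ** r
    return total / len(uniforms)


def jlms_gamma(eps, theta, P, sigma_v, uniforms):
    """E[u | eps] = h(P, eps) / h(P - 1, eps), with mu = -eps - theta * sigma_v^2."""
    mu = -eps - theta * sigma_v ** 2
    return simulate_h(P, mu, sigma_v, uniforms) / simulate_h(P - 1.0, mu, sigma_v, uniforms)


rng = random.Random(7)
uniforms = [rng.random() for _ in range(20000)]

# Hypothetical values: eps = -0.05, theta = 10, sigma_v = 0.1, so mu = -0.05.
simulated = jlms_gamma(-0.05, 10.0, 1.0, 0.1, uniforms)
mu, sigma_v = -0.05, 0.1
closed_form = mu + sigma_v * STD_NORMAL.pdf(mu / sigma_v) / STD_NORMAL.cdf(mu / sigma_v)
print(simulated, closed_form)
```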

The final consideration is the method of obtaining the draws. The default method is to use the random number generators. Since this is a very computationally intensive model, it is usually more efficient to use Halton draws; you can use many fewer Halton draws than random draws to obtain results of the same quality. Halton draws are discussed in Section R24.7. To use Halton draws with this estimator, add
; Halton

to the command. The number of points for either method is specified with

; Pts = the desired number of draws

We have used this feature in the example in the previous section.
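
For readers unfamiliar with Halton draws, the following Python sketch generates the standard Halton sequence by the radical inverse construction. It is a generic illustration; the base, starting index, and any other refinements used by LIMDEP are described in Section R24.7 and are not reproduced here.

```python
def halton(index, base=2):
    """Element number `index` (index >= 1) of the Halton sequence in a prime base."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)   # peel off the lowest digit in this base
        result += digit * fraction           # reflect it about the radix point
        fraction /= base
    return result


print([halton(i, 2) for i in range(1, 8)])
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```

The points fill the unit interval evenly rather than clustering as pseudorandom draws can, which is why fewer of them are typically needed.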



E62.11 Sample Selection in a Stochastic Frontier Model


This model is a counterpart to familiar models of sample selection. See Greene (2010) for details on the methodology. Additional results appear in Terza (2010). The model is a familiar sample selection form

\[ \begin{aligned}
d^* &= \alpha'z + w, \quad d = 1(d^* > 0) \\
y &= \beta'x + v - u \\
u &= |U| \text{ with } U \sim N[0,\sigma_u^2] \\
(v,w) &\sim \text{bivariate normal with } [(0,0),(\sigma_v^2, \rho\sigma_v, 1)] \\
(y,x) &\text{ only observed when } d = 1.
\end{aligned} \]

Thus, the selection operates through the heterogeneity component of the production model, not the
inefficiency. (Thus, observation is not viewed as a function of the level of inefficiency.)
The model is fit by maximum simulated likelihood. To request it, use LIMDEP's usual format for sample selection models,

PROBIT ; Lhs = d ; Rhs = variables in z ; Hold $
FRONTIER ; Lhs = y ; Rhs = variables in x ; Selection $

The model must be the base case, half normal, with no panel data application, truncation, heteroscedasticity, etc. You may control the simulations with ; Halton and ; Pts. Efficiency and inefficiency estimates are saved as with other models with ; Eff and ; Techeff. However, observations in the nonselected part of the sample are given missing values (-999) for any of these computations. The PARTIALS and SIMULATE commands do not inherit the selection model; these commands are not available after fitting this model.

E62.11.1 Application
The following creates a data set that conforms exactly to the assumptions of the model.

CALC ; Ran(123457) $
SAMPLE ; 1-2000 $
CREATE ; z1 = Rnn(0,1) ; z2 = Rnn(0,1) $
CREATE ; v1 = Rnn(0,1) ; v2 = Rnn(0,1) $
CREATE ; e1 = v1 ; e2 = .7071 * (v1+v2) $
CREATE ; ds = z1 + z2 + e1 ; d = ds > 0 $
CREATE ; u = Abs(Rnn(0,1)) ; x1 = Rnn(0,1) ; x2 = Rnn(0,1) $
CREATE ; y = x1 + x2 + e2 - u $
PROBIT ; Lhs = d ; Rhs = one,z1,z2 ; Hold $
FRONTIER ; Lhs = y ; Rhs = one,x1,x2 ; Selection $
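
The design of this experiment can be mirrored in Python to see what the population values are. The sketch below is not a replication of the LIMDEP run: Python's random number generator differs from LIMDEP's, so the draws and any estimates will differ, and the function names are ours. It shows that the construction implies a correlation of about .7071 between the selection disturbance and v, unit variances for v and U, and selection of about half the sample.

```python
import math
import random


def simulate_selection_data(n, seed=123457):
    """Generate data with the same structure as the CREATE commands above."""
    rng = random.Random(seed)
    data = {"d": [], "y": [], "x1": [], "x2": [], "e1": [], "e2": []}
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        v1, v2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = v1                           # disturbance in the selection equation
        e2 = 0.7071 * (v1 + v2)           # disturbance in the frontier, variance 1
        d = 1 if z1 + z2 + e1 > 0 else 0
        u = abs(rng.gauss(0.0, 1.0))      # half normal inefficiency
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = x1 + x2 + e2 - u
        for key, value in zip(data, (d, y, x1, x2, e1, e2)):
            data[key].append(value)
    return data


def correlation(a, b):
    """Sample Pearson correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


sample = simulate_selection_data(20000)
rho_hat = correlation(sample["e1"], sample["e2"])
share_selected = sum(sample["d"]) / len(sample["d"])
print("sample correlation of the two disturbances:", round(rho_hat, 3))
print("share of observations with d = 1:", round(share_selected, 3))
```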

-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable D
Log likelihood function -825.27526
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
D| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Index function for probability
Constant| .03616 .03525 1.03 .3051 -.03294 .10525
Z1| .96314*** .04604 20.92 .0000 .87291 1.05338
Z2| 1.01534*** .04702 21.59 .0000 .92318 1.10750
--------+--------------------------------------------------------------------
Warning 141: Iterations:current or start estimate of sigma nonpositive
Normal exit: 14 iterations. Status=0, F= 1916.202
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable Y
Log likelihood function -1916.20216
Estimation based on N = 2000, K = 6
Inf.Cr.AIC = 3844.4 AIC/N = 1.922
Variances: Sigma-squared(v)= 1.00545
Sigma-squared(u)= 1.07396
Sigma(u) = 1.03632
Sigma(v) = 1.00272
Sigma = 1.44202
Lambda = 1.03351
Sample Selection/Frontier Model
Murphy/Topel Corrected VC Matrix
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 -1662.32532
Chi-sq=2*[LogL(SF)-LogL(LS)] = -507.754
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
-----------------------------------------------------------------------------

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
Y| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -.04492 .10971 -.41 .6822 -.25994 .17011
X1| 1.00102*** .03357 29.82 .0000 .93522 1.06682
X2| .95627*** .03195 29.93 .0000 .89364 1.01890
Sigma(u)| 1.03632*** .13217 7.84 .0000 .77728 1.29537
Sigma(v)| 1.00272*** .05471 18.33 .0000 .89549 1.10995
Rho(w,v)| .77553*** .06187 12.54 .0000 .65427 .89679
--------+--------------------------------------------------------------------

E62.11.2 Log Likelihood and Estimation Method


Write the model structure as

$$d^* = \alpha'z + w, \quad w \sim N[0,1], \quad d = 1(d^* > 0),$$

$$y = \beta'x + \sigma_v v - \sigma_u u,$$

$$u = |U| \ \text{with} \ U \sim N[0,1],$$

$$(v,w) \sim \text{bivariate normal with } [(0,0),(1,\rho,1)],$$

$(y,x)$ only observed when $d = 1$.

(Note that, for convenience later, we have moved the scale parameters into the structural model.) To set
up the estimator, we now write $w$ in its conditional on $v$ form,

$$w \mid v = \rho v + h, \quad h \sim N[0, (1-\rho^2)], \quad h \text{ independent of } v.$$

Therefore,

$$d^* \mid v = \alpha'z + \rho v + h, \quad d = 1(d^* > 0 \mid v).$$

Then,

$$\text{Prob}[d = 1 \text{ or } 0 \mid z, v] = \Phi\left[(2d-1)\,\frac{\alpha'z + \rho v}{\sqrt{1-\rho^2}}\right].$$

For the selected observations, d = 1, conditioned on v, the joint density for y and d is the product of
the marginals since conditioned on v, y and d are independent;

f(y, d = 1|x,z,v) = f(y|x,v) Prob(d = 1|z,v).

We have the second part above. For the first part,

$$y \mid x, v = (\beta'x + \sigma_v v) - \sigma_u u,$$

where $u$ is the truncation at zero of a standard normal variable, so $f(u) = 2\phi(u)$, $u > 0$. The Jacobian of
the transformation from $u$ to $y$ is $1/\sigma_u$, so by the change of variable, the conditional density is

$$f(y \mid x, v) = \frac{2}{\sigma_u}\,\phi\left[\frac{(\beta'x + \sigma_v v) - y}{\sigma_u}\right], \quad (\beta'x + \sigma_v v) - y \ge 0.$$

Therefore, the joint conditional density is

$$f(y, d = 1 \mid x, z, v) = \frac{2}{\sigma_u}\,\phi\left[\frac{(\beta'x + \sigma_v v) - y}{\sigma_u}\right]\Phi\left[\frac{\alpha'z + \rho v}{\sqrt{1-\rho^2}}\right].$$

To obtain the unconditional density, it is necessary to integrate $v$ out of the conditional density.
Thus,

$$f(y, d = 1 \mid x, z) = \int_v \frac{2}{\sigma_u}\,\phi\left[\frac{\sigma_v v - (y - \beta'x)}{\sigma_u}\right]\Phi\left[\frac{\alpha'z + \rho v}{\sqrt{1-\rho^2}}\right] f(v)\,dv.$$

The relevant term in the log likelihood is $\log f(y, d = 1 \mid x, z)$. For the nonselected observations, the
contribution to the log likelihood is the log of the unconditional probability of nonselection, which is

$$\text{Prob}(d = 0 \mid z) = \int_v \Phi\left[\frac{-(\alpha'z + \rho v)}{\sqrt{1-\rho^2}}\right] f(v)\,dv.$$

The integrals do not exist in closed form, so these terms cannot be evaluated as is. Before
proceeding, we note the additional complication, $\beta'x + \sigma_v v - y = \sigma_u u > 0$, so the density $f(v)$ is not the
standard normal that intuition might suggest; it is a truncated normal.
The integrals can be computed by simulation. By construction,

$$\int_v \frac{2}{\sigma_u}\,\phi\left[\frac{\beta'x + \sigma_v v - y}{\sigma_u}\right]\Phi\left[\frac{\alpha'z + \rho v}{\sqrt{1-\rho^2}}\right] f(v)\,dv = E_v\left\{\frac{2}{\sigma_u}\,\phi\left[\frac{\beta'x + \sigma_v v - y}{\sigma_u}\right]\Phi\left[\frac{\alpha'z + \rho v}{\sqrt{1-\rho^2}}\right]\right\},$$

so by sampling from the distribution of $v$, we can compute the function of $v$ and average to obtain the
integrals. In order to sample the draws on $v$, we note the implied truncation,

$$v > (y - \beta'x)/\sigma_v \quad \text{or} \quad v > \varepsilon/\sigma_v, \quad \text{where } \varepsilon = y - \beta'x.$$

Draws from the truncated normal can be obtained using result (E-1) in Greene (2011). Let $A$ equal a
draw from the uniform (0,1) population. The desired draw from the truncated normal distribution
will be

$$v_r = \Phi^{-1}\left[\Phi(\varepsilon/\sigma_v) + A_r\,\Phi(-\varepsilon/\sigma_v)\right].$$

Collecting all terms, then, the simulated log likelihood will be

$$\log L_S = \sum_i \log \frac{1}{R}\sum_{r=1}^{R}\left\{ d_i\left[\frac{2}{\sigma_u}\,\phi\left(\frac{\beta'x_i + \sigma_v v_{ir} - y_i}{\sigma_u}\right)\Phi\left(\frac{\alpha'z_i + \rho v_{ir}}{\sqrt{1-\rho^2}}\right)\right] + (1 - d_i)\left[\Phi\left(\frac{-\alpha'z_i - \rho v_{ir}}{\sqrt{1-\rho^2}}\right)\right]\right\}$$

where the draws on $v_{ir}$ are as shown above. Derivatives of this simulated log likelihood are obtained
numerically using finite differences.
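The inverse-CDF draw and the simulated average for a selected observation can be sketched in a few lines of Python. This is an illustration only, not LIMDEP's implementation: the function names, the arguments `xb` (standing for β′x) and `az` (standing for α′z), the number of draws and the seed are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf(), cdf(), inv_cdf()


def draw_truncated_normal(lower, uniform_draw):
    """Turn a U(0,1) draw A into a standard normal draw truncated to v > lower.

    Implements v = Phi^{-1}[Phi(lower) + A * Phi(-lower)].
    """
    p_low = STD_NORMAL.cdf(lower)
    p = p_low + uniform_draw * (1.0 - p_low)  # 1 - Phi(lower) = Phi(-lower)
    # Keep p strictly inside (0, 1) so that inv_cdf is defined.
    # (Assumption: truncation points are moderate, roughly |lower| < 7.)
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return STD_NORMAL.inv_cdf(p)


def simulated_density_selected(y, xb, az, sigma_u, sigma_v, rho,
                               n_draws=500, seed=12345):
    """Average (2/sigma_u) * phi(.) * Phi(.) over truncated draws of v."""
    rng = random.Random(seed)
    lower = (y - xb) / sigma_v           # implied truncation point for v
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_draws):
        v = draw_truncated_normal(lower, rng.random())
        gap = (xb + sigma_v * v - y) / sigma_u  # plays the role of u >= 0
        total += ((2.0 / sigma_u) * STD_NORMAL.pdf(gap)
                  * STD_NORMAL.cdf((az + rho * v) / scale))
    return total / n_draws


if __name__ == "__main__":
    value = simulated_density_selected(1.0, 0.8, 0.3, 1.0, 1.0, 0.5)
    print("simulated term for one selected observation:", value)
```

The log of the returned value is the simulated counterpart of one selected observation's term in the log likelihood. Fixing the seed keeps the simulated function smooth in the parameters across iterations, which matters when derivatives are taken by finite differences.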

E63: Heteroscedasticity and Truncation in Stochastic Frontier Models
E63.1 Introduction
This chapter develops several extensions of the stochastic frontier model presented in
Chapter E62. The four models considered here are as follows:

• Heteroscedasticity in v and/or u
• Truncated normal with nonzero, heterogeneous mean in the underlying U
• Heterogeneity in the parameter of the exponential or gamma distribution
• Amsler et al.'s 'scaling model'

E63.2 Heteroscedasticity and Heterogeneity


In the development of the frontier model, an important question concerns how to introduce
observed heterogeneity into the specification. Suppose the vector of variables $z_i$ contains the
information. For example, in the airline data, we have data on load factor, stage length and number
of points in the route map, which may also affect production, cost and efficiency. In the model
proposed thus far, the only point at which one might introduce $z_i$ appears to be in the goal function
itself, which would become

$$y_i = \beta'x_i + \alpha'z_i + v_i - u_i.$$

This is a common approach. (See, e.g., Greene (2004a,b).) In this chapter, we present two other
methods of introducing observed heterogeneity in the frontier model: in the variance parameters and
in the mean of the underlying inefficiency.

E63.2.1 Heterogeneity in the Scale Parameters


A natural departure point is to allow observable variation in $\sigma_v^2$ and/or $\sigma_u^2$. For the first of
these, the term heteroscedasticity is appropriate. (The papers by Hadri et al. (1999, 2003a,b) develop
heteroscedasticity models for frontier specifications.) For the second of these, a result which seems
routinely to be overlooked in the literature is that allowing $\sigma_u^2$ to vary over observations, call it $\sigma_{u,i}^2$,
induces more than just heteroscedasticity. Unavoidably, in all model specifications, when this
parameter varies over individuals, both the variance and the mean of $u_i$ do also. For the half
normal model, regardless of how $\sigma_{u,i}$ varies,

$$E[u_i] = \sigma_{u,i}\,\phi(0)/\Phi(0) = 0.79788\,\sigma_{u,i}.$$

A like result emerges in the truncated normal model. In the exponential model, the mean of $u_i$ equals its
standard deviation, while in the gamma model, it is a multiple, $P^{1/2}$, of it. Thus, in all cases, as regards
$u_i$, the term heteroscedasticity, while not inappropriate, is nonetheless ambiguous. These models cannot
be heteroscedastic without also having a heterogeneous mean. In what follows, therefore, we continue
to use the familiar terminology, but we emphasize the nature of the model as well.
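A short numerical check makes the point concrete. The sketch below is illustrative Python, not LIMDEP code; the function names are made up for the example, and the variance formula Var[u] = σ_u²(1 − 2/π) for the half normal is a standard result that is not stated in the text above.

```python
import math


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), written with erfc for tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def half_normal_mean(sigma_u):
    """E[u] for u = |U|, U ~ N[0, sigma_u^2]: sigma_u * phi(0) / Phi(0)."""
    return sigma_u * std_normal_pdf(0.0) / std_normal_cdf(0.0)


def half_normal_sd(sigma_u):
    """Standard deviation of u, using Var[u] = sigma_u^2 * (1 - 2/pi)."""
    return sigma_u * math.sqrt(1.0 - 2.0 / math.pi)


if __name__ == "__main__":
    # Hypothetical observation-specific scale parameters sigma_{u,i}.
    for sigma_ui in (0.1, 0.2, 0.4):
        print(sigma_ui, half_normal_mean(sigma_ui), half_normal_sd(sigma_ui))
```

Doubling σ_{u,i} doubles both columns, so an observation with a larger variance of u necessarily has a larger mean inefficiency as well.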

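The models of scale heterogeneity may extend either variance parameter with the
specification of the variance functions

$$\text{Var}[U_i \mid z_i] = \sigma_{u,i}^2 = \sigma_u^2\exp(\gamma'z_i) \quad \text{(heteroscedastic)}$$

$$\text{Var}[v_i \mid w_i] = \sigma_{v,i}^2 = \sigma_v^2\exp(\eta'w_i) \quad \text{(heteroscedastic)}$$

$$\text{Var}[U_i \mid z_i] = \sigma_u^2\exp(\gamma'z_i) \ \text{and} \ \text{Var}[v_i \mid w_i] = \sigma_v^2\exp(\eta'w_i) \quad \text{(doubly heteroscedastic)}$$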

There is no requirement that the same variables enter the two functions, and either or both may be
heterogeneous. The model specification is

; Heteroscedasticity or ; Het
and either or both of
; Hfv = variables in the variance of v
; Hfu = variables in the variance of u

If either variance function is not given, that variance is assumed to be constant. The variance function
is the exponential format used throughout LIMDEP. An unspecified variance therefore implies
$\sigma_{j,i}^2 = \exp(\text{constant})$ for $j = u$ or $v$, which is the same as

; Hfv = one or ; Hfu = one

If both are unspecified, then the implied model

; Het ; Hfv = one ; Hfu = one

is the default, normal-half normal stochastic frontier model. It provides identical estimates. (Try it.)
A constant (one) is automatically inserted into both lists if you do not include it. This form may be
used with the normal-half normal and normal-truncated normal models.

E63.2.2 Exponential and Gamma Models with Heterogeneity


The one sided component of the normal-exponential and normal-gamma models is
parameterized with a scale parameter, $\theta$, which is thus far taken to be a constant. In these models,

$$E[u_i] = P/\theta = P\sigma_u$$

where $P = 1$ in the exponential model. The exponential heteroscedasticity model for $u_i$ is extended to
these two models by using

$$\theta_i = \theta\exp(-\gamma'z_i).$$

With this parameterization, the estimates from this model will be comparable to those for the half
normal and truncated normal models. (See the examples below.) To request this form, use

; Het ; Hfu = the list of variables.



The list should not contain a constant term, one. This may be used in all implementations of the
exponential and gamma models. Note, however, that in the panel data settings, the parameter is assumed
to be time invariant. The values for $z_i$ are taken from the data record for the last period for firm i.
We will return to this subject below. The symmetric component, v, may also be heteroscedastic, as
in the other models, with

; Hfv = list of variables.

E63.2.3 Efficiency Estimation with Heteroscedasticity


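This extension does not change the computation of measures of efficiency or inefficiency.
The central results are the JLMS estimators,

$$\hat{E}[u \mid \varepsilon] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(w)}{1-\Phi(w)} - w\right], \quad \sigma^2 = \sigma_v^2 + \sigma_u^2, \quad w = S\varepsilon\lambda/\sigma$$

for the half normal models and

$$\hat{E}[u \mid \varepsilon] = \sigma_v\left[\frac{\phi(w)}{1-\Phi(w)} - w\right], \quad w = S\varepsilon/\sigma_v + \theta\sigma_v$$

for the exponential models. These functions are evaluated for each observation at

$$\lambda_i = \sigma_{u,i}/\sigma_{v,i} \quad \text{and} \quad \sigma_i^2 = \sigma_{u,i}^2 + \sigma_{v,i}^2$$

for the half normal model and $\sigma_{v,i}$ and $\theta_i$ likewise in the exponential and gamma models.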
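As an illustration of the two JLMS formulas, the following Python sketch evaluates them for a single residual. It is not LIMDEP's code: the function names and argument names are hypothetical, and `S` is the sign convention used elsewhere in this chapter (+1 for a production frontier, -1 for a cost frontier). The identity 1 − Φ(w) = Φ(−w) is used to avoid cancellation for large w.

```python
import math


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), written with erfc for tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def jlms_half_normal(eps, sigma_u, sigma_v, S=1):
    """E[u | eps] for the normal-half normal model."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    w = S * eps * lam / sigma
    hazard = std_normal_pdf(w) / std_normal_cdf(-w)  # phi(w) / [1 - Phi(w)]
    return sigma * lam / (1.0 + lam ** 2) * (hazard - w)


def jlms_exponential(eps, theta, sigma_v, S=1):
    """E[u | eps] for the normal-exponential model."""
    w = S * eps / sigma_v + theta * sigma_v
    hazard = std_normal_pdf(w) / std_normal_cdf(-w)
    return sigma_v * (hazard - w)


if __name__ == "__main__":
    # With heteroscedasticity, pass the observation-specific sigma_{u,i},
    # sigma_{v,i} (or theta_i) in place of the constants.
    for eps in (-0.3, 0.0, 0.3):
        u_hat = jlms_half_normal(eps, sigma_u=0.19, sigma_v=0.11)
        print(eps, u_hat, math.exp(-u_hat))  # inefficiency and exp(-E[u|eps])
```

In a production frontier, a more negative residual produces a larger estimate of u, which is the pattern the loop above displays.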

E63.2.4 Application
The estimates below show a production frontier based on the six inputs. The second set of
results presents the heteroscedastic model, with the variance of v a function of the log of the average
stage length and the variance of u depending on the load factor and the log of the number of points
served. We examine the efficiency results, then compute the average partial effects of the
environmental variables on technical efficiency.

FRONTIER ; Lhs = lq ; Rhs = one,ll,lp,lf,le,lm,lk ; Techeff = eu $


FRONTIER ; Lhs = lq ; Rhs = one,ll,lp,lf,le,lm,lk ; Techeff = euhet
; Het ; Hfv = lstage ; Hfu = loadfctr,points $
PARTIALS ; Effects: lstage / loadfctr / points ; Summary $
KERNEL ; Rhs = eu,euhet
; Title = Kernel Estimators for Technical Efficiency $
PLOT ; Lhs = eu ; Rhs = euhet ; Rh2 = eu ; Fill ; Grid
; Title = Estimates of Technical Efficiency
; Vaxis = exp(-E[u|e]) for Heteroscedastic Model $

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 108.43918
Estimation based on N = 256, K = 9
Variances: Sigma-squared(v)= .01902
Sigma-squared(u)= .01692
Sigma(v) = .13791
Sigma(u) = .13007
Sigma = Sqr[(s^2(u)+s^2(v)]= .18957
Gamma = sigma(u)^2/sigma^2 = .47074
Var[u]/{Var[u]+Var[v]} = .24425
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = .730
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439
LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530
LP| .44533*** .09498 4.69 .0000 .25917 .63149
LF| .37257*** .07038 5.29 .0000 .23463 .51052
LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299
LM| .69910*** .07580 9.22 .0000 .55054 .84766
LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759
|Variance parameters for compound error
Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373
Sigma| .18957*** .00064 297.81 .0000 .18832 .19082
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 149.30854
Estimation based on N = 256, K = 12
Inf.Cr.AIC = -274.6 AIC/N = -1.073
Variances: Sigma-squared(v)= .01292
Sigma-squared(u)= .03575
Sigma(v) = .11367
Sigma(u) = .18907
Sigma = Sqr[(s^2(u)+s^2(v)]= .22061
Gamma = sigma(u)^2/sigma^2 = .73450
Var[u]/{Var[u]+Var[v]} = .50132
Variances averaged over observations
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 2
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 3
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = 82.468
Kodde-Palm C*: 95%: 8.761, 99%: 12.483
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -3.29243*** .72664 -4.53 .0000 -4.71662 -1.86824
LL| -.47507*** .08890 -5.34 .0000 -.64932 -.30083
LP| .50435*** .10452 4.83 .0000 .29950 .70920
LF| .53204*** .07550 7.05 .0000 .38406 .68003
LE| 2.36654*** .69245 3.42 .0006 1.00936 3.72372
LM| .53413*** .08670 6.16 .0000 .36419 .70406
LK| -2.43136*** .77258 -3.15 .0016 -3.94558 -.91713
|Parameters in variance of v (symmetric)
Constant| -3.97891*** .86601 -4.59 .0000 -5.67626 -2.28155
LSTAGE| -.06406 .13359 -.48 .6315 -.32590 .19777
|Parameters in variance of u (one sided)
Constant| 9.96191** 4.51238 2.21 .0273 1.11781 18.80600
LOADFCTR| -25.9711*** 9.37571 -2.77 .0056 -44.3471 -7.5950
POINTS| -.00353 .01288 -.27 .7840 -.02877 .02171
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

The figure below displays the kernel density estimators for the two sets of estimated
inefficiencies. The upper one is for the heteroscedastic model. The figure shows clearly the
influence of the heterogeneity. The means of the two distributions are virtually the same, but the
variance in the heteroscedastic model is considerably higher.

Figure E63.1 Kernel Estimators for Density of E[u|ε] with and without Heteroscedasticity

Figure E63.2 Plot of Estimated Inefficiencies, Heteroscedastic vs. Homoscedastic

---------------------------------------------------------------------
Partial Effects for JLMS Estimator in Normal/het SF Model
Partial Effects Averaged Over Observations
* ==> Partial Effect for a Binary Variable
---------------------------------------------------------------------
Partial Standard
(Delta method) Effect Error |t| 95% Confidence Interval
---------------------------------------------------------------------
LSTAGE -.00034 .00071 .48 -.00174 .00105
LOADFCTR .62934 .17576 3.58 .28485 .97382
POINTS .00009 .00031 .28 -.00052 .00069
---------------------------------------------------------------------

E63.2.5 Technical Details


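For the models with heteroscedasticity, we revert to the original structural form of the model
to form the log likelihoods. For the normal-half normal model, for example, we use

$$\log L_i = \tfrac{1}{2}\log(2/\pi) - \log\sigma_i - \tfrac{1}{2}(\varepsilon_i/\sigma_i)^2 + \log\Phi[-S\varepsilon_i\lambda_i/\sigma_i]$$

where

$$\sigma_i = \sqrt{\sigma_{u,i}^2 + \sigma_{v,i}^2}, \quad \lambda_i = \sigma_{u,i}/\sigma_{v,i}, \quad \sigma_{u,i}^2 = \exp(\gamma'z_i), \quad \sigma_{v,i}^2 = \exp(\eta'w_i),$$

where $S = +1$ for a production frontier and $-1$ for a cost frontier. Likewise, for the truncation model,

$$\log L_i = -\tfrac{1}{2}\log 2\pi - \log\sigma_i - \tfrac{1}{2}\left[(S\varepsilon_i + \mu)/\sigma_i\right]^2 + \log\Phi\left[(\mu/\lambda_i - S\varepsilon_i\lambda_i)/\sigma_i\right] - \log\Phi(\mu/\sigma_{u,i}).$$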
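The normal-half normal term with exponential variance functions can be written out directly. The Python sketch below is an illustration under stated assumptions, not LIMDEP's routine: the function name, the argument names and the coefficient values in the example are hypothetical, and each variable list is assumed to carry its own leading constant (1.0) so that the scale parameters are absorbed into the exponential functions.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), written with erfc for tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def loglik_half_normal(eps, z, w_vars, gamma, eta, S=1):
    """Log likelihood term for one observation.

    z, w_vars : variables in the variance of u and of v (constant included)
    gamma, eta: the matching coefficient lists
    """
    var_u = math.exp(sum(g * zi for g, zi in zip(gamma, z)))
    var_v = math.exp(sum(e * wi for e, wi in zip(eta, w_vars)))
    sigma = math.sqrt(var_u + var_v)
    lam = math.sqrt(var_u / var_v)
    return (0.5 * math.log(2.0 / math.pi)
            - math.log(sigma)
            - 0.5 * (eps / sigma) ** 2
            + math.log(std_normal_cdf(-S * eps * lam / sigma)))


if __name__ == "__main__":
    # One observation: residual -0.1, one variable in the variance of u.
    term = loglik_half_normal(-0.1, z=[1.0, 0.4], w_vars=[1.0],
                              gamma=[0.0, 0.5], eta=[-0.5])
    print("log L_i =", term)
```

Because the variances enter only through `exp(...)`, any real-valued coefficients give positive variances, so the optimizer needs no constraints on them.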

We build the structure of the model with two freely varying variance parameters, $\sigma_{u,i}$ and $\sigma_{v,i}$, rather
than the reduced form parameters $\lambda$ and $\sigma$. The use of $\lambda_i$ as a free parameter would not be
appropriate because the numerator and denominator of $\lambda_i$ must be allowed to vary freely and
independently. A like consideration rules out the composed parameter $\sigma_i$. The formulation of the
log likelihood and its derivatives follows the results given earlier for the homogeneous cases. Where
the derivatives with respect to $\lambda$ and $\sigma$ emerge, we use the chain rule to differentiate with respect to
$\sigma_{u,i}$ and $\sigma_{v,i}$ first. Note that the independent parameters $\sigma_u$ and $\sigma_v$ have been absorbed into the
exponential functions. Thus, $\sigma_v^2$ is $\exp(\eta_0)$, where $\eta_0$ is the constant term in the variance function
for $v$. This ensures that the variances are always positive.
The normal-gamma and normal-exponential models are not reparameterized. The log
likelihood for the exponential model with variance heterogeneity is

$$\log L_i = \log\theta_i + \tfrac{1}{2}\theta_i^2\sigma_{i,v}^2 + \theta_i S\varepsilon_i + \log\Phi\left[-S\varepsilon_i/\sigma_{i,v} - \theta_i\sigma_{i,v}\right]$$

where

$$\theta_i = \theta\exp(-\gamma'z_i) \quad \text{and} \quad \sigma_{i,v} = \sigma_v\exp(\eta'w_i).$$

The sign change in $\theta_i$ is used to make the normal-exponential model comparable to the normal-half
normal model, since $\text{Var}[u_i] = 1/\theta_i^2$.

E63.3 The Normal-Truncated Normal Model


The normal-truncated normal model relaxes an implicit restriction in the normal-half normal
model, that the mean of the underlying inefficiency variable is zero. The extended model is obtained
by allowing $\mu$, the mean of $U$, to be nonzero;

$$y = \beta'x + v - u, \quad u = |U|,$$

$$U \sim N[\mu, \sigma_u^2],$$

$$v \sim N[0, \sigma_v^2].$$

(With a constant term in the model, no similar parameter can be introduced into the distribution of v.)
The command for estimating this model is

FRONTIER ; Lhs = dependent variable


; Rhs = one, other independent variables
; Model = Truncated Normal $ (or ; Model = T)

The specification of the cost frontier and the estimator of technical inefficiency are requested in the
same fashion,
; Cost
and ; Eff = variable name

Other optional parts of the command are the same as that for the normal-half normal model.
We note that this model is extremely volatile, owing to the rather weak identification of the
parameter $\mu$. It is difficult to distinguish the mean from the variance parameter in this model. In the
truncation model,

$$E[u_i] = \mu + \sigma_u\,\phi(\mu/\sigma_u)/\Phi(\mu/\sigma_u).$$

This implies that $\sigma_u$ and $\mu$ can covary so as to produce little or no variation in the expectation of $u_i$.
The likelihood is not a function of the square of $u_i$, so this mean is the only source of information
about these two parameters. (By totally differentiating the expected value, one can solve for the
implicit relationship, $d\mu/d\sigma_u$, that produces $dE[u_i] = 0$.) The example below suggests how this aspect
of the model influences (or fails to influence) the estimates of inefficiency. For purposes of the JLMS
estimator for the half normal model, when the mean of $U$ is a nonzero $\mu$, the argument to the
function is replaced with

$$w = S\varepsilon\lambda/\sigma - \mu/(\sigma\lambda).$$

The remaining part of the computation is the same.
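The weak identification can be seen by evaluating the expression for E[u_i] at two very different parameter pairs. The Python sketch below is illustrative only; the function names are hypothetical. The two pairs are the half normal estimate of σ_u (with μ = 0) and the truncated normal estimates of μ and σ_u reported in the application in Section E63.3.1.

```python
import math


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), written with erfc for tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_normal_mean(mu, sigma_u):
    """E[u] = mu + sigma_u * phi(mu/sigma_u) / Phi(mu/sigma_u)."""
    a = mu / sigma_u
    tail = std_normal_cdf(a)
    if tail == 0.0:
        # Phi underflows when mu/sigma_u is far below about -37.
        raise ValueError("mu/sigma_u is too negative for this simple formula")
    return mu + sigma_u * std_normal_pdf(a) / tail


if __name__ == "__main__":
    half_normal = truncated_normal_mean(0.0, 0.13007)        # mu fixed at zero
    truncated = truncated_normal_mean(-31.5468, 1.57738)     # mu estimated
    print("half normal      E[u] =", half_normal)
    print("truncated normal E[u] =", truncated)
```

Although μ and σ_u differ enormously between the two fits, the implied mean inefficiency is small and of the same order of magnitude in both, which is consistent with the data carrying little information for separating the two parameters.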



E63.3.1 Application
The results below show estimates of a stochastic production frontier with the half normal then the
truncated normal specifications. The additional parameterization appears to have had a large impact
on the results; the estimates are noticeably different. The plot of the two sets of inefficiency
estimates suggests that the effect of the new specification has been little more than to double the
estimated values from the model (the dashed line in the figure shows the function $u_{TN} = 2u_{HN}$). The
extremely large estimates of $\mu$ and its standard error do suggest that something is amiss with the
model, however.
The commands are:

FRONTIER ; Lhs = lq ; Rhs = one,ll,lp,lf,le,lm,lk ; Techeff = u $


FRONTIER ; Lhs = lq ; Rhs = one,ll,lp,lf,le,lm,lk ; Techeff = ut ; Model = T $
PLOT ; Lhs = u ; Rhs = ut ; Rh2 = u ; Fill ; Grid
; Title = Truncated Normal Inefficiencies vs. Half Normal $
DSTAT ; Rhs = u,ut $
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 108.43918
Variances: Sigma-squared(v)= .01902
Sigma-squared(u)= .01692
Sigma(v) = .13791
Sigma(u) = .13007
Sigma = Sqr[(s^2(u)+s^2(v)]= .18957
Gamma = sigma(u)^2/sigma^2 = .47074
Var[u]/{Var[u]+Var[v]} = .24425
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = .730
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439
LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530
LP| .44533*** .09498 4.69 .0000 .25917 .63149
LF| .37257*** .07038 5.29 .0000 .23463 .51052
LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299
LM| .69910*** .07580 9.22 .0000 .55054 .84766
LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759
|Variance parameters for compound error
Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373
Sigma| .18957*** .00064 297.81 .0000 .18832 .19082
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 109.49695
Estimation based on N = 256, K = 10
Variances: Sigma-squared(v)= .01896
Sigma-squared(u)= 2.48813
Sigma(v) = .13771
Sigma(u) = 1.57738
Sigma = Sqr[(s^2(u)+s^2(v)]= 1.58338
Gamma = sigma(u)^2/sigma^2 = .99244
Var[u]/{Var[u]+Var[v]} = .97946
Stochastic Production Frontier, e = v-u
Half Normal:u(i)=|U(i)|; frontier model
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = 2.845
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -3.11541*** .77143 -4.04 .0001 -4.62739 -1.60343
LL| -.44532*** .07797 -5.71 .0000 -.59814 -.29249
LP| .46908*** .11368 4.13 .0000 .24628 .69188
LF| .37437*** .07465 5.02 .0000 .22807 .52068
LE| 2.20830*** .73883 2.99 .0028 .76023 3.65637
LM| .67741*** .09341 7.25 .0000 .49433 .86048
LK| -2.20620*** .82402 -2.68 .0074 -3.82126 -.59115
|Offset [mean=mu(i)] parameters in one sided error
Mu| -31.5468 5061.203 -.01 .9950 -9951.3228 9888.2292
|Variance parameters for compound error
Lambda| 11.4545 907.8501 .01 .9899 -1767.8991 1790.8081
Sigma| 1.58338 124.7546 .01 .9899 -242.93113 246.09790
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Descriptive Statistics
--------+---------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
--------+---------------------------------------------------------------------
U| .902312 .035500 .703534 .963108 256 0
UT| .925474 .039335 .608274 .972355 256 0
--------+---------------------------------------------------------------------

Figure E63.3 Inefficiency Estimates from Truncated Normal Model

E63.3.2 Battese and Coelli (1995) Formulation


There are (apparently) two formulations of the normal-truncated normal model in the
literature. The one formulated above,

$$y = \beta'x + v - u, \quad u = |U|,$$

$$U \sim N[\mu, \sigma_u^2],$$

$$v \sim N[0, \sigma_v^2],$$

is due to Stevenson (1980). Note that the inefficiency term is the absolute value of a normally
distributed variable with a nonzero mean. Battese and Coelli proposed an apparently different
formulation of the truncation model;

$$u = \mu + w$$

where $w$ is a truncated normal, such that

$$w > -\mu.$$

This is actually the same model. You can obtain the estimates using this alternative formulation with

; Model = BC95

in place of ; Model = T. The log likelihood for this formulation involves a one to one
reparameterization of the Stevenson model, which has slightly different numerical properties. You
can see this in the application below. The estimated inefficiency and efficiency values produced by
the two models are the same to five or six digits, however.

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 109.48819
Variances: Sigma-squared(v)= .01918
Sigma-squared(u)= 2.25705
Sigma(v) = .13850
Sigma(u) = 1.50235
Sigma = Sqr[(s^2(u)+s^2(v)]= 1.50872
Gamma = sigma(u)^2/sigma^2 = .99157
Var[u]/{Var[u]+Var[v]} = .97715
Stochastic Production Frontier, e = v-u
Battese/Coelli 1995 truncated normal model
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 1
Deg. freedom for inefficiency model: 2
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = 2.828
Kodde-Palm C*: 95%: 5.138, 99%: 8.273
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -3.09929*** .76919 -4.03 .0001 -4.60687 -1.59172
LL| -.44370*** .07771 -5.71 .0000 -.59600 -.29140
LP| .46535*** .11351 4.10 .0000 .24288 .68781
LF| .37430*** .07432 5.04 .0000 .22863 .51997
LE| 2.18991*** .73664 2.97 .0030 .74613 3.63369
LM| .67921*** .09322 7.29 .0000 .49651 .86191
LK| -2.18647*** .82171 -2.66 .0078 -3.79700 -.57594
|Offset [mean=z(i)*delta] parameters in one sided error
Constant| -29.6062 4821.053 -.01 .9951 -9478.6972 9419.4848
|Variance parameters for compound error
Gamma| .99157 1.34377 .74 .4606 -1.64216 3.62531
SigmaSqd| 2.27624 363.5754 .01 .9950 -710.31839 714.87086
--------+--------------------------------------------------------------------
(Stevenson formulation)
Log likelihood function 94.86417
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -3.11541*** .77143 -4.04 .0001 -4.62739 -1.60343
LL| -.44532*** .07797 -5.71 .0000 -.59814 -.29249
LP| .46908*** .11368 4.13 .0000 .24628 .69188
LF| .37437*** .07465 5.02 .0000 .22807 .52068
LE| 2.20830*** .73883 2.99 .0028 .76023 3.65637
LM| .67741*** .09341 7.25 .0000 .49433 .86048
LK| -2.20620*** .82402 -2.68 .0074 -3.82126 -.59115
|Offset [mean=mu(i)] parameters in one sided error
Mu| -31.5468 5061.203 -.01 .9950 -9951.3228 9888.2292
|Variance parameters for compound error
Lambda| 11.4545 907.8501 .01 .9899 -1767.8991 1790.8081
Sigma| 1.58338 124.7546 .01 .9899 -242.93113 246.09790

E63.3.3 Technical Details on the Truncated Normal Model


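The individual term in the log likelihood for the normal-truncated normal model is

$$\log L_i = -\tfrac{1}{2}\log 2\pi - \log\sigma - \tfrac{1}{2}\left[(S\varepsilon_i + \mu)/\sigma\right]^2 - \log\Phi(\mu/\sigma_u) + \log\Phi\left[(\mu/\lambda - S\varepsilon_i\lambda)/\sigma\right].$$

The definitions above imply that

$$\sigma_u = \sigma\lambda/\sqrt{1+\lambda^2}.$$

Using this and the reparameterization

$$\tau = \mu/(\sigma\lambda)$$

produces the log likelihood for this model,

$$\log L_i = -\tfrac{1}{2}\log 2\pi - \log\sigma - \tfrac{1}{2}\left(S\varepsilon_i/\sigma + \tau\lambda\right)^2 - \log\Phi\left(\tau\sqrt{1+\lambda^2}\right) + \log\Phi\left(\tau - S\varepsilon_i\lambda/\sigma\right).$$

The function is then maximized with respect to $\beta$, $\sigma$, $\lambda$ and $\tau$. After optimization, the structural
parameter $\mu$ is recovered from the result $\mu = \tau\sigma\lambda$. For the model with heterogeneity in the mean
presented in Section E63.3.4,

$$\mu_i = \delta'z_i,$$

we simply replace $\tau$ with $\tau_i = \tilde{\delta}'z_i$, then recover the parameter vector $\delta$ from the same transformation
as before, $\delta = \tilde{\delta}\sigma\lambda$.
For purposes of the JLMS estimator for the half normal model, when the mean of $U$ is a
nonzero $\mu$, the argument to the function is replaced with

$$w = S\varepsilon\lambda/\sigma - \mu/(\sigma\lambda).$$

The remaining part of the computation is the same.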
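The equivalence of the structural and reparameterized forms of the log likelihood can be checked numerically. The Python sketch below is an illustration with hypothetical function names and arbitrary parameter values; `tau` denotes the reparameterized mean μ/(σλ), and `S` is +1 for a production frontier and -1 for a cost frontier.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF Phi(x), written with erfc for tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def loglik_structural(eps, mu, sigma, lam, S=1):
    """Truncated normal log likelihood term in terms of mu, sigma, lambda."""
    sigma_u = sigma * lam / math.sqrt(1.0 + lam ** 2)
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(sigma)
            - 0.5 * ((S * eps + mu) / sigma) ** 2
            - math.log(std_normal_cdf(mu / sigma_u))
            + math.log(std_normal_cdf((mu / lam - S * eps * lam) / sigma)))


def loglik_reparameterized(eps, tau, sigma, lam, S=1):
    """The same term after substituting tau = mu / (sigma * lambda)."""
    return (-0.5 * math.log(2.0 * math.pi)
            - math.log(sigma)
            - 0.5 * (S * eps / sigma + tau * lam) ** 2
            - math.log(std_normal_cdf(tau * math.sqrt(1.0 + lam ** 2)))
            + math.log(std_normal_cdf(tau - S * eps * lam / sigma)))


if __name__ == "__main__":
    mu, sigma, lam = 0.3, 1.2, 1.5          # arbitrary illustrative values
    tau = mu / (sigma * lam)
    for eps in (-1.0, 0.0, 0.7):
        print(eps,
              loglik_structural(eps, mu, sigma, lam),
              loglik_reparameterized(eps, tau, sigma, lam))
    print("recovered mu =", tau * sigma * lam)
```

The two columns agree to rounding error for every residual, and multiplying the estimated `tau` by σλ returns μ.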

E63.3.4 Heterogeneity in the Mean in the Truncation Model


The models listed above are all 'homogeneous.' Both the means and the variances of the
underlying disturbance distributions are constant. There are several models of heterogeneity
available as well. Use

; Model = T ; Rh2 = list of variables that enter the mean

to specify the heterogeneity in mean model, $U_i \sim N[\delta'z_i, \sigma_u^2]$. In formulating this model, though it is
not required, you should include a constant in $z_i$ (the Rh2 variables) so that the homogeneous model
becomes a special case. Also, if you are fitting a panel data version of this, note that the assumption
underlying the model is that the same $u_i$ occurs in every period. Therefore, the $z_i$ should be the
same in every period. LIMDEP will assume this is the case, and only use the Rh2 variables provided
for the first period.

E63.3.5 Truncation and Heteroscedasticity


The doubly heteroscedastic model is also available for the truncated normal stochastic
frontier model. In

$$y_i = \beta'x_i + v_i - u_i$$

you may specify

; Model = Truncated Normal ; Rh2 = list of variables

and $\text{Var}[u_i] = \sigma_u^2\exp(\gamma'z_i)$ with

; Het ; Hfu = list of variables in zi

and/or $\text{Var}[v_i] = \sigma_v^2\exp(\eta'w_i)$ with

; Het ; Hfv = list of variables in wi

Note that since both variance functions have a free multiplicative constant, you should not include
one in either variable list.
In the absence of the Rh2 list, the mean of the underlying truncated variable is taken to be a
constant to be estimated. This formulation encompasses all of Stevenson (1980), Reifschneider and
Stevenson (1991), Huang and Liu (1994), and Battese and Coelli (1995). (Notwithstanding the
assertion in the Battese and Coelli paper, the latter is not a panel data treatment as observations are
still assumed to be independent.)
To illustrate the truncated normal estimator, we have refit the stochastic frontier production
function with a complete set of firm dummy variables (less the last one) and the load factor variable
in the mean of the underlying distribution. In the second model below, we have made the variance
of v a function of the log of the average stage length. The command set begins with a small repair to
the data set. One of the firms has no observations for the load factor, stage length or points served
variables – they are coded as zero in the data. These observations are bypassed, then the firm
dummies for the fixed effects model are assembled.

SAMPLE ; All $
REJECT ; loadfctr = 0 $
CREATE ; i = Seq(firm) $
CREATE ; Expand(i,0) $
CREATE ; lk = Log(k) $
NAMELIST ; xp = one,lf,lm,le,ll,lp,lk $
FRONTIER ; Lhs = lq ; Rhs = xp ; Model = T ; Rh2 = loadfctr,_i_ $
FRONTIER ; Lhs = lq ; Rhs = xp ; Model = T ; Rh2 = loadfctr,_i_
; Het ; Hfv = lstage $

(These are 'true fixed effects' models.)



-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 196.20748
Estimation based on N = 256, K = 34
Inf.Cr.AIC = -324.4 AIC/N = -1.267
Model estimated: Aug 22, 2011, 22:29:09
Variances: Sigma-squared(v)= .00960
Sigma-squared(u)= .00389
Sigma(v) = .09799
Sigma(u) = .06241
Sigma = Sqr[(s^2(u)+s^2(v)]= .11618
Gamma = sigma(u)^2/sigma^2 = .28856
Var[u]/{Var[u]+Var[v]} = .12845
Stochastic Production Frontier, e = v-u
Half Normal:u(i)=|U(i)|; frontier model
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 25
Deg. freedom for inefficiency model: 26
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = 176.266
Kodde-Palm C*: 95%:38.301, 99%: 45.026
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -2.92400*** .68225 -4.29 .0000 -4.26118 -1.58682
LF| .31938*** .09026 3.54 .0004 .14246 .49629
LM| .81647*** .08387 9.73 .0000 .65209 .98086
LE| 1.99934*** .64368 3.11 .0019 .73776 3.26092
LL| -.42790*** .10954 -3.91 .0001 -.64260 -.21321
LP| .42291*** .10529 4.02 .0001 .21654 .62929
LK| -2.07145*** .72267 -2.87 .0042 -3.48786 -.65503
|Offset [mean=mu(i)] parameters in one sided error
LOADFCTR| -.83124 6.87337 -.12 .9037 -14.30280 12.64031
I01| .63250 4.90139 .13 .8973 -8.97405 10.23904
I02| .58118 4.27763 .14 .8919 -7.80282 8.96519
(Firms 3-21 omitted)
I22| .45249 4.00889 .11 .9101 -7.40480 8.30977
I23| .64687 99.45841 .01 .9948 -194.28803 195.58176
I24| -.19804 7.26011 -.03 .9782 -14.42760 14.03152
|Variance parameters for compound error
Lambda| .63686** .28984 2.20 .0280 .06879 1.20494
Sigma| .11618*** .01008 11.53 .0000 .09643 .13593
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 215.58601
Estimation based on N = 256, K = 35
Variances: Sigma-squared(v)= .00634
Sigma-squared(u)= .01037
Sigma(u) = .10183
Sigma(v) = .07961
Sigma = Sqr[(s^2(u)+s^2(v)]= .12926
Variances averaged over observations
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 25
Deg. freedom for inefficiency model: 26
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = 215.023
Kodde-Palm C*: 95%:38.301, 99%: 45.026
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -1.98442* 1.05055 -1.89 .0589 -4.04346 .07463
LF| .45669*** .11002 4.15 .0000 .24105 .67233
LM| .59013*** .10421 5.66 .0000 .38589 .79437
LE| 1.11856 1.00928 1.11 .2677 -.85959 3.09671
LL| -.29237*** .10923 -2.68 .0074 -.50646 -.07827
LP| .31311** .14333 2.18 .0289 .03220 .59402
LK| -1.14743 1.10875 -1.03 .3007 -3.32054 1.02568
|Mean of underlying truncated distribution
LOADFCTR| -2.20067*** .42161 -5.22 .0000 -3.02701 -1.37433
I01| 1.44767*** .25736 5.63 .0000 .94326 1.95208
I02| 1.39624*** .22401 6.23 .0000 .95718 1.83529
(Firms 3-22 omitted)
I24| 1.29355*** .24998 5.17 .0000 .80360 1.78349
|Scale parms. for random components of e(i)
ln_sgmaU| -2.28443*** .02100 -108.79 .0000 -2.32559 -2.24328
ln_sgmaV| -3.22203*** 1.20573 -2.67 .0075 -5.58522 -.85884
|Heteroscedasticity in variance of symmetric v(i)
LSTAGE| .11855 .19755 .60 .5485 -.26865 .50574
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

E63.4 Alvarez et al. – Equality Constrained Scaling Model


Alvarez, Amsler, Orea and Schmidt (2006) have suggested a form of the truncation model
which encompasses a number of ideas in stochastic frontier modeling. Their formulation is a
normal-truncated normal frontier model with

μi = μδ′zi and σu,i = σuδ′zi.

The mean and standard deviation of the underlying truncated normal variable ui are scaled by the
same linear function of the data. We are skeptical of the linear scaling of the variance, and propose
our usual exponential form instead. The linear form may be natural for the mean, but it allows the
variance parameter to be negative, which is unacceptable. The model used here is

μi = μ exp(δ′zi) and σu,i = σu exp(γ′zi).

The Alvarez model results if δ = γ. Otherwise, we allow these to be free and to produce another
variant of the frontier model. Note that as stated, the unrestricted model is simply the normal-
truncated normal model with heteroscedasticity, except that the variables enter the truncation mean
through the exponential function rather than linearly.
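The scaling property can be illustrated with a minimal sketch. It assumes the exponential form described above, with one coefficient vector for the mean and one for the standard deviation; the function name, the coefficient names and all numerical values are hypothetical and are not LIMDEP output.

```python
import math


def scaled_moments(z, mu, sigma_u, delta, gamma):
    """Return (mu_i, sigma_ui) for one observation with covariates z.

    delta scales the mean and gamma scales the standard deviation,
    both through an exponential function (assumed form).
    """
    mean_scale = math.exp(sum(d * zk for d, zk in zip(delta, z)))
    sd_scale = math.exp(sum(g * zk for g, zk in zip(gamma, z)))
    return mu * mean_scale, sigma_u * sd_scale


# Hypothetical covariates (load factor, log stage length) and parameters.
z_rows = [(0.45, 6.1), (0.60, 6.8), (0.52, 7.3)]
mu, sigma_u = 0.5, 0.2
delta = (-0.6, 0.1)

# Equality constrained case: gamma = delta, so mu_i / sigma_ui is constant.
ratios = []
for z in z_rows:
    mu_i, sigma_ui = scaled_moments(z, mu, sigma_u, delta, delta)
    ratios.append(mu_i / sigma_ui)
print(ratios)
```

With equal coefficient vectors, the ratio of the mean to the standard deviation of the underlying distribution is the same for every observation, so only the scale of ui varies with zi. With different vectors, the ratio varies as well.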
The equality constrained scaling model is requested with

FRONTIER ; Lhs = y ; Rhs = one, x...
; Model = Scaling
; Heteroscedasticity
; Rh2 = variables in mean of truncated distribution
; Hfu = the same list of variables $

Note that in this case, Rh2 and Hfu give the same list. To obtain the scaling model without forcing the
equality of the mean coefficients δ and the variance coefficients γ, use

FRONTIER ; Lhs = y ; Rhs = one, x...
; Model = S
; Heteroscedasticity
; Rh2 = variables in mean of truncated distribution
; Hfu = the same list of variables $

Note that the specification is ; Model = Scaling in the equality constrained case and ; Model = S when
the equality constraint is relaxed. (In this formulation, the variable lists could differ.) To constrain
the mean coefficients δ = 0, which just produces the heteroscedasticity model, use

FRONTIER ; Lhs = y ; Rhs = one, x...
; Model = T
; Heteroscedasticity
; Hfu = list of variables $

To constrain the variance coefficients γ = 0, use the setup for the truncated normal form, but with
; Model = S rather than ; Model = T to obtain the exponential scaling of the mean.

FRONTIER ; Lhs = y ; Rhs = one, x...
; Model = S
; Rh2 = variables in mean of truncated distribution $

Finally, with both the mean coefficients δ = 0 and the variance coefficients γ = 0, this is just the
standard normal-truncated normal model.

Technical Details

The implementation of the scaling model in LIMDEP is just a version of the truncation
model with heteroscedasticity. The modifications of that model are:

• The constant terms in the mean and variance are enforced by the program.
• The mean function is exponential.
• In the first form of the model, a constraint is imposed that the coefficients in the mean and
  variance functions are the same.

As Alvarez et al. note in their paper, this model is not supported by any particular theory of the
frontier framework. They suggest it as a natural extension of the familiar model with truncation,
and argue that the unnatural form of the model would be the one with different scaling
factors in the mean and variance functions.

Application

To illustrate the scaling model, we use the airlines cost data. The cost function is fit with
truncation mean and variance functions that depend on the load factor and (log of) the average stage
length. The equality constraint is imposed in the first model and relaxed in the second.

FRONTIER ; Lhs = lc ; Cost ; Rhs = x
; Model = Scaling ; Het
; Rh2 = loadfctr,lstage
; Hfu = loadfctr,lstage $
FRONTIER ; Lhs = lc ; Cost ; Rhs = x
; Model = S ; Het
; Rh2 = loadfctr,lstage
; Hfu = loadfctr,lstage $

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LC
Log likelihood function 172.27160
Estimation based on N = 256, K = 13
Variances: Sigma-squared(v)= .01528
Sigma-squared(u)= .00000
Sigma(v) = .12361
Sigma(u) = .00169
Sigma = Sqr[(s^2(u)+s^2(v)]= .12363
Stochastic Frontier Scaling Model
Mean scale factor for E[u] = .6996
Mean scale factor for V[u] = .6996
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 2
Deg. freedom for truncation mean: 2
Deg. freedom for inefficiency model: 5
LogL when sigma(u)=0 157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] = 28.713
Kodde-Palm C*: 95%:10.371, 99%: 14.325
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 18.9477 27.00668 .70 .4829 -33.9844 71.8798
LY| .95234*** .02117 44.98 .0000 .91084 .99383
LY2| .07740*** .01534 5.04 .0000 .04733 .10747
LPKP| 1.50434 1.86479 .81 .4198 -2.15058 5.15926
LPLP| .12682 .08328 1.52 .1278 -.03640 .29003
LPMP| -.16640 1.21907 -.14 .8914 -2.55574 2.22294
LPEP| -.52809 .60356 -.87 .3816 -1.71105 .65488
LPFP| .00151 .02141 .07 .9436 -.04045 .04348
|Mean of Truncated Distribution, Mu then scale
Mu_0| 2.50985 11.12070 .23 .8214 -19.28633 24.30603
LOADFCTR| -.56559 3.85231 -.15 .8833 -8.11597 6.98479
LSTAGE| -.00823 .05624 -.15 .8837 -.11845 .10200
|Standard Deviation of u: Sigma(u) then scale
Sigmau_0| .00241 9.18604 .00 .9998 -18.00191 18.00673
LOADFCTR| -.56559 3.85231 -.15 .8833 -8.11597 6.98479
LSTAGE| -.00823 .05624 -.15 .8837 -.11845 .10200
|Standard deviation of v
Sigma(v)| .12361 .08711 1.42 .1559 -.04713 .29435
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LC
Log likelihood function 173.52520
Estimation based on N = 256, K = 15
Variances: Sigma-squared(v)= .01334
Sigma-squared(u)= .00121
Sigma(v) = .11551
Sigma(u) = .03476
Sigma = Sqr[(s^2(u)+s^2(v)]= .19230
Stochastic Frontier Scaling Model
Mean scale factor for E[u] = .3459
Mean scale factor for V[u] = .2261
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 2
Deg. freedom for truncation mean: 2
Deg. freedom for inefficiency model: 5
LogL when sigma(u)=0 157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] = 31.220
Kodde-Palm C*: 95%:10.371, 99%: 14.325
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 11.6452 24.94703 .47 .6406 -37.2501 60.5405
LY| .94078*** .02140 43.97 .0000 .89884 .98272
LY2| .06680*** .01579 4.23 .0000 .03585 .09776
LPKP| .85146 1.94378 .44 .6614 -2.95828 4.66120
LPLP| .16345** .07956 2.05 .0399 .00751 .31939
LPMP| .25417 1.26886 .20 .8412 -2.23275 2.74109
LPEP| -.34167 .62932 -.54 .5872 -1.57511 .89178
LPFP| .00164 .02164 .08 .9395 -.04078 .04406
|Mean of Truncated Distribution, Mu then scale
Mu_0| 1.92288*** .44030 4.37 .0000 1.05991 2.78584
LOADFCTR| -1.74305 4.08382 -.43 .6695 -9.74720 6.26110
LSTAGE| -.01930 .04649 -.42 .6781 -.11042 .07182
|Standard Deviation of u: Sigma(u) then scale
Sigmau_0| .15374 1.11571 .14 .8904 -2.03301 2.34049
LOADFCTR| -14.5014 10.21457 -1.42 .1557 -34.5216 5.5188
LSTAGE| 1.02454 1.26499 .81 .4180 -1.45479 3.50388
|Standard deviation of v
Sigma(v)| .11551*** .00793 14.56 .0000 .09996 .13106
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
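The two sets of results above can be compared with a likelihood ratio statistic, since the equality constrained model is nested in the unconstrained one. The sketch below is only an illustration using the two reported log likelihoods. It assumes the usual chi-squared limiting distribution with two degrees of freedom (K = 15 versus K = 13), which may be optimistic given how weakly identified the truncation parameters appear to be. The function name is hypothetical.

```python
import math


def lr_test_two_restrictions(logl_restricted, logl_unrestricted):
    """Likelihood ratio statistic and p value for two restrictions.

    For a chi-squared variable with 2 degrees of freedom the upper
    tail probability has the closed form exp(-x / 2).
    """
    statistic = 2.0 * (logl_unrestricted - logl_restricted)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Log likelihoods reported above for ; Model = Scaling and ; Model = S.
statistic, p_value = lr_test_two_restrictions(172.27160, 173.52520)
print(round(statistic, 4), round(p_value, 4))
```

Under these assumptions the statistic is well below the 95% critical value of 5.991, so the equality restriction would not be rejected for these data.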

E64: Panel Data Stochastic Frontier Models


E64.1 Introduction
The stochastic frontier model as it appears in the current literature was originally developed
by Aigner, Lovell, and Schmidt (1977). The canonical formulation that serves as the foundation for
other variations is their model,

y = β′x + v - u,

where y is the observed outcome (goal attainment), β′x + v is the optimal, frontier goal (e.g.,
maximal production output or minimum cost) pursued by the individual, β′x is the deterministic part
of the frontier and v ~ N[0,σv²] is the stochastic part. The two parts together constitute the
'stochastic frontier.' The amount by which the observed individual fails to reach the optimum (the
frontier) is u, where

u = |U| and U ~ N[0,σu²]

(change to v + u for a stochastic cost frontier or any setting in which the optimum is a minimum). In
this context, u is the 'inefficiency.' This is the normal-half normal model, which is the basic
form of the stochastic frontier model. Chapters E62 and E63 developed several versions of the
stochastic frontier model suitable for cross section and pooled data sets. This chapter will develop
versions of the model constructed specifically for panel data.

E64.2 Panel Data Estimators for Stochastic Frontier Models


The stochastic frontiers literature has steadily evolved since the developments of basic
random and fixed effects models by Pitt and Lee (1981) and by Cornwell, Schmidt and Sickles
(1990). All of the generally used forms of panel data models are supported in LIMDEP. The
following will document them in detail. These sections are arranged as follows:

• Pitt and Lee – Time Invariant Inefficiency, Random Effects,
• Cornwell, Schmidt and Sickles – Time Invariant Inefficiency, Fixed Effects,
• Battese and Coelli – Time Dependent Inefficiency Models,
• True Fixed Effects Models with Time Varying Inefficiency,
• True Random Effects Models with Time Varying Inefficiency,
• Random Parameters Stochastic Frontier Models,
• Alvarez et al. – Fixed Management (Random Parameters) Model,
• Latent Class Stochastic Frontier Models.

The panel models developed here will share features with other panel models in LIMDEP, as
presented in Chapters R22-R25. As in other settings, panels in all models may be unbalanced. Panels
are identified by

SETPANEL ; … $

then ; Panel in the command, or by

; Pds = group count

Nearly all of the models to be presented here actually require panel data, but a few will work, albeit
not as well as otherwise, with ; Pds = 1, i.e., with a cross section. This will be specifically noted
below when it is the case. In all models, the cost form as opposed to the production form is
requested with
; Cost

This and other model specifications are generally the same as the cross sectional cases.

E64.3 Pitt and Lee – Time Invariant Inefficiency, Random Effects
The panel data, random effects specifications based on the model of Pitt and Lee (1981) are

yit = α + β′xit + vit - Sui

with S = +1 for a production model and -1 for a cost model. The inefficiency component is assumed
to be time invariant. The base case is the normal-half normal model

ui = |Ui|, Ui ~ N[0,σu²].

This is a direct extension of the cross section variant discussed earlier. Several model formulations
are grouped in this class. The command for the Pitt and Lee group of models is given by changing
the base case specifications to

FRONTIER ; Lhs = y ; Rhs = one, ... ; Panel $

Pitt and Lee is the default panel data model. The only necessary change for the default case is
specification of the panel with ; Panel. As in the cross section case, the normal-exponential case is
requested with
; Model = Exponential

while the normal-truncated normal is requested with

; Rh2 = one or ; Rh2 = one, additional variables

(The ; Model = T is not needed.) The truncation model may not be combined with the exponential
specification; it is only supported for the normal-truncated normal form.

NOTE: The gamma model does not have a random effects (panel data) version. The model
extensions, such as the scaling model and sample selection described in Chapter E63 likewise do not
support a Pitt and Lee style random effects version.

There is an important consideration for the truncation version with heterogeneous mean. If
you are fitting a panel data version of this model, note that the assumption underlying the model is
that the same ui occurs in every period. Therefore, the zi must be the same in every period.
LIMDEP will assume this is the case, and only use the Rh2 variables provided for the first period.
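Because only the first period's values of the Rh2 variables are used, it may be worth confirming beforehand that those variables really are constant within each group. The following is a minimal sketch of such a check done outside the program. The function name and the data are hypothetical.

```python
def time_varying_groups(group_ids, z_values, tol=0.0):
    """Return the sorted ids of groups in which z is not constant over time.

    group_ids and z_values are parallel sequences with one entry per
    observation. The first value seen for a group is the reference.
    """
    first_value = {}
    offenders = set()
    for group, z in zip(group_ids, z_values):
        if group not in first_value:
            first_value[group] = z
        elif abs(z - first_value[group]) > tol:
            offenders.add(group)
    return sorted(offenders)


# Hypothetical panel: firm 2 changes its dummy variable in the second period.
ids = [1, 1, 1, 2, 2, 3, 3]
rack = [1, 1, 1, 0, 1, 0, 0]
print(time_varying_groups(ids, rack))
```

Any group reported by a check of this kind would be treated by the estimator as if its first period value applied in every period.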

When the random effects model is estimated, maximum likelihood estimates of the cross
section models are always computed first to obtain the starting values. This will produce a full set of
results which will ignore the panel nature of the data set. A second full set of results will then follow
for the random effects model.
The model estimates retained for all cases are

b = regression parameters, α and β,
varb = asymptotic covariance matrix.

Use ; Par to retain the additional parameters in b and varb. As seen in the applications below, the
parameters estimated in each case will differ depending on the model formulation. The ancillary
parameters that are estimated for the various models are the same ones saved by the cross section
versions. All models save sy, ybar, nreg, kreg, and logl as well as s, b, varb, etc.

WARNING: Numerous experiments and applications have suggested that the normal-truncated
normal model is a difficult one to estimate. Identification appears to be highly variable, and small
variations in the data can produce large variation in the results. The model often fails to converge
even when convergence of the restricted model with zero underlying mean is routine.

E64.3.1 Model Specifications


There are many different combinations of the components of the random effects model listed
above. (There are also many combinations of these that do not use the panel data random effects
form.) The following shows the different possibilities for the Pitt and Lee model:

NAMELIST ; x = one, … $
CREATE ; y = the outcome variable $
SETPANEL ; … $
Model 1 = pooled
FRONTIER ; Lhs = y ; Rhs = x $
Model 2 = random effects half normal
FRONTIER ; Lhs = y ; Rhs = x ; Panel $
Model 3 = random effects exponential
FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Model = Exponential $
Model 4 = random effects normal heteroscedastic in u or v only
FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Het ; Hfv = … $
FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Het ; Hfu = … $
Model 5 = random effects normal doubly heteroscedastic
FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Het ; Hfv = … ; Hfu = … $
Model 6 = random effects truncated normal
FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Rh2 = one, … $
Model 7 = random effects truncated normal, singly or doubly heteroscedastic
FRONTIER ; Lhs = y ; Rhs = x ; Panel ; Rh2 = one, …
; Het ; Hfv = … ; Hfu = … $

The Pitt and Lee model forms assume that the inefficiency is time invariant. Thus, the
estimate of ui is repeated for each observation in the group. An example below illustrates.

E64.3.2 Applications
The following illustrates a few of the numerous formats of the random effects frontiers. The
data set used is the Swiss railroad data used in Greene (2011, Table F19.1). These data are provided
with the program as swissrailroads.lpj. The variables used here are

ct = total cost
pk = capital price
pe = electricity price
pl = labor price
q2 = passenger output – passenger km
q3 = freight output – ton km
rack = dummy variable for 'rack rail' in network
tunnel = dummy variable for network with tunnels over 300 meters on average
virage = dummy variable for networks with narrow radius curvature
narrow_t = dummy variable for narrow track (1m as opposed to standard 1.435m).

Preparing the data set includes bypassing one firm for which there is only a single year of data. For
the remaining 49 firms, Ti is a mixture of 3, 7, 10, 12 or 13. Figure E64.1 details the distribution of
group sizes.
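The tabulation of group sizes described here can be sketched as follows. This is only an illustration of the bookkeeping, done outside LIMDEP. The function names and the firm identifiers are hypothetical, not the Swiss railroad data.

```python
from collections import Counter


def group_sizes(ids, min_size=2):
    """Count observations per firm, dropping firms with fewer than min_size periods."""
    counts = Counter(ids)
    return {firm: t for firm, t in counts.items() if t >= min_size}


def size_distribution(sizes):
    """Number of firms at each group size T_i, sorted by T_i."""
    return dict(sorted(Counter(sizes.values()).items()))


# Hypothetical firm identifiers, one entry per observation.
ids = [1] * 13 + [2] * 13 + [3] * 7 + [4] * 1 + [5] * 3
kept = group_sizes(ids)
distribution = size_distribution(kept)
print(distribution)
```

Firm 4 in this made-up sample has a single observation and is bypassed, which corresponds to the REJECT step applied to the one singleton firm in the data.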

Figure E64.1 Group Sizes for Swiss Railroad Sample

Descriptive statistics for the data are shown below. Variables with names beginning with 'M' are
firm means, repeated for each year for the firm.
We fit four models to illustrate the estimator, the pooled normal-half normal, pooled normal-
truncated (heterogeneous), basic Pitt and Lee and a full model with time invariant inefficiency,
truncation (heterogeneous) and double heteroscedasticity.

The commands are as follows:

SETPANEL ; Group = id ; Pds = ti $
REJECT ; ti = 1 $
CREATE ; lple = Log(pl/pe) ; lpke = Log(pk/pe) ; lnc = Log(ct/pe) $
NAMELIST ; x = one,lnq2,lnq3,lple,lpke $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Costeff = eusfpool $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Panel ; Costeff = eusfp_l $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Rh2 = rack,tunnel
; Het ; Hfu = virage ; Hfv = virage ; Costeff = eushet_t $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; panel ; Rh2 = rack,tunnel
; Het ; Hfu = virage ; Hfv = virage ; Costeff = fullmodl $

--------+---------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
--------+---------------------------------------------------------------------
ID| 25.48760 14.60037 1.0 51.0 605 0
YEAR| 90.91570 3.692372 85.0 97.0 605 0
NI| 12.58347 1.305259 1.0 13.0 605 0
STOPS| 20.42479 18.48285 4.0 121.0 605 0
NETWORK| 39431.66 56642.38 3898.0 376997.0 605 0
LABOREXP| 12801.95 26232.69 951.0 173549.0 605 0
STAFF| 170.3810 333.0317 11.0 1934.0 605 0
ELECEXP| 968.1521 1944.830 14.0 14737.0 605 0
KWH| 7602.221 15608.39 82.0 104923.0 605 0
TOTCOST| 22470.44 42283.57 1534.0 280871.0 605 0
NARROW_T| .676033 .468375 0.0 1.0 605 0
RACK| .234711 .424169 0.0 1.0 605 0
TUNNEL| .188430 .391379 0.0 1.0 605 0
T| 5.915702 3.692372 0.0 12.0 605 0
Q1| 813914.0 1083923 61000.0 6409000 605 0
Q2| .308145D+08 .550599D+08 409000.0 .311000D+09 605 0
Q3| .101934D+08 .527303D+08 150.0 .477000D+09 605 0
CT| 26728.37 49883.51 2120.968 307433.4 605 0
PL| 86051.77 6484.535 60932.91 104930.4 605 0
PE| .157485 .022766 .076344 .265182 605 0
PK| 4534.491 2128.307 1040.323 14466.06 605 0
VIRAGE| .715702 .451452 0.0 1.0 605 0
LABOR| 52.40245 9.598136 20.03025 73.11581 605 0
ELEC| 4.044504 1.422098 .568412 9.311660 605 0
CAPITAL| 43.55305 9.461303 23.88916 77.33154 605 0
LNCT| 11.30622 1.101691 9.462956 14.57019 605 0
LNQ1| 13.06322 1.010039 11.01863 15.67321 605 0
LNQ2| 16.31759 1.339167 12.92147 19.55500 605 0
LNQ3| 12.49439 2.716709 5.010635 19.98343 605 0
LNNET| 3.200860 .908512 1.360464 5.932237 605 0
LNPL| 13.21935 .163565 12.60449 13.77599 605 0
LNPE| -1.859557 .152870 -2.572503 -1.327338 605 0
LNPK| 10.17950 .438886 8.740266 11.37466 605 0
LNSTOP| 2.775052 .655071 1.386294 4.795791 605 0
LNCAP| 3.137572 .328311 2.123893 3.850147 604 1
MLNQ1| 13.06322 1.005089 11.16747 15.59433 605 0
MLNQ2| 16.31759 1.333346 13.20185 19.45679 605 0
MLNQ3| 12.49439 2.648475 7.734539 19.68075 605 0
MLNNET| 3.200860 .906363 1.360464 5.927817 605 0
MLNPL| 13.21935 .126548 12.89796 13.61620 605 0
MLNPK| 10.17950 .396797 8.938699 11.03543 605 0
MLNSTOP| 2.775052 .651059 1.386294 4.789402 605 0
LPLE| 13.21943 .163692 12.60449 13.77599 604 1
LPKPE| 10.16419 .576094 1.0 11.37466 605 0
LNC| 11.30305 1.099836 9.462957 14.57019 604 1
--------+---------------------------------------------------------------------

This is the pooled normal-half normal model.

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LNC
Log likelihood function -209.42340
Estimation based on N = 604, K = 7
Inf.Cr.AIC = 432.8 AIC/N = .717
Variances: Sigma-squared(v)= .07332
Sigma-squared(u)= .12333
Sigma(v) = .27077
Sigma(u) = .35119
Sigma = Sqr[(s^2(u)+s^2(v)]= .44345
Gamma = sigma(u)^2/sigma^2 = .62716
Var[u]/{Var[u]+Var[v]} = .37937
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 2.060
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LNC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -10.0907*** 1.14284 -8.83 .0000 -12.3306 -7.8507
LNQ2| .64179*** .01371 46.80 .0000 .61491 .66867
LNQ3| .06855*** .00655 10.46 .0000 .05570 .08139
LPLE| .53971*** .08858 6.09 .0000 .36610 .71333
LPKE| .26045*** .03260 7.99 .0000 .19655 .32435
|Variance parameters for compound error
Lambda| 1.29697*** .13854 9.36 .0000 1.02545 1.56850
Sigma| .44345*** .00056 789.05 .0000 .44235 .44455
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
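The summary quantities in the header of this output can be reproduced from the two estimated standard deviations and the two log likelihoods. The sketch below assumes the half normal result Var[u] = (1 - 2/π)σu² for the variance share. The function names are hypothetical and the inputs are the rounded values printed above, so agreement is only to about four digits.

```python
import math


def variance_summary(sigma_u, sigma_v):
    """Lambda, Sigma, Gamma and the share of Var[u] in the total variance."""
    s2u, s2v = sigma_u ** 2, sigma_v ** 2
    var_u = (1.0 - 2.0 / math.pi) * s2u  # variance of |U| when U ~ N[0, sigma_u^2]
    return {
        "lambda": sigma_u / sigma_v,
        "sigma": math.sqrt(s2u + s2v),
        "gamma": s2u / (s2u + s2v),
        "var_share": var_u / (var_u + s2v),
    }


def lr_statistic(logl_frontier, logl_ols):
    """Chi-squared statistic for inefficiency versus OLS with v only."""
    return 2.0 * (logl_frontier - logl_ols)


summary = variance_summary(sigma_u=0.35119, sigma_v=0.27077)
chi_squared = lr_statistic(-209.42340, -210.45352)
reject_at_95 = chi_squared > 2.706  # Kodde-Palm critical value reported above
print(summary, chi_squared, reject_at_95)
```

For the pooled model the statistic falls short of the 95% critical value, so the pooled data on their own give little evidence of inefficiency.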

This is the original Pitt and Lee normal-half normal model with time invariant inefficiency.
In comparison to the pooled model above, σu has tripled and σv has decreased by two thirds. The
assumption of time invariance of the inefficiency produces a large reallocation of the random
components between noise and inefficiency. This is evident in the kernel estimate below as well.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LNC
Log likelihood function 527.11659
Estimation based on N = 604, K = 7
Inf.Cr.AIC = -1040.2 AIC/N = -1.722
Stochastic frontier based on panel data
Estimation based on 49 individuals
Variances: Sigma-squared(v)= .00621
Sigma-squared(u)= .92297
Sigma(v) = .07879
Sigma(u) = .96071
Sigma = Sqr[(s^2(u)+s^2(v)]= .96394
Gamma = sigma(u)^2/sigma^2 = .99332
Var[u]/{Var[u]+Var[v]} = .98183
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 1475.140
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LNC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -7.25643*** .24767 -29.30 .0000 -7.74185 -6.77101
LNQ2| .36259*** .01503 24.12 .0000 .33312 .39205
LNQ3| .01902*** .00240 7.94 .0000 .01432 .02372
LPLE| .64148*** .02112 30.38 .0000 .60009 .68287
LPKE| .30842*** .00700 44.08 .0000 .29471 .32214
|Variance parameters for compound error
Lambda| 12.1932** 5.55909 2.19 .0283 1.2975 23.0888
Sigma(u)| .96071*** .13303 7.22 .0000 .69998 1.22145
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

This is the pooled normal-truncated and doubly heteroscedastic normal model.


-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LNC
Log likelihood function -63.43402
Estimation based on N = 604, K = 11
Inf.Cr.AIC = 148.9 AIC/N = .246
Variances: Sigma-squared(v)= .07144
Sigma-squared(u)= .00074
Sigma(u) = .02720
Sigma(v) = .26729
Sigma = Sqr[(s^2(u)+s^2(v)]= .26867
Variances averaged over observations
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 1
Deg. freedom for truncation mean: 2
Deg. freedom for inefficiency model: 4
LogL when sigma(u)=0 -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 294.039
Kodde-Palm C*: 95%: 8.761, 99%: 12.483
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LNC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -13.4218*** 1.01232 -13.26 .0000 -15.4059 -11.4377
LNQ2| .62859*** .01404 44.79 .0000 .60108 .65610
LNQ3| .09670*** .00669 14.46 .0000 .08359 .10981
LPLE| .68419*** .07646 8.95 .0000 .53433 .83405
LPKE| .39946*** .03301 12.10 .0000 .33476 .46415
|Mean of underlying truncated distribution
RACK| .62333*** .05632 11.07 .0000 .51293 .73372
TUNNEL| -.35607*** .05500 -6.47 .0000 -.46387 -.24828
|Scale parms. for random components of e(i)
ln_sgmaU| -2.54850*** .96756 -2.63 .0084 -4.44488 -.65212
ln_sgmaV| -1.36799*** .06507 -21.02 .0000 -1.49551 -1.24046
|Heteroscedasticity in variance of truncated u(i)
VIRAGE| -1.47329 2.86559 -.51 .6072 -7.08975 4.14316
|Heteroscedasticity in variance of symmetric v(i)
VIRAGE| .06774 .08094 .84 .4026 -.09090 .22638
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

This is the same model as immediately above, with the additional assumption that the
inefficiency is time invariant. Compared to the previous specification, σu has now increased by a
factor of 30 while σv has nearly vanished, falling from 0.27 to 0.005, that is, by a factor of 50.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LNC
Log likelihood function 532.94237
Estimation based on N = 604, K = 11
Inf.Cr.AIC = -1043.9 AIC/N = -1.728
Variances: Sigma-squared(v)= .00003
Sigma-squared(u)= .76238
Sigma(u) = .87314
Sigma(v) = .00543
Sigma = Sqr[(s^2(u)+s^2(v)]= .87316
Variances averaged over observations
Stochastic frontier based on panel data
Estimation based on 49 individuals
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 1
Deg. freedom for truncation mean: 2
Deg. freedom for inefficiency model: 4
LogL when sigma(u)=0 -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 1486.792
Kodde-Palm C*: 95%: 8.761, 99%: 12.483
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LNC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -7.26117*** .25317 -28.68 .0000 -7.75738 -6.76496
LNQ2| .36162*** .01558 23.20 .0000 .33107 .39216
LNQ3| .01947*** .00257 7.58 .0000 .01444 .02451
LPLE| .64342*** .02165 29.72 .0000 .60099 .68584
LPKE| .30730*** .00727 42.24 .0000 .29305 .32156
|Mean of underlying truncated distribution
RACK| .81356 .52427 1.55 .1207 -.21399 1.84112
TUNNEL| 1.46353*** .47072 3.11 .0019 .54094 2.38613
|Scale parms. for random components of e(i)
ln_sgmaU| -.17921 .21781 -.82 .4106 -.60611 .24769
ln_sgmaV| -4.94678*** .20426 -24.22 .0000 -5.34711 -4.54644
|Heteroscedasticity in variance of truncated u(i)
VIRAGE| .06076 .04703 1.29 .1964 -.03142 .15294
|Heteroscedasticity in variance of symmetric v(i)
VIRAGE| -.37544 .44206 -.85 .3957 -1.24185 .49097
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

The kernel estimator compares the estimated cost efficiency distributions for the pooled and
basic Pitt and Lee model. The pattern suggested earlier is clearly evident. The same comparison
appears for the truncated normal/heteroscedasticity models. (The estimated cost efficiency results
for the basic Pitt and Lee model and the expanded one are the same to three or four digits.) The
partial listing below shows the estimates for the four models, noting the time invariance of the Pitt
and Lee estimates.

Figure E64.2 Kernel Estimators for Cost Efficiency

Figure E64.3 Estimated Cost Efficiency



E64.3.3 Technical Details


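For the three forms of the normal mixture models, we use the following. Let

\[
\begin{aligned}
\lambda &= \sigma_u^2 / \sigma_v^2, \\
\omega_i &= \mu_i / \sigma_u, \\
\mu_i &= \boldsymbol{\delta}'\mathbf{z}_i \ \text{for the heterogeneous mean model}, \\
\mu_i &= \mu, \ \text{a constant (0) for the simple truncated (half) normal model}, \\
A_i &= 1 + \lambda T_i, \\
h_i &= \omega_i / A_i - S\,T_i\,\bar{\varepsilon}_i\,\lambda / (\sigma_u A_i), \\
\bar{\varepsilon}_i &= (1/T_i) \sum_{t=1}^{T_i} \left( y_{it} - \boldsymbol{\beta}'\mathbf{x}_{it} \right),
\end{aligned}
\]

where $S = +1$ for a production frontier and $-1$ for a cost frontier. Then, the contribution of individual i to the log likelihood function for the normal-half normal model
is

\[
\begin{aligned}
\log L_i = {} & -(T_i/2)\log 2\pi - T_i \log\sigma_u - \tfrac{1}{2}\log A_i + (T_i/2)\log\lambda \\
& - \tfrac{1}{2}(\lambda/\sigma_u^2)\sum_{t=1}^{T_i}\varepsilon_{it}^2 + \tfrac{1}{2}A_i h_i^2 + \log\Phi\!\left(h_i\sqrt{A_i}\right) - \tfrac{1}{2}\omega_i^2 - \log\Phi(\omega_i).
\end{aligned}
\]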

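For the normal-exponential model, let

\[ h_i = -\left( \theta\sigma_v / T_i + S\,\bar{\varepsilon}_i / \sigma_v \right), \]

where $\theta$ is the parameter of the exponential density of $u_i$, $f(u) = \theta\exp(-\theta u)$, and $S = +1$ for a production frontier and $-1$ for a cost frontier. Then,

\[
\begin{aligned}
\log L_i = {} & -\tfrac{1}{2}\log T_i - (T_i - 1)\log\sqrt{2\pi} + \log\theta - (T_i - 1)\log\sigma_v \\
& - \tfrac{1}{2}(1/\sigma_v^2)\sum_{t=1}^{T_i}\varepsilon_{it}^2 + \tfrac{1}{2}T_i h_i^2 + \log\Phi\!\left(h_i\sqrt{T_i}\right).
\end{aligned}
\]
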
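The Jondrow estimator, as formulated in Battese and Coelli (1988), is as follows. Let

\[
\begin{aligned}
\gamma_i &= 1 / \left[ 1 + (\sigma_u^2/\sigma_v^2)\,T_i \right], \\
\psi_i^2 &= \sigma_u^2\,\gamma_i, \\
E_i &= \gamma_i\,\mu + (1 - \gamma_i)(-\bar{\varepsilon}_i), \\
\bar{\varepsilon}_i &= (1/T_i)\sum_t \varepsilon_{it}.
\end{aligned}
\]

Then,

\[ E[u_i \mid \varepsilon_{i1}, \varepsilon_{i2}, \ldots] = E_i + \psi_i \left[ \phi(E_i/\psi_i) / \Phi(E_i/\psi_i) \right]. \]

For the exponential model, replace $\psi_i$ with $\sigma_v/\sqrt{T_i}$ and $E_i$ with $(-\bar{\varepsilon}_i - \theta\sigma_v^2/T_i)$.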
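
As a rough numerical illustration of the estimator for the truncated normal (and half normal) case, the calculation can be sketched in Python. This is a minimal sketch under stated assumptions, not LIMDEP code: the function names, argument names and residual values are hypothetical, the residuals are taken to come from a production frontier (e = v - u), and the normal density and distribution functions are built from the standard library.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def jondrow_time_invariant(eps, sigma_u, sigma_v, mu=0.0):
    """E[u_i | eps_i1, ..., eps_iT] for one group of a production frontier.

    eps     : residuals y_it - beta'x_it for the group (hypothetical input)
    sigma_u : standard deviation of the underlying normal variable U
    sigma_v : standard deviation of the symmetric noise v
    mu      : mean of U before truncation (0 gives the half normal model)
    """
    T = len(eps)
    ratio = sigma_u ** 2 / sigma_v ** 2        # sigma_u^2 / sigma_v^2
    gamma = 1.0 / (1.0 + ratio * T)
    psi = sigma_u * math.sqrt(gamma)
    eps_bar = sum(eps) / T                     # group mean residual
    E = gamma * mu + (1.0 - gamma) * (-eps_bar)
    return E + psi * norm_pdf(E / psi) / norm_cdf(E / psi)

# Hypothetical residuals for one firm observed in four periods.
u_hat = jondrow_time_invariant([-0.30, -0.10, -0.25, -0.15],
                               sigma_u=0.35, sigma_v=0.27)
efficiency = math.exp(-u_hat)                  # estimated technical efficiency
```

Because the estimate depends on the data only through the group mean residual, it is the same in every period for a given firm, which is the time invariance noted in the text.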



E64.4 Cornwell, Schmidt and Sickles – Time Invariant Inefficiency, Fixed Effects

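Cornwell, Schmidt and Sickles (1990) suggested a modification of the familiar fixed effects
linear regression,

\[ y_{it} = \alpha_i + \boldsymbol{\beta}'\mathbf{x}_{it} + v_{it}. \]

The estimated model is

\[
\begin{aligned}
y_{it} &= a_i + \mathbf{b}'\mathbf{x}_{it} + v_{it} \\
       &= \max_j(a_j) + \mathbf{b}'\mathbf{x}_{it} + v_{it} + [a_i - \max_j(a_j)] \\
       &= a + \mathbf{b}'\mathbf{x}_{it} + v_{it} - u_i,
\end{aligned}
\]

where $a = \max_j(a_j)$ and $u_i = \max_j(a_j) - a_i \ge 0$.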

(To change this to a cost frontier, change $u_i$ to $[a_i - \min_j(a_j)]$.) This bears resemblance to a stochastic
frontier model, though in fact, it is a 'deterministic' frontier model. The signature feature is that $u_i$
equals zero for the 'most efficient' firm in the sample. A natural interpretation of this is that what
we measure with the model is not the absolute inefficiency, but inefficiency of firm i relative to the
other firms in the sample. From the modeler's point of view, this approach has several substantive
advantages and disadvantages. The main advantage is:

• It is distribution free. It requires only the assumptions of the linear model.

The disadvantages are:

• It does not allow any time invariant variables in the model.
• It labels as inefficiency any and all omitted time invariant effects.
• It can only measure firms relative to each other.

As illustrated in the results below, this approach tends to produce very large estimates of ui.
The invariance assumption about ui has been criticized elsewhere. Attempts to relax this assumption
are a recurrent theme in the literature, including the Battese and Coelli and true fixed and random
effects approaches described later. Other early work on the model suggested direct manipulation of
the fixed effects, for example,

\[ \alpha_{it} = \alpha_{i0} + \alpha_{i1}t + \alpha_{i2}t^2. \]

Other more recent research (Han, Orea and Schmidt (2005)) has proposed factor analytic forms for
$\alpha_{it}$. The sections to follow will include several of these different approaches.

Application

This Cornwell, Schmidt and Sickles (CSS) approach requires only a linear fixed effects
regression and a few instructions to manipulate the fixed effects. The following analyzes the Swiss
railroad data with this approach, computing the CSS estimates and comparing them to the
unstructured pooled estimates (using the normal-half normal model from Chapter E62) and the Pitt
and Lee model introduced above. The commands for the analysis are as follows:

SAMPLE ; All $
CREATE ; Railroad = id $
CREATE ; If(railroad > 20)railroad = railroad - 1 $ (There is a gap in the data)
HISTOGRAM ; Rhs = railroad
; Title = Number of Observations for Firms in Swiss Railroad Sample $
SETPANEL ; Group = id ; Pds = ti $
REJECT ; ti = 1 $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Costeff = eusfpool $
CREATE ; pooled = Group Mean(eusfpool, Pds = ti) $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Panel ; Costeff = pittlee $
REGRESS ; Lhs = lnc ; Rhs = x ; Panel ; Fixed Effects $
CREATE ; ai = alphafe(railroad) $
CALC ; minai = Min(ai) $
CREATE ; css = Exp((minai - ai)) $
CREATE ; Period = Ndx(id,1) $
REJECT ; period#1 $
PLOT ; Lhs = railroad ; Rhs = pooled,css ; Grid ; Fill ; Limits = 0,1
; Vaxis = Estimated Cost Efficiency
; Title = Half Normal vs. Cornwell, Schmidt, Sickles FE Cost Efficiencies $
PLOT ; Lhs = railroad ; Rhs = css,pittlee ; Grid ; Fill ; Limits = 0,1
; Vaxis = Estimated Cost Efficiency
; Title = Pitt and Lee RE vs. Cornwell, Schmidt, Sickles FE Cost Efficiencies $
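
The step that turns the estimated fixed effects into efficiency scores (the CALC and CREATE commands that compute minai and css) amounts to the following calculation, sketched here in Python for readers who want to check the arithmetic outside LIMDEP. The function name and the fixed effect values are hypothetical; only the formulas, u_i = a_i - min(a_i) for a cost frontier and u_i = max(a_i) - a_i for a production frontier, come from the text.

```python
import math

def css_efficiency(fixed_effects, cost=False):
    """Relative efficiency exp(-u_i) for each firm from estimated fixed effects.

    cost=False : production frontier, u_i = max(a) - a_i
    cost=True  : cost frontier,       u_i = a_i - min(a)
    """
    if cost:
        best = min(fixed_effects)
        u = [a - best for a in fixed_effects]
    else:
        best = max(fixed_effects)
        u = [best - a for a in fixed_effects]
    return [math.exp(-ui) for ui in u]

# Hypothetical fixed effects for four firms in a cost function.
a = [-7.9, -7.2, -6.5, -7.6]
css = css_efficiency(a, cost=True)   # first firm has the lowest cost intercept
```

By construction, the firm with the smallest (cost) or largest (production) fixed effect receives an efficiency of exactly one, and every other firm is measured relative to it.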

The results below show the considerable differences in the parameter estimates produced by the
three models. Figure E64.4 demonstrates the expected quite large differences between the time
varying estimates (using the group means) and the time invariant results based on the CSS model.
Figure E64.5 also shows a striking, albeit commonly observed result – the CSS and Pitt and Lee
estimates are virtually identical.

-----------------------------------------------------------------------------
LSDV least squares with fixed effects ....
LHS=LNC Mean = 11.30305
Standard deviation = 1.09984
No. of observations = 604 Degrees of freedom
Regression Sum of Squares = 726.000 52
Residual Sum of Squares = 3.41179 551
Total Sum of Squares = 729.412 603
Standard error of e = .07869
Fit R-squared = .99532 R-bar squared = .99488
Model test F[ 52, 551] = 2254.77325 Prob F > F* = .00000
Diagnostic Log likelihood = 706.21504 Akaike I.C. = -5.00084
Restricted (b=0) = -914.01557 Bayes I.C. = -4.61443
Chi squared [ 52] = 3240.46122 Prob C2 > C2* = .00000
Estd. Autocorrelation of e(i,t) = .668792
--------------------------------------------------
Panel:Groups Empty 0, Valid data 49
Smallest 3, Largest 13
Average group size in panel 12.33
Variances Effects a(i) Residuals e(i,t)
.423441 .006192
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LNC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
LNQ2| .29374*** .02850 10.31 .0000 .23789 .34959
LNQ3| .01612*** .00543 2.97 .0030 .00547 .02676
LPLE| .66452*** .03580 18.56 .0000 .59434 .73469
LPKE| .31777*** .01863 17.05 .0000 .28125 .35430
--------+--------------------------------------------------------------------
(These are the estimated parameters in the estimated pooled stochastic frontier model.)
Constant| -10.0907*** 1.14284 -8.83 .0000 -12.3306 -7.8507
LNQ2| .64179*** .01371 46.80 .0000 .61491 .66867
LNQ3| .06855*** .00655 10.46 .0000 .05570 .08139
LPLE| .53971*** .08858 6.09 .0000 .36610 .71333
LPKE| .26045*** .03260 7.99 .0000 .19655 .32435
|Variance parameters for compound error
Lambda| 1.29697*** .13854 9.36 .0000 1.02545 1.56850
Sigma| .44345*** .00056 789.05 .0000 .44235 .44455
(These are the estimated parameters in the estimated Pitt and Lee model.)
|Deterministic Component of Stochastic Frontier Model
Constant| -7.25643*** .24767 -29.30 .0000 -7.74185 -6.77101
LNQ2| .36259*** .01503 24.12 .0000 .33312 .39205
LNQ3| .01902*** .00240 7.94 .0000 .01432 .02372
LPLE| .64148*** .02112 30.38 .0000 .60009 .68287
LPKE| .30842*** .00700 44.08 .0000 .29471 .32214
|Variance parameters for compound error
Lambda| 12.1932** 5.55909 2.19 .0283 1.2975 23.0888
Sigma(u)| .96071*** .13303 7.22 .0000 .69998 1.22145

Figure E64.4 Cornwell et al. Estimates vs. Normal-Half Normal

Figure E64.5 Estimated Inefficiencies from Cornwell et al. and Pitt and Lee Models

E64.5 Battese and Coelli – Time Dependent Inefficiency Models

Battese and Coelli (1992) proposed a series of models that can be collected in the general
form

\[
\begin{aligned}
y_{it} &= \boldsymbol{\beta}'\mathbf{x}_{it} + v_{it} - u_{it}, \\
u_{it} &= g(\mathbf{z}_{it})\,|U_i|, \ \text{where } U_i \text{ is half normal or truncated normal.}
\end{aligned}
\]

Several formulations are available. In Battese and Coelli's original formulation, the distribution was
half normal and the base specification was

\[ g(\mathbf{z}_{it}) = \exp[-\eta(t - T)], \]

where T is the number of periods in their balanced panel. (Here it would be $T_i$.) They also suggested

\[ g(\mathbf{z}_{it}) = \exp[-\eta_1(t - T) - \eta_2(t - T)^2]. \]

The first (linear) form is taken to be the default case for this model. The second is not provided in
this package. The BC92 model is requested with

FRONTIER ; Lhs = ... ; Rhs = one,...
; Model = BC
; Panel $

A truncated normal version is requested by adding

; Rh2 = list of variables which may (generally should) include one

(The ; Model = T is not needed here.)

We note a warning to practitioners. When the data are very consistent with the model, the
Battese and Coelli model produces quite satisfactory results. The framework has been employed in
many recent empirical applications. But, when the data are not of particularly good quality, or this
is the wrong model, extreme results can emerge. The airline data examined in Chapter E63 (and the
WHO data), for example, are a poor fit to this model.
We have labeled this model as 'time dependent' rather than time varying. While the
inefficiency component in the model does vary through time, the variation is systematic with respect
to time. A question pursued in the ongoing literature is the extent to which this model actually
moves away from the time invariant specification of Pitt and Lee. Since there is actual variation, the
result is clearly somewhere between Pitt and Lee and what we have labeled the unstructured 'pooled'
model. If $\eta$ equals zero, Pitt and Lee emerges, so it depends entirely on this parameter. We have
found in some investigations that the end result is actually closer to Pitt and Lee than it is to the
pooled model – that is, there is quite a lot of structure involved in the BC92 model. The example
below illustrates.

E64.5.1 Application
To illustrate the Battese and Coelli models, we return to the railroad data used previously.
The base case is the pooled data stochastic cost frontier. This is followed by the Pitt and Lee model
and, finally, by the original Battese and Coelli 'time decay' model,

\[ g(\mathbf{z}_{it}) = \exp[-\eta(t - T_i)]. \]

The commands are

SAMPLE ; All $
REJECT ; ti = 1 $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Costeff = eusfpool $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Model = BC ; Panel ; Costeff = eucbc92 $
DSTAT ; Rhs = eucbc92,eusfpool $
KERNEL ; Rhs = eucbc92,eusfpool
; Title = Estimated Cost Efficiencies - Battese-Coelli 1992 vs. Pooled $
KERNEL ; Rhs = eucbc92,pittlee
; Title = Estimated Cost Efficiencies - Battese-Coelli 1992 vs. Pitt and Lee $

The kernel density estimators are used to compare the efficiency estimates from the pooled data
model to the Battese and Coelli model. The estimates of cost efficiency, $\exp(-E[u_{it}|\varepsilon_i])$, from the Battese and Coelli
model are far more dispersed than those from the pooled model and considerably smaller on average
(a mean of 0.515 compared with 0.761 in the descriptive statistics below). That is, the Battese and Coelli
model attributes far more of the variation in costs to inefficiency. The assumption of time invariance of the
random term is a major component of this model. The second kernel estimator below compares
Battese-Coelli to Pitt-Lee. The correspondence of the two results is striking, albeit to be expected
given the small estimated value of $\eta$.
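
To see how little time variation the estimated decay parameter implies, the scale factor g = exp[-eta(t - T)] can be evaluated directly. The Python sketch below is illustrative only: the function name is hypothetical, the value of eta is the estimate reported in the results that follow (Eta = -.00248), and T = 13 is the largest group size reported for this panel.

```python
import math

def bc92_scale(eta, T):
    """Scale factors g_t = exp[-eta (t - T)] for t = 1, ..., T."""
    return [math.exp(-eta * (t - T)) for t in range(1, T + 1)]

g = bc92_scale(eta=-0.00248, T=13)      # estimated time decay model
pitt_lee = bc92_scale(eta=0.0, T=13)    # eta = 0 reproduces Pitt and Lee
```

With these values the scale factor rises from roughly 0.97 in the first period to exactly 1 in the last, so the implied inefficiency changes by only about three percent over the whole panel, while setting eta to zero gives a factor of one in every period.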
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LNC
Log likelihood function -209.42340
Estimation based on N = 604, K = 7
Inf.Cr.AIC = 432.8 AIC/N = .717
Variances: Sigma-squared(v)= .07332
Sigma-squared(u)= .12333
Sigma(v) = .27077
Sigma(u) = .35119
Sigma = Sqr[(s^2(u)+s^2(v)]= .44345
Gamma = sigma(u)^2/sigma^2 = .62716
Var[u]/{Var[u]+Var[v]} = .37937
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 2.060
Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LNC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -10.0907*** 1.14284 -8.83 .0000 -12.3306 -7.8507
LNQ2| .64179*** .01371 46.80 .0000 .61491 .66867
LNQ3| .06855*** .00655 10.46 .0000 .05570 .08139
LPLE| .53971*** .08858 6.09 .0000 .36610 .71333
LPKE| .26045*** .03260 7.99 .0000 .19655 .32435
|Variance parameters for compound error
Lambda| 1.29697*** .13854 9.36 .0000 1.02545 1.56850
Sigma| .44345*** .00056 789.05 .0000 .44235 .44455
--------+--------------------------------------------------------------------

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LNC
Log likelihood function 530.16177
Estimation based on N = 604, K = 8
Inf.Cr.AIC = -1044.3 AIC/N = -1.729
Stochastic frontier based on panel data
Estimation based on 49 individuals
Variances: Sigma-squared(v)= .00613
Sigma-squared(u)= .97581
Sigma(v) = .07828
Sigma(u) = .98783
Sigma = Sqr[(s^2(u)+s^2(v)]= .99093
Gamma = sigma(u)^2/sigma^2 = .99376
Var[u]/{Var[u]+Var[v]} = .98301
Stochastic Cost Frontier Model, e = v+u
Battese-Coelli Models: Time Varying uit
Time dependent uit=exp[-eta(t-T)]*|U(i)|
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 1481.231
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LNC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -6.83502*** .27362 -24.98 .0000 -7.37130 -6.29873
LNQ2| .35459*** .01636 21.68 .0000 .32254 .38665
LNQ3| .02183*** .00238 9.17 .0000 .01716 .02649
LPLE| .61516*** .02092 29.40 .0000 .57415 .65617
LPKE| .30931*** .00701 44.09 .0000 .29556 .32306
|Variance parameters for compound error
Lambda| 12.6195*** .01188 1062.18 .0000 12.5962 12.6428
Sigma(u)| .98783*** .15275 6.47 .0000 .68845 1.28721
|Eta parameter for time varying inefficiency
Eta| -.00248*** .00086 -2.89 .0039 -.00416 -.00080
--------+--------------------------------------------------------------------

--------+---------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
--------+---------------------------------------------------------------------
EUCBC92| .514566 .231680 .085140 .982112 604 0
EUSFPOOL| .760991 .095229 .478178 .906348 604 0
--------+---------------------------------------------------------------------

Figure E64.6 Kernel Density Estimates for Inefficiencies from Battese and Coelli Model

Figure E64.7 Kernel Density Estimates for Inefficiencies



E64.5.2 Technical Details


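To form the log likelihood function for the model, we use Battese and Coelli's
parameterization of the model. The contribution of the ith individual (firm, group, etc.) to the log
likelihood is

\[
\begin{aligned}
\log L_i = {} & -\frac{T_i}{2}\left(\log 2\pi + \log\sigma^2\right) - \frac{(T_i - 1)}{2}\log(1 - \gamma) - \frac{1}{2}\sum_{t=1}^{T_i}\frac{\varepsilon_{it}^2}{(1 - \gamma)\sigma^2} \\
& - \frac{1}{2}\log\left[ 1 + \gamma\left( \sum_{t=1}^{T_i} g_{it}^2 - 1 \right) \right] \\
& - \frac{1}{2}\left( \frac{\mu_i}{\sigma_u} \right)^2 - \log\Phi\left( \frac{\mu_i}{\sigma_u} \right) + \frac{A_i^2}{2} + \log\Phi(A_i),
\end{aligned}
\]

where

\[
\begin{aligned}
\sigma^2 &= \sigma_u^2 + \sigma_v^2, \\
\gamma &= \sigma_u^2 / \sigma^2, \\
\varepsilon_{it} &= y_{it} - \boldsymbol{\beta}'\mathbf{x}_{it}, \\
\mu_i &= 0 \ \text{or} \ \mu \ \text{or} \ \boldsymbol{\delta}'\mathbf{w}_i, \\
g_{it} &= \exp[-\eta(t - T_i)] \ \text{or} \ \exp(\boldsymbol{\eta}'\mathbf{z}_{it}), \\
S &= 1 \ \text{for a production model and} \ -1 \ \text{for a cost model}, \\
A_i &= \frac{(1 - \gamma)\mu_i - S\gamma\sum_{t=1}^{T_i} g_{it}\varepsilon_{it}}{\sqrt{\gamma(1 - \gamma)\sigma^2\left[ 1 + \gamma\left( \sum_{t=1}^{T_i} g_{it}^2 - 1 \right) \right]}}.
\end{aligned}
\]

Derivatives of this function are complicated in the extreme, and are omitted here. (Some useful
results for obtaining them are found in Battese and Coelli (1992, 1995).)
The Jondrow estimator of $u_{it}$ is

\[
\begin{aligned}
E[u_{it} \mid \varepsilon_{i1}, \varepsilon_{i2}, \ldots] &= g_{it}\,E[u_i \mid \varepsilon_{i1}, \varepsilon_{i2}, \ldots] \\
&= g_{it}\left\{ \mu_i^* + \sigma_i^*\left[ \frac{\phi(\mu_i^*/\sigma_i^*)}{\Phi(\mu_i^*/\sigma_i^*)} \right] \right\},
\end{aligned}
\]

where

\[
\mu_i^* = \frac{(1 - \gamma)\mu_i - \gamma\sum_{t=1}^{T_i} g_{it}(S\varepsilon_{it})}{(1 - \gamma) + \gamma\sum_{t=1}^{T_i} g_{it}^2},
\qquad
\sigma_i^{*2} = \frac{\gamma(1 - \gamma)\sigma^2}{(1 - \gamma) + \gamma\sum_{t=1}^{T_i} g_{it}^2}.
\]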
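
The log likelihood contribution and the Jondrow estimator for this model can be evaluated for a single group with a few lines of Python. This is a sketch under stated assumptions rather than LIMDEP's implementation: the function and argument names are hypothetical, the residuals and scale factors are taken as given, and the standard normal functions come from the standard library.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _sums(eps, g):
    """Sum of g_it^2 and sum of g_it * eps_it over the periods of one group."""
    return sum(gt * gt for gt in g), sum(gt * et for gt, et in zip(g, eps))

def bc_loglik_i(eps, g, sigma2, gamma, mu=0.0, S=1):
    """Log likelihood contribution of one group.

    eps, g : residuals eps_it and scale factors g_it for t = 1, ..., T_i
    sigma2 : sigma_u^2 + sigma_v^2;  gamma : sigma_u^2 / sigma2
    mu     : mean of U_i before truncation;  S : +1 production, -1 cost
    """
    T = len(eps)
    sum_g2, sum_ge = _sums(eps, g)
    d = 1.0 + gamma * (sum_g2 - 1.0)
    A = (((1.0 - gamma) * mu - S * gamma * sum_ge)
         / math.sqrt(gamma * (1.0 - gamma) * sigma2 * d))
    z = mu / math.sqrt(gamma * sigma2)           # mu_i / sigma_u
    return (-0.5 * T * (math.log(2.0 * math.pi) + math.log(sigma2))
            - 0.5 * (T - 1) * math.log(1.0 - gamma)
            - 0.5 * sum(e * e for e in eps) / ((1.0 - gamma) * sigma2)
            - 0.5 * math.log(d)
            - 0.5 * z * z - math.log(norm_cdf(z))
            + 0.5 * A * A + math.log(norm_cdf(A)))

def bc_jondrow(eps, g, sigma2, gamma, mu=0.0, S=1):
    """E[u_it | eps_i1, ..., eps_iT] for every period of one group."""
    sum_g2, sum_ge = _sums(eps, g)
    denom = (1.0 - gamma) + gamma * sum_g2
    mu_star = ((1.0 - gamma) * mu - S * gamma * sum_ge) / denom
    sigma_star = math.sqrt(gamma * (1.0 - gamma) * sigma2 / denom)
    r = mu_star / sigma_star
    e_u = mu_star + sigma_star * norm_pdf(r) / norm_cdf(r)
    return [gt * e_u for gt in g]
```

Setting every g_it to one gives the time invariant (Pitt and Lee) case, and the only difference between the production and cost versions is the sign S applied to the residuals.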

E64.6 Time Varying Inefficiency in the Battese Coelli Model


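The general form of the Battese and Coelli model is

\[
\begin{aligned}
y_{it} &= \boldsymbol{\beta}'\mathbf{x}_{it} + v_{it} - u_{it}, \\
u_{it} &= g(\mathbf{z}_{it})\,|U_i|, \ \text{where } U_i \text{ is half normal or truncated normal.}
\end{aligned}
\]

The default form used earlier is $g(\mathbf{z}_{it}) = \exp[-\eta(t - T_i)]$. You may also use a more general form,

\[ g(\mathbf{z}_{it}) = \exp(\boldsymbol{\eta}'\mathbf{z}_{it}), \]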

where zit contains any desired set of variables. For this extension, use

FRONTIER ; Lhs = ... ; Rhs = one,...
; Model = BC ; Hfu = the variables in z
; Pds = the panel specification $
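
The relationship between the general form and the default time decay form can be made concrete with a short Python sketch. The function names are hypothetical and the code is only an illustration of the two formulas, not of how LIMDEP evaluates them: the time decay form is the general form with the single variable z = (t - T) and coefficient -eta.

```python
import math

def g_general(eta, z):
    """g(z_it) = exp(eta'z_it) for one observation; eta and z are equal-length lists."""
    return math.exp(sum(e * zk for e, zk in zip(eta, z)))

def g_time_decay(eta, t, T):
    """Default form exp[-eta (t - T)], written as a special case of g_general."""
    return g_general([-eta], [t - T])
```

If every coefficient in eta is zero, g equals one for all observations and the model again reduces to the time invariant case.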

As before, the truncated normal version of the model is also supported. For an example, we have
used
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Model = BC ; Panel ; Costeff = eucbc92h
; Hfu = rack,virage,tunnel $

The estimates of cost efficiency produced by this model are virtually identical to those from the base model in
the previous section.

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LNC
Log likelihood function 529.63533
Stochastic frontier based on panel data
Estimation based on 49 individuals
Variances: Sigma-squared(v)= .00615
Sigma-squared(u)= .94808
Sigma(v) = .07840
Sigma(u) = .97369
Sigma = Sqr[(s^2(u)+s^2(v)]= .97685
Gamma = sigma(u)^2/sigma^2 = .99356
Var[u]/{Var[u]+Var[v]} = .98247
Stochastic Cost Frontier Model, e = v+u
Battese-Coelli Models: Time Varying uit
Time varying uit=exp[eta*z(i,t)]*|U(i)|
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 3
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 4
LogL when sigma(u)=0 -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 1480.178
Kodde-Palm C*: 95%: 8.761, 99%: 12.483

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LNC| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -6.89845*** .32923 -20.95 .0000 -7.54374 -6.25316
LNQ2| .35751*** .01591 22.47 .0000 .32632 .38870
LNQ3| .02149*** .00236 9.10 .0000 .01686 .02613
LPLE| .61741*** .02430 25.40 .0000 .56977 .66504
LPKE| .30892*** .00759 40.71 .0000 .29405 .32380
|Variance parameters for compound error
Lambda| 12.4202*** .01108 1120.76 .0000 12.3984 12.4419
Sigma(u)| .97369*** .13513 7.21 .0000 .70884 1.23855
|Coefficients in u(i,t)=[exp{eta*z(i,t)}]*|U(i)|
RACK| .00024 .01743 .01 .9889 -.03392 .03441
VIRAGE| -.02096 .01321 -1.59 .1126 -.04685 .00493
TUNNEL| .00219 .01625 .14 .8926 -.02966 .03405
--------+--------------------------------------------------------------------
(Parameter estimates from base case Battese and Coelli)
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -6.83502*** .27362 -24.98 .0000 -7.37130 -6.29873
LNQ2| .35459*** .01636 21.68 .0000 .32254 .38665
LNQ3| .02183*** .00238 9.17 .0000 .01716 .02649
LPLE| .61516*** .02092 29.40 .0000 .57415 .65617
LPKE| .30931*** .00701 44.09 .0000 .29556 .32306
|Variance parameters for compound error
Lambda| 12.6195*** .01188 1062.18 .0000 12.5962 12.6428
Sigma(u)| .98783*** .15275 6.47 .0000 .68845 1.28721
|Eta parameter for time varying inefficiency
Eta| -.00248*** .00086 -2.89 .0039 -.00416 -.00080
--------+--------------------------------------------------------------------

E64.7 True Fixed Effects Models


The received applications of fixed effects to the stochastic frontier model, primarily
Cornwell, Schmidt and Sickles, have actually been reinterpretations of the linear regression model
with fixed effects, not frontier models of the sort considered here. The estimators described below
apply the fixed effects to the stochastic frontier. We label these 'true fixed effects models' to
distinguish them from the linear regression models as discussed in Section E64.4. (This is not meant
to imply that those are 'false fixed effects models.' Had we used 'real fixed effects models,' then the
contrasting 'unreal fixed effects models' would arise, which is likewise problematic. We use this
purely as a concise term of art, not a characterization of the types of estimators considered.)
The stochastic frontier model with fixed effects may be fit in several forms. The base case
applies the heterogeneity to the normal-half normal production function model,

\[ y_{it} = \alpha_i + \boldsymbol{\beta}'\mathbf{x}_{it} + v_{it} - S u_{it}, \]

where S = +1 for a production frontier and -1 for a cost frontier, and

\[ u_{it} = |\,N[0, \sigma_u^2]\,|. \]

This model (as are the others) is fit by maximum likelihood, not least squares. The normal-half
normal model is applied to the stochastic part of the model. Note that the inefficiency term in this
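model is time varying. The heterogeneity may appear in Stevenson's truncated normal model as
follows. This is a true fixed effects, normal-truncated normal model.

\[
\begin{aligned}
y_{it} &= \alpha_i + \boldsymbol{\beta}'\mathbf{x}_{it} + v_{it} - u_{it}, \\
u_{it} &= |\,N[\mu_i, \sigma_u^2]\,|, \\
\mu_i &= \boldsymbol{\delta}'\mathbf{z}_i.
\end{aligned}
\]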

In this form, the heterogeneity is still retained in the production function part of the model. Another
possibility is to allow the heterogeneity to enter the mean of the inefficiency distribution rather than
the production function – this seems the most natural of the three forms. In this case,

\[
\begin{aligned}
y_{it} &= \boldsymbol{\beta}'\mathbf{x}_{it} + v_{it} - u_{it}, \\
u_{it} &= |\,N[\mu_{it}, \sigma_u^2]\,|, \\
\mu_{it} &= \alpha_i + \mu \ \text{(nonzero) or} \ \boldsymbol{\delta}'\mathbf{z}_i.
\end{aligned}
\]

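The mean of the inefficiency distribution shifts in time, but also has a firm specific component.
Finally, the heterogeneity may be shifted to the variance of the inefficiency distribution. In this
form, we have

\[
\begin{aligned}
y_{it} &= \boldsymbol{\beta}'\mathbf{x}_{it} + v_{it} - u_{it}, \\
u_{it} &= |\,N[0, \sigma_{u,it}^2]\,|, \\
\sigma_{u,it}^2 &= \sigma_u^2 \times \exp(\alpha_i + \boldsymbol{\delta}'\mathbf{z}_{it}).
\end{aligned}
\]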

The variables in the variance term may be omitted if only a groupwise heteroscedastic model is
desired. Note this is a half normal model. A model with nonzero underlying mean and variation in
the variance appears to be inestimable. Note that in order to secure identification, this model must
have time varying inefficiency, induced by time variation in the variance.

NOTE: We have had extremely limited success with the second and third forms of the model. The
likelihood function is quite volatile in the parameters of the underlying mean of the truncated
distribution with the result that the estimated variance parameters $\lambda$ and $\sigma$ generally become negative
in the early iterations and estimation must be halted. This occurs even when very good starting
values are used, which suggests that estimation of this model as stated is likely to be extremely
problematic in all but the most favorable of cases. An alternative approach which is simple, but can
be used only with small panels (up to 100 groups), is suggested below.

In terms of implementation, we note that these forms of the models, though they are new
with LIMDEP, have long been feasible. The panels typically used by researchers in this setting are
often fairly small – our airline data for example have only 25 units and the Swiss railroad data has 49
firms. It would always have been possible to create these models simply by adding dummy variables
to the familiar model. However, LIMDEP's implementation of the model obviates this by using the
methodology described in Chapter R23. In principle, this allows up to 100,000 firms in the data set.

Results that are kept for this model are

Matrices:   b       = estimate of $\boldsymbol{\beta}$
            varb    = asymptotic covariance matrix for estimate of $\boldsymbol{\beta}$
            alphafe = estimated fixed effects (if ; Par is in the command)

Scalars:    kreg    = number of variables in Rhs
            nreg    = number of observations
            logl    = log likelihood function

Last Model: b_variables

The upper limit on the number of groups is 100,000.

E64.7.1 Commands for the Fixed Effects Stochastic Frontier Model


The command for fitting the normal-half normal model with fixed effects is as follows:
FRONTIER ; Lhs = ... ; Rhs = one,... $
FRONTIER ; Lhs = ... ; Rhs = one,...
; FEM ; Pds = specification $

The model must be fit twice. The first model is a pooled data model which provides the starting values
for the second. The second command is identical to the first save for the addition of the panel data
specification. In order to set up the initial values correctly, it is essential that your initial model include
the constant term first in the Rhs list and that the second model specification be identical to the first.
Other options and specifications for the fixed effects models are the same as in other applications. (See
Chapter R23 for details.) The fixed effects command also contains the constant term, but this will be
removed by the command processor later. See the example below for the operation of the command.

NOTE: Starting values must be provided by the first estimator. The specification ; Start = list of
values is not available for this model. You must fit both models each time you fit an FEM. The
starting values are not retained after the FEM is estimated.

All fixed effects forms are estimated by maximum likelihood. You may also fit a two way
fixed effects model

\[
\begin{aligned}
y_{it} &= \alpha_i + \gamma_t + \boldsymbol{\beta}'\mathbf{x}_{it} + v_{it} - u_{it} \quad \text{(change to } v + u \text{ for a stochastic cost frontier)}, \\
u_{it} &= |\,N[0, \sigma_u^2]\,|,
\end{aligned}
\]

where $\gamma_t$ is an additional, time (period) specific effect. The time specific effect is requested by adding

; Time

to the command if the panel is balanced, and

; Time = variable name

if the panel is unbalanced.



For the unbalanced panel, we assume that overall, the sample observation period is
t = 1,2,..., Tmax and that the time variable gives for the specific group, the particular values of t that
apply to the observations. Thus, suppose your overall sample is five periods. The first group is three
observations, periods 1, 2, 4, while the second group is four observations, 2, 3, 4, 5. Then, your
panel specification would be

; Pds = Ti, for example, where Ti = 3, 3, 3, 4, 4, 4, 4


and ; Time = Pd, for example, where Pd = 1, 2, 4, 2, 3, 4, 5.
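
The construction of the two variables in this example can be sketched in Python for readers who prepare their data outside LIMDEP. The function name and the (group, period) record layout are hypothetical. The sketch only shows how the group size variable and the period variable line up observation by observation.

```python
def panel_vectors(records):
    """Build the Pds (group size) and Time (period) variables.

    records : list of (group_id, period) pairs, one per observation,
              in the order the observations appear in the data set
    """
    counts = {}
    for group_id, _ in records:
        counts[group_id] = counts.get(group_id, 0) + 1
    ti = [counts[group_id] for group_id, _ in records]   # group size, repeated
    pd = [period for _, period in records]                # period observed
    return ti, pd

# Group 1 is observed in periods 1, 2, 4; group 2 in periods 2, 3, 4, 5.
records = [(1, 1), (1, 2), (1, 4), (2, 2), (2, 3), (2, 4), (2, 5)]
ti, pd = panel_vectors(records)
```

For these records, ti holds the group size repeated on every row of the group and pd holds the period actually observed, matching the values listed in the example.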

E64.7.2 Model Specifications for Fixed Effects Stochastic Frontier Models

This is the full list of general specifications that are applicable to this model estimator.

Controlling Output from Model Commands

; Par keeps ancillary parameter  in main results vector b.


; Table = name saves model results to be combined later in output tables.

Robust Asymptotic Covariance Matrices

; Covariance Matrix displays estimated asymptotic covariance matrix (normally not shown),
same as ; Printvc.

Optimization Controls for Nonlinear Optimization

; Start = list gives starting values for a nonlinear model.


; Tlg[ = value] sets convergence value for gradient.
; Maxit = n sets the maximum iterations.
; Output = n requests technical output during iterations; the level 'n' is 1, 2, 3 or 4.
; Set keeps current setting of optimization parameters as permanent.

Predictions and Residuals

; List displays a list of fitted values with the model estimates.


; Keep = name keeps fitted values as a new (or replacement) variable in data set.
; Res = name keeps residuals as a new (or replacement) variable.

Hypothesis Tests and Restrictions

; Test: spec defines a Wald test of linear restrictions.


; Wald: spec defines a Wald test of linear restrictions, same as ; Test: spec.

E64.7.3 Application of the True Fixed Effects Model


We have fit the fixed effects model with the airline data used in the previous chapter. These
are simple models that do not use the observed heterogeneity in load factor, stage length or number
of points served. Additional variables which vary over time can also be included in the function. The
commands employed for the example are

SETPANEL ; Group = firm ; Pds = ti $
FRONTIER ; Lhs = lq ; Rhs = one,lf,lm,le,ll,lp,lk $
FRONTIER ; Lhs = lq ; Rhs = one,lf,lm,le,ll,lp,lk
; FEM ; Panel ; Techeff = euitfe ; Par $
REGRESS ; Lhs = lq ; Rhs = one,lf,lm,le,ll,lp,lk
; Panel ; Fixed Effects $
CREATE ; ai = alphafe(firm) $
CALC ; maxai = Max(ai) $
CREATE ; euicss = exp(-(maxai - ai)) $
CREATE ; meuitfe = Group Mean(euitfe, Pds = ti) $
SAMPLE ; All $
CREATE ; Period = Ndx(firm,1) $
PLOT ; For[period=1] ; Lhs = firm ; Rhs = meuitfe,euicss
; Fill ; Symbols ; Limits = 0,1 ; Grid
; Title = Technical Efficiency Estimates, CSS vs. True Fixed Effects (Group Means)
; Vaxis = Estimated Technical Efficiency $

These commands recover the estimated fixed effects from the Cornwell et al. model, then replicate
them for each year in the data set. This is used to create the plot of the two sets of estimates of $u_i$
shown below.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 108.43918
Estimation based on N = 256, K = 9
Inf.Cr.AIC = -198.9 AIC/N = -.777
Model estimated: Aug 17, 2011, 06:36:42
Variances: Sigma-squared(v)= .01902
Sigma-squared(u)= .01692
Sigma(v) = .13791
Sigma(u) = .13007
Sigma = Sqr[(s^2(u)+s^2(v)]= .18957
Gamma = sigma(u)^2/sigma^2 = .47074
Var[u]/{Var[u]+Var[v]} = .24425
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = .730
Kodde-Palm C*: 95%: 2.706, 99%: 5.412

--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439
LF| .37257*** .07038 5.29 .0000 .23463 .51052
LM| .69910*** .07580 9.22 .0000 .55054 .84766
LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299
LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530
LP| .44533*** .09498 4.69 .0000 .25917 .63149
LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759
|Variance parameters for compound error
Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373
Sigma| .18957*** .00064 297.81 .0000 .18832 .19082
--------+--------------------------------------------------------------------

Normal exit from iterations. Exit status=0.


-----------------------------------------------------------------------------
FIXED EFFECTS Frontr Model
Dependent variable LQ
Log likelihood function 205.05799
Estimation based on N = 256, K = 33
Inf.Cr.AIC = -344.1 AIC/N = -1.344
Model estimated: Aug 17, 2011, 06:36:46
Unbalanced panel has 25 individuals
Skipped 0 groups with inestimable ai
Half normal stochastic frontier
Sigma( u) (1 sided) = .11713
Sigma( v) (symmetric)= .08347
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Production / Cost parameters
LF| .20090** .09879 2.03 .0420 .00727 .39453
LM| .78173*** .07495 10.43 .0000 .63483 .92863
LE| .56626 .62357 .91 .3638 -.65591 1.78843
LL| -.16687 .11488 -1.45 .1464 -.39204 .05830
LP| .17273* .09414 1.83 .0665 -.01177 .35724
LK| -.29167 .69055 -.42 .6728 -1.64513 1.06179
|Variance parameter for v +/- u
Sigma| .14383*** .00045 317.51 .0000 .14294 .14472
|Asymmetry parameter, lambda
Lambda| 1.40326*** .21468 6.54 .0000 .98248 1.82403
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
LSDV least squares with fixed effects ....
LHS=LQ Mean = -1.11237
Standard deviation = 1.29728
No. of observations = 256 Degrees of freedom
Regression Sum of Squares = 426.103 30
Residual Sum of Squares = 3.04876 225
Total Sum of Squares = 429.152 255
Standard error of e = .11640
Fit R-squared = .99290 R-bar squared = .99195
Model test F[ 30, 225] = 1048.21999 Prob F > F* = .00000
Diagnostic Log likelihood = 203.84835 Akaike I.C. = -4.18825
Restricted (b=0) = -429.37729 Bayes I.C. = -3.75896
Chi squared [ 30] = 1266.45126 Prob C2 > C2* = .00000
Estd. Autocorrelation of e(i,t) = .575211
--------------------------------------------------
Panel:Groups Empty 0, Valid data 25
Smallest 2, Largest 15
Average group size in panel 10.24
Variances Effects a(i) Residuals e(i,t)
.030410 .013550
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error t |t|>T* Interval
--------+--------------------------------------------------------------------
LF| .14860 .09677 1.54 .1259 -.04107 .33828
LM| .80497*** .07843 10.26 .0000 .65125 .95868
LE| .68672 .67075 1.02 .3069 -.62792 2.00136
LL| -.15977 .11829 -1.35 .1780 -.39162 .07208
LP| .16227 .09973 1.63 .1050 -.03320 .35774
LK| -.37897 .74689 -.51 .6123 -1.84284 1.08490
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Figure E64.8 plots the Jondrow estimates of $\exp(-E[u_{it}|\varepsilon_{it}])$ from the true fixed effects model
and the estimates of $u_i$ from the Cornwell, Schmidt and Sickles model of Section E64.4 for each
firm. Since the true FE estimates vary by period, we have plotted the group means. The implication
of the regression based model is clear in the figure. The estimates of technical efficiency from the
true FEM are generally considerably larger than those from the deterministic model.

Figure E64.8 True Fixed Effects vs. Fixed Effects Estimates of ui
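
The Jondrow estimates plotted in the figure can be reproduced from a residual and the two variance parameters. The following is a minimal Python sketch for the normal-half normal production frontier (composed error v - u); it is not LIMDEP's own code. The function names and the three sample residuals are assumptions made for illustration. The values of Sigma(u) and Sigma(v) are taken from the true fixed effects output above.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def jlms_half_normal(eps, sigma_u, sigma_v):
    """E[u | eps] for eps = v - u, u = |U|, U ~ N(0, sigma_u^2), v ~ N(0, sigma_v^2)."""
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -eps * sigma_u ** 2 / sigma2              # mean of the conditional (pre-truncation) normal
    sigma_star = sigma_u * sigma_v / math.sqrt(sigma2)  # its standard deviation
    z = mu_star / sigma_star
    return mu_star + sigma_star * norm_pdf(z) / norm_cdf(z)

def technical_efficiency(eps, sigma_u, sigma_v):
    """exp(-E[u | eps]), the quantity averaged by group for the figure."""
    return math.exp(-jlms_half_normal(eps, sigma_u, sigma_v))

sigma_u, sigma_v = 0.11713, 0.08347      # from the true fixed effects output
for eps in (-0.20, 0.0, 0.10):           # hypothetical residuals
    print(eps, round(technical_efficiency(eps, sigma_u, sigma_v), 4))
```

A more negative residual implies a larger estimated inefficiency and therefore a smaller efficiency value; a positive residual pulls the estimate toward, but never to, full efficiency.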



E64.7.4 Fixed Effects in the Normal-Truncated Normal Model


The preceding may be extended to the truncated normal (with earlier caveats) as follows: For
a model with heterogeneity appearing in the production (or cost) function,

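$$
\begin{aligned}
y_{it} &= \alpha_i + \beta' x_{it} + v_{it} - u_{it}, \\
u_{it} &= |N[\mu_{it}, \sigma_u^2]|, \\
\mu_{it} &= \mu \ (\text{nonzero}) \ \text{or} \ \delta' z_{it},
\end{aligned}
$$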

use

FRONTIER ; Lhs = ... ; Rhs = one, ... ; Rh2 = one, ...
; Model = T $
FRONTIER ; Lhs = ... ; Rhs = one, ... ; Rh2 = one, ...
; FEM ; Panel $

The Rh2 list is optional in the first command if you have only a constant term in the mean of the truncated
distribution. You should include it nonetheless, to ensure that the first and second commands match.
It is also essential that both Rhs and Rh2 include constant terms in the first
positions.
To move the heterogeneity to the mean of the underlying truncated normal distribution,

$$
\begin{aligned}
y_{it} &= \beta' x_{it} + v_{it} - u_{it}, \\
u_{it} &= |N[\mu_{it}, \sigma_u^2]|, \\
\mu_{it} &= \alpha_i + \delta' z_{it},
\end{aligned}
$$

use

FRONTIER ; Lhs = ... ; Rhs = one, ... ; Rh2 = one, ...
; Model = T $
FRONTIER ; Lhs = ... ; Rhs = one, ... ; Rh2 = one, ...
; Model = T
; FEM ; Panel $

Note that this pair of commands differs from the earlier pair only in the second command, which includes
; Model = T here and omits it there. Again, the variable specifications in the two commands must be
identical, and both must include constant terms in the first position in both lists. As before, you may
use ; Rh2 = one if you do not require variables $z_{it}$ in the mean. (This constant term will be removed
from the fixed effects model, but this common value is used as the starting value for the firm specific
estimates.)
We note that we have had scant success with this model, even with a carefully constructed data
set and good starting values. The problem appears to be Newton's method, which must be used for
the general fixed effects program of which this estimator is a part. If you have a small panel with no more than
100 groups, an alternative approach appears to work better. You may provide a stratification
variable in the cross section template to request that a set of dummy variables be inserted directly
into the function.

To fit a model of the first form above, use

FRONTIER ; Lhs = ... ; Rhs = one,...
; Model = T [ ; Rh2 = list is optional ]
; Str = a variable which provides a group indicator for the panel $

The stratification variable must take every integer value from 1 to N, where N is at most 100, and all groups must
have at least two observations. For the second form, with the heterogeneity embedded in the mean
of the truncated normal distribution, add

; Mean

to the command.
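
The requirements on the stratification variable can be checked before estimation. The sketch below is a hypothetical helper written in Python, not a LIMDEP facility; the function name `check_stratification` and the sample indicators are assumptions. It tests the three conditions stated above: labels run 1, 2, ..., N without gaps, N does not exceed 100, and every group has at least two observations.

```python
from collections import Counter

def check_stratification(groups, max_groups=100):
    """Return a list of problems with a group indicator; an empty list means it passes."""
    counts = Counter(groups)
    if any(not isinstance(g, int) for g in counts):
        return ["labels must be integers"]
    labels = sorted(counts)
    n = len(labels)
    problems = []
    if labels != list(range(1, n + 1)):
        problems.append("labels must be exactly 1, 2, ..., N with no gaps")
    if n > max_groups:
        problems.append(f"{n} groups exceeds the limit of {max_groups}")
    small = [g for g in labels if counts[g] < 2]
    if small:
        problems.append(f"groups with fewer than two observations: {small}")
    return problems

print(check_stratification([1, 1, 2, 2, 3, 3]))   # well formed indicator
print(check_stratification([1, 1, 3, 3, 3, 4]))   # gap at 2, and group 4 has one observation
```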
This provides four possible forms of the model, which we illustrate with the airline data:

NAMELIST ; x = one,lf,lm,le,ll,lp,lk $

This is a true fixed effects model with normal-truncated normal structure for uit.

FRONTIER ; Lhs = lq ; Rhs = x
; Model = T
; Str = firm $

This model is the same as the preceding one except now $\mu_i = \delta_1 + \delta_2\,\text{loadfctr}_i$.

FRONTIER ; Lhs = lq ; Rhs = x
; Model = T
; Rh2 = one,loadfctr
; Str = firm $

This is a true fixed effects model with the fixed effects appearing in $\mu_i$ rather than in the production
function.

FRONTIER ; Lhs = lq ; Rhs = x
; Model = T
; Mean
; Str = firm $

This model is the same as the preceding model except that loadfctr now also appears in the mean of
the truncated variable.

FRONTIER ; Lhs = lq ; Rhs = x
; Model = T
; Rh2 = one,loadfctr ; Mean
; Str = firm $

E64.7.5 Fixed Effects in the Heteroscedasticity Model


The firmwise heteroscedasticity model,

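$$
\begin{aligned}
y_{it} &= \beta' x_{it} + v_{it} - u_{it}, \\
u_{it} &= |N[0, \sigma_{uit}^2]|, \\
\sigma_{uit}^2 &= \sigma_u^2 \exp(\alpha_i + \delta' z_{it}),
\end{aligned}
$$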

is requested in the same fashion as the normal-truncated normal model, using a stratification variable
in the cross section formulation. (This likelihood function is likewise quite ill behaved, though less
so than the truncation form.) The command is

FRONTIER ; Lhs = ... ; Rhs = one, ...
; Het
; Hfu = list of variables ; Hfv = one
; Str = stratification variable $

This model also allows for the doubly heteroscedastic form,

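$$
\begin{aligned}
y_{it} &= \beta' x_{it} + v_{it} - u_{it}, \\
u_{it} &= |N[0, \sigma_{uit}^2]|, \\
\sigma_{uit}^2 &= \sigma_u^2 \exp(\alpha_i + \delta' z_{it}), \\
v_{it} &\sim N[0, \sigma_{vit}^2], \\
\sigma_{vit}^2 &= \sigma_v^2 \exp(\gamma' w_{it}).
\end{aligned}
$$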

The command would be

FRONTIER ; Lhs = ... ; Rhs = one, ...
; Het
; Hfu = list of variables ; Hfv = list of variables
; Str = stratification variable $

To continue the earlier example, the following fits two models of heteroscedasticity to the
airline data. The first model has heteroscedasticity and the fixed effects in the variance of $u_{it}$. The
second is doubly heteroscedastic, again with the fixed effects in the variance of $u_{it}$.

NAMELIST ; x = one,lf,lm,le,ll,lp,lk $
FRONTIER ; Lhs = lq ; Rhs = x
; Het ; Hfu = one,loadfctr ; Hfv = one ; Str = firm $
FRONTIER ; Lhs = lq ; Rhs = x
; Het ; Hfu = one,loadfctr ; Hfv = one,loadfctr ; Str = firm $

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 182.50025
Variances: Sigma-squared(v)= .00876
Sigma-squared(u)= .04920
Sigma(v) = .09357
Sigma(u) = .22182
Sigma = Sqr[(s^2(u)+s^2(v)]= .24075
Gamma = sigma(u)^2/sigma^2 = .84892
Var[u]/{Var[u]+Var[v]} = .67126
Variances averaged over observations
Stochastic Production Frontier, e = v-u
Stratified by FIRM , 25 groups
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -3.70847*** .75902 -4.89 .0000 -5.19612 -2.22081
LF| .38142*** .08642 4.41 .0000 .21204 .55079
LM| .57659*** .09175 6.28 .0000 .39676 .75642
LE| 2.78934*** .72692 3.84 .0001 1.36459 4.21408
LL| -.41646*** .08641 -4.82 .0000 -.58582 -.24710
LP| .59190*** .11704 5.06 .0000 .36251 .82129
LK| -2.87861*** .80566 -3.57 .0004 -4.45767 -1.29956
|Parameters in variance of v (symmetric)
Constant| -4.73798*** .21921 -21.61 .0000 -5.16764 -4.30833
|Parameters in variance of u (one sided)
Constant| 8.11346 7.80244 1.04 .2984 -7.17903 23.40596
LOADFCTR| -23.6678*** 6.88328 -3.44 .0006 -37.1588 -10.1768
FIRM001| 1.35540 7.37739 .18 .8542 -13.10403 15.81482
FIRM002| .25791 7.25149 .04 .9716 -13.95476 14.47057
FIRM003| .68176 7.22190 .09 .9248 -13.47290 14.83643
(Firms 4-20 omitted)
FIRM021| .73089 7.21226 .10 .9193 -13.40488 14.86666
FIRM022| -.38963 7.46091 -.05 .9584 -15.01274 14.23347
FIRM023| -.63171 7.53984 -.08 .9332 -15.40952 14.14610
FIRM024| -7.77451 41.07339 -.19 .8499 -88.27688 72.72786
--------+--------------------------------------------------------------------
Note: nnnnn.D-xx or D+xx => multiply by 10 to -xx or +xx.
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 190.29998
Estimation based on N = 256, K = 35
Inf.Cr.AIC = -310.6 AIC/N = -1.213
Model estimated: Aug 22, 2011, 22:57:54
Variances: Sigma-squared(v)= .00906
Sigma-squared(u)= .04124
Sigma(v) = .09519
Sigma(u) = .20307
Sigma = Sqr[(s^2(u)+s^2(v)]= .22427
Gamma = sigma(u)^2/sigma^2 = .81986
Var[u]/{Var[u]+Var[v]} = .62318
Variances averaged over observations
Stochastic Production Frontier, e = v-u
Stratified by FIRM , 25 groups
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -3.00340*** .65319 -4.60 .0000 -4.28364 -1.72316
LF| .24071*** .07721 3.12 .0018 .08938 .39204
LM| .60992*** .07600 8.03 .0000 .46096 .75887
LE| 2.19046*** .62677 3.49 .0005 .96202 3.41890
LL| -.38679*** .07314 -5.29 .0000 -.53015 -.24344
LP| .49345*** .09820 5.03 .0000 .30098 .68591
LK| -2.09638*** .69385 -3.02 .0025 -3.45631 -.73646
|Parameters in variance of v (symmetric)
Constant| -13.5487*** 2.64897 -5.11 .0000 -18.7406 -8.3569
LOADFCTR| 15.5221*** 4.48367 3.46 .0005 6.7343 24.3099
|Parameters in variance of u (one sided)
Constant| 8.01865 5.60084 1.43 .1522 -2.95879 18.99609
LOADFCTR| -23.3031*** 6.88508 -3.38 .0007 -36.7976 -9.8086
FIRM001| .88200 5.06220 .17 .8617 -9.03972 10.80373
FIRM002| -.83198 4.67591 -.18 .8588 -9.99660 8.33264
FIRM003| -.18608 4.65296 -.04 .9681 -9.30573 8.93356
(Firms 4-20 omitted)
FIRM021| .35047 4.63405 .08 .9397 -8.73210 9.43303
FIRM022| -.68781 4.83235 -.14 .8868 -10.15903 8.78342
FIRM023| -.96206 4.88186 -.20 .8438 -10.53033 8.60622
FIRM024| -2.86357 4.82675 -.59 .5530 -12.32383 6.59670
--------+--------------------------------------------------------------------

E64.8 True Random Effects Models


We call the stochastic frontier model with a random as opposed to a fixed effect term a 'true
random effects' model. The structure is the normal-half normal stochastic frontier model,

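$$
\begin{aligned}
y_{it} &= w_i + \alpha + \beta' x_{it} + v_{it} - u_{it}, \\
v_{it} &\sim N[0, \sigma_v^2], \\
u_{it} &= |U_{it}|, \quad U_{it} \sim N[0, \sigma_u^2], \\
w_i &\sim N[0, \sigma_w^2].
\end{aligned}
$$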

At first look, this appears to be a model with a three part disturbance, which would surely be
inestimable. But, that is incorrect. It is a model with a traditional random effect, but with the
additional feature that the time varying disturbance is not normally distributed. Specifically, the
model may be written in our familiar form for the stochastic frontier model,

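$$
\begin{aligned}
y_{it} &= \alpha + \beta' x_{it} + \varepsilon_{it} + w_i, \\
\varepsilon_{it} &\sim \frac{2}{\sigma}\,\phi\!\left(\frac{\varepsilon_{it}}{\sigma}\right)\Phi\!\left(\frac{-\lambda\varepsilon_{it}}{\sigma}\right), \\
w_i &\sim N[0, \sigma_w^2].
\end{aligned}
$$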

The model is estimable by maximum simulated likelihood, as shown below. Contrast this to the Pitt
and Lee form,
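$$
\begin{aligned}
y_{it} &= \alpha + \beta' x_{it} + v_{it} - u_i, \\
v_{it} &\sim N[0, \sigma_v^2], \\
u_i &= |U_i|, \quad U_i \sim N[0, \sigma_u^2].
\end{aligned}
$$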

In this form, $u_i$, the time invariant effect, is the inefficiency. In the true random effects model, $u_{it}$ is
the inefficiency, and it is time varying. The latent heterogeneity, the random effect, is $w_i$. Thus, in
the Pitt and Lee model, the 'inefficiency' term also contains all other time invariant unmeasured
sources of heterogeneity. In the true random effects model, these effects appear in $w_i$, and $u_{it}$ picks
up the inefficiency. By this interpretation, we expect (and always find) that estimated
inefficiencies from the Pitt and Lee model are larger than those from the true random effects model,
sometimes far larger. The same result is at work in the difference between the Cornwell et al. fixed
effects model and the true fixed effects model. Figure E64.8 clearly shows the effect at work.
The true random effects model is estimated as a form of random parameters (RP) model, in
which the only random parameter in the model is the constant term. Thus, we write the model in the
canonical RP form
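$$
\begin{aligned}
y_{it} &= \alpha_i + \beta' x_{it} + v_{it} - u_{it}, \\
v_{it} &\sim N[0, \sigma_v^2], \\
u_{it} &= |U_{it}|, \quad U_{it} \sim N[0, \sigma_u^2], \\
\alpha_i &= \alpha + w_i, \\
w_i &\sim N[0, \sigma_w^2].
\end{aligned}
$$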

Details on estimating random parameters models appear in Chapter R24, so they will be omitted
here.
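
To make the idea of maximum simulated likelihood concrete, the following Python sketch evaluates the simulated log likelihood contribution of one group in the true random effects model: the random constant is drawn repeatedly, the product of normal-half normal densities over the group's periods is computed for each draw, and the products are averaged. This is a sketch of the general technique under stated assumptions, not LIMDEP's implementation. The function names, the three-period data, and the five draws are hypothetical; the density is the production frontier form with composed error v - u, and the parameter values echo the true random effects output reported later in this section.

```python
import math
from statistics import NormalDist

STD = NormalDist()   # standard normal, used for phi and Phi

def sf_density(eps, sigma, lam):
    """Normal-half normal density of eps = v - u: (2/sigma) phi(eps/sigma) Phi(-lam*eps/sigma)."""
    return (2.0 / sigma) * STD.pdf(eps / sigma) * STD.cdf(-lam * eps / sigma)

def simulated_loglik_group(y, xb, alpha, sigma_w, sigma, lam, draws):
    """Simulated log likelihood for one group.

    y, xb : outcomes and fitted beta'x(i,t) for the group's periods
    draws : standard normal draws; alpha + sigma_w * draw is the simulated constant
    """
    total = 0.0
    for d in draws:
        a_i = alpha + sigma_w * d                 # random constant for this draw
        prod = 1.0
        for y_it, xb_it in zip(y, xb):            # joint density over the group's periods
            prod *= sf_density(y_it - a_i - xb_it, sigma, lam)
        total += prod
    return math.log(total / len(draws))           # log of the average over draws

# Hypothetical three-period group.
y = [-1.10, -0.95, -1.02]
xb = [0.70, 0.85, 0.80]
draws = [-1.2, -0.4, 0.0, 0.4, 1.2]               # stand-ins for simulation draws
print(simulated_loglik_group(y, xb, -1.83727, 0.11729, 0.14015, 1.01064, draws))
```

The full log likelihood is the sum of these contributions over groups. With long panels the product of densities can underflow, so a production implementation would work with logs inside the average.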
The command structure for the true random effects model is similar to that for the true fixed
effects model. The frontier model must be fit twice, first with no effects to generate the starting
values, then with the effect specified. The commands are

FRONTIER ; Lhs = ... ; Rhs = one,... ; Par $
FRONTIER ; Lhs = ... ; Rhs = one,...
; RPM ; Fcn = one(n) $

If desired, the Jondrow estimates are requested as usual with

; Eff = the variable name

The computation of random parameters models is fairly time consuming because of the simulations.
You can control this in part with

; Pts = the number of replications

For exploratory work (or for examples in program documentation), small values such as 25 or 50 are
sufficient. For final results destined for publication, larger values, in the range of several hundred,
are advisable. Also, we advise using Halton sequences rather than pseudorandom numbers for the
simulations (see Chapter R24). The specification is

; Halton
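
For readers unfamiliar with Halton sequences, the sketch below generates them in Python and converts them to standard normal draws by the inverse distribution function. It illustrates the standard radical inverse construction only; how LIMDEP generates and uses its draws internally is not shown here. The function names and the choice to skip the first 10 elements are assumptions.

```python
from statistics import NormalDist

def halton(index, base):
    """Element `index` (1, 2, 3, ...) of the Halton sequence for a prime base."""
    value, f = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)   # peel off the lowest digit in this base
        value += digit * f                   # reflect it about the radix point
        f /= base
    return value

def halton_normal_draws(n, base=2, skip=10):
    """n standard normal draws from a Halton sequence, discarding the first `skip` elements."""
    inv = NormalDist().inv_cdf
    return [inv(halton(i, base)) for i in range(skip + 1, skip + n + 1)]

print([halton(i, 2) for i in range(1, 5)])   # [0.5, 0.25, 0.75, 0.125]
draws = halton_normal_draws(50)
print(len(draws), sum(draws) / len(draws))
```

Because successive elements fill the unit interval evenly, a given number of Halton draws covers the distribution more uniformly than the same number of pseudorandom draws.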

The random parameters formulation also allows a variety of specifications for the mean of the
underlying uit – the normal-truncated normal model – and for heteroscedasticity. These are
discussed in Section E64.9.

Application

To illustrate the true random effects model, we continue the analysis of the airline data. The
commands below estimate the pooled model, then the true RE model. In like fashion to the analysis
of fixed effects, we then compare the true random effects estimates of inefficiency to the Pitt and Lee
estimates. Figure E64.8 illustrates the general result that the estimated inefficiencies in the true fixed
effects model will differ considerably from those produced by the Cornwell et al. approach to fixed
effects. Figure E64.9 shows the same result for the two approaches to random effects. Numerous
studies in the literature (see Greene (2005) for discussion) have documented the similarity of the
random and fixed approaches – when the same overall structure is used. Thus, Figure E64.10 shows
similar results for the true fixed and random effects models and for the Pitt and Lee and Cornwell et
al. models.

The commands used for this application are as follows:

NAMELIST ; x = one,lf,lm,le,ll,lp,lk $
FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; Eff = uplre $
FRONTIER ; Lhs = lq ; Rhs = x ; Par $
FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; RPM ; Eff = utre
; Fcn = one(n) ; Pts = 50 ; Halton $
FRONTIER ; Lhs = lq ; Rhs = x ; Par $
FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; FEM ; Eff = utfe $
DSTAT ; Rhs = uplre,utre $
CREATE ; utrebar = Group Mean(utre, Str = firm) $
PLOT ; Lhs = uplre ; Rhs = utrebar ; Grid
; Title = Group Means of u(i,t) vs. Time Invariant u(i) $
PLOT ; Lhs = utfe ; Rhs = utre ; Grid
; Title = Time Varying FE u(i) vs. Time Varying RE u(i) $

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 156.04955
Estimation based on N = 256, K = 9
Stochastic frontier based on panel data
Estimation based on 25 individuals
Variances: Sigma-squared(v)= .01342
Sigma-squared(u)= .06529
Sigma(v) = .11582
Sigma(u) = .25552
Sigma = Sqr[(s^2(u)+s^2(v)]= .28054
Gamma = sigma(u)^2/sigma^2 = .82955
Var[u]/{Var[u]+Var[v]} = .63879
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = 95.950
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -1.70327*** .41761 -4.08 .0000 -2.52176 -.88477
LF| .19534** .09759 2.00 .0453 .00407 .38662
LM| .81312*** .06954 11.69 .0000 .67682 .94941
LE| 1.12741*** .34589 3.26 .0011 .44947 1.80534
LL| -.32931*** .07230 -4.55 .0000 -.47102 -.18760
LP| .22206*** .06265 3.54 .0004 .09927 .34485
LK| -.86072** .42646 -2.02 .0436 -1.69657 -.02488
|Variance parameters for compound error
Lambda| 2.20605* 1.31249 1.68 .0928 -.36639 4.77849
Sigma(u)| .25552** .10148 2.52 .0118 .05661 .45442
--------+--------------------------------------------------------------------

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LQ
Log likelihood function 108.43918
Estimation based on N = 256, K = 9
Variances: Sigma-squared(v)= .01902
Sigma-squared(u)= .01692
Sigma(v) = .13791
Sigma(u) = .13007
Sigma = Sqr[(s^2(u)+s^2(v)]= .18957
Gamma = sigma(u)^2/sigma^2 = .47074
Var[u]/{Var[u]+Var[v]} = .24425
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] = .730
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| -2.98823*** .72136 -4.14 .0000 -4.40206 -1.57439
LF| .37257*** .07038 5.29 .0000 .23463 .51052
LM| .69910*** .07580 9.22 .0000 .55054 .84766
LE| 2.09473*** .68790 3.05 .0023 .74647 3.44299
LL| -.42909*** .06315 -6.79 .0000 -.55287 -.30530
LP| .44533*** .09498 4.69 .0000 .25917 .63149
LK| -2.09806*** .76556 -2.74 .0061 -3.59853 -.59759
|Variance parameters for compound error
Lambda| .94309*** .16870 5.59 .0000 .61244 1.27373
Sigma| .18957*** .00064 297.81 .0000 .18832 .19082
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
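
The likelihood ratio statistics in the two preceding tables follow directly from the reported log likelihoods. The Python sketch below reproduces them and, as an assumption for the one-restriction case, computes a p-value from the 50:50 mixture of chi-squared(0) and chi-squared(1) that applies when the restricted parameter lies on the boundary of the parameter space, which is consistent with the Kodde-Palm critical values shown in the output. The function names are hypothetical.

```python
import math

def lr_statistic(logl_frontier, logl_ols):
    """Chi-sq = 2*[LogL(SF) - LogL(LS)]."""
    return 2.0 * (logl_frontier - logl_ols)

def mixed_chi2_pvalue_1df(stat):
    """P-value under the 50:50 mixture of chi-squared(0) and chi-squared(1)."""
    if stat <= 0.0:
        return 1.0
    # P[chi-squared(1) > s] = P[|Z| > sqrt(s)] = erfc(sqrt(s/2))
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))

panel = lr_statistic(156.04955, 108.07431)    # panel (Pitt and Lee) model
pooled = lr_statistic(108.43918, 108.07431)   # pooled model
print(round(panel, 3), round(pooled, 3))
print(mixed_chi2_pvalue_1df(pooled))
```

The pooled statistic falls well below the 95% critical value of 2.706, whereas the panel statistic far exceeds the 99% value of 5.412.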

The estimates of the true random effects model appear next. Note that the variation of the
random terms in the model has been rearranged. In the pooled model, $\sigma_v$ = 0.138 and $\sigma_u$ = 0.130. In
the random effects model, we have $\sigma_v$ = 0.099 and $\sigma_u$ = 0.100, while $\sigma_w$ = 0.117 (the scale
parameter reported for the random constant). The proportional
allocation of the total to u and v has stayed roughly the same, but some additional variation is now
attributed to the random effect. Note that the production function parameters have changed
substantially as well.
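
The standard deviations quoted in this comparison can be recovered from the Lambda and Sigma estimates printed in the output tables. The Python sketch below applies the usual half normal relationships (lambda is the ratio of the two standard deviations and sigma squared is the sum of the two variances); the function name and dictionary keys are assumptions made for illustration.

```python
import math

def variance_components(lam, sigma):
    """Recover sigma(u), sigma(v) and the two variance ratios from lambda and sigma."""
    sigma_v = sigma / math.sqrt(1.0 + lam * lam)
    sigma_u = lam * sigma_v
    var_u = (1.0 - 2.0 / math.pi) * sigma_u ** 2      # variance of the half normal u
    return {
        "sigma_u": sigma_u,
        "sigma_v": sigma_v,
        "gamma": sigma_u ** 2 / sigma ** 2,           # sigma(u)^2 / sigma^2
        "var_share": var_u / (var_u + sigma_v ** 2),  # Var[u] / {Var[u] + Var[v]}
    }

pooled = variance_components(0.94309, 0.18957)    # pooled frontier output
true_re = variance_components(1.01064, 0.14015)   # true random effects output
for name, result in (("pooled", pooled), ("true RE", true_re)):
    print(name, {k: round(val, 5) for k, val in result.items()})
```

Note that Gamma uses the variance parameter of the underlying normal variable, while the last ratio uses the variance of the half normal variable itself, which is why the two shares differ.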

-----------------------------------------------------------------------------
Random Coefficients Frontier Model
Dependent variable LQ
Log likelihood function 160.58066
Restricted log likelihood .00000
Chi squared [ 1 d.f.] 321.16131
Significance level .00000
Estimation based on N = 256, K = 10
Inf.Cr.AIC = -301.2 AIC/N = -1.176
Model estimated: Aug 22, 2011, 23:15:44
Unbalanced panel has 25 individuals
Stochastic frontier (half normal model)
Simulation based on 50 Halton draws
Sigma( u) (1 sided) = .09962
Sigma( v) (symmetric) = .09857
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Production / Cost parameters, nonrandom first
LF| .20387*** .05183 3.93 .0001 .10229 .30545
LM| .79450*** .04660 17.05 .0000 .70318 .88583
LE| 1.10745*** .33573 3.30 .0010 .44943 1.76547
LL| -.32691*** .04277 -7.64 .0000 -.41074 -.24308
LP| .22812*** .05403 4.22 .0000 .12223 .33401
LK| -.84947** .38344 -2.22 .0267 -1.60101 -.09794
|Means for random parameters
Constant| -1.83727*** .35442 -5.18 .0000 -2.53191 -1.14263
|Scale parameters for dists. of random parameters
Constant| .11729*** .00934 12.56 .0000 .09898 .13559
|Variance parameter for v +/- u
Sigma| .14015*** .01373 10.21 .0000 .11325 .16705
|Asymmetry parameter, lambda
Lambda| 1.01064** .43792 2.31 .0210 .15234 1.86895
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Descriptive Statistics
--------+---------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
--------+---------------------------------------------------------------------
UPLRE| .221170 .117670 .016992 .435912 256 0
UTRE| .078815 .031677 .026405 .305595 256 0
--------+---------------------------------------------------------------------

Figure E64.9 Time Varying vs. Time Invariant Estimates of u(i)

Figure E64.10 Comparison of Time Varying Fixed and Random Effects Estimates

E64.9 Random Parameters Stochastic Frontier Models


The random parameters stochastic frontier model in LIMDEP is very general, and embodies
all three of the formulations discussed in the preceding sections on fixed and random effects.

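$$
\begin{aligned}
y_{it} &= \beta_i' x_{it} + v_{it} - u_{it}, \\
u_{it} &= |N[\mu_{it}, \sigma_{uit}^2]|, \\
\mu_{it} &= \mu_i' m_{it}, \\
\sigma_{uit}^2 &= \sigma_u^2 \exp(\delta_i' w_{it}).
\end{aligned}
$$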

The model allows, all at once, a half normal or truncated normal distribution for $u_{it}$ and firmwise and/or
timewise heteroscedasticity in $u_{it}$. The model form allows parameters to be random in all three parts
of the specification with the single restriction noted below. (Only the variance of the 'disturbance,'
$v_{it}$, is assumed to be constant; this model form does not accommodate heteroscedasticity
in $v_{it}$.) As will be clear in what follows, the true random effects model developed in the previous
section is a special case of this model with nonrandom parameters in $\mu_{it}$ and $\sigma_{uit}^2$ and
only a random constant term in $\beta_i$.

NOTE: The random parameters normal-truncated normal model with heteroscedasticity (in uit) at
the same time is not identified. Only one of these two should be specified. The command parser
will not prevent you from specifying such a model, but it will ultimately be impossible to obtain the
parameter estimates.

The general structure of the random parameters stochastic frontier model is based on the
conditional density
$$f(y_{it} | x_{it}, \beta_i) = f(\beta_i' x_{it}), \quad i = 1,...,N, \ t = 1,...,T_i,$$
where
$$\beta_i = \beta + \Delta z_i + \Gamma v_i$$

and f(.) is the density for the stochastic frontier regression model. The model assumes that
parameters are randomly distributed with possibly heterogeneous (across individuals) means

$$E[\beta_i | z_i] = \beta + \Delta z_i$$

(the second term is optional – the mean may be constant), and

$$\mathrm{Var}[\beta_i | z_i] = \Sigma.$$

As noted earlier, the heterogeneity term is optional. In addition, it may be assumed that some of the
parameters are nonrandom by placing rows of zeros in the appropriate places in $\Delta$ and $\Gamma$. The general
form of the random parameter vector $\beta_i$ is also extended to $\mu_i$ and $\delta_i$. The general aspects of random
parameters model estimation in LIMDEP are described in Chapter R24.
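
The structure of the random parameter vector, a common mean, an optional shift that depends on observed individual characteristics, and a scaled random draw, can be illustrated with a short simulation. The Python sketch below is an illustration only, not LIMDEP's estimator. The function name is hypothetical, the shift and scale matrices are passed as nested lists, and the numerical values for the constant and the LF coefficient are borrowed from the true random effects output of Section E64.8. A row of zeros in both matrices leaves that coefficient nonrandom, and the scale matrix is kept diagonal because correlated random parameters are not supported for this model.

```python
import random

def draw_parameters(beta, shift, scale, z, rng):
    """One simulated parameter vector: mean + shift*z + scale*v, with v ~ N(0, I)."""
    k = len(beta)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    out = []
    for j in range(k):
        mean_shift = sum(shift[j][m] * z[m] for m in range(len(z)))   # heterogeneity in the mean
        random_part = sum(scale[j][l] * v[l] for l in range(k))       # random deviation
        out.append(beta[j] + mean_shift + random_part)
    return out

rng = random.Random(12345)
beta = [-1.83727, 0.20387]                 # constant and LF coefficient
shift = [[], []]                           # no variables in z: constant means
scale = [[0.11729, 0.0], [0.0, 0.0]]       # only the constant is random
for _ in range(3):
    print(draw_parameters(beta, shift, scale, [], rng))
```

Each printed vector has a different constant but the same LF coefficient, which is exactly the true random effects special case.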

Command for the Random Parameters Model

The model command for the random parameters form of the stochastic frontier model is as
follows. The first FRONTIER command is mandatory, and is needed to obtain the starting values.
This is a pooled data version of the model. Note that it does not include the heteroscedasticity or
truncation specification, even if the second command does.

FRONTIER ; Lhs = dependent variable ; Rhs = independent variables
; Parameters $
FRONTIER ; Lhs = dependent variable
; Rhs = independent variables
[ ; Rh2 = list is optional for the truncated normal model ]
[ ; Hfn = list is optional for the heteroscedasticity model ]
; Pds = fixed periods or count variable
; RPM (may include = variables in z)
; Fcn = random parameters specification $

(Note, again, only one of the two optional specifications noted should be specified.)

NOTE: For this model, your Rhs list must include a constant term. Though not strictly necessary,
you should also include constants in Rh2 or Hfn if they are specified.

Specifying Random Parameters

The ; Fcn = specification is used to define the random parameters. It is constructed from
the list of Rhs names as follows: Suppose your model is specified by

; Rhs = one, x1, x2, x3, x4

This involves five coefficients. Any or all of them may be random; any not specified as random are
assumed to be constant. For those that you wish to specify as random, use the following for
production (cost, profit) function parameters,

; Fcn = variable name (distribution),
variable name (distribution), ...

There are two other sets of parameters in the model, in the mean of and variance of the one sided
disturbance. To specify random parameters in the underlying mean of the truncated normal variable,
use the following:

; Fcn = variable name [distribution],
variable name [distribution], ...

(Note square brackets designate the terms in $\mu_{it}$.) For parameters in the computation of the variance
of $u_{it}$, use
; Fcn = variable name <distribution>,
variable name <distribution>, ...

The difference in the three formulations is in the enclosures, ( ) for production function, [ ] for mean
of the truncated distribution, and <> for the variance of the one sided disturbance. This distinction
is necessary because the lists might have variables in common, and this is the only way to distinguish
them. In particular, it is likely that all three lists would include one, so this device is used to
distinguish the three functions.
Three distributions may be specified. All random variables have mean 0.

n = standard normal distribution, variance = 1,
t = triangular (tent shaped) distribution in [-1,+1], variance = 1/6,
u = uniform distribution in [-1,+1], variance = 1/3.

Note that each of these is scaled as it enters the distribution, so the variance is only that of the
random draw before multiplication. (See Chapter R23 for discussion of this computation and for
other distributions that can be specified.) The latter two distributions are provided as one may wish
to reduce the amount of variation in the tails of the distribution of the parameters across individuals
and to limit the range of variation. (See Train (2010) for discussion.) For example, to specify that
the constant term and the coefficient on x1 are normally distributed with fixed mean and variance,
and a normally distributed constant in the mean of the truncated distribution, you might use

; Fcn = one(n), x1(n), one[n]

This specifies that the first and second frontier coefficients and the constant in the truncation mean
are random while the remainder are not. The
parameters estimated will be the means and standard deviations of the distributions of these random
parameters and the fixed values of the other three.

NOTE: If you use the wrong enclosures for the variables, a diagnostic will appear that the program
does not recognize a variable. For example:

FRONTIER ; Lhs = lq ; Rhs = one,lf,lm,le,ll,lp
; Hfn = one,lf ; RPM ; Pds = ni
; Fcn = one(n),lf(n),lf[n] $

Variable in FCN=name[type] is not in RHS/RH2/HFN list.

The reason for the diagnostic is that the lf[n] would indicate a specification for the truncation model,
using ; Rh2 = list. But, this command specifies only heteroscedasticity, which is denoted with <>
enclosures. Hence, when the lf[n] is encountered, LIMDEP searches for lf in an Rh2 list, and finding
no such list, issues the diagnostic.
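
The enclosure rule can be mimicked in a few lines of Python. The sketch below is a hypothetical illustration of the logic just described, not LIMDEP's parser: each term of the specification is split into a variable name, an enclosure, and a distribution code, and the name is then looked up in the list that the enclosure implies. The function name, the error messages, and the restriction to the three distribution codes n, t and u are assumptions.

```python
import re

# Enclosure -> (list it refers to, matching closing character)
ENCLOSURES = {"(": ("Rhs", ")"), "[": ("Rh2", "]"), "<": ("Hfn", ">")}
TERM = re.compile(r"^\s*(\w+)\s*([(\[<])\s*(\w)\s*([)\]>])\s*$")

def parse_fcn(spec, lists):
    """Split an Fcn specification and check each name against the list its enclosure implies."""
    parsed = []
    for term in spec.split(","):
        m = TERM.match(term)
        if not m:
            raise ValueError(f"cannot parse term {term.strip()!r}")
        name, opener, dist, closer = m.groups()
        part, expected_close = ENCLOSURES[opener]
        if closer != expected_close:
            raise ValueError(f"mismatched enclosure in {term.strip()!r}")
        if dist not in "ntu":
            raise ValueError(f"unknown distribution code {dist!r}")
        if name not in lists.get(part, []):
            raise ValueError(f"{name}{opener}{dist}{closer}: {name} is not in the {part} list")
        parsed.append((name, part, dist))
    return parsed

# The command from the NOTE: heteroscedasticity (Hfn) is specified, but no Rh2 list.
lists = {"Rhs": ["one", "lf", "lm", "le", "ll", "lp"], "Hfn": ["one", "lf"]}
try:
    parse_fcn("one(n),lf(n),lf[n]", lists)
except ValueError as err:
    print("diagnostic:", err)
print(parse_fcn("one(n),lf(n),lf<n>", lists))
```

Replacing the square brackets with angle brackets directs the lookup to the Hfn list, and the specification is accepted.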

Correlated Random Parameters

The stochastic frontier model does not support correlated random parameters. The model is
not identified with this extension.

Heterogeneity in the Means

The preceding examples have specified that the mean of the random variable is fixed over
individuals. If there is measured heterogeneity in the means, in the form of

$$E[\beta_{ki}] = \beta_k + \sum_m \delta_{km} z_{mi}$$

where zmi is a variable that is measured for each individual, then the command may be modified to

; RPM = list of variables in z

In the data set, these variables must be repeated for each observation in the group. Since the
coefficients are assumed to be time invariant, the variables in zi must be also.
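
Since a variable that shifts the parameter means must be constant within each group, it can be useful to verify this before estimation. The following Python sketch is a hypothetical check, not a LIMDEP command; the function name and the sample data are assumptions.

```python
def time_varying_groups(group_ids, z):
    """Return the sorted ids of groups in which z takes more than one value."""
    first_value = {}
    bad = set()
    for g, value in zip(group_ids, z):
        if g in first_value and first_value[g] != value:
            bad.add(g)                      # z changed within group g
        first_value.setdefault(g, value)    # remember the first value seen for g
    return sorted(bad)

firm = [1, 1, 2, 2, 2]
z = [0.5, 0.5, 1.0, 1.2, 1.0]               # varies within firm 2
print(time_varying_groups(firm, z))
```

An empty list means the variable is repeated correctly for every observation in each group.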

The Parameter Vector and Retained Results

The variances of the underlying random variables are given earlier, 1 for the normal
distribution, 1/3 for the uniform, and 1/6 for the tent distribution. The $\sigma_k$ parameters are
standard deviations only for the normal distribution. For the other two distributions, $\sigma_k$ is a scale
parameter. The standard deviation is obtained as $\sigma_k/\sqrt{3}$ for the uniform distribution and $\sigma_k/\sqrt{6}$ for
the triangular distribution. When the parameters are correlated, the implied covariance matrix is
adjusted accordingly. The correlation matrix is unchanged by this.
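
The conversion from an estimated scale parameter to an implied standard deviation is a one-line calculation. The Python sketch below shows it for the three distribution codes and checks the result against a seeded simulation of the underlying draws. The function names are assumptions; the scale value 0.11729 is the estimate for the random constant in the true random effects application, used here purely as an example input.

```python
import math
import random
import statistics

BASE_VARIANCE = {"n": 1.0, "u": 1.0 / 3.0, "t": 1.0 / 6.0}   # variance of the unscaled draw

def implied_sd(scale, dist):
    """Standard deviation of scale * draw for distribution code n, u or t."""
    return abs(scale) * math.sqrt(BASE_VARIANCE[dist])

def base_draw(dist, rng):
    """One unscaled, mean zero draw for the given distribution code."""
    if dist == "n":
        return rng.gauss(0.0, 1.0)
    if dist == "u":
        return rng.uniform(-1.0, 1.0)
    if dist == "t":
        return rng.triangular(-1.0, 1.0, 0.0)
    raise ValueError(f"unknown distribution code {dist!r}")

rng = random.Random(2011)
scale = 0.11729
for dist in ("n", "u", "t"):
    sample = [scale * base_draw(dist, rng) for _ in range(100000)]
    print(dist, round(implied_sd(scale, dist), 5), round(statistics.pstdev(sample), 5))
```

The same scale estimate therefore implies less dispersion under the uniform and triangular specifications than under the normal.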
Results saved by this estimator are:

Matrices: b = estimate of the model parameter vector
varb = asymptotic covariance matrix for the estimate in b
beta_i = individual specific parameters, if ; Par is requested

Scalars: kreg = number of variables in Rhs
nreg = number of observations
logl = log likelihood function

Last Model: b_variables

Last Function: None



Standard Model Specifications for the Stochastic Frontier Random Parameters Model

This is the full list of general specifications that are applicable to this model estimator.

Controlling Output from Model Commands

; Par keeps individual specific parameter estimates.
; Table = name saves model results to be combined later in output tables.

Robust Asymptotic Covariance Matrices

; Covariance Matrix displays estimated asymptotic covariance matrix (normally not shown),
same as ; Printvc.
; Robust requests a 'sandwich' estimator or robust covariance matrix for TSCS
and several discrete choice models.

Optimization Controls for Nonlinear Optimization

; Tlg [ = value] sets convergence value for gradient.
; Tlf [ = value] sets convergence value for function.
; Tlb [ = value] sets convergence value for parameters.
; Alg = name requests a particular algorithm, Newton, DFP, BFGS, etc.
; Maxit = n sets the maximum iterations.
; Output = n requests technical output during iterations; the level 'n' is 1, 2, 3 or 4.
; Set keeps current setting of optimization parameters as permanent.

Predictions and Residuals

; List displays a list of fitted values with the model estimates.
; Keep = name keeps fitted values as a new (or replacement) variable in data set.
; Res = name keeps residuals as a new (or replacement) variable.

Hypothesis Tests and Restrictions

; Test: spec defines a Wald test of linear restrictions.


; Wald: spec defines a Wald test of linear restrictions, same as ; Test: spec.
; CML: spec defines a constrained maximum likelihood estimator.
; Rst = list specifies equality and fixed value restrictions.

Application
We continue the earlier application by fitting the stochastic frontier model with random
parameters. The random parameters truncation model appears to be unidentified in these data, so the
second model is fit with heteroscedasticity instead. In the first model, the constant and one of the
production coefficients are specified to be random. In the second, these two coefficients and the
parameter on the variable that enters the variance function are all taken to be random. The kernel
density estimators compare the efficiency estimates from the random parameters model to those from
the simplest pooled estimator.

The commands are:

NAMELIST ; x = one,lf,lm,le,ll,lp,lk $
FRONTIER ; Lhs = lq ; Rhs = x ; Eff = u $
FRONTIER ; Lhs = lq ; Rhs = x
; RPM ; Panel ; Pts = 50 ; Halton; Fcn = one(n),lf(n) ; Eff = urp1 $
KERNEL ; Rhs = urp1,u $
FRONTIER ; Lhs = lq ; Rhs = x $
FRONTIER ; Lhs = lq ; Rhs = x ; Hfn = one,loadfctr
; RPM ; Panel ; Pts = 50 ; Halton
; Fcn = one(n),lf(n),loadfctr<n> $
-----------------------------------------------------------------------------
Random Coefficients Frontier Model
Dependent variable LQ
Log likelihood function 161.33196
Restricted log likelihood .00000
Chi squared [ 2 d.f.] 322.66392
Significance level .00000
Estimation based on N = 256, K = 11
Inf.Cr.AIC = -300.7 AIC/N = -1.174
Model estimated: Aug 22, 2011, 23:28:18
Unbalanced panel has 25 individuals
Stochastic frontier (half normal model)
Simulation based on 50 Halton draws
Sigma( u) (1 sided) = .10598
Sigma( v) (symmetric) = .09399
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Production / Cost parameters, nonrandom first
LM| .81447*** .04526 18.00 .0000 .72577 .90317
LE| 1.16342*** .31391 3.71 .0002 .54817 1.77867
LL| -.33712*** .04111 -8.20 .0000 -.41769 -.25654
LP| .24213*** .04782 5.06 .0000 .14841 .33585
LK| -.94502*** .35520 -2.66 .0078 -1.64119 -.24886
|Means for random parameters
Constant| -1.89056*** .33140 -5.70 .0000 -2.54009 -1.24103
LF| .21430*** .05277 4.06 .0000 .11088 .31773
|Scale parameters for dists. of random parameters
Constant| .12526*** .00926 13.53 .0000 .10711 .14341
LF| .04979*** .00823 6.05 .0000 .03366 .06592
|Variance parameter for v +/- u
Sigma| .14165*** .01265 11.20 .0000 .11686 .16645
|Asymmetry parameter, lambda
Lambda| 1.12768*** .42335 2.66 .0077 .29792 1.95743
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
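The two standard deviations reported in the header of these results can be recovered from the Sigma and Lambda rows of the table. The sketch below assumes the usual definitions, Sigma = sqrt(s2u + s2v) and Lambda = su/sv; the function name is hypothetical.

```python
import math

def sigma_components(sigma, lam):
    """Split sigma = sqrt(su^2 + sv^2) and lambda = su/sv into (su, sv)."""
    sigma_v = sigma / math.sqrt(1.0 + lam ** 2)
    sigma_u = lam * sigma_v
    return sigma_u, sigma_v

if __name__ == "__main__":
    # Sigma and Lambda as reported in the table above.
    su, sv = sigma_components(0.14165, 1.12768)
    print(round(su, 5), round(sv, 5))
```

Up to rounding of the displayed estimates, the results match the reported Sigma(u) = .10598 and Sigma(v) = .09399.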

Figure E64.11 shows the distributions of the estimates of inefficiencies from the random parameters
model and the simple, pooled fixed parameters model. The figure suggests that the RP formulation
is moving some of the variation of the outcome variable out of the inefficiency term and into the
production model, in the form of parameter variation.

Figure E64.11 Kernel Density Estimator for Random Parameters Model Inefficiencies

-----------------------------------------------------------------------------
Random Coefficients FrntrTrn Model
Dependent variable LQ
Log likelihood function 199.14429
Estimation based on N = 256, K = 13
Unbalanced panel has 25 individuals
Stochastic frontier, truncation/hetero.
Simulation based on 50 Halton draws
Estimated parameters of efficiency dstn
s(u) = .189842 s(v)= .07165
avgE[u|e]= .10986 avgE[TE|e]= .90303
Lambda = su/sv = 2.64974
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LQ| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Nonrandom parameters
LM| .62243*** .04223 14.74 .0000 .53966 .70521
LE| .38353 .28063 1.37 .1717 -.16649 .93355
LL| -.36579*** .03589 -10.19 .0000 -.43614 -.29544
LP| .15282*** .04217 3.62 .0003 .07017 .23547
LK| -.16125 .31392 -.51 .6075 -.77652 .45401
suONE| 9.05239*** 1.65934 5.46 .0000 5.80014 12.30464
|Means for random parameters
Constant| -1.17144*** .29799 -3.93 .0001 -1.75549 -.58739
LF| .49011*** .04904 9.99 .0000 .39398 .58623
suLOADFC| -16.4160*** 3.47560 -4.72 .0000 -23.2281 -9.6039
|Scale parameters for dists. of random parameters
Constant| .12591*** .00859 14.65 .0000 .10906 .14275
LF| .01186** .00593 2.00 .0456 .00023 .02350
suLOADFC| 1.47653*** .36192 4.08 .0000 .76718 2.18589
|Sigma(v) from symmetric disturbance.
Sigma(v)| .07165*** .00670 10.69 .0000 .05851 .08478
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

E64.10 Alvarez et al. – Fixed Management Model


Alvarez, Arias and Greene (2006) suggested a production model in which an unobserved
factor enters as a latent variable. The core production model is

$$y_{it} = f(x_{it,1}, x_{it,2}, \ldots, x_{it,K}, m_i)$$

where the unobservable, time invariant factor, $m_i$, is labeled 'management' in their paper. By
treating the unobserved factor as a random component in the model, the authors develop a stochastic
frontier model in which the resultant functional form is such that all random parameters are functions
of the same single random effect, $w_i$, and $w_i$ appears in squared form in the equation as well. In
generic terms, this model is a random parameters stochastic frontier model with random constant
term and first order terms, and nonrandom second order terms in a translog model. The functional
form is

$$\begin{aligned}
\log y_{it} &= \alpha_i + \sum_{k=1}^{K} \beta_{k,i} \ln x_{it,k} + \sum_{k=1}^{K}\sum_{m=1}^{K} \gamma_{km} \ln x_{it,k} \ln x_{it,m} + v_{it} - u_{it} \\
\alpha_i &= \alpha + \theta_{\alpha} w_i + \theta_{\alpha\alpha}\left(\tfrac{1}{2} w_i^2\right) \\
\beta_{k,i} &= \beta_k + \theta_k w_i \\
w_i &\sim N[0,1] \\
v_{it} &\sim N[0,\sigma_v^2] \\
u_{it} &= |N[0,\sigma_u^2]|
\end{aligned}$$
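To make the structure of the random parameters concrete, the following sketch evaluates the constant term and the first order coefficients for a given draw of the common effect $w_i$. It is an illustration under assumptions, not the program's internal computation: the symbols $\theta_\alpha$, $\theta_{\alpha\alpha}$ and $\theta_k$ for the loadings on $w_i$, the function name, and the numerical values are all hypothetical.

```python
def management_parameters(alpha, theta_alpha, theta_alpha_alpha, beta, theta, w):
    """Return (alpha_i, [beta_1i, ..., beta_Ki]) for one draw w of the common effect.

    alpha_i  = alpha + theta_alpha * w + theta_alpha_alpha * (w**2 / 2)
    beta_k,i = beta_k + theta_k * w
    """
    alpha_i = alpha + theta_alpha * w + theta_alpha_alpha * 0.5 * w ** 2
    beta_i = [b + t * w for b, t in zip(beta, theta)]
    return alpha_i, beta_i

if __name__ == "__main__":
    # Hypothetical values chosen only to show the mechanics.
    beta = [0.65, 0.04, 0.05, 0.40]
    theta = [-0.02, 0.01, 0.01, -0.01]
    for w in (-1.0, 0.0, 1.0):
        a_i, b_i = management_parameters(11.65, 0.13, -0.03, beta, theta, w)
        print(w, round(a_i, 4), [round(b, 4) for b in b_i])
```

Because every parameter is driven by the same $w_i$, a single draw moves the whole set of coefficients together, which is what the ; Common specification requests.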

This model is specified simply by creating the necessary variables, then building a random
parameters model with the two additional specifications,

; Common ; Mgt

The ; Common specification alone is generic, and applies to all random parameters models. Use it
to specify that the same random component appears in all random parameters. The ; Mgt
specification has no function outside the frontier model. It is used only with the frontier model to
specify this particular form. For example, consider the following three factor translog model:

FRONTIER ; Lhs = yit ; Rhs = one,x1,x2,x3,x11,x12,x13,x22,x23,x33 $


FRONTIER ; Lhs = yit ; Rhs = one,x1,x2,x3,x11,x12,x13,x22,x23,x33
; RPM ; Pds = the panel specification ; Halton
; Fcn = one(n),x1(n),x2(n),x3(n)
; Common ; Mgt $

(It is always necessary to fit the frontier model with fixed parameters first to generate the starting
values.)

An extension of this model that the authors considered was intended to ameliorate the
probable correlation between the random effect wi and the independent variables (factors). The
Mundlak approach to this problem is to incorporate the group means of the variables in the model.
For this model, they proposed

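$$w_i = \sum_{k=1}^{K} \tau_k \, \overline{\log x}_{i,k} + f_i$$

where $\overline{\log x}_{i,k}$ is the group mean of the kth variable for individual i and $f_i$ is now
the structural random variable that drives the random parameters. This extension is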
requested with
; Means

(The program deduces internally which variables are nonconstant and should be used.)
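The group means used in the Mundlak device are simple to compute outside the program if they are wanted for inspection. The sketch below is a minimal illustration with hypothetical function names and data; it assumes the variables have already been transformed to logs and that each row is tagged with an individual identifier.

```python
from collections import defaultdict

def group_means(ids, rows):
    """Return {id: [mean of each column]} for panel rows tagged by individual id."""
    sums = {}
    counts = defaultdict(int)
    for i, row in zip(ids, rows):
        if i not in sums:
            sums[i] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[i][k] += value
        counts[i] += 1
    return {i: [s / counts[i] for s in sums[i]] for i in sums}

def mundlak_index(tau, means):
    """Systematic part of w_i: the sum over k of tau_k times the kth group mean."""
    return sum(t * m for t, m in zip(tau, means))

if __name__ == "__main__":
    ids = [1, 1, 2, 2]
    rows = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [2.0, 2.0]]  # hypothetical log inputs
    means = group_means(ids, rows)
    print(means)
    print(mundlak_index([0.5, -1.0], means[1]))
```

The structural component $f_i$ is then whatever variation in $w_i$ remains after this systematic part is removed.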

Application

The following is the Alvarez, Arias and Greene application. The data consist of six years of
observations on 247 Spanish dairy farms. The output, yit, is milk production. The four inputs, x1, x2,
x3 and x4, are feed, land, labor and cows. The commands for fitting the model follow. (We have
restricted the number of iterations and the number of replications for the purpose of this numerical
illustration.) Both models (with and without the Mundlak adjustment) are shown.

FRONTIER ; Lhs = yit
; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44 ; Par $
FRONTIER ; Lhs = yit
; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44
; RPM ; Halton ; Pts = 25 ; Pds = 6 ; Maxit = 25 ; Common ; Mgt
; Fcn = one(n),x1(n),x2(n),x3(n),x4(n) $
FRONTIER ; Lhs = yit
; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44 ; Par $
FRONTIER ; Lhs = yit
; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44
; RPM ; Halton ; Pts = 25 ; Pds = 6 ; Maxit = 25
; Common ; Mgt ; Means
; Fcn = one(n),x1(n),x2(n),x3(n),x4(n) $

The first set of results is the pooled stochastic frontier model with no extensions or
modifications.

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable YIT
Log likelihood function 851.16734
Estimation based on N = 1482, K = 15
Variances: Sigma-squared(v)= .00876
Sigma-squared(u)= .02831
Sigma(v) = .09359
Sigma(u) = .16825
Sigma = Sqr[(s^2(u)+s^2(v)]= .19253
Gamma = sigma(u)^2/sigma^2 = .76371
Var[u]/{Var[u]+Var[v]} = .54012
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 829.23705
Chi-sq=2*[LogL(SF)-LogL(LS)] = 43.861
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
YIT| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 11.6942*** .00529 2209.86 .0000 11.6838 11.7046
X1| .60483*** .02133 28.35 .0000 .56302 .64664
X2| .02246** .01140 1.97 .0489 .00011 .04480
X3| .02336* .01245 1.88 .0606 -.00104 .04776
X4| .44945*** .01172 38.34 .0000 .42647 .47242
X11| .59297*** .13525 4.38 .0000 .32789 .85806
X12| -.17183*** .04842 -3.55 .0004 -.26673 -.07693
X13| .20033*** .06903 2.90 .0037 .06502 .33563
X14| -.32993*** .07299 -4.52 .0000 -.47297 -.18688
X23| .00386 .04203 .09 .9268 -.07852 .08624
X24| .06473** .03009 2.15 .0314 .00576 .12369
X34| -.07096* .03853 -1.84 .0655 -.14648 .00455
X44| .20854*** .04328 4.82 .0000 .12373 .29336
|Variance parameters for compound error
Lambda| 1.79780*** .10292 17.47 .0000 1.59608 1.99951
Sigma| .19253*** .00011 1715.95 .0000 .19231 .19275
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
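Several of the summary statistics in the header of these results are simple functions of the two log likelihoods and the two variance estimates. The sketch below reproduces them as a check; the function name is hypothetical, and the variance share uses the standard result that the variance of a half normal variable is $(1 - 2/\pi)\sigma_u^2$.

```python
import math

def frontier_summary(logl_sf, logl_ols, s2u, s2v):
    """Recompute header statistics from the log likelihoods and variance estimates."""
    var_u = (1.0 - 2.0 / math.pi) * s2u          # variance of |N(0, s2u)|
    return {
        "lr": 2.0 * (logl_sf - logl_ols),        # Chi-sq = 2*[LogL(SF) - LogL(LS)]
        "sigma": math.sqrt(s2u + s2v),           # Sqr[s^2(u) + s^2(v)]
        "lambda": math.sqrt(s2u / s2v),          # sigma(u) / sigma(v)
        "gamma": s2u / (s2u + s2v),              # sigma(u)^2 / sigma^2
        "var_share": var_u / (var_u + s2v),      # Var[u] / {Var[u] + Var[v]}
    }

if __name__ == "__main__":
    # Values as displayed in the pooled frontier results above.
    for name, value in frontier_summary(851.16734, 829.23705, 0.02831, 0.00876).items():
        print(f"{name:10s} {value:.5f}")
```

The recomputed values agree with the printed output up to rounding of the displayed variance estimates. The likelihood ratio statistic of 43.861 is far above the Kodde-Palm critical values shown, so the hypothesis of no inefficiency is rejected.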

This is the fixed management model without the Mundlak correction.

+---------------------------------------------+
| Random Coefficients Frontier Model |
| Dependent variable YIT |
| Log likelihood function 1327.58807 |
| Estimation based on N = 1482, K = 21 |
| Sample is 6 pds and 247 individuals |
+---------------------------------------------+
-----------------------------------------------------------------------------
All parameters have the same random effect
Alvarez/Arias/Greene Fixed Mgt. SF Model
Stochastic frontier (half normal model)
Simulation based on 25 Halton draws
Sigma( u) (1 sided) = .09355
Sigma( v) (symmetric) = .05799
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
YIT| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Production / Cost parameters, nonrandom first
X11| .19550** .08392 2.33 .0198 .03101 .35999
X12| -.00410 .02903 -.14 .8876 -.06100 .05279
X13| -.03972 .04116 -.96 .3346 -.12039 .04095
X14| -.08681** .04220 -2.06 .0397 -.16952 -.00410
X23| .02377 .02534 .94 .3483 -.02590 .07344
X24| -.01893 .01743 -1.09 .2775 -.05310 .01524
X34| .02550 .02305 1.11 .2684 -.01967 .07067
X44| .09988*** .02339 4.27 .0000 .05403 .14572
|Means for random parameters
Constant| 11.6506*** .00445 2620.80 .0000 11.6418 11.6593
X1| .65048*** .01227 53.03 .0000 .62643 .67452
X2| .03525*** .00681 5.17 .0000 .02190 .04861
X3| .04531*** .00759 5.97 .0000 .03043 .06019
X4| .40147*** .00646 62.16 .0000 .38881 .41413
|Coefficients on unobservable fixed management
Constant| .12579*** .00238 52.96 .0000 .12114 .13045
X1| -.02248* .01218 -1.85 .0649 -.04635 .00139
X2| .00767 .00851 .90 .3676 -.00902 .02436
X3| .00794 .00939 .85 .3979 -.01047 .02635
X4| -.00967 .00657 -1.47 .1410 -.02255 .00320
Alpha_mm| -.02835*** .00414 -6.85 .0000 -.03646 -.02024
|Variance parameter for v +/- u
Sigma| .11007*** .00289 38.04 .0000 .10439 .11574
|Asymmetry parameter, lambda
Lambda| 1.61332*** .11959 13.49 .0000 1.37893 1.84771
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

+---------------------------------------------+
| Random Coefficients Frontier Model |
| Dependent variable YIT |
| Log likelihood function 1273.63070 |
| Sample is 6 pds and 247 individuals |
+---------------------------------------------+
-----------------------------------------------------------------------------
All parameters have the same random effect
Alvarez/Arias/Greene Fixed Mgt. SF Model
Stochastic frontier (half normal model)
Simulation based on 25 Halton draws
Sigma( u) (1 sided) = .12577
Sigma( v) (symmetric) = .05376
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
YIT| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Production / Cost parameters, nonrandom first
X11| -.06957 .08521 -.82 .4142 -.23658 .09743
X12| .00164 .02989 .05 .9562 -.05693 .06022
X13| .31592*** .04339 7.28 .0000 .23087 .40097
X14| -.08946* .04767 -1.88 .0606 -.18289 .00398
X23| -.02088 .02784 -.75 .4533 -.07545 .03369
X24| -.04357** .01912 -2.28 .0227 -.08103 -.00610
X34| -.15581*** .02350 -6.63 .0000 -.20187 -.10975
X44| .16310*** .02763 5.90 .0000 .10895 .21725
|Means for random parameters
Constant| 11.6829*** .00449 2601.72 .0000 11.6741 11.6917
X1| .60260*** .02198 27.41 .0000 .55951 .64569
X2| .05221*** .01636 3.19 .0014 .02015 .08427
X3| .10728*** .02775 3.87 .0001 .05290 .16166
X4| .39780*** .01047 38.00 .0000 .37728 .41832
|Coefficients on unobservable fixed management
Constant| .11398*** .00235 48.52 .0000 .10937 .11858
X1| -.05393*** .01134 -4.76 .0000 -.07616 -.03171
X2| .03061*** .00916 3.34 .0008 .01265 .04857
X3| .01309 .01202 1.09 .2760 -.01046 .03665
X4| .01621** .00707 2.29 .0218 .00236 .03007
Alpha_mm| -.03575*** .00368 -9.72 .0000 -.04296 -.02855
|Variance parameter for v +/- u
Sigma| .13678*** .00368 37.19 .0000 .12957 .14399
|Asymmetry parameter, lambda
Lambda| 2.33925*** .14491 16.14 .0000 2.05524 2.62326
|Variable Means in Unobserved Management
X1_bar| -.12466 .22073 -.56 .5722 -.55728 .30796
X2_bar| .00045 .15758 .00 .9977 -.30839 .30930
X3_bar| .01632 .25437 .06 .9489 -.48224 .51487
X4_bar| .15107 .11332 1.33 .1825 -.07102 .37316
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

E64.11 Latent Class Stochastic Frontier Models


The latent class framework discussed in Chapter E20 is available for the stochastic frontier
model. The structural equations of the basic model are

$$\begin{aligned}
y_{it} \mid j &= \beta_j' x_{it} + v_{it} - u_{it}, \\
v_{it} \mid j &\sim N[0, \sigma_{vj}^2], \\
u_{it} \mid j &= |N[0, \sigma_{uj}^2]|,
\end{aligned}$$

where 'j' indicates class j. The truncation and heteroscedasticity models are not supported by this
estimator. However, the Battese and Coelli model, in which

$$u_{it} \mid j = [g(z_{it}) \mid j] \, |U_i|$$

is available for both forms of $g(z_{it})$.


The estimation command for the latent class stochastic frontier model is

FRONTIER ; Lhs = dependent variable
; Rhs = one, remaining variables ; Parameters $
FRONTIER ; Lhs = dependent variable
; Rhs = one, remaining variables
; Pds = fixed periods or count variable
; LCM ; Pts = number of classes (2, 3, ..., 9) $

(As in other panel data settings, it is necessary to fit the pooled model first to compute the starting
values.)
The Battese and Coelli models may be specified here with

; Model = BC

for the decay model and, for the second form of g(zit), with

; Model = BC
; Hfu = one, heteroscedasticity variables

For this model, you must fit the identical Battese and Coelli model without the latent class
specification first. The application below demonstrates.
The basic form of the latent class model assumes that the class probabilities are fixed values.
You may make them dependent on time invariant variables, wi, with

; LCM = list of variables in w

Do not include one in the list.



Some particular variables computed for the latent class model are

; Group = the index of the most likely latent class


; Cprob = estimated probability for the most likely latent class

You can obtain a listing of these two results by using

; List

An example appears below. You can also use the ; Rst = list option to structure the latent class
model so that different variables appear in different classes or that certain coefficients are equal
across classes. Examples are given in Chapter E20.
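The ; Group and ; Cprob results described above rest on the posterior class probabilities. In latent class models these are normally obtained by Bayes' rule from the prior class probabilities and each individual's class specific likelihood contribution. The sketch below illustrates that calculation; the function names and the likelihood values are hypothetical, and it is not a description of the program's internal code.

```python
def posterior_class_probs(prior, class_likelihoods):
    """Bayes' rule: P(j | data_i) = prior_j * L_ij / sum_m prior_m * L_im."""
    joint = [p * l for p, l in zip(prior, class_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def most_likely_class(posterior):
    """Return (1-based class index, its posterior probability)."""
    best = max(range(len(posterior)), key=lambda j: posterior[j])
    return best + 1, posterior[best]

if __name__ == "__main__":
    prior = [0.30612, 0.69388]       # estimated prior class probabilities
    likelihoods = [4.0, 1.0]         # hypothetical class specific likelihoods for one farm
    post = posterior_class_probs(prior, likelihoods)
    print([round(p, 4) for p in post], most_likely_class(post))
```

A class with a small prior probability can still be the most likely one for a given individual when that individual's data are much better explained by it.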
Estimates retained by this model include:

Matrices: b        = full parameter vector, [$\beta_1'$, $\sigma_1$, $\lambda_1$, $\beta_2'$, $\sigma_2$, $\lambda_2$, ..., $F_1$, ..., $F_J$]
          varb     = full covariance matrix
          beta_i   = individual specific parameters, if ; Par is requested

Note that b and varb involve J(K+2) estimates. Two additional matrices are created,

          b_class  = a $J \times K$ matrix with each row equal to the corresponding $\beta_j'$
          class_pr = a $J \times 1$ vector containing the estimated class probabilities

Scalars:  kreg     = number of variables in Rhs list
          nreg     = total number of observations used for estimation
          logl     = maximized value of the log likelihood function
          exitcode = exit status of the estimation procedure

Standard Model Specifications for the Latent Class Stochastic Frontier Model

This is the full list of general specifications that are applicable to this model estimator.

Controlling Output from Model Commands

; Par keeps individual specific parameter estimates.


; Partial Effects displays marginal effects, same as ; Marginal Effects.
; OLS displays least squares starting values when (and if) they are computed.
; Table = name saves model results to be combined later in output tables.

Robust Asymptotic Covariance Matrices

; Covariance Matrix displays estimated asymptotic covariance matrix (normally not shown),
same as ; Printvc.
; Robust requests a 'sandwich' estimator or robust covariance matrix for TSCS and
several discrete choice models.

Optimization Controls for Nonlinear Optimization

; Start = list gives starting values for a nonlinear model.


; Tlg [ = value] sets convergence value for gradient.
; Tlf [ = value] sets convergence value for function.
; Tlb[ = value] sets convergence value for parameters.
; Alg = name requests a particular algorithm, Newton, DFP, BFGS, etc.
; Maxit = n sets the maximum iterations.
; Output = n requests technical output during iterations; the level 'n' is 1, 2, 3 or 4.
; Set keeps current setting of optimization parameters as permanent.

Predictions and Residuals

; List displays a list of fitted values with the model estimates.


; Keep = name keeps fitted values as a new (or replacement) variable in data set.
; Res = name keeps residuals as a new (or replacement) variable.
; Fill fills missing values (outside estimating sample) for fitted values.

Hypothesis Tests and Restrictions

; Test: spec defines a Wald test of linear restrictions.


; Wald: spec defines a Wald test of linear restrictions, same as ; Test: spec.
; CML: spec defines a constrained maximum likelihood estimator.
; Rst = list specifies equality and fixed value restrictions.

Application

The airline data used in the preceding examples are clearly not compatible with this model;
no configuration of the equation produces meaningful results. To illustrate the estimator, we have
borrowed the Spanish dairy data used in the previous section. The following commands fit a two
class, Battese and Coelli decay model.

NAMELIST ; x = one,x1,x2,x3,x4 $
FRONTIER ; Lhs = yit ; Rhs = x
; Model = BC
; Pds = 6 $
FRONTIER ; Lhs = yit ; Rhs = x
; Model = BC
; LCM ; Pts = 2 ; Pds = 6 ; List $

These are the initial results from the first command.

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable YIT
Log likelihood function 1390.20024
Stochastic frontier based on panel data
Estimation based on 247 individuals
Variances: Sigma-squared(v)= .00549
Sigma-squared(u)= .03940
Sigma(v) = .07413
Sigma(u) = .19848
Sigma = Sqr[(s^2(u)+s^2(v)]= .21187
Gamma = sigma(u)^2/sigma^2 = .87759
Var[u]/{Var[u]+Var[v]} = .72263
Stochastic Production Frontier, e = v-u
Battese-Coelli Models: Time Varying uit
Time dependent uit=exp[-eta(t-T)]*|U(i)|
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 809.67610
Chi-sq=2*[LogL(SF)-LogL(LS)] = 1161.048
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
YIT| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 11.7882*** .00716 1646.05 .0000 11.7742 11.8022
X1| .62230*** .01365 45.59 .0000 .59555 .64905
X2| .06001*** .01069 5.61 .0000 .03905 .08096
X3| .05708*** .01454 3.93 .0001 .02858 .08557
X4| .35510*** .00700 50.69 .0000 .34137 .36883
|Variance parameters for compound error
Lambda| 2.67761*** .02351 113.88 .0000 2.63152 2.72369
Sigma(u)| .19848*** .00060 332.72 .0000 .19731 .19965
|Eta parameter for time varying inefficiency
Eta| .08030*** .00432 18.60 .0000 .07184 .08877
--------+--------------------------------------------------------------------

Warning 141: Iterations:current or start estimate of sigma is nonpositive


Normal exit from iterations. Exit status=0.
-----------------------------------------------------------------------------
Latent Class / Panel Frontier Model
Dependent variable YIT
Log likelihood function 1462.93500
Estimation based on N = 1482, K = 17
Sample is 6 pds and 247 individuals
Stoch. frontier (B&C,time varying U)
Ineff=u(i,t)=exp(-eta*(t-T))|U(i)|
Model fit with 2 latent classes.
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
YIT| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Model parameters for latent class 1
Constant| 11.8355*** .02201 537.84 .0000 11.7923 11.8786
X1| .60324*** .03499 17.24 .0000 .53467 .67181
X2| .13327*** .04014 3.32 .0009 .05459 .21195
X3| .10581*** .03248 3.26 .0011 .04216 .16947
X4| .33560*** .01392 24.11 .0000 .30832 .36288
|Square root of variance sum, sqr(s2u + s2v)
Sigma| .71161** .35935 1.98 .0477 .00730 1.41591
|Asymmetry parameter in compound distn, su/sv
Lambda| .02071 .02565 .81 .4194 -.02956 .07098
|Scale factor in time varying inefficiency
Eta| .19551*** .01986 9.84 .0000 .15658 .23444
|Model parameters for latent class 2
Constant| 11.7611*** .01279 919.62 .0000 11.7360 11.7862
X1| .61866*** .01873 33.04 .0000 .58196 .65536
X2| .05041*** .01289 3.91 .0001 .02514 .07567
X3| .06232*** .01830 3.40 .0007 .02645 .09820
X4| .30614*** .01029 29.76 .0000 .28598 .32631
|Square root of variance sum, sqr(s2u + s2v)
Sigma| .92839*** .02938 31.60 .0000 .87081 .98597
|Asymmetry parameter in compound distn, su/sv
Lambda| .05084 .22185 .23 .8187 -.38398 .48566
|Scale factor in time varying inefficiency
Eta| .07059*** .00475 14.87 .0000 .06129 .07990
|Estimated prior probabilities for class membership
Class1Pr| .30612*** .05178 5.91 .0000 .20463 .40760
Class2Pr| .69388*** .05178 13.40 .0000 .59240 .79537
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

+---------------------------------------------------+
| Stochastic Frontier Model Variance Parameters |
| Class Lambda Sigma Sigma(u) Sigma(v) |
| 1 .020709 .711607 .014734 .711454 |
| 2 .050840 .928393 .047139 .927195 |
+---------------------------------------------------+
=============================================================================
Predictions computed for the group with the largest posterior probability
Obs. Periods Estimated inefficiencies, E[u|v -/+ u]
=============================================================================
Ind.= 1 J* = 1 P(j)= .889 .111
01-06 .3105 .2554 .2100 .1727 .1421 .1168
Ind.= 2 J* = 2 P(j)= .295 .705
01-06 .0813 .0757 .0706 .0657 .0613 .0571
Ind.= 3 J* = 2 P(j)= .012 .988
01-06 .2254 .2100 .1957 .1824 .1699 .1584
Ind.= 4 J* = 1 P(j)= .955 .045
01-06 .1778 .1463 .1203 .0989 .0814 .0669
Ind.= 5 J* = 1 P(j)= .650 .350
01-06 .2453 .2018 .1659 .1365 .1122 .0923
Ind.= 6 J* = 2 P(j)= .138 .862
01-06 .0517 .0482 .0449 .0418 .0390 .0363
Ind.= 7 J* = 1 P(j)= .985 .015
01-06 .3010 .2476 .2036 .1674 .1377 .1132
Ind.= 8 J* = 2 P(j)= .165 .835
01-06 .0561 .0523 .0487 .0454 .0423 .0394
Ind.= 9 J* = 2 P(j)= .450 .550
01-06 .0134 .0125 .0116 .0108 .0101 .0094
Ind.= 10 J* = 1 P(j)= .999 .001
01-06 .1039 .0855 .0703 .0578 .0475 .0391
(Farms 11-247 omitted)
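The time pattern in the listing follows directly from the decay form, uit = exp[-eta(t-T)]|U(i)|: each period's estimate is the previous one multiplied by exp(-eta) for the class to which the farm is assigned. The sketch below checks this against the first two farms listed, using the class specific Eta estimates reported above; the function name is hypothetical.

```python
import math

def decay_path(u_first, eta, periods):
    """Inefficiency path implied by u_t = u_1 * exp(-eta * (t - 1))."""
    return [u_first * math.exp(-eta * s) for s in range(periods)]

if __name__ == "__main__":
    # Farm 1 is assigned to class 1 (Eta = .19551), farm 2 to class 2 (Eta = .07059).
    print([round(u, 4) for u in decay_path(0.3105, 0.19551, 6)])
    print([round(u, 4) for u in decay_path(0.0813, 0.07059, 6)])
```

The computed paths agree with the listed values to within rounding of the displayed first period estimate. Since Eta is positive in both classes, estimated inefficiency declines over the six years for every farm.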

E65: Data Envelopment Analysis


E65.1 Introduction
There are two broad paradigms used by researchers to analyze efficiency in production,
stochastic frontier analysis (SFA) and data envelopment analysis (DEA). No formulation has yet
been devised that unifies SFA and DEA in a single analytical framework. Arguably, the former is a
fully parameterized model whereas the latter is 'nonparametric,' albeit also atheoretical in nature.
DEA is currently the conventional approach to deterministic frontier estimation. This is usually
handled with linear programming techniques. The analysis assumes that there is a frontier
technology (in the same spirit as the stochastic frontier production model) that can be described by a
piecewise linear hull that envelops the observed outcomes. Some (efficient) observations will be on
the frontier while other (inefficient) individuals will be inside. The technique produces a
deterministic frontier that is generated by the observed data, so by construction, some individuals are
'efficient.' This is one of the fundamental differences between DEA and SFA. This chapter presents
LIMDEP's programs for data envelopment analysis (DEA).

E65.2 Data Envelopment Analysis


Stochastic frontier modeling is based on maximum likelihood or other classical or Bayesian,
parametric econometric techniques. In contrast, DEA is based on nonparametric, linear programming
methods. Both paradigms are based on an underlying construct of the efficient production frontier that
relates maximal output to inputs for the 'firm' (decision making unit, or DMU). Using SFA methods,
the analyst defines, then estimates a continuous, regular relationship that defines the frontier. DEA
uses linear programming methods to fit a piecewise linear 'hull' around the data, under the assumption
that the hull adequately approximates the underlying frontier, the more so as the number of
observations increases. (Since the technique is nonstatistical, this is difficult to establish analytically.)
There is a vast literature on the two techniques and comparisons, none of which will be reviewed here.
Our purpose here is only to document the estimator. We recommend, as a departure point in the
literature, a working paper by Coelli (1996a), which describes the techniques documented here and
introduces some of the theoretical notions. He also provides several useful citations.

E65.2.1 Input and Output Oriented Efficiency


The discussion of DEA efficiency measurement begins with the notion of a measure of the
ratio of outputs to inputs for firm 'i,'

$$\text{Ratio}_i = \frac{\mu' y_i}{\nu' x_i}, \quad i = 1,\ldots,N,$$

where $y_i$ is the vector of M outputs, $x_i$ is the vector of K inputs, and $\mu$ and $\nu$ are the
corresponding vectors of output and input weights. The optimal weights are defined
by the programming problem,

$$\begin{aligned}
\text{Maximize wrt } \mu, \nu: \quad & \mu' y_i / \nu' x_i \\
\text{Subject to} \quad & \mu' y_s / \nu' x_s \le 1, \quad s = 1,\ldots,N \\
& \mu_m \ge 0, \quad m = 1,\ldots,M \\
& \nu_k \ge 0, \quad k = 1,\ldots,K.
\end{aligned}$$

The optimization program seeks the optimal weights to maximize the 'efficiency' of firm i subject to
the restriction that the efficiencies of all firms are less than or equal to one, and that all weights are
nonnegative. Because the objective function is homogeneous of degree zero (any multiple of the
weights produces the same solution), it is normalized with a restriction such as $\nu' x_i = 1$.
Transforming and simplifying the problem a bit produces the equivalent program,

$$\begin{aligned}
\text{Maximize wrt } \mu, \nu: \quad & \mu' y_i \\
\text{Subject to} \quad & \nu' x_i = 1 \\
& \mu' y_s - \nu' x_s \le 0, \quad s = 1,\ldots,N \\
& \mu \ge 0 \\
& \nu \ge 0.
\end{aligned}$$

An equivalent form of the problem is the envelopment form (hence the name),

$$\begin{aligned}
\text{Minimize wrt } \theta_i, \lambda: \quad & \theta_i \\
\text{Subject to} \quad & \textstyle\sum_s \lambda_s y_s - y_i \ge 0 \\
& \theta_i x_i - \textstyle\sum_s \lambda_s x_s \ge 0 \\
& \lambda_s \ge 0.
\end{aligned}$$

The value of $\theta_i$ is the input oriented technical efficiency score for the ith firm,

$$TE_{INPUT,i} = \theta_i.$$

It measures the extent to which the firm could reduce inputs to obtain the same output, relative to
other firms in the sample. Note that the program is solved for each firm in the sample, so an efficiency
score $\theta_i$ is generated for each firm. For some firms in the sample, the efficiency score will be 1.0.
This indicates firms deemed to be technically efficient. Otherwise, $\theta_i < 1$.
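In the special case of one input and one output, the constant returns envelopment program has a closed form solution: the best observed output/input ratio defines the frontier, and each firm's score is its own ratio divided by that best ratio. The sketch below illustrates only this special case, with hypothetical data and a hypothetical function name; the general problem with several inputs or outputs requires a linear programming solver.

```python
def crs_input_efficiency(x, y):
    """Input oriented CRS scores for a single input x and a single output y.

    With one input and one output, the envelopment program puts all weight on
    the firm with the largest y/x, so theta_i = (y_i / x_i) / max_s (y_s / x_s).
    """
    ratios = [yi / xi for xi, yi in zip(x, y)]
    best = max(ratios)
    return [r / best for r in ratios]

if __name__ == "__main__":
    x = [2.0, 4.0, 8.0, 5.0]   # hypothetical inputs
    y = [1.0, 3.0, 4.0, 2.0]   # hypothetical outputs
    print([round(t, 4) for t in crs_input_efficiency(x, y)])
```

At least one firm always receives a score of exactly 1.0, which illustrates the point that some firms are efficient by construction.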
The preceding formulation includes an implicit assumption of constant returns to scale
(CRS). The assumption is relaxed to variable returns to scale (VRS) by adding the restriction

$$\sum_s \lambda_s = 1$$

on the weights $\lambda_s$ of the envelopment form. Variable returns to scale is the standard
assumption in contemporary applications. This provides a
means by which the 'scale efficiency' of the firm can be measured. Let $\theta_i^C$ denote the technical
efficiency measure obtained assuming constant returns and $\theta_i^V$ be the variable returns to scale
counterpart. Then, the 'scale efficiency' may be measured by

$$SE_i = \theta_i^C / \theta_i^V.$$

This can be computed using the results of the two different programs after computation. A
'nonincreasing returns to scale' (NRS) version of the program can be obtained by changing the adding
up restriction to

$$\sum_s \lambda_s \le 1.$$
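The effect of the adding up restriction, and the resulting scale efficiency measure, can be seen in a small example. The sketch below again treats only the single input, single output case, where the variable returns program can be solved by enumeration: with one output constraint and one adding up constraint, an optimal solution uses at most two firms. The function names and data are hypothetical, and this is not a substitute for a general linear programming solver.

```python
def crs_scores(x, y):
    """CRS input oriented scores: own output/input ratio relative to the best ratio."""
    ratios = [yi / xi for xi, yi in zip(x, y)]
    best = max(ratios)
    return [r / best for r in ratios]

def vrs_scores(x, y):
    """VRS input oriented scores for one input and one output, by enumeration."""
    n = len(x)
    scores = []
    for i in range(n):
        # Single firms that produce at least y_i (firm i itself always qualifies).
        candidates = [x[s] for s in range(n) if y[s] >= y[i]]
        # Convex combinations of two firms that produce exactly y_i.
        for s in range(n):
            for t in range(n):
                if y[s] < y[i] < y[t]:
                    a = (y[t] - y[i]) / (y[t] - y[s])   # weight on firm s
                    candidates.append(a * x[s] + (1.0 - a) * x[t])
        scores.append(min(candidates) / x[i])
    return scores

def scale_efficiency(x, y):
    """SE_i = theta_i(CRS) / theta_i(VRS)."""
    return [c / v for c, v in zip(crs_scores(x, y), vrs_scores(x, y))]

if __name__ == "__main__":
    x = [2.0, 4.0, 8.0, 5.0]   # hypothetical inputs
    y = [1.0, 3.0, 4.0, 2.0]   # hypothetical outputs
    print([round(t, 4) for t in vrs_scores(x, y)])
    print([round(s, 4) for s in scale_efficiency(x, y)])
```

In this example the first and third firms are technically efficient under variable returns but not under constant returns, so their scale efficiency is less than one.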

An alternative view of the optimization process is to consider the extent to which outputs
could conceivably be increased using the same inputs, again relative to the standard of other firms
in the sample. The linear program which produces this solution is

$$\begin{aligned}
\text{Maximize wrt } \phi_i, \lambda: \quad & \phi_i \\
\text{Subject to} \quad & \textstyle\sum_s \lambda_s y_s - \phi_i y_i \ge 0 \\
& x_i - \textstyle\sum_s \lambda_s x_s \ge 0 \\
& \lambda_s \ge 0.
\end{aligned}$$

Once again, this assumes constant returns to scale. The variable returns to scale form is obtained by
adding the constraint $\sum_s \lambda_s = 1$. In this solution, $1 \le \phi_i < \infty$. The technical efficiency measure is

$$0 < TE_{OUTPUT,i} = 1/\phi_i \le 1.$$

As before, some firms in the sample (the same firms) will be found to be technically efficient by this
output oriented efficiency measure.

E65.2.2 Economic and Allocative Efficiency


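With input price information, \(\mathbf{w}_i\), (and assuming cost minimization) a cost minimization
program to find the optimal inputs given the input prices is

\[
\begin{aligned}
\text{Minimize wrt } \mathbf{x}_i^*, \boldsymbol{\lambda}: \quad & \mathbf{w}_i'\mathbf{x}_i^* \\
\text{Subject to} \quad & \sum_s \lambda_s \mathbf{y}_s - \mathbf{y}_i \ge \mathbf{0} \\
& \mathbf{x}_i^* - \sum_s \lambda_s \mathbf{x}_s \ge \mathbf{0} \\
& \lambda_s \ge 0.
\end{aligned}
\]

As before, to allow for variable returns to scale (VRS), we add \(\sum_s \lambda_s = 1\). In this program, \(\mathbf{x}_i^*\) gives the
cost minimizing vector of inputs for output \(\mathbf{y}_i\) and input prices \(\mathbf{w}_i\). The cost efficiency for the ith firm is
then the ratio

\[ 0 < CE_i = \mathbf{w}_i'\mathbf{x}_i^* / \mathbf{w}_i'\mathbf{x}_i \le 1. \]

Allocative efficiency may be measured using

\[ 0 < AE_i = CE_i / TE_{INPUT,i} \le 1. \]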
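
The two ratios can be illustrated with a few lines of Python. This is a sketch, not LIMDEP code: the function names, prices and input vectors are hypothetical, while the last call uses the economic and input oriented scores reported for observation 2 in the listing of Section E65.5.

```python
def cost_efficiency(w, x_star, x_obs):
    """CE_i = w'x* / w'x: minimum cost over observed cost."""
    min_cost = sum(wk * xk for wk, xk in zip(w, x_star))
    obs_cost = sum(wk * xk for wk, xk in zip(w, x_obs))
    return min_cost / obs_cost

def allocative_efficiency(ce, te_input):
    """AE_i = CE_i / TE_INPUT,i."""
    return ce / te_input

# Hypothetical prices and input vectors for one firm.
w = [2.0, 1.0]
x_obs = [4.0, 6.0]     # observed inputs, cost = 14
x_star = [3.0, 2.0]    # cost minimizing inputs, cost = 8
ce = cost_efficiency(w, x_star, x_obs)

# Scores reported for observation 2: economic .43644, input oriented .98446.
ae_obs2 = allocative_efficiency(0.43644, 0.98446)
print(round(ce, 5), round(ae_obs2, 5))
```

The second result agrees with the allocative efficiency of .44333 listed for that observation, which confirms that the listed value is economic efficiency divided by input oriented technical efficiency.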

E65.2.3 Solutions to the Optimization Problems


We note briefly the mathematical form of LIMDEP's solutions to the linear programs above.
The programming problem is defined in terms of

• Activity vector, \(\boldsymbol{\theta}\) = the solution vector
• Coefficient vector, \(\mathbf{c}\), so that the objective function is \(\mathbf{c}'\boldsymbol{\theta}\)
• Constraint matrix, \(\mathbf{A}\)
• Lower and upper limits for constraints, \(\mathbf{b}_L\) and \(\mathbf{b}_U\)
• Lower and upper limits for activities, \(\mathbf{d}_L\) and \(\mathbf{d}_U\)

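The linear program solution, in general, is then

\[
\begin{aligned}
\text{Optimize wrt } \boldsymbol{\theta}: \quad & \mathbf{c}'\boldsymbol{\theta} \\
\text{Subject to} \quad & \mathbf{b}_L \le \mathbf{A}\boldsymbol{\theta} \le \mathbf{b}_U \\
& \mathbf{d}_L \le \boldsymbol{\theta} \le \mathbf{d}_U.
\end{aligned}
\]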

We will define the components for the three programs defined earlier. Note, first, for convenience,
we define the data matrices, \(\mathbf{Y}\) and \(\mathbf{X}\). \(\mathbf{Y}\) is an \(N \times M\) matrix of outputs whose ith row is the vector of
outputs for firm i; \(\mathbf{X}\) is the \(N \times K\) matrix of inputs, defined likewise. For an individual firm, we define
\(\mathbf{y}_i\) to be the \(M \times 1\) column vector of outputs for firm i; thus, \(\mathbf{y}_i\) is the transpose of the ith row of \(\mathbf{Y}\).
Likewise, \(\mathbf{x}_i\) is the column vector of K inputs for firm i, the transpose of the ith row of \(\mathbf{X}\). Finally,
the column vector of weights is \(\boldsymbol{\lambda} = (\lambda_1,...,\lambda_N)'\). Thus,

\[ \sum_s \lambda_s \mathbf{y}_s = \mathbf{Y}'\boldsymbol{\lambda} \quad \text{and} \quad \sum_s \lambda_s \mathbf{x}_s = \mathbf{X}'\boldsymbol{\lambda}. \]

We note once again that the programs about to be defined are solved for each firm to obtain the
efficiency scores. (In fact, \(\boldsymbol{\lambda}\) should be indexed by firm, since it is recomputed each time. For
convenience, we have omitted this subscript.) We use the symbols \(\boldsymbol{\infty}_K\) and \(\boldsymbol{\infty}_M\) to indicate a vector
in which each element equals infinity (or sometimes minus infinity) and boldface \(\mathbf{1}\) or \(\mathbf{0}\) to indicate a
vector of ones or zeros, with a subscript to indicate the number of elements. Finally, our tableaus
include the VRS restriction, which may be suppressed by the user for the CRS form.
With all this in place, we can define the solutions to the optimization problems just by
identifying the components of the linear programming problems. These are as follows:

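Input Oriented Technical Efficiency

\[
\mathbf{d}_L = \begin{bmatrix} \mathbf{0}_N \\ 0 \end{bmatrix}, \quad
\mathbf{c} = \begin{bmatrix} \mathbf{0}_N \\ 1 \end{bmatrix}, \quad
\boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\lambda} \\ \theta_i \end{bmatrix}, \quad
\mathbf{d}_U = \begin{bmatrix} \mathbf{1}_N \\ 1 \end{bmatrix}
\]

\[
\mathbf{b}_L = \begin{bmatrix} -\boldsymbol{\infty}_K \\ \mathbf{y}_i \\ 1 \end{bmatrix}, \quad
\mathbf{A} = \begin{bmatrix} \mathbf{X}' & -\mathbf{x}_i \\ \mathbf{Y}' & \mathbf{0}_M \\ \mathbf{1}_N' & 0 \end{bmatrix}, \quad
\mathbf{b}_U = \begin{bmatrix} \mathbf{0}_K \\ \boldsymbol{\infty}_M \\ 1 \end{bmatrix}
\]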

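Output Oriented Technical Efficiency

\[
\mathbf{d}_L = \begin{bmatrix} \mathbf{0}_N \\ 1 \end{bmatrix}, \quad
\mathbf{c} = \begin{bmatrix} \mathbf{0}_N \\ 1 \end{bmatrix}, \quad
\boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\lambda} \\ \phi_i \end{bmatrix}, \quad
\mathbf{d}_U = \begin{bmatrix} \mathbf{1}_N \\ \infty \end{bmatrix}
\]

\[
\mathbf{b}_L = \begin{bmatrix} -\boldsymbol{\infty}_K \\ \mathbf{0}_M \\ 1 \end{bmatrix}, \quad
\mathbf{A} = \begin{bmatrix} \mathbf{X}' & \mathbf{0}_K \\ \mathbf{Y}' & -\mathbf{y}_i \\ \mathbf{1}_N' & 0 \end{bmatrix}, \quad
\mathbf{b}_U = \begin{bmatrix} \mathbf{x}_i \\ \boldsymbol{\infty}_M \\ 1 \end{bmatrix}
\]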

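Allocative Efficiency

\[
\mathbf{d}_L = \begin{bmatrix} \mathbf{0}_N \\ \mathbf{0}_K \end{bmatrix}, \quad
\mathbf{c} = \begin{bmatrix} \mathbf{0}_N \\ \mathbf{w}_i \end{bmatrix}, \quad
\boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\lambda} \\ \mathbf{x}_i^* \end{bmatrix}, \quad
\mathbf{d}_U = \begin{bmatrix} \mathbf{1}_N \\ \boldsymbol{\infty}_K \end{bmatrix}
\]

\[
\mathbf{b}_L = \begin{bmatrix} -\boldsymbol{\infty}_K \\ \mathbf{y}_i \\ 1 \end{bmatrix}, \quad
\mathbf{A} = \begin{bmatrix} \mathbf{X}' & -\mathbf{I}_K \\ \mathbf{Y}' & \mathbf{0}_{M \times K} \\ \mathbf{1}_N' & \mathbf{0}_K' \end{bmatrix}, \quad
\mathbf{b}_U = \begin{bmatrix} \mathbf{0}_K \\ \boldsymbol{\infty}_M \\ 1 \end{bmatrix}
\]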

One final note: DEA requires a fair amount of computation. The linear program involves
M+K+1 constraints and N+1 activities, and it is computed once for each of the N firms in the sample.
The amount of computation increases with the square of N. The particular computations are quite
fast, however.

E65.3 Confidence Limits for Efficiency Scores


A major shortcoming of the DEA approach to modeling production is the absence of a
statistical underpinning. One approach that has been used to try to produce some statistical
characterization of the estimator is to use bootstrapping to obtain confidence limits for the estimated
efficiency scores. A popular method is that of Simar and Wilson (1998). In brief, their method
amounts to the following: We have in hand for each firm a \(\theta_i\) estimated using the linear program
defined above. To carry out the bootstrap, we use the following experiment. The data on \(\mathbf{x}_m\) for all
firms, including this one, are proportionally scaled using a randomly generated (see their paper for
the algorithm) scale factor, \(\theta_i/\theta_{mb}\) for replication b. Then, \(\theta_{i,b}\) is recomputed using the revised data,
with the same method. The experiment is repeated B times. The 5th and 95th percentiles of the B
observations provide the confidence limits. The whole procedure is carried out for each firm. To obtain
bootstrapped confidence limits, use the command syntax described below, with the simple addition of the
request for the number of bootstrap replications.
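
The final step, reading the confidence limits off the B replicates, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the nearest-rank percentile rule is a choice made here (the rule LIMDEP uses is not stated), the function names are hypothetical, and the replicates are pseudo-random stand-ins rather than draws from the Simar and Wilson algorithm.

```python
import random

def percentile(sorted_vals, p):
    """Nearest-rank percentile (0 < p <= 100) of an ascending list."""
    n = len(sorted_vals)
    rank = max(1, -(-p * n // 100))       # ceil(p * n / 100), at least 1
    return sorted_vals[int(rank) - 1]

def bootstrap_limits(replicates, lower=5, upper=95):
    """5th and 95th percentiles of the B bootstrap replicates."""
    vals = sorted(replicates)
    return percentile(vals, lower), percentile(vals, upper)

# Stand-in replicates: B = 50 pseudo-random scores, NOT a Simar-Wilson draw.
random.seed(0)
reps = [random.uniform(0.6, 1.0) for _ in range(50)]
lo, hi = bootstrap_limits(reps)
print(lo <= hi)
```

With B = 50, the nearest-rank rule picks the 3rd and 48th ordered replicates, so a small B gives coarse limits.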
It should be noted that bootstrapping adds considerably to the amount of computation. In
general, the analysis requires the computation of 2N linear programs, two for each firm, to compute
the input and output oriented efficiency scores, plus one more per firm if input prices are supplied for the
allocative efficiency computation. Bootstrapping adds B×N more programs. Each program involves
N+1 activities and K+M+1 constraints, so overall, the amount of computation is considerable.
Nonetheless, each component of each linear program is very fast. In the example below, we have
123 observations. We requested 50 bootstrap replications, so we computed altogether 53 × 123 =
6,519 programs, each with 123 activities. The LP computations plus all the ancillary computations
and the display took altogether only 3.84 seconds on our desktop computer.
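
The program count in this paragraph can be reproduced with a short Python sketch; the function name and its arguments are hypothetical.

```python
def count_programs(n_firms, n_boot=0, with_prices=False):
    """Number of linear programs solved in one DEA run."""
    per_firm = 2 + (1 if with_prices else 0)   # input, output, (cost)
    return (per_firm + n_boot) * n_firms

# 123 firms, 50 bootstrap replications, input prices supplied: 53 * 123.
print(count_programs(123, n_boot=50, with_prices=True))
```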

E65.4 Command Structure


The command for the data envelopment analysis routine is simply

FRONTIER ; Lhs = output variables
         ; Rhs = input variables (will never include one)
         ; Alg = DEA $

The following is the full list of specifications for this command.


The default specification uses the variable returns to scale form. If you wish to use the
constant returns to scale form, add

; CRS

to the command. The nonincreasing returns to scale form (\(\sum_s \lambda_s \le 1\)) is requested with

; NRS

If you wish to analyze input price data, add

; Rh2 = input price variables

The program computes the DEA efficiency scores (input and output oriented, and economic
efficiency), and stores them as variables and as matrices. (See the description in the next section.) If
you would like to see a listing of the scores on your screen, in the output window, add

; List

to the command. The list of 'peer' firms for each observation (see Section E65.5.1 below) may be
requested by adding
; Peers

to the command. Finally, to obtain bootstrapped confidence limits for the estimator, add

; Nbt = the desired number of replications



E65.5 DEA Results


This estimator by default computes both the input and output oriented technical efficiency
scores. Descriptive statistics for the results are the visible output from the estimator. The following
shows an example, using the sample of 1,482 observations on Spanish dairy farms that was
examined in Section E64. This is a one output, four input process.

FRONTIER ; Lhs = milk
         ; Rhs = cows,land,labor,feed
         ; Alg = DEA $

+---------------------------------------------------------------------------+
| Data Envelopment Analysis |
| Output Variables: MILK |
| Input Variables: COWS LAND LABOR FEED |
| Underlying Technology assumes VARIABLE Returns to Scale. |
+---------------------------------------------------------------------------+
| Estimated Efficiencies: Mean Std.Deviation Minimum Maximum |
| Technical Efficiency ======= ============= ======= ======= |
| Input Oriented .8301 .1416 .4823 1.0000 |
| Output Oriented .7388 .1268 .3875 1.0000 |
| Sample Size: 1482 Observations. 1482 Complete observations |
| Efficiencies saved as variables DEAEFF_O, DEAEFF_I and DEAEFF_E |
| Efficiencies saved as matrices DEA_EFFO, DEA_EFFI and DEA_EFFE |
| Incomplete observations are filled with zeros for efficiency values. |
+---------------------------------------------------------------------------+

As noted, the computed efficiency scores are saved in two places. In the data area, they are saved as the
variables deaeff_i and deaeff_o, plus deaeff_e if you provide input prices for the economic efficiency analysis.
The same results are saved as matrices, dea_effo, dea_effi and dea_effe. Note that in both cases,
the estimator bypasses missing and bad (nonpositive) data. If any of the variables used in the
analysis are missing, the observation is assigned an efficiency score of 0.0. The matrices will have
row dimension equal to the original sample size, before the bypass of missing values.
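
Because valid efficiency scores are strictly positive, a stored value of 0.0 can only be a placeholder for an incomplete observation. If the saved scores are exported for further work, the placeholders should be dropped before summary statistics are computed. The Python sketch below illustrates this; the function name and the sample values are hypothetical.

```python
from statistics import mean

def complete_scores(scores):
    """Drop the 0.0 placeholders assigned to incomplete observations."""
    return [s for s in scores if s > 0.0]

# Hypothetical copy of deaeff_i with two incomplete observations.
deaeff_i = [1.0, 0.0, 0.8, 0.6, 0.0]
valid = complete_scores(deaeff_i)
print(len(valid), mean(valid), mean(deaeff_i))
```

Averaging over the unfiltered values understates mean efficiency, since each placeholder enters as a score of zero.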
The example below includes a listing of the efficiency scores. The observation identifier
shows I = the sequence number of the observation used in the analysis. The R = value shows,
instead, the actual location of the observation in the raw data set. I will not equal R if you have used
a subset of the data (e.g., with SAMPLE or REJECT), or if the program has bypassed missing data
– the listing will only show the complete observations. If you have included observation labels, e.g.,
firm names, in your data set, these observation and row identifiers will be replaced with the
observation names for your data set.
For a second example, the following analyzes the Christensen and Greene (1976) electricity
generation data. For these data, we have the input prices, so we do the full analysis.

FRONTIER ; Alg = DEA ; List ; Nbt = 50
         ; Lhs = output
         ; Rhs = labor,capital,fuel
         ; Rh2 = lprice,cprice,fprice $

+---------------------------------------------------------------------------+
| Data Envelopment Analysis |
| Output Variables: OUTPUT |
| Input Variables: LABOR CAPITAL FUEL |
| Price Variables: LPRICE CPRICE FPRICE |
| Underlying Technology assumes VARIABLE Returns to Scale. |
+---------------------------------------------------------------------------+
| Estimated Efficiencies: Mean Std.Deviation Minimum Maximum |
| Technical Efficiency ======= ============= ======= ======= |
| Input Oriented .7692 .1390 .3464 1.0000 |
| Output Oriented .7657 .1467 .2960 1.0000 |
| Economic Efficiency .4331 .1965 .1411 1.0000 |
| Allocative Effic. .5473 .1754 .1796 1.0000 |
| Sample Size: 123 Observations. 123 Complete observations |
| Efficiencies saved as variables DEAEFF_O, DEAEFF_I and DEAEFF_E |
| Efficiencies saved as matrices DEA_EFFO, DEA_EFFI and DEA_EFFE |
| Incomplete observations are filled with zeros for efficiency values. |
| Compute allocative efficiency as technical divided by economic efficiency |
+---------------------------------------------------------------------------+

Estimated Efficiency Values for Individual Decision Making Units


(Results are listed only for complete observations)
===============================================================================
Observation | Input Oriented| Output Oriented| Economic | Allocative
Sample Data | Rank Value| Rank Value| Rank Value| Rank Value
================+===============+================+===============+=============
I= 1 R= 1| 1 1.00000| 1 1.00000| 1 1.00000| 1 1.00000
I= 2 R= 2| 13 .98446| 16 .92501| 53 .43644| 87 .44333
I= 3 R= 3| 16 .96243| 28 .88393| 119 .17287| 123 .17962
I= 4 R= 4| 46 .79469| 83 .73593| 96 .29127| 103 .36652
I= 5 R= 5| 115 .57426| 118 .44224| 47 .44703| 15 .77845
I= 6 R= 6| 120 .44307| 122 .35608| 103 .26194| 43 .59120
I= 7 R= 7| 80 .73356| 100 .64826| 101 .26996| 102 .36801
I= 8 R= 8| 123 .34637| 123 .29601| 121 .15388| 85 .44425
I= 9 R= 9| 106 .62517| 110 .57829| 109 .21689| 111 .34692
I= 10 R= 10| 103 .63852| 107 .59578| 66 .38812| 39 .60783
(Remaining observations are omitted.)
----------------------------------------------------------------------------
Results of Bootstrap analysis of technical efficiency. 50 replications
----------------------------------------------------------------------------
Technical Estimated Corrected Standard Confid. Limits
Observation_____ Efficiency Bias Tech.Eff. Deviation Lower Upper
I= 1 R= 1 1.0000 .0000 1.0000 .0000 1.0000 1.0000
I= 2 R= 2 .9845 -.0634 1.0479 .1008 .6583 1.0000
I= 3 R= 3 .9624 -.0898 1.0522 .1391 .5023 1.0000
I= 4 R= 4 .7947 .1091 .6856 .0953 .7222 1.0000
I= 5 R= 5 .5743 .3006 .2737 .1215 .6007 1.0000
I= 6 R= 6 .4431 .4318 .0113 .1246 .5785 1.0000
I= 7 R= 7 .7336 .1086 .6250 .1131 .6609 1.0000
I= 8 R= 8 .3464 .5317 -.1853 .0979 .6977 1.0000
I= 9 R= 9 .6252 .2154 .4097 .1265 .5131 1.0000
I= 10 R= 10 .6385 .2267 .4118 .1062 .6645 1.0000
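
The 'Corrected Tech.Eff.' column in this listing is consistent with subtracting the estimated bias from the technical efficiency estimate. The Python sketch below checks that reading against three rows of the table; it is an interpretation of the displayed output, not a statement of how LIMDEP computes the column, and the function name is hypothetical.

```python
def corrected_efficiency(te, bias):
    """Bias corrected score: estimate minus estimated bias."""
    return te - bias

# (TE, bias, reported corrected value) for observations 2, 4 and 8 above.
rows = [(0.9845, -0.0634, 1.0479),
        (0.7947,  0.1091, 0.6856),
        (0.3464,  0.5317, -0.1853)]
for te, bias, reported in rows:
    print(round(corrected_efficiency(te, bias), 4), reported)
```

As observations 2 and 8 show, the corrected value is not constrained to the unit interval.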

It is always interesting to compare the DEA results with those obtained using the stochastic
frontier model. The following fits a translog stochastic frontier production function for the
Christensen and Greene data, computes the technical efficiencies, and plots them against the DEA
efficiency scores. As has been widely documented, the results are not so close to each other as one
might hope.

FRONTIER ; Lhs = logq
         ; Rhs = one,logcap,loglabor,logfuel,
                 loglsq,logksq,logfsq,logklogl,logklogf,logllogf
         ; Techeff = tesf $
PLOT     ; Lhs = tesf ; Rhs = deaeff_i
         ; Grid ; Title = DEA Efficiencies vs. Stochastic Frontier JLMS $

Figure E65.1 Comparison of SFA and DEA Efficiency Estimates

E65.5.1 Analysis of Peers


Part of the solution for the technical efficiency is the set of activity multipliers, \(\lambda_{i,m}\), for the ith
firm. The vector of N values \(\lambda_{i,m}\) gives the weights that produce the point on the efficient frontier
for this firm. The firms with nonzero values of \(\lambda_{i,m}\) – typically only one or a few of them –
define the 'peers' for firm i. The listing of the peer firms can be requested by adding ; Peers to the
command. The first few observations for the sample above are shown below.
===============================================================================
Peers - By Firm
===============================================================================
Firm Orient. TechEff Peers
--------------------- ------- ------- --------------------------------------
1 Inputs 1.00000 3 14 101
Outputs 1.00000 1 14 101
2 Inputs .98446 4 71
Outputs .92501 1 71
3 Inputs .96243 3 71
Outputs .88393 1 71
4 Inputs .79469 4 14
Outputs .73593 1 14
5 Inputs .57426 4 71 118
Outputs .44224 1 71
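
Identifying the peers from a solved weight vector amounts to listing the firms with nonzero multipliers. A minimal Python sketch follows; the function name, the numerical tolerance and the weights are assumptions made for illustration.

```python
def peers(lambdas, tol=1e-8):
    """1-based indices of firms with nonzero weight in firm i's solution."""
    return [s + 1 for s, lam in enumerate(lambdas) if abs(lam) > tol]

# Hypothetical weight vector for one firm in a five firm sample.
lam_i = [0.0, 0.35, 0.0, 0.65, 0.0]
print(peers(lam_i))
```

A tolerance is used instead of an exact comparison with zero because an LP solver may return tiny nonzero values for weights that are zero in the exact solution.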

E65.5.2 Application
The following uses all the features of the routine save for the Malmquist TFP computation
and the allocative efficiency routine. The sample data are in an Excel spreadsheet:

IMPORT   ; File = … testdea.csv $
FRONTIER ; Lhs = cameras,video,warranty
         ; Rhs = floor,staff
         ; Alg = DEA ; CRS
         ; Peers
         ; Nbt = 50 $

Figure E65.2 Sample Data for Data Envelopment Analysis

+---------------------------------------------------------------------------+
| Data Envelopment Analysis |
| Output Variables: CAMERAS VIDEO WARRANTY |
| Input Variables: FLOOR STAFF |
| Underlying Technology assumes CONSTANT Returns to Scale. |
+---------------------------------------------------------------------------+
| Estimated Efficiencies: Mean Std.Deviation Minimum Maximum |
| Technical Efficiency ======= ============= ======= ======= |
| Input Oriented .9132 .1270 .6387 1.0000 |
| Output Oriented .9132 .1270 .6387 1.0000 |
| Sample Size: 11 Observations. 11 Complete observations |
| Efficiencies saved as variables DEAEFF_O, DEAEFF_I and DEAEFF_E |
| Efficiencies saved as matrices DEA_EFFO, DEA_EFFI and DEA_EFFE |
| Incomplete observations are filled with zeros for efficiency values. |
+---------------------------------------------------------------------------+

Estimated Efficiency Values for Individual Decision Making Units


===============================================================================
Observation | Input Oriented| Output Oriented| Economic | Allocative
Sample Data | Rank Value| Rank Value| Rank Value| Rank Value
================+===============+================+===============+=============
Bury | 9 .79126| 9 .79126| 0 .00000| 0 .00000
London | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000
Glasgow | 7 .95227| 7 .95227| 0 .00000| 0 .00000
Bath | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000
Chippenham | 11 .63869| 11 .63869| 0 .00000| 0 .00000
Liverpool | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000
Tunbridge | 8 .90635| 8 .90635| 0 .00000| 0 .00000
Leicester | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000
Malmesbury | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000
Kendal | 10 .75714| 10 .75714| 0 .00000| 0 .00000
Bristol | 1 1.00000| 1 1.00000| 0 .00000| 0 .00000
===============================================================================
Peers - By Firm
Firm Orient. TechEff Peers
--------------------- ------- ------- --------------------------------------
1 Bury Inputs .79126 6 11
Outputs .79126 6 11
2 London Inputs 1.00000 2
Outputs 1.00000 2
3 Glasgow Inputs .95227 2 6 11
Outputs .95227 2 6 11
4 Bath Inputs 1.00000 2 4 8 9
Outputs 1.00000 2 4
5 Chippenham Inputs .63869 6 11
Outputs .63869 6 11
6 Liverpool Inputs 1.00000 6 11
Outputs 1.00000 6
7 Tunbridge Inputs .90635 4 8 9
Outputs .90635 4 8 9
8 Leicester Inputs 1.00000 2 8 9
Outputs 1.00000 2 8
9 Malmesbury Inputs 1.00000 4 6 9
Outputs 1.00000 2 6 9
10 Kendal Inputs .75714 2 4
Outputs .75714 2 4
11 Bristol Inputs 1.00000 2 11
Outputs 1.00000 2 11
===============================================================================
----------------------------------------------------------------------------
Results of Bootstrap analysis of technical efficiency. 50 replications
----------------------------------------------------------------------------
Technical Estimated Corrected Standard Confid. Limits
Observation_____ Efficiency Bias Tech.Eff. Deviation Lower Upper
Bury .7913 .0404 .7509 .0374 .7931 .9074
London 1.0000 .0000 1.0000 .0000 1.0000 1.0000
Glasgow .9523 .0353 .9170 .0143 .9570 1.0000
Bath 1.0000 .0000 1.0000 .0000 1.0000 1.0000
Chippenham .6387 .0392 .5995 .0309 .6411 .7293
Liverpool 1.0000 .0000 1.0000 .0000 1.0000 1.0000
Tunbridge .9064 .0630 .8433 .0333 .9138 1.0000
Leicester 1.0000 .0000 1.0000 .0000 1.0000 1.0000
Malmesbury 1.0000 .0000 1.0000 .0000 1.0000 1.0000
Kendal .7571 .0389 .7183 .0551 .7614 .9307
Bristol 1.0000 .0000 1.0000 .0000 1.0000 1.0000
----------------------------------------------------------------------------

E65.6 Comparing Efficiency Values and Rankings – SFA vs. DEA


In many settings, the efficiency ratings themselves are less interesting than the ranks of the
observations. The WHO study used in numerous examples throughout this chapter is a case in point:
the objective of the efficiency analysis was to rank the countries in terms of their measured
efficiency. A perennial question in the efficiency analysis literature is whether one obtains the
same qualitative results with the two methodologies. We return to the WHO data to provide an
illustration.
The data used are the country means of the output, dale, and two inputs, health expenditure, hexp,
and education, educ. After the raw data are input, we use the following:

SAMPLE ; All $
REJECT ; Small > 0 $
CREATE ; dalebar = Group Mean(dale, Str = country) $
CREATE ; hexpbar = Group Mean(hexp, Str = country) $
CREATE ; educbar = Group Mean(educ, Str = country) $
REJECT ; year # 1997 $
CREATE ; logdbar = Log(dalebar) $
CREATE ; loghbar = Log(hexpbar) $
CREATE ; logebar = Log(educbar) $
FRONTIER ; Lhs = logdbar ; Rhs = one,loghbar,logebar ; Techeff = effsfa $
FRONTIER ; Lhs = dalebar ; Rhs = hexpbar,educbar ; Alg = DEA $
DSTAT ; Rhs = effsfa,deaeff_i,deaeff_o ; Output = 2 $
PLOT ; Lhs = effsfa ; Rhs = deaeff_i ; Grid
; Title = SFA Efficiencies vs. DEA Input Efficiencies $
PLOT ; Lhs = effsfa ; Rhs = deaeff_o ; Limits=.4,1.1 ; Grid
; Title = SFA Efficiencies vs. DEA Output Efficiencies $
CREATE ; sfarank = Rnk(effsfa) $
CREATE ; dearanki = Rnk(deaeff_i) $
CREATE ; dearanko = Rnk(deaeff_o) $
CALC ; List ; Rkc(sfarank,dearanki)
; Rkc(sfarank,dearanko)
; Rkc(dearanki,dearanko) $
PLOT ; Lhs = sfarank ; Rhs = dearanki
; Endpoints = 0,200 ; Limits = 0,200 ; Grid
; Title = Ranks of SFA Efficiencies vs. DEA Input Efficiencies $
PLOT ; Lhs = sfarank ; Rhs = dearanko
; Endpoints = 0,200 ; Limits = 0,200 ; Grid
; Title = Ranks of SFA Efficiencies vs. DEA Output Efficiencies $
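
For readers who want to reproduce the rank comparison outside LIMDEP, the following Python sketch computes ranks and a Spearman rank correlation. It assumes that Rkc reports a Spearman-type rank correlation; LIMDEP's conventions for ties and for the direction of the ranking may differ. All names and data are hypothetical.

```python
def ranks(values):
    """Ranks 1..n in ascending order; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    r = [0.0] * len(values)
    j = 0
    while j < len(order):
        k = j
        while k + 1 < len(order) and values[order[k + 1]] == values[order[j]]:
            k += 1
        avg = (j + k) / 2.0 + 1.0          # average of positions j..k, 1-based
        for t in range(j, k + 1):
            r[order[t]] = avg
        j = k + 1
    return r

def rank_correlation(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    return cov / (va * vb) ** 0.5

# Hypothetical efficiency scores for five units from two methods.
sfa = [0.91, 0.85, 0.97, 0.80, 0.88]
dea = [0.70, 0.55, 1.00, 0.60, 0.65]
print(round(rank_correlation(sfa, dea), 4))
```

Ties deserve attention with DEA scores, since every efficient unit shares the score 1.0. Average ranks are one common convention for handling them.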

Normal exit: 11 iterations. Status=0, F= -133.3834


-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable LOGDBAR
Log likelihood function 133.38343
Estimation based on N = 191, K = 5
Inf.Cr.AIC = -256.8 AIC/N = -1.344
Variances: Sigma-squared(v)= .00140
Sigma-squared(u)= .04405
Sigma(v) = .03744
Sigma(u) = .20989
Sigma = Sqr[(s^2(u)+s^2(v)]= .21320
Gamma = sigma(u)^2/sigma^2 = .96915
Var[u]/{Var[u]+Var[v]} = .91947
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u): 1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean: 0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0 114.81039
Chi-sq=2*[LogL(SF)-LogL(LS)] = 37.146
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
| Standard Prob. 95% Confidence
LOGDBAR| Coefficient Error z |z|>Z* Interval
--------+--------------------------------------------------------------------
|Deterministic Component of Stochastic Frontier Model
Constant| 3.57889*** .04980 71.87 .0000 3.48129 3.67649
LOGHBAR| .06480*** .00824 7.86 .0000 .04864 .08096
LOGEBAR| .15292*** .01852 8.26 .0000 .11662 .18923
|Variance parameters for compound error
Lambda| 5.60534*** 1.46657 3.82 .0001 2.73091 8.47977
Sigma| .21320*** .00101 211.97 .0000 .21123 .21517
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

+---------------------------------------------------------------------------+
| Data Envelopment Analysis |
| Output Variables: DALEBAR |
| Input Variables: HEXPBAR EDUCBAR |
| Underlying Technology assumes VARIABLE Returns to Scale. |
+---------------------------------------------------------------------------+
| Estimated Efficiencies: Mean Std.Deviation Minimum Maximum |
| Technical Efficiency ======= ============= ======= ======= |
| Input Oriented .6138 .2089 .2059 1.0000 |
| Output Oriented .8794 .1124 .5061 1.0000 |
| Sample Size: 191 Observations. 191 Complete observations |
| Efficiencies saved as variables DEAEFF_O, DEAEFF_I and DEAEFF_E |
| Efficiencies saved as matrices DEA_EFFO, DEA_EFFI and DEA_EFFE |
| Incomplete observations are filled with zeros for efficiency values. |
+---------------------------------------------------------------------------+

DSTAT ; Rhs = effsfa,deaeff_i,deaeff_o ; Output = 2 $



Descriptive Statistics
--------+---------------------------------------------------------------------
Variable| Mean Std.Dev. Minimum Maximum Cases Missing
--------+---------------------------------------------------------------------
EFFSFA| .882053 .059219 .801579 .982272 191 0
DEAEFF_I| .613836 .208905 .205870 1.0 191 0
DEAEFF_O| .879363 .112447 .506133 1.0 191 0
--------+---------------------------------------------------------------------

--------+--------------------------
Cor.Mat.| EFFSFA DEAEFF_I DEAEFF_O
--------+--------------------------
EFFSFA| 1.00000 .70610 .75911
DEAEFF_I| .70610 1.00000 .72559
DEAEFF_O| .75911 .72559 1.00000

Figure E65.3 Plot of SFA Efficiency Values vs. DEA Values



Figure E65.4 Plot of Ranks of SFA Efficiency Scores vs. Ranks of DEA Scores

E65.7 Malmquist Index of Total Factor Productivity


Once again, the user is referred to the relevant literature, such as the numerous papers by Fare
and Grosskopf, for background details. Fare's 1994 output based Malmquist productivity change may be
written

\[
M_{i,O}(t,t+1) = \frac{TE_i(t+1 \mid t) \times TE_i(t+1 \mid t+1)}{TE_i(t \mid t) \times TE_i(t \mid t+1)}
\]

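where \(TE_i(r \mid s)\) indicates the earlier defined output oriented technical efficiency index for firm i, using
inputs \(\mathbf{x}_{i,r}\) and producing outputs \(\mathbf{y}_{i,r}\) relative to production (and input usage) for firms based in period s.
This index is computed using the following program:

\[
\mathbf{d}_L = \begin{bmatrix} \mathbf{0}_N \\ 0 \end{bmatrix}, \quad
\mathbf{c} = \begin{bmatrix} \mathbf{0}_N \\ 1 \end{bmatrix}, \quad
\boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\lambda} \\ \phi_{ir} \end{bmatrix}, \quad
\mathbf{d}_U = \begin{bmatrix} \mathbf{1}_N \\ \infty \end{bmatrix}
\]

\[
\mathbf{b}_L = \begin{bmatrix} -\boldsymbol{\infty}_K \\ \mathbf{0}_M \end{bmatrix}, \quad
\mathbf{A} = \begin{bmatrix} \mathbf{X}_s' & \mathbf{0}_K \\ \mathbf{Y}_s' & -\mathbf{y}_{i,r} \end{bmatrix}, \quad
\mathbf{b}_U = \begin{bmatrix} \mathbf{x}_{i,r} \\ \boldsymbol{\infty}_M \end{bmatrix}
\]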

This uses the constant returns to scale form. Also, since the period r output and input vectors for firm i
will not appear in \(\mathbf{Y}_s\) and \(\mathbf{X}_s\) when r does not equal s, \(\phi_{ir}\) need not be larger than one. Note that this
requires solution of four linear programs for each firm in each period, so the total number of programs to
solve will be 4NT. Each is quite fast, so overall, the computations do not take long. In the sample of
247 firms and six periods, the nearly 6,000 programs, each involving 248 activities and six constraints,
took about 10 seconds.
These computations are carried out for each firm in each period save the last one, and produce an
N×T matrix of TFP values, one row for each firm, one column for each period. The TFP value for the last
period is recorded as 1.0, though this is just a space filler.
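
Once the four efficiency scores are available, the index itself is a simple ratio. The Python sketch below evaluates the expression as printed above and reproduces the 4NT program count; the function names and the sample scores are hypothetical. The take_sqrt switch is an addition of this sketch: the index in the literature is commonly stated as a geometric mean, that is, the square root of this ratio, whereas the expression above shows no root.

```python
def malmquist_index(te_t1_t, te_t1_t1, te_t_t, te_t_t1, take_sqrt=False):
    """M_i,O(t,t+1) from the four output oriented scores TE_i(r|s).

    te_t1_t  = TE_i(t+1 | t)      te_t1_t1 = TE_i(t+1 | t+1)
    te_t_t   = TE_i(t | t)        te_t_t1  = TE_i(t | t+1)
    """
    ratio = (te_t1_t * te_t1_t1) / (te_t_t * te_t_t1)
    return ratio ** 0.5 if take_sqrt else ratio

def malmquist_program_count(n_firms, n_periods):
    """Four linear programs per firm per period: 4NT."""
    return 4 * n_firms * n_periods

# Hypothetical scores for one firm in periods t and t+1.
m = malmquist_index(1.10, 0.90, 0.85, 1.05)
print(round(m, 4), malmquist_program_count(247, 6))
```

A value above one indicates productivity growth between the two periods, and a value below one indicates decline.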
To compute the Malmquist TFP indices, you will require a panel of data, at least two periods, for
each of N firms. Unlike other panel data routines in LIMDEP, this computation always requires a
balanced panel. Every firm must be observed in the same T periods. Also, this routine has no procedures
for avoiding missing or invalid data such as zero values for inputs or outputs. The balanced panel must be
'clean' before computation begins. To request the computations, just add
; Pds = t, the fixed number of periods.
Nothing else need be changed. There is no bootstrap feature (; Nbt = 0); the computations assume
constant returns to scale (; CRS is the default and cannot be changed) and no allocative efficiency (; Rh2
is ignored).

Malmquist TFP Index Application

To illustrate the Malmquist computations, we reexamine the sample of 247 Spanish dairy farms
observed for six years. The output is milk production. Inputs are cows, land, labor and feed.

FRONTIER ; Lhs = milk
         ; Rhs = cows,land,labor,feed
         ; Alg = DEA ; Pds = 6
         ; List $

The following results are displayed. In addition, a matrix containing the full table, named malmqist, is
created.

==============================================================================
Malmquist TFP Index for Productivity Change
Panel contained 247 firms each observed in 6 periods
Full Results saved as matrix MALMQIST
==============================================================================
Average results across firms, by period:
==============================================================================
Period: 1 2 3 4 5
TFP 1.0476 1.0233 1.0247 1.0298 1.0349
==============================================================================
Individual calculations by firm
(Only 8 periods can be displayed. TFP for the final period is not computed.)
==============================================================================
Observation 1 2 3 4 5 6 7 8
Firm = 1 1.1301 1.1002 .9736 1.0291 1.0901 1.
Firm = 2 1.0528 1.0343 1.0212 1.0109 1.0416 1.
Firm = 3 1.0525 1.0383 .9477 1.0465 1.0395 1.
Firm = 4 1.1418 1.0129 1.0079 .9829 1.0476 1.
Firm = 5 1.1192 1.0240 1.0082 1.0245 1.0641 1.
Firm = 6 .9871 1.0073 .9785 1.0322 1.0464 1.
Firm = 7 .9851 1.1484 1.1599 .8054 1.1110 1.
Firm = 8 1.0746 .9796 .9636 1.0671 .9753 1.
Firm = 9 .8977 1.1496 .9818 1.0500 .9867 1.
Firm = 10 1.0105 1.1507 .9751 1.0055 1.0469 1.
Firm = 11 1.1276 .9867 .9636 1.0826 .9873 1.
Firm = 12 1.0310 1.1020 .9822 1.0438 .9914 1.
Firm = 13 1.0549 1.1263 .9221 1.0723 1.1945 1.
Firm = 14 .9408 1.0740 .9938 .9739 1.0336 1.
Firm = 15 .8952 .7156 1.5056 .8614 .9204 1.
(Rows 16 – 247 omitted).
