Theory and Statistical Properties of Quantile Data Envelopment Analysis
Decision Support
Article history: Received 3 May 2019; Accepted 28 March 2020; Available online 12 April 2020.

Keywords: Benchmarking; Data envelopment analysis; Partial moments; Quantile DEA; Robust DEA estimator

Abstract: This research proposes Quantile Data Envelopment Analysis (qDEA) as a procedure that accounts for the sensitivity of Data Envelopment Analysis (DEA) to data or firm outliers when using DEA to estimate comparative efficiency or benchmarking performance metrics. The qDEA methodology endogenously identifies the distance to a qDEA-α hyperplane while allowing up to proportion q = 1 − α of the data observations to lie external to the qDEA-α hyperplane. The ability of qDEA to provide more conventional quantile-based benchmarking information is discussed. The statistical properties of the qDEA estimator are examined utilizing nCm subsampling and Monte Carlo procedures. Monte Carlo simulations indicate that qDEA distance estimates share the desirable root-n convergence and large sample normality properties of the robust Free Disposal Hull (FDH) based order-m and order-α estimators.

© 2020 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.ejor.2020.03.077
650 J. Atwood and S. Shaik / European Journal of Operational Research 286 (2020) 649–661
to procedures that endogenously identify directions to points that lie on the efficient DEA frontier but are closer or closest to a given DMU's data point (Aparicio, Cordero, & Pastor, 2017; Aparicio & Pastor, 2014; Aparicio, Ruiz, & Sirvent, 2007; Ramón, Ruiz, & Sirvent, 2018; Ruiz & Sirvent, 2016). Even when a closest frontier point generated by efficient points is identified, concerns over the reachability of such points have led some researchers to suggest procedures where sets of efficient points are iteratively deleted and the closest projected points generated with the remaining sets of data (Ramón et al., 2018). Ramón, Ruiz, and Sirvent cite several papers that iteratively eliminate efficient points, including a paper by Barr, Durchholz, and Seiford (2000) where the process is termed "peeling the onion".

The sensitivity of comparative performance metrics to non-representative observations or outliers is not unique to DEA. Any comparative study will be sensitive to outliers if a given DMU's performance is contrasted to the performance of the highest one or two achievers within a set of realized outcomes. Recognizing this potential, it is common practice in conventional benchmarking applications (Boxwell, 1994; Risk Management Association, 2019) to contrast a given DMU's performance metrics to metrics constructed from a quantile of the population, such as the top five, ten, 25, or even 50 percent of DMUs in the industry, rather than to the top one or two historically performing DMUs.

The use of quantile benchmarking is desirable in that it decreases the sensitivity of the performance comparisons to outliers and also allows a given firm to compare itself to different peers in situations where the performance levels of the top DMUs may not be viewed as replicable or reachable due to unobserved firm-specific constraints. Quantile-based comparisons provide useful information to a given DMU in that a benchmark may be viewed as more obtainable or reachable if 10% or more of a group of DMUs have achieved or exceeded the benchmark than if only one or two exceptional DMUs have attained the standard.

While quantile-based benchmarking is a common practice for more conventional performance metrics, procedures for implementing quantile-based benchmarking with DEA are more difficult, especially when the analyst wishes to simultaneously identify sets of more than one potentially outlying firm. Quantile-regression, order-m, and order-α quantile-based estimators that endogenously identify and address sets of outlying data have been developed with stochastic frontier or stochastic DEA models (Behr, 2010; Jradi & Ruggiero, 2019; Wang, Wang, Dang, & Ge, 2014) or Free Disposal Hull (FDH) assumptions (Cazals et al., 2002; Daouia & Simar, 2005; 2007; Daraio & Simar, 2007; Wheelock & Wilson, 2008), but quantile estimators have not been commonly available with linear programming based DEA models that accommodate multiple inputs and multiple outputs.

The traditional DEA model can be interpreted as a procedure for estimating the distance, in a given direction, from a given DMU's data point to a projected point lying on an endogenously identified hyperplane while requiring that all reference set data points defining the possible input-output technology set lie on or within the same half-space of the hyperplane. Atwood and Shaik (2018) recently introduced the linear programming quantile DEA (qDEA) procedure in which quantile DEA distances are obtained by computing the distance, in a given direction, from a given DMU's data point to a point on an endogenously identified hyperplane termed the qDEA-α hyperplane, with points lying in the qDEA-α half-space being internal to the qDEA-α hyperplane.

The qDEA approach is similar in concept to the FDH based order-α approach in that the qDEA process allows the user to obtain more robust DEA distance metrics by allowing endogenously identified sets of reference observations to lie external to the qDEA-α hyperplane. Additionally, the conventional DEA process can be viewed as a special case of qDEA in which q = 0 or α = 1 and no observed data points are allowed to be external to the qDEA-1 hyperplane. By endogenously identifying and allowing sets of external points to be qDEA-α superefficient, qDEA directly addresses the problem of identifying sets or groups of outlier firms as discussed by Bogetoft and Otto (page 149).

Extending the work of Atwood and Shaik (2018), we discuss the theory, development, usefulness, and limitations of qDEA in greater depth as well as examine the qDEA estimates' desirable statistical properties of apparent root-n convergence and asymptotic normality. In the following sections we specifically: (1) motivate the qDEA problem by discussing a numerical example borrowed from Cooper, Seiford, and Tone (2006) and modified to include a "data outlier", (2) present two partial moment stochastic inequalities and their application in constructing linear programming (LP) models that endogenously identify sets of LP constraints to be violated, (3) incorporate the partial moment inequality into DEA to develop the qDEA model and demonstrate procedures to obtain qDEA solutions for the example problem, (4) present procedures and the results of a set of simulations numerically examining the statistical properties of the qDEA estimator, and (5) briefly present an example demonstrating how qDEA benchmarking might be used in a policy analysis or regulatory application. We conclude with a set of caveats and suggestions with respect to utilizing quantile DEA in an applied/regulatory benchmarking setting and potential future research.

1. Motivating qDEA: a numerical example

We demonstrate the implementation of the qDEA procedure using a modification of Cooper, Seiford, and Tone's (2007, pages 26-27) Constant-Returns-to-Scale (CRS) single-input (employees) single-output (sales) eight DMU example. Given that the traditional DEA input and output orientation results can be obtained as special cases of the Directional DEA (DDEA) (Chambers, Chung, & Färe, 1996) model, all mathematical results and examples in the following discussion will utilize DDEA. Since qDEA identifies "q-super efficient points", we first review primal and dual DDEA models that accommodate super-efficiency.

A primal DDEA model that accommodates potential super-efficiency for DMU-0 with V outputs, U inputs, n reference DMU's, and constant returns to scale (CRS) can be mathematically described as:

Max φ
s.t.
(a) Σ_{j=1}^{n} y_v^j z_j − d_v^0 φ ≥ y_v^0   for v = 1, …, V
(b) Σ_{j=1}^{n} x_u^j z_j + d_u^0 φ ≤ x_u^0   for u = 1, …, U        (1)
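As an illustrative sketch (our own, not the authors' R code), the primal DDEA system (1) can be solved with an off-the-shelf LP solver. The single-input, single-output data below are hypothetical and are not the paper's modified CST values:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical single-input (x), single-output (y) data for n reference DMUs.
x = np.array([2., 3., 3., 4., 5., 5., 6., 8.])
y = np.array([1., 3., 2., 3., 4., 2., 3., 5.])

def ddea_crs(x0, y0, dx, dy):
    """Primal DDEA system (1): max phi s.t. sum(y_j z_j) - dy*phi >= y0,
    sum(x_j z_j) + dx*phi <= x0, z >= 0, phi free."""
    n = len(x)
    c = np.zeros(n + 1); c[-1] = -1.0               # linprog minimizes, so min -phi
    A_ub = np.zeros((2, n + 1))
    A_ub[0, :n] = -y; A_ub[0, -1] = dy              # (a) rewritten as -sum(y z) + dy*phi <= -y0
    A_ub[1, :n] = x;  A_ub[1, -1] = dx              # (b): sum(x z) + dx*phi <= x0
    res = linprog(c, A_ub, [-y0, x0],
                  bounds=[(0, None)] * n + [(None, None)], method="highs")
    return res.x[-1]

# Distance for the DMU at (8, 5) in the one-one direction (dx = dy = 1):
phi = ddea_crs(8.0, 5.0, 1.0, 1.0)
print(round(phi, 4))
```

Setting the direction to (d_u = x_0, d_v = 0) in the same routine gives the input-orientation (horizontal) distance as a special case, consistent with the DDEA nesting noted above.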
DMU's being the black line defined by the origin and point E. If two external points are allowed (α = 6/8), the qDEA model endogenously chooses to allow points B and E to lie external to the hyperplane with the resulting qDEA-(6/8) hyperplane for all DMU's being the blue line defined by the origin and point D. We return to this example below when we present procedures enabling the identification of the qDEA-α hyperplanes in Fig. 1.

Specifying φ as a free variable with d_v^0 ≥ 0 and d_u^0 ≥ 0 allows movement in both the northwest and southeast directions indicated by the black and red arcs in Fig. 1. If φ_{j,α} < 0, DMU j's data point is α-superefficient, indicating that movement in a red arc direction is required to reach DMU j's qDEA-α hyperplane. In general, a qDEA distance φ_{j,α} < 0, φ_{j,α} = 0, or φ_{j,α} > 0 indicates that DMU j's data point is external to (superefficient), on (efficient), or internal to (inefficient) its qDEA-α hyperplane. Fig. 1 plots the feasible direction arcs for DMU's A-H. In the following discussion, we report the estimated CRS φ̂_{j,α} distances for movements in horizontal (input), vertical (output), and one-one directions indicated by the dashed lines in Fig. 1.
Table 1
Modified CST Example – Input-output levels, CRS DDEA and qDEA distances, and projected points for one and two external points (input, output, and one-one orientations).
ρ_LPM(γ, t) = ∫_{−∞}^{t} (t − x)^γ f(x) dx = ∫_{−∞}^{g} (t − x)^γ f(x) dx + ∫_{g}^{t} (t − x)^γ f(x) dx ⇒

ρ_LPM(γ, t) ≥ ∫_{−∞}^{g} (t − x)^γ f(x) dx ≥ ∫_{−∞}^{g} (t − g)^γ f(x) dx = (t − g)^γ ∫_{−∞}^{g} f(x) dx ⇒

Prob(x ≤ g) ≤ ρ_LPM(γ, t) / (t − g)^γ   for t > g
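As a quick numerical check of this bound (our own sketch, not from the paper), the linear (γ = 1) LPM and the implied probability limit for a Normal(100, 25) outcome with g = 60 can be approximated by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(100.0, 25.0, 1_000_000)   # x ~ Normal(100, 25)

g = 60.0        # level that must be exceeded
gamma = 1       # linear partial moment

def lpm_bound(t):
    """Upper bound on Prob(x <= g): rho_LPM(gamma, t) / (t - g)^gamma for t > g."""
    rho = np.mean(np.maximum(t - x, 0.0) ** gamma)   # sample rho_LPM(gamma, t)
    return rho / (t - g) ** gamma

actual = np.mean(x <= g)   # approx pnorm(60, mean = 100, sd = 25) ~ 0.055
print(round(float(actual), 3),
      [round(float(lpm_bound(t)), 3) for t in (80.0, 90.0, 100.0)])
```

For every admissible t > g the computed bound lies above the actual probability, and its tightness varies with t, consistent with the discussion of t-choice below.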
with x ∼ Normal(100, 25). Fig. 2 plots the probability bounds for the one-sided Chebychev inequality as well as the LPM based ρ_LPM(γ, t)/(t − g)^γ probability limits for various combinations of γ and t. Fig. 2 also contrasts the probability bounds to the actual probability pnorm(60, mean = 100, sd = 25) ≅ 0.055. The BH semivariance probability bound is the right-most point on the red curve and is less conservative than the Chebychev one-tailed limit. For each γ, less conservative LPM-based probability bounds can be found depending upon the level of t, but an inappropriate level for t may also generate excessively conservative probability bounds. In situations where the Chebychev or the BH inequality bounds are exact, the LPM inequality will generate exact probability bounds for an appropriate choice of t and γ.

The linear partial moment inequality (γ = 1) can often generate less conservative bounds than using higher order partial moments. We limit our discussion to the set of linear partial moments as the linear partial moment can be computed in an LP model with a finitely discrete set of outcomes. Atwood (1985) and AWHH (1988) demonstrated that, with a linear PM, a given application's least conservative level for t could be endogenously determined in a linear programming model. Denoting the linear partial moments as ρ_LPM(1, t) = ρ_LPM(t) and ρ_UPM(1, t) = ρ_UPM(t), we can rewrite the probability limits (4) and (5) as:

Prob(x ≤ g) ≤ ρ_LPM(t)/(t − g) with (t − g) > 0   and   Prob(x ≥ g) ≤ ρ_UPM(t)/(g − t) with (g − t) > 0        (6)

Enforcing the constraints:

t − (1/q) ρ_LPM(t) ≥ g and ρ_LPM(t) ≥ 0        (7)

or

t + (1/q) ρ_UPM(t) ≤ g and ρ_UPM(t) ≥ 0        (8)

while allowing the LP model to endogenously determine the least constraining partial moment limit t̃ and compute the linear partial moments ρ_LPM(t̃) or ρ_UPM(t̃) is sufficient to guarantee that Prob(x ≤ g) ≤ ρ_LPM(t̃)/(t̃ − g) ≤ q or Prob(x ≥ g) ≤ ρ_UPM(t̃)/(g − t̃) ≤ q.

2.2. Incorporating partial moments into probabilistically constrained LP problems

AWHH utilized the linear LPM inequality results in constructing a model that maximized the expected income of a portfolio of assets subject to a set of technical constraints and the additional requirement that the probability of income falling below a target level g not exceed q. Modifying their notation slightly, AWHH's portfolio optimization model with J assets, I portfolio constraints, and K states of nature can be written as:

Max_z μ_x = Σ_{j=1}^{J} μ_{z,j} z_j
s.t.
(a) Σ_{j=1}^{J} a_{i,j} z_j ≤ b_i,   i = 1, …, I
(b) Σ_{j=1}^{J} y_{k,j} z_j − t + δ_k ≥ 0,   k = 1, …, K
(c) Σ_{k=1}^{K} r_k δ_k − ρ_LPM = 0
(d) t − (1/q) ρ_LPM ≥ g
(e) z_j ≥ 0 for j = 1, …, J; t free; δ_k ≥ 0; ρ_LPM ≥ 0        (9)

where x_k = Σ_{j=1}^{J} y_{k,j} z_j is portfolio income in state k, μ_x is the mean portfolio income, μ_{z,j} is asset j's mean income, z_j is the chosen level of asset j, a_{i,j} and b_i are portfolio constraint coefficients, y_{k,j} is the return to asset j in state k, t is an endogenously determined linear LPM integral limit, δ_k is an income deviation below level t with δ_k = t − Σ_{j=1}^{J} y_{k,j} z_j = t − x_k when x_k < t and zero otherwise, r_k is a non-negative probability of state k satisfying Σ_{k=1}^{K} r_k = 1, ρ_LPM = ρ_LPM(t) is the endogenously computed linear LPM, and g is a level of income that must be exceeded with probability α = 1 − q. System (9) selects a vector of optimal portfolio levels z̃ that map to a vector of potential portfolio income realizations x̃. The model maximizes the portfolio's expected income μ_x subject to technical portfolio constraints and the constraint t̃ − (1/q) ρ_LPM(t̃) ≥ g, thus guaranteeing Prob(x̃ ≤ g) ≤ q.

The portfolios and potential income levels x̃ chosen in system (9) were found to satisfy the probability constraint but were often found to be excessively conservative in that the actual probability of portfolio income falling below g was frequently much lower than q. The reason for this conservativeness can be explained using insights from two reviewers who correctly pointed out that AWHH's (1988) model is similar to Rockafellar and Uryasev's (2000) conditional Value at Risk (cVaR) model. This perception is correct in that an upside-risk version of system (9) can be shown to be equivalent to a cVaR_q constrained version of Rockafellar and Uryasev's (RU) model³. With an upside risk model of losses, Rockafellar and Uryasev's results imply that the optimal t̃ and g terms in the binding upside risk constraint t̃ + (1/q) ρ_UPM(t̃) = g are, respectively, the solution's q-level Value at Risk (VaR_q) and the associated conditional Value at Risk (cVaR_q)⁴.

The conservativeness of system (9)'s down-side risk solutions arises from the result that g < t̃ or cVaR_q(x̃) < VaR_q(x̃) when ρ_LPM(t) > 0. System (9) actually guarantees that Prob(x̃ ≤ t̃) ≤ q with Prob(x̃ ≤ g) usually being less than q, i.e. conservative. There are several possible approaches to obtaining less conservative solutions than those obtained from system (9). Subsequent unpublished research suggested that a two-stage process, where the first stage (system (9)) is used to identify constraints (constraints with δ_k > 0) to be relaxed in a second stage, would generate less conservative outcomes.

³ Rockafellar and Uryasev (2000) presented a linear programming model that minimized the upside or loss-risk cVaR_q while endogenously identifying the cVaR_q's associated VaR_q level. Their objective function is equivalent to minimizing t + (1/q) ρ_UPM(t) = cVaR_q(t) with t = VaR_q. Rockafellar and Uryasev's model can be modified to optimize an objective function subject to a maximal cVaR_q by imposing the constraint t + (1/q) ρ_UPM(t) ≤ g, which is equivalent to the UPM constraint (8) derived previously.

⁴ The t̃ = VaR_q and g = cVaR_q results can easily be demonstrated for system (9). System (9) searches over potential portfolios while endogenously identifying the t̃ level associated with the least conservative or minimal level of the probability limit ρ_LPM(t)/(t − g). Note that ρ_LPM(t) = ∫_{−∞}^{t} (t − x) f(x) dx = F(t)[t − E{x|x ≤ t}] and (d/dt) ρ_LPM(t) = F(t). Taking the derivative of the linear LPM probability bound with respect to t and setting the result equal to zero gives: (d/dt)[ρ_LPM(t)/(t − g)] = [F(t)(t − g) − ρ_LPM(t)]/(t − g)² = [F(t)(t − g) − F(t)(t − E{x|x ≤ t})]/(t − g)² = 0 ⇔ g = E{x|x ≤ t̃}, with the second derivative of ρ_LPM(t)/(t − g) with respect to t (evaluated at the minimal t̃) being strictly positive. Assuming the sufficiency constraint is binding, i.e. t̃ − (1/q) ρ_LPM(t̃) = g, implies: ρ_LPM(t̃) = q(t̃ − g) ⇒ F(t̃)[t̃ − E{x|x ≤ t̃}] = q(t̃ − g) ⇔ F(t̃) = q. Thus, with AWHH's downside risk constraint t − (1/q) ρ_LPM(t) ≥ g, the endogenously selected t̃ level is the VaR_q(x̃) of the optimal portfolio's income such that Prob(x̃ ≤ t̃) = q and g is the associated cVaR_q(x̃) or the Expected{x̃ | x̃ ≤ t̃}.

We again emphasize that, while originally developed as a risk model, system (9) is more broadly applicable than AWHH's or RU's chance-constrained or cVaR constrained setting. In system (9), the y_{k,j} values need not be in monetary or rates of return units and the model need not involve risk. System (9) can be generalized to other classes of linear programming models where the researcher
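System (9) is itself a small LP. The sketch below (ours, with entirely hypothetical data: three assets, four equally likely states) stacks the variables as (z, t, δ, ρ_LPM) and solves it with scipy; the realized shortfall probability typically ends up below q, illustrating the conservativeness discussed above:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: K = 4 equally likely states, J = 3 assets.
y = np.array([[ 8., 12.,  6.],    # income per unit of each asset, state 1
              [10.,  4.,  7.],    # state 2
              [12., 14.,  8.],    # state 3
              [ 6., 10.,  9.]])   # state 4
K, J = y.shape
r = np.full(K, 1.0 / K)           # state probabilities r_k
mu = r @ y                        # mean income per unit of each asset
g, q = 6.0, 0.25                  # income floor g, allowed shortfall probability q

# Variable order: z (J), t, delta (K), rho  ->  J + 1 + K + 1 columns.
m = J + 1 + K + 1
c = np.zeros(m); c[:J] = -mu      # linprog minimizes, so minimize -mu.z

A_ub, b_ub = [], []
row = np.zeros(m); row[:J] = 1.0                     # (a) budget: sum z_j <= 1
A_ub.append(row); b_ub.append(1.0)
for k in range(K):                                   # (b) as <=: t - y_k.z - delta_k <= 0
    row = np.zeros(m); row[:J] = -y[k]; row[J] = 1.0; row[J + 1 + k] = -1.0
    A_ub.append(row); b_ub.append(0.0)
row = np.zeros(m); row[J] = -1.0; row[-1] = 1.0 / q  # (d) as <=: -t + rho/q <= -g
A_ub.append(row); b_ub.append(-g)

A_eq = np.zeros((1, m))                              # (c) sum r_k delta_k - rho = 0
A_eq[0, J + 1:J + 1 + K] = r; A_eq[0, -1] = -1.0

bounds = [(0, None)] * J + [(None, None)] + [(0, None)] * K + [(0, None)]
res = linprog(c, A_ub, b_ub, A_eq, [0.0], bounds=bounds, method="highs")
z = res.x[:J]
income = y @ z                                       # portfolio income by state
print("z =", z.round(3), " shortfall prob =", float(np.mean(income < g)))
```

The δ_k values with positive solutions flag the states that would be relaxed in a second stage, exactly the role the δ_j play in the qDEA stage-1 model developed below.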
desires to allow up to a proportion of endogenously identified constraints to be violated. In this paper, we utilize similar constraints in a DDEA model that allows a proportion of endogenously identified input/output points to lie external to the q-DEA hyperplane.

In the following we utilize a two-stage process where, in Stage 1, an augmented dual-DDEA system similar to system (9) is utilized to identify constraints to be relaxed (i.e. identify sets of potential qDEA-α super-efficient DMUs). Stage 2 of qDEA consists of solving conventional DDEA models (1) or (2) with the constraints identified in qDEA Stage-1 being relaxed. Given that constraints (2-a) in system (2) involve ≤ restrictions, we utilize the upper partial moment ρ_UPM(t) and the t + (1/q) ρ_UPM(t) ≤ g sufficiency constraint.

3. The qDEA model: development and application to example problem

Dual DDEA system (2) is augmented to implement the first stage of the qDEA model as follows:

Min Σ_{v=1}^{V} −y_v^0 p_v + Σ_{u=1}^{U} x_u^0 w_u
s.t.
(a) Σ_{v=1}^{V} y_v^j p_v − Σ_{u=1}^{U} x_u^j w_u − t − δ_j ≤ 0   for j = 1, …, n
(b) Σ_{v=1}^{V} d_v^0 p_v + Σ_{u=1}^{U} d_u^0 w_u = 1
(c) Σ_{j=1}^{n} (1/n) δ_j − ρ_UPM = 0
(d) t + (1/q) ρ_UPM ≤ 0
(e) p_v ≥ 0; w_u ≥ 0; t a free variable; δ_j ≥ 0; ρ_UPM ≥ 0        (10)

where p_v is an output price, w_u is an input price, t is the endogenously determined upper partial moment (UPM) integral limit, δ_j is a deviation above t when Σ_{v=1}^{V} y_v^j p_v − Σ_{u=1}^{U} x_u^j w_u > t and zero otherwise, ρ_UPM = ρ_UPM(t) is the endogenously calculated linear UPM, and 0 < q < 1 is the maximal proportion of data points that are allowed to lie outside the qDEA-α hyperplane. Since we wish to allow up to proportion q of the constraints in system (2-a) to be violated, we set g = 0 in the UPM constraint (10-d) and allow t to be negative. System (2) is fully nested in system (10) as setting q < 1/n will guarantee that no points can lie outside the hyperplane, thus obtaining the conventional DDEA results.

Potential qDEA-α super-efficient DMU's or external points in system (10) are identified as constraints with positive δ_j values in the solution. System (10) may sometimes be conservative in that fewer than proportion q of the δ_j may be positive, but the solutions to system (10) will have no more than proportion q of potentially external points. We also demonstrate below that the objective function value from expression (10) does not give the final DDEA distance estimate as system (10)'s extrapolated point (and the resulting distance metric) will consist of an extrapolation using information from all external points plus the new qDEA support points⁵. While system (10) generates conservative distance estimates, the set of positive δ_j values endogenously identifies a set of potential points allowed to lie external to the given DMU's qDEA-α hyperplane. The second stage of the qDEA process re-estimates system (2) while relaxing the constraints associated with the potential external points identified in stage-1.

⁵ We use the terminology support points to denote the points that define the hyperplane onto which the given DMU's input-output points are projected, with φ_α being the distance from the initial point to the projected point on the hyperplane.

3.1. qDEA example application, implementation, and issues

In this section, we illustrate procedures for obtaining q-DEA estimates for the modified Cooper, Seiford, and Tone's (CST) eight DMU example. The input (employees) and output (sales) data, estimated qDEA-α distances, and projected points are presented in Table 1. Table 1 presents the qDEA constant returns to scale (CRS) results using three directional assumptions: (a) the input orientation problem or horizontal movement with directions (d_u^j = x_u^j, d_v^j = 0), (b) the output orientation problem or vertical movement with directions (d_u^j = 0, d_v^j = y_v^j), and (c) a unit or one-one direction model with diagonal directions (d_u^j = 1, d_v^j = 1), while allowing 0, 1, or 2 points to lie external to the qDEA-α hyperplane.

The qDEA solution to this problem is obtained in two stages. Figs. 3 and 4 illustrate the implementation of qDEA for DMU H and the "one-one" direction using screen shots from MS-Excel. Fig. 3 presents an augmented MS Solver tableau for qDEA stage-1 (system (10)) while allowing no more than two points to lie external to the hull. From our experience with the conservative partial moment models, if we want no more than NP = 2 points to lie external to the hull, identifying the positive δ_j values is less sensitive to rounding errors (for small positive δ_j values) if q is set just below (NP+1)/n. For this problem with NP = 2, q was set equal to (2+1)/8 − 0.0001 = 3/8 − 0.0001 ≅ 0.37499, which will guarantee that the proportion of positive δ_j will not exceed 0.37499 < 3/8. The value 1/q in the constraint matrix is thus 1/0.37499 ≅ 2.6667.

The highlighted qDEA stage-1 solution in Fig. 3 indicates that the model has determined to allow points B and E to potentially lie external to the qDEA-(6/8) hyperplane. The objective value from qDEA stage-1 is a conservative⁶ φ̂_H = 1.21741 with projected point (x̃_H, ỹ_H) = (x_H − φ̂_H·1, y_H + φ̂_H·1) = (6.7826, 6.2174). The second stage of the qDEA model is a DDEA model (2) with relaxed constraints. Fig. 4 presents the qDEA stage-2 dual tableau (system (2)) where the constraints associated with points B and E have been relaxed. In Fig. 4, we relax the constraints associated with DMU's B and E by adding large positive values in the corresponding right hand side locations. The resulting qDEA-(6/8) direction (1, 1) distance for point H is φ̂_{H,6/8} = 0.5714.

Fig. 4's left-hand-side (LHS) values for the constraints associated with both DMU's B and E exceed 0, indicating that points B and E are indeed external to DMU H's qDEA-(6/8) hyperplane. We return to this issue below where we discuss an example where one iteration of the two-stage qDEA process does not result in two external points due to the conservativeness of the partial moment inequality.

We conclude this section by noting that the Excel screen shots presented above were for illustrative purposes only. All results in the following discussions were obtained using R. The associated R code is available from the authors upon request.

⁶ The qDEA stage-1 solution is conservative in that its projected (x̂, ŷ) values are projected from the two potential external points B and E as well as point D in Fig. 1. This is apparent by examining the LP's dual values presented in Fig. 3. The negatives of these dual values are the DMU projection weights in dual system (10)'s associated primal LP.

3.2. qDEA distances for the modified CST example

Table 1 presents the DDEA and qDEA distances and projected points for all eight DMU's under constant returns to scale and for the input (d_u^j = x_u^j, d_v^j = 0), output (d_u^j = 0, d_v^j = y_v^j), and
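The two-stage process can be sketched end to end for the one-input, one-output CRS case. This is our own illustration with hypothetical data (not the paper's modified CST values): stage 1 solves dual system (10) and reads off the DMUs with positive δ_j; stage 2 then solves primal DDEA (1) with those DMUs' constraints relaxed, which (in the primal) amounts to dropping them from the reference set:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical single-input (x), single-output (y) CRS data, one DMU per column index.
x = np.array([2., 3., 3., 4., 5., 5., 6., 8.])
y = np.array([1., 3., 2., 3., 4., 2., 3., 5.])
n = len(x)

def qdea_one_one(i0, n_external):
    """Two-stage qDEA for DMU i0 in the one-one direction (d_u = d_v = 1)."""
    q = (n_external + 1) / n - 1e-4       # q set just below (NP+1)/n, as in the text
    # Stage 1: dual system (10). Variable order: p, w, t, delta (n), rho.
    m = 3 + n + 1
    c = np.zeros(m); c[0] = -y[i0]; c[1] = x[i0]         # min -y0*p + x0*w
    A_ub, b_ub = [], []
    for j in range(n):                                   # (a) y_j p - x_j w - t - delta_j <= 0
        row = np.zeros(m)
        row[0] = y[j]; row[1] = -x[j]; row[2] = -1.0; row[3 + j] = -1.0
        A_ub.append(row); b_ub.append(0.0)
    row = np.zeros(m); row[2] = 1.0; row[-1] = 1.0 / q   # (d) t + rho/q <= 0  (g = 0)
    A_ub.append(row); b_ub.append(0.0)
    A_eq = np.zeros((2, m))
    A_eq[0, 0] = 1.0; A_eq[0, 1] = 1.0                   # (b) p + w = 1 for direction (1, 1)
    A_eq[1, 3:3 + n] = 1.0 / n; A_eq[1, -1] = -1.0       # (c) mean(delta) - rho = 0
    bounds = [(0, None), (0, None), (None, None)] + [(0, None)] * n + [(0, None)]
    s1 = linprog(c, A_ub, b_ub, A_eq, [1.0, 0.0], bounds=bounds, method="highs")
    external = np.where(s1.x[3:3 + n] > 1e-7)[0]         # DMUs with positive delta_j
    # Stage 2: primal DDEA (1) with the external DMUs dropped from the reference set.
    keep = np.setdiff1d(np.arange(n), external)
    c2 = np.zeros(len(keep) + 1); c2[-1] = -1.0          # max phi
    A2 = np.zeros((2, len(keep) + 1))
    A2[0, :-1] = -y[keep]; A2[0, -1] = 1.0               # sum(y_j z_j) - phi >= y0
    A2[1, :-1] = x[keep];  A2[1, -1] = 1.0               # sum(x_j z_j) + phi <= x0
    s2 = linprog(c2, A2, [-y[i0], x[i0]],
                 bounds=[(0, None)] * len(keep) + [(None, None)], method="highs")
    return external, s2.x[-1]

ext, phi = qdea_one_one(i0=7, n_external=2)   # last DMU, up to two external points
print("external DMUs:", ext, " phi:", round(float(phi), 4))
```

Relaxing the dual constraints with large right-hand-side values, as the paper's Fig. 4 does, is equivalent to this reference-set deletion; a multi-iteration wrapper would re-run both stages when fewer than the allowed number of points end up external.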
We conclude the discussion of the CST example by noting that, if estimated with variable returns to scale (VRS), the qDEA-α hyperplane's external points will often be DMU and direction dependent. As discussed by Atwood and Shaik (2018), with VRS, qDEA-α distances (in some directions) for some qDEA-α superefficient DMU's will be dual unbounded (primal infeasible) due to the increased number of points that are allowed to be external to the DMU's qDEA-α hyperplane. For example, in Fig. 1, with VRS and one external point, points H and A will be qDEA-(7/8) superefficient. Movement in the input (horizontal) direction from point H or the output (vertical) direction from point A will result in a primal infeasibility as movement in those directions cannot reach an endogenously identified qDEA-(7/8) hyperplane.

Although not discussed by Atwood and Shaik, several authors (Atwood, Shaik, & Walden, 2017; Chen, 2005; Chen & Liang, 2011; Cook, Liang, Zha, & Zhu, 2009; Lee, Chu, & Zhu, 2011; 2012; Lovell & Rouse, 2003; Seiford & Zhu, 1999) have proposed methods to allow estimation of DEA metrics for superefficient DMU's whose initial DEA solutions are primal infeasible. Many of the suggested procedures involve the identification of alternative directions that allow movement from the given DMU's input-output observation to a point on a feasible data-generated hyperplane.

3.3. The q-Conservativeness of qDEA Stage-2 solutions

Another concern with qDEA is that, in practice, we often find that a single iteration of the qDEA two-stage process results in q-conservative solutions in that fewer than proportion q of the data points actually lie external to the qDEA-α hyperplane at the completion of qDEA stage-2. In our experience, conservative stage-2 solutions are more likely if the data contains large influential outliers. The reason for conservative stage-2 solutions lies in the use of the stochastic inequality when identifying potential external points in qDEA stage-1. The qDEA stage-1 model allows the LP model to select a partial moment limit t and endogenously contort the original constraint polytope while constraining the weighted sums of deviations above t in such a way that no more than proportion q of the δ_j = y^j p − x^j w − t values lie above zero. While the solution will commonly have up to proportion q of positive δ_j values identifying potential external points in the original DEA problem, not all such points may actually be external when the qDEA stage-2 problem is solved. In response to this issue, we developed a multi-iteration procedure that addresses the qDEA stage-2 conservativeness and allows the researcher to identify solutions with proportion q of external points if such a solution is feasible.

A q-conservative example is easily constructed by modifying DMU B's observation to a more extremal (x_B, y_B) = (2, 5) while attempting to allow two external points. At the completion of the procedures demonstrated in Figs. 3 and 4, we find that only one point actually lies external to the qDEA hyperplane. Supplemental materials available from the authors demonstrate a multi-iteration procedure that addresses the issue and identifies two external points. As also discussed in the supplemental material, from our simulation experience it appears that the original distance estimates obtained from completing and bootstrapping one iteration of the qDEA process will generate consistent estimates of the φ̂_α̂ distance where α̂ = 1 − q̂ and q̂ is the proportion of Stage-2's actual external points. While in many benchmarking exercises one iteration and an estimate of the corresponding q̂ may be sufficient, R code available from the authors allows the user to specify the number of two-stage qDEA iterations and the q̂-tolerance desired. All results in the following sections of the paper have been estimated with up to 4 iterations of the two-stage qDEA process described in the appendix.

We have presented the qDEA model and demonstrated its ability to obtain quantile distance/efficiency metrics as well as qDEA's ability to identify sets of potentially outlying data points. We now examine the statistical properties of the qDEA estimators.

4. Statistical properties of qDEA estimates

A natural question is whether the qDEA estimator possesses desirable statistical properties. Given that, to date, we have not been able to derive closed form expressions for the asymptotic properties of the qDEA estimator, we use several assumed data generating processes (DGP) and numerical methods to examine these properties.

Specifically, we use Monte Carlo simulations and a slight modification of nCm subsample bootstrapping theory to examine the large sample properties of the qDEA estimator. In the next section of the paper, we describe: (1) the procedures utilized to estimate large sample convergence rates and perform normality tests for the qDEA estimates and (2) the class of DGP used in the Monte Carlo simulations. We then present and discuss the results of the large sample convergence rate estimations and normality tests.

4.1. Numerically estimating large sample statistical properties

Geyer (2013) presents a succinct review of Politis, Romano, and Wolf's (1999, 2001) nCm subsampling process. Space limitations preclude a complete discussion, but the result is that confidence intervals for a statistic θ̂_n can often be estimated by utilizing properties of the statistic:

S_m = θ̂_n − (m/n)^β (θ̂_m − θ̂_n)        (11)

where θ̂_n is a parameter estimate obtained from a sample of size n, β is a convergence rate, and θ̂_m is an estimate from a subsample of size m(n) ≪ n, where ≪ denotes "much smaller than"⁷.

In finite sample nCm bootstrapping, B sets of m observations are repeatedly resampled from a given sample of size n and used to estimate θ̂_{m,b} for b = 1, …, B. Confidence intervals for the parameter θ can then be estimated by using the quantiles of the set of B simulated statistics:

S_{m,b} = θ̂_n − (m/n)^β (θ̂_{m,b} − θ̂_n)        (12)

Constructing expression (12)'s S_{m,b} values requires that the analyst identify appropriate levels for both the convergence rate β and the subsample size m. While the convergence rates for the DEA, FDH, and FDH-related order-m and order-α estimators are known⁸, we have not been able to derive β for the qDEA estimator.

Given that there is no known way to exogenously select the optimal level for the subsample size m (Politis et al., 1999), a common practice is to estimate θ̂_{m_i,b} over a range of m_i values and select the appropriate m_i using data driven results (Politis et al., 1999; Simar & Wilson, 2011b). When both the DGP and β are unknown, Politis et al. (1999, chapter 8) and Geyer discuss finite-sample procedures utilizing the bootstrapped θ̂_{m,b} values to estimate the unknown convergence rate parameter β. Politis et al. suggest estimating convergence rates by examining the rate at which inter-quantile-ranges computed from θ̂_{m_i,b} change with increases in the subsample size m_i. We initially utilized their suggested finite sample β̂ estimation procedures but found that, while the finite sample β̂ estimates performed relatively well as n increased, the procedures gave unstable results at smaller sample sizes such as when

⁷ More formally, m(n) ≪ n denotes using a rule for selecting m (as a function of n) such that lim_{n→∞} m(n) = ∞ and lim_{n→∞} m(n)/n = 0. One such rule is to set m(n) = K√n where K is a positive scalar.

⁸ The DEA CRS and VRS convergence rates are, respectively, β = 2/(U+V) and β = 2/(U+V+1). The FDH, order-m, and order-α convergence rates are, respectively, β = 1/(U+V), β = 1/2, and β = 1/2 (Simar & Wilson, 2011a).
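A minimal sketch of the nCm subsampling recipe behind expression (12) is shown below. It is our own illustration using the sample mean (a known root-n statistic, so β = 1/2) rather than a qDEA distance, with a hypothetical m(n) = K√n rule:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(2.0, size=1000)    # one observed sample, n = 1000
n = len(x)
theta_n = x.mean()                     # full-sample estimate theta_hat_n

beta = 0.5                             # root-n convergence rate assumed known
m = int(4 * np.sqrt(n))                # subsample size m(n) << n, here m = K*sqrt(n)
B = 2000                               # number of subsamples

# Expression (12): S_{m,b} = theta_n - (m/n)^beta * (theta_{m,b} - theta_n)
theta_m = np.array([rng.choice(x, size=m, replace=False).mean() for _ in range(B)])
S = theta_n - (m / n) ** beta * (theta_m - theta_n)

lo, hi = np.quantile(S, [0.025, 0.975])   # 95% CI for theta from the quantiles of S
print(f"theta_n = {theta_n:.3f}, CI = ({lo:.3f}, {hi:.3f})")
```

For qDEA, θ̂ would be a stage-2 distance estimate recomputed on each subsample; the unknown β is exactly what the Monte Carlo exercise below is designed to pin down.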
Table 2
Large sample convergence rate estimates and Pearson-Normality-Test P-Values by DGP, RTS, Number of Inputs U, and Sample Size.
DGP RTS U β̂ m1000 m1395 m1946 m2714 m3786 m5282 m7368 m10278 m14337 m20000
Beta13 CRS 1 0.525 0.057 0.131 0.164 0.114 0.436 0.198 0.200 0.452 0.323 0.361
Beta13 CRS 3 0.506 0.355 0.827 0.980 0.188 0.857 0.426 0.343 0.423 0.005 0.206
Beta13 CRS 5 0.522 0.781 0.729 0.420 0.108 0.111 0.486 0.364 0.894 0.674 0.262
Beta13 VRS 1 0.500 0.530 0.684 0.852 0.370 0.738 0.469 0.647 0.499 0.792 0.835
Beta13 VRS 3 0.505 0.527 0.496 0.126 0.817 0.873 0.697 0.355 0.967 0.582 0.146
Beta13 VRS 5 0.513 0.094 0.857 0.803 0.061 0.436 0.346 0.751 0.547 0.111 0.423
Beta24 CRS 1 0.491 0.869 0.131 0.798 0.003 0.978 0.376 0.564 0.657 0.089 0.019
Beta24 CRS 3 0.491 0.884 0.066 0.433 0.575 0.880 0.267 0.103 0.923 0.924 0.506
Beta24 CRS 5 0.498 0.021 0.533 0.729 0.370 0.589 0.204 0.089 0.674 0.052 0.456
Beta24 VRS 1 0.516 0.589 0.650 0.864 0.684 0.726 0.394 0.410 0.585 0.160 0.328
Beta24 VRS 3 0.502 0.558 0.732 0.976 0.994 0.833 0.099 0.190 0.080 0.551 0.097
Beta24 VRS 5 0.503 0.042 0.257 0.394 0.866 0.285 0.213 0.097 0.151 0.056 0.379
Beta510 CRS 1 0.494 0.564 0.039 0.456 0.778 0.916 0.312 0.803 0.544 0.373 0.713
Beta510 CRS 3 0.491 0.086 0.896 0.222 0.413 0.852 0.544 0.095 0.869 0.317 0.690
Beta510 CRS 5 0.508 0.719 0.397 0.254 0.169 0.063 0.674 0.795 0.065 0.738 0.806
Beta510 VRS 1 0.487 0.838 0.911 0.789 0.857 0.997 0.171 0.249 0.509 0.370 0.482
Beta510 VRS 3 0.494 0.931 0.388 0.987 0.909 0.571 0.238 0.664 0.527 0.582 0.267
Beta510 VRS 5 0.501 0.707 0.869 0.025 0.985 0.070 0.472 0.943 0.122 0.602 0.252
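The ten subsample sizes heading Table 2's columns form a geometric grid between 1000 and 20000, i.e. m_k = 1000 · 20^(k/9) rounded to integers. A short Python sketch reproduces them; `np.geomspace` is our choice of tool, not necessarily the authors'.

```python
import numpy as np

# Ten geometrically spaced subsample sizes from 1000 to 20000:
# m_k = 1000 * 20**(k/9) for k = 0, ..., 9, rounded to integers.
mlist = np.round(np.geomspace(1000, 20000, num=10)).astype(int)
print(mlist.tolist())
# [1000, 1395, 1946, 2714, 3786, 5282, 7368, 10278, 14337, 20000]
```

Equal spacing on the log scale is what makes the log(iqnr)-on-log(m) regression described in the text well conditioned.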
n = 100. Due to the unstable results at smaller sample sizes, we chose to use Monte Carlo procedures to estimate large sample convergence rates for a set of assumed data generating processes.

When estimating large sample convergence rates, we use an assumed DGP to generate 1000 (X_{N,mcs}, Y_{N,mcs}) data sets of size N = 20000 for Monte Carlo repetitions mcs = 1, …, 1000. We then estimated φ̂^α_{m_i,mcs} distances for each sample size m_i in a vector of ten subsample sizes ranging between 1000 and 20000. Motivated by one of Geyer's examples, we used the following vector mlist of m_i values to estimate large sample convergence rates: mlist = (1000, 1395, 1946, 2714, 3786, 5282, 7368, 10278, 14337, 20000). We emphasize that since we are repeatedly simulating data from an assumed DGP and estimating only one set of φ̂^α_{m_i,mcs} for each mcs repetition, we are not limited by the m_i ≪ n requirement necessary when repeatedly subsampling from a given sample of size n. For each sample size m_i in mlist, φ̂^α_{m_i,mcs} is estimated using the first m_i observations in the data set.

4.2. Data Generating Processes (DGP)

For each Monte Carlo simulation mcs = 1, …, 1000, a reference DMU data set of size N = 20000 was generated with simulated efficient input-output levels: x̃^j_{u,mcs} ∼ iid unif(0, 1) and ỹ^j_{mcs} = (∏_{u=1}^{U} x̃^j_{u,mcs})^{λ/U} for DMUs j = 1, …, N. Input inefficiency for DMU_j is introduced by randomly simulating φ^j_{mcs} ∼ BETA(a, b) and … When λ = 0.80, the qDEA model was estimated with variable returns to scale (VRS). When λ = 1.0, the qDEA model was estimated with constant returns to scale (CRS). Efficient input and output levels for DMU-0 were set at x̃_{0u} = 0.5 for u = 1, …, U, with ỹ_0 = (∏_{u=1}^{U} x̃_{0u})^{λ/U}, giving ỹ_0 = 0.5 when λ = 1 and ỹ_0 = 0.57435 when λ = 0.80. Inefficiency is introduced for DMU-0 by setting x_{0u} = 2x̃_{0u} = 1 and y_0 = ỹ_0. The corresponding input-orientation DDEA parameter with directions d^0_u = x^0_u and d^0_v = 0 can be shown to be φ = 0.5.
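The data generating process described above can be sketched in Python. The efficient inputs and the Cobb-Douglas-type output follow the formulas in the text; how the simulated φ ∼ BETA(a, b) maps into inefficient inputs is truncated in the extraction, so the mapping x = x̃/(1 − φ) below is a hypothetical choice (it is at least consistent with the DMU-0 example, where φ = 0.5 doubles the efficient inputs). The name `simulate_dgp` and its arguments are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_dgp(N, U, lam, a, b):
    # Efficient inputs iid uniform(0, 1); efficient output
    # y = (prod_u x_u)^(lam/U), with lam the returns-to-scale parameter.
    x_eff = rng.uniform(size=(N, U))
    y = np.prod(x_eff, axis=1) ** (lam / U)
    # Hypothetical inefficiency mapping (see lead-in): phi ~ Beta(a, b),
    # observed inputs x = x_eff / (1 - phi).
    phi = rng.beta(a, b, size=N)
    x = x_eff / (1.0 - phi)[:, None]
    return x, y

# One draw from the BETA(1,3), lambda = 0.8 (VRS), U = 3 design.
x, y = simulate_dgp(N=500, U=3, lam=0.80, a=1, b=3)

# Check the benchmark DMU-0 outputs quoted in the text: with all
# efficient inputs at 0.5, y0 = (0.5**U)**(lam/U) = 0.5**lam.
for lam in (1.0, 0.80):
    print(lam, round((0.5 ** 3) ** (lam / 3), 5))  # 0.5, then 0.57435
```

The check at the end reproduces the values ỹ_0 = 0.5 (λ = 1) and ỹ_0 = 0.57435 (λ = 0.80) stated in the text.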
…the φ̂^{0.95}_{m_i} estimates increase with sample size, and (3) the median values appear to be convergent with increases in sample size. For all DGP examined we obtained similar results, indicating that the φ̂^α estimator is apparently a biased but convergent estimator as sample size increases. The bias result is not troubling in that the nCm bootstrapping process can be used to obtain bias-corrected estimates for φ̂^α (Politis et al.).

Fig. 5. Panel plots of large sample parameter estimates by sample size, Log(iqnr) by Log(Sample Size), and qqnorm plots of parameter estimates by sample size for the VRS-BETA(1,3)-U3 data generating process.

The upper right panel in Fig. 5 presents a boxplot of the log of the inter-quantile-ranges (iqnr), generated in the convergence rate estimation process, plotted against the log of the sample sizes. An examination of the second boxplot indicates that log(iqnr) visually appears to decrease linearly with log(m). For this example, when log(iqnr) is regressed against log(m), the resulting slope parameter is -0.505, giving the convergence rate estimate β̂ = 0.505 presented in Table 2.

The lower panels in Fig. 5 present qqnorm-qqline plots for the 1000 φ̂^{0.95}_{1000,mcs} and φ̂^{0.95}_{20000,mcs} estimates. An examination of the plots provides strong evidence that the φ̂^{0.95}_{m_i,mcs} estimates are normally distributed. When Pearson's normality test is applied to the residuals, the resulting p-values, presented in Table 2, are 0.527 and 0.146, indicating that normality cannot be rejected for the φ̂^{0.95}_{1000,mcs} and φ̂^{0.95}_{20000,mcs} estimates.

Table 2 presents the large sample β̂ convergence rate estimates and normality test p-values for the BETA(1,3), BETA(2,4), and BETA(5,10) data generating processes with constant and variable returns to scale, one output, and numbers of inputs U = 1, 3, and 5. In Table 2, the estimated convergence rates β̂ are close to 0.5 for all DGP, returns to scale, and numbers of inputs U examined.
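The normality check can be illustrated as follows. Note one substitution: scipy's `normaltest` is the D'Agostino-Pearson K² test, used here only as a convenient stand-in for the Pearson chi-square normality test the paper reports, and the deterministic sample of exact normal quantiles is an illustrative input rather than the paper's Monte Carlo estimates.

```python
import numpy as np
from scipy import stats

# A deterministic, ideally normal sample: the n midpoint quantiles of N(0, 1).
n = 1000
probs = (np.arange(1, n + 1) - 0.5) / n
sample = stats.norm.ppf(probs)

# D'Agostino-Pearson K^2 normality test (stand-in for Pearson's test):
# large p-values mean normality is not rejected.
stat, p_value = stats.normaltest(sample)
print(p_value > 0.05)  # True: normality is not rejected
```

Applied to the 1000 Monte Carlo φ̂ estimates at each subsample size, this kind of test yields the p-values tabulated in Table 2.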
Table 3
DEA and qDEA-α benchmarking comparative input contractions by quantile.
qDEA further provides a direct means of estimating more robust DEA distance metrics. We also believe that qDEA will prove to be particularly useful in regulatory and benchmarking applications in that qDEA directly accommodates the estimation of quantile-based distance metrics while endogenously allowing proportions or quantiles (such as the top one, five, or ten percent) of DMUs to lie external to the resulting hyperplanes. Quantile-DEA provides a mechanism whereby a decision-maker or researcher can obtain information about the practical reachability of projected points by exploring the interior of the feasible production technology and identifying the proportion of DMUs that have been able to reach or exceed a qDEA performance metric. The use of qDEA will thus allow the analyst to directly "slice" rather than "peel" the onion of the DEA-revealed technology to the desired quantile depth using conventional LP-based procedures.

References

Adler, N., Liebert, V., & Yazhemsky, E. (2013). Benchmarking airports from a managerial perspective. Omega, 41(2), 442–458.
Anthonisz, S. (2012). Asset pricing with partial-moments. Journal of Banking & Finance, 36, 2122–2135.
Aparicio, J., Cordero, J., & Pastor, J. (2017). The determination of the least distance to the strongly efficient frontier in data envelopment analysis oriented models: Modelling and computational aspects. Omega, 71, 1–10.
Aparicio, J., & Pastor, J. (2014). Closest targets and strong monotonicity on the strongly efficient frontier in DEA. Omega, 44, 51–57.
Aparicio, J., Ruiz, J., & Sirvent, I. (2007). Closest targets and minimum distance to the Pareto-efficient frontier in DEA. Journal of Productivity Analysis, 28(3), 209–218.
Aragon, Y., Daouia, A., & Thomas-Agnan, C. (2005). Nonparametric frontier estimation: A conditional quantile-based approach. Econometric Theory, 21(2), 358–389.
Atwood, J. (1985). Demonstration of the use of lower partial moments to improve safety-first probability limits. American Journal of Agricultural Economics, 67, 787–793.
Atwood, J., Shaik, S., & Walden, J. (2017). Radial efficiency metrics using worst-case reference points. In Proceedings of the 15th European Workshop on Efficiency and Productivity Analysis (EWEPA), London, June 12–15.
Atwood, J., & Shaik, S. (2018). Quantile DEA: Estimating qDEA-alpha efficiency estimates with conventional linear programming. In Productivity and Inequality. New York: Springer.
Atwood, J., Watts, M., Helmers, G., & Held, L. (1988). Incorporating safety-first constraints in linear programming production models. Western Journal of Agricultural Economics, 13, 29–36.
Banker, R., Førsund, F., & Zhang, D. (2017). Use of data envelopment analysis for incentive regulation of electric distribution firms. Data Envelopment Analysis Journal, 3(1-2), 1–47.
Barr, R., Durchholz, M., & Seiford, L. (2000). Peeling the DEA onion: Layering and rank-ordering DMUs using tiered DEA. Southern Methodist University Technical Report.
Bawa, V. (1975). Optimal rules for ordering uncertain prospects. Journal of Financial Economics, 2(1), 95–121.
Bawa, V., & Lindenberg, E. (1977). Capital market equilibrium in a mean-lower partial moment framework. Journal of Financial Economics, 5, 189–200.
Behr, A. (2010). Quantile regression for robust bank efficiency score estimation. European Journal of Operational Research, 200(2), 568–581.
Berck, P., & Hihn, J. (1982). Using the semivariance to estimate safety-first rules. American Journal of Agricultural Economics, 64, 298–300.
Bogetoft, P. (2012). Performance benchmarking: Measuring and managing performance. New York: Springer.
Bogetoft, P., & Otto, L. (2011). Benchmarking with DEA, SFA, and R. New York: Springer.
Bottasso, A., & Conti, A. (2011). Quantitative techniques for regulatory benchmarking - A CERRE study. Brussels: CERRE.
Boxwell, R. (1994). Benchmarking for competitive advantage. New York: McGraw-Hill.
Cazals, C., Florens, J., & Simar, L. (2002). Nonparametric frontier estimation: A robust approach. Journal of Econometrics, 106, 1–25.
Chambers, R. G., Chung, Y., & Färe, R. (1996). Benefit and distance functions. Journal of Economic Theory, 70, 407–419.
Charnes, A., Cooper, W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2, 429–444.
Chen, Y. (2005). Measuring super-efficiency in DEA in the presence of infeasibility. European Journal of Operational Research, 161, 545–551.
Chen, Y., & Liang, L. (2011). Super-efficiency DEA in the presence of infeasibility: One model approach. European Journal of Operational Research, 213, 359–360.
Cook, W., Liang, L., Zha, Y., & Zhu, J. (2009). A modified super-efficiency DEA model for infeasibility. Journal of the Operational Research Society, 60, 276–281.
Cook, W., Ruiz, J., Sirvent, I., & Zhu, J. (2017). Within-group common benchmarking using DEA. European Journal of Operational Research, 256, 901–910.
Cooper, W., Seiford, L., & Tone, K. (2006). Introduction to data envelopment analysis and its uses. New York: Springer.
Cumova, D., & Nawrocki, D. (2014). Portfolio optimization in an upside potential and downside risk framework. Journal of Economics and Business, 71, 68–89.
Dai, X., & Kuosmanen, T. (2014). Best-practice benchmarking using clustering methods: Application to energy regulation. Omega, 42(1), 179–188.
Daouia, A., & Simar, L. (2005). Robust nonparametric estimators of monotone boundaries. Journal of Multivariate Analysis, 96, 311–331.
Daouia, A., & Simar, L. (2007). Nonparametric efficiency analysis: A multivariate conditional quantile approach. Journal of Econometrics, 140, 375–400.
Daraio, C., & Simar, L. (2007). Advanced robust and nonparametric methods in efficiency analysis. New York: Springer.
Fishburn, P. (1977). Mean-risk analysis with risk associated with below-target returns. American Economic Review, 67, 116–126.
Geyer, C. (2013). Statistics 5601 notes: The subsampling bootstrap. University of Minnesota. http://www.stat.umn.edu/geyer/5601/examp/subboot.html
Jradi, S., & Ruggiero, J. (2019). Stochastic data envelopment analysis: A quantile regression approach to estimate the production frontier. European Journal of Operational Research, 278(2), 385–393.
Lee, H., Chu, C., & Zhu, J. (2011). Super-efficiency DEA in the presence of infeasibility. European Journal of Operational Research, 212, 141–147.
Lee, H., Chu, C., & Zhu, J. (2012). Super-efficiency infeasibility and zero data in DEA. European Journal of Operational Research, 216, 429–433.
Lovell, C., & Rouse, A. (2003). Equivalent standard DEA models to provide super-efficiency scores. Journal of the Operational Research Society, 54(1), 101–108.
Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1), 77–91.
Nawrocki, D. (1991). Optimal algorithms and lower partial moment: Ex post results. Applied Economics, 23(3), 465–470.
Politis, D., Romano, J., & Wolf, M. (1999). Subsampling. New York: Springer.
Politis, D., Romano, J., & Wolf, M. (2001). On the asymptotic theory of subsampling. Statistica Sinica, 11, 1105–1124.
Ramón, N., Ruiz, J., & Sirvent, I. (2018). Two-step benchmarking: Setting more realistically achievable targets in DEA. Expert Systems with Applications, 92, 124–131.
Risk Management Association (2019). Annual statement studies - financial ratio benchmarks, Volume I 2017–2018. https://www.rmahq.org/uploadedFiles/Knowledge_Center/Publications_and_Tools/Statement_Studies/2017-18%20FRB%20Definition%20of%20Ratios.pdf
Rockafellar, R., & Uryasev, S. (2000). Optimization of conditional value-at-risk. Journal of Risk, 2, 21–41.
Ruiz, J., & Sirvent, I. (2016). Common benchmarking and ranking of units with DEA. Omega, 65, 1–9.
Seiford, L., & Zhu, J. (1999). Infeasibility of super-efficiency data envelopment analysis models. INFOR, 37, 174–187.
Simar, L. (2003). Detecting outliers in frontier models: A simple approach. Journal of Productivity Analysis, 20, 391–424.
Simar, L., & Wilson, P. (2011a). Estimation and inference in nonparametric frontier models: Recent developments and perspectives. Foundations and Trends in Econometrics, 5(3-4), 183–337.
Simar, L., & Wilson, P. (2011b). Inference by the m out of n bootstrap in nonparametric frontier models. Journal of Productivity Analysis, 36, 33–53.
Wang, Y., Wang, S., Dang, C., & Ge, W. (2014). Nonparametric quantile frontier estimation under shape restriction. European Journal of Operational Research, 232(3), 671–678.
Wheelock, D., & Wilson, P. (2008). Non-parametric, unconditional quantile estimation for efficiency analysis with an application to Federal Reserve check processing operations. Journal of Econometrics, 145, 209–225.
Wilson, P. (1993). Detecting outliers in deterministic nonparametric frontier models with multiple outputs. Journal of Business and Economic Statistics, 11, 319–323.
Wilson, P. (1995). Detecting influential observations in data envelopment analysis. Journal of Productivity Analysis, 6, 27–45.
Zanella, A., Camanho, A., & Dias, T. (2013). Benchmarking countries' environmental performance. Journal of the Operational Research Society, 64(3), 426–438.
Zhu, S., Li, D., & Wang, S. (2009). Robust portfolio selection under downside risk measures. Quantitative Finance, 9(7), 869–885.