
European Journal of Operational Research 286 (2020) 649–661

Contents lists available at ScienceDirect

European Journal of Operational Research


journal homepage: www.elsevier.com/locate/ejor

Decision Support

Theory and statistical properties of Quantile Data Envelopment Analysis

Joseph Atwood a,∗, Saleem Shaik b
a Department of Agricultural Economics and Economics, Montana State University, 208 Linfield Hall, P.O. Box 172920, Bozeman 59717-2920, MT, United States
b Department of Agribusiness & Applied Economics, Center for Agricultural Policy and Trade Studies, North Dakota State University, 400 F Barry Hall, Dept 7610, Fargo PO Box 6050, ND 58108-6050, United States

Article info

Article history:
Received 3 May 2019
Accepted 28 March 2020
Available online 12 April 2020

Keywords:
Benchmarking
Data envelopment analysis
Partial moments
Quantile DEA
Robust DEA estimator

Abstract

This research proposes Quantile Data Envelopment Analysis (qDEA) as a procedure that accounts for the sensitivity of Data Envelopment Analysis (DEA) to data or firm outliers when using DEA to estimate comparative efficiency or benchmarking performance metrics. The qDEA methodology endogenously identifies the distance to a qDEA-α hyperplane while allowing up to proportion q = 1 - α of the data observations to lie external to the qDEA-α hyperplane. The ability of qDEA to provide more conventional quantile-based benchmarking information is discussed. The statistical properties of the qDEA estimator are examined utilizing nCm subsampling and Monte Carlo procedures. Monte Carlo simulations indicate that qDEA distance estimates share the desirable root-n convergence and large sample normality properties of the robust Free Disposal Hull (FDH) based order-m and order-α estimators.

© 2020 Elsevier B.V. All rights reserved.

∗ Corresponding author.
E-mail addresses: jatwood@montana.edu (J. Atwood), Saleem.Shaik@ndsu.edu (S. Shaik).

https://doi.org/10.1016/j.ejor.2020.03.077
0377-2217/© 2020 Elsevier B.V. All rights reserved.

Introduction

Data Envelopment Analysis (DEA) has proven to be a useful procedure for constructing comparative performance metrics in situations where entities or DMUs are engaged in processes that transform sets of inputs into sets of outputs. DEA is also increasingly being utilized in benchmarking and regulatory applications (Adler, Liebert, & Yazhemsky, 2013; Banker & Zhang, 2017; Bogetoft, 2012; Bogetoft & Otto, 2011; Bottasso & Conti, 2011; Zanella, Camanho, & Dias, 2013). In regulatory applications, a firm whose inputs are too high or outputs too low relative to other firms in a reference group may be given a period of time to make improvements in its current input-output mix or face financial penalties. In benchmarking applications, a given decision making unit (DMU) may wish to compare its input-output mix to the levels obtained by other DMUs and attempt to identify potential changes in its operation that might be technically obtainable or reachable in the short, intermediate, or long run. In both cases, a reference set of observed input-output realizations from other DMUs is used to construct a set of potential input-output combinations that might be technically obtainable or reachable by the given DMU.

A complication with using DEA in a regulatory or benchmarking setting is the well-known sensitivity of DEA to data outliers. For the purposes of this paper, we define data outliers consistently with Bogetoft and Otto's discussion (2011, page 147) as either (1) statistical anomalies, (2) observations with high leverage, or (3) data from outlier DMUs whose observed input-output levels are not practically replicable by a given DMU.¹ Concerns with the sensitivity of distance metrics to statistical outliers have led to the development of procedures to identify and delete potential outliers (Bogetoft & Otto, 2011; Simar, 2003; Wilson, 1993; 1995) as well as the development of more robust estimators including the Free Disposal Hull (FDH) based order-m and order-α estimators (Aragon, Daouia, & Thomas-Agnan, 2005; Cazals, Florens, & Simar, 2002; Daouia & Simar, 2005; 2007; Daouia & Simar, 2007b; Wheelock & Wilson, 2008).

¹ While we agree with Bogetoft and Otto and a reviewer that firms classified as outliers using the third criterion should be examined as potentially informative with respect to potential innovations, procedures for analyzing the characteristics of such firms are beyond the scope of this paper. Rather, this paper focuses on procedures useful in addressing Bogetoft and Otto's stated need (page 149) for a procedure to endogenously identify and accommodate groups of such potentially outlying firms.

Concerns with the practical reachability of points on DEA efficient frontiers have led to the use of clustering procedures where DMUs are compared only to sets of DMUs with similar characteristics (Cook, Ruiz, Sirvent, & Zhu, 2017; Dai & Kuosmanen, 2014) or
to procedures that endogenously identify directions to points that lie on the efficient DEA frontier but are closer or closest to a given DMU's data point (Aparicio, Cordero, & Pastor, 2017; Aparicio & Pastor, 2014; Aparicio, Ruiz, & Sirvent, 2007; Ramón, Ruiz, & Sirvent, 2018; Ruiz & Sirvent, 2016). Even when a closest frontier point generated by efficient points is identified, concerns over the reachability of such points have led some researchers to suggest procedures where sets of efficient points are iteratively deleted and the closest projected points generated with the remaining sets of data (Ramón et al., 2018). Ramón, Ruiz, and Sirvent cite several papers that iteratively eliminate efficient points including a paper by Barr, Durchholz, and Seiford (2000) where the process is termed "peeling the onion".

The sensitivity of comparative performance metrics to non-representative observations or outliers is not unique to DEA. Any comparative study will be sensitive to outliers if a given DMU's performance is contrasted to the performance of the highest one or two achievers within a set of realized outcomes. Recognizing this potential, it is common practice in conventional benchmarking applications (Boxwell, 1994; Risk Management Association, 2019) to contrast a given DMU's performance metrics to metrics constructed from a quantile of the population such as the top five, ten, 25 or even 50 percent of DMUs in the industry rather than to the top one or two historically performing DMUs.

The use of quantile benchmarking is desirable in that it decreases the sensitivity of the performance comparisons to outliers and also allows a given firm to compare itself to different peers in situations where the performance levels of the top DMUs may not be viewed as replicable or reachable due to unobserved firm-specific constraints. Quantile-based comparisons provide useful information to a given DMU in that a benchmark may be viewed as more obtainable or reachable if 10% or more of a group of DMUs have achieved or exceeded the benchmark than if only one or two exceptional DMUs have attained the standard.

While quantile-based benchmarking is a common practice for more conventional performance metrics, procedures for implementing quantile-based benchmarking with DEA are more difficult, especially when the analyst wishes to simultaneously identify sets of more than one potentially outlying firm. Quantile-regression, order-m and order-α quantile-based estimators that endogenously identify and address sets of outlying data have been developed with stochastic frontier or stochastic DEA models (Behr, 2010; Jradi & Ruggiero, 2019; Wang, Wang, Dang, & Ge, 2014) or Free Disposal Hull (FDH) assumptions (Cazals et al., 2002; Daouia & Simar, 2005; 2007; Daraio & Simar, 2007; Wheelock & Wilson, 2008) but quantile estimators have not been commonly available with linear programming based DEA models that accommodate multiple inputs and multiple outputs.

The traditional DEA model can be interpreted as a procedure for estimating the distance, in a given direction, from a given DMU's data point to a projected point lying on an endogenously identified hyperplane while requiring that all reference set data points defining the possible input-output technology set lie on or within the same half-space of the hyperplane. Atwood and Shaik (2018) recently introduced the linear programming quantile DEA (qDEA) procedure in which quantile DEA distances are obtained by computing the distance, in a given direction, from a given DMU's data point to a point on an endogenously identified hyperplane termed the qDEA-α hyperplane. The qDEA-α hyperplane divides input-output space into the qDEA-α and qDEA-q half-spaces where the qDEA-α half-space (in union with the qDEA-α hyperplane) contains at least proportion α of the reference set's observed input-output data and the qDEA-q half-space contains the remaining proportion q = 1 - α of the reference set's observed data. We refer to points lying in the qDEA-q half-space as being external to the qDEA-α hyperplane and points lying in the qDEA-α half-space as being internal to the qDEA-α hyperplane.

The qDEA approach is similar in concept to the FDH based order-α approach in that the qDEA process allows the user to obtain more robust DEA distance metrics by allowing endogenously identified sets of reference observations to lie external to the qDEA-α hyperplane. Additionally, the conventional DEA process can be viewed as a special case of qDEA in which q = 0 or α = 1 and no observed data points are allowed to be external to the qDEA-1 hyperplane. By endogenously identifying and allowing sets of external points to be qDEA-α superefficient, qDEA directly addresses the problem of identifying sets or groups of outlier firms as discussed by Bogetoft and Otto (page 149).

Extending the work of Atwood and Shaik (2018), we discuss the theory, development, usefulness, and limitations of qDEA in greater depth as well as examine the qDEA estimates' desirable statistical properties of apparent root-n convergence and asymptotic normality. In the following sections we specifically: (1) motivate the qDEA problem by discussing a numerical example borrowed from (Cooper, Seiford, & Tone, 2006) and modified to include a "data outlier", (2) present two partial moment stochastic inequalities and their application in constructing linear programming (LP) models that endogenously identify sets of LP constraints to be violated, (3) incorporate the partial moment inequality into DEA to develop the qDEA model and demonstrate procedures to obtain qDEA solutions for the example problem, (4) present procedures and the results of a set of simulations numerically examining the statistical properties of the qDEA estimator, and (5) briefly present an example demonstrating how qDEA benchmarking might be used in a policy analysis or regulatory application. We conclude with a set of caveats and suggestions with respect to utilizing quantile DEA in an applied/regulatory benchmarking setting and potential future research.

1. Motivating qDEA: a numerical example

We demonstrate the implementation of the qDEA procedure using a modification of Cooper, Seiford, and Tone's (2007, pages 26-27) Constant-Returns-to-Scale (CRS) single-input (employees) single-output (sales) eight DMU example. Given that the traditional DEA input and output orientation results can be obtained as special cases of the Directional DEA (DDEA) (Chambers, Chung, & R., 1996) model, all mathematical results and examples in the following discussion will utilize DDEA. Since qDEA identifies "q-super efficient points", we first review primal and dual DDEA models that accommodate super-efficiency.

A primal DDEA model that accommodates potential super-efficiency for DMU-0 with V outputs, U inputs, n reference DMUs, and constant returns to scale (CRS) can be mathematically described as:

\[
\begin{aligned}
\max\ & \phi \\
\text{s.t.}\ & \\
(a)\ & \sum_{j=1}^{n} y_v^j z_j - d_v^0 \phi \ge y_v^0 \quad \text{for } v = 1,\dots,V \\
(b)\ & \sum_{j=1}^{n} x_u^j z_j + d_u^0 \phi \le x_u^0 \quad \text{for } u = 1,\dots,U \\
(c)\ & z_j \ge 0 \ \text{for } j = 1,\dots,n;\quad \phi \ \text{free}
\end{aligned}
\tag{1}
\]

where φ is the directional distance for DMU-0, $y_v^j$ and $x_u^j$ are observed levels of output v and input u for DMU j, $d_v^0$ and $d_u^0$ are DMU-0's output v and input u directions, $y_v^0$ and $x_u^0$ are DMU-0's observed levels for output v and input u, and $z_j$ is a projection weight assigned to reference DMU j. We assume that $y_v^j$ and $x_u^j$
are goods for all j, v, and u, implying that the DDEA directions $d_v^0$ and $d_u^0$ are non-negative.

Fig. 1. Cooper, Seiford, and Tone's single-input single-output example with DDEA directions and qDEA CRS hyperplanes.

In developing qDEA, we utilize Charnes, Cooper, and Rhodes' (1978) (CCR) dual LP approach. The dual linear program of system (1) can be written as:

\[
\begin{aligned}
\min\ & \sum_{v=1}^{V} -y_v^0 p_v + \sum_{u=1}^{U} x_u^0 w_u \\
\text{s.t.}\ & \\
(a)\ & \sum_{v=1}^{V} y_v^j p_v - \sum_{u=1}^{U} x_u^j w_u \le 0 \quad \text{for } j = 1,\dots,n \\
(b)\ & \sum_{v=1}^{V} d_v^0 p_v + \sum_{u=1}^{U} d_u^0 w_u = 1 \\
(c)\ & p_v \ge 0 \ \text{for } v = 1,\dots,V;\quad w_u \ge 0 \ \text{for } u = 1,\dots,U
\end{aligned}
\tag{2}
\]

where the dual values $p_v$ and $w_u$ are CCR output and input prices. We note that if the analyst knew the set of qDEA-α external points a priori, $\hat{\phi}_n^{\alpha}$ distance estimates could be obtained by deleting observations from system (1) or constraints from inequalities (2-a). Alternatively, $\hat{\phi}_n^{\alpha}$ estimates could be obtained by solving system (1) with a set of large negative Big-M values placed in the corresponding primal $z_j$ objective function locations or a set of large positive Big-M values placed in the corresponding right-hand-side locations in system (2-a). As it is unlikely that the appropriate set of superefficient points is known a priori, we present linear programming procedures that can endogenously identify sets of qDEA-α external points.

Table 1 presents modified input-output levels for CST's example problem (CST's pages 26-27) with the input-output mix for point B modified to a more extremal value of (X, Y) = (3, 4). Fig. 1 reproduces CST's figure 2.1 with the modified data. When no points are allowed to lie external to the qDEA hyperplane, the resulting CRS qDEA-1 hyperplane for all DMUs is the red line defined by the origin and point B. If one external point is allowed (α = 7/8), the qDEA model endogenously chooses to allow point B to lie external to the hyperplane with the resulting qDEA-(7/8) hyperplane for all DMUs being the black line defined by the origin and point E. If two external points are allowed (α = 6/8), the qDEA model endogenously chooses to allow points B and E to lie external to the hyperplane with the resulting qDEA-(6/8) hyperplane for all DMUs being the blue line defined by the origin and point D. We return to this example below when we present procedures enabling the identification of the qDEA-α hyperplanes in Fig. 1.

Specifying φ as a free variable with $d_v^0 \ge 0$ and $d_u^0 \ge 0$ allows movement in both the northwest and southeast directions indicated by the black and red arcs in Fig. 1. If $\phi^{j,\alpha} < 0$,² DMU j's data point is α-superefficient, indicating that movement in a red arc direction is required to reach DMU j's qDEA-α hyperplane. In general, a qDEA distance $\phi^{j,\alpha} < 0$, $\phi^{j,\alpha} = 0$, or $\phi^{j,\alpha} > 0$ indicates that DMU j's data point is, respectively, external to (superefficient), on (efficient), or internal to (inefficient) its qDEA-α hyperplane. Fig. 1 plots the feasible direction arcs for DMUs A-H. In the following discussion, we report the estimated CRS $\hat{\phi}^{j,\alpha}$ distances for movements in horizontal (input), vertical (output), and one-one directions indicated by the dashed lines in Fig. 1.

² In the following discussion, the notation $\phi$ denotes the conventional DDEA distance for DMU-0, $\phi^{\alpha}$ denotes DMU-0's directional distance to the qDEA-α hyperplane estimated while allowing up to proportion q = 1 - α of the reference set's input-output observations to lie outside the qDEA-α hyperplane, and $\phi^{j,\alpha}$ denotes the qDEA-α distance for DMU j. The notation $\hat{\phi}_n$, $\hat{\phi}_n^{\alpha}$, and $\hat{\phi}_n^{j,\alpha}$ denotes distances estimated using a sample of size n.

2. A partial moment stochastic inequality and its application in DEA LP models

To endogenously identify a set of qDEA-α external points, we modify the probabilistically constrained linear programming problem presented by Atwood, Watts, Helmers, and Held (1988) (AWHH). As indicated below, while AWHH's procedures were developed as a risk model, their procedures are broadly applicable in non-stochastic linear programs where the analyst wishes to allow a pre-specified proportion of endogenously identified constraints to be violated. This is accomplished by utilizing the partial moment stochastic inequality presented by Atwood (1985). In the following section, we review Atwood's partial moment inequality and its incorporation in AWHH's probabilistically constrained LP problem.

2.1. Partial moment stochastic inequalities

Partial moments can be defined as:

\[
\rho_{LPM}(\gamma,t) = \int_{-\infty}^{t} (t-x)^{\gamma} f(x)\,dx
\quad\text{and}\quad
\rho_{UPM}(\gamma,t) = \int_{t}^{\infty} (x-t)^{\gamma} f(x)\,dx
\tag{3}
\]

for any γ > 0 and where $\rho_{LPM}(\gamma,t)$ is a Lower Partial Moment (LPM), $\rho_{UPM}(\gamma,t)$ is an Upper Partial Moment (UPM), x is a random variable, and f(x) is a density function. Mean-Partial Moment models are discussed in the finance literature (Anthonisz, 2012; Bawa, 1975; Bawa & Lindenberg, 1977; Cumova & Nawrocki, 2014; Nawrocki, 1991; Zhu, Li, & Wang, 2009) as alternatives to Markowitz's (1952) mean-variance criterion. Fishburn (1972) discussed several properties of LPMs including the relationship between mean-LPM efficient solutions and differing degrees of stochastic dominance. Berck and Hihn (1982) presented a stochastic inequality using semi-variance (i.e. $\rho_{LPM}(2,\mu_x)$ or $\rho_{UPM}(2,\mu_x)$) and demonstrated that their semi-variance based inequality often gave less conservative probability bounds than the one-sided Chebychev inequality. Atwood (1985) generalized Berck and Hihn's (BH) semi-variance stochastic inequality using the LPM. The LPM stochastic inequality is derived via:

Table 1
Modified CST example: input-output levels, CRS DDEA and qDEA distances, and projected points for zero, one, and two external points. In each block the three "Distance" columns are, from left to right, φ̂^j (no external point), φ̂^{j,7/8} (one external point), and φ̂^{j,6/8} (two external points); x̃ and ỹ are the projected values.

Input orientation
             No external point        One external point       Two external points
DMU  x  y    Distance   x̃      ỹ      Distance   x̃      ỹ      Distance   x̃      ỹ
A    2  1     0.625    0.75   1.00     0.375    1.25   1.00     0.333    1.33   1.00
B    3  4     0.000    3.00   4.00    -0.667    5.00   4.00    -0.778    5.33   4.00
C    3  2     0.500    1.50   2.00     0.167    2.50   2.00     0.111    2.67   2.00
D    4  3     0.437    2.25   3.00     0.062    3.75   3.00     0.000    4.00   3.00
E    5  4     0.400    3.00   4.00     0.000    5.00   4.00    -0.067    5.33   4.00
F    5  2     0.700    1.50   2.00     0.500    2.50   2.00     0.467    2.67   2.00
G    6  3     0.625    2.25   3.00     0.375    3.75   3.00     0.333    4.00   3.00
H    8  5     0.531    3.75   5.00     0.219    6.25   5.00     0.167    6.67   5.00

Output orientation
             No external point        One external point       Two external points
DMU  x  y    Distance   x̃      ỹ      Distance   x̃      ỹ      Distance   x̃      ỹ
A    2  1     1.667    2.00   2.67     0.600    2.00   1.60     0.500    2.00   1.50
B    3  4     0.000    3.00   4.00    -0.400    3.00   2.40    -0.437    3.00   2.25
C    3  2     1.000    3.00   4.00     0.200    3.00   2.40     0.125    3.00   2.25
D    4  3     0.778    4.00   5.33     0.067    4.00   3.20     0.000    4.00   3.00
E    5  4     0.667    5.00   6.67     0.000    5.00   4.00    -0.062    5.00   3.75
F    5  2     2.333    5.00   6.67     1.000    5.00   4.00     0.875    5.00   3.75
G    6  3     1.667    6.00   8.00     0.600    6.00   4.80     0.500    6.00   4.50
H    8  5     1.133    8.00  10.67     0.280    8.00   6.40     0.200    8.00   6.00

One-one orientation
             No external point        One external point       Two external points
DMU  x  y    Distance   x̃      ỹ      Distance   x̃      ỹ      Distance   x̃      ỹ
A    2  1     0.714    1.29   1.71     0.333    1.67   1.33     0.286    1.71   1.29
B    3  4     0.000    3.00   4.00    -0.889    3.89   3.11    -1.000    4.00   3.00
C    3  2     0.857    2.14   2.86     0.222    2.78   2.22     0.143    2.86   2.14
D    4  3     1.000    3.00   4.00     0.111    3.89   3.11     0.000    4.00   3.00
E    5  4     1.143    3.86   5.14     0.000    5.00   4.00    -0.143    5.14   3.86
F    5  2     2.000    3.00   4.00     1.111    3.89   3.11     1.000    4.00   3.00
G    6  3     2.143    3.86   5.14     1.000    5.00   4.00     0.857    5.14   3.86
H    8  5     2.429    5.57   7.43     0.778    7.22   5.78     0.571    7.43   5.57
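
The entries of Table 1 can be cross-checked with a short sketch. It is not the paper's LP procedure: it relies on a shortcut that holds only for this single-input, single-output CRS example, where every qDEA hyperplane is a ray through the origin and allowing k external points makes the frontier slope the (k+1)-th largest y/x ratio (assuming no ties among the largest ratios). The function names `frontier_slope`, `qdea_distance`, and `projected_point` are illustrative and do not come from the paper.

```python
# Modified CST data from Table 1: DMU -> (input x, output y).
DMUS = {"A": (2, 1), "B": (3, 4), "C": (3, 2), "D": (4, 3),
        "E": (5, 4), "F": (5, 2), "G": (6, 3), "H": (8, 5)}


def frontier_slope(n_external, data=DMUS):
    """Slope of the CRS ray when n_external points may lie above it.

    Shortcut for one input and one output only: the (n_external + 1)-th
    largest y/x ratio (assumes no ties among the largest ratios).
    """
    ratios = sorted((y / x for x, y in data.values()), reverse=True)
    return ratios[n_external]


def direction(name, orientation, data=DMUS):
    """Directional vector (dx, dy) for the three orientations of Table 1."""
    x0, y0 = data[name]
    return {"input": (x0, 0.0), "output": (0.0, y0),
            "one-one": (1.0, 1.0)}[orientation]


def qdea_distance(name, n_external, orientation, data=DMUS):
    """Solve y0 + phi*dy = slope * (x0 - phi*dx) for phi."""
    x0, y0 = data[name]
    dx, dy = direction(name, orientation, data)
    slope = frontier_slope(n_external, data)
    return (slope * x0 - y0) / (dy + slope * dx)


def projected_point(name, n_external, orientation, data=DMUS):
    """Projected point (x0 - phi*dx, y0 + phi*dy) on the ray."""
    x0, y0 = data[name]
    dx, dy = direction(name, orientation, data)
    phi = qdea_distance(name, n_external, orientation, data)
    return x0 - phi * dx, y0 + phi * dy


if __name__ == "__main__":
    for k in (0, 1, 2):
        phi = qdea_distance("H", k, "one-one")
        px, py = projected_point("H", k, "one-one")
        print(f"DMU H, {k} external: phi={phi:.3f}, projected=({px:.2f}, {py:.2f})")
```

With more than one input or output the frontier is no longer a single ray, and the two-stage LP procedure developed in the following sections is needed instead of this ratio ranking.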
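
\[
\begin{aligned}
\rho_{LPM}(\gamma,t) &= \int_{-\infty}^{t} (t-x)^{\gamma} f(x)\,dx
 = \int_{-\infty}^{g} (t-x)^{\gamma} f(x)\,dx + \int_{g}^{t} (t-x)^{\gamma} f(x)\,dx \;\Rightarrow \\
\rho_{LPM}(\gamma,t) &\ge \int_{-\infty}^{g} (t-x)^{\gamma} f(x)\,dx
 \ge \int_{-\infty}^{g} (t-g)^{\gamma} f(x)\,dx
 = (t-g)^{\gamma} \int_{-\infty}^{g} f(x)\,dx \;\Rightarrow \\
\rho_{LPM}(\gamma,t) &\ge (t-g)^{\gamma} F(g) \;\Rightarrow \\
\operatorname{Prob}(x \le g) &\le \frac{\rho_{LPM}(\gamma,t)}{(t-g)^{\gamma}}
 \quad \text{for all } t > g \text{ and } \gamma > 0
\end{aligned}
\tag{4}
\]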
The preceding results use Fishburn’s lower partial moment but
can easily be modified to the use of upper partial moments (UPM)
when computing limits on the probability of upside events. With
the UPM and setting g > t, similar manipulations give:
\[
\operatorname{Prob}(x \ge g) \le \frac{\rho_{UPM}(\gamma,t)}{(g-t)^{\gamma}}
 \quad \text{for all } g > t \text{ and } \gamma > 0
\tag{5}
\]
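
The "similar manipulations" behind (5) can be spelled out as follows. This is a sketch that simply mirrors the LPM steps leading to (4), using the UPM definition in (3) and g > t:

\[
\begin{aligned}
\rho_{UPM}(\gamma,t) &= \int_{t}^{\infty} (x-t)^{\gamma} f(x)\,dx
 \ge \int_{g}^{\infty} (x-t)^{\gamma} f(x)\,dx \\
&\ge \int_{g}^{\infty} (g-t)^{\gamma} f(x)\,dx
 = (g-t)^{\gamma}\,\operatorname{Prob}(x \ge g).
\end{aligned}
\]

The first inequality drops the non-negative integral over [t, g], and the second uses $(x-t)^{\gamma} \ge (g-t)^{\gamma}$ for x ≥ g > t and γ > 0. Dividing both sides by $(g-t)^{\gamma} > 0$ gives (5).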
The PM inequalities are interesting results that (with an appropriate choice of γ and t) will usually generate less conservative upper bounds than a one-sided Chebychev inequality $\operatorname{Prob}(x \le \mu - k\sigma) \le \frac{1}{1+k^{2}}$ or the BH semi-variance inequality $\operatorname{Prob}\left(x \le \mu - k\sqrt{\rho_{LPM}(2,\mu)}\right) \le \frac{1}{k^{2}}$.
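
A minimal numerical sketch of these comparisons follows, assuming x is Normal with mean 100 and standard deviation 25 and g = 60 (the setting the text uses for Fig. 2). The closed-form normal partial moments in `lpm1` and `lpm2` are standard results supplied here as an assumption rather than taken from the paper, and all function and variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA, G = 100.0, 25.0, 60.0   # setting used in the Fig. 2 discussion
dist = NormalDist(MU, SIGMA)
std = NormalDist()                  # standard normal for cdf/pdf values


def lpm1(t):
    """Linear LPM, E[max(t - x, 0)], for x ~ Normal(MU, SIGMA)."""
    z = (t - MU) / SIGMA
    return SIGMA * (z * std.cdf(z) + std.pdf(z))


def lpm2(t):
    """Quadratic LPM, E[max(t - x, 0)^2], for x ~ Normal(MU, SIGMA)."""
    z = (t - MU) / SIGMA
    return SIGMA ** 2 * ((z * z + 1.0) * std.cdf(z) + z * std.pdf(z))


def lpm_bound(gamma, t):
    """Right-hand side of inequality (4) for gamma in {1, 2} and t > G."""
    rho = lpm1(t) if gamma == 1 else lpm2(t)
    return rho / (t - G) ** gamma


actual = dist.cdf(G)                 # true Prob(x <= 60)
k = (MU - G) / SIGMA                 # g lies k standard deviations below the mean
chebychev = 1.0 / (1.0 + k * k)      # one-sided Chebychev bound
bh = lpm_bound(2, MU)                # BH semi-variance bound (gamma = 2, t = mean)

# Grid search over t for the least conservative bound at each gamma.
grid = [G + 0.1 * i for i in range(1, 801)]
best_t1 = min(grid, key=lambda t: lpm_bound(1, t))
best1 = lpm_bound(1, best_t1)
best2 = min(lpm_bound(2, t) for t in grid)

if __name__ == "__main__":
    print(f"actual probability     {actual:.4f}")
    print(f"best linear LPM bound  {best1:.4f} at t = {best_t1:.1f}")
    print(f"best quadratic bound   {best2:.4f}")
    print(f"BH semi-variance bound {bh:.4f}")
    print(f"one-sided Chebychev    {chebychev:.4f}")
```

In this particular setting the bounds order as actual < best linear LPM < best quadratic LPM < BH < one-sided Chebychev, and the least conservative linear bound coincides (up to the grid resolution) with F(t) at the minimizing t.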
Fig. 2. Lower Partial Moment (LPM) probability bounds.

Fig. 2 illustrates these results by plotting stochastic inequality probability bounds on the probability of x falling below g = 60
with x ∼ Normal(100, 25). Fig. 2 plots the probability bounds for the one-sided Chebychev inequality as well as the LPM based $\frac{\rho_{LPM}(\gamma,t)}{(t-g)^{\gamma}}$ probability limits for various combinations of γ and t. Fig. 2 also contrasts the probability bounds to the actual probability pnorm(60, mean = 100, sd = 25) ≅ 0.055. The BH semivariance probability bound is the right-most point on the red curve and is less conservative than the Chebychev one-tailed limit. For each γ, less conservative LPM-based probability bounds can be found depending upon the level of t but an inappropriate level for t may also generate excessively conservative probability bounds. In situations where the Chebychev or the BH inequality bounds are exact, the LPM inequality will generate exact probability bounds for an appropriate choice of t and γ.

The linear partial moment inequality (γ = 1) can often generate less conservative bounds than using higher order partial moments. We limit our discussion to the set of linear partial moments as the linear partial moment can be computed in an LP model with a finite discrete set of outcomes. Atwood (1985) and AWHH (1988) demonstrated that, with a linear PM, a given application's least conservative level for t could be endogenously determined in a linear programming model. Denoting the linear partial moments as $\rho_{LPM}(1,t) = \rho_{LPM}(t)$ and $\rho_{UPM}(1,t) = \rho_{UPM}(t)$, we can rewrite the probability limits (4) and (5) as:

\[
\operatorname{Prob}(x \le g) \le \frac{\rho_{LPM}(t)}{t-g},\ \ t-g > 0
\quad\text{and}\quad
\operatorname{Prob}(x \ge g) \le \frac{\rho_{UPM}(t)}{g-t},\ \ g-t > 0
\tag{6}
\]

Enforcing constraints:

\[
t - \tfrac{1}{q}\,\rho_{LPM}(t) \ge g \quad\text{and}\quad \rho_{LPM}(t) \ge 0
\tag{7}
\]

or

\[
t + \tfrac{1}{q}\,\rho_{UPM}(t) \le g \quad\text{and}\quad \rho_{UPM}(t) \ge 0
\tag{8}
\]

while allowing the LP model to endogenously determine the least constraining partial moment limit $\tilde{t}$ and compute the linear partial moments $\rho_{LPM}(\tilde{t})$ or $\rho_{UPM}(\tilde{t})$ is sufficient to guarantee that $\operatorname{Prob}(x \le g) \le \frac{\rho_{LPM}(\tilde{t})}{\tilde{t}-g} \le q$ (with $\tilde{t}-g > 0$) or $\operatorname{Prob}(x \ge g) \le \frac{\rho_{UPM}(\tilde{t})}{g-\tilde{t}} \le q$ (with $g-\tilde{t} > 0$).

2.2. Incorporating partial moments into probabilistically constrained LP problems

AWHH utilized the linear LPM inequality results in constructing a model that maximized the expected income of a portfolio of assets subject to a set of technical constraints and the additional requirement that the probability of income falling below a target level g not exceed q. Modifying their notation slightly, AWHH's portfolio optimization model with J assets, I portfolio constraints, and K states of nature can be written as:

\[
\begin{aligned}
\max_{z}\ & \mu_x = \sum_{j=1}^{J} \mu_{z,j}\, z_j \\
\text{s.t.}\ & \\
(a)\ & \sum_{j=1}^{J} a_{i,j}\, z_j \le b_i \quad i = 1,\dots,I \\
(b)\ & \sum_{j=1}^{J} y_{k,j}\, z_j - t + \delta_k \ge 0 \quad k = 1,\dots,K \\
(c)\ & \sum_{k=1}^{K} r_k \delta_k - \rho_{LPM} = 0 \\
(d)\ & t - \tfrac{1}{q}\,\rho_{LPM} \ge g \\
(e)\ & z_j \ge 0 \ \text{for } j = 1,\dots,J;\quad t \ \text{free};\quad \delta_k \ge 0;\quad \rho_{LPM} \ge 0
\end{aligned}
\tag{9}
\]

where $x_k = \sum_{j=1}^{J} y_{k,j} z_j$ is portfolio income in state k, $\mu_x$ is the mean portfolio income, $\mu_{z,j}$ is asset j's mean income, $z_j$ is the chosen level of asset j, $a_{i,j}$ and $b_i$ are portfolio constraint coefficients, $y_{k,j}$ is the return to asset j in state k, t is an endogenously determined linear LPM integral limit, $\delta_k$ is an income deviation below level t with $\delta_k = t - \sum_{j=1}^{J} y_{k,j} z_j = t - x_k$ when $x_k < t$ and zero otherwise, $r_k$ is a non-negative probability of state k satisfying $\sum_{k=1}^{K} r_k = 1$, $\rho_{LPM} = \rho_{LPM}(t)$ is the endogenously computed linear LPM, and g is a level of income that must be exceeded with probability α = 1 − q. System (9) selects a vector of optimal portfolio levels $\tilde{z}$ that map to a vector of potential portfolio income realizations $\tilde{x}$. The model maximizes the portfolio's expected income $\mu_x$ subject to technical portfolio constraints and the constraint $\tilde{t} - \tfrac{1}{q}\rho_{LPM}(\tilde{t}) \ge g$, thus guaranteeing $\operatorname{Prob}(\tilde{x} \le g) \le q$.

The portfolios and potential income levels $\tilde{x}$ chosen in system (9) were found to satisfy the probability constraint but were often found to be excessively conservative in that the actual probability of portfolio income falling below g was frequently much lower than q. The reason for this conservativeness can be explained using insights from two reviewers who correctly pointed out that AWHH's (1988) model is similar to Rockafellar and Uryasev's (2000) conditional Value at Risk (cVaR) model. This perception is correct in that an upside-risk version of system (9) can be shown to be equivalent to a $cVaR_q$ constrained version of Rockafellar and Uryasev's (RU) model.³ With an upside risk model of losses, Rockafellar and Uryasev's results imply that the optimal $\tilde{t}$ and g terms in the binding upside risk constraint $\tilde{t} + \tfrac{1}{q}\rho_{UPM}(\tilde{t}) = g$ are, respectively, the solution's q-level Value at Risk ($VaR_q$) and the associated conditional Value at Risk ($cVaR_q$).⁴

³ Rockafellar and Uryasev (2000) presented a linear programming model that minimized the upside or loss-risk $cVaR_q$ while endogenously identifying the $cVaR_q$'s associated $VaR_q$ level. Their objective function is equivalent to minimizing $t + \tfrac{1}{q}\rho_{UPM}(t) = cVaR_q(t)$ with $t = VaR_q$. Rockafellar and Uryasev's model can be modified to optimize an objective function subject to a maximal $cVaR_q$ by imposing the constraint $t + \tfrac{1}{q}\rho_{UPM}(t) \le g$, which is equivalent to the UPM constraint (8) derived previously.

⁴ The $\tilde{t} = VaR_q$ and $g = cVaR_q$ results can easily be demonstrated for system (9). System (9) searches over potential portfolios while endogenously identifying the $\tilde{t}$ level associated with the least conservative or minimal level of the probability limit $\frac{\rho_{LPM}(t)}{t-g}$. Note that $\rho_{LPM}(t) = \int_{-\infty}^{t} (t-x) f(x)\,dx = F(t)\,[t - E\{x \mid x \le t\}]$ and $\frac{d}{dt}\rho_{LPM}(t) = F(t)$. Taking the derivative of the linear LPM probability bound with respect to t and setting the result equal to zero gives: $\frac{d}{dt}\frac{\rho_{LPM}(t)}{t-g} = \frac{F(t)(t-g) - \rho_{LPM}(t)}{(t-g)^{2}} = \frac{F(t)(t-g) - F(t)[t - E\{x \mid x \le t\}]}{(t-g)^{2}} = 0 \Leftrightarrow g = E\{x \mid x \le \tilde{t}\}$, with the second derivative of $\frac{\rho_{LPM}(t)}{t-g}$ with respect to t (evaluated at the minimal $\tilde{t}$) being strictly positive. Assuming the sufficiency constraint is binding, i.e. $\tilde{t} - \tfrac{1}{q}\rho_{LPM}(\tilde{t}) = g$, implies: $\rho_{LPM}(\tilde{t}) = q(\tilde{t} - g) \Rightarrow F(\tilde{t})[\tilde{t} - E\{x \mid x \le \tilde{t}\}] = q(\tilde{t} - g) \Leftrightarrow F(\tilde{t}) = q$. Thus, with AWHH's downside risk constraint $t - \tfrac{1}{q}\rho_{LPM}(t) \ge g$, the endogenously selected $\tilde{t}$ level is the $VaR_q(\tilde{x})$ of the optimal portfolio's income such that $\operatorname{Prob}(\tilde{x} \le \tilde{t}) = q$ and g is the associated $cVaR_q(\tilde{x})$ or the expected value $E\{\tilde{x} \mid \tilde{x} \le \tilde{t}\}$.

The conservativeness of system (9)'s downside risk solutions arises from the result that $g < \tilde{t}$ or $cVaR_q(\tilde{x}) < VaR_q(\tilde{x})$ when $\rho_{LPM}(t) > 0$. System (9) actually guarantees that $\operatorname{Prob}(\tilde{x} \le \tilde{t}) \le q$ with $\operatorname{Prob}(\tilde{x} \le g)$ usually being less than q, i.e. conservative. There are several possible approaches to obtaining less conservative solutions than those obtained from system (9). Subsequent unpublished research suggested that a two-stage process where the first stage (system (9)) is used to identify constraints (constraints with $\delta_k > 0$) to be relaxed in a second stage would generate less conservative outcomes.

We again emphasize that, while originally developed as a risk model, system (9) is more broadly applicable than AWHH's or RU's chance-constrained or cVaR constrained setting. In system (9), the $y_{k,j}$ values need not be in monetary or rates of return units and the model need not involve risk. System (9) can be generalized to other classes of linear programming models where the researcher
desires to allow up to a proportion of endogenously identified constraints to be violated. In this paper, we utilize similar constraints in a DDEA model that allows a proportion of endogenously identified input/output points to lie external to the qDEA hyperplane.

In the following we utilize a two-stage process where, in Stage 1, an augmented dual-DDEA system similar to system (9) is utilized to identify constraints to be relaxed (i.e. identify sets of potential qDEA-α super-efficient DMUs). Stage 2 of qDEA consists of solving conventional DDEA models (1) or (2) with the constraints identified in qDEA Stage 1 being relaxed. Given that constraints (2-a) in system (2) involve ≤ restrictions, we utilize the upper partial moment $\rho_{UPM}(t)$ and the $t + \tfrac{1}{q}\rho_{UPM}(t) \le g$ sufficiency constraint.

3. The qDEA model: development and application to example problem

Dual DDEA system (2) is augmented to implement the first stage of the qDEA model as follows:

\[
\begin{aligned}
\min\ & \sum_{v=1}^{V} -y_v^0 p_v + \sum_{u=1}^{U} x_u^0 w_u \\
\text{s.t.}\ & \\
(a)\ & \sum_{v=1}^{V} y_v^j p_v - \sum_{u=1}^{U} x_u^j w_u - t - \delta_j \le 0 \quad \text{for } j = 1,\dots,n \\
(b)\ & \sum_{v=1}^{V} d_v^0 p_v + \sum_{u=1}^{U} d_u^0 w_u = 1 \\
(c)\ & \frac{1}{n}\sum_{j=1}^{n} \delta_j - \rho_{UPM} = 0 \\
(d)\ & t + \tfrac{1}{q}\,\rho_{UPM} \le 0 \\
(e)\ & p_v \ge 0;\quad w_u \ge 0;\quad t \ \text{a free variable};\quad \delta_j \ge 0;\quad \rho_{UPM} \ge 0
\end{aligned}
\tag{10}
\]

where $p_v$ is an output price, $w_u$ is an input price, t is the endogenously determined upper partial moment (UPM) integral limit, $\delta_j$ is a deviation above t when $\sum_{v=1}^{V} y_v^j p_v - \sum_{u=1}^{U} x_u^j w_u > t$ and zero otherwise, $\rho_{UPM} = \rho_{UPM}(t)$ is the endogenously calculated linear UPM, and 0 < q < 1 is the maximal proportion of data points that are allowed to lie outside the qDEA-α hyperplane. Since we wish to allow up to proportion q of the constraints in system (2-a) to be violated, we set g = 0 in the UPM constraint (10-d) and allow t to be negative. System (2) is fully nested in system (10) as setting q < 1/n will guarantee that no points can lie outside the hyperplane, thus obtaining the conventional DDEA results.

Potential qDEA-α super-efficient DMUs or external points in system (10) are identified as constraints with positive $\delta_j$ values in the solution. System (10) may sometimes be conservative in that fewer than proportion q of the $\delta_j$ may be positive but the solu-

system (2) while relaxing the constraints associated with the potential external points identified in stage-1.

3.1. qDEA example application, implementation, and issues

In this section, we illustrate procedures for obtaining qDEA estimates for the modified Cooper, Seiford, and Tone (CST) eight DMU example. The input (employees) and output (sales) data, estimated qDEA-α distances, and projected points are presented in Table 1. Table 1 presents the qDEA constant returns to scale (CRS) results using three directional assumptions: (a) the input orientation problem or horizontal movement with directions $(d_u^j = x_u^j,\ d_v^j = 0)$, (b) the output orientation problem or vertical movement with directions $(d_u^j = 0,\ d_v^j = y_v^j)$, and (c) a unit or one-one direction model with diagonal directions $(d_u^j = 1,\ d_v^j = 1)$ while allowing 0, 1, or 2 points to lie external to the qDEA-α hyperplane.

The qDEA solution to this problem is obtained in two stages. Figs. 3 and 4 illustrate the implementation of qDEA for DMU H and the "one-one" direction using screen shots from MS-Excel. Fig. 3 presents an augmented MS Solver tableau for qDEA stage-1 (system (10)) while allowing no more than two points to lie external to the hull. From our experience with the conservative partial moment models, if we want no more than NP = 2 points to lie external to the hull, identifying the positive $\delta_j$ values is less sensitive to rounding errors (for small positive $\delta_j$ values) if q is set just below (NP+1)/n. For this problem with NP = 2, q was set equal to (2+1)/8 - 0.0001 = 3/8 - 0.0001 ≅ 0.37499 which will guarantee that the proportion of the positive $\delta_j$ will not exceed 0.37499 < 3/8. The value 1/q in the constraint matrix is thus 1/0.37499 ≅ 2.6667.

The highlighted qDEA stage-1 solution in Fig. 3 indicates that the model has determined to allow points B and E to potentially lie external to the qDEA-(6/8) hyperplane. The objective value from qDEA stage-1 is a conservative⁶ $\hat{\phi}^{H} = 1.21741$ with projected point $(\tilde{x}_H, \tilde{y}_H) = (x_H - \hat{\phi}^{H}\cdot 1,\ y_H + \hat{\phi}^{H}\cdot 1) = (6.7826, 6.2174)$. The second stage of the qDEA model is a DDEA model (2) with relaxed constraints. Fig. 4 presents the qDEA stage-2 dual tableau (system (2)) where the constraints associated with points B and E have been relaxed. In Fig. 4, we relax the constraints associated with DMUs B and E by adding large positive values in the corresponding right-hand-side locations. The resulting qDEA-(6/8) direction (1, 1) distance is $\hat{\phi}^{H,6/8} = 0.5714$ for point H.

Fig. 4's left-hand-side (LHS) values for the constraints associated with both DMUs B and E exceed 0, indicating that points B and E are indeed external to DMU H's qDEA-6/8 hyperplane. We return to this issue below where we discuss an example where one iteration of the two-stage qDEA process does not result in two external points due to the conservativeness of the partial moment inequality.

We conclude this section by noting that the Excel screen shots
tions to system (10) will have no more than proportion q of po- presented above were for illustrative purposes only. All results in
tentially external points. We also demonstrate below that the ob- the following discussions were obtained using R. The associated R
jective function value from expression (10) does not give the final code is available from the authors upon request
DDEA distance estimate as system (10)’s extrapolated point (and
the resulting distance metric) will consist of an extrapolation using 3.2. qDEA distances for the modified CST example
information from all external points plus the new qDEA support
points5 . While system (10) generates conservative distance esti- Table 1 presents the DDEA and qDEA distances and pro-
mates, the set of positive δ j values endogenously identifies a set of jected points for all eight DMU’s under constant returns to scale
j j j j j j
potential points allowed to lie external to the given DMU’s qDEA- and for the input (du = xu , dv = 0 ), output (du = 0, dv = yv ), and
α hyperplane. The second stage of the qDEA process re-estimates
6
The qDEA stage 1 solution is conservative in that its projected (xˆ, yˆ ) values
are projected from the two potential external points B and E as well as point D
5
We use the terminology support points to denote the points that define the in Fig. 1. This is apparent by examining the LP’s dual values presented in Fig. 3.
hyperplane onto which the given DMU’s input-output points are projected with φ α These negative of the dual values are DMU projection weights in dual system (10)’s
being the distance from the initial point to the projected point on the hyperplane. associated primal LP.

diagonal $(d_u^j = 1,\ d_v^j = 1)$ directions. The $\hat{\phi}_j$ distance scores are the conventional directional distances, indicating that only DMU B (with a $\hat{\phi}_j$ distance of zero) is on the CRS DDEA efficient boundary (the red ray in Fig. 1). When one external point is allowed, the CRS qDEA-(7/8) distances indicate that point B is now qDEA-(7/8) superefficient ($\hat{\phi}_{B,7/8} < 0$) while point E becomes qDEA-(7/8) efficient ($\hat{\phi}_{E,7/8} = 0$). The CRS qDEA-(7/8) distances of all other points have decreased by a fairly substantial amount. The super-efficiency of point B and the substantial decrease of the other DMU's qDEA-(7/8) efficiency scores provide evidence that point B is potentially an influential outlying point. When two external points are allowed, the CRS qDEA-(6/8) scores of all DMU's decrease but by a smaller amount than when only one external point was allowed. With two external points, DMUs B and E become qDEA-(6/8) superefficient ($\hat{\phi}_{B,6/8}$ and $\hat{\phi}_{E,6/8} < 0$) while DMU D ($\hat{\phi}_{D,6/8} = 0$) is now qDEA-(6/8) efficient.

Fig. 3. qDEA Stage-1 Tableau identifying two potential external points.

Fig. 4. qDEA Stage-2 tableau identifying qDEA efficiency scores.

Table 1 presents the extrapolated points for the DDEA, qDEA-(7/8), and qDEA-(6/8) models and for the input, output and one-one directions. From Fig. 1, and with input directions, we expect points (B, E), points (D, G) and points (C, F) to project or map to the same projected points on the qDEA hyperplanes. Similarly, with output directions, we expect points (B, C) and points (E, F) to map to the same projected points, and with the one-one directions we expect points (B, D, F) and points (E, G) to map to the same points on the DDEA, qDEA-(7/8) and qDEA-(6/8) hyperplanes. The results presented in Table 1 confirm these expectations.
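The two-stage procedure of Section 3 can be sketched in a few lines of code. The block below is only an illustration under stated assumptions: it uses Python with SciPy's `linprog` (the authors report using R and MS-Excel Solver), the function names `ddea_dual` and `qdea_stage1` are hypothetical, the eight data points are made up and are not the CST example data, and constant returns to scale with the one-one direction is assumed. Stage 1 solves system (10) and flags DMUs with positive $\delta_j$; stage 2 re-solves the dual DDEA system (2) with the flagged constraints relaxed through a large right-hand side.

```python
import numpy as np
from scipy.optimize import linprog


def ddea_dual(X, Y, x0, y0, dx, dy, relaxed=(), big=1e6):
    """Dual DDEA LP (system (2), CRS). Constraints of DMUs listed in
    `relaxed` receive a large right-hand side, which switches them off."""
    n, U = X.shape
    V = Y.shape[1]
    c = np.concatenate([-y0, x0])              # variables: p (V), w (U)
    A_ub = np.hstack([Y, -X])                  # y_j'p - x_j'w <= 0
    b_ub = np.zeros(n)
    for j in relaxed:
        b_ub[j] = big
    A_eq = np.concatenate([dy, dx])[None, :]   # d_v'p + d_u'w = 1
    return linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                   bounds=[(0, None)] * (V + U), method="highs")


def qdea_stage1(X, Y, x0, y0, dx, dy, q):
    """qDEA stage 1 (system (10)). Returns the objective value and delta."""
    n, U = X.shape
    V = Y.shape[1]
    # variable order: p (V), w (U), t (1, free), delta (n), rho (1)
    c = np.concatenate([-y0, x0, [0.0], np.zeros(n), [0.0]])
    # (a) y_j'p - x_j'w - t - delta_j <= 0
    A_a = np.hstack([Y, -X, -np.ones((n, 1)), -np.eye(n), np.zeros((n, 1))])
    # (d) t + (1/q) rho <= 0
    A_d = np.concatenate([np.zeros(V + U), [1.0], np.zeros(n), [1.0 / q]])
    A_ub = np.vstack([A_a, A_d])
    b_ub = np.zeros(n + 1)
    # (b) normalisation; (c) (1/n) sum(delta) - rho = 0
    A_b = np.concatenate([dy, dx, [0.0], np.zeros(n), [0.0]])
    A_c = np.concatenate([np.zeros(V + U), [0.0], np.full(n, 1.0 / n), [-1.0]])
    bounds = [(0, None)] * (V + U) + [(None, None)] + [(0, None)] * (n + 1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=np.vstack([A_b, A_c]),
                  b_eq=[1.0, 0.0], bounds=bounds, method="highs")
    delta = res.x[V + U + 1: V + U + 1 + n]
    return res.fun, delta


# Hypothetical single-input, single-output data (NOT the paper's CST data).
X = np.array([[4.0], [2.0], [5.0], [4.0], [5.0], [7.0], [6.0], [8.0]])
Y = np.array([[2.0], [4.0], [4.0], [3.0], [5.0], [5.0], [3.0], [5.0]])
n = len(X)
k = 7                                   # index of the DMU being evaluated
x0, y0 = X[k], Y[k]
dx, dy = np.ones(1), np.ones(1)         # "one-one" direction

NP = 2                                  # external points we are willing to allow
q = (NP + 1) / n - 1e-5                 # set just below (NP+1)/n, as in the text

phi_ddea = ddea_dual(X, Y, x0, y0, dx, dy).fun
phi_stage1, delta = qdea_stage1(X, Y, x0, y0, dx, dy, q)
flagged = [j for j in range(n) if delta[j] > 1e-9]       # potential external DMUs
phi_qdea = ddea_dual(X, Y, x0, y0, dx, dy, relaxed=flagged).fun
print(phi_ddea, phi_stage1, flagged, phi_qdea)
```

Because stage 2 only relaxes constraints of a minimization problem, its distance cannot exceed the stage-1 value, which in turn cannot exceed the conventional DDEA distance. Calling `qdea_stage1` with q below 1/n reproduces the conventional DDEA distance, which is the nesting property noted for system (10).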

We conclude the discussion of the CST example by noting that, if estimated with variable returns to scale (VRS), the qDEA-α hyperplane's external points will often be DMU and direction dependent. As discussed by Atwood and Shaik (2018), with VRS, qDEA-α distances (in some directions) for some qDEA-α superefficient DMU's will be dual unbounded (primal infeasible) due to the increased number of points that are allowed to be external to the DMU's qDEA-α hyperplane. For example, in Fig. 1, with VRS and one external point, points H and A will be qDEA-(7/8) superefficient. Movement in the input (horizontal) direction from point H or the output (vertical) direction from point A will result in a primal infeasibility, as movement in those directions cannot reach an endogenously identified qDEA-(7/8) hyperplane.

Although not discussed by Atwood and Shaik, several authors (Atwood, S., & Walden, 2017; Chen, 2005; Chen & Liang, 2011; Cook, Liang, Zha, & Zhu, 2009; Lee, Chu, & Zhu, 2011; 2012; Lovell & Rouse, 2003; Seiford & Zhu, 1999) have proposed methods to allow estimation of DEA metrics for superefficient DMU's whose initial DEA solutions are primal infeasible. Many of the suggested procedures involve the identification of alternative directions that allow movement from the given DMU's input-output observation to a point on a feasible data-generated hyperplane.

3.3. The q-Conservativeness of qDEA Stage-2 solutions

Another concern with qDEA is that, in practice, we often find that a single iteration of the qDEA two-stage process results in q-conservative solutions in that fewer than proportion q of the data points actually lie external to the qDEA-α hyperplane at the completion of qDEA stage-2. In our experience, conservative stage-2 solutions are more likely if the data contains large influential outliers. The reason for conservative stage-2 solutions lies in the use of the stochastic inequality when identifying potential external points in qDEA stage-1. The qDEA stage-1 model allows the LP model to select a partial moment limit t and endogenously contort the original constraint polytope while constraining the weighted sums of deviations above t in such a way that no more than proportion q of the $\delta_j = y^j p - x^j w - t$ values lie above zero. While the solution will commonly have up to proportion q of positive $\delta_j$ values identifying potential external points in the original DEA problem, not all such points may actually be external when the qDEA stage-2 problem is solved. In response to this issue, we developed a multi-iteration procedure that addresses the qDEA stage-2 conservativeness and allows the researcher to identify solutions with proportion q of external points if such a solution is feasible.

A q-conservative example is easily constructed by modifying DMU B's observation to a more extremal $(x_B, y_B) = (2, 5)$ while attempting to allow two external points. At the completion of the procedures demonstrated in Figs. 3 and 4, we find that only one point actually lies external to the qDEA hyperplane. Supplemental materials available from the authors demonstrate a multi-iteration procedure that addresses the issue and identifies two external points. As also discussed in the supplemental material, from our simulation experience it appears that the original distance estimates obtained from completing and bootstrapping one iteration of the qDEA process will generate consistent estimates of the $\hat{\phi}^{\hat{\alpha}}$ distance, where $\hat{\alpha} = 1 - \hat{q}$ and $\hat{q}$ is the proportion of Stage-2's actual external points. While in many benchmarking exercises one iteration and an estimate of the corresponding $\hat{q}$ may be sufficient, R code available from the authors allows the user to specify the number of two-stage qDEA iterations and the $\hat{q}$-tolerance desired. All results in the following sections of the paper have been estimated with up to 4 iterations of the two-stage qDEA process described in the appendix.

We have presented the qDEA model and demonstrated its ability to obtain quantile distance/efficiency metrics as well as qDEA's ability to identify sets of potentially outlying data points. We now examine the statistical properties of the qDEA estimators.

4. Statistical properties of qDEA estimates

A natural question is whether the qDEA estimator possesses desirable statistical properties. Given that, to date, we have not been able to derive closed form expressions for the asymptotic properties of the qDEA estimator, we use several assumed data generating processes (DGP) and numerical methods to examine these properties.

Specifically, we use Monte Carlo simulations and a slight modification of nCm subsample bootstrapping theory to examine the large sample properties of the qDEA estimator. In the next section of the paper, we describe: (1) the procedures utilized to estimate large sample convergence rates and perform normality tests for the qDEA estimates and (2) the class of DGP used in the Monte Carlo simulations. We then present and discuss the results of the large sample convergence rate estimations and normality tests.

4.1. Numerically estimating large sample statistical properties

Geyer (2013) presents a succinct review of Politis, Romano, and Wolf's (1999, 2001) nCm subsampling process. Space prevents a complete discussion, but the results are that confidence intervals for a statistic $\hat{\theta}_n$ can often be estimated by utilizing properties of the statistic:

\[
S_m = \hat{\theta}_n - \left(\frac{m}{n}\right)^{\beta}\left(\hat{\theta}_m - \hat{\theta}_n\right) \qquad (11)
\]

where $\hat{\theta}_n$ is a parameter estimate obtained from a sample of size n, β is a convergence rate, and $\hat{\theta}_m$ is an estimate from a subsample of size $m(n) \ll n$, where $\ll$ denotes "much smaller than".⁷

⁷ More formally, $m(n) \ll n$ denotes using a rule for selecting m (as a function of n) such that $\lim_{n\to\infty} m(n) = \infty$ and $\lim_{n\to\infty} m(n)/n = 0$. One such rule is to set $m(n) = K\sqrt{n}$ where K is a positive scalar.

In finite sample nCm bootstrapping, B sets of m observations are repeatedly resampled from a given sample of size n and used to estimate $\hat{\theta}_{m,b}$ for b = 1, ..., B. Confidence intervals for the parameter θ can then be estimated by using the quantiles of the set of B simulated statistics:

\[
S_{m,b} = \hat{\theta}_n - \left(\frac{m}{n}\right)^{\beta}\left(\hat{\theta}_{m,b} - \hat{\theta}_n\right) \qquad (12)
\]

Constructing expression (12)'s $S_{m,b}$ values requires that the analyst identify appropriate levels for both the convergence rate β and the subsample size m. While the convergence rates for the DEA, FDH, and FDH-related order-m and order-α estimators are known,⁸ we have not been able to derive β for the qDEA estimator.

⁸ The DEA CRS and VRS convergence rates are, respectively, $\beta = \frac{2}{U+V}$ and $\beta = \frac{2}{U+V+1}$. The FDH, order-m, and order-α convergence rates are, respectively, $\beta = \frac{1}{U+V}$, $\beta = 1/2$, and $\beta = 1/2$ (Simar & Wilson, 2011a).

Given that there is no known way to exogenously select the optimal level for the subsample size m (Politis et al., 1999), a common practice is to estimate $\hat{\theta}_{m_i,b}$ over a range of $m_i$ values and select the appropriate $m_i$ using data driven results (Politis et al., 1999; Simar & Wilson, 2011b). When both the DGP and β are unknown, Politis et al. (1999, chapter 8) and Geyer discuss finite-sample procedures utilizing the bootstrapped $\hat{\theta}_{m,b}$ values to estimate the unknown convergence rate parameter β. Politis et al. suggest estimating convergence rates by examining the rate at which inter-quantile-ranges computed from $\hat{\theta}_{m_i,b}$ change with increases in the subsample size $m_i$. We initially utilized their suggested finite sample $\hat{\beta}$ estimation procedures but found that, while the finite sample $\hat{\beta}$ estimates performed relatively well as n increased, the procedures gave unstable results at smaller sample sizes such as when

Table 2
Large sample convergence rate estimates and Pearson-Normality-Test P-Values by DGP, RTS, Number of Inputs U, and Sample Size.

Pearson Normality Test P-values by Sample Size

DGP RTS U β̂ m1000 m1395 m1946 m2714 m3786 m5282 m7368 m10278 m14337 m20000

Beta13 CRS 1 0.525 0.057 0.131 0.164 0.114 0.436 0.198 0.200 0.452 0.323 0.361
Beta13 CRS 3 0.506 0.355 0.827 0.980 0.188 0.857 0.426 0.343 0.423 0.005 0.206
Beta13 CRS 5 0.522 0.781 0.729 0.420 0.108 0.111 0.486 0.364 0.894 0.674 0.262
Beta13 VRS 1 0.500 0.530 0.684 0.852 0.370 0.738 0.469 0.647 0.499 0.792 0.835
Beta13 VRS 3 0.505 0.527 0.496 0.126 0.817 0.873 0.697 0.355 0.967 0.582 0.146
Beta13 VRS 5 0.513 0.094 0.857 0.803 0.061 0.436 0.346 0.751 0.547 0.111 0.423
Beta24 CRS 1 0.491 0.869 0.131 0.798 0.003 0.978 0.376 0.564 0.657 0.089 0.019
Beta24 CRS 3 0.491 0.884 0.066 0.433 0.575 0.880 0.267 0.103 0.923 0.924 0.506
Beta24 CRS 5 0.498 0.021 0.533 0.729 0.370 0.589 0.204 0.089 0.674 0.052 0.456
Beta24 VRS 1 0.516 0.589 0.650 0.864 0.684 0.726 0.394 0.410 0.585 0.160 0.328
Beta24 VRS 3 0.502 0.558 0.732 0.976 0.994 0.833 0.099 0.190 0.080 0.551 0.097
Beta24 VRS 5 0.503 0.042 0.257 0.394 0.866 0.285 0.213 0.097 0.151 0.056 0.379
Beta510 CRS 1 0.494 0.564 0.039 0.456 0.778 0.916 0.312 0.803 0.544 0.373 0.713
Beta510 CRS 3 0.491 0.086 0.896 0.222 0.413 0.852 0.544 0.095 0.869 0.317 0.690
Beta510 CRS 5 0.508 0.719 0.397 0.254 0.169 0.063 0.674 0.795 0.065 0.738 0.806
Beta510 VRS 1 0.487 0.838 0.911 0.789 0.857 0.997 0.171 0.249 0.509 0.370 0.482
Beta510 VRS 3 0.494 0.931 0.388 0.987 0.909 0.571 0.238 0.664 0.527 0.582 0.267
Beta510 VRS 5 0.501 0.707 0.869 0.025 0.985 0.070 0.472 0.943 0.122 0.602 0.252
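The $\hat{\beta}$ column of Table 2 comes from a log-log regression of inter-quantile ranges on sample size. The sketch below illustrates that calculation only; it is not the authors' R code. As assumptions made for the illustration, a sample mean of exponential draws stands in for the qDEA estimator (so the expected rate is root-n), the sample sizes are a smaller geometric grid than the paper's 1000 to 20000, and the 24 ranges pair percentiles 75 to 98 with percentiles 1 to 24.

```python
import numpy as np

rng = np.random.default_rng(0)
reps = 1000                                                  # Monte Carlo repetitions
mlist = np.round(np.geomspace(100, 2000, 10)).astype(int)    # hypothetical sample sizes
data = rng.exponential(size=(reps, mlist[-1]))               # one data set per repetition

# Stand-in estimator: the mean of the first m observations of each data set.
# Columns play the role of the paper's reps x 10 matrix qeffhat.
est = np.column_stack([data[:, :m].mean(axis=1) for m in mlist])

lo = np.arange(1, 25)                   # percentiles 1..24
hi = lo + 74                            # percentiles 75..98
iqnr = np.percentile(est, hi, axis=0) - np.percentile(est, lo, axis=0)   # 24 x 10

# Pool the ranges across sample sizes and regress log(iqnr) on log(m).
log_m = np.tile(np.log(mlist), (len(lo), 1))
slope, intercept = np.polyfit(log_m.ravel(), np.log(iqnr).ravel(), 1)
beta_hat = -slope                       # convergence rate estimate
print(round(beta_hat, 3))
```

For a root-n consistent estimator such as the sample mean, `beta_hat` should land near 0.5, which is the pattern Table 2 reports for the qDEA distances.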

n = 100. Due to the unstable results at smaller sample sizes, we chose to use Monte Carlo procedures to estimate large sample convergence rates for a set of assumed data generating processes.

When estimating large sample convergence rates, we use an assumed DGP to generate 1000 $(X_{N,mcs}, Y_{N,mcs})$ data sets of size N = 20000 for Monte Carlo repetition mcs = 1, ..., 1000. We then estimated $\hat{\phi}^{\alpha}_{m_i,mcs}$ distances for each sample size $m_i$ in a vector of ten subsample sizes in mlist ranging between 1000 and 20000. Motivated by one of Geyer's examples, we used the following vector mlist of $m_i$ values to estimate large sample convergence rates: mlist = (1000, 1395, 1946, 2714, 3786, 5282, 7368, 10278, 14337, 20000). We emphasize that since we are repeatedly simulating data from an assumed DGP and estimating only one set of $\hat{\phi}^{\alpha}_{m_i,mcs}$ for each mcs repetition, we are not limited by the $m_i \ll n$ requirement necessary when repeatedly subsampling from a given sample of size n. For each sample size $m_i$ in mlist, $\hat{\phi}^{\alpha}_{m_i,mcs}$ is estimated using the first $m_i$ observations in the data set $(X_{N,mcs}, Y_{N,mcs})$. The 1000 sets of $\hat{\phi}^{\alpha}_{m_i,mcs}$ estimates are then stored in a 1000 x 10 matrix qeffhat.

To estimate the large sample convergence rate, a set of 24 inter-quantile ranges (iqnr) ($quantile_{75} - quantile_{01}$, $quantile_{76} - quantile_{02}$, ..., $quantile_{99} - quantile_{24}$) is computed for each column in qeffhat. The process generates a vector of 24 inter-quantile range (iqnr) estimates for each sample size $m_i$, pools the iqnr values across sample sizes, matches the iqnr values with the corresponding sample size, and regresses the log of the iqnr against the log of the sample size $m_i$. The estimated convergence parameter $\hat{\beta}$ is recovered as the regression slope parameter multiplied by negative one. We then applied a Pearson normality test to the set of estimated qDEA distances $\hat{\phi}^{\alpha}_{m_i,mcs}$ for each data generating process and sample size. The convergence rate estimates $\hat{\beta}$ and Pearson p-values (with low values indicating that normality should be rejected) are presented in Table 2.

4.2. Data Generating Processes (DGP)

We examined the convergence of the qDEA estimator using Monte Carlo procedures with several assumed data generating processes (DGP). The results presented below used Simar and Wilson's (2011-b) assumed DGP with returns to scale parameters λ = 0.80 or λ = 1.0 to generate U = 1, 3, and 5 input and V = 1 output data for a given DMU-0 and also to repeatedly generate input-output reference sets of N = 20,000 reference DMU's. When λ = 0.80, we estimated the qDEA model with a variable returns to scale (VRS) qDEA model. When λ = 1.0, the qDEA model was estimated with constant returns to scale (CRS). Efficient input and output levels for DMU-0 were set at $\tilde{x}^0_u = 0.5$ for u = 1, ..., U, with $\tilde{y}^0 = \left(\prod_{u=1}^{U} \tilde{x}^0_u\right)^{\lambda/U}$ giving $\tilde{y}^0 = 0.5$ when λ = 1 and $\tilde{y}^0 = 0.57435$ when λ = 0.80. Inefficiency is introduced for DMU-0 via setting $x^0_u = 2\tilde{x}^0_u = 1$ and $y^0 = \tilde{y}^0$. The corresponding input-orientation DDEA parameter with directions $d^0_u = x^0_u$ and $d^0_v = 0$ can be shown to be φ = 0.5.

For each Monte Carlo simulation mcs = 1, ..., 1000, a reference DMU data set of size N = 20000 was generated using simulated efficient input-output levels generated via $\tilde{x}^j_{u,mcs} \overset{iid}{\sim} unif(0, 1)$ and $\tilde{y}^j_{mcs} = \left(\prod_{u=1}^{U} \tilde{x}^j_{u,mcs}\right)^{\lambda/U}$ for DMUs j = 1, ..., N. Input inefficiency for DMU j is introduced by randomly simulating $\phi^j_{mcs} \sim BETA(a, b)$ and setting $y^j_{mcs} = \tilde{y}^j_{mcs}$ and $x^j_{u,mcs} = \tilde{x}^j_{u,mcs}/(1 - \phi^j_{mcs})$ for u = 1, ..., U.⁹ For each simulation repetition mcs, the generated reference data $x^j_{u,mcs}$ and $y^j_{mcs}$ were used to populate an N by U input matrix $X_{N,mcs}$ and an N by 1 output matrix $Y_{N,mcs}$.

⁹ With an input orientation DDEA model and DDEA distance φ, the efficient projected point is $\tilde{x} = x - \phi x = (1 - \phi)x \Rightarrow x = \tilde{x}/(1 - \phi)$ with 0 ≤ φ < 1. Simar and Wilson (2011-b) introduced input inefficiency by generating τ ∼ EXP(3) and setting $x = \tilde{x}e^{\tau}$. If $\frac{1}{1-\phi} = e^{\tau}$, it is a relatively easy exercise to show that φ ∼ BETA(1, 3) when τ ∼ EXP(3). Generating $\phi^j \sim BETA(a, b)$ rather than using an exponential distribution allowed us greater flexibility in generating sets of $\phi^j$ values with fewer or more points near the efficient φ = 0 boundary in our sets of Monte Carlo simulations.

4.3. Estimated convergence rate and normality test results

We completed the above process for a number of input numbers U and quantile levels α. Due to space limitations, we discuss results using the CRS and VRS BETA(1,3), BETA(2,4), and BETA(5,10) DGP with V = 1 output, U = 1, 3, or 5 inputs, and α = 0.95. Fig. 5 presents panel plots demonstrating several results for the VRS-BETA(1,3) DGP with U = 3 inputs. The upper left panel presents a boxplot of the estimated qeffhat = $\hat{\phi}^{0.95}_{m_i}$ values by sample size. The boxplot demonstrates that: (1) the dispersion of the $\hat{\phi}^{0.95}_{m_i}$ estimates decreases with sample size, (2) the median values of the $\hat{\phi}^{0.95}_{m_i}$ estimates increase with sample size, and (3) the median values appear to be convergent with increases in sample size. For

all DGP examined we obtained similar results indicating that the $\hat{\phi}^{\alpha}$ estimator is apparently a biased but convergent estimator as sample size increases. The bias result is not troubling in that the nCm bootstrapping process can be used to obtain bias-corrected estimates for $\hat{\phi}^{\alpha}$ (Politis et al.).

Fig. 5. Panel plots of large sample parameter estimates by sample size, Log(iqnr) by Log(Sample Size), and qqnorm plots of parameter estimates by sample size for VRS-BETA(1,3)-U3 data generating process.

The upper right panel in Fig. 5 presents a boxplot of the log of the inter-quantile-ranges (iqnr) (generated in the convergence rate estimation process) plotted against the log of the sample sizes. An examination of the second boxplot indicates that log(iqnr) visually appears to decrease linearly with log(m). For this example, when log(iqnr) is regressed against log(m), the resulting slope parameter is -0.505, giving convergence rate estimate $\hat{\beta} = 0.505$ as presented in Table 2.

The lower panels in Fig. 5 present qqnorm-qqline plots for the 1000 $\hat{\phi}^{0.95}_{1000,mcs}$ and $\hat{\phi}^{0.95}_{20000,mcs}$ estimates. An examination of the plots provides strong evidence that the $\hat{\phi}^{0.95}_{m_i,mcs}$ estimates are normally distributed. When Pearson's normality test is applied to the residuals, the resulting p-values, presented in Table 2, are 0.527 and 0.146, indicating that normality cannot be rejected for the $\hat{\phi}^{0.95}_{1000,mcs}$ and $\hat{\phi}^{0.95}_{20000,mcs}$ estimates.

Table 2 presents the large sample $\hat{\beta}$ convergence rate estimates and normality test p-values for the BETA(1,3), BETA(2,4), and BETA(5,10) data generating processes with constant and variable returns to scale, one output, and number of inputs U = 1, 3, and 5. In Table 2, the estimated convergence rates $\hat{\beta}$ are close to 0.5 for all DGP, returns to scale, and number of input levels U examined. The Pearson p-values indicate that normality of $\hat{\phi}^{0.95}_{m_i}$ is not rejected, with 7 of 180 or 3.9% of the p-values being below 0.05.

We utilized the large sample β estimation process with a number of other assumed DGP including DGP with different ranges for X and Y, various returns to scale, and multiple inputs and multiple outputs. Convergence rate estimates and normality test results similar to those reported in Table 2 were obtained for all DGP examined to date. As a robustness check of our convergence rate estimation process, we used the same procedures and DGP to estimate large sample convergence rates for the order-α FDH estimator. The order-α estimator has a known asymptotic convergence rate (root-n) and parameter distribution (normal) (Simar & Wilson, 2011a). In the order-α case, our large sample results indicate order-α root-n convergence and large sample normality for all DGP examined.

We conclude the statistical section, however, by reminding the reader that we have not been able to derive closed form expressions for either the root-n convergence rate or the asymptotic normality of the qDEA estimator. We also emphasize that, as α → 1 or q → 0, the convergence rates for both the qDEA and order-α estimators will approach the convergence rates of the underlying DEA or FDH processes. We are experimenting with the use of Politis et al.'s (1999) and Geyer's finite-sample β estimation process in estimating convergence rates as α → 1 and β diverges from 1/2.

We have presented procedures for obtaining quantile-DEA estimates and their statistical properties. In response to a reviewer request, the next section discusses how qDEA might be applied in a practical setting. We discuss how an analyst who wishes to estimate an amount by which inputs can be proportionally reduced while maintaining original output levels can utilize qDEA to obtain more robust proportional input-contraction estimates.

An example of using qDEA benchmarking in obtaining robust estimates of potential proportional amounts of feasible input contraction while maintaining current output levels

In this section we demonstrate how benchmarking with qDEA might allow an analyst or regulatory organization to more robustly estimate individual and "industry level" input contraction levels while maintaining current output levels. The process also addresses potential outliers and influential or non-representative firms in the data. For this demonstration, we generated data for 1000 DMUs using the DGP utilized in the construction of the Beta510 simulations with three inputs as described in the discussion of Table 2. The simulated data was used to estimate DEA and five qDEA-α directional distances for each DMU. The directional distances for each of the 1000 DMU's were estimated using directional DEA, and qDEA for the 99, 98, 97, 96 and 95 quantile levels.

Fig. 6. Flow chart of an Example of Using qDEA Benchmarking Process

The process can be summarized with the following bullet points and the flow chart presented in Fig. 6.

Table 3
DEA and qDEA-α benchmarking comparative input contractions by quantile.

qDEA-95 qDEA-96 qDEA-97 qDEA-98 qDEA-99 DEA

Min. 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000


1st Quantile 0.0354 0.0461 0.0607 0.0797 0.1021 0.1618
Median 0.1268 0.1379 0.1497 0.1682 0.1897 0.2409
Mean 0.1445 0.1528 0.1631 0.1772 0.1962 0.2473
3rd Quantile 0.2249 0.2333 0.2438 0.2593 0.2764 0.3225
Max. 0.6428 0.6482 0.6519 0.6591 0.6667 0.6865
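The rows of Table 3 summarize distances that have been truncated at zero, one column per quantile level. The sketch below shows only that summary step. As assumptions made for the illustration, the distance arrays are made-up numbers standing in for actual DEA and qDEA-α estimates, and the helper name `summarize` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
levels = [0.99, 0.98, 0.97, 0.96, 0.95]

# Made-up distances: the DEA distance, and qDEA distances that fall as alpha falls.
dea = rng.beta(5, 10, size=n)
u = rng.uniform(0.5, 1.5, size=n)
qdea = {a: dea - 5.0 * (1.0 - a) * u for a in levels}


def summarize(phi):
    """Truncate negative distances at zero, then report summary statistics."""
    phi = np.maximum(phi, 0.0)          # super-efficient DMUs have met the standard
    q1, med, q3 = np.percentile(phi, [25, 50, 75])
    return {"min": phi.min(), "q1": q1, "median": med,
            "mean": phi.mean(), "q3": q3, "max": phi.max()}


table = {"DEA": summarize(dea)}
for a in levels:
    table[f"qDEA-{round(100 * a)}"] = summarize(qdea[a])

# Mean potential input contraction by quantile level, largest alpha first.
for name, stats in table.items():
    print(name, round(stats["mean"], 4))
```

Comparing the means across columns is the step used to spot large jumps between adjacent quantile levels.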

• Compute conventional DEA distance metrics $\hat{\phi}_j$ for each DMU j.
• Compute qDEA-α distance metrics $\hat{\phi}^{\alpha}_j$ for each DMU j and for several α levels (e.g., α = 0.995, 0.99, 0.98, 0.97, 0.96, 0.95, etc.).
• Truncate all $\hat{\phi}^{\alpha}_j < 0$ at 0 (DMU's with $\hat{\phi}^{\alpha}_j < 0$ have met the α-standard).
• Compute average DEA and truncated qDEA-α distances for each α.
• Examine results to see if there are large jumps in mean distance metrics at particular quantile levels.
• Examine mean distance metrics at different quantile levels to form a judgement as to the practical feasibility of obtaining estimated proportional input reduction levels.

The summary statistics of the DEA and qDEA directional distance estimates obtained after applying the above procedures are presented in Table 3. In Fig. 7, we present the box-plots of the DEA and qDEA benchmarking comparative input contractions by α level, with the horizontal lines and the diamonds representing the median and mean values, respectively, reported in Table 3. These are directional distances and provide direct estimates of proportional input contraction/savings. From an examination of Table 3 and Fig. 7, the mean DEA estimates indicate that industry-wide inputs can potentially be contracted by 24.78% while maintaining original output levels. However, the qDEA-99 estimates show a relatively large reduction from 24.78 to a 19.62% potential mean input contraction, suggesting that a small number of observations are driving the original DEA-based estimates. As the number of points inside the hull is further reduced, the proportional input savings continue to decline but at a much slower rate. In this application a more reasonable estimate of potential industry level input contraction would appear to lie in the 15 to 18% range.

Fig. 7. Box-plots of DEA and qDEA-α benchmarking comparative input contractions by quantile.

Summary, Caveats, and Conclusions

This paper has presented procedures for estimating more robust DEA distance metrics by allowing up to a user-specified proportion of DMU data points to lie external to the qDEA-hyperplane. The qDEA procedures are applicable with most LP based estimation procedures and can be implemented using standard linear programming software, thereby accommodating LP's ability to incorporate alternative return-to-scale assumptions and identify or impose input-output price information. Monte Carlo simulations indicate that qDEA distance estimates appear to exhibit root-n convergence and large sample normality.

We note, however, that as α → 1 or q → 0, the convergence rates β and the distribution of the $\hat{\phi}^{\alpha}$ estimates for both the qDEA and order-α estimators will approach the convergence rates and distributions of the underlying DEA or FDH processes. In situations where the researcher uses qDEA to identify and eliminate a relatively small number of outliers, we suggest that the researcher assume the convergence rate of the underlying DEA process and estimate confidence intervals using nCm bootstrapped empirical quantiles. We also remind the reader that, while we have obtained results similar to those reported above for all DGP examined to date, we have not derived closed form expressions for the asymptotic properties of the qDEA-α estimator.

We note that, in our experience, the analyst will find an increased incidence of superefficient-infeasible solutions in some directions with VRS for DMU's that are qDEA-α superefficient. Several approaches are available if the analyst wishes to obtain distance metrics for these DMUs. We refer the reader to the relevant literature should this be a concern.

We conclude by addressing a valid concern raised by a reviewer. The reviewer expressed concern as to how the quantile DEA procedure was to be used in practice and how a user could decide on the appropriate quantile. The standard procedure in financial and empirical benchmarking efforts is to include information and metrics compiled from several quantiles of the data. We suggest that the analyst, decision maker, or regulator examine the results from several quantiles in reaching a decision with respect to potential improvements in a given firm's performance. In a regulatory setting, the analyst might assess firms utilizing qDEA with a 5% or 10% quantile level. Firms that are super-efficient or external to the quantile hyperplane could be judged to have met the standard, while inefficient firms would be assessed using their qDEA distance metric. We are not suggesting that this approach would be suitable or desirable in all applications, but qDEA allows the analyst or regulatory organization to address potential outlying, influential or non-representative firms and develop their own rules or procedures more efficiently with respect to the appropriate quantile level used in quantile-based benchmarking.

We believe that qDEA should prove useful in identifying the three types of outliers discussed by Bogetoft and Otto (2011) and in

estimating more robust DEA distance metrics. We also believe that qDEA will prove to be particularly useful in regulatory and benchmarking applications in that qDEA directly accommodates the estimation of quantile-based distance metrics while endogenously allowing proportions or quantiles (such as the top one, five or ten percent) of DMU's to lie external to the resulting hyperplanes. Quantile-DEA provides a mechanism whereby a decision-maker or researcher can obtain information about the practical reachability of projected points by exploring the interior of the feasible production technology and identifying the proportion of DMUs that have been able to reach or exceed a qDEA performance metric. The use of qDEA will thus allow the analyst to directly "slice" rather than "peel" the onion of the DEA revealed technology to the desired quantile depth using conventional LP-based procedures.

References

Chen, Y. (2005). Measuring super-efficiency in DEA in the presence of infeasibility. European Journal of Operational Research, 161, 545–551.
Chen, Y., & Liang, L. (2011). Super-efficiency DEA in the presence of infeasibility: One model approach. European Journal of Operational Research, 213, 359–360.
Cook, W., Liang, L., Zha, Y., & Zhu, J. (2009). A modified super-efficiency DEA model for infeasibility. Journal of Operational Research Society, 60, 276–281.
Cook, W., Ruiz, J., Sirvent, I., & Zhu, J. (2017). Within-group common benchmarking using DEA. European Journal of Operational Research, 256, 901–910.
Cooper, W., Seiford, L., & Tone, K. (2006). Introduction to data envelopment analysis and its uses. New York: Springer.
Cumova, D., & Nawrocki, D. (2014). Portfolio optimization in an upside potential and downside risk framework. Journal of Economics and Business, 71, 68–89.
Dai, X., & Kuosmanen, T. (2014). Best-practice benchmarking using clustering methods: Application to energy regulation. Omega, 42(1), 179–188.
Daouia, A., & Simar, L. (2005). Robust nonparametric estimators of monotone boundaries. Journal of Multivariate Analysis, 96, 311–331.
Daouia, A., & Simar, L. (2007). Nonparametric efficiency analysis: a multivariate conditional quantile approach. Journal of Econometrics, 140, 375–400.
Daraio, C., & Simar, L. (2007). Advanced robust and nonparametric methods in efficiency analysis. New York: Springer.
Fishburn, P. (1972). Mean-risk analysis with risk associated with below-target returns. American Economic Review, 67, 116–126.
Adler, N., Liebert, V., & Yazhemsky, E. (2013). Benchmarking airports from a man- Geyer, C. (2013). Statistics 5601 Notes: The subsampling bootstrap. University
agerial perspective. Omega, 41(2), 442–458. of Minnesota. Reference Material, http://www.stat.umn.edu/geyer/5601/examp/
Anthonisz, S. (2012). Asset pricing with partial-moments. Journal of Banking & Fi- subboot.html
nance, 36, 2122–2135. Jradi, S., & Ruggiero, J. (2019). Stochastic data envelopment analysis: A quantile re-
Aparicio, J., Cordero, J., & Pastor, J. (2017). The determination of the least distance gression approach to estimate the production frontier. European Journal of Oper-
to the strongly efficient frontier in data envelopment analysis oriented models: ational Research, 278(2), 385–393.
modelling and computational aspects. Omega, 71, 1–10. Lee, H., Chu, C., & Zhu, J. (2011). Super-efficiency DEA in the presence of infeasibility.
Aparicio, J., & Pastor, J. (2014). Closest targets and strong monotonicity on the European Journal of Operational Research, 212, 141–147.
strongly efficient frontier in DEA. Omega, 44, 51–57. Lee, H., Chu, C., & Zhu, J. (2012). Super-efficiency infeasibility and zero data in DEA.
Aparicio, J., Ruiz, J., & Sirvent, I. (2007). Closest targets and minimum distance to the European Journal of Operational Research, 216, 429–433.
Pareto-efficient frontier in DEA. Journal of Productivity Analysis, 28(3), 209–218. Lovell, C., & Rouse, A. (2003). Equivalent standard DEA models to provide supereffi-
Aragon, Y., Daouia, A., & Thomas-Agnan, C. (2005). Nonparametric frontier estima- ciency scores. Journal of the Operational Research Society, 54(1), 101–108.
tion: A conditional quantile-based approach. Econometric Theory, 21(2), 358–389. Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1), 77–91.
Atwood, J. (1985). Demonstration of the use of lower partial moments to im- Nawrocki, D. (1991). Optimal algorithms and lower partial moment: ex post results.
prove safety-first probability limits. American Journal of Agricultural Economics, Applied Economics, 23(3), 465–470.
67, 787–793. Politis, D., Romano, J., & Wolf, M. (1999). Subsampling. New York: Springer.
Atwood, J., S. , S., & Walden, J. (2017). Radial efficiency metrics using worst-case Politis, D., Romano, J., & Wolf, M. (2001). On the asymptotic theory of subsampling.
reference points. In Proceedings of the 15th European workshop on efficiency and Statistica Sinica, 11, 1105–1124.
productivity analysis. (EWEPA). London. June (pp. 12–15). Ramón, N., Ruiz, J., & Sirvent, I. (2018). Two-step benchmarking: Setting more realis-
Atwood, J., & Shaik, S. (2018). Quantile DEA: estimating qDEA-alpha efficiency es- tically achievable targets in DEA. Expert Systems With Applications, 92, 124–131.
timates with conventional linear programming. In Productivity and inequality. Risk Management Association (2019). Annual statement studies - financial ra-
New York: Springer Press. tio benchmarks, volume I 2017-2018. https://www.rmahq.org/uploadedFiles/
Atwood, J., Watts, M., Helmers, G., & Held, L. (1988). Incorporating safety-first con- Knowledge_Center/Publications_and_Tools/Statement_Studies/2017-18%20FRB%
straints in linear programming production models. Western Journal of Agricul- 20Definition%20of%20Ratios.pdf.
tural Economics, 13, 29–36. Rockafellar, R., & Uryasev, S. (20 0 0). Optimization of conditional value-at-risk. Jour-
Banker, R., F. , F., & Zhang, D. (2017). Use of data envelopment analysis for incen- nal of Risk, 2, 21–41.
tive regulation of electric distribution firms. Data Envelopment Analysis Journal, Ruiz, J., & Sirvent, I. (2016). Common benchmarking and ranking of units with DEA.
3(1-2), 1–47. Omega, 65, 1–9.
Barr, R., Durchholz, M., & Seiford, L. (20 0 0). Peeling the DEA onion: layering and Seiford, L., & Zhu, J. (1999). Infeasibility of super-efficiency data envelopment anal-
rank-ordering DMUs using tiered DEA. Southern Methodist University Technical ysis models. INFOR, 37, 174–187.
Report, Southern Methodist University. Simar, L. (2003). Detecting outliers in frontier models: a simple approach. Journal of
Bawa, V. (1975). Optimal rules for ordering uncertain prospects. Journal of Financial Productivity Analysis, 20, 391–424.
Economics, 2(1), 95–121. Simar, L., & Wilson, P. (2011a). Estimation and inference in nonparametric fron-
Bawa, V., & Lindenberg, E. (1977). Capital market equilibrium in a mean-lower par- tier models: recent developments and perspectives. Foundations and Trends in
tial moment framework. Journal of Financial Economics, 5(1977), 189–200. Econometrics, 5(3-4), 183–337.
Behr, A. (2010). Quantile regression for robust bank efficiency score estimation. Eu- Simar, L., & Wilson, P. (2011b). Inference by the m out of n bootstrap in nonpara-
ropean Journal of Operational Research, 200(2), 568–581. metric frontier models. Journal of Productivity Analysis, 36, 33–53.
Berck, P., & Hihn, J. (1982). Using the semivariance to estimate safety-first rules. Wang, Y., Wang, S., Dang, C., & Ge, W. (2014). Nonparametric quantile frontier esti-
American Journal of Agricultural Economics, 64, 298–300. mation under shape restriction. European Journal of Operational Research, 232(3),
Bogetoft, P. (2012). Performance benchmarking: Measuring and managing performance. 671–678.
New York: Springer. Wheelock, C., & Wilson, P. (2008). Non-parametric, unconditional quantile estima-
Bogetoft, P., & Otto, L. (2011). Benchmarking with DEA, SFA, and R. New York: tion for efficiency analysis with an application to federal reserve check process-
Springer. ing operations. Journal of Econometrics, 145, 209–225.
Bottasso, A., & Conti, A. (2011). Quantitative Techniques for Regulatory Benchmarking Wilson, P. (1993). Detecting outliers in deterministic nonparametric frontier models
- A CERRE Study. Brussels: CERRE. with multiple outputs. Journal of Business and Economic Statistics, 11, 319–323.
Boxwell, R. (1994). Benchmarking for Competitive Advantage. New York, NY.: Mc- Wilson, P. (1995). Detecting influential observations in data envelopment analysis.
Graw-Hill. Journal of Productivity Analysis, 6, 27–45.
Cazals, C., Florens, J., & Simar, L. (2002). Nonparametric frontier estimation: a robust Zanella, A., Camanho, A., & Dias, T. (2013). Benchmarking countries’ environmental
approach. Journal of Econometrics, 106, 1–25. performance. Journal of the Operational Research Society, 64(3), 426–438.
Chambers, G., Chung, Y., & R. , F. (1996). Benefit and distance functions. Journal of Zhu, S., Li, D., & Wang, S. (2009). Robust portfolio selection under downside risk
Economic Theory, 70, 407–419. measures. Quantitative Finance, 9(7), 869–885.
Charnes, A., Cooper, W., & Rhodes, E. (1978). Measuring the efficiency of decision
making units. European Journal of Operational Research, 2, 429–444.
