
Calculating Efficiency with Financial Accounting Data:

Data Envelopment Analysis for Accounting Researchers

Peter R. Demerjian*
University of Washington

August 2017

Abstract
Recent years have seen a surge of accounting research using data envelopment analysis
(DEA) to measure efficiency. In this study, I examine the calculation of efficiency using DEA,
with a focus on large panel datasets of financial accounting data. Using simulation and archival
data, I examine four methodological considerations that arise when calculating efficiency with
panel data: calculation group size, calculating efficiency for datasets covering multiple time
periods, the choice of calculation group classification, and using subsets of efficiency scores
calculated from larger datasets. I find that each of these issues potentially influences the
efficiency scores generated by DEA. Based on these methodological issues, I provide evidence
and prescriptions to aid researchers using DEA.

Keywords: data envelopment analysis, financial accounting, DEA


JEL classifications: C61, C67, M41

* I gratefully acknowledge the helpful feedback of Bok Baik, Doug DeJong, Weili Ge, Allison Koester, Melissa
Lewis-Western, Dawn Matsumoto, Paige Patrick and workshop participants at the University of Maastricht.
Financial support was provided by the Foster School of Business and William R. Gregory Accounting Faculty
Fellowship.
Contact information: pdemerj@uw.edu; 206-221-1648.
1. Introduction

Recent years have seen a surge of interest in financial accounting research using data

envelopment analysis (DEA). DEA is an optimization program that measures the relative

efficiency of different observations within a group. Efficiency is typically defined as the

maximization of outputs for a fixed level of inputs, or alternatively the minimization of inputs for

a fixed level of outputs. Relative efficiency is determined by grouping a set of similar

observations (termed decision-making units, or DMUs) and calculating the subset of the most

efficient of those observations, called the efficient frontier. DMUs that are not on the frontier

receive a score conveying their efficiency relative to frontier observations. For example, a DMU

with an efficiency score of 0.85 is 15 percent less efficient than the closest DMU that is on the

efficient frontier.

Although researchers have used DEA in a variety of research areas (such as operations

and management accounting) for several decades, only recently have studies employed this

method widely in financial accounting research. This interest in DEA in financial accounting

contexts stems from research by Demerjian et al. (2012; hereafter DLM). In their study, they calculate the

operating efficiency of firms using DEA employing a wide range of accounting variables over a

long time series.1 Many subsequent studies have used DEA with financial accounting variables

in a similar design as DLM, examining a wide range of research questions.2

Given the recent interest in the literature in DEA, I have three main objectives in

presenting this research. First, use of DEA is relatively new to the financial accounting literature.

I use this study to provide an introduction to this methodology for new users, including

1 DLM uses OLS regression in the second stage of their MA Score calculation. Since the focus of this study is on DEA, I restrict my discussion to their first stage.
2 In the Appendix I list recently published papers that have used a DEA-based efficiency score in their primary analysis.

describing current and potential applications. Second, there are many assumptions and choices

that go into designing the DEA efficiency program. Even for current DEA users, the implications

of these design choices may not be clear. I therefore revisit these assumptions and choices and

highlight their implications for calculating efficiency with DEA. Third, calculating efficiency

using financial accounting data typically involves large panel datasets that introduce a new set of

methodological concerns not currently examined in the literature. To illustrate potential issues

when calculating efficiency with large panel datasets of accounting data, I use both simulated

and archival data from Compustat and other commonly used databases. I examine four

methodological issues related to implementing DEA.

The first issue I examine relates to calculation group size. The calculation group is the

set of observations against which a researcher calculates a DMU’s efficiency. When the

calculation group has many observations, some DMUs will be on the efficient frontier, but most

will be inside the frontier (i.e., inefficient relative to frontier DMUs). As calculation group size

gets smaller, relatively more DMUs are on the frontier. This has two important effects. First, the

average efficiency score in the calculation group is mechanically higher than if a larger

calculation group had been used. Second, the standard deviation among efficiency scores is

lower. In short, small calculation groups compress the distribution of efficiency scores,

eliminating informative cross-sectional variation. I examine the extent of this issue using

simulated datasets. I find mechanical effects as predicted. Further, the effect is exacerbated by

increasing the number of inputs and outputs. Based on these results, I believe researchers must

be aware of the potential effects of small calculation groups and, where possible, expand

calculation group sizes to avoid distorting inferences.

The second issue I investigate is calculating efficiency using data from multiple time

periods. Unlike earlier DEA-based studies, where researchers drew observations from a single

time period, studies using financial accounting data often have panel datasets available. With

firms having multiple observations over time, new and interesting analyses become available to

researchers. For example, we can examine changes in aggregate efficiency over time, or the

evolution of efficiency in a specific firm or reporting unit. Calculating DEA efficiency over

multiple time periods does, however, introduce problems that are not relevant in single time

period settings. I examine these problems using simulated datasets. The first problem is in

measuring efficiency over multiple time periods when the efficient frontier is shifting over time;

this introduces a mechanical issue where efficiency scores are not comparable across different

time periods. The second issue is the potential of look-ahead bias, where future observations are

used to calculate current efficiency scores. Again using simulation, I find that using panel data

and calculating with data from multiple time periods has the potential to distort efficiency scores.

Drawing on these analyses, I make two recommendations. First, when calculating the level of

efficiency over time, I recommend running DEA by time period (e.g., year) and using fixed

effects to control for over-time changes in the efficient frontier. Second, when calculating

changes in efficiency, I recommend pooling pairs of years, calculating DEA efficiency using

both years, and retaining only the change from the resulting calculation.
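The second recommendation, pooling pairs of adjacent years, running DEA on the pooled group, and retaining only the change, can be sketched as follows. This is my own illustrative sketch, not the paper's code: `dea_fn` stands in for any DEA solver, and I assume the same firms appear in the same row order in both years.

```python
import numpy as np

def efficiency_changes(panels, dea_fn):
    """Change in efficiency from DEA run on pooled pairs of adjacent years.

    panels: dict mapping year -> (n_t, k) data array for that year.
    dea_fn: function returning one efficiency score per row of a pooled array.
    Returns a dict mapping (year_t, year_t+1) -> per-firm score change,
    assuming rows align across the two years (same firms, same order).
    """
    years = sorted(panels)
    changes = {}
    for t0, t1 in zip(years, years[1:]):
        # Pool both years so scores are measured against a single frontier.
        pooled = np.vstack([panels[t0], panels[t1]])
        scores = dea_fn(pooled)
        n0 = len(panels[t0])
        # Retain only the change, discarding the pooled levels themselves.
        changes[(t0, t1)] = scores[n0:] - scores[:n0]
    return changes
```

Because both years face one common frontier, the difference in scores is internally consistent even if the frontier shifts between periods.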

The third problem pertains specifically to the choice of calculation group classification.

Studies using DEA with financial accounting data have typically sorted firms by industry, such

as the Fama and French (1997) 48-industry classification. Sorting by industry conforms to

DEA’s operations-based origins, in that research has implicitly assumed that firms in the same

industry will employ similar operational and productive technologies to produce revenue.

Industry sorting introduces two possible methodological issues. First, different industries have

different numbers of firms. As indicated by the results on calculation group size, smaller

industries will have mechanically higher efficiency scores, rendering inferences from tests

pooling scores across industries questionable. Second, sorting by industry and pooling across

years leads to possible look-ahead bias. Using archival data from Compustat, I examine the

implications of these two issues on efficiency scores. Despite the theoretical appeal of industry-

based sorting, I find that year-based sorting yields similar inferences in an empirical test of the

association between firm operating efficiency and accounting profitability, suggesting that

efficiency calculations using Compustat data may be robust to alternative classification schemes.

Based on these results, I recommend using year-based sorting, rather than industry-based, to

avoid problems with calculation group size and look-ahead bias.

The final problem I examine relates to using subsets of efficiency scores calculated from

larger datasets. In many research settings, financial accounting data (from Compustat) is merged

to a database with more limited coverage (such as IBES or Execucomp). This leads to an

important design choice: should the researcher measure efficiency for the entire set of firm-years

with available data (in the case of DLM, the majority of firm-years on Compustat) or only on the

subset of firm-years that are the subject of the study? I examine what effect this design choice

has on inferences from tests. I calculate efficiency scores using all firm-years from Compustat

and merge the subsequent scores with two more limited databases: CRSP and Execucomp. In

each case, I find that the efficiency scores differ depending on whether a) I calculate the scores on just the

subset or b) I draw the scores from the more broadly calculated (i.e., full Compustat-based)

scores. Furthermore, I examine the inferences from a test of firm efficiency on CEO

compensation and find that scores calculated using the different methods yield substantively

different inferences. This suggests that the method used to calculate efficiency scores for a subset

of Compustat firm-years is potentially relevant.

I draw two broad conclusions from these analyses. First, each of the issues noted above

has the potential to influence the efficiency scores calculated with DEA. That is, I present

evidence showing how calculation group size, multiple time periods, calculation group

classification, and subsets of efficiency scores can affect the efficiency scores generated by

DEA. I also illustrate how, in some cases, these issues can lead to significant differences in

inferences (for example, using subsets of efficiency scores) but in others inferences appear

unaffected (for example, forming calculation groups by year rather than industry). A key insight

is that researchers must be aware of the possible implications of their implementation choices

and understand how their choices affect the measurement of efficiency and related inferences.

Second, given the broad set of choices that must be made in designing studies using

DEA, researchers should carefully assess the appropriateness and robustness of the choices they

make. I provide several methodological suggestions for researchers who employ DEA with

financial accounting data, but they are only that—suggestions. Within each study, researchers

should carefully consider their objective in measuring efficiency, verify that their efficiency

scores are robust to alternative assumptions, and be able to argue on theoretical grounds why one

set of assumptions is more appropriate than another.

This study contributes to the growing literature that measures efficiency using DEA. In

the spirit of Dyson et al. (2001) and Brown (2006), I identify several “Pitfalls and Protocols”

related to implementing DEA with large panel datasets with particular attention to accounting

data. Thus this research builds on the vast literature on DEA methodology. While the

methodological considerations I discuss are not unique to financial accounting data or databases,

the recent prominence of DEA in financial accounting research suggests a need for a clear

delineation of potential issues. By understanding potential methodological issues, researchers are

better equipped to make and justify appropriate choices. This should lead to research yielding

stronger inferences on the effects of efficiency in a variety of contexts.

The results of this study should also be helpful to researchers looking to apply efficiency

in different settings in financial accounting research. Although most applications of DEA using

financial accounting data have focused on measuring firm operating efficiency3, DEA is flexible

enough to provide insight into a variety of applications using financial accounting data. Possible

unexplored avenues for analysis include measuring the efficiency of specific capital investments,

R&D expenditures, and mergers and acquisitions. Additionally, DEA could be used to assess the

efficiency of a firm’s compensation contracts, tax planning strategies, or government and

regulatory lobbying activities. In short, DEA is a powerful method that incorporates flexibility

into the calculation of efficiency. It is important that researchers understand the assumptions and

choices that go into the DEA program as well as the potential effects of their research design

choices.

2. Background on DEA

DEA background and calculation overview

Efficiency, defined broadly, is the maximization of output(s) for a fixed level of input(s),

or alternatively the minimization of input(s) for a fixed level of output(s). Farrell (1957) notes

several problems with measuring efficiency, specifically related to multiple inputs and outputs.

He develops a method of relative efficiency measurement, where the efficiency of any unit is

evaluated not against an absolute benchmark but rather against a set of comparison units.

3 Following DLM, firm operating efficiency is measured as the efficiency of a firm producing revenue given the level of capital and certain period expenses. I provide a further discussion in Section 2.

Drawing on Farrell (1957), Charnes et al. (1978) develop the DEA model, which expands the

Farrell measure and is innovative in several ways. First, it allows for multiple inputs and outputs

without requiring an explicitly imposed weighting scheme. For example, a researcher may want

to measure the efficiency of firms in producing revenue using capital and labor. Under traditional

efficiency measurement, the researcher would calculate the following:

\[
\textit{Efficiency} = \frac{\textit{Revenue}}{\gamma_1\,\textit{Capital} + \gamma_2\,\textit{Labor}}
\]

The researcher would need to assign weights γ1 and γ2 for the calculation. DEA does not require

such an externally imposed weighting scheme, but rather calculates implicit weights. Second, the

implicit weights are allowed to vary by the unit under study. Following the example above, this

allows variation in the optimal mix of capital and labor, something a fixed weighting scheme

does not allow.

In the remainder of this section I provide an overview of DEA calculations. My treatment

is brief and meant to highlight the fundamentals of this technique; for more in-depth discussions,

I refer readers to Cooper et al. (2006) and Cook and Seiford (2009). Consider a researcher

wanting to measure the relative efficiency of n units (DMUs). The DMUs are sorted into groups

based on commonality in production technology or operations; these are the calculation groups

for the DEA optimization program. The efficiency calculation is based on a vector y of outputs

containing s elements (y1, y2,…, ys) and a vector x of inputs containing m elements (x1, x2,…xm).

The inputs and outputs are used to solve the following program:

\[
\max_{\mathbf{u},\mathbf{v}} \; \theta = \frac{\sum_{r=1}^{s} u_r\, y_{rn}}{\sum_{i=1}^{m} v_i\, x_{in}}
\]

subject to:

\[
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1 \qquad \forall\; j = 1,\dots,n
\]

\[
u_1,\dots,u_s \ge 0; \qquad v_1,\dots,v_m \ge 0
\]

In the program, u and v are vectors of weights—the implicit weights—on the outputs and inputs

respectively. The program’s objective is to find the u and v that maximize the ratio θ subject to

the constraints of the program. The program starts with the first DMU and calculates weight

vectors u and v that maximize the ratio of the weighted average outputs to the weighted average

inputs for that DMU.4 The first condition constrains the maximum efficiency (θ), so the program

initially selects weights u and v which yield efficiency of one.5

This first potential vector of weights for the first DMU is then applied to all other DMUs

in the calculation group. As with the DMU under study, the efficiency for other DMUs is also

constrained to be one or less (from the first constraint, which covers all observations in the

calculation group). As such, if these weights yield a calculated efficiency of greater than one for

any DMU in the calculation group, u and v are rejected and the program selects another set of

weights and starts the process again. The program proceeds to iteratively test different weighting

schemes against the other DMUs in the calculation group. The program ultimately selects the set

of weights that maximizes θ for the DMU under study while not yielding a calculated efficiency

greater than one for any other DMU in the calculation group. This DMU-specific set of weights

u and v is used to calculate the efficiency score for the first DMU.

The DEA program then proceeds to the second DMU and follows the same procedure: a

pair of weight vectors u and v is selected that optimize efficiency for the second DMU subject to

4 In practice this fractional program is typically transformed to be run as a linear program; see Cooper et al. (2006), pgs. 23-25, for details.
5 The second and third conditions require all input and output weights to be non-negative, with the inequality holding strictly for at least one output and one input. This prevents degenerate solutions (e.g., efficiency would be measured as infinite if all input weights were zero).

the constraints of the problem.6 These weights are applied to the inputs and outputs of all other

DMUs in the calculation group (including the first) and the program proceeds iteratively until the

optimal sets of weights that satisfy the program conditions are found. The program then does the

same for the third, fourth, and remaining DMUs until it has calculated DMU-specific weights

and corresponding efficiency scores for each of the n DMUs.
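In practice, the iterative search described above is carried out by solving one linear program per DMU, using the linearization of the fractional program mentioned in footnote 4. The following is a minimal sketch, not the authors' code, of the input-oriented CCR multiplier program in Python; it assumes NumPy and SciPy are available, and the function name is my own.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(X, Y):
    """CCR (constant returns to scale) DEA efficiency for each DMU.

    X: (n, m) array of inputs; Y: (n, s) array of outputs.
    Returns an array of n efficiency scores, each in (0, 1].
    """
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [u_1..u_s, v_1..v_m] (output and input weights).
        # Maximize u . y_o, i.e., minimize its negative.
        c = np.concatenate([-Y[o], np.zeros(m)])
        # Normalization v . x_o = 1 replaces the fractional objective's denominator.
        A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)
        b_eq = [1.0]
        # First constraint of the program: u . y_j - v . x_j <= 0 for every DMU j.
        A_ub = np.hstack([Y, -X])
        b_ub = np.zeros(n)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (s + m), method="highs")
        scores[o] = -res.fun  # theta for DMU o under its own optimal weights
    return scores
```

Each pass through the loop finds the DMU-specific weights u and v that maximize that DMU's ratio while keeping every DMU in the calculation group at or below an efficiency of one, matching the program's constraints.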

The efficiency scores the program calculates range from a low of zero to a high of one.

Observations having a value of one are on the efficient frontier. DMUs on the frontier are

optimally efficient in the Pareto sense: any possible efficiency improvements due to changing

inputs or outputs will be accompanied by corresponding deterioration in other inputs or outputs

that ultimately countervails the efficiency improvement. In short, there is no alternative

weighting scheme that could yield a higher relative efficiency score for that DMU. For off-

frontier DMUs, the efficiency score conveys the degree of inefficiency relative to the nearest

frontier DMU. For example, a DMU with an efficiency score of 0.85 must improve by 15

percent, either through increases in outputs or decreases in inputs (or some combination of both)

to attain the efficiency of the closest frontier DMU. As such, DEA efficiency score is a radial

efficiency measure, capturing relative efficiency beyond just the ordinal sense.

DEA applications

Non-financial accounting applications

Although originally developed in operations management, researchers have subsequently

used DEA in a variety of applications across a wide range of academic disciplines and research

settings. Charnes et al. (1981) examine the success of an education program. Researchers have

also used DEA to measure the efficiency of higher education (Athanassopoulos and Shale 1997;

6 These weights need not be the same for different DMUs. The flexibility of the weights (allowing DMU-specific weights rather than imposing weights for all calculation group observations) implicitly allows firms to optimize with different mixes of inputs and outputs.

Avkiran 2001; Abbott and Doucouliagos 2003; Johnes 2006). Other institutional settings include

hospitals (Banker et al. 1986; Jacobs 2001), telecommunications (Kim et al. 1999), aerospace

and airports (Gillen and Lall 1997; Adler and Berechman 2001; Lin and Hong 2006), and

shipping ports (Tongzon 2001).

In the financial realm, researchers have used DEA to measure the efficiency of banks

(Sherman and Gold 1985; Vassiloglou and Giokas 1990; Avkiran 1999; Cook et al. 2000;

Grigorian and Manole 2002). Outside of banking, studies also examine insurance (Kao and

Hwang 2008). A number of studies examine agricultural settings (Fraser and Cordina 1999;

Dhungana et al. 2004) and ecological performance (Dyckhoff and Allen 2001; Korhonen and

Luptacik 2004).

A common thread in terms of research design of DEA studies is that they tend to feature

specific, focused settings. Charnes et al. (1981), in an early application of DEA, examine

“Follow Through”, a federal education program which focused on 70 public schools. Banker et

al. (1986) examine 114 hospitals in North Carolina. Agricultural studies tend to be similarly

circumscribed and specific: Fraser and Cordina (1999) study dairy farms in North Victoria,

Australia, and Dhungana et al. (2004) focus on Nepalese rice farms.

DEA in financial accounting research

A number of early DEA applications examine financial statement information. These

studies use small, homogeneous samples. Smith (1990) extends traditional ratio analysis using

DEA for a sample of 47 pharmaceutical firms. Yeh (1996) completes a similar exercise for six

Taiwanese banks. Feroz et al. (2003) also supplement ratio analysis with DEA using a sample of

29 oil and gas firms.

Recent research by DLM has stimulated a new line of DEA-based research using large

panel datasets of financial accounting information. DLM uses DEA to calculate firm-year

efficiency with financial statement variables for a broad set of firms with data available on

Compustat. They calculate firm-year efficiency scores in the first stage of a two-stage procedure;

in the second stage they use Tobit regressions to control for firm-specific features and isolate the

effects of the manager, which they term the managerial ability, or the MA Score. Because the

focus of this study is issues related to calculating efficiency using DEA for large panel datasets

of financial accounting data, I focus on their first-stage DEA calculation.7

DLM uses a single output (sales revenue) and seven inputs (net PP&E, net operating

leases, net capitalized R&D, purchased goodwill, other intangible assets, cost of goods sold, and

SG&A expenses). Their data spans firm-years with sufficient data on Compustat for the period

1980 to 2009. They focus on industrial firms so they exclude financial industries (banks,

insurance, real estate, and financial services). Furthermore, they exclude utilities because these

firms are regulated, which likely affects the correspondence between sales and the inputs. Their

sample consists of 177,512 firm-year observations from 44 Fama and French (1997) industries.

They report considerable variation in the number of observations by industry, ranging from a low

of 268 (tobacco) to a high of 21,884 (business services). The DLM calculation of efficiency

pools all firm-years within an industry and calculates relative efficiency across the full study

time period.

3. Methodological Considerations

7 In the second stage, DLM runs the first stage DEA efficiency score through regressions with firm-year controls that help or hinder efficiency: size, market share, free cash flows, age, segment density, and foreign operations. They attribute any variation in operating efficiency that is not explained by the controls to the manager; that is, they use the residual from the second-stage regressions as their measure of managerial ability.

As noted earlier, many of the studies that measure efficiency with DEA use small,

specific samples and calculate efficiency within these groups. The logic underlying this sample

selection approach is that it is difficult to interpret relative efficiency if there is too much

variation in the underlying operations and production of DMUs within a calculation group. In

other words, researchers have typically presumed that DMUs in the same calculation group

should have similar underlying production functions because DMUs with production functions

that differ too dramatically are essentially incomparable.8

Recent research in financial accounting using DEA departs from this design by

employing large panel datasets of financial accounting information. This departure has led to

four methodological considerations not relevant for small-sample, industry-specific studies but

important for researchers to consider when using large financial accounting datasets. Two of

these issues relate to the DEA calculation in general. The first pertains to the size of calculation

groups, and the second relates to the measurement and interpretation of efficiency scores over time.

The other two issues relate more specifically to the use of accounting databases. The third pertains

to the grouping of firm-years for the DEA calculation, and the fourth relates to using subsets of

efficiency scores based on the intersection of accounting data with other datasets. I discuss each

of these issues in the remainder of this section, and provide recommendations to researchers

confronting these problems, along with tests validating alternative approaches, in subsequent

sections.

Calculation group size

8 This assumption is typically implicit rather than explicitly stated in studies employing DEA. Cook and Seiford (2009) note "The original idea behind DEA was to provide a methodology whereby, within a set of comparable decision making units (DMUs), those exhibiting best practice could be identified, and would form an efficient frontier." (emphasis added)

Because DEA measures relative efficiency, researchers sort observations into groups of

similar DMUs to execute the program. The first constraint in the optimization forces the

maximum efficiency to a value of one. This constraint, coupled with calculation groups of

different sizes, has the potential to mechanically distort efficiency scores across groups. As an

example, consider using DEA in a setting with one output and one input. In this case, there is a

single optimal mix of output and input; holding the output equal, the frontier DMU will be the

one that minimizes inputs. Now, extend the example to one output and two inputs. In this case,

there are potentially multiple DMUs on the frontier. Again holding output equal, there will be

one DMU on the frontier that optimizes with a low value of the first input. Similarly, another

will optimize with a low value of the second input. Additionally, there could be multiple linear

combinations of the two inputs that also yield optimal efficiency; the DMUs with these

combinations of inputs will trace out other regions of the frontier.

With multiple inputs and outputs, high dimensionality can lead to many DMUs tracing

the frontier. Holding the number of inputs and outputs equal, proportionally more DMUs will be

on the frontier when the calculation group is small. Moreover, even DMUs that are not on the

frontier will have more reference points on the frontier, and thus will themselves be closer to the

frontier (and have higher efficiency scores). This suggests, holding other things equal, small

calculation groups will lead to higher mean efficiency scores. Additionally, with the maximum

efficiency being constrained at one, this will also lower variance among these scores. I also

expect that the extent of this problem increases with the number of inputs and outputs. In the

remainder of this section I present empirical tests and results that explore calculation group size

effects.

I use simulation to create a dataset that isolates the effects of calculation group size and to

abstract away from the potential idiosyncrasies of archival data. To start, I define a simple

process to generate a dataset with one output and three inputs.9 The output is Sales, which I set to

vary randomly between 10 and 100.10 Two of the inputs are capital, termed Capital A and Capital

B. Their values are also stochastic, ranging from 20 to 140% of sales, and are independent of

each other. The third input is Expense; this is also indexed to Sales, ranges between 20 and 80%,

and is independent of both capital accounts.11 The design of this dataset is meant to emulate a

simple business setting where revenue is generated from two capital sources and one periodic

expense source. The dataset allows for a wide range of aggregate input values, from a low of

60% of sales to a high of 360%. I create a dataset of 1,600 observations using the process noted

above. I present descriptive statistics on these outputs in Table 1, Panel A.
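The data-generating process described above can be sketched as follows. The seed and variable names are my own choices for illustration, not from the paper, and all draws are uniform per footnote 10.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
N = 1_600                        # base sample size

# One output and three inputs, all uniform as described in the text.
sales     = rng.uniform(10, 100, N)              # output: Sales in [10, 100]
capital_a = sales * rng.uniform(0.20, 1.40, N)   # Capital A: 20-140% of Sales
capital_b = sales * rng.uniform(0.20, 1.40, N)   # Capital B: 20-140% of Sales
expense   = sales * rng.uniform(0.20, 0.80, N)   # Expense: 20-80% of Sales

X = np.column_stack([capital_a, capital_b, expense])  # (N, 3) inputs
Y = sales.reshape(-1, 1)                              # (N, 1) output
```

Because the three input draws are independent of one another, aggregate inputs range from 60% to 360% of sales, as stated in the text.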

From this sample I make random draws to create DEA calculation groups of different

sizes. Before drawing the data for the simulation, I calculate efficiency scores for the entire

sample of 1,600 DMUs. This yields the true relative efficiency for the sample, and serves as a

benchmark to evaluate efficiency run with smaller calculation groups. I term the efficiency

scores from this full-sample run DEA1600. I then draw subsamples from this base sample of

1,600 DMUs to determine the mechanical effects of smaller calculation groups on efficiency

scores.

For the first round of the simulation, I randomly select 800 observations from the full

sample of 1,600. Using this first draw, I calculate efficiency using DEA and tabulate the scores

9 This example is generalizable to any number of outputs and inputs; here I use a relatively simple example to help the reader maintain intuition.
10 All the random variables used in the simulation follow a uniform distribution.
11 I define the output and inputs in a financial accounting context due to the focus of the paper. The interpretation of the simulation results can be applied to settings with non-financial accounting outputs and inputs without loss of generality.

for each observation.12 I then draw a second random, independent calculation group of 800 from

the original 1,600 and calculate efficiency again. I repeat this procedure for a total of 50

iterations. Since each draw comprises half of the base sample, each DMU from the base sample

appears an average of 25 times across the 50 simulation draws.13 Pooling all simulation runs, the

total number of efficiency observations is 40,000 (800 in each calculation group for 50

simulation draws). I merge these by the original sample DMU identifier and calculate the mean

efficiency by DMU across all simulation draws. This yields a sample of 1,600 observations,

matching the size of the original sample and ultimately summarizing the 50 simulation draws. I

term the efficiency scores DEA800.

For the next round of the simulation, I reduce the calculation group size to 400. In order

to maintain the same aggregate number of observations, I increase the number of simulation

draws to 100 (400 in each calculation group for 100 simulation draws). This ensures that each

DMU appears, on average, 25 times across the simulation runs. In each successive simulation

round, I halve the size of the calculation group and double the number of simulation draws: the

subsequent calculation group sizes are 200 (leading to 200 simulation iterations), 100 (400

iterations), 50 (800 iterations), and 25 (1,600 iterations). I call the efficiency scores I generate

DEA400, DEA200, DEA100, DEA50, and DEA25.
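The simulation design above, halving the calculation group size while doubling the number of draws, and then averaging each DMU's scores across draws, can be sketched as follows. This is an illustrative outline under my own naming; `dea_fn` stands in for any DEA solver applied to a calculation group.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_simulation(data, group_size, n_draws, dea_fn):
    """Mean efficiency per DMU over random calculation groups.

    data: (N, k) array for the base sample.
    dea_fn: function returning one efficiency score per row of a subset.
    Returns each DMU's mean score across the draws in which it appears.
    """
    sums = np.zeros(len(data))
    counts = np.zeros(len(data))
    for _ in range(n_draws):
        # Draw a calculation group without replacement from the base sample.
        idx = rng.choice(len(data), size=group_size, replace=False)
        scores = dea_fn(data[idx])
        sums[idx] += scores      # accumulate by original DMU identifier
        counts[idx] += 1
    return sums / np.maximum(counts, 1)

# Halve the group size and double the draws each round, so that
# group_size * n_draws = 40,000 and each of the 1,600 DMUs appears
# about 25 times on average in every round.
rounds = [(800, 50), (400, 100), (200, 200),
          (100, 400), (50, 800), (25, 1_600)]
```

Keeping the total number of pooled observations fixed at 40,000 per round is what makes the DEA800 through DEA25 averages comparable across calculation group sizes.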

I evaluate the effect of calculation group size in several ways, starting by examining the

mean efficiency. I present these results in Table 1, Panel B. As noted earlier, DEA1600 serves as

the benchmark for the true relative efficiency. The mean value of DEA1600 is 0.584, the

standard deviation is 0.213, the median is 0.555, and the scores range from 0.25 to 1. The

12. I retain the DMU identifier from the original dataset of 1,600 observations. I use this to compare errors across different calculation group sizes.
13. Across the 50 simulation draws individual DMUs are drawn as few as 15 times and as many as 38. The median number of draws is 25, the 25th percentile is 23, and the 75th percentile is 27.
second row shows the results for DEA800, where the calculation groups comprise 800

observations. The mean is larger, at 0.605, as is the median of 0.572. Mean and median

differences between the two groups are statistically significant, although their magnitudes are

relatively low.14 It is also notable that the standard deviation, at 0.211, is slightly lower than the

benchmark run; this is consistent with the smaller calculation group increasing the lowest

possible efficiency measures and reducing variability.

I present descriptive statistics for the efficiency from other simulation rounds in the

subsequent rows of Table 1, Panel B. The results reveal a consistent pattern: means, medians,

and other distribution parameters (other than maximums, which are one in every case) are

increasing monotonically as the calculation group size gets smaller. While the mean for the

benchmark sample is 0.584, the mean for DEA25 is 0.839, a highly significant 43.7% increase. A

similarly large difference holds for medians. The standard deviation decreases

monotonically from 0.213 for the full sample to 0.164, a decline of 23%. This evidence is

consistent with smaller calculation groups a) moving the lower bound of the distribution higher,

b) compressing the distribution of efficiency, and thus c) reducing the variance of the

distribution. This suggests that small calculation groups cause a reduction in potentially

informative variation.

For the next analysis, I measure the correlation between efficiency scores from each of

the calculation group simulation rounds; I present these results in Table 1, Panel C, with Pearson

correlations in the upper triangle. These correlations are highly significant in all pairs, although

they decline monotonically as calculation group size gets smaller. This suggests that smaller

calculation group size introduces increased measurement error. I present Spearman rank-

14. The difference in the means has a t-statistic of 2.82. The difference in medians, tested with a Wilcoxon test, has a Z-statistic of 2.99.
correlations in the lower triangle of Table 1, Panel C. Although these coefficients are higher, a

similar monotonic trend obtains. This suggests that the errors introduced by smaller calculation groups do not just affect the cardinal values of efficiency, but also affect the ordinal relation

between the measures.

To further understand the effects of varying calculation group size, I sort observations

into quartiles based on DEA1600 and tabulate the mean and standard deviation of the different

efficiency scores. I present these results in Table 1, Panel D. The first column shows the statistics

for the lowest quartile of DEA1600. Not surprisingly, these statistics show an increasing trend

from DEA1600 to DEA25—the smaller the calculation group size, the higher the mean value.

Interestingly, the standard deviation also reveals an increasing pattern. This suggests that,

opposite of the full sample results, smaller calculation group size leads to more variation in the

low end of the distribution.

The second column shows results for the 2nd quartile of DEA1600. The same pattern holds for both the mean and the standard deviation. In the 3rd quartile, the means are still

increasing as calculation group size gets smaller. The mean efficiency scores from the smaller

calculation groups are getting closer to the frontier: the mean value of DEA50 is 0.872, and

DEA25 is 0.929. As the distributions become more compressed near a value of one, the standard

deviation falls. In the third column, the standard deviation increases from DEA1600 to DEA100,

but then decreases for DEA50 and DEA25. In the final column means are still increasing, but

standard deviations are decreasing monotonically. This illustrates the degree to which the

distribution of efficiency in small calculation groups is constrained and thus has little room to

vary. For example, the mean value of DEA25 is 0.992, and the full range in this quartile is 0.905

to 1. This suggests that the most serious inference issues with small calculation group sizes are

likely to be among the most efficient DMUs, where more observations are forced toward the

frontier and variation is constrained to be low.

The next test related to calculation group size exploits the simulation structure to assess

the potential for error in measurement from using smaller calculation groups. As an example,

consider the simulation that generates DEA800. In this simulation, I draw subsamples of 800

observations from the base dataset 50 different times and calculate efficiency for each

subsample. Since the draws are random, each DMU from the base dataset appears on average 25

times across the 50 simulation runs. In the analysis above, I average across the 50 runs to get the

mean effect across simulation runs. In this analysis, I measure the standard deviation of scores by

DMU across the 50 runs. For example, say that DMU1138 from the base sample appears 25

times across the 50 simulation runs, and the scores are (0.35, 0.37, 0.29, …, 0.30). If the standard

deviation of this series is low, this means that the score is relatively consistent across simulation

runs, suggesting precise measurement despite being run in smaller calculation groups. Larger

standard deviations, in contrast, imply a greater range across the simulation runs and greater

error. I present these results in Table 1, Panel E, including the standard deviation and confidence

intervals (5%, 95%) of mean efficiency. The standard deviation for DEA800 is 0.023, suggesting

a relatively tight distribution of scores across simulations. The standard deviations increase as

calculation group size gets smaller, up to 0.072 for DEA25. This impact is borne out in the

confidence intervals: The range for DEA800 is relatively narrow at [0.559, 0.651], while DEA25

has the wide range of [0.685, 0.993]. This analysis suggests that not only do the means get larger and the variances smaller as calculation group size shrinks, but also that the likelihood of measurement error grows.
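The per-DMU dispersion statistics in Panel E amount to grouping the pooled simulation output by DMU identifier. A minimal pandas sketch, using hypothetical scores rather than the paper's data:

```python
import numpy as np
import pandas as pd

# Hypothetical long table: one row per (DMU, simulation-run) efficiency score.
rng = np.random.default_rng(1)
runs = pd.DataFrame({
    "dmu": np.repeat(np.arange(5), 25),                  # ~25 appearances each
    "score": np.clip(rng.normal(0.60, 0.05, 125), 0, 1),
})

# Mean, standard deviation, and 5%/95% bounds by DMU, as in Panel E.
panel_e = runs.groupby("dmu")["score"].agg(
    mean="mean",
    sd="std",
    p05=lambda s: s.quantile(0.05),
    p95=lambda s: s.quantile(0.95),
)
```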

The tests I describe in Table 1 focus on simulated data with one output and three inputs. I

expect, furthermore, that the dimensionality of the output and input sets will affect the issues I

document in Table 1, with larger numbers of inputs and outputs exacerbating small calculation

group issues. To examine this I produce a new simulated dataset with two outputs and six inputs.

The data generating process of the dataset is similar to the dataset used in Table 1, but I double

the number of each type of input and output. For outputs, I include two sales measures (Sales A and Sales B), each ranging from 10 to 100 and independent of the other. The inputs include

four capital measures. Capital A and B range (independently) between 20 and 140% of Sales A,

and Capital C and D range between 20 and 140% of Sales B. Similarly, Expense A ranges

between 20 and 80% of Sales A and Expense B ranges between 20 and 80% of Sales B.
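This data generating process can be sketched directly (the variable names are mine; all draws are independent uniforms):

```python
import numpy as np

# Two-output, six-input DGP as described above: 1,600 DMUs, each input an
# independent uniform fraction of the corresponding sales output.
rng = np.random.default_rng(2)
n = 1600
sales_a = rng.uniform(10, 100, n)
sales_b = rng.uniform(10, 100, n)
Y = np.column_stack([sales_a, sales_b])
X = np.column_stack([
    sales_a * rng.uniform(0.2, 1.4, n),   # Capital A: 20-140% of Sales A
    sales_a * rng.uniform(0.2, 1.4, n),   # Capital B: 20-140% of Sales A
    sales_b * rng.uniform(0.2, 1.4, n),   # Capital C: 20-140% of Sales B
    sales_b * rng.uniform(0.2, 1.4, n),   # Capital D: 20-140% of Sales B
    sales_a * rng.uniform(0.2, 0.8, n),   # Expense A: 20-80% of Sales A
    sales_b * rng.uniform(0.2, 0.8, n),   # Expense B: 20-80% of Sales B
])
```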

Using this new dataset, I run a simulation similar to that for the original one-output, three-input dataset. I again start by calculating efficiency for the full 1,600 observations, and draw random samples of successively smaller size to assess the effects of calculation group size.

In Table 2, Panel A, I present the summary statistics on all outputs and inputs. In Table 2, Panel

B, I report the descriptive statistics for efficiency calculated with different calculation group sizes (as in Table 1). This table illustrates two important facts. First, across all calculation

groups (including DEA1600), the mean efficiency is higher and the standard deviation is lower

than the results reported in Table 1. Thus, without regard to the size of the calculation group, the

larger number of inputs and outputs compresses the distribution of efficiency scores in its own

right, leading to more DMUs on the frontier. Second, the pattern revealed in Table 1, Panel B—

increasing mean efficiencies and decreasing standard deviations—is present and even more

pronounced when there are more inputs and outputs. For example, there is minimal variation in

DEA25, with most observations residing at or very near the frontier; in fact, the median

observation is on the frontier. This shows that the effect of small calculation groups increases with

the number of inputs and outputs.15

The simulation-based analysis I present above yields several important inferences for

researchers. First, the distribution of efficiency scores is sensitive to calculation group size, both

in means and standard deviations. Second, this effect is amplified in the higher end of the

distribution of efficiency. When calculation group size is small enough, it may be difficult to

interpret reported efficiency scores, particularly for values at or close to the frontier. Third,

calculation group size effects are more severe the greater the number of inputs and outputs.

Multiple time periods

The second issue involves measuring DEA efficiency when the underlying data covers

multiple time periods. Although early studies focus on single time period settings, recent large-

sample studies use panel datasets. In these studies, firms have multiple observations over

different time periods and researchers can choose to measure the firm’s level of efficiency or

changes in efficiency by individual time period or by pooling multiple time periods. To

understand the implications of calculating efficiency under each method, I run two separate

simulation analyses. First, I consider the implications of calculating DEA separately for each

time period (e.g., year).16 Since DEA efficiency is a relative measure, it may be difficult to draw

inferences comparing efficiency scores calculated separately in different time periods. Second, I

15. For parsimony I do not reproduce the analysis in Table 1, Panels C, D, and E. The tests for correlation and quartile ranking yield similar inferences as those I report in Table 1. The test for within-DMU variance reveals a somewhat different pattern. The standard deviation increases for DEA800 through DEA200 (going from 0.021 to 0.028) but then decreases through DEA25 (falling to 0.017). This pattern emerges due to two countervailing forces affecting the distribution of efficiency. The first effect, which I document in Table 1, Panel E, is that standard deviations increase for smaller calculation groups. The second effect involves the compression of the distribution against the frontier. With the highest value of efficiency constrained to one and smaller calculation group sizes increasing the lowest value, the range of the distribution is narrower, leading to lower variance. In the two-output, six-input simulation, the second effect exceeds the first for the smallest calculation groups.
16. Researchers can run DEA with whatever periodicity the data allows. For ease of exposition, I will use "year" as the default periodicity (because the majority of financial accounting studies using DEA use yearly data), but the discussion and analysis are generalizable to quarters, months, etc.
examine the implications of pooling DMUs from different years and how this affects efficiency

scores. This method should mitigate the issue of non-comparability when the frontier is shifting,

but could introduce look-ahead bias. In addition to examining year-to-year efficiency scores, I

also analyze the implications of measuring the change in efficiency.

I start by analyzing efficiency scores calculated separately by year. If the efficient frontier

is relatively stationary over time, this method should yield efficiency scores that are comparable

across time periods. If, however, the frontier shifts between years (as may be expected when

technologies and productivity improve or if there are economy-wide shocks) the relative

efficiency scores calculated by DEA may also shift, even if the absolute efficiency of the firm

has not changed over time. To quantify how changes in the frontier affect efficiency scores over

time, I return to simulation. I simulate a new dataset with a single output (sales) and three inputs

(Capital A, Capital B, and Expense). Sales vary randomly between 10 and 100. Inputs again vary

as a percentage of sales: Capital A and Capital B are both between 60 and 140% of sales, and

Expense is between 60 and 100% of sales. I construct a dataset with 1,600 DMUs using these

parameters. I term this the current sample and present summary statistics on the inputs and

outputs in Table 3, Panel A.17

To capture the effect of a change in the frontier over time, I create a second dataset, the

future sample. To keep the intuition simple, I keep sales stationary over time. For example, if

DMU112 has sales of 32 in the current sample, it will also have sales of 32 in the future sample.

I alter the frontier via changes in the inputs. Specifically, I shift the distribution for each input

downward by 20%, meaning each Capital account ranges between 40 and 120% of sales, and

Expense between 40 and 80% of sales. Although each DMU has the same sales figure for the

17. I change the parameters of this simulation, relative to the earlier simulations, to increase the variation. This allows for a sharper contrast between the current and future subsamples used in the over-time analysis.
current and future period, the inputs are independent within and across the current and future

periods. This means a DMU could have high efficiency in the current period and low efficiency

in the future period; or the opposite might hold. I structure the simulation so that firms improve

on average from the current to the future period. Specifically, the average firm will have inputs at

280% of sales in the current period (100% each for Capital A and B, and 80% for Expense) and

220% of sales in the future period (80% each for Capital A and B, 60% for Expense). This

implies an expected improvement of about 21.4%.18 The data generating process, however, is

stochastic so the actual change may differ. I present descriptive statistics for the future sample in

Table 3, Panel B, with combined statistics presented in Table 3, Panel C.

Given these expectations, I start by calculating efficiency for the current and future

subsamples independently (i.e., in separate simulated runs), following an approach where DEA is

run by year. I present summary statistics in Table 4, Panel A. The mean DEA score for the

current sample is 0.832, with a standard deviation of 0.113. The mean efficiency score for the

future sample is significantly lower at 0.782, and significantly more variable with a standard

deviation of 0.141. A striking aspect of this result is that although the simulation was designed to

make future efficiency higher than current, this is not conveyed when comparing mean efficiency

scores. This illustrates the chief shortcoming of calculating DEA by year: as the frontier shifts,

efficiency scores are rendered incomparable between years.

I present additional statistics to illustrate the distortionary effect of separate yearly

calculations of efficiency scores. Since some studies employ a changes research design (e.g.,

Baik et al. 2013), I examine the change in efficiency scores on a DMU-by-DMU basis from the

current to the future period. Based on the differences in means, the mean change is

approximately −4.9%. This difference is, however, variable: 39% of DMUs experienced an
18. 1 − (220 / 280) ≈ 0.214.
improvement in efficiency and 61% had a decline. Because the design of the simulation is meant

to induce on-average increases in efficiency, at least half (and presumably more than half) of

DMUs should experience an improvement in efficiency. On this basis, I conclude that changes in

DMU efficiency do not yield meaningful inferences when efficiency is calculated separately by year and the efficient frontier has shifted.

Due to potential problems in measuring changes when efficiency is calculated on a

period-by-period basis, I next examine pooling all observations regardless of time period and

calculating efficiency scores jointly on the full panel. I start by combining the current and future

samples into a single dataset. I do, however, maintain the distinction between the two so I can

compare these results with those from the prior section’s calculations. I present these results in

Table 4, Panel B. I start with the mean efficiency scores for all 3,200 observations—this includes

two observations for each firm, one from the current period and one from the future. The mean is

significantly lower than either of the separately calculated groups, at 0.673, while the standard

deviation of 0.157 is higher. The greater variation is not surprising, as the joint distribution is

wider than either of the component distributions.

In the next two rows I present statistics of efficiency scores separately for the current and

future samples. The mean value for the current period is 0.564, while the mean value of the

future period is 0.782. The difference between the two groups is statistically significant at

0.218, very close to the theoretical difference of 0.214 predicted by the design of the simulation.

This suggests that calculating jointly across the two years yields efficiency scores that capture

economic differences over time. Further, the differences in the underlying distributions are clear

for the two subsamples: The future subsample has larger values for every point of the

distribution, including the maximum, where the current subsample has a maximum value of

0.855. Finally, the efficiency scores reported for the future subsample are identical to those

reported when DEA is run separately by year. That is, the frontier traced by the future subsample

is the same regardless of the inclusion of the current subsample observations, so pooling rescales only the dominated current-period scores. This indicates that the greater risk of measurement error lies in periods with lower efficiency.

As a final test, I return to measure the direction of the DMU-by-DMU changes between

years. Recall that, when efficiency was calculated separately, approximately 39% of DMUs showed improvement,

contrary to expectations based on the design of the simulation. When efficiency is calculated

jointly, approximately 91% of DMUs show improvement. This result is more in line with

expectations, given the underlying variance of the distribution. In total, this analysis suggests that

calculating jointly allows for an accurate measure of efficiency score changes.
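The contrast between separate-by-year and pooled calculation can be sketched with a deliberately simplified one-input, one-output model, where CRS efficiency reduces to the productivity ratio y/x divided by the calculation group's maximum (this reduction is my simplification; the input parameters echo the Expense design above, shifting down 20 percentage points):

```python
import numpy as np

def dea_ratio(x, y):
    """CRS efficiency in the one-input, one-output special case:
    productivity y/x scaled by the calculation group's maximum."""
    prod = y / x
    return prod / prod.max()

rng = np.random.default_rng(3)
n = 1000
sales = rng.uniform(10, 100, n)            # output held fixed across periods
x_cur = sales * rng.uniform(0.6, 1.0, n)   # current-period input
x_fut = sales * rng.uniform(0.4, 0.8, n)   # frontier shifts down 20 points

# Separate-by-year: each period is scored against its own frontier.
sep_cur = dea_ratio(x_cur, sales)
sep_fut = dea_ratio(x_fut, sales)

# Pooled: one common frontier for both periods.
prod = np.concatenate([sales / x_cur, sales / x_fut])
pooled = prod / prod.max()
pool_cur, pool_fut = pooled[:n], pooled[n:]

frac_improve_sep = np.mean(sep_fut > sep_cur)
frac_improve_pool = np.mean(pool_fut > pool_cur)
# Adding peers can only lower a score, so pooled scores are weakly below
# the separate scores; and because pooling uses one common frontier, the
# pooled change tracks the true (mostly positive) productivity change.
```

With these parameters, the separately calculated changes understate the built-in improvement, while the pooled changes recover its direction, mirroring the Table 4 comparison.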

There is, however, a concern with calculating efficiency jointly across years: look-ahead

bias. For example, consider a researcher calculating efficiency for 2014 and 2015 jointly for a set

of firms with observations in both years. Once the calculation is complete, the researcher uses

efficiency to explain some behavior or phenomenon. Under this method, efficiency will be

calculated for each observation, meaning there will be some efficiency scores for 2014 and some

for 2015. The issue is that the 2014 scores, which are being used to explain some phenomenon in 2014, are calculated using 2015 data. Since these data are unobservable at the time the decision was made, the resulting inferences may be inappropriate. This problem—look-ahead

bias—is potentially more severe for studies that use longer-term panel datasets.19

19. On a conceptual basis, whether look-ahead bias is an issue depends on the context of analysis for which the efficiency scores are being used. If efficiency scores are being used to describe an association, look-ahead bias is likely not particularly damaging to inferences. For example, if a researcher is using a panel dataset to examine the association between efficiency and profitability, using efficiency scores calculated with the full panel can yield valid inferences ex post. In contrast, if the researcher is arguing that efficiency causes a certain effect, the inferences may be distorted by look-ahead bias. For example, if a study contends that the board of directors uses firm efficiency to set compensation contracts, it is important to measure efficiency that the board would be using. This would necessarily be efficiency measured free from look-ahead bias.
The above analysis suggests that either method of calculating efficiency with panel

data—separately by year, or by pooling multiple years—presents potential inference problems.

The researcher must balance the costs of each when deciding which method to use. If the

researcher believes (or better yet, can show) that the efficient frontier is relatively stationary over

time, and that there are roughly similar numbers of observations over time (to avoid calculation

group size issues), calculating efficiency scores by year likely leads to few errors. Researchers

can control for small changes in the frontier using econometric techniques, such as fixed effects,

that are commonly used with panel data.20 If the frontier is changing significantly over time, or

there is a large disparity between yearly observations, it may be better for the researcher to pool

observations, even at the risk of look-ahead bias.21 Researchers’ willingness to accept look-ahead

bias will also be a function of the research context and will depend on whether the study is

seeking association or more direct causation. If the costs of the different choices are difficult to

assess, I recommend that researchers use both methods and ensure that their results are robust.

Measuring changes in efficiency presents similar issues of which the researcher must be

aware. Even in cases where the efficient frontier is not changing too much over time, measuring

changes based on efficiency calculated separately by year could introduce noise. As such, I recommend that researchers measuring the change in efficiency calculate it for pairs

of adjacent years (e.g., if measuring the change for 2014, calculate efficiency by pooling 2013

and 2014) and use this change for subsequent analysis.
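The adjacent-pair recommendation can be sketched in the same simplified one-input special case (my simplification; under CRS with one input and one output, efficiency is the productivity ratio over the pooled group maximum):

```python
import numpy as np

def change_in_efficiency(x_prev, y_prev, x_curr, y_curr):
    """Change in efficiency for year t, computed by pooling years t-1 and t
    so that both years are scored against one common frontier."""
    prod = np.concatenate([y_prev / x_prev, y_curr / x_curr])
    eff = prod / prod.max()          # one frontier for the pooled pair
    n = len(x_prev)
    return eff[n:] - eff[:n]         # current-year minus prior-year score

# Toy example with three firms: inputs change while output stays fixed.
x13, y13 = np.array([5.0, 8.0, 4.0]), np.array([4.0, 4.0, 4.0])  # e.g., 2013
x14, y14 = np.array([4.0, 8.0, 5.0]), np.array([4.0, 4.0, 4.0])  # e.g., 2014
delta = change_in_efficiency(x13, y13, x14, y14)
# delta is positive for the firm that cut its input, zero for the unchanged
# firm, and negative for the firm whose input grew.
```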

Calculation group classification

20. Researchers can also use rolling windows of data to calculate efficiency, but only retain the final year's efficiency score for analysis. Cook and Seiford (2009) refer to this as window analysis.
21. If the number of observations is roughly consistent from period to period and the frontier does not change too dramatically, fixed effects are appropriate to control for changes. When the number of observations differs from year to year, calculation group size effects may distort efficiency scores. Furthermore, fixed effects do not address the loss of variation due to small calculation groups.
When implementing DEA with financial accounting data, researchers have typically

formed calculation groups by industry. For example, DLM sorts firm-years by Fama and French

(1997) 48 industries. Sorting by industry is motivated by the production-based origins of DEA,

and assumes that firms in the same industry are likely to use a similar mix of capital and

expenses to produce revenue. Despite this logic, industry grouping introduces methodological

problems. The first is the varied sizes of industry groups, which can lead to the calculation group size problems documented above. Second, sorting panel datasets by industry (but combining different years within the

same calculation group) introduces look-ahead bias into the calculations. In this section, I

examine the implications of calculation group classification on inferences using efficiency

scores.

Industry groups, regardless of classification system, vary in the number of observations. To

illustrate this, I present the breakout of observations by Fama and French (1997) industry group

in Table 5, Panel A (using data from 1980 to 2015). The industry groups range in size from a

low of 302 (Tobacco) to a high of 26,135 (Business Services). In the same table, I present the

mean operating efficiency score by industry, using the calculation in DLM. This range is also

wide, from a low of 0.267 (Pharmaceuticals) to a high of 0.935 (Shipbuilding). These results are

consistent with the calculation group size issues I discuss earlier, as the smaller industries have,

on average, higher mean efficiency; the correlation coefficient between industry size and

efficiency score is significantly negative at -0.579 (untabulated). The results also show

considerable cross-industry variation in mean efficiency, with a standard deviation of 0.184

(untabulated).

The second issue is look-ahead bias. Although difficult to quantify, the effects of this bias

could be substantial. The long time-series of financial accounting data (30 years in the original

version of DLM, and 35 years in the most recently updated dataset made available to

researchers) means that the economy has changed dramatically from the early to the late

observations. Comparing firms in the 1980s to firms in the 2010s may not yield useful

inferences. Moreover, including subsequent years in the calculation can change a prior year’s

efficiency. As an example, in the DLM 2009 run Microsoft’s 1987 efficiency score was 0.771. In

the 2013 DLM run, Microsoft’s 1987 score had fallen to 0.651.22 This change illustrates how the

reference point and comparison group can affect calculated efficiency, and the importance to

researchers of appreciating these effects.23

The solution to look-ahead bias is to calculate efficiency not by industry, but rather on a

yearly basis.24 Researchers can address any non-stationarity in the frontier in the subsequent

empirical tests (e.g., year fixed effects). Yearly calculation also addresses, at least partially, the

discrepancy in group sizes in the industry-based sorting. In Table 5, Panel B, I present the

number of observations and mean efficiency by year for Compustat firms with sufficient data.

Although the yearly observations vary cyclically over time (rising from 4,674 in 1980 to a high

of 7,621 in the dot-com run-up of 1999, then declining following the bursting of the bubble and the

2007-8 financial crisis back to 4,746 in 2015) the variation is muted relative to the industry

sorting. The variability in mean efficiency is similarly low, with yearly means ranging from 0.237 (1982) to 0.358 (2009). Yearly mean efficiency shows an insignificant positive correlation with yearly

22. This analysis is based on datasets posted by DLM's authors, which are available for download.
23. The context of the research question guides whether look-ahead is an issue. If the researcher wants to understand retrospectively how efficient Microsoft was in 1987, calculating with the whole panel is appropriate. If the researcher wants to understand how Microsoft's efficiency affected some firm decision or choice in 1987, using a real-time score, not a score with future years' data, will lead to better inferences.
24. Alternatively, researchers could calculate efficiency by industry and year. In the context of most studies using financial accounting data (i.e., using Compustat data), calculation groups would be very small and there would be insufficient variation in efficiency scores to derive valid inferences.
observations of 0.054 (untabulated).25 Additionally, the standard deviation of mean efficiency

across years is over six times smaller than the industry sorting, at 0.030.

The rationale for industry-based sorting is to group firms with similar operations. By

sorting by year but not industry, this commonality in production and operations is lost and

DMUs may not be comparable. A key question is whether this will damage the ability of DEA to

effectively differentiate the efficiency of firms. On the one hand, there may be industries that are

inherently more efficient than others; these industries are likely to compose the frontier and other

high efficiency observations, disproportionately underreporting the efficiency of firms in other

industries.26 On the other hand, although operations may differ, the classification and

aggregation of financial statement information may yield comparable results even for different

industries. As an example, firms in the Books and Consumer Products industries use different

production technologies and produce different products. Yet firms in each industry deploy a

variety of capital assets to produce revenue. Given the flexible nature of the DEA program (see

Section 2), it is not clear whether researchers can compare efficiency scores of firm-years across

industries.

This suggests the importance of calculation group classification is an empirical question.

To examine this, I start by calculating efficiency using both an industry-based sorting and a year-

based sorting. I present descriptive statistics for each of these in Table 6, Panel A. Notably, but

not surprisingly (based on the results reported in Table 5), the mean efficiency is lower using the

25. For industry-calculated DEA, a regression of efficiency on the number of DMUs in the calculation group has an R2 of 33.5%, suggesting that in this design calculation group size explains one-third of the variation in efficiency scores. By way of comparison, for the year-calculated DEA, the R2 is only 0.3%.
26. There is little evidence that certain industries are systematically more efficient than others. In untabulated analysis, I measure the mean efficiency by industry based on DEA calculated by year. The observations range from 0.207 (Gold Mining) to 0.563 (Tobacco), a much narrower range than when DEA was calculated by industry. Moreover, Tobacco is an outlier; the next highest industry (Soda) has mean efficiency of 0.333. This range (0.126) is almost identical to the range of DEA calculated by year (0.121), suggesting that, with the possible exception of Tobacco firms, there are not systematic differences in efficiency across industries.
year-based sorting. In unreported analysis, I find that the correlation between the two efficiency

measurements is high (ρ = 0.51) but not perfect. This imperfect correlation suggests the

classification method used to form calculation groups could affect inferences from using

efficiency in subsequent empirical tests.

I use regression analysis to examine the impact that the sorting group has on efficiency. I

focus on the role of efficiency in predicting future accounting performance. Intuitively, firms

with higher efficiency should have superior future operating performance; more efficient firms

are better at maximizing revenues and minimizing costs, which leads to superior accounting

performance. I examine the relation between efficiency and future accounting performance using

the following OLS regression:

ROAt+1 = α + β1 Efficiencyt + β2 ROAt + Γ Controls + fixed effects + ε

I measure accounting performance with ROA, defined as the firm’s earnings before extraordinary

items scaled by average total assets. I include contemporaneous ROA because accounting

performance is persistent over time. Other controls include firm size (the log of total assets),

leverage (the ratio of debt to total assets), and the market-to-book ratio (the market value of

equity plus the book value of debt scaled by total assets). I also include year and industry fixed

effects. Finally, I cluster standard errors by firm and year.
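The specification can be sketched on synthetic data (all names and parameter values below are mine). For brevity, the sketch estimates the fixed-effects regression by OLS with dummy variables and omits the two-way clustered standard errors used in the paper:

```python
import numpy as np

# Synthetic panel with a known efficiency effect built into the DGP.
rng = np.random.default_rng(4)
n, n_year, n_ind = 5000, 10, 8
eff = rng.uniform(0, 1, n)                  # efficiency score in [0, 1]
roa = rng.normal(0.05, 0.08, n)             # contemporaneous ROA
size = rng.normal(6, 2, n)                  # log total assets
lev = rng.uniform(0, 0.6, n)                # leverage
mtb = rng.lognormal(0.3, 0.4, n)            # market-to-book
year = rng.integers(0, n_year, n)
ind = rng.integers(0, n_ind, n)

beta_eff = 0.05                             # true coefficient on efficiency
roa_lead = (0.01 + beta_eff * eff + 0.6 * roa
            + 0.02 * (year / n_year)        # year trend, absorbed by dummies
            + rng.normal(0, 0.05, n))

# Design matrix: intercept, regressors, then year and industry dummies
# (dropping one category of each to avoid collinearity).
year_d = (year[:, None] == np.arange(1, n_year)).astype(float)
ind_d = (ind[:, None] == np.arange(1, n_ind)).astype(float)
Xmat = np.column_stack([np.ones(n), eff, roa, size, lev, mtb, year_d, ind_d])
coef, *_ = np.linalg.lstsq(Xmat, roa_lead, rcond=None)
b1 = coef[1]                                # estimate of the efficiency effect
```

On this synthetic panel the OLS estimate recovers the built-in coefficient; in practice the clustered-error step matters for the reported t-statistics.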

I present the results in Table 6, Panel B. The regression coefficients show that higher

efficiency is associated with higher future earnings: the coefficients on both efficiency measures

are positive and significant. It is also notable that the coefficient on efficiency measured by year

(the second column) has a higher t-statistic than efficiency measured by industry (6.49 vs. 3.93),

suggesting a stronger statistical relation between year-based efficiency and future performance.

To test the significance of this difference, and to account for the correlation between efficiency

and the control variables, I run a Vuong test; this test compares the explanatory power of non-

nested models. If one regression were better at explaining future ROA, this would be reflected in a

significant statistic in the Vuong test. The untabulated results of the Vuong test yield a Z-

statistic of -0.42, which is not statistically significant (p-value: 0.67). This suggests that, despite

the difference in significance on the coefficients for the two efficiency measures, the models are

equally successful in explaining future performance.
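The Vuong statistic is straightforward to compute from the residuals of the two competing OLS models. The helper below is a hypothetical implementation assuming normally distributed errors (the textbook non-nested case), not the code used in the paper:

```python
import numpy as np

def vuong_z(resid1, resid2):
    """Vuong Z-statistic comparing two non-nested OLS models fit to the
    same n observations, computed from their residual vectors."""
    resid1, resid2 = np.asarray(resid1), np.asarray(resid2)
    n = len(resid1)
    s1, s2 = resid1.var(), resid2.var()  # MLE error variances
    # Pointwise log-likelihood difference under normal errors
    # (the -0.5*log(2*pi) constants cancel in the difference)
    m = (-0.5 * np.log(s1) - resid1**2 / (2 * s1)) \
        - (-0.5 * np.log(s2) - resid2**2 / (2 * s2))
    return np.sqrt(n) * m.mean() / m.std(ddof=1)

# Illustration with fabricated residuals: model 1 fits much better,
# so the statistic should be large and positive
rng = np.random.default_rng(0)
e_good = rng.normal(0, 0.5, 1000)  # residuals of the better-fitting model
e_bad = rng.normal(0, 2.0, 1000)   # residuals of the worse-fitting model
z = vuong_z(e_good, e_bad)
```

In practice the residual vectors from the two fitted regressions of future ROA would be passed in; an insignificant |Z| indicates that neither model dominates in explanatory power.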

The above analysis suggests that, although using different classifications for calculation

groups yields different values for efficiency, this does not necessarily affect inferences. Given

that industry sorting leads to variability in calculation group sizes and possible look-ahead bias,

this evidence suggests that time-based sorting may be a more effective design when using

accounting information to calculate efficiency using DEA. I note, however, that I provide only

one validation test of the impact of this choice; researchers should examine the impact of this

design choice in the specific contexts of their studies.

Subsets of efficiency scores

The calculation group can have a substantial influence on the efficiency score. In addition

to issues discussed above, such as mechanical effects from calculation group size, the

composition of the calculation group will also affect scores and potentially inferences. One way

that this can manifest is when a study requires efficiency scores for a subset of firms with

information available to calculate efficiency. This yields a key design decision: should the

researcher calculate efficiency for the entire set of observations (and then draw the efficiency

scores for the subset), or should the researcher only calculate efficiency scores for the subset? In

the Appendix I provide a summary of research that uses DEA to calculate operating efficiency

with Compustat variables, and show the additional data requirements in each study. This

illustrates the frequency with which this issue may be relevant.

As a concrete example, consider a researcher who wants to understand the association

between firm operating efficiency and management forecast accuracy. There will be fewer firm-

years with forecast data available than firm-years with sufficient data to calculate efficiency. The

researcher must then decide which set of firms to use when calculating efficiency. On the one

hand, efficiency could be calculated on all firm-years with data to calculate efficiency

(comprising most Compustat firm-years). On the other hand, the researcher could restrict the

efficiency calculation to only those firm-years with forecast data available. This leads to three

questions that are relevant to the researcher. First, do the subsample and the broader sample

differ significantly? Following the example from above, firms that issue management forecasts

are larger, have more transparent information environments, and are more profitable (Ajinkya et

al. 2005). Second, when the subsample deviates from the broader sample, does the efficiency

score also differ based on the sample upon which it is calculated? Third, if there is a difference,

does it affect inferences from using the scores? Based on the answers to these questions, the

researcher can make a judgment on the appropriate method to calculate efficiency.

I provide evidence related to these questions in this section. I start by contrasting the

population of Compustat firms with subsamples created by intersecting Compustat with two

commonly used databases, CRSP and Execucomp. After examining descriptive differences in the

samples, I examine differences in efficiency, calculating efficiency scores by year following the

output and inputs used in DLM. I follow with a validation test examining the association

between firm operating efficiency and CEO compensation.
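For reference, the optimization underlying these scores can be sketched as the standard input-oriented, constant-returns-to-scale (CCR) linear program, solved once per DMU. This is a generic illustration using scipy, not the exact specification or solver used in DLM:

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented, constant-returns-to-scale DEA (CCR model).
    X: (n, m) array of inputs; Y: (n, s) array of outputs.
    Returns an array of n efficiency scores in (0, 1]."""
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.zeros(1 + n)
        c[0] = 1.0  # minimize theta
        # Inputs: sum_j lambda_j * x_ij <= theta * x_io
        A_in = np.hstack([-X[o][:, None], X.T])
        # Outputs: sum_j lambda_j * y_rj >= y_ro (negated into <= form)
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        res = linprog(c,
                      A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.concatenate([np.zeros(m), -Y[o]]),
                      bounds=[(0, None)] * (1 + n))
        scores[o] = res.fun
    return scores

# Tiny illustration: two DMUs with the same output, where DMU 0 uses half
# the input of DMU 1; DMU 0 defines the frontier and DMU 1 scores 0.5
X = np.array([[2.0], [4.0]])
Y = np.array([[1.0], [1.0]])
scores = dea_ccr_input(X, Y)
```

The calculation group enters through the rows of X and Y: each DMU is scored against the frontier spanned by every other DMU passed in, which is precisely why group size and composition matter.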

CRSP

I start by examining the intersection of the Compustat and CRSP databases. As a

benchmark, I use the full Compustat sample; I calculate operating efficiency scores (following

DLM) for 208,735 firm-years from 1980 to 2015. I match data from this sample to CRSP,

requiring that CRSP have at least one monthly stock return to be matched to an annual

Compustat observation. This matching leads to 153,982 matches, or approximately 73.8% of

Compustat sample observations.

I present descriptive statistics for the full Compustat sample, including various firm

features. Then, for comparison purposes I present similar statistics for the subsample that

matches to CRSP. I present these statistics in Panels A and B of Table 7. The most striking

difference between the two samples is profitability: firms with data on CRSP are significantly

more profitable. Firms with CRSP coverage are also larger, have less debt, and lower book-to-

market ratios. To understand whether these differences affect the measurement of efficiency, I

calculate efficiency two different ways. First, I calculate efficiency using the full Compustat

dataset and merge these values into the Compustat-CRSP intersection. Second, I calculate

efficiency using just those observations in the Compustat-CRSP intersection. I present these

values in Table 7, Panel C. The first row (Efficiency – Full Compustat Sample) presents the

efficiency for the entire Compustat population from 1980 to 2015. The second row (Efficiency –

Compustat/CRSP Matched) presents similar statistics for efficiency (calculated using the full

Compustat sample) but only for the firm-years that have matched data on CRSP. A comparison

of these rows shows that the distributions are almost identical. This means that, despite the

differences between firm-years in the Full Compustat Sample and the Compustat/CRSP Matched

sample, these differences do not carry over to the measurement of efficiency when drawing this

subsample.

Next, I compare efficiency from the subsample of firms in the Compustat-CRSP

intersection where efficiency is calculated on just these firm-years. The logic behind calculating

efficiency separately is that the subsample will be a better comparison group because it will

exclude the much different firms not covered by CRSP. I present statistics for this subsample

calculation (Efficiency – Compustat/CRSP Intersection) in the third row. The mean and median

efficiency scores are higher in Row 3 relative to Row 2, as is the standard deviation; these

differences are all significant as reported in the bottom row of Panel C. I also test the correlation

between the two efficiency measures in Rows 2 and 3. This correlation is highly significant at

approximately 0.70 (untabulated). Collectively, this suggests that even when dealing with a fairly

broad subset of the Compustat firms, calculating efficiency on a non-random subsample of firm-

years can lead to different levels of reported efficiency.27 Thus, researchers should consider the

relevant comparison group (All Compustat firms? Only firms on CRSP?) when designing their

DEA efficiency calculations.
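Once both sets of scores are in hand, the comparison of subsample-based and full-sample-based efficiency is a routine paired test and correlation. A sketch with fabricated score vectors follows; the upward drift in the subsample scores is an assumption mimicking the pattern in Panel C:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical scores for the same firm-years: one vector computed within
# the full sample, one within the subsample (drifted upward, then capped at 1)
eff_full = rng.beta(2, 3, 5000)
eff_subset = np.clip(eff_full + rng.normal(0.05, 0.1, 5000), 0, 1)

t_stat, p_val = stats.ttest_rel(eff_subset, eff_full)  # paired mean difference
rho, _ = stats.pearsonr(eff_full, eff_subset)          # cross-measure correlation
```

A significant paired difference alongside a high but imperfect correlation, as in the archival results above, signals that the two calculation choices produce related but distinct measures.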

Execucomp

I next examine the intersection of Compustat and Execucomp. Unlike CRSP, which

covers a large portion of the Compustat population, Execucomp is more limited both in terms of firm

and year coverage. In Table 8, I present descriptive statistics for Compustat firm-years (Panel A)

and Execucomp (Panel B). Because Execucomp’s broad coverage begins in 1993, I use the 1993

to 2015 time period for both Compustat and Execucomp data. The Execucomp subsample

comprises 34,044 observations, or about 24.2% of the Compustat sample.

The two panels reveal a pattern of differences between the two samples similar to,

though more pronounced than, that observed for the Compustat-CRSP intersection. Execucomp firm-years are

27 It is also likely that calculating efficiency on a smaller subsample leads to smaller calculation group sizes. As
illustrated earlier in Section 3, this can lead to mechanically higher levels of efficiency, even when the subsample
has identical characteristics to the sample from which it is drawn.

significantly larger and more profitable than the broader Compustat population over the reported

time period. They also have less debt and derive more of their value from investment opportunities. In

Table 8, Panel C, I present comparisons between the full sample efficiency (Efficiency – Full

Compustat Sample), the subsample of Execucomp firms using the full sample efficiency

(Efficiency – Compustat/Execucomp Matched), and efficiency calculated yearly using only firm-

years with data available on Execucomp (Efficiency – Compustat/Execucomp Intersection). The

results reveal that efficiency calculated using just the Execucomp firm-years is significantly

higher on average than the efficiency based on all Compustat firm-years. As with CRSP, the

method of calculation appears to have a significant bearing on the reported efficiency.

Validation test

The prior section illustrates differences in the measurement of efficiency when using a

subset of the full population of firms for which efficiency can be calculated. While there are

clearly statistically significant differences, this does not prove that the measurement difference

is meaningful per se; this can only be assessed in application. In this section, I provide a

validation test to assess whether these measurement differences affect the results from empirical

tests.

Due to the dramatic differences reported in Table 8, I examine two different

measurements of efficiency for a sample of Execucomp firms: Efficiency- Compustat/Execucomp

Matched and Efficiency- Compustat/Execucomp Intersection. To test the importance of

differences in these scores, I examine the association between firm efficiency and CEO

compensation. Intuitively, I expect that CEOs of more efficient firms should be paid more than

those of less efficient firms. It is possible, however, that other firm features may fully explain the

efficiency-related portion of the CEO’s pay, resulting in no measurable association. I run the following OLS

regression:

Total Compensation_t = α + β₁ Efficiency_t + Γ Controls + fixed effects + ε

I measure total compensation as the natural logarithm of TDC1 from Execucomp; I restrict

sample observations to CEOs only. As control variables I include current and past return on

assets, the three-year standard deviation of return on assets, current and past stock returns, the

three-year standard deviation of stock returns, the prior year’s logged sales, the beginning of the

year logged market-to-book ratio, and CEO tenure (Core et al. 2008). I include year and industry

fixed effects, and cluster standard errors by firm and year. I run two specifications, each using

one of the two calculations of efficiency discussed above.

I report regression results in Table 9. The first column presents results using Efficiency-

Compustat/Execucomp Matched, where the coefficient on efficiency is positive and statistically

significant. The economic significance is substantial: a one standard deviation shift in efficiency

is accompanied by a 6.9% increase in total pay. In the second column, I measure

efficiency using just Efficiency – Compustat/Execucomp Intersection. The coefficient on

efficiency in this specification is positive but insignificant. In economic terms, a one standard

deviation increase in efficiency is associated with just a 0.8% increase in total pay. The control

variables are largely consistent in terms of sign and significance.28

These results highlight a challenge to researchers when designing studies using DEA. In

the example noted above, the inferences from the tests are ultimately ambiguous—the researcher

cannot tell definitively whether efficiency is related to CEO compensation or not. The first

column reports the effects of efficiency measured relative to all Compustat firms, and suggests

28 The only exception is contemporaneous ROA, which is negative in the first specification and positive in the
second, although statistically significant in neither.

that Execucomp CEOs as a group are compensated for their firms’ superior efficiency. The

second column, on the other hand, examines efficiency measured relative only to other

Execucomp firms, and indicates that, in the cross-section of these firms, relative efficiency is not

associated with greater compensation. The key issue for researchers is to carefully identify,

within the context of the research question, the appropriate grouping for efficiency calculation.

As this example illustrates, the decision can be subtle, yet have significant implications on the

empirical results and their interpretation. Furthermore, understanding the robustness of an

empirical result to alternative measurement—and providing evidence for readers of robustness—

is a vital way to improve inferences.

4. Conclusion

This study examines methodological considerations of calculating efficiency using DEA

with large panel datasets. The results reveal four important insights for researchers. First, the

calculation group size used for DEA can affect efficiency scores; there is a strong negative

correlation between mean efficiency and calculation group size. Furthermore, small calculation

group size attenuates the variance within the group, meaning that researchers cannot necessarily

address small calculation group problems with fixed effects in subsequent analysis. Second,

comparing efficiency scores calculated in different time-based calculation groups, and especially

using DMU-level changes, can lead to incorrect inferences, particularly when the frontier is

shifting over time.

Third, although DEA was developed by operations researchers and grouping is typically

based on common operations or production technologies (i.e., industry), the method appears to

be robust to alternative groupings, including year-based calculation groups. This is relevant when

using panel data, as industry-based groups can lead to small calculation group size and look-

ahead bias. Fourth, researchers should be cautious when drawing subsamples of efficiency scores

from a set of scores calculated from a broader population of firm-years, particularly when the

subsample differs from the broader population. I illustrate that inferences are sensitive to this

choice in the context of executive compensation.

DEA provides a powerful method to calculate relative efficiency, and is useful in a

variety of contexts (both those being used currently and those yet to be developed). This study

presents several methodological issues that are likely to arise when using this method with large

panel datasets, and provides a set of prescriptions of how researchers can use DEA to estimate

and design their tests to alleviate these concerns.

Appendix

Paper (Authors, Journal, Year) Observations Time Period Data Requirements


Baik, Chae, Choi, and Farber (CAR 2013) 71,733 1976-2008 CRSP
Baik, Farber, and Lee (CAR 2011) 14,315 1995-2005 Execucomp, First Call
Banker, Darrough, Huang, and Plehn-Dujowich (TAR 2012) 2,413 1992-2006 Execucomp
Bonsall, Holzman, and Miller (MS 2017) 87,759 1985-2011 Future ROA, future returns, credit ratings
Chang, Hayes, and Hillegeist (MS 2015) 1,610 1992-2001 Execucomp (CEO turnover)
Chen, Podolski, and Veeraraghavan (JEF 2015) 42,754 1993-2006 NBER patent data, Execucomp
Cornaggia, Krishnan, and Wang (CAR 2017) 25,113 1987-2013 S&P long-term credit ratings
Demerjian, Lev, Lewis, and McVay (TAR 2013) 78,423 1989-2009 Data for calculating EQ measures
Evans, Luo, and Nagarajan (TAR 2014) 264 1980-2004 Bankruptcy data
Francis, Hasan, Mani, and Ye (JFE 2016) 3,694 2006-2010 S&P 1500, Execucomp
Garrett, Hoitash, and Prawitt (JAR 2014) 1,005 2005-2010 “Great Places to Work” survey from Fortune Magazine
Guo, Huang, Zhang, and Zhou (TAR 2016) 7,804 2004-2008 KLD Socrates, Audit Analytics
Jung, Lee, and Weber (CAR 2014) 62,165 1983-2007 CRSP
Koester, Shevlin, and Wangerin (MS 2017) 44,616 1994-2010 Data to calculate cash effective tax rate
Kubick and Lockhart (JCF 2016) 16,150 1994-2012 Execucomp
Qiu, Trapkov, and Yakoub (JBF 2014) 2,198 1994-2010 Merger and acquisition data (SDC)

Notes to the Appendix: This table summarizes published and forthcoming studies that use a DEA-based efficiency measure (including the DLM MA Score) as a
principal explanatory variable. The list does not include studies which use DEA efficiency scores in robustness tests or as a control variable. The first column
presents the authors, journal, and year. The second provides the maximum number of observations used in the study; in many cases, some tests used fewer. The
third column shows the time period of each study’s data. The final column shows the data requirement(s) for the study.

Journal abbreviations: CAR – Contemporary Accounting Research; JAR – Journal of Accounting Research; JBF – Journal of Banking and Finance; JCF –
Journal of Corporate Finance; JEF – Journal of Empirical Finance; JFE – Journal of Financial Economics; MS – Management Science; TAR – The Accounting
Review

References

Abbott, M. and C. Doucouliagos. 2003. The efficiency of Australian universities: A data
envelopment analysis. Economics of Education Review 22(1): 89-97.

Adler, N. and J. Berechman. 2001. Measuring airport quality from the airlines’ viewpoint: An
application of data envelopment analysis. Transport Policy 8(3): 171-181.

Ajinkya, B., S. Bhojraj, and P. Sengupta. 2005. The association between outside directors,
institutional investors and the properties of management earnings forecasts. Journal of
Accounting Research 43(3): 343-376.

Athanassopoulos, A. and E. Shale. 1997. Assessing the comparative efficiency of higher
education institutions in the UK by means of data envelopment analysis. Education
Economics 5(2): 117-134.

Avkiran, N. 1999. An application of reference for data envelopment analysis in branch banking:
Helping the novice researcher. International Journal of Bank Marketing 17(5): 206-220.

Avkiran, N. 2001. Investigating technical and scale efficiencies of Australian universities
through data envelopment analysis. Socio-Economic Planning Sciences 35(1): 57-80.

Baik, B., J. Chae, S. Choi, and D. Farber. 2013. Changes in operational efficiency and firm
performance: A frontier analysis approach. Contemporary Accounting Research 30(3): 996-
1026.

Baik, B., D. Farber, and S. Lee. 2011. CEO ability and management earnings forecasts.
Contemporary Accounting Research 28(5): 1645-1668.

Banker, R., R. Conrad, and R. Strauss. 1986. A comparative application of data envelopment
analysis and translog methods: An illustrative study of hospital production. Management
Science 32(1): 30-44.

Banker, R., M. Darrough, R. Huang, and J. Plehn-Dujowich. 2012. The relation between CEO
compensation and past performance. The Accounting Review 88(2): 1-30.

Bonsall, S., E. Holzman, and B. Miller. 2016. Managerial ability and credit risk assessment.
Management Science 63(5): 1425-1449.

Brown, R. 2006. Mismanagement or mismeasurement? Pitfalls and protocols for DEA studies in
the financial services sector. European Journal of Operational Research 174: 1100-1116.

Chang, W., R. Hayes, and S. Hillegeist. 2015. Financial distress risk and new CEO
compensation. Management Science 62(2): 479-501.

Charnes, A., W. Cooper, and E. Rhodes. 1981. Data envelopment analysis: Approach for
evaluating program and managerial efficiency—with an application to the program follow
through experiment in U.S. public school education. Management Science 27(6): 668-697.

Charnes, A., W. Cooper, and E. Rhodes. 1978. Measuring the efficiency of decision making
units. European Journal of Operational Research 2(6): 429-444.

Chen, Y., E. Podolski, and M. Veeraraghavan. 2015. Does managerial ability facilitate corporate
innovative success? Journal of Empirical Finance 34: 313-326.

Cook, W., M. Hababou, and H. Tuenter. 2000. Multicomponent efficiency measurement and
shared inputs in data envelopment analysis: An application to sales and service performance
in bank branches. Journal of Productivity Analysis 14(3): 209-224.

Cook, W. and L. Seiford. 2009. Data envelopment analysis (DEA)—Thirty years on. European
Journal of Operational Research 192: 1-17.

Cooper, W., L. Seiford, and K. Tone. 2006. Introduction to data envelopment analysis and its
uses. Springer, New York.

Core, J., W. Guay, and D. Larcker. 2008. The power of the pen and executive compensation.
Journal of Financial Economics 88: 1-25.

Cornaggia, K., G. Krishnan, and C. Wang. 2017. Managerial ability and credit ratings.
Contemporary Accounting Research, forthcoming.

Demerjian, P., B. Lev, and S. McVay. 2012. Quantifying managerial ability; A new measure and
validity tests. Management Science 58(7): 1229-1248.

Demerjian, P., B. Lev, M. Lewis, and S. McVay. 2013. Managerial ability and earnings quality.
The Accounting Review 88(2): 463-498.

Dhungana, B., P. Nuthall, and G. Nartea. 2004. Measuring the economic inefficiency of
Nepalese rice farms using data envelopment analysis. Australian Journal of Agricultural and
Resource Economics 48(2): 347-369.

Dyckhoff, H., and K. Allen. 2001. Measuring ecological efficiency with data envelopment
analysis (DEA). European Journal of Operational Research 132(2): 312-325.

Dyson, R., R. Allen, A. Camanho, V. Podinovski, C. Sarrico, and E. Shale. 2001. Pitfalls and
protocols in DEA. European Journal of Operational Research 132: 245-259.

Evans, J., S. Luo, and N. Nagarajan. 2013. CEO turnover, financial distress, and contractual
innovation. The Accounting Review 89(3): 959-990.

Fama, E. and K. French. 1997. Industry costs of equity. Journal of Financial Economics 43: 153-
193.

Farrell, M. 1957. The measurement of productive efficiency. Journal of the Royal Statistical
Society. Series A (General) 120(3): 253-290.

Feroz, E., S. Kim, and R. Raab. 2003. Financial statement analysis: A data envelopment analysis
approach. Journal of the Operational Research Society 54(1): 48-58.

Francis, B., I. Hasan, S. Mani, and P. Ye. 2016. Relative peer quality and firm performance.
Journal of Financial Economics 122: 267-282.

Fraser, I. and D. Cordina. 1999. An application of data envelopment analysis to irrigated dairy
farms in Northern Victoria, Australia. Agricultural Systems 59(3): 267-282.

Garrett, J., R. Hoitash, and D. Prawitt. 2014. Trust and financial reporting quality. Journal of
Accounting Research 52(5): 1087-1125.

Gillen, D. and A. Lall. 1997. Developing measures of airport productivity and performance: An
application of data envelopment analysis. Transportation Research Part E: Logistics and
Transportation Review 33(4): 261-273.

Grigorian, D. and V. Manole. 2002. Determinants of commercial bank performance in transition:
An application of data envelopment analysis. Comparative Economic Studies 48(3): 497-522.

Guo, J., P. Huang, Y. Zhang, and N. Zhou. 2016. The effect of employee treatment policies on
internal control weaknesses and financial restatements. The Accounting Review 91(4): 1167-
1194.

Jacobs, R. 2001. Alternative methods to examine hospital efficiency: Data envelopment analysis
and stochastic frontier analysis. Health Care Management Science 4(2): 103-115.

Johnes, J. 2006. Data envelopment analysis and its application to the measurement of efficiency
in higher education. Economics of Education Review 25(3): 273-288.

Jung, B., W. Lee, and D. Weber. 2014. Financial reporting quality and labor investment
efficiency. Contemporary Accounting Research 31(4): 1047-1076.

Kao, C. and S. Hwang. 2008. Efficiency decomposition in two-stage data envelopment analysis:
An application to non-life insurance companies in Taiwan. European Journal of Operational
Research 185(10): 418-429.

Kim, S., C. Park, and K. Park. 1999. An application of data envelopment analysis in telephone
offices evaluation with partial data. Computers & Operations Research 26(1): 59-72.

Koester, A., T. Shevlin, and D. Wangerin. 2017. The role of managerial ability in corporate tax
avoidance. Management Science, forthcoming.

Korhonen, P. and M. Luptacik. 2004. Eco-efficiency analysis of power plants: An extension of
data envelopment analysis. European Journal of Operational Research 154(2): 437-446.

Kubick, T. and G. Lockhart. 2016. Do external labor market incentives motivate CEOs to adopt
more aggressive corporate tax reporting preferences? Journal of Corporate Finance 36: 255-
277.

Lin, L., and C. Hong. 2006. Operational performance evaluation of international major airports:
An application of data envelopment analysis. Journal of Air Transport Management 12(6):
342-351.

Qiu, B., S. Trapkov, and F. Yakoub. 2014. Do target CEOs trade premiums for personal
benefits? Journal of Banking and Finance 42: 23-41.

Sherman, D. and F. Gold. 1985. Bank branch operating efficiency: Evaluation with data
envelopment analysis. Journal of Banking and Finance 9(2): 297-315.

Smith, P. 1990. Data envelopment analysis applies to financial statements. OMEGA 18(2): 131-
138.

Tongzon, J. 2001. Efficiency measurements of selected Australian and other international ports
using data envelopment analysis. Transportation Research Part A: Policy and Practice
35(2): 107-122.

Vassiloglou, M. and D. Giokas. 1990. A study of the relative efficiency of bank branches: An
application of data envelopment analysis. Journal of the Operational Research Society 41(7):
591-597.

Yeh, Q. 1996. The application of data envelopment analysis in conjunction with financial ratios
for bank performance evaluation. Journal of the Operational Research Society 47(8): 980-
988.

TABLE 1
Calculation group size: data and tests
Panel A: Descriptive statistics
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Sales 1,600 55.006 26.654 10.000 31.000 56.000 78.000 100.000
Capital A 1,600 43.670 29.549 2.700 19.450 36.720 62.835 135.240
Capital B 1,600 43.957 29.356 2.200 20.160 36.840 62.790 133.860
Expense 1,600 27.235 16.619 2.310 14.000 23.760 39.000 80.000
Panel B: Efficiency by calculation group
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
DEA1600 1,600 0.582 0.213 0.250 0.408 0.555 0.741 1.000
(benchmark)
DEA800 1,600 0.605 0.211 0.265 0.428 0.572 0.771 1.000
DEA400 1,600 0.635 0.210 0.285 0.455 0.603 0.810 1.000
DEA200 1,600 0.672 0.208 0.307 0.490 0.649 0.864 1.000
DEA100 1,600 0.723 0.201 0.343 0.548 0.718 0.923 1.000
DEA50 1,600 0.779 0.188 0.371 0.616 0.807 0.965 1.000
DEA25 1,600 0.839 0.164 0.417 0.717 0.895 0.988 1.000
Panel C: Correlations
DEA1600 DEA800 DEA400 DEA200 DEA100 DEA50 DEA25
DEA1600 0.994 0.980 0.958 0.926 0.882 0.823
DEA800 0.995 0.994 0.978 0.951 0.909 0.849
DEA400 0.983 0.995 0.993 0.973 0.936 0.878
DEA200 0.966 0.984 0.995 0.990 0.961 0.908
DEA100 0.947 0.970 0.985 0.994 0.984 0.943
DEA50 0.927 0.953 0.972 0.985 0.993 0.977
DEA25 0.911 0.937 0.958 0.973 0.982 0.988
Panel D: Descriptive statistics by efficiency quartile
1st Quartile (lowest) 2nd Quartile 3rd Quartile 4th Quartile (highest)
Mean StdDev Mean StdDev Mean StdDev Mean StdDev
DEA1600 0.337 0.043 0.473 0.042 0.642 0.057 0.886 0.084
(benchmark)
DEA800 0.358 0.046 0.496 0.044 0.665 0.065 0.902 0.079
DEA400 0.386 0.052 0.532 0.058 0.702 0.080 0.922 0.071
DEA200 0.421 0.061 0.577 0.076 0.747 0.094 0.944 0.061
DEA100 0.472 0.074 0.645 0.095 0.810 0.100 0.967 0.046
DEA50 0.537 0.094 0.725 0.108 0.872 0.089 0.982 0.030
DEA25 0.621 0.113 0.813 0.105 0.929 0.064 0.992 0.015

Panel E: Within Observation Variability
Variable Mean Standard Deviation 95% Confidence Interval
DEA800 0.023 [0.559, 0.651]
DEA400 0.037 [0.561, 0.709]
DEA200 0.049 [0.574, 0.770]
DEA100 0.061 [0.601, 0.845]
DEA50 0.072 [0.635, 0.923]
DEA25 0.077 [0.685, 0.993]

Notes to Table 1: This table presents descriptive statistics for analysis of calculation group size. Panel A presents
summary statistics on the simulated dataset used to calculate efficiency scores, including one output (SALES) and
three inputs (CAPITAL A, CAPITAL B, and EXPENSE). SALES is set to vary uniformly between 10 and 100.
CAPITAL A and CAPITAL B each vary randomly between 20 and 140% of SALES. EXPENSE varies randomly
between 20 and 80% of SALES. Using these parameters, I generate a dataset of 1,600 observations. Panel B
presents summary statistics for efficiency scores calculated for different sized calculation groups. DEA1600 is
efficiency calculated on the full dataset and serves as the benchmark for the other calculations. DEA800 is based on
random samples of 800 DMUs drawn from the original dataset; I draw 50 datasets and run DEA for each. The
reported efficiency in Panel B is the mean efficiency, by original sample DMU, over the 50 draws. DEA400,
DEA200, DEA100, DEA50, and DEA25 represent similar statistics from smaller calculation groups. Panel C
presents correlations between the different calculation group measures of efficiency. Panel D presents means and
standard deviations of efficiency by calculation group, sorted by the quartile of DEA1600. Panel E presents the
mean within-DMU standard deviation of efficiency.
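The simulated dataset described above can be regenerated along the following lines; the integer grid for SALES (suggested by the tabulated quartiles) and the seed are assumptions, so the draws will not match the reported values exactly:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is arbitrary
n = 1600
# SALES varies uniformly over 10 to 100
sales = rng.integers(10, 101, n).astype(float)
# CAPITAL A and CAPITAL B each vary randomly between 20% and 140% of SALES
capital_a = sales * rng.uniform(0.2, 1.4, n)
capital_b = sales * rng.uniform(0.2, 1.4, n)
# EXPENSE varies randomly between 20% and 80% of SALES
expense = sales * rng.uniform(0.2, 0.8, n)
```

Feeding these four vectors into a DEA routine with one output (SALES) and three inputs reproduces the benchmark DEA1600 calculation; the smaller-group measures are obtained by scoring random draws of 800, 400, and so on from the same 1,600 rows.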

TABLE 2
Calculation group size: data and tests (2 outputs, 6 inputs)
Panel A: Descriptive statistics
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Sales A 1,600 55.058 26.438 10.000 32.000 55.000 78.000 100.000
Sales B 1,600 54.489 26.376 10.000 32.000 53.000 78.000 100.000
Capital A 1,600 44.937 30.752 2.520 20.095 37.120 63.605 139.000
Capital B 1,600 43.909 29.851 2.200 19.740 35.825 63.290 138.600
Capital C 1,600 42.605 29.676 2.100 19.045 34.315 61.550 137.200
Capital D 1,600 43.097 29.222 2.400 19.610 35.865 61.510 136.000
Expense A 1,600 22.142 12.646 2.000 11.730 20.225 30.800 58.410
Expense B 1,600 21.823 12.849 2.100 11.755 19.440 29.825 60.000
Panel B: Efficiency by calculation group
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
DEA1600 1,600 0.756 0.170 0.372 0.620 0.768 0.909 1.000
(benchmark)
DEA800 1,600 0.855 0.146 0.433 0.742 0.896 1.000 1.000
DEA400 1,600 0.883 0.135 0.450 0.789 0.941 1.000 1.000
DEA200 1,600 0.912 0.119 0.469 0.847 0.979 1.000 1.000
DEA100 1,600 0.940 0.099 0.498 0.920 0.995 1.000 1.000
DEA50 1,600 0.964 0.074 0.536 0.970 1.000 1.000 1.000
DEA25 1,600 0.982 0.048 0.582 0.992 1.000 1.000 1.000

Notes to Table 2: This table presents descriptive statistics for analysis of calculation group size using an extended
model. Panel A presents summary statistics on the simulated dataset used to calculate efficiency scores, including
two outputs (SALES A and SALES B) and six inputs (CAPITAL A, CAPITAL B, CAPITAL C, CAPITAL D,
EXPENSE A and EXPENSE B). SALES A and SALES B are set to vary uniformly between 10 and 100; their
values are independent. CAPITAL A and CAPITAL B each vary randomly between 20 and 140% of SALES A.
CAPITAL C and CAPITAL D each vary randomly between 20 and 140% of SALES B. EXPENSE A varies
randomly between 20 and 80% of SALES A, and EXPENSE B varies randomly between 20 and 80% of SALES B.
Panel B presents summary statistics for efficiency scores calculated for different sized calculation groups. DEA1600
is efficiency calculated on the full dataset and serves as the benchmark for the other calculations. DEA800 is based
on random samples of 800 DMUs drawn from the original dataset; I draw 50 datasets and run DEA for each. The
reported efficiency in Panel B is the mean efficiency, by original sample DMU, over the 50 draws. DEA400,
DEA200, DEA100, DEA50, and DEA25 represent similar statistics from smaller calculation groups.
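The data-generating process described in these notes can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the seed and variable names are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary
n = 1600

# Two independent outputs, each uniform on [10, 100]
sales_a = rng.uniform(10, 100, n)
sales_b = rng.uniform(10, 100, n)

# Capital inputs: random fractions (20-140%) of the corresponding output
capital_a = sales_a * rng.uniform(0.2, 1.4, n)
capital_b = sales_a * rng.uniform(0.2, 1.4, n)
capital_c = sales_b * rng.uniform(0.2, 1.4, n)
capital_d = sales_b * rng.uniform(0.2, 1.4, n)

# Expense inputs: random fractions (20-80%) of the corresponding output
expense_a = sales_a * rng.uniform(0.2, 0.8, n)
expense_b = sales_b * rng.uniform(0.2, 0.8, n)
```

Stacking these eight series into input and output matrices reproduces the kind of dataset summarized in Panel A.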

TABLE 3
Multiple time periods: data
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Panel A: Current sample
Sales 1,600 54.519 26.098 10.000 32.000 54.000 77.500 100.000
Capital A 1,600 55.253 30.518 7.260 29.755 51.595 76.830 139.000
Capital B 1,600 54.880 29.662 6.500 31.240 51.455 75.045 138.600
Expense 1,600 43.685 22.090 6.000 25.230 42.765 59.285 99.000
Panel B: Future sample
Sales 1,600 54.519 26.098 10.000 32.000 54.000 77.500 100.000
Capital A 1,600 44.491 25.810 4.300 23.835 39.960 61.550 120.000
Capital B 1,600 43.367 25.255 4.200 22.945 39.055 58.490 119.000
Expense 1,600 32.660 17.045 4.300 18.480 31.680 44.945 80.000
Panel C: All observations
Sales 3,200 54.519 26.098 10.000 32.000 54.000 77.500 100.000
Capital A 3,200 49.872 28.765 4.300 26.620 45.140 69.140 139.000
Capital B 3,200 49.124 28.138 4.200 26.535 44.725 68.250 138.600
Expense 3,200 38.173 20.482 4.300 21.120 36.200 52.490 99.000

Notes to Table 3: This table presents descriptive statistics and analysis on data used to analyze the effects of multiple
time periods. Panel A presents summary statistics for inputs (CAPITAL A, CAPITAL B, and EXPENSE) and the
output (SALES) for the current sample. SALES varies uniformly between 10 and 100. CAPITAL A and CAPITAL
B vary between 60 and 140% of Sales. EXPENSE varies between 60 and 100% of sales. Panel B presents summary
statistics for inputs and outputs for the future sample. SALES are the same in this dataset as in the current sample
(i.e., if SALES for DMU112 is 32 in the current sample, then SALES for DMU112 is also 32 in the future sample). CAPITAL A and
CAPITAL B vary between 40 and 120% of Sales, and EXPENSE varies between 40 and 80% of Sales. Panel C
presents summary statistics for the current and future samples combined.

TABLE 4
Multiple time periods: tests
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Panel A: Separate calculations
Efficiency- Current 1,600 0.832 0.113 0.600 0.741 0.843 0.928 1.000
Efficiency- Future 1,600 0.782 0.141 0.501 0.663 0.784 0.909 1.000
Difference 1,600 -0.049 0.176 -0.487 -0.171 -0.048 0.069 0.394
Improve 1,600 0.393 0.489 0.000 0.000 0.000 1.000 1.000
Panel B: Joint calculations
Efficiency-All 3,200 0.673 0.157 0.401 0.556 0.639 0.785 1.000
Efficiency- Current 1,600 0.564 0.076 0.401 0.506 0.565 0.625 0.855
Efficiency- Future 1,600 0.782 0.141 0.501 0.663 0.784 0.909 1.000
Difference 1,600 0.218 0.156 -0.189 0.102 0.222 0.333 0.596
Improve 1,600 0.906 0.292 0.000 1.000 1.000 1.000 1.000

Notes to Table 4: This table presents efficiency calculations for multiple time periods. Panel A presents summary
statistics when efficiency is measured in separate runs. EFFICIENCY – CURRENT is efficiency calculated with
only the current sample. EFFICIENCY – FUTURE is efficiency calculated with only the future sample.
DIFFERENCE is the difference between the two measures. IMPROVE is the proportion of observations where
EFFICIENCY – FUTURE is higher than EFFICIENCY – CURRENT. Panel B presents summary statistics when
efficiency is calculated in a single run. EFFICIENCY – ALL is the efficiency calculated for the combined current
and future samples. Other definitions are similar to those in Panel A.

TABLE 5
Observations and efficiency by industry and year
Panel A: Observations sorted by industry
Industry Observations Mean Efficiency
Agriculture 908 0.728
Food 4,218 0.766
Soda 617 0.841
Beer & liquor 973 0.638
Tobacco 302 0.891
Toys 1,931 0.635
Fun 4,246 0.444
Books 1,935 0.618
Household products 4,348 0.728
Clothing 3,004 0.728
Health 4,056 0.567
Medical equipment 6,893 0.453
Drugs 10,186 0.267
Chemicals 4,549 0.609
Rubber 2,454 0.847
Textiles 1,469 0.835
Building materials 5,725 0.591
Construction 2,909 0.669
Steel 3,649 0.684
Fabricated products 1,019 0.892
Machinery 7,895 0.711
Electrical equipment 2,619 0.710
Utilities 4,973 0.490
Automobiles 3,717 0.741
Aerospace 1,211 0.875
Ships 448 0.935
Guns 344 0.931
Gold 2,405 0.320
Mining 2,071 0.279
Coal 502 0.797
Energy 14,760 0.299
Telecom 9,282 0.581
Personal services 2,321 0.709
Business services 26,135 0.371
Computers 9,091 0.479
Chips 13,166 0.464
Laboratory Equipment 4,725 0.645
Paper 3,471 0.796
Boxes 747 0.934
Transportation 6,987 0.605
Wholesale 9,662 0.632
Retail 11,931 0.723
Restaurants 4,881 0.523
Total 208,735 0.545

Panel B: Observations sorted by year
Year Observations Mean Efficiency
1980 4,674 0.266
1981 4,692 0.267
1982 4,713 0.237
1983 4,992 0.250
1984 5,143 0.267
1985 5,117 0.258
1986 5,364 0.239
1987 5,621 0.245
1988 5,601 0.268
1989 5,469 0.256
1990 5,437 0.252
1991 5,481 0.240
1992 5,652 0.291
1993 6,013 0.281
1994 6,373 0.305
1995 6,723 0.294
1996 7,525 0.303
1997 7,709 0.261
1998 7,395 0.288
1999 7,621 0.293
2000 7,425 0.264
2001 7,066 0.309
2002 6,616 0.290
2003 6,338 0.290
2004 6,179 0.294
2005 6,007 0.287
2006 5,849 0.285
2007 5,606 0.300
2008 5,343 0.338
2009 5,240 0.358
2010 5,106 0.273
2011 4,955 0.269
2012 4,888 0.318
2013 5,048 0.319
2014 5,008 0.335
2015 4,746 0.336
Total 208,735 0.284

Notes to Table 5: This table presents summary statistics on the Compustat population with sufficient data to
calculate efficiency following the approach of Demerjian et al. (2012). Panel A sorts firms by Fama and French
(1997) industry and shows the number of observations and mean efficiency where the calculation is by industry.
Panel B presents observations and efficiency calculated by year.
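The efficiency scores tabulated above come from a DEA program solved separately for each calculation group. As a self-contained illustration of how such relative-efficiency scores are obtained, the sketch below solves the standard input-oriented CCR envelopment problem with scipy; this is a generic DEA formulation, not the exact output-oriented specification of Demerjian et al. (2012).

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR (constant returns to scale) DEA.

    X: (n, m) array of inputs; Y: (n, s) array of outputs.
    Returns an array of n efficiency scores in (0, 1], 1 = frontier."""
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.zeros(n + 1)
        c[0] = 1.0                       # minimize theta
        A_ub = np.zeros((m + s, n + 1))
        b_ub = np.zeros(m + s)
        # Input constraints: sum_j lambda_j * x_ij <= theta * x_io
        A_ub[:m, 0] = -X[o]
        A_ub[:m, 1:] = X.T
        # Output constraints: sum_j lambda_j * y_rj >= y_ro
        A_ub[m:, 1:] = -Y.T
        b_ub[m:] = -Y[o]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (n + 1))
        scores[o] = res.x[0]
    return scores
```

Running this function on each industry (or year) subgroup yields group-relative scores like those in Panels A and B; because each DMU is benchmarked only against its own group, the same firm generally receives different scores under different groupings.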

TABLE 6
Calculation group classification: data and tests
Panel A: Descriptive statistics
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Efficiency- Industry 208,735 0.545 0.271 0.012 0.316 0.551 0.770 1.000
Efficiency- Year 208,735 0.284 0.159 0.000 0.204 0.251 0.324 1.000
Panel B: Regression results
ROAt+1
Efficiency – Industry 0.049***
(3.93)
Efficiency – Year 0.070***
(6.44)
ROAt 0.568*** 0.568***
(22.18) (22.40)
Size 0.022*** 0.022***
(8.09) (8.45)
Leverage -0.177*** -0.176***
(-6.83) (-6.80)
Market-to-Book -0.016*** -0.016***
(-14.47) (-14.21)
Intercept -0.026 -0.005
(-1.36) (-0.29)
Fixed Effects? Year, Industry Year, Industry
Observations 175,304 175,304
Adjusted R-squared 0.581 0.581

Notes to Table 6: This table presents results comparing efficiency calculated by industry group (EFFICIENCY –
INDUSTRY) and efficiency calculated by year (EFFICIENCY – YEAR). Panel A presents descriptive statistics on
efficiency measurement. Panel B presents regression results. The dependent variable is the future ROA (ROAt+1),
the future year’s ratio of earnings (Compustat: IB) to average total assets (AT). ROAt is the current year’s return on
assets. SIZE is the natural logarithm of total assets. LEVERAGE is the ratio of total debt (DLTT + DLC) scaled by
total assets. MARKET-TO-BOOK is the total firm market value over the total firm book value ((CSHO*PRCC_C +
DLTT + DLC) / AT). All variables are winsorized at the top and bottom 1% of observations. Each regression
includes year and industry (Fama and French (1997) 48) fixed effects. Standard errors are clustered by firm and
year. *** indicates statistical significance at the 1% level.
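The control variables defined in these notes map directly onto Compustat fields. A sketch of the construction is below; the lagged-assets column name and the winsorization implementation are my assumptions, not the paper's code.

```python
import numpy as np
import pandas as pd

def winsorize(s, p=0.01):
    # Clamp a series at its p-th and (1-p)-th quantiles
    lo, hi = s.quantile(p), s.quantile(1 - p)
    return s.clip(lo, hi)

def build_controls(df):
    """df: firm-years with Compustat columns IB, AT, DLTT, DLC,
    CSHO, PRCC_C, plus lagged total assets in 'at_lag' (assumed name)."""
    avg_at = (df["AT"] + df["at_lag"]) / 2
    out = pd.DataFrame(index=df.index)
    out["roa"] = df["IB"] / avg_at                       # ROA
    out["size"] = np.log(df["AT"])                       # SIZE
    out["leverage"] = (df["DLTT"] + df["DLC"]) / df["AT"]
    # Market-to-book: (market value of equity + debt) over total assets
    out["mtb"] = (df["CSHO"] * df["PRCC_C"]
                  + df["DLTT"] + df["DLC"]) / df["AT"]
    return out.apply(winsorize)  # winsorize each column at 1%/99%
```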

TABLE 7
Subsets of efficiency scores: Compustat-CRSP intersection
Panel A: All Compustat efficiency – descriptives from 1980 to 2015
Variable Obs. Mean Median
Total Assets 208,735 2,301.49 113.368
MVE 189,111 2,248.06 96.994
ROA 208,722 -0.090 0.027
Operating ROA 208,143 0.026 0.109
Leverage 208,036 0.301 0.219
Book-to-Market 188,537 1.046 0.895
Panel B: Compustat-CRSP intersection – descriptives from 1980 to 2015
Obs. Mean t-statistic of difference (pooled t-test) Median Z-statistic of difference (Wilcoxon)
Total Assets 153,944 2,571.04*** 5.46 142.863*** 35.49
MVE 152,802 2,556.30*** 6.48 132.995*** 36.95
ROA 153,933 -0.028*** 42.94 0.033*** 26.15
Operating ROA 153,552 0.074*** 40.43 0.115*** 20.42
Leverage 153,383 0.238*** -48.37 0.199*** -26.34
Book-to-Market 152,282 1.003*** -14.98 0.879*** -6.51
Panel C: Comparison of efficiency calculations
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Efficiency- Full Compustat Sample 208,735 0.284 0.156 0.000 0.204 0.251 0.324 1.000
Efficiency- Compustat/CRSP Matched 153,982 0.283 0.156 0.001 0.203 0.252 0.326 1.000
Efficiency- Compustat/CRSP Intersection 153,982 0.391 0.232 0.000 0.227 0.354 0.514 1.000
Difference (3) – (2) 0.108*** 0.076*** 0.102***
Test (t-test / F-test / Wilcoxon) 151.90 2.20 146.44

Notes to Table 7: This table presents descriptive statistics and tests on the intersection of the Compustat and CRSP
databases. Panel A presents summary statistics on the full Compustat population between 1980 and 2015. Reported
variables include TOTAL ASSETS (AT), MVE (CSHO*PRCC_C), ROA (IB / avg(AT)), OPERATING ROA
(OIADB / avg(AT)), LEVERAGE ((DLTT + DLC) / AT), BOOK-TO-MARKET (AT / (MVE+DLTT+DLC)), and
efficiency scores (calculated for the full Compustat sample, by year). Panel B reports similar statistics for the
subsample of firms with observations on both Compustat and CRSP; I require firms to have at least one monthly
return to be included in this subsample. The table reports differences in means (based on pooled t-tests) and medians
(based on Wilcoxon tests). Panel C presents summary statistics on different efficiency scores. EFFICIENCY- FULL
COMPUSTAT SAMPLE is the efficiency for the entire Compustat population. EFFICIENCY-
COMPUSTAT/CRSP MATCHED is the efficiency of the firms with data on CRSP, based on the full Compustat
calculation of efficiency. EFFICIENCY- COMPUSTAT/CRSP INTERSECTION is efficiency calculated using only
firms with data available on CRSP. The bottom rows present differences between EFFICIENCY-
COMPUSTAT/CRSP MATCHED and EFFICIENCY-COMPUSTAT/CRSP INTERSECTION, including tests of the
means (t-test), standard deviations (F-test), and medians (Wilcoxon). *** indicates statistical significance at the 1%
level.

TABLE 8
Subsets of efficiency scores: Compustat-Execucomp Intersection
Panel A: All Compustat efficiency – descriptives from 1993 to 2015
Variable Obs. Mean Median
Total Assets 140,779 3,007.78 168.654
MVE 130,131 3,030.78 159.828
ROA 140,774 -0.124 0.074
Operating ROA 140,351 -0.002 0.172
Leverage 140,282 0.304 0.385
Book-to-Market 129,704 0.997 1.254
Panel B: Compustat-Execucomp intersection – descriptives from 1993 to 2015
Obs. Mean t-statistic of difference (pooled t-test) Median Z-statistic of difference (Wilcoxon)
Total Assets 34,044 5,923.28*** 20.76 1,067.09*** 139.01
MVE 33,543 6,849.65*** 27.06 1,172.21*** 144.18
ROA 34,044 0.040*** 79.74 0.096*** 75.81
Operating ROA 33,992 0.145*** 90.94 0.203*** 78.35
Leverage 33,908 0.224*** -39.53 0.333*** -8.97
Book-to-Market 33,410 0.848*** -33.52 1.074*** -24.68
Panel C: Comparison of efficiency calculations
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Efficiency- Full Compustat Sample 140,779 0.298 0.164 0.000 0.208 0.262 0.343 1.000
Efficiency- Compustat/Execucomp Matched 34,044 0.358 0.169 0.003 0.249 0.313 0.413 1.000
Efficiency- Compustat/Execucomp Intersection 34,044 0.581 0.220 0.030 0.416 0.545 0.724 1.000
Difference (3) – (2) 0.223*** 0.051*** 0.232***
Test (t-test / F-test / Wilcoxon) 148.22 1.70 140.44

Notes to Table 8: This table presents descriptive statistics and tests on the intersection of the Compustat and
Execucomp databases. Panel A presents summary statistics on the full Compustat population between 1993 and
2015. Reported variables include TOTAL ASSETS (AT), MVE (CSHO*PRCC_C), ROA (IB / avg(AT)),
OPERATING ROA (OIADB / avg(AT)), LEVERAGE ((DLTT + DLC) / AT), BOOK-TO-MARKET (AT /
(MVE+DLTT+DLC)), and efficiency scores (calculated for the full Compustat sample, by year). Panel B reports
similar statistics for the subsample of firms with observations on both Compustat and Execucomp; I require firms to
have annual total compensation (TDC1) to be included in the subsample. The table reports differences in means
(based on pooled t-tests) and medians (based on Wilcoxon tests). Panel C presents summary statistics on efficiency
scores. EFFICIENCY- FULL COMPUSTAT SAMPLE is the efficiency for the entire Compustat population.
EFFICIENCY- COMPUSTAT/EXECUCOMP MATCHED is the efficiency of the firms with data on Execucomp,
based on the full-sample DEA calculation. EFFICIENCY- COMPUSTAT/EXECUCOMP INTERSECTION is the
efficiency calculated with only Execucomp firms. The bottom rows present differences between EFFICIENCY-
COMPUSTAT/EXECUCOMP MATCHED and EFFICIENCY- COMPUSTAT/EXECUCOMP INTERSECTION,
including tests of the means (t-test), standard deviations (F-test), and medians (Wilcoxon). *** indicates statistical
significance at the 1% level.

TABLE 9
Subsets of efficiency scores: Validation test

Log(CEO Total Compensation)


Efficiency- Compustat/Execucomp Matched 0.411***
(4.355)
Efficiency- Compustat/Execucomp Intersection 0.037
(0.569)
ROAt -0.094 0.002
(-0.777) (0.016)
ROAt-1 -0.125 -0.109
(-1.152) (-0.987)
StdROAt-2,t -0.018 0.067
(-0.125) (0.479)
Returnt 0.181*** 0.187***
(7.544) (7.523)
Returnt-1 0.172*** 0.172***
(7.041) (6.978)
StdReturnt-2,t 0.063 0.090
(0.152) (0.214)
Log(Salest-1) 0.397*** 0.419***
(23.029) (24.974)
Market-to-Bookt-1 0.060*** 0.068***
(3.313) (3.662)
CEO Tenure -0.005 -0.005
(-1.583) (-1.585)
Constant 3.787*** 3.731***
(13.887) (13.565)
Fixed Effects Industry, Year Industry, Year
Observations 29,973 29,973
Adjusted R-squared 0.380 0.378
Notes to Table 9: This table presents regression results examining the association between efficiency and CEO total
compensation. The dependent variable in the regressions is the natural log of total CEO compensation (TDC1).
EFFICIENCY- COMPUSTAT/EXECUCOMP MATCHED is efficiency measured for the full Compustat population.
EFFICIENCY- COMPUSTAT/EXECUCOMP INTERSECTION is efficiency measured with only Execucomp firm-years. ROA is the return
on assets (IB / avg(AT)) measured for both for the current and past year; STDROA is the three-year standard
deviation of ROA. RETURN is the one-year buy-and-hold return (calculated with monthly CRSP data) measured for
both the current and past year; STDRETURN is the three-year standard deviation of return (based on 36 monthly
observations). SALES is the reported revenue (SALE). MARKET-TO-BOOK is the total firm market value over the
total firm book value ((CSHO*PRCC_C + DLTT + DLC) / AT). CEO TENURE is the number of years the CEO has
been with the firm. All variables are winsorized at the top and bottom 1%. Regressions include fixed effects for
industry and year, and I cluster standard errors by firm and year. *** indicates statistical significance at the 1%
level.
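A minimal version of this compensation regression can be sketched with statsmodels. For brevity the sketch uses only a subset of the controls, one-way clustering by firm (the table clusters by firm and year), and illustrative column names.

```python
import statsmodels.formula.api as smf

def run_comp_regression(df):
    """df: firm-year panel with illustrative columns log_comp, efficiency,
    roa, log_sales_lag, mtb, tenure, industry, year, firm."""
    model = smf.ols(
        "log_comp ~ efficiency + roa + log_sales_lag + mtb + tenure"
        " + C(industry) + C(year)",   # industry and year fixed effects
        data=df,
    )
    # One-way clustering by firm shown here as a simplification;
    # two-way firm-and-year clustering requires a dedicated routine.
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["firm"]})
```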
