
Calculating Efficiency with Financial Accounting Data:

Data Envelopment Analysis for Accounting Researchers

Peter R. Demerjian*
University of Washington

August 2017

Abstract
Recent years have seen a surge of accounting research using data envelopment analysis
(DEA) to measure efficiency. In this study, I examine the calculation of efficiency using DEA,
with a focus on large panel datasets of financial accounting data. Using simulation and archival
data, I examine four methodological considerations that arise when calculating efficiency with
panel data: calculation group size, calculating efficiency for datasets covering multiple time
periods, the choice of calculation group classification, and using subsets of efficiency scores
calculated from larger datasets. I find that each of these issues potentially influences the
efficiency scores generated by DEA. Based on these methodological issues, I provide evidence
and prescriptions to aid researchers using DEA.

Keywords: data envelopment analysis, financial accounting, DEA


JEL classifications: C61, C67, M41

* I gratefully acknowledge the helpful feedback of Bok Baik, Doug DeJong, Weili Ge, Allison Koester, Melissa
Lewis-Western, Dawn Matsumoto, Paige Patrick and workshop participants at the University of Maastricht.
Financial support was provided by the Foster School of Business and William R. Gregory Accounting Faculty
Fellowship.
Contact information: pdemerj@uw.edu; 206-221-1648.
1. Introduction

Recent years have seen a surge of interest in financial accounting research using data

envelopment analysis (DEA). DEA is an optimization program that measures the relative

efficiency of different observations within a group. Efficiency is typically defined as the

maximization of outputs for a fixed level of inputs, or alternatively the minimization of inputs for

a fixed level of outputs. Relative efficiency is determined by grouping a set of similar

observations (termed decision-making units, or DMUs) and calculating the subset of the most

efficient of those observations, called the efficient frontier. DMUs that are not on the frontier

receive a score conveying their efficiency relative to frontier observations. For example, a DMU

with an efficiency score of 0.85 is 15 percent less efficient than the closest DMU that is on the

efficient frontier.

Although researchers have used DEA in a variety of research areas (such as operations

and management accounting) for several decades, only recently have studies employed this

method widely in financial accounting research. This interest in DEA in financial accounting

contexts stems from research by Demerjian et al. (2012; hereafter DLM). In their study, they calculate the

operating efficiency of firms using DEA employing a wide range of accounting variables over a

long time series.1 Many subsequent studies have used DEA with financial accounting variables

in a similar design as DLM, examining a wide range of research questions.2

Given the recent interest in the literature in DEA, I have three main objectives in

presenting this research. First, use of DEA is relatively new to the financial accounting literature.

I use this study to provide an introduction to this methodology for new users, including

1 DLM uses OLS regression in the second stage of their MA Score calculation. Since the focus of this study is on DEA, I restrict my discussion to their first stage.
2 In the Appendix I list recently published papers that have used a DEA-based efficiency score in their primary analysis.

describing current and potential applications. Second, there are many assumptions and choices

that go into designing the DEA efficiency program. Even for current DEA users, the implications

of these design choices may not be clear. I therefore revisit these assumptions and choices and

highlight their implications for calculating efficiency with DEA. Third, calculating efficiency

using financial accounting data typically involves large panel datasets that introduce a new set of

methodological concerns not currently examined in the literature. To illustrate potential issues

when calculating efficiency with large panel datasets of accounting data, I use both simulated

and archival data from Compustat and other commonly used databases. I examine four

methodological issues related to implementing DEA.

The first issue I examine relates to calculation group size. The calculation group is the

set of observations against which a researcher calculates a DMU’s efficiency. When the

calculation group has many observations, some DMUs will be on the efficient frontier, but most

will be inside the frontier (i.e., inefficient relative to frontier DMUs). As calculation group size

gets smaller, relatively more DMUs are on the frontier. This has two important effects. First, the

average efficiency score in the calculation group is mechanically higher than if a larger

calculation group had been used. Second, the standard deviation among efficiency scores is

lower. In short, small calculation groups compress the distribution of efficiency scores,

eliminating informative cross-sectional variation. I examine the extent of this issue using

simulated datasets. I find mechanical effects as predicted. Further, the effect is exacerbated by

increasing the number of inputs and outputs. Based on these results, I believe researchers must

be aware of the potential effects of small calculation groups and, where possible, expand

calculation group sizes to avoid distorting inferences.

The second issue I investigate is calculating efficiency using data from multiple time

periods. Unlike earlier DEA-based studies, where researchers drew observations from a single

time period, studies using financial accounting data often have panel datasets available. With

firms having multiple observations over time, new and interesting analyses become available to

researchers. For example, we can examine changes in aggregate efficiency over time, or the

evolution of efficiency in a specific firm or reporting unit. Calculating DEA efficiency over

multiple time periods does, however, introduce problems that are not relevant in single time

period settings. I examine these problems using simulated datasets. The first problem is in

measuring efficiency over multiple time periods when the efficient frontier is shifting over time;

this introduces a mechanical issue where efficiency scores are not comparable across different

time periods. The second issue is the potential of look-ahead bias, where future observations are

used to calculate current efficiency scores. Again using simulation, I find that using panel data

and calculating with data from multiple time periods has the potential to distort efficiency scores.

Drawing on these analyses, I make two recommendations. First, when calculating the level of

efficiency over time, I recommend running DEA by time period (e.g., year) and using fixed

effects to control for over-time changes in the efficient frontier. Second, when calculating

changes in efficiency, I recommend pooling pairs of years, calculating DEA efficiency using

both years, and retaining only the change from the resulting calculation.
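The second recommendation, pooling pairs of adjacent years, running DEA on the pooled group, and retaining only the change, can be sketched as follows. This is my own illustrative sketch, not the paper's code: `dea_fn` stands in for any DEA solver, and I assume the same firms appear in the same row order in both years.

```python
import numpy as np

def efficiency_changes(panels, dea_fn):
    """Change in efficiency from DEA run on pooled pairs of adjacent years.

    panels: dict mapping year -> (n_t, k) data array for that year.
    dea_fn: function returning one efficiency score per row of a pooled array.
    Returns a dict mapping (year_t, year_t+1) -> per-firm score change,
    assuming rows align across the two years (same firms, same order).
    """
    years = sorted(panels)
    changes = {}
    for t0, t1 in zip(years, years[1:]):
        # Pool both years so scores are measured against a single frontier.
        pooled = np.vstack([panels[t0], panels[t1]])
        scores = dea_fn(pooled)
        n0 = len(panels[t0])
        # Retain only the change, discarding the pooled levels themselves.
        changes[(t0, t1)] = scores[n0:] - scores[:n0]
    return changes
```

Because both years face one common frontier, the difference in scores is internally consistent even if the frontier shifts between periods.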

The third problem pertains specifically to the choice of calculation group classification.

Studies using DEA with financial accounting data have typically sorted firms by industry, such

as the Fama and French (1997) 48-industry classification. Sorting by industry conforms to

DEA’s operations-based origins, in that research has implicitly assumed that firms in the same

industry will employ similar operational and productive technologies to produce revenue.

Industry sorting introduces two possible methodological issues. First, different industries have

different numbers of firms. As indicated by the results on calculation group size, smaller

industries will have mechanically higher efficiency scores, rendering inferences from tests

pooling scores across industries questionable. Second, sorting by industry and pooling across

years leads to possible look-ahead bias. Using archival data from Compustat, I examine the

implications of these two issues on efficiency scores. Despite the theoretical appeal of industry-

based sorting, I find that year-based sorting yields similar inferences in an empirical test of the

association between firm operating efficiency and accounting profitability, suggesting that

efficiency calculations using Compustat data may be robust to alternative classification schemes.

Based on these results, I recommend using year-based sorting, rather than industry-based, to

avoid problems with calculation group size and look-ahead bias.

The final problem I examine relates to using subsets of efficiency scores calculated from

larger datasets. In many research settings, financial accounting data (from Compustat) is merged

to a database with more limited coverage (such as IBES or Execucomp). This leads to an

important design choice: should the researcher measure efficiency for the entire set of firm-years

with available data (in the case of DLM, the majority of firm-years on Compustat) or only on the

subset of firm-years that are the subject of the study? I examine what effect this design choice

has on inferences from tests. I calculate efficiency scores using all firm-years from Compustat

and merge the subsequent scores with two more limited databases: CRSP and Execucomp. In

each case, I find that the efficiency scores differ depending on whether a) I calculate the scores on just the

subset or b) I draw the scores from the more broadly calculated (i.e., full Compustat-based)

scores. Furthermore, I examine the inferences from a test of firm efficiency on CEO

compensation and find that scores calculated using the different methods yield substantively

different inferences. This suggests that the method used to calculate efficiency scores for a subset

of Compustat firm-years is potentially relevant.

I draw two broad conclusions from these analyses. First, each of the issues noted above

has the potential to influence the efficiency scores calculated with DEA. That is, I present

evidence showing how calculation group size, multiple time periods, calculation group

classification, and subsets of efficiency scores can affect the efficiency scores generated by

DEA. I also illustrate how, in some cases, these issues can lead to significant differences in

inferences (for example, using subsets of efficiency scores) but in others inferences appear

unaffected (for example, forming calculation groups by year rather than industry). A key insight

is that researchers must be aware of the possible implications of their implementation choices

and understand how their choices affect the measurement of efficiency and related inferences.

Second, given the broad set of choices that must be made in designing studies using

DEA, researchers should carefully assess the appropriateness and robustness of the choices they

make. I provide several methodological suggestions for researchers who employ DEA with

financial accounting data, but they are only that—suggestions. Within each study, researchers

should carefully consider their objective in measuring efficiency, verify that their efficiency

scores are robust to alternative assumptions, and be able to argue on theoretical grounds why one

set of assumptions is more appropriate than another.

This study contributes to the growing literature that measures efficiency using DEA. In

the spirit of Dyson et al. (2001) and Brown (2006), I identify several “Pitfalls and Protocols”

related to implementing DEA with large panel datasets with particular attention to accounting

data. Thus this research builds on the vast literature on DEA methodology. While the

methodological considerations I discuss are not unique to financial accounting data or databases,

the recent prominence of DEA in financial accounting research suggests a need for a clear

delineation of potential issues. By understanding potential methodological issues, researchers are

better equipped to make and justify appropriate choices. This should lead to research yielding

stronger inferences on the effects of efficiency in a variety of contexts.

The results of this study should also be helpful to researchers looking to apply efficiency

in different settings in financial accounting research. Although most applications of DEA using

financial accounting data have focused on measuring firm operating efficiency3, DEA is flexible

enough to provide insight into a variety of applications using financial accounting data. Possible

unexplored avenues for analysis include measuring the efficiency of specific capital investments,

R&D expenditures, and mergers and acquisitions. Additionally, DEA could be used to assess the

efficiency of a firm’s compensation contracts, tax planning strategies, or government and

regulatory lobbying activities. In short, DEA is a powerful method that incorporates flexibility

into the calculation of efficiency. It is important that researchers understand the assumptions and

choices that go into the DEA program as well as the potential effects of their research design

choices.

2. Background on DEA

DEA background and calculation overview

Efficiency, defined broadly, is the maximization of output(s) for a fixed level of input(s),

or alternatively the minimization of input(s) for a fixed level of output(s). Farrell (1957) notes

several problems with measuring efficiency, specifically related to multiple inputs and outputs.

He develops a method of relative efficiency measurement, where the efficiency of any unit is

evaluated not against an absolute benchmark but rather against a set of comparison units.

3 Following DLM, firm operating efficiency is measured as the efficiency of a firm producing revenue given the level of capital and certain period expenses. I provide a further discussion in Section 2.

Drawing on Farrell (1957), Charnes et al. (1978) develop the DEA model, which expands the

Farrell measure and is innovative in several ways. First, it allows for multiple inputs and outputs

without requiring an explicitly imposed weighting scheme. For example, a researcher may want

to measure the efficiency of firms in producing revenue using capital and labor. Under traditional

efficiency measurement, the researcher would calculate the following:

\[
\textit{Efficiency} = \frac{\textit{Revenue}}{\gamma_1\,\textit{Capital} + \gamma_2\,\textit{Labor}}
\]

The researcher would need to assign weights γ1 and γ2 for the calculation. DEA does not require

such an externally imposed weighting scheme, but rather calculates implicit weights. Second, the

implicit weights are allowed to vary by the unit under study. Following the example above, this

allows variation in the optimal mix of capital and labor, something a fixed weighting scheme

does not allow.

In the remainder of this section I provide an overview of DEA calculations. My treatment

is brief and meant to highlight the fundamentals of this technique; for more in-depth discussions,

I refer readers to Cooper et al. (2006) and Cook and Seiford (2009). Consider a researcher

wanting to measure the relative efficiency of n units (DMUs). The DMUs are sorted into groups

based on commonality in production technology or operations; these are the calculation groups

for the DEA optimization program. The efficiency calculation is based on a vector y of outputs

containing s elements (y1, y2,…, ys) and a vector x of inputs containing m elements (x1, x2,…xm).

The inputs and outputs are used to solve the following program:

\[
\max_{\mathbf{u},\mathbf{v}} \; \theta = \frac{\sum_{r=1}^{s} u_r\, y_{rn}}{\sum_{i=1}^{m} v_i\, x_{in}}
\]

subject to:

\[
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1 \qquad \forall\; j = 1,\dots,n
\]

\[
u_1,\dots,u_s \ge 0; \qquad v_1,\dots,v_m \ge 0
\]

In the program, u and v are vectors of weights—the implicit weights—on the outputs and inputs

respectively. The program’s objective is to find the u and v that maximize the ratio θ subject to

the constraints of the program. The program starts with the first DMU and calculates weight

vectors u and v that maximize the ratio of the weighted average outputs to the weighted average

inputs for that DMU.4 The first condition constrains the maximum efficiency (θ), so the program

initially selects weights u and v which yield efficiency of one.5

This first potential vector of weights for the first DMU is then applied to all other DMUs

in the calculation group. As with the DMU under study, the efficiency for other DMUs is also

constrained to be one or less (from the first constraint, which covers all observations in the

calculation group). As such, if these weights yield a calculated efficiency of greater than one for

any DMU in the calculation group, u and v are rejected and the program selects another set of

weights and starts the process again. The program proceeds to iteratively test different weighting

schemes against the other DMUs in the calculation group. The program ultimately selects the set

of weights that maximizes θ for the DMU under study while not yielding a calculated efficiency

greater than one for any other DMU in the calculation group. This DMU-specific set of weights

u and v is used to calculate the efficiency score for the first DMU.

The DEA program then proceeds to the second DMU and follows the same procedure: a

pair of weight vectors u and v is selected that optimize efficiency for the second DMU subject to

4 In practice this fractional program is typically transformed to be run as a linear program; see Cooper et al. (2006), pgs. 23-25, for details.
5 The second and third conditions require all input and output weights to be non-negative, with the inequality holding strictly for at least one output and one input. This prevents degenerate solutions (e.g., efficiency would be measured as infinite if all input weights were zero).

the constraints of the problem.6 These weights are applied to the inputs and outputs of all other

DMUs in the calculation group (including the first) and the program proceeds iteratively until the

optimal sets of weights that satisfy the program conditions are found. The program then does the

same for the third, fourth, and remaining DMUs until it has calculated DMU-specific weights

and corresponding efficiency scores for each of the n DMUs.
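In practice, the iterative search described above is carried out by solving one linear program per DMU, using the linearization of the fractional program mentioned in footnote 4. The following is a minimal sketch, not the authors' code, of the input-oriented CCR multiplier program in Python; it assumes NumPy and SciPy are available, and the function name is my own.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(X, Y):
    """CCR (constant returns to scale) DEA efficiency for each DMU.

    X: (n, m) array of inputs; Y: (n, s) array of outputs.
    Returns an array of n efficiency scores, each in (0, 1].
    """
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [u_1..u_s, v_1..v_m] (output and input weights).
        # Maximize u . y_o, i.e., minimize its negative.
        c = np.concatenate([-Y[o], np.zeros(m)])
        # Normalization v . x_o = 1 replaces the fractional objective's denominator.
        A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)
        b_eq = [1.0]
        # First constraint of the program: u . y_j - v . x_j <= 0 for every DMU j.
        A_ub = np.hstack([Y, -X])
        b_ub = np.zeros(n)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (s + m), method="highs")
        scores[o] = -res.fun  # theta for DMU o under its own optimal weights
    return scores
```

Each pass through the loop finds the DMU-specific weights u and v that maximize that DMU's ratio while keeping every DMU in the calculation group at or below an efficiency of one, matching the program's constraints.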

The efficiency scores the program calculates range from a low of zero to a high of one.

Observations having a value of one are on the efficient frontier. DMUs on the frontier are

optimally efficient in the Pareto sense: any possible efficiency improvements due to changing

inputs or outputs will be accompanied by corresponding deterioration in other inputs or outputs

that ultimately countervails the efficiency improvement. In short, there is no alternative

weighting scheme that could yield a higher relative efficiency score for that DMU. For off-

frontier DMUs, the efficiency score conveys the degree of inefficiency relative to the nearest

frontier DMU. For example, a DMU with an efficiency score of 0.85 must improve by 15

percent, either through increases in outputs or decreases in inputs (or some combination of both)

to attain the efficiency of the closest frontier DMU. As such, DEA efficiency score is a radial

efficiency measure, capturing relative efficiency beyond just the ordinal sense.

DEA applications

Non-financial accounting applications

Although originally developed in operations management, researchers have subsequently

used DEA in a variety of applications across a wide range of academic disciplines and research

settings. Charnes et al. (1981) examine the success of an education program. Researchers have

also used DEA to measure the efficiency of higher education (Athanassopoulos and Shale 1997;

6 These weights need not be the same for different DMUs. The flexibility of the weights (allowing DMU-specific weights rather than imposing weights for all calculation group observations) implicitly allows firms to optimize with different mixes of inputs and outputs.

Avkiran 2001; Abbott and Doucouliagos 2003; Johnes 2006). Other institutional settings include

hospitals (Banker et al. 1986; Jacobs 2001), telecommunications (Kim et al. 1999), aerospace

and airports (Gillen and Lall 1997; Adler and Berechman 2001; Lin and Hong 2006), and

shipping ports (Tongzon 2001).

In the financial realm, researchers have used DEA to measure the efficiency of banks

(Sherman and Gold 1985; Vassiloglou and Giokas 1990; Avkiran 1999; Cook et al. 2000;

Grigorian and Manole 2002). Outside of banking, studies also examine insurance (Kao and

Hwang 2008). A number of studies examine agricultural settings (Fraser and Cordina 1999;

Dhungana et al. 2004) and ecological performance (Dyckhoff and Allen 2001; Korhonen and

Luptacik 2004).

A common thread in terms of research design of DEA studies is that they tend to feature

specific, focused settings. Charnes et al. (1981), in an early application of DEA, examine

“Follow Through”, a federal education program which focused on 70 public schools. Banker et

al. (1986) examine 114 hospitals in North Carolina. Agricultural studies tend to be similarly

circumscribed and specific: Fraser and Cordina (1999) study dairy farms in North Victoria,

Australia, and Dhungana et al. (2004) focus on Nepalese rice farms.

DEA in financial accounting research

A number of early DEA applications examine financial statement information. These

studies use small, homogeneous samples. Smith (1990) extends traditional ratio analysis using

DEA for a sample of 47 pharmaceutical firms. Yeh (1996) completes a similar exercise for six

Taiwanese banks. Feroz et al. (2003) also supplement ratio analysis with DEA using a sample of

29 oil and gas firms.

Recent research by DLM has stimulated a new line of DEA-based research using large

panel datasets of financial accounting information. DLM uses DEA to calculate firm-year

efficiency with financial statement variables for a broad set of firms with data available on

Compustat. They calculate firm-year efficiency scores in the first stage of a two-stage procedure;

in the second stage they use Tobit regressions to control for firm-specific features and isolate the

effects of the manager, which they term the managerial ability, or the MA Score. Because the

focus of this study is issues related to calculating efficiency using DEA for large panel datasets

of financial accounting data, I focus on their first-stage DEA calculation.7

DLM uses a single output (sales revenue) and seven inputs (net PP&E, net operating

leases, net capitalized R&D, purchased goodwill, other intangible assets, cost of goods sold, and

SG&A expenses). Their data spans firm-years with sufficient data on Compustat for the period

1980 to 2009. They focus on industrial firms so they exclude financial industries (banks,

insurance, real estate, and financial services). Furthermore, they exclude utilities because these

firms are regulated, which likely affects the correspondence between sales and the inputs. Their

sample consists of 177,512 firm-year observations from 44 Fama and French (1997) industries.

They report considerable variation in the number of observations by industry, ranging from a low

of 268 (tobacco) to a high of 21,884 (business services). The DLM calculation of efficiency

pools all firm-years within an industry and calculates relative efficiency across the full study

time period.

3. Methodological Considerations

7 In the second stage, DLM runs the first stage DEA efficiency score through regressions with firm-year controls that help or hinder efficiency: size, market share, free cash flows, age, segment density, and foreign operations. They attribute any variation in operating efficiency that is not explained by the controls to the manager; that is, they use the residual from the second-stage regressions as their measure of managerial ability.

As noted earlier, many of the studies that measure efficiency with DEA use small,

specific samples and calculate efficiency within these groups. The logic underlying this sample

selection approach is that it is difficult to interpret relative efficiency if there is too much

variation in the underlying operations and production of DMUs within a calculation group. In

other words, researchers have typically presumed that DMUs in the same calculation group

should have similar underlying production functions because DMUs with production functions

that differ too dramatically are essentially incomparable.8

Recent research in financial accounting using DEA departs from this design by

employing large panel datasets of financial accounting information. This departure has led to

four methodological considerations not relevant for small-sample, industry-specific studies but

important for researchers to consider when using large financial accounting datasets. Two of

these issues relate to the DEA calculation in general. The first pertains to the size of calculation

groups, and the second relates to the measurement and interpretation of efficiency scores over time.

The other two issues relate more specifically to the use of accounting databases. The third pertains

to the grouping of firm-years for the DEA calculation, and the fourth relates to using subsets of

efficiency scores based on the intersection of accounting data with other datasets. I discuss each

of these issues in the remainder of this section, and provide recommendations to researchers

confronting these problems, along with tests validating alternative approaches, in subsequent

sections.

Calculation group size

8 This assumption is typically implicit rather than explicitly stated in studies employing DEA. Cook and Seiford (2009) note "The original idea behind DEA was to provide a methodology whereby, within a set of comparable decision making units (DMUs), those exhibiting best practice could be identified, and would form an efficient frontier." (emphasis added)

Because DEA measures relative efficiency, researchers sort observations into groups of

similar DMUs to execute the program. The first constraint in the optimization forces the

maximum efficiency to a value of one. This constraint, coupled with calculation groups of

different sizes, has the potential to mechanically distort efficiency scores across groups. As an

example, consider using DEA in a setting with one output and one input. In this case, there is a

single optimal mix of output and input; holding the output equal, the frontier DMU will be the

one that minimizes inputs. Now, extend the example to one output and two inputs. In this case,

there are potentially multiple DMUs on the frontier. Again holding output equal, there will be

one DMU on the frontier that optimizes with a low value of the first input. Similarly, another

will optimize with a low value of the second input. Additionally, there could be multiple linear

combinations of the two inputs that also yield optimal efficiency; the DMUs with these

combinations of inputs will trace out other regions of the frontier.

With multiple inputs and outputs, high dimensionality can lead to many DMUs tracing

the frontier. Holding the number of inputs and outputs equal, proportionally more DMUs will be

on the frontier when the calculation group is small. Moreover, even DMUs that are not on the

frontier will have more reference points on the frontier, and thus will themselves be closer to the

frontier (and have higher efficiency scores). This suggests, holding other things equal, small

calculation groups will lead to higher mean efficiency scores. Additionally, with the maximum

efficiency being constrained at one, this will also lower variance among these scores. I also

expect that the extent of this problem increases with the number of inputs and outputs. In the

remainder of this section I present empirical tests and results that explore calculation group size

effects.

I use simulation to create a dataset that isolates the effects of calculation group size and to

abstract away from the potential idiosyncrasies of archival data. To start, I define a simple

process to generate a dataset with one output and three inputs.9 The output is Sales, which I set to

vary randomly between 10 and 100.10 Two of the inputs are capital, termed Capital A and Capital

B. Their values are also stochastic, ranging from 20 to 140% of sales, and are independent of

each other. The third input is Expense; this is also indexed to Sales, ranges between 20 and 80%,

and is independent of both capital accounts.11 The design of this dataset is meant to emulate a

simple business setting where revenue is generated from two capital sources and one periodic

expense source. The dataset allows for a wide range of aggregate input values, from a low of

60% of sales to a high of 360%. I create a dataset of 1,600 observations using the process noted

above. I present descriptive statistics on these outputs in Table 1, Panel A.
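The data-generating process described above can be sketched as follows. The seed and variable names are my own choices for illustration, not from the paper, and all draws are uniform per footnote 10.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
N = 1_600                        # base sample size

# One output and three inputs, all uniform as described in the text.
sales     = rng.uniform(10, 100, N)              # output: Sales in [10, 100]
capital_a = sales * rng.uniform(0.20, 1.40, N)   # Capital A: 20-140% of Sales
capital_b = sales * rng.uniform(0.20, 1.40, N)   # Capital B: 20-140% of Sales
expense   = sales * rng.uniform(0.20, 0.80, N)   # Expense: 20-80% of Sales

X = np.column_stack([capital_a, capital_b, expense])  # (N, 3) inputs
Y = sales.reshape(-1, 1)                              # (N, 1) output
```

Because the three input draws are independent of one another, aggregate inputs range from 60% to 360% of sales, as stated in the text.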

From this sample I make random draws to create DEA calculation groups of different

sizes. Before drawing the data for the simulation, I calculate efficiency scores for the entire

sample of 1,600 DMUs. This yields the true relative efficiency for the sample, and serves as a

benchmark to evaluate efficiency run with smaller calculation groups. I term the efficiency

scores from this full-sample run DEA1600. I then draw subsamples from this base sample of

1,600 DMUs to determine the mechanical effects of smaller calculation groups on efficiency

scores.

For the first round of the simulation, I randomly select 800 observations from the full

sample of 1,600. Using this first draw, I calculate efficiency using DEA and tabulate the scores

9 This example is generalizable to any number of outputs and inputs; here I use a relatively simple example to help the reader maintain intuition.
10 All the random variables used in the simulation follow a uniform distribution.
11 I define the output and inputs in a financial accounting context due to the focus of the paper. The interpretation of the simulation results can be applied to settings with non-financial accounting outputs and inputs without loss of generality.

for each observation.12 I then draw a second random, independent calculation group of 800 from

the original 1,600 and calculate efficiency again. I repeat this procedure for a total of 50

iterations. Since each draw comprises half of the base sample, each DMU from the base sample

appears an average of 25 times across the 50 simulation draws.13 Pooling all simulation runs, the

total number of efficiency observations is 40,000 (800 in each calculation group for 50

simulation draws). I merge these by the original sample DMU identifier and calculate the mean

efficiency by DMU across all simulation draws. This yields a sample of 1,600 observations,

matching the size of the original sample and ultimately summarizing the 50 simulation draws. I

term the efficiency scores DEA800.

For the next round of the simulation, I reduce the calculation group size to 400. In order

to maintain the same aggregate number of observations, I increase the number of simulation

draws to 100 (400 in each calculation group for 100 simulation draws). This ensures that each

DMU appears, on average, 25 times across the simulation runs. In each successive simulation

round, I halve the size of the calculation group and double the number of simulation draws: the

subsequent calculation group sizes are 200 (leading to 200 simulation iterations), 100 (400

iterations), 50 (800 iterations), and 25 (1,600 iterations). I call the efficiency scores I generate

DEA400, DEA200, DEA100, DEA50, and DEA25.
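The simulation design above, halving the calculation group size while doubling the number of draws, and then averaging each DMU's scores across draws, can be sketched as follows. This is an illustrative outline under my own naming; `dea_fn` stands in for any DEA solver applied to a calculation group.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_simulation(data, group_size, n_draws, dea_fn):
    """Mean efficiency per DMU over random calculation groups.

    data: (N, k) array for the base sample.
    dea_fn: function returning one efficiency score per row of a subset.
    Returns each DMU's mean score across the draws in which it appears.
    """
    sums = np.zeros(len(data))
    counts = np.zeros(len(data))
    for _ in range(n_draws):
        # Draw a calculation group without replacement from the base sample.
        idx = rng.choice(len(data), size=group_size, replace=False)
        scores = dea_fn(data[idx])
        sums[idx] += scores      # accumulate by original DMU identifier
        counts[idx] += 1
    return sums / np.maximum(counts, 1)

# Halve the group size and double the draws each round, so that
# group_size * n_draws = 40,000 and each of the 1,600 DMUs appears
# about 25 times on average in every round.
rounds = [(800, 50), (400, 100), (200, 200),
          (100, 400), (50, 800), (25, 1_600)]
```

Keeping the total number of pooled observations fixed at 40,000 per round is what makes the DEA800 through DEA25 averages comparable across calculation group sizes.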

I evaluate the effect of calculation group size in several ways, starting by examining the

mean efficiency. I present these results in Table 1, Panel B. As noted earlier, DEA1600 serves as

the benchmark for the true relative efficiency. The mean value of DEA1600 is 0.584, the

standard deviation is 0.213, the median is 0.555, and the scores range from 0.25 to 1. The

12. I retain the DMU identifier from the original dataset of 1,600 observations. I use this to compare errors across different calculation group sizes.
13. Across the 50 simulation draws individual DMUs are drawn as few as 15 times and as many as 38. The median number of draws is 25, the 25th percentile is 23, and the 75th percentile is 27.
second row shows the results for DEA800, where the calculation groups comprise 800

observations. The mean is larger, at 0.605, as is the median of 0.572. Mean and median

differences between the two groups are statistically significant, although their magnitudes are

relatively low.14 It is also notable that the standard deviation, at 0.211, is slightly lower than the

benchmark run; this is consistent with the smaller calculation group increasing the lowest

possible efficiency measures and reducing variability.

I present descriptive statistics for the efficiency from other simulation rounds in the

subsequent rows of Table 1, Panel B. The results reveal a consistent pattern: means, medians,

and other distribution parameters (other than maximums, which are one in every case) are

increasing monotonically as the calculation group size gets smaller. While the mean for the

benchmark sample is 0.584, the mean for DEA25 is 0.839, a highly significant 43.7% increase. A

similarly large difference holds for medians. The standard deviation decreases

monotonically from 0.213 for the full sample to 0.164, a decline of 23%. This evidence is

consistent with smaller calculation groups a) moving the lower bound of the distribution higher,

b) compressing the distribution of efficiency, and thus c) reducing the variance of the

distribution. This suggests that small calculation groups cause a reduction in potentially

informative variation.

For the next analysis, I measure the correlation between efficiency scores from each of

the calculation group simulation rounds; I present these results in Table 1, Panel C, with Pearson

correlations in the upper triangle. These correlations are highly significant in all pairs, although

they decline monotonically as calculation group size gets smaller. This suggests that smaller

calculation group size introduces increased measurement error. I present Spearman rank-

14. The difference in the means has a t-statistic of 2.82. The difference in medians, tested with a Wilcoxon test, has a Z-statistic of 2.99.
correlations in the lower triangle of Table 1, Panel C. Although these coefficients are higher, a

similar monotonic trend obtains. This suggests that the errors introduced by smaller calculation groups do not just affect the cardinal values of efficiency, but also affect the ordinal relation

between the measures.

To further understand the effects of varying calculation group size, I sort observations

into quartiles based on DEA1600 and tabulate the mean and standard deviation of the different

efficiency scores. I present these results in Table 1, Panel D. The first column shows the statistics

for the lowest quartile of DEA1600. Not surprisingly, these statistics show an increasing trend

from DEA1600 to DEA25—the smaller the calculation group size, the higher the mean value.

Interestingly, the standard deviation also reveals an increasing pattern. This suggests that,

opposite of the full sample results, smaller calculation group size leads to more variation in the

low end of the distribution.

The second column shows results for the 2nd quartile of DEA1600. The same pattern holds for both the mean and the standard deviation. In the 3rd quartile, the means are still

increasing as calculation group size gets smaller. The mean efficiency scores from the smaller

calculation groups are getting closer to the frontier: the mean value of DEA50 is 0.872, and

DEA25 is 0.929. As the distributions become more compressed near a value of one, the standard

deviation falls. In the third column, the standard deviation increases from DEA1600 to DEA100,

but then decreases for DEA50 and DEA25. In the final column means are still increasing, but

standard deviations are decreasing monotonically. This illustrates the degree to which the

distribution of efficiency in small calculation groups is constrained and thus has little room to

vary. For example, the mean value of DEA25 is 0.992, and the full range in this quartile is 0.905

to 1. This suggests that the most serious inference issues with small calculation group sizes are

likely to be among the most efficient DMUs, where more observations are forced toward the

frontier and variation is constrained to be low.

The next test related to calculation group size exploits the simulation structure to assess

the potential for error in measurement from using smaller calculation groups. As an example,

consider the simulation that generates DEA800. In this simulation, I draw subsamples of 800

observations from the base dataset 50 different times and calculate efficiency for each

subsample. Since the draws are random, each DMU from the base dataset appears on average 25

times across the 50 simulation runs. In the analysis above, I average across the 50 runs to get the

mean effect across simulation runs. In this analysis, I measure the standard deviation of scores by

DMU across the 50 runs. For example, say that DMU1138 from the base sample appears 25

times across the 50 simulation runs, and the scores are (0.35, 0.37, 0.29, …, 0.30). If the standard

deviation of this series is low, this means that the score is relatively consistent across simulation

runs, suggesting precise measurement despite being run in smaller calculation groups. Larger

standard deviations, in contrast, imply a greater range across the simulation runs and greater

error. I present these results in Table 1, Panel E, including the standard deviation and confidence

intervals (5%, 95%) of mean efficiency. The standard deviation for DEA800 is 0.023, suggesting

a relatively tight distribution of scores across simulations. The standard deviations increase as

calculation group size gets smaller, up to 0.072 for DEA25. This impact is borne out in the

confidence intervals: The range for DEA800 is relatively narrow at [0.559, 0.651], while DEA25

has the wide range of [0.685, 0.993]. This analysis suggests that not only do the means get larger and the variances smaller as calculation group size shrinks, but also that the likelihood of measurement error grows.
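The per-DMU dispersion statistics in Panel E amount to grouping the pooled simulation output by DMU identifier. A minimal pandas sketch, using hypothetical scores rather than the paper's data:

```python
import numpy as np
import pandas as pd

# Hypothetical long table: one row per (DMU, simulation-run) efficiency score.
rng = np.random.default_rng(1)
runs = pd.DataFrame({
    "dmu": np.repeat(np.arange(5), 25),                  # ~25 appearances each
    "score": np.clip(rng.normal(0.60, 0.05, 125), 0, 1),
})

# Mean, standard deviation, and 5%/95% bounds by DMU, as in Panel E.
panel_e = runs.groupby("dmu")["score"].agg(
    mean="mean",
    sd="std",
    p05=lambda s: s.quantile(0.05),
    p95=lambda s: s.quantile(0.95),
)
```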

The tests I describe in Table 1 focus on simulated data with one output and three inputs. I

expect, furthermore, that the dimensionality of the output and input sets will affect the issues I

document in Table 1, with larger numbers of inputs and outputs exacerbating small calculation

group issues. To examine this I produce a new simulated dataset with two outputs and six inputs.

The data generating process of the dataset is similar to the dataset used in Table 1, but I double

the number of each type of input and output. For outputs, I include two sales measures (Sales A and Sales B), each ranging from 10 to 100 and independent of the other. The inputs include

four capital measures. Capital A and B range (independently) between 20 and 140% of Sales A,

and Capital C and D range between 20 and 140% of Sales B. Similarly, Expense A ranges

between 20 and 80% of Sales A and Expense B ranges between 20 and 80% of Sales B.
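This data generating process can be sketched directly (the variable names are mine; all draws are independent uniforms):

```python
import numpy as np

# Two-output, six-input DGP as described above: 1,600 DMUs, each input an
# independent uniform fraction of the corresponding sales output.
rng = np.random.default_rng(2)
n = 1600
sales_a = rng.uniform(10, 100, n)
sales_b = rng.uniform(10, 100, n)
Y = np.column_stack([sales_a, sales_b])
X = np.column_stack([
    sales_a * rng.uniform(0.2, 1.4, n),   # Capital A: 20-140% of Sales A
    sales_a * rng.uniform(0.2, 1.4, n),   # Capital B: 20-140% of Sales A
    sales_b * rng.uniform(0.2, 1.4, n),   # Capital C: 20-140% of Sales B
    sales_b * rng.uniform(0.2, 1.4, n),   # Capital D: 20-140% of Sales B
    sales_a * rng.uniform(0.2, 0.8, n),   # Expense A: 20-80% of Sales A
    sales_b * rng.uniform(0.2, 0.8, n),   # Expense B: 20-80% of Sales B
])
```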

Using this new dataset, I run a simulation similar to that for the original one-output, three-input dataset. I again start by calculating efficiency for the full 1,600 observations, and draw random samples of successively smaller size to assess the effects of calculation group size.

In Table 2, Panel A, I present the summary statistics on all outputs and inputs. In Table 2, Panel

B, I report the descriptive statistics for efficiency calculated with different calculation group sizes (as in Table 1). This table illustrates two important facts. First, across all calculation

groups (including DEA1600), the mean efficiency is higher and the standard deviation is lower

than the results reported in Table 1. Thus, without regard to the size of the calculation group, the

larger number of inputs and outputs compresses the distribution of efficiency scores in its own

right, leading to more DMUs on the frontier. Second, the pattern revealed in Table 1, Panel B—

increasing mean efficiencies and decreasing standard deviations—is present and even more

pronounced when there are more inputs and outputs. For example, there is minimal variation in

DEA25, with most observations residing at or very near the frontier; in fact, the median

observation is on the frontier. This shows that the effect of small calculation groups increases with

the number of inputs and outputs.15

The simulation-based analysis I present above yields several important inferences for

researchers. First, the distribution of efficiency scores is sensitive to calculation group size, both

in means and standard deviations. Second, this effect is amplified in the higher end of the

distribution of efficiency. When calculation group size is small enough, it may be difficult to

interpret reported efficiency scores, particularly for values at or close to the frontier. Third,

calculation group size effects are more severe the greater the number of inputs and outputs.

Multiple time periods

The second issue involves measuring DEA efficiency when the underlying data covers

multiple time periods. Although early studies focus on single time period settings, recent large-

sample studies use panel datasets. In these studies, firms have multiple observations over

different time periods and researchers can choose to measure the firm’s level of efficiency or

changes in efficiency by individual time period or by pooling multiple time periods. To

understand the implications of calculating efficiency under each method, I run two separate

simulation analyses. First, I consider the implications of calculating DEA separately for each

time period (e.g., year).16 Since DEA efficiency is a relative measure, it may be difficult to draw

inferences comparing efficiency scores calculated separately in different time periods. Second, I

15. For parsimony I do not reproduce the analysis in Table 1, Panels C, D, and E. The tests for correlation and quartile ranking yield similar inferences as those I report in Table 1. The test for within-DMU variance reveals a somewhat different pattern. The standard deviation increases for DEA800 through DEA200 (going from 0.021 to 0.028) but then decreases through DEA25 (falling to 0.017). This pattern emerges due to two countervailing forces affecting the distribution of efficiency. The first effect, which I document in Table 1, Panel E, is that standard deviations increase for smaller calculation groups. The second effect involves the compression of the distribution against the frontier. With the highest value of efficiency constrained to one and smaller calculation group sizes increasing the lowest value, the range of the distribution is narrower, leading to lower variance. In the two-output, six-input simulation, the second effect exceeds the first for the smallest calculation groups.
16. Researchers can run DEA with whatever periodicity the data allows. For ease of exposition, I will use "year" as the default periodicity (because the majority of financial accounting studies using DEA use yearly data), but the discussion and analysis are generalizable to quarters, months, etc.
examine the implications of pooling DMUs from different years and how this affects efficiency

scores. This method should mitigate the issue of non-comparability when the frontier is shifting,

but could introduce look-ahead bias. In addition to examining year-to-year efficiency scores, I

also analyze the implications of measuring the change in efficiency.

I start by analyzing efficiency scores calculated separately by year. If the efficient frontier

is relatively stationary over time, this method should yield efficiency scores that are comparable

across time periods. If, however, the frontier shifts between years (as may be expected when

technologies and productivity improve or if there are economy-wide shocks) the relative

efficiency scores calculated by DEA may also shift, even if the absolute efficiency of the firm

has not changed over time. To quantify how changes in the frontier affect efficiency scores over

time, I return to simulation. I simulate a new dataset with a single output (sales) and three inputs

(Capital A, Capital B, and Expense). Sales vary randomly between 10 and 100. Inputs again vary

as a percentage of sales: Capital A and Capital B are both between 60 and 140% of sales, and

Expense is between 60 and 100% of sales. I construct a dataset with 1,600 DMUs using these

parameters. I term this the current sample and present summary statistics on the inputs and

outputs in Table 3, Panel A.17

To capture the effect of a change in the frontier over time, I create a second dataset, the

future sample. To keep the intuition simple, I keep sales stationary over time. For example, if

DMU112 has sales of 32 in the current sample, it will also have sales of 32 in the future sample.

I alter the frontier via changes in the inputs. Specifically, I shift the distribution for each input

downward by 20%, meaning each Capital account ranges between 40 and 120% of sales, and

Expense between 40 and 80% of sales. Although each DMU has the same sales figure for the

17. I change the parameters of this simulation, relative to the earlier simulations, to increase the variation. This allows for a sharper contrast between the current and future subsamples used in the over-time analysis.
current and future period, the inputs are independent within and across the current and future

periods. This means a DMU could have high efficiency in the current period and low efficiency

in the future period; or the opposite might hold. I structure the simulation so that firms improve

on average from the current to the future period. Specifically, the average firm will have inputs at

280% of sales in the current period (100% each for Capital A and B, and 80% for Expense) and

220% of sales in the future period (80% each for Capital A and B, 60% for Expense). This

implies an expected improvement of about 21.4%.18 The data generating process, however, is

stochastic so the actual change may differ. I present descriptive statistics for the future sample in

Table 3, Panel B, with combined statistics presented in Table 3, Panel C.

Given these expectations, I start by calculating efficiency for the current and future

subsamples independently (i.e., in separate simulated runs), following an approach where DEA is

run by year. I present summary statistics in Table 4, Panel A. The mean DEA score for the

current sample is 0.832, with a standard deviation of 0.113. The mean efficiency score for the

future sample is significantly lower at 0.782, and significantly more variable with a standard

deviation of 0.141. A striking aspect of this result is that although the simulation was designed to

make future efficiency higher than current, this is not conveyed when comparing mean efficiency

scores. This illustrates the chief shortcoming of calculating DEA by year: as the frontier shifts,

efficiency scores are rendered incomparable between years.

I present additional statistics to illustrate the distortionary effect of separate yearly

calculations of efficiency scores. Since some studies employ a changes research design (e.g.,

Baik et al. 2013), I examine the change in efficiency scores on a DMU-by-DMU basis from the

current to the future period. Based on the differences in means, the mean change is

approximately −4.9%. This difference is, however, variable: 39% of DMUs experienced an
18. 1 − (220 / 280) ≈ 0.214.
improvement in efficiency and 61% had a decline. Because the design of the simulation is meant

to induce on-average increases in efficiency, at least half (and presumably more than half) of

DMUs should experience an improvement in efficiency. On this basis, I conclude that changes in

DMU efficiency do not yield meaningful inferences when efficiency is calculated separately by year and the efficient frontier has shifted.

Due to potential problems in measuring changes when efficiency is calculated on a

period-by-period basis, I next examine pooling all observations regardless of time period and

calculating efficiency scores jointly on the full panel. I start by combining the current and future

samples into a single dataset. I do, however, maintain the distinction between the two so I can

compare these results with those from the prior section’s calculations. I present these results in

Table 4, Panel B. I start with the mean efficiency scores for all 3,200 observations—this includes

two observations for each firm, one from the current period and one from the future. The mean is

significantly lower than either of the separately calculated groups, at 0.673, while the standard

deviation of 0.157 is higher. The greater variation is not surprising, as the joint distribution is

wider than either of the component distributions.

In the next two rows I present statistics of efficiency scores separately for the current and

future samples. The mean value for the current period is 0.564, while the mean value of the

future period is 0.782. The difference between the two groups is statistically significant at

0.218, very close to the theoretical difference of 0.214 predicted by the design of the simulation.

This suggests that calculating jointly across the two years yields efficiency scores that capture

economic differences over time. Further, the differences in the underlying distributions are clear

for the two subsamples: The future subsample has larger values for every point of the

distribution, including the maximum, where the current subsample has a maximum value of

0.855. Finally, the efficiency scores reported for the future subsample are identical to those

reported when DEA is run separately by year. That is, the frontier traced by the future subsample

is the same regardless of the inclusion of the current subsample observations, so pooling rescales only the dominated current-period scores. This indicates that the greater risk of measurement error lies in periods with lower efficiency.

As a final test, I return to measure the direction of the DMU-by-DMU changes between

years. Recall that, when efficiency was calculated separately, approximately 39% of DMUs showed improvement,

contrary to expectations based on the design of the simulation. When efficiency is calculated

jointly, approximately 91% of DMUs show improvement. This result is more in line with

expectations, given the underlying variance of the distribution. In total, this analysis suggests that

calculating jointly allows for an accurate measure of efficiency score changes.
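The contrast between separate-by-year and pooled calculation can be sketched with a deliberately simplified one-input, one-output model, where CRS efficiency reduces to the productivity ratio y/x divided by the calculation group's maximum (this reduction is my simplification; the input parameters echo the Expense design above, shifting down 20 percentage points):

```python
import numpy as np

def dea_ratio(x, y):
    """CRS efficiency in the one-input, one-output special case:
    productivity y/x scaled by the calculation group's maximum."""
    prod = y / x
    return prod / prod.max()

rng = np.random.default_rng(3)
n = 1000
sales = rng.uniform(10, 100, n)            # output held fixed across periods
x_cur = sales * rng.uniform(0.6, 1.0, n)   # current-period input
x_fut = sales * rng.uniform(0.4, 0.8, n)   # frontier shifts down 20 points

# Separate-by-year: each period is scored against its own frontier.
sep_cur = dea_ratio(x_cur, sales)
sep_fut = dea_ratio(x_fut, sales)

# Pooled: one common frontier for both periods.
prod = np.concatenate([sales / x_cur, sales / x_fut])
pooled = prod / prod.max()
pool_cur, pool_fut = pooled[:n], pooled[n:]

frac_improve_sep = np.mean(sep_fut > sep_cur)
frac_improve_pool = np.mean(pool_fut > pool_cur)
# Adding peers can only lower a score, so pooled scores are weakly below
# the separate scores; and because pooling uses one common frontier, the
# pooled change tracks the true (mostly positive) productivity change.
```

With these parameters, the separately calculated changes understate the built-in improvement, while the pooled changes recover its direction, mirroring the Table 4 comparison.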

There is, however, a concern with calculating efficiency jointly across years: look-ahead

bias. For example, consider a researcher calculating efficiency for 2014 and 2015 jointly for a set

of firms with observations in both years. Once the calculation is complete, the researcher uses

efficiency to explain some behavior or phenomenon. Under this method, efficiency will be

calculated for each observation, meaning there will be some efficiency scores for 2014 and some

for 2015. The issue is that the 2014 scores, which are being used to explain some phenomenon in 2014, are calculated using 2015 data. Since these data are unobservable at the time the decision was made, the resulting inferences may be inappropriate. This problem—look-ahead

bias—is potentially more severe for studies that use longer-term panel datasets.19

19. On a conceptual basis, whether look-ahead bias is an issue depends on the context of analysis for which the efficiency scores are being used. If efficiency scores are being used to describe an association, look-ahead bias is likely not particularly damaging to inferences. For example, if a researcher is using a panel dataset to examine the association between efficiency and profitability, using efficiency scores calculated with the full panel can yield valid inferences ex post. In contrast, if the researcher is arguing that efficiency causes a certain effect, the inferences may be distorted by look-ahead bias. For example, if a study contends that the board of directors uses firm efficiency to set compensation contracts, it is important to measure efficiency that the board would be using. This would necessarily be efficiency measured free from look-ahead bias.
The above analysis suggests that either method of calculating efficiency with panel

data—separately by year, or by pooling multiple years—presents potential inference problems.

The researcher must balance the costs of each when deciding which method to use. If the

researcher believes (or better yet, can show) that the efficient frontier is relatively stationary over

time, and that there are roughly similar numbers of observations over time (to avoid calculation

group size issues), calculating efficiency scores by year likely leads to few errors. Researchers

can control for small changes in the frontier using econometric techniques, such as fixed effects,

that are commonly used with panel data.20 If the frontier is changing significantly over time, or

there is a large disparity between yearly observations, it may be better for the researcher to pool

observations, even at the risk of look-ahead bias.21 Researchers’ willingness to accept look-ahead

bias will also be a function of the research context and will depend on whether the study is

seeking association or more direct causation. If the costs of the different choices are difficult to

assess, I recommend that researchers use both methods and ensure that their results are robust.

Measuring changes in efficiency presents similar issues of which the researcher must be

aware. Even in cases where the efficient frontier is not changing too much over time, measuring

changes based on efficiency calculated separately by year could introduce noise. As such, I recommend that researchers measuring the change in efficiency calculate it for pairs

of adjacent years (e.g., if measuring the change for 2014, calculate efficiency by pooling 2013

and 2014) and use this change for subsequent analysis.
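The adjacent-pair recommendation can be sketched in the same simplified one-input special case (my simplification; under CRS with one input and one output, efficiency is the productivity ratio over the pooled group maximum):

```python
import numpy as np

def change_in_efficiency(x_prev, y_prev, x_curr, y_curr):
    """Change in efficiency for year t, computed by pooling years t-1 and t
    so that both years are scored against one common frontier."""
    prod = np.concatenate([y_prev / x_prev, y_curr / x_curr])
    eff = prod / prod.max()          # one frontier for the pooled pair
    n = len(x_prev)
    return eff[n:] - eff[:n]         # current-year minus prior-year score

# Toy example with three firms: inputs change while output stays fixed.
x13, y13 = np.array([5.0, 8.0, 4.0]), np.array([4.0, 4.0, 4.0])  # e.g., 2013
x14, y14 = np.array([4.0, 8.0, 5.0]), np.array([4.0, 4.0, 4.0])  # e.g., 2014
delta = change_in_efficiency(x13, y13, x14, y14)
# delta is positive for the firm that cut its input, zero for the unchanged
# firm, and negative for the firm whose input grew.
```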

Calculation group classification

20. Researchers can also use rolling windows of data to calculate efficiency, but only retain the final year's efficiency score for analysis. Cook and Seiford (2009) refer to this as window analysis.
21. If the number of observations is roughly consistent from period to period and the frontier does not change too dramatically, fixed effects are appropriate to control for changes. When the number of observations differs from year to year, calculation group size effects may distort efficiency scores. Furthermore, fixed effects do not address the loss of variation due to small calculation groups.
When implementing DEA with financial accounting data, researchers have typically

formed calculation groups by industry. For example, DLM sorts firm-years by Fama and French

(1997) 48 industries. Sorting by industry is motivated by the production-based origins of DEA,

and assumes that firms in the same industry are likely to use a similar mix of capital and

expenses to produce revenue. Despite this logic, industry grouping introduces methodological

problems. The first is the varied sizes of industry groups, which can lead to the calculation group size problems documented above. Second, sorting panel datasets by industry (but combining different years within the

same calculation group) introduces look-ahead bias into the calculations. In this section, I

examine the implications of calculation group classification on inferences using efficiency

scores.

Industry groups, regardless of classification system, vary in the number of observations. To

illustrate this, I present the breakout of observations by Fama and French (1997) industry group

in Table 5, Panel A (using data from 1980 to 2015). The industry groups range in size from a

low of 302 (Tobacco) to a high of 26,135 (Business Services). In the same table, I present the

mean operating efficiency score by industry, using the calculation in DLM. This range is also

wide, from a low of 0.267 (Pharmaceuticals) to a high of 0.935 (Shipbuilding). These results are

consistent with the calculation group size issues I discuss earlier, as the smaller industries have,

on average, higher mean efficiency; the correlation coefficient between industry size and

efficiency score is significantly negative at -0.579 (untabulated). The results also show

considerable cross-industry variation in mean efficiency, with a standard deviation of 0.184

(untabulated).

The second issue is look-ahead bias. Although difficult to quantify, the effects of this bias

could be substantial. The long time-series of financial accounting data (30 years in the original

version of DLM, and 35 years in the most recently updated dataset made available to

researchers) means that the economy has changed dramatically from the early to the late

observations. Comparing firms in the 1980s to firms in the 2010s may not yield useful

inferences. Moreover, including subsequent years in the calculation can change a prior year’s

efficiency. As an example, in the DLM 2009 run Microsoft’s 1987 efficiency score was 0.771. In

the 2013 DLM run, Microsoft’s 1987 score had fallen to 0.651.22 This change illustrates how the

reference point and comparison group can affect calculated efficiency, and the importance to

researchers of appreciating these effects.23

The solution to look-ahead bias is to calculate efficiency not by industry, but rather on a

yearly basis.24 Researchers can address any non-stationarity in the frontier in the subsequent

empirical tests (e.g., year fixed effects). Yearly calculation also addresses, at least partially, the

discrepancy in group sizes in the industry-based sorting. In Table 5, Panel B, I present the

number of observations and mean efficiency by year for Compustat firms with sufficient data.

Although the yearly observations vary cyclically over time (rising from 4,674 in 1980 to a high

of 7,621 in the dot-com run-up of 1999, then declining following the bursting of the bubble and the

2007-8 financial crisis back to 4,746 in 2015) the variation is muted relative to the industry

sorting. The variability in mean efficiency is similarly low, with yearly means ranging from 0.237 (1982) to 0.358 (2009). Yearly mean efficiency shows an insignificant positive correlation with yearly

22. This analysis is based on datasets posted by DLM's authors, which are available for download.
23. The context of the research question guides whether look-ahead is an issue. If the researcher wants to understand retrospectively how efficient Microsoft was in 1987, calculating with the whole panel is appropriate. If the researcher wants to understand how Microsoft's efficiency affected some firm decision or choice in 1987, using a real-time score, not a score with future years' data, will lead to better inferences.
24. Alternatively, researchers could calculate efficiency by industry and year. In the context of most studies using financial accounting data (i.e., using Compustat data), calculation groups would be very small and there would be insufficient variation in efficiency scores to derive valid inferences.
observations of 0.054 (untabulated).25 Additionally, the standard deviation of mean efficiency

across years is over six times smaller than the industry sorting, at 0.030.

The rationale for industry-based sorting is to group firms with similar operations. By

sorting by year but not industry, this commonality in production and operations is lost and

DMUs may not be comparable. A key question is whether this will damage the ability of DEA to

effectively differentiate the efficiency of firms. On the one hand, there may be industries that are

inherently more efficient than others; these industries are likely to compose the frontier and other

high efficiency observations, disproportionately underreporting the efficiency of firms in other

industries.26 On the other hand, although operations may differ, the classification and

aggregation of financial statement information may yield comparable results even for different

industries. As an example, firms in the Books and Consumer Products industries use different

production technologies and produce different products. Yet firms in each industry deploy a

variety of capital assets to produce revenue. Given the flexible nature of the DEA program (see

Section 2), it is not clear whether researchers can compare efficiency scores of firm-years across

industries.

This suggests the importance of calculation group classification is an empirical question.

To examine this, I start by calculating efficiency using both an industry-based sorting and a year-

based sorting. I present descriptive statistics for each of these in Table 6, Panel A. Notably, but

not surprisingly (based on the results reported in Table 5), the mean efficiency is lower using the

25. For industry-calculated DEA, a regression of efficiency on the number of DMUs in the calculation group has an R2 of 33.5%, suggesting that in this design calculation group size explains one-third of the variation in efficiency scores. By way of comparison, for the year-calculated DEA, the R2 is only 0.3%.
26. There is little evidence that certain industries are systematically more efficient than others. In untabulated analysis, I measure the mean efficiency by industry based on DEA calculated by year. The observations range from 0.207 (Gold Mining) to 0.563 (Tobacco), a much narrower range than when DEA was calculated by industry. Moreover, Tobacco is an outlier; the next highest industry (Soda) has mean efficiency of 0.333. This range (0.126) is almost identical to the range of DEA calculated by year (0.121), suggesting that, with the possible exception of Tobacco firms, there are not systematic differences in efficiency across industries.
year-based sorting. In unreported analysis, I find that the correlation between the two efficiency

measurements is high (ρ = 0.51) but not perfect. This imperfect correlation suggests the

classification method used to form calculation groups could affect inferences from using

efficiency in subsequent empirical tests.

I use regression analysis to examine the impact that the sorting group has on efficiency. I

focus on the role of efficiency in predicting future accounting performance. Intuitively, firms

with higher efficiency should have superior future operating performance; more efficient firms

are better at maximizing revenues and minimizing costs, which leads to superior accounting

performance. I examine the relation between efficiency and future accounting performance using

the following OLS regression:

ROAt+1 = α + β1 Efficiencyt + β2 ROAt + Γ Controls + fixed effects + ε

I measure accounting performance with ROA, defined as the firm’s earnings before extraordinary

items scaled by average total assets. I include contemporaneous ROA because accounting

performance is persistent over time. Other controls include firm size (the log of total assets),

leverage (the ratio of debt to total assets), and the market-to-book ratio (the market value of

equity plus the book value of debt scaled by total assets). I also include year and industry fixed

effects. Finally, I cluster standard errors by firm and year.
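The specification can be sketched on synthetic data (all names and parameter values below are mine). For brevity, the sketch estimates the fixed-effects regression by OLS with dummy variables and omits the two-way clustered standard errors used in the paper:

```python
import numpy as np

# Synthetic panel with a known efficiency effect built into the DGP.
rng = np.random.default_rng(4)
n, n_year, n_ind = 5000, 10, 8
eff = rng.uniform(0, 1, n)                  # efficiency score in [0, 1]
roa = rng.normal(0.05, 0.08, n)             # contemporaneous ROA
size = rng.normal(6, 2, n)                  # log total assets
lev = rng.uniform(0, 0.6, n)                # leverage
mtb = rng.lognormal(0.3, 0.4, n)            # market-to-book
year = rng.integers(0, n_year, n)
ind = rng.integers(0, n_ind, n)

beta_eff = 0.05                             # true coefficient on efficiency
roa_lead = (0.01 + beta_eff * eff + 0.6 * roa
            + 0.02 * (year / n_year)        # year trend, absorbed by dummies
            + rng.normal(0, 0.05, n))

# Design matrix: intercept, regressors, then year and industry dummies
# (dropping one category of each to avoid collinearity).
year_d = (year[:, None] == np.arange(1, n_year)).astype(float)
ind_d = (ind[:, None] == np.arange(1, n_ind)).astype(float)
Xmat = np.column_stack([np.ones(n), eff, roa, size, lev, mtb, year_d, ind_d])
coef, *_ = np.linalg.lstsq(Xmat, roa_lead, rcond=None)
b1 = coef[1]                                # estimate of the efficiency effect
```

On this synthetic panel the OLS estimate recovers the built-in coefficient; in practice the clustered-error step matters for the reported t-statistics.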

I present the results in Table 6, Panel B. The regression coefficients show that higher

efficiency is associated with higher future earnings: the coefficients on both efficiency measures

are positive and significant. It is also notable that the coefficient on efficiency measured by year

(the second column) has a higher t-statistic than efficiency measured by industry (6.49 vs. 3.93),

suggesting a stronger statistical relation between year-based efficiency and future performance.

To test the significance of this difference, and to account for the correlation between efficiency

and the control variables, I run a Vuong test; this test compares the explanatory power of non-

nested models. If one regression were better at explaining future ROA, this would be reflected in a

significant statistic in the Vuong test. The untabulated results of the Vuong test yield a Z-

statistic of -0.42, which is not statistically significant (p-value: 0.67). This suggests that, despite

the difference in significance on the coefficients for the two efficiency measures, the models are

equally successful in explaining future performance.
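The Vuong statistic is straightforward to compute from the residuals of the two competing OLS models. The helper below is a hypothetical implementation assuming normally distributed errors (the textbook non-nested case), not the code used in the paper:

```python
import numpy as np

def vuong_z(resid1, resid2):
    """Vuong Z-statistic comparing two non-nested OLS models fit to the
    same n observations, computed from their residual vectors."""
    resid1, resid2 = np.asarray(resid1), np.asarray(resid2)
    n = len(resid1)
    s1, s2 = resid1.var(), resid2.var()  # MLE error variances
    # Pointwise log-likelihood difference under normal errors
    # (the -0.5*log(2*pi) constants cancel in the difference)
    m = (-0.5 * np.log(s1) - resid1**2 / (2 * s1)) \
        - (-0.5 * np.log(s2) - resid2**2 / (2 * s2))
    return np.sqrt(n) * m.mean() / m.std(ddof=1)

# Illustration with fabricated residuals: model 1 fits much better,
# so the statistic should be large and positive
rng = np.random.default_rng(0)
e_good = rng.normal(0, 0.5, 1000)  # residuals of the better-fitting model
e_bad = rng.normal(0, 2.0, 1000)   # residuals of the worse-fitting model
z = vuong_z(e_good, e_bad)
```

In practice the residual vectors from the two fitted regressions of future ROA would be passed in; an insignificant |Z| indicates that neither model dominates in explanatory power.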

The above analysis suggests that, although using different classifications for calculation

groups yields different values for efficiency, this does not necessarily affect inferences. Given

that industry sorting leads to variability in calculation group sizes and possible look-ahead bias,

this evidence suggests that time-based sorting may be a more effective design when using

accounting information to calculate efficiency using DEA. I note, however, that I provide only

one validation test of the impact of this choice; researchers should examine the impact of this

design choice in the specific contexts of their studies.

Subsets of efficiency scores

The calculation group can have a substantial influence on the efficiency score. In addition

to issues discussed above, such as mechanical effects from calculation group size, the

composition of the calculation group will also affect scores and potentially inferences. One way

that this can manifest is when a study requires efficiency scores for a subset of firms with

information available to calculate efficiency. This yields a key design decision: should the

researcher calculate efficiency for the entire set of observations (and then draw the efficiency

scores for the subset), or should the researcher only calculate efficiency scores for the subset? In

the Appendix I provide a summary of research that uses DEA to calculate operating efficiency

with Compustat variables, and show the additional data requirements in each study. This

illustrates the frequency with which this issue may be relevant.

As a concrete example, consider a researcher who wants to understand the association

between firm operating efficiency and management forecast accuracy. There will be fewer firm-

years with forecast data available than firm-years with sufficient data to calculate efficiency. The

researcher must then decide which set of firms to use when calculating efficiency. On the one

hand, efficiency could be calculated on all firm-years with data to calculate efficiency

(comprising most Compustat firm-years). On the other hand, the researcher could restrict the

efficiency calculation to only those firm-years with forecast data available. This leads to three

questions that are relevant to the researcher. First, do the subsample and the broader sample

differ significantly? Following the example from above, firms that issue management forecasts

are larger, have more transparent information environments, and are more profitable (Ajinkya et

al. 2005). Second, when the subsample deviates from the broader sample, does the efficiency

score also differ based on the sample upon which it is calculated? Third, if there is a difference,

does it affect inferences from using the scores? Based on the answers to these questions, the

researcher can make a judgment on the appropriate method to calculate efficiency.

I provide evidence related to these questions in this section. I start by contrasting the

population of Compustat firms with subsamples created by intersecting Compustat with two

commonly used databases, CRSP and Execucomp. After examining descriptive differences in the

samples, I examine differences in efficiency, calculating efficiency scores by year following the

output and inputs used in DLM. I follow with a validation test examining the association

between firm operating efficiency and CEO compensation.
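For reference, the optimization underlying these scores can be sketched as the standard input-oriented, constant-returns-to-scale (CCR) linear program, solved once per DMU. This is a generic illustration using scipy, not the exact specification or solver used in DLM:

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented, constant-returns-to-scale DEA (CCR model).
    X: (n, m) array of inputs; Y: (n, s) array of outputs.
    Returns an array of n efficiency scores in (0, 1]."""
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.zeros(1 + n)
        c[0] = 1.0  # minimize theta
        # Inputs: sum_j lambda_j * x_ij <= theta * x_io
        A_in = np.hstack([-X[o][:, None], X.T])
        # Outputs: sum_j lambda_j * y_rj >= y_ro (negated into <= form)
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        res = linprog(c,
                      A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.concatenate([np.zeros(m), -Y[o]]),
                      bounds=[(0, None)] * (1 + n))
        scores[o] = res.fun
    return scores

# Tiny illustration: two DMUs with the same output, where DMU 0 uses half
# the input of DMU 1; DMU 0 defines the frontier and DMU 1 scores 0.5
X = np.array([[2.0], [4.0]])
Y = np.array([[1.0], [1.0]])
scores = dea_ccr_input(X, Y)
```

The calculation group enters through the rows of X and Y: each DMU is scored against the frontier spanned by every other DMU passed in, which is precisely why group size and composition matter.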

CRSP

I start by examining the intersection of the Compustat and CRSP databases. As a

benchmark, I use the full Compustat sample; I calculate operating efficiency scores (following

DLM) for 208,735 firm-years from 1980 to 2015. I match data from this sample to CRSP,

requiring that CRSP have at least one monthly stock return to be matched to an annual

Compustat observation. This matching leads to 153,982 matches, or approximately 73.8% of

Compustat sample observations.

I present descriptive statistics for the full Compustat sample, including various firm

features. Then, for comparison purposes I present similar statistics for the subsample that

matches to CRSP. I present these statistics in Panels A and B of Table 7. The most striking

difference between the two samples is profitability: firms with data on CRSP are significantly

more profitable. Firms with CRSP coverage are also larger, have less debt, and lower book-to-

market ratios. To understand whether these differences affect the measurement of efficiency, I

calculate efficiency two different ways. First, I calculate efficiency using the full Compustat

dataset and merge these values into the Compustat-CRSP intersection. Second, I calculate

efficiency using just those observations in the Compustat-CRSP intersection. I present these

values in Table 7, Panel C. The first row (Efficiency – Full Compustat Sample) presents the

efficiency for the entire Compustat population from 1980 to 2015. The second row (Efficiency –

Compustat/CRSP Matched) presents similar statistics for efficiency (calculated using the full

Compustat sample) but only for the firm-years that have matched data on CRSP. A comparison

of these rows shows that the distributions are almost identical. This means that, despite the

differences between firm-years in the Full Compustat Sample and the Compustat/CRSP Matched

sample, these differences do not carry over to the measurement of efficiency when drawing this

subsample.

Next, I compare efficiency from the subsample of firms in the Compustat-CRSP

intersection where efficiency is calculated on just these firm-years. The logic behind calculating

efficiency separately is that the subsample will be a better comparison group because it will

exclude the much different firms not covered by CRSP. I present statistics for this subsample

calculation (Efficiency – Compustat/CRSP Intersection) in the third row. The mean and median

efficiency scores are higher in Row 3 relative to Row 2, as is the standard deviation; these

differences are all significant as reported in the bottom row of Panel C. I also test the correlation

between the two efficiency measures in Rows 2 and 3. This correlation is highly significant at

approximately 0.70 (untabulated). Collectively, this suggests that even when dealing with a fairly

broad subset of the Compustat firms, calculating efficiency on a non-random subsample of firm-

years can lead to different levels of reported efficiency.27 Thus, researchers should consider the

relevant comparison group (All Compustat firms? Only firms on CRSP?) when designing their

DEA efficiency calculations.
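Once both sets of scores are in hand, the comparison of subsample-based and full-sample-based efficiency is a routine paired test and correlation. A sketch with fabricated score vectors follows; the upward drift in the subsample scores is an assumption mimicking the pattern in Panel C:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical scores for the same firm-years: one vector computed within
# the full sample, one within the subsample (drifted upward, then capped at 1)
eff_full = rng.beta(2, 3, 5000)
eff_subset = np.clip(eff_full + rng.normal(0.05, 0.1, 5000), 0, 1)

t_stat, p_val = stats.ttest_rel(eff_subset, eff_full)  # paired mean difference
rho, _ = stats.pearsonr(eff_full, eff_subset)          # cross-measure correlation
```

A significant paired difference alongside a high but imperfect correlation, as in the archival results above, signals that the two calculation choices produce related but distinct measures.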

Execucomp

I next examine the intersection of Compustat and Execucomp. Unlike CRSP, which

covers a large portion of the Compustat population, Execucomp is more limited both in terms of firm

and year coverage. In Table 8, I present descriptive statistics for Compustat firm-years (Panel A)

and Execucomp (Panel B). Because Execucomp’s broad coverage begins in 1993, I use the 1993

to 2015 time period for both Compustat and Execucomp data. The Execucomp subsample

comprises 34,044 observations, or about 24.2% of the Compustat sample.

The two panels reveal a pattern of differences between the two samples similar to,

though more pronounced than, that observed for the Compustat-CRSP intersection. Execucomp firm-years are

27 It is also likely that calculating efficiency on a smaller subsample leads to smaller calculation group sizes. As
illustrated earlier in Section 3, this can lead to mechanically higher levels of efficiency, even when the subsample
has identical characteristics to the sample from which it is drawn.

significantly larger and more profitable than the broader Compustat population over the reported

time period. They also have less debt and derive more of their value from investment opportunities. In

Table 8, Panel C, I present comparisons between the full sample efficiency (Efficiency – Full

Compustat Sample), the subsample of Execucomp firms using the full sample efficiency

(Efficiency – Compustat/Execucomp Matched), and efficiency calculated yearly using only firm-

years with data available on Execucomp (Efficiency – Compustat/Execucomp Intersection). The

results reveal that efficiency calculated using just the Execucomp firm-years is significantly

higher on average than the efficiency based on all Compustat firm-years. As with CRSP, the

method of calculation appears to have a significant bearing on the reported efficiency.

Validation test

The prior section illustrates differences in the measurement of efficiency when using a

subset of the full population of firms for which efficiency can be calculated. While there are

clearly statistically significant differences, this does not prove that the measurement difference

is meaningful per se; this can only be assessed in application. In this section, I provide a

validation test to assess whether these measurement differences affect the results from empirical

tests.

Due to the dramatic differences reported in Table 8, I examine two different

measurements of efficiency for a sample of Execucomp firms: Efficiency- Compustat/Execucomp

Matched and Efficiency- Compustat/Execucomp Intersection. To test the importance of

differences in these scores, I examine the association between firm efficiency and CEO

compensation. Intuitively, I expect that CEOs of more efficient firms should be paid more than

those of less efficient firms. It is possible, however, that other firm features may fully explain the

efficiency-related portion of the CEO’s pay, resulting in no measurable association. I run the following OLS

regression:

Total Compensation_t = α + β₁ Efficiency_t + Γ Controls + fixed effects + ε

I measure total compensation as the natural logarithm of TDC1 from Execucomp; I restrict

sample observations to CEOs only. As control variables I include current and past return on

assets, the three-year standard deviation of return on assets, current and past stock returns, the

three-year standard deviation of stock returns, the prior year’s logged sales, the beginning of the

year logged market-to-book ratio, and CEO tenure (Core et al. 2008). I include year and industry

fixed effects, and cluster standard errors by firm and year. I run two specifications, each using

one of the two calculations of efficiency discussed above.

I report regression results in Table 9. The first column presents results using Efficiency-

Compustat/Execucomp Matched, where the coefficient on efficiency is positive and statistically

significant. The economic significance is substantial: a one standard deviation shift in efficiency

is accompanied by a 6.9% increase in total pay. In the second column, I measure

efficiency using just Efficiency – Compustat/Execucomp Intersection. The coefficient on

efficiency in this specification is positive but insignificant. In economic terms, a one standard

deviation increase in efficiency is associated with just a 0.8% increase in total pay. The control

variables are largely consistent in terms of sign and significance.28

These results highlight a challenge to researchers when designing studies using DEA. In

the example noted above, the inferences from the tests are ultimately ambiguous—the researcher

cannot tell definitively whether efficiency is related to CEO compensation or not. The first

column reports the effects of efficiency measured relative to all Compustat firms, and suggests

28 The only exception is contemporaneous ROA, which is negative in the first specification and positive in the
second, although statistically significant in neither.

that Execucomp CEOs as a group are compensated for their firms’ superior efficiency. The

second column, on the other hand, examines efficiency measured relative only to other

Execucomp firms, and indicates that, in the cross-section of these firms, relative efficiency is not

associated with greater compensation. The key issue for researchers is to carefully identify,

within the context of the research question, the appropriate grouping for efficiency calculation.

As this example illustrates, the decision can be subtle, yet have significant implications on the

empirical results and their interpretation. Furthermore, understanding the robustness of an

empirical result to alternative measurement—and providing evidence for readers of robustness—

is a vital way to improve inferences.

4. Conclusion

This study examines methodological considerations of calculating efficiency using DEA

with large panel datasets. The results reveal four important insights for researchers. First, the

calculation group size used for DEA can affect efficiency scores; there is a strong negative

correlation between mean efficiency and calculation group size. Furthermore, small calculation

group size attenuates the variance within the group, meaning that researchers cannot necessarily

address small calculation group problems with fixed effects in subsequent analysis. Second,

comparing efficiency scores calculated in different time-based calculation groups, and especially

using DMU-level changes, can lead to incorrect inferences, particularly when the frontier is

shifting over time.

Third, although DEA was developed by operations researchers and grouping is typically

based on common operations or production technologies (i.e., industry), the method appears to

be robust to alternative groupings, including year-based calculation groups. This is relevant when

using panel data, as industry-based groups can lead to small calculation group size and look-

ahead bias. Fourth, researchers should be cautious when drawing subsamples of efficiency scores

from a set of scores calculated from a broader population of firm-years, particularly when the

subsample differs from the broader population. I illustrate that inferences are sensitive to this

choice in the context of executive compensation.

DEA provides a powerful method to calculate relative efficiency, and is useful in a

variety of contexts (both those being used currently and those yet to be developed). This study

presents several methodological issues that are likely to arise when using this method with large

panel datasets, and provides a set of prescriptions of how researchers can use DEA to estimate

and design their tests to alleviate these concerns.

Appendix

Paper (Authors, Journal, Year) Observations Time Period Data Requirements


Baik, Chae, Choi, and Farber (CAR 2013) 71,733 1976-2008 CRSP
Baik, Farber, and Lee (CAR 2011) 14,315 1995-2005 Execucomp, First Call
Banker, Darrough, Huang, and Plehn-Dujowich (TAR 2012) 2,413 1992-2006 Execucomp
Bonsall, Holzman, and Miller (MS 2017) 87,759 1985-2011 Future ROA, future returns, credit ratings
Chang, Hayes, and Hillegeist (MS 2015) 1,610 1992-2001 Execucomp (CEO turnover)
Chen, Podolski, and Veeraraghavan (JEF 2015) 42,754 1993-2006 NBER patent data, Execucomp
Cornaggia, Krishnan, and Wang (CAR 2017) 25,113 1987-2013 S&P long-term credit ratings
Demerjian, Lev, Lewis, and McVay (TAR 2013) 78,423 1989-2009 Data for calculating EQ measures
Evans, Luo, and Nagarajan (TAR 2014) 264 1980-2004 Bankruptcy data
Francis, Hasan, Mani, and Ye (JFE 2016) 3,694 2006-2010 S&P 1500, Execucomp
Garrett, Hoitash, and Prawitt (JAR 2014) 1,005 2005-2010 “Great Places to Work” survey from Fortune Magazine
Guo, Huang, Zhang, and Zhou (TAR 2016) 7,804 2004-2008 KLD Socrates, Audit Analytics
Jung, Lee, and Weber (CAR 2014) 62,165 1983-2007 CRSP
Koester, Shevlin, and Wangerin (MS 2017) 44,616 1994-2010 Data to calculate cash effective tax rate
Kubick and Lockhart (JCF 2016) 16,150 1994-2012 Execucomp
Qiu, Trapkov, and Yakoub (JBF 2014) 2,198 1994-2010 Merger and acquisition data (SDC)

Notes to the Appendix: This table summarizes published and forthcoming studies that use a DEA-based efficiency measure (including the DLM MA Score) as a
principal explanatory variable. The list does not include studies which use DEA efficiency scores in robustness tests or as a control variable. The first column
presents the authors, journal, and year. The second provides the maximum number of observations used in the study; in many cases, some tests used fewer. The
third column shows the time period of each study’s data. The final column shows the data requirement(s) for the study.

Journal abbreviations: CAR – Contemporary Accounting Research; JAR – Journal of Accounting Research; JBF – Journal of Banking and Finance; JCF –
Journal of Corporate Finance; JEF – Journal of Empirical Finance; JFE – Journal of Financial Economics; MS – Management Science; TAR – The Accounting
Review

References

Abbott, M. and C. Doucouliagos. 2003. The efficiency of Australian universities: A data
envelopment analysis. Economics of Education Review 22(1): 89-97.

Adler, N. and J. Berechman. 2001. Measuring airport quality from the airlines’ viewpoint: An
application of data envelopment analysis. Transport Policy 8(3): 171-181.

Ajinkya, B., S. Bhojraj, and P. Sengupta. 2005. The association between outside directors,
institutional investors and the properties of management earnings forecasts. Journal of
Accounting Research 43(3): 343-376.

Athanassopoulos, A. and E. Shale. 1997. Assessing the comparative efficiency of higher
education institutions in the UK by means of data envelopment analysis. Education
Economics 5(2): 117-134.

Avkiran, N. 1999. An application of reference for data envelopment analysis in branch banking:
Helping the novice researcher. International Journal of Bank Marketing 17(5): 206-220.

Avkiran, N. 2001. Investigating technical and scale efficiencies of Australian universities
through data envelopment analysis. Socio-Economic Planning Sciences 35(1): 57-80.

Baik, B., J. Chae, S. Choi, and D. Farber. 2013. Changes in operational efficiency and firm
performance: A frontier analysis approach. Contemporary Accounting Research 30(3): 996-
1026.

Baik, B., D. Farber, and S. Lee. 2011. CEO ability and management earnings forecasts.
Contemporary Accounting Research 28(5): 1645-1668.

Banker, R., R. Conrad, and R. Strauss. 1986. A comparative application of data envelopment
analysis and translog methods: An illustrative study of hospital production. Management
Science 32(1): 30-44.

Banker, R., M. Darrough, R. Huang, and J. Plehn-Dujowich. 2012. The relation between CEO
compensation and past performance. The Accounting Review 88(2): 1-30.

Bonsall, S., E. Holzman, and B. Miller. 2016. Managerial ability and credit risk assessment.
Management Science 63(5): 1425-1449.

Brown, R. 2006. Mismanagement or mismeasurement? Pitfalls and protocols for DEA studies in
the financial services sector. European Journal of Operational Research 174: 1100-1116.

Chang, W., R. Hayes, and S. Hillegeist. 2015. Financial distress risk and new CEO
compensation. Management Science 62(2): 479-501.

Charnes, A., W. Cooper, and E. Rhodes. 1981. Data envelopment analysis: Approach for
evaluating program and managerial efficiency—with an application to the program follow
through experiment in U.S. public school education. Management Science 27(6): 668-697.

Charnes, A., W. Cooper, and E. Rhodes. 1978. Measuring the efficiency of decision making
units. European Journal of Operational Research 2(6): 429-444.

Chen, Y., E. Podolski, and M. Veeraraghavan. 2015. Does managerial ability facilitate corporate
innovative success? Journal of Empirical Finance 34: 313-326.

Cook, W., M. Hababou, and H. Tuenter. 2000. Multicomponent efficiency measurement and
shared inputs in data envelopment analysis: An application to sales and service performance
in bank branches. Journal of Productivity Analysis 14(3): 209-224.

Cook, W. and L. Seiford. 2009. Data envelopment analysis (DEA)—Thirty years on. European
Journal of Operational Research 192: 1-17.

Cooper, W., L. Seiford, and K. Tone. 2006. Introduction to data envelopment analysis and its
uses. Springer, New York.

Core, J., W. Guay, and D. Larcker. 2008. The power of the pen and executive compensation.
Journal of Financial Economics 88: 1-25.

Cornaggia, K., G. Krishnan, and C. Wang. 2017. Managerial ability and credit ratings.
Contemporary Accounting Research, forthcoming.

Demerjian, P., B. Lev, and S. McVay. 2012. Quantifying managerial ability; A new measure and
validity tests. Management Science 58(7): 1229-1248.

Demerjian, P., B. Lev, M. Lewis, and S. McVay. 2013. Managerial ability and earnings quality.
The Accounting Review 88(2): 463-498.

Dhungana, B., P. Nuthall, and G. Nartea. 2004. Measuring the economic inefficiency of
Nepalese rice farms using data envelopment analysis. Australian Journal of Agricultural and
Resource Economics 48(2): 347-369.

Dyckhoff, H., and K. Allen. 2001. Measuring ecological efficiency with data envelopment
analysis (DEA). European Journal of Operational Research 132(2): 312-325.

Dyson, R., R. Allen, A. Camanho, V. Podinovski, C. Sarrico, and E. Shale. 2001. Pitfalls and
protocols in DEA. European Journal of Operational Research 132: 245-259.

Evans, J., S. Luo, and N. Nagarajan. 2013. CEO turnover, financial distress, and contractual
innovation. The Accounting Review 89(3): 959-990.

Fama, E. and K. French. 1997. Industry costs of equity. Journal of Financial Economics 43: 153-
193.

Farrell, M. 1957. The measurement of productive efficiency. Journal of the Royal Statistical
Society. Series A (General) 120(3): 253-290.

Feroz, E., S. Kim, and R. Raab. 2003. Financial statement analysis: A data envelopment analysis
approach. Journal of the Operational Research Society 54(1): 48-58.

Francis, B., I. Hasan, S. Mani, and P. Ye. 2016. Relative peer quality and firm performance.
Journal of Financial Economics 122: 267-282.

Fraser, I. and D. Cordina. 1999. An application of data envelopment analysis to irrigated dairy
farms in Northern Victoria, Australia. Agricultural Systems 59(3): 267-282.

Garrett, J., R. Hoitash, and D. Prawitt. 2014. Trust and financial reporting quality. Journal of
Accounting Research 52(5): 1087-1125.

Gillen, D. and A. Lall. 1997. Developing measures of airport productivity and performance: An
application of data envelopment analysis. Transportation Research Part E: Logistics and
Transportation Review 33(4): 261-273.

Grigorian, D. and V. Manole. 2002. Determinants of commercial bank performance in transition:
An application of data envelopment analysis. Comparative Economic Studies 48(3): 497-522.

Guo, J., P. Huang, Y. Zhang, and N. Zhou. 2016. The effect of employee treatment policies on
internal control weaknesses and financial restatements. The Accounting Review 91(4): 1167-
1194.

Jacobs, R. 2001. Alternative methods to examine hospital efficiency: Data envelopment analysis
and stochastic frontier analysis. Health Care Management Science 4(2): 103-115.

Johnes, J. 2006. Data envelopment analysis and its application to the measurement of efficiency
in higher education. Economics of Education Review 25(3): 273-288.

Jung, B., W. Lee, and D. Weber. 2014. Financial reporting quality and labor investment
efficiency. Contemporary Accounting Research 31(4): 1047-1076.

Kao, C. and S. Hwang. 2008. Efficiency decomposition in two-stage data envelopment analysis:
An application to non-life insurance companies in Taiwan. European Journal of Operational
Research 185(10): 418-429.

Kim, S., C. Park, and K. Park. 1999. An application of data envelopment analysis in telephone
offices evaluation with partial data. Computers & Operations Research 26(1): 59-72.

Koester, A., T. Shevlin, and D. Wangerin. 2017. The role of managerial ability in corporate tax
avoidance. Management Science, forthcoming.

Korhonen, P. and M. Luptacik. 2004. Eco-efficiency analysis of power plants: An extension of
data envelopment analysis. European Journal of Operational Research 154(2): 437-446.

Kubick, T. and G. Lockhart. 2016. Do external labor market incentives motivate CEOs to adopt
more aggressive corporate tax reporting preferences? Journal of Corporate Finance 36: 255-
277.

Lin, L., and C. Hong. 2006. Operational performance evaluation of international major airports:
An application of data envelopment analysis. Journal of Air Transport Management 12(6):
342-351.

Qiu, B., S. Trapkov, and F. Yakoub. 2014. Do target CEOs trade premiums for personal
benefits? Journal of Banking and Finance 42: 23-41.

Sherman, D. and F. Gold. 1985. Bank branch operating efficiency: Evaluation with data
envelopment analysis. Journal of Banking and Finance 9(2): 297-315.

Smith, P. 1990. Data envelopment analysis applies to financial statements. OMEGA 18(2): 131-
138.

Tongzon, J. 2001. Efficiency measurements of selected Australian and other international ports
using data envelopment analysis. Transportation Research Part A: Policy and Practice
35(2): 107-122.

Vassiloglou, M. and D. Giokas. 1990. A study of the relative efficiency of bank branches: An
application of data envelopment analysis. Journal of the Operational Research Society 41(7):
591-597.

Yeh, Q. 1996. The application of data envelopment analysis in conjunction with financial ratios
for bank performance evaluation. Journal of the Operational Research Society 47(8): 980-
988.

TABLE 1
Calculation group size: data and tests
Panel A: Descriptive statistics
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Sales 1,600 55.006 26.654 10.000 31.000 56.000 78.000 100.000
Capital A 1,600 43.670 29.549 2.700 19.450 36.720 62.835 135.240
Capital B 1,600 43.957 29.356 2.200 20.160 36.840 62.790 133.860
Expense 1,600 27.235 16.619 2.310 14.000 23.760 39.000 80.000
Panel B: Efficiency by calculation group
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
DEA1600 1,600 0.582 0.213 0.250 0.408 0.555 0.741 1.000
(benchmark)
DEA800 1,600 0.605 0.211 0.265 0.428 0.572 0.771 1.000
DEA400 1,600 0.635 0.210 0.285 0.455 0.603 0.810 1.000
DEA200 1,600 0.672 0.208 0.307 0.490 0.649 0.864 1.000
DEA100 1,600 0.723 0.201 0.343 0.548 0.718 0.923 1.000
DEA50 1,600 0.779 0.188 0.371 0.616 0.807 0.965 1.000
DEA25 1,600 0.839 0.164 0.417 0.717 0.895 0.988 1.000
Panel C: Correlations
DEA1600 DEA800 DEA400 DEA200 DEA100 DEA50 DEA25
DEA1600 0.994 0.980 0.958 0.926 0.882 0.823
DEA800 0.995 0.994 0.978 0.951 0.909 0.849
DEA400 0.983 0.995 0.993 0.973 0.936 0.878
DEA200 0.966 0.984 0.995 0.990 0.961 0.908
DEA100 0.947 0.970 0.985 0.994 0.984 0.943
DEA50 0.927 0.953 0.972 0.985 0.993 0.977
DEA25 0.911 0.937 0.958 0.973 0.982 0.988
Panel D: Descriptive statistics by efficiency quartile
1st Quartile (lowest) 2nd Quartile 3rd Quartile 4th Quartile (highest)
Mean StdDev Mean StdDev Mean StdDev Mean StdDev
DEA1600 0.337 0.043 0.473 0.042 0.642 0.057 0.886 0.084
(benchmark)
DEA800 0.358 0.046 0.496 0.044 0.665 0.065 0.902 0.079
DEA400 0.386 0.052 0.532 0.058 0.702 0.080 0.922 0.071
DEA200 0.421 0.061 0.577 0.076 0.747 0.094 0.944 0.061
DEA100 0.472 0.074 0.645 0.095 0.810 0.100 0.967 0.046
DEA50 0.537 0.094 0.725 0.108 0.872 0.089 0.982 0.030
DEA25 0.621 0.113 0.813 0.105 0.929 0.064 0.992 0.015

Panel E: Within Observation Variability
Variable Mean Standard Deviation 95% Confidence Interval
DEA800 0.023 [0.559, 0.651]
DEA400 0.037 [0.561, 0.709]
DEA200 0.049 [0.574, 0.770]
DEA100 0.061 [0.601, 0.845]
DEA50 0.072 [0.635, 0.923]
DEA25 0.077 [0.685, 0.993]

Notes to Table 1: This table presents descriptive statistics for analysis of calculation group size. Panel A presents
summary statistics on the simulated dataset used to calculate efficiency scores, including one output (SALES) and
three inputs (CAPITAL A, CAPITAL B, and EXPENSE). SALES is set to vary uniformly between 10 and 100.
CAPITAL A and CAPITAL B each vary randomly between 20 and 140% of SALES. EXPENSE varies randomly
between 20 and 80% of SALES. Using these parameters, I generate a dataset of 1,600 observations. Panel B
presents summary statistics for efficiency scores calculated for different sized calculation groups. DEA1600 is
efficiency calculated on the full dataset and serves as the benchmark for the other calculations. DEA800 is based on
random samples of 800 DMUs drawn from the original dataset; I draw 50 datasets and run DEA for each. The
reported efficiency in Panel B is the mean efficiency, by original sample DMU, over the 50 draws. DEA400,
DEA200, DEA100, DEA50, and DEA25 represent similar statistics from smaller calculation groups. Panel C
presents correlations between the different calculation group measures of efficiency. Panel D presents means and
standard deviations of efficiency by calculation group, sorted by the quartile of DEA1600. Panel E presents the
mean within-DMU standard deviation of efficiency.
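The simulated dataset described above can be regenerated along the following lines; the integer grid for SALES (suggested by the tabulated quartiles) and the seed are assumptions, so the draws will not match the reported values exactly:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is arbitrary
n = 1600
# SALES varies uniformly over 10 to 100
sales = rng.integers(10, 101, n).astype(float)
# CAPITAL A and CAPITAL B each vary randomly between 20% and 140% of SALES
capital_a = sales * rng.uniform(0.2, 1.4, n)
capital_b = sales * rng.uniform(0.2, 1.4, n)
# EXPENSE varies randomly between 20% and 80% of SALES
expense = sales * rng.uniform(0.2, 0.8, n)
```

Feeding these four vectors into a DEA routine with one output (SALES) and three inputs reproduces the benchmark DEA1600 calculation; the smaller-group measures are obtained by scoring random draws of 800, 400, and so on from the same 1,600 rows.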

TABLE 2
Calculation group size: data and tests (2 outputs, 6 inputs)
Panel A: Descriptive statistics
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Sales A 1,600 55.058 26.438 10.000 32.000 55.000 78.000 100.000
Sales B 1,600 54.489 26.376 10.000 32.000 53.000 78.000 100.000
Capital A 1,600 44.937 30.752 2.520 20.095 37.120 63.605 139.000
Capital B 1,600 43.909 29.851 2.200 19.740 35.825 63.290 138.600
Capital C 1,600 42.605 29.676 2.100 19.045 34.315 61.550 137.200
Capital D 1,600 43.097 29.222 2.400 19.610 35.865 61.510 136.000
Expense A 1,600 22.142 12.646 2.000 11.730 20.225 30.800 58.410
Expense B 1,600 21.823 12.849 2.100 11.755 19.440 29.825 60.000
Panel B: Efficiency by calculation group
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
DEA1600 1,600 0.756 0.170 0.372 0.620 0.768 0.909 1.000
(benchmark)
DEA800 1,600 0.855 0.146 0.433 0.742 0.896 1.000 1.000
DEA400 1,600 0.883 0.135 0.450 0.789 0.941 1.000 1.000
DEA200 1,600 0.912 0.119 0.469 0.847 0.979 1.000 1.000
DEA100 1,600 0.940 0.099 0.498 0.920 0.995 1.000 1.000
DEA50 1,600 0.964 0.074 0.536 0.970 1.000 1.000 1.000
DEA25 1,600 0.982 0.048 0.582 0.992 1.000 1.000 1.000

Notes to Table 2: This table presents descriptive statistics for analysis of calculation group size using an extended
model. Panel A presents summary statistics on the simulated dataset used to calculate efficiency scores, including
two outputs (SALES A and SALES B) and six inputs (CAPITAL A, CAPITAL B, CAPITAL C, CAPITAL D,
EXPENSE A and EXPENSE B). SALES A and SALES B are set to vary uniformly between 10 and 100; their
values are independent. CAPITAL A and CAPITAL B each vary randomly between 20 and 140% of SALES A.
CAPITAL C and CAPITAL D each vary randomly between 20 and 140% of SALES B. EXPENSE A varies
randomly between 20 and 80% of SALES A, and EXPENSE B varies randomly between 20 and 80% of SALES B.
Panel B presents summary statistics for efficiency scores calculated for different sized calculation groups. DEA1600
is efficiency calculated on the full dataset and serves as the benchmark for the other calculations. DEA800 is based
on random samples of 800 DMUs drawn from the original dataset; I draw 50 datasets and run DEA for each. The
reported efficiency in Panel B is the mean efficiency, by original sample DMU, over the 50 draws. DEA400,
DEA200, DEA100, DEA50, and DEA25 represent similar statistics from smaller calculation groups.
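The data-generating process described in these notes can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the seed and variable names are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary
n = 1600

# Two independent outputs, each uniform on [10, 100]
sales_a = rng.uniform(10, 100, n)
sales_b = rng.uniform(10, 100, n)

# Capital inputs: random fractions (20-140%) of the corresponding output
capital_a = sales_a * rng.uniform(0.2, 1.4, n)
capital_b = sales_a * rng.uniform(0.2, 1.4, n)
capital_c = sales_b * rng.uniform(0.2, 1.4, n)
capital_d = sales_b * rng.uniform(0.2, 1.4, n)

# Expense inputs: random fractions (20-80%) of the corresponding output
expense_a = sales_a * rng.uniform(0.2, 0.8, n)
expense_b = sales_b * rng.uniform(0.2, 0.8, n)
```

Stacking these eight series into input and output matrices reproduces the kind of dataset summarized in Panel A.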

TABLE 3
Multiple time periods: data
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Panel A: Current sample
Sales 1,600 54.519 26.098 10.000 32.000 54.000 77.500 100.000
Capital A 1,600 55.253 30.518 7.260 29.755 51.595 76.830 139.000
Capital B 1,600 54.880 29.662 6.500 31.240 51.455 75.045 138.600
Expense 1,600 43.685 22.090 6.000 25.230 42.765 59.285 99.000
Panel B: Future sample
Sales 1,600 54.519 26.098 10.000 32.000 54.000 77.500 100.000
Capital A 1,600 44.491 25.810 4.300 23.835 39.960 61.550 120.000
Capital B 1,600 43.367 25.255 4.200 22.945 39.055 58.490 119.000
Expense 1,600 32.660 17.045 4.300 18.480 31.680 44.945 80.000
Panel C: All observations
Sales 3,200 54.519 26.098 10.000 32.000 54.000 77.500 100.000
Capital A 3,200 49.872 28.765 4.300 26.620 45.140 69.140 139.000
Capital B 3,200 49.124 28.138 4.200 26.535 44.725 68.250 138.600
Expense 3,200 38.173 20.482 4.300 21.120 36.200 52.490 99.000

Notes to Table 3: This table presents descriptive statistics and analysis on data used to analyze the effects of multiple
time periods. Panel A presents summary statistics for inputs (CAPITAL A, CAPITAL B, and EXPENSE) and the
output (SALES) for the current sample. SALES varies uniformly between 10 and 100. CAPITAL A and CAPITAL
B vary between 60 and 140% of Sales. EXPENSE varies between 60 and 100% of sales. Panel B presents summary
statistics for inputs and outputs for the future sample. SALES are the same in this dataset as in the current sample
(i.e., if SALES for DMU112 is 32 in the current sample, then SALES for DMU112 is also 32 in the future sample). CAPITAL A and
CAPITAL B vary between 40 and 120% of Sales, and EXPENSE varies between 40 and 80% of Sales. Panel C
presents summary statistics for the current and future samples combined.

TABLE 4
Multiple time periods: tests
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Panel A: Separate calculations
Efficiency- Current 1,600 0.832 0.113 0.600 0.741 0.843 0.928 1.000
Efficiency- Future 1,600 0.782 0.141 0.501 0.663 0.784 0.909 1.000
Difference 1,600 -0.049 0.176 -0.487 -0.171 -0.048 0.069 0.394
Improve 1,600 0.393 0.489 0.000 0.000 0.000 1.000 1.000
Panel B: Joint calculations
Efficiency-All 3,200 0.673 0.157 0.401 0.556 0.639 0.785 1.000
Efficiency- Current 1,600 0.564 0.076 0.401 0.506 0.565 0.625 0.855
Efficiency- Future 1,600 0.782 0.141 0.501 0.663 0.784 0.909 1.000
Difference 1,600 0.218 0.156 -0.189 0.102 0.222 0.333 0.596
Improve 1,600 0.906 0.292 0.000 1.000 1.000 1.000 1.000

Notes to Table 4: This table presents efficiency calculations for multiple time periods. Panel A presents summary
statistics when efficiency is measured in separate runs. EFFICIENCY – CURRENT is efficiency calculated with
only the current sample. EFFICIENCY – FUTURE is efficiency calculated with only the future sample.
DIFFERENCE is the difference between the two measures. IMPROVE is the proportion of observations where
EFFICIENCY – FUTURE is higher than EFFICIENCY – CURRENT. Panel B presents summary statistics when
efficiency is calculated in a single run. EFFICIENCY – ALL is the efficiency calculated for the combined current
and future samples. Other definitions are similar to those in Panel A.

TABLE 5
Observations and efficiency by industry and year
Panel A: Observations sorted by industry
Industry Observations Mean Efficiency
Agriculture 908 0.728
Food 4,218 0.766
Soda 617 0.841
Beer & liquor 973 0.638
Tobacco 302 0.891
Toys 1,931 0.635
Fun 4,246 0.444
Books 1,935 0.618
Household products 4,348 0.728
Clothing 3,004 0.728
Health 4,056 0.567
Medical equipment 6,893 0.453
Drugs 10,186 0.267
Chemicals 4,549 0.609
Rubber 2,454 0.847
Textiles 1,469 0.835
Building materials 5,725 0.591
Construction 2,909 0.669
Steel 3,649 0.684
Fabricated products 1,019 0.892
Machinery 7,895 0.711
Electrical equipment 2,619 0.710
Utilities 4,973 0.490
Automobiles 3,717 0.741
Aerospace 1,211 0.875
Ships 448 0.935
Guns 344 0.931
Gold 2,405 0.320
Mining 2,071 0.279
Coal 502 0.797
Energy 14,760 0.299
Telecom 9,282 0.581
Personal services 2,321 0.709
Business services 26,135 0.371
Computers 9,091 0.479
Chips 13,166 0.464
Laboratory Equipment 4,725 0.645
Paper 3,471 0.796
Boxes 747 0.934
Transportation 6,987 0.605
Wholesale 9,662 0.632
Retail 11,931 0.723
Restaurants 4,881 0.523
Total 208,735 0.545

Panel B: Observations sorted by year
Year Observations Mean Efficiency
1980 4,674 0.266
1981 4,692 0.267
1982 4,713 0.237
1983 4,992 0.250
1984 5,143 0.267
1985 5,117 0.258
1986 5,364 0.239
1987 5,621 0.245
1988 5,601 0.268
1989 5,469 0.256
1990 5,437 0.252
1991 5,481 0.240
1992 5,652 0.291
1993 6,013 0.281
1994 6,373 0.305
1995 6,723 0.294
1996 7,525 0.303
1997 7,709 0.261
1998 7,395 0.288
1999 7,621 0.293
2000 7,425 0.264
2001 7,066 0.309
2002 6,616 0.290
2003 6,338 0.290
2004 6,179 0.294
2005 6,007 0.287
2006 5,849 0.285
2007 5,606 0.300
2008 5,343 0.338
2009 5,240 0.358
2010 5,106 0.273
2011 4,955 0.269
2012 4,888 0.318
2013 5,048 0.319
2014 5,008 0.335
2015 4,746 0.336
Total 208,735 0.284

Notes to Table 5: This table presents summary statistics on the Compustat population with sufficient data to
calculate efficiency following the approach of Demerjian et al. (2012). Panel A sorts firms by Fama and French
(1997) industry and shows the number of observations and mean efficiency where the calculation is by industry.
Panel B presents observations and efficiency calculated by year.
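The efficiency scores tabulated above come from a DEA program solved separately for each calculation group. As a self-contained illustration of how such relative-efficiency scores are obtained, the sketch below solves the standard input-oriented CCR envelopment problem with scipy; this is a generic DEA formulation, not the exact output-oriented specification of Demerjian et al. (2012).

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR (constant returns to scale) DEA.

    X: (n, m) array of inputs; Y: (n, s) array of outputs.
    Returns an array of n efficiency scores in (0, 1], 1 = frontier."""
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.zeros(n + 1)
        c[0] = 1.0                       # minimize theta
        A_ub = np.zeros((m + s, n + 1))
        b_ub = np.zeros(m + s)
        # Input constraints: sum_j lambda_j * x_ij <= theta * x_io
        A_ub[:m, 0] = -X[o]
        A_ub[:m, 1:] = X.T
        # Output constraints: sum_j lambda_j * y_rj >= y_ro
        A_ub[m:, 1:] = -Y.T
        b_ub[m:] = -Y[o]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (n + 1))
        scores[o] = res.x[0]
    return scores
```

Running this function on each industry (or year) subgroup yields group-relative scores like those in Panels A and B; because each DMU is benchmarked only against its own group, the same firm generally receives different scores under different groupings.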

TABLE 6
Calculation group classification: data and tests
Panel A: Descriptive statistics
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Efficiency- Industry 208,735 0.545 0.271 0.012 0.316 0.551 0.770 1.000
Efficiency- Year 208,735 0.284 0.159 0.000 0.204 0.251 0.324 1.000
Panel B: Regression results
ROAt+1
Efficiency – Industry 0.049***
(3.93)
Efficiency – Year 0.070***
(6.44)
ROAt 0.568*** 0.568***
(22.18) (22.40)
Size 0.022*** 0.022***
(8.09) (8.45)
Leverage -0.177*** -0.176***
(-6.83) (-6.80)
Market-to-Book -0.016*** -0.016***
(-14.47) (-14.21)
Intercept -0.026 -0.005
(-1.36) (-0.29)
Fixed Effects? Year, Industry Year, Industry
Observations 175,304 175,304
Adjusted R-squared 0.581 0.581

Notes to Table 6: This table presents results comparing efficiency calculated by industry group (EFFICIENCY –
INDUSTRY) and efficiency calculated by year (EFFICIENCY – YEAR). Panel A presents descriptive statistics on
efficiency measurement. Panel B presents regression results. The dependent variable is the future ROA (ROAt+1),
the future year’s ratio of earnings (Compustat: IB) to average total assets (AT). ROAt is the current year’s return on
assets. SIZE is the natural logarithm of total assets. LEVERAGE is the ratio of total debt (DLTT + DLC) scaled by
total assets. MARKET-TO-BOOK is the total firm market value over the total firm book value ((CSHO*PRCC_C +
DLTT + DLC) / AT). All variables are winsorized at the top and bottom 1% of observations. Each regression
includes year and industry (Fama and French (1997) 48) fixed effects. Standard errors are clustered by firm and
year. *** indicates statistical significance at the 1% level.
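The control variables defined in these notes map directly onto Compustat fields. A sketch of the construction is below; the lagged-assets column name and the winsorization implementation are my assumptions, not the paper's code.

```python
import numpy as np
import pandas as pd

def winsorize(s, p=0.01):
    # Clamp a series at its p-th and (1-p)-th quantiles
    lo, hi = s.quantile(p), s.quantile(1 - p)
    return s.clip(lo, hi)

def build_controls(df):
    """df: firm-years with Compustat columns IB, AT, DLTT, DLC,
    CSHO, PRCC_C, plus lagged total assets in 'at_lag' (assumed name)."""
    avg_at = (df["AT"] + df["at_lag"]) / 2
    out = pd.DataFrame(index=df.index)
    out["roa"] = df["IB"] / avg_at                       # ROA
    out["size"] = np.log(df["AT"])                       # SIZE
    out["leverage"] = (df["DLTT"] + df["DLC"]) / df["AT"]
    # Market-to-book: (market value of equity + debt) over total assets
    out["mtb"] = (df["CSHO"] * df["PRCC_C"]
                  + df["DLTT"] + df["DLC"]) / df["AT"]
    return out.apply(winsorize)  # winsorize each column at 1%/99%
```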

TABLE 7
Subsets of efficiency scores: Compustat-CRSP intersection
Panel A: All Compustat efficiency – descriptives from 1980 to 2015
Variable Obs. Mean Median
Total Assets 208,735 2,301.49 113.368
MVE 189,111 2,248.06 96.994
ROA 208,722 -0.090 0.027
Operating ROA 208,143 0.026 0.109
Leverage 208,036 0.301 0.219
Book-to-Market 188,537 1.046 0.895
Panel B: Compustat-CRSP intersection – descriptives from 1980 to 2015
Obs. Mean t-statistic of difference (pooled t-test) Median Z-statistic of difference (Wilcoxon)
Total Assets 153,944 2,571.04*** 5.46 142.863*** 35.49
MVE 152,802 2,556.30*** 6.48 132.995*** 36.95
ROA 153,933 -0.028*** 42.94 0.033*** 26.15
Operating ROA 153,552 0.074*** 40.43 0.115*** 20.42
Leverage 153,383 0.238*** -48.37 0.199*** -26.34
Book-to-Market 152,282 1.003*** -14.98 0.879*** -6.51
Panel C: Comparison of efficiency calculations
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Efficiency- Full Compustat Sample 208,735 0.284 0.156 0.000 0.204 0.251 0.324 1.000
Efficiency- Compustat/CRSP Matched 153,982 0.283 0.156 0.001 0.203 0.252 0.326 1.000
Efficiency- Compustat/CRSP Intersection 153,982 0.391 0.232 0.000 0.227 0.354 0.514 1.000
Difference (3) – (2) 0.108*** 0.076*** 0.102***
Test (t-test / F-test / Wilcoxon) 151.90 2.20 146.44

Notes to Table 7: This table presents descriptive statistics and tests on the intersection of the Compustat and CRSP
databases. Panel A presents summary statistics on the full Compustat population between 1980 and 2015. Reported
variables include TOTAL ASSETS (AT), MVE (CSHO*PRCC_C), ROA (IB / avg(AT)), OPERATING ROA
(OIADB / avg(AT)), LEVERAGE ((DLTT + DLC) / AT), BOOK-TO-MARKET (AT / (MVE+DLTT+DLC)), and
efficiency scores (calculated for the full Compustat sample, by year). Panel B reports similar statistics for the
subsample of firms with observations on both Compustat and CRSP; I require firms to have at least one monthly
return to be included in this subsample. The table reports differences in means (based on pooled t-tests) and medians
(based on Wilcoxon tests). Panel C presents summary statistics on different efficiency scores. EFFICIENCY- FULL
COMPUSTAT SAMPLE is the efficiency for the entire Compustat population. EFFICIENCY-
COMPUSTAT/CRSP MATCHED is the efficiency of the firms with data on CRSP, based on the full Compustat
calculation of efficiency. EFFICIENCY- COMPUSTAT/CRSP INTERSECTION is efficiency calculated using only
firms with data available on CRSP. The bottom rows present differences between EFFICIENCY-
COMPUSTAT/CRSP MATCHED and EFFICIENCY-COMPUSTAT/CRSP INTERSECTION, including tests of the
means (t-test), standard deviations (F-test), and medians (Wilcoxon). *** indicates statistical significance at the 1%
level.

TABLE 8
Subsets of efficiency scores: Compustat-Execucomp Intersection
Panel A: All Compustat efficiency – descriptives from 1993 to 2015
Variable Obs. Mean Median
Total Assets 140,779 3,007.78 168.654
MVE 130,131 3,030.78 159.828
ROA 140,774 -0.124 0.074
Operating ROA 140,351 -0.002 0.172
Leverage 140,282 0.304 0.385
Book-to-Market 129,704 0.997 1.254
Panel B: Compustat-Execucomp intersection – descriptives from 1993 to 2015
Obs. Mean t-statistic of difference (pooled t-test) Median Z-statistic of difference (Wilcoxon)
Total Assets 34,044 5,923.28*** 20.76 1,067.09*** 139.01
MVE 33,543 6,849.65*** 27.06 1,172.21*** 144.18
ROA 34,044 0.040*** 79.74 0.096*** 75.81
Operating ROA 33,992 0.145*** 90.94 0.203*** 78.35
Leverage 33,908 0.224*** -39.53 0.333*** -8.97
Book-to-Market 33,410 0.848*** -33.52 1.074*** -24.68
Panel C: Comparison of efficiency calculations
Variable Obs. Mean StdDev Min. Q1 Median Q3 Max.
Efficiency- Full Compustat Sample 140,779 0.298 0.164 0.000 0.208 0.262 0.343 1.000
Efficiency- Compustat/Execucomp Matched 34,044 0.358 0.169 0.003 0.249 0.313 0.413 1.000
Efficiency- Compustat/Execucomp Intersection 34,044 0.581 0.220 0.030 0.416 0.545 0.724 1.000
Difference (3) – (2) 0.223*** 0.051*** 0.232***
Test (t-test / F-test / Wilcoxon) 148.22 1.70 140.44

Notes to Table 8: This table presents descriptive statistics and tests on the intersection of the Compustat and
Execucomp databases. Panel A presents summary statistics on the full Compustat population between 1993 and
2015. Reported variables include TOTAL ASSETS (AT), MVE (CSHO*PRCC_C), ROA (IB / avg(AT)),
OPERATING ROA (OIADB / avg(AT)), LEVERAGE ((DLTT + DLC) / AT), BOOK-TO-MARKET (AT /
(MVE+DLTT+DLC)), and efficiency scores (calculated for the full Compustat sample, by year). Panel B reports
similar statistics for the subsample of firms with observations on both Compustat and Execucomp; I require firms to
have annual total compensation (TDC1) to be included in the subsample. The table reports differences in means
(based on pooled t-tests) and medians (based on Wilcoxon tests). Panel C presents summary statistics on efficiency
scores. EFFICIENCY- FULL COMPUSTAT SAMPLE is the efficiency for the entire Compustat population.
EFFICIENCY- COMPUSTAT/EXECUCOMP MATCHED is the efficiency of the firms with data on Execucomp,
based on the full-sample DEA calculation. EFFICIENCY- COMPUSTAT/EXECUCOMP INTERSECTION is the
efficiency calculated with only Execucomp firms. The bottom rows present differences between EFFICIENCY-
COMPUSTAT/EXECUCOMP MATCHED and EFFICIENCY- COMPUSTAT/EXECUCOMP INTERSECTION,
including tests of the means (t-test), standard deviations (F-test), and medians (Wilcoxon). *** indicates statistical
significance at the 1% level.

TABLE 9
Subsets of efficiency scores: Validation test

Log(CEO Total Compensation)


Efficiency- Compustat/Execucomp Matched 0.411***
(4.355)
Efficiency- Compustat/Execucomp Intersection 0.037
(0.569)
ROAt -0.094 0.002
(-0.777) (0.016)
ROAt-1 -0.125 -0.109
(-1.152) (-0.987)
StdROAt-2,t -0.018 0.067
(-0.125) (0.479)
Returnt 0.181*** 0.187***
(7.544) (7.523)
Returnt-1 0.172*** 0.172***
(7.041) (6.978)
StdReturnt-2,t 0.063 0.090
(0.152) (0.214)
Log(Salest-1) 0.397*** 0.419***
(23.029) (24.974)
Market-to-Bookt-1 0.060*** 0.068***
(3.313) (3.662)
CEO Tenure -0.005 -0.005
(-1.583) (-1.585)
Constant 3.787*** 3.731***
(13.887) (13.565)
Fixed Effects Industry, Year Industry, Year
Observations 29,973 29,973
Adjusted R-squared 0.380 0.378
Notes to Table 9: This table presents regression results examining the association between efficiency and CEO total
compensation. The dependent variable in the regressions is the natural log of total CEO compensation (TDC1).
EFFICIENCY- COMPUSTAT/EXECUCOMP MATCHED is efficiency measured for the full Compustat population.
EFFICIENCY- COMPUSTAT/EXECUCOMP INTERSECTION is efficiency measured with only Execucomp firm-years. ROA is the return
on assets (IB / avg(AT)) measured for both for the current and past year; STDROA is the three-year standard
deviation of ROA. RETURN is the one-year buy-and-hold return (calculated with monthly CRSP data) measured for
both the current and past year; STDRETURN is the three-year standard deviation of return (based on 36 monthly
observations). SALES is the reported revenue (SALE). MARKET-TO-BOOK is the total firm market value over the
total firm book value ((CSHO*PRCC_C + DLTT + DLC) / AT). CEO TENURE is the number of years the CEO has
been with the firm. All variables are winsorized at the top and bottom 1%. Regressions include fixed effects for
industry and year, and I cluster standard errors by firm and year. *** indicates statistical significance at the 1%
level.
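A minimal version of this compensation regression can be sketched with statsmodels. For brevity the sketch uses only a subset of the controls, one-way clustering by firm (the table clusters by firm and year), and illustrative column names.

```python
import statsmodels.formula.api as smf

def run_comp_regression(df):
    """df: firm-year panel with illustrative columns log_comp, efficiency,
    roa, log_sales_lag, mtb, tenure, industry, year, firm."""
    model = smf.ols(
        "log_comp ~ efficiency + roa + log_sales_lag + mtb + tenure"
        " + C(industry) + C(year)",   # industry and year fixed effects
        data=df,
    )
    # One-way clustering by firm shown here as a simplification;
    # two-way firm-and-year clustering requires a dedicated routine.
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["firm"]})
```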
